Quick intro to regexp internals: think DFA

Source : xkcd
> time perl -e '("a" x 10) =~ /^(a?){10}(a){10}$/;'real 0m0.010s
user 0m0.005s
sys 0m0.004s
> time perl -e '("a" x 25) =~ /^(a?){25}(a){25}$/;'real 0m5.541s
user 0m5.532s
sys 0m0.007s
NFA
DFA, recognizing the same langage.
> time perl -e 'print ("a" x 25)' | grep -E "^(a?){25}(a){25}$"
aaaaaaaaaaaaaaaaaaaaaaaaa
real 0m0.0011s
user 0m0.007s
sys 0m0.007s

Conclusion

This was just a quick overview of differences between NFA and DFA when implementing regexps. There are many other types of automata that have been used throughout the ages to implement pattern recognition. Regexp world is a fantastic universe mixing data-structures, algorithms, and formal languages. If you want to learn more about history and implementations, you can check my slides on Github, or have a look at the following resources.

References

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store