Regular Expressions

Regular expressions allow us to search for patterns in text. Sometimes we want to analyze or enumerate the matches; sometimes we want to replace them. Regular expressions can also be part of the logic of a larger program, e.g., a conditional statement that controls the flow of an analysis depending on the structure of the text being analyzed.
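As a rough illustration of these uses, the sketch below uses Python's standard `re` module to search, enumerate, replace, and branch on a pattern. The sample text and the helper `describe` are hypothetical, chosen only for demonstration.

```python
import re

text = "Orders: 12 apples, 7 pears, 30 plums."

# Search: find the first run of one or more digits.
first = re.search(r"\d+", text)
print(first.group())  # 12

# Enumerate: list every run of digits, then count them.
numbers = re.findall(r"\d+", text)
print(numbers, len(numbers))  # ['12', '7', '30'] 3

# Replace: mask every run of digits with "#".
masked = re.sub(r"\d+", "#", text)
print(masked)  # Orders: # apples, # pears, # plums.

# Control flow: hypothetical helper that branches on whether
# a whole line looks like an ISO-style date (YYYY-MM-DD).
def describe(line):
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", line):
        return "date"
    return "other"

print(describe("2024-01-15"), describe("hello"))  # date other
```

`re.fullmatch` is used in the last step so that the entire line, not just a prefix, must fit the pattern.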

Regular expressions can be used in many tools, e.g., awk, bash, Java, Python, R, and SQL. The way to write a regular expression varies a little from one tool to the next, but once we understand the basic ideas, we can perform text analysis more easily and in more innovative ways.
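One small example of such variation is how backslashes are written. In Java source code, the digit pattern is typically written as the string literal `"\\d+"`, because Java string literals consume one level of backslash escaping. Python accepts that same spelling but also offers raw strings, where `r"\d+"` denotes the identical pattern. The sketch below, using an arbitrary sample string, checks that both Python spellings behave the same.

```python
import re

sample = "a1b22"

# Ordinary literal: the backslash must be doubled to reach the regex engine.
plain = re.findall("\\d+", sample)

# Raw literal: the backslash is kept as-is, so one is enough.
raw = re.findall(r"\d+", sample)

print("\\d+" == r"\d+")  # True
print(plain, raw)        # ['1', '22'] ['1', '22']
```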

The Open Group’s Single UNIX Specification, Version 4 webpage (login required; registration is free) provides a description of regular expressions in UNIX.

Regular Expressions Books

  • Mastering Regular Expressions, 3rd Edition, by Jeffrey E.F. Friedl (O’Reilly, 2006), available at O’Reilly or Amazon

  • Regular Expressions Cookbook, 2nd Edition, by Jan Goyvaerts and Steven Levithan (O’Reilly, 2012), available at O’Reilly or Amazon

  • Introducing Regular Expressions by Michael Fitzgerald (O’Reilly, 2012), available at O’Reilly or Amazon