grep and Other RegEx Functions
grep stands for "globally search for a regular expression and print," just as in UNIX. The function uses a regular expression to search for a pattern in a character vector (a vector of strings) and returns the index or indices of the matching elements.
Additionally, the function grepl (derived from grep-logical) uses the same inputs, but returns a logical vector, where TRUE indicates a match at that index, and FALSE indicates the opposite.
grep(".*s$", c("waffle", "waffles", "pancake", "pancakes"))
 2 4
grepl(".*s$", c("cats", "bats", "geese", "meese")) # fun fact: meese is not the plural of moose
 TRUE TRUE FALSE FALSE
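As a minimal sketch of how the two functions relate: the vector name breakfast below is made up for illustration, and the shorter pattern "s$" is used on the assumption that it matches the same elements as ".*s$" here. The value = TRUE argument of grep returns the matching strings themselves rather than their positions.

```r
# Hypothetical example vector, named here only for illustration
breakfast <- c("waffle", "waffles", "pancake", "pancakes")

idx  <- grep("s$", breakfast)    # integer positions of the matches: 2 4
hits <- grepl("s$", breakfast)   # one TRUE/FALSE per element

# which() turns the logical vector into the same positions grep returns
same_positions <- identical(idx, which(hits))

# value = TRUE returns the matching strings instead of their positions
matched <- grep("s$", breakfast, value = TRUE)

# The logical vector from grepl can also be used directly for subsetting
subset_by_logical <- breakfast[hits]
```

Which form to use mostly depends on what comes next: positions are handy for indexing into other objects, while the logical vector slots neatly into filters and conditions.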
Oftentimes finding the indices of matches in your text isn’t what you want; your goal is to change the text into a format that’s better for parsing. For this, we have sub and gsub. These functions take a regular expression and a replacement expression, applying the replacement to a string or a vector of strings.
The key difference here is that sub applies only to the first match in each string, while gsub (derived from global-substitution) applies to all matches.
sub("l", "?", "The best part of waking up is Folgers in your cup") # not sponsored or affiliated
 "The best part of waking up is Fo?gers in your cup"
gsub("[aeiou]", "!", "The best part of waking up is Folgers in your cup") # globally not sponsored or affiliated
 "Th! b!st p!rt !f w!k!ng !p !s F!lg!rs !n y!!r c!p"
These functions work equally well on a single string or on a vector of strings. Given a vector, the substitution is repeated independently for each element, and because a single string in R is just a character vector of length one, it is handled the same way.
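A small sketch of that element-by-element behavior, using a made-up vector named words (not part of the original examples):

```r
# Hypothetical input vector for illustration
words <- c("hello world", "llama", "sky")

# sub replaces only the first "l" in each element
first_only <- sub("l", "L", words)
# "heLlo world" "Llama" "sky"

# gsub replaces every "l" in each element
all_matches <- gsub("l", "L", words)
# "heLLo worLd" "LLama" "sky"

# A single string is a character vector of length one, so the same call works
single <- gsub("l", "L", "hello world")
# "heLLo worLd"
```

Elements with no match, such as "sky", are returned unchanged.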
Regular expressions are hard, even for some veteran programmers, as the rules and match characters are subtly different for each programming language. This is a great resource for RegEx basics (everything on the second page is useful) and string manipulations which encompass more than that of