Advanced Regular Expressions Cheat Sheet
A concise guide to advanced regular expression patterns and techniques, including lookarounds, backreferences, and conditional matching, designed to help you master complex text manipulation.
Lookarounds
Positive Lookahead
|
(?:...) - Non-capturing group. Groups part of a pattern without capturing it. Useful when you need to group parts of a regex but don’t need to refer back to them. Example:
Matches a URL but doesn’t capture the protocol. |
|
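(?=...) - Asserts that the pattern inside the lookahead matches immediately after the current position, without consuming any characters. Example: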
Matches a word followed by ’ Inc.’, without including ’ Inc.’ in the matched text. |
|
Find “X” only if followed by “Y”. |
Example |
Matches ‘foo’ only if it’s followed by ‘bar’, but ‘bar’ is not part of the match. |
Use cases |
Validating password strength, parsing structured data, and conditional replacements. |
Real-world example |
Extract the version number from ‘app-1.2.3.zip’ using |
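The lookahead rows above can be sketched in Python's built-in `re` module. The sample strings and the version pattern are assumptions made for illustration; the patterns from the original sheet are not shown, so these are one plausible reading rather than the author's exact expressions.

```python
import re

# 'foo' only when followed by 'bar'; 'bar' stays out of the match.
foo_hits = re.findall(r"foo(?=bar)", "foobar foobaz")

# A word followed by ' Inc.', without ' Inc.' in the matched text.
company = re.search(r"\w+(?= Inc\.)", "Shares of Acme Inc. rose").group()

# Assumed pattern: dotted digits that are followed by '.zip'.
version = re.search(r"\d+(?:\.\d+)*(?=\.zip)", "app-1.2.3.zip").group()

# Non-capturing group: the protocol is grouped but not captured,
# so group 1 is the host rather than the protocol.
host = re.search(r"(?:https?://)([\w.-]+)", "https://example.com/docs").group(1)

print(foo_hits, company, version, host)
```

Because a lookahead consumes nothing, the text it checks remains available to whatever part of the pattern (or whatever later match) comes next.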
Negative Lookahead
|
Find “X” only if not followed by “Y”. |
Example |
Matches ‘foo’ only if it’s NOT followed by ‘bar’. |
Use cases |
Filtering log files, validating data formats, and advanced search functionalities. |
Real-world example |
Find all words that are not preceded by a number using |
|
(?<!...) - Negative lookbehind. Asserts that the pattern does not match immediately before the current position. Example:
Matches one or more digits if not preceded by a capital letter. |
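A minimal sketch of negative lookahead in Python's `re` module follows. The log lines are hypothetical sample data, chosen only to illustrate the "filtering log files" use case.

```python
import re

# Start offsets of 'foo' that is NOT followed by 'bar'.
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Hypothetical log lines: keep those that do not start with 'DEBUG'.
lines = ["DEBUG cache hit", "ERROR disk full", "INFO started"]
kept = [line for line in lines if re.match(r"(?!DEBUG)", line)]

print(starts)
print(kept)
```

The second pattern matches an empty string at the start of every line that does not begin with 'DEBUG', which is enough for a yes/no filter.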
Positive Lookbehind
|
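(?<=...) - Asserts that the pattern inside the lookbehind matches immediately before the current position, without including it in the match. Example: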
Matches a number preceded by ‘USD’, without including ‘USD’ in the matched number. |
|
Find “Y” only if preceded by “X”. |
Example |
Matches ‘foo’ only if it’s preceded by ‘bar’, but ‘bar’ is not part of the match. |
Use cases |
Extracting data from specific contexts, validating formatted input, and data sanitization. |
Real-world example |
Extract file sizes (numbers) only when they are indicated in kilobytes (KB) using |
Note |
Not supported in all regex engines, and some engines that do support it require the lookbehind pattern to have a fixed length. |
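Positive lookbehind can be sketched in Python as follows. Python's `re` module requires the lookbehind to be fixed-width, which all three patterns satisfy. The `size=` input format in the last example is an assumption, not something taken from the original sheet.

```python
import re

# A number preceded by 'USD'; 'USD' is not part of the match.
amounts = re.findall(r"(?<=USD)\d+(?:\.\d+)?", "Total: USD100.50, EUR200")

# Start offsets of 'foo' that is preceded by 'bar'.
after_bar = [m.start() for m in re.finditer(r"(?<=bar)foo", "barfoo bazfoo")]

# Assumed format 'size=<digits><unit>': a lookbehind anchors on 'size='
# and a lookahead keeps only the values given in KB.
kb_sizes = re.findall(r"(?<=size=)\d+(?=KB)", "size=512KB size=3MB")

print(amounts, after_bar, kb_sizes)
```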
Negative Lookbehind
|
(?<!...) - Asserts that the pattern inside the lookbehind does not match immediately before the current position. Example:
Matches ‘%word’ only if it is not preceded by a digit. |
|
Find “Y” only if not preceded by “X”. |
Example |
Matches ‘foo’ only if it’s NOT preceded by ‘bar’. |
Use cases |
Filtering data based on context, excluding unwanted patterns, and refining search results. |
Real-world example |
Find function names that are not part of a class method definition using |
Note |
Not supported in all regex engines, and some engines that do support it require the lookbehind pattern to have a fixed length. |
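The negative lookbehind rows can be sketched in Python like this. The sample strings are assumptions. The digits example also shows a common pitfall: excluding only the capital letter still lets the match begin one digit later, so the digit class is added to the lookbehind as well.

```python
import re

# Start offsets of 'foo' that is NOT preceded by 'bar'.
not_after_bar = [m.start() for m in re.finditer(r"(?<!bar)foo", "barfoo bazfoo")]

# '%word' only if it is not preceded by a digit.
percent_words = re.findall(r"(?<!\d)%\w+", "50%off and %discount")

# Pitfall: in 'A123' the match can still start at '2', whose previous
# character is '1' rather than a capital letter.
naive = re.findall(r"(?<![A-Z])\d+", "A123 b456")

# Excluding digits in the lookbehind forces a match to start at the
# first digit of a run, so 'A123' is rejected as a whole.
strict = re.findall(r"(?<![A-Z\d])\d+", "A123 b456")

print(not_after_bar, percent_words, naive, strict)
```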
Backreferences
Basic Backreference
|
Refers to the text matched by the 1st, 2nd, etc. capturing group. Example:
Matches a repeated word, like ‘the the’. |
Use cases |
Finding duplicate words, validating symmetrical patterns, and complex text replacements. |
Example |
Find duplicated words in a text: |
Common mistake |
Forgetting that backreferences refer to the exact matched text, not the pattern. |
Real-world example |
Correct HTML tag pairing using |
Note |
Backreferences can significantly increase the complexity (and processing time) of regex matching. |
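A short Python sketch of numbered backreferences follows. The sample text and the tag-pairing pattern are illustrative assumptions; the tag pattern handles only simple, non-nested tags.

```python
import re

# Duplicated words: \1 must repeat the exact text captured by group 1.
duplicates = re.findall(r"\b(\w+)\s+\1\b", "This is the the test of of regex")

# Exact text, not the pattern: (\d)\1 needs the same digit twice.
same_digit = re.search(r"(\d)\1", "1233").group()
different_digits = re.search(r"(\d)\1", "12")

# Simple tag pairing: the closing tag name must equal the opening one.
html = "<b>bold</b> <i>oops</b>"
paired = [m.group(0) for m in re.finditer(r"<(\w+)[^>]*>.*?</\1>", html)]

print(duplicates, same_digit, different_digits, paired)
```

The second example illustrates the common mistake noted above: `(\d)\1` does not mean "any two digits", only "one digit followed by that same digit".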
Named Capture Groups
|
Defines a named capture group. Example:
Matches a date and names the groups ‘year’, ‘month’, and ‘day’. |
|
Alternative syntax for defining named capture groups in .NET. |
|
Refers to a named capture group. Example:
Matches repeated words using the named group ‘word’. |
Use cases |
Parsing complex data structures, extracting specific parts of a string, and making regexes more readable. |
Real-world example |
Extract specific parts of a log entry like timestamp, log level, and message using named groups for better clarity and maintainability. |
Note |
Named groups improve readability but might not be supported in all regex engines, and the syntax varies between engines (Python, for example, writes (?P<name>...)). |
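Named groups can be sketched in Python, which spells the definition `(?P<name>...)` and the backreference `(?P=name)`. The log line format below is a hypothetical example, not one taken from the original sheet.

```python
import re

# A date with groups named 'year', 'month', and 'day'.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
parts = date_re.search("Released on 2024-03-15").groupdict()

# A repeated word, found through the named group 'word'.
repeated_word = re.search(
    r"\b(?P<word>\w+)\s+(?P=word)\b", "it was was fine"
).group("word")

# Hypothetical log format: '<date> <time> [LEVEL] message'.
log_re = re.compile(r"(?P<timestamp>\S+ \S+) \[(?P<level>\w+)\] (?P<message>.*)")
entry = log_re.match("2024-03-15 10:22:01 [ERROR] Disk full").groupdict()

print(parts)
print(repeated_word)
print(entry)
```

Reading fields by name through `groupdict()` keeps the calling code stable if groups are later added or reordered.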
Backreference in Replacement
|
Refers to captured groups in the replacement string. Example:
Swaps the first and last word separated by a comma and space. |
|
Alternative syntax for backreferences in replacement strings, especially in languages like Python. |
Use cases |
Reformatting data, swapping fields, and complex string manipulations. |
Example |
Reformat phone numbers from ‘123-456-7890’ to ‘(123) 456-7890’ using |
Note |
Ensure that the backreference number matches the intended capture group to avoid unexpected results. |
Real-world example |
Swap first name and last name in a CSV file, where names are separated by a comma, using backreferences in the replacement string. |
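Replacement backreferences can be sketched with Python's `re.sub`, which accepts both `\1` and the unambiguous `\g<1>` form. The sample strings are assumptions.

```python
import re

# Swap two words separated by a comma and a space.
swapped = re.sub(r"(\w+), (\w+)", r"\2, \1", "Doe, John")

# Reformat '123-456-7890' as '(123) 456-7890'.
phone = re.sub(r"(\d{3})-(\d{3})-(\d{4})", r"(\1) \2-\3", "123-456-7890")

# \g<1> avoids ambiguity when a literal digit follows the reference:
# r"\10" would be read as a reference to group 10.
padded = re.sub(r"(\d+)", r"\g<1>0", "7")

print(swapped, phone, padded)
```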
Conditional Matching
If-Then-Else Conditionals
|
Matches either the ‘then’ pattern or the ‘else’ pattern, depending on whether the condition is met. |
Condition syntax |
(?(1)then|else) - Condition based on whether group 1 matched. |
Example |
Matches email addresses, optionally enclosed in angle brackets. |
Use cases |
Handling optional elements, validating complex data formats, and adapting matching based on context. |
Real-world example |
Parse data entries where some fields are optional but depend on the presence of others, such as address fields in a contact database. |
Note |
Not supported in all regex engines, and syntax may vary. |
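Python's `re` module supports the `(?(1)then|else)` form, so the email example can be sketched as below. The pattern follows the one used in the Python `re` documentation; the helper name `is_email` is hypothetical.

```python
import re

# Group 1 is an optional '<'. If it matched, a closing '>' is required;
# otherwise the address must run to the end of the string.
email_re = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

def is_email(text: str) -> bool:
    """Return True if text is an address, bare or fully bracketed."""
    return email_re.match(text) is not None

for sample in ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]:
    print(sample, is_email(sample))
```

The two half-bracketed inputs are rejected because the condition ties the closing bracket to the presence of the opening one.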
If-Then Conditionals
|
Matches the ‘then’ pattern only if the condition is met. |
Condition syntax |
(?(name)then) - Condition based on whether named group ‘name’ matched. |
Example |
Matches a number, optionally enclosed in parentheses, but only if both parentheses are present. |
Use cases |
Validating paired elements, handling different formats, and ensuring data consistency. |
Real-world example |
Process log entries that may or may not include a timestamp, but require specific handling if the timestamp is present. |
Note |
Like If-Then-Else, If-Then conditionals have limited support across regex engines. |
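The parenthesized-number example can be sketched in Python with a named condition. The group name `open` and the helper `is_number` are assumptions made for illustration.

```python
import re

# 'open' optionally captures '('. The condition requires ')' only when
# 'open' matched; with no else branch, nothing extra is required otherwise.
number_re = re.compile(r"(?P<open>\()?\d+(?(open)\))")

def is_number(text: str) -> bool:
    """Return True for '42' or '(42)', False for unbalanced forms."""
    return number_re.fullmatch(text) is not None

for sample in ["42", "(42)", "(42", "42)"]:
    print(sample, is_number(sample))
```

`fullmatch` is used so that a trailing unmatched ')' is rejected instead of being left outside the match.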
Recursion
Recursive Patterns
|
Recurses the entire regular expression. Example:
Matches nested parentheses. |
|
Recurses the nth subpattern. |
Use cases |
Matching nested structures, parsing markup languages, and validating complex syntax. |
Note |
Recursion is powerful but can lead to performance issues or stack overflow errors with deeply nested structures. Not supported in all regex engines. |
Example |
Match nested HTML tags like |
Real-world example |
Parse nested JSON or XML structures, ensuring that every opening bracket or tag has a corresponding closing one. |
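Python's built-in `re` module does not support recursive patterns such as `(?R)`. The third-party `regex` package does, but it is not assumed to be installed here. As a swapped-in alternative, and not regex recursion itself, the nested-parentheses check can be sketched with an explicit depth counter. The function name `is_balanced` is hypothetical.

```python
def is_balanced(text: str) -> bool:
    """Check that every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    # Every opened parenthesis must have been closed.
    return depth == 0

for sample in ["(a(b)c)", "(()", ")("]:
    print(sample, is_balanced(sample))
```

A counter like this runs in linear time and avoids the stack-depth concerns noted above, at the cost of only validating the structure rather than extracting the nested spans.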