Regex & Text Manipulation Cheatsheet

Regex Fundamentals

Basic Metacharacters

`.` (Dot)	Matches any single character except newline. Example: `a.c` matches “abc”, “aac”, “adc”, etc.
`^` (Caret)	Matches the beginning of the string. Example: `^abc` matches “abc” only if it’s at the beginning.
`$` (Dollar)	Matches the end of the string. Example: `xyz$` matches “xyz” only if it’s at the end.
`*` (Asterisk)	Matches 0 or more occurrences of the preceding character or group. Example: `ab*c` matches “ac”, “abc”, “abbc”, “abbbc”, etc.
`+` (Plus)	Matches 1 or more occurrences of the preceding character or group. Example: `ab+c` matches “abc”, “abbc”, “abbbc”, etc., but not “ac”.
`?` (Question Mark)	Matches 0 or 1 occurrence of the preceding character or group. Example: `ab?c` matches “ac” or “abc”.
`[]` (Character Set)	Matches any single character within the set. Example: `[aeiou]` matches any vowel.
`[^]` (Negated Character Set)	Matches any single character not within the set. Example: `[^aeiou]` matches any character that is not a vowel.
`|` (Pipe)	Acts as an “OR” operator, matching either the expression before or after the pipe. Example: `cat|dog` matches either “cat” or “dog”.
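The rows above can be checked mechanically. The following is a minimal sketch, assuming Python's built-in `re` module; the `CASES` table and the `all_cases_hold` helper are illustrative names, not part of any library. `re.fullmatch` is used so that each pattern has to account for the entire sample string.

```python
import re

# (pattern, sample that should match in full, sample that should not)
CASES = [
    (r"a.c", "abc", "ac"),        # dot needs exactly one character
    (r"ab*c", "ac", "adc"),       # zero or more b
    (r"ab+c", "abbc", "ac"),      # one or more b
    (r"ab?c", "abc", "abbc"),     # zero or one b
    (r"[aeiou]", "e", "x"),       # character set
    (r"[^aeiou]", "x", "e"),      # negated character set
    (r"cat|dog", "dog", "cow"),   # alternation
]

def all_cases_hold(cases):
    """True if every pattern accepts its positive sample and rejects its negative one."""
    return all(
        re.fullmatch(pattern, good) is not None
        and re.fullmatch(pattern, bad) is None
        for pattern, good, bad in cases
    )

print(all_cases_hold(CASES))  # True

# Anchors are easier to see with search(), which scans the whole string.
print(re.search(r"^abc", "abcdef") is not None)  # True
print(re.search(r"^abc", "xabc") is not None)    # False
print(re.search(r"xyz$", "wxyz") is not None)    # True
```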

Quantifiers and Grouping

`{n}`	Matches exactly `n` occurrences. Example: `a{3}` matches “aaa”.
`{n,}`	Matches `n` or more occurrences. Example: `a{2,}` matches “aa”, “aaa”, “aaaa”, etc.
`{n,m}`	Matches between `n` and `m` occurrences. Example: `a{2,4}` matches “aa”, “aaa”, or “aaaa”.
`( )` (Grouping)	Groups patterns together, allowing you to apply quantifiers or other operations to the entire group. Example: `(ab)+` matches “ab”, “abab”, “ababab”, etc.
`\` (Escape)	Escapes special characters, allowing you to match them literally. Example: `\*` matches a literal asterisk.
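A short sketch of how counted quantifiers, grouping, and escaping behave, again assuming Python's `re` module. The `full` helper is a hypothetical convenience wrapper. Note how `(ab)+` repeats the whole group while `ab+` repeats only the `b`.

```python
import re

def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(full(r"a{3}", "aaa"))       # True
print(full(r"a{2,}", "a"))        # False: below the minimum of two
print(full(r"a{2,4}", "aaaaa"))   # False: five is above the upper bound
print(full(r"(ab)+", "ababab"))   # True: the group repeats as a unit
print(full(r"ab+", "abab"))       # False: + applies only to the b
print(full(r"1\*2", "1*2"))       # True: the escaped asterisk is literal

# re.escape builds the escaped form for you when the text comes from elsewhere.
escaped = re.escape("1*2")
print(full(escaped, "1*2"))       # True
```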

Character Classes

`\d`	Matches any digit (0-9). Example: `\d+` matches one or more digits.
`\w`	Matches any word character (letters, digits, and underscores). Example: `\w+` matches one or more word characters.
`\s`	Matches any whitespace character (space, tab, newline, etc.). Example: `\s+` matches one or more whitespace characters.
`\D`	Matches any non-digit character. Example: `\D+` matches one or more non-digit characters.
`\W`	Matches any non-word character. Example: `\W+` matches one or more non-word characters.
`\S`	Matches any non-whitespace character. Example: `\S+` matches one or more non-whitespace characters.
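To see what each class actually picks out, the sketch below runs them over one sample string (the string itself is made up for illustration). It assumes Python 3, where `\d`, `\w`, and `\s` are Unicode-aware on `str` patterns unless `re.ASCII` is passed; other engines may default to ASCII only.

```python
import re

text = "id_42 cost: 17 USD"

digits = re.findall(r"\d+", text)       # ['42', '17']
words = re.findall(r"\w+", text)        # ['id_42', 'cost', '17', 'USD']
chunks = re.findall(r"\S+", text)       # ['id_42', 'cost:', '17', 'USD']
non_digits = re.findall(r"\D+", text)   # ['id_', ' cost: ', ' USD']
non_words = re.findall(r"\W+", text)    # [' ', ': ', ' ']

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not in 0-9.
unicode_hit = re.fullmatch(r"\d", "\u0663") is not None
ascii_hit = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None

print(digits, words, chunks, non_digits, non_words)
print(unicode_hit, ascii_hit)  # True False
```

If the input must be restricted to 0-9, either pass `re.ASCII` or write `[0-9]` explicitly.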

Advanced Regex Concepts

Lookarounds (Zero-Width Assertions)

`(?=pattern)` (Positive Lookahead)	Asserts that the current position is followed by `pattern` without consuming it, so `pattern` is not part of the match. Example: `\w+(?=\s)` matches a word followed by whitespace, but the whitespace isn’t part of the match.
`(?!pattern)` (Negative Lookahead)	Asserts that the current position is not followed by `pattern`. Example: `\w+\b(?!\s)` matches a word not followed by whitespace. The `\b` matters: without it, `\w+(?!\s)` backtracks and matches part of a word (“fo” in “foo bar”).
`(?<=pattern)` (Positive Lookbehind)	Asserts that the current position is preceded by `pattern` without including it in the match. Requires a fixed-width pattern in some languages. Example: `(?<=\s)\w+` matches a word preceded by whitespace, but the whitespace isn’t part of the match.
`(?<!pattern)` (Negative Lookbehind)	Asserts that the current position is not preceded by `pattern`. Requires a fixed-width pattern in some languages. Example: `\b(?<!\s)\w+` matches a word not preceded by whitespace. Without the `\b`, `(?<!\s)\w+` also matches word fragments (“ar” in “foo bar”).
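Lookarounds interact with backtracking in ways that are easy to misjudge, so it helps to run them. This sketch assumes Python's `re` module and the sample string “foo bar”. It shows that the negative forms match fragments of a word unless a word boundary `\b` pins the assertion to the word's edge.

```python
import re

text = "foo bar"

followed = re.findall(r"\w+(?=\s)", text)          # word followed by whitespace
not_followed_naive = re.findall(r"\w+(?!\s)", text)
not_followed = re.findall(r"\w+\b(?!\s)", text)    # boundary pins the check

preceded = re.findall(r"(?<=\s)\w+", text)         # word preceded by whitespace
not_preceded_naive = re.findall(r"(?<!\s)\w+", text)
not_preceded = re.findall(r"\b(?<!\s)\w+", text)

print(followed)            # ['foo']
print(not_followed_naive)  # ['fo', 'bar']  (backtracks inside "foo")
print(not_followed)        # ['bar']
print(preceded)            # ['bar']
print(not_preceded_naive)  # ['foo', 'ar']  (starts matching inside "bar")
print(not_preceded)        # ['foo']
```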

Backreferences

`\1`, `\2`, etc.

Refers to the captured group with the corresponding number. Useful for matching repeated patterns.

Example: `(.)\1+` matches two or more consecutive identical characters.
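A brief sketch of backreferences in use, assuming Python's `re` module, where `\1` works both inside the pattern and inside the replacement string. The sample strings are invented for illustration.

```python
import re

word = "bookkeeper"

# Find every run of two or more identical characters.
runs = [m.group(0) for m in re.finditer(r"(.)\1+", word)]
print(runs)  # ['oo', 'kk', 'ee']

# Collapse each run to a single character; \1 in the replacement
# refers to the character captured by group 1.
collapsed = re.sub(r"(.)\1+", r"\1", word)
print(collapsed)  # bokeper

# Detect an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group(1))  # is
```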

Flags/Modifiers

`i` (Case-insensitive)	Makes the regex case-insensitive. Example: `/abc/i` matches “abc”, “Abc”, “ABC”, etc.
`g` (Global)	Finds all matches rather than stopping after the first. Example: `/abc/g` finds all occurrences of “abc” in a string. This flag exists in JavaScript- and Perl-style syntax; other languages usually expose the same behavior through a separate function.
`m` (Multiline)	Treats the string as multiple lines, allowing `^` and `$` to match the start and end of each line. Example: `/^abc$/m` matches any line that consists solely of “abc”.
`s` (Dotall)	Allows the `.` to match newline characters as well. Example: `/a.c/s` matches “a\nc”.
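The `/pattern/flags` notation above is JavaScript- and Perl-style. As a rough equivalent, the sketch below shows the same four behaviors in Python, assuming its `re` module: flags are passed as constants, and there is no global flag because `re.search` returns the first match while `re.findall` returns all of them.

```python
import re

text = "abc Abc ABC"

# i: case-insensitive. search() stops at the first hit, findall() returns all.
first_only = re.search(r"abc", text, flags=re.IGNORECASE).group(0)
all_hits = re.findall(r"abc", text, flags=re.IGNORECASE)
print(first_only)  # abc
print(all_hits)    # ['abc', 'Abc', 'ABC']

# m: ^ and $ match at every line boundary.
lines = "abc\nxabc\nabc"
with_m = re.findall(r"^abc$", lines, flags=re.MULTILINE)
without_m = re.findall(r"^abc$", lines)
print(with_m)      # ['abc', 'abc']
print(without_m)   # []

# s: dot also matches newline.
dot_default = re.search(r"a.c", "a\nc") is not None
dot_all = re.search(r"a.c", "a\nc", flags=re.DOTALL) is not None
print(dot_default, dot_all)  # False True
```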

Text Manipulation Techniques

String Splitting

Splitting by a delimiter

Use the split() method (or equivalent) to divide a string into an array based on a delimiter.

Example (Python):

text = "apple,banana,orange"
result = text.split(",") # Output: ['apple', 'banana', 'orange']

Splitting by Regex

Use regex for more complex splitting scenarios.

Example (JavaScript):

const text = "one two three  four";
const result = text.split(/\s+/); // Split on runs of one or more whitespace characters
// Output: ['one', 'two', 'three', 'four']
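For comparison, a sketch of regex splitting in Python, assuming `re.split` from the standard library. It also shows two details that are easy to miss: leading or trailing delimiters produce empty strings, and a capture group in the pattern keeps the delimiters in the result.

```python
import re

parts = re.split(r"\s+", "one two three  four")
print(parts)  # ['one', 'two', 'three', 'four']

# Leading/trailing whitespace yields empty strings at the edges...
padded = re.split(r"\s+", "  padded  ")
print(padded)  # ['', 'padded', '']

# ...whereas str.split() with no argument drops them.
print("  padded  ".split())  # ['padded']

# Several delimiters, each with optional surrounding whitespace.
fields = re.split(r"\s*[,;]\s*", "a, b;c ,d")
print(fields)  # ['a', 'b', 'c', 'd']

# A capture group keeps the delimiters in the output.
tokens = re.split(r"([+-])", "1+2-3")
print(tokens)  # ['1', '+', '2', '-', '3']
```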

String Replacement

Basic Replacement

Replace a substring with another string.

Example (Java):

String text = "Hello World";
String result = text.replace("World", "Java"); // Output: Hello Java

Regex Replacement

Use regex for more powerful replacement operations.

Example (C#):

using System.Text.RegularExpressions;

string text = "123-456-7890";
string result = Regex.Replace(text, "\\d", "X"); // Replaces each digit; output: XXX-XXX-XXXX
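Regex replacement becomes more useful once the replacement can refer to what was matched. The sketch below uses Python's `re.sub` to show three variants: a fixed replacement, a replacement built from capture groups, and a replacement computed by a function. The sample strings are illustrative.

```python
import re

phone = "123-456-7890"

# Fixed replacement: mask every digit, leave the hyphens alone.
masked = re.sub(r"\d", "X", phone)
print(masked)  # XXX-XXX-XXXX

# Group references: rearrange the captured parts.
reformatted = re.sub(r"(\d{3})-(\d{3})-(\d{4})", r"(\1) \2-\3", phone)
print(reformatted)  # (123) 456-7890

# Lookahead: mask a digit only if at least four more digits follow it.
partial = re.sub(r"\d(?=(?:\D*\d){4})", "X", phone)
print(partial)  # XXX-XXX-7890

# Function replacement: compute each replacement from the match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```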

Substring Extraction

Using indices

Extract a portion of a string using start and end indices.

Example (C++):

#include <iostream>
#include <string>

int main() {
  std::string text = "Hello World";
  std::string result = text.substr(6, 5); // Start at index 6, length 5
  std::cout << result << std::endl; // Output: World
  return 0;
}

Regex-based extraction

Use regex groups to extract specific parts of a string.

Example (Ruby):

text = "My phone number is 123-456-7890"
match = text.match(/(\d{3}-\d{3}-\d{4})/) # Group 1 captures the phone number
if match
  puts match[1] # Output: 123-456-7890
end
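A sketch of the same extraction in Python, assuming the `re` module. The group names `area`, `prefix`, and `line` are hypothetical labels chosen for this example. The last part illustrates why a leading greedy `.*` in front of a capture group is risky: it consumes as much as it can and leaves the group only the minimum.

```python
import re

text = "My phone number is 123-456-7890"

# Named groups make the extracted parts self-describing.
m = re.search(r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})", text)
if m:
    print(m.group(0))       # 123-456-7890
    print(m.group("area"))  # 123
    print(m.groupdict())

# findall returns every occurrence, not just the first.
numbers = re.findall(r"\d{3}-\d{3}-\d{4}", "Call 123-456-7890 or 555-867-5309")
print(numbers)  # ['123-456-7890', '555-867-5309']

# Greedy .* leaves the group a single digit; the lazy form does not.
greedy = re.search(r".*(\d+)", "order 1234").group(1)
lazy = re.search(r".*?(\d+)", "order 1234").group(1)
print(greedy, lazy)  # 4 1234
```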

Regex & Text Manipulation in Algorithms

Palindrome Check

Use regex to preprocess the string by removing non-alphanumeric characters and converting to lowercase. Then, compare the string to its reverse.

Example (Python):

import re

def is_palindrome(s):
  processed_string = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
  return processed_string == processed_string[::-1]

print(is_palindrome("A man, a plan, a canal: Panama")) # Output: True

Validating User Input

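Regex is well suited to checking that input has the expected shape, such as email addresses, phone numbers, or passwords. A pattern confirms format only; it cannot confirm that an address or number actually exists.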

Example (JavaScript):

function isValidEmail(email) {
  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailRegex.test(email);
}

console.log(isValidEmail("user@example.com")); // Output: true
console.log(isValidEmail("invalid-email")); // Output: false
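A comparable sketch in Python covering the three formats mentioned above. The function names and the password policy (at least eight characters with a lowercase letter, an uppercase letter, and a digit) are assumptions made for illustration, not a recommended standard. `fullmatch` is used instead of `^...$` because in Python `$` also matches just before a trailing newline.

```python
import re

EMAIL_RE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")
# Hypothetical policy: each lookahead checks one requirement independently.
PASSWORD_RE = re.compile(r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}")

def is_valid_email(value):
    return EMAIL_RE.fullmatch(value) is not None

def is_valid_phone(value):
    return PHONE_RE.fullmatch(value) is not None

def is_valid_password(value):
    return PASSWORD_RE.fullmatch(value) is not None

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("invalid-email"))     # False
print(is_valid_phone("123-456-7890"))      # True
print(is_valid_password("Passw0rdX"))      # True
print(is_valid_password("password"))       # False

# Why fullmatch: "$" tolerates a trailing newline, fullmatch does not.
anchored = re.match(r"^\d{3}-\d{3}-\d{4}$", "123-456-7890\n") is not None
print(anchored)                            # True
print(is_valid_phone("123-456-7890\n"))    # False
```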

Parsing Log Files

Regex can be used to extract relevant information from log files.

Example (Python):

import re

log_line = "2023-10-26 10:00:00 INFO: User logged in"
match = re.search(r'INFO: (.*)$', log_line)
if match:
  print(match.group(1))  # Output: User logged in
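The example above pulls out only the message. A slightly fuller sketch, assuming every line follows the same `date time LEVEL: message` layout as the sample, extracts all three fields at once. The names `LOG_RE` and `parse_line` are hypothetical, and real log formats vary, so the pattern would need adjusting.

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+): "
    r"(?P<msg>.*)"
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    m = LOG_RE.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "level": m.group("level"),
        "message": m.group("msg"),
    }

record = parse_line("2023-10-26 10:00:00 INFO: User logged in")
print(record["level"], "|", record["message"])  # INFO | User logged in
print(parse_line("not a log line"))             # None
```

Returning `None` for lines that do not fit lets the caller count or skip malformed entries instead of crashing partway through a file.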

String Compression/Decompression

Text manipulation techniques can be used in string compression and decompression algorithms, such as Run-Length Encoding (RLE).

Example (Python):


def compress_string(s):
    compressed = ''
    count = 1
    for i in range(len(s)):
        if i + 1 < len(s) and s[i] == s[i + 1]:
            count += 1
        else:
            compressed += s[i] + str(count)
            count = 1
    return compressed

print(compress_string("AAABCCDAA")) # Output: A3B1C2D1A2
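The decompression direction can be sketched with a regex that reads character/count pairs. This assumes the input was produced by the scheme above and that the original text contained no digits; otherwise the encoding is ambiguous (for example, “111” compresses to “13”). The function names `decompress_string` and `compress_with_regex` are illustrative. The second function shows that compression can also be expressed with a backreference.

```python
import re

def decompress_string(s):
    """Expand pairs such as 'A3' into 'AAA'. Assumes no digits in the original text."""
    return "".join(ch * int(count) for ch, count in re.findall(r"(\D)(\d+)", s))

def compress_with_regex(s):
    """Run-length encode using a backreference to find each run.

    Note: '.' does not match newlines by default, so this sketch
    assumes single-line input.
    """
    return re.sub(r"(.)\1*", lambda m: m.group(1) + str(len(m.group(0))), s)

print(decompress_string("A3B1C2D1A2"))   # AAABCCDAA
print(compress_with_regex("AAABCCDAA"))  # A3B1C2D1A2
```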
