The Regex Survival Guide · 12 Patterns You Actually Need

01The minimal cheatsheet

Six metacharacters cover most of what you write:

. — any character except newline
* — 0 or more of the previous
+ — 1 or more
? — 0 or 1 (also: make a quantifier non-greedy)
^ — start of string (or line in multiline mode)
$ — end of string

Plus four character classes:

\d — digit, equivalent to [0-9]
\w — word character, equivalent to [a-zA-Z0-9_]
\s — whitespace
[abc] — any one of a, b, or c. [^abc] — none of them.

That's 90% of regex.
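To get a feel for the cheatsheet, it helps to run each item against a tiny input. The following is only an illustrative sketch; the sample strings are made up for the demonstration.

```javascript
// Each entry pairs one cheatsheet item with a sample string (samples are illustrative).
const checks = [
  /c.t/.test('cat'),          // . matches any single character
  /ab*c/.test('ac'),          // * allows zero b's
  /ab+c/.test('ac'),          // + requires at least one b, so this fails
  /colou?r/.test('color'),    // ? makes the u optional
  /^\d+$/.test('2026'),       // ^ and $ anchor the whole string
  /^\w+$/.test('snake_case'), // \w includes the underscore
  /^[^abc]+$/.test('xyz'),    // [^abc] excludes a, b, and c
];

console.log(checks); // [true, true, false, true, true, true, true]
```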

02Email validation (the practical one)

✓ practical email regex

/^[^\s@]+@[^\s@]+\.[^\s@]+$/

Reads as: "one or more non-space-non-@ characters, then @, then more non-space-non-@, then a dot, then more non-space-non-@."

Why not the RFC-compliant 500-character regex? Because it accepts addresses no real-world email provider accepts, and rejects nothing users actually try to send. For form validation, the simple version is correct. The truth is: you can't reliably validate an email with regex — you have to send a confirmation email.
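As a rough illustration of how the pattern behaves, here is a minimal sketch. The helper name looksLikeEmail and the sample addresses are assumptions made up for this example. It only checks the shape of the input, so an address that passes still needs the confirmation email.

```javascript
// The pattern is the one from this section; the helper name is hypothetical.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function looksLikeEmail(input) {
  // Trim first so stray spaces around a pasted address are not counted against it.
  return EMAIL_RE.test(input.trim());
}

console.log(looksLikeEmail('ada@example.com'));   // true
console.log(looksLikeEmail('ada@example'));       // false: no dot after the @
console.log(looksLikeEmail('ada lovelace@x.io')); // false: contains a space
console.log(looksLikeEmail('a@b@c.com'));         // false: second @
```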

03URL matching

✓ extract URLs from text

/https?:\/\/[^\s<>"]+/g

Matches http:// or https:// followed by anything that isn't whitespace, angle bracket, or quote. The ? after https makes the s optional. The g flag finds all matches, not just the first.

For full URL validation (not extraction), use new URL(str) in JavaScript and catch the exception. The URL constructor is more accurate than any regex.
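The two jobs can be sketched side by side: the regex for pulling candidates out of text, and the URL constructor for checking a single string. The sample text and the helper name isValidUrl are assumptions for illustration.

```javascript
// Extraction: find every http(s) URL in a block of text (sample text is made up).
const text = 'Docs at https://example.com/docs and http://example.org/a?b=1 today.';
const urls = text.match(/https?:\/\/[^\s<>"]+/g) ?? [];
console.log(urls); // ['https://example.com/docs', 'http://example.org/a?b=1']

// Validation: let the URL constructor decide (hypothetical helper name).
function isValidUrl(str) {
  try {
    new URL(str); // throws a TypeError when str cannot be parsed as a URL
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUrl('https://example.com/docs')); // true
console.log(isValidUrl('not a url'));                // false
```

One thing to watch with the extraction pattern: a period or comma that directly follows a URL in running text is not in the excluded set, so it ends up inside the match.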

04Phone numbers — accept any format

✓ flexible phone matching

/^\+?[\d\s\-\(\)]{7,}$/

Reads: "optional plus, then 7+ characters of digits, spaces, hyphens, or parens." Don't try to enforce country-specific formats. Strip non-digits and validate the digit count separately:

✓ better approach — strip then validate

const digits = phone.replace(/\D/g, '');
const valid = digits.length >= 7 && digits.length <= 15;

05Dates — ISO format

✓ ISO date

/^\d{4}-\d{2}-\d{2}$/

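Reads: "4 digits, hyphen, 2 digits, hyphen, 2 digits." This matches format, not validity: "2026-13-45" passes. After the regex match, parse with new Date(str) and check !isNaN(date). That rejects out-of-range values like month 13, but some engines roll an impossible day such as "2026-02-30" over into March instead of rejecting it, so for strict validation also confirm that the parsed year, month, and day equal the ones you matched.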
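One way to check that a matched string is a real calendar date is to build the date from its parts and compare the parts back. This is a minimal sketch, not the only approach; the helper name isRealIsoDate is an assumption.

```javascript
// Same pattern as above, with capture groups added so the parts can be read back.
const ISO_DATE_RE = /^(\d{4})-(\d{2})-(\d{2})$/;

// Hypothetical helper: true only for dates that exist on the calendar.
function isRealIsoDate(str) {
  const m = ISO_DATE_RE.exec(str);
  if (!m) return false;
  const [year, month, day] = m.slice(1).map(Number);
  // Date.UTC rolls impossible values over (Feb 30 becomes Mar 2),
  // so a date is real only if the parts survive the round trip.
  // Note: Date.UTC maps years 0-99 to 1900-1999, so those years are rejected here.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isRealIsoDate('2026-05-25')); // true
console.log(isRealIsoDate('2026-13-45')); // false
console.log(isRealIsoDate('2026-02-30')); // false
console.log(isRealIsoDate('2024-02-29')); // true (leap year)
```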

06Strip extra whitespace

✓ normalize whitespace

// Collapse runs of whitespace to single spaces, trim ends
const normalized = str.replace(/\s+/g, ' ').trim();

\s+ matches one or more whitespace characters (space, tab, newline). Replace with single space, trim ends. This is the canonical "clean up user input" pattern.

07Capture groups — extract pieces

Parentheses create capture groups. You access them by index (1-based) in most languages, or by name with (?<name>...).

✓ extract parts of a string

// Match "2026-05-25" and extract year, month, day
const match = '2026-05-25'.match(/^(\d{4})-(\d{2})-(\d{2})$/);
// match[1] = '2026', match[2] = '05', match[3] = '25'

// Named groups (clearer). A different variable name avoids redeclaring the const above:
const named = '2026-05-25'.match(/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/);
// named.groups.year, named.groups.month, named.groups.day

Named groups are worth the verbosity. match[3] is a magic number; named.groups.day documents itself.
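Named groups are also available in replacement strings, which is handy for reordering the pieces you captured. A small sketch, with variable names chosen for illustration:

```javascript
const iso = '2026-05-25';
const ISO_PARTS = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// $<name> in the replacement string refers to the named group.
const dayFirst = iso.replace(ISO_PARTS, '$<day>/$<month>/$<year>');
console.log(dayFirst); // '25/05/2026'

// The groups object can also be destructured directly.
const { year, month, day } = iso.match(ISO_PARTS).groups;
console.log(year, month, day); // 2026 05 25
```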

08Non-greedy matching

Quantifiers like + and * are greedy by default — they match as much as possible. This causes a classic bug:

✗ greedy match captures too much

// Goal: extract content inside <a> tags
'<a>link 1</a> and <a>link 2</a>'.match(/<a>(.+)<\/a>/);
// matches: '<a>link 1</a> and <a>link 2</a>'
// captures: 'link 1</a> and <a>link 2'  ← WRONG

✓ non-greedy with ?

'<a>link 1</a> and <a>link 2</a>'.matchAll(/<a>(.+?)<\/a>/g);
// 1st match captures: 'link 1'
// 2nd match captures: 'link 2'

Adding ? after a quantifier makes it non-greedy — match as little as possible. Default to non-greedy when you're extracting things from larger text.
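Since matchAll returns an iterator rather than an array, a common way to collect just the captured text is Array.from with a mapping function. The sketch below compares greedy and non-greedy results, and adds a negated character class as an alternative technique that is not from the text above.

```javascript
const html = '<a>link 1</a> and <a>link 2</a>';

// Greedy: .+ runs to the last </a> in the string.
const greedy = html.match(/<a>(.+)<\/a>/)[1];
console.log(greedy); // 'link 1</a> and <a>link 2'

// Non-greedy: .+? stops at the first </a> it can.
const lazy = Array.from(html.matchAll(/<a>(.+?)<\/a>/g), (m) => m[1]);
console.log(lazy); // ['link 1', 'link 2']

// Alternative (an addition, not from the text above): forbid '<' inside the capture.
const negated = Array.from(html.matchAll(/<a>([^<]+)<\/a>/g), (m) => m[1]);
console.log(negated); // ['link 1', 'link 2']
```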

Bigger warning: don't parse HTML or XML with regex. There are entire Stack Overflow threads dedicated to why this is a bad idea. Use a proper parser (DOMParser in browsers, cheerio in Node). Regex is fine for "find the email address in this paragraph," not for "extract all the form fields from this HTML page."

09Word boundaries

\b matches a word boundary — the transition between a word character and a non-word character. Useful for matching whole words:

✓ match whole word only

// 'cat' matches 'cat', not 'cats' or 'concatenate'
/\bcat\b/

// Useful for find-and-replace
text.replace(/\boldname\b/g, 'newname');
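Because \b is defined in terms of word characters, digits and underscores count as part of the word while hyphens do not. A short sketch with made-up sample strings shows where the boundaries fall:

```javascript
const CAT = /\bcat\b/;

console.log(CAT.test('the cat sat')); // true
console.log(CAT.test('cats'));        // false: 's' is a word character
console.log(CAT.test('concatenate')); // false: letters on both sides
console.log(CAT.test('cat-like'));    // true: '-' is not a word character
console.log(CAT.test('cat_food'));    // false: '_' is a word character

// Only the standalone identifier is renamed.
const renamed = 'oldname, oldname2, my_oldname'.replace(/\boldname\b/g, 'newname');
console.log(renamed); // 'newname, oldname2, my_oldname'
```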

10Alternation — OR matching

✓ match one of several

// HTTP status codes 4xx or 5xx
/\b(4|5)\d{2}\b/

// Common file extensions
/\.(jpg|jpeg|png|gif|webp)$/i

// Yes/no variations
/^(y|yes|yeah)$/i

The i flag at the end makes matching case-insensitive. (a|b|c) matches a OR b OR c. Wrap in non-capturing group (?:a|b|c) if you don't need to capture the matched value.
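The difference between the two group types shows up in the match array, and the parentheses also control how far the alternation reaches. A minimal sketch, with the file name and test words chosen for illustration:

```javascript
// Capturing group: the matched extension is available at index 1.
const capturing = 'photo.JPG'.match(/\.(jpg|jpeg|png|gif|webp)$/i);
console.log(capturing[0], capturing[1]); // '.JPG' 'JPG'

// Non-capturing group: same match, but no extra slot in the result.
const grouped = 'photo.JPG'.match(/\.(?:jpg|jpeg|png|gif|webp)$/i);
console.log(grouped.length); // 1

// Without a group, | splits the whole pattern: this means "^y" OR "yes$".
const ungrouped = /^y|yes$/i.test('yellow');
const anchored = /^(?:y|yes|yeah)$/i.test('yellow');
console.log(ungrouped, anchored); // true false
```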

11Hex colors

✓ validate hex color

// Match #rgb or #rrggbb (with or without alpha)
/^#([0-9a-f]{3}|[0-9a-f]{4}|[0-9a-f]{6}|[0-9a-f]{8})$/i

Matches 3, 4, 6, or 8 hex characters after a #. The | alternations allow each valid length. Case-insensitive via i.
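A quick check against a handful of inputs confirms which lengths pass. The sample values are made up for this sketch.

```javascript
const HEX_COLOR_RE = /^#([0-9a-f]{3}|[0-9a-f]{4}|[0-9a-f]{6}|[0-9a-f]{8})$/i;

// Illustrative samples: four valid forms, then three invalid ones.
const samples = ['#fff', '#FFFA', '#1a2b3c', '#1a2b3c80', '#12345', 'fff', '#ggg'];
const results = samples.map((s) => HEX_COLOR_RE.test(s));

console.log(results); // [true, true, true, true, false, false, false]
```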

12Log parsing — extract structured fields

✓ Apache/nginx log line

// 127.0.0.1 - - [25/May/2026:12:34:56 +0000] "GET /path HTTP/1.1" 200 1234
/^(?<ip>\S+) \S+ \S+ \[(?<date>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]+" (?<status>\d+) (?<size>\d+)/

Named groups make log-parsing regexes readable. \S+ means "one or more non-whitespace characters," so each \S+ consumes one space-delimited field while the literal spaces between them act as the separators. [^\]]+ means "one or more characters that aren't a closing bracket," which captures the date string inside brackets without including the brackets themselves.
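Applied to the sample line from the comment, the pattern yields a plain object of fields. This is a sketch under the assumption that every line follows that exact layout; the helper name parseLogLine is hypothetical.

```javascript
const LOG_RE = /^(?<ip>\S+) \S+ \S+ \[(?<date>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]+" (?<status>\d+) (?<size>\d+)/;

// Hypothetical helper: returns the parsed fields, or null when the line does not match.
function parseLogLine(line) {
  const m = LOG_RE.exec(line);
  if (!m) return null;
  const { ip, date, method, path, status, size } = m.groups;
  // Captures are always strings, so convert the numeric fields explicitly.
  return { ip, date, method, path, status: Number(status), size: Number(size) };
}

const entry = parseLogLine('127.0.0.1 - - [25/May/2026:12:34:56 +0000] "GET /path HTTP/1.1" 200 1234');
console.log(entry.ip, entry.method, entry.path, entry.status, entry.size);
// 127.0.0.1 GET /path 200 1234

// A line whose size field is "-" instead of a number does not match this pattern.
const noSize = parseLogLine('127.0.0.1 - - [25/May/2026:12:34:56 +0000] "GET /path HTTP/1.1" 304 -');
console.log(noSize); // null
```

Returning null for non-matching lines keeps the caller in charge of deciding whether to skip, count, or report them.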

13Catastrophic backtracking — the trap

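Some regex patterns take exponential time on certain inputs, which can effectively hang your process. Example: (a+)+b on the input "aaaaaaaaaaX". The b can never match, so a backtracking engine tries every possible way of splitting the a's between the inner and outer +, and the work roughly doubles with each additional a.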

Avoid nested quantifiers like (a+)+ or (a*)*. If you find yourself writing one, restructure. The standard regex libraries in Rust and Go use linear-time engines that don't have this problem; the built-in engines in JavaScript, Python, and Java are backtracking engines and do.

Test regex with adversarial input. If you are validating user-input fields, consider running the regex with a timeout, or use a linear-time regex library like RE2.
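Restructuring usually means collapsing the nested quantifier into a single one that accepts the same strings. The sketch below checks, on deliberately short inputs, that a flattened pattern agrees with the nested one; the input list is made up for the demonstration.

```javascript
const nested = /(a+)+b/; // nested quantifier: exponential on long near-misses
const flat = /a+b/;      // one quantifier, accepts the same strings

// Short, illustrative inputs only, so the nested pattern stays cheap to run here.
const inputs = ['ab', 'aaab', 'b', 'aaaX', 'xaab', ''];
const agree = inputs.every((s) => nested.test(s) === flat.test(s));

console.log(agree);                 // true
console.log(flat.test('aaaaaaaaaaX')); // false, with no repeated re-splitting of the a's
```

The flattened pattern drops the capture group, so this rewrite only applies when the captured value was not needed.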

∞The discipline

Regex is a power tool. It's also famously unreadable — the joke is that nobody can read their own regex three months later. Two habits make this manageable:

Add comments with the x flag (where supported), or write a comment block above the regex explaining what it matches in plain English.
Test with regex101.com before shipping. The site shows you a step-by-step match breakdown and warns about catastrophic backtracking risk.
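JavaScript's RegExp has no x flag, so one workaround is to assemble the pattern from commented pieces. This is just one possible sketch; the constant name ISO_DATE is an assumption.

```javascript
// Build the pattern from parts so each piece can carry its own comment.
// Backslashes are doubled because these are string literals, not regex literals.
const ISO_DATE = new RegExp(
  [
    '^',
    '(?<year>\\d{4})',  // four-digit year
    '-',
    '(?<month>\\d{2})', // two-digit month
    '-',
    '(?<day>\\d{2})',   // two-digit day
    '$',
  ].join('')
);

console.log(ISO_DATE.test('2026-05-25'));            // true
console.log(ISO_DATE.exec('2026-05-25').groups.day); // '25'
```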

Regex is best used in moderation. For complex parsing, a real parser is usually the better choice. But for the 12 patterns above, regex is the right tool, and now you have them.