The regex survival guide — 12 patterns you actually need.
Regex tutorials teach you theory: lookbehinds, atomic groups, possessive quantifiers. In reality you'll use 5% of that 95% of the time. This is the practical reference — the patterns that solve real problems, with explanations of what each piece does, and the common edge cases that trip people up.
01 The minimal cheatsheet
Six metacharacters cover most of what you write:
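- .  matches any character except a newline
- *  matches 0 or more of the previous item
- +  matches 1 or more of the previous item
- ?  matches 0 or 1 of the previous item (also: makes a quantifier non-greedy)
- ^  matches the start of the string (or of a line in multiline mode)
- $  matches the end of the string (or of a line in multiline mode)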
Plus four character classes:
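- \d  matches a digit, equivalent to [0-9] in JavaScript (some other engines also match non-ASCII Unicode digits)
- \w  matches a word character, equivalent to [a-zA-Z0-9_]
- \s  matches whitespace
- [abc]  matches any one of a, b, or c; [^abc] matches any one character that is none of them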
That's 90% of regex.
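A quick way to internalize the cheatsheet is to run each piece against a tiny input. The following is a minimal sketch in JavaScript; the checks array and its sample strings are made up for illustration.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const checks = [
  [/^a.c$/, 'abc', true],       // . matches any single character
  [/^a.c$/, 'a\nc', false],     // ...except a newline
  [/^ab*c$/, 'ac', true],       // * allows zero b's
  [/^ab+c$/, 'ac', false],      // + needs at least one b
  [/^colou?r$/, 'color', true], // ? makes the u optional
  [/^\d\w\s$/, '7_ ', true],    // digit, word character, whitespace
  [/^[^abc]$/, 'd', true],      // negated class: anything but a, b, or c
];

// Run every pattern against its input and collect the booleans.
const results = checks.map(([pattern, input]) => pattern.test(input));
console.log(results);
```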
02 Email validation (the practical one)
/^[^\s@]+@[^\s@]+\.[^\s@]+$/
Reads as: "one or more non-space-non-@ characters, then @, then more non-space-non-@, then a dot, then more non-space-non-@."
Why not the RFC-compliant 500-character regex? Because it accepts addresses that no real-world email provider accepts, and it rejects almost nothing that users actually type. For form validation, the simple version is good enough. The truth is that you can't reliably validate an email address with a regex; you have to send a confirmation email.
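As a rough sketch, the pattern can be wrapped in a small form-validation helper. The name looksLikeEmail and the decision to trim the input first are assumptions for illustration, not part of the pattern itself.

```javascript
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: true if the input has the rough shape of an email.
// It says nothing about whether the mailbox exists.
function looksLikeEmail(input) {
  return EMAIL_SHAPE.test(input.trim());
}

console.log(looksLikeEmail('user@example.com')); // true
console.log(looksLikeEmail('user@example'));     // false: no dot after the @
console.log(looksLikeEmail('a b@example.com'));  // false: contains a space
console.log(looksLikeEmail('a@b@c.com'));        // false: two @ signs
console.log(looksLikeEmail('a@b..c'));           // true: shape only, not deliverability
```

The last case is the reason the confirmation email is still needed: the shape check passes input that no mail server would accept.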
03 URL matching
/https?:\/\/[^\s<>"]+/g
Matches http:// or https:// followed by anything that isn't whitespace, angle bracket, or quote. The ? after https makes the s optional. The g flag finds all matches, not just the first.
For full URL validation (not extraction), use new URL(str) in JavaScript and catch the exception. The URL constructor is more accurate than any regex.
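The split between extraction and validation might look like the sketch below. The helper names extractUrls and isHttpUrl are hypothetical, and restricting validation to http and https is an added assumption, since the URL constructor also accepts other schemes such as mailto.

```javascript
const URL_PATTERN = /https?:\/\/[^\s<>"]+/g;

// Extraction: pull every URL-looking substring out of free text.
function extractUrls(text) {
  return text.match(URL_PATTERN) ?? [];
}

// Validation: let the URL constructor parse, and treat a throw as "invalid".
function isHttpUrl(str) {
  try {
    const url = new URL(str);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(extractUrls('See https://example.com/a?b=1 and http://test.org.'));
// [ 'https://example.com/a?b=1', 'http://test.org.' ]
console.log(isHttpUrl('https://example.com/a?b=1')); // true
console.log(isHttpUrl('example.com'));               // false: no scheme
```

Note the edge case in the extraction output: the sentence-ending period is captured as part of the second URL, because a dot is not one of the excluded characters. Strip trailing punctuation afterwards if the text is prose.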
04 Phone numbers: accept any format
/^\+?[\d\s\-\(\)]{7,}$/
Reads: "optional plus, then 7+ characters of digits, spaces, hyphens, or parens." Don't try to enforce country-specific formats. Strip non-digits and validate the digit count separately:
const digits = phone.replace(/\D/g, '');
const valid = digits.length >= 7 && digits.length <= 15;
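Putting the shape check and the digit count together, a minimal sketch could look like this. The function name isPlausiblePhone is hypothetical, and trimming the input first is an assumption.

```javascript
const PHONE_SHAPE = /^\+?[\d\s\-\(\)]{7,}$/;

// Hypothetical helper: shape check first, then count the digits separately.
function isPlausiblePhone(input) {
  const phone = input.trim();
  if (!PHONE_SHAPE.test(phone)) return false;
  const digits = phone.replace(/\D/g, '');
  return digits.length >= 7 && digits.length <= 15;
}

console.log(isPlausiblePhone('+1 (555) 123-4567')); // true
console.log(isPlausiblePhone('12345'));             // false: too short
console.log(isPlausiblePhone('-------'));           // false: right shape, zero digits
console.log(isPlausiblePhone('555-CALL'));          // false: letters are not allowed
```

The third case shows why the digit count matters: seven hyphens satisfy the regex on their own.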
05 Dates: ISO format
/^\d{4}-\d{2}-\d{2}$/
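Reads: "4 digits, hyphen, 2 digits, hyphen, 2 digits." This matches format, not validity: "2026-13-45" passes. After the regex matches, parse with new Date(str) and check !isNaN(date) to reject out-of-range values like that one. That check is not enough on its own, because some JavaScript engines roll an impossible day such as "2026-02-30" over into March instead of returning an invalid date. To be sure, also confirm that the parsed year, month, and day equal the ones you matched.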
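One way to validate the calendar date without depending on how a particular engine parses strings is a round-trip check: build the date from the matched numbers and confirm nothing rolled over. This is a sketch, and isValidIsoDate is a hypothetical name.

```javascript
// Hypothetical helper: format check by regex, validity check by round trip.
function isValidIsoDate(str) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(str);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Date.UTC silently rolls over out-of-range values (month 13, day 30 in
  // February), so a valid date is one that survives the round trip unchanged.
  // Note: Date.UTC maps years 0-99 to 1900-1999, so those years are rejected.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isValidIsoDate('2026-05-25')); // true
console.log(isValidIsoDate('2026-13-45')); // false
console.log(isValidIsoDate('2026-02-30')); // false
console.log(isValidIsoDate('2024-02-29')); // true: leap year
```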
06 Strip extra whitespace
// Collapse runs of whitespace to single spaces, trim ends
const normalized = str.replace(/\s+/g, ' ').trim();
\s+ matches one or more whitespace characters (space, tab, newline). Replace with single space, trim ends. This is the canonical "clean up user input" pattern.
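A small worked example makes the behavior concrete. The second pattern is a variant that is not from the text above: it is an assumption about what you might want when line breaks in the input are meaningful.

```javascript
const messy = '  hello \t\n  world  ';

// The canonical cleanup: every run of whitespace becomes one space.
const normalized = messy.replace(/\s+/g, ' ').trim();
console.log(JSON.stringify(normalized)); // "hello world"

// Variant: collapse spaces and tabs but keep line breaks, by matching
// "whitespace that is not \r or \n" via a negated class around \S.
const keepLines = 'a  \t b\n\n  c'.replace(/[^\S\r\n]+/g, ' ');
console.log(JSON.stringify(keepLines)); // "a b\n\n c"
```

The edge case to remember is that \s includes newlines, so the canonical pattern flattens multi-line input into a single line.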
07 Capture groups: extract pieces
Parentheses create capture groups. You access them by index (1-based) in most languages, or by name with (?<name>...).
// Match "2026-05-25" and extract year, month, day
const match = '2026-05-25'.match(/^(\d{4})-(\d{2})-(\d{2})$/);
// match[1] = '2026', match[2] = '05', match[3] = '25'
// Named groups (clearer):
const named = '2026-05-25'.match(/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/);
// named.groups.year = '2026', named.groups.month = '05', named.groups.day = '25'
Named groups are worth the verbosity. match[3] is a magic number; named.groups.day documents itself.
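Named groups also work in destructuring and in replacement strings, where $<name> refers back to a group. The sketch below uses an illustrative constant name, ISO_DATE, and shows the null case that trips people up when the input does not match.

```javascript
const ISO_DATE = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Destructure the named groups directly.
const { year, month, day } = '2026-05-25'.match(ISO_DATE).groups;
console.log(year, month, day); // 2026 05 25

// Reorder the pieces with $<name> references in the replacement string.
const european = '2026-05-25'.replace(ISO_DATE, '$<day>/$<month>/$<year>');
console.log(european); // 25/05/2026

// match() returns null when nothing matches, so guard before reading groups.
const miss = 'not a date'.match(ISO_DATE);
console.log(miss?.groups); // undefined
```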
08 Non-greedy matching
Quantifiers like + and * are greedy by default — they match as much as possible. This causes a classic bug:
// Goal: extract content inside <a> tags
'<a>link 1</a> and <a>link 2</a>'.match(/<a>(.+)<\/a>/);
// matches: '<a>link 1</a> and <a>link 2</a>'
// captures: 'link 1</a> and <a>link 2' ← WRONG
'<a>link 1</a> and <a>link 2</a>'.matchAll(/<a>(.+?)<\/a>/g);
// 1st match captures: 'link 1'
// 2nd match captures: 'link 2'
Adding ? after a quantifier makes it non-greedy — match as little as possible. Default to non-greedy when you're extracting things from larger text.
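Since matchAll returns an iterator rather than an array, a runnable version of the comparison needs one extra step. The sketch below also includes a negated character class, which is an alternative technique not mentioned above: it gets the same result here by forbidding the delimiter outright.

```javascript
const html = '<a>link 1</a> and <a>link 2</a>';

// Greedy: .+ runs to the last </a> in the string.
const greedy = html.match(/<a>(.+)<\/a>/)[1];
console.log(greedy); // link 1</a> and <a>link 2

// Non-greedy: spread the iterator, then take capture group 1 of each match.
const lazy = [...html.matchAll(/<a>(.+?)<\/a>/g)].map((m) => m[1]);
console.log(lazy); // [ 'link 1', 'link 2' ]

// Alternative: a negated class that can never cross a '<'.
const negated = [...html.matchAll(/<a>([^<]+)<\/a>/g)].map((m) => m[1]);
console.log(negated); // [ 'link 1', 'link 2' ]
```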
09 Word boundaries
\b matches a word boundary — the transition between a word character and a non-word character. Useful for matching whole words:
// 'cat' matches 'cat', not 'cats' or 'concatenate'
/\bcat\b/
// Useful for find-and-replace
text.replace(/\boldname\b/g, 'newname');
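When the word to replace comes from a variable, it has to be escaped before it goes into the pattern. The sketch below assumes two hypothetical helpers, escapeRegExp and replaceWholeWord; neither is a built-in.

```javascript
// Escape characters that have special meaning inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace whole-word occurrences of `word`.
function replaceWholeWord(text, word, replacement) {
  const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, 'g');
  // A function replacer keeps any $ in `replacement` literal.
  return text.replace(pattern, () => replacement);
}

console.log(replaceWholeWord('oldname and oldname2 and my_oldname', 'oldname', 'newname'));
// newname and oldname2 and my_oldname

console.log(/\bcat\b/.test('cat-food')); // true: a hyphen is not a word character
console.log(replaceWholeWord('use c++ here', 'c++', 'cpp'));
// use c++ here  (unchanged: there is no boundary between '+' and a space)
```

The last two lines are the usual surprises: \b is defined in terms of \w, so underscores and digits count as part of a word, hyphens do not, and a search term that ends in punctuation never has a boundary after it.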
10 Alternation: OR matching
// HTTP status codes 4xx or 5xx
/\b(4|5)\d{2}\b/
// Common file extensions
/\.(jpg|jpeg|png|gif|webp)$/i
// Yes/no variations
/^(y|yes|yeah)$/i
The i flag at the end makes matching case-insensitive. (a|b|c) matches a OR b OR c. Wrap in non-capturing group (?:a|b|c) if you don't need to capture the matched value.
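The difference between a capturing and a non-capturing group shows up in the shape of the match array. A brief sketch, with an illustrative helper name isErrorStatus:

```javascript
// Capturing group: the matched extension is available at index 1.
const capturing = 'photo.JPG'.match(/\.(jpg|jpeg|png|gif|webp)$/i);
console.log(capturing[0], capturing[1]); // .JPG JPG

// Non-capturing group: same match, but nothing is stored at index 1.
const nonCapturing = 'photo.JPG'.match(/\.(?:jpg|jpeg|png|gif|webp)$/i);
console.log(nonCapturing.length); // 1

// Hypothetical helper: anchored variant of the 4xx/5xx pattern for one value.
const isErrorStatus = (code) => /^(?:4|5)\d{2}$/.test(String(code));
console.log(isErrorStatus(404), isErrorStatus(200)); // true false
```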
11 Hex colors
// Match #rgb or #rrggbb (with or without alpha)
/^#([0-9a-f]{3}|[0-9a-f]{4}|[0-9a-f]{6}|[0-9a-f]{8})$/i
Matches 3, 4, 6, or 8 hex characters after a #. The | alternations allow each valid length. Case-insensitive via i.
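Because the hex digits sit in a capture group, the same pattern can feed a normalizer. The following sketch assumes a hypothetical helper, expandHex, that lowercases the value and expands the short forms to their long equivalents.

```javascript
const HEX_COLOR = /^#([0-9a-f]{3}|[0-9a-f]{4}|[0-9a-f]{6}|[0-9a-f]{8})$/i;

// Hypothetical helper: returns the long lowercase form, or null if invalid.
function expandHex(color) {
  const m = HEX_COLOR.exec(color);
  if (!m) return null;
  const hex = m[1].toLowerCase();
  // Short forms (#rgb, #rgba) double each digit: 'abc' becomes 'aabbcc'.
  const long = hex.length <= 4 ? [...hex].map((c) => c + c).join('') : hex;
  return '#' + long;
}

console.log(expandHex('#ABC'));    // #aabbcc
console.log(expandHex('#abcd'));   // #aabbccdd
console.log(expandHex('#A1B2C3')); // #a1b2c3
console.log(expandHex('#12345'));  // null: 5 digits is not a valid length
```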
12 Log parsing: extract structured fields
// 127.0.0.1 - - [25/May/2026:12:34:56 +0000] "GET /path HTTP/1.1" 200 1234
/^(?<ip>\S+) \S+ \S+ \[(?<date>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]+" (?<status>\d+) (?<size>\d+)/
Named groups make log-parsing regexes readable. \S+ means "one or more non-whitespace characters," so each \S+ matches exactly one space-delimited field. [^\]]+ means "one or more characters that aren't a closing bracket," which captures the date string inside brackets without including the brackets themselves.
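Applied to the sample line, the pattern might be used as in this sketch. The function name parseLogLine and the choice to convert status and size to numbers are assumptions for illustration.

```javascript
const LOG_LINE = /^(?<ip>\S+) \S+ \S+ \[(?<date>[^\]]+)\] "(?<method>\S+) (?<path>\S+) [^"]+" (?<status>\d+) (?<size>\d+)/;

// Hypothetical helper: returns a plain object, or null if the line
// does not have the expected layout.
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null;
  const { ip, date, method, path, status, size } = match.groups;
  return { ip, date, method, path, status: Number(status), size: Number(size) };
}

const entry = parseLogLine(
  '127.0.0.1 - - [25/May/2026:12:34:56 +0000] "GET /path HTTP/1.1" 200 1234'
);
console.log(entry);
// { ip: '127.0.0.1', date: '25/May/2026:12:34:56 +0000', method: 'GET',
//   path: '/path', status: 200, size: 1234 }
```

One edge case to plan for: a line that logs the size as - instead of a number does not match \d+, so this sketch returns null for it.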
13 Catastrophic backtracking: the trap
Some regex patterns take so long on certain inputs that they effectively hang your process. Example: (a+)+b on the input "aaaaaaaaaaX". The regex engine tries every possible way of splitting the a's between the inner and outer quantifier, so the time it takes to fail roughly doubles with each additional a.
Avoid nested quantifiers like (a+)+ or (a*)*. If you find yourself writing one, restructure. Some languages ship linear-time regex engines (Rust's regex crate, Go's regexp package) that don't have this problem; the default backtracking engines in JavaScript, Python, and Java do have it.
Test your regexes with adversarial input. If you are validating user-input fields, consider running the regex with a timeout, or use a linear-time regex library like RE2.
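Restructuring usually means collapsing the nested quantifier into a single one that accepts the same strings. The sketch below compares the two forms on short inputs only, since running the nested form on a long adversarial string is exactly the hang described above. The variable names are illustrative.

```javascript
const nested = /^(a+)+b$/; // nested quantifier: exponential on near-misses
const flat = /^a+b$/;      // same set of accepted strings, one quantifier

// Both patterns agree on every sample (kept short so the nested form stays fast).
const samples = ['ab', 'aaab', 'b', 'aaX', ''];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// The flat pattern fails quickly even on a long adversarial input,
// because there is only one way to divide up the a's.
const adversarial = 'a'.repeat(50000) + 'X';
console.log(flat.test(adversarial)); // false
```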
∞ The discipline
Regex is a power tool. It's also famously unreadable — the joke is that nobody can read their own regex three months later. Two habits make this manageable:
- Add comments with the x flag (where supported), or write a comment block above the regex explaining what it matches in plain English.
- Test with regex101.com before shipping. The site shows you a step-by-step match breakdown and warns about catastrophic backtracking risk.
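JavaScript regex literals have no x flag, so one workaround is to build the pattern from small commented pieces. This is only a sketch of one approach, joining the source of each piece; the name isoDate is illustrative.

```javascript
// Each piece is an ordinary regex literal with its own comment.
const isoDate = new RegExp(
  [
    /^(?<year>\d{4})/,  // four-digit year at the start
    /-(?<month>\d{2})/, // hyphen, then two-digit month
    /-(?<day>\d{2})$/,  // hyphen, then two-digit day at the end
  ]
    .map((part) => part.source) // .source is the pattern text without slashes
    .join('')
);

console.log(isoDate.source);
// ^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$
console.log(isoDate.exec('2026-05-25').groups.day); // 25
```

Flags on the individual pieces are discarded by this approach; pass any flags you need as the second argument to the RegExp constructor.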
Regex is best when used in moderation. For complex parsing, a real parser is usually the better choice. But for the patterns above, regex is the right tool, and now you have them.