Regular expressions are one of those tools that sit awkwardly between essential and arcane. You use them constantly — in form validation, log parsing, search-and-replace, data cleaning — yet the syntax remains cryptic no matter how many times you look it up. This cheat sheet gives you 20 battle-tested patterns you can copy directly into your code, along with the understanding to modify them confidently.
Quick Reference: Regex Metacharacters
These are the building blocks. Every pattern below is composed from these elements.
| Character | Meaning | Example |
|---|---|---|
| . | Any character except newline | a.c matches "abc", "a1c", "a-c" |
| ^ | Start of string (or line with m flag) | ^Hello matches "Hello world" but not "Say Hello" |
| $ | End of string (or line with m flag) | world$ matches "Hello world" |
| * | 0 or more of preceding | ab*c matches "ac", "abc", "abbc" |
| + | 1 or more of preceding | ab+c matches "abc", "abbc" but not "ac" |
| ? | 0 or 1 (optional) | colou?r matches "color" and "colour" |
| {n} | Exactly n repetitions | \d{4} matches exactly four digits |
| {n,m} | Between n and m repetitions | \w{3,8} matches 3 to 8 word characters |
| \d | Any digit (0-9) | \d{3} matches "123", "007" |
| \D | Any non-digit | \D+ matches "abc", "---" |
| \w | Word character (a-z, A-Z, 0-9, _) | \w+ matches "hello_world" |
| \W | Non-word character | \W matches spaces, punctuation |
| \s | Whitespace (space, tab, newline) | \s+ matches one or more spaces |
| \b | Word boundary | \bcat\b matches "cat" but not "category" |
| [abc] | Character class: any of a, b, c | [aeiou] matches any vowel |
| [^abc] | Negated class: anything except a, b, c | [^0-9] matches non-digits |
| (group) | Capture group | (\d{2})/(\d{2}) captures day and month |
| (?:group) | Non-capturing group | (?:ab)+ matches "abab" without capturing |
| a\|b | Alternation (or) | cat\|dog matches "cat" or "dog" |
20 Ready-to-Use Patterns
1. Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Matches most standard email addresses. This pattern handles common formats well but is not RFC 5322 compliant — edge cases like quoted local parts or IP address domains are not covered. For production email validation, send a confirmation email rather than relying solely on regex.
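As a rough illustration, pattern 1 could be wrapped in a small Python helper like the one below. The name looks_like_email is an assumption made for this sketch, not part of any library, and the check is only a first filter before the confirmation email.

```python
import re

# Pattern 1 from this cheat sheet, compiled once so it can be reused.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(value: str) -> bool:
    """Return True when value has the general shape of an email address.

    looks_like_email is a hypothetical helper name used for illustration.
    """
    # fullmatch() also rejects a trailing newline, which "$" alone would allow.
    return EMAIL_RE.fullmatch(value) is not None


print(looks_like_email("jane.doe+news@example.co.uk"))  # True
print(looks_like_email("jane@example"))                 # False: no top-level domain
```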
2. URL (http/https)
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
Matches HTTP and HTTPS URLs including paths, query strings and fragments. The https? at the start makes the "s" optional, matching both protocols.
3. UK Postcode
^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$
Matches UK postcodes like SW1A 1AA, M1 1AA, B1 1AA, EC1A 1BB. The \s* in the middle allows for optional spacing. Use the i flag for case-insensitive matching. This covers the standard postcode layouts, including single-letter area codes (like M for Manchester) and double-letter ones (like SW for South West London). It checks shape only: it does not confirm that a postcode actually exists, and it does not match the special case GIR 0AA.
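One possible way to use this pattern is to validate the input and then normalise it to upper case with a single space. This is a minimal sketch: normalise_postcode is a hypothetical helper, and the output layout (the last three characters after one space) is an assumption about what your application wants to store.

```python
import re
from typing import Optional

# Pattern 3, with re.IGNORECASE standing in for the i flag.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)


def normalise_postcode(raw: str) -> Optional[str]:
    """Return the postcode as 'OUTWARD INWARD', or None if the shape is wrong.

    normalise_postcode is a hypothetical helper name used for illustration.
    """
    candidate = raw.strip()
    if POSTCODE_RE.fullmatch(candidate) is None:
        return None
    compact = re.sub(r"\s+", "", candidate).upper()
    # The inward code is always the final three characters.
    return f"{compact[:-3]} {compact[-3:]}"


print(normalise_postcode("sw1a1aa"))  # SW1A 1AA
print(normalise_postcode("12345"))    # None
```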
4. UK Phone Number
^(?:(?:\+44)|(?:0))(?:\d\s?){9,10}$
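Handles both +44 international and 0-prefix domestic formats, with optional spaces between digit groups. As written, a space is allowed after each digit but not directly after the prefix, so +44 20 7946 0958 is rejected while +4420 7946 0958 and 020 7946 0958 match. Add \s? after the prefix group if you need to accept the first form.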
5. Date (DD/MM/YYYY)
^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[0-2])\/\d{4}$
Validates the day range (01-31) and month range (01-12). Note this does not validate that the day is valid for the given month — 31/02/2026 would pass. For full date validation, parse with a date library after the regex check.
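The two-step check described above (regex for shape, then a date library for calendar validity) might look like this in Python. The function name is_real_date is an assumption for the example; datetime.strptime is the standard-library parser.

```python
import re
from datetime import datetime

# Pattern 5: shape check only (DD/MM/YYYY with plausible day and month ranges).
DATE_RE = re.compile(r"^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[0-2])\/\d{4}$")


def is_real_date(text: str) -> bool:
    """Return True only if text is DD/MM/YYYY and names a real calendar day.

    is_real_date is a hypothetical helper name used for illustration.
    """
    if DATE_RE.fullmatch(text) is None:
        return False
    try:
        # strptime raises ValueError for impossible dates such as 31 February.
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


print(DATE_RE.fullmatch("31/02/2026") is not None)  # True: the regex alone accepts it
print(is_real_date("31/02/2026"))                   # False: the date library rejects it
print(is_real_date("29/02/2024"))                   # True: 2024 is a leap year
```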
6. IPv4 Address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
Validates each octet is in the 0-255 range. The alternation 25[0-5]|2[0-4]\d|[01]?\d\d? handles the three ranges: 250-255, 200-249, and 0-199.
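To make the repeated octet easier to read, the pattern can be assembled from a named piece. The sketch below builds exactly the same regex as pattern 6; the names OCTET and is_ipv4 are assumptions for the example. Note that the [01]?\d\d? branch accepts leading zeros, so 010.0.0.1 passes.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

# Three "octet + dot" groups followed by a final octet, as in pattern 6.
IPV4_RE = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")


def is_ipv4(text: str) -> bool:
    """Return True if text is a dotted-quad IPv4 address (hypothetical helper)."""
    return IPV4_RE.fullmatch(text) is not None


print(is_ipv4("192.168.0.1"))  # True
print(is_ipv4("256.1.1.1"))    # False: 256 is out of range
print(is_ipv4("010.0.0.1"))    # True: leading zeros are accepted
```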
7. Hex Colour Code
^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
Matches both 6-digit (#FF5733) and 3-digit (#F00) hex colours, with or without the hash prefix.
8. Strong Password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
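Requires at least 8 characters, including at least one lowercase letter, one uppercase letter, one digit and one special character from @$!%*?&. The (?=...) syntax is a lookahead: it asserts a condition without consuming characters. Each lookahead checks for one requirement independently. Note that the final character class also restricts the whole password to letters, digits and those seven special characters, so a password containing # or a space is rejected.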
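A single pass/fail result is not very helpful to a user, so one option is to keep the combined pattern for the final check and test each requirement separately to report what is missing. This is a sketch under that assumption; the requirement labels and the function name missing_requirements are made up for the example.

```python
import re
from typing import List

# Pattern 8: four lookaheads plus the allowed-character class.
PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# The same four requirements as separate patterns (labels are hypothetical).
REQUIREMENTS = {
    "lowercase": re.compile(r"[a-z]"),
    "uppercase": re.compile(r"[A-Z]"),
    "digit": re.compile(r"\d"),
    "special": re.compile(r"[@$!%*?&]"),
}


def missing_requirements(password: str) -> List[str]:
    """List the requirement labels that the password does not satisfy."""
    return [name for name, rx in REQUIREMENTS.items() if rx.search(password) is None]


print(PASSWORD_RE.fullmatch("Passw0rd!") is not None)  # True
print(missing_requirements("password"))  # ['uppercase', 'digit', 'special']
```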
9. Username (alphanumeric, 3-16 chars)
^[a-zA-Z0-9_]{3,16}$
Letters, numbers and underscores only. Adjust the {3,16} range to match your application requirements.
10. URL Slug
^[a-z0-9]+(?:-[a-z0-9]+)*$
Matches URL-friendly strings like "my-blog-post" or "product123". No consecutive hyphens, no leading or trailing hyphens.
11. Credit Card Number
^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$
Matches 16-digit card numbers with optional spaces or hyphens between groups. This is for format validation only; use the Luhn algorithm for checksum validation.
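The format check and the Luhn checksum can be combined roughly as follows. The helper names luhn_ok and is_plausible_card are assumptions for this sketch, and passing both checks still says nothing about whether the card exists.

```python
import re

# Pattern 11: four groups of four digits, optionally separated.
CARD_RE = re.compile(r"^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0


def is_plausible_card(text: str) -> bool:
    """Format check first, then checksum on the bare digits (hypothetical helper)."""
    if CARD_RE.fullmatch(text) is None:
        return False
    return luhn_ok(re.sub(r"[\s-]", "", text))


print(is_plausible_card("4111 1111 1111 1111"))  # True (a widely used test number)
print(is_plausible_card("4111-1111-1111-1112"))  # False: checksum fails
```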
12. HTML Tag
<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
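Captures the tag name in group 1, any attribute text in group 2 and the content in group 3. Important caveat: regex is fundamentally unable to parse nested HTML correctly. Use this for simple extraction tasks only; for anything complex, use a proper DOM parser. Also note that ([^<]+)* is a nested quantifier, which can backtrack badly on long unclosed tags (see Performance Tips).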
13. Whitespace Trimming
^\s+|\s+$
Match leading and trailing whitespace for removal. The alternation | matches either the start or end pattern. In most languages, the built-in trim() method is more efficient, but this pattern is useful in regex-only contexts like editor find-and-replace.
14. Duplicate Words
\b(\w+)\s+\1\b
Finds consecutive duplicate words like "the the" or "is is". The \1 backreference matches whatever the first capture group matched. Extremely useful for proofreading text content.
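For proofreading, the same backreference can be reused in the replacement to collapse each pair into a single word. A minimal sketch, with collapse_duplicates as a hypothetical helper name; it is case-sensitive, so "The the" is left alone unless you add a case-insensitive flag.

```python
import re

# Pattern 14: a word, whitespace, then the same word again.
DUPLICATE_RE = re.compile(r"\b(\w+)\s+\1\b")


def collapse_duplicates(text: str) -> str:
    """Replace each 'word word' pair with a single 'word'."""
    return DUPLICATE_RE.sub(r"\1", text)


print(collapse_duplicates("This is is the the final draft."))
# This is the final draft.

# The trailing \b stops "is island" from being treated as a duplicate.
print(collapse_duplicates("this is island"))
# this is island
```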
15. ISO Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Matches dates like 2026-03-15, validating the month range (01-12) and day range (01-31). As with pattern 5, it does not check that the day exists in the given month.
16. File Extension
\.([a-zA-Z0-9]+)$
Captures the file extension in group 1. The escaped dot \. ensures it matches a literal period, not any character.
17. Number with Commas
^\d{1,3}(,\d{3})*(\.\d+)?$
Matches formatted numbers like 1,234,567.89. The pattern enforces proper comma placement — commas must separate groups of exactly three digits.
18. CSS Property
[a-z-]+\s*:\s*[^;]+;
Extracts CSS property declarations like color: red; or background-color: #fff;. Useful for parsing inline styles or extracting specific rules from stylesheets.
19. JSON Key-Value Pair
"([^"]+)"\s*:\s*("([^"]*)"|\d+|true|false|null)
Extracts key-value pairs from simple JSON. Group 1 captures the key, group 2 the full value. For complex nested JSON, always use a proper JSON parser.
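The sketch below shows what the groups contain for a flat object, next to the result from Python's json module for comparison. The sample string is invented for the example. Notice that group 2 keeps the surrounding quotes of string values and everything comes back as text, which is one reason a real parser is preferable.

```python
import json
import re

# Pattern 19: key in group 1, full value in group 2, string contents in group 3.
PAIR_RE = re.compile(r'"([^"]+)"\s*:\s*("([^"]*)"|\d+|true|false|null)')

sample = '{"name": "Ada", "age": 36, "admin": true}'  # invented sample data

pairs = [(m.group(1), m.group(2)) for m in PAIR_RE.finditer(sample)]
print(pairs)
# [('name', '"Ada"'), ('age', '36'), ('admin', 'true')]

# A JSON parser returns typed values and copes with nesting and escapes.
parsed = json.loads(sample)
print(parsed)
# {'name': 'Ada', 'age': 36, 'admin': True}
```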
20. Markdown Link
\[([^\]]+)\]\(([^)]+)\)
Group 1 captures the link text, group 2 captures the URL. Handy for extracting or converting Markdown links in batch processing.
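As an example of batch conversion, the two groups can be swapped into an HTML anchor with a substitution. This is only a sketch: links_to_html is a hypothetical helper, it does no HTML escaping, and because the pattern is unanchored it would also match the link part of image syntax such as ![alt](src).

```python
import re

# Pattern 20: link text in group 1, URL in group 2.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def links_to_html(markdown: str) -> str:
    """Rewrite [text](url) as <a href="url">text</a> (no escaping performed)."""
    return LINK_RE.sub(r'<a href="\2">\1</a>', markdown)


source = "See [the docs](https://example.com/docs) for details."
print(LINK_RE.findall(source))
# [('the docs', 'https://example.com/docs')]
print(links_to_html(source))
# See <a href="https://example.com/docs">the docs</a> for details.
```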
Lookaheads and Lookbehinds
Lookarounds are zero-width assertions — they check what is ahead of or behind the current position without including it in the match. They are essential for patterns that need context without consuming characters.
| Syntax | Name | Example | Meaning |
|---|---|---|---|
| (?=...) | Positive lookahead | \d+(?= pounds) | Digits followed by " pounds" (without matching "pounds") |
| (?!...) | Negative lookahead | \d+(?! dollars) | Digits NOT followed by " dollars" |
| (?<=...) | Positive lookbehind | (?<=\$)\d+ | Digits preceded by "$" (without matching "$") |
| (?<!...) | Negative lookbehind | (?<!\$)\d+ | Digits NOT preceded by "$" |
The strong password pattern above (#8) uses four positive lookaheads to check multiple conditions simultaneously. Each lookahead starts from the same position and independently verifies one requirement. This technique is powerful whenever you need to validate multiple criteria against the same input.
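The four lookaround examples can be tried in Python as shown below; the sample sentence is invented for the demonstration. One thing worth seeing in practice is that the negative forms can still match part of a number, because \d+ is free to back off by a digit (or start a digit later) until the assertion passes. Adding \d to the assertion is one way to prevent that.

```python
import re

text = "Paid 100 pounds, then $250, then 75 dollars"  # invented sample

print(re.findall(r"\d+(?= pounds)", text))  # ['100']
print(re.findall(r"(?<=\$)\d+", text))      # ['250']

# Negative lookahead: "75" is rejected, but "7" is followed by "5", so it passes.
print(re.findall(r"\d+(?! dollars)", text))      # ['100', '250', '7']
# Also forbidding a following digit keeps the engine from splitting the number.
print(re.findall(r"\d+(?!\d| dollars)", text))   # ['100', '250']

# Negative lookbehind: "250" is rejected at the "$", but "50" follows a "2".
print(re.findall(r"(?<!\$)\d+", text))       # ['100', '50', '75']
print(re.findall(r"(?<![\$\d])\d+", text))   # ['100', '75']
```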
Regex Flags
| Flag | Name | Effect |
|---|---|---|
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | /hello/i matches "Hello", "HELLO", "hElLo" |
| m | Multiline | ^ and $ match line starts/ends, not just string start/end |
| s | Dotall / Single-line | . matches newlines too (normally it does not) |
| u | Unicode | Enables full Unicode matching, essential for non-ASCII text |
Flag syntax varies by language. JavaScript uses /pattern/flags, Python uses re.IGNORECASE or inline (?i), and Java uses Pattern.compile("pattern", Pattern.CASE_INSENSITIVE). Always check your language documentation for the correct flag syntax.
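For instance, the i, m and s flags from the table map onto Python's re.IGNORECASE, re.MULTILINE and re.DOTALL. Python has no g flag; re.findall() plays that role. The sample text below is invented for the demonstration.

```python
import re

text = "Hello\nhello world\nHELLO"  # invented three-line sample

# i flag: re.IGNORECASE, or inline (?i) at the start of the pattern.
all_cases = re.findall(r"hello", text, re.IGNORECASE)
inline = re.findall(r"(?i)hello", text)
print(all_cases)  # ['Hello', 'hello', 'HELLO']

# m flag: re.MULTILINE lets ^ match at the start of every line.
first_only = re.findall(r"^hello", text, re.IGNORECASE)
each_line = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)
print(first_only)  # ['Hello']
print(each_line)   # ['Hello', 'hello', 'HELLO']

# s flag: re.DOTALL lets . match the newline between the first two lines.
no_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, re.DOTALL)
print(no_dotall)                # None
print(with_dotall is not None)  # True
```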
Regex Across Languages
While the core syntax is similar, regex engines differ in important ways:
JavaScript: Uses /pattern/flags literal syntax or new RegExp("pattern", "flags"). Supports named groups with (?<name>...) since ES2018. The matchAll() method returns an iterator of all matches including capture groups.
Python: The re module provides re.search(), re.match() (anchored to start), and re.findall(). Raw strings r"pattern" avoid double-escaping backslashes. Named groups use (?P<name>...) syntax. The re.VERBOSE flag allows comments and whitespace in patterns for readability.
C# / .NET: Uses the Regex class in System.Text.RegularExpressions. Supports balancing groups for matching nested structures — a feature unique to .NET. Compiled regex with RegexOptions.Compiled improves performance for patterns used repeatedly.
Go: Uses the regexp package with RE2 syntax. Importantly, Go does not support backreferences or lookarounds (lookaheads and lookbehinds); the RE2 engine guarantees linear-time matching at the cost of these features. If you need lookaheads in Go, you will need to restructure your approach.
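To make the Python notes concrete, here is a short sketch that rewrites the ISO date pattern (#15) with named groups and re.VERBOSE, followed by the difference between re.match(), re.search() and re.findall(). The group names year, month and day are choices made for this example.

```python
import re

# Pattern 15 with named groups; re.VERBOSE ignores whitespace and allows comments.
ISO_DATE_RE = re.compile(
    r"""
    ^
    (?P<year>\d{4})                  # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])         # month, 01 to 12
    -
    (?P<day>0[1-9]|[12]\d|3[01])     # day, 01 to 31
    $
    """,
    re.VERBOSE,
)

match = ISO_DATE_RE.fullmatch("2026-03-15")
print(match.groupdict())  # {'year': '2026', 'month': '03', 'day': '15'}

# re.match() only tries at the start of the string; re.search() scans forward.
print(re.match(r"\d+", "abc 123"))            # None
print(re.search(r"\d+", "abc 123").group(0))  # 123
print(re.findall(r"\d+", "a1 b22 c333"))      # ['1', '22', '333']
```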
Performance Tips
- Be specific with character classes: [a-z]+ is faster than .+ because the engine can fail faster on non-matching characters without backtracking
- Use non-capturing groups: (?:abc) instead of (abc) when you do not need the capture, avoiding the overhead of storing group matches
- Avoid catastrophic backtracking: nested quantifiers like (a+)+ or (a|a)+ can cause exponential time complexity and hang your application. Test patterns against pathological inputs
- Anchor when possible: ^pattern$ is faster than an unanchored pattern because the engine only tries matching from the start
- Use possessive quantifiers or atomic groups where supported: a++ or (?>a+) prevent backtracking into the group entirely
- Compile and reuse: in languages like Python and C#, compile frequently-used patterns once and reuse the compiled object rather than recompiling on every call
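Two of these tips can be sketched together in Python: compiling once at module level, and replacing a nested quantifier with a flat equivalent. The patterns and sample strings are invented for the illustration. The nested version is deliberately only run on short inputs here, since a long run of "a" with no "b" is exactly the pathological case the tip warns about.

```python
import re

# Compiled once at import time and reused on every call.
NESTED = re.compile(r"^(a+)+b$")  # nested quantifier: can backtrack exponentially
FLAT = re.compile(r"^a+b$")       # accepts the same strings without nesting


def agree(sample: str) -> bool:
    """Check that both patterns accept or reject the same input."""
    return (NESTED.fullmatch(sample) is None) == (FLAT.fullmatch(sample) is None)


# Short inputs only: enough to show the two patterns are interchangeable.
samples = ["ab", "aaab", "b", "aaa", "", "aab "]
print(all(agree(s) for s in samples))  # True

# The flat pattern fails quickly even on a long non-matching input.
long_input = "a" * 5000
print(FLAT.fullmatch(long_input))  # None
```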
Common Troubleshooting
Pattern matches in tester but not in code: Check whether your language requires escaping backslashes in strings. In Java and C#, \d in a regular string literal needs to be \\d. Python raw strings r"\d" and C# verbatim strings @"\d" avoid this issue.
Only the first match is found: You probably need the global flag. In JavaScript, use /pattern/g or matchAll(). In Python, use re.findall() instead of re.search().
Match includes too much text: Quantifiers are greedy by default — .+ matches as much as possible. Add ? to make it lazy: .+? matches as little as possible. The classic example is matching HTML tags: <.+> matches from the first < to the last > in the entire string, while <.+?> matches each individual tag.
Regex works on test data but fails on production input: Check for encoding issues (UTF-8 vs Latin-1), invisible characters (zero-width spaces, byte order marks), and platform-specific line endings (\r\n on Windows vs \n on Unix).
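Two of these problems are easy to reproduce in a few lines of Python: greedy versus lazy quantifiers, and Windows line endings defeating a $ anchor. The sample strings are invented for the demonstration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # invented sample

# Greedy: runs from the first "<" to the last ">".
print(re.findall(r"<.+>", html))
# ['<b>bold</b> and <i>italic</i>']

# Lazy: stops at the first ">" it can, so each tag is matched on its own.
print(re.findall(r"<.+?>", html))
# ['<b>', '</b>', '<i>', '</i>']

# Windows line endings: the "\r" sits between the word and the "\n",
# so "$" never lines up with the end of \w+.
windows_text = "line one\r\nline two\r\n"
print(re.findall(r"^line \w+$", windows_text, re.MULTILINE))
# []

# Normalising the line endings first restores the expected matches.
unix_text = windows_text.replace("\r\n", "\n")
print(re.findall(r"^line \w+$", unix_text, re.MULTILINE))
# ['line one', 'line two']
```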