What Is a Regular Expression? A Beginner's Guide to Regex

Regular expressions look intimidating but follow simple rules. This guide explains what regex is, how it works, and the most useful patterns to know.

The pattern-matching tool every developer eventually needs

A regular expression (regex) is a sequence of characters that defines a search pattern. You use it to find, match, or replace text based on structure rather than exact content. Instead of searching for the exact string “hello”, you can search for “any word that starts with h and ends with a vowel” — and regex gives you the syntax to express that precisely.
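As a rough illustration of that idea, here is one way the "starts with h and ends with a vowel" search could be written. This is a sketch using Python's built-in re module; the name H_WORD and the sample sentence are made up for the example, and the individual symbols are explained in the sections that follow.

```python
import re

# \b marks a word boundary, h is a literal, \w* is any run of word
# characters, and [aeiou] requires the last character to be a lowercase vowel.
H_WORD = re.compile(r"\bh\w*[aeiou]\b")

# "hat" is skipped because it ends in a consonant; "there" does not start with h.
found = H_WORD.findall("hello there, hi hat house")
print(found)  # ['hello', 'hi', 'house']
```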

Regex is built into almost every programming language — JavaScript, Python, PHP, Go, Java — and into command-line tools like grep, sed, and awk. It’s used in form validation, log parsing, text transformation, search-and-replace in editors, and URL routing.

Test your patterns against real input with Regex Tester.

The basic building blocks

Literals — The simplest regex: characters that match themselves. The pattern cat matches the string “cat” anywhere it appears. Case matters by default.

. (dot) — Matches any single character except a newline. c.t matches “cat”, “cut”, “c3t”, and “c t”.

* (asterisk) — Matches zero or more of the preceding element. ca*t matches “ct”, “cat”, “caat”, “caaaat”.

+ (plus) — Matches one or more of the preceding element. ca+t matches “cat” and “caat” but not “ct”.

? (question mark) — Matches zero or one of the preceding element. colou?r matches both “color” and “colour”.

^ and $ — Anchors. ^ matches the start of a string; $ matches the end. ^hello only matches strings that begin with “hello”. world$ only matches strings ending with “world”.
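The building blocks above can be tried out directly. The sketch below uses Python's re module (any of the languages mentioned earlier would work similarly); the helper name matches is hypothetical and simply wraps re.search, which looks for the pattern anywhere in the string.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# Literal: matches itself anywhere, and case matters.
print(matches(r"cat", "concatenate"))      # True
print(matches(r"cat", "CAT"))              # False

# Dot: any single character except a newline.
print(matches(r"c.t", "c3t"))              # True
print(matches(r"c.t", "c\nt"))             # False

# Star, plus, question mark: zero-or-more, one-or-more, zero-or-one.
print(matches(r"ca*t", "ct"))              # True
print(matches(r"ca+t", "ct"))              # False
print(matches(r"colou?r", "color"))        # True

# Anchors: ^ pins the match to the start, $ to the end.
print(matches(r"^hello", "hello world"))   # True
print(matches(r"^hello", "say hello"))     # False
print(matches(r"world$", "hello world"))   # True
```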

Character classes

Square brackets define a character class — a set of characters where any one can match.

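  • [aeiou] — matches any lowercase vowel.
  • [0-9] — matches any ASCII digit. Equivalent to \d in most engines (some Unicode-aware engines also let \d match digits from other scripts).
  • [a-z] — matches any lowercase letter.
  • [A-Za-z0-9] — matches any alphanumeric character.
  • [^aeiou] — the ^ inside brackets negates: matches any single character that is NOT a lowercase vowel, which includes consonants, digits, spaces, and uppercase vowels.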

Shorthand classes:

  • \d — digit (0-9)
  • \w — word character (letters, digits, underscore)
  • \s — whitespace (space, tab, newline)
  • \D, \W, \S — the negated versions
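To see character classes in action, the following sketch runs a few of them through Python's re.findall, which returns every non-overlapping match. The sample strings are invented for the example. The last two lines show a Python-specific detail: for text strings, \d also accepts digits from other scripts, while [0-9] does not.

```python
import re

text = "Order 66 shipped to Ada_L on 2024-03-15"

vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regEx")   # uppercase E is not excluded
numbers = re.findall(r"[0-9]+", text)
words = re.findall(r"\w+", text)                # underscore counts as a word character
spaces = re.findall(r"\s", "a b\tc")

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'E', 'x']
print(numbers)     # ['66', '2024', '03', '15']
print(words)       # ['Order', '66', 'shipped', 'to', 'Ada_L', 'on', '2024', '03', '15']
print(spaces)      # [' ', '\t']

# "\u0663" is the Arabic-Indic digit three.
unicode_digit_d = re.fullmatch(r"\d", "\u0663") is not None
unicode_digit_range = re.fullmatch(r"[0-9]", "\u0663") is not None
print(unicode_digit_d, unicode_digit_range)  # True False
```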

Quantifiers and grouping

{n} — Exactly n repetitions. \d{4} matches exactly four digits.

{n,m} — Between n and m repetitions. \d{2,4} matches 2, 3, or 4 digits.

{n,} — n or more repetitions.

() — Parentheses create a capturing group. (ha)+ matches “ha”, “haha”, “hahaha”. Groups also let you reference matched text in replacements.

| (pipe) — Alternation. cat|dog matches either “cat” or “dog”.
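A short sketch of quantifiers, groups, and alternation, again in Python. It uses re.fullmatch, which only succeeds when the whole string fits the pattern, and re.sub, where \1 and \2 in the replacement refer to the first and second captured group. The helper name fits and the sample inputs are hypothetical.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Counted repetition.
print(fits(r"\d{4}", "2024"))      # True
print(fits(r"\d{4}", "20245"))     # False
print(fits(r"\d{2,4}", "123"))     # True
print(fits(r"\d{2,}", "1234567"))  # True

# (ha)+ repeats the whole group, not just the last letter.
print(fits(r"(ha)+", "hahaha"))    # True
print(fits(r"(ha)+", "haa"))       # False

# Captured groups can be reused in a replacement.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")
print(swapped)                     # Ada Lovelace

# Alternation picks either branch.
pets = re.findall(r"cat|dog", "cat chases dog")
print(pets)                        # ['cat', 'dog']
```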

Five patterns worth memorizing

Email validation (basic):

[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}

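Matches most everyday email addresses. It is not perfect (the email spec is complex), but it is sufficient as a first-pass check in form validation. When validating a whole field, anchor it with ^ and $; unanchored, it also matches an address embedded in longer text.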
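One way to use the email pattern for a form field, sketched in Python. The function name looks_like_email is hypothetical. It relies on fullmatch so the entire value has to fit the pattern, which has the same effect as wrapping the pattern in ^ and $.

```python
import re

EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}")

def looks_like_email(value: str) -> bool:
    """First-pass shape check; it does not prove the address exists."""
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("ada@example.com"))                    # True
print(looks_like_email("ada@example"))                        # False (no dot and top-level domain)
print(looks_like_email("write to ada@example.com today"))     # False (extra text around it)

# search() is the right call for pulling an address out of longer text.
found = EMAIL.search("write to ada@example.com today")
print(found.group())                                          # ada@example.com
```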

URL:

https?://[\w.-]+(?:/[\w./?=&#%-]*)?

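Matches HTTP and HTTPS URLs with an optional path and query string. The query string is only picked up when it follows a slash, so in https://example.com?q=1 the match stops after the host.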
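A quick Python check of what the URL pattern accepts. The example URLs are invented; the last case shows where the match stops when a query string follows the host without a path slash.

```python
import re

URL = re.compile(r"https?://[\w.-]+(?:/[\w./?=&#%-]*)?")

def is_http_url(value: str) -> bool:
    """Return True if the whole value fits the URL pattern."""
    return URL.fullmatch(value) is not None

print(is_http_url("http://example.com"))                         # True
print(is_http_url("https://example.com/search?q=regex&page=2"))  # True
print(is_http_url("ftp://example.com"))                          # False

# Without a slash before the query string, only the host part is matched.
partial = URL.search("https://example.com?q=1").group()
print(partial)                                                   # https://example.com
```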

Phone number (flexible):

[\+]?[(]?[0-9]{3}[)]?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}

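Handles numbers in a 3-3-4 digit grouping, such as 555-123-4567, (555) 123-4567, and 555.123.4567, with an optional leading + and up to six digits in the last group. International numbers that group their digits differently will not match; for those, a looser pattern or a phone-number library is a better fit.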
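Before relying on the phone pattern, it can help to check which formats it actually accepts. The Python sketch below does that with a handful of made-up numbers; is_phone is a hypothetical helper name.

```python
import re

PHONE = re.compile(r"[\+]?[(]?[0-9]{3}[)]?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}")

def is_phone(value: str) -> bool:
    """Return True if the whole value fits the phone pattern."""
    return PHONE.fullmatch(value) is not None

print(is_phone("555-123-4567"))      # True
print(is_phone("(555) 123-4567"))    # True
print(is_phone("555.123.4567"))      # True
print(is_phone("+5551234567"))       # True
print(is_phone("+44 20 7946 0958"))  # False (digits are grouped 2-2-4-4)
print(is_phone("12345"))             # False (too short)
```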

IPv4 address:

\b(?:\d{1,3}\.){3}\d{1,3}\b

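Matches four groups of 1-3 digits separated by dots. It checks the shape only: 999.999.999.999 also matches, so check separately that each group falls in the range 0-255.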
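A possible way to combine the shape check with a range check, sketched in Python. The names IPV4_SHAPE and is_ipv4 are hypothetical. The \b word boundaries are dropped here because fullmatch already requires the whole string to fit, and the re.ASCII flag is added so that \d accepts only 0-9.

```python
import re

# Same shape as the pattern above: four groups of 1-3 digits separated by dots.
IPV4_SHAPE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}", re.ASCII)

def is_ipv4(value: str) -> bool:
    """Shape check with the regex, then a numeric range check per group."""
    if IPV4_SHAPE.fullmatch(value) is None:
        return False
    return all(0 <= int(part) <= 255 for part in value.split("."))

print(is_ipv4("192.168.0.1"))       # True
print(is_ipv4("999.999.999.999"))   # False (right shape, values out of range)
print(is_ipv4("192.168.0"))         # False (only three groups)

# The regex alone accepts the out-of-range address.
shape_only = IPV4_SHAPE.fullmatch("999.999.999.999") is not None
print(shape_only)                   # True
```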

Date (YYYY-MM-DD):

\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

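Checks the YYYY-MM-DD shape used by ISO 8601 with basic range checking: months 01-12 and days 01-31. It does not know how long each month is, so 2023-02-31 still matches.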
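If real calendar validation is needed, one option is to let the regex check the shape and hand the numbers to a date type. The Python sketch below assumes the standard datetime module; the function name parse_iso_date is hypothetical, and the non-capturing groups from the pattern above are turned into capturing groups so the parts can be read out.

```python
import re
from datetime import date
from typing import Optional

DATE_SHAPE = re.compile(
    r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])", re.ASCII
)

def parse_iso_date(value: str) -> Optional[date]:
    """Return a date for a valid YYYY-MM-DD string, otherwise None."""
    match = DATE_SHAPE.fullmatch(value)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # date() applies calendar rules such as month lengths and leap years.
        return date(year, month, day)
    except ValueError:
        return None

print(parse_iso_date("2024-02-29"))  # 2024-02-29 (leap year)
print(parse_iso_date("2023-02-31"))  # None (shape is fine, day does not exist)
print(parse_iso_date("2024-13-01"))  # None (month out of range, rejected by the regex)
```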

Flags that change behavior

Most regex engines support flags that modify how the pattern matches:

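i — Case-insensitive. cat with the i flag also matches “CAT”, “Cat”, “cAt”.

g — Global. Find all matches in a string, not just the first. This flag appears in JavaScript and Perl-style syntax; some languages expose the same behavior through a separate function instead, such as re.findall in Python.

m — Multiline. ^ and $ match the start/end of each line, not just the string.

s — Dotall. Makes . match newlines too.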
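How flags are spelled depends on the language. As a sketch, here is how the same four behaviors look in Python, where i, m, and s correspond to re.IGNORECASE, re.MULTILINE, and re.DOTALL, and "global" matching is done by calling re.findall rather than by a flag. The sample text is made up.

```python
import re

text = "Cat\ncat\nCAT"

# No flags: case-sensitive, so only the middle line matches.
plain = re.findall(r"cat", text)
# i: ignore case. findall already returns every match (the "g" behavior).
insensitive = re.findall(r"cat", text, flags=re.IGNORECASE)
# search() stops at the first match, like a pattern without g.
first_only = re.search(r"cat", text, flags=re.IGNORECASE).group()

# m: ^ and $ apply to each line instead of the whole string.
whole_string = re.findall(r"^cat$", text)
per_line = re.findall(r"^cat$", text, flags=re.MULTILINE)

# s: the dot is allowed to match the newline between the first two lines.
dot_default = re.search(r"Cat.cat", text) is not None
dot_all = re.search(r"Cat.cat", text, flags=re.DOTALL) is not None

print(plain)          # ['cat']
print(insensitive)    # ['Cat', 'cat', 'CAT']
print(first_only)     # Cat
print(whole_string)   # []
print(per_line)       # ['cat']
print(dot_default)    # False
print(dot_all)        # True
```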

When regex is the wrong tool

Regex is powerful but can become a maintenance problem when overused. Some situations where a dedicated parser is better:

  • Parsing HTML or XML — Regex can’t handle nested tags reliably. Use a DOM parser.
  • Parsing JSON — Use JSON.parse(), not regex.
  • Complex date validation — Use a date library that understands calendar rules.
  • Fully validating email addresses — The RFC 5322 spec is so complex that even dedicated email validation libraries get edge cases wrong. For practical purposes, send a confirmation email instead.
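For the first two cases in the list, most languages ship a parser that does the job. A minimal Python sketch, assuming the standard json and html.parser modules; the LinkCollector class and the sample inputs are hypothetical:

```python
import json
from html.parser import HTMLParser

# JSON: a parser handles nesting and escaping that a regex would trip over.
payload = '{"user": {"name": "Ada", "tags": ["math", "engines"]}}'
data = json.loads(payload)
print(data["user"]["tags"])  # ['math', 'engines']

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, however deeply they are nested."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            self.links.extend(
                value for name, value in attrs if name == "href" and value
            )

collector = LinkCollector()
collector.feed('<div><a href="/docs">Docs <b>here</b></a><a href="/blog">Blog</a></div>')
print(collector.links)  # ['/docs', '/blog']
```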

A regex pattern that grows beyond 50 characters is usually a sign that the problem needs a different approach. Build incrementally, test with real input, and document the pattern’s intent — future you will thank present you.

Use Regex Tester to build and validate patterns against real-world input before putting them in your code.

