Regular Expressions in Java


startsWith, contains, and equals cover most String comparisons. When the rule is about shape (any three digits, anything that looks like an email, every UUID in this log), you reach for a regular expression. Java's regex API lives in java.util.regex: Pattern (the compiled rule) and Matcher (the engine that walks it across an input). The core syntax is largely shared with JavaScript and Python, although some details, such as named-group syntax, differ. Java adds one wrinkle of its own: every backslash in the pattern has to be doubled in the source code.

Two ways to use regex

Two entry points, two use cases:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// 1) String.matches: does the WHOLE string match this pattern?
boolean ok = "alice@test.com".matches(".*@.*\\.com");      // true

// 2) Pattern + Matcher: find one or many matches inside a larger string
Pattern p = Pattern.compile("\\d{3}");
Matcher m = p.matcher("Status: 200 OK");
if (m.find()) {
    System.out.println(m.group());                          // "200"
}

String.matches(...) is the shortcut for "does the entire input fit?" It returns true only if the whole string matches — there's an implicit ^...$ around your pattern. Useful for validation: "is this email-shaped?", "is this a UUID?".
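To make the implicit anchoring visible, here is a minimal sketch that runs the same three-digit rule two ways: once through String.matches, and once through Matcher.find() with explicit ^ and $ anchors. The class name WholeStringMatch and the sample inputs are made up for illustration; for single-line inputs like these, the two checks agree.

```java
import java.util.regex.Pattern;

class WholeStringMatch {
    // String.matches: the entire input must fit the pattern.
    static boolean viaMatches(String input) {
        return input.matches("\\d{3}");
    }

    // The same rule written with explicit anchors and Matcher.find().
    static boolean viaAnchoredFind(String input) {
        return Pattern.compile("^\\d{3}$").matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"200", "200 OK", "Status 200"};
        for (String s : inputs) {
            // Only "200" passes both checks; the others contain extra text.
            System.out.println(s + " -> matches=" + viaMatches(s)
                    + ", anchored find=" + viaAnchoredFind(s));
        }
    }
}
```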

Pattern.compile(...).matcher(input) is the workhorse: it gives you a Matcher that you drive with find(), group(), and start()/end(). Use this when you need to extract substrings or scan for every occurrence.

Double backslashes — the Java tax

In a raw regex, \d means "any digit." In a Java string literal, \ is itself an escape character, so to write a literal backslash you need \\. To get a regex \d inside a Java string, you write "\\d".

The cheat sheet:

Regex (what the engine sees)    Java string literal
\d                              "\\d"
\s                              "\\s"
\.                              "\\."
\\                              "\\\\"
[0-9]+                          "[0-9]+"  (no backslashes needed)

Every time you see \\ in a Java pattern, the engine sees a single \. This trips up almost every newcomer at least once. Your IDE may highlight regex syntax inside pattern literals and flag malformed ones, so pay attention to those squiggles.
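A quick way to convince yourself of the doubling rule is to print a pattern literal and check its length. This sketch (the class name BackslashDemo and the sample path are illustrative) shows that "\\d" is two characters, and that matching one literal backslash takes four in the source.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Regex \\ (one literal backslash) must be written "\\\\" in Java source.
    private static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean containsBackslash(String s) {
        return BACKSLASH.matcher(s).find();
    }

    public static void main(String[] args) {
        String digit = "\\d";
        System.out.println(digit);          // \d  (what the regex engine receives)
        System.out.println(digit.length()); // 2   (a backslash and the letter d)

        String path = "C:\\temp\\report.txt"; // the actual text is C:\temp\report.txt
        System.out.println(containsBackslash(path));          // true
        System.out.println(containsBackslash("/tmp/report")); // false
    }
}
```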

Common patterns for QA work

Patterns you'll meet day-to-day:

String email   = "\\w+@\\w+\\.\\w+";                                       // alice@test.com
String status  = "\\d{3}";                                                 // 200, 404, 500
String uuid    = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";   // lowercase hex only
String isoDate = "\\d{4}-\\d{2}-\\d{2}";                                   // 2026-05-06
String url     = "https?://[\\w.-]+(/[\\w.-]*)*";                          // http or https URL
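Before relying on these shapes, it helps to run each one against a sample value. The sketch below uses Pattern.matches(regex, input), which behaves like String.matches; the class name PatternCheck and the sample values are illustrative. The last two checks show the limits: the UUID pattern rejects uppercase hex, and the simple email shape rejects a dot in the part before the @.

```java
import java.util.regex.Pattern;

class PatternCheck {
    static final String EMAIL    = "\\w+@\\w+\\.\\w+";
    static final String STATUS   = "\\d{3}";
    static final String UUID     = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
    static final String ISO_DATE = "\\d{4}-\\d{2}-\\d{2}";
    static final String URL      = "https?://[\\w.-]+(/[\\w.-]*)*";

    // Whole-string check, equivalent to input.matches(regex)
    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fits(EMAIL, "alice@test.com"));                        // true
        System.out.println(fits(STATUS, "404"));                                  // true
        System.out.println(fits(UUID, "123e4567-e89b-12d3-a456-426614174000"));   // true
        System.out.println(fits(ISO_DATE, "2026-05-06"));                         // true
        System.out.println(fits(URL, "https://qa.codes/utilities/regex-tester")); // true
        System.out.println(fits(UUID, "123E4567-E89B-12D3-A456-426614174000"));   // false: uppercase
        System.out.println(fits(EMAIL, "first.last@test.com"));                   // false: dot before @
    }
}
```

If you need case-insensitive matching, Pattern.compile(regex, Pattern.CASE_INSENSITIVE) is one option.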

A quick syntax reference:

  • \d digit, \D non-digit, \s whitespace, \S non-whitespace, \w word char ([A-Za-z0-9_]), \W non-word
  • . any character except a line terminator (use \. for a literal dot, written "\\." in Java)
  • * zero or more, + one or more, ? zero or one, {n} exactly n, {n,m} n to m
  • ^ start, $ end (matters with Matcher.find(); String.matches already anchors)
  • [abc] any of a, b, c — [^abc] any except a, b, c
  • (group) capture group, (?:group) non-capturing
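Two items from that list are easy to see in action: non-capturing groups do not count toward group numbers, and ^/$ change what find() accepts. This is a small sketch; the class name GroupsAndAnchors and the sample strings are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsAndAnchors {
    // (?:https?) groups the scheme without capturing it; ([\w.-]+) captures the host.
    static final Pattern HOST = Pattern.compile("(?:https?)://([\\w.-]+)");

    static String host(String text) {
        Matcher m = HOST.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Non-capturing groups are not counted, so this is 1.
    static int captureCount() {
        return HOST.matcher("").groupCount();
    }

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(host("docs at https://qa.codes/regex")); // qa.codes
        System.out.println(captureCount());                           // 1
        System.out.println(found("^\\d+", "abc 123"));                // false: does not start with a digit
        System.out.println(found("\\d+$", "abc 123"));                // true: ends with digits
    }
}
```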

We're not going deep into regex syntax — qa.codes has a dedicated Regex for Testers cheat sheet and the Regex Tester utility for live experimentation. The lesson is about how to use regex from Java; the patterns themselves transfer between languages.

A real example — extracting a status code

import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class ExtractStatus {
    public static void main(String[] args) {
        String response = "HTTP/1.1 502 Bad Gateway";
 
        Matcher m = Pattern.compile("HTTP/\\d\\.\\d (\\d{3}) (.+)").matcher(response);
        if (m.find()) {
            String code = m.group(1);
            String reason = m.group(2);
            System.out.println("code = " + code);
            System.out.println("reason = " + reason);
        }
    }
}

Output:

code = 502
reason = Bad Gateway

The pattern has two capture groups in parentheses: (\\d{3}) and (.+). After a successful find(), m.group(0) (or just m.group()) is the whole match; m.group(1) is the first capture; m.group(2) is the second. Capture groups are how you pull parts of a match out of the input.
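Java also supports named groups with the (?<name>...) syntax, read back with m.group("name"), which keeps the code readable when a pattern has several captures. The sketch below reworks the status-line example that way; the class name StatusLineParser and its helper methods are illustrative. Note that group(...) throws IllegalStateException unless the last match attempt succeeded, which is why each call is guarded by find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatusLineParser {
    private static final Pattern STATUS_LINE =
            Pattern.compile("HTTP/\\d\\.\\d (?<code>\\d{3}) (?<reason>.+)");

    // Returns the status code, or null when the line is not a status line.
    static String code(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        return m.find() ? m.group("code") : null;
    }

    // Returns the reason phrase, or null when the line is not a status line.
    static String reason(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        return m.find() ? m.group("reason") : null;
    }

    public static void main(String[] args) {
        System.out.println(code("HTTP/1.1 502 Bad Gateway"));   // 502
        System.out.println(reason("HTTP/1.1 502 Bad Gateway")); // Bad Gateway
        System.out.println(code("garbage"));                   // null
    }
}
```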

Finding all occurrences

find() returns one match at a time. Call it in a loop to walk every occurrence:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class CountNumbers {
    public static void main(String[] args) {
        String summary = "Total: 32, Passed: 28, Failed: 4";
 
        Matcher m = Pattern.compile("\\d+").matcher(summary);
        int count = 0;
        while (m.find()) {
            System.out.println("found " + m.group() + " at index " + m.start());
            count++;
        }
        System.out.println("total numbers: " + count);
    }
}

Output:

found 32 at index 7
found 28 at index 19
found 4 at index 31
total numbers: 3

m.start() returns the index where the current match begins; m.end() is one past the end. Useful when you need to know where in the original string a match was, not just what it matched.
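Because m.end() is exclusive, input.substring(m.start(), m.end()) always gives back the same text as m.group(). This sketch (class and method names are hypothetical) records each match of the summary string from above together with its [start, end) range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Returns each match as "text[start,end)"; end is exclusive, like substring(start, end).
    static List<String> spans(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            String text = input.substring(m.start(), m.end()); // same text as m.group()
            result.add(text + "[" + m.start() + "," + m.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(spans(digits, "Total: 32, Passed: 28, Failed: 4"));
        // [32[7,9), 28[19,21), 4[31,32)]
    }
}
```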

replaceAll with regex — masking and rewriting

String.replaceAll(regex, replacement) rewrites every match. The replacement can reference capture groups with $1, $2, etc.:

public class MaskNumbers {
    public static void main(String[] args) {
        String log = "User 42 logged in from IP 10.0.0.7 with id 12345";
 
        // Mask 4+ digit numbers
        String masked = log.replaceAll("\\d{4,}", "****");
        System.out.println(masked);
 
        // Reformat key=value pairs into key: value
        String pairs = "env=staging timeout=10 retries=3";
        String niceFormat = pairs.replaceAll("(\\w+)=(\\w+)", "$1: $2");
        System.out.println(niceFormat);
    }
}

Output:

User 42 logged in from IP 10.0.0.7 with id ****
env: staging timeout: 10 retries: 3

Two distinct uses: pure replacement ($1, $2 not used) and capture-and-rewrite ($1, $2 reorganise the match). For QA reporting, masking PII (account numbers, tokens, emails) before logs leave the test agent is a typical use.
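Here is a sketch of PII masking that keeps part of the match: the domain is captured and reinserted with $1. One caveat worth knowing: $ and \ are special in replacement strings, so a replacement containing a literal "$5" would throw IndexOutOfBoundsException (there is no group 5). Matcher.quoteReplacement escapes such text. The class name MaskingDemo and the sample text are illustrative.

```java
import java.util.regex.Matcher;

class MaskingDemo {
    // Keep the domain (group 1), hide the user part.
    static String maskEmailUsers(String text) {
        return text.replaceAll("\\w+@(\\w+\\.\\w+)", "***@$1");
    }

    // quoteReplacement makes $ and \ in the replacement literal.
    static String replaceNumbersWith(String text, String literal) {
        return text.replaceAll("\\d+", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(maskEmailUsers("Contact alice@test.com or bob@qa.org"));
        // Contact ***@test.com or ***@qa.org
        System.out.println(replaceNumbersWith("price: 10", "$5"));
        // price: $5
    }
}
```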

String.replaceFirst(...) is the same but stops after the first match; String.replace(literal, literal) does a non-regex replacement (cheaper and safer when the search is fixed text).
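The difference between the literal and regex variants shows up quickly with a dot. In this sketch (class and method names are hypothetical) an unescaped "." in replaceAll replaces every character, while replace, an escaped dot, or Pattern.quote treat it as plain text.

```java
import java.util.regex.Pattern;

class ReplaceVariants {
    static String literal(String s)   { return s.replace(".", "-"); }                     // plain text
    static String unescaped(String s) { return s.replaceAll(".", "-"); }                  // . = any character
    static String escaped(String s)   { return s.replaceAll("\\.", "-"); }                // literal dot
    static String firstOnly(String s) { return s.replaceFirst("\\.", "-"); }              // first match only
    static String quoted(String s)    { return s.replaceAll(Pattern.quote("."), "-"); }   // quoted literal

    public static void main(String[] args) {
        String version = "1.2.3";
        System.out.println(literal(version));   // 1-2-3
        System.out.println(unescaped(version)); // -----
        System.out.println(escaped(version));   // 1-2-3
        System.out.println(firstOnly(version)); // 1-2.3
        System.out.println(quoted(version));    // 1-2-3
    }
}
```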

Compile once, reuse many

Pattern.compile(...) is not free: the engine parses the pattern and builds an internal matching structure. If a pattern is used in a loop, compile it once outside the loop and reuse the Pattern:

private static final Pattern STATUS_CODE = Pattern.compile("\\d{3}");
 
public static String extractCode(String line) {
    Matcher m = STATUS_CODE.matcher(line);
    return m.find() ? m.group() : null;
}

String.matches, String.replaceAll, etc. compile the pattern internally on every call — fine for one-off uses, wasteful in a tight loop. The static final field idiom is the standard way to compile-once-use-many.
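Here is the idiom applied to a loop over log lines. Per the Javadoc, Pattern instances are immutable and safe to share across threads, while Matcher instances are not, so the sketch keeps one shared Pattern and creates a fresh Matcher per line. The class name, pattern, and sample lines are illustrative.

```java
import java.util.regex.Pattern;

class TimeoutCounter {
    // Compiled once when the class loads, then reused for every line.
    private static final Pattern TIMEOUT = Pattern.compile("timed out after (\\d+)ms");

    static int countTimeouts(String[] lines) {
        int count = 0;
        for (String line : lines) {
            // matcher(...) is cheap; only the compile step is hoisted out of the loop.
            if (TIMEOUT.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "GET /search timed out after 3000ms",
            "GET /login 200 in 120ms",
            "POST /export timed out after 5000ms"
        };
        System.out.println(countTimeouts(lines)); // 2
    }
}
```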

How regex matching flows

The flow is: compile the rule, attach it to the input, call find() repeatedly to advance, and pull out matches with group(). That four-step rhythm covers nearly every regex use case in QA tooling.
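The four steps map directly onto code. This sketch pulls every UUID out of a log string using the lowercase UUID pattern from earlier; the class name UuidScanner and the sample log are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UuidScanner {
    // Step 1: compile the rule (lowercase UUIDs only).
    private static final Pattern UUID = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");

    static List<String> findUuids(String log) {
        Matcher m = UUID.matcher(log);    // Step 2: attach it to the input
        List<String> found = new ArrayList<>();
        while (m.find()) {                // Step 3: advance to the next match
            found.add(m.group());         // Step 4: pull the match out
        }
        return found;
    }

    public static void main(String[] args) {
        String log = "req 123e4567-e89b-12d3-a456-426614174000 ok; "
                + "retry 00000000-0000-0000-0000-000000000001 failed";
        System.out.println(findUuids(log));
    }
}
```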

Tip: qa.codes/utilities/regex-tester is a sandbox for prototyping a pattern against real input before pasting it into your Java code. Many regex bugs come from writing the pattern in your head; testing it interactively first saves a lot of recompiles.

⚠️ Common mistakes

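  • Single backslash in a Java pattern. Pattern.compile("\d{3}") is a compile error (\d is not a valid Java string escape); the fix is "\\d{3}". Some slips compile anyway: "\b" is the Java escape for backspace and, since Java 15, "\s" is the escape for a single space, so both silently match the wrong thing instead of a word boundary or any whitespace. Every regex backslash needs to be doubled in the source.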
  • Confusing String.matches and Matcher.find. matches requires the entire input to match; find looks for any match anywhere. "abc 200 def".matches("\\d{3}") returns false (the whole string isn't three digits); "abc 200 def".matches(".*\\d{3}.*") returns true. Pick the right one for what you're asking.
  • Recompiling the same pattern in a tight loop. for (...) { Pattern.compile("...").matcher(s).find(); } recompiles the pattern on every iteration. Hoist the Pattern.compile(...) out, usually into a static final field.
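The backspace trap is worth seeing once, because nothing fails loudly. In this sketch (the class name EscapePitfall and the inputs are illustrative), "\bcat\b" compiles but searches for backspace characters, while "\\bcat\\b" reaches the engine as a word-boundary pattern. It deliberately avoids "\s" so it also compiles on JDKs older than 15.

```java
import java.util.regex.Pattern;

class EscapePitfall {
    static boolean findWord(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\b" is the Java backspace escape (U+0008), so this regex looks for backspace characters.
        System.out.println(findWord("\bcat\b", "cat concatenate"));   // false
        // "\\b" reaches the engine as \b, a word boundary.
        System.out.println(findWord("\\bcat\\b", "cat concatenate")); // true: the standalone "cat"
        System.out.println(findWord("\\bcat\\b", "concatenate"));     // false: "cat" is inside a word
    }
}
```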

🎯 Practice task

Extract structured data from log lines. 25-30 minutes.

  1. Create LogScanner.java. import java.util.regex.Matcher; and import java.util.regex.Pattern;.
  2. Define a sample input array of log lines. Example:
    String[] lines = {
        "2026-05-06 09:00:01 INFO  Login OK in 1450ms",
        "2026-05-06 09:00:03 ERROR Status 502 from /checkout",
        "2026-05-06 09:00:05 INFO  Search OK in 820ms",
        "2026-05-06 09:00:07 ERROR Status 500 from /export"
    };
  3. Build a static final Pattern STATUS = Pattern.compile("Status (\\d{3}) from (/\\w+)");. Loop over the lines and use STATUS.matcher(line).find() to find any line that matches. Print code and path from groups 1 and 2.
  4. Use line.matches("\\d{4}-\\d{2}-\\d{2}.*") to confirm every line begins with an ISO date. (A startsWith check only works for one fixed date; the regex accepts any date-shaped prefix.)
  5. Use String.replaceAll("\\d{1,5}ms", "***ms") to mask all duration values in the lines. Print before and after.
  6. Use a single regex with two capture groups (key and value) to parse key/value input like env=staging timeout=10 retries=3. Walk every match with find() and print each key/value.
  7. Stretch: open the Regex Tester on qa.codes, paste a real Selenium error log into it, and find the pattern that captures every "element not found" CSS selector. Then paste that pattern into your Java code (with the \\ doubling) and confirm it produces the same captures. Crossing between an interactive tester and Java's escape rules is a useful skill on its own.
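If you get stuck on step 6, here is one possible shape for it; try the step yourself first. It assumes one capture group for the key and one for the value, walked with find() in a loop. The class name KeyValueWalk and the "key -> value" output format are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueWalk {
    // Group 1 = key, group 2 = value
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String pair : pairs("env=staging timeout=10 retries=3")) {
            System.out.println(pair);
        }
        // env -> staging
        // timeout -> 10
        // retries -> 3
    }
}
```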

You can now extract and rewrite text by shape, not just by literal. Lesson 3 introduces lambdas, the syntax behind the collection processing covered in lesson 4.
