Practical Regex Guide: Essential Patterns for Search, Replace, and Validation

From regex basics to practical patterns for work. Covers email, phone, URL, postal code validation, search/replace techniques, and common metacharacters with examples.

What Is Regex and Why Is It Necessary?

Regular expressions (RegEx) are a language for describing string patterns. They enable "searching, extracting, or replacing strings matching specific patterns" in just a few lines of code.

For example, checking whether a string has the shape of a valid email address takes many lines of hand-written character checks in ordinary code, whereas a regex expresses the same check as a single pattern.

Where Regex Excels

  • Validation: Check correct format in input forms (email, phone, postal code)
  • Search/Extract: Extract specific error messages from log files
  • Replace: Bulk replacement in many text files (e.g., hide all phone numbers)
  • Data Cleansing: Batch delete unnecessary spaces and line breaks

Basic Regex Syntax

Metacharacters (Characters with Special Meaning)

Char   Meaning                                   Example
.      Any single character                      a.c → "abc", "a9c", "a c"
^      Line start                                ^Hello → line starts with "Hello"
$      Line end                                  world$ → line ends with "world"
*      0+ occurrences of preceding char          ab*c → "ac", "abc", "abbc"
+      1+ occurrences of preceding char          ab+c → "abc", "abbc" (not "ac")
?      0 or 1 occurrence of preceding char       ab?c → "ac", "abc"
|      OR (either)                               cat|dog → "cat" or "dog"
()     Grouping                                  (ab)+ → "ab", "abab", "ababab"
[]     Character class (any one char)            [abc] → "a", "b", "c"
[^]    Negated character class (except these)    [^abc] → anything but "a", "b", "c"
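
The rows of the table can be checked directly in code. The sketch below is a minimal illustration using Python's built-in re module. The choice of Python and the helper name check are assumptions made for this example, while the patterns come from the table.

```python
import re

def check(pattern, text):
    """Return True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, strings that should match, strings that should not)
cases = [
    (r"a.c",     ["abc", "a9c", "a c"], ["ac"]),
    (r"ab*c",    ["ac", "abc", "abbc"], ["adc"]),
    (r"ab+c",    ["abc", "abbc"],       ["ac"]),
    (r"ab?c",    ["ac", "abc"],         ["abbc"]),
    (r"cat|dog", ["cat", "dog"],        ["cow"]),
    (r"(ab)+",   ["ab", "abab"],        ["aba"]),
    (r"[abc]",   ["a", "b", "c"],       ["d"]),
    (r"[^abc]",  ["d", "1"],            ["a"]),
]

for pattern, good, bad in cases:
    assert all(check(pattern, s) for s in good)
    assert not any(check(pattern, s) for s in bad)

# ^ and $ anchor a search to the start and end of the text.
starts_with_hello = re.search(r"^Hello", "Hello world") is not None  # True
ends_with_world = re.search(r"world$", "Hello world") is not None    # True
```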

Special Characters Requiring Escape

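Escape these characters with \ to match them literally:

\ ^ $ . * + ? ( ) [ ] { } |

The slash / needs escaping only where it delimits the pattern, as in JavaScript /.../ literals or sed.

Example: To search for $100, write \$100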

Common Character Classes & Shorthands

Pattern   Meaning                              Equivalent
\d        Digit                                [0-9]
\D        Non-digit                            [^0-9]
\w        Word character (alphanumeric+_)      [a-zA-Z0-9_]
\W        Non-word character                   [^a-zA-Z0-9_]
\s        Whitespace (space/tab/newline)       [ \t\n\r\f\v]
\S        Non-whitespace                       [^ \t\n\r\f\v]

Example: "3-digit number" → \d{3}
Example: "4 to 8 word characters (alphanumeric or underscore)" → \w{4,8}

Practical Regex Pattern Collection

Email Address Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Explanation:

  • ^...$: Match entire string
  • [a-zA-Z0-9._%+-]+: Local part (before @)
  • @: At sign
  • [a-zA-Z0-9.-]+: Domain name
  • \.[a-zA-Z]{2,}: Top-level domain (.com, .jp, etc.)
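
As a minimal sketch, the pattern can be wrapped in a small validation helper. Python and the names EMAIL_RE and is_valid_email are assumptions for this example. The pattern checks only the general shape of an address; it does not implement the full email address grammar and does not confirm that the mailbox exists.

```python
import re

# The anchors are omitted because fullmatch already requires
# the whole string to match.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def is_valid_email(text: str) -> bool:
    """Return True when text has the shape local@domain.tld."""
    return EMAIL_RE.fullmatch(text) is not None

print(is_valid_email("user@example.com"))               # True
print(is_valid_email("user.name+tag@mail.example.jp"))  # True
print(is_valid_email("user@example"))                   # False (no top-level domain)
print(is_valid_email("user@@example.com"))              # False
```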

Japanese Phone Number (With/Without Hyphens)

^0\d{1,4}-?\d{1,4}-?\d{4}$

Explanation:

  • ^0: Starts with 0
  • \d{1,4}: Area code (1-4 digits)
  • -?: Hyphen 0 or 1 time
  • \d{1,4}-?\d{4}: Local exchange and subscriber number

Example: Matches 03-1234-5678, 090-1234-5678, 0312345678
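
The same check as a short Python sketch (the names PHONE_RE and is_phone_number are hypothetical). Note that the pattern tests only the overall shape: it does not verify that the area code or digit grouping corresponds to a real number.

```python
import re

PHONE_RE = re.compile(r"0\d{1,4}-?\d{1,4}-?\d{4}")

def is_phone_number(text: str) -> bool:
    """Return True when the whole string looks like a Japanese phone number."""
    return PHONE_RE.fullmatch(text) is not None

for sample in ["03-1234-5678", "090-1234-5678", "0312345678"]:
    print(sample, is_phone_number(sample))  # True for all three

print(is_phone_number("1234-5678"))    # False (does not start with 0)
print(is_phone_number("03-1234-567"))  # False (last block needs 4 digits)
```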

URL Extraction

https?:\/\/[\w\/:%#\$&\?\(\)~\.=\+\-]+

Explanation:

  • https?: http or https
  • :\/\/: :// (escaped slashes)
  • [\w\/:%#\$&\?\(\)~\.=\+\-]+: URL-allowed characters
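
A minimal extraction sketch in Python, using the pattern as written above (the sample text and variable names are invented for illustration). Because the character class includes the period, a URL at the end of a sentence picks up the trailing punctuation, so the example strips it afterwards.

```python
import re

URL_RE = re.compile(r"https?:\/\/[\w\/:%#\$&\?\(\)~\.=\+\-]+")

text = "Docs: https://example.com/guide?id=1 and http://example.org/a_b."

urls = URL_RE.findall(text)
print(urls)
# ['https://example.com/guide?id=1', 'http://example.org/a_b.']

# Strip sentence punctuation that the character class also accepts.
cleaned = [u.rstrip(".,") for u in urls]
print(cleaned)
# ['https://example.com/guide?id=1', 'http://example.org/a_b']
```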

Japanese Postal Code (〒123-4567)

^\d{3}-\d{4}$

Explanation:

  • ^\d{3}: Starts with 3 digits
  • -: Hyphen
  • \d{4}$: Ends with 4 digits

Date (YYYY-MM-DD Format)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Explanation:

  • \d{4}: Year (4 digits)
  • (0[1-9]|1[0-2]): Month (01-12)
  • (0[1-9]|[12]\d|3[01]): Day (01-31)
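
The pattern checks the format only, so it accepts impossible dates such as 2024-02-31. One way to close that gap, sketched here in Python, is to run the regex first and then let datetime.date confirm the calendar date. The helper name is_valid_date and the extra capture group around the year are assumptions added for this example.

```python
import re
from datetime import date

# Same pattern as above, with the year captured as a group as well.
DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def is_valid_date(text: str) -> bool:
    """Check the YYYY-MM-DD shape, then check the date exists."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. Feb 31
    except ValueError:
        return False
    return True

print(is_valid_date("2024-02-29"))  # True  (leap year)
print(is_valid_date("2023-02-29"))  # False (not a leap year)
print(is_valid_date("2024-02-31"))  # False (regex passes, calendar fails)
print(is_valid_date("2024-13-01"))  # False (month out of range)
```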

Practical Search & Replace Techniques

Case 1: Replace All Phone Numbers with "***-****-****"

Search: 0\d{1,4}-?\d{1,4}-?\d{4}
Replace: ***-****-****
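
In code, this replacement might look like the following Python sketch (the sample sentence is invented). Since the search pattern has no anchors, it can also match inside longer digit sequences such as order numbers, so it is worth reviewing the matches on real data.

```python
import re

PHONE_RE = re.compile(r"0\d{1,4}-?\d{1,4}-?\d{4}")

text = "Call 03-1234-5678 or 090-1234-5678."
masked = PHONE_RE.sub("***-****-****", text)

print(masked)
# Call ***-****-**** or ***-****-****.
```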

Case 2: Remove All HTML Tags

Search: <[^>]+>
Replace: (empty string)

Explanation: <[^>]+> means "starts with <, followed by 1+ non-> chars, ends with >".
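
A minimal Python sketch of this replacement (the sample markup is invented). It works for simple, well-formed markup; attribute values containing >, comments, and script contents can trip it up, so a real HTML parser is the safer choice for untrusted input.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")

html = '<p>Hello, <a href="https://example.com">world</a>!</p>'
plain = TAG_RE.sub("", html)

print(plain)
# Hello, world!
```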

Case 3: Remove All Leading Spaces

Search: ^\s+
Replace: (empty string)

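Explanation: ^ is line start, \s+ is 1+ whitespace. To strip every line rather than only the start of the text, enable multiline mode (the m flag) where the tool does not already treat ^ per line. Because \s also matches newlines, use [ \t]+ instead if blank lines must be preserved.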
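
The effect of the multiline flag is easy to miss, so here is a small Python sketch (sample strings invented). Without re.MULTILINE, ^ matches only at the start of the whole text. With it, \s+ can also swallow blank lines, which the [ \t]+ variant avoids.

```python
import re

text = "  first\n\tsecond\nthird"

# Without MULTILINE, only the very start of the text is stripped.
start_only = re.sub(r"^\s+", "", text)
print(repr(start_only))  # 'first\n\tsecond\nthird'

# With MULTILINE, ^ matches at the start of every line.
per_line = re.sub(r"^\s+", "", text, flags=re.MULTILINE)
print(repr(per_line))    # 'first\nsecond\nthird'

# \s matches newlines too, so blank lines disappear...
collapsed = re.sub(r"^\s+", "", "a\n\n  b", flags=re.MULTILINE)
print(repr(collapsed))   # 'a\nb'

# ...unless the class is limited to spaces and tabs.
kept = re.sub(r"^[ \t]+", "", "a\n\n  b", flags=re.MULTILINE)
print(repr(kept))        # 'a\n\nb'
```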

Case 4: Reorder Using Capture Groups

Original: 田中太郎(Tanaka Taro)
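Search: (.+)\((.+)\)
Replace: $2 - $1
Result: Tanaka Taro - 田中太郎

Explanation: Groups captured by () can be referenced as $1, $2 in the replacement (some tools, such as Python and sed, write \1, \2 instead). The literal parentheses in the text must be escaped as \( and \).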
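
The same reordering as a Python sketch. Python's re.sub refers to groups as \1 and \2 rather than $1 and $2, and the literal parentheses in the text are escaped as \( and \) so that they are not treated as grouping.

```python
import re

original = "田中太郎(Tanaka Taro)"

# Group 1: text before "(", group 2: text inside the parentheses.
result = re.sub(r"(.+)\((.+)\)", r"\2 - \1", original)

print(result)
# Tanaka Taro - 田中太郎
```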


Common Mistakes and Debugging

Mistake 1: Forgetting to Escape

. and * are interpreted as metacharacters unless they are escaped.
Wrong: file*.txt → "fil", then 0+ "e", then any one character, then "txt" (matches "filXtxt", but not "file*.txt")
Correct: file\*\.txt → the literal string "file*.txt"
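
When the literal text comes from a variable, escaping by hand is error-prone. In Python, re.escape does it for you. The sketch below (variable names invented) compares the unescaped and escaped patterns.

```python
import re

name = "file*.txt"

unescaped = re.compile(r"file*.txt")     # * and . act as metacharacters
escaped = re.compile(re.escape(name))    # every special character is escaped

print(bool(unescaped.fullmatch("fileeXtxt")))  # True  (unintended match)
print(bool(unescaped.fullmatch("file*.txt")))  # False (the literal name fails)
print(bool(escaped.fullmatch("file*.txt")))    # True
print(bool(escaped.fullmatch("fileeXtxt")))    # False
```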

Mistake 2: Greedy Matching

.* is greedy: it takes the longest possible match, so it can run past the intended range.
Example: <div>Hello</div><div>World</div> with <div>.*</div>
→ Matches entire <div>Hello</div><div>World</div> (unintended)

Solution: Use non-greedy .*?
<div>.*?</div> → Matches <div>Hello</div> and <div>World</div> separately
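
The difference is easy to see with findall, as in this Python sketch of the example above:

```python
import re

html = "<div>Hello</div><div>World</div>"

greedy = re.findall(r"<div>.*</div>", html)
lazy = re.findall(r"<div>.*?</div>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>Hello</div>', '<div>World</div>']
```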

Mistake 3: Unable to Match Across Newlines

By default, . doesn't match newlines.
Solution: Enable s flag (dotall) or use [\s\S].
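
Both solutions in a short Python sketch, where the s flag is spelled re.DOTALL (the sample text is invented):

```python
import re

text = "<p>line one\nline two</p>"

default = re.search(r"<p>.*</p>", text)
dotall = re.search(r"<p>.*</p>", text, flags=re.DOTALL)
any_char = re.search(r"<p>[\s\S]*</p>", text)

print(default is None)     # True: . stops at the newline
print(dotall is not None)  # True: . now matches newlines too
print(any_char.group(0) == text)  # True: [\s\S] matches any character
```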

FAQ: Regex Questions

Q1. Which languages/tools support regex?

A. Nearly all programming languages (JavaScript, Python, Java, PHP, Ruby, etc.), text editors (VS Code, Sublime Text, Vim), and command-line tools (grep, sed, awk). However, detailed syntax and features (lookahead, lookbehind, etc.) may vary by language.

Q2. Regex is too complex to read...

A. Regex often becomes "write-once code." Maintain readability by:

  • Adding comments (some engines, such as PCRE and Python, support (?#comment) and a verbose x flag that allows whitespace and # comments)
  • Splitting into variables and combining
  • Testing incrementally with regex tester tools (for example, regex101.com and regexr.com)
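
As an illustration of the first two tips, here is a sketch in Python that rewrites the date pattern from earlier in this guide in two ways: once with the verbose flag and inline comments, and once assembled from named parts. The variable names are assumptions for this example.

```python
import re

# Option 1: verbose mode ignores whitespace and allows # comments.
DATE_VERBOSE = re.compile(
    r"""
    ^\d{4}                   # year: four digits
    -(0[1-9]|1[0-2])         # month: 01-12
    -(0[1-9]|[12]\d|3[01])$  # day: 01-31
    """,
    re.VERBOSE,
)

# Option 2: name each part, then combine.
YEAR = r"\d{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12]\d|3[01])"
DATE_FROM_PARTS = re.compile(rf"^{YEAR}-{MONTH}-{DAY}$")

for pattern in (DATE_VERBOSE, DATE_FROM_PARTS):
    print(bool(pattern.match("2024-12-31")))  # True
    print(bool(pattern.match("2024-00-10")))  # False
```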

Q3. How to specify "full-width katakana only"?

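A. Use ^[ァ-ヶー]+$. In engines with Unicode property support, a script property also works: ^\p{Script=Katakana}+$ in JavaScript (requires the u flag), or ^\p{Katakana}+$ in engines such as PCRE and Ruby. The long vowel mark ー may be classified outside the Katakana script in some engines, so test words such as コーヒー explicitly.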
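
A minimal Python sketch of the range-based pattern (the names KATAKANA_RE and is_katakana are hypothetical). Python's built-in re module does not support \p{...} properties, so the explicit range is the practical choice there.

```python
import re

# ァ-ヶ covers full-width katakana; ー is the long vowel mark.
KATAKANA_RE = re.compile(r"[ァ-ヶー]+")

def is_katakana(text: str) -> bool:
    """Return True when text consists only of full-width katakana."""
    return KATAKANA_RE.fullmatch(text) is not None

print(is_katakana("カタカナ"))  # True
print(is_katakana("コーヒー"))  # True
print(is_katakana("ひらがな"))  # False (hiragana)
print(is_katakana("ｶﾀｶﾅ"))      # False (half-width katakana)
```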

Summary: Practice Makes Perfect with Regex

Regex looks cryptic at first, but it becomes manageable once you memorize the basic patterns and use them regularly. Follow these learning steps:

  1. Memorize basic metacharacters: . ^ $ * + ? | ( ) [ ]
  2. Imitate common patterns: email, phone, URL
  3. Test with regex tester: Verify actual matches
  4. Use in real work: Practice with search, replace, validation
  5. Split complex patterns: Maintain readability

Mastering regex dramatically streamlines text processing. Start using it today, little by little.
