Sherlock.PII

App Version Implemented
v1.5
Details

This is the PII module. Its goal is to redact SSN, email, and date-of-birth (DOB) strings found within files.

Notes

Priority
Medium
Status
Done

Social Security Numbers (SSN)

  • Regex Pattern: r'\b\d{3}-\d{2}-\d{4}\b'
  • How it works:
    • \b: This is a "word boundary." It ensures that we are matching a whole word or number, preventing it from finding an SSN inside a longer string of digits.
    • \d{3}: Looks for exactly three digits (\d means any digit, {3} means three of them).
    • -: Matches the literal hyphen character.
    • \d{2}: Looks for exactly two digits.
    • -: Matches the second hyphen.
    • \d{4}: Looks for exactly four digits.
    • \b: Another word boundary to mark the end.

This pattern specifically targets the classic XXX-XX-XXXX format for Social Security Numbers.
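As a minimal sketch of how this pattern could be applied, the snippet below substitutes a placeholder for each match. The helper name redact_ssns and the placeholder text are assumptions made for illustration; they are not taken from the Sherlock.PII implementation.

```python
import re

# Pattern copied verbatim from the section above.
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')


def redact_ssns(text: str, placeholder: str = "[REDACTED SSN]") -> str:
    """Replace every XXX-XX-XXXX match with a placeholder (hypothetical helper)."""
    return SSN_PATTERN.sub(placeholder, text)


sample = "Employee SSN: 123-45-6789, badge 9123-45-67890."
print(redact_ssns(sample))
# prints: Employee SSN: [REDACTED SSN], badge 9123-45-67890.
```

The badge number is left alone because the word boundaries fail when a digit sits directly before or after the candidate. A hyphen is not a word character, though, so a longer hyphenated number such as 123-45-6789-0000 would still have its first eleven characters matched.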

Dates of Birth (DOB)

  • Regex Pattern: r'\b(0[1-9]|1[0-2])[-/](0[1-9]|[12]\d|3[01])[-/](\d{4}|\d{2})\b'
  • How it works:
    • (0[1-9]|1[0-2]): This part looks for the month. It matches a 0 followed by a digit from 1 to 9 (for 01 to 09) OR a 1 followed by a digit from 0 to 2 (for 10, 11, 12).
    • [-/]: Matches either a hyphen or a forward slash as the separator.
    • (0[1-9]|[12]\d|3[01]): This handles the day. It looks for a 0 followed by 1-9 OR a 1 or 2 followed by any digit OR a 3 followed by a 0 or 1. This covers days from 01 to 31.
    • [-/]: Matches the second separator.
    • (\d{4}|\d{2}): This finds the year, matching either a four-digit year or a two-digit year.

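This regex is flexible enough to find dates like MM-DD-YYYY, MM/DD/YYYY, MM-DD-YY, and MM/DD/YY. It matches any date in these formats, not only dates of birth, and it does not check that the day is valid for the month (02/31/2000 still matches).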
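The following sketch shows what the date pattern does and does not match. The helper name find_dates and the sample text are illustrative assumptions. Because the pattern contains three capture groups, the sketch uses finditer with group(0) to get the full matched text; re.findall would return tuples of the groups instead.

```python
import re
from typing import List

# Pattern copied verbatim from the section above.
DOB_PATTERN = re.compile(
    r'\b(0[1-9]|1[0-2])[-/](0[1-9]|[12]\d|3[01])[-/](\d{4}|\d{2})\b'
)


def find_dates(text: str) -> List[str]:
    """Return the full text of every date match (hypothetical helper)."""
    return [m.group(0) for m in DOB_PATTERN.finditer(text)]


sample = "DOB 04/17/1985, renewed 12-01-23, invalid 13/01/2000, odd 02-30/1999"
print(find_dates(sample))
# prints: ['04/17/1985', '12-01-23', '02-30/1999']
```

Month 13 is rejected, but 02-30/1999 is accepted: each separator is matched independently, so mixed separators pass, and the day is only checked against the range 01 to 31.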

Emails

  • Regex Pattern: r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
  • How it works:
    • [A-Za-z0-9._%+-]+: This matches the username part of the email. It allows one or more (+) uppercase letters, lowercase letters, numbers, and the special characters ._%+-.
    • @: Matches the literal "@" symbol.
    • [A-Za-z0-9.-]+: This matches the domain name (like "gmail" or "yahoo"). It allows one or more letters, numbers, dots, and hyphens.
    • \.: Matches the literal dot before the top-level domain (like ".com").
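    • [A-Z|a-z]{2,}: This looks for the top-level domain (TLD). It requires at least two ({2,}) uppercase or lowercase letters. Note that the | inside the brackets is a literal pipe character, not an "or", so this class also accepts |; [A-Za-z]{2,} is the stricter form.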

This pattern finds most common email address formats, though it does not cover every address permitted by the email standards.
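The sketch below exercises the email pattern on a few inputs. The helper find_emails and the STRICT_EMAIL_PATTERN variant are assumptions added for comparison; only EMAIL_PATTERN comes from this page. Inside a character class, | is a literal pipe and not alternation, so the TLD class [A-Z|a-z] also accepts the | character; the variant drops it.

```python
import re
from typing import List, Pattern

# Pattern copied verbatim from the section above.
EMAIL_PATTERN = re.compile(
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
)

# Hypothetical stricter variant: same pattern without the literal pipe
# in the TLD character class.
STRICT_EMAIL_PATTERN = re.compile(
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
)


def find_emails(text: str, pattern: Pattern[str] = EMAIL_PATTERN) -> List[str]:
    """Return every email-like match (hypothetical helper).

    The patterns have no capture groups, so findall returns full matches.
    """
    return pattern.findall(text)


print(find_emails("Contact jane.doe+hr@mail.example.com or bob@example.org."))
# prints: ['jane.doe+hr@mail.example.com', 'bob@example.org']

print(find_emails("user@example.c|m"))
# prints: ['user@example.c|m']

print(find_emails("user@example.c|m", STRICT_EMAIL_PATTERN))
# prints: []
```

The trailing period after bob@example.org is excluded because the TLD must end on a word boundary after at least two characters.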

© 2026 Redactinator. All rights reserved.