pynydus.security.presidio

PII redaction using Microsoft Presidio + custom recognizers.

Presidio handles PII only: names, emails, phone numbers, SSNs, credit cards, addresses, etc. Secret/credential detection is handled by gitleaks (see pynydus.security.gitleaks).

Required dependencies:

  • presidio-analyzer >= 2.2

  • presidio-anonymizer >= 2.2

  • spaCy model: en_core_web_lg

Module Contents

Classes

PIIReplacement

A single PII replacement.

RedactionResult

Result of PII redaction on a text.

PIIRedactor

Stateful PII redactor with stable {{PII_NNN}} placeholders.

Functions

_build_ssn_recognizer

Detect US Social Security Numbers (Presidio’s built-in recognizer sometimes misses them).

_build_us_passport_recognizer

Detect US passport numbers.

_build_drivers_license_recognizer

Detect common US driver’s license patterns.

_create_analyzer

Create a Presidio AnalyzerEngine with PII-focused custom recognizers.

_get_analyzer

Return the singleton AnalyzerEngine, creating it on first call.

_resolve_overlaps

Remove overlapping detections, keeping the highest-scoring / longest.

Data

API

pynydus.security.presidio._build_ssn_recognizer() presidio_analyzer.PatternRecognizer

Detect US Social Security Numbers (Presidio’s built-in recognizer sometimes misses them).
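To illustrate the kind of pattern such a recognizer might wrap, here is a minimal sketch using only the standard re module. The regex and the helper name find_ssn_candidates are assumptions made for illustration; the module's actual pattern and scores are not shown in this reference.

```python
import re

# Hypothetical SSN-shaped pattern (not the module's actual regex):
# three digits, two digits, four digits, separated by hyphens or spaces,
# excluding area numbers 000, 666 and 9xx, group 00, and serial 0000.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}[- ](?!00)\d{2}[- ](?!0000)\d{4}\b"
)


def find_ssn_candidates(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each SSN-shaped span in text."""
    return [(m.start(), m.end(), m.group()) for m in SSN_PATTERN.finditer(text)]
```

In a Presidio-based recognizer, a regex like this would typically be paired with a confidence score, so that weak matches can be filtered by the redactor's score threshold.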

pynydus.security.presidio._build_us_passport_recognizer() presidio_analyzer.PatternRecognizer

Detect US passport numbers.

pynydus.security.presidio._build_drivers_license_recognizer() presidio_analyzer.PatternRecognizer

Detect common US driver’s license patterns.

pynydus.security.presidio._create_analyzer() presidio_analyzer.AnalyzerEngine

Create a Presidio AnalyzerEngine with PII-focused custom recognizers.

pynydus.security.presidio._analyzer: presidio_analyzer.AnalyzerEngine | None

None

pynydus.security.presidio._get_analyzer() presidio_analyzer.AnalyzerEngine

Return the singleton AnalyzerEngine, creating it on first call.

pynydus.security.presidio._DEFAULT_SCORE_THRESHOLD

0.4

pynydus.security.presidio._SUPPRESSED_ENTITIES

None

class pynydus.security.presidio.PIIReplacement

A single PII replacement.

original: str

None

pii_type: str

None

placeholder: str

None

start: int

None

end: int

None

class pynydus.security.presidio.RedactionResult

Result of PII redaction on a text.

redacted_text: str

None

replacements: list[pynydus.security.presidio.PIIReplacement]

‘field(…)’

class pynydus.security.presidio.PIIRedactor(start_index: int = 1, score_threshold: float = _DEFAULT_SCORE_THRESHOLD)

Stateful PII redactor with stable {{PII_NNN}} placeholders.

Uses Microsoft Presidio plus custom recognizers. The same surface value maps to the same placeholder across calls on one instance.

Initialization

Create a redactor.

Args: start_index: First placeholder index (default 1, which yields {{PII_001}}). score_threshold: Minimum Presidio confidence to accept a span.

redact(text: str) pynydus.security.presidio.RedactionResult

Redact PII spans; repeated values reuse the same placeholder.

Args: text: Input UTF-8 text.

Returns: Redacted text and per-span replacement metadata.

_get_placeholder(value: str) str

Return existing or allocate next {{PII_NNN}} for value.

property counter: int

Next placeholder index after the last assignment.

property mapping: dict[str, str]

Return the original → placeholder mapping.
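The stable-placeholder behaviour can be sketched without Presidio by swapping in a simple email regex for detection. SketchRedactor and EMAIL_RE are hypothetical names for illustration only; the real PIIRedactor uses Presidio's analyzer and returns a RedactionResult rather than a plain string.

```python
import re

# Assumption: a simple email regex stands in for Presidio detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class SketchRedactor:
    """Hypothetical redactor that keeps one placeholder per distinct value."""

    def __init__(self, start_index: int = 1) -> None:
        self._counter = start_index          # next index to hand out
        self._mapping: dict[str, str] = {}   # original value -> placeholder

    def _get_placeholder(self, value: str) -> str:
        """Return the existing placeholder for value, or allocate the next."""
        if value not in self._mapping:
            self._mapping[value] = "{{PII_%03d}}" % self._counter
            self._counter += 1
        return self._mapping[value]

    def redact(self, text: str) -> str:
        """Replace each detected span with its stable placeholder."""
        return EMAIL_RE.sub(lambda m: self._get_placeholder(m.group()), text)

    @property
    def counter(self) -> int:
        return self._counter

    @property
    def mapping(self) -> dict[str, str]:
        return dict(self._mapping)
```

Because the mapping lives on the instance, a value seen in one call gets the same placeholder in later calls on that instance, while a fresh instance starts numbering again from start_index.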

pynydus.security.presidio._resolve_overlaps(results: list[presidio_analyzer.RecognizerResult]) list[presidio_analyzer.RecognizerResult]

Remove overlapping detections, keeping the highest-scoring / longest.
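One way to implement this kind of overlap resolution is a greedy pass over detections ranked by score, then by length. The sketch below uses a hypothetical Span dataclass in place of presidio_analyzer.RecognizerResult, and the exact tie-breaking order is an assumption rather than the module's confirmed behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detection: [start, end) with a score."""

    start: int
    end: int
    score: float


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep non-overlapping spans, preferring higher score, then longer length."""
    # Rank best-first: highest score, then longest, then earliest start.
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    kept: list[Span] = []
    for cand in ranked:
        # Accept a candidate only if it is disjoint from every kept span.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s.start)
```

Resolving overlaps before substitution matters because replacing two overlapping spans independently would corrupt the character offsets of the second replacement.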