pynydus.security.presidio¶
PII redaction using Microsoft Presidio + custom recognizers.
Presidio handles PII only: names, emails, phone numbers, SSNs, credit
cards, addresses, etc. Secret/credential detection is handled by gitleaks
(see pynydus.security.gitleaks).
Required dependencies:
presidio-analyzer >= 2.2
presidio-anonymizer >= 2.2
spaCy model: en_core_web_lg
Module Contents¶
Classes¶
| PIIReplacement | A single PII replacement. |
| RedactionResult | Result of PII redaction on a text. |
| PIIRedactor | Stateful PII redactor with stable {{PII_NNN}} placeholders. |
Functions¶
| _build_ssn_recognizer | Detect US Social Security Numbers (Presidio’s built-in recognizer sometimes misses them). |
| _build_us_passport_recognizer | Detect US passport numbers. |
| _build_drivers_license_recognizer | Detect common US driver’s license patterns. |
| _create_analyzer | Create a Presidio AnalyzerEngine with PII-focused custom recognizers. |
| _get_analyzer | Return the singleton AnalyzerEngine, creating it on first call. |
Remove overlapping detections, keeping the highest-scoring / longest. |
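The overlap-removal step summarized above can be illustrated with a small greedy pass. This is a minimal sketch under stated assumptions, not the module's implementation: the `Span` record and `remove_overlaps` name are hypothetical, and the tie-break order (score first, then length, then start offset) is one plausible reading of "highest-scoring / longest".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical detection record; not the module's own type."""
    start: int
    end: int  # exclusive
    score: float
    entity_type: str


def remove_overlaps(spans: list) -> list:
    """Greedy pass: prefer higher score, then longer span, then earlier start."""
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    kept = []
    for cand in ranked:
        # Two half-open ranges are disjoint when one ends before the other starts.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return survivors in document order.
    return sorted(kept, key=lambda s: s.start)


detections = [
    Span(0, 10, 0.85, "PERSON"),
    Span(5, 20, 0.60, "LOCATION"),       # overlaps PERSON, lower score
    Span(25, 36, 0.85, "US_SSN"),
    Span(25, 30, 0.85, "PHONE_NUMBER"),  # same score as US_SSN, shorter span
]
result = remove_overlaps(detections)
print([s.entity_type for s in result])
```

Ranking before filtering means a strong detection is never displaced by a weaker one that merely starts earlier in the text.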
Data¶
API¶
- pynydus.security.presidio._build_ssn_recognizer() → presidio_analyzer.PatternRecognizer¶
Detect US Social Security Numbers (Presidio’s built-in recognizer sometimes misses them).
- pynydus.security.presidio._build_us_passport_recognizer() → presidio_analyzer.PatternRecognizer¶
Detect US passport numbers.
- pynydus.security.presidio._build_drivers_license_recognizer() → presidio_analyzer.PatternRecognizer¶
Detect common US driver’s license patterns.
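A pattern-based recognizer pairs a regular expression with a fixed confidence score. The sketch below shows that idea for SSNs using only the standard library. It is an illustration, not the pattern this module ships: `SSN_PATTERN`, `SSN_SCORE`, and `find_ssns` are hypothetical names, and the score value is an assumption.

```python
import re

# Assumed pattern: 3-2-4 digits separated by "-" or " ".
# Rejects area numbers 000, 666, and 900-999, group 00, and serial 0000,
# none of which are issued as SSNs.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d{2})\d{3}[- ](?!00)\d{2}[- ](?!0000)\d{4}\b"
)
SSN_SCORE = 0.85  # hypothetical confidence attached to every regex hit


def find_ssns(text: str) -> list:
    """Return (start, end, score) for each SSN-shaped match in text."""
    return [(m.start(), m.end(), SSN_SCORE) for m in SSN_PATTERN.finditer(text)]


sample = "SSN 123-45-6789 on file; 000-12-3456 is not valid."
hits = find_ssns(sample)
print(hits)
```

In a Presidio-based setup, a regex and score like these would typically be wrapped in a `PatternRecognizer`, so that the matches flow through the same scoring and filtering as the built-in detections.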
- pynydus.security.presidio._create_analyzer() → presidio_analyzer.AnalyzerEngine¶
Create a Presidio AnalyzerEngine with PII-focused custom recognizers.
- pynydus.security.presidio._get_analyzer() → presidio_analyzer.AnalyzerEngine¶
Return the singleton AnalyzerEngine, creating it on first call.
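The create-on-first-call behavior described for the analyzer accessor is a lazy module-level singleton. A minimal sketch of that pattern follows, assuming a stand-in `FakeEngine` class in place of a real analyzer engine. All names here are hypothetical, and the sketch is not thread-safe.

```python
from typing import Optional


class FakeEngine:
    """Stand-in for an object that is expensive to build (e.g. loads a model)."""
    instances = 0

    def __init__(self) -> None:
        FakeEngine.instances += 1


def create_engine() -> FakeEngine:
    """Build a fresh engine; called at most once by get_engine()."""
    return FakeEngine()


_engine: Optional[FakeEngine] = None  # module-level cache


def get_engine() -> FakeEngine:
    """Return the cached engine, building it on the first call only."""
    global _engine
    if _engine is None:
        _engine = create_engine()
    return _engine


first = get_engine()
second = get_engine()
print(first is second, FakeEngine.instances)
```

Deferring construction keeps import time low and pays the model-loading cost only when redaction is actually used.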
- pynydus.security.presidio._DEFAULT_SCORE_THRESHOLD¶
0.4
- pynydus.security.presidio._SUPPRESSED_ENTITIES¶
None
- class pynydus.security.presidio.PIIReplacement¶
A single PII replacement.
- class pynydus.security.presidio.RedactionResult¶
Result of PII redaction on a text.
- replacements: list[pynydus.security.presidio.PIIReplacement]¶
‘field(…)’
- class pynydus.security.presidio.PIIRedactor(start_index: int = 1, score_threshold: float = _DEFAULT_SCORE_THRESHOLD)¶
Stateful PII redactor with stable {{PII_NNN}} placeholders.
Uses Microsoft Presidio plus custom recognizers. The same surface value maps to the same placeholder across calls on one instance.
Initialization
Create a redactor.
Args:
start_index: First placeholder index (default 1 → {{PII_001}}).
score_threshold: Minimum Presidio confidence to accept a span.
- redact(text: str) → pynydus.security.presidio.RedactionResult¶
Redact PII spans. Repeated values reuse the same placeholder.
Args:
text: Input UTF-8 text.
Returns: Redacted text and per-span replacement metadata.
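The stable-placeholder behavior can be sketched without Presidio by swapping in a simple email regex as the detector. Everything below is an assumption made for illustration: `SketchRedactor`, `Replacement`, `Result`, and their field names are hypothetical and do not describe the real `PIIRedactor`, `PIIReplacement`, or `RedactionResult` internals.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Replacement:
    """Hypothetical per-span record."""
    placeholder: str
    original: str
    start: int
    end: int


@dataclass
class Result:
    """Hypothetical result: redacted text plus replacement metadata."""
    text: str
    replacements: list = field(default_factory=list)


class SketchRedactor:
    # Simplified email detector standing in for Presidio's analysis step.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def __init__(self, start_index: int = 1) -> None:
        self._next = start_index
        self._seen = {}  # surface value -> placeholder, kept across calls

    def _placeholder_for(self, value: str) -> str:
        if value not in self._seen:
            self._seen[value] = "{{PII_%03d}}" % self._next
            self._next += 1
        return self._seen[value]

    def redact(self, text: str) -> Result:
        parts, replacements, cursor = [], [], 0
        for m in self.EMAIL.finditer(text):
            placeholder = self._placeholder_for(m.group())
            parts.append(text[cursor:m.start()])
            parts.append(placeholder)
            replacements.append(
                Replacement(placeholder, m.group(), m.start(), m.end())
            )
            cursor = m.end()
        parts.append(text[cursor:])
        return Result("".join(parts), replacements)


redactor = SketchRedactor()
first = redactor.redact("Mail ann@example.com or bob@example.org.")
second = redactor.redact("Reply to ann@example.com.")
print(first.text)
print(second.text)
```

Because the value-to-placeholder map lives on the instance, the second call reuses the placeholder assigned to the same address in the first call, while a new instance would start numbering again from `start_index`.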