Detect personally identifiable information (PII) in text.
Finds emails, phone numbers, SSNs, credit cards, IP addresses, and
person names. Optionally returns redacted text with PII replaced by
type labels (e.g. [EMAIL], [PHONE]). Detection uses an ensemble of a
BERT-based NER model and regular expressions.

Args:
    text: Text to scan for personally identifiable information.
    redact: If true, return redacted text with PII replaced by [TYPE].

Returns:
    dict with keys:
    - pii_found (list): Detected PII items, each containing:
        - text (str): The PII value found
        - type (str): PII type (EMAIL, PHONE, SSN, CREDIT_CARD, IP, PERSON)
        - start (int): Character offset start
        - end (int): Character offset end
        - score (float 0-1): Detection confidence
    - count (int): Total PII items found
    - redacted_text (str|null): Text with PII replaced by type labels
      when redact=true, otherwise null
    - has_pii (bool): Whether any PII was detected
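
Example:
    The return shape and the redact flag described above can be illustrated
    with a minimal, regex-only sketch. It is not the actual implementation:
    the name detect_pii_sketch, the three patterns, the fixed score of 1.0,
    and the exclusive end offset are all assumptions made for illustration,
    and the BERT-NER half of the ensemble (needed for PERSON) is left out.

```python
import re

# Hypothetical patterns for illustration; the real detector's rules are not
# documented here and are likely stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii_sketch(text, redact=False):
    """Regex-only stand-in that returns a dict in the documented shape."""
    items = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            items.append({
                "text": match.group(),
                "type": pii_type,
                "start": match.start(),
                "end": match.end(),  # assumption: end offset is exclusive
                "score": 1.0,        # assumption: placeholder confidence
            })
    # Report items in document order. Overlapping matches are not resolved
    # in this sketch.
    items.sort(key=lambda item: item["start"])

    redacted_text = None
    if redact:
        redacted_text = text
        # Replace from right to left so earlier offsets stay valid.
        for item in reversed(items):
            label = f"[{item['type']}]"
            redacted_text = (
                redacted_text[:item["start"]] + label + redacted_text[item["end"]:]
            )

    return {
        "pii_found": items,
        "count": len(items),
        "redacted_text": redacted_text,
        "has_pii": bool(items),
    }


result = detect_pii_sketch("Mail jane@example.com from 10.0.0.1", redact=True)
print(result["redacted_text"])  # Mail [EMAIL] from [IP]
```

    With redact left at false, the sketch returns redacted_text as None,
    which corresponds to null in the documented return value.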