Detect and MASK personally identifiable information in a document (PDF or image).
USE THIS WHEN you need to know what PII a document contains, or to get a redacted copy before
forwarding / logging / passing it to another model. Two layers: a deterministic regex+checksum
pass for structured identifiers (emails, payment cards, SSN, PAN, ABN) and a vision model for
the unstructured PII — names, addresses, dates of birth, phone numbers, and photo/signature
presence.
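
As an illustration of what a regex+checksum pass can look like for one identifier type (payment cards), here is a minimal sketch. It is not this tool's implementation: the pattern, the function names `luhn_valid`, `mask` and `find_cards`, and the last-four masking style are all assumptions chosen for the example. The Luhn checksum itself is the standard check used for card numbers.

```python
import re

# Assumed candidate pattern: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask(digits: str) -> str:
    """Hypothetical masking style: keep only the last four digits visible."""
    return "*" * (len(digits) - 4) + digits[-4:]


def find_cards(text: str) -> list[str]:
    """Return masked previews of checksum-valid card candidates in text."""
    found = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):  # the checksum filters out random digit runs
            found.append(mask(digits))
    return found
```

The checksum step is what makes the pass deterministic and keeps false positives down: a 16-digit order number that fails Luhn is not reported as a card.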
Provide the document exactly ONE way: `url` (a public http(s) link, fetched server-side) or
`bytes_b64` (inline base64, plus `filename`). `max_pages` caps how many pages are read (the
default is a few pages; the ceiling is 10).
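
The input rules above can be sketched as a small client-side helper that builds the argument payload. The helper name `build_request` and its `data` parameter are hypothetical; only `url`, `bytes_b64`, `filename` and `max_pages` come from the description. Clamping `max_pages` to the ceiling on the client is a choice made for this sketch, since the description does not say how the server treats larger values.

```python
import base64
from typing import Optional

MAX_PAGES_CEILING = 10  # ceiling stated in the description


def build_request(url: Optional[str] = None,
                  data: Optional[bytes] = None,
                  filename: Optional[str] = None,
                  max_pages: Optional[int] = None) -> dict:
    """Build the tool arguments, enforcing 'exactly one of url / bytes_b64'."""
    if (url is None) == (data is None):
        raise ValueError("provide exactly one of url or data")

    if url is not None:
        if not url.startswith(("http://", "https://")):
            raise ValueError("url must be an http(s) link")
        payload = {"url": url}
    else:
        if not filename:
            raise ValueError("filename is required with inline bytes")
        payload = {
            "bytes_b64": base64.b64encode(data).decode("ascii"),
            "filename": filename,
        }

    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        # Assumption: clamp locally rather than rely on server behaviour.
        payload["max_pages"] = min(max_pages, MAX_PAGES_CEILING)
    return payload
```

Leaving `max_pages` out of the payload when it is not given lets the tool apply its own default.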
Returns `{pii_found, by_type, items[] (type, masked preview, method), redacted_text, has_photo,
has_signature}`. Values are MASKED in the response; the raw PII is never returned. Detection is
BEST-EFFORT, not a guarantee: it may miss PII or over-flag, so review the output before relying
on it for compliance. The document is never stored.
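
A short sketch of how a caller might turn the response into a review checklist. The top-level field names follow the description; the per-item key `type` and the helper name `review_summary` are assumptions, because the description lists the item fields without giving exact keys.

```python
from collections import Counter


def review_summary(response: dict) -> list[str]:
    """Summarise a response into human-readable lines for manual review."""
    lines = []
    if not response.get("pii_found"):
        # Detection is not a guarantee, so say so instead of implying 'clean'.
        lines.append("no PII detected (manual review still advised)")

    # Count items per type; the 'type' key name is an assumption.
    counts = Counter(item["type"] for item in response.get("items", []))
    for pii_type, n in sorted(counts.items()):
        lines.append(f"{pii_type}: {n}")

    if response.get("has_photo"):
        lines.append("photo present")
    if response.get("has_signature"):
        lines.append("signature present")
    return lines
```

Counting from `items` rather than reading `by_type` directly avoids assuming the exact shape of `by_type`.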