phi-redact-mcp
An MCP server that redacts PII/PHI from text before it ever reaches an LLM — self-hosted, fail-closed, and HIPAA-aware.
Teams building LLM and agent pipelines in regulated domains have no clean, drop-in way to strip PHI/PII from a payload before it crosses into a model provider's infrastructure. phi-redact-mcp is that boundary: three MCP tools — redact, restore, detect — that scrub sensitive values into reversible placeholders, run entirely inside infrastructure you control, and block the request if detection is uncertain instead of leaking data.
redact("Patient MRN: 1234567, provider NPI 1234567893, ssn 078-05-1120, john.doe@example.com")
redacted_text (safe to send to the model):
"Patient MRN: [MEDICAL_RECORD_NUMBER_1], provider NPI [NPI_1], ssn [US_SSN_1], [EMAIL_ADDRESS_1]"
token_map (kept local, never sent to the model):
[MEDICAL_RECORD_NUMBER_1] → 1234567
[NPI_1] → 1234567893
[US_SSN_1] → 078-05-1120
[EMAIL_ADDRESS_1] → john.doe@example.com
Send the redacted text to the model; keep the token_map local; call restore afterward to rehydrate the result. Round-trips are byte-exact and proven with property-based tests.
Why this exists
The PHI/PII-redaction MCP niche is real but underserved — the existing options are thin Presidio wrappers with no HIPAA-specific detection and, critically, no guarantee that a detection failure blocks the request instead of silently passing raw data through. So teams either roll their own boundary or ship sensitive data to a provider and lean on a BAA to cover it — the design-time mistake that causes real compliance incidents.
Capability | Naive Presidio wrapper | Regex-in-your-app | Cloud DLP API | phi-redact-mcp |
Drop-in MCP tools | sometimes | ❌ | ❌ | ✅ |
Fail-closed on uncertain detection | ❌ | ❌ | ❌ | ✅ |
HIPAA identifiers (NPI, DEA, MBI, MRN, CLIA) | ❌ | partial | partial | ✅ |
Reversible (restore original) | rarely | DIY | some | ✅ |
Runs self-hosted, zero egress | ✅ | ✅ | ❌ (sends data out) | ✅ |
Works with zero heavy deps | ❌ (needs spaCy) | ✅ | n/a | ✅ (regex engine) |
Optional ML NER (names, addresses) | ✅ | ❌ | ✅ | ✅ ( |
Why it was built: MCP went mainstream fast — it's now first-class in Claude, Cursor, and ChatGPT, across thousands of servers — but the PHI/PII-redaction corner was left to a few unmaintained wrappers. This fills that gap with a single honest, auditable, fail-closed boundary, kept open source so the redaction logic you depend on is fully inspectable rather than a black box.
Features
Three tools, one boundary: redact (→ scrubbed text + reversible token map), restore (→ original), detect (→ entities found, no mutation).
Fail-closed by construction: if detection errors or any detection lands below the confidence threshold, the call returns a typed error. Uncertainty blocks; it never redacts what it can and passes the rest.
HIPAA-aware detection: checksum-validated NPI and DEA, position-typed Medicare MBI, context-anchored MRN, CLIA lab IDs, plus standard PII (email, phone, SSN, credit card, IP, URL).
Zero-egress, self-hosted: the default engine is pure regex + checksums with no network calls and no heavy dependencies. It installs anywhere Python does.
Optional ML upgrade: pip install "phi-redact-mcp[presidio]" adds Microsoft Presidio + spaCy for PERSON/LOCATION NER, transparently.
Reversible & deterministic: collision-proof typed placeholders make restore(redact(x)) == x for arbitrary input; same input + config always yields the same output.
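To make "position-typed Medicare MBI" concrete, here is a minimal sketch of a position-by-position format check. It follows the 11-character MBI layout as published by CMS (to the best of our reading); the names `MBI_RE` and `is_valid_mbi` are hypothetical and are not taken from this project's source.

```python
import re

# Letters allowed in an MBI: A-Z without S, L, O, I, B, Z
# (those are excluded because they resemble digits).
_ALPHA = "AC-HJKMNP-RT-Y"
_ALNUM = "0-9" + _ALPHA

# Position types, left to right:
#   1: digit 1-9      2: letter        3: letter or digit   4: digit
#   5: letter         6: letter/digit  7: digit
#   8: letter         9: letter       10: digit            11: digit
MBI_RE = re.compile(
    rf"[1-9][{_ALPHA}][{_ALNUM}][0-9]"
    rf"[{_ALPHA}][{_ALNUM}][0-9]"
    rf"[{_ALPHA}][{_ALPHA}][0-9][0-9]"
)


def is_valid_mbi(value: str) -> bool:
    """Return True if value has the MBI shape (dashes are display-only)."""
    candidate = value.replace("-", "").upper()
    return MBI_RE.fullmatch(candidate) is not None
```

A format check like this only says a string is shaped like an MBI; it cannot tell whether the identifier was ever issued.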
When to use it (and when not to)
Reach for phi-redact-mcp when:
You send healthcare, clinical, financial, or user-generated text to a third-party LLM API and need PHI/PII kept out of that provider's infrastructure and logs.
You're building an agent or MCP pipeline in a regulated domain and want a drop-in scrubbing boundary you wire in with one tool call.
You need reversible redaction so downstream steps still work: redact → send to model → restore.
You want a self-hosted, no-egress detector you can audit line by line.
You need HIPAA-specific identifiers (NPI, DEA, Medicare MBI, MRN, CLIA), not just names and emails.
Reach for something else when:
You need irreversible de-identification / anonymization (tokenization, k-anonymity) — redaction here is reversible by design.
You need to redact non-text data (images, audio, PDFs, database rows) — scope is text.
You want a certified compliance product — this is one technical control, not a compliance program (see Scope & honest limitations).
You want a transparent proxy that auto-scrubs everything in the request path — v1 is explicit tool calls; proxy mode is on the roadmap.
You require guaranteed 100% recall — no detector, this one included, can promise that.
Quickstart (< 60 seconds)
pip install phi-redact-mcp # zero heavy deps; runs immediately
Then register it with your MCP client.
Claude Desktop / Claude Code (claude_desktop_config.json, or claude mcp add phi-redact -- phi-redact-mcp):
{
  "mcpServers": {
    "phi-redact": {
      "command": "phi-redact-mcp"
    }
  }
}
Cursor (.cursor/mcp.json) and VS Code use the same shape; see examples/ for ready-to-paste configs.
Want name/address detection too?
pip install "phi-redact-mcp[presidio]"
python -m spacy download en_core_web_lg
The server auto-detects Presidio and upgrades; no config change needed. (Set PHI_MCP_ENGINE=regex to force the dependency-free engine, or PHI_MCP_ENGINE=presidio to require the ML one.)
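The selection behaviour described above could look roughly like the sketch below. Only the `PHI_MCP_ENGINE` variable and its `regex` / `presidio` values come from this README; the function name `choose_engine`, its parameters, and the handling of unknown values are assumptions made for illustration, not the server's actual code.

```python
import importlib.util
import os


def choose_engine(env=None, presidio_available=None) -> str:
    """Pick an engine name from PHI_MCP_ENGINE, auto-detecting when unset."""
    env = os.environ if env is None else env
    if presidio_available is None:
        # presidio_analyzer is the import name of Microsoft Presidio's analyzer.
        presidio_available = (
            importlib.util.find_spec("presidio_analyzer") is not None
        )

    requested = env.get("PHI_MCP_ENGINE", "").strip().lower()
    if requested == "regex":
        return "regex"  # forced dependency-free engine
    if requested == "presidio":
        if not presidio_available:
            # "Require" means failing loudly instead of silently downgrading.
            raise RuntimeError("PHI_MCP_ENGINE=presidio but Presidio is not installed")
        return "presidio"
    if requested:
        # Assumption: unknown values are rejected instead of ignored.
        raise ValueError(f"unknown PHI_MCP_ENGINE value: {requested!r}")
    return "presidio" if presidio_available else "regex"
```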
How it works
A tool call comes in over stdio; the Redactor core runs the configured detection engine, resolves overlaps deterministically, applies the fail-closed threshold check, and swaps detected spans for reversible typed placeholders. Only scrubbed text is meant to leave the boundary you run.
flowchart LR
A[MCP client<br/>Claude · Cursor · agent] -- redact / restore / detect --> B[phi-redact-mcp<br/>stdio server]
B --> C[Redactor core<br/>fail-closed · reversible]
C --> D{Detection engine}
D -->|default, zero deps| E[Regex + checksums]
D -->|optional| F[Presidio + spaCy NER]
C -. scrubbed text .-> A
A -- scrubbed text only --> G[(LLM / downstream)]
The Redactor core depends only on a small DetectionEngine interface, never on Presidio or MCP directly. Raw data and the detection engine stay inside the boundary you run; only scrubbed text leaves it. See docs/ARCHITECTURE.md and docs/THREAT_MODEL.md.
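As a rough picture of what "a small DetectionEngine interface" and "resolves overlaps deterministically" might mean, here is a sketch. The `DetectionEngine` name comes from this README; the `Detection` fields, the `detect` method signature, and the tie-breaking order are assumptions, so see docs/ARCHITECTURE.md for the real design.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float


class DetectionEngine(Protocol):
    """Anything with this method can back the core (regex, Presidio, a fake)."""

    def detect(self, text: str) -> list[Detection]: ...


def resolve_overlaps(detections: list[Detection]) -> list[Detection]:
    """Keep a non-overlapping subset chosen by a fixed priority order.

    Priority (assumed): higher score, then longer span, then earlier start,
    then entity type name, so the result never depends on input order.
    """
    ranked = sorted(
        detections,
        key=lambda d: (-d.score, -(d.end - d.start), d.start, d.entity_type),
    )
    kept: list[Detection] = []
    for d in ranked:
        if all(d.end <= k.start or d.start >= k.end for k in kept):
            kept.append(d)
    return sorted(kept, key=lambda d: d.start)
```

Because the core only needs `detect`, a test can pass in a fake engine with canned detections, which is how a suite can run quickly without loading spaCy.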
The tools
redact(text) → { redacted_text, token_map, entities }
Replaces detected PHI/PII with typed placeholders like [NPI_1]. token_map maps each placeholder back to its original value — keep it local; never send it to the model. entities lists what was redacted (type/span/score) for auditing.
restore(redacted_text, token_map) → { text }
Reverses a redaction, recovering the original text exactly. Safe to call on model output that still contains the placeholders.
detect(text) → { entities, count }
Reports the entities found — type, span, confidence — without modifying the text. Unlike redact, it surfaces low-confidence hits rather than blocking, so you can inspect coverage before trusting the boundary in a pipeline.
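The redact and restore contracts can be illustrated with a toy version of typed, numbered placeholders. This is a sketch only: the two regexes are simplified stand-ins, the functions return plain Python values instead of MCP tool results, and, unlike the collision-proof scheme the project describes, this toy does not handle input that already contains placeholder-like text.

```python
import re

# Simplified, hypothetical patterns; real recognizers are stricter.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with placeholders like [US_SSN_1]; return text and map."""
    spans = sorted(
        (m.start(), m.end(), entity_type)
        for entity_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    )
    token_map: dict[str, str] = {}
    counters: dict[str, int] = {}
    out, cursor = [], 0
    for start, end, entity_type in spans:
        if start < cursor:  # toy rule: skip a match overlapping the previous one
            continue
        counters[entity_type] = counters.get(entity_type, 0) + 1
        placeholder = f"[{entity_type}_{counters[entity_type]}]"
        token_map[placeholder] = text[start:end]
        out += [text[cursor:start], placeholder]
        cursor = end
    out.append(text[cursor:])
    return "".join(out), token_map


def restore(redacted_text: str, token_map: dict[str, str]) -> str:
    """Swap placeholders back in one pass so restored values are never rescanned."""
    if not token_map:
        return redacted_text
    pattern = re.compile("|".join(re.escape(p) for p in token_map))
    return pattern.sub(lambda m: token_map[m.group(0)], redacted_text)
```

Restoring in a single regex pass, instead of chained `str.replace` calls, keeps a restored value that happens to look like a placeholder from being substituted a second time.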
How to use it (a real pipeline)
The pattern is redact → model → restore, with the token map never leaving your side:
Scrub before the model. Call redact(user_text). Send only redacted_text to the LLM. Keep token_map in your process: treat it as sensitively as the raw input, and never pass it to the model.
Let the model work on placeholders. It sees [NPI_1], [US_SSN_1], etc.: semantically neutral tokens it can reason about and echo back.
Rehydrate after. Call restore(model_output, token_map) to swap the real values back into the model's response before it reaches your user or database.
Handle the block. If redact returns a [LOW_CONFIDENCE] or [DETECTION_ERROR] tool error, the boundary refused to leak: surface it, tighten input, or lower the risk, but don't send the raw text onward.
Before trusting it in a pipeline, call detect(sample_text) on representative (synthetic) data to see exactly what is and isn't caught, and tune the thresholds (below) to your risk tolerance.
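The redact → model → restore pattern, including the blocked path, can be sketched as a small orchestration function. Everything here is illustrative: `run_pipeline` and `RedactionBlocked` are hypothetical names, and the three callables stand in for however your client invokes the MCP tools and your model provider.

```python
from typing import Callable


class RedactionBlocked(Exception):
    """Stand-in for a [LOW_CONFIDENCE] or [DETECTION_ERROR] tool error."""


def run_pipeline(
    user_text: str,
    redact: Callable[[str], tuple[str, dict[str, str]]],
    call_model: Callable[[str], str],
    restore: Callable[[str, dict[str, str]], str],
) -> str:
    try:
        redacted_text, token_map = redact(user_text)
    except RedactionBlocked as exc:
        # Fail closed: stop here. The raw text is never sent to the model.
        return f"Request blocked: {exc}"

    # Only the scrubbed text crosses the boundary.
    model_output = call_model(redacted_text)

    # token_map stayed in this process the whole time.
    return restore(model_output, token_map)
```

Passing the tools in as callables keeps the sketch testable with fakes; in a real agent these would wrap the redact and restore tool calls.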
Fail-closed, precisely
Two thresholds govern every redact call:
detection_floor (default 0.35): the sensitivity boundary. Signals below it are treated as noise.
min_confidence (default 0.5): the trust threshold.
Any candidate that survives the floor but scores below min_confidence puts the call into fail-closed mode: it returns a [LOW_CONFIDENCE] error rather than redacting the confident spans and passing the uncertain one through. Engine errors return [DETECTION_ERROR]. On any error, no redacted text is returned. Both thresholds are configurable (see below).
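The two-threshold rule can be written down in a few lines. This sketch uses the defaults stated above (0.35 and 0.5); the `Candidate` type, the `gate` function, and the `LowConfidenceError` class are hypothetical names used only to show the logic.

```python
from dataclasses import dataclass


class LowConfidenceError(Exception):
    """Stand-in for the [LOW_CONFIDENCE] tool error."""


@dataclass(frozen=True)
class Candidate:
    entity_type: str
    start: int
    end: int
    score: float


def gate(
    candidates: list[Candidate],
    detection_floor: float = 0.35,
    min_confidence: float = 0.5,
) -> list[Candidate]:
    """Return the candidates to redact, or raise if any is uncertain."""
    # Below the floor: treated as noise and ignored.
    considered = [c for c in candidates if c.score >= detection_floor]

    # Between floor and trust threshold: the whole call is blocked,
    # even if other candidates are high-confidence.
    uncertain = [c for c in considered if c.score < min_confidence]
    if uncertain:
        raise LowConfidenceError(
            f"[LOW_CONFIDENCE] {len(uncertain)} uncertain detection(s)"
        )
    return considered
```

With the defaults, scores of 0.9 and 0.2 redact the first span and ignore the second, while scores of 0.9 and 0.4 block the call entirely.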
Configuration
All optional; sane defaults mean it runs with zero config. Set via the client's env block.
Variable | Default | Meaning |
| 0.5 | Trust threshold; detections below it fail closed |
| 0.35 | Below this, a signal is treated as noise |
| | Reject larger input with a typed error |
| | spaCy model for the Presidio engine |
Entity coverage
Entity | Regex engine (default) | Presidio engine ( |
Email, Phone, SSN, Credit card, IP, URL | ✅ | ✅ |
NPI (Luhn + 80840 check digit) | ✅ | ✅ |
DEA (check digit) | ✅ | ✅ |
Medicare MBI (position-typed) | ✅ | ✅ |
MRN (context-anchored) | ✅ | ✅ |
CLIA lab number | ✅ | ✅ |
Person names | ❌ | ✅ (spaCy NER) |
Addresses / locations | ❌ | ✅ (spaCy NER) |
Detection quality is measured, not asserted — see the eval harness. On the synthetic corpus the default engine clears the project bar (recall ≥ 0.90, precision ≥ 0.80) on the HIPAA identifier set.
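For readers unfamiliar with the check digits named in the table, here is a minimal sketch of the two public algorithms: the NPI check (Luhn over the number prefixed with 80840) and the DEA registration check digit. The function names are hypothetical, and the project's own recognizers may differ in detail, for example in how strictly they constrain the DEA letter prefix.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def is_valid_npi(value: str) -> bool:
    """A 10-digit NPI passes Luhn once prefixed with 80840."""
    if not re.fullmatch(r"[0-9]{10}", value):
        return False
    return luhn_valid("80840" + value)


def is_valid_dea(value: str) -> bool:
    """Two prefix characters plus seven digits; the last digit is a checksum."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z9][0-9]{7}", value):
        return False
    d = [int(c) for c in value[2:]]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]
```

The NPI from the example at the top of this README, 1234567893, passes this check; changing its last digit makes it fail, which is what lets a checksum-validated recognizer reject most random 10-digit numbers.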
Scope & honest limitations
This tool reduces PHI/PII exposure at one boundary. It does not make a system "HIPAA compliant." Compliance is a property of an entire system and organization — its policies, contracts, access controls, audit posture, and people — not of any single library. Running phi-redact-mcp can be part of a compliant design, but it is not a certification, a guarantee, or a substitute for a Business Associate Agreement, a risk assessment, or legal counsel.
Concretely, this project does not: guarantee 100% detection (no detector does), de-identify beyond reversible redaction, cover non-text data, or act as a transparent proxy in v1 (redaction is via explicit tool calls you wire in). No detector is perfect — evaluate on your own representative data before relying on it. See docs/THREAT_MODEL.md for the full boundary, assumptions, and residual risks, and SECURITY.md to report issues.
How to contribute
Contributions are very welcome — this is a deliberately friendly place to make your first open-source PR, and the maintainer tries to respond quickly.
The easiest high-value contribution: add a detection recognizer for a new identifier (a regex + an optional check-digit validator + a test). The add-a-recognizer issue form doubles as the spec, and CONTRIBUTING.md walks through the six steps.
Other good ways to help: improve docs, add test cases or example client configs, or pick up something from the roadmap. Browse good first issues or open an issue to propose something.
git clone https://github.com/Rinava/phi-mcp && cd phi-mcp
pip install -e ".[dev]"
pytest # fast invariant suite (Presidio faked, sub-second)
ruff check . && mypy src/phi_mcp
python eval/run_eval.py
The full guide (dev setup, conventions, and the no-real-PHI rule for fixtures) is in CONTRIBUTING.md. By contributing you agree your work is MIT-licensed.
License
MIT — matches Presidio and maximizes reuse. Built with Microsoft Presidio (optional) and the MCP Python SDK.