
mcp-dlp

by aaravjain151


Python

Local

MCP DLP Prototype

A data-loss-prevention (DLP) layer for AI agents, built as a Model Context Protocol (MCP) server. It sits between a document connector and the agent: when a document is fetched, its contents are scanned for sensitive data, sensitive values are redacted (or the whole document is blocked, for credentials), and every read is recorded in an audit log — so raw sensitive data never reaches the model or the user.

This is a local prototype using a mock Google Drive-style file connector. Real Google Drive integration is out of scope by design (see Limitations).

More docs: Architecture · Write-up

The problem

AI agents are increasingly wired into business systems (Google Drive, Slack, Notion, Jira, etc.) through MCP connectors. An agent can fetch a document and pass its contents straight into a model or show them to a user — including any Social Security numbers, credit card numbers, API keys, or other secrets the document happens to contain. This prototype demonstrates one way to close that gap.


How it works

Client (MCP Inspector)
        │  calls read_document("customer-contract.txt")
        ▼
MCP server (server.py)
        │  1. connector reads the raw file from mock_drive/
        │  2. scanner.scan()  -> finds sensitive data + positions + confidence
        │  3. decide_action() -> allowed | redacted | blocked
        │  4a. redacted: scanner.redact() rebuilds text with labels
        │  4b. blocked:  returns a [BLOCKED] message, no content
        │  5. log_audit_entry() appends one JSON line to logs/audit_log.jsonl
        ▼
Client receives ONLY the redacted text or block message — never the raw document

The key design point: the DLP layer lives between document retrieval and the tool's return value. The raw text is read into a local variable and never leaves the function — only the redacted result or a block message is returned.
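The flow above can be sketched in plain Python. This is a minimal, stdlib-only approximation and not the code in server.py. The two regex rules, the helper names, and the `root` argument are illustrative assumptions, and audit logging is left out. In the real server the function would be registered as an MCP tool (for example with FastMCP's `@mcp.tool()` decorator). That wiring is omitted here so the sketch runs without the SDK.

```python
import re
from pathlib import Path

# Hypothetical, trimmed-down rules; the real RULES live in scanner.py.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
API_KEY = re.compile(r"\b(?:sk-|ghp_)[A-Za-z0-9]{16,}\b")
BLOCK_TYPES = {"API_KEY"}  # assumption: subset of the real block list


def scan(text):
    """Return (label, start, end) findings for every rule match."""
    findings = [("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    findings += [("API_KEY", m.start(), m.end()) for m in API_KEY.finditer(text)]
    return sorted(findings, key=lambda f: f[1])


def redact(text, findings):
    """Replace each finding with a label, splicing right-to-left so offsets stay valid."""
    for label, start, end in sorted(findings, key=lambda f: f[1], reverse=True):
        text = text[:start] + f"[REDACTED_{label}]" + text[end:]
    return text


def read_document(name, root=Path("mock_drive")):
    raw = (root / name).read_text(encoding="utf-8")  # raw text never leaves this frame
    findings = scan(raw)
    blocked = sorted({label for label, _, _ in findings} & BLOCK_TYPES)
    if blocked:
        return (f"[BLOCKED] '{name}' contains high-risk credentials "
                f"({', '.join(blocked)}) and was withheld by DLP policy.")
    if findings:
        return redact(raw, findings)
    return raw  # allowed: no findings
```

Each return path hands back only a derived string (the original for a clean file, a redacted copy, or a block message), which is the property the design point describes.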

Project layout

mcp-dlp/
├── server.py            # MCP server: read_document tool, policy, audit logging
├── scanner.py           # detection rules (RULES), redaction, labels
├── test_scanner.py      # 24 unit tests (pytest)
├── mock_drive/          # sample documents (the mock connector's "files")
│   ├── customer-contract.txt
│   ├── engineering-notes.txt
│   └── support-ticket.txt
├── logs/
│   └── audit_log.jsonl  # append-only audit trail (auto-created)
└── pyproject.toml

Setup

Requires Python 3.10+, uv, and Node.js (the MCP Inspector runs via npx).

# from the project root
uv add "mcp[cli]>=1.27,<2"   # pinned below v2 for stability
uv add --dev pytest

The mcp SDK is pinned to <2 deliberately: a breaking v2 release is planned, and this prototype targets the spec revision that precedes it (2025-11-25) as its stable baseline.

Demo (under 5 minutes)

Start the server in development mode. This launches the MCP Inspector and prints a URL with a session token pre-filled:

uv run mcp dev server.py

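Open that URL, go to the Tools tab, and select read_document. The walkthrough below uses two of the sample documents to exercise the redacted and blocked outcomes (every sample contains some sensitive data, so the allowed path is not shown):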

1. A user asks to read a document with sensitive data. Call read_document with customer-contract.txt. The source file contains a name, email, phone, SSN, and credit card.

2. The DLP layer detects and redacts. The response keeps the customer name but replaces the email, phone, SSN, and card with labels:

Customer: John Smith
Email: [REDACTED_EMAIL]
Phone: [REDACTED_PHONE]
SSN: [REDACTED_SSN]
Card on file: [REDACTED_CREDIT_CARD]

The raw values never leave the server.

3. Credentials are blocked entirely. Call read_document with engineering-notes.txt. Because it contains live credentials, the document is withheld:

[BLOCKED] 'engineering-notes.txt' contains high-risk credentials
(API_KEY, AWS_ACCESS_KEY, BEARER_TOKEN) and was withheld by DLP policy.

4. The audit log shows what was detected and what action was taken. Every read, whether allowed, redacted, or blocked, is recorded:

cat logs/audit_log.jsonl
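For more than a handful of reads, a short script can tally the log instead of reading it by eye. This is a hypothetical helper, not part of the project. It assumes only the `action` and `finding_types` fields shown in the audit log example further down.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_audit_log(path=Path("logs/audit_log.jsonl")):
    """Count reads per action and detections per finding type in a JSON Lines log."""
    actions, types = Counter(), Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)  # one JSON object per line
            actions[entry["action"]] += 1
            types.update(entry.get("finding_types", []))
    return actions, types
```

Calling `summarize_audit_log()` from the project root after the demo would show one blocked read and the redacted reads, along with which detectors fired.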

Summary of the three sample documents:

Document	Expected result	Why
`customer-contract.txt`	redacted	contains PII (email, phone, SSN, card)
`support-ticket.txt`	redacted	contains PII + a low-confidence account number
`engineering-notes.txt`	blocked	contains credentials (API key, AWS key, bearer token)

Running the tests

uv run pytest -v

24 tests cover every detector, redaction correctness, context preservation, overlap handling, confidence levels, and — importantly — false-positive guards (e.g. the word "password" in ordinary prose must not be redacted).
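A false-positive guard of the kind described might look like the pytest-style pair below. The SECRET pattern is an assumption modeled on the "keyword = value" rule in the detection table, not the regex in scanner.py. It requires a `:` or `=` after the keyword, so the bare word in prose cannot match.

```python
import re

# Hypothetical generic-secret rule: keyword, optional spaces, ':' or '=', then a value.
SECRET = re.compile(r"(?i)\b(?:password|passwd|token|secret)\s*[:=]\s*(\S+)")


def find_secrets(text):
    """Return the captured secret values (group 1), not the keywords."""
    return [m.group(1) for m in SECRET.finditer(text)]


def test_password_in_prose_is_not_redacted():
    assert find_secrets("Please reset your password before Friday.") == []


def test_password_assignment_is_detected():
    assert find_secrets("password = hunter2") == ["hunter2"]
```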

Detection coverage

Type	Confidence	Notes
Email	high	standard structure
Phone (formatted)	high	parens / dashes / dots / `+1`
Phone (bare)	low	10 bare digits — ambiguous
SSN (formatted)	high	dashed or spaced
SSN (bare)	low	9 bare digits — ambiguous
Credit card	high	issuer-prefix + length (Visa, Mastercard)
Bearer token	high	anchored on the `Bearer` keyword
API key	high	known vendor prefixes (`sk-`, `ghp_`, …)
AWS access key	high	`AKIA` / `ABIA` prefixes
Private key	high	full PEM block, header to footer
Secret (generic)	high	`keyword = value` for password/token/secret/etc.

Confidence is split deliberately: a formatted SSN or phone number is strong evidence, while bare digits could be an order ID or account number. Low-confidence findings are still redacted (fail-safe), but the distinction is recorded and is used to ensure a low-confidence guess can never trigger a full block.
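The confidence split can be expressed by giving the same label two rules. This sketch assumes the (label, compiled_regex, confidence) tuple shape described under Configuration; the SSN patterns themselves are illustrative. A formatted SSN is tagged high, and nine bare digits, which could just as easily be an order ID, are tagged low.

```python
import re

# Hypothetical subset of RULES: (label, compiled_regex, confidence).
RULES = [
    ("SSN", re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b"), "high"),  # dashed or spaced
    ("SSN", re.compile(r"\b\d{9}\b"), "low"),                      # bare digits, ambiguous
]


def scan(text):
    """Return one finding dict per match, carrying the rule's confidence."""
    return [
        {"type": label, "start": m.start(), "end": m.end(), "confidence": conf}
        for label, rx, conf in RULES
        for m in rx.finditer(text)
    ]
```

Both findings would still be redacted. Only the recorded confidence differs, which is what lets the policy refuse to block on a low-confidence guess.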

Policy: allowed / redacted / blocked

Findings	Action	Returned to agent
none	allowed	original document
PII / financial (email, phone, SSN, card)	redacted	cleaned document with labels
credentials (API key, AWS key, bearer, private key)	blocked	`[BLOCKED]` message, no content

The block list (BLOCK_TYPES in server.py) is fail-closed: a document containing live credentials is withheld entirely rather than partially redacted, on the principle that an agent should not be handling a credentials file at all. The generic SECRET detector is intentionally redact-only (not block), because it is the fuzziest, lowest-precision rule and shouldn't withhold a whole document on its own.
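The policy table reduces to a small decision function. This is a sketch, not the project's decide_action. The exact contents of BLOCK_TYPES and the extra high-confidence check are assumptions based on the table and the confidence note above. SECRET is deliberately absent from the block set.

```python
# Hypothetical mirror of the policy table; exact membership is an assumption.
BLOCK_TYPES = {"API_KEY", "AWS_ACCESS_KEY", "BEARER_TOKEN", "PRIVATE_KEY"}


def decide_action(findings):
    """Return 'allowed', 'redacted', or 'blocked' for a list of finding dicts."""
    if not findings:
        return "allowed"
    # Only a high-confidence credential finding can block. SECRET is not in
    # BLOCK_TYPES, so the generic keyword=value rule can only ever redact.
    if any(f["type"] in BLOCK_TYPES and f["confidence"] == "high" for f in findings):
        return "blocked"
    return "redacted"
```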

Audit log

Every read appends one JSON object to logs/audit_log.jsonl (JSON Lines: append-only, one record per line). Example:

{"timestamp": "2026-06-26T09:34:21Z", "connector": "mock_google_drive", "tool": "read_document", "document_name": "engineering-notes.txt", "findings_count": 3, "finding_types": ["API_KEY", "AWS_ACCESS_KEY", "BEARER_TOKEN"], "action": "blocked", "original_length": 201, "redacted_length": 0}
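A writer that produces records in this shape could look like the following. The field names come from the example above. The function signature, the `log_path` parameter, and the de-duplicated, sorted `finding_types` are assumptions. The record stores finding types and lengths only, never the matched values, so the log itself cannot leak what it reports.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_audit_entry(document_name, findings, action, original_length,
                    redacted_length, log_path=Path("logs/audit_log.jsonl")):
    """Append one JSON record per read; sensitive values are never written."""
    entry = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "connector": "mock_google_drive",
        "tool": "read_document",
        "document_name": document_name,
        "findings_count": len(findings),
        "finding_types": sorted({f["type"] for f in findings}),  # assumption: de-duplicated
        "action": action,
        "original_length": original_length,
        "redacted_length": redacted_length,  # 0 when the document is blocked
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)  # create logs/ on first use
    with open(log_path, "a", encoding="utf-8") as fh:  # append-only
        fh.write(json.dumps(entry) + "\n")
    return entry
```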

Configuration / extensibility

Detection rules live in RULES in scanner.py as a list of (label, compiled_regex, confidence[, capture_group]) tuples. Adding a detector is one line; no changes to the scanning logic are needed.
Redaction labels live in the LABELS dict — change a label in one place.
Block policy is the BLOCK_TYPES set in server.py — one line to make the policy stricter or looser.
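To illustrate the tuple format, here is a hypothetical RULES list with an optional capture group, a LABELS dict, and a redact step. The patterns, the `[REDACTED_SECRET]` label, and the helper bodies are assumptions. The overlap handling follows the left-most-wins behavior described under Limitations. A capture group lets a rule redact only the value, so `password:` stays readable.

```python
import re

# Hypothetical RULES: (label, compiled_regex, confidence[, capture_group]).
RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "high"),
    ("SECRET", re.compile(r"(?i)\b(?:password|token|secret)\s*[:=]\s*(\S+)"), "high", 1),
]
LABELS = {"EMAIL": "[REDACTED_EMAIL]", "SECRET": "[REDACTED_SECRET]"}


def scan(text):
    """Return (start, end, label, confidence) for each match, honoring capture groups."""
    findings = []
    for label, rx, confidence, *rest in RULES:
        group = rest[0] if rest else 0  # group 0 = whole match
        for m in rx.finditer(text):
            findings.append((m.start(group), m.end(group), label, confidence))
    return findings


def redact(text, findings):
    """Keep the left-most finding when spans overlap, then splice labels right-to-left."""
    kept, last_end = [], -1
    for start, end, label, _ in sorted(findings):
        if start >= last_end:
            kept.append((start, end, label))
            last_end = end
    for start, end, label in reversed(kept):
        text = text[:start] + LABELS[label] + text[end:]
    return text
```

Under this shape, adding a detector means appending one tuple to RULES and, if it needs a new label, one entry to LABELS. scan and redact stay unchanged.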

Limitations & what production would need

This is a prototype. Honest gaps, and the reasoning behind them:

Regex-based detection, not ML. Real DLP (Microsoft Purview, Google DLP) combines regex with named-entity recognition and ML classifiers. Regex alone misses context and unusual formats. Production would add an NER/ML layer with a human review queue.
API-key coverage is a finite prefix list. Only vendor prefixes encoded in the rules are caught (Stripe, GitHub, AWS, …). A vendor whose prefix isn't listed is missed. This is the same approach real secret scanners (Gitleaks, GitGuardian) use, but their lists are far larger and continuously updated.
No entropy-based secret detection. Unlabeled high-entropy strings (a random secret that is not preceded by a keyword such as password =) are not caught. Entropy detection was deliberately skipped because it produces heavy false positives on hashes, UUIDs, and git SHAs, and there is no review queue to absorb the noise.
Credit-card matching has no Luhn checksum. Detection is issuer-prefix + length only, so a number matching the prefix pattern but failing the Luhn check would still be flagged. For DLP this over-flagging is the safer error, but a checksum would reduce false positives.
Overlap resolution is position-based. When two findings overlap, the left-most one wins. This is fine for the current rule set but isn't a true severity ranking; a production version would resolve overlaps by a type-priority order.
Mock connector only. Documents are local files. Real Google Drive integration (OAuth, the Drive API, streaming large files) is out of scope.
Single document, full-text scan. No streaming or chunking; very large documents are read into memory whole.
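The Luhn gap noted above is small to close. The function below is the standard Luhn checksum. Using it as a second filter after the issuer-prefix and length match is a suggested addition, not something the prototype does.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]  # ignore spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

For example, the common Visa test number 4111 1111 1111 1111 passes. Changing its last digit to 2 fails, so that number would no longer be flagged even though it still matches the Visa prefix and length.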

Tech

Python · MCP Python SDK (FastMCP) · stdio transport · regex detection · pytest · JSON Lines audit logging.

License: not found


Tools

read_document

Related MCP Servers

Shrike Security MCP Server, by Shrike-Security (Security, Autonomous Agents; Apache 2.0; last updated 2026-07-16). Protects AI agents from threats like prompt injection, jailbreaks, and SQL injection through a multi-layer scanning pipeline. It also enables PII redaction and rehydration to ensure data privacy during LLM interactions.

ZugaShield, by Zuga-Technologies (Security, Autonomous Agents, Agent Orchestration; MIT; last updated 2026-07-05). A 7-layer security system for AI agents that detects and blocks prompt injection, data exfiltration, and malicious tool calls. It enables real-time scanning of inputs, outputs, and tool definitions to protect agentic workflows from emerging AI-specific threats.

Aegis MCP Server, by cleburn (Autonomous Agents, Security, Developer Tools; MIT; last updated 2026-05-23). An enforcement layer that validates AI agent actions against governance policies, including path permissions and content scanning, at runtime. It enables secure, role-based execution of file operations and commands with zero token overhead by processing policies independently from the agent's context.

query-sanitizer-mcp, by vidoluco (Security, AI & Machine Learning, Autonomous Agents; last updated 2026-07-04). A local DLP middleware that redacts sensitive information from prompts using local models before they reach external LLMs. It provides tools to sanitize queries, restore placeholders in responses, and manage a ledger of redactions to maintain data privacy.


Related MCP Connectors

rail-score
Responsible-AI guardrails for agents: scoring with policy, injection & PII detection, DPDP.
agent-prompt-injection-firewall-mcp
The WAF for agents. Pattern-based + heuristic firewall scans prompts, RAG documents, tool argume...
shadowgate-mcp
Security firewall for AI agents — scans MCP calls for injection, secrets, and risks.
