Cloakbox
Provides a read-only MCP server interface to a DuckDB database, allowing LLMs to query a sanitized copy of sensitive data while preserving analytical capabilities.
Let any LLM analyze your sensitive data — without ever showing it a real identity.
Cloakbox builds a sanitized, analysis-ready copy of your database in which people are
replaced by stable, join-preserving tokens. An LLM queries the copy (read-only) and
sees tokens like SUB_2c17917e63b5 instead of names. A separate, isolated, human-only
tool can re-identify when a person genuinely needs to — and every reversal is audited.
Built on DuckDB + the Model Context Protocol. Reuses mature building blocks; see docs/04-prior-art.md.
VAULT (real data) ──build──► CLOAKBOX (tokens, no PII) ──read-only MCP──► LLM
read-only │
└──► MAPPING (isolated) ◄── manual, human-only decoder
Why it's different
Most "PII firewalls" redact text in flight as the AI hits real data — a detection
miss is a live leak. Cloakbox inverts that: it pre-sanitizes the whole warehouse
with an explicit, fail-closed policy, then lets the AI roam the clean copy freely.
And it proves no analytical value was lost: equivalence_check.py shows reports
return identical numbers on the vault and the box.
Aspect | Runtime PII proxy | Cloakbox |
Basis | Detection (miss = leak) | Explicit per-column policy + fail-closed scan |
Cross-table joins | Best-effort | Preserved by deterministic tokens |
Correctness | — | Equivalence proof in CI |
Re-identification | Often inline/automatic | Isolated, manual, human-only, audited |
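The equivalence idea in the comparison above can be illustrated with a small, self-contained sketch. It is not the project's equivalence_check.py: the rows, the `TCH` prefix, the demo salt, and the helper names are all hypothetical. It only shows why deterministic tokens leave distinct counts unchanged, and how an identity-labelled report on the vault can be relabelled through the mapping to match the box.

```python
import hashlib
from collections import defaultdict

SALT = b"demo-salt"  # placeholder value, not a real secret


def tokenize(value: str, domain: str, prefix: str) -> str:
    """Deterministic token: same (domain, value) always yields the same string."""
    digest = hashlib.sha256(SALT + domain.encode() + value.encode()).hexdigest()
    return f"{prefix}_{digest[:12]}"


def distinct_count(rows, group_col, value_col):
    """Count distinct value_col entries per group_col value."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_col]].add(row[value_col])
    return {key: len(values) for key, values in groups.items()}


# Hypothetical vault rows with real identities.
vault = [
    {"student": "Ana", "teacher": "Lee", "campus": "North"},
    {"student": "Ana", "teacher": "Kim", "campus": "North"},
    {"student": "Ben", "teacher": "Lee", "campus": "North"},
    {"student": "Cy", "teacher": "Kim", "campus": "South"},
]

# Sanitized copy: identities replaced by tokens, other columns untouched.
box = [
    {
        "student": tokenize(r["student"], "student", "SUB"),
        "teacher": tokenize(r["teacher"], "teacher", "TCH"),
        "campus": r["campus"],
    }
    for r in vault
]

# Isolated mapping (real teacher name -> token); the model never sees this.
mapping = {r["teacher"]: b["teacher"] for r, b in zip(vault, box)}

# Aggregate report: the numbers must match exactly.
aggregate_ok = (
    distinct_count(vault, "campus", "student")
    == distinct_count(box, "campus", "student")
)

# Identity-labelled report: relabel the vault result via the mapping, then compare.
vault_by_teacher = distinct_count(vault, "teacher", "student")
relabelled = {mapping[name]: n for name, n in vault_by_teacher.items()}
labelled_ok = relabelled == distinct_count(box, "teacher", "student")

print("aggregate identical:", aggregate_ok)
print("identity-labelled identical:", labelled_ok)
```

Both comparisons hold because each real value maps to exactly one token, so grouping and distinct-counting see the same partition of rows on either side.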
Quickstart (synthetic data, ~1 minute)
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cd pipeline
python3 make_example_vault.py # generate a fake vault
python3 build_cloakbox.py init # create the secret salt (once)
python3 build_cloakbox.py build # vault -> sanitized cloakbox + isolated mapping
python3 build_cloakbox.py validate # fail-closed PII + k-anonymity scan
python3 equivalence_check.py # prove the box == the vault, numerically
Full walkthrough: docs/quickstart.md.
See it run
The build is fail-closed and the result is provably equivalent to the source:
$ python3 build_cloakbox.py validate
== Residual PII scan (emails) ==
OK — no email-shaped values found.
== Token format check ==
(checked all policy-tokenized columns)
== k-anonymity report (k=5) ==
ok students: 0 QI-groups below k on (grade_level, campus)
WARN enrollments: 403 QI-groups below k on (course_id, grade, campus)
VALIDATION PASSED
$ python3 equivalence_check.py
== 4a) Aggregate report equivalence (numbers must match exactly) ==
IDENTICAL passed assessments per course
IDENTICAL distinct subjects per campus
IDENTICAL enrollments joined to courses, count per subject area
== 4b) Identity-labelled report (vault relabelled via mapping == box) ==
IDENTICAL distinct subjects per teacher
EQUIVALENCE PASSED — Cloakbox reproduces the vault's output exactly
How it works
Tokenize, deterministically. PREFIX_ + sha256(salt || domain || value). Same input → same token, so joins and distinct-counts survive; one-way, so the box can't be reversed. (policy)
Fail closed. validate scans for residual emails and malformed tokens and reports k-anonymity violations; a leak blocks the build.
Gate read-only. A DuckDB MCP server points only at cloakbox.duckdb. (gateway template)
Re-identify out-of-band. The decoder is a manual, isolated, audited CLI, never reachable by the model.
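The first two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in pipeline/: the 12-character truncation is inferred from the example token SUB_2c17917e63b5, and the `\x1f` field separator, the email regex, the helper names, and the sample rows are all hypothetical. As in the validation output shown earlier, residual emails stop the build while small k-anonymity groups are only reported.

```python
import hashlib
import re
from collections import Counter

# Simple email-shaped pattern (an assumption; the real scan may differ).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize(value: str, domain: str, salt: bytes, prefix: str = "SUB") -> str:
    """PREFIX_ + sha256(salt || domain || value), cut to 12 hex chars (assumed length)."""
    # A separator byte keeps the concatenated fields unambiguous (assumption).
    material = b"\x1f".join([salt, domain.encode("utf-8"), value.encode("utf-8")])
    return f"{prefix}_{hashlib.sha256(material).hexdigest()[:12]}"


def residual_emails(rows):
    """Collect string cells that still contain an email-shaped value."""
    return [
        v
        for row in rows
        for v in row.values()
        if isinstance(v, str) and EMAIL_RE.search(v)
    ]


def groups_below_k(rows, quasi_identifiers, k=5):
    """Map each quasi-identifier combination seen fewer than k times to its count."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {group: n for group, n in counts.items() if n < k}


salt = b"demo-salt-not-a-real-secret"  # placeholder; a real salt stays out of the repo

# Hypothetical vault: five students in one group, one student alone in another.
vault = [
    {"student": f"student{i}@example.edu", "grade_level": 9, "campus": "North"}
    for i in range(5)
]
vault.append({"student": "student5@example.edu", "grade_level": 10, "campus": "South"})

# Build the sanitized copy: only the identity column is replaced.
box = [{**row, "student": tokenize(row["student"], "student", salt)} for row in vault]

# Fail closed: any email-shaped value left in the box aborts the build.
leaks = residual_emails(box)
if leaks:
    raise SystemExit(f"build blocked: {len(leaks)} email-shaped values remain")

# k-anonymity is reported rather than enforced in this sketch.
small_groups = groups_below_k(box, ("grade_level", "campus"), k=5)
print("QI-groups below k:", small_groups)
```

Including the domain in the hash input means the same raw value tokenized for two different columns yields two unrelated tokens, so tokens only join where the policy intends them to.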
Layout
pipeline/ build engine, policy config, equivalence check, synthetic-data generator
decode/ isolated, human-only re-identification tool
gateway/ read-only MCP config + agent guardrail rule (templates)
docs/ architecture, anonymization policy, security, prior art, decoder, quickstart
Point it at your own data
Edit pipeline/cloakbox_config.py: set the vault path
and adjust the column rules and report definitions to your schema. Re-run build +
equivalence. Never commit real data or the salt — see .gitignore.
Security
Read docs/03-security.md. Key point: a read-only DuckDB
connection blocks writes but does not sandbox the filesystem — the real
isolation is OS file permissions keeping the vault, salt, and mapping out of the
gateway's reach (pipeline/secure_paths.sh). This is a pattern, not a compliance
certification; have counsel review regulated deployments.
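The permission point can be checked mechanically. The sketch below is a hypothetical Python helper, not a port of pipeline/secure_paths.sh, and the file name salt.key is made up for the demo. It assumes a POSIX system and flags any sensitive file that grants permission bits to group or other users.

```python
import os
import stat
import tempfile


def exposed_to_others(path: str) -> bool:
    """True if group or other users hold any permission bit on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def check_secrets(paths):
    """Return the subset of paths that are not owner-only."""
    return [p for p in paths if exposed_to_others(p)]


# Demo on a throwaway file; real paths would be the vault, salt, and mapping.
with tempfile.TemporaryDirectory() as tmp:
    salt_path = os.path.join(tmp, "salt.key")  # hypothetical file name
    with open(salt_path, "w") as handle:
        handle.write("not-a-real-salt")

    os.chmod(salt_path, 0o600)  # owner read/write only
    locked_down = check_secrets([salt_path])

    os.chmod(salt_path, 0o644)  # readable by group and others
    too_open = check_secrets([salt_path])

print("owner-only file flagged:", bool(locked_down))
print("world-readable file flagged:", bool(too_open))
```

A check like this only verifies mode bits. It says nothing about which user the gateway runs as, so it complements running the gateway under a separate account and does not replace it.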
License
MIT — see LICENSE.