Bulkhead
Integrates with GitHub Copilot to scan and redact sensitive content such as credentials, personal data, and injection attacks in real-time during AI-assisted coding.
Cascading content protection for AI-powered development tools.
Bulkhead detects and redacts sensitive content -- PII, secrets, prompt injection, and system prompt leakage -- before it leaks through LLM-powered features. It runs as a VS Code extension, an HTTP REST server, or an MCP server for AI coding assistants.
Deployment | VS Code extension, HTTP REST server, MCP server, Docker |
Detection | 154 secret patterns across 13 categories, 45+ PII entity types across 20+ countries, prompt injection, system prompt leakage, test data detection |
Architecture | Three-layer cascading classifier (regex, BERT, LLM) |
Policy | Named policies (strict, moderate), composable compliance overlays, risk assessment with classified issues |
Tests | 185 tests including adversarial suite, policy system, and performance benchmarks |
Why Bulkhead?
The AI Coding Assistant Blind Spot
Every time a developer uses an AI coding assistant, the IDE sends context to an LLM -- not just the active file, but adjacent tabs, terminal output, clipboard contents, and project-wide search results. That context routinely includes .env files with database credentials, test fixtures with real customer data, and config files with API keys.
This leakage happens at edit time, not commit time. It is continuous, invisible, and bypasses every existing control: git hooks don't see it, SAST doesn't scan it, WAFs don't intercept it. There is no diff, no PR, no audit trail. By the time a secret scanner catches something at commit, the AI assistant has already sent it to an LLM provider dozens of times.
What's at Stake
HIPAA: $50,000 -- $1.9M per violation. Average healthcare breach: $9.77M
PCI-DSS: $5,000 -- $100,000/month in non-compliance fines
GDPR: Up to 4% of annual global revenue
SOC 2: Audit failures mean lost enterprise contracts
Average data breach cost: $4.88M (IBM 2024). Mean time to identify: 194 days
A single leaked AWS key costs $50,000+ in cloud resource abuse on average
How Bulkhead Solves It
Bulkhead sits between your editor and the AI, catching sensitive content before it leaves. Three detection layers trade speed for depth -- expensive inference only runs on the fraction of content that needs it:
flowchart LR
A["Input Text"] --> B["Layer 1: Regex\nsub-ms"]
B -->|"60-70% resolved"| C["Confirmed"]
B -->|"remaining"| D["Layer 2: BERT\n20-50ms"]
D -->|"~90% resolved"| C
D -->|"~5-10% ambiguous"| E["Layer 3: LLM\n500ms-2s"]
E --> C
Every detection carries provenance -- which layer flagged it, at what confidence, and why. Your compliance team gets "regex matched a credit card number with Luhn validation" or "BERT flagged a name at 0.92 confidence," not "the AI said so."
Deploy Anywhere, Change Nothing
VS Code extension -- catches leaks at the point of creation, on every keystroke
HTTP REST server -- CI/CD pipeline integration, API gateway sidecar
MCP server -- direct integration with Claude Code and GitHub Copilot
Docker -- zero-install containerized deployment for air-gapped environments
Same engine, same 154 secret patterns, same 45+ PII types, same policies. Different transports.
For the full business case with industry scenarios, architecture walkthroughs, and comparison with alternatives, see Why Do We Need Content Protection for AI Dev Tools?
What Makes Bulkhead Different
The core innovation is the cascading classifier -- three detection layers that progressively trade speed for depth, so expensive inference only runs on the small fraction of content that actually needs it. Regex handles the bulk (sub-millisecond, every keystroke). A BERT model resolves contextual entities like names and locations. An LLM disambiguates the genuinely hard cases ("Is 'Jordan' a person or a country?"). Each detection carries full provenance -- which layer flagged it, at what confidence, and why.
The detection patterns themselves are ported from established open-source projects (see Attribution). Bulkhead's contribution is the cascade architecture, the BERT worker thread integration, the LLM disambiguation layer, the multi-platform server architecture, and the deduplication logic that ties it all together.
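The README does not spell out the deduplication logic, so the following is only a minimal sketch of one plausible approach: when detections from different layers overlap, keep the one with the highest confidence. The `Detection` shape and the `dedupe` function are hypothetical names chosen for illustration, not Bulkhead's actual API.

```typescript
// Hypothetical detection shape; the field names are assumptions for this sketch.
interface Detection {
  entityType: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  confidence: number;
  source: "regex" | "bert" | "llm";
}

// Keep only the highest-confidence detection among overlapping spans.
function dedupe(detections: Detection[]): Detection[] {
  // Rank by confidence; ties go to the longer span.
  const ranked = [...detections].sort(
    (a, b) => b.confidence - a.confidence || (b.end - b.start) - (a.end - a.start),
  );
  const kept: Detection[] = [];
  for (const d of ranked) {
    const overlaps = kept.some((k) => d.start < k.end && k.start < d.end);
    if (!overlaps) kept.push(d);
  }
  // Return survivors in document order.
  return kept.sort((a, b) => a.start - b.start);
}
```

Ranking before filtering means a high-confidence regex match displaces a weaker model guess on the same span, while non-overlapping detections all survive.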
The Problem
Every time you use an AI coding assistant, your editor content gets sent to an LLM. That content can include:
Personal data -- SSNs, credit cards, emails, phone numbers, medical IDs
Secrets -- API keys, tokens, database credentials, private keys
Prompt injection -- malicious instructions hidden in code comments or data
System prompt leakage -- attempts to extract your AI tool's configuration
Bulkhead sits between your code and the AI, catching sensitive content before it leaves.
Install
npm install @bulkhead-ai/core
Also available as @floatingsidewal/bulkhead-core via GitHub Packages and as a Docker container at ghcr.io/floatingsidewal/bulkhead.
Quick Start
import { createEngine, getPolicy } from "@bulkhead-ai/core";
const engine = createEngine({
enabled: true, debounceMs: 500,
guards: { pii: { enabled: true }, secret: { enabled: true }, injection: { enabled: true }, contentSafety: { enabled: false } },
cascade: { escalationThreshold: 0.75, contextSentences: 3, modelEnabled: false, modelId: "Xenova/bert-base-NER" },
policy: "strict",
});
const policy = getPolicy("strict");
const { risk } = await engine.policyScan("My SSN is 123-45-6789", policy);
console.log(risk.level); // "high"
console.log(risk.issues); // [{ category: "pii", entityType: "US_SSN", severity: "high", isTestData: true }]
See the How-To Guide for comprehensive examples including medical record scanning, bulk data redaction, and custom policies.
Project Structure
bulkhead/
packages/
core/ @bulkhead-ai/core Detection engine, guards, cascade, policies
vscode/ bulkhead VS Code extension
server/ @bulkhead-ai/server HTTP REST server + MCP server
docs/ Guides: architecture, API, policies, patterns
Dockerfile Multi-stage build (HTTP + MCP modes)
docker-compose.yml HTTP and MCP service definitions
How It Works
Bulkhead uses the cascading classifier -- three detection layers that trade off speed against depth:
Layer | Speed | What it catches | When it runs |
Regex | Sub-ms | Structured PII (SSN, credit cards, IBAN), secrets (AWS keys, tokens), injection patterns | Every keystroke (debounced) |
BERT | 20-50ms | Names, locations, organizations -- contextual entities regex can't catch | On-demand "Deep Scan" or |
LLM | 500ms-2s | Ambiguous cases ("Is 'Jordan' a person or country?") | Only for the ~5-10% of detections the BERT layer can't resolve |
Each detection carries provenance -- which layer flagged it, at what confidence, and whether it's confirmed or needs review.
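The control flow of the three layers can be sketched as follows. This is an illustration under stated assumptions, not Bulkhead's implementation: the `Candidate`, `Finding`, `ModelLayer`, and `LlmLayer` types and the `cascade` function are hypothetical, the model and LLM layers are stubbed as plain synchronous functions for brevity, and only one regex pattern is shown. The 0.75 default mirrors the `escalationThreshold` value used in the Quick Start configuration.

```typescript
type Source = "regex" | "bert" | "llm";

// Hypothetical shapes for this sketch.
interface Candidate {
  entityType: string;
  text: string;
  confidence: number;
}

interface Finding extends Candidate {
  source: Source; // provenance: which layer produced the final answer
  disposition: "confirmed" | "needs_review";
}

// Stand-ins for the model layers; a real model call would be asynchronous.
type ModelLayer = (input: string) => Candidate[];
type LlmLayer = (candidate: Candidate, input: string) => Candidate;

const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

function cascade(
  input: string,
  bert: ModelLayer,
  llm: LlmLayer,
  escalationThreshold = 0.75,
): Finding[] {
  const findings: Finding[] = [];

  // Layer 1: structured patterns are confirmed immediately.
  const ssns: string[] = input.match(SSN_PATTERN) ?? [];
  for (const text of ssns) {
    findings.push({ entityType: "US_SSN", text, confidence: 1, source: "regex", disposition: "confirmed" });
  }

  // Layer 2: contextual entities; confident ones stop here.
  for (const candidate of bert(input)) {
    if (candidate.confidence >= escalationThreshold) {
      findings.push({ ...candidate, source: "bert", disposition: "confirmed" });
    } else {
      // Layer 3: only low-confidence candidates reach the expensive layer.
      const resolved = llm(candidate, input);
      findings.push({
        ...resolved,
        source: "llm",
        disposition: resolved.confidence >= escalationThreshold ? "confirmed" : "needs_review",
      });
    }
  }
  return findings;
}

// Usage with stubbed layers.
const bertStub: ModelLayer = () => [
  { entityType: "PERSON", text: "Alice", confidence: 0.92 },
  { entityType: "PERSON", text: "Jordan", confidence: 0.5 },
];
const llmStub: LlmLayer = (c) => ({ ...c, entityType: "LOCATION", confidence: 0.9 });
const findings = cascade("Alice (SSN 123-45-6789) flew to Jordan", bertStub, llmStub);
```

In this run the SSN is resolved by the regex layer, "Alice" by the stubbed model layer, and only "Jordan" is escalated to the stubbed LLM layer.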
What It Detects
PII (45+ entity types across 20+ countries)
Patterns ported from Microsoft Presidio with checksum validation (Luhn, IBAN mod-97, Verhoeff) and context-aware scoring.
Generic: Credit cards, email, IBAN, IP addresses, MAC addresses, phone numbers, URLs, crypto wallets, dates
US: SSN, driver's license, passport, bank accounts, ITIN, Medicare (MBI), NPI, ABA routing, DEA license
UK: NHS number, NINO, passport, postcode, vehicle registration
EU: Spain NIF/NIE, Italy fiscal code/VAT/driver/passport/ID, Poland PESEL, Finland PIC, Sweden personnummer, Germany tax ID/passport
APAC: Singapore NRIC/UEN, Australia ABN/ACN/TFN/Medicare, India PAN/Aadhaar/vehicle/voter/passport, Korea RRN/passport, Thailand TNIN
Africa: Nigeria NIN
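Two of the checksum algorithms named at the top of this section, Luhn (used for credit card numbers) and IBAN mod-97, are standard and can be sketched in a few lines. The function names `luhnValid` and `ibanValid` are hypothetical; this shows the general technique rather than the ported Presidio code.

```typescript
// Luhn: double every second digit from the right, subtract 9 from results
// above 9, and require the total to be divisible by 10.
function luhnValid(input: string): boolean {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d{2,}$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// IBAN mod-97: move the first four characters to the end, map letters to
// numbers (A=10 ... Z=35), and require the result modulo 97 to equal 1.
function ibanValid(input: string): boolean {
  const iban = input.replace(/\s/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48;
    // Letters expand to two decimal digits, digits to one.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}
```

Running the checksum after a regex match is what lets a pattern layer reject digit strings that merely look like a card number or IBAN.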
Secrets (154 patterns across 13 categories)
Patterns sourced from HAI-Guardrails, GitLeaks, and public provider documentation.
Cloud: AWS, Azure, GCP
Source control: GitHub, GitLab, Bitbucket
CI/CD: Jenkins, CircleCI, Travis CI, Drone
Communication: Slack, Twilio, SendGrid, Mailgun
Payment: Stripe, Square, PayPal
Database: Connection strings, Redis, MongoDB
Infrastructure: Terraform, Vault, Consul
SaaS: Jira, Confluence, Datadog, New Relic
AI/ML: OpenAI, Anthropic, HuggingFace, Cohere
Auth: Auth0, Okta, Firebase, Clerk, Supabase
CDN/Hosting: Cloudflare, Netlify, Vercel, Heroku
Social: Twitter, Facebook, LinkedIn
Generic: JWT, private keys, high-entropy strings
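As a rough illustration of how provider-specific patterns and generic high-entropy detection can fit together, the sketch below pairs one AWS access key ID pattern with a Shannon entropy check. The regexes, the 4.0-bit threshold, and the names `shannonEntropy` and `findSecrets` are assumptions made for this example; Bulkhead's 154 patterns are not reproduced here.

```typescript
interface SecretHit {
  entityType: string;
  match: string;
}

// Illustrative patterns only; not taken from Bulkhead's pattern set.
const AWS_ACCESS_KEY = /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g;
const TOKEN = /[A-Za-z0-9+\/=_-]{20,}/g;

// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / s.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

function findSecrets(text: string, minEntropy = 4.0): SecretHit[] {
  const hits: SecretHit[] = [];
  // Provider-specific patterns first: they give a precise entity type.
  const known: string[] = text.match(AWS_ACCESS_KEY) ?? [];
  for (const match of known) hits.push({ entityType: "AWS_ACCESS_KEY", match });
  // Generic fallback: long tokens whose characters look random.
  const tokens: string[] = text.match(TOKEN) ?? [];
  for (const token of tokens) {
    if (known.indexOf(token) === -1 && shannonEntropy(token) >= minEntropy) {
      hits.push({ entityType: "HIGH_ENTROPY_STRING", match: token });
    }
  }
  return hits;
}
```

The entropy threshold is a tuning knob: set too low it flags ordinary identifiers, set too high it misses short random tokens.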
Prompt Injection
16 regex patterns + heuristic similarity matching against known attack phrases. Catches: "ignore previous instructions", role-play attacks, DAN mode, jailbreak attempts.
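The README does not say which similarity measure the heuristic uses. As one possible reading, the sketch below combines a couple of regex patterns with Jaccard token overlap against a short list of known attack phrases. The patterns, the phrase list, and the names `tokenize`, `jaccard`, and `injectionScore` are all hypothetical.

```typescript
// Illustrative patterns and phrases; not Bulkhead's actual lists.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions/i,
  /you\s+are\s+now\s+(?:in\s+)?dan\b/i,
];
const KNOWN_ATTACKS: string[] = [
  "ignore previous instructions",
  "disregard the system prompt",
  "pretend you have no restrictions",
];

function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((t) => {
    if (setB.has(t)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns 1 for a regex hit, otherwise the best similarity between any
// phrase-sized window of the text and any known attack phrase.
function injectionScore(text: string): number {
  if (INJECTION_PATTERNS.some((p) => p.test(text))) return 1;
  const words = tokenize(text);
  let best = 0;
  for (const phrase of KNOWN_ATTACKS) {
    const target = tokenize(phrase);
    const last = Math.max(words.length - target.length, 0);
    for (let i = 0; i <= last; i++) {
      best = Math.max(best, jaccard(words.slice(i, i + target.length), target));
    }
  }
  return best;
}
```

The similarity pass exists to catch paraphrases the regexes miss, such as "disregard your system prompt"; a caller would compare the score against a threshold of its own choosing.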
System Prompt Leakage
7 regex patterns + heuristic matching. Catches: "reveal your system prompt", "repeat everything above", extraction techniques.
Quick Start
Install from source
git clone https://github.com/your-org/bulkhead.git
cd bulkhead
npm install
npm run build
VS Code Extension
Open the bulkhead/ folder in VS Code
Press F5 to launch the Extension Development Host
Open any file -- Bulkhead auto-scans on edit (regex layer, debounced)
Use the command palette:
Bulkhead: Scan File -- regex scan
Bulkhead: Deep Scan -- regex + BERT + LLM cascade
HTTP REST Server
# Development
cd packages/server && npm run dev
# Production
npm run build && node packages/server/dist/main.js
# With API key authentication
BULKHEAD_API_KEY=my-secret-key node packages/server/dist/main.js
# Scan text
curl -X POST http://localhost:3000/v1/scan \
-H "Content-Type: application/json" \
-d '{"text": "My SSN is 123-45-6789"}'
# Scan and redact
curl -X POST http://localhost:3000/v1/redact \
-H "Content-Type: application/json" \
-d '{"text": "My SSN is 123-45-6789"}'
MCP Server (Claude Code)
Add to .mcp.json in your project root:
{
"mcpServers": {
"bulkhead": {
"command": "npx",
"args": ["tsx", "packages/server/src/mcp/index.ts"]
}
}
}
MCP Server (GitHub Copilot CLI)
Add to .github/copilot/mcp.json:
{
"mcpServers": {
"bulkhead": {
"command": "npx",
"args": ["tsx", "packages/server/src/mcp/index.ts"]
}
}
}
Docker
# HTTP server on port 3000
docker compose up bulkhead
# MCP server on stdio
docker compose run --rm -i bulkhead-mcp
Use as a library
import { GuardrailsEngine, PiiGuard, SecretGuard } from "@bulkhead-ai/core";
const engine = new GuardrailsEngine();
engine.addGuard(new PiiGuard());
engine.addGuard(new SecretGuard());
const results = await engine.analyze("Email: john@example.com, Key: AKIAIOSFODNN7EXAMPLE");
// results[0].detections -> [{ entityType: "EMAIL_ADDRESS", source: "regex", disposition: "confirmed", ... }]
// results[1].detections -> [{ entityType: "AWS_ACCESS_KEY", source: "regex", disposition: "confirmed", ... }]
Configuration
VS Code Settings
Setting | Default | Description |
|
| Master toggle |
|
| Delay before auto-scan on edit |
|
| PII detection |
|
| Secret detection |
|
| Prompt injection detection |
|
| Enable BERT model (downloads ~29MB on first use) |
|
| BERT confidence below which detections escalate to LLM |
|
| Sentences of context sent to LLM for disambiguation |
Environment Variables (Server)
Variable | Default | Description |
|
| HTTP server port |
|
| HTTP server bind address |
|
| Log level: |
| (none) | API key for authentication. When set, all |
| (disabled) | CORS origin. Set to |
|
| Maximum request body size in bytes (default 1MB) |
|
| Enable PII guard |
|
| Enable secret guard |
|
| Enable injection guard |
|
| Enable BERT model for Layer 2 |
|
| HuggingFace model ID |
|
| BERT confidence threshold for LLM escalation |
|
| LLM provider: |
| (none) | API key for the LLM provider |
| (none) | Endpoint URL for custom LLM provider |
Testing
npm test # Run all 185 tests
npm run test:watch # Watch mode (core package)
npm run lint # Type-check all packages
The test suite includes an adversarial suite covering evasion techniques, false positive resistance, mixed-threat documents, performance benchmarks with ASCII bar charts, and a "kitchen sink" document that triggers all threat types simultaneously.
Documentation
Why Bulkhead? -- Business case, real-world scenarios, comparison with alternatives
Architecture -- Cascading classifier design, component map, entry points
Policy Guide -- Policies, risk assessment, test data detection, compliance overlays
Deployment -- Five deployment scenarios with configuration and examples
API Reference -- HTTP endpoints, MCP tools, environment variables
Guards -- Guard implementation details
Patterns -- Detection pattern reference
Testing -- Test strategy and adversarial suite
How-To -- Usage guides and library integration
Attribution
Bulkhead derives detection patterns and guard architecture from two open-source projects. The cascading classifier, BERT integration, LLM disambiguation, VS Code extension, server architecture, and deduplication logic are independently developed. See ATTRIBUTION.md for full details and NOTICES for original copyright notices.
Microsoft Presidio (MIT) -- PII detection patterns, checksum algorithms, entity taxonomy
HAI-Guardrails (MIT) -- Guard architecture, detection tactics, security patterns
Contributing
See CONTRIBUTING.md for guidelines.
License