healthsec-mcp

by MichaelEnny

MCP connector for clinical-AI security evaluation: adversarial robustness, privacy leakage, and standards-compliance tools composed into a Security Posture Score, callable directly by AI agents.

Full paper structure, research contribution, and milestones: ../STRUCTURE.md. Practical tool-by-tool usage reference: docs/TOOLS.md. Illustrative worked example: examples/end_to_end_security_evaluation.md. Two ways to run this: locally via Claude Desktop (below), or via the agent-usefulness study harness (reproduce/agent_usefulness_study/).

Status: M1-M5 implemented, with two exceptions noted below. All 10 tools (the 9 planned + get_audit_log, added to close a gap found during M4) are registered on the server, lint-clean (ruff), type-clean (mypy), and covered by CI (.github/workflows/ci.yml).

The first exception is a documented open limitation: boundary attack's flip_rate/auroc_drop don't reproduce the reference numbers even though auroc_clean matches exactly (see TECHNICAL_DESIGN.md section 12; golden test marked xfail(strict=True)). Everything else that has a reference target is golden-verified -- exactly for the deterministic tools (M3 standards, M4 compute_sps), within tolerance for the ML-based ones (M1 run_fgsm). run_membership_inference (M2) has no published golden target (see TECHNICAL_DESIGN.md section 4.3), so it's covered by property tests only.

The second exception is the agent-usefulness study. It has been run once (6 tasks, real transcripts), but its rating data does not match the protocol's design; see reproduce/agent_usefulness_study/PROTOCOL.md's "Actual execution status" section for the full, honest account. The "2 raters" were, in turn, one person filling both columns, then one person plus ChatGPT as an LLM-judge cross-check. Neither is two independent human raters. A real second human rater still needs to score transcripts/blinded/ from scratch before this data can support the paper's agent-usefulness claim.

Everything else in M5 (docs/TOOLS.md, CI, MCP contract tests, examples/) is done. See Milestones in STRUCTURE.md.

Layout

healthsec-mcp/
├── src/healthsec_mcp/
│   ├── adversarial/       # fgsm.py, boundary.py, plausibility.py -- implemented (M1)
│   ├── privacy/           # membership_inference.py -- implemented (M2)
│   ├── standards/         # attack_coverage.py, rbac.py, audit.py, compliance.py -- implemented (M3)
│   ├── tools/             # adversarial_tools.py, privacy_tools.py, standards_tools.py, sps_tools.py
│   ├── io/                # schemas.py -- FeatureBatch (n<=100), FeaturePool (n<=5000)
│   ├── registry.py        # in-session model-handle registry: handle -> model object map only
│   ├── authz.py           # authorization gate (M4) -- wraps registry.resolve(), records every
│   │                      # attempt (success or denial) to audit.py, regardless of outcome
│   ├── audit.py           # append-only log of this connector's own tool calls (M4)
│   ├── sps.py             # Security Posture Score composer (M4) -- weights are a parameter,
│   │                      # not hardcoded, per the open design question this resolved
│   ├── report.py          # generate_security_report's logic (M5) -- never states a deployment
│   │                      # recommendation unless compute_sps's output was actually supplied
│   ├── server.py          # FastMCP server, registers all 10 implemented tools
│   └── local_datasets.py  # shared loader/registrar for the 4 local models -- the one place
│                          # CONNECTOR_DATA_ROOT is resolved, used by study_server.py and
│                          # scripts/run_local_server.py so path logic isn't duplicated per script
├── tests/
│   ├── unit/              # per-module unit tests (synthetic models + pure-function cases, fast) --
│   │                      # includes test_sps.py (exact SPS=78.9 golden match), test_authz.py/
│   │                      # test_audit.py (gate + audit-trail behavior), test_report.py
│   ├── golden/            # regression tests against reference/ (validated result tables) --
│   │                      # standards tools + compute_sps match exactly; ML-attack tools match
│   │                      # within tolerance (boundary attack currently xfails, see Status)
│   ├── contract/          # MCP tool-schema contract tests (M5) -- every tool has a real
│   │                      # description + valid JSON Schema; spot-checks required params
│   └── fault_injection/   # bad models/inputs, cap enforcement (FGSM n≤100, MI pool≤5k)
├── reproduce/
│   ├── diagnose_boundary_discrepancy.py     # fast standalone RNG/AUROC diagnostic
│   ├── diagnose_nearest_vs_first.py         # confirms nearest vs. first-found pick different points
│   ├── diagnose_nearest_full_compare.py     # confirms they still give identical attack outcomes
│   ├── run_attacks.py     # run FGSM + boundary attack against any of the 4 local models
│   ├── results/           # tracked: JSON output of run_attacks.py (aggregate metrics, no PHI)
│   └── agent_usefulness_study/    # (M5) PROTOCOL.md + full execution harness (study_server.py,
│                                  # run_study.py, blind_transcripts.py, analyze_ratings.py) --
│                                  # has been run once, see Status and PROTOCOL.md
├── scripts/
│   └── run_local_server.py  # stdio entry point for Claude Desktop -- pre-registers one of the
│                             # 4 local models under a fixed handle, see "Running with Claude Desktop"
├── reference/              # tracked: validated result tables (aggregate metrics, no PHI)
├── examples/               # (M5) end_to_end_security_evaluation.md -- illustrative, hand-authored
│                           # transcript with real reference numbers, not a live-recorded session
├── docs/                   # (M5) TOOLS.md -- practical tool-by-tool usage reference
└── .github/workflows/      # (M5) ci.yml -- ruff + mypy + pytest, path-scoped to this connector

../data/                    # sibling to this package, gitignored (see ../.gitignore)
├── models/                 # icu_mortality_rf.pkl, ed_admission_rf.pkl, ckd_rf.pkl, wdbc_rf.pkl, *_meta.json
├── mimic/                  # icu_cohort/, ed_cohort/ train+test splits
├── ckd/processed/          # train+test splits
└── breast_cancer/processed/  # train+test splits
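
The size caps noted in the layout (FeatureBatch n<=100, FeaturePool n<=5000) can be enforced at construction time, before any model is touched. The following is a minimal sketch of that idea, not the contents of io/schemas.py: the class names FeatureBatchSketch/FeaturePoolSketch, the rows field, and the error messages are all hypothetical, and the real schemas may be built on a different validation library.

```python
from dataclasses import dataclass
from typing import List

MAX_BATCH_ROWS = 100   # cap stated in the layout for FeatureBatch
MAX_POOL_ROWS = 5000   # cap stated in the layout for FeaturePool


def _check_rows(rows: List[List[float]], cap: int, name: str) -> None:
    """Reject empty, oversized, or ragged inputs (hypothetical validation rules)."""
    if not rows:
        raise ValueError(f"{name} must contain at least one row")
    if len(rows) > cap:
        raise ValueError(f"{name} has {len(rows)} rows; the cap is {cap}")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError(f"{name} rows must all have the same number of features")


@dataclass(frozen=True)
class FeatureBatchSketch:
    """Illustrative stand-in for a capped attack batch."""
    rows: List[List[float]]

    def __post_init__(self) -> None:
        _check_rows(self.rows, MAX_BATCH_ROWS, "FeatureBatch")


@dataclass(frozen=True)
class FeaturePoolSketch:
    """Illustrative stand-in for a capped membership-inference pool."""
    rows: List[List[float]]

    def __post_init__(self) -> None:
        _check_rows(self.rows, MAX_POOL_ROWS, "FeaturePool")
```

Validating in the constructor means an oversized batch fails with a clear message at the tool boundary, which is the behavior the fault_injection tests are described as checking.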

Running the tests

This project's path (deeply nested under the thesis directory tree) exceeds Windows' 260-character limit for scikit-learn's compiled binaries, so the venv must live at a short path outside the project. Replace <you> below with your actual Windows username (e.g. C:\Users\wisdo\...) -- it's a placeholder, not literal text to paste:

uv venv --python 3.11 "C:\Users\<you>\.venvs\healthsec-mcp"
uv pip install -e ".[dev]" --python "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe"

# fast suites (~25s) -- unit, fault-injection, and MCP contract tests
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" -m pytest tests/unit tests/fault_injection tests/contract -v

# golden regression against ../data/ (~7 min -- LIME explains each sample individually)
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" -m pytest tests/golden -v -s

# lint + type-check (instant, what CI runs)
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" -m ruff check src/ tests/ reproduce/ scripts/
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" -m mypy src/ scripts/run_local_server.py

../data/models/ and ../data/mimic/ must be populated first (see Data below). CI (.github/workflows/ci.yml) runs pytest tests/ directly, letting the MIMIC-IV-dependent golden tests skip automatically via their own skipif. ../data/ is gitignored and never present in CI.

Running attacks against a model

reproduce/run_attacks.py runs FGSM + boundary attack against any of the four locally available models. icu_mortality/ed_admission are the regression baseline (compared against reference/ in the golden tests); ckd/wdbc have no published ground truth -- this is new evaluation used to demonstrate the tools generalize beyond the two validated cohorts.

# one dataset at a time
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" reproduce\run_attacks.py --dataset ckd
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" reproduce\run_attacks.py --dataset wdbc
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" reproduce\run_attacks.py --dataset icu_mortality
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" reproduce\run_attacks.py --dataset ed_admission

# all four in one run (slowest -- icu_mortality/ed_admission each have up to
# 100 samples for boundary attack, similar runtime to the golden tests)
& "C:\Users\<you>\.venvs\healthsec-mcp\Scripts\python.exe" reproduce\run_attacks.py --dataset all

Each run writes its results to reproduce/results/<dataset>_attack_results.json (tracked in git, overwritten on each run) so nothing is lost once the terminal scrolls past it.
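
To inspect those files after a run, a small loader along the following lines may help. It is a sketch that relies only on the <dataset>_attack_results.json naming stated above; the function name load_attack_results is hypothetical, and nothing is assumed about the JSON schema inside each file.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_attack_results(results_dir: Path) -> Dict[str, Any]:
    """Map dataset name -> parsed JSON for every *_attack_results.json file."""
    suffix = "_attack_results.json"
    loaded: Dict[str, Any] = {}
    if not results_dir.is_dir():
        return loaded  # nothing has been run yet
    for path in sorted(results_dir.glob(f"*{suffix}")):
        dataset = path.name[: -len(suffix)]  # strip the suffix to recover the dataset name
        loaded[dataset] = json.loads(path.read_text(encoding="utf-8"))
    return loaded
```

Calling load_attack_results(Path("reproduce/results")) from the connector root returns one entry per dataset that has been run; printing sorted(payload) for each entry is a quick way to see which top-level keys a given run produced.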

Running with Claude Desktop

This connects healthsec-mcp to Claude Desktop as a local MCP server -- Claude Desktop spawns it as a subprocess and talks to it over stdio. This is the intended way to actually use the connector day to day; the other supported path is the agent-usefulness study harness (reproduce/agent_usefulness_study/), which spawns the same server programmatically to script Condition A/B comparisons instead.

Two categories of tools, two setup paths

The 7 standards/SPS/report tools (assess_attack_coverage, check_rbac, score_audit_completeness, score_compliance, compute_sps, generate_security_report, get_audit_log) don't touch a registered model at all -- they score or compose evidence you pass directly in the tool call. These work with zero setup the moment Claude Desktop can launch the server at all.

The 3 model-touching tools (run_fgsm, run_boundary_attack, run_membership_inference) need a model registered under a model_handle before Claude can call them. This is the part that trips people up: Claude Desktop spawns healthsec-mcp as a brand-new subprocess with an empty registry every time -- there's no interactive Python session inside that subprocess for you to call registry.register() from after the fact. scripts/run_local_server.py solves this: it's a small wrapper that loads one of your 4 local models, registers it under a fixed, predictable handle, and then starts the same server -- point Claude Desktop at this script instead of the bare healthsec-mcp command if you need the model-touching tools.
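
The pattern described above (load, register under a fixed handle, then serve) can be sketched as follows. This is an illustration under stated assumptions, not the contents of scripts/run_local_server.py: the in-memory _REGISTRY dict, register(), resolve(), and the load_model/serve callables are hypothetical stand-ins for the connector's registry, dataset loader, and server start-up.

```python
import argparse
from typing import Callable, Dict, List, Optional

DATASETS = ("icu_mortality", "ed_admission", "ckd", "wdbc")

# Hypothetical stand-in for the connector's registry: handle -> model object.
_REGISTRY: Dict[str, object] = {}


def register(handle: str, model: object) -> None:
    _REGISTRY[handle] = model


def resolve(handle: str) -> object:
    if handle not in _REGISTRY:
        raise KeyError(f"model_handle {handle!r} is not registered")
    return _REGISTRY[handle]


def main(
    argv: Optional[List[str]],
    load_model: Callable[[str], object],
    serve: Callable[[], None],
) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", choices=DATASETS, required=True)
    args = parser.parse_args(argv)
    # The handle equals the dataset name, so prompts can reference it predictably.
    register(args.dataset, load_model(args.dataset))
    # Registration happens before serving: the subprocess has no later chance to do it.
    serve()
```

Passing load_model and serve in as callables keeps the sketch testable without a real model or a running server; a real wrapper would wire in the actual loader and the server's run function instead.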

Setup

1. Make sure the venv is set up (see "Running the tests" above if not).

2. Decide which path you need:

  • Only need the standards/SPS/report tools? Skip to step 4 and point Claude Desktop at healthsec-mcp directly (or the equivalent python -m healthsec_mcp.server) -- no dataset flag needed.

  • Need the model-touching tools too? Use scripts/run_local_server.py with one of --dataset icu_mortality|ed_admission|ckd|wdbc. This requires ../data/models/ and the matching processed dataset to already be populated (see "Data" below) -- icu_mortality/ed_admission need PhysioNet-credentialed MIMIC-IV data; ckd/wdbc are public and work out of the box if you've run the setup in "Running the tests."

3. Find (or create) Claude Desktop's config file:

%APPDATA%\Claude\claude_desktop_config.json

On Windows that's typically C:\Users\<you>\AppData\Roaming\Claude\claude_desktop_config.json. If the file doesn't exist yet, create it with just {"mcpServers": {}} and add your entry inside.

4. Add an entry under mcpServers. Replace <you> with your actual Windows username and adjust the repo path to match where you've cloned this project. Model-touching setup (recommended default -- gives you all 10 tools):

{
  "mcpServers": {
    "healthsec": {
      "command": "C:\\Users\\<you>\\.venvs\\healthsec-mcp\\Scripts\\python.exe",
      "args": [
        "C:\\Users\\<you>\\OneDrive\\Desktop\\University_of_the_Cumberlands\\Courses\\Thesis_2026_Proposing\\Papers_and_Code\\ai-agents-connectors\\01-healthcare-ai-security-connector\\healthsec-mcp\\scripts\\run_local_server.py",
        "--dataset",
        "icu_mortality"
      ]
    }
  }
}

Standards/report-tools-only setup (no model registration, works without ../data/ at all):

{
  "mcpServers": {
    "healthsec": {
      "command": "C:\\Users\\<you>\\.venvs\\healthsec-mcp\\Scripts\\healthsec-mcp.exe"
    }
  }
}

JSON requires double backslashes in Windows paths (\\, not \) -- copy the pattern above exactly, don't use single backslashes.

5. Restart Claude Desktop completely (quit from the system tray, not just close the window) so it picks up the config change.

6. Verify it connected. In a new Claude Desktop chat, look for a tools/connector icon indicating healthsec is available, or just ask Claude something that requires a tool, e.g. "What MCP tools do you have available from healthsec?" If nothing shows up, check Claude Desktop's logs (Help menu, or %APPDATA%\Claude\logs\) for a subprocess spawn error -- the most common cause is a typo'd path or JSON syntax error in the config file.
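
If hand-editing the JSON in step 4 proves error-prone, the entry can be generated instead, letting json.dumps handle the backslash escaping. This is a minimal sketch: build_entry and merge_config are hypothetical helper names, the paths are whatever you pass in, and nothing here is part of the connector itself.

```python
import json
from pathlib import PureWindowsPath
from typing import Any, Dict, List, Optional


def build_entry(
    command: str,
    script: Optional[str] = None,
    dataset: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one mcpServers entry; script/dataset are only needed for model-touching tools."""
    entry: Dict[str, Any] = {"command": str(PureWindowsPath(command))}
    if script is not None:
        args: List[str] = [str(PureWindowsPath(script))]
        if dataset is not None:
            args += ["--dataset", dataset]
        entry["args"] = args
    return entry


def merge_config(existing: Dict[str, Any], name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with the entry added, keeping other servers intact."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged
```

Writing json.dumps(merge_config(current, "healthsec", entry), indent=2) to the config file produces the doubled backslashes automatically, and merging into the existing dictionary avoids overwriting any other servers already configured.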

Using it

If you registered a model (step 2's model-touching path), reference the handle directly in your prompt -- it equals the dataset name you chose, e.g.:

Using model_handle="icu_mortality", check whether this model is vulnerable to small adversarial perturbations.

For the standards/report tools, just supply the evidence directly (Claude will ask for it, or you can paste it inline) -- see docs/TOOLS.md for a worked example of every tool, including the exact input shapes each one expects.

Troubleshooting

  • Claude says it has no tools from healthsec -- almost always a config path typo, or Claude Desktop wasn't fully restarted. Check the logs mentioned in step 6.

  • Model-touching tools fail with "model_handle is not registered" -- you're pointed at the bare healthsec-mcp command instead of scripts/run_local_server.py, or the --dataset you chose doesn't match the handle you referenced in your prompt (they're the same string, e.g. --dataset ckd gives you model_handle="ckd", not anything else).

  • run_local_server.py crashes on startup -- almost always a missing file under ../data/. icu_mortality/ed_admission specifically require the PhysioNet-credentialed MIMIC-IV data (see "Data" below); ckd/wdbc should work if you've completed the venv setup in "Running the tests," since those two are the ones with no such restriction.

MCP tools

| Tool | Status | Authz-gated? | Input | Output |
| --- | --- | --- | --- | --- |
| run_fgsm | implemented, golden-verified | yes | model, batch (n≤100), ε | flip rate, AUROC drop, plausibility rate |
| run_boundary_attack | implemented (known limitation, see Status) | yes | model, batch | flip rate, drop, mean steps |
| run_membership_inference | implemented, no published golden target | yes | model, member_pool + nonmember_pool (≤5k each) | MI accuracy/AUROC, privacy risk, patients-at-risk (direct count, not extrapolated) |
| assess_attack_coverage | implemented, golden-verified exactly | no | control set | PASS/PARTIAL/FAIL + mitigated/tested counts + coverage % |
| check_rbac | implemented, golden-verified exactly | no | already-executed probe results | pass count + enforcement rate |
| score_audit_completeness | implemented, golden-verified exactly | no | audit log entries | completeness rate |
| score_compliance | implemented, golden-verified exactly | no | HIPAA/FHIR checklist | per-standard % + overall % |
| compute_sps | implemented, golden-verified exactly (SPS=78.9 on reference inputs) | no | the 9 subscore inputs above | composite SPS 0–100, deployment tier, per-dimension breakdown |
| generate_security_report | implemented | no | any subset of the above tools' outputs | Markdown + structured report; only states a deployment recommendation if sps was supplied |
| get_audit_log | implemented | no | (none) | this session's full audit trail |

"Authz-gated" tools resolve model_handle through authz.authorize(), which records every attempt to the audit trail whether it succeeds or is denied. Standards, compute_sps, generate_security_report, and get_audit_log don't touch a model at all -- they score/compose evidence the caller already has -- so they aren't gated.

See docs/TOOLS.md for a worked example of every tool plus a full end-to-end workflow.
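
The gating behavior described above, where every attempt is recorded whether it succeeds or is denied, can be sketched as below. This is an illustration, not the code in authz.py: GateSketch, AuthorizationError, the SHA-256 input hash, and the detail strings are assumptions; only the six entry field names come from the "Audit trail" section.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


class AuthorizationError(Exception):
    """Raised when a model_handle cannot be resolved (hypothetical name)."""


class GateSketch:
    def __init__(self, registry: Dict[str, Any]) -> None:
        self._registry = registry
        self.audit: List[Dict[str, Any]] = []  # only ever appended to in this sketch

    def authorize(self, tool: str, model_handle: str, payload: Dict[str, Any]) -> Any:
        authorized = model_handle in self._registry
        # Assumption: inputs are hashed (SHA-256 of canonical JSON) rather than stored raw.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        # Record first, then decide: denials leave a trace exactly like successes.
        self.audit.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "model_handle": model_handle,
            "authorized": authorized,
            "input_hash": digest,
            "detail": "ok" if authorized else "handle not registered",
        })
        if not authorized:
            raise AuthorizationError(f"model_handle {model_handle!r} is not registered")
        return self._registry[model_handle]
```

Appending the entry before raising is the key design point: the denial path cannot skip logging, because logging is not conditional on the outcome.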

Audit trail

Every authz-gated tool call is recorded to audit.default_audit_log in-memory, for the life of the server process: {timestamp, tool, model_handle, authorized, input_hash, detail}. This is the connector's own non-repudiation record; retrieve it via the get_audit_log tool. You can audit the auditor: standards.audit.score_audit_completeness will happily score default_audit_log.entries() against the same completeness check it runs on any other log, though the field names differ (this log's schema is authz.py's own, not the validated methodology's REQUIRED_FIELDS).
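
As a rough illustration of what a completeness check over such a log involves, the sketch below scores the fraction of entries that carry every required field. It is not the connector's score_audit_completeness: the function name completeness_rate, the treatment of None and empty strings as missing, and the 0.0 result for an empty log are all assumptions, and the required field list is passed in because the validated methodology's REQUIRED_FIELDS are not reproduced here.

```python
from typing import Any, Dict, Iterable, Sequence


def completeness_rate(
    entries: Iterable[Dict[str, Any]],
    required_fields: Sequence[str],
) -> float:
    """Fraction of entries in which every required field is present and non-empty."""
    items = list(entries)
    if not items:
        return 0.0  # assumption: an empty log scores zero rather than raising
    complete = sum(
        1
        for entry in items
        # None and "" count as missing; False is a valid value (e.g. authorized=False).
        if all(entry.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(items)
```

Scoring this connector's own log would mean passing its six field names (timestamp, tool, model_handle, authorized, input_hash, detail) as required_fields, which is why the schema difference noted above matters.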

Tech stack

Python 3.11 (not 3.13 — passlib/bcrypt incompatibility; use import bcrypt directly), mcp (FastMCP), scikit-learn, numpy/pandas, FastAPI, pytest. License: Apache 2.0.

Data

MIMIC-IV ICU + ED cohorts and trained RF models. reference/ (this directory) holds the validated result tables — small, aggregate, no PHI, safe to commit. ../data/ (one level up, sibling to this package) holds the actual models and MIMIC-IV-derived data used to run the golden tests locally — this requires PhysioNet credentialing and is gitignored at the connector root (../.gitignore); it must never be committed or published. Override its location with CONNECTOR_DATA_ROOT if needed.
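
The lookup order described above (CONNECTOR_DATA_ROOT if set, otherwise the sibling ../data/ directory) can be sketched as follows, together with a pre-flight check for missing inputs. This is not the code in local_datasets.py: resolve_data_root and missing_inputs are hypothetical names, and the dataset-to-directory mapping is inferred from the layout listing rather than confirmed.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Inferred from the layout listing; the real loader may map these differently.
DATA_DIRS = {
    "icu_mortality": Path("mimic") / "icu_cohort",
    "ed_admission": Path("mimic") / "ed_cohort",
    "ckd": Path("ckd") / "processed",
    "wdbc": Path("breast_cancer") / "processed",
}


def resolve_data_root(package_dir: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer CONNECTOR_DATA_ROOT; fall back to the data/ directory beside the package."""
    source = os.environ if env is None else env
    override = source.get("CONNECTOR_DATA_ROOT")
    if override:
        return Path(override).expanduser()
    return package_dir.parent / "data"


def missing_inputs(data_root: Path, dataset: str) -> List[Path]:
    """Return expected paths that are absent; an empty list means the dataset looks ready."""
    expected = [
        data_root / "models" / f"{dataset}_rf.pkl",
        data_root / DATA_DIRS[dataset],
    ]
    return [path for path in expected if not path.exists()]
```

Running missing_inputs before starting the server turns a start-up crash into a specific list of absent paths, which is usually the faster way to find out which part of ../data/ still needs populating.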

Safety

Adversarial and privacy-attack tools are scoped to a user's own models and gated behind an authorization check (authz.py) — a model only becomes usable by being registered directly through registry.py's Python API in the user's own script; there is no MCP tool that lets an agent register or guess a handle. Every authorized and denied attempt is recorded to the audit trail (audit.py) — see "Risks / limitations / ethics" in ../STRUCTURE.md.
