
MCP-Grounded 🩺

A multi-agent pipeline for medical image classification with verification-aware abstention, coordinated via the Model Context Protocol (MCP).

"Instead of always guessing, the AI says: I'm not confident enough, I'll skip this one."


What is this?

MCP-Grounded is a 4-agent AI pipeline that classifies skin lesion images from the HAM10000 dataset. What makes it novel: the final agent can abstain from answering when its confidence falls below a threshold τ, which is intended to make it safer for medical use.

All four agents are real MCP tools, not just described as such.



Pipeline

Skin lesion image
         │
         ▼
┌─────────────────┐
│  BiomedCLIP     │  Agent 1: Extract 512-dim embedding
│  (Extract)      │
└────────┬────────┘
         │
         ▼  ┌──────────────────────────────────────────┐
            │              MCP Server                  │
            │                                          │
            │  ┌──────────┐       ┌──────────┐         │
            │  │ Retrieve │──────▶│  Rerank  │         │
            │  │ Agent 2  │       │  Agent 3 │         │
            │  └──────────┘       └────┬─────┘         │
            │                          │               │
            │                 ┌────────▼───────────┐   │
            │                 │  Verify / Abstain  │   │
            │                 │     Agent 4        │   │
            │                 └────────┬───────────┘   │
            └──────────────────────────┼───────────────┘
                                       │
                         ┌─────────────┴─────────────┐
                         │                           │
                     conf ≥ τ                    conf < τ
                         │                           │
                      PREDICT                     ABSTAIN
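
The final branch of the diagram can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes the confidence `conf` is the largest class probability, and the names `verify_or_abstain` and `CLASSES` are hypothetical.

```python
from typing import Optional, Sequence

# The seven HAM10000 class labels listed in this README.
CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def verify_or_abstain(probs: Sequence[float], tau: float) -> Optional[str]:
    """Return the predicted label if confidence >= tau, else None (abstain).

    Assumption: confidence is the largest class probability. The actual
    server may define confidence differently.
    """
    if len(probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    best = max(range(len(probs)), key=lambda i: probs[i])
    conf = probs[best]
    return CLASSES[best] if conf >= tau else None


# Made-up probability vectors, for illustration only.
confident = [0.02, 0.03, 0.05, 0.01, 0.80, 0.08, 0.01]
uncertain = [0.10, 0.10, 0.15, 0.05, 0.30, 0.25, 0.05]
print(verify_or_abstain(confident, tau=0.7))  # mel
print(verify_or_abstain(uncertain, tau=0.7))  # None (abstain)
```

Returning `None` rather than a low-confidence label keeps the abstention explicit, so a caller cannot mistake it for a prediction.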

Results

Retrieval Quality

| Metric | Value |
|---|---|
| Recall@1 | 77.9% |
| Recall@5 | 93.5% |
| Recall@10 | 96.3% |
| Recall@50 | 99.2% |
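
For readers unfamiliar with the metric, here is a dependency-free sketch of Recall@K. The README does not state the exact definition used, so this assumes a common one: a query counts as a hit if at least one of its K nearest neighbours (by cosine similarity) shares its label. The function names and the toy 2-D vectors are illustrative only.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(queries, query_labels, gallery, gallery_labels, k: int) -> float:
    """Fraction of queries with at least one same-label item in the top-k."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(range(len(gallery)),
                        key=lambda i: cosine(q, gallery[i]), reverse=True)
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy 2-D "embeddings"; the real BiomedCLIP embeddings are 512-dimensional.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_labels = ["nv", "mel", "bcc", "bcc"]
queries = [[1.0, 0.05], [0.05, 1.0]]
query_labels = ["mel", "bcc"]

for k in (1, 2):
    print(k, recall_at_k(queries, query_labels, gallery, gallery_labels, k))
```

In the toy data the first query's nearest neighbour has the wrong label, so Recall@1 is 0.5 while Recall@2 reaches 1.0, which mirrors how the reported values rise with K.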

Verification-Aware Abstention (key result)

| Threshold τ | Coverage | Selective Accuracy |
|---|---|---|
| 0.0 (answer all) | 100.0% | 67.0% |
| 0.5 | 96.9% | 69.0% |
| 0.6 | 83.4% | 77.0% |
| 0.7 | 52.0% | 91.3% |
| 0.8 | 4.9% | 98.6% |

At τ = 0.7, selective accuracy improves by 24.3 percentage points over the no-abstention baseline (from 67.0% to 91.3%), while coverage falls to 52.0%.
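
The two columns can be computed as sketched below. This is an illustration under assumptions, not the experiment code: coverage is taken as the share of samples with confidence at or above τ, selective accuracy as the accuracy on that answered subset, and the sample data is made up.

```python
from typing import Sequence, Tuple


def coverage_and_selective_accuracy(
    confidences: Sequence[float], correct: Sequence[bool], tau: float
) -> Tuple[float, float]:
    """Coverage = share of samples answered; selective accuracy = accuracy on those."""
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= tau]
    coverage = len(answered) / len(confidences)
    # With no answered samples, selective accuracy is undefined; report NaN.
    selective_acc = sum(answered) / len(answered) if answered else float("nan")
    return coverage, selective_acc


# Made-up confidences and correctness flags, for illustration only.
confidences = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45]
correct = [True, True, True, False, True, False]

for tau in (0.0, 0.5, 0.7):
    cov, acc = coverage_and_selective_accuracy(confidences, correct, tau)
    print(f"tau={tau:.1f}  coverage={cov:.2f}  selective_acc={acc:.2f}")
```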

Risk–Coverage Curve

[Figure 2: risk-coverage curve (risk_coverage.png)]

As the confidence threshold rises, coverage drops but selective accuracy climbs sharply, which indicates that abstention lets the system trade coverage for reliability on the cases it does answer.
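
The points on such a curve can be derived directly from the reported abstention results. The sketch below assumes the usual selective-classification definition, where risk is the error rate on the answered subset (100% minus selective accuracy); the README does not spell out its definition of risk.

```python
# (threshold, coverage %, selective accuracy %) as reported in the abstention results.
reported = [
    (0.0, 100.0, 67.0),
    (0.5, 96.9, 69.0),
    (0.6, 83.4, 77.0),
    (0.7, 52.0, 91.3),
    (0.8, 4.9, 98.6),
]

# Assumed definition: selective risk = 100 - selective accuracy.
risk_coverage = [(cov, round(100.0 - acc, 1)) for _, cov, acc in reported]

for cov, risk in risk_coverage:
    print(f"coverage={cov:5.1f}%  risk={risk:4.1f}%")
```

Under that assumption, risk falls from 33.0% at full coverage to 8.7% at 52.0% coverage.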

Calibration

| Metric | Value |
|---|---|
| ECE before temperature scaling | 0.191 |
| ECE after temperature scaling | 0.185 |
| Learned temperature T | 0.944 |
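
For context, the sketch below shows how temperature scaling and expected calibration error (ECE) are commonly computed. It is an assumption-laden illustration: the number of bins (10, equal width) and the use of the top-class probability as confidence are not stated in this README, and the inputs are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature (T < 1 sharpens, T > 1 softens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE: sample-weighted mean |accuracy - confidence| over equal-width bins."""
    bins: List[list] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


logits = [2.0, 1.0, 0.1]  # made-up logits
print(max(softmax(logits)))         # top-class confidence at T = 1
print(max(softmax(logits, 0.944)))  # slightly higher with the reported T = 0.944

# Made-up example: two predictions at 0.9 confidence, only one correct.
print(expected_calibration_error([0.9, 0.9], [True, False]))  # about 0.4
```

A temperature below 1 sharpens the distribution slightly, which is consistent with the small change in ECE reported here.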


Dataset

HAM10000: 10,015 dermoscopic images across 7 skin lesion categories:

akiec · bcc · bkl · df · mel · nv · vasc

Split: 70% train / 15% validation / 15% test (stratified).
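
A stratified split keeps each class's share roughly equal across the three subsets. The dependency-free sketch below only illustrates the idea; the project lists scikit-learn in its requirements, so it may well use `train_test_split(..., stratify=labels)` instead. The function name, seed, and toy labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], fractions=(0.70, 0.15, 0.15), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into train/val/test, preserving per-class proportions."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Toy labels: 100 "nv" and 20 "mel" samples (not the real class counts).
labels = ["nv"] * 100 + ["mel"] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 84 18 18
```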


How to Run

Step 1: Generate embeddings (Google Colab, GPU)

Open notebook1_embeddings.py in Google Colab with a T4 GPU runtime and run all cells top to bottom. The notebook downloads HAM10000 and produces embeddings.npz.

Step 2: Run experiments (Google Colab)

Open notebook2_experiments.py in a new Colab notebook, upload embeddings.npz, and run all cells. This produces:

  • All result tables (Recall@K, accuracy, calibration, abstention)

  • risk_coverage.png

  • clf_weights.npz

Step 3: Run the MCP server (local)

pip install "mcp[cli]" numpy torch
python mcp_grounded_server.py

This starts a live MCP server with three callable tools: retrieve, rerank, and classify_and_verify.
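
To give a rough idea of how three such tools can be registered with FastMCP from the `mcp` Python SDK, here is a sketch. Only the tool names come from this README; the parameters, return types, and stub bodies are hypothetical and do not reflect the contents of mcp_grounded_server.py.

```python
from typing import List, Optional


def retrieve(query_embedding: List[float], k: int = 10) -> List[int]:
    """Hypothetical stub: return indices of the k nearest training embeddings."""
    return list(range(k))  # placeholder; a real tool would search an index


def rerank(query_embedding: List[float], candidate_ids: List[int]) -> List[int]:
    """Hypothetical stub: reorder candidates (here: left unchanged)."""
    return list(candidate_ids)


def classify_and_verify(probs: List[float], tau: float = 0.7) -> Optional[int]:
    """Hypothetical stub: return the argmax class index, or None to abstain."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= tau else None


def build_server():
    """Register the three functions as MCP tools on a FastMCP server."""
    # Imported here so the stubs above can be tried without the SDK installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("mcp-grounded-sketch")
    for fn in (retrieve, rerank, classify_and_verify):
        server.tool()(fn)  # same effect as decorating fn with @server.tool()
    return server


# To serve over stdio (FastMCP's default transport): build_server().run()
```

Keeping the tool logic in plain functions makes it possible to unit-test them without starting a server.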


Requirements

mcp[cli]
numpy
torch
open_clip_torch
scikit-learn
pandas
pillow
tqdm
matplotlib

See requirements.txt.


File Structure

mcp_grounded/
├── notebook1_embeddings.py     # Colab: download HAM10000, extract BiomedCLIP embeddings
├── notebook2_experiments.py    # Colab: retrieval, calibration, abstention experiments
├── mcp_grounded_server.py      # Local: FastMCP server exposing 4 agents as tools
├── risk_coverage.png           # Figure 2: risk-coverage curve
├── requirements.txt
└── README.md

Citation

If you use this work, please cite:

@inproceedings{mcpgrounded2025,
  title     = {MCP-Grounded: A Multi-Agent Pipeline with Verification-Aware Abstention for Medical Image Classification},
  author    = {[Your Name]},
  booktitle = {[Conference Name]},
  year      = {2025}
}

License

MIT License. Dataset (HAM10000) is CC-BY-NC-SA-4.0; see Kaggle for terms.


Built with BiomedCLIP · FastMCP · HAM10000
