xai-toolkit

ML model explainability as plain-English narratives, exposed via MCP.

Ask your model "why" in natural language. Get a deterministic, reproducible English answer backed by SHAP analysis — directly inside VS Code Copilot.

User: "Why was sample 42 classified as degraded?"

Copilot: The model classified this sample as degraded (probability: 0.91)
         primarily because of 3 factors: total_acid_number = 4.8 (pushing toward
         the positive class by +0.28), water_content_ppm = 312.0 (+0.19),
         and viscosity_40c = 48.2 (+0.14). The top opposing factor is
         flash_point = 215.0 (pushing away from the positive class by -0.06).

No SHAP plots to decipher. No code to write. English that a decision-maker can act on.


Quick Start

# 1. Install dependencies
uv sync

# 2. Train the models (run once)
uv run python scripts/train_lubricant_model.py

# 3. Run the test suite (should show 100+ passing)
uv run python -m pytest tests/ -v

# 4. Open VS Code — the MCP server starts automatically via .vscode/mcp.json
#    Open Copilot chat in agent mode and ask:
#    "What models are available?"

Requirements: Python 3.11+, uv, VS Code with GitHub Copilot
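
For reference, a VS Code MCP configuration of this kind has roughly the following shape (an illustrative sketch, not the repo's actual file; the module path xai_toolkit.server is an assumption):

{
  "servers": {
    "xai-toolkit": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "python", "-m", "xai_toolkit.server"]
    }
  }
}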


What It Does

Seven MCP tools answer the most common explainability questions:

Ask                                          Tool                           Returns
"What models are available?"                 list_models                    Model IDs, types, accuracy
"Tell me about the data"                     describe_dataset               Sample count, class distribution, stats
"What does this model do?"                   summarize_model                Model type, accuracy, top 5 features
"Which features matter most?"                compare_features               Ranked feature importance with magnitudes
"Why was sample N classified as X?"          explain_prediction             Narrative + SHAP bar chart
"Show me the full SHAP breakdown"            explain_prediction_waterfall   Narrative + waterfall plot
"How does feature F affect predictions?"     get_partial_dependence         Narrative + PDP/ICE plot

Every response includes a complete audit trail: model ID, timestamp, tool version, and a SHA256 hash of the input data. Same question + same data = same answer, every time.
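
For illustration, here is a minimal sketch of how such an audit trail can be assembled. The real logic lives in explainers.py; build_audit_trail and the field names below are assumptions, not the toolkit's actual API:

import hashlib
import json
from datetime import datetime, timezone

def build_audit_trail(model_id: str, inputs: dict, tool_version: str) -> dict:
    # Serialize inputs with sorted keys so identical data always
    # produces an identical SHA256 digest.
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return {
        "model_id": model_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_version": tool_version,
        "input_sha256": hashlib.sha256(payload).hexdigest(),
    }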


Architecture

VS Code Copilot (Sonnet 4.5)
    │  natural language question
    ▼
MCP Client (built into Copilot)
    │  JSON-RPC over stdio
    ▼
xai-toolkit MCP Server  (server.py — thin adapter only)
    │
    ├── explainers.py   SHAP values, PDP/ICE, global importance, data hashing
    ├── narrators.py    Structured data → deterministic English paragraphs
    ├── plots.py        matplotlib → base64 PNG (bar, waterfall, PDP+ICE)
    ├── schemas.py      Pydantic contracts — single source of truth
    └── registry.py     ModelRegistry — load and serve multiple model types

Design principle: The LLM is the presenter, not the analyst. All computation and narrative generation happens in pure Python. The LLM chooses the right tool and wraps the pre-computed result conversationally. This guarantees reproducibility — the LLM cannot hallucinate SHAP values.
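
A rough sketch of that separation, assuming the official MCP Python SDK's FastMCP (the helpers explain_sample and narrate_explanation are illustrative names, not the toolkit's real functions):

from mcp.server.fastmcp import FastMCP

from xai_toolkit import explainers, narrators  # illustrative imports

mcp = FastMCP("xai-toolkit")

@mcp.tool()
def explain_prediction(model_id: str, sample_index: int) -> dict:
    """Thin adapter: all computation happens in pure Python modules."""
    result = explainers.explain_sample(model_id, sample_index)    # SHAP math
    result["narrative"] = narrators.narrate_explanation(result)   # fixed English
    return result  # the LLM presents this verbatim; it computes nothing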


Project Layout

xai-mcp/
├── src/xai_toolkit/         # Source package
├── tests/                   # 100+ pytest tests
├── docs/
│   ├── decisions/           # 7 Architecture Decision Records
│   └── scalability-path.md  # PoC → Production roadmap
├── scenarios/               # YAML acceptance criteria (day1–day5)
├── scripts/                 # Model training scripts
├── models/                  # Trained model artifacts
├── data/                    # Test datasets
└── .vscode/mcp.json         # MCP server configuration

Running Tests

# Full test suite
uv run python -m pytest tests/ -v

# Write snapshot golden files (run once after first install)
uv run python -m pytest tests/test_snapshots.py --snapshot-update -v

# Just the fast unit tests (no model loading)
uv run python -m pytest tests/test_narrators.py tests/test_explainers.py tests/test_reproducibility.py -v

# Second model integration tests (requires trained RF model)
uv run python -m pytest tests/test_second_model.py -v
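
To give a flavor of what the reproducibility tests assert, here is a hedged sketch (the actual checks in tests/test_reproducibility.py may differ; the helper names and model ID are illustrative):

from xai_toolkit import explainers, narrators  # illustrative imports

def test_same_input_same_narrative():
    # Two runs over the same sample must produce byte-identical English.
    first = explainers.explain_sample("your_model_id", 42)
    second = explainers.explain_sample("your_model_id", 42)
    assert narrators.narrate_explanation(first) == narrators.narrate_explanation(second)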

Adding a New Model

  1. Train and save your model following the convention in scripts/train_lubricant_model.py (see the sketch after this list).

  2. Add one line to the startup block in server.py:

     registry.load_from_disk("your_model_id", MODELS_DIR, DATA_DIR)

  3. Run uv run python -m pytest tests/test_second_model.py -v — all tools should work for your new model with zero code changes.
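
For step 1, a minimal training script in the same spirit might look like this (a sketch under assumptions; the actual naming and serialization convention is whatever scripts/train_lubricant_model.py uses):

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical paths and column names; mirror the existing script's convention.
df = pd.read_csv("data/your_dataset.csv")
X, y = df.drop(columns=["label"]), df["label"]

model = RandomForestClassifier(random_state=42).fit(X, y)
joblib.dump(model, "models/your_model_id.joblib")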

See AGENTS.md for full coding standards and architecture guidance.


Architecture Decision Records

Seven decisions documented in docs/decisions/:

ADR   Decision
001   Pure functions separated from MCP layer
002   Deterministic narratives — no LLM calls
003   stdio → Streamable HTTP migration path
004   Pydantic schemas as single source of truth
005   Consistent tool output structure
006   ModelRegistry pattern
007   Single-agent architecture


Production Path

This local PoC becomes a production service by changing infrastructure, not code. See docs/scalability-path.md for the full roadmap, including Databricks integration, MLflow model registry, Unity Catalog data access, and the integration path with the existing Kedro-based XAI pipeline.

Estimated effort to production: 2–4 weeks of platform-team work, not application-code changes.

