# xai-toolkit

ML model explainability as plain-English narratives, exposed via MCP.
Ask your model why in natural language. Get a deterministic, reproducible English answer backed by SHAP analysis — directly inside VS Code Copilot.
```
User: "Why was sample 42 classified as degraded?"

Copilot: The model classified this sample as degraded (probability: 0.91)
primarily because of 3 factors: total_acid_number = 4.8 (pushing toward
the positive class by +0.28), water_content_ppm = 312.0 (+0.19),
and viscosity_40c = 48.2 (+0.14). The top opposing factor is
flash_point = 215.0 (pushing away from the positive class by -0.06).
```

No SHAP plots to decipher. No code to write. English that a decision-maker can act on.
## Quick Start

```shell
# 1. Install dependencies
uv sync

# 2. Train the models (run once)
uv run python scripts/train_lubricant_model.py

# 3. Run the test suite (should show 100+ passing)
uv run python -m pytest tests/ -v

# 4. Open VS Code — the MCP server starts automatically via .vscode/mcp.json
#    Open Copilot chat in agent mode and ask: "What models are available?"
```

Requirements: Python 3.11+, uv, VS Code with GitHub Copilot.
## What It Does

Seven MCP tools answer the most common explainability questions:

| Ask | Returns |
| --- | --- |
| "What models are available?" | Model IDs, types, accuracy |
| "Tell me about the data" | Sample count, class distribution, stats |
| "What does this model do?" | Model type, accuracy, top 5 features |
| "Which features matter most?" | Ranked feature importance with magnitudes |
| "Why was sample N classified as X?" | Narrative + SHAP bar chart |
| "Show me the full SHAP breakdown" | Narrative + waterfall plot |
| "How does feature F affect predictions?" | Narrative + PDP/ICE plot |
Every response includes a complete audit trail: model ID, timestamp, tool version, and a SHA256 hash of the input data. Same question + same data = same answer, every time.
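The audit-trail guarantee can be sketched in a few lines. This is an illustrative sketch, not the project's actual `explainers.py` code; the function name `audit_fields` and the exact field names are assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_fields(model_id: str, rows: list[dict], tool_version: str = "0.1.0") -> dict:
    """Build the audit-trail portion of a tool response.

    Serializing the input rows with sorted keys and no extra whitespace
    makes the SHA256 digest deterministic: the same data always produces
    the same fingerprint, regardless of dict insertion order.
    """
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "model_id": model_id,
        "tool_version": tool_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when the answer was produced
        "data_sha256": hashlib.sha256(payload).hexdigest(),   # fingerprint of the input data
    }
```

Because the hash covers the serialized input rather than any intermediate state, two runs over the same data can be compared byte-for-byte even if they happen on different machines.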
## Architecture

```
VS Code Copilot (Sonnet 4.5)
  │ natural language question
  ▼
MCP Client (built into Copilot)
  │ JSON-RPC over stdio
  ▼
xai-toolkit MCP Server (server.py — thin adapter only)
  │
  ├── explainers.py   SHAP values, PDP/ICE, global importance, data hashing
  ├── narrators.py    Structured data → deterministic English paragraphs
  ├── plots.py        matplotlib → base64 PNG (bar, waterfall, PDP+ICE)
  ├── schemas.py      Pydantic contracts — single source of truth
  └── registry.py     ModelRegistry — load and serve multiple model types
```

Design principle: The LLM is the presenter, not the analyst. All computation and narrative generation happens in pure Python. The LLM chooses the right tool and wraps the pre-computed result conversationally. This guarantees reproducibility — the LLM cannot hallucinate SHAP values.
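A deterministic narrator in this style might look like the following. This is a sketch of the idea, not the actual `narrators.py` API; the function name and sentence template are assumptions:

```python
def narrate_prediction(label: str, probability: float,
                       contributions: dict[str, float], top_n: int = 3) -> str:
    """Turn SHAP-style feature contributions into one English sentence.

    Sorting by absolute magnitude (ties broken by feature name) fixes the
    output order, so the same inputs always yield the same text — no LLM
    is involved in producing the narrative itself.
    """
    ranked = sorted(contributions.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    factors = ", ".join(f"{name} ({value:+.2f})" for name, value in ranked[:top_n])
    return (f"The model classified this sample as {label} "
            f"(probability: {probability:.2f}) primarily because of: {factors}.")
```

The LLM then only paraphrases around this pre-computed string, so the numbers it presents are exactly the numbers SHAP produced.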
## Project Layout

```
xai-mcp/
├── src/xai_toolkit/         # Source package
├── tests/                   # 100+ pytest tests
├── docs/
│   ├── decisions/           # 7 Architecture Decision Records
│   └── scalability-path.md  # PoC → Production roadmap
├── scenarios/               # YAML acceptance criteria (day1–day5)
├── scripts/                 # Model training scripts
├── models/                  # Trained model artifacts
├── data/                    # Test datasets
└── .vscode/mcp.json         # MCP server configuration
```

## Running Tests
```shell
# Full test suite
uv run python -m pytest tests/ -v

# Write snapshot golden files (run once after first install)
uv run python -m pytest tests/test_snapshots.py --snapshot-update -v

# Just the fast unit tests (no model loading)
uv run python -m pytest tests/test_narrators.py tests/test_explainers.py tests/test_reproducibility.py -v

# Second model integration tests (requires trained RF model)
uv run python -m pytest tests/test_second_model.py -v
```

## Adding a New Model
1. Train and save your model following the convention in `scripts/train_lubricant_model.py`.
2. Add one line to the startup block in `server.py`: `registry.load_from_disk("your_model_id", MODELS_DIR, DATA_DIR)`.
3. Run `uv run python -m pytest tests/test_second_model.py -v`; all tools should work for your new model with zero code changes.
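The artifact convention behind step 1 could look roughly like this. It is a hypothetical sketch using stdlib `pickle` as a stand-in; the real training scripts may use joblib or another serializer, and may also save companion data files:

```python
import pickle
from pathlib import Path

def save_model(model: object, model_id: str, models_dir: str = "models") -> Path:
    """Write one artifact per model id.

    Naming the file after the model id is what lets a registry locate it
    later with nothing but the id, e.g. load_from_disk("your_model_id", ...).
    """
    path = Path(models_dir) / f"{model_id}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path
```

Whatever the exact format, the key point is that the id in the filename and the id passed to the registry must match, so no per-model glue code is needed.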
See AGENTS.md for full coding standards and architecture guidance.
## Architecture Decision Records

Seven decisions documented in docs/decisions/:

| ADR | Decision |
| --- | --- |
| 001 | Pure functions separated from MCP layer |
| 002 | Deterministic narratives — no LLM calls |
| 003 | stdio → Streamable HTTP migration path |
| 004 | Pydantic schemas as single source of truth |
| 005 | Consistent tool output structure |
| 006 | ModelRegistry pattern |
| 007 | Single-agent architecture |
## Production Path
This local PoC becomes a production service by changing infrastructure, not code.
See docs/scalability-path.md for the full roadmap,
including Databricks integration, MLflow model registry, Unity Catalog data access,
and the integration path with the existing Kedro-based XAI pipeline.
Estimated effort to production: 2–4 weeks (platform team, not application code).
## Related
FastMCP — MCP server framework used here
SHAP — explainability library
Model Context Protocol — the standard this implements