xai-toolkit

AGENTS.md•6 KiB

# AGENTS.md — Instructions for AI Agents Working in This Repository ## Project Summary (one sentence) **xai-toolkit** exposes ML model explainability as plain-English narratives via MCP, so an LLM in VS Code Copilot can answer "Why did the model flag this well as high risk?" with a deterministic, SHAP-backed explanation. ## Commands ```bash # Run all tests (use this exact form — avoids Windows canonicalize-path errors) uv run python -m pytest # Run a specific test file uv run python -m pytest tests/test_narrators.py -v # Run tests matching a keyword uv run python -m pytest -k "waterfall" -v # Train models (only needed if models/ is empty) uv run python scripts/train_toy_model.py # XGBoost uv run python scripts/train_rf_model.py # RandomForest # Start the MCP server locally uv run python -m xai_toolkit.server # CLI interface (same functions, no MCP dependency) uv run python -m xai_toolkit.cli models uv run python -m xai_toolkit.cli explain --model xgboost_breast_cancer --sample 0 uv run python -m xai_toolkit.cli context --query "high risk biopsy" ``` ## The Golden Rule **The LLM is the presenter, not the analyst.** All analysis is done by pure Python functions. The LLM's only job is to choose the right MCP tool and present the pre-computed result conversationally. Same question + same data = same answer, always. ## Architecture (six layers, one direction) ``` User question → server.py → explainers.py → narrators.py → response ↓ ↓ schemas.py plots.py ↓ registry.py ``` - **`server.py`** — MCP adapter. Thin routing only. No computation here. - **`explainers.py`** — SHAP/PDP computation + data hashing. Two modes: on-the-fly (`compute_shap_values`) or from pre-computed pipeline artifacts (`load_shap_from_pipeline`). Both return the same `ShapResult` schema. - **`narrators.py`** — Structured data → deterministic English. No LLM calls. - **`plots.py`** — matplotlib → base64 PNG. Three types: SHAP bar, waterfall, PDP+ICE overlay. - **`schemas.py`** — Pydantic models. Single source of truth for all contracts. - **`registry.py`** — `ModelRegistry`: load/list/get models. Central loader. - **`pipeline_compat.py`** — Bridge to colleague's Kedro pipeline artifacts. Reads `shap_values.npy`, `shap_expected_value.npy`, and `shap_metadata.json`. Includes `_detect_model_type()` attributed to Tamás's original implementation. - **`knowledge.py`** — Business context retrieval (ADR-009). Loads markdown documents from `knowledge/`, chunks by heading, searches via TF-IDF. Pure Python, no MCP imports. Supports "Glass Floor" provenance pattern. - **`cli.py`** — CLI adapter. Same pure functions as `server.py`, outputs JSON to stdout. Zero MCP dependency. For scripting, CI/CD, and environments where MCP servers are unavailable. ## MCP Tools (8 total) | Tool | Question it answers | Plot | |---|---|---| | `list_models` | What models are available? | — | | `describe_dataset` | What does the data look like? | — | | `summarize_model` | What does this model do overall? | — | | `compare_features` | Which features matter most? | — | | `explain_prediction` | Why was sample N classified as X? | SHAP bar | | `explain_prediction_waterfall` | Show the full SHAP breakdown | Waterfall | | `get_partial_dependence` | How does feature F affect predictions? | PDP+ICE | | `retrieve_business_context` | What should I do about this? (ADR-009) | — | ## Tool Output Contract Every **success** returns: `narrative`, `evidence`, `metadata`, `plot_base64`, `grounded` (always `True`). Every **failure** returns: `error_code`, `message`, `available`, `suggestion`. See `schemas.py` for exact Pydantic definitions. The `grounded: True` flag signals deterministic computation. If the LLM answers without calling a tool, it must prepend a ⚠️ disclaimer. `retrieve_business_context` returns a `KnowledgeSearchResult` (not `ToolResponse`) with `provenance_label: "ai-interpreted"`. The Glass Floor Protocol (ADR-009) requires the LLM to present deterministic output (Layer 1) separately from business context (Layer 2). See `.github/copilot-instructions.md` rule 6. ## Design Decisions (docs/decisions/) | ADR | Decision | |---|---| | 001 | Pure functions separated from MCP layer | | 002 | Deterministic narratives — no LLM calls | | 003 | stdio transport (PoC) → Streamable HTTP (production) | | 004 | Pydantic schemas as single source of truth | | 005 | Every tool returns narrative + evidence + metadata + optional plot | | 006 | ModelRegistry pattern — one central loader | | 007 | Single-agent architecture — LLM routes via tool descriptions | | 008 | Pipeline bridge — read pre-computed SHAP artifacts, don't recompute | | 009 | RAG augmentation with Glass Floor provenance separation | ## How to Add a New Model ```bash # 1. Train and save (adapt train_rf_model.py for your model type) uv run python scripts/train_rf_model.py # 2. Add one line to server.py startup: registry.load_from_disk("your_model_id", MODELS_DIR, DATA_DIR) # 3. Run the test suite — all tools work automatically: uv run python -m pytest tests/test_second_model.py -v ``` No other code changes needed. The registry + parametrized tests handle the rest. ## Coding Standards - Python 3.11+, type hints on all public signatures - Pydantic v2 for all data contracts - pytest with parametrized tests for multi-model coverage - Google-style docstrings on all public functions - Scenario YAML written before implementing each feature (see `scenarios/`) ## What NOT to Do - **No computation in `server.py`** — it is an adapter only - **No LLM calls in `narrators.py`** — narratives must be deterministic - **No MCP/FastMCP imports outside `server.py`** - **No changes to tool output structure without updating `schemas.py` first** - **No new narrator without a snapshot test** to lock its output - **No hard-coded file paths in tests** — use fixtures and `pathlib` - **Do not auto-generate or overwrite this file** — update it alongside code changes

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/florenciakabas/xai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

AGENTS.md•6 KiB