xai-toolkit

xai-mcp
docs

scalability-path.md•9.17 KiB

# Scalability Path: From Local PoC to Production Service > **Audience:** Engineering leads, data science managers, and architects > **Purpose:** Show that this prototype is not a throwaway — it is a foundation. > **Bottom line:** The Python code, test suite, and user experience are production-ready today. > What changes for production is infrastructure, not logic. --- ## 1. What We Built (The PoC) A local MCP server that exposes ML model explainability as plain-English narratives, callable from VS Code Copilot agent mode via natural language. **What it does today:** - Accepts natural language questions: *"Why was well #47 flagged as high risk?"* - Calls the right explainability tool automatically (SHAP, PDP/ICE, global importance) - Returns a deterministic, reproducible English narrative backed by quantitative evidence - Supports two model types (XGBoost, RandomForest) with identical interfaces - Every response carries a full audit trail: model ID, timestamp, tool version, data hash **What it runs on today:** - Developer laptop (Windows/Mac/Linux) - Transport: stdio (local process) - Models: pickle files on disk - Data: CSV files on disk - Auth: none (local only) - Users: one (the developer running VS Code) --- ## 2. What Changes for Production Each row below is an independent upgrade. They can be done in any order. | Component | PoC (Now) | Production | Effort | Owner | |---|---|---|---|---| | **Transport** | stdio (local process) | Streamable HTTP | 1–2 days | Platform | | **Hosting** | Developer laptop | Databricks App or container | 3–5 days | Platform | | **Models** | `.joblib` files on disk | MLflow Model Registry | 2–3 days | ML Eng | | **Data** | CSV files on disk | Unity Catalog tables | 2–3 days | Data Eng | | **Auth** | None | OAuth via Databricks SSO | 3–5 days | Platform | | **Concurrency** | Single user | Multi-user (HTTP handles this) | 0 days (free with transport change) | — | | **Governance** | None | Unity Catalog permissions + audit logging | 3–5 days | Governance | | **CI/CD** | Manual `uv run pytest` | GitHub Actions pipeline | 1–2 days | DevOps | **Total estimated effort to production: 2–4 weeks** depending on platform team availability. The application code requires zero changes for most of these upgrades. ### Transport upgrade (the most important one — it's also the easiest) The entire user experience — the natural language interface, the English narratives, the SHAP analysis — stays identical. Only one line changes: ```python # PoC (today): mcp.run(transport="stdio") # Production: mcp.run(transport="streamable-http", host="0.0.0.0", port=8000) ``` The VS Code `mcp.json` configuration changes from pointing at a local process to pointing at a URL: ```json // PoC: { "command": "uv", "args": ["run", "python", "-m", "xai_toolkit.server"] } // Production: { "url": "https://xai-toolkit.internal.company.com/mcp" } ``` The Copilot user experience is **identical** in both cases. --- ## 3. What Stays the Same (The Important Part) This is the section that matters most for long-term value. **Everything in `src/xai_toolkit/` requires no changes for production:** | Module | Role | Production status | |---|---|---| | `explainers.py` | SHAP computation, PDP/ICE, data hashing | ✅ Production-ready today | | `narrators.py` | Deterministic English generation | ✅ Production-ready today | | `plots.py` | matplotlib → base64 PNG | ✅ Production-ready today | | `schemas.py` | Pydantic data contracts | ✅ Production-ready today | | `registry.py` | Model loading/management | ✅ Interface stable; swap loading source | **Everything in `tests/` transfers directly to CI:** - 99 tests, all passing - Snapshot tests prevent narrative drift - Error handling tests prevent silent failures - Parametrized multi-model tests ensure new models work before deployment **The user experience is already production-quality:** - Deterministic: same question + same data = same answer, every time - Auditable: every response carries model ID, timestamp, tool version, and data hash - Graceful errors: unknown models, out-of-range indices, and typos all return helpful guidance --- ## 4. Databricks Integration Path This is the natural production home for this toolkit given existing infrastructure. ### Phase 2: MCP Server on Databricks Apps ``` User (VS Code Copilot) │ │ HTTPS / MCP Streamable HTTP ▼ Databricks App (xai-toolkit MCP server) │ │ ▼ ▼ MLflow Model Registry Unity Catalog (feature data) (pre-trained models) (production datasets) ``` **What this enables:** - Any engineer with VS Code + Copilot access can query any registered model - No data leaves Databricks — the MCP server runs inside the security perimeter - Model versioning handled by MLflow — the server always serves the "production" alias - Unity Catalog permissions control which users can query which models ### Phase 2 Registry Change The `ModelRegistry.load_from_disk()` method gets a sibling: ```python # Current: registry.load_from_disk("xgboost_breast_cancer", models_dir, data_dir) # Phase 2 (interface stays identical for tool handlers): registry.load_from_mlflow( model_id="xgboost_breast_cancer", mlflow_uri="databricks://...", registered_model_name="well_risk_classifier", alias="production", ) ``` Tool handlers call `registry.get("xgboost_breast_cancer")` in both cases — they are completely unaware of whether the model came from disk or MLflow. ### Phase 2 Data Change ```python # Current: X_test = pd.read_csv("data/breast_cancer_test_X.csv") # Phase 2: X_test = spark.table("catalog.schema.well_features_test").toPandas() ``` One line in `load_from_mlflow()`. Nothing else changes. --- ## 5. Integration with the Existing Kedro XAI Pipeline An existing pipeline (`EMOrg-Prd/xai-xgboost-clf`) already computes SHAP values in production batch runs. This toolkit does **not** replace it. They solve different halves of the same problem: ``` EXISTING PIPELINE (the engine) THIS MCP SERVER (the steering wheel) ───────────────────────────── ────────────────────────────────────── Batch computation (kedro run) → Interactive queries (natural language) Output: SHAP artifacts on disk → Output: English narratives in chat Audience: data scientists → Audience: any engineer with VS Code Invocation: CLI + config → Invocation: "Why was this well flagged?" Strength: reproducible compute → Strength: accessible communication ``` **Phase 2 integration:** The MCP server reads pre-computed SHAP artifacts from the Kedro pipeline's output store (MLflow / artifact store) instead of computing SHAP on demand. The pipeline is the computation engine; this is the access layer. ``` Kedro pipeline (nightly batch) → computes SHAP for all wells → saves artifacts to MLflow ← already exists and works MCP Server (on-demand, interactive) → receives "Why was well #47 flagged?" → reads pre-computed SHAP from MLflow ← replaces on-demand SHAP call → narrates → returns to user ``` **This means:** - The pipeline's computation stays completely untouched - The MCP server adds a new consumer, not a new computation path - Latency improves dramatically (read vs. compute) - The Kedro pipeline's existing parameterization and hooks become integration points --- ## 6. Multi-Agent Architecture (When It Becomes Relevant) **Current:** Single MCP server, single LLM routes via tool descriptions. This is correct for the current scope — 7 tools, one domain, one team. **When to evolve:** ``` Phase 1 (Now): VS Code Copilot → [xai-toolkit MCP server, 7 tools] Phase 2 (When tool count exceeds ~15): VS Code Copilot → [xai-toolkit MCP server] explainability domain → [data-access MCP server] Unity Catalog queries → [compliance MCP server] audit and reporting Phase 3 (Autonomous workflows): Orchestration layer (LangGraph or similar) → spawns specialist agents per domain → coordinates multi-step workflows → e.g., "Review all flagged wells this week and draft an inspection report" ``` The jump from Phase 1 to Phase 2 requires no changes to the xai-toolkit codebase — just connecting additional MCP servers to Copilot via `mcp.json`. --- ## Summary | Question | Answer | |---|---| | Is this production-ready code? | **Yes.** The application logic, tests, and schemas are production-quality. | | What needs to change for production? | Infrastructure only: transport, hosting, model source, data source. | | How long to production? | **2–4 weeks** with platform team support. | | Does this replace the existing Kedro pipeline? | **No.** It adds a natural language access layer on top of it. | | Can we add new models without code changes? | **Yes.** Register them in `server.py` startup. All tools work automatically. | | What happens if a tool fails? | Structured error with guidance. The LLM presents it helpfully. | | Can we audit what explanation was given? | **Yes.** Every response includes model ID, timestamp, tool version, and data hash. |

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/florenciakabas/xai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

scalability-path.md•9.17 KiB