florenciakabas / xai-toolkit

Server Configuration

Describes the environment variables required to run the server.

No arguments: this server requires no environment variables.

Capabilities

Features and capabilities supported by this server

Capability | Details
tools: { "listChanged": false }
prompts: { "listChanged": false }
resources: { "subscribe": false, "listChanged": false }
experimental: {}

Tools

Functions exposed to the LLM to take actions

explain_prediction

Explain why a single sample received its classification.

Returns a plain-English narrative explaining which features drove
the model's prediction for the given sample, backed by SHAP values.
Optionally includes a SHAP bar chart (tornado plot) visualization.

Checks the result store first for precomputed explanations.
Falls back to on-the-fly SHAP computation if not found.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    sample_index: Row index in the test dataset to explain (0-based).
    include_plot: If True, include a SHAP bar chart as base64 PNG (default: True).
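The SHAP-backed narrative above can be illustrated with the one case where Shapley values have a simple closed form: for a linear model with independent features, the SHAP value of feature j for sample x is coef[j] * (x[j] - mean(X[:, j])), and the base value plus the contributions recovers the prediction exactly. A minimal numpy sketch (illustrative only, not this server's implementation):

```python
import numpy as np

# Toy linear model: prediction = X @ coef.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
coef = np.array([2.0, -1.0, 0.5])
y_hat = X @ coef

# For a linear model with independent features, the SHAP value of
# feature j is coef[j] * (x[j] - mean of that feature over the data).
x = X[0]
shap_values = coef * (x - X.mean(axis=0))

# Base value (average prediction) plus contributions recovers the prediction.
base = y_hat.mean()
assert np.isclose(base + shap_values.sum(), x @ coef)

# Ranking features by |SHAP| is what drives the plain-English narrative
# and the bar-chart ("tornado") ordering.
order = np.argsort(-np.abs(shap_values))
```

For tree ensembles like the gradient-boosted classifier implied by `gbc_lubricant_quality`, SHAP values require the TreeSHAP algorithm rather than this closed form, but the additivity property shown by the assertion holds the same way.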
explain_prediction_waterfall

Show a SHAP waterfall plot for a single prediction.

The waterfall shows how the base prediction builds up to the final
prediction feature by feature. This is the most detailed SHAP visualization.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    sample_index: Row index in the test dataset to explain (0-based).
summarize_model

Summarize what a model does and what drives its decisions.

Returns model type, accuracy, number of features, and the top features
ranked by importance — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
compare_features

Rank features by importance and describe which matter most.

Returns a ranked list of features with their magnitude, direction,
and comparative language — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    top_n: Number of top features to include (default: 10).
get_partial_dependence

Show how a single feature affects predictions across its range.

Returns a narrative describing the relationship between the feature
and the model's predicted probability. Optionally includes a PDP + ICE
plot (model-agnostic visualization, not SHAP-based).

PDP (bold line) shows the average effect. ICE (gray lines) show individual
sample effects, revealing heterogeneity the average hides.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    feature_name: Name of the feature to analyze (e.g., "mean radius").
    include_plot: If True, include a PDP+ICE plot as base64 PNG (default: True).
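The PDP definition behind this tool can be sketched directly from its description: for each grid value, clamp the feature to that value across every sample, average the model's predicted probabilities, and plot the averages. A hedged sketch using a toy sklearn model (the feature names and model here are made up, not the server's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy classifier where feature 0 drives the positive class.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def partial_dependence_curve(model, X, feature, grid):
    """Average predicted probability with `feature` clamped to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # clamp the feature everywhere
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pdp = partial_dependence_curve(model, X, feature=0, grid=grid)

# The PDP rises with feature 0, matching its positive effect on the class.
assert pdp[-1] > pdp[0]
```

Each ICE line mentioned in the description is the same computation without the final `.mean()`: one curve per sample, whose spread reveals the heterogeneity the averaged PDP hides.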
list_models

List all registered models with their metadata.

Returns model IDs, types, dataset names, feature counts, and accuracy. Use this to discover what models are available before asking questions.

describe_dataset

Describe the dataset associated with a model.

Returns number of samples, features, class distribution, missing values,
and basic statistics — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
compare_predictions

Compare what two models predict for the same sample and explain why.

Returns whether the models agree, their confidence levels, and which
features they share or diverge on — all in plain English. Use this
to build trust in predictions by checking cross-model consistency.

Args:
    model_id_1: ID of the first model (e.g., "gbc_lubricant_quality").
    model_id_2: ID of the second model (e.g., "rf_lubricant_quality").
    sample_index: Row index in the test dataset to compare (0-based).
detect_drift

Detect data drift between a model's training data and test data.

Checks the result store first for precomputed drift results.
Falls back to on-the-fly computation if not found.

Numeric features are tested with PSI (primary) and KS (supporting).
Categorical features are tested with chi-squared.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
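The PSI test named above can be sketched in a few lines: bin the reference (training) data into quantiles, compute the share of reference and current (test) data in each bin, and sum (cur - ref) * ln(cur / ref). Commonly cited rule-of-thumb thresholds are roughly < 0.1 stable, 0.1 to 0.25 moderate, > 0.25 severe; this server's actual binning and thresholds may differ.

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples (sketch)."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # cover the full real line
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)       # avoid log(0) on empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(2)
train = rng.normal(0, 1, 5000)
assert psi(train, rng.normal(0, 1, 5000)) < 0.1   # same distribution: stable
assert psi(train, rng.normal(1, 1, 5000)) > 0.25  # mean shift: severe drift
```

The KS test plays a supporting role because it is sensitive to any distributional difference at large sample sizes, whereas PSI's binned magnitude is easier to threshold for severity.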
detect_feature_drift

Detect drift for a single feature between training and test data.

Returns detailed drift analysis for one feature: statistical test
results, severity, how the distribution shifted (direction, magnitude),
and reference vs. current distribution summaries.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    feature_name: Name of the feature to analyze (e.g., "mean radius").
list_drift_alerts

Browse batch drift findings across features and models.

Returns precomputed drift results from the result store. This is a
discovery tool — it does NOT fall back to on-the-fly computation.

Args:
    model_id: Filter to a specific model. If None, returns results
        for all models with stored drift data.
    severity: Filter by severity ("none", "moderate", "severe").
    run_id: Filter to a specific batch run.
list_explained_samples

Browse which samples have precomputed explanations.

Returns a summary of available precomputed explanations from the
result store. This is a discovery tool — it does NOT compute
explanations on the fly.

Args:
    model_id: Model identifier.
    run_id: Filter to a specific batch run.
standard_briefing

Generate a concise, predefined briefing from persisted batch results.

This tool is retrieval-first and does not run SHAP/drift computations.
It is designed as a reusable daily/weekly briefing entry point for
stakeholders who ask a similar set of baseline questions each run.

Args:
    model_id: Optional model filter. If None, includes all models with
        persisted result-store artifacts.
    run_id: Optional run filter.
    top_cases: Maximum number of highlighted explained samples per model.
retrieve_business_context

Retrieve relevant business context from the knowledge base.

Searches loaded business documents (e.g., clinical protocols, operational
rules) for sections relevant to the query. Returns ranked chunks with
source provenance for the Glass Floor presentation pattern.

Use this AFTER an explainability tool to find actionable business guidance.
For example, after explain_prediction returns a high-probability malignant
classification, call this with a query like 'high risk malignant biopsy'
to retrieve the relevant clinical protocol sections.

The provenance_label is always 'ai-interpreted' — any synthesis from
these chunks by the LLM is NOT deterministic and must be clearly
distinguished from grounded tool outputs.

Args:
    query: Natural language search query (e.g., 'high risk biopsy referral').
    top_k: Maximum number of chunks to return (default: 5).
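The ranked-chunks-with-provenance shape can be sketched with a deliberately naive term-overlap scorer; the server's real retrieval (and its chunking) is unknown and likely more sophisticated, but the output contract matters: every returned chunk carries its source and the fixed 'ai-interpreted' provenance label.

```python
# Hypothetical ranking step: score each chunk by query-term overlap and
# return the top_k, each tagged with source and provenance_label.
def retrieve(query, chunks, top_k=5):
    terms = set(query.lower().split())
    scored = []
    for chunk in chunks:
        words = set(chunk["text"].lower().split())
        scored.append((len(terms & words) / len(terms), chunk))
    scored.sort(key=lambda pair: -pair[0])
    return [
        {**chunk, "score": score, "provenance_label": "ai-interpreted"}
        for score, chunk in scored[:top_k]
        if score > 0
    ]

docs = [
    {"source": "clinical_protocol.md",
     "text": "High risk malignant biopsy requires referral"},
    {"source": "ops_rules.md",
     "text": "Lubricant batch below spec triggers requalification"},
]
hits = retrieve("high risk malignant biopsy", docs, top_k=1)
assert hits[0]["source"] == "clinical_protocol.md"
```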
get_xai_methodology

Retrieve the XAI analysis methodology guide.

Call this BEFORE starting any model explanation to understand the correct tool sequence (explanation funnel), Glass Floor protocol, and anti-patterns to avoid. Returns the full workflow guide.

list_skills

List available versioned context skills and guardrail metadata.

get_glass_floor

Retrieve the Glass Floor separation principles for presenting model explanations alongside business context.

Call this when you need to present both deterministic model outputs and AI-interpreted business guidance. Returns the two-layer separation protocol.

get_skill

Retrieve one skill by id/version with guardrails and checksum.

record_feedback

Record expert feedback on a toolkit narrative.

Call this when an expert evaluates the quality of an explanation.
The narrative is hashed to create an auditable link between the
feedback and the exact output that was evaluated.

Valid ratings: excellent, useful, too_technical, too_vague, missing_context

Args:
    model_id: Which model produced the output being evaluated.
    tool_name: Which tool produced it (e.g., "explain_prediction").
    narrative: The exact narrative text being evaluated (will be hashed).
    rating: Expert's assessment (excellent/useful/too_technical/too_vague/missing_context).
    audience_role: Evaluator's role (e.g., "reliability_engineer").
    business_line: Business context (e.g., "lubricants").
    expert_comment: Optional free-text elaboration.
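The auditable-link idea (hashing the narrative so feedback can later be matched to the exact text that was evaluated) can be sketched with the standard library; the record fields here mirror this tool's arguments, but the actual storage format is an assumption.

```python
import hashlib
from datetime import datetime, timezone

def feedback_record(model_id, tool_name, narrative, rating, **extra):
    """Build a feedback record keyed by a hash of the exact narrative text."""
    return {
        "model_id": model_id,
        "tool_name": tool_name,
        "narrative_sha256": hashlib.sha256(narrative.encode("utf-8")).hexdigest(),
        "rating": rating,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        **extra,
    }

rec = feedback_record(
    "gbc_lubricant_quality", "explain_prediction",
    "Viscosity index drove the low-quality classification.",
    rating="useful", audience_role="reliability_engineer",
)

# The same narrative always yields the same hash; a single changed
# character breaks the link, which is what makes the trail auditable.
assert rec["narrative_sha256"] == hashlib.sha256(
    "Viscosity index drove the low-quality classification.".encode()
).hexdigest()
```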
get_taste_context

Retrieve organizational taste — what experts think good explanations look like.

Returns aggregated feedback from experts across business lines.
Use this to understand audience preferences before presenting results.
For example, if reliability engineers consistently rate explanations as
"too_technical", the LLM can adjust its presentation framing.

All filters are optional. Omit all for a full summary.

Args:
    model_id: Filter to a specific model.
    audience_role: Filter to a specific role (e.g., "reliability_engineer").
    business_line: Filter to a specific business line (e.g., "lubricants").
    tool_name: Filter to a specific tool (e.g., "explain_prediction").
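The aggregation behind "organizational taste" can be sketched as a simple count of ratings per audience role; the stored schema and aggregation in the real server are assumptions here, but the example shows how a dominant "too_technical" signal would surface as a presentation hint.

```python
from collections import Counter

# Hypothetical stored feedback records (role + rating only).
feedback = [
    {"audience_role": "reliability_engineer", "rating": "too_technical"},
    {"audience_role": "reliability_engineer", "rating": "too_technical"},
    {"audience_role": "reliability_engineer", "rating": "useful"},
    {"audience_role": "plant_manager", "rating": "excellent"},
]

# Tally ratings per role.
by_role = {}
for item in feedback:
    by_role.setdefault(item["audience_role"], Counter())[item["rating"]] += 1

# The most common rating per role becomes the framing hint: here,
# reliability engineers skew "too_technical", so simplify for them.
assert by_role["reliability_engineer"].most_common(1)[0][0] == "too_technical"
```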

Prompts

Interactive templates invoked by user choice

xai_methodology: The XAI analysis methodology — explanation funnel, Glass Floor, anti-patterns.

Resources

Contextual data attached and managed by the client

No resources


MCP directory API

We provide all the information about MCP servers through our MCP directory API:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/florenciakabas/xai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.