florenciakabas / xai-toolkit

Server Configuration

Describes the environment variables required to run the server.

No arguments: this server requires no environment variables.

Capabilities

Features and capabilities supported by this server

Capability | Details
tools: { "listChanged": false }
prompts: { "listChanged": false }
resources: { "subscribe": false, "listChanged": false }
experimental: {}

Tools

Functions exposed to the LLM to take actions

explain_prediction

Explain why a single sample received its classification.

Returns a plain-English narrative explaining which features drove
the model's prediction for the given sample, backed by SHAP values.
Optionally includes a SHAP bar chart (tornado plot) visualization.

Checks the result store first for precomputed explanations.
Falls back to on-the-fly SHAP computation if not found.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    sample_index: Row index in the test dataset to explain (0-based).
    include_plot: If True, include a SHAP bar chart as base64 PNG (default: True).
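The SHAP-backed narrative above can be illustrated with the one case where Shapley values have a simple closed form: for a linear model with independent features, the SHAP value of feature j for sample x is coef[j] * (x[j] - mean(X[:, j])), and the base value plus the contributions recovers the prediction exactly. A minimal numpy sketch (illustrative only, not this server's implementation):

```python
import numpy as np

# Toy linear model: prediction = X @ coef.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
coef = np.array([2.0, -1.0, 0.5])
y_hat = X @ coef

# For a linear model with independent features, the SHAP value of
# feature j is coef[j] * (x[j] - mean of that feature over the data).
x = X[0]
shap_values = coef * (x - X.mean(axis=0))

# Base value (average prediction) plus contributions recovers the prediction.
base = y_hat.mean()
assert np.isclose(base + shap_values.sum(), x @ coef)

# Ranking features by |SHAP| is what drives the plain-English narrative
# and the bar-chart ("tornado") ordering.
order = np.argsort(-np.abs(shap_values))
```

For tree ensembles like the gradient-boosted classifier implied by `gbc_lubricant_quality`, SHAP values require the TreeSHAP algorithm rather than this closed form, but the additivity property shown by the assertion holds the same way.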
explain_prediction_waterfall

Show a SHAP waterfall plot for a single prediction.

The waterfall shows how the base prediction builds up to the final
prediction feature by feature. This is the most detailed SHAP visualization.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    sample_index: Row index in the test dataset to explain (0-based).
summarize_model

Summarize what a model does and what drives its decisions.

Returns model type, accuracy, number of features, and the top features
ranked by importance — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
compare_features

Rank features by importance and describe which matter most.

Returns a ranked list of features with their magnitude, direction,
and comparative language — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    top_n: Number of top features to include (default: 10).
get_partial_dependence

Show how a single feature affects predictions across its range.

Returns a narrative describing the relationship between the feature
and the model's predicted probability. Optionally includes a PDP + ICE
plot (model-agnostic visualization, not SHAP-based).

PDP (bold line) shows the average effect. ICE (gray lines) show individual
sample effects, revealing heterogeneity the average hides.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    feature_name: Name of the feature to analyze (e.g., "mean radius").
    include_plot: If True, include a PDP+ICE plot as base64 PNG (default: True).
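The PDP definition behind this tool can be sketched directly from its description: for each grid value, clamp the feature to that value across every sample, average the model's predicted probabilities, and plot the averages. A hedged sketch using a toy sklearn model (the feature names and model here are made up, not the server's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy classifier where feature 0 drives the positive class.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def partial_dependence_curve(model, X, feature, grid):
    """Average predicted probability with `feature` clamped to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # clamp the feature everywhere
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
pdp = partial_dependence_curve(model, X, feature=0, grid=grid)

# The PDP rises with feature 0, matching its positive effect on the class.
assert pdp[-1] > pdp[0]
```

Each ICE line mentioned in the description is the same computation without the final `.mean()`: one curve per sample, whose spread reveals the heterogeneity the averaged PDP hides.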
list_models

List all registered models with their metadata.

Returns model IDs, types, dataset names, feature counts, and accuracy. Use this to discover what models are available before asking questions.

describe_dataset

Describe the dataset associated with a model.

Returns number of samples, features, class distribution, missing values,
and basic statistics — all in plain English.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
compare_predictions

Compare what two models predict for the same sample and explain why.

Returns whether the models agree, their confidence levels, and which
features they share or diverge on — all in plain English. Use this
to build trust in predictions by checking cross-model consistency.

Args:
    model_id_1: ID of the first model (e.g., "gbc_lubricant_quality").
    model_id_2: ID of the second model (e.g., "rf_lubricant_quality").
    sample_index: Row index in the test dataset to compare (0-based).
detect_drift

Detect data drift between a model's training data and test data.

Checks the result store first for precomputed drift results.
Falls back to on-the-fly computation if not found.

Numeric features are tested with PSI (primary) and KS (supporting).
Categorical features are tested with chi-squared.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
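The PSI test named above can be sketched in a few lines: bin the reference (training) data into quantiles, compute the share of reference and current (test) data in each bin, and sum (cur - ref) * ln(cur / ref). Commonly cited rule-of-thumb thresholds are roughly < 0.1 stable, 0.1 to 0.25 moderate, > 0.25 severe; this server's actual binning and thresholds may differ.

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples (sketch)."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # cover the full real line
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)       # avoid log(0) on empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(2)
train = rng.normal(0, 1, 5000)
assert psi(train, rng.normal(0, 1, 5000)) < 0.1   # same distribution: stable
assert psi(train, rng.normal(1, 1, 5000)) > 0.25  # mean shift: severe drift
```

The KS test plays a supporting role because it is sensitive to any distributional difference at large sample sizes, whereas PSI's binned magnitude is easier to threshold for severity.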
detect_feature_drift

Detect drift for a single feature between training and test data.

Returns detailed drift analysis for one feature: statistical test
results, severity, how the distribution shifted (direction, magnitude),
and reference vs. current distribution summaries.

Args:
    model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
    feature_name: Name of the feature to analyze (e.g., "mean radius").
list_drift_alerts

Browse batch drift findings across features and models.

Returns precomputed drift results from the result store. This is a
discovery tool — it does NOT fall back to on-the-fly computation.

Args:
    model_id: Filter to a specific model. If None, returns results
        for all models with stored drift data.
    severity: Filter by severity ("none", "moderate", "severe").
    run_id: Filter to a specific batch run.
list_explained_samples

Browse which samples have precomputed explanations.

Returns a summary of available precomputed explanations from the
result store. This is a discovery tool — it does NOT compute
explanations on the fly.

Args:
    model_id: Model identifier.
    run_id: Filter to a specific batch run.
standard_briefing

Generate a concise, predefined briefing from persisted batch results.

This tool is retrieval-first and does not run SHAP/drift computations.
It is designed as a reusable daily/weekly briefing entry point for
stakeholders who ask a similar set of baseline questions each run.

Args:
    model_id: Optional model filter. If None, includes all models with
        persisted result-store artifacts.
    run_id: Optional run filter.
    top_cases: Maximum number of highlighted explained samples per model.
retrieve_business_context

Retrieve relevant business context from the knowledge base.

Searches loaded business documents (e.g., clinical protocols, operational
rules) for sections relevant to the query. Returns ranked chunks with
source provenance for the Glass Floor presentation pattern.

Use this AFTER an explainability tool to find actionable business guidance.
For example, after explain_prediction returns a high-probability malignant
classification, call this with a query like 'high risk malignant biopsy'
to retrieve the relevant clinical protocol sections.

The provenance_label is always 'ai-interpreted' — any synthesis from
these chunks by the LLM is NOT deterministic and must be clearly
distinguished from grounded tool outputs.

Args:
    query: Natural language search query (e.g., 'high risk biopsy referral').
    top_k: Maximum number of chunks to return (default: 5).
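The ranked-chunks-with-provenance shape can be sketched with a deliberately naive term-overlap scorer; the server's real retrieval (and its chunking) is unknown and likely more sophisticated, but the output contract matters: every returned chunk carries its source and the fixed 'ai-interpreted' provenance label.

```python
# Hypothetical ranking step: score each chunk by query-term overlap and
# return the top_k, each tagged with source and provenance_label.
def retrieve(query, chunks, top_k=5):
    terms = set(query.lower().split())
    scored = []
    for chunk in chunks:
        words = set(chunk["text"].lower().split())
        scored.append((len(terms & words) / len(terms), chunk))
    scored.sort(key=lambda pair: -pair[0])
    return [
        {**chunk, "score": score, "provenance_label": "ai-interpreted"}
        for score, chunk in scored[:top_k]
        if score > 0
    ]

docs = [
    {"source": "clinical_protocol.md",
     "text": "High risk malignant biopsy requires referral"},
    {"source": "ops_rules.md",
     "text": "Lubricant batch below spec triggers requalification"},
]
hits = retrieve("high risk malignant biopsy", docs, top_k=1)
assert hits[0]["source"] == "clinical_protocol.md"
```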
get_xai_methodology

Retrieve the XAI analysis methodology guide.

Call this BEFORE starting any model explanation to understand the correct tool sequence (explanation funnel), Glass Floor protocol, and anti-patterns to avoid. Returns the full workflow guide.

list_skills

List available versioned context skills and guardrail metadata.

get_glass_floor

Retrieve the Glass Floor separation principles for presenting model explanations alongside business context.

Call this when you need to present both deterministic model outputs and AI-interpreted business guidance. Returns the two-layer separation protocol.

get_skill

Retrieve one skill by id/version with guardrails and checksum.

record_feedback

Record expert feedback on a toolkit narrative.

Call this when an expert evaluates the quality of an explanation.
The narrative is hashed to create an auditable link between the
feedback and the exact output that was evaluated.

Valid ratings: excellent, useful, too_technical, too_vague, missing_context

Args:
    model_id: Which model produced the output being evaluated.
    tool_name: Which tool produced it (e.g., "explain_prediction").
    narrative: The exact narrative text being evaluated (will be hashed).
    rating: Expert's assessment (excellent/useful/too_technical/too_vague/missing_context).
    audience_role: Evaluator's role (e.g., "reliability_engineer").
    business_line: Business context (e.g., "lubricants").
    expert_comment: Optional free-text elaboration.
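The auditable-link idea (hashing the narrative so feedback can later be matched to the exact text that was evaluated) can be sketched with the standard library; the record fields here mirror this tool's arguments, but the actual storage format is an assumption.

```python
import hashlib
from datetime import datetime, timezone

def feedback_record(model_id, tool_name, narrative, rating, **extra):
    """Build a feedback record keyed by a hash of the exact narrative text."""
    return {
        "model_id": model_id,
        "tool_name": tool_name,
        "narrative_sha256": hashlib.sha256(narrative.encode("utf-8")).hexdigest(),
        "rating": rating,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        **extra,
    }

rec = feedback_record(
    "gbc_lubricant_quality", "explain_prediction",
    "Viscosity index drove the low-quality classification.",
    rating="useful", audience_role="reliability_engineer",
)

# The same narrative always yields the same hash; a single changed
# character breaks the link, which is what makes the trail auditable.
assert rec["narrative_sha256"] == hashlib.sha256(
    "Viscosity index drove the low-quality classification.".encode()
).hexdigest()
```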
get_taste_context

Retrieve organizational taste — what experts think good explanations look like.

Returns aggregated feedback from experts across business lines.
Use this to understand audience preferences before presenting results.
For example, if reliability engineers consistently rate explanations as
"too_technical", the LLM can adjust its presentation framing.

All filters are optional. Omit all for a full summary.

Args:
    model_id: Filter to a specific model.
    audience_role: Filter to a specific role (e.g., "reliability_engineer").
    business_line: Filter to a specific business line (e.g., "lubricants").
    tool_name: Filter to a specific tool (e.g., "explain_prediction").
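The aggregation behind "organizational taste" can be sketched as a simple count of ratings per audience role; the stored schema and aggregation in the real server are assumptions here, but the example shows how a dominant "too_technical" signal would surface as a presentation hint.

```python
from collections import Counter

# Hypothetical stored feedback records (role + rating only).
feedback = [
    {"audience_role": "reliability_engineer", "rating": "too_technical"},
    {"audience_role": "reliability_engineer", "rating": "too_technical"},
    {"audience_role": "reliability_engineer", "rating": "useful"},
    {"audience_role": "plant_manager", "rating": "excellent"},
]

# Tally ratings per role.
by_role = {}
for item in feedback:
    by_role.setdefault(item["audience_role"], Counter())[item["rating"]] += 1

# The most common rating per role becomes the framing hint: here,
# reliability engineers skew "too_technical", so simplify for them.
assert by_role["reliability_engineer"].most_common(1)[0][0] == "too_technical"
```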

Prompts

Interactive templates invoked by user choice

xai_methodology: The XAI analysis methodology — explanation funnel, Glass Floor, anti-patterns.

Resources

Contextual data attached and managed by the client

No resources


MCP directory API

We provide all the information about MCP servers through our MCP directory API:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/florenciakabas/xai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.