| explain_prediction | Explain why a single sample received its classification. Returns a plain-English narrative explaining which features drove
the model's prediction for the given sample, backed by SHAP values.
Optionally includes a SHAP bar chart (tornado plot) visualization.
Checks the result store first for precomputed explanations.
Falls back to on-the-fly SHAP computation if not found.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
sample_index: Row index in the test dataset to explain (0-based).
include_plot: If True, include a SHAP bar chart as base64 PNG (default: True).
|
| explain_prediction_waterfall | Show a SHAP waterfall plot for a single prediction. The waterfall shows how the prediction builds up, feature by feature,
from the base value (the model's expected output) to the final prediction. This is the most detailed SHAP visualization.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
sample_index: Row index in the test dataset to explain (0-based).
|
| summarize_model | Summarize what a model does and what drives its decisions. Returns model type, accuracy, number of features, and the top features
ranked by importance — all in plain English.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
|
| compare_features | Rank features by importance and describe which matter most. Returns a ranked list of features with each feature's importance
magnitude and direction of effect, described comparatively in plain English.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
top_n: Number of top features to include (default: 10).
|
| get_partial_dependence | Show how a single feature affects predictions across its range. Returns a narrative describing the relationship between the feature
and the model's predicted probability. Optionally includes a PDP + ICE
plot (model-agnostic visualization, not SHAP-based).
PDP (bold line) shows the average effect. ICE (gray lines) show individual
sample effects, revealing heterogeneity the average hides.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
feature_name: Name of the feature to analyze (e.g., "mean radius").
include_plot: If True, include a PDP+ICE plot as base64 PNG (default: True).
|
| list_models | List all registered models with their metadata. Returns model IDs, types, dataset names, feature counts, and accuracy.
Use this to discover what models are available before asking questions. |
| describe_dataset | Describe the dataset associated with a model. Returns number of samples, features, class distribution, missing values,
and basic statistics — all in plain English.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
|
| compare_predictions | Compare what two models predict for the same sample and explain why. Returns whether the models agree, their confidence levels, and which
features they share or diverge on — all in plain English. Use this
to build trust in predictions by checking cross-model consistency.
Args:
model_id_1: ID of the first model (e.g., "gbc_lubricant_quality").
model_id_2: ID of the second model (e.g., "rf_lubricant_quality").
sample_index: Row index in the test dataset to compare (0-based).
|
| detect_drift | Detect data drift between a model's training data and test data. Checks the result store first for precomputed drift results.
Falls back to on-the-fly computation if not found.
Numeric features are tested with the Population Stability Index (PSI, primary) and the Kolmogorov-Smirnov test (KS, supporting).
Categorical features are tested with the chi-squared test.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
|
| detect_feature_drift | Detect drift for a single feature between training and test data. Returns detailed drift analysis for one feature: statistical test
results, severity, how the distribution shifted (direction, magnitude),
and reference vs. current distribution summaries.
Args:
model_id: ID of a registered model (e.g., "gbc_lubricant_quality").
feature_name: Name of the feature to analyze (e.g., "mean radius").
|
| list_drift_alerts | Browse batch drift findings across features and models. Returns precomputed drift results from the result store. This is a
discovery tool — it does NOT fall back to on-the-fly computation.
Args:
model_id: Filter to a specific model. If None, returns results
for all models with stored drift data.
severity: Filter by severity ("none", "moderate", "severe").
run_id: Filter to a specific batch run.
|
| list_explained_samples | Browse which samples have precomputed explanations. Returns a summary of available precomputed explanations from the
result store. This is a discovery tool — it does NOT compute
explanations on the fly.
Args:
model_id: Model identifier.
run_id: Filter to a specific batch run.
|
| standard_briefing | Generate a concise, predefined briefing from persisted batch results. This tool is retrieval-first and does not run SHAP/drift computations.
It is designed as a reusable daily/weekly briefing entry point for
stakeholders who ask a similar set of baseline questions each run.
Args:
model_id: Optional model filter. If None, includes all models with
persisted result-store artifacts.
run_id: Optional run filter.
top_cases: Maximum number of highlighted explained samples per model.
|
| retrieve_business_context | Retrieve relevant business context from the knowledge base. Searches loaded business documents (e.g., clinical protocols, operational
rules) for sections relevant to the query. Returns ranked chunks with
source provenance for the Glass Floor presentation pattern.
Use this AFTER an explainability tool to find actionable business guidance.
For example, after explain_prediction returns a high-probability malignant
classification, call this with a query like 'high risk malignant biopsy'
to retrieve the relevant clinical protocol sections.
The provenance_label is always 'ai-interpreted' — any synthesis from
these chunks by the LLM is NOT deterministic and must be clearly
distinguished from grounded tool outputs.
Args:
query: Natural language search query (e.g., 'high risk biopsy referral').
top_k: Maximum number of chunks to return (default: 5).
|
| get_xai_methodology | Retrieve the XAI analysis methodology guide. Call this BEFORE starting any model explanation to understand the correct
tool sequence (explanation funnel), Glass Floor protocol, and anti-patterns
to avoid. Returns the full workflow guide. |
| list_skills | List available versioned context skills and guardrail metadata. |
| get_glass_floor | Retrieve the Glass Floor separation principles for presenting model explanations alongside business context. Call this when you need to present both deterministic model outputs and
AI-interpreted business guidance. Returns the two-layer separation protocol. |
| get_skill | Retrieve one skill by id/version with guardrails and checksum. |
| record_feedback | Record expert feedback on a toolkit narrative. Call this when an expert evaluates the quality of an explanation.
The narrative is hashed to create an auditable link between the
feedback and the exact output that was evaluated.
Valid ratings: excellent, useful, too_technical, too_vague, missing_context
Args:
model_id: Which model produced the output being evaluated.
tool_name: Which tool produced it (e.g., "explain_prediction").
narrative: The exact narrative text being evaluated (will be hashed).
rating: Expert's assessment (excellent/useful/too_technical/too_vague/missing_context).
audience_role: Evaluator's role (e.g., "reliability_engineer").
business_line: Business context (e.g., "lubricants").
expert_comment: Optional free-text elaboration.
|
| get_taste_context | Retrieve organizational taste — what experts think good explanations look like. Returns aggregated feedback from experts across business lines.
Use this to understand audience preferences before presenting results.
For example, if reliability engineers consistently rate explanations as
"too_technical", the LLM can adjust its presentation framing.
All filters are optional. Omit all for a full summary.
Args:
model_id: Filter to a specific model.
audience_role: Filter to a specific role (e.g., "reliability_engineer").
business_line: Filter to a specific business line (e.g., "lubricants").
tool_name: Filter to a specific tool (e.g., "explain_prediction").
|