xai-toolkit

xai-mcp
docs
decisions

005-tool-output-structure.md•2.4 KiB

# ADR-005: Consistent Tool Output Structure (narrative + evidence + metadata) ## Status: Accepted ## Date: 2025-02-22 ## Context We have 6 MCP tools, each returning different types of data. The consuming LLM (Sonnet 4.5 in Copilot) needs a predictable response shape to present results consistently. Without a standard structure, each tool becomes a special case requiring unique handling logic. ## Decision Every MCP tool returns a `ToolResponse` with five fields: ```python { "narrative": str, # Plain English interpretation (REQUIRED) "evidence": dict, # Structured numeric data backing the narrative "metadata": dict, # Audit trail: model_id, timestamp, tool_version, data_hash "plot_base64": str, # Optional base64-encoded PNG visualization "grounded": bool, # Always True for tool responses (epistemic label) } ``` The `narrative` is the primary output — what the LLM presents to the user. The `evidence` is structured data for programmatic consumption or follow-up. The `metadata` enables auditability and reproducibility. The `grounded` flag is an epistemic label for the consuming LLM: `True` signals that this answer was computed deterministically from a registered model and is audit-ready. The LLM is instructed to prepend a disclaimer on any response it generates *without* calling a tool — i.e. when `grounded=True` is absent from the response entirely. This gives users a visible, consistent signal about whether they are reading a verified computation or general knowledge. ## Consequences ### Positive - ✅ LLM always knows to look for `narrative` and present it - ✅ Copilot instructions can say "present the narrative verbatim" — one rule - ✅ Audit trail is automatic for every tool call - ✅ Evidence enables follow-up questions without recomputation - ✅ Structure is extensible — new fields don't break existing consumers ### Negative - ⚠️ Some tools return very different evidence shapes (SHAP values vs. PDP curves) - ⚠️ `evidence` is typed as `dict` rather than a specific model (flexibility vs. safety) ## Alternatives Considered 1. **Free-form dict per tool** — Rejected: LLM would need per-tool parsing logic 2. **Narrative only** — Rejected: loses structured data for programmatic use 3. **Strongly typed evidence per tool** — Considered for future: would require discriminated union or per-tool response types. Current `dict` is pragmatic for PoC.

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/florenciakabas/xai-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

005-tool-output-structure.md•2.4 KiB