
OpenTelemetry for Model Context Protocol (MCP) Analytics and Agent Observability


observability
mcp
opentelemetry
analytics

1. OpenTelemetry as the Foundation for MCP Observability
   1. The Two Sides of MCP Analytics
      1. Server-Side Observability (The Producer)
      2. Client-Side Observability (The Consumer)
2. Instrumenting the Agent Lifecycle
   1. Tracing Agent Execution
3. Practical Implementation: TypeScript SDK Instrumentation
4. Behind the Scenes / How It Works: The Dual-Path Telemetry Architecture
   1. Path 1: Decentralized MCP Server Instrumentation
   2. Path 2: Centralized Token and Context Interception
   3. The Distributed Tracing Challenge
5. My Thoughts
6. Acknowledgements
7. References

The development of sophisticated, multi-tool AI systems has been greatly streamlined by the Model Context Protocol (MCP). MCP defines a standardized way for large language models (LLMs) to discover and utilize external functionalities, referred to as tools. These tools are often hosted on dedicated MCP servers, forming a decentralized network of capabilities for an overarching agent: the application or service responsible for orchestrating the LLM and its chosen tools.

However, as agentic workflows become more complex, visibility into their operation becomes challenging. Developers operating MCP servers often lack crucial information about their user base, the specific tools being utilized, and the resulting performance metrics [1]. Similarly, the users (or consumers) of these agents require insight into context management and token usage, which directly correlates with both latency and operational cost.

                      To address this gap, observability must move beyond simple logging to structured, protocol-aware data collection. OpenTelemetry (OTel), a vendor-agnostic standard for instrumentation, tracing, and metrics, provides the necessary framework. OTel’s architecture is ideally suited to monitor the distributed nature of MCP, tracking an agent's journey from a client request, through the LLM’s decision-making, and into the final execution on an external tool server.

                      OpenTelemetry as the Foundation for MCP Observability

The fundamental requirement for robust MCP analytics is establishing a consistent data collection method across heterogeneous environments. This is precisely where OpenTelemetry offers decisive advantages over proprietary logging solutions. OTel’s core data types (traces, metrics, and logs) can be mapped directly onto the key performance indicators (KPIs) of an MCP server and its connected agents.

                      The Two Sides of MCP Analytics

                      Implementing OTel within the MCP ecosystem requires a dual-pronged approach, targeting both the producer (server) and the consumer (client) of the agent workflow.

                      1. Server-Side Observability (The Producer)

                      The MCP server developer is primarily concerned with the health and efficiency of their offered tools. By instrumenting the server with OTel, a developer can gather detailed metrics on:

                      • Tool Execution Latency: Tracking the time spent processing a tool call from request receipt to response transmission. High-latency tools are easily identified as potential bottlenecks in the agent’s overall workflow.

                      • Error Distribution: Monitoring the frequency and type of errors across different tools and server versions. This allows for targeted maintenance and performance tuning.

                      • Usage Breakdown: Analyzing which tools are called most frequently by specific users or client applications. This insight helps prioritize development efforts based on real-world demand.

                      • Server Health: Gathering operational data on the server instances, including operating system, architecture, and running version, aiding in debugging and infrastructure planning.

                      The data collected here focuses on the operational performance of the MCP server as a service provider.
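These server-side indicators map naturally onto OTel metric instruments. The following sketch, which assumes the @opentelemetry/api package and an illustrative mcp.tool_name attribute (not a confirmed semantic convention), records tool latency in a histogram and failures in a counter:

import { metrics } from "@opentelemetry/api"

// Acquire a meter for this server (the name is illustrative)
const meter = metrics.getMeter("my-mcp-server")

// Histogram for tool execution latency, counter for failed calls
const toolLatency = meter.createHistogram("mcp.tool.duration", {
  description: "Time spent executing a tool call",
  unit: "ms",
})
const toolErrors = meter.createCounter("mcp.tool.errors", {
  description: "Number of failed tool calls",
})

// Hypothetical wrapper around a tool handler
async function timedToolCall(toolName: string, handler: () => Promise<unknown>) {
  const start = Date.now()
  try {
    return await handler()
  } catch (err) {
    toolErrors.add(1, { "mcp.tool_name": toolName }) // attribute name is an assumption
    throw err
  } finally {
    toolLatency.record(Date.now() - start, { "mcp.tool_name": toolName })
  }
}

Per-tool and per-client usage breakdowns then fall out of the attributes attached to each measurement, rather than requiring separate counters.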

                      2. Client-Side Observability (The Consumer)

                      The consumer side focuses on the agentic workflow efficiency, which is heavily dictated by context management and token costs. This data is often gathered through a proxy or gateway that intercepts communications with the LLM provider. Key metrics include:

                      • Token Usage Analytics: Detailed tracking of input and output tokens per session and per request. This is critical for managing LLM API costs.

                      • Context Efficiency: Identifying the proportion of cached tokens versus new tokens, helping developers optimize message history management.

                      • Model Consumption: Listing which LLM models (e.g., Anthropic, OpenAI, Google) are being used by the agent and in what volume.

                      This data provides the user with visibility into their total cost of context management, thereby incentivizing the use of more efficient MCP servers.
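As a sketch of how a proxy or client wrapper might surface these figures, the snippet below annotates one span per LLM request with token counts. The attribute keys loosely follow the incubating OTel GenAI semantic conventions, but both the keys and the Usage shape are assumptions:

import { trace } from "@opentelemetry/api"

const tracer = trace.getTracer("llm-proxy")

// Hypothetical usage figures extracted from the LLM provider's response
interface Usage {
  inputTokens: number
  outputTokens: number
  cachedTokens: number
  model: string
}

function recordLlmUsage(usage: Usage) {
  // One span per LLM request, annotated with token usage and model details
  tracer.startActiveSpan("llm.request", (span) => {
    span.setAttribute("gen_ai.request.model", usage.model)
    span.setAttribute("gen_ai.usage.input_tokens", usage.inputTokens)
    span.setAttribute("gen_ai.usage.output_tokens", usage.outputTokens)
    span.setAttribute("llm.cached_tokens", usage.cachedTokens) // attribute name is an assumption
    span.end()
  })
}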

                      Instrumenting the Agent Lifecycle

To achieve deep observability, the standard OTel concepts must be applied consistently across the MCP stack. OTel defines a consistent data model and provides ready-made instrumentation packages for popular agent development languages such as TypeScript and Python.

                      Tracing Agent Execution

                      The most valuable OTel component for MCP is distributed tracing. A trace represents the entire end-to-end journey of a single agent session or request, composed of individual units of work called spans.

                      A simplified MCP trace flow involves the following spans:

                      1. Agent Request Span (Client-side): Initiated when a user submits a query to the agent.

                      2. LLM Call Span (Client-side/Proxy): Records the interaction with the LLM, including the input and output message content, token count, and the model's decision to call a tool.

                      3. Tool Call Request Span (Client-side): Records the request sent from the agent to the MCP server.

                      4. Tool Execution Span (Server-side): Records the server’s internal processing of the tool call, including its duration and outcome (success or error).

                      By linking these spans through a common Trace ID (a feature built into OTel), a developer can visualize the full path of execution, identify where time is being spent (LLM generation vs. tool execution), and connect a high token count to a subsequent tool error.
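The following sketch shows how these spans can be nested under one trace with the OTel API; the span names and the callLlm/callTool helpers are hypothetical stand-ins for the real LLM and MCP client calls:

import { trace } from "@opentelemetry/api"

const tracer = trace.getTracer("agent")

// Hypothetical helpers standing in for the real LLM and MCP client calls
declare function callLlm(prompt: string): Promise<{ toolName: string; args: unknown }>
declare function callTool(name: string, args: unknown): Promise<unknown>

async function handleUserQuery(prompt: string) {
  // 1. Agent Request Span: root of the trace for this request
  return tracer.startActiveSpan("agent.request", async (root) => {
    // 2. LLM Call Span: child of the agent request
    const decision = await tracer.startActiveSpan("llm.call", async (llmSpan) => {
      const result = await callLlm(prompt)
      llmSpan.end()
      return result
    })

    // 3. Tool Call Request Span: the client-side view of the MCP call.
    // The server-side Tool Execution Span continues this trace only if
    // the Trace ID is propagated (see the distributed tracing challenge below).
    const toolResult = await tracer.startActiveSpan("mcp.tool_call", async (toolSpan) => {
      const result = await callTool(decision.toolName, decision.args)
      toolSpan.end()
      return result
    })

    root.end()
    return toolResult
  })
}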

                      Practical Implementation: TypeScript SDK Instrumentation

                      The Model Context Protocol's decentralized nature requires standardized, language-specific SDKs to simplify OTel implementation. For developers utilizing TypeScript (TS), instrumentation involves installing a dedicated package and adding minimal configuration to the MCP server's initialization sequence.

                      First, install the Shinzo instrumentation package alongside the core MCP SDK:

npm install @shinzolabs/instrumentation-mcp
npm install @modelcontextprotocol/sdk # Peer Dependency

                      The core instrumentation logic is then added during the server setup using the instrumentServer function, which automatically handles OTel configuration and context management:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"
import { instrumentServer } from "@shinzolabs/instrumentation-mcp"

// 1. Create your MCP server
const server = new McpServer({
  name: "my-mcp-server",
  version: "1.0.0",
  description: "My instrumented MCP server"
})

// 2. Add telemetry instrumentation
const telemetry = instrumentServer(server, {
  serverName: "my-mcp-server",
  serverVersion: "1.0.0",
  exporterEndpoint: "https://api.app.shinzo.ai/telemetry/ingest_http", // OTel compatible endpoint
  exporterAuth: {
    type: "bearer",
    token: "your-ingest-token-here" // Secured with Bearer Token
  }
})

// 3. Continue with your normal server setup
server.tool("hello", {
  description: "Say hello",
  inputSchema: {
    type: "object",
    properties: {
      name: { type: "string" }
    }
  }
}, async (args) => {
  // The execution of this tool is now automatically traced
  return { content: `Hello, ${args.name}!` }
})

                      This minimal setup automatically wraps the server's tool execution logic, generating OTel traces and metrics that capture execution time and other vital metadata. The data is then exported via OTLP (OpenTelemetry Protocol) to the specified endpoint, providing deep visibility into the server's performance. For development purposes, the exporter type can be switched to "console" to verify local telemetry output.

                      Behind the Scenes / How It Works: The Dual-Path Telemetry Architecture

                      The integration of OTel for MCP observability effectively creates a dual-path telemetry architecture to capture all necessary data:

                      Path 1: Decentralized MCP Server Instrumentation

                      For performance and usage analytics, the MCP servers are instrumented directly.

• Instrumentation: The server application uses an OTel SDK (e.g., the Python or TypeScript packages) to generate traces and metrics, both automatically and through manual instrumentation, for all tool calls and internal operations.

• Export: The OTel Exporter sends the data (via the OTLP protocol) to a dedicated OTel Collector or directly to an analytics ingest service, as sketched after this list.

                      • Data Focus: Tool call count, latency, error rate, and server metadata.
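A minimal sketch of the export step, using the OTel Node SDK with an OTLP/HTTP trace exporter; the collector URL is a placeholder:

import { NodeSDK } from "@opentelemetry/sdk-node"
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http"

// Send all spans produced by the server's instrumentation to an OTel Collector
// or analytics ingest service over OTLP/HTTP.
const sdk = new NodeSDK({
  serviceName: "my-mcp-server",
  traceExporter: new OTLPTraceExporter({
    url: "https://collector.example.com/v1/traces", // placeholder ingest endpoint
  }),
})

sdk.start()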

                      Path 2: Centralized Token and Context Interception

For critical cost and efficiency analytics (like token usage), a passive proxy or gateway must sit between the agent and the LLM provider; a rough interception sketch follows the list below.

                      • Interception: The agent application is configured to send its requests (including model queries and tool-use intentions) to the proxy endpoint instead of directly to the LLM API.

                      • Data Collection: The proxy intercepts the request and response, extracts the token counts (input, output, cache ratio), and model details before forwarding the request to the final LLM provider (e.g., OpenAI or Anthropic).

                      • Data Focus: Token counts, LLM model used, and session-level context usage.
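The sketch below illustrates the interception step, assuming an OpenAI-style chat completions endpoint whose JSON response includes a usage block; the URL, field names, and the recordTokenMetrics sink are assumptions:

// Minimal pass-through handler: forward the agent's request to the LLM
// provider, then pull token counts out of the response before returning it.
async function proxyLlmRequest(body: unknown, apiKey: string) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  })

  const payload = await response.json()

  // OpenAI-style usage block; other providers name these fields differently
  const usage = payload.usage ?? {}
  recordTokenMetrics({
    model: payload.model,
    inputTokens: usage.prompt_tokens ?? 0,
    outputTokens: usage.completion_tokens ?? 0,
  })

  return payload
}

// Hypothetical sink for the extracted figures (e.g., OTel span attributes or counters)
declare function recordTokenMetrics(m: { model: string; inputTokens: number; outputTokens: number }): void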

                      The Distributed Tracing Challenge

                      The primary architectural challenge lies in unifying these two paths into a single distributed trace. The LLM acts as the decision point, and while the proxy can create a trace for the LLM call, connecting this client-side trace to the new server-side trace that begins when the tool is executed is complex.

                      To achieve true end-to-end tracing, the client must propagate the Trace ID established in Path 2 into the request sent to the MCP server, and the MCP server (Path 1) must be configured to recognize and continue this trace. This requires:

                      1. Standardized Context Propagation: A mandatory protocol (e.g., using W3C Trace Context headers) must be enforced for all MCP tool calls to carry the parent Trace ID.

                      2. Mandatory Instrumentation: Both client and server developers must use OTel-compatible instrumentation that correctly handles context propagation.

                      This distributed tracing is essential for determining if high token consumption (Path 2) is justified by successful, efficient tool execution (Path 1).
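A minimal sketch of that propagation with the OTel API: the client injects the active trace context into a carrier sent alongside the tool call, and the server extracts it before starting its own span. Where the carrier travels within an MCP request is an assumption here, and a W3C Trace Context propagator must be registered globally (the Node SDK does this by default):

import { context, propagation, trace } from "@opentelemetry/api"

// Client side: inject the current trace context (traceparent/tracestate)
// into a carrier transmitted alongside the MCP tool call.
function buildToolCallMetadata(): Record<string, string> {
  const carrier: Record<string, string> = {}
  propagation.inject(context.active(), carrier)
  return carrier // e.g. { traceparent: "00-<trace-id>-<span-id>-01" }
}

// Server side: extract the propagated context and continue the same trace
// when executing the tool.
function runToolWithinTrace(
  carrier: Record<string, string>,
  toolName: string,
  run: () => Promise<unknown>
) {
  const parentCtx = propagation.extract(context.active(), carrier)
  const tracer = trace.getTracer("my-mcp-server")
  return context.with(parentCtx, () =>
    tracer.startActiveSpan(`tool.${toolName}`, async (span) => {
      const result = await run()
      span.end()
      return result
    })
  )
}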

                      My Thoughts

                      The push to leverage OpenTelemetry for Model Context Protocol analytics marks a professional and sustainable step forward for the agent ecosystem. By adopting a vendor-neutral, open standard, the community ensures that observability tools are interoperable, preventing vendor lock-in and allowing developers to switch between various backend analytics platforms easily.

                      The current effort to standardize the OTel semantic conventions for Generative AI is the most critical near-term improvement. Without community-agreed-upon tags for concepts like mcp.tool_name, agent.session_id, and llm.cached_tokens, every tool will report data inconsistently, rendering true ecosystem-wide comparison and tooling impossible.

                      The ultimate goal, full end-to-end distributed tracing across client-side LLM calls and decentralized server-side tool execution, remains a complex area. While technical solutions exist, achieving widespread adoption requires that the instrumentation packages simplify the process for the end-developer, abstracting away the complexities of context propagation headers. As this standardization matures, it will significantly improve the efficiency of MCP agents, leading to better context management and reduced operational costs for all users.

                      Acknowledgements

We thank Austin, CEO of Shinzo Labs, for sharing his expertise and demonstrating the platform. The insights were drawn from the talk Building MCP Analytics with OpenTelemetry — Deep Dive with Shinzo Labs’ CEO [1], hosted by the MCP Developers Summit. We extend our gratitude to the broader MCP and AI community for driving the development of these essential open standards.

References

[1] Building MCP Analytics with OpenTelemetry — Deep Dive with Shinzo Labs’ CEO, MCP Developers Summit.

Written by Om-Shree-0709 (@Om-Shree-0709)