# Token Tracking
Track response sizes and estimate token usage for cost analysis.
---
## Overview
MCP servers don't have direct access to LLM token counts: tokens are reported to the client, not the server. mcpstat therefore supports two approaches:
1. **Server-side estimation** - Estimate tokens from response character count
2. **Client-side injection** - Report actual tokens from LLM API responses
---
## Basic Usage (Estimation)
Track response sizes for automatic token estimation:
```python
@app.call_tool()
async def handle_tool(name: str, arguments: dict):
    result = await my_logic(arguments)
    # Record with response size for token estimation
    await stat.record(
        name, "tool",
        response_chars=len(str(result))
    )
    return result
```
mcpstat estimates tokens using ~3.5 characters per token (conservative for mixed content).
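The same heuristic is easy to reproduce outside the library. A minimal sketch of the character-count estimate (`estimate_tokens` is a hypothetical helper for illustration, not part of mcpstat's public API; the 3.5 ratio matches the figure above):

```python
import math

# Conservative chars-per-token ratio for mixed prose/code, per the note above
CHARS_PER_TOKEN = 3.5

def estimate_tokens(response_chars: int) -> int:
    """Estimate token count from a response's character length."""
    return math.ceil(response_chars / CHARS_PER_TOKEN)

print(estimate_tokens(8000))  # 8000 / 3.5 ≈ 2285.7, rounded up to 2286
```

Note that 8,000 response characters yield an estimate of 2,286 tokens, matching the `estimated_tokens` value in the stats example below.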
---
## Actual Token Tracking
If you have access to actual token counts from your LLM provider:
### Method 1: Record with Tokens
```python
await stat.record(
    name, "tool",
    input_tokens=100,
    output_tokens=250
)
```
### Method 2: Deferred Reporting
```python
# Record the call first
await stat.record("my_tool", "tool")

# Later, when tokens are available
response = await anthropic.messages.create(...)
await stat.report_tokens(
    "my_tool",
    response.usage.input_tokens,
    response.usage.output_tokens
)
```
---
## Token Statistics
`get_stats()` includes comprehensive token information:
```python
stats = await stat.get_stats()
```
### Response Structure
```python
{
    "token_summary": {
        "total_input_tokens": 5000,      # Sum across all tools
        "total_output_tokens": 12000,    # Sum across all tools
        "total_estimated_tokens": 3500,  # From response_chars
        "has_actual_tokens": True        # True if any actual tokens recorded
    },
    "stats": [
        {
            "name": "my_tool",
            "call_count": 10,
            "total_input_tokens": 1000,
            "total_output_tokens": 2500,
            "total_response_chars": 8000,
            "estimated_tokens": 2286,
            "avg_tokens_per_call": 350,  # (input + output) / calls
            ...
        }
    ]
}
```
### Token Fields
| Field | Description |
|-------|-------------|
| `total_input_tokens` | Cumulative input tokens (if tracked) |
| `total_output_tokens` | Cumulative output tokens (if tracked) |
| `total_response_chars` | Cumulative response characters |
| `estimated_tokens` | Tokens estimated from response size |
| `avg_tokens_per_call` | Average tokens per invocation |
---
## Estimation vs. Actual
mcpstat prioritizes actual tokens over estimates:
```python
# Priority for avg_tokens_per_call:
if total_input_tokens + total_output_tokens > 0:
    avg = (total_input_tokens + total_output_tokens) / call_count  # Use actual
else:
    avg = estimated_tokens / call_count  # Fall back to estimate
```
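Wrapped as a runnable function (a sketch mirroring the priority logic above; the field names follow the per-tool stats structure shown earlier):

```python
def avg_tokens_per_call(row: dict) -> float:
    """Prefer actual token totals; fall back to the character-based estimate."""
    actual = row["total_input_tokens"] + row["total_output_tokens"]
    if actual > 0:
        return actual / row["call_count"]
    return row["estimated_tokens"] / row["call_count"]

row = {"call_count": 10, "total_input_tokens": 1000,
       "total_output_tokens": 2500, "estimated_tokens": 2286}
print(avg_tokens_per_call(row))  # (1000 + 2500) / 10 = 350.0
```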
---
## Use Cases
### Cost Analysis
Track token usage to estimate API costs:
```python
stats = await stat.get_stats()
summary = stats["token_summary"]
total_tokens = summary["total_input_tokens"] + summary["total_output_tokens"]
estimated_cost = total_tokens * 0.00001 # Example rate
print(f"Estimated cost: ${estimated_cost:.4f}")
```
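Most providers price input and output tokens at different rates, so a per-direction calculation is usually more accurate than a single blended rate. A sketch with hypothetical rates (the dollar figures below are illustrative placeholders, not real pricing; substitute your provider's published rates):

```python
# Hypothetical per-token rates -- replace with your provider's actual pricing
INPUT_RATE = 3.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Combine directional token counts into a dollar estimate."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Using the token_summary values from the stats example above
cost = estimate_cost(5000, 12000)
print(f"Estimated cost: ${cost:.4f}")  # prints "Estimated cost: $0.1950"
```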
### Identifying Token-Heavy Tools
Find tools that consume the most tokens:
```python
stats = await stat.get_stats()
# Sort by total tokens
by_tokens = sorted(
    stats["stats"],
    key=lambda s: s["total_input_tokens"] + s["total_output_tokens"],
    reverse=True
)
for tool in by_tokens[:5]:
    total = tool["total_input_tokens"] + tool["total_output_tokens"]
    print(f"{tool['name']}: {total} tokens")
```
### Optimization Recommendations
```python
stats = await stat.get_stats()
for tool in stats["stats"]:
    avg = tool["avg_tokens_per_call"]
    if avg > 1000:
        print(f"⚠️ {tool['name']}: {avg} avg tokens/call - consider optimization")
```
---
## Database Schema
Token tracking adds these columns to `mcpstat_usage`:
| Column | Type | Description |
|--------|------|-------------|
| `total_input_tokens` | INTEGER | Cumulative input tokens |
| `total_output_tokens` | INTEGER | Cumulative output tokens |
| `total_response_chars` | INTEGER | Cumulative response characters |
| `estimated_tokens` | INTEGER | Tokens estimated from response size |
---
## Migration
> **Since v0.2.1**: The token tracking columns were added in version 0.2.1. Existing databases are migrated automatically to include them; all existing data is preserved, and the new columns default to `0`.
---
## Best Practices
### 1. Track Response Sizes
Even without actual tokens, tracking response sizes provides useful estimates:
```python
import json

await stat.record(
    name, "tool",
    response_chars=len(json.dumps(result))
)
```
### 2. Use Deferred Reporting for Accuracy
When actual tokens are available, use `report_tokens()`:
```python
# In your client code
response = await client.messages.create(...)
await stat.report_tokens(
    tool_name,
    response.usage.input_tokens,
    response.usage.output_tokens
)
```
### 3. Monitor High-Token Tools
Regularly check for tools with high average token usage:
```python
for tool in stats["stats"]:
    if tool["avg_tokens_per_call"] > 500:
        print(f"Review: {tool['name']}")
```