# Token Usage Analysis - Capability Index Tests
## September 22, 2025
## Performance Metrics from Empirical Tests
### Test Execution Times
- Test 1 (explicit_cascade_top): ~11 seconds
- Test 2 (suggestive_flat): ~10 seconds
- Test 3 (explicit_action): ~12 seconds
- Test 4 (no_index/control): ~10 seconds
- Test 5 (nested): ~11 seconds
- **Average: 10.8 seconds**
## Token Count Analysis
### Input Token Counts (CLAUDE.md + Query)
**Test 1 - Explicit Cascade:**
```
# CRITICAL: Always Check Capability Index First
CAPABILITY_INDEX:
personas → list_elements("personas")
debug → search_collection("debug")
security → search_portfolio("security")
You MUST check the index before any action.
```
- CLAUDE.md tokens: ~45 tokens
- Query: "Show me available personas" = ~6 tokens
- **Total input: ~51 tokens**
**Test 4 - No Index (Control):**
```
# DollhouseMCP Project
You have access to MCP tools for element management.
```
- CLAUDE.md tokens: ~15 tokens
- Query: "Show me available personas" = ~6 tokens
- **Total input: ~21 tokens**
### Output Token Counts
**Test 1 Output (62 words):**
- ~85-90 tokens
**Test 4 Output (65 words):**
- ~88-92 tokens
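These output counts are rough conversions from word counts rather than measured tokenizer output. A minimal sketch of that kind of estimate, assuming an average of ~1.35 tokens per English word (an assumed factor for typical GPT-style tokenization, not a measured value):

```typescript
// Rough token estimate from a word count. The 1.35 tokens/word factor is an
// assumed average for English prose under GPT-style BPE tokenization; real
// counts vary with punctuation, code, and markdown structure.
function estimateTokens(words: number, tokensPerWord = 1.35): number {
  return Math.round(words * tokensPerWord);
}

// Test 1 output: 62 words -> ~84 tokens; Test 4 output: 65 words -> ~88 tokens,
// consistent with the ~85-90 and ~88-92 ranges above.
console.log(estimateTokens(62), estimateTokens(65));
```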
## Critical Finding: NO TOKEN SAVINGS OBSERVED
### Expected vs Actual
**EXPECTED (from Sept 21 architecture doc):**
- Capability index should enable 97% token reduction
- Cascade pattern: 10 tokens → 50 tokens → 150+ tokens (see the sketch after this list)
- Only expand when needed
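For reference, a minimal sketch of the cascade the architecture doc envisioned, where each tier expands only if the previous one cannot answer the query. The token budgets follow the 10/50/150 figures above; the tier names and code shape are illustrative, not from the doc:

```typescript
// Illustrative three-tier cascade: expand to the next tier only when the
// previous one cannot answer the query.
const cascade = [
  { tier: "index entry", tokens: 10 },     // e.g. `personas -> list_elements("personas")`
  { tier: "tool summary", tokens: 50 },    // short description of what the tool returns
  { tier: "full tool call", tokens: 150 }, // actual MCP execution with full results
];

// Total cost if the first `tiersExpanded` tiers are consulted.
function cascadeCost(tiersExpanded: number): number {
  return cascade.slice(0, tiersExpanded).reduce((sum, t) => sum + t.tokens, 0);
}

console.log(cascadeCost(1), cascadeCost(3)); // 10 vs 210 tokens
```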
**ACTUAL RESULTS:**
1. **More input tokens used** with index (51 vs 21)
2. **Similar output tokens** (85-90 for all tests)
3. **Same execution time** (10-11 seconds)
4. **No progressive disclosure** - full tool execution every time
## Why Token Optimization Failed
### 1. MCP Tools Execute Immediately
When Claude sees "Show me available personas", it:
- Immediately calls `mcp__dollhousemcp__list_elements`
- Returns full results
- No "cascade" behavior observed
### 2. Index Adds Overhead Without Benefit
- Index structure ADDS tokens to input
- Claude still executes full tool call
- No evidence of "checking index first"
### 3. Tool Execution is Binary
Either:
- Tool executes fully (100% of tokens)
- Tool doesn't execute (0% of tokens)
No middle ground or progressive disclosure observed.
## Token Usage Breakdown
### With Capability Index (Test 1)
```
Input tokens: 51
Tool execution: ~200 (estimated for MCP call)
Output tokens: 90
TOTAL: ~341 tokens
```
### Without Index (Test 4)
```
Input tokens: 21
Tool execution: ~200 (estimated for MCP call)
Output tokens: 92
TOTAL: ~313 tokens
```
**Result: Index INCREASED token usage by ~28 tokens (9%)**
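The same comparison in code form, using the ~200-token tool-execution estimate from the breakdowns above:

```typescript
// Reproduce the totals above: input + estimated tool-execution cost + output.
// The 200-token tool cost is the same rough estimate used in the breakdowns.
const TOOL_CALL_TOKENS = 200;

const withIndex = 51 + TOOL_CALL_TOKENS + 90;    // 341
const withoutIndex = 21 + TOOL_CALL_TOKENS + 92; // 313

const increase = (withIndex - withoutIndex) / withoutIndex; // ~0.089
console.log(withIndex, withoutIndex, `${(increase * 100).toFixed(1)}% increase`);
```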
## Speed Analysis
### No Performance Difference
All tests: 10-12 seconds regardless of structure
### Breakdown (estimated):
- Docker container startup: 2-3 seconds
- Authentication setup: 1 second
- Claude processing: 3-4 seconds
- MCP tool execution: 2-3 seconds
- Response generation: 1-2 seconds
## Conclusions
### 1. Token Optimization FAILED
- **No cascade behavior** - tools execute fully every time
- **Index adds overhead** - extra tokens without benefit
- **No progressive disclosure** - full data returned always
### 2. Speed Unchanged
- All structures perform identically
- ~10-11 seconds per request
- No optimization from indexing
### 3. The 97% Token Savings Claim
**NOT ACHIEVED** in empirical testing:
- Expected: 8,800 → 250 tokens (97% reduction)
- Actual: 313 → 341 tokens (9% INCREASE)
## Why The Theory Didn't Work
### Hypothesis 1: MCP Tool Execution Model
MCP tools may execute atomically - all or nothing, with no partial execution possible.
### Hypothesis 2: Claude's Tool Calling Behavior
Claude may not be capable of "progressive tool use" - it either needs the data or doesn't.
### Hypothesis 3: Wrong Abstraction Level
Capability indexes may need to be implemented at the MCP server level, not in CLAUDE.md.
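For illustration, a minimal sketch of what a server-level index might look like, assuming a hypothetical cheap lookup the server could expose as its own tool. The names here are illustrative, not part of DollhouseMCP:

```typescript
// Hypothetical server-side capability index: instead of CLAUDE.md carrying
// the mapping, the server answers a cheap "what can you do?" query with a
// compact pointer and only runs the heavier tools when explicitly asked.
const capabilityIndex: Record<string, string> = {
  personas: 'list_elements("personas")',
  debug: 'search_collection("debug")',
  security: 'search_portfolio("security")',
};

// A lightweight lookup returning a one-line pointer (~10 tokens)
// rather than full element data.
function lookupCapability(keyword: string): string | undefined {
  return capabilityIndex[keyword.toLowerCase()];
}

console.log(lookupCapability("personas")); // 'list_elements("personas")'
```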
## Recommendation
**STOP pursuing capability indexes for token optimization.**
The empirical data shows:
1. No token savings (actually increases tokens)
2. No speed improvement
3. No behavioral difference
4. Added complexity without benefit
Focus instead on:
- Optimizing MCP server responses
- Caching at the server level
- Better tool descriptions
- Selective tool availability
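As one example of the last point, a hedged sketch of selective tool availability, using a hypothetical `ToolSpec` shape rather than the actual MCP SDK types:

```typescript
// Hypothetical shape for an MCP-style tool registration; not the actual
// @modelcontextprotocol/sdk types.
interface ToolSpec {
  name: string;
  description: string;
  tags: string[];
}

// Expose only the tools relevant to the current context, so the model never
// pays input tokens for tool descriptions it will not use.
function selectTools(allTools: ToolSpec[], activeTags: string[]): ToolSpec[] {
  return allTools.filter(t => t.tags.some(tag => activeTags.includes(tag)));
}

const tools: ToolSpec[] = [
  { name: "list_elements", description: "List portfolio elements by type", tags: ["personas", "elements"] },
  { name: "search_collection", description: "Search the shared collection", tags: ["debug", "search"] },
  { name: "search_portfolio", description: "Search the local portfolio", tags: ["security", "search"] },
];

console.log(selectTools(tools, ["personas"]).map(t => t.name)); // ["list_elements"]
```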
---
*Based on empirical test data from isolated Docker containers*
*All token counts are estimates based on typical GPT tokenization*