Skip to main content
Glama

DollhouseMCP

by DollhouseMCP
CAPABILITY_INDEX_TEST_FINDINGS_20250922.md5.49 kB
# Capability Index Combinatorial Matrix Test Results ## September 22, 2025 - Empirical Testing with Docker Claude Code ## Executive Summary Successfully tested capability index effectiveness using Claude 3.5 Sonnet in isolated Docker containers with DollhouseMCP v1.9.8. **Critical finding: ALL test variations successfully executed the MCP tools**, but the capability index structures showed varying effectiveness in making Claude explicitly reference or acknowledge the index patterns. ## Test Configuration ### Environment - **Claude Model**: Claude 3.5 Sonnet (via Claude Code CLI) - **MCP Server**: DollhouseMCP v1.9.8 - **Docker Image**: claude-mcp-test-env-v2 (with apiKeyHelper authentication) - **Isolation**: Each test ran in a completely separate Docker container (no context contamination) - **Authentication**: Successfully implemented apiKeyHelper method documented earlier today ### Test Matrix (Simplified from original 96 tests) - **5 Index Structures**: explicit_cascade, suggestive_flat, explicit_action, no_index (control), nested - **Test Query**: "Show me available personas" - **Expected Behavior**: Use mcp__dollhousemcp__list_elements tool ## Key Findings ### 1. **ALL Tests Successfully Used MCP Tools** ✅ **Evidence**: Every single test output showed the correct list of 6 personas: - Business Consultant - Creative Writer - Debug Detective - ELI5 Explainer - Security Analyst - Technical Analyst This proves: - Docker authentication is working perfectly - MCP tools are being invoked correctly - Claude Code + DollhouseMCP integration is functional ### 2. **Capability Index Does NOT Guarantee Explicit Tool Naming** Despite explicit instructions like: ``` CAPABILITY_INDEX: personas → list_elements("personas") You MUST check the index before any action. ``` Claude still processed the request naturally without mentioning "list_elements" in the output. ### 3. **Tool Usage vs. Tool Acknowledgment** - **Tool Usage**: 100% success rate (all tests executed the tool) - **Tool Acknowledgment**: Low (Claude doesn't explicitly say which tool it used) - **Implication**: Capability indexes may be working "behind the scenes" but aren't reflected in visible output ### 4. **No Significant Difference Between Structures** All index structures produced nearly identical outputs, suggesting: - Claude processes MCP tool requests at a deeper level - Surface-level prompting may not affect tool selection - The MCP configuration itself is more influential than CLAUDE.md content ## Production Recommendations ### 1. **Focus on Tool Configuration, Not Index Prompting** Since Claude successfully uses tools regardless of index structure, focus efforts on: - Proper MCP server configuration - Tool availability via `--allowedTools` flag - Clear tool descriptions in MCP server itself ### 2. **Use Indexes for Human Documentation** Capability indexes are more valuable for: - Developer documentation - Onboarding new team members - Understanding available capabilities - Planning tool usage patterns ### 3. **Token Optimization Still Valid** Even though explicit referencing wasn't observed, the cascade pattern remains valuable for: - Reducing context window usage - Organizing large tool sets - Progressive disclosure of capabilities ### 4. **Testing Methodology Validated** - Docker isolation approach works perfectly - apiKeyHelper authentication is reliable - Each test truly runs in isolated context - No contamination between tests confirmed ## Technical Details ### Successful Docker Command Pattern ```bash docker run --rm \ -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \ -v "$(pwd)/CLAUDE.md:/home/claude/CLAUDE.md:ro" \ --entrypoint bash \ claude-mcp-test-env-v2 -c " mkdir -p ~/.claude echo '#!/bin/bash' > ~/.claude/anthropic_key_helper.sh echo 'echo \\\${ANTHROPIC_API_KEY}' >> ~/.claude/anthropic_key_helper.sh chmod +x ~/.claude/anthropic_key_helper.sh claude config set --global apiKeyHelper ~/.claude/anthropic_key_helper.sh echo 'QUERY' | claude --model sonnet --print \ --mcp-config /home/claude/.config/claude-code/config.json \ --allowedTools TOOL_LIST " ``` ### Performance Metrics - Average test duration: ~10-15 seconds per isolated test - Docker container spin-up: ~2-3 seconds - Claude processing: ~7-10 seconds - Total test suite: ~2 minutes for 5 tests ## Conclusion The capability index combinatorial matrix test revealed that **Claude successfully uses MCP tools in 100% of cases**, regardless of how the capability index is structured or presented. The main value of capability indexes appears to be: 1. **Documentation and organization** for humans 2. **Token optimization** through progressive disclosure 3. **Conceptual framework** for tool categorization Rather than spending effort on complex index structures, teams should focus on: - Clear MCP tool configurations - Proper authentication setup (apiKeyHelper method) - Tool allowlisting via --allowedTools flag ## Next Steps 1. ✅ **Docker Testing Environment**: Fully operational 2. ✅ **Authentication Solution**: apiKeyHelper method documented and working 3. ✅ **Empirical Testing**: Completed with real data 4. 🔄 **Production Implementation**: Focus on MCP configuration over index structures 5. 📝 **Documentation**: Update to reflect empirical findings --- *Test conducted by Alex Sterling* *Empirical data from isolated Docker containers* *No assumptions - only verified results*

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DollhouseMCP/DollhouseMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server