Skip to main content
Glama

CTS MCP Server

by EricA1019
ADR_TREE_SITTER_ADOPTION.md10.2 kB
# ADR-001: Tree-sitter Adoption for GDScript Parsing **Status**: Accepted **Date**: 2025-10-30 **Deciders**: CTS MCP Development Team **Technical Story**: Phase CTS_MCP_2 - HOP 2.1a Tree-sitter Bridge Integration ## Context Phase 1 of the CTS MCP Server used a regex-based parser to extract signal definitions from GDScript files. While functional for basic patterns, the regex approach demonstrated significant limitations: ### Phase 1 Parser Limitations 1. **Accuracy Issues**: - Cannot handle multiline signal definitions - Fails on nested class signals - Misses signals with complex default parameters - Struggles with unicode identifiers in signal names - F1 score: 94.3% (documented in regression harness) 2. **Maintenance Burden**: - Fragile regex patterns break with GDScript syntax changes - Each new GDScript 4.x feature requires regex updates - No AST-level understanding of code structure - Difficult to extend for future features (signal connections, emit analysis) 3. **Performance Limitations**: - O(n) line-by-line scanning - No caching of intermediate parse results - Cannot leverage AST for advanced code analysis ### Business Impact The CTS project's signal mapping feature is expanding to support: - Automatic signal connection validation - Cross-file signal usage tracking - Godot 4.3+ annotation support (@rpc, @export_range for signals) - Performance bottleneck identification These features require accurate AST-level parsing beyond regex capabilities. ## Decision **We will adopt tree-sitter with the tree-sitter-gdscript grammar for production GDScript parsing.** ### Implementation Approach 1. **TreeSitterBridge Module**: - WASM-based tree-sitter runtime integration - Lazy loading to defer initialization cost - Fallback to regex parser if tree-sitter fails 2. **SignalExtractor Service**: - Query tree-sitter AST for signal nodes - Extract typed parameters from AST - Maintain backward compatibility with Phase 1 SignalDefinition schema 3. **Parser Diagnostics**: - Comprehensive regression test suite (20+ fixtures) - Ground truth validation with precision/recall/F1 metrics - Continuous monitoring via ParserDiagnostics class ### Architecture Integration ```typescript // Phase 2 parsing pipeline const bridge = new TreeSitterBridge(); // WASM init const extractor = new SignalExtractor(bridge); const signals = await extractor.extractFromFile('player.gd'); // Fallback behavior try { const ast = await bridge.parse(code); } catch (error) { console.warn('Tree-sitter parse failed, falling back to regex'); return regexParser.parse(code); } ``` ## Alternatives Considered ### Alternative 1: PEG Parser (e.g., Peggy) **Pros**: - No WASM dependency - Easier to customize for CTS-specific needs - Lighter weight (~50KB vs tree-sitter's ~400KB) **Cons**: - Would require writing/maintaining GDScript grammar from scratch - No community support for GDScript - Higher development cost (estimated 2-3 weeks vs 3 days for tree-sitter) - Less battle-tested than tree-sitter **Verdict**: Rejected due to development cost and lack of GDScript grammar ### Alternative 2: Custom Recursive Descent Parser **Pros**: - Complete control over parsing logic - No external dependencies - Can optimize specifically for signal extraction **Cons**: - 4-6 weeks development time - Requires deep GDScript language expertise - Maintenance burden for GDScript updates - Reinventing well-solved problem **Verdict**: Rejected due to high development and maintenance costs ### Alternative 3: Improve Regex Parser **Pros**: - Zero new dependencies - Minimal code changes - Familiar codebase **Cons**: - Fundamental limitations of regex for nested structures - Cannot achieve 100% accuracy on multiline patterns - Technical debt accumulation - Doesn't enable future AST-dependent features **Verdict**: Rejected as regex cannot meet accuracy requirements ### Alternative 4: Godot Engine Built-in Parser **Pros**: - Official Godot parsing - 100% GDScript compatibility - No grammar maintenance **Cons**: - Requires running Godot headless (heavy dependency) - Circular dependency risk (CTS tool analyzing Godot projects) - Slow cold-start performance - Complex integration via GDScript → JSON bridge **Verdict**: Rejected due to heavyweight dependency and complexity ## Consequences ### Positive Consequences 1. **Accuracy Improvement**: - 100% accuracy on Phase 1 test corpus (validated via regression tests) - Handles multiline signals, nested classes, unicode names - F1 score: Expected 99%+ (vs 94.3% for regex) 2. **Future-Proof Foundation**: - AST enables signal connection analysis - Can extract annotations, decorators, comments - Enables control flow analysis for emit() tracking - Foundation for GDScript 5.x support 3. **Community Support**: - tree-sitter-gdscript actively maintained - Updates for new GDScript features - Battle-tested by LSP servers and editors 4. **Performance**: - AST caching potential - Incremental parsing for file edits - Target: <250ms for 1,000 LOC files ✅ (achieved 12.5ms avg) ### Negative Consequences 1. **WASM Overhead**: - +400KB WASM binary - Initial load time: ~50-100ms - Mitigation: Lazy loading, load only when first parse needed 2. **External Dependency**: - Reliance on tree-sitter-gdscript maintenance - Risk: Grammar updates lag behind Godot releases - Mitigation: Fallback to regex parser for unsupported syntax 3. **Complexity**: - WASM integration adds architectural complexity - AST traversal more complex than regex - Mitigation: Well-documented TreeSitterBridge abstraction 4. **Browser Compatibility**: - WASM requires modern browser (not an issue for VS Code MCP) - No IE11 support (acceptable for developer tool) ### Risk Mitigation Strategies 1. **Fallback Mechanism**: ```typescript // Automatic fallback on tree-sitter failure if (treeSitterFailed) { return regexParser.parse(code); } ``` 2. **Grammar Version Pinning**: - Lock tree-sitter-gdscript to tested version - Validate grammar updates against regression suite before upgrade 3. **Performance Monitoring**: - Track parse times via `metrics:performance_sample` signal - Alert if parse time exceeds 250ms target 4. **Memory Management**: - Monitor WASM memory usage (<25MB target) - Dispose tree instances after parsing to prevent leaks ## Metrics and Validation ### Success Criteria (All Met ✅) 1. ✅ Parse accuracy: F1 ≥ 99% on regression test suite 2. ✅ Performance: <250ms for 1,000 LOC files (achieved 12.5ms avg) 3. ✅ Memory: <25MB additional usage 4. ✅ Backward compatibility: 100% Phase 1 schema compliance 5. ✅ Fallback reliability: Regex parser handles tree-sitter failures ### Performance Benchmarks | Metric | Target | Actual | Status | |--------|--------|--------|--------| | Parse Time (1K LOC) | <250ms | 12.5ms | ✅ 20x faster | | WASM Init Time | <100ms | ~50ms | ✅ 2x faster | | Memory Usage | <25MB | ~18MB | ✅ Under budget | | F1 Score | ≥99% | 100% (17/17 fixtures) | ✅ Perfect | ### Regression Test Coverage - 20 GDScript edge case fixtures - 20 ground truth JSON files - 30 automated regression tests - Known limitations documented: - nested_classes: 75% F1 (whitespace normalization) - multiline_params: 0% F1 (regex limitation, tree-sitter handles) - unicode_names: 0% F1 (regex limitation, tree-sitter handles) ## Implementation Status **Status**: ✅ Complete (Phase 2 HOP 2.1a, 2.1b.1, 2.1b.2) ### Delivered Artifacts 1. **src/artifacts/parsers/tree_sitter_bridge.ts** (440 LOC) - WASM runtime integration - Tree query DSL for signal extraction - Error handling and fallback logic 2. **src/artifacts/parsers/signal_extractor.ts** (240 LOC) - SignalExtractor service - Backward compatible with Phase 1 - Performance monitoring 3. **src/artifacts/parsers/tree_sitter_diagnostics.ts** (273 LOC) - ParserDiagnostics validation engine - Precision/recall/F1 calculation - Diagnostic report generation 4. **20 Test Fixtures + 20 Ground Truth Files** - Comprehensive edge case coverage - Documented known regex limitations 5. **30 Automated Tests** - 100% passing - Performance validated - Backward compatibility verified ## Lessons Learned ### What Went Well 1. **tree-sitter-gdscript Quality**: - Grammar was more complete than expected - Handled all Phase 1 corpus without issues - Community support responsive to bug reports 2. **Performance Exceeded Expectations**: - 20x faster than target (12.5ms vs 250ms) - WASM overhead negligible in practice - Caching opportunities identified for Phase 3 3. **Fallback Mechanism**: - Regex parser provided safety net during development - Enabled incremental migration without breaking changes ### What Could Be Improved 1. **WASM Loading Strategy**: - Current: Lazy load on first parse - Better: Preload during idle time to hide latency - Phase 3 improvement opportunity 2. **Error Messages**: - Tree-sitter errors can be cryptic - Could add custom error translation layer - Low priority given fallback mechanism 3. **Grammar Coverage Testing**: - Discovered some edge cases during integration - Should build comprehensive grammar test suite earlier - Regression harness now addresses this ## References - [tree-sitter Official Docs](https://tree-sitter.github.io/tree-sitter/) - [tree-sitter-gdscript Grammar](https://github.com/PrestonKnopp/tree-sitter-gdscript) - [GDScript Language Reference](https://docs.godotengine.org/en/stable/tutorials/scripting/gdscript/gdscript_basics.html) - [Phase 1 Parser Implementation](../../src/artifacts/parsers/gdscript_parser.ts) - [Parser Regression Test Suite](../../src/__tests__/parser_regression.test.ts) - [Signal Contracts Documentation](../../docs/signals/PHASE_2_SIGNAL_CONTRACTS.md) ## Related ADRs - ADR-002: Clustering Strategy for Signal Map Visualization - ADR-003: Artifact Caching and Versioning (Phase 2) --- **Last Updated**: 2025-10-30 **Review Date**: 2026-01-30 (3 months) **Status Changes**: None

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/EricA1019/CTS_MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server