Code-Index-MCP

Code-Index-MCP
ai_docs

tree_sitter_overview.md•9.96 KiB

# Tree-sitter Overview ## Introduction Tree-sitter is a parser generator tool and incremental parsing library that builds a concrete syntax tree for source code and efficiently updates it as the code is edited. It's designed for use in text editors and development tools. ## Key Features ### Core Capabilities - **Incremental Parsing**: Efficiently updates parse tree on edits - **Error Recovery**: Produces useful trees even for syntax errors - **Language Agnostic**: Supports 100+ programming languages - **Fast Performance**: Written in C, with efficient memory usage - **Thread Safe**: Can be used from multiple threads ### Why Tree-sitter for MCP Server - Parse multiple languages with consistent API - Extract accurate symbol information - Handle malformed/incomplete code - Real-time parsing for file watchers - Efficient memory usage for large codebases ## Installation ### Option 1: Direct Tree-sitter ```bash pip install tree-sitter ``` ### Option 2: Pre-compiled Languages (Recommended) ```bash pip install tree-sitter-languages ``` ## Basic Usage ### Using tree-sitter-languages (Simplified) ```python from tree_sitter_languages import get_language, get_parser # Get language and parser language = get_language('python') parser = get_parser('python') # Parse code tree = parser.parse(b"def hello():\n print('world')") root_node = tree.root_node ``` ### Manual Setup with py-tree-sitter ```python from tree_sitter import Language, Parser import tree_sitter_python as tspython # Build language library PY_LANGUAGE = Language(tspython.language()) # Create parser parser = Parser() parser.set_language(PY_LANGUAGE) # Parse code tree = parser.parse(b"def hello():\n print('world')") ``` ## MCP Server Implementation ### TreeSitter Wrapper Class ```python from pathlib import Path import tree_sitter_languages class TreeSitterWrapper: """Wrapper for consistent tree-sitter usage across plugins""" def __init__(self): self._parsers = {} self._languages = {} def get_parser(self, language: str): """Get or create parser for language""" if language not in self._parsers: self._parsers[language] = tree_sitter_languages.get_parser(language) self._languages[language] = tree_sitter_languages.get_language(language) return self._parsers[language], self._languages[language] def parse(self, content: bytes, language: str): """Parse content and return root node""" parser, _ = self.get_parser(language) tree = parser.parse(content) return tree.root_node def parse_file(self, path: Path, language: str): """Parse file and return root node""" content = path.read_bytes() return self.parse(content, language) ``` ## Working with Parse Trees ### Node Properties ```python # Basic node properties node.type # Node type (e.g., 'function_definition') node.start_point # (row, column) tuple node.end_point # (row, column) tuple node.start_byte # Byte offset in source node.end_byte # Byte offset in source node.text # Node text content (bytes) # Tree navigation node.parent # Parent node node.children # List of child nodes node.child_count # Number of children node.named_children # Only named children # Named fields node.child_by_field_name('name') # Get specific field node.field_name_for_child(0) # Field name of child ``` ### Tree Walking ```python def walk_tree(node, callback, depth=0): """Recursively walk parse tree""" callback(node, depth) for child in node.children: walk_tree(child, callback, depth + 1) # Example: Find all functions def find_functions(node): functions = [] def visitor(node, depth): if node.type == 'function_definition': name_node = node.child_by_field_name('name') if name_node: functions.append({ 'name': name_node.text.decode('utf-8'), 'line': node.start_point[0] + 1, 'node': node }) walk_tree(node, visitor) return functions ``` ## Query System ### Pattern Matching with Queries ```python # Define query for Python functions FUNCTION_QUERY = """ (function_definition name: (identifier) @function.name parameters: (parameters) @function.params body: (block) @function.body) @function.def (class_definition name: (identifier) @class.name body: (block) @class.body) @class.def """ # Execute query language = get_language('python') query = language.query(FUNCTION_QUERY) # Get captures tree = parser.parse(source_code) captures = query.captures(tree.root_node) # Process captures for node, name in captures: if name == 'function.name': print(f"Function: {node.text.decode('utf-8')}") elif name == 'class.name': print(f"Class: {node.text.decode('utf-8')}") ``` ### Query Predicates ```python # Query with predicates ASYNC_FUNCTION_QUERY = """ ( (function_definition name: (identifier) @function.name) @function.def (#match? @function.def "^async") ) """ ``` ## Language-Specific Parsing ### Python Plugin Example ```python class PythonPlugin(IPlugin): def __init__(self, tree_sitter: TreeSitterWrapper): self.tree_sitter = tree_sitter self.language = 'python' def index_file(self, file_path: Path) -> IndexShard: root = self.tree_sitter.parse_file(file_path, self.language) symbols = [] # Extract symbols for node in self._find_definitions(root): symbol = self._node_to_symbol(node, file_path) symbols.append(symbol) return IndexShard( file_path=str(file_path), symbols=symbols, language=self.language ) def _find_definitions(self, root): """Find all symbol definitions""" query = self.tree_sitter.get_language(self.language).query(""" (function_definition) @function (class_definition) @class (assignment left: (identifier) @variable) """) for node, capture in query.captures(root): yield node ``` ## Incremental Parsing ### Efficient Updates ```python class IncrementalParser: def __init__(self, language: str): self.parser = get_parser(language) self.trees = {} # file_path -> tree def parse_file(self, file_path: Path): content = file_path.read_bytes() old_tree = self.trees.get(file_path) if old_tree: # Reuse old tree for faster parsing new_tree = self.parser.parse(content, old_tree) else: new_tree = self.parser.parse(content) self.trees[file_path] = new_tree return new_tree def update_file(self, file_path: Path, edits: List[Edit]): """Apply edits and reparse incrementally""" tree = self.trees[file_path] # Apply edits to tree for edit in edits: tree.edit( start_byte=edit.start_byte, old_end_byte=edit.old_end_byte, new_end_byte=edit.new_end_byte, start_point=edit.start_point, old_end_point=edit.old_end_point, new_end_point=edit.new_end_point ) # Reparse with edited tree content = file_path.read_bytes() new_tree = self.parser.parse(content, tree) self.trees[file_path] = new_tree return new_tree ``` ## Error Handling ### Parsing Errors ```python def parse_with_errors(content: bytes, language: str): """Parse and collect syntax errors""" parser = get_parser(language) tree = parser.parse(content) errors = [] def find_errors(node): if node.type == 'ERROR' or node.is_missing: errors.append({ 'type': 'syntax_error', 'location': node.start_point, 'text': node.text.decode('utf-8', errors='ignore') }) for child in node.children: find_errors(child) find_errors(tree.root_node) return tree, errors ``` ## Performance Optimization ### Caching Parsers ```python class ParserCache: """Thread-safe parser cache""" def __init__(self): self._cache = {} self._lock = threading.Lock() def get_parser(self, language: str): with self._lock: if language not in self._cache: self._cache[language] = { 'parser': get_parser(language), 'language': get_language(language) } return self._cache[language] ``` ### Batch Processing ```python async def batch_parse_files(files: List[Path], language: str): """Parse multiple files efficiently""" parser = get_parser(language) results = [] for file_path in files: try: content = await aiofiles.read(file_path, 'rb') tree = parser.parse(content) results.append((file_path, tree)) except Exception as e: logger.error(f"Failed to parse {file_path}: {e}") return results ``` ## Supported Languages for MCP Tree-sitter-languages includes support for: - Python - JavaScript/TypeScript - C/C++ - Java - Go - Rust - Ruby - HTML/CSS - JSON/YAML - And 100+ more languages ## Best Practices 1. **Use tree-sitter-languages**: Avoids compilation issues 2. **Cache parsers**: Create once, reuse many times 3. **Handle encoding**: Tree-sitter works with bytes, decode carefully 4. **Use queries**: More maintainable than manual traversal 5. **Error recovery**: Tree-sitter continues parsing after errors 6. **Incremental parsing**: Use for real-time updates 7. **Memory management**: Trees are reference counted ## Integration Notes For MCP Server: - Central TreeSitterWrapper manages all parsers - Each plugin uses wrapper for consistent parsing - File watcher uses incremental parsing - Query system extracts symbols efficiently - Error nodes help with incomplete code

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ViperJuice/Code-Index-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

tree_sitter_overview.md•9.96 KiB