CocoIndex Code MCP Server
A Model Context Protocol (MCP) server that provides a RAG (Retrieval Augmented Generation) tool with hybrid search capabilities combining vector similarity and keyword metadata search for code retrieval. Built on the CocoIndex data transformation framework with specialized support for multiple programming languages.
This RAG MCP server lets AI tools (LLMs) retrieve relevant code snippets from large codebases efficiently and in real time, using CocoIndex's incremental indexing, tree-sitter based chunking, and smart language-specific embeddings. It improves code generation, code completion, and code understanding by effectively enlarging the context window available to the AI models.
It currently uses PostgreSQL + pgvector as the vector database backend, but can be adapted to other backends supported by CocoIndex.
Quickstart
1. Clone the Repository (optional)
git clone --recursive https://github.com/aanno/cocoindex-code-mcp-server.git
cd cocoindex-code-mcp-server
Checking out the sources is not strictly necessary if you just want to use the MCP server, as it can be installed from PyPI. However, there are some scripts, e.g. for starting the pgvector database, that are missing from the PyPI package.
2. Install
Build from source using maturin:
# Install dependencies from PyPI
uv sync
uv sync --all-extras
# And build from source
maturin develop
Or simply install from PyPI:
pip install cocoindex-code-mcp-server
I provide native wheels for many systems (including Linux, Windows, and macOS) on PyPI, so no build should be necessary in most cases. cocoindex-code-mcp-server needs Python 3.11+ (and I prefer to build abi3 wheels for better compatibility).
3. Start the PostgreSQL Database
In one terminal on your local machine, start the pgvector database:
cd cocoindex-code-mcp-server
./scripts/cocoindex-postgresql.sh
# You may need to install the pgvector extension once
./scripts/install-pgvector.py
Using the scripts is optional; however, you need a running PostgreSQL + pgvector database for the MCP server to work.
4. Configure the MCP Server (DB Connection)
cocoindex_code_mcp_server uses the COCOINDEX_DATABASE_URL environment variable to connect to the database.
It reads the .env file in the current directory if present. You can copy the provided .env.template to .env and
adjust the connection string if needed.
The current directory does not need to be the directory that you want to scan (see section 'Command Line Arguments' below for details).
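As a rough illustration of the lookup described above, the following sketch resolves the connection string from the environment or from the text of a .env file. The helper names `parse_dotenv` and `resolve_database_url` are hypothetical, and the precedence (real environment first, then .env) is an assumption borrowed from common dotenv tooling, not something confirmed by the server's source.

```python
from urllib.parse import urlsplit


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_database_url(environ: dict[str, str], dotenv_text: str = "") -> str:
    """Return the connection string (assumed precedence: environment, then .env)."""
    url = environ.get("COCOINDEX_DATABASE_URL") or parse_dotenv(dotenv_text).get(
        "COCOINDEX_DATABASE_URL"
    )
    if not url:
        raise RuntimeError("COCOINDEX_DATABASE_URL is not set")
    parts = urlsplit(url)
    # Only a basic sanity check: scheme and host must look like PostgreSQL.
    if parts.scheme not in ("postgres", "postgresql") or not parts.hostname:
        raise ValueError(f"not a PostgreSQL URL: {url!r}")
    return url
```

Calling `resolve_database_url(dict(os.environ), Path(".env").read_text())` before starting the server is one way to catch a missing or malformed connection string early.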
cp .env.template .env
5. Start the MCP Server
In another terminal, start the cocoindex_code_mcp_server:
cd cocoindex-code-mcp-server
python -m cocoindex_code_mcp_server.main_mcp_server --rescan --port 3033 <path_to_code_directory>
The server will index the code in the specified directory and start serving requests. This will take some time. It is ready when you see something like:
CodeEmbedding.files (batch update): 1505 source rows NO CHANGE
The PyPI package also provides the command cocoindex-code-mcp-server <options> <root-source-dir> for starting the server. Remember that you need a running PostgreSQL + pgvector database for this to work.
6. Use the MCP Server
You can now use the RAG server running at http://localhost:3033 as a streaming HTTP MCP server. For example, with Claude Code, use the following snippet within "mcpServers" in your .mcp.json file:
{
  "cocoindex-rag": {
    "command": "pnpm",
    "args": [
      "dlx",
      "mcp-remote@next",
      "http://localhost:3033/mcp"
    ]
  }
}
Command Line Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| | positional | - | Path(s) to code directory/directories to index (can specify multiple) |
| | option | - | Alternative way to specify paths (can use multiple times) |
| --no-live | flag | false | Disable live update mode |
| | int | 60 | Polling interval in seconds for live updates |
| | flag | false | Use default CocoIndex embedding instead of smart embedding |
| | flag | false | Use default CocoIndex chunking instead of tree-sitter/AST chunking |
| | flag | false | Use default CocoIndex language handling |
| --chunk-factor-percent | int | 100 | Chunk size scaling factor as percentage (100=default, <100=smaller, >100=larger) |
| --port | int | 3000 | Port to listen on for HTTP |
| | string | 127.0.0.1 | Host/interface to bind to |
| | string | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| | flag | false | Enable JSON responses instead of SSE streams |
| --rescan | flag | false | Clear database and tracking tables before starting to force re-indexing |
| --include | option | - | Gitignore-style pattern for files to index |
| --exclude | option | - | Gitignore-style pattern for files to exclude |
| --no-gitignore | flag | false | Disable automatic exclusion of files matched by .gitignore |
File filtering
Include patterns
By default the server uses a broad built-in include list (~70 source-code file patterns, all in **/<pattern> form so they match at any directory depth). When --include is given, it replaces the built-in include list entirely — only the patterns you specify will be indexed.
Exclude patterns and .gitignore support
.gitignore files found anywhere in the scan tree are automatically respected and take priority:
With .gitignore: the gitignore-derived patterns replace the built-in exclude list. They are the authoritative source of exclusions for the project.
Without .gitignore: the built-in exclude list (common build artefacts, hidden directories, dependency folders, etc.) is used as a fallback.
In both cases, patterns given via --exclude are appended on top. Use --no-gitignore to disable .gitignore processing entirely (the built-in fallback list then applies as usual).
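The precedence rules above can be summarized in a small sketch. The function name `resolve_patterns` and the two short built-in lists are hypothetical stand-ins (the real built-in lists are larger and live in the server); only the replace/fallback/append ordering is taken from the text.

```python
# Hypothetical stand-ins for the server's built-in pattern lists.
BUILTIN_INCLUDES = ["**/*.py", "**/*.rs", "**/*.hs"]
BUILTIN_EXCLUDES = ["**/.*/**", "**/node_modules/**", "**/target/**"]


def resolve_patterns(include_args, exclude_args, gitignore_patterns, use_gitignore=True):
    """Return the effective (includes, excludes) for one run."""
    # --include replaces the built-in include list entirely.
    includes = list(include_args) if include_args else list(BUILTIN_INCLUDES)

    # .gitignore-derived patterns replace the built-in excludes when present
    # and enabled; otherwise the built-in list is the fallback.
    if use_gitignore and gitignore_patterns:
        excludes = list(gitignore_patterns)
    else:
        excludes = list(BUILTIN_EXCLUDES)

    # --exclude patterns are always appended on top.
    excludes.extend(exclude_args)
    return includes, excludes
```

For example, passing `--exclude '*.lock'` in a project with a .gitignore yields the gitignore patterns plus `*.lock`, while adding `--no-gitignore` yields the built-in excludes plus `*.lock`.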
Pattern conversion
All patterns follow gitignore rules and are automatically converted to the globset format used by CocoIndex:
| Gitignore pattern | Converted to | Meaning |
|---|---|---|
| | | Directory named |
| | | Any |
| | | Only the root-level |
| | | Already explicit, left as-is |
Negation patterns (!) are not supported; they are skipped with a warning.
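To make the conversion more concrete, here is a minimal sketch of one possible mapping from gitignore-style patterns to glob patterns. The function `gitignore_to_glob` is hypothetical, and its rules are assumptions derived from general gitignore semantics (unanchored names match at any depth, a leading slash anchors to the root, a trailing slash denotes a directory); the server's actual conversion may differ in detail.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def gitignore_to_glob(pattern: str) -> Optional[str]:
    """Convert one gitignore-style pattern to a glob (assumed rules, see lead-in)."""
    pattern = pattern.strip()
    if not pattern or pattern.startswith("#"):
        return None  # blank lines and comments carry no pattern
    if pattern.startswith("!"):
        logger.warning("negation pattern %r is not supported, skipping", pattern)
        return None

    is_dir = pattern.endswith("/")
    body = pattern.rstrip("/")

    if body.startswith("/"):
        glob = body.lstrip("/")  # root-anchored: drop the leading slash
    elif "/" in body:
        glob = body  # already path-like (including '**/...'): leave as-is
    else:
        glob = "**/" + body  # bare name: match at any directory depth

    # A directory pattern covers everything beneath that directory.
    return glob + "/**" if is_dir else glob
```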
Stale-result filtering
When patterns are narrowed between runs (without --rescan), the database may still contain indexed entries from the previous broader scan. To prevent stale results from appearing in queries, the server applies a post-query path filter to all three search types (keyword, vector, hybrid). Any result whose path no longer matches the current include/exclude patterns is silently dropped and logged at INFO level. This means narrowing your patterns takes effect immediately without requiring --rescan.
Note: broadening patterns works in the opposite direction — new files are picked up by CocoIndex's incremental indexer on the next update pass.
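The post-query filter described above could look roughly like the sketch below. The function name `drop_stale` and the `"path"` key on each result row are hypothetical, and `fnmatchcase` from the standard library is used as a simple stand-in for globset matching (its `*` also matches across `/`, so the patterns here are deliberately simple).

```python
import logging
from fnmatch import fnmatchcase

logger = logging.getLogger(__name__)


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the path matches at least one pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def drop_stale(results: list[dict], includes: list[str], excludes: list[str]) -> list[dict]:
    """Keep only results whose path still matches the current patterns."""
    kept = []
    for row in results:
        path = row["path"]
        if matches_any(path, includes) and not matches_any(path, excludes):
            kept.append(row)
        else:
            # Stale entries are dropped from the response and logged at INFO level.
            logger.info("dropping stale result: %s", path)
    return kept
```

Because the filter runs on query results rather than on the index, it does not remove anything from the database; a later run with --rescan is still what clears old entries.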
Examples
# Index a single directory with live updates
python -m cocoindex_code_mcp_server.main_mcp_server /path/to/code
# Index multiple directories
python -m cocoindex_code_mcp_server.main_mcp_server /path/to/code1 /path/to/code2
# Force re-indexing with custom port
python -m cocoindex_code_mcp_server.main_mcp_server --rescan --port 3033 /path/to/code
# Disable live updates (one-time indexing)
python -m cocoindex_code_mcp_server.main_mcp_server --no-live /path/to/code
# Custom chunk size (50% smaller chunks)
python -m cocoindex_code_mcp_server.main_mcp_server --chunk-factor-percent 50 /path/to/code
# Index only Nix and Dhall files (replaces built-in include list)
python -m cocoindex_code_mcp_server.main_mcp_server --include '*.nix' --include '*.dhall' /path/to/code
# Exclude test directories and lock files (on top of .gitignore / built-in excludes)
python -m cocoindex_code_mcp_server.main_mcp_server --exclude '**/tests/**' --exclude '*.lock' /path/to/code
# Disable .gitignore-based exclusion (built-in fallback list takes over)
python -m cocoindex_code_mcp_server.main_mcp_server --no-gitignore /path/to/code
Features
CocoIndex Backend: Uses CocoIndex as the embedding and vector database backend with PostgreSQL + pgvector
Multiple Language Support: Specialized support for 20+ programming languages with language-specific parsers and embeddings
Streaming HTTP MCP Server: Real-time code retrieval via Model Context Protocol over HTTP
Code Change Detection: Incremental indexing with automatic detection of file changes
Tree-sitter Chunking: Advanced code parsing and chunking using tree-sitter AST for better code understanding
Smart Embedding: Multiple embedding models automatically selected based on programming language (see Smart Embedding)
Hybrid Search: Combines vector similarity search with keyword/metadata filtering for precise results
Vector Search: Semantic similarity using language-specific code embeddings
Keyword Search: Exact matching on metadata fields (functions, classes, imports, etc.)
Hybrid Search: Weighted combination of both approaches with configurable weights
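As an illustration of the weighted combination mentioned in the last item, the sketch below merges per-document vector and keyword scores into one ranking. The function `hybrid_scores`, the default weights, and the linear formula are assumptions made for this example; the server's actual scoring may be computed differently (for instance inside the database query).

```python
def hybrid_scores(vector_scores, keyword_scores, vector_weight=0.7, keyword_weight=0.3):
    """Rank documents by a weighted sum of vector and keyword scores.

    Both inputs map a document id to a score in [0, 1]; a document missing
    from one side contributes 0.0 for that side.
    """
    doc_ids = set(vector_scores) | set(keyword_scores)
    combined = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + keyword_weight * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

Raising `vector_weight` favors semantically similar code, while raising `keyword_weight` favors exact metadata hits such as a function or class name.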
Supported Languages
The server supports multiple programming languages with varying levels of integration:
| Language | Extensions | Embedding Model | AST Chunking | Tree-sitter | Remarks |
|---|---|---|---|---|---|
| Python | | GraphCodeBERT | ✅ astchunk | ✅ python | Custom (not using visitor), metadata extraction: |
| Rust | | UniXcoder | ? | ✅ rust | Full metadata support with specialized visitor: |
| JavaScript | | GraphCodeBERT | ?astchunk? | ✅ javascript | Full metadata support with specialized visitor: |
| TypeScript | | UniXcoder | ✅ astchunk | ✅ typescript | Extends javascript visitor: |
| TSX | | UniXcoder | ✅ astchunk | ?typescript? | ?see typescript? |
| Java | | GraphCodeBERT | ✅ astchunk | ✅ java | Full metadata support with specialized visitor: |
| Kotlin | | UniXcoder | ? | ✅ kotlin | Full metadata support with specialized visitor: |
| C | | GraphCodeBERT | ? | ✅ c | Full metadata support with specialized visitor: |
| C++ | | GraphCodeBERT | ? | ✅ cpp | Extends C visitor: |
| C# | | UniXcoder | ✅ astchunk | ❌ | Tree-sitter parsing/chunking only |
| Haskell | | all-mpnet-base-v2 | ✅ | ✅ | Custom maturin extension with specialized visitor, chunker: |
| Other Languages | see | all-mpnet-base-v2 | ❌ | ❌ ?regex? | cocoindex defaults (baseline) |
Legend
Embedding Model: The embedding model automatically selected for the language
AST Chunking: Advanced chunking using ASTChunk or custom implementations (based on ideas from ASTChunk and using tree-sitter for the language).
Tree-sitter: Language has a tree-sitter parser configured for AST analysis. (Python tree-sitter bindings, except for Haskell, which uses a Maturin/Rust extension built on the Rust crates tree-sitter and tree-sitter-haskell.)
Remarks: Additional notes about support level
Other Languages: Files recognized but only basic text embedding and chunking applied (cocoindex defaults). This includes: Go, PHP, Ruby, Swift, Scala, Dart, CSS, HTML, JSON, Markdown, YAML, TOML, SQL, R, Fortran, Pascal, XML
Smart Embedding
The server uses language-aware code embeddings that automatically select the optimal embedding model based on the programming language. This approach provides better semantic understanding of code compared to generic text embeddings.
How It Works
The smart embedding system uses different specialized models optimized for different programming languages:
GraphCodeBERT (microsoft/graphcodebert-base)
  Optimized for: Python, Java, JavaScript, PHP, Ruby, Go, C, C++
  Pre-trained on code from these languages with graph-based code understanding
  Best for languages with explicit structure and common patterns
UniXcoder (microsoft/unixcoder-base)
  Optimized for: Rust, TypeScript, C#, Kotlin, Scala, Swift, Dart
  Unified cross-lingual model for multiple languages
  Best for modern statically-typed languages
Fallback Model (sentence-transformers/all-mpnet-base-v2)
  Used for: Languages not specifically supported by code models
  General-purpose text embedding for broader language support
  768-dimensional embeddings matching code-specific models
Automatic Selection
The embedding model is automatically selected based on file extension:
# Example: Python file automatically uses GraphCodeBERT
file: main.py → language: python → model: microsoft/graphcodebert-base
# Example: Rust file automatically uses UniXcoder
file: lib.rs → language: rust → model: microsoft/unixcoder-base
# Example: Haskell file uses fallback model
file: Main.hs → language: haskell → model: sentence-transformers/all-mpnet-base-v2
Benefits
Better Code Understanding: Code-specific models understand programming constructs better than generic text models
Language-Specific Optimization: Each language gets embeddings from models trained on that language
Consistent Search Quality: Similar code snippets in the same language produce similar embeddings
Zero Configuration: Automatic model selection requires no manual configuration
Implementation Details
The smart embedding system is implemented as an external wrapper around CocoIndex's SentenceTransformerEmbed function, located in python/cocoindex_code_mcp_server/smart_code_embedding.py. This approach:
Does not modify CocoIndex source code
Uses CocoIndex as a pure dependency
Provides drop-in compatibility with existing workflows
Can be easily updated independently
For more technical details, see:
Development
Prerequisites
Rust (latest stable version)
Python 3.11+
Maturin (build tool for Python extensions in Rust)
PostgreSQL with pgvector extension
Tree-sitter language parsers (automatically installed via pyproject.toml)
Run tests
# Run tests to verify installation
pytest -c pytest.ini tests/
uv run --with pytest --with pytest-asyncio --with pytest-mock pytest tests/test_pattern_utils.py -v --tb=short 2>&1 | tail -20
Code Quality
The project uses mypy for type checking. Use the provided scripts:
# Type check main source code
./scripts/mypy-check.sh
# Type check tests
./scripts/mypy-check-tests.sh
Project Structure
python/cocoindex_code_mcp_server/: Main MCP server implementation
  main_mcp_server.py: MCP server entry point
  cocoindex_config.py: CocoIndex flow configuration
  smart_code_embedding.py: Language-aware embedding selection
  mappers.py: Language and field mappings
  tree_sitter_parser.py: Tree-sitter parsing utilities
  db/: Database abstraction layer
    pgvector/: PostgreSQL + pgvector backend
  lang/: Language-specific handlers
    python/: Python code analyzer
    haskell/: Haskell support (via Rust extension)
tests/: Pytest test suite
docs/: Documentation
  claude/: Development notes and architecture docs
  cocoindex/: CocoIndex-specific documentation
  instructions/: Task instructions and guides
rust/: Rust components
  src/lib.rs: Haskell tree-sitter Rust extension
astchunk/: ASTChunk submodule for advanced code chunking
Running Tests
# Run all tests
pytest -c pytest.ini tests/
# Run specific test file
pytest -c pytest.ini tests/test_hybrid_search_integration.py
# Run with coverage
pytest -c pytest.ini tests/ --cov=python/cocoindex_code_mcp_server --cov-report=html
Contributing
Contributions are welcome! Please open issues and pull requests on the GitHub repository.
Development Workflow
Fork the repository
Create a feature branch
Make your changes with tests
Run type checking:
./scripts/mypy-check.sh
Run tests:
pytest tests/
Submit a pull request
Areas for Contribution
Additional language support (parsers, embeddings, chunking)
Enhanced metadata extraction for existing languages
Performance optimizations
Documentation improvements
Bug fixes and issue resolution
License
AGPL-3.0 or later
Links
CocoIndex Framework: https://cocoindex.io
GitHub Repository: https://github.com/aanno/cocoindex-code-mcp-server
Model Context Protocol: https://modelcontextprotocol.io
ASTChunk: https://github.com/codelion/astchunk
Acknowledgments
Built on top of the excellent CocoIndex framework for incremental data transformation and the Model Context Protocol for AI tool integration.