CodeGrok MCP
Semantic Code Search for AI Assistants
Give your AI assistant the power to truly understand your codebase
What is CodeGrok MCP?
CodeGrok MCP is a Model Context Protocol (MCP) server that enables AI assistants to intelligently search and understand codebases using semantic embeddings and Tree-sitter parsing.
Unlike simple text search, CodeGrok understands code structure - it knows what functions, classes, and methods are, and can find relevant code even when you describe it in natural language.
You: "Where is authentication handled?"
CodeGrok: Returns auth middleware, login handlers, JWT validation code...
Why Use CodeGrok?
The Problem: AI assistants have limited context windows. Sending your entire codebase is expensive and often impossible.
The Solution: CodeGrok indexes your code once, then AI can query semantically and receive only the 5-10 most relevant code snippets—10-100x token reduction vs naive "read all files" approaches.
Features
Semantic Code Search - Find code by meaning, not just keywords
9 Languages Supported - Python, JavaScript, TypeScript, C, C++, Go, Java, Kotlin, Bash
28 File Extensions - Comprehensive coverage including .jsx, .tsx, .mjs, .hpp, etc.
Fast Parallel Indexing - 3-5x faster with multi-threaded parsing
Incremental Updates - Only re-index changed files (auto mode)
Local & Private - All data stays on your machine in the .codegrok/ folder
Zero LLM Dependencies - Lightweight, focused tool (no API keys required)
GPU Acceleration - Auto-detects CUDA for faster embeddings
Works with Any MCP Client - Claude, Cursor, Cline, and more
✅ What CodeGrok CAN Do
For Live Coding (AI-Assisted Development)
Capability | Description |
Semantic Code Search | Natural language queries → vector similarity search against indexed code |
Find Code by Purpose | Query "How does auth work?" → Returns relevant auth files with line numbers |
Symbol Extraction | Extracts functions, classes, methods with signatures, docstrings, calls, imports |
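Incremental Updates | learn in auto mode re-indexes only changed files |
Persistent Storage | Index survives restarts in .codegrok/ |
Load Existing Index | learn in load_only mode loads an existing index without re-indexing |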
For Learning a New Codebase
Capability | Description |
Entry Point Discovery | Query "main entry point" to find where execution starts |
Architecture Understanding | Query "database connection" to find DB layer |
Domain Concepts | Query "user authentication flow" to find auth logic |
Index Statistics | See files parsed, symbols extracted, timing info |
❌ What CodeGrok CANNOT Do
Important: Understanding limitations helps you use the tool effectively.
Not Designed For
Limitation | Explanation |
Code Execution | Pure indexing/search - no interpreter, no running tests |
Code Modification | Read-only search - doesn't write or edit files |
Real-time File Watching | No daemon mode - manually call learn again to pick up changes |
Cross-repository Search | Single codebase per index - can't search multiple projects simultaneously |
Find All Usages | Finds definitions, not references (no "who calls this function?") |
Type Inference / LSP | No language server - no jump-to-definition, no autocomplete |
Git History Analysis | Indexes current state only - no commit history or blame |
Regex/Exact Search | Semantic only - use a text search tool such as grep/ripgrep for exact matches |
Code Metrics | No complexity scoring, no linting, no coverage data |
Technical Constraints
Constraint | Impact |
First index is slow | ~50 chunks/second (~3-4 min for 10K symbols) |
Memory usage | Embedding models use 500MB-2GB RAM |
Model download | First run downloads ~500MB model from HuggingFace |
Query latency | ~50-100ms per search |
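The first-index figure above can be sanity-checked with simple arithmetic. The sketch below is illustrative only (the helper is not part of CodeGrok). It assumes throughput stays near the quoted ~50 chunks/second and, by default, that each symbol yields roughly one chunk. The example learn response in this document reports more chunks than symbols (2834 vs 1247), so the second call shows how the estimate grows under that ratio.

```python
def estimate_index_minutes(symbols: int,
                           chunks_per_symbol: float = 1.0,
                           chunks_per_second: float = 50.0) -> float:
    """Rough first-index time in minutes (hypothetical helper for estimation)."""
    total_chunks = symbols * chunks_per_symbol
    return total_chunks / chunks_per_second / 60.0


# One chunk per symbol: matches the quoted "~3-4 min for 10K symbols".
print(round(estimate_index_minutes(10_000), 1))  # 3.3

# Using the chunk/symbol ratio from the example response (2834 / 1247).
print(round(estimate_index_minutes(10_000, chunks_per_symbol=2834 / 1247), 1))  # 7.6
```

Actual times depend on hardware, file sizes, and whether a GPU is used, so treat these numbers as order-of-magnitude guidance.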
Quick Start
Installation
# Clone the repository
git clone https://github.com/rdondeti/CodeGrok_mcp.git
cd CodeGrok_mcp
# Option 1: Use setup script (recommended)
./setup.sh # Linux/macOS
# or
.\setup.ps1 # Windows PowerShell
# Option 2: Manual install
python -m venv .venv
source .venv/bin/activate # Linux/macOS
pip install -e .
# Verify installation
codegrok-mcp --help
Setup script options:
Flag | Description |
--clean | Remove existing venv before creating new |
| Install production dependencies only |
| Skip verification step |
First Index
Once integrated with your AI tool (see below), ask your assistant:
"Learn my codebase at /path/to/my/project"
Then search:
"Find how API endpoints are defined"
"Where is error handling implemented?"
"Show me the database models"
🎯 Use Cases
Use Case 1: Live Coding with AI
How CodeGrok Saves Tokens:
Without CodeGrok:
AI tries to read entire codebase → exceeds context window → fails or costs $$
With CodeGrok:
AI: "I need to add a new route"
↓ calls get_sources("Express route definition")
CodeGrok: Returns routes/api.js:15, routes/auth.js:8
↓ AI reads only those 2 files
Result: 10-100x fewer tokens, faster responses
Use Case 2: Learning a New Codebase
Step 1: "Learn my codebase at ~/projects/big-app"
Step 2: "Where is the main entry point?"
Step 3: "How is authentication implemented?"
Step 4: "Find the database connection logic"
Step 5: "Show me how API errors are handled"
Use Case 3: Code Review Assistance
"Find all functions that handle user input"
"Where is validation performed?"
"Show me error handling patterns"
🔌 AI Tool Integrations
Claude Code (CLI)
The easiest way to add CodeGrok to Claude Code:
# Add the MCP server
claude mcp add codegrok-mcp -- codegrok-mcp
Or manually add to your settings (~/.claude/settings.json):
{
  "mcpServers": {
    "codegrok": {
      "command": "codegrok-mcp"
    }
  }
}
Usage in Claude Code:
> learn my codebase at ./my-project
> find authentication logic
> where is the main entry point?
Claude Desktop
Add to your Claude Desktop configuration:
Platform | Config File Location |
macOS |
|
Windows |
|
Linux |
|
{
  "mcpServers": {
    "codegrok": {
      "command": "codegrok-mcp",
      "args": []
    }
  }
}
Restart Claude Desktop after saving.
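If the Claude Desktop config file already lists other MCP servers, the codegrok entry should be merged in rather than pasted over them. A minimal sketch of that merge follows. The function name and the "other-server" entry are made up for illustration, and reading or writing the actual config file is left out because its location depends on your platform.

```python
import json
from copy import deepcopy


def add_codegrok_server(config: dict) -> dict:
    """Return a copy of the config with the codegrok entry added.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["codegrok"] = {"command": "codegrok-mcp", "args": []}
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
print(json.dumps(add_codegrok_server(existing), indent=2))
```

Working on a copy keeps the original dictionary intact, which makes it easy to diff the result before writing it back.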
Cursor
Cursor supports MCP servers through its extension system:
Open Settings → Extensions → MCP
Add Server Configuration:
{
  "codegrok": {
    "command": "codegrok-mcp",
    "transport": "stdio"
  }
}
Or add to .cursor/mcp.json in your project:
{
  "mcpServers": {
    "codegrok": {
      "command": "codegrok-mcp"
    }
  }
}
Windsurf (Codeium)
Windsurf supports MCP through Cascade:
Open Cascade Settings
Navigate to MCP Servers
Add configuration:
{
  "codegrok": {
    "command": "codegrok-mcp",
    "transport": "stdio"
  }
}
Cline (VS Code)
Add to Cline's MCP settings in VS Code:
Open Command Palette (Ctrl+Shift+P / Cmd+Shift+P)
Search "Cline: Open MCP Settings"
Add:
{
  "mcpServers": {
    "codegrok": {
      "command": "codegrok-mcp"
    }
  }
}
Zed Editor
Zed supports MCP through its assistant panel. Add to settings:
{
  "assistant": {
    "mcp_servers": {
      "codegrok": {
        "command": "codegrok-mcp"
      }
    }
  }
}
Continue (VS Code/JetBrains)
Add to your Continue configuration (~/.continue/config.json):
{
  "mcpServers": [
    {
      "name": "codegrok",
      "command": "codegrok-mcp"
    }
  ]
}
Generic MCP Client
For any MCP-compatible client, use stdio transport:
# Command to run
codegrok-mcp
# Transport
stdio (stdin/stdout)
# Protocol
Model Context Protocol (MCP)
MCP Tools Reference
CodeGrok provides 4 tools for AI assistants:
Tool | Description | Key Parameters |
learn | Index a codebase (smart modes) | path, mode |
get_sources | Semantic code search | question, n_results |
| Get index statistics | None |
| List supported languages | None |
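MCP clients invoke tools like these through the protocol's tools/call JSON-RPC method. Your AI tool normally builds this message for you, and it is only valid after the client has completed MCP's initialization handshake. The sketch below just shows the shape of the request envelope; the helper name is made up, and the tool name and arguments are taken from the examples in this document.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


print(build_tool_call(
    "get_sources",
    {"question": "How is user authentication implemented?", "n_results": 5},
))
```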
Learn modes:
auto (default): Smart detection - incremental reindex if exists, full index if new
full: Force complete re-index (destroys existing index)
load_only: Just load existing index without any indexing
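The mode selection above, together with the file mtimes that this document says are kept in metadata.json, suggests how an incremental update could be computed. The sketch below is an assumption about the general technique, not CodeGrok's actual implementation: it compares a stored path-to-mtime map against the current one and reports which files were added, modified, or deleted. Both function names are hypothetical.

```python
def choose_mode(index_exists: bool, requested: str = "auto") -> str:
    """Resolve the requested learn mode into the action to take."""
    if requested == "auto":
        return "incremental" if index_exists else "full"
    return requested  # "full" or "load_only" are used as given


def diff_mtimes(previous: dict[str, float],
                current: dict[str, float]) -> dict[str, list[str]]:
    """Compare stored and current modification times per file path."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


stored = {"a.py": 1.0, "b.py": 2.0, "c.py": 3.0}
on_disk = {"a.py": 1.0, "b.py": 2.5, "d.py": 4.0}
print(choose_mode(index_exists=True))
print(diff_mtimes(stored, on_disk))
```

Only the files in the added and modified lists would need re-parsing and re-embedding; entries for deleted files would be dropped from the index.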
Tool Examples
Learn a Codebase
{
  "tool": "learn",
  "arguments": {
    "path": "/home/user/my-project",
    "mode": "auto"
  }
}
Response:
{
  "success": true,
  "message": "Indexed 150 files with 1,247 symbols",
  "stats": {
    "total_files": 150,
    "total_symbols": 1247,
    "total_chunks": 2834,
    "indexing_time": 12.5
  }
}
Search for Code
{
  "tool": "get_sources",
  "arguments": {
    "question": "How is user authentication implemented?",
    "n_results": 5
  }
}
Response:
{
  "sources": [
    {
      "file": "src/auth/middleware.py",
      "symbol": "authenticate_request",
      "type": "function",
      "line": 45,
      "content": "def authenticate_request(request):\n ...",
      "score": 0.89
    }
  ]
}
Incremental Update (using learn with auto mode)
{
  "tool": "learn",
  "arguments": {
    "path": "/home/user/my-project",
    "mode": "auto"
  }
}
Response (when index exists):
{
  "success": true,
  "mode_used": "incremental",
  "files_added": 2,
  "files_modified": 5,
  "files_deleted": 1
}
Supported Languages
Language | Extensions | Parser |
Python | | tree-sitter-python |
JavaScript | | tree-sitter-javascript |
TypeScript | | tree-sitter-typescript |
C | | tree-sitter-c |
C++ | | tree-sitter-cpp |
Go | | tree-sitter-go |
Java | | tree-sitter-java |
Kotlin | | tree-sitter-kotlin |
Bash | | tree-sitter-bash |
Total: 9 languages, 28 file extensions
How It Works
Architecture
┌─────────────────────────────────────────────────────────────┐
│ MCP Client │
│ (Claude, Cursor, Cline, etc.) │
└─────────────────────────┬───────────────────────────────────┘
│ MCP Protocol (stdio)
▼
┌─────────────────────────────────────────────────────────────┐
│ CodeGrok MCP Server │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Parsers │ │ Embeddings │ │ Vector Storage │ │
│ │ (Tree-sitter)│ │ (Sentence │ │ (ChromaDB) │ │
│ │ │ │ Transformers)│ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Indexing Pipeline
Source Files → Tree-sitter Parser → Symbol Extraction →
Code Chunks → Embeddings → ChromaDB Storage
Parse: Tree-sitter extracts functions, classes, methods with signatures
Chunk: Code is split into semantic chunks with context (docstrings, imports, calls)
Embed: Sentence-transformers create vector embeddings
Store: ChromaDB persists vectors locally in .codegrok/
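To make the Parse step concrete, here is a small sketch of symbol extraction. It uses Python's built-in ast module as a stand-in for Tree-sitter, so it handles only Python source and is not how CodeGrok itself parses code. The sample source and the output field names are illustrative assumptions, loosely modelled on the fields in the search response shown earlier.

```python
import ast

SOURCE = '''
import os

class Greeter:
    """Says hello."""
    def greet(self, name):
        return "hi " + name

def main():
    Greeter().greet("x")
'''


def extract_symbols(source: str) -> list[dict]:
    """Collect classes and functions with their line numbers and docstrings."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append({
                "name": node.name,
                "type": "class" if isinstance(node, ast.ClassDef) else "function",
                "line": node.lineno,
                "docstring": ast.get_docstring(node),
            })
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(symbols, key=lambda s: s["line"])


for symbol in extract_symbols(SOURCE):
    print(symbol)
```

Each extracted symbol would then be turned into one or more chunks for embedding. This sketch labels methods as functions for simplicity, whereas the pipeline described above distinguishes the two.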
Search Pipeline
Query → Embedding → Vector Similarity → Ranked Results
Embed Query: Convert natural language to vector
Search: Find similar vectors in ChromaDB
Return: Top-k results with file paths, line numbers, and code snippets
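The ranking step can be illustrated with toy vectors. The sketch below uses hand-made 3-dimensional vectors and cosine similarity in pure Python. Real embeddings from the default model have 768 dimensions, and the distance metric CodeGrok configures in ChromaDB is not stated here, so cosine is an assumption chosen because it is a common default for text embeddings. Apart from the auth path, the file locations are invented.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k index entries most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), location) for location, vec in index.items()]
    scored.sort(reverse=True)
    return [(location, round(score, 3)) for score, location in scored[:k]]


# Toy index: location -> embedding (values are made up).
INDEX = {
    "src/auth/middleware.py:45": [0.9, 0.1, 0.0],
    "src/db/connection.py:10": [0.1, 0.9, 0.1],
    "src/utils/strings.py:3": [0.0, 0.2, 0.9],
}

# A query vector that points mostly along the first ("auth-like") axis.
print(top_k([1.0, 0.2, 0.0], INDEX))
```

A vector database performs the same comparison with an approximate nearest-neighbour index, so it does not have to score every stored vector.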
Storage
All data is stored locally in your project:
your-project/
└── .codegrok/
    ├── chroma/          # Vector database
    └── metadata.json    # Index metadata (stats, file mtimes)
Configuration
Environment Variables
Variable | Description | Default |
| Embedding model to use |
|
| Compute device (cpu/cuda/mps) | Auto-detect |
Embedding Models
Model | Size | Best For |
nomic-ai/CodeRankEmbed | 768d / 137M | Code (default, recommended) - uses trust_remote_code |
The default model (nomic-ai/CodeRankEmbed) is optimized for code retrieval with:
768-dimensional embeddings
8192 max sequence length
State-of-the-art performance on CodeSearchNet benchmarks
Security Note: trust_remote_code
The default embedding model (nomic-ai/CodeRankEmbed) requires trust_remote_code=True when loading via SentenceTransformers. This flag allows execution of custom Python code bundled with the model.
Why it's required:
The model uses a custom Nomic BERT architecture that isn't part of the standard HuggingFace model library
Custom files: modeling_hf_nomic_bert.py (model architecture), configuration_hf_nomic_bert.py (config)
Security audit: The custom code has been reviewed and contains:
Standard PyTorch neural network definitions
No exec(), eval(), or dynamic code execution
No subprocess or shell commands
No network requests beyond HuggingFace's standard model download APIs
Only imports from trusted libraries (torch, transformers, einops, safetensors)
For maximum security:
Review the model code yourself: nomic-ai/CodeRankEmbed on HuggingFace
Pin to a specific model revision in production deployments
Consider using Microsoft CodeBERT (microsoft/codebert-base) as an alternative that doesn't require trust_remote_code (with potential quality trade-offs)
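Pinning a revision, as recommended above, could look like the sketch below when loading the model directly with SentenceTransformers. It relies on the revision and trust_remote_code arguments of the SentenceTransformer constructor (available in recent sentence-transformers releases). The helper names are illustrative, the revision string is a placeholder you would replace with a commit hash you have reviewed, and how CodeGrok itself exposes this setting is not covered here.

```python
def model_load_kwargs(revision: str) -> dict:
    """Keyword arguments that pin the remote code to one reviewed revision."""
    return {"trust_remote_code": True, "revision": revision}


def load_pinned_model(revision: str):
    """Load the default model at a fixed revision (downloads on first use)."""
    # Imported lazily so the sketch can be read and tested without the library.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("nomic-ai/CodeRankEmbed", **model_load_kwargs(revision))


# Placeholder value: substitute the commit hash you audited.
print(model_load_kwargs("<reviewed-commit-sha>"))
```

With a pinned revision, a later change to the model repository's custom code is not picked up until you review it and update the hash.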
Development
Setup
# Clone
git clone https://github.com/rdondeti/CodeGrok_mcp.git
cd CodeGrok_mcp
# Run setup script
./setup.sh # Linux/macOS (includes dev dependencies)
.\setup.ps1 # Windows PowerShell
# For clean reinstall:
./setup.sh --clean
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src/codegrok_mcp --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ -v # Fast unit tests
pytest tests/integration/ -v # Integration tests (uses real embeddings)
pytest tests/mcp/ -v # MCP protocol simulation tests
Code Quality
# Format code
black src/
# Type checking
mypy src/
# Linting
flake8 src/
FAQ & Troubleshooting
# Check installation
pip show codegrok-mcp
# Check Python version (need 3.10+)
python --version
# Reinstall
pip install -e .
Large codebases (>10k files) take longer on first index
Use learn again after first index for incremental updates (auto mode)
Close other heavy applications
Consider indexing a subdirectory first
Be more specific in queries (e.g., "JWT token validation" instead of "auth")
Re-index if codebase changed significantly
Check that the code type you're searching exists
Index smaller portions of the codebase
The default coderankembed model uses ~500MB-2GB RAM
Close other applications
Use the learn tool first:
"Learn my codebase at /path/to/project"
Comparison with Other Tools
Feature | CodeGrok MCP | grep/ripgrep | GitHub Search | Sourcegraph |
Semantic Search | ✅ | ❌ | Partial | ✅ |
Local/Private | ✅ | ✅ | ❌ | ❌ |
MCP Support | ✅ | ❌ | ❌ | ❌ |
No API Keys | ✅ | ✅ | ❌ | ❌ |
Multi-language | ✅ | ✅ | ✅ | ✅ |
Code Structure Aware | ✅ | ❌ | Partial | ✅ |
Offline | ✅ | ✅ | ❌ | ❌ |
Contributing
Contributions are welcome! Please:
Fork the repository
Create a feature branch (git checkout -b feature/amazing)
Make your changes
Run tests (pytest)
Format code (black src/)
Submit a Pull Request
Development Guidelines
Follow Black formatting (line length 100)
Add type hints to all functions
Write tests for new features
Update documentation
License
MIT License - see LICENSE for details.
Related Projects
Model Context Protocol - The protocol that powers this integration
Tree-sitter - Fast, accurate code parsing
ChromaDB - Vector database for embeddings
Sentence Transformers - State-of-the-art embeddings
Support
Issues: GitHub Issues
Discussions: GitHub Discussions
Made with ❤️ for developers who want AI that truly understands their code