# MCP Code Analysis Server - Complete Guide
## Project Overview
This is a production-ready MCP (Model Context Protocol) server that provides intelligent code analysis and search capabilities for large codebases. It uses TreeSitter for parsing, PostgreSQL with pgvector for storage, and OpenAI embeddings for semantic search.
The server implements the Model Context Protocol, making it compatible with Claude Desktop, custom clients, and any MCP-compatible application. It provides comprehensive code analysis including semantic search, dependency analysis, domain-driven design extraction, and advanced refactoring suggestions.
## Development Environment
- Uses Nix flakes with uv for Python dependency management
- Python 3.11 environment
- PostgreSQL with pgvector extension
- Docker Compose for deployment
- **Use the nix develop environment to execute python**
## Key Commands
- `nix develop` - Enter development shell
- `uv sync` - Sync Python dependencies
- `docker-compose up` - Start PostgreSQL and run server
- `pytest` - Run tests
- `ruff check .` - Run linter
- `mypy .` - Type checking
- `black .` - Format code
- `viberdash monitor --interval 300` - Monitor code quality metrics (updates every 5 minutes)
## Project Structure
```
├── src/
│   ├── scanner/          # Code scanning module
│   ├── parser/           # TreeSitter parsing
│   ├── embeddings/       # OpenAI embedding generation
│   ├── mcp_server/       # FastMCP server implementation
│   ├── query/            # Query processing
│   └── database/         # PostgreSQL models and migrations
├── tests/                # Test suite
├── config.yaml           # Configuration file
├── docker-compose.yml    # Docker services
├── pyproject.toml        # Python project config
└── flake.nix             # Nix development environment
```
## Core Components
### 1. Scanner Module (`src/scanner/`)
- Scans directories for Python files
- Uses Git for change tracking
- Implements incremental indexing
- Respects exclude patterns from config.yaml
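A minimal sketch of the incremental pass described above, assuming GitPython and illustrative names (`last_indexed_commit` stands in for whatever state the scanner actually persists):
```python
# Hedged sketch: diff the last indexed commit against HEAD, keep Python
# files, and drop anything matching an exclude pattern from config.yaml.
from fnmatch import fnmatch

import git  # GitPython

def changed_python_files(repo_path: str, last_indexed_commit: str,
                         exclude_patterns: list[str]) -> list[str]:
    repo = git.Repo(repo_path)
    diff = repo.git.diff("--name-only", f"{last_indexed_commit}..HEAD")
    files = [f for f in diff.splitlines() if f.endswith(".py")]
    return [
        f for f in files
        if not any(fnmatch(part, pattern)
                   for part in f.split("/")
                   for pattern in exclude_patterns)
    ]
```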
### 2. Parser Module (`src/parser/`)
- Uses TreeSitter to extract code structure
- Extracts classes, functions, imports, docstrings
- Stores parsed data in PostgreSQL
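An illustrative extraction sketch, assuming the `tree-sitter` and `tree-sitter-python` packages (binding APIs vary slightly between versions):
```python
# Hedged sketch: parse a source buffer and collect function names; the
# real parser also extracts classes, imports, and docstrings.
import tree_sitter_python
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tree_sitter_python.language())
parser = Parser(PY_LANGUAGE)  # older bindings: parser.language = PY_LANGUAGE

def extract_function_names(source: bytes) -> list[str]:
    tree = parser.parse(source)
    names: list[str] = []
    def walk(node) -> None:
        if node.type == "function_definition":
            names.append(node.child_by_field_name("name").text.decode())
        for child in node.children:
            walk(child)
    walk(tree.root_node)
    return names

print(extract_function_names(b"def parse(path):\n    return path\n"))
```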
### 3. Embeddings Module (`src/embeddings/`)
- Generates both raw code and interpreted embeddings
- Uses OpenAI `text-embedding-3-small` (see `embeddings.model` in config.yaml)
- Implements chunking strategies for large files
- Stores embeddings in pgvector
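A sketch of the batched generation step with the current OpenAI Python client; chunking itself is elided, and the batch size mirrors `embeddings.batch_size` in config.yaml:
```python
# Hedged sketch: embed pre-chunked code in batches to respect rate limits.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_chunks(chunks: list[str], batch_size: int = 100) -> list[list[float]]:
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=chunks[i:i + batch_size],
        )
        vectors.extend(item.embedding for item in response.data)
    return vectors
```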
### 4. MCP Server (`src/mcp_server/`)
- FastMCP HTTP-based implementation serving on `http://localhost:8080/mcp/v1/messages` (a registration sketch follows the tool list)
- Full MCP protocol compliance with session management
- Comprehensive tool suite (30+ tools):
#### Core Analysis Tools:
- `search_code` - Natural language semantic search
- `explain_code` - Hierarchical code explanations
- `find_definition` - Locate symbol definitions
- `find_usage` - Find usage locations
- `analyze_dependencies` - Dependency analysis
- `suggest_refactoring` - AI-powered refactoring suggestions
- `find_similar_code` - Pattern matching and code similarity
- `get_code_structure` - Module/file structure analysis
#### Domain-Driven Design Tools:
- `extract_domain_model` - Extract domain entities and relationships
- `find_context_relationships` - Context mapping analysis
- `analyze_bounded_contexts` - Bounded context identification
- `suggest_aggregate_roots` - Aggregate root suggestions
- `analyze_repository_patterns` - Repository pattern analysis
#### Package & Architecture Tools:
- `analyze_packages` - Package structure analysis
- `get_package_dependencies` - Package dependency graphs
- `find_circular_dependencies` - Circular dependency detection
- `get_package_coupling` - Coupling metrics analysis
- `get_package_tree` - Package hierarchy visualization
#### Repository Management:
- `list_repositories` - Repository management with stats
- `sync_repository` - Manual repository synchronization
- `health_check` - Server health and status
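A hypothetical registration sketch using the FastMCP API from the official `mcp` SDK; the real handlers delegate to the query and database modules:
```python
# Hedged sketch: how a tool such as health_check can be exposed via FastMCP.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("code-analysis")

@mcp.tool()
def health_check() -> dict:
    """Report server health and status."""
    return {"status": "ok"}

if __name__ == "__main__":
    mcp.run()  # transport and port are configured per the deployment
```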
### 5. Query Module (`src/query/`)
- Processes natural language queries
- Implements ranking algorithms
- Handles aggregation for hierarchical explanations
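An illustrative ranking query over pgvector using cosine distance; the column names here (`file_id`, `chunk_text`) are assumptions, not necessarily the real schema:
```python
# Hedged sketch: nearest-neighbour search with pgvector's <=> operator.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://codeanalyzer:password@localhost:5432/code_analysis")

def top_matches(query_vector: list[float], limit: int = 10):
    sql = text("""
        SELECT file_id, chunk_text,
               embedding <=> CAST(:q AS vector) AS distance
        FROM code_embeddings
        ORDER BY distance
        LIMIT :limit
    """)
    with engine.connect() as conn:
        return conn.execute(
            sql, {"q": str(query_vector), "limit": limit}).fetchall()
```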
## Database Schema
Main tables:
- `files` - File metadata and Git tracking
- `modules` - Python modules
- `classes` - Class definitions
- `functions` - Function/method definitions
- `imports` - Import statements
- `code_embeddings` - Vector embeddings
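A hedged sketch of the `code_embeddings` table using pgvector's SQLAlchemy integration; columns other than `embedding` are illustrative:
```python
# Hedged sketch: 1536 dimensions matches text-embedding-3-small's default.
from pgvector.sqlalchemy import Vector
from sqlalchemy import Integer, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class CodeEmbedding(Base):
    __tablename__ = "code_embeddings"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    file_id: Mapped[int] = mapped_column(Integer, index=True)
    chunk_text: Mapped[str] = mapped_column(Text)
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))
```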
## Testing Strategy
- Use the MCP server's own codebase as test data
- Test incremental updates with Git changes
- Verify all MCP tools work correctly
- Performance benchmarks for large codebases
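A hypothetical smoke-test shape for the self-indexing approach; the import path and result attributes are assumptions:
```python
# Hedged sketch: index this repository, then assert that semantic search
# surfaces a known symbol.
from src.query.search import search_code  # hypothetical module path

def test_search_finds_parser():
    results = search_code("TreeSitter parser that extracts functions", limit=5)
    assert any("treesitter_parser" in r.file_path for r in results)
```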
## Implementation Phases
1. **Core Infrastructure** - Scanner, database, Docker setup
2. **Parser Integration** - TreeSitter, AST mapping
3. **Embedding System** - OpenAI integration, pgvector storage
4. **MCP Server** - FastMCP, all tools implementation
5. **Advanced Features** - Refactoring, similarity, dependencies
## Performance Requirements
- Initial indexing: ≥1000 files/minute
- Incremental updates: <10s for 100 files
- Query response: <2s
- Support up to 10M LOC
## Environment Variables
```bash
OPENAI_API_KEY=your_api_key
POSTGRES_PASSWORD=secure_password
DATABASE_URL=postgresql://codeanalyzer:password@postgres:5432/code_analysis  # "postgres" is the Compose service name; use localhost outside Docker
```
## Quick Start
### Development Setup
1. `nix develop` - Enter development shell
2. `cp config.example.yaml config.yaml` - Create config file
3. Edit `config.yaml` with your OpenAI API key and repository URLs
4. `uv sync` - Install Python dependencies
### Production Deployment (Recommended)
```bash
# Start all services with Docker Compose
docker-compose up -d
# Check service status
docker ps
# View logs
docker logs mcp-server
docker logs mcp-scanner
docker logs mcp-postgres
```
The Docker setup includes:
- PostgreSQL with pgvector extension
- Automatic database initialization
- Code scanner service (incremental updates)
- MCP server on port 8080
- Persistent data volumes
### Manual Development
For development without Docker:
1. `docker-compose up -d postgres` - Start only database
2. `python -m src.mcp_server init-db` - Initialize database
3. `python -m src.scanner` - Run initial scan
4. `python -m src.mcp_server serve` - Start MCP server
## Using the MCP Server
### With Claude Desktop
Claude Desktop launches MCP servers over stdio, so bridge to this server's HTTP endpoint with a stdio-to-HTTP proxy such as `mcp-remote`. Add this to Claude Desktop's MCP settings (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "code-analysis": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8080/mcp/v1/messages"]
    }
  }
}
```
Or use HTTP transport directly at `http://localhost:8080/mcp/v1/messages`
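For scripted access, here is a minimal sketch of a direct JSON-RPC call with `requests`; it omits the session-initialization handshake, which the server's session management may require:
```python
# Hedged sketch: call a tool over the HTTP transport with raw JSON-RPC.
import requests

URL = "http://localhost:8080/mcp/v1/messages"

def call_tool(name: str, arguments: dict) -> dict:
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    response = requests.post(URL, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

print(call_tool("search_code", {"query": "embedding generation", "limit": 5}))
```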
### Available Tools & Usage Examples
#### Code Search & Analysis
```javascript
// Natural language code search
search_code({
  query: "TreeSitter parser that extracts function definitions",
  limit: 10
})

// Find specific definitions
find_definition({
  name: "CodeProcessor",
  entity_type: "class"
})

// Get file structure
get_code_structure({
  file_path: "src/parser/treesitter_parser.py"
})

// Analyze dependencies
analyze_dependencies({
  file_path: "src/mcp_server/server.py"
})
```
#### Domain-Driven Design Analysis
```javascript
// Extract domain model
extract_domain_model({
  code_path: "src/database/models.py",
  include_relationships: true
})

// Find bounded contexts
analyze_bounded_contexts({
  search_paths: ["src/"],
  min_entities: 3
})

// Suggest aggregate roots
suggest_aggregate_roots({
  domain_path: "src/domain/",
  include_reasoning: true
})
```
#### Architecture & Package Analysis
```javascript
// Analyze package structure
analyze_packages({
  root_path: "src/",
  include_metrics: true
})

// Find circular dependencies
find_circular_dependencies({
  root_path: "src/",
  max_depth: 5
})

// Get coupling metrics
get_package_coupling({
  package_path: "src/mcp_server/",
  include_details: true
})
```
#### Repository Management
```javascript
// List all repositories with stats
list_repositories({
  include_stats: true
})

// Manual sync
sync_repository({
  repository_url: "https://github.com/user/repo",
  force_full_scan: false
})

// Health check
health_check()
```
### Configuration
#### config.yaml Structure
```yaml
# OpenAI API configuration
openai_api_key: "your-api-key-here"

# Repositories to track
repositories:
  - url: https://github.com/your-org/your-repo
    branch: main                  # optional, uses default branch if not specified
    access_token: github_pat_...  # for private repos

# Database configuration
database:
  host: localhost
  port: 5432
  database: code_analysis
  user: codeanalyzer
  password: your-password

# MCP server settings
mcp:
  host: 0.0.0.0
  port: 8080

# Scanner settings
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - __pycache__
    - "*.pyc"
    - .git
    - node_modules
    - venv
    - .env

# Embedding configuration
embeddings:
  model: text-embedding-3-small
  batch_size: 100
  max_tokens: 8000

# LLM configuration (for analysis tools)
llm:
  model: gpt-4o-mini
  temperature: 0.2
  max_tokens: 4096
```
### Incremental Scanning
The scanner automatically handles incremental updates:
1. **Initial Scan**: Full repository scan on first run
2. **Incremental Updates**: Only processes files changed since last sync
3. **Git Integration**: Uses Git commit history to identify changes
4. **Periodic Sync**: Runs every 5 minutes by default
5. **Efficient Processing**: Smaller batch sizes for changed files
Monitor scanner activity:
```bash
# View scanner logs
docker logs mcp-scanner
# Check for changed files
docker logs mcp-scanner | grep "changed files"
```
### Performance Characteristics
- **Initial indexing**: ~1000 files/minute
- **Incremental updates**: <10s for 100 files
- **Query response**: <2s for semantic search
- **Capacity**: Supports up to 10M lines of code
- **Memory usage**: ~500MB for typical projects
- **Database size**: ~100MB per 1M LOC
### Monitoring & Debugging
#### Health Checks
```bash
# Container health
docker ps
# Service logs
docker logs mcp-server
docker logs mcp-scanner
docker logs mcp-postgres
# Database connectivity
docker exec mcp-postgres psql -U codeanalyzer -d code_analysis -c "SELECT COUNT(*) FROM files;"
```
#### Common Issues
1. **"Could not resolve target entity" warnings**: Expected for external dependencies
2. **Health check failures**: Docker health checks probe a `/health` endpoint the server does not expose; the failures are cosmetic only
3. **Scanner errors**: Check Git repository access and API keys
4. **Embedding failures**: Verify OpenAI API key and rate limits
#### Performance Tuning
```yaml
# Adjust in config.yaml
scanner:
  batch_size: 5       # Reduce for memory constraints
embeddings:
  batch_size: 50      # Reduce for API rate limits
database:
  pool_size: 10       # Adjust connection pool
```
## Notes for Development
- Follow existing code patterns in TreeSitter parsing
- Use batch operations for embedding generation
- Implement proper error handling for external APIs
- Add logging for debugging large codebase scans
- Consider memory usage for large file processing
- Use database transactions for consistency
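For the error-handling note above, one option is exponential backoff via `tenacity` (an assumption; any retry helper works):
```python
# Hedged sketch: retry transient OpenAI failures with exponential backoff.
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=1, max=30))
def embed_with_retry(client, texts: list[str]):
    return client.embeddings.create(
        model="text-embedding-3-small", input=texts)
```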
## Development Memories
- Do not try hacky workarounds. Fix it.
- After every code change, rebuild the Docker containers to see the result.
- Always run `python`, `uv`, etc. inside `nix develop`.
- Never mention Claude or Anthropic in commit messages.
- Always use the current versions of packages and APIs.