
Smart Code Search MCP Server

# SCS-MCP Architecture

## System Overview

SCS-MCP (Smart Code Search - Model Context Protocol) is a sophisticated code intelligence system that provides semantic search, analysis, and voice interaction capabilities to Claude Desktop and other MCP-compatible clients.

```
┌────────────────────────────────────────────────────────────┐
│                       Claude Desktop                       │
│                      (or MCP Client)                       │
└────────────────────┬───────────────────────────────────────┘
                     │ MCP Protocol
┌────────────────────▼───────────────────────────────────────┐
│                      MCP Server Layer                      │
│  ┌──────────────────────────────────────────────────────┐  │
│  │               Request Router & Handler               │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────┬───────────────────────────────────────┘
                     │
┌────────────────────▼───────────────────────────────────────┐
│                       Core Services                        │
│  ┌────────────┐  ┌────────────┐  ┌──────────────────┐      │
│  │   Search   │  │  Analysis  │  │  Orchestration   │      │
│  │   Engine   │  │   Tools    │  │    Framework     │      │
│  └────────────┘  └────────────┘  └──────────────────┘      │
└────────────────────┬───────────────────────────────────────┘
                     │
┌────────────────────▼───────────────────────────────────────┐
│                         Data Layer                         │
│  ┌────────────┐  ┌────────────┐  ┌──────────────────┐      │
│  │   SQLite   │  │ Embeddings │  │   Git History    │      │
│  │  Database  │  │   Cache    │  │     Analyzer     │      │
│  └────────────┘  └────────────┘  └──────────────────┘      │
└────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. MCP Server (`src/server.py`)

The main entry point that implements the Model Context Protocol specification.

**Responsibilities**:
- Protocol implementation (JSON-RPC)
- Request routing and validation
- Response formatting
- Error handling
- Rate limiting

**Key Features**:
- Async request handling
- Thread-safe operations
- Connection pooling
- Request caching

### 2. Search Engine (`src/search_engine.py`)

Hybrid search system combining semantic and keyword matching.
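
To make "hybrid" concrete, the following is a minimal sketch of how a semantic score and a keyword score might be blended into one ranking. The function names, the 0.7/0.3 weighting, and the toy two-dimensional vectors are illustrative assumptions, not code or values from `src/search_engine.py`; real embeddings would come from an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that also appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.7):
    """Rank (text, vector) pairs by alpha * semantic + (1 - alpha) * keyword.

    alpha is an assumed weighting chosen for illustration only.
    """
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in docs
    ]
    return sorted(scored, reverse=True)  # highest combined score first


# Toy 2-d vectors stand in for real embeddings (hypothetical data).
docs = [
    ("parse json config", [1.0, 0.0]),
    ("load yaml settings", [0.8, 0.6]),
    ("render html page", [0.0, 1.0]),
]
for score, text in hybrid_rank("parse config", [1.0, 0.0], docs):
    print(f"{score:.2f}  {text}")
```

A weighted sum is only one way to fuse the two signals; rank-based fusion is a common alternative when the raw scores are not on comparable scales.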

**Architecture**:
```
SearchEngine
├── EmbeddingGenerator (Sentence Transformers)
│   └── all-MiniLM-L6-v2 model
├── IndexManager
│   ├── SQLite storage
│   └── FAISS vector index (optional)
├── QueryProcessor
│   ├── Query parsing
│   ├── Synonym expansion
│   └── Filter application
└── ResultRanker
    ├── Semantic scoring
    ├── Keyword matching
    └── Hybrid ranking
```

**Search Pipeline**:
1. Query preprocessing (tokenization, normalization)
2. Embedding generation
3. Vector similarity search
4. Keyword matching
5. Result fusion and ranking
6. Post-processing and formatting

### 3. Code Indexer (`src/enhanced_indexer.py`)

Multi-language code parser and indexer using Tree-sitter.

**Supported Languages**:
- Python (full AST analysis)
- JavaScript/TypeScript
- Java
- C/C++
- Go
- Rust
- Ruby

**Index Structure**:
```
symbols
├── id (PRIMARY KEY)
├── name
├── type (function/class/variable)
├── file_path
├── line_number
├── column_number
├── signature
├── docstring
├── code_snippet
├── embedding (BLOB)
└── metadata (JSON)

dependencies
├── source_symbol_id
├── target_symbol_id
├── dependency_type
└── context

git_history
├── commit_hash
├── file_path
├── change_type
├── diff_content
└── metadata
```

### 4. Analysis Tools (`src/tools/`)

Modular analysis components for code intelligence.

**Tool Categories**:

#### Code Quality Tools
- `instant_review.py`: Real-time code review
- `complexity_analyzer.py`: Cyclomatic complexity
- `test_gap_analyzer.py`: Test coverage analysis
- `security_analyzer.py`: Basic vulnerability scanning

#### Git Analysis Tools
- `git_analyzer.py`: Repository history analysis
- `git_search.py`: Commit message search
- `change_tracker.py`: File change tracking

#### Dependency Tools
- `dependency_analyzer.py`: Import analysis
- `circular_detector.py`: Circular dependency detection
- `usage_tracker.py`: Symbol usage tracking

#### Model Information Tools
- `model_info_tools.py`: AI model capabilities
- `cost_estimator.py`: Token usage estimation
- `model_selector.py`: Task-based model selection

### 5. Orchestration Framework (`src/orchestrators/`)

High-level coordination for complex operations.

**Orchestrator Pattern**:
```python
class Orchestrator:
    def __init__(self):
        self.tools = []
        self.pipeline = []  # ordered stage identifiers

    async def execute(self, context):
        results = {}
        for stage in self.pipeline:
            # Each stage can read the results of the stages before it.
            results[stage] = await self.run_stage(stage, context, results)
        return self.aggregate_results(results)

    async def run_stage(self, stage, context, results):
        raise NotImplementedError  # assumed to be supplied by subclasses

    def aggregate_results(self, results):
        raise NotImplementedError  # assumed to be supplied by subclasses
```

**Available Orchestrators**:
- `DebtOrchestrator`: Technical debt analysis
- `RefactorOrchestrator`: Refactoring coordination
- `MigrationOrchestrator`: Code migration planning
- `QualityOrchestrator`: Comprehensive quality assessment
- `PerformanceOrchestrator`: Performance optimization

### 6. Voice Assistant (`voice-assistant/`)

Web-based voice interface with media capture capabilities.

**Architecture**:
```
Voice Assistant
├── Server (Node.js/Express)
│   ├── WebSocket handler
│   ├── MCP client
│   └── Media processor
├── Web UI (HTML/JS)
│   ├── Voice recognition (Web Speech API)
│   ├── Media gallery
│   └── Real-time updates
├── VS Code Extension
│   ├── Editor context
│   ├── Command palette
│   └── Status bar
└── Storage
    ├── SQLite (metadata)
    └── File system (media)
```

## Data Flow

### Search Request Flow

```
1. Client Request
   └── MCP Server receives query
       └── Query Processor
           ├── Parse and validate
           ├── Generate embedding
           └── Build search parameters
               └── Search Engine
                   ├── Vector search
                   ├── Keyword search
                   └── Merge results
                       └── Post-processor
                           ├── Format response
                           ├── Add context
                           └── Return to client
```

### Indexing Flow

```
1. File Discovery
   └── File Scanner
       └── Language Detector
           └── Parser (Tree-sitter)
               ├── Extract symbols
               ├── Extract dependencies
               └── Extract documentation
                   └── Embedding Generator
                       └── Database Writer
                           ├── Store symbols
                           ├── Store embeddings
                           └── Update indices
```

## Database Schema

### Core Tables

```sql
-- Symbol storage
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,
    file_path TEXT NOT NULL,
    line_number INTEGER,
    signature TEXT,
    docstring TEXT,
    code TEXT,
    embedding BLOB,
    metadata JSON,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Search index
CREATE TABLE search_index (
    id INTEGER PRIMARY KEY,
    symbol_id INTEGER,
    content TEXT,
    embedding BLOB,
    tfidf_vector BLOB,
    FOREIGN KEY (symbol_id) REFERENCES symbols(id)
);

-- Dependencies
CREATE TABLE dependencies (
    id INTEGER PRIMARY KEY,
    source_id INTEGER,
    target_id INTEGER,
    type TEXT,
    context TEXT,
    FOREIGN KEY (source_id) REFERENCES symbols(id),
    FOREIGN KEY (target_id) REFERENCES symbols(id)
);

-- Git history
CREATE TABLE git_commits (
    hash TEXT PRIMARY KEY,
    message TEXT,
    author TEXT,
    timestamp TIMESTAMP,
    files_changed TEXT,
    stats JSON
);

-- Cache
CREATE TABLE cache (
    key TEXT PRIMARY KEY,
    value BLOB,
    expires_at TIMESTAMP
);
```

## Performance Optimizations

### 1. Caching Strategy

**Multi-level caching**:
- L1: In-memory LRU cache (most recent queries)
- L2: SQLite cache table (persistent cache)
- L3: File system cache (large results)

**Cache invalidation**:
- Time-based expiry (TTL: 5 minutes default)
- Event-based (file changes)
- Manual refresh

### 2. Indexing Optimizations

- **Incremental indexing**: Only changed files
- **Parallel processing**: Multi-threaded parsing
- **Batch operations**: Bulk database inserts
- **Lazy loading**: On-demand embedding generation

### 3. Search Optimizations

- **Query optimization**: SQL query planning
- **Vector quantization**: Reduced embedding size
- **Early termination**: Stop at result threshold
- **Result streaming**: Progressive response

## Security Considerations

### 1. Input Validation
- Parameter sanitization
- SQL injection prevention
- Path traversal protection
- Size limits enforcement

### 2. Access Control
- File system boundaries
- Git repository isolation
- Read-only operations
- No code execution

### 3. Data Protection
- No credential storage
- Temporary file cleanup
- Secure communication (MCP protocol)
- Error message sanitization

## Scalability

### Horizontal Scaling

```
Load Balancer
├── MCP Server Instance 1
├── MCP Server Instance 2
└── MCP Server Instance N
    └── Shared Database (PostgreSQL)
    └── Distributed Cache (Redis)
```

### Vertical Scaling

- **Memory**: Increase embedding cache size
- **CPU**: More worker threads
- **Storage**: Larger index capacity
- **GPU**: Hardware acceleration for embeddings

## Monitoring & Observability

### Metrics
- Request latency (p50, p95, p99)
- Search accuracy (precision/recall)
- Index size and growth
- Cache hit rates
- Error rates

### Logging

```python
# Structured logging
logger.info("search_request", {
    "query": query,
    "results": len(results),
    "latency_ms": latency,
    "cache_hit": cache_hit
})
```

### Health Checks

```json
GET /health
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 3600,
  "index_size": 150000,
  "cache_hit_rate": 0.85
}
```

## Future Architecture Considerations

### Planned Enhancements

1. **Distributed indexing**: Apache Spark integration
2. **Real-time updates**: File watcher integration
3. **Advanced ML models**: CodeBERT, GraphCodeBERT
4. **Cloud deployment**: AWS Lambda, Google Cloud Run
5. **Multi-tenant support**: Workspace isolation

### API Evolution

```yaml
# Proposed v2 API structure
/api/v2/
  /search
    /semantic
    /keyword
    /hybrid
  /analyze
    /quality
    /security
    /performance
  /refactor
    /suggest
    /preview
    /apply
```

## Development Guidelines

### Code Organization

```
src/
├── core/           # Core functionality
├── tools/          # Analysis tools
├── orchestrators/  # High-level coordinators
├── utils/          # Shared utilities
├── models/         # Data models
└── tests/          # Test suites
```

### Design Principles

1. **Modularity**: Loosely coupled components
2. **Extensibility**: Plugin architecture
3. **Testability**: Dependency injection
4. **Performance**: Async-first design
5. **Reliability**: Graceful degradation

### Testing Strategy

- Unit tests: 80% coverage minimum
- Integration tests: API endpoints
- Performance tests: Load testing
- End-to-end tests: User workflows

## Deployment Architecture

### Docker Deployment

```dockerfile
# Multi-stage build
FROM python:3.11-slim AS builder
# Build stage

FROM python:3.11-slim
# Runtime stage
```

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scs-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scs-mcp
  template:
    metadata:
      labels:
        app: scs-mcp  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: scs-mcp
        image: scs-mcp:latest
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
```

## Contributing

See [CONTRIBUTING.md](../CONTRIBUTING.md) for development setup and guidelines.
