Skip to main content
Glama

Joern MCP Server

by Lekssays
ARCHITECTURE.md•22.8 kB
# joern-mcp Architecture ## Overview **joern-mcp** is a Model Context Protocol (MCP) server that provides AI assistants with static code analysis capabilities using Joern's Code Property Graph (CPG) technology. It implements a microservices architecture with Docker isolation, Redis caching, and asynchronous query execution. ## High-Level Architecture ``` ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Client (Claude, etc) │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ │ ā–¼ ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ FastMCP Server (HTTP Transport) │ │ - Port 4242 │ │ - Async request handling │ │ - Streamable HTTP responses │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ │ ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ │ ā–¼ ā–¼ ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Core Services │ │ Tool Registries │ ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤ ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¤ │ • SessionManager │ │ • Core Tools │ │ • DockerOrchestrator │ │ • Code Browsing Tools │ │ • CPGGenerator │ │ • Taint Analysis Tools │ │ • QueryExecutor │ │ • Additional Tools │ │ • GitManager │ │ │ │ • RedisClient │ │ │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ │ ā”œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ │ ā–¼ ā–¼ ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Docker Containers │ │ Redis Cache │ │ (Joern per session) │ │ - Sessions │ │ │ │ - Query Results │ │ • One container per session │ │ - CPG Mappings │ │ • Isolated environments │ │ │ │ • CPG generation & queries │ │ │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ │ ā–¼ ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Playground Directory │ │ - codebases/ │ │ - cpgs/ │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ ``` ## Core Components ### 1. FastMCP Server (`main.py`) **Purpose**: Entry point and server initialization using the FastMCP 2.12.4 framework. **Key Features**: - HTTP transport for network accessibility - Lifespan context manager for startup/shutdown - Concurrent client support - Streamable HTTP responses **Responsibilities**: - Service initialization and lifecycle management - Tool registration - Request routing - Error handling and logging ### 2. Service Layer (`src/services/`) #### SessionManager **File**: `session_manager.py` Manages analysis sessions with: - Session creation and lifecycle (CREATING → READY → CLOSED) - TTL-based automatic cleanup - Redis backing for persistence - Session metadata (language, source type, source path) - Container tracking **Key Methods**: ```python create_session(source_type, source_path, language) get_session(session_id) update_session(session_id, status) touch_session(session_id) # Reset TTL close_session(session_id) ``` #### DockerOrchestrator **File**: `docker_orchestrator.py` Orchestrates Docker container lifecycle: - Container creation per session - Volume mounting (workspace + playground) - Container cleanup - Resource management **Key Methods**: ```python start_container(session_id, workspace_path, playground_path) → container_id stop_container(container_id) get_container(container_id) cleanup() ``` **Mount Structure**: ``` Container ā”œā”€ā”€ /workspace → {workspace_root}/repos/{session_id} ā”œā”€ā”€ /playground → {playground_path} └── /cpg.bin → Generated CPG binary ``` #### CPGGenerator **File**: `cpg_generator.py` Generates Code Property Graphs from source code: - Language detection - Exclusion pattern application - Java memory optimization - CPG caching by source+language hash - Progress tracking **Key Methods**: ```python generate_cpg(session_id, source_path, language) → cpg_path register_session_container(session_id, container_id) ``` **Caching Strategy**: - Cache key: `SHA256({source_path}|{language}|{exclusion_patterns})` - CPGs stored in `playground/cpgs/{cache_key}/cpg.bin` - Reuse across sessions with identical sources #### QueryExecutor **File**: `query_executor.py` Executes CPGQL queries in Docker containers: - Two execution paths: - **Persistent shell mode**: Reuses loaded CPG (fast) - **Direct mode**: Spawns new Joern process (slower, fallback) - Query timeout handling - Result parsing (JSON, CSV) - Error recovery **Query Execution Flow**: ``` 1. Get container for session 2. Create unique query ID 3. Write query to temp file in container 4. Execute via Joern shell/script 5. Read JSON result from /tmp/ 6. Parse and return results ``` #### GitManager **File**: `git_manager.py` Handles repository operations: - Repository cloning - Branch selection - Token-based authentication - Rate limiting **Key Methods**: ```python clone_repository(repo_url, target_path, branch, token) ``` #### RedisClient **File**: `src/utils/redis_client.py` Async Redis wrapper for: - Session persistence - Query result caching - Container ID mapping - Connection pooling ### 3. Tool Layer (`src/tools/`) #### Core Tools (`core_tools.py`) - `create_cpg_session`: Initialize analysis session - `run_cpgql_query`: Synchronous query execution - `run_cpgql_query_async`: Asynchronous query execution - `get_query_status`: Poll async query status - `get_query_result`: Retrieve async results - `get_session_status`: Session metadata - `list_sessions`: Active sessions with filtering - `close_session`: Clean up session resources #### Code Browsing Tools (`code_browsing_tools.py`) - `get_codebase_summary`: File/method/call statistics - `list_files`: Filter files by name/path patterns - `list_methods`: Discover methods with filtering - `get_method_source`: Retrieve method source code - `list_calls`: Find call relationships - `get_call_graph`: Build call graphs (depth-limited BFS) - `list_parameters`: Method parameter information - `find_literals`: Search for hardcoded values - `get_code_snippet`: Extract code by line range - `find_bounds_checks`: Detect buffer safety checks #### Taint Analysis Tools (`taint_analysis_tools.py`) - `find_taint_sources`: Locate external input points - `find_taint_sinks`: Find dangerous operations - `find_taint_flows`: Trace data from source to sink - `find_argument_flows`: Match expressions across calls - `check_method_reachability`: BFS reachability in call graph - `list_taint_paths`: Enumerate detailed flow paths - `get_program_slice`: Backward/forward data slicing - `get_data_dependencies`: Variable dependency tracking ### 4. Configuration (`src/config.py`, `config.yaml`) **ServerConfig**: ```yaml host: 0.0.0.0 port: 4242 log_level: INFO ``` **RedisConfig**: ```yaml host: localhost port: 6379 password: null db: 0 ``` **JoernConfig**: ```yaml binary_path: joern memory_limit: 16g java_opts: "-Xmx16G -Xms8G -XX:+UseG1GC" ``` **CPGConfig**: - `generation_timeout`: 600s - `max_repo_size_mb`: 500MB - `exclusion_patterns`: 50+ patterns for test/doc/build dirs - `supported_languages`: 13 languages - `taint_sources/sinks`: Per-language patterns **SessionConfig**: ```yaml ttl: 3600 # Session timeout idle_timeout: 1800 # Inactivity timeout max_concurrent: 100 # Max concurrent sessions ``` **StorageConfig**: ```yaml workspace_root: /tmp/joern-mcp cleanup_on_shutdown: true ``` ### 5. Data Models (`src/models.py`) **Session**: ```python @dataclass class Session: id: str status: str # CREATING, READY, CLOSED source_type: str # local, github source_path: str language: str created_at: float last_accessed: float container_id: Optional[str] cpg_path: Optional[str] ``` **QueryResult**: ```python @dataclass class QueryResult: success: bool data: List[Any] # Query results error: Optional[str] # Error message row_count: int execution_time: float ``` **SourceType** (Enum): - LOCAL: Local file system - GITHUB: GitHub repository **SessionStatus** (Enum): - CREATING: CPG generation in progress - READY: Ready for queries - CLOSED: Cleaned up ## Request Flow Diagram ### Create Session Flow ``` client.create_cpg_session() │ ā–¼ validate inputs (source_type, language) │ ā”œā”€ GITHUB: git_manager.clone_repository() └─ LOCAL: use source_path directly │ ā–¼ session_manager.create_session() → Session(CREATING) │ ā–¼ docker.start_container(workspace_path, playground_path) │ ā–¼ cpg_generator.generate_cpg(source_path, language) │ (runs in container) ā–¼ Session(READY) + CPG path stored in Redis ``` ### Query Execution Flow ``` client.run_cpgql_query(session_id, query) │ ā–¼ session_manager.get_session(session_id) │ ā–¼ query_executor.execute_query() │ ā”œā”€ Get container for session │ ā”œā”€ Generate unique query_id │ ā”œā”€ Write query script to /tmp/query_{id}.sc │ ā”œā”€ Execute: joern --script /tmp/query_{id}.sc │ └─ Output: /tmp/query_result_{id}.json │ ā”œā”€ Read result JSON │ └─ Parse and return QueryResult ``` ## Directory Structure ``` joern-mcp/ ā”œā”€ā”€ src/ │ ā”œā”€ā”€ __init__.py │ ā”œā”€ā”€ config.py # Configuration loading │ ā”œā”€ā”€ models.py # Data models │ ā”œā”€ā”€ utils.py # Helper utilities │ │ │ ā”œā”€ā”€ services/ │ │ ā”œā”€ā”€ __init__.py │ │ ā”œā”€ā”€ session_manager.py # Session lifecycle │ │ ā”œā”€ā”€ docker_orchestrator.py # Docker management │ │ ā”œā”€ā”€ cpg_generator.py # CPG generation │ │ ā”œā”€ā”€ query_executor.py # Query execution │ │ └── git_manager.py # Git operations │ │ │ ā”œā”€ā”€ tools/ │ │ ā”œā”€ā”€ __init__.py │ │ ā”œā”€ā”€ core_tools.py # Core MCP tools │ │ ā”œā”€ā”€ code_browsing_tools.py │ │ ā”œā”€ā”€ taint_analysis_tools.py │ │ └── mcp_tools.py # Tool registry │ │ │ └── utils/ │ ā”œā”€ā”€ __init__.py │ ā”œā”€ā”€ redis_client.py # Redis wrapper │ ā”œā”€ā”€ validators.py # Input validation │ └── exceptions.py # Custom exceptions │ ā”œā”€ā”€ playground/ │ ā”œā”€ā”€ codebases/ # Source code to analyze │ │ └── core/ # Sample C codebase │ │ └── core.c │ └── cpgs/ # Cached CPG binaries │ ā”œā”€ā”€ tests/ │ ā”œā”€ā”€ test_utils.py │ ā”œā”€ā”€ test_mcp_tools.py │ ā”œā”€ā”€ test_data_dependencies.py │ └── ... │ ā”œā”€ā”€ examples/ │ ā”œā”€ā”€ sample_client.py # Basic usage demo │ ā”œā”€ā”€ browsing_client.py # Code browsing demo │ ā”œā”€ā”€ taint_client.py # Taint analysis demo │ └── ... │ ā”œā”€ā”€ main.py # Server entry point ā”œā”€ā”€ config.yaml # Runtime configuration ā”œā”€ā”€ config.example.yaml # Configuration template ā”œā”€ā”€ requirements.txt # Python dependencies ā”œā”€ā”€ pyproject.toml # Project metadata ā”œā”€ā”€ setup.sh # Setup script ā”œā”€ā”€ build.sh # Build script ā”œā”€ā”€ cleanup.py # Resource cleanup utility ā”œā”€ā”€ Dockerfile.joern # Joern container image ā”œā”€ā”€ README.md # User documentation └── ARCHITECTURE.md # This file ``` ## Data Flow Patterns ### Session Lifecycle ``` ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Session Lifecycle │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ 1. CREATE PHASE (0-2 min typically) ā”œā”€ create_cpg_session() called ā”œā”€ Session marked as CREATING ā”œā”€ Docker container started ā”œā”€ CPG generated (1-3 min for large repos) └─ Session marked as READY 2. ACTIVE PHASE (0-3600s) ā”œā”€ Queries executed via run_cpgql_query() ā”œā”€ Each query increments session.last_accessed ā”œā”€ Results cached in Redis └─ TTL reset on each access 3. IDLE PHASE (1800s max) ā”œā”€ No queries received for 30 minutes ā”œā”€ Session eligible for cleanup └─ Resources released 4. CLEANUP PHASE ā”œā”€ Container stopped ā”œā”€ Workspace deleted ā”œā”€ Redis session entry removed └─ Next access creates new session (cache CPG) ``` ### Query Caching Strategy ``` ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ Query Result Caching │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ Query received │ ā”œā”€ Generate cache key: SHA256(session_id + query) │ ā”œā”€ Check Redis: cache_key exists? │ ā”œā”€ YES: Return cached result (fast path ~10ms) │ └─ NO: Continue to execution │ ā”œā”€ Execute query in container (1-10s) │ ā”œā”€ Store result in Redis │ └─ TTL: 5 minutes (QueryConfig.cache_ttl) │ └─ Return result to client Benefits: - Identical queries return instantly - Reduces container CPU load - Enables fast iteration for analysis ``` ### CPG Caching Strategy ``` ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā” │ CPG Generation Caching │ ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜ generate_cpg(source_path, language) │ ā”œā”€ Calculate cache key: │ cache_key = SHA256( │ source_path + │ language + │ exclusion_patterns_hash │ ) │ ā”œā”€ Check: playground/cpgs/{cache_key}/cpg.bin exists? │ ā”œā”€ YES: Load from disk (2-5s, cache hit) │ └─ NO: Generate new CPG (2-10 min, cache miss) │ └─ Return cpg_path Implications: - First session with source: slow (2-10 min) - Subsequent sessions: fast (reuse CPG) - GitHub repos cached by URL+branch - Local codebases cached by absolute path ``` ## Concurrency Model ### Threading & Async ``` FastMCP Server (Async) ā”œā”€ Handler 1: create_cpg_session() │ └─ cpu_generator.generate_cpg() → runs in Docker │ ā”œā”€ Handler 2: run_cpgql_query() │ └─ query_executor.execute_query() → Docker │ └─ Handler N: ... Each handler: - Async/await for I/O - Docker subprocess executed in executor thread pool - Redis operations async via aioredis - Max concurrent: 100 sessions (config) ``` ### Session Isolation ``` Each session gets: - Unique session_id (UUID) - Dedicated Docker container - Isolated /workspace directory - Isolated /tmp within container - Independent CPG instance No cross-session interference: - Queries only access own CPG - Containers cannot see each other - Workspace isolated by bind mount - Redis keys namespaced by session_id ``` ## Error Handling Strategy ### Error Classification **Validation Errors** (HTTP 400): - Invalid source_type - Unsupported language - Malformed queries **Not Found Errors** (HTTP 404): - Session doesn't exist - Method not found **Resource Errors** (HTTP 503): - Docker connection failure - Redis connection failure - Disk space exhausted **Execution Errors** (HTTP 500): - CPG generation timeout - Query execution timeout - Joern subprocess crash ### Error Recovery ```python # Example: Query Execution try: result = await query_executor.execute_query(...) except QueryExecutionError as e: logger.error(f"Query failed: {e}") return { "success": False, "error": { "code": "QUERY_ERROR", "message": str(e) } } except Exception as e: logger.exception(f"Unexpected error: {e}") return { "success": False, "error": { "code": "INTERNAL_ERROR", "message": "An unexpected error occurred" } } ``` ## Performance Characteristics ### Typical Operation Times | Operation | Time | Notes | |-----------|------|-------| | Session creation (new) | 2-10 min | Depends on repo size, CPG generation is slow | | Session creation (cached) | 10-30s | Docker container startup + CPG load | | Simple query | 500ms-2s | Depends on CPG complexity | | Complex query | 5-30s | BFS traversals, large result sets | | Query cache hit | ~10ms | Redis lookup | | Codebase summary | 1-5s | Statistics collection | | Call graph (depth 5) | 2-10s | Graph traversal | | Taint analysis | 5-60s | Path exploration algorithm | ### Memory Usage | Component | Typical | Peak | |-----------|---------|------| | Server process | 100-200MB | 300MB+ with many sessions | | Docker container | 2-4GB | 8GB+ for large CPGs | | Redis | 50-500MB | Per cached query results | | Workspace (/tmp) | 500MB-5GB | Per CPG (persistent) | ### Scalability Limits - **Max concurrent sessions**: 100 (configurable) - **Max repo size**: 500MB (configurable) - **Query timeout**: 300s (configurable) - **CPG generation timeout**: 600s (configurable) - **Session TTL**: 3600s (1 hour, configurable) ## Security Considerations ### Input Validation ```python # All user inputs validated before processing validate_source_type(source_type) # Must be "local" or "github" validate_language(language) # Must be in supported list validate_session_id(session_id) # Must be valid UUID validate_github_url(url) # Must be valid GitHub URL ``` ### Container Isolation - Each session runs in separate Docker container - No file system access outside bound volumes - Network access controlled by Docker daemon - Resource limits applied (CPU, memory) ### Query Safety - CPGQL queries execute in isolated Joern instance - No direct shell access (parameterized execution) - Timeout prevents runaway queries - Results sanitized before return ### Credential Management - GitHub tokens not persisted - Passed only at clone time - Credentials never logged - Environment variables for secrets ## Extension Points ### Adding New Tools 1. **Create tool function**: ```python # tools/new_tools.py @mcp.tool() async def my_new_tool(session_id: str, param1: str) -> Dict[str, Any]: """Tool description""" pass ``` 2. **Register tool**: ```python # tools/__init__.py from .new_tools import register_new_tools def register_tools(mcp, services): # ... existing registrations register_new_tools(mcp, services) ``` ### Adding New Services 1. **Create service class**: ```python # services/new_service.py class NewService: async def initialize(self): pass async def cleanup(self): pass ``` 2. **Integrate in lifespan**: ```python # main.py services['new_service'] = NewService() await services['new_service'].initialize() ``` ## Testing Strategy ### Unit Tests (`tests/`) - Model validation - Utility function behavior - Error handling ### Integration Tests - Service interactions - Docker communication - Redis operations ### End-to-End Tests - Full request workflows - Examples in `examples/` ## Deployment ### Production Checklist - [ ] Configure `config.yaml` with production settings - [ ] Set environment variables for secrets - [ ] Configure Redis for persistence - [ ] Set up monitoring/logging aggregation - [ ] Configure resource limits - [ ] Set up backup for playground/cpgs - [ ] Test with target codebase size - [ ] Monitor memory and disk usage --- **Last Updated**: October 2025 **Version**: 0.2.1

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Lekssays/joern-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server