Skip to main content
Glama

Joern MCP Server

by Lekssays
README.md10.7 kB
# 🕷️ joern-mcp A Model Context Protocol (MCP) server that provides AI assistants with static code analysis capabilities using [Joern](https://joern.io)'s Code Property Graph (CPG) technology. ## Features - **Multi-Language Support**: Java, C/C++, JavaScript, Python, Go, Kotlin, C#, Ghidra, Jimple, PHP, Ruby, Swift - **Docker Isolation**: Each analysis session runs in a secure container - **GitHub Integration**: Analyze repositories directly from GitHub URLs - **Session-Based**: Persistent CPG sessions with automatic cleanup - **Redis-Backed**: Fast caching and session management - **Async Queries**: Non-blocking CPG generation and query execution ## Quick Start ### Prerequisites - Python 3.8+ - Docker - Redis - Git ### Installation 1. **Clone and install dependencies**: ```bash git clone https://github.com/Lekssays/joern-mcp.git cd joern-mcp pip install -r requirements.txt ``` 2. **Setup (builds Joern image and starts Redis)**: ```bash ./setup.sh ``` 3. **Configure** (optional): ```bash cp config.example.yaml config.yaml # Edit config.yaml as needed ``` 4. **Run the server**: ```bash python main.py # Server will be available at http://localhost:4242 ``` ## Integration with GitHub Copilot The server uses **Streamable HTTP** transport for network accessibility and supports multiple concurrent clients. Add to your VS Code `settings.json`: ```json { "github.copilot.advanced": { "mcp": { "servers": { "joern-mcp": { "url": "http://localhost:4242/mcp", } } } } } ``` Make sure the server is running before using it with Copilot: ```bash python main.py ``` ## Available Tools ### Core Tools - **`create_cpg_session`**: Initialize analysis session from local path or GitHub URL - **`run_cpgql_query`**: Execute synchronous CPGQL queries with JSON output - **`run_cpgql_query_async`**: Execute asynchronous queries with status tracking - **`get_query_status`**: Check status of asynchronously running queries - **`get_query_result`**: Retrieve results from completed queries - **`cleanup_queries`**: Clean up old completed query results - **`get_session_status`**: Check session state and metadata - **`list_sessions`**: View active sessions with filtering - **`close_session`**: Clean up session resources - **`cleanup_all_sessions`**: Clean up multiple sessions and containers ### Code Browsing Tools - **`get_codebase_summary`**: Get high-level overview of codebase (file count, method count, language) - **`list_files`**: List all source files with optional regex filtering - **`list_methods`**: Discover all methods/functions with filtering by name, file, or external status - **`get_method_source`**: Retrieve actual source code for specific methods - **`list_calls`**: Find function call relationships and dependencies - **`get_call_graph`**: Build call graphs (outgoing callees or incoming callers) with configurable depth - **`list_parameters`**: Get detailed parameter information for methods - **`find_literals`**: Search for hardcoded values (strings, numbers, API keys, etc) - **`get_code_snippet`**: Retrieve code snippets from files with line range ### Security Analysis Tools - **`find_taint_sources`**: Locate likely external input points (taint sources) - **`find_taint_sinks`**: Locate dangerous sinks where tainted data could cause vulnerabilities - **`find_taint_flows`**: Find dataflow paths from sources to sinks using Joern dataflow primitives - **`find_argument_flows`**: Find flows where the exact same expression is passed to both source and sink calls - **`check_method_reachability`**: Check if one method can reach another through the call graph - **`list_taint_paths`**: List detailed taint flow paths from sources to sinks - **`get_program_slice`**: Build a program slice from a specific line or call ### Example Usage ```python # Create session from GitHub { "tool": "create_cpg_session", "arguments": { "source_type": "github", "source_path": "https://github.com/user/repo", "language": "java" } } # Get codebase overview { "tool": "get_codebase_summary", "arguments": { "session_id": "abc-123-def" } } # List all methods in the codebase { "tool": "list_methods", "arguments": { "session_id": "abc-123-def", "include_external": false, "limit": 50 } } # Get source code for a specific method { "tool": "get_method_source", "arguments": { "session_id": "abc-123-def", "method_name": "authenticate" } } # Find what methods call a specific function { "tool": "get_call_graph", "arguments": { "session_id": "abc-123-def", "method_name": "execute_query", "depth": 2, "direction": "incoming" } } # Search for hardcoded secrets { "tool": "find_literals", "arguments": { "session_id": "abc-123-def", "pattern": "(?i).*(password|secret|api_key).*", "limit": 20 } } # Get code snippet from a file { "tool": "get_code_snippet", "arguments": { "session_id": "abc-123-def", "filename": "src/main.c", "start_line": 10, "end_line": 25 } } # Run custom CPGQL query { "tool": "run_cpgql_query", "arguments": { "session_id": "abc-123-def", "query": "cpg.method.name.l" } } # Find potential security vulnerabilities { "tool": "find_taint_sources", "arguments": { "session_id": "abc-123-def", "language": "c" } } # Check for data flows from sources to sinks { "tool": "find_taint_flows", "arguments": { "session_id": "abc-123-def", "source_patterns": ["getenv", "fgets"], "sink_patterns": ["system", "sprintf"] } } # Find argument flows between function calls { "tool": "find_argument_flows", "arguments": { "session_id": "abc-123-def", "source_name": "validate_input", "sink_name": "process_data", "arg_index": 0 } } # Get detailed taint paths { "tool": "list_taint_paths", "arguments": { "session_id": "abc-123-def", "source_pattern": "getenv", "sink_pattern": "system", "max_paths": 5 } } # Build program slice for security analysis { "tool": "get_program_slice", "arguments": { "session_id": "abc-123-def", "filename": "main.c", "line_number": 42, "call_name": "memcpy" } } ``` ### Security Analysis Capabilities The security analysis tools provide comprehensive vulnerability detection including: **Taint Analysis:** - Source identification: `find_taint_sources` locates external input points - Sink identification: `find_taint_sinks` finds dangerous operations - Flow analysis: `find_taint_flows` traces data from sources to sinks - Argument flow analysis: `find_argument_flows` finds exact expression reuse between calls - Path enumeration: `list_taint_paths` provides detailed propagation chains **Program Slicing:** - Backward slicing: `get_program_slice` shows all code affecting a specific operation - Data dependencies: Variable assignments and data flow tracking - Control dependencies: Conditional statements affecting execution **Reachability Analysis:** - Method connectivity: `check_method_reachability` verifies call graph connections - Impact analysis: Understand potential execution paths ## Configuration Key settings in `config.yaml`: ```yaml server: host: 0.0.0.0 port: 4242 log_level: INFO redis: host: localhost port: 6379 sessions: ttl: 3600 # Session timeout (seconds) max_concurrent: 50 # Max concurrent sessions cpg: generation_timeout: 600 # CPG generation timeout (seconds) supported_languages: [java, c, cpp, javascript, python, go, kotlin, csharp, ghidra, jimple, php, ruby, swift] ``` Environment variables override config file settings (e.g., `MCP_HOST`, `REDIS_HOST`, `SESSION_TTL`). ## Example CPGQL Queries **Find all methods:** ```scala cpg.method.name.l ``` **Find hardcoded secrets:** ```scala cpg.literal.code("(?i).*(password|secret|api_key).*").l ``` **Find SQL injection risks:** ```scala cpg.call.name(".*execute.*").where(_.argument.isLiteral.code(".*SELECT.*")).l ``` **Find complex methods:** ```scala cpg.method.filter(_.cyclomaticComplexity > 10).l ``` ## Architecture - **FastMCP Server**: Built on FastMCP 2.12.4 framework with **Streamable HTTP** transport - **HTTP Transport**: Network-accessible API supporting multiple concurrent clients - **Docker Containers**: One isolated Joern container per session - **Redis**: Session state and query result caching - **Async Processing**: Non-blocking CPG generation - **CPG Caching**: Reuse CPGs for identical source/language combinations ## Development ### Project Structure ``` joern-mcp/ ├── src/ │ ├── services/ # Session, Docker, Git, CPG, Query services │ ├── tools/ # MCP tool definitions │ ├── utils/ # Redis, logging, validators │ └── models.py # Data models ├── playground/ # Test codebases and CPGs ├── main.py # Server entry point ├── config.yaml # Configuration └── requirements.txt # Dependencies ``` ### Running Tests ```bash # Install dev dependencies pip install -r requirements.txt # Run tests pytest # Run with coverage pytest --cov=src --cov-report=html ``` ### Code Quality ```bash # Format black src/ tests/ isort src/ tests/ # Lint flake8 src/ tests/ mypy src/ ``` ## Troubleshooting **Setup issues:** ```bash # Re-run setup to rebuild and restart services ./setup.sh ``` **Docker issues:** ```bash # Verify Docker is running docker ps # Check Joern image docker images | grep joern # Check Redis container docker ps | grep joern-redis ``` **Redis connection issues:** ```bash # Test Redis connection docker exec joern-redis redis-cli ping # Check Redis logs docker logs joern-redis # Restart Redis docker restart joern-redis ``` **Server connectivity:** ```bash # Test server is running curl http://localhost:4242/health # Check server logs for errors python main.py ``` **Loading large projects:** ```yaml joern: binary_path: ${JOERN_BINARY_PATH:joern} memory_limit: ${JOERN_MEMORY_LIMIT:16g} java_opts: ${JOERN_JAVA_OPTS:-Xmx16G -Xms8G -XX:+UseG1GC -Dfile.encoding=UTF-8} ``` **Debug logging:** ```bash export MCP_LOG_LEVEL=DEBUG python main.py ``` ## Contributing 1. Fork the repository 2. Create a feature branch: `git checkout -b feature-name` 3. Make changes and add tests 4. Run tests: `pytest && black . && flake8` 5. Submit a pull request ## Acknowledgments - [Joern](https://github.com/joernio/joern) - Static analysis platform - [FastMCP](https://github.com/jlowin/fastmcp) - MCP framework - [Model Context Protocol](https://modelcontextprotocol.io/) - MCP specification --- Built with ❤️ in Doha 🇶🇦

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Lekssays/joern-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server