Local DeepWiki MCP Server

SECURITY_FINDINGS_DETAILS.md•13.1 KiB

# Security Audit - Detailed Technical Findings

## File-by-File Analysis

### CRITICAL FINDINGS

---

## Finding 1: API Key Exposure in Memory

**Severity:** HIGH
**CWE:** CWE-798 (Use of Hard-Coded Credentials)
**Files Affected:**
- `/src/local_deepwiki/providers/llm/anthropic.py:45`
- `/src/local_deepwiki/providers/llm/openai.py:49`
- `/src/local_deepwiki/providers/embeddings/openai.py:35`

### Vulnerable Code

**File: `/src/local_deepwiki/providers/llm/anthropic.py`**

```python
def __init__(self, model: str = "claude-sonnet-4-20250514", api_key: str | None = None):
    """Initialize the Anthropic provider.

    Args:
        model: Anthropic model name.
        api_key: Optional API key. Uses ANTHROPIC_API_KEY env var if not provided.
    """
    self._model = model
    self._api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")  # ⚠️ EXPOSED (line 45)
    self._client = AsyncAnthropic(api_key=self._api_key)
```

**File: `/src/local_deepwiki/providers/llm/openai.py`**

```python
def __init__(self, model: str = "gpt-4o", api_key: str | None = None):
    """Initialize the OpenAI provider.

    Args:
        model: OpenAI model name.
        api_key: Optional API key. Uses OPENAI_API_KEY env var if not provided.
    """
    self._model = model
    self._api_key = api_key or os.environ.get("OPENAI_API_KEY")  # ⚠️ EXPOSED (line 49)
    self._client = AsyncOpenAI(api_key=self._api_key)
```

### Issues

1. **Plaintext Storage:** API keys are stored as instance variables in plaintext
2. **No Encryption:** Keys are held in memory with no protection
3. **Weak Access Control:** Any code with access to the provider instance can extract the key
4. **Memory Dump Risk:** If process memory is dumped, keys are exposed
5. **Long Lifetime:** Keys persist for the entire lifetime of the provider instance

### Attack Scenario

```python
# Attacker with process access could do:
provider = AnthropicProvider()

# Extract key via:
# 1. Process memory dump
# 2. Debugging tools (gdb, lldb)
# 3. Introspection in Python REPL
# 4. Accidental logging
```

### Proof of Concept

```python
import local_deepwiki.providers.llm.anthropic as anthro

provider = anthro.AnthropicProvider()

# Attacker can access via: provider._api_key
exposed_key = provider._api_key  # ⚠️ KEY EXPOSED
```

### Remediation

1. Use environment variables only, never store the key on the instance
2. Implement a credential manager pattern
3. Clear credentials after use (if possible)
4. Add secret masking in logs

---

## Finding 2: Insufficient Error Message Sanitization

**Severity:** HIGH
**CWE:** CWE-209 (Information Exposure Through an Error Message)
**Files Affected:**
- `/src/local_deepwiki/handlers.py:131-135`
- `/src/local_deepwiki/handlers.py:254-256`
- Multiple provider files

### Vulnerable Code

**File: `/src/local_deepwiki/handlers.py`**

```python
except Exception as e:  # noqa: BLE001
    # Broad catch is intentional: top-level error handler for MCP tools
    # that converts any unhandled exception to a user-friendly error message
    logger.exception(f"Unexpected error in {func.__name__}: {e}")
    error = DeepWikiError(
        message=f"An unexpected error occurred: {e}",  # ⚠️ EXPOSED (line 133)
        hint="Check the logs for more details. If this persists, please report the issue.",
    )
    return [TextContent(type="text", text=format_error_response(error))]
```

**File: `/src/local_deepwiki/providers/llm/anthropic.py`**

```python
except Exception as e:
    self._handle_api_error(e)
    raise  # ⚠️ May expose full API error (line 256)
```

### Examples of Information Leakage

1. **File Path Exposure**
   ```
   Error: /home/user/.config/local-deepwiki/models/embedding.bin: No such file
   → Reveals home directory structure
   ```

2. **API Endpoint Exposure**
   ```
   Error: Connection refused to http://localhost:11434/api/chat
   → Reveals internal Ollama configuration
   ```

3. **Model Information**
   ```
   Error: Model 'some-invalid-model' not found. Available models: [list]
   → Reveals what models are installed
   ```

4. **Library Information**
   ```
   Error: anthropic.APIConnectionError: Unable to connect
   → Reveals which client library is in use; the error format can hint at its version
   ```

### Attack Scenario

1. Attacker uses the repository indexing tool
2. Attacker deliberately causes an error (e.g., invalid LLM configuration)
3. The error message reveals internal paths or configuration
4. Attacker maps the infrastructure

### Remediation

```python
# VULNERABLE (current)
try:
    ...
except Exception as e:
    message = f"An unexpected error occurred: {e}"

# SECURE (recommended)
try:
    ...
except Exception as e:
    # Full details go to the server-side log only
    logger.exception(f"Detailed error: {e}")
    # Generic message to user, no details
    message = "An unexpected error occurred. Please contact support."
```

---

## Finding 3: Missing Access Control Layer

**Severity:** MEDIUM
**CWE:** CWE-639 (Authorization Bypass Through User-Controlled Key)
**Files Affected:** `/src/local_deepwiki/handlers.py` (all tool handlers)

### Issue

There are no authentication or authorization checks on MCP tool calls.

### Vulnerable Functions

```python
# handlers.py:365 - handle_ask_question
# No check if user should access this repo
repo_path = Path(validated.repo_path).resolve()

# handlers.py:930 - handle_search_code
# No access control
repo_path = Path(validated.repo_path).resolve()

# handlers.py:889 - handle_read_wiki_page
# No permission validation
page_path = (wiki_path / page).resolve()

# handlers.py:1006 - handle_export_wiki_html
# No authorization check
wiki_path = Path(validated.wiki_path).resolve()
```

### Attack Scenarios

**Scenario 1: Unauthorized Repository Access**

```
User A calls: ask_question("/path/to/user-b-private-repo")
→ No check if User A has access
→ User A can read any repository
```

**Scenario 2: Secret Exposure**

```
Repository indexed contains:
- .env files with API keys
- Private credentials
- Configuration with passwords

User can: ask_question or search_code
Result: Secrets exposed in code chunks
```

**Scenario 3: Denial of Service**

```
Attacker calls: index_repository("/massive/repo", full_rebuild=true)
→ No rate limiting
→ Can exhaust system resources
```

### Remediation

Implement an authorization layer:

```python
async def handle_index_repository(args, server=None):
    # Add permission check
    user = get_current_user(server)  # Extract from MCP context
    repo_path = Path(args["repo_path"]).resolve()

    # Check if user allowed
    if not is_user_allowed_access(user, repo_path):
        raise AuthorizationError(f"Access denied to {repo_path}")

    # Continue with indexing...
```

---

## Finding 4: Dependency Version Pinning Issues

**Severity:** MEDIUM
**CWE:** CWE-1104 (Use of Unmaintained Third Party Components)
**File:** `/pyproject.toml`

### Issues

**1. No Upper Version Bounds**

```toml
# Current: >=X.X
pyyaml >= 6.0       # ⚠️ Could allow 7.0 with breaking changes
weasyprint >= 68.0  # ⚠️ Complex parser, new versions may have issues
```

**2. Potential Vulnerabilities**

**PyYAML (Line 30)**

```toml
pyyaml >= 6.0
```

Risk: If the code calls `yaml.load()` with an unsafe loader instead of `yaml.safe_load()`, YAML deserialization RCE is possible.

**WeasyPrint (Line 36)**

```toml
weasyprint >= 68.0
```

Risk: Complex HTML/CSS rendering, with potential DoS via malicious CSS in exported PDFs.

**3. External Service Dependencies**

```toml
ollama >= 0.4  # External service, RCE if Ollama itself is vulnerable
```

### Proof of Concept - YAML RCE (if unsafe loading used)

```yaml
# Malicious config file
exploit: !!python/object/apply:os.system ["rm -rf /"]
```

### Recommendations

```toml
# Add upper bounds
pyyaml >= 6.0, < 7.0
weasyprint >= 68.0, < 69.0
ollama >= 0.4, < 1.0

# Or pin to specific versions
pyyaml == 6.0.1
weasyprint == 68.1
```

---

## YAML Safety Check Required

**File:** `/src/local_deepwiki/config.py`
**Status:** NEEDS VERIFICATION

### Required Check

Search for:

```bash
grep -n "yaml.load(" src/local_deepwiki/config.py
```

If found with an unsafe loader (`yaml.Loader` or `yaml.UnsafeLoader`) rather than `safe_load`, this is a HIGH severity vulnerability:

```python
# VULNERABLE (PyYAML >= 6.0 requires an explicit Loader; yaml.Loader is unsafe)
data = yaml.load(config_file, Loader=yaml.Loader)

# SECURE
data = yaml.safe_load(config_file)
```

---

## Finding 5: Progress Notification Data Exposure

**Severity:** LOW
**CWE:** CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor)
**File:** `/src/local_deepwiki/handlers.py:1380-1395`

### Vulnerable Code

```python
progress_data = {
    "step": latest.current,
    "total_steps": latest.total or 0,
    "step_type": latest.phase.value,
    "message": latest.message,
    "eta_seconds": latest.eta_seconds,
    **latest.metadata,  # ⚠️ May contain file paths (line 1386)
}
# ... later, at line 1394:
message=json.dumps(progress_data),
```

### Issue

The `metadata` field is spread directly into the progress notification:

```python
metadata={
    "files_processed": status.total_files,
    "total_files": status.total_files,
    "chunks_created": status.total_chunks,
    "pages_generated": len(wiki_structure.pages),
}
```

While this is currently safe, it could accidentally leak paths if the metadata is ever expanded.
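
To make the concern concrete, the sketch below mimics the spread with plain dictionaries. It is an illustration only: `build_progress`, the `file_path` key, and the sample values are hypothetical and not taken from the codebase. It shows that any key added to the metadata ends up in the serialized notification, and that a metadata key can also silently override a base field.

```python
import json


def build_progress(base: dict, metadata: dict) -> str:
    """Mimic the handler: spread metadata into the payload, then serialize."""
    return json.dumps({**base, **metadata})


base = {"step": 3, "total_steps": 10, "message": "Indexing"}

# Metadata as it looks today: counters only
safe = {"files_processed": 12, "total_files": 40}

# Hypothetical future metadata that adds a path (assumption for illustration)
leaky = {**safe, "file_path": "/home/user/private/repo/secrets.py"}

print(build_progress(base, safe))
print(build_progress(base, leaky))  # the path is now part of the notification
```

Because the spread comes last in the dictionary literal, a metadata key named `message` or `step` would also replace the handler's own value.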
### Remediation

```python
# Add allowlist for progress metadata
allowed_fields = {"files_processed", "total_files", "chunks_created"}
safe_metadata = {k: v for k, v in latest.metadata.items() if k in allowed_fields}
```

---

## Finding 6: No Input Size Limits on Vector Operations

**Severity:** LOW/MEDIUM
**CWE:** CWE-770 (Allocation of Resources Without Limits)
**File:** `/src/local_deepwiki/handlers.py:392`

### Vulnerable Code

```python
search_results = await vector_store.search(question, limit=max_context)  # line 392
```

### Issue

While `max_context` is bounded (validation.py: max 50 chunks), there are no limits on:
- Prompt/question length
- Number of chunks returned for deep research
- Total tokens processed

### Attack

```
send: question="a " * 1000000  # ~2 MB of text
Result: Memory exhaustion or API rate limit
```

### Current Protections

- `MAX_WIKI_PAGE_SIZE` = 10MB (line 23, validation.py)
- `MAX_CONTEXT_CHUNKS` = 50 (validation.py)
- `MAX_SEARCH_LIMIT` = 100 (validation.py)

### Recommendation

Add input size validation:

```python
MAX_QUESTION_LENGTH = 2000  # characters

if len(question) > MAX_QUESTION_LENGTH:
    raise ValidationError("Question too long")
```

---

## Finding 7: No Audit Logging

**Severity:** MEDIUM
**CWE:** CWE-778 (Insufficient Logging)
**Impact:** Cannot track who accessed what code

### Missing Audit Trail For:

- Who queried which repository
- What code was accessed
- When access occurred
- Export/download events
- Configuration changes

### Recommendation

Implement audit logging:

```python
async def audit_log(user: str, action: str, resource: str, result: str):
    """Log security-relevant events"""
    logger.info(f"AUDIT: user={user} action={action} resource={resource} result={result}")
```

---

## INFRASTRUCTURE SECURITY NOTES

### 1. Configuration File Permissions

**Location:** `~/.config/local-deepwiki/config.yaml`
**Risk:** The config may contain API keys while having world-readable permissions
**Mitigation:**

```bash
# Should enforce
chmod 600 ~/.config/local-deepwiki/config.yaml
```

### 2. Ollama Local Exposure

**Location:** Ollama provider connects to http://localhost:11434
**Risk:** If Ollama runs on a non-localhost host, traffic (including any credentials) sent over plain HTTP could be intercepted
**Mitigation:**
- Only connect to localhost
- Validate the Ollama version before use
- Consider using Unix sockets instead

### 3. Flask Web UI Security

**File:** `/src/local_deepwiki/web/app.py`
**Note:** Not thoroughly analyzed, but the web UI may have CSRF/XSS issues if user-supplied data is rendered without escaping.

---

## SUMMARY OF FINDINGS

| Finding | Severity | Type | CWE | Status |
|---------|----------|------|-----|--------|
| API Key Exposure | HIGH | Credential | 798 | ⚠️ OPEN |
| Error Message Leakage | HIGH | InfoDisclosure | 209 | ⚠️ OPEN |
| No Access Control | MEDIUM | Authorization | 639 | ⚠️ OPEN |
| Version Pinning | MEDIUM | Dependencies | 1104 | ⚠️ OPEN |
| Progress Data Leak | LOW | InfoDisclosure | 200 | ⚠️ REVIEW |
| No Input Limits | LOW/MEDIUM | DoS | 770 | ⚠️ REVIEW |
| No Audit Logging | MEDIUM | Logging | 778 | ⚠️ OPEN |

---

## SECURE CODE PATTERNS FOUND

- ✓ **Path Traversal Protection** - Proper use of `Path.is_relative_to()`
- ✓ **Input Validation** - Comprehensive Pydantic validation
- ✓ **Error Handling** - Good exception handling structure
- ✓ **Async Best Practices** - Proper use of asyncio
- ✓ **Type Hints** - Full type annotations throughout

---

## TESTING CHECKLIST

- [ ] Run `bandit -r src/` for security issues
- [ ] Run `pip-audit` for dependency vulnerabilities
- [ ] Verify YAML uses only `safe_load()`
- [ ] Test path traversal with `../../../etc/passwd`
- [ ] Verify API keys are not in logs
- [ ] Check config file has mode 600
- [ ] Test that error messages don't leak paths
- [ ] Verify no hardcoded credentials in code
- [ ] Test error handling with malformed input
- [ ] Verify progress notifications are safe

---

## NEXT STEPS

1. **Immediate (this week):**
   - Implement API key encryption/secure storage
   - Sanitize error messages

2. **Short-term (next 2 weeks):**
   - Add access control layer
   - Fix dependency pinning
   - Verify YAML safety

3. **Medium-term (next month):**
   - Add audit logging
   - Input size validation
   - Path allowlist/denylist

4. **Long-term (ongoing):**
   - Regular dependency updates
   - Security testing in CI/CD
   - Penetration testing
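
As a starting point for the "Sanitize error messages" step, one possible shape is a helper that writes the full exception to the server-side log and hands the caller only a generic message with a short reference ID. This is a sketch under assumptions: the helper name `safe_error_message`, the logger name, and the ID format are all hypothetical and not part of the audited codebase.

```python
import logging
import uuid

logger = logging.getLogger("local_deepwiki.sketch")  # hypothetical logger name


def safe_error_message(exc: Exception) -> str:
    """Log full details server-side; return a message with no exception text."""
    error_id = uuid.uuid4().hex[:8]  # short ID to correlate user report and log entry
    logger.error("Unhandled error [%s]: %r", error_id, exc, exc_info=exc)
    return (
        f"An unexpected error occurred (reference: {error_id}). "
        "Check the server logs for details."
    )


try:
    raise FileNotFoundError(
        "/home/user/.config/local-deepwiki/models/embedding.bin: No such file"
    )
except OSError as e:
    # The path appears in the log record, not in the user-facing text
    print(safe_error_message(e))
```

The reference ID keeps the generic message useful for support: a user can quote it, and an operator can find the matching log entry without the path ever leaving the server.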
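
The "Check config file has mode 600" item in the testing checklist can be automated along these lines. This is a minimal sketch assuming a POSIX system; the function name `is_private` is hypothetical, and a temporary file stands in for `~/.config/local-deepwiki/config.yaml`.

```python
import os
import stat
import tempfile


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstrate on a throwaway file rather than the real config
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o644)      # world-readable: should be rejected
print(is_private(demo_path))

os.chmod(demo_path, 0o600)      # owner-only: should be accepted
print(is_private(demo_path))

os.remove(demo_path)
```

A check like this could run at startup and either refuse to load a config that is readable by other users or log a warning, depending on how strict the project wants to be.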
