
Adversary MCP Server

by brettbergin
JSON_RESPONSE_HANDLING_IMPROVEMENTS.md (8.01 kB)
# JSON Response Handling Improvements

This document outlines the improvements made to JSON response handling for session-aware LLM analysis and CLI/MCP symmetry.

## Problem Summary

The original implementation suffered from several critical issues:

1. **Line Number Generation**: The LLM generated artificial sequential line numbers (8, 12, 16, 20, etc.) because it had no access to actual file content
2. **Confidence Conversion Failures**: String confidence values ("HIGH", "very_high") failed with "could not convert string to float" errors
3. **Threat Aggregation Over-Merging**: 8 distinct findings were over-aggregated into 1 combined threat due to poor fingerprinting
4. **CLI/MCP Asymmetry**: Different code paths caused CLI and MCP to produce different results for identical configurations

## Root Cause Analysis

### 1. Session-Aware Analysis Issues

- **File Content Access**: The LLM had no access to actual file content, only project metadata
- **Response Parsing**: JSON responses contained inconsistent formats for line numbers and confidence values
- **Context Enhancement**: Session-aware prompts did not provide sufficient context for accurate analysis

### 2. JSON Parsing Fragility

- **String-to-Float Conversion**: Confidence mappings were incomplete for common LLM response formats
- **Line Number Extraction**: No handling for the "estimated_X-Y" format returned by session-aware analysis
- **Error Handling**: JSON parsing failures caused complete analysis breakdown

### 3. Code Path Divergence

- **CLI vs MCP**: Different methods were used for file analysis (`analyze_file_with_context` vs `analyze_with_session`)
- **Content Formatting**: File content was presented to the LLM inconsistently between the two interfaces

## Comprehensive Solutions Implemented

### 1. Enhanced File Content Access (`src/adversary_mcp_server/scanner/llm_scanner.py:analyze_file_with_context`)

````python
# Read the file content with line numbers
try:
    with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
        file_lines = f.readlines()

    # Format content with line numbers
    numbered_content = ""
    for i, line in enumerate(file_lines, 1):
        numbered_content += f"{i:4d} | {line}"

    query = f"""Analyze {file_path.name} ({language}) for security vulnerabilities.

## File Content with Line Numbers:

```{language}
{numbered_content}```

Please analyze the above code for security vulnerabilities.
Provide the EXACT line number where each vulnerability occurs."""
# (the matching except clause is not shown in this excerpt)
````

**Impact**: The LLM now receives the complete file content with accurate line numbers, eliminating artificial sequential numbering.

### 2. Robust Confidence String-to-Float Conversion (`src/adversary_mcp_server/session/llm_session_manager.py:_create_finding_from_data`)

```python
# Map confidence string to float
confidence_value = finding_data.get("confidence", 0.8)
if isinstance(confidence_value, str):
    # Lowercase first so "HIGH" and "high" hit the same key
    confidence_str = confidence_value.lower()
    confidence_map = {
        "very_low": 0.1,
        "low": 0.3,
        "medium": 0.5,
        "high": 0.8,
        "very_high": 0.95,
    }
    # Unrecognized strings fall back to 0.8 instead of raising
    confidence = confidence_map.get(confidence_str, 0.8)
else:
    confidence = float(confidence_value)
```

**Impact**: The five listed confidence labels convert case-insensitively, and any other string falls back to 0.8, eliminating "could not convert string to float" errors.

### 3. Enhanced Line Number Extraction (`src/adversary_mcp_server/session/llm_session_manager.py:_extract_line_number`)

```python
def _extract_line_number(self, line_number_str: str | int) -> int:
    """Extract line number from various formats (e.g., 'estimated_10-15' -> 10)."""
    if isinstance(line_number_str, int):
        return max(1, line_number_str)

    if isinstance(line_number_str, str):
        # Handle formats like "estimated_10-15"
        if "estimated_" in line_number_str:
            numbers = line_number_str.replace("estimated_", "").split("-")
            try:
                return max(1, int(numbers[0]))
            except (ValueError, IndexError):
                return 1

        # Handle direct number strings
        try:
            return max(1, int(line_number_str))
        except ValueError:
            return 1

    return 1
```

**Impact**: Session-aware analysis responses in the "estimated_X-Y" format now resolve to the first line of the range (X). Values that cannot be parsed fall back to line 1.

### 4. CLI/MCP Symmetry Achievement (`src/adversary_mcp_server/application/mcp_server.py:adv_scan_file`)

Applied the identical file content enhancement to the MCP interface:

````python
# Same file content formatting as CLI
language = self._detect_language(file_path)
try:
    with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
        file_lines = f.readlines()

    # Format content with line numbers (identical to CLI)
    numbered_content = ""
    for i, line in enumerate(file_lines, 1):
        numbered_content += f"{i:4d} | {line}"

    query = f"""Analyze {file_path.name} ({language}) for security vulnerabilities.

## File Content with Line Numbers:

```{language}
{numbered_content}```

Please analyze the above code for security vulnerabilities.
Provide the EXACT line number where each vulnerability occurs."""
# (the matching except clause is not shown in this excerpt)
````

**Impact**: CLI and MCP now send the LLM the same numbered file content and the same prompt, so both interfaces produce the same results.

## Test Coverage for Regression Prevention

### Core JSON Parsing Tests (`tests/session/test_json_parsing_specific.py`)

1. **Line Number Extraction**: Tests all formats including "estimated_X-Y"
2. **Confidence Conversion**: Tests all string mappings and numeric passthroughs
3. **Severity Mapping**: Tests case-insensitive string-to-enum conversion
4. **JSON Sanitization**: Tests malformed JSON handling
5. **Finding Creation**: Tests with minimal and complete data
6. **Exception Handling**: Tests graceful degradation

### Symmetry Validation

**CLI vs MCP Results for `examples/vulnerable_python.py`:**

- **Total Threats**: 11 (both CLI and MCP)
- **Severity Distribution**: Critical: 7, High: 4 (identical)
- **Scanner Attribution**: LLM: 3, Semgrep: 2, Combined: 6 (identical)
- **Line Numbers**: All accurate (25, 32, 40, 49, 55, 62, 76, 91, 102, 109, 126)

## Results Achieved

### Before Fix

- **8 → 1 Over-Aggregation**: Multiple distinct vulnerabilities merged into a single threat
- **Artificial Line Numbers**: Sequential numbering (8, 12, 16, 20) instead of actual locations
- **Confidence Conversion Errors**: "could not convert string to float: 'HIGH'"
- **CLI/MCP Asymmetry**: Different results for identical configurations

### After Fix

- **Distinct Threat Resolution**: 11 distinct threats with accurate details
- **Accurate Line Numbers**: Exact source code locations (25, 32, 40, etc.)
- **Robust Confidence Handling**: Known labels map to fixed values; unknown strings fall back to 0.8
- **CLI/MCP Symmetry**: Identical results for identical configurations

## Maintenance Notes

### Critical Methods to Monitor

1. `LLMSessionManager._extract_line_number()` - Line number parsing
2. `LLMSessionManager._create_finding_from_data()` - Confidence conversion
3. `LLMScanner.analyze_file_with_context()` - File content formatting
4. `CleanMCPServer.adv_scan_file()` - MCP interface symmetry

### Regression Indicators

- Line numbers returning to sequential patterns (8, 12, 16, 20)
- Confidence conversion errors in logs
- CLI and MCP producing different threat counts for the same file
- Threat over-aggregation (many findings → few threats)

### Test Commands for Validation

```bash
# Run focused JSON parsing tests
python -m pytest tests/session/test_json_parsing_specific.py -v

# Symmetry validation
adversary-mcp-cli scan-file examples/vulnerable_python.py --use-llm --no-validation
# vs MCP tool with identical parameters
```

## Future Enhancements

1. **Enhanced JSON Extraction**: Support for more complex nested response formats
2. **Confidence Calibration**: ML-based confidence adjustment based on finding type
3. **Multi-Language Line Mapping**: Language-specific line number extraction
4. **Performance Optimization**: Caching of file content formatting for repeated scans
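The first regression indicator under Maintenance Notes (line numbers falling back to an evenly spaced pattern such as 8, 12, 16, 20) can be checked mechanically. The following is a minimal sketch, not code from the project: `looks_sequential` and its `min_findings` parameter are hypothetical names introduced here, and the heuristic assumes that real findings rarely sit at perfectly uniform intervals.

```python
def looks_sequential(line_numbers: list[int], min_findings: int = 3) -> bool:
    """Return True when line numbers form an evenly spaced run (e.g. 8, 12, 16, 20).

    Hypothetical helper for illustration; not part of adversary-mcp-server.
    """
    if len(line_numbers) < min_findings:
        return False  # too few findings to call it a pattern

    ordered = sorted(line_numbers)
    # Collect the distinct gaps between neighbouring line numbers.
    steps = {b - a for a, b in zip(ordered, ordered[1:])}

    # Exactly one non-zero gap size means every finding is equally spaced.
    return len(steps) == 1 and 0 not in steps


# The artificial pattern described in the Problem Summary
print(looks_sequential([8, 12, 16, 20]))
# The line numbers reported under Symmetry Validation
print(looks_sequential([25, 32, 40, 49, 55, 62, 76, 91, 102, 109, 126]))
```

A check like this could run as an assertion over scan output in a regression test. It will miss artificial numbering that is not perfectly uniform, so it complements the focused parsing tests and does not replace them.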
