mitmproxy-mcp MCP Server

by lucasoeth

analyze_protection

Analyze HTTP traffic flows to detect bot protection mechanisms and extract challenge details for security testing and analysis.

Instructions

Analyze flow for bot protection mechanisms and extract challenge details

Input Schema

Name              Required  Description                                                   Default
session_id        Yes       The ID of the session
flow_index        Yes       The index of the flow to analyze
extract_scripts   No        Whether to extract and analyze JavaScript from the response   true
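A well-formed arguments object for this tool might look like the following sketch (the session ID is illustrative, not a real value):

```python
# Example arguments for the analyze_protection tool.
# "abc123" is a placeholder; use the ID returned when the
# capture session was created.
arguments = {
    "session_id": "abc123",
    "flow_index": 0,          # zero-based index into the session's flows
    "extract_scripts": True,  # optional; defaults to True
}
```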

Implementation Reference

  • The main handler function for the 'analyze_protection' tool. It retrieves the specified flow from the session, analyzes it with helper functions for protection systems, cookies, challenges, and scripts, generates suggestions, and returns a JSON summary.
    async def analyze_protection(arguments: dict) -> list[types.TextContent]:
        """
        Analyze a flow for bot protection mechanisms and extract challenge details.
        """
        session_id = arguments.get("session_id")
        flow_index = arguments.get("flow_index")
        extract_scripts = arguments.get("extract_scripts", True)
        
        if not session_id:
            return [types.TextContent(type="text", text="Error: Missing session_id")]
        if flow_index is None:
            return [types.TextContent(type="text", text="Error: Missing flow_index")]
        
        try:
            flows = await get_flows_from_dump(session_id)
            
            try:
                flow = flows[flow_index]
                
                if flow.type != "http":
                    return [types.TextContent(type="text", text=f"Error: Flow {flow_index} is not an HTTP flow")]
                
                # Analyze the flow for protection mechanisms
                analysis = {
                    "flow_index": flow_index,
                    "method": flow.request.method,
                    "url": flow.request.url,
                    "protection_systems": identify_protection_system(flow),
                    "request_cookies": analyze_cookies(dict(flow.request.headers)),
                    "has_response": flow.response is not None,
                }
                
                if flow.response:
                    # Add response analysis
                    content_type = flow.response.headers.get("Content-Type", "")
                    is_html = "text/html" in content_type
                    
                    analysis.update({
                        "status_code": flow.response.status_code,
                        "response_cookies": analyze_cookies(dict(flow.response.headers)),
                        "challenge_analysis": analyze_response_for_challenge(flow),
                        "content_type": content_type,
                        "is_html": is_html,
                    })
                    
                    # If HTML and script extraction is requested, extract and analyze JavaScript
                    if is_html and extract_scripts:
                        try:
                            html_content = flow.response.content.decode('utf-8', errors='ignore')
                            analysis["scripts"] = extract_javascript(html_content)
                        except Exception as e:
                            analysis["script_extraction_error"] = str(e)
                
                # Add remediation suggestions based on findings
                analysis["suggestions"] = generate_suggestions(analysis)
                
                return [types.TextContent(type="text", text=json.dumps(analysis, indent=2))]
                
            except IndexError:
                return [types.TextContent(type="text", text=f"Error: Flow index {flow_index} out of range")]
                
        except FileNotFoundError:
            return [types.TextContent(type="text", text="Error: Session not found")]
        except Exception as e:
            return [types.TextContent(type="text", text=f"Error analyzing protection: {str(e)}")]
  • The JSON schema definition and registration of the 'analyze_protection' tool in the list_tools handler, specifying input parameters.
    types.Tool(
        name="analyze_protection",
        description="Analyze flow for bot protection mechanisms and extract challenge details",
        inputSchema={
            "type": "object",
            "properties": {
                "session_id": {
                    "type": "string",
                    "description": "The ID of the session"
                },
                "flow_index": {
                    "type": "integer",
                    "description": "The index of the flow to analyze"
                },
                "extract_scripts": {
                    "type": "boolean",
                    "description": "Whether to extract and analyze JavaScript from the response (default: true)",
                    "default": True
                }
            },
            "required": ["session_id", "flow_index"]
        }
    )
  • The dispatch logic in the call_tool handler that routes requests for 'analyze_protection' to the implementation function.
    elif name == "analyze_protection":
        return await analyze_protection(arguments)
  • Helper function to identify bot protection systems by matching signatures in headers and content against known patterns.
    def identify_protection_system(flow) -> List[Dict[str, Any]]:
        """
        Identify potential bot protection systems based on signatures.
        """
        protections = []
        
        # Combine all searchable content
        searchable_content = ""
        # Add request headers
        for k, v in flow.request.headers.items():
            searchable_content += f"{k}: {v}\n"
        
        # Check response if available
        if flow.response:
            # Add response headers
            for k, v in flow.response.headers.items():
                searchable_content += f"{k}: {v}\n"
            
            # Add response content if it's text
            content_type = flow.response.headers.get("Content-Type", "")
            if "text" in content_type or "javascript" in content_type or "json" in content_type:
                try:
                    searchable_content += flow.response.content.decode('utf-8', errors='ignore')
                except Exception:
                    pass
        
        # Check for protection signatures
        for vendor, signatures in BOT_PROTECTION_SIGNATURES.items():
            matches = []
            for sig in signatures:
                if re.search(sig, searchable_content, re.IGNORECASE):
                    matches.append(sig)
            
            if matches:
                protections.append({
                    "vendor": vendor,
                    "confidence": len(matches) / len(signatures) * 100,
                    "matching_signatures": matches
                })
        
        return sorted(protections, key=lambda x: x["confidence"], reverse=True)
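The confidence score is simply the fraction of a vendor's signatures that match, scaled to a percentage. The sketch below reproduces that scoring with a hypothetical, abbreviated signature table (the server's BOT_PROTECTION_SIGNATURES covers more vendors and patterns):

```python
import re

# Hypothetical subset of a signature table; patterns are regexes
SIGNATURES = {
    "cloudflare": [r"cf-ray", r"__cf_bm", r"cloudflare"],
    "datadome": [r"x-datadome", r"datadome"],
}

def score_protections(searchable_content: str) -> list[dict]:
    """Score each vendor by the fraction of its signatures that match."""
    results = []
    for vendor, sigs in SIGNATURES.items():
        matches = [s for s in sigs
                   if re.search(s, searchable_content, re.IGNORECASE)]
        if matches:
            results.append({
                "vendor": vendor,
                "confidence": len(matches) / len(sigs) * 100,
                "matching_signatures": matches,
            })
    # Highest-confidence vendor first, as in the helper above
    return sorted(results, key=lambda r: r["confidence"], reverse=True)

# Two of the three Cloudflare signatures match these headers
headers = "CF-RAY: 8f3a\nSet-Cookie: __cf_bm=token\n"
```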
  • Helper function to analyze the response for challenge presence based on status codes, headers, and content patterns.
    def analyze_response_for_challenge(flow) -> Dict[str, Any]:
        """
        Analyze a response to determine if it contains a challenge.
        """
        if not flow.response:
            return {"is_challenge": False}
        
        result = {
            "is_challenge": False,
            "challenge_indicators": [],
            "status_code": flow.response.status_code,
            "challenge_type": "unknown"
        }
        
        # Check status code
        if flow.response.status_code in [403, 429, 503]:
            result["challenge_indicators"].append(f"Suspicious status code: {flow.response.status_code}")
        
        # Check for challenge headers
        challenge_headers = {
            "cf-mitigated": "Cloudflare mitigation",
            "cf-chl-bypass": "Cloudflare challenge bypass",
            "x-datadome": "DataDome protection",
            "x-px": "PerimeterX",
            "x-amz-captcha": "AWS WAF Captcha"
        }
        
        for header, description in challenge_headers.items():
            if any(h.lower() == header.lower() for h in flow.response.headers.keys()):
                result["challenge_indicators"].append(f"Challenge header: {description}")
        
        # Check for challenge content patterns
        content = (flow.response.content or b"").decode('utf-8', errors='ignore')
        challenge_patterns = [
            (r'captcha', "CAPTCHA"),
            (r'challenge', "Challenge term"),
            (r'blocked', "Blocking message"),
            (r'verify.*human', "Human verification"),
            (r'suspicious.*activity', "Suspicious activity message"),
            (r'security.*check', "Security check message"),
            (r'ddos', "DDoS protection message"),
            (r'automated.*request', "Automated request detection")
        ]
        
        for pattern, description in challenge_patterns:
            if re.search(pattern, content, re.IGNORECASE):
                result["challenge_indicators"].append(f"Content indicator: {description}")
        
        # Determine if this is a challenge response
        result["is_challenge"] = len(result["challenge_indicators"]) > 0
        
        # Determine challenge type
        if "CAPTCHA" in " ".join(result["challenge_indicators"]):
            result["challenge_type"] = "captcha"
        elif "JavaScript" in content and result["is_challenge"]:
            result["challenge_type"] = "javascript"
        elif result["is_challenge"]:
            result["challenge_type"] = "other"
        
        return result
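The content scan is a straightforward case-insensitive regex sweep over the decoded body. A minimal standalone version, using a subset of the patterns above, could look like this (the function name is illustrative):

```python
import re

# Subset of the challenge content patterns shown above
CHALLENGE_PATTERNS = [
    (r"captcha", "CAPTCHA"),
    (r"verify.*human", "Human verification"),
    (r"ddos", "DDoS protection message"),
]

def content_indicators(body: str) -> list[str]:
    """Collect the challenge indicators present in a response body."""
    return [
        f"Content indicator: {desc}"
        for pattern, desc in CHALLENGE_PATTERNS
        if re.search(pattern, body, re.IGNORECASE)
    ]

html = "<p>Please verify you are human before continuing.</p>"
```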
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions analyzing and extracting details, but fails to describe key traits such as whether this is a read-only operation, potential side effects, performance considerations (e.g., time-intensive due to script extraction), or error handling. This leaves significant gaps in understanding how the tool behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part earns its place by clearly stating the action and target, making it easy to parse and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of analyzing bot protection mechanisms and the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'challenge details' include, how results are returned, or any limitations (e.g., only works with certain flow types). For a tool with no structured behavioral or output information, more context is needed to be fully helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema fully documents all three parameters. The description adds no additional meaning beyond what's in the schema (e.g., it doesn't explain how 'session_id' or 'flow_index' relate to protection analysis). With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but also doesn't detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('analyze', 'extract') and resources ('flow', 'bot protection mechanisms', 'challenge details'), making it easy to understand what the tool does. However, it doesn't explicitly differentiate from sibling tools like 'get_flow_details' or 'list_flows', which might also analyze or extract information from flows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_flow_details' or 'extract_json_fields'. It lacks context on prerequisites (e.g., needing a session or flow index), exclusions, or specific scenarios where this tool is preferred, leaving usage decisions ambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
