hakrawler_crawl
Crawl websites to discover endpoints, forms, and hidden URLs for security testing and vulnerability assessment in bug bounty programs.
Instructions
Execute hakrawler for fast web crawling and endpoint discovery.
Args:
- url: Target URL to crawl
- depth: Crawling depth
- forms: Extract form endpoints
- robots: Parse robots.txt
- sitemap: Parse sitemap
- wayback: Include Wayback Machine URLs
- insecure: Skip TLS verification
- additional_args: Additional hakrawler arguments
Returns: Web crawling and endpoint discovery results
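The result is a dict proxied from the REST API. As a sketch, a successful response follows the shape produced by parse_hakrawler_output in the Implementation Reference below; all values here are illustrative placeholders, not real output:

```python
# Illustrative response shape only; values are placeholders.
example_result = {
    "success": True,
    "tool": "hakrawler",
    "params": {"url": "https://example.com", "depth": 2},
    "command": ["hakrawler", "-url", "https://example.com", "-depth", "2",
                "-forms", "-robots", "-sitemap"],
    "started_at": "2024-01-01T12:00:00",
    "ended_at": "2024-01-01T12:00:03",
    "duration_ms": 3000,
    "findings": [
        {
            "type": "url",
            "target": "https://example.com/login",
            "evidence": {"raw_output": "https://example.com/login", "source": "crawl"},
            "severity": "info",
            "confidence": "medium",
            "tags": ["hakrawler", "url-discovery"],
            "raw_ref": "https://example.com/login",
        }
    ],
    "stats": {"findings": 1, "dupes": 0, "payload_bytes": 52},
}
```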
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| additional_args | No | Additional hakrawler arguments | `""` |
| depth | No | Crawling depth | `2` |
| forms | No | Extract form endpoints | `true` |
| insecure | No | Skip TLS verification | `false` |
| robots | No | Parse robots.txt | `true` |
| sitemap | No | Parse sitemap | `true` |
| url | Yes | Target URL to crawl | |
| wayback | No | Include Wayback Machine URLs | `false` |
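Only url is required; omitted fields fall back to the defaults above. A sketch of the arguments an MCP client might send (hypothetical values):

```python
# Minimal arguments: every omitted field takes the default from the table above.
arguments = {"url": "https://example.com"}

# Fuller example overriding a few defaults.
arguments_full = {"url": "https://example.com", "depth": 3, "wayback": True, "insecure": True}
```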
Implementation Reference
- src/mcp_server/app.py:1242-1287 (handler): MCP tool handler and registration for 'hakrawler_crawl'. This function defines the tool interface, validates inputs via type hints, constructs the payload, and proxies execution to the REST API endpoint /api/hakrawler.

```python
@mcp.tool()
def hakrawler_crawl(
    url: str,
    depth: int = 2,
    forms: bool = True,
    robots: bool = True,
    sitemap: bool = True,
    wayback: bool = False,
    insecure: bool = False,
    additional_args: str = "",
) -> dict[str, Any]:
    """Execute hakrawler for fast web crawling and endpoint discovery.

    Args:
        url: Target URL to crawl
        depth: Crawling depth
        forms: Extract form endpoints
        robots: Parse robots.txt
        sitemap: Parse sitemap
        wayback: Include Wayback Machine URLs
        insecure: Skip TLS verification
        additional_args: Additional hakrawler arguments

    Returns:
        Web crawling and endpoint discovery results
    """
    data = {
        "url": url,
        "depth": depth,
        "forms": forms,
        "robots": robots,
        "sitemap": sitemap,
        "wayback": wayback,
        "insecure": insecure,
        "additional_args": additional_args,
    }
    logger.info(f"🕷️ Starting hakrawler crawling on {url}")
    result = api_client.safe_post("api/hakrawler", data)
    if result.get("success"):
        logger.info(f"✅ hakrawler crawling completed on {url}")
    else:
        logger.error("❌ hakrawler crawling failed")
    return result
```
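As a usage sketch (not part of the source), the handler can also be called in-process with keyword arguments; the result handling below assumes the success shape produced by parse_hakrawler_output further down:

```python
# In-process call, bypassing the MCP transport; an MCP client would normally
# invoke the registered "hakrawler_crawl" tool instead.
result = hakrawler_crawl(url="https://example.com", depth=3, wayback=True)

if result.get("success"):
    for finding in result.get("findings", []):
        # Each finding is a dict with "type", "target", "evidence", "tags", etc.
        print(finding["target"])
else:
    print("crawl failed:", result.get("error"))
```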
- Core handler function for hakrawler execution in the REST API. It extracts parameters, builds the subprocess command, executes the hakrawler binary, and parses the output into structured findings with timings and stats.

```python
@tool(required_fields=["url"])
def execute_hakrawler():
    """Execute hakrawler for fast web crawling and endpoint discovery."""
    data = request.get_json()
    params = extract_hakrawler_params(data)

    started_at = datetime.now()
    command = build_hakrawler_command(params)
    execution_result = execute_command(
        " ".join(command), timeout=params.get("timeout", 120)
    )
    ended_at = datetime.now()

    return parse_hakrawler_output(
        execution_result, params, command, started_at, ended_at
    )
```
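Because the MCP handler simply proxies to this endpoint, the REST API can be exercised directly as well. A sketch using requests; the base URL and the absence of authentication are assumptions, not confirmed by the source:

```python
import requests

# Hypothetical base URL; adjust to wherever the REST API server actually listens.
BASE_URL = "http://127.0.0.1:8888"

payload = {
    "url": "https://example.com",
    "depth": 2,
    "forms": True,
    "robots": True,
    "sitemap": True,
    "wayback": False,
    "insecure": False,
    "additional_args": "",
}

resp = requests.post(f"{BASE_URL}/api/hakrawler", json=payload, timeout=180)
resp.raise_for_status()
body = resp.json()
print(body["stats"], f"- {len(body['findings'])} findings")
```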
- Helper function to parse hakrawler stdout into structured JSON findings: it extracts URLs, adds metadata such as severity, confidence, and tags, and computes stats.

```python
def parse_hakrawler_output(
    execution_result: dict[str, Any],
    params: dict,
    command: list[str],
    started_at: datetime,
    ended_at: datetime,
) -> dict[str, Any]:
    """Parse hakrawler execution results into structured findings."""
    duration_ms = int((ended_at - started_at).total_seconds() * 1000)

    if not execution_result["success"]:
        return {
            "success": False,
            "tool": "hakrawler",
            "params": params,
            "command": command,
            "started_at": started_at.isoformat(),
            "ended_at": ended_at.isoformat(),
            "duration_ms": duration_ms,
            "error": execution_result.get("error", "Command execution failed"),
            "findings": [],
            "stats": {"findings": 0, "dupes": 0, "payload_bytes": 0},
        }

    # Parse successful output
    stdout = execution_result.get("stdout", "")
    findings = []

    # Extract URLs from hakrawler output
    for line in stdout.strip().split("\n"):
        line = line.strip()
        if not line:
            continue

        # Parse URL findings
        url_info = _extract_url_from_line(line)
        if url_info:
            finding = {
                "type": "url",
                "target": url_info.get("url", line),
                "evidence": {
                    "raw_output": line,
                    "source": url_info.get("source", "crawl"),
                },
                "severity": "info",
                "confidence": "medium",
                "tags": ["hakrawler", "url-discovery"],
                "raw_ref": line,
            }
            findings.append(finding)

    payload_bytes = len(stdout.encode("utf-8"))

    return {
        "success": True,
        "tool": "hakrawler",
        "params": params,
        "command": command,
        "started_at": started_at.isoformat(),
        "ended_at": ended_at.isoformat(),
        "duration_ms": duration_ms,
        "findings": findings,
        "stats": {
            "findings": len(findings),
            "dupes": 0,
            "payload_bytes": payload_bytes,
        },
    }
```
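A small, hypothetical unit-style check of this parser; it assumes _extract_url_from_line (defined elsewhere in the module) returns a dict like {"url": ..., "source": "crawl"} for plain URL lines:

```python
from datetime import datetime

fake_execution_result = {
    "success": True,
    "stdout": "https://example.com/login\nhttps://example.com/assets/app.js\n",
}

parsed = parse_hakrawler_output(
    fake_execution_result,
    params={"url": "https://example.com", "depth": 2},
    command=["hakrawler", "-url", "https://example.com", "-depth", "2"],
    started_at=datetime.now(),
    ended_at=datetime.now(),
)

assert parsed["success"] is True
assert parsed["stats"]["findings"] == 2          # one finding per URL line
assert all(f["type"] == "url" for f in parsed["findings"])
```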
- Helper to construct the hakrawler CLI command-line arguments from input parameters.

```python
def build_hakrawler_command(params: dict) -> list[str]:
    """Build the hakrawler command from parameters."""
    args = ["hakrawler"]

    # Add URL
    args.extend(["-url", params["url"]])

    # Add depth parameter
    args.extend(["-depth", str(params["depth"])])

    # Add boolean flags only if enabled
    if params["forms"]:
        args.append("-forms")
    if params["robots"]:
        args.append("-robots")
    if params["sitemap"]:
        args.append("-sitemap")
    if params["wayback"]:
        args.append("-wayback")
    if params["insecure"]:
        args.append("-insecure")

    # Add additional arguments
    if params["additional_args"]:
        args.extend(shlex.split(params["additional_args"]))

    return args
```
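For a concrete sense of the assembled command, the sketch below shows what this helper builds for one hypothetical parameter set (the flags mirror the wrapper's parameter names; the expected output is shown as a comment):

```python
params = {
    "url": "https://example.com",
    "depth": 3,
    "forms": True,
    "robots": True,
    "sitemap": False,
    "wayback": True,
    "insecure": False,
    "additional_args": "",  # any extra raw flags would be shlex-split and appended
}

print(build_hakrawler_command(params))
# ['hakrawler', '-url', 'https://example.com', '-depth', '3', '-forms', '-robots', '-wayback']
```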
- src/rest_api_server/tools/hakrawler/__init__.py:1-4 (registration): Module init that imports and exposes the execute_hakrawler function for automatic registration via the @tool decorator in the main app.py.

```python
"""Hakrawler tool module - web crawler for web application reconnaissance."""

from .hakrawler import execute_hakrawler
```