check_crawlhtml_status
Check the status of a Firecrawl HTML crawl job by its ID. The tool reports the job's current status along with the number of completed and total URLs.
Instructions
Check the status of an existing Firecrawl HTML crawl job.
Args:
crawl_id: ID of the crawl job to check
Returns:
Dictionary containing the current status of the crawl job
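To make the returned dictionary concrete, here is a minimal sketch of the mapping that the crawl branch of the handler (listed under Implementation Reference) applies to the raw Firecrawl result. `summarize_crawl_result` is a hypothetical helper written for this illustration, and the sample payloads are made up rather than captured from Firecrawl.

```python
from typing import Any, Dict, Mapping


def summarize_crawl_result(job_id: str, result: Mapping[str, Any]) -> Dict[str, Any]:
    """Mirror the 'crawlhtml' branch: pick out status and URL counts, with defaults."""
    return {
        "id": job_id,
        "status": result.get("status", "unknown"),
        "completed_urls": result.get("completed", 0),
        "total_urls": result.get("total", 0),
    }


# Sample payloads are invented for illustration only.
in_progress = summarize_crawl_result(
    "job-1", {"status": "scraping", "completed": 3, "total": 10}
)
empty = summarize_crawl_result("job-2", {})

print(in_progress)
print(empty)
```

Missing keys fall back to `"unknown"` and `0`, so a caller should treat a zero `total_urls` as "not reported yet" rather than as an empty crawl.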
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| crawl_id | Yes | ID of the crawl job to check | |
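Each call returns a single snapshot, so a caller that wants to wait for a crawl to finish has to poll with the same `crawl_id`. The sketch below shows one way to do that. `poll_crawl_status` and `make_fake_checker` are hypothetical names, the set of terminal status strings is an assumption that should be checked against what Firecrawl actually returns, and the fake checker stands in for the real tool so the example runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Sequence

# Assumption: these status strings mark a finished job.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

StatusChecker = Callable[[str], Awaitable[Dict[str, Any]]]


async def poll_crawl_status(
    check: StatusChecker,
    crawl_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Call `check` until the job reaches a terminal state or attempts run out."""
    last: Dict[str, Any] = {}
    for _ in range(max_attempts):
        last = await check(crawl_id)
        # The tool reports failures as {"error": ...}; stop polling on those too.
        if "error" in last or last.get("status") in TERMINAL_STATUSES:
            return last
        await asyncio.sleep(interval_s)
    return last


def make_fake_checker(statuses: Sequence[str]) -> StatusChecker:
    """Hypothetical stand-in for the tool: replays canned statuses, repeating the last."""
    remaining = list(statuses)

    async def check(crawl_id: str) -> Dict[str, Any]:
        status = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return {"id": crawl_id, "status": status, "completed_urls": 0, "total_urls": 0}

    return check


checker = make_fake_checker(["scraping", "scraping", "completed"])
final = asyncio.run(poll_crawl_status(checker, "abc", interval_s=0, max_attempts=10))
print(final["status"])
```

In real use, `check` would be whatever coroutine invokes the tool. The attempt cap keeps a stuck job from blocking the caller indefinitely.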
Implementation Reference
- Main handler function for the 'check_crawlhtml_status' MCP tool. It takes a crawl_id and delegates to the _check_job_status helper with job_type='crawlhtml'.

  ```python
  async def check_crawlhtml_status(
      crawl_id: str,
  ) -> Dict[str, Any]:
      """Check the status of an existing Firecrawl HTML crawl job.

      Args:
          crawl_id: ID of the crawl job to check

      Returns:
          Dictionary containing the current status of the crawl job
      """
      return await _check_job_status(crawl_id, "crawlhtml")
  ```
- Core implementation logic for checking Firecrawl job status, used by check_crawlhtml_status and similar tools. Handles FirecrawlApp client initialization, status querying, and response formatting.

  ```python
  async def _check_job_status(
      job_id: str,
      job_type: Firecrawl_JobType,
  ) -> Dict[str, Any]:
      """Generic function to check the status of a Firecrawl job.

      Args:
          job_id: ID of the job to check
          job_type: Type of job ('crawlhtml' or 'llmfulltxt')

      Returns:
          Dictionary containing the current status of the job
      """
      # Get configuration with API key
      config = _prepare_firecrawl_config()

      # Check if config contains an error
      if "error" in config:
          return {"error": config["error"]}

      try:
          # Initialize the Firecrawl client
          firecrawl = FirecrawlApp(api_key=config["api_key"])

          # Check status based on job type
          if job_type == "crawlhtml":
              result = firecrawl.check_crawl_status(job_id)

              # Return a more user-friendly response for crawl jobs
              status_info = {
                  "id": job_id,
                  "status": result.get("status", "unknown"),
                  "completed_urls": result.get("completed", 0),
                  "total_urls": result.get("total", 0),
              }
          elif job_type == "llmfulltxt":
              result = firecrawl.check_generate_llms_text_status(job_id)

              # Return a more user-friendly response for llmfull.txt jobs
              status_info = {
                  "id": job_id,
                  "status": result.get("status", "unknown"),
              }

              # Add llmfull.txt content if job is completed
              if result.get("status") == "completed" and "data" in result:
                  status_info["llmfulltxt"] = result["data"].get("llmsfulltxt", "")
          else:
              return {"error": f"Unknown job type: {job_type}"}

          return status_info
      except Exception as e:
          return {"error": f"Error checking {job_type} status: {str(e)}"}
  ```
- uns_mcp/connectors/external/__init__.py:9-25 (registration): Registration of the check_crawlhtml_status tool (and other Firecrawl tools) via the mcp.tool() decorator in the register_external_connectors function.

  ```python
  def register_external_connectors(mcp: FastMCP):
      """Register all external connector tools with the MCP server."""
      # Register Firecrawl tools
      from .firecrawl import (
          cancel_crawlhtml_job,
          check_crawlhtml_status,
          check_llmtxt_status,
          invoke_firecrawl_crawlhtml,
          invoke_firecrawl_llmtxt,
      )

      mcp.tool()(invoke_firecrawl_crawlhtml)
      mcp.tool()(check_crawlhtml_status)
      mcp.tool()(invoke_firecrawl_llmtxt)
      mcp.tool()(check_llmtxt_status)
      mcp.tool()(cancel_crawlhtml_job)
      # mcp.tool()(cancel_llmtxt_job) # currently commented till firecrawl brings up a cancel feature
  ```
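
The registration code applies `mcp.tool()` as a plain function call rather than with `@` syntax. The toy sketch below illustrates why the two forms are interchangeable for a decorator factory. `ToyRegistry` is a hypothetical stand-in written for this example and makes no claim about how FastMCP is implemented; the tool bodies are placeholders.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Hypothetical stand-in for the server object; not the FastMCP implementation."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., Any]] = {}

    def tool(self) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Return a decorator that records the function under its own name.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.tools[fn.__name__] = fn
            return fn

        return decorator


async def check_crawlhtml_status(crawl_id: str) -> Dict[str, Any]:
    # Placeholder body for the sketch only.
    return {"id": crawl_id, "status": "unknown"}


registry = ToyRegistry()

# Call form, as used in register_external_connectors.
registry.tool()(check_crawlhtml_status)


# Equivalent decorator form.
@registry.tool()
async def cancel_crawlhtml_job(crawl_id: str) -> Dict[str, Any]:
    # Placeholder body for the sketch only.
    return {"id": crawl_id}


print(sorted(registry.tools))
```

The call form lets the tool functions live in their own module without importing the server object, which is presumably why the registration is centralized in one function.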