Glama
Unstructured-IO

Unstructured API MCP Server

Official

check_crawlhtml_status

Monitor the status of a Firecrawl HTML crawl job by providing its ID. The tool returns the job's current status together with the number of URLs crawled so far and the total number of URLs in the job.

Instructions

Check the status of an existing Firecrawl HTML crawl job.

Args:
    crawl_id: ID of the crawl job to check

Returns:
    Dictionary containing the current status of the crawl job
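Because the tool returns a single snapshot rather than streaming updates, a caller typically polls it until the job finishes. The sketch below assumes an async callable with the same signature and return shape as `check_crawlhtml_status`. The helper name `wait_for_crawl`, the polling interval, and the set of terminal status strings are illustrative assumptions, not part of the server.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Assumption: a crawl job reports a running state (e.g. "scraping") and ends in
# one of these states. Adjust if Firecrawl reports different strings.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

StatusFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def wait_for_crawl(
    check_status: StatusFn,
    crawl_id: str,
    interval: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Poll check_status(crawl_id) until the job reaches a terminal state.

    Returns the last status dictionary, or the error dictionary if the tool
    reported one. Raises TimeoutError after max_polls unfinished checks.
    """
    for _ in range(max_polls):
        status = await check_status(crawl_id)
        if "error" in status:
            # The tool reports failures as {"error": "..."} instead of raising.
            return status
        state = status.get("status", "unknown")
        done = status.get("completed_urls", 0)
        total = status.get("total_urls", 0)
        print(f"{crawl_id}: {state} ({done}/{total} URLs)")
        if state in TERMINAL_STATES:
            return status
        await asyncio.sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_polls} polls")
```

Passing the status function in as an argument keeps the loop independent of how the tool is invoked (direct call, MCP client session, or a test stub).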

Input Schema

Name       Required   Description   Default
crawl_id   Yes        -             -
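The table lists a single required string parameter. FastMCP derives a tool's JSON Schema from the handler's type hints, so the input schema is roughly the dictionary below; the exact `title` fields it emits are omitted here. The `validate_arguments` helper is a hypothetical client-side check, not something the server exposes.

```python
from typing import Any, Dict, List

# Approximate input schema for check_crawlhtml_status (titles omitted).
CHECK_CRAWLHTML_STATUS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"crawl_id": {"type": "string"}},
    "required": ["crawl_id"],
}


def validate_arguments(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments pass.

    Only checks required keys and string types, which is all this schema uses.
    """
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    properties = schema.get("properties", {})
    for name, value in args.items():
        spec = properties.get(name)
        # Extra keys are tolerated: the schema does not set additionalProperties.
        if spec is None:
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
    return problems
```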

Implementation Reference

  • Main handler function for the 'check_crawlhtml_status' MCP tool. It takes a crawl_id and delegates to the _check_job_status helper with job_type='crawlhtml'.
    from typing import Any, Dict

    async def check_crawlhtml_status(
        crawl_id: str,
    ) -> Dict[str, Any]:
        """Check the status of an existing Firecrawl HTML crawl job.

        Args:
            crawl_id: ID of the crawl job to check

        Returns:
            Dictionary containing the current status of the crawl job
        """
        return await _check_job_status(crawl_id, "crawlhtml")
  • Core implementation logic for checking Firecrawl job status, used by check_crawlhtml_status and similar tools. Handles FirecrawlApp client initialization, status querying, and response formatting.
    from typing import Any, Dict

    from firecrawl import FirecrawlApp

    # Firecrawl_JobType and _prepare_firecrawl_config are defined elsewhere
    # in the source module (not shown here).

    async def _check_job_status(
        job_id: str,
        job_type: Firecrawl_JobType,
    ) -> Dict[str, Any]:
        """Generic function to check the status of a Firecrawl job.

        Args:
            job_id: ID of the job to check
            job_type: Type of job ('crawlhtml' or 'llmfulltxt')

        Returns:
            Dictionary containing the current status of the job
        """
        # Get configuration with API key
        config = _prepare_firecrawl_config()

        # Check if config contains an error
        if "error" in config:
            return {"error": config["error"]}

        try:
            # Initialize the Firecrawl client
            firecrawl = FirecrawlApp(api_key=config["api_key"])

            # Check status based on job type
            if job_type == "crawlhtml":
                result = firecrawl.check_crawl_status(job_id)

                # Return a more user-friendly response for crawl jobs
                status_info = {
                    "id": job_id,
                    "status": result.get("status", "unknown"),
                    "completed_urls": result.get("completed", 0),
                    "total_urls": result.get("total", 0),
                }
            elif job_type == "llmfulltxt":
                result = firecrawl.check_generate_llms_text_status(job_id)

                # Return a more user-friendly response for llmfull.txt jobs
                status_info = {
                    "id": job_id,
                    "status": result.get("status", "unknown"),
                }

                # Add llmfull.txt content if job is completed
                if result.get("status") == "completed" and "data" in result:
                    status_info["llmfulltxt"] = result["data"].get("llmsfulltxt", "")
            else:
                return {"error": f"Unknown job type: {job_type}"}

            return status_info
        except Exception as e:
            return {"error": f"Error checking {job_type} status: {str(e)}"}
  • Registration of the check_crawlhtml_status tool (and other Firecrawl tools) via mcp.tool() decorator in the register_external_connectors function.
    from mcp.server.fastmcp import FastMCP

    def register_external_connectors(mcp: FastMCP):
        """Register all external connector tools with the MCP server."""
        # Register Firecrawl tools
        from .firecrawl import (
            cancel_crawlhtml_job,
            check_crawlhtml_status,
            check_llmtxt_status,
            invoke_firecrawl_crawlhtml,
            invoke_firecrawl_llmtxt,
        )

        mcp.tool()(invoke_firecrawl_crawlhtml)
        mcp.tool()(check_crawlhtml_status)
        mcp.tool()(invoke_firecrawl_llmtxt)
        mcp.tool()(check_llmtxt_status)
        mcp.tool()(cancel_crawlhtml_job)
        # mcp.tool()(cancel_llmtxt_job)  # commented out until Firecrawl adds a cancel feature
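For `crawlhtml` jobs, `_check_job_status` reduces Firecrawl's raw status response to four fields. The sketch below isolates that mapping behind a small client protocol so it can run without an API key. `CrawlStatusClient`, `summarize_crawl_status`, and `StubFirecrawl` are hypothetical names introduced for illustration; the stub only mimics the dict-returning `check_crawl_status` call that the handler above relies on.

```python
from typing import Any, Dict, Protocol


class CrawlStatusClient(Protocol):
    """Anything with a check_crawl_status(job_id) method returning a dict."""

    def check_crawl_status(self, job_id: str) -> Dict[str, Any]: ...


def summarize_crawl_status(client: CrawlStatusClient, job_id: str) -> Dict[str, Any]:
    """Mirror the 'crawlhtml' branch: keep only id, status and URL counts."""
    try:
        result = client.check_crawl_status(job_id)
        return {
            "id": job_id,
            "status": result.get("status", "unknown"),
            "completed_urls": result.get("completed", 0),
            "total_urls": result.get("total", 0),
        }
    except Exception as e:  # same catch-all error shape as the tool
        return {"error": f"Error checking crawlhtml status: {e}"}


class StubFirecrawl:
    """Hypothetical stand-in for FirecrawlApp that returns canned responses."""

    def __init__(self, responses: Dict[str, Dict[str, Any]]):
        self.responses = responses

    def check_crawl_status(self, job_id: str) -> Dict[str, Any]:
        if job_id not in self.responses:
            raise ValueError(f"job {job_id} not found")
        return self.responses[job_id]
```

Note that any other fields in the raw response, including scraped page `data`, are dropped, so this status check reports progress only and does not return crawled content.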


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Unstructured-IO/UNS-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.