Unstructured-IO

Unstructured API MCP Server

Official

check_crawlhtml_status

Monitor the progress of an HTML crawling job by checking its current status using the crawl job ID.

Instructions

Check the status of an existing Firecrawl HTML crawl job.

Args:
    crawl_id: ID of the crawl job to check

Returns:
    Dictionary containing the current status of the crawl job
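Because the tool reports a snapshot rather than blocking until the crawl finishes, a client typically polls it. The sketch below is a minimal illustration of that pattern, not part of the server. It assumes the caller has an async function returning the dictionary described above (here the hypothetical stand-in `make_fake_check`), and it assumes Firecrawl's terminal crawl states are "completed", "failed", and "cancelled".

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Assumed terminal states for a Firecrawl crawl job (not confirmed by this page).
TERMINAL_STATES = {"completed", "failed", "cancelled"}

StatusFn = Callable[[str], Awaitable[Dict[str, Any]]]


async def wait_for_crawl(
    check_status: StatusFn,
    crawl_id: str,
    interval_s: float = 5.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll check_status(crawl_id) until the job reaches a terminal state.

    Returns the last status dictionary. An "error" key ends polling early,
    matching the error shape returned by the server-side helper.
    """
    status: Dict[str, Any] = {}
    for _ in range(max_polls):
        status = await check_status(crawl_id)
        if "error" in status or status.get("status") in TERMINAL_STATES:
            return status
        await asyncio.sleep(interval_s)
    raise TimeoutError(f"crawl {crawl_id} still running after {max_polls} polls")


def make_fake_check() -> StatusFn:
    """Hypothetical stand-in for the real tool: reports progress, then completion."""
    calls = {"n": 0}

    async def fake_check(crawl_id: str) -> Dict[str, Any]:
        calls["n"] += 1
        done = calls["n"] >= 3
        return {
            "id": crawl_id,
            "status": "completed" if done else "scraping",
            "completed_urls": calls["n"] * 10,
            "total_urls": 30,
        }

    return fake_check


if __name__ == "__main__":
    result = asyncio.run(wait_for_crawl(make_fake_check(), "crawl-123", interval_s=0.0))
    print(result)
```

Bounding the loop with `max_polls` keeps a stuck or misreported job from polling forever; the interval and limit are arbitrary defaults chosen for illustration.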

Input Schema

Name     | Required | Description | Default
crawl_id | Yes      |             |
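Since `crawl_id` is the only argument, an MCP client invokes the tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object. The sketch below builds such a request, assuming the standard MCP `tools/call` shape; `build_tools_call` is a hypothetical helper, and the request `id` and crawl ID values are placeholders.

```python
import json
from typing import Any, Dict


def build_tools_call(crawl_id: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for check_crawlhtml_status."""
    # Client-side sanity check (an assumption): the schema only marks crawl_id
    # as required, and the handler types it as str.
    if not isinstance(crawl_id, str) or not crawl_id:
        raise ValueError("crawl_id must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "check_crawlhtml_status",
            "arguments": {"crawl_id": crawl_id},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_tools_call("placeholder-crawl-id"), indent=2))
```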

Implementation Reference

  • The primary handler function for the 'check_crawlhtml_status' MCP tool. It takes a crawl_id and delegates to the internal _check_job_status helper to query the Firecrawl API for the job status.
    async def check_crawlhtml_status(
        crawl_id: str,
    ) -> Dict[str, Any]:
        """Check the status of an existing Firecrawl HTML crawl job.

        Args:
            crawl_id: ID of the crawl job to check

        Returns:
            Dictionary containing the current status of the crawl job
        """
        return await _check_job_status(crawl_id, "crawlhtml")
  • Core helper function that performs the actual status check: it loads the Firecrawl configuration (returning early if the API key is missing), initializes a FirecrawlApp client, queries the API endpoint that matches job_type, and formats the response.
    async def _check_job_status(
        job_id: str,
        job_type: Firecrawl_JobType,
    ) -> Dict[str, Any]:
        """Generic function to check the status of a Firecrawl job.

        Args:
            job_id: ID of the job to check
            job_type: Type of job ('crawlhtml' or 'llmfulltxt')

        Returns:
            Dictionary containing the current status of the job
        """
        # Get configuration with API key
        config = _prepare_firecrawl_config()

        # Check if config contains an error
        if "error" in config:
            return {"error": config["error"]}

        try:
            # Initialize the Firecrawl client
            firecrawl = FirecrawlApp(api_key=config["api_key"])

            # Check status based on job type
            if job_type == "crawlhtml":
                result = firecrawl.check_crawl_status(job_id)

                # Return a more user-friendly response for crawl jobs
                status_info = {
                    "id": job_id,
                    "status": result.get("status", "unknown"),
                    "completed_urls": result.get("completed", 0),
                    "total_urls": result.get("total", 0),
                }
            elif job_type == "llmfulltxt":
                result = firecrawl.check_generate_llms_text_status(job_id)

                # Return a more user-friendly response for llmfull.txt jobs
                status_info = {
                    "id": job_id,
                    "status": result.get("status", "unknown"),
                }

                # Add llmfull.txt content if job is completed
                if result.get("status") == "completed" and "data" in result:
                    status_info["llmfulltxt"] = result["data"].get("llmsfulltxt", "")
            else:
                return {"error": f"Unknown job type: {job_type}"}

            return status_info

        except Exception as e:
            return {"error": f"Error checking {job_type} status: {str(e)}"}
  • Registers the check_crawlhtml_status tool by applying the mcp.tool() decorator inside the register_external_connectors function.
    mcp.tool()(check_crawlhtml_status)
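The crawl branch of _check_job_status reduces to mapping Firecrawl's raw status payload onto four keys and turning any exception into an {"error": ...} dictionary. The sketch below isolates that logic so it can run without an API key. `shape_crawl_status` and `FakeFirecrawl` are hypothetical names, and the raw `status`/`completed`/`total` fields are assumed to arrive as a plain dict, as the helper's `.get()` calls imply.

```python
from typing import Any, Dict, Protocol


class CrawlStatusClient(Protocol):
    """Anything exposing check_crawl_status(job_id) -> dict, like the client in the helper."""

    def check_crawl_status(self, job_id: str) -> Dict[str, Any]: ...


def shape_crawl_status(client: CrawlStatusClient, job_id: str) -> Dict[str, Any]:
    """Mirror the 'crawlhtml' branch of _check_job_status."""
    try:
        result = client.check_crawl_status(job_id)
        return {
            "id": job_id,
            # Missing fields fall back to the same defaults as the helper.
            "status": result.get("status", "unknown"),
            "completed_urls": result.get("completed", 0),
            "total_urls": result.get("total", 0),
        }
    except Exception as e:
        return {"error": f"Error checking crawlhtml status: {e}"}


class FakeFirecrawl:
    """Hypothetical in-memory stand-in for FirecrawlApp."""

    def __init__(self, jobs: Dict[str, Dict[str, Any]]) -> None:
        self.jobs = jobs

    def check_crawl_status(self, job_id: str) -> Dict[str, Any]:
        if job_id not in self.jobs:
            raise KeyError(job_id)
        return self.jobs[job_id]


if __name__ == "__main__":
    client = FakeFirecrawl({"job-1": {"status": "scraping", "completed": 4, "total": 10}})
    print(shape_crawl_status(client, "job-1"))
    print(shape_crawl_status(client, "missing"))
```

Passing the client in as a parameter, instead of constructing it inside the function as the helper does, is what lets a fake replace the real Firecrawl client in tests.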
