Glama

firecrawlsearchagent_firecrawl_extract_web_data

Extract specific structured data from web pages or entire domains using natural language instructions. Define your extraction needs with an 'extraction_prompt' to retrieve precise information from URLs or wildcard-specified domains.

Instructions

Extract structured data from one or multiple web pages using natural language instructions. This tool can process single URLs or entire domains (using wildcards like example.com/*). Use this when you need specific information from websites rather than general search results. You must specify what data to extract from the pages using the 'extraction_prompt' parameter.
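As a rough illustration, the arguments for this tool could be assembled as shown below. The two keys come from the input schema; the prompt text and URLs are made-up placeholders, and the sketch only builds and serializes the arguments without calling any service.

```python
import json

# Arguments for firecrawlsearchagent_firecrawl_extract_web_data.
# The parameter names come from the tool's input schema; the values are
# illustrative placeholders, not real targets.
arguments = {
    "extraction_prompt": "Extract each product's name and price.",
    "urls": [
        "https://example.com/products/widget",  # a single page
        "example.com/*",  # wildcard: the whole domain
    ],
}

# An MCP client would pass this dict as the tool arguments; it is
# serialized here only to show the JSON shape.
payload = json.dumps(arguments, indent=2)
print(payload)
```

A specific extraction_prompt (naming the fields wanted) is likely to matter more when a wildcard is used, since the same instruction is applied to every page that is crawled.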

Input Schema

Name | Required | Description | Default
extraction_prompt | Yes | Natural language description of what data to extract from the pages. | -
urls | Yes | List of URLs to extract data from. Can include wildcards (e.g., 'example.com/*') to crawl entire domains. | -
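A minimal sketch of a client-side check for these two parameters, run before calling the tool. The table only gives names and the required flag, so the types used here (a string for extraction_prompt, a list of strings for urls) and the non-empty rule are assumptions, and validate_arguments is a hypothetical helper rather than part of the server.

```python
from typing import Any, Dict, List

# Both parameters are marked as required in the input schema table.
REQUIRED_FIELDS = ("extraction_prompt", "urls")


def validate_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in arguments:
            problems.append(f"missing required field: {field}")

    # Assumed type: a non-empty string.
    prompt = arguments.get("extraction_prompt")
    if prompt is not None and not (isinstance(prompt, str) and prompt.strip()):
        problems.append("extraction_prompt must be a non-empty string")

    # Assumed type: a non-empty list of non-empty strings.
    urls = arguments.get("urls")
    if urls is not None:
        if not isinstance(urls, list) or not urls:
            problems.append("urls must be a non-empty list")
        elif not all(isinstance(u, str) and u for u in urls):
            problems.append("every entry in urls must be a non-empty string")
    return problems
```

For example, validate_arguments({"urls": ["example.com/*"]}) reports the missing extraction_prompt instead of sending an incomplete request.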

Implementation Reference

  • MCP call_tool handler that looks up the prefixed tool name in the registry to find the corresponding agent and tool, then proxies execution to the remote Mesh API and returns the result as text content.
    @app.call_tool()
    async def call_tool(name: str, arguments: dict) -> List[types.TextContent]:
        """Call the specified tool with the given arguments."""
        try:
            if name not in self.tool_registry:
                raise ValueError(f"Unknown tool: {name}")

            tool_info = self.tool_registry[name]
            result = await self.execute_tool(
                agent_id=tool_info["agent_id"],
                tool_name=tool_info["tool_name"],
                tool_arguments=arguments,
            )

            # Convert result to TextContent
            return [types.TextContent(type="text", text=str(result))]
        except Exception as e:
            logger.error(f"Error calling tool {name}: {e}")
            raise ValueError(f"Failed to call tool {name}: {str(e)}") from e

    return app
  • Dynamically registers tools from remote agent metadata into the local registry, constructing the tool ID as '{agent_id.lower()}_{tool_name}' which matches 'firecrawlsearchagent_firecrawl_extract_web_data' when agent_id='firecrawlsearchagent' and tool_name='firecrawl_extract_web_data'. Only registers if enabled in config.json.
    for tool in agent_data.get("tools", []):
        if tool.get("type") == "function":
            function_data = tool.get("function", {})
            tool_name = function_data.get("name")

            if not tool_name:
                continue

            # Check if this tool is enabled based on configuration
            if not self.is_tool_enabled(agent_id, tool_name):
                logger.debug(
                    f"Skipping tool {tool_name} for agent {agent_id} (not in config)"  # noqa: E501
                )
                tools_skipped += 1
                continue

            # Create a unique tool ID
            tool_id = f"{agent_id.lower()}_{tool_name}"

            # Get parameters or create default schema
            parameters = function_data.get("parameters", {})
            if not parameters:
                parameters = {
                    "type": "object",
                    "properties": {},
                    "required": [],
                }

            # Store tool info
            tool_registry[tool_id] = {
                "agent_id": agent_id,
                "tool_name": tool_name,
                "description": function_data.get("description", ""),
                "parameters": parameters,
            }
            tools_enabled += 1
            logger.debug(f"Enabled tool: {tool_id}")
  • Registers the tool schema (inputSchema) sourced from the remote agent's function parameters metadata.
    @app.list_tools()
    async def list_tools() -> List[types.Tool]:
        """List all available tools."""
        return [
            types.Tool(
                name=tool_id,
                description=tool_info["description"],
                inputSchema=tool_info["parameters"],
            )
            for tool_id, tool_info in self.tool_registry.items()
        ]
  • Helper function that sends the tool arguments to the remote Mesh API for execution on the specified agent.
    async def execute_tool(
        self, agent_id: str, tool_name: str, tool_arguments: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Execute a tool on a mesh agent.

        Args:
            agent_id: ID of the agent to execute the tool on
            tool_name: Name of the tool to execute
            tool_arguments: Arguments to pass to the tool

        Returns:
            Tool execution result

        Raises:
            ToolExecutionError: If there's an error executing the tool
        """
        request_data = {
            "agent_id": agent_id,
            "input": {"tool": tool_name, "tool_arguments": tool_arguments},
        }

        # Add API key if available
        if Config.HEURIST_API_KEY:
            request_data["api_key"] = Config.HEURIST_API_KEY

        try:
            result = await call_mesh_api(
                "mesh_request", method="POST", json=request_data
            )
            return result.get("data", result)  # Prefer the 'data' field if it exists
        except MeshApiError as e:
            # Re-raise API errors with clearer context
            raise ToolExecutionError(str(e)) from e
        except Exception as e:
            logger.error(f"Error calling {agent_id} tool {tool_name}: {e}")
            raise ToolExecutionError(
                f"Failed to call {agent_id} tool {tool_name}: {str(e)}"
            ) from e
  • Low-level HTTP client helper for calling the Mesh API endpoints.
    async def call_mesh_api(
        path: str, method: str = "GET", json: Dict[str, Any] = None
    ) -> Dict[str, Any]:
        """Helper function to call the mesh API endpoint.

        Args:
            path: API path to call
            method: HTTP method to use
            json: Optional JSON payload

        Returns:
            API response as dictionary

        Raises:
            MeshApiError: If there's an error calling the API
        """
        async with aiohttp.ClientSession() as session:
            url = f"{Config.HEURIST_API_ENDPOINT}/{path}"
            try:
                headers = {}
                if Config.HEURIST_API_KEY:
                    headers["X-HEURIST-API-KEY"] = Config.HEURIST_API_KEY

                async with session.request(
                    method, url, json=json, headers=headers
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        raise MeshApiError(f"Mesh API error: {error_text}")
                    return await response.json()
            except aiohttp.ClientError as e:
                logger.error(f"Error calling mesh API: {e}")
                raise MeshApiError(f"Failed to connect to mesh API: {str(e)}") from e
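The pieces above can be traced without any network access. The following is a minimal sketch that mirrors three steps from the snippets: building the registry key, building the request body posted to 'mesh_request', and preferring the 'data' field of the response. The function names build_tool_id, build_mesh_request, and unwrap_result are hypothetical and exist only for this illustration, and the mixed-case agent ID is an assumed input, since the listing shows only the lowercased form.

```python
from typing import Any, Dict, Optional


def build_tool_id(agent_id: str, tool_name: str) -> str:
    """Mirror the registry key format: '{agent_id.lower()}_{tool_name}'."""
    return f"{agent_id.lower()}_{tool_name}"


def build_mesh_request(
    agent_id: str,
    tool_name: str,
    tool_arguments: Dict[str, Any],
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body in the shape that execute_tool sends."""
    request_data: Dict[str, Any] = {
        "agent_id": agent_id,
        "input": {"tool": tool_name, "tool_arguments": tool_arguments},
    }
    if api_key:  # the key is only included when one is configured
        request_data["api_key"] = api_key
    return request_data


def unwrap_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer the 'data' field when present, as execute_tool does."""
    return response.get("data", response)


# "FirecrawlSearchAgent" is an assumed casing for the agent ID.
tool_id = build_tool_id("FirecrawlSearchAgent", "firecrawl_extract_web_data")
request = build_mesh_request(
    "FirecrawlSearchAgent",
    "firecrawl_extract_web_data",
    {"extraction_prompt": "Extract page titles.", "urls": ["example.com/*"]},
)
print(tool_id)
print(request)
```

Note that the registry key lowercases only the agent ID, while the request body carries the agent ID as stored in the registry, so the two forms can differ in casing.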

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/heurist-network/heurist-mesh-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.