one_extract
Extract structured data from web pages using an LLM. Define inputs via URLs, prompts, and a JSON schema. Works with cloud AI or a self-hosted LLM for customizable and precise web content extraction.
Instructions
Extract structured information from web pages using LLM. Supports both cloud AI and self-hosted LLM extraction.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| allowExternalLinks | No | Allow extraction from external links | |
| enableWebSearch | No | Enable web search for additional context | |
| includeSubdomains | No | Include subdomains in extraction | |
| prompt | No | Prompt for the LLM extraction | |
| schema | No | JSON schema for structured data extraction | |
| systemPrompt | No | System prompt for LLM extraction | |
| urls | Yes | List of URLs to extract information from | |
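
The parameters in the table above can be sketched as a TypeScript type with a small runtime check. This is only an illustration under stated assumptions: `OneExtractArgs`, `isOneExtractArgs`, and `exampleArgs` are hypothetical names that do not appear in the source, and the URL, prompt, and schema values are placeholders.

```typescript
// Hypothetical type mirroring the documented input schema (not from the source).
interface OneExtractArgs {
  urls: string[]; // the only required field
  prompt?: string;
  systemPrompt?: string;
  schema?: Record<string, unknown>;
  allowExternalLinks?: boolean;
  enableWebSearch?: boolean;
  includeSubdomains?: boolean;
}

// Runtime check that a value has the shape described by the table.
function isOneExtractArgs(value: unknown): value is OneExtractArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;

  // urls is required and must be an array of strings.
  if (!Array.isArray(v.urls) || !v.urls.every((u) => typeof u === "string")) {
    return false;
  }
  // Optional string fields.
  for (const key of ["prompt", "systemPrompt"]) {
    if (v[key] !== undefined && typeof v[key] !== "string") return false;
  }
  // Optional schema must be a plain object.
  if (
    v.schema !== undefined &&
    (typeof v.schema !== "object" || v.schema === null || Array.isArray(v.schema))
  ) {
    return false;
  }
  // Optional boolean flags.
  for (const key of ["allowExternalLinks", "enableWebSearch", "includeSubdomains"]) {
    if (v[key] !== undefined && typeof v[key] !== "boolean") return false;
  }
  return true;
}

// Placeholder arguments; the URL and schema are illustrative only.
const exampleArgs = {
  urls: ["https://example.com/pricing"],
  prompt: "Extract the plan names and monthly prices.",
  schema: {
    type: "object",
    properties: {
      plans: { type: "array", items: { type: "object" } },
    },
  },
  includeSubdomains: false,
};
```

Only `urls` is required, so `{ urls: ["https://example.com"] }` on its own would also pass the check.
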
Implementation Reference
- src/tools.ts:255-295 (schema) Defines the Tool object for 'one_extract', including name, description, and detailed inputSchema for extraction parameters.

      export const EXTRACT_TOOL: Tool = {
        name: 'one_extract',
        description:
          'Extract structured information from web pages using LLM. ' +
          'Supports both cloud AI and self-hosted LLM extraction.',
        inputSchema: {
          type: 'object',
          properties: {
            urls: {
              type: 'array',
              items: { type: 'string' },
              description: 'List of URLs to extract information from',
            },
            prompt: {
              type: 'string',
              description: 'Prompt for the LLM extraction',
            },
            systemPrompt: {
              type: 'string',
              description: 'System prompt for LLM extraction',
            },
            schema: {
              type: 'object',
              description: 'JSON schema for structured data extraction',
            },
            allowExternalLinks: {
              type: 'boolean',
              description: 'Allow extraction from external links',
            },
            enableWebSearch: {
              type: 'boolean',
              description: 'Enable web search for additional context',
            },
            includeSubdomains: {
              type: 'boolean',
              description: 'Include subdomains in extraction',
            },
          },
          required: ['urls'],
        },
      };

- src/index.ts:66-73 (registration) Registers the 'one_extract' tool by including EXTRACT_TOOL in the array of tools advertised via the ListTools handler. Note: No execution handler (switch case) is present for this tool.

      server.setRequestHandler(ListToolsRequestSchema, async () => ({
        tools: [
          SEARCH_TOOL,
          EXTRACT_TOOL,
          SCRAPE_TOOL,
          MAP_TOOL,
        ],
      }));
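
Since the registration note says no execution handler exists for 'one_extract', the following is a rough sketch of what a dispatch case for it might look like. It is not the project's code: `handleToolCall`, `ToolCallResult`, `ExtractFn`, and the injected `extract` callback are all hypothetical, the sketch deliberately avoids importing any SDK, and the result shape (a `content` array of text items plus `isError`) is assumed here rather than taken from the source.

```typescript
// Assumed result shape for a tool call (hypothetical type for this sketch).
type ToolCallResult = {
  content: { type: "text"; text: string }[];
  isError: boolean;
};

// Hypothetical extraction backend, injected so the sketch stays self-contained.
type ExtractInput = { urls: string[] } & Record<string, unknown>;
type ExtractFn = (args: ExtractInput) => Promise<unknown>;

async function handleToolCall(
  name: string,
  args: unknown,
  extract: ExtractFn,
): Promise<ToolCallResult> {
  switch (name) {
    case "one_extract": {
      const candidate = args as { urls?: unknown } | null | undefined;
      // 'urls' is the only required field in the input schema.
      if (
        !candidate ||
        !Array.isArray(candidate.urls) ||
        !candidate.urls.every((u) => typeof u === "string")
      ) {
        return {
          content: [{ type: "text", text: "Invalid arguments: 'urls' must be an array of strings" }],
          isError: true,
        };
      }
      const data = await extract(args as ExtractInput);
      return {
        content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
        isError: false,
      };
    }
    default:
      // Any tool name without a case is reported as an error.
      return {
        content: [{ type: "text", text: `Unknown tool: ${name}` }],
        isError: true,
      };
  }
}
```

Passing the backend in as a callback keeps the dispatch logic testable without committing to a particular cloud or self-hosted LLM client.
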