extract_structured_content
Extract structured data from web pages using specified CSS selectors, enabling precise content retrieval for processing in LLMs or other applications.
Instructions
Extracts structured content from a web page using CSS selectors
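For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for this tool. `ToolCallRequest` and `buildExtractRequest` are hypothetical helper names, not part of the server, and the URL and selector are placeholders.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
// The interface name is an assumption made for this sketch.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request a client would send for this tool.
function buildExtractRequest(
  id: number,
  url: string,
  selectors: Record<string, string>,
): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'extract_structured_content',
      arguments: { url, selectors },
    },
  };
}

// Placeholder values; each key labels one section of the result.
const extractRequest = buildExtractRequest(1, 'https://example.com', { title: 'h1' });
```

In practice an MCP client library would normally construct this envelope itself; the sketch only shows where `url` and `selectors` end up.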
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| selectors | Yes | Object mapping output labels to CSS selector strings | |
| url | Yes | URL to extract content from | |
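As an illustration of the inputs in the table above, the arguments could be typed and checked as follows. `ExtractArgs`, `isExtractArgs`, and `exampleArgs` are hypothetical names, not part of the server. The guard is deliberately a little stricter than the published schema: it also rejects `null`, arrays, and non-string selector values.

```typescript
// Hypothetical type for the tool arguments; the name is not from the source.
interface ExtractArgs {
  url: string;
  // Keys are labels for the output; values are CSS selectors.
  selectors: Record<string, string>;
}

// Type guard mirroring the input schema: both fields are required and
// `selectors` must be a plain object whose values are all strings.
function isExtractArgs(value: unknown): value is ExtractArgs {
  if (typeof value !== 'object' || value === null) return false;
  const { url, selectors } = value as Record<string, unknown>;
  if (typeof url !== 'string') return false;
  if (typeof selectors !== 'object' || selectors === null || Array.isArray(selectors)) {
    return false;
  }
  return Object.values(selectors).every((s) => typeof s === 'string');
}

// Placeholder arguments: two labelled selectors for one page.
const exampleArgs = {
  url: 'https://example.com/article',
  selectors: { title: 'h1', summary: 'article p:first-of-type' },
};
```

A check like this could run before the tool is called, so malformed arguments are caught on the client side rather than surfacing as an `InvalidParams` error.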
Implementation Reference
- src/server.ts:327-383 (handler): The handler function for the extract_structured_content tool. Validates input arguments (url and selectors object), simulates content extraction using mock data for each selector, formats the results as Markdown sections, and returns a structured text content response. Includes error handling.

  ```typescript
  /**
   * Handle the extract_structured_content tool
   */
  private async handleExtractStructuredContent(args: any) {
    // Validate arguments
    if (
      typeof args !== 'object' ||
      args === null ||
      typeof args.url !== 'string' ||
      typeof args.selectors !== 'object'
    ) {
      throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content');
    }

    const { url, selectors } = args;

    try {
      // In a real implementation, you would:
      // 1. Use Cloudflare Browser Rendering to fetch the page
      // 2. Use the /scrape endpoint to extract content based on selectors

      // For this simulation, we'll return mock results
      const mockResults: Record<string, string> = {};

      for (const [key, selector] of Object.entries(selectors)) {
        if (typeof selector === 'string') {
          // Simulate extraction based on selector
          mockResults[key] = `Extracted content for selector "${selector}"`;
        }
      }

      // Format the results
      const formattedResults = Object.entries(mockResults)
        .map(([key, value]) => `## ${key}\n${value}`)
        .join('\n\n');

      return {
        content: [
          {
            type: 'text',
            text: `# Structured Content from ${url}\n\n${formattedResults}`,
          },
        ],
      };
    } catch (error) {
      console.error('[Error] Error extracting structured content:', error);
      return {
        content: [
          {
            type: 'text',
            text: `Error extracting structured content: ${error instanceof Error ? error.message : String(error)}`,
          },
        ],
        isError: true,
      };
    }
  }
  ```
- src/server.ts:108-128 (schema): Tool schema definition returned by listTools, specifying name, description, and an inputSchema that requires a 'url' string and a 'selectors' object (arbitrary keys, each with a CSS selector string as its value).

  ```typescript
  {
    name: 'extract_structured_content',
    description: 'Extracts structured content from a web page using CSS selectors',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to extract content from',
        },
        selectors: {
          type: 'object',
          description: 'CSS selectors to extract content',
          additionalProperties: {
            type: 'string',
          },
        },
      },
      required: ['url', 'selectors'],
    },
  },
  ```
- src/server.ts:189-191 (registration): Registration in the CallToolRequest handler switch statement, logging the call and dispatching to the specific handleExtractStructuredContent method.

  ```typescript
  case 'extract_structured_content':
    console.error(`[API] Extracting structured content from: ${args?.url}`);
    return await this.handleExtractStructuredContent(args);
  ```
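The handler listed above (src/server.ts:327-383) does not fetch the page; it returns placeholder text in a fixed Markdown layout. To make that layout concrete, here is a standalone sketch of the same two steps outside the server class. The function names `mockExtract` and `formatStructuredContent` are hypothetical, and the URL is a placeholder.

```typescript
// Same placeholder text the mock handler produces for each string selector.
// Non-string values are skipped, as in the handler.
function mockExtract(selectors: Record<string, unknown>): Record<string, string> {
  const results: Record<string, string> = {};
  for (const [key, selector] of Object.entries(selectors)) {
    if (typeof selector === 'string') {
      results[key] = `Extracted content for selector "${selector}"`;
    }
  }
  return results;
}

// Render label/value pairs as Markdown: an H1 line with the URL,
// then one H2 section per selector key, separated by blank lines.
function formatStructuredContent(url: string, results: Record<string, string>): string {
  const sections = Object.entries(results)
    .map(([key, value]) => `## ${key}\n${value}`)
    .join('\n\n');
  return `# Structured Content from ${url}\n\n${sections}`;
}

const rendered = formatStructuredContent(
  'https://example.com',
  mockExtract({ title: 'h1', ignored: 42 }),
);
```

The `ignored` entry is dropped because its value is not a string, so `rendered` holds the H1 line for the URL followed by a single `## title` section containing `Extracted content for selector "h1"`.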