extract_structured_content
Extract structured data from web pages using CSS selectors to efficiently gather specific content for analysis or integration into LLM workflows.
Instructions
Extracts structured content from a web page using CSS selectors
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| selectors | Yes | CSS selectors to extract content | |
| url | Yes | URL to extract content from | |
Input Schema (JSON Schema)
{
  "properties": {
    "selectors": {
      "additionalProperties": {
        "type": "string"
      },
      "description": "CSS selectors to extract content",
      "type": "object"
    },
    "url": {
      "description": "URL to extract content from",
      "type": "string"
    }
  },
  "required": [
    "url",
    "selectors"
  ],
  "type": "object"
}
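To make the schema concrete, here is a minimal sketch of an arguments object that satisfies it, together with a type guard that checks the same constraints at runtime. The names `ExtractStructuredContentArgs`, `isExtractArgs`, and `exampleArgs`, the example URL, and the selector keys are illustrative assumptions and do not come from the server source. The guard is also deliberately stricter than the handler's own check: it rejects a `null` or array `selectors` value and any non-string selector.

```typescript
// Hypothetical type mirroring the JSON Schema above (not taken from the server source).
interface ExtractStructuredContentArgs {
  url: string;
  selectors: Record<string, string>;
}

// Runtime check for the same constraints: both fields present,
// `url` a string, `selectors` a plain object whose values are all strings.
function isExtractArgs(value: unknown): value is ExtractStructuredContentArgs {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  if (typeof candidate.url !== 'string') return false;

  const selectors = candidate.selectors;
  if (typeof selectors !== 'object' || selectors === null || Array.isArray(selectors)) {
    return false;
  }
  return Object.values(selectors).every((s) => typeof s === 'string');
}

// Example arguments: each key names a result field, each value is a CSS selector.
const exampleArgs = {
  url: 'https://example.com/article',
  selectors: {
    title: 'h1',
    summary: 'article p:first-of-type',
  },
};
```

Each key in `selectors` becomes a label in the tool's output, so descriptive keys such as `title` or `summary` are more useful than the raw selectors themselves.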
Implementation Reference
- src/server.ts:295-347 (handler) The primary handler function for the 'extract_structured_content' tool. It validates the input arguments (url and selectors), simulates extraction using mock data based on the CSS selectors, formats the results as markdown, and returns either the structured content or an error response.

      private async handleExtractStructuredContent(args: any) {
        // Validate arguments
        if (
          typeof args !== 'object' ||
          args === null ||
          typeof args.url !== 'string' ||
          typeof args.selectors !== 'object'
        ) {
          throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content');
        }

        const { url, selectors } = args;

        try {
          // In a real implementation, you would:
          // 1. Use Cloudflare Browser Rendering to fetch the page
          // 2. Use the /scrape endpoint to extract content based on selectors

          // For this simulation, we'll return mock results
          const mockResults: Record<string, string> = {};

          for (const [key, selector] of Object.entries(selectors)) {
            if (typeof selector === 'string') {
              // Simulate extraction based on selector
              mockResults[key] = `Extracted content for selector "${selector}"`;
            }
          }

          // Format the results
          const formattedResults = Object.entries(mockResults)
            .map(([key, value]) => `## ${key}\n${value}`)
            .join('\n\n');

          return {
            content: [
              {
                type: 'text',
                text: `# Structured Content from ${url}\n\n${formattedResults}`,
              },
            ],
          };
        } catch (error) {
          console.error('Error extracting structured content:', error);
          return {
            content: [
              {
                type: 'text',
                text: `Error extracting structured content: ${error instanceof Error ? error.message : String(error)}`,
              },
            ],
            isError: true,
          };
        }
      }

- src/server.ts:101-117 (schema) Input schema defining the expected parameters for the tool: 'url' (string, required) and 'selectors' (object with CSS selector strings, required).

      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL to extract content from',
          },
          selectors: {
            type: 'object',
            description: 'CSS selectors to extract content',
            additionalProperties: {
              type: 'string',
            },
          },
        },
        required: ['url', 'selectors'],
      },

- src/server.ts:98-118 (registration) Tool registration in the ListTools response, including name, description, and input schema.

      {
        name: 'extract_structured_content',
        description: 'Extracts structured content from a web page using CSS selectors',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL to extract content from',
            },
            selectors: {
              type: 'object',
              description: 'CSS selectors to extract content',
              additionalProperties: {
                type: 'string',
              },
            },
          },
          required: ['url', 'selectors'],
        },
      },

- src/server.ts:150-151 (registration) Dispatch case in the CallToolRequest handler that routes to the specific tool handler.

      case 'extract_structured_content':
        return await this.handleExtractStructuredContent(args);
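The handler's two core steps, simulated extraction and markdown formatting, can be isolated as pure functions to show what the tool currently returns. This is a sketch that mirrors the logic in the handler listed above. The function names `simulateExtraction` and `formatAsMarkdown` and the example URL and selectors are hypothetical and do not appear in the server source. As in the handler, no page is actually fetched.

```typescript
// Mirrors the handler's mock step: every string selector yields a placeholder
// result, and non-string values are silently skipped.
function simulateExtraction(selectors: Record<string, unknown>): Record<string, string> {
  const results: Record<string, string> = {};
  for (const [key, selector] of Object.entries(selectors)) {
    if (typeof selector === 'string') {
      results[key] = `Extracted content for selector "${selector}"`;
    }
  }
  return results;
}

// Mirrors the handler's formatting step: one "## key" section per result,
// under a top-level heading that names the source URL.
function formatAsMarkdown(url: string, results: Record<string, string>): string {
  const sections = Object.entries(results)
    .map(([key, value]) => `## ${key}\n${value}`)
    .join('\n\n');
  return `# Structured Content from ${url}\n\n${sections}`;
}

// Example call with one invalid (non-string) selector, which is dropped.
const report = formatAsMarkdown(
  'https://example.com/product',
  simulateExtraction({ title: 'h1', price: '.price', bad: 42 }),
);
console.log(report);
```

For the call above, the text printed is:

    # Structured Content from https://example.com/product

    ## title
    Extracted content for selector "h1"

    ## price
    Extracted content for selector ".price"

Because the results are placeholders, the output reflects only the selector strings that were passed in, not the content of the page at the given URL.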