Skip to main content
Glama

extract_structured_content

Extract structured data from web pages using CSS selectors to efficiently gather specific content for analysis or integration into LLM workflows.

Instructions

Extracts structured content from a web page using CSS selectors

Input Schema

NameRequiredDescriptionDefault
selectorsYesCSS selectors to extract content
urlYesURL to extract content from

Input Schema (JSON Schema)

{ "properties": { "selectors": { "additionalProperties": { "type": "string" }, "description": "CSS selectors to extract content", "type": "object" }, "url": { "description": "URL to extract content from", "type": "string" } }, "required": [ "url", "selectors" ], "type": "object" }

Implementation Reference

  • The primary handler function for the 'extract_structured_content' tool. Validates input arguments (url and selectors), simulates extraction using mock data based on CSS selectors, formats the results as markdown, and returns structured content or error response.
    private async handleExtractStructuredContent(args: any) { // Validate arguments if ( typeof args !== 'object' || args === null || typeof args.url !== 'string' || typeof args.selectors !== 'object' ) { throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content'); } const { url, selectors } = args; try { // In a real implementation, you would: // 1. Use Cloudflare Browser Rendering to fetch the page // 2. Use the /scrape endpoint to extract content based on selectors // For this simulation, we'll return mock results const mockResults: Record<string, string> = {}; for (const [key, selector] of Object.entries(selectors)) { if (typeof selector === 'string') { // Simulate extraction based on selector mockResults[key] = `Extracted content for selector "${selector}"`; } } // Format the results const formattedResults = Object.entries(mockResults) .map(([key, value]) => `## ${key}\n${value}`) .join('\n\n'); return { content: [ { type: 'text', text: `# Structured Content from ${url}\n\n${formattedResults}`, }, ], }; } catch (error) { console.error('Error extracting structured content:', error); return { content: [ { type: 'text', text: `Error extracting structured content: ${error instanceof Error ? error.message : String(error)}`, }, ], isError: true, }; }
  • Input schema defining the expected parameters for the tool: 'url' (string, required) and 'selectors' (object with CSS selector strings, required).
    inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'URL to extract content from', }, selectors: { type: 'object', description: 'CSS selectors to extract content', additionalProperties: { type: 'string', }, }, }, required: ['url', 'selectors'], },
  • src/server.ts:98-118 (registration)
    Tool registration in the ListTools response, including name, description, and input schema.
    { name: 'extract_structured_content', description: 'Extracts structured content from a web page using CSS selectors', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'URL to extract content from', }, selectors: { type: 'object', description: 'CSS selectors to extract content', additionalProperties: { type: 'string', }, }, }, required: ['url', 'selectors'], }, },
  • src/server.ts:150-151 (registration)
    Dispatch case in the CallToolRequest handler that routes to the specific tool handler.
    case 'extract_structured_content': return await this.handleExtractStructuredContent(args);

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/amotivv/cloudflare-browser-rendering'

If you have feedback or need assistance with the MCP directory API, please join our Discord server