Glama
amotivv

Web Content MCP Server

extract_structured_content

Extract structured data from web pages using CSS selectors to efficiently gather specific content for analysis or integration into LLM workflows.

Instructions

Extracts structured content from a web page using CSS selectors

Input Schema

Name       Required  Description                       Default
selectors  Yes       CSS selectors to extract content
url        Yes       URL to extract content from
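
As an illustration of the schema above, the arguments could be modelled and checked roughly as follows. This is a minimal sketch: the interface name, the `isExtractArgs` guard, and the example URL and selector keys are hypothetical and do not come from the server's source.

```typescript
// Argument shape implied by the input schema: a URL plus a map of
// caller-chosen keys to CSS selector strings.
interface ExtractStructuredContentArgs {
  url: string;
  selectors: Record<string, string>;
}

// Hypothetical type guard mirroring the schema: both fields are required
// and every selector value must be a string.
function isExtractArgs(value: unknown): value is ExtractStructuredContentArgs {
  if (typeof value !== 'object' || value === null) return false;
  const { url, selectors } = value as Record<string, unknown>;
  if (typeof url !== 'string') return false;
  if (typeof selectors !== 'object' || selectors === null || Array.isArray(selectors)) {
    return false;
  }
  return Object.values(selectors).every((s) => typeof s === 'string');
}

// Placeholder arguments; the URL and keys are made up for illustration.
const exampleArgs = {
  url: 'https://example.com/article',
  selectors: { title: 'h1', summary: 'article p:first-of-type' },
};

console.log(isExtractArgs(exampleArgs));
```

Each key in `selectors` is chosen by the caller and becomes the heading under which that selector's result is reported.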

Implementation Reference

  • The primary handler function for the 'extract_structured_content' tool. It validates the input arguments (url and selectors) and throws an McpError if they are invalid. It then simulates extraction by generating a mock result for each CSS selector, formats the results as Markdown, and returns either the structured content or an error response.
    private async handleExtractStructuredContent(args: any) {
      // Validate arguments
      if (
        typeof args !== 'object' ||
        args === null ||
        typeof args.url !== 'string' ||
        typeof args.selectors !== 'object' ||
        // typeof null is 'object', so reject a null selectors value explicitly
        args.selectors === null
      ) {
        throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content');
      }
    
      const { url, selectors } = args;
    
      try {
        // In a real implementation, you would:
        // 1. Use Cloudflare Browser Rendering to fetch the page
        // 2. Use the /scrape endpoint to extract content based on selectors
        
        // For this simulation, we'll return mock results
        const mockResults: Record<string, string> = {};
        
        for (const [key, selector] of Object.entries(selectors)) {
          if (typeof selector === 'string') {
            // Simulate extraction based on selector
            mockResults[key] = `Extracted content for selector "${selector}"`;
          }
        }
        
        // Format the results
        const formattedResults = Object.entries(mockResults)
          .map(([key, value]) => `## ${key}\n${value}`)
          .join('\n\n');
        
        return {
          content: [
            {
              type: 'text',
              text: `# Structured Content from ${url}\n\n${formattedResults}`,
            },
          ],
        };
      } catch (error) {
        console.error('Error extracting structured content:', error);
        return {
          content: [
            {
              type: 'text',
              text: `Error extracting structured content: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    }
  • Input schema defining the expected parameters for the tool: 'url' (string, required) and 'selectors' (object with CSS selector strings, required).
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to extract content from',
        },
        selectors: {
          type: 'object',
          description: 'CSS selectors to extract content',
          additionalProperties: {
            type: 'string',
          },
        },
      },
      required: ['url', 'selectors'],
    },
  • src/server.ts:98-118 (registration)
    Tool registration in the ListTools response, including name, description, and input schema.
    {
      name: 'extract_structured_content',
      description: 'Extracts structured content from a web page using CSS selectors',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL to extract content from',
          },
          selectors: {
            type: 'object',
            description: 'CSS selectors to extract content',
            additionalProperties: {
              type: 'string',
            },
          },
        },
        required: ['url', 'selectors'],
      },
    },
  • src/server.ts:150-151 (registration)
    Dispatch case in the CallToolRequest handler that routes to the specific tool handler.
    case 'extract_structured_content':
      return await this.handleExtractStructuredContent(args);
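
To make the flow in the listings above concrete, the following standalone sketch reproduces the mock extraction and Markdown formatting outside the server class, with a simplified dispatch step. The names `formatMockExtraction`, `dispatch`, and `ToolResult` are hypothetical, and the plain `Error` for unknown tools is a simplification rather than the server's actual behaviour.

```typescript
// Simplified result shape, modelled on the objects the handler returns.
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

// Mirrors the handler's simulation: one placeholder string per selector,
// rendered as a Markdown section under the caller's key.
function formatMockExtraction(url: string, selectors: Record<string, unknown>): string {
  const results: Record<string, string> = {};
  for (const [key, selector] of Object.entries(selectors)) {
    // Non-string selector values are skipped, as in the handler.
    if (typeof selector === 'string') {
      results[key] = `Extracted content for selector "${selector}"`;
    }
  }
  const sections = Object.entries(results)
    .map(([key, value]) => `## ${key}\n${value}`)
    .join('\n\n');
  return `# Structured Content from ${url}\n\n${sections}`;
}

// Hypothetical dispatch step: routes a tool name to its handler logic.
function dispatch(
  toolName: string,
  args: { url: string; selectors: Record<string, unknown> },
): ToolResult {
  switch (toolName) {
    case 'extract_structured_content':
      return {
        content: [{ type: 'text', text: formatMockExtraction(args.url, args.selectors) }],
      };
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}

const result = dispatch('extract_structured_content', {
  url: 'https://example.com',
  selectors: { title: 'h1', price: '.price' },
});
console.log(result.content[0].text);
```

With these placeholder arguments the text contains a top-level heading naming the URL, followed by one `## title` and one `## price` section, each holding the mock string for its selector.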

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/amotivv/cloudflare-browser-rendering'
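
The same request could be issued from TypeScript along these lines. This is a rough sketch that assumes a runtime with a global `fetch` (for example Node 18 or later) and a JSON response body. The helper names are hypothetical, and the owner/server URL pattern is inferred from the single curl example above rather than from documented API behaviour.

```typescript
// Builds the directory API URL for one server. The path pattern is an
// assumption generalised from the curl example.
function mcpServerUrl(owner: string, server: string): string {
  const base = 'https://glama.ai/api/mcp/v1/servers';
  return `${base}/${encodeURIComponent(owner)}/${encodeURIComponent(server)}`;
}

// Fetches the server record. The response schema is not described here,
// so the parsed body is returned as unknown.
async function fetchMcpServer(owner: string, server: string): Promise<unknown> {
  const response = await fetch(mcpServerUrl(owner, server));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(mcpServerUrl('amotivv', 'cloudflare-browser-rendering'));
```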

If you have feedback or need assistance with the MCP directory API, please join our Discord server.