Glama
amotivv

Web Content MCP Server

extract_structured_content

Extract structured data from web pages using CSS selectors to efficiently gather specific content for analysis or integration into LLM workflows.

Instructions

Extracts structured content from a web page using CSS selectors

Input Schema

Name       Required  Description                       Default
selectors  Yes       CSS selectors to extract content
url        Yes       URL to extract content from
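
As a rough illustration of the schema above, the arguments can be modeled as a TypeScript type with a runtime guard. The names `ExtractArgs` and `isExtractArgs`, the sample URL, and the selector keys are assumptions made for this sketch; none of them appear in the server's source.

```typescript
// Hypothetical type mirroring the input schema: a URL plus a map of
// caller-chosen labels to CSS selector strings.
interface ExtractArgs {
  url: string;
  selectors: Record<string, string>;
}

// Illustrative runtime guard that checks the shape the schema describes.
function isExtractArgs(value: unknown): value is ExtractArgs {
  if (typeof value !== 'object' || value === null) return false;
  const { url, selectors } = value as Record<string, unknown>;
  if (typeof url !== 'string') return false;
  if (typeof selectors !== 'object' || selectors === null || Array.isArray(selectors)) {
    return false;
  }
  // Every selector value must be a string (additionalProperties: string).
  return Object.values(selectors).every((s) => typeof s === 'string');
}

// Example arguments; the URL and selector names are made up.
const exampleArgs: ExtractArgs = {
  url: 'https://example.com/article',
  selectors: { title: 'h1', author: '.byline a', body: 'article p' },
};
```

Each key in `selectors` becomes a heading in the handler's markdown output, so descriptive keys make the result easier to read.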

Implementation Reference

  • The primary handler function for the 'extract_structured_content' tool. It validates the input arguments (url and selectors), simulates extraction by generating mock data for each CSS selector, formats the results as markdown, and returns either the structured content or an error response.
    private async handleExtractStructuredContent(args: any) {
      // Validate arguments
      if (
        typeof args !== 'object' || 
        args === null || 
        typeof args.url !== 'string' ||
        typeof args.selectors !== 'object' ||
        args.selectors === null // typeof null is 'object', so reject it explicitly
      ) {
        throw new McpError(ErrorCode.InvalidParams, 'Invalid arguments for extract_structured_content');
      }
    
      const { url, selectors } = args;
    
      try {
        // In a real implementation, you would:
        // 1. Use Cloudflare Browser Rendering to fetch the page
        // 2. Use the /scrape endpoint to extract content based on selectors
        
        // For this simulation, we'll return mock results
        const mockResults: Record<string, string> = {};
        
        for (const [key, selector] of Object.entries(selectors)) {
          if (typeof selector === 'string') {
            // Simulate extraction based on selector
            mockResults[key] = `Extracted content for selector "${selector}"`;
          }
        }
        
        // Format the results
        const formattedResults = Object.entries(mockResults)
          .map(([key, value]) => `## ${key}\n${value}`)
          .join('\n\n');
        
        return {
          content: [
            {
              type: 'text',
              text: `# Structured Content from ${url}\n\n${formattedResults}`,
            },
          ],
        };
      } catch (error) {
        console.error('Error extracting structured content:', error);
        return {
          content: [
            {
              type: 'text',
              text: `Error extracting structured content: ${error instanceof Error ? error.message : String(error)}`,
            },
          ],
          isError: true,
        };
      }
    }
  • Input schema defining the expected parameters for the tool: 'url' (string, required) and 'selectors' (object with CSS selector strings, required).
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to extract content from',
        },
        selectors: {
          type: 'object',
          description: 'CSS selectors to extract content',
          additionalProperties: {
            type: 'string',
          },
        },
      },
      required: ['url', 'selectors'],
    },
  • src/server.ts:98-118 (registration)
    Tool registration in the ListTools response, including name, description, and input schema.
    {
      name: 'extract_structured_content',
      description: 'Extracts structured content from a web page using CSS selectors',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL to extract content from',
          },
          selectors: {
            type: 'object',
            description: 'CSS selectors to extract content',
            additionalProperties: {
              type: 'string',
            },
          },
        },
        required: ['url', 'selectors'],
      },
    },
  • src/server.ts:150-151 (registration)
    Dispatch case in the CallToolRequest handler that routes to the specific tool handler.
    case 'extract_structured_content':
      return await this.handleExtractStructuredContent(args);
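
The handler above returns mock data, and its comments point to Cloudflare Browser Rendering's /scrape endpoint as the real backend. The sketch below shows how the tool's `selectors` map might be translated into such a request and how the response could be mapped back onto the caller's keys. The request and response shapes (`elements`, `result`, `results`, `text`) are assumptions about that endpoint and should be checked against Cloudflare's documentation. The helper names are hypothetical, and the HTTP call itself (account ID, API token) is deliberately left out.

```typescript
// Assumed request body for /scrape: one entry per CSS selector.
interface ScrapeRequest {
  url: string;
  elements: { selector: string }[];
}

// Assumed response shape: one result group per requested selector.
interface ScrapeResponse {
  success: boolean;
  result: { selector: string; results: { text: string }[] }[];
}

// Build the request body from the tool's `url` and `selectors` arguments,
// sending each distinct selector only once.
function buildScrapeRequest(url: string, selectors: Record<string, string>): ScrapeRequest {
  const unique = Array.from(new Set(Object.values(selectors)));
  return { url, elements: unique.map((selector) => ({ selector })) };
}

// Map the response back onto the caller's keys, joining the text of all
// matched elements. Selectors with no matches yield an empty string.
function mapScrapeResponse(
  selectors: Record<string, string>,
  response: ScrapeResponse,
): Record<string, string> {
  const bySelector = new Map<string, string>();
  for (const group of response.result) {
    bySelector.set(group.selector, group.results.map((r) => r.text.trim()).join('\n'));
  }
  const out: Record<string, string> = {};
  for (const [key, selector] of Object.entries(selectors)) {
    out[key] = bySelector.get(selector) ?? '';
  }
  return out;
}
```

Keeping the request building and response mapping as pure functions means the network call can be swapped or mocked without touching the formatting logic in the handler.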
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does but lacks details on traits like error handling (e.g., invalid URLs or selectors), performance (e.g., timeouts or rate limits), output format (e.g., JSON structure), or side effects (e.g., whether it makes network requests). This leaves significant gaps for an agent to understand how the tool behaves beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
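
One way to close part of this gap is to attach MCP tool annotations to the registration. The sketch below declares a local subset of the annotation hints and picks values for this tool. The chosen values are assumptions: the simulated handler shown earlier makes no network requests, while a real implementation would presumably fetch external pages.

```typescript
// Local subset of the MCP tool annotation hints, declared here so the
// sketch stays self-contained.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Assumed values for extract_structured_content.
const extractAnnotations: ToolAnnotations = {
  title: 'Extract structured content',
  readOnlyHint: true,  // assumption: the tool only reads pages
  openWorldHint: true, // assumption: a real backend contacts arbitrary external URLs
};
```

Annotations are hints and do not replace a description of error handling, limits, or output format.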

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Extracts structured content from a web page') and specifies the method ('using CSS selectors'). There is no wasted verbiage or redundant information, making it highly concise and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving web scraping with CSS selectors), lack of annotations, and no output schema, the description is incomplete. It doesn't cover behavioral aspects like error handling, output structure, or limitations (e.g., JavaScript-rendered content). While the schema documents parameters well, the overall context for safe and effective use is insufficient, especially for a tool that interacts with external resources.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
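
The output structure, which the description does not document, can be read off the handler shown in the Implementation Reference. The sketch below reproduces only its formatting step as a standalone function. The name `formatStructuredContent` is hypothetical; the markdown layout follows the handler's template strings.

```typescript
// Reproduces the handler's formatting: a top-level heading with the URL,
// then one "## key" section per selector key.
function formatStructuredContent(url: string, results: Record<string, string>) {
  const body = Object.entries(results)
    .map(([key, value]) => `## ${key}\n${value}`)
    .join('\n\n');
  return {
    content: [
      { type: 'text' as const, text: `# Structured Content from ${url}\n\n${body}` },
    ],
  };
}

// With the simulated extraction, each value is a placeholder string.
const sample = formatStructuredContent('https://example.com', {
  title: 'Extracted content for selector "h1"',
});
console.log(sample.content[0].text);
```

The result is a single markdown text item, not a JSON object keyed by selector, so a caller that needs structured values has to parse the headings.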

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear descriptions for both parameters ('url' and 'selectors'). The description adds minimal value beyond the schema by mentioning 'CSS selectors', which aligns with the schema's description for 'selectors'. It doesn't provide additional context like selector syntax examples or URL validation rules. Given the high schema coverage, a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('extracts') and target ('structured content from a web page'), and specifies the method ('using CSS selectors'). It distinguishes from siblings like 'fetch_page' (which likely retrieves raw HTML) and 'summarize_content' (which processes content). However, it doesn't explicitly contrast with 'search_documentation', leaving some ambiguity about when to choose one over the other.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a valid URL), exclusions (e.g., not for non-web content), or comparisons with sibling tools like 'fetch_page' for raw HTML or 'search_documentation' for query-based extraction. Usage is implied only by the tool's name and description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/amotivv/cloudflare-browser-rendering'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.