yokingma

OneSearch MCP Server

one_scrape

Extract content from webpages with customizable options including markdown, HTML, screenshots, and structured data extraction. Supports dynamic content handling through pre-scrape actions like clicking, scrolling, or JavaScript execution.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to scrape | |
| formats | No | Content formats to extract | ['markdown'] |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| includeTags | No | HTML tags to specifically include in extraction | |
| excludeTags | No | HTML tags to exclude from extraction | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| actions | No | List of actions to perform before scraping | |
| extract | No | Configuration for structured data extraction | |
| mobile | No | Use mobile viewport | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| location | No | Location settings for scraping | |
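
As a rough illustration of how the parameters in the table combine, the sketch below builds one set of arguments for a `one_scrape` call. The `OneScrapeArgs` interface and the `buildScrapeArgs` helper are hypothetical names introduced here and are not part of the server; the field names and `formats` values come from the input schema, and the URL is a placeholder.

```typescript
// Hypothetical types mirroring the documented input schema (not exported by the server).
type ScrapeFormat =
  | 'markdown' | 'html' | 'rawHtml' | 'screenshot'
  | 'links' | 'screenshot@fullPage' | 'extract';

interface OneScrapeArgs {
  url: string;                 // the only required field
  formats?: ScrapeFormat[];    // documented default: ['markdown']
  onlyMainContent?: boolean;
  includeTags?: string[];
  excludeTags?: string[];
  waitFor?: number;            // milliseconds
  timeout?: number;            // milliseconds
  mobile?: boolean;
  skipTlsVerification?: boolean;
  removeBase64Images?: boolean;
  location?: { country?: string; languages?: string[] };
  // `actions` and `extract` are omitted here for brevity.
}

// Hypothetical helper: makes the documented `formats` default explicit on the client side.
function buildScrapeArgs(args: OneScrapeArgs): OneScrapeArgs {
  return { ...args, formats: args.formats ?? ['markdown'] };
}

const exampleArgs = buildScrapeArgs({
  url: 'https://example.com/article',
  onlyMainContent: true,
  excludeTags: ['nav', 'footer'],
  waitFor: 1000,
  timeout: 30000,
});
```

Only `url` is required, so the smallest valid call is `buildScrapeArgs({ url: '...' })`; everything else narrows or shapes what comes back.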

Implementation Reference

  • Core implementation of the one_scrape tool: calls Firecrawl's scrapeUrl API and processes the response into MCP content format.
    async function processScrape(url: string, args: ScrapeParams) {
      const res = await firecrawl.scrapeUrl(url, {
        ...args,
      });
    
      if (!res.success) {
        throw new Error(`Failed to scrape: ${res.error}`);
      }
    
      const content: string[] = [];
    
      if (res.markdown) {
        content.push(res.markdown);
      }
    
      if (res.rawHtml) {
        content.push(res.rawHtml);
      }
    
      if (res.links) {
        content.push(res.links.join('\n'));
      }
    
      if (res.screenshot) {
        content.push(res.screenshot);
      }
    
      if (res.html) {
        content.push(res.html);
      }
    
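      if (res.extract) {
        // Structured extraction results are objects; serialize them so they are not
        // rendered as "[object Object]" when the parts are joined into text.
        content.push(
          typeof res.extract === 'string'
            ? res.extract
            : JSON.stringify(res.extract, null, 2),
        );
      }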
    
      return {
        content: [
          {
            type: 'text',
            text: content.join('\n\n') || 'No content found',
          },
        ],
        result: res,
        success: true,
      };
    }
  • Dispatch handler for 'one_scrape' tool call: validates input, handles logging and errors, delegates to processScrape.
    case 'one_scrape': {
      if (!checkScrapeArgs(args)) {
        throw new Error(`Invalid arguments for tool: [${name}]`);
      }
      try {
        const startTime = Date.now();
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping started for url: [${args.url}]`,
        });
    
        const { url, ...scrapeArgs } = args;
        const { content, success, result } = await processScrape(url, scrapeArgs);
    
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping completed in ${Date.now() - startTime}ms`,
        });
    
        return {
          content,
          result,
          success,
        };
      } catch (error) {
        server.sendLoggingMessage({
          level: 'error',
          data: `[${new Date().toISOString()}] Error scraping: ${error}`,
        });
        const msg = error instanceof Error ? error.message : 'Unknown error';
        return {
          success: false,
          content: [
            {
              type: 'text',
              text: msg,
            },
          ],
        };
      }
    }
  • Input schema and metadata definition for the 'one_scrape' tool.
    export const SCRAPE_TOOL: Tool = {
      name: 'one_scrape',
      description:
        'Scrape a single webpage with advanced options for content extraction. ' +
        'Supports various formats including markdown, HTML, and screenshots. ' +
        'Can execute custom actions like clicking or scrolling before scraping.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
            description: "Content formats to extract (default: ['markdown'])",
          },
          onlyMainContent: {
            type: 'boolean',
            description:
              'Extract only the main content, filtering out navigation, footers, etc.',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to specifically include in extraction',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude from extraction',
          },
          waitFor: {
            type: 'number',
            description: 'Time in milliseconds to wait for dynamic content to load',
          },
          timeout: {
            type: 'number',
            description:
              'Maximum time in milliseconds to wait for the page to load',
          },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                type: {
                  type: 'string',
                  enum: [
                    'wait',
                    'click',
                    'screenshot',
                    'write',
                    'press',
                    'scroll',
                    'scrape',
                    'executeJavascript',
                  ],
                  description: 'Type of action to perform',
                },
                selector: {
                  type: 'string',
                  description: 'CSS selector for the target element',
                },
                milliseconds: {
                  type: 'number',
                  description: 'Time to wait in milliseconds (for wait action)',
                },
                text: {
                  type: 'string',
                  description: 'Text to write (for write action)',
                },
                key: {
                  type: 'string',
                  description: 'Key to press (for press action)',
                },
                direction: {
                  type: 'string',
                  enum: ['up', 'down'],
                  description: 'Scroll direction',
                },
                script: {
                  type: 'string',
                  description: 'JavaScript code to execute',
                },
                fullPage: {
                  type: 'boolean',
                  description: 'Take full page screenshot',
                },
              },
              required: ['type'],
            },
            description: 'List of actions to perform before scraping',
          },
          extract: {
            type: 'object',
            properties: {
              schema: {
                type: 'object',
                description: 'Schema for structured data extraction',
              },
              systemPrompt: {
                type: 'string',
                description: 'System prompt for LLM extraction',
              },
              prompt: {
                type: 'string',
                description: 'User prompt for LLM extraction',
              },
            },
            description: 'Configuration for structured data extraction',
          },
          mobile: {
            type: 'boolean',
            description: 'Use mobile viewport',
          },
          skipTlsVerification: {
            type: 'boolean',
            description: 'Skip TLS certificate verification',
          },
          removeBase64Images: {
            type: 'boolean',
            description: 'Remove base64 encoded images from output',
          },
          location: {
            type: 'object',
            properties: {
              country: {
                type: 'string',
                description: 'Country code for geolocation',
              },
              languages: {
                type: 'array',
                items: { type: 'string' },
                description: 'Language codes for content',
              },
            },
            description: 'Location settings for scraping',
          },
        },
        required: ['url'],
      },
    };
  • src/index.ts:66-73 (registration)
    Registers the SCRAPE_TOOL (one_scrape) in the MCP server's list of available tools.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        SEARCH_TOOL,
        EXTRACT_TOOL,
        SCRAPE_TOOL,
        MAP_TOOL,
      ],
    }));
  • Helper function to validate arguments for the one_scrape tool.
    function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
      return (
        typeof args === 'object' &&
        args !== null &&
        'url' in args &&
        typeof args.url === 'string'
      );
    }
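
To make the content-flattening step in processScrape concrete, here is a minimal, self-contained sketch that reproduces the same field ordering on a mock response, without calling Firecrawl. `MockScrapeResult` and `flattenScrapeResult` are hypothetical names used only for illustration; the field names and the 'No content found' fallback are taken from the code above. Rendering structured `extract` data with `JSON.stringify` is an assumption about how an object should appear as text.

```typescript
// Hypothetical stand-in for the subset of the scrape response that processScrape reads.
interface MockScrapeResult {
  markdown?: string;
  rawHtml?: string;
  links?: string[];
  screenshot?: string;
  html?: string;
  extract?: unknown;
}

// Joins the available fields in the same order as processScrape:
// markdown, rawHtml, links, screenshot, html, extract.
function flattenScrapeResult(res: MockScrapeResult): string {
  const parts: string[] = [];
  if (res.markdown) parts.push(res.markdown);
  if (res.rawHtml) parts.push(res.rawHtml);
  if (res.links) parts.push(res.links.join('\n'));
  if (res.screenshot) parts.push(res.screenshot);
  if (res.html) parts.push(res.html);
  if (res.extract) {
    // Assumption: objects are rendered as JSON so they stay readable as text.
    parts.push(
      typeof res.extract === 'string' ? res.extract : JSON.stringify(res.extract),
    );
  }
  // Parts are separated by a blank line; an empty result falls back to a fixed message.
  return parts.join('\n\n') || 'No content found';
}

const flattened = flattenScrapeResult({
  markdown: '# Title',
  links: ['https://example.com/a', 'https://example.com/b'],
  extract: { price: 42 },
});
```

Because every requested format lands in one text item, a caller that asks for several formats at once has to split the result on its own; requesting one format per call keeps the output unambiguous.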
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions advanced options, content extraction formats, and custom actions like clicking or scrolling, which gives some insight into functionality. However, it lacks critical behavioral details such as rate limits, authentication requirements, error handling, or what happens with dynamic content. For a complex scraping tool with 13 parameters, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured in two sentences. The first sentence states the core purpose and key features, while the second adds details on custom actions. There's no wasted language, and it's front-loaded with essential information. It could be slightly improved by integrating sibling differentiation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (13 parameters, nested objects, no output schema, and no annotations), the description is insufficiently complete. It lacks details on output format, error conditions, performance characteristics, and how it differs from siblings. Without annotations or an output schema, the description should provide more behavioral and contextual guidance to compensate, which it doesn't do adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
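
Although no output schema is published, the handler code in the Implementation Reference suggests what a caller gets back. The sketch below writes that shape down as a TypeScript type and shows one way a client might unwrap it. `OneScrapeToolResult` and `unwrapScrapeText` are hypothetical names, and the shape is inferred from the listed code rather than from a documented contract.

```typescript
// Shape inferred from the dispatch handler: on success it returns content, result and
// success: true; on failure it returns success: false with the error message as text.
interface OneScrapeToolResult {
  success: boolean;
  content: { type: 'text'; text: string }[];
  result?: unknown; // raw scrape response, present only on success
}

// Hypothetical client-side helper: returns the text, or throws with the error message.
function unwrapScrapeText(res: OneScrapeToolResult): string {
  const text = res.content.map((c) => c.text).join('\n');
  if (!res.success) {
    throw new Error(`one_scrape failed: ${text}`);
  }
  return text;
}

const scrapedText = unwrapScrapeText({
  success: true,
  content: [{ type: 'text', text: '# Title' }],
  result: {},
});
```

Note that, per the listed handler, scrape failures are reported through `success: false` rather than a thrown error, so a client needs to check the flag explicitly.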

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds minimal parameter semantics beyond the input schema. It mentions 'advanced options for content extraction' and 'custom actions like clicking or scrolling,' which loosely relates to parameters like 'formats' and 'actions,' but doesn't provide additional context or examples. Since schema description coverage is 100%, the baseline score is 3, as the schema already documents parameters thoroughly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
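
One parameter relationship the schema leaves implicit is which fields of an `actions` item belong to which `type`. The sketch below encodes a plausible reading as a discriminated union. The pairing of fields to action types is an assumption based on the field descriptions (for example, 'Text to write (for write action)'), and `ScrapeAction` and `loadMoreThenCapture` are hypothetical names; the selector and URL are placeholders.

```typescript
// Assumed pairing of fields to action types, inferred from the schema descriptions.
type ScrapeAction =
  | { type: 'wait'; milliseconds: number }
  | { type: 'click'; selector: string }
  | { type: 'write'; text: string; selector?: string }
  | { type: 'press'; key: string }
  | { type: 'scroll'; direction: 'up' | 'down' }
  | { type: 'screenshot'; fullPage?: boolean }
  | { type: 'executeJavascript'; script: string }
  | { type: 'scrape' };

// Example sequence: reveal lazily loaded content, then capture the page.
const loadMoreThenCapture: ScrapeAction[] = [
  { type: 'click', selector: 'button.load-more' }, // placeholder selector
  { type: 'wait', milliseconds: 1500 },
  { type: 'scroll', direction: 'down' },
  { type: 'screenshot', fullPage: true },
];

// Arguments as they might be passed to one_scrape.
const argsWithActions = {
  url: 'https://example.com/listing',
  formats: ['markdown', 'screenshot'],
  actions: loadMoreThenCapture,
};
```

The schema itself only requires `type` on each action, so the stricter union above is a client-side convenience for catching mismatched fields at compile time.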

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Scrape a single webpage with advanced options for content extraction.' It specifies the verb ('scrape') and resource ('single webpage'), and mentions advanced capabilities like content extraction formats and custom actions. However, it doesn't explicitly differentiate from sibling tools like 'one_extract' or 'one_search,' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings (one_extract, one_map, one_search). It mentions capabilities like content extraction and custom actions, but doesn't specify scenarios where this tool is preferred over alternatives or any prerequisites for use. This leaves the agent without context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yokingma/one-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.