
one_scrape

Extract content from a single webpage in formats like HTML, markdown, or screenshots. Perform actions like clicking, scrolling, or executing JavaScript. Includes options for dynamic content, structured data extraction, and mobile viewport.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

Name                 Required  Description
actions              No        List of actions to perform before scraping
excludeTags          No        HTML tags to exclude from extraction
extract              No        Configuration for structured data extraction
formats              No        Content formats to extract (default: ['markdown'])
includeTags          No        HTML tags to specifically include in extraction
location             No        Location settings for scraping
mobile               No        Use mobile viewport
onlyMainContent      No        Extract only the main content, filtering out navigation, footers, etc.
removeBase64Images   No        Remove base64 encoded images from output
skipTlsVerification  No        Skip TLS certificate verification
timeout              No        Maximum time in milliseconds to wait for the page to load
url                  Yes       The URL to scrape
waitFor              No        Time in milliseconds to wait for dynamic content to load
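For example, a call might pass arguments like the following. The URL, selector, and timing values are illustrative placeholders, not defaults:

```typescript
// Illustrative one_scrape arguments; only `url` is required.
// The URL, CSS selector, and timings below are placeholder values.
const scrapeArgs = {
  url: 'https://example.com/article',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  waitFor: 2000,
  actions: [
    { type: 'wait', milliseconds: 1000 },
    { type: 'click', selector: '#load-more' },
  ],
};

console.log(`scraping ${scrapeArgs.url} as: ${scrapeArgs.formats.join(', ')}`);
```

Everything except `url` can be omitted, in which case `formats` falls back to `['markdown']`.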

Implementation Reference

  • Core handler function that invokes Firecrawl's scrapeUrl API with provided parameters, processes various response formats (markdown, html, etc.), and returns formatted MCP content.
    async function processScrape(url: string, args: ScrapeParams) {
      const res = await firecrawl.scrapeUrl(url, { ...args });
      if (!res.success) {
        throw new Error(`Failed to scrape: ${res.error}`);
      }
      const content: string[] = [];
      if (res.markdown) content.push(res.markdown);
      if (res.rawHtml) content.push(res.rawHtml);
      if (res.links) content.push(res.links.join('\n'));
      if (res.screenshot) content.push(res.screenshot);
      if (res.html) content.push(res.html);
      // extract is structured data, not a string; serialize it for text output
      if (res.extract) content.push(JSON.stringify(res.extract, null, 2));
      return {
        content: [
          {
            type: 'text',
            text: content.join('\n\n') || 'No content found',
          },
        ],
        result: res,
        success: true,
      };
    }
  • Dispatch handler in CallToolRequestSchema that validates input, logs progress, calls processScrape, and handles errors for the 'one_scrape' tool.
    case 'one_scrape': {
      if (!checkScrapeArgs(args)) {
        throw new Error(`Invalid arguments for tool: [${name}]`);
      }
      try {
        const startTime = Date.now();
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping started for url: [${args.url}]`,
        });
        const { url, ...scrapeArgs } = args;
        const { content, success, result } = await processScrape(url, scrapeArgs);
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping completed in ${Date.now() - startTime}ms`,
        });
        return { content, result, success };
      } catch (error) {
        server.sendLoggingMessage({
          level: 'error',
          data: `[${new Date().toISOString()}] Error scraping: ${error}`,
        });
        const msg = error instanceof Error ? error.message : 'Unknown error';
        return {
          success: false,
          content: [{ type: 'text', text: msg }],
        };
      }
    }
  • Defines the Tool object for 'one_scrape' including detailed inputSchema with parameters for URL, formats, actions, extraction schemas, and more.
    export const SCRAPE_TOOL: Tool = {
      name: 'one_scrape',
      description:
        'Scrape a single webpage with advanced options for content extraction. ' +
        'Supports various formats including markdown, HTML, and screenshots. ' +
        'Can execute custom actions like clicking or scrolling before scraping.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
            description: "Content formats to extract (default: ['markdown'])",
          },
          onlyMainContent: {
            type: 'boolean',
            description: 'Extract only the main content, filtering out navigation, footers, etc.',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to specifically include in extraction',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude from extraction',
          },
          waitFor: {
            type: 'number',
            description: 'Time in milliseconds to wait for dynamic content to load',
          },
          timeout: {
            type: 'number',
            description: 'Maximum time in milliseconds to wait for the page to load',
          },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                type: {
                  type: 'string',
                  enum: [
                    'wait',
                    'click',
                    'screenshot',
                    'write',
                    'press',
                    'scroll',
                    'scrape',
                    'executeJavascript',
                  ],
                  description: 'Type of action to perform',
                },
                selector: { type: 'string', description: 'CSS selector for the target element' },
                milliseconds: { type: 'number', description: 'Time to wait in milliseconds (for wait action)' },
                text: { type: 'string', description: 'Text to write (for write action)' },
                key: { type: 'string', description: 'Key to press (for press action)' },
                direction: { type: 'string', enum: ['up', 'down'], description: 'Scroll direction' },
                script: { type: 'string', description: 'JavaScript code to execute' },
                fullPage: { type: 'boolean', description: 'Take full page screenshot' },
              },
              required: ['type'],
            },
            description: 'List of actions to perform before scraping',
          },
          extract: {
            type: 'object',
            properties: {
              schema: { type: 'object', description: 'Schema for structured data extraction' },
              systemPrompt: { type: 'string', description: 'System prompt for LLM extraction' },
              prompt: { type: 'string', description: 'User prompt for LLM extraction' },
            },
            description: 'Configuration for structured data extraction',
          },
          mobile: { type: 'boolean', description: 'Use mobile viewport' },
          skipTlsVerification: { type: 'boolean', description: 'Skip TLS certificate verification' },
          removeBase64Images: { type: 'boolean', description: 'Remove base64 encoded images from output' },
          location: {
            type: 'object',
            properties: {
              country: { type: 'string', description: 'Country code for geolocation' },
              languages: {
                type: 'array',
                items: { type: 'string' },
                description: 'Language codes for content',
              },
            },
            description: 'Location settings for scraping',
          },
        },
        required: ['url'],
      },
    };
  • src/index.ts:66-73 (registration)
    Registers SCRAPE_TOOL (one_scrape) in the MCP server's ListTools response.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [SEARCH_TOOL, EXTRACT_TOOL, SCRAPE_TOOL, MAP_TOOL],
    }));
  • Helper function (a type guard) that validates input arguments for the scrape tool, ensuring 'url' is present and is a string.
    function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
      return (
        typeof args === 'object' &&
        args !== null &&
        'url' in args &&
        typeof args.url === 'string'
      );
    }
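A standalone sketch of this guard (with the `ScrapeParams` intersection elided, and renamed to make clear it is not the original function) shows how it accepts any object carrying a string `url` and rejects everything else:

```typescript
// Simplified copy of the checkScrapeArgs pattern: ScrapeParams is elided,
// so the predicate only narrows to { url: string }.
function hasUrl(args: unknown): args is { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

console.log(hasUrl({ url: 'https://example.com' })); // true
console.log(hasUrl({ url: 42 }));                    // false
console.log(hasUrl(null));                           // false
```

Because the predicate returns `args is { url: string }`, callers that pass the check get a narrowed type and can destructure `url` without a cast.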

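The response-assembly step inside processScrape can be sketched in isolation. The `res` object below is a hand-built stand-in for a Firecrawl result, not real API output; the handler simply concatenates whichever format fields came back:

```typescript
// Stand-in for a successful firecrawl.scrapeUrl result (not real API output).
const res = {
  markdown: '# Example Title',
  links: ['https://example.com/a', 'https://example.com/b'],
};

// Collect whichever formats are present, in a fixed order, then join them
// into the single text payload returned to the MCP client.
const content: string[] = [];
if (res.markdown) content.push(res.markdown);
if (res.links) content.push(res.links.join('\n'));
const text = content.join('\n\n') || 'No content found';

console.log(text);
```

The `|| 'No content found'` fallback means a scrape that succeeds but yields no requested formats still returns a non-empty text block.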

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yokingma/one-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.