Glama

one_scrape

Extract content from webpages with customizable options including markdown, HTML, screenshots, and structured data extraction. Supports dynamic content handling through pre-scrape actions like clicking, scrolling, or JavaScript execution.

Instructions

Scrape a single webpage with advanced options for content extraction. Supports various formats including markdown, HTML, and screenshots. Can execute custom actions like clicking or scrolling before scraping.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to scrape | |
| formats | No | Content formats to extract (default: ['markdown']) | |
| onlyMainContent | No | Extract only the main content, filtering out navigation, footers, etc. | |
| includeTags | No | HTML tags to specifically include in extraction | |
| excludeTags | No | HTML tags to exclude from extraction | |
| waitFor | No | Time in milliseconds to wait for dynamic content to load | |
| timeout | No | Maximum time in milliseconds to wait for the page to load | |
| actions | No | List of actions to perform before scraping | |
| extract | No | Configuration for structured data extraction | |
| mobile | No | Use mobile viewport | |
| skipTlsVerification | No | Skip TLS certificate verification | |
| removeBase64Images | No | Remove base64 encoded images from output | |
| location | No | Location settings for scraping | |
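
To make the parameters concrete, here is a minimal sketch of an arguments object for one_scrape. The `OneScrapeArgs`, `ScrapeAction`, and `ScrapeFormat` types are hypothetical names that mirror the tool's input schema; the server does not export them. The URL and the `#load-more` selector are placeholders.

```typescript
// Hypothetical types mirroring the one_scrape input schema (not exported by the server).
type ScrapeFormat =
  | 'markdown' | 'html' | 'rawHtml' | 'screenshot'
  | 'links' | 'screenshot@fullPage' | 'extract';

interface ScrapeAction {
  type:
    | 'wait' | 'click' | 'screenshot' | 'write'
    | 'press' | 'scroll' | 'scrape' | 'executeJavascript';
  selector?: string;       // CSS selector for the target element
  milliseconds?: number;   // used by the wait action
  text?: string;           // used by the write action
  key?: string;            // used by the press action
  direction?: 'up' | 'down';
  script?: string;         // used by the executeJavascript action
  fullPage?: boolean;      // used by the screenshot action
}

interface OneScrapeArgs {
  url: string; // the only required field
  formats?: ScrapeFormat[];
  onlyMainContent?: boolean;
  includeTags?: string[];
  excludeTags?: string[];
  waitFor?: number;
  timeout?: number;
  actions?: ScrapeAction[];
  extract?: { schema?: Record<string, unknown>; systemPrompt?: string; prompt?: string };
  mobile?: boolean;
  skipTlsVerification?: boolean;
  removeBase64Images?: boolean;
  location?: { country?: string; languages?: string[] };
}

// Placeholder request: click a button, wait one second, scroll down, then
// ask for markdown plus the list of links found on the page.
const exampleArgs: OneScrapeArgs = {
  url: 'https://example.com/articles',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  excludeTags: ['nav', 'footer'],
  timeout: 30000,
  actions: [
    { type: 'click', selector: '#load-more' },
    { type: 'wait', milliseconds: 1000 },
    { type: 'scroll', direction: 'down' },
  ],
};

console.log(JSON.stringify(exampleArgs, null, 2));
```

Only `url` is required; every other field can be left out.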

Implementation Reference

  • Core implementation of the one_scrape tool: calls Firecrawl's scrapeUrl API and processes the response into MCP content format.
    async function processScrape(url: string, args: ScrapeParams) {
      const res = await firecrawl.scrapeUrl(url, {
        ...args,
      });

      if (!res.success) {
        throw new Error(`Failed to scrape: ${res.error}`);
      }

      const content: string[] = [];

      if (res.markdown) {
        content.push(res.markdown);
      }

      if (res.rawHtml) {
        content.push(res.rawHtml);
      }

      if (res.links) {
        content.push(res.links.join('\n'));
      }

      if (res.screenshot) {
        content.push(res.screenshot);
      }

      if (res.html) {
        content.push(res.html);
      }

      if (res.extract) {
        content.push(res.extract);
      }

      return {
        content: [
          {
            type: 'text',
            text: content.join('\n\n') || 'No content found',
          },
        ],
        result: res,
        success: true,
      };
    }
  • Dispatch handler for 'one_scrape' tool call: validates input, handles logging and errors, delegates to processScrape.
    case 'one_scrape': {
      if (!checkScrapeArgs(args)) {
        throw new Error(`Invalid arguments for tool: [${name}]`);
      }
      try {
        const startTime = Date.now();
        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping started for url: [${args.url}]`,
        });

        const { url, ...scrapeArgs } = args;
        const { content, success, result } = await processScrape(url, scrapeArgs);

        server.sendLoggingMessage({
          level: 'info',
          data: `[${new Date().toISOString()}] Scraping completed in ${Date.now() - startTime}ms`,
        });

        return {
          content,
          result,
          success,
        };
      } catch (error) {
        server.sendLoggingMessage({
          level: 'error',
          data: `[${new Date().toISOString()}] Error scraping: ${error}`,
        });
        const msg = error instanceof Error ? error.message : 'Unknown error';
        return {
          success: false,
          content: [
            {
              type: 'text',
              text: msg,
            },
          ],
        };
      }
    }
  • Input schema and metadata definition for the 'one_scrape' tool.
    export const SCRAPE_TOOL: Tool = {
      name: 'one_scrape',
      description:
        'Scrape a single webpage with advanced options for content extraction. ' +
        'Supports various formats including markdown, HTML, and screenshots. ' +
        'Can execute custom actions like clicking or scrolling before scraping.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to scrape',
          },
          formats: {
            type: 'array',
            items: {
              type: 'string',
              enum: [
                'markdown',
                'html',
                'rawHtml',
                'screenshot',
                'links',
                'screenshot@fullPage',
                'extract',
              ],
            },
            description: "Content formats to extract (default: ['markdown'])",
          },
          onlyMainContent: {
            type: 'boolean',
            description: 'Extract only the main content, filtering out navigation, footers, etc.',
          },
          includeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to specifically include in extraction',
          },
          excludeTags: {
            type: 'array',
            items: { type: 'string' },
            description: 'HTML tags to exclude from extraction',
          },
          waitFor: {
            type: 'number',
            description: 'Time in milliseconds to wait for dynamic content to load',
          },
          timeout: {
            type: 'number',
            description: 'Maximum time in milliseconds to wait for the page to load',
          },
          actions: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                type: {
                  type: 'string',
                  enum: [
                    'wait',
                    'click',
                    'screenshot',
                    'write',
                    'press',
                    'scroll',
                    'scrape',
                    'executeJavascript',
                  ],
                  description: 'Type of action to perform',
                },
                selector: {
                  type: 'string',
                  description: 'CSS selector for the target element',
                },
                milliseconds: {
                  type: 'number',
                  description: 'Time to wait in milliseconds (for wait action)',
                },
                text: {
                  type: 'string',
                  description: 'Text to write (for write action)',
                },
                key: {
                  type: 'string',
                  description: 'Key to press (for press action)',
                },
                direction: {
                  type: 'string',
                  enum: ['up', 'down'],
                  description: 'Scroll direction',
                },
                script: {
                  type: 'string',
                  description: 'JavaScript code to execute',
                },
                fullPage: {
                  type: 'boolean',
                  description: 'Take full page screenshot',
                },
              },
              required: ['type'],
            },
            description: 'List of actions to perform before scraping',
          },
          extract: {
            type: 'object',
            properties: {
              schema: {
                type: 'object',
                description: 'Schema for structured data extraction',
              },
              systemPrompt: {
                type: 'string',
                description: 'System prompt for LLM extraction',
              },
              prompt: {
                type: 'string',
                description: 'User prompt for LLM extraction',
              },
            },
            description: 'Configuration for structured data extraction',
          },
          mobile: {
            type: 'boolean',
            description: 'Use mobile viewport',
          },
          skipTlsVerification: {
            type: 'boolean',
            description: 'Skip TLS certificate verification',
          },
          removeBase64Images: {
            type: 'boolean',
            description: 'Remove base64 encoded images from output',
          },
          location: {
            type: 'object',
            properties: {
              country: {
                type: 'string',
                description: 'Country code for geolocation',
              },
              languages: {
                type: 'array',
                items: { type: 'string' },
                description: 'Language codes for content',
              },
            },
            description: 'Location settings for scraping',
          },
        },
        required: ['url'],
      },
    };
  • src/index.ts:66-73 (registration)
    Registers the SCRAPE_TOOL (one_scrape) in the MCP server's list of available tools.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        SEARCH_TOOL,
        EXTRACT_TOOL,
        SCRAPE_TOOL,
        MAP_TOOL,
      ],
    }));
  • Helper function to validate arguments for the one_scrape tool.
    function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
      return (
        typeof args === 'object' &&
        args !== null &&
        'url' in args &&
        typeof args.url === 'string'
      );
    }
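
The pieces listed above fit together roughly as follows. This is a self-contained sketch rather than the server's code: `ScrapeParams` is reduced to two fields, `fakeScrape` is a hypothetical stub standing in for `firecrawl.scrapeUrl` (it makes no network request), and `handleOneScrape` is an illustrative name for the body of the `case 'one_scrape'` branch with logging omitted.

```typescript
// Simplified stand-in for the ScrapeParams type used by the server (assumption).
interface ScrapeParams {
  formats?: string[];
  onlyMainContent?: boolean;
}

interface ToolResult {
  success: boolean;
  content: { type: 'text'; text: string }[];
}

// Type guard with the same checks as the helper above: an object with a string url.
function checkScrapeArgs(args: unknown): args is ScrapeParams & { url: string } {
  return (
    typeof args === 'object' &&
    args !== null &&
    'url' in args &&
    typeof (args as { url: unknown }).url === 'string'
  );
}

// Hypothetical stub in place of firecrawl.scrapeUrl; returns canned data.
async function fakeScrape(url: string, _params: ScrapeParams) {
  return { success: true, markdown: `# Scraped ${url}`, links: [`${url}/about`] };
}

// Illustrative handler: validate, split url from the options, scrape, and
// fold the returned fields into a single text content item.
async function handleOneScrape(args: unknown): Promise<ToolResult> {
  if (!checkScrapeArgs(args)) {
    throw new Error('Invalid arguments for tool: [one_scrape]');
  }
  try {
    const { url, ...scrapeArgs } = args;
    const res = await fakeScrape(url, scrapeArgs);
    const parts: string[] = [];
    if (res.markdown) parts.push(res.markdown);
    if (res.links) parts.push(res.links.join('\n'));
    return {
      success: true,
      content: [{ type: 'text', text: parts.join('\n\n') || 'No content found' }],
    };
  } catch (error) {
    // Scrape failures are reported in the result instead of being rethrown.
    const msg = error instanceof Error ? error.message : 'Unknown error';
    return { success: false, content: [{ type: 'text', text: msg }] };
  }
}

handleOneScrape({ url: 'https://example.com', formats: ['markdown'] })
  .then((result) => console.log(result.content.map((c) => c.text).join('\n')));
```

As in the handler above, invalid arguments throw before the `try` block, while failures during scraping come back as a result with `success: false`.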

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yokingma/one-search-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.