scrape-website
Extract website content in markdown, HTML, or text formats from any URL for data collection and analysis purposes.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | ||
| formats | No |
Implementation Reference
- src/server.js:68-126 (handler)The handler function for the scrape-website tool. It scrapes the website using firecrawl.scrapeUrl, logs debug info, handles errors with Sentry, and returns the scraped content as markdown or other formats in a structured response.async ({ url, formats }) => { return await Sentry.startSpan( { name: "scrape-website" }, async () => { try { // Debug input console.error('DEBUG: Scraping URL:', url, 'with formats:', formats); // Add Sentry breadcrumb for debugging Sentry.addBreadcrumb({ category: 'scrape-website', message: `Scraping URL: ${url}`, data: { formats }, level: 'info' }); // Scrape the website const scrapeResult = await firecrawl.scrapeUrl(url, { formats: formats }); // Debug raw response console.error('DEBUG: Raw scrape result:', JSON.stringify(scrapeResult, null, 2)); if (!scrapeResult.success) { // Capture error in Sentry Sentry.captureMessage(`Failed to scrape website: ${scrapeResult.error}`, 'error'); return { content: [{ type: "text", text: `Failed to scrape website: ${scrapeResult.error}` }], isError: true }; } // Return the content directly return { content: [{ type: "text", text: scrapeResult.markdown || scrapeResult.content || 'No content available' }] }; } catch (error) { console.error('DEBUG: Caught error:', error); // Capture exception in Sentry Sentry.captureException(error); return { content: [{ type: "text", text: `Error scraping website: ${error.message}` }], isError: true }; } } ); }
- src/server.js:64-67 (schema)Zod input schema defining the parameters for the scrape-website tool: a required URL and optional formats array defaulting to markdown.{ url: z.string().url(), formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown']) },
- src/server.js:62-127 (registration)Registration of the scrape-website tool on the MCP server using server.tool(), specifying name, input schema, and handler function.server.tool( "scrape-website", { url: z.string().url(), formats: z.array(z.enum(['markdown', 'html', 'text'])).default(['markdown']) }, async ({ url, formats }) => { return await Sentry.startSpan( { name: "scrape-website" }, async () => { try { // Debug input console.error('DEBUG: Scraping URL:', url, 'with formats:', formats); // Add Sentry breadcrumb for debugging Sentry.addBreadcrumb({ category: 'scrape-website', message: `Scraping URL: ${url}`, data: { formats }, level: 'info' }); // Scrape the website const scrapeResult = await firecrawl.scrapeUrl(url, { formats: formats }); // Debug raw response console.error('DEBUG: Raw scrape result:', JSON.stringify(scrapeResult, null, 2)); if (!scrapeResult.success) { // Capture error in Sentry Sentry.captureMessage(`Failed to scrape website: ${scrapeResult.error}`, 'error'); return { content: [{ type: "text", text: `Failed to scrape website: ${scrapeResult.error}` }], isError: true }; } // Return the content directly return { content: [{ type: "text", text: scrapeResult.markdown || scrapeResult.content || 'No content available' }] }; } catch (error) { console.error('DEBUG: Caught error:', error); // Capture exception in Sentry Sentry.captureException(error); return { content: [{ type: "text", text: `Error scraping website: ${error.message}` }], isError: true }; } } ); } );