crawl_web
Extract webpage content in Markdown, raw HTML, or AI-enhanced formats for analysis and processing.
Instructions
Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to crawl and extract content from | — |
| markdown | No | Return content in Markdown format | `true` |
| raw_html | No | Return original, unprocessed HTML | `false` |
| enhanced_html | No | Return AI-enhanced, cleaned HTML | `true` |
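To illustrate the schema above, here is a minimal sketch of an arguments object for a `crawl_web` call, together with the shape check the Zod schema effectively enforces (the URL and the `isValidCrawlArgs` helper are illustrative, not part of the server's code; only `url` is required, and the format flags fall back to the defaults in the table when omitted):

```typescript
// Hypothetical arguments for a crawl_web call.
const args = {
  url: "https://example.com/article",
  markdown: true,      // default: true
  raw_html: false,     // default: false
  enhanced_html: true, // default: true
};

// Minimal check mirroring the schema's one hard requirement:
// a non-empty string URL. (Illustrative helper, not in src/index.ts.)
function isValidCrawlArgs(a: Record<string, unknown>): boolean {
  return typeof a.url === "string" && a.url.length > 0;
}
```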
Implementation Reference
- src/index.ts:150-179 (handler) — The handler function for the 'crawl_web' tool. It calls makeCrawlRequest with the provided arguments and returns the JSON result, or an error message if the request fails.

```typescript
async (args) => {
  try {
    const result = await makeCrawlRequest<Record<string, unknown>>({
      url: args.url,
      markdown: args.markdown,
      raw_html: args.raw_html,
      enhanced_html: args.enhanced_html,
    });
    return {
      content: [
        {
          type: "text" as const,
          text: JSON.stringify(result, null, 2),
        },
      ],
    };
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : "Unknown error occurred";
    return {
      content: [
        {
          type: "text" as const,
          text: `Error crawling URL: ${errorMessage}`,
        },
      ],
      isError: true,
    };
  }
}
```
- src/index.ts:35-40 (schema) — Zod schema defining the input parameters for the 'crawl_web' tool, including the URL and format options.

```typescript
const WebCrawlSchema = z.object({
  url: z.string().describe("URL to crawl and extract content from"),
  markdown: z.boolean().optional().default(true)
    .describe("Return content in Markdown format"),
  raw_html: z.boolean().optional().default(false)
    .describe("Return original, unprocessed HTML"),
  enhanced_html: z.boolean().optional().default(true)
    .describe("Return AI-enhanced, cleaned HTML"),
});
```
- src/index.ts:146-180 (registration) — Registration of the 'crawl_web' tool using server.tool(), wiring together the name, description, schema, and the inline handler.

```typescript
server.tool(
  "crawl_web",
  "Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.",
  WebCrawlSchema.shape,
  async (args) => {
    // ... same handler body as src/index.ts:150-179 above ...
  }
);
```
- src/index.ts:74-94 (helper) — Helper function that performs the POST request to the Crawleo /crawl API endpoint using the provided body and API key.

```typescript
async function makeCrawlRequest<T>(
  body: Record<string, unknown>
): Promise<T> {
  const apiKey = getApiKey();

  const response = await fetch(`${API_BASE_URL}/crawl`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
    },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`API request failed: ${response.status} - ${errorText}`);
  }

  return response.json() as Promise<T>;
}
```
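When the API returns a non-OK response, the helper above throws an Error whose message embeds the HTTP status and the response body. A minimal sketch of that message formatting, pulled out as a standalone function for clarity (the function name and example status are illustrative, not part of src/index.ts):

```typescript
// Mirrors the error string built in makeCrawlRequest when
// response.ok is false. (Illustrative helper, not actual code.)
function formatApiError(status: number, errorText: string): string {
  return `API request failed: ${status} - ${errorText}`;
}

// e.g. a rejected key would surface to the tool caller as:
// "Error crawling URL: API request failed: 401 - Invalid API key"
console.log(formatApiError(401, "Invalid API key"));
```

The handler's catch block then prefixes this message with "Error crawling URL: " and sets isError, so API failures reach the client as text rather than thrown exceptions.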