crawl_web
Extract webpage content in Markdown, raw HTML, or AI-enhanced formats for analysis and processing.
Instructions
Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to crawl and extract content from | — |
| markdown | No | Return content in Markdown format | `true` |
| raw_html | No | Return original, unprocessed HTML | `false` |
| enhanced_html | No | Return AI-enhanced, cleaned HTML | `true` |
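To illustrate the schema above, here is a minimal sketch of an arguments object for a `crawl_web` call, together with the shape check the Zod schema effectively enforces (the URL and the `isValidCrawlArgs` helper are illustrative, not part of the server's code; only `url` is required, and the format flags fall back to the defaults in the table when omitted):

```typescript
// Hypothetical arguments for a crawl_web call.
const args = {
  url: "https://example.com/article",
  markdown: true,      // default: true
  raw_html: false,     // default: false
  enhanced_html: true, // default: true
};

// Minimal check mirroring the schema's one hard requirement:
// a non-empty string URL. (Illustrative helper, not in src/index.ts.)
function isValidCrawlArgs(a: Record<string, unknown>): boolean {
  return typeof a.url === "string" && a.url.length > 0;
}
```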
Implementation Reference
- src/index.ts:150-179 (handler) — The handler function for the 'crawl_web' tool. It calls makeCrawlRequest with the provided arguments and returns the JSON result, or an error message if the request fails.

```typescript
async (args) => {
  try {
    const result = await makeCrawlRequest<Record<string, unknown>>({
      url: args.url,
      markdown: args.markdown,
      raw_html: args.raw_html,
      enhanced_html: args.enhanced_html,
    });
    return {
      content: [
        {
          type: "text" as const,
          text: JSON.stringify(result, null, 2),
        },
      ],
    };
  } catch (error) {
    const errorMessage =
      error instanceof Error ? error.message : "Unknown error occurred";
    return {
      content: [
        {
          type: "text" as const,
          text: `Error crawling URL: ${errorMessage}`,
        },
      ],
      isError: true,
    };
  }
}
```
- src/index.ts:35-40 (schema) — Zod schema defining the input parameters for the 'crawl_web' tool, including the URL and format options.

```typescript
const WebCrawlSchema = z.object({
  url: z.string().describe("URL to crawl and extract content from"),
  markdown: z.boolean().optional().default(true)
    .describe("Return content in Markdown format"),
  raw_html: z.boolean().optional().default(false)
    .describe("Return original, unprocessed HTML"),
  enhanced_html: z.boolean().optional().default(true)
    .describe("Return AI-enhanced, cleaned HTML"),
});
```
- src/index.ts:146-180 (registration) — Registration of the 'crawl_web' tool using server.tool(), wiring together the name, description, schema, and the inline handler.

```typescript
server.tool(
  "crawl_web",
  "Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.",
  WebCrawlSchema.shape,
  async (args) => {
    // ... same handler body as src/index.ts:150-179 above ...
  }
);
```
- src/index.ts:74-94 (helper) — Helper function that performs the POST request to the Crawleo /crawl API endpoint using the provided body and API key.

```typescript
async function makeCrawlRequest<T>(
  body: Record<string, unknown>
): Promise<T> {
  const apiKey = getApiKey();

  const response = await fetch(`${API_BASE_URL}/crawl`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
    },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`API request failed: ${response.status} - ${errorText}`);
  }

  return response.json() as Promise<T>;
}
```
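When the API returns a non-OK response, the helper above throws an Error whose message embeds the HTTP status and the response body. A minimal sketch of that message formatting, pulled out as a standalone function for clarity (the function name and example status are illustrative, not part of src/index.ts):

```typescript
// Mirrors the error string built in makeCrawlRequest when
// response.ok is false. (Illustrative helper, not actual code.)
function formatApiError(status: number, errorText: string): string {
  return `API request failed: ${status} - ${errorText}`;
}

// e.g. a rejected key would surface to the tool caller as:
// "Error crawling URL: API request failed: 401 - Invalid API key"
console.log(formatApiError(401, "Invalid API key"));
```

The handler's catch block then prefixes this message with "Error crawling URL: " and sets isError, so API failures reach the client as text rather than thrown exceptions.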