get_html
Extract sanitized HTML from any webpage to identify form fields, analyze page structure, and find CSS selectors for web automation and crawling operations.
Instructions
[STATELESS] Get sanitized, processed HTML for inspection and automation planning. Use when: finding form fields and selectors, analyzing page structure before automation, or building schemas. Returns cleaned HTML showing element names, IDs, and classes, which makes it well suited to identifying selectors for subsequent crawl operations. Creates a new browser instance on each call.
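To illustrate the "find selectors before crawling" workflow, here is a minimal sketch of how the returned HTML might be scanned for form controls. The helper `suggestFieldSelectors` is hypothetical and not part of this tool. It relies on simple regular expressions and assumes double-quoted attributes, so a real HTML parser would be more robust.

```typescript
// Hypothetical helper (not part of get_html): scans an HTML string for
// form controls and suggests one CSS selector per control.
function suggestFieldSelectors(html: string): string[] {
  const selectors: string[] = [];
  // Opening tags of common form controls; group 1 = tag name, group 2 = raw attributes.
  const tagPattern = /<(input|select|textarea|button)\b([^>]*)>/gi;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const tag = match[1].toLowerCase();
    const attrs = match[2];
    // Require start-of-string or whitespace before the attribute name so that
    // attributes such as data-id are not mistaken for id.
    const id = /(?:^|\s)id\s*=\s*"([^"]+)"/i.exec(attrs)?.[1];
    const name = /(?:^|\s)name\s*=\s*"([^"]+)"/i.exec(attrs)?.[1];
    if (id) {
      selectors.push(`#${id}`); // prefer the id when present
    } else if (name) {
      selectors.push(`${tag}[name="${name}"]`); // fall back to the name attribute
    } else {
      selectors.push(tag); // last resort: the bare tag name
    }
  }
  return selectors;
}

// Stand-in for HTML that get_html might return (made up for this example).
const sampleHtml = `
<form id="login">
  <input id="email" name="email" type="email">
  <input name="password" type="password">
  <button type="submit">Sign in</button>
</form>`;

// Yields: #email, input[name="password"], button
console.log(suggestFieldSelectors(sampleHtml));
```

The resulting selectors could then be passed to a later crawl call. Bare tag names such as `button` are ambiguous on most pages and would need refining by hand.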
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to extract HTML from | |
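As a rough illustration of how the single `url` argument might be supplied, the sketch below builds a JSON-RPC `tools/call` request of the kind MCP clients send. The helper name `buildGetHtmlCall` and the request `id` are assumptions made for this example. The built-in `URL` constructor is used as a loose stand-in for the server's own URL validation, which may accept or reject slightly different inputs.

```typescript
// Hypothetical helper: builds an MCP tools/call request for get_html.
function buildGetHtmlCall(url: string, id: number = 1) {
  // new URL() throws a TypeError on malformed input; this only approximates
  // the server-side check and is not the server's actual validator.
  new URL(url);
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: {
      name: "get_html" as const,
      arguments: { url }, // url is the only input the schema lists
    },
  };
}

const exampleCall = buildGetHtmlCall("https://example.com");
console.log(JSON.stringify(exampleCall, null, 2));
```

Validating on the client side is optional, since the server validates the arguments again, but it surfaces malformed URLs before a browser is started.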
Implementation Reference
- src/handlers/content-handlers.ts:162-178 (handler)
  MCP tool handler that fetches HTML via the service and returns it formatted as MCP text content.

      async getHTML(options: HTMLEndpointOptions) {
        try {
          const result: HTMLEndpointResponse = await this.service.getHTML(options);
          // Response has { html: string, url: string, success: true }
          return {
            content: [
              {
                type: 'text',
                text: result.html || '',
              },
            ],
          };
        } catch (error) {
          throw this.formatError(error, 'get HTML');
        }
      }

- Zod schema defining input validation for the get_html tool (requires url).

      export const GetHtmlSchema = createStatelessSchema(
        z.object({
          url: z.string().url(),
        }),
        'get_html',
      );

- src/server.ts:857-860 (registration)
  Tool registration in the MCP server request handler: dispatches get_html calls to contentHandlers.getHTML after validation.

      case 'get_html':
        return await this.validateAndExecute('get_html', args, GetHtmlSchema, async (validatedArgs) =>
          this.contentHandlers.getHTML(validatedArgs),
        );

- src/crawl4ai-service.ts:239-255 (helper)
  Service helper that performs an HTTP POST request to the Crawl4AI /html endpoint to retrieve processed HTML.

      async getHTML(options: HTMLEndpointOptions): Promise<HTMLEndpointResponse> {
        // Validate URL
        if (!validateURL(options.url)) {
          throw new Error('Invalid URL format');
        }

        try {
          const response = await this.axiosClient.post('/html', {
            url: options.url,
            // Only url is supported by the endpoint
          });

          return response.data;
        } catch (error) {
          return handleAxiosError(error);
        }
      }
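
The pieces referenced above chain together as service call, then handler, then MCP text content. The following is a simplified, offline sketch of that flow, not the project's actual code. The type shapes are inferred from the snippets and may omit fields. `PostFn`, `fakePost`, `fetchHtml`, and `handleGetHtml` are hypothetical names, with `fakePost` standing in for the axios client so that no network access is needed.

```typescript
// Shapes inferred from the reference snippets; the real types may differ.
interface HTMLEndpointOptions {
  url: string;
}
interface HTMLEndpointResponse {
  html: string;
  url: string;
  success: boolean;
}

// Hypothetical stand-in for the HTTP client's post method.
type PostFn = (path: string, body: unknown) => Promise<{ data: HTMLEndpointResponse }>;

// Service step: POST the url to /html and return the response body.
async function fetchHtml(post: PostFn, options: HTMLEndpointOptions): Promise<HTMLEndpointResponse> {
  const response = await post("/html", { url: options.url });
  return response.data;
}

// Handler step: wrap the HTML as MCP text content, falling back to an empty string.
async function handleGetHtml(post: PostFn, options: HTMLEndpointOptions) {
  const result = await fetchHtml(post, options);
  return {
    content: [{ type: "text" as const, text: result.html || "" }],
  };
}

// Fake client that echoes the requested URL inside a tiny HTML document.
const fakePost: PostFn = async (_path, body) => {
  const { url } = body as { url: string };
  return {
    data: { html: `<html><body data-src="${url}"></body></html>`, url, success: true },
  };
};

handleGetHtml(fakePost, { url: "https://example.com" }).then((result) => {
  console.log(result.content[0].text);
});
```

Passing the client in as a parameter is a choice made only for this sketch so that the flow can be exercised without a running Crawl4AI server.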