MCP Server for Crawl4AI

by omgwtfwow

get_html

Extract sanitized HTML from web pages to analyze structure, identify form fields, and plan automation selectors for web crawling operations.

Instructions

[STATELESS] Get sanitized/processed HTML for inspection and automation planning. Use when: finding form fields/selectors, analyzing page structure before automation, building schemas. Returns cleaned HTML showing element names, IDs, and classes - perfect for identifying selectors for subsequent crawl operations. Commonly used before crawl to find selectors for automation. Creates new browser each time.
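
As a rough illustration of the "find selectors" use case described above, the sketch below scans an HTML string (such as the text returned by get_html) for form controls and proposes a CSS selector for each. The helper names `suggestFieldSelectors` and `readAttribute` are hypothetical and not part of the server; the regex-based scan is a simplification that assumes double-quoted attributes, and a real HTML parser would be more robust.

```typescript
// Hypothetical helper, not part of the MCP server.
interface FieldSelector {
  tag: string;
  selector: string;
}

// Read a double-quoted attribute value from a tag's attribute string.
// Assumption: attributes are written as name="value".
function readAttribute(attrs: string, name: string): string | undefined {
  const pattern = new RegExp(`(?:^|\\s)${name}\\s*=\\s*"([^"]*)"`, "i");
  const found = pattern.exec(attrs);
  return found ? found[1] : undefined;
}

// Suggest one selector per form control, preferring id, then name,
// then the first class, and falling back to the bare tag name.
function suggestFieldSelectors(html: string): FieldSelector[] {
  const results: FieldSelector[] = [];
  const tagPattern = /<(input|select|textarea|button)\b([^>]*)>/gi;
  let match: RegExpExecArray | null = tagPattern.exec(html);
  while (match !== null) {
    const tag = match[1].toLowerCase();
    const attrs = match[2];
    const id = readAttribute(attrs, "id");
    const name = readAttribute(attrs, "name");
    const cls = readAttribute(attrs, "class");
    let selector = tag;
    if (id) {
      selector = `#${id}`;
    } else if (name) {
      selector = `${tag}[name="${name}"]`;
    } else if (cls && cls.trim().length > 0) {
      selector = `${tag}.${cls.trim().split(/\s+/)[0]}`;
    }
    results.push({ tag, selector });
    match = tagPattern.exec(html);
  }
  return results;
}
```

The resulting selectors could then be passed to a subsequent crawl operation; which parameters that operation accepts is outside the scope of this sketch.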

Input Schema

| Name | Required | Description                  | Default |
| url  | Yes      | The URL to extract HTML from |         |
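
For illustration, a client could invoke the tool with a standard MCP `tools/call` request whose arguments contain only `url`. The sketch below builds such a request; `buildGetHtmlRequest` and `ToolCallRequest` are hypothetical names, and the `new URL()` check only approximates the server-side validation rather than reproducing it.

```typescript
// Hypothetical request shape for calling get_html over MCP (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string } };
}

// Build a tools/call request for get_html. The only argument is `url`,
// which is required by the input schema.
function buildGetHtmlRequest(url: string, id: number = 1): ToolCallRequest {
  try {
    // Assumption: a string that parses with the URL constructor is
    // acceptable; the server applies its own validation as well.
    new URL(url);
  } catch {
    throw new Error(`Invalid URL: ${url}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "get_html", arguments: { url } },
  };
}
```

Serializing the returned object with `JSON.stringify` gives the message body an MCP client would send over its transport.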

Implementation Reference

  • MCP handler that executes the get_html tool: calls service to fetch HTML and formats response as MCP text content block.
    async getHTML(options: HTMLEndpointOptions) {
      try {
        const result: HTMLEndpointResponse = await this.service.getHTML(options);
        // Response has { html: string, url: string, success: true }
        return {
          content: [
            {
              type: 'text',
              text: result.html || '',
            },
          ],
        };
      } catch (error) {
        throw this.formatError(error, 'get HTML');
      }
    }
  • Core service implementation: validates URL and POSTs to Crawl4AI backend /html endpoint to retrieve processed HTML.
    async getHTML(options: HTMLEndpointOptions): Promise<HTMLEndpointResponse> {
      // Validate URL
      if (!validateURL(options.url)) {
        throw new Error('Invalid URL format');
      }
      try {
        const response = await this.axiosClient.post('/html', {
          url: options.url, // Only url is supported by the endpoint
        });
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
  • Input schema validation using Zod: requires a valid URL string.
    export const GetHtmlSchema = createStatelessSchema(
      z.object({
        url: z.string().url(),
      }),
      'get_html',
    );
  • src/server.ts:857-860 (registration)
    Tool call routing: switch case in server handles get_html requests, validates with schema, delegates to handler.
    case 'get_html':
      return await this.validateAndExecute(
        'get_html',
        args,
        GetHtmlSchema,
        async (validatedArgs) => this.contentHandlers.getHTML(validatedArgs),
      );
  • src/server.ts:274-287 (registration)
    Tool metadata registration: defines name, description, and input schema advertised in listTools response.
    name: 'get_html',
    description:
      '[STATELESS] Get sanitized/processed HTML for inspection and automation planning. Use when: finding form fields/selectors, analyzing page structure before automation, building schemas. Returns cleaned HTML showing element names, IDs, and classes - perfect for identifying selectors for subsequent crawl operations. Commonly used before crawl to find selectors for automation. Creates new browser each time.',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL to extract HTML from',
        },
      },
      required: ['url'],
    },
    },
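
The references above describe a three-step flow: validate the URL, POST it to the backend `/html` endpoint, and wrap the returned HTML in an MCP text content block. The following is a minimal, self-contained sketch of that flow, not the server's actual code. It swaps the axios client for the standard `fetch` API, and the names `getHtmlSketch`, `postJsonWithFetch`, `PostJson`, and `ASSUMED_BASE_URL` (including the address it holds) are assumptions made for illustration.

```typescript
// Shape of the MCP result produced by the handler: one text content block.
interface TextResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical transport signature, injectable so the flow can run without a backend.
type PostJson = (url: string, body: unknown) => Promise<{ html?: string }>;

// Assumption: a Crawl4AI server is reachable at this address.
const ASSUMED_BASE_URL = "http://localhost:11235";

// Default transport using fetch (stands in for the axios client in the source).
async function postJsonWithFetch(url: string, body: unknown): Promise<{ html?: string }> {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return (await response.json()) as { html?: string };
}

async function getHtmlSketch(
  url: string,
  post: PostJson = postJsonWithFetch,
  baseUrl: string = ASSUMED_BASE_URL,
): Promise<TextResult> {
  // Step 1: reject anything that does not parse as a URL.
  try {
    new URL(url);
  } catch {
    throw new Error("Invalid URL format");
  }
  // Step 2: POST only the url field to the /html endpoint.
  const data = await post(`${baseUrl}/html`, { url });
  // Step 3: wrap the HTML (or an empty string) as MCP text content.
  return { content: [{ type: "text", text: data.html || "" }] };
}
```

Injecting the transport keeps the validation and response-shaping steps easy to exercise in isolation, for example with a stub that returns a fixed `{ html }` object.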

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.