MCP Server for Crawl4AI

by omgwtfwow

get_html

Extract sanitized HTML from any webpage to identify form fields, analyze page structure, and find CSS selectors for web automation and crawling operations.

Instructions

[STATELESS] Get sanitized, processed HTML for inspection and automation planning. Use when: finding form fields or selectors, analyzing page structure before automation, or building schemas. Returns cleaned HTML showing element names, IDs, and classes, which makes it well suited to identifying selectors for subsequent crawl operations; it is commonly called before crawl for that purpose. Creates a new browser instance on each call.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to extract HTML from | |
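
To make the input schema concrete, here is a minimal sketch of the JSON-RPC request an MCP client might send to invoke this tool. The `tools/call` envelope follows the MCP convention; `buildGetHtmlRequest` and `ToolCallRequest` are hypothetical names introduced only for illustration, and the `new URL()` check is a rough stand-in for the server-side URL validation, not the server's actual code.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not part of the server): builds the request body.
function buildGetHtmlRequest(url: string, id = 1): ToolCallRequest {
  // Approximate the schema's only constraint: `url` must parse as a URL.
  // `new URL` throws a TypeError on malformed input.
  new URL(url);
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_html', arguments: { url } },
  };
}

const request = buildGetHtmlRequest('https://example.com');
console.log(JSON.stringify(request, null, 2));
```

Since `url` is the only parameter, the `arguments` object never carries anything else; a malformed URL is rejected before the request is built.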

Implementation Reference

  • MCP tool handler that fetches HTML via service and returns it formatted as MCP text content.
    async getHTML(options: HTMLEndpointOptions) {
      try {
        const result: HTMLEndpointResponse = await this.service.getHTML(options);
        // Response has { html: string, url: string, success: true }
        return {
          content: [
            {
              type: 'text',
              text: result.html || '',
            },
          ],
        };
      } catch (error) {
        throw this.formatError(error, 'get HTML');
      }
    }
  • Zod schema defining input validation for get_html tool (requires url).
    export const GetHtmlSchema = createStatelessSchema(
      z.object({
        url: z.string().url(),
      }),
      'get_html',
    );
  • src/server.ts:857-860 (registration)
    Tool registration in MCP server request handler: dispatches get_html calls to contentHandlers.getHTML after validation.
    case 'get_html':
      return await this.validateAndExecute(
        'get_html',
        args,
        GetHtmlSchema,
        async (validatedArgs) => this.contentHandlers.getHTML(validatedArgs),
      );
  • Service helper that performs HTTP POST request to Crawl4AI /html endpoint to retrieve processed HTML.
    async getHTML(options: HTMLEndpointOptions): Promise<HTMLEndpointResponse> {
      // Validate URL
      if (!validateURL(options.url)) {
        throw new Error('Invalid URL format');
      }

      try {
        const response = await this.axiosClient.post('/html', {
          url: options.url,
          // Only url is supported by the endpoint
        });
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
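
The pieces listed above chain together as: validate the URL, POST `{ url }` to `/html`, then wrap the returned HTML as MCP text content. The following is a minimal, self-contained sketch of that flow, not the project's actual code. The `HttpClient` interface and `fakeClient` are hypothetical stand-ins for the axios client, the body of `validateURL` is an assumption (the real helper is not shown), and the type shapes are inferred from the comment in the handler.

```typescript
// Types inferred from the handler comment: { html, url, success }.
interface HTMLEndpointOptions { url: string }
interface HTMLEndpointResponse { html: string; url: string; success: boolean }

// Hypothetical stand-in for the axios client: only `post` is needed here.
interface HttpClient {
  post(path: string, body: unknown): Promise<{ data: HTMLEndpointResponse }>;
}

// Assumed implementation: accept only URLs that parse and use http(s).
function validateURL(url: string): boolean {
  try {
    const parsed = new URL(url);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}

// Service layer: validate, then POST { url } to /html.
async function getHTML(
  client: HttpClient,
  options: HTMLEndpointOptions,
): Promise<HTMLEndpointResponse> {
  if (!validateURL(options.url)) {
    throw new Error('Invalid URL format');
  }
  const response = await client.post('/html', { url: options.url });
  return response.data;
}

// Handler layer: wrap the HTML as MCP text content, defaulting to ''.
async function handleGetHTML(client: HttpClient, options: HTMLEndpointOptions) {
  const result = await getHTML(client, options);
  return { content: [{ type: 'text' as const, text: result.html || '' }] };
}

// Fake client so the sketch runs without a Crawl4AI server.
const fakeClient: HttpClient = {
  async post(_path, body) {
    const { url } = body as HTMLEndpointOptions;
    return { data: { html: '<form id="login"></form>', url, success: true } };
  },
};

handleGetHTML(fakeClient, { url: 'https://example.com' })
  .then((res) => console.log(res.content[0].text));
```

Injecting the client keeps the sketch testable offline; in the real server the same role is played by the configured axios instance.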

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.