get_html

Extract sanitized HTML from web pages to analyze structure, identify form fields, and plan automation selectors for web crawling operations.

Instructions

[STATELESS] Get sanitized/processed HTML for inspection and automation planning. Use when finding form fields or selectors, analyzing page structure before automation, or building schemas. Returns cleaned HTML showing element names, IDs, and classes, which makes it well suited to identifying selectors for a subsequent crawl operation. Creates a new browser each time.
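As a rough illustration of the selector-finding step, the sketch below scans an HTML string (such as the one this tool returns) for form controls and proposes a CSS selector for each. The names `listFieldSelectors`, `readAttr`, and `FieldSelector` are hypothetical and not part of the server, and the regex scan is an assumption chosen for brevity; a real HTML parser would be more robust.

```typescript
// Hypothetical helper for illustration; not part of the Crawl4AI MCP server.
interface FieldSelector {
  tag: string;
  selector: string;
}

// Read a quoted attribute value from a tag's attribute string.
function readAttr(attrs: string, name: string): string | undefined {
  const pattern = new RegExp(`(?:^|\\s)${name}\\s*=\\s*["']([^"']+)["']`, 'i');
  const m = pattern.exec(attrs);
  return m ? m[1] : undefined;
}

// Propose one selector per form control, preferring id over name.
function listFieldSelectors(html: string): FieldSelector[] {
  const fields: FieldSelector[] = [];
  const tagPattern = /<(input|select|textarea|button)\b([^>]*)>/gi;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const tag = (match[1] ?? '').toLowerCase();
    const attrs = match[2] ?? '';
    const id = readAttr(attrs, 'id');
    const name = readAttr(attrs, 'name');
    if (id !== undefined) {
      fields.push({ tag, selector: `#${id}` });
    } else if (name !== undefined) {
      fields.push({ tag, selector: `${tag}[name="${name}"]` });
    }
    // Controls with neither id nor name are skipped in this sketch.
  }
  return fields;
}
```

The resulting selectors could then be passed to a later crawl call; controls without an id or name would need a class-based or structural selector instead.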

Input Schema

Name	Required	Description	Default
`url`	Yes	The URL to extract HTML from
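For orientation, a client invokes this tool with an MCP `tools/call` request whose only argument is `url`. The sketch below builds such a request as a plain object; the helper name `buildGetHtmlCall` and the example URL are assumptions for illustration, and a real client would normally let an MCP SDK construct and send the message.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface GetHtmlCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: 'get_html';
    arguments: { url: string };
  };
}

// Hypothetical helper: builds the request body for a get_html call.
function buildGetHtmlCall(id: number, url: string): GetHtmlCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'get_html',
      arguments: { url },
    },
  };
}

// Example usage with a placeholder URL.
const exampleRequest = buildGetHtmlCall(1, 'https://example.com/login');
console.log(JSON.stringify(exampleRequest, null, 2));
```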

Implementation Reference

src/handlers/content-handlers.ts:162-178 (handler)

MCP handler that executes the get_html tool: it calls the service to fetch the HTML and wraps the result in a single MCP text content block.

async getHTML(options: HTMLEndpointOptions) {
  try {
    const result: HTMLEndpointResponse = await this.service.getHTML(options);

    // Response has { html: string, url: string, success: true }
    return {
      content: [
        {
          type: 'text',
          text: result.html || '',
        },
      ],
    };
  } catch (error) {
    throw this.formatError(error, 'get HTML');
  }
}
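On the client side, the handler's result arrives as a `content` array holding one text block. A minimal sketch of reading it back follows; `extractHtml` and the two interfaces are hypothetical names for illustration. Because the handler falls back to an empty string when `result.html` is missing, the sketch treats empty text as "no HTML returned".

```typescript
// Minimal shapes for the tool result; illustrative only.
interface TextBlock {
  type: string;
  text?: string;
}

interface ToolResult {
  content: TextBlock[];
}

// Hypothetical helper: concatenates text blocks and returns null when empty.
function extractHtml(result: ToolResult): string | null {
  const html = result.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text ?? '')
    .join('');
  return html.length > 0 ? html : null;
}
```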

src/crawl4ai-service.ts:239-255 (helper)

Core service implementation: validates the URL, then POSTs it to the Crawl4AI backend's /html endpoint to retrieve the processed HTML.

async getHTML(options: HTMLEndpointOptions): Promise<HTMLEndpointResponse> {
  // Validate URL
  if (!validateURL(options.url)) {
    throw new Error('Invalid URL format');
  }

  try {
    const response = await this.axiosClient.post('/html', {
      url: options.url,
      // Only url is supported by the endpoint
    });

    return response.data;
  } catch (error) {
    return handleAxiosError(error);
  }
}

src/schemas/validation-schemas.ts:55-60 (schema)

Input schema validation using Zod: requires a valid URL string.

export const GetHtmlSchema = createStatelessSchema(
  z.object({
    url: z.string().url(),
  }),
  'get_html',
);
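To make the schema's effect concrete, here is a dependency-free sketch of roughly what it enforces: the arguments must be an object with a string `url` that parses as a URL. `parseGetHtmlArgs` and `GetHtmlArgs` are hypothetical stand-ins, not names from the server, and the use of the `URL` constructor is an assumption about how URL validity is judged. `createStatelessSchema` is not shown in the source, so any extra checks it adds are not reproduced here.

```typescript
// Hypothetical stand-in for the Zod schema; not part of the server.
interface GetHtmlArgs {
  url: string;
}

function parseGetHtmlArgs(args: unknown): GetHtmlArgs {
  if (typeof args !== 'object' || args === null) {
    throw new Error('get_html: arguments must be an object');
  }
  const url = (args as Record<string, unknown>).url;
  if (typeof url !== 'string') {
    throw new Error('get_html: "url" is required and must be a string');
  }
  try {
    // The URL constructor throws when the string is not an absolute URL.
    new URL(url);
  } catch {
    throw new Error(`get_html: "url" is not a valid URL: ${url}`);
  }
  return { url };
}
```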

src/server.ts:857-860 (registration)

Tool call routing: a switch case in the server handles get_html requests, validates the arguments against the schema, and delegates to the handler.

case 'get_html':
  return await this.validateAndExecute('get_html', args, GetHtmlSchema, async (validatedArgs) =>
    this.contentHandlers.getHTML(validatedArgs),
  );

src/server.ts:274-287 (registration)

Tool metadata registration: defines the name, description, and input schema advertised in the listTools response.

{
  name: 'get_html',
  description:
    '[STATELESS] Get sanitized/processed HTML for inspection and automation planning. Use when: finding form fields/selectors, analyzing page structure before automation, building schemas. Returns cleaned HTML showing element names, IDs, and classes - perfect for identifying selectors for subsequent crawl operations. Commonly used before crawl to find selectors for automation. Creates new browser each time.',
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'The URL to extract HTML from',
      },
    },
    required: ['url'],
  },
},

MCP Server for Crawl4AI
