crawl_web

Extract webpage content in Markdown, raw HTML, or AI-enhanced formats for analysis and processing.

Instructions

Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.

Input Schema

Name           Required  Description                              Default
url            Yes       URL to crawl and extract content from    —
markdown       No        Return content in Markdown format        true
raw_html       No        Return original, unprocessed HTML        false
enhanced_html  No        Return AI-enhanced, cleaned HTML         true
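
Taken together, a typical set of tool arguments might look like the following (the URL is a placeholder; omitted flags fall back to the defaults above):

```json
{
  "url": "https://example.com/article",
  "markdown": true,
  "raw_html": false,
  "enhanced_html": true
}
```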

Implementation Reference

  • The handler function for the 'crawl_web' tool. It calls makeCrawlRequest with the provided arguments and returns the JSON result as formatted text, or an error message (with isError set) if the request fails.
    async (args) => {
      try {
        const result = await makeCrawlRequest<Record<string, unknown>>({
          url: args.url,
          markdown: args.markdown,
          raw_html: args.raw_html,
          enhanced_html: args.enhanced_html,
        });
    
        return {
          content: [
            {
              type: "text" as const,
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error) {
        const errorMessage = error instanceof Error ? error.message : "Unknown error occurred";
        return {
          content: [
            {
              type: "text" as const,
              text: `Error crawling URL: ${errorMessage}`,
            },
          ],
          isError: true,
        };
      }
    }
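  The handler always resolves to one of two result shapes: a success payload whose text is the pretty-printed JSON, or an error payload flagged with isError. A minimal sketch of the error path (the ToolResult type and errorResult helper are illustrative names, not part of the source):

```typescript
// Sketch of the result shape the handler returns; errorResult mirrors
// the catch branch in the code above.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

function errorResult(message: string): ToolResult {
  return {
    content: [{ type: "text" as const, text: `Error crawling URL: ${message}` }],
    isError: true,
  };
}
```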
  • Zod schema defining the input parameters for the 'crawl_web' tool, including URL and format options.
    const WebCrawlSchema = z.object({
      url: z.string().describe("URL to crawl and extract content from"),
      markdown: z.boolean().optional().default(true).describe("Return content in Markdown format"),
      raw_html: z.boolean().optional().default(false).describe("Return original, unprocessed HTML"),
      enhanced_html: z.boolean().optional().default(true).describe("Return AI-enhanced, cleaned HTML"),
    });
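  Because the three format flags are optional with defaults, calling the tool with only a URL yields Markdown and enhanced HTML but not raw HTML. A plain-TypeScript sketch of the same default behavior, without the Zod dependency (applyDefaults and CrawlArgs are hypothetical names for illustration):

```typescript
// Mirrors the defaults the Zod schema applies when optional flags are omitted.
interface CrawlArgs {
  url: string;
  markdown?: boolean;
  raw_html?: boolean;
  enhanced_html?: boolean;
}

function applyDefaults(args: CrawlArgs): Required<CrawlArgs> {
  return {
    url: args.url,
    markdown: args.markdown ?? true,       // default: true
    raw_html: args.raw_html ?? false,      // default: false
    enhanced_html: args.enhanced_html ?? true, // default: true
  };
}
```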
  • src/index.ts:146-180 (registration)
    Registration of the 'crawl_web' tool using server.tool(), including name, description, schema, and inline handler.
    server.tool(
      "crawl_web",
      "Crawl a specific webpage and extract its content in various formats including Markdown, raw HTML, and AI-enhanced HTML.",
      WebCrawlSchema.shape,
      async (args) => {
        try {
          const result = await makeCrawlRequest<Record<string, unknown>>({
            url: args.url,
            markdown: args.markdown,
            raw_html: args.raw_html,
            enhanced_html: args.enhanced_html,
          });
    
          return {
            content: [
              {
                type: "text" as const,
                text: JSON.stringify(result, null, 2),
              },
            ],
          };
        } catch (error) {
          const errorMessage = error instanceof Error ? error.message : "Unknown error occurred";
          return {
            content: [
              {
                type: "text" as const,
                text: `Error crawling URL: ${errorMessage}`,
              },
            ],
            isError: true,
          };
        }
      }
    );
  • Helper function that performs the POST request to the Crawleo /crawl API endpoint using the provided body and API key.
    async function makeCrawlRequest<T>(
      body: Record<string, unknown>
    ): Promise<T> {
      const apiKey = getApiKey();
      
      const response = await fetch(`${API_BASE_URL}/crawl`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": apiKey,
        },
        body: JSON.stringify(body),
      });
    
      if (!response.ok) {
        const errorText = await response.text();
        throw new Error(`API request failed: ${response.status} - ${errorText}`);
      }
    
      return response.json() as Promise<T>;
    }
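  The request the helper sends can be sketched in isolation: a POST to the /crawl path with JSON body and the API key in an x-api-key header. In this sketch API_BASE_URL is a placeholder and buildCrawlRequest is a hypothetical helper that only assembles the fetch arguments, so it can be inspected without making a network call:

```typescript
// Hypothetical helper: assembles the URL and fetch init that
// makeCrawlRequest would pass to fetch. API_BASE_URL is a placeholder.
const API_BASE_URL = "https://api.example.com";

function buildCrawlRequest(body: Record<string, unknown>, apiKey: string) {
  return {
    url: `${API_BASE_URL}/crawl`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
      },
      body: JSON.stringify(body),
    },
  };
}
```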