MCP Server for Crawl4AI

by omgwtfwow

get_markdown

Extract web content as markdown with filtering options for raw content, optimized output, keyword search, or AI-powered extraction.

Instructions

[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.
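
To make the filter modes above concrete, here is a minimal sketch of how a client might build a `tools/call` request for this tool. The helper name `buildGetMarkdownCall`, the example URL, and the example query are hypothetical and not part of the server's source; only the argument names (`url`, `filter`, `query`, `cache`) come from this page, and the envelope assumes the standard MCP JSON-RPC `tools/call` shape.

```typescript
// Hypothetical client-side helper; not taken from the server's source.
type Filter = 'raw' | 'fit' | 'bm25' | 'llm';

interface GetMarkdownArgs {
  url: string;
  filter?: Filter; // server defaults to 'fit' when omitted
  query?: string; // needed for 'bm25' and 'llm'
  cache?: string; // server defaults to '0' when omitted
}

interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: 'get_markdown'; arguments: GetMarkdownArgs };
}

// Wrap the tool arguments in a JSON-RPC request envelope.
function buildGetMarkdownCall(id: number, args: GetMarkdownArgs): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_markdown', arguments: args },
  };
}

// Default extraction: only the URL is supplied, so the 'fit' filter applies.
const fitCall = buildGetMarkdownCall(1, { url: 'https://example.com/docs' });

// Keyword search: 'bm25' must be paired with a query.
const bm25Call = buildGetMarkdownCall(2, {
  url: 'https://example.com/docs',
  filter: 'bm25',
  query: 'installation steps',
});

console.log(JSON.stringify(bm25Call, null, 2));
```

Because the tool is stateless, each such request starts a fresh browser; nothing carries over between the two calls above.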

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to extract markdown from | |
| filter | No | Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction) | fit |
| query | No | Query string for bm25/llm filters. Required when using bm25 or llm filter. | |
| cache | No | Cache-bust parameter (use different values to force fresh extraction) | 0 |
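
The defaults and the conditional requirement on `query` described in the schema can be mirrored in plain TypeScript. This is only a sketch under stated assumptions: `normalizeArgs` and its types are hypothetical names, and the server itself performs this validation with Zod rather than with hand-written checks. The URL check here uses the built-in `URL` constructor as a stand-in.

```typescript
// Hypothetical dependency-free mirror of the input rules; the real server uses Zod.
type Filter = 'raw' | 'fit' | 'bm25' | 'llm';
const FILTERS: readonly Filter[] = ['raw', 'fit', 'bm25', 'llm'];

interface RawArgs {
  url: string;
  filter?: string;
  query?: string;
  cache?: string;
}

interface NormalizedArgs {
  url: string;
  filter: Filter;
  query?: string;
  cache: string;
}

function normalizeArgs(args: RawArgs): NormalizedArgs {
  // The URL constructor throws a TypeError on malformed input.
  new URL(args.url);

  // Apply the documented default before checking the enum.
  const filter = args.filter ?? 'fit';
  if (!FILTERS.includes(filter as Filter)) {
    throw new Error(`Unknown filter: ${filter}`);
  }

  // bm25 and llm need a non-empty query to know what to extract.
  if ((filter === 'bm25' || filter === 'llm') && !args.query) {
    throw new Error('Query parameter is required when using bm25 or llm filter');
  }

  return { url: args.url, filter: filter as Filter, query: args.query, cache: args.cache ?? '0' };
}

const defaults = normalizeArgs({ url: 'https://example.com' });
console.log(defaults.filter, defaults.cache); // prints "fit 0"
```

Passing `{ url: 'https://example.com', filter: 'llm' }` without a `query` makes the sketch throw, which matches the rule in the table.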

Implementation Reference

  • Main handler function that executes the get_markdown tool logic: maps input parameters to API format, calls the underlying service, formats the response with URL, filter, query, cache info and markdown content.
    async getMarkdown(
      options: Omit<MarkdownEndpointOptions, 'f' | 'q' | 'c'> & { filter?: string; query?: string; cache?: string },
    ) {
      try {
        // Map from schema property names to API parameter names
        const result: MarkdownEndpointResponse = await this.service.getMarkdown({
          url: options.url,
          f: options.filter as FilterType | undefined, // Schema provides 'filter', API expects 'f'
          q: options.query, // Schema provides 'query', API expects 'q'
          c: options.cache, // Schema provides 'cache', API expects 'c'
        });
    
        // Format the response
        let formattedText = `URL: ${result.url}\nFilter: ${result.filter}`;
    
        if (result.query) {
          formattedText += `\nQuery: ${result.query}`;
        }
    
        formattedText += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;
    
        return {
          content: [
            {
              type: 'text',
              text: formattedText,
            },
          ],
        };
      } catch (error) {
        throw this.formatError(error, 'get markdown');
      }
    }
  • Zod schema definition for get_markdown inputs with validation refinement ensuring query is provided for 'bm25' or 'llm' filters.
    const GetMarkdownBaseSchema = z.object({
      url: z.string().url(),
      filter: z.enum(['raw', 'fit', 'bm25', 'llm']).optional().default('fit'),
      query: z.string().optional(),
      cache: z.string().optional().default('0'),
    });
    
    export const GetMarkdownSchema = createStatelessSchema(GetMarkdownBaseSchema, 'get_markdown').refine(
      (data) => {
        // If filter is bm25 or llm, query is required
        if ((data.filter === 'bm25' || data.filter === 'llm') && !data.query) {
          return false;
        }
        return true;
      },
      {
        message: 'Query parameter is required when using bm25 or llm filter',
        path: ['query'],
      },
    );
  • src/server.ts:821-827 (registration)
    MCP server registration of the get_markdown tool in the CallToolRequestHandler switch: uses schema validation and delegates to ContentHandlers.getMarkdown.
    case 'get_markdown':
      return await this.validateAndExecute(
        'get_markdown',
        args,
        GetMarkdownSchema as z.ZodSchema<z.infer<typeof GetMarkdownSchema>>,
        async (validatedArgs) => this.contentHandlers.getMarkdown(validatedArgs),
      );
  • Underlying service helper that performs the HTTP POST to the Crawl4AI /md endpoint to extract markdown from the URL with specified options.
    async getMarkdown(options: MarkdownEndpointOptions): Promise<MarkdownEndpointResponse> {
      // Validate URL
      if (!validateURL(options.url)) {
        throw new Error('Invalid URL format');
      }
    
      try {
        const response = await this.axiosClient.post('/md', {
          url: options.url,
          f: options.f,
          q: options.q,
          c: options.c,
        });
    
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
  • src/server.ts:120-149 (registration)
    Tool metadata registration in ListToolsRequestHandler, including name, description, and input schema for get_markdown.
    {
      name: 'get_markdown',
      description:
        '[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to extract markdown from',
          },
          filter: {
            type: 'string',
            enum: ['raw', 'fit', 'bm25', 'llm'],
            description: 'Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)',
            default: 'fit',
          },
          query: {
            type: 'string',
            description: 'Query string for bm25/llm filters. Required when using bm25 or llm filter.',
          },
          cache: {
            type: 'string',
            description: 'Cache-bust parameter (use different values to force fresh extraction)',
            default: '0',
          },
        },
        required: ['url'],
      },
    },
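
The two steps the handler performs, renaming schema fields to the short API parameters and formatting the service response as text, can be isolated as pure functions for illustration. The names `toApiParams` and `formatMarkdownResult` are hypothetical, and the `MarkdownResult` shape is an assumption inferred from the fields the handler reads; the actual `MarkdownEndpointResponse` type may contain more.

```typescript
// Hypothetical standalone versions of the handler's mapping and formatting steps.
interface ToolArgs {
  url: string;
  filter?: string;
  query?: string;
  cache?: string;
}

interface ApiParams {
  url: string;
  f?: string;
  q?: string;
  c?: string;
}

// Assumed response shape, based only on the fields the handler uses.
interface MarkdownResult {
  url: string;
  filter: string;
  query?: string | null;
  cache: string;
  markdown?: string;
}

// The tool schema uses descriptive names; the /md endpoint expects f, q, c.
function toApiParams(args: ToolArgs): ApiParams {
  return { url: args.url, f: args.filter, q: args.query, c: args.cache };
}

// Build the text payload: the Query line appears only when a query was used,
// and empty markdown falls back to a placeholder message.
function formatMarkdownResult(result: MarkdownResult): string {
  let text = `URL: ${result.url}\nFilter: ${result.filter}`;
  if (result.query) {
    text += `\nQuery: ${result.query}`;
  }
  text += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;
  return text;
}

const params = toApiParams({ url: 'https://example.com', filter: 'bm25', query: 'pricing' });
console.log(params.f, params.q); // prints "bm25 pricing"
```

Keeping the formatting separate from the HTTP call makes the output format easy to check without a running Crawl4AI instance.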

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.