Skip to main content
Glama
omgwtfwow

MCP Server for Crawl4AI

by omgwtfwow

get_markdown

Extract web content as markdown with filtering options for raw content, optimized output, keyword search, or AI-powered extraction.

Instructions

[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL to extract markdown from
filterNoFilter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)fit
queryNoQuery string for bm25/llm filters. Required when using bm25 or llm filter.
cacheNoCache-bust parameter (use different values to force fresh extraction)0

Implementation Reference

  • Main handler function that executes the get_markdown tool logic: maps input parameters to API format, calls the underlying service, formats the response with URL, filter, query, cache info and markdown content.
    async getMarkdown(
      options: Omit<MarkdownEndpointOptions, 'f' | 'q' | 'c'> & { filter?: string; query?: string; cache?: string },
    ) {
      try {
        // Map from schema property names to API parameter names
        const result: MarkdownEndpointResponse = await this.service.getMarkdown({
          url: options.url,
          f: options.filter as FilterType | undefined, // Schema provides 'filter', API expects 'f'
          q: options.query, // Schema provides 'query', API expects 'q'
          c: options.cache, // Schema provides 'cache', API expects 'c'
        });
    
        // Format the response
        let formattedText = `URL: ${result.url}\nFilter: ${result.filter}`;
    
        if (result.query) {
          formattedText += `\nQuery: ${result.query}`;
        }
    
        formattedText += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;
    
        return {
          content: [
            {
              type: 'text',
              text: formattedText,
            },
          ],
        };
      } catch (error) {
        throw this.formatError(error, 'get markdown');
      }
    }
  • Zod schema definition for get_markdown inputs with validation refinement ensuring query is provided for 'bm25' or 'llm' filters.
    const GetMarkdownBaseSchema = z.object({
      url: z.string().url(),
      filter: z.enum(['raw', 'fit', 'bm25', 'llm']).optional().default('fit'),
      query: z.string().optional(),
      cache: z.string().optional().default('0'),
    });
    
    export const GetMarkdownSchema = createStatelessSchema(GetMarkdownBaseSchema, 'get_markdown').refine(
      (data) => {
        // If filter is bm25 or llm, query is required
        if ((data.filter === 'bm25' || data.filter === 'llm') && !data.query) {
          return false;
        }
        return true;
      },
      {
        message: 'Query parameter is required when using bm25 or llm filter',
        path: ['query'],
      },
    );
  • src/server.ts:821-827 (registration)
    MCP server registration of the get_markdown tool in the CallToolRequestHandler switch: uses schema validation and delegates to ContentHandlers.getMarkdown.
    case 'get_markdown':
      return await this.validateAndExecute(
        'get_markdown',
        args,
        GetMarkdownSchema as z.ZodSchema<z.infer<typeof GetMarkdownSchema>>,
        async (validatedArgs) => this.contentHandlers.getMarkdown(validatedArgs),
      );
  • Underlying service helper that performs the HTTP POST to the Crawl4AI /md endpoint to extract markdown from the URL with specified options.
    async getMarkdown(options: MarkdownEndpointOptions): Promise<MarkdownEndpointResponse> {
      // Validate URL
      if (!validateURL(options.url)) {
        throw new Error('Invalid URL format');
      }
    
      try {
        const response = await this.axiosClient.post('/md', {
          url: options.url,
          f: options.f,
          q: options.q,
          c: options.c,
        });
    
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
  • src/server.ts:120-149 (registration)
    Tool metadata registration in ListToolsRequestHandler, including name, description, and input schema for get_markdown.
    {
      name: 'get_markdown',
      description:
        '[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to extract markdown from',
          },
          filter: {
            type: 'string',
            enum: ['raw', 'fit', 'bm25', 'llm'],
            description: 'Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)',
            default: 'fit',
          },
          query: {
            type: 'string',
            description: 'Query string for bm25/llm filters. Required when using bm25 or llm filter.',
          },
          cache: {
            type: 'string',
            description: 'Cache-bust parameter (use different values to force fresh extraction)',
            default: '0',
          },
        },
        required: ['url'],
      },
    },
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden and discloses key behavioral traits: statelessness ('[STATELESS]'), browser creation ('Creates new browser each time'), and persistence alternatives. It does not cover rate limits or auth needs, but adds significant context beyond basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, with each sentence adding value: stating purpose, listing filters, explaining usage, and noting behavioral aspects. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is fairly complete for a tool with 4 parameters and stateless behavior. It covers purpose, usage, and key traits, but lacks details on output format or error handling, which would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds some meaning by explaining filter purposes (e.g., 'raw (full content), fit (optimized, default)'), but does not provide additional syntax or format details beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('extract content as markdown') and distinguishes it from siblings by mentioning filtering options and browser creation, unlike tools like 'get_html' or 'extract_links'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear context on when to use specific filters (e.g., 'Use bm25/llm with query for specific content') and mentions alternatives for persistence ('For persistence use create_session + crawl'), but does not explicitly state when not to use this tool versus all siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server