get_markdown
Extract web content as markdown with filtering options for raw content, optimized output, keyword search, or AI-powered extraction.
Instructions
[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.
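As an illustration of the instructions above, here is a minimal sketch of the request an MCP client might send to invoke this tool. The `tools/call` method and the `name`/`arguments` envelope come from the MCP protocol; the helper `buildGetMarkdownCall`, the type names, and the example URL and query are assumptions made for this example, not part of the server.

```typescript
// Filter names as listed in the tool description.
type MarkdownFilterName = 'raw' | 'fit' | 'bm25' | 'llm';

interface GetMarkdownCallArgs {
  url: string;
  filter?: MarkdownFilterName;
  query?: string;
  cache?: string;
}

interface ToolCallRequest {
  jsonrpc: string;
  id: number;
  method: string;
  params: { name: string; arguments: GetMarkdownCallArgs };
}

// Hypothetical helper: wraps the arguments in an MCP `tools/call` JSON-RPC request.
function buildGetMarkdownCall(id: number, args: GetMarkdownCallArgs): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'get_markdown', arguments: args },
  };
}

// Keyword search: bm25 needs a query alongside the URL.
const exampleRequest = buildGetMarkdownCall(1, {
  url: 'https://example.com/docs',
  filter: 'bm25',
  query: 'installation steps',
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

Omitting `filter` leaves the choice to the server-side default (`fit`), so the simplest call carries only `url`.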
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to extract markdown from | |
| filter | No | Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction) | fit |
| query | No | Query string for bm25/llm filters. Required when using bm25 or llm filter. | |
| cache | No | Cache-bust parameter (use different values to force fresh extraction) | 0 |
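The defaults and the conditional requirement in the table can be sketched in plain TypeScript. This is an illustrative stand-in, not the server's validation code: `normalizeGetMarkdownInput` and the two interfaces are hypothetical names, and URL format checking is deliberately left out.

```typescript
type MarkdownFilter = 'raw' | 'fit' | 'bm25' | 'llm';

interface RawGetMarkdownInput {
  url: string;
  filter?: MarkdownFilter;
  query?: string;
  cache?: string;
}

interface NormalizedGetMarkdownInput {
  url: string;
  filter: MarkdownFilter;
  query?: string;
  cache: string;
}

// Hypothetical helper: applies the table's defaults and the "query is
// required for bm25/llm" rule. It does not validate the URL format.
function normalizeGetMarkdownInput(input: RawGetMarkdownInput): NormalizedGetMarkdownInput {
  const filter: MarkdownFilter = input.filter ?? 'fit'; // default from the table
  const cache: string = input.cache ?? '0'; // default from the table
  // An empty string counts as a missing query.
  if ((filter === 'bm25' || filter === 'llm') && !input.query) {
    throw new Error('Query parameter is required when using bm25 or llm filter');
  }
  return { url: input.url, filter, query: input.query, cache };
}

// Only `url` supplied: filter falls back to 'fit' and cache to '0'.
console.log(normalizeGetMarkdownInput({ url: 'https://example.com' }));
```

Passing a different `cache` value (for example `'1'`) is how a caller would request a fresh extraction, per the table's description.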
Implementation Reference
- src/handlers/content-handlers.ts:18-50 (handler) Main handler function that executes the get_markdown tool logic: maps input parameters to API format, calls the underlying service, formats the response with URL, filter, query, cache info and markdown content.

  ```typescript
  async getMarkdown(
    options: Omit<MarkdownEndpointOptions, 'f' | 'q' | 'c'> & {
      filter?: string;
      query?: string;
      cache?: string;
    },
  ) {
    try {
      // Map from schema property names to API parameter names
      const result: MarkdownEndpointResponse = await this.service.getMarkdown({
        url: options.url,
        f: options.filter as FilterType | undefined, // Schema provides 'filter', API expects 'f'
        q: options.query, // Schema provides 'query', API expects 'q'
        c: options.cache, // Schema provides 'cache', API expects 'c'
      });

      // Format the response
      let formattedText = `URL: ${result.url}\nFilter: ${result.filter}`;
      if (result.query) {
        formattedText += `\nQuery: ${result.query}`;
      }
      formattedText += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;

      return {
        content: [
          {
            type: 'text',
            text: formattedText,
          },
        ],
      };
    } catch (error) {
      throw this.formatError(error, 'get markdown');
    }
  }
  ```
- Zod schema definition for get_markdown inputs with validation refinement ensuring query is provided for 'bm25' or 'llm' filters.

  ```typescript
  const GetMarkdownBaseSchema = z.object({
    url: z.string().url(),
    filter: z.enum(['raw', 'fit', 'bm25', 'llm']).optional().default('fit'),
    query: z.string().optional(),
    cache: z.string().optional().default('0'),
  });

  export const GetMarkdownSchema = createStatelessSchema(GetMarkdownBaseSchema, 'get_markdown').refine(
    (data) => {
      // If filter is bm25 or llm, query is required
      if ((data.filter === 'bm25' || data.filter === 'llm') && !data.query) {
        return false;
      }
      return true;
    },
    {
      message: 'Query parameter is required when using bm25 or llm filter',
      path: ['query'],
    },
  );
  ```
- src/server.ts:821-827 (registration) MCP server registration of the get_markdown tool in the CallToolRequestHandler switch: uses schema validation and delegates to ContentHandlers.getMarkdown.

  ```typescript
  case 'get_markdown':
    return await this.validateAndExecute(
      'get_markdown',
      args,
      GetMarkdownSchema as z.ZodSchema<z.infer<typeof GetMarkdownSchema>>,
      async (validatedArgs) => this.contentHandlers.getMarkdown(validatedArgs),
    );
  ```
- src/crawl4ai-service.ts:120-138 (helper) Underlying service helper that performs the HTTP POST to the Crawl4AI /md endpoint to extract markdown from the URL with specified options.

  ```typescript
  async getMarkdown(options: MarkdownEndpointOptions): Promise<MarkdownEndpointResponse> {
    // Validate URL
    if (!validateURL(options.url)) {
      throw new Error('Invalid URL format');
    }

    try {
      const response = await this.axiosClient.post('/md', {
        url: options.url,
        f: options.f,
        q: options.q,
        c: options.c,
      });
      return response.data;
    } catch (error) {
      return handleAxiosError(error);
    }
  }
  ```
- src/server.ts:120-149 (registration) Tool metadata registration in ListToolsRequestHandler, including name, description, and input schema for get_markdown.

  ```typescript
  {
    name: 'get_markdown',
    description:
      '[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL to extract markdown from',
        },
        filter: {
          type: 'string',
          enum: ['raw', 'fit', 'bm25', 'llm'],
          description: 'Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)',
          default: 'fit',
        },
        query: {
          type: 'string',
          description: 'Query string for bm25/llm filters. Required when using bm25 or llm filter.',
        },
        cache: {
          type: 'string',
          description: 'Cache-bust parameter (use different values to force fresh extraction)',
          default: '0',
        },
      },
      required: ['url'],
    },
  },
  ```
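To tie the handler and service references together, the following standalone sketch traces the data flow they describe: schema names (`filter`, `query`, `cache`) are mapped to the short API names (`f`, `q`, `c`), the body is posted to `/md`, and the response is formatted as text. It is a simplified approximation, not the project's code: the interfaces are inferred from the fields the handler reads, the transport is injected as a hypothetical `PostFn` rather than an axios client, and `fakePost` returns canned data instead of contacting a Crawl4AI server.

```typescript
interface ToolArgs {
  url: string;
  filter?: string;
  query?: string;
  cache?: string;
}

interface MdRequestBody {
  url: string;
  f?: string;
  q?: string;
  c?: string;
}

// Assumed response shape, inferred from the fields the handler reads.
interface MdResponse {
  url: string;
  filter: string;
  query?: string;
  cache: string;
  markdown?: string;
}

// Hypothetical transport signature, injected so the sketch runs offline.
type PostFn = (path: string, body: MdRequestBody) => Promise<MdResponse>;

// Map schema property names to the API's short parameter names.
function toMdRequestBody(args: ToolArgs): MdRequestBody {
  return { url: args.url, f: args.filter, q: args.query, c: args.cache };
}

// Build the text block; the Query line appears only when a query is present.
function formatMarkdownResult(result: MdResponse): string {
  let text = `URL: ${result.url}\nFilter: ${result.filter}`;
  if (result.query) {
    text += `\nQuery: ${result.query}`;
  }
  text += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`;
  return text;
}

async function getMarkdownText(post: PostFn, args: ToolArgs): Promise<string> {
  const result = await post('/md', toMdRequestBody(args));
  return formatMarkdownResult(result);
}

// Fake transport that echoes the request with canned markdown.
const fakePost: PostFn = async (_path, body) => ({
  url: body.url,
  filter: body.f ?? 'fit',
  query: body.q,
  cache: body.c ?? '0',
  markdown: '# Example Domain',
});

getMarkdownText(fakePost, { url: 'https://example.com' }).then((text) => console.log(text));
```

Injecting the transport keeps the mapping and formatting steps testable without a network; a real client would pass a function that performs the HTTP POST.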