get_markdown

get_markdown

Extract web content as markdown with filtering options including full content, optimized formatting, keyword search, or AI-powered extraction for specific information.

Instructions

[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.

Input Schema

TableJSON Schema

Name	Required	Description	Default
`cache`	No	Cache-bust parameter (use different values to force fresh extraction)	0
`filter`	No	Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)	fit
`query`	No	Query string for bm25/llm filters. Required when using bm25 or llm filter.
`url`	Yes	The URL to extract markdown from

Implementation Reference

src/handlers/content-handlers.ts:18-50 (handler)
Main MCP tool handler: validates input (via server), maps schema params to API params (filter→f, query→q, cache→c), calls service.getMarkdown(), formats response as MCP content block with metadata + markdown.
async getMarkdown( options: Omit<MarkdownEndpointOptions, 'f' | 'q' | 'c'> & { filter?: string; query?: string; cache?: string }, ) { try { // Map from schema property names to API parameter names const result: MarkdownEndpointResponse = await this.service.getMarkdown({ url: options.url, f: options.filter as FilterType | undefined, // Schema provides 'filter', API expects 'f' q: options.query, // Schema provides 'query', API expects 'q' c: options.cache, // Schema provides 'cache', API expects 'c' }); // Format the response let formattedText = `URL: ${result.url}\nFilter: ${result.filter}`; if (result.query) { formattedText += `\nQuery: ${result.query}`; } formattedText += `\nCache: ${result.cache}\n\nMarkdown:\n${result.markdown || 'No content found.'}`; return { content: [ { type: 'text', text: formattedText, }, ], }; } catch (error) { throw this.formatError(error, 'get markdown'); } }
src/schemas/validation-schemas.ts:26-45 (schema)
Zod input validation schema for get_markdown tool. Defines url (required), filter enum (raw/fit/bm25/llm, default 'fit'), query (required for bm25/llm), cache. Uses createStatelessSchema helper.
const GetMarkdownBaseSchema = z.object({ url: z.string().url(), filter: z.enum(['raw', 'fit', 'bm25', 'llm']).optional().default('fit'), query: z.string().optional(), cache: z.string().optional().default('0'), }); export const GetMarkdownSchema = createStatelessSchema(GetMarkdownBaseSchema, 'get_markdown').refine( (data) => { // If filter is bm25 or llm, query is required if ((data.filter === 'bm25' || data.filter === 'llm') && !data.query) { return false; } return true; }, { message: 'Query parameter is required when using bm25 or llm filter', path: ['query'], }, );
src/server.ts:121-149 (registration)
Tool metadata registration in listTools response: name, detailed description, inputSchema matching the Zod schema.
name: 'get_markdown', description: '[STATELESS] Extract content as markdown with filtering options. Supports: raw (full content), fit (optimized, default), bm25 (keyword search), llm (AI-powered extraction). Use bm25/llm with query for specific content. Creates new browser each time. For persistence use create_session + crawl.', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to extract markdown from', }, filter: { type: 'string', enum: ['raw', 'fit', 'bm25', 'llm'], description: 'Filter type: raw (full), fit (optimized), bm25 (search), llm (AI extraction)', default: 'fit', }, query: { type: 'string', description: 'Query string for bm25/llm filters. Required when using bm25 or llm filter.', }, cache: { type: 'string', description: 'Cache-bust parameter (use different values to force fresh extraction)', default: '0', }, }, required: ['url'], }, },
src/server.ts:821-827 (registration)
Dispatch handler: routes 'get_markdown' calls to validateAndExecute with GetMarkdownSchema and ContentHandlers.getMarkdown
case 'get_markdown': return await this.validateAndExecute( 'get_markdown', args, GetMarkdownSchema as z.ZodSchema<z.infer<typeof GetMarkdownSchema>>, async (validatedArgs) => this.contentHandlers.getMarkdown(validatedArgs), );
src/crawl4ai-service.ts:120-138 (helper)
Service helper: Makes HTTP POST to Crawl4AI /md endpoint with url,f,q,c params. Validates URL format. Handles axios errors uniformly.
async getMarkdown(options: MarkdownEndpointOptions): Promise<MarkdownEndpointResponse> { // Validate URL if (!validateURL(options.url)) { throw new Error('Invalid URL format'); } try { const response = await this.axiosClient.post('/md', { url: options.url, f: options.f, q: options.q, c: options.c, }); return response.data; } catch (error) { return handleAxiosError(error); } }

MCP Server for Crawl4AI

Instructions

Input Schema

Implementation Reference

Other Tools

Latest Blog Posts

MCP directory API