extract_pdf_text

Extract text content from PDF documents with optional metadata and formatting preservation for data processing and analysis.

Input Schema

Name	Required	Description	Default
`file_path`	Yes	Path to the PDF file to extract text from
`pages`	No	Page range to extract (e.g., "1-5", "1,3,5", or "all")	all
`preserve_formatting`	No	Whether to preserve text formatting and structure	true
`include_metadata`	No	Whether to include document metadata in the response	false
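
The `pages` formats listed in the table ("1-5", "1,3,5", "all") can be expanded into concrete page numbers along these lines. This is a minimal sketch only: `parsePageRange` and its `pageCount` argument are hypothetical names, and the server's actual parsing logic is not shown in the source.

```typescript
// Hypothetical helper, not part of the server's published source.
// Expands a `pages` string into a sorted list of unique 1-based page numbers.
function parsePageRange(spec: string, pageCount: number): number[] {
  const trimmed = spec.trim().toLowerCase();
  if (trimmed === 'all') {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(',')) {
    // Accept either a single page ("3") or an inclusive range ("1-5").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

With a 10-page document, `parsePageRange('1-5', 10)` returns `[1, 2, 3, 4, 5]` and `parsePageRange('1,3,5', 10)` returns `[1, 3, 5]`. Rejecting out-of-range pages rather than silently clamping them is a design choice of this sketch, not a documented behavior of the tool.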

Implementation Reference

src/tools/extract-text.ts:37-62 (handler)
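Core handler function. It validates the input against ExtractTextParamsSchema, extracts text through the PDFProcessor service, attaches document metadata when include_metadata is true, and returns a structured result. On failure it throws an Error whose message is the JSON-serialized MCP error. In this excerpt, pages is validated but not passed to processor.extractText.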
export async function handleExtractText(args: unknown): Promise<ExtractTextResult> {
  try {
    const params = ExtractTextParamsSchema.parse(args);
    const processor = new PDFProcessor();

    const result = await processor.extractText(
      params.file_path,
      params.preserve_formatting
    );

    const response: ExtractTextResult = {
      text: result.text,
      page_count: result.pageCount,
      processing_time_ms: result.processingTimeMs
    };

    if (params.include_metadata) {
      response.metadata = result.metadata;
    }

    return response;
  } catch (error) {
    const mcpError = handleError(error, typeof args === 'object' && args !== null && 'file_path' in args ? String(args.file_path) : undefined);
    throw new Error(JSON.stringify(mcpError));
  }
}
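
Because the handler rethrows failures as an `Error` whose message is a JSON string, code that calls it directly may want to recover the structured payload. The sketch below shows one way to do that. `parseToolError` and `StructuredError` are hypothetical names, and the fields produced by `handleError` are not shown in the source, so the payload is treated as an opaque object.

```typescript
// Hypothetical shape: the actual fields returned by handleError are not shown in the source.
type StructuredError = Record<string, unknown>;

// Returns the JSON object carried in error.message, or undefined if there is none.
function parseToolError(error: unknown): StructuredError | undefined {
  if (!(error instanceof Error)) {
    return undefined;
  }
  try {
    const parsed: unknown = JSON.parse(error.message);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      return parsed as StructuredError;
    }
    return undefined; // valid JSON, but not an object (e.g. a bare number)
  } catch {
    return undefined; // message was plain text, not JSON
  }
}
```

A caller could wrap `handleExtractText` in `try`/`catch`, pass the caught value to `parseToolError`, and fall back to the raw message when the result is `undefined`.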
src/tools/extract-text.ts:7-35 (schema)
Tool definition including name, description, and JSON input schema matching the Zod validation schema used in the handler.
export const extractTextTool: Tool = {
  name: 'extract_pdf_text',
  description: 'Extract text content from PDF documents with optional metadata and formatting preservation',
  inputSchema: {
    type: 'object',
    properties: {
      file_path: {
        type: 'string',
        description: 'Path to the PDF file to extract text from'
      },
      pages: {
        type: 'string',
        description: 'Page range to extract (e.g., "1-5", "1,3,5", or "all")',
        default: 'all'
      },
      preserve_formatting: {
        type: 'boolean',
        description: 'Whether to preserve text formatting and structure',
        default: true
      },
      include_metadata: {
        type: 'boolean',
        description: 'Whether to include document metadata in the response',
        default: false
      }
    },
    required: ['file_path']
  }
};
src/types/mcp-types.ts:7-12 (schema)
Zod schema for runtime input validation of extract_pdf_text parameters, parsed in the handler.
export const ExtractTextParamsSchema = z.object({
  file_path: filePathValidation,
  pages: z.string().default('all'),
  preserve_formatting: z.boolean().default(true),
  include_metadata: z.boolean().default(false)
});
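
To make the effect of the schema defaults concrete, here is a dependency-free approximation of what parsing yields when optional fields are omitted. It is only an illustration: the real handler uses Zod, `applyExtractTextDefaults` and `ExtractTextParams` are hypothetical names, and `filePathValidation` (not shown in the source) may enforce more than the non-empty check assumed here.

```typescript
// Hypothetical type mirroring the fields of the Zod schema.
interface ExtractTextParams {
  file_path: string;
  pages: string;
  preserve_formatting: boolean;
  include_metadata: boolean;
}

// Fills in the same defaults the schema declares for omitted optional fields.
function applyExtractTextDefaults(
  input: Partial<ExtractTextParams> & { file_path: string }
): ExtractTextParams {
  // Assumption: a non-empty path is the minimum requirement.
  if (input.file_path.length === 0) {
    throw new Error('file_path is required');
  }
  return {
    file_path: input.file_path,
    pages: input.pages ?? 'all',
    preserve_formatting: input.preserve_formatting ?? true,
    include_metadata: input.include_metadata ?? false,
  };
}
```

Calling `applyExtractTextDefaults({ file_path: 'report.pdf' })` yields `pages: 'all'`, `preserve_formatting: true`, and `include_metadata: false`, matching the defaults in the schema.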
src/index.ts:39-46 (registration)
Registers the extract_pdf_text tool (via extractTextTool) in the MCP server's list of available tools.
this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    extractTextTool,
    extractMetadataTool,
    extractPagesTool,
    validatePDFTool,
  ],
}));
src/index.ts:53-61 (registration)
Switch case in CallToolRequest handler that dispatches to the extract_pdf_text implementation by calling handleExtractText.
case 'extract_pdf_text':
  return {
    content: [
      {
        type: 'text',
        text: JSON.stringify(await handleExtractText(args), null, 2),
      },
    ],
  };
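
From the client side, the dispatch above implies a round trip roughly like the following: a JSON-RPC `tools/call` request naming `extract_pdf_text`, and a response whose first text item holds the pretty-printed JSON result. This is a sketch under assumptions. `buildExtractTextRequest`, `decodeExtractTextResponse`, and `ToolCallResponse` are hypothetical names, the result fields are taken from the handler excerpt, and `metadata` is typed as `unknown` because its shape is not in the source.

```typescript
// Result fields as they appear in the handler excerpt; metadata's shape is not shown.
interface ExtractTextResult {
  text: string;
  page_count: number;
  processing_time_ms: number;
  metadata?: unknown;
}

// Hypothetical minimal view of a tool-call response.
interface ToolCallResponse {
  content: Array<{ type: string; text?: string }>;
}

// Builds a JSON-RPC 2.0 `tools/call` request for this tool.
function buildExtractTextRequest(id: number, filePath: string, pages = 'all') {
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'tools/call' as const,
    params: {
      name: 'extract_pdf_text',
      arguments: { file_path: filePath, pages },
    },
  };
}

// Decodes the first text content item, which the dispatcher fills with JSON.
function decodeExtractTextResponse(response: ToolCallResponse): ExtractTextResult {
  const item = response.content.find(
    (c) => c.type === 'text' && typeof c.text === 'string'
  );
  if (!item || item.text === undefined) {
    throw new Error('Response contains no text content item');
  }
  return JSON.parse(item.text) as ExtractTextResult;
}
```

In practice an MCP client SDK would normally build and send the request itself; the helpers here only make the wire shape and the double encoding (JSON inside a text item) explicit.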

PDF Reader MCP Server
