pdf_extract_text
Extract text from PDF files with page range control for efficient content analysis and data processing.
Instructions
Extract text content from a PDF file. Returns first 10 pages by default to avoid exceeding LLM context limits. Use the 'pages' parameter for specific pages.
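The default behaviour described above can be sketched in a few lines. This is a minimal illustration, not the server's code: `DEFAULT_PAGE_LIMIT` and `selectDefaultPages` are hypothetical names, and the limit of 10 is taken from the description rather than from a visible constant.

```typescript
// Assumed value: the description says the first 10 pages are returned by default.
const DEFAULT_PAGE_LIMIT = 10;

interface DefaultSelection {
  pageIndices: number[]; // zero-based page indices to extract
  label: string; // human-readable range, e.g. "1-10"
  truncated: boolean; // true when the PDF has more pages than were selected
}

// Hypothetical helper: picks the pages used when no 'pages' argument is given.
function selectDefaultPages(totalPages: number): DefaultSelection {
  const count = Math.min(DEFAULT_PAGE_LIMIT, totalPages);
  return {
    pageIndices: Array.from({ length: count }, (_, i) => i),
    label: count === 1 ? "1" : `1-${count}`,
    truncated: count < totalPages,
  };
}

console.log(selectDefaultPages(25).label); // prints "1-10"
```

For a 25-page file the selection is truncated, which is the case where a caller would follow up with an explicit 'pages' value such as '11-20'.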
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Absolute path to the PDF file | |
| pages | No | Page range, e.g. '1-5' or '1,3,5'. Defaults to first 10 pages. | |
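As an illustration of the schema, the sketch below shows two argument objects and a small pre-flight check. The `checkArgs` helper and the example path are hypothetical and not part of the server; the length limits (4096 and 256 characters) and the rejection of unknown keys mirror the schema in the implementation reference.

```typescript
// Shape of the tool arguments, taken from the schema table.
type PdfExtractTextArgs = {
  filePath: string; // absolute path to the PDF file
  pages?: string; // e.g. "1-5" or "1,3,5"; omit for the first 10 pages
};

// Hypothetical helper (not part of the server): lists problems with an
// arguments object before it is sent to the tool.
function checkArgs(args: object): string[] {
  const record = args as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of Object.keys(record)) {
    if (key !== "filePath" && key !== "pages") {
      problems.push(`unknown key: ${key}`);
    }
  }
  const filePath = record["filePath"];
  if (typeof filePath !== "string") {
    problems.push("filePath is required and must be a string");
  } else if (filePath.length > 4096) {
    problems.push("filePath exceeds 4096 characters");
  }
  const pages = record["pages"];
  if (pages !== undefined) {
    if (typeof pages !== "string") {
      problems.push("pages must be a string");
    } else if (pages.length > 256) {
      problems.push("pages exceeds 256 characters");
    }
  }
  return problems;
}

// Example path is made up for illustration.
const firstTen: PdfExtractTextArgs = { filePath: "/data/report.pdf" };
const selected: PdfExtractTextArgs = { filePath: "/data/report.pdf", pages: "1,3,5" };

console.log(checkArgs(firstTen).length, checkArgs(selected).length); // prints "0 0"
```

The check covers only the shape of the arguments. Whether the path exists, is absolute, or points at a PDF of acceptable size is left to the server.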
Implementation Reference
- src/tools/read.ts:17-79 (handler): The `pdf_extract_text` tool is registered and implemented within `src/tools/read.ts`. The handler validates the input path and file size, determines the page range, extracts the text through the `extractPdfText` service function, and returns the result or an error message.
```typescript
server.registerTool(
  "pdf_extract_text",
  {
    description:
      "Extract text content from a PDF file. Returns first 10 pages by default to avoid exceeding LLM context limits. Use the 'pages' parameter for specific pages.",
    inputSchema: z
      .object({
        filePath: z.string().max(4096).describe("Absolute path to the PDF file"),
        pages: z
          .string()
          .max(256)
          .optional()
          .describe(
            "Page range, e.g. '1-5' or '1,3,5'. Defaults to first 10 pages."
          ),
      })
      .strict(),
    annotations: {
      readOnlyHint: true,
      destructiveHint: false,
      idempotentHint: true,
      openWorldHint: false,
    },
  },
  async ({ filePath, pages }) => {
    try {
      const resolvedPath = await validatePdfPath(filePath);
      await validateFileSize(resolvedPath);

      const totalPages = await getPdfPageCount(resolvedPath);
      let pageIndices: number[];
      let extractedLabel: string;

      if (pages) {
        pageIndices = parsePageRange(pages, totalPages);
        extractedLabel = pages;
      } else {
        const count = Math.min(DEFAULT_EXTRACT_PAGES, totalPages);
        pageIndices = Array.from({ length: count }, (_, i) => i);
        extractedLabel = count === 1 ? "1" : `1-${count}`;
      }

      const result = await extractPdfText(resolvedPath, pageIndices);
      const response: Record<string, unknown> = {
        totalPages,
        extractedPages: extractedLabel,
        pages: result.pages,
      };
      if (result.pages.length < totalPages) {
        response.note = `Showing pages ${extractedLabel} of ${totalPages}. Request specific pages for more.`;
      }
      return toolSuccess(response);
    } catch (error) {
      return toolError(
        error instanceof Error ? error.message : String(error)
      );
    }
  }
);
```
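The handler delegates page-range parsing to `parsePageRange`, whose body is not shown here. The sketch below is one plausible way to turn a string such as '1-5' or '1,3,5' into page indices. It is an assumption, not the server's implementation: the name `parsePageRangeSketch`, the error messages, and the de-duplication and sorting are all hypothetical. The zero-based output is inferred from the default branch of the handler, which builds indices starting at 0.

```typescript
// Hypothetical sketch of a page-range parser; the real parsePageRange in the
// server is not shown and may behave differently.
function parsePageRangeSketch(spec: string, totalPages: number): number[] {
  const indices = new Set<number>();
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    // Accept a single page ("3") or an inclusive range ("1-5").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range "${part}" is outside 1-${totalPages}`);
    }
    for (let page = start; page <= end; page++) {
      indices.add(page - 1); // convert 1-based page number to 0-based index
    }
  }
  // Return unique indices in ascending order.
  return Array.from(indices).sort((a, b) => a - b);
}

console.log(parsePageRangeSketch("1,3,5", 20)); // prints [ 0, 2, 4 ]
```

Throwing on malformed or out-of-bounds input fits the handler's `try`/`catch`, which converts any thrown `Error` into a `toolError` response.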