Glama

pdf_extract_text

Extract text from PDF files with page range control for efficient content analysis and data processing.

Instructions

Extract text content from a PDF file. Returns first 10 pages by default to avoid exceeding LLM context limits. Use the 'pages' parameter for specific pages.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filePath | Yes | Absolute path to the PDF file | |
| pages | No | Page range, e.g. '1-5' or '1,3,5'. Defaults to first 10 pages. | |
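
To make the two inputs concrete, here is a minimal sketch of the argument shape and a client-side length check. The `PdfExtractTextArgs` interface and the `checkArgs` helper are hypothetical names introduced for illustration and are not part of the server; the 4096 and 256 character limits are taken from the schema in the implementation reference, and the file path is a made-up example.

```typescript
// Argument shape for pdf_extract_text, mirroring the input schema.
interface PdfExtractTextArgs {
  filePath: string; // absolute path to the PDF file
  pages?: string; // e.g. "1-5" or "1,3,5"; omitted means the first 10 pages
}

// Hypothetical helper (not part of the server): lists problems with the
// arguments, using the length limits declared in the tool's schema.
function checkArgs(args: PdfExtractTextArgs): string[] {
  const problems: string[] = [];
  if (args.filePath.length > 4096) {
    problems.push("filePath exceeds 4096 characters");
  }
  if (args.pages !== undefined && args.pages.length > 256) {
    problems.push("pages exceeds 256 characters");
  }
  return problems;
}

// Default behaviour: no `pages`, so the tool returns the first 10 pages.
const firstPages: PdfExtractTextArgs = { filePath: "/home/user/docs/report.pdf" };

// Explicit selection: only pages 1, 3 and 5.
const selectedPages: PdfExtractTextArgs = {
  filePath: "/home/user/docs/report.pdf",
  pages: "1,3,5",
};

console.log(checkArgs(firstPages), checkArgs(selectedPages));
```

The check only covers what the schema itself states; whether the path is absolute and points to a readable PDF is left to the server's own validation.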

Implementation Reference

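  • The `pdf_extract_text` tool is registered and implemented in `src/tools/read.ts`. The handler validates the input path and file size (`validatePdfPath`, `validateFileSize`), reads the total page count, resolves the page range (the `pages` argument if given, otherwise the first `DEFAULT_EXTRACT_PAGES` pages), extracts the text with `extractPdfText`, and returns the result or an error message.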
    server.registerTool(
      "pdf_extract_text",
      {
        description:
          "Extract text content from a PDF file. Returns first 10 pages by default to avoid exceeding LLM context limits. Use the 'pages' parameter for specific pages.",
        inputSchema: z
          .object({
            filePath: z.string().max(4096).describe("Absolute path to the PDF file"),
            pages: z
              .string()
              .max(256)
              .optional()
              .describe(
                "Page range, e.g. '1-5' or '1,3,5'. Defaults to first 10 pages."
              ),
          })
          .strict(),
        annotations: {
          readOnlyHint: true,
          destructiveHint: false,
          idempotentHint: true,
          openWorldHint: false,
        },
      },
      async ({ filePath, pages }) => {
        try {
          const resolvedPath = await validatePdfPath(filePath);
          await validateFileSize(resolvedPath);
    
          const totalPages = await getPdfPageCount(resolvedPath);
    
          let pageIndices: number[];
          let extractedLabel: string;
    
          if (pages) {
            pageIndices = parsePageRange(pages, totalPages);
            extractedLabel = pages;
          } else {
            const count = Math.min(DEFAULT_EXTRACT_PAGES, totalPages);
            pageIndices = Array.from({ length: count }, (_, i) => i);
            extractedLabel = count === 1 ? "1" : `1-${count}`;
          }
    
          const result = await extractPdfText(resolvedPath, pageIndices);
    
          const response: Record<string, unknown> = {
            totalPages,
            extractedPages: extractedLabel,
            pages: result.pages,
          };
    
          if (result.pages.length < totalPages) {
            response.note = `Showing pages ${extractedLabel} of ${totalPages}. Request specific pages for more.`;
          }
    
          return toolSuccess(response);
        } catch (error) {
          return toolError(
            error instanceof Error ? error.message : String(error)
          );
        }
      }
    );
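
The listing calls `parsePageRange` and `DEFAULT_EXTRACT_PAGES` without showing them. The sketch below illustrates how page selection of this kind could work; it is not the repository's code. It assumes that page numbers in the `pages` string are 1-based and that the returned indices are 0-based (the default branch above builds indices starting at 0), and that `DEFAULT_EXTRACT_PAGES` is 10, as the tool description suggests. The names `parsePageRangeSketch` and `defaultPageIndicesSketch` are hypothetical.

```typescript
// Assumed value, based on "Returns first 10 pages by default".
const DEFAULT_EXTRACT_PAGES_SKETCH = 10;

// Hypothetical stand-in for parsePageRange: turns "1-5" or "1,3,5" into
// sorted, de-duplicated 0-based page indices, rejecting anything outside
// 1..totalPages. The real function may validate differently.
function parsePageRangeSketch(spec: string, totalPages: number): number[] {
  const indices: number[] = [];
  const parts = spec.split(",");
  for (let p = 0; p < parts.length; p++) {
    const part = parts[p].trim();
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range "${part}" is outside 1-${totalPages}`);
    }
    for (let page = start; page <= end; page++) {
      const index = page - 1; // 1-based page number to 0-based index
      if (indices.indexOf(index) === -1) {
        indices.push(index);
      }
    }
  }
  return indices.sort((a, b) => a - b);
}

// Mirrors the default branch of the handler: the first N pages, capped at
// the document length.
function defaultPageIndicesSketch(totalPages: number): number[] {
  const count = Math.min(DEFAULT_EXTRACT_PAGES_SKETCH, totalPages);
  const indices: number[] = [];
  for (let i = 0; i < count; i++) {
    indices.push(i);
  }
  return indices;
}

console.log(parsePageRangeSketch("1-3,5", 20));
console.log(defaultPageIndicesSketch(25).length);
```

Rejecting out-of-range pages with an error, rather than silently clamping them, is a design choice of this sketch; it fits the handler's pattern of catching errors and returning their message through `toolError`.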

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/AryanBV/pdf-toolkit-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.