read_pdf_text
Extract text from PDF files with customizable options like page selection, layout preservation, and text encoding using the PDFtotext MCP Server.
Instructions
Extract text content from a PDF file using pdftotext from poppler-utils
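As a rough illustration of what the tool does, the sketch below turns the tool's inputs into a pdftotext command line. The mapping from inputs to flags is an assumption, not the server's confirmed implementation, and `build_pdftotext_command` is a hypothetical helper. The flags themselves (`-f`, `-l`, `-layout`, `-enc`) are standard pdftotext options, and passing `-` as the output file sends the text to stdout.

```python
from typing import List, Optional


def build_pdftotext_command(
    path: str,
    page: Optional[int] = None,
    layout: bool = False,
    encoding: str = "UTF-8",
) -> List[str]:
    """Hypothetical helper: translate tool inputs into a pdftotext argv list."""
    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        # Keep the original physical layout of the text.
        cmd.append("-layout")
    if page is not None:
        # Restrict extraction to one page by setting first and last page equal.
        cmd += ["-f", str(page), "-l", str(page)]
    # "-" as the output file makes pdftotext write to stdout.
    cmd += [path, "-"]
    return cmd
```

The resulting list could be handed to `subprocess.run` on a machine where poppler-utils is installed.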
Input Schema
Name | Required | Description | Default |
---|---|---|---|
encoding | No | Text encoding for output (default: UTF-8) | UTF-8 |
layout | No | Preserve original text layout formatting (default: false) | false |
page | No | Specific page number to extract (1-based indexing). If not specified, extracts all pages. | |
path | Yes | Path to the PDF file (relative to current working directory or absolute path) | |
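To show how these inputs fit together, here is a minimal sketch of a request that calls the tool. It assumes the standard MCP convention of a JSON-RPC 2.0 `tools/call` request; the file name `docs/report.pdf` and the request `id` are made up for illustration.

```python
import json

# Hypothetical arguments: the file name is invented for this example.
arguments = {
    "path": "docs/report.pdf",  # the only required input
    "page": 2,                  # 1-based; omit to extract all pages
    "layout": True,             # preserve the original text layout
    "encoding": "UTF-8",        # one of UTF-8, Latin1, ASCII
}

# MCP tool invocations are JSON-RPC 2.0 requests using the "tools/call" method.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_pdf_text", "arguments": arguments},
}

payload = json.dumps(request, indent=2)
print(payload)
```

Only `path` is required, so a request with `{"path": "docs/report.pdf"}` as its arguments would extract every page with the default encoding and no layout preservation.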
Input Schema (JSON Schema)
{
  "properties": {
    "encoding": {
      "default": "UTF-8",
      "description": "Text encoding for output (default: UTF-8)",
      "enum": [
        "UTF-8",
        "Latin1",
        "ASCII"
      ],
      "type": "string"
    },
    "layout": {
      "default": false,
      "description": "Preserve original text layout formatting (default: false)",
      "type": "boolean"
    },
    "page": {
      "description": "Specific page number to extract (1-based indexing). If not specified, extracts all pages.",
      "minimum": 1,
      "type": "number"
    },
    "path": {
      "description": "Path to the PDF file (relative to current working directory or absolute path)",
      "type": "string"
    }
  },
  "required": [
    "path"
  ],
  "type": "object"
}
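The constraints in this schema can be checked on the client side before a call is made. The sketch below is a hand-written check that mirrors the schema, not the server's own validation; `validate_arguments` and its error messages are hypothetical. Because the schema types `page` as `number`, the sketch accepts non-integer values just as the schema does, and it ignores unknown keys since the schema does not forbid them.

```python
from typing import Any, Dict, List

ALLOWED_ENCODINGS = ("UTF-8", "Latin1", "ASCII")


def validate_arguments(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors: List[str] = []

    # "path" is the only required property and must be a string.
    if "path" not in args:
        errors.append("path is required")
    elif not isinstance(args["path"], str):
        errors.append("path must be a string")

    # "encoding" is restricted to the enum values.
    if "encoding" in args and args["encoding"] not in ALLOWED_ENCODINGS:
        errors.append("encoding must be one of " + ", ".join(ALLOWED_ENCODINGS))

    if "layout" in args and not isinstance(args["layout"], bool):
        errors.append("layout must be a boolean")

    if "page" in args:
        page = args["page"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(page, bool) or not isinstance(page, (int, float)):
            errors.append("page must be a number")
        elif page < 1:
            errors.append("page must be at least 1")

    return errors
```

A caller might run this before sending a request and refuse to send when the returned list is non-empty.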