extract_text
Extract text from specific pages of a PDF or from the entire document. Specify start and end pages for targeted extraction, or omit both to extract every page. Returns a string for a single page, or a dictionary mapping page numbers to page text for multiple pages.
Instructions
Extract text from PDF pages
Args:
    pdf_path: Path to the PDF file
    start_page: First page to extract (0-indexed). If None, extraction starts from the first page.
    end_page: Last page to extract (0-indexed, inclusive). If None, extraction stops at start_page when start_page is given; otherwise all pages are extracted.
Returns:
    If extracting a single page: string containing the page text
    If extracting multiple pages: dictionary mapping page numbers to page text
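The defaulting rules above can be easier to follow as code. The sketch below is a minimal illustration of the documented behavior, not the tool's actual implementation: `resolve_page_range` and `shape_result` are hypothetical helper names, and `page_count` is an assumed input giving the number of pages in the document. How the real tool treats a one-page document extracted in full is not stated in the source; the sketch assumes it returns a string.

```python
from typing import Dict, Optional, Union


def resolve_page_range(
    start_page: Optional[int], end_page: Optional[int], page_count: int
) -> range:
    """Return the 0-indexed pages selected by the documented defaults (hypothetical helper)."""
    # Neither bound given: extract every page.
    if start_page is None and end_page is None:
        return range(page_count)
    # A missing start means "from the first page".
    first = 0 if start_page is None else start_page
    # A missing end means "stop at start_page", i.e. a single page.
    last = first if end_page is None else end_page
    # end_page is inclusive, so add 1 for Python's half-open range.
    return range(first, last + 1)


def shape_result(pages: Dict[int, str]) -> Union[str, Dict[int, str]]:
    """Return a plain string for one page, else the page-number mapping (assumed rule)."""
    if len(pages) == 1:
        return next(iter(pages.values()))
    return pages
```

Under these assumptions, `resolve_page_range(1, None, 5)` selects only page 1, while `resolve_page_range(None, 1, 5)` selects pages 0 and 1.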
Input Schema
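Name | Required | Description | Default |
---|---|---|---|
end_page | No | Last page to extract (0-indexed, inclusive) | null |
pdf_path | Yes | Path to the PDF file | |
start_page | No | First page to extract (0-indexed) | null |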
Input Schema (JSON Schema)
{
  "properties": {
    "end_page": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "End Page"
    },
    "pdf_path": {
      "title": "Pdf Path",
      "type": "string"
    },
    "start_page": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Start Page"
    }
  },
  "required": [
    "pdf_path"
  ],
  "title": "extract_textArguments",
  "type": "object"
}
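As a rough illustration of what this schema accepts, the following sketch checks an arguments dictionary by hand using only the standard library. `check_arguments` is a hypothetical helper written for this example, not part of the tool; a full JSON Schema validator would be the more robust choice in practice.

```python
from typing import Any, Dict, List


def check_arguments(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments fit the schema above."""
    problems = []
    # pdf_path is the only required property and must be a string.
    if "pdf_path" not in args:
        problems.append("pdf_path is required")
    elif not isinstance(args["pdf_path"], str):
        problems.append("pdf_path must be a string")
    # start_page and end_page are optional and may be an integer or null (None).
    for name in ("start_page", "end_page"):
        value = args.get(name)
        # bool is a subclass of int in Python, but JSON Schema "integer" excludes booleans.
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        if value is not None and not is_integer:
            problems.append(f"{name} must be an integer or null")
    return problems
```

For example, `check_arguments({"pdf_path": "report.pdf", "start_page": 0, "end_page": 2})` returns an empty list, where `report.pdf` is a placeholder file name.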