read_pdf
Extract text and metadata from PDF documents using OCR, returning structured markdown content with bounding boxes and page data for AI processing.
Instructions
Read a PDF document and return the complete OCRResponse as a dictionary.
Returns the full OCR response including all pages, not just the first page. The response includes pages with markdown content, bounding boxes, and other OCR metadata.
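As a rough illustration of how a caller might consume the returned dictionary, the sketch below joins the markdown of every page into one string. The top-level keys (`error`, `pages`) follow the handler listed under Implementation Reference; the per-page `index` and `markdown` fields are assumptions about the OCR page objects, and `pages_to_markdown` and the sample data are hypothetical.

```python
from typing import Any


def pages_to_markdown(response: dict[str, Any]) -> str:
    """Join the markdown of every page in a read_pdf-style response."""
    # The handler returns {"error": ...} when OCR fails.
    if "error" in response:
        raise ValueError(response["error"])
    pages = response.get("pages", [])
    # Assumed: each page is a dict with "index" and "markdown" keys.
    ordered = sorted(pages, key=lambda page: page.get("index", 0))
    return "\n\n".join(page.get("markdown", "") for page in ordered)


# Hand-written sample, not real OCR output.
sample = {
    "status": "success",
    "document_path": "/tmp/example.pdf",
    "total_pages": 2,
    "pages": [
        {"index": 1, "markdown": "Second page"},
        {"index": 0, "markdown": "# First page"},
    ],
    "metadata": {},
}

combined = pages_to_markdown(sample)
print(combined)
```

Checking for the `error` key first matters because the failure dictionary carries no `status` or `pages` entry.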
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| absolute_path | Yes | | |
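The only input is `absolute_path`, which the handler's signature types as a string. A caller may want to check the path before invoking the tool; the following is a minimal sketch, where `check_pdf_path` is a hypothetical helper and the `.pdf` suffix check is an assumption rather than something the tool itself enforces.

```python
from pathlib import Path


def check_pdf_path(raw: str) -> Path:
    """Return raw as a Path if it is an absolute path to an existing .pdf file."""
    path = Path(raw)
    if not path.is_absolute():
        raise ValueError(f"not an absolute path: {raw}")
    # Assumption: only files with a .pdf suffix are accepted.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a .pdf file: {raw}")
    if not path.is_file():
        raise FileNotFoundError(raw)
    return path
```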
Implementation Reference
- main.py:103-126 (handler)
  The handler function for the 'read_pdf' tool. It uses the Lizeur class to perform OCR on the PDF file at the given absolute path and returns a structured dictionary with the OCR results including pages and metadata. The @mcp.tool() decorator registers it as an MCP tool.

      @mcp.tool()
      def read_pdf(absolute_path: str) -> dict:
          """Read a PDF document and return the complete OCRResponse as a dictionary.

          Returns the full OCR response including all pages, not just the first page.
          The response includes pages with markdown content, bounding boxes, and other OCR metadata.
          """
          ocr_response = Lizeur().read_document(Path(absolute_path))
          if ocr_response is None:
              return {"error": "Failed to process document"}

          response_data = ocr_response.model_dump()

          return {
              "status": "success",
              "document_path": absolute_path,
              "total_pages": len(response_data.get("pages", [])),
              "pages": response_data.get("pages", []),
              "metadata": {
                  k: v
                  for k, v in response_data.items()
                  if k != "pages" and k != "model_dump_json"
              },
          }

- main.py:33-64 (helper)
  Core helper method in Lizeur class that handles reading and caching of OCR results for documents, used by the read_pdf tool.

      def read_document(self, path: Path) -> OCRResponse | None:
          """Read a document and return the OCRResponse."""
          logging.info(f"read_document: Reading document {path.name}")

          # Check if the document is already cached
          cached_document_path = self.cache_path / path.name
          if cached_document_path.exists():
              logging.info(f"read_document: Document {path.name} is already cached.")
              try:
                  with open(cached_document_path, "r") as f:
                      cached_json = f.read()
                      # Parse JSON and reconstruct OCRResponse
                      cached_data = json.loads(cached_json)
                      return OCRResponse.model_validate(cached_data)
              except (json.JSONDecodeError, ValueError) as e:
                  logging.warning(f"Failed to load cached document {path.name}: {e}")
                  # Remove corrupted cache file
                  cached_document_path.unlink(missing_ok=True)

          # OCR the document
          ocr_response = self._ocr_document(path)
          if ocr_response is None:
              return None

          # Cache the document using model_dump_json() for direct JSON serialization
          try:
              with open(cached_document_path, "w") as f:
                  f.write(ocr_response.model_dump_json(indent=2))
              logging.info(f"Successfully cached document {path.name}")
          except Exception as e:
              logging.error(f"Failed to cache document {path.name}: {e}")

          return ocr_response

- main.py:66-100 (helper)
  Private helper method that performs the actual OCR using the MistralAI API, including upload, processing, and cleanup. It is called by read_document.

      def _ocr_document(self, path: Path) -> OCRResponse | None:
          """OCR a document and return the OCRResponse."""
          try:
              # Upload the file to MistralAI
              uploaded_file = self.mistral.files.upload(
                  file={
                      "file_name": path.stem,
                      "content": path.read_bytes(),
                  },
                  purpose="ocr",
              )

              # Process the uploaded file with OCR
              ocr_response = self.mistral.ocr.process(
                  document={
                      "type": "file",
                      "file_id": uploaded_file.id,
                  },
                  model="mistral-ocr-latest",
                  include_image_base64=True,
              )

              # Clean up the uploaded file
              try:
                  self.mistral.files.delete(uploaded_file.id)
              except Exception as e:
                  logging.warning(
                      f"Failed to delete uploaded file {uploaded_file.id}: {e}"
                  )

              return ocr_response

          except Exception as e:
              logging.error(f"OCR processing failed for {path}: {e}")
              return None

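The caching flow in `read_document` can be sketched independently of the Mistral client. The version below is an illustration under stated assumptions, not the project's code: `cached_ocr` and `fake_ocr` are hypothetical, the OCR call is replaced by a plain callable returning a dictionary, and the cache file gets a `.json` suffix, whereas the original writes to `cache_path / path.name`.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cached_ocr(
    path: Path,
    cache_dir: Path,
    ocr: Callable[[Path], Optional[dict]],
) -> Optional[dict]:
    """Return the cached result for path, running ocr only on a cache miss."""
    cache_file = cache_dir / (path.name + ".json")
    if cache_file.exists():
        try:
            return json.loads(cache_file.read_text())
        except json.JSONDecodeError:
            # Corrupted cache entry: drop it and fall through to a fresh OCR run.
            cache_file.unlink(missing_ok=True)

    result = ocr(path)
    if result is None:
        return None  # Failures are not cached.

    cache_file.write_text(json.dumps(result, indent=2))
    return result


calls = []


def fake_ocr(path: Path) -> Optional[dict]:
    """Hypothetical stand-in for the real OCR request; records each invocation."""
    calls.append(path.name)
    return {"pages": [{"index": 0, "markdown": "# Title"}]}


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = cached_ocr(Path("/docs/report.pdf"), cache_dir, fake_ocr)
    second = cached_ocr(Path("/docs/report.pdf"), cache_dir, fake_ocr)

# The second call is served from the cache, so fake_ocr ran only once.
print(len(calls))
```

Because the cache key here is only the file name, as in the original helper, two different files that share a name would share one cache entry; keying on a hash of the resolved path or of the file contents is one way to avoid that.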