FPD_get_document_content_with_mistral_ocr
Extract text from USPTO petition documents using hybrid extraction: free PyPDF2 for text-based PDFs, Mistral OCR for scanned documents. Analyze legal arguments, issues, and patterns in petition decisions.
Instructions
Extract the full text of a USPTO petition document using hybrid extraction: PyPDF2 first, with Mistral OCR as the fallback.
PREREQUISITE: First call fpd_get_petition_details to obtain the document_identifier from the petition's documentBag. Cost is optimized automatically: text-based PDFs are extracted with PyPDF2 at no cost, and Mistral OCR (~$0.001/page) is used only for scanned documents. MISTRAL_API_KEY is optional; without it, only PyPDF2 extraction is available, which works well for text-based PDFs but returns little or no text for image-only scans.
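A minimal sketch of the PyPDF2-first decision, assuming a simple characters-per-page heuristic to tell text-based PDFs from scans. The threshold `MIN_CHARS_PER_PAGE`, the method labels, and the `ocr` callable (a stand-in for the Mistral OCR request, which is not shown) are hypothetical; this page does not document the tool's actual detection logic.

```python
import io
from dataclasses import dataclass
from typing import Callable, List, Optional

# Approximate Mistral OCR price quoted in this description; check current pricing.
MISTRAL_COST_PER_PAGE_USD = 0.001
# Hypothetical heuristic: PDFs averaging fewer characters per page are treated as scanned.
MIN_CHARS_PER_PAGE = 100


@dataclass
class ExtractionResult:
    extracted_content: str
    extraction_method: str  # hypothetical labels: "pypdf2" or "mistral_ocr"
    processing_cost_usd: float
    page_count: int


def pypdf2_page_texts(pdf_bytes: bytes) -> List[str]:
    """Extract per-page text with PyPDF2 (free, local). Imported lazily."""
    from PyPDF2 import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() yields little or nothing for image-only (scanned) pages.
    return [page.extract_text() or "" for page in reader.pages]


def looks_text_based(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True if the PDF appears to have an embedded text layer."""
    if not page_texts:
        return False
    avg_chars = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    return avg_chars >= min_chars


def hybrid_extract(
    page_texts: List[str],
    ocr: Optional[Callable[[], str]] = None,
    auto_optimize: bool = True,
) -> ExtractionResult:
    """Use PyPDF2 text when it looks usable; otherwise fall back to paid OCR.

    `ocr` stands in for a Mistral OCR call and is None when MISTRAL_API_KEY is unset.
    """
    pages = len(page_texts)
    pypdf2_result = ExtractionResult("\n".join(page_texts), "pypdf2", 0.0, pages)
    if ocr is None:
        return pypdf2_result  # no API key: PyPDF2 is the only option
    if auto_optimize and looks_text_based(page_texts):
        return pypdf2_result  # free path succeeded, so skip OCR
    cost = round(pages * MISTRAL_COST_PER_PAGE_USD, 6)
    return ExtractionResult(ocr(), "mistral_ocr", cost, pages)
```

In practice you would pass `pypdf2_page_texts(pdf_bytes)` as `page_texts`. Keeping the decision separate from PDF parsing makes the heuristic easy to test and tune.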
USE CASES:
- Analyze petition legal arguments and the Director's reasoning
- Extract petition issues, CFR rules cited, and statutory references
- Detect patterns across multiple petitions (e.g., common denial reasons)
- Correlate petition text with PTAB challenge strategies
- Profile examiner behavior from supervisory review petitions
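As an illustration of the citation and pattern-detection use cases, the sketch below pulls 37 CFR and 35 U.S.C. references out of extracted text with regular expressions and counts how many petitions cite each one. The patterns are assumptions covering common forms such as "37 CFR 1.181" and "35 U.S.C. § 133"; they drop subsections and will miss unusual formatting.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumed forms: "37 CFR 1.181", "37 C.F.R. § 1.137(a)", "35 U.S.C. 133", "35 U.S.C. § 41(a)(7)".
# Only the section number is captured; parenthetical subsections are ignored.
CFR_RE = re.compile(r"37\s+C\.?F\.?R\.?\s+(?:§+\s*)?(\d+\.\d+)")
USC_RE = re.compile(r"35\s+U\.?S\.?C\.?\s+(?:§+\s*)?(\d+)")


def extract_citations(text: str) -> Dict[str, List[str]]:
    """Return CFR and U.S.C. section numbers in the order they appear."""
    return {"cfr": CFR_RE.findall(text), "usc": USC_RE.findall(text)}


def citation_counts(documents: Iterable[str]) -> Counter:
    """Count how many documents cite each rule or statute (once per document)."""
    counts: Counter = Counter()
    for text in documents:
        found = extract_citations(text)
        cited = {f"37 CFR {s}" for s in found["cfr"]} | {f"35 U.S.C. {s}" for s in found["usc"]}
        counts.update(cited)
    return counts
```

Running `citation_counts(texts).most_common(10)` over the extracted text of, say, denied petitions is one way to surface the rules that come up most often. Counting once per document keeps a single long decision from dominating the tally.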
COST OPTIMIZATION:
- auto_optimize=True (default): try free PyPDF2 first and fall back to Mistral OCR only when needed (stated savings of about 70%; actual savings depend on the share of text-based documents)
- auto_optimize=False: always use Mistral OCR (~$0.001/page)
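To see where a savings figure like 70% can come from, here is the arithmetic under stated assumptions: PyPDF2 costs nothing, OCR costs ~$0.001/page, the hybrid path sends only scanned documents to OCR, and scans are always detected correctly. Under those assumptions the savings fraction equals the share of pages that sit in text-based PDFs. The 7-to-3 document mix in the usage lines is hypothetical, not measured data.

```python
from typing import Sequence, Tuple

MISTRAL_COST_PER_PAGE_USD = 0.001  # approximate price quoted in this description


def estimate_costs(
    docs: Sequence[Tuple[int, bool]], price: float = MISTRAL_COST_PER_PAGE_USD
) -> Tuple[float, float, float]:
    """docs: (page_count, is_scanned) per document.

    Returns (OCR-only cost, hybrid cost, savings fraction).
    """
    total_pages = sum(pages for pages, _ in docs)
    scanned_pages = sum(pages for pages, scanned in docs if scanned)
    ocr_only = total_pages * price  # auto_optimize=False: every page is OCRed
    hybrid = scanned_pages * price  # auto_optimize=True: text-based PDFs are free
    savings = 0.0 if ocr_only == 0 else 1 - hybrid / ocr_only
    return ocr_only, hybrid, savings


# Hypothetical corpus: seven 10-page text-based PDFs and three 10-page scans.
ocr_only, hybrid, savings = estimate_costs([(10, False)] * 7 + [(10, True)] * 3)
print(f"OCR only: ${ocr_only:.3f}  hybrid: ${hybrid:.3f}  savings: {savings:.0%}")
```

A corpus that is mostly scanned would see far smaller savings, and misdetected scans (sent to PyPDF2 and returned nearly empty) would lower cost at the expense of text quality.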
Returns: extracted_content, extraction_method, processing_cost_usd, page_count
Example workflow:
1. fpd_get_petition_details(petition_id='0b71b685-...', include_documents=True)
2. fpd_get_document_content_with_mistral_ocr(petition_id='0b71b685-...', document_identifier='DSEN5APWPHOENIX')
3. Analyze the extracted text for legal arguments, issues, and patterns.
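The workflow can be scripted roughly as follows. This is a sketch under several assumptions: `call_tool` is a hypothetical stand-in for whatever MCP client you use, the `documentIdentifier` key inside `documentBag` is a guess at the response shape, and the content tool name is the lowercase form of this page's title. Check all three against your environment.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical client hook: call_tool(tool_name, arguments) -> parsed JSON response.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def document_identifiers(details: Dict[str, Any]) -> List[str]:
    """Collect identifiers from documentBag (assumed key: "documentIdentifier")."""
    return [
        doc["documentIdentifier"]
        for doc in details.get("documentBag", [])
        if doc.get("documentIdentifier")
    ]


def fetch_petition_texts(
    call_tool: ToolCaller,
    petition_id: str,
    content_tool: str = "fpd_get_document_content_with_mistral_ocr",  # assumed name
) -> Tuple[Dict[str, str], float]:
    """Step 1: fetch petition details; step 2: extract each document's text."""
    details = call_tool(
        "fpd_get_petition_details",
        {"petition_id": petition_id, "include_documents": True},
    )
    texts: Dict[str, str] = {}
    total_cost = 0.0
    for doc_id in document_identifiers(details):
        result = call_tool(
            content_tool,
            {"petition_id": petition_id, "document_identifier": doc_id, "auto_optimize": True},
        )
        texts[doc_id] = result["extracted_content"]
        total_cost += result.get("processing_cost_usd", 0.0)
    return texts, total_cost
```

The returned `texts` mapping feeds step 3 directly, and `total_cost` gives a running tally of OCR spend per petition.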
For document selection strategies and cost optimization, use FPD_get_guidance('cost').
Input Schema
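| Name | Required | Description | Default |
|---|---|---|---|
| petition_id | Yes | Petition ID (e.g., '0b71b685-...') | |
| document_identifier | Yes | Document identifier from the petition's documentBag (obtained via fpd_get_petition_details) | |
| auto_optimize | No | If True, try free PyPDF2 first and fall back to Mistral OCR; if False, use Mistral OCR directly | True |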
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
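| result | Yes | Extraction result containing extracted_content, extraction_method, processing_cost_usd, and page_count | |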
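For client code, the input and output schemas can be mirrored as typed dictionaries. This is a sketch: the field types (str, bool, float, int) are inferred from the field names and descriptions rather than taken from a published schema, and `build_args` and `cost_per_page` are hypothetical helpers.

```python
from typing import TypedDict


class _RequiredArgs(TypedDict):
    petition_id: str
    document_identifier: str


class DocumentContentArgs(_RequiredArgs, total=False):
    auto_optimize: bool  # optional; the tool defaults it to True


class DocumentContentResult(TypedDict):
    extracted_content: str
    extraction_method: str
    processing_cost_usd: float
    page_count: int


def build_args(
    petition_id: str, document_identifier: str, auto_optimize: bool = True
) -> DocumentContentArgs:
    """Validate the two required fields before sending a request."""
    if not petition_id or not document_identifier:
        raise ValueError("petition_id and document_identifier are both required")
    return {
        "petition_id": petition_id,
        "document_identifier": document_identifier,
        "auto_optimize": auto_optimize,
    }


def cost_per_page(result: DocumentContentResult) -> float:
    """Average OCR spend per page; 0.0 for empty documents."""
    if result["page_count"] == 0:
        return 0.0
    return result["processing_cost_usd"] / result["page_count"]
```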