paperless_extract_field
Extract a specific field from a document using a built-in extractor or a custom regex. The OCR text is parsed locally and only the extracted value is returned, so the full document content stays private.
Instructions
PRIVACY TIER 2 (local extraction): Extract ONLY a specific field from a document. The OCR text is fetched into the MCP server process and parsed LOCALLY; only the extracted value(s) are returned, so the full content never enters the model context. extraction_pattern may be a named built-in extractor (dollar_amounts, dates, emails, phone_numbers, addresses, total_amount) or a custom JavaScript-style regular expression. This is the extensibility point where a future local LLM extraction backend would plug in.
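The choice between a named extractor and a raw regex can be pictured as a simple lookup. The following is a minimal sketch, not the server's actual code: `resolvePattern`, `ResolvedPattern`, and `BUILT_IN_EXTRACTORS` are hypothetical names, and treating every non-matching string as a raw regex is an assumption.

```typescript
// Names taken from the tool description; the constant itself is hypothetical.
const BUILT_IN_EXTRACTORS = [
  "dollar_amounts",
  "dates",
  "emails",
  "phone_numbers",
  "addresses",
  "total_amount",
] as const;

type BuiltInName = typeof BUILT_IN_EXTRACTORS[number];

type ResolvedPattern =
  | { kind: "builtin"; name: BuiltInName }
  | { kind: "regex"; source: string };

// Assumption: an exact name match selects a built-in extractor;
// anything else is treated as a raw regular-expression source.
function resolvePattern(extractionPattern: string): ResolvedPattern {
  const name = BUILT_IN_EXTRACTORS.find((n) => n === extractionPattern);
  return name !== undefined
    ? { kind: "builtin", name }
    : { kind: "regex", source: extractionPattern };
}
```

Under this sketch, `resolvePattern("dates")` would select a built-in extractor, while `resolvePattern("INV-\\d+")` would be handled as a raw regex.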
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| document_id | Yes | Paperless document ID. | |
| extraction_pattern | Yes | Named built-in extractor (dollar_amounts, dates, emails, phone_numbers, addresses, total_amount) OR a raw regex. For a raw regex, all matches are returned; if the pattern has a capture group, capture group 1 is returned for each match instead of the full match. | |
| regex_flags | No | Optional flags for a raw-regex pattern. | gi |
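To illustrate the raw-regex behavior described in the table, here is a minimal sketch of how "all matches, capture group 1 if present" might work with the default `gi` flags. The helper `extractMatches` is hypothetical and is not the server's implementation; adding the `g` flag when the caller omits it is an assumption made so that every match is collected.

```typescript
// Hypothetical helper illustrating the documented raw-regex semantics.
function extractMatches(
  text: string,
  pattern: string,
  flags: string = "gi",
): string[] {
  // Assumption: force the global flag so that exec() walks every match.
  const effectiveFlags = flags.includes("g") ? flags : flags + "g";
  const re = new RegExp(pattern, effectiveFlags);
  const results: string[] = [];

  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    // Guard against infinite loops on zero-length matches.
    if (m[0] === "") re.lastIndex++;
    // Return capture group 1 when it participated in the match,
    // otherwise fall back to the full match.
    results.push(m[1] !== undefined ? m[1] : m[0]);
  }
  return results;
}

// Example: only the invoice numbers leave the function, not the OCR text.
const ocrText = "Invoice INV-1042, see also invoice inv-2001";
const invoiceNumbers = extractMatches(ocrText, "invoice\\s+(INV-\\d+)");
```

Because the default flags include `i`, the second invoice number is matched even though it is lowercase in the text. The equivalent tool call would pass the same pattern as `extraction_pattern` together with a `document_id`.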