We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/nfsarch33/pdf-mcp-server'
If you have feedback or need assistance with the MCP directory API, please join our Discord server
# Changelog
All notable changes to this project will be documented in this file.
This project follows Keep a Changelog and Semantic Versioning.
## Unreleased
## 1.2.18 - 2026-02-13
### Fixed
- **BUG-011**: `server_version` in `get_llm_backend_info` returned stale pip-installed version ("1.2.0") instead of current version from `pyproject.toml`. ([#41](https://github.com/nfsarch33/pdf-mcp-server/issues/41))
- Root cause: `importlib.metadata.version()` reads the version from the last `pip install`, not the current source checkout.
- Fix: `pdf_mcp/__init__._get_version()` now reads directly from `pyproject.toml` (single source of truth) with `importlib.metadata` as fallback for non-editable installs.
- 2 new tests in `TestServerVersionExposure`: `test_version_matches_pyproject`, `test_version_from_init_matches_pyproject`.
### Added
- **Passport surname cleanup integration tests**: 2 new tests in `TestPassportSurnameSingleLetterCleanup` to verify MRZ garbled delimiter recovery at the `_extract_passport_fields` level.
- **Release SOP**: Added "Step 5: Version consistency check" and "Step 8: pip reinstall" to release-sop skill (lesson learned from BUG-011).
- Full regression: 437 passed, 5 skipped, 0 failures.
## 1.2.17 - 2026-02-13
### Fixed
- **MRZ garbled delimiter recovery**: When OCR misreads one `<` in the `<<` surname/given separator as a letter (e.g., `K`), `_parse_mrz_names` now detects `<X<` patterns and recovers the correct split. Fixes LIAN surname contamination ("LIAN K JIZHI" -> "LIAN" / "JIZHI").
- 5 new tests in `TestMRZGarbledDelimiterHeuristic`: K-garbled, L-garbled, normal split preserved, legitimate single-letter name, single-name passport.
### Added
- **Server version in `get_llm_backend_info`**: Response now includes `server_version` field for deployment verification (recommendation from v1.2.16 E2E report).
- 2 new tests in `TestServerVersionExposure`.
- Full regression: 433 passed, 5 skipped, 0 failures.
## 1.2.16 - 2026-02-16
### Fixed
- **BUG-009 (CRITICAL)**: `extract_tables` crash with `"object of type 'TableFinder' has no len()"`.
- PyMuPDF >= 1.24 returns `TableFinder` object instead of list from `page.find_tables()`.
- New `_get_tables_list()` helper normalizes `TableFinder`, plain list, and iterable returns to a plain list.
- Applied to both initial table detection and BUG-007 text-strategy retry path.
- 5 new tests: `TestTableFinderCompat` (4 unit) + `TestExtractTablesTableFinder` (1 integration).
- **consensus_runs not exposed in MCP schema**: Added `consensus_runs` parameter to `server.py` `extract_structured_data()` wrapper with default=1.
- 1 new test: `TestConsensusRunsMCPExposure`.
- Now accessible via MCP tool calls (e.g., `extract_structured_data(..., consensus_runs=3)`).
- **MRZ name parsing**: Added 2 new regression tests for `_parse_mrz_names` with garbled `<<` delimiter scenarios.
- Full regression: 428 passed, 5 skipped, 0 failures.
## 1.2.15 - 2026-02-16
### Added
- **Multi-run VLM consensus**: New `_vlm_field_consensus()` helper computes per-field majority vote across N VLM runs for stochastic error reduction.
- Case-insensitive matching: normalizes values for comparison, preserving original casing in output.
- VLM null-string filtering: excludes "NULL", "None", "N/A" placeholder values from voting.
- Confidence mapping: unanimous (0.95), majority (0.85), no-majority (0.60), single-run (0.85).
- 8 new tests in `TestVLMFieldConsensus`: unanimous, majority-2/3, null-excluded, all-null, empty, single, tie, case-insensitive.
- 1 new test in `TestConsensusRunsParam`: verifies backward-compatible default of `consensus_runs=1`.
- **`consensus_runs` parameter** on `extract_structured_data()`: controls number of VLM calls for passport extraction (default: 1, no behavior change).
- When `consensus_runs > 1`, runs VLM N times and applies `_vlm_field_consensus()` before merging into MRZ-derived data.
- **Skills updates**: Added workflow checklists to `release-sop` skill per Anthropic official guide. Updated `pdf-form-filling` skill with `consensus_runs` parameter reference.
- Full regression: 420 passed, 5 skipped, 0 failures.
## 1.2.14 - 2026-02-12
### Added
- **MRZ `<<` delimiter parsing helper**: New `_parse_mrz_names()` extracts surname and given names from MRZ name fields with OCR-resilient fallback strategies.
- Standard `<<` delimiter split (ICAO 9303 spec).
- OCR artifact normalization: `< <` (space-inserted) automatically converted to `<<` before parsing.
- Fallback: when no delimiter found, treats entire field as surname (single-name passports).
- Replaces inline splitting logic in both TD3 and TD1 code paths (DRY refactor).
- 8 new tests in `TestMRZNameParsing`: standard split, single name, no delimiter, spaced chevron, empty given, multi-word surname, empty input, all fillers.
- **Cursor Skills audit and updates** (following Anthropic official guide):
- Fixed `memo-kb-sync` skill: corrected global-kb path from `~/Code/zendesk/global-kb` to `~/Code/global-kb`.
- Updated `llm-e2e-qa` skill: corrected VLM model reference to Qwen2.5-VL-7B-Instruct.
- Updated `cross-platform-setup` skill: corrected VLM model reference.
- Updated all skill descriptions to third-person per official guide.
- New `pdf-form-filling` skill: form filling workflow, MCP tool reference, and critical rules for cross-instance sharing.
- Updated `FORM_FILLING_PROMPT.md` to v1.2.14 with current VLM model and fill diagnostics.
- Full regression: 411 passed, 5 skipped, 0 failures.
## 1.2.13 - 2026-02-12
### Fixed
- **BUG-007 (MEDIUM)**: `extract_tables` column splitting fails on PDFs without ruling lines.
- New `_is_collapsed_table()` helper detects when all data concatenates into the first column.
- Auto-retries with `"text"` strategy when collapse is detected.
- New `strategy` parameter: `"lines"` (default), `"lines_strict"`, or `"text"`.
- 7 new tests in `TestCollapsedTableDetection` + 2 in `TestExtractTablesStrategy`.
- **BUG-008 (LOW)**: `analyze_pdf_content` entity extraction misidentifies IDs as phones, returns document fragments as names.
- New `_extract_phones()` helper requires separator characters (rejects bare 10-digit reference numbers).
- New `_extract_names()` helper filters newline-spanning matches and common document/form words.
- 6 new tests in `TestEntityExtractionPatterns`.
- Full regression: 403 passed, 5 skipped, 0 failures.
## 1.2.12 - 2026-02-12
### Fixed
- **BUG-006a (HIGH)**: v1.2.11 regression -- VLM returning literal "NULL" string accepted at 0.85 confidence, overwriting valid MRZ-recovered names.
- New `_is_vlm_null_string()` helper filters VLM placeholder strings ("NULL", "None", "N/A", empty) before they can overwrite extracted data.
- Reduced `_sanitize_mrz_name()` penalty from 0.2 to 0.05 for successful recoveries, keeping confidence >= 0.7 to avoid unnecessary VLM fallback.
- 9 new tests in `TestVLMNullStringFilter` + 2 new confidence-threshold tests in `TestMRZNameSanitization`.
- Full regression: 388 passed, 5 skipped, 0 failures.
## 1.2.11 - 2026-02-12
### Fixed
- **BUG-006 (HIGH)**: Passport `given_names` and `surname` containing OCR garbage characters from MRZ filler misreads.
- New `_sanitize_mrz_name()` helper detects and removes OCR-garbled filler characters (repeated chars like "sssss", non-alpha symbols like "££").
- Applied to both TD3 (passport) and TD1 (ID card) MRZ extraction paths (DRY).
- Names with detected garbage get reduced confidence, triggering VLM fallback for name extraction.
- 8 new tests in `TestMRZNameSanitization`: clean names, multi-word, trailing garbage, all-garbage, empty/None, non-alpha, uppercase normalization.
## 1.2.10 - 2026-02-12
### Added
- **Form fill diagnostics**: `fill_pdf_form()` now returns `filled_fields_count`, `total_form_fields`, and `unmatched_fields` in the result dict.
- Detects data keys that don't match any form field (catches typos and wrong field names).
- New DRY helper `_compute_fill_stats()` shared by both fillpdf and pypdf code paths.
- 6 new tests in `TestFormFillDiagnostics`: matched count, total fields, typo detection, all-match, all-unmatched, empty data.
## 1.2.9 - 2026-02-12
### Added
- **LLM retry with exponential backoff**: `_call_llm()` now retries up to 2 times (3 total attempts) on transient failures with exponential backoff (1s, 2s delays).
- New `max_retries` parameter (default: 2) for caller control.
- Retry attempts and final failures logged at DEBUG level.
- 8 new tests in `TestLLMRetryLogic` covering: first-success no-retry, recovery on 2nd attempt, exhaustion, zero-retries, exponential delays, logging, and no-backend edge case.
## 1.2.8 - 2026-02-12
### Added
- **Edge case hardening for form-filling**: 14 new tests covering production-critical scenarios.
- `TestFormFillingEdgeCases` (8 tests): empty data dict, non-existent field names, Unicode values (CJK + accents), very long values, special PDF chars, file-not-found, corrupted PDF, no-AcroForm PDF.
- `TestGetFormFieldsEdgeCases` (4 tests): no-AcroForm, file-not-found, corrupted PDF, XFA error dict.
- `TestEncryptedPdfEdgeCases` (2 tests): encrypted PDF field access and fill attempts.
- All 14 edge case scenarios confirmed handled correctly by existing code — no bugs found, validating robustness.
### Validated
- Full test suite: 355 passed, 5 skipped, 0 failures (~237s) - up from 341/5 (+14 new tests, zero regressions).
## 1.2.7 - 2026-02-12
### Added
- **Structured diagnostic logging**: `pdf_tools` module now uses Python `logging` for visibility into critical decision points. All log messages use `DEBUG` level to avoid noise in production.
- Logging in `fill_pdf_form`: logs which engine completed the fill (`fillpdf` vs `pypdf`), fillpdf fallback reasons, and pypdf checkbox/button fallback triggers.
- Logging in LLM backend selection: `_get_llm_backend()` logs which backend was selected and why (override, auto-detect, or unavailable).
- Logging in LLM calls: `_call_local_llm`, `_call_ollama_llm`, `_call_openai_llm` log failure reasons instead of silently returning `None`.
- Logging in model resolution: `_resolve_local_model_name()` logs auto-detected vs default model.
- Logging in health check: `_check_local_model_server()` logs unreachable or non-200 status.
- 6 new tests in `TestStructuredLogging`: logger existence, fill engine log, backend selection log, LLM failure log, model resolution log, checkbox fallback log.
### Validated
- Full test suite: 341 passed, 5 skipped, 0 failures (~248s) - up from 335/5 (+6 new tests, zero regressions).
## 1.2.6 - 2026-02-12
### Fixed
- **AcroForm checkbox/radio button filling**: `fill_pdf_form` now properly toggles checkbox and radio button fields using `NameObject` values (`/Yes` / `/Off`) instead of `TextStringObject`. Previously, setting checkbox values via `_apply_form_field_values` wrote plain text strings, which PDF viewers ignored.
- **pypdf checkbox crash**: Wrapped `update_page_form_field_values` in try/except to handle `KeyError: '/AP'` when pypdf encounters checkbox widgets without appearance dictionaries. Our `_apply_form_field_values` serves as a robust fallback.
### Added
- `_is_button_field()`: Detects `/Btn` field type on field or parent, supporting merged widget/field structures.
- `_get_checkbox_on_value()`: Inspects `/AP` > `/N` appearances to find the correct "on" state (defaults to `/Yes`).
- `_set_button_value()`: Sets `/V` and `/AS` on button fields to proper `NameObject` values.
- 6 new tests in `TestCheckboxRadioFormFilling`: truthy/falsy/bool/X filling, text field preservation, create-and-fill roundtrip.
### Validated
- Full test suite: 335 passed, 5 skipped, 0 failures (~236s) - up from 325/9 (+6 new tests, +4 previously-skipping pypdf checkbox tests now passing, zero regressions).
## 1.2.5 - 2026-02-16
### Added
- **Field alias groups**: 19 alias groups (DOB/date_of_birth, first_name/given_name, sex/gender, phone/telephone, etc.) for score-3 matching in `fill_pdf_form_any`. Non-standard forms with common label variations now match at maximum confidence.
- `_FIELD_ALIAS_GROUPS` constant with O(1) reverse lookup via `_ALIAS_LOOKUP` dict.
- `_are_aliases()` helper for symmetric alias checking.
- `derived_fields` key in passport output: tracks fields computed (not directly read) by `_cross_validate_passport_dates`. Addresses BUG-005 recommendation #4.
- 10 new tests: alias matching (9), derived flag (1).
### Validated
- Full test suite: 325 passed, 9 skipped, 0 failures (~255s) - up from 315/9 (+10 new tests, zero regressions).
## 1.2.4 - 2026-02-16
### Fixed
- **BUG-005 (CRITICAL)**: Passport `issue_date` was off by -1 day. The VLM derived `issue_date = expiry - 10 years` but passports expire the day BEFORE the 10-year anniversary. New formula: `issue_date = expiry + 1 day - 10 years` per ICAO/international standard. Fixes [#34](https://github.com/nfsarch33/pdf-mcp-server/issues/34).
- **VLM prompt regression**: Removed instruction telling VLM to derive issue_date from expiry. VLM now instructed to READ the issue_date directly from the passport image, returning null if unreadable.
### Added
- `_derive_issue_from_expiry()`: Centralized helper for computing issue date from expiry using correct formula (`expiry + 1 day - 10 years`).
- New detection: `_cross_validate_passport_dates` now detects BOTH VLM error patterns: (1) returning expiry as issue_date, (2) deriving `issue = expiry - 10 years` (BUG-005).
- 8 new tests: BUG-005 cross-validation correction (3), VLM derivation detection (2), legitimate date preservation (1), full pipeline (1), prompt inspection (1).
- Confidence lowered to 0.50 (from 0.60) for derived dates to flag uncertainty.
### Validated
- Full test suite: 315 passed, 9 skipped, 0 failures (~235s) - up from 307/9 (+8 new tests, zero regressions).
- All bug fix tests written first (TDD red-green), then fixes implemented.
- Real-world impact: This bug caused incorrect dates in an Australian immigration form (Form 1006 BVB application). The fix prevents future misreporting.
## 1.2.3 - 2026-02-11
### Fixed
- **BUG-003 (HIGH)**: `_parse_mrz_date()` now accepts `is_expiry` parameter for context-aware century determination per ICAO 9303. Expiry year '33' correctly produces 2033, not 1933. Fixes cascading issue_date cross-validation failures.
- **BUG-004 (LOW)**: `personal_number` from MRZ positions 28-42 now filtered for OCR noise. Lines with <50% digits are discarded as garbage instead of stored as-is.
- **VLM-QUALITY-003 (MEDIUM)**: Chinese passport issuing authority post-processing: VLM responses containing "Ministry of Foreign Affairs" (intro page text) are corrected to "National Immigration Administration, PRC" when issuing_country is CHN. VLM prompt improved with data-page guidance.
### Added
- `_correct_chinese_passport_authority()`: Post-processes VLM authority responses for Chinese passports.
- 8 new tests: BUG-003 expiry century (5), BUG-004 personal_number noise (2), VLM-QUALITY-003 authority correction (1).
### Validated
- Full test suite: 307 passed, 9 skipped, 0 failures (~237s) - up from 299/9 (+8 new tests, zero regressions).
- All bug fix tests written first (TDD red-green), then fixes implemented.
## 1.2.2 - 2026-02-11
### Fixed
- **MRZ-GAP-002 (HIGH)**: MRZ line detection now tolerates OCR noise that produces lines 42-46 chars instead of exactly 44. New `_normalize_mrz_candidate()` trims or pads lines to spec length. Also accepts MRZ line 2 candidates without `<` filler (common OCR artifact).
- **MRZ-GAP-001 (MEDIUM)**: MRZ-derived fields (surname, given_names, nationality, birth_date, sex, expiry_date) now reliably appear in `passport+llm` output because the root cause (MRZ line detection failure on noisy OCR) is fixed.
- **VLM-QUALITY-001 (MEDIUM)**: New `_cross_validate_passport_dates()` detects when VLM returns the expiry date as issue_date and applies domain-knowledge correction (10-year validity rule). VLM prompt now includes MRZ expiry context to prevent confusion.
- **VLM-QUALITY-002 (MEDIUM)**: VLM enhancement prompt now also requests `passport_number` when MRZ/regex extraction fails, ensuring the VLM compensates for poor-quality OCR on second passport scans.
### Added
- `_normalize_mrz_candidate()`: Normalizes near-length MRZ lines to exact ICAO spec length.
- `_normalize_date_to_yymmdd()`: Normalizes various date formats to YYMMDD for MRZ comparison.
- `_cross_validate_passport_dates()`: Cross-validates VLM issue_date against MRZ expiry_date with domain-knowledge fallback.
- 7 new tests: MRZ-GAP-001 output completeness (2), MRZ-GAP-002 noisy OCR tolerance (3), VLM-QUALITY-001 date cross-validation (2).
### Validated
- Full test suite: 299 passed, 9 skipped, 0 failures (~230s) - up from 292/9 (+7 new tests, zero regressions).
- All bug fix tests written first (TDD red-green), then fixes implemented.
## 1.2.1 - 2026-02-11
### Fixed
- **BUG-002 (HIGH)**: `extract_structured_data(data_type="passport")` now uses VLM when available instead of early-returning with `backend: null`. Visual-zone fields (issue_date, issuing_authority, place_of_birth) are enhanced by LLM. Method reports `"passport+llm"` when VLM is used.
- **BUG-002a (MEDIUM)**: Passport number fallback regex no longer captures junk prefixes ("PASSPORT | P CHN"). Issuing authority regex no longer matches "Bearer's signature" OCR noise. Removed overly broad standalone `authority` keyword from pattern.
- **BUG-002b (LOW)**: `extract_structured_data` and `analyze_pdf_content` now correctly read `pages_extracted` from `extract_text` result instead of non-existent `page_count` key. Page counts are no longer always 0.
### Added
- 9 new tests: BUG-002 VLM integration (4), BUG-002a regex quality (3), BUG-002b page count (2).
### Validated
- Full test suite: 292 passed, 9 skipped, 0 failures (~246s) - up from 283/9 (+9 new tests, zero regressions).
- All bug fix tests written first (TDD red-green), then fixes implemented.
- E2E local VLM: all tests passing with Qwen2.5-VL-7B-Instruct on RTX 3090.
## 1.2.0 - 2026-02-10
### Added
- **MRZ checksum validation**: New `_mrz_check_digit()` and `_mrz_validate_field()` per ICAO 9303. Passport number, birth date, and expiry date checksums are validated, boosting confidence to 0.95 when valid.
- **TD1 ID card format**: `_extract_mrz_lines()` now supports TD1 (3x30 chars) and TD2 (2x36) in addition to TD3 (2x44) passports.
- **OCR error correction for MRZ**: New `_correct_mrz_ocr_errors()` fixes common OCR misreads (O->0, I->1, B->8) in digit/alpha positions per MRZ structure.
- **LCS fuzzy matching**: New `_lcs_similarity()` for normalized Longest Common Subsequence scoring. `_score_label_match` now falls back to LCS for abbreviations and fuzzy matches.
- **Multi-line text area detection**: `detect_form_fields()` now includes `detected_multiline_areas` for large blank rectangles suitable for multi-line text input.
- **18 new tests**: MRZ checksum (6), TD1 format (2), OCR correction (2), LCS similarity (6), geometric detection (2).
- Installed pyzbar Python package (libzbar0 system lib requires sudo).
### Changed
- `_extract_passport_fields()` now uses dynamic confidence based on checksum validation (0.95 for validated, 0.7 for unvalidated).
- `_score_label_match()` enhanced with LCS fallback for better non-standard form field matching.
### Validated
- Full test suite: 283 passed, 9 skipped, 0 failures (~236s) - up from 264/10 (+19 new tests, -1 skip).
- All 18 new feature tests pass.
- E2E local VLM: all tests passing with Qwen2.5-VL-7B-Instruct on RTX 3090.
- Lint: all checks passed (ruff).
### Research
- VLM model confirmed: Qwen2.5-VL-7B-Instruct is optimal for RTX 3090 (24GB). No switch needed.
- Potential future upgrade: olmOCR-7B for structured output, RolmOCR for OCR-specific fine-tune.
## 1.1.0 - 2026-02-10
### Fixed
- **Local VLM integration was non-functional**: `_call_local_llm` called non-existent `/generate` endpoint with raw prompt format. Fixed to use OpenAI-compatible `/v1/chat/completions` with proper messages array. This was the root cause of all E2E local VLM test failures.
- **`get_local_server_models` used wrong endpoint**: Called `/models` (404 on vLLM). Fixed to `/v1/models`.
- **`get_local_server_health` crashed on vLLM**: vLLM returns empty 200 from `/health`. Fixed to handle non-JSON responses gracefully.
### Added
- **Model auto-detection**: New `_resolve_local_model_name()` queries `/v1/models` to auto-detect the running model instead of requiring hardcoded `LOCAL_VLM_MODEL` config. Works with any OpenAI-compatible server.
### Changed
- `_call_local_llm` now uses proper `system`/`user` message roles instead of concatenating prompts, enabling better model behavior.
- Updated mock test to match new OpenAI chat completions response format.
### Validated
- Full test suite: 264 passed, 10 skipped, 0 failures in ~230s (best ever; was 261/13).
- E2E local VLM: 5/5 tests passing with Qwen2.5-VL-7B-Instruct on RTX 3090.
- VLM sanity: 2+2=4 in 1.7s via vLLM on RTX 3090 (24GB).
- Lint: all checks passed (ruff).
## 1.0.9 - 2026-02-10
### Changed
- Updated test count in PROJECT_STATUS_PROMPT.md (261 passed, 13 skipped).
- Updated Pepper SOPs: `pdf-mcp-server-release-sop.md` v3.0 now reflects centralized versioning (was still referencing old 6-location strategy).
- Updated `release-sop-tag-based.md` with tag rollback procedure and reference to comprehensive SOP.
### Added
- **Tag rollback procedure**: Verified workflow for deleting and recreating tags. Documented in all release SOPs.
- **VLM Apple Silicon findings**: Documented that Qwen3-VL image/complex extraction times out (>300s) on Apple Silicon via Ollama (CPU-only for vision encoder). Simple text queries work (~25s). NVIDIA GPUs required for production VLM workloads.
### Validated
- Tag-based release CI: v1.0.8 Release workflow succeeded on both original and re-created tag.
- Tag rollback: delete remote tag + release, recreate tag -> CI re-fires and creates new release.
- Full test suite: 261 passed, 13 skipped, 0 failures in 94s.
- Lint: all checks passed (ruff).
- Self-contained scripts: `setup_environment.sh` and `run_local_vlm.sh` have no external repo dependencies.
## 1.0.8 - 2026-02-10
### Changed
- **Version centralization**: Single source of truth is now `pyproject.toml`. Removed hardcoded version strings from `pdf_tools.py`, `server.py`, `README.md`, and `GLOBAL_CURSOR_INSTRUCTIONS.md`. Runtime access via `from pdf_mcp import __version__`.
- **README**: Replaced static version string with dynamic GitHub Release badge (`shields.io/github/v/release`).
- **Release workflow**: Enhanced `.github/workflows/release.yml` with test gate (runs full test suite on tagged commit) and version gate (validates tag matches pyproject.toml + CHANGELOG) before creating GitHub Release.
### Added
- `pdf_mcp/__init__.py` with `__version__` derived from `importlib.metadata.version("pdf-mcp")` at runtime.
- Tag-based release SOP documented in Pepper KB (`tag-based-release-sop.md`).
### Version Locations (reduced from 6 to 2)
- **Bump**: `pyproject.toml` only (single source)
- **Historical**: `CHANGELOG.md` (manual, section headers)
- **Derived**: `pdf_mcp.__version__`, README badge, CI scripts (all auto)
## 1.0.7 - 2026-02-10
### Fixed
- Version consistency: synchronized version strings across all 6 required locations (pyproject.toml, README.md, CHANGELOG.md, pdf_tools.py, server.py, GLOBAL_CURSOR_INSTRUCTIONS.md) - previously 4 files were stale at 1.0.4.
- Updated test count in PROJECT_STATUS_PROMPT.md to match actual results (261 passed, 13 skipped).
### Validated
- Full test suite: 261 passed, 13 skipped, 0 failures in 94s.
- E2E slow tests: 3 passed, 7 skipped.
- VLM manual test: Qwen3-VL-8B via Ollama responding correctly (~16s).
- Lint: all checks passed (ruff).
### Documentation
- Updated WSL Ubuntu onboarding guide with Qwen3-VL model references and GPU notes.
- Updated release SOP checklist to include all 6 version locations (was missing 3).
- Updated release SOP to v2.2 with recurring version mismatch lessons.
## 1.0.6 - 2026-02-10
### Changed
- Ollama `_call_ollama_llm` now sets `num_predict=4096` and `keep_alive='10m'` to prevent Qwen3 thinking tokens from consuming the entire response budget and model unloading mid-session.
### Validated
- Qwen3-VL-8B via Ollama: 100% accurate OCR on synthetic images (10.2s on M4 Pro 48GB warm cache).
- Full-page scanned document OCR via Ollama on Apple Silicon is slow (6+ min); recommended for NVIDIA GPU machines via vLLM for production use.
- Qwen3-VL-30B-A3B downloaded and available via Ollama (19GB).
- Full test suite: 258 passed, 6 skipped, 0 failures (non-slow), 3 passed E2E.
- Cross-platform scripts reviewed: `run_local_vlm.sh` (GPU-aware), `setup_environment.sh` (self-contained).
### Performance Notes
- Apple Silicon (M4 Pro 48GB): VLM works well for small images and text tasks; full-page scanned document OCR is slow through Ollama (use Tesseract for bulk OCR, VLM for accuracy-critical fields).
- NVIDIA GPUs (WSL): vLLM backend recommended for production VLM serving. `run_local_vlm.sh` auto-selects GPU with most VRAM.
## 1.0.5 - 2026-02-09
### Changed
- Default Ollama model updated from `qwen2.5:7b` (text-only) to `qwen3-vl:8b` (vision-language) for improved OCR accuracy on PDF page images.
- Fixed Ollama API compatibility: updated `_call_ollama_llm` to use Pydantic attribute access (`response.message.content`) instead of dict `.get()` for Ollama's ChatResponse.
- `_call_llm` now uses `llm_setup.get_ollama_model_name()` instead of hardcoded default, respecting environment variable overrides.
### Fixed
- Test isolation: patched all "no-LLM" integration tests to explicitly disable LLM backends, preventing real Ollama calls when model is available on the system.
- Updated mock in `test_call_ollama_llm_with_mock` to use `MagicMock` for Pydantic-compatible response objects.
### Validated
- Qwen3-VL-8B via Ollama: 100% accuracy (10/10 fields) on invoice OCR test with 73s warm-cache inference time on M4 Pro 48GB.
- Full test suite: 258 passed, 6 skipped, 0 failures in 93s (non-slow tests).
## 1.0.4 - 2026-01-31
### Added
- `get_form_templates` and `create_pdf_form_from_template` for common client workflows.
- OCR options for `extract_structured_data` (`ocr_engine`, `ocr_language`) to improve passport scans.
### Changed
- Improved non-standard form label matching and checkbox handling in `fill_pdf_form_any`.
- Expanded passport issue date/issuing authority label patterns.
## 1.0.3 - 2026-01-30
### Added
- Non-LLM passport extraction via `extract_structured_data(data_type="passport")` using MRZ parsing and label heuristics.
- Issue date and issuing authority extraction for passport scans when OCR text is available.
- Passport label-only extraction heuristics for low-quality scans without MRZ.
- XFA form detection with explicit unsupported errors in form tools.
## 1.0.2 - 2026-01-30
### Added
- README documentation for Agent Extensions (Skills, Subagents, Hooks)
- Usage examples for skills, subagents, and hooks in README
### Changed
- Updated skipped test count from 12-18 to 8 (accurate after Tesseract install)
## 1.0.1 - 2026-01-30
### Added
- **Agent Skills**: Project-level skills in `.cursor/skills/`
- `release-sop`: End-to-end release checklist automation
- `llm-e2e-qa`: LLM E2E test and manual QA instructions
- `memo-kb-sync`: Bi-directional memory sync procedures
- **Subagents**: Custom AI assistants in `.cursor/agents/`
- `verifier`: Validates completed work (skeptical validator)
- `debugger`: Root cause analysis specialist
- `test-runner`: Proactive test automation
- **Hooks**: Agent behavior control in `.cursor/hooks.json`
- `beforeShellExecution`: Blocks destructive commands (git reset --hard, rm -rf)
### Changed
- Tesseract OCR verified working (v5.5.2)
- Test coverage confirmed: 260 passed, 8 skipped (stable)
### Technical Notes
- Skills use YAML frontmatter with name and description
- Subagents can be invoked explicitly via /name syntax
- Hooks run via `block-destructive.sh` shell script
## 1.0.0 - 2026-01-30
### Milestone: Production Release
This release marks the first stable production version of pdf-mcp-server.
### Highlights
- **51 PDF tools** across 13+ categories
- **268 tests** (260 passed, 8 skipped)
- **Multi-backend LLM support**: Local VLM, Ollama, OpenAI
- **E2E verified**: All LLM backends tested with real servers
### LLM Integration (v0.9.x series)
- Local VLM server at localhost:8100 (free, recommended)
- Ollama integration with qwen2.5:1.5b (free)
- OpenAI API support (paid, optional)
- `auto_fill_pdf_form`: LLM-powered form filling
- `extract_structured_data`: Entity extraction from PDFs
- `analyze_pdf_content`: Document analysis and summarization
### Test Coverage
- Local VLM: 5/5 E2E tests passing
- Ollama: 2/2 E2E tests passing
- OpenAI: 2/2 skipped (no API key)
- Core: 260 tests passing
### Technical Notes
- pypdf form filling has known bug (tests skip gracefully)
- Recommended model: Qwen3-VL-30B-A3B for DocVQA tasks
- Idempotent model installation via `make install-llm-models`
## 0.9.9 - 2026-01-29
### Added
- Ollama integration now fully tested (installed Ollama v0.15.2 with qwen2.5:1.5b)
- pyzbar/zbar barcode detection now working
### Changed
- Test coverage improved: 260 passed, 8 skipped (was 256 passed, 12 skipped)
- E2E tests: 8 passed, 2 skipped (was 6 passed, 4 skipped)
### Technical Notes
- Remaining skips are acceptable:
- 4 pypdf bug (external library issue)
- 2 OpenAI (no API key - optional)
- 2 Tesseract (tests error handling behavior)
- All core functionality is tested and verified
## 0.9.8 - 2026-01-29
### Fixed
- Tests now gracefully skip on pypdf `AttributeError` bug in form filling
- Added try/except handling for known pypdf bug with `get_object()` on dict
### Technical Notes
- pypdf has a bug where `get_object()` is called on plain dict instead of DictionaryObject
- Affects form filling with certain PDF structures
- Tests skip with clear message instead of failing
- Bug present in pypdf 5.1.0 through 6.5.0+
## 0.9.7 - 2026-01-29
### Fixed
- `check_llm_status.py` now handles models returned as list of dicts (not just strings)
- Test mocking for `get_local_server_health/models` uses `unittest.mock.patch` correctly
- Updated start command to use `.venv/bin/activate` instead of `uv run`
### Verified (E2E with Real LLM)
- Local VLM server running at localhost:8100 with 13 models available
- All 6 E2E local VLM tests passed
- `make check-llm` shows correct model list
- 256 passed, 12 skipped
## 0.9.6 - 2026-01-29
### Changed
- **DRY cleanup**: Consolidated `LOCAL_MODEL_SERVER_URL` and `LOCAL_VLM_MODEL` into `llm_setup.py`
- `pdf_tools.py` now imports LLM config from `llm_setup.py` (single source of truth)
- Test count: 268 total (1 new test for LOCAL_VLM_MODEL)
### Technical Notes
- Follows DRY/KISS/SOLID principles for cleaner, more maintainable code
- No functional changes - refactoring only
## 0.9.5 - 2026-01-29
### Added
- `get_local_server_health()` and `get_local_server_models()` for local server diagnostics
- Enhanced Ollama E2E test skips with specific model presence checks
- Recommended model info in `make check-llm` output (Qwen3-VL-30B-A3B)
- 5 new tests for local server diagnostics (total: 267 tests)
### Changed
- Ollama E2E tests now show clear skip reasons (missing CLI, service, or model)
- `check_llm_status.py` reports loaded models when available
### Technical Notes
- Research-backed recommendation: Qwen3-VL-30B-A3B for best DocVQA (95.7%)
- MoE architecture: 30B params but only ~8B active per token
- Local server at localhost:8100 preferred (free, no API costs)
## 0.9.4 - 2026-01-29
### Added
- `pdf_mcp/llm_setup.py` helpers for Ollama model detection
- `scripts/ensure_ollama_model.py` to avoid duplicate model installs
- `make install-llm-models` target for safe model setup
### Changed
- `scripts/check_llm_status.py` output now reports model status and uses ASCII formatting
- README LLM setup now uses `make install-llm-models` (skips duplicate downloads)
- Test count: 262 total (7 new llm setup tests)
### Verified (Manual Testing)
- Local VLM: `make check-llm` shows local available
- E2E: `make test-e2e` passed for local backend (Ollama/OpenAI skipped)
## 0.9.3 - 2026-01-29
### Added
- New `scripts/check_llm_status.py` script for formatted LLM status output
- `backend_available` field in `extract_structured_data` response for transparency
- Improved `make check-llm` output with colored status and setup instructions
### Fixed
- Tests now properly handle when real LLM backends are running
- `test_auto_fill_without_any_llm_returns_error` - patches all backends
- `test_call_llm_without_any_backend_returns_none` - patches all backends
- E2E test assertions now allow pattern matching success (no LLM needed)
### Changed
- Test count: 243 passed, 12 skipped (with local server running)
- Improved test isolation for LLM backend availability
### Verified (Manual Testing)
- ✅ Local VLM backend working correctly at localhost:8100
- ✅ `get_llm_backend_info()` detects local server
- ✅ `_call_local_llm()` returns valid responses (2+2=4, capital of France=Paris)
- ✅ `extract_structured_data()` correctly extracts invoice data
- ✅ `analyze_pdf_content()` generates summaries with local LLM
- ✅ E2E tests pass with real local backend
## 0.9.2 - 2026-01-29
### Added
- **E2E Tests with Real LLM Backends** (not just mocks!)
- `TestE2ELocalVLM`: 5 tests that call actual local model server at localhost:8100
- `TestE2EOllama`: 2 tests that call actual Ollama service
- `TestE2EOpenAI`: 2 tests that call actual OpenAI API
- `TestBackendComparison`: 1 test comparing outputs across backends
- **Makefile LLM targets**:
- `make test-llm`: Run all LLM-related tests (mocked)
- `make test-e2e`: Run E2E tests with real LLM backends (requires running servers)
- `make check-llm`: Check LLM backend status
- `make install-llm`: Install LLM dependencies
- Registered `pytest.mark.slow` marker for E2E tests
- Documentation for LLM test targets in README
### Changed
- Total tests increased from 237 to 255 (+18 new tests)
- Skipped tests increased from 8 to 18 (E2E tests skip when servers not available)
- Updated README test coverage section with LLM test documentation
### Technical Notes
- E2E tests automatically skip if corresponding backend is not available
- Local VLM tests require model server at `http://localhost:8100`
- Ollama tests require Ollama service running locally
- OpenAI tests require `OPENAI_API_KEY` (incurs actual API costs!)
## 0.9.1 - 2026-01-29
### Added
- 20 new comprehensive integration tests for v0.9.0 multi-backend LLM support
- Local VLM integration tests
- Ollama integration tests
- Backend field verification tests
- Backend fallback chain tests
- Environment configuration tests
- Unified `_call_llm` routing tests
- MCP tool registration tests for v0.9.0
### Fixed
- Test compatibility with optional Ollama dependency (proper skip handling)
- Test compatibility with pypdf form filling edge cases
### Test Coverage
- Total tests increased from 217 to 237
- 8 tests skip when optional dependencies unavailable
## 0.9.0 - 2026-01-29
### Added
- **Local VLM Support**: Cost-free local model integration for agentic AI features.
- Multi-backend support: `local` (localhost:8100), `ollama`, and `openai`
- Backend auto-detection with priority: local > ollama > openai
- `get_llm_backend_info()`: Check available backends and current selection
- Environment variable `PDF_MCP_LLM_BACKEND` for backend override
- Environment variable `LOCAL_MODEL_SERVER_URL` for custom server URL
- New helper functions:
- `_check_local_model_server()`: Health check for local server
- `_call_local_llm()`: Call local model server
- `_call_ollama_llm()`: Call Ollama models
- `_get_llm_backend()`: Auto-select best available backend
- 18 new tests for multi-backend support (local VLM, Ollama, OpenAI)
### Changed
- All agentic functions now support `backend` parameter to force specific backend
- Default model changed from `gpt-4o-mini` to `auto` (auto-selects based on backend)
- Agentic functions return `backend` field indicating which LLM was used
- Total test count increased from 199 to 217
- Tool count increased from 50 to 51
### Technical Notes
- **Zero cost option**: Local VLM using Qwen3-VL-30B-A3B at localhost:8100
- **Best VLM for Mac**: Qwen3-VL-30B-A3B (MoE, 95.7% DocVQA, 16.5GB memory)
- **Cross-platform**: Ollama support for easy deployment anywhere
- All backends gracefully degrade - pattern matching works without any LLM
## 0.8.0 - 2026-01-28
### Added
- **Agentic AI Integration**: LLM-powered PDF processing capabilities.
- `auto_fill_pdf_form`: Intelligent form filling with LLM-powered field mapping. Maps source data to form fields even when names don't exactly match.
- `extract_structured_data`: Extract structured data from PDFs using pattern matching or LLM. Supports invoice, receipt, contract types and custom schemas.
- `analyze_pdf_content`: Document analysis including type classification, entity extraction (dates, amounts, names), and summarization.
- `[llm]` optional dependency group for OpenAI integration.
- 22 new tests for agentic features (unit tests with mocks + integration tests).
- LLM helper functions: `_call_llm`, `_HAS_OPENAI` flag for optional OpenAI support.
### Changed
- Total test count increased from 180 to 199 (202 collected, 3 skipped for optional deps).
- Tool count increased from 47 to 50.
- Module docstrings updated to reflect new agentic capabilities.
### Technical Notes
- Agentic features gracefully degrade without OpenAI:
- `auto_fill_pdf_form`: Falls back to direct field name matching
- `extract_structured_data`: Uses pattern-based extraction
- `analyze_pdf_content`: Uses keyword-based classification
- Pattern matching supports common document types without LLM dependency.
- LLM integration uses `gpt-4o-mini` by default for cost efficiency.
## 0.7.0 - 2026-01-26
### Removed
Deprecated tools have been removed as announced in v0.6.0:
- `insert_text`, `edit_text`, `remove_text` - Use `add_text_annotation`, `update_text_annotation`, `remove_text_annotation` instead
- `extract_text_native`, `extract_text_ocr`, `extract_text_smart`, `extract_text_with_confidence` - Use `extract_text` with appropriate `engine` parameter instead
- `split_pdf_by_bookmarks`, `split_pdf_by_pages` - Use `split_pdf` with `mode` parameter instead
- `export_to_markdown`, `export_to_json` - Use `export_pdf` with `format` parameter instead
- `get_full_metadata` - Use `get_pdf_metadata(pdf_path, full=True)` instead
### Changed
- **API Cleanup**: 12 deprecated functions removed, consolidating into 4 unified tools
- Internal implementations refactored with `_impl` helper functions for cleaner code
- All tests updated to use the new consolidated API
- Tool count reduced from 59 to 47 (cleaner API surface)
### Fixed
- Server module docstring had duplicate "PII detection" line
## 0.6.0 - 2026-01-28
### Added
- **Consolidated API**: Unified tools for cleaner, more maintainable API surface.
- `extract_text`: Unified text extraction with engine selection (native, auto, smart, ocr, force_ocr) and optional confidence scores.
- `split_pdf`: Unified PDF splitting with mode selection (pages, bookmarks).
- `export_pdf`: Unified export with format selection (markdown, json).
- `get_pdf_metadata(full=True)`: Extended metadata including document info.
- 12 new integration tests for consolidated API tools.
### Changed
- Total test count increased from 168 to 180.
### Deprecated
The following tools are deprecated and will be removed in v0.7.0:
- `insert_text`, `edit_text`, `remove_text` → Use `add_text_annotation`, `update_text_annotation`, `remove_text_annotation`
- `extract_text_native`, `extract_text_ocr`, `extract_text_smart`, `extract_text_with_confidence` → Use `extract_text`
- `split_pdf_by_bookmarks`, `split_pdf_by_pages` → Use `split_pdf`
- `export_to_markdown`, `export_to_json` → Use `export_pdf`
- `get_full_metadata` → Use `get_pdf_metadata(full=True)`
## 0.5.2 - 2026-01-29
### Added
- Signature timestamping via `timestamp_url` for `sign_pdf` and `sign_pdf_pem`.
- Revocation checks and validation embedding controls for signing.
- DocMDP permission selection (`no_changes`, `fill_forms`, `annotate`) for signed PDFs.
- Integration test coverage for timestamped and DocMDP-signed PDFs.
## 0.5.1 - 2026-01-28
### Added
- `sign_pdf`: digitally sign PDFs using PKCS#12/PFX certificates.
- `sign_pdf_pem`: digitally sign PDFs using PEM key + cert chain.
- Integration tests for certificate-based signing.
- `cryptography` dependency for test certificate generation.
## 0.5.0 - 2026-01-28
### Added
- `create_pdf_form`: create PDF files with standard AcroForm fields.
- `fill_pdf_form_any`: fill standard and non-standard forms using label detection.
- `add_highlight`: add highlight annotations by text or rectangle.
- `add_date_stamp`: add date stamps as FreeText annotations.
- `detect_pii_patterns`: detect common PII patterns (email, phone, SSN, credit cards).
- Release runbook: added PR/branch hygiene SOP.
## 0.4.1 - 2026-01-27
### Added
- `PROJECT_MEMO/PROJECT_STATUS_PROMPT.md`: status prompt for v0.3.0 planning.
- `PROJECT_MEMO/README.md`: reference to the new status prompt.
## 0.4.0 - 2026-01-27
### Added
- `reorder_pages`: reorder PDF pages with an explicit 1-based page list.
- `redact_text_regex`: redact text using a regex pattern.
- `sanitize_pdf_metadata`: remove standard and custom metadata keys.
- `export_to_markdown`: export PDF text to Markdown.
- `export_to_json`: export PDF text and metadata to JSON.
- `add_page_numbers`: add page numbers as annotations.
- `add_bates_numbering`: add Bates numbering as annotations.
- `verify_digital_signatures`: validate digital signatures in PDFs.
- `get_full_metadata`: return full metadata and document info.
## 0.3.0 - 2026-01-27
### Added
- **Link Extraction**
- `extract_links`: Extract URLs, hyperlinks, and internal references from PDFs
- Link categorization by type (uri, goto, external_goto, launch, named)
- Page-level link filtering
- **PDF Optimization**
- `optimize_pdf`: Compress/reduce PDF file size
- Quality settings: low (max compression), medium, high (min compression)
- Reports original/optimized size and compression ratio
- **Barcode/QR Code Detection**
- `detect_barcodes`: Detect and decode barcodes and QR codes in PDFs
- Supports QR codes, Code128, Code39, EAN13, EAN8, UPC-A, etc.
- Requires optional pyzbar library
- **Page Splitting**
- `split_pdf_by_bookmarks`: Split PDFs by table of contents/bookmarks
- `split_pdf_by_pages`: Split PDFs by page count
- Configurable pages per split
- **PDF Comparison**
- `compare_pdfs`: Diff two PDFs and identify differences
- Compares page count, text content, and optionally images
- Generates human-readable summary
- **Batch Processing**
- `batch_process`: Process multiple PDFs with a single operation
- Supports: get_info, extract_text, extract_links, optimize
- Reports individual success/failure for each file
- 42 new integration tests for Phase 3 features
- Module docstrings in pdf_tools.py and server.py
### Changed
- Total test count increased to 149
## 0.2.0 - 2026-01-26
### Added
- **OCR Phase 2: Enhanced OCR with Multi-language Support**
- `get_ocr_languages`: Get available OCR languages and Tesseract status
- `extract_text_with_confidence`: OCR with word-level confidence scores (0-100)
- Multi-language support using Tesseract language codes (e.g., "eng+fra")
- Confidence filtering with `min_confidence` parameter
- **Table Extraction**
- `extract_tables`: Extract tables from PDF pages as structured data
- Output formats: "list" (list of lists) or "dict" (list of dicts with headers)
- Table bounding box and cell data extraction
- **Image Extraction**
- `extract_images`: Extract embedded images to files (png/jpeg/ppm)
- `get_image_info`: Get image metadata without extracting
- Configurable minimum dimensions filter
- Image position and format information
- **Smart/Hybrid Text Extraction**
- `extract_text_smart`: Per-page method selection (native vs OCR)
- Configurable native text threshold for OCR fallback
- Optimal handling of hybrid documents with mixed page types
- **Form Auto-Detection**
- `detect_form_fields`: Detect potential form fields using text analysis
- Label pattern detection (Name:, Date:, Address:, etc.)
- Checkbox/selection pattern detection
- Suggestions for non-AcroForm PDF forms
- Field type guessing based on label text
- Comprehensive integration tests for all Phase 2 features (38 new test cases)
### Changed
- Project goal clarified: "Extract 99% of information from any PDF file"
## 0.1.3 - 2026-01-06
### Added
- **OCR Support (Phase 1)**: New tools for text extraction from scanned/image-based PDFs.
- `detect_pdf_type`: Classify PDFs as "searchable", "image_based", or "hybrid" with detailed metrics.
- `extract_text_native`: Fast native text layer extraction (no OCR).
- `extract_text_ocr`: Text extraction with OCR fallback; supports auto/native/tesseract/force_ocr engines.
- `get_pdf_text_blocks`: Extract text blocks with bounding box positions for layout analysis.
- Optional `[ocr]` dependency group: `pytesseract` and `pillow` for Tesseract integration.
- Comprehensive OCR test suite (`tests/test_ocr.py`) covering 9 PDF fixtures with 33+ test methods.
- New PDF test fixtures for OCR testing: scanned documents, image-based PDFs, hybrid documents.
### Changed
- Updated project description to reflect OCR capabilities.
- README now includes OCR setup instructions and tool documentation.
## 0.1.2 - 2025-12-17
### Fixed
- `fill_pdf_form`: if `fillpdf/pdfrw` cannot parse PDFs with compressed object streams (common in some Adobe InDesign exports), we fall back to the `pypdf` fill path so the operation succeeds.
- `flatten_pdf`: same robustness as above; falls back to `pypdf` when `fillpdf/pdfrw` cannot parse the input.
- `flatten_pdf` internal behavior: handle PDFs where `/AcroForm` is an indirect object and ensure `/Annots` updates use proper PDF object keys.
### Added
- Real-world regression coverage using `tests/1006.pdf` (InDesign-style form PDF) that runs every MCP tool end-to-end with two scenarios.
## 0.1.1 - 2025-12-16
### Added
- `clear_pdf_form_fields`: clear (delete) values for selected form fields while keeping fields fillable.
- `encrypt_pdf`: password-protect PDFs (intended after `add_signature_image` to protect a signed PDF).
- Cursor post-push smoke test: `scripts/cursor_smoke.py` and `docs/CURSOR_SMOKE_TEST.md`.
### Changed
- Form filling is more robust on non-standard AcroForms; values are persisted in `/V` and `encrypt_pdf` normalizes trailer IDs for compatibility.
- Memory/rules hygiene: repo includes `.cursor/rules/template.rules` and documented SOP to keep academic/personal content untracked.
## 0.1.0
### Added
- MCP server over stdio with PDF tools for form fields, form fill, flatten, merge, extract, rotate.
- Annotation and page editing tools.
- Managed text insert, edit, remove via FreeText annotations.
- Metadata get and set tools.
- GitHub Actions workflows: CI, CodeQL, dependency review, optional AI review.