Registry Review MCP Server
Version: 2.0.0
Status: Phase 4.2 Complete (LLM-Native Field Extraction + Performance Optimization)
Next: Phase 5 (Integration & Polish)
Automated registry review workflows for carbon credit project registration using the Model Context Protocol (MCP).
Overview
The Registry Review MCP Server automates the manual document review process for carbon credit project registration. It transforms a 6-8 hour manual review into a 60-90 minute guided workflow with a complete audit trail and structured outputs.
Core Value Proposition: Enable Registry Agents (like Becca) to process project documentation 5-10x faster by automating:
Document discovery and classification
Evidence extraction and requirement mapping
Cross-document validation
Compliance checking and report generation
Quick Start
Once integrated with Claude Desktop, try the complete workflow:
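For example, the three staged prompts described under Usage can be run in sequence (the prompt names are those listed under Available Tools below; any arguments are optional):

```
/initialize
/document-discovery
/evidence-extraction
```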
The prompts guide you through the entire review process automatically!
Features
✅ Phase 1: Foundation (Complete)
Session Management - Create, load, update, and delete review sessions
Atomic State Persistence - Thread-safe file operations with locking (see the sketch after this list)
Configuration Management - Environment-based settings with validation
Error Hierarchy - Comprehensive error types for graceful handling
Caching Infrastructure - TTL-based caching for expensive operations
Checklist System - Soil Carbon v1.2.2 methodology with 23 requirements
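The atomic state persistence above follows the usual write-then-rename pattern; a minimal sketch, for illustration only (names are hypothetical, not the server's actual code):

```python
import json
import os
import tempfile
import threading

_state_lock = threading.Lock()  # serialize writers within this process

def save_state(path: str, state: dict) -> None:
    """Write session state atomically: write a temp file, fsync, then rename over the target."""
    with _state_lock:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(state, handle, indent=2)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp_path, path)  # atomic rename: readers never see a partial file
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```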
✅ Phase 2: Document Processing (Complete)
Document Discovery - Recursively scan project directories for documents
Smart Classification - Auto-classify documents by type (project plan, baseline report, monitoring report, etc.)
PDF Text Extraction - Extract text and tables from PDF documents with caching
GIS Metadata - Extract metadata from shapefiles and GeoJSON files
Markdown Integration - Read markdown conversions from marker skill
Quick-Start Workflow - Single-command session creation + discovery
Auto-Selection - Prompts automatically select most recent session
Helpful Guidance - Clear error messages with actionable next steps
✅ Phase 3: Evidence Extraction (Complete)
Requirement Mapping - Automatically map all 23 checklist requirements to documents
Evidence Extraction - Extract text snippets with ±100 words of context
Page Citations - Include page numbers from PDF markers for precise references
Section References - Add markdown section headers for navigation
Keyword-Based Search - Smart keyword extraction with phrase detection and stop-word filtering
Relevance Scoring - Documents scored 0.0-1.0 based on keyword coverage and density
Status Classification - Automatic covered/partial/missing/flagged classification
Coverage Analysis - Overall statistics with confidence scores (85-90% accuracy on test data)
Structured Field Extraction - Extract specific data fields (dates, IDs) using regex patterns
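Conceptually, the structured field extraction above is a set of named regex patterns applied to the extracted text. A rough sketch with illustrative patterns (the real field names and document formats may differ):

```python
import re

# Illustrative patterns only; the actual checklist fields and formats may differ.
FIELD_PATTERNS = {
    "project_start_date": re.compile(
        r"project\s+start\s+date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "project_id": re.compile(
        r"\bproject\s+id\s*[:\-]?\s*([A-Z]{2,5}-\d{3,6})\b", re.IGNORECASE
    ),
}

def extract_fields(text: str) -> dict[str, str]:
    """Return the first match found for each known field."""
    results: dict[str, str] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[field] = match.group(1)
    return results
```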
✅ Phase 4: Validation & Reporting (Complete)
Cross-Document Validation - Date alignment (120-day rule), land tenure (fuzzy matching), project ID consistency (see the sketch after this list)
Validation Results - Status indicators (pass/warning/fail) with flagged items for review
Report Generation - Markdown and JSON formats with complete findings and evidence
Structured Output - Machine-readable reports with all evidence, validations, and citations
Summary Statistics - Requirements coverage, validation results, items for human review
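The date-alignment check referenced above boils down to comparing the key project dates against a 120-day window. A simplified sketch (the field names and status semantics are assumptions, not the server's exact contract):

```python
from datetime import date

def validate_date_alignment(dates: dict[str, date], max_gap_days: int = 120) -> dict:
    """Return pass/fail plus the gap between the earliest and latest project dates."""
    known = {name: value for name, value in dates.items() if value is not None}
    if len(known) < 2:
        return {"status": "flagged", "reason": "fewer than two dates available to compare"}
    gap_days = (max(known.values()) - min(known.values())).days
    return {
        "status": "pass" if gap_days <= max_gap_days else "fail",
        "gap_days": gap_days,
        "max_gap_days": max_gap_days,
        "dates": {name: value.isoformat() for name, value in known.items()},
    }
```

For instance, validate_date_alignment({"project_start": date(2024, 1, 15), "baseline_sampling": date(2024, 4, 20)}) passes with a 96-day gap.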
✅ Phase 4.2: LLM-Native Field Extraction (Complete)
Intelligent Date Extraction - LLM-powered extraction of project dates with high accuracy (80%+ recall)
Land Tenure Analysis - Owner name extraction with fuzzy deduplication (75% similarity threshold)
Project ID Recognition - Automated extraction with false positive filtering
Cost Optimization - Prompt caching (90% reduction), session fixtures (66% reduction), parallel processing (see the caching sketch after this list)
Production-Ready Infrastructure - Retry logic, boundary-aware chunking, comprehensive validation
Quality Assurance - 99 tests (100% passing) with real-world accuracy validation
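Prompt caching works by marking the large, reusable part of the extraction prompt as cacheable so repeated calls pay for it only once. A sketch using the Anthropic SDK (the model name and prompt text are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

EXTRACTION_INSTRUCTIONS = "...long, reusable field-extraction instructions..."

def extract_fields_from_chunk(chunk_text: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": EXTRACTION_INSTRUCTIONS,
                # Ephemeral cache: later calls reuse this prefix at reduced cost.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": chunk_text}],
    )
    return response.content[0].text
```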
📋 Phase 5: Planned
Human review workflow and approval decision support
Export to additional formats (PDF, CSV)
Advanced contradiction detection
Integration with external registry systems
Installation
Prerequisites
Python >=3.10
UV package manager
4GB RAM minimum (8GB recommended)
3GB disk space for ML models (marker PDF conversion)
GPU optional but recommended for faster PDF conversion
Setup
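A typical setup with uv looks roughly like this (the repository URL and directory name are placeholders):

```bash
git clone <repository-url>
cd registry-review-mcp   # placeholder directory name
uv sync                  # install Python dependencies
cp .env.example .env     # then edit .env (see Configuration below)
```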
First PDF Conversion: On first use, marker will download ML models (~1GB, one-time). This takes 30-60 seconds. Subsequent conversions are fast (models are cached).
System Requirements:
CPU Mode: 10-30 seconds per PDF page (default)
GPU Mode: 2-5 seconds per PDF page (with CUDA GPU)
Storage: ~3GB for marker models + document storage
Memory: 4GB minimum, 8GB recommended for large PDFs
Configuration
Environment Variables (.env)
The MCP server is configured via environment variables. Copy .env.example to .env and configure:
Required for LLM Extraction (Phase 4.2):
Optional Configuration:
See .env.example for all available configuration options.
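For illustration only, a minimal .env might look like the following; the variable name is an assumption, and .env.example remains the authoritative reference:

```bash
# LLM extraction (Phase 4.2) calls the Anthropic API, so an API key is required.
ANTHROPIC_API_KEY=your-api-key-here
```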
Claude Desktop Integration
Add to claude_desktop_config.json:
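A minimal sketch of the entry (the server name, command, and paths are placeholders; adjust them to your installation):

```json
{
  "mcpServers": {
    "registry-review": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/registry-review-mcp", "registry-review-mcp"]
    }
  }
}
```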
Note: The server automatically loads configuration from .env in the project directory. You don't need to specify environment variables in claude_desktop_config.json unless you want to override them.
Restart Claude Desktop to load the server.
Usage
The Complete Workflow (3 Simple Prompts)
Stage 1: Initialize
Creates a new review session with project metadata.
Stage 2: Document Discovery
Discovers and classifies all documents (auto-selects your session).
Stage 3: Evidence Extraction
Maps all requirements to evidence with page citations (auto-selects your session).
That's it! Three prompts, fully automated, highly informative results.
Alternative: Provide Details to Any Stage
Each prompt can accept project details directly:
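For example (the argument names here are illustrative):

```
/document-discovery project_name="Botany Farm" project_path="/path/to/project/documents"
```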
This creates a session and discovers documents in one step!
Available Tools
Session Management:
start_review - Quick-start: Create session and discover documents in one step
create_session - Create new review session
load_session - Load existing session
list_sessions - List all sessions
delete_session - Delete a session
Document Processing:
discover_documents - Scan and classify project documents
extract_pdf_text - Extract text from PDF files
extract_gis_metadata - Extract GIS shapefile metadata
Evidence Extraction:
extract_evidence - Map all requirements to documents and extract evidence
map_requirement - Map a single requirement to documents with evidence snippets
Validation:
cross_validate - Run all cross-document validation checks
validate_date_alignment - Verify dates are within the 120-day rule
validate_land_tenure - Check land tenure consistency with fuzzy name matching
validate_project_id - Verify project ID patterns and consistency
Report Generation:
generate_review_report - Generate complete review report in Markdown or JSON
export_review - Export report to custom location
Prompts (Workflow Stages):
/initialize - Stage 1: Create session and load checklist
/document-discovery - Stage 2: Discover and classify documents
/evidence-extraction - Stage 3: Extract evidence for all requirements
/cross-validation - Stage 4: Run cross-document validation checks
/report-generation - Stage 5: Generate complete review reports
Example Session
Development
Running Tests
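The suite is typically run through uv and pytest (exact options may vary):

```bash
uv run pytest
```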
Current Test Coverage:
99 total tests (100% passing)
Phase 1 (Infrastructure): 23 tests
Phase 2 (Document Processing): 6 tests
Phase 3 (Evidence Extraction): 6 tests
Phase 4 (Validation & Reporting): 19 tests
Phase 4.2 (LLM Extraction): 32 tests
Unit tests (extraction, chunking, caching): 20 tests
JSON validation: 17 tests
Real-world accuracy tests: 3 tests
Locking Mechanism: 4 tests
UX Improvements: 3 tests
Integration & Fixtures: 6 tests
Code Quality
Project Structure
The 7-Stage Workflow
✅ Initialize - Create session and load checklist
✅ Document Discovery - Scan and classify all documents
✅ Evidence Extraction - Map requirements to evidence with page citations
✅ Cross-Validation - Verify consistency across documents (dates, land tenure, project IDs)
✅ Report Generation - Generate structured review report (Markdown + JSON)
📋 Human Review - Present flagged items for decision (Phase 5)
📋 Complete - Finalize and export report (Phase 5)
Roadmap
See ROADMAP.md for detailed implementation plan.
Current Status:
✅ Phase 1 (Foundation): Complete
✅ Phase 2 (Document Processing): Complete
✅ Phase 3 (Evidence Extraction): Complete
✅ Phase 4 (Validation & Reporting): Complete
✅ Phase 4.2 (LLM-Native Field Extraction): Complete
📋 Phase 5 (Integration & Polish): Next
Phase 4.2 Achievements:
Core Extraction Capabilities:
LLM-powered field extraction for dates, land tenure, and project IDs
Intelligent date parsing with 80%+ recall on real-world documents
Owner name extraction with fuzzy deduplication (75% similarity threshold)
Project ID recognition with false positive filtering
Performance Optimization (9 Refactoring Tasks Completed):
BaseExtractor Class - Eliminated 240 lines of duplicate code through inheritance
Configuration Documentation - Documented 6 LLM settings in .env.example
JSON Validation Tests - 17 comprehensive tests for malformed API responses
Retry Logic - Exponential backoff with jitter (1s → 32s max delay)
Parallel Processing - Concurrent chunk processing with asyncio.gather()
Fuzzy Deduplication - rapidfuzz integration for name variations (see the sketch after this list)
Boundary-Aware Chunking - Smart splitting at paragraphs/sentences/words
Prompt Caching - 90% cost reduction with Anthropic ephemeral cache
Integration Test Fixtures - 66% test cost reduction with session-scoped fixtures
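The fuzzy deduplication above can be pictured as a similarity filter over extracted owner names. A small sketch using rapidfuzz (the helper name and preprocessing are assumptions; only the 75% threshold comes from the list above):

```python
from rapidfuzz import fuzz

def dedupe_owner_names(names: list[str], threshold: float = 75.0) -> list[str]:
    """Drop near-duplicate names (e.g. "John A. Smith" vs "John Smith") above the similarity threshold."""
    unique: list[str] = []
    for name in names:
        cleaned = " ".join(name.split())  # normalize whitespace before comparing
        if not any(fuzz.ratio(cleaned.lower(), kept.lower()) >= threshold for kept in unique):
            unique.append(cleaned)
    return unique
```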
Quality Assurance:
99 tests (100% passing) - up from 61 tests
Real-world accuracy validation with Botany Farm ground truth
Comprehensive JSON validation and error handling
Cost tracking and API call monitoring
Performance Metrics:
Full evidence extraction: ~2.4 seconds for 23 requirements
Cross-validation: <1 second for all checks
Report generation: ~0.5 seconds for both Markdown and JSON
Coverage on Botany Farm: 73.9% (11 covered, 12 partial, 0 missing)
Test execution: ~15 seconds for full suite (99 tests)
API cost reduction: 90% via caching, 66% test cost reduction via fixtures
License
Copyright © 2025 Regen Network Development, Inc.
Last Updated: November 12, 2025
Next Milestone: Phase 5 - Integration, Human Review Workflow & Polish