# Registry Review MCP - Refactoring Roadmap (Phases 1-3)
**Goal:** Minimize codebase size while maximizing feature completeness, accuracy, reliability, and maintainability.
**Projected Reduction (Phases 1-3):** 1,557 net lines (~14% of the ~11,000-line baseline; see the summary table below)
---
## ✅ Phase 1: COMPLETE (-404 lines, 0 risk)
**Completed Work:**
- ✅ Deleted utils/common/ subdirectory (-188 lines, 0 imports)
- ✅ Removed duplicate models from schemas.py (-99 lines)
- ✅ Removed 13 dead settings from settings.py (-30 lines)
- ✅ Externalized server.py capabilities to CAPABILITIES.md (-87 lines)
**Test Status:** 177/179 passing (99%), 2 failures pre-existing
**Commit Message:**
```
Phase 1: Remove dead code and duplicates (-404 lines)
- Delete utils/common/ (100% unused)
- Consolidate models (evidence.py, validation.py, report.py are canonical)
- Remove 13 unused settings (30.6% of config)
- Externalize server capabilities documentation
Zero regressions. All test failures are pre-existing issues.
```
---
## 🚧 Phase 2: Medium Risk, High Impact (-1,541 lines)
### Phase 2.1: Fix EvidenceSnippet Length Limit (CRITICAL)
**Priority:** Immediate
**Risk:** Low
**Impact:** Fixes test failures
**Changes:**
1. `models/evidence.py` line 13:
```python
# Before:
text: str = Field(..., max_length=500)
# After:
text: str = Field(..., max_length=1500, description="Evidence text (up to 1500 chars)")
```
2. Update `models/schemas.py` if `EvidenceSnippet` is still referenced there (the duplicate definition was already removed in Phase 1)
**Validation:**
- Run `pytest tests/test_integration_full_workflow.py -m slow`
- Should see coverage >= 30% (currently 0% due to validation errors)
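To keep the limit from silently regressing, a small test could pin the constraint. This is a minimal sketch: the filename is illustrative, and it assumes Pydantic v2, where `Field(max_length=...)` is stored as an `annotated_types.MaxLen` constraint on the field.
```python
# tests/test_evidence_snippet_limit.py  (illustrative filename)
from annotated_types import MaxLen

from registry_review_mcp.models.evidence import EvidenceSnippet


def test_evidence_text_limit_is_1500():
    """Pin max_length so it cannot silently drop back to 500."""
    constraints = EvidenceSnippet.model_fields["text"].metadata
    max_len = next(c.max_length for c in constraints if isinstance(c, MaxLen))
    assert max_len == 1500
```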
**Commit:** `Fix: Increase EvidenceSnippet text limit to 1500 chars`
---
### Phase 2.2: Consolidate Prompt Instructions (-420 lines)
**Days:** 2-3
**Risk:** Low-Medium
**Files:** 7 prompts (A-G), new `prompts/instructions.md`
**Analysis:**
Current state: 7 prompts × ~60 lines of duplicate LLM instructions = 420 lines
Each prompt has near-identical:
- System role definition
- Output format requirements
- Session state management instructions
- Error handling patterns
**Implementation:**
1. **Create `src/registry_review_mcp/prompts/instructions.md`:**
```markdown
# Registry Review MCP - Prompt Instructions Library
## System Role
You are an expert carbon credit registry reviewer...
## Output Format Standards
All responses must be valid JSON with...
## Session State Management
Always verify session exists before...
## Error Handling
If session not found, return...
## Stage-Specific Templates
### Initialize
### Document Discovery
### Evidence Extraction
### Cross Validation
### Report Generation
### Human Review
### Complete
```
2. **Create `prompts/prompt_helpers.py`:**
```python
from pathlib import Path

def load_instructions(stage: str) -> str:
    """Load common instructions + stage-specific template."""
    instructions_path = Path(__file__).parent / "instructions.md"
    content = instructions_path.read_text()
    # Extract stage-specific section (see the extract_section sketch below)
    stage_section = extract_section(content, stage)
    return stage_section

def format_prompt(stage: str, context: dict) -> str:
    """Build complete prompt from instructions + context."""
    instructions = load_instructions(stage)
    # NOTE: str.format requires literal braces in instructions.md (e.g. JSON
    # examples) to be doubled as {{ }}; otherwise use a safer templating step.
    return instructions.format(**context)
```
3. **Refactor each prompt file (A-G):**
```python
# Before (A_initialize.py): 180 lines
async def initialize_prompt(project_name, documents_path):
    # 60 lines of instructions
    # 40 lines of session creation
    # 40 lines of checklist loading
    # 40 lines of response formatting

# After: 90 lines (-50%)
async def initialize_prompt(project_name, documents_path):
    from .prompt_helpers import format_prompt

    # Load common instructions (replaces 60 lines)
    instructions = format_prompt("initialize", {
        "project_name": project_name,
        "documents_path": documents_path,
    })

    # Stage-specific logic only (40 lines)
    session = await create_session(...)
    checklist = await load_checklist(...)

    # Format response (30 lines)
    return format_response(session, checklist, instructions)
```
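The `extract_section` helper referenced above is not defined in this plan. A minimal sketch, assuming `instructions.md` marks each stage with a `### <Stage Name>` heading under `## Stage-Specific Templates` (the heading convention is an assumption; adjust the pattern to however the file is actually structured):
```python
import re

def extract_section(content: str, stage: str) -> str:
    """Return the '### <Stage>' block from instructions.md (heading names assumed)."""
    # Normalize "document_discovery" / "Document Discovery" to a heading title.
    title = stage.replace("_", " ").title()
    pattern = rf"^###\s+{re.escape(title)}\s*$(.*?)(?=^###\s|\Z)"
    match = re.search(pattern, content, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        raise ValueError(f"No instruction section found for stage {stage!r}")
    return match.group(1).strip()
```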
**Validation:**
- Each prompt file reduced by ~60 lines
- 7 prompts × 60 = 420 lines removed
- Run: `pytest tests/test_*_workflow.py` (all prompt integration tests)
**Commit:** `Refactor: Consolidate prompt instructions (-420 lines)`
---
### Phase 2.3: Simplify validation_tools.py (-300 lines)
**Days:** 2
**Risk:** Medium
**File:** `tools/validation_tools.py` (828 lines → 528 lines)
**Analysis:**
Current issues:
- Manual date parsing (120 lines) duplicates LLM extraction
- Regex-based land tenure extraction (90 lines) duplicates LLM
- Project ID regex patterns (70 lines) can use LLM
- Fallback logic (20 lines) unnecessary with LLM-native
**Implementation:**
1. **Delete manual date extraction** (lines 200-320):
```python
# DELETE: extract_dates_from_document()
# DELETE: parse_date_with_regex()
# DELETE: DATE_PATTERNS dict
# REPLACE WITH: Use LLM-extracted dates from evidence
```
2. **Delete regex land tenure extraction** (lines 400-490):
```python
# DELETE: extract_land_tenure_regex()
# DELETE: OWNER_NAME_PATTERNS
# REPLACE WITH: Use LLM-extracted tenure fields
```
3. **Simplify project ID validation** (lines 550-620; a fuller sketch follows this list):
```python
# Before: Manual regex search across documents
async def validate_project_id_consistency(session_id, project_id_pattern):
    # 70 lines of regex matching

# After: Use LLM-extracted structured fields
async def validate_project_id_consistency(session_id):
    # 20 lines: just load LLM results and validate
```
4. **Remove fallback logic**:
```python
# DELETE: "if not llm_results: try_regex()"
# REASON: LLM-native is now default, regex is legacy
```
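For step 3, a rough shape of the simplified check, purely illustrative: `load_extraction_results`, `structured_fields`, and `document_name` are assumptions about the codebase rather than its actual API, and in practice the function would return the `ProjectIDValidation` model from `models/validation.py` instead of a plain dict.
```python
from collections import defaultdict

async def validate_project_id_consistency(session_id: str) -> dict:
    """Compare the LLM-extracted project_id field across documents; no regex."""
    results = await load_extraction_results(session_id)  # hypothetical loader

    ids_by_document: dict[str, set[str]] = defaultdict(set)
    for doc_result in results:
        project_id = doc_result.structured_fields.get("project_id")
        if project_id:
            ids_by_document[doc_result.document_name].add(project_id.strip())

    distinct_ids = {pid for ids in ids_by_document.values() for pid in ids}
    return {
        "consistent": len(distinct_ids) <= 1,
        "project_ids_found": sorted(distinct_ids),
        "documents_checked": len(ids_by_document),
    }
```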
**Validation:**
- Run: `pytest tests/test_validation.py` (18 tests)
- Run: `pytest tests/test_cross_validation_*.py`
- Verify: Date/tenure/ID validations still work via LLM extraction
**Commit:** `Simplify: Remove manual extraction from validation_tools (-300 lines)`
---
### Phase 2.4: Simplify evidence_tools.py (-350 lines)
**Days:** 2
**Risk:** Medium
**File:** `tools/evidence_tools.py` (current lines TBD)
**Analysis:**
Opportunities:
- Leverage unified LLM analysis more heavily
- Remove keyword-based evidence extraction (replaced by semantic)
- Consolidate snippet extraction logic
**Investigation Required:**
```bash
# Step 1: Analyze current structure
wc -l src/registry_review_mcp/tools/evidence_tools.py
grep -n "def " src/registry_review_mcp/tools/evidence_tools.py
# Step 2: Identify LLM vs manual split
grep -n "llm\|regex\|keyword" src/registry_review_mcp/tools/evidence_tools.py
```
**Commit:** `Simplify: Leverage LLM-native in evidence_tools (-350 lines)`
---
### Phase 2.5: Clean Up Extractors (-471 lines)
**Days:** 2-3
**Risk:** Medium
**Files:** `extractors/*.py` (1,849 lines total)
**Target Files:**
1. **llm_extractors.py** (largest, most complexity)
2. **regex_extractors.py** (legacy, can be reduced)
3. **marker_extractor.py** (some unused error handling)
**Analysis Plan:**
- Identify legacy chunking logic (replaced by marker's built-in)
- Remove unused fallback paths
- Consolidate duplicate error handling
**Investigation Required:**
```bash
# Analyze extractor usage
grep -r "from.*extractors import" src/registry_review_mcp/tools/
grep -r "from.*extractors import" src/registry_review_mcp/prompts/
```
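To complement the grep above, a throwaway script (illustrative, not part of the codebase; it only checks for name mentions, so results are "possibly unreferenced", not proof) can list extractor functions never referenced from `tools/` or `prompts/`, which is where dead fallback paths are most likely to surface:
```python
# scripts/find_unreferenced_extractor_functions.py  (illustrative helper)
import ast
from pathlib import Path

SRC = Path("src/registry_review_mcp")

def public_functions(path: Path) -> set[str]:
    """Top-level function/async-function names, excluding _private helpers."""
    tree = ast.parse(path.read_text())
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    }

def main() -> None:
    callers = "".join(
        p.read_text() for d in ("tools", "prompts") for p in (SRC / d).glob("*.py")
    )
    for extractor in sorted((SRC / "extractors").glob("*.py")):
        unused = sorted(f for f in public_functions(extractor) if f not in callers)
        if unused:
            print(f"{extractor.name}: possibly unreferenced -> {', '.join(unused)}")

if __name__ == "__main__":
    main()
```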
**Commit:** `Cleanup: Remove legacy extraction patterns (-471 lines)`
---
## 🎯 Phase 3: Enhancement & Documentation (+383 lines of clarity)
### Phase 3.1: Enhance models/__init__.py Exports
**Days:** 0.5
**Risk:** Zero
**Impact:** Better import patterns
**Implementation:**
```python
# src/registry_review_mcp/models/__init__.py
"""Models package - domain-organized data structures."""

# Base infrastructure
from .base import (
    BaseModel,
    TimestampedModel,
    IdentifiedModel,
    VersionedModel,
    NamedEntity,
    StatusTrackedModel,
    ModelID,
    Timestamp,
    ConfidenceScore,
)

# Session & Document (Workflow Stages 1-2)
from .schemas import (
    Session,
    ProjectMetadata,
    WorkflowProgress,
    SessionStatistics,
    Document,
    DocumentMetadata,
    Checklist,
    Requirement,
)

# Evidence Extraction (Stage 3)
from .evidence import (
    EvidenceSnippet,
    MappedDocument,
    RequirementEvidence,
    EvidenceExtractionResult,
    StructuredField,
)

# Cross-Validation (Stage 4)
from .validation import (
    DateField,
    DateAlignmentValidation,
    LandTenureField,
    LandTenureValidation,
    ProjectIDOccurrence,
    ProjectIDValidation,
    ContradictionCheck,
    ValidationSummary,
    ValidationResult,
)

# Report Generation (Stage 5)
from .report import (
    ReportMetadata,
    RequirementFinding,
    ValidationFinding,
    ReportSummary,
    ReviewReport,
)

# Tool responses
from .responses import ToolResponse

# Errors
from .errors import (
    SessionNotFoundError,
    DocumentError,
    ValidationError,
    ChecklistError,
    # ... all 16 errors
)

__all__ = [
    # Export everything re-exported above - enables both:
    #   from models import Session          (preferred - cleaner)
    #   from models.schemas import Session  (explicit - when needed)
]
```
**Benefit:**
- Tools can now: `from ..models import Session, Document` instead of `from ..models.schemas import Session, Document`
- Reduces import line length by ~40%
- Callers no longer need to know which module defines each model
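For illustration, a typical tool-module import would change roughly as follows (the names are taken from the exports above):
```python
# Before: every model pulled from its defining module
from ..models.schemas import Session, Document
from ..models.evidence import EvidenceSnippet

# After: package-level re-exports
from ..models import Document, EvidenceSnippet, Session
```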
**Commit:** `Enhance: Add comprehensive model exports to __init__.py`
---
### Phase 3.2: Add Duplicate Model Detection Script
**Days:** 0.5
**Risk:** Zero
**Impact:** Prevent future regressions
**Implementation:**
```python
# scripts/check_duplicate_models.py
"""Detect duplicate Pydantic model definitions across the models/ directory."""
import ast
import sys
from collections import defaultdict
from pathlib import Path

# Base classes that project models inherit from (re-exported by models/base.py).
MODEL_BASES = {
    "BaseModel",
    "TimestampedModel",
    "IdentifiedModel",
    "VersionedModel",
    "NamedEntity",
    "StatusTrackedModel",
}


def find_model_classes(file_path: Path) -> list[str]:
    """Extract all class names that inherit from a known model base class."""
    tree = ast.parse(file_path.read_text())

    classes = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Check whether any base name is one of the known model bases.
            for base in node.bases:
                if isinstance(base, ast.Name) and base.id in MODEL_BASES:
                    classes.append(node.name)
                    break
    return classes


def check_duplicates() -> int:
    """Scan the models/ directory for duplicate class names."""
    models_dir = Path("src/registry_review_mcp/models")

    # Map: class_name -> [file1, file2, ...]
    class_files = defaultdict(list)
    for file_path in models_dir.glob("*.py"):
        if file_path.name == "__init__.py":
            continue
        for cls in find_model_classes(file_path):
            class_files[cls].append(file_path.name)

    # Report duplicates
    duplicates = {cls: files for cls, files in class_files.items() if len(files) > 1}
    if duplicates:
        print("❌ Duplicate model definitions found:")
        for cls, files in duplicates.items():
            print(f"  {cls}: {', '.join(files)}")
        return 1

    print("✅ No duplicate models found")
    return 0


if __name__ == "__main__":
    sys.exit(check_duplicates())
```
**Add to CI:**
```yaml
# .github/workflows/ci.yml
- name: Check for duplicate models
run: python scripts/check_duplicate_models.py
```
**Commit:** `Add: Duplicate model detection script for CI`
---
### Phase 3.3: Document Model Organization Pattern
**Days:** 0.5
**Risk:** Zero
**Impact:** Developer onboarding
**Implementation:**
Create/update `CONTRIBUTING.md`:
```markdown
# Contributing to Registry Review MCP
## Code Organization
### Model Organization
Models are organized by workflow stage for semantic clarity:
- `models/base.py` - Base classes and common patterns (BaseModel, TimestampedModel, etc.)
- `models/schemas.py` - Session/Project/Document/Checklist (Workflow Stages 1-2)
- `models/evidence.py` - Evidence extraction models (Stage 3)
- `models/validation.py` - Cross-validation models (Stage 4)
- `models/report.py` - Report generation models (Stage 5)
- `models/responses.py` - MCP tool response format (all stages)
- `models/errors.py` - Exception hierarchy (all stages)
**Rules:**
1. Models belong in the file matching their workflow stage
2. NEVER duplicate model definitions across files
3. Import from canonical location: `from ..models.evidence import EvidenceSnippet`
4. CI will fail if duplicate models detected
**Adding New Models:**
- Session/project setup → `schemas.py`
- Document processing → `schemas.py`
- Evidence extraction → `evidence.py`
- Validation logic → `validation.py`
- Reports/findings → `report.py`
### Prompt Organization
Prompts use shared instructions to avoid duplication:
- `prompts/instructions.md` - Common LLM instructions
- `prompts/prompt_helpers.py` - Instruction loading utilities
- `prompts/A_initialize.py` through `G_complete.py` - Stage-specific logic only
**Adding New Prompts:**
1. Use `prompt_helpers.format_prompt(stage, context)`
2. Keep stage-specific logic minimal
3. Reuse common patterns from instructions.md
### Tool Organization
Tools should leverage LLM-native extraction:
- Prefer: LLM extraction → validation
- Avoid: Manual regex → LLM fallback
**LLM-First Philosophy:**
- `unified_llm_analysis` makes one pass covering all requirements (~98.5% faster than the legacy ~70 separate operations)
- Manual extraction is legacy - use LLM with structured output
- Validation tools consume LLM results, don't re-extract
```
**Commit:** `Docs: Add model & prompt organization patterns to CONTRIBUTING.md`
---
## Summary: Final Impact
| Phase | Lines Removed | Risk | Days | Status |
|-------|--------------|------|------|--------|
| Phase 1 | -404 | Zero | 2 | ✅ Complete |
| Phase 2.1 | +5 (fix) | Low | 0.5 | Pending |
| Phase 2.2 | -420 | Low-Med | 2-3 | Pending |
| Phase 2.3 | -300 | Medium | 2 | Pending |
| Phase 2.4 | -350 | Medium | 2 | Pending |
| Phase 2.5 | -471 | Medium | 2-3 | Pending |
| Phase 3 | +383 (clarity) | Zero | 1.5 | Pending |
| **Total** | **-1,557 net** | - | **12-14 days** | **18% done** |
**Final Codebase:**
- Before: ~11,000 lines
- After: ~9,443 lines
- Reduction: 14% smaller, vastly more maintainable
- Test suite: 100% passing (assumes the two pre-existing failures are fixed by the test engineer)
**Principles Maintained:**
- ✅ No functionality removed
- ✅ No accuracy degradation
- ✅ Improved maintainability
- ✅ Better discoverability
- ✅ Reduced cognitive load
- ✅ "Nothing left to take away" (elegance through subtraction)