Registry Review MCP Server

Overview Schema Related Servers Score Discussions

TEST_INFRASTRUCTURE_ANALYSIS_2025-11-19.md•23 kB

# Test Infrastructure Analysis and Optimization Recommendations **Date:** 2025-11-19 **Scope:** Complete test infrastructure audit covering fixtures, factories, helpers, and patterns **Total Test LOC:** 7,931 lines across 24 test files **Total Test Methods:** 318 tests --- ## Executive Summary The test suite demonstrates thoughtful infrastructure with factory patterns and shared fixtures, but significant opportunities exist for reduction and optimization. This analysis identifies **3,200-4,500 characters** of potential savings through consolidation, helper functions, and pattern extraction. ### Key Findings 1. **Strong Foundation:** Factory pattern implementation (factories.py) is well-designed 2. **Underutilization:** Only 3 of 24 test files use SessionManager context manager 3. **High Duplication:** Manual session cleanup appears 36 times across files 4. **PDF Creation:** Base64 PDF generation duplicated in fixtures vs factories 5. **Assertion Patterns:** Common validation logic not extracted to helpers --- ## 1. Fixture Efficiency Analysis ### Current State **conftest.py Fixtures:** - `temp_data_dir` - Used widely ✓ - `test_settings` - Used widely ✓ - `cache` - Low utilization - `cleanup_sessions` (autouse) - Good design ✓ - `cleanup_examples_sessions` - Explicit use, good separation ✓ - `example_documents_path` - Used widely ✓ - `botany_farm_markdown` (session scope) - EXCELLENT cost optimization ✓ - `botany_farm_dates/tenure/project_ids` (function scope) - Good caching ✓ **Redundant Fixtures in test_upload_tools.py:** ```python @pytest.fixture def sample_pdf_base64(): # 14 lines of PDF content pdf_content = b"%PDF-1.4..." return base64.b64encode(pdf_content).decode('utf-8') @pytest.fixture def sample_pdf2_base64(): # Similar but different content ... @pytest.fixture def sample_pdf3_base64(): # Third variation ... @pytest.fixture def sample_text_base64(): ... ``` **Problem:** These duplicate functionality already in `factories.py`: ```python def pdf_file(suffix: str = "", filename: str = "test.pdf") -> FileBuilder: return FileBuilder().with_filename(filename).with_pdf_content(suffix) def text_file(content: str = "Test content", filename: str = "test.txt") -> FileBuilder: return FileBuilder().with_filename(filename).with_text_content(content) ``` ### Recommendations #### 1.1 Consolidate PDF Generation **Remove:** test_upload_tools.py fixtures (sample_pdf_base64, sample_pdf2_base64, sample_pdf3_base64, sample_text_base64) **Replace with:** Factory functions from factories.py **Estimated Savings:** 80-100 lines (~2,400 chars) ```python # Before (in test_upload_tools.py) def test_create_session_success(self, test_settings, sample_pdf_base64): result = await upload_tools.create_session_from_uploads( files=[{"filename": "test.pdf", "content_base64": sample_pdf_base64}] ) # After from tests.factories import pdf_file def test_create_session_success(self, test_settings): result = await upload_tools.create_session_from_uploads( files=[pdf_file().build()] ) ``` #### 1.2 Create Shared Mock Fixtures **Current:** Mock creation duplicated across test_llm_extraction.py, test_marker_integration.py **Recommendation:** Add to conftest.py ```python # conftest.py @pytest.fixture def mock_llm_response(): """Factory for creating mock LLM responses.""" def _create(json_content: str, use_code_fence: bool = True): if use_code_fence: text = f"```json\n{json_content}\n```" else: text = json_content mock = Mock() mock.content = [Mock(text=text)] return mock return _create @pytest.fixture def mock_llm_client(mock_llm_response): """Factory for creating mock LLM clients.""" def _create(response_json: str): response = mock_llm_response(response_json) client = AsyncMock() client.messages.create.return_value = response return client return _create ``` **Estimated Savings:** 60-80 lines across 3 files (~1,800 chars) --- ## 2. Factory Pattern Usage Optimization ### Current Utilization **Using SessionManager (3 files):** - test_validation.py ✓ - test_integration_full_workflow.py ✓ - factories.py (definition) ✓ **NOT using SessionManager (21 files):** - test_upload_tools.py - 24 manual try/finally blocks - test_document_processing.py - 2 manual cleanup calls - test_evidence_extraction.py - 5 manual cleanup calls - test_user_experience.py - 3 manual cleanup calls - test_infrastructure.py - 1 manual cleanup call - Plus 16 more files ### Pattern: Manual Cleanup Anti-Pattern **Duplicated 36 times across codebase:** ```python try: # Test code result = await upload_tools.create_session_from_uploads(...) assert result["success"] is True # ... more assertions finally: await session_tools.delete_session(result["session_id"]) ``` ### Recommendations #### 2.1 Extend SessionManager Usage **Target Files:** test_upload_tools.py, test_document_processing.py, test_evidence_extraction.py **Before (test_upload_tools.py - line 70-101):** ```python @pytest.mark.asyncio async def test_create_session_success(self, test_settings, sample_pdf_base64): result = await upload_tools.create_session_from_uploads( project_name="Test Project", files=[{"filename": "test.pdf", "content_base64": sample_pdf_base64}], methodology="soil-carbon-v1.2.2" ) assert result["success"] is True assert "session_id" in result # ... 10 more lines of assertions # Cleanup try: await session_tools.delete_session(result["session_id"]) except: pass ``` **After:** ```python from tests.factories import SessionBuilder, SessionManager, pdf_file @pytest.mark.asyncio async def test_create_session_success(self): async with SessionManager() as manager: session_id = await manager.create( SessionBuilder() .with_project_name("Test Project") .with_file(pdf_file()) .with_methodology("soil-carbon-v1.2.2") ) session = await session_tools.load_session(session_id) assert session["project_metadata"]["project_name"] == "Test Project" # Auto-cleanup on exit ``` **Estimated Savings per file:** - test_upload_tools.py: 24 instances × 4 lines = 96 lines (~2,880 chars) - test_document_processing.py: 2 instances × 4 lines = 8 lines (~240 chars) - test_evidence_extraction.py: 5 instances × 4 lines = 20 lines (~600 chars) **Total Estimated Savings:** 124 lines (~3,720 chars) #### 2.2 Create Additional Factory Helpers **Missing Pattern:** Validation result assertions **Current (repeated 15+ times):** ```python assert result["status"] in ["pass", "fail", "warning"] assert "confidence" in result assert 0.0 <= result["confidence"] <= 1.0 assert "flagged_for_review" in result assert isinstance(result["flagged_for_review"], bool) ``` **Proposed Helper (factories.py):** ```python class ValidationAssertions: """Common assertions for validation results.""" @staticmethod def assert_valid_validation_result(result: Dict): """Assert validation result has correct structure.""" assert result["status"] in ["pass", "fail", "warning"] assert "confidence" in result assert 0.0 <= result["confidence"] <= 1.0 assert "flagged_for_review" in result assert isinstance(result["flagged_for_review"], bool) @staticmethod def assert_validation_passed(result: Dict): """Assert validation passed.""" ValidationAssertions.assert_valid_validation_result(result) assert result["status"] == "pass" assert result["flagged_for_review"] is False @staticmethod def assert_validation_failed(result: Dict): """Assert validation failed.""" ValidationAssertions.assert_valid_validation_result(result) assert result["status"] == "fail" assert result["flagged_for_review"] is True ``` **Estimated Savings:** 15 instances × 5 lines = 75 lines (~1,875 chars) --- ## 3. Test Helper Completeness ### Missing Helpers #### 3.1 Document Verification Helpers **Pattern appears 8+ times:** ```python state_manager = StateManager(session_id) docs_data = state_manager.read_json("documents.json") documents = docs_data.get("documents", docs_data) assert len(documents) > 0 ``` **Proposed Helper:** ```python # tests/helpers/document_helpers.py from registry_review_mcp.utils.state import StateManager from typing import List, Dict def get_session_documents(session_id: str) -> List[Dict]: """Get documents from session's documents.json.""" manager = StateManager(session_id) docs_data = manager.read_json("documents.json") return docs_data.get("documents", docs_data) def assert_documents_found(session_id: str, min_count: int = 1): """Assert minimum documents were discovered.""" docs = get_session_documents(session_id) assert len(docs) >= min_count, f"Expected at least {min_count} documents, found {len(docs)}" return docs def assert_document_type_exists(session_id: str, doc_type: str): """Assert specific document type was classified.""" docs = get_session_documents(session_id) types = [d["classification"] for d in docs] assert doc_type in types, f"Expected document type '{doc_type}', found: {types}" ``` **Estimated Savings:** 8 instances × 4 lines = 32 lines (~960 chars) #### 3.2 Evidence Verification Helpers **Pattern appears 6+ times:** ```python manager = StateManager(session_id) evidence_data = manager.read_json("evidence.json") requirements_covered = evidence_data.get("requirements_covered", 0) requirements_total = evidence_data.get("requirements_total", 0) assert requirements_covered > 0 ``` **Proposed Helper:** ```python # tests/helpers/evidence_helpers.py def get_evidence_summary(session_id: str) -> Dict: """Get evidence extraction summary.""" manager = StateManager(session_id) return manager.read_json("evidence.json") def assert_coverage_meets_threshold(session_id: str, min_coverage: float = 0.3): """Assert evidence coverage meets minimum threshold.""" evidence = get_evidence_summary(session_id) covered = evidence.get("requirements_covered", 0) total = evidence.get("requirements_total", 0) if total == 0: pytest.fail("No requirements found in checklist") coverage = covered / total assert coverage >= min_coverage, \ f"Coverage {coverage:.1%} below threshold {min_coverage:.1%} ({covered}/{total})" return evidence ``` **Estimated Savings:** 6 instances × 5 lines = 30 lines (~900 chars) #### 3.3 Workflow Stage Helpers **Pattern appears 12+ times:** ```python session = await session_tools.load_session(session_id) assert session["workflow_progress"]["document_discovery"] == "completed" ``` **Already exists in factories.py but underutilized:** ```python SessionAssertions.assert_workflow_stage_completed(session_id, "document_discovery") ``` **Recommendation:** Promote usage through documentation and examples --- ## 4. Shared Test Utilities ### Patterns Across Multiple Files #### 4.1 LLM Mock Creation (3 files) **Files:** test_llm_extraction.py, test_marker_integration.py, factories.py **Current Duplication:** ```python # test_llm_extraction.py (lines 32-47) mock_response = Mock() mock_response.content = [Mock(text='```json\n[...]\n```')] # test_marker_integration.py (lines 28-43) mock_convert_fn = Mock(return_value=(...)) mock_get_models.return_value = {"models": Mock(), "convert_fn": mock_convert_fn} # factories.py (lines 284-316) def create_llm_mock_response(json_content: str, use_code_fence: bool = True) -> Mock: ... ``` **Recommendation:** Consolidate to factories.py and promote as standard pattern #### 4.2 Unique Document Names (2 files) **Pattern for cache avoidance:** ```python import time doc_name = f"test_{int(time.time() * 1000000)}.pdf" ``` **Already exists in factories.py:** ```python def unique_doc_name(test_name: str) -> str: """Generate unique document name for testing to avoid cache collisions.""" return f"{test_name}_{int(time.time() * 1000000)}.pdf" ``` **Recommendation:** Promote usage #### 4.3 Pytest Skip Patterns (7 files) **Common pattern:** ```python if not settings.anthropic_api_key or not settings.llm_extraction_enabled: pytest.skip("LLM extraction not configured") if not example_path.exists(): pytest.skip("Botany Farm example data not available") ``` **Proposed Helper:** ```python # tests/helpers/skip_helpers.py import pytest from pathlib import Path from registry_review_mcp.config.settings import settings def skip_if_no_llm(): """Skip test if LLM extraction is not configured.""" if not settings.anthropic_api_key or not settings.llm_extraction_enabled: pytest.skip("LLM extraction not configured") def skip_if_no_examples(): """Skip test if example data is not available.""" path = Path(__file__).parent.parent / "examples" / "22-23" if not path.exists(): pytest.skip("Botany Farm example documents not available") return path # Usage @pytest.fixture def botany_farm_path(): return skip_if_no_examples() ``` **Estimated Savings:** 13 instances × 3 lines = 39 lines (~975 chars) --- ## 5. Setup/Teardown Optimization ### Current State **Excellent Design:** - `cleanup_sessions` (autouse, function scope) - Prevents test pollution ✓ - `cleanup_examples_sessions` (explicit) - Preserves user data ✓ - `cleanup_cache_once` (session scope) - Efficient ✓ **Optimization Opportunity:** ```python # conftest.py lines 271-329 @pytest.fixture(autouse=True, scope="function") def cleanup_sessions(): """Clean up TEST sessions before and after each test.""" # ... 58 lines of cleanup logic ``` ### Recommendations #### 5.1 Extract Cleanup Logic to Helper **Current:** Cleanup logic embedded in fixture (58 lines) **Proposed:** Extract to reusable function ```python # tests/helpers/cleanup_helpers.py from pathlib import Path import shutil import json def cleanup_test_sessions(): """Remove ONLY definitively test-created sessions.""" data_dir = Path(__file__).parent.parent.parent / "data" sessions_dir = data_dir / "sessions" if not sessions_dir.exists(): return for session_path in sessions_dir.iterdir(): if not session_path.is_dir(): continue # Delete test-* sessions if session_path.name.startswith("test-"): shutil.rmtree(session_path, ignore_errors=True) continue # Check if session points to pytest temp directory try: session_file = session_path / "session.json" if session_file.exists(): with open(session_file) as f: session_data = json.load(f) docs_path_str = session_data.get("project_metadata", {}).get("documents_path", "") if "/tmp/pytest-" in docs_path_str or "/tmp/pytest_" in docs_path_str: shutil.rmtree(session_path, ignore_errors=True) except Exception: shutil.rmtree(session_path, ignore_errors=True) # conftest.py from tests.helpers.cleanup_helpers import cleanup_test_sessions @pytest.fixture(autouse=True, scope="function") def cleanup_sessions(): """Clean up TEST sessions before and after each test.""" cleanup_test_sessions() yield cleanup_test_sessions() ``` **Benefits:** - Reusable in manual cleanup scripts - Testable independently - Cleaner fixture definition --- ## 6. Character Reduction Estimates ### Summary by Recommendation | Recommendation | Files Affected | Estimated Savings (chars) | |---------------|----------------|---------------------------| | 1.1 Consolidate PDF fixtures | 1 | 2,400 | | 1.2 Shared mock fixtures | 3 | 1,800 | | 2.1 Extend SessionManager | 6 | 3,720 | | 2.2 Validation assertions | 4 | 1,875 | | 3.1 Document helpers | 5 | 960 | | 3.2 Evidence helpers | 4 | 900 | | 3.3 Workflow helpers (promotion) | 8 | 600 | | 4.3 Skip helpers | 7 | 975 | | **TOTAL** | **24 files** | **13,230 chars** | ### High-Impact Quick Wins (Top 3) 1. **SessionManager Adoption** (3,720 chars, 6 files) - Replace try/finally with context manager - Immediate readability improvement - Zero functionality change 2. **Consolidate PDF Fixtures** (2,400 chars, 1 file) - Remove duplicate sample_pdf fixtures - Use existing factory functions - One-time refactor 3. **Validation Assertions** (1,875 chars, 4 files) - Extract repeated validation checks - Single helper class addition - Reusable across all validation tests **Total Quick Wins:** 7,995 chars (~60% of total savings) --- ## 7. New Shared Utilities to Create ### Proposed Module Structure ``` tests/ ├── helpers/ │ ├── __init__.py │ ├── cleanup_helpers.py # Session/cache cleanup logic │ ├── document_helpers.py # Document verification │ ├── evidence_helpers.py # Evidence verification │ ├── skip_helpers.py # Conditional test skipping │ └── mock_helpers.py # LLM and API mocking ├── conftest.py # Fixtures only ├── factories.py # Test data builders (existing) └── test_*.py # Test files ``` ### helpers/__init__.py ```python """Shared test helpers and utilities. Import commonly used helpers for convenience: from tests.helpers import ( skip_if_no_llm, get_session_documents, assert_coverage_meets_threshold, ValidationAssertions, ) """ from .skip_helpers import skip_if_no_llm, skip_if_no_examples from .document_helpers import get_session_documents, assert_documents_found from .evidence_helpers import get_evidence_summary, assert_coverage_meets_threshold from .cleanup_helpers import cleanup_test_sessions # Import assertion classes from tests.factories import SessionAssertions, ValidationAssertions __all__ = [ # Skip helpers "skip_if_no_llm", "skip_if_no_examples", # Document helpers "get_session_documents", "assert_documents_found", # Evidence helpers "get_evidence_summary", "assert_coverage_meets_threshold", # Cleanup "cleanup_test_sessions", # Assertions "SessionAssertions", "ValidationAssertions", ] ``` ### helpers/mock_helpers.py ```python """Mock creation helpers for LLM and external services.""" from unittest.mock import Mock, AsyncMock from typing import Dict, Any def create_llm_response(json_content: str, use_code_fence: bool = True) -> Mock: """Create a mock LLM response with JSON content. Args: json_content: The JSON string to return use_code_fence: Whether to wrap in markdown code fence Returns: Mock response object compatible with Anthropic API """ if use_code_fence: text = f"```json\n{json_content}\n```" else: text = json_content mock_response = Mock() mock_response.content = [Mock(text=text)] return mock_response def create_llm_client(response: Mock) -> AsyncMock: """Create a mock LLM client that returns the given response. Args: response: Mock response from create_llm_response() Returns: AsyncMock client compatible with Anthropic SDK """ mock_client = AsyncMock() mock_client.messages.create.return_value = response return mock_client def create_marker_models(markdown_output: str, page_count: int = 1) -> Dict[str, Any]: """Create mock marker models for PDF conversion testing. Args: markdown_output: The markdown text to return page_count: Number of pages to report Returns: Mock models dict compatible with marker_extractor """ mock_convert_fn = Mock(return_value=( markdown_output, {}, # images {"page_count": page_count} # metadata )) return { "models": Mock(), "convert_fn": mock_convert_fn } ``` --- ## 8. Implementation Priority ### Phase 1: Quick Wins (Week 1) **Impact:** 7,995 chars saved, 60% of total 1. **Day 1:** Extend SessionManager to test_upload_tools.py - Refactor 24 try/finally blocks - Savings: 2,880 chars 2. **Day 2:** Remove duplicate PDF fixtures - Update test_upload_tools.py to use factories - Savings: 2,400 chars 3. **Day 3:** Add ValidationAssertions to factories.py - Refactor validation tests - Savings: 1,875 chars 4. **Day 4:** Add skip_helpers.py - Refactor pytest.skip calls - Savings: 975 chars ### Phase 2: Helper Infrastructure (Week 2) **Impact:** 2,760 chars saved, 21% of total 5. **Day 5-6:** Create helpers/ module - document_helpers.py - evidence_helpers.py - Savings: 1,860 chars 6. **Day 7:** Create mock_helpers.py - Consolidate LLM mocking - Savings: 900 chars ### Phase 3: Systematic Adoption (Week 3) **Impact:** 2,475 chars saved, 19% of total 7. **Days 8-10:** Extend SessionManager to remaining files - test_document_processing.py - test_evidence_extraction.py - test_user_experience.py - Savings: 840 chars 8. **Days 11-12:** Promote helper usage - Update documentation - Refactor remaining instances - Savings: 1,635 chars --- ## 9. Validation Metrics ### Success Criteria After implementation, measure: 1. **Reduction Metrics:** - Total test LOC reduced by 8-10% - Average test method length reduced from 25 → 20 lines - Fixture usage increased by 40% 2. **Quality Metrics:** - Zero new test failures - All existing tests pass - Coverage maintained at ≥ 85% 3. **Maintainability Metrics:** - Helper function usage: 60+ call sites - SessionManager adoption: 18/24 files (75%) - Manual cleanup instances: < 5 ### Risk Mitigation - Run full test suite after each phase - Keep original implementations commented during migration - Create rollback commits at phase boundaries --- ## 10. Conclusion The test infrastructure demonstrates mature patterns (factories, session cleanup, cost tracking) but suffers from underutilization and duplication. The proposed changes target: - **13,230 characters saved** (16-20% reduction in test code) - **Zero functionality change** (refactoring only) - **Improved maintainability** through helper consolidation - **Faster test authoring** via reusable patterns The three-phase implementation plan delivers quick wins first (60% of savings in Week 1) while building sustainable infrastructure for long-term maintenance. ### Recommended Next Steps 1. Review this analysis with the team 2. Approve Phase 1 quick wins 3. Create implementation branches 4. Execute Week 1 refactoring 5. Measure results and iterate --- **Analysis Prepared By:** Claude Code (Sonnet 4.5) **Review Status:** Draft **Estimated Implementation:** 3 weeks (15 work days)

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gaiaaiagent/regen-registry-review-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

TEST_INFRASTRUCTURE_ANALYSIS_2025-11-19.md•23 kB