# Comprehensive Testing Strategy for Regen Python MCP
**Status:** Implementation Required
**Created:** 2025-10-17
**Principle:** No Mock Data - Real Blockchain Data Only
---
## Executive Summary
This document outlines a comprehensive testing strategy for the Regen Network MCP server that ensures **actual behavioral correctness** when the MCP is used in production. We follow the principle of **no mock/fake data** - all tests use real Regen Network blockchain data, captured and versioned for reproducibility.
### Current State
✅ **What Exists:**
- `/tests/test_prompts.py` - Tests for 8 prompts (content validation only)
- `/tests/test_prompts_integration.py` - Basic prompt integration test
- **Coverage:** ~15% (prompts only, no tool tests)
❌ **What's Missing:**
- Tests for 45 blockchain tools
- Real blockchain data fixtures
- Integration tests with MCP protocol
- End-to-end Claude Code interaction tests
- Client-level tests
- Error handling and edge case tests
---
## Testing Philosophy
### Principle: Real Data, Real Behavior
**Why No Mocks:**
1. **Mocks lie** - They test what we think the API returns, not what it actually returns
2. **Schema drift** - Real Regen Network API may change, mocks won't catch this
3. **Edge cases** - Real data contains edge cases mocks might miss
4. **Confidence** - Tests with real data prove actual functionality
**Approach:**
1. **Capture** real data from Regen Network on first run
2. **Version** captured data in repository (`data/test_fixtures/`)
3. **Reuse** captured data for fast, repeatable tests
4. **Update** captured data periodically to catch API changes
### Test Pyramid Structure
```
        /\          E2E Tests (10%)
       /  \         - Full MCP protocol with Claude Code
      /    \        - Real network interaction
     /------\       Integration Tests (30%)
    /        \      - Tool → Client → Network
   /          \     - Multi-component workflows
  /------------\    Unit Tests (60%)
 /              \   - Individual functions
 \______________/   - Data validation
```
---
## Test Categories & Coverage Goals
### 1. Client Tests (`tests/test_client.py`)
**Purpose:** Verify that RegenClient interacts correctly with the blockchain
**Coverage:**
- ✓ Connection handling & endpoint fallback
- ✓ All query methods (baskets, credits, marketplace, etc.)
- ✓ Pagination handling
- ✓ Error handling & retries
- ✓ Health checks
**Test Data Strategy:**
- Capture real responses from `/regen/ecocredit/v1/credit-types`
- Capture real responses from `/regen/ecocredit/v1/classes`
- Store in `data/test_fixtures/client/`
**Example:**
```python
@pytest.mark.asyncio
async def test_query_credit_types_real_data():
    """Test credit types query with real Regen Network data."""
    client = RegenClient()

    # First run: captures data
    # Subsequent runs: uses captured data
    result = await client.query_credit_types()

    # Validate real structure
    assert "credit_types" in result
    assert isinstance(result["credit_types"], list)
    assert len(result["credit_types"]) > 0

    # Validate real data content
    credit_type = result["credit_types"][0]
    assert "name" in credit_type
    assert credit_type["name"] in ["carbon", "biodiversity", "C"]
```
**Target Coverage:** 90%
---
### 2. Tool Tests (`tests/tools/`)
**Purpose:** Verify that each of the 45 tools produces correct output
**Structure:**
```
tests/tools/
├── test_bank_tools.py # 11 bank module tools
├── test_distribution_tools.py # 9 distribution tools
├── test_governance_tools.py # 8 governance tools
├── test_marketplace_tools.py # 5 marketplace tools
├── test_credit_tools.py # 4 credit tools
├── test_basket_tools.py # 5 basket tools
├── test_analytics_tools.py # 3 analytics tools
└── fixtures/
    ├── bank_responses.json
    ├── distribution_responses.json
    ├── governance_responses.json
    ├── marketplace_responses.json
    ├── credit_responses.json
    ├── basket_responses.json
    └── analytics_responses.json
```
**Test Template for Each Tool:**
```python
@pytest.mark.asyncio
async def test_list_credit_types_tool():
    """Test list_credit_types tool with real data."""
    # Call the actual tool
    result = await list_credit_types()

    # Validate response structure (MCP format)
    assert isinstance(result, dict)
    assert "credit_types" in result or "data" in result

    # Validate data content (real blockchain data)
    # This will fail if API changes - GOOD!
    assert len(result["credit_types"]) >= 5  # Known min count

    # Validate specific known credit types exist
    type_names = [ct["name"] for ct in result["credit_types"]]
    assert "C" in type_names  # Carbon credits exist
```
**Data Capture Process:**
1. Run tests against live network (first time)
2. Tests automatically save responses to `fixtures/`
3. Mark data with capture timestamp and version
4. Future runs use cached data (fast)
5. Optional `--update-fixtures` flag to refresh (wired up in the conftest sketch below)
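A minimal `conftest.py` sketch of how the `--update-fixtures` flag could be wired into pytest. The hook and option mechanics are standard pytest; the `update_fixtures` fixture name is illustrative, not existing code:
```python
# conftest.py (sketch)
import pytest


def pytest_addoption(parser):
    """Register the --update-fixtures command-line flag."""
    parser.addoption(
        "--update-fixtures",
        action="store_true",
        default=False,
        help="Re-capture fixture data from the live Regen Network instead of using the cache.",
    )


@pytest.fixture
def update_fixtures(request) -> bool:
    """True when this run should bypass cached fixtures and capture fresh data."""
    return request.config.getoption("--update-fixtures")
```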
**Target Coverage:** 100% (all 45 tools)
---
### 3. Prompt Tests (`tests/test_prompts.py`)
**Status:** Already exists but needs enhancement
**Current Coverage:**
- ✓ Prompt content validation
- ✓ Parameter handling
- ✓ Markdown formatting
**Additional Tests Needed:**
- ❌ Prompt output actually helps the user understand concepts
- ❌ Code examples in prompts are executable
- ❌ Prompts reference real credit classes that exist
**Enhancement:**
```python
@pytest.mark.asyncio
async def test_ecocredit_workshop_references_real_classes():
    """Verify prompt references real credit classes from blockchain."""
    # Get real credit classes
    client = RegenClient()
    classes_response = await client.query_credit_classes()
    real_class_ids = [c["id"] for c in classes_response["classes"]]

    # Get prompt content
    prompt_content = await ecocredit_query_workshop()

    # Verify prompt references at least some real classes
    referenced_classes = re.findall(r'C\d{2}', prompt_content)
    assert any(rc in real_class_ids for rc in referenced_classes), \
        f"Prompt references non-existent classes: {referenced_classes}"
```
**Target Coverage:** 100% (all 8 prompts)
---
### 4. Server Integration Tests (`tests/test_server_integration.py`)
**Purpose:** Verify complete server startup and MCP protocol compliance
**Tests:**
- ✓ Server initialization
- ✓ All 45 tools registered
- ✓ All 8 prompts registered
- ✓ Tool execution through MCP
- ✓ Prompt execution through MCP
- ✓ Error handling at server level
**Example:**
```python
@pytest.mark.asyncio
async def test_server_registers_all_tools():
    """Verify server registers all expected tools."""
    from mcp_server.server import create_modular_regen_mcp_server

    server = create_modular_regen_mcp_server()

    # Count registered tools
    # This will vary based on FastMCP introspection API
    tool_count = len(server.list_tools())
    assert tool_count == 45, f"Expected 45 tools, got {tool_count}"


@pytest.mark.asyncio
async def test_tool_execution_through_mcp():
    """Test calling a tool through MCP protocol."""
    server = create_modular_regen_mcp_server()

    # Simulate MCP tool call
    result = await server.call_tool("list_credit_types", {})

    # Validate MCP response format
    assert "content" in result or "result" in result
```
**Target Coverage:** 95%
---
### 5. MCP Protocol Compliance Tests (`tests/test_mcp_compliance.py`)
**Purpose:** Ensure strict adherence to MCP specification
**Tests:**
- ✓ Tool schema validation
- ✓ Prompt schema validation
- ✓ Response format compliance
- ✓ Error format compliance
- ✓ JSON-RPC 2.0 compliance
**Example:**
```python
def test_tool_schemas_valid():
    """Verify all tool schemas match MCP specification."""
    server = create_modular_regen_mcp_server()

    for tool in server.list_tools():
        # Validate schema structure
        assert "name" in tool
        assert "description" in tool
        assert "inputSchema" in tool

        # Validate JSON Schema compliance
        jsonschema.validate(
            instance=tool["inputSchema"],
            schema=MCP_TOOL_SCHEMA,
        )
```
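The `MCP_TOOL_SCHEMA` constant used above is not defined elsewhere in this plan. A rough sketch of what it could assert, assuming only the basic MCP requirement that a tool's `inputSchema` is itself a JSON Schema describing an object; the exact constraints should come from the MCP specification:
```python
# Illustrative only: the minimum we might require of every tool's inputSchema.
MCP_TOOL_SCHEMA = {
    "type": "object",
    "required": ["type"],
    "properties": {
        "type": {"const": "object"},
        "properties": {"type": "object"},
        "required": {"type": "array", "items": {"type": "string"}},
    },
}
```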
**Target Coverage:** 100%
---
### 6. End-to-End Tests (`tests/test_e2e.py`)
**Purpose:** Test complete user workflows as they would occur in Claude Code
**Scenarios:**
#### Scenario 1: Query Credit Types
```python
@pytest.mark.e2e
@pytest.mark.slow
@pytest.mark.asyncio
async def test_e2e_query_credit_types():
    """E2E: User asks Claude to list credit types."""
    # 1. Start server
    server = create_modular_regen_mcp_server()

    # 2. Simulate MCP client connection
    async with MCPClient(server) as client:
        # 3. User prompt: "List credit types on Regen Network"
        tools = await client.list_tools()
        tool = next(t for t in tools if t["name"] == "list_credit_types")

        # 4. Claude calls tool
        result = await client.call_tool("list_credit_types", {})

        # 5. Validate user-visible result
        assert "credit_types" in result
        assert len(result["credit_types"]) > 0

        # 6. Validate response is human-readable
        credit_type = result["credit_types"][0]
        assert "name" in credit_type
        assert "abbreviation" in credit_type
```
#### Scenario 2: Marketplace Analysis
```python
@pytest.mark.e2e
@pytest.mark.asyncio
async def test_e2e_marketplace_analysis():
    """E2E: User analyzes marketplace sell orders."""
    server = create_modular_regen_mcp_server()

    async with MCPClient(server) as client:
        # User: "Show me sell orders for credit batch X"
        result = await client.call_tool(
            "list_sell_orders_by_batch",
            {"batch_denom": "C01-001-20220101-20221231-001"},
        )

        # Validate meaningful data for user
        assert "sell_orders" in result
        if len(result["sell_orders"]) > 0:
            order = result["sell_orders"][0]
            assert "quantity" in order
            assert "ask_price" in order
```
**Target Coverage:** Key user journeys (10-15 scenarios)
---
## Real Data Capture Infrastructure
### Fixture Manager (`tests/fixtures/fixture_manager.py`)
```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Callable


class FixtureManager:
    """Manages real blockchain data fixtures with versioning."""

    def __init__(self, fixtures_dir: Path):
        self.fixtures_dir = fixtures_dir
        self.metadata_file = fixtures_dir / "metadata.json"

    async def get_or_capture(
        self,
        fixture_name: str,
        capture_func: Callable,
        ttl_days: int = 30,
    ) -> Any:
        """Get cached fixture or capture fresh data."""
        fixture_path = self.fixtures_dir / f"{fixture_name}.json"

        # Check if cached data exists and is fresh
        if fixture_path.exists():
            metadata = self._load_metadata()
            capture_time = metadata.get(fixture_name, {}).get("captured_at")
            if capture_time and self._is_fresh(capture_time, ttl_days):
                # Use cached data
                with open(fixture_path) as f:
                    return json.load(f)

        # Capture fresh data
        print(f"Capturing fresh data for: {fixture_name}")
        data = await capture_func()

        # Save fixture
        with open(fixture_path, "w") as f:
            json.dump(data, f, indent=2)

        # Update metadata
        self._update_metadata(fixture_name, {
            "captured_at": datetime.now().isoformat(),
            "source": "real_blockchain",
            "network": "regen-1",
        })

        return data
```
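The class above calls three private helpers (`_load_metadata`, `_update_metadata`, `_is_fresh`) that are not shown. A minimal sketch of how they could look, continuing the same class and assuming the ISO timestamps written above (it additionally needs `timedelta` from `datetime`):
```python
    # Continuation of FixtureManager; also requires: from datetime import timedelta

    def _load_metadata(self) -> dict:
        """Read metadata.json, or return an empty dict if it does not exist yet."""
        if self.metadata_file.exists():
            with open(self.metadata_file) as f:
                return json.load(f)
        return {}

    def _update_metadata(self, fixture_name: str, entry: dict) -> None:
        """Merge one fixture's metadata entry and write the file back out."""
        metadata = self._load_metadata()
        metadata[fixture_name] = {**metadata.get(fixture_name, {}), **entry}
        with open(self.metadata_file, "w") as f:
            json.dump(metadata, f, indent=2)

    def _is_fresh(self, captured_at: str, ttl_days: int) -> bool:
        """True if the capture timestamp is still within the TTL window."""
        captured = datetime.fromisoformat(captured_at.replace("Z", "+00:00"))
        now = datetime.now(captured.tzinfo) if captured.tzinfo else datetime.now()
        return now - captured < timedelta(days=ttl_days)
```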
### Fixture Organization
```
data/test_fixtures/
├── metadata.json                   # Capture timestamps, versions
├── client/
│   ├── credit_types.json           # /credit-types response
│   ├── credit_classes.json         # /classes response
│   ├── projects.json               # /projects response
│   ├── batches.json                # /batches response
│   └── sell_orders.json            # /sell-orders response
├── tools/
│   ├── bank_responses.json
│   ├── distribution_responses.json
│   ├── governance_responses.json
│   └── ...
└── integration/
    ├── full_workflow_1.json
    └── full_workflow_2.json
```
### Usage in Tests
```python
@pytest.fixture
async def real_credit_types(fixture_manager):
    """Fixture providing real credit types data."""
    return await fixture_manager.get_or_capture(
        "credit_types",
        lambda: RegenClient().query_credit_types(),
        ttl_days=7,
    )


@pytest.mark.asyncio
async def test_with_real_data(real_credit_types):
    """Test using real captured data."""
    assert len(real_credit_types["credit_types"]) > 0
```
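The `fixture_manager` fixture used above is not defined elsewhere in this plan. A minimal conftest-level sketch, assuming fixtures live under `data/test_fixtures/` as laid out above and that the `tests` package is importable from the project root:
```python
# conftest.py (sketch) - share one FixtureManager across tests.
from pathlib import Path

import pytest

from tests.fixtures.fixture_manager import FixtureManager  # path per the section above

FIXTURES_DIR = Path("data/test_fixtures")


@pytest.fixture
def fixture_manager() -> FixtureManager:
    """FixtureManager pointed at the versioned fixture directory."""
    FIXTURES_DIR.mkdir(parents=True, exist_ok=True)
    return FixtureManager(FIXTURES_DIR)
```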
---
## Test Execution Strategy
### Test Marks
```python
# Speed-based marks
@pytest.mark.unit # Fast, no network (uses fixtures)
@pytest.mark.integration # Medium, may use network
@pytest.mark.e2e # Slow, full workflow
@pytest.mark.slow # Any slow test
# Network-based marks
@pytest.mark.online # Requires network connection
@pytest.mark.offline # Can run without network (uses fixtures)
# Coverage-based marks
@pytest.mark.client # Tests client layer
@pytest.mark.tools # Tests tools layer
@pytest.mark.server # Tests server layer
@pytest.mark.mcp # Tests MCP protocol
```
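Because these marks are custom, pytest should be told about them (otherwise it emits `PytestUnknownMarkWarning`). One way is a `pytest_configure` hook in `conftest.py`; this is a sketch, and the descriptions simply restate the comments above:
```python
# conftest.py (sketch) - register the custom marks so pytest recognizes them.
def pytest_configure(config):
    for mark, description in [
        ("unit", "fast tests that run offline against captured fixtures"),
        ("integration", "medium-speed tests that may use the network"),
        ("e2e", "slow, full-workflow tests"),
        ("slow", "any slow test"),
        ("online", "requires a network connection"),
        ("offline", "can run without network access (uses fixtures)"),
        ("client", "tests the client layer"),
        ("tools", "tests the tools layer"),
        ("server", "tests the server layer"),
        ("mcp", "tests the MCP protocol"),
    ]:
        config.addinivalue_line("markers", f"{mark}: {description}")
```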
### Test Commands
```bash
# Fast: Unit tests only (offline, uses fixtures)
pytest -m "unit and offline" tests/
# Medium: Add integration tests
pytest -m "not e2e" tests/
# Full: Everything including E2E
pytest tests/
# Update fixtures from live network
pytest --update-fixtures tests/
# Coverage report
pytest --cov=src --cov-report=html tests/
# Specific module
pytest tests/tools/test_credit_tools.py -v
```
---
## Implementation Roadmap
### Phase 1: Foundation (Week 1)
1. ✅ **Create test directory structure**
```bash
mkdir -p tests/{tools,client,integration,e2e,fixtures}
mkdir -p data/test_fixtures/{client,tools,integration}
```
2. ✅ **Implement FixtureManager**
- Create `tests/fixtures/fixture_manager.py`
- Handle data capture, caching, versioning
- Add metadata tracking
3. ✅ **Setup pytest configuration**
- Add test marks
- Configure fixtures
- Setup coverage
### Phase 2: Client Tests (Week 1-2)
4. ✅ **Test all client query methods**
- Start with `/credit-types`
- Capture real responses
- Test pagination
- Test error handling
### Phase 3: Tool Tests (Week 2-3)
5. ✅ **Test credit tools** (4 tools)
6. ✅ **Test basket tools** (5 tools)
7. ✅ **Test marketplace tools** (5 tools)
8. ✅ **Test bank tools** (11 tools)
9. ✅ **Test distribution tools** (9 tools)
10. ✅ **Test governance tools** (8 tools)
11. ✅ **Test analytics tools** (3 tools)
### Phase 4: Integration & E2E (Week 3-4)
12. ✅ **Server integration tests**
13. ✅ **MCP protocol compliance**
14. ✅ **End-to-end scenarios**
### Phase 5: CI/CD (Week 4)
15. ✅ **GitHub Actions workflow**
16. ✅ **Coverage reporting**
17. ✅ **Automated fixture updates**
---
## Success Criteria
### Coverage Metrics
- **Overall Coverage:** ≥ 85%
- **Client Coverage:** ≥ 90%
- **Tools Coverage:** 100% (all 45 tools)
- **Prompts Coverage:** 100% (all 8 prompts)
- **Server Coverage:** ≥ 95%
### Quality Metrics
- **All tests use real data** (no mocks, except to simulate network errors)
- **Tests fail when the API changes** (proof that they are meaningful)
- **E2E scenarios cover** ≥80% of expected user journeys
- **Test suite runs** in <5 minutes (offline mode)
- **Documentation:** Every test has a clear docstring
### Behavioral Validation
- ✅ Server starts successfully
- ✅ All tools callable via MCP
- ✅ Real blockchain data fetched correctly
- ✅ Error handling works as expected
- ✅ Pagination works across all tools
- ✅ Claude Code can connect and use tools
---
## Data Staleness Handling
### Problem
Real blockchain data changes over time. How do we handle this?
### Solution: Version + TTL Strategy
Example `metadata.json` entry:
```json
{
  "credit_types": {
    "captured_at": "2025-10-17T10:30:00Z",
    "ttl_days": 7,
    "network": "regen-1",
    "api_version": "v1",
    "checksum": "abc123...",
    "notes": "Captured during stable period"
  }
}
```
**TTL Guidelines** (see the mapping sketch after this list):
- Static data (credit types): 30 days
- Slowly changing (classes, projects): 7 days
- Frequently changing (sell orders, balances): 1 day
- Always fresh (health checks): 0 days (always capture)
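These guidelines can be kept in one place as a mapping that tests pass through to `get_or_capture`; the names and values below are purely illustrative:
```python
# Illustrative TTLs per fixture, mirroring the guidelines above.
FIXTURE_TTL_DAYS = {
    "credit_types": 30,    # static
    "credit_classes": 7,   # slowly changing
    "projects": 7,         # slowly changing
    "sell_orders": 1,      # frequently changing
    "balances": 1,         # frequently changing
    "health": 0,           # always capture fresh
}

# Usage (hypothetical capture function):
# await fixture_manager.get_or_capture(
#     "sell_orders", capture_sell_orders, ttl_days=FIXTURE_TTL_DAYS["sell_orders"]
# )
```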
**Update Process:**
```bash
# Weekly automated job
pytest --update-fixtures --max-age-days=7 tests/
# Manual refresh
pytest --force-refresh tests/
```
---
## Handling Test Failures
### Expected Failures (GOOD)
1. **API Schema Changes**
- Fixture doesn't match current API
- **Action:** Update fixture, validate change intentional
2. **New Credit Classes Added**
- Test expects 11 classes, now 12 exist
- **Action:** Update test, capture new data
3. **Endpoint Down**
- Health check fails
- **Action:** Switch to backup endpoint, report issue
### Unexpected Failures (BAD)
1. **Logic Errors**
- Tool returns wrong data format
- **Action:** Fix tool implementation
2. **Missing Error Handling**
- Uncaught exception in edge case
- **Action:** Add error handling, add test
---
## Maintenance
### Weekly
- Review fixture freshness
- Update stale fixtures (>7 days)
- Check for API changes
### Monthly
- Full fixture refresh
- Review coverage metrics
- Update test scenarios
### Quarterly
- Major fixture overhaul
- Performance optimization
- Documentation updates
---
## Tools & Dependencies
```toml
# pyproject.toml, under [project.optional-dependencies]
test = [
    "pytest>=7.4.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.1.0",
    "pytest-mock>=3.11.0",  # Only for simulating network errors
    "httpx>=0.24.0",
    "jsonschema>=4.19.0",
]
```
---
## Conclusion
This testing strategy ensures:
1. **Real Behavior:** Tests validate actual MCP functionality
2. **Confidence:** Real blockchain data proves correctness
3. **Maintainability:** Fixtures version and track data sources
4. **Speed:** Cached fixtures enable fast iteration
5. **Reliability:** Comprehensive coverage catches regressions
**Next Steps:**
1. Review and approve this strategy
2. Implement FixtureManager
3. Start with Phase 1 (Foundation)
4. Iterate through phases 2-5
---
**Questions for Discussion:**
1. Is 7-day TTL reasonable for most data?
2. Should we test against testnet in addition to mainnet?
3. How should we handle breaking API changes?
4. Should we include performance benchmarks in tests?