# TASK_4: Comprehensive Testing and Macro Validation Framework
**Created By**: Agent_1 | **Priority**: MEDIUM | **Duration**: 4 hours
**Technique Focus**: Property-Based Testing + Contract Verification + Test Architecture + Macro Testing + Sandbox Execution
**Size Constraint**: Target <250 lines/module, Max 400 if splitting awkward
## **MERGED FUNCTIONALITY**
This task combines the foundation testing framework (original TASK_4) with comprehensive macro testing and validation (TASK_31) for complete quality assurance.
## Status & Assignment
**Status**: COMPLETED
**Assigned**: Agent_1
**Dependencies**: TASK_1 (Core engine), TASK_2 (KM integration), TASK_3 (Command library)
**Blocking**: None (Final task)
## Required Reading (Complete before starting)
- [ ] **src/core/**: Review core engine implementation and contracts
- [ ] **src/integration/**: Review KM integration layer and event system
- [ ] **src/commands/**: Review command library and validation framework
- [ ] **tests/TESTING.md**: Current test status and framework setup
- [ ] **CLAUDE.md**: Property-based testing and validation requirements
## Implementation Overview
Build a comprehensive testing and macro validation framework that provides:
1. **Foundation Testing**: Property-based testing, contract verification, and integration testing
2. **Macro Testing**: Automated macro validation with sandbox execution
3. **Quality Assurance**: Test suites, performance benchmarks, and security validation
4. **Production Readiness**: Complete testing framework for macro deployment
<thinking>
Testing framework architecture:
1. Property-Based Testing: Test behavior across input ranges with Hypothesis
2. Contract Verification: Validate pre/post conditions and invariants
3. Integration Testing: End-to-end macro execution validation
4. Macro Testing: Comprehensive macro validation with sandbox execution
5. Performance Testing: Ensure timing requirements and macro performance
6. Security Testing: Validate all security boundaries and macro safety
7. Quality Gates: Pass/fail criteria for macro deployment approval
</thinking>
## Implementation Subtasks (Sequential completion)
### Phase 1: Testing Infrastructure
- [x] **Test configuration**: Set up pytest with coverage and property testing
- [x] **Test utilities**: Create helpers for macro testing and validation
- [x] **Mock framework**: Build mocks for KM integration testing
- [x] **Property generators**: Create Hypothesis strategies for macro data
- [x] **Macro test framework**: Sandbox environment for safe macro testing (see the sketch after this list)
- [x] **Test runner**: Automated test execution with comprehensive reporting
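The sandbox bullet above deserves a concrete shape. A minimal sketch of what `tests/utils/test_sandbox.py` could look like, assuming isolation via a throwaway working directory (process-level isolation and permission restriction would layer on top; all names here are illustrative):
```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

class TestSandbox:
    """Isolated environment for safe macro test execution (sketch)."""

    def __init__(self) -> None:
        self._work_dir: Optional[Path] = None

    @contextmanager
    def isolated(self) -> Iterator[Path]:
        """Run a test body inside a throwaway working directory."""
        self._work_dir = Path(tempfile.mkdtemp(prefix="macro_sandbox_"))
        try:
            yield self._work_dir
        finally:
            shutil.rmtree(self._work_dir, ignore_errors=True)
            self._work_dir = None
```
A test would use it as `with test_sandbox.isolated() as work_dir:` so any files a macro touches stay inside `work_dir` and are deleted afterward.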
### Phase 2: Comprehensive Test Suites
- [x] **Contract verification**: Test all design-by-contract assertions (see the example after this list)
- [x] **Property-based tests**: Test system properties across input ranges
- [x] **Integration tests**: End-to-end macro execution validation
- [x] **Security tests**: Validate all security boundaries and input sanitization
- [x] **Macro validation**: Functional, performance, and compatibility testing
- [x] **Regression testing**: Automated testing for macro modifications
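To make the contract-verification bullet concrete, one example test is sketched here; it assumes TASK_1's design-by-contract layer raises `ContractViolationError` (module location assumed) when `execute_macro` preconditions are violated:
```python
import pytest

from src.core.engine import MacroEngine
from src.core.errors import ContractViolationError  # assumed location

def test_execute_macro_enforces_preconditions():
    """A None macro definition must trip the precondition contract."""
    engine = MacroEngine()
    with pytest.raises(ContractViolationError):
        engine.execute_macro(None, None)  # both arguments violate preconditions
```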
### Phase 3: Advanced Testing Features
- [x] **Performance benchmarks**: Validate timing requirements and macro performance
- [x] **Test reporting**: Generate comprehensive test coverage and macro quality reports
- [x] **Stress testing**: High-load and resource exhaustion scenarios
- [x] **Security testing**: Macro security boundaries and permission validation
- [x] **Quality gates**: Pass/fail criteria for macro deployment (see the sketch after this list)
- [x] **CI/CD integration**: Automated testing pipeline with macro validation
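The quality-gate bullet can be modeled as named predicates over collected run metrics. A minimal sketch; the `MacroQualityReport` shape and thresholds are illustrative, not part of the existing codebase:
```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class MacroQualityReport:
    """Illustrative metrics collected from one macro validation run."""
    functional_pass_rate: float    # fraction of functional tests passing, 0.0-1.0
    p95_execution_seconds: float   # 95th percentile macro execution time
    security_findings: int         # count of security test failures

# Each gate is (name, predicate over the report); thresholds are examples only.
QUALITY_GATES: List[Tuple[str, Callable[[MacroQualityReport], bool]]] = [
    ("functional", lambda r: r.functional_pass_rate == 1.0),
    ("performance", lambda r: r.p95_execution_seconds < 2.0),
    ("security", lambda r: r.security_findings == 0),
]

def evaluate_quality_gates(report: MacroQualityReport) -> List[str]:
    """Return the names of failed gates; an empty list means deploy-approved."""
    return [name for name, check in QUALITY_GATES if not check(report)]
```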
## Implementation Files & Specifications
### Testing Framework Files to Create:
```
tests/
├── TESTING.md                         # Complete test status dashboard (updated)
├── conftest.py                        # Pytest configuration and fixtures (100-150 lines)
├── utils/
│   ├── __init__.py                    # Test utilities public API
│   ├── generators.py                  # Hypothesis strategies (150-200 lines)
│   ├── mocks.py                       # Mock objects for testing (100-150 lines)
│   ├── assertions.py                  # Custom test assertions (75-100 lines)
│   ├── fixtures.py                    # Reusable test fixtures (100-150 lines)
│   └── macro_test_runner.py           # Macro testing utilities (200-250 lines)
├── property_tests/
│   ├── test_engine_properties.py      # Engine property tests
│   ├── test_command_properties.py     # Command property tests
│   ├── test_security_properties.py    # Security property tests
│   ├── test_integration_properties.py # Integration property tests
│   └── test_macro_properties.py       # Macro validation property tests
├── integration/
│   ├── test_end_to_end.py             # Complete macro execution tests
│   ├── test_km_integration.py         # KM integration tests
│   ├── test_error_scenarios.py        # Error handling and recovery tests
│   └── test_macro_execution.py        # Macro execution integration tests
├── performance/
│   ├── test_benchmarks.py             # Performance benchmark tests
│   ├── test_load.py                   # Load testing for macro execution
│   └── test_macro_performance.py      # Macro-specific performance tests
├── security/
│   ├── test_input_validation.py       # Input validation security tests
│   ├── test_injection_prevention.py   # Injection attack prevention tests
│   ├── test_permission_boundaries.py  # Permission system tests
│   └── test_macro_security.py         # Macro security validation tests
└── macro_validation/
    ├── test_macro_functional.py       # Functional macro testing
    ├── test_macro_compatibility.py    # Compatibility testing
    ├── test_macro_regression.py       # Regression testing
    └── test_quality_gates.py          # Quality assurance testing
```
### Key Implementation Requirements:
#### conftest.py - Test Configuration
```python
import pytest
from hypothesis import settings, HealthCheck
from unittest.mock import MagicMock

from src.core.types import MacroId, ExecutionContext, Duration, Permission, TriggerId
from src.core.either import Either  # assumed location of the Either result type
from src.integration.km_client import KMClient
from src.testing.macro_test_runner import MacroTestRunner
from src.testing.test_sandbox import TestSandbox

# Hypothesis configuration for property-based testing
settings.register_profile("default", max_examples=100, deadline=1000)
settings.register_profile("ci", max_examples=1000, deadline=5000)
settings.load_profile("default")

@pytest.fixture
def mock_km_client() -> MagicMock:
    """Mock KM client for integration testing."""
    mock = MagicMock(spec=KMClient)
    mock.register_trigger.return_value = Either.right(TriggerId("test_trigger"))
    return mock

@pytest.fixture
def execution_context() -> ExecutionContext:
    """Standard execution context for testing."""
    return ExecutionContext.create_test_context(
        permissions=frozenset([Permission.TEXT_INPUT, Permission.SYSTEM_SOUND]),
        timeout=Duration.from_seconds(30)
    )

@pytest.fixture
def macro_test_runner() -> MacroTestRunner:
    """Macro testing runner for comprehensive validation."""
    return MacroTestRunner()

@pytest.fixture
def test_sandbox() -> TestSandbox:
    """Isolated sandbox environment for safe macro testing."""
    return TestSandbox()

@pytest.fixture
def sample_macro_specs():
    """Sample macro specifications for testing."""
    return [
        {
            "macro_id": "test_macro_1",
            "name": "Simple Text Macro",
            "actions": [
                {"type": "Type a String", "text": "Hello World"}
            ]
        },
        {
            "macro_id": "test_macro_2",
            "name": "Complex Workflow",
            "actions": [
                {"type": "Get Clipboard"},
                {"type": "Search and Replace", "find": "old", "replace": "new"},
                {"type": "Set Clipboard"}
            ]
        }
    ]
```
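#### mocks.py - Mock Objects (sketch)
The layout above lists `tests/utils/mocks.py` without an example. A minimal sketch, assuming only the `KMClient.register_trigger` call already used in conftest.py; the helper name is illustrative:
```python
from unittest.mock import MagicMock

from src.integration.km_client import KMClient

def make_km_client_mock(*, healthy: bool = True) -> MagicMock:
    """Build a KMClient mock preconfigured for happy- or error-path tests."""
    mock = MagicMock(spec=KMClient)
    if not healthy:
        # Simulate an unreachable Keyboard Maestro instance for error-scenario tests.
        mock.register_trigger.side_effect = ConnectionError("KM unreachable")
    return mock
```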
#### generators.py - Hypothesis Strategies
```python
from hypothesis import strategies as st
from hypothesis.strategies import composite

from src.core.types import MacroId, CommandId, ExecutionContext, Duration, Permission
from src.commands.base import CommandParameters  # assumed location
from src.commands.text import TypeTextCommand, TypingSpeed
from src.commands.validation import is_safe_text_content  # assumed helper

@composite
def macro_ids(draw) -> MacroId:
    """Generate valid macro IDs."""
    return MacroId(draw(st.text(
        min_size=1, max_size=50,
        alphabet=st.characters(whitelist_categories=('Lu', 'Ll', 'Nd'))
    )))

@composite
def command_ids(draw) -> CommandId:
    """Generate valid command IDs."""
    return CommandId(draw(st.uuids().map(str)))

@composite
def safe_text_commands(draw) -> TypeTextCommand:
    """Generate safe text commands for property testing."""
    text = draw(st.text(min_size=0, max_size=1000).filter(is_safe_text_content))
    speed = draw(st.sampled_from(list(TypingSpeed)))
    return TypeTextCommand(
        command_id=draw(command_ids()),
        parameters=CommandParameters.empty(),
        text=text,
        typing_speed=speed
    )

@composite
def execution_contexts(draw) -> ExecutionContext:
    """Generate valid execution contexts."""
    permissions = draw(st.frozensets(st.sampled_from(list(Permission)), max_size=5))
    timeout = draw(st.integers(min_value=1, max_value=300).map(Duration.from_seconds))
    return ExecutionContext.create(permissions=permissions, timeout=timeout)
```
#### property_tests/test_engine_properties.py - Engine Properties
```python
import time

from hypothesis import given, assume
import pytest

from src.core.engine import MacroEngine
from src.core.errors import ContractViolationError  # assumed location
from src.core.types import MacroDefinition, ExecutionContext, Duration
from tests.utils.generators import macro_definitions, execution_contexts

class TestEngineProperties:
    """Property-based tests for macro engine."""

    @given(macro_definitions(), execution_contexts())
    def test_execution_always_returns_result(self, macro_def: MacroDefinition, context: ExecutionContext):
        """Property: Engine always returns execution result."""
        assume(macro_def.is_valid())
        engine = MacroEngine()
        result = engine.execute_macro(macro_def, context)
        assert result is not None
        assert hasattr(result, 'execution_token')
        assert hasattr(result, 'status')

    @given(macro_definitions())
    def test_invalid_macros_rejected(self, macro_def: MacroDefinition):
        """Property: Invalid macros are always rejected."""
        assume(not macro_def.is_valid())
        engine = MacroEngine()
        with pytest.raises(ContractViolationError):
            engine.execute_macro(macro_def, ExecutionContext.default())

    @given(execution_contexts())
    def test_execution_respects_timeouts(self, context: ExecutionContext):
        """Property: Execution never exceeds context timeout."""
        assume(context.timeout < Duration.from_seconds(1))
        engine = MacroEngine()
        long_macro = MacroDefinition.create_sleep_macro(Duration.from_seconds(10))
        start_time = time.monotonic()
        result = engine.execute_macro(long_macro, context)
        execution_time = time.monotonic() - start_time
        assert execution_time <= context.timeout.total_seconds() + 0.1  # small tolerance
```
#### security/test_injection_prevention.py - Security Properties
```python
from hypothesis import given, strategies as st
import pytest

from src.commands.validation import (
    validate_text_input,
    validate_file_path,
    contains_script_injection,  # assumed helper in the validation module
    contains_path_traversal,    # assumed helper in the validation module
    MAX_TEXT_LENGTH,            # assumed constant in the validation module
)
from src.commands.text import TypeTextCommand
from src.core.errors import ValidationError  # assumed location

class TestInjectionPrevention:
    """Security property tests for injection prevention."""

    @given(st.text().filter(contains_script_injection))
    def test_script_injection_always_blocked(self, malicious_text: str):
        """Property: Script injection attempts are always blocked."""
        assert not validate_text_input(malicious_text)
        with pytest.raises(ValidationError):
            TypeTextCommand.create(text=malicious_text)

    @given(st.text().filter(contains_path_traversal))
    def test_path_traversal_always_blocked(self, malicious_path: str):
        """Property: Path traversal attempts are always blocked."""
        assert not validate_file_path(malicious_path)

    @given(st.text(min_size=1, max_size=10000))
    def test_excessive_input_blocked(self, large_input: str):
        """Property: Excessively large inputs are blocked."""
        if len(large_input) > MAX_TEXT_LENGTH:
            assert not validate_text_input(large_input)
```
#### performance/test_benchmarks.py - Performance Tests
```python
import time

import pytest

from src.core.engine import MacroEngine
from src.core.types import ExecutionContext
from src.commands.text import TypeTextCommand

class TestPerformanceBenchmarks:
    """Performance benchmark tests."""

    def test_engine_startup_time(self):
        """Engine startup should be under 10ms."""
        start_time = time.perf_counter()
        MacroEngine()  # construction is the operation under test
        startup_time = time.perf_counter() - start_time
        assert startup_time < 0.01  # 10ms

    def test_command_validation_time(self):
        """Command validation should be under 5ms."""
        command = TypeTextCommand.create(text="test text")
        start_time = time.perf_counter()
        result = command.validate()
        validation_time = time.perf_counter() - start_time
        assert validation_time < 0.005  # 5ms
        assert result is True

    @pytest.mark.parametrize("text_length", [10, 100, 1000])
    def test_text_command_scaling(self, text_length: int):
        """Text command execution should scale linearly."""
        text = "a" * text_length
        command = TypeTextCommand.create(text=text)
        context = ExecutionContext.create_test_context()
        start_time = time.perf_counter()
        result = command.execute(context)
        execution_time = time.perf_counter() - start_time
        assert result is not None
        # Budget: roughly 1ms per 10 characters plus base overhead
        expected_max_time = (text_length / 10) * 0.001 + 0.05
        assert execution_time < expected_max_time
```
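#### utils/macro_test_runner.py - Macro Test Runner (sketch)
`MacroTestRunner` is central to the merged macro-validation work but has no example above. A minimal sketch of its shape, validating macro specs like those the `sample_macro_specs` fixture produces; real validation would dispatch each action through the command library inside the sandbox (all names below are illustrative):
```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class MacroTestResult:
    """Outcome of validating a single macro specification."""
    macro_id: str
    passed: bool
    failures: List[str] = field(default_factory=list)

class MacroTestRunner:
    """Run structural validation over macro specs (sketch)."""

    def run(self, macro_spec: Dict[str, Any]) -> MacroTestResult:
        failures: List[str] = []
        if not macro_spec.get("macro_id"):
            failures.append("missing macro_id")
        actions = macro_spec.get("actions", [])
        if not actions:
            failures.append("macro has no actions")
        for i, action in enumerate(actions):
            if "type" not in action:
                failures.append(f"action {i} missing 'type'")
        return MacroTestResult(
            macro_id=macro_spec.get("macro_id", "<unknown>"),
            passed=not failures,
            failures=failures,
        )
```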
## Modularity Strategy
- **conftest.py**: Pytest configuration and shared fixtures (target: 125 lines)
- **generators.py**: Hypothesis strategies for property testing (target: 175 lines)
- **mocks.py**: Mock objects and test doubles (target: 125 lines)
- **assertions.py**: Custom assertions and validators (target: 90 lines)
- **Property test files**: Each under 200 lines with focused property testing
- **Integration test files**: Each under 150 lines with specific scenario coverage
- **Performance test files**: Each under 100 lines with targeted benchmarks
## Success Criteria
- Property-based testing validates system behavior across input ranges
- Contract verification ensures all design-by-contract assertions work correctly
- Integration testing validates end-to-end macro execution
- Comprehensive macro testing with functional, performance, security, and compatibility validation
- Isolated sandbox environment for safe macro test execution
- Built-in test suites for common macro patterns and edge cases
- Security testing confirms input validation, injection prevention, and macro safety boundaries
- Performance benchmarks validate timing requirements for the system and for individual macros
- Test coverage > 95% for all core modules and macro validation
- TESTING.md provides a comprehensive status dashboard with macro quality metrics
- CI/CD pipeline automatically runs all test suites, including macro validation
- Quality gates for macro deployment with pass/fail criteria
- Zero flaky tests: all tests pass or fail deterministically
- Performance regression detection: benchmarks fail if performance degrades
- Security regression detection: security tests fail if vulnerabilities are introduced
- Macro regression testing: automated testing for macro modifications
- Documentation: complete testing guide with macro validation best practices