# MCP Server Architecture

## Overview

MCP Server is a production-ready orchestration platform for AI-driven remediation testing. It follows a service-oriented architecture with clear separation of concerns.

## Core Components

### 1. Configuration Layer (`config.py`)

- **Pydantic Settings**: Type-safe configuration management
- **Environment Overrides**: `MCP_*` environment variables
- **YAML Support**: Default `config.yaml` loading
- **Validation**: Automatic path creation and validation

### 2. Logging Layer (`logging_config.py`)

- **Dual Output**: Console (INFO+) and file (DEBUG+)
- **Rotation**: 10 MB files, 5 backups
- **Artifact Management**: Per-run directory structure
- **Structured Logging**: Contextual information with file/line numbers

### 3. Data Models (`models/`)

#### Scenario Model (`scenario.py`)

- **Meta**: Scenario identification
- **Defaults**: Default configurations
- **Bindings**: Variable substitution
- **Prechecks**: Pre-execution validation
- **Fault**: Fault injection definition
- **Stabilize**: Wait conditions
- **Assistant Steps**: RCA and Remedy interactions
- **Execute Remedy**: Command execution
- **Verify**: Post-execution validation
- **Cleanup**: Resource cleanup
- **Report**: Result reporting

All models are Pydantic v2 with:

- Type validation
- Default values
- JSON schema generation
- Serialization/deserialization

### 4. Orchestration Engine (`orchestration/`)

#### FSM (`fsm.py`)

State machine with 13 states (PASS and FAIL are separate states):

```
INIT → PRECHECK → FAULT_INJECT → STABILIZE
     → ASSISTANT_RCA → EVAL_RCA → ASSISTANT_REMEDY → EVAL_REMEDY
     → EXECUTE_REMEDY → VERIFY → PASS/FAIL → CLEANUP
```

**ScenarioContext**: Execution state container

- Run metadata
- Current state
- Step results
- Thread/interrupt tracking
- Response storage

**StepResult**: Individual step outcome

- State identifier
- Success/failure
- Message
- Score (optional)
- Artifacts
- Metadata

#### Engine (`engine.py`)

Async generator-based orchestration:

- Yields a `StepResult` for each step
- Error handling and recovery
- Artifact generation
- Variable substitution

### 5. Services (`services/`)

#### FaultService (`fault_service.py`)

Stub for chaos engineering integration:

- `inject()`: Create fault
- `cleanup()`: Remove fault
- Tracking of active faults
- Integration points for Chaos Mesh, Litmus, Gremlin

#### ExecutorService (`executor_service.py`)

Secure command execution:

- `asyncio.subprocess` for local execution
- Deny pattern enforcement
- Output capture (stdout/stderr)
- Artifact storage
- Exit code handling

#### EvalService (`eval_service.py`)

AI response evaluation:

- **Regex Guards**: Pattern validation
- **JSON Schema**: Structure validation
- **Token Jaccard**: Lexical similarity (token-set overlap)
- Reference/metric matching
- Threshold-based pass/fail

### 6. Clients (`clients/`)

#### RemediationClient (`remediation_client.py`)

HTTP client for the workflow API:

- **httpx**: Async HTTP client
- **initiate_remediation()**: Start workflow
- **resume_remediation()**: Resume with input
- **JSON Pointer Resolution**: Navigate graph structure
- **State Management**: Thread/interrupt tracking

API Methods:

- `InitiateEnsemble`: Create new remediation workflow
- `ResumeEnsemble`: Continue workflow with input

### 7. Server (`server.py`)

#### ScenarioServiceImpl

Main service implementation:

- Scenario registry
- Run tracking
- Async execution
- Result aggregation

#### MCPServer

Server lifecycle management:

- Service initialization
- Scenario loading
- Graceful shutdown
- Client cleanup

## Data Flow

```
1. Load Scenario (YAML → Pydantic Model)
   ↓
2. Initialize Context (ScenarioContext)
   ↓
3. Orchestration Engine (FSM-based)
   ├─→ FaultService.inject()
   ├─→ RemediationClient.initiate_remediation()
   ├─→ EvalService.score()
   ├─→ ExecutorService.run()
   └─→ FaultService.cleanup()
   ↓
4. Generate Artifacts
   ├─→ scenario.yaml
   ├─→ transcript.json
   ├─→ report.json
   └─→ cmd_*.txt
   ↓
5. Return ScenarioResult
```

## Error Handling

### Service Level

- `try`/`except` with logging
- Graceful degradation
- Detailed error messages

### Orchestration Level

- State-specific error handling
- Automatic cleanup on failure
- Final state preservation

### Client Level

- HTTP error handling
- Timeout management
- Retry logic (future)

## Security

### Command Execution

- Deny pattern matching
- Namespace isolation
- Service account enforcement
- Output sanitization

### API Communication

- HTTPS support
- Token-based auth (configurable)
- Request validation

## Extensibility

### Adding New Fault Types

1. Update `FaultService.inject()`
2. Add integration with the chaos tool
3. Update cleanup logic

### Adding New Evaluations

1. Extend `EvalService.score()`
2. Add new guard types
3. Update the `AssistantExpectation` model

### Adding New States

1. Update the `State` enum
2. Add a handler in `engine.py`
3. Update FSM transitions

## Performance Considerations

### Async/Await

- All I/O operations are async
- Non-blocking execution
- Parallel service calls where possible

### Resource Management

- Connection pooling (httpx)
- File handle management
- Memory-efficient streaming

### Logging

- Rotating file handler
- Size-based rotation
- Async-safe logging

## Testing Strategy

### Unit Tests

- Service mocking
- Pydantic validation
- FSM state transitions

### Integration Tests

- Mock remediation API
- Local command execution
- End-to-end scenarios

### System Tests

- Full scenario execution
- Artifact validation
- Performance benchmarks

## Deployment Patterns

### Standalone

```bash
python -m mcp_server.server
```

### Docker

```bash
docker build -t mcp-server .
docker run -p 50051:50051 mcp-server
```

### Kubernetes

- StatefulSet for persistence
- ConfigMap for scenarios
- PVC for logs

### Cloud Functions

- Serverless execution
- Event-driven triggers
- Managed storage

## Monitoring

### Metrics (Future)

- Scenario execution time
- Pass/fail rates
- Service latency
- Resource usage

### Tracing (Future)

- OpenTelemetry integration
- Distributed tracing
- Service dependencies

### Alerting

- Failed scenarios
- Service errors
- Resource exhaustion

## Future Enhancements

1. **gRPC Streaming**: Real-time event streaming
2. **WebSocket**: Live scenario updates
3. **Metrics Export**: Prometheus/StatsD
4. **Distributed Tracing**: OpenTelemetry
5. **Multi-tenancy**: Namespace isolation
6. **Scenario Library**: Pre-built templates
7. **UI Dashboard**: Web-based monitoring
8. **CI/CD Integration**: GitHub Actions, Jenkins
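The **Logging Layer** (`logging_config.py`) pairs an INFO-level console handler with a DEBUG-level rotating file handler that rotates at 10 MB and keeps 5 backups. Below is a minimal sketch that uses only the standard library `logging` module. The names `setup_logging` and `run_artifact_dir`, the log file name, and the `runs/<run_id>` directory layout are assumptions for illustration. They are not the project's actual code.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

# Includes file/line numbers, matching the "Structured Logging" bullet.
LOG_FORMAT = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"


def setup_logging(log_dir: Path, name: str = "mcp_server") -> logging.Logger:
    """Console handler at INFO+, rotating file handler at DEBUG+ (hypothetical helper)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # handlers below do the per-output filtering
    for handler in list(logger.handlers):  # idempotent: drop handlers from earlier calls
        logger.removeHandler(handler)
        handler.close()
    formatter = logging.Formatter(LOG_FORMAT)

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    file_handler = RotatingFileHandler(
        log_dir / "mcp_server.log",  # assumed file name
        maxBytes=10 * 1024 * 1024,   # rotate at 10 MB
        backupCount=5,               # keep 5 rotated backups
        encoding="utf-8",
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger


def run_artifact_dir(base: Path, run_id: str) -> Path:
    """Create and return a per-run artifact directory (the layout is an assumption)."""
    run_dir = base / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Setting the logger itself to DEBUG and filtering per handler is what lets the two outputs have different thresholds from a single logger.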
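The scenario model's **Bindings** supply values for the variable substitution that the engine performs. The placeholder syntax is not specified in this document. The sketch below assumes `${name}` placeholders, resolved with the standard library's `string.Template`, and a hypothetical `substitute` helper that walks nested scenario data.

```python
from string import Template
from typing import Any, Mapping


def substitute(value: Any, bindings: Mapping[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings, lists and dicts.

    Assumes ${name} syntax (hypothetical). Unknown placeholders are left untouched
    by safe_substitute, so a typo shows up verbatim instead of raising KeyError.
    """
    if isinstance(value, str):
        return Template(value).safe_substitute(bindings)
    if isinstance(value, list):
        return [substitute(item, bindings) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, bindings) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged
```

For example, `substitute({"cmd": ["kubectl", "-n", "${namespace}"]}, {"namespace": "demo"})` yields `{"cmd": ["kubectl", "-n", "demo"]}`.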
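The 13-state FSM in `fsm.py` can be expressed as an enum plus a transition function. This sketch assumes two things: any failed step diverts to `FAIL`, and both `PASS` and `FAIL` lead to `CLEANUP`. Both assumptions are consistent with the state diagram and with "automatic cleanup on failure". The `next_state` helper is hypothetical.

```python
from enum import Enum
from typing import Optional


class State(str, Enum):
    INIT = "INIT"
    PRECHECK = "PRECHECK"
    FAULT_INJECT = "FAULT_INJECT"
    STABILIZE = "STABILIZE"
    ASSISTANT_RCA = "ASSISTANT_RCA"
    EVAL_RCA = "EVAL_RCA"
    ASSISTANT_REMEDY = "ASSISTANT_REMEDY"
    EVAL_REMEDY = "EVAL_REMEDY"
    EXECUTE_REMEDY = "EXECUTE_REMEDY"
    VERIFY = "VERIFY"
    PASS = "PASS"
    FAIL = "FAIL"
    CLEANUP = "CLEANUP"


# Success path, in the order shown in the state diagram.
_HAPPY_PATH = [
    State.INIT, State.PRECHECK, State.FAULT_INJECT, State.STABILIZE,
    State.ASSISTANT_RCA, State.EVAL_RCA, State.ASSISTANT_REMEDY,
    State.EVAL_REMEDY, State.EXECUTE_REMEDY, State.VERIFY, State.PASS,
]


def next_state(current: State, ok: bool) -> Optional[State]:
    """Return the successor of `current` (hypothetical transition rules).

    Any failed step goes to FAIL; PASS and FAIL both go to CLEANUP;
    CLEANUP is terminal and returns None.
    """
    if current is State.CLEANUP:
        return None
    if current in (State.PASS, State.FAIL):
        return State.CLEANUP
    if not ok:
        return State.FAIL
    return _HAPPY_PATH[_HAPPY_PATH.index(current) + 1]
```

Keeping the happy path as data makes "Adding New States" a matter of editing the enum and the list, rather than rewriting branching logic.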
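The engine (`engine.py`) is described as an async generator that yields a `StepResult` for each step. Below is a minimal sketch of that pattern. Several details are assumptions:

- The handler signature.
- The plain dict standing in for `ScenarioContext`.
- The rule that a failed step stops the run, emits a final `FAIL` result, and then runs cleanup.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass
class StepResult:
    """Mirrors the StepResult fields described in the FSM section."""
    state: str
    success: bool
    message: str = ""
    score: Optional[float] = None
    artifacts: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


Context = Dict[str, Any]  # simplified stand-in for ScenarioContext
Handler = Callable[[Context], Awaitable[StepResult]]  # hypothetical step signature


async def _call(name: str, handler: Handler, ctx: Context) -> StepResult:
    """Run one handler, turning an unexpected exception into a failed step."""
    try:
        return await handler(ctx)
    except Exception as exc:
        return StepResult(state=name, success=False, message=repr(exc))


async def run_scenario(
    steps: List[Tuple[str, Handler]], ctx: Context, cleanup: Handler
) -> AsyncIterator[StepResult]:
    """Yield a StepResult per step, then PASS or FAIL, then CLEANUP."""
    passed = True
    for name, handler in steps:
        result = await _call(name, handler, ctx)
        ctx.setdefault("results", []).append(result)  # step results kept in context
        yield result
        if not result.success:
            passed = False
            break  # skip remaining steps; cleanup still runs below
    yield StepResult(state="PASS" if passed else "FAIL", success=passed)
    yield await _call("CLEANUP", cleanup, ctx)


if __name__ == "__main__":
    async def ok(ctx: Context) -> StepResult:  # hypothetical passing step
        return StepResult(state="PRECHECK", success=True)

    async def done(ctx: Context) -> StepResult:  # hypothetical cleanup step
        return StepResult(state="CLEANUP", success=True)

    async def main() -> None:
        async for r in run_scenario([("PRECHECK", ok)], {}, done):
            print(r.state, r.success)

    asyncio.run(main())
```

Consumers iterate with `async for`, so the server can record progress (for example into `transcript.json`) as each step finishes. In this sketch, cleanup only runs if the consumer drains the generator. A production engine would likely also guard cleanup against early cancellation.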
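**ExecutorService** combines deny-pattern enforcement with `asyncio` subprocesses and captures stdout, stderr, and the exit code. The sketch below is hypothetical:

- The deny patterns, `CommandDenied`, `CommandResult`, and `run_command` are illustrative names.
- It runs commands with `asyncio.create_subprocess_exec` (no shell). This is an assumption about how the real service invokes commands.

```python
import asyncio
import re
import shlex
from dataclasses import dataclass
from typing import Iterable, Sequence

# Hypothetical deny list; real patterns would come from configuration.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"\bkubectl\s+delete\s+(namespace|ns)\b"),
    re.compile(r"\b(shutdown|reboot)\b"),
]


@dataclass
class CommandResult:
    exit_code: int
    stdout: str
    stderr: str


class CommandDenied(Exception):
    """Raised when a command matches a deny pattern."""


def check_allowed(command: str, deny: Iterable[re.Pattern] = DENY_PATTERNS) -> None:
    """Reject the command if any deny pattern matches anywhere in it."""
    for pattern in deny:
        if pattern.search(command):
            raise CommandDenied(f"blocked by deny pattern {pattern.pattern!r}")


async def run_command(argv: Sequence[str], timeout: float = 60.0) -> CommandResult:
    """Run argv without a shell, capturing stdout/stderr and the exit code."""
    check_allowed(shlex.join(argv))  # check the same tokens that will be executed
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # do not leave a runaway process behind
        await proc.wait()
        raise
    return CommandResult(
        exit_code=proc.returncode,
        stdout=out.decode(errors="replace"),
        stderr=err.decode(errors="replace"),
    )
```

Passing an argument vector instead of a shell string means shell metacharacters are not interpreted. Deny patterns are a coarse filter rather than a sandbox, which is why the Security section also lists namespace isolation and service-account enforcement.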
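**EvalService** checks assistant responses with regex guards, then compares them to a reference using token Jaccard similarity, $J(A, B) = |A \cap B| / |A \cup B|$ over the two token sets, and applies a threshold. The following is a standard-library-only sketch with these assumptions and omissions:

- `Expectation` is a hypothetical stand-in for `AssistantExpectation`.
- The lowercase word tokenizer is an assumption.
- JSON Schema validation is omitted because it needs a third-party library such as `jsonschema`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple

_TOKEN_RE = re.compile(r"[a-z0-9_]+")  # assumed tokenizer: lowercase word characters


def tokenize(text: str) -> Set[str]:
    return set(_TOKEN_RE.findall(text.lower()))


def token_jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets; defined as 1.0 when both are empty."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Expectation:
    """Hypothetical stand-in for AssistantExpectation."""
    reference: str
    threshold: float = 0.5
    must_match: List[str] = field(default_factory=list)      # regex guards that must hit
    must_not_match: List[str] = field(default_factory=list)  # regex guards that must miss


def score(response: str, exp: Expectation) -> Tuple[bool, float]:
    """Apply regex guards first, then threshold the Jaccard similarity."""
    for pattern in exp.must_match:
        if not re.search(pattern, response, re.IGNORECASE):
            return False, 0.0
    for pattern in exp.must_not_match:
        if re.search(pattern, response, re.IGNORECASE):
            return False, 0.0
    similarity = token_jaccard(response, exp.reference)
    return similarity >= exp.threshold, similarity
```

Worked example: "The pod was OOMKilled: memory limit" and the reference "memory limit too low pod OOMKilled" share 4 of 8 distinct tokens, so $J = 0.5$, which meets a threshold of 0.5. Because Jaccard measures token overlap only, paraphrases with different wording score low. This is why the guards and threshold matter.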
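The RemediationClient's **JSON Pointer Resolution** navigates the workflow's response graph. Assuming the pointers follow RFC 6901, a resolver over parsed JSON needs only a few lines. `resolve_pointer` is a hypothetical name, and the response shape in the example below is invented.

```python
import re
from typing import Any

_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901 array indices: no leading zeros


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (dicts and lists)."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = doc
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")  # unescape in RFC order
        if isinstance(current, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot resolve {token!r} inside {type(current).__name__}")
    return current
```

For example, against `{"graph": {"interrupts": [{"id": "i-9"}]}}` the pointer `/graph/interrupts/0/id` resolves to `"i-9"`. The order of the two `replace` calls matters: decoding `~1` before `~0` ensures that `~01` becomes the literal `~1`, not `/`.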
