# MCP Database System Architecture
## Overview
The MCP (Model Context Protocol) Database System provides intelligent database interaction through a clean, layered architecture. The system automatically determines the best approach for answering questions using SQL queries, semantic search, or hybrid strategies.
## Architecture Layers
### 1. **Presentation Layer** (`presentation/`)
- **MCP Server** (`mcp_server.py`) - Complete MCP protocol implementation
- **HTTP Bridge** (`http_bridge.py`) - REST API for React integration
- **Tools** (`tools/`) - Individual MCP tool implementations
### 2. **Service Layer** (`services/`)
- **Smart Search** (`smart_search_service.py`) - Intelligent orchestration
- **Schema Service** (`schema_service.py`) - Database schema operations
- **SQL Service** (`sql_service.py`) - SQL generation and execution
- **Semantic Service** (`semantic_service.py`) - Vector/text search
- **Synthesis Service** (`synthesis_service.py`) - Response generation
### 3. **Repository Layer** (`repositories/`)
- **Postgres Repository** (`postgres_repository.py`) - Database operations
- **Vector Repository** (`vector_repository.py`) - Vector search operations
- **Embedding Repository** (`embedding_repository.py`) - Embedding generation
### 4. **Shared Components** (`shared/`)
- **Models** (`models.py`) - Data structures and types
- **Exceptions** (`exceptions.py`) - Custom error types
### 5. **Configuration** (`config/`)
- **Config Manager** (`config_manager.py`) - Environment and file-based configuration
- **Example Files** - Configuration templates
## Data Flow
```
User Question
↓
Smart Search Service
↓
Question Classification
↓
Strategy Selection:
- SQL-only: Schema → SQL → Postgres
- Semantic-only: Vector Search → Results
- Hybrid: Both paths → Synthesis
↓
Response Generation
↓
Formatted Answer
```
## Key Features
### Intelligent Question Classification
```python
# Automatic strategy detection
"How many users?" → SQL_ONLY
"What is a user?" → SEMANTIC_ONLY
"Show users and explain" → HYBRID
```
### Safe SQL Execution
- Automatic table existence validation
- Query safety pattern checking
- Row limit enforcement
- Execution time monitoring
### Vector Search with Fallback
- pgvector integration when available
- Automatic fallback to text search
- Similarity scoring and ranking
### Comprehensive Configuration
- Environment-based configuration
- File-based configuration (YAML/JSON)
- Environment variable override
- Validation and error handling
## Component Details
### Smart Search Service
The core orchestrator that:
1. **Classifies** user questions
2. **Routes** to appropriate data sources
3. **Coordinates** multi-step searches
4. **Synthesizes** comprehensive responses
```python
# Example usage
smart_search = SmartSearchService(schema, sql, semantic, synthesis)
result = await smart_search.search("How many active customers do we have?")
```
### Repository Pattern
Clean data access layer with:
- Database connection management
- Error handling and logging
- Safety validations
- Performance monitoring
```python
# Repository usage
postgres_repo = PostgresRepository(engine)
result = postgres_repo.execute_query("SELECT COUNT(*) FROM users")
```
### MCP Protocol Integration
Full MCP server with:
- Tool registration and discovery
- Request/response handling
- Error management
- Async operation support
```python
# MCP server setup
server = MCPDatabaseServer(config)
await server.initialize()
await server.start()
```
## Configuration Management
Multi-source configuration with precedence:
1. Default values
2. Configuration files
3. Environment variables
```yaml
# config.yaml
database:
host: localhost
port: 5432
database: mydb
llm:
provider: openai
model: gpt-4
```
## Error Handling
Comprehensive error management:
- Custom exception hierarchy
- Graceful degradation
- Detailed logging
- User-friendly error messages
## Performance Considerations
- Connection pooling
- Query result caching
- Async operations throughout
- Configurable limits and timeouts
## Security Features
- SQL injection prevention
- Query pattern validation
- API rate limiting
- CORS configuration
- Secure configuration management
## Testing Strategy
- Unit tests for each component
- Integration tests for workflows
- Mock-based testing for external services
- Performance and load testing
## Migration from Legacy
The system includes a compatibility wrapper for existing `LLMDatabaseRouter` usage:
```python
# Legacy code continues to work
router = LLMDatabaseRouter(engine)
result = await router.answer_question("How many users?")
# New recommended approach
smart_search = SmartSearchService(...)
result = await smart_search.search("How many users?")
```
## Deployment
### Development
```bash
# Using configuration file
python -m presentation.mcp_server --config config/dev.yaml
# Using environment variables
export DATABASE_URL=postgresql://...
python -m presentation.mcp_server
```
### Production
```bash
# With Docker
docker run -e DATABASE_URL=... -e OPENAI_API_KEY=... mcp-server
# Direct deployment
MCP_ENVIRONMENT=production python -m presentation.mcp_server
```
## Monitoring
- Structured logging with configurable levels
- Performance metrics collection
- Error tracking and alerting
- Health check endpoints
## Extensibility
The modular architecture allows easy extension:
- **New data sources**: Add repositories
- **New search strategies**: Extend smart search
- **New protocols**: Add presentation layers
- **New LLM providers**: Extend service configurations
## Best Practices
1. **Use Smart Search Service** for new implementations
2. **Configure via environment** for deployment flexibility
3. **Monitor performance** with built-in metrics
4. **Test thoroughly** with provided test suites
5. **Follow security guidelines** for production deployment