# π MCP System Refactoring Guide
## Overview
This guide shows how to migrate from the monolithic `llmDatabaseRouter.py` to a clean, layered architecture.
## ποΈ New Architecture
```
π mcp_system/
βββ π presentation/ # MCP Protocol Layer (thin)
β βββ server.py # MCP server setup & routing
β βββ tools/ # MCP tool implementations
β βββ search_tools.py # Question answering tools
β βββ sql_tools.py # SQL execution tools
β βββ schema_tools.py # Schema inspection tools
β
βββ π§ services/ # Business Logic Layer
β βββ smart_search.py # Main orchestrator
β βββ sql_service.py # SQL generation & validation
β βββ semantic_service.py # Vector/text search
β βββ schema_service.py # Schema introspection
β βββ synthesis_service.py # LLM response generation
β
βββ πΎ repositories/ # Data Access Layer
β βββ postgres_repository.py # Raw DB operations
β βββ vector_repository.py # Embedding operations
β βββ llm_repository.py # LLM API operations
β
βββ π§ shared/ # Shared utilities
βββ models.py # Data classes/types
βββ exceptions.py # Custom exceptions
βββ config.py # Configuration
```
## π Migration Steps
### Phase 1: Move Functions to Repositories
**From llmDatabaseRouter.py β repositories/postgres_repository.py:**
- `safe_run_sql()` β `execute_query()`
- `_get_all_table_names()` β `get_all_table_names()`
- `_validate_table_existence()` β `validate_tables_exist()`
- `_is_sql_safe_to_run()` β `_is_sql_safe_to_run()`
**From llmDatabaseRouter.py β repositories/vector_repository.py:**
- `semantic_rows()` β `search_embeddings()`
- `_text_search_fallback()` β `text_search_fallback()`
- `_generate_embedding()` β Move to separate embedding service
### Phase 2: Extract Services
**Create services/schema_service.py:**
- `get_schema_info()` β Enhanced version with caching
- `_build_schema_description()` β `build_catalog_descriptions()`
- `search_catalog()` β `find_relevant_tables()`
**Create services/sql_service.py:**
- `generate_sql()` β Enhanced with better context
- `_generate_sql_heuristics()` β Fallback method
- `_get_suggested_queries()` β `get_suggested_queries()`
**Create services/semantic_service.py:**
- `_attempt_semantic_search()` β `search()`
- `_extract_search_terms()` β Internal method
**Create services/synthesis_service.py:**
- `_generic_synthesis()` β `synthesize_response()`
- `_build_generic_synthesis_prompt()` β Internal method
- `_clean_markdown_output()` β `clean_markdown()`
### Phase 3: Create Smart Search Orchestrator
**Create services/smart_search.py:**
- `answer()` β Main orchestration method
- Move question classification logic here
- Coordinate between all services
### Phase 4: Thin MCP Layer
**Update server.py:**
- Remove all business logic
- Keep only MCP protocol handling
- Delegate to SmartSearch service
## π§ Key Benefits
### β
Separation of Concerns
- **Repositories**: Pure data access, no business logic
- **Services**: Business logic, no protocol concerns
- **Presentation**: Protocol handling, no data access
### β
Testability
- Each layer can be unit tested independently
- Mock dependencies easily
- Clear test boundaries
### β
Maintainability
- Single responsibility principle
- Easy to find and modify specific functionality
- Clear dependency flow
### β
Scalability
- Services can be scaled independently
- Easy to add new data sources
- Simple to extend with new capabilities
## π Usage Examples
### Before (Monolithic)
```python
# Everything in one place
router = llmDatabaseRouter(engine, llm_client)
response = router.answer("How many users are active?")
```
### After (Clean Architecture)
```python
# Clear separation of concerns
smart_search = SmartSearch(
schema_service, sql_service,
semantic_service, synthesis_service
)
response = smart_search.answer("How many users are active?")
```
## π¦ Dependency Injection
```python
# repositories/
postgres_repo = PostgresRepository(engine)
vector_repo = VectorRepository(engine)
# services/
schema_service = SchemaService(postgres_repo)
sql_service = SQLService(postgres_repo, schema_service)
semantic_service = SemanticService(vector_repo)
synthesis_service = SynthesisService(llm_client)
# orchestrator/
smart_search = SmartSearch(
schema_service, sql_service,
semantic_service, synthesis_service
)
# presentation/
search_tools = SearchTools(smart_search)
```
## π§ͺ Testing Strategy
### Repository Tests
```python
def test_postgres_repository():
repo = PostgresRepository(test_engine)
result = repo.execute_query("SELECT 1")
assert result.success
assert result.data == [{'?column?': 1}]
```
### Service Tests
```python
def test_sql_service():
mock_repo = Mock()
mock_schema = Mock()
service = SQLService(mock_repo, mock_schema)
queries = service.get_suggested_queries("Count users")
assert len(queries) > 0
assert "SELECT COUNT(*)" in queries[0].sql
```
### Integration Tests
```python
def test_smart_search_integration():
search = SmartSearch(...)
response = search.answer("How many active users?")
assert response.success
assert "users" in response.answer_markdown.lower()
```
## π Migration Checklist
- [ ] Create new directory structure
- [ ] Move shared models and exceptions
- [ ] Extract PostgresRepository
- [ ] Extract VectorRepository
- [ ] Create SchemaService
- [ ] Create SQLService
- [ ] Create SemanticService
- [ ] Create SynthesisService
- [ ] Build SmartSearch orchestrator
- [ ] Create thin MCP tools
- [ ] Update server.py
- [ ] Add configuration management
- [ ] Write unit tests
- [ ] Write integration tests
- [ ] Update documentation
## π― Next Steps
1. **Start with repositories** - Move pure data access first
2. **Extract services one by one** - Maintain working system
3. **Create orchestrator** - Wire everything together
4. **Thin the MCP layer** - Remove business logic
5. **Add comprehensive tests** - Ensure reliability
6. **Monitor and optimize** - Improve performance
This refactoring will make your codebase much more maintainable, testable, and scalable! π