# Data Model: Multi-Project Workspace Support
**Feature**: 008-multi-project-workspace
**Date**: 2025-10-12
**Status**: Phase 1 Complete
## Overview
This document defines the data models required for multi-project workspace support. All models use Pydantic for runtime validation and mypy --strict type safety (Constitutional Principle VIII).
---
## Pydantic Models
### 1. ProjectIdentifier
**Purpose**: Validated project identifier with security guarantees
**Lifecycle**:
- Created: When user provides project_id parameter in MCP tool call
- Validated: Immediately upon construction (Pydantic validator)
- Used: Converted to PostgreSQL schema name
**Fields**:
```python
import re

from pydantic import BaseModel, Field, field_validator


class ProjectIdentifier(BaseModel):
    """Validated project identifier preventing SQL injection."""

    value: str = Field(
        ...,
        min_length=1,
        max_length=50,
        description="Project identifier (lowercase alphanumeric with hyphens)",
        examples=["client-a", "frontend", "my-project-123"],
    )

    @field_validator('value')
    @classmethod
    def validate_format(cls, v: str) -> str:
        """Validate project identifier format and security constraints.

        Rules:
        - Lowercase alphanumeric with hyphens only
        - Cannot start or end with hyphen
        - No consecutive hyphens
        - Prevents SQL injection via strict character whitelist

        Raises:
            ValueError: If identifier format is invalid
        """
        # Length already validated by Field(min_length=1, max_length=50)
        # Format validation (security-critical)
        pattern = re.compile(r'^[a-z0-9]+(-[a-z0-9]+)*$')
        if not pattern.match(v):
            raise ValueError(
                "Project identifier must be lowercase alphanumeric with hyphens. "
                "Cannot start/end with hyphen or have consecutive hyphens. "
                f"Found: {v}\n\n"
                "Examples:\n"
                "  ✅ client-a\n"
                "  ✅ my-project-123\n"
                "  ❌ My_Project (uppercase and underscore)\n"
                "  ❌ -project (starts with hyphen)\n"
                "  ❌ project-- (consecutive hyphens)"
            )
        return v

    def to_schema_name(self) -> str:
        """Convert validated identifier to PostgreSQL schema name.

        Returns:
            Schema name in the form project_{identifier}, with hyphens
            converted to underscores so the result is a valid unquoted
            PostgreSQL identifier (e.g., "client-a" → "project_client_a")
        """
        return f"project_{self.value.replace('-', '_')}"
```
**Validation Examples**:
- ✅ Valid: `"client-a"`, `"frontend"`, `"my-project-123"`
- ❌ Invalid: `"My_Project"` (uppercase, underscore), `"-project"` (starts with hyphen)
- ❌ Blocked: `"project'; DROP TABLE--"` (SQL injection attempt)
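A minimal usage sketch of these rules (illustrative only; `ProjectIdentifier` is the model defined above):
```python
from pydantic import ValidationError

# Valid identifier: construction succeeds and maps to a schema name.
ident = ProjectIdentifier(value="client-a")
assert ident.to_schema_name() == "project_client_a"

# Invalid identifier (SQL injection attempt): Pydantic wraps the ValueError
# raised by validate_format in a ValidationError.
try:
    ProjectIdentifier(value="project'; DROP TABLE--")
except ValidationError as exc:
    print(exc.errors()[0]["msg"])  # explains the allowed format
```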
**Traces to**:
- FR-004: System MUST validate project identifiers before use
- FR-005: System MUST enforce lowercase alphanumeric with hyphen format
- FR-006: System MUST enforce maximum 50-character length
- FR-007: System MUST prevent identifiers starting or ending with hyphens
- FR-008: System MUST prevent consecutive hyphens
- FR-016: System MUST prevent security vulnerabilities via identifier validation
---
### 2. WorkspaceConfig
**Purpose**: Configuration for a project workspace
**Lifecycle**:
- Created: When project schema is first provisioned
- Stored: In `project_registry.workspace_config` table (PostgreSQL)
- Updated: Never (immutable after creation)
- Deleted: Future feature (not in scope)
**Fields**:
```python
from datetime import datetime
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class WorkspaceConfig(BaseModel):
    """Project workspace metadata and configuration."""

    project_id: str = Field(
        ...,
        description="Validated project identifier (immutable)",
    )
    schema_name: str = Field(
        ...,
        description="PostgreSQL schema name (e.g., project_client_a)",
    )
    created_at: datetime = Field(
        default_factory=datetime.utcnow,
        description="Timestamp when workspace was provisioned",
    )
    # Future extensibility (not used in initial implementation)
    metadata: dict[str, Any] = Field(
        default_factory=dict,
        description="Optional metadata (reserved for future features)",
    )

    model_config = ConfigDict(frozen=True)  # Immutable after creation
```
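Because the model is frozen, later mutation fails at runtime; a small illustration (field values are hypothetical):
```python
from pydantic import ValidationError

config = WorkspaceConfig(project_id="client-a", schema_name="project_client_a")

# frozen=True: attribute assignment raises a ValidationError (frozen_instance).
try:
    config.schema_name = "project_other"
except ValidationError:
    print("WorkspaceConfig is immutable after creation")
```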
**Storage**:
```sql
-- Stored in global registry schema (shared across all projects)
CREATE SCHEMA IF NOT EXISTS project_registry;

CREATE TABLE project_registry.workspace_config (
    project_id VARCHAR(50) PRIMARY KEY,
    schema_name VARCHAR(60) NOT NULL UNIQUE,  -- "project_" + project_id (hyphens converted to underscores)
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Index for schema name lookups
CREATE UNIQUE INDEX idx_workspace_schema ON project_registry.workspace_config(schema_name);
```
**Traces to**:
- FR-009: System MUST create isolated workspace for each unique project identifier
- FR-010: System MUST automatically provision workspace on first use
---
### 3. WorkflowIntegrationContext
**Purpose**: Cached context from workflow-mcp server (optional integration)
**Lifecycle**:
- Created: When workflow-mcp query succeeds
- Cached: In-memory for 60 seconds (TTL)
- Expired: After TTL or on cache invalidation
- Deleted: On server shutdown
**Fields**:
```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field


class WorkflowIntegrationContext(BaseModel):
    """Context from workflow-mcp server with caching metadata."""

    active_project_id: str | None = Field(
        None,
        description="Active project from workflow-mcp (None if unavailable)",
    )
    status: Literal["success", "timeout", "unavailable", "invalid_response"] = Field(
        ...,
        description="Integration status for error handling",
    )
    retrieved_at: datetime = Field(
        default_factory=datetime.utcnow,
        description="Cache timestamp for TTL calculation",
    )
    cache_ttl_seconds: int = Field(
        default=60,
        description="Time-to-live for cached value",
    )

    def is_expired(self) -> bool:
        """Check if cached context has exceeded TTL.

        Returns:
            True if cache expired, False otherwise
        """
        age_seconds = (datetime.utcnow() - self.retrieved_at).total_seconds()
        return age_seconds > self.cache_ttl_seconds
```
**Cache Implementation**:
```python
import httpx

# In-memory cache (simple dictionary)
_workflow_context_cache: dict[str, WorkflowIntegrationContext] = {}


async def get_workflow_context() -> WorkflowIntegrationContext:
    """Get workflow-mcp context with TTL-based caching."""
    # Check cache
    cached = _workflow_context_cache.get("active_project")
    if cached and not cached.is_expired():
        return cached

    # Query workflow-mcp (with timeout); workflow_client is a shared httpx.AsyncClient
    try:
        response = await workflow_client.get("/api/v1/projects/active", timeout=1.0)
        context = WorkflowIntegrationContext(
            active_project_id=response.json().get("project_id"),
            status="success",
        )
    except httpx.TimeoutException:
        context = WorkflowIntegrationContext(
            active_project_id=None,
            status="timeout",
        )
    # ... other error cases

    # Update cache
    _workflow_context_cache["active_project"] = context
    return context
```
**Traces to**:
- FR-012: System SHOULD query workflow-mcp for active project when available
- FR-013: System MUST gracefully handle workflow-mcp unavailability
- FR-014: System MUST categorize workflow-mcp integration failures
- FR-015: System SHOULD cache workflow-mcp responses temporarily
---
## Database Schema Changes
### Global Registry Schema
**Purpose**: Store workspace metadata across all projects
```sql
-- Create global registry schema (one-time initialization)
CREATE SCHEMA IF NOT EXISTS project_registry;

-- Workspace configuration table
CREATE TABLE project_registry.workspace_config (
    project_id VARCHAR(50) PRIMARY KEY,
    schema_name VARCHAR(60) NOT NULL UNIQUE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Index for fast schema lookups
CREATE UNIQUE INDEX idx_workspace_schema
    ON project_registry.workspace_config(schema_name);

-- Index for creation date queries (future archival features)
CREATE INDEX idx_workspace_created
    ON project_registry.workspace_config(created_at DESC);
```
### Per-Project Schemas
**Purpose**: Isolate codebase data per project
**Schema creation** (auto-provisioned on first use):
```sql
-- Create project schema
CREATE SCHEMA IF NOT EXISTS project_client_a;

-- Set search_path to new schema
SET search_path TO project_client_a;

-- Create tables (same structure as default schema)
CREATE TABLE repositories (
    id SERIAL PRIMARY KEY,
    path VARCHAR(500) NOT NULL UNIQUE,
    indexed_at TIMESTAMP NOT NULL DEFAULT NOW(),
    metadata JSONB DEFAULT '{}'::jsonb
);

CREATE TABLE code_chunks (
    id SERIAL PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repositories(id) ON DELETE CASCADE,
    file_path VARCHAR(500) NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL,
    embedding vector(768),  -- pgvector type
    metadata JSONB DEFAULT '{}'::jsonb,
    UNIQUE(repository_id, file_path, chunk_index)
);

-- pgvector index for semantic search
CREATE INDEX idx_chunks_embedding
    ON code_chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Index for file path queries
CREATE INDEX idx_chunks_file_path
    ON code_chunks(file_path);
```
**Key points**:
- **Structure consistency**: All project schemas have identical table definitions
- **pgvector extension**: Database-level, available in all schemas
- **Isolation**: Each schema has independent indexes and statistics
- **Migration support**: Alembic can manage schema structure upgrades
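To keep these guarantees in practice, structure changes have to be applied to every registered schema. A minimal sketch of that loop, assuming the registry table above and the project's shared `Base` metadata (not a prescribed Alembic setup):
```python
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

from src.models.database import Base  # shared declarative metadata


async def apply_structure_to_all_schemas(database_url: str) -> None:
    """Ensure every registered project schema has the shared table structure."""
    engine = create_async_engine(database_url)

    # Read registered schemas from the global registry.
    async with engine.connect() as conn:
        rows = await conn.execute(
            text("SELECT schema_name FROM project_registry.workspace_config")
        )
        schema_names = [row[0] for row in rows]

    # Apply the shared metadata inside each schema via search_path.
    for schema_name in schema_names:
        async with engine.begin() as conn:
            await conn.execute(text(f"SET search_path TO {schema_name}"))
            await conn.run_sync(Base.metadata.create_all)

    await engine.dispose()
```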
---
## SQLAlchemy Model Changes
### No Model Changes Required
**Current models** (e.g., `src/models/repository.py`) work unchanged:
```python
from datetime import datetime

from sqlalchemy import JSON, DateTime, String
from sqlalchemy.orm import Mapped, mapped_column
from src.models.database import Base


class Repository(Base):
    __tablename__ = "repositories"

    id: Mapped[int] = mapped_column(primary_key=True)
    path: Mapped[str] = mapped_column(String(500))
    indexed_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
    # "metadata" is reserved on Declarative classes; keep the column name, rename the attribute
    metadata_: Mapped[dict] = mapped_column("metadata", JSON, default=dict)
```
**Schema resolution** via `search_path`:
```python
from collections.abc import AsyncIterator

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession


async def get_session(project_id: str | None = None) -> AsyncIterator[AsyncSession]:
    """Yield a database session scoped to the project schema."""
    async with SessionLocal() as session:
        # Resolve schema name (validated identifier or project_default fallback)
        schema_name = resolve_schema_name(project_id)
        # Set search_path (all queries in this session use this schema)
        await session.execute(text(f"SET search_path TO {schema_name}"))
        yield session
```
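`resolve_schema_name` is not defined above; a minimal sketch consistent with the backward-compatibility invariant (NULL `project_id` maps to the `project_default` schema):
```python
def resolve_schema_name(project_id: str | None) -> str:
    """Map an optional project_id to a PostgreSQL schema name.

    None falls back to the default workspace; anything else is validated
    through ProjectIdentifier before it reaches SQL.
    """
    if project_id is None:
        return "project_default"
    return ProjectIdentifier(value=project_id).to_schema_name()
```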
**Benefits**:
- **Zero model duplication**: Single model definition serves all projects
- **Session-scoped isolation**: Each request targets correct schema
- **Migration simplicity**: A single set of Alembic migrations defines the structure applied to every project schema
---
## Service Layer Changes
### New Services
#### ProjectWorkspaceManager
**Purpose**: Manage project workspace lifecycle
```python
from sqlalchemy import text

# engine, get_session, and Base come from the project's database module


class ProjectWorkspaceManager:
    """Manage project workspace provisioning and validation."""

    async def ensure_workspace_exists(self, project_id: str) -> str:
        """Ensure project workspace exists, creating if necessary.

        Args:
            project_id: Validated project identifier

        Returns:
            Schema name (e.g., "project_client_a")

        Raises:
            PermissionError: If schema creation fails due to permissions
        """
        # Validate identifier
        identifier = ProjectIdentifier(value=project_id)
        schema_name = identifier.to_schema_name()

        # Check if schema exists (cached)
        if await self._schema_exists(schema_name):
            return schema_name

        # Create schema (one-time operation)
        await self._create_schema(schema_name)

        # Register in workspace_config table
        await self._register_workspace(project_id, schema_name)
        return schema_name

    async def _schema_exists(self, schema_name: str) -> bool:
        """Check if PostgreSQL schema exists."""
        async with engine.connect() as conn:
            result = await conn.execute(text(
                "SELECT 1 FROM information_schema.schemata "
                "WHERE schema_name = :schema_name"
            ), {"schema_name": schema_name})
            return result.scalar() is not None

    async def _create_schema(self, schema_name: str) -> None:
        """Create PostgreSQL schema with table structure."""
        async with engine.begin() as conn:
            # Create schema (schema_name comes from a validated ProjectIdentifier)
            await conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema_name}"))
            # Set search_path
            await conn.execute(text(f"SET search_path TO {schema_name}"))
            # Create tables (use SQLAlchemy metadata)
            await conn.run_sync(Base.metadata.create_all)

    async def _register_workspace(self, project_id: str, schema_name: str) -> None:
        """Register workspace in global registry."""
        async with get_session() as session:
            await session.execute(text(
                "INSERT INTO project_registry.workspace_config "
                "(project_id, schema_name) VALUES (:project_id, :schema_name) "
                "ON CONFLICT (project_id) DO NOTHING"
            ), {"project_id": project_id, "schema_name": schema_name})
```
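Intended call pattern from an MCP tool handler (a sketch; the handler and `index_repository` service are hypothetical names):
```python
async def handle_index_repository(path: str, project_id: str | None = None) -> None:
    """Hypothetical tool entry point: resolve the workspace before any DB work."""
    manager = ProjectWorkspaceManager()

    # Provision (or reuse) the schema; "client-a" becomes "project_client_a".
    schema_name = await manager.ensure_workspace_exists(project_id or "default")

    async for session in get_session(project_id):
        # All queries in this session target schema_name via search_path.
        await index_repository(session, path)  # hypothetical indexing service
```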
#### WorkflowIntegrationClient
**Purpose**: Optional integration with workflow-mcp
```python
import logging

import httpx

logger = logging.getLogger(__name__)


class WorkflowIntegrationClient:
    """Optional HTTP client for workflow-mcp integration."""

    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=httpx.Timeout(1.0))
        self._cache: dict[str, WorkflowIntegrationContext] = {}

    async def get_active_project(self) -> str | None:
        """Query workflow-mcp for active project with caching.

        Returns:
            Active project ID or None if unavailable

        Error Handling:
            - Timeout: Returns None, logs warning
            - Connection refused: Returns None, logs info
            - Invalid response: Returns None, logs error
        """
        # Check cache
        cached = self._cache.get("active_project")
        if cached and not cached.is_expired():
            return cached.active_project_id

        # Query workflow-mcp
        try:
            response = await self.client.get(f"{self.base_url}/api/v1/projects/active")
            response.raise_for_status()
            data = response.json()
            context = WorkflowIntegrationContext(
                active_project_id=data.get("project_id"),
                status="success",
            )
        except httpx.TimeoutException:
            logger.warning("workflow-mcp timeout, using default workspace")
            context = WorkflowIntegrationContext(
                active_project_id=None,
                status="timeout",
            )
        except httpx.ConnectError:
            logger.info("workflow-mcp unavailable, using default workspace")
            context = WorkflowIntegrationContext(
                active_project_id=None,
                status="unavailable",
            )
        except Exception as e:
            logger.error(f"workflow-mcp error: {e}")
            context = WorkflowIntegrationContext(
                active_project_id=None,
                status="invalid_response",
            )

        # Update cache
        self._cache["active_project"] = context
        return context.active_project_id
```
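How the pieces might combine when a tool call resolves its workspace. The precedence below (explicit parameter first, then workflow-mcp, then the default workspace) is an assumption consistent with FR-012/FR-013, not something specified above:
```python
async def resolve_project_id(
    explicit_project_id: str | None,
    workflow_client: WorkflowIntegrationClient | None,
) -> str | None:
    """Pick the effective project for a tool call."""
    if explicit_project_id is not None:
        return explicit_project_id  # caller-supplied project_id always wins
    if workflow_client is not None:
        # None on timeout/unavailable/invalid_response (graceful degradation)
        return await workflow_client.get_active_project()
    return None  # resolve_schema_name(None) falls back to project_default
```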
---
## Configuration Changes
### Settings Updates
**Add to `src/config/settings.py`**:
```python
from typing import Annotated

from pydantic import Field, HttpUrl
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # ... existing settings ...

    # Multi-project workspace settings
    workflow_mcp_url: Annotated[
        HttpUrl | None,
        Field(
            default=None,
            description="Optional workflow-mcp server URL for automatic project detection",
        ),
    ]
    workflow_mcp_timeout: Annotated[
        float,
        Field(
            default=1.0,
            ge=0.1,
            le=5.0,
            description="Timeout for workflow-mcp queries (seconds)",
        ),
    ]
    workflow_mcp_cache_ttl: Annotated[
        int,
        Field(
            default=60,
            ge=10,
            le=300,
            description="Cache TTL for workflow-mcp responses (seconds)",
        ),
    ]
```
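A sketch of wiring the optional integration from these settings (only the documented `base_url` parameter is used; threading `workflow_mcp_timeout` and `workflow_mcp_cache_ttl` into the client and `WorkflowIntegrationContext` defaults is left to the full implementation):
```python
settings = Settings()

# Integration is opt-in: only construct the client when a URL is configured.
workflow_client: WorkflowIntegrationClient | None = None
if settings.workflow_mcp_url is not None:
    workflow_client = WorkflowIntegrationClient(base_url=str(settings.workflow_mcp_url))
```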
---
## Model Summary
### Pydantic Models (Runtime Validation)
1. **ProjectIdentifier**: Validated project ID (security-critical)
2. **WorkspaceConfig**: Workspace metadata (immutable)
3. **WorkflowIntegrationContext**: Cached workflow-mcp context (TTL-based)
### Database Tables (PostgreSQL)
1. **project_registry.workspace_config**: Global workspace registry
2. **project_{id}.repositories**: Per-project repository metadata
3. **project_{id}.code_chunks**: Per-project code chunks with embeddings
### Service Classes (Business Logic)
1. **ProjectWorkspaceManager**: Workspace provisioning and validation
2. **WorkflowIntegrationClient**: Optional workflow-mcp integration
### Key Invariants
- **Immutability**: Project IDs cannot be renamed after creation
- **Isolation**: Complete schema-level data separation
- **Backward Compatibility**: NULL project_id → "project_default" schema
- **Security**: All identifiers validated before SQL operations
**Phase 1 Data Model Complete** ✅