Codebase MCP Server

by Ravenight13
data-model.md (11.5 kB)
# Data Model: list_tasks Token Optimization

**Feature**: Optimize list_tasks MCP Tool for Token Efficiency
**Date**: 2025-10-10
**Branch**: 004-as-an-ai

## Overview

This document defines the Pydantic models for the two-tier task response pattern: a lightweight `TaskSummary` for efficient list operations and a full `TaskResponse` for detailed task information.

## Model Hierarchy

```
BaseTaskFields (abstract base)
├── TaskSummary (inherits base fields only)
└── TaskResponse (inherits base fields + adds detail fields)
```

## Entity Definitions

### 1. BaseTaskFields (Base Model)

**Purpose**: Shared core fields between summary and full task representations

**Fields**:

| Field | Type | Required | Validation | Description |
|-------|------|----------|------------|-------------|
| `id` | UUID | Yes | Valid UUID format | Unique task identifier |
| `title` | str | Yes | 1-200 characters, non-empty | Task title |
| `status` | Literal | Yes | One of: "need to be done", "in-progress", "complete" | Current task status |
| `created_at` | datetime | Yes | ISO 8601 format | Task creation timestamp |
| `updated_at` | datetime | Yes | ISO 8601 format | Last modification timestamp |

**Pydantic Configuration**:

```python
from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, ConfigDict, Field


class BaseTaskFields(BaseModel):
    """Shared fields between summary and full task responses.

    Constitutional Compliance:
    - Principle VIII: Pydantic-based type safety with explicit types
    """

    id: UUID
    title: str = Field(min_length=1, max_length=200)
    status: Literal["need to be done", "in-progress", "complete"]
    created_at: datetime
    updated_at: datetime

    model_config = ConfigDict(
        from_attributes=True,  # Enable ORM mode for SQLAlchemy
        json_schema_extra={
            "example": {
                "id": "550e8400-e29b-41d4-a716-446655440000",
                "title": "Implement user authentication",
                "status": "in-progress",
                "created_at": "2025-10-10T10:00:00Z",
                "updated_at": "2025-10-10T15:30:00Z"
            }
        }
    )
```

**Validation Rules**:
- `title` must be non-empty and ≤200 characters (enforced by Field constraints)
- `status` must be one of three valid values (enforced by Literal type)
- `created_at` and `updated_at` must be valid datetime objects
- `id` must be valid UUID format

---

### 2. TaskSummary (Lightweight Model)

**Purpose**: Efficient task representation for list operations, optimized for token usage

**Inheritance**: Inherits all fields from `BaseTaskFields`

**Additional Fields**: None (uses base fields only)

**Token Footprint**: ~120-150 tokens per task (estimated)

**Pydantic Configuration**:

```python
class TaskSummary(BaseTaskFields):
    """Lightweight task summary for list operations.

    Token Efficiency:
    - Includes only: id, title, status, created_at, updated_at
    - Excludes: description, notes, planning_references, branches, commits
    - Target: ~120-150 tokens per task
    - 15 tasks ≈ 1800-2250 tokens (with response envelope)

    Constitutional Compliance:
    - Principle IV: Performance (6x token reduction)
    - Principle VIII: Type safety (Pydantic validation)
    """

    model_config = ConfigDict(
        from_attributes=True,
        json_schema_extra={
            "example": {
                "id": "550e8400-e29b-41d4-a716-446655440000",
                "title": "Implement user authentication",
                "status": "in-progress",
                "created_at": "2025-10-10T10:00:00Z",
                "updated_at": "2025-10-10T15:30:00Z"
            }
        }
    )
```

**Use Cases**:
- Default response for `list_tasks()` tool
- Task browsing and scanning
- Quick status overview
- Token-efficient operations

---

### 3. TaskResponse (Full Model)

**Purpose**: Complete task representation including all metadata

**Inheritance**: Inherits all fields from `BaseTaskFields`

**Additional Fields**:

| Field | Type | Required | Validation | Description |
|-------|------|----------|------------|-------------|
| `description` | str \| None | No | No max length | Detailed task description |
| `notes` | str \| None | No | No max length | Additional notes |
| `planning_references` | list[str] | Yes | Array of file paths | Planning document references |
| `branches` | list[str] | Yes | Array of git branch names | Associated git branches |
| `commits` | list[str] | Yes | Array of 40-char hex strings | Associated git commit hashes |

**Token Footprint**: ~800-1000 tokens per task (estimated, with long descriptions)

**Pydantic Configuration**:

```python
class TaskResponse(BaseTaskFields):
    """Full task details including all metadata.

    Token Footprint:
    - Includes all BaseTaskFields PLUS detail fields
    - Description/notes can be lengthy (varies by task)
    - Planning references, branches, commits add arrays
    - Target: ~800-1000 tokens per task (high variance)

    Constitutional Compliance:
    - Principle VIII: Type safety (Pydantic validation)
    - Principle X: Git integration (branches, commits tracking)
    """

    description: str | None = None
    notes: str | None = None
    planning_references: list[str] = Field(default_factory=list)
    branches: list[str] = Field(default_factory=list)
    commits: list[str] = Field(default_factory=list)

    model_config = ConfigDict(
        from_attributes=True,
        json_schema_extra={
            "example": {
                "id": "550e8400-e29b-41d4-a716-446655440000",
                "title": "Implement user authentication",
                "description": "Add JWT-based authentication with refresh tokens...",
                "notes": "Consider OAuth2 integration for social login",
                "status": "in-progress",
                "created_at": "2025-10-10T10:00:00Z",
                "updated_at": "2025-10-10T15:30:00Z",
                "planning_references": [
                    "specs/001-auth/spec.md",
                    "specs/001-auth/plan.md"
                ],
                "branches": ["001-user-auth"],
                "commits": ["a1b2c3d4e5f6..."]
            }
        }
    )
```

**Use Cases**:
- `get_task(task_id)` tool response
- `list_tasks(full_details=True)` tool response (opt-in)
- Detailed task examination
- Full context for implementation

---

## Model Relationships

**Subset Relationship**:

```
TaskSummary ⊂ TaskResponse

All TaskSummary fields are present in TaskResponse.
TaskResponse adds additional detail fields.
```

**Serialization Compatibility**:
- Both models serialize to MCP-compliant JSON
- Both models can deserialize from SQLAlchemy `Task` ORM objects (via `model_validate()`)
- TaskSummary can be constructed from TaskResponse by field selection (though not needed in practice)

---

## Validation Strategy

### Pydantic Field Validators

**Title Validation**:

```python
# Method of a model that declares `title`; needs `from pydantic import field_validator`.
@field_validator('title')
@classmethod
def validate_title(cls, v: str) -> str:
    """Ensure title is non-empty after stripping whitespace."""
    if not v.strip():
        raise ValueError('Title cannot be empty or whitespace-only')
    return v.strip()
```

**Status Validation**: Enforced by the `Literal` type (static type checking + runtime validation)

**Commit Hash Validation** (in TaskResponse):

```python
# Method of TaskResponse; needs `from pydantic import field_validator`.
@field_validator('commits')
@classmethod
def validate_commits(cls, v: list[str]) -> list[str]:
    """Ensure all commit hashes are 40-character hex strings."""
    for commit in v:
        if not (len(commit) == 40 and all(c in '0123456789abcdef' for c in commit.lower())):
            raise ValueError(f'Invalid commit hash format: {commit}')
    return v
```

### Runtime Validation

- All models validate on construction (`model_validate()`)
- Validation errors raise `pydantic.ValidationError` with field-level messages
- Service layer catches validation errors and converts them to user-friendly error responses

---

## Performance Characteristics

### Token Efficiency Comparison

| Model | Fields | Tokens/Task (est.) | 15 Tasks Total |
|-------|--------|-------------------|----------------|
| **TaskSummary** | 5 core fields | ~120-150 | ~1800-2250 |
| **TaskResponse** | 5 core + 5 detail | ~800-1000 | ~12000-15000 |

**Token Reduction**: ~6x improvement (12000 → 2000 tokens)

### Serialization Performance

- Pydantic `model_dump()`: <1ms per task
- JSON serialization: <5ms for 15 tasks
- Total serialization overhead: <10ms (negligible vs 200ms latency target)

---

## Database Mapping

### SQLAlchemy ORM Model (Unchanged)

The underlying `Task` SQLAlchemy model remains unchanged.
Both TaskSummary and TaskResponse are constructed from the same ORM object:

```python
# SQLAlchemy Task model (src/models/task.py - existing)
class Task(Base):
    __tablename__ = "tasks"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid4)
    title = Column(String(200), nullable=False)
    description = Column(Text, nullable=True)
    notes = Column(Text, nullable=True)
    status = Column(String(50), nullable=False)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
    # ... (branches, commits via relationships)
```

**Conversion Pattern**:

```python
# From ORM to Pydantic (service layer)
result = await db.execute(select(Task).where(...))  # a Result, not yet a Task
task = result.scalar_one()  # the single matching Task row

# Summary conversion
summary = TaskSummary.model_validate(task)

# Full conversion
response = TaskResponse.model_validate(task)
```

---

## Migration Impact

**Breaking Change**: `list_tasks` response format changes

**Before**:

```json
{
  "tasks": [
    {
      "id": "...",
      "title": "...",
      "description": "...",          // ❌ No longer in default response
      "notes": "...",                // ❌ No longer in default response
      "status": "...",
      "created_at": "...",
      "updated_at": "...",
      "planning_references": [...],  // ❌ No longer in default response
      "branches": [...],             // ❌ No longer in default response
      "commits": [...]               // ❌ No longer in default response
    }
  ],
  "total_count": 15
}
```

**After (default)**:

```json
{
  "tasks": [
    {
      "id": "...",
      "title": "...",
      "status": "...",
      "created_at": "...",
      "updated_at": "..."
      // ✅ Only 5 core fields
    }
  ],
  "total_count": 15
}
```

**Migration Path**: Use `full_details=True` parameter or call `get_task(task_id)` for specific details

---

## Testing Requirements

### Unit Tests

- Test TaskSummary validation (valid and invalid inputs)
- Test TaskResponse validation (valid and invalid inputs)
- Test model serialization to JSON
- Test model construction from SQLAlchemy ORM objects

### Integration Tests

- Test token count for TaskSummary list (<2000 tokens for 15 tasks)
- Test token count for TaskResponse list (baseline comparison)
- Test field presence in serialized responses
- Test backward compatibility with `full_details=True`

---

## Constitutional Compliance

| Principle | Compliance | Evidence |
|-----------|------------|----------|
| **VIII. Pydantic Type Safety** | ✅ PASS | All models use Pydantic with explicit types, validators |
| **IV. Performance Guarantees** | ✅ PASS | TaskSummary achieves 6x token reduction target |
| **V. Production Quality** | ✅ PASS | Comprehensive validation, clear error messages |

---

**Status**: ✅ Data model design complete - Ready for contract generation
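
The totals in the Token Efficiency Comparison table follow from multiplying the per-task estimates by the 15-task list size. The sketch below reproduces that arithmetic using only the estimates stated in this document; the constant names and the `total_range` helper are hypothetical and exist only for illustration, and the figures are estimates, not measurements.

```python
# Per-task token estimates copied from the Token Efficiency Comparison table.
# These are the document's estimates, not measured values.
SUMMARY_TOKENS_PER_TASK = (120, 150)   # TaskSummary: (low, high)
FULL_TOKENS_PER_TASK = (800, 1000)     # TaskResponse: (low, high)
TASK_COUNT = 15


def total_range(per_task: tuple[int, int], count: int) -> tuple[int, int]:
    """Scale a (low, high) per-task estimate to a list of `count` tasks."""
    low, high = per_task
    return low * count, high * count


summary_total = total_range(SUMMARY_TOKENS_PER_TASK, TASK_COUNT)
full_total = total_range(FULL_TOKENS_PER_TASK, TASK_COUNT)

# Compare like with like: low end against low end, high end against high end.
reduction_low = full_total[0] / summary_total[0]
reduction_high = full_total[1] / summary_total[1]

print(summary_total)  # (1800, 2250)
print(full_total)     # (12000, 15000)
print(round(reduction_low, 2), round(reduction_high, 2))  # 6.67 6.67
```

Under these estimates the reduction is a little above 6x at both ends of the range. The upper end of the summary estimate (2250 tokens) is above the <2000-token budget named in the integration tests, so that test would only pass if real payloads sit toward the lower end of the estimate.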
