AutoDocs MCP Server

# Phase 1: Core Validation

**Duration**: 2-3 days
**Goal**: Prove the concept works and establish solid foundations
**Status**: ✅ **COMPLETED** - Concept validated, architecture established

## The Challenge

Build a minimal viable system that can:
- Parse Python project dependencies from pyproject.toml files
- Integrate with the MCP (Model Context Protocol) ecosystem
- Provide a single, reliable tool for dependency scanning

**Critical Questions to Answer**:
1. Can we reliably parse diverse pyproject.toml structures?
2. Does MCP integration work smoothly for real AI assistants?
3. What architecture patterns will scale as we add complexity?

## Technical Implementation

### Foundation Architecture

From the very beginning, we established a **layered architecture** that would support future growth:

```text
# Core Services Layer
src/autodoc_mcp/core/
├── dependency_parser.py    # PyProject.toml parsing logic
├── cache_manager.py       # Simple JSON file caching
└── error_formatter.py     # Structured error handling

# Infrastructure Layer
src/autodoc_mcp/
├── main.py               # FastMCP server entry point
├── config.py             # Configuration management
├── models.py             # Pydantic data models
└── exceptions.py         # Custom exception hierarchy
```

**Why This Architecture Worked**:
- **Clear boundaries**: Each component had a single responsibility
- **Easy testing**: Mock boundaries aligned with architectural boundaries
- **Evolutionary**: New features could be added without refactoring existing code
- **Maintainable**: Changes in one layer didn't ripple through others

### The First MCP Tool: `scan_dependencies`

The initial tool was deceptively simple but included sophisticated error handling:

```python
from typing import Optional


async def scan_dependencies(project_path: Optional[str] = None) -> dict:
    """
    Parse pyproject.toml and extract all dependencies with graceful error handling.

    Args:
        project_path: Path to project directory (defaults to current directory)

    Returns:
        ScanResult-shaped dict with dependencies, warnings, and parsing statistics
    """
```

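**Key Innovation**: **Graceful degradation from day one**. Instead of failing the whole scan on one malformed dependency entry, the parser recorded a warning, skipped that entry, and returned the rest. Only a file that could not be parsed as TOML at all produced an error response.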

Example response showing graceful degradation (the lists are shortened relative to the statistics):

```json
{
    "success": true,
    "dependencies": {
        "fastmcp": ">=0.1.0",
        "pydantic": "^2.0.0",
        "httpx": "*"
    },
    "warnings": [
        "Invalid version constraint 'invalid-version' for package 'some-pkg', skipped"
    ],
    "statistics": {
        "total_found": 15,
        "valid_parsed": 12,
        "invalid_skipped": 3
    }
}
```
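The warning-collecting behavior can be sketched with the standard library alone. This is a minimal illustration, not the project's parser: the function name `scan_specs`, the two regular expressions, and the input shape (a list of `name<constraint>` strings) are all assumptions made for the example.

```python
import re
from typing import Any

# Hypothetical patterns for this sketch; the real parser's rules may differ.
_SPEC = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")
_CONSTRAINT = re.compile(r"^(?:[><=~!^]=?\s*)?\d+(?:\.\d+)*(?:\.\*)?$")


def scan_specs(specs: list[str]) -> dict[str, Any]:
    """Parse dependency strings, skipping bad entries instead of raising."""
    dependencies: dict[str, str] = {}
    warnings: list[str] = []
    for spec in specs:
        match = _SPEC.match(spec)
        if match is None:
            warnings.append(f"Unrecognized dependency entry '{spec}', skipped")
            continue
        name, constraint = match.group(1), match.group(2).strip()
        if constraint == "":
            constraint = "*"  # no constraint given: accept any version
        elif not _CONSTRAINT.match(constraint):
            warnings.append(
                f"Invalid version constraint '{constraint}' for package '{name}', skipped"
            )
            continue
        dependencies[name] = constraint
    return {
        "success": True,
        "dependencies": dependencies,
        "warnings": warnings,
        "statistics": {
            "total_found": len(specs),
            "valid_parsed": len(dependencies),
            "invalid_skipped": len(warnings),
        },
    }
```

Calling `scan_specs(["fastmcp>=0.1.0", "httpx", "some-pkg invalid-version"])` keeps the first two entries and reports the third as a warning, so one bad entry never hides the valid ones.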

## Technical Decisions That Scaled

### Decision 1: FastMCP Framework
**Choice**: Use FastMCP instead of building raw MCP integration
**Rationale**: Focus on business logic, not protocol implementation
**Long-term Impact**: Enabled rapid development of 7 additional tools without protocol complexity

```python
from typing import Optional
from fastmcp import FastMCP

mcp = FastMCP("autodoc-mcp")  # server name is illustrative

# Clean, declarative tool definition
@mcp.tool()
async def scan_dependencies(project_path: Optional[str] = None) -> dict:
    """Parse project dependencies from pyproject.toml file."""
    # Implementation focuses on business logic only
```

### Decision 2: Pydantic for Data Validation
**Choice**: Use Pydantic v2 for all data models and validation
**Rationale**: Type safety, automatic validation, and excellent error messages
**Long-term Impact**: Prevented entire classes of runtime errors and improved debugging

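```python
from typing import Dict, List, Optional
from pydantic import BaseModel, Field

class ScanStatistics(BaseModel):  # fields inferred from the example response
    total_found: int
    valid_parsed: int
    invalid_skipped: int

class ScanResult(BaseModel):
    """Results from dependency scanning operation."""
    success: bool
    dependencies: Dict[str, str] = Field(default_factory=dict)
    warnings: List[str] = Field(default_factory=list)
    errors: List[str] = Field(default_factory=list)
    statistics: Optional[ScanStatistics] = None
```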

### Decision 3: Comprehensive Error Context
**Choice**: Include recovery suggestions in all error responses
**Rationale**: Users need actionable information, not just error messages
**Long-term Impact**: Created consistent, helpful error experience across all 8 tools

Error messages include context for recovery:

```json
{
    "error": "Failed to parse pyproject.toml",
    "details": "Invalid TOML syntax at line 23: Missing closing quote",
    "suggestions": [
        "Check line 23 in pyproject.toml for syntax errors",
        "Validate TOML syntax using an online validator",
        "Ensure all strings are properly quoted"
    ]
}
```
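One way to produce payloads of this shape is a small formatter that maps exception types to recovery hints. The sketch below is illustrative only: the function name `format_error`, the suggestion table, and its wording are assumptions, not the contents of the project's `error_formatter.py`.

```python
from typing import Any

# Illustrative suggestion table; the exception types covered and the
# wording are assumptions made for this sketch.
_SUGGESTIONS: dict[type, list[str]] = {
    FileNotFoundError: ["Check that the project path contains a pyproject.toml"],
    PermissionError: ["Check read permissions on pyproject.toml"],
}
_FALLBACK = ["Re-run with a valid project path and inspect the details field"]


def format_error(summary: str, exc: Exception) -> dict[str, Any]:
    """Turn an exception into an error/details/suggestions payload."""
    for exc_type, hints in _SUGGESTIONS.items():
        if isinstance(exc, exc_type):
            suggestions = hints
            break
    else:
        # No specific entry matched: still return something actionable.
        suggestions = _FALLBACK
    return {"error": summary, "details": str(exc), "suggestions": list(suggestions)}
```

Keeping the table in one place means every tool that calls the formatter returns the same three keys, which is what makes the error experience consistent across tools.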

## Quality Foundation

### Testing Strategy from Day One
We established comprehensive testing patterns that supported rapid development:

```python
import pytest

# Assumes the pytest-asyncio plugin; scan_dependencies is the tool shown earlier.

# Pattern: Integration tests with real files
@pytest.mark.asyncio
async def test_scan_real_project():
    """Test with actual pyproject.toml file"""
    result = await scan_dependencies("./")
    assert result["success"] is True
    assert "fastmcp" in result["dependencies"]

# Pattern: Error condition testing
@pytest.mark.asyncio
async def test_scan_malformed_toml():
    """Test graceful handling of invalid TOML"""
    result = await scan_dependencies("./test/fixtures/invalid.toml")
    assert result["success"] is False
    assert "TOML syntax error" in result["errors"][0]
    assert len(result["suggestions"]) > 0
```

**Coverage from Day One**: 85% test coverage established in Phase 1, creating a quality foundation for future development.

### CI/CD Pipeline
Complete automation established early:

```yaml
# Key quality gates from Phase 1
- name: Run tests
  run: pytest --cov=src --cov-report=term-missing

- name: Type checking
  run: mypy src/

- name: Linting
  run: ruff check src/ tests/

- name: Security scanning
  run: bandit -r src/
```

## Validation Results

### ✅ **Parsing Reliability Validated**
Tested against 20+ real Python projects with diverse dependency specifications:
- **pydantic**: Complex version constraints with extras
- **django**: Multiple dependency groups (main, dev, test)
- **fastapi**: Modern pyproject.toml structure
- **requests**: Simple, traditional structure

**Result**: 95%+ successful parsing rate with graceful degradation for edge cases.

### ✅ **MCP Integration Validated**
Integrated with multiple AI assistants:
- **Claude Code**: stdio transport working perfectly
- **Cursor**: MCP server configuration successful
- **Local testing**: Direct FastMCP integration validated

**Result**: Smooth integration experience with clear setup instructions.

### ✅ **Architecture Scalability Validated**
Added second tool (`get_basic_docs`) to test architectural patterns:
- New tool added in <1 hour
- No changes required to existing code
- Testing patterns reused successfully

**Result**: Architecture ready for expansion to 8 tools.

## Lessons Learned

### What Worked Exceptionally Well

1. **Graceful Degradation Philosophy**: Collecting warnings instead of failing fast made the tool resilient to real-world messiness.

2. **Architecture-First Approach**: Spending time on the layered architecture paid off immediately when adding the second tool.

3. **Error Context Innovation**: Including recovery suggestions in errors differentiated our UX from standard developer tools.

4. **Quality Gates Early**: Establishing 85% test coverage and CI/CD in Phase 1 prevented technical debt accumulation.

### Challenges and Solutions

#### Challenge 1: TOML Parsing Edge Cases
**Problem**: The third-party `toml` package raises on malformed input, and its raw exception text alone gives users little to act on
**Solution**: Wrapped parsing in a try/except that returns a specific error message plus recovery suggestions

```python
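import toml  # third-party package

# Fragment from inside the tool function, hence the early return.
try: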
    parsed_toml = toml.load(toml_path)
except toml.TomlDecodeError as e:
    return {
        "success": False,
        "errors": [f"TOML syntax error: {str(e)}"],
        "suggestions": [
            "Validate TOML syntax using an online validator",
            "Check for missing quotes or bracket mismatches"
        ]
    }
```

#### Challenge 2: Version Constraint Diversity
**Problem**: Python projects use inconsistent version constraint formats
**Solution**: Built a flexible parser that handles multiple formats gracefully

```python
# Flexible version constraint parsing
VALID_PATTERNS = [
    r"^[><=~!^]*[\d\.]+([\w\d\.-]*)?$",  # Standard semantic versions
    r"^\*$",                              # Wildcard
    r"^[><=~!^]*\d+$",                   # Major version only
]
```
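As a rough illustration of how such a pattern list might be applied, the sketch below checks a constraint against each pattern in turn. The patterns are copied from the list above; the helper name `is_valid_constraint` is hypothetical.

```python
import re

# Patterns copied from the list above.
VALID_PATTERNS = [
    r"^[><=~!^]*[\d\.]+([\w\d\.-]*)?$",  # Standard semantic versions
    r"^\*$",                              # Wildcard
    r"^[><=~!^]*\d+$",                   # Major version only
]
_COMPILED = [re.compile(pattern) for pattern in VALID_PATTERNS]


def is_valid_constraint(constraint: str) -> bool:
    """Return True if any pattern matches the whitespace-stripped constraint."""
    candidate = constraint.strip()
    return any(pattern.match(candidate) for pattern in _COMPILED)
```

With these patterns, `">=0.1.0"`, `"^2.0.0"`, `"*"`, and `"1.0.0rc1"` are accepted and `"invalid-version"` is rejected. A comma-separated range such as `">=1.0,<2.0"` is also rejected, because no pattern allows a comma, so a caller would need to split ranges on commas before checking each part.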

#### Challenge 3: Configuration Management
**Problem**: Different environments need different settings
**Solution**: Environment-aware configuration with validation

```python
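from pathlib import Path

from pydantic import BaseModel, Field, field_validator


class AutoDocsConfig(BaseModel):
    # validate_default=True makes the validator below run for the default path too;
    # Pydantic v2 skips validators on default values otherwise.
    cache_dir: Path = Field(
        default_factory=lambda: Path.home() / ".cache" / "autodoc-mcp",
        validate_default=True,
    )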
    timeout_seconds: int = Field(default=30, ge=5, le=300)
    max_file_size_mb: int = Field(default=10, ge=1, le=100)

    @field_validator("cache_dir")
    @classmethod
    def validate_cache_dir(cls, v: Path) -> Path:
        v.mkdir(parents=True, exist_ok=True)
        return v
```
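The model above validates ranges but does not show where environment-specific values come from. A minimal sketch of that step follows, using a standard-library dataclass as a stand-in for the Pydantic model. The environment variable names (`AUTODOCS_TIMEOUT_SECONDS`, `AUTODOCS_MAX_FILE_SIZE_MB`) and the helper names are assumptions; the bounds mirror the `ge`/`le` values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Stand-in for the Pydantic config, limited to the two numeric fields."""
    timeout_seconds: int = 30
    max_file_size_mb: int = 10


def _bounded_int(raw: Optional[str], default: int, low: int, high: int, name: str) -> int:
    """Convert an optional string to an int and enforce an inclusive range."""
    if raw is None:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def load_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Build settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return EnvSettings(
        timeout_seconds=_bounded_int(
            env.get("AUTODOCS_TIMEOUT_SECONDS"), 30, 5, 300, "timeout_seconds"
        ),
        max_file_size_mb=_bounded_int(
            env.get("AUTODOCS_MAX_FILE_SIZE_MB"), 10, 1, 100, "max_file_size_mb"
        ),
    )
```

Passing the mapping explicitly, as in `load_settings({"AUTODOCS_TIMEOUT_SECONDS": "60"})`, keeps tests independent of the real process environment.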

## Impact on Subsequent Phases

### Foundation for Phase 2
The dependency parsing capability became the input for documentation fetching. The structured error handling patterns were reused for network operations.

### Foundation for Phase 3
The graceful degradation philosophy established in Phase 1 became the template for handling network failures and partial results in Phase 3.

### Foundation for Phase 4
The configuration management and data model patterns scaled perfectly to handle the complexity of multi-dependency context fetching.

## Key Metrics

### Development Velocity
- **Day 1**: Project setup, basic FastMCP integration
- **Day 2**: Dependency parsing with error handling
- **Day 3**: Comprehensive testing and CI/CD setup

### Code Quality
- **Test Coverage**: 85%
- **Type Coverage**: 100% (mypy strict mode)
- **Documentation**: Complete API documentation for all public methods

### Functionality
- **pyproject.toml Parsing**: 95%+ success rate across diverse projects
- **MCP Integration**: 100% compatibility with tested AI assistants
- **Error Handling**: Comprehensive recovery suggestions for all failure modes

## Looking Forward

Phase 1 established the **quality and architectural foundations** that enabled rapid, confident development in subsequent phases. The patterns established here - graceful degradation, comprehensive testing, and user-focused error messages - became the hallmarks of the entire system.

**Next**: [Phase 2: Documentation Fetching](phase-2-documentation-fetching.md) - Building the core documentation engine.

---

*This phase documentation is part of the AutoDocs MCP Server [Development Journey](../index.md).*
