# AGENT_KERNEL.md - PROJECT_ytpipe
## 🎯 Project Core Identity
**Name**: YTPipe MCP Backend
**Type**: AI-Native Backend Service
**Status**: Production Ready (95% complete)
**Architecture**: Microservices + MCP Protocol
---
## ⚡ Boot Sequence
### 1. Environment Check
```bash
# Verify Python environment
python3 --version # Must be 3.8+
# Activate virtual environment
source venv/bin/activate
# Verify dependencies
python -c "import ytpipe; print('✅ ytpipe package loaded')"
```
### 2. System Capabilities
- **11 Services**: Extractors (2), Processors (4), Intelligence (4), Exporters (1)
- **12 MCP Tools**: Pipeline (4), Query (4), Analytics (4)
- **11 Pydantic Models**: Type-safe data contracts
- **12 Custom Exceptions**: Domain-specific error handling
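As an illustration of these contracts, here is a minimal stdlib sketch. All names are hypothetical stand-ins: the real project uses `pydantic.BaseModel` (which adds runtime validation) and the exceptions in `ytpipe.core.exceptions`.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for one of ytpipe's typed data contracts
# (illustrative only; real models are Pydantic, not dataclasses).
@dataclass
class Chunk:
    text: str
    start_time: float                                   # seconds into the video
    end_time: float
    embedding: List[float] = field(default_factory=list)

# A domain-specific exception in the style of ytpipe.core.exceptions.
class TranscriptionError(Exception):
    """Raised when audio cannot be transcribed."""

chunk = Chunk(text="intro", start_time=0.0, end_time=12.5)
print(chunk)
```

Typed models like this are what let services declare their inputs and outputs precisely instead of passing raw dicts around.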
### 3. Primary Interfaces
- **MCP Server**: `python -m ytpipe.mcp.server` (for AI agents)
- **CLI**: `ytpipe URL [OPTIONS]` (for humans)
- **Python API**: `from ytpipe.core.pipeline import Pipeline`
---
## 🎛️ Agent Operational Modes
### Mode 1: MCP Server (AI Agent Integration)
**Purpose**: Expose ytpipe to Claude Code and other MCP clients
**Start**:
```bash
python -m ytpipe.mcp.server
```
**Tools Available**:
- Pipeline control (process, download, transcribe, embed)
- Content query (search, find_similar, get_chunk, get_metadata)
- Analytics (seo_optimize, quality_report, topic_timeline, benchmark)
### Mode 2: CLI Execution (Human Interface)
**Purpose**: Backward-compatible command-line interface
**Usage**:
```bash
ytpipe "https://youtube.com/watch?v=VIDEO_ID"
ytpipe URL --backend chromadb --whisper-model large
```
### Mode 3: Python API (Developer Integration)
**Purpose**: Direct library usage in Python projects
**Usage**:
```python
import asyncio

from ytpipe.core.pipeline import Pipeline
from ytpipe.services.intelligence import SEOService

pipeline = Pipeline(output_dir="./output")
result = asyncio.run(pipeline.process(url))  # or `await` inside async code
```
---
## 🔧 Core Patterns
### Data Flow
```
YouTube URL
    ↓
DownloadService → VideoMetadata + audio.mp3
    ↓
TranscriberService → transcript (str)
    ↓
ChunkerService → List[Chunk] with timestamps
    ↓
EmbedderService → List[Chunk] with embeddings
    ↓
VectorStoreService → stored in ChromaDB/FAISS/Qdrant
    ↓
ProcessingResult (Pydantic model)
```
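The flow above can be sketched end-to-end with stub coroutines. Every name and signature here is an illustrative stand-in for the real services, not ytpipe's actual API:

```python
import asyncio
from typing import List, Tuple

# Stub stand-ins for the real services; each stage consumes the
# previous stage's output, mirroring the diagram above.
async def download(url: str) -> str:
    return f"audio for {url}"              # real: VideoMetadata + audio.mp3

async def transcribe(audio: str) -> str:
    return "hello world transcript"        # real: Whisper transcription

async def chunk(transcript: str) -> List[str]:
    return transcript.split()              # real: timestamped Chunk models

async def embed(chunks: List[str]) -> List[Tuple[str, List[float]]]:
    return [(c, [float(len(c))]) for c in chunks]  # real: batched embeddings

async def run_pipeline(url: str) -> List[Tuple[str, List[float]]]:
    audio = await download(url)
    transcript = await transcribe(audio)
    chunks = await chunk(transcript)
    return await embed(chunks)             # real: persisted to the vector store

result = asyncio.run(run_pipeline("https://youtube.com/watch?v=VIDEO_ID"))
print(len(result))  # 3 — one embedded entry per chunk
```

The key property to preserve when extending the pipeline: each stage is a pure async transformation, so stages can be swapped or extracted into separate processes.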
### Service Pattern
```python
class SomeService:
    """Stateless service with clear inputs/outputs."""

    def __init__(self, config_params):
        """Configure service."""
        self.model = None  # Lazy load

    async def process(self, input_model: PydanticModel) -> PydanticModel:
        """Process with type-safe contracts."""
        # Implementation
        return OutputModel(...)
```
### MCP Tool Pattern
```python
@mcp.tool()
async def ytpipe_some_tool(video_id: str, param: str) -> dict:
    """Tool description for AI agents."""
    # Load data from files
    # Call appropriate service
    # Return dict (MCP compatible)
    return result.dict()
```
---
## 🔒 Critical Constraints
### Type Safety
- **All functions**: Must have type hints
- **All data**: Must use Pydantic models (no raw dicts)
- **All services**: Input/output via typed models
### Performance
- **Lazy loading**: Models load only when first used
- **Model caching**: Reuse loaded models across calls
- **Async operations**: All I/O must be non-blocking
- **Batch processing**: Embeddings generated in batches
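The lazy-loading, caching, and batching constraints can be sketched together in one hypothetical service (`EmbedderSketch` and its counter are illustrative, not ytpipe code):

```python
from typing import List

class EmbedderSketch:
    """Illustrative only: loads its model on first use, then reuses it."""

    load_count = 0  # class-level counter, purely to demonstrate caching

    def __init__(self):
        self._model = None                 # nothing loaded at construction

    @property
    def model(self):
        if self._model is None:            # lazy load on first access
            EmbedderSketch.load_count += 1
            self._model = object()         # stand-in for a heavy model load
        return self._model                 # cached for every later call

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        _ = self.model                     # triggers at most one load
        return [[float(len(t))] for t in texts]  # one batch call, not per-item

svc = EmbedderSketch()
svc.embed_batch(["a", "bb"])
svc.embed_batch(["ccc"])
print(EmbedderSketch.load_count)  # 1 — the model was loaded exactly once
```

Constructing a service stays cheap; the expensive load happens once, on first use, no matter how many batches follow.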
### Error Handling
- **Domain exceptions**: Use ytpipe.core.exceptions
- **Graceful degradation**: Services return partial results on errors
- **Clear messages**: Error messages must be actionable
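Graceful degradation with actionable errors can be sketched like this (the exception and function names are hypothetical, in the spirit of `ytpipe.core.exceptions`):

```python
from typing import List, Tuple

# Hypothetical domain exception, modeled on ytpipe.core.exceptions.
class EmbeddingError(Exception):
    """Raised when a chunk cannot be embedded; the message says why."""

def embed_all(chunks: List[str]) -> Tuple[List[str], List[str]]:
    """Return (embedded, failed) instead of raising on the first bad chunk."""
    embedded, failed = [], []
    for chunk in chunks:
        try:
            if not chunk:
                # Actionable message: names the failure and the fix.
                raise EmbeddingError("empty chunk; re-run the chunker")
            embedded.append(chunk)
        except EmbeddingError:
            failed.append(chunk)           # partial result, not total failure
    return embedded, failed

ok, bad = embed_all(["intro", "", "outro"])
print(ok, bad)  # ['intro', 'outro'] ['']
```

Callers get everything that succeeded plus a list of what failed, so one bad chunk never discards an otherwise good transcript.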
---
## 📋 Common Tasks
### Process Video
```python
import asyncio

from ytpipe.core.pipeline import Pipeline

pipeline = Pipeline(output_dir="./KNOWLEDGE_YOUTUBE")
result = asyncio.run(pipeline.process("https://youtube.com/watch?v=VIDEO_ID"))
if result.success:
    print(f"✅ Processed: {result.metadata.title}")
    print(f"   Chunks: {len(result.chunks)}")
    print(f"   Time: {result.processing_time:.1f}s")
```
### Search Transcript
```python
from ytpipe.services.intelligence import SearchService
service = SearchService()
results = service.search_transcript(chunks, query="OpenClaw")
for r in results:
    print(f"[{r.timestamp}] {r.text}")
```
### SEO Optimization
```python
from ytpipe.services.intelligence import SEOService
service = SEOService()
seo = service.optimize(metadata, chunks)
print(f"SEO Score: {seo.seo_score}/100")
print(f"Best Title: {seo.suggested_titles[0]['title']}")
```
---
## 🔌 Integration Points
### Claude Code MCP Configuration
```json
{
"mcpServers": {
"ytpipe": {
"command": "python",
"args": ["-m", "ytpipe.mcp.server"],
"cwd": "/Users/lech/PROJECTS_all/PROJECT_ytpipe"
}
}
}
```
### As Python Package
```bash
cd PROJECT_ytpipe
pip install -e .
```
### As Microservice
Each service can be extracted into its own container/process for distributed processing.
---
## 📊 Performance Targets
- **Processing**: <2 minutes for 10-minute video
- **Memory**: <2GB peak
- **MCP Tools**: <500ms response for query tools
- **Embeddings**: ~10 chunks/second
---
## 📚 Agent Guidelines
### When Adding Features
1. Check if service exists before creating new one
2. Use existing Pydantic models when possible
3. Follow async/await pattern
4. Add MCP tool if AI agents need access
5. Write tests for new functionality
### When Fixing Bugs
1. Identify which service has the issue
2. Add unit test that reproduces bug
3. Fix in isolated service
4. Validate with integration test
5. Update documentation if needed
### When Optimizing
1. Profile with `--profile` flag
2. Identify bottleneck phase
3. Optimize specific service
4. Benchmark before/after
5. Document improvements
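Step 4's before/after benchmarking can be sketched with the stdlib, independent of the project's `--profile` flag (the chunker functions below are hypothetical examples of a bottleneck and its fix):

```python
import timeit

def slow_chunker(text):
    out = []
    for word in text.split():
        out = out + [word]          # quadratic list rebuild: the bottleneck
    return out

def fast_chunker(text):
    return text.split()             # same output, linear time

text = "word " * 2000
assert slow_chunker(text) == fast_chunker(text)  # behavior unchanged

before = timeit.timeit(lambda: slow_chunker(text), number=10)
after = timeit.timeit(lambda: fast_chunker(text), number=10)
print(f"before={before:.3f}s after={after:.3f}s")
```

The equality assertion is the important part: an optimization only counts if the service's typed contract and outputs are unchanged.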
---
**This is a standalone production project. All development should maintain the microservices architecture and MCP integration.**