# PROJECT_ytpipe - Claude Code Instructions
## π― Project Identity
**YTPipe MCP Backend** - Production-ready YouTube processing pipeline with AI agent integration
**Status**: β
**PRODUCTION READY** (95% complete)
**Architecture**: Microservices + MCP Protocol
**Language**: Python 3.8+ with async/await
---
## ποΈ Architecture
```
MCP Server (12 tools) β Pipeline Orchestrator β 11 Services β Pydantic Models
```
### Service Layers
- **Extractors** (2): Download, Transcriber
- **Processors** (4): Chunker, Embedder, VectorStore, Docling
- **Intelligence** (4): Search, SEO, Timeline, Analyzer
- **Exporters** (1): Dashboard
---
## π Commands
### Start MCP Server (for AI agents)
```bash
source venv/bin/activate
python -m ytpipe.mcp.server
```
### CLI (backward compatible)
```bash
source venv/bin/activate
ytpipe "https://youtube.com/watch?v=VIDEO_ID"
ytpipe URL --backend chromadb --whisper-model large --verbose
```
### Python API
```python
from ytpipe.core.pipeline import Pipeline
result = await Pipeline().process(url)
```
### Run Tests
```bash
pytest tests/
python test_seo_service.py
python test_timeline_service.py
```
---
## π Key Paths
| Task | Location |
|------|----------|
| Core models | `ytpipe/core/models.py` |
| Pipeline orchestrator | `ytpipe/core/pipeline.py` |
| MCP server | `ytpipe/mcp/server.py` |
| All services | `ytpipe/services/` |
| CLI wrapper | `ytpipe/cli/main.py` |
| Documentation | `README.md`, `MISSION_ACCOMPLISHED.md` |
---
## π― Rules
### Code Quality
- **Type hints required** - All functions must be typed
- **Pydantic models only** - No raw dicts for data structures
- **Async/await** - All I/O operations must be async
- **Domain exceptions** - Use ytpipe.core.exceptions
- **No placeholders** - Production-ready code only
### Architecture
- **Service isolation** - Services cannot import other services
- **Model contracts** - Services communicate via Pydantic models
- **Lazy loading** - Models load only when first used
- **Stateless services** - No shared state between service calls
### MCP Tools
- **Return dicts** - All tools return plain dicts (MCP compatible)
- **Error handling** - Graceful failures with error messages
- **File loading** - Load from standard output structure
- **Type validation** - Use Pydantic models internally
---
## π οΈ Development
### Adding New Services
1. Create service in appropriate directory (extractors/processors/intelligence/exporters)
2. Import Pydantic models from `ytpipe.core.models`
3. Raise exceptions from `ytpipe.core.exceptions`
4. Add to `__init__.py` exports
5. Write unit tests
### Adding MCP Tools
1. Add `@mcp.tool()` decorator in `ytpipe/mcp/server.py`
2. Load data from files (metadata.json, chunks.jsonl)
3. Call appropriate service
4. Return `.dict()` from Pydantic models
5. Handle file-not-found errors
### Modifying Pipeline
1. Edit `ytpipe/core/pipeline.py`
2. Add new phase or modify existing
3. Track timing in `phase_times` dict
4. Update `ProcessingResult` model if needed
---
## π MCP Tools (12 Total)
### Pipeline Tools (4)
- `ytpipe_process_video` - Full 8-phase pipeline
- `ytpipe_download` - Download only
- `ytpipe_transcribe` - Transcribe audio
- `ytpipe_embed` - Generate embedding
### Query Tools (4)
- `ytpipe_search` - Full-text search
- `ytpipe_find_similar` - Vector similarity
- `ytpipe_get_chunk` - Retrieve chunk
- `ytpipe_get_metadata` - Get metadata
### Analytics Tools (4)
- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality analysis
- `ytpipe_topic_timeline` - Timeline visualization
- `ytpipe_benchmark` - Performance metrics
---
## π§ Dependencies
```bash
# Core (already in requirements.txt)
yt-dlp
openai-whisper
sentence-transformers
chromadb
# MCP Layer
mcp
fastmcp
click
pydantic>=2.0
```
---
## π Documentation
- **README.md** - Architecture overview
- **MISSION_ACCOMPLISHED.md** - Project completion summary
- **PARALLEL_SWARM_VICTORY.md** - Parallel agent implementation story
- **TRANSFORMATION_COMPLETE.md** - Transformation details
- Service docs: SEO_SERVICE_IMPLEMENTATION.md, TIMELINE_SERVICE_DOCS.md, etc.
---
## π― Project Vision
**Transform YouTube content β LLM-ready knowledge bases**
### Current Capabilities
- β
Full video processing pipeline
- β
Semantic chunking with timestamps
- β
Vector embeddings and search
- β
AI agent integration (MCP)
- β
SEO optimization
- β
Timeline analysis
- β
Quality scoring
### Future Vision
- [ ] Batch processing (multiple videos)
- [ ] Web interface (FastAPI dashboard)
- [ ] Cloud deployment (AWS/GCP)
- [ ] Real-time processing (live streams)
- [ ] Multi-language support
- [ ] Advanced NLP (summarization, Q&A)
---
## π¨ Important Notes
- **Virtual environment required**: Always `source venv/bin/activate`
- **FFmpeg dependency**: Required for yt-dlp audio extraction
- **GPU optional**: Whisper and embeddings benefit from CUDA
- **Storage**: ~10x video size for temporary files
---
This is a **production-ready** MCP backend ready for AI agent integration and standalone deployment.