YTPipe

CLAUDE.md•5 KiB

# PROJECT_ytpipe - Claude Code Instructions

## 🎯 Project Identity

**YTPipe MCP Backend** - Production-ready YouTube processing pipeline with AI agent integration

**Status**: ✅ **PRODUCTION READY** (95% complete)
**Architecture**: Microservices + MCP Protocol
**Language**: Python 3.8+ with async/await

---

## 🏗️ Architecture

```
MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models
```

### Service Layers

- **Extractors** (2): Download, Transcriber
- **Processors** (4): Chunker, Embedder, VectorStore, Docling
- **Intelligence** (4): Search, SEO, Timeline, Analyzer
- **Exporters** (1): Dashboard

---

## 🚀 Commands

### Start MCP Server (for AI agents)

```bash
source venv/bin/activate
python -m ytpipe.mcp.server
```

### CLI (backward compatible)

```bash
source venv/bin/activate
ytpipe "https://youtube.com/watch?v=VIDEO_ID"
ytpipe URL --backend chromadb --whisper-model large --verbose
```

### Python API

```python
import asyncio
from ytpipe.core.pipeline import Pipeline

url = "https://youtube.com/watch?v=VIDEO_ID"
result = asyncio.run(Pipeline().process(url))  # process() is a coroutine
```

### Run Tests

```bash
pytest tests/
python test_seo_service.py
python test_timeline_service.py
```

---

## 📂 Key Paths

| Task | Location |
|------|----------|
| Core models | `ytpipe/core/models.py` |
| Pipeline orchestrator | `ytpipe/core/pipeline.py` |
| MCP server | `ytpipe/mcp/server.py` |
| All services | `ytpipe/services/` |
| CLI wrapper | `ytpipe/cli/main.py` |
| Documentation | `README.md`, `MISSION_ACCOMPLISHED.md` |

---

## 🎯 Rules

### Code Quality

- **Type hints required** - All functions must be typed
- **Pydantic models only** - No raw dicts for data structures
- **Async/await** - All I/O operations must be async
- **Domain exceptions** - Use `ytpipe.core.exceptions`
- **No placeholders** - Production-ready code only

### Architecture

- **Service isolation** - Services cannot import other services
- **Model contracts** - Services communicate via Pydantic models
- **Lazy loading** - Models load only when first used
- **Stateless services** - No shared state between service calls

### MCP Tools

- **Return dicts** - All tools return plain dicts (MCP compatible)
- **Error handling** - Fail gracefully and return an error message
- **File loading** - Load from the standard output structure
- **Type validation** - Use Pydantic models internally

---

## 🛠️ Development

### Adding New Services

1. Create the service in the appropriate directory (extractors/processors/intelligence/exporters)
2. Import Pydantic models from `ytpipe.core.models`
3. Raise exceptions from `ytpipe.core.exceptions`
4. Add to `__init__.py` exports
5. Write unit tests

### Adding MCP Tools

1. Add the `@mcp.tool()` decorator in `ytpipe/mcp/server.py`
2. Load data from files (`metadata.json`, `chunks.jsonl`)
3. Call the appropriate service
4. Return `.model_dump()` from Pydantic models (`.dict()` is deprecated in Pydantic 2.x)
5. Handle file-not-found errors

### Modifying Pipeline

1. Edit `ytpipe/core/pipeline.py`
2. Add a new phase or modify an existing one
3. Track timing in the `phase_times` dict
4. Update the `ProcessingResult` model if needed

---

## 📊 MCP Tools (12 Total)

### Pipeline Tools (4)

- `ytpipe_process_video` - Full 8-phase pipeline
- `ytpipe_download` - Download only
- `ytpipe_transcribe` - Transcribe audio
- `ytpipe_embed` - Generate embeddings

### Query Tools (4)

- `ytpipe_search` - Full-text search
- `ytpipe_find_similar` - Vector similarity
- `ytpipe_get_chunk` - Retrieve chunk
- `ytpipe_get_metadata` - Get metadata

### Analytics Tools (4)

- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality analysis
- `ytpipe_topic_timeline` - Timeline visualization
- `ytpipe_benchmark` - Performance metrics

---

## 🔧 Dependencies

```bash
# Core (already in requirements.txt)
yt-dlp
openai-whisper
sentence-transformers
chromadb

# MCP Layer
mcp
fastmcp
click
pydantic>=2.0
```

---

## 📝 Documentation

- **README.md** - Architecture overview
- **MISSION_ACCOMPLISHED.md** - Project completion summary
- **PARALLEL_SWARM_VICTORY.md** - Parallel agent implementation story
- **TRANSFORMATION_COMPLETE.md** - Transformation details
- Service docs: SEO_SERVICE_IMPLEMENTATION.md, TIMELINE_SERVICE_DOCS.md, etc.

---

## 🎯 Project Vision

**Transform YouTube content → LLM-ready knowledge bases**

### Current Capabilities

- ✅ Full video processing pipeline
- ✅ Semantic chunking with timestamps
- ✅ Vector embeddings and search
- ✅ AI agent integration (MCP)
- ✅ SEO optimization
- ✅ Timeline analysis
- ✅ Quality scoring

### Future Vision

- [ ] Batch processing (multiple videos)
- [ ] Web interface (FastAPI dashboard)
- [ ] Cloud deployment (AWS/GCP)
- [ ] Real-time processing (live streams)
- [ ] Multi-language support
- [ ] Advanced NLP (summarization, Q&A)

---

## 🚨 Important Notes

- **Virtual environment required**: Always run `source venv/bin/activate` first
- **FFmpeg dependency**: Required for yt-dlp audio extraction
- **GPU optional**: Whisper and embeddings benefit from CUDA
- **Storage**: Temporary files need roughly 10x the video size

---

This is a **production-ready** MCP backend, ready for AI agent integration and standalone deployment.
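
### Sketch: Shape of an MCP Tool

The "Adding MCP Tools" steps and the MCP tool rules above (load from files, validate internally, return a plain dict, fail gracefully) can be illustrated with a small dependency-free sketch. Everything here is an assumption made for illustration: `VideoMetadata` is a hypothetical dataclass standing in for a Pydantic model from `ytpipe.core.models`, the `<output_root>/<video_id>/metadata.json` layout and the `success`/`error` keys are guesses, and the `@mcp.tool()` decorator is left out so the snippet runs without third-party packages.

```python
import asyncio
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class VideoMetadata:
    """Hypothetical stand-in for a Pydantic model in ytpipe.core.models."""

    video_id: str
    title: str
    duration_seconds: float


def _read_json(path: Path) -> Any:
    """Blocking file read, kept separate so it can run in a worker thread."""
    return json.loads(path.read_text(encoding="utf-8"))


async def get_metadata(video_id: str, output_root: Path) -> Dict[str, Any]:
    """Return stored metadata as a plain dict, or an error dict on failure.

    In a real server this coroutine would carry the @mcp.tool() decorator.
    The <output_root>/<video_id>/metadata.json layout is an assumption.
    """
    path = output_root / video_id / "metadata.json"
    loop = asyncio.get_running_loop()
    try:
        # Run the blocking read off the event loop to keep I/O async.
        raw = await loop.run_in_executor(None, _read_json, path)
        model = VideoMetadata(**raw)  # rejects missing or unexpected fields
    except FileNotFoundError:
        return {"success": False, "error": f"No metadata found for {video_id}"}
    except (json.JSONDecodeError, TypeError) as exc:
        return {"success": False, "error": f"Invalid metadata: {exc}"}
    return {"success": True, "metadata": asdict(model)}


async def demo() -> None:
    """Write one metadata file to a temp directory and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "abc123").mkdir()
        (root / "abc123" / "metadata.json").write_text(
            json.dumps(
                {"video_id": "abc123", "title": "Demo", "duration_seconds": 12.5}
            ),
            encoding="utf-8",
        )
        print(await get_metadata("abc123", root))
        print(await get_metadata("missing", root))


if __name__ == "__main__":
    asyncio.run(demo())
```

Returning an error dict instead of raising keeps the tool's output MCP-compatible on both paths. With Pydantic 2.x, the dataclass and `asdict` would be replaced by a model class and `model_dump()`.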


