<div align="center">

# 🎬 YTPipe - AI-Native YouTube Processing Pipeline
Transform YouTube videos into **LLM-ready knowledge bases** with a production-ready MCP backend.
[Quick Start](#-quick-start) • [Features](#-features) • [Documentation](#-documentation) • [MCP Tools](#-mcp-tools)
</div>
## ✨ Features
- 🤖 **MCP Integration** - 12 AI-callable tools for seamless agent integration
- 🎯 **Smart Chunking** - Semantic text chunking with timeline timestamps
- 🧠 **Vector Embeddings** - 384-dimensional embeddings for semantic search
- 🔍 **Full-Text Search** - Context-aware transcript search
- 📈 **SEO Intelligence** - AI-powered title, tag, and description optimization
- ⏱️ **Timeline Analysis** - Topic evolution and keyword density tracking
- 🏗️ **Microservices** - 11 independent, composable services
- 🔒 **Type-Safe** - Pydantic models throughout
- ⚡ **Async-First** - Non-blocking I/O operations
- 🗄️ **Multi-Backend** - ChromaDB, FAISS, Qdrant support
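
To make the chunking feature concrete, here is a minimal sketch of grouping timestamped transcript segments into chunks that keep their start and end times. It uses a plain word budget as a stand-in for semantic chunking, and `Segment`, `Chunk`, and `chunk_segments` are hypothetical names for illustration, not YTPipe's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


@dataclass
class Chunk:
    start: float
    end: float
    text: str


def _merge(segs: List[Segment]) -> Chunk:
    """Join segments into one chunk spanning the first start to the last end."""
    return Chunk(segs[0].start, segs[-1].end, " ".join(s.text for s in segs))


def chunk_segments(segments: List[Segment], max_words: int = 50) -> List[Chunk]:
    """Greedily merge consecutive segments until the word budget is reached."""
    chunks: List[Chunk] = []
    current: List[Segment] = []
    words = 0
    for seg in segments:
        n = len(seg.text.split())
        # Close the current chunk if adding this segment would exceed the budget.
        if current and words + n > max_words:
            chunks.append(_merge(current))
            current, words = [], 0
        current.append(seg)
        words += n
    if current:
        chunks.append(_merge(current))
    return chunks
```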
---
## 🚀 Quick Start
```bash
# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"
```
**Result**: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage
---
## 🎯 Usage Examples
### MCP Server (AI Agents)
```bash
python -m ytpipe.mcp.server
```
Then from Claude Code:
```
"Process this video: https://youtube.com/watch?v=VIDEO_ID"
"Search video dQw4w9WgXcQ for 'machine learning'"
"Optimize SEO for video dQw4w9WgXcQ"
```
### CLI (Humans)
```bash
# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"
# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose
```
### Python API (Developers)
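```python
import asyncio

from ytpipe.core.pipeline import Pipeline


async def main() -> None:
    pipeline = Pipeline(output_dir="./output")
    # process() is a coroutine, so it must be awaited inside an async function
    result = await pipeline.process("https://youtube.com/watch?v=VIDEO_ID")
    print(f"✅ {result.metadata.title}")
    print(f"   Chunks: {len(result.chunks)}")
    print(f"   Time: {result.processing_time:.1f}s")


asyncio.run(main())
```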
---
## 🔌 MCP Tools
### Pipeline (4 tools)
- `ytpipe_process_video` - Full pipeline
- `ytpipe_download` - Download only
- `ytpipe_transcribe` - Transcribe audio
- `ytpipe_embed` - Generate embeddings
### Query (4 tools)
- `ytpipe_search` - Full-text search
- `ytpipe_find_similar` - Semantic search
- `ytpipe_get_chunk` - Get chunk by ID
- `ytpipe_get_metadata` - Get video info
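
As a rough illustration of what semantic search over chunk embeddings involves, the sketch below ranks stored vectors by cosine similarity to a query vector. It is not the `ytpipe_find_similar` implementation: `cosine_similarity` and `find_similar` are hypothetical helpers, and the three-dimensional vectors in any usage stand in for the 384-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, best match first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```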
### Analytics (4 tools)
- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality metrics
- `ytpipe_topic_timeline` - Topic evolution
- `ytpipe_benchmark` - Performance analysis
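
For a sense of what keyword density tracking over a timeline could look like, here is a small sketch that buckets timestamped chunks into fixed time windows and reports how often a keyword appears relative to the total word count. The function name, the `(start_seconds, text)` input shape, and the 60-second window are assumptions for illustration, not the behavior of `ytpipe_topic_timeline`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def keyword_density(
    chunks: List[Tuple[float, str]],
    keyword: str,
    window: float = 60.0,
) -> Dict[int, float]:
    """Map each time-window index to keyword occurrences / total words."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    target = keyword.lower()
    for start, text in chunks:
        bucket = int(start // window)  # window index the chunk starts in
        words = text.lower().split()
        totals[bucket] += len(words)
        hits[bucket] += words.count(target)
    # Skip empty windows to avoid dividing by zero.
    return {b: hits[b] / totals[b] for b in sorted(totals) if totals[b]}
```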
---
## 🏗️ Architecture
```
MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models
```
**Services**:
- **Extractors** (2): Download, Transcriber
- **Processors** (4): Chunker, Embedder, VectorStore, Docling
- **Intelligence** (4): Search, SEO, Timeline, Analyzer
- **Exporters** (1): Dashboard
**8 Processing Phases**:
1. Download
2. Transcription
3. Chunking
4. Embeddings
5. Export
6. Dashboard
7. Docling
8. Vector Storage
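
The phase ordering above can be sketched as a simple async orchestrator that awaits each phase in turn. This is only an illustration of the sequencing idea: `make_phase`, `run_pipeline`, and the placeholder phases are hypothetical, and the real orchestrator lives in `ytpipe.core.pipeline`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

State = Dict[str, List[str]]
Phase = Callable[[State], Awaitable[None]]

# Phase names follow the order listed in this README.
PHASE_NAMES = [
    "download", "transcription", "chunking", "embeddings",
    "export", "dashboard", "docling", "vector_storage",
]


def make_phase(name: str) -> Phase:
    """Build a placeholder phase that only records that it ran."""
    async def phase(state: State) -> None:
        await asyncio.sleep(0)  # stand-in for real non-blocking I/O
        state["completed"].append(name)
    return phase


async def run_pipeline(url: str) -> State:
    """Run every phase sequentially, sharing one state dictionary."""
    state: State = {"url": [url], "completed": []}
    for name in PHASE_NAMES:
        await make_phase(name)(state)
    return state


if __name__ == "__main__":
    final = asyncio.run(run_pipeline("https://youtube.com/watch?v=VIDEO_ID"))
    print(final["completed"])
```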
---
## 📊 Performance
| Metric | Value |
|--------|-------|
| **Processing Speed** | 4-13x real-time |
| **Memory Usage** | <2GB peak |
| **Chunk Quality** | 85%+ high quality |
| **Embedding Dimension** | 384 |
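
As a worked example of the speed figure, the sketch below estimates wall-clock processing time for a video. It assumes "4-13x real-time" means 4 to 13 seconds of video are processed per second of wall-clock time (the table does not define the metric), and `estimated_processing_seconds` is a hypothetical helper.

```python
def estimated_processing_seconds(video_seconds: float, speed_factor: float) -> float:
    """Estimate processing time given a speed expressed as a multiple of real time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return video_seconds / speed_factor


video_seconds = 30 * 60  # a 30-minute video
slowest = estimated_processing_seconds(video_seconds, 4)   # at 4x real-time
fastest = estimated_processing_seconds(video_seconds, 13)  # at 13x real-time
print(f"Estimated processing time: {fastest:.0f}s to {slowest:.0f}s")
```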
---
## 🔧 Requirements
- Python 3.8+
- FFmpeg (for audio extraction)
- 4GB+ RAM recommended
- GPU optional (CUDA for acceleration)
---
## 📚 Documentation
- [Quick Start](QUICKSTART.md)
- [Setup Guide](SETUP_INSTRUCTIONS.md)
- [Architecture](docs/ARCHITECTURE.md)
- [API Reference](CLAUDE.md)
- [Development Guide](AGENT_KERNEL.md)
---
## 🤝 Contributing
Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.
---
## 📄 License
MIT License - see [LICENSE](LICENSE) for details.
---
## 🙏 Credits
Built with:
- [FastMCP](https://github.com/jlowin/fastmcp) - MCP server framework
- [OpenAI Whisper](https://github.com/openai/whisper) - Speech-to-text
- [sentence-transformers](https://www.sbert.net/) - Text embeddings
- [Model Context Protocol](https://modelcontextprotocol.io/) - AI tool standard
---
## 📧 Contact
**Leonardo Lech**
- Email: leonardo.lech@gmail.com
- GitHub: [@leolech14](https://github.com/leolech14)
---
<div align="center">

**⭐ Star this repo if you find it useful!**

**Transform YouTube → Knowledge Base in seconds**

</div>