YTPipe

README.md•4.85 KiB

<div align="center">

![YTPipe Banner](assets/ytpipe-banner.webp)

# 🎬 YTPipe - AI-Native YouTube Processing Pipeline

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-brightgreen.svg)](https://modelcontextprotocol.io/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Transform YouTube videos into **LLM-ready knowledge bases** with a production-ready MCP backend.

[Quick Start](#-quick-start) • [Features](#-features) • [Documentation](#-documentation) • [MCP Tools](#-mcp-tools)

</div>

## ✨ Features

- 🤖 **MCP Integration** - 12 AI-callable tools for seamless agent integration
- 🎯 **Smart Chunking** - Semantic text chunking with timeline timestamps
- 🧠 **Vector Embeddings** - 384-dimensional embeddings for semantic search
- 🔍 **Full-Text Search** - Context-aware transcript search
- 📊 **SEO Intelligence** - AI-powered title, tag, and description optimization
- ⏱️ **Timeline Analysis** - Topic evolution and keyword density tracking
- 🏗️ **Microservices** - 11 independent, composable services
- 🔐 **Type-Safe** - Pydantic models throughout
- ⚡ **Async-First** - Non-blocking I/O operations
- 🗄️ **Multi-Backend** - ChromaDB, FAISS, Qdrant support

---

## 🚀 Quick Start

```bash
# Install
git clone https://github.com/leolech14/ytpipe.git
cd ytpipe
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Process a video
ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"
```

**Result**: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage

---

## 🎯 Usage Examples

### MCP Server (AI Agents)

```bash
python -m ytpipe.mcp.server
```

Then from Claude Code:

```
"Process this video: https://youtube.com/watch?v=VIDEO_ID"
"Search video dQw4w9WgXcQ for 'machine learning'"
"Optimize SEO for video dQw4w9WgXcQ"
```

### CLI (Humans)

```bash
# Basic
ytpipe "https://youtube.com/watch?v=VIDEO_ID"

# Advanced
ytpipe URL --backend faiss --whisper-model large --verbose
```

### Python API (Developers)

```python
import asyncio
from ytpipe.core.pipeline import Pipeline

async def main() -> None:
    url = "https://youtube.com/watch?v=VIDEO_ID"  # replace with a real video URL
    pipeline = Pipeline(output_dir="./output")
    result = await pipeline.process(url)  # process() must be awaited in a coroutine
    print(f"✅ {result.metadata.title}")
    print(f" Chunks: {len(result.chunks)}")
    print(f" Time: {result.processing_time:.1f}s")

asyncio.run(main())
```

---

## 📋 MCP Tools

### Pipeline (4 tools)

- `ytpipe_process_video` - Full pipeline
- `ytpipe_download` - Download only
- `ytpipe_transcribe` - Transcribe audio
- `ytpipe_embed` - Generate embeddings

### Query (4 tools)

- `ytpipe_search` - Full-text search
- `ytpipe_find_similar` - Semantic search
- `ytpipe_get_chunk` - Get chunk by ID
- `ytpipe_get_metadata` - Get video info

### Analytics (4 tools)

- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality metrics
- `ytpipe_topic_timeline` - Topic evolution
- `ytpipe_benchmark` - Performance analysis

---

## 🏗️ Architecture

```
MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models
```

**Services**:

- **Extractors** (2): Download, Transcriber
- **Processors** (4): Chunker, Embedder, VectorStore, Docling
- **Intelligence** (4): Search, SEO, Timeline, Analyzer
- **Exporters** (1): Dashboard

**8 Processing Phases**: 1. Download → 2. Transcription → 3. Chunking → 4. Embeddings → 5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| **Processing Speed** | 4-13x real-time |
| **Memory Usage** | <2GB peak |
| **Chunk Quality** | 85%+ high quality |
| **Embedding Dimension** | 384 |

---

## 🔧 Requirements

- Python 3.8+
- FFmpeg (for audio extraction)
- 4GB+ RAM recommended
- GPU optional (CUDA for acceleration)

---

## 📖 Documentation

- [Quick Start](QUICKSTART.md)
- [Setup Guide](SETUP_INSTRUCTIONS.md)
- [Architecture](docs/ARCHITECTURE.md)
- [API Reference](CLAUDE.md)
- [Development Guide](AGENT_KERNEL.md)

---

## 🤝 Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

---

## 📝 License

MIT License - see [LICENSE](LICENSE) for details.

---

## 🙏 Credits

Built with:

- [FastMCP](https://github.com/jlowin/fastmcp) - MCP server framework
- [OpenAI Whisper](https://github.com/openai/whisper) - Speech-to-text
- [sentence-transformers](https://www.sbert.net/) - Text embeddings
- [Model Context Protocol](https://modelcontextprotocol.io/) - AI tool standard

---

## 📧 Contact

**Leonardo Lech**

- Email: leonardo.lech@gmail.com
- GitHub: [@leolech14](https://github.com/leolech14)

---

<div align="center">

**⭐ Star this repo if you find it useful!**

**Transform YouTube → Knowledge Base in seconds**

</div>
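
The README lists chunking with timeline timestamps as a feature but does not show how a chunk keeps its position in the video. The sketch below illustrates one way this could work, assuming transcript segments arrive with start and end times in seconds (as speech-to-text tools commonly emit). `Segment`, `Chunk`, and `chunk_segments` are hypothetical names and not part of YTPipe's API, and the grouping uses a plain character budget as a stand-in for the semantic chunking the README describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical shape)."""
    start: float  # seconds from the beginning of the video
    end: float
    text: str


@dataclass
class Chunk:
    """Several consecutive segments merged into one timed chunk."""
    start: float
    end: float
    text: str


def _merge(group: List[Segment]) -> Chunk:
    # A chunk spans from the first segment's start to the last segment's end.
    return Chunk(group[0].start, group[-1].end, " ".join(s.text for s in group))


def chunk_segments(segments: List[Segment], max_chars: int = 200) -> List[Chunk]:
    """Group consecutive segments until a character budget would be exceeded."""
    chunks: List[Chunk] = []
    current: List[Segment] = []
    size = 0
    for seg in segments:
        cost = len(seg.text) + 1  # +1 accounts for the joining space
        # Close the current chunk if this segment would push it over budget.
        if current and size + cost > max_chars:
            chunks.append(_merge(current))
            current, size = [], 0
        current.append(seg)
        size += cost
    if current:
        chunks.append(_merge(current))
    return chunks


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the video."),
        Segment(2.5, 6.0, "Today we cover vector search."),
        Segment(6.0, 9.0, "First, some background."),
    ]
    for chunk in chunk_segments(demo, max_chars=60):
        print(f"[{chunk.start:.1f}s - {chunk.end:.1f}s] {chunk.text}")
```

Because each chunk carries the start of its first segment and the end of its last, a search hit on a chunk can be mapped back to a time range in the video.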



MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/leolech14/ytpipe'
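
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `server_url` and `fetch_server` are illustrative, and it assumes the endpoint returns a JSON body, whose schema is not described here, so the result is printed as-is.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, repo: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_server(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """GET the server record and decode it, assuming a JSON response."""
    request = urllib.request.Request(server_url(owner, repo), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    info = fetch_server("leolech14", "ytpipe")
    print(json.dumps(info, indent=2))
```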

If you have feedback or need assistance with the MCP directory API, please join our Discord server.
