🎬 YTPipe - AI-Native YouTube Processing Pipeline

Python 3.8+ • License: MIT • MCP Compatible • Code style: black

Transform YouTube videos into LLM-ready knowledge bases with a production-ready MCP backend.

Quick Start • Features • Documentation • MCP Tools

✨ Features

  • 🤖 MCP Integration - 12 AI-callable tools for seamless agent integration

  • 🎯 Smart Chunking - Semantic text chunking with timeline timestamps

  • 🧠 Vector Embeddings - 384-dimensional embeddings for semantic search

  • 🔍 Full-Text Search - Context-aware transcript search

  • 📊 SEO Intelligence - AI-powered title, tag, and description optimization

  • ⏱️ Timeline Analysis - Topic evolution and keyword density tracking

  • 🏗️ Microservices - 11 independent, composable services

  • 🔐 Type-Safe - Pydantic models throughout

  • ⚡ Async-First - Non-blocking I/O operations

  • 🗄️ Multi-Backend - ChromaDB, FAISS, Qdrant support
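
To make the Smart Chunking feature more concrete, here is a minimal sketch of grouping timestamped transcript segments into chunks that keep their start and end times. It packs segments against a plain word budget rather than detecting semantic boundaries, and the `Segment`, `Chunk`, and `chunk_segments` names are illustrative assumptions, not YTPipe's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment (hypothetical shape)."""
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


@dataclass
class Chunk:
    """A run of consecutive segments together with its time span."""
    start: float
    end: float
    text: str


def _merge(segments: List[Segment]) -> Chunk:
    """Join segments into one chunk spanning first start to last end."""
    return Chunk(
        start=segments[0].start,
        end=segments[-1].end,
        text=" ".join(s.text for s in segments),
    )


def chunk_segments(segments: List[Segment], max_words: int = 50) -> List[Chunk]:
    """Greedily pack consecutive segments until the word budget is reached."""
    chunks: List[Chunk] = []
    current: List[Segment] = []
    words = 0
    for seg in segments:
        n = len(seg.text.split())
        # Close the current chunk if this segment would exceed the budget.
        if current and words + n > max_words:
            chunks.append(_merge(current))
            current, words = [], 0
        current.append(seg)
        words += n
    if current:
        chunks.append(_merge(current))
    return chunks
```

Because each chunk carries `start` and `end`, a search hit can be mapped back to a position in the video.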


🚀 Quick Start

    # Install
    git clone https://github.com/leolech14/ytpipe.git
    cd ytpipe
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

    # Process a video
    ytpipe "https://youtube.com/watch?v=dQw4w9WgXcQ"

Result: Metadata + Transcript + Semantic Chunks + Embeddings + Vector Storage


🎯 Usage Examples

MCP Server (AI Agents)

python -m ytpipe.mcp.server

Then from Claude Code:

    "Process this video: https://youtube.com/watch?v=VIDEO_ID"
    "Search video dQw4w9WgXcQ for 'machine learning'"
    "Optimize SEO for video dQw4w9WgXcQ"

CLI (Humans)

    # Basic
    ytpipe "https://youtube.com/watch?v=VIDEO_ID"

    # Advanced
    ytpipe URL --backend faiss --whisper-model large --verbose
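
The CLI example above takes a full URL, while the agent prompts earlier address a video by its bare ID (`dQw4w9WgXcQ`). As a rough sketch of how the two forms relate, the helper below pulls the ID out of common YouTube URL shapes using only the standard library. `extract_video_id` is a hypothetical name, and YTPipe may resolve IDs differently.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or the input itself if it is a bare ID."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/") or None

    # Watch links: https://youtube.com/watch?v=<id>
    if host == "youtube.com" or host.endswith(".youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    # No scheme and no host: assume the caller passed a bare ID.
    return url_or_id if not parsed.scheme else None


print(extract_video_id("https://youtube.com/watch?v=dQw4w9WgXcQ"))
```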

Python API (Developers)

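    import asyncio

    from ytpipe.core.pipeline import Pipeline


    async def main(url: str) -> None:
        pipeline = Pipeline(output_dir="./output")
        # process() is awaited, so it must run inside a coroutine
        result = await pipeline.process(url)
        print(f"✅ {result.metadata.title}")
        print(f"   Chunks: {len(result.chunks)}")
        print(f"   Time: {result.processing_time:.1f}s")


    asyncio.run(main("https://youtube.com/watch?v=VIDEO_ID"))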

📋 MCP Tools

Pipeline (4 tools)

  • ytpipe_process_video - Full pipeline

  • ytpipe_download - Download only

  • ytpipe_transcribe - Transcribe audio

  • ytpipe_embed - Generate embeddings

Query (4 tools)

  • ytpipe_search - Full-text search

  • ytpipe_find_similar - Semantic search

  • ytpipe_get_chunk - Get chunk by ID

  • ytpipe_get_metadata - Get video info
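
For intuition about what a semantic search tool such as `ytpipe_find_similar` does, the sketch below ranks stored chunk embeddings by cosine similarity to a query embedding. Cosine similarity is an assumption here (the README does not state which metric is used), the function names are hypothetical, and toy 3-dimensional vectors stand in for the 384-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, best match first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: chunk IDs mapped to made-up embeddings.
toy_index = {"chunk-0": [1.0, 0.0, 0.0], "chunk-1": [0.0, 1.0, 0.0], "chunk-2": [1.0, 1.0, 0.0]}
print(find_similar([1.0, 0.0, 0.0], toy_index, top_k=2))
```

A vector store backend such as FAISS, ChromaDB, or Qdrant would replace the linear scan with an index, but the ranking idea is the same.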

Analytics (4 tools)

  • ytpipe_seo_optimize - SEO recommendations

  • ytpipe_quality_report - Quality metrics

  • ytpipe_topic_timeline - Topic evolution

  • ytpipe_benchmark - Performance analysis
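
Under the Model Context Protocol, a client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `ytpipe_search`. The argument names (`video_id`, `query`) are guesses for illustration only; the server's own tool schema, which a client can retrieve with `tools/list`, is the authority on what each tool accepts.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: check the tool's input schema before relying on them.
payload = build_tool_call(
    1,
    "ytpipe_search",
    {"video_id": "dQw4w9WgXcQ", "query": "machine learning"},
)
print(payload)
```

In practice an MCP client such as Claude Code constructs and sends these messages itself, after the initialization handshake, so this is mainly useful for debugging or writing a custom client.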


🏗️ Architecture

MCP Server (12 tools) → Pipeline Orchestrator → 11 Services → Pydantic Models

Services:

  • Extractors (2): Download, Transcriber

  • Processors (4): Chunker, Embedder, VectorStore, Docling

  • Intelligence (4): Search, SEO, Timeline, Analyzer

  • Exporters (1): Dashboard

8 Processing Phases:

  1. Download → 2. Transcription → 3. Chunking → 4. Embeddings →

  5. Export → 6. Dashboard → 7. Docling → 8. Vector Storage
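
One way to picture an orchestrator that runs phases like these in sequence is sketched below: each phase is an async function that reads from and writes to a shared context, and the orchestrator records how long each one takes. Only three placeholder phases are shown, and every name here (`run_pipeline`, `download`, `transcribe`, `chunk`) is a stand-in rather than YTPipe's real code.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Phase = Callable[[Context], Awaitable[None]]


async def download(ctx: Context) -> None:
    # Placeholder: a real phase would fetch the audio track.
    ctx["audio"] = f"{ctx['video_id']}.wav"


async def transcribe(ctx: Context) -> None:
    # Placeholder: a real phase would run speech-to-text on ctx["audio"].
    ctx["transcript"] = "placeholder transcript"


async def chunk(ctx: Context) -> None:
    # Placeholder: split the transcript into word-level "chunks".
    ctx["chunks"] = ctx["transcript"].split()


async def run_pipeline(video_id: str, phases: List[Tuple[str, Phase]]) -> Context:
    """Run phases in order, timing each one; a failing phase stops the run."""
    ctx: Context = {"video_id": video_id, "timings": {}}
    for name, phase in phases:
        started = time.perf_counter()
        await phase(ctx)
        ctx["timings"][name] = time.perf_counter() - started
    return ctx


PHASES = [("download", download), ("transcription", transcribe), ("chunking", chunk)]
result = asyncio.run(run_pipeline("VIDEO_ID", PHASES))
print(list(result["timings"]))
```

Keeping phases as independent callables mirrors the composable-services idea: a phase can be swapped or skipped by editing the list rather than the orchestrator.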


📊 Performance

| Metric | Value |
|---|---|
| Processing Speed | 4-13x real-time |
| Memory Usage | <2GB peak |
| Chunk Quality | 85%+ high quality |
| Embedding Dimension | 384 |
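
As a worked example of what the processing speed and embedding dimension figures imply, the snippet below assumes that "Nx real-time" means video duration divided by wall-clock processing time, and that each embedding value is stored as a 4-byte float. Both are assumptions; the README states neither.

```python
from typing import Tuple


def processing_time_range(
    duration_s: float, slow: float = 4.0, fast: float = 13.0
) -> Tuple[float, float]:
    """Return (best_case, worst_case) processing time in seconds."""
    return duration_s / fast, duration_s / slow


def embedding_bytes(num_chunks: int, dim: int = 384, bytes_per_value: int = 4) -> int:
    """Raw storage for num_chunks embeddings, ignoring index overhead."""
    return num_chunks * dim * bytes_per_value


best, worst = processing_time_range(60 * 60)  # a one-hour video
print(f"{best / 60:.1f} to {worst / 60:.1f} minutes")  # 4.6 to 15.0 minutes
print(embedding_bytes(1000))  # 1536000 bytes, about 1.5 MB for 1000 chunks
```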


🔧 Requirements

  • Python 3.8+

  • FFmpeg (for audio extraction)

  • 4GB+ RAM recommended

  • GPU optional (CUDA for acceleration)
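
Before installing, it can help to confirm the two hard requirements locally. The following is a small standalone check using only the standard library; it is not part of YTPipe, and `check_requirements` is a hypothetical helper name.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the Python version and FFmpeg requirements are met."""
    return {
        # sys.version_info compares element-wise against a plain tuple.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```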


📖 Documentation


🤝 Contributing

Contributions welcome! Please read CONTRIBUTING.md first.


📝 License

MIT License - see LICENSE for details.


🙏 Credits

Built with:


📧 Contact

Leonardo Lech


⭐ Star this repo if you find it useful!

Transform YouTube → Knowledge Base in seconds

