# YTPipe - Quick Start Guide
## Get Started in 3 Steps
### 1. Setup Environment (30 seconds)
```bash
cd ~/PROJECTS_all/PROJECT_ytpipe
source venv/bin/activate # Or create new: python3 -m venv venv
pip install -e .
```
### 2. Choose Your Interface
#### **Option A: MCP Server** (for AI agents like Claude Code)
```bash
python -m ytpipe.mcp.server
```
Then configure Claude Code:
```json
{
  "mcpServers": {
    "ytpipe": {
      "command": "python",
      "args": ["-m", "ytpipe.mcp.server"],
      "cwd": "/Users/lech/PROJECTS_all/PROJECT_ytpipe"
    }
  }
}
```
#### **Option B: CLI** (backward compatible)
```bash
ytpipe "https://youtube.com/watch?v=VIDEO_ID" --verbose
```
#### **Option C: Python API**
```python
import asyncio

from ytpipe.core.pipeline import Pipeline

async def main():
    pipeline = Pipeline(output_dir="./output")
    result = await pipeline.process("https://youtube.com/watch?v=VIDEO_ID")
    print(f"Title: {result.metadata.title}")
    print(f"Chunks: {len(result.chunks)}")
    print(f"Time: {result.processing_time:.1f}s")

asyncio.run(main())
```
### 3. Explore MCP Tools (12 available)
```python
# Process video
ytpipe_process_video(url="https://youtube.com/...")
# Search transcript
ytpipe_search(video_id="VIDEO_ID", query="OpenClaw")
# SEO optimization
ytpipe_seo_optimize(video_id="VIDEO_ID")
# Timeline analysis
ytpipe_topic_timeline(video_id="VIDEO_ID", keywords=["AI", "Python"])
# Quality report
ytpipe_quality_report(video_id="VIDEO_ID")
```
---
## MCP Tools Reference
### Pipeline Tools (4)
- `ytpipe_process_video` - Full 8-phase pipeline
- `ytpipe_download` - Download + metadata only
- `ytpipe_transcribe` - Whisper transcription
- `ytpipe_embed` - Generate embeddings
### Query Tools (4)
- `ytpipe_search` - Full-text search with context
- `ytpipe_find_similar` - Vector similarity search
- `ytpipe_get_chunk` - Retrieve specific chunk
- `ytpipe_get_metadata` - Get video metadata
### Analytics Tools (4)
- `ytpipe_seo_optimize` - SEO recommendations
- `ytpipe_quality_report` - Quality metrics
- `ytpipe_topic_timeline` - Topic evolution
- `ytpipe_benchmark` - Performance analysis
---
## Common Use Cases
### Use Case 1: Process Video for Analysis
```bash
ytpipe "https://youtube.com/watch?v=VIDEO_ID"
```
**Outputs**:
- `KNOWLEDGE_YOUTUBE/{VIDEO_ID}/exports/metadata.json`
- `KNOWLEDGE_YOUTUBE/{VIDEO_ID}/exports/chunks.jsonl`
- `KNOWLEDGE_YOUTUBE/{VIDEO_ID}/exports/transcript.txt`
- Vector database (ChromaDB by default)
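The JSON and JSONL exports can be loaded with the standard library alone. A minimal sketch (the field names inside `chunks.jsonl` are assumptions; check your own export for the actual schema):

```python
import json
from pathlib import Path

def load_exports(export_dir: str):
    """Load metadata.json and chunks.jsonl from a video's exports/ directory."""
    export_path = Path(export_dir)
    metadata = json.loads((export_path / "metadata.json").read_text())
    # chunks.jsonl holds one JSON object per line
    chunks = [
        json.loads(line)
        for line in (export_path / "chunks.jsonl").read_text().splitlines()
        if line.strip()
    ]
    return metadata, chunks
```

For example, `load_exports("KNOWLEDGE_YOUTUBE/VIDEO_ID/exports")` returns the metadata dict and the list of chunk records.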
### Use Case 2: Search Processed Content
```python
# Via MCP
results = await ytpipe_search(
    video_id="VIDEO_ID",
    query="machine learning",
    max_results=10
)
# Returns: List of matches with timestamps and context
```
### Use Case 3: SEO Optimization
```python
# Via MCP
seo = await ytpipe_seo_optimize(video_id="VIDEO_ID")
print(f"Current Title: {seo['current_title']}")
print(f"SEO Score: {seo['seo_score']}/100")
print(f"Best Title: {seo['suggested_titles'][0]['title']}")
print(f"Tags: {seo['recommended_tags']}")
```
### Use Case 4: Topic Timeline Analysis
```python
# Via MCP
timeline = await ytpipe_topic_timeline(
    video_id="VIDEO_ID",
    keywords=["Python", "AI", "tutorial"]
)
# Returns: Keyword density over time, topic shifts
```
---
## Configuration
### Vector Backends
- **chromadb** (default) - Local, persistent
- **faiss** - Local, high-performance
- **qdrant** - Cloud/local, production
```bash
ytpipe URL --backend chromadb # or faiss, qdrant
```
### Whisper Models
- **tiny** - Fastest, less accurate
- **base** (default) - Balanced
- **small** - Better accuracy
- **medium** - High accuracy
- **large** - Best accuracy, slowest
```bash
ytpipe URL --whisper-model large
```
### Chunking Options
```bash
ytpipe URL --chunk-size 1000 --chunk-overlap 100
```
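To see how `--chunk-size` and `--chunk-overlap` interact, here is a generic sliding-window chunker (an illustration of the concept, not ytpipe's internal implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100):
    """Split text into chunks of chunk_size characters, where each chunk
    shares its first `overlap` characters with the end of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Larger overlap means more redundancy between chunks but better preservation of context that straddles a chunk boundary.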
---
## Output Structure
```
KNOWLEDGE_YOUTUBE/
└── {VIDEO_ID}/
    ├── audio.mp3
    └── exports/
        ├── metadata.json          # Video metadata
        ├── chunks.jsonl           # All chunks with embeddings
        ├── transcript.txt         # Full transcript
        ├── granite_docling_*.json # Docling processing
        └── dashboard.html         # Interactive dashboard
```
---
## Tips
### Performance
- Use `--whisper-model tiny` for faster processing
- Use `--backend faiss` for faster vector search
- Process shorter videos first to test
### Quality
- Use `--whisper-model large` for better transcription
- Higher chunk overlap = better context preservation
- Check quality_score in chunks.jsonl
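Assuming each record in `chunks.jsonl` carries a `quality_score` field as the tip above suggests (the exact key name may differ in your export), low-quality chunks can be filtered out before analysis:

```python
import json

def filter_by_quality(jsonl_lines, min_score=0.5):
    """Keep only chunks whose quality_score meets the threshold."""
    chunks = (json.loads(line) for line in jsonl_lines if line.strip())
    return [c for c in chunks if c.get("quality_score", 0) >= min_score]
```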
### Debugging
- Use `--verbose` flag for detailed output
- Check logs in output directory
- Validate with: `python -c "from ytpipe.mcp.server import mcp; print('OK')"`
---
## Next Steps
1. **Test with real video** - Process a short YouTube video
2. **Explore outputs** - Check KNOWLEDGE_YOUTUBE/{VIDEO_ID}/
3. **Try MCP tools** - Use from Claude Code
4. **Build workflows** - Chain multiple tools together
---
**PROJECT_ytpipe is ready for production use!**