[License](LICENSE) · [PyPI](https://pypi.org/project/mcp-youtube-intelligence/)
# MCP YouTube Intelligence
> **An MCP server that intelligently analyzes YouTube videos**: transcript extraction, summarization, entity extraction, comment analysis, topic segmentation, and channel monitoring.
**Core value**: Raw transcripts (5,000–50,000 tokens) are **processed server-side**, delivering only **~300 tokens** to the LLM. No more wasting your context window.
[한국어](README.md)
---
## Why This Server?
Most YouTube MCP servers dump raw transcripts directly into the LLM context, consuming tens of thousands of tokens per video.
| Feature | Other MCP Servers | MCP YouTube Intelligence |
|---------|:---:|:---:|
| Transcript extraction | ✅ | ✅ |
| **Server-side summarization** (token-optimized) | ❌ | ✅ |
| **Channel monitoring** (RSS) | ❌ | ✅ |
| **Comment collection + sentiment analysis** | ❌ | ✅ |
| **Topic segmentation** | ❌ | ✅ |
| **Entity extraction** (KR+EN, 200+ entities) | ❌ | ✅ |
| **Transcript search** (keyword → snippets) | ❌ | ✅ |
| **YouTube search** | ❌ | ✅ |
| **Playlist analysis** | ❌ | ✅ |
| **Batch processing** | ❌ | ✅ |
| SQLite/PostgreSQL storage | ❌ | ✅ |
| Basic summary (preview-level, no API/model needed) | ❌ | ✅ |

**Token savings**: ~300 tokens per video (summary) vs. 5,000–50,000 (raw transcript).
---
## Architecture
```
            ┌─────────────── MCP YouTube Intelligence ───────────────┐
            │                                                        │
YouTube ──▶ │ yt-dlp/API ──▶ Transcript ──▶ Clean ──▶ Summarize ──▶ │ ──▶ MCP Client
            │                    │                                   │     (~300 tokens)
            │                    ├──▶ Entity Extraction              │
            │                    ├──▶ Topic Segmentation             │
            │                    └──▶ Keyword Search                 │
            │                                                        │
            │ Comments ──▶ Filter + Sentiment ──▶ Summary            │
            │ RSS Feed ──▶ Monitor ──▶ New Videos                    │
            │                    │                                   │
            │                    ▼                                   │
            │           SQLite / PostgreSQL                          │
            └────────────────────────────────────────────────────────┘
```
Heavy processing (cleaning, summarization, analysis) happens **on the server**. The MCP client receives only **compact results**.
---
## Use Cases
### Research & Learning
| Scenario | Traditional | With MYI | Improvement |
|----------|------------|---------|-------------|
| Summarize 1-hour lecture | Watch entire video (60 min) | Read summary (2 min) | **97% time saved** |
| Analyze paper review videos | Manual notes + timestamp hunting | Auto topic segmentation | Instant navigation |
| Track tech trends | Watch 10 videos individually | Batch process all at once | **10x throughput** |
**Example**: "Anthropic Agent SDK" tutorial (20 min)
```
Raw transcript: 15,000+ tokens
→ MYI summary: ~300 tokens (98% reduction)
→ Extracted entities: [Anthropic, Agent SDK, Claude, Tool Use, MCP, Python]
→ Topic segments: [Installation, Architecture, Tool Integration, Agent Run, Deployment]
```
### Market & Trend Monitoring
| Scenario | How | Impact |
|----------|-----|--------|
| Track crypto YouTubers | `monitor_channel` detects new videos → auto-summarize | Real-time market insights |
| Competitor product analysis | Entity extraction + comment sentiment from launch videos | Instant market reaction |
| Investment research | Batch summarize analyst videos → save to Notion DB | Systematic knowledge base |
**Example**: Channel monitoring → AI agent automation
```bash
# 1. Register channel
mcp-yt monitor UC_x5XG1OV2P6uZZ5FSM9Ttw --interval 3600
# 2. Auto-summarize new videos (cron/script integration)
mcp-yt transcript <new_video_id> --summarize
# → Send summary to Slack/Discord webhook
```
### AI Agent Integration
| Agent | Integration | Use Case |
|-------|------------|----------|
| Claude Code | Direct MCP connection | "Summarize this video" → done in one prompt |
| OpenClaw | Register as Skill | Build automated research pipelines |
| Cursor | MCP config | Instantly analyze coding tutorials |
| Custom bots | CLI pipeline | `mcp-yt transcript ID \| jq .summary` |
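For custom bots, a thin wrapper around the CLI is often enough. A minimal sketch, assuming the `--json` output of `mcp-yt transcript` contains a `summary` field (field name taken from the JSON examples elsewhere in this README):

```python
import json
import subprocess

def summarize(video_id: str) -> str:
    """Shell out to the mcp-yt CLI and return the video summary."""
    result = subprocess.run(
        ["mcp-yt", "--json", "transcript", video_id],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("summary", "")
```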
**Measured token cost savings**:
```
1 video (20 min) raw transcript to LLM: ~15,000 tokens ($0.015)
After MYI summarization: ~300 tokens ($0.0003)
──────────────────────────────────────────────────────────
Savings: 98%, 50x cost efficiency (100 videos: $1.50 → $0.03)
```
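The arithmetic behind these numbers is straightforward. A quick sanity check, assuming the flat input price of $1 per million tokens implied by the figures above:

```python
PRICE_PER_MILLION = 1.00  # USD per 1M input tokens, implied by $0.015 / 15,000 tokens

def cost(tokens: int, videos: int = 1) -> float:
    """Input-token cost in USD for a batch of videos."""
    return tokens * videos * PRICE_PER_MILLION / 1_000_000

raw, summary = cost(15_000), cost(300)
print(f"per video: ${raw:.4f} -> ${summary:.4f}")
print(f"savings: {1 - summary / raw:.0%}, {raw / summary:.0f}x cheaper")
print(f"100 videos: ${cost(15_000, 100):.2f} -> ${cost(300, 100):.2f}")
```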
### Education & Content Creation
- **Auto lecture notes**: Chapter-by-chapter summaries via topic segmentation
- **Multilingual analysis**: Auto-detect KO/EN/JA subtitles + summarize
- **Comment insights**: Sentiment analysis reveals content improvement points
- **Playlist batch processing**: Summarize entire lecture series at once
---
## Quick Start
### Installation
```bash
# uv (recommended)
uv pip install mcp-youtube-intelligence
# pip
pip install mcp-youtube-intelligence
# Optional dependencies
pip install "mcp-youtube-intelligence[all-llm]" # All LLMs (OpenAI + Anthropic + Google)
pip install "mcp-youtube-intelligence[llm]" # OpenAI only
pip install "mcp-youtube-intelligence[anthropic-llm]" # Anthropic only
pip install "mcp-youtube-intelligence[google-llm]" # Google only
pip install "mcp-youtube-intelligence[postgres]" # PostgreSQL backend
pip install "mcp-youtube-intelligence[dev]" # Development (pytest, etc.)
```
> **Prerequisite**: `yt-dlp` must be installed and in your PATH.
> ```bash
> pip install yt-dlp
> ```
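When scripting around MYI, it can help to fail fast if the prerequisite is missing. A small preflight check using only the standard library:

```python
import shutil

# Look up yt-dlp on PATH, as the server itself would.
yt_dlp_path = shutil.which("yt-dlp")
if yt_dlp_path:
    print(f"yt-dlp found: {yt_dlp_path}")
else:
    print("yt-dlp not found; run `pip install yt-dlp` or point MYI_YT_DLP at the binary")
```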
### CLI Usage
After installation, the `mcp-yt` command is available.
#### Transcript Extraction
```bash
# Summary (default, ~300 tokens)
mcp-yt transcript https://youtube.com/watch?v=LV6Juz0xcrY
# Full transcript (saved to file)
mcp-yt transcript https://youtube.com/watch?v=LV6Juz0xcrY --mode full
# Split into chunks
mcp-yt transcript https://youtube.com/watch?v=LV6Juz0xcrY --mode chunks
# JSON output
mcp-yt --json transcript https://youtube.com/watch?v=LV6Juz0xcrY
# Save to file
mcp-yt transcript https://youtube.com/watch?v=LV6Juz0xcrY -o summary.txt
```
#### YouTube Search
```bash
mcp-yt search "transformer explained"
mcp-yt search "python tutorial" --max 5 --order date
mcp-yt search "AI news" --channel UCxxxx
```
#### Video Metadata + Summary
```bash
mcp-yt video https://youtube.com/watch?v=LV6Juz0xcrY
```
Sample output:
```
video_id: LV6Juz0xcrY
title: Video Title
channel_name: Channel Name
duration_seconds: 612
view_count: 1500000
summary: This video covers three main topics...
```
#### Comment Collection
```bash
# Top 10 comments
mcp-yt comments https://youtube.com/watch?v=LV6Juz0xcrY
# Newest 20 comments
mcp-yt comments https://youtube.com/watch?v=LV6Juz0xcrY --max 20 --sort newest
# Positive comments only
mcp-yt comments https://youtube.com/watch?v=LV6Juz0xcrY --sentiment positive
# Negative comments only
mcp-yt comments https://youtube.com/watch?v=LV6Juz0xcrY --sentiment negative
```
#### Channel Monitoring
```bash
# Subscribe
mcp-yt monitor subscribe @3blue1brown
# Check for new videos
mcp-yt monitor check --channel UCYO_jab_esuFRV4b17AJtAw
# List subscriptions
mcp-yt monitor list
```
#### Entity Extraction
```bash
mcp-yt entities https://youtube.com/watch?v=LV6Juz0xcrY
```
Sample output:
```
entity_count: 5
entities: (5 items)
[1] type: company, name: NVIDIA, keyword: NVIDIA, count: 12
[2] type: technology, name: AI, keyword: AI, count: 8
[3] type: index, name: NASDAQ, keyword: NASDAQ, count: 5
```
#### Topic Segmentation
```bash
mcp-yt segments https://youtube.com/watch?v=LV6Juz0xcrY
```
#### Playlist
```bash
mcp-yt playlist https://youtube.com/playlist?list=PLrAXtmErZgOe...
mcp-yt playlist PLrAXtmErZgOe... --max 10
```
#### Batch Processing
```bash
mcp-yt batch LV6Juz0xcrY abc123def45 xyz789ghi01
mcp-yt batch LV6Juz0xcrY abc123def45 --mode full
```
#### Search Stored Transcripts
```bash
mcp-yt search-transcripts "transformer architecture"
```
> Add `--json` to any command for JSON output.
---
## MCP Server Connection Guide
### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"youtube": {
"command": "uvx",
"args": ["mcp-youtube-intelligence"],
"env": {
"OPENAI_API_KEY": "sk-...",
"ANTHROPIC_API_KEY": "sk-ant-...",
"GOOGLE_API_KEY": "AIza...",
"MYI_LLM_PROVIDER": "auto"
}
}
}
}
```
> Only set the API key(s) for the provider(s) you want to use. `auto` mode detects automatically.
### Claude Code
```bash
claude mcp add youtube -- uvx mcp-youtube-intelligence
```
### OpenCode
Add to your `mcp.json` or project config:
```json
{
"mcpServers": {
"youtube": {
"command": "uvx",
"args": ["mcp-youtube-intelligence"],
"env": {
"OPENAI_API_KEY": "sk-...",
"ANTHROPIC_API_KEY": "sk-ant-...",
"GOOGLE_API_KEY": "AIza..."
}
}
}
}
```
### Cursor
Create `.cursor/mcp.json`:
```json
{
"mcpServers": {
"youtube": {
"command": "uvx",
"args": ["mcp-youtube-intelligence"],
"env": {
"OPENAI_API_KEY": "sk-...",
"ANTHROPIC_API_KEY": "sk-ant-...",
"GOOGLE_API_KEY": "AIza..."
}
}
}
}
```
### Claude Code Skills
Since a CLI is provided, you can register it as a skill in Claude Code:
```
skills/
youtube/
SKILL.md
```
Sample `SKILL.md`:
```markdown
# YouTube Analysis Skill
Use the `mcp-yt` CLI for YouTube video analysis.
## Available Commands
- `mcp-yt transcript <URL>` → Extract/summarize transcript
- `mcp-yt video <URL>` → Video metadata
- `mcp-yt comments <URL>` → Comment analysis
- `mcp-yt entities <URL>` → Entity extraction
- `mcp-yt segments <URL>` → Topic segmentation
- `mcp-yt search "query"` → YouTube search
- `mcp-yt search-transcripts "query"` → Search stored transcripts
- `mcp-yt monitor subscribe <URL>` → Channel monitoring
- `mcp-yt playlist <URL>` → Playlist info
- `mcp-yt batch <id1> <id2>` → Batch processing
## Rules
- Always use `--json` for structured output
- Both video URLs and 11-character IDs are accepted
- Transcript summaries are ~300 tokens by default
```
---
## Key Features at a Glance
> **Vibe coders**: Just connect the MCP server and say "summarize this video" and you're done.
> **Developers**: Use the CLI (`mcp-yt`) to integrate into scripts and pipelines.
### 1. Transcript Extraction + Token-Optimized Summarization
Fetches YouTube subtitles and **summarizes server-side**. Instead of sending 5,000–50,000 raw tokens to your LLM, MYI delivers **~300 tokens**.
- **Multilingual auto-detection** (Korean, English, Japanese, etc.)
- Prefers manual captions, falls back to auto-generated
- **Basic summarization works without any API key** (LLM summarization optional)
- ⚠️ Extractive summary is sentence-extraction level; for high-quality summaries, LLM integration is recommended.
### 2. Entity Extraction
Automatically identifies **people, companies, technologies, and products** from transcripts. 200+ built-in entities.
- Domains: AI/ML, crypto, programming, global companies, economics, etc.
- Korean + English simultaneous support
- Custom entities can be added
### 3. Topic Segmentation
Splits long videos into **topic-based segments**. Instantly see "what's discussed where."
- Keyword-shift-based boundary detection
- Auto-labels each segment with a representative topic
- Timestamp integration for jumping to specific sections
### 4. Comment Collection + Sentiment Analysis
Collects video **comments** and analyzes **positive/negative/neutral** sentiment.
- Sort: by popularity / newest
- Noise filtering: auto-removes spam and bot comments
- Sentiment filter: positive only / negative only / all
### 5. Channel Monitoring
**Subscribe to YouTube channels via RSS** to detect new uploads automatically.
- Periodic checks (cron/script integration)
- Build auto-summarization pipelines for new videos
- yt-dlp fallback for reliability
### 6. YouTube Search + Transcript Search
- **YouTube Search**: Find videos by keyword (Data API v3 + yt-dlp fallback)
- **Transcript Search**: Search saved transcripts → returns relevant snippets
- Full playlist analysis supported
### 7. Batch Processing
Process **multiple videos at once**. Perfect for seminar series, lecture playlists.
- Async parallel processing (semaphore-limited for stability)
- Accepts video ID lists or playlist URLs
### 8. Data Storage
Analysis results are **automatically saved to a local DB**.
- SQLite (default, zero config) / PostgreSQL (optional)
- Cached results returned instantly on duplicate requests
- Search index for fast keyword lookups
---
## MCP Tools Reference (9 tools)
### 1. `get_video`
Get video metadata + summary in one call. Results are cached.
| Parameter | Type | Required | Description |
|-----------|------|:--------:|-------------|
| `video_id` | string | ✅ | YouTube video ID |
```json
// Request
{"tool": "get_video", "arguments": {"video_id": "LV6Juz0xcrY"}}
// Response (~300 tokens)
{
"video_id": "LV6Juz0xcrY",
"title": "Video Title",
"channel_name": "Channel",
"duration_seconds": 612,
"view_count": 1500000,
"like_count": 45000,
"summary": "This video covers...",
"transcript_length": 15420,
"status": "done"
}
```
**Estimated tokens**: ~300
---
### 2. `get_transcript`
Retrieve transcript in 3 modes.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `video_id` | string | ✅ | – | YouTube video ID |
| `mode` | string | ❌ | `"summary"` | `summary` · `full` · `chunks` |
**Modes**:
- **`summary`** – Returns a concise summary (~300 tokens, **recommended**)
- **`full`** – Saves transcript to file, returns path (~50 tokens)
- **`chunks`** – Splits into ~2000-char chunks for sequential processing
```json
// summary mode
{"video_id": "abc123", "mode": "summary", "summary": "...", "char_count": 15420}
// full mode
{"video_id": "abc123", "mode": "full", "file_path": "~/.mcp-youtube-intelligence/transcripts/abc123.txt", "char_count": 15420}
// chunks mode
{"video_id": "abc123", "mode": "chunks", "chunk_count": 8, "chunks": [{"index": 0, "text": "...", "char_count": 2000}]}
```
**Estimated tokens**: summary ~300 | full ~50 | chunks ~N×500
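The `chunks` shape is easy to reproduce client-side. A sketch of the idea, using plain fixed-size splitting (the server's actual chunk-boundary logic may differ) and the field names from the JSON example above:

```python
def chunk_transcript(text: str, size: int = 2000) -> list[dict]:
    """Split a transcript into ~size-character chunks, mirroring chunks mode."""
    return [
        {"index": i, "text": text[start:start + size],
         "char_count": len(text[start:start + size])}
        for i, start in enumerate(range(0, len(text), size))
    ]

chunks = chunk_transcript("x" * 5000)
print(len(chunks), chunks[-1]["char_count"])  # 3 chunks, last one 1000 chars
```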
---
### 3. `get_comments`
Fetch comments with automatic spam/noise filtering and sentiment analysis.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `video_id` | string | ✅ | – | YouTube video ID |
| `top_n` | int | ❌ | `10` | Number of comments to return |
| `summarize` | bool | ❌ | `false` | Return summarized view |
```json
{
"video_id": "abc123",
"count": 10,
"comments": [
{"author": "User1", "text": "Great explanation!", "likes": 245, "sentiment": "positive"},
{"author": "User2", "text": "Very helpful", "likes": 132, "sentiment": "positive"}
]
}
```
**Estimated tokens**: ~200–500
---
### 4. `monitor_channel`
RSS-based channel monitoring. Subscribe and detect new uploads.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `channel_ref` | string | ✅ | – | Channel URL, @handle, or channel ID |
| `action` | string | ❌ | `"check"` | `add` · `check` · `list` · `remove` |
```json
// Subscribe
{"tool": "monitor_channel", "arguments": {"channel_ref": "@3blue1brown", "action": "add"}}
// Check for new videos
{"tool": "monitor_channel", "arguments": {"channel_ref": "UCYO_jab...", "action": "check"}}
// → {"channel_id": "...", "new_videos": [{"video_id": "abc123", "title": "New Video", "published": "..."}]}
```
**Estimated tokens**: ~100–300
---
### 5. `search_transcripts`
Search stored transcripts by keyword. Returns contextual snippets.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `query` | string | ✅ | – | Search keyword or phrase |
| `limit` | int | ❌ | `10` | Maximum results |
```json
{
"query": "transformer",
"count": 3,
"results": [
{"video_id": "abc123", "title": "Attention Is All You Need", "snippet": "...transformer architecture uses..."}
]
}
```
**Estimated tokens**: ~100–400
---
### 6. `extract_entities`
Extract structured entities from a video transcript. Covers companies, indices, crypto, technologies, people, and more: 200+ entities with Korean and English support.
| Parameter | Type | Required | Description |
|-----------|------|:--------:|-------------|
| `video_id` | string | ✅ | YouTube video ID |
```json
{
"video_id": "abc123",
"entity_count": 5,
"entities": [
{"type": "company", "name": "NVIDIA", "keyword": "NVIDIA", "count": 12},
{"type": "technology", "name": "GPT-4", "keyword": "GPT-4", "count": 8},
{"type": "person", "name": "Sam Altman", "keyword": "Sam Altman", "count": 3}
]
}
```
**Estimated tokens**: ~150–300
---
### 7. `segment_topics`
Segment a video transcript into topical sections based on transition markers.
| Parameter | Type | Required | Description |
|-----------|------|:--------:|-------------|
| `video_id` | string | ✅ | YouTube video ID |
```json
{
"video_id": "abc123",
"segment_count": 4,
"segments": [
{"segment": 0, "char_count": 3200, "preview": "First 200 chars preview..."},
{"segment": 1, "char_count": 2800, "preview": "Next segment preview..."}
]
}
```
**Estimated tokens**: ~100–250
---
### 8. `search_youtube`
Search YouTube videos by keyword.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `query` | string | ✅ | – | Search keyword or phrase |
| `max_results` | int | ❌ | `10` | Max results (1–50) |
| `channel_id` | string | ❌ | – | Limit to specific channel |
| `published_after` | string | ❌ | – | Filter by publish date (ISO 8601) |
| `order` | string | ❌ | `"relevance"` | `relevance` · `date` · `rating` · `viewCount` |
**Estimated tokens**: ~200
---
### 9. `get_playlist`
Get playlist metadata and video list.
| Parameter | Type | Required | Default | Description |
|-----------|------|:--------:|---------|-------------|
| `playlist_id` | string | ✅ | – | YouTube playlist ID |
| `max_videos` | int | ❌ | `50` | Max videos to retrieve |
**Estimated tokens**: ~200–500
---
## Configuration
All settings are managed via environment variables (`MYI_` prefix):
| Variable | Default | Description |
|----------|---------|-------------|
| `MYI_DATA_DIR` | `~/.mcp-youtube-intelligence` | Data directory (DB, transcript files) |
| `MYI_STORAGE` | `sqlite` | Storage backend: `sqlite` · `postgres` |
| `MYI_SQLITE_PATH` | `{DATA_DIR}/data.db` | SQLite DB path |
| `MYI_POSTGRES_DSN` | – | PostgreSQL connection string |
| `MYI_TRANSCRIPT_DIR` | `{DATA_DIR}/transcripts` | Transcript file directory |
| `MYI_YT_DLP` | `yt-dlp` | yt-dlp binary path |
| `MYI_YOUTUBE_API_KEY` | – | YouTube Data API key |
| `MYI_MAX_COMMENTS` | `20` | Max comments to fetch |
| `MYI_MAX_TRANSCRIPT_CHARS` | `500000` | Max transcript length |
| `MYI_LLM_PROVIDER` | `auto` | LLM provider: `auto` · `openai` · `anthropic` · `google` · `ollama` · `vllm` · `lmstudio` |
| `MYI_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `MYI_OLLAMA_MODEL` | `llama3.1:8b` | Ollama model name |
| `MYI_VLLM_BASE_URL` | `http://localhost:8000` | vLLM server URL |
| `MYI_VLLM_MODEL` | – | vLLM model name |
| `MYI_LMSTUDIO_BASE_URL` | `http://localhost:1234` | LM Studio server URL |
| `MYI_LMSTUDIO_MODEL` | – | LM Studio model name |
| `OPENAI_API_KEY` | – | OpenAI API key |
| `OPENAI_BASE_URL` | – | OpenAI-compatible endpoint |
| `MYI_OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model name |
| `ANTHROPIC_API_KEY` | – | Anthropic API key |
| `MYI_ANTHROPIC_MODEL` | `claude-sonnet-4-20250514` | Anthropic model name |
| `GOOGLE_API_KEY` | – | Google API key |
| `MYI_GOOGLE_MODEL` | `gemini-2.0-flash` | Google model name |
### LLM Integration
By default, **basic summarization** (preview-level, no API/model needed) is used. Connect an LLM for higher-quality summaries.
> ⚠️ Extractive summary is sentence-extraction level; for high-quality summaries, LLM integration is recommended.
Six providers (3 cloud + 3 local) are supported, selected via `MYI_LLM_PROVIDER`. The cloud providers:
| Provider | API Key Variable | Model Variable | Default Model |
|----------|-----------------|---------------|---------------|
| OpenAI | `OPENAI_API_KEY` | `MYI_OPENAI_MODEL` | `gpt-4o-mini` |
| Anthropic | `ANTHROPIC_API_KEY` | `MYI_ANTHROPIC_MODEL` | `claude-sonnet-4-20250514` |
| Google | `GOOGLE_API_KEY` | `MYI_GOOGLE_MODEL` | `gemini-2.0-flash` |
`MYI_LLM_PROVIDER` defaults to `auto`, which auto-detects based on available API keys.
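The `auto` behavior can be pictured as a first-match scan over the key variables. A minimal sketch (the precedence order inside MYI is an assumption here, not documented behavior):

```python
import os

# Hypothetical scan order; MYI's real precedence may differ.
PROVIDERS = [
    ("OPENAI_API_KEY", "openai"),
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("GOOGLE_API_KEY", "google"),
]

def detect_provider(env=None) -> str:
    """Return the first provider whose API key is set, else extractive mode."""
    env = os.environ if env is None else env
    for var, name in PROVIDERS:
        if env.get(var):
            return name
    return "extractive"

print(detect_provider({"ANTHROPIC_API_KEY": "sk-ant-test"}))  # anthropic
print(detect_provider({}))                                    # extractive
```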
**OpenAI**
```bash
pip install "mcp-youtube-intelligence[llm]"
export OPENAI_API_KEY=sk-...
export MYI_OPENAI_MODEL=gpt-4o-mini # optional
```
**Anthropic**
```bash
pip install "mcp-youtube-intelligence[anthropic-llm]"
export ANTHROPIC_API_KEY=sk-ant-...
export MYI_ANTHROPIC_MODEL=claude-sonnet-4-20250514 # optional
```
**Google**
```bash
pip install "mcp-youtube-intelligence[google-llm]"
export GOOGLE_API_KEY=AIza...
export MYI_GOOGLE_MODEL=gemini-2.0-flash # optional
```
**Explicit provider selection** (when multiple API keys are set):
```bash
export MYI_LLM_PROVIDER=anthropic # openai / anthropic / google / auto
```
### Local LLM (Free, Offline-capable)
Get LLM-quality summaries without API costs.
#### Ollama (Recommended)
```bash
# 1. Install Ollama (https://ollama.ai)
# 2. Download a recommended model
ollama pull llama3.1:8b # English (4.7GB, general purpose)
ollama pull gemma2:9b # Multilingual (5.4GB, good Korean)
ollama pull qwen2.5:7b # Multilingual (4.4GB, strong CJK)
ollama pull aya-expanse:8b # Multilingual specialist (4.8GB, 23 languages)
# 3. Set environment variables
export MYI_LLM_PROVIDER=ollama
export MYI_OLLAMA_MODEL=qwen2.5:7b
```
#### vLLM
```bash
export MYI_LLM_PROVIDER=vllm
export MYI_VLLM_BASE_URL=http://localhost:8000
export MYI_VLLM_MODEL=Qwen/Qwen2.5-7B-Instruct
```
#### LM Studio
```bash
export MYI_LLM_PROVIDER=lmstudio
export MYI_LMSTUDIO_BASE_URL=http://localhost:1234
```
### Recommended Models Guide
| Purpose | Model | Size | Korean | English | Quality |
|---------|-------|------|:------:|:-------:|:-------:|
| **Multilingual (Recommended)** | `qwen2.5:7b` | 4.4GB | ✅ Good | ✅ Good | ⭐⭐⭐ |
| **Multilingual specialist** | `aya-expanse:8b` | 4.8GB | ✅ Good | ✅ Good | ⭐⭐⭐ |
| **Best English** | `llama3.1:8b` | 4.7GB | ⚠️ Fair | ✅ Best | ⭐⭐⭐ |
| **Lightweight (low-spec PC)** | `qwen2.5:3b` | 1.9GB | ✅ OK | ✅ OK | ⭐⭐⭐ |
| **Ultra-light (Raspberry Pi)** | `qwen2.5:1.5b` | 0.9GB | ⚠️ Fair | ✅ OK | ⭐⭐ |
| **Korean specialist** | `gemma2:9b` | 5.4GB | ✅ Good | ✅ Good | ⭐⭐⭐ |
| **Cloud best** | GPT-4o / Claude Sonnet | API | ✅ Best | ✅ Best | ⭐⭐⭐⭐ |
**Legacy approach** (Local LLM via OpenAI-compatible API):
```bash
export OPENAI_API_KEY=ollama
export OPENAI_BASE_URL=http://localhost:11434/v1
export MYI_OPENAI_MODEL=llama3.2
```
**Token cost comparison**:
| Mode | Client Tokens | Server Cost |
|------|:-:|:-:|
| No API key (extractive) | ~300 | Free |
| LLM (gpt-4o-mini) | ~500 | ~$0.001/video |
| LLM (claude-sonnet-4-20250514) | ~500 | ~$0.003/video |
| LLM (gemini-2.0-flash) | ~500 | ~$0.0005/video |
| Raw transcript (other MCP servers) | 5,000–50,000 | Free but destroys context |
---
## Extractive Summarization Pipeline
Effective summarization without an LLM. Here's how it works:
```
Raw Transcript
      │
      ▼
① Sentence splitting (Korean endings + English punctuation)
      │
      ▼
② Even chunking (split into N equal chunks)
   └─ Ensures coverage across beginning/middle/end
      │
      ▼
③ Sentence scoring
   • Length weight (longer = more informative)
   • Position weight (earlier sentences slightly preferred)
   • Keyword bonus ("in conclusion", "key point", etc. → ×1.6)
   • Number bonus (statistics/data → ×1.4)
      │
      ▼
④ Adaptive length (proportional to source, 500–2000 chars)
      │
      ▼
⑤ Reassemble in original order → Summary complete
```
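A compressed sketch of steps ③–⑤. The weights mirror the diagram, but the exact formulas are assumptions, and sentence splitting here is English-only for brevity:

```python
import re

KEYWORDS = ("in conclusion", "key point", "to summarize")

def extractive_summary(text: str, top_k: int = 2) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for i, s in enumerate(sentences):
        score = float(len(s))                                      # length weight
        score *= 1.0 + 0.1 * (1 - i / max(len(sentences) - 1, 1))  # position weight
        if any(k in s.lower() for k in KEYWORDS):
            score *= 1.6                                           # keyword bonus
        if re.search(r"\d", s):
            score *= 1.4                                           # number bonus
        scored.append((score, i, s))
    # keep the top-k sentences, then restore original order (step 5)
    top = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)

text = ("Deep learning changed everything. Models grew fast. "
        "In conclusion, 95% of gains came from scale.")
print(extractive_summary(text))
```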
---
## Troubleshooting
### `yt-dlp` not found
```bash
pip install yt-dlp
# Or specify path:
export MYI_YT_DLP=/usr/local/bin/yt-dlp
```
### No transcript available
Some videos lack auto-generated or manual captions. Use `get_video` instead β it still returns metadata without a transcript.
### Slow comment loading
yt-dlp comment extraction can take 30β60s. Limited to 20 comments by default.
### SQLite database locked
Ensure only one server instance is running.
### OpenAI API errors
If LLM summarization fails, it automatically falls back to extractive summarization. Check your `OPENAI_API_KEY` and `MYI_OPENAI_MODEL`.
---
## Contributing
### Development Setup
```bash
git clone https://github.com/JangHyuckYun/mcp-youtube-intelligence.git
cd mcp-youtube-intelligence
pip install -e ".[dev]"
```
### Tests
```bash
pytest tests/ -v
```
### Ideas for Contribution
- Additional entity dictionaries (Japanese, Chinese, etc.)
- Whisper integration for videos without captions
- Advanced comment sentiment analysis
- Export formats (CSV, Markdown)
---
## Requirements
- Python ≥ 3.10
- `yt-dlp` installed and in PATH
- Internet connection
## License
Apache 2.0; see [LICENSE](LICENSE) for details.
---
## Changelog
| Date | Version | Changes |
|------|---------|---------|
| 2025-02-18 | v0.1.0 | Initial release β 9 MCP tools, CLI (`mcp-yt`), SQLite storage |
| 2025-02-18 | v0.1.1 | Multi-LLM support (OpenAI/Anthropic/Google), license → Apache 2.0 |
| 2025-02-18 | v0.1.2 | yt-dlp transcript fallback, multilingual fallback, extractive summary improvements |