YouTube Transcript Fetcher (YTT)

by AndrewCTF

YouTube Transcript Fetcher

Fetch transcripts from any YouTube video using Whisper AI transcription. Search YouTube and get transcripts for the top results. No YouTube API key is required.

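Features

  • Whisper-powered - state-of-the-art AI transcription with 99%+ accuracy

  • YouTube search - search YouTube and get transcripts for the top results

  • No API key required - works without YouTube Data API credentials

  • Multiple formats - text, JSON, SRT, and VTT output

  • Caching - SQLite-based cache that avoids repeated transcription

  • No rate limits - Whisper runs locally, so no external API limits apply

  • CLI and library - usable as a command-line tool or as a Python module

  • MCP server - integrates with AI tools via the Model Context Protocol (MCP)

Installation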

# Clone the repository
git clone https://github.com/andrewctf/ytt.git
cd ytt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Optional: GPU support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Note: For detailed GPU/CUDA setup, see QUICKSTART.md

Additional setup for Whisper

Whisper requires ffmpeg for audio extraction:

Windows (using winget):

winget install ffmpeg

macOS:

brew install ffmpeg

Linux:

sudo apt install ffmpeg
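
Before the first transcription it can help to confirm that ffmpeg is actually reachable on PATH. The following is a minimal standard-library sketch; the helper name ffmpeg_available is hypothetical and is not part of YTT.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it with the command for your platform")
```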

Quick Start

For detailed installation and setup instructions, see QUICKSTART.md

CLI

# Get transcript (Whisper is used by default)
python cli.py transcript VIDEO_ID

# Or with a full YouTube URL
python cli.py transcript "https://www.youtube.com/watch?v=a1JTPFfshI0"

# Different output formats
python cli.py transcript VIDEO_ID --format json
python cli.py transcript VIDEO_ID --format srt
python cli.py transcript VIDEO_ID --format vtt

# Save to file
python cli.py transcript VIDEO_ID --output transcript.txt

# Batch processing
python cli.py transcript VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

# Search YouTube for videos and get transcripts
python cli.py search "Python tutorial" --limit 5 --with-transcripts

# Search only (no transcripts)
python cli.py search "Python tutorial" --limit 10

# JSON output for search
python cli.py search "Python tutorial" --format json

# Cache management
python cli.py cache-stats
python cli.py cache-stats --clean  # Remove expired entries
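
The transcript command accepts either a bare video ID or a full URL. How YTT normalizes its input is not shown here, so the following is only a sketch of one way to do it with the standard library; extract_video_id is a hypothetical helper, and the set of handled URL shapes is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(value: str) -> str:
    """Return the video ID from a watch URL, a youtu.be link, or a bare ID."""
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        return parsed.path.lstrip("/").split("/")[0]
    if host == "youtube.com" or host.endswith(".youtube.com"):
        query_id = parse_qs(parsed.query).get("v")
        if query_id:
            return query_id[0]
        # Paths such as /shorts/<id> or /embed/<id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            return parts[1]
        raise ValueError(f"no video ID found in URL: {value}")
    # No recognizable host: treat the input as a bare video ID.
    return value


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=a1JTPFfshI0"))
```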

Python library

import asyncio

from src.service import get_transcript
from src.search_service import search, search_and_get_transcripts


async def main():
    # Basic usage
    result = await get_transcript("VIDEO_ID")
    print(result.content)

    # With options
    result = await get_transcript(
        "VIDEO_ID",
        language="en",
        output_format="json",
        use_cache=True,
    )

    # Access metadata
    print(f"Source: {result.source}")      # 'whisper' or 'innertube'
    print(f"Language: {result.language}")   # Detected language
    print(f"Video ID: {result.video_id}")

    # Search YouTube for videos
    results = await search("Python tutorial", max_results=5)
    for video in results:
        print(f"{video.title} ({video.video_id}) - {video.channel_name}")

    # Search and get transcripts for results
    results = await search_and_get_transcripts("Python tutorial", max_results=3, language="en")
    for video, transcript in results:
        if transcript:
            print(f"{video.title}: {transcript.content[:100]}...")


# The calls above are coroutines, so they must run inside an event loop
asyncio.run(main())

Synchronous usage:

import asyncio
from src.service import get_transcript
from src.search_service import search

def fetch_transcript(video_id):
    return asyncio.run(get_transcript(video_id))

def search_videos(query, max_results=5):
    return asyncio.run(search(query, max_results=max_results))

result = fetch_transcript("VIDEO_ID")
print(result.content)

videos = search_videos("Python tutorial")
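
When several videos are needed, the coroutines can be run concurrently instead of one asyncio.run call per video. The sketch below is an assumption about how that could be done, not YTT's own batch logic: fetch_many and max_concurrency are hypothetical names, and the fetch callable is injected so that get_transcript from src.service (or a stub) can be passed in.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Union


async def fetch_many(
    video_ids: Iterable[str],
    fetch: Callable[[str], Awaitable[object]],
    max_concurrency: int = 4,
) -> Dict[str, Union[object, Exception]]:
    """Fetch all IDs concurrently, at most max_concurrency at a time.

    Failures are returned as exception objects keyed by video ID rather
    than aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(video_id: str) -> object:
        async with semaphore:
            return await fetch(video_id)

    ids = list(video_ids)
    results = await asyncio.gather(
        *(guarded(video_id) for video_id in ids), return_exceptions=True
    )
    return dict(zip(ids, results))


if __name__ == "__main__":
    async def fake_fetch(video_id: str) -> str:
        # Stand-in for get_transcript so the sketch runs without YTT installed.
        await asyncio.sleep(0)
        return f"transcript of {video_id}"

    print(asyncio.run(fetch_many(["ID1", "ID2", "ID3"], fake_fetch)))
```

With YTT importable, passing get_transcript as the fetch argument should work the same way, since it is awaited with a single video ID in the examples above.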

MCP server

Note: For detailed configuration for Claude Desktop, Cursor, and VS Code, see QUICKSTART.md

Start the MCP server:

python -m mcp_server.server

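The server exposes three tools:

  • get_transcript - fetch the transcript of a single video

  • get_transcripts_batch - fetch transcripts for multiple videos concurrently

  • search_videos - search YouTube for videos matching a query

Alternatively, integrate with Claude Desktop by adding the following to your MCP settings: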

{
  "mcpServers": {
    "yt-transcript": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/absolute/path/to/ytt"
    }
  }
}

How it works

Video ID → Cache Check
              ↓ found?
         Return Cached
              ↓ not found
         Whisper (primary)
         - Download audio via yt-dlp
         - Transcribe with faster-whisper
         - Returns word-level timestamps
              ↓ fails?
         Innertube API (fallback)
         - Extract API key from video page
         - Fetch caption tracks
         - Parse JSON3 timed text
              ↓
         Cache Result
              ↓
         Format & Return
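
The flow above (cache check, primary backend, fallback, cache write) can be sketched as follows. This is an illustration under assumptions, not the code in src/service.py: get_with_fallback is a hypothetical name, the cache is a plain dict, and both backends are injected as coroutines.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Fetcher = Callable[[str], Awaitable[str]]


async def get_with_fallback(
    video_id: str,
    cache: Dict[str, Tuple[str, str]],
    primary: Fetcher,
    fallback: Fetcher,
) -> Tuple[str, str]:
    """Return (source, text), trying the cache, then primary, then fallback."""
    if video_id in cache:
        return cache[video_id]
    try:
        result = ("whisper", await primary(video_id))
    except Exception:
        # Primary failed; a fallback error propagates to the caller.
        result = ("innertube", await fallback(video_id))
    cache[video_id] = result
    return result


async def _demo() -> None:
    async def broken_whisper(video_id: str) -> str:
        raise RuntimeError("audio download failed")

    async def fake_innertube(video_id: str) -> str:
        return f"captions for {video_id}"

    cache: Dict[str, Tuple[str, str]] = {}
    print(await get_with_fallback("VIDEO_ID", cache, broken_whisper, fake_innertube))
    # The second call is served from the cache without touching either backend.
    print(await get_with_fallback("VIDEO_ID", cache, broken_whisper, fake_innertube))


if __name__ == "__main__":
    asyncio.run(_demo())
```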

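Whisper (primary)

  • Downloads audio with yt-dlp

  • Transcribes with faster-whisper (CPU-optimized)

  • Returns word-level timestamps and segment text

  • Works for any video that has audio

  • Processes at roughly 1-3x real-time speed

Innertube API (fallback)

  • Scrapes YouTube's internal API

  • No API key required

  • Fast (about 0.5-2 seconds per video)

  • About 85% coverage (some videos have no captions)

  • Rate limited (about 5 requests per 10 seconds per IP)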
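
A limit of about 5 requests per 10 seconds corresponds to a token bucket that refills at 0.5 tokens per second with a burst size of 5. The sketch below illustrates that technique in general; it is not the code in src/rate_limiter.py, and the TokenBucket class, its try_acquire method, and the injectable clock are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket that refills at `rate` tokens/second up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False if the bucket is empty."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=0.5, burst=5)
    # The first five requests pass immediately; the sixth has to wait for a refill.
    print([bucket.try_acquire() for _ in range(6)])
```

A caller that gets False back can sleep briefly and retry, or skip the fallback for that video.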

Output formats

Text (default)

Good morning, here we are, a live suturing course like nobody else has ever
done and what are we covering, we're covering every suturing technique...

JSON

{
  "video_id": "a1JTPFfshI0",
  "language": "en",
  "source": "whisper",
  "segments": [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
  ]
}

SRT (SubRip)

1
00:00:00,000 --> 00:00:04,500
Good morning, here we are, a live suturing course...

2
00:00:04,500 --> 00:00:09,200
a live suturing course like nobody else...

VTT (WebVTT)

WEBVTT

00:00:00.000 --> 00:00:04.500
Good morning, here we are, a live suturing course...

00:00:04.500 --> 00:00:09.200
a live suturing course like nobody else...

Configuration

Edit config.py to customize behavior:

class Config:
    # Whisper settings
    WHISPER_MODEL = "base"      # tiny/base/small/medium/large
    WHISPER_FALLBACK_ENABLED = True

    # Cache settings
    CACHE_TTL_DAYS = 7
    CACHE_DB_PATH = ".transcript_cache.db"

    # Rate limiting (for Innertube fallback)
    RATE_LIMIT_RATE = 0.5       # tokens per second
    RATE_LIMIT_BURST = 5       # max bucket size

    # Batch processing
    MAX_BATCH_SIZE = 50
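
To make the cache settings concrete, here is a minimal sketch of a SQLite cache with a time-to-live, in the spirit of CACHE_TTL_DAYS and CACHE_DB_PATH. The table layout, the TranscriptCache class, and its method names are assumptions for illustration; the actual schema in src/cache.py is not shown in this document.

```python
import sqlite3
import time
from typing import Optional


class TranscriptCache:
    """SQLite-backed cache whose entries expire after ttl_days."""

    def __init__(self, path: str = ":memory:", ttl_days: float = 7):
        self.ttl_seconds = ttl_days * 86_400
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "video_id TEXT PRIMARY KEY, "
            "content TEXT NOT NULL, "
            "created_at REAL NOT NULL)"
        )

    def put(self, video_id: str, content: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
            (video_id, content, now),
        )
        self.conn.commit()

    def get(self, video_id: str, now: Optional[float] = None) -> Optional[str]:
        """Return cached content, or None if it is missing or expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT content, created_at FROM transcripts WHERE video_id = ?",
            (video_id,),
        ).fetchone()
        if row is None or now - row[1] > self.ttl_seconds:
            return None
        return row[0]

    def clean(self, now: Optional[float] = None) -> int:
        """Delete expired entries and return how many were removed."""
        now = time.time() if now is None else now
        cursor = self.conn.execute(
            "DELETE FROM transcripts WHERE ? - created_at > ?",
            (now, self.ttl_seconds),
        )
        self.conn.commit()
        return cursor.rowcount
```

Passing a file path such as ".transcript_cache.db" instead of ":memory:" keeps entries across runs.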

Whisper models

Model    Speed   Accuracy   Memory
tiny     10x     ~75%       ~1GB
base     7x      ~85%       ~1GB
small    4x      ~90%       ~2GB
medium   2x      ~95%       ~5GB
large    1x      ~97%       ~6GB

For most use cases, the base model is recommended: it is fast and accurate enough.

File structure

ytt/
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # Innertube API client
│   ├── whisper_runner.py    # Whisper transcription
│   ├── parser.py            # Caption parsing utilities
│   ├── formatters.py        # Output formatters
│   ├── cache.py             # SQLite cache
│   ├── rate_limiter.py      # Token bucket
│   ├── service.py           # Orchestrator
│   ├── searcher.py          # YouTube search
│   ├── search_cache.py      # Search result cache
│   ├── search_service.py    # Search orchestrator
│   ├── cuda_dll_manager.py  # Auto-download CUDA libraries
│   └── exceptions.py        # Custom exceptions
├── mcp_server/
│   ├── __init__.py
│   └── server.py           # FastMCP server
├── cli.py                   # CLI entrypoint
├── main.py                  # Library entrypoint
├── config.py                # Configuration
├── requirements.txt         # Core dependencies
├── requirements-mcp.txt     # MCP dependencies
├── README.md
└── QUICKSTART.md

Troubleshooting

"No module named 'rich'"

Install the dependencies:

pip install -r requirements.txt

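Whisper reports "ffmpeg not found"

Install ffmpeg (see the installation section above).

Slow transcription

  • Use a smaller Whisper model (base instead of large)

  • Use GPU acceleration by changing device="cpu" to device="cuda" in src/whisper_runner.py

  • Enable caching to avoid repeated transcription

Innertube rate limiting

The Innertube fallback is rate limited by YouTube (about 5 requests per 10 seconds). Using Whisper as the primary method (the default) avoids this. Caching also prevents redundant requests.

Cache not working

Check the cache statistics: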

python cli.py cache-stats

Remove expired entries:

python cli.py cache-stats --clean

Development

Run tests

pytest

Format and lint code

black src/
ruff check src/

License

MIT License
