YouTube Transcript Fetcher (YTT)

Fetch transcripts from any YouTube video using Whisper AI transcription. Search YouTube and fetch transcripts for the top results. No YouTube API key required.

Features

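Whisper-powered: state-of-the-art AI transcription; accuracy depends on the model size (see the Whisper models section)
YouTube search: search YouTube and fetch transcripts for the top results
No API key required: works without YouTube Data API credentials
Multiple formats: text, JSON, SRT, and VTT output
Caching: SQLite-based cache that avoids repeated transcription
No rate limits: Whisper runs locally, so transcription is not subject to external API limits
CLI and library: usable as a command-line tool or as a Python module
MCP server: integrates with AI tools through the Model Context Protocol (MCP)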

Installation

# Clone the repository
git clone https://github.com/andrewctf/ytt.git
cd ytt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Optional: GPU support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Note: For detailed GPU/CUDA setup, see QUICKSTART.md.

Additional setup for Whisper

Whisper requires ffmpeg for audio extraction:

Windows (using winget):

winget install ffmpeg

macOS:

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

Quick start

For detailed installation and setup instructions, see QUICKSTART.md.

CLI

# Get transcript (Whisper is used by default)
python cli.py transcript VIDEO_ID

# Or with a full YouTube URL
python cli.py transcript "https://www.youtube.com/watch?v=a1JTPFfshI0"

# Different output formats
python cli.py transcript VIDEO_ID --format json
python cli.py transcript VIDEO_ID --format srt
python cli.py transcript VIDEO_ID --format vtt

# Save to file
python cli.py transcript VIDEO_ID --output transcript.txt

# Batch processing
python cli.py transcript VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

# Search YouTube for videos and get transcripts
python cli.py search "Python tutorial" --limit 5 --with-transcripts

# Search only (no transcripts)
python cli.py search "Python tutorial" --limit 10

# JSON output for search
python cli.py search "Python tutorial" --format json

# Cache management
python cli.py cache-stats
python cli.py cache-stats --clean  # Remove expired entries
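
The transcript command above accepts either a bare video ID or a full URL. A minimal sketch of how such input could be normalized is shown here. The function name extract_video_id is hypothetical, and the assumption that IDs are 11 characters from [A-Za-z0-9_-] reflects the commonly observed format rather than anything documented in this project.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters of letters, digits, "_" or "-".
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a YouTube URL (hypothetical helper)."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if not _ID_RE.fullmatch(candidate):
        raise ValueError(f"could not find a video ID in {value!r}")
    return candidate
```

Validating the result against the ID pattern means malformed input fails early instead of reaching the downloader.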

Python library

import asyncio

from src.service import get_transcript
from src.search_service import search, search_and_get_transcripts


async def main():
    # Basic usage
    result = await get_transcript("VIDEO_ID")
    print(result.content)

    # With options
    result = await get_transcript(
        "VIDEO_ID",
        language="en",
        output_format="json",
        use_cache=True,
    )

    # Access metadata
    print(f"Source: {result.source}")      # 'whisper' or 'innertube'
    print(f"Language: {result.language}")   # Detected language
    print(f"Video ID: {result.video_id}")

    # Search YouTube for videos
    results = await search("Python tutorial", max_results=5)
    for video in results:
        print(f"{video.title} ({video.video_id}) - {video.channel_name}")

    # Search and get transcripts for results
    results = await search_and_get_transcripts("Python tutorial", max_results=3, language="en")
    for video, transcript in results:
        if transcript:
            print(f"{video.title}: {transcript.content[:100]}...")


# The API is async, so the coroutine must be run from an event loop.
asyncio.run(main())

Synchronous usage (wrapping each call in asyncio.run):

import asyncio
from src.service import get_transcript
from src.search_service import search

def fetch_transcript(video_id):
    return asyncio.run(get_transcript(video_id))

def search_videos(query, max_results=5):
    return asyncio.run(search(query, max_results=max_results))

result = fetch_transcript("VIDEO_ID")
print(result.content)

videos = search_videos("Python tutorial")

MCP server

Note: For detailed configuration with Claude Desktop, Cursor, and VS Code, see QUICKSTART.md.

Start the MCP server:

python -m mcp_server.server

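The server exposes three tools:

get_transcript - fetch the transcript of a single video
get_transcripts_batch - fetch transcripts for multiple videos concurrently
search_videos - search YouTube for videos matching a query

Alternatively, integrate with Claude Desktop by adding the following to its MCP settings: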

{
  "mcpServers": {
    "yt-transcript": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/absolute/path/to/ytt"
    }
  }
}
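
The get_transcripts_batch tool listed above is described as fetching several transcripts concurrently. One way this could be done with asyncio is sketched below. The names fetch_batch and fake_fetch and the max_concurrency parameter are assumptions for illustration; fake_fetch stands in for the real get_transcript coroutine, and the project's actual batch implementation may differ.

```python
import asyncio


async def fetch_batch(video_ids, fetch_one, max_concurrency=4):
    """Fetch many transcripts concurrently.

    Returns {video_id: result_or_exception}, so one failure does not
    abort the whole batch.
    """
    video_ids = list(video_ids)
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(video_id):
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch_one(video_id)

    results = await asyncio.gather(
        *(guarded(v) for v in video_ids), return_exceptions=True
    )
    return dict(zip(video_ids, results))


async def fake_fetch(video_id):
    # Stand-in for the real get_transcript coroutine (assumption).
    await asyncio.sleep(0.01)
    if video_id == "bad":
        raise ValueError("no transcript")
    return f"transcript for {video_id}"


batch = asyncio.run(fetch_batch(["a", "b", "bad"], fake_fetch))
for video_id, outcome in batch.items():
    print(video_id, "->", outcome)
```

Passing return_exceptions=True keeps the successful results when one video fails, which suits batch jobs where partial output is still useful.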

How it works

Video ID → Cache Check
              ↓ found?
         Return Cached
              ↓ not found
         Whisper (primary)
         - Download audio via yt-dlp
         - Transcribe with faster-whisper
         - Returns word-level timestamps
              ↓ fails?
         Innertube API (fallback)
         - Extract API key from video page
         - Fetch caption tracks
         - Parse JSON3 timed text
              ↓
         Cache Result
              ↓
         Format & Return
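
The flow in the diagram can be sketched as a small orchestrator. This is only an illustration under stated assumptions: the function name get_transcript_with_fallback, the dict used as a cache, and the two stand-in fetchers are hypothetical and are not the project's service.py.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[dict]]


async def get_transcript_with_fallback(
    video_id: str, cache: dict, primary: Fetcher, fallback: Fetcher
) -> dict:
    # 1. Cache check: return immediately on a hit.
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    # 2. Try the primary source, 3. fall back if it raises.
    try:
        result = await primary(video_id)
    except Exception:
        result = await fallback(video_id)
    # 4. Cache the result before returning it.
    cache[video_id] = result
    return result


calls = {"whisper": 0, "innertube": 0}


async def failing_whisper(video_id):
    # Stand-in primary that always fails, to exercise the fallback path.
    calls["whisper"] += 1
    raise RuntimeError("audio download failed")


async def innertube(video_id):
    # Stand-in fallback that always succeeds.
    calls["innertube"] += 1
    return {"video_id": video_id, "source": "innertube"}


cache = {}
first = asyncio.run(
    get_transcript_with_fallback("a1JTPFfshI0", cache, failing_whisper, innertube)
)
second = asyncio.run(
    get_transcript_with_fallback("a1JTPFfshI0", cache, failing_whisper, innertube)
)
print(first["source"], calls)
```

The second call is served from the cache, so neither fetcher runs again.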

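Whisper (primary)

Downloads audio with yt-dlp
Transcribes with faster-whisper (CPU-optimized)
Returns word-level timestamps and segment text
Works for any video that has audio
Processes at roughly 1-3x real-time speed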
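
The steps above could look roughly like the following sketch, which uses the public yt-dlp and faster-whisper APIs. The function names, the output template, and the int8 compute type are assumptions chosen for illustration; the project's whisper_runner.py may be structured differently.

```python
from pathlib import Path


def build_ydl_options(out_dir: str) -> dict:
    """yt-dlp options that download only the best available audio stream."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "quiet": True,
        "noplaylist": True,
    }


def transcribe_video(url: str, out_dir: str = ".", model_size: str = "base") -> dict:
    # Imported lazily so this module loads without the optional dependencies.
    import yt_dlp
    from faster_whisper import WhisperModel

    # Step 1: download the audio track.
    with yt_dlp.YoutubeDL(build_ydl_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        audio_path = ydl.prepare_filename(info)

    # Step 2: transcribe on the CPU (int8 is an assumed setting).
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, meta = model.transcribe(audio_path, word_timestamps=True)

    # Step 3: consume the lazy segment generator into plain dicts.
    return {
        "video_id": info["id"],
        "language": meta.language,
        "source": "whisper",
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text.strip()}
            for s in segments
        ],
    }
```

faster-whisper yields segments lazily, so the transcription work actually happens while the list comprehension iterates.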

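Innertube API (fallback)

Scrapes YouTube's internal API
No API key required
Fast (about 0.5-2 seconds per video)
About 85% coverage (some videos have no captions)
Rate limited (about 5 requests per 10 seconds per IP)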
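
The fallback path parses JSON3 timed text into segments. A minimal sketch of that parsing step follows. It assumes the commonly observed JSON3 shape (an events list whose items carry tStartMs, dDurationMs, and segs with utf8 text); the function name parse_json3 is hypothetical and the sample payload is made up for the example.

```python
import json


def parse_json3(payload: str) -> list[dict]:
    """Convert a JSON3 timed-text document into start/end/text segments."""
    segments = []
    for event in json.loads(payload).get("events", []):
        # Each event may split its text across several "segs" entries.
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", [])).strip()
        if not text:
            continue  # skip layout-only or newline-only events
        start_ms = event.get("tStartMs", 0)
        end_ms = start_ms + event.get("dDurationMs", 0)
        # Add in integer milliseconds first to avoid float rounding drift.
        segments.append({"start": start_ms / 1000, "end": end_ms / 1000, "text": text})
    return segments


sample = json.dumps({
    "events": [
        {"tStartMs": 0, "dDurationMs": 4500,
         "segs": [{"utf8": "Good morning, "}, {"utf8": "here we are"}]},
        {"tStartMs": 4500, "dDurationMs": 4700, "segs": [{"utf8": "\n"}]},
        {"tStartMs": 4500, "dDurationMs": 4700,
         "segs": [{"utf8": "a live suturing course"}]},
    ]
})
parsed = parse_json3(sample)
print(parsed)
```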

Output formats

Text (default)

Good morning, here we are, a live suturing course like nobody else has ever
done and what are we covering, we're covering every suturing technique...

JSON

{
  "video_id": "a1JTPFfshI0",
  "language": "en",
  "source": "whisper",
  "segments": [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
  ]
}

SRT (SubRip)

1
00:00:00,000 --> 00:00:04,500
Good morning, here we are, a live suturing course...

2
00:00:04,500 --> 00:00:09,200
a live suturing course like nobody else...

VTT (WebVTT)

WEBVTT

00:00:00.000 --> 00:00:04.500
Good morning, here we are, a live suturing course...

00:00:04.500 --> 00:00:09.200
a live suturing course like nobody else...
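
SRT and VTT differ mainly in the millisecond separator (comma versus period), the numbered cues in SRT, and the WEBVTT header. The sketch below renders both from segments shaped like the JSON output above. The function names are hypothetical and this is not necessarily how formatters.py is written.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments) -> str:
    # SRT: numbered cues, comma before the milliseconds.
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(blocks) + "\n"


def to_vtt(segments) -> str:
    # WebVTT: "WEBVTT" header, period before the milliseconds.
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


segments = [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."},
]
print(to_srt(segments))
print(to_vtt(segments))
```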

Configuration

Edit config.py to customize behavior:

class Config:
    # Whisper settings
    WHISPER_MODEL = "base"      # tiny/base/small/medium/large
    WHISPER_FALLBACK_ENABLED = True

    # Cache settings
    CACHE_TTL_DAYS = 7
    CACHE_DB_PATH = ".transcript_cache.db"

    # Rate limiting (for Innertube fallback)
    RATE_LIMIT_RATE = 0.5       # tokens per second
    RATE_LIMIT_BURST = 5       # max bucket size

    # Batch processing
    MAX_BATCH_SIZE = 50
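
To illustrate how CACHE_TTL_DAYS and CACHE_DB_PATH might be used, here is a minimal SQLite cache with time-based expiry. The class name TranscriptCache, the table layout, and the key format are assumptions for the example; the project's cache.py may store more fields or use a different schema.

```python
import sqlite3
import time


class TranscriptCache:
    """Tiny SQLite-backed cache whose entries expire after ttl_days."""

    def __init__(self, db_path=":memory:", ttl_days=7):
        self.ttl_seconds = ttl_days * 86400
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "key TEXT PRIMARY KEY, content TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    def set(self, key, content, now=None):
        now = time.time() if now is None else now
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
                (key, content, now),
            )

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT content, created_at FROM transcripts WHERE key = ?", (key,)
        ).fetchone()
        # Treat expired rows as misses; clean() removes them later.
        if row is None or now - row[1] > self.ttl_seconds:
            return None
        return row[0]

    def clean(self, now=None):
        """Delete expired entries and return how many were removed."""
        now = time.time() if now is None else now
        with self.conn:
            cursor = self.conn.execute(
                "DELETE FROM transcripts WHERE ? - created_at > ?",
                (now, self.ttl_seconds),
            )
        return cursor.rowcount
```

The optional now argument exists only so expiry can be exercised without waiting; passing a file path such as ".transcript_cache.db" instead of ":memory:" makes the cache persistent.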

Whisper models

Model	Relative speed	Accuracy	Memory
tiny	10x	~75%	~1GB
base	7x	~85%	~1GB
small	4x	~90%	~2GB
medium	2x	~95%	~5GB
large	1x	~97%	~6GB

For most use cases, the base model is recommended: it is fast and accurate enough.
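
If the choice should depend on the machine, the table can be turned into a small lookup. The helper below is a hypothetical illustration: the figures are copied from the table above, and pick_model is not part of the project.

```python
# (name, relative speed, approximate memory in GB), copied from the table,
# ordered from least to most accurate.
MODELS = [
    ("tiny", 10, 1),
    ("base", 7, 1),
    ("small", 4, 2),
    ("medium", 2, 5),
    ("large", 1, 6),
]


def pick_model(memory_budget_gb: float, min_speed: float = 1) -> str:
    """Return the most accurate model that fits the memory and speed limits."""
    candidates = [
        name
        for name, speed, memory in MODELS
        if memory <= memory_budget_gb and speed >= min_speed
    ]
    if not candidates:
        raise ValueError("no model fits the given constraints")
    # MODELS is ordered by accuracy, so the last candidate is the best fit.
    return candidates[-1]


print(pick_model(2))               # small
print(pick_model(8, min_speed=4))  # small
print(pick_model(1))               # base
```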

File structure

ytt/
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # Innertube API client
│   ├── whisper_runner.py    # Whisper transcription
│   ├── parser.py            # Caption parsing utilities
│   ├── formatters.py        # Output formatters
│   ├── cache.py             # SQLite cache
│   ├── rate_limiter.py      # Token bucket
│   ├── service.py           # Orchestrator
│   ├── searcher.py          # YouTube search
│   ├── search_cache.py      # Search result cache
│   ├── search_service.py    # Search orchestrator
│   ├── cuda_dll_manager.py  # Auto-download CUDA libraries
│   └── exceptions.py        # Custom exceptions
├── mcp_server/
│   ├── __init__.py
│   └── server.py           # FastMCP server
├── cli.py                   # CLI entrypoint
├── main.py                  # Library entrypoint
├── config.py                # Configuration
├── requirements.txt         # Core dependencies
├── requirements-mcp.txt     # MCP dependencies
├── README.md
└── QUICKSTART.md

Troubleshooting

"No module named 'rich'"

Install the dependencies:

pip install -r requirements.txt

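Whisper reports "ffmpeg not found"

Install ffmpeg (see the installation section above).

Slow transcription

Use a smaller Whisper model (base instead of large)
Enable GPU acceleration by changing device="cpu" to device="cuda" in whisper_runner.py
Enable caching to avoid repeated transcription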

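Innertube rate limiting

The Innertube fallback is rate limited by YouTube (about 5 requests per 10 seconds). Using Whisper as the primary method (the default) avoids this, and caching prevents redundant requests.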
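
The limit of about 5 requests per 10 seconds matches the config.py values RATE_LIMIT_RATE = 0.5 and RATE_LIMIT_BURST = 5 when read as token-bucket parameters. A minimal token bucket is sketched below to show how those two numbers interact. The class and its clock parameter are assumptions for illustration and are not the project's rate_limiter.py.

```python
import time


class TokenBucket:
    """Allow bursts of up to `burst` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate=0.5, burst=5, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the behavior can be tested
        self.tokens = float(burst)
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        # Never store more than `burst` tokens.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demonstration with a fake clock: 5 requests pass at once, the 6th is refused,
# and one more is allowed after 2 seconds (2 s * 0.5 tokens/s = 1 token).
fake_now = [0.0]
bucket = TokenBucket(rate=0.5, burst=5, clock=lambda: fake_now[0])
burst_results = [bucket.try_acquire() for _ in range(6)]
fake_now[0] = 2.0
after_wait = bucket.try_acquire()
print(burst_results, after_wait)
```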

Cache not working

Check the cache statistics:

python cli.py cache-stats

Clean up expired entries:

python cli.py cache-stats --clean

Development

Run tests

pytest

Format and lint code

black src/
ruff check src/

License

MIT License
