YouTube Transcript Fetcher

Fetch transcripts for any YouTube video using Whisper AI transcription. Search YouTube and get transcripts for the top results. No YouTube API key required.

Features

Whisper-powered - State-of-the-art AI transcription (accuracy depends on the model size; see the model comparison table below)
YouTube search - Search YouTube and get transcripts for the top results
No API key required - Works without YouTube Data API credentials
Multiple formats - Output as text, JSON, SRT, or VTT
Caching - SQLite-based cache avoids repeated transcriptions
No rate limiting - Whisper runs locally, so transcription is not subject to external API limits
CLI & library - Use as a command-line tool or as a Python module
MCP server - Integrates with AI tools via the Model Context Protocol

Installation

# Clone the repository
git clone https://github.com/andrewctf/ytt.git
cd ytt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Optional: GPU support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Note: For detailed GPU/CUDA setup, see QUICKSTART.md.

Additional Setup for Whisper

Whisper requires ffmpeg for audio extraction:

Windows (with winget):

winget install ffmpeg

macOS:

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

Quick Start

For detailed installation and setup instructions, see QUICKSTART.md.

CLI

# Get transcript (Whisper is used by default)
python cli.py transcript VIDEO_ID

# Or with a full YouTube URL
python cli.py transcript "https://www.youtube.com/watch?v=a1JTPFfshI0"

# Different output formats
python cli.py transcript VIDEO_ID --format json
python cli.py transcript VIDEO_ID --format srt
python cli.py transcript VIDEO_ID --format vtt

# Save to file
python cli.py transcript VIDEO_ID --output transcript.txt

# Batch processing
python cli.py transcript VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

# Search YouTube for videos and get transcripts
python cli.py search "Python tutorial" --limit 5 --with-transcripts

# Search only (no transcripts)
python cli.py search "Python tutorial" --limit 10

# JSON output for search
python cli.py search "Python tutorial" --format json

# Cache management
python cli.py cache-stats
python cli.py cache-stats --clean  # Remove expired entries
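The transcript command accepts either a bare video ID or a full YouTube URL. The following is a minimal sketch of how such input could be normalized; it is not taken from the project's code. The function name extract_video_id and the set of URL shapes it handles are assumptions made for illustration, as is the check for the 11-character ID form that YouTube currently uses.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from the URL-safe base64 alphabet.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a common YouTube URL (hypothetical helper)."""
    value = value.strip()
    if _ID_PATTERN.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = ""

    if host == "youtu.be" and parts:
        # Short links: https://youtu.be/<id>
        candidate = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Watch pages: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            # Embed and Shorts pages: /embed/<id>, /shorts/<id>
            candidate = parts[1]

    if not _ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"Could not extract a video ID from {value!r}")
    return candidate
```

Unrecognized input raises ValueError rather than being passed on, so a typo in a URL fails early instead of triggering a download attempt.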

Python Library

import asyncio

from src.service import get_transcript
from src.search_service import search, search_and_get_transcripts


async def main():
    # Basic usage
    result = await get_transcript("VIDEO_ID")
    print(result.content)

    # With options
    result = await get_transcript(
        "VIDEO_ID",
        language="en",
        output_format="json",
        use_cache=True,
    )

    # Access metadata
    print(f"Source: {result.source}")      # 'whisper' or 'innertube'
    print(f"Language: {result.language}")   # Detected language
    print(f"Video ID: {result.video_id}")

    # Search YouTube for videos
    results = await search("Python tutorial", max_results=5)
    for video in results:
        print(f"{video.title} ({video.video_id}) - {video.channel_name}")

    # Search and get transcripts for results
    results = await search_and_get_transcripts("Python tutorial", max_results=3, language="en")
    for video, transcript in results:
        if transcript:
            print(f"{video.title}: {transcript.content[:100]}...")


# The functions above are coroutines, so they must run inside an event loop
asyncio.run(main())

For synchronous use:

import asyncio
from src.service import get_transcript
from src.search_service import search

def fetch_transcript(video_id):
    return asyncio.run(get_transcript(video_id))

def search_videos(query, max_results=5):
    return asyncio.run(search(query, max_results=max_results))

result = fetch_transcript("VIDEO_ID")
print(result.content)

videos = search_videos("Python tutorial")
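When several transcripts are needed from library code, the coroutines can be run concurrently instead of one after another. The sketch below shows one way to do that with bounded concurrency. The helper fetch_many and its max_concurrency parameter are hypothetical and not part of the project; the fetch argument is any coroutine function that takes a video ID, such as get_transcript.

```python
import asyncio


async def fetch_many(video_ids, fetch, max_concurrency=3):
    """Fetch transcripts for several IDs, at most max_concurrency at a time.

    Hypothetical helper: failures are returned as exception objects in place
    of results, so one bad video does not abort the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(video_id):
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            return await fetch(video_id)

    results = await asyncio.gather(
        *(fetch_one(video_id) for video_id in video_ids),
        return_exceptions=True,
    )
    return dict(zip(video_ids, results))
```

Assuming get_transcript behaves as shown earlier, a call could look like asyncio.run(fetch_many(["VIDEO_ID1", "VIDEO_ID2"], get_transcript)). A low concurrency limit is a reasonable default because local Whisper transcription is CPU- or GPU-bound.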

MCP Server

Note: See QUICKSTART.md for detailed configuration with Claude Desktop, Cursor, and VS Code.

Start the MCP server (its dependencies are listed in requirements-mcp.txt):

python -m mcp_server.server

The server exposes three tools:

get_transcript - Fetch the transcript for a single video
get_transcripts_batch - Fetch transcripts for multiple videos at once
search_videos - Search YouTube for videos matching a query

Or integrate it with Claude Desktop by adding it to your MCP settings:

{
  "mcpServers": {
    "yt-transcript": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/absolute/path/to/ytt"
    }
  }
}

How It Works

Video ID → Cache Check
              ↓ found?
         Return Cached
              ↓ not found
         Whisper (primary)
         - Download audio via yt-dlp
         - Transcribe with faster-whisper
         - Returns word-level timestamps
              ↓ fails?
         Innertube API (fallback)
         - Extract API key from video page
         - Fetch caption tracks
         - Parse JSON3 timed text
              ↓
         Cache Result
              ↓
         Format & Return
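The flow above can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the code in src/service.py: the name get_with_fallback is hypothetical, a plain dict stands in for the SQLite cache, and primary and fallback are placeholder coroutines for the Whisper and Innertube paths.

```python
import asyncio


async def get_with_fallback(video_id, cache, primary, fallback):
    """Return (transcript, source): cache first, then primary, then fallback."""
    # Step 1: a cache hit short-circuits all network and transcription work.
    if video_id in cache:
        return cache[video_id], "cache"

    # Step 2: try the primary path (Whisper in the diagram).
    try:
        transcript = await primary(video_id)
        source = "whisper"
    except Exception:
        # Step 3: any primary failure falls through to the caption-based path.
        transcript = await fallback(video_id)
        source = "innertube"

    # Step 4: store the result so the next request is served from the cache.
    cache[video_id] = transcript
    return transcript, source
```

If the fallback also fails, its exception propagates to the caller and nothing is cached, which keeps failed lookups from being stored.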

Whisper (Primary)

Downloads audio with yt-dlp
Transcribes with faster-whisper (CPU-optimized)
Returns word-level timestamps and segment text
Works on any video that has audio
~1-3x real-time processing speed
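As a rough sketch of the transcription step, the snippet below calls faster-whisper on an audio file that has already been downloaded. WhisperModel and transcribe(..., word_timestamps=True) are part of the faster-whisper package; the helper names transcribe_file and segment_to_dict, the output dict layout, and compute_type="int8" are assumptions for illustration and may differ from what src/whisper_runner.py does.

```python
def segment_to_dict(segment):
    """Convert a segment object into a plain dict (hypothetical layout)."""
    words = getattr(segment, "words", None) or []
    return {
        "start": round(segment.start, 3),
        "end": round(segment.end, 3),
        "text": segment.text.strip(),
        "words": [{"start": w.start, "end": w.end, "word": w.word} for w in words],
    }


def transcribe_file(audio_path, model_size="base", device="cpu"):
    """Transcribe a local audio file and return (language, segments)."""
    # Imported lazily so segment_to_dict stays usable without the package.
    from faster_whisper import WhisperModel

    # Assumption: int8 quantization, a common choice for CPU inference.
    model = WhisperModel(model_size, device=device, compute_type="int8")
    segments, info = model.transcribe(audio_path, word_timestamps=True)
    # `segments` is a generator; transcription runs as it is consumed.
    return info.language, [segment_to_dict(s) for s in segments]
```

Because the segments are produced lazily, the list comprehension is where the actual transcription time is spent.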

Innertube API (Fallback)

Queries YouTube's internal API
No API key required
Fast (~0.5-2 s per video)
~85% coverage (some videos have no captions)
Rate-limited (~5 requests per 10 s per IP)
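A limit of about 5 requests per 10 seconds maps naturally onto a token bucket that refills at 0.5 tokens per second with a burst size of 5, which matches the RATE_LIMIT_RATE and RATE_LIMIT_BURST settings in config.py. The class below is a minimal sketch of that technique, not the contents of src/rate_limiter.py; the class name, method name, and injectable clock are assumptions.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate=0.5, burst=5, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(burst)  # start full, allowing an initial burst
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when the bucket is empty."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check try_acquire() before each Innertube request and wait or back off when it returns False.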

Output Formats

Text (default)

Good morning, here we are, a live suturing course like nobody else has ever
done and what are we covering, we're covering every suturing technique...

JSON

{
  "video_id": "a1JTPFfshI0",
  "language": "en",
  "source": "whisper",
  "segments": [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
  ]
}

SRT (SubRip)

1
00:00:00,000 --> 00:00:04,500
Good morning, here we are, a live suturing course...

2
00:00:04,500 --> 00:00:09,200
a live suturing course like nobody else...

VTT (WebVTT)

WEBVTT

00:00:00.000 --> 00:00:04.500
Good morning, here we are, a live suturing course...

00:00:04.500 --> 00:00:09.200
a live suturing course like nobody else...
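The SRT and VTT outputs differ mainly in the millisecond separator (comma versus period), cue numbering, and the WEBVTT header. The sketch below renders both from segments shaped like the JSON example; the function names are hypothetical and this is not the code in src/formatters.py.

```python
def format_timestamp(seconds, separator=","):
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments):
    """Render segments as SubRip: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


def to_vtt(segments):
    """Render segments as WebVTT: a WEBVTT header followed by unnumbered cues."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}")
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Working in integer milliseconds avoids rounding artifacts such as a seconds field of 60 when a timestamp sits just below a minute boundary.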

Configuration

Edit config.py to customize behavior:

class Config:
    # Whisper settings
    WHISPER_MODEL = "base"      # tiny/base/small/medium/large
    WHISPER_FALLBACK_ENABLED = True

    # Cache settings
    CACHE_TTL_DAYS = 7
    CACHE_DB_PATH = ".transcript_cache.db"

    # Rate limiting (for Innertube fallback)
    RATE_LIMIT_RATE = 0.5       # tokens per second
    RATE_LIMIT_BURST = 5       # max bucket size

    # Batch processing
    MAX_BATCH_SIZE = 50
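To show how CACHE_TTL_DAYS and CACHE_DB_PATH could interact, here is a minimal sketch of a SQLite cache with time-based expiry. It is an illustration only: the class name, table schema, and method names are assumptions and do not describe the real src/cache.py. The default in-memory database keeps the example self-contained; passing ".transcript_cache.db" would persist entries to disk.

```python
import sqlite3
import time

CACHE_TTL_DAYS = 7  # mirrors the config value above


class TranscriptCache:
    """SQLite-backed cache whose entries expire after ttl_days (hypothetical)."""

    def __init__(self, db_path=":memory:", ttl_days=CACHE_TTL_DAYS, clock=time.time):
        self.ttl_seconds = ttl_days * 86_400
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            "video_id TEXT PRIMARY KEY, content TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    def set(self, video_id, content):
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
            (video_id, content, self.clock()),
        )
        self.conn.commit()

    def get(self, video_id):
        """Return cached content, or None if missing or older than the TTL."""
        row = self.conn.execute(
            "SELECT content, created_at FROM transcripts WHERE video_id = ?",
            (video_id,),
        ).fetchone()
        if row is None or self.clock() - row[1] > self.ttl_seconds:
            return None
        return row[0]

    def clean(self):
        """Delete expired rows and return how many were removed."""
        cutoff = self.clock() - self.ttl_seconds
        cursor = self.conn.execute(
            "DELETE FROM transcripts WHERE created_at < ?", (cutoff,)
        )
        self.conn.commit()
        return cursor.rowcount
```

Expired rows are ignored on read and only physically removed by clean(), which is the kind of behavior the cache-stats --clean command suggests.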

Whisper Models

Model	Speed (relative to large)	Accuracy	Memory
tiny	10x	~75%	~1 GB
base	7x	~85%	~1 GB
small	4x	~90%	~2 GB
medium	2x	~95%	~5 GB
large	1x	~97%	~6 GB

The base model is recommended for most use cases: it is fast and sufficiently accurate.
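The trade-off in the table can also be expressed in code. The sketch below picks the most accurate model that fits a memory budget and a minimum relative speed, using the approximate figures from the table. The function pick_model is hypothetical and not part of the project; the result would be assigned to WHISPER_MODEL in config.py by hand.

```python
# (name, relative speed, approx. accuracy in %, approx. memory in GB),
# copied from the table above; all figures are rough estimates.
MODELS = [
    ("tiny", 10, 75, 1),
    ("base", 7, 85, 1),
    ("small", 4, 90, 2),
    ("medium", 2, 95, 5),
    ("large", 1, 97, 6),
]


def pick_model(memory_gb, min_speed=1):
    """Return the most accurate model within the memory and speed constraints."""
    candidates = [
        model for model in MODELS
        if model[3] <= memory_gb and model[1] >= min_speed
    ]
    if not candidates:
        raise ValueError("No model satisfies the given constraints")
    # Accuracy is the third field; take the candidate with the highest value.
    return max(candidates, key=lambda model: model[2])[0]
```

With a 1 GB budget this selects base over tiny, which is consistent with the recommendation above.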

File Structure

ytt/
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # Innertube API client
│   ├── whisper_runner.py    # Whisper transcription
│   ├── parser.py            # Caption parsing utilities
│   ├── formatters.py        # Output formatters
│   ├── cache.py             # SQLite cache
│   ├── rate_limiter.py      # Token bucket
│   ├── service.py           # Orchestrator
│   ├── searcher.py          # YouTube search
│   ├── search_cache.py      # Search result cache
│   ├── search_service.py    # Search orchestrator
│   ├── cuda_dll_manager.py  # Auto-download CUDA libraries
│   └── exceptions.py        # Custom exceptions
├── mcp_server/
│   ├── __init__.py
│   └── server.py           # FastMCP server
├── cli.py                   # CLI entrypoint
├── main.py                  # Library entrypoint
├── config.py                # Configuration
├── requirements.txt         # Core dependencies
├── requirements-mcp.txt     # MCP dependencies
├── README.md
└── QUICKSTART.md

Troubleshooting

"No module named 'rich'"

Install the dependencies:

pip install -r requirements.txt

Whisper fails with "ffmpeg not found"

Install ffmpeg (see the Installation section above).

Slow transcription speed

Use a smaller Whisper model (base instead of large)
Use GPU acceleration by changing device="cpu" to device="cuda" in src/whisper_runner.py
Enable the cache to avoid repeated transcriptions

Rate limiting by Innertube

The Innertube fallback is rate-limited by YouTube (~5 requests per 10 s). Use Whisper as the primary method (the default) to avoid this. The cache also prevents redundant requests.

Cache not working

Check the cache statistics:

python cli.py cache-stats

Remove expired entries:

python cli.py cache-stats --clean

Development

Run tests

pytest

Format and lint code

black src/
ruff check src/

License

MIT License
