YouTube Transcript Fetcher

Fetch transcripts from any YouTube video using Whisper AI transcription. You can also search YouTube and get transcripts for the top results. No YouTube API key required.

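Key Features

Whisper-based - state-of-the-art AI transcription (accuracy depends on the model; see the model comparison table below)
YouTube search - search YouTube and get transcripts for the top results
No API key required - works without YouTube Data API credentials
Multiple formats - text, JSON, SRT, and VTT output
Caching - SQLite-backed cache avoids re-transcribing the same video
No transcription rate limits - Whisper runs locally, so transcription itself is not subject to external API limits
CLI and library - use as a command-line tool or as a Python module
MCP server - integrates with AI tools via the Model Context Protocol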

Installation

# Clone the repository
git clone https://github.com/andrewctf/ytt.git
cd ytt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Optional: GPU support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Note: See QUICKSTART.md for detailed GPU/CUDA setup.

Additional Setup for Whisper

Whisper requires ffmpeg for audio extraction:

Windows (with winget):

winget install ffmpeg

macOS:

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

Quick Start

See QUICKSTART.md for detailed installation and setup instructions.

CLI

# Get transcript (Whisper is used by default)
python cli.py transcript VIDEO_ID

# Or with a full YouTube URL
python cli.py transcript "https://www.youtube.com/watch?v=a1JTPFfshI0"

# Different output formats
python cli.py transcript VIDEO_ID --format json
python cli.py transcript VIDEO_ID --format srt
python cli.py transcript VIDEO_ID --format vtt

# Save to file
python cli.py transcript VIDEO_ID --output transcript.txt

# Batch processing
python cli.py transcript VIDEO_ID1 VIDEO_ID2 VIDEO_ID3

# Search YouTube for videos and get transcripts
python cli.py search "Python tutorial" --limit 5 --with-transcripts

# Search only (no transcripts)
python cli.py search "Python tutorial" --limit 10

# JSON output for search
python cli.py search "Python tutorial" --format json

# Cache management
python cli.py cache-stats
python cli.py cache-stats --clean  # Remove expired entries
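The transcript command accepts either a bare video ID or a full YouTube URL. A minimal sketch of how such input could be normalized, assuming the common URL shapes (watch?v=, youtu.be/, /shorts/, /embed/, /live/) and 11-character IDs; extract_video_id is a hypothetical helper, not necessarily what cli.py does:

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-]
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a YouTube URL (hypothetical helper)."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"Not a YouTube video ID or URL: {value!r}")
```

For example, extract_video_id("https://www.youtube.com/watch?v=a1JTPFfshI0") returns "a1JTPFfshI0", the same ID you could pass directly.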

Python Library

import asyncio

from src.service import get_transcript
from src.search_service import search, search_and_get_transcripts


async def main():
    # Basic usage
    result = await get_transcript("VIDEO_ID")
    print(result.content)

    # With options
    result = await get_transcript(
        "VIDEO_ID",
        language="en",
        output_format="json",
        use_cache=True,
    )

    # Access metadata
    print(f"Source: {result.source}")      # 'whisper' or 'innertube'
    print(f"Language: {result.language}")   # Detected language
    print(f"Video ID: {result.video_id}")

    # Search YouTube for videos
    results = await search("Python tutorial", max_results=5)
    for video in results:
        print(f"{video.title} ({video.video_id}) - {video.channel_name}")

    # Search and get transcripts for results
    results = await search_and_get_transcripts("Python tutorial", max_results=3, language="en")
    for video, transcript in results:
        if transcript:  # skip results without a transcript
            print(f"{video.title}: {transcript.content[:100]}...")


# The library functions are coroutines, so `await` must run inside an event loop
asyncio.run(main())

Synchronous usage (wrapping the coroutines with asyncio.run):

import asyncio
from src.service import get_transcript
from src.search_service import search

def fetch_transcript(video_id):
    return asyncio.run(get_transcript(video_id))

def search_videos(query, max_results=5):
    return asyncio.run(search(query, max_results=max_results))

result = fetch_transcript("VIDEO_ID")
print(result.content)

videos = search_videos("Python tutorial")
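When fetching several transcripts from async code, the coroutines can run concurrently instead of one asyncio.run call per video. The sketch below bounds concurrency with an asyncio.Semaphore so Whisper or the rate-limited Innertube fallback is not flooded; fetch_many and its limit default are hypothetical, and fetch stands in for a coroutine such as get_transcript:

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fetch_many(
    video_ids: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    limit: int = 3,
) -> dict[str, Any]:
    """Run fetch(video_id) for every ID with at most `limit` calls in flight (hypothetical helper)."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(video_id: str) -> tuple[str, Any]:
        async with semaphore:
            try:
                return video_id, await fetch(video_id)
            except Exception as exc:  # record the failure for this ID and keep going
                return video_id, exc

    pairs = await asyncio.gather(*(run_one(v) for v in video_ids))
    return dict(pairs)
```

Usage: results = asyncio.run(fetch_many(["ID1", "ID2"], get_transcript)); each value is either a result object or the exception raised for that ID, so one failed video does not cancel the rest.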

MCP Server

Note: See QUICKSTART.md for detailed setup with Claude Desktop, Cursor, and VS Code.

Start the MCP server:

python -m mcp_server.server

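The server provides three tools:

get_transcript - fetch the transcript for a single video
get_transcripts_batch - fetch transcripts for multiple videos concurrently
search_videos - search YouTube for videos matching a query

Alternatively, integrate it with Claude Desktop by adding this entry to its MCP configuration: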

{
  "mcpServers": {
    "yt-transcript": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "/absolute/path/to/ytt"
    }
  }
}
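Because cwd must be an absolute path, and the python found on the PATH may not be the virtual environment's interpreter, it can help to generate this entry from inside the activated environment. A minimal sketch, assuming you run it from the repository root; build_mcp_entry is a hypothetical helper, and pointing command at sys.executable is a suggestion rather than something the project requires:

```python
import json
import sys
from pathlib import Path


def build_mcp_entry(repo_root: Path, python_exe: str = sys.executable) -> dict:
    """Build an "mcpServers" entry like the one shown above (hypothetical helper)."""
    return {
        "mcpServers": {
            "yt-transcript": {
                # Use the interpreter that has the project's dependencies installed
                "command": python_exe,
                "args": ["-m", "mcp_server.server"],
                # The MCP client needs an absolute path to the checkout
                "cwd": str(repo_root.resolve()),
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the MCP client's configuration file
    print(json.dumps(build_mcp_entry(Path.cwd()), indent=2))
```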

How It Works

Video ID → Cache Check
              ↓ found?
         Return Cached
              ↓ not found
         Whisper (primary)
         - Download audio via yt-dlp
         - Transcribe with faster-whisper
         - Returns word-level timestamps
              ↓ fails?
         Innertube API (fallback)
         - Extract API key from video page
         - Fetch caption tracks
         - Parse JSON3 timed text
              ↓
         Cache Result
              ↓
         Format & Return
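The flow above can be sketched as a single function. This is an illustration under assumptions, not the code in service.py: the cache is modeled as a plain dict, and transcribe_whisper, fetch_innertube, and format_segments are hypothetical callables standing in for the Whisper runner, the Innertube client, and the formatters.

```python
from typing import Callable

Segments = list[dict]


def get_transcript_sketch(
    video_id: str,
    cache: dict[str, tuple[str, Segments]],
    transcribe_whisper: Callable[[str], Segments],
    fetch_innertube: Callable[[str], Segments],
    format_segments: Callable[[Segments], str],
) -> tuple[str, str]:
    """Return (source, formatted transcript), following cache -> Whisper -> Innertube."""
    # 1. Cache check: return immediately on a hit
    if video_id in cache:
        source, segments = cache[video_id]
        return source, format_segments(segments)

    # 2. Whisper is the primary path
    try:
        segments, source = transcribe_whisper(video_id), "whisper"
    except Exception:
        # 3. Innertube caption tracks are the fallback; its errors propagate to the caller
        segments, source = fetch_innertube(video_id), "innertube"

    # 4. Cache the raw segments, then 5. format and return
    cache[video_id] = (source, segments)
    return source, format_segments(segments)
```

Caching the raw segments rather than formatted text lets one cached entry serve every output format.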

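Whisper (Primary)

Downloads audio with yt-dlp
Transcribes it with faster-whisper (configured for the CPU by default)
Returns word-level timestamps and segment text
Works on any video that has audio
Runs at roughly 1-3x real-time speed, depending on model size and hardware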
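The Whisper step corresponds roughly to calling faster-whisper's WhisperModel.transcribe, which returns a lazy generator of segments plus a TranscriptionInfo object. This is a minimal sketch, not the code in whisper_runner.py: the audio is assumed to be already downloaded to a local file, int8 on the CPU is an assumed compute type, and segments_to_dicts is a hypothetical helper that produces segment dicts in the shape of the JSON output format.

```python
from typing import Any, Iterable


def segments_to_dicts(segments: Iterable[Any]) -> list[dict]:
    """Convert faster-whisper Segment objects (.start/.end/.text/.words) to plain dicts."""
    result = []
    for seg in segments:
        # .words is populated only when transcribe() is called with word_timestamps=True
        words = [
            {"start": round(w.start, 3), "end": round(w.end, 3), "word": w.word.strip()}
            for w in (seg.words or [])
        ]
        result.append({
            "start": round(seg.start, 3),
            "end": round(seg.end, 3),
            "text": seg.text.strip(),
            "words": words,
        })
    return result


def transcribe_file(audio_path: str, model_size: str = "base") -> tuple[str, list[dict]]:
    """Transcribe a local audio file; returns (detected language, segments)."""
    # Imported lazily so segments_to_dicts works without faster-whisper installed
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a generator; decoding happens as it is consumed
    segments, info = model.transcribe(audio_path, word_timestamps=True)
    return info.language, segments_to_dicts(segments)
```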

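Innertube API (Fallback)

Scrapes YouTube's internal API
No user-supplied API key required (the key is extracted from the video page)
Fast (about 0.5-2 seconds per video)
About 85% coverage (some videos have no captions)
Rate-limited (about 5 requests per 10 seconds per IP)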
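The last Innertube step parses YouTube's JSON3 timed-text format. A minimal sketch, assuming the commonly observed layout of that undocumented format (an events list whose entries carry tStartMs, dDurationMs, and segs pieces of the form {"utf8": ...}); parse_json3 is hypothetical, and parser.py may handle more edge cases:

```python
import json


def parse_json3(raw: str) -> list[dict]:
    """Turn a JSON3 timed-text payload into [{"start", "end", "text"}] segments (seconds)."""
    data = json.loads(raw)
    segments = []
    for event in data.get("events", []):
        pieces = event.get("segs")
        if not pieces:
            # Events without "segs" only carry window or styling information
            continue
        # Join the caption pieces and fold in-caption line breaks into spaces
        text = "".join(p.get("utf8", "") for p in pieces).replace("\n", " ").strip()
        if not text:
            continue
        start_ms = event.get("tStartMs", 0)
        duration_ms = event.get("dDurationMs", 0)
        segments.append({
            "start": start_ms / 1000,
            "end": (start_ms + duration_ms) / 1000,
            "text": text,
        })
    return segments
```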

Output Formats

Text (default)

Good morning, here we are, a live suturing course like nobody else has ever
done and what are we covering, we're covering every suturing technique...

JSON

{
  "video_id": "a1JTPFfshI0",
  "language": "en",
  "source": "whisper",
  "segments": [
    {"start": 0.0, "end": 4.5, "text": "Good morning, here we are..."},
    {"start": 4.5, "end": 9.2, "text": "a live suturing course..."}
  ]
}

SRT (SubRip)

1
00:00:00,000 --> 00:00:04,500
Good morning, here we are...

2
00:00:04,500 --> 00:00:09,200
a live suturing course like nobody else...

VTT (WebVTT)

WEBVTT

00:00:00.000 --> 00:00:04.500
Good morning, here we are...

00:00:04.500 --> 00:00:09.200
a live suturing course like nobody else...
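SRT and VTT differ mainly in the header, the cue numbers, and the millisecond separator (comma vs. period). A minimal sketch of both formatters over segment dicts like those in the JSON example; the function names are hypothetical and may not match formatters.py:

```python
def format_timestamp(seconds: float, ms_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm, e.g. 4.5 -> 00:00:04,500 with sep=','."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{ms_sep}{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


def to_vtt(segments: list[dict]) -> str:
    """WebVTT: WEBVTT header, period before milliseconds, cue numbers omitted."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)
```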

Configuration

Edit config.py to customize behavior:

class Config:
    # Whisper settings
    WHISPER_MODEL = "base"      # tiny/base/small/medium/large
    WHISPER_FALLBACK_ENABLED = True

    # Cache settings
    CACHE_TTL_DAYS = 7
    CACHE_DB_PATH = ".transcript_cache.db"

    # Rate limiting (for Innertube fallback)
    RATE_LIMIT_RATE = 0.5       # tokens per second
    RATE_LIMIT_BURST = 5        # max bucket size

    # Batch processing
    MAX_BATCH_SIZE = 50
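RATE_LIMIT_RATE and RATE_LIMIT_BURST describe a token bucket: up to 5 requests can go out back to back, after which requests are admitted at 0.5 per second, which matches the roughly 5 requests per 10 seconds noted for Innertube. A minimal sketch of such a bucket with an injectable clock; the class is hypothetical, and rate_limiter.py may differ (for example, by being async):

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills `rate` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = float(burst)
        self.clock = clock
        self.tokens = float(burst)  # start full so the first `burst` calls pass immediately
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at the burst size
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until one token is available (0.0 if one is available now)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

With TokenBucket(rate=0.5, burst=5), the sixth back-to-back request is refused and wait_time() reports 2.0 seconds until the next one.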

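Whisper Models

Model	Speed (relative to large)	Accuracy (approx.)	Memory
tiny	~10x	~75%	~1GB
base	~7x	~85%	~1GB
small	~4x	~90%	~2GB
medium	~2x	~95%	~5GB
large	1x	~97%	~6GB

The base model is recommended for most use cases, as it is fast and accurate enough.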

File Structure

ytt/
├── src/
│   ├── __init__.py
│   ├── fetcher.py          # Innertube API client
│   ├── whisper_runner.py    # Whisper transcription
│   ├── parser.py            # Caption parsing utilities
│   ├── formatters.py        # Output formatters
│   ├── cache.py             # SQLite cache
│   ├── rate_limiter.py      # Token bucket
│   ├── service.py           # Orchestrator
│   ├── searcher.py          # YouTube search
│   ├── search_cache.py      # Search result cache
│   ├── search_service.py    # Search orchestrator
│   ├── cuda_dll_manager.py  # Auto-download CUDA libraries
│   └── exceptions.py        # Custom exceptions
├── mcp_server/
│   ├── __init__.py
│   └── server.py           # FastMCP server
├── cli.py                   # CLI entrypoint
├── main.py                  # Library entrypoint
├── config.py                # Configuration
├── requirements.txt         # Core dependencies
├── requirements-mcp.txt     # MCP dependencies
├── README.md
└── QUICKSTART.md

Troubleshooting

"No module named 'rich'"

Install the dependencies:

pip install -r requirements.txt

Whisper fails with "ffmpeg not found"

Install ffmpeg (see the installation section above).

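Slow transcription

Use a smaller Whisper model (for example, base instead of large)
Enable GPU acceleration by changing device="cpu" to device="cuda" in whisper_runner.py (requires a working CUDA setup; see QUICKSTART.md)
Keep caching enabled so videos are not transcribed again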
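With faster-whisper, switching to the GPU usually also means choosing a GPU-friendly compute_type (float16 on CUDA and int8 on the CPU are common choices). A minimal sketch that picks the device automatically, assuming faster-whisper and its CTranslate2 backend are installed; pick_device and load_whisper_model are hypothetical helpers, not part of whisper_runner.py:

```python
def pick_device(cuda_devices: int) -> tuple[str, str]:
    """Return (device, compute_type) for faster-whisper given the number of CUDA devices."""
    if cuda_devices > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def load_whisper_model(model_size: str = "base"):
    """Load a faster-whisper model on the GPU when one is visible, otherwise on the CPU."""
    # Lazy imports: ctranslate2 is installed as a dependency of faster-whisper
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = pick_device(ctranslate2.get_cuda_device_count())
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```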

Innertube rate limiting

The Innertube fallback is rate-limited by YouTube (about 5 requests per 10 seconds). Keep Whisper as the primary source to avoid hitting this limit; the cache also prevents duplicate requests.

Cache not working

Check cache statistics:

python cli.py cache-stats

Clean up expired entries:

python cli.py cache-stats --clean
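The cache is described as SQLite-backed with a CACHE_TTL_DAYS expiry. A minimal sketch of how such a cache and an expired-entry cleanup analogous to cache-stats --clean could work, using only the standard-library sqlite3 module; the table layout and the TranscriptCache class are hypothetical, and cache.py may store different columns:

```python
import sqlite3
import time
from typing import Optional


class TranscriptCache:
    """SQLite cache keyed by video ID; entries older than ttl_days count as expired."""

    def __init__(self, path: str = ".transcript_cache.db", ttl_days: float = 7):
        self.ttl_seconds = ttl_days * 86_400
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts ("
            " video_id TEXT PRIMARY KEY, payload TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    def put(self, video_id: str, payload: str, now: Optional[float] = None) -> None:
        created = time.time() if now is None else now
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
                (video_id, payload, created),
            )

    def get(self, video_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT payload, created_at FROM transcripts WHERE video_id = ?", (video_id,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl_seconds:
            return None  # missing or expired: the caller transcribes again
        return row[0]

    def clean(self, now: Optional[float] = None) -> int:
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM transcripts WHERE created_at < ?", (now - self.ttl_seconds,)
            )
        return cur.rowcount
```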

Development

Running tests

pytest

Formatting and linting

black src/
ruff check src/

License

MIT License

YouTube Transcript Fetcher (YTT)