YouTube Transcript Downloader
Server Details
An MCP server that gives any LLM or agent clean YouTube transcripts on demand: a single video, a whole channel, or a playlist, plus AI cleanup of auto-generated captions. API-key auth, credit-based, same backend as the public v1 API. Get an API key with 25 free credits at youtubetranscriptdownload.com/account.
- Status: Healthy
- Transport: Streamable HTTP
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.7/5 across 4 of 4 tools scored.
Each tool has a clearly distinct purpose: single video, channel recent videos, playlist videos, and caption cleaning. Descriptions explicitly guide when to use each, avoiding ambiguity.
All tool names follow a consistent verb_noun pattern in snake_case (get_channel_transcripts, get_playlist_transcripts, get_transcript, polish_transcript). No mixing of styles or vague verbs.
Four tools is well-scoped for a transcript downloader: covering single video, batch from channel, batch from playlist, and a polishing feature. Neither too few nor too many.
The tool set covers the full expected lifecycle: retrieving transcripts from individual videos, channels, and playlists, plus a cleaning function for auto-generated captions. No obvious gaps.
Available Tools
4 tools
get_channel_transcripts (A, Read-only)
Get transcripts for a YouTube channel's most recent videos (newest first) as timestamped markdown, one section per video. Use for research across a creator's recent output; for one known video use get_transcript. Read-only; requires an API key. Charges 1 credit per video that returns a transcript, including repeat calls; videos without captions are skipped free. A 10-video call typically costs up to 10 credits, so start with a small limit. Rate limit: 5 requests per 10 seconds.
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | YouTube channel URL or handle (e.g. https://www.youtube.com/@lexfridman or @lexfridman) | |
| limit | No | Number of most-recent videos to fetch, 1-50 (default 10). Upper bound on the credit charge for this call. | |
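As a rough illustration of how a client might invoke this tool, the sketch below builds the JSON-RPC 2.0 `tools/call` request body that MCP clients send. The helper name `build_channel_call` and the `request_id` value are assumptions made for the example; only the tool name, the parameter names, and the 1-50 range come from the listing. Transport details and API-key headers are left out because the listing does not specify them.

```python
import json


def build_channel_call(url: str, limit: int = 10, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` body for get_channel_transcripts.

    `build_channel_call` and `request_id` are illustrative names, not part
    of the server's API.
    """
    # The listing documents limit as 1-50; reject anything else locally
    # instead of spending a request on it.
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_channel_transcripts",
            "arguments": {"url": url, "limit": limit},
        },
    }


if __name__ == "__main__":
    # Start small: limit is also the upper bound on credits charged.
    print(json.dumps(build_channel_call("@lexfridman", limit=3), indent=2))
```

Validating `limit` on the client keeps an out-of-range value from using up one of the 5 requests allowed per 10 seconds.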
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses read-only nature, credit cost per video, free skipping for videos without captions, and rate limit of 5 requests per 10 seconds. These details go beyond the annotations (readOnlyHint, destructiveHint). No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is informative and well-structured, starting with purpose then adding usage and cost details. It is appropriately sized but could be slightly more concise by removing redundant phrasing.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 2 parameters, no output schema, and annotations covering safety, the description provides all necessary context: purpose, usage, cost, rate limits, and output format. No gaps remain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema already describes both parameters (url and limit) with example and credit bound. The description reiterates these but doesn't add new semantic meaning beyond what the schema provides. Baseline 3 applies due to 100% schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves transcripts for a channel's most recent videos as timestamped markdown, and distinguishes itself from get_transcript (for single video). The verb 'get' and resource 'channel transcripts' are specific.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly advises use for research across recent output, and when to use get_transcript instead. Also recommends starting with a small limit due to credit cost, providing clear when-to-use guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_playlist_transcripts (A, Read-only)
Get transcripts for the videos in a YouTube playlist (in playlist order) as timestamped markdown, one section per video. Use for working through a course, series, or curated list; for one known video use get_transcript. Read-only; requires an API key. Charges 1 credit per video that returns a transcript, including repeat calls; videos without captions are skipped free. A 10-video call typically costs up to 10 credits, so start with a small limit. Rate limit: 5 requests per 10 seconds.
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | YouTube playlist URL (e.g. https://www.youtube.com/playlist?list=PLxxxxxx) | |
| limit | No | Number of videos to fetch from the start of the playlist, 1-50 (default 10). Upper bound on the credit charge for this call. | |
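All four tools state a rate limit of 5 requests per 10 seconds. A client could stay under it with a small guard like the one sketched below. This is a client-side sliding window; whether the server counts with a sliding or a fixed window is not documented, so treat the class (`SlidingWindowLimiter`, a hypothetical name) as a conservative approximation and not a mirror of the server's logic.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `max_calls` requests per `window` seconds."""

    def __init__(self, max_calls=5, window=10.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_calls:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()  # call this right after sending each request
```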
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond annotations (read-only, non-destructive), description discloses credit charging per video, video skipping for no captions, rate limit, and cost estimation.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences efficiently cover purpose, use case, cost, and rate limit without redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, description adequately describes output format and all important behavioral aspects for agent decision-making.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema fully describes parameters; description adds cost context for limit parameter, enhancing semantics.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it gets transcripts for videos in a YouTube playlist in order as timestamped markdown, and distinguishes from get_transcript for single videos.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says to use for courses/series, provides alternative for single videos, and mentions cost/rate limits as usage guidance.
get_transcript (A, Read-only)
Get the full transcript of a single YouTube video as timestamped markdown. Read-only: fetches existing captions, modifies nothing. Requires an API key; each successful call charges 1 credit, including repeat calls for the same video, so reuse a transcript already in context instead of re-fetching. Videos without captions return an error and cost nothing. Rate limit: 5 requests per 10 seconds.
| Name | Required | Description | Default |
|---|---|---|---|
| video | Yes | YouTube video ID (e.g. dQw4w9WgXcQ) or full video URL (youtube.com/watch?v=... or youtu.be/... forms) | |
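Because repeat calls for the same video are charged again, a client may want to cache transcripts under a normalised key. The sketch below reduces the accepted input forms (bare ID, youtube.com/watch?v=..., youtu.be/...) to the bare video ID and caches on it. The names `video_id` and `TranscriptCache`, and the `fetch` callable that stands in for the actual tool call, are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def video_id(video: str) -> str:
    """Reduce a video ID or a watch/youtu.be URL to the bare ID."""
    if "/" not in video and "." not in video:
        return video  # already looks like a bare ID
    # Accept URLs written without a scheme, e.g. "youtu.be/abc".
    parsed = urlparse(video if "://" in video else "https://" + video)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/")
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"unrecognised video reference: {video!r}")


class TranscriptCache:
    """Remember transcripts by video ID so each video is fetched once."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: video ID -> transcript
        self._store = {}

    def get(self, video: str) -> str:
        key = video_id(video)
        if key not in self._store:
            self._store[key] = self._fetch(key)  # the only charged path
        return self._store[key]
```

Keying on the ID means the URL form and the bare-ID form of the same video share one cache entry.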
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses read-only nature, credit cost per call, rate limit (5 requests per 10 seconds), and that failed calls cost nothing. Adds value beyond annotations (readOnlyHint, destructiveHint) by specifying exact constraints and error behavior.
Is the description appropriately sized, front-loaded, and free of redundancy?
Concise at 5 sentences, each adding distinct value. Front-loaded with purpose, then cost, rate limits, and error handling. No redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Covers purpose, parameter format, return format (timestamped markdown), cost, rate limit, and error scenario. Lacks details like language of captions or max transcript length, but adequate for a simple read-only tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% and description reiterates that the parameter accepts video ID or full URL. No additional semantic information provided beyond what schema already states.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it gets the full transcript of a single YouTube video as timestamped markdown. Distinguishes from sibling tools (get_channel_transcripts, get_playlist_transcripts, polish_transcript) by focusing on a single video.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides good guidance on when to use (single video), cost implications (reuse transcript to avoid charges), and error handling (no captions returns error). Lacks explicit exclusions for siblings, but purpose clarity compensates.
polish_transcript (A, Read-only)
Get a cleaned-up transcript of a YouTube video's auto-generated captions: punctuation and capitalisation restored, filler and false starts removed, paragraphs added, misheard names fixed, faithful to what was said. Use when raw captions are too messy to read or quote; for a plain transcript use get_transcript. Read-only; requires an API key. Each call charges credits by transcript length (about 3 per 1,000 words, minimum 5), including repeat calls, so keep the result in context. Human-uploaded captions (already clean) and transcripts over ~7,000 words return an error without charging. Rate limit: 5 requests per 10 seconds.
| Name | Required | Description | Default |
|---|---|---|---|
| video | Yes | YouTube video ID (e.g. dQw4w9WgXcQ) or full video URL (youtube.com/watch?v=... or youtu.be/... forms) | |
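The pricing rule in the description (about 3 credits per 1,000 words, minimum 5, with transcripts over ~7,000 words rejected at no charge) can be turned into a rough pre-call estimate. The function below is only a sketch: the name `estimate_polish_credits` is hypothetical, rounding up is an assumption, and the exact word cutoff is not documented, so the real charge may differ.

```python
import math

RATE_PER_1000_WORDS = 3  # "about 3 per 1,000 words" per the description
MINIMUM_CREDITS = 5      # stated minimum charge
MAX_WORDS = 7000         # "~7,000 words"; the exact cutoff is not documented


def estimate_polish_credits(word_count: int):
    """Rough credit estimate, or None if the call is expected to be rejected."""
    if word_count > MAX_WORDS:
        return None  # described as returning an error without charging
    # Rounding up is an assumption; the listing gives only an approximate rate.
    scaled = math.ceil(word_count * RATE_PER_1000_WORDS / 1000)
    return max(MINIMUM_CREDITS, scaled)


if __name__ == "__main__":
    for words in (500, 2000, 7000, 9000):
        print(words, estimate_polish_credits(words))
```

Under these assumptions the minimum dominates for short transcripts, so a polished short clip costs noticeably more than the 1 credit charged by get_transcript.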
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Description adds significant behavioral context beyond annotations: requires API key, charges credits, error conditions (human-uploaded, over 7000 words), rate limit (5 requests per 10 seconds). Annotations already declare readOnlyHint=true and destructiveHint=false, which the description confirms.
Is the description appropriately sized, front-loaded, and free of redundancy?
Description is information-dense yet well-structured. The first sentence immediately states the core function and improvements. Subsequent sentences efficiently cover usage guidance, costs, errors, and limitations. No wasted words.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (charging, error handling, rate limiting, no output schema), the description covers all essential contextual information: when to use, what happens on success/failure, pricing, and performance limits. The lack of return value description is acceptable since no output schema exists and the purpose is clear.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The single parameter 'video' has full schema description coverage. The description adds critical format clarification: 'YouTube video ID (e.g. dQw4w9WgXcQ) or full video URL' and lists accepted forms. This goes beyond the schema's simple type declaration.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get a cleaned-up transcript' with specific improvements (punctuation, capitalization, filler removal, etc.). It distinguishes itself from the sibling tool get_transcript by specifying it returns a cleaned version versus a plain transcript.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'when raw captions are too messy to read or quote' and when not: 'for a plain transcript use get_transcript'. Also covers prerequisites (API key), constraints (human-uploaded captions return error, length limit ~7000 words), and cost (credit charges).
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
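For servers that serve static files from a directory, the claim file could be generated with a short script such as the one below. This is a sketch under assumptions: `write_glama_json` is a hypothetical helper, and `web_root` stands for whatever directory your server maps to the domain root. The JSON structure itself is the one shown above.

```python
import json
from pathlib import Path


def write_glama_json(web_root: str, email: str) -> Path:
    """Write <web_root>/.well-known/glama.json and return its path."""
    target = Path(web_root) / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    doc = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        # Must match the email associated with your Glama account.
        "maintainers": [{"email": email}],
    }
    target.write_text(json.dumps(doc, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "public" is a placeholder for your own web root.
    print(write_glama_json("public", "your-email@example.com"))
```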
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.