Kaltura MCP Server

get_caption_content

Extract video caption text or download subtitle files to read transcripts, analyze spoken content, and create accessible media.

Instructions

Get actual CAPTION TEXT or download captions file. USE WHEN: Reading video transcript, downloading subtitles, analyzing spoken content, creating accessible content. RETURNS: Full caption text and download URL. EXAMPLE: 'Get English subtitles for video', 'Read transcript to find mentions of topic'. Use after list_caption_assets to get specific caption ID.
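
The "find mentions of topic" use case comes down to a text search over the returned caption file. The following is a minimal sketch, assuming the caption asset is stored in SRT format (the tool returns whatever format the asset uses). The sample text and the helper names `parse_srt` and `find_mentions` are hypothetical and not part of the server.

```python
import re

# Hypothetical SRT sample; real text would come from the tool's "captionText" field.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome to the product demo.

2
00:00:04,500 --> 00:00:08,000
Today we cover the new pricing model.
"""

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text):
    """Return a list of (start_time, cue_text) tuples from SRT text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the spoken text.
                cues.append((match.group(1), " ".join(lines[i + 1:]).strip()))
                break
    return cues


def find_mentions(text, topic):
    """Return the cues whose text contains `topic`, case-insensitively."""
    return [cue for cue in parse_srt(text) if topic.lower() in cue[1].lower()]


mentions = find_mentions(SAMPLE_SRT, "pricing")
```

Other caption formats (for example WebVTT or DFXP) would need a different parser; the search step stays the same.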

Input Schema

Name	Required	Description	Default
`caption_asset_id`	Yes	Caption ID from list_caption_assets (format: '1_xyz789')	(none)
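
Because the tool takes exactly one required string parameter, a caller can check the arguments before sending them. The following is a rough sketch of such a check; `validate_arguments` is a hypothetical helper, and the rejection of empty strings is an assumption that goes beyond what the schema states.

```python
def validate_arguments(arguments):
    """Check a tool-call arguments dict against the single-parameter schema.

    Returns a list of problems; an empty list means the arguments look valid.
    """
    problems = []
    if "caption_asset_id" not in arguments:
        problems.append("caption_asset_id is required")
    elif not isinstance(arguments["caption_asset_id"], str):
        problems.append("caption_asset_id must be a string")
    elif not arguments["caption_asset_id"].strip():
        # Assumption: the schema does not forbid empty strings, but an empty
        # ID cannot refer to a caption asset.
        problems.append("caption_asset_id must not be empty")

    # Flag keys the schema does not define.
    for key in arguments:
        if key != "caption_asset_id":
            problems.append(f"unexpected argument: {key}")
    return problems


print(validate_arguments({"caption_asset_id": "1_xyz789"}))
```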

Implementation Reference

src/kaltura_mcp/tools/assets.py:97-189 (handler)

The core handler function. It retrieves the caption asset's details from the Kaltura API, then downloads the caption text over HTTP.

async def get_caption_content(
    manager: KalturaClientManager,
    caption_asset_id: str,
) -> str:
    """Get the actual text content of a caption asset."""

    if not CAPTION_AVAILABLE:
        return json.dumps(
            {
                "error": "Caption functionality is not available. The Caption plugin is not installed.",
                "captionAssetId": caption_asset_id,
            },
            indent=2,
        )

    client = manager.get_client()

    try:
        # Get caption asset details
        caption_asset = client.caption.captionAsset.get(caption_asset_id)

        # Get the caption content URL
        content_url = client.caption.captionAsset.getUrl(caption_asset_id)

        # Validate URL before making request
        if not content_url or not isinstance(content_url, str):
            download_error = "Invalid or missing caption URL"
            caption_text = None
        elif not content_url.startswith(("http://", "https://")):
            download_error = "Caption URL must use HTTP or HTTPS protocol"
            caption_text = None
        else:
            # Download the actual caption content
            caption_text = None
            download_error = None

            try:
                # Create a session for downloading
                session = requests.Session()

                # Set headers similar to reference implementation
                headers = {
                    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
                }

                # Download the caption content with timeout
                response = session.get(content_url, headers=headers, timeout=30)
                response.raise_for_status()

                # Get the text content
                caption_text = response.text

            except requests.exceptions.RequestException as e:
                download_error = f"Failed to download caption content: {str(e)}"
            except Exception as e:
                download_error = f"Error processing caption content: {str(e)}"

        result = {
            "captionAssetId": caption_asset_id,
            "entryId": caption_asset.entryId,
            "language": caption_asset.language.value
            if hasattr(caption_asset.language, "value")
            else str(caption_asset.language),
            "label": caption_asset.label,
            "format": caption_asset.format.value
            if hasattr(caption_asset.format, "value")
            else str(caption_asset.format),
            "contentUrl": content_url,
            "size": caption_asset.size,
            "accuracy": caption_asset.accuracy,
        }

        if caption_text is not None:
            result["captionText"] = caption_text
            result["textLength"] = len(caption_text)
            result["note"] = "Caption text content has been successfully downloaded and included."
        else:
            result["downloadError"] = download_error
            result[
                "note"
            ] = "Caption asset details retrieved but text content could not be downloaded. Use contentUrl for manual download."

        return json.dumps(result, indent=2)

    except Exception as e:
        return json.dumps(
            {
                "error": f"Failed to get caption content: {str(e)}",
                "captionAssetId": caption_asset_id,
            },
            indent=2,
        )
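
The handler above always returns a JSON string, and that string takes one of three shapes: a top-level `error`, full metadata with `captionText`, or metadata with a `downloadError` and a `contentUrl` for manual download. A caller might tell them apart roughly as follows; `summarize_result` and its outcome labels are hypothetical.

```python
import json


def summarize_result(raw):
    """Classify the JSON string returned by the handler into one of three outcomes."""
    data = json.loads(raw)
    if "error" in data:
        # Plugin unavailable, or the Kaltura API call itself failed.
        return ("failed", data["error"])
    if "captionText" in data:
        # Metadata and caption text were both retrieved.
        return ("ok", data["captionText"])
    # Metadata was retrieved but the download failed; contentUrl may still be usable.
    return ("metadata_only", data.get("downloadError"))


example = json.dumps({"captionAssetId": "1_xyz789", "captionText": "Hello", "textLength": 5})
print(summarize_result(example))
```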

src/kaltura_mcp/server.py:448-461 (schema)

The input JSON schema and tool description defined in the list_tools() function for MCP tool registration.

types.Tool(
    name="get_caption_content",
    description="Get actual CAPTION TEXT or download captions file. USE WHEN: Reading video transcript, downloading subtitles, analyzing spoken content, creating accessible content. RETURNS: Full caption text and download URL. EXAMPLE: 'Get English subtitles for video', 'Read transcript to find mentions of topic'. Use after list_caption_assets to get specific caption ID.",
    inputSchema={
        "type": "object",
        "properties": {
            "caption_asset_id": {
                "type": "string",
                "description": "Caption ID from list_caption_assets (format: '1_xyz789')",
            },
        },
        "required": ["caption_asset_id"],
    },
),
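
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request whose `arguments` object must satisfy the schema above. A minimal sketch of building that request follows; `build_tool_call` is a hypothetical helper and the ID value is the placeholder from the schema description.

```python
import json


def build_tool_call(caption_asset_id, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for get_caption_content."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_caption_content",
            "arguments": {"caption_asset_id": caption_asset_id},
        },
    }


request = build_tool_call("1_xyz789")
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this message; the sketch only shows the payload shape.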

src/kaltura_mcp/server.py:523-524 (registration)
Dispatch/execution point in the call_tool handler where get_caption_content is invoked based on tool name.
```
elif name == "get_caption_content":
    result = await get_caption_content(kaltura_manager, **arguments)
```
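
Because the arguments are unpacked with `**arguments`, a missing or unexpected key raises a `TypeError` at the call site. The sketch below shows one way a dispatcher could surface that as a JSON error. It uses a lookup table in place of the server's `elif` chain, and the stub handler, `HANDLERS`, and the error messages are all hypothetical rather than the server's actual code.

```python
import asyncio
import json


async def get_caption_content(manager, caption_asset_id):
    """Stub standing in for the real handler; it only echoes the ID."""
    return json.dumps({"captionAssetId": caption_asset_id})


# Hypothetical table-based alternative to an elif chain.
HANDLERS = {"get_caption_content": get_caption_content}


async def call_tool(name, arguments, manager=None):
    """Look up the handler by tool name and invoke it with the given arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        return await handler(manager, **arguments)
    except TypeError as exc:
        # Raised when `arguments` has missing or unexpected keys. Note that this
        # also catches a TypeError raised inside the handler body.
        return json.dumps({"error": f"Invalid arguments for {name}: {exc}"})


result = asyncio.run(call_tool("get_caption_content", {"caption_asset_id": "1_xyz789"}))
print(result)
```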
src/kaltura_mcp/server.py:36-36 (registration)
Import statement in server.py that brings in the get_caption_content function from the tools package.
```
get_caption_content,
```
src/kaltura_mcp/tools/__init__.py:15-15 (helper)
Re-export of get_caption_content from assets.py in the tools package __init__.
```
get_caption_content,
```

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by specifying what the tool returns ('Full caption text and download URL') and providing example use cases. However, it doesn't mention potential limitations like file format, size constraints, or authentication requirements that would be helpful for a download operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with clear sections (purpose, usage guidelines, returns, examples, prerequisites). Every sentence adds value with no redundant information. The front-loaded purpose statement immediately communicates the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no output schema, the description provides good context about what the tool does, when to use it, and what it returns. It could be more complete by specifying output format details or potential error conditions, but given the tool's relative simplicity and the absence of annotations, it covers most essential information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the single parameter. The description adds some context by noting that the ID comes from 'list_caption_assets', but it adds little semantic meaning beyond what the schema provides. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get actual CAPTION TEXT or download captions file') and identifies the resource (caption content). It distinguishes the tool from its siblings by specifying that it retrieves caption content rather than analytics, attachments, or other media assets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance with a 'USE WHEN:' section listing specific scenarios (reading transcripts, downloading subtitles, analyzing content, creating accessible content). It also mentions when to use it in relation to a sibling tool ('Use after list_caption_assets to get specific caption ID'), giving clear context and prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zoharbabin/kaltura-mcp'
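
The same request can be made from Python using only the standard library. This is a sketch that assumes the endpoint returns a JSON body; the response fields are not documented here, so none are assumed. `server_url` and `fetch_server` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner, name, timeout=30):
    """GET the server record and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access; fetch_server performs the request.
print(server_url("zoharbabin", "kaltura-mcp"))
```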

If you have feedback or need assistance with the MCP directory API, please join our Discord server.