
Video & Audio Editing MCP Server

by misbahsy

add_text_overlay

Insert custom text overlays into videos at specific times and positions, defining font, color, and placement for enhanced video editing and customization.

Instructions

Adds one or more text overlays to a video at specified times and positions.

Args:
- video_path: Path to the input main video file.
- output_video_path: Path to save the video with text overlays.
- text_elements: A list of dictionaries, where each dictionary defines a text overlay.

Required keys for each text_element dict:
- 'text': str - The text to display.
- 'start_time': str or float - Start time (HH:MM:SS, or seconds).
- 'end_time': str or float - End time (HH:MM:SS, or seconds).

Optional keys for each text_element dict:
- 'font_size': int (default: 24)
- 'font_color': str (default: 'white')
- 'x_pos': str or int (default: 'center')
- 'y_pos': str or int (default: 'h-th-10')
- 'box': bool (default: False)
- 'box_color': str (default: 'black@0.5')
- 'box_border_width': int (default: 0)

Returns: A status message indicating success or failure.
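A text_elements payload following the keys above might look like the sketch below; the file contents and values are illustrative, and the validation helper simply mirrors the required-key checks the description implies (it is not part of the server).

```python
# Hypothetical text_elements payload for the add_text_overlay tool.
# Values are illustrative; the validator mirrors the tool's required-key rules.
text_elements = [
    {
        "text": "Hello, World",
        "start_time": "00:00:01",   # HH:MM:SS or seconds
        "end_time": 5.0,
        "font_size": 32,            # optional, default 24
        "font_color": "yellow",     # optional, default 'white'
        "box": True,                # optional, default False
        "box_color": "black@0.5",
    },
]

REQUIRED_KEYS = {"text", "start_time", "end_time"}

def validate_elements(elements: list[dict]) -> list[str]:
    """Return error messages for any element missing a required key."""
    errors = []
    for i, element in enumerate(elements):
        missing = REQUIRED_KEYS - element.keys()
        if missing:
            errors.append(f"element {i} is missing keys: {sorted(missing)}")
    return errors

print(validate_elements(text_elements))  # → []
```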

Input Schema

Name                 Required   Description   Default
output_video_path    Yes        -             -
text_elements        Yes        -             -
video_path           Yes        -             -
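Since the schema table carries no descriptions or defaults, a generated input schema for this tool would plausibly take the following shape. The property names come from the table above; the structure is an assumption about what FastMCP derives from the function signature, not the server's actual output.

```python
# Plausible shape of the generated JSON Schema for this tool.
# Property names come from the schema table; the rest is assumed.
input_schema = {
    "type": "object",
    "properties": {
        "video_path": {"type": "string"},
        "output_video_path": {"type": "string"},
        "text_elements": {
            "type": "array",
            "items": {"type": "object"},
        },
    },
    "required": ["video_path", "output_video_path", "text_elements"],
}

# Every property is required, matching the table above.
assert set(input_schema["required"]) == set(input_schema["properties"])
```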

Implementation Reference

  • The core handler for the 'add_text_overlay' MCP tool. It iterates over the list of text elements, constructs an FFmpeg drawtext filter for each one (handling text escaping, positioning, timing, styling, and optional background boxes), joins them into a single video filter chain, and writes the output, first attempting to copy the audio stream and falling back to a full re-encode if that fails.
    import os      # input-path existence checks
    import ffmpeg  # ffmpeg-python bindings

    def add_text_overlay(video_path: str, output_video_path: str, text_elements: list[dict]) -> str:
        """Adds one or more text overlays to a video at specified times and positions.
    
        Args:
            video_path: Path to the input main video file.
            output_video_path: Path to save the video with text overlays.
            text_elements: A list of dictionaries, where each dictionary defines a text overlay.
                Required keys for each text_element dict:
                - 'text': str - The text to display.
                - 'start_time': str or float - Start time (HH:MM:SS, or seconds).
                - 'end_time': str or float - End time (HH:MM:SS, or seconds).
                Optional keys for each text_element dict:
                - 'font_size': int (default: 24)
                - 'font_color': str (default: 'white')
                - 'x_pos': str or int (default: 'center')
                - 'y_pos': str or int (default: 'h-th-10')
                - 'box': bool (default: False)
                - 'box_color': str (default: 'black@0.5')
                - 'box_border_width': int (default: 0)
        Returns:
            A status message indicating success or failure.
        """
        try:
            if not os.path.exists(video_path):
                return f"Error: Input video file not found at {video_path}"
            if not text_elements:
                return "Error: No text elements provided for overlay."
    
            input_stream = ffmpeg.input(video_path)
            drawtext_filters = []
    
            for element in text_elements:
                text = element.get('text')
                start_time = element.get('start_time')
                end_time = element.get('end_time')
    
                if text is None or start_time is None or end_time is None:
                    return "Error: Text element is missing required keys (text, start_time, end_time)."
                
                # Thoroughly escape special characters in text
                # Escape single quotes, colons, commas, backslashes, and any other special chars
                safe_text = text.replace('\\', '\\\\').replace("'", "\\'").replace(':', '\\:').replace(',', '\\,')
                
                # Build filter parameters
                filter_params = [
                    f"text='{safe_text}'",
                    f"fontsize={element.get('font_size', 24)}",
                    f"fontcolor={element.get('font_color', 'white')}",
                    f"x={element.get('x_pos', '(w-text_w)/2')}",
                    f"y={element.get('y_pos', 'h-text_h-10')}",
                    f"enable=between(t\\,{start_time}\\,{end_time})"
                ]
    
                # Add box parameters if box is enabled
                if element.get('box', False):
                    filter_params.append("box=1")
                    filter_params.append(f"boxcolor={element.get('box_color', 'black@0.5')}")
                    if 'box_border_width' in element:
                        filter_params.append(f"boxborderw={element['box_border_width']}")
    
                # Add font file if specified
                if 'font_file' in element:
                    font_path = element['font_file'].replace('\\', '\\\\').replace("'", "\\'").replace(':', '\\:')
                    filter_params.append(f"fontfile='{font_path}'")
    
                # Join all parameters with colons
                drawtext_filter = f"drawtext={':'.join(filter_params)}"
                drawtext_filters.append(drawtext_filter)
    
            # Join all drawtext filters with commas
            final_vf_filter = ','.join(drawtext_filters)
    
            try:
                # First attempt: try to copy audio codec
                stream = input_stream.output(output_video_path, vf=final_vf_filter, acodec='copy')
                stream.run(capture_stdout=True, capture_stderr=True)
                return f"Text overlays added successfully (audio copied) to {output_video_path}"
            except ffmpeg.Error as e_acopy:
                try:
                    # Second attempt: re-encode audio if copying fails
                    stream_recode = input_stream.output(output_video_path, vf=final_vf_filter)
                    stream_recode.run(capture_stdout=True, capture_stderr=True)
                    return f"Text overlays added successfully (audio re-encoded) to {output_video_path}"
                except ffmpeg.Error as e_recode_all:
                    err_acopy_msg = e_acopy.stderr.decode('utf8') if e_acopy.stderr else str(e_acopy)
                    err_recode_msg = e_recode_all.stderr.decode('utf8') if e_recode_all.stderr else str(e_recode_all)
                    return f"Error adding text overlays. Audio copy attempt: {err_acopy_msg}. Full re-encode attempt: {err_recode_msg}"
    
        except ffmpeg.Error as e:
            error_message = e.stderr.decode('utf8') if e.stderr else str(e)
            return f"Error processing text overlays: {error_message}"
        except FileNotFoundError:
            return "Error: Input video file not found."
        except Exception as e:
            return f"An unexpected error occurred: {str(e)}"
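The escaping and parameter assembly in the handler can be exercised in isolation. The sketch below reproduces that logic (simplified: box and font_file handling omitted) so the resulting drawtext filter string can be inspected without invoking FFmpeg.

```python
# Standalone sketch of the handler's filter construction, reproducing the
# escaping and parameter joining so the output can be inspected directly.
def build_drawtext(element: dict) -> str:
    # Escape backslashes, quotes, colons, and commas, as the handler does.
    safe_text = (
        element["text"]
        .replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace(":", "\\:")
        .replace(",", "\\,")
    )
    params = [
        f"text='{safe_text}'",
        f"fontsize={element.get('font_size', 24)}",
        f"fontcolor={element.get('font_color', 'white')}",
        f"x={element.get('x_pos', '(w-text_w)/2')}",
        f"y={element.get('y_pos', 'h-text_h-10')}",
        f"enable=between(t\\,{element['start_time']}\\,{element['end_time']})",
    ]
    return "drawtext=" + ":".join(params)

f = build_drawtext({"text": "Sale: 50%, today", "start_time": 0, "end_time": 5})
print(f)
# drawtext=text='Sale\: 50%\, today':fontsize=24:fontcolor=white:x=(w-text_w)/2:y=h-text_h-10:enable=between(t\,0\,5)
```

Note that colons and commas must be escaped because drawtext uses colons to separate its own parameters and the filter chain uses commas to separate filters.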
  • The docstring defines the input schema (parameters and their types/descriptions) and return type for the MCP tool, used by FastMCP for tool schema generation.
    """Adds one or more text overlays to a video at specified times and positions.
    
    Args:
        video_path: Path to the input main video file.
        output_video_path: Path to save the video with text overlays.
        text_elements: A list of dictionaries, where each dictionary defines a text overlay.
            Required keys for each text_element dict:
            - 'text': str - The text to display.
            - 'start_time': str or float - Start time (HH:MM:SS, or seconds).
            - 'end_time': str or float - End time (HH:MM:SS, or seconds).
            Optional keys for each text_element dict:
            - 'font_size': int (default: 24)
            - 'font_color': str (default: 'white')
            - 'x_pos': str or int (default: 'center')
            - 'y_pos': str or int (default: 'h-th-10')
            - 'box': bool (default: False)
            - 'box_color': str (default: 'black@0.5')
            - 'box_border_width': int (default: 0)
    Returns:
        A status message indicating success or failure.
    """
  • server.py:569 (registration)
    The @mcp.tool() decorator registers this function as an MCP tool named 'add_text_overlay', making it available via the FastMCP server.
    def add_text_overlay(video_path: str, output_video_path: str, text_elements: list[dict]) -> str:
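Conceptually, @mcp.tool() records the function in the server's tool registry under its own name. The following minimal stand-in uses a plain dict registry to illustrate that pattern; it is not the FastMCP implementation, which also derives the tool's input schema from the signature and docstring.

```python
# Minimal stand-in for MCP tool registration: a decorator that records the
# function in a dict registry under its name, as @mcp.tool() does conceptually.
TOOLS = {}

def tool():
    def decorator(fn):
        TOOLS[fn.__name__] = fn  # register under the function's own name
        return fn
    return decorator

@tool()
def add_text_overlay(video_path: str, output_video_path: str,
                     text_elements: list[dict]) -> str:
    # Stub body for illustration only.
    return f"would overlay {len(text_elements)} element(s) on {video_path}"

print("add_text_overlay" in TOOLS)  # → True
```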
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool modifies a video file (implied by 'adds' and output path), describes the return value ('status message indicating success or failure'), and details parameter defaults and optional keys. However, it lacks information on permissions, rate limits, or error handling specifics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns) and uses bullet points for readability. It is appropriately sized but could be slightly more front-loaded; the first sentence states the purpose, but the detailed parameter info follows immediately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description provides good context: it explains the tool's purpose, parameters in detail, and return value. However, it lacks information on behavioral aspects like error conditions or performance implications, which would be helpful for a video processing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It provides comprehensive details for all 3 parameters, including required/optional keys for 'text_elements', data types, defaults, and examples (e.g., 'HH:MM:SS, or seconds'). This adds significant meaning beyond the minimal schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('adds') and resource ('text overlays to a video'), including scope ('at specified times and positions'). It distinguishes from sibling tools like 'add_image_overlay' and 'add_subtitles' by specifying text overlays rather than images or subtitles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context through the parameter details (e.g., specifying times and positions for text overlays), but does not explicitly state when to use this tool versus alternatives like 'add_image_overlay' or 'add_subtitles'. No explicit exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
