All Voice Lab MCP Server

Official
by allvoicelab

subtitle_extraction

Extract hardcoded subtitles from video files using OCR technology. Supports MP4 and MOV formats up to 2GB, with automatic or specified language detection.

Instructions

[AllVoiceLab Tool] Extract subtitles from a video using OCR technology.

This tool processes a video file to extract hardcoded subtitles. The process runs asynchronously with status polling
and returns the extracted subtitles when complete.

Args:
    video_file_path (str): Path to the video file (MP4, MOV). Max size 2GB.
    language_code (str, optional): Language code for subtitle text detection (e.g., 'en', 'zh'). Defaults to 'auto'.
    name (str, optional): Optional project name for identification.
    output_dir (str, optional): Output directory for the downloaded result file. It has a default value.

Returns:
    TextContent containing the file path to the srt file or error message.
    If the process takes longer than expected, returns the project ID for later status checking. 

Note:
    - Supported video formats: MP4, MOV
    - Video duration: 10 seconds to 200 minutes; file size: max 2GB.
    - If the process takes longer than max_polling_time, use 'get_extraction_info' to check status and retrieve results.
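
The format and size limits in the note above can be checked locally before the tool is called. The following is a minimal sketch, not part of the AllVoiceLab server: `preflight_check` and its reason strings are hypothetical names chosen for illustration. The 10 seconds to 200 minutes duration window is not checked, because reading a video's duration needs a media library.

```python
import os

# Limits copied from the tool description above.
SUPPORTED_EXTENSIONS = {".mp4", ".mov"}
MAX_SIZE_BYTES = 2 * 1024 * 1024 * 1024  # 2GB


def preflight_check(video_file_path, max_size_bytes=MAX_SIZE_BYTES):
    """Return None if the file looks acceptable, else a short reason string.

    Hypothetical helper: it is not part of the AllVoiceLab server.
    """
    if not video_file_path:
        return "empty path"
    if not os.path.isfile(video_file_path):
        return "file not found"
    # Compare the extension case-insensitively, as the tool handler does.
    extension = os.path.splitext(video_file_path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported format"
    if os.path.getsize(video_file_path) > max_size_bytes:
        return "file too large"
    return None
```

A caller might run this first and skip the tool call when a reason string comes back, which avoids a round trip for inputs the server would reject anyway.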

Input Schema

Name             Required  Description  Default
video_file_path  Yes
language_code    No                     auto
name             No
output_dir       No
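
To show how the four parameters fit into a request, here is a hedged sketch that builds an MCP `tools/call` message for this tool. The `build_tool_call` helper and the `/videos/demo.mp4` path are made up for illustration; only the tool name and the parameter names come from the schema above.

```python
import json


def build_tool_call(video_file_path, language_code="auto", name=None,
                    output_dir=None, request_id=1):
    """Build an MCP `tools/call` request for subtitle_extraction as a dict."""
    arguments = {
        "video_file_path": video_file_path,
        "language_code": language_code,
    }
    # Optional parameters are left out when unset so the server applies its defaults.
    if name is not None:
        arguments["name"] = name
    if output_dir is not None:
        arguments["output_dir"] = output_dir
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "subtitle_extraction", "arguments": arguments},
    }


# Example path is hypothetical.
print(json.dumps(build_tool_call("/videos/demo.mp4", language_code="en"), indent=2))
```

In practice an MCP client library usually assembles this envelope itself, so the dict is mainly useful for seeing which arguments are required and which can be omitted.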

Implementation Reference

  • The MCP tool handler that implements subtitle_extraction. It validates the video file (non-empty path, existence, MP4/MOV extension, size at most 2GB), submits the extraction task through the AllVoiceLab client, polls the status every 10 seconds for up to 10 minutes, downloads the SRT file on success, and returns a TextContent holding the file path or a status message.
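    import logging
    import os
    import random
    import time
    from typing import Optional

    import requests
    from mcp.types import TextContent  # assumed to come from the MCP Python SDK

    # get_client is the server's own helper that returns the AllVoiceLab client;
    # its import path is not shown in this listing.

    def subtitle_extraction_tool(
        video_file_path: str,
        language_code: str = "auto",
        name: Optional[str] = None,
        output_dir: Optional[str] = None
    ) -> TextContent: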
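        all_voice_lab = get_client()
        # Guard before first use: calling get_output_path on a missing client would raise AttributeError.
        if all_voice_lab is None:
            logging.error("all_voice_lab client is not initialized.")
            return TextContent(type="text",
                               text="Error: AllVoiceLab client not initialized. Please check server setup.")
        output_dir = all_voice_lab.get_output_path(output_dir)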
        max_polling_time = 600
        polling_interval = 10
        logging.info(
            f"Tool called: subtitle_extraction, video_file_path: {video_file_path}, language_code: {language_code}, name: {name}")
        logging.info(f"Max polling time: {max_polling_time}s, Polling interval: {polling_interval}s")

        # Reject an empty path.
        if not video_file_path:
            logging.warning("Video file path parameter is empty")
            return TextContent(
                type="text",
                text="video_file_path parameter cannot be empty"
            )

        # Reject a path that does not exist.
        if not os.path.exists(video_file_path):
            logging.warning(f"Video file does not exist: {video_file_path}")
            return TextContent(
                type="text",
                text=f"Video file does not exist: {video_file_path}"
            )

        # Accept only .mp4 and .mov, compared case-insensitively.
        _, file_extension = os.path.splitext(video_file_path)
        file_extension = file_extension.lower()
        if file_extension not in [".mp4", ".mov"]:
            logging.warning(f"Unsupported video file format: {file_extension}")
            return TextContent(
                type="text",
                text="Unsupported video file format. Only MP4 and MOV formats are supported."
            )

        # Enforce the 2GB size limit.
        max_size_bytes = 2 * 1024 * 1024 * 1024  # 2GB in bytes
        file_size = os.path.getsize(video_file_path)
        if file_size > max_size_bytes:
            logging.warning(f"Video file size exceeds limit: {file_size} bytes, max allowed: {max_size_bytes} bytes")
            return TextContent(
                type="text",
                text="Video file size exceeds the maximum limit of 2GB. Please use a smaller file."
            )

        try:
            if all_voice_lab is None:
                logging.error("all_voice_lab client is not initialized.")
                return TextContent(type="text",
                                   text="Error: AllVoiceLab client not initialized. Please check server setup.")
    
            
            # Submit the task; the client returns a project ID used for polling.
            logging.info("Starting subtitle extraction process")
            project_id = all_voice_lab.subtitle_extraction(
                video_file_path=video_file_path,
                language_code=language_code,
                name=name
            )
            logging.info(f"Subtitle extraction task submitted. Project ID: {project_id}")

            # Poll until the task reaches a final state or max_polling_time elapses.
            logging.info(f"Starting to poll extraction status for Project ID: {project_id}")
            start_time = time.time()
            completed = False

            while time.time() - start_time < max_polling_time:
                try:
                    
                    extraction_info = all_voice_lab.get_extraction_info(project_id)
                    logging.info(f"Extraction status: {extraction_info.status} for Project ID: {project_id}")
    
                    
                    if extraction_info.status.lower() == "success":
                        logging.info(f"Subtitle extraction completed for Project ID: {project_id}")
                        completed = True
    
                        
                        if hasattr(extraction_info, 'result') and extraction_info.result:
                            result_url = extraction_info.result
                            logging.info(f"Downloading subtitle file from: {result_url}")

                            try:
                                # Stream the result so the whole file is not held in memory.
                                headers = all_voice_lab._get_headers(content_type="", accept="*/*")
                                response = requests.get(result_url, headers=headers, stream=True)
                                response.raise_for_status()

                                # Get original filename without extension
                                original_filename = os.path.splitext(os.path.basename(video_file_path))[0]
                                timestamp = int(time.time())
                                random_suffix = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=6))
                                filename = f"{original_filename}_subtitle_{timestamp}_{random_suffix}.srt"
    
                                
                                os.makedirs(output_dir, exist_ok=True)
                                file_path = os.path.join(output_dir, filename)
    
                                
                                with open(file_path, 'wb') as f:
                                    for chunk in response.iter_content(chunk_size=8192):
                                        if chunk:
                                            f.write(chunk)
    
                                logging.info(f"Subtitle file saved to: {file_path}")
    
                                
                                info_parts = []
                                info_parts.append(f"Subtitle extraction completed successfully.")
                                info_parts.append(f"Project ID: {project_id}")
                                info_parts.append(f"Subtitle file saved to: {file_path}")
    
                                return TextContent(
                                    type="text",
                                    text="\n".join(info_parts)
                                )
    
                            except Exception as e:
                                logging.error(f"Failed to download subtitle file: {str(e)}")
                                
                                info_parts = []
                                info_parts.append(f"Subtitle extraction completed successfully.")
                                info_parts.append(f"Project ID: {project_id}")
                                info_parts.append(f"Result URL: {extraction_info.result}")
                                info_parts.append(f"Failed to download subtitle file: {str(e)}")
    
                                return TextContent(
                                    type="text",
                                    text="\n".join(info_parts)
                                )
                        else:
                            info_parts = []
                            info_parts.append(f"Subtitle extraction completed successfully.")
                            info_parts.append(f"Project ID: {project_id}")
                            info_parts.append("No subtitle file URL available.")
    
                            return TextContent(
                                type="text",
                                text="\n".join(info_parts)
                            )
    
                    elif extraction_info.status.lower() in ["failed", "error"]:
                        logging.error(f"Subtitle extraction failed for Project ID: {project_id}")
                        error_message = "Subtitle extraction failed."
                        if hasattr(extraction_info, 'message') and extraction_info.message:
                            error_message += f" Message: {extraction_info.message}"
                        return TextContent(
                            type="text",
                            text=f"{error_message}\nProject ID: {project_id}"
                        )
    
                    logging.info(f"Waiting {polling_interval} seconds before next poll")
                    time.sleep(polling_interval)
    
                except Exception as e:
                    logging.error(f"Error while polling extraction status: {str(e)}")
                    time.sleep(polling_interval)
    
    
            if not completed:
                logging.warning(f"Polling timed out after {max_polling_time} seconds for Project ID: {project_id}")
                return TextContent(
                    type="text",
                    text=f"Subtitle extraction is still in progress. Please check the status later using the 'get_extraction_info' tool.\n"
                         f"Project ID: {project_id}"
                )
    
        except FileNotFoundError as e:
            logging.error(f"Error in subtitle_extraction_tool: {str(e)}")
            return TextContent(type="text", text=str(e))
        except Exception as e:
            logging.error(f"Failed to extract subtitles: {str(e)}")
            error_message = f"Failed to extract subtitles. Error: {str(e)}"
            if hasattr(e, 'response') and e.response is not None:
                try:
                    error_detail = e.response.json()
                    error_message = f"Failed to extract subtitles: API Error - {error_detail.get('message', str(e))}"
                except ValueError:  # Not a JSON response
                    error_message = f"Failed to extract subtitles: API Error - {e.response.status_code} {e.response.text}"
            return TextContent(
                type="text",
                text=error_message
            )
  • MCP server registration of the subtitle_extraction tool, including name, detailed description with args and returns.
    mcp.tool(
        name="subtitle_extraction",
        description="""[AllVoiceLab Tool] Extract subtitles from a video using OCR technology.
    
        This tool processes a video file to extract hardcoded subtitles. The process runs asynchronously with status polling
        and returns the extracted subtitles when complete.
    
        Args:
            video_file_path (str): Path to the video file (MP4, MOV). Max size 2GB.
            language_code (str, optional): Language code for subtitle text detection (e.g., 'en', 'zh'). Defaults to 'auto'.
            name (str, optional): Optional project name for identification.
            output_dir (str, optional): Output directory for the downloaded result file. It has a default value.
    
        Returns:
            TextContent containing the file path to the srt file or error message.
            If the process takes longer than expected, returns the project ID for later status checking. 
    
        Note:
            - Supported video formats: MP4, MOV
            - Video duration: 10 seconds to 200 minutes; file size: max 2GB.
            - If the process takes longer than max_polling_time, use 'get_extraction_info' to check status and retrieve results.
        """
    )(subtitle_extraction_tool)
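
The polling loop in the handler above can be isolated into a small reusable function. This is a sketch under stated assumptions rather than the server's code: `poll_until_done` and `fetch_status` are hypothetical names, the terminal states are the ones the handler checks for ("success", "failed", "error"), and it uses `time.monotonic` instead of the handler's `time.time` so that wall-clock adjustments cannot distort the timeout. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def poll_until_done(fetch_status, max_polling_time=600, polling_interval=10,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state or time runs out.

    Returns the terminal status in lower case, or None on timeout.
    """
    start = clock()
    while clock() - start < max_polling_time:
        try:
            status = fetch_status().lower()
        except Exception:
            # Like the handler, treat a failed poll as transient and retry.
            status = None
        if status in ("success", "failed", "error"):
            return status
        sleep(polling_interval)
    return None
```

A `None` return corresponds to the handler's "still in progress" reply, at which point the caller would fall back to `get_extraction_info` with the project ID.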
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does so well. It discloses key behavioral traits: asynchronous processing with status polling, file format/size limits (MP4/MOV, 2GB, 10s-200min), and fallback behavior for long processes (returns project ID). It does not mention rate limits or auth needs, but covers critical operational details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core purpose. Every sentence adds value, such as detailing async processing, parameter semantics, and notes. It could be slightly more concise by integrating the 'Note' section into the main flow, but overall it's well-structured with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (async processing, 4 parameters, no annotations, no output schema), the description is nearly complete. It covers purpose, usage, parameters, behavioral traits, and output handling (returns file path, project ID, or error). It lacks explicit error cases or detailed output format, but provides sufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
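
Since the description does not spell out the output format, a caller has to read it off the message strings in the implementation reference. The sketch below parses those strings into a small dict. `parse_extraction_result` and its keys are hypothetical, and matching on message text is brittle: it would break if the server's wording changed.

```python
def parse_extraction_result(text):
    """Split the handler's text reply into a small dict.

    Hypothetical helper; the keys are this sketch's own naming.
    """
    result = {"state": "error", "project_id": None, "srt_path": None}
    # The prefixes below are the literal messages returned by the handler.
    if text.startswith("Subtitle extraction completed successfully."):
        result["state"] = "success"
    elif text.startswith("Subtitle extraction is still in progress."):
        result["state"] = "in_progress"
    elif text.startswith("Subtitle extraction failed."):
        result["state"] = "failed"
    for line in text.splitlines():
        if line.startswith("Project ID: "):
            result["project_id"] = line[len("Project ID: "):]
        elif line.startswith("Subtitle file saved to: "):
            result["srt_path"] = line[len("Subtitle file saved to: "):]
    return result
```

Note that a "success" state can still come with `srt_path` set to `None`: the handler reports success when extraction finished but the download failed or no result URL was available.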

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate, and it does excellently. It adds meaning for all parameters: explains 'video_file_path' constraints (MP4/MOV, max 2GB), 'language_code' usage (e.g., 'en', 'zh', default 'auto'), 'name' purpose (project identification), and 'output_dir' behavior (default value, for downloaded result). This goes far beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Extract subtitles from a video using OCR technology.' It specifies the verb ('extract'), resource ('subtitles from a video'), and method ('OCR technology'), distinguishing it from sibling tools like 'remove_subtitle' or 'video_translation_dubbing' that handle different tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (for extracting hardcoded subtitles from MP4/MOV videos) and mentions an alternative for status checking ('get_extraction_info'), but does not explicitly state when not to use it or compare it to all sibling tools like 'video_translation_dubbing' which might also involve subtitles.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/allvoicelab/AllVoiceLab-MCP'
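
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and decoding the body as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, repo):
    """Build the directory URL for one MCP server."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_server_info(owner, repo, timeout=10):
    """GET the server record and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(server_url(owner, repo), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info("allvoicelab", "AllVoiceLab-MCP")` requests the same URL as the curl command above.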

If you have feedback or need assistance with the MCP directory API, please join our Discord server.