
ElevenLabs MCP Server

Official
by elevenlabs

speech_to_text

Transcribe audio files to text with optional speaker identification and flexible output options for accessible content creation.

Instructions

Transcribe speech from an audio file. When save_transcript_to_file=True, the transcript is saved to the output directory (default: $HOME/Desktop). When return_transcript_to_client_directly=True, the text is always returned directly, regardless of output mode.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    input_file_path: Path to the audio file to transcribe.
    language_code: ISO 639-3 language code for transcription. If not provided, the language is detected automatically.
    diarize: Whether to diarize the audio file. If True, the transcription annotates which speaker is speaking at each point.
    save_transcript_to_file: Whether to save the transcript to a file.
    return_transcript_to_client_directly: Whether to return the transcript to the client directly.
    output_directory: Directory where files should be saved (only used when saving files).
        Defaults to $HOME/Desktop if not provided.

Returns:
    TextContent containing the transcription or MCP resource with transcript data.
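As a concrete illustration, a client might call the tool with arguments like these. The file path here is hypothetical; omitted fields keep their defaults.

```python
# Illustrative tool-call arguments for speech_to_text.
# The file path is hypothetical; omitted fields keep their defaults.
arguments = {
    "input_file_path": "/Users/me/Desktop/interview.mp3",
    "diarize": True,                  # annotate speaker turns in the output
    "save_transcript_to_file": True,  # write a .txt transcript
    "output_directory": None,         # falls back to $HOME/Desktop
}
print(arguments["input_file_path"])
```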

Input Schema

Name                                  Required  Default        Description
input_file_path                       Yes       n/a            Path to the audio file to transcribe
language_code                         No        None           ISO 639-3 code; auto-detected if omitted
diarize                               No        False          Annotate which speaker is speaking
save_transcript_to_file               No        True           Save the transcript to a file
return_transcript_to_client_directly  No        False          Return the transcript directly to the client
output_directory                      No        $HOME/Desktop  Directory for saved files

Implementation Reference

  • The handler function decorated with @mcp.tool that implements the speech_to_text tool logic. It handles audio input, calls the ElevenLabs speech-to-text API (scribe_v1 model), supports diarization, formats the transcript, and manages output based on configuration.
    @mcp.tool(
        description=f"""Transcribe speech from an audio file. When save_transcript_to_file=True: {get_output_mode_description(output_mode)}. When return_transcript_to_client_directly=True, always returns text directly regardless of output mode.
    
        ⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
    
        Args:
        input_file_path: Path to the audio file to transcribe
            language_code: ISO 639-3 language code for transcription. If not provided, the language will be detected automatically.
            diarize: Whether to diarize the audio file. If True, which speaker is currently speaking will be annotated in the transcription.
            save_transcript_to_file: Whether to save the transcript to a file.
            return_transcript_to_client_directly: Whether to return the transcript to the client directly.
            output_directory: Directory where files should be saved (only used when saving files).
                Defaults to $HOME/Desktop if not provided.
    
        Returns:
            TextContent containing the transcription or MCP resource with transcript data.
        """
    )
    def speech_to_text(
        input_file_path: str,
        language_code: str | None = None,
        diarize: bool = False,
        save_transcript_to_file: bool = True,
        return_transcript_to_client_directly: bool = False,
        output_directory: str | None = None,
    ) -> Union[TextContent, EmbeddedResource]:
        if not save_transcript_to_file and not return_transcript_to_client_directly:
            make_error("Must save transcript to file or return it to the client directly.")
        file_path = handle_input_file(input_file_path)
        if save_transcript_to_file:
            output_path = make_output_path(output_directory, base_path)
            output_file_name = make_output_file("stt", file_path.name, "txt")
        with file_path.open("rb") as f:
            audio_bytes = f.read()
    
        if language_code == "" or language_code is None:
            language_code = None
    
        transcription = client.speech_to_text.convert(
            model_id="scribe_v1",
            file=audio_bytes,
            language_code=language_code,
            enable_logging=True,
            diarize=diarize,
            tag_audio_events=True,
        )
    
        # Format transcript with speaker identification if diarization was enabled
        if diarize:
            formatted_transcript = format_diarized_transcript(transcription)
        else:
            formatted_transcript = transcription.text
    
        if return_transcript_to_client_directly:
            return TextContent(type="text", text=formatted_transcript)
    
        if save_transcript_to_file:
            transcript_bytes = formatted_transcript.encode("utf-8")
    
            # Handle different output modes
        # Report the output location, not the input file that was transcribed
        success_message = f"Transcription saved to {output_path / output_file_name}"
            return handle_output_mode(
                transcript_bytes,
                output_path,
                output_file_name,
                output_mode,
                success_message,
            )
    
        # This should not be reached due to validation at the start of the function
        return TextContent(type="text", text="No output mode specified")
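The handler's control flow reduces to a small decision function: validate that at least one output mode is enabled, prefer direct return, otherwise save. The sketch below isolates that logic with the file handling and handle_output_mode call stubbed out; the function name and return strings are illustrative, not the server's API.

```python
def choose_output(transcript: str,
                  save_to_file: bool = True,
                  return_directly: bool = False) -> str:
    """Mirror the handler's branching: validate first, then pick an output."""
    if not save_to_file and not return_directly:
        # mirrors make_error(...) at the top of the real handler
        raise ValueError("Must save transcript to file or return it to the client directly.")
    if return_directly:
        # return_transcript_to_client_directly always wins over file output
        return transcript
    # stands in for handle_output_mode(...) writing the file
    return "Transcription saved"
```

Note that because `return_transcript_to_client_directly` is checked first, setting both flags returns the text directly and skips the file write.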
  • Supporting helper function used by speech_to_text to format transcripts with speaker labels when diarization is enabled.
    def format_diarized_transcript(transcription) -> str:
        """Format transcript with speaker labels from diarized response."""
        try:
            # Try to access words array - the exact attribute might vary
            words = None
            if hasattr(transcription, "words"):
                words = transcription.words
            elif hasattr(transcription, "__dict__"):
                # Try to find words in the response dict
                for key, value in transcription.__dict__.items():
                    if key == "words" or (
                        isinstance(value, list)
                        and len(value) > 0
                        and (
                            hasattr(value[0], "speaker_id")
                            if hasattr(value[0], "__dict__")
                            else (
                                "speaker_id" in value[0]
                                if isinstance(value[0], dict)
                                else False
                            )
                        )
                    ):
                        words = value
                        break
    
            if not words:
                return transcription.text
    
            formatted_lines = []
            current_speaker = None
            current_text = []
    
            for word in words:
                # Get speaker_id - might be an attribute or dict key
                word_speaker = None
                if hasattr(word, "speaker_id"):
                    word_speaker = word.speaker_id
                elif isinstance(word, dict) and "speaker_id" in word:
                    word_speaker = word["speaker_id"]
    
                # Get text - might be an attribute or dict key
                word_text = None
                if hasattr(word, "text"):
                    word_text = word.text
                elif isinstance(word, dict) and "text" in word:
                    word_text = word["text"]
    
                if not word_speaker or not word_text:
                    continue
    
                # Skip spacing/punctuation types if they exist
                if hasattr(word, "type") and word.type == "spacing":
                    continue
                elif isinstance(word, dict) and word.get("type") == "spacing":
                    continue
    
                if current_speaker != word_speaker:
                    # Save previous speaker's text
                    if current_speaker and current_text:
                        speaker_label = current_speaker.upper().replace("_", " ")
                        formatted_lines.append(f"{speaker_label}: {' '.join(current_text)}")
    
                    # Start new speaker
                    current_speaker = word_speaker
                    current_text = [word_text.strip()]
                else:
                    current_text.append(word_text.strip())
    
            # Add final speaker's text
            if current_speaker and current_text:
                speaker_label = current_speaker.upper().replace("_", " ")
                formatted_lines.append(f"{speaker_label}: {' '.join(current_text)}")
    
            return "\n\n".join(formatted_lines)
    
        except Exception:
            # Fallback to regular text if something goes wrong
            return transcription.text
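The grouping logic above can be exercised in isolation: consecutive words sharing a speaker_id are merged into one labelled line. A minimal, self-contained sketch using mock word dicts (not the real ElevenLabs response object):

```python
def group_by_speaker(words: list[dict]) -> str:
    """Merge consecutive words from the same speaker into labelled lines."""
    lines: list[str] = []
    current_speaker = None
    current_text: list[str] = []
    for word in words:
        if word.get("type") == "spacing":  # skip spacing tokens
            continue
        speaker, text = word.get("speaker_id"), word.get("text")
        if not speaker or not text:
            continue
        if speaker != current_speaker:
            if current_speaker and current_text:
                label = current_speaker.upper().replace("_", " ")
                lines.append(f"{label}: {' '.join(current_text)}")
            current_speaker, current_text = speaker, [text.strip()]
        else:
            current_text.append(text.strip())
    if current_speaker and current_text:  # flush the final speaker
        label = current_speaker.upper().replace("_", " ")
        lines.append(f"{label}: {' '.join(current_text)}")
    return "\n\n".join(lines)

words = [
    {"speaker_id": "speaker_0", "text": "Hello"},
    {"speaker_id": "speaker_0", "text": "there."},
    {"speaker_id": "speaker_1", "text": "Hi!"},
]
print(group_by_speaker(words))
# → SPEAKER 0: Hello there.
#
#   SPEAKER 1: Hi!
```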

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/elevenlabs/elevenlabs-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.