speech_to_text
Transcribe speech from audio files using ElevenLabs API. Supports automatic language detection, speaker diarization, and flexible output options including file saving or direct text return.
Instructions
Transcribe speech from an audio file. When save_transcript_to_file=True, the transcript is saved to a file in the output directory (default: $HOME/Desktop). When return_transcript_to_client_directly=True, the transcript text is always returned directly, regardless of output mode.
⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
Args:
input_file_path: Path to the audio file to transcribe.
language_code: ISO 639-3 language code for transcription. If not provided, the language will be detected automatically.
diarize: Whether to diarize the audio file. If True, the transcription is annotated with which speaker is speaking.
save_transcript_to_file: Whether to save the transcript to a file.
return_transcript_to_client_directly: Whether to return the transcript to the client directly.
output_directory: Directory where files should be saved (only used when saving files). Defaults to $HOME/Desktop if not provided.
Returns:
TextContent containing the transcription or MCP resource with transcript data.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
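| input_file_path | Yes | Path to the audio file to transcribe | |
| language_code | No | ISO 639-3 language code; detected automatically if not provided | |
| diarize | No | Whether to annotate which speaker is speaking | |
| save_transcript_to_file | No | Whether to save the transcript to a file | |
| return_transcript_to_client_directly | No | Whether to return the transcript to the client directly | |
| output_directory | No | Directory where files should be saved (only used when saving files) | $HOME/Desktop |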
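
As a rough illustration of how the input schema above maps to a call, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. Only the tool name and argument names come from this page. The `build_request` helper, the request id, and the example file path are hypothetical, and a real MCP client library would normally construct this envelope for you.

```python
import json

# Argument names taken from the input schema; everything else is illustrative.
ALLOWED_ARGS = {
    "input_file_path",
    "language_code",
    "diarize",
    "save_transcript_to_file",
    "return_transcript_to_client_directly",
    "output_directory",
}


def build_request(request_id, **arguments):
    """Build a JSON-RPC 2.0 tools/call request for speech_to_text (hypothetical helper)."""
    if "input_file_path" not in arguments:
        raise ValueError("input_file_path is required")
    unknown = set(arguments) - ALLOWED_ARGS
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speech_to_text", "arguments": arguments},
    }


request = build_request(
    1,
    input_file_path="/path/to/meeting.mp3",  # placeholder path, not a real file
    language_code="eng",  # ISO 639-3 code for English
    diarize=True,
    return_transcript_to_client_directly=True,
)
print(json.dumps(request, indent=2))
```

Validating argument names before sending keeps a typo such as `file_path` from reaching the server, and therefore from triggering a billable API call that fails.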
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
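
The interaction between the two output flags described in the Instructions can be sketched as follows. This is only one reading of the documentation, not the server's code. `plan_output` is a hypothetical helper, and this page does not say what happens when both flags are false, so the sketch simply returns an empty plan in that case.

```python
from pathlib import Path
from typing import Optional


def plan_output(
    save_transcript_to_file: bool,
    return_transcript_to_client_directly: bool,
    output_directory: Optional[str] = None,
) -> dict:
    """Describe where the transcript is expected to go (hypothetical helper)."""
    plan = {"returns_text": False, "save_directory": None}
    if return_transcript_to_client_directly:
        # Text is returned directly regardless of whether a file is also saved.
        plan["returns_text"] = True
    if save_transcript_to_file:
        # Documented default: $HOME/Desktop when no directory is given.
        plan["save_directory"] = (
            Path(output_directory) if output_directory else Path.home() / "Desktop"
        )
    return plan


print(plan_output(True, False))
print(plan_output(False, True, "/tmp/unused"))
```

Note that `output_directory` is ignored in the second call, matching the statement that it is only used when saving files.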