Voxtral speech-to-text
voxtral_transcribe
Transcribe audio files to text with language hints, speaker diarization, and per-segment timestamps. Uses Mistral Voxtral models for accurate speech-to-text conversion.
Instructions
Transcribe an audio file to text using Mistral Voxtral.
Accepted models:
voxtral-mini-latest
voxtral-small-latest
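Only two model IDs are accepted, and the input schema gives voxtral-mini-latest as the default. Client code can check the model name before calling the tool. The sketch below assumes that kind of client-side check is wanted. The helper name resolveModel is hypothetical and is not part of the tool.

```typescript
// The two model IDs listed as accepted by voxtral_transcribe.
const ACCEPTED_MODELS = ["voxtral-mini-latest", "voxtral-small-latest"] as const;
type VoxtralModel = (typeof ACCEPTED_MODELS)[number];

// Hypothetical helper: returns the documented default when no model is given,
// and rejects anything outside the accepted list before the tool is called.
function resolveModel(model?: string): VoxtralModel {
  if (model === undefined) return "voxtral-mini-latest";
  if ((ACCEPTED_MODELS as readonly string[]).includes(model)) {
    return model as VoxtralModel;
  }
  throw new Error(`Unsupported model: ${model}`);
}
```

Rejecting unknown names locally gives a clearer error than a failed tool call. The tool itself may still validate the model on its own side.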
Audio source is one of:
{ type: "file_url", fileUrl: "https://..." } (public URL)
{ type: "file", fileId: "" } (uploaded file ID)
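The two audio source shapes can be written as a TypeScript discriminated union, which lets the compiler check which field goes with which type. The minimal sketch below assumes exactly the field names shown above. The builder functions fromUrl, fromFileId, and describeSource are hypothetical, and the URL check is only a loose sanity check. It does not verify that the URL is actually public.

```typescript
// Mirrors the two audio source shapes listed above.
type AudioSource =
  | { type: "file_url"; fileUrl: string } // publicly reachable URL
  | { type: "file"; fileId: string };     // ID of an uploaded file

// Hypothetical builder for the URL variant; only checks for an http(s) scheme.
function fromUrl(url: string): AudioSource {
  if (!/^https?:\/\/\S+$/.test(url)) {
    throw new Error(`Expected an http(s) URL, got "${url}"`);
  }
  return { type: "file_url", fileUrl: url };
}

// Hypothetical builder for the uploaded-file variant.
function fromFileId(fileId: string): AudioSource {
  if (fileId.trim() === "") throw new Error("fileId must not be empty");
  return { type: "file", fileId };
}

// Narrowing on `type` gives access to the matching field; the switch is exhaustive.
function describeSource(src: AudioSource): string {
  switch (src.type) {
    case "file_url":
      return `URL ${src.fileUrl}`;
    case "file":
      return `uploaded file ${src.fileId}`;
  }
}
```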
Options:
language: ISO-639-1 hint (e.g. 'fr', 'en'). Boosts accuracy when known.
temperature: sampling temperature.
diarize: return per-speaker segments (default false).
timestampGranularities: ['segment'] to return per-segment timestamps.
contextBias: list of phrases/terms that should bias the decoder.
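The options can be checked on the client before a request is sent. The sketch below assumes the following rules, which come from the option list:

- a language hint is a lowercase two-letter ISO-639-1 code
- 'segment' is the only supported timestamp granularity
- temperature is a non-negative number, as sampling temperatures usually are

The validator validateOptions is hypothetical. The tool's own validation may be stricter or looser.

```typescript
// Option names follow the list above; diarize defaults to false per the docs.
interface TranscribeOptions {
  language?: string;
  temperature?: number;
  diarize?: boolean;
  timestampGranularities?: string[];
  contextBias?: string[];
}

// Hypothetical client-side validator; returns an empty array when nothing looks wrong.
function validateOptions(opts: TranscribeOptions): string[] {
  const problems: string[] = [];
  if (opts.language !== undefined && !/^[a-z]{2}$/.test(opts.language)) {
    problems.push(`language "${opts.language}" is not a two-letter ISO-639-1 code`);
  }
  if (
    opts.temperature !== undefined &&
    !(Number.isFinite(opts.temperature) && opts.temperature >= 0)
  ) {
    problems.push("temperature must be a non-negative finite number");
  }
  for (const g of opts.timestampGranularities ?? []) {
    if (g !== "segment") problems.push(`unsupported timestamp granularity "${g}"`);
  }
  for (const phrase of opts.contextBias ?? []) {
    if (phrase.trim() === "") problems.push("contextBias entries must not be empty");
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a caller report every issue in a single pass.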
Returns plain text, detected language, optional segments[], and token usage.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
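| audio | Yes | Audio source: { type: "file_url", fileUrl } (public URL) or { type: "file", fileId }. | |
| model | No | STT model: voxtral-mini-latest or voxtral-small-latest. | voxtral-mini-latest |
| language | No | ISO-639-1 language hint (e.g. 'fr', 'en'). | |
| temperature | No | Sampling temperature. | |
| diarize | No | Return per-speaker segments. | false |
| timestampGranularities | No | Set to ['segment'] for per-segment timestamps. Only 'segment' is currently supported. | |
| contextBias | No | List of phrases/terms that should bias the decoder. | |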
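The input schema can be combined into a single arguments object for a voxtral_transcribe call. This sketch assumes the field names from the table above. It fills in the documented default model and leaves out any field that was not set. The builder buildTranscribeInput and the example URL are hypothetical placeholders. How the object reaches the tool depends on the client in use.

```typescript
// Field names follow the input schema table.
type AudioSource =
  | { type: "file_url"; fileUrl: string }
  | { type: "file"; fileId: string };

interface TranscribeInput {
  audio: AudioSource;                // required
  model?: string;                    // default: voxtral-mini-latest
  language?: string;
  temperature?: number;
  diarize?: boolean;                 // default: false
  timestampGranularities?: string[];
  contextBias?: string[];
}

// Hypothetical builder: applies the default model and drops undefined fields
// so the arguments object stays minimal.
function buildTranscribeInput(input: TranscribeInput): Record<string, unknown> {
  const args: Record<string, unknown> = {
    audio: input.audio,
    model: input.model ?? "voxtral-mini-latest",
  };
  for (const [key, value] of Object.entries(input)) {
    if (key !== "audio" && key !== "model" && value !== undefined) {
      args[key] = value;
    }
  }
  return args;
}

// Example: a French recording with speaker labels, segment timestamps, and a term bias.
const args = buildTranscribeInput({
  audio: { type: "file_url", fileUrl: "https://example.com/meeting.mp3" },
  language: "fr",
  diarize: true,
  timestampGranularities: ["segment"],
  contextBias: ["Voxtral", "Mistral"],
});
console.log(JSON.stringify(args, null, 2));
```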
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
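| text | Yes | Transcription as plain text. | |
| model | Yes | Model used for the transcription. | |
| language | Yes | Detected language. | |
| segments | No | Optional per-segment timestamps (per speaker when diarize is enabled). | |
| usage | No | Token usage. | |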
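A result from the tool can be turned into a readable transcript. The output table does not document the fields of each segment, so this sketch **assumes** that a segment has start and end times in seconds, a text field, and an optional speaker label. Check these names against a real response before relying on them. The usage field is left untyped because its shape is not described here. The functions fmt and renderTranscript are hypothetical.

```typescript
// ASSUMED segment shape: not documented in the output schema above.
interface Segment {
  start: number;    // seconds (assumption)
  end: number;      // seconds (assumption)
  text: string;
  speaker?: string; // assumed to appear only when diarize is enabled
}

interface TranscribeOutput {
  text: string;
  model: string;
  language: string;
  segments?: Segment[];
  usage?: unknown; // shape not documented here
}

// Format seconds as m:ss.s, rounding to tenths first so 59.96 becomes 1:00.0.
function fmt(seconds: number): string {
  const tenths = Math.round(seconds * 10);
  const m = Math.floor(tenths / 600);
  const s = ((tenths % 600) / 10).toFixed(1).padStart(4, "0");
  return `${m}:${s}`;
}

// Hypothetical renderer: one line per segment when segments exist,
// otherwise the plain transcript text.
function renderTranscript(out: TranscribeOutput): string {
  if (!out.segments || out.segments.length === 0) return out.text;
  return out.segments
    .map((seg) => {
      const who = seg.speaker ? `${seg.speaker}: ` : "";
      return `[${fmt(seg.start)}-${fmt(seg.end)}] ${who}${seg.text.trim()}`;
    })
    .join("\n");
}
```

The fallback to text covers calls that request neither diarize nor timestampGranularities, since segments is optional in the output schema.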