Voxtral speech-to-text
voxtral_transcribe
Transcribe audio files to text using Mistral Voxtral models. Accepts public URLs or uploaded files, with options for language hint, speaker diarization, and context bias.
Instructions
Transcribe an audio file to text using Mistral Voxtral.
Accepted models:
voxtral-mini-latest
voxtral-small-latest
Audio source is one of:
{ type: "file_url", fileUrl: "https://..." } (public URL)
{ type: "file", fileId: "" } (ID of an uploaded file)
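The two audio source shapes can be built with a small helper. This is a minimal Python sketch: the function name `audio_source` and the rule that any string starting with `http://` or `https://` counts as a public URL are assumptions made for illustration, not part of the tool.

```python
def audio_source(ref: str) -> dict:
    """Return the audio argument for a public URL or an uploaded file ID."""
    if not ref:
        raise ValueError("audio reference must be a non-empty string")
    # Assumption: anything starting with http:// or https:// is a public URL.
    if ref.startswith(("http://", "https://")):
        return {"type": "file_url", "fileUrl": ref}
    # Otherwise treat the value as the ID of a previously uploaded file.
    return {"type": "file", "fileId": ref}
```

Rejecting an empty reference early keeps an empty fileId from reaching the tool.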
Options:
language: ISO-639-1 hint (e.g. 'fr', 'en'). Boosts accuracy when known.
temperature: sampling temperature.
diarize: return per-speaker segments (default false).
timestampGranularities: ['segment'] to return per-segment timestamps.
contextBias: list of phrases/terms that should bias the decoder.
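A full argument object for the tool might be assembled as follows. This is a hedged Python sketch: `build_arguments` and `ACCEPTED_MODELS` are hypothetical names, the two-letter check is only a rough stand-in for ISO-639-1 validation, and how the arguments are actually sent depends on the MCP client in use.

```python
ACCEPTED_MODELS = ("voxtral-mini-latest", "voxtral-small-latest")


def build_arguments(audio, model=None, language=None, temperature=None,
                    diarize=None, timestamp_granularities=None,
                    context_bias=None):
    """Assemble voxtral_transcribe arguments, omitting options left unset."""
    if model is not None and model not in ACCEPTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    # Rough check only: ISO-639-1 codes are two letters (e.g. 'fr', 'en').
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language must be a two-letter ISO-639-1 code")
    # The input schema states that only 'segment' is currently supported.
    if timestamp_granularities is not None and list(timestamp_granularities) != ["segment"]:
        raise ValueError("timestampGranularities only supports ['segment']")
    candidates = {
        "audio": audio,
        "model": model,
        "language": language,
        "temperature": temperature,
        "diarize": diarize,
        "timestampGranularities": timestamp_granularities,
        "contextBias": context_bias,
    }
    # Drop unset options so the tool applies its own defaults.
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_arguments({"type": "file_url", "fileUrl": "https://..."}, language="fr", diarize=True)` yields an object with only the `audio`, `language`, and `diarize` keys.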
Returns plain text, detected language, optional segments[], and token usage.
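A caller might condense the returned object like this. This is a minimal Python sketch that assumes the result has already been parsed into a dict with the top-level fields of the output schema; `summarize_result` is a hypothetical helper, and it only counts segments because their inner fields are not documented here.

```python
def summarize_result(result: dict) -> str:
    """Build a one-line summary from a parsed voxtral_transcribe result."""
    # text and language are listed as required in the output schema.
    line = f"[{result['language']}] {result['text']}"
    # segments is optional, so fall back to an empty list when absent.
    segments = result.get("segments") or []
    if segments:
        line += f" ({len(segments)} segments)"
    return line
```

Reading `segments` with a fallback keeps the helper usable whether or not diarization or timestamps were requested.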
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
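| audio | Yes | Audio source: a public URL (type "file_url") or an uploaded file (type "file"). | |
| model | No | STT model. | voxtral-mini-latest |
| language | No | ISO-639-1 language hint (e.g. 'fr', 'en'). | |
| temperature | No | Sampling temperature. | |
| diarize | No | Return per-speaker segments. | false |
| timestampGranularities | No | Only 'segment' is currently supported. | |
| contextBias | No | List of phrases/terms that should bias the decoder. | |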
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Transcript as plain text. | |
| model | Yes | | |
| language | Yes | Detected language. | |
| segments | No | Optional segments. | |
| usage | No | Token usage. | |