speech_to_text
Transcribe audio files into text using a specified ASR model. The transcript is saved to a designated directory or a default location, with optional word timestamps and boosted (prioritized) vocabulary.
Instructions
Convert speech to text with a given model and save the output text file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.
⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.
Args:
audio_file_path (str): Path to the audio file to transcribe
model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
timestamps (bool, optional): Whether to include word timestamps. Defaults to True
boosted_lm_words (List[str], optional): Words to boost in recognition. Defaults to None
boosted_lm_score (int, optional): Score for boosted words (0-100). Defaults to 80
output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.
Returns:
TextContent with the transcription and path to the output file.
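As an illustration of how these arguments fit together, the sketch below builds a request payload for this tool. It assumes the tool is exposed over the Model Context Protocol and called with a JSON-RPC `tools/call` request; the helper name `build_speech_to_text_request`, the `request_id` parameter, and the example file path are hypothetical. Because `output_directory` appears in the argument list but not in the published schema, it is sent only when supplied.

```python
import json
from typing import List, Optional


def build_speech_to_text_request(
    audio_file_path: str,
    model_name: str = "en-NER",
    timestamps: bool = True,
    boosted_lm_words: Optional[List[str]] = None,
    boosted_lm_score: int = 80,
    output_directory: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC `tools/call` payload (assumed MCP transport)."""
    # The documented range for the boost score is 0-100.
    if not 0 <= boosted_lm_score <= 100:
        raise ValueError("boosted_lm_score must be between 0 and 100")

    arguments = {
        "audio_file_path": audio_file_path,
        "model_name": model_name,
        "timestamps": timestamps,
        "boosted_lm_score": boosted_lm_score,
    }
    # Optional values are sent only when the caller supplies them,
    # so the server-side defaults apply otherwise.
    if boosted_lm_words:
        arguments["boosted_lm_words"] = list(boosted_lm_words)
    if output_directory is not None:
        arguments["output_directory"] = output_directory

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speech_to_text", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_speech_to_text_request(
        "/path/to/meeting.wav",  # hypothetical path
        boosted_lm_words=["Whissle", "diarization"],
    )
    print(json.dumps(request, indent=2))
```

Given the cost warning above, a client would likely want to build and show this payload to the user before sending it.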
Input Schema
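Name | Required | Description | Default |
---|---|---|---|
audio_file_path | Yes | Path to the audio file to transcribe | |
boosted_lm_score | No | Score for boosted words (0-100) | 80 |
boosted_lm_words | No | Words to boost in recognition | null |
model_name | No | The name of the ASR model to use | en-NER |
timestamps | No | Whether to include word timestamps | true |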
Input Schema (JSON Schema)
{
  "properties": {
    "audio_file_path": {
      "title": "Audio File Path",
      "type": "string"
    },
    "boosted_lm_score": {
      "default": 80,
      "title": "Boosted Lm Score",
      "type": "integer"
    },
    "boosted_lm_words": {
      "default": null,
      "items": {
        "type": "string"
      },
      "title": "Boosted Lm Words",
      "type": "array"
    },
    "model_name": {
      "default": "en-NER",
      "title": "Model Name",
      "type": "string"
    },
    "timestamps": {
      "default": true,
      "title": "Timestamps",
      "type": "boolean"
    }
  },
  "required": [
    "audio_file_path"
  ],
  "title": "speech_to_textArguments",
  "type": "object"
}
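A client may want to check its arguments against this schema before making a paid call. The following is a minimal sketch using only the Python standard library, not a full JSON Schema validator: it checks required keys and basic types, then fills in the schema defaults. The names `SCHEMA`, `PYTHON_TYPES`, and `prepare_arguments` are hypothetical, and `SCHEMA` is a condensed copy of the schema above with titles omitted.

```python
from typing import Any, Dict

# Condensed copy of the published schema (titles omitted).
SCHEMA = {
    "properties": {
        "audio_file_path": {"type": "string"},
        "boosted_lm_score": {"default": 80, "type": "integer"},
        "boosted_lm_words": {
            "default": None,
            "items": {"type": "string"},
            "type": "array",
        },
        "model_name": {"default": "en-NER", "type": "string"},
        "timestamps": {"default": True, "type": "boolean"},
    },
    "required": ["audio_file_path"],
}

# Mapping from JSON Schema type names to Python types (only those used here).
PYTHON_TYPES = {"string": str, "integer": int, "boolean": bool, "array": list}


def prepare_arguments(
    arguments: Dict[str, Any], schema: Dict[str, Any] = SCHEMA
) -> Dict[str, Any]:
    """Check required keys and basic types, then fill in schema defaults."""
    properties = schema["properties"]
    for name in schema["required"]:
        if name not in arguments:
            raise ValueError(f"missing required argument: {name}")

    prepared: Dict[str, Any] = {}
    for name, spec in properties.items():
        if name not in arguments:
            if "default" in spec:
                prepared[name] = spec["default"]
            continue
        value = arguments[name]
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            raise TypeError(f"{name} must be of JSON type {spec['type']}")
        if spec["type"] == "array":
            item_type = PYTHON_TYPES[spec["items"]["type"]]
            if not all(isinstance(item, item_type) for item in value):
                raise TypeError(f"items of {name} have the wrong type")
        prepared[name] = value

    # Keys the schema does not list (e.g. output_directory) pass through as-is.
    for name, value in arguments.items():
        if name not in properties:
            prepared[name] = value
    return prepared


if __name__ == "__main__":
    print(prepare_arguments({"audio_file_path": "/path/to/meeting.wav"}))
```

Unlisted keys are passed through rather than rejected because the schema does not declare `additionalProperties`; a stricter client could raise on them instead.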