Glama

diarize_speech

Transcribe audio files to text with speaker identification, saving results to a specified directory. Supports custom models, maximum speaker limits, and word boosting for more accurate recognition. Uses the Whissle API for processing.

Instructions

Convert speech to text with speaker diarization and save the output text file to a given directory. The directory is optional; if not provided, the output file is saved to $HOME/Desktop.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
  audio_file_path (str): Path to the audio file to transcribe
  model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
  max_speakers (int, optional): Maximum number of speakers to identify
  boosted_lm_words (List[str], optional): Words to boost in recognition
  boosted_lm_score (int, optional): Score for boosted words (0-100)
  output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns:
  TextContent with the diarized transcription and the path to the output file.

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| audio_file_path | Yes | Path to the audio file to transcribe | |
| boosted_lm_score | No | Score for boosted words (0-100) | |
| boosted_lm_words | No | Words to boost in recognition | |
| max_speakers | No | Maximum number of speakers to identify | |
| model_name | No | The name of the ASR model to use | en-NER |
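As a sketch of how a client might assemble arguments against this schema, the snippet below merges caller overrides onto the defaults visible in the handler signature (`max_speakers=2`, `boosted_lm_score=80`) and enforces the documented 0-100 score range. The `build_arguments` helper is hypothetical, not part of the server.

```python
from typing import Any

# Defaults taken from the handler signature shown under Implementation Reference.
DEFAULTS = {"model_name": "en-NER", "max_speakers": 2, "boosted_lm_score": 80}


def build_arguments(audio_file_path: str, **overrides: Any) -> dict:
    """Hypothetical client-side helper: fill in defaults and sanity-check the score."""
    args = {"audio_file_path": audio_file_path, **DEFAULTS, **overrides}
    if not 0 <= args["boosted_lm_score"] <= 100:
        raise ValueError("boosted_lm_score must be between 0 and 100")
    return args
```

Only `audio_file_path` is required; everything else falls back to a default.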

Implementation Reference

  • The core handler function for the 'diarize_speech' tool. It validates the input audio file (existence, size, format), calls the WhissleClient's diarize_stt API method with retry logic for errors, processes the response to extract transcript, diarize_output, timestamps, etc., and returns a dictionary with the results or error messages.
```python
def diarize_speech(
    audio_file_path: str,
    model_name: str = "en-NER",
    max_speakers: int = 2,
    boosted_lm_words: List[str] = None,
    boosted_lm_score: int = 80,
) -> Dict:
    """Diarize speech using Whissle API"""
    try:
        # Check if file exists
        if not os.path.exists(audio_file_path):
            logger.error(f"Audio file not found: {audio_file_path}")
            return {"error": f"Audio file not found: {audio_file_path}"}

        # Check file size
        file_size = os.path.getsize(audio_file_path)
        if file_size == 0:
            logger.error(f"Audio file is empty: {audio_file_path}")
            return {"error": f"Audio file is empty: {audio_file_path}"}

        # Check file format
        file_ext = os.path.splitext(audio_file_path)[1].lower()
        if file_ext not in ['.wav', '.mp3', '.ogg', '.flac', '.m4a']:
            logger.error(f"Unsupported audio format: {file_ext}")
            return {"error": f"Unsupported audio format: {file_ext}. Supported formats: wav, mp3, ogg, flac, m4a"}

        # Check file size limits
        max_size_mb = 25
        if file_size > max_size_mb * 1024 * 1024:
            logger.error(f"File too large: {file_size / (1024*1024):.2f} MB")
            return {"error": f"File too large ({file_size / (1024*1024):.2f} MB). Maximum size is {max_size_mb} MB."}

        # Log the request details
        logger.info(f"Diarizing audio file: {audio_file_path}")
        logger.info(f"File size: {file_size / (1024*1024):.2f} MB")
        logger.info(f"File format: {file_ext}")

        # Try with a different model if the default one fails
        models_to_try = ["en-NER"]
        last_error = None
        for try_model in models_to_try:
            retry_count = 0
            max_retries = 2
            while retry_count <= max_retries:
                try:
                    logger.info(f"Attempting diarization with model: {try_model} (Attempt {retry_count+1}/{max_retries+1})")
                    response = client.diarize_stt(
                        audio_file_path=audio_file_path,
                        model_name=try_model,
                        max_speakers=max_speakers,
                        boosted_lm_words=boosted_lm_words,
                        boosted_lm_score=boosted_lm_score
                    )
                    if response and hasattr(response, 'diarize_output') and response.diarize_output:
                        logger.info(f"Diarization successful with model: {try_model}")
                        result = {
                            "transcript": getattr(response, 'transcript', ''),
                            "duration_seconds": getattr(response, 'duration_seconds', 0),
                            "language_code": getattr(response, 'language_code', 'en'),
                            "diarize_output": response.diarize_output
                        }
                        if hasattr(response, 'timestamps'):
                            result["timestamps"] = response.timestamps
                        return result
                    else:
                        last_error = "No diarized transcription was returned from the API"
                        logger.error(f"No diarized transcription returned from API with model {try_model}")
                        break
                except Exception as api_error:
                    error_msg = str(api_error)
                    logger.error(f"Error with model {try_model}: {error_msg}")
                    last_error = error_msg
                    error_result = handle_api_error(error_msg, "diarization", retry_count, max_retries)
                    if error_result is not None:
                        if retry_count == max_retries:
                            break
                        else:
                            return {"error": error_result}
                    retry_count += 1

        if "HTTP 500" in last_error:
            logger.error(f"All diarization attempts failed with HTTP 500: {last_error}")
            return {"error": f"Server error during diarization. This might be a temporary issue with the Whissle API. Please try again later or contact Whissle support. Error: {last_error}"}
        else:
            logger.error(f"All diarization attempts failed: {last_error}")
            return {"error": f"Failed to diarize speech: {last_error}"}
    except Exception as e:
        logger.error(f"Unexpected error during diarization: {str(e)}")
        return {"error": f"Failed to diarize speech: {str(e)}"}
```
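The pre-flight file checks in the handler can be exercised in isolation, without a Whissle API key. The standalone sketch below (the `validate_audio_file` name is ours, not the server's) mirrors the existence, emptiness, format, and size checks and returns the same kind of error string, or `None` on success:

```python
import os

# Mirrors the handler's constants: accepted extensions and the 25 MB cap.
SUPPORTED_FORMATS = {'.wav', '.mp3', '.ogg', '.flac', '.m4a'}
MAX_SIZE_MB = 25


def validate_audio_file(audio_file_path: str) -> "str | None":
    """Return an error string matching the handler's checks, or None if the file passes."""
    if not os.path.exists(audio_file_path):
        return f"Audio file not found: {audio_file_path}"
    file_size = os.path.getsize(audio_file_path)
    if file_size == 0:
        return f"Audio file is empty: {audio_file_path}"
    file_ext = os.path.splitext(audio_file_path)[1].lower()
    if file_ext not in SUPPORTED_FORMATS:
        return f"Unsupported audio format: {file_ext}"
    if file_size > MAX_SIZE_MB * 1024 * 1024:
        return f"File too large ({file_size / (1024*1024):.2f} MB). Maximum size is {MAX_SIZE_MB} MB."
    return None
```

Running these checks client-side before calling the tool avoids spending an API call on a file the server would reject anyway.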
  • The @mcp.tool decorator registers the 'diarize_speech' tool with FastMCP. It provides the tool description, input parameters (schema), and usage instructions.
```python
@mcp.tool(
    description="""Convert speech to text with speaker diarization and save the output text file to a given directory. Directory is optional, if not provided, the output file will be saved to $HOME/Desktop.

    ⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

    Args:
        audio_file_path (str): Path to the audio file to transcribe
        model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
        max_speakers (int, optional): Maximum number of speakers to identify
        boosted_lm_words (List[str], optional): Words to boost in recognition
        boosted_lm_score (int, optional): Score for boosted words (0-100)
        output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

    Returns:
        TextContent with the diarized transcription and path to the output file.
    """
)
```
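The tool description promises a fallback to $HOME/Desktop when no output directory is given. The exact filename the server writes is not shown in this reference, so the helper below is purely illustrative — `resolve_output_path` and the `_diarized.txt` suffix are our assumptions, used only to show the documented fallback behaviour:

```python
import os


def resolve_output_path(audio_file_path: str, output_directory: "str | None" = None) -> str:
    """Hypothetical helper: pick where the transcription text file would be written.
    Falls back to $HOME/Desktop when no directory is given, as the tool description states.
    The "_diarized.txt" naming is an assumption, not confirmed by the source."""
    directory = output_directory or os.path.join(os.path.expanduser("~"), "Desktop")
    base = os.path.splitext(os.path.basename(audio_file_path))[0]
    return os.path.join(directory, f"{base}_diarized.txt")
```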

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/WhissleAI/whissle-mcp'
```
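The same request can be made from Python with the standard library. The URL pattern below is taken from the curl example; the shape of the JSON response is not documented here, so `fetch_server_info` simply decodes whatever the API returns:

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1"


def server_url(owner: str, repo: str) -> str:
    """Build the directory URL for a server, e.g. WhissleAI/whissle-mcp."""
    return f"{API_BASE}/servers/{owner}/{repo}"


def fetch_server_info(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON description of a server (requires network access)."""
    with urllib.request.urlopen(server_url(owner, repo)) as resp:
        return json.load(resp)
```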

If you have feedback or need assistance with the MCP directory API, please join our Discord server.