Glama

diarize_speech

Transcribe audio files into text with speaker identification, saving the output to a specified directory for clear conversation documentation.

Instructions

Convert speech to text with speaker diarization and save the output text file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
    audio_file_path (str): Path to the audio file to transcribe
    model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
    max_speakers (int, optional): Maximum number of speakers to identify
    boosted_lm_words (List[str], optional): Words to boost in recognition
    boosted_lm_score (int, optional): Score for boosted words (0-100)
    output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns:
    TextContent with the diarized transcription and path to the output file.
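As an illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the request shape MCP uses for tool invocation. The helper name `build_diarize_call` and the audio path are hypothetical, and `output_directory` is left out because the input schema and handler signature listed on this page do not include it.

```python
import json

# Optional arguments that appear in the input schema on this page.
ALLOWED_OPTIONAL_ARGS = {
    "model_name",
    "max_speakers",
    "boosted_lm_words",
    "boosted_lm_score",
}


def build_diarize_call(audio_file_path, request_id=1, **optional_args):
    """Build a JSON-RPC 2.0 `tools/call` request for the diarize_speech tool.

    Hypothetical helper: it only assembles the request body and does not
    contact any server, so it incurs no Whissle API cost.
    """
    unknown = set(optional_args) - ALLOWED_OPTIONAL_ARGS
    if unknown:
        raise ValueError(f"Unknown arguments: {sorted(unknown)}")

    arguments = {"audio_file_path": audio_file_path, **optional_args}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "diarize_speech", "arguments": arguments},
    }


# Example with a placeholder path; prints the request body as JSON.
example_request = build_diarize_call(
    "/path/to/meeting.wav",
    max_speakers=3,
    boosted_lm_words=["Whissle"],
)
print(json.dumps(example_request, indent=2))
```

Arguments that are not passed are simply omitted from the request, so the server-side defaults apply.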

Input Schema

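Name              Required  Description                                  Default
audio_file_path   Yes       Path to the audio file to transcribe
model_name        No        The name of the ASR model to use             en-NER
max_speakers      No        Maximum number of speakers to identify
boosted_lm_words  No        Words to boost in recognition
boosted_lm_score  No        Score for boosted words (0-100)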
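Because each call may incur costs, it can be worth checking the inputs locally before invoking the tool. The following is a minimal sketch that mirrors the file checks performed by the handler in the Implementation Reference (existence, non-empty, extension, 25 MB limit). The function name `preflight_check` is hypothetical, and the 0-100 range check on `boosted_lm_score` is an assumption based on the argument description; the handler shown on this page does not enforce it.

```python
import os

# Values copied from the handler shown in the Implementation Reference.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_SIZE_MB = 25


def preflight_check(audio_file_path, boosted_lm_score=None):
    """Return an error message, or None if the inputs look acceptable.

    Hypothetical helper; checks run in the same order as in the handler.
    """
    if not os.path.exists(audio_file_path):
        return f"Audio file not found: {audio_file_path}"

    file_size = os.path.getsize(audio_file_path)
    if file_size == 0:
        return f"Audio file is empty: {audio_file_path}"

    file_ext = os.path.splitext(audio_file_path)[1].lower()
    if file_ext not in SUPPORTED_EXTENSIONS:
        return f"Unsupported audio format: {file_ext}"

    if file_size > MAX_SIZE_MB * 1024 * 1024:
        return f"File too large ({file_size / (1024 * 1024):.2f} MB)"

    # Assumption: the documented 0-100 range is treated as a hard limit here.
    if boosted_lm_score is not None and not 0 <= boosted_lm_score <= 100:
        return "boosted_lm_score must be between 0 and 100"

    return None
```

A caller would invoke the tool only when `preflight_check(...)` returns `None`, and otherwise report the message to the user without spending an API call.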

Implementation Reference

  • Registers the 'diarize_speech' tool using the @mcp.tool decorator from FastMCP. Includes detailed description serving as input schema documentation.
    @mcp.tool(
        description="""Convert speech to text with speaker diarization and save the output text file to a given directory.
        Directory is optional, if not provided, the output file will be saved to $HOME/Desktop.

        ⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

        Args:
            audio_file_path (str): Path to the audio file to transcribe
            model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
            max_speakers (int, optional): Maximum number of speakers to identify
            boosted_lm_words (List[str], optional): Words to boost in recognition
            boosted_lm_score (int, optional): Score for boosted words (0-100)
            output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

        Returns:
            TextContent with the diarized transcription and path to the output file.
        """
    )
  • The core handler function that validates input, calls the Whissle client's diarize_stt API method with retry logic and error handling using handle_api_error, and returns the diarized transcription output.
    def diarize_speech(
        audio_file_path: str,
        model_name: str = "en-NER",
        max_speakers: int = 2,
        boosted_lm_words: List[str] = None,
        boosted_lm_score: int = 80,
    ) -> Dict:
        """Diarize speech using Whissle API"""
        try:
            # Check if file exists
            if not os.path.exists(audio_file_path):
                logger.error(f"Audio file not found: {audio_file_path}")
                return {"error": f"Audio file not found: {audio_file_path}"}

            # Check file size
            file_size = os.path.getsize(audio_file_path)
            if file_size == 0:
                logger.error(f"Audio file is empty: {audio_file_path}")
                return {"error": f"Audio file is empty: {audio_file_path}"}

            # Check file format
            file_ext = os.path.splitext(audio_file_path)[1].lower()
            if file_ext not in ['.wav', '.mp3', '.ogg', '.flac', '.m4a']:
                logger.error(f"Unsupported audio format: {file_ext}")
                return {"error": f"Unsupported audio format: {file_ext}. Supported formats: wav, mp3, ogg, flac, m4a"}

            # Check file size limits
            max_size_mb = 25
            if file_size > max_size_mb * 1024 * 1024:
                logger.error(f"File too large: {file_size / (1024*1024):.2f} MB")
                return {"error": f"File too large ({file_size / (1024*1024):.2f} MB). Maximum size is {max_size_mb} MB."}

            # Log the request details
            logger.info(f"Diarizing audio file: {audio_file_path}")
            logger.info(f"File size: {file_size / (1024*1024):.2f} MB")
            logger.info(f"File format: {file_ext}")

            # Try with a different model if the default one fails
            models_to_try = ["en-NER"]
            last_error = None

            for try_model in models_to_try:
                retry_count = 0
                max_retries = 2

                while retry_count <= max_retries:
                    try:
                        logger.info(f"Attempting diarization with model: {try_model} (Attempt {retry_count+1}/{max_retries+1})")
                        response = client.diarize_stt(
                            audio_file_path=audio_file_path,
                            model_name=try_model,
                            max_speakers=max_speakers,
                            boosted_lm_words=boosted_lm_words,
                            boosted_lm_score=boosted_lm_score
                        )

                        if response and hasattr(response, 'diarize_output') and response.diarize_output:
                            logger.info(f"Diarization successful with model: {try_model}")
                            result = {
                                "transcript": getattr(response, 'transcript', ''),
                                "duration_seconds": getattr(response, 'duration_seconds', 0),
                                "language_code": getattr(response, 'language_code', 'en'),
                                "diarize_output": response.diarize_output
                            }
                            if hasattr(response, 'timestamps'):
                                result["timestamps"] = response.timestamps
                            return result
                        else:
                            last_error = "No diarized transcription was returned from the API"
                            logger.error(f"No diarized transcription returned from API with model {try_model}")
                            break
                    except Exception as api_error:
                        error_msg = str(api_error)
                        logger.error(f"Error with model {try_model}: {error_msg}")
                        last_error = error_msg

                        error_result = handle_api_error(error_msg, "diarization", retry_count, max_retries)
                        if error_result is not None:
                            if retry_count == max_retries:
                                break
                            else:
                                return {"error": error_result}
                        retry_count += 1

            if "HTTP 500" in last_error:
                logger.error(f"All diarization attempts failed with HTTP 500: {last_error}")
                return {"error": f"Server error during diarization. This might be a temporary issue with the Whissle API. Please try again later or contact Whissle support. Error: {last_error}"}
            else:
                logger.error(f"All diarization attempts failed: {last_error}")
                return {"error": f"Failed to diarize speech: {last_error}"}
        except Exception as e:
            logger.error(f"Unexpected error during diarization: {str(e)}")
            return {"error": f"Failed to diarize speech: {str(e)}"}
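The retry behavior of the handler can be easier to follow in isolation. The sketch below is a simplified version of the pattern, not the server's actual code: `call_with_retries` and `is_retryable` are hypothetical names, and `is_retryable` is only a stand-in for `handle_api_error`, whose implementation is not shown on this page. It assumes that only server-side errors are worth retrying.

```python
import logging

logger = logging.getLogger(__name__)


def is_retryable(error_msg):
    """Hypothetical stand-in for handle_api_error.

    Assumption: only HTTP 500/503 style messages are treated as transient.
    """
    return "HTTP 500" in error_msg or "HTTP 503" in error_msg


def call_with_retries(api_call, max_retries=2):
    """Call api_call() up to max_retries + 1 times.

    Returns {"result": ...} on success, or {"error": ...} once a
    non-retryable error occurs or all attempts are used up.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            logger.info("Attempt %d/%d", attempt + 1, max_retries + 1)
            return {"result": api_call()}
        except Exception as exc:
            last_error = str(exc)
            logger.error("Attempt %d failed: %s", attempt + 1, last_error)
            if not is_retryable(last_error):
                break  # Give up immediately on errors that will not go away.
    return {"error": f"Failed to diarize speech: {last_error}"}
```

With `max_retries=2`, a call that keeps failing with a transient error is attempted three times, which matches the "Attempt n/3" log messages in the handler. Returning an error dictionary instead of raising follows the handler's own convention.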

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/WhissleAI/whissle-mcp'
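The same request can be made from Python using only the standard library. This is a minimal sketch: the function names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not state explicitly.

```python
import json
import urllib.request

MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner, name):
    """Build a GET request for one server entry in the MCP directory API."""
    url = f"{MCP_API_BASE}/{owner}/{name}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(owner, name, timeout=10):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info("WhissleAI", "whissle-mcp")` issues the same GET request as the curl command above.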

If you have feedback or need assistance with the MCP directory API, please join our Discord server.