diarize_speech

Transcribe audio files into text with speaker identification, saving the output to a specified directory for clear conversation documentation.

Instructions

Convert speech to text with speaker diarization and save the output text file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop.

⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.

Args:
    audio_file_path (str): Path to the audio file to transcribe
    model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
    max_speakers (int, optional): Maximum number of speakers to identify
    boosted_lm_words (List[str], optional): Words to boost in recognition
    boosted_lm_score (int, optional): Score for boosted words (0-100)
    output_directory (str, optional): Directory where files should be saved.
        Defaults to $HOME/Desktop if not provided.

Returns:
    TextContent with the diarized transcription and path to the output file.

Input Schema

Name              Required  Description  Default
audio_file_path   Yes
model_name        No                     en-NER
max_speakers      No
boosted_lm_words  No
boosted_lm_score  No
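
As an illustration of how the parameters above fit together, the sketch below builds the JSON-RPC request an MCP client would send to invoke this tool. The `tools/call` method and the `name`/`arguments` layout follow the Model Context Protocol; the helper name `build_diarize_request`, the file path, and the argument values are hypothetical and only serve as an example.

```python
import json


def build_diarize_request(audio_file_path, request_id=1, **optional_args):
    """Build a JSON-RPC 2.0 'tools/call' request for the diarize_speech tool.

    Hypothetical helper: only audio_file_path is required; any optional
    arguments (model_name, max_speakers, ...) are passed through unchanged.
    """
    arguments = {"audio_file_path": audio_file_path, **optional_args}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "diarize_speech", "arguments": arguments},
    }


# Example values are placeholders, not taken from the Whissle documentation.
request = build_diarize_request(
    "/path/to/meeting.wav",
    max_speakers=3,
    boosted_lm_words=["Whissle"],
    boosted_lm_score=80,
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library assembles this message for you; the sketch only makes the argument structure visible.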

Implementation Reference

  • Registers the 'diarize_speech' tool using the @mcp.tool decorator from FastMCP. The detailed description doubles as documentation for the input parameters.
    @mcp.tool(
        description="""Convert speech to text with speaker diarization and save the output text file to a given directory.
        Directory is optional, if not provided, the output file will be saved to $HOME/Desktop.
    
        ⚠️ COST WARNING: This tool makes an API call to Whissle which may incur costs. Only use when explicitly requested by the user.
    
        Args:
            audio_file_path (str): Path to the audio file to transcribe
            model_name (str, optional): The name of the ASR model to use. Defaults to "en-NER"
            max_speakers (int, optional): Maximum number of speakers to identify
            boosted_lm_words (List[str], optional): Words to boost in recognition
            boosted_lm_score (int, optional): Score for boosted words (0-100)
            output_directory (str, optional): Directory where files should be saved.
                Defaults to $HOME/Desktop if not provided.
    
        Returns:
            TextContent with the diarized transcription and path to the output file.
        """
    )
  • The core handler function. It validates the input file (existence, non-zero size, supported format, 25 MB limit), calls the Whissle client's diarize_stt API method with up to three attempts, routes failures through handle_api_error, and returns the diarized transcription output.
    def diarize_speech(audio_file_path: str, model_name: str = "en-NER", max_speakers: int = 2, boosted_lm_words: List[str] = None, boosted_lm_score: int = 80) -> Dict:
        """Diarize speech using Whissle API"""
        try:
            # Check if file exists
            if not os.path.exists(audio_file_path):
                logger.error(f"Audio file not found: {audio_file_path}")
                return {"error": f"Audio file not found: {audio_file_path}"}
            
            # Check file size
            file_size = os.path.getsize(audio_file_path)
            if file_size == 0:
                logger.error(f"Audio file is empty: {audio_file_path}")
                return {"error": f"Audio file is empty: {audio_file_path}"}
            
            # Check file format
            file_ext = os.path.splitext(audio_file_path)[1].lower()
            if file_ext not in ['.wav', '.mp3', '.ogg', '.flac', '.m4a']:
                logger.error(f"Unsupported audio format: {file_ext}")
                return {"error": f"Unsupported audio format: {file_ext}. Supported formats: wav, mp3, ogg, flac, m4a"}
            
            # Check file size limits
            max_size_mb = 25
            if file_size > max_size_mb * 1024 * 1024:
                logger.error(f"File too large: {file_size / (1024*1024):.2f} MB")
                return {"error": f"File too large ({file_size / (1024*1024):.2f} MB). Maximum size is {max_size_mb} MB."}
            
            # Log the request details
            logger.info(f"Diarizing audio file: {audio_file_path}")
            logger.info(f"File size: {file_size / (1024*1024):.2f} MB")
            logger.info(f"File format: {file_ext}")
            
            # Models to attempt, in order. Only "en-NER" is listed, so the
            # model_name argument is not used by this loop.
            models_to_try = ["en-NER"]
            last_error = None
            
            for try_model in models_to_try:
                retry_count = 0
                max_retries = 2
                
                while retry_count <= max_retries:
                    try:
                        logger.info(f"Attempting diarization with model: {try_model} (Attempt {retry_count+1}/{max_retries+1})")
                        response = client.diarize_stt(
                            audio_file_path=audio_file_path,
                            model_name=try_model,
                            max_speakers=max_speakers,
                            boosted_lm_words=boosted_lm_words,
                            boosted_lm_score=boosted_lm_score
                        )
                        
                        if response and hasattr(response, 'diarize_output') and response.diarize_output:
                            logger.info(f"Diarization successful with model: {try_model}")
                            
                            result = {
                                "transcript": getattr(response, 'transcript', ''),
                                "duration_seconds": getattr(response, 'duration_seconds', 0),
                                "language_code": getattr(response, 'language_code', 'en'),
                                "diarize_output": response.diarize_output
                            }
                            
                            if hasattr(response, 'timestamps'):
                                result["timestamps"] = response.timestamps
                            
                            return result
                        else:
                            last_error = "No diarized transcription was returned from the API"
                            logger.error(f"No diarized transcription returned from API with model {try_model}")
                            break
                    except Exception as api_error:
                        error_msg = str(api_error)
                        logger.error(f"Error with model {try_model}: {error_msg}")
                        last_error = error_msg
                        
                        error_result = handle_api_error(error_msg, "diarization", retry_count, max_retries)
                        if error_result is not None:
                            if retry_count == max_retries:
                                break
                            else:
                                return {"error": error_result}
                        
                        retry_count += 1
            
            if last_error and "HTTP 500" in last_error:
                logger.error(f"All diarization attempts failed with HTTP 500: {last_error}")
                return {"error": f"Server error during diarization. This might be a temporary issue with the Whissle API. Please try again later or contact Whissle support. Error: {last_error}"}
            else:
                logger.error(f"All diarization attempts failed: {last_error}")
                return {"error": f"Failed to diarize speech: {last_error}"}
                
        except Exception as e:
            logger.error(f"Unexpected error during diarization: {str(e)}")
            return {"error": f"Failed to diarize speech: {str(e)}"}
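
The handler above combines two separable ideas: pre-flight checks on the audio file and a bounded retry loop around the API call. The sketch below isolates both in a minimal, standalone form. The helper names `validate_audio_file` and `call_with_retries` are hypothetical and do not appear in the Whissle code, and the retry helper simply retries on any exception, whereas the original delegates that decision to handle_api_error.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Limits mirrored from the handler above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_SIZE_MB = 25


def validate_audio_file(path: str) -> Optional[str]:
    """Return an error message, or None if the file passes every check."""
    if not os.path.exists(path):
        return f"Audio file not found: {path}"
    size = os.path.getsize(path)
    if size == 0:
        return f"Audio file is empty: {path}"
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"Unsupported audio format: {ext}"
    if size > MAX_SIZE_MB * 1024 * 1024:
        return f"File too large ({size / (1024 * 1024):.2f} MB)"
    return None


def call_with_retries(call: Callable[[], T], max_retries: int = 2) -> T:
    """Invoke call() up to max_retries + 1 times and re-raise the last error."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            # Give up only after the final attempt has failed.
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")
```

Keeping validation separate from the network call means obviously bad inputs are rejected before any billable API request is made, which matches the cost warning in the tool description.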

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/WhissleAI/whissle-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.