Server Details
Pronunciation scoring, text-to-speech (12 voices), and speech-to-text with timestamps.
- Status: Healthy
- Transport: Streamable HTTP
- Repository: fasuizu-br/speech-ai-examples
- GitHub Stars: 0
Available Tools
10 tools
assess_pronunciation
Assess English pronunciation quality from audio.
Scores pronunciation at four levels: overall, sentence, word, and phoneme. Each score is 0-100. Phonemes are returned in both IPA and ARPAbet notation. Inference latency is under 300 ms.
Args:
- audio_base64: Base64-encoded audio data. Supports WAV, MP3, OGG, and WebM formats.
- text: The reference English text that the speaker was expected to read aloud.
- audio_format: Audio format hint, one of 'wav', 'mp3', 'ogg', 'webm'. Defaults to 'wav'.
Returns: dict with keys:
- overallScore (int 0-100): Overall pronunciation quality
- sentenceScore (int 0-100): Sentence-level fluency and accuracy
- words (list): Per-word scores, each containing:
  - word (str): The word
  - score (int 0-100): Word pronunciation score
  - phonemes (list): Per-phoneme scores with IPA/ARPAbet notation
- decodedTranscript (str): What the model heard (ASR transcript)
- transcript (str): Reference text
- confidence (float 0-1): Scoring confidence
- warnings (list[str]): Quality warnings if any
- audioQuality (dict): Audio metrics (SNR, peak/RMS dB, etc.)
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | The reference English text that the speaker was expected to read aloud. | |
| audio_base64 | Yes | Base64-encoded audio data. Supports WAV, MP3, OGG, and WebM formats. | |
| audio_format | No | Audio format hint — one of 'wav', 'mp3', 'ogg', 'webm'. | wav |
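As a rough illustration of the arguments and return value described above, the sketch below builds an argument dict from raw audio bytes and picks out the lowest-scoring words from a result. The helper names `build_assess_args` and `weakest_words`, the 60-point threshold, and the sample result are all made up for this example; how the tool is actually invoked depends on the MCP client in use and is not shown.

```python
import base64

# Formats listed for assess_pronunciation in the parameter table.
ALLOWED_FORMATS = {"wav", "mp3", "ogg", "webm"}


def build_assess_args(audio: bytes, text: str, audio_format: str = "wav") -> dict:
    """Build an arguments dict for assess_pronunciation (hypothetical helper)."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "text": text,
        "audio_format": audio_format,
    }


def weakest_words(result: dict, threshold: int = 60) -> list[tuple[str, int]]:
    """Return (word, score) pairs scoring below threshold, lowest first."""
    low = [(w["word"], w["score"]) for w in result.get("words", [])
           if w["score"] < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Hand-written sample shaped like the documented return value (not real output).
sample_result = {
    "overallScore": 72,
    "sentenceScore": 70,
    "words": [
        {"word": "the", "score": 90, "phonemes": []},
        {"word": "thought", "score": 41, "phonemes": []},
        {"word": "through", "score": 55, "phonemes": []},
    ],
}

print(weakest_words(sample_result))
```

The threshold is arbitrary; an application would likely tune it, or use the `confidence` and `warnings` fields to decide whether a low score is trustworthy at all.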
check_pronunciation_service
Check if the pronunciation assessment service is healthy and ready.
Returns: dict with keys:
- status (str): 'healthy' or error state
- modelLoaded (bool): Whether the scoring model is loaded
- version (str): API version
No parameters.
check_stt_service
Check if the speech-to-text service is healthy and ready.
Returns: dict with keys:
- status (str): 'healthy' or error state
- modelLoaded (bool): Whether the STT model is loaded
- version (str): API version
No parameters.
check_tts_service
Check if the text-to-speech service is healthy and ready.
Returns: dict with keys:
- status (str): 'healthy' or error state
- modelLoaded (bool): Whether the TTS model is loaded
- version (str): API version
No parameters.
check_whisper_service
Check if the Whisper STT Pro service is healthy and ready.
Returns: dict with keys:
- status (str): 'healthy' or error state
- modelLoaded (bool): Whether the Whisper model is loaded
- diarizeLoaded (bool): Whether the diarization pipeline is loaded
- version (str): API version
- modelName (str): Whisper model name (e.g. 'large-v3-turbo')
No parameters.
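The health-check tools all return a similar dict, so a caller might gate work on them with something like the sketch below. The function name `whisper_ready` and the sample values are hypothetical; only the key names come from the description above.

```python
def whisper_ready(health: dict, need_diarization: bool = False) -> bool:
    """Decide whether a check_whisper_service result looks usable.

    Missing keys are treated as "not ready" rather than raising.
    """
    if health.get("status") != "healthy":
        return False
    if not health.get("modelLoaded", False):
        return False
    if need_diarization and not health.get("diarizeLoaded", False):
        return False
    return True


# Hand-written sample shaped like the documented return value (not real output).
sample_health = {
    "status": "healthy",
    "modelLoaded": True,
    "diarizeLoaded": False,
    "version": "0.0.0",  # placeholder value
    "modelName": "large-v3-turbo",
}

print(whisper_ready(sample_health))
print(whisper_ready(sample_health, need_diarization=True))
```

The same check, minus the `diarizeLoaded` branch, would apply to the other three health tools, since they share the `status` and `modelLoaded` keys.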
get_phoneme_inventory
Get the full phoneme inventory supported by the pronunciation scorer.
Returns a list of all English phonemes the engine can assess, including ARPAbet symbol, IPA equivalent, example word, and phoneme category (vowel, consonant, diphthong).
Returns: list of dicts, each with keys:
- arpabet (str): ARPAbet symbol (e.g. 'AA', 'TH')
- ipa (str): IPA notation
- example (str): Example word containing the phoneme
- category (str): vowel, consonant, or diphthong
No parameters.
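One plausible use of the inventory is a lookup table for converting ARPAbet symbols in scoring results to IPA, plus a grouping by category. The sketch below assumes the list-of-dicts shape described above; `index_inventory` is a hypothetical helper and the three sample entries are hand-written, so the real inventory may differ.

```python
from collections import defaultdict


def index_inventory(inventory: list[dict]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (ARPAbet -> IPA map, category -> list of ARPAbet symbols)."""
    arpabet_to_ipa: dict[str, str] = {}
    by_category: dict[str, list[str]] = defaultdict(list)
    for phoneme in inventory:
        arpabet_to_ipa[phoneme["arpabet"]] = phoneme["ipa"]
        by_category[phoneme["category"]].append(phoneme["arpabet"])
    return arpabet_to_ipa, dict(by_category)


# Hand-written sample entries (not real output from the server).
sample_inventory = [
    {"arpabet": "AA", "ipa": "ɑ", "example": "father", "category": "vowel"},
    {"arpabet": "TH", "ipa": "θ", "example": "think", "category": "consonant"},
    {"arpabet": "AY", "ipa": "aɪ", "example": "my", "category": "diphthong"},
]

to_ipa, groups = index_inventory(sample_inventory)
print(to_ipa["TH"])
print(groups)
```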
list_tts_voices
List all available text-to-speech voices with metadata.
Returns: dict with keys:
- voices (list): Available voices, each with id, name, gender, accent, grade
- defaultVoice (str): Default voice ID
No parameters.
synthesize_speech
Generate natural speech audio from English text.
Produces high-quality speech with 12 English voices. Returns base64-encoded WAV audio (16-bit PCM, 24kHz mono) along with metadata.
Available voices:
af_heart (default), af_bella, af_nicole, af_sarah, af_sky (American female)
am_adam, am_michael (American male)
bf_emma, bf_isabella (British female)
bm_george, bm_lewis, bm_daniel (British male)
Args:
- text: English text to synthesize (1-5000 characters).
- voice: Voice ID. See list above. Defaults to 'af_heart'.
- speed: Speed multiplier from 0.5 to 2.0 (default: 1.0).
Returns: dict with keys:
- audio_base64 (str): Base64-encoded WAV audio (16-bit PCM, 24kHz)
- duration_ms (str): Audio duration in milliseconds
- voice (str): Voice ID used
- text_length (str): Input text character count
- processing_ms (str): Synthesis time in milliseconds
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | English text to convert to speech. 1-5000 characters. | |
| speed | No | Speech speed multiplier from 0.5 (half speed) to 2.0 (double speed). | 1.0 |
| voice | No | Voice ID (e.g. 'af_heart', 'am_adam'). Uses the default if omitted. | af_heart |
transcribe_audio
Transcribe audio to text with word-level timestamps.
Converts spoken English audio into text with optional word-level timestamps and per-word confidence scores.
Args:
- audio_base64: Base64-encoded audio data (WAV, MP3, OGG, FLAC, WebM).
- audio_format: Audio format hint. Auto-detected from magic bytes if omitted.
- include_timestamps: Whether to include word-level timing (default: true).
Returns: dict with keys:
- text (str): Full decoded transcript
- words (list): Per-word results with timestamps, each containing:
  - word (str): The transcribed word
  - start (float): Start time in seconds
  - end (float): End time in seconds
  - confidence (float 0-1): Word-level confidence
- audioDurationMs (int): Audio duration in milliseconds
- metadata (dict): Processing time, audio length, model version
- audioQuality (dict): Audio metrics (SNR, peak/RMS dB, etc.)
| Name | Required | Description | Default |
|---|---|---|---|
| audio_base64 | Yes | Base64-encoded audio data. Supports WAV, MP3, OGG, FLAC, and WebM formats. | |
| audio_format | No | Audio format hint: 'wav', 'mp3', 'ogg', 'flac', or 'webm'. Auto-detected from magic bytes if omitted. | |
| include_timestamps | No | If true, include word-level start/end times and confidence. | true |
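Word-level timestamps are commonly turned into captions. The sketch below groups the `words` list into fixed-size cues and renders them in SubRip (SRT) format; it assumes `start` and `end` are seconds, as described above. `to_srt`, the cue size, and the sample words are illustrative choices, not part of the server's API.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT."""
    cues = []
    for index, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        span = f"{srt_time(chunk[0]['start'])} --> {srt_time(chunk[-1]['end'])}"
        cues.append(f"{index}\n{span}\n{text}")
    return "\n\n".join(cues)


# Hand-written sample shaped like the documented words list (not real output).
sample_words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "confidence": 0.98},
    {"word": "world", "start": 0.5, "end": 0.9, "confidence": 0.95},
    {"word": "again", "start": 1.0, "end": 1.5, "confidence": 0.90},
]

print(to_srt(sample_words, max_words=2))
```

Splitting on a fixed word count keeps the example short; a real captioner would more likely break on pauses between words or on punctuation.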
transcribe_audio_pro
Transcribe audio with Whisper Large V3 Turbo — multilingual STT.
Supports 99 languages with automatic language detection, word-level timestamps, per-word confidence scores, and optional speaker diarization (identifies who spoke each word). The listing reports a word error rate (WER) of about 2%.
Args:
- audio_base64: Base64-encoded audio (WAV, MP3, OGG, FLAC, WebM).
- language: Language code. Auto-detected if omitted. Supports 99 languages.
- diarize: Enable speaker diarization (default: false). When true, each word includes a speaker label (e.g. SPEAKER_00, SPEAKER_01).
Returns: dict with keys:
- text (str): Full decoded transcript
- words (list): Per-word results with timestamps, each containing:
  - word (str), start (float), end (float), confidence (float 0-1)
  - speaker (str|null): Speaker label when diarize=true
- speakers (dict|null): Speaker info with count and labels
- audioDurationMs (int): Audio duration in milliseconds
- metadata (dict): Processing time, language, languageProbability
- audioQuality (dict): Audio metrics (SNR, peak/RMS dB, etc.)
| Name | Required | Description | Default |
|---|---|---|---|
| diarize | No | Enable speaker diarization to identify who spoke each word. | false |
| language | No | Language code (e.g. 'en', 'es', 'zh'). Auto-detected when omitted. | |
| audio_base64 | Yes | Base64-encoded audio data. Supports WAV, MP3, OGG, FLAC, and WebM formats. | |
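With `diarize` enabled, each word carries a speaker label, so consecutive words from the same speaker can be merged into turns. The sketch below does this with `itertools.groupby`; `speaker_turns` is a hypothetical helper and the sample words are hand-written in the documented shape.

```python
from itertools import groupby


def speaker_turns(words: list[dict]) -> list[dict]:
    """Merge consecutive words with the same speaker label into turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get("speaker")):
        chunk = list(group)
        turns.append({
            "speaker": speaker,  # None when diarization was off
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
            "text": " ".join(w["word"] for w in chunk),
        })
    return turns


# Hand-written sample shaped like the documented words list (not real output).
sample_words = [
    {"word": "How", "start": 0.0, "end": 0.2, "confidence": 0.97, "speaker": "SPEAKER_00"},
    {"word": "are", "start": 0.2, "end": 0.4, "confidence": 0.96, "speaker": "SPEAKER_00"},
    {"word": "you", "start": 0.4, "end": 0.7, "confidence": 0.98, "speaker": "SPEAKER_00"},
    {"word": "Fine", "start": 1.1, "end": 1.5, "confidence": 0.94, "speaker": "SPEAKER_01"},
    {"word": "Good", "start": 2.0, "end": 2.3, "confidence": 0.95, "speaker": "SPEAKER_00"},
]

for turn in speaker_turns(sample_words):
    print(f"{turn['speaker']} [{turn['start']:.1f}-{turn['end']:.1f}s]: {turn['text']}")
```

Because `groupby` only merges adjacent items, a speaker who talks twice produces two separate turns, which is usually what a transcript view wants.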