vocametrix
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| VOCAMETRIX_API_KEY | Yes | Your Vocametrix API key | |
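A launcher script can fail fast when the required variable is missing. The following is a minimal sketch, not the server's own startup code; the helper name `load_api_key` is hypothetical, and only the variable name `VOCAMETRIX_API_KEY` comes from the table above.

```python
import os


def load_api_key(env=None):
    """Return the Vocametrix API key, or raise if it is missing or blank.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    key = env.get("VOCAMETRIX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("VOCAMETRIX_API_KEY is not set")
    return key
```

Calling `load_api_key()` before starting the server turns a missing key into an immediate, readable error rather than a failed request later on.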
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | { "listChanged": true } |
| prompts | { "listChanged": true } |
| resources | { "listChanged": true } |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| vocametrix_calculate_avqi | Calculate the Acoustic Voice Quality Index (AVQI v2.03 or v3.01), a clinically validated dysphonia score. AVQI > 2.43 (French) / 2.97 (English) indicates dysphonia. Requires a sustained vowel recording (e.g. /a/ held for 3+ seconds). Connected speech is optional but improves accuracy. |
| vocametrix_calculate_dsi | Calculate the Dysphonia Severity Index (DSI). DSI > 1.6 = normal voice; DSI < –1.6 = severe dysphonia. Requires a sustained vowel WAV file plus voice-range parameters (MPT, F0 range, minimum intensity). |
| vocametrix_calculate_cpp | Calculate Cepstral Peak Prominence (CPP) from a sustained vowel. Higher CPP = better voice quality. Typical normal CPP: 20–28 dB. Clinically sensitive to breathiness and hoarseness. |
| vocametrix_calculate_hnr | Calculate multi-band Harmonics-to-Noise Ratio (HNR) across frequency bands (80–8000 Hz) with age- and gender-specific norms. Higher HNR = cleaner voice. Normal HNR (500 Hz band): > 20 dB. Requires a sustained vowel. |
| vocametrix_calculate_jitter_shimmer | Calculate jitter (period perturbation, PPQ5) and shimmer (amplitude perturbation) from a sustained vowel. Normal jitter < 1.04%; normal shimmer < 3.81 dB. Elevated values indicate irregular vibration, which is associated with dysphonia. |
| vocametrix_calculate_voice_range_profile | Calculate the Voice Range Profile (VRP / ambitus / glissando) from a glissando recording. Returns frequency range (lowest to highest pitch) and intensity range with age/gender interpretation. Useful for singers and voice rehabilitation assessment. |
| vocametrix_calculate_prosody_similarity | Compare prosodic patterns between a model (reference) recording and a learner recording. Returns similarity scores for pitch contour, intensity, duration, and pause patterns. Useful for accent coaching, speech imitation training, and L2 pronunciation. |
| vocametrix_calculate_spectral | Extract advanced spectral measures from a sustained vowel: center of gravity, skewness/kurtosis, H1-H2 (breathiness indicator), H1-A1, H1-A3, LTAS slope and tilt, alpha ratio. Returns age/gender-normalized norms and voice pattern classification. |
| vocametrix_calculate_formants | Compute F1–F4 formant statistics (mean, SD, range, CV, IQR) from a sustained vowel with vowel-space stability and articulatory precision scores. Useful for dysarthria assessment, vowel space analysis, and cleft palate evaluation. |
| vocametrix_calculate_sz_ratio | Calculate the S/Z phonation ratio (duration of sustained /s/ vs /z/). Normal ratio ≈ 1.0. Ratio > 1.4 suggests vocal fold pathology (the /z/ is shorter). Requires two separate recordings: one of sustained /s/ and one of sustained /z/. |
| vocametrix_calculate_gne | Calculate the Glottal-to-Noise Excitation (GNE) ratio from a sustained vowel. GNE ranges 0–1; values < 0.5 suggest increased noise (breathiness/hoarseness). Computed via native Praat algorithm for clinical reliability. |
| vocametrix_calculate_h1_h2 | Calculate the formant-corrected H1*–H2* voice source measure from a sustained vowel. H1*–H2* is sensitive to breathiness: positive values indicate breathy voice, negative values indicate pressed/tense voice. Normal range: −2 to +2 dB. |
| vocametrix_calculate_abi | Calculate the Acoustic Breathiness Index (ABI) combining connected speech and sustained vowel. ABI aggregates CPPS, jitter, GNE approximation, HNR (6 kHz), H1-H2, shimmer, and period SD. Sensitive to the full spectrum from breathy to pressed phonation. |
| vocametrix_calculate_voice_dynamics | Compute intensity dynamics, pitch-intensity correlation, and composite scores for voice control, projection, stability, effort, and monotonicity. Useful for voice training, public speaking coaching, and vocal fatigue assessment. |
| vocametrix_assess_pronunciation | Score pronunciation accuracy at phoneme level against a reference text. Returns accuracy, fluency, completeness, and prosody scores (0–100) plus per-word and per-phoneme breakdowns. Supports 30+ locales (en-US, fr-FR, de-DE, zh-CN, ar-SA, etc.). Audio should be a clear reading of the reference text. |
| vocametrix_assess_pronunciation_with_pitch | Pronunciation assessment enriched with per-word F0 (pitch) contours. In addition to accuracy/fluency/prosody scores, returns fundamental frequency (pitch) statistics for each word, which is useful for tonal language analysis and prosody coaching. |
| vocametrix_transcribe_audio | Transcribe an audio file using Azure Speech-to-Text with streaming progress. Returns a transcriptionId and streams progress events via SSE until completion, then returns the final transcript and word-level timing. For long recordings, poll the progress events; transcription may take 30–120 seconds. |
| vocametrix_synthesize_speech | Synthesize speech from text using Azure neural text-to-speech. Returns an audio URL and word-level timing data. Supports all Azure Neural voice names for the requested locale. |
| vocametrix_synthesize_speech_with_timing | Synthesize speech via ElevenLabs v2 with per-character timing alignment. Returns audio data and a character-level timing map, useful for lip-sync, subtitles, and karaoke. Supports plain text or SSML markup. |
| vocametrix_measure_sound_level | Measure sound level in dB SPL over a specified time window in an audio file. Useful for environmental noise assessment, vocal loudness measurement, and calibration tasks. Note: startSec must be > 0 (use 0.001 for the start of the file). |
| vocametrix_extract_egemaps | Extract the full openSMILE eGeMAPSv02 feature set (88 acoustic features) from an audio file. Features include F0, jitter, shimmer, HNR, MFCCs, formants, spectral flux, and loudness. Commonly used as input to machine-learning voice pathology classifiers. |
| vocametrix_detect_phonemes | Detect phonemes in an audio recording using a deep-learning classifier. Returns phoneme labels with confidence scores. Currently supports French (fr) and Estonian (et) phoneme inventories. |
| vocametrix_classify_stuttering | Classify stuttering disfluency patterns in a speech recording (async, ~30–120 seconds). Returns disfluency types (repetitions, prolongations, blocks), severity score, and fluency rate. The tool polls the result automatically, so no separate status call is needed. |
| vocametrix_interpret_voice_metrics | Translate raw voice metrics (jitter, shimmer, HNR, CPPS, F0, etc.) into clinical-language interpretation with severity classification (normal / mild / moderate / severe) and actionable recommendations. Useful when you have metric values from other tools and want a clinician-readable summary. |
| vocametrix_generate_exercises | Generate personalized speech therapy exercises tailored to patient profile, pathology, and language. Returns structured exercises with instructions, target phonemes, difficulty level, and therapist tips. |
| vocametrix_generate_word_list | Generate a word list targeting a specific phoneme with pronunciation hints and difficulty progression. Useful for articulation therapy, phonological awareness drills, and accent training. |
| vocametrix_chat_speech_therapist | Expert speech therapy assistant providing role-based guidance. Adapts its answers depending on whether the user is a therapist (clinical detail), a patient (accessible explanation), or a parent/caregiver (practical home tips). Maintains conversation context via threadId for multi-turn dialogue. |
| vocametrix_convert_french_to_ipa | Convert French words or phrases to International Phonetic Alphabet (IPA) transcription. Accepts a single string or an array of up to 20 words. Returns IPA transcription per word with optional syllable boundary marks. |
| vocametrix_interpret_spelling_attempt | Interpret a speech-to-text transcription of a spelling attempt and give intelligent feedback. Returns whether the spelling matches, an explanation of differences, and correction guidance. Useful for spelling therapy apps where children spell words aloud. |
| vocametrix_check_syntax | Analyze text for grammar and syntax errors with severity classification (error/warning/info). Returns overall score, per-issue breakdown, corrected text, and readability statistics. Useful for evaluating written language samples in speech-language assessments. |
| vocametrix_vocabulary_tutor | Conversational vocabulary tutor adapting to learner profile (native language, target language, age, topic). Uses spaced repetition principles. Maintains conversation context via threadId. |
| vocametrix_adapt_exercise | Adapt a speech therapy exercise to a specific learner profile (ADHD, dyslexia, dysgraphia, dyspraxia, Tourette, autism). Returns an HTML-formatted adapted version of the exercise with profile-specific tips. |
| vocametrix_generate_therapy_plan | Launch an asynchronous LangGraph-powered therapy plan generation from session audio embeddings. Returns a therapy_session_id. Use vocametrix_get_therapy_status to poll progress, then vocametrix_get_therapy_result to retrieve the plan once complete (~30–120 seconds). Requires wav2vec embeddings: run eGeMAPS or embedding extraction first. |
| vocametrix_get_therapy_status | Poll the status of an async therapy plan generation or stuttering classification session. Statuses: pending → processing → pending_approval → complete (or failed). result_available = true means you can call vocametrix_get_therapy_result. |
| vocametrix_get_therapy_result | Retrieve the completed therapy plan result. Only call when vocametrix_get_therapy_status returns result_available = true or status = 'complete'. Returns the full therapy session with exercise plans, clinical narrative, and HTML report path. |
| vocametrix_approve_therapy_plan | Human-in-the-loop approval gate for generated therapy plans. Actions: 'approve' (locks and delivers plan), 'reject' (discards), 'modify' (requires feedback, re-generates). This action is irreversible: once approved, the plan is sent for delivery. |
| vocametrix_full_voice_assessment | Run a comprehensive clinical voice assessment in a single call. Executes AVQI, CPP, multi-band HNR, jitter/shimmer, and spectral analysis in parallel, then returns a unified JSON report with all metrics and clinical severity interpretation. Requires both a sustained vowel recording (e.g. /a/ held 3+ s) and a connected speech recording. This is the tool an SLP would use for a full voice quality screening. |
| vocametrix_batch_pronunciation | Assess pronunciation for all WAV files in a folder against a common reference text. Returns a table (Markdown + JSON) with accuracy, fluency, completeness, and prosody scores per file. Files are processed sequentially to stay within rate limits. Useful for classroom assessments, research cohorts, and batch L2 evaluation. |
| vocametrix_full_therapy_workflow | End-to-end therapy plan generation with automatic polling and human-in-the-loop approval. Generates a therapy plan from session data, polls until complete, and presents it for approval. Returns the approved plan or the pending plan awaiting your approval action. After reviewing, call vocametrix_approve_therapy_plan with 'approve', 'modify', or 'reject'. |
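The asynchronous therapy tools follow a start, poll, retrieve pattern. The sketch below illustrates one way a client might poll, using the status sequence described for vocametrix_get_therapy_status. It is not the server's implementation: `get_status` is a hypothetical callable wrapping that tool, the response shape (a dict with `status` and `result_available`) is an assumption, and stopping at `pending_approval` so a human can review the plan is a design choice made here.

```python
import time

# States after which polling should stop (names taken from the tool description).
STOP_STATES = {"pending_approval", "complete", "failed"}


def poll_until_done(get_status, session_id, interval_s=2.0, timeout_s=180.0,
                    sleep=time.sleep):
    """Poll a status callable until the session reaches a stopping state.

    get_status: hypothetical callable, session_id -> dict with a "status" key
                and optionally "result_available" (assumed response shape).
    Raises TimeoutError if no stopping state is seen within timeout_s.
    """
    waited = 0.0
    while True:
        status = get_status(session_id)
        if status.get("status") in STOP_STATES or status.get("result_available"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"session {session_id} still running after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. A caller would typically fetch the result when the returned status is `complete`, and route `pending_approval` to a human reviewer before calling vocametrix_approve_therapy_plan.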
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| interpret_voice_assessment | Generate a clinical SLP-style interpretation of voice assessment results. Provide the JSON output from vocametrix_full_voice_assessment or individual metric tools. |
| compare_pre_post_therapy | Generate a narrative comparison between two voice assessments (pre- and post-therapy). Quantifies improvement and interprets clinical significance. |
| generate_session_report | Generate a structured therapy session report from pronunciation assessment data. Suitable for clinical documentation and patient progress notes. |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| api-docs | Vocametrix API quick reference: auth, rate limits, audio requirements, error codes |
| Thresholds: AVQI | Clinical reference thresholds for AVQI |
| Thresholds: DSI | Clinical reference thresholds for DSI |
| Thresholds: CPP | Clinical reference thresholds for CPP |
| Thresholds: HNR | Clinical reference thresholds for HNR |
| Thresholds: JITTER-SHIMMER | Clinical reference thresholds for JITTER-SHIMMER |
| Thresholds: GNE | Clinical reference thresholds for GNE |
| Thresholds: AVQI_LOCALES | Clinical reference thresholds for AVQI_LOCALES |
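The cut-offs quoted in the tool descriptions can be applied with a few comparisons. This is a minimal sketch using only the values stated in this listing (AVQI above 2.43 for French or 2.97 for English, S/Z ratio above 1.4); the function names are hypothetical, and the actual contents of the threshold resources are not shown here, so they may hold additional or different values.

```python
# AVQI dysphonia cut-offs as quoted in the vocametrix_calculate_avqi description.
AVQI_CUTOFFS = {"fr": 2.43, "en": 2.97}

# S/Z ratio above which the listing says vocal fold pathology is suggested.
SZ_RATIO_CUTOFF = 1.4


def avqi_suggests_dysphonia(score, language):
    """True if the AVQI score exceeds the cut-off for the given language code."""
    return score > AVQI_CUTOFFS[language]


def sz_ratio(s_seconds, z_seconds):
    """Duration of sustained /s/ divided by duration of sustained /z/."""
    if z_seconds <= 0:
        raise ValueError("z_seconds must be positive")
    return s_seconds / z_seconds


def sz_suggests_pathology(s_seconds, z_seconds):
    """True if the S/Z ratio exceeds the quoted 1.4 cut-off."""
    return sz_ratio(s_seconds, z_seconds) > SZ_RATIO_CUTOFF
```

For example, an AVQI of 2.5 is above the French cut-off but below the English one, which is why the language has to be passed alongside the score.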
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/pmarmaroli/vocametrix-mcp'
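The same request can be issued from Python with the standard library. This is a sketch mirroring the curl command above; the helper names are hypothetical, and treating the response body as JSON is an assumption rather than something this page states.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner, name):
    """Build a GET request for one server entry in the MCP directory."""
    url = f"{BASE_URL}/{owner}/{name}"
    return urllib.request.Request(url, method="GET",
                                  headers={"Accept": "application/json"})


def fetch_server(owner, name, timeout=10):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("pmarmaroli", "vocametrix-mcp")` targets the same URL as the curl command.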
If you have feedback or need assistance with the MCP directory API, please join our Discord server.