Glama

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default
VOCAMETRIX_API_KEY | Yes | Your Vocametrix API key |
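
As a minimal sketch, a launcher script could check for the key before starting the server. Only the variable name VOCAMETRIX_API_KEY comes from the table above; the helper name and the error message are illustrative assumptions.

```python
import os


def require_api_key(env=None):
    """Return the Vocametrix API key, or raise if it is missing or blank.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to test. The helper itself is hypothetical.
    """
    env = os.environ if env is None else env
    key = env.get("VOCAMETRIX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("VOCAMETRIX_API_KEY is required but not set")
    return key
```

Failing early like this gives a clearer message than letting the first tool call fail on authentication.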

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | { "listChanged": true }
prompts | { "listChanged": true }
resources | { "listChanged": true }

Tools

Functions exposed to the LLM to take actions

Name | Description
vocametrix_calculate_avqi

Calculate the Acoustic Voice Quality Index (AVQI v2.03 or v3.01), a clinically validated dysphonia score. AVQI > 2.43 (French) / 2.97 (English) indicates dysphonia. Requires a sustained vowel recording (e.g. /a/ held for 3+ seconds). Connected speech is optional but improves accuracy.
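
The cutoffs quoted above can be applied to a returned score with a small helper. This is a sketch only: the two numbers come from the description, while the locale keys and the function name are hypothetical and say nothing about the tool's actual output format.

```python
# Cutoffs quoted in the tool description; the "fr"/"en" keys are assumed labels.
AVQI_CUTOFFS = {"fr": 2.43, "en": 2.97}


def avqi_indicates_dysphonia(score, locale):
    """True when the AVQI score is above the cutoff for the given locale."""
    return score > AVQI_CUTOFFS[locale]
```

The same score can fall on different sides of the cutoff depending on the locale, so the locale should be recorded alongside the score.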

vocametrix_calculate_dsi

Calculate the Dysphonia Severity Index (DSI). DSI > 1.6 = normal voice; DSI < –1.6 = severe dysphonia. Requires a sustained vowel WAV file plus voice-range parameters (MPT, F0 range, minimum intensity).
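
A sketch of the two DSI bounds stated above. The description does not label values between -1.6 and 1.6, so the helper reports them as "unlabeled" rather than guessing; the function name and the returned strings are assumptions.

```python
def classify_dsi(dsi):
    """Map a DSI value to the labels given in the tool description."""
    if dsi > 1.6:
        return "normal voice"
    if dsi < -1.6:
        return "severe dysphonia"
    # The description gives no label for the range in between.
    return "unlabeled"
```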

vocametrix_calculate_cpp

Calculate Cepstral Peak Prominence (CPP) from a sustained vowel. Higher CPP = better voice quality. Typical normal CPP: 20–28 dB. Clinically sensitive to breathiness and hoarseness.

vocametrix_calculate_hnr

Calculate multi-band Harmonics-to-Noise Ratio (HNR) across frequency bands (80–8000 Hz) with age- and gender-specific norms. Higher HNR = cleaner voice. Normal HNR (500 Hz band): > 20 dB. Requires a sustained vowel.

vocametrix_calculate_jitter_shimmer

Calculate jitter (period perturbation, PPQ5) and shimmer (amplitude perturbation) from a sustained vowel. Normal jitter < 1.04%; normal shimmer < 3.81 dB. Elevated values indicate irregular vibration — associated with dysphonia.

vocametrix_calculate_voice_range_profile

Calculate the Voice Range Profile (VRP / ambitus / glissando) from a glissando recording. Returns frequency range (lowest to highest pitch) and intensity range with age/gender interpretation. Useful for singers and voice rehabilitation assessment.

vocametrix_calculate_prosody_similarity

Compare prosodic patterns between a model (reference) recording and a learner recording. Returns similarity scores for pitch contour, intensity, duration, and pause patterns. Useful for accent coaching, speech imitation training, and L2 pronunciation.

vocametrix_calculate_spectral

Extract advanced spectral measures from a sustained vowel: center of gravity, skewness/kurtosis, H1-H2 (breathiness indicator), H1-A1, H1-A3, LTAS slope and tilt, alpha ratio. Returns age/gender-normalized norms and voice pattern classification.

vocametrix_calculate_formants

Compute F1–F4 formant statistics (mean, SD, range, CV, IQR) from a sustained vowel with vowel-space stability and articulatory precision scores. Useful for dysarthria assessment, vowel space analysis, and cleft palate evaluation.

vocametrix_calculate_sz_ratio

Calculate the S/Z phonation ratio (duration of sustained /s/ vs /z/). Normal ratio ≈ 1.0. Ratio > 1.4 suggests vocal fold pathology (the /z/ is shorter). Requires two separate recordings: one of sustained /s/ and one of sustained /z/.
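
The arithmetic behind the ratio is simple enough to sketch. The 1.4 cutoff is the one quoted above; the function names and the choice to reject non-positive durations are assumptions, and the tool itself works from recordings rather than pre-measured durations.

```python
def sz_ratio(s_seconds, z_seconds):
    """Duration of sustained /s/ divided by duration of sustained /z/."""
    if s_seconds <= 0 or z_seconds <= 0:
        raise ValueError("both durations must be positive")
    return s_seconds / z_seconds


def suggests_pathology(ratio, cutoff=1.4):
    """True when the ratio exceeds the cutoff quoted in the description."""
    return ratio > cutoff
```

For example, /s/ held for 15 s and /z/ held for 10 s gives a ratio of 1.5, which is above the cutoff.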

vocametrix_calculate_gne

Calculate the Glottal-to-Noise Excitation (GNE) ratio from a sustained vowel. GNE ranges 0–1; values < 0.5 suggest increased noise (breathiness/hoarseness). Computed via native Praat algorithm for clinical reliability.

vocametrix_calculate_h1_h2

Calculate the formant-corrected H1*–H2* voice source measure from a sustained vowel. H1*–H2* is sensitive to breathiness: positive values indicate breathy voice, negative values indicate pressed/tense voice. Normal range: −2 to +2 dB.
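
One way to read the description is sketched below. It assumes that values inside the stated normal range count as normal regardless of sign, and that only values outside it are labeled breathy or pressed; that reading, the function name, and the returned labels are assumptions, not documented behavior.

```python
def classify_h1_h2(value_db, low=-2.0, high=2.0):
    """Label an H1*-H2* value (in dB) against the stated normal range."""
    if value_db > high:
        return "breathy"
    if value_db < low:
        return "pressed/tense"
    return "normal"
```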

vocametrix_calculate_abi

Calculate the Acoustic Breathiness Index (ABI) combining connected speech and sustained vowel. ABI aggregates CPPS, jitter, GNE approximation, HNR (6 kHz), H1-H2, shimmer, and period SD. Sensitive to the full spectrum from breathy to pressed phonation.

vocametrix_calculate_voice_dynamics

Compute intensity dynamics, pitch-intensity correlation, and composite scores for voice control, projection, stability, effort, and monotonicity. Useful for voice training, public speaking coaching, and vocal fatigue assessment.

vocametrix_assess_pronunciation

Score pronunciation accuracy at phoneme level against a reference text. Returns accuracy, fluency, completeness, and prosody scores (0–100) plus per-word and per-phoneme breakdowns. Supports 30+ locales (en-US, fr-FR, de-DE, zh-CN, ar-SA, etc.). Audio should be a clear reading of the reference text.

vocametrix_assess_pronunciation_with_pitch

Pronunciation assessment enriched with per-word F0 (pitch) contours. In addition to accuracy/fluency/prosody scores, returns fundamental frequency (pitch) statistics for each word — useful for tonal language analysis and prosody coaching.

vocametrix_transcribe_audio

Transcribe an audio file using Azure Speech-to-Text with streaming progress. Returns a transcriptionId, streams progress events via SSE until completion, then returns the final transcript with word-level timing. For long recordings, poll the progress events; transcription may take 30–120 seconds.

vocametrix_synthesize_speech

Synthesize speech from text using Azure neural text-to-speech. Returns an audio URL and word-level timing data. Supports all Azure Neural voice names for the requested locale.

vocametrix_synthesize_speech_with_timing

Synthesize speech via ElevenLabs v2 with per-character timing alignment. Returns audio data and a character-level timing map — useful for lip-sync, subtitles, and karaoke. Supports plain text or SSML markup.

vocametrix_measure_sound_level

Measure sound level in dB SPL over a specified time window in an audio file. Useful for environmental noise assessment, vocal loudness measurement, and calibration tasks. Note: startSec must be > 0 (use 0.001 for the start of the file).
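
A caller can guard against the startSec constraint before invoking the tool. In this sketch only the parameter name startSec and the 0.001 workaround come from the note above; the helper name and the decision to reject negative values are assumptions.

```python
MIN_START_SEC = 0.001  # workaround value suggested in the tool description


def safe_start_sec(start_sec):
    """Return a startSec value that satisfies the 'must be > 0' rule."""
    if start_sec < 0:
        raise ValueError("startSec cannot be negative")
    # Exactly 0 means "start of file", which the tool expects as 0.001.
    return start_sec if start_sec > 0 else MIN_START_SEC
```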

vocametrix_extract_egemaps

Extract the full openSMILE eGeMAPSv02 feature set (88 acoustic features) from an audio file. Features include F0, jitter, shimmer, HNR, MFCCs, formants, spectral flux, and loudness. Commonly used as input to machine-learning voice pathology classifiers.

vocametrix_detect_phonemes

Detect phonemes in an audio recording using a deep-learning classifier. Returns phoneme labels with confidence scores. Currently supports French (fr) and Estonian (et) phoneme inventories.

vocametrix_classify_stuttering

Classify stuttering disfluency patterns in a speech recording (async, ~30–120 seconds). Returns disfluency types (repetitions, prolongations, blocks), severity score, and fluency rate. The tool polls the result automatically — no separate status call needed.

vocametrix_interpret_voice_metrics

Translate raw voice metrics (jitter, shimmer, HNR, CPPS, F0, etc.) into clinical-language interpretation with severity classification (normal / mild / moderate / severe) and actionable recommendations. Useful when you have metric values from other tools and want a clinician-readable summary.

vocametrix_generate_exercises

Generate personalized speech therapy exercises tailored to patient profile, pathology, and language. Returns structured exercises with instructions, target phonemes, difficulty level, and therapist tips.

vocametrix_generate_word_list

Generate a word list targeting a specific phoneme with pronunciation hints and difficulty progression. Useful for articulation therapy, phonological awareness drills, and accent training.

vocametrix_chat_speech_therapist

Expert speech therapy assistant providing role-based guidance. Adapts its answers depending on whether the user is a therapist (clinical detail), a patient (accessible explanation), or a parent/caregiver (practical home tips). Maintains conversation context via threadId for multi-turn dialogue.

vocametrix_convert_french_to_ipa

Convert French words or phrases to International Phonetic Alphabet (IPA) transcription. Accepts a single string or an array of up to 20 words. Returns IPA transcription per word with optional syllable boundary marks.
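
Because an array is capped at 20 words, a longer list has to be split across several calls. The sketch below shows only the splitting step; the helper name is hypothetical and the actual tool call is left out.

```python
def batch_words(words, limit=20):
    """Split a word list into consecutive chunks of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [words[i:i + limit] for i in range(0, len(words), limit)]
```

A list of 45 words, for instance, would be sent as three requests of 20, 20, and 5 words.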

vocametrix_interpret_spelling_attempt

Interpret a speech-to-text transcription of a spelling attempt and give intelligent feedback. Returns whether the spelling matches, an explanation of differences, and correction guidance. Useful for spelling therapy apps where children spell words aloud.

vocametrix_check_syntax

Analyze text for grammar and syntax errors with severity classification (error/warning/info). Returns overall score, per-issue breakdown, corrected text, and readability statistics. Useful for evaluating written language samples in speech-language assessments.

vocametrix_vocabulary_tutor

Conversational vocabulary tutor adapting to learner profile (native language, target language, age, topic). Uses spaced repetition principles. Maintain conversation context via threadId.

vocametrix_adapt_exercise

Adapt a speech therapy exercise to a specific learner profile (ADHD, dyslexia, dysgraphia, dyspraxia, Tourette, autism). Returns an HTML-formatted adapted version of the exercise with profile-specific tips.

vocametrix_generate_therapy_plan

Launch an asynchronous LangGraph-powered therapy plan generation from session audio embeddings. Returns a therapy_session_id. Use vocametrix_get_therapy_status to poll progress, then vocametrix_get_therapy_result to retrieve the plan once complete (~30–120 seconds). Requires wav2vec embeddings — run eGeMAPS or embedding extraction first.

vocametrix_get_therapy_status

Poll the status of an async therapy plan generation or stuttering classification session. Statuses: pending → processing → pending_approval → complete (or failed). result_available = true means you can call vocametrix_get_therapy_result.
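
The polling pattern described above might look like the following sketch. It assumes the status report is a dict with a "status" key and a "result_available" key; only the status names and the result_available flag come from the description. The `get_status` callable stands in for whatever invokes vocametrix_get_therapy_status, and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_therapy_result(get_status, interval_sec=5.0, timeout_sec=180.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll until a result is available, the session fails, or time runs out.

    `get_status` is a hypothetical zero-argument callable returning a dict
    such as {"status": "processing", "result_available": False}.
    """
    deadline = clock() + timeout_sec
    while True:
        report = get_status()
        state = report.get("status")
        if report.get("result_available") or state == "complete":
            return report
        if state == "failed":
            raise RuntimeError("session reported status 'failed'")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_sec} s (last status: {state})")
        sleep(interval_sec)
```

How the server sets result_available while a plan sits in pending_approval is not stated, so the sketch relies only on the two conditions the description names.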

vocametrix_get_therapy_result

Retrieve the completed therapy plan result. Only call when vocametrix_get_therapy_status returns result_available = true or status = 'complete'. Returns the full therapy session with exercise plans, clinical narrative, and HTML report path.

vocametrix_approve_therapy_plan

Human-in-the-loop approval gate for generated therapy plans. Actions: 'approve' (locks and delivers the plan), 'reject' (discards it), 'modify' (requires feedback and re-generates the plan). Approval is irreversible: once approved, the plan is sent for delivery.
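
The action rules can be checked on the client side before the tool is called. In this sketch the three action names and the rule that 'modify' needs feedback come from the description; the argument dict keys ("action", "feedback") and the helper name are assumptions about a shape the listing does not document.

```python
ALLOWED_ACTIONS = {"approve", "reject", "modify"}


def build_approval_args(action, feedback=None):
    """Validate an approval action and return a hypothetical argument dict."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    has_feedback = bool(feedback and feedback.strip())
    if action == "modify" and not has_feedback:
        raise ValueError("'modify' requires feedback")
    args = {"action": action}
    if has_feedback:
        args["feedback"] = feedback.strip()
    return args
```

Since approval cannot be undone, a client may also want an explicit confirmation step before sending 'approve'.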

vocametrix_full_voice_assessment

Run a comprehensive clinical voice assessment in a single call. Executes AVQI, CPP, multi-band HNR, jitter/shimmer, and spectral analysis in parallel, then returns a unified JSON report with all metrics and clinical severity interpretation. Requires both a sustained vowel recording (e.g. /a/ held 3+ s) and a connected speech recording. This is the tool an SLP would use for a full voice quality screening.

vocametrix_batch_pronunciation

Assess pronunciation for all WAV files in a folder against a common reference text. Returns a table (Markdown + JSON) with accuracy, fluency, completeness, and prosody scores per file. Files are processed sequentially to stay within rate limits. Useful for classroom assessments, research cohorts, and batch L2 evaluation.

vocametrix_full_therapy_workflow

End-to-end therapy plan generation with automatic polling and human-in-the-loop approval. Generates a therapy plan from session data, polls until complete, and presents it for approval. Returns the approved plan or the pending plan awaiting your approval action. After reviewing, call vocametrix_approve_therapy_plan with 'approve', 'modify', or 'reject'.

Prompts

Interactive templates invoked by user choice

Name | Description
interpret_voice_assessment | Generate a clinical SLP-style interpretation of voice assessment results. Provide the JSON output from vocametrix_full_voice_assessment or individual metric tools.
compare_pre_post_therapy | Generate a narrative comparison between two voice assessments (pre- and post-therapy). Quantifies improvement and interprets clinical significance.
generate_session_report | Generate a structured therapy session report from pronunciation assessment data. Suitable for clinical documentation and patient progress notes.

Resources

Contextual data attached and managed by the client

Name | Description
api-docs | Vocametrix API quick reference: auth, rate limits, audio requirements, error codes
Thresholds: AVQI | Clinical reference thresholds for AVQI
Thresholds: DSI | Clinical reference thresholds for DSI
Thresholds: CPP | Clinical reference thresholds for CPP
Thresholds: HNR | Clinical reference thresholds for HNR
Thresholds: JITTER-SHIMMER | Clinical reference thresholds for JITTER-SHIMMER
Thresholds: GNE | Clinical reference thresholds for GNE
Thresholds: AVQI_LOCALES | Clinical reference thresholds for AVQI_LOCALES

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pmarmaroli/vocametrix-mcp'
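
The same request can be sketched with Python's standard library. The URL is the one from the curl command; the function names are illustrative, and the response is returned as raw text because its format is not described here.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory URL for a server, e.g. pmarmaroli/vocametrix-mcp."""
    return f"{BASE_URL}/{owner}/{name}"


def fetch_server(owner, name, timeout=10):
    """Send the GET request and return the response body as text."""
    request = Request(server_url(owner, name), method="GET")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_server("pmarmaroli", "vocametrix-mcp") issues the same GET as the curl command above.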

If you have feedback or need assistance with the MCP directory API, please join our Discord server.