Schema | vocametrix


Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
`VOCAMETRIX_API_KEY`	Yes	Your Vocametrix API key
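
A minimal sketch of how a launcher script might check for the required variable before starting the server. Only the variable name `VOCAMETRIX_API_KEY` comes from the table above; the helper `load_api_key` and its error message are hypothetical.

```python
import os


def load_api_key(env=None):
    """Return the Vocametrix API key, or raise if it is missing or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the helper easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("VOCAMETRIX_API_KEY", "").strip()
    if not key:
        # The table lists no default, so a missing key is treated as fatal.
        raise RuntimeError("VOCAMETRIX_API_KEY is not set")
    return key
```

Failing early like this gives a clearer error than letting the first tool call fail on authentication.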

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`prompts`	{ "listChanged": true }
`resources`	{ "listChanged": true }
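
In MCP, `listChanged: true` signals that the server may notify the client when the corresponding list changes. The sketch below shows one way a client could check for that flag, assuming the capabilities arrive as a JSON object shaped like the table above; the helper name `supports_list_changed` is hypothetical.

```python
import json

# Capabilities exactly as listed in the table above.
SERVER_CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": true},
  "prompts": {"listChanged": true},
  "resources": {"listChanged": true}
}
""")


def supports_list_changed(capabilities, feature):
    """Return True if `feature` is declared with listChanged enabled."""
    return bool(capabilities.get(feature, {}).get("listChanged", False))
```

A client that sees `True` here can cache the tool, prompt, or resource list and refresh it only when a change notification arrives.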

Tools

Functions exposed to the LLM to take actions

Name	Description
vocametrix_calculate_avqi	Calculate the Acoustic Voice Quality Index (AVQI v2.03 or v3.01), a clinically validated dysphonia score. AVQI > 2.43 (French) / 2.97 (English) indicates dysphonia. Requires a sustained vowel recording (e.g. /a/ held for 3+ seconds). Connected speech is optional but improves accuracy.
vocametrix_calculate_dsi	Calculate the Dysphonia Severity Index (DSI). DSI > 1.6 = normal voice; DSI < −1.6 = severe dysphonia. Requires a sustained vowel WAV file plus voice-range parameters (MPT, F0 range, minimum intensity).
vocametrix_calculate_cpp	Calculate Cepstral Peak Prominence (CPP) from a sustained vowel. Higher CPP = better voice quality. Typical normal CPP: 20–28 dB. Clinically sensitive to breathiness and hoarseness.
vocametrix_calculate_hnr	Calculate multi-band Harmonics-to-Noise Ratio (HNR) across frequency bands (80–8000 Hz) with age- and gender-specific norms. Higher HNR = cleaner voice. Normal HNR (500 Hz band): > 20 dB. Requires a sustained vowel.
vocametrix_calculate_jitter_shimmer	Calculate jitter (period perturbation, PPQ5) and shimmer (amplitude perturbation) from a sustained vowel. Normal jitter < 1.04%; normal shimmer < 3.81 dB. Elevated values indicate irregular vibration, which is associated with dysphonia.
vocametrix_calculate_voice_range_profile	Calculate the Voice Range Profile (VRP / ambitus / glissando) from a glissando recording. Returns frequency range (lowest to highest pitch) and intensity range with age/gender interpretation. Useful for singers and voice rehabilitation assessment.
vocametrix_calculate_prosody_similarity	Compare prosodic patterns between a model (reference) recording and a learner recording. Returns similarity scores for pitch contour, intensity, duration, and pause patterns. Useful for accent coaching, speech imitation training, and L2 pronunciation.
vocametrix_calculate_spectral	Extract advanced spectral measures from a sustained vowel: center of gravity, skewness/kurtosis, H1-H2 (breathiness indicator), H1-A1, H1-A3, LTAS slope and tilt, alpha ratio. Returns age/gender-normalized norms and voice pattern classification.
vocametrix_calculate_formants	Compute F1–F4 formant statistics (mean, SD, range, CV, IQR) from a sustained vowel with vowel-space stability and articulatory precision scores. Useful for dysarthria assessment, vowel space analysis, and cleft palate evaluation.
vocametrix_calculate_sz_ratio	Calculate the S/Z phonation ratio (duration of sustained /s/ vs /z/). Normal ratio ≈ 1.0. Ratio > 1.4 suggests vocal fold pathology (the /z/ is shorter). Requires two separate recordings: one of sustained /s/ and one of sustained /z/.
vocametrix_calculate_gne	Calculate the Glottal-to-Noise Excitation (GNE) ratio from a sustained vowel. GNE ranges 0–1; values < 0.5 suggest increased noise (breathiness/hoarseness). Computed via native Praat algorithm for clinical reliability.
vocametrix_calculate_h1_h2	Calculate the formant-corrected H1–H2 voice source measure from a sustained vowel. H1–H2 is sensitive to breathiness: positive values indicate breathy voice, negative values indicate pressed/tense voice. Normal range: −2 to +2 dB.
vocametrix_calculate_abi	Calculate the Acoustic Breathiness Index (ABI) combining connected speech and sustained vowel. ABI aggregates CPPS, jitter, GNE approximation, HNR (6 kHz), H1-H2, shimmer, and period SD. Sensitive to the full spectrum from breathy to pressed phonation.
vocametrix_calculate_voice_dynamics	Compute intensity dynamics, pitch-intensity correlation, and composite scores for voice control, projection, stability, effort, and monotonicity. Useful for voice training, public speaking coaching, and vocal fatigue assessment.
vocametrix_assess_pronunciation	Score pronunciation accuracy at phoneme level against a reference text. Returns accuracy, fluency, completeness, and prosody scores (0–100) plus per-word and per-phoneme breakdowns. Supports 30+ locales (en-US, fr-FR, de-DE, zh-CN, ar-SA, etc.). Audio should be a clear reading of the reference text.
vocametrix_assess_pronunciation_with_pitch	Pronunciation assessment enriched with per-word F0 (pitch) contours. In addition to accuracy/fluency/prosody scores, returns fundamental frequency (pitch) statistics for each word, which is useful for tonal language analysis and prosody coaching.
vocametrix_transcribe_audio	Transcribe an audio file using Azure Speech-to-Text with streaming progress. Returns a transcriptionId, streams progress events via SSE until completion, then returns the final transcript and word-level timing. For long recordings, follow the progress events; transcription may take 30–120 seconds.
vocametrix_synthesize_speech	Synthesize speech from text using Azure neural text-to-speech. Returns an audio URL and word-level timing data. Supports all Azure Neural voice names for the requested locale.
vocametrix_synthesize_speech_with_timing	Synthesize speech via ElevenLabs v2 with per-character timing alignment. Returns audio data and a character-level timing map, useful for lip-sync, subtitles, and karaoke. Supports plain text or SSML markup.
vocametrix_measure_sound_level	Measure sound level in dB SPL over a specified time window in an audio file. Useful for environmental noise assessment, vocal loudness measurement, and calibration tasks. Note: startSec must be > 0 (use 0.001 for the start of the file).
vocametrix_extract_egemaps	Extract the full openSMILE eGeMAPSv02 feature set (88 acoustic features) from an audio file. Features include F0, jitter, shimmer, HNR, MFCCs, formants, spectral flux, and loudness. Commonly used as input to machine-learning voice pathology classifiers.
vocametrix_detect_phonemes	Detect phonemes in an audio recording using a deep-learning classifier. Returns phoneme labels with confidence scores. Currently supports French (fr) and Estonian (et) phoneme inventories.
vocametrix_classify_stuttering	Classify stuttering disfluency patterns in a speech recording (async, ~30–120 seconds). Returns disfluency types (repetitions, prolongations, blocks), severity score, and fluency rate. The tool polls for the result automatically, so no separate status call is needed.
vocametrix_interpret_voice_metrics	Translate raw voice metrics (jitter, shimmer, HNR, CPPS, F0, etc.) into clinical-language interpretation with severity classification (normal / mild / moderate / severe) and actionable recommendations. Useful when you have metric values from other tools and want a clinician-readable summary.
vocametrix_generate_exercises	Generate personalized speech therapy exercises tailored to patient profile, pathology, and language. Returns structured exercises with instructions, target phonemes, difficulty level, and therapist tips.
vocametrix_generate_word_list	Generate a word list targeting a specific phoneme with pronunciation hints and difficulty progression. Useful for articulation therapy, phonological awareness drills, and accent training.
vocametrix_chat_speech_therapist	Expert speech therapy assistant providing role-based guidance. Adapts its answers depending on whether the user is a therapist (clinical detail), a patient (accessible explanation), or a parent/caregiver (practical home tips). Maintains conversation context via threadId for multi-turn dialogue.
vocametrix_convert_french_to_ipa	Convert French words or phrases to International Phonetic Alphabet (IPA) transcription. Accepts a single string or an array of up to 20 words. Returns IPA transcription per word with optional syllable boundary marks.
vocametrix_interpret_spelling_attempt	Interpret a speech-to-text transcription of a spelling attempt and give intelligent feedback. Returns whether the spelling matches, an explanation of differences, and correction guidance. Useful for spelling therapy apps where children spell words aloud.
vocametrix_check_syntax	Analyze text for grammar and syntax errors with severity classification (error/warning/info). Returns overall score, per-issue breakdown, corrected text, and readability statistics. Useful for evaluating written language samples in speech-language assessments.
vocametrix_vocabulary_tutor	Conversational vocabulary tutor adapting to learner profile (native language, target language, age, topic). Uses spaced repetition principles. Maintains conversation context via threadId.
vocametrix_adapt_exercise	Adapt a speech therapy exercise to a specific learner profile (ADHD, dyslexia, dysgraphia, dyspraxia, Tourette, autism). Returns an HTML-formatted adapted version of the exercise with profile-specific tips.
vocametrix_generate_therapy_plan	Launch an asynchronous LangGraph-powered therapy plan generation from session audio embeddings. Returns a therapy_session_id. Use vocametrix_get_therapy_status to poll progress, then vocametrix_get_therapy_result to retrieve the plan once complete (~30–120 seconds). Requires wav2vec embeddings; run eGeMAPS or embedding extraction first.
vocametrix_get_therapy_status	Poll the status of an async therapy plan generation or stuttering classification session. Statuses: pending → processing → pending_approval → complete (or failed). result_available = true means you can call vocametrix_get_therapy_result.
vocametrix_get_therapy_result	Retrieve the completed therapy plan result. Only call when vocametrix_get_therapy_status returns result_available = true or status = 'complete'. Returns the full therapy session with exercise plans, clinical narrative, and HTML report path.
vocametrix_approve_therapy_plan	Human-in-the-loop approval gate for generated therapy plans. Actions: 'approve' (locks and delivers plan), 'reject' (discards), 'modify' (requires feedback, re-generates). Approval is irreversible: once approved, the plan is sent for delivery.
vocametrix_full_voice_assessment	Run a comprehensive clinical voice assessment in a single call. Executes AVQI, CPP, multi-band HNR, jitter/shimmer, and spectral analysis in parallel, then returns a unified JSON report with all metrics and clinical severity interpretation. Requires both a sustained vowel recording (e.g. /a/ held 3+ s) and a connected speech recording. This is the tool an SLP would use for a full voice quality screening.
vocametrix_batch_pronunciation	Assess pronunciation for all WAV files in a folder against a common reference text. Returns a table (Markdown + JSON) with accuracy, fluency, completeness, and prosody scores per file. Files are processed sequentially to stay within rate limits. Useful for classroom assessments, research cohorts, and batch L2 evaluation.
vocametrix_full_therapy_workflow	End-to-end therapy plan generation with automatic polling and human-in-the-loop approval. Generates a therapy plan from session data, polls until complete, and presents it for approval. Returns the approved plan or the pending plan awaiting your approval action. After reviewing, call vocametrix_approve_therapy_plan with 'approve', 'modify', or 'reject'.
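
The asynchronous therapy tools describe a poll-then-fetch pattern. The sketch below illustrates that loop. It assumes the status call yields a dict with the `status` and `result_available` fields named in the `vocametrix_get_therapy_status` description; the actual payload shape is not documented here. The function `poll_therapy_status` and its parameters are hypothetical, and the real tool call is injected as `get_status` so that no client library is assumed.

```python
import time

# States after which polling can stop, taken from the status list above.
TERMINAL_STATES = {"complete", "failed"}


def poll_therapy_status(get_status, interval_s=2.0, timeout_s=180.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` repeatedly until the session needs attention.

    Returns the last status dict when the result is available, the
    session has completed or failed, or a human approval is pending.
    Raises TimeoutError if none of those happens within `timeout_s`.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        state = status.get("status")
        if (status.get("result_available")
                or state in TERMINAL_STATES
                or state == "pending_approval"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still '{state}' after {timeout_s} s")
        sleep(interval_s)
```

The loop also stops at `pending_approval` because that state waits on a human decision through `vocametrix_approve_therapy_plan`, so further polling would not make progress.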

Prompts

Interactive templates invoked by user choice

Name	Description
`interpret_voice_assessment`	Generate a clinical SLP-style interpretation of voice assessment results. Provide the JSON output from vocametrix_full_voice_assessment or individual metric tools.
`compare_pre_post_therapy`	Generate a narrative comparison between two voice assessments (pre- and post-therapy). Quantifies improvement and interprets clinical significance.
`generate_session_report`	Generate a structured therapy session report from pronunciation assessment data. Suitable for clinical documentation and patient progress notes.
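
MCP clients retrieve a prompt with a `prompts/get` JSON-RPC request. The sketch below builds such a request for `interpret_voice_assessment`. The argument name `assessment_json` is hypothetical, since the prompt's argument schema is not listed on this page, and `build_prompt_request` is an illustrative helper rather than part of any client library.

```python
import json
from itertools import count

_request_ids = count(1)  # simple increasing JSON-RPC ids


def build_prompt_request(prompt_name, arguments):
    """Return a JSON-RPC 2.0 message asking the server for a prompt."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "prompts/get",
        "params": {"name": prompt_name, "arguments": arguments},
    }


# Prompt arguments are strings in MCP, so the report is serialized first.
# The report content below is placeholder data, not real tool output.
report = {"avqi": 3.1}
request = build_prompt_request(
    "interpret_voice_assessment",
    {"assessment_json": json.dumps(report)},  # hypothetical argument name
)
```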

Resources

Contextual data attached and managed by the client

Name	Description
`api-docs`	Vocametrix API quick reference: auth, rate limits, audio requirements, error codes
`Thresholds: AVQI`	Clinical reference thresholds for AVQI
`Thresholds: DSI`	Clinical reference thresholds for DSI
`Thresholds: CPP`	Clinical reference thresholds for CPP
`Thresholds: HNR`	Clinical reference thresholds for HNR
`Thresholds: JITTER-SHIMMER`	Clinical reference thresholds for jitter and shimmer
`Thresholds: GNE`	Clinical reference thresholds for GNE
`Thresholds: AVQI_LOCALES`	Locale-specific clinical reference thresholds for AVQI
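
A short sketch of how such thresholds might be applied on the client side. The format of the threshold resources is not shown on this page, so the cut-offs are hard-coded from the tool descriptions (AVQI above 2.43 for French or 2.97 for English, GNE below 0.5). The language keys `"fr"` and `"en"` and both function names are assumptions.

```python
# Cut-offs quoted in the tool descriptions; the keys are an assumption.
AVQI_CUTOFFS = {"fr": 2.43, "en": 2.97}
GNE_CUTOFF = 0.5


def avqi_indicates_dysphonia(score, language):
    """Return True if the AVQI score exceeds the cut-off for `language`."""
    try:
        cutoff = AVQI_CUTOFFS[language]
    except KeyError:
        raise ValueError(f"no AVQI cut-off known for {language!r}") from None
    return score > cutoff


def gne_suggests_noise(gne):
    """Return True if a GNE value (0 to 1) falls below the 0.5 cut-off."""
    if not 0.0 <= gne <= 1.0:
        raise ValueError("GNE is expected to lie between 0 and 1")
    return gne < GNE_CUTOFF
```

The same AVQI score can fall on different sides of the cut-off depending on language: 2.5 is above the French threshold but below the English one.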



MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pmarmaroli/vocametrix-mcp'
