speak

Synthesize speech from a klattsch phoneme string. Returns base64 WAV audio.

What This Is

klattsch is a formant speech synthesizer (late-70s/early-80s style — think Votrax, SAM). You give it a string of ARPAbet phoneme codes with optional voice control directives, and it renders a WAV audio file.

How To Use This Tool

Step 1: Build a phoneme string

Write ARPAbet phonemes separated by spaces, with control directives mixed in. Use the text_to_phonemes tool first to convert English text, then refine by hand.

Step 2 (optional): Set voice character

Prefix your utterance with control directives to set the voice:

bN: base pitch in Hz (b120 = default male, b200 = female, b280 = child)
rN: per-phoneme duration in ms (r80 = fast, r110 = normal, r250+ = sung)
sN: formant scale (1.0 = male, 1.17 = female, 1.3 = child)
vN: vibrato depth in Hz (v3-v6 = expressive, v0 = off)
hN: breathiness 0..1 (h0.3 = airy/whispery)
gN: vocal effort 0=lax..1=tense (default 0.5)
tN: spectral tilt -0.9=darker..+0.9=brighter (t-0.4 = warm, t0.3 = bright)
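
The directive list above can be assembled programmatically. The following is a minimal sketch, not part of klattsch itself: the function name `voice_prefix` and its keyword names are hypothetical, and the range checks only mirror the ranges stated in the list (how klattsch treats out-of-range values is not documented here).

```python
def voice_prefix(pitch=120, rate=110, scale=1.0,
                 vibrato=None, breath=None, effort=None, tilt=None):
    """Build a directive prefix such as 'b200 r100 s1.17 v2'.

    Hypothetical helper: names and defaults are illustrative, taken from
    the values quoted in the directive list.
    """
    # Ranges below are the ones stated in the directive list.
    if breath is not None and not 0 <= breath <= 1:
        raise ValueError("breathiness (h) must be in 0..1")
    if effort is not None and not 0 <= effort <= 1:
        raise ValueError("vocal effort (g) must be in 0..1")
    if tilt is not None and not -0.9 <= tilt <= 0.9:
        raise ValueError("spectral tilt (t) must be in -0.9..0.9")

    parts = [f"b{pitch}", f"r{rate}", f"s{scale}"]
    # Optional directives are emitted only when explicitly set.
    for letter, value in (("v", vibrato), ("h", breath),
                          ("g", effort), ("t", tilt)):
        if value is not None:
            parts.append(f"{letter}{value}")
    return " ".join(parts)


print(voice_prefix(pitch=200, rate=100, scale=1.17, vibrato=2))
# b200 r100 s1.17 v2
```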

Step 3: Add prosody (intonation)

! after a vowel for stress: DH AE! T = "THAT" with emphasis
+N/-N on vowels for pitch changes: AY+20 = rising "I", D AH N(-30) = falling "done"
(+N)/(-N) for transient ornaments (don't carry forward)
, ; . for pauses: 100ms, 200ms, 300ms
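
As a rough illustration of the markers above, the sketch below builds annotated phoneme tokens as plain strings. The helper names (`stressed`, `pitch`, `PAUSE_MS`) are hypothetical, and the code only reproduces the notation shown in this step; it does not model how klattsch interprets it.

```python
# Pause lengths in milliseconds, as listed above.
PAUSE_MS = {",": 100, ";": 200, ".": 300}


def stressed(phoneme):
    """Mark a vowel as stressed: 'AE' -> 'AE!'."""
    return phoneme + "!"


def pitch(phoneme, delta, transient=False):
    """Attach a pitch change: 'AY', 20 -> 'AY+20'.

    With transient=True the parenthesized ornament form is used,
    e.g. 'IY', 25 -> 'IY(+25)'.
    """
    signed = f"{delta:+d}"  # always carries an explicit + or - sign
    return f"{phoneme}({signed})" if transient else f"{phoneme}{signed}"


print(" ".join(["DH", stressed("AE"), "T"]))   # DH AE! T
print(pitch("AY", 20))                         # AY+20
print(pitch("IY", 25, transient=True))         # IY(+25)
```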

Step 4: Render

Pass the complete string to this tool.

Quick-Reference Voice Presets

Preset	Directives	Description
Male natural	b120 r100 s1.0 v2	Default voice
Male deep	b90 r95 s0.92 v1 t-0.3 g0.6	Deep, authoritative
Male bright	b130 r105 s1.0 v2 t0.2	Clear, energetic
Female natural	b200 r100 s1.17 v2	Natural female
Female warm	b185 r105 s1.15 v3 t-0.2	Warm, friendly
Female bright	b220 r100 s1.18 v2 t0.2	Bright, cheery
Child	b280 r90 s1.3 v1	Young, higher pitch
Robot	b120 r90 s1.0 v0 h0 g0.8 t0.5	Flat, mechanical
Whisper	b120 r100 s1.0 v0 h0.6 g0.1	Breathy whisper
Dramatic	b100 r130 s1.0 v5	Slow, theatrical
Singing male	bC4 r300 s1.0 v5	For sung notes
Singing female	bG4 r300 s1.17 v4	For sung notes
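
If you reuse these presets often, a small lookup table saves retyping. This is only a sketch: the directive strings are copied from the table above, while the `PRESETS` dictionary and the `with_preset` helper are hypothetical names, not something klattsch provides.

```python
# Directive strings copied verbatim from the preset table.
PRESETS = {
    "Male natural": "b120 r100 s1.0 v2",
    "Male deep": "b90 r95 s0.92 v1 t-0.3 g0.6",
    "Male bright": "b130 r105 s1.0 v2 t0.2",
    "Female natural": "b200 r100 s1.17 v2",
    "Female warm": "b185 r105 s1.15 v3 t-0.2",
    "Female bright": "b220 r100 s1.18 v2 t0.2",
    "Child": "b280 r90 s1.3 v1",
    "Robot": "b120 r90 s1.0 v0 h0 g0.8 t0.5",
    "Whisper": "b120 r100 s1.0 v0 h0.6 g0.1",
    "Dramatic": "b100 r130 s1.0 v5",
    "Singing male": "bC4 r300 s1.0 v5",
    "Singing female": "bG4 r300 s1.17 v4",
}


def with_preset(name, phonemes):
    """Prefix a phoneme string with the directives of a named preset."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name!r}")
    return f"{PRESETS[name]} {phonemes}"


print(with_preset("Robot", "AH T EH N SH AH N . P L IY Z"))
# b120 r90 s1.0 v0 h0 g0.8 t0.5 AH T EH N SH AH N . P L IY Z
```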

Intonation Patterns That Sound Natural

Falling statement (period): last vowel gets -20 to -30, e.g. D AH N(-25)
Rising question: last vowel gets +20 to +30, e.g. R EH D IY(+25)
Listing items: each item rises, last falls, e.g. AE(+15) P AH L Z(+15) AO R AH N JH(-20)
Excited: higher base pitch, faster, e.g. b140 r85 ...
Serious/deep: lower base pitch, slower, e.g. b95 r115 ...
Sarcastic: exaggerated pitch swings, e.g. AY+30 M . S OW(-30) . S AH R K AE S T IH K
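
The statement and question patterns can be applied mechanically. The sketch below is an assumption-laden illustration: `end_contour` and `CONTOURS` are hypothetical names, the offsets of -25 and +25 are simply the values used in the examples, and the marker is attached to the final phoneme token because that is where the examples place it (D AH N(-25), R EH D IY(+25)).

```python
# Offsets taken from the examples: -25 for statements, +25 for questions.
CONTOURS = {"statement": -25, "question": 25}


def end_contour(phonemes, kind):
    """Append a transient pitch marker to the last phoneme token."""
    tokens = phonemes.split()
    if not tokens:
        raise ValueError("phoneme string is empty")
    delta = CONTOURS[kind]
    # {delta:+d} always writes an explicit sign, matching the (+N)/(-N) form.
    tokens[-1] = f"{tokens[-1]}({delta:+d})"
    return " ".join(tokens)


print(end_contour("D AH N", "statement"))   # D AH N(-25)
print(end_contour("R EH D IY", "question")) # R EH D IY(+25)
```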

Singing With Note Names

Instead of Hz for b, use note names: bC4, bD#4, bEb4, bF4, bG4, bA4, bB4
Middle C = C4 (261Hz), A4 = 440Hz
Set r250-r400 per phoneme and group notes with parentheses: bC4 r300 ( HH AH ) ( L OW ) bE4 ( W ER L D )
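
To see which frequency a note name corresponds to, the standard twelve-tone equal temperament formula with A4 = 440 Hz can be used. This sketch assumes klattsch tunes notes the same way, which the text above suggests (C4 is about 261 Hz) but does not state outright; `note_to_hz` is a hypothetical helper name.

```python
import re

# Semitone offsets of the natural notes within one octave, starting at C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note):
    """Convert a note name such as 'C4', 'D#4' or 'Eb4' to Hz (A4 = 440 Hz)."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", note)
    if match is None:
        raise ValueError(f"not a note name: {note!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone  # MIDI numbering: C4 = 60, A4 = 69
    return 440.0 * 2 ** ((midi - 69) / 12)


print(round(note_to_hz("C4"), 2))  # 261.63
print(round(note_to_hz("A4"), 2))  # 440.0
```

The same function can produce a numeric `b` directive when you prefer Hz, for example `f"b{round(note_to_hz('G4'))}"`.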

Example Strings

"Hello world" (male): b120 r100 s1.0 HH AH L OW . W ER L D
"How are you?" (female, rising): b200 s1.17 HH AW . AA R . Y UW(+25)
"I am NOT impressed" (stress on NOT): b120 AY . AE M . N AO T! . IH M P R EH S T(-20)
"The quick brown fox" (energetic): b135 r90 t0.2 DH AH . K W IH K . B R AW N . F AA K S
Sing "Twinkle twinkle" (two notes): bC4 r300 ( T W IH NG ) ( K AH L ) bG4 r300 ( T W IH NG ) ( K AH L )
Dramatic movie trailer voice: b95 r140 s0.95 v4 t-0.3 g0.7 IH N . AH . W ER L D(-25) .
Robot announcement: b130 r85 s1.0 v0 h0 g0.8 t0.4 AH T EH N SH AH N . P L IY Z
Whispered secret: b110 r105 v0 h0.5 g0.1 s1.0 P S T . D OW N T . T EH L . EH N IY W AH N

Phoneme Categories (all 39 phonemes)

Vowels: IY IH EH AE AA AO AH UH UW ER AY AW EY OW OY
Sonorants: W Y R L M N NG
Fricatives: F TH S SH V DH Z ZH HH
Stops: P B T D K G (these get automatic burst + silence)
Affricates: CH JH

⚠️ The stops (P, B, T, D, K, G) and the affricates (CH, JH) include an automatic silence-burst pattern. Don't add extra pauses after them.
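
A quick pre-flight check can catch misspelled phoneme codes before rendering. The sketch below is an illustration only: `unknown_phonemes` is a hypothetical helper, the token grammar is inferred from the examples in this document (directives start with a lowercase letter, pauses and grouping parentheses stand alone, stress and pitch markers follow the phoneme), and the real klattsch parser may accept or reject more than this does.

```python
import re

VOWELS = "IY IH EH AE AA AO AH UH UW ER AY AW EY OW OY".split()
SONORANTS = "W Y R L M N NG".split()
FRICATIVES = "F TH S SH V DH Z ZH HH".split()
STOPS = "P B T D K G".split()
AFFRICATES = "CH JH".split()
PHONEMES = set(VOWELS + SONORANTS + FRICATIVES + STOPS + AFFRICATES)

PAUSES = {",", ";", "."}
GROUPING = {"(", ")"}
# Control directives: one of b r s v h g t followed by a value (b120, bC4, t-0.4).
DIRECTIVE = re.compile(r"[brsvhgt]\S+")
# Phoneme, optional stress mark, optional pitch marker: AE!, AY+20, IY(+25).
PHONEME_TOKEN = re.compile(r"([A-Z]+)(!?)(\([+-]\d+\)|[+-]\d+)?")


def unknown_phonemes(utterance):
    """Return the tokens that are not recognized under the assumed grammar."""
    bad = []
    for token in utterance.split():
        if token in PAUSES or token in GROUPING or DIRECTIVE.fullmatch(token):
            continue
        match = PHONEME_TOKEN.fullmatch(token)
        if match is None or match.group(1) not in PHONEMES:
            bad.append(token)
    return bad


print(len(PHONEMES))                                   # 39
print(unknown_phonemes("b120 r100 HH AH L OW . W ER L D"))  # []
print(unknown_phonemes("HH EH L LO"))                  # ['LO']
```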

Input Schema

Name	Required	Description	Default
`utterance`	Yes	The klattsch phoneme string. ARPAbet codes + control directives, whitespace-separated. Use text_to_phonemes to convert English first, then tweak.
`sampleRate`	No
