florianbuetow

low-latency-tts-api-server-mcp

low-latency-tts-service-mcp

Made with AI Verified by Humans

Low-latency local text-to-speech powered by Kokoro through TTS.cpp. The service shells out to a local tts-cli binary for fast Kokoro GGUF inference, then handles queued playback, status tracking, and MCP integration for AI agents. It offers a FastAPI server for HTTP clients, a TypeScript MCP relay for Claude Code, Claude Desktop, or any MCP-compatible client, and an interactive terminal chat REPL (just chat) for typing text and hearing it spoken right away.

Features

Feature                    Description
Queued Playback            Sequential speech playback through a background worker, with request status tracking
REST API Server            FastAPI server with /say, /voices, /status/{message_id}, and /health endpoints
MCP Server                 Ready-to-use MCP bridge that exposes say, get_voices, and get_status tools
Interactive Chat REPL      Terminal REPL (just chat) that synthesizes and plays each line as you type, generating the next line while the current one is still playing
TTS.cpp Runtime            Uses the local TTS.cpp tts-cli for low-latency Kokoro GGUF speech generation
Kokoro Voices              27 English no-espeak Kokoro voices, including af_heart, af_sky, am_adam, and bm_george
WAV Output                 Generated audio is saved as timestamped WAV files under data/output/ when save_wav is enabled
Explicit Runtime Config    TTS.cpp binary, GGUF model path, sampling parameters, host, port, and playback settings are read from config.yaml

Under the hood, the project shells out to a local TTS.cpp tts-cli binary for Kokoro generation, uses sounddevice for audio output, and uses FastAPI for the HTTP server. The MCP server is a lightweight TypeScript stdio-to-HTTP relay using the Model Context Protocol SDK.
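
Shelling out with a hard time limit (bounded in this project by timeout_seconds) can be sketched with Python's subprocess module. This is a minimal sketch, not the project's tts.py: run_generation_command and GenerationError are hypothetical names, and because the real tts-cli flags are not documented in this README, the caller passes the command as an opaque argument list.

```python
import subprocess
import sys
from collections.abc import Sequence


class GenerationError(RuntimeError):
    """Raised when the external generator fails or exceeds its time limit."""


def run_generation_command(cmd: Sequence[str], timeout_seconds: float) -> str:
    """Run one out-of-process generation command and return its stdout.

    `cmd` is a hypothetical, caller-built argument list; real tts-cli flags
    are intentionally not reproduced here.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # analogous to timeout_seconds in config.yaml
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        raise GenerationError(f"generation exceeded {timeout_seconds}s") from exc
    except FileNotFoundError as exc:
        # Missing binary: fail loudly instead of falling back to anything else.
        raise GenerationError(f"executable not found: {cmd[0]}") from exc
    if result.returncode != 0:
        raise GenerationError(
            f"command exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; a real call would target tts-cli.
    print(run_generation_command([sys.executable, "-c", "print('ok')"], timeout_seconds=5))
```

Keeping the command opaque means the wrapper only owns process concerns (time limit, exit status, missing binary), and every failure surfaces as an exception rather than a silent fallback.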

Design Principles

All runtime configuration is explicit. If a required value is missing from config.yaml, or if the configured tts-cli executable or Kokoro GGUF model is not present, the service fails immediately with a clear error. The service does not silently fall back to another model, voice, port, binary, or output directory.
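
The fail-fast rule can be illustrated with a small validator that runs on the already-parsed config.yaml (for example, the dict that PyYAML's yaml.safe_load returns). A sketch under assumptions: REQUIRED_KEYS is a hypothetical subset of the real keys, the port range check is an illustrative extra, and validate_config and ConfigError are not the project's API.

```python
from pathlib import Path
from typing import Any

# Hypothetical subset of required keys; the real list lives in the project code.
REQUIRED_KEYS = ("tts_cli", "model", "output_dir", "default_voice", "host", "port")


class ConfigError(ValueError):
    """Raised for any missing or invalid runtime setting."""


def validate_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Return the config unchanged if valid; otherwise raise immediately.

    No defaults are filled in: a missing key is an error, never a fallback.
    """
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ConfigError(f"config.yaml is missing required keys: {', '.join(missing)}")
    # The executable and the GGUF model must exist before the service starts.
    for key in ("tts_cli", "model"):
        if not Path(raw[key]).is_file():
            raise ConfigError(f"{key} does not exist: {raw[key]}")
    if not isinstance(raw["port"], int) or not 0 < raw["port"] < 65536:
        raise ConfigError(f"port must be an integer in 1..65535, got {raw['port']!r}")
    return raw
```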

Audio files are written to data/output/ as WAV files with timestamps when save_wav: true in config.yaml. The server serializes requests through a queue and a background audio worker. The worker generates audio for a request, starts playback, and can generate the next queued request while the current one is playing.
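
The queue-and-worker behavior can be sketched with the standard library alone. This is an illustrative model, not the project's server.py: PipelinedSpeaker is a hypothetical name, and generate and play are injected stand-ins for the TTS.cpp call and sounddevice playback. The important ordering is that the worker generates the next item before it joins the previous playback thread, so generation overlaps playback while playback itself stays strictly sequential.

```python
import queue
import threading
from collections.abc import Callable

Status = str  # one of: queued, generating, playing, completed, error


class PipelinedSpeaker:
    """Single audio worker that overlaps generation of item N+1 with playback of item N."""

    def __init__(self, generate: Callable[[str], bytes], play: Callable[[bytes], None]) -> None:
        self._generate = generate
        self._play = play
        self._queue: queue.Queue[tuple[str, str] | None] = queue.Queue()
        self._lock = threading.Lock()
        self.statuses: dict[str, Status] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, message_id: str, text: str) -> None:
        self._set(message_id, "queued")
        self._queue.put((message_id, text))

    def close(self) -> None:
        """Stop accepting work and wait until all queued speech has played."""
        self._queue.put(None)
        self._worker.join()

    def _set(self, message_id: str, status: Status) -> None:
        with self._lock:
            self.statuses[message_id] = status

    def _play_then_complete(self, message_id: str, audio: bytes) -> None:
        try:
            self._play(audio)
            self._set(message_id, "completed")
        except Exception:
            self._set(message_id, "error")

    def _run(self) -> None:
        playback: threading.Thread | None = None
        while (item := self._queue.get()) is not None:
            message_id, text = item
            self._set(message_id, "generating")
            try:
                audio = self._generate(text)  # may run while the previous item still plays
            except Exception:
                self._set(message_id, "error")
                continue
            if playback is not None:
                playback.join()  # never start a new utterance before the last one ends
            self._set(message_id, "playing")
            playback = threading.Thread(
                target=self._play_then_complete, args=(message_id, audio)
            )
            playback.start()
        if playback is not None:
            playback.join()
```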

Architecture

There are three entry paths into the system. HTTP clients call the FastAPI server directly. AI agents call the MCP relay (mcp/tts-mcp.ts), which reads the server host and port from config.yaml, checks /health, and forwards tool calls to the FastAPI API. Terminal users run the interactive chat REPL (src/main.py), which drives the shared TTS runtime directly without going through the HTTP server. In every case, TTS inference runs out-of-process through the configured TTS.cpp tts-cli, using the Kokoro no-espeak GGUF model.

┌─────────────────────────┐    ┌────────────────────┐      ┌─────────────────────────┐
│        AI Agent         │    │    HTTP Client     │      │      Terminal User      │
│  Claude Code / Desktop  │    │   curl / scripts   │      │        just chat        │
└────────────┬────────────┘    └─────────┬──────────┘      └────────────┬────────────┘
             │ MCP stdio                 │ HTTP                         │ keystrokes
             ▼                           │                              ▼
┌─────────────────────────┐              │                 ┌─────────────────────────┐
│  MCP Server (Node.js)   │              │                 │       src/main.py       │
│     mcp/tts-mcp.ts      │              │                 │  interactive chat REPL  │
│ tools: say, get_voices, │              │                 │ type text -> synth+play │
│       get_status        │              │                 └────────────┬────────────┘
└────────────┬────────────┘              │                              │
             │ HTTP                      │                              │
             ▼                           ▼                              │
┌───────────────────────────────────────────────────────┐               │
│                    FastAPI Server                     │               │
│       src/low_latency_tts_service_mcp/server.py       │               │
│                                                       │               │
│     POST /say  GET /voices  /status/{id}  /health     │               │
│       work queue -> audio worker -> status map        │               │
└───────────────────┬───────────────────────────────────┘               │
                    │                                                   │
                    ▼                                                   ▼
┌────────────────────────────────────────────────────────────────────────────────────┐
│            Shared TTS Runtime - src/low_latency_tts_service_mcp/tts.py             │
│                                                                                    │
│       text cleanup -> TTS.cpp command -> WAV reader -> sounddevice playback        │
└───────────────────┬───────────────────────────────────┬────────────────────────────┘
                    │                                   │
                    ▼                                   ▼
        ┌───────────────────────┐           ┌───────────────────────┐
        │    TTS.cpp tts-cli    │           │   data/output/*.wav   │
        │ Kokoro_no_espeak.gguf │           │   timestamped audio   │
        └───────────┬───────────┘           └───────────────────────┘
                    │
                    ▼
              ┌───────────┐
              │ Speakers  │
              └───────────┘

Prerequisites

  • Python 3.12+

  • uv - Python package manager

  • just - Command runner

  • Node.js 18+ - For the MCP server

  • TTS.cpp tts-cli - A local executable referenced by config.yaml

  • Local audio output device - Required for playback through sounddevice

Project Structure

.
├── src/
│   ├── main.py                         # Interactive chat REPL (just chat / just run)
│   ├── server.py                       # Module wrapper for uv run -m src.server
│   └── low_latency_tts_service_mcp/
│       ├── server.py                   # FastAPI server, queue, statuses, worker
│       └── tts.py                      # Config, Kokoro command, WAV playback
├── tests/
│   ├── test_main.py
│   ├── test_server.py
│   ├── test_tts.py
│   ├── test_integration.py
│   └── architecture/                   # Architecture import rule tests
├── scripts/
│   └── download-model.sh               # Interactive Kokoro model downloader
├── mcp/
│   ├── tts-mcp.ts                      # MCP relay to FastAPI server
│   ├── package.json
│   └── tsconfig.json
├── config/
│   ├── semgrep/                        # Static analysis rules
│   └── codespell/                      # Spell-check configuration
├── data/
│   ├── models/                         # Downloaded Kokoro GGUF model
│   └── output/                         # Generated WAV files
├── stubs/                              # Local type stubs for strict checking
├── vendor/
│   └── TTS.cpp/                        # Local TTS.cpp checkout/build location
├── config.yaml                         # Runtime configuration
├── justfile                            # Command recipes
└── pyproject.toml                      # Project metadata and dependencies

Setup

just init

Creates report directories and installs Python dependencies via uv sync --all-extras. If no Kokoro model is present and the command is running interactively, just init prompts for a model download. In non-interactive contexts it tells you to run just download.

Download a Model

just download

Downloads the Kokoro no-espeak GGUF model used by the default configuration:

Model                    Size      Notes
Kokoro_no_espeak.gguf    ~354 MB   English no-espeak Kokoro model with 27 built-in voices

The default path is:

data/models/Kokoro_no_espeak.gguf

Getting Started

  1. Run just init - installs Python dependencies

  2. Run just download - downloads Kokoro_no_espeak.gguf if it is not already present

  3. Confirm config.yaml points at the local tts-cli executable and the downloaded model

  4. Run just start - starts the FastAPI TTS server

  5. Send requests via HTTP or through the MCP bridge

To try synthesis without the server, run just chat for an interactive terminal REPL: type a line, press Enter twice, and hear it spoken.

Configuration

TTS and server runtime settings live in config.yaml at the project root. Some operational constants, such as status retention and MCP request timeouts, are defined in code. Example:

tts_cli: ./vendor/TTS.cpp/build/bin/tts-cli
model: ./data/models/Kokoro_no_espeak.gguf
output_dir: ./data/output
sample_rate: 24000
lead_silence_ms: 200
default_voice: af_heart
save_wav: true
simplify_punctuation: false
n_threads: 8
timeout_seconds: 120
temperature: 1.0
topk: 50
repetition_penalty: 1.0
top_p: 1.0
host: 0.0.0.0
port: 12000

Key                     Description
tts_cli                 Path to the local TTS.cpp tts-cli executable
model                   Path to Kokoro_no_espeak.gguf
output_dir              Directory for generated WAV files
sample_rate             Expected WAV sample rate in Hz
lead_silence_ms         Silence, in milliseconds, written before playback starts on a new audio stream
default_voice           Voice used when /say omits a voice
save_wav                Save generated audio to WAV files in output_dir (true or false)
simplify_punctuation    Simplify punctuation before synthesis (true or false)
n_threads               Number of threads passed to TTS.cpp
timeout_seconds         Maximum duration, in seconds, of one TTS.cpp generation command
temperature             Kokoro sampling temperature
topk                    Kokoro top-k sampling value
repetition_penalty      Kokoro repetition penalty
top_p                   Kokoro top-p sampling value
host                    Server listen address
port                    Server listen port
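
With the example values above (sample_rate: 24000, lead_silence_ms: 200), the lead-in silence is 24000 × 200 / 1000 = 4800 samples. The sketch below assumes mono, signed 16-bit PCM; how the project actually writes silence into the sounddevice stream is not described here, and both function names are hypothetical.

```python
import array


def silence_samples(sample_rate: int, lead_silence_ms: int) -> int:
    """Samples needed for lead_silence_ms of audio at sample_rate (rounded down)."""
    return sample_rate * lead_silence_ms // 1000


def with_lead_silence(pcm: array.array, sample_rate: int, lead_silence_ms: int) -> array.array:
    """Return a new mono 16-bit PCM buffer with zero-valued samples prepended."""
    if pcm.typecode != "h":
        raise TypeError("expected signed 16-bit samples (array typecode 'h')")
    padded = array.array("h", [0]) * silence_samples(sample_rate, lead_silence_ms)
    padded.extend(pcm)
    return padded


if __name__ == "__main__":
    print(silence_samples(24000, 200))  # 4800
```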

Voices

The no-espeak Kokoro model exposes these voice identifiers:

af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole,
af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric,
am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, bf_alice,
bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george
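
Kokoro voice IDs encode language and speaker in a two-letter prefix: a for American English or b for British English, followed by f (female) or m (male). The helper below is hypothetical (it could back a voice picker, for example); it groups the 27 IDs listed above by that prefix.

```python
from collections import defaultdict

# The voice list above, copied verbatim.
VOICES = """
af_alloy af_aoede af_bella af_heart af_jessica af_kore af_nicole
af_nova af_river af_sarah af_sky am_adam am_echo am_eric
am_fenrir am_liam am_michael am_onyx am_puck am_santa bf_alice
bf_emma bf_isabella bf_lily bm_daniel bm_fable bm_george
""".split()

ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def group_voices(voices: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group voice IDs by (accent, gender) decoded from their two-letter prefix."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for voice in voices:
        prefix, _, _name = voice.partition("_")
        if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
            raise ValueError(f"unexpected voice id: {voice}")
        groups[(ACCENTS[prefix[0]], GENDERS[prefix[1]])].append(voice)
    return dict(groups)


if __name__ == "__main__":
    for (accent, gender), names in group_voices(VOICES).items():
        print(f"{accent}, {gender}: {len(names)}")
```

Run as a script, it reports 11 American English female, 9 American English male, 4 British English female, and 3 British English male voices.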

Usage

Command               Description
just chat             Start the interactive chat REPL (type text, hear speech)
just run              Alias for just chat
just start            Start the FastAPI TTS server in the foreground
just stop             Stop the running server
just status           Check whether the server is running
just mcp-install      Install Node dependencies for the MCP relay
just mcp-start        Start the MCP stdio relay from the terminal
just mcp-typecheck    Type-check the MCP TypeScript relay

Interactive Chat (REPL)

just chat

Starts an interactive terminal REPL that synthesizes and plays each submission with the local TTS.cpp tts-cli. Generation for the next line overlaps playback of the current one, so there is no gap between utterances. The REPL drives the shared TTS runtime directly and does not require the FastAPI server to be running.

If --voice is not supplied, the REPL prompts you to pick a voice; otherwise it uses the one you pass. It reads settings (model, sampling parameters, sample rate, save_wav, and more) from config.yaml and fails immediately if the configured tts-cli or GGUF model is missing.

Input controls:

Key                           Action
Enter once                    Insert a newline into the current input
Enter twice                   Submit the buffered text for synthesis
Enter twice on empty input    Quit
ESC twice                     Quit
Backspace                     Delete the previous character
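
The key rules above amount to a small state machine. This sketch models them as a pure generator over already-decoded keystrokes, which makes the rules easy to test; read_submissions and the ENTER, ESC, and BACKSPACE constants are hypothetical, and the real REPL in src/main.py reads raw terminal input and may treat edge cases differently.

```python
from collections.abc import Iterable, Iterator

# Hypothetical decoded key values.
ENTER, ESC, BACKSPACE = "\n", "\x1b", "\x7f"


def read_submissions(keys: Iterable[str]) -> Iterator[str]:
    """Yield each submitted text in order; stop on either quit gesture."""
    buffer: list[str] = []
    previous = ""
    for key in keys:
        if key == ENTER and previous == ENTER:
            buffer.pop()  # drop the newline inserted by the first Enter
            text = "".join(buffer).strip()
            if not text:
                return  # Enter twice on empty input: quit
            yield text  # Enter twice: submit for synthesis
            buffer.clear()
            previous = ""  # a following Enter starts a fresh sequence
            continue
        if key == ESC and previous == ESC:
            return  # ESC twice: quit
        if key == ENTER:
            buffer.append("\n")  # Enter once: newline inside the current input
        elif key == BACKSPACE:
            if buffer:
                buffer.pop()
        elif key != ESC:
            buffer.append(key)
        previous = key
```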

Run it directly with uv run for more options:

# Pick a voice interactively, then type lines to speak
uv run -m src.main

# Skip voice selection
uv run -m src.main --voice af_heart

# One-shot: synthesize a single string and exit
uv run -m src.main --voice am_adam "Hello from Kokoro."

# List previously generated WAV files in data/output/ and exit
uv run -m src.main --list-outputs

When save_wav: true, each utterance is written to a timestamped WAV under output_dir; when false, audio is played and the temporary file is removed.

Server

just start

Starts a FastAPI server with queued playback. The server validates config.yaml at startup and processes requests sequentially through a background worker.

API

FastAPI auto-generates interactive docs at /docs (Swagger) and /redoc (ReDoc) when the server is running.

Method   Endpoint                Description
GET      /health                 Liveness check
GET      /voices                 List available voices and the default voice
POST     /say                    Queue text for synthesis and playback
GET      /status/{message_id}    Check the status of a queued, generating, playing, completed, or failed message

POST /say

{
  "text": "Hello, this is a Kokoro TTS request.",
  "voice": "af_heart"
}

Returns 202 Accepted with a message ID and queue position:

{
  "message_id": "msg_20260627_130430_001",
  "status": "queued",
  "queue_position": 0
}

Audio plays through the server machine's speakers.

Message Lifecycle

queued -> generating -> playing -> completed

Failures are reported with the status error.

Completed and failed statuses are retained for 1 hour and then evicted lazily: expired entries are removed only when a later /say or /status request triggers status cleanup.

MCP Server

The MCP server (mcp/tts-mcp.ts) is a transparent relay between MCP clients and the FastAPI server. It exposes three tools:

Tool          Description
say           Queue text for speech synthesis with a specified voice
get_voices    List all available voices
get_status    Check the status of a speech request by message ID

Setup

just mcp-install

Usage with Claude Code / Claude Desktop

Start the FastAPI server first:

just start

Then configure the MCP client to run the TypeScript relay directly. For Claude Code, from the project directory:

claude mcp add --scope local kokoro-tts-project \
  -e KOKORO_TTS_CONFIG_PATH=/path/to/low-latency-tts-service-mcp/config.yaml \
  -- /path/to/low-latency-tts-service-mcp/mcp/node_modules/.bin/tsx \
  /path/to/low-latency-tts-service-mcp/mcp/tts-mcp.ts

For JSON-based MCP configuration:

{
  "mcpServers": {
    "kokoro-tts-project": {
      "command": "/path/to/low-latency-tts-service-mcp/mcp/node_modules/.bin/tsx",
      "args": ["/path/to/low-latency-tts-service-mcp/mcp/tts-mcp.ts"],
      "env": {
        "KOKORO_TTS_CONFIG_PATH": "/path/to/low-latency-tts-service-mcp/config.yaml"
      }
    }
  }
}

The MCP relay reads host and port from config.yaml and calls /health before tool requests. Successful FastAPI JSON responses are returned as MCP text content; health check failures and non-OK HTTP responses are wrapped as structured MCP error results.

Development

Code Quality

Command                   Description
just code-format          Auto-fix code style and formatting
just code-style           Check code style and formatting (read-only)
just code-typecheck       Run static type checking with mypy
just code-lspchecks       Run strict type checking with Pyright (LSP-based)
just code-security        Run security checks with bandit
just code-deptry          Check dependency hygiene with deptry
just code-spell           Check spelling in code and documentation
just code-semgrep         Run Semgrep static analysis
just code-audit           Scan dependencies for known vulnerabilities
just code-architecture    Run architecture import rule tests
just code-stats           Generate code statistics with pygount

Testing

Command               Description
just test             Run unit and integration tests
just test-coverage    Run tests with a coverage report

CI

  • just ci - Run all validation checks (verbose)

  • just ci-quiet - Run all checks (silent, fail-fast)

The CI pipeline runs in order: init, code-format, code-style, code-typecheck, code-security, code-deptry, code-spell, code-semgrep, code-audit, test, code-architecture, code-lspchecks, and mcp-typecheck.

AI-Assisted Development

This project includes an AGENTS.md file with development rules for AI coding assistants.
