Glama

Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
|---|---|---|---|
| WHISPER_MODEL | No | Whisper model size to use for local inference (e.g., tiny, base, small, medium, large-v3, turbo). | base |
| OPENAI_API_KEY | No | OpenAI API key. Required if using the 'openai' backend. | (none) |
| WHISPER_BACKEND | No | Transcription backend to use: 'auto' (tries local first, falls back to OpenAI), 'local' (faster-whisper), or 'openai' (Whisper API). | auto |
| WHISPER_LANGUAGE | No | ISO-639-1 language code (e.g., 'en'). Defaults to auto-detection if not specified. | (none) |
| TELEGRAM_BOT_TOKEN | No | Telegram Bot API token. Required for the transcribe_telegram_voice tool. | (none) |
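These variables are typically supplied through the MCP client's server configuration. A minimal sketch (the server name, launch command, and package name below are assumptions for illustration, not taken from this page):

```json
{
  "mcpServers": {
    "whisper-telegram": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto",
        "TELEGRAM_BOT_TOKEN": "<your-bot-token>"
      }
    }
  }
}
```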

Capabilities

Features and capabilities supported by this server

| Capability | Details |
|---|---|
| tools | `{"listChanged": false}` |
| prompts | `{"listChanged": false}` |
| resources | `{"subscribe": false, "listChanged": false}` |
| experimental | `{}` |

Tools

Functions exposed to the LLM to take actions

transcribe_audio

Transcribe an audio file to text using Whisper.

Supports OGG (Telegram voice), WAV, MP3, FLAC, and most common audio formats.

Args:
  file_path: Absolute path to the audio file to transcribe.
  language: Optional ISO-639-1 language code (e.g. 'en', 'fr'). None = auto-detect.
  word_timestamps: If True, include word-level timestamps in segments.

Returns: a dict with keys: text, language, language_probability, duration, segments, backend, success, error.
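A sketch of consuming that return shape on the client side. The field names come from the description above; the helper function and the sample values are hypothetical:

```python
def extract_text(result):
    """Return the transcript from a transcribe_audio result, raising on failure."""
    if not result.get("success"):
        raise RuntimeError(result.get("error") or "transcription failed")
    return result["text"]

# Illustrative result dict using the documented keys (values are made up).
sample = {
    "text": "hello world",
    "language": "en",
    "language_probability": 0.98,
    "duration": 1.4,
    "segments": [],
    "backend": "local",
    "success": True,
    "error": None,
}

transcript = extract_text(sample)  # "hello world"
```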

transcribe_telegram_voice

Download and transcribe a Telegram voice message.

Downloads the voice message from Telegram, transcribes it, then deletes the temp file.

Args:
  file_id: The file_id from a Telegram voice message (from the Message object).
  bot_token: Telegram bot token. Falls back to the TELEGRAM_BOT_TOKEN env var.
  language: Optional ISO-639-1 language code. None = auto-detect.
  word_timestamps: Include word-level timestamps in segments.

Returns: Same dict structure as transcribe_audio.
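The download step described above follows the standard Telegram Bot API flow: a getFile call resolves the file_id to a server-side path, which is then fetched from a direct file URL. A minimal sketch of the two URLs involved (helper names are illustrative):

```python
def file_info_url(bot_token, file_id):
    """Bot API getFile call that resolves a file_id to a server-side file path."""
    return f"https://api.telegram.org/bot{bot_token}/getFile?file_id={file_id}"

def download_url(bot_token, file_path):
    """Direct download URL for the file_path returned by getFile."""
    return f"https://api.telegram.org/file/bot{bot_token}/{file_path}"
```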

list_models

List available Whisper model sizes with performance characteristics.

Configure the active model via the WHISPER_MODEL environment variable. The default, 'base', offers a good balance of speed and accuracy for voice messages.

check_backends

Check which transcription backends are available and configured.

Call this first to verify your setup before transcribing.
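The 'auto' fallback order described for WHISPER_BACKEND can be sketched as follows. The function and parameter names are illustrative, not the server's actual code:

```python
def pick_backend(local_available, openai_key, preference="auto"):
    """Mimic the documented 'auto' behaviour: try local first, fall back to OpenAI."""
    if preference in ("local", "openai"):
        return preference  # an explicit choice is honoured as-is
    if local_available:
        return "local"     # faster-whisper on this machine
    if openai_key:
        return "openai"    # Whisper API fallback
    raise RuntimeError("no transcription backend available")
```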

speak_text

Convert text to speech and return an OGG/Opus audio file path.

Plays as a native voice note in Telegram when sent as an attachment.

TTS backends (in priority order):

  1. Kokoro (local, free, natural-sounding) -- auto-starts via uvx kokoro-fastapi

  2. OpenAI TTS (cloud, requires OPENAI_API_KEY, ~$0.015/1k chars)

  3. macOS say (Mac only fallback, sounds robotic)

Configure via TTS_BACKEND env var: "auto" | "kokoro" | "openai" | "macos"

Args:
  text: Text to synthesise.
  voice: Voice name. Kokoro voices: af_sky, af_bella, af_sarah, am_adam, am_michael, bf_emma, bm_george, bm_lewis. OpenAI voices: alloy, echo, fable, onyx, nova, shimmer. Configure the default via the TTS_VOICE env var.
  output_path: Optional absolute path for the output .ogg file.

Returns: a dict with keys: file_path (absolute .ogg path), backend, voice, success, error.
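The voice lists above imply a simple voice-to-backend mapping; a hypothetical helper for checking a requested voice before calling speak_text (the mapping data is taken from the documented lists, the helper itself is an assumption):

```python
# Voice names per backend, as documented for the speak_text tool.
VOICES = {
    "kokoro": {"af_sky", "af_bella", "af_sarah", "am_adam",
               "am_michael", "bf_emma", "bm_george", "bm_lewis"},
    "openai": {"alloy", "echo", "fable", "onyx", "nova", "shimmer"},
}

def backend_for_voice(voice):
    """Infer which TTS backend a voice name belongs to, or None if unknown."""
    for backend, names in VOICES.items():
        if voice in names:
            return backend
    return None
```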

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/abid-mahdi/whisper-telegram-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.