Provides access to Deepgram's speech recognition and text-to-speech capabilities, including audio transcription, natural speech generation, audio analysis with sentiment detection, speaker diarization, and language detection across multiple specialized models.
Deepgram MCP Server
A Model Context Protocol (MCP) server that provides access to Deepgram's speech recognition and text-to-speech capabilities.
Features
Audio Transcription: Convert audio to text with high accuracy
Text-to-Speech: Generate natural-sounding speech from text with automatic compression
Audio Analysis: Extract insights like sentiment, topics, intents, and entities
Speaker Diarization: Identify different speakers in audio
Language Detection: Automatically detect the language of audio
Multiple Models: Support for various Deepgram models optimized for different use cases
Smart Audio Compression: Automatically compresses generated audio files for efficient transfer
Installation
Clone this repository
Install dependencies:
npm install
Copy the environment file and add your API key:
cp env.example .env
# Edit .env and add your DEEPGRAM_API_KEY, OPENAI_API_KEY or GROQ_API_KEY (whichever you want to use)
Build the project:
npm run build
Usage
HTTP Transport (Recommended for Production)
The server will start on port 8080 by default. You can specify a different port:
STDIO Transport (For Development)
Available Tools
1. transcribe_audio
Transcribe audio to text with various options for customization.
Parameters:
audioUrl or audioData: Audio source (URL or base64)
model: Deepgram model to use (default: "nova-2-general")
language: Language code (default: "en")
punctuate: Add punctuation (default: true)
diarize: Speaker identification (default: false)
sentiment: Sentiment analysis (default: false)
And many more options...
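MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of how such a request for transcribe_audio might be assembled from the parameters listed above. The helper name `build_transcribe_request` and the sample URL are hypothetical, and the check that exactly one audio source is supplied is an assumption about how the server treats audioUrl and audioData, not documented behavior.

```python
import json
from typing import Any, Optional


def build_transcribe_request(
    request_id: int,
    audio_url: Optional[str] = None,
    audio_data: Optional[str] = None,
    **options: Any,
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the transcribe_audio tool."""
    # Assumption: exactly one of audioUrl / audioData should be supplied.
    if (audio_url is None) == (audio_data is None):
        raise ValueError("provide exactly one of audio_url or audio_data")

    # Extra keyword options (model, language, diarize, ...) pass through as-is.
    arguments: dict = dict(options)
    if audio_url is not None:
        arguments["audioUrl"] = audio_url
    else:
        arguments["audioData"] = audio_data

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_audio", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_transcribe_request(
        1,
        audio_url="https://example.com/meeting.wav",  # placeholder URL
        model="nova-2-general",
        language="en",
        diarize=True,
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds this envelope for you; the sketch only shows what ends up on the wire.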
2. text_to_speech
Convert text to speech using Deepgram's TTS models with automatic compression.
Parameters:
text: Text to convert to speech (required)
model: TTS model to use (default: "aura-asteria-en")
voice: Voice selection
format: Output format (default: "mp3")
speed: Speech speed (default: 1.0)
Output:
Original audio file saved to the generated_audio/ folder
Compressed audio data saved to the compressed_audio/ folder
Response includes file paths and compression metadata
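The parameter defaults listed above can be captured in a small typed container on the client side. This is only an illustrative sketch: the class name `TextToSpeechArgs` and its `to_arguments` method are hypothetical, and leaving out `voice` when it is unset is an assumption, since no default is documented for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextToSpeechArgs:
    """Arguments for the text_to_speech tool, using the documented defaults."""

    text: str
    model: str = "aura-asteria-en"
    voice: Optional[str] = None  # no documented default, so omitted unless set
    format: str = "mp3"
    speed: float = 1.0

    def to_arguments(self) -> dict:
        """Return the arguments dict to send with a tool call."""
        if not self.text.strip():
            raise ValueError("text is required")
        args = {
            "text": self.text,
            "model": self.model,
            "format": self.format,
            "speed": self.speed,
        }
        if self.voice is not None:
            args["voice"] = self.voice
        return args


if __name__ == "__main__":
    print(TextToSpeechArgs(text="Hello from Deepgram").to_arguments())
```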
3. analyze_audio
Perform advanced audio analysis including sentiment, topics, intents, and entities.
Parameters:
audioUrl or audioData: Audio source
features: Analysis features to enable
model: Model for analysis
4. get_models
Get information about available Deepgram models.
Parameters:
model_type: Filter by model type ("transcription", "tts", or "all")
Client Configuration
For MCP clients, use this configuration:
Development
API Key
Get your Deepgram API key from the Deepgram Console.
Audio Compression System
The TTS functionality includes an intelligent compression system that:
Automatically compresses generated audio files using gzip compression
Saves compressed data to separate files to avoid large agent responses
Provides decompression tools for easy audio file extraction
Preserves audio quality (gzip is lossless) while reducing file sizes by 2-4x
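The compress-then-extract cycle described above can be sketched with Python's standard `gzip` module. This is not the project's own decompression tool: the function names and file names are hypothetical, and the sketch assumes the compressed file is a plain gzip stream of the raw audio bytes, which may differ from the format the server actually writes.

```python
import gzip
import tempfile
from pathlib import Path


def compress_file(source: Path, target: Path) -> float:
    """Gzip `source` into `target`; return the ratio original size / compressed size."""
    raw = source.read_bytes()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(raw))
    return len(raw) / target.stat().st_size


def decompress_file(source: Path, target: Path) -> None:
    """Restore the original bytes from a gzip file written by compress_file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.decompress(source.read_bytes()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in bytes instead of real audio; the folder names mirror the README.
        original = root / "generated_audio" / "sample.bin"
        original.parent.mkdir(parents=True)
        original.write_bytes(b"\x00\x01" * 5000)

        packed = root / "compressed_audio" / "sample.bin.gz"
        ratio = compress_file(original, packed)

        restored = root / "restored.bin"
        decompress_file(packed, restored)
        assert restored.read_bytes() == original.read_bytes()  # lossless round trip
        print(f"compression ratio: {ratio:.1f}x")
```

The ratio returned depends entirely on the input data, so treat any figure from this sketch as illustrative only.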
File Structure
Decompression Tools
Python Script (Recommended):
Node.js Script:
Agno Integration
This MCP server also includes integration with Agno, a high-performance runtime for multi-agent systems.
Agno Tests
The TTS test will:
Generate audio with automatic compression
Save the response to tts_response.json
Decompress the audio file to generated_audio/
License
MIT
Developer
Dheeraj Mudireddy (meetdheerajreddy@gmail.com)