Provides audio format conversion and transcoding capabilities, enabling export of generated soundscapes and music to WAV, MP3, OGG, and FLAC formats.
Integrates with Google's Gemini 2.0 Multimodal Live API and Lyria 3 models (Pro and Clip) to generate dynamic environmental soundscapes, professional music, and high-fidelity audio content from text prompts.
🎵 Gemini Audio MCP
Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server that uses the Gemini 2.0 Multimodal Live API to generate high-fidelity environmental soundscapes on demand.
🚀 Mission Statement
Our mission is to provide an immersive, AI-powered audio generation layer for any MCP-compatible environment, enabling the creation of dynamic, seamless, and high-quality environmental audio through simple text prompts.
✨ Key Features
🌊 Dynamic Soundscapes: Generate complex environmental audio using the latest Gemini 2.5 Native Audio models.
🎵 Professional Music: High-fidelity music production via Google's Lyria 3 models:
Lyria 3 Pro: Full song generation with structural coherence ($0.08/req).
Lyria 3 Clip: Low-latency clips and rhythmic loops ($0.04/req).
🔁 Infinite Looping: Seamless, click-free looping with 100ms micro-crossfades.
🔀 Smooth Crossfades: Transition between two different soundscapes with customizable crossfade durations.
📂 Universal Formats: Export audio to a variety of formats (WAV, MP3, OGG, FLAC) powered by FFmpeg.
▶️ Auto-play Integration: Instantly play generated audio through your system's default player upon completion.
⚙️ Persistent Configuration: Fine-tune default bitrates, sample rates, and durations once and reuse them across sessions.
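The looping and crossfade features above both come down to blending one run of samples into another. The sketch below illustrates that idea for mono `f32` samples with a linear fade. The function names `crossfade` and `make_seamless_loop` are hypothetical and are not taken from the project's source; the actual implementation may use a different fade curve or sample layout.

```rust
/// Linearly crossfade from `from` into `to`, overlapping them by `overlap` samples.
/// The overlap is clamped to the length of the shorter input.
pub fn crossfade(from: &[f32], to: &[f32], overlap: usize) -> Vec<f32> {
    let overlap = overlap.min(from.len()).min(to.len());
    let lead = from.len() - overlap;
    let mut out = Vec::with_capacity(from.len() + to.len() - overlap);

    // Part of `from` that plays on its own.
    out.extend_from_slice(&from[..lead]);

    // Overlap region: `from` fades out while `to` fades in.
    for i in 0..overlap {
        let t = i as f32 / overlap as f32; // 0.0 at the start, just under 1.0 at the end
        out.push(from[lead + i] * (1.0 - t) + to[i] * t);
    }

    // Remainder of `to`.
    out.extend_from_slice(&to[overlap..]);
    out
}

/// Make a clip loop without a click by folding its last `fade_ms` milliseconds
/// into its first `fade_ms` milliseconds. The result is shorter by the fade length.
pub fn make_seamless_loop(samples: &[f32], sample_rate: u32, fade_ms: u32) -> Vec<f32> {
    let fade_len = (sample_rate as usize * fade_ms as usize) / 1000;
    // Clips too short to crossfade are returned unchanged.
    if fade_len == 0 || samples.len() < 2 * fade_len {
        return samples.to_vec();
    }
    let (body, tail) = samples.split_at(samples.len() - fade_len);
    // The loop now starts on the old tail, which follows the end of `body` naturally.
    crossfade(tail, body, fade_len)
}
```

At a 48,000 Hz sample rate, a 100 ms micro-crossfade covers 4,800 samples per channel. An equal-power (sine/cosine) curve is a common alternative when the two sources are unrelated, since a linear fade can dip in perceived loudness mid-transition.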
🛠 Installation Guide
Prerequisites
FFmpeg: Required for audio conversion and processing.
macOS: brew install ffmpeg
Ubuntu/Debian: sudo apt install ffmpeg
Windows: Download from ffmpeg.org.
Rust Toolchain: Required for building the project (cargo).
Gemini API Key: Obtain your key from Google AI Studio.
1. NPM / NPX (Recommended for non-Rust users)
Add the server directly to your MCP client configuration using npx:
{
"mcpServers": {
"gemini-audio": {
"command": "npx",
"args": ["-y", "gemini-audio-mcp"],
"env": {
"GEMINI_API_KEY": "YOUR_API_KEY"
}
}
}
}
2. Manual Installation (Rust)
Clone the repository:
git clone https://github.com/mcp-servers/gemini-audio-mcp.git
cd gemini-audio-mcp
Build the project:
cargo build --release
Configure your environment: Set the GEMINI_API_KEY environment variable in your MCP client or system.
🔧 Tool Usage Examples
Generate a Soundscape
Create an immersive 30-second loop of a cyberpunk rainy city.
{
"name": "generate_soundscape",
"arguments": {
"prompt": "Heavy rain on neon-lit cyberpunk city streets, distant hover-car hums, muffled holographic advertisements.",
"duration": 30,
"format": "mp3",
"auto_play": true
}
}
Transition Between Environments
Seamlessly shift from a peaceful forest to a roaring thunderstorm.
{
"name": "transition_soundscape",
"arguments": {
"from_prompt": "Quiet morning forest with chirping birds and rustling leaves.",
"to_prompt": "Intense tropical thunderstorm with loud thunder claps and heavy downpour.",
"transition_duration": 10,
"auto_play": true
}
}
Update Server Defaults
Set the default output format to FLAC for higher quality.
{
"name": "configure",
"arguments": {
"default_format": "flac",
"default_sample_rate": 48000
}
}
🏛 Architecture Overview
The server is built with a modular Rust architecture designed for efficiency and reliability:
main.rs: The core MCP protocol engine handling tool registration and request dispatching.
gemini.rs: Manages low-level WebSocket communication with the Gemini 2.0 Multimodal Live API.
audio.rs: Handles PCM data manipulation, including seamless looping algorithms and FFmpeg integration for format transcoding.
mixer.rs: Implements audio processing logic for crossfading and blending multiple audio streams.
config.rs: Provides a persistent JSON-based configuration layer for user preferences.
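To make the FFmpeg transcoding step in audio.rs more concrete, here is a minimal sketch of how raw PCM might be handed to the `ffmpeg` CLI. It assumes the input is a file of raw 16-bit little-endian PCM; the `Format` enum and the `ffmpeg_args` and `transcode` functions are hypothetical names for illustration and are not the project's actual API. The MP3 and OGG encoders shown are only available if the local FFmpeg build includes them.

```rust
use std::process::Command;

/// Output formats listed in this README (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    Wav,
    Mp3,
    Ogg,
    Flac,
}

impl Format {
    /// FFmpeg audio encoder to request for this format.
    fn encoder(self) -> &'static str {
        match self {
            Format::Wav => "pcm_s16le",
            Format::Mp3 => "libmp3lame",
            Format::Ogg => "libvorbis",
            Format::Flac => "flac",
        }
    }

    /// File extension for this format.
    fn extension(self) -> &'static str {
        match self {
            Format::Wav => "wav",
            Format::Mp3 => "mp3",
            Format::Ogg => "ogg",
            Format::Flac => "flac",
        }
    }
}

/// Build the FFmpeg arguments for converting a raw s16le PCM file to `format`.
/// Options placed before `-i` describe the input, which has no header of its own.
pub fn ffmpeg_args(
    input: &str,
    output_stem: &str,
    sample_rate: u32,
    channels: u16,
    format: Format,
) -> Vec<String> {
    vec![
        "-y".to_string(), // overwrite the output file if it exists
        "-f".to_string(),
        "s16le".to_string(), // assumption: raw 16-bit little-endian PCM input
        "-ar".to_string(),
        sample_rate.to_string(),
        "-ac".to_string(),
        channels.to_string(),
        "-i".to_string(),
        input.to_string(),
        "-c:a".to_string(),
        format.encoder().to_string(),
        format!("{}.{}", output_stem, format.extension()),
    ]
}

/// Run FFmpeg with the arguments above. Returns Ok(true) if FFmpeg exited successfully.
pub fn transcode(
    input: &str,
    output_stem: &str,
    sample_rate: u32,
    channels: u16,
    format: Format,
) -> std::io::Result<bool> {
    let status = Command::new("ffmpeg")
        .args(ffmpeg_args(input, output_stem, sample_rate, channels, format))
        .status()?;
    Ok(status.success())
}
```

Keeping argument construction separate from process spawning lets the mapping from format to encoder be checked without FFmpeg installed.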
📄 License
Distributed under the MIT License. See LICENSE for more information.