Enables audio file analysis using Google's Gemini models, with support for multimodal context through JSON data and system instructions to guide the model's behavior.
Gemini Multimodal Audio Upload
This project provides a Model Context Protocol (MCP) server that enables audio analysis using Google's Gemini models. It allows you to upload audio files, provide optional context (JSON), and receive detailed analysis based on your prompts.
Features
Audio Analysis: Upload and analyze audio files (WAV, MP3, etc.) using Google Gemini.
Multimodal Context: Support for providing additional context via JSON files or strings.
System Instructions: Ability to provide system instructions (e.g., "Gem" definitions) to guide the model's behavior.
MCP Server: Exposes functionality as an MCP tool, making it compatible with MCP clients like Claude Desktop or VS Code extensions.
Prerequisites
Python 3.10 or higher
A Google Cloud project with the Gemini API enabled.
An API key for the Gemini API.
Installation
Clone the repository:
git clone https://github.com/unscene/gemini-audio-upload.git
cd gemini-audio-upload
Install dependencies with uv:
uv sync
Configuration
Create a .env file in the root directory:
cp .env.example .env  # if .env.example exists; otherwise create .env by hand
Add your Google API key to the .env file:
GOOGLE_API_KEY=your_api_key_here
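The README does not show how the server reads this file. It may use a library such as python-dotenv, but that is an assumption. As a minimal standard-library sketch of what the key lookup amounts to, the hypothetical helpers below parse simple KEY=value lines and let an exported GOOGLE_API_KEY take precedence over the .env file. The sketch does not handle inline comments, `export` prefixes, or multi-line values.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_dotenv(text: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. KEY="abc" or KEY='abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_api_key(env_file: Path = Path(".env")) -> str | None:
    """Hypothetical helper: prefer an exported GOOGLE_API_KEY, else read it from env_file."""
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        return key
    if env_file.is_file():
        return parse_dotenv(env_file.read_text(encoding="utf-8")).get("GOOGLE_API_KEY")
    return None
```

Letting the shell environment win mirrors the common convention of not overriding variables that are already set, which makes it easy to swap keys without editing .env.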
Usage
Running the MCP Server
You can run the MCP server directly using uv:
However, it is typically run by an MCP client.
MCP Tool: analyze_audio
The server exposes a single tool: analyze_audio.
Arguments:
audio_path (string, required): The absolute path to the audio file you want to analyze.
prompt (string, optional): The prompt to guide the analysis. Default: "Describe this audio."
json_path (string, optional): Path to a JSON file containing context data.
json_context (string, optional): A JSON string containing context data (overrides json_path).
instruction_file (string, optional): Path to a text file containing system instructions.
model (string, optional): The Gemini model to use. Default: "gemini-1.5-pro".
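The argument rules above can be sketched as follows: defaults for prompt and model, json_context taking precedence over json_path, and system instructions read from a text file. resolve_arguments and AnalysisRequest are hypothetical names used for illustration, not the server's actual code. How the resolved context and instructions are then passed to Gemini is not covered here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any

# Defaults as documented for analyze_audio.
DEFAULT_PROMPT = "Describe this audio."
DEFAULT_MODEL = "gemini-1.5-pro"


@dataclass
class AnalysisRequest:
    """Hypothetical container for the resolved tool arguments."""
    audio_path: Path
    prompt: str
    context: Any
    system_instruction: str | None
    model: str


def resolve_arguments(
    audio_path: str,
    prompt: str | None = None,
    json_path: str | None = None,
    json_context: str | None = None,
    instruction_file: str | None = None,
    model: str | None = None,
) -> AnalysisRequest:
    """Apply the documented defaults and the json_context-over-json_path precedence."""
    context: Any = None
    if json_context is not None:
        # An inline JSON string overrides json_path; the file is never read.
        context = json.loads(json_context)
    elif json_path is not None:
        context = json.loads(Path(json_path).read_text(encoding="utf-8"))

    # System instructions (e.g. a "Gem" definition) are read as plain text.
    instruction = None
    if instruction_file is not None:
        instruction = Path(instruction_file).read_text(encoding="utf-8")

    return AnalysisRequest(
        audio_path=Path(audio_path),
        prompt=prompt or DEFAULT_PROMPT,  # None or "" falls back to the default
        context=context,
        system_instruction=instruction,
        model=model or DEFAULT_MODEL,
    )
```

Checking json_context first means a caller can override a stale context file for a single call without touching the file on disk.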
Example Usage (Conceptual)
If you are using an MCP client, you might ask:
"Analyze the audio file at C:\path\to\recording.wav and tell me if the speaker sounds happy."
The client would call the analyze_audio tool with:
audio_path: C:\path\to\recording.wav
prompt: "Tell me if the speaker sounds happy."
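On the wire, an MCP client sends this as a JSON-RPC 2.0 `tools/call` request whose params carry the tool name and its arguments. The hypothetical build_tools_call helper below shows roughly what that message looks like. Clients construct it for you, so this is only a sketch for illustration and debugging.

```python
from __future__ import annotations

import json


def build_tools_call(request_id: int, audio_path: str, prompt: str | None = None) -> str:
    """Hypothetical helper: serialize a tools/call request for analyze_audio."""
    arguments = {"audio_path": audio_path}
    if prompt is not None:
        arguments["prompt"] = prompt  # omit it to get the default prompt
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "analyze_audio", "arguments": arguments},
    }
    return json.dumps(message)


print(build_tools_call(1, r"C:\path\to\recording.wav", "Tell me if the speaker sounds happy."))
```

Backslashes in a Windows path are escaped in the JSON text (C:\\path\\to\\recording.wav), while the decoded argument the server receives is the original path.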
Client Configuration
Claude Desktop App
To use this server with the Claude Desktop App, add the following configuration to your claude_desktop_config.json file.
Windows Location: %APPDATA%\Claude\claude_desktop_config.json
macOS Location: ~/Library/Application Support/Claude/claude_desktop_config.json
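To locate and update the file programmatically, a sketch like the following resolves the two locations listed above and merges a server entry under the mcpServers key, which is where Claude Desktop reads its MCP server definitions. Both function names are hypothetical. The launch command and arguments for this particular server are not given here, so take them from the repository rather than from this sketch.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def claude_desktop_config_path(
    platform: str, env: dict[str, str] | None = None, home: Path | None = None
) -> Path:
    """Hypothetical helper: return the config location for a sys.platform value."""
    env = dict(os.environ) if env is None else env
    if platform.startswith("win"):
        # %APPDATA%\Claude\claude_desktop_config.json
        return Path(env["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    if platform == "darwin":
        # ~/Library/Application Support/Claude/claude_desktop_config.json
        base = Path.home() if home is None else home
        return base / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    raise ValueError(f"no documented config location for {platform!r}")


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Hypothetical helper: return a copy of config with one mcpServers entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, command: str, args: list[str]) -> None:
    """Read the existing file (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_mcp_server(config, name, command, args), indent=2), encoding="utf-8")
```

add_mcp_server copies rather than mutates, so other servers and unrelated settings already in the file are preserved. Restart Claude Desktop after editing the file so it picks up the change.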
Note: Replace
VS Code (MCP Extension)
If you are using an MCP extension in VS Code (like the official "Model Context Protocol" extension), you can typically configure it in your VS Code settings.json: