Gemini Audio Upload enables audio file analysis using Google's Gemini models, with multimodal context supplied as JSON data and system instructions that guide the model's behavior.

How do I use Gemini Audio Upload?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Gemini Audio Upload analyze this meeting recording and summarize the key decisions".

That's it! The server will respond to your query, and you can continue using it as needed.

Gemini Multimodal Audio Upload

This project provides a Model Context Protocol (MCP) server that enables audio analysis using Google's Gemini models. It allows you to upload audio files, provide optional context (JSON), and receive detailed analysis based on your prompts.

Features

Audio Analysis: Upload and analyze audio files (WAV, MP3, etc.) using Google Gemini.
Multimodal Context: Support for providing additional context via JSON files or strings.
System Instructions: Ability to provide system instructions (e.g., "Gem" definitions) to guide the model's behavior.
MCP Server: Exposes functionality as an MCP tool, making it compatible with MCP clients like Claude Desktop or VS Code extensions.

Prerequisites

Python 3.10 or higher
A Google Cloud Project with the Gemini API enabled.
An API key for the Gemini API.

Installation

Clone the repository:
git clone https://github.com/unscene/gemini-audio-upload.git
cd gemini-audio-upload
Install dependencies with uv:
uv sync

Configuration

Create a .env file in the root directory:
cp .env.example .env # If .env.example exists, otherwise create new
Add your Google API key to the .env file:
GOOGLE_API_KEY=your_api_key_here
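To illustrate how a key stored this way can be picked up at startup, here is a minimal sketch that uses only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical and not taken from this project, which may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_file: str = ".env") -> str:
    """Return GOOGLE_API_KEY, preferring the process environment over the file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(env_file).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return key
```

Preferring the process environment means a key passed in by an MCP client's configuration takes effect even when a .env file is also present.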

Usage

Running the MCP Server

You can run the MCP server directly using uv:

uv run gemini_audio/mcp_server.py

In most setups, though, an MCP client launches the server for you (see Client Configuration).

MCP Tool: `analyze_audio`

The server exposes a single tool: analyze_audio.

Arguments:

audio_path (string, required): The absolute path to the audio file you want to analyze.
prompt (string, optional): The prompt to guide the analysis. Default: "Describe this audio."
json_path (string, optional): Path to a JSON file containing context data.
json_context (string, optional): A JSON string containing context data (overrides json_path).
instruction_file (string, optional): Path to a text file containing system instructions.
model (string, optional): The Gemini model to use. Default: "gemini-1.5-pro".
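The precedence between the two context arguments can be sketched as follows. This is an illustration of the documented rule (json_context overrides json_path), not the project's actual code; the function name `resolve_context` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def resolve_context(json_path: Optional[str] = None,
                    json_context: Optional[str] = None) -> Optional[Any]:
    """Return parsed context data, or None when no context was supplied."""
    # An inline JSON string takes precedence over a file path.
    if json_context:
        return json.loads(json_context)
    if json_path:
        return json.loads(Path(json_path).read_text(encoding="utf-8"))
    return None
```

With both arguments set, only the inline string is parsed and the file is never read.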

Example Usage (Conceptual)

If you are using an MCP client, you might ask:

"Analyze the audio file at C:\path\to\recording.wav and tell me if the speaker sounds happy."

The client would call the analyze_audio tool with:

audio_path: C:\path\to\recording.wav
prompt: "Tell me if the speaker sounds happy."
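At the protocol level, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the example above; the helper name `build_tool_call` and the request id are illustrative assumptions, and a real client constructs and sends this message for you.

```python
import json


def build_tool_call(request_id: int, audio_path: str, prompt: str) -> str:
    """Serialize a tools/call request for the analyze_audio tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "analyze_audio",
            "arguments": {"audio_path": audio_path, "prompt": prompt},
        },
    }
    return json.dumps(request, indent=2)


# A raw string keeps the Windows backslashes literal in Python source;
# json.dumps escapes them as required by JSON.
print(build_tool_call(1, r"C:\path\to\recording.wav",
                      "Tell me if the speaker sounds happy."))
```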

Client Configuration

Claude Desktop App

To use this server with the Claude Desktop App, add the following configuration to your claude_desktop_config.json file.

Windows location: %APPDATA%\Claude\claude_desktop_config.json
macOS location: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "gemini-audio": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/gemini-audio-upload",
        "run",
        "gemini_audio/mcp_server.py"
      ],
      "env": {
        "GOOGLE_API_KEY": "your_api_key_here"
      }
    }
  }
}

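Note: Replace /absolute/path/to/gemini-audio-upload with the path to your clone of the repository, and your_api_key_here with your Gemini API key.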
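If claude_desktop_config.json already lists other servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge, assuming the config layout shown above; the function name `add_server_entry` is hypothetical, and editing the file by hand works just as well.

```python
import json
from pathlib import Path


def add_server_entry(config_path: str, repo_dir: str, api_key: str) -> dict:
    """Merge a "gemini-audio" entry into the config, keeping other servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["gemini-audio"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "gemini_audio/mcp_server.py"],
        "env": {"GOOGLE_API_KEY": api_key},
    }
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Pass the config location for your platform as `config_path` and the absolute path of your clone as `repo_dir`, then restart the client so it picks up the change.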

VS Code (MCP Extension)

If you are using an MCP extension in VS Code (like the official "Model Context Protocol" extension), you can typically configure it in your VS Code settings.json:

"mcp.servers": {
  "gemini-audio": {
    "command": "uv",
    "args": [
      "--directory",
      "C:\\absolute\\path\\to\\gemini-audio-upload",
      "run",
      "gemini_audio/mcp_server.py"
    ],
    "env": {
      "GOOGLE_API_KEY": "your_api_key_here"
    }
  }
}

License

MIT

