Uses Google's Gemini AI models to transcribe audio files into text, with multiple model options available for different quality and speed requirements.
Stores audio transcriptions in a Supabase database with pgvector for semantic search capabilities, enabling natural language queries across transcribed audio content.
MCP Audio RAG Server
Transform your audio files into a searchable knowledge base using AI. Ask Claude questions about your meetings, podcasts, lectures, or any audio content.
What is this?
This is an MCP (Model Context Protocol) server that lets you:
Transcribe any audio file using Google's Gemini AI
Store the transcriptions in a searchable database
Search through all your audio content using natural language
Once set up, you can simply ask Claude things like:
"What did they discuss about the budget in my meeting recording?"
"Find mentions of machine learning in my podcast collection"
"What were the key points from yesterday's lecture?"
How It Works
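At a high level the server transcribes audio, stores the transcript segments with vector embeddings, and answers questions by finding the segments closest in meaning to the query. The snippet below is a conceptual sketch of that ingest-then-search flow, not the server's actual code: the `Segment` type, the keyword-counting `embed` function, and the in-memory array are all assumptions standing in for the real embedding model and the pgvector table in Supabase.

```typescript
// Conceptual sketch only: a toy embedding and an in-memory store stand in
// for the real embedding model and the pgvector-backed Supabase table.

interface Segment {
  file: string;
  text: string;
  embedding: number[];
}

// Hypothetical toy embedding: counts occurrences of a few fixed keywords.
const VOCAB = ["budget", "learning", "lecture", "meeting"];

function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity, one of the distance measures pgvector supports.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0;
    dot += ai * bi;
    normA += ai * ai;
    normB += bi * bi;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "Ingest": store each transcript segment alongside its embedding.
function ingest(store: Segment[], file: string, segments: string[]): void {
  for (const text of segments) {
    store.push({ file, text, embedding: embed(text) });
  }
}

// "Search": embed the query and rank stored segments by similarity.
function search(store: Segment[], query: string, limit = 3): Segment[] {
  const q = embed(query);
  return store
    .map((seg) => ({ seg, score: cosineSimilarity(q, seg.embedding) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((r) => r.seg);
}

const store: Segment[] = [];
ingest(store, "meeting.mp3", [
  "We reviewed the budget for next quarter.",
  "Lunch was ordered.",
]);
ingest(store, "podcast.mp3", ["Machine learning is changing audio search."]);

const hits = search(store, "What did they say about the budget?");
console.log(hits.map((h) => `${h.file}: ${h.text}`));
```

A real embedding model would also match paraphrases and related terms rather than exact keywords, which is what makes natural language queries work.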
Quick Start
Prerequisites
Node.js 18+ - available from nodejs.org
Gemini API Key - free from Google AI Studio (aistudio.google.com/apikey)
Supabase Account - free sign-up at supabase.com
Step 1: Clone & Install
Step 2: Set Up Supabase Database
Create a new project at supabase.com
Go to SQL Editor in your dashboard
Paste and run the contents of supabase/schema.sql
Step 3: Get Your API Keys
Supabase (Settings → API):
Copy Project URL → SUPABASE_URL
Copy service_role key → SUPABASE_SERVICE_KEY
Google AI Studio:
Create key at aistudio.google.com/apikey → GEMINI_API_KEY
Step 4: Configure
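Edit .env and set the three values collected in Step 3: SUPABASE_URL, SUPABASE_SERVICE_KEY, and GEMINI_API_KEY.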
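A missing or blank key is a common cause of the "Missing environment variable" error, so it can help to check the three variables before starting the server. The helper below is an illustrative sketch: the variable names come from Step 3, but `missingEnvVars` is a hypothetical function and is not part of this project.

```typescript
// Variable names are taken from Step 3; the check itself is an assumption.
const REQUIRED_VARS = [
  "SUPABASE_URL",
  "SUPABASE_SERVICE_KEY",
  "GEMINI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with a hand-built environment: one key blank, one absent.
const missing = missingEnvVars({
  SUPABASE_URL: "https://your-project.supabase.co",
  SUPABASE_SERVICE_KEY: "",
});
console.log(missing); // [ 'SUPABASE_SERVICE_KEY', 'GEMINI_API_KEY' ]
```

In a Node.js process you would pass `process.env` instead of the hand-built object.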
Step 5: Add to Claude
For Claude Code CLI (~/.claude.json):
For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):
Same config as above.
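For orientation, MCP clients such as Claude Desktop read server definitions from an `mcpServers` map in the config file. The sketch below builds an entry of that general shape and prints it as JSON. The server name `audio-rag`, the entry-point path, and the key values are placeholders, not values from this project, and passing the keys through `env` is only one possible option, since the server may read them from .env instead.

```typescript
// Sketch of the general `mcpServers` shape read by MCP clients.
// The server name and entry-point path are placeholders, not project values.

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildConfig(entryPath: string, env: Record<string, string>): McpConfig {
  return {
    mcpServers: {
      // Hypothetical server name; use whatever label you prefer.
      "audio-rag": { command: "node", args: [entryPath], env },
    },
  };
}

const config = buildConfig("/absolute/path/to/server/entry.js", {
  SUPABASE_URL: "https://your-project.supabase.co",
  SUPABASE_SERVICE_KEY: "your-service-role-key",
  GEMINI_API_KEY: "your-gemini-api-key",
});

// Paste the printed JSON into the config file, merging with any existing keys.
console.log(JSON.stringify(config, null, 2));
```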
Usage
Transcribe Audio
Just tell Claude to transcribe a file:
Want to use a specific model? Just ask:
Search Your Audio
Ask natural questions:
Manage Your Library
Available Models
Model | Best For |
| Default - Fast & accurate, great balance |
| Fastest, cheapest - good for bulk processing |
| Best quality - complex audio, multiple speakers |
| Newest - cutting edge capabilities |
| Reliable - previous generation |
| Fast - previous generation |
Supported Audio Formats
.mp3 .mp4 .m4a .wav .webm .mpeg .mpga
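If you are preparing a folder of files for bulk ingestion, a quick extension check can filter out unsupported files up front. This is a minimal sketch using the extensions listed above. `isSupportedAudio` is a hypothetical helper, and the server's own validation may differ (for example, it might inspect file contents rather than names).

```typescript
// Extensions copied from the supported formats list.
const SUPPORTED_EXTENSIONS = [".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"];

// Returns true when the file name ends in a supported extension (case-insensitive).
function isSupportedAudio(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  const ext = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(ext);
}

console.log(isSupportedAudio("standup.MP3")); // true
console.log(isSupportedAudio("notes.flac")); // false
```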
Available Tools
Tool | Description |
| Transcribe and store an audio file |
| Search through your audio using natural language |
| List all transcribed audio files |
| Get the complete transcript of a file |
| Generate an AI summary of a transcript |
| Remove a transcribed file from the database |
Troubleshooting
Problem | Solution |
"No relevant segments found" | Try rephrasing your search, or check if audio was ingested |
"Missing environment variable" | Check your |
Supabase errors | Make sure you're using |
Slow transcription | Use |
Support This Project
If this project saved you time or helped you out, consider buying me a coffee!
License
MIT - Use it however you want!