Which integrations are available for this server?

Billboard: chart data.
Ollama: default LLM backend for audio understanding.
OpenAI-compatible APIs: configurable alternative LLM backend for audio understanding.
Spotify: music discovery and playlist management.
yt-dlp: search for and download audio from YouTube and other platforms.

How do I use Claud-Ear?

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Claud-Ear Analyze the mood and genre of this song."

The server responds to your query, and you can keep using it as needed.

Claud-Ear

by Null-Phnix

🔈 Claud-Ear

Give your AI agent the ability to listen to and understand music/audio files — works with ANY MCP client.

Audio intelligence MCP server. Semantic understanding, stem separation, lyrics transcription, signal analysis — all as tool calls.

# Your agent asks:
"Analyze this track — what's the genre, tempo, key, and mood?"
"Separate the vocals from the instrumental"
"Transcribe the lyrics from this song"
"Generate a trap beat at 140 BPM"

Claud-Ear connects your AI agent (Hermes Agent, Claude Code, Codex CLI, etc.) to a full audio intelligence pipeline. Drop in an MP3, WAV, FLAC, OGG, M4A, or OPUS file and your agent can analyze, separate, transcribe, and understand it.

Why I Built This

I have a music library of ~5,000 tracks. I wanted my agent to understand them like I do — not just "this is a 3-minute MP3" but "this is a melancholic D minor indie rock track with a prominent bass line and lyrics about loss."

Existing tools were either:

Shallow — basic metadata (artist, title, duration) with no semantic understanding
Cloud-only — upload your audio to someone's server, pay per analysis, hope they don't train on it
GUI-only — great for humans, useless for agents that need structured tool calls
Single-purpose — one tool for stems, another for transcription, another for analysis, no integration

Claud-Ear is an MCP server because my agent should be able to say "analyze this track" and get back structured data — genre, tempo, key, stems, lyrics, mood — as a tool call result. Not me manually running 4 different CLI tools and copy-pasting the output.

The autonomous agent mode exists because I don't want to manually trigger analysis on 5,000 tracks. It should run overnight, unsupervised, and finish the job.

What It Does

Capability	Model/Tool	What It Gives You
🔍 Semantic understanding	CLAP (LAION/CLAP Music & Speech)	Genre, mood, instruments, era classification
🎛️ Source separation	Demucs HT	Isolate vocals, drums, bass, other as separate files
📝 Lyrics transcription	Whisper large-v3	Transcribe lyrics from isolated vocals
📊 Signal analysis	librosa	Tempo, key, chords, structure, rhythm
⬇️ Audio downloading	yt-dlp	Download from YouTube and other platforms
🏥 Audio surgery	sonic_surgery	EQ, stem manipulation, dynamics processing
🎹 Beat production	beat_studio + MIDI	Generate beats, chord progressions, melodies

Default LLM backend: Ollama (configurable to any OpenAI-compatible API).

Current Pain Points

These are the battles I'm actively fighting:

server.py is 104K lines — This started as a clean MCP server and became a monolith. CLAP loading, Demucs inference, Whisper transcription, librosa analysis, caching, disk eviction, schema versioning — all in one file. It needs to be split into modules but I keep adding features instead of refactoring.
8GB VRAM means one model at a time — CLAP, Demucs, and Whisper all want GPU. I can't run them simultaneously. The "deep_listen" tool has to load/unload models in sequence, which turns a 2-minute analysis into a 10-minute analysis. I have a GPU lock system but it's a hack.
Cache invalidation is hard — I built LRU memory + disk cache with schema versioning. When I change the output format, old cache entries auto-invalidate. But the cache key logic is fragile — same file, same analysis, different day = cache miss because the schema version bumped. I'm over-engineering caching.
yt-dlp breaks monthly — YouTube changes their frontend, yt-dlp needs an update, and the search_and_download tool stops working until I manually update. This is not the tool's fault but it's a maintenance burden I didn't anticipate.
15-minute max duration is arbitrary — Set to 900 seconds because longer tracks OOM on 8GB VRAM. A 20-minute ambient piece or live set gets truncated. The limit should be dynamic based on available memory, not hardcoded.
Autonomous agent gets stuck — The batch analysis agent runs overnight but sometimes hangs on one track (corrupted file, unsupported codec, Demucs crash). There's no timeout per-track, so one bad file blocks the whole queue. I need per-track error isolation.
Billboard/Spotify integrations are brittle — charts.py and discovery.py depend on third-party APIs with rate limits and breaking changes. The Billboard scraper broke twice in 3 months. These are nice-to-have features that cost more maintenance than value.
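
The cache-key fragility described above comes from folding the schema version into the key. Here is a minimal sketch of one way to build such a key, assuming it is derived from file content, the analysis name, and a schema version constant. `SCHEMA_VERSION` and `cache_key` are illustrative names, not taken from server.py:

```python
import hashlib
from pathlib import Path

SCHEMA_VERSION = 3  # hypothetical: bump whenever the output format changes


def cache_key(file_path: str, analysis: str, schema_version: int = SCHEMA_VERSION) -> str:
    """Return a stable key for (file content, analysis name, schema version)."""
    digest = hashlib.sha256()
    # Hash the file in chunks so large audio files are not read into memory at once.
    with Path(file_path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    # Mix in the analysis name and schema version so a change to either yields a new key.
    digest.update(f"|{analysis}|v{schema_version}".encode("utf-8"))
    return digest.hexdigest()
```

Keying on content rather than path or modification time means a re-run on another day still hits the cache as long as the schema version is unchanged. A version bump deliberately misses for every entry, which is the trade-off described in the list above.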

End Goals — Where This Is Headed

Short Term (now → 3 months)

Split server.py into modules — one file per capability (clap.py, demucs.py, whisper.py, librosa.py, cache.py)
Per-track timeouts in autonomous agent — one bad file shouldn't block 5,000
Dynamic duration limits — detect available VRAM and set max duration accordingly
Better error isolation — each tool runs in its own subprocess with timeouts and cleanup
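
The per-track timeout and subprocess isolation goals above could be sketched roughly as follows. This is not the agent's current code. `TrackResult`, `analyze_with_timeout`, and `run_queue` are hypothetical names, and `command` stands in for whatever CLI ends up analyzing a single track given its path as the last argument:

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TrackResult:
    path: str
    status: str  # "ok", "failed", or "timeout"
    detail: str = ""


def analyze_with_timeout(track_path: str, command: list[str], timeout_s: float) -> TrackResult:
    """Run one analysis command in a child process so it cannot block the queue."""
    try:
        proc = subprocess.run(
            command + [track_path],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return TrackResult(track_path, "timeout", f"exceeded {timeout_s}s")
    if proc.returncode != 0:
        return TrackResult(track_path, "failed", proc.stderr.strip())
    return TrackResult(track_path, "ok", proc.stdout.strip())


def run_queue(tracks: list[str], command: list[str], timeout_s: float) -> list[TrackResult]:
    """Process every track; a bad file is recorded and the loop moves on."""
    return [analyze_with_timeout(t, command, timeout_s) for t in tracks]
```

Because each track runs in its own process, a Demucs crash or a hang on a corrupted file ends with that process, and the parent only records the outcome.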

Medium Term (3–6 months)

Unified audio knowledge base — all analyzed tracks feed into a ChromaDB graph (genre connections, similar tracks, playlist generation)
Cross-project integration — Deep Video Watcher's beat detection informs Claud-Ear's analysis; Huginn-scraped lyrics feed into track metadata
Local model consolidation — one vision-audio model instead of CLAP + Demucs + Whisper + librosa juggling

Long Term (6–12 months)

Fully autonomous music curation — "Here are 10,000 tracks. Generate me 20 playlists that flow well, with transitions, mood arcs, and no jarring genre jumps"
Real-time audio analysis — analyze a track as it's playing, not as a batch job
Integration with Bifrost — mythology-themed music (Wagnerian opera, Japanese taiko, Nordic folk) gets linked to cultural context in the knowledge graph

Quick Start

Prerequisites

Python 3.11–3.13
CUDA-capable GPU recommended (CPU-only works but is slower)
Ollama running locally (default) or any OpenAI-compatible API
uv (recommended) or pip

Install & Run

# Clone
git clone https://github.com/Null-Phnix/claud-ear.git
cd claud-ear

# Install with uv
uv sync

# Test the LLM backend
uv run python llm_backend.py

# Run the MCP server
uv run claud-ear

Configuration

By default, Claud-Ear connects to Ollama at http://localhost:11434 using llama3.1:8b. To customize:

export AUDIO_LLM_MODEL=llama3.1:8b     # model name
export AUDIO_LLM_HOST=http://localhost:11434  # API endpoint
export AUDIO_LLM_PROVIDER=ollama       # or "openai" for OpenAI-compatible APIs

For OpenAI-compatible providers (vLLM, TGI, LiteLLM, etc.):

export AUDIO_LLM_PROVIDER=openai
export AUDIO_LLM_HOST=http://localhost:8000
export AUDIO_LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
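
As a rough illustration of how these three variables might be resolved in Python, here is a minimal sketch. The variable names and defaults come from the configuration above, and the endpoint paths are Ollama's `/api/chat` and the OpenAI-style `/v1/chat/completions`. Whether llm_backend.py appends those paths itself is an assumption, and `LLMConfig` and `load_llm_config` are hypothetical names:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    host: str
    model: str

    @property
    def chat_url(self) -> str:
        """Chat endpoint for the configured provider (assumed path layout)."""
        base = self.host.rstrip("/")
        if self.provider == "openai":
            # OpenAI-compatible servers such as vLLM expose chat completions here.
            return f"{base}/v1/chat/completions"
        # Ollama's native chat endpoint.
        return f"{base}/api/chat"


def load_llm_config(env=None) -> LLMConfig:
    """Read AUDIO_LLM_* settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return LLMConfig(
        provider=env.get("AUDIO_LLM_PROVIDER", "ollama").lower(),
        host=env.get("AUDIO_LLM_HOST", "http://localhost:11434"),
        model=env.get("AUDIO_LLM_MODEL", "llama3.1:8b"),
    )
```

With no variables set, `load_llm_config().chat_url` resolves to the local Ollama endpoint. Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests.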

Connect to Your Agent

Hermes Agent (or any MCP client) — add to your MCP config:

{
  "mcpServers": {
    "claud-ear": {
      "command": "uv",
      "args": ["run", "claud-ear"]
    }
  }
}

Or for Claude Code:

claude mcp add claud-ear -- uv run claud-ear

Tools

`deep_listen(file_path)`

Full analysis pipeline — semantic understanding, source separation, transcription, and signal analysis all in one call. This is the main tool.

`analyze_audio(file_path)`

Quick analysis — genre, mood, instruments, tempo, key. Lighter than deep_listen.

`separate_stems(file_path)`

Isolate vocals, drums, bass, and other stems from a track as separate audio files.

`transcribe_lyrics(file_path)`

Extract and transcribe lyrics from vocals.

`search_and_download(query)`

Search for and download audio from YouTube and other platforms via yt-dlp.

`sonic_surgery(file_path, operation, **params)`

EQ adjustments, stem manipulation, dynamics processing.

`generate_beat(genre, bpm, bars)`

Generate a beat with chord progressions, melodies, and drum patterns as MIDI.
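
MCP clients normally build tool calls for you, but it can help to see what goes over the wire. The sketch below constructs a JSON-RPC 2.0 `tools/call` request of the kind MCP defines. The argument names mirror the signatures listed above, though the exact input schemas the server publishes are not shown here, so treat them as assumptions. The file path is a placeholder and `tool_call_request` is a hypothetical helper:

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(name: str, **arguments) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(payload)


print(tool_call_request("deep_listen", file_path="/music/track.flac"))
print(tool_call_request("generate_beat", genre="trap", bpm=140, bars=8))
```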

Architecture

claud-ear/
├── server.py              # MCP server (FastMCP) — main entry point
├── llm_backend.py         # Configurable LLM API client (Ollama/OpenAI)
├── agent.py               # Autonomous batch analysis agent
├── beat_studio.py         # Beat production engine
├── quality.py             # Audio quality assessment
├── discovery.py           # Music discovery tools
├── song_db.py             # Track metadata & lyrics database
├── sonic_surgery.py       # Audio repair & enhancement
├── extractor.py           # Feature extraction pipeline
├── download_playlists.py  # Bulk downloader
├── analyze_bass.py        # Bass frequency analysis
├── analyze_bitter.py      # Mood/valence classifier
├── charts.py              # Billboard chart integration
├── power.py               # Energy/sleep scheduling
├── dashboard.py           # Web dashboard
├── query.py               # Natural language music search
├── start_agent.sh         # Start autonomous agent
├── stop_agent.sh          # Stop autonomous agent
├── pause_at_130.sh        # Pause agent during peak hours
└── docs/                  # Design docs & implementation plans

Autonomous Agent

Run the autonomous music intelligence agent to batch-analyze your library:

# Analyze one song (test mode)
uv run python agent.py --one

# Run in continuous loop
./start_agent.sh

# Stop
./stop_agent.sh

The agent scans ~/Documents/music/music data/, finds pending tracks, analyzes them using the configured LLM backend, and writes full analysis documents to ~/Documents/music/analyses/.
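
The "finds pending tracks" step can be pictured with a small sketch like the one below. It assumes, purely for illustration, that each analysis is written as `<track stem>.md` in the analyses directory; agent.py may track state differently. The extension list matches the supported formats named earlier, and `find_pending_tracks` is a hypothetical name:

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".opus"}


def find_pending_tracks(music_dir: Path, analyses_dir: Path) -> list[Path]:
    """Return audio files under music_dir that have no analysis document yet."""
    # Assumption: an analysis for "song.mp3" is stored as "song.md".
    done = {p.stem for p in analyses_dir.glob("*.md")}
    pending = [
        p
        for p in music_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS and p.stem not in done
    ]
    return sorted(pending)
```

Called with `Path.home() / "Documents/music/music data"` and `Path.home() / "Documents/music/analyses"`, it would list the tracks still waiting for analysis. Matching by stem alone would conflate two tracks with the same file name in different folders, so a real implementation would likely key on the relative path or a content hash.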

License

MIT — use it, fork it, vibe with it.
