StemSplit Vocal Remover & Stem Separator
StemSplit is an AI-powered server for separating audio into stems, removing background noise from voice recordings, and managing jobs. It is powered by HTDemucs and DeepFilterNet and is usable via MCP-compatible clients (Claude Desktop, Cursor, Cline, etc.) or programmatic calls.
Stem Separation
Separate local audio files (MP3, WAV, FLAC, M4A, OGG, AAC) or direct URLs into vocals and instrumental, or into up to six stems: vocals, drums, bass, piano, guitar, and other
Extract vocals and instrumental from YouTube videos (up to 60 min) or public SoundCloud tracks (up to 15 min)
Configure output type (VOCALS, INSTRUMENTAL, BOTH, FOUR_STEMS, SIX_STEMS), quality (FAST, BALANCED, BEST), and format (MP3, WAV, FLAC)
Optionally denoise extracted vocals in the same job (denoiseVocals: true)
Create karaoke versions (instrumentals) or acapellas (vocals)
Voice Cleaning & Noise Removal
Remove background noise (hum, hiss, HVAC, wind, echo, ambient sound) using DeepFilterNet
Ideal for cleaning podcasts, interviews, voiceovers, and dialogue before transcription or video production
Job Management
Submit jobs and optionally wait for completion, downloading outputs locally
Check status and retrieve details for stem, YouTube, SoundCloud, or denoise jobs
List recent jobs of any type
Re-download stems from completed jobs with refreshed presigned URLs
Check remaining API credit balance (in seconds, minutes, and human-readable form)
Integration & Automation
Use natural language prompts or programmatic tool calls in agentic workflows
Supports batch processing for transcription pre-processing, AI training data generation, remixing, and mastering
Provides integration with Cloudflare R2 for storage and retrieval of audio stems via presigned URLs.
Allows processing SoundCloud track URLs to extract vocal and instrumental stems.
Allows processing YouTube video URLs to extract vocal and instrumental stems.
stemsplit-mcp
AI stem separation and voice cleaning as a Model Context Protocol (MCP) server. Remove vocals, build karaoke tracks, isolate dialogue, split any song into vocals, drums, bass, piano, guitar, and other stems — or remove background noise from voice recordings using DeepFilterNet — directly from Claude Desktop, Cursor, Cline, Windsurf, Zed, or any other MCP-compatible client. Works with local audio files (MP3, WAV, FLAC, M4A, OGG, AAC) and YouTube/SoundCloud URLs.
Powered by the StemSplit API (HTDemucs for stem separation, DeepFilterNet for noise removal). The server exchanges only file paths and JSON over MCP — audio bytes never pass through the LLM context. They flow directly between your machine, StemSplit's API, and Cloudflare R2.
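To make the "only file paths and JSON" point concrete, here is a rough sketch of what a tool call might look like on the wire. The `tools/call` method and the `name`/`arguments` layout follow the MCP specification; the helper name `build_tool_call` and the request id are illustrative assumptions, and a real MCP client would normally build this message for you.

```python
import json


def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for an MCP tool call (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Only a path and a few options are sent; the audio file itself is not embedded.
message = build_tool_call(
    "separate_stems",
    {"source": "/Users/me/Music/song.mp3", "outputType": "BOTH", "wait": True},
)
print(json.dumps(message, indent=2))
```

The payload stays a few hundred bytes regardless of how large the audio file is, which is the property the paragraph above describes.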
What you can do with this
Audio separation basics
Remove vocals from a song — separate any MP3, WAV, or FLAC into vocals and instrumental
Build a karaoke version of any track — /karaoke slash command returns just the instrumental
Extract an acapella — pull a clean vocal track for remixes, mashups, or re-arrangement
Extract drums, bass, piano, or guitar — split audio into up to six individual stems
Process YouTube videos — paste a youtube.com or youtu.be URL and get separated stems back
Audio production & post-production
Clean vocals before processing — isolate vocals first, then pass to a de-esser, noise reducer, or pitch corrector without mix bleed affecting the result
Stem delivery for mastering — auto-generate per-stem exports from a final mix for a mastering engineer
Adaptive game audio — split a track so a game engine can fade individual layers (e.g. mute drums during quiet scenes)
DJ acapella/instrumental packs — batch-generate acapellas and instrumentals for live performance or DJ sets
Sample chopping — extract drums or bass for sample packs in hip-hop / electronic production
Voice cleaning & noise removal
Clean up a podcast or interview — remove hum, hiss, HVAC noise, or ambient room sound from any voice recording
Denoise vocals after stem separation — pass denoiseVocals: true to separate_stems and get a noise-free vocals stem in one shot
Clean dialogue for video production — strip wind, echo, or background noise before syncing to picture
Pre-process audio before transcription — clean first for dramatically higher ASR / Whisper accuracy
AI & developer pipelines
Vocals → transcription — isolate vocals first, then feed to Whisper or any ASR model for significantly cleaner speech-to-text
Lyrics generation — vocals → transcription → synced lyrics file, fully automated in a single MCP chain
Training data for AI music models — generate clean separated stems from raw mixed tracks for fine-tuning or dataset building
Content-ID / copyright checking — extract vocals to fingerprint and match against a vocal database
Per-stem audio visualizers — drive instrument-reactive visualizers in video or web apps by separating stems first
Content & media
Podcast / interview cleanup — strip music beds or background music from recorded dialogue
Sync licensing — instantly generate an instrumental version of a submitted track for a music supervisor
Music education apps — isolate individual instruments to build solo/mute practice tools or ear training exercises
Agentic workflows
Build audio agents in your IDE — orchestrate stem separation from Cursor or Claude Desktop using natural language
Batch process audio in MCP-driven pipelines — chain stem separation with transcription, translation, or any other MCP tool
MCP clients supported
stemsplit-mcp runs as a local stdio MCP server, so it works in any client that supports the standard MCP transport:
Claude Desktop (Anthropic)
Cursor
Cline (VS Code extension)
Windsurf (Codeium)
Zed
Any client following the Model Context Protocol specification
Tools, resources, and prompts
Stem separation
Tool | Use case |
separate_stems | Upload a local audio file or pass a direct audio URL; get back local file paths to the separated stems |
separate_youtube | Submit a YouTube URL; get back local file paths to the vocals and instrumental stems |
separate_soundcloud | Submit a SoundCloud track URL; get back local file paths to the vocals and instrumental stems |
get_job, list_jobs | Inspect existing stem jobs |
get_youtube_job, list_youtube_jobs | Inspect existing YouTube jobs |
| Inspect existing SoundCloud jobs |
download_stems | Re-download outputs from a completed job (re-mints fresh 1-hour presigned URLs) |
Voice Cleaner (noise removal)
Tool | Use case |
clean_voice | Submit an audio file or URL for noise removal; polls until complete and downloads the cleaned audio to disk |
get_denoise_job | Check status or retrieve the download URL for a voice cleaner job |
list_denoise_jobs | Browse voice cleaner job history or filter by status |
Account
Tool | Use case |
get_balance | Check remaining StemSplit credits |
Plus six ready-made prompts (slash commands): karaoke, isolate_dialogue, sampler_pack, youtube_instrumental, soundcloud_instrumental, clean_voice.
Install
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "stemsplit": {
      "command": "npx",
      "args": ["-y", "stemsplit-mcp"],
      "env": {
        "STEMSPLIT_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}
Restart Claude Desktop. Type /karaoke or just ask: "Separate the vocals from ~/Music/demo.mp3".
Cursor
Add to ~/.cursor/mcp.json (or per-workspace <workspace>/.cursor/mcp.json):
{
  "mcpServers": {
    "stemsplit": {
      "command": "npx",
      "args": ["-y", "stemsplit-mcp"],
      "env": {
        "STEMSPLIT_API_KEY": "sk_live_your_key_here"
      }
    }
  }
}
Cline, Windsurf, Zed, others
Any MCP client that supports stdio-launched servers works. Use the same npx -y stemsplit-mcp command and pass STEMSPLIT_API_KEY via the client's env mechanism.
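If a client config file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, assuming the `mcpServers` layout shown in the snippets above; the function name `add_stemsplit_server` and the `other-server` entry are hypothetical.

```python
import json


def add_stemsplit_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with a stemsplit entry merged in."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep servers already configured
    servers["stemsplit"] = {
        "command": "npx",
        "args": ["-y", "stemsplit-mcp"],
        "env": {"STEMSPLIT_API_KEY": api_key},
    }
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_stemsplit_server(existing, "sk_live_your_key_here")
print(json.dumps(merged, indent=2))
```

Writing the result back to disk is left out on purpose; the config file location differs per client, as noted above.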
Get an API key
Sign up at stemsplit.io
Generate a key (format: sk_live_...)
Paste it into your MCP client config as shown above
Configuration
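Env var | Required | Default | Description |
STEMSPLIT_API_KEY | Yes | — | API key, must start with sk_live_ |
| No | | Override for self-hosted or staging |
STEMSPLIT_DEFAULT_OUTPUT_DIR | No | ~/Downloads/stemsplit | Base directory where stems are saved. Each job gets a <jobId>/ subdirectory |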
Tool reference
separate_stems
Submit an audio file or direct URL for stem separation.
{
  "source": "/Users/me/Music/song.mp3",
  "outputType": "BOTH",
  "quality": "BEST",
  "outputFormat": "MP3",
  "wait": true
}
Field | Type | Default | Notes |
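source | string (required) | — | Local path (absolute or
outputType | | | One of VOCALS, INSTRUMENTAL, BOTH, FOUR_STEMS, SIX_STEMS |
quality | | | One of FAST, BALANCED, BEST |
outputFormat | | | One of MP3, WAV, FLAC |
denoiseVocals | boolean | | Run the extracted vocals stem through Voice Cleaner (DeepFilterNet) after separation |
| string | derived | Display name for the job |
wait | boolean | | If true, poll until done and download stems to disk |
timeoutSeconds | integer | | Max wait when wait is true (up to 3600 s) |
| integer | | |
outputDir | string | | Where to write stems |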
Returns (wait=true):
{
  "jobId": "job_abc123",
  "status": "COMPLETED",
  "creditsCharged": 180,
  "outputDir": "/Users/me/Downloads/stemsplit/job_abc123",
  "stems": {
    "vocals": "/Users/me/Downloads/stemsplit/job_abc123/vocals.mp3",
    "instrumental": "/Users/me/Downloads/stemsplit/job_abc123/instrumental.mp3"
  }
}
separate_youtube
Same shape, but takes youtubeUrl instead of source. Output is fixed to vocals + instrumental, MP3, BEST quality (this is the StemSplit API's contract for YouTube jobs).
clean_voice
Submit an audio file or direct URL for noise removal using DeepFilterNet. Removes background hum, hiss, HVAC noise, wind, echo, and other ambient sounds. By default (wait=true), polls until complete and downloads the cleaned audio to disk.
{
  "source": "/Users/me/recordings/podcast-ep12.mp3",
  "outputFormat": "MP3",
  "wait": true
}
Field | Type | Default | Notes |
source | string (required) | — | Local path (absolute or
outputFormat | | | |
| string | derived | Display name for the job |
wait | boolean | true | If true, poll until done and download the cleaned file to disk |
timeoutSeconds | integer | | Max wait when wait is true |
| integer | | |
outputDir | string | | Where to write the cleaned file |
Returns (wait=true):
{
  "jobId": "dnz_abc123",
  "status": "COMPLETED",
  "creditsCharged": 180,
  "outputDir": "/Users/me/Downloads/stemsplit/dnz_abc123",
  "cleanedAudioPath": "/Users/me/Downloads/stemsplit/dnz_abc123/podcast-ep12_denoised.mp3"
}
get_job, list_jobs, get_youtube_job, list_youtube_jobs, get_denoise_job, list_denoise_jobs, get_balance, download_stems
Thin wrappers over the corresponding StemSplit /api/v1 endpoints. download_stems re-fetches the job first to mint fresh 1-hour presigned URLs, so the expiry never matters. get_denoise_job returns outputs.audio.url when the job is COMPLETED.
Resources
Read-only context the LLM can pull on demand.
URI | Returns |
| Live credit balance |
| The 20 most recent stem jobs |
| Detail snapshot with fresh download URLs |
| YouTube job detail with fresh URLs |
| SoundCloud job detail with fresh URLs |
Prompts (slash commands)
Prompt | Argument | Behavior |
|
| Run |
|
| Run |
|
| Run |
|
| Run |
|
| Run |
|
| Run |
Example sessions
Karaoke from a local file (Claude Desktop):
Make a karaoke version of ~/Music/demo.mp3.
Claude calls separate_stems with outputType="BOTH", polls for ~60s, and returns:
Done. Karaoke (instrumental) is at:
/Users/me/Downloads/stemsplit/job_abc123/instrumental.mp3
Six-stem sampler pack (Cursor):
Split ./loops/break.wav into all six stems for sampling.
Cursor calls separate_stems with outputType="SIX_STEMS", quality="BEST", outputDir="./loops/break-stems", and reports each file path so you can drop them into your DAW.
Instrumental from YouTube:
Get me the instrumental of https://youtu.be/dQw4w9WgXcQ.
Claude calls separate_youtube, polls until COMPLETED, downloads vocals.mp3 and instrumental.mp3 to ~/Downloads/stemsplit/<jobId>/, and returns the instrumental path.
Clean vocals for transcription (Claude Desktop):
Transcribe the lyrics from ~/Music/interview-with-music.mp3 — there's a music bed underneath, clean it up first.
Claude calls separate_stems with outputType="VOCALS" to strip the music bed, then passes vocals.mp3 to a transcription tool (e.g. Whisper via another MCP server). The result is a clean transcript with none of the background music interfering.
Batch acapella extraction (Cursor agent):
Extract acapellas from every MP3 in ./tracks/ and save them to ./acapellas/.
Cursor iterates the directory, calls separate_stems with outputType="VOCALS" and a custom outputDir per file, and returns a list of acapella paths ready for a remix session or AI training dataset.
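The per-file arguments an agent would build in that batch session might look roughly like the sketch below. It only prepares `separate_stems` argument sets and does not call the tool; the function name `acapella_jobs` and the one-subfolder-per-track layout are assumptions for illustration.

```python
from pathlib import Path


def acapella_jobs(tracks: list[str], out_dir: str) -> list[dict]:
    """Build one separate_stems argument set per MP3 (hypothetical helper)."""
    jobs = []
    for track in sorted(tracks):
        path = Path(track)
        if path.suffix.lower() != ".mp3":
            continue  # the request was for MP3 files only
        jobs.append({
            "source": path.as_posix(),
            "outputType": "VOCALS",
            # One output folder per track so files from different jobs never collide.
            "outputDir": (Path(out_dir) / path.stem).as_posix(),
        })
    return jobs


jobs = acapella_jobs(["tracks/b.mp3", "tracks/notes.txt", "tracks/a.MP3"], "./acapellas")
for job in jobs:
    print(job["source"], "->", job["outputDir"])
```

In a real run the track list would come from something like `Path("./tracks").iterdir()`, and each dictionary would be passed as the arguments of one `separate_stems` call.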
Vocal isolation → stems for remix (Claude Desktop):
I want to remix ~/Music/original.wav. Give me the acapella and all the individual instrument stems separately.
Claude calls separate_stems twice — once with outputType="VOCALS" for the clean acapella, once with outputType="SIX_STEMS" for the full stem pack — and hands back all seven file paths organized by stem type.
Clean up a noisy podcast recording (Claude Desktop):
My recording at ~/Desktop/episode-23.mp3 has a lot of HVAC noise. Can you clean it up?
Claude calls clean_voice with source="~/Desktop/episode-23.mp3", polls for ~10–20s, and returns:
Done. Cleaned audio saved to:
/Users/me/Downloads/stemsplit/dnz_abc123/episode-23_denoised.mp3
Separate stems and denoise vocals in one call (Cursor):
Extract the vocals from ./stems/song.wav and clean them up for use in a transcription pipeline.
Cursor calls separate_stems with outputType="VOCALS" and denoiseVocals=true. StemSplit runs Demucs to extract the vocals, then immediately passes them through DeepFilterNet — delivering a single noise-free vocals file in one job.
Voice Cleaner (noise removal)
Use clean_voice (or the /clean_voice slash command) to remove background noise from any audio file using StemSplit Voice Cleaner, powered by DeepFilterNet.
{
  "source": "/Users/me/recordings/episode.mp3",
  "outputFormat": "MP3"
}
What it removes: background hum, HVAC/air conditioning noise, hiss, wind, echo, ambient room sound, and other non-speech noise. Ideal for podcasts, voiceovers, interview recordings, and dialogue.
Limits
Maximum duration: 60 minutes
Output: a single cleaned audio file (same duration as input)
Credits: 1 credit = 1 second of audio
Example (Claude Desktop)
Clean up the background noise in ~/Recordings/interview.wav.
Claude calls clean_voice, polls until COMPLETED (~10–30s depending on length), and returns:
Done. Cleaned audio saved to ~/Downloads/stemsplit/<jobId>/interview_denoised.wav
YouTube stem separation
Use separate_youtube (or the /youtube_instrumental slash command) to extract vocals and an instrumental from any YouTube video.
{
  "youtubeUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
Accepted URL formats
Format | Example |
Standard watch URL | youtube.com/watch?v=... |
Short URL | youtu.be/... |
Embed URL | youtube-nocookie.com/embed/... |
Mobile URL | |
Bare video ID | 11-character video ID |
Limits
Maximum duration: 60 minutes
Output: vocals + instrumental, MP3, BEST quality (fixed)
Credits: 1 credit = 1 second of video
Example (Claude Desktop)
Get me the instrumental of https://youtu.be/dQw4w9WgXcQ.
Claude calls separate_youtube, polls until COMPLETED (~60s for a 3-minute video), and returns:
Done. Files saved to ~/Downloads/stemsplit/<jobId>/
vocals.mp3
instrumental.mp3
SoundCloud stem separation
Use separate_soundcloud (or the /soundcloud_instrumental slash command) to extract vocals and an instrumental from any public SoundCloud track.
{
  "soundcloudUrl": "https://soundcloud.com/artist/track-name"
}
Accepted URL formats
Format | Example |
Standard track URL | soundcloud.com/artist/track |
Mobile URL | m.soundcloud.com/artist/track |
Short URL | on.soundcloud.com/shortcode |
Limits
Maximum duration: 15 minutes
Must be a public track (private tracks and sets/playlists are not supported)
Output: vocals + instrumental, MP3, BEST quality (fixed)
Credits: 1 credit = 1 second of audio. When track duration is unknown at submission, 4 minutes (240 credits) is held and reconciled on completion.
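As a worked example of the hold described above, the arithmetic could look like the sketch below. It assumes that "reconciled on completion" means the final charge equals the actual track length, which this README does not spell out; `reconcile_hold` is a hypothetical helper, not part of the API.

```python
HOLD_CREDITS = 240  # 4 minutes held when the duration is unknown at submission


def reconcile_hold(actual_seconds: int, held: int = HOLD_CREDITS) -> int:
    """Return the adjustment on completion.

    Positive: extra credits charged. Negative: credits returned.
    Assumes the final charge equals the actual duration in seconds.
    """
    return actual_seconds - held


print(reconcile_hold(195))  # 3:15 track, shorter than the hold: -45
print(reconcile_hold(600))  # 10:00 track, longer than the hold: 360
```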
Example (Claude Desktop)
Remove the vocals from https://soundcloud.com/artist/my-track.
Claude calls separate_soundcloud, polls until COMPLETED, and returns:
Done. Files saved to ~/Downloads/stemsplit/<jobId>/
vocals.mp3
instrumental.mp3
Example (Cursor agent)
Extract the acapella from every SoundCloud URL in ./tracks.txt and save each to ./acapellas/.
Cursor reads the file, iterates the URLs, calls separate_soundcloud with outputDir set per track, and returns a list of all saved acapella paths.
Supported inputs
Local files:
mp3, wav, flac, m4a, ogg, webm, aac, wma
Direct URLs: any public https:// URL serving one of the formats above (the StemSplit API fetches it server-side)
YouTube: youtube.com/watch?v=..., youtu.be/..., youtube-nocookie.com/embed/..., or a bare 11-character video ID
SoundCloud: soundcloud.com/artist/track, m.soundcloud.com/artist/track, or on.soundcloud.com/shortcode (public tracks only, max 15 minutes)
Limits: 100 MB / 60 minutes per file. 1 credit = 1 second of audio. Credits are deducted at job submission.
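Because credits are deducted at submission, it can be worth checking a file against these limits before sending it. The sketch below is one way to do that locally; `preflight` is a hypothetical helper, the caller is assumed to already know the file size and duration, and treating "100 MB" as 100 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {"mp3", "wav", "flac", "m4a", "ogg", "webm", "aac", "wma"}
MAX_BYTES = 100 * 1024 * 1024  # assumption: the README does not say MB vs MiB
MAX_SECONDS = 60 * 60


def preflight(path: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 60 minutes")
    return problems


print(preflight("song.mp3", 5_000_000, 180))
print(preflight("lecture.wav", 200 * 1024 * 1024, 4000))
```

The server remains the source of truth; a local check like this only avoids obviously doomed submissions.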
Troubleshooting
Symptom | Fix |
| Set the env var in your MCP client config |
| Key must start with sk_live_ |
| The error includes a |
| Default per-key limit is 60 requests/minute. The error includes |
| Trim or compress the file. Limits are 100 MB and 60 minutes |
| Increase timeoutSeconds |
YouTube URL passed to separate_stems | Use separate_youtube |
SoundCloud URL passed to separate_stems | Use separate_soundcloud |
| Track is private, a playlist/set, or unavailable. Only public single tracks are supported |
Voice Cleaner job returns no outputs.audio.url | Job has not yet completed — call get_denoise_job again until the status is COMPLETED |
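Several of the rows above come down to sending a URL to the wrong tool. A small routing helper, sketched below, can pick the tool from the input string. The host lists are assumptions pieced together from the URL formats in this README (`m.youtube.com` and the `www.` variants in particular are guesses), `pick_tool` is a hypothetical name, and only URLs that include a scheme are recognised.

```python
import re
from urllib.parse import urlparse

# Assumed host lists; adjust to whatever the API actually accepts.
YOUTUBE_HOSTS = {
    "youtube.com", "www.youtube.com", "m.youtube.com",
    "youtu.be", "youtube-nocookie.com", "www.youtube-nocookie.com",
}
SOUNDCLOUD_HOSTS = {
    "soundcloud.com", "www.soundcloud.com", "m.soundcloud.com", "on.soundcloud.com",
}
BARE_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def pick_tool(source: str) -> str:
    """Choose between separate_youtube, separate_soundcloud and separate_stems."""
    # Caveat: an 11-character local file name without an extension also matches.
    if BARE_VIDEO_ID.fullmatch(source):
        return "separate_youtube"
    host = (urlparse(source).hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return "separate_youtube"
    if host in SOUNDCLOUD_HOSTS:
        return "separate_soundcloud"
    return "separate_stems"  # local paths and direct audio URLs


for example in ("https://youtu.be/dQw4w9WgXcQ",
                "https://soundcloud.com/artist/track-name",
                "/Users/me/Music/song.mp3"):
    print(example, "->", pick_tool(example))
```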
Development
git clone https://github.com/StemSplit/stemsplit-mcp
cd stemsplit-mcp
npm install
npm run typecheck
npm run lint
npm test
npm run build
STEMSPLIT_API_KEY=sk_live_... npm run inspect
npm run inspect launches the MCP Inspector for interactive testing.
FAQ
How do I remove vocals from a song in Claude Desktop?
Add the install snippet above to claude_desktop_config.json, restart Claude, then ask:
Remove the vocals from ~/Music/song.mp3.
Claude calls the separate_stems tool, waits for the job to complete (~30–60s for a 3-minute track), and hands back the local path to the instrumental file. Or use the /karaoke slash command directly.
Can this work with YouTube URLs?
Yes. Use the separate_youtube tool or the /youtube_instrumental slash command. The StemSplit API handles the YouTube download server-side and returns vocals + instrumental stems. Output is fixed to vocals + instrumental, MP3, BEST quality.
What stems can I extract?
Vocals, instrumental, drums, bass, other, piano, and guitar. Six-stem output (adding piano and guitar) requires quality=BEST and is only available for stem jobs (not YouTube jobs).
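The six-stem constraint is easy to trip over when building arguments by hand. Below is a minimal client-side check, assuming the enum values listed earlier in this README; `check_stem_args` is a hypothetical helper, and the server may enforce additional rules not covered here.

```python
OUTPUT_TYPES = {"VOCALS", "INSTRUMENTAL", "BOTH", "FOUR_STEMS", "SIX_STEMS"}
QUALITIES = {"FAST", "BALANCED", "BEST"}
FORMATS = {"MP3", "WAV", "FLAC"}


def check_stem_args(args: dict) -> list[str]:
    """Return a list of problems with separate_stems arguments (empty if none)."""
    problems = []
    output_type = args.get("outputType")
    quality = args.get("quality")
    output_format = args.get("outputFormat")
    if output_type is not None and output_type not in OUTPUT_TYPES:
        problems.append(f"unknown outputType: {output_type}")
    if quality is not None and quality not in QUALITIES:
        problems.append(f"unknown quality: {quality}")
    if output_format is not None and output_format not in FORMATS:
        problems.append(f"unknown outputFormat: {output_format}")
    # Only an explicit non-BEST quality is flagged; the default applied when
    # quality is omitted is not stated in this README.
    if output_type == "SIX_STEMS" and quality in QUALITIES and quality != "BEST":
        problems.append("SIX_STEMS requires quality=BEST")
    return problems


print(check_stem_args({"outputType": "SIX_STEMS", "quality": "FAST"}))
print(check_stem_args({"outputType": "BOTH", "quality": "BEST", "outputFormat": "MP3"}))
```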
How is this different from the StemSplit web app?
The web app is point-and-click. This MCP server lets you orchestrate stem separation through natural-language prompts to an LLM, or programmatic tool calls from any MCP client. Same backend (HTDemucs / Demucs on GPU), different interface. Use the web app for one-off jobs; use the MCP server when you want to chain stem separation with other tools (transcription, translation, agentic pipelines) inside an LLM-driven workflow.
Does this run the AI model locally?
No. The MCP server is a local stdio process that talks to the StemSplit cloud API over HTTPS. Audio bytes are uploaded directly to Cloudflare R2 via presigned PUT (your API key never crosses the network with the audio). Stem separation runs on StemSplit's GPU workers. If you want fully local separation, look at demucs or demucs-onnx.
How much does it cost?
StemSplit uses a pay-per-second model: 1 credit = 1 second of audio. Credits are deducted at job submission. New accounts include free credits. Check current pricing at stemsplit.io/pricing.
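As a quick worked example of the pay-per-second model, the sketch below estimates the credits for a track and renders a balance in seconds, minutes, and a human-readable form. Rounding partial seconds up is an assumption, and both function names are hypothetical.

```python
import math


def credits_for(duration_seconds: float) -> int:
    """Estimate credits for a job: 1 credit per second of audio."""
    # Assumption: partial seconds are rounded up; the README does not specify.
    return math.ceil(duration_seconds)


def describe_balance(credits: int) -> str:
    """Render a credit balance as seconds, minutes, and h/m/s."""
    hours, remainder = divmod(credits, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{credits} s = {credits / 60:.1f} min ({hours}h {minutes}m {seconds}s)"


print(credits_for(180))        # a 3-minute track
print(describe_balance(3725))
```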
What audio formats are supported?
Input: MP3, WAV, FLAC, M4A, OGG, WebM, AAC, WMA (up to 100 MB / 60 minutes). Output: MP3, WAV, or FLAC.
Where do the stems end up?
By default, in ~/Downloads/stemsplit/<jobId>/ with one file per stem. Override per-call with outputDir or globally with the STEMSPLIT_DEFAULT_OUTPUT_DIR env var.
Can I use this in a custom MCP client or LangChain agent?
Yes. stemsplit-mcp follows the MCP spec exactly. Any client that speaks the stdio transport works. For programmatic Node.js / TypeScript clients, see @modelcontextprotocol/sdk.
Can I remove background noise from a recording?
Yes. Use the clean_voice tool (or the /clean_voice prompt). It runs DeepFilterNet on your audio and returns the cleaned file. You can also pass denoiseVocals: true to separate_stems to denoise the extracted vocals stem automatically as part of a stem separation job.
What if the job takes longer than the timeout?
Pass wait: false to separate_stems, separate_youtube, or clean_voice. You'll get the jobId back immediately and can poll later with get_job / get_youtube_job / get_denoise_job. Or set a longer timeoutSeconds (up to 3600s).
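The submit-then-poll pattern can be sketched as a small loop. The status lookup is injected as a plain callable so the example stays independent of any particular client; in practice it would wrap a `get_job`, `get_youtube_job`, or `get_denoise_job` call. `COMPLETED` appears in this README, while `FAILED` and `PROCESSING` are assumed status names, and the default timeout and interval are placeholders rather than the server's values.

```python
import time
from typing import Callable

# COMPLETED is documented above; FAILED is an assumed terminal status.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout_seconds: float = 300.0,   # placeholder default
    poll_interval: float = 2.0,       # placeholder default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until a terminal status or until the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds}s")
        sleep(poll_interval)


# Demo with a fake status source instead of a real API call.
fake_statuses = iter(["PROCESSING", "PROCESSING", "COMPLETED"])
result = wait_for_job(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(result)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a production version might also add backoff between polls.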
How do I get an API key?
Sign up at stemsplit.io and generate a key at stemsplit.io/app/settings/api. The key format is sk_live_....
License
MIT (c) 2026 StemSplit
Related projects
StemSplit — hosted stem separation web app and API
StemSplit API docs — full REST reference + OpenAPI spec
n8n-nodes-stemsplit — n8n community node for stem separation workflows
stemsplit-python — Python SDK for the StemSplit API
stemsplit CLI — command-line tool (Go), available via Homebrew
demucs-onnx — ONNX export of HTDemucs for local inference
Model Context Protocol — the open protocol this server implements
Keywords
stem separation MCP, vocal remover MCP, karaoke generator MCP, voice cleaner MCP, noise removal MCP, background noise remover, DeepFilterNet MCP, Claude Desktop audio, Cursor audio tools, instrumental extractor, acapella extractor, AI stem splitter, MCP audio server, remove vocals from MP3, isolate vocals, split audio into stems, YouTube vocal remover, SoundCloud vocal remover, SoundCloud stem separator, SoundCloud instrumental extractor, HTDemucs MCP, Demucs MCP, MCP server for stem separation, podcast noise removal, audio cleanup AI.