Stem MCP Server 🎵
A comprehensive Model Context Protocol (MCP) server for professional AI-powered audio processing and stem manipulation. Designed specifically for music producers, audio engineers, and Logic Pro users who need advanced audio processing capabilities integrated with AI workflows.
Features 🚀
🎯 Core Audio Processing
🎤 AI Stem Generation: State-of-the-art source separation using Demucs models
✂️ Smart Audio Splitting: Intelligent segmentation with customizable overlap and fade options
🔄 Seamless Loop Creation: Professional loop generation with tempo matching and crossfading
📊 Advanced Audio Analysis: Deep musical feature extraction (tempo, key, spectral characteristics)
🎯 Precise Instrument Isolation: Extract specific instruments with multiple algorithms
🎵 Vocal Processing: Advanced vocal extraction and separation techniques
🎛️ Advanced Features
🎪 Multi-Vocal Range Separation: Split vocals into soprano, alto, tenor, bass ranges
🎼 Musical Structure Analysis: Detect beats, tempo, key signatures, and harmonic content
🔊 Dynamic Range Analysis: RMS energy, peak detection, loudness analysis
🎚️ Spectral Processing: Frequency domain analysis and manipulation
⚡ Batch Processing: Handle multiple files efficiently
🎨 Custom Processing Chains: Combine multiple tools for complex workflows
Supported Audio Formats
WAV, MP3, FLAC, AAC, M4A, OGG, WMA
AI Models
Demucs: State-of-the-art source separation models
htdemucs (default): High-quality 4-stem separation
htdemucs_ft: Fine-tuned variant
htdemucs_6s: 6-stem separation
mdx: Alternative model architecture
mdx_extra: Enhanced MDX model
Installation 🔧
Prerequisites
Python 3.10 or higher (required for MCP compatibility)
FFmpeg (for audio processing)
CUDA-compatible GPU (optional, for faster processing)
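As a quick sanity check before installing, the two hard prerequisites above can be verified from Python itself. This is a minimal sketch using only the standard library; the function name `check_prerequisites` is made up for illustration and is not part of this project.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The GPU is optional, so the sketch does not check for CUDA.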
Install FFmpeg
Install the MCP Server
Install Dependencies
Configuration ⚙️
MCP Client Configuration
Add this to your MCP client configuration (e.g., Claude Desktop):
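The configuration snippet itself is not reproduced here, so the following is only a sketch of the general shape. Claude Desktop reads server entries from an `mcpServers` object, each with a `command` and `args`. The server name `stem-mcp`, the command, and the module path below are placeholders, not values confirmed by this project; take the real ones from its installation instructions.

```python
import json


def build_client_config(server_name, command, args, env=None):
    """Build an `mcpServers` entry in the shape Claude Desktop reads."""
    entry = {"command": command, "args": list(args)}
    if env:
        entry["env"] = dict(env)
    return {"mcpServers": {server_name: entry}}


# Hypothetical values: replace the name, command, and arguments with the
# ones from the project's own installation instructions.
config = build_client_config(
    server_name="stem-mcp",
    command="python",
    args=["-m", "stem_mcp.server"],
)
print(json.dumps(config, indent=2))
```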
Usage Examples 🎯
1. Generate Stems from Audio
Output: Separates audio into vocals, drums, bass, and other instruments.
2. Split Stems into Segments
Output: Creates 15-second segments with 2-second overlap.
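To see what "15-second segments with 2-second overlap" means in terms of timestamps, here is a small sketch of the arithmetic. It assumes each segment starts `length - overlap` seconds after the previous one and that the last segment is simply shortened; the server's own boundary logic may differ.

```python
def segment_bounds(total, length, overlap):
    """Return (start, end) times in seconds for overlapping segments."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and shorter than the segment")
    hop = length - overlap  # distance between consecutive segment starts
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + length, total)
        bounds.append((start, end))
        if end >= total:
            break
        start += hop
    return bounds


# A 60-second stem cut into 15-second segments with 2 seconds of overlap.
for start, end in segment_bounds(60.0, 15.0, 2.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

With these inputs the segments start every 13 seconds, and the final segment runs from 52 s to the end of the file.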
3. Create Seamless Loops
Output: Creates an 8-second loop at 120 BPM with smooth crossfading.
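The relationship between loop duration and tempo is plain arithmetic, sketched below. It assumes 4/4 time (four beats per bar), which the text does not state; the helper names are illustrative only.

```python
def beats_in(duration_s, bpm):
    """Number of beats that fit in a duration at a given tempo."""
    return duration_s * bpm / 60.0


def snap_to_bars(duration_s, bpm, beats_per_bar=4):
    """Round a duration to the nearest whole number of bars (at least one)."""
    bar_s = beats_per_bar * 60.0 / bpm  # length of one bar in seconds
    bars = max(1, round(duration_s / bar_s))
    return bars * bar_s


print(beats_in(8, 120))        # beats in an 8-second loop at 120 BPM
print(snap_to_bars(7.2, 120))  # a 7.2-second request snapped to whole bars
```

At 120 BPM a beat lasts 0.5 seconds, so an 8-second loop holds 16 beats, or four bars of 4/4. Choosing a duration that is a whole number of bars is what lets the loop repeat without a rhythmic hiccup.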
4. Analyze Audio Features
Output:
5. Extract Vocals Only
6. Isolate Specific Instruments
API Reference 📚
Complete Tool Suite
🎤 generate_stems
State-of-the-art AI-powered source separation using Demucs models.
Parameters:
audio_path (string, required): Path to input audio file
output_dir (string, optional): Output directory (default: ".")
model_type (string, optional): Demucs model type
  "htdemucs" (default): High-quality 4-stem separation
  "htdemucs_ft": Fine-tuned variant for enhanced quality
  "htdemucs_6s": 6-stem separation (vocals, drums, bass, piano, guitar, other)
  "mdx": Fast processing with good quality
  "mdx_extra": Enhanced MDX model
num_stems (integer, optional): Number of output stems (2-6, default: 4)
Output: Generates separate audio files for each stem (vocals, drums, bass, other)
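When calling the tool from a script, it can help to check arguments against the documented ranges before sending them. The sketch below mirrors the parameter list above; the function `generate_stems_args` is a hypothetical client-side helper, not part of the server, and `mix.wav` is a placeholder path.

```python
MODEL_TYPES = ("htdemucs", "htdemucs_ft", "htdemucs_6s", "mdx", "mdx_extra")


def generate_stems_args(audio_path, output_dir=".", model_type="htdemucs", num_stems=4):
    """Validate generate_stems arguments against the documented ranges."""
    if not audio_path:
        raise ValueError("audio_path is required")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}")
    if not isinstance(num_stems, int) or not 2 <= num_stems <= 6:
        raise ValueError("num_stems must be an integer from 2 to 6")
    return {
        "audio_path": audio_path,
        "output_dir": output_dir,
        "model_type": model_type,
        "num_stems": num_stems,
    }


# "mix.wav" is a placeholder path.
args = generate_stems_args("mix.wav", output_dir="stems",
                           model_type="htdemucs_6s", num_stems=6)
print(args)
```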
✂️ split_stems
Intelligent audio segmentation with customizable parameters.
Parameters:
stem_path (string, required): Path to audio file to split
output_dir (string, optional): Output directory (default: ".")
segment_length (number, optional): Segment duration in seconds (1-300, default: 30)
overlap (number, optional): Overlap between segments in seconds (0-10, default: 0)
Features:
Smart segment boundary detection
Customizable overlap for smooth transitions
Preserves audio quality and metadata
🔄 create_loop
Professional seamless loop creation with advanced crossfading.
Parameters:
audio_path (string, required): Path to input audio
output_path (string, optional): Output file path (auto-generated if not provided)
loop_duration (number, optional): Loop duration in seconds (0.5-60, default: 4)
bpm (number, optional): Target BPM (60-200, auto-detected if not specified)
crossfade_duration (number, optional): Crossfade length in seconds (0-2, default: 0.1)
Features:
Automatic tempo detection and matching
Smart beat-aligned loop points
Professional crossfading algorithms
Maintains musical timing and feel
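The crossfading idea can be illustrated without any audio library. The sketch below shows one common approach, an equal-power crossfade: the audio just past the loop end is faded out over the start of the loop, so the last sample of the loop is followed by material that originally came right after it. This is a generic technique on a plain list of samples, not necessarily the algorithm this server uses.

```python
import math


def crossfade_loop(samples, loop_len, fade_len):
    """Return `loop_len` samples whose end flows back into the start."""
    if fade_len > loop_len or loop_len + fade_len > len(samples):
        raise ValueError("need loop_len + fade_len samples and fade_len <= loop_len")
    loop = list(samples[:loop_len])
    tail = samples[loop_len:loop_len + fade_len]  # audio just past the loop end
    for i in range(fade_len):
        t = (i + 0.5) / fade_len               # moves from 0 to 1 across the fade
        gain_in = math.sin(t * math.pi / 2)    # loop start fades in
        gain_out = math.cos(t * math.pi / 2)   # continuation fades out
        loop[i] = loop[i] * gain_in + tail[i] * gain_out
    return loop


# One second of a 3 Hz sine at 1 kHz sample rate, looped at 0.8 s with a 0.1 s fade.
rate = 1000
signal = [math.sin(2 * math.pi * 3 * n / rate) for n in range(rate)]
looped = crossfade_loop(signal, loop_len=800, fade_len=100)
print(len(looped))
```

The sine and cosine gains satisfy `gain_in**2 + gain_out**2 == 1`, which keeps perceived loudness roughly constant when the two pieces are uncorrelated. For highly correlated material a linear fade may be the better choice.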
📊 analyze_audio
Comprehensive musical and spectral analysis.
Parameters:
audio_path (string, required): Path to audio file to analyze
Analysis Output:
Basic Properties: Duration, sample rate, channel configuration
Musical Features: Tempo (BPM), key signature, beat tracking
Spectral Analysis: Frequency content, spectral centroid, rolloff
Dynamic Range: RMS energy levels, peak detection
Audio Quality: Zero-crossing rate, harmonic content
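Three of the measurements listed above (RMS energy, peak level, zero-crossing rate) are simple enough to compute by hand, which helps when interpreting the tool's output. This sketch uses textbook definitions on a plain list of samples; the exact definitions and scaling used by the server are not stated here and may differ.

```python
import math


def rms(samples):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# One second of a 440 Hz sine at amplitude 0.5, sampled at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
print(f"rms={rms(tone):.3f} peak={peak(tone):.3f} zcr={zero_crossing_rate(tone):.3f}")
```

For a pure sine the RMS is the amplitude divided by the square root of two, and the zero-crossing rate is close to twice the frequency divided by the sample rate, which is why a higher zero-crossing rate tends to indicate brighter or noisier material.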
🎤 extract_vocal
Advanced vocal extraction with multiple algorithms.
Parameters:
audio_path (string, required): Path to input audio
output_path (string, optional): Output file path (auto-generated if not provided)
method (string, optional): Extraction algorithm
  "demucs" (default): AI-powered high-quality separation
  "librosa": Traditional signal processing approach
  "spectral": Frequency domain processing
Features:
Multiple extraction algorithms for different use cases
High-quality vocal isolation
Preserves vocal character and dynamics
🎹 isolate_instrument
Precise instrument isolation using multiple techniques.
Parameters:
audio_path (string, required): Path to input audio
instrument (string, optional): Target instrument
  "vocals": Lead and backing vocals
  "drums": Full drum kit
  "bass": Bass guitar and synthesizers
  "guitar": Electric and acoustic guitars
  "piano": Piano and keyboard instruments
  "other": Remaining instruments
output_path (string, optional): Output file path
method (string, optional): Isolation technique
  "demucs": AI source separation
  "librosa": Signal processing
  "spectral": Frequency domain filtering
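Note that guitar and piano only exist as separate stems in the 6-stem model described under generate_stems; the 4-stem models fold them into "other". The lookup below makes that relationship explicit. It is an illustrative helper, and whether the server selects a model this way when `method` is "demucs" is an assumption, not something documented here.

```python
FOUR_STEMS = ("vocals", "drums", "bass", "other")
SIX_STEMS = ("vocals", "drums", "bass", "piano", "guitar", "other")


def model_for_instrument(instrument):
    """Pick the smallest listed Demucs model that outputs `instrument` as its own stem."""
    if instrument in FOUR_STEMS:
        return "htdemucs"
    if instrument in SIX_STEMS:
        return "htdemucs_6s"
    raise ValueError(f"unsupported instrument: {instrument!r}")


for name in ("vocals", "guitar", "piano"):
    print(name, "->", model_for_instrument(name))
```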
🎪 separate_vocal_ranges
NEW: Advanced vocal range separation for choir and multi-vocal arrangements.
Parameters:
audio_path (string, required): Path to vocal audio file
output_dir (string, optional): Output directory for range files
Output: Separate files for each vocal range:
Soprano: High female voices (C4-C6)
Alto: Low female voices (G3-E5)
Tenor: High male voices (C3-A4)
Bass: Low male voices (E2-E4)
Features:
Frequency-based intelligent separation
Preserves natural vocal characteristics
Ideal for choir arrangements and vocal analysis
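To relate the note ranges above to the frequency bands a frequency-based separation would work with, note names can be converted to Hz. The sketch assumes equal temperament with A4 = 440 Hz and handles natural notes only; the band edges the server actually uses are not documented here.

```python
NOTE_INDEX = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

VOCAL_RANGES = {
    "Soprano": ("C4", "C6"),
    "Alto": ("G3", "E5"),
    "Tenor": ("C3", "A4"),
    "Bass": ("E2", "E4"),
}


def note_to_hz(note):
    """Convert a natural note name such as "C4" to Hz (A4 = 440 Hz)."""
    name, octave = note[0], int(note[1:])
    midi = NOTE_INDEX[name] + 12 * (octave + 1)  # MIDI note number, C4 = 60
    return 440.0 * 2 ** ((midi - 69) / 12)


for voice, (low, high) in VOCAL_RANGES.items():
    print(f"{voice:8s} {note_to_hz(low):7.1f} Hz to {note_to_hz(high):7.1f} Hz")
```

These figures are fundamental frequencies only. The ranges overlap considerably, and each voice also carries harmonics well above its fundamental, so band edges alone cannot cleanly separate simultaneous singers.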
🎵 extract_vocal_harmonies
NEW: Isolate and separate vocal harmonies from lead vocals.
Parameters:
audio_path (string, required): Path to audio with vocal harmonies
output_dir (string, optional): Directory for harmony files
sensitivity (number, optional): Harmony detection sensitivity (0.1-1.0, default: 0.5)
Features:
Separates lead vocals from harmonies
Maintains harmonic relationships
Perfect for remixing and vocal arrangement analysis
Performance Tips 🚀
Hardware Optimization
GPU: Use a CUDA-compatible GPU for up to 10x faster processing
RAM: 16GB+ recommended for processing large files
Storage: SSD recommended for faster I/O operations
Processing Tips
File Format: Use WAV or FLAC for best quality
Sample Rate: 44.1kHz or 48kHz for optimal results
Batch Processing: Process multiple files in sequence for efficiency
Model Selection
htdemucs: Best general-purpose model
htdemucs_6s: Use for 6-stem separation (vocals, drums, bass, piano, guitar, other)
mdx: Faster processing, slightly lower quality
Development 🛠️
💻 Complete Project Structure
🔧 Development Environment Setup
Quick Start
Development Dependencies
🚀 Running in Development Mode
Basic Development Commands
Code Quality & Formatting
🧪 Testing & Quality Assurance
Test Categories
Unit Tests: Individual function and class testing
Integration Tests: MCP client-server communication
Audio Tests: Audio processing accuracy and quality
Performance Tests: Speed and memory usage benchmarks
Regression Tests: Ensure consistent outputs across versions
Running Tests
🔍 Debugging & Profiling
Debug Mode Features
Detailed logging at all processing stages
Audio processing step visualization
Memory usage tracking
Processing time measurements
Model loading and caching information
Performance Profiling
🤝 Contributing Guidelines
Development Workflow
Fork the repository and create your feature branch
Set up development environment with all dependencies
Write comprehensive tests for your changes
Follow code style guidelines (Black, flake8, mypy)
Update documentation for new features
Run full test suite before submitting
Submit pull request with detailed description
Code Style Standards
Python: Follow PEP 8 with Black formatting
Docstrings: Google-style docstrings for all public functions
Type Hints: Use type hints for all function parameters and returns
Comments: Clear, concise comments for complex logic
Error Handling: Comprehensive error handling with informative messages
Pull Request Checklist
☑️ All tests pass locally
☑️ Code follows style guidelines
☑️ Documentation is updated
☑️ New features have tests
☑️ No breaking changes (or clearly documented)
☑️ Performance impact assessed
☑️ Example usage provided
Professional Workflows 🎯
🎚️ Logic Pro Integration
Seamlessly integrate with Logic Pro for enhanced music production:
Complete Production Workflow
🎵 Export from Logic Pro
Export stereo mix or individual tracks
Use 24-bit/48kHz for best quality
Export as WAV or AIFF format
🤖 AI-Powered Processing
Generate high-quality stems using Demucs
Analyze musical content and structure
Extract specific instruments or vocal parts
Create seamless loops from any section
🎹 Import Back to Logic
Import processed stems as individual tracks
Use analyzed BPM data for tempo matching
Apply extracted loops to new compositions
Layer isolated instruments for creative arrangements
Advanced Production Techniques
🎭 Stem-Based Remixing
🎵 Vocal Production Chain
🎶 Loop Library Creation
🎼 Music Production Use Cases
🎵 For Producers
Stem Analysis: Understand song structure and arrangement
Remixing: Extract and manipulate individual elements
Sample Creation: Generate unique samples from existing tracks
Loop Building: Create custom loops for new productions
🎤 For Vocalists & Vocal Coaches
Vocal Isolation: Extract clean vocal tracks from mixes
Harmony Analysis: Study vocal arrangements and harmonies
Range Training: Separate and analyze different vocal ranges
Performance Analysis: Study vocal techniques and patterns
🎸 For Musicians
Instrument Learning: Isolate specific instruments for practice
Transcription: Extract clear instrument tracks for notation
Performance Study: Analyze playing techniques and arrangements
Cover Creation: Create backing tracks by removing specific instruments
🎧 For Audio Engineers
Mix Analysis: Understand frequency content and arrangement
Mastering Reference: Compare individual stems and their processing
Problem Solving: Isolate problematic elements in complex mixes
Quality Control: Analyze audio content and detect issues
🔀 Complete Integration Example
Scenario: Converting a Logic Pro song into stems for remixing
Result: Complete stem-based workflow with:
✅ Individual instrument tracks
✅ Seamless loops ready for new compositions
✅ Separated vocal ranges for detailed editing
✅ Extracted harmonies for remix work
✅ Complete musical analysis data
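One way to picture the scenario above is as an ordered series of MCP `tools/call` requests, one per tool. The sketch below builds those JSON-RPC messages using the tool and parameter names from the API reference. All file paths are placeholders, and the stem file names (`stems/vocals.wav`, `stems/drums.wav`) are assumptions about what generate_stems writes, not confirmed output names.

```python
import json


def plan_to_requests(plan, first_id=1):
    """Turn (tool, arguments) pairs into JSON-RPC `tools/call` messages."""
    return [
        {
            "jsonrpc": "2.0",
            "id": first_id + i,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
        for i, (tool, arguments) in enumerate(plan)
    ]


# Placeholder paths; stem file names are assumed, not documented.
plan = [
    ("generate_stems", {"audio_path": "exports/song.wav", "output_dir": "stems"}),
    ("analyze_audio", {"audio_path": "exports/song.wav"}),
    ("create_loop", {"audio_path": "stems/drums.wav", "loop_duration": 8, "bpm": 120}),
    ("separate_vocal_ranges", {"audio_path": "stems/vocals.wav", "output_dir": "stems/ranges"}),
    ("extract_vocal_harmonies", {"audio_path": "stems/vocals.wav", "output_dir": "stems/harmonies"}),
]

requests = plan_to_requests(plan)
print(json.dumps(requests[0], indent=2))
```

In practice an MCP client such as Claude Desktop issues these calls for you; the explicit messages are mainly useful for scripting or debugging.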
Advanced Troubleshooting 🔧
🚫 Common Issues & Solutions
Installation Problems
"ModuleNotFoundError: No module named 'demucs'"
"FFmpeg not found"
"MCP server not recognized"
Performance Issues
"CUDA out of memory"
"Slow processing speeds"
"High memory usage"
Audio Quality Issues
"Poor separation quality"
"Artifacts in output"
"Loops don't sound seamless"
File Format Issues
"Unsupported audio format"
"Audio file corrupted"
📝 Debugging Techniques
Enable Verbose Logging
Audio Processing Diagnostics
Performance Monitoring
🔍 Advanced Diagnostics
System Requirements Check
Audio System Diagnostics
📊 Performance Optimization Guide
🚀 Hardware Recommendations
Optimal System Configuration
CPU: Intel i7/i9 or AMD Ryzen 7/9 (8+ cores recommended)
RAM: 32GB+ for professional use, 16GB minimum
GPU: NVIDIA RTX 3060+ with 8GB+ VRAM (for CUDA acceleration)
Storage: SSD for audio files (NVMe preferred for large files)
OS: Linux or macOS for best performance, Windows 11 supported
Performance Benchmarks
Processing times are for a 3-minute song.
Model Type | GPU time (RTX 4090) | CPU time (i9-12900K) | Memory Usage |
htdemucs | ~45s | ~180s | 6GB VRAM / 8GB RAM |
htdemucs_6s | ~60s | ~240s | 8GB VRAM / 12GB RAM |
mdx | ~25s | ~90s | 4GB VRAM / 6GB RAM |
mdx_extra | ~30s | ~120s | 5GB VRAM / 8GB RAM |
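The benchmark figures are easier to compare as ratios. The sketch below derives the GPU-over-CPU speedup and the real-time factor (seconds of audio processed per second of compute) directly from the numbers in the table, with nothing measured independently.

```python
SONG_SECONDS = 180  # the table's 3-minute song

# model: (gpu_seconds, cpu_seconds), copied from the benchmark table
BENCHMARKS = {
    "htdemucs": (45, 180),
    "htdemucs_6s": (60, 240),
    "mdx": (25, 90),
    "mdx_extra": (30, 120),
}


def speedup(gpu_s, cpu_s):
    """How many times faster the GPU run is than the CPU run."""
    return cpu_s / gpu_s


def realtime_factor(processing_s, audio_s=SONG_SECONDS):
    """Seconds of audio processed per second of compute."""
    return audio_s / processing_s


for model, (gpu_s, cpu_s) in BENCHMARKS.items():
    print(
        f"{model:12s} GPU speedup {speedup(gpu_s, cpu_s):.1f}x, "
        f"GPU {realtime_factor(gpu_s):.1f}x real time, "
        f"CPU {realtime_factor(cpu_s):.1f}x real time"
    )
```

On these figures the GPU is roughly 3.6x to 4x faster than the CPU, and htdemucs on CPU runs at about real time.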
⚡ Optimization Strategies
Model Selection Guide
Batch Processing Optimization
Memory Management
📚 Additional Resources
🎵 Music Production Resources
Logic Pro User Guide: Apple's official documentation
Demucs Research Paper: "Music Source Separation in the Waveform Domain"
Audio Processing Theory: Understanding digital signal processing
MCP Specification: Model Context Protocol documentation
🔗 Community & Support
GitHub Issues: Report bugs and request features
Discussions: Share workflows and get community help
Discord: Real-time chat with other users (coming soon)
Blog: Regular updates and tutorials (coming soon)
💰 Commercial Use
This project is open source and free for both personal and commercial use under the MIT license. For enterprise support, custom integrations, or commercial licensing inquiries, please contact the maintainers.
📄 License
MIT License
Copyright (c) 2024 Stem MCP Server Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
🙏 Acknowledgments
🎆 Core Technologies
Demucs: State-of-the-art source separation by Meta Research
LibROSA: Comprehensive audio analysis library
PyTorch: Deep learning framework powering AI models
MCP: Model Context Protocol specification
SoundFile: Audio file I/O operations
🎵 Audio Processing Libraries
FFmpeg: Universal audio/video processing framework
NumPy & SciPy: Numerical computing foundations
scikit-learn: Machine learning utilities for audio analysis
Pydub: Simple audio manipulation toolkit
🔌 Integration Partners
Logic Pro: Apple's professional music production software
Claude Desktop: AI assistant with MCP support
Music Production Community: Producers, engineers, and musicians worldwide
👥 Contributors
Thanks to all contributors who have helped make this project better:
Core development team
Beta testers and early adopters
Community feedback and feature requests
Documentation and example contributors
🏆 Special Recognition
Meta Research: For developing and open-sourcing Demucs
Anthropic: For creating the MCP protocol and supporting AI-audio workflows
Apple: For Logic Pro integration possibilities
Open Source Community: For the foundation libraries that make this possible
🎆 Project Stats
📋 Languages: Python (primary), Shell scripting
📦 Dependencies: 15+ core libraries, 50+ total with dev dependencies
🤖 AI Models: 5+ Demucs variants supported
🎵 Audio Formats: 8+ supported input/output formats
⚙️ Tools: 8+ MCP tools for comprehensive audio processing
📊 Performance: Up to 10x speed improvement with GPU acceleration
🌍 Platform Support: macOS, Linux, Windows
🎵 Happy Music Making! 🎵
Transform your audio with AI-powered precision
Built with ♥️ for music producers, audio engineers, and creative professionals
🎆 Powered by Demucs • 🤖 Enhanced by AI • 🎹 Designed for Logic Pro
Enables AI-powered audio processing including stem separation, vocal extraction, loop creation, and musical analysis using state-of-the-art Demucs models. Designed for music producers and audio engineers working with Logic Pro and other DAWs.