Voice Mode by mbailey (whisper-coreml.md)

# Whisper CoreML Acceleration

Technical documentation for CoreML acceleration in VoiceMode's Whisper implementation.

## Overview

CoreML acceleration provides significant performance improvements for Whisper speech-to-text on Apple Silicon Macs, delivering 2-3x faster transcription speeds compared to Metal-only acceleration.

## Performance Comparison

| Acceleration Type | Processing Speed | Relative Performance | CPU Usage |
|-------------------|------------------|----------------------|-----------|
| CPU Only          | ~1x (baseline)   | 100%                 | High      |
| Metal (GPU)       | ~3-4x            | 300-400%             | Medium    |
| CoreML + Metal    | ~8-12x           | 800-1200%            | Low       |

*Example: Processing 7.6 seconds of audio in 0.2 seconds (38x real-time) with the base.en model*

## Technical Architecture

### CoreML Integration

VoiceMode uses whisper.cpp with CoreML support through:

1. **Model Conversion**: GGML models are converted to CoreML format during installation
2. **Runtime Selection**: whisper.cpp automatically selects CoreML when available
3. **Fallback Support**: Falls back to Metal acceleration if CoreML fails

### Compilation Requirements

CoreML acceleration requires specific build-time dependencies:

- **Full Xcode Installation**: Command Line Tools alone are insufficient
- **Core ML Tools**: For model conversion (coremltools Python package)
- **PyTorch**: Required for the model conversion pipeline (~2.5GB download)

### Installation Process

1. **Dependency Check**: Verify Xcode and coremlc availability
2. **PyTorch Installation**: Download and install PyTorch via pip
3. **Model Download**: Fetch the GGML model from Hugging Face
4. **CoreML Conversion**: Convert GGML → CoreML using coremltools
5. **Verification**: Test CoreML model loading and performance

## Current Implementation Status

### Why CoreML is Currently Disabled

As of the current installer, CoreML installation is commented out due to:

```bash
# Lines 841-842 and 871-872 in install script
# DISABLED: CoreML build issues - users getting errors at 3:30am
# setup_coreml_acceleration
```

### Issues Encountered

1. **Late-night Installation Failures**: Users reported errors during off-hours
2. **Xcode Detection Problems**: Inconsistent Xcode installation detection
3. **PyTorch Download Timeouts**: Large download size causing timeouts
4. **Dependency Conflicts**: Version conflicts between system and pip packages

## Re-enabling CoreML (Technical Guide)

### 1. Prerequisites Verification

```bash
# Check for full Xcode installation
if [[ -f "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/coremlc" ]]; then
  echo "✓ Full Xcode detected"
else
  echo "✗ Full Xcode required (not just Command Line Tools)"
fi

# Verify Xcode license acceptance
xcodebuild -checkFirstLaunchStatus
```

### 2. Model Installation Commands

#### CLI Method

```bash
# Install base model with CoreML conversion
voicemode whisper model install base.en --skip-core-ml=false --install-torch=true

# Force reinstallation with CoreML
voicemode whisper model install base.en --force-download --install-torch=true
```

#### MCP Tool Method

```python
whisper_model_install(
    model="base.en",
    force_download=True,
    skip_core_ml=False,
    install_torch=True,
    auto_confirm=True
)
```
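Until the installer gains built-in retry logic, a wrapper can apply the fallback idea from Technical Architecture at install time: attempt the CoreML-enabled install first, and degrade to a Metal-only install rather than failing outright. A minimal, hedged sketch — `try_install` stands in for a real shell-out to the CLI commands above and is not part of VoiceMode:

```python
from typing import Callable, Sequence


def install_with_fallback(
    modes: Sequence[str],
    try_install: Callable[[str], bool],
) -> str:
    """Attempt installs in preference order; return the first mode that works.

    `modes` is ordered best-first (e.g. ["coreml", "metal"]). `try_install`
    runs one attempt — for example, shelling out to the
    `voicemode whisper model install` commands above with or without
    CoreML conversion enabled — and returns True on success.
    """
    for mode in modes:
        if try_install(mode):
            return mode
    raise RuntimeError(f"all install attempts failed: {list(modes)}")
```

With this shape, a failed CoreML conversion becomes a degraded (Metal-only) install instead of a hard error at 3:30am.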
### 3. Troubleshooting Common Issues

#### "CoreML not available" Error

- Verify the Xcode installation path: `/Applications/Xcode.app/Contents/Developer`
- Check that the coremlc binary exists and is executable
- Ensure xcode-select points to full Xcode, not the Command Line Tools

#### PyTorch Installation Failures

- Use system Python instead of conda/pyenv
- Clear the pip cache: `pip cache purge`
- Install with a specific index: `--extra-index-url https://download.pytorch.org/whl/cpu`

#### Model Conversion Failures

- Verify adequate disk space (>5GB free)
- Check /tmp directory permissions
- Monitor conversion logs for specific error messages

## File Locations

### Models Directory

```
~/.voicemode/services/whisper/models/
├── ggml-base.en.bin              # Original GGML model
├── ggml-base.en-encoder.mlmodel  # CoreML encoder
└── ggml-base.en-decoder.mlmodel  # CoreML decoder (if applicable)
```

### Configuration Files

- Service config: `~/.voicemode/services/whisper/config.json`
- Model preferences: `~/.voicemode/whisper-models.txt`
- Installation logs: `~/.voicemode/services/whisper/logs/`

## Performance Tuning

### Model Selection for CoreML

| Model    | GGML Size | CoreML Size | Accuracy | Speed   | Use Case  |
|----------|-----------|-------------|----------|---------|-----------|
| tiny.en  | 39MB      | ~150MB      | Good     | Fastest | Real-time |
| base.en  | 142MB     | ~400MB      | Better   | Fast    | Balanced  |
| small.en | 466MB     | ~1.2GB      | Best     | Good    | Accuracy  |

### Runtime Optimization

```bash
# Set optimal thread count (usually the number of CPU cores)
export WHISPER_CPP_THREADS=8

# Enable CoreML explicitly (if detection fails)
export WHISPER_COREML=1

# Set the model path explicitly ($HOME rather than ~, which does not
# expand inside double quotes)
export WHISPER_MODEL_PATH="$HOME/.voicemode/services/whisper/models/ggml-base.en.bin"
```

## Development Notes

### Code Locations

- **MCP Tool**: `voice_mode/tools/services/whisper.py`
- **Model Management**: `voice_mode/tools/whisper_model_install.py`
- **Configuration**: `voice_mode/config.py`
- **Service Integration**: `voice_mode/providers.py`
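The Models Directory layout above also gives a cheap preflight check: CoreML can only be selected when the converted encoder sits next to the GGML file. A hedged Python sketch — file names are taken from the directory listing above, but the helpers themselves are illustrative and not part of the codebase:

```python
from pathlib import Path

# Default location per the Models Directory section
MODELS_DIR = Path.home() / ".voicemode/services/whisper/models"


def model_artifacts(model: str, models_dir: Path = MODELS_DIR) -> dict:
    """Report which on-disk artifacts exist for a model (e.g. 'base.en')."""
    names = {
        "ggml": f"ggml-{model}.bin",
        "coreml_encoder": f"ggml-{model}-encoder.mlmodel",
        "coreml_decoder": f"ggml-{model}-decoder.mlmodel",  # may be absent
    }
    return {kind: (models_dir / name).exists() for kind, name in names.items()}


def coreml_ready(model: str, models_dir: Path = MODELS_DIR) -> bool:
    """CoreML needs at least the GGML model plus the converted encoder."""
    found = model_artifacts(model, models_dir)
    return found["ggml"] and found["coreml_encoder"]
```

Running such a check after installation would catch the most common partial-install state (GGML present, conversion failed) before whisper.cpp silently falls back to Metal.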
### Future Improvements

1. **Better Error Handling**: More specific error messages for common failures
2. **Partial Installation**: Allow Metal-only fallback during CoreML failures
3. **Model Validation**: Verify CoreML model integrity after conversion
4. **Size Optimization**: Investigate quantization for smaller CoreML models
5. **Auto-retry Logic**: Automatic retry with fallback options

## Security Considerations

- CoreML models run in Apple's secure sandbox
- No additional network access is required after installation
- Models are stored in the user directory with standard permissions
- No elevation is required for the CoreML runtime (unlike some CUDA setups)

## Monitoring and Logging

### Performance Metrics

```bash
# Check the acceleration actually in use
voicemode whisper status

# View performance statistics
voicemode statistics

# Monitor real-time processing
tail -f ~/.voicemode/services/whisper/logs/performance.log
```

### Debug Information

```bash
# Enable debug logging
export VOICEMODE_DEBUG=true

# Test CoreML functionality
voicemode whisper model test base.en

# Benchmark different acceleration methods
voicemode whisper benchmark --models base.en --acceleration metal,coreml
```

This document should be updated as CoreML support evolves and installation issues are resolved.
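As a closing worked example: every speed claim in this document reduces to a real-time factor, i.e. audio duration divided by wall-clock processing time. The figure quoted in the Performance Comparison section checks out (7.6 s of audio in 0.2 s is 38x). A small helper for interpreting benchmark or log output:

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of processing.

    1.0 means exactly real time; the base.en CoreML example earlier in
    this document works out to roughly 38x.
    """
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# The example from the Performance Comparison section:
rtf = real_time_factor(7.6, 0.2)  # roughly 38x real time
```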
