# Whisper Speech-to-Text Service

Local speech recognition service that converts audio to text for Voice Mode, running OpenAI's Whisper model through whisper.cpp.

## Quick Start

```bash
# Install and configure whisper service
voice-mode whisper install

# List available models and their status
voice-mode whisper models

# Download a specific model
voice-mode whisper model install large-v2

# Set the active model
voice-mode whisper model active large-v2

# Start the service
voice-mode whisper start
```

Default endpoint: `http://127.0.0.1:2022/v1`
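
As a rough illustration of how a client might use this endpoint, the sketch below builds the base URL from the default host and port and posts an audio file to it. The `/audio/transcriptions` path and the `file` and `response_format` form fields assume the service exposes an OpenAI-style transcription API under `/v1`, which this document does not state; `whisper_base_url` and `whisper_transcribe` are hypothetical helper names.

```bash
# Sketch only: helper names are hypothetical, and the /audio/transcriptions
# path is an assumption (OpenAI-style API under /v1).

# Print the base URL, honoring VOICEMODE_WHISPER_PORT if it is set.
whisper_base_url() {
    local port="${VOICEMODE_WHISPER_PORT:-2022}"
    printf 'http://127.0.0.1:%s/v1\n' "$port"
}

# Post an audio file for transcription (needs curl and a running service).
whisper_transcribe() {
    local audio_file="$1"
    curl -sS -X POST "$(whisper_base_url)/audio/transcriptions" \
        -F "file=@${audio_file}" \
        -F "response_format=json"
}
```

With the service running, `whisper_transcribe recording.wav` would send the file and print whatever the server returns.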

## Install

### macOS
```bash
# Install whisper.cpp
brew install whisper.cpp

# Download model
mkdir -p ~/.voicemode/models/whisper
cd ~/.voicemode/models/whisper
curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v2.bin
```

### Linux
```bash
# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Download model
mkdir -p ~/.voicemode/models/whisper
cd ~/.voicemode/models/whisper
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v2.bin
```
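
Both platforms download the model from the same location, so the manual step can be wrapped in a small helper. This is a sketch that assumes other models follow the same `ggml-<name>.bin` naming as `large-v2`; `whisper_model_url` and `whisper_model_download` are hypothetical names, not part of Voice Mode or whisper.cpp.

```bash
# Sketch: build the download URL for a ggml model file by name.
# Assumes every model follows the ggml-<name>.bin pattern used for large-v2.
whisper_model_url() {
    local model="${1:-large-v2}"
    printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%s.bin\n' "$model"
}

# Download a model into the Voice Mode model directory (needs curl and network access).
whisper_model_download() {
    local model="${1:-large-v2}"
    local dir="${VOICEMODE_WHISPER_MODEL_PATH:-$HOME/.voicemode/models/whisper}"
    mkdir -p "$dir"
    # --fail makes curl exit non-zero on an HTTP error instead of saving an error page.
    curl -L --fail -o "$dir/ggml-${model}.bin" "$(whisper_model_url "$model")"
}
```

For example, `whisper_model_download small.en` would fetch `ggml-small.en.bin` into the model directory, provided a file with that name exists upstream.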

## Model Management

Voice Mode provides commands to list, install, activate, and remove Whisper models:

### List Available Models
```bash
voice-mode whisper models
```
Shows all available Whisper models with:
- Installation status (✓ Installed or Download)
- Model sizes
- Language support (English only or Multilingual)
- Currently selected model (highlighted with →)

### Show/Set Active Model
```bash
# Show current active model
voice-mode whisper model active

# Set a different model as active
voice-mode whisper model active small.en
```
Note: After changing the active model, restart the whisper service for changes to take effect.
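
The restart itself depends on the platform. The sketch below prints the restart commands for the current OS, assuming the service is managed by the LaunchAgent or systemd unit described in this document; `whisper_restart_commands` is a hypothetical helper name.

```bash
# Sketch: print the restart commands for an OS name as reported by `uname -s`.
# Assumes the LaunchAgent / systemd unit names used elsewhere in this document.
whisper_restart_commands() {
    local os="${1:-$(uname -s)}"
    local plist="$HOME/Library/LaunchAgents/com.voicemode.whisper.plist"
    case "$os" in
        Darwin)
            printf 'launchctl unload %s\n' "$plist"
            printf 'launchctl load %s\n' "$plist"
            ;;
        Linux)
            printf 'systemctl --user restart whisper\n'
            ;;
        *)
            printf 'unsupported OS: %s\n' "$os" >&2
            return 1
            ;;
    esac
}
```

Printing the commands rather than running them leaves room to review them first; run them by hand after `voice-mode whisper model active <model>`.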

### Install Models
```bash
# Install default model (large-v2)
voice-mode whisper model install

# Install specific model
voice-mode whisper model install medium

# Install all available models
voice-mode whisper model install all

# Force re-download
voice-mode whisper model install large-v3 --force

# Skip Core ML conversion on Apple Silicon
voice-mode whisper model install large-v2 --skip-core-ml
```

### Remove Models
```bash
# Remove a specific model
voice-mode whisper model remove tiny

# Remove without confirmation
voice-mode whisper model remove tiny.en --force
```

### Available Models
- **tiny** (75 MB) - Fastest, least accurate
- **tiny.en** (75 MB) - Fastest English model
- **base** (142 MB) - Good balance of speed and accuracy
- **base.en** (142 MB) - Good English model
- **small** (466 MB) - Better accuracy, slower
- **small.en** (466 MB) - Better English accuracy
- **medium** (1.5 GB) - High accuracy, slow
- **medium.en** (1.5 GB) - High English accuracy
- **large-v1** (2.9 GB) - Original large model
- **large-v2** (2.9 GB) - Improved large model (recommended)
- **large-v3** (3.1 GB) - Latest large model
- **large-v3-turbo** (1.6 GB) - Faster large model with good accuracy

## Configure

Environment variables in `~/.voicemode/voicemode.env`:

```bash
VOICEMODE_WHISPER_MODEL=large-v2
VOICEMODE_WHISPER_PORT=2022
VOICEMODE_WHISPER_LANGUAGE=auto
VOICEMODE_WHISPER_MODEL_PATH=~/.voicemode/models/whisper
```
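
To see which model file these settings point at, a script can read the env file and fall back to the defaults shown above. This is a sketch under two assumptions: that the env file contains plain `KEY=value` lines a shell can source, and that the model file is named `ggml-<model>.bin` as in the install steps. `whisper_model_file` is a hypothetical helper, and Voice Mode may resolve the path differently.

```bash
# Sketch: resolve the model file path from voicemode.env, with the defaults above.
# Assumes the env file is shell-sourceable and files are named ggml-<model>.bin.
whisper_model_file() {
    local env_file="${1:-$HOME/.voicemode/voicemode.env}"
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        if [ -f "$env_file" ]; then
            . "$env_file"
        fi
        model="${VOICEMODE_WHISPER_MODEL:-large-v2}"
        dir="${VOICEMODE_WHISPER_MODEL_PATH:-$HOME/.voicemode/models/whisper}"
        printf '%s/ggml-%s.bin\n' "$dir" "$model"
    )
}
```

Calling `whisper_model_file` with no argument reads `~/.voicemode/voicemode.env`; passing a path reads that file instead.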

### LaunchAgent (macOS)

Create `~/Library/LaunchAgents/com.voicemode.whisper.plist`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.voicemode.whisper</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/whisper-server</string>
        <string>--host</string>
        <string>0.0.0.0</string>
        <string>--port</string>
        <string>2022</string>
        <string>--model</string>
        <string>/Users/YOUR_USERNAME/.voicemode/models/whisper/ggml-large-v2.bin</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```
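
The `YOUR_USERNAME` placeholder in the model path has to be replaced before the plist is loaded. One way to do that, sketched here with a hypothetical `fill_plist_home` helper, is to filter the template through `sed`:

```bash
# Sketch: replace the /Users/YOUR_USERNAME placeholder with a real home directory.
# Reads the plist template on stdin and writes the result to stdout.
# Assumes the home directory contains no '|' or '&' characters (sed specials here).
fill_plist_home() {
    local home_dir="${1:-$HOME}"
    sed "s|/Users/YOUR_USERNAME|${home_dir}|g"
}
```

For example, `fill_plist_home < whisper.plist.template > ~/Library/LaunchAgents/com.voicemode.whisper.plist`, where `whisper.plist.template` is an assumed file holding the XML above. Running `plutil -lint` on the result checks that it is still a valid property list.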

### Systemd Service (Linux)

Create `~/.config/systemd/user/whisper.service`:

```ini
[Unit]
Description=Whisper Speech-to-Text Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/whisper-server \
    --host 0.0.0.0 \
    --port 2022 \
    --model %h/.voicemode/models/whisper/ggml-large-v2.bin
Restart=always
RestartSec=10

[Install]
WantedBy=default.target
```

## Control

### macOS Commands

```bash
# Start service
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper.plist

# Stop service
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper.plist

# Restart service
launchctl unload ~/Library/LaunchAgents/com.voicemode.whisper.plist
launchctl load ~/Library/LaunchAgents/com.voicemode.whisper.plist

# Enable at startup
launchctl load -w ~/Library/LaunchAgents/com.voicemode.whisper.plist

# Disable at startup
launchctl unload -w ~/Library/LaunchAgents/com.voicemode.whisper.plist

# Check status
launchctl list | grep whisper
```

### Linux Commands

```bash
# Reload unit files after creating or editing whisper.service
systemctl --user daemon-reload

# Start service
systemctl --user start whisper

# Stop service
systemctl --user stop whisper

# Restart service
systemctl --user restart whisper

# Enable at startup
systemctl --user enable whisper

# Disable at startup
systemctl --user disable whisper

# Check status
systemctl --user status whisper

# View logs
journalctl --user -u whisper -f
```

## Troubleshooting

### Service won't start
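- Check whether port 2022 is already in use: `lsof -i :2022`
- Verify that the model file exists at the configured path (by default `~/.voicemode/models/whisper/ggml-large-v2.bin`)
- Check the logs for error messages (on Linux: `journalctl --user -u whisper -f`)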
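
The first two checks can be scripted as a quick preflight before starting the service. This is a minimal sketch; `check_model_file` and `check_port_free` are hypothetical helper names, and the port check relies on `lsof` being installed.

```bash
# Return 0 if the model file exists and is non-empty, 1 otherwise.
check_model_file() {
    local model_file="$1"
    if [ -s "$model_file" ]; then
        printf 'ok: model file found at %s\n' "$model_file"
    else
        printf 'error: model file missing or empty: %s\n' "$model_file" >&2
        return 1
    fi
}

# Return 0 if lsof reports nothing on the port, 1 if the port is in use.
# Skips the check (returns 0 with a warning) when lsof is not installed.
check_port_free() {
    local port="${1:-2022}"
    if ! command -v lsof >/dev/null 2>&1; then
        printf 'warning: lsof not found, skipping port check\n' >&2
        return 0
    fi
    if lsof -i ":${port}" >/dev/null 2>&1; then
        printf 'error: port %s is already in use\n' "$port" >&2
        return 1
    fi
    printf 'ok: port %s is free\n' "$port"
}
```

A typical call would be `check_port_free 2022 && check_model_file ~/.voicemode/models/whisper/ggml-large-v2.bin`.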

### Poor transcription quality
- Try a larger model (base → small → medium → large)
- Ensure audio input quality is good
- If the spoken language is known, set `VOICEMODE_WHISPER_LANGUAGE` to its language code (for example `en`) instead of `auto`
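
The model upgrade path can be written down as a small lookup. This sketch maps each model to the next larger one from the Available Models list; `next_larger_model` is a hypothetical helper, and mapping "large" to `large-v2` is an assumption based on that model being marked as recommended.

```bash
# Sketch: print the next larger model, following base -> small -> medium -> large.
# English-only models step up within the .en family; there is no large .en model
# in the list above, so medium.en steps up to the multilingual large-v2.
next_larger_model() {
    case "$1" in
        tiny)      echo base ;;
        base)      echo small ;;
        small)     echo medium ;;
        medium)    echo large-v2 ;;
        tiny.en)   echo base.en ;;
        base.en)   echo small.en ;;
        small.en)  echo medium.en ;;
        medium.en) echo large-v2 ;;
        *)         return 1 ;;   # already a large model, or an unknown name
    esac
}
```

The result can be passed to `voice-mode whisper model install` and then `voice-mode whisper model active`, followed by a service restart.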

### High CPU usage
- Use a smaller model (for example `small` or `base`) to reduce CPU load, at some cost in accuracy
- Consider using GPU acceleration if available
