Whispera

plan.md•5.04 kB

# Voice-to-Command Automation Plan

## Vision ✅ COMPLETED

Transform Whispera into an intelligent voice automation system where users can speak commands naturally and have them executed automatically as bash commands with context awareness.

## Phase 1: Core Voice-to-Command System ✅

- ✅ **Replace clipboard copy with command execution**: When transcription completes, automatically send to LLM for command generation and execution
- ✅ **Integrate with existing LLM infrastructure**: Use current LlamaState.generateAndExecuteBashCommand() method
- ✅ **Add command approval flow**: Show generated command with approve/deny buttons before execution
- ✅ **Context awareness**: Detect current Finder location and pass to LLM as context

## Phase 2: Enhanced Context Integration ✅

- ✅ **Finder integration**: Use AppleScript/Accessibility APIs to get current directory
- ✅ **Application context**: Detect frontmost app and provide relevant context
- ✅ **System state awareness**: Include relevant system information (time, battery, etc.)
- ✅ **Multi-step command support**: Allow LLM to generate command sequences

## Phase 3: Interactive Intelligence ✅

- ✅ **Clarification system**: When LLM needs more info, prompt user with follow-up questions
- ✅ **Learning from history**: Use command history to improve future suggestions
- ✅ **Safety enhancements**: Improved dangerous command detection and warnings
- ✅ **Command templates**: Pre-built patterns for common automation tasks

## Phase 4: Advanced Automation ✅

- ✅ **Workflow chaining**: Link multiple commands together
- ⚠️ **Conditional execution**: Support for if/then logic in voice commands (Basic support via LLM)
- ⚠️ **Integration hooks**: Connect with other automation tools (Future enhancement)
- ⚠️ **Voice feedback**: Speak results back to user using system TTS (Future enhancement)

## Implementation Strategy ✅

1. ✅ Start with MenuBarView.swift - modify transcription completion to route to LLM instead of clipboard
2. ✅ Add context providers for Finder path and system state
3. ✅ Enhance UI with command approval workflow
4. ✅ Progressively add more context and intelligence features

## Key Features IMPLEMENTED ✅

- ✅ **Natural language input**: "Open the Developer folder" → `open ~/Developer`
- ✅ **Context awareness**: "Show me the files here" (when in Finder) → `ls -la /current/path`
- ✅ **Smart execution**: Automatic approval for safe commands, confirmation for dangerous ones
- ✅ **Command history**: Track and learn from previous successful automations
- ✅ **Multi-modal feedback**: Visual command display + optional voice confirmation

## Technical Implementation Details ✅

### Dual Shortcut Architecture ✅

- **⌘⌥V**: Speech-to-text → clipboard (existing)
- **⌘⌥C**: Speech-to-command → LLM → bash execution (new)
- Shared transcription engine, different post-processing paths

### Command Mode Flow ✅

1. User triggers command shortcut (⌘⌥C)
2. Audio recording & transcription (same as existing)
3. Send transcription + context to LLM
4. Generate bash command
5. Show approval dialog with command preview
6. Execute if approved, with status feedback

### Context Integration ✅

- **Current Finder path**: Uses Accessibility API first, falls back to AppleScript
- **Frontmost app**: NSWorkspace.shared.frontmostApplication
- **System state**: Time, battery level, network connectivity

### Safety Features ✅

- **Dangerous command detection**: (rm, sudo, dd, etc.)
- **Mandatory approval**: For file system modifications
- **Command timeout**: (30 seconds max)
- **Auto-execution setting**: With safety override for dangerous commands

### Model Persistence ✅

- **Auto-save**: Selected model from onboarding
- **Auto-load**: Saved model on app startup with error handling
- **Graceful fallback**: If saved model unavailable

## Current Status: COMPLETE ✅

All major features have been implemented and are functional:

1. ✅ **Model Persistence**: LLM models are saved and auto-loaded on startup
2. ✅ **Dual Shortcuts**: ⌘⌥V for text mode, ⌘⌥C for command mode
3. ✅ **Command Approval**: Interactive approval workflow in MenuBarView
4. ✅ **Auto-Execution Setting**: Optional immediate execution with safety overrides
5. ✅ **Context Integration**: Finder path detection via Accessibility API + AppleScript fallback
6. ✅ **Safety Features**: Dangerous command detection and mandatory approval
7. ✅ **Command History**: Track execution results and success/failure status

## Future Enhancements (Optional)

- **Voice feedback**: Text-to-speech for command results
- **Advanced scripting**: More complex automation workflows
- **External integrations**: Shortcuts app, Automator compatibility
- **Machine learning**: Personalized command suggestions
- **Multi-language support**: Non-English voice commands

---

*Voice-to-command automation system successfully implemented with full safety features, context awareness, and user control.*
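
### Illustrative Sketch: Approval Decision

The plan describes the safety rules (dangerous command detection for `rm`, `sudo`, `dd`, and an auto-execution setting that dangerous commands override) without showing how the decision is made. The following is a minimal sketch of one way that check could look, not Whispera's actual code. Every name here (`ApprovalDecision`, `dangerousExecutables`, `executables(in:)`, `decide(command:autoExecuteEnabled:)`) is hypothetical, and the list holds only the three commands the plan names, since the plan's "etc." is not spelled out.

```swift
import Foundation

/// How a generated command should be handled before it runs.
enum ApprovalDecision: Equatable {
    case autoExecute
    case requireApproval(reason: String)
}

/// Assumption: only the three commands the plan names; the real list is unknown.
let dangerousExecutables: Set<String> = ["rm", "sudo", "dd"]

/// Splits a command line on common shell separators (|, ;, &, newline)
/// and returns the first word of each segment, with any directory prefix removed.
func executables(in command: String) -> [String] {
    let separators = CharacterSet(charactersIn: "|;&\n")
    return command
        .components(separatedBy: separators)
        .compactMap { segment in
            // Empty segments (for example between the two characters of "&&") yield nil.
            segment.split(whereSeparator: { $0 == " " || $0 == "\t" }).first
        }
        .map { word in
            // Reduce "/bin/rm" to "rm" so absolute paths are still matched.
            String(word.split(separator: "/").last ?? word)
        }
}

/// Dangerous commands always need approval, even when auto-execution is on.
func decide(command: String, autoExecuteEnabled: Bool) -> ApprovalDecision {
    let flagged = executables(in: command).filter { dangerousExecutables.contains($0) }
    if let first = flagged.first {
        return .requireApproval(reason: "contains dangerous command '\(first)'")
    }
    return autoExecuteEnabled
        ? .autoExecute
        : .requireApproval(reason: "auto-execution is disabled")
}
```

With this sketch, `decide(command: "open ~/Developer", autoExecuteEnabled: true)` returns `.autoExecute`, while `decide(command: "/bin/rm -rf build", autoExecuteEnabled: true)` returns `.requireApproval`. Matching on the first word of each segment is deliberately simple: it does not see commands hidden in subshells (`$(...)`) or passed as arguments (`xargs rm`), and it does not cover the plan's broader "mandatory approval for file system modifications" rule, so a real implementation would need stricter parsing.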
