keyboard
Send keyboard input to any window: type text, press key combos (ctrl+c), or execute multi-step sequences (alt+i then m) with auto-focus and safety guards.
Instructions
Purpose: Send keyboard input to a window: 'type' for text, 'press' for key combos, 'sequence' for atomic multi-step chords.

Details:
- action='type' inserts text (auto-clipboard for non-ASCII, IME-safe).
- action='press' sends key combos like 'ctrl+c' or 'alt+tab'.
- action='sequence' runs ordered steps under one keyboard lock. Use it for Alt+letter, letter mnemonic chains where intermediate tool calls would close the menu.
- Pass windowTitle to auto-focus and auto-guard (identity, foreground, modal) before input. Omitting windowTitle acts on the active window (unguarded).

Prefer:
- Use windowTitle to auto-focus before injection.
- Set lensId for perception guards.
- Use desktop_act({action:'setValue'}) for UIA ValuePattern text fields.

Caveats:
- win+r, win+x, win+s, and win+l are blocked.
- action='type' does not handle CJK IME composition; use use_clipboard=true or desktop_act({action:'setValue'}) instead.
- Non-ASCII text (CJK, emoji, diacritics, smart-quote-class punctuation) auto-clipboards to prevent silent drops and Chrome accelerator hijack; pass forceKeystrokes:true to disable.
- Background input (PostMessage/WM_CHAR) auto-engages for terminal-class windows (Windows Terminal, cmd, PowerShell); DTM_BG_AUTO=1 enables it globally.
- Foreground non-terminal type runs a per-chunk leash: if the user steals focus mid-stream, the call aborts with FocusLostDuringType plus context.typed/remaining; pass abortOnFocusLoss:false to disable.
- BG type verifies WM_CHAR via UIA TextPattern read-back; a mismatch returns BackgroundInputNotDelivered (see SUGGESTS for false-positive notes).
- BG press read-back is scoped to terminal-class windows and enter/tab/arrow keys; other combos return verifyDelivery:'unverifiable', and a failure returns BackgroundKeyNotDelivered.
- action='sequence' is FG-only (BG/foreground_flash are schema-rejected) and emits verifyDelivery:'focus_only'; mid-loop focus theft returns MenuFocusLostMidSequence plus context.remaining: Step[].
- A Win11 FG refusal returns ForegroundRestricted. Terminal-class targets auto-engage BG; for non-terminal targets, switch to desktop_act / click_element.

Examples:
- keyboard({action:'type', text:'hello', windowTitle:'Notepad'}) → text injected (guarded)
- keyboard({action:'type', text:'hello'}) → text injected (unguarded)
- keyboard({action:'press', keys:'ctrl+c'}) → copy
- keyboard({action:'press', keys:'escape', windowTitle:'Dialog'}) → dismiss dialog
- keyboard({action:'sequence', steps:[{keys:'alt+i', gapMs:100},{keys:'m'}], windowTitle:'Microsoft Visual Basic'}) → Insert > Module (atomic)
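
The FocusLostDuringType caveat implies a resume pattern: after an abort, send only the unsent tail. The sketch below shows one way a caller might build the follow-up call. The response shape (`ok`, `code`, `context.remaining` as a string) and the helper name `resumeArgs` are assumptions made for illustration; the tool's actual response envelope may differ.

```typescript
// Assumed response shape; not confirmed by the tool description.
interface KeyboardResult {
  ok: boolean;
  code?: string; // e.g. "FocusLostDuringType"
  context?: { remaining?: string }; // unsent tail of the text
}

interface TypeArgs {
  action: "type";
  text: string;
  windowTitle: string;
}

// Hypothetical helper: returns the arguments for a follow-up call that sends
// only the unsent tail, or null when no resume is appropriate.
function resumeArgs(previous: TypeArgs, result: KeyboardResult): TypeArgs | null {
  if (result.ok) return null; // nothing to resume
  if (result.code !== "FocusLostDuringType") return null; // different failure
  const remaining = result.context ? result.context.remaining : undefined;
  if (!remaining) return null; // nothing left, or the shape differs from the assumption
  // Keeping windowTitle means the next call auto-focuses and re-guards the target.
  return { ...previous, text: remaining };
}
```

A caller would pass the returned object to keyboard again, ideally with a bounded number of attempts so a persistently stolen focus cannot loop forever.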
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action selector — one of: type, press, sequence. Per-action required fields are enforced at call time (see the tool description); this flat schema lists every action's fields as optional. | |
| text | No | The text to type (max 10,000 characters) | |
| method | No | Input method. background = WM_CHAR PostMessage (no focus change); foreground = SendInput (current default); auto = pick automatically. | auto |
| narrate | No | Narration level. rich includes UIA or browser state diff when supported. | minimal |
| use_clipboard | No | If true, copy text to clipboard and paste with Ctrl+V instead of simulating keystrokes. Use this when typing URLs, paths, or ASCII text into apps with Japanese IME active; it prevents the IME from converting characters. | false |
| replaceAll | No | When true, send Ctrl+A to select all existing text before typing. Equivalent to Ctrl+A → keyboard(action='type') in one call (requires field already focused). | false |
| forceKeystrokes | No | When true, always use keystroke mode even if text contains non-ASCII content (CJK, emoji, diacritics, em-dash, smart quotes, etc.) that would normally trigger auto-clipboard. When false, auto-clipboard stays enabled. | false |
| windowTitle | No | Partial title of the window that should receive keyboard input. | |
| hwnd | No | Direct window handle ID (takes precedence over windowTitle). Obtain from get_windows response (hwnd field). String type to avoid 64-bit precision issues. | |
| forceFocus | No | Bypass Windows foreground-stealing protection before focusing. | |
| trackFocus | No | Detect if focus was stolen after the action. | |
| settleMs | No | Milliseconds to wait before checking post-action state. | |
| lensId | No | Optional perception lens ID. Guards (safe.keyboardTarget) are evaluated before typing, and a perception envelope is attached to post.perception on success. | |
| fixId | No | Approve a pending suggestedFix (one-shot, 15s TTL). Pass the fixId returned by a previous failed keyboard(action='type') to re-attempt with guard-validated args. | |
| abortOnFocusLoss | No | Focus Leash Phase B: when true, the foreground keystroke send is split into chunks (default 8 chars; override via DTM_LEASH_CHUNK_SIZE env) and the target window's foreground state is verified between chunks. If the user grabs focus mid-stream, the call aborts and returns FocusLostDuringType with context.typed (chars delivered to target) and context.remaining (unsent tail) so the caller can re-focus and retry the unsent portion. Default: true when windowTitle is provided, false otherwise. Has no effect on the clipboard path (atomic Ctrl+V) or the BG (WM_CHAR) path (HWND-targeted, foreground-independent). | |
| forceImeOff | No | Issue #245, track ②: when true, query the target window's IME open-status via Imm32 before typing; if ON, switch OFF for the duration of this call and restore the prior state in `finally`. Prevents silent romaji conversion when the user's Japanese IME is active but the LLM is typing ASCII commands. Requires `windowTitle` or `hwnd` (otherwise no target to query). Default false; the existing use_clipboard auto-promotion still handles non-ASCII symbols transparently. No-op when the addon predates the IMM bridge (call proceeds with whatever IME state is in effect). | |
| include | No | Optional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients). | |
| keys | No | Key combo string, e.g. 'ctrl+c', 'alt+tab', 'enter', 'ctrl+shift+s'. Note: win+r, win+x, win+s, win+l are blocked for security. | |
| steps | No | Ordered list of key-press steps. Min 1, max 16. Total duration must not exceed 5000ms (excludes settleMs and focus acquisition). N=1 is allowed but inherits the sequence verification contract (hints.verifyDelivery.status='focus_only'); if you want the stricter keyboard:press contract, call keyboard({action:'press', keys}) directly (issue #278, matrix doc §3.1). | |
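
Because the flat schema marks every per-action field as optional, a client may want to catch obvious mistakes before calling the tool. The following is a minimal pre-flight sketch, not the server's actual validation. It assumes that 'type' needs text, 'press' needs keys, and 'sequence' needs steps (inferred from the examples), and that a sequence's total duration can be approximated by summing gapMs. The names `validateKeyboardArgs` and `BLOCKED_COMBOS` are hypothetical.

```typescript
type Step = { keys: string; gapMs?: number };

interface KeyboardArgs {
  action: "type" | "press" | "sequence";
  text?: string;
  keys?: string;
  steps?: Step[];
  method?: "auto" | "foreground" | "background";
  windowTitle?: string;
}

// Combos the schema documents as blocked for security.
const BLOCKED_COMBOS: string[] = ["win+r", "win+x", "win+s", "win+l"];

// Returns a list of human-readable problems; an empty list means "looks fine".
function validateKeyboardArgs(args: KeyboardArgs): string[] {
  const errors: string[] = [];
  switch (args.action) {
    case "type":
      if (args.text === undefined) {
        errors.push("type requires text");
      } else if (args.text.length > 10000) {
        // length counts UTF-16 code units; the server's counting rule is not stated.
        errors.push("text exceeds 10,000 characters");
      }
      break;
    case "press": {
      if (!args.keys) {
        errors.push("press requires keys");
        break;
      }
      const combo = args.keys.toLowerCase().replace(/\s+/g, "");
      if (BLOCKED_COMBOS.indexOf(combo) !== -1) errors.push("blocked combo: " + combo);
      break;
    }
    case "sequence": {
      const steps = args.steps ?? [];
      if (steps.length < 1 || steps.length > 16) errors.push("sequence requires 1 to 16 steps");
      // Assumption: total duration is approximated as the sum of gapMs.
      const total = steps.reduce((sum, step) => sum + (step.gapMs ?? 0), 0);
      if (total > 5000) errors.push("total step duration exceeds 5000ms");
      if (args.method === "background") errors.push("sequence is foreground-only");
      break;
    }
  }
  return errors;
}
```

For instance, validateKeyboardArgs({action:'press', keys:'Win+R'}) reports one problem, while the Insert > Module sequence from the examples passes. The server remains the source of truth; a check like this only saves a round trip.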