Skip to main content
Glama

keyboard

Send keyboard input to any window: type text, press key combos (ctrl+c), or execute multi-step sequences (alt+i then m) with auto-focus and safety guards.

Instructions

Purpose: Send keyboard input to a window: 'type' for text, 'press' for key combos, 'sequence' for atomic multi-step chords. Details: action='type' inserts text (auto-clipboard for non-ASCII / IME-safe). action='press' sends key combos like 'ctrl+c'/'alt+tab'. action='sequence' runs ordered steps in one keyboard lock — use for Alt+letter, letter mnemonic chains where intermediate tool calls would close the menu. Pass windowTitle to auto-focus and auto-guard (identity, foreground, modal) before input. Omitting windowTitle acts on the active window (unguarded). Prefer: Use windowTitle to auto-focus before injection. Set lensId for perception guards. Use desktop_act({action:'setValue'}) for UIA ValuePattern text fields. Caveats: win+r/win+x/win+s/win+l blocked. action='type' does not handle CJK IME composition — use use_clipboard=true or desktop_act({action:'setValue'}). Non-ASCII text (CJK / emoji / diacritics / smart-quote-class punctuation) auto-clipboards to prevent silent-drop and Chrome accelerator hijack; pass forceKeystrokes:true to disable. Background (PostMessage/WM_CHAR) auto-engages for terminal-class windows (Windows Terminal / cmd / PowerShell); DTM_BG_AUTO=1 enables globally. Foreground non-terminal type runs a per-chunk leash; user focus-steal mid-stream aborts with FocusLostDuringType + context.typed/remaining; pass abortOnFocusLoss:false to disable. BG type verifies WM_CHAR via UIA TextPattern read-back; mismatch returns BackgroundInputNotDelivered (see SUGGESTS for false-positive notes). BG press read-back is scoped to terminal-class + enter/tab/arrow; other combos return verifyDelivery:'unverifiable', failure returns BackgroundKeyNotDelivered. action='sequence' is FG-only (BG/foreground_flash schema-rejected); emits verifyDelivery:'focus_only'; mid-loop focus theft returns MenuFocusLostMidSequence + context.remaining: Step[]. Win11 FG refusal returns ForegroundRestricted — terminal-class targets auto-engage BG; non-terminal switch to desktop_act / click_element. Examples: keyboard({action:'type', text:'hello', windowTitle:'Notepad'}) → text injected (guarded) keyboard({action:'type', text:'hello'}) → text injected (unguarded) keyboard({action:'press', keys:'ctrl+c'}) → copy keyboard({action:'press', keys:'escape', windowTitle:'Dialog'}) → dismiss dialog keyboard({action:'sequence', steps:[{keys:'alt+i', gapMs:100},{keys:'m'}], windowTitle:'Microsoft Visual Basic'}) → Insert > Module (atomic)

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
actionYesAction selector — one of: type, press, sequence. Per-action required fields are enforced at call time (see the tool description); this flat schema lists every action's fields as optional.
textNoThe text to type (max 10,000 characters)
methodNoInput method. background = WM_CHAR PostMessage (no focus change); foreground = SendInput (current default); auto = pick automatically.auto
narrateNoNarration level. rich includes UIA or browser state diff when supported.minimal
use_clipboardNoIf true, copy text to clipboard and paste with Ctrl+V instead of simulating keystrokes. Use this when typing URLs, paths, or ASCII text into apps with Japanese IME active — prevents IME from converting characters. Default false.
replaceAllNoWhen true, send Ctrl+A to select all existing text before typing. Equivalent to Ctrl+A → keyboard(action='type') in one call (requires field already focused). Default false.
forceKeystrokesNoWhen true, always use keystroke mode even if text contains non-ASCII content (CJK, emoji, diacritics, em-dash, smart quotes, etc.) that would normally trigger auto-clipboard. Default false — auto-clipboard is enabled.
windowTitleNoPartial title of the window that should receive keyboard input.
hwndNoDirect window handle ID (takes precedence over windowTitle). Obtain from get_windows response (hwnd field). String type to avoid 64-bit precision issues.
forceFocusNoBypass Windows foreground-stealing protection before focusing.
trackFocusNoDetect if focus was stolen after the action.
settleMsNoMilliseconds to wait before checking post-action state.
lensIdNoOptional perception lens ID. Guards (safe.keyboardTarget) are evaluated before typing, and a perception envelope is attached to post.perception on success.
fixIdNoApprove a pending suggestedFix (one-shot, 15s TTL). Pass the fixId returned by a previous failed keyboard(action='type') to re-attempt with guard-validated args.
abortOnFocusLossNoFocus Leash Phase B: when true, the foreground keystroke send is split into chunks (default 8 chars; override via DTM_LEASH_CHUNK_SIZE env) and the target window's foreground state is verified between chunks. If the user grabs focus mid-stream, the call aborts and returns FocusLostDuringType with context.typed (chars delivered to target) and context.remaining (unsent tail) so the caller can re-focus and retry the unsent portion. Default: true when windowTitle is provided, false otherwise. Has no effect on the clipboard path (atomic Ctrl+V) or the BG (WM_CHAR) path (HWND-targeted, foreground-independent).
forceImeOffNoIssue #245 系統②: when true, query the target window's IME open-status via Imm32 before typing; if ON, switch OFF for the duration of this call and restore the prior state in `finally`. Prevents silent romaji conversion when the user's Japanese IME is active but the LLM is typing ASCII commands. Requires `windowTitle` or `hwnd` (otherwise no target to query). Default false — existing use_clipboard auto-promotion still handles non-ASCII symbols transparently. No-op when the addon predates the IMM bridge (call proceeds with whatever IME state is in effect).
includeNoOptional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients).
keysNoKey combo string, e.g. 'ctrl+c', 'alt+tab', 'enter', 'ctrl+shift+s'. Note: win+r, win+x, win+s, win+l are blocked for security.
stepsNoOrdered list of key-press steps. Min 1, max 16. Total duration must not exceed 5000ms (excludes settleMs and focus acquisition). N=1 is allowed but inherits the sequence verification contract (hints.verifyDelivery.status='focus_only'); if you want the stricter keyboard:press contract, call keyboard({action:'press', keys}) directly (issue #278, matrix doc §3.1).
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden and excels. It details auto-clipboard for non-ASCII, foreground/background methods, focus loss handling, blocked combos, error conditions, and verification. Very transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose (multiple paragraphs) with some redundancy (auto-clipboard mentioned in multiple spots). Structure is logical (purpose, details, caveats, examples), but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 19 parameters, no annotations, and no output schema, the description is remarkably complete. It covers all actions, error cases, prerequisites, and provides examples. An agent has sufficient information to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents parameters. The description adds significant context beyond the schema (e.g., auto-clipboard behavior, focus leash, sequence limitations), justifying above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Send keyboard input to a window' and enumerates three actions (type, press, sequence) with distinct purposes. It differentiates from siblings by mentioning desktop_act for UIA ValuePattern fields. Examples reinforce purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use each action and when to use alternatives (e.g., 'Use desktop_act for UIA ValuePattern text fields'). It includes caveats and examples, but does not exhaustively list when not to use the tool vs. all sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server