Skip to main content
Glama

tauri_webview_keyboard

Simulate keyboard input in Tauri applications by typing text or sending key events to automate UI testing and debugging workflows.

Instructions

[Tauri Apps Only] Type text or send keyboard events in a Tauri app. Requires active tauri_driver_session. Targets the only connected app, or the default app if multiple are connected. Specify appIdentifier (port or bundle ID) to target a specific app. For browser keyboard input, use Chrome DevTools MCP instead.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
windowIdNoWindow label to target (defaults to "main")
appIdentifierNoApp port or bundle ID to target. Defaults to the only connected app or the default app if multiple are connected.
actionYesKeyboard action type: "type" for typing text into an element, "press/down/up" for key events
selectorNoCSS selector for element to type into (required for "type" action)
textNoText to type (required for "type" action)
keyNoKey to press (required for "press/down/up" actions, e.g., "Enter", "a", "Escape")
modifiersNoModifier keys to hold

Implementation Reference

  • Core handler function that parses arguments, builds appropriate JavaScript script using buildTypeScript or buildKeyEventScript, and executes it in the Tauri webview via executeInWebview.
    export async function keyboard(options: KeyboardOptions): Promise<string> {
       const { action, selectorOrKey, textOrModifiers, modifiers, windowId, appIdentifier } = options;
    
       // Handle the different parameter combinations based on action
       if (action === 'type') {
          const selector = selectorOrKey;
    
          const text = textOrModifiers as string;
    
          if (!selector || !text) {
             throw new Error('Type action requires both selector and text parameters');
          }
    
          const script = buildTypeScript(selector, text);
    
          try {
             return await executeInWebview(script, windowId, appIdentifier);
          } catch(error: unknown) {
             const message = error instanceof Error ? error.message : String(error);
    
             throw new Error(`Type action failed: ${message}`);
          }
       }
    
       // For press/down/up actions: key is required, modifiers optional
       const key = selectorOrKey;
    
       const mods = Array.isArray(textOrModifiers) ? textOrModifiers : modifiers;
    
       if (!key) {
          throw new Error(`${action} action requires a key parameter`);
       }
    
       const script = buildKeyEventScript(action, key, mods || []);
    
       try {
          return await executeInWebview(script, windowId, appIdentifier);
       } catch(error: unknown) {
          const message = error instanceof Error ? error.message : String(error);
    
          throw new Error(`Keyboard action failed: ${message}`);
       }
    }
  • Zod schema defining input parameters for the tauri_webview_keyboard tool.
    export const KeyboardSchema = WindowTargetSchema.extend({
       action: z.enum([ 'type', 'press', 'down', 'up' ])
          .describe('Keyboard action type: "type" for typing text into an element, "press/down/up" for key events'),
       selector: z.string().optional().describe('CSS selector for element to type into (required for "type" action)'),
       text: z.string().optional().describe('Text to type (required for "type" action)'),
       key: z.string().optional().describe('Key to press (required for "press/down/up" actions, e.g., "Enter", "a", "Escape")'),
       modifiers: z.array(z.enum([ 'Control', 'Alt', 'Shift', 'Meta' ])).optional().describe('Modifier keys to hold'),
    });
  • Tool registration in the central TOOLS array, including description, schema reference, annotations, and inline handler wrapper that invokes the core keyboard() function.
    {
       name: 'tauri_webview_keyboard',
       description:
          '[Tauri Apps Only] Type text or send keyboard events in a Tauri app. ' +
          'Requires active tauri_driver_session. ' +
          MULTI_APP_DESC + ' ' +
          'For browser keyboard input, use Chrome DevTools MCP instead.',
       category: TOOL_CATEGORIES.UI_AUTOMATION,
       schema: KeyboardSchema,
       annotations: {
          title: 'Keyboard Input in Tauri',
          readOnlyHint: false,
          destructiveHint: false,
          openWorldHint: false,
       },
       handler: async (args) => {
          const parsed = KeyboardSchema.parse(args);
    
          if (parsed.action === 'type') {
             return await keyboard({
                action: parsed.action,
                selectorOrKey: parsed.selector,
                textOrModifiers: parsed.text,
                windowId: parsed.windowId,
                appIdentifier: parsed.appIdentifier,
             });
          }
          return await keyboard({
             action: parsed.action,
             selectorOrKey: parsed.key,
             textOrModifiers: parsed.modifiers,
             windowId: parsed.windowId,
             appIdentifier: parsed.appIdentifier,
          });
       },
    },
  • Helper function that generates JavaScript code for typing text into a selected element (used for 'type' action).
    export function buildTypeScript(selector: string, text: string): string {
       const escapedText = text.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
    
       return `
          (function() {
             const selector = '${selector}';
             const text = '${escapedText}';
    
             const element = document.querySelector(selector);
             if (!element) {
                throw new Error('Element not found: ' + selector);
             }
    
             element.focus();
             element.value = text;
             element.dispatchEvent(new Event('input', { bubbles: true }));
             element.dispatchEvent(new Event('change', { bubbles: true }));
    
             return 'Typed "' + text + '" into ' + selector;
          })()
       `;
    }
  • Helper function that generates JavaScript code for dispatching key events (keydown, keypress, keyup) with modifiers (used for 'press', 'down', 'up' actions).
    export function buildKeyEventScript(
       action: string,
       key: string,
       modifiers: string[] = []
    ): string {
       return `
          (function() {
             const action = '${action}';
             const key = '${key}';
             const modifiers = ${JSON.stringify(modifiers)};
    
             const eventOptions = {
                key: key,
                code: key,
                bubbles: true,
                cancelable: true,
                ctrlKey: modifiers.includes('Control'),
                altKey: modifiers.includes('Alt'),
                shiftKey: modifiers.includes('Shift'),
                metaKey: modifiers.includes('Meta'),
             };
    
             const activeElement = document.activeElement || document.body;
    
             if (action === 'press') {
                activeElement.dispatchEvent(new KeyboardEvent('keydown', eventOptions));
                activeElement.dispatchEvent(new KeyboardEvent('keypress', eventOptions));
                activeElement.dispatchEvent(new KeyboardEvent('keyup', eventOptions));
                return 'Pressed key: ' + key + (modifiers.length ? ' with ' + modifiers.join('+') : '');
             } else if (action === 'down') {
                activeElement.dispatchEvent(new KeyboardEvent('keydown', eventOptions));
                return 'Key down: ' + key + (modifiers.length ? ' with ' + modifiers.join('+') : '');
             } else if (action === 'up') {
                activeElement.dispatchEvent(new KeyboardEvent('keyup', eventOptions));
                return 'Key up: ' + key + (modifiers.length ? ' with ' + modifiers.join('+') : '');
             }
    
             throw new Error('Unknown action: ' + action);
          })()
       `;
    }
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it specifies the required session context, explains targeting logic for connected apps, and mentions the alternative tool for browser contexts. While annotations cover basic safety (readOnlyHint=false, destructiveHint=false), the description provides practical implementation details that help the agent understand when and how to use this tool effectively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with four sentences that each serve distinct purposes: scope restriction, prerequisites, targeting logic, and alternative tool guidance. There is no wasted text, and key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters, 100% schema coverage, and no output schema, the description provides excellent contextual completeness. It covers scope, prerequisites, targeting behavior, and alternatives. The only minor gap is not explicitly mentioning what happens on successful execution, but given the comprehensive schema and annotations, this is acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all parameters thoroughly. The description adds minimal parameter semantics beyond the schema, mainly clarifying the appIdentifier targeting behavior. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Type text or send keyboard events') and resource ('in a Tauri app'), and distinguishes it from sibling tools by explicitly mentioning when to use Chrome DevTools MCP instead. It also specifies the required context ('Requires active tauri_driver_session').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: it states when to use this tool ('Tauri Apps Only'), when not to use it ('For browser keyboard input, use Chrome DevTools MCP instead'), and mentions prerequisites ('Requires active tauri_driver_session'). It also explains targeting behavior for single vs. multiple connected apps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hypothesi/mcp-server-tauri'

If you have feedback or need assistance with the MCP directory API, please join our Discord server