Glama

browser_press_key

Press a specified key in a browser session to automate keyboard interactions during web automation tasks.

Instructions

Press key

Input Schema

Name    Required    Description    Default
key     Yes         (none)         (none)

Implementation Reference

  • index.js:391-396 (registration)
    Registration and handler setup for 'browser_press_key'. It uses `proxyToolCall` to delegate the actual operation to a browser client.
    server.tool('browser_press_key', 'Press key', { key: z.string() },
      async (args) => {
        const check = requireActivePage();
        if (check) return check;
        return proxyToolCall('browser_press_key', args);
      });
  • Helper function that executes tool calls on the connected browser client.
    async function proxyToolCall(toolName, args) {
      log(`[proxyToolCall] ${toolName} with args: ${JSON.stringify(args)}`);
      const { client } = await getOrCreateInstance();
      log(`[proxyToolCall] got client for port ${assignedPort}`);
    
      // Update last used
      if (assignedPort && instances.has(assignedPort)) {
        instances.get(assignedPort).lastUsed = Date.now();
      }
    
      try {
        log(`[proxyToolCall] Calling client.callTool...`);
        const result = await client.callTool({ name: toolName, arguments: args || {} });
        log(`[proxyToolCall] Result type: ${typeof result}`);
        log(`[proxyToolCall] Result: ${JSON.stringify(result).slice(0, 500)}`);
    
        // The SDK returns { content: [...], isError?: boolean }
        // We need to return this same format
        if (result && result.content) {
          return result;
        }
    
        // Fallback: wrap in content array if needed
        return {
          content: [{ type: 'text', text: JSON.stringify(result) }]
        };
      } catch (error) {
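The excerpt above follows a guard-then-delegate pattern: the handler returns early if `requireActivePage()` yields a result, and otherwise forwards the call and normalizes whatever comes back. The sketch below isolates that flow so it can be run without a browser. The `activePage` variable, the `'No active page'` message, `normalizeToolResult`, `fakeClient`, and `pressKey` are all hypothetical stand-ins, not code from the server.

```javascript
// Sketch only: these stand-ins imitate the flow in the excerpt; they are not the server's code.
let activePage = null; // hypothetical state; the real server tracks its page elsewhere

// Returns an MCP-style error result when no page is active, otherwise undefined.
// The error text is an assumption made for illustration.
function requireActivePage() {
  if (activePage) return undefined;
  return { content: [{ type: 'text', text: 'No active page' }], isError: true };
}

// Mirrors the result handling shown in proxyToolCall:
// pass through results that already have `content`, wrap anything else as text.
function normalizeToolResult(result) {
  if (result && result.content) return result;
  return { content: [{ type: 'text', text: JSON.stringify(result) }] };
}

// Fake client that echoes the call instead of driving a browser.
const fakeClient = {
  async callTool({ name, arguments: args }) {
    return { tool: name, pressed: args.key };
  },
};

// Guard first, then delegate and normalize.
async function pressKey(args) {
  const check = requireActivePage();
  if (check) return check;
  const result = await fakeClient.callTool({
    name: 'browser_press_key',
    arguments: args || {},
  });
  return normalizeToolResult(result);
}
```

With `activePage` left as `null`, `pressKey({ key: 'Enter' })` resolves to the guard's error result; once `activePage` is set to any truthy value, the call reaches the fake client and its plain object is wrapped in a `content` array.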

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description discloses no behavioral traits. It does not specify whether the key press targets the currently focused element, what key formats are accepted (e.g., 'Enter', 'Control'), or whether modifiers are supported. It also does not mention side effects such as form submission.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Although the description is only two words long, this is severe under-specification rather than conciseness. The single "sentence" does not earn its place because it gives an LLM agent no actionable information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 1-parameter tool with no output schema or annotations, the description provides only a tautology. It lacks critical context: what key values are valid, whether the action requires a specific browser state, and how it differs from text input operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no description on the 'key' parameter), and the description fails to compensate. It does not specify valid values (special keys vs. characters), expected format (case sensitivity), or examples of acceptable inputs.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
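As an illustration of what a described parameter could look like, here is a minimal sketch of a JSON Schema object for `key` plus a small argument check. The example key names ('Enter', 'Tab', 'ArrowDown', 'a') assume Playwright-style key naming, which the source does not confirm, and `pressKeyInputSchema` and `validatePressKeyArgs` are hypothetical names. The non-empty check is also stricter than the `z.string()` in the registration, which accepts an empty string.

```javascript
// Hypothetical, more descriptive schema for the `key` parameter.
// The accepted key names are an assumption; verify them against the actual server.
const pressKeyInputSchema = {
  type: 'object',
  properties: {
    key: {
      type: 'string',
      description:
        "Key to press, e.g. 'Enter', 'Tab', 'ArrowDown', or a single character such as 'a'.",
    },
  },
  required: ['key'],
};

// Minimal check that a call's arguments fit the sketch above.
// Rejecting empty strings is an extra assumption, not behavior of the original schema.
function validatePressKeyArgs(args) {
  if (!args || typeof args.key !== 'string' || args.key.length === 0) {
    return { ok: false, reason: "'key' must be a non-empty string" };
  }
  return { ok: true };
}
```

A caller could run `validatePressKeyArgs({ key: 'Enter' })` before invoking the tool and surface `reason` to the agent when the check fails.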

Purpose: 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Press key' is tautological given the tool name 'browser_press_key': it only restates the action. It fails to specify what kind of key is meant (keyboard key vs. UI element), to distinguish the tool from its sibling 'browser_type', or to clarify the browser automation context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 1/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool rather than 'browser_type' (which likely enters text) or 'browser_click'. The description does not mention prerequisites such as having a focused element, or when a key press triggers navigation rather than input.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/OMGEverdo/browser-pool-mcp'
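The same request can be sketched in JavaScript. This assumes Node 18 or later (for the global `fetch`) and that the endpoint returns JSON, which the page does not state; `serverUrl` and `fetchServerInfo` are hypothetical helper names.

```javascript
// Builds the directory API URL for a server, following the pattern in the curl example.
function serverUrl(owner, name) {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record. Assumes a JSON response body (not confirmed by the source).
async function fetchServerInfo(owner, name) {
  const res = await fetch(serverUrl(owner, name));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

For example, `fetchServerInfo('OMGEverdo', 'browser-pool-mcp')` requests the same URL as the curl command above.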

If you have feedback or need assistance with the MCP directory API, please join our Discord server.