Glama

browser_focus_element

Focus on a specific web element using locator strategies like ID, CSS, or XPath to interact with page elements during browser automation.

Instructions

Focus on a specific element

Input Schema

Name    | Required | Description                                      | Default
by      | Yes      | Locator strategy to find element                 |
value   | Yes      | Value for the locator strategy                   |
timeout | No       | Maximum time to wait for element in milliseconds |
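As a rough illustration of what a valid argument object looks like, the sketch below mirrors the three parameters in plain TypeScript. The allowed `by` values are copied from the `by` enum in the Zod schema listed under Implementation Reference; the `checkFocusArgs` helper and the example selector are hypothetical and not part of the server.

```typescript
// Allowed locator strategies, copied from the `by` enum in locatorSchema.
const LOCATOR_STRATEGIES = ['id', 'css', 'xpath', 'name', 'tag', 'class', 'link', 'partialLink'] as const;
type LocatorStrategy = (typeof LOCATOR_STRATEGIES)[number];

type FocusElementArgs = {
  by: LocatorStrategy;
  value: string;
  timeout?: number; // milliseconds
};

// Arguments as they might arrive before any validation.
type UncheckedArgs = { by?: unknown; value?: unknown; timeout?: unknown };

// Hypothetical helper (not part of the server): returns a list of problems,
// which is empty when the arguments match the input schema.
function checkFocusArgs(args: UncheckedArgs): string[] {
  const problems: string[] = [];
  if (typeof args.by !== 'string' || !LOCATOR_STRATEGIES.some((s) => s === args.by)) {
    problems.push(`by must be one of: ${LOCATOR_STRATEGIES.join(', ')}`);
  }
  if (typeof args.value !== 'string') {
    problems.push('value must be a string');
  }
  if (args.timeout !== undefined && typeof args.timeout !== 'number') {
    problems.push('timeout must be a number of milliseconds when present');
  }
  return problems;
}

// Example arguments: focus an email input, waiting up to 5 seconds.
const exampleArgs: FocusElementArgs = { by: 'css', value: 'input[name="email"]', timeout: 5000 };
```

Calling `checkFocusArgs(exampleArgs)` yields an empty list, while an unknown strategy such as `by: 'text'` yields one problem.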

Implementation Reference

  • Registration of the 'browser_focus_element' tool, including the inline handler that creates an ActionService, calls focusElement, and formats the success or error response
    server.tool('browser_focus_element', 'Focus on a specific element', { ...locatorSchema }, async ({ by, value }) => {
      try {
        // `timeout` is part of locatorSchema but is not destructured here,
        // so focusElement always falls back to its 15000 ms default.
        const driver = stateManager.getDriver();
        const actionService = new ActionService(driver);
        await actionService.focusElement({ by, value });
        return {
          content: [{ type: 'text', text: `Focused on element` }],
        };
      } catch (e) {
        // Failures are reported as a text result rather than rethrown.
        return {
          content: [
            {
              type: 'text',
              text: `Error focusing on element: ${(e as Error).message}`,
            },
          ],
        };
      }
    });
  • Zod schema for locator parameters used in the tool's input schema
    export const locatorSchema = {
      by: z
        .enum(['id', 'css', 'xpath', 'name', 'tag', 'class', 'link', 'partialLink'])
        .describe('Locator strategy to find element'),
      value: z.string().describe('Value for the locator strategy'),
      timeout: z.number().optional().describe('Maximum time to wait for element in milliseconds'),
    };
  • Core implementation of focusing an element: waits for the element to be located, then uses Selenium WebDriver's executeScript to call focus() on it
    async focusElement(params: LocatorParams): Promise<void> {
      const locator = LocatorFactory.createLocator(params.by, params.value);
      // Wait until the element is present in the DOM; defaults to 15000 ms when no timeout (or 0) is given.
      const element = await this.driver.wait(until.elementLocated(locator), params.timeout || 15000);
      // Focus via injected JavaScript rather than a native WebDriver action.
      await this.driver.executeScript('arguments[0].focus();', element);
    }
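The source of LocatorFactory.createLocator is not shown here, so the following is only a guess at the kind of mapping it performs. It is a Selenium-free sketch that turns the eight `by` values into the five locator strategies defined by the W3C WebDriver protocol; the names `toWireLocator`, `WireLocator`, and `quoteForCss` are hypothetical, and the real factory most likely returns Selenium `By` objects instead.

```typescript
type LocatorStrategy = 'id' | 'css' | 'xpath' | 'name' | 'tag' | 'class' | 'link' | 'partialLink';

// Shape of a W3C WebDriver locator: a strategy name plus a selector string.
interface WireLocator {
  using: 'css selector' | 'xpath' | 'tag name' | 'link text' | 'partial link text';
  value: string;
}

// Escape backslashes and double quotes so a value can sit inside a
// double-quoted CSS attribute selector.
function quoteForCss(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// Hypothetical mapping (an assumption, not the project's LocatorFactory).
function toWireLocator(by: LocatorStrategy, value: string): WireLocator {
  switch (by) {
    case 'id':
      return { using: 'css selector', value: `[id="${quoteForCss(value)}"]` };
    case 'name':
      return { using: 'css selector', value: `[name="${quoteForCss(value)}"]` };
    case 'class':
      // Assumes a single class name that is already a plain CSS identifier.
      return { using: 'css selector', value: `.${value}` };
    case 'css':
      return { using: 'css selector', value };
    case 'xpath':
      return { using: 'xpath', value };
    case 'tag':
      return { using: 'tag name', value };
    case 'link':
      return { using: 'link text', value };
    case 'partialLink':
      return { using: 'partial link text', value };
  }
}
```

Under this mapping, `id`, `name`, and `class` are rewritten as CSS selectors, while the remaining strategies pass the value through unchanged.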
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the action but doesn't disclose behavioral traits such as whether focusing triggers events, whether it waits for the element to be available (though the timeout parameter hints at this), what happens on failure (e.g., whether it throws an error), or whether it requires the browser to be in a specific state. This leaves gaps in understanding the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
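On the failure question specifically, the handler listed under Implementation Reference catches errors and returns them as a text result prefixed with "Error focusing on element: " instead of throwing. A caller therefore has to inspect the text. The sketch below shows one way to do that; `focusFailure` and the two interfaces are hypothetical client-side names, and `isError` is the optional flag from the MCP tool result format, which the handler as shown does not set.

```typescript
// Minimal shape of the result returned by the handler.
interface TextContent {
  type: 'text';
  text: string;
}

interface ToolResult {
  content: TextContent[];
  isError?: boolean; // optional MCP flag; not set by the handler shown in this listing
}

// Prefix used by the handler's catch branch.
const ERROR_PREFIX = 'Error focusing on element: ';

// Hypothetical client-side helper: returns the error message,
// or null when the focus call appears to have succeeded.
function focusFailure(result: ToolResult): string | null {
  if (result.isError) {
    return result.content.map((item) => item.text).join('\n');
  }
  for (const item of result.content) {
    if (item.text.startsWith(ERROR_PREFIX)) {
      return item.text.slice(ERROR_PREFIX.length);
    }
  }
  return null;
}
```

Matching on a text prefix is brittle, which is one reason a structured error flag would make the tool's failure behavior easier for agents to handle.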

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It's front-loaded with the core action, making it easy to scan. Every word earns its place by conveying the essential purpose without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of browser automation and no annotations or output schema, the description is incomplete. It doesn't cover what 'focus' entails (e.g., bringing keyboard focus to an input field), potential side effects, error conditions, or how it fits into broader workflows with sibling tools. This leaves significant gaps for an AI agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear descriptions for 'by' (locator strategy), 'value' (value for the strategy), and 'timeout' (maximum wait time). The description adds no additional meaning beyond the schema, such as examples of valid values or how focusing interacts with these parameters. Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Focus on a specific element' states the action (focus) and target (element), but it's vague about what 'focus' means in this browser automation context. It doesn't distinguish this from sibling tools like 'browser_click' or 'browser_hover', which might also involve element interaction. The purpose is understandable but lacks specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. With siblings like 'browser_click', 'browser_hover', and 'browser_wait_for_element', the description doesn't explain that focusing might be needed for keyboard input or form interactions, nor does it mention prerequisites like requiring the element to be visible or interactable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
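To make the "use X instead of Y when Z" idea concrete, here is a small sketch of how an agent-side wrapper might pick between the three sibling tools named above. The intent labels and the `chooseTool` and `buildCall` helpers are hypothetical, and the mapping reflects one plausible reading of the tools (click to activate, hover to reveal hover-only UI, focus to prepare keyboard input), not documented guidance from the server.

```typescript
type Intent = 'activate' | 'reveal-hover-ui' | 'prepare-keyboard-input';
type ToolName = 'browser_click' | 'browser_hover' | 'browser_focus_element';

// Hypothetical rule of thumb for choosing a tool by intent.
function chooseTool(intent: Intent): ToolName {
  switch (intent) {
    case 'activate':
      return 'browser_click'; // press buttons, follow links
    case 'reveal-hover-ui':
      return 'browser_hover'; // open menus or tooltips shown on hover
    case 'prepare-keyboard-input':
      return 'browser_focus_element'; // move keyboard focus without clicking
  }
}

// Build a call descriptor using the locator parameters from the input schema.
function buildCall(intent: Intent, by: string, value: string): { name: ToolName; args: { by: string; value: string } } {
  return { name: chooseTool(intent), args: { by, value } };
}

const call = buildCall('prepare-keyboard-input', 'id', 'search-box');
```

Here `call.name` is 'browser_focus_element', since the stated intent is to prepare a field for keyboard input rather than to click it.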

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pshivapr/selenium-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.