Selenium MCP Server


browser_screenshot

Capture a screenshot of the current browser page. Save to a specified path or return base64 data for further processing.

Instructions

Take a screenshot of the current page

Input Schema


Name	Required	Description	Default
`outputPath`	No	Optional path where to save the screenshot. If not provided, returns base64 data.
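
A call that uses this schema can be sketched as a JSON-RPC `tools/call` request, which is the method MCP defines for invoking tools. The helper name `buildScreenshotCall`, the request `id`, and the example path are hypothetical; only the tool name and the `outputPath` parameter come from the schema above.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a request that invokes browser_screenshot.
function buildScreenshotCall(id: number, outputPath?: string): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'browser_screenshot',
      // Omitting outputPath asks the tool to return base64 data instead of writing a file.
      arguments: outputPath === undefined ? {} : { outputPath },
    },
  };
}

// Example: request a screenshot saved to an illustrative relative path.
console.log(JSON.stringify(buildScreenshotCall(1, './page.png'), null, 2));
```

Leaving `arguments` empty is valid here because `outputPath` is the only parameter and it is optional.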

Implementation Reference

src/tools/actionTools.ts:483-523 (registration)

The tool 'browser_screenshot' is registered using server.tool() with a Zod schema for an optional outputPath parameter, and a handler that delegates to ActionService.takeScreenshot().

```
server.tool(
  'browser_screenshot',
  'Take a screenshot of the current page',
  {
    outputPath: z
      .string()
      .optional()
      .describe('Optional path where to save the screenshot. If not provided, returns base64 data.'),
  },
  async ({ outputPath }) => {
    try {
      const driver = stateManager.getDriver();
      const actionService = new ActionService(driver);
      const screenshot = await actionService.takeScreenshot();

      if (outputPath) {
        const fs = await import('fs/promises');
        await fs.writeFile(outputPath, screenshot, 'base64');
        return {
          content: [{ type: 'text', text: `Screenshot saved to ${outputPath}` }],
        };
      } else {
        return {
          content: [
            { type: 'text', text: 'Screenshot captured as base64:' },
            { type: 'text', text: screenshot },
          ],
        };
      }
    } catch (e) {
      return {
        content: [
          {
            type: 'text',
            text: `Error taking screenshot: ${(e as Error).message}`,
          },
        ],
      };
    }
  }
);
```
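
The `catch` branch above returns the failure as ordinary text content, so a client can detect it only by reading the message. MCP tool results also allow an optional `isError` flag. A minimal sketch of building such a result follows; the `ToolResult` type and the `errorResult` helper are hypothetical names used for illustration, not part of the server's source.

```typescript
// Minimal local types for a text-only tool result; illustrative, not imported from an SDK.
interface TextContent {
  type: 'text';
  text: string;
}

interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

// Hypothetical helper: formats a caught value as an error result and sets isError.
function errorResult(action: string, e: unknown): ToolResult {
  // A thrown value is not guaranteed to be an Error instance, so fall back to String().
  const message = e instanceof Error ? e.message : String(e);
  return {
    content: [{ type: 'text', text: `Error ${action}: ${message}` }],
    isError: true,
  };
}

console.log(errorResult('taking screenshot', new Error('no active session')));
```

Checking `instanceof Error` avoids the `(e as Error).message` cast, which yields `undefined` when something other than an `Error` is thrown.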

src/tools/actionTools.ts:492-522 (handler)

The handler function that executes the screenshot logic: gets the driver from stateManager, creates ActionService, calls takeScreenshot(), then optionally saves to file or returns base64 data.

```
async ({ outputPath }) => {
  try {
    const driver = stateManager.getDriver();
    const actionService = new ActionService(driver);
    const screenshot = await actionService.takeScreenshot();

    if (outputPath) {
      const fs = await import('fs/promises');
      await fs.writeFile(outputPath, screenshot, 'base64');
      return {
        content: [{ type: 'text', text: `Screenshot saved to ${outputPath}` }],
      };
    } else {
      return {
        content: [
          { type: 'text', text: 'Screenshot captured as base64:' },
          { type: 'text', text: screenshot },
        ],
      };
    }
  } catch (e) {
    return {
      content: [
        {
          type: 'text',
          text: `Error taking screenshot: ${(e as Error).message}`,
        },
      ],
    };
  }
}
```
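
Because the handler reports all three outcomes (saved, base64, error) as text items, a caller has to tell them apart by their wording. The sketch below shows one way to do that on the client side. It assumes the message strings stay exactly as they appear in the handler; `ScreenshotOutcome` and `interpretScreenshotResult` are hypothetical names.

```typescript
interface TextContent {
  type: 'text';
  text: string;
}

type ScreenshotOutcome =
  | { kind: 'saved'; path: string }
  | { kind: 'base64'; data: string }
  | { kind: 'error'; message: string }
  | { kind: 'unknown' };

// Message fragments copied from the handler shown above.
const SAVED_PREFIX = 'Screenshot saved to ';
const BASE64_LABEL = 'Screenshot captured as base64:';
const ERROR_PREFIX = 'Error taking screenshot: ';

// Hypothetical client-side helper: classifies the tool's content array.
function interpretScreenshotResult(content: TextContent[]): ScreenshotOutcome {
  const first = content[0]?.text ?? '';
  const second = content[1];
  if (first.startsWith(ERROR_PREFIX)) {
    return { kind: 'error', message: first.slice(ERROR_PREFIX.length) };
  }
  if (first.startsWith(SAVED_PREFIX)) {
    return { kind: 'saved', path: first.slice(SAVED_PREFIX.length) };
  }
  // In the base64 case the label comes first and the data is the second item.
  if (first === BASE64_LABEL && second !== undefined) {
    return { kind: 'base64', data: second.text };
  }
  return { kind: 'unknown' };
}
```

String matching of this kind is brittle: any change to the handler's wording would break it, which is part of why the absence of an output schema matters.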

src/tools/actionTools.ts:486-491 (schema)

The Zod validation schema for the tool: an optional 'outputPath' string parameter.

```
{
  outputPath: z
    .string()
    .optional()
    .describe('Optional path where to save the screenshot. If not provided, returns base64 data.'),
},
```

src/services/actionService.ts:125-127 (helper)

The takeScreenshot() helper method in ActionService, which delegates to the Selenium WebDriver's takeScreenshot() method and returns a base64-encoded string.

```
async takeScreenshot(): Promise<string> {
  return this.driver.takeScreenshot();
}
```
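
The base64 string returned by this helper can be decoded and inspected before use. The sketch below assumes the driver follows the W3C WebDriver specification, under which a screenshot is a base64-encoded PNG; `readPngSize` is a hypothetical helper name. It checks the PNG signature and reads the image dimensions from the IHDR chunk.

```typescript
import { Buffer } from 'node:buffer';

// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: decodes base64 and reads width/height from the PNG header.
function readPngSize(base64: string): { width: number; height: number } {
  const bytes = Buffer.from(base64, 'base64');
  // 8 signature bytes + 4 length + 4 chunk type + 4 width + 4 height = 24 bytes.
  if (bytes.length < 24 || !bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error('Not a PNG image');
  }
  // IHDR is the first chunk; width and height are big-endian uint32 at offsets 16 and 20.
  return { width: bytes.readUInt32BE(16), height: bytes.readUInt32BE(20) };
}
```

A check like this lets a caller confirm that the returned data is an image, and learn its pixel size, without writing it to disk.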

Tool Definition Quality

Grade C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It does not specify behavioral details like whether the screenshot captures the visible viewport or the entire page, file format, or scrolling behavior. This is a significant gap for an agent invoking the tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no unnecessary words. It is front-loaded with the action. However, it may be too minimal, lacking details that could be added without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one optional parameter, the description is adequate but not complete. It does not clarify the output format (what is returned when outputPath is omitted) or any side effects, such as writing a file to disk. The lack of an output schema increases the need for a complete description, and that need is not fully met.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with one parameter (outputPath) fully described in the schema. The description adds no additional meaning beyond what the schema already provides. Baseline score of 3 is appropriate since the schema covers the parameter adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Take a screenshot' and the resource 'current page', making the action obvious. That is enough to distinguish the tool from siblings such as browser_click or browser_navigate, but the description does not explicitly differentiate it from other tools that might capture page content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No when-to-use or when-not-to-use guidance is provided. The description does not mention alternatives or prerequisites. For a straightforward tool, minimal guidance is acceptable, but some context about capturing visible area vs. full page would help.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

