Glama

browser_screenshot

Capture webpage screenshots for documentation, testing, or debugging using Selenium WebDriver automation.

Instructions

Take a screenshot of the current page

Input Schema

Name | Required | Description | Default
outputPath | No | Optional path where to save the screenshot. If not provided, returns base64 data. | (none)
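As a rough illustration of how the single optional parameter is supplied, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP clients send. The helper name `buildScreenshotCall`, the request ids, and the path `./page.png` are hypothetical and chosen only for this example; most client SDKs construct this envelope for you.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds the request for browser_screenshot.
function buildScreenshotCall(id: number, outputPath?: string): ToolCallRequest {
  // Leave outputPath out entirely when unset, so the tool returns base64 data.
  const args: Record<string, unknown> = outputPath ? { outputPath } : {};
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'browser_screenshot', arguments: args },
  };
}

// No path: the response is expected to carry the base64 string.
const inlineCall = buildScreenshotCall(1);
// With a path (illustrative value): the server writes the file itself.
const savedCall = buildScreenshotCall(2, './page.png');
```

Note that `outputPath` is resolved on the machine running the MCP server, not on the client, since the handler writes the file with `fs/promises`.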

Implementation Reference

  • MCP server.tool registration and inline handler for 'browser_screenshot'. Defines input schema, executes screenshot capture via ActionService, and returns either file save confirmation or base64 image data in the tool response.
    server.tool(
      'browser_screenshot',
      'Take a screenshot of the current page',
      {
        outputPath: z
          .string()
          .optional()
          .describe('Optional path where to save the screenshot. If not provided, returns base64 data.'),
      },
      async ({ outputPath }) => {
        try {
          const driver = stateManager.getDriver();
          const actionService = new ActionService(driver);
          const screenshot = await actionService.takeScreenshot();
    
          if (outputPath) {
            const fs = await import('fs/promises');
            await fs.writeFile(outputPath, screenshot, 'base64');
            return {
              content: [{ type: 'text', text: `Screenshot saved to ${outputPath}` }],
            };
          } else {
            return {
              content: [
                { type: 'text', text: 'Screenshot captured as base64:' },
                { type: 'text', text: screenshot },
              ],
            };
          }
        } catch (e) {
          return {
            content: [
              {
                type: 'text',
                text: `Error taking screenshot: ${(e as Error).message}`,
              },
            ],
          };
        }
      }
    );
  • Zod input schema for the browser_screenshot tool: optional outputPath string.
    {
      outputPath: z
        .string()
        .optional()
        .describe('Optional path where to save the screenshot. If not provided, returns base64 data.'),
    },
  • Core screenshot capture method in ActionService using Selenium WebDriver's driver.takeScreenshot().
    async takeScreenshot(): Promise<string> {
      return this.driver.takeScreenshot();
    }
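Because the handler above returns every outcome (saved file, base64 data, or error) as plain text items, a caller has to tell them apart by inspecting the text. The following is a minimal sketch of one way to do that, assuming the message strings stay exactly as shown in the handler; `interpretScreenshotResult` and the `ScreenshotResult` type are hypothetical names, not part of the server.

```typescript
// Text content item, matching what the handler returns.
interface TextContent {
  type: 'text';
  text: string;
}

// Hypothetical discriminated union describing the three outcomes.
type ScreenshotResult =
  | { kind: 'saved'; path: string }
  | { kind: 'base64'; data: string }
  | { kind: 'error'; message: string };

// Prefixes copied from the handler's template strings.
const SAVED_PREFIX = 'Screenshot saved to ';
const ERROR_PREFIX = 'Error taking screenshot: ';
const BASE64_MARKER = 'Screenshot captured as base64:';

function interpretScreenshotResult(content: TextContent[]): ScreenshotResult {
  const [first, second] = content;
  const head = first?.text ?? '';
  if (head.startsWith(ERROR_PREFIX)) {
    return { kind: 'error', message: head.slice(ERROR_PREFIX.length) };
  }
  if (head.startsWith(SAVED_PREFIX)) {
    return { kind: 'saved', path: head.slice(SAVED_PREFIX.length) };
  }
  if (head === BASE64_MARKER && second !== undefined) {
    // The second item carries the raw base64 string.
    return { kind: 'base64', data: second.text };
  }
  return { kind: 'error', message: `Unrecognized response: ${head}` };
}
```

String matching like this is brittle; it is shown only because the handler as listed does not set an error flag or return a typed image item.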
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of disclosure, yet it states only the basic action. It does not say whether the tool captures the full page or only the viewport, what file format it produces (e.g., PNG), what permissions it needs, whether it has side effects (e.g., pausing execution), or how it fails (e.g., if no page is open).
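On the file-format point: the WebDriver Take Screenshot command is specified to return a base64-encoded PNG of the viewport, so a caller can at least sanity-check the data it receives. The sketch below relies on the fact that the fixed 8-byte PNG signature always encodes to the base64 prefix `iVBORw0KGg`; the function names are hypothetical, and whether this particular server ever returns anything else is not stated in the listing.

```typescript
// Base64 encoding of the first bytes of the PNG signature
// (0x89 'P' 'N' 'G' \r \n 0x1A \n); the first 10 characters are fixed.
const PNG_BASE64_PREFIX = 'iVBORw0KGg';

// Returns true when the string plausibly holds base64-encoded PNG data.
function looksLikePngBase64(data: string): boolean {
  return data.startsWith(PNG_BASE64_PREFIX);
}

// Wraps the data in a data URL, e.g. for embedding in an HTML report.
function toPngDataUrl(data: string): string {
  if (!looksLikePngBase64(data)) {
    throw new Error('Data does not start with the PNG signature');
  }
  return `data:image/png;base64,${data}`;
}
```

This only inspects the leading characters; it does not validate the rest of the image.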

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero wasted words. It's front-loaded with the core action and efficiently communicates the essential purpose without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that performs a potentially complex operation. It lacks details on return values (e.g., base64 data structure), behavioral constraints, or error handling, which are crucial for an agent to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the single parameter 'outputPath' fully documented in the schema. The description adds no parameter-specific information beyond what the schema provides, so it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Take a screenshot') and the target ('of the current page'), which is a specific verb+resource combination. However, it doesn't explicitly differentiate from potential screenshot-related siblings (though none exist in the provided list), so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., requires an open browser page), exclusions, or comparisons to other tools in the browser suite, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pshivapr/selenium-mcp'