Browserbase MCP Server

by MesuterPikin

browserbase_screenshot

Capture full-page screenshots of websites for documentation, testing, or analysis purposes. This tool saves screenshots as resources for later reference.

Instructions

Capture a full-page screenshot and return it (and save as a resource).

Input Schema

Name | Required | Description | Default
name | No | The name of the screenshot | (none)
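From a client's side, invoking the tool is an ordinary MCP `tools/call` request whose `arguments` object matches the schema above. The sketch below builds that JSON-RPC message by hand to show its shape; `buildScreenshotCall` is a hypothetical helper, and a real client would normally send the request through an MCP SDK rather than assembling it manually.

```typescript
// Hypothetical helper: builds a JSON-RPC 2.0 `tools/call` request for the
// browserbase_screenshot tool. The schema's only argument is an optional `name`.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { name?: string };
  };
}

function buildScreenshotCall(id: number, screenshotName?: string): ToolsCallRequest {
  // Leave `name` out entirely when not provided so the server falls back to
  // its timestamp-based default.
  const args: { name?: string } =
    screenshotName === undefined ? {} : { name: screenshotName };
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "browserbase_screenshot", arguments: args },
  };
}

// Example: request a screenshot labelled "checkout-page".
console.log(JSON.stringify(buildScreenshotCall(1, "checkout-page")));
```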

Implementation Reference

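  • The core handler function `handleScreenshot` for the `browserbase_screenshot` tool. It takes the first page in the Stagehand browser context (`stagehand.context.pages()[0]`) and captures it as a PNG with the Chrome DevTools Protocol command `Page.captureScreenshot`. The call passes neither `clip` nor `captureBeyondViewport`, so, despite the in-code comment, CDP captures the current viewport rather than the full scrollable page. If either edge exceeds 1568 px or the image exceeds 1.15 MP (computed as 1.15 × 1024 × 1024 pixels), the handler downscales it with Sharp to fit Claude's vision API limits. It then generates a timestamped name (`screenshot-<name>-<timestamp>`, or `screenshot-<timestamp>` followed directly by the Browserbase project ID when no name is given), registers the base64 image as an MCP resource for the current session via `registerScreenshot`, sends a `notifications/resources/list_changed` notification to the client, and returns text and image content from its `action` callback, with `waitForNetwork` set to `false`.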
    async function handleScreenshot(
      context: Context,
      params: ScreenshotInput,
    ): Promise<ToolResult> {
      const action = async (): Promise<ToolActionResult> => {
        try {
          const stagehand = await context.getStagehand();
          const page = stagehand.context.pages()[0];
    
          if (!page) {
            throw new Error("No active page available");
          }
    
          // We're taking a full page screenshot to give context of the entire page, similar to a snapshot
          // Enable Page domain if needed
          await page.sendCDP("Page.enable");
    
          // Use CDP to capture screenshot
          const { data } = await page.sendCDP<{ data: string }>(
            "Page.captureScreenshot",
            {
              format: "png",
              fromSurface: true,
            },
          );
    
          // data is already base64 string from CDP
          let screenshotBase64 = data;
    
          // Scale down image if needed for Claude's vision API
          // Claude constraints: max 1568px on any edge AND max 1.15 megapixels
          // Reference: https://docs.anthropic.com/en/docs/build-with-claude/vision#evaluate-image-size
          const imageBuffer = Buffer.from(data, "base64");
          const metadata = await sharp(imageBuffer).metadata();
    
          if (metadata.width && metadata.height) {
            const pixels = metadata.width * metadata.height;
    
            // Min of: width constraint, height constraint, and megapixel constraint
            const shrink = Math.min(
              1568 / metadata.width,
              1568 / metadata.height,
              Math.sqrt((1.15 * 1024 * 1024) / pixels),
            );
    
            // Only resize if we need to shrink (shrink < 1)
            if (shrink < 1) {
              const newWidth = Math.floor(metadata.width * shrink);
              const newHeight = Math.floor(metadata.height * shrink);
    
              process.stderr.write(
                `[Screenshot] Scaling image from ${metadata.width}x${metadata.height} (${(pixels / (1024 * 1024)).toFixed(2)}MP) to ${newWidth}x${newHeight} (${((newWidth * newHeight) / (1024 * 1024)).toFixed(2)}MP) for Claude vision API\n`,
              );
    
              const resizedBuffer = await sharp(imageBuffer)
                .resize(newWidth, newHeight, {
                  fit: "inside",
                  withoutEnlargement: true,
                })
                .png()
                .toBuffer();
    
              screenshotBase64 = resizedBuffer.toString("base64");
            }
          }
          const name = params.name
            ? `screenshot-${params.name}-${new Date()
                .toISOString()
                .replace(/:/g, "-")}`
            : `screenshot-${new Date().toISOString().replace(/:/g, "-")}` +
              context.config.browserbaseProjectId;
    
          // Associate with current mcp session id and store in memory /src/mcp/resources.ts
          const sessionId = context.currentSessionId;
          registerScreenshot(sessionId, name, screenshotBase64);
    
          // Notify the client that the resources changed
          const serverInstance = context.getServer();
    
          if (serverInstance) {
            serverInstance.notification({
              method: "notifications/resources/list_changed",
            });
          }
    
          return {
            content: [
              {
                type: "text",
                text: `Screenshot taken with name: ${name}`,
              },
              {
                type: "image",
                data: screenshotBase64,
                mimeType: "image/png",
              },
            ],
          };
        } catch (error) {
          const errorMsg = error instanceof Error ? error.message : String(error);
          throw new Error(`Failed to take screenshot: ${errorMsg}`);
        }
      };
    
      return {
        action,
        waitForNetwork: false,
      };
    }
  • Defines the input schema `ScreenshotInputSchema` (optional `name` string) using Zod and the `screenshotSchema` ToolSchema object with name `"browserbase_screenshot"`, description, and inputSchema.
    const ScreenshotInputSchema = z.object({
      name: z.string().optional().describe("The name of the screenshot"),
    });
    
    type ScreenshotInput = z.infer<typeof ScreenshotInputSchema>;
    
    const screenshotSchema: ToolSchema<typeof ScreenshotInputSchema> = {
      name: "browserbase_screenshot",
      description: `Capture a full-page screenshot and return it (and save as a resource).`,
      inputSchema: ScreenshotInputSchema,
    };
  • Creates and exports the `screenshotTool` Tool object, combining the `screenshotSchema` and `handleScreenshot` function for use in the tools index.
    const screenshotTool: Tool<typeof ScreenshotInputSchema> = {
      capability: "core",
      schema: screenshotSchema,
      handle: handleScreenshot,
    };
    
    export default screenshotTool;
  • Includes `screenshotTool` in the `TOOLS` array (imported at line 5), which is consumed by the MCP server to register all tools.
    export const TOOLS = [
      ...sessionTools,
      navigateTool,
      actTool,
      extractTool,
      observeTool,
      screenshotTool,
      getUrlTool,
      agentTool,
    ];
  • src/index.ts:168-198 (registration)
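    Registers every tool in `TOOLS` (including `browserbase_screenshot`) with the MCP server via `server.tool()`, passing the tool's name, description, and Zod input shape. The wrapper handler calls `context.run(tool, params)`; on failure it logs the error to stderr and rethrows it with the tool name attached. Tools whose input schema is not a `ZodObject` are skipped with a console warning.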
    const tools: MCPToolsArray = [...TOOLS];
    
    // Register each tool with the Smithery server
    tools.forEach((tool) => {
      if (tool.schema.inputSchema instanceof z.ZodObject) {
        server.tool(
          tool.schema.name,
          tool.schema.description,
          tool.schema.inputSchema.shape,
          async (params: z.infer<typeof tool.schema.inputSchema>) => {
            try {
              const result = await context.run(tool, params);
              return result;
            } catch (error) {
              const errorMessage =
                error instanceof Error ? error.message : String(error);
              process.stderr.write(
                `[Smithery Error] ${new Date().toISOString()} Error running tool ${tool.schema.name}: ${errorMessage}\n`,
              );
              throw new Error(
                `Failed to run tool '${tool.schema.name}': ${errorMessage}`,
              );
            }
          },
        );
      } else {
        console.warn(
          `Tool "${tool.schema.name}" has an input schema that is not a ZodObject. Schema type: ${tool.schema.inputSchema.constructor.name}`,
        );
      }
    });
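In `handleScreenshot`, the resize decision reduces to a single scale factor: the smallest of the two edge ratios and the square root of the pixel-budget ratio, applied only when it is below 1. The standalone function below restates that arithmetic so it can be checked without Sharp or a browser. `fitForVision` and its return shape are illustrative names, not part of the repository; the constants are copied from the handler.

```typescript
// Limits copied from handleScreenshot: 1568 px per edge and a pixel budget of
// 1.15 * 1024 * 1024 (about 1.21 million pixels).
const MAX_EDGE = 1568;
const MAX_PIXELS = 1.15 * 1024 * 1024;

interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: returns the size the handler would resize to, or the
// original size when no shrinking is needed.
function fitForVision({ width, height }: Size): Size {
  const pixels = width * height;
  const shrink = Math.min(
    MAX_EDGE / width,
    MAX_EDGE / height,
    // Scaling both edges by s scales the area by s^2, hence the square root.
    Math.sqrt(MAX_PIXELS / pixels),
  );
  if (shrink >= 1) {
    return { width, height };
  }
  // Flooring rounds down, so both limits still hold after rounding.
  return {
    width: Math.floor(width * shrink),
    height: Math.floor(height * shrink),
  };
}

console.log(fitForVision({ width: 1920, height: 1080 })); // pixel budget binds
console.log(fitForVision({ width: 1280, height: 8000 })); // edge limit binds
console.log(fitForVision({ width: 800, height: 600 })); // returned unchanged
```

For a 1920 x 1080 capture the pixel budget is the binding constraint, while for a tall capture such as 1280 x 8000 the 1568 px edge limit dominates and the width shrinks in proportion, so very long images come back narrow.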
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions saving as a resource, which adds some context beyond the basic action, but fails to detail critical aspects like permissions needed, rate limits, error conditions, or what 'full-page' entails (e.g., scrolling behavior). This leaves significant gaps for an agent to understand operational traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
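On the "full-page" point specifically: in the Chrome DevTools Protocol, `Page.captureScreenshot` captures the current viewport unless it is given a `clip` region, and capturing content outside the viewport also requires `captureBeyondViewport: true`. The handler shown earlier sets neither. One common approach is to read the scrollable content size from `Page.getLayoutMetrics` first and clip to it. The sketch below assumes a generic `send(method, params)` CDP function (a hypothetical stand-in; with Stagehand it would forward to `page.sendCDP`) and describes an alternative, not how this server currently behaves.

```typescript
// Hypothetical CDP transport: forwards a method name and params to a CDP session.
type SendCdp = <T>(method: string, params?: Record<string, unknown>) => Promise<T>;

interface CssRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Captures the whole scrollable page as a base64 PNG string.
async function captureFullPage(send: SendCdp): Promise<string> {
  // cssContentSize is the size of the page's scrollable content in CSS pixels.
  const { cssContentSize } = await send<{ cssContentSize: CssRect }>(
    "Page.getLayoutMetrics",
  );
  const { data } = await send<{ data: string }>("Page.captureScreenshot", {
    format: "png",
    captureBeyondViewport: true,
    clip: {
      x: 0,
      y: 0,
      // Round up so fractional content sizes are not cropped.
      width: Math.ceil(cssContentSize.width),
      height: Math.ceil(cssContentSize.height),
      scale: 1,
    },
  });
  return data;
}
```

A capture made this way can be far taller than the viewport, so it would still need the same downscaling step before being sent to a vision model.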

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—a single sentence that directly states the tool's function and an additional behavior (saving as a resource). It's front-loaded with the core action and wastes no words, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving browser interaction and resource management) and the absence of annotations and output schema, the description is insufficient. It doesn't explain what 'full-page' means, how the screenshot is returned (e.g., format, size), or error handling, leaving the agent with incomplete operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
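For reference, the handler returns two content items: a text item naming the screenshot and an image item carrying base64 PNG data with `mimeType: "image/png"`. A client-side sketch for pulling the image back out might look like the following; `ContentItem` and `extractScreenshot` are hypothetical names, and the PNG check relies only on the fact that base64-encoded PNG data begins with `iVBORw0KGgo`, the base64 form of the PNG file signature.

```typescript
// Hypothetical content-item shapes matching what handleScreenshot returns.
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface Screenshot {
  label: string | undefined;
  base64Png: string;
}

// Base64 of the PNG file signature; base64-encoded PNG data starts with this.
const PNG_BASE64_PREFIX = "iVBORw0KGgo";

function extractScreenshot(content: ContentItem[]): Screenshot {
  const image = content.find(
    (item): item is Extract<ContentItem, { type: "image" }> => item.type === "image",
  );
  if (!image || image.mimeType !== "image/png") {
    throw new Error("No PNG image in tool result");
  }
  if (!image.data.startsWith(PNG_BASE64_PREFIX)) {
    throw new Error("Image data is not base64-encoded PNG");
  }
  const text = content.find(
    (item): item is Extract<ContentItem, { type: "text" }> => item.type === "text",
  );
  // The handler's text item reads "Screenshot taken with name: <name>".
  const label = text?.text.replace(/^Screenshot taken with name: /, "");
  return { label, base64Png: image.data };
}
```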

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its single parameter 'name', so the description doesn't need to add parameter details. It doesn't provide extra meaning beyond the schema, but with high coverage, a baseline score of 3 is appropriate as the schema handles the parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
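The `name` parameter only affects the identifier under which the screenshot is stored. Restated as a pure function, with the clock and project ID passed in so the output is predictable, the naming rule in `handleScreenshot` looks like this; `screenshotName` is a hypothetical name. As in the handler, an empty string is treated the same as an omitted name, and in that case the project ID is appended directly after the timestamp with no separator.

```typescript
// Mirrors the naming expression in handleScreenshot. Colons are replaced
// because ISO timestamps contain them (e.g. "03:04:05").
function screenshotName(now: Date, projectId: string, name?: string): string {
  const stamp = now.toISOString().replace(/:/g, "-");
  return name
    ? `screenshot-${name}-${stamp}`
    : `screenshot-${stamp}` + projectId;
}

const at = new Date("2024-01-02T03:04:05.000Z");
console.log(screenshotName(at, "proj123", "home"));
// screenshot-home-2024-01-02T03-04-05.000Z
console.log(screenshotName(at, "proj123"));
// screenshot-2024-01-02T03-04-05.000Zproj123
```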

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('capture a full-page screenshot') and the resource (screenshot), with the verb 'capture' being specific. However, it doesn't explicitly differentiate from sibling tools like 'browserbase_stagehand_observe' which might also involve visual capture, leaving room for ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as when a full-page screenshot is needed over other capture methods or sibling tools. It lacks context about prerequisites or exclusions, offering only a basic functional statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MesuterPikin/mcp-server-browserbase'
