Playwright Browserbase MCP Server

by ampcome-mcps

browserbase_screenshot

Capture browser page screenshots to verify navigation and content when automated controls are insufficient for web interaction tasks.

Instructions

Takes a screenshot of the current page. Use this tool to learn where you are on the page when controlling the browser with Stagehand. Only use this tool when the other tools are not sufficient to get the information you need.

Input Schema

Name    Required    Description                   Default
name    No          The name of the screenshot
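
To make the input schema concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client might send to invoke this tool through the standard `tools/call` method. The helper `buildScreenshotCall` and the example screenshot name are hypothetical and only illustrate how the optional `name` parameter would be passed; they are not part of the server's code.

```typescript
// JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not part of the server): builds a call to
// browserbase_screenshot. `screenshotName` maps to the optional `name` input.
function buildScreenshotCall(id: number, screenshotName?: string): ToolCallRequest {
  const args: Record<string, unknown> = {};
  if (screenshotName !== undefined) {
    args.name = screenshotName; // omitted entirely when no name is given
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "browserbase_screenshot", arguments: args },
  };
}

// Example: request a screenshot labelled "checkout-page" (illustrative name).
const request = buildScreenshotCall(1, "checkout-page");
console.log(JSON.stringify(request, null, 2));
```

Because `name` is not required, calling the helper without a second argument yields an empty `arguments` object, which should still be a valid call under the schema shown here.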

Implementation Reference

  • The core implementation of the 'browserbase_take_screenshot' tool. The 'handle' function captures a screenshot of the current viewport or of a specific element (located via its ref from the page snapshot), writes it to a file, and returns it as base64-encoded image content unless 'omitBase64' is set in the config. The format is PNG when raw=true and JPEG (quality 50) otherwise.
    const screenshot = defineTool<typeof screenshotSchema>({
      capability: "core",
      schema: {
        name: "browserbase_take_screenshot",
        description: `Take a screenshot of the current page or element using ref.`,
        inputSchema: screenshotSchema,
      },
      handle: async (
        context: Context,
        params: ScreenshotInput
      ): Promise<ToolResult> => {
        if (!!params.element !== !!params.ref) {
          throw new Error("Both element and ref must be provided or neither.");
        }
    
        const page = await context.getActivePage();
        if (!page) {
          throw new Error("No active page found for screenshot");
        }
        // Conditionally get snapshot only if ref is provided
        let pageSnapshot: PageSnapshot | null = null;
        if (params.ref) {
          pageSnapshot = context.snapshotOrDie();
        }
        const fileType = params.raw ? "png" : "jpeg";
        const fileName = await outputFile(
          context.config,
          `screenshot-${Date.now()}.${fileType}`
        );
    
        const baseOptions: PageScreenshotOptions = {
          scale: "css",
          timeout: 15000, // Kept existing timeout
        };
    
        let options: PageScreenshotOptions;
    
        if (fileType === "jpeg") {
          options = {
            ...baseOptions,
            type: "jpeg",
            quality: 50, // Quality is only for jpeg
            path: fileName,
          };
        } else {
          options = {
            ...baseOptions,
            type: "png",
            path: fileName,
          };
        }
    
        const isElementScreenshot = params.element && params.ref;
        const code: string[] = [];
        code.push(
          `// Screenshot ${
            isElementScreenshot ? params.element : "viewport"
          } and save it as ${fileName}`
        );
    
        // Conditionally get locator only if ref and snapshot are available
        const locator =
          params.ref && pageSnapshot ? pageSnapshot.refLocator(params.ref) : null;
    
        // Use JSON.stringify for code generation as javascript.formatObject is not available
        const optionsForCode = { ...options };
        // delete optionsForCode.path; // Path is an internal detail for saving, not usually part of the "command" log
    
        if (locator) {
          code.push(
            `// await page.${await generateLocator(
              locator
            )}.screenshot(${JSON.stringify(optionsForCode)});`
          );
        } else {
          code.push(`// await page.screenshot(${JSON.stringify(optionsForCode)});`);
        }
    
        const action = async (): Promise<ToolActionResult> => {
          // Access config via context.config
          const includeBase64 =
            !context.config.tools?.browserbase_take_screenshot?.omitBase64;
    
          // Use the page directly for viewport screenshots if locator is null
          const screenshotBuffer = locator
            ? await locator.screenshot(options)
            : await page.screenshot(options);
    
          if (includeBase64) {
            const rawBase64 = screenshotBuffer.toString("base64");
            return {
              content: [
                {
                  type: "image",
                  format: fileType, // format might be redundant if mimeType is present, but kept for now
                  mimeType: fileType === "png" ? `image/png` : `image/jpeg`,
                  data: rawBase64,
                },
              ],
            };
          } else {
            // If base64 is not included, return an empty content array
            return { content: [] };
          }
        };
    
        return {
          code,
          action,
          captureSnapshot: true, 
          waitForNetwork: false, 
        };
      },
    });
  • Zod schema defining the input parameters for the browserbase_take_screenshot tool: optional 'raw' for format, and optional 'element'/'ref' for element-specific screenshot.
    const screenshotSchema = z.object({
      raw: z
        .boolean()
        .optional()
        .describe(
          "Whether to return without compression (PNG). Default is false (JPEG)."
        ),
      element: z
        .string()
        .optional()
        .describe("Human-readable element description."),
      ref: z
        .string()
        .optional()
        .describe("Exact target element reference from the page snapshot.")
    });
    
    type ScreenshotInput = z.infer<typeof screenshotSchema>;
  • src/index.ts:73-108 (registration)
    The browserbase_take_screenshot tool (imported from snapshot.ts) is included in the tools array and registered with the MCP server via server.tool() in the forEach loop.
    const tools: Tool<any>[] = [
      ...common,
      ...snapshot,
      ...keyboard,
      ...getText,
      ...navigate,
      ...session,
      ...contextTools,
    ];
    
    // Register each tool with the Smithery server
    tools.forEach(tool => {
      if (tool.schema.inputSchema instanceof z.ZodObject) {
        server.tool(
          tool.schema.name,
          tool.schema.description,
          tool.schema.inputSchema.shape,
          async (params: z.infer<typeof tool.schema.inputSchema>) => {
            try {
              const result = await context.run(tool, params);
              return result;
            } catch (error) {
              const errorMessage = error instanceof Error ? error.message : String(error);
              process.stderr.write(`[Smithery Error] ${new Date().toISOString()} Error running tool ${tool.schema.name}: ${errorMessage}\n`);
              throw new Error(`Failed to run tool '${tool.schema.name}': ${errorMessage}`);
            }
          }
        );
      } else {
        console.warn(
          `Tool "${tool.schema.name}" has an input schema that is not a ZodObject. Schema type: ${tool.schema.inputSchema.constructor.name}`
        );
      }
    });
    
    return server.server;
  • Configuration schema option for the tool, which allows base64 image data to be omitted from responses.
    tools: z.object({
      browserbase_take_screenshot: z.object({
        omitBase64: z.boolean().optional().describe("Whether to disable base64-encoded image responses")
      }).optional()
    }).optional()
  • TypeScript type definition for the tool's configuration in Config interface.
    browserbase_take_screenshot?: {
        /**
         * Whether to disable base64-encoded image responses to the clients that
         * don't support binary data or prefer to save on tokens.
        */
        omitBase64?: boolean;
    }
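
The decision logic in the excerpts above (the element/ref pairing check, the raw flag selecting PNG or JPEG, and the omitBase64 switch) can be isolated into a small dependency-free sketch. This is an illustration under assumptions, not the server's actual module layout: the function names `assertElementRefPair`, `buildOptions`, and `toContent` are hypothetical, the types are trimmed stand-ins for the Playwright and MCP types, and the screenshot data is passed in as an already-encoded base64 string rather than captured from a page.

```typescript
type FileType = "png" | "jpeg";

interface ScreenshotParams {
  raw?: boolean;
  element?: string;
  ref?: string;
}

// Trimmed stand-in for the screenshot options used in the listing.
interface ScreenshotOptions {
  scale: "css";
  timeout: number;
  type: FileType;
  quality?: number;
}

interface ImageContent {
  type: "image";
  mimeType: string;
  data: string;
}

// element and ref must be supplied together or not at all.
function assertElementRefPair(params: ScreenshotParams): void {
  if (!!params.element !== !!params.ref) {
    throw new Error("Both element and ref must be provided or neither.");
  }
}

// raw=true selects PNG; otherwise JPEG at quality 50, as in the listing.
function buildOptions(params: ScreenshotParams): ScreenshotOptions {
  if (params.raw) {
    return { scale: "css", timeout: 15000, type: "png" };
  }
  return { scale: "css", timeout: 15000, type: "jpeg", quality: 50 };
}

// Returns an empty content array when omitBase64 is set.
function toContent(base64: string, fileType: FileType, omitBase64: boolean): ImageContent[] {
  if (omitBase64) {
    return [];
  }
  return [{ type: "image", mimeType: `image/${fileType}`, data: base64 }];
}

// Example: a raw (PNG) viewport screenshot with base64 output enabled.
const exampleParams: ScreenshotParams = { raw: true };
assertElementRefPair(exampleParams);
const exampleOptions = buildOptions(exampleParams);
console.log(exampleOptions.type, toContent("AAAA", exampleOptions.type, false).length);
```

Keeping these three decisions as pure functions makes them easy to unit test without a browser session; the real handler additionally resolves a locator and writes the file to disk.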
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the tool's purpose and usage context but lacks details on behavioral traits such as permissions needed, rate limits, file output format, or error conditions. However, it does add value by explaining the situational context for use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by two concise sentences providing usage guidelines. Every sentence adds value without redundancy, making it efficiently structured and appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (1 optional parameter, no output schema, no annotations), the description is mostly complete. It covers purpose and usage well but lacks details on behavioral aspects like output format or errors. However, for a simple screenshot tool, this is sufficient, though not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter with 100% description coverage, so the baseline is 3. The description does not mention the 'name' parameter, but because no parameters are required and the schema fully documents it, this is acceptable. The description adds no parameter semantics, but the low parameter count and high schema coverage justify a score above baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Takes a screenshot') and resource ('of the current page'), distinguishing it from sibling tools like navigation, extraction, or session management tools. It precisely defines what the tool does without ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('to learn where you are on the page when controlling the browser with Stagehand') and when not to use it ('Only use this tool when the other tools are not sufficient to get the information you need'), clearly differentiating it from alternatives among the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ampcome-mcps/browserbase-mcp'
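
The same request can be sketched in TypeScript using the global `fetch` available in recent Node.js versions and browsers. The helper names `buildServerUrl` and `fetchServerInfo` are hypothetical, and the sketch assumes the endpoint returns JSON; the response shape is not documented here, so it is typed as `unknown`.

```typescript
// Base path taken from the curl example above.
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the server URL from owner and server slug.
function buildServerUrl(owner: string, server: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(server)}`;
}

// Hypothetical helper: performs the GET request and parses the body.
// Assumption: the API responds with JSON.
async function fetchServerInfo(owner: string, server: string): Promise<unknown> {
  const response = await fetch(buildServerUrl(owner, server), { method: "GET" });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request, so it is left commented out):
// fetchServerInfo("ampcome-mcps", "browserbase-mcp").then(console.log);
```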

If you have feedback or need assistance with the MCP directory API, please join our Discord server.