Winds-AI

autonomous-frontend-browser-tools

browser.screenshot

Capture screenshots of browser tabs for frontend development. Saves images to structured paths and returns them for analysis. Requires a connection to the browser extension with DevTools open.

Instructions

Capture current browser tab; saves to structured path and returns image. Requires extension connection with DevTools open.

Input Schema

Name          Required  Description           Default
randomString  Yes       any string (ignored)  (none)
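
The single parameter is a placeholder: the schema requires it, but the registered handler takes no arguments and never reads it. A minimal sketch of what a client call might look like, assuming the standard JSON-RPC `tools/call` envelope from the MCP specification. The `buildScreenshotCall` helper and the `"unused"` value are hypothetical and not part of this server.

```typescript
// Hypothetical helper: builds a JSON-RPC 2.0 `tools/call` request for browser.screenshot.
// The envelope follows the MCP specification; the helper itself is illustrative only.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildScreenshotCall(id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "browser.screenshot",
      // The schema requires `randomString`, but the handler ignores it,
      // so any string satisfies validation.
      arguments: { randomString: "unused" },
    },
  };
}

const request = buildScreenshotCall(1);
console.log(JSON.stringify(request, null, 2));
```

Any string works for `randomString`; the value has no effect on which tab is captured or where the file is saved.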

Implementation Reference

  • MCP tool registration for 'browser.screenshot' including schema (zod validation for dummy param) and handler reference.
    // New name
    server.tool(
      "browser.screenshot",
      "Capture current browser tab; saves to structured path and returns image. Requires extension connection with DevTools open.",
      { randomString: z.string().describe("any string (ignored)") },
      handleCaptureBrowserScreenshot
    );
  • MCP handler function that proxies screenshot request to browser-tools-server /capture-screenshot endpoint and returns formatted response with image data.
    async function handleCaptureBrowserScreenshot() {
      return await withServerConnection(async () => {
        try {
          const targetUrl = `http://${discoveredHost}:${discoveredPort}/capture-screenshot`;
          const activeProjectName = getActiveProjectName();
          const requestPayload = {
            returnImageData: true, // Always return image data
            projectName: activeProjectName, // Pass active project name
          };
    
          const response = await fetch(targetUrl, {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
            },
            body: JSON.stringify(requestPayload),
          });
    
          const result = await response.json();
    
          if (response.ok) {
            const responseContent: any[] = [
              {
                type: "text",
                text: `📁 Project: ${
                  result.projectDirectory || "default-project"
                }\n📌 Now Analyze the UI Layout and its structure properly given the task at hand, then continue`,
              },
            ];
    
            responseContent.push({
              type: "image",
              data: result.imageData,
              mimeType: "image/png",
            });
    
            return {
              content: responseContent,
            } as any;
          } else {
            return {
              content: [
                {
                  type: "text",
                  text: `Error taking screenshot: ${result.error}`,
                },
              ],
              isError: true,
            } as any;
          }
        } catch (error: any) {
          const errorMessage =
            error instanceof Error ? error.message : String(error);
          return {
            content: [
              {
                type: "text",
                text: `Failed to take screenshot: ${errorMessage}`,
              },
            ],
            isError: true,
          } as any;
        }
      });
    }
  • Core implementation handler in browser connector server: orchestrates WebSocket communication with Chrome extension, receives base64 image, persists via ScreenshotService.
    async captureScreenshot(req: express.Request, res: express.Response) {
      logInfo("Browser Connector: Starting captureScreenshot method");
    
      if (!this.activeConnection) {
        logInfo(
          "Browser Connector: No active WebSocket connection to Chrome extension "
        );
        return res.status(503).json({
          error:
            "Chrome extension not connected. Please open Chrome DevTools and ensure the extension is loaded.",
        });
      }
    
      // Extra health checks to avoid sending requests into a stale socket during reconnects
      if (!this.hasActiveConnection()) {
        return res.status(503).json({
          error:
            "Chrome extension not connected (WebSocket not open). Please open DevTools on the target tab.",
        });
      }
    
      const timeSinceHeartbeat = Date.now() - this.lastHeartbeatTime;
      if (timeSinceHeartbeat > this.HEARTBEAT_TIMEOUT) {
        logInfo(
          `Browser Connector: Connection unhealthy (no heartbeat for ${timeSinceHeartbeat}ms)`
        );
        return res.status(503).json({
          error:
            "Chrome extension connection is unhealthy. Open DevTools on the page and try again.",
        });
      }
    
      // Probe for a quick heartbeat response to ensure we're not racing a reconnect
      const heartbeatOk = await this.awaitHeartbeatResponse(1200);
      if (!heartbeatOk) {
        return res.status(503).json({
          error:
            "Chrome extension connection is not ready. Please ensure DevTools is open and retry.",
        });
      }
    
      try {
        // Extract parameters from request body
        logDebug("Browser Connector: Starting screenshot capture...");
        const { projectName, returnImageData, baseDirectory } = req.body || {};
    
        const requestId = `${Date.now()}_${Math.random()
          .toString(36)
          .slice(2, 9)}`;
        logDebug("Browser Connector: Generated requestId:", requestId);
    
        // Create promise that will resolve when we get the screenshot data
        const screenshotPromise = new Promise<{
          data: string;
          path?: string;
          autoPaste?: boolean;
        }>((resolve, reject) => {
          logDebug(
            `Browser Connector: Setting up screenshot callback for requestId: ${requestId}`
          );
          // Store callback in map
          screenshotCallbacks.set(requestId, { resolve, reject });
          logDebug(
            "Browser Connector: Current callbacks:",
            Array.from(screenshotCallbacks.keys())
          );
    
          // Set timeout to clean up if we don't get a response - increased for autonomous operation
          setTimeout(() => {
            if (screenshotCallbacks.has(requestId)) {
              logInfo(
                `Browser Connector: Screenshot capture timed out for requestId: ${requestId} [${this.connectionId}]`
              );
              screenshotCallbacks.delete(requestId);
              reject(
                new Error(
                  `Screenshot capture timed out - no response from Chrome extension [${this.connectionId}] after 30 seconds`
                )
              );
            }
          }, 30000);
        });
    
        // Send screenshot request to extension
        const message = JSON.stringify({
          type: "take-screenshot",
          requestId: requestId,
        });
        logDebug(
          `Browser Connector: Sending WebSocket message to extension:`,
          message
        );
        if (
          !this.activeConnection ||
          this.activeConnection.readyState !== WebSocket.OPEN
        ) {
          throw new Error(
            "WebSocket connection is not open to send screenshot request"
          );
        }
        this.activeConnection.send(message);
    
        // Wait for screenshot data
        logDebug("Browser Connector: Waiting for screenshot data...");
        const {
          data: base64Data,
          path: customPath,
          autoPaste,
        } = await screenshotPromise;
        logDebug(
          "Browser Connector: Received screenshot data, processing with unified service..."
        );
    
        if (!base64Data) {
          throw new Error("No screenshot data received from Chrome extension");
        }
    
        // Use the unified screenshot service
        const screenshotService = ScreenshotService.getInstance();
    
        // Prepare configuration for screenshot service
        // Use project configuration for screenshot path, fallback to customPath if needed
        const projectScreenshotPath = getScreenshotStoragePath();
    
        // Build config using tool helper (statically imported)
        const screenshotConfig = buildScreenshotConfig(
          projectScreenshotPath,
          customPath,
          projectName
        );
    
        // Save screenshot using unified service
        const result = await screenshotService.saveScreenshot(
          base64Data,
          currentUrl,
          screenshotConfig
        );
    
        logInfo(
          `Browser Connector: Screenshot saved successfully to: ${result.filePath}`
        );
        logDebug(
          `Browser Connector: Project directory: ${result.projectDirectory}`
        );
        logDebug(`Browser Connector: URL category: ${result.urlCategory}`);
    
        // Execute auto-paste if requested and on macOS
        if (os.platform() === "darwin" && autoPaste === true) {
          logDebug("Browser Connector: Executing auto-paste to Cursor...");
          try {
            await screenshotService.executeAutoPaste(result.filePath);
            logDebug("Browser Connector: Auto-paste executed successfully");
          } catch (autoPasteError) {
            console.error(
              "[error] Browser Connector: Auto-paste failed:",
              autoPasteError
            );
            // Don't fail the screenshot save for auto-paste errors
          }
        } else {
          if (os.platform() === "darwin" && !autoPaste) {
            logDebug("Browser Connector: Auto-paste disabled, skipping");
          } else {
            logDebug("Browser Connector: Not on macOS, skipping auto-paste");
          }
        }
    
        // Build response object via tool helper
        const response: any = buildScreenshotResponse(result);
    
        logInfo("Browser Connector: Screenshot capture completed successfully");
        res.json(response);
      } catch (error) {
        const errorMessage =
          error instanceof Error ? error.message : String(error);
        console.error(
          "[error] Browser Connector: Error capturing screenshot:",
          errorMessage
        );
        res.status(500).json({
          error: errorMessage,
        });
      }
    }
  • ScreenshotService.saveScreenshot: core persistence logic with intelligent project/URL categorization, path resolution, base64 cleaning, and optional image data return.
    public async saveScreenshot(
      base64Data: string,
      url?: string,
      config: ScreenshotConfig = {}
    ): Promise<ScreenshotResult> {
      // Clean base64 data
      const cleanBase64 = this.cleanBase64Data(base64Data);
    
      // Resolve the complete path structure
      const pathResolution = this.resolveScreenshotPath(url, config);
    
      // Ensure directory exists
      await this.ensureDirectoryExists(path.dirname(pathResolution.fullPath));
    
      // Save the file
      await this.writeScreenshotFile(pathResolution.fullPath, cleanBase64);
    
      // Build result object
      const result: ScreenshotResult = {
        filePath: pathResolution.fullPath,
        filename: pathResolution.filename,
        projectDirectory: pathResolution.projectDirectory,
        urlCategory: pathResolution.urlCategory,
      };
    
      // Include image data if requested
      if (config.returnImageData) {
        result.imageData = cleanBase64;
      }
    
      console.log(`Screenshot saved: ${pathResolution.fullPath}`);
      return result;
    }
  • Helper functions for building screenshot service config and response objects used by handlers.
    export function buildScreenshotConfig(
      projectScreenshotPath?: string,
      customPath?: string,
      projectName?: string
    ): ScreenshotServiceConfig {
      return {
        returnImageData: true,
        baseDirectory: projectScreenshotPath || customPath,
        projectName: projectName,
      };
    }
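
The connector excerpt above matches each `take-screenshot` message to its reply through a `requestId` stored in `screenshotCallbacks`, and rejects the promise if no reply arrives within 30 seconds. The following is a minimal sketch of that correlation pattern, not the project's actual code: the `PendingRequests` class and its method names are assumptions for illustration. Unlike the excerpt, this version also clears the timer once a reply arrives.

```typescript
// Sketch of request/response correlation with a timeout, assuming a single
// in-process Map keyed by requestId. All names here are illustrative.
type Pending<T> = {
  resolve: (value: T) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

class PendingRequests<T> {
  private readonly pending = new Map<string, Pending<T>>();

  // Register a request; the promise settles on reply or on timeout.
  create(requestId: string, timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => {
        // Only reject if the entry is still waiting for a reply.
        if (this.pending.delete(requestId)) {
          reject(new Error(`Request ${requestId} timed out after ${timeoutMs} ms`));
        }
      }, timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
  }

  // Called when a reply arrives; returns false for unknown or expired ids.
  settle(requestId: string, value: T): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    clearTimeout(entry.timer); // do not leave a live timer behind
    this.pending.delete(requestId);
    entry.resolve(value);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}

const requests = new PendingRequests<string>();
const screenshot = requests.create("req-1", 30000);
screenshot.then((data) => console.log("received", data));
requests.settle("req-1", "base64-data");
```

Keeping the timer handle next to the callbacks means a late reply for an expired id is simply ignored, and a successful reply does not leave a 30-second timer running.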
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior: capturing the current tab, saving to a structured path, returning an image, and requiring DevTools connection. This covers key operational aspects, though it doesn't mention potential limitations like file format, size, or error conditions.
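
The error conditions the description leaves out can be read from the connector code shown earlier: it answers with HTTP 503 when the extension is not connected, when the last heartbeat is too old, or when a heartbeat probe fails, and with HTTP 500 for other failures such as the 30-second capture timeout. A simplified sketch of the first two checks follows. The `preflight` function, its field names, and the 30000 ms threshold are assumptions, since the value of `HEARTBEAT_TIMEOUT` is not shown.

```typescript
// Hypothetical, simplified model of the connector's pre-capture checks.
// Field names and the timeout value are assumptions for illustration.
interface ConnectionState {
  socketOpen: boolean; // is the WebSocket to the extension open?
  lastHeartbeatTime: number; // epoch milliseconds of the last heartbeat
}

type Preflight =
  | { ok: true }
  | { ok: false; status: 503; error: string };

function preflight(
  state: ConnectionState | null,
  now: number,
  heartbeatTimeoutMs: number
): Preflight {
  if (state === null || !state.socketOpen) {
    return { ok: false, status: 503, error: "Chrome extension not connected." };
  }
  if (now - state.lastHeartbeatTime > heartbeatTimeoutMs) {
    return { ok: false, status: 503, error: "Chrome extension connection is unhealthy." };
  }
  return { ok: true };
}

const stale = preflight({ socketOpen: true, lastHeartbeatTime: 0 }, 60000, 30000);
const healthy = preflight({ socketOpen: true, lastHeartbeatTime: 55000 }, 60000, 30000);
console.log(stale, healthy);
```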

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each serve a clear purpose: the first explains what the tool does, and the second states the prerequisite. There's no wasted language or unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (screenshot functionality with a prerequisite) and no annotations or output schema, the description is adequate but incomplete. It covers the core purpose and requirement but lacks details about the structured path format, image characteristics, or error handling that would be helpful for an AI agent.
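
The handler code shown earlier does settle some of these details: on success it returns a text item followed by an image item with `mimeType: "image/png"` and base64 `data`, and on failure it returns a text item with `isError: true`. A minimal sketch of how a caller might unpack that result, assuming only those shapes. The `extractScreenshot` helper and the sample payload are hypothetical.

```typescript
// Content item shapes mirror what the MCP handler returns; the helper is hypothetical.
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResult {
  content: ContentItem[];
  isError?: boolean;
}

// Returns the base64 image payload, or throws with the tool's error text.
function extractScreenshot(result: ToolResult): string {
  if (result.isError) {
    const texts: string[] = [];
    for (const item of result.content) {
      if (item.type === "text") texts.push(item.text);
    }
    throw new Error(texts.join("\n") || "Screenshot tool returned an error");
  }
  for (const item of result.content) {
    if (item.type === "image") return item.data;
  }
  throw new Error("No image content in tool result");
}

// Sample payload with a placeholder base64 string, not a real screenshot.
const sample: ToolResult = {
  content: [
    { type: "text", text: "Project: default-project" },
    { type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" },
  ],
};
console.log(extractScreenshot(sample));
```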

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, but the parameter description ('any string (ignored)') is confusing and unhelpful. The tool description doesn't add any meaningful clarification about why this parameter exists or how it should be used, failing to compensate for the schema's poor documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Capture current browser tab'), resource ('browser tab'), and outcome ('saves to structured path and returns image'). It distinguishes itself from sibling tools like browser.navigate or browser.console.read by focusing on screenshot functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Capture current browser tab') and includes a prerequisite ('Requires extension connection with DevTools open'). However, it doesn't explicitly state when not to use it or name alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Winds-AI/Frontend-development-MCP-tools-public'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.