Skip to main content
Glama
evalstate
by evalstate

screenshot

Read-only

Capture a screenshot of the current screen or window to enable visual input for MCP clients, supporting interaction through image-based communication and live webcam integration.

Instructions

Gets a screenshot of the current screen or window

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault

No arguments

Implementation Reference

  • MCP tool handler for 'screenshot': checks for connected clients, sends screenshot request via data channel to browser client, awaits data URL response or error, parses it, and returns text message with embedded image.
    async () => {
      const userClients = getUserClients(user);
      if (userClients.size === 0) {
        return {
          isError: true,
          content: [
            {
              type: "text",
              text: `Have you opened your web browser?. Direct the human to go to ${getMcpHost()}?user=${user}, switch on their webcam and try again.`,
            },
          ],
        };
      }
    
      const clientId = Array.from(userClients.keys())[0];
    
      if (!clientId) {
        throw new Error("No clients connected");
      }
    
      const userCallbacks = getUserCallbacks(user);
    
      // Modified promise to handle both success and error cases
      const result = await new Promise<string | { error: string }>(
        (resolve) => {
          Logger.info(`Taking screenshot for ${clientId} (user: ${user}`);
          userCallbacks.set(clientId, resolve);
    
          userClients
            .get(clientId)
            ?.write(`data: ${JSON.stringify({ type: "screenshot" })}\n\n`);
        }
      );
    
      // Handle error case
      if (typeof result === "object" && "error" in result) {
        return {
          isError: true,
          content: [
            {
              type: "text",
              text: `Failed to capture screenshot: ${result.error}`,
            },
          ],
        };
      }
    
      const { mimeType, base64Data } = parseDataUrl(result);
    
      return {
        content: [
          {
            type: "text",
            text: "Here is the requested screenshot",
          },
          {
            type: "image",
            data: base64Data,
            mimeType: mimeType,
          },
        ],
      };
    }
  • Registration of the 'screenshot' tool on the MCP server with name, description, empty input schema, UI hints, and inline handler.
    mcpServer.tool(
      "screenshot",
      "Gets a screenshot of the current screen or window",
      {},
      {
        openWorldHint: true,
        readOnlyHint: true,
        title: "Take a Screenshot",
      },
      async () => {
        const userClients = getUserClients(user);
        if (userClients.size === 0) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Have you opened your web browser?. Direct the human to go to ${getMcpHost()}?user=${user}, switch on their webcam and try again.`,
              },
            ],
          };
        }
    
        const clientId = Array.from(userClients.keys())[0];
    
        if (!clientId) {
          throw new Error("No clients connected");
        }
    
        const userCallbacks = getUserCallbacks(user);
    
        // Modified promise to handle both success and error cases
        const result = await new Promise<string | { error: string }>(
          (resolve) => {
            Logger.info(`Taking screenshot for ${clientId} (user: ${user}`);
            userCallbacks.set(clientId, resolve);
    
            userClients
              .get(clientId)
              ?.write(`data: ${JSON.stringify({ type: "screenshot" })}\n\n`);
          }
        );
    
        // Handle error case
        if (typeof result === "object" && "error" in result) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Failed to capture screenshot: ${result.error}`,
              },
            ],
          };
        }
    
        const { mimeType, base64Data } = parseDataUrl(result);
    
        return {
          content: [
            {
              type: "text",
              text: "Here is the requested screenshot",
            },
            {
              type: "image",
              data: base64Data,
              mimeType: mimeType,
            },
          ],
        };
      }
    );
  • Client-side helper function that captures the screen using navigator.mediaDevices.getDisplayMedia, renders to canvas, optionally resizes, and returns PNG data URL. Used by browser client upon receiving 'screenshot' request.
    export async function captureScreen(): Promise<string> {
        let stream: MediaStream | undefined;
        try {
            stream = await navigator.mediaDevices.getDisplayMedia({
                video: true,
                audio: false,
            });
    
            const canvas = document.createElement("canvas");
            const video = document.createElement("video");
    
            await new Promise((resolve) => {
                video.onloadedmetadata = () => {
                    canvas.width = video.videoWidth;
                    canvas.height = video.videoHeight;
                    video.play();
                    resolve(null);
                };
                if (stream) {
                    video.srcObject = stream;
                } else {
                    throw Error("No stream available");
                }
            });
    
            const context = canvas.getContext("2d");
            context?.drawImage(video, 0, 0, canvas.width, canvas.height);
    
            // Check if resizing is needed
            const MAX_DIMENSION = 1568;
            if (canvas.width > MAX_DIMENSION || canvas.height > MAX_DIMENSION) {
                const scaleFactor = MAX_DIMENSION / Math.max(canvas.width, canvas.height);
                const newWidth = Math.round(canvas.width * scaleFactor);
                const newHeight = Math.round(canvas.height * scaleFactor);
    
                const resizeCanvas = document.createElement("canvas");
                resizeCanvas.width = newWidth;
                resizeCanvas.height = newHeight;
                const resizeContext = resizeCanvas.getContext("2d");
                resizeContext?.drawImage(canvas, 0, 0, newWidth, newHeight);
                return resizeCanvas.toDataURL("image/png");
            }
    
            return canvas.toDataURL("image/png");
        } catch (error) {
            console.error("Error capturing screenshot:", error);
            throw error;
        } finally {
            if (stream) {
                stream.getTracks().forEach((track) => track.stop());
            }
        }
    }
  • Utility function to parse received data URL from client into mimeType and base64 data for embedding in response.
    function parseDataUrl(dataUrl: string): ParsedDataUrl {
      const matches = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
      if (!matches) {
        throw new Error("Invalid data URL format");
      }
      return {
        mimeType: matches[1],
        base64Data: matches[2],
      };
    }
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true, so the agent knows this is a safe, non-destructive operation with open-world assumptions. The description adds minimal behavioral context beyond this, such as specifying it captures the 'current screen or window', but doesn't detail aspects like format, size, or potential limitations. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that efficiently conveys the core functionality without any wasted words. It is front-loaded with the essential information, making it easy for an agent to parse and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, no output schema) and rich annotations (readOnlyHint, openWorldHint), the description is adequate but minimal. It covers the basic purpose but lacks details on output format or behavioral nuances that could aid the agent, such as whether it returns an image file or data. It meets minimum viability but has gaps in completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description appropriately doesn't discuss parameters, focusing instead on the tool's action. A baseline of 4 is applied since it avoids redundancy and adds value by explaining the tool's purpose without unnecessary details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Gets') and resource ('screenshot of the current screen or window'), making it immediately understandable. However, it doesn't explicitly differentiate from the sibling tool 'capture', which might have overlapping functionality, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus the sibling tool 'capture', nor does it mention any prerequisites, context, or exclusions. It merely states what the tool does without offering usage instructions, leaving the agent to infer when it's appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/evalstate/mcp-webcam'

If you have feedback or need assistance with the MCP directory API, please join our Discord server