Glama
by evalstate

capture

Read-only

Retrieve real-time webcam images to analyze surroundings, view people, or examine objects, enabling visual interaction and context-based responses. Uses the MCP Webcam Server for live image capture.

Instructions

Gets the latest picture from the webcam. You can use this if the human asks questions about their immediate environment, if you want to see the human or to examine an object they may be referring to or showing you.

Input Schema

Name | Required | Description | Default

No arguments
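
Because the tool takes no arguments, a call carries only the tool name and an empty `arguments` object. The following is a minimal sketch of the JSON-RPC `tools/call` request a client might send. The helper name `buildCaptureRequest` and the `id` value are illustrative assumptions, not part of the server.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, never> };
}

// Hypothetical helper: builds the request that invokes `capture`.
function buildCaptureRequest(id: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // `capture` declares an empty input schema, so `arguments` stays empty.
    params: { name: "capture", arguments: {} },
  };
}

console.log(JSON.stringify(buildCaptureRequest(1), null, 2));
```

In practice an MCP client library would normally build and send this message for you; the sketch only shows what goes over the wire.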

Implementation Reference

  • MCP tool registration for 'capture': defines the tool name, description, empty input schema, metadata, and an inline asynchronous handler that coordinates with browser clients via SSE to capture a webcam image and return it as base64 image content.
    mcpServer.tool(
      "capture",
      "Gets the latest picture from the webcam. You can use this " +
        "if the human asks questions about their immediate environment, " +
        "if you want to see the human or to examine an object they may be " +
        "referring to or showing you.",
      {},
      {
        openWorldHint: true,
        readOnlyHint: true,
        title: "Take a Picture from the webcam",
      },
      async () => {
        const userClients = getUserClients(user);
        if (userClients.size === 0) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Have you opened your web browser? Direct the human to go to ${getMcpHost()}${user !== 'default' ? `?user=${user}` : ''}, switch on their webcam and try again.`,
              },
            ],
          };
        }
    
        const clientId = Array.from(userClients.keys())[0];
    
        if (!clientId) {
          throw new Error("No clients connected");
        }
    
        const userCallbacks = getUserCallbacks(user);
    
        // Modified promise to handle both success and error cases
        const result = await new Promise<string | { error: string }>(
          (resolve) => {
            Logger.info(`Capturing for ${clientId} (user: ${user})`);
            userCallbacks.set(clientId, resolve);
    
            userClients
              .get(clientId)
              ?.write(`data: ${JSON.stringify({ type: "capture" })}\n\n`);
          }
        );
    
        // Handle error case
        if (typeof result === "object" && "error" in result) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Failed to capture: ${result.error}`,
              },
            ],
          };
        }
    
        const { mimeType, base64Data } = parseDataUrl(result);
    
        return {
          content: [
            {
              type: "text",
              text: "Here is the latest image from the Webcam",
            },
            {
              type: "image",
              data: base64Data,
              mimeType: mimeType,
            },
          ],
        };
      }
    );
  • The core handler function for the 'capture' tool. It checks for connected clients, sends a Server-Sent Event (SSE) with type 'capture' to trigger webcam capture in the browser, awaits the result via callback, parses the data URL, and returns the image or error.
      async () => {
        const userClients = getUserClients(user);
        if (userClients.size === 0) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Have you opened your web browser? Direct the human to go to ${getMcpHost()}${user !== 'default' ? `?user=${user}` : ''}, switch on their webcam and try again.`,
              },
            ],
          };
        }
    
        const clientId = Array.from(userClients.keys())[0];
    
        if (!clientId) {
          throw new Error("No clients connected");
        }
    
        const userCallbacks = getUserCallbacks(user);
    
        // Modified promise to handle both success and error cases
        const result = await new Promise<string | { error: string }>(
          (resolve) => {
            Logger.info(`Capturing for ${clientId} (user: ${user})`);
            userCallbacks.set(clientId, resolve);
    
            userClients
              .get(clientId)
              ?.write(`data: ${JSON.stringify({ type: "capture" })}\n\n`);
          }
        );
    
        // Handle error case
        if (typeof result === "object" && "error" in result) {
          return {
            isError: true,
            content: [
              {
                type: "text",
                text: `Failed to capture: ${result.error}`,
              },
            ],
          };
        }
    
        const { mimeType, base64Data } = parseDataUrl(result);
    
        return {
          content: [
            {
              type: "text",
              text: "Here is the latest image from the Webcam",
            },
            {
              type: "image",
              data: base64Data,
              mimeType: mimeType,
            },
          ],
        };
      }
  • Helper function to parse data URLs received from browser captures into mimeType and base64 components, used in both tool handler and resource reader.
    function parseDataUrl(dataUrl: string): ParsedDataUrl {
      const matches = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
      if (!matches) {
        throw new Error("Invalid data URL format");
      }
      return {
        mimeType: matches[1],
        base64Data: matches[2],
      };
    }
  • Helper to retrieve or initialize per-user map of capture callbacks, used to await results from browser capture requests.
    export function getUserCallbacks(user: string): Map<string, (response: string | { error: string }) => void> {
      if (!captureCallbacks.has(user)) {
        captureCallbacks.set(user, new Map());
      }
      return captureCallbacks.get(user)!;
    }
  • Global map storing capture result callbacks, scoped by user and clientId, essential for asynchronous coordination between server tool calls and browser responses.
    export let captureCallbacks = new Map<
      string,
      Map<string, (response: string | { error: string }) => void>
    >(); // user -> clientId -> callback
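
The callback map and the handler's promise work together: the handler registers a resolver under the client ID, sends an SSE frame, and waits until the browser's reply invokes that resolver. The sketch below reproduces that pattern in isolation. The names `pending`, `requestCapture`, and `deliverResult` are hypothetical, and the timeout is an addition of this sketch that the listed handler does not have.

```typescript
type CaptureResult = string | { error: string };
type CaptureCallback = (response: CaptureResult) => void;

// user -> clientId -> callback, mirroring the layout of `captureCallbacks`.
const pending = new Map<string, Map<string, CaptureCallback>>();

function callbacksFor(user: string): Map<string, CaptureCallback> {
  let callbacks = pending.get(user);
  if (!callbacks) {
    callbacks = new Map();
    pending.set(user, callbacks);
  }
  return callbacks;
}

// Registers a resolver, sends the SSE frame, and resolves when the browser
// replies or when `timeoutMs` elapses (the timeout is an assumption here).
function requestCapture(
  user: string,
  clientId: string,
  send: (frame: string) => void,
  timeoutMs = 10000
): Promise<CaptureResult> {
  return new Promise<CaptureResult>((resolve) => {
    const callbacks = callbacksFor(user);
    const timer = setTimeout(() => {
      callbacks.delete(clientId);
      resolve({ error: `No response within ${timeoutMs} ms` });
    }, timeoutMs);
    callbacks.set(clientId, (response) => {
      clearTimeout(timer);
      callbacks.delete(clientId);
      resolve(response);
    });
    // Same frame as the handler: one `data:` line followed by a blank line.
    send(`data: ${JSON.stringify({ type: "capture" })}\n\n`);
  });
}

// Called wherever the browser's reply arrives; false means nothing was waiting.
function deliverResult(user: string, clientId: string, response: CaptureResult): boolean {
  const callback = pending.get(user)?.get(clientId);
  if (!callback) return false;
  callback(response);
  return true;
}

// Usage: the "browser" here is simulated by calling deliverResult directly.
const frames: string[] = [];
requestCapture("default", "client-1", (frame) => { frames.push(frame); })
  .then((result) => console.log(typeof result === "string" ? "image received" : result.error));
deliverResult("default", "client-1", "data:image/png;base64,AAAA");
```

Resolving with an `{ error }` object rather than rejecting keeps the sketch consistent with the `string | { error: string }` type used by the handler, so the existing error branch would cover a timeout as well.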
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true and openWorldHint=true, indicating safe, non-destructive operation with potential for varied outcomes. The description adds valuable context by specifying it captures from 'the webcam' and returns 'the latest picture,' clarifying the source and immediacy of the data, which goes beyond what annotations alone convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by usage guidelines in a clear, efficient manner. Every sentence adds value without redundancy, making it appropriately sized and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (0 parameters, no output schema), the description is complete enough for effective use. It covers purpose, usage guidelines, and behavioral context. The absence of an output schema is mitigated by the description's clarity on what is returned ('the latest picture'), though more detail on output format could enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
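
To make the output format concrete, here is a minimal sketch of how a capture is turned into a tool result, based on the handler in the Implementation Reference. The `ParsedDataUrl`, `ContentItem`, and `ToolResult` types and the `toToolResult` helper are assumptions introduced for illustration; only `parseDataUrl` and the content layout come from the listing.

```typescript
// Assumed shapes; the listing does not show the original type definitions.
interface ParsedDataUrl {
  mimeType: string;
  base64Data: string;
}
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };
interface ToolResult {
  isError?: boolean;
  content: ContentItem[];
}

// Same regular expression as the listed helper, with explicit undefined checks.
function parseDataUrl(dataUrl: string): ParsedDataUrl {
  const match = /^data:([^;]+);base64,(.+)$/.exec(dataUrl);
  const mimeType = match?.[1];
  const base64Data = match?.[2];
  if (mimeType === undefined || base64Data === undefined) {
    throw new Error("Invalid data URL format");
  }
  return { mimeType, base64Data };
}

// Hypothetical helper: maps the browser's reply to the result the tool returns.
function toToolResult(capture: string | { error: string }): ToolResult {
  if (typeof capture === "object") {
    return {
      isError: true,
      content: [{ type: "text", text: `Failed to capture: ${capture.error}` }],
    };
  }
  const { mimeType, base64Data } = parseDataUrl(capture);
  return {
    content: [
      { type: "text", text: "Here is the latest image from the Webcam" },
      { type: "image", data: base64Data, mimeType },
    ],
  };
}

console.log(toToolResult("data:image/jpeg;base64,/9j/4AAQ").content.map((item) => item.type));
```

A successful call therefore yields a text item followed by an image item carrying base64 data and its MIME type. Note that the pattern only accepts data URLs of the form `data:<type>;base64,<data>`; a URL with extra parameters before `;base64` would be rejected.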

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description appropriately does not discuss parameters, maintaining focus on tool functionality. A baseline of 4 is applied since there are no parameters to document.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Gets the latest picture from the webcam') and resource ('webcam'), distinguishing it from the sibling tool 'screenshot' which likely captures screen content rather than camera input. The verb 'Gets' is precise and the resource is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when-to-use guidance with concrete examples: 'if the human asks questions about their immediate environment,' 'if you want to see the human,' or 'to examine an object they may be referring to or showing you.' This gives clear context for selecting this tool over alternatives like 'screenshot'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/evalstate/mcp-webcam'
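
The same request can be made from TypeScript. This is a minimal sketch that assumes the endpoint returns JSON and that other servers follow the same `owner/name` path pattern as the URL above; neither is confirmed here, and `serverUrl` and `fetchServerInfo` are hypothetical helper names. It relies on the global `fetch` available in browsers and recent Node.js versions.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the server URL; the owner/name layout is inferred from the curl example.
function serverUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record. The response shape is not documented in this page,
// so it is returned as `unknown` and left to the caller to inspect.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name), {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Example call (not executed here):
// fetchServerInfo("evalstate", "mcp-webcam").then((info) => console.log(info));
```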

If you have feedback or need assistance with the MCP directory API, please join our Discord server.