
screenshot_by_uid

Capture a screenshot of a specific webpage element by its unique identifier (UID). The tool returns the image as a base64-encoded PNG, which is useful for testing, debugging, or documentation.

Instructions

Capture element screenshot by UID as base64 PNG.

Input Schema

Name  Required  Description                Default
uid   Yes       Element UID from snapshot  -
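
As a rough illustration of the schema above, the sketch below builds the JSON-RPC `tools/call` request an MCP client might send for this tool. The helper name `buildScreenshotByUidRequest`, the request id, and the UID value `example-uid` are hypothetical; real UIDs come from a prior snapshot, and their format is not documented on this page.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a UID in a request for screenshot_by_uid.
// It mirrors the handler's own check that `uid` is a non-empty string.
function buildScreenshotByUidRequest(id: number, uid: string): ToolCallRequest {
  if (uid.length === 0) {
    throw new Error('uid required');
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'screenshot_by_uid', arguments: { uid } },
  };
}

// 'example-uid' is a placeholder, not a real UID format.
const exampleRequest = buildScreenshotByUidRequest(1, 'example-uid');
console.log(JSON.stringify(exampleRequest, null, 2));
```

Validating on the client side is optional, since the server handler rejects a missing or non-string `uid` anyway; it only saves a round trip.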

Implementation Reference

  • MCP tool handler: validates uid arg, gets Firefox instance, calls takeScreenshotByUid, handles special errors for stale UIDs, builds response with size info
    export async function handleScreenshotByUid(args: unknown): Promise<McpToolResponse> {
      try {
        const { uid } = args as { uid: string };
    
        if (!uid || typeof uid !== 'string') {
          throw new Error('uid required');
        }
    
        const { getFirefox } = await import('../index.js');
        const firefox = await getFirefox();
    
        try {
          const base64Png = await firefox.takeScreenshotByUid(uid);
    
          if (!base64Png || typeof base64Png !== 'string') {
            throw new Error('Invalid screenshot data');
          }
    
          return buildScreenshotResponse(base64Png, uid);
        } catch (error) {
          const errorMsg = (error as Error).message;
    
          // Concise error for stale UIDs
          if (
            errorMsg.includes('stale') ||
            errorMsg.includes('Snapshot') ||
            errorMsg.includes('UID') ||
            errorMsg.includes('not found')
          ) {
            throw new Error(`${uid} stale/invalid. Call take_snapshot first.`);
          }
    
          throw error;
        }
      } catch (error) {
        return errorResponse(error as Error);
      }
    }
  • Tool schema definition: specifies name, description, and inputSchema requiring 'uid' string
    export const screenshotByUidTool = {
      name: 'screenshot_by_uid',
      description: 'Capture element screenshot by UID as base64 PNG.',
      inputSchema: {
        type: 'object',
        properties: {
          uid: {
            type: 'string',
            description: 'Element UID from snapshot',
          },
        },
        required: ['uid'],
      },
    };
  • src/index.ts:140-140 (registration)
    Registers the handler function in the toolHandlers Map used by the MCP server
    ['screenshot_by_uid', tools.handleScreenshotByUid],
  • src/index.ts:184-184 (registration)
    Includes the tool definition in the allTools array returned by listTools
    tools.screenshotByUidTool,
  • Core helper: resolves UID to WebElement, scrolls into view, takes element screenshot using Selenium WebElement.takeScreenshot()
      async takeScreenshotByUid(uid: string): Promise<string> {
        if (!this.resolveUid) {
          throw new Error(
            'takeScreenshotByUid: resolveUid callback not set. Ensure snapshot is initialized.'
          );
        }
    
        const el = await this.resolveUid(uid);
    
        // Scroll element into view
        await this.driver.executeScript(
          'arguments[0].scrollIntoView({block: "center", inline: "center"});',
          el
        );
    
        // Wait for scroll to complete
        await new Promise((resolve) => setTimeout(resolve, 100));
    
        // Take screenshot of element (Selenium automatically crops to element bounds)
        return await el.takeScreenshot();
      }
    }
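
The stale-UID branch in the handler above can be isolated into a small pure function, which makes the matching rules easier to test. The following is a sketch under stated assumptions, not the project's code: the names `messageOf`, `isStaleUidError`, and `toStaleUidError` are hypothetical, and the sketch additionally tolerates thrown values that are not `Error` instances, which the handler shown above does not do.

```typescript
// Substrings the handler treats as signs of a stale or unknown UID.
const STALE_MARKERS = ['stale', 'Snapshot', 'UID', 'not found'];

// Extract a message from any thrown value (assumption: non-Error
// values are stringified rather than cast to Error).
function messageOf(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// True when the message contains any of the stale-UID markers.
function isStaleUidError(error: unknown): boolean {
  const message = messageOf(error);
  return STALE_MARKERS.some((marker) => message.indexOf(marker) !== -1);
}

// Replace a stale-UID error with the concise message used by the
// handler; pass any other error through as an Error instance.
function toStaleUidError(uid: string, error: unknown): Error {
  if (isStaleUidError(error)) {
    return new Error(`${uid} stale/invalid. Call take_snapshot first.`);
  }
  return error instanceof Error ? error : new Error(messageOf(error));
}

// 'example-uid' is a placeholder value for illustration.
console.log(toStaleUidError('example-uid', new Error('element not found')).message);
```

Note that the `'UID'` marker is broad: any error whose message mentions "UID" is reported as stale, including the "resolveUid callback not set" error raised by the core helper.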
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but lacks behavioral details. It doesn't disclose whether this is a read-only operation, if it requires specific permissions, potential rate limits, or how it interacts with the page state. The description is minimal and misses key operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero wasted words. It's front-loaded with the core purpose and includes essential details (UID method, output format) without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description is insufficient. It doesn't explain what the tool returns (e.g., base64 string structure, error conditions), prerequisites (e.g., requires a snapshot), or how it differs from sibling tools. Given the complexity of screenshot operations, more context is needed.
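
To make the return value more concrete: the core helper returns a base64 string that should decode to PNG data. A caller could sanity-check it by decoding the first bytes and comparing them with the fixed 8-byte PNG signature. This is a minimal sketch; `decodeBase64Prefix` and `isPngBase64` are hypothetical helper names, and the decoder is written by hand only to keep the example free of runtime-specific APIs.

```typescript
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Every PNG file starts with these 8 bytes.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Decode at most `byteCount` bytes from the start of a base64 string.
function decodeBase64Prefix(b64: string, byteCount: number): number[] {
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < b64.length && bytes.length < byteCount; i++) {
    const ch = b64.charAt(i);
    if (ch === '=') break; // padding marks the end of the data
    const value = BASE64_ALPHABET.indexOf(ch);
    if (value < 0) throw new Error(`Invalid base64 character at index ${i}`);
    // Append 6 bits; keep only the low 16 bits to avoid overflow.
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return bytes;
}

// True when the decoded data begins with the PNG signature.
function isPngBase64(b64: string): boolean {
  const head = decodeBase64Prefix(b64, PNG_SIGNATURE.length);
  return (
    head.length === PNG_SIGNATURE.length &&
    head.every((byte, i) => byte === PNG_SIGNATURE[i])
  );
}

// 'iVBORw0KGgo=' is the base64 encoding of the PNG signature alone.
console.log(isPngBase64('iVBORw0KGgo='));
```

A check like this only confirms the header; it does not validate the rest of the image.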

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'uid' documented as 'Element UID from snapshot'. The description adds no additional parameter semantics beyond what the schema provides, so it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Capture') and the resource ('element screenshot'), and specifies both the lookup method ('by UID') and the output format ('as base64 PNG'). Its focus on elements rather than full pages implicitly separates it from the sibling tool 'screenshot_page', but the description never states this distinction outright.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'screenshot_page' or 'take_snapshot'. The description implies usage for element-specific screenshots but doesn't specify prerequisites (e.g., needing a snapshot first) or contextual constraints.
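
One way to address the gaps noted above is to expand the tool description so that it names the prerequisite, the page-state side effect, and the sibling tool. The definition below is only an illustrative sketch, not the project's actual schema: the constant name `screenshotByUidToolExpanded` and the wording are assumptions, while the behavior it describes (scrolling the element into view, the stale-UID error, the sibling tools `take_snapshot` and `screenshot_page`) is taken from the implementation and review text on this page.

```typescript
// Hypothetical expanded definition; the name and inputSchema match the
// original tool, only the description text differs.
const screenshotByUidToolExpanded = {
  name: 'screenshot_by_uid',
  description: [
    'Capture a screenshot of a single element as a base64 PNG.',
    'Requires an element UID from take_snapshot.',
    'If the UID is stale or invalid, call take_snapshot again.',
    'Scrolls the element into view before capturing.',
    'For a full-page capture, use screenshot_page instead.',
  ].join(' '),
  inputSchema: {
    type: 'object',
    properties: {
      uid: {
        type: 'string',
        description: 'Element UID from snapshot',
      },
    },
    required: ['uid'],
  },
};

console.log(screenshotByUidToolExpanded.description);
```

The longer description trades some of the conciseness score for usage guidance, so whether it is worth the extra tokens is a judgment call.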

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mozilla/firefox-devtools-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.