screenshot

Capture a screenshot of the simulator with the specified UUID for visual verification of Xcode builds. For UI coordinates, rely on describe_ui rather than reading positions from the image.

Instructions

Captures screenshot for visual verification. For UI coordinates, use describe_ui instead (don't determine coordinates from screenshots).

Input Schema

Name | Required | Description | Default
simulatorUuid | Yes | (none) | (none)

Implementation Reference

  • Core handler function that executes the screenshot capture using 'xcrun simctl io', optimizes the image with 'sips', encodes to base64, and returns as image content.
    export async function screenshotLogic(
      params: ScreenshotParams,
      executor: CommandExecutor,
      fileSystemExecutor: FileSystemExecutor = getDefaultFileSystemExecutor(),
      pathUtils: { tmpdir: () => string; join: (...paths: string[]) => string } = { ...path, tmpdir },
      uuidUtils: { v4: () => string } = { v4: uuidv4 },
    ): Promise<ToolResponse> {
      const { simulatorId } = params;
      const tempDir = pathUtils.tmpdir();
      const screenshotFilename = `screenshot_${uuidUtils.v4()}.png`;
      const screenshotPath = pathUtils.join(tempDir, screenshotFilename);
      const optimizedFilename = `screenshot_optimized_${uuidUtils.v4()}.jpg`;
      const optimizedPath = pathUtils.join(tempDir, optimizedFilename);
      // Use xcrun simctl to take screenshot
      const commandArgs: string[] = [
        'xcrun',
        'simctl',
        'io',
        simulatorId,
        'screenshot',
        screenshotPath,
      ];
    
      log('info', `${LOG_PREFIX}/screenshot: Starting capture to ${screenshotPath} on ${simulatorId}`);
    
      try {
        // Execute the screenshot command
        const result = await executor(commandArgs, `${LOG_PREFIX}: screenshot`, false);
    
        if (!result.success) {
          throw new SystemError(`Failed to capture screenshot: ${result.error ?? result.output}`);
        }
    
        log('info', `${LOG_PREFIX}/screenshot: Success for ${simulatorId}`);
    
        try {
          // Optimize the image for LLM consumption: resize so neither side exceeds 800px and convert to JPEG
          const optimizeArgs = [
            'sips',
            '-Z',
            '800', // Resize to max 800px (maintains aspect ratio)
            '-s',
            'format',
            'jpeg', // Convert to JPEG
            '-s',
            'formatOptions',
            '75', // 75% quality compression
            screenshotPath,
            '--out',
            optimizedPath,
          ];
    
          const optimizeResult = await executor(optimizeArgs, `${LOG_PREFIX}: optimize image`, false);
    
          if (!optimizeResult.success) {
            log('warning', `${LOG_PREFIX}/screenshot: Image optimization failed, using original PNG`);
            // Fallback to original PNG if optimization fails
            const base64Image = await fileSystemExecutor.readFile(screenshotPath, 'base64');
    
            // Clean up
            try {
              await fileSystemExecutor.rm(screenshotPath);
            } catch (err) {
              log('warning', `${LOG_PREFIX}/screenshot: Failed to delete temp file: ${err}`);
            }
    
            return {
              content: [createImageContent(base64Image, 'image/png')],
              isError: false,
            };
          }
    
          log('info', `${LOG_PREFIX}/screenshot: Image optimized successfully`);
    
          // Read the optimized image file as base64
          const base64Image = await fileSystemExecutor.readFile(optimizedPath, 'base64');
    
          log('info', `${LOG_PREFIX}/screenshot: Successfully encoded image as Base64`);
    
          // Clean up both temporary files independently so one failure does not skip the other
          for (const tempFile of [screenshotPath, optimizedPath]) {
            try {
              await fileSystemExecutor.rm(tempFile);
            } catch (err) {
              log('warning', `${LOG_PREFIX}/screenshot: Failed to delete temporary file ${tempFile}: ${err}`);
            }
          }
    
          // Return the optimized image (JPEG format, smaller size)
          return {
            content: [createImageContent(base64Image, 'image/jpeg')],
            isError: false,
          };
        } catch (fileError) {
          log('error', `${LOG_PREFIX}/screenshot: Failed to process image file: ${fileError}`);
          return createErrorResponse(
            `Screenshot captured but failed to process image file: ${fileError instanceof Error ? fileError.message : String(fileError)}`,
          );
        }
      } catch (_error) {
        log('error', `${LOG_PREFIX}/screenshot: Failed - ${_error}`);
        if (_error instanceof SystemError) {
          return createErrorResponse(
            `System error executing screenshot: ${_error.message}`,
            _error.originalError?.stack,
          );
        }
        return createErrorResponse(
          `An unexpected error occurred: ${_error instanceof Error ? _error.message : String(_error)}`,
        );
      }
    }
  • Zod schema defining the input parameters for the screenshot tool (simulatorId as UUID). Public schema omits simulatorId as it's session-aware.
    const screenshotSchema = z.object({
      simulatorId: z.string().uuid('Invalid Simulator UUID format'),
    });
    
    // Use z.infer for type safety
    type ScreenshotParams = z.infer<typeof screenshotSchema>;
    
    const publicSchemaObject = screenshotSchema.omit({ simulatorId: true } as const).strict();
  • Tool definition and registration exporting the 'screenshot' tool with name, description, schema, and session-aware handler.
    export default {
      name: 'screenshot',
      description:
        "Captures screenshot for visual verification. For UI coordinates, use describe_ui instead (don't determine coordinates from screenshots).",
      schema: publicSchemaObject.shape, // MCP SDK compatibility
      handler: createSessionAwareTool<ScreenshotParams>({
        internalSchema: screenshotSchema as unknown as z.ZodType<ScreenshotParams>,
        logicFunction: (params: ScreenshotParams, executor: CommandExecutor) => {
          return screenshotLogic(params, executor);
        },
        getExecutor: getDefaultCommandExecutor,
        requirements: [{ allOf: ['simulatorId'], message: 'simulatorId is required' }],
      }),
    };
  • Re-export of the screenshot tool for inclusion in the simulator workflow.
    // Re-export from ui-testing to avoid duplication
    export { default } from '../ui-testing/screenshot.ts';
  • Workflow metadata declaring screenshot-capture capability for dynamic tool discovery.
    export const workflow = {
      name: 'UI Testing & Automation',
      description:
        'UI automation and accessibility testing tools for iOS simulators. Perform gestures, interactions, screenshots, and UI analysis for automated testing workflows.',
      platforms: ['iOS'],
      targets: ['simulator'],
      capabilities: [
        'ui-automation',
        'gesture-simulation',
        'screenshot-capture',
        'accessibility-testing',
        'ui-analysis',
      ],
    };
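
The capture-then-optimize flow in the handler above comes down to two small decisions: which `sips` arguments to run, and which file to return depending on whether optimization succeeded. The sketch below restates those two steps as pure functions so they can be checked without a simulator. `buildOptimizeArgs`, `selectImage`, and the `CommandResult` shape are hypothetical names introduced for illustration; they are not part of the server's code.

```typescript
// Hypothetical, simplified result shape; the real executor types live in the server's codebase.
type CommandResult = { success: boolean; output?: string; error?: string };

type SelectedImage = { path: string; mimeType: 'image/jpeg' | 'image/png' };

// Same sips arguments as the handler: cap both dimensions at 800px, convert to JPEG at quality 75.
function buildOptimizeArgs(inputPath: string, outputPath: string): string[] {
  return [
    'sips',
    '-Z', '800',
    '-s', 'format', 'jpeg',
    '-s', 'formatOptions', '75',
    inputPath,
    '--out', outputPath,
  ];
}

// Mirrors the handler's fallback: return the JPEG when optimization succeeded, else the original PNG.
function selectImage(optimizeResult: CommandResult, pngPath: string, jpgPath: string): SelectedImage {
  return optimizeResult.success
    ? { path: jpgPath, mimeType: 'image/jpeg' }
    : { path: pngPath, mimeType: 'image/png' };
}

const args = buildOptimizeArgs('/tmp/shot.png', '/tmp/shot.jpg');
const fallback = selectImage({ success: false, error: 'sips failed' }, '/tmp/shot.png', '/tmp/shot.jpg');
console.log(args.join(' '));
console.log(fallback.mimeType);
```

Keeping these decisions separate from command execution is the same idea the handler applies by taking its executor and file system as parameters: the logic can be exercised with stand-in values.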
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the tool captures screenshots for visual verification, implying a read-only operation that produces an image. However, it lacks details on permissions, output format (e.g., image type, size), side effects, or error conditions. For a tool with no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
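
To make the output format concrete: per the implementation reference, the handler returns a single image content item (JPEG after successful optimization, PNG on fallback) with `isError: false`. The sketch below shows the shape a caller might expect, assuming `createImageContent` produces the standard MCP image content object (`type`, `data`, `mimeType`). The source does not show that helper, so `toImageContent` and `describeResult` are hypothetical stand-ins.

```typescript
// Assumed shape of an MCP image content item; createImageContent itself is not shown in the source.
type ImageContent = { type: 'image'; data: string; mimeType: string };
type ToolResult = { content: ImageContent[]; isError: boolean };

// Hypothetical stand-in for createImageContent.
function toImageContent(base64Data: string, mimeType: string): ImageContent {
  return { type: 'image', data: base64Data, mimeType };
}

// Summarizes a result the way a caller might before using the image.
function describeResult(result: ToolResult): string {
  if (result.isError || result.content.length === 0) {
    return 'no image';
  }
  const image = result.content[0];
  return `${image.mimeType}, ${image.data.length} base64 chars`;
}

const example: ToolResult = {
  content: [toImageContent('aGVsbG8=', 'image/jpeg')],
  isError: false,
};
console.log(describeResult(example));
```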

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and well-structured: two sentences that efficiently convey the purpose and a key usage guideline. Every word serves a clear purpose, with no wasted text, making it easy to parse and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving screenshot capture with a required parameter), lack of annotations, no output schema, and low schema description coverage, the description is incomplete. It misses critical details like parameter explanation, output format, and behavioral constraints, making it inadequate for full agent understanding without external context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 1 parameter (simulatorUuid) with 0% description coverage, meaning the schema provides no semantic context. The description does not mention this parameter at all, failing to explain what simulatorUuid is, why it's required, or how it relates to screenshot capture. This leaves the parameter's meaning undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
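
As an illustration of what the missing parameter documentation could convey, the sketch below checks that a value has the textual form of a simulator UUID before it is passed on. The server's own schema validates this with Zod's `z.string().uuid(...)`; the regular expression here is a loose approximation written for this example, not Zod's exact rule, and `isSimulatorUuid` and the sample identifier are hypothetical.

```typescript
// Loose 8-4-4-4-12 hexadecimal pattern; an approximation, not the rule Zod applies.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: true when the value has the textual form of a UUID.
function isSimulatorUuid(value: string): boolean {
  return UUID_PATTERN.test(value);
}

// Made-up identifier in UUID format; not a real device.
const sample = 'A1B2C3D4-E5F6-4A7B-8C9D-0E1F2A3B4C5D';
console.log(isSimulatorUuid(sample));
console.log(isSimulatorUuid('iPhone 16'));
```

A device name such as "iPhone 16" fails this check, which is the kind of constraint a parameter description could state up front. Simulator UUIDs can be listed with `xcrun simctl list devices`.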

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Captures screenshot for visual verification.' It specifies the action (captures) and resource (screenshot) with a clear goal (visual verification). However, it doesn't explicitly differentiate from all sibling tools beyond the one mentioned alternative (describe_ui), leaving some ambiguity about its uniqueness in the broader context of the toolset.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs. alternatives: 'For UI coordinates, use describe_ui instead (don't determine coordinates from screenshots).' This clearly defines a specific exclusion case and names the alternative tool, helping the agent avoid misuse.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/getsentry/XcodeBuildMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.