simctl-screenshot-inline

Capture optimized screenshots with inline base64 encoding for direct MCP response transmission.

What it does

Captures simulator screenshots and returns them as base64-encoded images directly in the MCP response. Automatically optimizes images for token efficiency with tile-aligned resizing and WebP/JPEG compression. Includes interactive element detection and coordinate transforms.

Parameters

udid (string, optional): Simulator UDID (auto-detects booted device if omitted)
size (string, optional): Screenshot size - half, full, quarter, thumb (default: half)
appName (string, optional): App name for semantic context
screenName (string, optional): Screen/view name for semantic context
state (string, optional): UI state for semantic context
enableCoordinateCaching (boolean, optional): Enable view fingerprinting for coordinate caching

Screenshot Size Optimization

Automatically optimizes screenshots for token efficiency:

half (default): 256×512 pixels, 1 tile, ~170 tokens (50% savings)
full: Native resolution, 2 tiles, ~340 tokens
quarter: 128×256 pixels, 1 tile, ~170 tokens
thumb: 128×128 pixels, 1 tile, ~170 tokens

Automatic Optimization Process

Capture: Screenshot taken at native resolution
Resize: Automatically resized to tile-aligned dimensions (unless size='full')
Compress: Converted to WebP format at 60% quality (falls back to JPEG if unavailable)
Encode: Base64-encoded for inline MCP response transmission
Extract: Interactive elements detected from accessibility tree
Transform: Coordinate mapping provided for resized screenshots
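
The WebP-to-JPEG fallback in the Compress step can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the Encoder type and compressWithFallback function are hypothetical names, and the encoders are passed in as plain functions so the sketch does not depend on any particular image library.

```typescript
// Hypothetical encoder signature: raw pixels in, compressed bytes out.
type Encoder = (pixels: Uint8Array, quality: number) => Uint8Array;

interface CompressedImage {
  format: 'webp' | 'jpeg';
  data: Uint8Array;
}

// Try WebP first; fall back to JPEG if WebP is unavailable or throws.
function compressWithFallback(
  pixels: Uint8Array,
  encoders: { webp?: Encoder; jpeg: Encoder },
  quality = 60 // the 60% quality setting described above
): CompressedImage {
  if (encoders.webp) {
    try {
      return { format: 'webp', data: encoders.webp(pixels, quality) };
    } catch {
      // WebP encoding failed; fall through to JPEG.
    }
  }
  return { format: 'jpeg', data: encoders.jpeg(pixels, quality) };
}
```

Passing the encoders as arguments keeps the fallback decision separate from the encoding itself, so the same logic applies whichever image library supplies them.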

Returns

MCP response with:

Base64-encoded optimized image (inline)
Screenshot optimization metadata (dimensions, tokens, savings)
Interactive elements with coordinates and properties
Coordinate transform for mapping screenshot to device coordinates
View fingerprint (if enableCoordinateCaching is true)
Semantic metadata (if provided)

Examples

Simple optimized screenshot (256×512)

await simctlScreenshotInlineTool({
  udid: 'device-123'
})

Full resolution screenshot

await simctlScreenshotInlineTool({
  udid: 'device-123',
  size: 'full'
})

Screenshot with semantic context

await simctlScreenshotInlineTool({
  udid: 'device-123',
  appName: 'MyApp',
  screenName: 'LoginScreen',
  state: 'Empty'
})

Screenshot with coordinate caching enabled

await simctlScreenshotInlineTool({
  udid: 'device-123',
  enableCoordinateCaching: true
})

Interactive Element Detection

Automatically extracts interactive elements from the accessibility tree:

Element type (Button, TextField, etc.)
Label and identifier
Bounds (x, y, width, height)
Tappability status

Output is limited to 20 elements to avoid token overflow. Only elements that have bounds and are hittable are included.
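
The filtering rule can be sketched as below. The UIElement shape and the selectInteractiveElements name are assumptions made for illustration, based on the fields listed above. The ordering that decides which 20 elements survive is not specified here, so this sketch simply keeps the input order.

```typescript
// Hypothetical element shape, based on the fields listed above.
interface UIElement {
  type: string;
  label?: string;
  identifier?: string;
  bounds?: { x: number; y: number; width: number; height: number };
  hittable?: boolean;
}

const MAX_ELEMENTS = 20;

// Keep only elements that have bounds and are hittable, then cap the list.
function selectInteractiveElements(elements: UIElement[]): UIElement[] {
  return elements
    .filter((el) => el.bounds !== undefined && el.hittable === true)
    .slice(0, MAX_ELEMENTS);
}
```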

Coordinate Transform

When a screenshot is resized (size ≠ 'full'), the response includes the scale factors needed to map screenshot coordinates back to device coordinates.

Automatic Transformation (Recommended for Agents)

Use the coordinateTransformHelper field in the response with idb-ui-tap:

Identify element coordinates visually from the screenshot
Call idb-ui-tap with applyScreenshotScale: true plus the screenshotScaleX and screenshotScaleY scale factors
The tool automatically transforms screenshot coordinates to device coordinates

Example:

idb-ui-tap {
  x: 256,              // Screenshot coordinate
  y: 512,              // Screenshot coordinate
  applyScreenshotScale: true,
  screenshotScaleX: 1.67,
  screenshotScaleY: 1.66
}
// Tool automatically calculates: deviceX = 256 * 1.67, deviceY = 512 * 1.66

Manual Transformation (For Reference)

If you are not using automatic transformation, apply the scale factors from the response yourself:

scaleX: Multiply screenshot X coordinates by this to get device coordinates
scaleY: Multiply screenshot Y coordinates by this to get device coordinates
coordinateTransform.guidance: Human-readable instructions

Important: Most agents should use the automatic transformation via idb-ui-tap's applyScreenshotScale parameter. Manual calculation is provided for reference only.
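
For reference, the manual calculation amounts to one multiplication per axis. The sketch below assumes the scale factors are the ratio of device size to screenshot size and that results are rounded to whole device points. Neither detail is confirmed by this document, and computeScale and toDeviceCoordinates are hypothetical helper names.

```typescript
interface Point { x: number; y: number; }
interface Size { width: number; height: number; }
interface ScaleFactors { scaleX: number; scaleY: number; }

// Assumption: scale factor = device dimension / screenshot dimension.
function computeScale(device: Size, screenshot: Size): ScaleFactors {
  return {
    scaleX: device.width / screenshot.width,
    scaleY: device.height / screenshot.height,
  };
}

// Multiply screenshot coordinates by the scale factors to get device coordinates.
// Rounding to integers is an assumption of this sketch.
function toDeviceCoordinates(p: Point, s: ScaleFactors): Point {
  return { x: Math.round(p.x * s.scaleX), y: Math.round(p.y * s.scaleY) };
}

// Example with made-up dimensions: a 512×1024 device captured at 256×512.
const scale = computeScale({ width: 512, height: 1024 }, { width: 256, height: 512 });
const devicePoint = toDeviceCoordinates({ x: 100, y: 200 }, scale);
console.log(devicePoint); // { x: 200, y: 400 }
```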

View Fingerprinting (Opt-in)

When enableCoordinateCaching is true, computes a structural hash of the view:

elementStructureHash: SHA-256 hash of element hierarchy
cacheable: Whether view is stable enough to cache coordinates
elementCount: Number of elements in hierarchy
orientation: Device orientation

Excludes loading states, animations, and dynamic content from caching.
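
A structural hash of this kind could be computed roughly as follows, using Node's built-in crypto module. Which element properties actually feed the tool's hash is not described here. The sketch assumes element type and bounds only, so that label text changes do not alter the fingerprint. FingerprintElement and computeStructureHash are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical input shape: only structural properties, no labels or values.
interface FingerprintElement {
  type: string;
  bounds: { x: number; y: number; width: number; height: number };
}

// Serialize the structure in a stable order, then hash it with SHA-256.
function computeStructureHash(elements: FingerprintElement[]): string {
  const structure = elements
    .map((el) => {
      const b = el.bounds;
      return `${el.type}:${b.x},${b.y},${b.width},${b.height}`;
    })
    .join('|');
  return createHash('sha256').update(structure).digest('hex');
}
```

Two views with the same element types and positions produce the same hex digest, which is what makes the digest usable as a cache key.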

Common Use Cases

Visual analysis: LLM-based screenshot analysis with token optimization
UI automation: Detect interactive elements and get tap coordinates
Bug reporting: Capture and transmit screenshots inline
Test documentation: Screenshot with semantic context for test tracking
Coordinate caching: Store element coordinates for repeated interactions

Token Efficiency

Screenshots are optimized for minimal token usage:

Default (half): ~170 tokens (50% savings vs full)
Full: ~340 tokens (native resolution)
Quarter: ~170 tokens (50% savings vs full; same single tile as half)
Thumb: ~170 tokens (smallest dimensions, for thumbnails)

Token counts are estimates based on Claude's image processing (170 tokens per 512×512 tile).
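
The per-tile arithmetic can be reproduced with a short estimator. This is a sketch of the stated rule only (170 tokens per 512×512 tile); the function names are hypothetical. The 2-tile figure for full is taken from the list above rather than derived, since this section does not say how native-resolution images map to tiles.

```typescript
const TILE_SIZE = 512;
const TOKENS_PER_TILE = 170;

// Number of 512×512 tiles needed to cover an image of the given size.
function estimateTiles(width: number, height: number): number {
  return Math.ceil(width / TILE_SIZE) * Math.ceil(height / TILE_SIZE);
}

// Estimated token cost under the per-tile rule stated above.
function estimateTokens(width: number, height: number): number {
  return estimateTiles(width, height) * TOKENS_PER_TILE;
}

// Percentage saved relative to a baseline token count.
function savingsPercent(tokens: number, baselineTokens: number): number {
  return Math.round((1 - tokens / baselineTokens) * 100);
}

const halfTokens = estimateTokens(256, 512); // 1 tile
const fullTokens = 2 * TOKENS_PER_TILE;      // 2 tiles, as listed for 'full'
console.log(halfTokens, savingsPercent(halfTokens, fullTokens)); // 170 50
```

Because half, quarter, and thumb all fit within a single tile, they share the same estimated cost under this rule.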

Important Notes

Auto-detection: If udid is omitted, uses the currently booted device
Temp files: Uses temp directory for processing, auto-cleans up
WebP fallback: Attempts WebP compression, falls back to JPEG if unavailable
Element extraction: Requires app to be running with accessibility enabled
Coordinate accuracy: Transform provides pixel-perfect coordinate mapping

Error Handling

Simulator not found: Validates simulator exists in cache
Simulator not booted: Indicates simulator must be booted first
Capture failure: Reports if screenshot capture fails
Optimization failure: Falls back to original if optimization fails
Element extraction: Gracefully degrades if accessibility is unavailable

Next Steps After Screenshot

Analyze visually: LLM processes inline image for visual analysis
Interact with elements: Use coordinates from interactiveElements
Tap elements: Apply coordinate transform if resized, then use simctl-tap
Query specific elements: Use simctl-query-ui for targeted element discovery
Cache coordinates: Store fingerprint for reuse on identical views
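
The caching step might look like the sketch below: tap targets are stored under the view's elementStructureHash and reused only when the view is marked cacheable. CoordinateCache and TapTargets are hypothetical names for illustration, and the tool's own caching behavior may differ.

```typescript
// Subset of the fingerprint fields described in this document.
interface Fingerprint {
  elementStructureHash: string;
  cacheable: boolean;
}

// Hypothetical cache value: device coordinates keyed by element label.
type TapTargets = Record<string, { x: number; y: number }>;

class CoordinateCache {
  private entries = new Map<string, TapTargets>();

  // Store targets only for views flagged as stable enough to cache.
  store(fp: Fingerprint, targets: TapTargets): boolean {
    if (!fp.cacheable) return false;
    this.entries.set(fp.elementStructureHash, targets);
    return true;
  }

  // Return cached targets for an identical, cacheable view, if any.
  lookup(fp: Fingerprint): TapTargets | undefined {
    return fp.cacheable ? this.entries.get(fp.elementStructureHash) : undefined;
  }
}
```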

Comparison with simctl-io

Feature        simctl-screenshot-inline   simctl-io
Returns        Base64 inline              File path
Optimization   Automatic                  Manual
Elements       Auto-detected              Not included
Transform      Included                   Included
Use case       MCP responses              File storage
Token usage    Optimized                  Depends on size

XC-MCP: Xcode CLI wrapper
