snapshot

Retrieve a tree-structured view of a window's UI, listing interactive elements and their IDs for direct manipulation.

Instructions

Return a tree-structured view of a window's UI.

The primary orientation tool in no-vision mode. In vision mode, use it
as a structural complement to screenshot(): it is cheaper than a
screenshot and gives element IDs directly. Default behaviour: pick
the currently active window, walk its accessibility tree, prune
anonymous structural wrappers, preserve semantic containers (dialogs,
menus, lists, etc.), and emit an indented text view with one line per
interactive element.
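
The default behaviour described above can be illustrated with a minimal sketch. It is not the tool's implementation: the `Node` type, the role sets, and the output line format are all assumptions made for illustration, since the description does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical role sets; the tool's real lists are not given in the description.
INTERACTIVE_ROLES = {"button", "link", "text_field", "checkbox", "menu_item"}
CONTAINER_ROLES = {"window", "dialog", "menu", "list"}


@dataclass
class Node:
    """Hypothetical accessibility-tree node."""
    role: str
    name: str = ""
    element_id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0, indent: int = 0,
           max_depth: int = 20, all_elements: bool = False) -> List[str]:
    """Walk the tree and return one indented line per kept element."""
    if depth > max_depth:
        return []
    keep = (
        node.role in INTERACTIVE_ROLES
        or node.role in CONTAINER_ROLES
        or (all_elements and bool(node.name))
    )
    lines: List[str] = []
    child_indent = indent
    if keep:
        label = f'{node.role} "{node.name}"' if node.name else node.role
        if node.element_id:
            label += f" [{node.element_id}]"
        lines.append("  " * indent + label)
        child_indent = indent + 1
    # Children of a pruned wrapper are hoisted to the wrapper's indent level.
    for child in node.children:
        lines.extend(render(child, depth + 1, child_indent, max_depth, all_elements))
    return lines


# Sample tree: an unnamed "group" wrapper sits between the window and a dialog.
window = Node("window", "Settings", children=[
    Node("group", children=[
        Node("dialog", "Preferences", "d1", children=[
            Node("button", "OK", "b1"),
            Node("label", "Theme"),
        ]),
    ]),
])

print("\n".join(render(window)))
```

In this sketch the unnamed `group` wrapper is dropped and its children move up one indent level, while the non-interactive `label` only appears when `all_elements=True`.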

Args:
    app: Snapshot the given app's active window (or first window).
        Case-insensitive.
    window_id: Snapshot a specific window by ID.
    element_id: Start the tree walk from a specific element instead
        of the window root.  Use to dig into a container whose
        children were not visible in a previous (truncated) snapshot.
        To read the text content of a container, use read_text()
        rather than snapshot().
    all_elements: If true, include every named element, not just
        those with interactive or container roles.  Use when the
        default filter is hiding something.
    max_depth: Maximum tree depth to walk.  Defaults to the
        configured value (typically 20).  Decrease for a faster
        overview of a large window.
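
As a rough illustration of how these arguments combine, the sketch below builds an MCP `tools/call` request for this tool. The helper name `build_snapshot_call` and the example values are hypothetical, and the value types for `window_id` and `element_id` are assumptions, since the schema does not document them. Unset arguments are omitted so the server's defaults apply.

```python
import json
from typing import Any, Dict, Optional


def build_snapshot_call(request_id: int,
                        app: Optional[str] = None,
                        window_id: Any = None,
                        element_id: Any = None,
                        all_elements: Optional[bool] = None,
                        max_depth: Optional[int] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for the snapshot tool."""
    candidates = {
        "app": app,
        "window_id": window_id,
        "element_id": element_id,
        "all_elements": all_elements,
        "max_depth": max_depth,
    }
    # Leave out anything the caller did not set, so server defaults apply.
    arguments = {key: value for key, value in candidates.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "snapshot", "arguments": arguments},
    }


# A shallow overview of one app's window (the app name is an example value).
overview = build_snapshot_call(1, app="firefox", max_depth=5)

# A follow-up that starts from a container seen in a truncated snapshot
# ("e42" is a made-up element ID).
drill_down = build_snapshot_call(2, element_id="e42")

print(json.dumps(overview, indent=2))
```

The second request mirrors the `element_id` guidance: take a shallow snapshot first, then restart the walk from the container whose children were cut off.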

Input Schema

Name          Required  Description  Default
app           No
window_id     No
element_id    No
all_elements  No
max_depth     No

Output Schema

Name    Required  Description  Default
result  Yes

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes default behavior (active window, accessibility tree walk, pruning anonymous wrappers, preserving semantic containers, indented text output) and all parameters' effects. No annotations provided, so description fully discloses behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with a clear purpose statement followed by Args section. Each sentence adds value, no redundancy. Efficiently conveys all necessary information without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, explanation of return values is not needed. Description covers all parameters, usage context, and alternatives, making it complete for a tool with 5 parameters and no annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but description explains all 5 parameters with details: case-insensitivity for app, usage of window_id, element_id for digging into containers, all_elements for including non-interactive, and max_depth for controlling depth. Adds significant meaning beyond bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a tree-structured view of a window's UI, and distinguishes it from siblings like screenshot (structural complement) and read_text (for text content).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says it's the primary orient tool in no-vision mode and a structural complement to screenshot in vision mode. Also advises using read_text for container text instead of snapshot, providing clear when-to-use and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Touchpoint-Labs/Touchpoint'
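
The same request can be sketched in Python with the standard library. The helper names are hypothetical, the owner/name path layout is inferred from the single URL above, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build the GET request for one server entry without sending it."""
    return urllib.request.Request(f"{BASE_URL}/{owner}/{name}", method="GET")


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_server_request("Touchpoint-Labs", "Touchpoint")
print(request.get_method(), request.full_url)
```

Calling `fetch_server("Touchpoint-Labs", "Touchpoint")` would perform the network request; it is kept separate so the request can be inspected first.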

If you have feedback or need assistance with the MCP directory API, please join our Discord server.