Skip to main content
Glama

desktop_state

Verify desktop state by retrieving focused window, element, cursor position, and modal dialogs. Use after each action to confirm state without screenshots.

Instructions

Purpose: Read-only observation of the current desktop state. Returns focused window/element, modal flag, attention signal from Auto Perception. Phase 4 absorbs former get_active_window / get_cursor_position / get_screen_info / get_document_state via include* flags. Details: Always returns: focusedWindow (title, hwnd, processName), focusedElement (name, type, value, automationId), cursorPos {x,y}, cursorOverElement (name, type), cursorOverWindow, hasModal (boolean), pageState ('ready'|'loading'|'dialog'), attention, visibleWindows count. Optional fields (default off): includeCursor:true → cursor {x,y,monitorId} (richer than cursorPos). includeScreen:true → screen {virtualScreen, displays[], displayCount, primaryIndex}. includeDocument:true → document {url, title, readyState, selection, scroll, viewport} via CDP (silently omitted on non-Chromium foreground). includeSessionContext:true (or include:['sessionContext']) → sessionContext {origin, consoleSessionId, sessionLabel, sessionState, ownWinStation} for Terminal Services session classification (ADR-017, observability-only). Chromium: cursorOverElement is null (UIA sparse); focusedElement may fall back to CDP document.activeElement; hints.focusedElementSource reports which path produced the row ('view' = engine-perception latest_focus, 'uia' = direct UIA query, 'cdp' = document.activeElement). Does NOT enumerate descendants — use desktop_discover for actionable entity list and window list. Prefer: Use after each action to confirm state. Cheapest observation tool — cheaper than any screenshot. attention='ok' means safe to proceed; other values require recovery (see suggest[]). Set include* flags only when you need the extra data (each adds one syscall or CDP round-trip). Caveats: Cannot detect non-UIA elements (custom-drawn UIs, game overlays). hasModal only detects modal dialogs exposed via UIA — browser alert/confirm dialogs may not appear here. includeDocument requires browser_open (CDP active); silently omitted otherwise with hints.documentUnavailable.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
includeCursorNoWhen true, add a richer `cursor` field with monitor index alongside the lightweight `cursorPos`. Phase 4: absorbs former get_cursor_position. Default false.
includeScreenNoWhen true, add a `screen` field with all connected display info (resolution, position, DPI, scale). Phase 4: absorbs former get_screen_info. Default false. Use the displayId values returned here in screenshot / window_dock(action='dock').
includeDocumentNoWhen true, add a `document` field with the focused Chrome tab's url, title, readyState, selection, and scroll position via CDP. Phase 4: absorbs former get_document_state. Default false. Requires browser_open (CDP active); silently omitted on non-Chromium foreground.
includeSessionContextNoWhen true, add a `sessionContext` field with the Terminal Services session classification (origin, consoleSessionId, sessionLabel: 'console'|'rdp'|'other', sessionState: 'active'|'connected'|'disconnected'|'locked'|'unknown', ownWinStation). Default false. Equivalent to `include: ['sessionContext']`. Per ADR-017: observability-only — does not gate input. `sessionState: 'locked'` is a heuristic (active + foreground=null + previous sample within 60s saw a non-null foreground); treat it as a generic input-pause signal — it can also fire on secure-desktop transitions (UAC prompt, Credential UI), where the user-visible state is not strictly 'locked' but input is equally unavailable to this session.
portNoCDP port for includeDocument (default 9222).
tabIdNoOptional CDP tab id for includeDocument; omit for the focused tab.
includeNoOptional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients).
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: cannot detect non-UIA elements, hasModal limitations, includeDocument silently omitted on non-Chromium, Chromium specifics for cursorOverElement and focusedElement. It also mentions hints like focusedElementSource and documentUnavailable, and caveats about browser dialogs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: Purpose, Details, Prefer, Caveats. Front-loaded with the core purpose. Every sentence provides useful information; no redundancy. Length is justified by the complexity of the tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, multiple optional includes, no output schema), the description is complete. It explains all return fields, optional enrichments, Chromium fallbacks, and limitations. No missing information that would hinder correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the purpose of each include flag (e.g., 'absorbs former get_cursor_position') and providing context for the envelope option. It clarifies the relationship between includeCursor and cursorPos, and notes default behavior. Slight improvement over schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for read-only observation of current desktop state, listing specific outputs like focused window, element, cursor, modal flag, etc. It distinguishes itself from sibling desktop_discover by noting it does not enumerate descendants, and mentions it absorbs several former tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance: use after each action to confirm state, and use desktop_discover for actionable entity list. It also advises setting include* flags only when needed, highlighting cost implications. Clearly communicates when to use this tool vs alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server