perception_register
Register a live state tracker for a window or browser tab to verify target identity, focus, and safety before performing actions like typing or clicking, reducing screenshot round trips.
Instructions
Purpose: Register a perception lens, a lightweight live state tracker for one window or browser tab. Use it before repeated actions so later tool calls can verify target identity, focus, readiness, modal obstruction, and click safety without taking another screenshot.

Details: Returns a lensId that can be passed to action tools such as keyboard_type, keyboard_press, mouse_click, browser_click_element, and browser_navigate. When a tool receives lensId, desktop-touch refreshes the tracked state, evaluates safety guards, and attaches a compact post.perception envelope to the response. The envelope reports attention, guard status, recent changes, and the latest known target state, reducing get_context/screenshot round trips.

Prefer: Use for multi-step workflows on the same app window or browser tab, especially before typing, clicking coordinates, navigating browser tabs, or acting after focus may have changed. It is most useful when mistakes would be costly, such as typing into the wrong window or clicking stale coordinates.

Caveats: A lens is not a visual recognition model. It tracks structured state from Win32, CDP, and optional UIA sensors. safe.clickCoordinates checks window bounds, not pixel-level occlusion. browserTab lenses require Chrome/Edge with --remote-debugging-port=9222. If attention is dirty, stale, settling, guard_failed, or identity_changed, follow the suggested action before continuing. Maximum 16 active lenses are kept; old lenses may be evicted.

Examples:
perception_register({name:'editor', target:{kind:'window', match:{titleIncludes:'Visual Studio Code'}}}) → {lensId:'perc-1'}
keyboard_type({windowTitle:'Visual Studio Code', text:'hello', lensId:'perc-1'}) → response includes post.perception
perception_read({lensId:'perc-1'}) → force a fresh envelope when attention is dirty/stale
perception_forget({lensId:'perc-1'}) → release tracking when the workflow is done
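
The attention states listed under Caveats suggest a small decision step after each action. The following TypeScript is a minimal sketch of such a step, not the tool's own logic: the envelope field names, the "ok" state, and the mapping from each state to a next step are assumptions made for illustration. When the real envelope carries a suggested action, that action takes precedence over this fallback.

```typescript
// Hypothetical envelope shape. Only the five non-"ok" attention values are
// named in the description above; "ok" and the field names are assumptions.
type Attention =
  | "ok"
  | "dirty"
  | "stale"
  | "settling"
  | "guard_failed"
  | "identity_changed";

interface PerceptionEnvelope {
  attention: Attention;
  suggestedAction?: string; // assumed field name
}

type NextStep = "proceed" | "reread" | "wait" | "stop";

// Fallback policy (an assumption, not documented behavior):
// - dirty/stale: call perception_read to force a fresh envelope
// - settling: wait briefly, then re-check
// - guard_failed/identity_changed: stop and re-verify the target
function nextStep(envelope: PerceptionEnvelope): NextStep {
  switch (envelope.attention) {
    case "ok":
      return "proceed";
    case "dirty":
    case "stale":
      return "reread";
    case "settling":
      return "wait";
    default:
      return "stop";
  }
}
```

A caller would run nextStep on the post.perception value of each response and issue the next keyboard or mouse action only when the result is "proceed".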
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable name for this lens (e.g. 'target-editor'). Helps identify it in perception_list. | |
| target | Yes | Target entity to track. 'window' targets use Win32; 'browserTab' targets use CDP. | |
| maintain | No | Fluents to keep alive. Defaults to all fluents; irrelevant kinds for the target type are silently ignored (e.g., browser.* fluents are skipped on window lenses). | |
| guards | No | Guards to evaluate before actions that pass this lensId. Defaults to all guards. Remove guards you don't need to reduce false blocks. | |
| guardPolicy | No | How guard failures are handled. 'block' (default) returns {ok:false, code:'GuardFailed'}. 'warn' allows the action through and sets attention:'guard_failed' in the envelope. | block |
| maxEnvelopeTokens | No | Maximum token budget for the perception envelope attached to tool responses. Fields are dropped in priority order when the budget is exceeded. | |
| salience | No | Lens salience hint. 'critical' lenses are refreshed more eagerly (future use). | normal |
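
The table above can be read as a typed argument object. The following TypeScript is a rough sketch under stated assumptions: the type names and the helper withDocumentedDefaults are hypothetical, only the window target shape shown in the examples is modeled, and maintain and guards are left out because their item names are not listed here. The two defaults it fills in (guardPolicy 'block', salience 'normal') are the ones documented in the table.

```typescript
type GuardPolicy = "block" | "warn";
type Salience = "normal" | "critical"; // other values may exist; only these are documented

// Window target shape taken from the example call; browserTab targets are
// omitted because their match fields are not described in this section.
interface WindowTarget {
  kind: "window";
  match: { titleIncludes: string };
}

interface RegisterArgs {
  name: string;
  target: WindowTarget;
  guardPolicy?: GuardPolicy;
  maxEnvelopeTokens?: number;
  salience?: Salience;
}

// Hypothetical helper: rejects an empty name and makes the documented
// defaults explicit so the caller can see the effective configuration.
function withDocumentedDefaults(args: RegisterArgs): RegisterArgs {
  if (args.name.trim() === "") {
    throw new Error("name is required");
  }
  return {
    ...args,
    guardPolicy: args.guardPolicy ?? "block",
    salience: args.salience ?? "normal",
  };
}

const editorLens = withDocumentedDefaults({
  name: "editor",
  target: { kind: "window", match: { titleIncludes: "Visual Studio Code" } },
});
```

Spelling out the defaults is optional, since the tool applies them itself; the sketch does so only to make the effective guard policy visible before the lens is registered.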