Schema | glovebox-mcp

glovebox-mcp


Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
`DISPLAY`	Yes	The X11 display to control (e.g., ':1' for the sandbox).	(none)
`GLOVEBOX_VISION`	No	Vision backend: 'none', 'basic', or 'local'.	local
`GLOVEBOX_HOST_DISPLAY`	No	Host display on which the windows of new instances appear.	:0
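The configuration table gives only the defaults. The sketch below shows how a launcher script might validate these variables before starting the server. `DISPLAY` must be set. `GLOVEBOX_VISION` falls back to `local` and must be one of the three documented backends. `GLOVEBOX_HOST_DISPLAY` falls back to `:0`. The helper name `resolve_glovebox_env` is hypothetical and is not part of glovebox-mcp.

```python
import os
from typing import Mapping, Optional

# Documented values of GLOVEBOX_VISION (default: "local").
VISION_BACKENDS = ("none", "basic", "local")


def resolve_glovebox_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical pre-flight check that applies the documented defaults.

    This is not glovebox-mcp's own code; it only mirrors the configuration table.
    """
    env = os.environ if env is None else env

    display = env.get("DISPLAY", "")
    if not display:
        # DISPLAY has no default: the server must be told which X11 display to control.
        raise ValueError("DISPLAY is required, e.g. ':1' for the sandbox")

    vision = env.get("GLOVEBOX_VISION", "local")
    if vision not in VISION_BACKENDS:
        raise ValueError(f"GLOVEBOX_VISION must be one of {VISION_BACKENDS}, got {vision!r}")

    # Host display on which the windows of new Xephyr instances appear.
    host_display = env.get("GLOVEBOX_HOST_DISPLAY", ":0")
    return {"DISPLAY": display, "GLOVEBOX_VISION": vision, "GLOVEBOX_HOST_DISPLAY": host_display}


# Only DISPLAY is set, so the documented defaults fill in the rest.
print(resolve_glovebox_env({"DISPLAY": ":1"}))
```

Passing an explicit mapping keeps the check testable. Called with no argument, it reads the real process environment.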

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": false }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{}

Tools

Functions exposed to the LLM to take actions

Name	Description
launch_app	Launch a GUI app in its OWN new Xephyr display/window and return its instance id. `command` is any shell command ('chromium', 'xterm', 'gimp', 'inkscape file.svg', …). Chromium automatically gets X11 flags, a per-instance profile, downloads routed to files//, and a CDP debug port (needed by upload_file). Other GTK apps get D-Bus isolation so they render on this display, not the host. Returns {"ok", "instance", "display", "name", "files_dir"}; pass that instance id to every control tool. Fails (isError) if Xephyr can't start on the host display, the command doesn't exist, or the app exits immediately. Each instance has its own cursor, so separate agents can drive separate instances in parallel; never share one instance between agents. The window appears on GLOVEBOX_HOST_DISPLAY (default :0). Give the app a moment to draw (settle_ms/wait_ms) before the first screenshot.
list_instances	List all instances (running app windows): id, display, name, command, up. Returns {"ok", "count", "instances": [{instance, display, name, command, up}, …]}. Also probes displays :1–:12 for sandboxes started outside this server (e.g. by start-display.sh), so it finds instances this process didn't launch (their name/command show as unknown). Use it to discover what you can drive and to verify an instance is up before acting.
close_instance	Close an instance: kill its app and its Xephyr display, then verify that it actually went down. Returns {"ok", "instance", "detail"} only after the display stops responding. Fails (isError) if the instance isn't running, if it survives SIGTERM, or if it is instance 1: the main start-display.sh sandbox is protected (stop it from the host with `pkill Xephyr`).
get_screen_size	Get the screen size of an instance, plus the active vision backend and server version. Returns {"ok", "instance", "width", "height", "vision", "version"}. Call it once before any coordinate math: coordinates are absolute pixels with a top-left origin, within THIS instance's display. "vision" tells you how to ground: 'basic'/'local' → parse_screen + click_element; 'none' → screenshot + reason about pixels yourself. Fails (isError) if the instance is down.
screenshot	Take a screenshot of an instance's window (PNG image, full display). This is the ONLY way to see the screen, so always look before you act. For an element list with coordinates, use parse_screen (basic/local vision). Prefer observe='screenshot' on action tools to act and see the result in one call. Fails (isError) if the instance is down.
parse_screen	Detect on-screen elements and return them with ids and pixel centers (vision grounding). Returns {"ok", "vision", "screen": [W,H], "annotated", "count", "truncated", "elements": [{id, type, label, interactive, center: [x,y]}, …]}; then call click_element(id). Also saves a numbered Set-of-Mark image to /tmp/glovebox_annotated_.png. The backend is set by GLOVEBOX_VISION: 'local' (OmniParser; text + icons; ~2 s on GPU after a ~6 s model load on the first call), 'basic' (tesseract; text only), or 'none' (returns a note telling you to screenshot and reason about pixels yourself; this is normal, not an error). Elements are capped at 300 per call ("truncated": true when more were detected). Ids are only valid until the screen changes, so re-parse after navigation.
click	Click at absolute pixel coords (top-left origin) in an instance. Returns {"ok", "action", "instance", "detail"}. button: 1=left, 2=middle, 3=right. `observe`='screenshot'|'parse' additionally returns the resulting screen state in the SAME call (saving a round-trip); `settle_ms` (max 5000) waits for the UI to update first: use 400–1500 after anything that navigates or submits. Fails (isError) if the instance is down.
drag	Drag (mouse-down → move → up) from (x1,y1) to (x2,y2), for drawing, selecting, or moving sliders. Returns {"ok", "action", "instance", "detail"}. `observe`/`settle_ms`: see click(). Fails (isError) if the instance is down.
double_click	Double-click at absolute pixel coords in an instance (e.g. to open a file icon or select a word). Returns {"ok", "action", "instance", "detail"}. `observe`/`settle_ms`: see click(). Fails (isError) if the instance is down.
click_element	Click an element by id from that instance's most recent parse_screen (no coordinate guessing). Returns {"ok", "action", "instance", "detail", "element", "center"}. If the id is unknown, fails (isError) with the fix named: run parse_screen(instance) first. Ids go stale when the screen changes: after navigation, re-parse before clicking. `observe`/`settle_ms`: see click().
type_text	Type into the focused field (Unicode-safe); click the field first to focus it. Returns {"ok", "action", "instance", "detail", "method": "xdotool"|"clipboard"}. Pure ASCII is typed via xdotool. Text with any non-ASCII character (č/š/ž …) is inserted via the clipboard + ctrl+v, because some toolkits (e.g. Inkscape's GTK canvas) silently drop xdotool's synthetic Unicode keystrokes, even though browsers accept them. If xclip is missing and the text contains non-ASCII characters, fails (isError) rather than silently losing characters. `observe`/`settle_ms`: see click().
upload_file	Attach a local file to a page's file input via the Chrome DevTools Protocol (CDP). Use this for ALL browser uploads (logos, images, docs). The sandbox's native GTK file picker is invisible to us AND hangs the renderer, so never click an upload button expecting a dialog; call this instead. Works on Chromium started by launch_app or start-display.sh (they open a per-instance --remote-debugging-port, 9222+N). `selector` targets the file input on the MAIN page (default: the first file input). If the site only inserts the file input after you click its 'upload' control, click that control first (it won't open a dialog we can see, but it wires up the input), then call this. Returns {"ok", "action", "instance", "detail"}; fails (isError) with the reason if the file is missing, no CDP target answers, or the selector matches nothing. NATIVE (non-browser) apps: use open_file() or drive the app's own Open dialog, which IS visible here. `observe`/`settle_ms`: see click().
open_file	Open a local file in an app ON an existing instance's display (NOT a new instance). If `app` is given, runs `<app> <filepath>` there (e.g. app='gimp'); otherwise tries xdg-open. Relative paths resolve against the instance's staging folder files// (see list_files). Returns {"ok", "action", "instance", "opened", "app"}; fails (isError) if the file or the app doesn't exist. The app is started asynchronously, so take a screenshot after a moment to confirm it drew. For BROWSER uploads, use upload_file() instead. GTK apps get the same X11 + D-Bus handling as launch_app (so they render on this display, not the host).
list_files	List the instance's staging folder (files// under the install dir) and its contents. Returns {"ok", "instance", "dir", "count", "truncated", "files": […]} (capped at 200 entries, sorted). Drop files there (or reference any host path you can read) to open them in apps via open_file() or an app's Open dialog. Chromium instances started by launch_app also save downloads and 'Save as' files into this folder; call this after a download to confirm the file arrived. Browser uploads still go through upload_file().
press_keys	Press a key or combo (xdotool syntax): 'Return', 'ctrl+a', 'Tab', 'ctrl+t', 'ctrl+l', 'F5'. Returns {"ok", "action", "instance", "detail"}. Fails (isError) on an invalid keysym or a dead instance. Browser navigation: 'ctrl+l' → 'ctrl+a' → type_text(url) → 'Return'. `observe`/`settle_ms`: see click().
move_mouse	Move the mouse pointer to absolute pixel coords in an instance (no click). Returns {"ok", "action", "instance", "detail"}. Useful for hover states (menus, tooltips); follow with screenshot() to see the effect. Fails (isError) if the instance is down.
scroll	Scroll the mouse wheel at the current pointer position: positive = up, negative = down. Returns {"ok", "action", "instance", "detail"}. One unit ≈ one wheel notch (a few lines). Scrolling targets the window under the pointer, so move_mouse/click there first. Fails (isError) on amount=0 (a no-op) or a dead instance. `observe`/`settle_ms`: see click().
wait_ms	Wait (e.g. for a page to load or an app to draw), capped at 10000 ms. Returns {"ok", "action", "requested_ms", "waited_ms", "clamped"}; "clamped": true means you asked for more than the cap and only waited_ms elapsed. Prefer settle_ms on the action itself when you also want to observe the result in the same call.
status	Server + sandbox status in one read-only call; run this first, and run it again when filing bug reports. Returns {"ok", "version", "vision", "host_display", "instances": [...same as list_instances], "deps": {xdotool, xclip, xephyr, wmctrl, tesseract, omniparser_weights: true|false}}. "vision" is the grounding backend (none|basic|local); "deps" shows whether each system tool is installed and (for 'local') whether the OmniParser weights are present. No side effects.
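Each tool above is invoked through MCP's standard `tools/call` JSON-RPC 2.0 request. The sketch below builds those requests for a typical sequence: status → launch → size → look → act. These names come from the tool descriptions:

- the tool names;
- the `command`, `instance`, `button`, `observe` and `settle_ms` arguments.

Two details are assumptions for illustration: the `x`/`y` argument names for click and the instance id `2`. Check the server's `tools/list` schema before relying on them.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request for one glovebox-mcp tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


INSTANCE = 2  # hypothetical: use the "instance" value returned by launch_app

requests = [
    tool_call("status"),                          # read-only health check first
    tool_call("launch_app", command="chromium"),  # new Xephyr window + instance id
    tool_call("get_screen_size", instance=INSTANCE),
    tool_call("screenshot", instance=INSTANCE),   # always look before acting
    # Act and observe in one round-trip; settle_ms (max 5000) lets the page react.
    tool_call("click", instance=INSTANCE, x=640, y=120, button=1,
              observe="screenshot", settle_ms=800),
]

for request in requests:
    print(json.dumps(request))
```

A real client would read the `instance` value from the launch_app result rather than hard-coding it. It would then send each request only after the previous result has been inspected.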
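The get_screen_size result decides two things:

- its `vision` field picks the grounding route;
- its `width`/`height` bound every coordinate.

The sketch below assumes a client that sometimes reasons in relative positions (fractions of the screen). It routes on `vision` as described above and converts fractions to the absolute, top-left-origin pixels the tools expect. `plan_grounding` and `to_pixels` are hypothetical helpers, and the sample response is a made-up subset of the documented fields.

```python
def plan_grounding(vision: str) -> list:
    """Tool sequence suggested by get_screen_size's "vision" field."""
    if vision in ("basic", "local"):
        return ["parse_screen", "click_element"]
    if vision == "none":
        return ["screenshot", "click"]  # reason about pixels yourself
    raise ValueError(f"unknown vision backend: {vision!r}")


def to_pixels(fx: float, fy: float, width: int, height: int) -> tuple:
    """Map fractions of the screen to absolute pixels (top-left origin), clamped on-screen."""
    x = min(max(round(fx * width), 0), width - 1)
    y = min(max(round(fy * height), 0), height - 1)
    return x, y


# Made-up subset of a get_screen_size result.
size = {"width": 1280, "height": 800, "vision": "none"}
print(plan_grounding(size["vision"]))
print(to_pixels(0.5, 1.0, size["width"], size["height"]))
```

Clamping to `width - 1` / `height - 1` keeps a fraction of exactly 1.0 on the last visible pixel rather than one past the edge.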
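parse_screen returns a flat element list, and click_element needs one id from it. The sketch below picks the first interactive element whose label contains a given text and builds the follow-up call. It also warns when the list was truncated at the documented 300-element cap. The sample response is made up and trimmed. Naming click_element's argument `id` and the instance `2` are assumptions.

```python
from typing import Optional


def find_element(parsed: dict, text: str, interactive_only: bool = True) -> Optional[dict]:
    """Return the first element whose label contains `text` (case-insensitive), else None."""
    needle = text.casefold()
    for element in parsed.get("elements", []):
        if interactive_only and not element.get("interactive"):
            continue
        if needle in (element.get("label") or "").casefold():
            return element
    return None


# Made-up, trimmed parse_screen result shaped like the documented response.
parsed = {
    "ok": True, "vision": "basic", "screen": [1280, 800], "count": 3, "truncated": False,
    "elements": [
        {"id": 0, "type": "text", "label": "Sign in to continue", "interactive": False, "center": [640, 200]},
        {"id": 1, "type": "text", "label": "Email", "interactive": True, "center": [640, 300]},
        {"id": 2, "type": "text", "label": "Sign in", "interactive": True, "center": [640, 420]},
    ],
}

if parsed["truncated"]:
    print("more than 300 elements detected: the target may be missing from this list")

match = find_element(parsed, "sign in")
if match is not None:
    # Ids are only valid until the screen changes, so send this right after parsing.
    print({"name": "click_element", "arguments": {"instance": 2, "id": match["id"]}})
```

Skipping non-interactive elements avoids clicking a heading that merely contains the same words as the button.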
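The pointer and keyboard tools map naturally onto xdotool, which the status tool lists as a dependency. The sketch below is not glovebox-mcp's source. It assumes the server shells out to xdotool with `DISPLAY` set to the instance's display, and shows one plausible argv for each tool. It includes the documented scroll convention (positive = up) and the amount=0 failure. X11 delivers wheel-up as button 4 and wheel-down as button 5.

```python
import os
import subprocess


def click_argv(x: int, y: int, button: int = 1) -> list:
    # button: 1=left, 2=middle, 3=right
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def double_click_argv(x: int, y: int) -> list:
    return ["xdotool", "mousemove", str(x), str(y), "click", "--repeat", "2", "1"]


def drag_argv(x1: int, y1: int, x2: int, y2: int) -> list:
    # mouse-down -> move -> up, chained in a single xdotool invocation
    return ["xdotool", "mousemove", str(x1), str(y1), "mousedown", "1",
            "mousemove", str(x2), str(y2), "mouseup", "1"]


def move_mouse_argv(x: int, y: int) -> list:
    return ["xdotool", "mousemove", str(x), str(y)]


def scroll_argv(amount: int) -> list:
    if amount == 0:
        raise ValueError("amount=0 is a no-op")  # documented as an isError failure
    button = "4" if amount > 0 else "5"  # positive = up (button 4), negative = down (5)
    return ["xdotool", "click", "--repeat", str(abs(amount)), button]


def press_keys_argv(combo: str) -> list:
    return ["xdotool", "key", combo]  # keysym syntax, e.g. 'ctrl+l' or 'Return'


def run_on(display: str, argv: list) -> None:
    """Run one command against a specific X display (requires a live display)."""
    subprocess.run(argv, check=True, env={**os.environ, "DISPLAY": display})
```

Keeping argv construction separate from `run_on` lets the mapping be checked without an X server. Only `run_on` touches a real display.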
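type_text's routing rule is simple enough to state as code:

- pure ASCII goes through `xdotool type`;
- any non-ASCII character switches to the clipboard plus ctrl+v;
- non-ASCII text fails loudly when xclip is absent.

The sketch below returns the planned method and commands without running them. The exact commands glovebox-mcp issues are an assumption here. On a real host, `have_xclip` could come from `shutil.which("xclip") is not None`.

```python
def plan_typing(text: str, have_xclip: bool = True) -> tuple:
    """Return (method, commands) following the documented type_text rule."""
    if text.isascii():
        # "--" ends option parsing, so text starting with '-' is typed literally.
        return "xdotool", [["xdotool", "type", "--", text]]
    if not have_xclip:
        # Documented behaviour: fail instead of silently dropping characters.
        raise RuntimeError("xclip is required to type non-ASCII text")
    return "clipboard", [
        ["xclip", "-selection", "clipboard"],  # the text is written to this command's stdin
        ["xdotool", "key", "ctrl+v"],          # paste into the focused field
    ]
```

The clipboard route trades one extra dependency for reliability in toolkits that ignore synthetic Unicode keystrokes.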
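upload_file relies on Chromium's per-instance remote-debugging port (9222+N). The sketch below uses the standard Chrome DevTools Protocol route for filling a file input, in three commands:

1. `DOM.getDocument`
2. `DOM.querySelector`, which returns the first match, like upload_file's default selector
3. `DOM.setFileInputFiles`

Whether glovebox-mcp issues exactly these commands is an assumption. Treating N as the instance id is also an assumption. `send` stands in for a hypothetical helper that sends one command over the page target's WebSocket and returns its `result`. The page target can be found via `/json/list` on the debug port.

```python
import os

CDP_BASE_PORT = 9222  # documented: per-instance --remote-debugging-port is 9222+N


def devtools_targets_url(instance: int) -> str:
    """HTTP endpoint listing the page targets of an instance's Chromium (N assumed = instance id)."""
    return f"http://127.0.0.1:{CDP_BASE_PORT + instance}/json/list"


def upload_via_cdp(send, path: str, selector: str = "input[type=file]") -> int:
    """Attach `path` to the first element matching `selector`; return the input's nodeId.

    `send(method, params)` is hypothetical: it sends one CDP command and returns its result.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)  # mirrors upload_file's "file is missing" failure
    root_id = send("DOM.getDocument", {"depth": 0})["root"]["nodeId"]
    node_id = send("DOM.querySelector", {"nodeId": root_id, "selector": selector})["nodeId"]
    if node_id == 0:
        # Chrome reports "no match" as nodeId 0.
        raise LookupError(f"selector matched nothing: {selector}")
    # Absolute paths avoid depending on Chromium's working directory.
    send("DOM.setFileInputFiles", {"files": [os.path.abspath(path)], "nodeId": node_id})
    return node_id
```

Injecting `send` keeps the sequence testable with a fake transport. A real client would use a WebSocket library against the target's `webSocketDebuggerUrl`.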
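When testing an agent loop offline, it can help to stub tools with their documented semantics. The stand-in below for wait_ms is hypothetical, not the server's code. It honours the 10000 ms cap and reports `requested_ms`, `waited_ms` and `clamped` as described above. The "action" field is omitted because its value isn't documented. Treating negative requests as zero is an extra assumption.

```python
import time

WAIT_CAP_MS = 10_000  # documented cap for wait_ms


def fake_wait_ms(requested_ms: int, sleep=time.sleep) -> dict:
    """Offline stand-in reproducing the documented wait_ms result fields."""
    waited_ms = min(max(requested_ms, 0), WAIT_CAP_MS)
    sleep(waited_ms / 1000)  # injectable so tests need not actually block
    return {
        "ok": True,
        "requested_ms": requested_ms,
        "waited_ms": waited_ms,
        "clamped": requested_ms > WAIT_CAP_MS,  # asked for more than the cap
    }


# Example without real sleeping: a 15 s request is clamped to the 10 s cap.
print(fake_wait_ms(15_000, sleep=lambda seconds: None))
```

An agent that sees `"clamped": true` should issue another wait (or poll with screenshot) rather than assume the full delay elapsed.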

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/segentic-lab/glovebox-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.