Skip to main content
Glama

vision_locate

Locate page elements by natural-language description using a vision LLM. Returns coordinates and confidence, with optional click action.

Instructions

⭐ Find an element by natural-language description using a vision LLM.

Uses the same provider as solve_recaptcha_ai (OPENAI_* / ANTHROPIC_* env).
Reuses solve_recaptcha_ai's vision plumbing so any vision-capable model
works (gpt-4o, gpt-5.x, claude, llava, llama-3.2-vision).

Args:
    description: NL description, e.g. "the red Create button at bottom right"
    click: if True, also dispatches a CDP mouse_click at the located point
    api_key/base_url/model/provider: explicit overrides (else from env)

Returns JSON: {"found":true/false, "x":int, "y":int, "confidence":"high|medium|low"}.
Use when CSS selectors are unreliable (visual-only differentiator, dynamic IDs).

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
descriptionYes
clickNo
api_keyNo
base_urlNo
modelNo
providerNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes

Tool Definition Quality

Score is being calculated. Check back soon.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RobithYusuf/mcp-stealth-chrome'

If you have feedback or need assistance with the MCP directory API, please join our Discord server