Glama

navigate

Navigate to a specified URL and capture a screenshot with interactive elements clearly labeled for automated web interaction and visual analysis.

Instructions

Navigate to a URL and return a screenshot with labeled interactive elements

Input Schema

Name | Required | Description | Default
url | Yes | The URL to navigate to | none
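An MCP client supplies this parameter inside a JSON-RPC 2.0 tools/call request. The minimal sketch below only builds the request body. build_navigate_request is a hypothetical helper, and the request id and URL are placeholders.

```python
import json


def build_navigate_request(url: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 tools/call request for the navigate tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "navigate",
            # url is the only argument in the input schema, and it is required.
            "arguments": {"url": url},
        },
    }
    return json.dumps(payload)


print(build_navigate_request("https://example.com"))
```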

Implementation Reference

  • The core _navigate method, which navigates to a URL with Playwright and returns an observation of the page with its interactive elements labeled.
    def _navigate(self, url: str = None, **_) -> BrowserResult:
        """Navigate to URL and return observation"""
        if not url:
            return BrowserResult(success=False, error="URL required")
        
        self._ensure_browser()
        
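        try:
            # Wait up to 30 s (30000 ms) for DOMContentLoaded, not the full load event.
            self._page.goto(url, timeout=30000, wait_until="domcontentloaded")
            # Give client-side scripts 1.5 s to render before observing.
            self._page.wait_for_timeout(1500)
            return self._observe()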
            
        except Exception as e:
            return BrowserResult(
                success=False,
                error=f"Navigation failed: {str(e)}"
            )
  • Tool registration for 'navigate', with a schema that requires a 'url' parameter of type string.
    Tool(
        name="navigate",
        description="Navigate to a URL and return a screenshot with labeled interactive elements",
        inputSchema={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL to navigate to"
                }
            },
            "required": ["url"]
        }
    ),
  • The call_tool handler for 'navigate', which extracts the url argument and runs browser.execute with action='navigate' in a worker thread via asyncio.to_thread.
    if name == "navigate":
        result = await asyncio.to_thread(
            browser.execute, 
            action="navigate", 
            url=arguments.get("url")
        )
  • The execute method, which maps the 'navigate' action string to the _navigate handler.
    def execute(self, action: str, **kwargs) -> BrowserResult:
        """Execute a browser action"""
        if not _playwright_available:
            return BrowserResult(
                success=False,
                error="Playwright not installed. Run: pip install playwright && playwright install chromium"
            )
        
        actions = {
            "navigate": self._navigate,
            "observe": self._observe,
            "click": self._click,
            "multi_click": self._multi_click,
            "type": self._type,
            "scroll": self._scroll,
            "close": self._close
        }
        
        handler = actions.get(action)
        if not handler:
            return BrowserResult(
                success=False,
                error=f"Unknown action: {action}. Available: {list(actions.keys())}"
            )
        
        try:
            return handler(**kwargs)
        except Exception as e:
            return BrowserResult(
                success=False,
                error=f"Browser error: {str(e)}"
            )
  • The _observe helper, called after navigation, which injects Set-of-Mark labels for interactive elements and captures a screenshot.
    def _observe(self, **_) -> BrowserResult:
        """Get visual observation of current page"""
        if self._page is None:
            return BrowserResult(success=False, error="No page open. Use navigate first.")
        
        try:
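            # Short pause so late DOM updates settle before labeling.
            self._page.wait_for_timeout(500)
            
            # Run the Set-of-Mark script; it returns one dict per labeled element.
            elements = self._page.evaluate(self.SOM_INJECT_SCRIPT)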
            
            self._element_map = {}
            for el in elements:
                self._element_map[el['id']] = {
                    'x': el['x'],
                    'y': el['y'],
                    'width': el['width'],
                    'height': el['height'],
                    'tag': el['tag'],
                    'type': el['type'],
                    'text': el['text']
                }
            
            screenshot_bytes = self._page.screenshot(
                type="jpeg",
                quality=self.SCREENSHOT_QUALITY
            )
            screenshot_base64 = base64.b64encode(screenshot_bytes).decode('utf-8')
            
            # Compact element list for the model; bounding boxes stay in _element_map.
            elements_for_llm = []
            for el in elements:
                element_info = {
                    'id': el['id'],
                    'tag': el['tag'],
                }
                if el['text']:
                    element_info['text'] = el['text']
                if el['type']:
                    element_info['type'] = el['type']
                elements_for_llm.append(element_info)
            
            return BrowserResult(
                success=True,
                data={
                    'url': self._page.url,
                    'title': self._page.title(),
                    'screenshot': screenshot_base64,
                    'elements': elements_for_llm,
                    'element_count': len(elements)
                },
                metadata={'has_image': True}
            )
            
        except Exception as e:
            return BrowserResult(
                success=False,
                error=f"Observation failed: {str(e)}"
            )
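The call_tool handler above runs the blocking browser.execute call through asyncio.to_thread. That keeps the event loop responsive, but each call goes to the loop's default thread pool, so consecutive calls are not guaranteed to run on the same worker thread. Playwright's sync API is documented as not thread-safe, and its objects are meant to stay on the thread that created them. A minimal sketch of one alternative, assuming all browser work can be funneled through a single-worker executor. run_browser and blocking_execute are hypothetical names; blocking_execute only stands in for browser.execute.

```python
import asyncio
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

# A single worker thread: every submitted call runs on that same thread.
_browser_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="browser")


def blocking_execute(action: str, **kwargs) -> dict:
    """Hypothetical stand-in for browser.execute that reports its thread."""
    return {"action": action, "kwargs": kwargs, "thread": threading.get_ident()}


async def run_browser(action: str, **kwargs) -> dict:
    loop = asyncio.get_running_loop()
    # run_in_executor forwards positional arguments only, so bind kwargs first.
    call = functools.partial(blocking_execute, action, **kwargs)
    return await loop.run_in_executor(_browser_executor, call)


async def main() -> list:
    first = await run_browser("navigate", url="https://example.com")
    second = await run_browser("observe")
    return [first, second]


results = asyncio.run(main())
print(results[0]["thread"] == results[1]["thread"])
```

In a real handler, browser.execute would take the place of blocking_execute. The trade-off is that browser calls become strictly serialized. This fits a design that already centers on a single self._page.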
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the behavioral outcome (a screenshot with labeled elements) but omits critical details: whether the browser runs headless or visible, timeout and error handling, authentication needs, rate limits, and what "labeled interactive elements" entails (e.g., bounding boxes, types). This leaves significant gaps for a tool with potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
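The timeout and error handling that the review finds undocumented are visible in the _navigate excerpt: a 30-second wait for DOMContentLoaded, a 1.5-second settle delay, and failures returned as a BrowserResult rather than raised. The sketch below exercises that flow without a browser by substituting a hypothetical FakePage stub for the Playwright page. The BrowserResult dataclass here is an assumed shape inferred from how the excerpts construct it, and the _observe step is reduced to reporting the final URL.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BrowserResult:
    # Assumed shape, inferred from how the excerpts construct BrowserResult.
    success: bool
    data: Optional[dict] = None
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class FakePage:
    """Hypothetical stand-in for a Playwright Page that records calls."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.url = "about:blank"
        self.calls: List[Tuple] = []

    def goto(self, url: str, timeout: int, wait_until: str) -> None:
        self.calls.append(("goto", url, timeout, wait_until))
        if self.fail:
            # Playwright raises its own TimeoutError; the built-in one stands in here.
            raise TimeoutError(f"Timeout {timeout}ms exceeded")
        self.url = url

    def wait_for_timeout(self, ms: int) -> None:
        self.calls.append(("wait", ms))


def navigate(page: FakePage, url: Optional[str]) -> BrowserResult:
    # Same steps as _navigate: reject an empty URL, then goto, settle, observe.
    if not url:
        return BrowserResult(success=False, error="URL required")
    try:
        page.goto(url, timeout=30000, wait_until="domcontentloaded")
        page.wait_for_timeout(1500)
        # Simplified stand-in for self._observe(): report the final URL only.
        return BrowserResult(success=True, data={"url": page.url})
    except Exception as e:
        return BrowserResult(success=False, error=f"Navigation failed: {e}")


print(navigate(FakePage(fail=True), "https://example.com").error)
```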

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and outcome with zero waste. Every word earns its place, making it highly concise and well-structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that performs navigation and returns a complex result (screenshot with labels). It lacks details on behavioral traits, error conditions, output format, or prerequisites, leaving the agent with insufficient context to use it effectively beyond basic invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
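On the output format the review asks about: the _observe excerpt returns a data dict with url, title, screenshot (a base64-encoded JPEG), elements, and element_count, plus metadata {'has_image': True}. A minimal consumer-side sketch, assuming the data dict arrives in that shape. summarize_observation is a hypothetical helper, and the sample values are placeholders, not real tool output.

```python
import base64


def summarize_observation(data: dict) -> dict:
    """Decode the screenshot and sanity-check the fields _observe returns."""
    image = base64.b64decode(data["screenshot"])
    # JPEG data begins with the SOI marker FF D8.
    if image[:2] != b"\xff\xd8":
        raise ValueError("screenshot is not JPEG data")
    if data["element_count"] != len(data["elements"]):
        raise ValueError("element_count does not match elements")
    # text and type are optional keys in each element entry.
    labels = [f"[{el['id']}] {el['tag']} {el.get('text', '')}".rstrip()
              for el in data["elements"]]
    return {"url": data["url"], "image_bytes": len(image), "labels": labels}


# Placeholder observation: the "screenshot" is only a JPEG header, not a real image.
sample = {
    "url": "https://example.com/",
    "title": "Example Domain",
    "screenshot": base64.b64encode(b"\xff\xd8\xff\xe0fake").decode("utf-8"),
    "elements": [{"id": 1, "tag": "a", "text": "More information..."},
                 {"id": 2, "tag": "input", "type": "text"}],
    "element_count": 2,
}
print(summarize_observation(sample))
```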

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the parameter 'url' fully documented in the schema. The description adds no meaning beyond implying navigation, which aligns with the schema. A baseline score of 3 is appropriate because the schema does the heavy lifting, but the description provides no extra context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
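One non-obvious constraint on url: the schema accepts any string, including an empty one, while _navigate rejects an empty or missing url with "URL required". The call_tool handler passes arguments.get("url"), which is None when the key is absent. A hedged sketch of a hand-rolled pre-check that covers only this schema (it is not a general JSON Schema validator; validate_navigate_args is a hypothetical helper):

```python
from typing import Any, List
from urllib.parse import urlsplit


def validate_navigate_args(arguments: Any) -> List[str]:
    """Return error messages for a navigate call; an empty list means OK."""
    if not isinstance(arguments, dict):
        return ["arguments must be an object"]
    if "url" not in arguments:
        # Schema: "required": ["url"]
        return ["missing required property: url"]
    url = arguments["url"]
    if not isinstance(url, str):
        # Schema: "type": "string"
        return ["url must be a string"]
    if not url:
        # Passes the schema, but _navigate returns "URL required".
        return ["url must not be empty"]
    if not urlsplit(url).scheme:
        # Playwright's page.goto expects the URL to include a scheme.
        return ["url should include a scheme, e.g. https://"]
    return []


print(validate_navigate_args({"url": "example.com"}))
```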

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('navigate to a URL') and the outcome ('return a screenshot with labeled interactive elements'), distinguishing it from sibling tools like 'screenshot' (which likely captures without navigation) and 'click'/'type' (which interact with elements). It uses precise verbs and specifies the resource (URL) and result format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'screenshot' (which might capture without navigation) or 'click' (which might require prior navigation). It lacks explicit when/when-not instructions or named alternatives, offering only implied usage based on the action described.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
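The Behavior, Completeness, and Usage Guidelines gaps could be narrowed by stating what the implementation excerpts already show. Below is a sketch of a fuller registration as a plain dict. The wording draws on the excerpts: a 30 s DOMContentLoaded timeout, a settle delay, a base64 JPEG screenshot, the returned fields, and errors returned rather than raised. The exact phrasing is an assumption. Headless versus visible mode is not visible in these excerpts, so it is left out. The "call this first" guidance is inferred from _observe's "No page open. Use navigate first." message.

```python
NAVIGATE_TOOL = {
    "name": "navigate",
    "description": (
        "Open a URL in the browser page and return an observation: "
        "the final URL, page title, a base64-encoded JPEG screenshot with "
        "labels on interactive elements, and a list of those "
        "elements (id, tag, and text/type when present). "
        "Waits up to 30 s for DOMContentLoaded plus a short settle delay; "
        "on failure returns an error instead of raising. "
        "Call this before the other browser tools; they operate on the page it opens."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                # Playwright's page.goto expects a scheme such as https://.
                "description": "Absolute URL including the scheme, e.g. https://example.com",
            }
        },
        "required": ["url"],
    },
}

print(NAVIGATE_TOOL["description"])
```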


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/LingTravel/Atlas-Browser'
