Gemini Web Automation MCP

by vincenthopf

browse_web

Automate web browsing tasks using AI to navigate websites, click buttons, fill forms, and extract information through natural language commands.

Instructions

Browse the web to complete a task using AI-powered browser automation. The AI agent can navigate websites, click buttons, fill forms, search for information, and interact with web pages just like a human user. This runs synchronously and returns when the task is complete.

Args:
  • task: What you want to accomplish (e.g., "Find the top 3 gaming laptops on Amazon")
  • url: Starting webpage (defaults to Google)

Returns: Dictionary containing:
  • ok: Boolean indicating success
  • data: Task completion message with results
  • screenshot_dir: Path to saved screenshots
  • session_id: Unique session identifier
  • progress: List of actions taken during browsing
  • error: Error message (if task failed)

Examples:
  • "Search for Python tutorials and summarize the top result"
  • "Go to example.com and click the login button"
  • "Find product reviews for iPhone 15 Pro"

Note: For long-running tasks, consider using start_web_task instead.
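A caller has to branch on the ok flag before reading the other keys. The following is a minimal sketch of that, not part of the server itself. The helper name summarize_result and the sample values are hypothetical. It assumes only the documented keys, and that a failed result may carry nothing beyond ok and error.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Turn a browse_web-style result dictionary into a one-line summary."""
    if not result.get("ok"):
        # Assumption: a failed result may contain only "ok" and "error".
        return f"failed: {result.get('error', 'unknown error')}"
    steps = len(result.get("progress", []))
    return f"ok after {steps} recorded steps: {result.get('data', '')}"


# Hypothetical sample values, shaped like the documented return dictionary.
success = {
    "ok": True,
    "data": "Top result: Official Python tutorial",
    "screenshot_dir": "/tmp/screenshots/abc123",
    "session_id": "abc123",
    "progress": ["Started browser automation", "Turn 1/30"],
}
failure = {"ok": False, "error": "Timeout 10000ms exceeded"}

print(summarize_result(success))
print(summarize_result(failure))
```

Reading optional keys with .get keeps the caller from raising KeyError when the tool reports a failure.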

Input Schema

Name    Required    Description    Default
task    Yes         -              -
url     No          -              https://www.google.com
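As an illustration of the schema, the sketch below builds the JSON-RPC request an MCP client would send to invoke this tool. The tools/call envelope follows the MCP specification. The helper name build_tool_call and the request id are hypothetical, and a real client library would normally construct this message for you.

```python
import json
from typing import Any, Optional


def build_tool_call(task: str, url: Optional[str] = None, request_id: int = 1) -> dict[str, Any]:
    """Build a tools/call request for browse_web (hypothetical helper)."""
    if not task.strip():
        # "task" is the only required argument in the schema.
        raise ValueError("task is required and must not be empty")
    arguments: dict[str, Any] = {"task": task}
    if url is not None:
        # Omitting "url" lets the server apply its default starting page.
        arguments["url"] = url
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "browse_web", "arguments": arguments},
    }


request = build_tool_call("Find product reviews for iPhone 15 Pro")
print(json.dumps(request, indent=2))
```

Leaving url out of the arguments, rather than repeating the default on the client side, keeps the default defined in one place.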

Implementation Reference

  • server.py:43-88 (handler)
    Primary handler for the 'browse_web' MCP tool. Registered via @mcp.tool() decorator. Creates GeminiBrowserAgent, runs execute_task in executor, handles cleanup, and returns result. The docstring provides schema description and examples.
    @mcp.tool()
    async def browse_web(task: str, url: str = "https://www.google.com") -> dict[str, Any]:
        """
        Browse the web to complete a task using AI-powered browser automation.

        The AI agent can navigate websites, click buttons, fill forms, search for
        information, and interact with web pages just like a human user. This runs
        synchronously and returns when the task is complete.

        Args:
            task: What you want to accomplish (e.g., "Find the top 3 gaming laptops on Amazon")
            url: Starting webpage (defaults to Google)

        Returns:
            Dictionary containing:
            - ok: Boolean indicating success
            - data: Task completion message with results
            - screenshot_dir: Path to saved screenshots
            - session_id: Unique session identifier
            - progress: List of actions taken during browsing
            - error: Error message (if task failed)

        Examples:
            - "Search for Python tutorials and summarize the top result"
            - "Go to example.com and click the login button"
            - "Find product reviews for iPhone 15 Pro"

        Note: For long-running tasks, consider using start_web_task instead.
        """
        logger.info(f"Received web browsing request: {task}")

        # Create agent instance (browser will be cleaned up automatically)
        agent = GeminiBrowserAgent(logger=logger)

        try:
            # Execute task in thread pool to avoid blocking
            loop = asyncio.get_event_loop()
            result = await loop.run_in_executor(None, agent.execute_task, task, url)

            logger.info(f"Task completed with status: {result.get('ok')}")
            return result

        finally:
            # Clean up browser resources
            agent.cleanup_browser()
  • Core execution logic delegated from the handler. GeminiBrowserAgent.execute_task sets up Playwright browser, navigates to URL, runs the Gemini Computer Use automation loop via _run_browser_automation_loop, saves screenshots, and formats the result.
    def execute_task(
        self, task: str, url: Optional[str] = "https://www.google.com"
    ) -> Dict[str, Any]:
        """
        Execute a browser automation task.

        Args:
            task: Description of the browsing task to perform
            url: Optional starting URL (defaults to Google)

        Returns:
            Dictionary with ok status and either data or error
        """
        try:
            self.logger.info(f"Task: {task}")
            self.logger.info(f"Starting URL: {url}")
            self.logger.info(f"Session ID: {self.session_id}")

            # Setup browser if not already done
            if not self.page:
                self.setup_browser()

            # Navigate to starting URL if provided
            if url:
                self.page.goto(url, wait_until="domcontentloaded", timeout=10000)
                self.logger.info(f"Navigated to: {url}")
            else:
                # Start with a search engine
                self.page.goto(
                    "https://www.google.com", wait_until="domcontentloaded", timeout=10000
                )
                self.logger.info("Starting from Google")

            # Run the browser automation loop
            result = self._run_browser_automation_loop(task)

            self.logger.info(
                f"Task completed! Screenshots saved to: {self.screenshot_dir}"
            )

            return {
                "ok": True,
                "data": result,
                "screenshot_dir": str(self.screenshot_dir),
                "session_id": self.session_id,
                "progress": self.progress_updates,
            }

        except Exception as exc:
            self.logger.exception("Browser automation failed")
            return {"ok": False, "error": str(exc)}
  • The main automation loop using Gemini Computer Use API. Configures the model with browser tools, runs iterative turns of content generation, executes function calls (click, type, scroll etc.) via Playwright, captures screenshots, and loops until task completion or max turns.
    def _run_browser_automation_loop(self, task: str, max_turns: int = 30) -> str:
        """
        Run the Gemini Computer Use agent loop to complete the task.

        Args:
            task: The browsing task to complete
            max_turns: Maximum number of agent turns

        Returns:
            The final result as a string
        """
        # Configure Gemini with Computer Use
        config = types.GenerateContentConfig(
            tools=[
                types.Tool(
                    computer_use=types.ComputerUse(
                        environment=types.Environment.ENVIRONMENT_BROWSER
                    )
                )
            ],
        )

        # Initial screenshot - take once and save
        initial_screenshot = self.page.screenshot(type="png")
        timestamp = datetime.now().strftime("%H%M%S")
        screenshot_path = (
            self.screenshot_dir
            / f"step_{self.screenshot_counter:02d}_initial_{timestamp}.png"
        )
        with open(screenshot_path, "wb") as f:
            f.write(initial_screenshot)
        self.logger.info(f"Saved initial screenshot: {screenshot_path}")
        self.screenshot_counter += 1

        # Build initial contents
        contents = [
            Content(
                role="user",
                parts=[
                    Part(text=task),
                    Part.from_bytes(data=initial_screenshot, mime_type="image/png"),
                ],
            )
        ]

        self.logger.info(f"Starting browser automation loop for task: {task}")
        self._add_progress("Started browser automation", "info")

        # Agent loop
        for turn in range(max_turns):
            self.logger.info(f"Turn {turn + 1}/{max_turns}")
            self._add_progress(f"Turn {turn + 1}/{max_turns}", "turn")

            try:
                # Get response from Gemini
                response = self.gemini_client.models.generate_content(
                    model=GEMINI_MODEL,
                    contents=contents,
                    config=config,
                )

                candidate = response.candidates[0]
                contents.append(candidate.content)

                # Check if there are function calls
                has_function_calls = any(
                    part.function_call for part in candidate.content.parts
                )

                if not has_function_calls:
                    # No more actions - extract final text response
                    text_response = " ".join(
                        [part.text for part in candidate.content.parts if part.text]
                    )
                    self.logger.info(f"Agent finished: {text_response}")

                    # Save final screenshot
                    timestamp = datetime.now().strftime("%H%M%S")
                    screenshot_path = (
                        self.screenshot_dir
                        / f"step_{self.screenshot_counter:02d}_final_{timestamp}.png"
                    )
                    self.page.screenshot(path=str(screenshot_path))
                    self.logger.info(f"Saved final screenshot: {screenshot_path}")
                    self.screenshot_counter += 1

                    return text_response

                # Execute function calls
                self.logger.info("Executing browser actions...")
                self._add_progress("Executing browser actions", "action")
                results = self._execute_gemini_function_calls(candidate)

                # Get function responses with new screenshot
                function_responses = self._get_gemini_function_responses(results)

                # Save screenshot after actions
                timestamp = datetime.now().strftime("%H%M%S")
                screenshot_path = (
                    self.screenshot_dir
                    / f"step_{self.screenshot_counter:02d}_{timestamp}.png"
                )
                self.page.screenshot(path=str(screenshot_path))
                self.logger.info(f"Saved screenshot: {screenshot_path}")
                self.screenshot_counter += 1

                # Add function responses to contents
                contents.append(
                    Content(
                        role="user",
                        parts=[Part(function_response=fr) for fr in function_responses],
                    )
                )

            except Exception as e:
                self.logger.error(f"Error in browser automation loop: {e}")
                raise

        # If we hit max turns, return what we have
        return f"Task reached maximum turns ({max_turns}). Please check browser state."
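The control flow of the loop above can be hard to see among the screenshot and logging calls. The following is a stripped-down sketch of the same shape: ask the model for the next action, stop when it requests none, and give up after max_turns. It uses no Gemini or Playwright calls. The names run_loop and scripted_model, and the idea of a model as a plain callable, are hypothetical stand-ins for illustration only.

```python
from typing import Callable, Optional

# Assumption for illustration: a "model" is any callable that maps the
# conversation history to the next action name, or None when it is finished.
Model = Callable[[list[str]], Optional[str]]


def run_loop(task: str, model: Model, max_turns: int = 30) -> str:
    """Bounded agent loop: act until the model stops asking for actions."""
    history = [task]
    for turn in range(max_turns):
        action = model(history)
        if action is None:
            # No further action requested: treat this as task completion.
            return f"done after {turn} action(s)"
        # Stand-in for executing the action and feeding the outcome back.
        history.append(f"result of {action}")
    # Budget exhausted without the model finishing.
    return f"Task reached maximum turns ({max_turns})."


def scripted_model(actions: list[str]) -> Model:
    """Return a fake model that replays a fixed list of actions, then stops."""
    remaining = list(actions)

    def model(history: list[str]) -> Optional[str]:
        return remaining.pop(0) if remaining else None

    return model


print(run_loop("demo", scripted_model(["click", "type"])))
print(run_loop("demo", scripted_model(["scroll"] * 5), max_turns=3))
```

The turn limit is what guarantees termination: a model that keeps requesting actions still returns control to the caller after max_turns iterations.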

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vincenthopf/computer-use-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.