Gemini Web Automation MCP

MIT License

Server Configuration

Environment variables used to configure the server. Only `GEMINI_API_KEY` is required; the others fall back to the defaults shown.

Name	Required	Description	Default
`HEADLESS`	No	Set to 'true' for faster headless mode	false
`GEMINI_MODEL`	No	Gemini model to use	gemini-2.5-computer-use-preview-10-2025
`SCREEN_WIDTH`	No	Browser screen width (recommended by Google)	1440
`SCREEN_HEIGHT`	No	Browser screen height (recommended by Google)	900
`GEMINI_API_KEY`	Yes	Your Gemini API key (required by Google)
`SCREENSHOT_OUTPUT_DIR`	No	Directory to save screenshots	output_screenshots
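To show how the table's defaults combine, here is a minimal Python sketch of a loader for these variables. It is not the server's own code: the `ServerConfig` and `load_config` names are hypothetical, and treating `HEADLESS` as case-insensitive `"true"` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:  # hypothetical container for the documented variables
    api_key: str
    model: str
    headless: bool
    screen_width: int
    screen_height: int
    screenshot_output_dir: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Read the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        # GEMINI_API_KEY is the only required variable.
        raise ValueError("GEMINI_API_KEY is required")
    return ServerConfig(
        api_key=api_key,
        model=env.get("GEMINI_MODEL", "gemini-2.5-computer-use-preview-10-2025"),
        # Assumption: any casing of "true" enables headless mode.
        headless=env.get("HEADLESS", "false").strip().lower() == "true",
        screen_width=int(env.get("SCREEN_WIDTH", "1440")),
        screen_height=int(env.get("SCREEN_HEIGHT", "900")),
        screenshot_output_dir=env.get("SCREENSHOT_OUTPUT_DIR", "output_screenshots"),
    )
```

Passing a mapping instead of reading `os.environ` directly keeps the loader easy to test with different settings.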

Tools

Functions exposed to the LLM to take actions

Name	Description
browse_web	Browse the web to complete a task using AI-powered browser automation. The AI agent can navigate websites, click buttons, fill forms, search for information, and interact with web pages just like a human user. This runs synchronously and returns when the task is complete. Args: task: What you want to accomplish (e.g., "Find the top 3 gaming laptops on Amazon") url: Starting webpage (defaults to Google) Returns: Dictionary containing: - ok: Boolean indicating success - data: Task completion message with results - screenshot_dir: Path to saved screenshots - session_id: Unique session identifier - progress: List of actions taken during browsing - error: Error message (if task failed) Examples: - "Search for Python tutorials and summarize the top result" - "Go to example.com and click the login button" - "Find product reviews for iPhone 15 Pro" Note: For long-running tasks, consider using start_web_task instead.
get_web_screenshots	Retrieve screenshots captured during a web browsing session. Each browsing session saves screenshots of the pages visited. Use this to review what the AI agent saw and did during task execution. Args: session_id: Session ID returned from browse_web or check_web_task Returns: Dictionary containing: - ok: Boolean indicating success - screenshots: List of screenshot file paths - session_id: The session identifier - count: Number of screenshots found - error: Error message (if session not found) Example: get_web_screenshots("20251017_143022_a1b2c3d4")
start_web_task	Start a web browsing task in the background and return immediately. Use this for tasks that might take a while (30+ seconds). The task runs asynchronously while you continue working. Check progress with check_web_task(). Args: task: What you want to accomplish on the web url: Starting webpage (defaults to Google) Returns: Dictionary containing: - ok: Boolean indicating task was started successfully - task_id: Unique ID to check progress later - status: Will be "running" - message: Instructions for checking progress Examples: - start_web_task("Research top 10 AI companies and their products") - start_web_task("Find and compare prices for MacBook Pro on 5 different sites") Next steps: Use check_web_task(task_id) to monitor progress. Wait at least 5 seconds between status checks.
check_web_task	Check the progress of a background web browsing task and return a summary. By default, the response uses a compact format to avoid filling your context window with verbose progress logs. IMPORTANT: To prevent context bloat, wait at least 3-5 seconds between checks. Use the 'recommended_poll_after' timestamp as guidance. Args: task_id: Task ID from start_web_task() compact: Return summary only (default: True). Set to False for full details. Returns: Dictionary containing: - ok: Boolean indicating success - task_id: Task identifier - status: "pending", "running", "completed", "failed", or "cancelled" - progress_summary: Recent actions (compact mode only) - progress: Full action history (full mode only) - result: Task results (when completed) - error: Error message (when failed) - recommended_poll_after: Timestamp to check again (when running) - polling_guidance: Message about polling frequency Examples: - check_web_task("abc-123-def") # Compact summary - check_web_task("abc-123-def", compact=False) # Full details Best Practice: Only poll every 3-5 seconds to keep your context window clean. Use the wait() tool to pause between checks if your platform doesn't support automatic delays. Recommended workflow: 1. start_web_task("...") 2. wait(5) 3. check_web_task(task_id) 4. If still running, repeat steps 2-3
stop_web_task	Stop a running web browsing task. Immediately halts task execution and cleans up browser resources. Use this when you need to cancel a long-running task that's no longer needed. Args: task_id: Task ID from start_web_task() Returns: Dictionary containing: - ok: Boolean indicating success - message: Confirmation message - task_id: The stopped task ID - error: Error message (if task not found or already completed) Examples: - stop_web_task("abc-123-def") Note: Cannot stop tasks that are already completed or failed.
wait	Wait for a specified number of seconds before continuing. Use this when you need to pause between operations, such as: - Waiting between status checks to avoid rapid polling - Giving a web task time to make progress - Rate limiting your requests - Waiting for external processes to complete Args: seconds: Number of seconds to wait (1-60) Returns: Dictionary containing: - ok: Boolean indicating success - waited_seconds: How long the wait lasted - message: Confirmation message Examples: - wait(5) # Wait 5 seconds - wait(10) # Wait 10 seconds Best Practice: Use this instead of immediately polling check_web_task multiple times. Recommended wait time between status checks: 3-5 seconds. Note: Maximum wait time is 60 seconds to prevent timeout issues.
list_web_tasks	List all web browsing tasks, including active and completed ones. Shows a summary of all tasks in the current session. Useful for tracking multiple concurrent browsing operations. Returns: Dictionary containing: - ok: Boolean indicating success - tasks: Array of task status objects (compact format) - count: Total number of tasks - active_count: Number of currently running tasks Examples: - list_web_tasks() Note: Returns compact task summaries. Use check_web_task(task_id) for details.
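The recommended start, wait, check workflow can be sketched as a small Python helper. `start` and `check` are hypothetical callables standing in for however your MCP client invokes `start_web_task` and `check_web_task`; the dictionary keys (`ok`, `task_id`, `status`, `error`) follow the return values documented above. The 5-second interval and the `max_polls` cap are assumptions, not limits enforced by the server.

```python
import time
from typing import Any, Callable, Dict, Optional

# Statuses after which check_web_task will not report further progress.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def run_web_task(
    start: Callable[..., Dict[str, Any]],  # hypothetical wrapper for start_web_task
    check: Callable[..., Dict[str, Any]],  # hypothetical wrapper for check_web_task
    task: str,
    url: Optional[str] = None,
    poll_seconds: float = 5.0,  # assumed interval, per the 3-5 s guidance
    max_polls: int = 60,  # assumed cap so the loop cannot spin forever
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a background task, then poll until it reaches a terminal status."""
    started = start(task, url) if url is not None else start(task)
    if not started.get("ok"):
        raise RuntimeError(started.get("error", "start_web_task failed"))
    task_id = started["task_id"]

    for _ in range(max_polls):
        sleep(poll_seconds)  # step 2: wait before checking
        status = check(task_id, compact=True)  # step 3: compact summary only
        if status.get("status") in TERMINAL_STATUSES:
            return status  # completed, failed, or cancelled
    raise TimeoutError(f"task {task_id} still running after {max_polls} checks")
```

Injecting `sleep` keeps the loop testable without real delays, and requesting the compact summary on every check follows the context-saving advice in the `check_web_task` description.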

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vincenthopf/computer-use-mcp'
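The same request can be made from Python's standard library. This is a sketch: the helper names are hypothetical, and parsing the body as JSON is an assumption, since the response schema is not described here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(slug: str) -> urllib.request.Request:
    """Build the same GET request as the curl example; slug is "<owner>/<repo>"."""
    return urllib.request.Request(f"{API_BASE}/{slug}", method="GET")


def fetch_server_info(slug: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_server_request(slug), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_server_info("vincenthopf/computer-use-mcp")` requests the same URL as the curl command.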