Gemini Web Automation MCP

by vincenthopf

start_web_task

Launch background web browsing tasks that run asynchronously while you continue working. Use for research, price comparisons, or data collection that may take 30+ seconds, then monitor progress with check_web_task.

Instructions

Start a web browsing task in the background and return immediately.

Use this for tasks that might take a while (30+ seconds). The task runs
asynchronously while you continue working. Check progress with check_web_task().

Args:
    task: What you want to accomplish on the web
    url: Starting webpage (defaults to Google)

Returns:
    Dictionary containing:
    - ok: Boolean indicating task was started successfully
    - task_id: Unique ID to check progress later
    - status: Will be "running"
    - message: Instructions for checking progress

Examples:
    - start_web_task("Research top 10 AI companies and their products")
    - start_web_task("Find and compare prices for MacBook Pro on 5 different sites")

Next steps:
    Use check_web_task(task_id) to monitor progress.
    Wait at least 5 seconds between status checks.
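
The start-then-poll workflow described above can be sketched as a small client-side helper. This is an illustration under stated assumptions, not part of the server: `poll_until_done` and its parameters are hypothetical, the `check` callable stands in for however your client invokes check_web_task, and the idea that its result is a dictionary whose `status` eventually stops being "running" is inferred from the start_web_task return value rather than confirmed on this page.

```python
import time
from typing import Any, Callable


def poll_until_done(
    check: Callable[[str], dict[str, Any]],
    task_id: str,
    interval: float = 5.0,   # the instructions ask for at least 5 seconds between checks
    timeout: float = 300.0,  # hypothetical upper bound; the page documents no limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call check(task_id) until its status is no longer "running".

    Assumption: check() returns a dict with a "status" key.
    """
    waited = 0.0
    while True:
        result = check(task_id)
        if result.get("status") != "running":
            return result
        if waited >= timeout:
            raise TimeoutError(f"Task {task_id} still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in normal use the default `time.sleep` applies.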

Input Schema

Name    Required    Description    Default
task    Yes         -              -
url     No          -              https://www.google.com

Output Schema

No arguments

Implementation Reference

  • The primary handler for the 'start_web_task' tool. This function is registered via the @mcp.tool() decorator. It creates a background task using task_manager.create_task and initiates execution asynchronously via task_manager.start_task, returning a task_id immediately for later polling.
    @mcp.tool()
    async def start_web_task(task: str, url: str = "https://www.google.com") -> dict[str, Any]:
        """
        Start a web browsing task in the background and return immediately.
    
        Use this for tasks that might take a while (30+ seconds). The task runs
        asynchronously while you continue working. Check progress with check_web_task().
    
        Args:
            task: What you want to accomplish on the web
            url: Starting webpage (defaults to Google)
    
        Returns:
            Dictionary containing:
            - ok: Boolean indicating task was started successfully
            - task_id: Unique ID to check progress later
            - status: Will be "running"
            - message: Instructions for checking progress
    
        Examples:
            - start_web_task("Research top 10 AI companies and their products")
            - start_web_task("Find and compare prices for MacBook Pro on 5 different sites")
    
        Next steps:
            Use check_web_task(task_id) to monitor progress.
            Wait at least 5 seconds between status checks.
        """
        logger.info(f"Starting async web browsing task: {task}")
    
        # Create task
        task_id = task_manager.create_task(task, url)
    
        # Start task in background using anyio (FastMCP best practice)
        # Use anyio.to_thread.run_sync to run the blocking start_task in a thread
        # We await it but start_task itself just spawns the thread and returns immediately
        success = await anyio.to_thread.run_sync(
            task_manager.start_task,
            task_id,
            logger
        )
    
        if not success:
            return {
                "ok": False,
                "error": "Failed to start task"
            }
    
        logger.info(f"Task {task_id} started in background, returning immediately")
    
        return {
            "ok": True,
            "task_id": task_id,
            "status": "running",
            "message": f"Task started. Use check_web_task('{task_id}') to monitor progress."
        }
  • Helper method in BrowserTaskManager that creates a new BrowserTask instance and stores it, returning the unique task_id. Called by start_web_task.
    def create_task(self, task_description: str, url: str = "https://www.google.com") -> str:
        """Create a new browser automation task.
    
        Args:
            task_description: Description of the browsing task
            url: Starting URL
    
        Returns:
            task_id: Unique identifier for the task
        """
        task_id = str(uuid.uuid4())
        task = BrowserTask(task_id, task_description, url)
    
        with self._lock:
            self.tasks[task_id] = task
    
        return task_id
  • Helper method that transitions the task to RUNNING status and spawns a daemon thread to execute _execute_task in the background. Called by start_web_task.
    def start_task(self, task_id: str, logger=None) -> bool:
        """Start executing a task in the background.
    
        Args:
            task_id: Task identifier
            logger: Optional logger instance
    
        Returns:
            True if task started, False if task not found or already running
        """
        with self._lock:
            task = self.tasks.get(task_id)
            if not task or task.status != TaskStatus.PENDING:
                return False
    
            task.status = TaskStatus.RUNNING
            task.started_at = datetime.now(timezone.utc).isoformat()
    
        # Run task in background thread (don't store reference)
        thread = threading.Thread(
            target=self._execute_task,
            args=(task_id, logger),
            daemon=True,
            name=f"BrowserTask-{task_id[:8]}"
        )
        thread.start()
    
        if logger:
            logger.info(f"Started background thread for task {task_id[:8]}... Thread: {thread.name}")
    
        return True
  • Private helper method run in background thread that instantiates GeminiBrowserAgent and calls its execute_task method to perform the actual web browsing. Handles completion, error, and cleanup.
    def _execute_task(self, task_id: str, logger=None):
        """Execute the browser automation task (runs in background thread)."""
        if logger:
            logger.info(f"[Thread {threading.current_thread().name}] Starting execution for task {task_id[:8]}...")
    
        with self._lock:
            task = self.tasks.get(task_id)
            if not task:
                if logger:
                    logger.error(f"Task {task_id} not found in _execute_task")
                return
    
        try:
            # Create browser agent
            agent = GeminiBrowserAgent(logger=logger)
            task.agent = agent
    
            # Execute the task
            result = agent.execute_task(task.task_description, task.url)
    
            with self._lock:
                task.result = result
                task.progress_updates = agent.progress_updates.copy()
                task.status = TaskStatus.COMPLETED
                task.completed_at = datetime.now(timezone.utc).isoformat()
    
        except Exception as e:
            with self._lock:
                task.error = str(e)
                task.status = TaskStatus.FAILED
                task.completed_at = datetime.now(timezone.utc).isoformat()
    
        finally:
            # Clean up browser
            if task.agent:
                task.agent.cleanup_browser()
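
The create, start, and execute steps above can be reproduced in miniature to see the status transitions end to end. This is a sketch under assumptions, not the project's code: `TaskStatus` and the task class are not shown on this page, so the versions below are guesses at their shape; `MiniTaskManager`, `Task`, the `done` event, and the `work` callable (a stand-in for the browser agent) are hypothetical names introduced for illustration.

```python
import threading
import uuid
from enum import Enum
from typing import Callable, Optional


class TaskStatus(Enum):
    # Member names match those used in the excerpts above; the values are assumed.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class Task:
    def __init__(self, task_id: str, description: str):
        self.task_id = task_id
        self.description = description
        self.status = TaskStatus.PENDING
        self.result: Optional[str] = None
        self.error: Optional[str] = None
        self.done = threading.Event()  # hypothetical: lets callers wait for the thread


class MiniTaskManager:
    def __init__(self, work: Callable[[str], str]):
        self._work = work              # stand-in for the blocking browser agent call
        self._lock = threading.Lock()  # guards the task table and status changes
        self.tasks: dict[str, Task] = {}

    def create_task(self, description: str) -> str:
        task_id = str(uuid.uuid4())
        with self._lock:
            self.tasks[task_id] = Task(task_id, description)
        return task_id

    def start_task(self, task_id: str) -> bool:
        # Only a PENDING task may start, so a second call returns False.
        with self._lock:
            task = self.tasks.get(task_id)
            if task is None or task.status != TaskStatus.PENDING:
                return False
            task.status = TaskStatus.RUNNING
        threading.Thread(target=self._execute, args=(task,), daemon=True).start()
        return True

    def _execute(self, task: Task) -> None:
        try:
            result = self._work(task.description)
            with self._lock:
                task.result = result
                task.status = TaskStatus.COMPLETED
        except Exception as exc:
            with self._lock:
                task.error = str(exc)
                task.status = TaskStatus.FAILED
        finally:
            task.done.set()
```

Flipping the status to RUNNING while holding the lock, before the thread is spawned, is what makes a duplicate `start_task` call safe: the second caller sees a non-PENDING status and gets `False`.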

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key traits: the task runs asynchronously in the background, returns immediately, requires monitoring with check_web_task, and has a default URL. However, it doesn't mention potential errors, timeouts, or resource limits, leaving some behavioral aspects uncovered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and appropriately sized, with clear sections (purpose, args, returns, examples, next steps). Every sentence adds value, such as explaining asynchronous behavior, providing usage examples, and outlining follow-up steps, with no redundant or wasted content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (asynchronous operation with 2 parameters) and the presence of an output schema (which covers return values), the description is complete. It explains the tool's purpose, usage, parameters, and next steps adequately, compensating for the lack of annotations and low schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, so the description must compensate. It adds meaningful context for both parameters: 'task' is described as 'What you want to accomplish on the web' with examples, and 'url' is clarified as the 'Starting webpage (defaults to Google)'. This goes beyond the basic schema types, though it could provide more detail on URL formatting or task constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Start a web browsing task in the background') and resource ('web browsing task'), distinguishing it from siblings like browse_web (likely synchronous) and check_web_task (monitoring). It explicitly mentions returning immediately and running asynchronously, which differentiates its purpose from other tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('for tasks that might take a while (30+ seconds)'), implies when not to use it (shorter tasks), and names the companion tool to call afterward (check_web_task for monitoring progress). It also gives operating guidance, namely waiting at least 5 seconds between status checks, making the usage context clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vincenthopf/computer-use-mcp'
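
The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_server_request` and `fetch_server_info` are hypothetical, and the assumption that the endpoint returns a JSON body is not stated on this page.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build a GET request for one server's directory entry."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_server_request(owner, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_server_info("vincenthopf", "computer-use-mcp")` targets the same URL as the curl command.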

If you have feedback or need assistance with the MCP directory API, please join our Discord server.