Browserbase MCP Server

by MesuterPikin

browserbase_stagehand_agent

Automate web tasks using AI to navigate websites, extract data, and perform interactions through natural language commands.

Instructions

Execute a task autonomously using Gemini Computer Use agent. The agent will navigate and interact with web pages to complete the given task.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| prompt | Yes | The task prompt describing what you want the sub-agent to accomplish. Be clear and specific about the goal. For example: 'Go to Hacker News and find the most controversial post from today, then summarize the top 3 comments'. The agent will autonomously navigate and interact with web pages to complete this task. | |
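
As an illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) carrying the single `prompt` argument. The `buildAgentCall` helper and the `ToolCallRequest` interface are hypothetical names introduced here, and the non-empty check is an assumption: the schema itself only requires a string.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper (not part of the server): wraps a prompt in a
// tools/call request addressed to browserbase_stagehand_agent.
function buildAgentCall(id: number, prompt: string): ToolCallRequest {
  // Assumption: reject blank prompts on the client side. The tool's schema
  // only requires a string, so this check is stricter than the server.
  if (prompt.trim().length === 0) {
    throw new Error("prompt must be a non-empty string");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "browserbase_stagehand_agent",
      arguments: { prompt },
    },
  };
}

const request = buildAgentCall(
  1,
  "Go to Hacker News and find the most controversial post from today, then summarize the top 3 comments",
);
console.log(JSON.stringify(request, null, 2));
```

How the request is transported (stdio, HTTP) depends on the client and is not shown here.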

Implementation Reference

  • The `handleAgent` function implements the core logic of the `browserbase_stagehand_agent` tool. It initializes a Stagehand agent using the Gemini Computer Use model, executes the autonomous web task based on the input prompt, and returns the result.
    async function handleAgent(
      context: Context,
      params: AgentInput,
    ): Promise<ToolResult> {
      const action = async (): Promise<ToolActionResult> => {
        try {
          const stagehand = await context.getStagehand();
    
          // Requires a Gemini API key in GEMINI_API_KEY, GOOGLE_API_KEY, or
          // GOOGLE_GENERATIVE_AI_API_KEY (checked in that order)
          const agent = stagehand.agent({
            cua: true,
            model: {
              modelName: "google/gemini-2.5-computer-use-preview-10-2025",
              apiKey:
                process.env.GEMINI_API_KEY ||
                process.env.GOOGLE_API_KEY ||
                process.env.GOOGLE_GENERATIVE_AI_API_KEY,
            },
          });
    
          // Execute the task
          const result = await agent.execute({
            instruction: params.prompt,
            maxSteps: 20,
          });
    
          return {
            content: [
              {
                type: "text",
                text: `${result.message}`,
              },
            ],
          };
        } catch (error) {
          const errorMsg = error instanceof Error ? error.message : String(error);
          throw new Error(`Failed to execute agent task: ${errorMsg}`);
        }
      };
    
      return {
        action,
        waitForNetwork: false,
      };
    }
  • Defines the Zod `AgentInputSchema` for the tool's input (prompt) and the `agentSchema` with the tool name 'browserbase_stagehand_agent', description, and input schema.
    const AgentInputSchema = z.object({
      prompt: z.string().describe(
        `The task prompt describing what you want the sub-agent to accomplish.
        Be clear and specific about the goal. For example:
        'Go to Hacker News and find the most controversial post from today, then summarize the top 3 comments'.
        The agent will autonomously navigate and interact with web pages to complete this task.`,
      ),
    });
    
    type AgentInput = z.infer<typeof AgentInputSchema>;
    
    const agentSchema: ToolSchema<typeof AgentInputSchema> = {
      name: "browserbase_stagehand_agent",
      description: `Execute a task autonomously using Gemini Computer Use agent. The agent will navigate and interact with web pages to complete the given task.`,
      inputSchema: AgentInputSchema,
    };
  • src/index.ts:168-198 (registration)
    Dynamically registers all tools from the TOOLS array on the MCP server using `server.tool()`, including 'browserbase_stagehand_agent' by its schema.name, with execution delegated to `context.run(tool, params)`.
    const tools: MCPToolsArray = [...TOOLS];
    
    // Register each tool with the Smithery server
    tools.forEach((tool) => {
      if (tool.schema.inputSchema instanceof z.ZodObject) {
        server.tool(
          tool.schema.name,
          tool.schema.description,
          tool.schema.inputSchema.shape,
          async (params: z.infer<typeof tool.schema.inputSchema>) => {
            try {
              const result = await context.run(tool, params);
              return result;
            } catch (error) {
              const errorMessage =
                error instanceof Error ? error.message : String(error);
              process.stderr.write(
                `[Smithery Error] ${new Date().toISOString()} Error running tool ${tool.schema.name}: ${errorMessage}\n`,
              );
              throw new Error(
                `Failed to run tool '${tool.schema.name}': ${errorMessage}`,
              );
            }
          },
        );
      } else {
        console.warn(
          `Tool "${tool.schema.name}" has an input schema that is not a ZodObject. Schema type: ${tool.schema.inputSchema.constructor.name}`,
        );
      }
    });
  • The `TOOLS` export array collects all available tools, including `agentTool` (browserbase_stagehand_agent), which is imported and used for MCP server registration in src/index.ts.
    export const TOOLS = [
      ...sessionTools,
      navigateTool,
      actTool,
      extractTool,
      observeTool,
      screenshotTool,
      getUrlTool,
      agentTool,
    ];
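
The `handleAgent` code above reads the API key from three environment variables through an `||` chain, so the first non-empty value wins. A minimal sketch of that lookup as a standalone function follows. `resolveGeminiApiKey`, `GEMINI_KEY_VARS`, and `Env` are hypothetical names, and throwing when no key is set is an assumption added here; the original passes `undefined` through and lets the failure surface later.

```typescript
// Environment variable names checked by handleAgent, in priority order.
const GEMINI_KEY_VARS = [
  "GEMINI_API_KEY",
  "GOOGLE_API_KEY",
  "GOOGLE_GENERATIVE_AI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the first non-empty key, mirroring the
// truthiness semantics of the `||` chain (empty strings are skipped).
function resolveGeminiApiKey(env: Env): string {
  for (const name of GEMINI_KEY_VARS) {
    const value = env[name];
    if (value) {
      return value;
    }
  }
  // Assumption: fail early with an explicit message instead of passing
  // `undefined` to the agent as the original code would.
  throw new Error(
    `No Gemini API key found. Set one of: ${GEMINI_KEY_VARS.join(", ")}`,
  );
}

// Example with an in-memory environment; GOOGLE_API_KEY takes precedence
// over GOOGLE_GENERATIVE_AI_API_KEY when GEMINI_API_KEY is absent.
const exampleKey = resolveGeminiApiKey({
  GOOGLE_API_KEY: "key-from-google-api-key",
  GOOGLE_GENERATIVE_AI_API_KEY: "key-from-generative-ai",
});
console.log(exampleKey);
```

In a Node.js server the function would presumably be called with `process.env`.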
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It mentions autonomous navigation and interaction but lacks critical behavioral details: whether the tool creates or destroys sessions, whether it requires authentication, whether rate limits or timeouts apply, and how errors are handled. The description states what the tool does but not how it behaves operationally, leaving significant gaps for an agent trying to understand its traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences that directly state the tool's function and method. Every word earns its place: 'Execute a task autonomously' establishes the core purpose, 'using Gemini Computer Use agent' specifies the mechanism, and 'navigate and interact with web pages' clarifies the domain. There are no wasted words and no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a complex autonomous agent tool with no annotations and no output schema, the description is insufficiently complete. It doesn't explain what the tool returns (success/failure indicators, results of task execution), doesn't mention session management implications, and provides minimal behavioral context. For a tool that presumably orchestrates multiple web interactions, more completeness is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents the single 'prompt' parameter with examples. The description adds no additional parameter semantics beyond what's in the schema. According to scoring rules, with high schema coverage (>80%), the baseline is 3 even with no param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Execute a task autonomously using Gemini Computer Use agent' with the specific action of navigating and interacting with web pages. It distinguishes from siblings like 'browserbase_screenshot' or 'browserbase_stagehand_navigate' by emphasizing autonomous task execution rather than single operations. However, it doesn't explicitly contrast with 'browserbase_stagehand_act' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for autonomous web-based task completion, with an example prompt provided in the schema. However, it lacks explicit guidance on when to use this tool versus alternatives like 'browserbase_stagehand_act' or 'browserbase_stagehand_extract'. The context is clear but no exclusions or specific alternatives are mentioned in the description itself.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
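
To make the gaps noted in the review concrete, here is one possible sketch of a fuller description plus MCP tool annotation hints. The step limit, the environment variables, the returned text, and the error prefix come from the implementation reference on this page. The guidance about `browserbase_stagehand_act` and `browserbase_stagehand_extract` and every annotation value are assumptions for illustration, not the author's wording.

```typescript
// Subset of the optional annotation hints that MCP defines for tools.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Hypothetical revised description; each sentence addresses one review gap.
const agentDescription = [
  // Purpose
  "Execute a multi-step web task autonomously using the Gemini Computer Use agent.",
  // Behavior: the step budget comes from `maxSteps: 20` in handleAgent
  "The agent navigates and interacts with web pages for up to 20 steps.",
  // Auth requirement: the variable names come from handleAgent
  "Requires GEMINI_API_KEY, GOOGLE_API_KEY, or GOOGLE_GENERATIVE_AI_API_KEY.",
  // Return value and error shape as seen in handleAgent
  "Returns the agent's final message as text; failures are reported as 'Failed to execute agent task: <reason>'.",
  // Usage guidance (assumption about how the sibling tools differ)
  "Prefer browserbase_stagehand_act for a single known action and browserbase_stagehand_extract for reading data from the current page.",
].join(" ");

// Assumed, conservative hint values: the agent can click and submit forms
// on arbitrary sites, so it is treated as non-read-only and open-world.
const agentAnnotations: ToolAnnotations = {
  title: "Stagehand autonomous agent",
  readOnlyHint: false,
  destructiveHint: true,
  idempotentHint: false,
  openWorldHint: true,
};

console.log(agentDescription);
console.log(JSON.stringify(agentAnnotations));
```

A longer description trades some of the conciseness score for behavior and completeness, so the right balance is a judgment call.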

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MesuterPikin/mcp-server-browserbase'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.