Skip to main content
Glama
ztobs

Browser Use Server

by ztobs

screenshot

Capture webpage screenshots for documentation, testing, or sharing by automating browser actions after page load.

Instructions

Take a screenshot of a webpage

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL to navigate to
full_pageNoWhether to capture the full page or just the viewport
stepsNoComma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")

Implementation Reference

  • Core handler for the 'screenshot' tool: validates URL, constructs task for Agent to navigate and perform steps, runs the agent, captures screenshot (full_page option), saves to file, returns base64 and filepath.
    if command == 'screenshot':
        if not args.get('url'):
            return {
                'success': False,
                'error': 'URL is required for screenshot command'
            }
        
        task = f"1. Go to {args['url']}"
        if args.get('steps'):
            steps = args['steps'].split(',')
            for i, step in enumerate(steps, 2):
                task += f"\n{i}. {step.strip()}"
            task += f"\n{len(steps) + 2}. Take a screenshot"
        else:
            task += "\n2. Take a screenshot"
        if args.get('full_page'):
            task += " of the full page"
        
        print(f"[DEBUG] Creating agent for task: {task}")
        use_vision = os.getenv('USE_VISION', 'false').lower() == 'true'
        agent = Agent(task=task, llm=llm, use_vision=use_vision, browser_context=context)
        print("[DEBUG] Running agent")
        await agent.run()
        print("[DEBUG] Agent run completed")
        
        # Get the screenshot from the browser context
        try:
            # await context.navigate_to(args['url'])
            screenshot_base64 = await context.take_screenshot(full_page=args.get('full_page', False))
            filename = f"screenshot_{int(time.time())}.png"
            filepath = os.path.join(SCREENSHOT_DIR, filename)
            
            # Decode base64 and save image
            screenshot_bytes = base64.b64decode(screenshot_base64)
            with open(filepath, 'wb') as f:
                f.write(screenshot_bytes)
                
            return {
                'success': True,
                'screenshot': screenshot_base64, # Keep base64 for potential direct display
                'filepath': os.path.abspath(filepath) # Include full file path in response
            }
        finally:
            await context.close()
  • Tool schema definition including name, description, and input schema with required 'url', optional 'full_page' and 'steps'.
    {
      name: 'screenshot',
      description: 'Take a screenshot of a webpage',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL to navigate to',
          },
          full_page: {
            type: 'boolean',
            description: 'Whether to capture the full page or just the viewport',
            default: false,
          },
          steps: {
            type: 'string',
            description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
          },
        },
        required: ['url'],
      },
    },
  • src/index.ts:149-233 (registration)
    Registers the 'screenshot' tool (among others) in the MCP server's ListToolsRequestSchema handler.
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [
        {
          name: 'screenshot',
          description: 'Take a screenshot of a webpage',
          inputSchema: {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'The URL to navigate to',
              },
              full_page: {
                type: 'boolean',
                description: 'Whether to capture the full page or just the viewport',
                default: false,
              },
              steps: {
                type: 'string',
                description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
              },
            },
            required: ['url'],
          },
        },
        {
          name: 'get_html',
          description: 'Get the HTML content of a webpage',
          inputSchema: {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'The URL to navigate to',
              },
              steps: {
                type: 'string',
                description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
              },
            },
            required: ['url'],
          },
        },
        {
          name: 'execute_js',
          description: 'Execute JavaScript code on a webpage',
          inputSchema: {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'The URL to navigate to',
              },
              script: {
                type: 'string',
                description: 'The JavaScript code to execute',
              },
              steps: {
                type: 'string',
                description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
              },
            },
            required: ['url', 'script'],
          },
        },
        {
          name: 'get_console_logs',
          description: 'Get the console logs of a webpage',
          inputSchema: {
            type: 'object',
            properties: {
              url: {
                type: 'string',
                description: 'The URL to navigate to',
              },
              steps: {
                type: 'string',
                description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
              },
            },
            required: ['url'],
          },
        },
      ],
    }));
  • MCP CallToolRequestSchema handler specific response formatting for screenshot tool results from Python subprocess.
    if (request.params.name === 'screenshot') {
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify({
              status: `Screenshot successful.`,
              path: result.filepath,
              // screenshot: 'Data: ' + result.screenshot
            })
          },
        ],
      };
  • Helper: Defines and ensures existence of screenshots directory for saving screenshot files.
    SCREENSHOT_DIR = os.path.join('.', 'screenshots')
    
    async def handle_command(command, args):
        """Handle different browser commands"""
    
        # Ensure screenshot directory exists
        os.makedirs(SCREENSHOT_DIR, exist_ok=True)
Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ztobs/cline-browser-use-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server