get_html
Retrieve HTML content from webpages for browser automation tasks, enabling data extraction and page analysis through URL navigation and post-load actions.
Instructions
Get the HTML content of a webpage
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to navigate to | |
| steps | No | Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit") |
Implementation Reference
- src/browser_handler.py:194-220 (handler)Core handler logic for 'get_html' command: validates URL, uses an AI agent to navigate and perform optional steps, then retrieves and returns the page HTML.elif command == 'get_html': if not args.get('url'): return { 'success': False, 'error': 'URL is required for get_html command' } task = f"1. Go to {args['url']}" if args.get('steps'): steps = args['steps'].split(',') for i, step in enumerate(steps, 2): task += f"\n{i}. {step.strip()}" task += f"\n{len(steps) + 2}. Get the page HTML" else: task += "\n2. Get the page HTML" use_vision = os.getenv('USE_VISION', 'false').lower() == 'true' agent = Agent(task=task, llm=llm, use_vision=use_vision, browser_context=context) await agent.run() try: html = await context.get_page_html() return { 'success': True, 'html': html } finally: await context.close()
- src/index.ts:175-190 (schema)Input schema definition for the 'get_html' tool, specifying required 'url' and optional 'steps' parameters.name: 'get_html', description: 'Get the HTML content of a webpage', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to navigate to', }, steps: { type: 'string', description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")', }, }, required: ['url'], },
- src/index.ts:282-290 (handler)MCP server handler that processes the 'get_html' tool call response from Python backend and formats it as MCP content.} else if (request.params.name === 'get_html') { return { content: [ { type: 'text', text: result.html, }, ], };
- src/index.ts:141-233 (registration)Registers the 'get_html' tool with the MCP server using addTool, including its schema.const stdout = Buffer.concat(stdoutChunks).toString('utf8'); reject(new Error(`Failed to parse Python script output: ${error}\nOutput was: ${stdout}`)); } }); }); } private setupToolHandlers() { this.server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: [ { name: 'screenshot', description: 'Take a screenshot of a webpage', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to navigate to', }, full_page: { type: 'boolean', description: 'Whether to capture the full page or just the viewport', default: false, }, steps: { type: 'string', description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")', }, }, required: ['url'], }, }, { name: 'get_html', description: 'Get the HTML content of a webpage', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to navigate to', }, steps: { type: 'string', description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")', }, }, required: ['url'], }, }, { name: 'execute_js', description: 'Execute JavaScript code on a webpage', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to navigate to', }, script: { type: 'string', description: 'The JavaScript code to execute', }, steps: { type: 'string', description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")', }, }, required: ['url', 'script'], }, }, { name: 'get_console_logs', description: 'Get the console logs of a webpage', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'The URL to navigate to', }, steps: { type: 'string', description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")', }, }, required: ['url'], }, }, ], }));