get_html
Fetch the HTML content of any webpage by providing its URL. Optionally include actions such as clicking or scrolling to run after the page loads, so that dynamically rendered content can be extracted.
Instructions
Get the HTML content of a webpage
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| steps | No | Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit") | |
| url | Yes | The URL to navigate to | |
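
As a rough illustration of how the two parameters fit together, the sketch below builds an MCP `tools/call` request for `get_html`. The helper name `build_get_html_request`, the `request_id` parameter, and the example URL are assumptions made for illustration; only `url` and `steps` come from the schema above.

```python
import json


def build_get_html_request(url, steps=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' payload for the get_html tool.

    Hypothetical helper for illustration; it is not part of the server's code.
    """
    if not url:
        # Mirrors the schema: 'url' is the only required argument.
        raise ValueError("url is required")

    arguments = {"url": url}
    if steps:
        # 'steps' is optional and passed through as a single string.
        arguments["steps"] = steps

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_html", "arguments": arguments},
    }


request = build_get_html_request(
    "https://example.com", steps="click #submit, scroll down"
)
print(json.dumps(request, indent=2))
```

Omitting `steps` leaves it out of `arguments` entirely, which matches its "No" in the Required column.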
Implementation Reference
- src/browser_handler.py:194-221 (handler): Core handler for 'get_html' command: validates input, constructs task for AI agent to navigate URL and perform steps, runs agent, retrieves HTML from browser context, and returns it.

  ```python
  elif command == 'get_html':
      if not args.get('url'):
          return {
              'success': False,
              'error': 'URL is required for get_html command'
          }

      task = f"1. Go to {args['url']}"
      if args.get('steps'):
          steps = args['steps'].split(',')
          for i, step in enumerate(steps, 2):
              task += f"\n{i}. {step.strip()}"
          task += f"\n{len(steps) + 2}. Get the page HTML"
      else:
          task += "\n2. Get the page HTML"

      use_vision = os.getenv('USE_VISION', 'false').lower() == 'true'
      agent = Agent(task=task, llm=llm, use_vision=use_vision, browser_context=context)
      await agent.run()
      try:
          html = await context.get_page_html()
          return {
              'success': True,
              'html': html
          }
      finally:
          await context.close()
  ```
- src/index.ts:174-191 (schema): Input schema and metadata for the 'get_html' tool in MCP server registration.

  ```typescript
  {
    name: 'get_html',
    description: 'Get the HTML content of a webpage',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL to navigate to',
        },
        steps: {
          type: 'string',
          description: 'Comma-separated actions or sentences describing steps to take after page load (e.g., "click #submit, scroll down" or "Fill the login form and submit")',
        },
      },
      required: ['url'],
    },
  },
  ```
- src/index.ts:282-290 (registration): Dispatch handler in MCP CallToolRequest that processes 'get_html' result from Python script and formats response.

  ```typescript
  } else if (request.params.name === 'get_html') {
    return {
      content: [
        {
          type: 'text',
          text: result.html,
        },
      ],
    };
  ```
- src/index.ts:236-236 (registration): Valid commands list including 'get_html' for tool validation in MCP server.

  ```typescript
  const validCommands = ['screenshot', 'get_html', 'execute_js', 'get_console_logs'];
  ```
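
To make the task construction in the `get_html` handler easier to follow, here is a minimal standalone sketch of the same numbering logic. The function name `build_task` is an assumption for illustration; the handler performs these steps inline and then passes the resulting string to the agent.

```python
def build_task(url, steps=None):
    """Return the numbered task text built from a URL and optional steps.

    Hypothetical helper that mirrors the string-building logic of the
    handler; it does not run a browser or an agent.
    """
    task = f"1. Go to {url}"
    if steps:
        # Every comma starts a new numbered step, beginning at 2.
        parts = steps.split(",")
        for i, step in enumerate(parts, 2):
            task += f"\n{i}. {step.strip()}"
        # The HTML retrieval step always comes last.
        task += f"\n{len(parts) + 2}. Get the page HTML"
    else:
        task += "\n2. Get the page HTML"
    return task


print(build_task("https://example.com", "click #submit, scroll down"))
# 1. Go to https://example.com
# 2. click #submit
# 3. scroll down
# 4. Get the page HTML
```

Because the split happens on every comma, a sentence-style step that itself contains a comma (for example "Fill the login form, then submit") becomes two separate numbered steps.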