playwright_action

Automate web browser interactions by locating elements and performing actions like clicking, typing, or extracting data using Playwright selectors and commands.

Instructions

Perform an operation on the element identified by the given locator and action.

Input Schema

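Name	Required	Description
`action`	Yes	Operation type. Options: fill, click, dblclick, hover, tap, check, uncheck, select_option, press, type, focus, blur, drag_to, screenshot, get_text, count, is_visible, is_enabled, is_hidden, is_disabled, is_checked, get_attribute, evaluate, wait_for, press_sequentially, clear, scroll_into_view, set_input_files, select_text, dispatch_event, get_inner_text, get_inner_html, get_content, all_inner_texts, all_text_contents
`attribute_name`	No	Name of the attribute to read; used by the get_attribute action
`button`	No	Mouse button for click actions. Options: 'left', 'right', 'middle'
`by`	Yes	How to select the element, following Playwright's selector conventions. Options: get_by_text, get_by_placeholder, get_by_label, get_by_role, get_by_alt_text, get_by_title, get_by_test_id, get_by_css, locator
`by_value`	Yes	Value used to select the element
`delay`	No	Delay between key presses, in milliseconds
`event_data`	No	Event data as a JSON string; used by the dispatch_event action
`event_name`	No	Name of the event to dispatch; used by the dispatch_event action
`exact`	No	Whether to match text exactly
`filename`	No	File name for the saved screenshot; used by the screenshot action
`files`	No	Paths of the files to upload; used by the set_input_files action. Separate multiple files with commas
`force`	No	Whether to force the action, bypassing actionability checks
`has_text`	No	Text the element must contain
`modifiers`	No	Modifier keys for keyboard actions. Options: 'Alt', 'Control', 'Meta', 'Shift'
`no_wait_after`	No	Whether to skip waiting for navigation after the action completes
`options`	No	Options for a select box; used by the select_option action. A JSON string that may contain value, label, index
`position`	No	Click position relative to the element, in the format 'x,y'
`script`	No	JavaScript code to run on the element; used by the evaluate action
`selector`	No	Selector of the drag target element; used by the drag_to action
`state`	No	Element state to wait for; used by the wait_for action. Options: 'attached', 'detached', 'visible', 'hidden'
`strict`	No	Whether to match strictly (raise an error if multiple elements are found)
`timeout`	No	Operation timeout, in milliseconds
`value`	No	Value for the operation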
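
As an illustration of how the parameters combine, here is a sketch of three argument payloads written as Python dictionaries. The selectors, text, and file names are invented for the example; only the parameter names and the string encodings ('x,y' positions, comma-separated file lists, JSON-encoded `options`) come from the table above. The helpers `missing_required`, `parse_position`, and `parse_files` are hypothetical and are not part of the server.

```python
import json

# Hypothetical payloads; selector values and text are illustrative only.
click_args = {
    "by": "get_by_role",
    "by_value": "button",
    "has_text": "Submit",
    "action": "click",
    "position": "10,5",   # 'x,y' relative to the element
    "timeout": 5000,      # milliseconds
}

select_args = {
    "by": "locator",
    "by_value": "select#country",
    "action": "select_option",
    "options": json.dumps({"label": "Canada"}),  # JSON-encoded string
}

upload_args = {
    "by": "get_by_label",
    "by_value": "Attachments",
    "action": "set_input_files",
    "files": "report.pdf, chart.png",  # comma-separated paths
}

REQUIRED = ("by", "by_value", "action")


def missing_required(args: dict) -> list:
    """Return the required parameter names that are absent from args."""
    return [key for key in REQUIRED if key not in args]


def parse_position(position: str) -> dict:
    """Decode an 'x,y' string into an {'x': ..., 'y': ...} mapping."""
    x, y = map(float, position.split(","))
    return {"x": x, "y": y}


def parse_files(files: str) -> list:
    """Split a comma-separated path list, trimming whitespace."""
    return [path.strip() for path in files.split(",")]
```

Checking a payload with `missing_required` before sending it is a cheap way to catch an omitted `by`, `by_value`, or `action`.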

Implementation Reference

src/playwright_server/tools/action.py:46-360 (handler)
The main asynchronous handler method that performs various Playwright operations on page elements. It locates elements using strategies like get_by_text, get_by_role, etc., and executes actions such as fill, click, screenshot, get_text, wait_for, and many others based on input parameters.
async def handle(self, name: str, arguments: dict | None) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
    logger.info(f"开始执行操作: by={arguments.get('by')}, by_value={arguments.get('by_value')}, action={arguments.get('action')}")
    try:
        page = self.get_page()
        logger.debug(f"当前页面URL: {page.url}")

        by = arguments.get("by")
        by_value = arguments.get("by_value")
        action = arguments.get("action")
        value = arguments.get("value")
        force = arguments.get("force", False)
        delay = arguments.get("delay")
        position = arguments.get("position")
        timeout = arguments.get("timeout")
        selector = arguments.get("selector")
        attribute_name = arguments.get("attribute_name")
        script = arguments.get("script")
        filename = arguments.get("filename", "screenshot.png")
        no_wait_after = arguments.get("no_wait_after", False)
        strict = arguments.get("strict", False)
        state = arguments.get("state")
        modifiers = arguments.get("modifiers", [])
        button = arguments.get("button", "left")
        options_str = arguments.get("options")
        files_str = arguments.get("files")
        event_name = arguments.get("event_name")
        event_data_str = arguments.get("event_data")
        exact = arguments.get("exact", False)
        has_text = arguments.get("has_text")

        logger.debug(f"操作参数: force={force}, delay={delay}, timeout={timeout}, strict={strict}, exact={exact}")

        # Parse the string-encoded parameters
        options = {}
        if options_str:
            try:
                logger.debug(f"解析options字符串: {options_str}")
                options = json.loads(options_str)
            except json.JSONDecodeError as e:
                logger.error(f"options参数格式错误: {str(e)}")
                return [types.TextContent(type="text", text="options参数格式错误，必须是有效的JSON字符串")]

        files = []
        if files_str:
            logger.debug(f"解析files字符串: {files_str}")
            files = [f.strip() for f in files_str.split(",")]

        event_data = {}
        if event_data_str:
            try:
                logger.debug(f"解析event_data字符串: {event_data_str}")
                event_data = json.loads(event_data_str)
            except json.JSONDecodeError as e:
                logger.error(f"event_data参数格式错误: {str(e)}")
                return [types.TextContent(type="text", text="event_data参数格式错误，必须是有效的JSON字符串")]

        # Build the options dictionary
        action_options: Dict[str, Any] = {}
        if force is not None:
            action_options["force"] = force
        if delay is not None:
            action_options["delay"] = delay
        if timeout is not None:
            action_options["timeout"] = timeout
        if no_wait_after is not None:
            action_options["no_wait_after"] = no_wait_after
        if strict is not None:
            action_options["strict"] = strict
        if modifiers:
            action_options["modifiers"] = modifiers
        if button:
            action_options["button"] = button
        if exact is not None:
            action_options["exact"] = exact

        logger.debug(f"构建的action_options: {action_options}")

        # Handle the position parameter
        if position:
            try:
                logger.debug(f"解析位置参数: {position}")
                x, y = map(float, position.split(','))
                action_options["position"] = {"x": x, "y": y}
            except ValueError as e:
                logger.error(f"位置格式错误: {str(e)}")
                return [types.TextContent(type="text", text="位置格式错误，应为'x,y'")]

        # Build the locator according to the by parameter
        locator = None
        locator_options = {"exact": exact} if exact is not None else {}

        try:
            logger.info(f"开始定位元素: by={by}, by_value={by_value}")
            if by == "get_by_text":
                locator = page.get_by_text(by_value, **locator_options)
            elif by == "get_by_placeholder":
                locator = page.get_by_placeholder(by_value, **locator_options)
            elif by == "get_by_label":
                locator = page.get_by_label(by_value, **locator_options)
            elif by == "get_by_role":
                locator = page.get_by_role(by_value, name=has_text, **locator_options)
            elif by == "get_by_alt_text":
                locator = page.get_by_alt_text(by_value, **locator_options)
            elif by == "get_by_title":
                locator = page.get_by_title(by_value, **locator_options)
            elif by == "get_by_test_id":
                locator = page.get_by_test_id(by_value)
            elif by == "get_by_css":
                locator = page.locator(by_value, has_text=has_text)
            elif by == "locator":
                locator = page.locator(by_value, has_text=has_text)
            else:
                logger.error(f"不支持的by类型: {by}")
                return [types.TextContent(type="text", text=f"不支持的by类型: {by}")]
            logger.debug("元素定位成功")
        except Exception as e:
            logger.error(f"定位元素失败: {str(e)}", exc_info=True)
            return [types.TextContent(type="text", text=f"定位元素失败: {str(e)}")]

        # Run the operation selected by action
        result = None
        try:
            logger.info(f"开始执行操作: action={action}")
            if action == "fill":
                logger.debug(f"填充文本: {value}")
                await locator.fill(value, **action_options)
                result = "填充成功"
            elif action == "click":
                logger.debug("点击元素")
                await locator.click(**action_options)
                result = "点击成功"
            elif action == "dblclick":
                logger.debug("双击元素")
                await locator.dblclick(**action_options)
                result = "双击成功"
            elif action == "hover":
                logger.debug("悬停在元素上")
                await locator.hover(**action_options)
                result = "悬停成功"
            elif action == "tap":
                logger.debug("轻触元素")
                await locator.tap(**action_options)
                result = "轻触成功"
            elif action == "check":
                logger.debug("选中复选框")
                await locator.check(**action_options)
                result = "选中成功"
            elif action == "uncheck":
                logger.debug("取消选中复选框")
                await locator.uncheck(**action_options)
                result = "取消选中成功"
            elif action == "select_option":
                if options:
                    logger.debug(f"选择选项(使用options): {options}")
                    await locator.select_option(**options, **action_options)
                else:
                    logger.debug(f"选择选项(使用value): {value}")
                    await locator.select_option(value=value, **action_options)
                result = "选择选项成功"
            elif action == "press":
                logger.debug(f"按下按键: {value}")
                await locator.press(value, **action_options)
                result = "按键成功"
            elif action == "type":
                logger.debug(f"输入文本: {value}")
                await locator.type(value, **action_options)
                result = "输入文本成功"
            elif action == "focus":
                logger.debug("聚焦元素")
                await locator.focus(**action_options)
                result = "聚焦成功"
            elif action == "blur":
                logger.debug("失焦元素")
                await locator.blur(**action_options)
                result = "失焦成功"
            elif action == "drag_to":
                logger.debug(f"拖拽元素到: {selector}")
                target = page.locator(selector)
                await locator.drag_to(target, **action_options)
                result = "拖拽成功"
            elif action == "screenshot":
                logger.debug(f"元素截图，保存到: {filename}")
                screenshot_path = f"{filename}"
                await locator.screenshot(path=screenshot_path, **action_options)
                with open(screenshot_path, "rb") as image_file:
                    encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
                os.remove(screenshot_path)
                logger.info(f"截图完成，文件大小: {len(encoded_string)} 字节")
                return [types.ImageContent(type="image", data=encoded_string, mimeType="image/png")]
            elif action == "get_text":
                logger.debug("获取元素文本")
                text = await locator.text_content(**action_options)
                logger.debug(f"获取到的文本: {text}")
                result = text
            elif action == "count":
                logger.debug("计算匹配元素数量")
                count = await locator.count()
                logger.debug(f"匹配的元素数量: {count}")
                result = str(count)
            elif action == "is_visible":
                logger.debug("检查元素是否可见")
                is_visible = await locator.is_visible(**action_options)
                logger.debug(f"元素可见性: {is_visible}")
                result = str(is_visible)
            elif action == "is_enabled":
                logger.debug("检查元素是否启用")
                is_enabled = await locator.is_enabled(**action_options)
                logger.debug(f"元素启用状态: {is_enabled}")
                result = str(is_enabled)
            elif action == "is_hidden":
                logger.debug("检查元素是否隐藏")
                is_hidden = await locator.is_hidden(**action_options)
                logger.debug(f"元素隐藏状态: {is_hidden}")
                result = str(is_hidden)
            elif action == "is_disabled":
                logger.debug("检查元素是否禁用")
                is_disabled = await locator.is_disabled(**action_options)
                logger.debug(f"元素禁用状态: {is_disabled}")
                result = str(is_disabled)
            elif action == "is_checked":
                logger.debug("检查元素是否选中")
                is_checked = await locator.is_checked(**action_options)
                logger.debug(f"元素选中状态: {is_checked}")
                result = str(is_checked)
            elif action == "get_attribute":
                logger.debug(f"获取元素属性: {attribute_name}")
                attribute = await locator.get_attribute(attribute_name, **action_options)
                logger.debug(f"属性值: {attribute}")
                result = attribute
            elif action == "evaluate":
                logger.debug(f"在元素上执行脚本: {script}")
                evaluated = await locator.evaluate(script, **action_options)
                logger.debug(f"脚本执行结果: {evaluated}")
                result = str(evaluated)
            elif action == "wait_for":
                if not state:
                    state = "visible"
                logger.debug(f"等待元素状态: {state}")
                await locator.wait_for(state=state, **action_options)
                result = f"等待元素状态 '{state}' 成功"
            elif action == "press_sequentially":
                logger.debug(f"序列按键输入: {value}")
                await locator.press_sequentially(value, **action_options)
                result = "序列按键输入成功"
            elif action == "clear":
                logger.debug("清除元素内容")
                await locator.clear(**action_options)
                result = "清除内容成功"
            elif action == "scroll_into_view":
                logger.debug("将元素滚动到视图")
                await locator.scroll_into_view_if_needed(**action_options)
                result = "滚动到元素成功"
            elif action == "set_input_files":
                if files:
                    logger.debug(f"设置文件输入: {files}")
                    await locator.set_input_files(files, **action_options)
                    result = "设置文件输入成功"
                else:
                    logger.error("没有指定要上传的文件")
                    return [types.TextContent(type="text", text="没有指定要上传的文件")]
            elif action == "select_text":
                logger.debug("选择元素中的文本")
                await locator.select_text(**action_options)
                result = "选择文本成功"
            elif action == "dispatch_event":
                if not event_name:
                    logger.error("没有指定事件名称")
                    return [types.TextContent(type="text", text="没有指定事件名称")]
                logger.debug(f"分发事件: {event_name}, 数据: {event_data}")
                await locator.dispatch_event(event_name, event_data, **action_options)
                result = f"分发事件 '{event_name}' 成功"
            elif action == "get_inner_text":
                logger.debug("获取元素内部文本")
                inner_text = await locator.inner_text(**action_options)
                logger.debug(f"内部文本: {inner_text}")
                result = inner_text
            elif action == "get_inner_html":
                logger.debug("获取元素内部HTML")
                inner_html = await locator.inner_html(**action_options)
                logger.debug(f"内部HTML长度: {len(inner_html)}")
                result = inner_html
            elif action == "get_content":
                logger.debug("获取元素内容")
                content = await locator.text_content(**action_options)
                logger.debug(f"内容: {content}")
                result = content
            elif action == "all_inner_texts":
                logger.debug("获取所有内部文本")
                all_inner_texts = await locator.all_inner_texts()
                logger.debug(f"内部文本数量: {len(all_inner_texts)}")
                result = str(all_inner_texts)
            elif action == "all_text_contents":
                logger.debug("获取所有文本内容")
                all_text_contents = await locator.all_text_contents()
                logger.debug(f"文本内容数量: {len(all_text_contents)}")
                result = str(all_text_contents)
            else:
                logger.error(f"不支持的action类型: {action}")
                return [types.TextContent(type="text", text=f"不支持的action类型: {action}")]

            logger.info(f"操作执行成功: {action}")
            return [types.TextContent(type="text", text=f"{result}")]
        except PlaywrightTimeoutError as e:
            logger.error(f"操作超时: {str(e)}", exc_info=True)
            return [types.TextContent(type="text", text=f"操作超时: {str(e)}")]
        except Exception as e:
            logger.error(f"操作执行失败: {str(e)}", exc_info=True)
            return [types.TextContent(type="text", text=f"操作执行失败: {str(e)}")]
    except Exception as e:
        logger.error(f"操作处理器全局异常: {str(e)}", exc_info=True)
        return [types.TextContent(type="text", text=f"操作处理器发生错误: {str(e)}")]
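
The handler builds a single `action_options` dictionary and passes it to whichever locator method is selected. If a method does not accept one of those keys, Python raises a `TypeError`, which the handler's generic `except` would report as a failed operation. One defensive pattern, sketched here as an assumption rather than something the repository does, is to filter the dictionary against the target's signature before the call. `filter_kwargs` and `fake_fill` are hypothetical names; `fake_fill` stands in for a locator method so the example runs without a browser.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def filter_kwargs(func: Callable[..., Any], options: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the options that func declares as parameters."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts anything, so nothing needs filtering.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(options)
    return {key: val for key, val in options.items() if key in params}


def fake_fill(value: str, *, timeout: Optional[float] = None, force: bool = False) -> str:
    """Stand-in for a locator method with a fixed keyword signature."""
    return f"fill({value!r}, timeout={timeout}, force={force})"


# The same kind of catch-all dictionary the handler assembles.
action_options = {"force": True, "timeout": 3000, "exact": False, "button": "left"}

# Only 'force' and 'timeout' survive, because fake_fill declares nothing else.
safe_options = filter_kwargs(fake_fill, action_options)
print(fake_fill("hello", **safe_options))
```

In the handler this would amount to wrapping each call, for example `await locator.fill(value, **filter_kwargs(locator.fill, action_options))`, at the cost of silently dropping options the caller may have expected to apply.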
src/playwright_server/tools/action.py:18-44 (schema)
Tool schema definition including name 'playwright_action', description, and detailed inputSchema with properties for element selection (by/by_value), action type, value, force, timeout, and action-specific parameters like script, options, files, etc.
name = "playwright_action"
description = "根据给定的locator 和action 执行操作"
inputSchema = [
    Property(name="by", typ="string", description="选择元素方式，遵循playwright的选择器原则，可选项: get_by_text,get_by_placeholder,get_by_label,get_by_role,get_by_alt_text,get_by_title,get_by_test_id,get_by_css,locator", required=True),
    Property(name="by_value", typ="string", description="选择元素的值", required=True),
    Property(name="action", typ="string", description="操作类型，可选项: fill,click,dblclick,hover,tap,check,uncheck,select_option,press,type,focus,blur,drag_to,screenshot,get_text,count,is_visible,is_enabled,is_hidden,is_disabled,is_checked,get_attribute,evaluate,wait_for,press_sequentially,clear,scroll_into_view,set_input_files,select_text,dispatch_event,get_inner_text,get_inner_html,get_content,all_inner_texts,all_text_contents", required=True),
    Property(name="value", typ="string", description="操作的值", required=False),
    Property(name="force", typ="boolean", description="是否强制执行动作，无视actionability检查", required=False),
    Property(name="delay", typ="number", description="按键之间的延迟(毫秒)", required=False),
    Property(name="position", typ="string", description="相对于元素的点击位置，格式为: 'x,y'", required=False),
    Property(name="timeout", typ="number", description="操作超时时间(毫秒)", required=False),
    Property(name="selector", typ="string", description="拖拽目标元素的选择器，用于drag_to操作", required=False),
    Property(name="attribute_name", typ="string", description="要获取的属性名称，用于get_attribute操作", required=False),
    Property(name="script", typ="string", description="要在元素上执行的JavaScript代码，用于evaluate操作", required=False),
    Property(name="filename", typ="string", description="截图保存的文件名，用于screenshot操作", required=False),
    Property(name="no_wait_after", typ="boolean", description="是否不等待动作完成后的导航", required=False),
    Property(name="strict", typ="boolean", description="是否严格匹配元素（如果找到多个元素是否报错）", required=False),
    Property(name="state", typ="string", description="等待的元素状态，用于wait_for操作，可选: 'attached', 'detached', 'visible', 'hidden'", required=False),
    Property(name="modifiers", typ="array", description="修饰键，用于键盘操作，可选: 'Alt', 'Control', 'Meta', 'Shift'", required=False),
    Property(name="button", typ="string", description="鼠标按钮，用于点击操作，可选: 'left', 'right', 'middle'", required=False),
    Property(name="options", typ="string", description="选择框的选项，用于select_option操作，JSON格式的字符串，可以包含 value, label, index", required=False),
    Property(name="files", typ="string", description="要上传的文件路径，用于set_input_files操作，多个文件用逗号分隔", required=False),
    Property(name="event_name", typ="string", description="要分发的事件名称，用于dispatch_event操作", required=False),
    Property(name="event_data", typ="string", description="事件数据，JSON格式的字符串，用于dispatch_event操作", required=False),
    Property(name="exact", typ="boolean", description="是否精确匹配文本", required=False),
    Property(name="has_text", typ="string", description="元素必须包含的文本", required=False)
]
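
The `Property` class itself is not shown in this excerpt, so how the list becomes a JSON Schema is not documented here. A minimal sketch of one way such a conversion could work follows, assuming `Property` is a simple record with the four fields used above. Both the `Property` dataclass and the `to_json_schema` function in this block are hypothetical stand-ins, not the repository's code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Property:
    """Hypothetical stand-in mirroring the constructor arguments used above."""
    name: str
    typ: str
    description: str
    required: bool = False


def to_json_schema(props: List[Property]) -> Dict[str, Any]:
    """Fold a list of Property records into a JSON Schema object."""
    return {
        "type": "object",
        "properties": {
            p.name: {"type": p.typ, "description": p.description} for p in props
        },
        # Only properties flagged required end up in the 'required' list.
        "required": [p.name for p in props if p.required],
    }


schema = to_json_schema([
    Property(name="by", typ="string", description="How to select the element", required=True),
    Property(name="timeout", typ="number", description="Timeout in milliseconds"),
])
print(schema["required"])
```

A list of flat records keeps each parameter on one line in the source, while the conversion step produces the nested object that MCP clients expect in `inputSchema`.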
src/playwright_server/server.py:43-54 (registration)
Registers the ActionToolHandler instance in the global tool_handler_list and populates the tool_handlers dictionary mapping tool names to their handler instances, used by list_tools() and call_tool().
tool_handler_list = [
    NavigateToolHandler(),
    # ScreenshotToolHandler(),
    EvaluateToolHandler(),
    GetTextContentToolHandler(),
    GetHtmlContentToolHandler(),
    NewSessionToolHandler(),
    ActionToolHandler()
]

# Build a dictionary keyed by each handler's name attribute
tool_handlers = {handler.name: handler for handler in tool_handler_list}
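
The name-keyed dictionary makes dispatch a single lookup. The sketch below shows that pattern with stub handlers so it runs without Playwright or the MCP SDK. `StubNavigateHandler`, `StubActionHandler`, and this `call_tool` function are hypothetical; the server's real `call_tool()` is not shown in this excerpt and may differ.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StubNavigateHandler:
    """Hypothetical stand-in for a tool handler; only name and handle matter."""
    name = "playwright_navigate"

    async def handle(self, name: str, arguments: Optional[dict]) -> List[str]:
        return [f"navigated with {arguments}"]


class StubActionHandler:
    name = "playwright_action"

    async def handle(self, name: str, arguments: Optional[dict]) -> List[str]:
        return [f"{arguments['action']} on {arguments['by_value']}"]


tool_handler_list = [StubNavigateHandler(), StubActionHandler()]

# Same comprehension as in the registration code: name -> handler instance.
tool_handlers: Dict[str, Any] = {h.name: h for h in tool_handler_list}


async def call_tool(name: str, arguments: Optional[dict]) -> List[str]:
    """Look up the handler by tool name and delegate to it."""
    handler = tool_handlers.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler.handle(name, arguments)


result = asyncio.run(
    call_tool("playwright_action", {"action": "click", "by_value": "Submit"})
)
print(result)
```

With this layout, adding a tool means appending one handler instance to the list; commenting an entry out, as with `ScreenshotToolHandler()`, removes it from both listing and dispatch.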
