playwright_get_text_content
Extract and filter visible text content from web pages using browser automation, removing duplicates for clean and actionable data output.
Instructions
Get the text content of all visible elements on the current page, intelligently filtering out duplicate content.
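To make "filtering out duplicate content" concrete, here is a minimal Python sketch of the same selection rules the handler applies in the browser (visible elements, at most 3 descendants, text of at most 1000 characters, plus any `value` attribute). The `Candidate` class and `unique_texts` function are hypothetical stand-ins for illustration only; the real tool runs equivalent JavaScript inside the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a DOM element; not part of the server code."""
    inner_text: str
    descendant_count: int
    visible: bool = True
    value: Optional[str] = None


def unique_texts(candidates, max_descendants=3, max_length=1000):
    """Return de-duplicated texts in first-seen order."""
    seen = {}  # a dict keeps insertion order, like a JavaScript Set
    for c in candidates:
        # Skip hidden elements and large containers.
        if not c.visible or c.descendant_count > max_descendants:
            continue
        text = c.inner_text.strip()
        if text and len(text) <= max_length:
            seen.setdefault(text, None)
        # The value attribute is collected as well (e.g. for inputs).
        if c.value:
            seen.setdefault(c.value, None)
    return list(seen)


sample = [
    Candidate("Sign in", 0),
    Candidate("  Sign in ", 1),            # duplicate after trimming
    Candidate("Whole page text", 250),     # container with too many descendants
    Candidate("", 0, value="Search"),      # no text, but a value attribute
    Candidate("Hidden", 0, visible=False),
]
print(unique_texts(sample))  # ['Sign in', 'Search']
```

The descendant limit is what keeps large containers from repeating the text of everything nested inside them.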
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
Implementation Reference
- The async handle method that implements the tool logic. It evaluates JavaScript on the page of the most recently created session, collects unique text from visible elements that have at most 3 descendants, keeps only text of 1000 characters or fewer, also collects any `value` attributes, and returns the resulting list of texts.

      async def handle(self, name: str, arguments: dict | None) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
          logger.info("开始获取页面文本内容")
          if not self._sessions:
              logger.warning("没有活跃的会话。需要先创建一个新会话。")
              return [types.TextContent(type="text", text="No active session. Please create a new session first.")]
          try:
              session_id = list(self._sessions.keys())[-1]
              page = self._sessions[session_id]["page"]
              logger.debug(f"从页面获取文本, URL: {page.url}")
              # text_contents = await page.locator('body').all_inner_texts()

              async def get_unique_texts_js(page):
                  logger.debug("执行JavaScript获取唯一文本")
                  unique_texts = await page.evaluate('''() => {
                      var elements = Array.from(document.querySelectorAll('*')); // select all elements first, then filter
                      var uniqueTexts = new Set();
                      for (var element of elements) {
                          if (element.offsetWidth > 0 || element.offsetHeight > 0) { // check whether the element is visible
                              var childrenCount = element.querySelectorAll('*').length;
                              if (childrenCount <= 3) {
                                  var innerText = element.innerText ? element.innerText.trim() : '';
                                  if (innerText && innerText.length <= 1000) {
                                      uniqueTexts.add(innerText);
                                  }
                                  var value = element.getAttribute('value');
                                  if (value) {
                                      uniqueTexts.add(value);
                                  }
                              }
                          }
                      }
                      //console.log( Array.from(uniqueTexts));
                      return Array.from(uniqueTexts);
                  }
                  ''')
                  return unique_texts

              # Usage example
              text_contents = await get_unique_texts_js(page)
              logger.info(f"获取到 {len(text_contents)} 个唯一文本元素")
              logger.debug(f"文本内容: {text_contents[:3]}...")
              return [types.TextContent(type="text", text=f"Text content of all elements: {text_contents}")]
          except Exception as e:
              logger.error(f"获取文本内容失败: {str(e)}", exc_info=True)
              return [types.TextContent(type="text", text=f"获取文本内容失败: {str(e)}")]
- Class definition with the tool name, description, and input schema (empty, since no parameters are required). base.to_tool() uses these attributes to generate the JSON schema for MCP.

      class GetTextContentToolHandler(ToolHandler):
          name = "playwright_get_text_content"
          description = "获取当前页面中所有可见元素的文本内容,智能过滤重复内容"
          inputSchema = []
- src/playwright_server/server.py:43-51 (registration): Registers the GetTextContentToolHandler instance in tool_handler_list, which is mapped to a dict keyed by tool name for lookup during MCP tool calls.

      tool_handler_list = [
          NavigateToolHandler(),
          # ScreenshotToolHandler(),
          EvaluateToolHandler(),
          GetTextContentToolHandler(),
          GetHtmlContentToolHandler(),
          NewSessionToolHandler(),
          ActionToolHandler()
      ]
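The list-to-dict lookup described in the registration entry could look roughly like the sketch below. It is an illustration under assumptions, not the server's actual code: the stub `ToolHandler` base class, `DemoToolHandler`, the `tool_handlers` dict name, and the `call_tool` function are all hypothetical. Only the tool name `playwright_get_text_content` comes from the listing above.

```python
import asyncio


class ToolHandler:
    """Hypothetical minimal base class standing in for the real one."""
    name = ""

    async def handle(self, name, arguments):
        # Stub behaviour for the sketch; real handlers drive the browser.
        return [f"{self.name} called with {arguments}"]


class GetTextContentToolHandler(ToolHandler):
    name = "playwright_get_text_content"


class DemoToolHandler(ToolHandler):
    name = "demo_tool"  # hypothetical tool name, for illustration only


tool_handler_list = [DemoToolHandler(), GetTextContentToolHandler()]

# Map each handler by its tool name so a call can be dispatched in O(1).
tool_handlers = {handler.name: handler for handler in tool_handler_list}


async def call_tool(name, arguments=None):
    """Look up the handler by tool name and delegate to its handle method."""
    handler = tool_handlers.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler.handle(name, arguments)


result = asyncio.run(call_tool("playwright_get_text_content"))
print(result)  # ['playwright_get_text_content called with None']
```

Keying the dict on each handler's `name` attribute means adding a tool only requires appending an instance to the list.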