Playwright Server MCP

by ziux

playwright_get_text_content

Extract and filter visible text content from web pages using browser automation, removing duplicates for clean and actionable data output.

Instructions

Gets the text content of all visible elements on the current page, intelligently filtering out duplicates.

Input Schema

Name    Required    Description    Default

No arguments
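
Because the tool declares no parameters, a client call carries an empty `arguments` object. A minimal sketch of the JSON-RPC 2.0 `tools/call` request an MCP client would send, using only the standard library; the helper name `build_tool_call` and the request id are illustrative assumptions, not part of this server:

```python
from __future__ import annotations

import json


def build_tool_call(tool_name: str, arguments: dict | None = None, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0) for the given tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary id chosen by the client (assumption)
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # playwright_get_text_content takes no parameters, so this stays empty
            "arguments": arguments or {},
        },
    }
    return json.dumps(payload)


request_json = build_tool_call("playwright_get_text_content")
```

In practice an MCP client library would normally build and send this message for you; the sketch only shows what "No arguments" means on the wire.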

Implementation Reference

  • The async handle method implements the tool logic. It picks the most recently created session, evaluates JavaScript on that session's page to collect unique text from visible elements that have at most three descendant elements, keeps only text of 1,000 characters or fewer (plus any value attributes), and returns the resulting list of texts.
    async def handle(self, name: str, arguments: dict | None) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
        logger.info("开始获取页面文本内容")  # "Starting to get page text content"
        if not self._sessions:
            logger.warning("没有活跃的会话。需要先创建一个新会话。")  # "No active session; create a new one first"
            return [types.TextContent(type="text", text="No active session. Please create a new session first.")]
        try:
            # Use the most recently created session
            session_id = list(self._sessions.keys())[-1]
            page = self._sessions[session_id]["page"]
            logger.debug(f"从页面获取文本, URL: {page.url}")  # "Getting text from page"
            # text_contents = await page.locator('body').all_inner_texts()

            async def get_unique_texts_js(page):
                logger.debug("执行JavaScript获取唯一文本")  # "Running JavaScript to get unique texts"
                unique_texts = await page.evaluate('''() => {
                    var elements = Array.from(document.querySelectorAll('*')); // select all elements first, then filter
                    var uniqueTexts = new Set();
                    for (var element of elements) {
                        if (element.offsetWidth > 0 || element.offsetHeight > 0) { // check whether the element is visible
                            var childrenCount = element.querySelectorAll('*').length;
                            if (childrenCount <= 3) {
                                var innerText = element.innerText ? element.innerText.trim() : '';
                                if (innerText && innerText.length <= 1000) {
                                    uniqueTexts.add(innerText);
                                }
                                var value = element.getAttribute('value');
                                if (value) {
                                    uniqueTexts.add(value);
                                }
                            }
                        }
                    }
                    //console.log( Array.from(uniqueTexts));
                    return Array.from(uniqueTexts);
                }
                ''')
                return unique_texts

            # Usage example
            text_contents = await get_unique_texts_js(page)
            logger.info(f"获取到 {len(text_contents)} 个唯一文本元素")  # "Got N unique text elements"
            logger.debug(f"文本内容: {text_contents[:3]}...")  # "Text content"
            return [types.TextContent(type="text", text=f"Text content of all elements: {text_contents}")]
        except Exception as e:
            # Both messages below mean "Failed to get text content"
            logger.error(f"获取文本内容失败: {str(e)}", exc_info=True)
            return [types.TextContent(type="text", text=f"获取文本内容失败: {str(e)}")]
  • Class definition with the tool name, description, and input schema (empty, since no parameters are required). base.to_tool() uses these attributes to generate the JSON schema for MCP.
    class GetTextContentToolHandler(ToolHandler):
        name = "playwright_get_text_content"
        # English: "Gets the text content of all visible elements on the current page, intelligently filtering out duplicates"
        description = "获取当前页面中所有可见元素的文本内容,智能过滤重复内容"
        inputSchema = []
  • Registers the GetTextContentToolHandler instance in the tool_handler_list, which is mapped to a dict by name for lookup in MCP tool calls.
    tool_handler_list = [
        NavigateToolHandler(),
        # ScreenshotToolHandler(),
        EvaluateToolHandler(),
        GetTextContentToolHandler(),
        GetHtmlContentToolHandler(),
        NewSessionToolHandler(),
        ActionToolHandler()
    ]
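
The filtering rules that the handle method runs in the browser can be mirrored in plain Python to make them easier to reason about. The following is a sketch only, not the server's code: `ElementInfo` and `collect_unique_texts` are hypothetical names, and the dataclass stands in for the DOM properties the page script reads (`offsetWidth`, `offsetHeight`, descendant count, `innerText`, and the `value` attribute).

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_DESCENDANTS = 3      # same threshold as `childrenCount <= 3` in the page script
MAX_TEXT_LENGTH = 1000   # same threshold as `innerText.length <= 1000`


@dataclass
class ElementInfo:
    """Hypothetical stand-in for the DOM properties the page script inspects."""
    offset_width: int
    offset_height: int
    descendant_count: int      # mirrors element.querySelectorAll('*').length
    inner_text: str = ""
    value: str | None = None   # mirrors element.getAttribute('value')


def collect_unique_texts(elements: list[ElementInfo]) -> list[str]:
    """Return unique texts in first-seen order, like Array.from(new Set(...))."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this acts as an ordered set
    for el in elements:
        # Visible means a non-zero width or a non-zero height
        if not (el.offset_width > 0 or el.offset_height > 0):
            continue
        # Skip large containers whose text would repeat their descendants' text
        if el.descendant_count > MAX_DESCENDANTS:
            continue
        text = el.inner_text.strip()
        if text and len(text) <= MAX_TEXT_LENGTH:
            seen.setdefault(text, None)
        # The value attribute is added as-is: not trimmed and not length-limited
        if el.value:
            seen.setdefault(el.value, None)
    return list(seen)


elements = [
    ElementInfo(100, 20, 0, "  Sign in  "),
    ElementInfo(100, 20, 1, "Sign in"),              # duplicate after trimming
    ElementInfo(0, 0, 0, "Hidden"),                  # zero-sized, treated as invisible
    ElementInfo(800, 600, 50, "Whole page text"),    # too many descendants
    ElementInfo(120, 30, 0, "", value="Search"),     # contributes only its value
]
texts = collect_unique_texts(elements)
```

With this sample input, `texts` is `["Sign in", "Search"]`. The descendant limit is what keeps wrapper elements such as `body` from adding one large block of text that duplicates everything beneath them.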

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ziux/playwright-plus-python-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.