Playwright MCP Server

by devskido

playwright_get_visible_html

Extract visible HTML content from web pages with configurable cleaning options for scripts, styles, and comments to obtain clean, structured markup.

Instructions

Get the HTML content of the current page. By default, all <script> tags are removed from the output unless removeScripts is explicitly set to false.

Input Schema

Name            Required  Description                                             Default
selector        No        CSS selector to limit the HTML to a specific container
removeScripts   No        Remove all script tags from the HTML                    true
removeComments  No        Remove all HTML comments                                false
removeStyles    No        Remove all style tags from the HTML                     false
removeMeta      No        Remove all meta tags from the HTML                      false
cleanHtml       No        Perform comprehensive HTML cleaning                     false
minify          No        Minify the HTML output                                  false
maxLength       No        Maximum number of characters to return                  20000
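
As a rough illustration of the schema above, the arguments can be modelled as a TypeScript interface in which every field is optional. The names `VisibleHtmlArgs`, `ResolvedVisibleHtmlArgs`, and `withDefaults` are hypothetical and not part of the server (the handler itself types its arguments as `any`); the sketch only shows what an omitted field would resolve to under the documented defaults.

```typescript
// Hypothetical type mirroring the input schema; all fields are optional.
interface VisibleHtmlArgs {
  selector?: string;
  removeScripts?: boolean;
  removeComments?: boolean;
  removeStyles?: boolean;
  removeMeta?: boolean;
  cleanHtml?: boolean;
  minify?: boolean;
  maxLength?: number;
}

// Hypothetical type for the arguments after defaults are filled in.
interface ResolvedVisibleHtmlArgs {
  selector: string | undefined;
  removeScripts: boolean;
  removeComments: boolean;
  removeStyles: boolean;
  removeMeta: boolean;
  cleanHtml: boolean;
  minify: boolean;
  maxLength: number;
}

// Fill in the documented defaults for any field the caller omitted.
function withDefaults(args: VisibleHtmlArgs): ResolvedVisibleHtmlArgs {
  return {
    selector: args.selector, // no default: the full page is used when absent
    removeScripts: args.removeScripts ?? true,
    removeComments: args.removeComments ?? false,
    removeStyles: args.removeStyles ?? false,
    removeMeta: args.removeMeta ?? false,
    cleanHtml: args.cleanHtml ?? false,
    minify: args.minify ?? false,
    maxLength: args.maxLength ?? 20000,
  };
}

// Example: limit the output to <main> and minify it; everything else defaults.
const exampleArgs = withDefaults({ selector: "main", minify: true });
console.log(exampleArgs);
```

An empty object is therefore a valid call: it requests the full page with script tags removed and at most 20000 characters of HTML.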

Implementation Reference

  • The VisibleHtmlTool class and its execute method implement the core handler logic for the 'playwright_get_visible_html' tool. It retrieves the HTML of the full page or of the first element matching selector, applies the cleaning filters (script removal is on by default; comment, style, and meta removal and minification are opt-in), truncates output longer than maxLength, and returns an error response on failure.
    export class VisibleHtmlTool extends BrowserToolBase {
      /**
       * Execute the visible HTML page tool
       */
      async execute(args: any, context: ToolContext): Promise<ToolResponse> {
        // Check if browser is available
        if (!context.browser || !context.browser.isConnected()) {
          // If browser is not connected, we need to reset the state to force recreation
          resetBrowserState();
          return createErrorResponse(
            "Browser is not connected. The connection has been reset - please retry your navigation."
          );
        }
    
        // Check if page is available and not closed
        if (!context.page || context.page.isClosed()) {
          return createErrorResponse(
            "Page is not available or has been closed. Please retry your navigation."
          );
        }
        return this.safeExecute(context, async (page) => {
          try {
            const { selector, removeComments, removeStyles, removeMeta, minify, cleanHtml } = args;
            // Default removeScripts to true unless explicitly set to false
            const removeScripts = args.removeScripts !== false;
    
            // Get the HTML content
            let htmlContent: string;
    
            if (selector) {
              // If a selector is provided, get only the HTML for that element
              const element = await page.$(selector);
              if (!element) {
                return createErrorResponse(`Element with selector "${selector}" not found`);
              }
              htmlContent = await page.evaluate((el) => el.outerHTML, element);
            } else {
              // Otherwise get the full page HTML
              htmlContent = await page.content();
            }
    
            // Determine if we need to apply filters
            const shouldRemoveScripts = removeScripts || cleanHtml;
            const shouldRemoveComments = removeComments || cleanHtml;
            const shouldRemoveStyles = removeStyles || cleanHtml;
            const shouldRemoveMeta = removeMeta || cleanHtml;
    
            // Apply filters in the browser context
            if (shouldRemoveScripts || shouldRemoveComments || shouldRemoveStyles || shouldRemoveMeta || minify) {
              htmlContent = await page.evaluate(
                ({ html, removeScripts, removeComments, removeStyles, removeMeta, minify }) => {
                  // Create a DOM parser to work with the HTML
                  const parser = new DOMParser();
                  const doc = parser.parseFromString(html, 'text/html');
    
                  // Remove script tags if requested
                  if (removeScripts) {
                    const scripts = doc.querySelectorAll('script');
                    scripts.forEach(script => script.remove());
                  }
    
                  // Remove style tags if requested
                  if (removeStyles) {
                    const styles = doc.querySelectorAll('style');
                    styles.forEach(style => style.remove());
                  }
    
                  // Remove meta tags if requested
                  if (removeMeta) {
                    const metaTags = doc.querySelectorAll('meta');
                    metaTags.forEach(meta => meta.remove());
                  }
    
                  // Remove HTML comments if requested
                  if (removeComments) {
                    // Named stripComments so it does not shadow the removeComments flag
                    const stripComments = (node: Node): void => {
                      const childNodes = node.childNodes;
                      // Iterate backwards so removals do not shift unvisited indices
                      for (let i = childNodes.length - 1; i >= 0; i--) {
                        const child = childNodes[i];
                        if (child.nodeType === 8) { // 8 is for comment nodes
                          node.removeChild(child);
                        } else if (child.nodeType === 1) { // 1 is for element nodes
                          stripComments(child);
                        }
                      }
                    };
                    stripComments(doc.documentElement);
                  }
    
                  // Get the processed HTML
                  let result = doc.documentElement.outerHTML;
    
                  // Minify if requested
                  if (minify) {
                    // Simple minification: remove extra whitespace
                    result = result.replace(/>\s+</g, '><').trim();
                  }
    
                  return result;
                },
                {
                  html: htmlContent,
                  removeScripts: shouldRemoveScripts,
                  removeComments: shouldRemoveComments,
                  removeStyles: shouldRemoveStyles,
                  removeMeta: shouldRemoveMeta,
                  minify
                }
              );
            }
    
            // Truncate logic
            const maxLength = typeof args.maxLength === 'number' ? args.maxLength : 20000;
            let output = htmlContent;
            if (output.length > maxLength) {
              output = output.slice(0, maxLength) + '\n<!-- Output truncated due to size limits -->';
            }
            return createSuccessResponse(`HTML content:\n${output}`);
          } catch (error) {
            return createErrorResponse(`Failed to get visible HTML content: ${(error as Error).message}`);
          }
        });
      }
    }
  • Defines the tool name, description, and input schema for 'playwright_get_visible_html', specifying parameters for HTML retrieval and cleaning options.
    {
      name: "playwright_get_visible_html",
      description: "Get the HTML content of the current page. By default, all <script> tags are removed from the output unless removeScripts is explicitly set to false.",
      inputSchema: {
        type: "object",
        properties: {
          selector: { type: "string", description: "CSS selector to limit the HTML to a specific container" },
          removeScripts: { type: "boolean", description: "Remove all script tags from the HTML (default: true)" },
          removeComments: { type: "boolean", description: "Remove all HTML comments (default: false)" },
          removeStyles: { type: "boolean", description: "Remove all style tags from the HTML (default: false)" },
          removeMeta: { type: "boolean", description: "Remove all meta tags from the HTML (default: false)" },
          cleanHtml: { type: "boolean", description: "Perform comprehensive HTML cleaning (default: false)" },
          minify: { type: "boolean", description: "Minify the HTML output (default: false)" },
          maxLength: { type: "number", description: "Maximum number of characters to return (default: 20000)" }
        },
        required: [],
      },
    },
  • In the handleToolCall switch statement, the case for 'playwright_get_visible_html' dispatches execution to the visibleHtmlTool instance.
    case "playwright_get_visible_html":
      return await visibleHtmlTool.execute(args, context);
  • Initializes the VisibleHtmlTool instance in the initializeTools function.
    if (!visibleHtmlTool) visibleHtmlTool = new VisibleHtmlTool(server);
  • Lists 'playwright_get_visible_html' in the BROWSER_TOOLS array used for conditional browser launching.
    "playwright_get_visible_html",
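
The minify step in the handler listed above is a single regular expression that drops whitespace between adjacent tags. The sketch below isolates it as a standalone function; the name `minifyHtml` is hypothetical, while the expression is the one used in the handler. It is meant only to show what the step does and does not touch.

```typescript
// Hypothetical standalone version of the handler's minify step:
// remove whitespace runs that sit between a closing ">" and an opening "<".
function minifyHtml(html: string): string {
  return html.replace(/>\s+</g, "><").trim();
}

// Indentation and newlines between tags are removed.
const pretty = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>\n";
console.log(minifyHtml(pretty)); // <ul><li>one</li><li>two</li></ul>

// Whitespace inside text content is left alone.
console.log(minifyHtml("<p>a   b</p>")); // <p>a   b</p>

// Caveat: a space between inline elements is also removed,
// which can join words that were rendered separately.
console.log(minifyHtml("<b>bold</b> <i>italic</i>")); // <b>bold</b><i>italic</i>
```

Because the expression only looks at whitespace between tags, it is cheap but not layout-preserving: callers that care about spacing between inline elements may prefer to leave minify off.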
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses a key behavioral trait: script tags are removed by default unless overridden. However, it doesn't mention other important behaviors like whether it returns only visible HTML (implied by the name), error conditions, performance implications, or output format details. The description adds some value but leaves gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
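
One of the gaps noted above is the output format. Based on the handler in the Implementation Reference, a successful response is the prefix `HTML content:` followed by the (possibly truncated) markup. The sketch below reproduces that formatting as a pure function; `formatOutput` and `TRUNCATION_MARKER` are hypothetical names, and the real handler wraps the string in `createSuccessResponse`, which is not modelled here.

```typescript
// Marker the handler appends when the HTML exceeds maxLength.
const TRUNCATION_MARKER = "\n<!-- Output truncated due to size limits -->";

// Hypothetical helper mirroring the handler's truncation and prefixing.
function formatOutput(html: string, maxLength: number = 20000): string {
  const body =
    html.length > maxLength
      ? html.slice(0, maxLength) + TRUNCATION_MARKER
      : html;
  return `HTML content:\n${body}`;
}

// Short input passes through unchanged apart from the prefix.
console.log(formatOutput("<p>hi</p>", 100));

// Long input is cut at maxLength characters, then the marker is appended.
console.log(formatOutput("abcdefghij", 4));
```

Two consequences follow from this shape: the cut is made by character count, so it can land in the middle of a tag, and the marker is added after the cut, so the returned text can be longer than maxLength.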

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences that directly state the tool's purpose and a key default behavior. Every word earns its place with no fluff or redundancy. It's front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 8 parameters, 100% schema coverage, and no output schema, the description is minimally adequate. It covers the basic purpose and one default behavior but lacks context about when to use it, what 'visible' means operationally, or how the output is structured. Given the complexity and lack of annotations, it should provide more guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 8 parameters. The description adds minimal value beyond the schema by mentioning the default behavior for removeScripts. It doesn't explain parameter interactions (e.g., how cleanHtml relates to other options) or provide usage examples. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
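
The interaction between cleanHtml and the individual flags, which the description leaves unstated, can be read off the handler in the Implementation Reference: cleanHtml is OR-ed with each of the four removal flags and has no effect on minify. The sketch below restates that logic as a pure function; `FilterFlags`, `EffectiveFilters`, and `resolveFilters` are hypothetical names used only for illustration.

```typescript
// Hypothetical subset of the tool arguments that controls cleaning.
interface FilterFlags {
  removeScripts?: boolean;
  removeComments?: boolean;
  removeStyles?: boolean;
  removeMeta?: boolean;
  cleanHtml?: boolean;
  minify?: boolean;
}

// Hypothetical result type: which filters actually run.
interface EffectiveFilters {
  scripts: boolean;
  comments: boolean;
  styles: boolean;
  meta: boolean;
  minify: boolean;
}

// Mirror the handler: removeScripts defaults to true, and cleanHtml
// switches on all four removal filters regardless of the other flags.
function resolveFilters(args: FilterFlags): EffectiveFilters {
  const clean = Boolean(args.cleanHtml);
  return {
    scripts: args.removeScripts !== false || clean,
    comments: Boolean(args.removeComments) || clean,
    styles: Boolean(args.removeStyles) || clean,
    meta: Boolean(args.removeMeta) || clean,
    minify: Boolean(args.minify), // not implied by cleanHtml
  };
}

// No arguments: only scripts are removed.
console.log(resolveFilters({}));

// cleanHtml wins even when removeScripts is explicitly false.
console.log(resolveFilters({ cleanHtml: true, removeScripts: false }));
```

Read this way, setting cleanHtml to true is equivalent to setting all four remove flags to true, and an explicit `removeScripts: false` only keeps scripts when cleanHtml is not set.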

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get the HTML content of the current page.' It specifies the verb ('Get') and resource ('HTML content'), but doesn't explicitly differentiate from sibling tools like playwright_get_visible_text, which might return text instead of HTML. The mention of script tag removal adds specificity but not sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like playwright_get_visible_text for text extraction or playwright_screenshot for visual capture, nor does it specify prerequisites (e.g., requiring a page to be loaded). Usage context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/devskido/customed-playwright'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.