url_to_markdown_tool

Convert web pages to clean markdown format by extracting content, removing unnecessary elements, and ranking information for RAG applications.

Instructions

Extract and convert web page content to markdown format.

This tool scrapes a web page, removes unnecessary elements, ranks content by importance using a custom algorithm, and returns clean markdown. Perfect for RAG applications.

Args:
    url: The web page URL to analyze and convert

Returns:
    str: Clean markdown representation of the web page content
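
For reference, an MCP client invokes this tool with a standard tools/call request. The payload below is only an illustration; the request id and target URL are placeholders.

    # Illustrative MCP tools/call payload; the id and URL are placeholders.
    tools_call_request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "url_to_markdown_tool",
            "arguments": {"url": "https://example.com"},
        },
    }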

Input Schema

Name    Required    Description    Default
url     Yes         -              -

Output Schema

Name      Required    Description    Default
result    Yes         -              -
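
The JSON Schema view is not reproduced above. Assuming FastMCP derives the schemas from the url_to_markdown_tool(url: str) -> str signature, they would look roughly like the sketch below; field titles and ordering may differ from the generated output.

    # Rough sketch of the derived schemas; not copied from the server's actual output.
    input_schema = {
        "type": "object",
        "properties": {"url": {"type": "string", "title": "Url"}},
        "required": ["url"],
    }

    output_schema = {
        "type": "object",
        "properties": {"result": {"type": "string", "title": "Result"}},
        "required": ["result"],
    }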

Implementation Reference

  • The handler function for the 'url_to_markdown_tool' tool. It is registered using the @mcp.tool() decorator and delegates the core logic to the url_to_markdown helper function from web_extractor.py. The function signature and docstring define the tool's schema.
    @mcp.tool()
    def url_to_markdown_tool(url: str) -> str:
        """
        Extract and convert web page content to markdown format.
        
        This tool scrapes a web page, removes unnecessary elements, 
        ranks content by importance using a custom algorithm, and 
        returns clean markdown. Perfect for RAG applications.
        
        Args:
            url: The web page URL to analyze and convert
            
        Returns:
            str: Clean markdown representation of the web page content
        """
        return url_to_markdown(url)
  • The core helper function implementing the web-page-to-markdown conversion. It handles URL validation, HTML extraction via Selenium, parsing of special elements (tables, images, etc.), HTML cleaning, ranking of content by importance, and conversion to markdown. Illustrative sketches of two of the referenced helpers follow the quoted code below.
    def url_to_markdown(url: str) -> str:
        """
        Convert a URL to markdown format using advanced content extraction.
        
        This is the main function that replaces the original build_output function.
        It extracts HTML, analyzes content importance, and converts to markdown.
        
        Args:
            url: The URL to analyze and convert
            
        Returns:
            str: Markdown formatted content
        """
        try:
            # Ensure valid URL
            clean_url = ensure_url_scheme(url)
            
            # Extract HTML content
            html_content = extract_html_content(clean_url)
            
            # Parse HTML
            soup = BeautifulSoup(html_content, 'html.parser')
            
            # Extract special elements before cleaning
            special_elements = parse_special_elements(soup)
            
            # Clean HTML content
            cleaned_soup = clean_html_content(soup)
            
            # Rank content by importance
            main_content = rank_content_by_importance(cleaned_soup)
            
            # Convert to markdown
            markdown_result = convert_to_markdown(special_elements, main_content)
            
            return markdown_result
            
        except Exception as e:
            return f"Error processing URL {url}: {str(e)}"
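
The helpers referenced above (ensure_url_scheme, extract_html_content, clean_html_content, and so on) live in web_extractor.py and are not reproduced on this page. Purely to illustrate the kind of work two of those steps perform, a minimal sketch might look like this; it is not the project's actual code:

    # Illustrative sketches only; the real implementations in web_extractor.py differ.
    from urllib.parse import urlparse

    from bs4 import BeautifulSoup


    def ensure_url_scheme(url: str) -> str:
        """Prepend https:// when the URL is given without a scheme."""
        return url if urlparse(url).scheme else f"https://{url}"


    def clean_html_content(soup: BeautifulSoup) -> BeautifulSoup:
        """Drop elements that rarely carry main content (scripts, styles, page chrome)."""
        for tag in soup(["script", "style", "noscript", "nav", "header", "footer", "aside"]):
            tag.decompose()
        return soup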

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by disclosing key behavioral traits: it scrapes web pages, removes unnecessary elements, ranks content by importance using a custom algorithm, and returns clean markdown. This covers the transformation process and output characteristics beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with a clear purpose statement, elaboration of the process, usage context, and separate Args/Returns sections. Every sentence adds value without redundancy, and the information is appropriately front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (web scraping with algorithmic ranking), no annotations, and the presence of an output schema (which handles return value documentation), the description provides complete context. It explains the transformation process, use case, parameter meaning, and output format adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage and only one parameter, the description compensates well by explaining the 'url' parameter as 'The web page URL to analyze and convert,' adding meaningful context about its purpose. However, it doesn't specify URL format requirements or constraints, preventing a perfect score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Extract and convert web page content to markdown format') and distinguishes it from the sibling tool 'web_content_qna' by focusing on conversion rather than Q&A. It provides a complete verb+resource+output specification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'Perfect for RAG applications,' providing clear context for when to use this tool. However, it doesn't specify when NOT to use it or explicitly contrast with the sibling 'web_content_qna' tool, which would be needed for a score of 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kimdonghwi94/web-analyzer-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.