url_to_markdown_tool

Convert web pages to clean markdown format by extracting content, removing unnecessary elements, and ranking information for RAG applications.

Instructions

Extract and convert web page content to markdown format.

This tool scrapes a web page, removes unnecessary elements, ranks content by importance using a custom algorithm, and returns clean markdown. Perfect for RAG applications.

Args:
    url: The web page URL to analyze and convert

Returns:
    str: Clean markdown representation of the web page content
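
When invoked through an MCP client, a call to this tool corresponds to a standard JSON-RPC `tools/call` request. The sketch below shows the shape of that payload as a Python dict; the target URL is a placeholder, not an example taken from this page.

    # Hedged sketch of the MCP "tools/call" request a client would send
    # to invoke this tool; the URL below is a placeholder.
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "url_to_markdown_tool",
            "arguments": {"url": "https://example.com"},
        },
    }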

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| url  | Yes      |             |         |
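
The schema implied by the `url: str` parameter is a single required string. A minimal sketch of the corresponding JSON Schema, written as a Python dict; the schema actually emitted by the server may carry additional metadata such as titles or descriptions.

    # Minimal sketch of the tool's input schema (assumed, not copied from the server).
    input_schema = {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    }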

Implementation Reference

  • The handler function for the 'url_to_markdown_tool' tool. It is registered using the @mcp.tool() decorator and delegates the core logic to the url_to_markdown helper function from web_extractor.py. The function signature and docstring define the tool's schema (a sketch of the surrounding server wiring follows after this list).
    @mcp.tool()
    def url_to_markdown_tool(url: str) -> str:
        """
        Extract and convert web page content to markdown format.

        This tool scrapes a web page, removes unnecessary elements, ranks content
        by importance using a custom algorithm, and returns clean markdown.
        Perfect for RAG applications.

        Args:
            url: The web page URL to analyze and convert

        Returns:
            str: Clean markdown representation of the web page content
        """
        return url_to_markdown(url)
  • The core helper function implementing the web page to markdown conversion. It handles URL validation, HTML extraction using Selenium, parsing special elements (tables, images, etc.), cleaning, content ranking by importance, and markdown conversion.
    def url_to_markdown(url: str) -> str:
        """
        Convert a URL to markdown format using advanced content extraction.

        This is the main function that replaces the original build_output function.
        It extracts HTML, analyzes content importance, and converts to markdown.

        Args:
            url: The URL to analyze and convert

        Returns:
            str: Markdown formatted content
        """
        try:
            # Ensure valid URL
            clean_url = ensure_url_scheme(url)

            # Extract HTML content
            html_content = extract_html_content(clean_url)

            # Parse HTML
            soup = BeautifulSoup(html_content, 'html.parser')

            # Extract special elements before cleaning
            special_elements = parse_special_elements(soup)

            # Clean HTML content
            cleaned_soup = clean_html_content(soup)

            # Rank content by importance
            main_content = rank_content_by_importance(cleaned_soup)

            # Convert to markdown
            markdown_result = convert_to_markdown(special_elements, main_content)

            return markdown_result

        except Exception as e:
            return f"Error processing URL {url}: {str(e)}"
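
The helpers referenced above (ensure_url_scheme, extract_html_content, parse_special_elements, clean_html_content, rank_content_by_importance, convert_to_markdown) live in web_extractor.py and are not shown on this page. Below is a hedged sketch of what the first two might look like, assuming headless Chrome via Selenium as described above; the actual implementations may differ.

    # Sketch only: the real helpers in web_extractor.py may differ.
    from urllib.parse import urlparse

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def ensure_url_scheme(url: str) -> str:
        """Prepend https:// when the URL is missing a scheme."""
        return url if urlparse(url).scheme else f"https://{url}"

    def extract_html_content(url: str) -> str:
        """Fetch fully rendered HTML using a headless Chrome session."""
        options = Options()
        options.add_argument("--headless=new")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            return driver.page_source
        finally:
            driver.quit()

As noted in the first item above, the handler is registered with the @mcp.tool() decorator. A minimal sketch of how such a handler is typically wired into a FastMCP server follows; the server name and entry point here are assumptions, not taken from the repository.

    # Minimal FastMCP wiring sketch; the server name is an assumption.
    from mcp.server.fastmcp import FastMCP
    from web_extractor import url_to_markdown

    mcp = FastMCP("web-analyzer")

    @mcp.tool()
    def url_to_markdown_tool(url: str) -> str:
        """Extract and convert web page content to markdown format."""
        return url_to_markdown(url)

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default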

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kimdonghwi94/web-analyzer-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.