Glama

url_to_markdown_tool

Convert web page content into clean markdown by scraping the page, removing unnecessary elements, and ranking the remaining content by importance. Ideal for retrieval-augmented generation (RAG) applications and structured data extraction.

Instructions

Extract and convert web page content to markdown format.

This tool scrapes a web page, removes unnecessary elements, ranks content by importance using a custom algorithm, and returns clean markdown. Perfect for RAG applications.

Args:
    url: The web page URL to analyze and convert

Returns:
    str: Clean markdown representation of the web page content
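The pipeline described above (scrape, strip unnecessary elements, rank by importance, emit markdown) can be illustrated with a deliberately small, standard-library-only sketch. It is not this tool's implementation: the real server extracts HTML with Selenium and uses BeautifulSoup plus its own ranking algorithm, none of which is reproduced here. The noise-tag list, the scoring rule (word count discounted by link density), and the names `BlockCollector` and `html_to_ranked_markdown` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical noise list: text inside these tags is dropped entirely.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Tags that start a new candidate text block.
BLOCK_TAGS = {"p", "div", "section", "article", "li", "h1", "h2", "h3"}
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks, skipping noise tags and tracking link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # dicts with keys: tag, text, link_chars
        self._noise_depth = 0   # > 0 while inside a noise tag
        self._in_link = False
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1
        elif tag in BLOCK_TAGS:
            self.end_block()
            self._current = {"tag": tag, "text": "", "link_chars": 0}
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self._noise_depth = max(0, self._noise_depth - 1)
        elif tag in BLOCK_TAGS:
            self.end_block()
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._noise_depth:
            return
        if self._current is None:
            # Loose text outside any block tag becomes an anonymous paragraph.
            self._current = {"tag": "p", "text": "", "link_chars": 0}
        self._current["text"] += data
        if self._in_link:
            self._current["link_chars"] += len(" ".join(data.split()))

    def end_block(self):
        """Close the current block, keeping it only if it has visible text."""
        if self._current is not None:
            text = " ".join(self._current["text"].split())
            if text:
                self._current["text"] = text
                self.blocks.append(self._current)
        self._current = None


def html_to_ranked_markdown(html, min_score=5.0):
    """Keep headings plus blocks whose score reaches min_score, in page order."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.end_block()  # close a block left open at the end of the input
    lines = []
    for block in collector.blocks:
        text, tag = block["text"], block["tag"]
        link_density = min(1.0, block["link_chars"] / len(text))
        # Score: word count, discounted by the share of characters inside links.
        score = len(text.split()) * (1.0 - link_density)
        if tag in HEADING_LEVELS:
            lines.append("#" * HEADING_LEVELS[tag] + " " + text)
        elif score >= min_score:
            lines.append(("- " if tag == "li" else "") + text)
    return "\n\n".join(lines)
```

This sketch filters blocks but keeps them in page order; a real ranking step might also reorder content or use richer signals (element position, class names, text density). Link-heavy blocks such as menus and "read more" links score near zero and drop out, which is the usual motivation for a link-density penalty.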

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| url  | Yes      |             |         |
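For reference, here is a sketch of how a client might check its arguments and build the MCP `tools/call` request for this tool. The schema below is an assumption: it is equivalent in substance to what a server would derive from the signature `url_to_markdown_tool(url: str)`, but the exact generated JSON Schema (titles and similar metadata) is not shown on this page. `check_arguments` is a hypothetical helper that checks only required keys and string types; it is not a full JSON Schema validator.

```python
import json

# Assumed input schema: one required string property, `url`.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {"url": {"type": "string"}},
    "required": ["url"],
}


def check_arguments(arguments, schema=INPUT_SCHEMA):
    """Minimal check of required keys and string types (not full JSON Schema)."""
    for key in schema["required"]:
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")
    for key, value in arguments.items():
        expected = schema["properties"].get(key, {}).get("type")
        if expected == "string" and not isinstance(value, str):
            raise TypeError(f"argument {key!r} must be a string")


def build_tools_call(url, request_id=1):
    """Build the JSON-RPC 2.0 `tools/call` request for url_to_markdown_tool."""
    arguments = {"url": url}
    check_arguments(arguments)
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "url_to_markdown_tool", "arguments": arguments},
    }
    return json.dumps(request)
```

For example, `build_tools_call("https://example.com")` produces a request whose `params` are `{"name": "url_to_markdown_tool", "arguments": {"url": "https://example.com"}}`. In practice an MCP client library builds this envelope for you; the sketch only shows its shape.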

Implementation Reference

  • The MCP tool handler and registration for `url_to_markdown_tool`. The `@mcp.tool()` decorator registers the function; its type hints and docstring describe the tool's input, output, and purpose, and the body delegates to the core helper `url_to_markdown`.
    # `mcp` is the server object (e.g. a FastMCP instance) created elsewhere in the module.
    @mcp.tool()
    def url_to_markdown_tool(url: str) -> str:
        """
        Extract and convert web page content to markdown format.

        This tool scrapes a web page, removes unnecessary elements, ranks content
        by importance using a custom algorithm, and returns clean markdown.
        Perfect for RAG applications.

        Args:
            url: The web page URL to analyze and convert

        Returns:
            str: Clean markdown representation of the web page content
        """
        # Delegate all work to the core helper.
        return url_to_markdown(url)
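  • Core implementation of the URL-to-markdown conversion. It normalizes the URL scheme, extracts the HTML with Selenium, parses it with BeautifulSoup, extracts special elements before cleaning, cleans the HTML, ranks the content by importance, and converts the result to markdown. Any exception is caught and returned as an error string rather than raised.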
    from bs4 import BeautifulSoup

    # ensure_url_scheme, extract_html_content, parse_special_elements,
    # clean_html_content, rank_content_by_importance and convert_to_markdown
    # are helpers defined elsewhere in the server and are not shown here.

    def url_to_markdown(url: str) -> str:
        """
        Convert a URL to markdown format using advanced content extraction.

        This is the main function that replaces the original build_output function.
        It extracts HTML, analyzes content importance, and converts to markdown.

        Args:
            url: The URL to analyze and convert

        Returns:
            str: Markdown formatted content
        """
        try:
            # Ensure the URL has a valid scheme
            clean_url = ensure_url_scheme(url)
            # Extract HTML content
            html_content = extract_html_content(clean_url)
            # Parse HTML
            soup = BeautifulSoup(html_content, 'html.parser')
            # Extract special elements before cleaning removes them
            special_elements = parse_special_elements(soup)
            # Clean HTML content
            cleaned_soup = clean_html_content(soup)
            # Rank content by importance
            main_content = rank_content_by_importance(cleaned_soup)
            # Convert to markdown
            return convert_to_markdown(special_elements, main_content)
        except Exception as e:
            # Failures are reported as a string, not raised to the caller
            return f"Error processing URL {url}: {e}"
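Two details of the core function matter to callers: the URL is normalized before fetching, and failures come back as an ordinary string beginning with `Error processing URL` instead of an exception. The sketch below illustrates both. Its `ensure_url_scheme` is a hypothetical stand-in (the server's own version is not shown), assuming it only prepends `https://` when no `http` or `https` scheme is present; `is_error_result` is likewise a hypothetical client-side helper.

```python
from urllib.parse import urlparse

# Prefix produced by the except branch of url_to_markdown.
ERROR_PREFIX = "Error processing URL "


def ensure_url_scheme(url, default_scheme="https"):
    """Hypothetical stand-in: prepend a scheme unless http/https is already present."""
    url = url.strip()
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"


def is_error_result(result):
    """True if url_to_markdown returned its error string instead of markdown."""
    return result.startswith(ERROR_PREFIX)
```

Because errors travel in-band, a page whose markdown happens to begin with the same phrase would be misclassified; that trade-off comes from returning errors as strings rather than raising them.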


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kimdonghwi94/web-analyzer-mcp'
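The same request can be made from Python's standard library. This is a sketch: `server_request` and `fetch_server` are hypothetical helper names, the path pattern `servers/<owner>/<name>` is inferred from the single curl example above, and the response is returned as raw text (decoded as UTF-8, an assumption) because its structure is not documented here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_request(owner, name):
    """Build a GET request for one server's entry in the MCP directory API."""
    url = f"{API_BASE}/{quote(owner)}/{quote(name)}"
    return Request(url, method="GET")


def fetch_server(owner, name, timeout=10.0):
    """Send the request and return the raw response body as text."""
    with urlopen(server_request(owner, name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_server("kimdonghwi94", "web-analyzer-mcp")` performs the same GET as the curl command.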

If you have feedback or need assistance with the MCP directory API, please join our Discord server.