get_url_content_direct

Fetch webpage content and metadata directly from any URL using a plain HTTP request, with no browser interaction required. The tool strips scripts, navigation, and other boilerplate, extracts the main content area, and returns roughly the first 1,000 characters of cleaned text together with basic metadata.

Instructions

Get webpage content directly using HTTP request

Args:
    url (str): The URL to fetch content from

Returns:
    str: The webpage content and metadata

Input Schema

Name  Required  Description                     Default
url   Yes       The URL to fetch content from   —
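This schema follows directly from the handler's signature: FastMCP-style servers derive it from the function's type hints and docstring. As a rough, stdlib-only illustration of how such a derivation could work (the helper `derive_input_schema` is hypothetical, not the library's actual code):

```python
import inspect
from typing import get_type_hints

def get_url_content_direct(url: str) -> str:
    """Get webpage content directly using HTTP request"""
    return url  # body is irrelevant to schema derivation

def derive_input_schema(fn):
    # Map Python annotations to JSON Schema types for each parameter;
    # parameters without defaults are marked as required.
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    hints = get_type_hints(fn)
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": type_map.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": props, "required": required}

schema = derive_input_schema(get_url_content_direct)
# schema == {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}
```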

Implementation Reference

  • The MCP tool handler for get_url_content_direct, registered with @mcp.tool(). It takes a URL input and delegates to the _get_url_content_direct helper for execution.
    @mcp.tool()
    def get_url_content_direct(url: str) -> str:
        """Get webpage content directly using HTTP request

        Args:
            url (str): The URL to fetch content from

        Returns:
            str: The webpage content and metadata
        """
        return _get_url_content_direct(url)
  • Core helper function containing the full implementation logic: HTTP GET request, HTML parsing with BeautifulSoup, content extraction and cleaning, metadata addition, and error handling.
    def _get_url_content_direct(url: str) -> str:
        """Internal function to get content directly using requests"""
        try:
            logger.debug(f"Directly fetching content from URL: {url}")
            response = requests.get(url, timeout=10, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            })
            response.raise_for_status()

            # Try to detect the encoding
            if 'charset' in response.headers.get('content-type', '').lower():
                response.encoding = response.apparent_encoding

            try:
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(response.text, 'html.parser')

                # Remove unwanted elements
                # (note: '.advertisement' is matched as a tag name here, not a CSS class)
                for element in soup(['script', 'style', 'header', 'footer', 'nav',
                                     'aside', 'iframe', 'ad', '.advertisement']):
                    element.decompose()

                # Try to find the main content area
                main_content = None
                possible_content_elements = [
                    soup.find('article'),
                    soup.find('main'),
                    soup.find(class_='content'),
                    soup.find(id='content'),
                    soup.find(class_='post-content'),
                    soup.find(class_='article-content'),
                    soup.find(class_='entry-content'),
                    soup.find(class_='main-content'),
                    soup.select_one('div[class*="content"]'),  # any class containing "content"
                ]
                for element in possible_content_elements:
                    if element:
                        main_content = element
                        break
                if not main_content:
                    main_content = soup

                # Keep only substantial lines, then join them into one string
                text = main_content.get_text(separator='\n')
                lines = []
                for line in text.split('\n'):
                    line = line.strip()
                    if line and len(line) > 30:
                        lines.append(line)
                cleaned_text = ' '.join(lines)

                # Truncate to ~1000 characters, preferring a sentence boundary
                if len(cleaned_text) > 1000:
                    end_pos = cleaned_text.rfind('. ', 0, 1000)
                    if end_pos > 0:
                        cleaned_text = cleaned_text[:end_pos + 1]
                    else:
                        cleaned_text = cleaned_text[:1000]

                metadata = f"URL: {url}\n"
                metadata += f"Content Length: {len(response.text)} characters\n"
                metadata += f"Content Type: {response.headers.get('content-type', 'Unknown')}\n"
                metadata += "---\n\n"
                return f"{metadata}{cleaned_text}"
            except Exception as e:
                logger.error(f"Error extracting text from HTML: {str(e)}")
                return f"Error extracting text: {str(e)}"
        except Exception as e:
            logger.error(f"Error fetching URL content directly: {str(e)}")
            return f"Error getting content: {str(e)}"
  • mcp2brave.py:428-428 (registration)
    The @mcp.tool() decorator registers the get_url_content_direct function as an MCP tool.
    @mcp.tool()
  • Type hints (url: str -> str) and docstring defining input parameters and output format for the tool.
    def get_url_content_direct(url: str) -> str:
        """Get webpage content directly using HTTP request

        Args:
            url (str): The URL to fetch content from

        Returns:
            str: The webpage content and metadata
        """
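The line-filtering and truncation steps of the helper above can be isolated as a small dependency-free sketch (the function names `clean_lines` and `truncate_at_sentence` are illustrative, not from the source):

```python
def clean_lines(text: str, min_len: int = 30) -> str:
    # Mirror the helper's filtering: keep only stripped lines longer than
    # min_len characters, then join them with single spaces.
    lines = (line.strip() for line in text.split('\n'))
    return ' '.join(line for line in lines if line and len(line) > min_len)

def truncate_at_sentence(text: str, limit: int = 1000) -> str:
    # Cut at the last '. ' boundary before the limit; fall back to a
    # hard cut when no sentence boundary exists in the window.
    if len(text) <= limit:
        return text
    end_pos = text.rfind('. ', 0, limit)
    if end_pos > 0:
        return text[:end_pos + 1]
    return text[:limit]
```

Isolating these steps makes the truncation behavior easy to verify in a unit test without any network access or HTML parsing.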
