Glama

url_content

Extract webpage content and metadata from any URL using the MCP2Brave server. Input a URL to retrieve text and structured data for analysis or processing.

Instructions

Directly fetch webpage content

Parameters:
    url (str): Target webpage URL

Returns:
    str: Webpage content and metadata

Input Schema

JSON Schema

Name    Required    Description    Default
url     Yes
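Since the schema itself is not rendered on this page, here is a hypothetical reconstruction of what the input schema summarized above likely looks like: a single required string parameter with no description or default.

```python
# Hypothetical reconstruction of the input schema summarized in the table
# above: one required string parameter, "url".
input_schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
    },
    "required": ["url"],
}
```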

Implementation Reference

  • The handler function for the MCP tool 'url_content'. It accepts a URL string parameter and returns the extracted webpage content by calling the internal helper function.
    def url_content(url: str) -> str:
        """Directly fetch webpage content

        Parameters:
            url (str): Target webpage URL

        Returns:
            str: Webpage content and metadata
        """
        return _get_url_content_direct(url)
  • mcp2brave.py:440 (registration)
    The @mcp.tool() decorator registers the subsequent function as the 'url_content' MCP tool.
    @mcp.tool()
  • The core helper function that performs the actual URL content fetching using requests, HTML parsing with BeautifulSoup, text extraction, cleaning, and metadata addition.
    def _get_url_content_direct(url: str) -> str:
        """Internal function to get content directly using requests"""
        try:
            logger.debug(f"Directly fetching content from URL: {url}")
            response = requests.get(url, timeout=10, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            })
            response.raise_for_status()
            
            # Fall back to the detected encoding when the server's Content-Type
            # header does not declare a charset
            if 'charset' not in response.headers.get('content-type', '').lower():
                response.encoding = response.apparent_encoding
                
            try:
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(response.text, 'html.parser')
                
                # Remove unwanted elements; note that soup([...]) matches tag
                # names only, so class-based selectors must go through select()
                for element in soup(['script', 'style', 'header', 'footer', 'nav', 'aside', 'iframe']):
                    element.decompose()
                for element in soup.select('.ad, .advertisement'):
                    element.decompose()
                
                # Try to find the main content area
                main_content = None
                possible_content_elements = [
                    soup.find('article'),
                    soup.find('main'),
                    soup.find(class_='content'),
                    soup.find(id='content'),
                    soup.find(class_='post-content'),
                    soup.find(class_='article-content'),
                    soup.find(class_='entry-content'),
                    soup.find(class_='main-content'),
                    soup.select_one('div[class*="content"]'),  # any div whose class contains "content"
                ]
                
                for element in possible_content_elements:
                    if element:
                        main_content = element
                        break
                
                if not main_content:
                    main_content = soup
                
                text = main_content.get_text(separator='\n')
                
                # Keep only substantial lines (> 30 characters) to drop menus
                # and navigation fragments
                lines = []
                for line in text.split('\n'):
                    line = line.strip()
                    if line and len(line) > 30:
                        lines.append(line)

                cleaned_text = ' '.join(lines)
                # Truncate to about 1000 characters, preferring a sentence boundary
                if len(cleaned_text) > 1000:
                    end_pos = cleaned_text.rfind('. ', 0, 1000)
                    if end_pos > 0:
                        cleaned_text = cleaned_text[:end_pos + 1]
                    else:
                        cleaned_text = cleaned_text[:1000]
                
                metadata = f"URL: {url}\n"
                metadata += f"Content Length: {len(response.text)} characters\n"
                metadata += f"Content Type: {response.headers.get('content-type', 'Unknown')}\n"
                metadata += "---\n\n"
                
                return f"{metadata}{cleaned_text}"
                
            except Exception as e:
                logger.error(f"Error extracting text from HTML: {str(e)}")
                return f"Error extracting text: {str(e)}"
            
        except Exception as e:
            logger.error(f"Error fetching URL content directly: {str(e)}")
            return f"Error getting content: {str(e)}"
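The cleaning and truncation logic at the end of `_get_url_content_direct` can be isolated into a small, testable helper. The sketch below uses the same thresholds (a 30-character minimum line length and a 1000-character cap); the function and parameter names are illustrative, not taken from the source:

```python
def clean_and_truncate(text: str, min_line_len: int = 30, max_len: int = 1000) -> str:
    """Keep lines longer than min_line_len, join them with spaces, and
    truncate at the last sentence boundary before max_len."""
    lines = [line.strip() for line in text.split('\n')
             if len(line.strip()) > min_line_len]
    cleaned = ' '.join(lines)
    if len(cleaned) > max_len:
        end_pos = cleaned.rfind('. ', 0, max_len)
        cleaned = cleaned[:end_pos + 1] if end_pos > 0 else cleaned[:max_len]
    return cleaned
```

Note that joining with spaces discards paragraph boundaries, and the 30-character filter can silently drop short but meaningful lines such as headings.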
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool '直接获取网页内容' (directly gets webpage content) and mentions '网页内容和元数据' (webpage content and metadata) in the return, but it lacks details on permissions, rate limits, error handling, or whether it's a read-only operation. For a tool with no annotations, this is insufficient to fully inform the agent about its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with a clear structure: a purpose statement followed by parameter and return sections. However, it's somewhat under-specified—the purpose is brief, and the parameter and return descriptions are minimal. While efficient, it could benefit from more detail to be fully helpful without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a web content fetcher with no annotations and no output schema), the description is incomplete. It mentions the return includes '网页内容和元数据' (webpage content and metadata), but without an output schema, it doesn't detail the structure or types of metadata. For a tool that likely involves network operations and data parsing, more context on behavior, errors, or output format is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful semantics beyond the input schema. The schema has 0% description coverage, listing only 'url' as a required string. The description specifies that 'url' is the '目标网页地址' (target webpage address), clarifying its purpose. Since there is only one parameter and the description compensates for the low schema coverage, it earns a high score, though not a 5, as it could provide more context such as URL format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '直接获取网页内容' (directly get webpage content). It specifies the verb '获取' (get) and resource '网页内容' (webpage content), making the action clear. However, it doesn't explicitly differentiate from sibling tools like 'get_url_content_direct', which appears to have a similar function, so it doesn't reach a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There are sibling tools like 'get_url_content_direct' and 'brave_search_summary' that might serve similar or overlapping purposes, but the description doesn't mention any context, exclusions, or comparisons. This leaves the agent without clear usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
