URL Text Fetcher MCP Server

by billallison

fetch_page_links

Extract all links from a web page by providing its URL. This tool helps identify and collect hyperlinks for web scraping, content analysis, or navigation purposes.

Instructions

Return a list of all links on the page.

Args:
    url: The URL to fetch links from

Input Schema

Name    Required    Description                    Default
url     Yes         The URL to fetch links from    —
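
A minimal example argument object for this schema (the URL value is illustrative):

    {"url": "https://example.com"}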

Output Schema

Name      Required    Description    Default
result    Yes         —              —
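
Based on the return format in the handler implementations below, a successful result is a single formatted string; a hypothetical example:

    Links found on https://example.com (3 total, showing first 3):

    - https://example.com/about
    - /contact
    - https://www.iana.org/domains/reserved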

Implementation Reference

  • Primary implementation of the fetch_page_links tool handler. It downloads the page HTML, parses it with BeautifulSoup, extracts the href attribute from every anchor tag, keeps only absolute http(s) URLs and root-relative paths (those beginning with "/"), and returns a formatted list capped at the first 100 links.
    @mcp.tool()
    async def fetch_page_links(url: str) -> str:
        """Return a list of all links on the page.
        
        Args:
            url: The URL to fetch links from
        """
        # Sanitize URL input
        url = sanitize_url(url)
        if not url:
            return "Error: Invalid URL format"
        
        # Validate URL safety
        if not is_safe_url(url):
            logger.warning(f"Blocked unsafe URL for link fetching: {url}")
            return "Error: URL not allowed for security reasons"
            
        try:
            logger.info(f"Fetching page links: {url}")
            resp = requests.get(url, headers=HEADERS, timeout=REQUEST_TIMEOUT, stream=True)
            resp.raise_for_status()
            
            # Check content length
            content_length = resp.headers.get('Content-Length')
            if content_length and int(content_length) > MAX_RESPONSE_SIZE:
                return f"Error: Page too large ({content_length} bytes)"
            
            # Read content with size limit
            content_chunks = []
            total_size = 0
            
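            # Note: with decode_unicode=True the chunks are str, so len(chunk)
            # counts characters rather than bytes; the MAX_RESPONSE_SIZE cap
            # below is therefore approximate.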
            for chunk in resp.iter_content(chunk_size=8192, decode_unicode=True):
                if chunk:
                    total_size += len(chunk)
                    if total_size > MAX_RESPONSE_SIZE:
                        return "Error: Page content too large"
                    content_chunks.append(chunk)
            
            html_content = ''.join(content_chunks)
            soup = BeautifulSoup(html_content, "html.parser")
            links = [a.get('href') for a in soup.find_all('a', href=True) if a.get('href')]
            
            # Filter and clean links
            valid_links = []
            for link in links:
                if link.startswith(('http://', 'https://', '/')):
                    valid_links.append(link)
            
            links_text = "\n".join(f"- {link}" for link in valid_links[:100])  # Limit to 100 links
            shown = min(len(valid_links), 100)

            return f"Links found on {url} ({len(valid_links)} total, showing first {shown}):\n\n{links_text}"
            
        except requests.RequestException as e:
            logger.error(f"Request failed for {url}: {e}")
            return "Error: Unable to fetch page"
        except Exception as e:
            logger.error(f"Unexpected error fetching links from {url}: {e}", exc_info=True)
            return "Error: Unable to process page"
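
    The handler above (and the FastMCP variant below) relies on sanitize_url and is_safe_url helpers whose implementations are not shown on this page. A minimal sketch of what they might look like, assuming the "SSRF protection against internal network access" listed under get_server_info means rejecting hosts that resolve to non-public addresses (the function names match the calls above, but the bodies are illustrative rather than the project's actual code):

    import ipaddress
    import socket
    from urllib.parse import urlparse

    def sanitize_url(url: str) -> str:
        """Trim whitespace and require an http(s) URL; return "" if invalid."""
        url = url.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return ""
        return url

    def is_safe_url(url: str) -> bool:
        """Basic SSRF guard: reject hosts that resolve to non-public addresses."""
        host = urlparse(url).hostname
        if not host:
            return False
        try:
            # Check every address the hostname resolves to (IPv4 and IPv6)
            for *_, sockaddr in socket.getaddrinfo(host, None):
                ip = ipaddress.ip_address(sockaddr[0])
                if not ip.is_global:  # private, loopback, link-local, reserved, ...
                    return False
        except (socket.gaierror, ValueError):
            return False
        return True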
  • Alternative, synchronous implementation of the fetch_page_links tool handler for FastMCP, using a Pydantic Field to supply the parameter description in the input schema. The handler logic is otherwise identical to the primary implementation.
    @mcp.tool()
    def fetch_page_links(url: str = Field(description="The URL to fetch links from")) -> str:
        """Return a list of all links on the page"""
        # Sanitize URL input
        url = sanitize_url(url)
        if not url:
            return "Error: Invalid URL format"
        
        # Validate URL safety
        if not is_safe_url(url):
            logger.warning(f"Blocked unsafe URL for link fetching: {url}")
            return "Error: URL not allowed for security reasons"
            
        try:
            logger.info(f"Fetching page links: {url}")
            resp = requests.get(url, headers=HEADERS, timeout=REQUEST_TIMEOUT, stream=True)
            resp.raise_for_status()
            
            # Check content length
            content_length = resp.headers.get('Content-Length')
            if content_length and int(content_length) > MAX_RESPONSE_SIZE:
                return f"Error: Page too large ({content_length} bytes)"
            
            # Read content with size limit
            content_chunks = []
            total_size = 0
            
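            # Note: with decode_unicode=True the chunks are str, so len(chunk)
            # counts characters rather than bytes; the MAX_RESPONSE_SIZE cap
            # below is therefore approximate.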
            for chunk in resp.iter_content(chunk_size=8192, decode_unicode=True):
                if chunk:
                    total_size += len(chunk)
                    if total_size > MAX_RESPONSE_SIZE:
                        return "Error: Page content too large"
                    content_chunks.append(chunk)
            
            html_content = ''.join(content_chunks)
            soup = BeautifulSoup(html_content, "html.parser")
            links = [a.get('href') for a in soup.find_all('a', href=True) if a.get('href')]
            
            # Filter and clean links
            valid_links = []
            for link in links:
                if link.startswith(('http://', 'https://', '/')):
                    valid_links.append(link)
    
            links_text = "\n".join(f"- {link}" for link in valid_links[:100])  # Limit to 100 links
            shown = min(len(valid_links), 100)

            return f"Links found on {url} ({len(valid_links)} total, showing first {shown}):\n\n{links_text}"
            
        except requests.RequestException as e:
            logger.error(f"Request failed for {url}: {e}")
            return "Error: Unable to fetch page"
        except Exception as e:
            logger.error(f"Unexpected error fetching links from {url}: {e}", exc_info=True)
            return "Error: Unable to process page"
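
    Note that root-relative links (paths beginning with "/") are returned as-is by both handlers. A caller that needs absolute URLs can resolve them against the page URL with the standard library; a small illustration with made-up values:

    from urllib.parse import urljoin

    page_url = "https://example.com/docs/"
    links = ["/about", "https://example.com/contact"]
    absolute = [urljoin(page_url, link) for link in links]
    # -> ['https://example.com/about', 'https://example.com/contact']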
  • The tool is also listed in the output of the get_server_info tool, which serves as informal documentation of the server's registered tools.
    @mcp.tool()
    async def get_server_info() -> str:
        """Get information about this MCP server including version, implementation, and capabilities.
        
        Returns:
            Server information including version, implementation type, and available features
        """
        info = [
            f"URL Text Fetcher MCP Server",
            f"Version: {__version__}",
            f"Implementation: {__implementation__}",
            f"Brave Search Rate Limit: {BRAVE_RATE_LIMIT_RPS} requests/second",
            f"Request Timeout: {REQUEST_TIMEOUT} seconds",
            f"Content Limit: {CONTENT_LENGTH_LIMIT:,} characters",
            f"Max Response Size: {MAX_RESPONSE_SIZE:,} bytes",
            "",
            "Available Tools:",
            "• fetch_url_text - Download visible text from any URL",
            "• fetch_page_links - Extract all links from a webpage", 
            "• brave_search_and_fetch - Search web and fetch content from top results",
            "• test_brave_search - Test Brave Search API connectivity",
            "• get_server_info - Display this server information",
            "",
            "Security Features:",
            "• SSRF protection against internal network access",
            "• Input sanitization for URLs and search queries",
            "• Content size limiting and memory protection",
            "• Thread-safe rate limiting for API requests",
            "",
            f"Brave API Key: {'✓ Configured' if BRAVE_API_KEY else '✗ Missing'}"
        ]
        
        return "\n".join(info)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states that the tool returns a list of links but doesn't cover critical aspects such as the network request it performs (with the rate limits that implies), error handling (e.g., for invalid URLs), or the output format. This leaves significant gaps in understanding the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with a clear purpose statement followed by parameter details in a labeled 'Args' section. It avoids unnecessary words, though the formatting could be slightly more polished (e.g., bullet points). Every sentence adds value, making it efficient for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (fetching a page and extracting its links), the lack of annotations, and the presence of an output schema (which documents the return value), the description is minimally adequate. It covers the basic purpose and parameter but says nothing about behavioral traits such as network dependencies or error scenarios. The output schema reduces the need to explain the return value, but more context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful context for the single parameter 'url' by specifying 'The URL to fetch links from,' which clarifies its role beyond the schema's basic title 'Url.' Since schema description coverage is 0%, this compensates well, though it doesn't detail URL format requirements or validation rules. With only one parameter, the baseline is high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Return a list of all links on the page.' It specifies the verb ('Return') and resource ('list of all links'), making it easy to understand what the tool does. However, it doesn't explicitly differentiate from sibling tools like 'fetch_url_text' (which might fetch text content rather than links), leaving room for minor ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools such as 'fetch_url_text' or 'brave_search_and_fetch', nor does it specify prerequisites, exclusions, or contextual cues for selection. This lack of comparative context limits its utility in guiding the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
