
tool_extract_links

Extract all links from a web page to analyze navigation structure and discover resources, with options to filter external links for focused analysis.

Instructions

Extract all links from a page.

Useful for discovering navigation structure and resources.

Args:
  url: URL to extract links from.
  filter_external: Only return same-domain links (default True).

Returns: Organized list of internal and external links.
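The same-domain rule can be illustrated offline with the standard library alone. The sketch below is a hypothetical helper (`classify_link` is not part of the tool) that mirrors how the implementation resolves each href against the page URL and compares hosts:

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def classify_link(page_url: str, href: str) -> Optional[str]:
    """Return "internal", "external", or None for non-HTTP(S) targets."""
    absolute = urljoin(page_url, href)  # resolve relative hrefs against the page
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        return None  # e.g. mailto: or javascript: links are skipped
    same_host = parsed.netloc == urlparse(page_url).netloc
    return "internal" if same_host else "external"


print(classify_link("https://example.com/docs/", "guide.html"))
print(classify_link("https://example.com/docs/", "https://other.org/"))
print(classify_link("https://example.com/docs/", "mailto:hi@example.com"))
```

Because the host comparison is exact, a link to `www.example.com` from a page on `example.com` counts as external under this rule.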

Input Schema

Name             Required  Description                     Default
url              Yes       URL to extract links from.
filter_external  No        Only return same-domain links.  True
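The JSON Schema view is not reproduced on this page, so the following is only a sketch of what it might look like, inferred from the tool signature. The `INPUT_SCHEMA` dict and the `normalize_args` helper are hypothetical names used to show how a caller could check arguments and apply the default before invoking the tool:

```python
from typing import Any, Dict

# Assumed schema: field names and the default are taken from the tool signature.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "filter_external": {"type": "boolean", "default": True},
    },
    "required": ["url"],
}


def normalize_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and types, then fill in the default flag."""
    for name in INPUT_SCHEMA["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    if not isinstance(args["url"], str):
        raise TypeError("url must be a string")
    default_flag = INPUT_SCHEMA["properties"]["filter_external"]["default"]
    flag = args.get("filter_external", default_flag)
    if not isinstance(flag, bool):
        raise TypeError("filter_external must be a boolean")
    return {"url": args["url"], "filter_external": flag}


print(normalize_args({"url": "https://example.com"}))
```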

Implementation Reference

  • The actual implementation of the link extraction logic.
    async def extract_links(url: str, *, filter_external: bool = True) -> str:
        """Extract all links from a page.
    
        Args:
            url: URL to extract links from.
            filter_external: Only return same-domain links.
    
        Returns:
            Markdown list of links organized by type.
    
        Example:
            >>> links = await extract_links("https://example.com")
        """
        try:
            import httpx
            from bs4 import BeautifulSoup
    
            async with httpx.AsyncClient(timeout=15, follow_redirects=True) as client:
                resp = await client.get(url)
                resp.raise_for_status()
                html = resp.text
    
            soup = BeautifulSoup(html, "html.parser")
            # urlparse is used below, so import it together with urljoin
            from urllib.parse import urljoin, urlparse
    
            base_domain = urlparse(url).netloc
    
            # Categorize links
    
            internal_links: list[tuple[str, str]] = []  # (url, text)
            external_links: list[tuple[str, str]] = []
    
            for a in soup.find_all("a", href=True):
                href = a["href"]
                text = a.get_text(strip=True) or href
                absolute_url = urljoin(url, href)
                parsed = urlparse(absolute_url)
    
                if parsed.scheme in ("http", "https"):
                    if parsed.netloc == base_domain:
                        internal_links.append((absolute_url, text))
                    else:
                        external_links.append((absolute_url, text))
    
            # Deduplicate and sort first, so the heading totals count unique links
            internal_links = sorted(set(internal_links), key=lambda x: x[1].lower())
            external_links = sorted(set(external_links), key=lambda x: x[1].lower())
    
            # Build report
            report_lines = [
                f"# Links from {url}\n",
                f"## Internal Links ({len(internal_links)})\n",
            ]
    
            for link_url, text in internal_links[:50]:  # Limit to 50
                report_lines.append(f"- [{text}]({link_url})")
    
            if not filter_external and external_links:
                report_lines.append(f"\n## External Links ({len(external_links)})\n")
                for link_url, text in external_links[:30]:  # Limit to 30
                    report_lines.append(f"- [{text}]({link_url})")
    
            return "\n".join(report_lines)
        except Exception as e:
            # The excerpt ends before the original handler; this generic fallback is an assumption.
            return f"Error extracting links from {url}: {e}"
  • The MCP tool wrapper function that calls extract_links.
    async def tool_extract_links(url: str, filter_external: bool = True) -> str:
        """Extract all links from a page.
    
        Useful for discovering navigation structure and resources.
    
        Args:
            url: URL to extract links from.
            filter_external: Only return same-domain links (default True).
    
        Returns:
            Organized list of internal and external links.
        """
        return await extract_links(url, filter_external=filter_external)
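The report-building step can be tried without any network access. The sketch below is a hypothetical standalone function (`build_report` is not in the source) that takes already-collected `(url, text)` pairs and applies the same sort order and the 50/30 caps. It deduplicates before counting, which is an assumption about the intended heading totals:

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (url, text)


def build_report(
    page_url: str,
    internal: List[Link],
    external: List[Link],
    *,
    filter_external: bool = True,
) -> str:
    """Render collected links as a Markdown report."""
    # Deduplicate, then sort case-insensitively by link text.
    internal = sorted(set(internal), key=lambda pair: pair[1].lower())
    external = sorted(set(external), key=lambda pair: pair[1].lower())

    lines = [f"# Links from {page_url}\n", f"## Internal Links ({len(internal)})\n"]
    lines += [f"- [{text}]({url})" for url, text in internal[:50]]  # cap at 50

    # External links appear only when filtering is switched off.
    if not filter_external and external:
        lines.append(f"\n## External Links ({len(external)})\n")
        lines += [f"- [{text}]({url})" for url, text in external[:30]]  # cap at 30
    return "\n".join(lines)


internal = [
    ("https://example.com/b", "Beta"),
    ("https://example.com/a", "alpha"),
    ("https://example.com/a", "alpha"),  # duplicate, collapsed by set()
]
external = [("https://other.org/", "Other")]
print(build_report("https://example.com", internal, external, filter_external=False))
```

With `filter_external` left at its default of `True`, the external section is dropped entirely rather than listed separately.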

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Y4NN777/devlens-mcp'
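The same request can be made from Python with the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not state.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(server_id: str) -> urllib.request.Request:
    """Build the GET request for one server entry without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/{server_id}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server(server_id: str) -> dict:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_server_request(server_id), timeout=15) as resp:
        return json.load(resp)


# Usage (performs a network call):
# info = fetch_server("Y4NN777/devlens-mcp")
```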

If you have feedback or need assistance with the MCP directory API, please join our Discord server.