tool_extract_links
Extract all links from a web page to analyze navigation structure and discover resources, with options to filter external links for focused analysis.
Instructions
Extract all links from a page.
Useful for discovering navigation structure and resources.
Args:
- `url`: URL to extract links from.
- `filter_external`: Only return same-domain links (default `True`).

Returns: Organized list of internal and external links.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to extract links from. | |
| filter_external | No | Only return same-domain links. | True |
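The `filter_external` option comes down to comparing the `netloc` of each resolved link against the page's own domain. A minimal stdlib sketch of that comparison (the URLs here are illustrative, not from the source):

```python
from urllib.parse import urljoin, urlparse

page = "https://example.com/docs"
base_domain = urlparse(page).netloc

results = {}
for href in ("/guide", "https://example.com/api", "https://github.com/devlens"):
    absolute = urljoin(page, href)  # resolve relative hrefs against the page URL
    results[absolute] = urlparse(absolute).netloc == base_domain  # same-domain?

print(results)
# → {'https://example.com/guide': True,
#    'https://example.com/api': True,
#    'https://github.com/devlens': False}
```

With `filter_external=True` (the default), only the keys mapped to `True` would appear in the tool's output.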
Implementation Reference
- `src/devlens/tools/advanced.py:159-220` (handler): The actual implementation of the link extraction logic.
```python
async def extract_links(url: str, *, filter_external: bool = True) -> str:
    """Extract all links from a page.

    Args:
        url: URL to extract links from.
        filter_external: Only return same-domain links.

    Returns:
        Markdown list of links organized by type.

    Example:
        >>> links = await extract_links("https://example.com")
    """
    # urljoin/urlparse are used below; imported here so the snippet is self-contained.
    from urllib.parse import urljoin, urlparse

    try:
        import httpx
        from bs4 import BeautifulSoup

        async with httpx.AsyncClient(timeout=15, follow_redirects=True) as client:
            resp = await client.get(url)
            resp.raise_for_status()
            html = resp.text

        soup = BeautifulSoup(html, "html.parser")
        base_domain = urlparse(url).netloc

        # Categorize links
        internal_links: list[tuple[str, str]] = []  # (url, text)
        external_links: list[tuple[str, str]] = []

        for a in soup.find_all("a", href=True):
            href = a["href"]
            text = a.get_text(strip=True) or href
            absolute_url = urljoin(url, href)
            parsed = urlparse(absolute_url)
            if parsed.scheme in ("http", "https"):
                if parsed.netloc == base_domain:
                    internal_links.append((absolute_url, text))
                else:
                    external_links.append((absolute_url, text))

        # Build report
        report_lines = [
            f"# Links from {url}\n",
            f"## Internal Links ({len(internal_links)})\n",
        ]

        # Deduplicate and sort
        internal_links = sorted(set(internal_links), key=lambda x: x[1].lower())
        external_links = sorted(set(external_links), key=lambda x: x[1].lower())

        for link_url, text in internal_links[:50]:  # Limit to 50
            report_lines.append(f"- [{text}]({link_url})")

        if not filter_external and external_links:
            report_lines.append(f"\n## External Links ({len(external_links)})\n")
            for link_url, text in external_links[:30]:  # Limit to 30
                report_lines.append(f"- [{text}]({link_url})")

        return "\n".join(report_lines)
    except Exception as exc:
        # The original except clause is truncated in this extract of lines
        # 159-220; a minimal error report is assumed here.
        return f"Error extracting links from {url}: {exc}"
```

- `src/devlens/server.py:147-159` (registration): The MCP tool wrapper function that calls `extract_links`.
```python
async def tool_extract_links(url: str, filter_external: bool = True) -> str:
    """Extract all links from a page.

    Useful for discovering navigation structure and resources.

    Args:
        url: URL to extract links from.
        filter_external: Only return same-domain links (default True).

    Returns:
        Organized list of internal and external links.
    """
    return await extract_links(url, filter_external=filter_external)
```
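The handler's categorization step can be exercised without the network or `bs4`. This stdlib-only sketch applies the same internal/external split to inline HTML (the sample markup and domains are illustrative, not from the source):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

HTML = """
<a href="/about">About</a>
<a href="https://example.com/blog">Blog</a>
<a href="https://github.com/devlens">GitHub</a>
<a href="mailto:hi@example.com">Mail</a>
"""

class LinkCollector(HTMLParser):
    """Collect every href found on an <a> tag."""
    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def categorize(base_url: str, html: str) -> tuple[list[str], list[str]]:
    """Split links into same-domain and external, dropping non-http(s) schemes."""
    parser = LinkCollector()
    parser.feed(html)
    base_domain = urlparse(base_url).netloc
    internal, external = [], []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https"):
            (internal if parsed.netloc == base_domain else external).append(absolute)
    return internal, external

internal, external = categorize("https://example.com/docs", HTML)
print(internal)  # → ['https://example.com/about', 'https://example.com/blog']
print(external)  # → ['https://github.com/devlens']
```

Note that the `mailto:` link is dropped entirely, matching the handler's restriction to `http`/`https` schemes.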