extract-links
Extract and analyze hyperlinks from web pages, organizing URLs, anchor text, and contextual information into a structured format. Supports site mapping, SEO analysis, broken link checking, and targeted crawling preparation. Handles relative and absolute URLs with optional base URL and output limits.
Instructions
Extract and analyze all hyperlinks from a web page, organizing them into a structured format with URLs, anchor text, and contextual information. Uses stream processing and worker threads to handle large pages efficiently. Works with either a direct URL or raw HTML content. Relative links are resolved to absolute URLs, against the optional base URL when one is supplied. Results can be capped with a limit so that link-dense pages do not produce overwhelming output. Returns a link inventory that includes, for each link, the destination URL, the link text, the title (if available), and whether the link is internal or external to the source domain. Useful for site mapping, content analysis, broken link checking, SEO analysis, and as a preparatory step for targeted crawling.
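To make the description above concrete, here is a minimal sketch of this kind of extraction using only the Python standard library. It is not the tool's actual implementation: the names `LinkCollector` and `extract_links`, the output keys, and the host comparison used for the internal/external check are all assumptions made for illustration, and the sketch does no streaming or worker-thread processing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href, title, and anchor text for each <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished link records
        self._current = None  # record being filled while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            href = attr_map.get("href")
            if href:  # skip anchors without a usable href
                self._current = {"href": href, "title": attr_map.get("title"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(html, page_url, base_url=None, limit=100):
    """Return up to `limit` link records found in `html` (illustrative only)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    source_host = urlparse(page_url).netloc
    results = []
    for link in parser.links:
        # Resolve relative hrefs against base_url if given, else the page URL.
        absolute = urljoin(base_url or page_url, link["href"])
        # Assumption: with a base URL, keep only links that start with it.
        if base_url and not absolute.startswith(base_url):
            continue
        results.append({
            "url": absolute,
            "text": link["text"],
            "title": link["title"],
            # Assumption: "internal" means same host as the source page.
            "internal": urlparse(absolute).netloc == source_host,
        })
        if len(results) >= limit:
            break
    return results
```

Calling `extract_links(html, "https://example.com/home")` on a page containing `<a href="/about">About</a>` would yield a record whose `url` is `https://example.com/about` and whose `internal` flag is true. A real tool may define "internal" differently, for example by registrable domain rather than exact host.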
Input Schema
Name | Required | Description | Default |
---|---|---|---|
baseUrl | No | Optional base URL to resolve relative links against. If provided, only links starting with this base URL will be returned. Useful for focusing on internal links. | |
limit | No | Maximum number of links to return. Max allowed is 5000. | 100 |
url | Yes | The fully qualified URL of the web page from which to extract links. Must be a valid HTTP or HTTPS URL. | |
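As an illustration of how a caller might check arguments against this schema before invoking the tool, the sketch below applies the documented default and maximum for `limit` and the HTTP/HTTPS requirement for `url`. The function name `normalize_args` is hypothetical, and two behaviors are assumptions rather than documented facts: that out-of-range limits are rejected instead of clamped, and that the minimum limit is 1.

```python
from urllib.parse import urlparse

DEFAULT_LIMIT = 100  # documented default
MAX_LIMIT = 5000     # documented maximum


def normalize_args(args):
    """Validate a dict of tool arguments and fill in defaults (illustrative only)."""
    url = args.get("url")
    if not isinstance(url, str):
        raise ValueError("url is required")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be a fully qualified HTTP or HTTPS URL")

    limit = args.get("limit", DEFAULT_LIMIT)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    # Assumption: values outside 1..5000 are rejected rather than clamped.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")

    base_url = args.get("baseUrl")
    if base_url is not None and not isinstance(base_url, str):
        raise ValueError("baseUrl must be a string when provided")

    return {"url": url, "limit": limit, "baseUrl": base_url}
```

For example, `normalize_args({"url": "https://example.com"})` returns the URL with `limit` set to 100 and `baseUrl` set to `None`, while a `limit` of 5001 or an `ftp://` URL raises `ValueError`.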