Broken Link Checker MCP Server

by davinoishi

check_site

Recursively crawl websites to identify broken links by scanning internal and external URLs across multiple pages, with options to respect robots.txt and limit concurrent requests.

Instructions

Recursively crawl and check all links across an entire website. This will scan multiple pages and check all internal and external links found. Use with caution on large sites as it may take significant time.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The starting URL of the site to check | — |
| excludeExternalLinks | No | If true, only check internal links | false |
| honorRobotExclusions | No | If true, respect robots.txt and meta robots tags | true |
| maxSocketsPerHost | No | Maximum concurrent requests per host | 1 |
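
The schema maps to tool-call arguments like the following (the target URL and option values are illustrative, not defaults):

```json
{
  "url": "https://example.com",
  "excludeExternalLinks": true,
  "honorRobotExclusions": true,
  "maxSocketsPerHost": 2
}
```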

Implementation Reference

  • Core handler function for the 'check_site' tool. Uses SiteChecker from 'broken-link-checker' to recursively scan the site starting from the given URL, collects link check results, pages discovered, and any errors encountered.
```javascript
function checkSite(url, options = {}) {
  return new Promise((resolve, reject) => {
    const results = [];
    const errors = [];
    const pages = [];
    const siteChecker = new SiteChecker(options, {
      // Called once per link found; record its status and context.
      link: (result) => {
        results.push({
          url: result.url.resolved,
          base: result.base.resolved,
          html: {
            tagName: result.html.tagName,
            text: result.html.text,
          },
          broken: result.broken,
          brokenReason: result.brokenReason,
          excluded: result.excluded,
          excludedReason: result.excludedReason,
          http: {
            statusCode: result.http?.response?.statusCode,
          },
        });
      },
      // Called once per page crawled; collect the URL or the error.
      page: (error, pageUrl) => {
        if (error) {
          errors.push({ pageUrl, error: error.message });
        } else {
          pages.push(pageUrl);
        }
      },
      // Crawl finished; settle the Promise with everything collected.
      end: () => {
        resolve({ results, errors, pages });
      },
    });
    siteChecker.enqueue(url);
  });
}
```
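
`checkSite` wraps broken-link-checker's event-driven `SiteChecker` in a Promise. The same collect-then-resolve pattern is sketched below with a hypothetical synchronous stub in place of the real crawler, so it runs standalone:

```javascript
// Hypothetical stub standing in for SiteChecker: it fires the same
// link/page/end callbacks that checkSite registers, with sample data.
function runStubChecker(handlers) {
  handlers.link({ url: "https://example.com/a", broken: false });
  handlers.link({ url: "https://example.com/b", broken: true });
  handlers.page(null, "https://example.com/");
  handlers.end();
}

// Same shape as checkSite: accumulate per-event data,
// then settle the Promise when the "end" callback fires.
function collectResults() {
  return new Promise((resolve) => {
    const results = [];
    const pages = [];
    runStubChecker({
      link: (result) => results.push(result),
      page: (error, pageUrl) => {
        if (!error) pages.push(pageUrl);
      },
      end: () => resolve({ results, pages }),
    });
  });
}
```

Resolving only in `end` guarantees the caller sees the complete crawl, not a partial snapshot.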
  • Input schema defining the parameters for the 'check_site' tool: required 'url', optional 'excludeExternalLinks', 'honorRobotExclusions', and 'maxSocketsPerHost'.
```javascript
inputSchema: {
  type: "object",
  properties: {
    url: {
      type: "string",
      description: "The starting URL of the site to check",
    },
    excludeExternalLinks: {
      type: "boolean",
      description: "If true, only check internal links (default: false)",
      default: false,
    },
    honorRobotExclusions: {
      type: "boolean",
      description: "If true, respect robots.txt and meta robots tags (default: true)",
      default: true,
    },
    maxSocketsPerHost: {
      type: "number",
      description: "Maximum concurrent requests per host (default: 1)",
      default: 1,
    },
  },
  required: ["url"],
},
```
  • server.js:142-174 (registration)
    Registration of the 'check_site' tool in the ListToolsRequestSchema handler, providing name, description, and input schema.
```javascript
{
  name: "check_site",
  description: "Recursively crawl and check all links across an entire website. This will scan multiple pages and check all internal and external links found. Use with caution on large sites as it may take significant time.",
  inputSchema: {
    type: "object",
    properties: {
      url: {
        type: "string",
        description: "The starting URL of the site to check",
      },
      excludeExternalLinks: {
        type: "boolean",
        description: "If true, only check internal links (default: false)",
        default: false,
      },
      honorRobotExclusions: {
        type: "boolean",
        description: "If true, respect robots.txt and meta robots tags (default: true)",
        default: true,
      },
      maxSocketsPerHost: {
        type: "number",
        description: "Maximum concurrent requests per host (default: 1)",
        default: 1,
      },
    },
    required: ["url"],
  },
},
```
  • Dispatcher logic in CallToolRequestSchema handler that processes 'check_site' tool calls, prepares options, invokes checkSite, processes results into summary and broken links, and formats the MCP response.
```javascript
} else if (name === "check_site") {
  // Apply defaults for any options the caller omitted.
  const options = {
    excludeExternalLinks: args.excludeExternalLinks || false,
    honorRobotExclusions: args.honorRobotExclusions !== false,
    maxSocketsPerHost: args.maxSocketsPerHost || 1,
  };
  const result = await checkSite(args.url, options);
  const brokenLinks = result.results.filter((link) => link.broken);
  const summary = {
    pagesScanned: result.pages.length,
    totalLinks: result.results.length,
    brokenLinks: brokenLinks.length,
    workingLinks: result.results.length - brokenLinks.length,
    errors: result.errors.length,
  };
  return {
    content: [
      {
        type: "text",
        text: JSON.stringify(
          {
            summary,
            brokenLinks,
            pages: result.pages,
            errors: result.errors,
          },
          null,
          2
        ),
      },
    ],
  };
}
```
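
The summary the dispatcher returns is a plain reduction over the collected link results. A standalone sketch of that reduction, fed with hypothetical sample data:

```javascript
// Hypothetical sample of link-check results, in the same shape
// the checkSite handler collects.
const sampleResults = [
  { url: "https://example.com/ok", broken: false },
  { url: "https://example.com/missing", broken: true },
  { url: "https://example.com/also-ok", broken: false },
];

// Mirrors the dispatcher's summary computation: count broken links
// once and derive the working-link count from the total.
function summarize(results, pages, errors) {
  const brokenLinks = results.filter((link) => link.broken);
  return {
    pagesScanned: pages.length,
    totalLinks: results.length,
    brokenLinks: brokenLinks.length,
    workingLinks: results.length - brokenLinks.length,
    errors: errors.length,
  };
}

const summary = summarize(sampleResults, ["https://example.com/"], []);
// summary.brokenLinks === 1, summary.workingLinks === 2
```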
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/davinoishi/broken-link-checker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.