# Broken Link Checker MCP Server
An MCP (Model Context Protocol) server that provides broken link checking capabilities using the [broken-link-checker](https://github.com/stevenvachon/broken-link-checker) library.
## Features
- **Check Single Page Links**: Scan all links on a single HTML page for broken links
- **Check Entire Site**: Recursively crawl and check all links across an entire website
- Detailed reporting including HTTP status codes, broken reasons, and link metadata
- Support for excluding external links and respecting robots.txt
- **Two deployment modes**: Local stdio or Remote HTTP/SSE
## Installation
```bash
npm install
```
## Deployment Options
### Option 1: Local Usage (stdio transport)
Use `index.js` for local Claude Desktop integration.
### Option 2: Remote Usage (HTTP/SSE transport)
Use `server.js` for remote deployment with ngrok or similar proxy services.
## Usage with Claude Desktop (Local)
### Step 1: Configure Claude Desktop
Add this server to your Claude Desktop configuration file:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "broken-link-checker": {
      "command": "node",
      "args": ["/Users/davinoishi/Documents/Projects-AI/BLC/index.js"]
    }
  }
}
```
Replace the path with the absolute path to `index.js` in your own installation directory.
### Step 2: Restart Claude Desktop
After updating the configuration, restart Claude Desktop for the changes to take effect.
### Step 3: Use the Tools
The MCP server provides two main tools:
#### 1. `check_page_links`
Check all links on a single HTML page.
**Parameters**:
- `url` (required): The URL of the page to check
- `excludeExternalLinks` (optional): If true, only check internal links (default: false)
- `honorRobotExclusions` (optional): If true, respect robots.txt (default: true)
**Example**:
```
Can you check the links on https://example.com for any broken links?
```
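Behind a prompt like the one above, the MCP client sends a `tools/call` request to the server. The following is a minimal sketch of that message shape, assuming standard MCP JSON-RPC framing; `buildToolCall` is a hypothetical helper for illustration and is not part of this server.

```javascript
// Build an MCP `tools/call` request for one of this server's tools.
// `id` is an arbitrary request identifier chosen by the client.
// NOTE: `buildToolCall` is a hypothetical helper, not part of this project.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCall(1, "check_page_links", {
  url: "https://example.com",
  excludeExternalLinks: true,
});

console.log(JSON.stringify(request, null, 2));
```

In normal use Claude Desktop constructs this request for you; the sketch only shows how the documented parameters travel in the `arguments` object.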
#### 2. `check_site`
Recursively crawl and check all links across an entire website.
**Parameters**:
- `url` (required): The starting URL of the site to check
- `excludeExternalLinks` (optional): If true, only check internal links (default: false)
- `honorRobotExclusions` (optional): If true, respect robots.txt (default: true)
- `maxSocketsPerHost` (optional): Maximum concurrent requests per host (default: 1)
**Example**:
```
Can you crawl https://example.com and check all pages for broken links?
```
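To make the defaults listed above concrete, here is a rough sketch of how the `check_site` arguments could be validated and merged with those defaults. `SITE_DEFAULTS` and `resolveSiteArgs` are hypothetical names; the actual handling in this server may differ.

```javascript
// Documented defaults for the optional `check_site` parameters.
const SITE_DEFAULTS = {
  excludeExternalLinks: false,
  honorRobotExclusions: true,
  maxSocketsPerHost: 1,
};

// Hypothetical helper: validate the required `url` and fill in defaults.
function resolveSiteArgs(args) {
  if (typeof args.url !== "string" || args.url.length === 0) {
    throw new Error("`url` is required");
  }
  new URL(args.url); // throws a TypeError if the URL cannot be parsed
  return { ...SITE_DEFAULTS, ...args };
}

const resolved = resolveSiteArgs({
  url: "https://example.com",
  maxSocketsPerHost: 4,
});

console.log(resolved);
```

Caller-supplied values override the defaults, so only `maxSocketsPerHost` changes in this example.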
## Remote Deployment with HTTP/SSE Transport
For remote deployments (e.g., deploying on a VPS and connecting via ngrok), use the HTTP/SSE server:
### Step 1: Start the HTTP Server
```bash
# Start the HTTP/SSE server (default port 3000)
npm run start:http
# Or specify a custom port
PORT=8080 npm run start:http
```
The server will start on `http://localhost:3000` (or your specified port).
### Step 2: Expose with ngrok (or alternative)
```bash
# Install ngrok if you haven't already
npm install -g ngrok
# Expose your local server
ngrok http 3000
```
ngrok prints a public URL such as `https://abc123.ngrok.io`.
### Step 3: Configure Claude Desktop for Remote Connection
Update your Claude Desktop configuration to use the HTTP/SSE transport:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "broken-link-checker": {
      "url": "https://your-ngrok-url.ngrok.io/sse"
    }
  }
}
```
Replace `your-ngrok-url.ngrok.io` with your actual ngrok URL.
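If you regenerate the tunnel often, the configuration entry can be derived from the public base URL instead of edited by hand. This is a small sketch under that assumption; `buildRemoteConfig` is a hypothetical helper, and `https://abc123.ngrok.io` is only a placeholder URL.

```javascript
// Hypothetical helper: derive the `mcpServers` entry from the public base URL.
// The `/sse` path matches the configuration shown above.
function buildRemoteConfig(publicBaseUrl) {
  const sseUrl = new URL("/sse", publicBaseUrl).href;
  return {
    mcpServers: {
      "broken-link-checker": { url: sseUrl },
    },
  };
}

const config = buildRemoteConfig("https://abc123.ngrok.io");
console.log(JSON.stringify(config, null, 2));
```

The printed JSON can be pasted into `claude_desktop_config.json`, merging it with any other servers already listed there.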
### Step 4: Test the Connection
1. Check the health endpoint: `https://your-ngrok-url.ngrok.io/health`
2. Restart Claude Desktop
3. Ask Claude to check links on a webpage
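The health check in step 1 can also be scripted. The sketch below assumes only that `/health` answers with a 2xx status when the server is up; the response body is not inspected because its format is not documented here. `isHealthy` is a hypothetical helper, and the global `fetch` it defaults to requires Node.js 18 or later.

```javascript
// Hypothetical helper: returns true when GET <base>/health answers with a 2xx status.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function isHealthy(publicBaseUrl, fetchImpl = fetch) {
  const healthUrl = new URL("/health", publicBaseUrl).href;
  try {
    const response = await fetchImpl(healthUrl);
    return response.ok;
  } catch {
    return false; // network error, DNS failure, tunnel down, etc.
  }
}

// Usage (replace with your actual ngrok URL):
// isHealthy("https://your-ngrok-url.ngrok.io")
//   .then((ok) => console.log(ok ? "server is up" : "server is unreachable"));
```

Passing `fetchImpl` explicitly makes the helper easy to exercise without a live tunnel.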
### Environment Variables
You can configure the server using environment variables:
```bash
# Copy the example environment file
cp .env.example .env
# Then edit .env with your settings, for example:
PORT=3000
HOST=0.0.0.0
```
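For orientation, this is one way a Node.js server could turn those two variables into a listen address. It is a sketch, not the code in `server.js`: `resolveListenAddress` is a hypothetical helper, the `3000` fallback mirrors the documented default port, and the `0.0.0.0` fallback is an assumption taken from the example `.env` above.

```javascript
// Hypothetical helper: read PORT and HOST from an env-like object.
// Fallbacks are assumptions: 3000 is the documented default port,
// 0.0.0.0 is the value used in the example .env file.
function resolveListenAddress(env) {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { port, host: env.HOST ?? "0.0.0.0" };
}

// In a real server you would pass `process.env` instead of a literal object.
const address = resolveListenAddress({ PORT: "8080" });
console.log(address);
```

Validating the port up front turns a typo in `.env` into a clear startup error instead of a confusing bind failure.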
### Production Deployment
For production deployments, consider:
1. **Use a process manager** (PM2, systemd):
   ```bash
   npm install -g pm2
   pm2 start server.js --name broken-link-checker-mcp
   pm2 save
   pm2 startup
   ```
2. **Use a reverse proxy** (nginx, Caddy) for HTTPS
3. **Add authentication** if exposing publicly
4. **Monitor logs and resource usage**
## Output Format
Both tools return JSON with the following structure:
```json
{
  "summary": {
    "totalLinks": 100,
    "brokenLinks": 5,
    "workingLinks": 95
  },
  "brokenLinks": [
    {
      "url": "https://example.com/broken-page",
      "base": "https://example.com",
      "broken": true,
      "brokenReason": "HTTP_404",
      "http": {
        "statusCode": 404
      }
    }
  ]
}
```
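Once you have a result in this shape, it is straightforward to post-process. As a small illustration, the sketch below groups broken links by `brokenReason`. The sample data is made up for the example, and `countByReason` is a hypothetical helper rather than something the server provides.

```javascript
// Made-up sample result in the documented output shape.
const result = {
  summary: { totalLinks: 10, brokenLinks: 3, workingLinks: 7 },
  brokenLinks: [
    { url: "https://example.com/a", base: "https://example.com", broken: true, brokenReason: "HTTP_404", http: { statusCode: 404 } },
    { url: "https://example.com/b", base: "https://example.com", broken: true, brokenReason: "HTTP_404", http: { statusCode: 404 } },
    { url: "https://missing.example/", base: "https://example.com", broken: true, brokenReason: "ERRNO_ENOTFOUND", http: {} },
  ],
};

// Hypothetical helper: count broken links per `brokenReason`.
function countByReason(brokenLinks) {
  const counts = {};
  for (const link of brokenLinks) {
    counts[link.brokenReason] = (counts[link.brokenReason] || 0) + 1;
  }
  return counts;
}

const counts = countByReason(result.brokenLinks);
console.log(counts);
```

Grouping by reason helps separate dead pages (`HTTP_404`) from hosts that no longer resolve, which usually call for different fixes.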
## Development
The stdio server code is in `index.js`, and the HTTP/SSE server is in `server.js`. The server uses:
- `@modelcontextprotocol/sdk` for MCP protocol implementation
- `broken-link-checker` for link checking functionality
## License
MIT