# Broken Link Checker MCP Server
An MCP (Model Context Protocol) server that provides broken link checking capabilities using the [broken-link-checker](https://github.com/stevenvachon/broken-link-checker) library.
## Features
- **Check Single Page Links**: Scan all links on a single HTML page for broken links
- **Check Entire Site**: Recursively crawl and check all links across an entire website
- **Detailed Reporting**: HTTP status codes, broken reasons, and link metadata for each broken link
- **Configurable Crawling**: Optionally exclude external links and respect robots.txt
- **Two deployment modes**: Local stdio or Remote HTTP/SSE
## Installation
```bash
npm install
```
## Deployment Options
### Option 1: Local Usage (stdio transport)
Use `index.js` for local Claude Desktop integration.
### Option 2: Remote Usage (HTTP/SSE transport)
Use `server.js` for remote deployment with ngrok or similar proxy services.
## Usage with Claude Desktop (Local)
### Step 1: Configure Claude Desktop
Add this server to your Claude Desktop configuration file:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "broken-link-checker": {
      "command": "node",
      "args": ["/Users/davinoishi/Documents/Projects-AI/BLC/index.js"]
    }
  }
}
```
Make sure to update the path to match your actual installation directory.
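
To avoid hand-editing the path, the entry can be generated from the repository root. This is only a sketch: `buildLocalConfig` is a hypothetical helper, not part of this project, and it assumes `index.js` sits in the current working directory.

```javascript
// Hypothetical helper: builds the "mcpServers" entry for a given index.js path.
function buildLocalConfig(indexJsPath) {
  return {
    mcpServers: {
      "broken-link-checker": {
        command: "node",
        args: [indexJsPath],
      },
    },
  };
}

// Assumes the script is run from the directory that contains index.js.
const config = buildLocalConfig(`${process.cwd()}/index.js`);
console.log(JSON.stringify(config, null, 2));
```

Merge the printed `broken-link-checker` entry into the `mcpServers` object of your existing configuration file rather than overwriting the file.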
### Step 2: Restart Claude Desktop
After updating the configuration, restart Claude Desktop for the changes to take effect.
### Step 3: Use the Tools
The MCP server provides two main tools:
#### 1. `check_page_links`
Check all links on a single HTML page.
**Parameters**:
- `url` (required): The URL of the page to check
- `excludeExternalLinks` (optional): If true, only check internal links (default: false)
- `honorRobotExclusions` (optional): If true, respect robots.txt (default: true)
**Example**:
```
Can you check the links on https://example.com for any broken links?
```
#### 2. `check_site`
Recursively crawl and check all links across an entire website.
**Parameters**:
- `url` (required): The starting URL of the site to check
- `excludeExternalLinks` (optional): If true, only check internal links (default: false)
- `honorRobotExclusions` (optional): If true, respect robots.txt (default: true)
- `maxSocketsPerHost` (optional): Maximum concurrent requests per host (default: 1)
**Example**:
```
Can you crawl https://example.com and check all pages for broken links?
```
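
The parameter defaults listed for both tools can be summarized in code. The following is a minimal sketch, not the server's actual implementation: `resolveCheckOptions` is a hypothetical name, and the URL validation is an added assumption rather than documented behavior.

```javascript
// Hypothetical helper: applies the documented defaults to raw tool arguments.
function resolveCheckOptions(args = {}) {
  if (typeof args.url !== "string" || args.url.length === 0) {
    throw new TypeError("`url` is required");
  }
  // new URL() throws a TypeError if the string is not a valid absolute URL.
  const parsed = new URL(args.url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError("`url` must use http or https");
  }
  return {
    url: parsed.href,
    excludeExternalLinks: args.excludeExternalLinks ?? false, // default: false
    honorRobotExclusions: args.honorRobotExclusions ?? true,  // default: true
    maxSocketsPerHost: args.maxSocketsPerHost ?? 1,           // check_site only, default: 1
  };
}

console.log(resolveCheckOptions({ url: "https://example.com", excludeExternalLinks: true }));
```

Using `??` rather than `||` keeps an explicit `false` from being replaced by the default.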
## Remote Deployment with HTTP/SSE Transport
For remote deployments (e.g., deploying on a VPS and connecting via ngrok), use the HTTP/SSE server:
### Step 1: Start the HTTP Server
```bash
# Start the HTTP/SSE server (default port 3000)
npm run start:http
# Or specify a custom port
PORT=8080 npm run start:http
```
The server will start on `http://localhost:3000` (or your specified port).
### Step 2: Expose with ngrok (or alternative)
```bash
# Install ngrok if you haven't already
npm install -g ngrok
# Expose your local server
ngrok http 3000
```
ngrok will provide you with a public URL like: `https://abc123.ngrok.io`
### Step 3: Configure Claude Desktop for Remote Connection
Update your Claude Desktop configuration to use the HTTP/SSE transport:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "broken-link-checker": {
      "url": "https://your-ngrok-url.ngrok.io/sse"
    }
  }
}
Replace `your-ngrok-url.ngrok.io` with your actual ngrok URL.
### Step 4: Test the Connection
1. Check the health endpoint: `https://your-ngrok-url.ngrok.io/health`
2. Restart Claude Desktop
3. Ask Claude to check links on a webpage
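
The health check in step 1 can also be scripted. This sketch assumes Node.js 18 or later (for the global `fetch`) and only inspects the HTTP status, since the response body of `/health` is not documented here. `endpointUrls` and `checkHealth` are hypothetical names.

```javascript
// Derives the two endpoint URLs mentioned in this README from a base URL.
function endpointUrls(baseUrl) {
  return {
    sse: new URL("/sse", baseUrl).href,
    health: new URL("/health", baseUrl).href,
  };
}

// Requests the health endpoint and reports only the HTTP status.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function checkHealth(baseUrl, fetchImpl = fetch) {
  const { health } = endpointUrls(baseUrl);
  const response = await fetchImpl(health);
  return { ok: response.ok, status: response.status };
}

// Example (replace with your actual ngrok URL):
// checkHealth("https://your-ngrok-url.ngrok.io").then(console.log);
```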
### Environment Variables
You can configure the server using environment variables:
```bash
# Copy the example environment file
cp .env.example .env
# Then edit .env with your settings, for example:
#   PORT=3000
#   HOST=0.0.0.0
```
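
For illustration, a server could read these variables as sketched below. This is an assumption about how `server.js` might consume them, not a copy of its code. The port default of 3000 comes from this README, while the `0.0.0.0` host default is assumed from the example values, and `readServerConfig` is a hypothetical name.

```javascript
// Hypothetical helper: reads PORT and HOST with fallbacks and validates the port.
function readServerConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid PORT: ${env.PORT}`);
  }
  // Assumed default host; binding to 0.0.0.0 listens on all interfaces.
  return { port, host: env.HOST ?? "0.0.0.0" };
}

console.log(readServerConfig());
```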
### Production Deployment
For production deployments, consider:
1. **Use a process manager** (PM2, systemd):
   ```bash
   npm install -g pm2
   pm2 start server.js --name broken-link-checker-mcp
   pm2 save
   pm2 startup
   ```
2. **Use a reverse proxy** (nginx, Caddy) for HTTPS
3. **Add authentication** if exposing publicly
4. **Monitor logs and resource usage**
## Output Format
Both tools return JSON with the following structure:
```json
{
  "summary": {
    "totalLinks": 100,
    "brokenLinks": 5,
    "workingLinks": 95
  },
  "brokenLinks": [
    {
      "url": "https://example.com/broken-page",
      "base": "https://example.com",
      "broken": true,
      "brokenReason": "HTTP_404",
      "http": {
        "statusCode": 404
      }
    }
  ]
}
```
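
The relationship between `summary` and `brokenLinks` can be sketched as follows. This is illustrative only and not necessarily how the server builds its response. `buildReport` and `countByReason` are hypothetical helpers, and `results` is assumed to be an array of link objects that each carry a boolean `broken` field.

```javascript
// Builds a report in the structure shown above from a flat list of link results.
function buildReport(results) {
  const brokenLinks = results.filter((link) => link.broken);
  return {
    summary: {
      totalLinks: results.length,
      brokenLinks: brokenLinks.length,
      workingLinks: results.length - brokenLinks.length,
    },
    brokenLinks,
  };
}

// Groups the broken links in a report by their `brokenReason` value.
function countByReason(report) {
  const counts = {};
  for (const link of report.brokenLinks) {
    const reason = link.brokenReason ?? "UNKNOWN";
    counts[reason] = (counts[reason] ?? 0) + 1;
  }
  return counts;
}

const report = buildReport([
  { url: "https://example.com/", broken: false },
  { url: "https://example.com/broken-page", broken: true, brokenReason: "HTTP_404" },
]);
console.log(report.summary, countByReason(report));
```

Grouping by `brokenReason` is a convenient first step when triaging a large crawl, since a single cause often accounts for many broken links.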
## Development
The stdio server code is in `index.js`, and the HTTP/SSE server is in `server.js`. The server uses:
- `@modelcontextprotocol/sdk` for MCP protocol implementation
- `broken-link-checker` for link checking functionality
## License
MIT