# SEO Audit MCP Server
A Model Context Protocol (MCP) server that provides comprehensive technical SEO auditing tools, optimized for job board websites.
## Features
- **Page Analysis** - Deep analysis of individual pages including meta tags, headings, structured data, rendering behavior, and links
- **Site Crawling** - Multi-page crawling with automatic page type classification
- **Lighthouse Integration** - Core Web Vitals and performance auditing
- **Sitemap Analysis** - Robots.txt and XML sitemap parsing with job-specific insights
- **JobPosting Schema Validation** - Specialized validation against Google's requirements
## Installation
### Prerequisites
- Node.js 18+
- Chrome/Chromium (for Playwright)
- Lighthouse CLI (optional, for performance audits)
### Setup
```bash
# Clone or extract the project
cd seo-audit-mcp
# Install dependencies
npm install
# Install Playwright browsers
npx playwright install chromium
# Install Lighthouse globally (optional but recommended)
npm install -g lighthouse
# Build the project
npm run build
```
### Configure Claude Desktop
Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
  "mcpServers": {
    "seo-audit": {
      "command": "node",
      "args": ["/path/to/seo-audit-mcp/dist/index.js"]
    }
  }
}
```
### Configure Claude CLI
Register the server with the Claude CLI (Claude Code):
```bash
# Register the built server as an MCP server named "seo-audit"
claude mcp add seo-audit -- node /path/to/seo-audit-mcp/dist/index.js
```
## Available Tools
### `analyze_page`
Analyze a single web page for SEO factors.
```
Input:
- url (required): The URL to analyze
- waitForSelector: CSS selector to wait for (for JS-heavy pages)
- timeout: Timeout in milliseconds (default: 30000)
- device: 'desktop' or 'mobile' (default: desktop)
Output:
- Meta tags, headings, structured data
- JobPosting schema validation
- JavaScript rendering analysis
- Link and image analysis
```
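For example, a set of arguments for a JavaScript-heavy job detail page (field names follow the Input list above; representing them as a plain JSON object is an assumption about the tool schema):
```typescript
// Illustrative analyze_page arguments; the selector is a hypothetical placeholder.
const analyzePageArgs = {
  url: "https://example.com/jobs/software-engineer",
  waitForSelector: ".job-description", // wait for the JS-rendered description block
  timeout: 60000,                      // double the 30s default for slow pages
  device: "mobile",
};
```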
### `crawl_site`
Crawl multiple pages starting from a URL.
```
Input:
- startUrl (required): Starting URL
- maxPages: Maximum pages to crawl (default: 50)
- maxDepth: Maximum link depth (default: 5)
- includePatterns: Regex patterns to include
- excludePatterns: Regex patterns to exclude
Output:
- Aggregated statistics
- Page type classification (job detail, category, location pages)
- Duplicate detection
- Critical issues and warnings
```
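A sketch of arguments that keep the crawl focused on job URLs (assuming the patterns are passed as arrays of regex strings):
```typescript
// Illustrative crawl_site arguments.
const crawlSiteArgs = {
  startUrl: "https://example.com/jobs",
  maxPages: 100,                         // raised from the default of 50
  maxDepth: 3,
  includePatterns: ["/jobs/", "-jobs/"], // follow job detail and category pages
  excludePatterns: ["\\?page=\\d+"],     // skip paginated duplicates
};
```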
### `run_lighthouse`
Run Lighthouse performance audit.
```
Input:
- url (required): URL to audit
- device: 'mobile' or 'desktop' (default: mobile)
- categories: Array of categories to audit
- saveReport: Save HTML report (default: false)
Output:
- Performance, Accessibility, Best Practices, SEO scores
- Core Web Vitals (LCP, CLS, TBT, FCP, TTFB)
- Optimization opportunities
- Diagnostics
```
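An example that limits the audit to two categories (the category names are assumed to be Lighthouse's standard IDs):
```typescript
// Illustrative run_lighthouse arguments.
const lighthouseArgs = {
  url: "https://example.com/jobs",
  device: "mobile",                   // the default, shown here for clarity
  categories: ["performance", "seo"], // skip the other categories for a faster run
  saveReport: true,                   // keep the HTML report for later review
};
```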
### `analyze_sitemap`
Analyze robots.txt and XML sitemaps.
```
Input:
- baseUrl (required): Base URL of the site
- includeSitemapUrls: Include full URL list (default: true)
- maxUrls: Max URLs per sitemap (default: 1000)
Output:
- robots.txt rules and issues
- Discovered sitemaps
- Job URL detection
- Recommendations
```
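For instance, a quick pass that caps the number of URLs sampled per sitemap:
```typescript
// Illustrative analyze_sitemap arguments.
const sitemapArgs = {
  baseUrl: "https://example.com",
  includeSitemapUrls: true, // default; return the discovered URL lists
  maxUrls: 500,             // lower than the 1000 default for a faster pass
};
```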
### `check_urls`
Check HTTP status codes for multiple URLs.
```
Input:
- urls (required): Array of URLs to check
- timeout: Timeout per URL in milliseconds
Output:
- Status code, redirect destination, response time per URL
```
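For example, checking how a few job URLs respond, which is useful when verifying expired-job handling (the URLs below are placeholders):
```typescript
// Illustrative check_urls arguments.
const checkUrlsArgs = {
  urls: [
    "https://example.com/jobs/expired-posting-123",
    "https://example.com/jobs/active-posting-456",
  ],
  timeout: 10000, // 10 seconds per URL
};
```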
## Usage Examples
### Quick Page Audit
```
"Analyze the SEO of https://example.com/jobs/software-engineer"
```
### Full Site Audit
```
"Crawl https://example.com and analyze their job board SEO.
Focus on structured data and landing pages."
```
### Performance Check
```
"Run a Lighthouse audit on https://example.com/jobs for mobile devices"
```
### Job Board Discovery
```
"Analyze the sitemap for https://example.com and find their job posting pages"
```
## Job Board Specific Features
This server is optimized for job boards, providing:
1. **JobPosting Schema Validation** (see the example markup after this list)
- Validates all required fields (title, description, datePosted, etc.)
- Checks recommended fields (validThrough, baseSalary, employmentType)
- Remote job validation (applicantLocationRequirements)
- Expiration date checking
2. **Page Type Classification**
- Job detail pages
- Job listing/search pages
- Category landing pages (e.g., /marketing-jobs/)
- Location landing pages (e.g., /jobs-in-new-york/)
- Company profile pages
3. **Expired Job Handling Analysis**
- Detects 404s, redirects, and soft 404s
- Checks validThrough dates in schema
- Recommends proper handling strategies
4. **Recommendations**
- Google Indexing API implementation
- Job-specific sitemaps
- Landing page architecture
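For reference, a minimal JobPosting markup sketch covering the required and recommended fields listed above; all values are placeholders, and the exact checks the server performs may differ:
```typescript
// Illustrative JobPosting JSON-LD, written as a TypeScript object for readability.
// Embed it on the page as <script type="application/ld+json">...</script>.
const jobPostingJsonLd = {
  "@context": "https://schema.org/",
  "@type": "JobPosting",
  title: "Software Engineer",                        // required
  description: "<p>Full job description in HTML</p>", // required
  datePosted: "2025-01-15",                          // required
  validThrough: "2025-03-15T00:00:00+00:00",         // recommended; drives expiration checks
  employmentType: "FULL_TIME",                       // recommended
  hiringOrganization: {
    "@type": "Organization",
    name: "Example Corp",
    sameAs: "https://example.com",
  },
  jobLocation: {
    "@type": "Place",
    address: {
      "@type": "PostalAddress",
      addressLocality: "New York",
      addressRegion: "NY",
      addressCountry: "US",
    },
  },
  baseSalary: {                                      // recommended
    "@type": "MonetaryAmount",
    currency: "USD",
    value: { "@type": "QuantitativeValue", minValue: 90000, maxValue: 120000, unitText: "YEAR" },
  },
  // Remote roles: add these fields (applicantLocationRequirements is what the validator checks).
  // jobLocationType: "TELECOMMUTE",
  // applicantLocationRequirements: { "@type": "Country", name: "USA" },
};
```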
## Development
```bash
# Run in development mode
npm run dev
# Run tests
npm test
# Lint code
npm run lint
```
## Architecture
```
src/
├── index.ts           # Entry point
├── server.ts          # MCP server implementation
├── tools/
│   ├── index.ts       # Tool registry
│   ├── crawl-page.ts  # Single page analysis
│   ├── crawl-site.ts  # Multi-page crawler
│   ├── lighthouse.ts  # Performance audits
│   └── sitemap.ts     # Sitemap/robots analysis
├── types/
│   └── index.ts       # TypeScript definitions
└── utils/
    ├── browser.ts     # Playwright helpers
    └── http.ts        # HTTP utilities
```
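For orientation, a minimal sketch of how `server.ts` could wire the tools into an MCP server over stdio using the official `@modelcontextprotocol/sdk`; the actual implementation in this repo may structure this differently:
```typescript
// Minimal MCP server sketch (not the project's actual code).
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "seo-audit", version: "0.1.0" },
  { capabilities: { tools: {} } }
);

// Advertise the tools (in this project the registry lives in src/tools/index.ts).
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "analyze_page",
      description: "Analyze a single web page for SEO factors",
      inputSchema: {
        type: "object",
        properties: { url: { type: "string" } },
        required: ["url"],
      },
    },
    // ...crawl_site, run_lighthouse, analyze_sitemap, check_urls
  ],
}));

// Hypothetical dispatcher: the real server would call into src/tools/*.
async function runTool(name: string, args: unknown): Promise<unknown> {
  return { tool: name, received: args };
}

// Route tool calls to the matching implementation and return text content.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  const result = await runTool(name, args);
  return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
});

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
}

main().catch(console.error);
```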
## Troubleshooting
### Playwright Issues
```bash
# Reinstall browsers
npx playwright install chromium --force
# On Linux, you may need system dependencies
npx playwright install-deps
```
### Lighthouse Not Found
```bash
# Install globally
npm install -g lighthouse
# Or use npx (slower)
npx lighthouse --version
```
### Permission Errors
The server writes temporary files to `/tmp`. Make sure the user running the server has write access to that directory.
### Timeout Errors
For slow sites, increase timeouts:
- Page analysis: Use `timeout` parameter
- Crawling: Reduce `maxPages` or increase delays
## License
MIT