# SEO Audit MCP Server - Product Requirements Document
## Vision
A comprehensive, agentic SEO auditing system that enables Claude to perform professional-grade technical SEO audits of any website, with specialized capabilities for job boards. The system should be able to analyze a site, identify issues, prioritize fixes, and generate actionable reports that could be presented to prospects or clients.
## Current State (v0.1.0)
### Implemented Features
| Feature | Status | Description |
|---------|--------|-------------|
| Page Analysis | ✅ Complete | Meta tags, headings, structured data, rendering analysis |
| JobPosting Schema Validation | ✅ Complete | Full validation against Google requirements |
| Site Crawling | ✅ Complete | Multi-page crawl with page type classification |
| Lighthouse Integration | ✅ Complete | Core Web Vitals (CWV), performance, accessibility, and SEO audits |
| Sitemap Analysis | ✅ Complete | robots.txt and XML sitemap parsing |
| URL Status Checking | ✅ Complete | Bulk URL status and redirect checking |
| JS Rendering Detection | ✅ Complete | CSR vs SSR analysis, framework detection |
### Known Limitations
1. **Lighthouse requires CLI installation** - Not fully self-contained
2. **No persistent storage** - Each audit is independent
3. **No comparative analysis** - Can't compare before/after or vs competitors
4. **Basic link checking** - Doesn't verify all internal links during crawl
5. **No authentication support** - Can't crawl behind login walls
6. **Single-threaded crawling** - Slow for large sites
---
## Roadmap
### Phase 1: Core Improvements (v0.2.0)
**Goal:** Make the existing tools more robust and useful
#### 1.1 Enhanced Link Analysis
- [ ] Parallel link checking during crawl (see the sketch after this list)
- [ ] Detect orphan pages (not linked from any other page)
- [ ] Internal link graph visualization data
- [ ] Anchor text analysis
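A minimal sketch of what the parallel link check could look like, assuming a hypothetical `checkLinksInParallel` helper (not part of v0.1.0) and Node 18's built-in `fetch`:
```javascript
// Sketch only: a fixed-size worker pool for link checking (assumed helper, not in v0.1.0).
async function checkLinksInParallel(urls, concurrency = 10) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < urls.length) {
      const url = urls[next++];
      try {
        // HEAD keeps the check cheap; some servers reject HEAD, so a GET fallback may be needed.
        const res = await fetch(url, { method: 'HEAD', redirect: 'manual' });
        results.push({ url, status: res.status, redirect: res.headers.get('location') || null });
      } catch (err) {
        results.push({ url, status: null, error: err.message });
      }
    }
  }
  // Start `concurrency` workers that pull URLs from a shared cursor.
  await Promise.all(Array.from({ length: Math.min(concurrency, urls.length) }, worker));
  return results;
}

module.exports = { checkLinksInParallel };
```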
#### 1.2 Improved Crawling
- [ ] Configurable concurrent requests
- [ ] Resume capability (save/load crawl state; see the sketch after this list)
- [ ] XML sitemap-based crawling mode
- [ ] JavaScript execution toggle (SSR validation)
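A rough illustration of the resume capability: serialize crawl progress to disk between runs. The state shape (`visited`, `queue`) is an assumption for illustration, not the current crawler's internal format.
```javascript
const fs = require('fs');

// Sketch: persist crawl progress so an interrupted crawl can resume (assumed state shape).
function saveCrawlState(path, state) {
  fs.writeFileSync(path, JSON.stringify({
    visited: Array.from(state.visited),  // URLs already analyzed
    queue: state.queue,                  // URLs discovered but not yet crawled
    savedAt: new Date().toISOString(),
  }, null, 2));
}

function loadCrawlState(path) {
  if (!fs.existsSync(path)) return { visited: new Set(), queue: [] };
  const raw = JSON.parse(fs.readFileSync(path, 'utf8'));
  return { visited: new Set(raw.visited), queue: raw.queue };
}

module.exports = { saveCrawlState, loadCrawlState };
```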
#### 1.3 Better Structured Data
- [ ] Full Schema.org validation (not just JobPosting)
- [ ] Organization, BreadcrumbList, WebSite validation
- [ ] FAQ, HowTo, Article schema detection
- [ ] Schema coverage scoring
#### 1.4 Report Generation
- [ ] Markdown report template
- [ ] HTML report with styling
- [ ] Executive summary generation
- [ ] Issue prioritization algorithm (see the sketch after this list)
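One possible shape for the prioritization algorithm: score each issue by severity and page coverage relative to fix effort. The weights and field names below are illustrative assumptions, not a spec.
```javascript
// Sketch: rank issues by estimated impact vs. effort (weights are assumptions, not a spec).
const SEVERITY_WEIGHT = { critical: 10, high: 5, medium: 2, low: 1 };
const EFFORT_WEIGHT = { low: 1, medium: 2, high: 4 };

function prioritizeIssues(issues) {
  return issues
    .map(issue => ({
      ...issue,
      // Higher severity and wider page coverage raise the score; higher effort lowers it.
      score: (SEVERITY_WEIGHT[issue.severity] * issue.affectedPages) / EFFORT_WEIGHT[issue.effort],
    }))
    .sort((a, b) => b.score - a.score);
}

// Example:
// prioritizeIssues([
//   { id: 'missing-jobposting-schema', severity: 'critical', affectedPages: 1200, effort: 'medium' },
//   { id: 'duplicate-title-tags', severity: 'medium', affectedPages: 40, effort: 'low' },
// ]);
```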
---
### Phase 2: Job Board Specialization (v0.3.0)
**Goal:** Deep job board expertise
#### 2.1 Expired Job Detection
- [ ] Sample expired job URLs from sitemap history
- [ ] Verify expired job handling (redirect, 404, soft 404; see the sketch after this list)
- [ ] Check validThrough dates against current date
- [ ] Detect jobs with missing expiration
- [ ] Score expired job handling quality
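A hedged sketch of the validThrough check and expired-URL classification. Function names are assumptions, and real soft-404 detection would also need content heuristics (e.g., "this job is no longer available") beyond the status-code check shown here.
```javascript
// Sketch: classify how a job board handles an expired job URL (assumed helpers).
function isExpired(jobPosting, now = new Date()) {
  // A JobPosting without validThrough is flagged separately as "missing expiration".
  if (!jobPosting.validThrough) return null;
  return new Date(jobPosting.validThrough) < now;
}

async function classifyExpiredJobHandling(url) {
  const res = await fetch(url, { redirect: 'manual' });
  if (res.status === 404 || res.status === 410) return 'hard-404';   // ideal: 404/410
  if (res.status >= 300 && res.status < 400) return 'redirect';      // acceptable if the target is relevant
  if (res.status === 200) return 'possible-soft-404';                // needs a content check to confirm
  return `other-${res.status}`;
}
```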
#### 2.2 Landing Page Analysis
- [ ] Detect missing landing page types (see the pattern sketch after this list):
  - Category pages (e.g., /marketing-jobs/)
  - Location pages (e.g., /jobs-in-new-york/)
  - Combined pages (e.g., /marketing-jobs-new-york/)
  - Remote job pages
  - Company pages
- [ ] Landing page template quality scoring
- [ ] Keyword gap analysis for landing pages
- [ ] Competitor landing page comparison
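As a starting point for landing page type detection, URLs could be classified against a small set of path patterns. The patterns below mirror the example paths above but are assumptions; real job boards will need site-specific rules.
```javascript
// Sketch: classify job-board landing pages by URL pattern (patterns are illustrative assumptions).
// More specific patterns are listed first so they win over broader ones.
const LANDING_PAGE_PATTERNS = [
  { type: 'remote', pattern: /\/remote(-[a-z-]+)?-jobs\/?$/ },            // e.g. /remote-jobs/
  { type: 'category+location', pattern: /\/[a-z-]+-jobs-[a-z-]+\/?$/ },   // e.g. /marketing-jobs-new-york/
  { type: 'category', pattern: /\/[a-z-]+-jobs\/?$/ },                    // e.g. /marketing-jobs/
  { type: 'location', pattern: /\/jobs-in-[a-z-]+\/?$/ },                 // e.g. /jobs-in-new-york/
  { type: 'company', pattern: /\/compan(y|ies)\/[a-z0-9-]+\/?$/ },        // e.g. /companies/acme/
];

function classifyLandingPage(pathname) {
  const match = LANDING_PAGE_PATTERNS.find(p => p.pattern.test(pathname));
  return match ? match.type : 'other';
}

module.exports = { classifyLandingPage };
```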
#### 2.3 Google for Jobs Optimization
- [ ] Google for Jobs eligibility check
- [ ] Indexing API implementation status detection
- [ ] Apply button / directApply validation
- [ ] Salary data completeness scoring (see the sketch after this list)
- [ ] Remote job markup validation
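A sketch of what salary completeness scoring might check, based on the JobPosting `baseSalary` structure (a MonetaryAmount with `currency` and a QuantitativeValue); the equal weighting of missing fields is an assumption.
```javascript
// Sketch: score completeness of JobPosting baseSalary markup (field weighting is an assumption).
function scoreSalaryCompleteness(jobPosting) {
  const salary = jobPosting.baseSalary;
  if (!salary) return { score: 0, missing: ['baseSalary'] };
  const missing = [];
  if (!salary.currency) missing.push('baseSalary.currency');
  const value = salary.value || {};
  // Either a single value or a min/max range should be present.
  if (value.value == null && (value.minValue == null || value.maxValue == null)) {
    missing.push('baseSalary.value (or minValue/maxValue)');
  }
  if (!value.unitText) missing.push('baseSalary.value.unitText');
  return { score: 1 - missing.length / 3, missing };
}
```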
#### 2.4 Job Feed Analysis
- [ ] XML/JSON job feed detection
- [ ] Feed freshness monitoring
- [ ] Feed-to-sitemap consistency check
- [ ] Third-party aggregator detection (Indeed, LinkedIn, etc.)
---
### Phase 3: Competitive Intelligence (v0.4.0)
**Goal:** Benchmark against competitors
#### 3.1 Competitor Crawling
- [ ] Side-by-side competitor comparison
- [ ] Structured data coverage comparison
- [ ] Landing page type comparison
- [ ] Performance benchmarking
#### 3.2 SERP Analysis
- [ ] Google for Jobs visibility check
- [ ] Organic ranking sample checks
- [ ] Featured snippet detection
- [ ] Rich result eligibility
#### 3.3 Backlink Profile (Integration)
- [ ] Ahrefs/Moz API integration
- [ ] Domain authority comparison
- [ ] Referring domains analysis
- [ ] Link gap analysis
---
### Phase 4: Monitoring & Alerts (v0.5.0)
**Goal:** Ongoing monitoring capabilities
#### 4.1 Scheduled Audits
- [ ] Cron-based re-auditing
- [ ] Change detection between audits (see the sketch after this list)
- [ ] Regression alerts
- [ ] Improvement tracking
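Change detection could be as simple as diffing two audit snapshots. The snapshot shape (`scores`, `issues`) below is an assumption for illustration.
```javascript
// Sketch: naive regression detection between two audit snapshots (assumed snapshot shape).
function detectRegressions(previous, current) {
  const regressions = [];
  for (const [metric, prevValue] of Object.entries(previous.scores)) {
    const currValue = current.scores[metric];
    if (typeof currValue === 'number' && currValue < prevValue) {
      regressions.push({ metric, from: prevValue, to: currValue });
    }
  }
  // Issues that were not present in the previous audit also count as regressions.
  const previousIds = new Set(previous.issues.map(i => i.id));
  const newIssues = current.issues.filter(i => !previousIds.has(i.id));
  return { regressions, newIssues };
}
```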
#### 4.2 Real-time Monitoring
- [ ] Core Web Vitals monitoring
- [ ] Uptime/status monitoring
- [ ] Schema validation monitoring
- [ ] Sitemap change detection
#### 4.3 Integration Webhooks
- [ ] Slack notifications
- [ ] Email reports
- [ ] API webhooks for CI/CD integration
---
### Phase 5: Advanced Analysis (v0.6.0)
**Goal:** Deep technical analysis
#### 5.1 Log File Analysis
- [ ] Access log parsing
- [ ] Bot crawl analysis (Googlebot, Bingbot; see the sketch after this list)
- [ ] Crawl budget analysis
- [ ] Orphan page detection from logs
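A minimal sketch of bot crawl analysis over a combined-format access log; the log format and the `googlebotHitsPerUrl` helper name are assumptions. Note that user agents can be spoofed, so production analysis should verify Googlebot via reverse DNS.
```javascript
const fs = require('fs');
const readline = require('readline');

// Sketch: count Googlebot hits per URL from a combined-format access log (format is an assumption).
async function googlebotHitsPerUrl(logPath) {
  const hits = new Map();
  const rl = readline.createInterface({ input: fs.createReadStream(logPath) });
  for await (const line of rl) {
    if (!/Googlebot/i.test(line)) continue;
    // Combined log format: ... "GET /path HTTP/1.1" ...
    const match = line.match(/"(?:GET|HEAD) (\S+) HTTP/);
    if (match) hits.set(match[1], (hits.get(match[1]) || 0) + 1);
  }
  return hits;
}

module.exports = { googlebotHitsPerUrl };
```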
#### 5.2 Content Analysis
- [ ] Thin content detection
- [ ] Duplicate content analysis (cross-page)
- [ ] Content freshness scoring
- [ ] Readability analysis
#### 5.3 JavaScript Deep Dive
- [ ] JavaScript bundle analysis
- [ ] Render-blocking resource detection
- [ ] Third-party script impact analysis
- [ ] Hydration timing analysis
#### 5.4 Mobile Specific
- [ ] Mobile-first indexing validation
- [ ] Touch target analysis
- [ ] Mobile redirect detection
- [ ] AMP validation (if applicable)
---
### Phase 6: AI-Powered Analysis (v1.0.0)
**Goal:** Let Claude do the heavy lifting
#### 6.1 Intelligent Issue Detection
- [ ] Pattern recognition across pages
- [ ] Anomaly detection
- [ ] Root cause analysis
- [ ] Impact estimation
#### 6.2 Automated Recommendations
- [ ] Prioritized fix recommendations
- [ ] Code/configuration suggestions
- [ ] Implementation time estimates
- [ ] Expected impact predictions
#### 6.3 Natural Language Reporting
- [ ] Executive summary generation
- [ ] Technical deep-dive generation
- [ ] Client-friendly explanations
- [ ] Custom report sections
---
## Testing Instructions
### Prerequisites
```bash
# Ensure Node.js 18+ is installed
node --version
# Install dependencies
cd seo-audit-mcp
npm install
# Install Playwright browsers
npx playwright install chromium
# Install Lighthouse (optional but recommended)
npm install -g lighthouse
# Build the project
npm run build
```
### Manual Testing (CLI)
You can test individual tools directly:
```bash
# Test page analysis
node -e "
const { analyzePage } = require('./dist/tools/crawl-page.js');
analyzePage({ url: 'https://example.com' })
.then(r => console.log(JSON.stringify(r, null, 2)))
.catch(console.error);
"
# Test sitemap analysis
node -e "
const { analyzeSiteAccess } = require('./dist/tools/sitemap.js');
analyzeSiteAccess({ baseUrl: 'https://example.com' })
.then(r => console.log(JSON.stringify(r, null, 2)))
.catch(console.error);
"
```
### Testing with Claude Desktop
1. Add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"seo-audit": {
"command": "node",
"args": ["/absolute/path/to/seo-audit-mcp/dist/index.js"]
}
}
}
```
2. Restart Claude Desktop
3. Test with prompts like:
- "Analyze the SEO of https://www.themuse.com"
- "Check the structured data on https://example-job-board.com/jobs/123"
- "Run a Lighthouse audit on https://example.com"
### Testing with Claude CLI
```bash
# Register the MCP server with the Claude CLI (stored in your Claude config)
claude mcp add seo-audit -- node /path/to/seo-audit-mcp/dist/index.js

# Start an interactive session; the seo-audit tools will be available
claude
```
### Integration Test Script
Create `test-integration.js`:
```javascript
const { analyzePage } = require('./dist/tools/crawl-page.js');
const { analyzeSiteAccess } = require('./dist/tools/sitemap.js');
const { closeBrowser } = require('./dist/utils/browser.js');
async function runTests() {
const testUrl = 'https://httpbin.org/html';
console.log('Testing page analysis...');
const pageResult = await analyzePage({ url: testUrl });
console.log('✓ Page analysis:', pageResult.title ? 'Has title' : 'No title');
console.log('\nTesting sitemap analysis...');
const sitemapResult = await analyzeSiteAccess({ baseUrl: 'https://httpbin.org' });
console.log('✓ Sitemap analysis:', sitemapResult.robots.found ? 'robots.txt found' : 'No robots.txt');
await closeBrowser();
console.log('\n✓ All tests passed!');
}
runTests().catch(err => {
console.error('Test failed:', err);
process.exit(1);
});
```
Run with: `node test-integration.js`
---
## Target Metrics
### Performance
- Page analysis: < 10 seconds per page
- Site crawl: < 60 seconds for 50 pages
- Lighthouse: < 60 seconds per URL
- Sitemap analysis: < 30 seconds
### Quality
- JobPosting schema: 100% Google requirement coverage
- CWV metrics: Match PageSpeed Insights values
- Link detection: > 95% accuracy
- Page classification: > 90% accuracy
### Usability
- Zero-config for basic usage
- Clear error messages
- Graceful degradation (Lighthouse optional)
- Comprehensive output for Claude analysis
---
## Success Criteria for v1.0
1. **Complete Job Board Audit**: Can perform a full technical SEO audit of a job board comparable to what a human SEO would deliver
2. **Actionable Output**: Every finding includes clear remediation steps
3. **Prioritized Recommendations**: Issues ranked by impact and effort
4. **Competitive Context**: Can benchmark against competitors
5. **Monitoring Ready**: Can detect regressions over time
6. **Client-Ready Reports**: Can generate professional reports suitable for client presentation
---
## Future Considerations
### Potential Integrations
- Google Search Console API
- Bing Webmaster Tools API
- Google Analytics / GA4
- Ahrefs / Semrush / Moz APIs
- Slack / Discord for alerts
- GitHub for CI/CD integration
### Potential UI Layer
- Web dashboard for visual reports
- Real-time crawl progress
- Historical trend graphs
- Comparison visualizations
### Scaling Considerations
- Distributed crawling for large sites
- Cloud-based Lighthouse execution
- Database for historical data
- Queue system for batch processing