🕷️ librecrawl-technical-seo-audit-mcp

The AI-native technical SEO crawler.

Run a complete on-site SEO audit on any website, straight from Claude, Cursor, Codex, or any Model Context Protocol (MCP) client. Unlimited pages · 50+ checks · PDF + CSVs · MIT-licensed · self-hosted · ephemeral by design.

Built on the open-source LibreCrawl engine, exposed through 37 MCP tools your AI assistant calls directly.

License: MIT · MCP compatible · Python 3.10+ · Built on LibreCrawl

⚡ Install in 60s · 🪄 What it does · 🚀 50+ checks · 🆚 Compare · 📖 Quick start


🤔 Don't know what an MCP is? Read this 30-second explainer

Model Context Protocol (MCP) is the open standard that lets AI assistants like Claude, Cursor, or Codex call external tools. Think of it as "USB for AI assistants": you plug a tool in, and the AI can use it. librecrawl-technical-seo-audit-mcp is one of those tools. Once installed, you just ask your AI assistant to audit a site, and it does. No GUI. No dashboard. No exports.



🪄 The whole pitch in 4 lines

You:    Audit https://acme.com, full site, no caps, give me the zip
Agent:  → librecrawl_start_chunked_audit · polls until done · saves zip locally
You:    Show me broken pages + broken external links + hreflang errors
Agent:  → reads CSVs, prints filtered tables. Server already forgot the audit.

That's the product. Your AI assistant runs a full technical SEO audit for you. You get a branded PDF + 5 CSVs covering 50+ technical checks, ready to hand a client. The server wipes everything the moment you download.


🔥 Why this exists

There are great desktop SEO crawlers (you know the ones). There are great cloud SEO suites. There was no AI-native crawler. librecrawl-technical-seo-audit-mcp fills that gap with six things no comparable open-source MCP server does:

⚡ It runs inside your AI assistant

37 MCP tools your agent calls directly. No GUI app to babysit, no SaaS dashboard to log into, no CSV exports to upload to ChatGPT. You just ask.

🚀 Chunked-progressive crawler that never times out

Most SEO MCP servers (SiteAudit MCP, AgentAEO, SE Ranking MCP) run synchronously and disconnect on sites over a few hundred pages. librecrawl-technical-seo-audit-mcp runs the crawl in a background worker thread, persists progress to SQLite WAL, and returns a session_id in under 2 seconds. Your agent polls a tiny status tool until done. 10,000-page enterprise sites work the same as 50-page blogs. Survives PM2 / MCP-client restarts mid-crawl.
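The pattern described above (start a worker, persist progress, hand back a session_id at once) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's runner.py: the table layout, the function names, and the temp-directory database path are all assumptions, and the "crawl" is a stand-in that only counts URLs.

```python
import sqlite3
import tempfile
import threading
import uuid
from contextlib import closing
from pathlib import Path

# Hypothetical location for this demo; the README's default is ~/librecrawl-state.db.
DB_PATH = Path(tempfile.mkdtemp()) / "state.db"


def connect() -> sqlite3.Connection:
    """Open a fresh connection (one per thread) with WAL journaling on."""
    conn = sqlite3.connect(DB_PATH, timeout=10)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def init_db() -> None:
    with closing(connect()) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "session_id TEXT PRIMARY KEY, status TEXT, pages_done INTEGER)"
        )


def _crawl(session_id: str, urls: list, chunk_size: int) -> None:
    """Worker: handle URLs chunk by chunk, committing progress after each."""
    with closing(connect()) as conn:
        for start in range(0, len(urls), chunk_size):
            done = min(start + chunk_size, len(urls))  # stand-in for real fetching
            with conn:
                conn.execute(
                    "UPDATE sessions SET pages_done = ? WHERE session_id = ?",
                    (done, session_id),
                )
        with conn:
            conn.execute(
                "UPDATE sessions SET status = 'done' WHERE session_id = ?",
                (session_id,),
            )


def start_audit(urls: list, chunk_size: int = 50):
    """Record the session, start the worker, and return straight away."""
    session_id = uuid.uuid4().hex
    with closing(connect()) as conn, conn:
        conn.execute("INSERT INTO sessions VALUES (?, 'crawling', 0)", (session_id,))
    worker = threading.Thread(
        target=_crawl, args=(session_id, urls, chunk_size), daemon=True
    )
    worker.start()
    return session_id, worker


def audit_status(session_id: str) -> dict:
    with closing(connect()) as conn:
        status, pages_done = conn.execute(
            "SELECT status, pages_done FROM sessions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
    return {"status": status, "pages_done": pages_done}
```

Call init_db() once, then start_audit(urls); the caller gets the session id back immediately and polls audit_status(session_id) until the status reads done. Because progress is committed after every chunk, a restarted process could in principle pick up from pages_done, though resume logic is left out of this sketch.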

🛡️ Catches WAF challenges other crawlers silently misreport

Cloudflare, Akamai, DataDome, Imperva, and PerimeterX challenge pages can be served as 200 OK but contain a JavaScript challenge instead of your content. Most crawlers report these as "page OK, all good". librecrawl-technical-seo-audit-mcp fingerprints the challenge in the response body and flags bot_block_challenge_detected. You see what's actually broken.
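Body fingerprinting of this kind can be sketched as a substring scan. The marker strings below are assumptions: they are fragments commonly seen on challenge pages, not the project's actual fingerprint list, and real challenge markup changes over time. Akamai is left out because no dependable marker is assumed here.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative markers only (assumed, not taken from the project).
CHALLENGE_MARKERS = {
    "cloudflare": ("challenge-platform", "cf-chl", "just a moment..."),
    "datadome": ("captcha-delivery.com",),
    "imperva": ("_incapsula_resource",),
    "perimeterx": ("px-captcha",),
}


@dataclass
class ChallengeResult:
    bot_block_challenge_detected: bool
    vendor: Optional[str] = None


def detect_challenge(body: str) -> ChallengeResult:
    """Scan the response body, ignoring the status code entirely."""
    lowered = body.lower()
    for vendor, markers in CHALLENGE_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return ChallengeResult(True, vendor)
    return ChallengeResult(False)
```

The status code is deliberately not an input: the point of the check is that a 200 response is not trusted on its own.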

🤖 An AIMD controller tunes crawl delay live

Additive-Increase / Multiplicative-Decrease, the same scheme TCP congestion control uses, applied here to crawl rate. Error rate > 10% → halve the chunk size and double the delay. p95 latency > 1.5× target → multiply the delay by 1.5. Clean signals → shrink the delay by a fixed step (the additive part). Polite by construction. No rate-limit blow-ups. No manual tuning. Respects the robots.txt Crawl-Delay as a floor.
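A minimal sketch of those rules as a controller class. The thresholds (10% errors, 1.5× target) come from the description above; the starting values, the additive step, the minimum chunk size, and the choice to apply only the first matching rule per update are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AimdController:
    delay_ms: float = 250.0        # assumed starting delay
    chunk_size: int = 50           # assumed starting chunk size
    target_p95_ms: float = 500.0   # assumed latency target
    floor_ms: float = 0.0          # robots.txt Crawl-Delay, in milliseconds
    step_ms: float = 25.0          # assumed additive step
    min_chunk: int = 5             # assumed lower bound on chunk size

    def update(self, error_rate: float, p95_ms: float) -> None:
        """Adjust delay and chunk size after one finished chunk."""
        if error_rate > 0.10:
            # Back off hard: half the chunk, twice the delay.
            self.chunk_size = max(self.min_chunk, self.chunk_size // 2)
            self.delay_ms *= 2
        elif p95_ms > 1.5 * self.target_p95_ms:
            # Server is slowing down: back off more gently.
            self.delay_ms *= 1.5
        else:
            # Clean chunk: speed up by a fixed step.
            self.delay_ms -= self.step_ms
        # Never go below the Crawl-Delay floor.
        self.delay_ms = max(self.floor_ms, self.delay_ms)
```

Backing off multiplicatively and recovering additively means the crawler slows down quickly when a site struggles and speeds up cautiously once it recovers.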

🧹 Ephemeral by design: the agency-safe default

Once you download the zip, the server deletes the session row, every artifact file on disk, AND the upstream LibreCrawl crawl record. Per-audit server footprint after cleanup: 0 bytes, 0 rows. Auditing 50 client sites? Zero data persists where another operator could see it.

📄 Branded PDF reports ready to hand a client

WeasyPrint, A4, page numbers, footer on every page. Open in any PDF viewer. No SaaS watermark. Hand it to a client as your work.


⚡ Install in 60 seconds

curl -fsSL https://raw.githubusercontent.com/adityaarsharma/librecrawl-technical-seo-audit-mcp/main/install.sh | bash

The installer asks 3 questions (target client, optional Google PageSpeed API key, optional GSC integration) and writes a ready-to-use MCP entry into your Claude / Cursor / Codex / Windsurf config. Done.

Not a developer? You don't need to be. If you can:

  1. Open a terminal (macOS: Cmd+Space → "Terminal" · Windows: Win+R → "powershell")

  2. Paste the curl command above

  3. Answer 3 questions

…you're done. The installer handles Python, Docker, the LibreCrawl backend, and your AI client config. First-audit-to-zip is under 10 minutes from cold start.

To install manually instead:

git clone https://github.com/adityaarsharma/librecrawl-technical-seo-audit-mcp.git
cd librecrawl-technical-seo-audit-mcp
python3 -m venv venv && source venv/bin/activate
pip install httpx mcp weasyprint markdown fpdf2
# Start LibreCrawl backend on :5080 (see install.sh for Docker compose)
python server.py

Add to your client config (Claude Desktop example):

{
  "mcpServers": {
    "librecrawl": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://127.0.0.1:5081/mcp"]
    }
  }
}

🚀 50+ checks every audit

🔒 Security & headers

missing_hsts · missing_csp · missing_x_frame_options · missing_x_content_type_options · missing_referrer_policy · x_robots_tag_vs_meta_mismatch · mixed_content
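As a rough illustration of the five header-presence checks, the sketch below maps each check name to the standard HTTP response header it presumably looks for. The mapping and the function name are assumptions; only presence is tested, not whether the header value is sensible.

```python
# Check name -> lower-cased standard response header (mapping assumed).
REQUIRED_HEADERS = {
    "missing_hsts": "strict-transport-security",
    "missing_csp": "content-security-policy",
    "missing_x_frame_options": "x-frame-options",
    "missing_x_content_type_options": "x-content-type-options",
    "missing_referrer_policy": "referrer-policy",
}


def missing_security_headers(headers: dict) -> list:
    """Return the check names whose header is absent (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [check for check, header in REQUIRED_HEADERS.items() if header not in present]
```

For example, a response carrying only Strict-Transport-Security and X-Frame-Options would be flagged for the other three checks.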

🛡️ WAF / bot-block detection

bot_block_challenge_detected: fingerprints Cloudflare · Akamai · DataDome · Imperva · PerimeterX

🗺️ Sitemap & robots

sitemap_url_noindex · sitemap_url_3xx · sitemap_url_disallowed_in_robots · sitemap_contains_canonicalized · sitemap_over_50k_urls · sitemap_over_50mb

🌍 Hreflang full audit

missing_return_tag · missing_self_reference · missing_x_default · invalid_codes · to_noindex · to_broken · conflicts_lang_attr
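Three of these checks are graph properties and can be sketched over a plain dictionary. The input shape (page URL mapped to its hreflang code → target URL pairs) and the tuple format of the findings are assumptions for illustration, not the tool's data model.

```python
def hreflang_issues(alternates: dict) -> list:
    """alternates: {page_url: {hreflang_code: target_url}} for crawled pages.

    Returns (page, check, detail) tuples.
    """
    issues = []
    for page, links in alternates.items():
        targets = set(links.values())
        if page not in targets:
            issues.append((page, "missing_self_reference", ""))
        if "x-default" not in links:
            issues.append((page, "missing_x_default", ""))
        for target in sorted(targets - {page}):
            # A return tag means the target lists this page among its alternates.
            # Targets that were never crawled are flagged too in this sketch.
            if page not in alternates.get(target, {}).values():
                issues.append((page, "missing_return_tag", target))
    return issues
```

Here an English page that points at a German page fails missing_return_tag unless the German page points back.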

🔗 Canonical health

canonical_chain_depth · canonical_to_relative · canonical_to_redirect · canonical_outside_head · bad_canonical

🔁 Redirects (every flavour)

redirect_chains · meta_refresh_redirect · js_redirect · http_refresh_redirect

🏷️ Schema.org (16 types)

Article · Product · Recipe · FAQPage · BreadcrumbList · Event · JobPosting · VideoObject · HowTo · Organization · LocalBusiness · Person · Review · AggregateRating · Course · NewsArticle. Validates against the schema.org spec AND Google Rich Results required fields. Handles @graph (Yoast / Rank Math / WPRM).
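Handling @graph mostly means flattening it before validation, since SEO plugins often emit one JSON-LD script whose nodes all sit inside a top-level @graph array. A minimal sketch, assuming the JSON-LD script body has already been extracted from the page; the function names are hypothetical, and nested objects outside @graph are not walked.

```python
import json


def iter_schema_nodes(jsonld_text: str):
    """Yield every typed node from a JSON-LD body, flattening @graph arrays."""
    data = json.loads(jsonld_text)
    stack = list(data) if isinstance(data, list) else [data]
    while stack:
        node = stack.pop()
        if not isinstance(node, dict):
            continue
        graph = node.get("@graph")
        if isinstance(graph, list):
            stack.extend(graph)
        if "@type" in node:
            yield node


def schema_types(jsonld_text: str) -> set:
    """Collect all @type values; @type may be a string or a list of strings."""
    found = set()
    for node in iter_schema_nodes(jsonld_text):
        node_type = node["@type"]
        found.update(node_type if isinstance(node_type, list) else [node_type])
    return found
```

Each yielded node can then be checked against the required fields for its type.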

🔤 URL quality

url_contains_space · url_multiple_slashes · url_non_ascii · url_underscores · url_repetitive_path · long_urls · uppercase_urls · url_params_heavy
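Several of these are simple string tests. The sketch below covers five of them under assumed definitions (for instance, that an encoded %20 counts as a space and that only the path is tested for case and underscores); the remaining checks need thresholds the README does not state, so they are left out.

```python
from urllib.parse import urlsplit


def url_quality_flags(url: str) -> list:
    """Return the names of the string-level URL checks this URL fails."""
    path = urlsplit(url).path
    flags = []
    if " " in url or "%20" in url:
        flags.append("url_contains_space")
    if "//" in path:
        flags.append("url_multiple_slashes")
    if not url.isascii():
        flags.append("url_non_ascii")
    if "_" in path:
        flags.append("url_underscores")
    if path != path.lower():
        flags.append("uppercase_urls")
    return flags
```

A URL such as https://example.com/My%20Page//a_b trips four of the five flags at once.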

⚓ Anchor text

non_descriptive_anchor_text · empty_anchor_text · anchor_image_no_alt · broken_bookmarks

🕸️ Internal linking

internal_nofollow_outlinks · nofollow_only_inbound · follow_and_nofollow_mixed · orphan_pages

🖼️ Image performance + CLS

lazy_load_attr_missing · srcset_missing · image_dimensions_missing · next_gen_image_format · image_oversized_kb · missing_alt_pages · broken_img_pages

📐 HTML structure

html_over_2mb · noscript_in_head · broken_or_invalid_html · dom_size_excessive · lorem_ipsum_detected

♿ Accessibility / metadata

iframes_present · iframe_missing_title · missing_favicon · missing_html_lang · invalid_html_lang · missing_charset · missing_viewport

🪤 Crawl-budget killers

spider_trap_calendar · url_session_id_high_entropy · faceted_url_explosion

✍️ Content quality

low_readability (Flesch) · long_sentences · passive_voice_pct · missing_terminal_punctuation · boilerplate_ratio · ai_tell_tokens_found (delve · unlock · seamlessly · leverage) · has_lorem_ipsum

🚨 Dev leaks

outlinks_to_localhost (links to localhost or RFC 1918 private addresses in production)

🔗 Every outbound URL is HEAD/GET-validated into 17 status classes: ok · redirect · forbidden · not_found · timeout · dns_error · ssl_error · connection_refused · etc. Per-target: final URL after redirects, source pages, anchor text, response time, server header.
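A sketch of how responses and failures might be bucketed into the classes named above. Only the eight class names listed in this README are used as given; "server_error" and "other" are hypothetical catch-alls, the exact status-code boundaries are assumptions, and the exception types are standard-library ones rather than whatever HTTP client the project uses.

```python
import socket
import ssl


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a link status class."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"
    if status_code == 403:
        return "forbidden"
    if status_code == 404:
        return "not_found"
    if status_code >= 500:
        return "server_error"  # hypothetical class name
    return "other"             # hypothetical catch-all


def classify_error(exc: BaseException) -> str:
    """Map a network-level exception to a link status class."""
    if isinstance(exc, ssl.SSLError):
        return "ssl_error"
    if isinstance(exc, socket.gaierror):
        return "dns_error"
    if isinstance(exc, ConnectionRefusedError):
        return "connection_refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    return "other"             # hypothetical catch-all
```

Keeping network failures separate from HTTP statuses is what lets a report distinguish "the page is gone" from "the host does not resolve".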

📈 GSC merge: pull Google Search Console data, then call librecrawl_merge_gsc_data(crawl_id, gsc_data). URLs are normalised before joining. Emits 4 extra CSVs: per-page-with-gsc · gsc-winners · gsc-losers (high impr + CTR <2%) · gsc-quick-wins (position 11–20 + impr ≥100).
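The join and the two threshold-based buckets can be sketched as below. The CTR, position, and quick-win impression thresholds come from the line above; the "high impressions" cut-off of 1000 and the normalisation rules (lower-case host, no trailing slash, query and fragment dropped) are assumptions, since the README does not spell them out.

```python
from urllib.parse import urlsplit

HIGH_IMPRESSIONS = 1000  # assumed cut-off for "high impr"


def normalise_url(url: str) -> str:
    """Reduce a URL to scheme://host/path so crawl and GSC rows can be joined."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def classify_gsc_row(impressions: int, clicks: int, position: float) -> list:
    """Return the GSC buckets a page belongs to."""
    labels = []
    ctr = clicks / impressions if impressions else 0.0
    if impressions >= HIGH_IMPRESSIONS and ctr < 0.02:
        labels.append("gsc-losers")
    if 11 <= position <= 20 and impressions >= 100:
        labels.append("gsc-quick-wins")
    return labels
```

A page can land in both buckets: plenty of impressions on page two of the results with few clicks is both a loser and a quick win.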


🆚 Feature comparison to other on-site SEO crawlers

This is a factual feature comparison. Prices were checked at publication and may have changed; see each vendor's site for current pricing. Brand names belong to their respective owners.

| Capability | Desktop crawler (Screaming Frog SEO Spider™) | Desktop+cloud crawler (Sitebulb™) | Cloud site-audit (Ahrefs™) | librecrawl-technical-seo-audit-mcp |
|---|---|---|---|---|
| Pricing model | Free tier (500 URLs) · paid annual licence | Paid monthly subscription | Bundled with main subscription | Free, MIT-licensed, self-hosted |
| Page cap | 500 free / unlimited paid | Unlimited | Tiered by subscription plan | ♾️ Unlimited |
| Runs inside your AI assistant | ❌ | ❌ | ❌ | ✅ |
| Chunked / background crawl (no timeout) | ❌ | ❌ | Cloud only | ✅ |
| Auto-adaptive crawl delay (AIMD) | ❌ | Manual | Hidden | ✅ |
| WAF / bot-block detection on 200-OK pages | ❌ | ❌ | ❌ | ✅ |
| Sitemap-orphan fill (URLs not internally linked) | ❌ | ❌ | ❌ | ✅ |
| Ephemeral by default (zero server footprint) | N/A | N/A | N/A | ✅ |
| Broken links (4xx/5xx/timeout/DNS/SSL) | ✅ | ✅ | ✅ | ✅ |
| Redirect chains with destination | ✅ | ✅ | ✅ | ✅ |
| Title / meta / H1 + duplicates | ✅ | ✅ | ✅ | ✅ |
| Canonical full audit | ✅ | ✅ | ✅ | ✅ |
| Hreflang full audit (incl. return-tag graph) | ✅ | ✅ | Partial | ✅ |
| Sitemap full cross-checks | ✅ | ✅ | Partial | ✅ |
| Schema.org validation (16 types + Rich Results) | ✅ | ✅ | Partial | ✅ |
| Soft-404 fingerprinting | ✅ | ✅ | ✅ | ✅ |
| Mixed content (HTTPS → HTTP) | ✅ | ✅ | ✅ | ✅ |
| Security headers pack | ✅ | ✅ | Partial | ✅ |
| Image performance + CLS | ✅ | ✅ | ✅ | ✅ |
| Content quality (Flesch · AI-tells · boilerplate) | ❌ | Partial | ❌ | ✅ |
| Crawl-budget traps (calendar · session-id · facets) | ✅ | ✅ | ✅ | ✅ |
| Branded PDF report | ❌ | ✅ | ❌ | ✅ |
| GSC clicks/impressions merge | Paid add-on | Paid add-on | Native | ✅ |
| JavaScript rendering | ✅ | ✅ | Cloud only | 🛣️ Roadmap |

Reading guide: if you currently use a paid on-site crawler and your workflow is "crawl → export CSVs → analyse", librecrawl-technical-seo-audit-mcp covers that flow inside your AI assistant for £0 with no page caps. If your workflow depends on JavaScript-rendered SPAs, that's on the roadmap but not shipped yet, so use the desktop tool for now.


📊 What every audit produces

Single zip, 8 files:

| File | Use |
|---|---|
| SUMMARY.txt | One-page orientation |
| <domain>-<ts>.pdf | Branded human-readable PDF (open in any viewer) |
| <domain>-<ts>.md | Markdown source of the PDF (grep-friendly) |
| per-page.csv | 1 row per URL × 30 columns of check booleans + failed_checks_list |
| sitemap-recon.csv | Sitemap-vs-crawl diff |
| external-links.csv | Every outbound URL + status |
| content-audit.csv | Per-page readability + AI-tells |
| extended-checks.csv | 1 row per (URL × check × severity × detail), covering all 50+ checks |


📖 Your first audit

You:   Audit https://example.com, full site, no caps

Agent: → librecrawl_start_chunked_audit(url=..., total_max_pages=10000)
         returns session_id in <2s

       → polls librecrawl_audit_status every 25s
         status: crawling, pages_done: 47,  current_delay_ms: 250
         status: crawling, pages_done: 312, last chunk p95: 480ms, err_rate: 0%
         status: done,     pages_done: 534, artifacts_ready: true

       → librecrawl_audit_zip(session_id, auto_cleanup=True)
         returns base64 zip (8 files, 320 KB)
         SAVES LOCALLY as example.com-1780572742.zip
         Server wiped: session_rows=4, files=8, upstream_crawl=1

You:   Show me broken pages + broken external links

Agent: → unzips, reads per-page.csv (filters status_4xx OR status_5xx)
       → reads external-links.csv (filters not_found · forbidden · 5xx · timeout)
       → prints both tables

Local zip is the only copy. Server is back to zero state.
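The agent's "unzip and filter" step might look roughly like this. The per-page.csv column names status_4xx and status_5xx are taken from the transcript above; that they hold the literal strings True / False, and the function name, are assumptions made for this sketch.

```python
import base64
import csv
import io
import zipfile


def broken_pages(zip_b64: str) -> list:
    """Decode the base64 zip and return per-page.csv rows flagged 4xx or 5xx."""
    archive = zipfile.ZipFile(io.BytesIO(base64.b64decode(zip_b64)))
    with archive.open("per-page.csv") as raw:
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        return [
            row
            for row in reader
            if row.get("status_4xx") == "True" or row.get("status_5xx") == "True"
        ]
```

The same shape works for external-links.csv with a filter on its status column. Everything happens on the local copy, which matters because the server no longer has one.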


🛣️ Roadmap

| Feature | Status |
|---|---|
| JavaScript rendering (Playwright headless, DOM diff vs raw HTML): catches SPA / React / Next.js apps | 🟡 Designed |
| Core Web Vitals from CrUX: real-user 28-day field data, not just lab PSI | 🟡 Designed |
| axe-core accessibility audit: contrast, ARIA, focus order, alt-text quality | 🟡 Planned |
| White-label PDF theming (--brand-config for agencies) | 🟡 Planned |
| Diff mode: audit A vs audit B, "what regressed since last week?" | 🟡 Planned |
| Webhook on completion (Slack / Discord): ping when long crawls finish | 🟡 Planned |

Not planned: keyword research, backlink analysis, SERP tracking. Those are different problems with different MCP servers (DataForSEO, etc.). This tool is laser-focused on technical on-site SEO crawling.

Open an issue to bump priorities or request a check.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│  MCP client (Claude Code / Desktop / Cursor / Codex …)      │
└────────────────────────────┬────────────────────────────────┘
                             │  streamable HTTP or stdio
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  librecrawl-technical-seo-audit-mcp wrapper                 │
│  (server.py, FastMCP, 37 tools)                             │
│  ┌─────────────────┐    ┌──────────────────────────────┐    │
│  │ runner.py       │    │ external_links / schema /    │    │
│  │ background      │    │ content_audit / extended_    │    │
│  │ worker thread   │    │ checks / sitemap_fill /      │    │
│  │ AIMD controller │    │ pdf_report                   │    │
│  └────────┬────────┘    └──────────────────────────────┘    │
│           │                                                 │
│  ┌────────▼────────┐    ┌──────────────────────────────┐    │
│  │ state.py        │    │ libreclient.py: typed        │    │
│  │ SQLite WAL      │    │ wrapper to upstream API      │    │
│  │ session state   │    └──────────────┬───────────────┘    │
│  └─────────────────┘                   │                    │
└────────────────────────────────────────┼────────────────────┘
                                         │
                                         ▼
                          ┌──────────────────────────────┐
                          │  LibreCrawl Flask backend    │
                          │  :5080, single-tenant        │
                          │  crawls + extracts SEO data  │
                          └──────────────────────────────┘

⚙️ Configuration

| Env var | Default | Purpose |
|---|---|---|
| LIBRECRAWL_PORT | 5080 | LibreCrawl backend port |
| MCP_PORT | 5081 | MCP wrapper port |
| MCP_TRANSPORT | http | http (streamable) or stdio |
| REPORTS_DIR | ~/librecrawl-reports | Where audit artifacts land |
| PAGESPEED_API_KEY | unset | Optional; enables librecrawl_pagespeed* |
| LIBRECRAWL_STATE_DB | ~/librecrawl-state.db | SQLite WAL state store |
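For readers scripting around the server, the defaults above could be loaded like this. The variable names and defaults are the ones in the configuration table; the Settings class, the load_settings function, and the validation of MCP_TRANSPORT are a hypothetical sketch, not the server's own loader.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    librecrawl_port: int
    mcp_port: int
    mcp_transport: str
    reports_dir: Path
    pagespeed_api_key: Optional[str]
    state_db: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented env vars, falling back to the documented defaults."""
    transport = env.get("MCP_TRANSPORT", "http")
    if transport not in ("http", "stdio"):
        raise ValueError(f"MCP_TRANSPORT must be 'http' or 'stdio', got {transport!r}")
    return Settings(
        librecrawl_port=int(env.get("LIBRECRAWL_PORT", "5080")),
        mcp_port=int(env.get("MCP_PORT", "5081")),
        mcp_transport=transport,
        reports_dir=Path(env.get("REPORTS_DIR", "~/librecrawl-reports")).expanduser(),
        pagespeed_api_key=env.get("PAGESPEED_API_KEY") or None,
        state_db=Path(env.get("LIBRECRAWL_STATE_DB", "~/librecrawl-state.db")).expanduser(),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise without touching the real environment.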


🛠️ 37 MCP tools

Chunked audit (95% of work):

  • librecrawl_start_chunked_audit · librecrawl_audit_status · librecrawl_audit_zip

  • librecrawl_audit_pause · librecrawl_audit_resume · librecrawl_audit_cancel · librecrawl_audit_force_advance

  • librecrawl_audit_artifacts · librecrawl_audit_pdf · librecrawl_report_content

Specialist:

  • librecrawl_external_links_audit: re-run external-link validation on a specific crawl

  • librecrawl_schema_validate · librecrawl_schema_check · librecrawl_schema_audit

  • librecrawl_merge_gsc_data · librecrawl_append_gsc_section: Google Search Console data merge

  • librecrawl_pagespeed · librecrawl_pagespeed_audit · librecrawl_pagespeed_audit_all_crawl_pages: PageSpeed Insights

  • librecrawl_site_check: instant site-level check

  • librecrawl_internal_links_analysis · librecrawl_filter_issues · librecrawl_visualization_data

Maintenance:

  • librecrawl_wipe_everything: nuclear reset to zero

  • librecrawl_brain_purge_audit: purge a single audit

Legacy (kept for backwards compat, avoid for big sites):

  • librecrawl_audit · librecrawl_full_audit_strict · librecrawl_generate_report · librecrawl_export_results · librecrawl_get_status · librecrawl_get_settings · librecrawl_list_crawls · librecrawl_start_crawl · librecrawl_stop_crawl · librecrawl_pause_crawl · librecrawl_resume_crawl · librecrawl_resume_from_crawl_id


📜 License & trademarks

Code: MIT. Use it on client work, agency work, internal tools, anything. No attribution required (but appreciated). See LICENSE.

Trademarks. All third-party product names mentioned in this README (including any names referenced in the comparison table) are property of their respective owners. This project is not affiliated with, endorsed by, or sponsored by any third-party tool vendor. Comparisons are based on publicly available information at the time of writing and exist for the purpose of informing readers evaluating different categories of SEO tooling.


🙏 Credits

  • LibreCrawl: the upstream open-source crawler this MCP server wraps. MIT. Please go star them; this project would not exist without that work.

  • Anthropic Model Context Protocol: the protocol this server speaks

  • WeasyPrint: Markdown → HTML → PDF rendering

  • FastMCP: the Python MCP server framework


Built by Aditya Sharma · MIT · No telemetry · No SaaS · No vendor lock-in


Discoverability keywords: seo audit mcp server · open-source seo crawler · self-hosted seo crawler · technical seo audit mcp · on-site seo audit tool · alternative to paid seo crawlers · free seo audit tool · seo crawler for claude · seo crawler for cursor · seo crawler for openai codex · seo crawler for windsurf · seo crawler for continue.dev · mcp server for seo · model context protocol seo · hreflang audit tool free · canonical chain checker · broken link checker unlimited · core web vitals audit cli · structured data validator command line · schema.org rich results validator · sitemap audit tool · sitemap orphan detection · WAF detection crawler · cloudflare challenge detector · security headers checker · CSP HSTS audit · google search console integration crawler · soft 404 detection · chunked crawler no timeout MCP · technical SEO audit api · python seo crawler · seo agency tool open source · ephemeral seo audit · agency-safe seo crawler · branded pdf seo report · seo audit cli tool · mit-licensed seo crawler · free site audit tool · enterprise seo crawler self-hosted · librecrawl mcp · librecrawl mcp server
