Which integrations are available for this server?

Integrates with Hermes agent to perform automated reconnaissance, exploitation chaining, and report generation using a local knowledge base.

How do I use Super RAG MCP Server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Super RAG MCP Server How do I exploit Apache 2.4.49?"

The server will respond to your query, and you can continue using it as needed.

Super RAG MCP Server

by test-center-ai

Python

Local

🛡️ Super RAG

An offline, agentic AI for penetration testing — your private cybersecurity brain.

Ask pentest questions, run reconnaissance, and exploit-chain over 61,000+ chunks of curated security knowledge — entirely on your own machine. No cloud. No telemetry. No data leaves your box.

💡 Why Super RAG?

Enterprise security copilots (Microsoft Security Copilot, CrowdStrike Charlotte AI) live in the cloud, cost a fortune, and are built for defenders. Super RAG flips that:

🔒 Fully offline — runs against a local LLM in LM Studio. Perfect for air-gapped labs, sensitive engagements, and regions with restricted cloud access.
⚔️ Built for offensive reasoning — not just "summarize this alert," but "I see Apache 2.4.49 — what's my next move?" and it chains recon → vuln-ID → exploitation.
📚 Grounded in real knowledge — indexes 1,110 hand-curated notes plus HackTricks, PayloadsAllTheThings, the OWASP cheat sheets, and hundreds of CTF write-ups. Every answer is cited back to its source file.
🧩 Plugs into your agents — exposed as an MCP server, so Hermes, OpenClaw, Claude Desktop, or any MCP client gains a cybersec_search tool instantly.

⚠️ For authorized use only. This is a tool for pentesters, CTF players, and security researchers operating with explicit written permission. See Responsible Use.

✨ Features


| Feature | What it does |
| --- | --- |
| 🔎 13-strategy hybrid retrieval | Dense vectors + full-text BM25, fused with RRF, then multi-hop, corrective, context-aware and re-ranking passes, tuned per pentest phase. |
| 🤖 Agentic pentest loop | `reason → act → observe → reflect`, with stuck-loop detection and a hybrid mode (auto-recon, manual approval before exploitation). |
| 🍯 Evasion awareness | Built-in honeypot, WAF (8 vendors), and firewall detection, so the agent doesn't waste moves on a tarpit. |
| 🛡️ Hallucination guard | A command registry validates every tool invocation against known-good flags before anything runs. |
| 📝 Report generation | One command turns findings into a professional pentest report or a HackerOne-style bug-bounty submission, with CVSS and evidence. |
| 🔌 MCP integration | One shared server, many agents: `cybersec_search`, `cybersec_answer`, `cybersec_status`. |
| 🎓 Fine-tune ready | Extracts real HTB/VulnHub write-up reasoning into JSONL (never fabricated scenarios). |
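
The hallucination guard is described as a registry check on every tool invocation. A minimal sketch of that idea follows. The `REGISTRY` dict and `validate_command` helper are hypothetical names, not taken from the project, and the flag sets are illustrative rather than complete.

```python
import shlex

# Hypothetical registry: tool name -> flags the agent is allowed to emit.
# The project's own registry layout is not shown in the README; this is an assumption.
REGISTRY = {
    "nmap": {"-sS", "-sV", "-A", "-p", "-Pn"},
    "gobuster": {"-u", "-w", "-t"},
}

def validate_command(command: str) -> tuple[bool, str]:
    """Return (ok, reason) for a proposed shell command string."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    tool, args = tokens[0], tokens[1:]
    allowed = REGISTRY.get(tool)
    if allowed is None:
        return False, f"unknown tool: {tool}"
    # Only flag-like tokens are checked; positional arguments pass through.
    for arg in args:
        if arg.startswith("-") and arg not in allowed:
            return False, f"unknown flag for {tool}: {arg}"
    return True, "ok"
```

Rejecting anything not explicitly listed keeps an invented flag from ever reaching a shell, at the cost of having to maintain the allow-list by hand.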

🏗️ Architecture

```mermaid
flowchart LR
    subgraph Knowledge["📚 Knowledge (61,552 chunks)"]
        V["1,110 curated notes"]
        D["HackTricks · PayloadsAllTheThings · OWASP"]
        W["CTF / HTB write-ups"]
    end
    subgraph Engine["🧠 Super RAG"]
        I["ingest.py<br/>chunk + embed (parallel)"]
        Q[("Qdrant<br/>hybrid index")]
        R["rag_engine.py<br/>13 RAG strategies"]
    end
    subgraph Local["💻 LM Studio (localhost:1234)"]
        E["nomic-embed-text"]
        L["gpt-oss-20b"]
    end
    A["agent.py<br/>pentest loop"]
    M["rag_mcp.py<br/>MCP server :8765"]

    V & D & W --> I --> Q
    I -.embeddings.-> E
    Q --> R --> L
    R --> A
    R --> M
    M --> Hermes & OpenClaw & Claude["Claude Desktop"]
```

Three tiers, depth over scale: embedded Qdrant (no Docker) for hybrid search → 13 combined RAG strategies for context assembly → a local LLM for generation.

🔬 The 13 RAG strategies (combined on every query)

Most projects use #1 and wonder why retrieval is mediocre. Super RAG layers 13, each earning its place in a pentest workflow:

| # | Strategy | What it buys you |
| --- | --- | --- |
| 4 | Hybrid (vector + BM25, RRF-fused) | Semantic recall and exact-string recall for `CVE-2024-1086`, `--no-preauth`, `SeDebugPrivilege` |
| 17 | Multi-Hop | Port 389 → LDAP enum → user list → AS-REP roast → hash → crack, each hop informed by the last |
| 9 | Agentic | The agent decides when and what to retrieve mid-engagement |
| 6 | Memory-Augmented | Remembers what was tried hours ago, so it never re-runs a dead path |
| 3 | Corrective | Detects weak retrieval and re-queries with reformulated terms |
| 8 | Context-Aware | Filters to the target environment (Windows/AD vs web vs cloud) |
| 18 | Reasoning re-rank | Keyword-overlap boost so the most useful chunk floats up, not just the most similar |
| 13 | Adaptive | Broad scope during recon, narrow and precise during exploitation |
| 21 | Hierarchical | General → specific drill-down, mirroring the ATT&CK structure |
| 5 | Speculative | Pre-fetches likely follow-ups in the background to cut latency |
| 11 | Self-RAG | Builds on its own prior answers across a session |
| 24 | Few-Shot | Pulls a real write-up where someone exploited the same service/version |
| 14 | Citation-Aware | Every fact carries its source path, essential for bug-bounty reports |
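
The hybrid strategy fuses the vector and BM25 result lists with Reciprocal Rank Fusion (RRF). As a rough sketch of how RRF works, the function below scores each document by the sum of `1 / (k + rank)` across the lists it appears in. The constant `k = 60` is the commonly used default, not a value confirmed for this project, and the chunk IDs are made up for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; each list is ordered best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

dense = ["chunk_a", "chunk_b", "chunk_c"]  # hypothetical vector-search results
bm25 = ["chunk_c", "chunk_a", "chunk_d"]   # hypothetical BM25 results
fused = rrf_fuse([dense, bm25])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between the dense and BM25 retrievers, which is why it is a common choice for hybrid search.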

🏆 Model benchmark (16 local models, real pentest scenario)

We benchmarked 16 local models in LM Studio on a 2-turn pentest tool-use scenario (recon → exploit chain), measuring speed, accuracy, and valid tool-loops, and ejecting each model between runs for clean numbers.

| Rank | Model | tok/s | Tool-loops | Verdict |
| --- | --- | --- | --- | --- |
| 🥇 | openai/gpt-oss-20b | 165 | 2/2 ✅ | The only model that was fast, accurate, and emitted clean tool calls. 12 GB. |
| 🥈 | qwen3.6-35b reasoning-distilled | 139 | 2/2 ✅ | Correct chains, but needs fence-stripping |
| - | gemma-4-26b / glm-4.7-flash | 156 / 125 | 0/2 ❌ | Fast and accurate, but they think without emitting usable tool calls |

Key finding: raw "accuracy" is misleading for agents. Several high-scoring models produced empty output because they reasoned internally without ever emitting an actionable tool call. The metric that matters is valid tool-loops, and gpt-oss-20b leads on it (2/2 at the highest tok/s). Reproduce with `python model_benchmark.py`.
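
To make the "valid tool-loop" idea concrete, here is a small sketch of how one might decide whether a model's output contains a usable tool call, including the fence-stripping mentioned in the table. The JSON shape (`{"tool": ..., "args": ...}`) and the helper names are assumptions for illustration; the benchmark harness may use a different format.

```python
import json
import re

# Matches output that is entirely one fenced block, with an optional language tag.
FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)

def strip_fences(text: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    text = text.strip()
    match = FENCE_RE.match(text)
    return match.group(1).strip() if match else text

def parse_tool_call(output: str):
    """Return the tool-call dict if the output is one, else None."""
    try:
        call = json.loads(strip_fences(output))
    except json.JSONDecodeError:
        return None
    # Assumed shape: a JSON object with a string "tool" field.
    if isinstance(call, dict) and isinstance(call.get("tool"), str):
        return call
    return None
```

Under this check, a model that reasons at length but returns empty or free-text output scores zero valid tool calls, regardless of how accurate its reasoning was.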

🚀 Quick start

Prerequisites

LM Studio on localhost:1234 (local server enabled) with nomic-embed-text-v1.5 (embeddings) + any chat model loaded — bring your own local model
Python 3.10+ (tested on 3.14.5)
A GPU is recommended (built on an RTX 5090; the embedder + a 12 GB chat model fit in 24 GB VRAM — but smaller models work too)
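
Before ingesting, it can help to confirm that LM Studio is serving the models listed above. The sketch below queries the OpenAI-compatible `/v1/models` endpoint and reports which required models are missing. The helper names are hypothetical, and the substring match is an assumption, since LM Studio may prefix model IDs differently on your machine.

```python
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1/models"

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model list response."""
    return [m["id"] for m in payload.get("data", [])]

def missing_models(payload: dict, required: list[str]) -> list[str]:
    """Return the required names that match no loaded model ID (substring match)."""
    loaded = model_ids(payload)
    return [name for name in required if not any(name in mid for mid in loaded)]

def fetch_models(url: str = LM_STUDIO_URL, timeout: float = 5.0) -> dict:
    """Fetch the model list from a running LM Studio server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# Usage, with LM Studio running:
#   print(missing_models(fetch_models(), ["nomic-embed-text-v1.5"]))
```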

📦 Batteries included: 1,110 curated notes ship in ./vault, so it works the moment you clone. setup.py then pulls in the public doc corpora (HackTricks, PayloadsAllTheThings, OWASP, CTF write-ups) for the full ~60k-chunk brain.

```bash
# 1. Clone
git clone https://github.com/test-center-ai/super-rag.git
cd super-rag

# 2. Install (no torch, no Docker, no HuggingFace needed)
pip install -r requirements.txt

# 3. Bootstrap: clone the public doc repos + check LM Studio
python setup.py              # or: python setup.py --minimal  (bundled notes only)

# 4. Build the index (resumable; minutes with parallel embedding)
python main.py ingest

# 5. Ask anything
python main.py query "how do I exploit Apache 2.4.49 path traversal"
python main.py query "AS-REP roasting: what tool and command?"

# 6. Run the agentic pentest loop (hybrid: auto-recon, manual exploit approval)
python main.py pentest 10.10.10.5 --scope 10.10.10.0/24

# 7. Generate a report, or check health
python main.py report
python main.py status
```

Use your own notes instead? Point it anywhere: SUPERRAG_VAULT=/path/to/your/notes python main.py ingest. Any folder of Markdown works.
Pick a model: set CHAT_MODEL in config.py. Our 16-model benchmark crowned openai/gpt-oss-20b for agentic tool-use; google/gemma-4-12b-qat is a great lighter pick for plain Q&A.
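
The `--scope` flag in step 6 takes a CIDR range. As an illustration of what a scope gate could look like, the sketch below uses the standard-library `ipaddress` module. The `in_scope` helper is hypothetical and handles IP targets only; how the project resolves hostnames or enforces scope internally is not documented here.

```python
import ipaddress

def in_scope(target: str, scope: str) -> bool:
    """Return True if the target IP falls inside the CIDR scope."""
    # strict=False tolerates a scope written with host bits set, e.g. 10.10.10.5/24.
    network = ipaddress.ip_network(scope, strict=False)
    return ipaddress.ip_address(target) in network

print(in_scope("10.10.10.5", "10.10.10.0/24"))   # True
print(in_scope("10.10.11.5", "10.10.10.0/24"))   # False
```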

🤝 Use it from your AI agents (MCP)

Super RAG runs as one shared MCP server (rag_mcp.py, HTTP @ 127.0.0.1:8765/mcp) so multiple agents can query it concurrently:

```shell
python rag_mcp.py        # or let Startup\SuperRAG-MCP.cmd auto-start it
```

Tools exposed: cybersec_search(query, phase) · cybersec_answer(question) · cybersec_status()

Claude Desktop / OpenClaw style:

```json
{
  "mcp": { "servers": { "cybersec-rag": {
    "url": "http://127.0.0.1:8765/mcp", "transport": "streamable-http"
  }}}
}
```

```yaml
# Hermes style (config.yaml)
mcp_servers:
  cybersec-rag: { url: http://127.0.0.1:8765/mcp, enabled: true }
```

See INTEGRATION.md for the full Hermes + OpenClaw walkthrough.
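
For readers curious what an agent sends over the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds such a request for `cybersec_search`. The `tool_call_request` helper is hypothetical, and `"recon"` as a `phase` value is an assumption. A real client must also perform the MCP `initialize` handshake first, which an MCP client library normally handles.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request IDs

def tool_call_request(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

request = tool_call_request(
    "cybersec_search",
    {"query": "AS-REP roasting", "phase": "recon"},  # phase value is assumed
)
print(json.dumps(request, indent=2))
```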

🎬 It works — real agent output

Scenario: black-box target, nmap reveals Apache httpd 2.4.49.

```text
STEP 1  recon       → nmap -sS -A 10.10.10.5          ✓ correct first move (2.9s)
STEP 2  enumerate   → gobuster on :80 (Apache live)    ✓ methodical (3.3s)
STEP 3  exploit     → curl --path-as-is "…/cgi-bin/.%2e/…/bin/bash" -d 'reverse shell'
                      ✓ textbook CVE-2021-41773 mod_cgi RCE (4.9s)
        sources: htb-cpts/initial-access-exploitation.md, oswe/file-inclusion-upload.md, HackTricks
```

The model followed correct methodology, grounded each step in the vault, and produced a working exploit chain — fully offline.

📁 Project layout

```text
super-rag/
├── main.py              # CLI: ingest · query · pentest · report · extract · status
├── config.py            # paths, model IDs, chunking, timeouts, tool registry
├── ingest.py            # vault → chunks → parallel embed → Qdrant  (~14× faster pipeline)
├── rag_engine.py        # the 13 RAG strategies + RRF fusion
├── agent.py             # ReAct+Reflect pentest loop, scope + approval gates
├── detector.py          # honeypot / WAF / firewall detection
├── memory.py            # attack-surface graph, findings, stuck-loop tracking
├── report.py            # pentest report + bug-bounty submission generators
├── llm.py               # one streaming chat helper (works for every model)
├── rag_mcp.py           # MCP server for agent integration
├── model_benchmark.py   # the 16-model benchmark harness
├── extract_training.py  # real write-ups → fine-tuning JSONL
└── tools/               # registry + nmap/gobuster/ffuf parsers
```
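
The layout lists stuck-loop tracking under `memory.py`. One simple way to detect a stuck agent, shown here purely as an illustrative sketch, is to count how often the same action recurs within a sliding window. The `StuckLoopDetector` class and its thresholds are hypothetical and not taken from the project.

```python
from collections import deque

class StuckLoopDetector:
    """Flag an agent as stuck when one action repeats too often in recent history."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # sliding window of recent actions
        self.max_repeats = max_repeats

    def record(self, action: str) -> bool:
        """Record an action; return True if the agent looks stuck."""
        # Collapse whitespace only; flags such as -sS are case-sensitive.
        normalized = " ".join(action.split())
        self.recent.append(normalized)
        return self.recent.count(normalized) >= self.max_repeats
```

When `record` returns `True`, a loop could force a reflection step or ask the operator for direction instead of issuing the same command again.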

🧰 Tech stack

Python 3.14 · Qdrant (embedded) · LM Studio (OpenAI-compatible local API) · nomic-embed-text-v1.5 · gpt-oss-20b · MCP / FastMCP — zero cloud dependencies.

🔐 Responsible use

Authorized targets only. Super RAG does not enforce authorization — that is your legal responsibility. Use it on systems you own or have explicit written permission to test (engagements, CTFs, labs).
Hybrid mode gates exploitation behind a manual [y/N] approval. Don't bypass it.
Indexed external repos are reference-only and untrusted — never execute code pulled from them.
This project is for defensive learning, authorized testing, and CTF/education. Don't be a criminal.
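
The manual `[y/N]` approval described above can be pictured as a small gate that defaults to refusal. This is a sketch under assumptions: the `approve_exploit` name and prompt wording are hypothetical, and the project's actual gate may behave differently.

```python
from typing import Callable

def approve_exploit(command: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the operator to confirm; anything other than y/yes is a refusal."""
    answer = ask(f"Run exploitation step?\n  {command}\n[y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as a parameter keeps the gate testable without a terminal, and treating an empty answer as "no" matches the capital N in `[y/N]`.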

🛣️ Roadmap

Qdrant server mode (Docker) for fully-concurrent multi-agent access
GRPO fine-tune of a 7–14B specialist on extracted real write-ups
Web UI (the CLI works today)
Auto-scoping from engagement rules-of-engagement files

🤝 Contributing

Issues and PRs welcome — new tool parsers, RAG strategies, and detector signatures especially. Keep it defensive, keep it cited.

📄 License

MIT — see LICENSE.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/test-center-ai/super-rag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server