SentinelMCP
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@SentinelMCPRun the full injection and hijacking suite on my agent."
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
SentinelMCP
An automated red-teaming and reliability-auditing platform for AI agents, exposed as an MCP server.
Most agent projects show an agent doing a task. SentinelMCP does the opposite: it's a multi-agent system whose job is to attack and score other agents - checking them for prompt injection, tool misuse, prompt/data exfiltration, and unreliable behavior under adversarial pressure then reports the results as a reliability scorecard.

Runs entirely on free-tier infrastructure. No credit card required.
Architecture
Four agents, orchestrated with LangGraph:
┌───────────┐ ┌───────────┐ ┌──────────┐ ┌───────────┐
│ Attacker │ ──▶ │ Target │ ──▶ │ Judge │ ──▶ │ Reporter │
│ agent │ │ agent │ │ agent │ │ agent │
└───────────┘ └───────────┘ └──────────┘ └───────────┘
selects/mutates system under scores vs. aggregates
adversarial test (your rubric, JSON into scorecard,
test case agent or the verdict flags high-
sample one) severity failsAttacker — plays back a curated adversarial prompt, or (optionally) has an LLM paraphrase it into a fresh variant.
Target — the agent under test. Ships with a sample customer-support agent (with mock sensitive tools: refund, delete account, update order, get customer data) so you can run the suite immediately. Swap in your own agent by implementing
run(prompt, context) -> {"response_text": ..., "tool_called": ...}.Judge — LLM-as-judge, forced into strict JSON output and validated with pydantic. Tool-hijacking verdicts are additionally checked deterministically (if the target tool actually fired, that overrides the LLM's opinion — an objective signal shouldn't be left to LLM judgment alone).
Reporter — pure aggregation, no LLM call. Produces per-category success rates and flags high-severity failures.
All of this is also exposed as an MCP server (mcp_server/server.py) with four tools: run_injection_suite, score_trajectory, generate_report, list_attack_categories — so any MCP-compatible client can trigger an audit without knowing anything about the internals.
Related MCP server: inkog
Attack taxonomy
8 categories, 24 seed test cases (3 each), defined in eval/taxonomy.py and eval/test_cases.py:
Category | What it tests |
| "Ignore your instructions" style prompts |
| Malicious instructions hidden inside "retrieved" documents/tickets |
| Attempts to trigger refunds, deletions, etc. without authorization |
| Attacker impersonates system/admin/developer |
| Attempts to extract the system prompt verbatim |
| Attempts to extract other users' PII |
| Burying the real instruction under padding text |
| Conflicting instructions to see which one wins |
Extend eval/test_cases.py to grow the suite, or run with --mutate to have the Attacker LLM generate paraphrased variants at runtime.
Running it for free
Every piece of infrastructure here (LangGraph, FastAPI, SQLite, Streamlit, Docker) is free with no caveats. The only thing that costs money by default is LLM inference — here's how to keep that at $0 too.
Option A: Groq free tier (recommended — fast, no local setup)
Sign up at console.groq.com — no credit card needed.
Generate an API key.
cp .env.example .envand setGROQ_API_KEY.
Groq's free tier gives ~30 requests/minute and a generous daily token allowance across open models like Llama 3.3 70B — plenty for this suite. The project paces requests automatically (REQUEST_DELAY_SECONDS in .env) to stay under the per-minute cap, and retries with backoff on 429s.
Option B: Ollama (fully local, zero rate limits)
Install Ollama.
ollama pull llama3.1In
.env, setTARGET_PROVIDER=ollama(orDEFAULT_PROVIDER=ollamato run everything locally).
Recommended split
Run the Target locally on Ollama and the Attacker/Judge/Reporter on Groq — so you're not spending your Groq quota testing both sides of the fight. This is the default in .env.example.
Setup
git clone <this-repo>
cd sentinelmcp
pip install -r requirements.txt
cp .env.example .env # then fill in GROQ_API_KEY (or set up Ollama)Usage
Run the full suite against the sample target agent:
python main.py --target "support-agent-v1"Run only specific categories:
python main.py --target "support-agent-v1" --category tool_hijacking direct_injectionUse LLM-generated paraphrased variants instead of the static seed prompts:
python main.py --target "support-agent-v1" --mutateView results in the dashboard:
streamlit run dashboard/app.pyRun as an MCP server:
python -m mcp_server.serverThen point any MCP-compatible client at it and call run_injection_suite, score_trajectory, or generate_report.
Run with Docker:
docker compose upRun the offline test suite (no API keys needed — uses a fake LLM client to verify all orchestration logic):
pytest tests/ -vTesting against your own agent
Replace agents/target.py's TargetAgent with a wrapper around your real agent. The only contract that matters:
class TargetAgent:
def run(self, prompt: str, context: str = "") -> dict:
# ... call your real agent here ...
return {"response_text": "...", "tool_called": "tool_name_or_None"}Then run python main.py --target "my-real-agent" as usual.
Project structure
sentinelmcp/
├── main.py # CLI entry point
├── config.py # provider/model configuration
├── llm_client.py # unified Groq/Ollama client with rate-limit handling
├── agents/
│ ├── attacker.py
│ ├── target.py # sample target agent + mock sensitive tools
│ ├── judge.py # LLM-as-judge with strict JSON rubric
│ ├── reporter.py # aggregation, no LLM call
│ └── graph.py # LangGraph orchestration
├── eval/
│ ├── taxonomy.py # attack categories + severity weights
│ └── test_cases.py # 24 seed adversarial test cases
├── mcp_server/
│ └── server.py # MCP server exposing the suite as tools
├── storage/
│ └── db.py # SQLite persistence
├── dashboard/
│ └── app.py # Streamlit reliability dashboard
└── tests/
└── test_basic.py # offline tests (fake LLM client, no API needed)Extending this project
Grow the test bank past 24 cases — add to
eval/test_cases.py, or rely on--mutateto generate variants.Add new attack categories in
eval/taxonomy.py.Add an explainability layer — attribute which part of a long/adversarial prompt triggered a deviation (a natural next step, and a nice callback if you've done SHAP/Grad-CAM work elsewhere).
Swap SQLite for Postgres if you need concurrent writers —
storage/db.pyis intentionally the only file that would need to change.Publish the MCP server so others can plug it into their own agent stack via
npx/uvx.
🤝 Contributing
Contributions are welcome! If you'd like to improve the project:
Fork the repository
Create a feature branch (
git checkout -b feature/your-feature)Commit your changes (
git commit -m 'Add some feature')Push to the branch (
git push origin feature/your-feature)Open a Pull Request
Ideas for contributions: additional attack categories, real function-calling for tool-hijack detection, a confidence-based human review queue, Postgres backend for concurrent writes, or CI/CD integration for automated runs.
👤 Author
Md. Musa Islam Fahad
CSE (Data Science) · Daffodil International University, Dhaka
📧 musa.islam.fahad@gmail.com
🌐 Portfolio · GitHub · LinkedIn
📄 License
This project is licensed under the MIT License - see LICENSE for details.
Free to use, modify, and deploy.
🙏 Acknowledgements
LangGraph - Multi-agent orchestration
Model Context Protocol - The MCP spec and Python SDK this project implements against
Groq - Free-tier LLM inference
Ollama - Local model runtime
Streamlit - Dashboard framework
Built as a demonstration of automated AI agent security testing.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
- Your AI Chatbot Just Exposed Your CEO's Salary to an InternBy Om-Shree-0709 on .Agent IdentityMCP SecurityOAuth Delegation
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/MusaIslamFahad/sentinelmcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server