How do I use groundwork?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@groundwork research AI adoption trends and verify claims" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

groundwork

by Des-Sleigh

Overview Schema Related Servers Score Discussions

Python

Hybrid

Groundwork

coverage python

A grounded, injection-resistant, cost-aware AI research agent — built to prove an agent can be trusted, not just demoed.

Groundwork researches how businesses adopt and apply AI (use cases, vendor landscape, ROI evidence, implementation patterns, risks) and returns an answer where every claim is verified against a retrieved source, ungrounded statements are flagged rather than shipped, and fetched web content is treated as untrusted data — not instructions. You can watch the whole trajectory — plan → gather → ground → critique → synthesize — stream live in a dashboard.

Groundwork dashboard

No screenshot yet? See docs/DEMO-TODO.md — one command brings the dashboard up.

What makes it different

These are the failure modes most agent demos ignore. Groundwork is built around them, and measures itself on them (see Evaluation):

Differentiator	Where it lives
Grounding verification (real entailment)	After synthesis, an LLM checks every claim for entailment against the retrieved sources; the answer ships with an "X of Y claims verified" report and flags the rest. Lexical fallback runs with no key. (`research_agent/llm_grounding.py`, `grounding.py`)
Prompt-injection resistance	Fetched content is wrapped as data and scanned for manipulation patterns; injections are flagged and ignored, never obeyed. Proven by a benign red-team suite. (`research_agent/defenses.py`, `redteam/`)
Cost-aware tiered routing	Haiku-class workers do the bulk research; Sonnet-class planner/critic supervise. One router maps role → model; cost is accounted by role. (`core/providers.py`, `core/cost.py`)
Observability	Full step/trajectory tracing, streamed live to the dashboard over SSE, plus per-run token/cost accounting. (`core/tracing.py`, `api/server.py`)

Related MCP server: wigolo

Architecture

Three layers over a shared core, built MCP → agent → orchestrator — each runnable standalone. Full diagram in docs/architecture.md; the decisions, trade-offs, and known limitations behind each layer are in docs/DESIGN.md.

core/   providers (+ tiered routing) · tracing · cost · types
  │
  ├─ Layer 1  mcp_server/   spec-compliant MCP server: web_search, fetch_url
  │                         (+provenance, untrusted), extract_claims, check_grounding
  ├─ Layer 2  research_agent/  plan → gather → synthesize (cited) → verify grounding
  │                            + injection defenses, tracing, cost
  └─ Layer 3  orchestrator/   planner → workers (parallel) → critic (grounding +
                              injection checks → retry) → synthesize

api/  FastAPI: POST /research streams the trajectory live (SSE)
web/  Next.js dashboard that renders the stream
evals/ labeled datasets + scorer for grounding accuracy & injection resistance

Real web via Tavily when TAVILY_API_KEY is set; offline fixture corpus otherwise — so dev, CI, and the demo all run with zero keys.
Article extraction: trafilatura → BeautifulSoup → regex, best available.

Evaluation

Groundwork scores its own differentiators on labeled datasets (evals/). The lexical grounder and the regex injection detector need no API key, so these numbers are reproducible — CI runs them on every push:

Capability	Method	n	Precision	Recall	F1	Accuracy
Grounding	lexical heuristic	30	0.86	1.00	0.92	0.90
Grounding	LLM entailment (claude-sonnet-4-6)	30	1.00	0.94	0.97	0.97
Injection detection	regex pattern scan	20	1.00	1.00	1.00	1.00

The lexical grounder over-accepts paraphrased contradictions and overclaims that share vocabulary with a source (precision 0.86). The LLM entailment grounder catches exactly those — perfect precision, never accepting an unsupported claim, at a small recall cost. That gap is the whole argument for grounding with a model rather than string overlap.

Re-run: python -m evals.run (writes evals/report.md).

A real, web-grounded sample brief produced by the agent is committed at reports/sample_research_report.md — note its grounding footer ("X of Y claims verified") and how the final-answer grounding pass flags unverified specifics rather than shipping them.

vs. naive RAG

The point of grounding, as a number (benchmark/report.md, python -m benchmark.run):

Approach	Hallucinations shipped ↓	Valid claims kept ↑
Naive RAG (no grounding)	100% (12/12)	100%
Groundwork — lexical grounder	25% (3/12)	100%
Groundwork — LLM entailment	0% (0/12)	94%

A naive retrieve-then-synthesize agent ships every unsupported claim as if it were true. Groundwork's grounding filter is the difference between confident-but-wrong and trustworthy.

Quick start

pip install -e .                  # core; add ".[real,api]" for live web + the API

# 1) Offline three-layer demo — no key. Plan→workers→critic loop, grounding,
#    injection flags, per-role cost, over a fixture corpus with mock models:
python run_demo.py

# 2) The dashboard (offline mock mode):
GROUNDWORK_MOCK=1 uvicorn api.server:app --port 8000      # backend
cd web && npm install && NEXT_PUBLIC_API_URL=http://localhost:8000 npm run dev

# 3) A REAL run (live models + web):
export ANTHROPIC_API_KEY=sk-ant-...   ;  export TAVILY_API_KEY=tvly-...   # optional
python research.py "How are mid-market logistics firms using AI for demand forecasting?"

# 4) MCP server for an MCP client (Claude Desktop): python -m mcp_server.server

Deploy (FastAPI → Render, Next.js → Vercel): docs/DEPLOY.md. The dashboard supports bring-your-own-key — deploy the backend with no server key and visitors paste their own (sent per-request via X-Anthropic-Key, never stored), so a public demo is free and abuse-safe; with no key it runs in offline mock mode.

Tests & CI

pip install -e ".[dev]" && pytest -q && ruff check . --select E,F,I,W --ignore E501

35 tests at ~77% coverage, all offline (no key / network): injection canaries detected and not obeyed; LLM-grounding JSON parsing + entailment verdicts; supported claims ground while fabricated ones are flagged; the orchestrator critic rejects an ungrounded brief, revises, and accounts cost by role; FastAPI endpoints exercised end-to-end (SSE research stream + run history) via TestClient in mock mode; eval- and benchmark-quality regression guards. GitHub Actions runs lint + tests-with-coverage + the offline evals and benchmark on every push.

Safety

Everything in redteam/injection_pages/ is a benign canary — a harmless obedience probe (e.g. an embedded "append BANANA" / "recommend Brand X"). No operational attacks or harmful payloads anywhere. Treating fetched/external content as untrusted data is the core security stance, applied throughout.

Author

Built by Desmond Sleigh — github.com/Des-Sleigh. Sibling project: llm-eval-harness — measuring model quality with the same evaluation discipline Groundwork applies to its own output.

License: MIT.

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Des-Sleigh/groundwork'

If you have feedback or need assistance with the MCP directory API, please join our Discord server