VERDICT
Verifiable Evidence Reasoning for DFIR Investigation, Correlation, and Triage
An autonomous incident response analyst for the SANS SIFT Workstation. VERDICT extends Protocol SIFT with a read-only Custom MCP Server that turns the SIFT toolchain into typed, evidence-safe functions, and a Claude Code agent that forms hypotheses, corroborates every claim against the actual tool output, and self-corrects when sources disagree. It is built to match adversary speed without trading away the one thing a responder cannot lose: the integrity of the evidence and the truth of the findings.
Find Evil is about closing the gap between machine-speed attacks and human-speed response. In November 2025 Anthropic documented GTG-1002, a state-sponsored operation that drove Claude Code through reconnaissance, exploitation, and lateral movement at 80 to 90 percent autonomy, at request rates Anthropic called physically impossible for human operators. That was the offensive side. Protocol SIFT is the defensive answer, and Rob Lee's framing is exact: meet AI threat speed with defensive AI orchestration. VERDICT sharpens that orchestration by making the two failure modes of an autonomous responder (modifying evidence, and confidently reporting things that are not true) structurally impossible rather than merely discouraged.
Submission compliance (read this first)
Every required component, with its exact location. This project is complete and each item is linked below.
# | Required component | Where it is |
1 | Public code repository, open source | This repository. License below. |
2 | MIT or Apache 2.0 license file | |
3 | README with setup instructions | |
4 | Live deployment or step-by-step run instructions | Run an investigation: one Docker command, or native on SIFT. |
5 | Text description of features and functionality | What it does below, and the Devpost project page. |
6 | Demo video (live terminal, audio narration, a self-correction) | Linked at the top of the Devpost page. A genuine asciinema terminal recording (the raw cast is committed at |
7 | Architecture diagram | |
8 | Evidence dataset documentation | |
9 | Accuracy report | |
10 | Agent execution logs (traceable to tool executions) | |
Architectural pattern used (per the brief): Pattern 2, Custom MCP Server, with Claude Code as the agentic engine. The three required capabilities (self-correction without human intervention, accuracy validation traceable to specific artifacts, and a structured investigative narrative) are demonstrated and described in How the design maps to the criteria.
A full self-audit against the official Stage One qualification prompt (the exact 12 checks
judges run, from the Judge Pack Appendix A) is in
SUBMISSION_CHECKLIST.md. The judges' non-negotiable
"trace any finding to its tool execution" check is done for you, with three worked
examples, in docs/three_claim_trace.md.
What it does
You point VERDICT at a case (a disk image, a memory capture, a packet capture, or several of them from the same host) and it runs a complete triage the way a senior analyst would:
Orients on the evidence, recording a read-only chain-of-custody hash of every object before touching it.
Forms falsifiable hypotheses about what happened, and names the artifact that would confirm or kill each one.
Sequences tools adaptively. It runs the cheapest tool that can decide a hypothesis, reads the result, and lets that result choose the next tool. There is no fixed pipeline.
Corroborates every finding. Before any claim is allowed into the report, a deterministic engine re-reads the archived tool output and checks that the asserted value is really there. Only a value found in two independent sources is called confirmed.
Self-corrects. When the cited output does not support a claim, the claim is caught as a likely hallucination and retracted. When two sources disagree, the agent runs a third to break the tie. Every change of mind is logged with the execution that triggered it.
Cross-checks sources. Given a disk and a memory image from one host, it compares them and flags discrepancies, which is where real intrusions hide.
Proves integrity and reports. It re-hashes the originals to prove they were never modified, then renders a structured investigative narrative and an honest accuracy report, both generated from the run ledger.
The result is an investigation a colleague could defend under cross-examination: every sentence traces to a specific tool execution, confirmed facts are separated from inferences, and the mistakes the agent caught itself making are on the record.
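The chain-of-custody step (hash before touching, re-hash at the end) can be illustrated with a minimal sketch. This is not VERDICT's own code: the function names `sha256_read_only` and `verify_unmodified` are hypothetical, and the choice of SHA-256 is an assumption. The open flags mirror the `O_RDONLY | O_NOFOLLOW` combination the README mentions later.

```python
import hashlib
import os


def sha256_read_only(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file through a descriptor that was opened read-only.

    Hypothetical helper for illustration; SHA-256 is an assumed algorithm.
    """
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    # It is POSIX-only, so fall back to 0 on platforms that lack it.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)
    digest = hashlib.sha256()
    # os.fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_unmodified(path: str, baseline: str) -> bool:
    """Re-hash the file and compare against the hash recorded at intake."""
    return sha256_read_only(path) == baseline
```

Recording the value returned by `sha256_read_only` at the start of a run and calling `verify_unmodified` at the end gives a simple before-and-after integrity check. Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte images.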
Why it is different from the baseline
The baseline Protocol SIFT is a Claude Code configuration: skill files plus behavioral
rules that hand the model a shell (Bash(*)) and ask it to be careful. Its guardrails
are prompt-based, and it feeds raw tool output into the model context, which is the
documented source of its hallucinations.
VERDICT changes the architecture, not just the prompt:
Concern | Baseline Protocol SIFT | VERDICT |
Evidence safety | Prompt says "never modify"; shell can still do anything | No shell, no write tool exists; files opened |
Hallucination control | "No hallucinations" instruction | Deterministic corroboration engine re-checks every claim against archived output |
Self-correction | "On failure, retry" instruction | Structural: an UNSUPPORTED or CONTRADICTED verdict forces a retraction or tie-break, logged with its trigger |
Context overload | Raw tool dumps into the model | Output parsed to compact summaries; raw archived to disk and referenced by id |
Audit trail | A summary line appended on stop | One structured provenance record per execution; every finding prints its exec ids |
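To make the "Hallucination control" row concrete, here is a minimal sketch of a deterministic claim check, under stated assumptions. The `Claim` type, the archive layout (execution id mapped to a source name and raw output), and the `SINGLE_SOURCE` label are all hypothetical; only the `CONFIRMED` and `UNSUPPORTED` verdict names come from this README. Detecting a `CONTRADICTED` verdict needs value comparison across sources and is left out of the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"
    SINGLE_SOURCE = "SINGLE_SOURCE"  # hypothetical label, not from the README
    UNSUPPORTED = "UNSUPPORTED"


@dataclass(frozen=True)
class Claim:
    value: str                        # the literal value the agent asserts
    cited_exec_ids: Tuple[str, ...]   # executions the agent cites as proof


def corroborate(claim: Claim, archive: Dict[str, Tuple[str, str]]) -> Verdict:
    """Check a claim against archived output, never against model memory.

    `archive` maps an execution id to (source_name, raw_output). This layout
    is an assumption made for the sketch.
    """
    if not claim.value:
        return Verdict.UNSUPPORTED
    supporting_sources = set()
    for exec_id in claim.cited_exec_ids:
        record = archive.get(exec_id)
        if record is None:
            continue  # a citation to a non-existent execution supports nothing
        source_name, raw_output = record
        if claim.value in raw_output:
            supporting_sources.add(source_name)
    if len(supporting_sources) >= 2:
        return Verdict.CONFIRMED
    if len(supporting_sources) == 1:
        return Verdict.SINGLE_SOURCE
    return Verdict.UNSUPPORTED
```

With made-up data, a claim citing two executions from different tools that both contain the asserted string yields `Verdict.CONFIRMED`, while a claim whose value appears in none of the cited outputs yields `Verdict.UNSUPPORTED`, which is the case the table describes as forcing a retraction. Counting distinct source names rather than distinct executions keeps two runs of the same tool from passing as independent evidence.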
Setup (two paths)
Path A: Docker (recommended for judges, one command)
Requirements: Docker, and Claude Code credentials (an ANTHROPIC_API_KEY, or a mounted
~/.claude). Nothing else.
git clone https://github.com/tejcodes-rex/verdict.git
cd verdict
docker build -t verdict:latest .
The image carries a pinned subset of the SIFT toolchain (The Sleuth Kit, Volatility 3, Volatility 2.6 for older or 32-bit memory images, tshark, YARA, RegRipper, ExifTool) plus the agent runtime, so a run is reproducible on any machine without standing up a SIFT VM.
Path B: Native on the SANS SIFT Workstation
On a SIFT Workstation the tools are already installed. Install the server and the agent config:
git clone https://github.com/tejcodes-rex/verdict.git
cd verdict
pip3 install -e .
# Point Claude Code at the VERDICT MCP server and doctrine:
cp agent/CLAUDE.md ~/.claude/CLAUDE.md
cp agent/settings.json ~/.claude/settings.json
The MCP server resolves each tool at runtime (Volatility 3 at SIFT's /opt/volatility3*/vol.py,
Volatility 2 at SIFT's /usr/local/bin/vol.py, the rest on PATH), with VERDICT_<TOOL>
env overrides. Confirm your environment in one command:
python3 -m verdict.doctor # prints exactly which tool resolved, and where
This was verified against a real SIFT tool layout: with vol deliberately not on PATH (as
on the OVA), VERDICT resolved Volatility 3 to python3 /opt/volatility3/vol.py, executed it
from that path, and ran a live investigation on the SIFT toolchain. See
docs/architecture.md and the accuracy report.
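The runtime tool resolution described above can be sketched as follows. This is an illustration, not the project's resolver: the function name `resolve_tool` is hypothetical, and both the lookup order (environment override first, then known install paths, then PATH) and the uppercase form of the `VERDICT_<TOOL>` variable name are assumptions.

```python
import glob
import os
import shutil
from typing import Mapping, Optional, Sequence


def resolve_tool(
    name: str,
    fallbacks: Sequence[str] = (),
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the path of a tool, or None if it cannot be found.

    Assumed order: VERDICT_<NAME> override, then glob patterns for known
    install locations, then a normal PATH lookup.
    """
    env = os.environ if environ is None else environ
    override = env.get(f"VERDICT_{name.upper()}")
    if override:
        return override
    for pattern in fallbacks:
        # sorted() keeps the choice stable when a pattern matches several paths
        for candidate in sorted(glob.glob(pattern)):
            if os.path.isfile(candidate):
                return candidate
    return shutil.which(name)
```

For example, `resolve_tool("vol", fallbacks=["/opt/volatility3*/vol.py"])` would find a Volatility 3 script under `/opt` even when `vol` is not on PATH, which is the situation the verification note describes.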
Run an investigation (for judges)
Sample evidence with published ground truth is documented in
docs/datasets.md. The Nitroba network case is small (53 MB) and
self-validating. To reproduce the headline run:
# 1. Fetch the sample evidence (script downloads from the official source and verifies hashes)
bash scripts/fetch_sample_evidence.sh
# 2. Run the full autonomous investigation in Docker.
# Evidence is mounted read-only; all output lands in ./work.
ANTHROPIC_API_KEY=sk-... \
EVIDENCE=$(pwd)/evidence/cases/nitroba \
WORK=$(pwd)/work \
docker compose run --rm --entrypoint bash verdict \
scripts/run_investigation.sh nitroba /evidence /work
(The image entrypoint is the MCP server itself; --entrypoint bash runs the
investigation driver instead. If you prefer not to use compose, the equivalent
docker run is in docs/submission_guide.md.)
Agent authentication: the agent is Claude Code, so it needs your Claude credentials.
Either export ANTHROPIC_API_KEY (shown above), or, if you use a Claude subscription,
mount your existing login by adding -v $HOME/.claude:/root/.claude to the run. The
image never contains credentials; you supply them at run time. This is the only external
dependency, and it is one a judge of this event already has.
When it finishes, look in the newest ./work/run-*/ directory:
report.md is the investigative narrative.
accuracy_report.md is the self-assessment, scored against ground truth.
provenance.jsonl is the tool-execution audit trail.
agent_stream.jsonl is the full agent execution log.
To run the agent against your own evidence, drop it in a directory and pass that as
EVIDENCE. The agent adapts to whatever data types it finds.
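For readers who want to inspect a finished run programmatically, the following is a small sketch that locates the newest `run-*` directory and loads a JSONL file such as `provenance.jsonl`. The helper names are hypothetical, "newest" is assumed to mean most recently modified, and no field names are assumed for the records because the schema is not documented in this README.

```python
import json
from pathlib import Path
from typing import List, Optional


def newest_run(work_dir: str) -> Optional[Path]:
    """Return the most recently modified run-* directory, or None if absent."""
    runs = [p for p in Path(work_dir).glob("run-*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def load_jsonl(path: Path) -> List[dict]:
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    run = newest_run("./work")
    if run is None:
        print("no run directory found under ./work")
    else:
        records = load_jsonl(run / "provenance.jsonl")
        print(f"{run.name}: {len(records)} provenance records")
```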
How the design maps to the judging criteria
Autonomous Execution Quality. The agent reasons over an explicit hypothesis ledger and re-sequences based on what it learns. Self-correction is structural: a corroboration verdict, computed from real tool output, forces it. The triggers are genuine (a value missing from output, two sources disagreeing), so they cannot be staged. See docs/architecture.md.
IR Accuracy. No claim reaches the confirmed list without its asserted value being present in two independent sources. Confirmed facts and inferences are labeled distinctly. The accuracy report is generated from the ledger, lists the hallucinations the agent caught, and is scored against published ground truth.
Breadth and Depth. Disk (The Sleuth Kit), memory (Volatility 3, Volatility 2, and symbol-free string analysis), network (tshark), Windows registry and program-execution evidence (RegRipper, Shimcache, Amcache), and IOC hunting (YARA) are handled deeply, with a grounded MITRE ATT&CK mapping. The Ali Hadi case, run on the genuine SIFT Workstation (logs/examples/alihadi_sift/), correlates a 3 GB disk image and a 1 GB memory capture from one host: it reconstructs the full kill chain (DVWA command injection, an sqlmap SQL-injection campaign, three webshells including a Meterpreter payload, two attacker-created accounts in the on-disk SAM, and RDP persistence), and promotes the RDP-persistence finding to CONFIRMED only when the memory capture independently corroborates the same injected command. Cross-source corroboration of persistence is treated as a first-class, high-value finding.
Constraint Implementation. Guardrails are architectural: the server exposes no destructive primitive, evidence is opened read-only (O_RDONLY | O_NOFOLLOW), the memory tools refuse any Volatility dump plugin or output flag so a tool argument cannot become a write primitive, and a configuration layer denies the generic shell and write tools as defense in depth. The bypass test is documented in the accuracy report.
Audit Trail Quality. One provenance record per execution; every finding prints its exec ids; the three-claim trace is mechanical. The provenance log is a tamper-evident hash chain, and each run is sealed with a verified chain head in manifest.json.
Usability and Documentation. One Docker command to run, and verified on the genuine SANS SIFT Workstation: the official OVA was booted and the full autonomous agent ran on it end to end (logs/sift_verification/, and the SIFT run under logs/examples/nitroba_sift/); python3 -m verdict.doctor reports tool resolution on any host. Ships ground-truth samples, committed runs, a live investigation view (scripts/agent_live.py), and an honest head-to-head with the example submission (benchmark/VALHUNTIR_COMPARISON.md).
Every corroborated finding is also mapped to MITRE ATT&CK techniques (deterministically, grounded only in confirmed findings so it cannot hallucinate context), giving the analyst the kill-chain picture without the false-context risk of a knowledge-base lookup.
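The tamper-evident provenance log mentioned under Audit Trail Quality follows a general pattern that can be sketched in a few lines. This is a generic hash-chain illustration, not the project's record format: the entry fields (`record`, `prev`, `hash`), the all-zero genesis value, and the use of SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed starting value for an empty chain


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(chain: List[dict], record: dict) -> str:
    """Append a record linked to the previous entry; return the new chain head."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry["hash"]


def verify_chain(chain: List[dict], expected_head: str) -> bool:
    """Recompute every link and compare the final hash with the sealed head."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return prev_hash == expected_head
```

Because each hash covers the previous one, editing, removing, or reordering any earlier record changes every later hash, so the stored head no longer verifies. Sealing a run then amounts to writing the final head somewhere separate from the log, which is the role the README assigns to manifest.json.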
Repository layout
verdict/
  verdict/           the Python package (MCP server, engine, tools, reports)
    server.py        the MCP server: typed read-only tools + reasoning tools
    evidence.py      read-only evidence vault and integrity guarantees
    provenance.py    per-execution audit records (JSONL)
    corroborate.py   deterministic claim verifier
    ledger.py        hypotheses, findings, self-correction events
    tools/           sleuthkit, volatility3, tshark, plaso wrappers
    report/          narrative and accuracy-report generators
  agent/             the agentic engine config: doctrine + read-only permissions
  scripts/           run an investigation, fetch sample evidence, score a run
  groundtruth/       answer keys for scored cases
  docs/              architecture, datasets, accuracy report, demo script
  logs/examples/     a committed full run (agent stream + tool provenance)
  tests/             smoke tests that exercise the stack on real evidence
License
MIT. See LICENSE. This project builds on the open-source Protocol SIFT and
SANS SIFT Workstation; the novel contribution (the read-only MCP server, the
corroboration engine, the hypothesis ledger, and the provenance and reporting layers) is
original work created for this event and is documented as such throughout.