squad-mcp
Website: https://squad.devthinks.com.br/
MCP server that exposes the squad-dev workflow as deterministic tools, prompts, and resources. It classifies a task, scores its risk, picks an advisory squad of specialist reviewers, slices the changed files per agent, validates the plan, and consolidates the advisory verdicts. The host LLM (Claude Code, Cursor, Warp, Claude Desktop, …) orchestrates; squad-mcp provides the building blocks.
It also ships as a Claude Code plugin that bundles the MCP server, the slash commands (/squad:implement, /squad:review, /squad:question, /squad:debug, /squad:tasks, /squad:next, /squad:task, /squad:grillme, /squad:pipeline, /squad:inventory, /squad:stats, /brainstorm, /commit-suggest, /squad:enable-journaling), and the matching skills behind a single /plugin install.
Install
Claude Code plugin (recommended)
/plugin marketplace add ggemba/squad-mcp
/plugin install squad@gempack
The plugin bundles the MCP server plus the slash commands and skills (/squad:implement, /squad:review, /squad:question, /squad:debug, /squad:tasks, /squad:next, /squad:task, /squad:grillme, /squad:pipeline, /squad:inventory, /squad:stats, /brainstorm, /commit-suggest, /squad:enable-journaling). After install, restart Claude Code to pick up the new commands and the squad MCP server.
npm package (any MCP client)
npx -y @gempack/squad-mcp
The package exposes the squad-mcp binary and works with any MCP-capable client. Examples below.
Claude Desktop
%APPDATA%\Claude\claude_desktop_config.json (Windows) / ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "squad": {
      "command": "npx",
      "args": ["-y", "@gempack/squad-mcp"]
    }
  }
}
Cursor
.cursor/mcp.json (workspace-scoped) or global Cursor settings:
{
  "mcpServers": {
    "squad": {
      "command": "npx",
      "args": ["-y", "@gempack/squad-mcp"]
    }
  }
}
Warp
Settings → MCP servers → add. Command npx, args ["-y", "@gempack/squad-mcp"].
From source (development)
git clone https://github.com/ggemba/squad-mcp.git
cd squad-mcp
npm install
npm run build
node dist/index.js
Your first /squad:implement in 60 seconds
After install, the plugin is silent until you invoke it. Drop into a repo with at least one staged or recently committed change and run:
/squad:implement add a /health endpoint that returns {"status":"ok"}
What happens, in order:
1. Classification. compose_squad_workflow looks at your prompt + changed files and prints something like work_type: Feature, risk: Low, agents: [developer, qa].
2. Depth resolution. It also picks an execution depth (quick, normal, or deep) and surfaces it as mode + mode_source on the output. Auto-detect rules: deep on risk == High / Security work / auth-money-migration signals; quick on Low-risk diffs with ≤5 files and no high-risk signals; normal otherwise. Pass --quick / --normal / --deep to override. If you force --quick on a high-risk diff, security is force-included and a structured mode_warning is set so the host can surface it.
3. Plan. The skill drafts an implementation plan and sends it to tech-lead-planner for review (skipped in quick). You see the plan in chat.
4. Gate 1. The skill stops and asks you to approve. Reply approved, go, or equivalent to proceed; anything else cancels.
5. Implementation. After approval, the skill writes code. It never commits or pushes; that's your call.
6. Advisory squad. In v1.5+ the advisory runs AFTER implementation against the actual diff, not a plan draft. Every selected agent (architect, dba, dev, qa, security, reviewer, depending on the selection; capped at 2 for quick, force-includes architect + security for deep) reviews in parallel and emits a findings list + a Score: NN/100.
7. Consolidation. tech-lead-consolidator produces a verdict (APPROVED / CHANGES_REQUIRED / REJECTED) plus a scorecard like:
SQUAD RUBRIC — weighted 82 / 100 (threshold 75)
Application Code ████████████████░░░░ 82 ×18% developer
Testing & QA ███████████████░░░░░ 78 ×14% qa
The tech-lead-consolidator persona is skipped in quick mode; apply_consolidation_rules still runs to produce the verdict. If the verdict is CHANGES_REQUIRED / REJECTED, the reject-loop dispatches the implementer again against the delta.
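The auto-detect rules from the depth-resolution step can be sketched as a small pure function. This is only an illustration of the rules as worded above, not the server's code: the type names, field names, and the warning text are hypothetical, and it assumes that a "high-risk diff" is any diff that would auto-select deep.

```typescript
// Illustrative sketch only: names follow the README's wording,
// not the server's actual schema.
type Risk = "Low" | "Medium" | "High";
type Mode = "quick" | "normal" | "deep";

interface DepthInput {
  risk: Risk;
  workType: string;        // e.g. "Feature" or "Security"
  filesCount: number;
  touchesAuth: boolean;
  touchesMoney: boolean;
  touchesMigration: boolean;
  forcedMode?: Mode;       // set when the user passes --quick / --normal / --deep
}

interface DepthResult {
  mode: Mode;
  modeSource: "auto" | "user";
  forceIncludeSecurity: boolean;
  modeWarning?: string;
}

function resolveDepth(input: DepthInput): DepthResult {
  const highRiskSignal = input.touchesAuth || input.touchesMoney || input.touchesMigration;
  // Assumption: "high-risk diff" means any condition that would auto-select deep.
  const wantsDeep = input.risk === "High" || input.workType === "Security" || highRiskSignal;

  if (input.forcedMode !== undefined) {
    // A user flag always wins, but a quick downgrade is never silent.
    const downgraded = input.forcedMode === "quick" && wantsDeep;
    const result: DepthResult = {
      mode: input.forcedMode,
      modeSource: "user",
      forceIncludeSecurity: downgraded || input.forcedMode === "deep",
    };
    if (downgraded) {
      result.modeWarning = "quick forced on a high-risk diff; security force-included";
    }
    return result;
  }
  if (wantsDeep) {
    return { mode: "deep", modeSource: "auto", forceIncludeSecurity: true };
  }
  if (input.risk === "Low" && input.filesCount <= 5) {
    return { mode: "quick", modeSource: "auto", forceIncludeSecurity: false };
  }
  return { mode: "normal", modeSource: "auto", forceIncludeSecurity: false };
}
```

Under these assumptions, a two-file Low-risk feature resolves to quick (auto), while the same call with forcedMode set to quick and touchesMoney set to true stays quick (user) but carries a modeWarning and force-includes security.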
Other commands to try once /squad:implement works:
- /squad:review: same agents, but on an existing diff or PR (no implementation).
- /squad:question <question>: fast read-only code Q&A. Spawns the code-explorer subagent to grep + excerpt the relevant lines and answers with file:line citations. Use it for "where is X defined?", "what calls Y?", "how does the auth flow work?". No plan, no gates, no implementation.
- /squad:debug <issue>: read-only bug investigation. Takes a bug description + optional stack trace + repro steps, orients via code-explorer, then dispatches the debugger persona to emit N ranked hypotheses (1 on --quick, 3 on --normal, 5 with a cross-check pass on --deep) with file:line evidence and verification steps. The missing middle between /squad:question (lookup) and /squad:implement (fix).
- /squad:tasks docs/prd.md: decompose a PRD into atomic tasks with confirmation before they land in .squad/tasks.json.
- /squad:next: pick the next ready task; /squad:task 3: work on a specific one.
- /squad:grillme <plan>: Socratic plan validation. Grills your plan one question at a time against the project's domain language (CONTEXT.md) and prior decisions (ADRs in docs/adr/), and writes resolved terms back to both as it goes. Run it before /squad:implement to stress-test a plan; pass --no-write for a dry run.
- /squad:pipeline <feature>: runs the six squad steps as one guided, human-gated sequence: brainstorm → grillme → tasks → next → implement → review. See Cradle-to-grave with /squad:pipeline below.
- /squad:inventory <recipe>: codebase audit/inventory. Scans the repo for a named pattern (defined by a YAML "recipe pack") and emits a structured Markdown report cross-referenced with framework metadata (routes, handlers). Hybrid pipeline: a deterministic rg sweep does the file IO, then tiered LLM enrichment (Haiku tier-1, Sonnet escalation on requires_semantic rules or low confidence) classifies each finding. Bundled pack v1: php-inline-sql (inline SQL in PHP/Laravel). Writes one MD to ./docs/inventory/<recipe>-<date>.md by default; --out <path> overrides (accepts Obsidian vault paths). Reads source; never edits code.
- /brainstorm <topic>: exploratory Q&A, no code.
- /commit-suggest: generate a Conventional Commits message for staged changes.
- /squad:stats: observability dashboard over .squad/runs.jsonl. Bar charts (verdict mix, score buckets), Unicode sparkline trend, per-agent breakdown of avg wall-clock and estimated tokens. Read-only; never writes. Flags: --quick (last 7d), --thorough (full history + health panel), --since <ISO>, --last <N>, --no-color. Token figures are estimates (chars ÷ 3.5).
Stuck? Check INSTALL.md → Troubleshooting. The most common failures (Failed to reconnect to plugin:squad:squad, marketplace cache, SSH key) all have entries.
How it works
squad-mcp is a deterministic server — it makes no LLM calls of its own. The host LLM does all the reasoning; the server hands it building blocks (tools, prompts, resources) and the skills wire them into a workflow.
flowchart LR
subgraph Host["Host LLM · Claude Code / Cursor / Warp / Claude Desktop"]
SK["Skills<br/>/squad:implement · /review<br/>/pipeline · /stats · ..."]
SA["Subagents<br/>architect · developer<br/>security · qa · ..."]
end
subgraph Server["squad-mcp server · deterministic, no LLM calls"]
T["Tools<br/>classify · score_risk<br/>select_squad · consolidate"]
P["Prompts<br/>orchestration<br/>advisory · consolidator"]
R["Resources<br/>agent://...<br/>severity://..."]
end
SK -->|invoke tools| T
SK -->|load| P
SA -->|read role def| R
T -->|verdict + rubric scorecard| SK
A single /squad:implement run threads two human gates (plan approval and a Blocker halt) so the squad never writes code you did not sign off on:
flowchart TD
A["/squad:implement <task>"] --> B["classify · score risk · select squad · pick depth"]
B --> C{"Gate 1<br/>plan approved?"}
C -->|no| X1["stop — nothing written"]
C -->|yes| D["advisory squad runs in parallel<br/>each agent emits Score 0-100"]
D --> E{"Gate 2<br/>any Blocker?"}
E -->|yes| X2["halt — ask the user"]
E -->|no| F["implement the approved plan"]
F --> G["consolidate → verdict + rubric scorecard"]
G --> H{"verdict"}
H -->|APPROVED| I["done — committing is your call"]
H -->|CHANGES_REQUIRED / REJECTED| J["reject loop → re-review the delta"]
J --> G
Depth (quick / normal / deep) auto-scales the run: quick caps the squad at 2 agents and skips the planner + consolidator personas; deep force-includes architect + security and raises the reject-loop ceiling. See Your first /squad:implement for the auto-detect rules.
Examples in practice
Every example is a single line you type into the host. The squad sizes itself from the prompt + the changed files — you only reach for a flag to override.
Low-risk feature — auto-detected quick:
/squad:implement add a /health endpoint that returns {"status":"ok"}
work_type: Feature · risk: Low · mode: quick (auto) · agents: [developer, qa]
Planner skipped, 2-agent advisory, sub-30s feedback. Stops at Gate 1 for your approval.
Auth refactor — auto-detected deep:
/squad:implement refactor src/auth/jwt-validator to rotate signing keys
work_type: Security · risk: High · mode: deep (auto) · agents: [architect, security, developer, qa, reviewer]
touches_auth fires → deep. Planner + consolidator personas run, reject-loop ceiling raised to 3.
Forcing --quick on a risky diff — the safety override:
/squad:implement --quick patch the refund amount rounding in src/billing/ledger.ts
mode: quick (user) · mode_warning set
--quick is honoured, but security is force-included as one of the 2 agents because touches_money fired. The host surfaces the mode_warning so the downgrade is never silent.
Review an existing PR and post the verdict:
/squad:review #42
Runs the advisory on PR #42's diff, renders the scorecard, then dry-runs the PR post: it shows the exact request and the markdown body it would post, and waits for your go.
Fast read-only code Q&A — no plan, no gates:
/squad:question where is the rubric weighted score computed?
Spawns code-explorer, answers with file:line citations. Sub-second on --quick.
See where your runs went:
/squad:stats
Reads .squad/runs.jsonl and renders a cyan ANSI panel: verdict mix, score buckets, sparkline trend, per-agent token + wall-clock breakdown.
Cradle-to-grave with /squad:pipeline
Each squad skill runs standalone. /squad:pipeline chains them into one guided sequence for a feature going from idea to verified change, so you never have to remember what comes next or how to wire one step's output into the next:
flowchart LR
BS["/brainstorm<br/>decide what to build"] --> GM["/squad:grillme<br/>stress-test the plan"]
GM --> TK["/squad:tasks<br/>decompose into tasks"]
TK --> NX["/squad:next<br/>pick the next task"]
NX --> IM["/squad:implement<br/>build it"]
IM --> RV["/squad:review<br/>review the change"]
/squad:pipeline add multi-currency support to the checkout flow
The pipeline is an executor that auto-invokes each sub-skill and halts only at explicit user-decision gates (proceed / adjust / skip / exit). Each time you invoke it, it:
1. Reconstructs how far the feature has progressed from the conversation context (or the .squad/pipeline-state.json resume cache).
2. Dispatches the next sub-skill via the Skill tool with arguments pre-filled and depth flags forwarded.
3. Stops at each inter-phase gate to explain the decision you are about to make.
Sub-skills' own internal gates (e.g. /squad:implement Gate 1 plan approval, Gate 2 Blocker halt) keep firing as before — those remain in-skill human checkpoints. Reserved interruption commands (pipeline stop, pipeline skip, pipeline redo, pipeline back) break the auto-flow at any gate. The pipeline records no telemetry of its own; each sub-skill still records its own run, so /squad:stats aggregates them normally.
Flag | Purpose |
| Enter the pipeline mid-sequence ( |
| Forwarded as-is to every step the pipeline recommends. |
# already brainstormed — jump straight to stress-testing the plan
/squad:pipeline --from grillme add multi-currency support to the checkout flow
# take a small change cradle-to-grave at quick depth
/squad:pipeline --quick fix the timezone bug in the daily report job
What it provides
Tools (deterministic, pure functions)
Tool | Purpose |
| Hardened |
| Heuristic |
| Compute Low/Medium/High from boolean signals (auth, money, migration, files_count, new_module, api_change). |
| Select advisory agents for a work type. Combines matrix + path hints + content sniff. Returns evidence per file. |
| Filter a file list to those owned by a single agent. Used to build sliced advisory prompts. |
| Advisory check for inviolable-rule violations in a plan (commit/push fences, emojis in code blocks, non-English identifiers, impl-before-approval). |
| One-call pipeline: |
| One-call full bundle: |
| Aggregate advisory reports → final verdict (APPROVED / CHANGES_REQUIRED / REJECTED). Returns weighted rubric scorecard when reports carry per-dimension scores. |
| Pure rubric calculator. Takes per-agent scores (0-100) + optional weight overrides, returns weighted score, per-dimension breakdown, and pre-formatted ASCII scorecard. |
| Read and resolve |
| Load past accept/reject decisions from |
| Append one accept/reject decision to |
| Lifecycle maintenance (v0.11.0+): mark entries older than |
| Build a prompt + JSON schema for the host LLM to decompose a PRD into atomic tasks. Pure-MCP: server does NO LLM calls. Caller (skill) feeds the prompt to its model, then calls |
| Read tasks from |
| Pick the next ready task: candidate status (default pending), all dependencies done, optional agent / changed_files filter. Tiebreak priority then id. Returns null + reason when none ready. |
| Bulk-create tasks. Allocates ids sequentially, validates dependencies resolve (forward refs in batch ok), rejects duplicates and self-deps. Atomic write. |
| Flip a task or subtask status: pending / in-progress / review / done / blocked / cancelled. |
| Append subtasks to an existing task. Mechanical only — caller (skill or LLM) supplies the subtask inputs. |
| Filter a file list to those matching a task's |
| List configured agents with role, ownership, naming conventions. |
| Return the full markdown system prompt for an agent (local override → embedded default). |
| Copy embedded defaults to the local override directory so they can be edited. |
| Append one |
| Read-only journal read. Folds the two-row pair by id, filters (since / limit / agent / verdict / mode / invocation / work_type), and returns either the folded list or an aggregate bundle (outcomes + health + trend) when |
Prompts
- squad_orchestration: full Phase 0–12 orchestration guide.
- agent_advisory: sliced prompt for one advisory agent.
- consolidator: final verdict prompt for TechLead-Consolidator.
Resources
- agent://product-owner, agent://tech-lead-planner, agent://tech-lead-consolidator, agent://architect, agent://dba, agent://developer, agent://reviewer, agent://security, agent://qa. (Renamed from PascalCase / po in v0.6.0; older 0.5.x consumers must use agent://po instead.)
- severity://_severity-and-ownership: severity matrix + ownership rules.
- severity://skill-squad-dev, severity://skill-squad-review: full skill specs.
Bundled skills
The plugin auto-registers these skills via skills/:
Skill | Trigger | Purpose |
| implementation workflow | Single skill, two modes. |
| read-only code Q&A | Spawns the |
| read-only bug investigation | Bridges |
| Socratic plan validation | Grills your plan one question at a time against the project's domain language ( |
| pre-implementation research | Web research in parallel + specialist agent perspectives → options matrix with cited sources and a recommendation. Produces no code. Position: |
| cradle-to-grave orchestration | Chains six squad steps — |
| codebase audit / inventory | Scans the repo for a named pattern (defined by a YAML "recipe pack") and emits a structured Markdown report cross-referenced with framework metadata (routes, handlers). Hybrid pipeline: a deterministic |
| commit message generator | Read-only suggester for Conventional Commits messages. Runs only an allowlist of git commands; never executes mutations; never adds AI co-author trailers. The user runs the commit themselves. |
| observability dashboard | Read |
| auto-journaling opt-in | Copies the bundled |
Bundled subagents
The plugin's agents/ directory registers eleven native Claude Code subagents you can also dispatch directly via Task(subagent_type=…):
product-owner, architect, dba, developer, reviewer, security, qa, tech-lead-planner, tech-lead-consolidator, plus two utility roles: code-explorer (fast read-only code search; Haiku-class; dispatched by the planner for context gathering or by /squad:question for direct Q&A) and debugger (hypothesis-first bug investigation; Haiku-class; dispatched by /squad:debug to emit ranked root-cause hypotheses with file:line evidence and verification steps). Neither utility role scores the rubric or is auto-selected by the matrix.
The /squad:implement skill orchestrates them. For non-Claude-Code MCP clients (Cursor, Claude Desktop, Warp), the same role markdowns are accessible through the MCP agent://… resources and get_agent_definition tool.
Workflow positioning — each skill is standalone, and /squad:pipeline chains them:
flowchart LR
BS["/brainstorm<br/>decide what to build"] --> IM["/squad:implement<br/>implement what was decided"]
IM --> RV["/squad:review<br/>review what was implemented"]
RV --> CM["/commit-suggest<br/>craft the commit message"]
/squad:pipeline wraps this whole sequence (with /squad:grillme and /squad:tasks in between) as one guided, human-gated flow. See Cradle-to-grave with /squad:pipeline.
See INSTALL.md for trigger examples and the optional commit-msg git hook + permissions.deny snippet that hard-enforce the read-only and no-AI-attribution invariants at the OS / Claude Code layer.
Repo configuration — .squad.yaml
Drop a .squad.yaml (or .squad.yml) at the repo root to override defaults per-project. Versioned with the code, picked up automatically by compose_squad_workflow and compose_advisory_bundle.
# .squad.yaml: example for a regulated fintech backend
# Rubric weights (must sum to 100 across the agents you list).
# Agents NOT listed are zeroed out; listing weights is an explicit choice
# of which dimensions count for this repo.
weights:
  security: 30 # PCI compliance, security weighted higher
  dba: 22 # double-entry ledger, money on the line
  developer: 20
  architect: 15
  qa: 13
# Per-dimension flag threshold (default 75). Below this, the dimension is
# marked with ⚠ in the scorecard.
threshold: 80
# Quality floor: APPROVED with weighted score below this becomes
# CHANGES_REQUIRED. Severity rules (Blocker/Major) take precedence.
min_score: 75
# Files excluded from advisory. Glob syntax: ** for any depth, * for one
# segment, ? for one char. Useful for docs-only or generated paths.
skip_paths:
  - "docs/**"
  - "**/*.md"
  - "**/generated/**"
  - "vendor/**"
# Agents not relevant for this repo (e.g. internal tool, no PO involved).
disable_agents:
  - product-owner
All keys are optional; partial files merge with package defaults. force_agents in tool calls still wins over disable_agents (config is a default policy, not a veto over explicit caller intent). Validation is strict: weights that don't sum to 100, unknown agent names, or invalid threshold ranges are rejected with a clear error.
The reader is cached by mtime — long-running MCP servers automatically pick up edits without a restart.
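To make the interaction between weights, threshold, and min_score concrete, here is a minimal sketch of a weighted rubric calculation. It is not the server's rubric tool: the function names are hypothetical, and it assumes that agents with a weight but no reported score are skipped and the remaining weights renormalised, which the text above does not specify.

```typescript
type Verdict = "APPROVED" | "CHANGES_REQUIRED" | "REJECTED";

interface RubricResult {
  weighted: number;   // weighted score, 0-100, rounded to the nearest integer
  flagged: string[];  // agents whose score is below the per-dimension threshold
}

function scoreRubric(
  scores: Record<string, number>,
  weights: Record<string, number>,
  threshold = 75,
): RubricResult {
  const agents = Object.keys(weights);
  const total = agents.reduce((sum, agent) => sum + (weights[agent] ?? 0), 0);
  if (Math.abs(total - 100) > 1e-9) {
    throw new Error(`weights must sum to 100, got ${total}`);
  }
  let weightedSum = 0;
  let usedWeight = 0;
  const flagged: string[] = [];
  for (const agent of agents) {
    const weight = weights[agent] ?? 0;
    const score = scores[agent];
    // Assumption: an agent that did not report is skipped and the rest renormalised.
    if (score === undefined) continue;
    weightedSum += score * weight;
    usedWeight += weight;
    if (score < threshold) flagged.push(agent);
  }
  const weighted = usedWeight === 0 ? 0 : Math.round(weightedSum / usedWeight);
  return { weighted, flagged };
}

// Quality floor: only an APPROVED verdict is downgraded by a low weighted score.
function applyQualityFloor(verdict: Verdict, weighted: number, minScore: number): Verdict {
  return verdict === "APPROVED" && weighted < minScore ? "CHANGES_REQUIRED" : verdict;
}
```

With the example weights above and hypothetical scores of security 90, dba 80, developer 85, architect 70, and qa 75, the weighted score is 8185 / 100, which rounds to 82; at threshold 80 the architect and qa dimensions would be flagged, and the min_score floor of 75 leaves an APPROVED verdict unchanged.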
Learnings — persistent accept/reject memory
Each time the team accepts or rejects an advisory finding, the decision can be appended to .squad/learnings.jsonl. Future runs of the squad load recent decisions and inject them into per-agent and consolidator prompts so the squad stops re-raising findings the team has already considered.
{"ts":"2026-04-12T15:02:31Z","pr":42,"agent":"security","severity":"Major","finding":"missing CSRF on POST /api/refund","decision":"reject","reason":"CSRF terminated at API gateway, see infra/edge.tf","scope":"src/api/**"}
{"ts":"2026-04-15T09:18:11Z","pr":47,"agent":"architect","severity":"Major","finding":"cross-module coupling Auth → Billing","decision":"accept","reason":"refactored to event bus"}
The file lives in git. Decisions are auditable in PR diffs.
Recording decisions (v0.11.0+ Phase 12 prompt)
After /squad:review consolidates findings, the skill surfaces a single batched prompt at the end of the report. It groups the findings by agent + severity (Suggestion-level findings excluded) and asks one question:
Save which findings as precedents? Reply: accept N1,N2,N3 / reject N4 / all accept / skip / because <reason> to attach a rationale.
Each affirmative pick fires one record_learning call. Examples:
- accept 1,2 because we ship this pattern across services
- reject 3 (records the rejection without a reason; the squad will still suppress the same finding next run)
- all accept (accepts every Blocker / Major / Minor in the report)
- skip or empty response (records nothing)
Per-finding authorisation is required — silence or "thanks" is not authorisation. The skill never invents a reason; the text after because flows verbatim to record_learning.reason.
For non-MCP environments, use the CLI helper:
node tools/record-learning.mjs --reject \
--agent security \
--finding "missing CSRF on POST /api/refund" \
--reason "CSRF terminated at API gateway" \
--scope "src/api/**" \
--pr 42
How the squad uses them
In Phase 5 (per-agent advisory) the skill calls read_learnings(workspace_root, agent, changed_files) and injects the rendered ## Past team decisions block into the agent's prompt. In Phase 10 (consolidator) it does the same without an agent filter — the consolidator sees the full picture across agents.
Each agent is told: when a current finding matches a previously rejected decision (similar agent + similar finding text + matching scope), suppress or downgrade severity unless the diff materially changes the rationale. When a finding contradicts a previously accepted decision, flag the contradiction explicitly.
Lifecycle (v0.11.0+): archive + promote
Two new optional fields on each entry let the journal age gracefully without manual surgery:
- archived: true: the entry is past the team's age cutoff and is hidden from default read_learnings injection. The row stays on disk for forensics.
- promoted: true: the same finding (matched by canonicalised title) has been accepted ≥ N times and now surfaces FIRST in the rendered block as ⭐ PROMOTED. Advisors are instructed to treat promoted entries as team policy, not ordinary precedent.
Both flags are set by the prune_learnings MCP tool:
prune_learnings({
  workspace_root: <repo>,
  max_age_days: 180,   // entries older than this get archived: true
  min_recurrence: 3,   // accept-decisions on the same finding ≥ 3× get promoted: true
  dry_run: false       // set true to inspect counts without mutating
})
prune_learnings never auto-runs. The defaults are max_age_days: 0 (= disabled) and min_recurrence: 3; invoking with no arguments is a safe no-op. Wire it into a cron or pre-commit hook if you want regular housekeeping. Each non-no-op run produces an atomic rewrite of .squad/learnings.jsonl under the same file lock used by record_learning; concurrent readers either see the pre-rewrite or post-rewrite file in full, never a torn write. A .prev snapshot is kept alongside the file as the rollback point.
The v0.11.0 schema is additive and backward-compatible — a v0.10.x reader strips the unknown archived / promoted fields silently and continues. No schema_version bump.
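The archive and promote rules can be illustrated with a pure function over in-memory entries. This is a sketch under stated assumptions rather than the prune_learnings implementation: the option names are hypothetical, canonicalisation is assumed to be lowercasing plus whitespace collapsing, and file locking, the .prev snapshot, and the no-argument defaults are not modelled.

```typescript
interface Learning {
  ts: string;                       // ISO timestamp, as in learnings.jsonl
  agent: string;
  finding: string;
  decision: "accept" | "reject";
  archived?: boolean;
  promoted?: boolean;
}

interface PruneOptions {
  maxAgeDays: number;     // 0 disables archiving
  minRecurrence: number;  // accepts needed before a finding is promoted
  now: Date;              // injected so the function stays deterministic
}

// Assumption: "canonicalised title" means lowercase with collapsed whitespace.
function canonicalTitle(finding: string): string {
  return finding.toLowerCase().replace(/\s+/g, " ").trim();
}

function pruneLearnings(entries: Learning[], opts: PruneOptions): Learning[] {
  // Count accept decisions per canonical finding title.
  const acceptCounts = new Map<string, number>();
  for (const entry of entries) {
    if (entry.decision !== "accept") continue;
    const key = canonicalTitle(entry.finding);
    acceptCounts.set(key, (acceptCounts.get(key) ?? 0) + 1);
  }
  const cutoffMs =
    opts.maxAgeDays > 0 ? opts.now.getTime() - opts.maxAgeDays * 86400000 : null;

  // Return flagged copies; the input array and its rows are left untouched.
  return entries.map((entry) => {
    const next: Learning = { ...entry };
    if (cutoffMs !== null && Date.parse(entry.ts) < cutoffMs) next.archived = true;
    const accepts = acceptCounts.get(canonicalTitle(entry.finding)) ?? 0;
    if (entry.decision === "accept" && accepts >= opts.minRecurrence) next.promoted = true;
    return next;
  });
}
```

Rows are only flagged, never removed, which matches the statement that archived entries stay on disk for forensics.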
Configuration
Override defaults via .squad.yaml:
learnings:
  path: .squad/learnings.jsonl # default
  max_recent: 50 # how many recent entries to inject (hard cap 200)
  enabled: true # set false to disable injection without deleting the journal
The store reader is mtime-cached. The journal is append-only by design: the skill never amends or deletes past entries; correcting a stale decision means appending a new one.
Tasks — PRD-decomposed atomic work units
The biggest source of token bloat in a long-running squad session is the squad re-analysing the whole repo for every prompt. The tasks store fixes that by decomposing a PRD into atomic tasks up front, then running the squad on ONE task's narrowed scope at a time.
// .squad/tasks.json (excerpt)
{
  "version": 1,
  "tasks": [
    {
      "id": 1,
      "title": "Add CSRF token to checkout flow",
      "status": "done",
      "dependencies": [],
      "priority": "high",
      "scope": "src/api/checkout/**",
      "agent_hints": ["security", "developer"],
      "test_strategy": "POST without token → 403; POST with token → 200.",
      "subtasks": [],
      "created_at": "2026-05-08T12:00:00Z",
      "updated_at": "2026-05-09T15:30:00Z"
    },
    {
      "id": 2,
      "title": "Wire CSRF middleware into refund endpoint",
      "status": "pending",
      "dependencies": [1],
      "priority": "high",
      "scope": "src/api/refund/**",
      "subtasks": [],
      ...
    }
  ]
}
scope (glob) and agent_hints are squad-mcp-specific additions on top of the claude-task-master shape; they let slice_files_for_task and compose_squad_workflow narrow the advisory automatically.
Decomposing a PRD
Inside Claude Code:
/squad:tasks docs/prd-payments-refactor.md
The skill (Phase 0.5):
1. Calls compose_prd_parse with the PRD text.
2. Receives a prompt + JSON schema and runs them through Claude.
3. Shows you the parsed tasks (title, deps, priority, scope, agent_hints) for review.
4. Calls record_tasks only after you say "record" / "go" / "yes".
The parse is pure-MCP: the squad-mcp server never makes LLM calls. The host (Claude Code, Cursor, Warp) does the inference. No provider keys in the server, no surprises for non-Claude clients.
Working tasks
/squad:next # picks the highest-priority ready task
/squad:task 5 # explicit pick by id
For each task:
1. slice_files_for_task narrows the changed-files list to the task's scope.
2. compose_squad_workflow runs against that slice; if agent_hints is set, only those agents wake up.
3. Phase 1 onward proceeds normally, just with much less context.
4. When done, the skill flips status to done via update_task_status.
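The selection rule behind /squad:next (a pending task whose dependencies are all done, ordered by priority and then id) can be sketched as follows. The Task shape is trimmed to the fields the rule needs, the "medium" and "low" priority values are assumptions (only "high" appears in the excerpt above), and the sketch returns a bare null rather than the null + reason described for the real tool.

```typescript
type Priority = "high" | "medium" | "low";  // only "high" is confirmed by the excerpt

interface Task {
  id: number;
  status: string;          // "pending", "in-progress", "done", ...
  dependencies: number[];
  priority: Priority;
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

function pickNextTask(tasks: Task[]): Task | null {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  // A task is ready when it is pending and every dependency is already done.
  const ready = tasks.filter(
    (t) => t.status === "pending" && t.dependencies.every((dep) => done.has(dep)),
  );
  // Tiebreak: priority first, then lowest id.
  ready.sort((a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.id - b.id);
  return ready[0] ?? null;
}
```

For the two-task excerpt above, task 1 is done and task 2 depends only on it, so this sketch would return task 2.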
Configuration
Override defaults via .squad.yaml:
tasks:
  path: .squad/tasks.json # default
  enabled: true # set false to silence reads without deleting the file
Writes (record_tasks, update_task_status, expand_task) stay open even when reads are disabled, the same policy as learnings. Disabling injection should not throw away the journal.
CLI for non-MCP environments
Mirroring the post-review and record-learning helpers:
# decompose offline (you generate the JSON yourself or via another tool)
echo '[{"title":"Add CSRF","scope":"src/api/**"}]' | node tools/record-tasks.mjs
# inspect
node tools/list-tasks.mjs --status pending
node tools/next-task.mjs --json
# flip status from CI
node tools/update-task-status.mjs --task 5 --status done
The CLIs share tools/_tasks-io.mjs for read/write and require only node 18+. Schema validation is lighter than the MCP tool, so production use should prefer the MCP path.
Posting reviews to PRs (GitHub + Bitbucket Cloud)
Once the squad runs, you can post the verdict + scorecard as a PR review on GitHub or Bitbucket Cloud. The skill /squad:review #42 runs the advisory and offers to post the result; default behaviour is dry-run + confirmation — Claude shows the exact request and the markdown body, then waits for your "go" before posting.
# auto-detect platform from `git remote get-url origin`
echo '<consolidation JSON>' | node tools/post-review.mjs --pr 42 --dry-run
echo '<consolidation JSON>' | node tools/post-review.mjs --pr 42
# force a platform
echo '<consolidation JSON>' | node tools/post-review.mjs --pr 31 --platform bitbucket-cloud --repo repos_acgsa/some-repo
The CLI maps verdict → review action deterministically:
Verdict | Score signal | GitHub | Bitbucket Cloud |
| — |
|
|
| — |
|
|
| weighted < |
|
|
| (opt-in floor) |
|
|
| passes threshold |
|
|
Platform auto-detection
--platform auto (default) parses git remote get-url origin:
- github.com/<owner>/<repo> → GitHub
- bitbucket.org/<workspace>/<repo> → Bitbucket Cloud
- Anything else → exit 6 with a clear error. Pass --platform <name> --repo <a>/<b> to override.
Bitbucket Server / Data Center (self-hosted) is not supported — it has a different REST API surface and would need a separate adapter.
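Auto-detection of this kind can be approximated with one regular expression over the remote URL. The sketch below is an assumption about how such parsing might look, not the code in tools/post-review.mjs: the function name and the "github" platform label are hypothetical ("bitbucket-cloud" is the value shown in the CLI example), and it returns null where the CLI would exit with code 6.

```typescript
type Platform = "github" | "bitbucket-cloud";

interface RepoRef {
  platform: Platform;
  owner: string;   // GitHub owner or Bitbucket workspace
  repo: string;
}

// Accepts HTTPS and SSH remotes, with or without a trailing ".git".
const REMOTE_RE =
  /(?:^|[@\/])(github\.com|bitbucket\.org)[:\/]([^\/\s]+)\/([^\/\s]+?)(?:\.git)?\/?$/;

function detectPlatform(remoteUrl: string): RepoRef | null {
  const match = REMOTE_RE.exec(remoteUrl.trim());
  if (match === null) return null;
  const host = match[1];
  const owner = match[2];
  const repo = match[3];
  if (!host || !owner || !repo) return null;
  return {
    platform: host === "github.com" ? "github" : "bitbucket-cloud",
    owner,
    repo,
  };
}
```

The leading `(?:^|[@\/])` guard keeps look-alike hosts such as notgithub.com from matching; remotes with an explicit port are not handled and fall through to null.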
Auth
Platform | Mechanism |
GitHub |
|
Bitbucket Cloud |
|
For Bitbucket Cloud, generate an API Token at https://id.atlassian.com/manage-profile/security/api-tokens with the pullrequest:write scope (App Passwords were deprecated by Atlassian in 2025). Auth is HTTP Basic with email:apiToken.
Severity budget (A.3, May 2026)
Cap how many findings get expanded inline in the PR body before collapsing the surplus into a footnote. Drops happen lowest-severity-first; Blockers are never silently dropped. Useful for big PRs and platforms with tight rate limits (Bitbucket Cloud is 1000 req/h per user).
CLI:
node tools/post-review.mjs --pr 42 --severity-cap 20 --drop-below Minor
Repo default via .squad.yaml:
pr_posting:
  severity_budget:
    per_pr_max: 20 # cap total expanded findings (Blockers exempt)
    drop_below: Minor # hard floor: anything strictly below this drops FIRST
When the budget hides anything, the body carries a footnote like _Severity budget hid 7 findings (4 minor, 3 suggestion). Tune pr_posting.severity_budget in .squad.yaml._ so it's never silent.
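One way to read the budget rules is sketched below. The exact semantics are an interpretation, not confirmed behaviour: this version treats drop_below as an unconditional floor, applies per_pr_max only to non-Blocker findings, and hides the surplus lowest-severity-first. The function and type names are hypothetical.

```typescript
const SEVERITIES = ["Blocker", "Major", "Minor", "Suggestion"] as const;
type Severity = (typeof SEVERITIES)[number];

interface Finding {
  severity: Severity;
  title: string;
}

interface BudgetResult {
  shown: Finding[];   // expanded inline in the PR body
  hidden: Finding[];  // collapsed into the footnote count
}

function applySeverityBudget(
  findings: Finding[],
  perPrMax: number,
  dropBelow: Severity,
): BudgetResult {
  const rank = (s: Severity): number => SEVERITIES.indexOf(s); // 0 = most severe
  const floor = rank(dropBelow);
  const blockers: Finding[] = [];
  const candidates: Finding[] = [];
  const hidden: Finding[] = [];
  for (const finding of findings) {
    if (finding.severity === "Blocker") {
      blockers.push(finding);            // Blockers are never dropped
    } else if (rank(finding.severity) > floor) {
      hidden.push(finding);              // strictly below the floor drops first
    } else {
      candidates.push(finding);
    }
  }
  // Most severe first, so the cap trims from the low-severity end.
  candidates.sort((a, b) => rank(a.severity) - rank(b.severity));
  const kept = candidates.slice(0, Math.max(0, perPrMax));
  for (const surplus of candidates.slice(kept.length)) hidden.push(surplus);
  return { shown: [...blockers, ...kept], hidden };
}
```

Under this reading, one Blocker, two Major, two Minor, and two Suggestion findings with a cap of 3 and a floor of Minor would show the Blocker, both Majors, and one Minor, and hide three findings (two Suggestions and one Minor).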
SARIF / JSON output (A.2, May 2026)
Emit a SARIF 2.1.0 artefact for CI gating, IDE annotations, and dedup with linters. Three modes:
# markdown only (default — historical behaviour)
node tools/post-review.mjs --pr 42
# SARIF only — writes .squad/last-review.sarif.json, SKIPS the PR post
node tools/post-review.mjs --output-format sarif
# both — SARIF + post to PR
node tools/post-review.mjs --pr 42 --output-format both
Override path with --sarif-path <file>. Each result carries a partialFingerprints.canonicalHash (16-char sha256) built from (agent, severity, normalized title). The hash is stable across rebases and enables future dedup-on-rerun and cross-tool dedup with Sonar / CodeQL.
Repo default via .squad.yaml:
pr_posting:
  output_format: both # markdown | sarif | both (default markdown)
GitHub Code Scanning, GitLab SAST, Sonar, and most ingestion pipelines consume SARIF 2.1.0 directly.
Auto-post (opt-in)
If .squad.yaml has pr_posting.auto_post: true, the skill posts without the second confirmation prompt — but always shows the body first. Auto-post means "skip the second yes/no", not "skip the preview".
pr_posting:
  auto_post: true # default false: always asks
  request_changes_below_score: 50 # below this, post --request-changes instead of --approve
  omit_attribution_footer: false # default false: footer present
Detection strategy (select_squad / slice_files_for_agent)
Three layers, in order of strength:
1. Content sniff: reads the first 16 KB of each file, matches token regexes (e.g. class : DbContext, [ApiController], services.AddScoped<>, from 'express', prisma.<model>.findMany, from sqlalchemy, gorm.Open, gin.New). Strong signal, name-agnostic. Patterns can be ext-gated (e.g. only .py for from sqlalchemy) to avoid cross-stack false positives.
2. Path hint: file path regex (e.g. *Repository.cs, Migrations/, Controller.cs, api/, models/). Cheap, complementary.
3. Conventions: each agent flags non-conformant naming as a finding so future detections improve over time.
Output of select_squad includes per-file evidence with confidence and low_confidence_files for unclassified files. Override via the force_agents parameter or by editing local agent definitions.
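The first two layers can be pictured as two rule tables applied to each file. The sketch below is illustrative only: the rule shapes, the confidence labels, and the agent assigned to each pattern are assumptions (the text above lists the patterns but not their owners), and the 16 KB limit is approximated by character count rather than bytes.

```typescript
interface SniffRule {
  agent: string;
  pattern: RegExp;
  exts?: string[];   // optional extension gate
}

interface PathRule {
  agent: string;
  pattern: RegExp;
}

interface Evidence {
  agent: string;
  layer: "content" | "path";
  confidence: "high" | "medium";   // hypothetical labels
}

// Agent assignments here are guesses for illustration.
const SNIFF_RULES: SniffRule[] = [
  { agent: "dba", pattern: /from sqlalchemy/, exts: [".py"] },
  { agent: "dba", pattern: /prisma\.\w+\.findMany/ },
  { agent: "developer", pattern: /from 'express'/ },
];

const PATH_RULES: PathRule[] = [
  { agent: "dba", pattern: /Repository\.cs$|(^|\/)Migrations\// },
  { agent: "developer", pattern: /Controller\.cs$|(^|\/)api\// },
];

function classifyFile(path: string, content: string): Evidence[] {
  const head = content.slice(0, 16 * 1024);   // only the start of the file is sniffed
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot) : "";
  const evidence: Evidence[] = [];
  for (const rule of SNIFF_RULES) {
    if (rule.exts !== undefined && rule.exts.indexOf(ext) === -1) continue;
    if (rule.pattern.test(head)) {
      evidence.push({ agent: rule.agent, layer: "content", confidence: "high" });
    }
  }
  for (const rule of PATH_RULES) {
    if (rule.pattern.test(path)) {
      evidence.push({ agent: rule.agent, layer: "path", confidence: "medium" });
    }
  }
  return evidence;   // an empty array would correspond to a low-confidence file
}
```

The extension gate is what keeps a TypeScript file that merely mentions `from sqlalchemy` in a string from waking the dba agent.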
Local override of agent definitions
The loader picks ONE local override directory:
- If SQUAD_AGENTS_DIR is set, that path is used exclusively (the platform default is not consulted).
- Otherwise: %APPDATA%\squad-mcp\agents on Windows, $XDG_CONFIG_HOME/squad-mcp/agents on Unix (falls back to ~/.config/squad-mcp/agents if XDG_CONFIG_HOME is unset).
Per-file resolution: if the agent's *.md exists in the chosen local directory, it wins. Otherwise, the embedded default bundled in the package is used.
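The directory choice and the per-file fallback can be sketched as two small pure functions. This is a hedged illustration, not the loader itself: the function names are hypothetical, the environment and platform are passed in so the sketch has no Node dependency, the error on a missing APPDATA is an assumption, and the allowlist validation is not modelled.

```typescript
type Env = Record<string, string | undefined>;

function resolveOverrideDir(env: Env, platform: string, homeDir: string): string {
  // An explicit SQUAD_AGENTS_DIR is used exclusively.
  const explicit = env["SQUAD_AGENTS_DIR"];
  if (explicit) return explicit;

  if (platform === "win32") {
    const appData = env["APPDATA"];
    // Assumption: a missing APPDATA is treated as a configuration error.
    if (!appData) throw new Error("APPDATA is not set");
    return `${appData}\\squad-mcp\\agents`;
  }
  // Unix: XDG_CONFIG_HOME, falling back to ~/.config when unset or empty.
  const base = env["XDG_CONFIG_HOME"] || `${homeDir}/.config`;
  return `${base}/squad-mcp/agents`;
}

// Per-file resolution: a local <agent>.md wins, otherwise the embedded default.
function pickDefinitionSource(
  agent: string,
  localFiles: ReadonlySet<string>,
): "local" | "embedded" {
  return localFiles.has(`${agent}.md`) ? "local" : "embedded";
}
```

For example, on Linux with no relevant variables set and a home directory of /home/dev, the sketch resolves to /home/dev/.config/squad-mcp/agents; if that directory only contains security.md, every other agent falls back to its embedded default.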
Override files are loaded verbatim and rendered into the LLM's context with full agent authority — treat the directory as code (user-only writable, not on shared volumes, never sourced from untrusted input).
Since v0.4.0, the override directory is validated against an allowlist (HOME, APPDATA, LOCALAPPDATA, XDG_CONFIG_HOME, process.cwd()); paths outside the allowlist are rejected with OVERRIDE_REJECTED. Set SQUAD_AGENTS_ALLOW_UNSAFE=1 to bypass for unusual setups (logs a warn banner). See INSTALL.md for the full security guidance.
Run the init_local_config tool once to seed the local directory with editable defaults.
Repo layout
squad-mcp/
├── .claude-plugin/ # Claude Code plugin manifest + marketplace
├── .github/workflows/ # CI + release workflows
├── agents/ # Native subagents (one .md per subagent, kebab-case + frontmatter)
├── shared/ # Severity matrix + skill specs (resources, not subagents — kept outside agents/ for the plugin manifest validator)
├── commands/ # Slash commands (/squad:implement, /squad:review, /brainstorm, /commit-suggest)
├── skills/ # Bundled skills
│ ├── squad/ # single skill, two modes (implement | review)
│ ├── brainstorm/
│ └── commit-suggest/
├── src/
│ ├── index.ts # stdio entry
│ ├── tools/ # MCP tools (deterministic functions)
│ ├── resources/ # MCP resources + agent loader
│ ├── prompts/ # MCP prompt templates
│ ├── exec/git.ts # hardened git execution layer
│ ├── observability/logger.ts # structured stderr JSON logs
│ ├── util/path-safety.ts # path-traversal-safe resolution
│ └── config/
│ └── ownership-matrix.ts # agents, work types, content/path patterns
├── tests/ # vitest unit + integration + stdio smoke
├── tools/
│ └── git-hooks/commit-msg # opt-in hook rejecting AI-attribution trailers
└── dist/ # compiled JS (gitignored, shipped via npm)
Tests
npm test # vitest (unit + integration)
node tests/smoke.mjs # stdio JSON-RPC smoke test (requires npm run build first)
Versioning + release
This project follows SemVer. Releases are tagged vX.Y.Z on main, which triggers the .github/workflows/release.yml workflow to publish @gempack/squad-mcp@X.Y.Z to npm with provenance. See CHANGELOG.md for the version history.
Contributing
Issues and PRs welcome at https://github.com/ggemba/squad-mcp. Run npm test && npm run build before opening a PR. CI runs on Linux + Windows on Node 22 and 24.
License
Apache-2.0. See NOTICE for attribution and third-party dependencies.