**Development Method (FPF-first): FPF Agent Stack**
Method + ceremonies + gates. Accessed 2026-01-09.
# 1. Core working agreements
- Describe, then test: do not call something a Spec until a harness can falsify it.
- Keep plan and run separate: a WorkPlan (the intended schedule) is not Work (the executed reality).
- Treat all model output as untrusted input; the executor is deterministic and schema-driven.
- Guards are tri-state: pass \| degrade \| abstain. Unknown never becomes pass.
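The last two agreements can be sketched in a few lines of Python. This is a minimal illustration, not the stack's actual executor: the `REQUIRED_FIELDS` schema, the `parse_tool_call` and `guard` names, and the rule that an unknown check may degrade only when a reduced-mode fallback exists are all assumptions made for the example.

```python
import json
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "pass"
    DEGRADE = "degrade"
    ABSTAIN = "abstain"


# Hypothetical schema: the keys a tool call must carry, with their types.
REQUIRED_FIELDS = {"tool": str, "args": dict}


def parse_tool_call(raw: str) -> Optional[dict]:
    """Treat model output as untrusted: return a dict only if it fits the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected):
            return None
    return data


def guard(check: Optional[bool], has_fallback: bool = False) -> Verdict:
    """Tri-state decision; `check` is None when the predicate could not be evaluated."""
    if check is True:
        return Verdict.PASS
    if check is None and has_fallback:
        return Verdict.DEGRADE  # unknown: proceed only in a reduced mode (assumed policy)
    return Verdict.ABSTAIN      # failed, or unknown with no fallback


call = parse_tool_call('{"tool": "read_file", "args": {"path": "a.txt"}}')
verdict = guard(True if call is not None else None)
print(verdict.value)
```

The point of the design is that `guard` has no code path from `None` to `Verdict.PASS`, so an unevaluated predicate can never be promoted to a pass by accident.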
# 2. Artifact flow (I/D/S style)
Keep the pipeline short and falsifiable:
--------------------------------------------------------------------------------------------------------------
Stage                   Artifact                                         Gate
----------------------- ------------------------------------------------ -------------------------------------
D                       Functional Description (this stack)              Stakeholder review: is it useful?

S                       BDD feature files (Cucumber)                     Cucumber green on CI

Impl                    Skills + runtime code                            Unit tests + schema checks

E                       Evidence bundle (AgentFS audit + test reports)   Auditor can reproduce from snapshot
--------------------------------------------------------------------------------------------------------------
# 3. Skill authoring pattern (Atomize + Route)
Every skill/tool must ship four parts (lightweight but explicit):
- L-TOOL-xx: Definition (what it means / contract).
- A-TOOL-xx: Admissibility (guard predicate; what must be true to run).
- D-TOOL-xx: Duties (who must do what; retention, reviews).
- E-TOOL-xx: Evidence carriers (logs/traces used to adjudicate A-TOOL-xx).
Put these in SKILL.md so the agent and humans see the same contract.
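One way to keep the four parts explicit is to hold them in a small record and refuse to publish a skill while any part is empty. The sketch below assumes a hypothetical `SkillContract` class and a simple bullet layout for SKILL.md; neither is prescribed by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    tool_id: str        # fills the "xx" in the labels, e.g. "01"
    definition: str     # L-TOOL-xx: what the tool means / its contract
    admissibility: str  # A-TOOL-xx: guard predicate
    duties: str         # D-TOOL-xx: who must do what
    evidence: str       # E-TOOL-xx: logs/traces used to adjudicate A-TOOL-xx

    def parts(self) -> dict:
        """Return the four parts keyed by their labels, in a fixed order."""
        return {
            f"L-TOOL-{self.tool_id}": self.definition,
            f"A-TOOL-{self.tool_id}": self.admissibility,
            f"D-TOOL-{self.tool_id}": self.duties,
            f"E-TOOL-{self.tool_id}": self.evidence,
        }

    def missing_parts(self) -> list:
        """Labels whose text is empty or whitespace only."""
        return [label for label, text in self.parts().items() if not text.strip()]

    def to_markdown(self) -> str:
        """Render the parts as a bullet list (assumed SKILL.md layout)."""
        return "\n".join(f"- {label}: {text}" for label, text in self.parts().items())


contract = SkillContract(
    tool_id="01",
    definition="Read a UTF-8 text file and return its contents.",
    admissibility="Path resolves inside the session workspace.",
    duties="",  # left empty on purpose to show the check
    evidence="Audit log entry for the read.",
)
print(contract.missing_parts())
```

A CI step could fail the build whenever `missing_parts()` is non-empty, which keeps the contract honest without adding much ceremony.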
# 4. Quality gates (CI)
- Static: lint, typecheck, schema validation for every tool interface.
- Dynamic: Cucumber (acceptance) + unit tests (contract-level).
- Audit: each Cucumber run stores an AgentFS session DB + RunTrace artifact.
- Freshness: evidence artifacts carry valid-until dates; stale evidence accrues epistemic debt and triggers review.
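The freshness gate reduces to a date comparison. The following sketch assumes a hypothetical `Evidence` record and treats an artifact as still valid on its valid-until date; the real artifact format and cutoff rule may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    name: str
    valid_until: date  # last day on which the artifact counts as fresh (assumed inclusive)


def stale_evidence(items, today):
    """Return the artifacts whose valid-until date has passed."""
    return [e for e in items if e.valid_until < today]


bundle = [
    Evidence("cucumber-report", date(2026, 3, 1)),
    Evidence("runtrace-snapshot", date(2025, 12, 31)),
]
for item in stale_evidence(bundle, today=date(2026, 1, 9)):
    print(f"review needed: {item.name}")
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.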
# 5. Proxy audit loop (avoid Goodhart)
Whenever you introduce a metric (latency, pass rate, etc.), explicitly declare which objective it proxies and schedule a periodic review of whether it still tracks that objective.
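A metric declaration can be made to carry its objective and review cadence from the start. This is a sketch under assumptions: the `ProxyMetric` class, its field names, and the example metrics are illustrative and not part of the stack.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProxyMetric:
    name: str
    proxies: str            # the objective this metric stands in for
    review_every_days: int  # review cadence
    last_reviewed: date

    def __post_init__(self):
        # Refuse metrics that do not say what they are a proxy for.
        if not self.proxies.strip():
            raise ValueError(f"metric {self.name!r} must declare the objective it proxies")
        if self.review_every_days <= 0:
            raise ValueError("review_every_days must be positive")

    def next_review(self) -> date:
        return self.last_reviewed + timedelta(days=self.review_every_days)


def due_for_review(metrics, today):
    """Names of metrics whose scheduled review date has arrived."""
    return [m.name for m in metrics if m.next_review() <= today]


metrics = [
    ProxyMetric("cucumber_pass_rate", "skills behave as specified", 30, date(2026, 1, 1)),
    ProxyMetric("p95_latency_ms", "agent feels responsive to users", 7, date(2026, 1, 1)),
]
print(due_for_review(metrics, today=date(2026, 1, 9)))
```

Making the objective a required field means an undeclared proxy fails at construction time instead of surfacing later as a silently optimized number.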
# Appendix A. Dev Plan (6 milestones)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Milestone                         Outcome                                                                Acceptance evidence
--------------------------------- ---------------------------------------------------------------------- ----------------------------------------------------------
M0: Scaffold                      Repo layout, skill loader skeleton, Cucumber runner wired              Cucumber executes a dummy feature

M1: Minimal agent loop            FunctionGemma selects one tool; schema validation; abstain on errors   Features: tool selection + schema-fail behavior pass

M2: AgentFS integration           All tool execution inside AgentFS session; diff + audit captured       Features: overlay isolation + audit log pass

M3: Skill pack v1                 3-5 core skills (e.g., file ops, repo search, doc generation)          Feature: skill discovery + version pinning pass

M4: Guard + evidence discipline   Per-tool guards + evidence links; tri-state decisions enforced         Features: unknown never pass; degrade/abstain rules pass

M5: Train/evaluate loop           Dataset from traces; fine-tune; regressions prevented                  Cucumber green + lower schema-error rate
-------------------------------------------------------------------------------------------------------------------------------------------------------------------