# Development Method (FPF-first): FPF Agent Stack
This document covers the method, its ceremonies, and its gates.
## 1. Core working agreements
- Describe, then test: do not call something a Spec until a harness can falsify it.
- Keep plan vs run separate: WorkPlan (intended schedule) is not Work (executed reality).
- Treat all model output as untrusted input; the executor is deterministic and schema-driven.
- Guards are tri-state: pass | degrade | abstain. Unknown never becomes pass.
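
The last two agreements can be sketched together as a guard that treats raw model output as untrusted and returns one of the three states. This is a minimal illustration, not the stack's actual executor: the `{"tool": ..., "args": ...}` call shape, the `check_tool_call` name, and the rule that unexpected arguments lead to `degrade` are all assumptions made for the example.

```python
import json
from enum import Enum


class Decision(Enum):
    PASS = "pass"
    DEGRADE = "degrade"
    ABSTAIN = "abstain"


def check_tool_call(raw: str, schemas: dict[str, set[str]]) -> Decision:
    """Validate untrusted model output against known tool schemas.

    `schemas` maps a tool name to its required argument names
    (a hypothetical, simplified stand-in for a real schema).
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return Decision.ABSTAIN  # unparseable output is unknown, never pass

    if not isinstance(call, dict):
        return Decision.ABSTAIN

    tool = call.get("tool")
    args = call.get("args")
    if not isinstance(tool, str) or tool not in schemas or not isinstance(args, dict):
        return Decision.ABSTAIN

    required = schemas[tool]
    if not required.issubset(args):
        return Decision.ABSTAIN  # missing required arguments
    if set(args) - required:
        # Assumption: unexpected arguments degrade rather than abstain.
        return Decision.DEGRADE
    return Decision.PASS
```

Every branch that cannot positively confirm the call returns `ABSTAIN` or `DEGRADE`; `PASS` is reachable only on the final line, which is one way to keep "unknown" from leaking into "pass".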
## 2. Artifact flow (I/D/S style)
Keep the pipeline short and falsifiable:
| Stage | Artifact | Gate |
| :--- | :--- | :--- |
| **D** | Functional Description (this stack) | Stakeholder review: is it useful? |
| **S** | BDD feature files (Cucumber) | Cucumber green on CI |
| **Impl** | Skills + runtime code | Unit tests + schema checks |
| **E** | Evidence bundle (AgentFS audit + test reports) | Auditor can reproduce from snapshot |
## 3. Skill authoring pattern (Atomize + Route)
Every skill/tool must ship four parts (lightweight but explicit):
- **L-TOOL-xx:** Definition (what it means / contract).
- **A-TOOL-xx:** Admissibility (guard predicate; what must be true to run).
- **D-TOOL-xx:** Duties (who must do what; retention, reviews).
- **E-TOOL-xx:** Evidence carriers (logs/traces used to adjudicate A-TOOL-xx).
Put these in `SKILL.md` so the agent and humans see the same contract.
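
A static check could confirm that a `SKILL.md` mentions all four parts for a given tool. The sketch below assumes the labels appear literally in the file (for example `A-TOOL-01`); the `missing_parts` helper and the numeric `tool_id` format are hypothetical.

```python
import re

# The four part prefixes named above: Definition, Admissibility, Duties, Evidence.
REQUIRED_PREFIXES = ("L", "A", "D", "E")


def missing_parts(skill_md: str, tool_id: str) -> list[str]:
    """Return the part labels (e.g. 'A-TOOL-01') that the text never mentions."""
    missing = []
    for prefix in REQUIRED_PREFIXES:
        label = f"{prefix}-TOOL-{tool_id}"
        # Word boundaries avoid matching a label embedded in a longer token.
        if not re.search(rf"\b{re.escape(label)}\b", skill_md):
            missing.append(label)
    return missing
```

A CI step might fail the build whenever `missing_parts` returns a non-empty list for any shipped skill.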
## 4. Quality gates (CI)
- **Static:** lint, typecheck, schema validation for every tool interface.
- **Dynamic:** Cucumber (acceptance) + unit tests (contract-level).
- **Audit:** each Cucumber run stores an AgentFS session DB and a RunTrace artifact.
- **Freshness:** evidence artifacts carry valid-until dates; stale evidence accrues epistemic debt and triggers review.
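
The freshness gate could be approximated as below. The `Evidence` record, the `overdue` helper, and the use of days overdue as a rough measure of epistemic debt are assumptions for illustration; the valid-until date is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    name: str
    valid_until: date  # last day on which the evidence still counts as fresh


def overdue(evidence: list[Evidence], today: date) -> list[tuple[str, int]]:
    """Return (name, days overdue) for every artifact past its valid-until date."""
    return [
        (e.name, (today - e.valid_until).days)
        for e in evidence
        if e.valid_until < today
    ]
```

Any non-empty result would be the trigger for review; how the debt is weighed is left to the reviewing team.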
## 5. Proxy audit loop (avoid Goodhart)
Whenever you introduce a metric (latency, pass rate, etc.), explicitly declare what objective it proxies and schedule a periodic review of whether it still tracks that objective.
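
One way to make the declaration explicit is to refuse to register a metric without a stated objective and a review interval. The `ProxyMetric` record and its field names are hypothetical; this is a sketch of the idea, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProxyMetric:
    name: str
    proxies_for: str        # the objective this metric stands in for
    last_reviewed: date
    review_every_days: int

    def __post_init__(self) -> None:
        # A metric with no declared objective is rejected outright.
        if not self.proxies_for.strip():
            raise ValueError(f"metric {self.name!r} must declare its objective")
        if self.review_every_days <= 0:
            raise ValueError("review interval must be a positive number of days")

    def review_due(self, today: date) -> bool:
        """True once the review interval has elapsed since the last review."""
        return today >= self.last_reviewed + timedelta(days=self.review_every_days)
```

For example, `ProxyMetric("pass_rate", "skills behave as specified", date(2024, 1, 1), 90)` becomes due for review on 2024-03-31.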
## Appendix A. Dev Plan (6 milestones)
| Milestone | Outcome | Acceptance evidence |
| :--- | :--- | :--- |
| **M0: Scaffold** | Repo layout, skill loader skeleton, Cucumber runner wired | Cucumber executes a dummy feature |
| **M1: Minimal agent loop** | FunctionGemma selects one tool; schema validation; abstain on errors | Features: tool selection + schema-fail behavior pass |
| **M2: AgentFS integration** | All tool execution inside AgentFS session; diff + audit captured | Features: overlay isolation + audit log pass |
| **M3: Skill pack v1** | 3-5 core skills (e.g., file ops, repo search, doc generation) | Feature: skill discovery + version pinning pass |
| **M4: Guard + evidence discipline** | Per-tool guards + evidence links; tri-state decisions enforced | Features: unknown never passes; degrade/abstain rules pass |
| **M5: Train/evaluate loop** | Dataset from traces; fine-tune; regressions prevented | Cucumber green + lower schema-error rate |
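
The M5 acceptance evidence could be checked with a comparison like the following. It assumes each trace has already been reduced to a boolean (`True` when the tool call failed schema validation); the function names and the strict "lower than baseline" rule are illustrative assumptions.

```python
def schema_error_rate(failures: list[bool]) -> float:
    """Fraction of tool calls that failed schema validation (True = failed)."""
    if not failures:
        raise ValueError("no outcomes to evaluate")
    return sum(failures) / len(failures)


def accept_candidate(
    baseline: list[bool], candidate: list[bool], cucumber_green: bool
) -> bool:
    """Accept a fine-tuned model only if Cucumber is green and the rate dropped."""
    return cucumber_green and (
        schema_error_rate(candidate) < schema_error_rate(baseline)
    )
```

Per the proxy audit loop in section 5, the schema-error rate is itself a proxy and would need its own declared objective and review schedule.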