# Behavioral Specification: FPF Agent Stack Language
This document is the definitive natural-language description of the FPF Agent Stack's behaviors. It maps directly to executable BDD (Cucumber) tests and serves as the primary "Language" for system verification.

---
## 1. Skill Management
### 1.1 Discovery and Selection
The runtime must expose only the capabilities a request actually needs, both to conserve tokens and to reduce the attack surface.
- **Minimal Loading**: When a user submits a request (e.g., search the repository and report the findings), the agent must select only the relevant skills (e.g., `repo-search`, `file-write`).
- **Abstention**: If no skills match the request (e.g., "Book a flight"), the agent must refuse the task with an `abstain` decision and a clear explanation.
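
The following Python is a minimal sketch of minimal loading and abstention. It matches the words in a request against a keyword set for each skill. The `Skill` and `Selection` types, the `keywords` field, and `select_skills` are hypothetical. Keyword overlap stands in for whatever relevance ranking the runtime actually uses, such as model-based planning. The point is the shape of the contract. A non-empty selection yields `pass` with only the matching skill IDs. An empty selection yields `abstain` with an explanation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    skill_id: str
    keywords: frozenset[str]  # hypothetical trigger words used for matching


@dataclass
class Selection:
    decision: str        # "pass" or "abstain"
    skills: list[str]    # IDs of the skills exposed to the model
    explanation: str


def select_skills(request: str, registry: list[Skill]) -> Selection:
    # Naive tokenization: lowercase words with trailing punctuation stripped.
    words = {w.strip(".,!?").lower() for w in request.split()}
    # Expose only skills whose keywords overlap the request.
    chosen = [s.skill_id for s in registry if s.keywords & words]
    if not chosen:
        return Selection("abstain", [], f"No loaded skill matches the request {request!r}.")
    return Selection("pass", chosen, f"Selected {len(chosen)} of {len(registry)} skills.")


REGISTRY = [
    Skill("repo-search", frozenset({"search", "find", "grep"})),
    Skill("file-write", frozenset({"write", "report", "save"})),
    Skill("web-fetch", frozenset({"fetch", "download", "url"})),
]

print(select_skills("Search the repo and write a report", REGISTRY).skills)  # ['repo-search', 'file-write']
print(select_skills("Book a flight", REGISTRY).decision)                     # abstain
```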
### 1.2 Parsing and Integrity
The system treats `SKILL.md` files as the source of truth for both human and machine instructions.
- **Boundary Extraction**: Metadata such as the skill name and ID must be correctly extracted from the Markdown frontmatter.
- **Kernel Extraction**: The core prompt instructions (the "Kernel") must be accurately read from the body of the file.
- **Permission Enforcement**: If the skill definition does not explicitly allow any tools, the runtime must default to an empty permission list, so the skill may not call any tools.
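
The following is a minimal parsing sketch. It assumes a `SKILL.md` layout with a frontmatter block of flat `key: value` lines between two `---` lines, followed by the kernel body. The `allowed-tools` key and its comma-separated format are assumptions. So are the names `SkillDoc` and `parse_skill_md`. A real implementation would likely use a YAML parser for the frontmatter. If `allowed-tools` is absent, the parser returns an empty list rather than granting tools implicitly.

```python
from dataclasses import dataclass


@dataclass
class SkillDoc:
    meta: dict[str, str]      # frontmatter fields, e.g. name and id
    kernel: str               # prompt instructions taken from the body
    allowed_tools: list[str]  # empty unless explicitly granted


def parse_skill_md(text: str) -> SkillDoc:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a '---' frontmatter line")
    # Find the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated frontmatter block")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    kernel = "\n".join(lines[end + 1:]).strip()
    # Absent or empty allowed-tools means no tools: deny by default.
    tools = [t.strip() for t in meta.get("allowed-tools", "").split(",") if t.strip()]
    return SkillDoc(meta=meta, kernel=kernel, allowed_tools=tools)


SAMPLE = """---
name: Repo Search
id: repo-search
---
Search the repository and return matching file paths.
"""

doc = parse_skill_md(SAMPLE)
print(doc.meta["id"], doc.allowed_tools)  # repo-search []
```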
### 1.3 Usability & Completeness (U.MethodDescription)
The system requires that the `SKILL.md` artifact function as a complete `U.MethodDescription` (A.3.2) alongside a `U.ServiceClause` (A.2.3).
- **Zero-Shot Enactment**: Agents must be able to construct a valid `U.RoleAssignment` and `U.WorkPlan` from the `U.Method.interface` alone, without relying on latent knowledge.
- **Epistemic Completeness**: The artifact must contain the full `U.Episteme` required for the role.
---
## 2. Sandbox and Safety
### 2.1 AgentFS Isolation
To prevent accidental damage to the host system, all operations are sandboxed.
- **Copy-on-Write**: Modifications made within a session (e.g., writing to `README.md`) must be visible inside that session but must *never* modify the original files in the host workspace.
- **Persistent State**: The host workspace remains in its original state until a session is explicitly committed or exported.
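
The copy-on-write rule can be sketched with an in-memory overlay. Reads check a session-local upper layer first and fall through to the host. Writes go only to the upper layer, and only an explicit `commit()` changes the host. `OverlaySession` and its methods are hypothetical stand-ins, not the AgentFS API, and the host workspace is modeled as a plain dict. The sketch omits deletions, which would need tombstone markers, and export.

```python
class OverlaySession:
    """Copy-on-write view of a host workspace, modeled as a path -> content dict."""

    def __init__(self, host: dict[str, str]) -> None:
        self._host = host                 # lower layer: never written during the session
        self._upper: dict[str, str] = {}  # upper layer: session-local modifications

    def read(self, path: str) -> str:
        # Session writes shadow host content; otherwise fall through to the host.
        if path in self._upper:
            return self._upper[path]
        if path in self._host:
            return self._host[path]
        raise FileNotFoundError(path)

    def write(self, path: str, content: str) -> None:
        self._upper[path] = content       # copy-on-write: the host is untouched

    def commit(self) -> None:
        # An explicit commit is the only way session changes reach the host.
        self._host.update(self._upper)
        self._upper.clear()


host = {"README.md": "original"}
session = OverlaySession(host)
session.write("README.md", "edited in session")
print(session.read("README.md"))  # edited in session
print(host["README.md"])          # original
```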
### 2.2 Auditability
Every interaction is recorded for forensic and improvement purposes.
- **Action Logging**: Every tool execution (e.g., `write_file`) must generate a corresponding entry in the AgentFS audit log.
- **Metadata Capture**: Audit entries must include critical details like affected file paths and caller IDs.
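
The following is a minimal sketch of action logging. It assumes tools are plain Python callables that take the affected path as their first argument. The `audited` decorator, the `AuditEntry` fields, and the in-memory `AUDIT_LOG` are hypothetical stand-ins for the AgentFS audit log. The entry is appended in a `finally` clause, so failed executions are also recorded, with `ok=False`.

```python
import functools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    timestamp: float
    caller_id: str
    tool: str
    path: str
    ok: bool


AUDIT_LOG: list[AuditEntry] = []  # stand-in for the AgentFS audit log
FILES: dict[str, str] = {}        # stand-in for the session filesystem


def audited(tool_name: str):
    """Wrap a tool so every invocation appends an AuditEntry, even on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(caller_id: str, path: str, *args, **kwargs):
            ok = False
            try:
                result = fn(path, *args, **kwargs)
                ok = True
                return result
            finally:
                AUDIT_LOG.append(AuditEntry(time.time(), caller_id, tool_name, path, ok))
        return wrapper
    return decorator


@audited("write_file")
def write_file(path: str, content: str) -> int:
    FILES[path] = content
    return len(content)


write_file("agent-42", "README.md", "hello")
entry = AUDIT_LOG[-1]
print(entry.tool, entry.path, entry.caller_id)  # write_file README.md agent-42
```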
---
## 3. Decision Logic (Tri-State Guards)
The system uses strict three-valued (tri-state) decision logic rather than a binary allow/deny, so that uncertainty and failure are handled gracefully.
| State | Rule |
| :--- | :--- |
| **pass** | All preconditions and evidence are valid; proceed with execution. |
| **abstain** | Preconditions are not met or evidence is missing; stop before execution. |
| **degrade** | Execution was attempted but failed; capture the error and report partial success or failure. |
- **Evidence Gating**: If a tool requires evidence (such as an open session) and that evidence is missing, the guard must return `abstain`.
- **Failure Capture**: If a tool crashes after a `pass` decision, the state must transition to `degrade`, attaching the error logs to the run trace.
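
The three guard states can be sketched as a wrapper around a tool call. `Decision`, `RunTrace`, and `guarded_call` are hypothetical names, and evidence is modeled as a dict of booleans. Any evidence value other than `True`, including an absent key, counts as missing, so unknown evidence never produces `pass`. If the tool raises an exception after the evidence check, the trace becomes `degrade` with the error attached.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PASS = "pass"
    ABSTAIN = "abstain"
    DEGRADE = "degrade"


@dataclass
class RunTrace:
    decision: Decision
    result: object = None
    errors: list[str] = field(default_factory=list)


def guarded_call(tool: Callable[[], object], required: set[str], evidence: dict[str, bool]) -> RunTrace:
    # Evidence gating: a key that is absent or not exactly True is treated as missing.
    missing = sorted(k for k in required if evidence.get(k) is not True)
    if missing:
        return RunTrace(Decision.ABSTAIN, errors=[f"missing evidence: {', '.join(missing)}"])
    # Guard passed; from here on, any failure is captured rather than raised.
    try:
        return RunTrace(Decision.PASS, result=tool())
    except Exception as exc:
        return RunTrace(Decision.DEGRADE, errors=[f"{type(exc).__name__}: {exc}"])


def crash() -> None:
    raise OSError("disk full")


print(guarded_call(crash, {"session_open"}, {}).decision)                      # Decision.ABSTAIN
print(guarded_call(crash, {"session_open"}, {"session_open": True}).decision)  # Decision.DEGRADE
```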
---
## 4. Input Validation (Untrusted Model)
The runtime treats all model outputs as potentially malicious or malformed.
- **Tool Whitelisting**: Any tool call not explicitly defined in the loaded skills registry (e.g., `delete_host_files`) must be blocked with an `abstain` decision.
- **Schema Enforcement**: Tool arguments must be validated against JSON schemas. If the model provides incorrect types (e.g., a number where a string is expected), the call is rejected immediately.
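
The following is a minimal sketch of whitelisting plus schema checking. It assumes each registered tool carries a JSON-Schema-style `properties`/`required` description. It checks only primitive `type` values. A production runtime would more likely use a complete validator such as the `jsonschema` package. `check_tool_call` and the `"rejected"` outcome label are hypothetical, because the spec says only that a mistyped call is rejected. Unknown tools map to `abstain`, as the spec requires.

```python
# Subset of JSON Schema primitive types mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_tool_call(name: str, args: dict, registry: dict) -> tuple[str, str]:
    """Return (outcome, reason), where outcome is "pass", "abstain", or "rejected"."""
    # Tool whitelisting: tools outside the loaded registry never reach execution.
    if name not in registry:
        return "abstain", f"tool {name!r} is not in the loaded skills registry"
    schema = registry[name]
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            return "rejected", f"missing required argument {key!r}"
    for key, value in args.items():
        if key not in props:  # behaves like "additionalProperties": false
            return "rejected", f"unexpected argument {key!r}"
        declared = props[key]["type"]
        # bool is a subclass of int in Python, so it must not satisfy numeric types.
        if isinstance(value, bool) and declared in ("integer", "number"):
            return "rejected", f"argument {key!r} must be {declared}, got boolean"
        if not isinstance(value, JSON_TYPES[declared]):
            return "rejected", f"argument {key!r} must be {declared}, got {type(value).__name__}"
    return "pass", "arguments match the schema"


REGISTRY = {
    "write_file": {
        "type": "object",
        "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
        "required": ["path", "content"],
    }
}

print(check_tool_call("delete_host_files", {}, REGISTRY))
print(check_tool_call("write_file", {"path": 42, "content": "hi"}, REGISTRY))
```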
---
## 5. Testing Dictionary (Gherkin-to-Implementation)
The following phrases are used in our executable specifications to link these behaviors to the codebase:
| Gherkin Phrase | Implementation Logic |
| :--- | :--- |
| `the agent plans with FunctionGemma` | Calls the `ModelGateway` to generate a tool-calling plan. |
| `reading "{file}" inside the session` | Queries the AgentFS overlay for the current session state. |
| `the guard decision should be "{state}"` | Asserts the tri-state outcome of the `GuardEngine`. |
| `the dispatcher requests "{id}"` | Resolves a skill ID to a physical `SKILL.md` path. |
| `the explanation should mention "{text}"` | Checks the model's explanation for specific keywords or justifications. |
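
Here is a toy step matcher that shows how the quoted placeholders in these phrases bind to implementation code. It compiles each phrase into a regular expression with one named group per `{placeholder}`. `step`, `run_step`, and `compile_phrase` are hypothetical. Real runners such as Cucumber or behave have their own expression syntax and registration API, so treat this only as a model of the mechanism.

```python
import re
from collections.abc import Callable

_STEPS: list[tuple[re.Pattern[str], Callable[..., None]]] = []


def compile_phrase(phrase: str) -> re.Pattern[str]:
    # re.split with one capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", phrase)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f'(?P<{part}>[^"]*)'
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def step(phrase: str):
    """Register the decorated function as the handler for a Gherkin phrase."""
    def register(fn):
        _STEPS.append((compile_phrase(phrase), fn))
        return fn
    return register


def run_step(text: str, context: dict) -> None:
    # Dispatch to the first handler whose pattern matches the whole step text.
    for pattern, fn in _STEPS:
        m = pattern.fullmatch(text)
        if m:
            fn(context, **m.groupdict())
            return
    raise LookupError(f"no step definition matches {text!r}")


@step('the guard decision should be "{state}"')
def guard_decision_is(context: dict, state: str) -> None:
    assert context["decision"] == state, f"expected {state!r}, got {context['decision']!r}"


@step('the explanation should mention "{text}"')
def explanation_mentions(context: dict, text: str) -> None:
    assert text in context["explanation"], f"{text!r} not found in explanation"


run_step('the guard decision should be "abstain"', {"decision": "abstain"})
```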
---
## Summary of Safety Invariants
1. **Never** modify host files directly.
2. **Never** execute a tool without a valid schema match.
3. **Never** return `pass` when evidence is missing or unknown.
4. **Always** log every filesystem and tool operation.