# v1.16 Task Implementation Checklist Feature
Release Date: TBD
## Background & Problem
LLMs frequently implement only part of a task (for example, 3 out of 10 items) and still report it as complete.
**Root Causes:**
- Individual implementation items are not verified when task completion is reported
- If the LLM reports "task complete", the system proceeds without verifying the content
- The LLM can report "done" after writing only mocks, `pass`, or `// TODO`
---
## Solution
### Task Structure Extension
Add `checklist` (list of items to implement) to tasks and track completion status of each item.
```
Task1 (description: "Implement authentication")
└─ checklist:
   - Add login method
   - Add logout method
   - Add session management
   - Add password validation
### Two-Stage Verification
| Verification | Timing | Content |
|--------------|--------|---------|
| **Implementation Item Level** | Step 13 (Task completion) | All checklist items are done/skipped |
| **Task Level** | Step 15 (POST_IMPL_VERIFY) | Entire feature works correctly |
---
## Design
### Task Registration (READY_PLAN)
```python
submit_phase(data={
    "tasks": [
        {
            "id": "task_1",
            "description": "Implement authentication",
            "status": "pending",
            "checklist": [
                {"item": "Add login method", "status": "pending"},
                {"item": "Add logout method", "status": "pending"},
                {"item": "Add session management", "status": "pending"},
                {"item": "Add password validation", "status": "pending"}
            ]
        }
    ]
})
```
### Task Completion Report (READY_IMPL)
```python
complete_task(task_id="task_1", data={
    "summary": "Authentication implementation complete",
    "checklist": [
        {"item": "Add login method", "status": "done", "evidence": "auth.py:42-58"},
        {"item": "Add logout method", "status": "done", "evidence": "auth.py:60-75"},
        {"item": "Add session management", "status": "skipped", "reason": "Reusing existing SessionManager"},
        {"item": "Add password validation", "status": "done", "evidence": "auth.py:77-95"}
    ]
})
```
### Checklist Item Structure
| status | evidence | reason | Description |
|--------|----------|--------|-------------|
| `pending` | - | - | Initial state |
| `done` | **Required** | - | Implementation complete (provide evidence as file:line) |
| `skipped` | - | **Required** | Not implementing (explain reason) |
---
## Validation Logic (Orchestrator Side)
### Evidence Format Specification
```
<file_path>:<line>[-<end_line>]
```
**Valid:**
```
auth.py:42
auth.py:42-58
src/services/auth.py:42
src/services/auth.py:42-58
```
**Invalid:**
```
auth.py line 42 # English text format
line 42 in auth.py # Reversed order
auth.py at line 42 # Non-standard format
around auth.py:42 # Vague reference
```
### Regular Expression
```python
import re
EVIDENCE_PATTERN = re.compile(r'^[\w./\-]+\.\w+:\d+(-\d+)?$')
```
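To see how the pattern treats the examples listed above, the following sketch runs each one through `EVIDENCE_PATTERN`. The helper name `is_valid_evidence` is an assumption made for illustration; the signature of the real format validator is not shown in this document.

```python
import re

EVIDENCE_PATTERN = re.compile(r'^[\w./\-]+\.\w+:\d+(-\d+)?$')


def is_valid_evidence(evidence: str) -> bool:
    """Hypothetical helper: True when evidence looks like '<file_path>:<line>[-<end_line>]'."""
    return EVIDENCE_PATTERN.match(evidence) is not None


# Examples taken from the Valid / Invalid lists above
valid = ["auth.py:42", "auth.py:42-58", "src/services/auth.py:42", "src/services/auth.py:42-58"]
invalid = ["auth.py line 42", "line 42 in auth.py", "auth.py at line 42", "around auth.py:42"]

for evidence in valid + invalid:
    print(f"{evidence!r}: {is_valid_evidence(evidence)}")
```

Two properties of the pattern are worth keeping in mind. It requires a file extension, so a reference such as `Makefile:10` is rejected. It also checks only the shape of the reference, so a reversed range such as `auth.py:58-42` passes here and would have to be caught at the content-validation stage.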
### Task Completion Check
```python
def validate_task_completion(task, reported_checklist):
    """Return a list of error messages; an empty list means the report is valid."""
    errors = []

    # 1. Check that every registered item is reported (and nothing extra).
    #    Tasks registered without a checklist are treated as having no items.
    original_items = {c["item"] for c in task.get("checklist", [])}
    reported_items = {c["item"] for c in reported_checklist}
    if original_items != reported_items:
        errors.append("checklist items mismatch")

    # 2. Validate each item
    for item in reported_checklist:
        name = item["item"]
        status = item.get("status")
        if status == "pending":
            errors.append(f"Item '{name}' is still pending")
        elif status == "done":
            evidence = item.get("evidence")
            if not evidence:
                errors.append(f"Item '{name}' requires evidence")
            elif not EVIDENCE_PATTERN.match(evidence):
                errors.append(
                    f"Invalid evidence format for '{name}': '{evidence}'. "
                    f"Use 'file.py:42' or 'file.py:42-58'"
                )
            else:
                # Format is valid: check file existence and content
                file_error = validate_evidence_content(evidence)
                if file_error:
                    errors.append(f"Evidence error for '{name}': {file_error}")
        elif status == "skipped":
            # `or ""` also covers a reason that is explicitly None
            if len(item.get("reason") or "") < 10:
                errors.append(f"Item '{name}' requires reason (min 10 chars)")
        else:
            # An unknown or missing status must not pass silently
            errors.append(f"Item '{name}' has invalid status: {status!r}")
    return errors
```
### Evidence Content Validation
```python
import os
import re


def validate_evidence_content(evidence):
    """
    Verify that actual code exists at the location pointed to by evidence.
    Returns: error message or None
    """
    # Parse: "file.py:42-58" → file="file.py", start=42, end=58
    match = re.match(r'^(.+):(\d+)(?:-(\d+))?$', evidence)
    if not match:
        return f"Invalid evidence format: {evidence}"
    file_path, start_line, end_line = match.groups()
    start_line = int(start_line)
    end_line = int(end_line) if end_line else start_line

    # 1. File existence check
    if not os.path.isfile(file_path):
        return f"File not found: {file_path}"

    # 2. Line range check (line numbers are 1-based)
    with open(file_path) as f:
        lines = f.readlines()
    if start_line < 1 or end_line < start_line:
        return f"Invalid line range: {start_line}-{end_line}"
    if end_line > len(lines):
        return f"Line {end_line} exceeds file length ({len(lines)} lines)"

    # 3. Empty implementation check: flag the range only when every
    #    non-blank line is a placeholder (pass / ... / TODO / NotImplementedError)
    empty_patterns = [
        r'^\s*pass\s*$',
        r'^\s*\.\.\.\s*$',
        r'^\s*#\s*TODO',
        r'^\s*//\s*TODO',
        r'^\s*raise\s+NotImplementedError',
    ]
    code_lines = [line for line in lines[start_line - 1:end_line] if line.strip()]
    # re.match anchors at the start of each line, so lines are tested one by one
    if all(any(re.match(p, line) for p in empty_patterns) for line in code_lines):
        return "Empty implementation detected (only pass/TODO/NotImplementedError)"
    return None
```
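The check above inspects the raw lines in the evidence range. A range that starts at a `def` line, such as a whole function, is therefore not flagged even when the body is only `pass`, because the signature line itself is not a placeholder. One possible refinement, sketched here and not part of the design above, is to ignore signature lines when deciding whether a range is empty. The names `PLACEHOLDER`, `SIGNATURE`, and `is_placeholder_only` are hypothetical, and the signature pattern assumes single-line Python function definitions.

```python
import re

# Placeholder lines, following the empty patterns listed in the design above
PLACEHOLDER = re.compile(
    r'^\s*(pass|\.\.\.|#\s*TODO.*|//\s*TODO.*|raise\s+NotImplementedError.*)\s*$'
)
# Assumption: a single-line Python signature carries no behavior of its own
SIGNATURE = re.compile(r'^\s*(async\s+)?def\s+\w+\(.*\)\s*(->\s*[^:]+)?:\s*$')


def is_placeholder_only(code: str) -> bool:
    """Return True when no line in `code` carries real behavior."""
    substantive = [
        line for line in code.splitlines()
        if line.strip()                     # skip blank lines
        and not PLACEHOLDER.match(line)     # skip pass / ... / TODO / NotImplementedError
        and not SIGNATURE.match(line)       # skip the def line itself
    ]
    return not substantive


stub = "def login(self):\n    pass\n"
real = "def login(self, user):\n    return self.sessions.create(user)\n"
print(is_placeholder_only(stub), is_placeholder_only(real))
```

This remains a heuristic: multi-line signatures, docstrings, and other languages are not handled, which is consistent with the caveat that evidence validation cannot be perfect.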
---
## phase_contract.yml Changes
### READY_PLAN expected_payload Update
```yaml
READY_PLAN:
  expected_payload:
    tasks: list[{id, description, status, checklist: list[{item, status}]}]
    # Added checklist
  instruction: >-
    If .code-intel/task_planning.md exists, read it and create a task list following the guidelines.
    Each task MUST include a checklist of specific implementation items.
    Submit via submit_phase.
```
### READY_IMPL instruction Update
```yaml
READY_IMPL:
  expected_payload:
    task_id: str
    checklist: list[{item, status, evidence?, reason?}]
    # Added checklist report
  instruction: >-
    Run check_write_target before modifying files.
    After implementation, report task completion with checklist status:
    - For each checklist item, set status to "done" or "skipped"
    - "done" requires evidence in strict format: <file_path>:<line> or <file_path>:<start>-<end>
      Examples: "auth.py:42", "src/services/auth.py:42-58"
      Invalid: "auth.py line 42", "line 42 in auth.py"
    - "skipped" requires reason (min 10 characters explaining why not needed)
    Submit via complete_task(task_id, data={summary, checklist}).
```
### New Error Messages
```yaml
failures:
  READY:
    checklist_items_mismatch:
      error: payload_mismatch
      message: "Reported checklist items don't match registered items. Include all items from the original checklist."
    checklist_item_pending:
      error: payload_mismatch
      message: "Checklist item '{item}' is still pending. Set status to 'done' (with evidence) or 'skipped' (with reason)."
    checklist_evidence_required:
      error: payload_mismatch
      message: "Checklist item '{item}' with status 'done' requires evidence. Format: 'file.py:42' or 'file.py:42-58'."
    checklist_evidence_format_invalid:
      error: payload_mismatch
      message: "Evidence '{evidence}' has invalid format. Required format: <file_path>:<line> or <file_path>:<start>-<end>. Examples: 'auth.py:42', 'src/auth.py:42-58'."
    checklist_evidence_file_not_found:
      error: payload_mismatch
      message: "Evidence file not found: {file_path}. Verify the file path is correct and the file exists."
    checklist_evidence_line_out_of_range:
      error: payload_mismatch
      message: "Evidence line {line} exceeds file length ({total} lines). Use a valid line number within 1-{total}."
    checklist_evidence_empty_impl:
      error: payload_mismatch
      message: "Empty implementation detected at '{evidence}'. Contains only pass/TODO/NotImplementedError. Add actual implementation code."
    checklist_reason_required:
      error: payload_mismatch
      message: "Checklist item '{item}' with status 'skipped' requires reason (min 10 chars). Explain why this item is not needed."
```
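The `{item}`, `{evidence}`, `{file_path}`, `{line}`, and `{total}` placeholders in these messages have to be filled in before an error is returned to the LLM. How the server does this is not shown in this document; the sketch below uses a hypothetical `render_failure` helper, an inline dict standing in for the parsed YAML (two entries only), and assumes plain `str.format` substitution.

```python
# Stand-in for the parsed `failures` section of phase_contract.yml
FAILURES = {
    "READY": {
        "checklist_item_pending": {
            "error": "payload_mismatch",
            "message": "Checklist item '{item}' is still pending. "
                       "Set status to 'done' (with evidence) or 'skipped' (with reason).",
        },
        "checklist_evidence_line_out_of_range": {
            "error": "payload_mismatch",
            "message": "Evidence line {line} exceeds file length ({total} lines). "
                       "Use a valid line number within 1-{total}.",
        },
    }
}


def render_failure(phase: str, key: str, **fields) -> dict:
    """Hypothetical helper: look up a failure entry and fill in its placeholders."""
    entry = FAILURES[phase][key]
    return {"error": entry["error"], "message": entry["message"].format(**fields)}


result = render_failure("READY", "checklist_evidence_line_out_of_range", line=120, total=95)
print(result["message"])
# → Evidence line 120 exceeds file length (95 lines). Use a valid line number within 1-95.
```

With this approach a missing field raises `KeyError`, which surfaces a mismatch between a message template and its call site early instead of returning a message with an unfilled placeholder.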
---
## task_planning.md Update
````markdown
# Task Planning Guide
- Create tasks for all modification targets identified during exploration phase
- The purpose is to prevent implementation omissions; granularity is flexible
- Do not include post-implementation verification (tests, etc.) in tasks
- Identify dependencies between tasks
- Order by dependencies (independent tasks first)
- Note risk level (High/Medium/Low) for tasks requiring extra caution
## Checklist Requirements
Each task MUST include a `checklist` of specific implementation items.
### How to Write Checklist Items
**Good examples** (concrete, verifiable):
- "Add login() method to auth.py"
- "Implement validate_password() in UserService class"
- "Add new authentication config section to config.yml"
- "Implement /login endpoint in AuthController"

**Bad examples** (vague, unverifiable):
- "Implement authentication" (which file? what exactly?)
- "Login feature" (method? UI? API?)
- "Security handling" (specifically what?)
### Checklist Item Format
```
[verb] [specific change] in/to [file/class/module]
```
Examples:
- Add `login()` method to `auth.py`
- Add `is_active` field to `UserModel`
- Implement `/logout` endpoint in `routes/auth.ts`
- Add hover styles for `.btn-primary` in `styles.css`
### Deriving Checklist from EXPLORATION
Include all modification targets identified during EXPLORATION in the checklist:
```
EXPLORATION findings:
- auth.py: Need to add authentication logic
- models/user.py: Need to add is_active field
- routes/auth.ts: Need /login, /logout endpoints
- tests/: Need to add test files
        ↓ Convert to
Task: Implement authentication
  checklist:
    - Add login() method to auth.py
    - Add logout() method to auth.py
    - Add is_active field to models/user.py
    - Implement /login endpoint in routes/auth.ts
    - Implement /logout endpoint in routes/auth.ts
```
### When Completing a Task
- `done`: Provide evidence as file:line reference (e.g., "auth.py:42-58")
- `skipped`: Provide reason (min 10 characters) explaining why not needed
````
---
## Implementation Tasks
### 1. phase_contract.yml
- [x] Add `checklist: list[{item, status}]` to `READY_PLAN.expected_payload.tasks`
- [x] Update `READY_PLAN.instruction` (add that checklist is required)
- [x] Add `checklist: list[{item, status, evidence?, reason?}]` to `READY_IMPL.expected_payload`
- [x] Update `READY_IMPL.instruction` (specify evidence format, skipped reason)
- [x] Add the following error messages to `failures.READY`:
  - [x] `checklist_items_mismatch`
  - [x] `checklist_item_pending`
  - [x] `checklist_evidence_required`
  - [x] `checklist_evidence_format_invalid`
  - [x] `checklist_evidence_file_not_found`
  - [x] `checklist_evidence_line_out_of_range`
  - [x] `checklist_evidence_empty_impl`
  - [x] `checklist_reason_required`
### 2. session.py
- [x] Add `ChecklistItem` dataclass:
  ```python
  @dataclass
  class ChecklistItem:
      item: str
      status: str  # "pending" | "done" | "skipped"
      evidence: str | None = None
      reason: str | None = None
  ```
- [x] Add `checklist: list[ChecklistItem]` field to `Task` dataclass
- [x] Serialize checklist in `Task.to_dict()`
- [x] Deserialize checklist in `Task.from_dict()`
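
The serialization steps above can be sketched as follows. Only `ChecklistItem` is taken from this document; the `Task` fields shown (`id`, `description`, `status`) are assumed from the registration payload, and the real `Task` dataclass in `session.py` may carry more. `Optional[str]` is used in place of `str | None` so the sketch also runs on Python 3.9.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ChecklistItem:
    item: str
    status: str  # "pending" | "done" | "skipped"
    evidence: Optional[str] = None
    reason: Optional[str] = None


@dataclass
class Task:
    # Assumed minimal fields, mirroring the READY_PLAN payload
    id: str
    description: str
    status: str = "pending"
    checklist: list[ChecklistItem] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "description": self.description,
            "status": self.status,
            "checklist": [asdict(c) for c in self.checklist],
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Task":
        return cls(
            id=data["id"],
            description=data["description"],
            status=data.get("status", "pending"),
            # A missing checklist becomes an empty list (backward compatibility)
            checklist=[ChecklistItem(**c) for c in data.get("checklist", [])],
        )


task = Task.from_dict({
    "id": "task_1",
    "description": "Implement authentication",
    "checklist": [{"item": "Add login method", "status": "done", "evidence": "auth.py:42-58"}],
})
restored = Task.from_dict(task.to_dict())
print(restored == task)
```

Defaulting a missing `checklist` to an empty list is one way to keep sessions saved before this feature loadable, in line with the `test_checklist_missing` case listed under Tests.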
### 3. code_intel_server.py
- [x] Define `EVIDENCE_PATTERN` regex: `r'^[\w./\-]+\.\w+:\d+(-\d+)?$'`
- [x] Add `validate_evidence_format(evidence)` function
- [x] Add `validate_evidence_content(evidence)` function:
  - [x] File existence check
  - [x] Line number range check
  - [x] Empty implementation detection (pass/TODO/NotImplementedError)
- [x] Add `validate_task_completion(task, reported_checklist)` function:
  - [x] All items reported check
  - [x] Status is not pending check
  - [x] Evidence required for done check
  - [x] Evidence format check for done
  - [x] Evidence content check for done
  - [x] Reason required for skipped check (min 10 chars)
- [x] Support checklist task registration in `_handle_ready()` for READY_PLAN
- [x] Call checklist validation logic in `complete_task()`
- [x] Get error messages from phase_contract.yml via `_get_message()` on errors
### 4. templates/code-intel/task_planning.md
- [x] Add `## Checklist Requirements` section
- [x] Add `### How to Write Checklist Items` (Good/Bad examples)
- [x] Add `### Checklist Item Format`
- [x] Add `### Deriving Checklist from EXPLORATION`
- [x] Add `### When Completing a Task` (evidence/reason format)
### 5. Copy to .code-intel/
- [x] `templates/code-intel/task_planning.md` → `.code-intel/task_planning.md`
- [x] `templates/code-intel/phase_contract.yml` → `.code-intel/phase_contract.yml`
- [x] Copy to `sample/.code-intel/` as well
### 6. Tests
- [x] `test_checklist_registration` - Checklist task registration succeeds
- [x] `test_checklist_missing` - Task registration without checklist (allowed for backward compatibility)
- [x] `test_checklist_items_mismatch` - Error when reported items don't match registered items
- [x] `test_checklist_item_pending` - Error when reporting with pending status
- [x] `test_checklist_evidence_required` - Error when done without evidence
- [x] `test_checklist_evidence_format_valid` - Correct format passes
- [x] `test_checklist_evidence_format_invalid` - Invalid format errors
- [x] `test_checklist_evidence_file_not_found` - Non-existent file errors
- [x] `test_checklist_evidence_line_out_of_range` - Out of range line number errors
- [x] `test_checklist_evidence_empty_impl` - Empty implementation (pass/TODO) errors
- [x] `test_checklist_reason_required` - Error when skipped without reason
- [x] `test_checklist_reason_min_length` - Error when reason is less than 10 chars
- [x] `test_checklist_complete_success` - Completion with correct report
### 7. Documentation
- [x] Add checklist feature overview to `README_ja.md`
- [x] Reflect READY phase changes in `docs/DESIGN_ja.md`
---
## Expected Benefits
### Cases Prevented
- **Accidental omission**: Every item is listed in the checklist, so a missed item is noticed when reporting
- **Casual skipping**: A reason is required, so skipping an item forces the LLM to justify it
- **Mock-only implementation**: Evidence validation detects empty functions and TODO-only code
- **Item oversight**: The server detects reports that do not cover all registered items
### Quantitative Impact (Projected)
| Metric | Before | After (Projected) |
|--------|--------|-------------------|
| Implementation omission rate | 70% (7 of 10 items omitted) | Below 10% |
| Quality of skipped reasons | - | Min 10 char explanation required |
| Empty implementation detection | 0% | 90%+ |
---
## Notes
- Checklist granularity depends on LLM judgment, so instructions to include all modification targets from EXPLORATION are important
- Evidence validation is not perfect (cannot verify correctness of complex logic)
- Most effective when combined with POST_IMPL_VERIFY (verifier execution)