# Tasks: Natural Language Memory Activation

**Feature**: 002-natural-language-activation
**Input**: Design documents from `/specs/002-natural-language-activation/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅

**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.

**Tests**: Unit and integration tests are included as this is a core feature requiring ≥90% coverage per constitution.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (US1, US2, US3, US4)
- Include exact file paths in descriptions

---

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: Project initialization and module structure

- [X] T001 Create `src/cortexgraph/activation/` module directory structure
- [X] T002 Create `src/cortexgraph/activation/__init__.py` with public API exports
- [X] T003 [P] Create `tests/unit/activation/` directory for unit tests
- [X] T004 [P] Create `tests/integration/activation/` directory for integration tests
- [X] T005 [P] Create `tests/contract/` directory for API contract tests
- [X] T006 Create default configuration file template at `src/cortexgraph/activation/activation.yaml.example`

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Core models and infrastructure that ALL user stories depend on

**⚠️ CRITICAL**: No user story work can begin until this phase is complete

- [X] T007 Implement `ActivationSignal` Pydantic model in `src/cortexgraph/activation/models.py`
- [X] T008 [P] Implement `MessageAnalysis` Pydantic model in `src/cortexgraph/activation/models.py`
- [X] T009 [P] Implement `RecallAnalysis` Pydantic model in `src/cortexgraph/activation/models.py`
- [X] T010 [P] Implement `ConfidenceThreshold` config model in `src/cortexgraph/activation/config.py`
- [X] T011 [P] Implement `PatternLibrary` config model in `src/cortexgraph/activation/config.py`
- [X] T012 Create configuration loader with YAML parsing in `src/cortexgraph/activation/config.py`
- [X] T013 Implement entity extraction using hybrid spaCy+regex in `src/cortexgraph/activation/entity_extraction.py`
- [X] T014 Implement pattern matching engine with regex compilation in `src/cortexgraph/activation/patterns.py`
- [X] T015 Implement confidence scoring with weighted sigmoid formula in `src/cortexgraph/activation/detectors.py` (see the scoring sketch after this phase)
- [X] T016 Add activation configuration settings to `src/cortexgraph/config.py`
- [X] T017 [P] Write unit tests for Pydantic model validation in `tests/unit/activation/test_models.py`
- [X] T018 [P] Write unit tests for pattern matching logic in `tests/unit/activation/test_patterns.py`
- [X] T019 [P] Write unit tests for entity extraction in `tests/unit/activation/test_entity_extraction.py`
- [X] T020 [P] Write unit tests for confidence scoring in `tests/unit/activation/test_detectors.py`

**Checkpoint**: ✅ Foundation ready - user story implementation can now begin in parallel
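
T015 names the weighted-sigmoid approach but not its parameters. A minimal sketch of what such a scorer could look like, assuming `ActivationSignal` carries a name, a configured weight, and the matched text; the field names and the `steepness`/`midpoint` defaults are illustrative assumptions, not the shipped values:

```python
# Hedged sketch of T015's weighted-sigmoid confidence scoring.
# Field names and the steepness/midpoint defaults are assumptions.
import math

from pydantic import BaseModel, Field


class ActivationSignal(BaseModel):
    """One detected signal, e.g. an explicit trigger or a preference phrase."""

    name: str                      # e.g. "explicit_trigger", "preference_statement"
    weight: float = Field(ge=0.0)  # configured per pattern in activation.yaml
    matched_text: str              # the phrase that fired the pattern


def confidence(signals: list[ActivationSignal],
               steepness: float = 2.0, midpoint: float = 1.0) -> float:
    """Squash the weighted sum of signal weights into (0, 1) with a sigmoid."""
    total = sum(s.weight for s in signals)
    return 1.0 / (1.0 + math.exp(-steepness * (total - midpoint)))
```

A sigmoid keeps the score bounded in (0, 1) no matter how many patterns fire, which is what makes the fixed 0.4/0.7 thresholds below meaningful.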

---

## Phase 3: User Story 1 - Automatic Memory Capture (Priority: P1) 🎯 MVP

**Goal**: Users share important information and the system automatically detects memory-worthy content without explicit commands

**Independent Test**: Share a preference like "I prefer PostgreSQL for databases" and verify it is detected with high confidence (>0.7) and suggested parameters (entities, tags, strength)

### Tests for User Story 1

> **NOTE: Write these tests FIRST, ensure they FAIL before implementation**
> **CHECKPOINT (2025-11-24): All 5 test tasks complete - 68 tests pass, 1 xfail**

- [X] T021 [P] [US1] Contract test for `analyze_message` tool validating JSON schema in `tests/contract/test_analyze_message_api.py` (20 tests)
- [X] T022 [P] [US1] Unit test for save detection with explicit triggers in `tests/unit/activation/test_save_detection.py` (5 tests)
- [X] T023 [P] [US1] Unit test for save detection with implicit signals in `tests/unit/activation/test_save_detection.py` (7 tests)
- [X] T024 [P] [US1] Unit test for exclusion patterns (small talk filtering) in `tests/unit/activation/test_save_detection.py` (6 tests + 7 edge/uncertainty tests)
- [X] T025 [US1] Integration test for analyze_message tool with real pattern matching in `tests/integration/activation/test_analyze_message_tool.py` (19 tests)

**KEY FINDING**: Unit tests show that `detect_save_intent` in the activation module correctly handles "I prefer" via the preference_statement signal. The MCP tool needs updating to use this module (T027-T028).

### Implementation for User Story 1

- [ ] T026 [US1] Implement save detection logic in `src/cortexgraph/activation/detectors.py` (detect_save_intent function)
- [ ] T027 [US1] Implement message analysis logic in `src/cortexgraph/activation/detectors.py` (analyze_message function)
- [ ] T028 [US1] Create `analyze_message` MCP tool in `src/cortexgraph/tools/analyze_message.py`
- [ ] T029 [US1] Register `analyze_message` tool with FastMCP server in `src/cortexgraph/server.py`
- [ ] T030 [US1] Add validation and error handling for analyze_message tool
- [ ] T031 [US1] Add logging for save detection decisions with confidence scores
- [ ] T032 [US1] Create default save trigger patterns in `activation.yaml.example` (see the config sketch at the end of this phase)

**Checkpoint**: At this point, the analyze_message tool should detect memory-worthy content and provide suggested parameters

**Acceptance Verification**:

1. "I prefer PostgreSQL for databases" → should_save=True, confidence ≥0.7, entities=["postgresql"]
2. "Remember this: I use VSCode" → should_save=True, confidence ≥0.9, explicit trigger detected
3. "Nice weather today" → should_save=False, confidence <0.4, excluded as small talk
4. "I decided to use React" → should_save=True, confidence ≥0.7, decision marker detected
5. "My API endpoint is api.example.com" → should_save=True, entities=["api.example.com"]
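
To make T032 concrete, a hypothetical shape for the save-trigger portion of `activation.yaml.example`. Every key name, pattern, and weight below is an assumption; only the 0.4/0.7 threshold values come from this plan's clarification range:

```yaml
# Hypothetical activation.yaml.example layout (T032) - key names are assumptions.
save_triggers:
  explicit:
    - pattern: '\bremember (this|that)\b'
      weight: 1.0
  implicit:
    - pattern: '\bi (prefer|decided to use|always use)\b'
      weight: 0.6
exclusions:
  - pattern: '\b(nice|terrible) weather\b'   # small-talk filter
thresholds:
  act: 0.7       # at or above: auto-save
  clarify: 0.4   # 0.4-0.7: surface reasoning, let the LLM decide
```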

---

## Phase 4: User Story 2 - Contextual Memory Recall (Priority: P1)

**Goal**: Users ask questions referencing past conversations and the system automatically detects recall intent without explicit "search memory" commands

**Independent Test**: First save a preference using US1 or manually, then ask "what did I say about authentication?" and verify the system detects recall intent (should_search=True, confidence ≥0.7) with an extracted query

### Tests for User Story 2

- [ ] T033 [P] [US2] Contract test for `analyze_for_recall` tool validating JSON schema in `tests/contract/test_analyze_for_recall_api.py`
- [ ] T034 [P] [US2] Unit test for recall detection with explicit triggers in `tests/unit/activation/test_recall_detection.py`
- [ ] T035 [P] [US2] Unit test for recall detection with implicit signals in `tests/unit/activation/test_recall_detection.py`
- [ ] T036 [P] [US2] Unit test for query extraction from natural language in `tests/unit/activation/test_recall_detection.py`
- [ ] T037 [P] [US2] Unit test for exclusion patterns (general knowledge questions) in `tests/unit/activation/test_recall_detection.py`
- [ ] T038 [US2] Integration test for analyze_for_recall tool with real pattern matching in `tests/integration/activation/test_analyze_for_recall_tool.py`

### Implementation for User Story 2

- [ ] T039 [US2] Implement recall detection logic in `src/cortexgraph/activation/detectors.py` (detect_recall_intent function)
- [ ] T040 [US2] Implement query extraction from user messages in `src/cortexgraph/activation/detectors.py`
- [ ] T041 [US2] Implement semantic query expansion with tech term mappings in `src/cortexgraph/activation/patterns.py` (see the expansion sketch after this list)
- [ ] T042 [US2] Implement recall analysis logic in `src/cortexgraph/activation/detectors.py` (analyze_for_recall function)
- [ ] T043 [US2] Create `analyze_for_recall` MCP tool in `src/cortexgraph/tools/analyze_for_recall.py`
- [ ] T044 [US2] Register `analyze_for_recall` tool with FastMCP server in `src/cortexgraph/server.py`
- [ ] T045 [US2] Add validation and error handling for analyze_for_recall tool
- [ ] T046 [US2] Add logging for recall detection decisions with confidence scores
- [ ] T047 [US2] Create default recall trigger patterns in `activation.yaml.example`
- [ ] T048 [US2] Add 50+ tech term mappings (JWT→"json web token", etc.) to `activation.yaml.example`
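
As referenced in T041, a minimal sketch of semantic query expansion. The mapping entries here are a tiny illustrative subset standing in for the 50+ tech term mappings T048 adds to `activation.yaml.example`:

```python
# Sketch of T041's semantic query expansion. The mapping below is a small
# illustrative subset; the real table is loaded from activation.yaml (T048).
TECH_TERM_MAPPINGS: dict[str, list[str]] = {
    "jwt": ["json web token"],
    "postgres": ["postgresql"],
    "k8s": ["kubernetes"],
}


def expand_query(query: str) -> str:
    """Append known synonyms so a recall search also matches stored phrasing."""
    words = query.lower().split()
    expansions = [syn for word in words for syn in TECH_TERM_MAPPINGS.get(word, [])]
    return " ".join(words + expansions)


# expand_query("jwt auth") -> "jwt auth json web token"
```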

**Checkpoint**: At this point, the analyze_for_recall tool should detect recall intent and extract search queries

**Acceptance Verification**:

1. "What did I say about auth methods?" → should_search=True, confidence ≥0.7, query="auth methods"
2. "Remind me of my database choice" → should_search=True, confidence ≥0.7, query="database choice"
3. "What's my API endpoint again?" → should_search=True, confidence ≥0.7, possessive marker detected
4. "What did I say about GraphQL?" → should_search=True (even if no memories exist - graceful empty result)
5. "What's the best authentication method?" → should_search=False, confidence <0.4, general knowledge pattern

---

## Phase 5: User Story 3 - Automatic Memory Reinforcement (Priority: P2)

**Goal**: When users revisit previously discussed topics, the system automatically reinforces those memories through integration with the existing touch_memory tool

**Independent Test**: Save a memory manually, then reference it in conversation, and verify that the system detects the reference and can trigger reinforcement (this is integration with the existing touch_memory tool - the detection is the new part)

### Tests for User Story 3

- [ ] T049 [P] [US3] Unit test for topic/entity detection in messages in `tests/unit/activation/test_reinforcement_detection.py`
- [ ] T050 [P] [US3] Unit test for cross-domain usage detection via tag similarity in `tests/unit/activation/test_reinforcement_detection.py`
- [ ] T051 [US3] Integration test for reinforcement workflow (detect → recommend touch_memory) in `tests/integration/activation/test_reinforcement_workflow.py`

### Implementation for User Story 3

- [ ] T052 [US3] Implement topic extraction from messages in `src/cortexgraph/activation/detectors.py`
- [ ] T053 [US3] Implement cross-domain usage detection with tag Jaccard similarity in `src/cortexgraph/activation/detectors.py` (see the similarity sketch after this list)
- [ ] T054 [US3] Add reinforcement logic to detect when recalled memories should be touched in `src/cortexgraph/activation/detectors.py`
- [ ] T055 [US3] Add reinforcement detection to analyze_message output (new field: should_reinforce with memory_ids)
- [ ] T056 [US3] Add logging for reinforcement detection with cross-domain signals
- [ ] T057 [US3] Update integration tests to verify touch_memory is called after recall in `tests/integration/activation/test_reinforcement_workflow.py`
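
A sketch of the tag Jaccard similarity behind T053. The 0.5 cross-domain cut-off is an assumption for illustration, not a value taken from the spec:

```python
# Sketch of T053's cross-domain detection via tag Jaccard similarity.
# The 0.5 cut-off is an assumed value, not taken from the spec.
def jaccard(tags_a: set[str], tags_b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|: 0.0 for disjoint tag sets, 1.0 for identical ones."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def is_cross_domain(memory_tags: set[str], message_tags: set[str]) -> bool:
    """Some overlap with the memory's tags, but mostly a new context."""
    overlap = jaccard(memory_tags, message_tags)
    return 0.0 < overlap < 0.5
```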
"Based on my React choice, I'll use Next.js" → React memory reinforcement + new Next.js memory --- ## Phase 6: User Story 4 - Decision Support for Ambiguous Cases (Priority: P3) **Goal**: For borderline cases where confidence is in clarification range (0.4-0.7), provide reasoning and support LLM decision-making **Independent Test**: Submit an ambiguous statement like "I might use Redis for caching" and verify confidence falls in clarification range with clear reasoning ### Tests for User Story 4 - [ ] T058 [P] [US4] Unit test for uncertainty marker detection in `tests/unit/activation/test_ambiguous_detection.py` - [ ] T059 [P] [US4] Unit test for confidence threshold decision logic in `tests/unit/activation/test_confidence_thresholds.py` - [ ] T060 [P] [US4] Unit test for reasoning string generation in `tests/unit/activation/test_reasoning.py` - [ ] T061 [US4] Integration test for ambiguous cases with clarification flow in `tests/integration/activation/test_ambiguous_workflow.py` ### Implementation for User Story 4 - [ ] T062 [US4] Implement uncertainty marker detection in `src/cortexgraph/activation/detectors.py` - [ ] T063 [US4] Implement conditional language detection ("if", "maybe", "considering") in `src/cortexgraph/activation/detectors.py` - [ ] T064 [US4] Enhance reasoning string generation with signal breakdown in `src/cortexgraph/activation/detectors.py` - [ ] T065 [US4] Implement phrase_signals dictionary population for transparency in `src/cortexgraph/activation/detectors.py` - [ ] T066 [US4] Add strict_mode parameter support to analyze_message and analyze_for_recall tools - [ ] T067 [US4] Add context_tags parameter support to analyze_message for context-aware analysis - [ ] T068 [US4] Add available_tags parameter support to analyze_for_recall for better tag suggestions - [ ] T069 [US4] Update logging to include phrase_signals for debugging - [ ] T070 [US4] Create uncertainty marker patterns in `activation.yaml.example` **Checkpoint**: At this point, ambiguous cases should provide clear reasoning and fall in appropriate confidence ranges **Acceptance Verification**: 1. "I might use Redis for caching" → confidence 0.4-0.6, uncertainty markers detected, reasoning explains low confidence 2. "What's the best authentication method?" → should_search=False or confidence 0.4-0.6, ambiguous (memory vs general knowledge) 3. Strict mode with "I prefer PostgreSQL" → should_save=False (no explicit trigger in strict mode) 4. Context-aware: "That's my choice" with context_tags=["database"] → better confidence through context 5. 

**Checkpoint**: At this point, ambiguous cases should provide clear reasoning and fall in appropriate confidence ranges

**Acceptance Verification**:

1. "I might use Redis for caching" → confidence 0.4-0.6, uncertainty markers detected, reasoning explains the low confidence
2. "What's the best authentication method?" → should_search=False or confidence 0.4-0.6, ambiguous (memory vs general knowledge)
3. Strict mode with "I prefer PostgreSQL" → should_save=False (no explicit trigger in strict mode)
4. Context-aware: "That's my choice" with context_tags=["database"] → better confidence through context
5. Feedback integration: "Don't remember that" → explicit negative command handled

---

## Phase 7: CLI Tools & Documentation

**Purpose**: Developer tools and user-facing documentation

- [ ] T071 [P] Create `cortexgraph-activation` CLI entry point in `src/cortexgraph/cli/activation.py`
- [ ] T072 [P] Implement `analyze` subcommand for testing pattern detection in `src/cortexgraph/cli/activation.py`
- [ ] T073 [P] Implement `test` subcommand for pattern validation in `src/cortexgraph/cli/activation.py`
- [ ] T074 [P] Add CLI command registration to `pyproject.toml` console_scripts
- [ ] T075 [P] Update `README.md` with natural language activation overview and links to quickstart
- [ ] T076 [P] Update `docs/api.md` with analyze_message and analyze_for_recall tool documentation
- [ ] T077 [P] Create `docs/activation.md` with detailed activation system documentation
- [ ] T078 [P] Validate that all examples in `quickstart.md` work with the implementation

---

## Phase 8: Polish & Cross-Cutting Concerns

**Purpose**: Quality improvements and final validation

- [ ] T079 [P] Add hot-reload support using the watchdog library for `activation.yaml` in `src/cortexgraph/activation/config.py` (see the reload sketch after this list)
- [ ] T080 [P] Performance benchmarking: verify analyze_message <50ms p95 in `tests/performance/test_activation_performance.py`
- [ ] T081 [P] Performance benchmarking: verify analyze_for_recall <50ms p95 in `tests/performance/test_activation_performance.py`
- [ ] T082 [P] Performance benchmarking: verify combined workflow <300ms p95 in `tests/performance/test_activation_performance.py`
- [ ] T083 [P] Security review: ensure no sensitive data leaks in reasoning strings
- [ ] T084 [P] Security review: validate pattern injection prevention
- [ ] T085 Code cleanup: remove any debug logging, finalize error messages
- [ ] T086 Run mypy type checking with zero errors on the activation module
- [ ] T087 Run ruff linting with zero errors on the activation module
- [ ] T088 Run pytest with ≥90% coverage target on the activation module
- [ ] T089 Update `CHANGELOG.md` with the v0.7.0 natural language activation feature
- [ ] T090 Run all quickstart.md examples as final validation
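
For T079, a minimal hot-reload sketch using watchdog's standard Observer/FileSystemEventHandler pair. `reload_config` is a hypothetical callback that would re-parse the YAML and recompile the patterns:

```python
# Minimal hot-reload sketch for activation.yaml (T079). reload_config is a
# hypothetical callback; the watchdog Observer API itself is standard.
from pathlib import Path
from typing import Callable

from watchdog.events import FileSystemEvent, FileSystemEventHandler
from watchdog.observers import Observer


class ConfigReloadHandler(FileSystemEventHandler):
    def __init__(self, config_path: Path, reload_config: Callable[[], None]) -> None:
        self._path = config_path
        self._reload = reload_config

    def on_modified(self, event: FileSystemEvent) -> None:
        if Path(str(event.src_path)) == self._path:
            self._reload()  # re-parse YAML, recompile regexes, swap atomically


def watch_config(config_path: Path, reload_config: Callable[[], None]) -> Observer:
    observer = Observer()
    handler = ConfigReloadHandler(config_path, reload_config)
    observer.schedule(handler, str(config_path.parent), recursive=False)
    observer.start()  # runs in a background thread
    return observer
```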

---

## Dependencies & Execution Order

### Phase Dependencies

- **Setup (Phase 1)**: No dependencies - can start immediately
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Story 1 (Phase 3)**: Depends on Foundational phase - can start once Phase 2 is complete
- **User Story 2 (Phase 4)**: Depends on Foundational phase - can start in parallel with US1 (different files)
- **User Story 3 (Phase 5)**: Depends on Foundational phase AND US2 (needs recall detection for reinforcement signals)
- **User Story 4 (Phase 6)**: Depends on Foundational phase AND US1/US2 (enhances existing detection logic)
- **CLI Tools (Phase 7)**: Depends on US1 and US2 being complete (needs both tools to work)
- **Polish (Phase 8)**: Depends on all user stories being complete

### User Story Dependencies

```
Foundation (Phase 2)
├── User Story 1 (US1) - Automatic Memory Capture [P1] ✓ Independent
├── User Story 2 (US2) - Contextual Memory Recall [P1] ✓ Independent
│   └── User Story 3 (US3) - Automatic Memory Reinforcement [P2] (needs US2 for recall detection)
└── User Story 4 (US4) - Decision Support for Ambiguous Cases [P3] (enhances US1/US2)
```

**Key insight**: US1 and US2 are fully independent and can be developed in parallel. US3 builds on US2 (needs recall to trigger reinforcement). US4 enhances all previous stories.

### Within Each User Story

- Tests MUST be written and FAIL before implementation
- Models (in Foundational) before detectors
- Detectors before MCP tools
- MCP tools before tool registration
- Core implementation before integration tests

### Parallel Opportunities

**Setup Phase (Phase 1)**: Tasks T003, T004, T005, T006 all [P] - different directories

**Foundational Phase (Phase 2)**:
- T008, T009, T010, T011 all [P] - different models in the same file (can be written simultaneously)
- T017, T018, T019, T020 all [P] - different test files

**User Story 1 (Phase 3)**:
- T021, T022, T023, T024 all [P] - different test files/functions

**User Story 2 (Phase 4)**:
- T033, T034, T035, T036, T037 all [P] - different test files/functions
- Can run ENTIRE US2 in parallel with US1 (different files, no dependencies)

**User Story 4 (Phase 6)**:
- T058, T059, T060 all [P] - different test files

**CLI Phase (Phase 7)**:
- T071, T072, T073, T074, T075, T076, T077, T078 all [P] - different files

**Polish Phase (Phase 8)**:
- T079, T080, T081, T082, T083, T084 all [P] - different concerns

---

## Parallel Example: User Story 1 + User Story 2

```bash
# These two user stories can be developed completely in parallel:

# Developer A: User Story 1 (Automatic Memory Capture)
#   Tasks T021-T032: analyze_message tool + save detection
#   Files:
#     tests/contract/test_analyze_message_api.py
#     tests/unit/activation/test_save_detection.py
#     tests/integration/activation/test_analyze_message_tool.py
#     src/cortexgraph/activation/detectors.py (detect_save_intent, analyze_message)
#     src/cortexgraph/tools/analyze_message.py

# Developer B: User Story 2 (Contextual Memory Recall)
#   Tasks T033-T048: analyze_for_recall tool + recall detection
#   Files:
#     tests/contract/test_analyze_for_recall_api.py
#     tests/unit/activation/test_recall_detection.py
#     tests/integration/activation/test_analyze_for_recall_tool.py
#     src/cortexgraph/activation/detectors.py (detect_recall_intent, analyze_for_recall)
#     src/cortexgraph/activation/patterns.py (query expansion)
#     src/cortexgraph/tools/analyze_for_recall.py

# NO file conflicts - completely parallel development!
```

---

## Implementation Strategy

### MVP First (User Story 1 Only)

1. Complete Phase 1: Setup (6 tasks)
2. Complete Phase 2: Foundational (14 tasks) - CRITICAL foundation
3. Complete Phase 3: User Story 1 (12 tasks)
4. **STOP and VALIDATE**: Test the analyze_message tool independently
   - "I prefer PostgreSQL" → should_save=True, confidence ≥0.7
   - "Nice weather today" → should_save=False
5. Deploy/demo if ready - **the system can now detect memory-worthy content!**

**MVP Delivered**: Users can have natural conversations and the system detects when information should be saved, providing confidence scores and suggested parameters. This is immediately useful even without auto-recall.

### Incremental Delivery

1. **Foundation** (Phase 1+2): 20 tasks → All models, patterns, config ready
2. **MVP** (+ Phase 3): 12 tasks → Automatic memory capture working (analyze_message tool)
3. **Enhanced** (+ Phase 4): 16 tasks → Add automatic recall (analyze_for_recall tool)
4. **Intelligent** (+ Phase 5): 9 tasks → Add automatic reinforcement
5. **Polished** (+ Phase 6): 12 tasks → Handle ambiguous cases with reasoning
6. **Production-Ready** (+ Phase 7+8): 20 tasks → CLI tools, docs, performance validation

**Total**: 89 tasks across 8 phases

### Parallel Team Strategy

With 2-3 developers:

1. **Day 1-2**: Everyone completes Setup + Foundational together (20 tasks; the foundation MUST be complete)
2. **Day 3-5**: Once Foundational is done:
   - Developer A: User Story 1 (12 tasks) - analyze_message tool
   - Developer B: User Story 2 (16 tasks) - analyze_for_recall tool
   - No conflicts! Different files, parallel work
3. **Day 6**: Integration - verify US1 and US2 both work
4. **Day 7-8**: Developer C adds User Story 3 (9 tasks) while A+B do User Story 4 (12 tasks)
5. **Day 9-10**: Everyone works on CLI, docs, polish (20 tasks in parallel)

**Timeline**: ~10 days with 2-3 developers working efficiently

---

## Task Count Summary

| Phase | Task Count | Can Parallelize |
|-------|------------|-----------------|
| Phase 1: Setup | 6 tasks | 4 tasks (67%) |
| Phase 2: Foundational | 14 tasks | 7 tasks (50%) |
| Phase 3: User Story 1 (P1) | 12 tasks | 5 tasks (42%) |
| Phase 4: User Story 2 (P1) | 16 tasks | 7 tasks (44%) |
| Phase 5: User Story 3 (P2) | 9 tasks | 3 tasks (33%) |
| Phase 6: User Story 4 (P3) | 12 tasks | 4 tasks (33%) |
| Phase 7: CLI & Docs | 8 tasks | 8 tasks (100%) |
| Phase 8: Polish | 12 tasks | 6 tasks (50%) |
| **Total** | **89 tasks** | **44 tasks (49%)** |

**MVP Scope** (US1 only): 32 tasks (Setup + Foundational + US1)
**Core Features** (US1 + US2): 48 tasks (add US2)
**Full Feature** (All User Stories): 81 tasks (add US3 + US4 + CLI + Polish)

---

## Notes

- [P] tasks = different files, no dependencies - can run in parallel
- [Story] label maps a task to a specific user story for traceability
- Each user story should be independently completable and testable
- Verify tests fail before implementing
- Commit after each task or logical group of related tasks
- Stop at any checkpoint to validate a story independently
- The Foundation phase is CRITICAL - no shortcuts, this blocks everything
- US1 and US2 have no dependencies on each other - true parallel development
- Pattern matching is deterministic - confidence thresholds may need tuning during testing
- All file paths assume a single project structure (`src/cortexgraph/`, `tests/`)

---

**Generated**: 2025-01-24
**Feature**: 002-natural-language-activation
**Total Tasks**: 89 tasks across 8 phases
**MVP Tasks**: 32 tasks (Setup + Foundation + US1)
**Parallel Opportunities**: 44 tasks (49% can run in parallel)
