Code Graph Knowledge System

overview.md•18.5 KiB

# Memory Store Overview

The Memory Store is a project knowledge persistence system designed specifically for AI agents to maintain continuity across development sessions. Unlike short-term conversation history, the Memory Store preserves curated, structured project knowledge.

## Table of Contents

- [What is Memory Store?](#what-is-memory-store)
- [Why Memory Store Matters](#why-memory-store-matters)
- [Core Concepts](#core-concepts)
- [Memory Types](#memory-types)
- [Architecture](#architecture)
- [Operation Modes](#operation-modes)
- [Quick Start](#quick-start)
- [Use Cases](#use-cases)

---

## What is Memory Store?

Memory Store is a Neo4j-based knowledge management system that allows AI agents and developers to:

- **Save Important Decisions**: Architectural choices, technology selections, and their rationale
- **Record Preferences**: Coding styles, tool choices, and team conventions
- **Document Experiences**: Problems encountered and their solutions
- **Track Plans**: Future improvements, TODOs, and roadmap items
- **Preserve Context**: Maintain project knowledge across sessions, weeks, and months

**Key Principle**: Memory = Structured Project Knowledge

Instead of re-explaining project context every session, AI agents can search memories and immediately understand:

- "Why did we choose PostgreSQL over MySQL?"
- "What's our convention for API endpoint naming?"
- "What Redis issues did we encounter in Docker?"

---

## Why Memory Store Matters

### Problem: Context Loss Across Sessions

Without Memory Store, AI agents suffer from:

- ❌ Repeating the same questions every session
- ❌ Forgetting why decisions were made
- ❌ Making inconsistent choices
- ❌ Re-encountering solved problems
- ❌ Breaking established conventions

### Solution: Long-term Project Memory

With Memory Store, AI agents gain:

- ✅ **Cross-session continuity** - Remember decisions from previous sessions
- ✅ **Avoid repeating mistakes** - Recall past problems and solutions
- ✅ **Maintain consistency** - Follow established patterns and conventions
- ✅ **Track evolution** - Document how decisions change over time
- ✅ **Preserve rationale** - Remember *why* something was done, not just *what*

---

## Core Concepts

### 1. Memory as Knowledge

Each memory represents a discrete piece of project knowledge:

```python
{
    "id": "uuid-here",
    "type": "decision",
    "title": "Use JWT for authentication",
    "content": "Decided to use JWT tokens instead of session-based auth",
    "reason": "Need stateless authentication for mobile clients",
    "importance": 0.9,
    "tags": ["auth", "architecture"],
    "created_at": "2025-11-06T10:30:00Z",
    "updated_at": "2025-11-06T10:30:00Z"
}
```

### 2. Project Organization

Memories belong to projects, enabling multi-project knowledge management:

```
Project: web-app
├── Decisions: 15 memories
├── Preferences: 8 memories
├── Experiences: 12 memories
├── Conventions: 6 memories
├── Plans: 10 memories
└── Notes: 5 memories
```

### 3. Knowledge Evolution

Memories can supersede each other, preserving decision history:

```
Original Decision (2024-01-15)
    ↓ superseded by
New Decision (2024-03-20)
    ↓ superseded by
Current Decision (2024-11-06)
```

### 4. Code Integration

Memories can link to code via `ref://` handles:

```python
related_refs = [
    "ref://file/src/auth/jwt.py",
    "ref://symbol/authenticate_user",
    "ref://file/config/database.py#L45"
]
```

---

## Memory Types

The Memory Store supports six memory types, each serving a specific purpose:

### 1. Decision

**Purpose**: Architectural choices, technology selections, and major design decisions

**Importance Range**: 0.7 - 1.0 (high importance)

**Examples**:

- "Use JWT tokens for stateless authentication"
- "Adopt microservices architecture for scalability"
- "Choose PostgreSQL over MySQL for JSON support"

**When to Use**:

- Making technology stack choices
- Deciding on architectural patterns
- Selecting third-party services or libraries
- Establishing security policies

### 2. Preference

**Purpose**: Team coding styles, tool preferences, and development practices

**Importance Range**: 0.5 - 0.7 (medium importance)

**Examples**:

- "Use raw SQL instead of ORM for database queries"
- "Prefer functional components in React"
- "Use kebab-case for API endpoint naming"

**When to Use**:

- Establishing coding style guidelines
- Choosing between equivalent approaches
- Setting team tool preferences
- Defining code review standards

### 3. Experience

**Purpose**: Problems encountered and their solutions, bug fixes, gotchas

**Importance Range**: 0.5 - 0.9 (varies by severity)

**Examples**:

- "Redis fails with 'localhost' in Docker - use service name instead"
- "Large file uploads timeout - need to increase nginx `client_max_body_size`"
- "Date parsing breaks in Safari - must use ISO 8601 format"

**When to Use**:

- Documenting bugs and their fixes
- Recording deployment issues
- Noting platform-specific quirks
- Sharing debugging insights

### 4. Convention

**Purpose**: Team rules, naming standards, and established practices

**Importance Range**: 0.4 - 0.6 (medium importance)

**Examples**:

- "All API endpoints must use kebab-case"
- "Test files must be in `__tests__` directory"
- "Environment variables must use UPPER_SNAKE_CASE"

**When to Use**:

- Documenting naming conventions
- Establishing file organization rules
- Setting commit message standards
- Defining code structure patterns

### 5. Plan

**Purpose**: Future improvements, TODOs, roadmap items

**Importance Range**: 0.3 - 0.7 (varies by priority)

**Examples**:

- "Migrate to PostgreSQL 16 for performance improvements"
- "Add rate limiting to public API endpoints"
- "Refactor authentication middleware for better testability"

**When to Use**:

- Tracking technical debt
- Planning future features
- Recording optimization opportunities
- Documenting refactoring needs

### 6. Note

**Purpose**: General information that doesn't fit other categories

**Importance Range**: 0.2 - 0.8 (varies widely)

**Examples**:

- "Production database backups stored in S3 bucket prod-backups"
- "Weekly deployment window is Thursdays 2-4 PM EST"
- "API rate limit is 100 requests per minute per IP"

**When to Use**:

- Recording operational information
- Documenting deployment procedures
- Noting configuration details
- Capturing miscellaneous knowledge

---

## Architecture

### Storage: Neo4j Graph Database

Memory Store uses Neo4j for flexible, connected knowledge storage:

```cypher
// Node Types
(Memory)   - Individual memory record
(Project)  - Project container

// Relationships
(Memory)-[:BELONGS_TO]->(Project)
(Memory)-[:SUPERSEDES]->(Memory)
(Memory)-[:RELATES_TO]->(File)
(Memory)-[:RELATES_TO]->(Symbol)
```

**Why Neo4j?**

- **Graph Relationships**: Natural modeling of memory connections
- **Fulltext Search**: Fast search across title, content, reason, tags
- **Vector Integration**: Future support for semantic search
- **Flexible Schema**: Easy to add new memory types and relationships
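
To make the `SUPERSEDES` relationship more concrete, the following is a minimal in-memory sketch of how a chain of superseded memories could be followed to reach the current one. It is an illustration only, not the project's Neo4j implementation: the `Memory` dataclass, the `superseded_by` field, and the `current_version` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Memory:
    """Hypothetical stand-in for a (Memory) node; not the project's real class."""
    id: str
    title: str
    superseded_by: Optional[str] = None  # id of the newer memory, if any


def current_version(memories: Dict[str, Memory], memory_id: str) -> Memory:
    """Follow superseded_by links until reaching a memory with no successor."""
    seen = set()
    memory = memories[memory_id]
    while memory.superseded_by is not None:
        if memory.id in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError("cycle in supersede chain")
        seen.add(memory.id)
        memory = memories[memory.superseded_by]
    return memory


# Assumed sample data mirroring the MySQL -> PostgreSQL example in this guide.
memories = {
    "m1": Memory("m1", "Use MySQL as database", superseded_by="m2"),
    "m2": Memory("m2", "Migrate to PostgreSQL"),
}
print(current_version(memories, "m1").title)
```

In a graph database the same lookup would presumably be a variable-length traversal over `SUPERSEDES` edges rather than a Python loop; the sketch only shows the shape of the data and why older decisions remain reachable for audit purposes.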

### Components

```
┌─────────────────────────────────────────────────┐
│                Application Layer                │
│    ┌──────────────┐         ┌──────────────┐    │
│    │  MCP Server  │         │   HTTP API   │    │
│    │  (30 tools)  │         │  (FastAPI)   │    │
│    └──────────────┘         └──────────────┘    │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│                  Service Layer                  │
│    ┌──────────────┐        ┌───────────────┐    │
│    │ MemoryStore  │        │MemoryExtractor│    │
│    │   (manual)   │        │  (auto v0.7)  │    │
│    └──────────────┘        └───────────────┘    │
└─────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────┐
│               Data Layer (Neo4j)                │
│                                                 │
│    ┌────────┐  ┌────────┐  ┌────────┐           │
│    │ Memory │──│Project │  │  Code  │           │
│    │ Nodes  │  │ Nodes  │  │  Refs  │           │
│    └────────┘  └────────┘  └────────┘           │
└─────────────────────────────────────────────────┘
```

---

## Operation Modes

Memory Store operates in two modes based on your configuration:

### Standard Mode (Fulltext Search)

**Requirements**: Neo4j database only

**Features**:

- ✅ Add, update, delete memories
- ✅ Fulltext search across title, content, reason, tags
- ✅ Filter by type, tags, importance
- ✅ Manual memory management (v0.6)
- ✅ Automatic extraction (v0.7)

**Limitations**:

- ❌ No semantic similarity search
- ❌ No embedding-based retrieval

**Best For**: Projects that don't need semantic search

### Full Mode (With Embeddings)

**Requirements**: Neo4j + Embedding provider (OpenAI/Gemini/HuggingFace)

**Features**:

- ✅ All Standard Mode features
- ✅ Semantic similarity search
- ✅ Embedding-based memory retrieval
- ✅ Find conceptually related memories

**Best For**: Large projects with extensive knowledge bases

**Configuration**:

```bash
# .env file
# EMBEDDING_PROVIDER may be openai, gemini, or huggingface
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=your-key-here
```

---

## Quick Start

### 1. Using MCP Tools (Recommended for AI Agents)

If you're using Claude Desktop, VSCode with MCP, or other MCP-compatible clients:

```python
# Add a decision memory
add_memory(
    project_id="my-project",
    memory_type="decision",
    title="Use JWT for authentication",
    content="Decided to use JWT tokens for stateless auth",
    reason="Need mobile client support and horizontal scaling",
    importance=0.9,
    tags=["auth", "security"]
)

# Search for memories
search_memories(
    project_id="my-project",
    query="authentication",
    memory_type="decision",
    min_importance=0.7
)

# Get project summary
get_project_summary(project_id="my-project")
```

### 2. Using HTTP API

For web applications and custom integrations:

```bash
# Add a memory
curl -X POST http://localhost:8000/api/v1/memory/add \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "my-project",
    "memory_type": "decision",
    "title": "Use JWT for authentication",
    "content": "Decided to use JWT tokens for stateless auth",
    "reason": "Need mobile client support",
    "importance": 0.9,
    "tags": ["auth", "security"]
  }'

# Search memories
curl -X POST http://localhost:8000/api/v1/memory/search \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "my-project",
    "query": "authentication",
    "min_importance": 0.7
  }'
```

### 3. Using Python Service Directly

For Python applications:

```python
import asyncio

from src.codebase_rag.services.memory import memory_store


async def main():
    # Initialize
    await memory_store.initialize()

    # Add memory
    result = await memory_store.add_memory(
        project_id="my-project",
        memory_type="decision",
        title="Use JWT for authentication",
        content="Decided to use JWT tokens",
        reason="Need stateless auth",
        importance=0.9,
        tags=["auth"]
    )
    print(f"Added memory: {result['memory_id']}")

    # Search
    results = await memory_store.search_memories(
        project_id="my-project",
        query="authentication"
    )
    for memory in results['memories']:
        print(f"- {memory['title']}")


asyncio.run(main())
```

---

## Use Cases

### Use Case 1: AI Agent Development Session

**Scenario**: AI agent starts working on a new feature

**Workflow**:

1. **Search memories** for related decisions and conventions
2. **Review experiences** to avoid known issues
3. **Implement feature** following established patterns
4. **Save new learnings** as memories for future sessions

**Example**:

```python
# Session starts
memories = search_memories(
    project_id="web-app",
    query="database migration",
    memory_type="experience"
)
# AI learns: "Always backup before migrations"

# After implementation
add_memory(
    project_id="web-app",
    memory_type="decision",
    title="Use Alembic for database migrations",
    content="Adopted Alembic for schema migrations",
    reason="Better than custom scripts, team familiar with it",
    importance=0.8
)
```

### Use Case 2: Team Onboarding

**Scenario**: New team member or AI agent needs to understand the project

**Workflow**:

```python
# Get project overview
summary = get_project_summary(project_id="web-app")
# Shows: 15 decisions, 8 preferences, 12 experiences

# Review top decisions
decisions = search_memories(
    project_id="web-app",
    memory_type="decision",
    min_importance=0.8
)
# Quickly understand key architectural choices

# Check coding conventions
conventions = search_memories(
    project_id="web-app",
    memory_type="convention"
)
# Learn team standards and practices
```

### Use Case 3: Knowledge Evolution

**Scenario**: A decision needs to change while its history is preserved

**Workflow**:

```python
# Original decision
old_memory = add_memory(
    memory_type="decision",
    title="Use MySQL as database",
    importance=0.7
)

# Requirements change, decision evolves
supersede_memory(
    old_memory_id=old_memory['memory_id'],
    new_memory_type="decision",
    new_title="Migrate to PostgreSQL",
    new_content="Switched from MySQL to PostgreSQL",
    new_reason="Need advanced JSON support and full-text search",
    new_importance=0.9
)
# Old decision preserved but marked as superseded
# History maintained for audit trail
```

### Use Case 4: Bug Prevention

**Scenario**: Team encounters a tricky bug, wants to prevent recurrence

**Workflow**:

```python
# Document the experience
add_memory(
    project_id="mobile-app",
    memory_type="experience",
    title="iOS date parsing fails without explicit timezone",
    content="Date.parse() in iOS Safari fails on dates without explicit timezone",
    reason="Safari is stricter than Chrome about date formats",
    importance=0.7,
    tags=["ios", "safari", "datetime", "bug"],
    related_refs=["ref://file/src/utils/dateParser.js"]
)

# Future sessions
# AI agent searches for "date parsing" before implementing
# Finds the experience, avoids the bug
```

### Use Case 5: Automatic Knowledge Capture (v0.7)

**Scenario**: Extract memories from conversations, git history, and code

**Workflow**:

```python
# Extract from conversation
extract_from_conversation(
    project_id="my-app",
    conversation=[
        {"role": "user", "content": "Should we use Redis or Memcached?"},
        {"role": "assistant", "content": "Redis is better because..."}
    ],
    auto_save=True
)
# Automatically extracts and saves the decision

# Extract from git commits
extract_from_git_commit(
    project_id="my-app",
    commit_sha="abc123",
    commit_message="feat: add JWT authentication",
    changed_files=["src/auth/jwt.py"],
    auto_save=True
)
# Extracts architectural decision from commit

# Batch extract from repository
batch_extract_from_repository(
    project_id="my-app",
    repo_path="/path/to/repo",
    max_commits=50
)
# Comprehensive analysis: commits, comments, docs
```

---

## Best Practices

### 1. Importance Scoring Guidelines

| Score | Category | Examples |
|-------|----------|----------|
| 0.9-1.0 | Critical | Security decisions, breaking changes, data model changes |
| 0.7-0.8 | Important | Architecture choices, major features, API contracts |
| 0.5-0.6 | Moderate | Preferences, conventions, common patterns |
| 0.3-0.4 | Low | Plans, future work, minor notes |
| 0.0-0.2 | Minimal | Temporary notes, experimental ideas |

### 2. Tagging Strategy

**Use Domain Tags**:

```python
tags = ["auth", "database", "api", "frontend", "backend"]
```

**Use Category Tags**:

```python
tags = ["security", "performance", "testing", "deployment"]
```

**Use Status Tags**:

```python
tags = ["critical", "deprecated", "experimental", "production"]
```

**Combine Multiple Levels**:

```python
tags = ["auth", "security", "jwt", "production", "critical"]
```

### 3. When to Create Memories

**DO Create Memories For**:

- ✅ Architecture decisions
- ✅ Technology choices
- ✅ Tricky bugs and solutions
- ✅ Team conventions
- ✅ Deployment procedures
- ✅ Security findings
- ✅ Performance optimizations

**DON'T Create Memories For**:

- ❌ Routine code changes
- ❌ Trivial fixes
- ❌ Temporary experiments
- ❌ Information already in documentation
- ❌ Standard best practices

### 4. Memory Maintenance

**Regular Review**:

- Review memories every sprint/month
- Update importance scores as project evolves
- Supersede outdated decisions
- Delete obsolete notes

**Quality Over Quantity**:

- Better to have 20 high-quality memories than 200 low-quality ones
- Focus on non-obvious knowledge
- Prioritize "why" over "what"

---

## Next Steps

- **Manual Memory Management**: See [Manual Guide](./manual.md)
- **Search Strategies**: See [Search Guide](./search.md)
- **Automatic Extraction**: See [Extraction Guide](./extraction.md)
- **API Reference**: See `/api/v1/memory` endpoints
- **MCP Tools**: See MCP server documentation

---

## Version History

- **v0.6** - Manual memory management with fulltext search
- **v0.7** - Automatic extraction from conversations, commits, code comments
  - `extract_from_conversation`: LLM-powered conversation analysis
  - `extract_from_git_commit`: Analyze git commits for decisions
  - `extract_from_code_comments`: Mine TODO, FIXME, NOTE markers
  - `suggest_memory_from_query`: Auto-suggest from knowledge queries
  - `batch_extract_from_repository`: Comprehensive repository analysis

---

## Support

For issues or questions:

- Check the documentation in `/docs/guide/memory/`
- Review examples in `/examples/memory_usage_example.py`
- See test cases in `/tests/test_memory_store.py`
