Medical GraphRAG Assistant

agent-memory-system.md•12.4 KiB

# Agent Memory System

The Medical GraphRAG Assistant includes a persistent semantic memory system that allows the AI agent to learn and remember information across conversations. This document provides comprehensive documentation of the memory architecture, implementation, and usage.

## Overview

The Agent Memory system provides:

- **Semantic storage** of corrections, preferences, and learned knowledge
- **Vector-based recall** using NV-CLIP 1024-dimensional embeddings
- **Auto-recall** before each query to guide tool selection
- **Type-based organization** of memories (correction, preference, knowledge, feedback)
- **UI management** via sidebar memory editor

## Architecture

```mermaid
flowchart TB
    subgraph UI["🖥️ Streamlit UI"]
        CHAT[Chat Interface]
        MEM_EDITOR[Memory Editor Sidebar Panel]
    end

    subgraph AGENT["🤖 Agent Layer"]
        LLM[LLM Claude/GPT/NIM]
        AUTO[Auto-Recall Pre-Query Hook]
    end

    subgraph MCP["⚡ MCP Tools"]
        REMEMBER[remember_information Store new memory]
        RECALL[recall_information Semantic search]
        STATS[get_memory_stats Statistics]
    end

    subgraph STORAGE["🗄️ Vector Memory"]
        VM[VectorMemory Class]
        IRIS[(IRIS Database AgentMemoryVectors)]
        NVCLIP[NV-CLIP 1024-dim Embeddings]
    end

    CHAT --> LLM
    MEM_EDITOR --> VM
    LLM --> AUTO
    AUTO --> RECALL
    LLM --> MCP
    MCP --> VM
    VM --> IRIS
    VM --> NVCLIP
```

## Database Schema

Memories are stored in the `SQLUser.AgentMemoryVectors` table:

```sql
CREATE TABLE SQLUser.AgentMemoryVectors (
    MemoryID VARCHAR(255) PRIMARY KEY,   -- UUID identifier
    MemoryType VARCHAR(50) NOT NULL,     -- correction | preference | knowledge | feedback
    MemoryText VARCHAR(4000) NOT NULL,   -- The memory content
    Embedding VECTOR(DOUBLE, 1024),      -- NV-CLIP embedding
    Metadata VARCHAR(4000),              -- JSON metadata (source, timestamp, etc.)
    UseCount INT DEFAULT 1,              -- Times recalled
    CreatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UpdatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
```

### Memory Types

| Type | Description | Example |
|------|-------------|---------|
| `correction` | Factual corrections from user | "Metformin is for diabetes, not hypertension" |
| `preference` | User preferences for output | "I prefer concise clinical summaries" |
| `knowledge` | Learned medical knowledge | "This hospital uses ICD-10-CM codes" |
| `feedback` | User feedback on responses | "The last response was too verbose" |

## Core Components

### 1. VectorMemory Class

Located at `src/memory/vector_memory.py`, this class handles all memory operations:

```python
class VectorMemory:
    """Pure IRIS vector-based semantic memory storage."""

    def __init__(
        self,
        iris_host: str = "localhost",
        iris_port: int = 1972,
        iris_namespace: str = "%SYS",
        iris_username: str = "_SYSTEM",
        iris_password: str = "ISCDEMO",
        nvclip_base_url: str = None,
        table_name: str = "SQLUser.AgentMemoryVectors"
    ):
        ...
```

**Key Methods:**

| Method | Description |
|--------|-------------|
| `store(text, memory_type, metadata)` | Store a new memory with embedding |
| `recall(query, limit, memory_type)` | Semantic search over memories |
| `list_all(memory_type, limit)` | List memories without search |
| `delete(memory_id)` | Delete a specific memory |
| `get_stats()` | Get memory statistics by type |

### 2. MCP Memory Tools

Three memory tools are exposed via the MCP server:

#### `remember_information`

Store new information in agent memory.

```python
@mcp.tool()
async def remember_information(
    content: str,                   # What to remember
    memory_type: str = "knowledge", # correction | preference | knowledge | feedback
    tool_name: str = None           # Required for corrections
) -> str:
```

**Usage Examples:**

- "Remember that I prefer brief responses" → `preference`
- "Remember: metformin treats diabetes" → `knowledge`
- "That's wrong, search_knowledge_graph should use..." → `correction`

#### `recall_information`

Semantic search over stored memories.

```python
@mcp.tool()
async def recall_information(
    query: str,              # Search query
    limit: int = 5,          # Max results
    memory_type: str = None  # Filter by type
) -> str:
```

**Returns:** List of relevant memories with similarity scores.

#### `get_memory_stats`

Get statistics about stored memories.

```python
@mcp.tool()
async def get_memory_stats() -> str:
```

**Returns:** Memory counts by type, total count, storage info.

### 3. Auto-Recall System

Before each user query, the system automatically recalls relevant memories:

```python
def _auto_recall_memories(self, query: str) -> str:
    """Recall relevant memories to prepend to system prompt."""
    if not self.memory:
        return ""

    try:
        memories = self.memory.recall(query, limit=3)
        if not memories:
            return ""

        memory_context = "\n## Recalled Memories (Auto)\n"
        for mem in memories:
            memory_context += f"- [{mem['type']}] {mem['text']}\n"
        return memory_context
    except Exception as e:
        return ""
```

This allows corrections and preferences to automatically influence tool selection and response generation.

## Data Flow

### Storing a Memory

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant M as MCP Server
    participant V as VectorMemory
    participant N as NV-CLIP
    participant I as IRIS

    U->>L: "Remember I prefer concise summaries"
    L->>M: remember_information(content, type="preference")
    M->>V: store(text, memory_type)
    V->>N: embed_text(text)
    N-->>V: 1024-dim vector
    V->>I: INSERT INTO AgentMemoryVectors
    I-->>V: Success
    V-->>M: Memory ID
    M-->>L: "Stored successfully"
    L-->>U: "I'll remember that preference"
```

### Recalling Memories

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant A as Auto-Recall
    participant V as VectorMemory
    participant N as NV-CLIP
    participant I as IRIS

    U->>L: "What medications treat diabetes?"
    L->>A: Pre-query hook
    A->>V: recall(query, limit=3)
    V->>N: embed_text(query)
    N-->>V: Query vector
    V->>I: SELECT ... VECTOR_COSINE
    I-->>V: Top-K memories
    V-->>A: Relevant memories
    A-->>L: Memory context prepended
    L->>L: Process with memory context
    L-->>U: Response (influenced by memories)
```

## Embedding Strategy

The system uses **NV-CLIP** for memory embeddings:

| Property | Value |
|----------|-------|
| Model | `nvidia/nvclip` |
| Dimension | 1024 |
| Similarity | Cosine (VECTOR_COSINE) |
| Storage | IRIS VECTOR(DOUBLE, 1024) |

**Why NV-CLIP?**

- Multimodal compatibility (same model used for images)
- High-quality semantic representations
- Fast inference via NVIDIA NIM

### Embedding Generation

```python
def _get_embedding(self, text: str) -> List[float]:
    """Get NV-CLIP text embedding."""
    response = requests.post(
        f"{self.nvclip_base_url}/embeddings",
        json={
            "input": text,
            "model": "nvidia/nvclip",
            "input_type": "text"
        },
        headers={"Content-Type": "application/json"},
        timeout=30
    )
    return response.json()["data"][0]["embedding"]
```

## UI Integration

### Memory Editor (Sidebar)

The Streamlit sidebar includes a Memory Editor panel:

```python
with st.sidebar.expander("💾 Memory Editor", expanded=False):
    # Search memories
    search_query = st.text_input("Search memories")

    # Add new memory
    with st.form("add_memory"):
        new_memory = st.text_area("New memory")
        memory_type = st.selectbox("Type", ["preference", "knowledge", "correction", "feedback"])
        submit = st.form_submit_button("Add Memory")

    # Display existing memories
    memories = memory.list_all(limit=20)
    for mem in memories:
        col1, col2 = st.columns([4, 1])
        with col1:
            st.markdown(f"**[{mem['type']}]** {mem['text'][:100]}...")
        with col2:
            if st.button("🗑️", key=mem['id']):
                memory.delete(mem['id'])
```

### Execution Details Integration

Recalled memories appear in the "Show Execution Details" panel:

```python
with st.expander("Show Execution Details"):
    # Memories section
    if recalled_memories:
        st.markdown("### 🧠 Recalled Memories")
        for mem in recalled_memories:
            st.markdown(f"- **[{mem['type']}]** {mem['text']}")
```

## Configuration

### Environment Variables

```bash
# IRIS Database (for memory storage)
export IRIS_HOST=localhost
export IRIS_PORT=1972
export IRIS_NAMESPACE=%SYS
export IRIS_USERNAME=_SYSTEM
export IRIS_PASSWORD=your-password

# NV-CLIP (for embeddings)
export NVCLIP_BASE_URL="http://localhost:8002/v1"
```

### YAML Configuration

In `config/fhir_graphrag_config.aws.yaml`:

```yaml
memory:
  enabled: true
  table_name: "SQLUser.AgentMemoryVectors"
  auto_recall: true
  auto_recall_limit: 3
  embedding_model: "nvidia/nvclip"
```

## Example Usage

### Via Chat Interface

**Storing a correction:**

```
User: That's wrong - you should use search_fhir_documents for clinical notes, not hybrid_search
Agent: I'll remember that correction about using search_fhir_documents for clinical notes.
[Internally calls remember_information with memory_type="correction"]
```

**Storing a preference:**

```
User: Remember that I prefer bullet-point summaries
Agent: Got it! I'll remember your preference for bullet-point summaries.
[Stores preference in memory]
```

**Automatic recall in action:**

```
User: Find patients with chest pain
[Auto-recall retrieves relevant memories before processing]
[If previous correction said "use search_fhir_documents for symptoms", the agent will prefer that tool]
Agent: [Uses search_fhir_documents based on recalled correction]
```

### Via Python API

```python
from src.memory.vector_memory import VectorMemory

# Initialize
memory = VectorMemory(
    iris_host="localhost",
    nvclip_base_url="http://localhost:8002/v1"
)

# Store a memory
memory_id = memory.store(
    text="Patient prefers metric units for lab values",
    memory_type="preference",
    metadata={"source": "user_input", "patient_id": "123"}
)

# Recall relevant memories
results = memory.recall(
    query="lab results format",
    limit=5,
    memory_type="preference"
)

# Get statistics
stats = memory.get_stats()
print(f"Total memories: {stats['total']}")
print(f"Corrections: {stats['by_type']['correction']}")
```

## Similarity Search

The recall function uses IRIS vector cosine similarity:

```sql
SELECT TOP ?
    MemoryID, MemoryType, MemoryText, Metadata, UseCount, CreatedAt
FROM SQLUser.AgentMemoryVectors
WHERE MemoryType = ?  -- Optional filter
ORDER BY VECTOR_COSINE(Embedding, TO_VECTOR(?)) DESC
```

### Similarity Thresholds

| Score | Interpretation |
|-------|----------------|
| > 0.85 | High relevance |
| 0.70 - 0.85 | Moderate relevance |
| 0.50 - 0.70 | Low relevance |
| < 0.50 | Likely irrelevant |

## Troubleshooting

### Common Issues

**1. Memory search returns 0 results**

- Check that NV-CLIP is running and accessible
- Verify NVCLIP_BASE_URL environment variable
- Test embedding generation independently

**2. Memories not persisting**

- Verify IRIS connection settings
- Check table exists: `SELECT * FROM SQLUser.AgentMemoryVectors`
- Ensure write permissions on IRIS namespace

**3. Auto-recall not working**

- Confirm `auto_recall: true` in config
- Check Streamlit logs for recall errors
- Verify memory table has records

### Debugging

Enable debug logging:

```python
import logging
logging.getLogger('src.memory').setLevel(logging.DEBUG)
```

Check memory table directly:

```sql
-- Count memories by type
SELECT MemoryType, COUNT(*) as cnt
FROM SQLUser.AgentMemoryVectors
GROUP BY MemoryType;

-- Recent memories
SELECT TOP 10 MemoryID, MemoryType, MemoryText, CreatedAt
FROM SQLUser.AgentMemoryVectors
ORDER BY CreatedAt DESC;
```

## Future Enhancements

- **Memory decay**: Reduce relevance of old, unused memories
- **Memory consolidation**: Merge similar memories
- **Memory export/import**: Backup and restore memory state
- **Multi-user support**: Isolated memory per user session
- **Memory visualization**: Graph view of memory relationships

## Related Documentation

- [Architecture Overview](architecture.md)
- [MCP Tools Reference](../README.md#mcp-tools-architecture)
- [Troubleshooting Guide](troubleshooting.md)
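
## Appendix: Interpreting Similarity Scores

To make the bands in the Similarity Thresholds table concrete, here is a minimal sketch of how a cosine score could be computed and mapped to those bands. In the real system IRIS computes `VECTOR_COSINE` server-side, so this pure-Python version is only an illustration. The helper names `cosine_similarity` and `relevance_band` are hypothetical and not part of the project, and the handling of the exact boundary values (0.85, 0.70, 0.50) is an assumption, since the table does not say which band they belong to.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_band(score: float) -> str:
    """Map a cosine score to the bands in the Similarity Thresholds table.

    Assumption: boundary values fall into the lower-relevance band above
    them (0.85 counts as "moderate", 0.70 as "moderate", 0.50 as "low").
    """
    if score > 0.85:
        return "high"
    if score >= 0.70:
        return "moderate"
    if score >= 0.50:
        return "low"
    return "likely irrelevant"


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real NV-CLIP embeddings have 1024 dimensions.
    score = cosine_similarity([1.0, 1.0], [1.0, 0.0])
    print(f"{score:.4f} -> {relevance_band(score)}")
```

A filter like this could be applied to recalled memories before they are prepended to the system prompt, so that low-scoring matches do not steer tool selection. Whether the project applies such a cutoff is not stated in this document.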
