Building Memory into Chatbots/Assistants with MCP


mcp
Chatbots
Conversational AI
Tool Integration
Agent Protocol

1. The Challenge of Conversational Memory
2. The MCP Approach: Memory as a Tool
3. Behind the Scenes: The Tool-Based Memory Flow
4. My Thoughts
5. References

            While Large Language Models (LLMs) are powerful in processing and generating text, they are fundamentally stateless. Each API call is treated as a new, isolated request, devoid of any memory of previous interactions. For a chatbot or virtual assistant to be truly useful, it must have a concept of memory, allowing it to remember a user's name, preferences, or the context of a multi-turn conversation. Traditional approaches attempt to solve this by packing the entire chat history into the LLM's prompt, a method that is inefficient, costly, and prone to error. This article explores a more robust and scalable solution: implementing memory as an external, manageable tool using the Model Context Protocol (MCP). This approach enables the agent to remain stateless, with a stateful tool providing persistent memory.

            The Challenge of Conversational Memory

The most common method for giving a chatbot "memory" is to simply pass the full conversation history with every new prompt. This approach, which relies entirely on the model's context window, has several significant limitations [1]:

            • Token Limits and Costs: LLMs have a finite context window size. As conversations grow, older messages must be truncated or summarized to fit within this limit, leading to loss of detail. This also increases token usage and, consequently, API costs with every turn, even for simple replies.
            • Latency and Performance: A larger context window means the model has to process more information for each response, leading to increased latency.
            • Inconsistency and "Hallucination": The model may misinterpret or "hallucinate" details from the long, unstructured chat history, leading to subtle but critical errors in the conversation.

            These issues highlight that the in-prompt context window is best suited for short-term, in-flight reasoning, not for persistent, long-term memory. A better solution must be external to the core model inference loop.
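To see why history-stuffing scales so badly, consider a rough back-of-the-envelope model (not a real tokenizer): because every turn resends all prior messages, the cumulative tokens sent grow quadratically with conversation length.

```typescript
// Rough illustration of naive history-stuffing: each turn resends the
// entire history, so the cumulative token count grows quadratically
// with the number of turns. Token counts here are simplified estimates.
function cumulativeTokensSent(turns: number, tokensPerMessage: number): number {
  let total = 0;
  let historyTokens = 0;
  for (let turn = 1; turn <= turns; turn++) {
    historyTokens += tokensPerMessage; // new user message joins the history
    total += historyTokens;            // the full history is sent this turn
    historyTokens += tokensPerMessage; // assistant reply joins the history
  }
  return total;
}

// A 50-turn conversation at ~100 tokens per message resends far more
// than the 5,000 tokens of actual content.
console.log(cumulativeTokensSent(50, 100));
```

Under this toy model, a 50-turn conversation transmits 250,000 tokens in total, fifty times the underlying content, which is exactly the cost and latency problem described above.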

            The MCP Approach: Memory as a Tool

            MCP shifts the paradigm of memory from an implicit, in-model property to an explicit, external resource managed by a dedicated tool. Instead of asking the model to "remember," the developer provides the model with a memory_tool that has well-defined functions, such as get_fact and store_fact. This is the essence of building a stateless agent with stateful tool memory. The agent itself remains stateless and simply follows a protocol to interact with the external memory store.


The memory tool can be anything from a simple Redis cache to a sophisticated knowledge graph or vector database [2]. This approach offers several advantages:

            • Scalability: Memory is decoupled from the model. As the number of users or the depth of conversation grows, you can scale the memory store independently of the LLM.
            • Consistency and Control: You control what is stored and how it is retrieved. The tool can be designed to store structured data, preventing the ambiguity and inconsistency of raw text summaries.
• Security: Sensitive user data can be stored securely in a dedicated database with its own access controls, rather than being passed repeatedly through an LLM API [3].
            • Long-Term Memory: The memory can persist across sessions. When a user returns, the agent can use the memory tool to retrieve their history and preferences, enabling a truly personalized experience.

            Behind the Scenes: The Tool-Based Memory Flow

            Let's walk through a concrete example of how an MCP agent uses a memory tool to remember a user's name.

            The memory tool, which we'll call user_data_store, could be defined with the following schema:

```typescript
// src/mcp/memory-tools.ts

/**
 * A tool to get or store user-specific information.
 * @interface UserDataStore
 */
interface UserDataStore {
  /**
   * Retrieves a specific fact or detail about the user.
   * @param {string} user_id The unique ID of the user.
   * @param {string} key The key for the fact to retrieve (e.g., 'name', 'location').
   * @returns {string | null} The value of the fact, or null if not found.
   */
  get_fact: (user_id: string, key: string) => Promise<string | null>;

  /**
   * Stores or updates a fact about the user.
   * @param {string} user_id The unique ID of the user.
   * @param {string} key The key for the fact to store.
   * @param {string} value The value of the fact.
   */
  store_fact: (user_id: string, key: string, value: string) => Promise<void>;
}
```
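A minimal in-memory implementation of this interface might look like the sketch below. The `InMemoryUserDataStore` class and its `Map`-based storage are illustrative assumptions; a production deployment would likely back the same interface with Redis or a database, as noted earlier.

```typescript
// Minimal in-memory sketch of the user_data_store tool. The interface is
// repeated here so the snippet is self-contained; the class name and
// Map-based storage are assumptions for illustration only.
interface UserDataStore {
  get_fact: (user_id: string, key: string) => Promise<string | null>;
  store_fact: (user_id: string, key: string, value: string) => Promise<void>;
}

class InMemoryUserDataStore implements UserDataStore {
  private facts = new Map<string, string>(); // keyed by "user_id:key"

  async get_fact(user_id: string, key: string): Promise<string | null> {
    return this.facts.get(`${user_id}:${key}`) ?? null;
  }

  async store_fact(user_id: string, key: string, value: string): Promise<void> {
    this.facts.set(`${user_id}:${key}`, value);
  }
}
```

Because the agent only ever sees the interface, swapping this stub for a Redis- or database-backed store requires no change to the agent's logic.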

            The interaction flow would look like this:

            1. User Input: The user says, "Hi there!"

            2. Initial Tool Call: The agent's first step is to check if it already knows the user's name. It generates a Tool Call for the get_fact tool:

  ```json
  {
    "tool_name": "user_data_store",
    "action": "get_fact",
    "parameters": {
      "user_id": "user-123",
      "key": "name"
    }
  }
  ```
            3. Tool Result: The tool returns a result. If the name is not found, the result will be null.

  ```json
  {
    "status": "success",
    "result": null
  }
  ```
            4. Agent Logic: The agent sees that the name is null. It then formulates a new response: "Hello! What's your name?"

            5. User Response: The user replies, "My name is Alex."

            6. Second Tool Call: The agent's reasoning identifies the user's name in the new input. It generates a Tool Call to store the name for future use:

  ```json
  {
    "tool_name": "user_data_store",
    "action": "store_fact",
    "parameters": {
      "user_id": "user-123",
      "key": "name",
      "value": "Alex"
    }
  }
  ```
7. Final Response: The agent receives the Tool Result and uses the newly stored fact to provide a personalized response: "Nice to meet you, Alex!" [4]

            This flow separates the agent's conversational logic from the memory management. The agent doesn't need to retain the name in its own context; it simply knows to call the right tool when it needs to retrieve or store a fact. This is a robust and scalable pattern that avoids the pitfalls of in-prompt memory.
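The seven-step flow above can be sketched as a single agent function. The `respond` function and its regex-based name extraction are simplifications: a real agent would let the LLM decide when to call `get_fact` and `store_fact`, but the check-memory-first, store-on-discovery structure is the same.

```typescript
// Sketch of the flow above with a deterministic stand-in for the LLM's
// reasoning: check memory first, ask for the name if unknown, and store
// it once the user provides it. `store` is any user_data_store backend.
async function respond(
  store: {
    get_fact: (user_id: string, key: string) => Promise<string | null>;
    store_fact: (user_id: string, key: string, value: string) => Promise<void>;
  },
  userId: string,
  userMessage: string
): Promise<string> {
  // Steps 2-3: check whether the name is already in memory.
  const known = await store.get_fact(userId, "name");
  if (known !== null) {
    return `Hello again, ${known}!`;
  }
  // Step 6: toy extraction; a real agent would let the model spot the fact.
  const match = userMessage.match(/my name is (\w+)/i);
  if (match) {
    await store.store_fact(userId, "name", match[1]);
    return `Nice to meet you, ${match[1]}!`;
  }
  // Step 4: the fact is unknown and not in this message, so ask for it.
  return "Hello! What's your name?";
}
```

Note that `respond` itself holds no state between calls: run it twice for the same user and the second call is personalized only because the store remembers, which is exactly the stateless-agent, stateful-tool split.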

            My Thoughts

Implementing memory via external tools using MCP is a powerful and practical solution for building sophisticated chatbots. It moves us away from treating memory as a tricky prompt-engineering problem and towards a professional, architectural solution. This approach aligns with modern software design principles, such as separation of concerns and modularity [5].

While some frameworks offer their own memory modules, they often abstract away the underlying mechanism, which can make debugging and customization difficult. The MCP approach, in contrast, is transparent and auditable. Each tool call and its result are logged, providing a clear trail of how the agent's "memory" is being accessed and updated [6].
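One way to get that audit trail is to wrap the tool rather than modify it. The sketch below assumes a simple in-process string log; the log format and `withAuditLog` helper are illustrative, not part of MCP itself.

```typescript
// Sketch of an auditable wrapper: every call to the memory tool is
// recorded with its arguments and result. The log format here is an
// assumption for illustration, not something MCP prescribes.
type FactGetter = (user_id: string, key: string) => Promise<string | null>;

function withAuditLog(getFact: FactGetter, log: string[]): FactGetter {
  return async (user_id, key) => {
    const result = await getFact(user_id, key);
    log.push(JSON.stringify({
      tool: "user_data_store",
      action: "get_fact",
      user_id,
      key,
      result,
    }));
    return result;
  };
}
```

Because the wrapper has the same signature as the original function, it can be dropped in front of any backend without touching the agent or the store.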


A potential trade-off is the added latency of an extra tool call for memory lookup. However, this is a minor cost compared to the performance gains from not sending a massive prompt with every request. The real value lies in the newfound reliability and scalability. For a developer building a production-grade assistant, the ability to control data persistence and prevent state-related inconsistencies is invaluable [7]. This approach truly empowers the creation of stateless, yet highly personalized, assistants.

            References

            Footnotes

            1. Mastering State in Stateless LLMs

            2. The Role of External Memory in Conversational AI

            3. Secure AI Systems: The Protocol-Based Approach

            4. ReAct: Synergizing Reasoning and Acting in Language Models

            5. Building Modular AI Agents

            6. Tool-Use in Large Language Models: A Survey

            7. From Brittle Prompts to Robust Protocols: A New Paradigm for AI Agents

            Written by Om-Shree-0709 (@Om-Shree-0709)