# Xano MCP Server
by SarimSiddd
Below is a detailed explanation of the functionality implemented in **src/model.ts** along with how the associated LLM system prompt (located in **docs/llm-system-prompt.md**) is intended to complement this design.
---
### Functionality of `src/model.ts`
The file implements the core memory model class, `TitanMemoryModel`, which conforms to the abstract interface for a memory model. Here are the key components and mechanisms:
1. **Configuration and Initialization**
- **Constructor Parameters:**
The model is parameterized by the input, hidden, and memory dimensions; learning rates for both the primary network and meta-learning; a momentum factor; the number of attention heads; key/value dimensions; and the maximum memory size.
- **Glorot (Xavier) Initialization:**
Each trainable parameter (weights for the projections, encoders, decoders, and auxiliary networks) is initialized with a variant of Glorot (Xavier) initialization, which scales the random weights by each matrix's fan-in and fan-out so that activations neither vanish nor explode at the start of training (a sketch of this pattern follows the list below).
- **Trainable Variables:**
The model defines several groups of variables:
- **Attention Projections:**
These include the query, key, and value projections along with an output projection. They form the foundation for the multi-head attention mechanism.
- **Memory Encoders and Decoders:**
Separate variables for short-term, long-term, and meta-memory are defined using dedicated encoder and decoder matrices.
- **Auxiliary Networks:**
Two additional networks are defined: one computes the surprise metrics, and one scores memory entries for pruning.
- **Optimizers:**
The model uses Adam optimizers (with separate optimizers for the primary and meta parameters) to update the weights during training.
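As a concrete illustration of this setup, here is a minimal TypeScript sketch assuming hypothetical dimension values; identifiers such as `glorot`, `wQ`, and `primaryOptimizer` are illustrative, not the model's actual field names:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Uniform Glorot/Xavier initialization: limit = sqrt(6 / (fanIn + fanOut)).
function glorot(fanIn: number, fanOut: number): tf.Variable {
  const limit = Math.sqrt(6 / (fanIn + fanOut));
  return tf.variable(tf.randomUniform([fanIn, fanOut], -limit, limit));
}

const inputDim = 64;    // assumed configuration values
const hiddenDim = 128;

// Attention projections (query, key, value, output).
const wQ = glorot(hiddenDim, hiddenDim);
const wK = glorot(hiddenDim, hiddenDim);
const wV = glorot(hiddenDim, hiddenDim);
const wO = glorot(hiddenDim, hiddenDim);

// One encoder/decoder pair per memory channel, e.g. short-term memory.
const shortEncoder = glorot(inputDim, hiddenDim);
const shortDecoder = glorot(hiddenDim, inputDim);

// Separate Adam optimizers for the primary and meta parameters.
const primaryOptimizer = tf.train.adam(1e-3);
const metaOptimizer = tf.train.adam(1e-4);
```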
2. **Core Operations**
- **Multi-Head Attention:**
The private method `computeAttention()` takes a query and the concatenated keys and values (from both the short-term and long-term memory) to compute the attention scores.
- The queries, keys, and values are first projected and then reshaped (by splitting them into multiple heads).
- After scaling and softmax normalization, the attention outputs are reassembled and projected to generate a combined representation.
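A minimal sketch of this mechanism follows, with assumed shapes and parameter names (the actual `computeAttention()` is private and may differ in detail):

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Scaled dot-product multi-head attention over concatenated memory entries.
function attention(
  query: tf.Tensor2D,   // [1, dModel]
  keys: tf.Tensor2D,    // [numEntries, dModel], short- and long-term combined
  values: tf.Tensor2D,  // [numEntries, dModel]
  numHeads: number,
  wQ: tf.Tensor, wK: tf.Tensor, wV: tf.Tensor, wO: tf.Tensor,
): tf.Tensor2D {
  return tf.tidy(() => {
    const dModel = wQ.shape[1];
    const dHead = dModel / numHeads;
    // Project, then split the feature axis into heads: [heads, n, dHead].
    const split = (x: tf.Tensor2D, w: tf.Tensor) =>
      tf.matMul(x, w).reshape([-1, numHeads, dHead]).transpose([1, 0, 2]);
    const q = split(query, wQ);
    const k = split(keys, wK);
    const v = split(values, wV);
    // Scaled dot-product scores with softmax normalization.
    const scores = tf.matMul(q, k, false, true).div(Math.sqrt(dHead));
    const weights = tf.softmax(scores);
    // Weighted sum per head, then merge heads and project the output.
    const merged = tf.matMul(weights, v).transpose([1, 0, 2]).reshape([-1, dModel]);
    return tf.matMul(merged, wO) as tf.Tensor2D;
  });
}
```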
- **Surprise Metrics Calculation:**
The `computeSurprise()` method computes two metrics:
- **Immediate Surprise:** A measure based on the difference between predicted and actual input.
- **Accumulated Surprise:** How these differences accumulate over the context.
These metrics are computed by concatenating the error (difference) with contextual history and feeding that through the `surpriseNetwork`.
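A rough sketch under assumed shapes, where `surpriseWeights` stands in for the parameters of the `surpriseNetwork`:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Concatenate the prediction error with a context-history vector and map it
// through a linear "surprise" head that emits two scalar metrics.
function computeSurprise(
  predicted: tf.Tensor2D,      // [1, inputDim]
  actual: tf.Tensor2D,         // [1, inputDim]
  history: tf.Tensor2D,        // [1, historyDim], accumulated context
  surpriseWeights: tf.Tensor,  // [inputDim + historyDim, 2], assumed shape
): { immediate: tf.Scalar; accumulated: tf.Scalar } {
  return tf.tidy(() => {
    const error = actual.sub(predicted);               // prediction error
    const features = tf.concat([error, history], 1);   // [1, inputDim + historyDim]
    const out = tf.matMul(features, surpriseWeights);  // [1, 2]
    return {
      immediate: out.slice([0, 0], [1, 1]).reshape([]) as tf.Scalar,
      accumulated: out.slice([0, 1], [1, 1]).reshape([]) as tf.Scalar,
    };
  });
}
```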
- **Forward Pass:**
The `forward()` method:
- **Encodes the Input:**
It encodes the raw input along with memory state tensors (short-term, long-term, and meta) using the respective encoder matrices.
- **Applies Attention:**
It applies the attention mechanism by combining encoded data from short-term and long-term memories.
- **Forms a Context Vector:**
The context is built by concatenating the encoded input, the attention-derived representation, and the encoded meta information.
- **Predictions and Memory Updates:**
A prediction is generated by processing the context through the short-term decoder, and new memory representations (for short-term, long-term, and meta) are computed using the appropriate decoders.
- **Surprise Evaluation:**
It calculates surprise metrics to provide feedback on how well the memory model predicted the input.
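Putting these steps together, a hedged sketch of the forward flow might look like the following; it reuses the `tf` import and `attention` helper from the sketches above, and every name in the weight map is illustrative:

```typescript
// Encode input and memories, attend, build a context vector, then decode.
function forward(
  x: tf.Tensor2D,                                    // [1, inputDim]
  mem: { short: tf.Tensor2D; long: tf.Tensor2D; meta: tf.Tensor2D },
  w: Record<string, tf.Tensor>,                      // illustrative weight map
) {
  return tf.tidy(() => {
    // 1. Encode the raw input and each memory channel.
    const encX = tf.matMul(x, w.inputEncoder).tanh() as tf.Tensor2D;
    const encShort = tf.matMul(mem.short, w.shortEncoder).tanh() as tf.Tensor2D;
    const encLong = tf.matMul(mem.long, w.longEncoder).tanh() as tf.Tensor2D;
    const encMeta = tf.matMul(mem.meta, w.metaEncoder).tanh();
    // 2. Attend over the short- and long-term entries together.
    const kv = tf.concat([encShort, encLong], 0) as tf.Tensor2D;
    const attended = attention(encX, kv, kv, 4, w.wQ, w.wK, w.wV, w.wO);
    // 3. Context vector: [encoded input | attention output | pooled meta].
    const context = tf.concat([encX, attended, encMeta.mean(0, true)], 1);
    // 4. Decode the prediction and the updated memory representations.
    return {
      predicted: tf.matMul(context, w.outputDecoder) as tf.Tensor2D,
      newShort: tf.matMul(context, w.shortMemDecoder).tanh(),
      newLong: tf.matMul(context, w.longMemDecoder).tanh(),
      newMeta: tf.matMul(context, w.metaMemDecoder).tanh(),
    };
  });
}
```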
- **Training Step:**
The `trainStep()` method implements one iteration of training:
- It begins by executing a forward pass.
- The loss is computed as the sum of a prediction loss (mean squared error) and a weighted surprise loss.
- Gradients are computed (using TensorFlow.js automatic differentiation) with respect to all trainable variables, and subsequently applied using the optimizer.
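An illustrative version of this loop, reusing the `forward` sketch above (the surprise term here is a simple stand-in for the real `surpriseNetwork` output):

```typescript
// One training iteration: forward pass, combined loss, autodiff, Adam update.
function trainStep(
  x: tf.Tensor2D,
  target: tf.Tensor2D,
  mem: { short: tf.Tensor2D; long: tf.Tensor2D; meta: tf.Tensor2D },
  w: Record<string, tf.Variable>,
  optimizer: tf.Optimizer,
  surpriseWeight = 0.1,                               // assumed loss weighting
): number {
  const { value, grads } = tf.variableGrads(() => {
    const { predicted } = forward(x, mem, w);
    const mse = tf.losses.meanSquaredError(target, predicted) as tf.Scalar;
    // Stand-in surprise term; the real model derives this from surpriseNetwork.
    const surprise = target.sub(predicted).square().mean() as tf.Scalar;
    return mse.add(surprise.mul(surpriseWeight));
  });
  optimizer.applyGradients(grads);                    // update all trainables
  const loss = value.dataSync()[0];
  value.dispose();
  return loss;
}
```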
- **Memory Pruning:**
The `pruneMemory()` method:
- Concatenates the various memory tensors.
- Computes a pruning score (via a sigmoid over a linear transformation of the concatenated tensor).
- Selects the top memory entries based on the score to maintain the maximum allowed memory size.
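A compact sketch of this selection logic, with an assumed `[dim, 1]` scoring transform:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Score each concatenated memory row, then keep only the top-k rows.
function pruneMemory(
  memories: tf.Tensor2D[],   // e.g. the short- and long-term stacks
  pruneWeights: tf.Tensor,   // [dim, 1] scoring transform, assumed shape
  maxSize: number,
): tf.Tensor2D {
  return tf.tidy(() => {
    const all = tf.concat(memories, 0);                                // [total, dim]
    const scores = tf.sigmoid(tf.matMul(all, pruneWeights)).squeeze([1]);
    const k = Math.min(maxSize, all.shape[0]);
    const { indices } = tf.topk(scores, k);       // highest-scoring entries
    return tf.gather(all, indices) as tf.Tensor2D;
  });
}
```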
3. **Persistence of the Model**
- **Saving the Model:**
The `saveModel()` method serializes all the essential weight tensors into a JSON object and writes it to disk using Node’s file system promises API. The save location is defined by the path parameter you provide.
- **Loading the Model:**
The `loadModel()` method reads the JSON file, parses the saved tensor values, and assigns them back to the model’s respective variables. It includes shape validation to ensure compatibility.
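A hedged sketch of this persistence scheme (`saveWeights`/`loadWeights` are illustrative names, not the actual method signatures):

```typescript
import { promises as fs } from 'fs';
import * as tf from '@tensorflow/tfjs-node';

// Serialize each weight's shape and flattened data to JSON on disk.
async function saveWeights(path: string, w: Record<string, tf.Variable>) {
  const payload: Record<string, { shape: number[]; data: number[] }> = {};
  for (const [name, v] of Object.entries(w)) {
    payload[name] = { shape: v.shape, data: Array.from(await v.data()) };
  }
  await fs.writeFile(path, JSON.stringify(payload));
}

// Parse the JSON file and assign values back, validating shapes first.
async function loadWeights(path: string, w: Record<string, tf.Variable>) {
  const payload = JSON.parse(await fs.readFile(path, 'utf8'));
  for (const [name, v] of Object.entries(w)) {
    const saved = payload[name];
    // Shape validation: refuse to assign an incompatible tensor.
    if (!saved || saved.shape.join() !== v.shape.join()) {
      throw new Error(`Shape mismatch for ${name}`);
    }
    v.assign(tf.tensor(saved.data, saved.shape));
  }
}
```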
4. **Additional Features**
- **Configuration Retrieval:**
The `getConfig()` method returns the model's current configuration, which is useful for reproducing or debugging its operational parameters.
- **Unimplemented Methods:**
Methods such as `updateMetaMemory`, `manifoldStep`, and a separate `save` function (distinct from `saveModel`) are marked as unimplemented. They throw errors indicating they are placeholders for extended functionality.
---
### Ensuring that `llm-system-prompt.md` Does Its Job
The **llm-system-prompt.md** file (located in the project's `docs` folder) defines how a connected LLM should drive the Titan Memory Model. Key points for ensuring it functions as intended:
- **Integration Instructions:**
The system prompt should instruct the LLM to initialize or load the persistent memory using the Titan Memory Model saved state. This might include explicitly referencing the use of functions like `loadModel(path)` to recover the weight configuration.
- **Context Awareness:**
The prompt must emphasize that the LLM should leverage the memory states (i.e., short-term, long-term, and meta channels) in every interaction. This helps the LLM produce context-aware responses by updating, querying, and refining its memory using the Titan Memory Model.
- **Consistent Memory Interactions:**
The system prompt should outline how the LLM is expected to:
- Execute a forward pass to predict outputs,
- Incorporate surprise metrics to determine if updates are needed,
- Use the pruning mechanism to keep the memory size manageable.
- **Persistence and Updates:**
It should mention that at appropriate times (e.g., end of conversations or after specific interactions), the memory state must be persisted using the provided saving functions. This ensures that the learned patterns are retained over sessions.
- **Integration with Transports:**
If your integration involves WebSocket or stdio transports, the prompt should guide the LLM to adhere to the protocols for sending inputs, receiving model predictions, and updating the memory state.
By explicitly outlining these points, the **llm-system-prompt.md** file ensures that any connected LLM leverages the Titan Memory Model's capabilities, facilitating continuous learning and enriching the conversational context with persistent memory.
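To make these expectations concrete, here is a hypothetical session loop built on the helpers sketched in the previous section; every identifier (paths, shapes, hyperparameters) is an assumption rather than the server's documented API:

```typescript
// Hypothetical end-to-end session flow: load, learn, prune, persist.
async function runSession(turns: tf.Tensor2D[], w: Record<string, tf.Variable>) {
  const memoryPath = './titan-memory.json';       // assumed save location
  await loadWeights(memoryPath, w);               // restore persisted state
  const memory = {
    short: tf.zeros([16, 128]) as tf.Tensor2D,    // assumed memory shapes
    long: tf.zeros([64, 128]) as tf.Tensor2D,
    meta: tf.zeros([8, 128]) as tf.Tensor2D,
  };
  const optimizer = tf.train.adam(1e-3);
  for (let i = 0; i + 1 < turns.length; i++) {
    // Predict the next turn and learn from the prediction error.
    trainStep(turns[i], turns[i + 1], memory, w, optimizer);
    // Keep short-term memory within its configured maximum.
    memory.short = pruneMemory([memory.short], w.pruneWeights, 16);
  }
  await saveWeights(memoryPath, w);               // persist at session end
}
```

The essential point is the ordering the prompt should enforce: learn, then prune, then persist, so the saved state reflects the pruned memory rather than a stale snapshot.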
---