# Replication Architecture

This document describes the internal architecture of NornicDB's replication system for contributors and advanced users.

> **For user documentation, see the [Clustering Guide](../user-guides/clustering.md).**

## Overview

NornicDB supports standalone operation plus three replication modes to meet different availability and consistency requirements:

| Mode | Nodes | Consistency | Use Case |
|------|-------|-------------|----------|
| **Standalone** | 1 | N/A | Development, testing, small workloads |
| **Hot Standby** | 2 | Eventual | Simple HA, fast failover |
| **Raft Cluster** | 3-5 | Strong | Production HA, consistent reads |
| **Multi-Region** | 6+ | Configurable | Global distribution, disaster recovery |

## Architecture Diagram

```
              NORNICDB REPLICATION ARCHITECTURE

MODE 1: HOT STANDBY (2 nodes)
┌─────────────┐      WAL Stream      ┌─────────────┐
│   Primary   │ ───────────────────► │   Standby   │
│  (writes)   │     (async/sync)     │ (failover)  │
└─────────────┘                      └─────────────┘

MODE 2: RAFT CLUSTER (3-5 nodes)
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Leader    │◄──►│  Follower   │◄──►│  Follower   │
│  (writes)   │    │   (reads)   │    │   (reads)   │
└─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │
       └──────────────────┴──────────────────┘
                   Raft Consensus

MODE 3: MULTI-REGION (Raft clusters + cross-region HA)
┌─────────────────────────┐        ┌─────────────────────────┐
│     US-EAST REGION      │        │     EU-WEST REGION      │
│   ┌───┐ ┌───┐ ┌───┐     │  WAL   │   ┌───┐ ┌───┐ ┌───┐     │
│   │ L │ │ F │ │ F │     │◄──────►│   │ L │ │ F │ │ F │     │
│   └───┘ └───┘ └───┘     │ async  │   └───┘ └───┘ └───┘     │
│      Raft Cluster A     │        │      Raft Cluster B     │
└─────────────────────────┘        └─────────────────────────┘
```

## Package Structure

```
pkg/replication/
├── config.go            # Configuration loading and validation
├── replicator.go        # Core Replicator interface and factory
├── transport.go         # ClusterTransport for node-to-node communication
├── ha_standby.go        # Hot Standby implementation
├── raft.go              # Raft consensus implementation
├── multi_region.go      # Multi-region coordinator
├── wal.go               # WAL streaming primitives
├── chaos_test.go        # Chaos testing infrastructure
├── scenario_test.go     # E2E scenario tests
└── replication_test.go  # Unit tests
```

## Core Interfaces

### Replicator Interface

All replication modes implement this interface:

```go
type Replicator interface {
    // Start starts the replicator
    Start(ctx context.Context) error

    // Apply applies a write operation (routes to leader if needed)
    Apply(cmd *Command, timeout time.Duration) error

    // IsLeader returns true if this node can accept writes
    IsLeader() bool

    // LeaderAddr returns the address of the current leader
    LeaderAddr() string

    // LeaderID returns the ID of the current leader
    LeaderID() string

    // Health returns health status
    Health() *HealthStatus

    // WaitForLeader blocks until a leader is elected
    WaitForLeader(timeout time.Duration) error

    // Mode returns the replication mode
    Mode() ReplicationMode

    // NodeID returns this node's ID
    NodeID() string

    // Shutdown gracefully shuts down
    Shutdown() error
}
```
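To make the call pattern concrete, the sketch below shows how a caller might drive this interface. It is illustrative only: `routeWrite`, the chosen timeouts, and the enclosing package declaration are assumptions, not part of the actual codebase.

```go
package replication

import (
    "fmt"
    "log"
    "time"
)

// routeWrite is an illustrative helper (not part of the package API)
// showing the intended call pattern for the Replicator interface.
func routeWrite(r Replicator, cmd *Command) error {
    // Block until the cluster has elected a leader.
    if err := r.WaitForLeader(5 * time.Second); err != nil {
        return fmt.Errorf("no leader available: %w", err)
    }

    // Apply forwards the command to the leader when this node is a
    // follower, but surfacing the leader's identity lets clients
    // reconnect to it directly after a failover.
    if !r.IsLeader() {
        log.Printf("write routed to leader %s at %s", r.LeaderID(), r.LeaderAddr())
    }

    // Apply replicates the command through the active mode
    // (WAL streaming for Hot Standby, log replication for Raft).
    return r.Apply(cmd, 2*time.Second)
}
```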
### Transport Interface

Node-to-node communication:

```go
type Transport interface {
    // Connect establishes a connection to a peer
    Connect(ctx context.Context, addr string) (PeerConnection, error)

    // Listen accepts incoming connections
    Listen(ctx context.Context, addr string, handler ConnectionHandler) error

    // Close shuts down the transport
    Close() error
}

type PeerConnection interface {
    // WAL streaming (Hot Standby)
    SendWALBatch(ctx context.Context, entries []*WALEntry) (*WALBatchResponse, error)
    SendHeartbeat(ctx context.Context, req *HeartbeatRequest) (*HeartbeatResponse, error)
    SendFence(ctx context.Context, req *FenceRequest) (*FenceResponse, error)
    SendPromote(ctx context.Context, req *PromoteRequest) (*PromoteResponse, error)

    // Raft consensus
    SendRaftVote(ctx context.Context, req *RaftVoteRequest) (*RaftVoteResponse, error)
    SendRaftAppendEntries(ctx context.Context, req *RaftAppendEntriesRequest) (*RaftAppendEntriesResponse, error)

    Close() error
    IsConnected() bool
}
```

### Storage Interface

Replication layer's view of storage:

```go
type Storage interface {
    // Commands
    ApplyCommand(cmd *Command) error

    // WAL position tracking
    GetWALPosition() (uint64, error)
    SetWALPosition(pos uint64) error

    // Node/Edge operations (used by WAL applier)
    CreateNode(node *Node) error
    UpdateNode(node *Node) error
    DeleteNode(id NodeID) error
    CreateEdge(edge *Edge) error
    DeleteEdge(from, to NodeID, relType string) error
    SetProperty(nodeID NodeID, key string, value interface{}) error
}
```

## Network Protocol

### Port Allocation

| Port | Protocol | Purpose |
|------|----------|---------|
| 7474 | HTTP | REST API, Admin, Health checks |
| 7687 | Bolt | Neo4j-compatible client queries |
| 7688 | Cluster | Replication, Raft consensus |

### Wire Format

The cluster protocol uses length-prefixed JSON over TCP:

```
┌─────────────┬────────────────────────┐
│ Length (4B) │ JSON Payload           │
│ Big Endian  │                        │
└─────────────┴────────────────────────┘
```
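A minimal sketch of this framing, assuming `encoding/json` payloads and a 4-byte big-endian length prefix exactly as shown above (`writeFrame`, `readFrame`, and the size limit are illustrative names and values, not the package's actual API):

```go
package replication

import (
    "encoding/binary"
    "encoding/json"
    "fmt"
    "io"
)

// writeFrame marshals msg to JSON and writes it as one frame:
// a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, msg interface{}) error {
    payload, err := json.Marshal(msg)
    if err != nil {
        return err
    }
    var length [4]byte
    binary.BigEndian.PutUint32(length[:], uint32(len(payload)))
    if _, err := w.Write(length[:]); err != nil {
        return err
    }
    _, err = w.Write(payload)
    return err
}

// readFrame reads one length-prefixed frame and unmarshals the
// JSON payload into out, rejecting frames larger than maxSize.
func readFrame(r io.Reader, out interface{}, maxSize uint32) error {
    var length [4]byte
    if _, err := io.ReadFull(r, length[:]); err != nil {
        return err
    }
    n := binary.BigEndian.Uint32(length[:])
    if n > maxSize {
        return fmt.Errorf("frame of %d bytes exceeds limit of %d", n, maxSize)
    }
    payload := make([]byte, n)
    if _, err := io.ReadFull(r, payload); err != nil {
        return err
    }
    return json.Unmarshal(payload, out)
}
```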
### Message Types

| Type | Code | Direction | Description |
|------|------|-----------|-------------|
| VoteRequest | 1 | Candidate → Follower | Request vote in election |
| VoteResponse | 2 | Follower → Candidate | Grant/deny vote |
| AppendEntries | 3 | Leader → Follower | Replicate log entries |
| AppendEntriesResponse | 4 | Follower → Leader | Acknowledge entries |
| WALBatch | 5 | Primary → Standby | Stream WAL entries |
| WALBatchResponse | 6 | Standby → Primary | Acknowledge WAL |
| Heartbeat | 7 | Primary → Standby | Health check |
| HeartbeatResponse | 8 | Standby → Primary | Health status |
| Fence | 9 | Standby → Primary | Fence old primary |
| FenceResponse | 10 | Primary → Standby | Acknowledge fence |
| Promote | 11 | Admin → Standby | Promote to primary |
| PromoteResponse | 12 | Standby → Admin | Promotion status |

## Mode 1: Hot Standby

### Components

- **Primary**: Accepts writes, streams WAL to standby
- **Standby**: Receives WAL, ready for failover
- **WALStreamer**: Manages WAL position and batching
- **WALApplier**: Applies WAL entries to storage

### Write Flow

```
Client                 Primary                  Standby
   │                      │                        │
   │─── WRITE (Bolt) ────►│                        │
   │                      │                        │
   │                      │────── WALBatch ───────►│
   │                      │                        │
   │                      │◄── WALBatchResponse ───│
   │                      │                        │
   │◄──── SUCCESS ────────│                        │
```

### Sync Modes

| Mode | Acknowledgment | Data Safety | Latency |
|------|----------------|-------------|---------|
| `async` | Primary only | Risk of data loss | Lowest |
| `semi_sync` | Standby received | Minimal data loss | Medium |
| `sync` | Standby persisted | No data loss | Highest |

### Failover Process

1. Standby detects missing heartbeats
2. After `FAILOVER_TIMEOUT`, standby attempts to fence primary
3. Standby promotes itself to primary
4. Clients reconnect to new primary

## Mode 2: Raft Consensus

### Components

- **RaftReplicator**: Main Raft node implementation
- **Election Timer**: Triggers leader election on timeout
- **Log**: In-memory Raft log with commit tracking
- **Heartbeat Loop**: Leader sends heartbeats to maintain authority

### State Machine

```
       ┌────────────┐
       │  Follower  │◄────────────────┐
       └─────┬──────┘                 │
             │ election timeout       │
             ▼                        │
       ┌────────────┐                 │
┌─────►│ Candidate  │─────────────────┤
│      └─────┬──────┘  loses election │
│            │                        │
│            │ wins election          │
│            ▼                        │
│      ┌────────────┐                 │
│      │   Leader   │─────────────────┘
│      └─────┬──────┘  discovers higher term
│            │
└────────────┘  starts new election
```

### Leader Election

1. Follower's election timer expires
2. Increments term, transitions to Candidate
3. Votes for self, requests votes from peers
4. If majority votes received → becomes Leader
5. Sends heartbeats to maintain leadership

### Log Replication

1. Client sends write to Leader
2. Leader appends entry to log
3. Leader sends AppendEntries to all followers
4. When majority acknowledge → entry committed
5. Leader applies to state machine, responds to client

### Raft RPC Messages

**RequestVote:**

```go
type RaftVoteRequest struct {
    Term         uint64
    CandidateID  string
    LastLogIndex uint64
    LastLogTerm  uint64
}

type RaftVoteResponse struct {
    Term        uint64
    VoteGranted bool
    VoterID     string
}
```

**AppendEntries:**

```go
type RaftAppendEntriesRequest struct {
    Term         uint64
    LeaderID     string
    LeaderAddr   string
    PrevLogIndex uint64
    PrevLogTerm  uint64
    Entries      []*RaftLogEntry
    LeaderCommit uint64
}

type RaftAppendEntriesResponse struct {
    Term          uint64
    Success       bool
    MatchIndex    uint64
    ConflictIndex uint64
    ConflictTerm  uint64
}
```
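To make step 4 of the log replication flow concrete: the leader can derive its commit index from the `MatchIndex` values reported in `RaftAppendEntriesResponse`, committing the highest index stored on a majority of nodes (leader included). The following is a minimal sketch of that majority rule, not NornicDB's actual implementation; the function name and signature are illustrative.

```go
package replication

import "sort"

// commitIndexFromMatches returns the highest log index replicated on a
// majority of the cluster, given the leader's own last index and each
// follower's acknowledged MatchIndex. Real Raft additionally requires
// that the entry at that index belongs to the leader's current term
// before the commit index is advanced.
func commitIndexFromMatches(leaderLastIndex uint64, followerMatch []uint64) uint64 {
    // Every node's highest stored index, leader included.
    indexes := append([]uint64{leaderLastIndex}, followerMatch...)

    // Sort descending; the value at position len/2 is then present on
    // at least a majority of nodes (e.g. 2 of 3, or 3 of 5).
    sort.Slice(indexes, func(i, j int) bool { return indexes[i] > indexes[j] })
    return indexes[len(indexes)/2]
}
```

For example, in a 3-node cluster where the leader is at index 7 and the followers have acknowledged 7 and 5, the sorted indexes are [7, 7, 5] and the commit index advances to 7, since two of the three nodes hold that entry.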
## Mode 3: Multi-Region

### Components

- **MultiRegionReplicator**: Coordinates local Raft + cross-region replication
- **Local Raft Cluster**: Strong consistency within region
- **Cross-Region Streamer**: Async WAL replication between regions

### Write Flow

1. Write arrives at region's Raft leader
2. Raft commits locally (strong consistency)
3. Async replication to remote regions
4. Remote regions apply WAL entries

### Conflict Resolution

When async replication causes conflicts:

| Strategy | Description |
|----------|-------------|
| `last_write_wins` | Latest timestamp wins |
| `first_write_wins` | Earliest timestamp wins |
| `manual` | Flag for manual resolution |

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `NORNICDB_CLUSTER_MODE` | `standalone` | `standalone`, `ha_standby`, `raft`, `multi_region` |
| `NORNICDB_CLUSTER_NODE_ID` | auto | Unique node identifier |
| `NORNICDB_CLUSTER_BIND_ADDR` | `0.0.0.0:7688` | Cluster port binding |
| `NORNICDB_CLUSTER_ADVERTISE_ADDR` | same as bind | Address advertised to peers |

See the [Clustering Guide](../user-guides/clustering.md) for the complete configuration reference.

## Testing

### Test Categories

| File | Purpose |
|------|---------|
| `replication_test.go` | Unit tests for each component |
| `scenario_test.go` | E2E tests for all modes (A/B/C/D scenarios) |
| `chaos_test.go` | Network failure simulation |

### Chaos Testing

The chaos testing infrastructure simulates:

- Packet loss
- High latency (2000ms+)
- Connection drops
- Data corruption
- Packet duplication
- Packet reordering
- Byzantine failures

### Running Tests

```bash
# All replication tests
go test ./pkg/replication/... -v

# Specific test
go test ./pkg/replication/... -run TestScenario_Raft -v

# With race detection
go test ./pkg/replication/... -race

# Skip long-running tests
go test ./pkg/replication/... -short
```

## Implementation Status

| Component | Status | Details |
|-----------|--------|---------|
| Hot Standby | ✅ Complete | 2-node HA with auto-failover |
| Raft Cluster | ✅ Complete | 3-5 node strong consistency |
| Multi-Region | ✅ Complete | Async cross-region replication |
| Chaos Testing | ✅ Complete | Extreme latency, packet loss, Byzantine failures |

## See Also

- [Clustering Guide](../user-guides/clustering.md) - User documentation
- [System Design](./system-design.md) - Overall architecture
- [Plugin System](./plugin-system.md) - APOC plugin architecture
