# NornicDB Clustering Guide

This guide covers how to set up, configure, and operate NornicDB in clustered configurations for high availability and scalability.

## Table of Contents

1. [Overview](#overview)
2. [Choosing a Replication Mode](#choosing-a-replication-mode)
3. [Mode 1: Hot Standby (2 Nodes)](#mode-1-hot-standby-2-nodes)
4. [Mode 2: Raft Cluster (3+ Nodes)](#mode-2-raft-cluster-3-nodes)
5. [Mode 3: Multi-Region](#mode-3-multi-region)
6. [Client Connection](#client-connection)
7. [Monitoring & Health Checks](#monitoring-health-checks)
8. [Failover Operations](#failover-operations)
9. [Troubleshooting](#troubleshooting)

---

## Overview

NornicDB supports multiple replication modes to meet different availability and consistency requirements:

| Mode             | Nodes | Consistency  | Use Case                               |
| ---------------- | ----- | ------------ | -------------------------------------- |
| **Standalone**   | 1     | N/A          | Development, testing, small workloads  |
| **Hot Standby**  | 2     | Eventual     | Simple HA, fast failover               |
| **Raft Cluster** | 3-5   | Strong       | Production HA, consistent reads        |
| **Multi-Region** | 6+    | Configurable | Global distribution, disaster recovery |

### Architecture Diagram

```
MODE 1: HOT STANDBY (2 nodes)

┌─────────────┐     WAL Stream      ┌─────────────┐
│   Primary   │ ──────────────────► │   Standby   │
│  (writes)   │    (async/sync)     │ (failover)  │
└─────────────┘                     └─────────────┘

MODE 2: RAFT CLUSTER (3-5 nodes)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Leader    │◄──►│  Follower   │◄──►│  Follower   │
│  (writes)   │    │   (reads)   │    │   (reads)   │
└─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │
       └──────────────────┴──────────────────┘
                   Raft Consensus

MODE 3: MULTI-REGION (Raft clusters + cross-region HA)

┌─────────────────────────┐      ┌─────────────────────────┐
│     US-EAST REGION      │      │     EU-WEST REGION      │
│   ┌───┐ ┌───┐ ┌───┐     │ WAL  │   ┌───┐ ┌───┐ ┌───┐     │
│   │ L │ │ F │ │ F │     │◄────►│   │ L │ │ F │ │ F │     │
│   └───┘ └───┘ └───┘     │async │   └───┘ └───┘ └───┘     │
│     Raft Cluster A      │      │     Raft Cluster B      │
└─────────────────────────┘      └─────────────────────────┘
```

---

## Choosing a Replication Mode

### Hot Standby (2 nodes)

**Choose this when:**

- You need simple, fast failover
- You have exactly 2 nodes available
- Eventual consistency is acceptable
- You want minimal operational complexity

**Trade-offs:**

- ✅ Simple setup and operation
- ✅ Fast failover (~1-5 seconds)
- ✅ Low resource overhead
- ⚠️ Only 2 nodes (no quorum)
- ⚠️ Risk of data loss on async failover

### Raft Cluster (3-5 nodes)

**Choose this when:**

- You need strong consistency guarantees
- You want automatic leader election
- You can deploy 3+ nodes
- Data integrity is critical

**Trade-offs:**

- ✅ Strong consistency (linearizable)
- ✅ Automatic leader election
- ✅ No data loss on failover
- ⚠️ Higher latency (quorum writes)
- ⚠️ Requires odd number of nodes

### Multi-Region (6+ nodes)

**Choose this when:**

- You need geographic distribution
- You want disaster recovery across regions
- You have users in multiple geographic areas
- You can tolerate cross-region latency

**Trade-offs:**

- ✅ Geographic redundancy
- ✅ Local read performance
- ✅ Disaster recovery
- ⚠️ Complex setup
- ⚠️ Cross-region latency for writes
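Whichever mode you pick, it helps to confirm what a running node actually reports before building on it. The following is a minimal sketch, assuming `curl` and `jq` are installed and using the `/health` response fields shown in [Monitoring & Health Checks](#monitoring-health-checks); adjust the port list to match the ports you publish.

```bash
#!/usr/bin/env bash
# Print the replication mode and role each node reports via its HTTP API.
set -euo pipefail

for port in 7474 7475 7476; do
  echo "--- node on port ${port} ---"
  # .replication.mode / .role / .is_leader come from the example /health
  # response later in this guide.
  curl -sf "http://localhost:${port}/health" \
    | jq -r '.replication | "mode=\(.mode) node=\(.node_id) role=\(.role) leader=\(.is_leader)"' \
    || echo "unreachable"
done
```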
---

## Mode 1: Hot Standby (2 Nodes)

Hot Standby provides simple high availability with one primary node handling all writes and one standby node receiving replicated data.

### Configuration Overview

| Variable                            | Primary        | Standby        | Description            |
| ----------------------------------- | -------------- | -------------- | ---------------------- |
| `NORNICDB_CLUSTER_MODE`             | `ha_standby`   | `ha_standby`   | Replication mode       |
| `NORNICDB_CLUSTER_NODE_ID`          | `primary-1`    | `standby-1`    | Unique node identifier |
| `NORNICDB_CLUSTER_HA_ROLE`          | `primary`      | `standby`      | Node role              |
| `NORNICDB_CLUSTER_HA_PEER_ADDR`     | `standby:7688` | `primary:7688` | Peer address           |
| `NORNICDB_CLUSTER_HA_AUTO_FAILOVER` | `true`         | `true`         | Enable auto-failover   |

### Docker Compose Setup

Create a `docker-compose.ha.yml`:

```yaml
version: "3.8"

services:
  nornicdb-primary:
    image: nornicdb:latest
    container_name: nornicdb-primary
    hostname: primary
    ports:
      - "7474:7474" # HTTP API
      - "7687:7687" # Bolt protocol
    volumes:
      - primary-data:/data
    environment:
      # Cluster Configuration
      NORNICDB_CLUSTER_MODE: ha_standby
      NORNICDB_CLUSTER_NODE_ID: primary-1
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      # HA Standby Configuration
      NORNICDB_CLUSTER_HA_ROLE: primary
      NORNICDB_CLUSTER_HA_PEER_ADDR: standby:7688
      NORNICDB_CLUSTER_HA_SYNC_MODE: semi_sync
      NORNICDB_CLUSTER_HA_HEARTBEAT_MS: 1000
      NORNICDB_CLUSTER_HA_FAILOVER_TIMEOUT: 30s
      NORNICDB_CLUSTER_HA_AUTO_FAILOVER: "true"
    networks:
      - nornicdb-cluster
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7474/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  nornicdb-standby:
    image: nornicdb:latest
    container_name: nornicdb-standby
    hostname: standby
    ports:
      - "7475:7474" # HTTP API (different port)
      - "7688:7687" # Bolt protocol (different port)
    volumes:
      - standby-data:/data
    environment:
      # Cluster Configuration
      NORNICDB_CLUSTER_MODE: ha_standby
      NORNICDB_CLUSTER_NODE_ID: standby-1
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      # HA Standby Configuration
      NORNICDB_CLUSTER_HA_ROLE: standby
      NORNICDB_CLUSTER_HA_PEER_ADDR: primary:7688
      NORNICDB_CLUSTER_HA_SYNC_MODE: semi_sync
      NORNICDB_CLUSTER_HA_HEARTBEAT_MS: 1000
      NORNICDB_CLUSTER_HA_FAILOVER_TIMEOUT: 30s
      NORNICDB_CLUSTER_HA_AUTO_FAILOVER: "true"
    networks:
      - nornicdb-cluster
    depends_on:
      - nornicdb-primary
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7474/health"]
      interval: 10s
      timeout: 5s
      retries: 3

volumes:
  primary-data:
  standby-data:

networks:
  nornicdb-cluster:
    driver: bridge
```

### Starting the Cluster

```bash
# Start the HA cluster
docker compose -f docker-compose.ha.yml up -d

# Check status
docker compose -f docker-compose.ha.yml ps

# View logs
docker compose -f docker-compose.ha.yml logs -f
```

### Sync Modes

| Mode        | Description                 | Latency | Data Safety       |
| ----------- | --------------------------- | ------- | ----------------- |
| `async`     | Acknowledge immediately     | Lowest  | Risk of data loss |
| `semi_sync` | Wait for standby to receive | Medium  | Minimal data loss |
| `sync`      | Wait for standby to persist | Highest | No data loss      |

### Manual Failover

```bash
# Trigger manual failover (run on standby)
curl -X POST http://localhost:7475/admin/promote

# Or via environment variable on restart
NORNICDB_CLUSTER_HA_ROLE=primary
```
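Before relying on auto-failover, it is worth rehearsing the manual path end to end in a test environment. Below is a rough drill script for the `docker-compose.ha.yml` setup above; it assumes the HA `role` values reported by `/health` read `primary`/`standby`, which is an assumption based on the Raft example response later in this guide.

```bash
#!/usr/bin/env bash
# Manual failover drill for the docker-compose.ha.yml setup above.
set -euo pipefail

role() { curl -sf "http://localhost:$1/health" | jq -r '.replication.role'; }

echo "before: primary=$(role 7474) standby=$(role 7475)"

# 1. Stop the primary gracefully
docker stop nornicdb-primary

# 2. Promote the standby (documented admin endpoint)
curl -X POST http://localhost:7475/admin/promote

# 3. Give the promotion a moment to take effect, then verify
sleep 5
echo "after: promoted node reports role=$(role 7475)"

# 4. Bring the old primary back; it should rejoin as standby
docker start nornicdb-primary
```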
---

## Mode 2: Raft Cluster (3+ Nodes)

Raft provides strong consistency with automatic leader election. All writes go through the elected leader and are replicated to followers before acknowledgment.

### Configuration Overview

| Variable                          | Node 1    | Node 2    | Node 3    | Description       |
| --------------------------------- | --------- | --------- | --------- | ----------------- |
| `NORNICDB_CLUSTER_MODE`           | `raft`    | `raft`    | `raft`    | Replication mode  |
| `NORNICDB_CLUSTER_NODE_ID`        | `node-1`  | `node-2`  | `node-3`  | Unique node ID    |
| `NORNICDB_CLUSTER_RAFT_BOOTSTRAP` | `true`    | `false`   | `false`   | Bootstrap cluster |
| `NORNICDB_CLUSTER_RAFT_PEERS`     | See below | See below | See below | Peer addresses    |

### Docker Compose Setup

Create a `docker-compose.raft.yml`:

```yaml
version: "3.8"

services:
  nornicdb-node1:
    image: nornicdb:latest
    container_name: nornicdb-node1
    hostname: node1
    ports:
      - "7474:7474" # HTTP API
      - "7687:7687" # Bolt protocol
    volumes:
      - node1-data:/data
    environment:
      # Cluster Configuration
      NORNICDB_CLUSTER_MODE: raft
      NORNICDB_CLUSTER_NODE_ID: node-1
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: node1:7688
      # Raft Configuration
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: my-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "true"
      NORNICDB_CLUSTER_RAFT_PEERS: "node-2:node2:7688,node-3:node3:7688"
      NORNICDB_CLUSTER_RAFT_ELECTION_TIMEOUT: 1s
      NORNICDB_CLUSTER_RAFT_HEARTBEAT_TIMEOUT: 100ms
      NORNICDB_CLUSTER_RAFT_SNAPSHOT_INTERVAL: 300
      NORNICDB_CLUSTER_RAFT_SNAPSHOT_THRESHOLD: 10000
    networks:
      - nornicdb-cluster
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7474/health"]
      interval: 10s
      timeout: 5s
      retries: 3

  nornicdb-node2:
    image: nornicdb:latest
    container_name: nornicdb-node2
    hostname: node2
    ports:
      - "7475:7474"
      - "7688:7687"
    volumes:
      - node2-data:/data
    environment:
      NORNICDB_CLUSTER_MODE: raft
      NORNICDB_CLUSTER_NODE_ID: node-2
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: node2:7688
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: my-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "false"
      NORNICDB_CLUSTER_RAFT_PEERS: "node-1:node1:7688,node-3:node3:7688"
      NORNICDB_CLUSTER_RAFT_ELECTION_TIMEOUT: 1s
      NORNICDB_CLUSTER_RAFT_HEARTBEAT_TIMEOUT: 100ms
    networks:
      - nornicdb-cluster
    depends_on:
      - nornicdb-node1

  nornicdb-node3:
    image: nornicdb:latest
    container_name: nornicdb-node3
    hostname: node3
    ports:
      - "7476:7474"
      - "7689:7687"
    volumes:
      - node3-data:/data
    environment:
      NORNICDB_CLUSTER_MODE: raft
      NORNICDB_CLUSTER_NODE_ID: node-3
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: node3:7688
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: my-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "false"
      NORNICDB_CLUSTER_RAFT_PEERS: "node-1:node1:7688,node-2:node2:7688"
      NORNICDB_CLUSTER_RAFT_ELECTION_TIMEOUT: 1s
      NORNICDB_CLUSTER_RAFT_HEARTBEAT_TIMEOUT: 100ms
    networks:
      - nornicdb-cluster
    depends_on:
      - nornicdb-node1

volumes:
  node1-data:
  node2-data:
  node3-data:

networks:
  nornicdb-cluster:
    driver: bridge
```

### Starting the Cluster

```bash
# Start the Raft cluster
docker compose -f docker-compose.raft.yml up -d

# Wait for leader election (typically 1-5 seconds)
sleep 5

# Check cluster status
curl http://localhost:7474/admin/cluster/status
```

### Raft Tuning Parameters

| Parameter                 | Default | Description                   |
| ------------------------- | ------- | ----------------------------- |
| `RAFT_ELECTION_TIMEOUT`   | `1s`    | Time before starting election |
| `RAFT_HEARTBEAT_TIMEOUT`  | `100ms` | Leader heartbeat interval     |
| `RAFT_SNAPSHOT_INTERVAL`  | `300`   | Seconds between snapshots     |
| `RAFT_SNAPSHOT_THRESHOLD` | `10000` | Log entries before snapshot   |
| `RAFT_TRAILING_LOGS`      | `10000` | Logs to retain after snapshot |
| `RAFT_MAX_APPEND_ENTRIES` | `64`    | Max entries per AppendEntries |
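When sizing a Raft cluster, the useful number is the write quorum: a majority of voters must acknowledge each write, and the cluster tolerates losing only the remaining minority. The snippet below is not a NornicDB tool, just standard majority-quorum arithmetic to illustrate why 3 or 5 nodes are recommended.

```bash
#!/usr/bin/env bash
# Majority-quorum arithmetic for Raft cluster sizing.
# quorum = floor(n/2) + 1; tolerated failures = n - quorum
for n in 1 2 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "nodes=${n}  write quorum=${quorum}  tolerated failures=${tolerated}"
done
# Note that 4 nodes tolerate no more failures than 3, and 6 no more than 5,
# which is why odd cluster sizes are recommended.
```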
### Adding/Removing Nodes

```bash
# Add a new voter node (run on leader)
curl -X POST http://localhost:7474/admin/cluster/add \
  -H "Content-Type: application/json" \
  -d '{"node_id": "node-4", "address": "node4:7688"}'

# Remove a node (run on leader)
curl -X POST http://localhost:7474/admin/cluster/remove \
  -H "Content-Type: application/json" \
  -d '{"node_id": "node-4"}'
```

---

## Mode 3: Multi-Region

Multi-Region deployment combines Raft clusters within each region with asynchronous replication between regions.

### Configuration Overview

| Variable                          | US-East                       | EU-West                       | Description       |
| --------------------------------- | ----------------------------- | ----------------------------- | ----------------- |
| `NORNICDB_CLUSTER_MODE`           | `multi_region`                | `multi_region`                | Replication mode  |
| `NORNICDB_CLUSTER_REGION_ID`      | `us-east`                     | `eu-west`                     | Region identifier |
| `NORNICDB_CLUSTER_REMOTE_REGIONS` | `eu-west:eu-coordinator:7688` | `us-east:us-coordinator:7688` | Remote regions    |

### Docker Compose Setup (US-East Region)

Create `docker-compose.us-east.yml`:

```yaml
version: "3.8"

services:
  # US-East Raft Cluster (3 nodes)
  us-east-node1:
    image: nornicdb:latest
    container_name: us-east-node1
    hostname: us-east-node1
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - us-east-node1-data:/data
    environment:
      # Multi-Region Configuration
      NORNICDB_CLUSTER_MODE: multi_region
      NORNICDB_CLUSTER_NODE_ID: us-east-node-1
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: us-east-node1:7688
      # Region Configuration
      NORNICDB_CLUSTER_REGION_ID: us-east
      NORNICDB_CLUSTER_REMOTE_REGIONS: "eu-west:eu-west-node1:7688"
      NORNICDB_CLUSTER_CROSS_REGION_MODE: async
      NORNICDB_CLUSTER_CONFLICT_STRATEGY: last_write_wins
      # Local Raft Configuration
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: us-east-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "true"
      NORNICDB_CLUSTER_RAFT_PEERS: "us-east-node-2:us-east-node2:7688,us-east-node-3:us-east-node3:7688"
    networks:
      - nornicdb-global

  us-east-node2:
    image: nornicdb:latest
    container_name: us-east-node2
    hostname: us-east-node2
    ports:
      - "7475:7474"
      - "7688:7687"
    volumes:
      - us-east-node2-data:/data
    environment:
      NORNICDB_CLUSTER_MODE: multi_region
      NORNICDB_CLUSTER_NODE_ID: us-east-node-2
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: us-east-node2:7688
      NORNICDB_CLUSTER_REGION_ID: us-east
      NORNICDB_CLUSTER_REMOTE_REGIONS: "eu-west:eu-west-node1:7688"
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: us-east-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "false"
      NORNICDB_CLUSTER_RAFT_PEERS: "us-east-node-1:us-east-node1:7688,us-east-node-3:us-east-node3:7688"
    networks:
      - nornicdb-global
    depends_on:
      - us-east-node1

  us-east-node3:
    image: nornicdb:latest
    container_name: us-east-node3
    hostname: us-east-node3
    ports:
      - "7476:7474"
      - "7689:7687"
    volumes:
      - us-east-node3-data:/data
    environment:
      NORNICDB_CLUSTER_MODE: multi_region
      NORNICDB_CLUSTER_NODE_ID: us-east-node-3
      NORNICDB_CLUSTER_BIND_ADDR: 0.0.0.0:7688
      NORNICDB_CLUSTER_ADVERTISE_ADDR: us-east-node3:7688
      NORNICDB_CLUSTER_REGION_ID: us-east
      NORNICDB_CLUSTER_REMOTE_REGIONS: "eu-west:eu-west-node1:7688"
      NORNICDB_CLUSTER_RAFT_CLUSTER_ID: us-east-cluster
      NORNICDB_CLUSTER_RAFT_BOOTSTRAP: "false"
      NORNICDB_CLUSTER_RAFT_PEERS: "us-east-node-1:us-east-node1:7688,us-east-node-2:us-east-node2:7688"
    networks:
      - nornicdb-global
    depends_on:
      - us-east-node1

volumes:
  us-east-node1-data:
  us-east-node2-data:
  us-east-node3-data:

networks:
  nornicdb-global:
    driver: bridge
```
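The EU-West region uses a mirror-image compose file, with the region ID and remote-region address swapped as shown in the Configuration Overview table above. Once both regional stacks are up, a quick sanity check that each region has elected a local leader can look like the sketch below; it uses the cluster status endpoint documented later, and the hostnames are assumed to match the compose files.

```bash
#!/usr/bin/env bash
# Verify that each region's local Raft cluster has a leader.
set -euo pipefail

for region_host in us-east-node1 eu-west-node1; do
  leader=$(curl -sf "http://${region_host}:7474/admin/cluster/status" \
    | jq -r '.leader.id // "none"' || echo "unreachable")
  echo "${region_host}: leader=${leader}"
done
```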
### Cross-Region Replication Modes

| Mode        | Description                    | Latency | Consistency |
| ----------- | ------------------------------ | ------- | ----------- |
| `async`     | Fire-and-forget replication    | Lowest  | Eventual    |
| `semi_sync` | Wait for remote acknowledgment | Higher  | Stronger    |

### Conflict Resolution Strategies

| Strategy          | Description               | Use Case          |
| ----------------- | ------------------------- | ----------------- |
| `last_write_wins` | Latest timestamp wins     | Most applications |
| `manual`          | Require manual resolution | Financial data    |

---

## Client Connection

### Connecting to Hot Standby

```javascript
// Node.js with neo4j-driver
const neo4j = require("neo4j-driver");

// Connect to primary for writes
const primaryDriver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("admin", "admin")
);

// Connect to standby for reads (optional)
const standbyDriver = neo4j.driver(
  "bolt://localhost:7688",
  neo4j.auth.basic("admin", "admin")
);

// For high availability, use routing driver
const haDriver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("admin", "admin"),
  {
    resolver: (address) => [
      "localhost:7687", // Primary
      "localhost:7688", // Standby
    ],
  }
);
```

### Connecting to Raft Cluster

```javascript
// Node.js with neo4j-driver
const neo4j = require("neo4j-driver");

// Use routing driver for automatic leader discovery
const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("admin", "admin"),
  {
    resolver: (address) => [
      "localhost:7687", // Node 1
      "localhost:7688", // Node 2
      "localhost:7689", // Node 3
    ],
  }
);

// Writes automatically route to leader
const session = driver.session({ defaultAccessMode: neo4j.session.WRITE });
await session.run("CREATE (n:Person {name: $name})", { name: "Alice" });

// Reads can go to any node
const readSession = driver.session({ defaultAccessMode: neo4j.session.READ });
const result = await readSession.run("MATCH (n:Person) RETURN n");
```

### Connecting to Multi-Region

```javascript
// Connect to nearest region
const usEastDriver = neo4j.driver(
  "bolt://us-east-node1:7687",
  neo4j.auth.basic("admin", "admin"),
  {
    resolver: (address) => [
      "us-east-node1:7687",
      "us-east-node2:7687",
      "us-east-node3:7687",
    ],
  }
);

// For cross-region failover
const globalDriver = neo4j.driver(
  "bolt://us-east-node1:7687",
  neo4j.auth.basic("admin", "admin"),
  {
    resolver: (address) => [
      // Primary region
      "us-east-node1:7687",
      "us-east-node2:7687",
      "us-east-node3:7687",
      // Failover region
      "eu-west-node1:7687",
      "eu-west-node2:7687",
      "eu-west-node3:7687",
    ],
  }
);
```

### Python Connection

```python
from neo4j import GraphDatabase

# Hot Standby / Raft
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("admin", "admin"),
    resolver=lambda _: [
        ("localhost", 7687),
        ("localhost", 7688),
        ("localhost", 7689)
    ]
)

with driver.session() as session:
    result = session.run("MATCH (n) RETURN count(n)")
    print(result.single()[0])
```
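If a client or tool cannot use a resolver, you can discover the current leader out of band and point it there directly. The sketch below uses the cluster status endpoint documented in the next section; the mapping from the leader's cluster address to a Bolt port is an assumption based on the compose files above, so adjust it if you remap ports.

```bash
#!/usr/bin/env bash
# Discover the current Raft leader and print a Bolt URI to connect to.
set -euo pipefail

status=$(curl -sf http://localhost:7474/admin/cluster/status)
leader_host=$(echo "$status" | jq -r '.leader.address' | cut -d: -f1)

# Assumption: each node publishes Bolt on port 7687 of its host.
echo "Current leader: $(echo "$status" | jq -r '.leader.id')"
echo "Bolt URI:       bolt://${leader_host}:7687"
```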
"node-1", "address": "node1:7688", "state": "leader", "healthy": true}, {"id": "node-2", "address": "node2:7688", "state": "follower", "healthy": true}, {"id": "node-3", "address": "node3:7688", "state": "follower", "healthy": true} ], "commit_index": 12345, "term": 5 } ``` ### Prometheus Metrics ```bash # Scrape metrics curl http://localhost:7474/metrics # Key replication metrics nornicdb_replication_mode{mode="raft"} 1 nornicdb_replication_is_leader 1 nornicdb_replication_commit_index 12345 nornicdb_replication_applied_index 12345 nornicdb_replication_lag_seconds 0.001 nornicdb_replication_peer_healthy{peer="node-2"} 1 nornicdb_replication_peer_healthy{peer="node-3"} 1 ``` ### Docker Health Checks ```yaml healthcheck: test: ["CMD", "curl", "-f", "http://localhost:7474/health"] interval: 10s timeout: 5s retries: 3 start_period: 30s ``` --- ## Failover Operations ### Hot Standby Failover #### Automatic Failover (Default) Automatic failover is enabled by default when `NORNICDB_CLUSTER_HA_AUTO_FAILOVER=true`. When the standby detects that the primary has been unresponsive for `FAILOVER_TIMEOUT`, it automatically: 1. Attempts to fence the old primary (prevent split-brain) 2. Promotes itself to primary 3. Starts accepting writes #### Manual Failover ```bash # 1. Stop the primary gracefully (if possible) docker stop nornicdb-primary # 2. Promote the standby curl -X POST http://localhost:7475/admin/promote # 3. Update clients to point to new primary # (or use routing driver for automatic discovery) # 4. Optionally, restart old primary as new standby docker start nornicdb-primary # It will automatically detect it's no longer primary ``` ### Raft Failover Raft handles failover automatically through leader election: 1. When the leader fails, followers detect missing heartbeats 2. After `ELECTION_TIMEOUT`, a follower starts an election 3. The node with the most up-to-date log wins the election 4. Clients automatically discover the new leader ```bash # Check current leader curl http://localhost:7474/admin/cluster/status | jq '.leader' # Force leader election (if needed) curl -X POST http://localhost:7474/admin/cluster/transfer-leadership ``` ### Multi-Region Failover #### Automatic Regional Failover Within each region, Raft handles failover automatically. #### Cross-Region Failover ```bash # Promote a region to primary curl -X POST http://eu-west-node1:7474/admin/region/promote # This will: # 1. Ensure this region's Raft cluster has a leader # 2. Mark this region as the primary write region # 3. Notify other regions of the change ``` --- ## Troubleshooting ### Common Issues #### 1. Cluster Won't Form **Symptoms:** Nodes can't see each other, no leader elected **Solutions:** ```bash # Check network connectivity docker exec nornicdb-node1 ping node2 # Verify bind/advertise addresses docker exec nornicdb-node1 env | grep NORNICDB_CLUSTER # Check logs for errors docker logs nornicdb-node1 2>&1 | grep -i error ``` #### 2. Split-Brain **Symptoms:** Multiple nodes think they're the leader **Solutions:** ```bash # Check all nodes' status for port in 7474 7475 7476; do echo "Node on port $port:" curl -s http://localhost:$port/health | jq '.replication' done # Force re-election (stop the incorrect leader) docker restart nornicdb-node1 ``` #### 3. 
#### 3. Replication Lag

**Symptoms:** Standby/followers are behind

**Solutions:**

```bash
# Check lag metrics
curl http://localhost:7474/metrics | grep lag

# Increase batch size for faster catch-up
NORNICDB_CLUSTER_HA_WAL_BATCH_SIZE=5000

# Check network throughput
docker exec nornicdb-primary iperf3 -c standby
```

#### 4. Failover Not Triggering

**Symptoms:** Primary is down but standby hasn't promoted

**Solutions:**

```bash
# Check auto-failover is enabled
docker exec nornicdb-standby env | grep AUTO_FAILOVER

# Check failover timeout
docker exec nornicdb-standby env | grep FAILOVER_TIMEOUT

# Check standby logs
docker logs nornicdb-standby 2>&1 | grep -i failover

# Manual promotion
curl -X POST http://localhost:7475/admin/promote
```

### Log Levels

```bash
# Enable debug logging
NORNICDB_LOG_LEVEL=debug

# Enable replication-specific debug
NORNICDB_CLUSTER_DEBUG=true
```

### Recovery Procedures

#### Recovering from Complete Cluster Loss

```bash
# 1. Start one node with bootstrap
NORNICDB_CLUSTER_RAFT_BOOTSTRAP=true docker compose up -d nornicdb-node1

# 2. Wait for it to become leader
sleep 10

# 3. Add remaining nodes
docker compose up -d nornicdb-node2 nornicdb-node3

# 4. Verify cluster health
curl http://localhost:7474/admin/cluster/status
```

#### Restoring from Backup

```bash
# 1. Stop all nodes
docker compose down

# 2. Restore data to primary/bootstrap node
tar -xzf backup.tar.gz -C /path/to/primary-data

# 3. Clear other nodes' data (they'll sync from primary)
rm -rf /path/to/node2-data/*
rm -rf /path/to/node3-data/*

# 4. Start cluster
docker compose up -d
```
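After either recovery procedure, verify the cluster before sending traffic back to it. A post-recovery check along these lines is a sketch built on the documented cluster status endpoint:

```bash
#!/usr/bin/env bash
# Post-recovery check: a leader exists and every member reports healthy.
set -euo pipefail

status=$(curl -sf http://localhost:7474/admin/cluster/status)

leader=$(echo "$status" | jq -r '.leader.id // empty')
unhealthy=$(echo "$status" | jq '[.members[] | select(.healthy == false)] | length')

if [ -z "$leader" ]; then
  echo "FAIL: no leader elected" >&2
  exit 1
fi
if [ "$unhealthy" -gt 0 ]; then
  echo "FAIL: ${unhealthy} member(s) unhealthy" >&2
  exit 1
fi
echo "OK: leader=${leader}, all members healthy"
```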
---

## Configuration Reference

### Environment Variables

| Variable                          | Default              | Description                         |
| --------------------------------- | -------------------- | ----------------------------------- |
| `NORNICDB_CLUSTER_MODE`           | `standalone`         | Replication mode                    |
| `NORNICDB_CLUSTER_NODE_ID`        | auto-generated       | Unique node identifier              |
| `NORNICDB_CLUSTER_BIND_ADDR`      | `0.0.0.0:7688`       | Address to bind for cluster traffic |
| `NORNICDB_CLUSTER_ADVERTISE_ADDR` | same as bind         | Address advertised to peers         |
| `NORNICDB_CLUSTER_DATA_DIR`       | `./data/replication` | Directory for replication state     |

#### Hot Standby

| Variable                               | Default     | Description                             |
| -------------------------------------- | ----------- | --------------------------------------- |
| `NORNICDB_CLUSTER_HA_ROLE`             | -           | `primary` or `standby` (required)       |
| `NORNICDB_CLUSTER_HA_PEER_ADDR`        | -           | Address of peer node (required)         |
| `NORNICDB_CLUSTER_HA_SYNC_MODE`        | `semi_sync` | Sync mode: `async`, `semi_sync`, `sync` |
| `NORNICDB_CLUSTER_HA_HEARTBEAT_MS`     | `1000`      | Heartbeat interval in ms                |
| `NORNICDB_CLUSTER_HA_FAILOVER_TIMEOUT` | `30s`       | Time before failover                    |
| `NORNICDB_CLUSTER_HA_AUTO_FAILOVER`    | `true`      | Enable automatic failover               |

#### Raft

| Variable                                   | Default    | Description               |
| ------------------------------------------ | ---------- | ------------------------- |
| `NORNICDB_CLUSTER_RAFT_CLUSTER_ID`         | `nornicdb` | Cluster identifier        |
| `NORNICDB_CLUSTER_RAFT_BOOTSTRAP`          | `false`    | Bootstrap new cluster     |
| `NORNICDB_CLUSTER_RAFT_PEERS`              | -          | Comma-separated peers     |
| `NORNICDB_CLUSTER_RAFT_ELECTION_TIMEOUT`   | `1s`       | Election timeout          |
| `NORNICDB_CLUSTER_RAFT_HEARTBEAT_TIMEOUT`  | `100ms`    | Heartbeat timeout         |
| `NORNICDB_CLUSTER_RAFT_SNAPSHOT_INTERVAL`  | `300`      | Seconds between snapshots |
| `NORNICDB_CLUSTER_RAFT_SNAPSHOT_THRESHOLD` | `10000`    | Entries before snapshot   |

#### Multi-Region

| Variable                             | Default           | Description                  |
| ------------------------------------ | ----------------- | ---------------------------- |
| `NORNICDB_CLUSTER_REGION_ID`         | -                 | Region identifier (required) |
| `NORNICDB_CLUSTER_REMOTE_REGIONS`    | -                 | Remote region addresses      |
| `NORNICDB_CLUSTER_CROSS_REGION_MODE` | `async`           | Cross-region sync mode       |
| `NORNICDB_CLUSTER_CONFLICT_STRATEGY` | `last_write_wins` | Conflict resolution          |

---

## Best Practices

### Network Configuration

1. **Use dedicated network for cluster traffic** - Separate user traffic from replication
2. **Low latency between nodes** - < 1ms for Raft, < 100ms for HA standby
3. **Reliable network** - Packet loss causes leader elections

### Security

1. **Enable TLS** for cluster communication (coming soon)
2. **Use firewalls** to restrict cluster port access
3. **Separate credentials** for admin operations

### Capacity Planning

| Mode         | Min Nodes | Recommended | Max Practical |
| ------------ | --------- | ----------- | ------------- |
| Hot Standby  | 2         | 2           | 2             |
| Raft         | 3         | 3-5         | 7             |
| Multi-Region | 6         | 9+          | No limit      |

### Monitoring

1. **Alert on** leader changes, replication lag, peer health
2. **Dashboard** replication metrics alongside application metrics
3. **Test failover** regularly in non-production environments

---

## Technical Deep Dive

This section explains the internal architecture of NornicDB clustering for users who want to understand how the system works or need to troubleshoot advanced issues.

### Network Architecture

NornicDB uses two separate network protocols:

```
┌─────────────────────────────────────────────────────────────┐
│                        NornicDB Node                        │
├─────────────────────────────┬───────────────────────────────┤
│       Client Traffic        │        Cluster Traffic        │
│  ┌─────────────────────┐    │   ┌─────────────────────────┐ │
│  │    Bolt Protocol    │    │   │    Cluster Protocol     │ │
│  │     (Port 7687)     │    │   │      (Port 7688)        │ │
│  ├─────────────────────┤    │   ├─────────────────────────┤ │
│  │ • Cypher queries    │    │   │ • Raft consensus        │ │
│  │ • Result streaming  │    │   │   - RequestVote RPC     │ │
│  │ • Transactions      │    │   │   - AppendEntries RPC   │ │
│  │ • Neo4j compatible  │    │   │ • WAL streaming (HA)    │ │
│  └─────────────────────┘    │   │ • Heartbeats            │ │
│                             │   │ • Failover coordination │ │
│                             │   └─────────────────────────┘ │
└─────────────────────────────┴───────────────────────────────┘
```

**Port 7687 (Bolt Protocol):**

- Standard Neo4j Bolt protocol for client connections
- Used by all Neo4j drivers (Python, Java, Go, etc.)
- Handles Cypher query execution and result streaming
- In a cluster, followers forward write queries to the leader via Bolt

**Port 7688 (Cluster Protocol):**

- Binary protocol optimized for low-latency cluster communication
- Length-prefixed JSON messages over TCP
- Handles all cluster coordination and replication
- Should be firewalled from external access
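Because the cluster port carries replication traffic with no client authentication in front of it, it is worth confirming from an untrusted network position that only the client-facing ports are reachable. A quick check, assuming `nc` (netcat) is available on the machine you test from; the hostname is a placeholder.

```bash
#!/usr/bin/env bash
# From a host that should NOT have cluster access, confirm which ports answer.
HOST=node1.example.com   # hypothetical hostname; substitute your own

for port in 7474 7687 7688; do
  if nc -z -w 2 "$HOST" "$port" 2>/dev/null; then
    state="open"
  else
    state="closed/filtered"
  fi
  echo "port ${port}: ${state}"
done
# Expected from outside the cluster network: 7474 and 7687 open, 7688 closed.
```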
### Message Flow Diagrams

#### Hot Standby Write Flow

```
Client                  Primary                  Standby
  │                        │                        │
  │── WRITE (Bolt) ───────►│                        │
  │                        │── WALBatch ───────────►│
  │                        │                        │
  │                        │◄─ WALBatchResponse ────│
  │                        │                        │
  │◄── SUCCESS ────────────│                        │
  │                        │                        │
```

#### Raft Consensus Write Flow

```
Client                  Leader              Follower 1          Follower 2
  │                        │                    │                    │
  │── WRITE (Bolt) ───────►│                    │                    │
  │                        │── AppendEntries ──►│                    │
  │                        │── AppendEntries ───┼───────────────────►│
  │                        │                    │                    │
  │                        │◄── Success ────────│                    │
  │                        │◄── Success ────────┼────────────────────│
  │                        │                    │                    │
  │                        │  (quorum reached)  │                    │
  │◄── SUCCESS ────────────│                    │                    │
  │                        │                    │                    │
```

#### Raft Leader Election

```
Follower A               Follower B               Follower C
  │                        │                        │
  │ (leader timeout)       │                        │
  │                        │                        │
┌─┴───────────────────┐    │                        │
│ Become CANDIDATE    │    │                        │
│ Increment term      │    │                        │
│ Vote for self       │    │                        │
└─┬───────────────────┘    │                        │
  │                        │                        │
  │── RequestVote ────────►│                        │
  │── RequestVote ─────────┼───────────────────────►│
  │                        │                        │
  │◄── VoteGranted ────────│                        │
  │◄── VoteGranted ────────┼────────────────────────│
  │                        │                        │
┌─┴───────────────────┐    │                        │
│ Become LEADER       │    │                        │
│ (majority votes)    │    │                        │
└─┬───────────────────┘    │                        │
  │                        │                        │
  │── AppendEntries ──────►│ (heartbeat)            │
  │── AppendEntries ───────┼───────────────────────►│
  │                        │                        │
```

### Wire Protocol

The cluster protocol uses a simple length-prefixed JSON format:

```
┌─────────────┬─────────────────────────────────────┐
│ Length (4B) │            JSON Payload             │
│ Big Endian  │                                     │
└─────────────┴─────────────────────────────────────┘
```

**Message Types:**

| Type                  | Code | Direction            | Description              |
| --------------------- | ---- | -------------------- | ------------------------ |
| VoteRequest           | 1    | Candidate → Follower | Request vote in election |
| VoteResponse          | 2    | Follower → Candidate | Grant/deny vote          |
| AppendEntries         | 3    | Leader → Follower    | Replicate log entries    |
| AppendEntriesResponse | 4    | Follower → Leader    | Acknowledge entries      |
| WALBatch              | 5    | Primary → Standby    | Stream WAL entries (HA)  |
| WALBatchResponse      | 6    | Standby → Primary    | Acknowledge WAL          |
| Heartbeat             | 7    | Primary → Standby    | Health check             |
| HeartbeatResponse     | 8    | Standby → Primary    | Health status            |
| Fence                 | 9    | Standby → Primary    | Fence old primary        |
| FenceResponse         | 10   | Primary → Standby    | Acknowledge fence        |
| Promote               | 11   | Admin → Standby      | Promote to primary       |
| PromoteResponse       | 12   | Standby → Admin      | Promotion status         |

### Configuration Parameter Details

#### Election Timeout (`NORNICDB_CLUSTER_RAFT_ELECTION_TIMEOUT`)

Time a follower waits without hearing from the leader before starting an election.

```
Recommended values:
- Same datacenter: 150ms - 500ms
- Same region:     500ms - 2s
- Cross-region:    2s - 10s (higher than network RTT)

Formula: election_timeout > 2 × heartbeat_timeout + max_network_RTT
```

**Too low:** Frequent unnecessary elections, cluster instability
**Too high:** Slow failure detection, longer downtime during failover
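To put the formula to work, measure the round-trip time between your nodes and plug it in. The calculator below is a rough sketch that assumes `ping` average RTT (iputils output format) is a fair proxy for `max_network_RTT`.

```bash
#!/usr/bin/env bash
# Suggest a minimum election timeout from the formula above:
#   election_timeout > 2 x heartbeat_timeout + max_network_RTT
set -euo pipefail

PEER=${1:-node2}          # peer hostname to measure against
HEARTBEAT_MS=${2:-100}    # planned heartbeat timeout in ms

# Average RTT in ms from 5 pings (field 5 of the iputils rtt summary line).
rtt_ms=$(ping -c 5 -q "$PEER" | awk -F'/' 'END { printf "%.0f", $5 }')

min_election_ms=$(( 2 * HEARTBEAT_MS + rtt_ms ))
echo "measured RTT to ${PEER}: ${rtt_ms} ms"
echo "election timeout should exceed: ${min_election_ms} ms (add headroom on top)"
```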
#### Heartbeat Timeout (`NORNICDB_CLUSTER_RAFT_HEARTBEAT_TIMEOUT`)

How often the leader sends heartbeats to maintain authority.

```
Recommended values:
- Same datacenter: 50ms - 100ms
- Same region:     100ms - 500ms
- Cross-region:    500ms - 2s

Formula: heartbeat_timeout < election_timeout / 2
```

#### WAL Batch Size (`NORNICDB_CLUSTER_HA_WAL_BATCH_SIZE`)

Number of WAL entries sent per batch during replication.

```
Trade-offs:
- Higher = better throughput, higher latency per write
- Lower  = lower latency, more network overhead

Recommended: 100-1000 for HA, 64 for Raft (via RAFT_MAX_APPEND_ENTRIES)
```

#### Sync Mode (`NORNICDB_CLUSTER_HA_SYNC_MODE`)

| Mode        | Acknowledgment    | Data Safety       | Latency |
| ----------- | ----------------- | ----------------- | ------- |
| `async`     | Primary only      | Risk of data loss | Lowest  |
| `semi_sync` | Standby received  | Minimal data loss | Medium  |
| `sync`      | Standby persisted | No data loss      | Highest |

### Consistency Levels

For queries that support it, NornicDB supports tunable consistency:

| Level          | Meaning                  | Use Case                            |
| -------------- | ------------------------ | ----------------------------------- |
| `ONE`          | Read from any node       | Fastest reads, eventual consistency |
| `LOCAL_ONE`    | Read from local region   | Geographic locality                 |
| `QUORUM`       | Majority must agree      | Strong consistency                  |
| `LOCAL_QUORUM` | Majority in local region | Regional consistency                |
| `ALL`          | All nodes must agree     | Maximum consistency, slowest        |

### Connection Pool Configuration

For applications connecting to clusters, configure your driver's connection pool:

```python
# Python example
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://node1:7687",
    auth=("admin", "admin"),
    max_connection_pool_size=50,
    connection_acquisition_timeout=60,
    max_transaction_retry_time=30,
    # For routing drivers (automatic failover):
    # "neo4j://node1:7687"  # Use neo4j:// scheme
)
```

```go
// Go example
driver, _ := neo4j.NewDriver(
    "bolt://node1:7687",
    neo4j.BasicAuth("admin", "admin", ""),
    func(config *neo4j.Config) {
        config.MaxConnectionPoolSize = 50
        config.ConnectionAcquisitionTimeout = time.Minute
        config.MaxTransactionRetryTime = 30 * time.Second
    },
)
```

### TLS Configuration

**Important:** TLS for cluster communication is strongly recommended in production.

| Variable                             | Description                           |
| ------------------------------------ | ------------------------------------- |
| `NORNICDB_CLUSTER_TLS_ENABLED`       | Enable TLS (`true`/`false`)           |
| `NORNICDB_CLUSTER_TLS_CERT_FILE`     | Path to server certificate (PEM)      |
| `NORNICDB_CLUSTER_TLS_KEY_FILE`      | Path to private key (PEM)             |
| `NORNICDB_CLUSTER_TLS_CA_FILE`       | Path to CA certificate for mTLS       |
| `NORNICDB_CLUSTER_TLS_VERIFY_CLIENT` | Require client certs (`true`/`false`) |
| `NORNICDB_CLUSTER_TLS_MIN_VERSION`   | Minimum TLS version (`1.2` or `1.3`)  |

Example with mTLS:

```yaml
environment:
  NORNICDB_CLUSTER_TLS_ENABLED: "true"
  NORNICDB_CLUSTER_TLS_CERT_FILE: /certs/server.crt
  NORNICDB_CLUSTER_TLS_KEY_FILE: /certs/server.key
  NORNICDB_CLUSTER_TLS_CA_FILE: /certs/ca.crt
  NORNICDB_CLUSTER_TLS_VERIFY_CLIENT: "true"
  NORNICDB_CLUSTER_TLS_MIN_VERSION: "1.3"
volumes:
  - ./certs:/certs:ro
```
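One way to produce the `/certs` material referenced above for a test environment is a throwaway CA generated with `openssl`. This is a sketch only: for production, use your organization's PKI, and make sure each node's certificate carries the hostname its peers dial (e.g. `node1`) in the SAN.

```bash
#!/usr/bin/env bash
# Generate a throwaway CA and a node certificate for testing mTLS.
set -euo pipefail
mkdir -p certs && cd certs

# 1. Self-signed CA
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout ca.key -out ca.crt -subj "/CN=nornicdb-test-ca"

# 2. Node key + CSR
openssl req -newkey rsa:4096 -nodes \
  -keyout server.key -out server.csr -subj "/CN=node1"

# 3. Sign the node certificate with the CA, adding the SAN peers will verify
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 365 -extfile <(printf "subjectAltName=DNS:node1") -out server.crt
```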
### Metrics Reference

Key metrics to monitor:

| Metric                                  | Description        | Alert Threshold  |
| --------------------------------------- | ------------------ | ---------------- |
| `nornicdb_replication_lag_seconds`      | Time behind leader | > 5s             |
| `nornicdb_cluster_leader_changes_total` | Leader elections   | > 1/hour         |
| `nornicdb_raft_term`                    | Current Raft term  | Sudden increases |
| `nornicdb_wal_position`                 | WAL write position | Divergence       |
| `nornicdb_peer_healthy`                 | Peer health status | 0 (unhealthy)    |
| `nornicdb_replication_bytes_total`      | Data replicated    | Sudden drops     |

### Debugging Commands

```bash
# Check cluster status (all modes)
curl http://localhost:7474/admin/cluster/status | jq

# Check Raft state
curl http://localhost:7474/admin/raft/state | jq

# Check replication lag
curl http://localhost:7474/admin/replication/lag

# Force step-down (leader only)
curl -X POST http://localhost:7474/admin/raft/step-down

# Transfer leadership to specific node
curl -X POST http://localhost:7474/admin/raft/transfer-leadership \
  -H "Content-Type: application/json" \
  -d '{"target": "node-2"}'

# Check WAL position
curl http://localhost:7474/admin/wal/position

# Snapshot now (Raft)
curl -X POST http://localhost:7474/admin/raft/snapshot
```

---

## Next Steps

- [Cluster Security](../operations/cluster-security.md) - Authentication for clusters
- [Replication Architecture](../architecture/replication.md) - Internal architecture details
- [Clustering Roadmap](../architecture/clustering-roadmap.md) - Future sharding plans
- [Complete Examples](./complete-examples.md) - End-to-end usage examples
- [System Design](../architecture/system-design.md) - Overall system architecture
