M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

All k-means operations now have structured logging with the `[K-MEANS]` prefix:

| Log line | Meaning |
|----------|---------|
| `[K-MEANS] ✅ ENABLED` | Clustering is enabled; reports mode, cluster count, and initialization |
| `[K-MEANS] 🔄 STARTING` | A clustering run is starting; reports the embedding count |
| `[K-MEANS] ✅ COMPLETE` | The run finished; reports cluster stats and duration |
| `[K-MEANS] ⏭️  SKIPPED` | The run was skipped (too few embeddings, or clustering is not enabled) |
| `[K-MEANS] ❌ FAILED` | The run failed; reports the error |
| `[K-MEANS] 🔍 SEARCH` | A search was executed; reports the search mode and timing |
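The exact logger NornicDB uses is not shown here. As a minimal sketch, assuming plain `key=value` fields after a ` | ` separator (the shape of the sample lines in the test flow below), such a prefixed line can be built with the standard library alone. `formatKMeans` is a hypothetical helper, not part of NornicDB.

```go
package main

import (
	"fmt"
	"log"
	"strings"
	"time"
)

// formatKMeans builds a line such as
// "[K-MEANS] ✅ COMPLETE | clusters=100 iterations=12".
// kv holds alternating keys and values; a trailing key without a value is ignored.
// Hypothetical helper: NornicDB's real logging code may look different.
func formatKMeans(event string, kv ...any) string {
	var b strings.Builder
	b.WriteString("[K-MEANS] ")
	b.WriteString(event)
	for i := 0; i+1 < len(kv); i += 2 {
		if i == 0 {
			b.WriteString(" |")
		}
		fmt.Fprintf(&b, " %v=%v", kv[i], kv[i+1])
	}
	return b.String()
}

func main() {
	log.Println(formatKMeans("🔄 STARTING", "embeddings", 5000))
	log.Println(formatKMeans("✅ COMPLETE",
		"clusters", 100, "embeddings", 5000,
		"iterations", 12, "duration", 234*time.Millisecond))
}
```

Passing fields as an ordered key/value list (rather than a map) keeps them in the order the caller wrote, since Go map iteration order is randomized.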

## Test Data Generator Tool

New tool at `cmd/kmeans-test-data/main.go`:

```bash
# Generate 5000 embeddings with 20 natural clusters and save to file
go run cmd/kmeans-test-data/main.go -mode clusters -count 5000 -clusters 20

# Import directly into NornicDB
go run cmd/kmeans-test-data/main.go -mode clusters -count 5000 -db ./data/kmeans-test

# Load a larger predefined dataset for stress testing
go run cmd/kmeans-test-data/main.go -mode download -download large-text -db ./data/stress-test
```

**Modes:**
- `synthetic` - Uniformly distributed random embeddings
- `clusters` - Embeddings with natural cluster structure (best for k-means testing)
- `download` - Predefined datasets (sift-small, glove-25, text-1024, large-text)
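The README does not say how `clusters` mode builds its data. One common approach, sketched here under that assumption, is to draw random centroids and scatter points around them with Gaussian noise; seeding the random source makes the output reproducible, and each point's centroid index doubles as its ground-truth label. `generateClustered`, `Dataset`, and the `spread` parameter are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Dataset holds generated vectors plus the ground-truth cluster of each one.
type Dataset struct {
	Vectors [][]float32
	Labels  []int
}

// generateClustered draws k random centroids in [-1, 1]^dim, then places each
// of count points near one centroid with Gaussian noise of stddev spread.
// A fixed seed makes the output fully reproducible. (Hypothetical sketch.)
func generateClustered(count, k, dim int, spread float64, seed int64) Dataset {
	rng := rand.New(rand.NewSource(seed))

	centroids := make([][]float64, k)
	for c := range centroids {
		centroids[c] = make([]float64, dim)
		for d := range centroids[c] {
			centroids[c][d] = rng.Float64()*2 - 1
		}
	}

	ds := Dataset{Vectors: make([][]float32, count), Labels: make([]int, count)}
	for i := 0; i < count; i++ {
		c := i % k // round-robin assignment keeps cluster sizes balanced
		v := make([]float32, dim)
		for d := range v {
			v[d] = float32(centroids[c][d] + rng.NormFloat64()*spread)
		}
		ds.Vectors[i] = v
		ds.Labels[i] = c
	}
	return ds
}

func main() {
	ds := generateClustered(5000, 20, 8, 0.05, 42)
	fmt.Println(len(ds.Vectors), ds.Labels[:5])
}
```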

**Features:**
- Deterministic (the same seed always produces the same data)
- Ground-truth cluster labels for validation
- Statistics reporting (cluster sizes, vector norms, memory usage)
- Direct import to NornicDB or JSON export

## Full Test Flow

```bash
# 1. Generate test data with clusters
go run cmd/kmeans-test-data/main.go -mode clusters -count 5000 -clusters 50 -db ./data/kmeans-test

# 2. Enable clustering and start NornicDB
export NORNICDB_KMEANS_CLUSTERING_ENABLED=true
go run cmd/nornicdb/main.go -data ./data/kmeans-test

# 3. Watch the logs for k-means activity
#    (clusters= shows NornicDB's own cluster count, not the generator's -clusters value)
# [K-MEANS] ✅ Clustering ENABLED | mode=CPU clusters=100 ...
# [K-MEANS] 🔄 STARTING | embeddings=5000
# [K-MEANS] ✅ COMPLETE | clusters=100 embeddings=5000 iterations=12 duration=234ms
```
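The `COMPLETE` line reports an `iterations` count. The clustering algorithm and initialization NornicDB uses are not documented here, so the following is only a sketch of plain Lloyd's k-means with random (Forgy) initialization, to show what one iteration means: an assignment pass followed by a centroid update, repeated until no assignment changes or `maxIter` is reached. `kmeans` and `sqDist` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sqDist returns the squared Euclidean distance between two equal-length vectors.
func sqDist(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// kmeans runs Lloyd's algorithm and returns the centroids, the cluster index
// of every point, and the number of iterations performed.
// It requires 1 <= k <= len(points).
func kmeans(points [][]float32, k, maxIter int, seed int64) ([][]float32, []int, int) {
	if k < 1 || k > len(points) {
		panic("kmeans: need 1 <= k <= len(points)")
	}
	rng := rand.New(rand.NewSource(seed))
	dim := len(points[0])

	// Forgy initialization: k distinct data points chosen at random.
	centroids := make([][]float32, k)
	for c, idx := range rng.Perm(len(points))[:k] {
		centroids[c] = append([]float32(nil), points[idx]...)
	}

	assign := make([]int, len(points))
	for i := range assign {
		assign[i] = -1 // no cluster yet
	}

	iter := 0
	for iter < maxIter {
		iter++

		// Assignment step: move each point to its nearest centroid.
		changed := false
		for i, p := range points {
			best, bestD := 0, sqDist(p, centroids[0])
			for c := 1; c < k; c++ {
				if d := sqDist(p, centroids[c]); d < bestD {
					best, bestD = c, d
				}
			}
			if assign[i] != best {
				assign[i] = best
				changed = true
			}
		}
		if !changed {
			break // converged
		}

		// Update step: each centroid becomes the mean of its assigned points.
		sums := make([][]float32, k)
		counts := make([]int, k)
		for c := range sums {
			sums[c] = make([]float32, dim)
		}
		for i, p := range points {
			c := assign[i]
			counts[c]++
			for d, x := range p {
				sums[c][d] += x
			}
		}
		for c := range centroids {
			if counts[c] == 0 {
				continue // empty cluster: keep its previous centroid
			}
			for d := range centroids[c] {
				centroids[c][d] = sums[c][d] / float32(counts[c])
			}
		}
	}
	return centroids, assign, iter
}

func main() {
	pts := [][]float32{{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}}
	_, labels, iters := kmeans(pts, 2, 100, 1)
	fmt.Println("labels:", labels, "iterations:", iters)
}
```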

## K-Means Clustering Integration

### How to Enable
```bash
export NORNICDB_KMEANS_CLUSTERING_ENABLED=true
```

### What Happens

| Stage | Action |
|-------|--------|
| **Startup** | If the flag is enabled, clustering is initialized in CPU mode |
| **GPU available** | Clustering is upgraded to GPU acceleration |
| **Index build** | Clustering is triggered once the indexes are built |
| **Embed queue empty** | Clustering is triggered automatically after a batch of embeddings completes |
| **Search** | Cluster-accelerated search is used when clustering is active (10-50x faster than brute-force search) |

### Smart Behavior
- **Minimum threshold**: Clustering runs only when there are at least 1,000 embeddings; below that, brute-force search is faster.
- **Fire-and-forget**: Clustering runs in the background and does not block the embedding worker.
- **Auto-upgrade**: Clustering starts on the CPU and switches to the GPU if one becomes available later.
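The first two behaviors can be sketched with an atomic flag and a goroutine. This assumes only one background run is allowed at a time and reuses the 1,000-embedding threshold quoted above; `clusterTrigger`, `TriggerAsync`, and the `run` callback are hypothetical, and the CPU-to-GPU upgrade is not modelled. `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// minEmbeddingsForClustering mirrors the README's 1,000-embedding threshold.
const minEmbeddingsForClustering = 1000

// clusterTrigger is a hypothetical helper that starts at most one
// background clustering run at a time.
type clusterTrigger struct {
	running atomic.Bool
	wg      sync.WaitGroup
	run     func(n int) // the clustering job itself, supplied by the caller
}

// TriggerAsync returns immediately. It starts a background run and reports
// true, or reports false when there are too few embeddings or a run is
// already in progress.
func (t *clusterTrigger) TriggerAsync(n int) bool {
	if n < minEmbeddingsForClustering {
		return false // below the threshold, brute-force search is used instead
	}
	if !t.running.CompareAndSwap(false, true) {
		return false // another run is still active
	}
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		defer t.running.Store(false) // runs before Done (deferred calls are LIFO)
		t.run(n)
	}()
	return true
}

func main() {
	t := &clusterTrigger{run: func(n int) { fmt.Println("clustering", n, "embeddings") }}
	fmt.Println("started:", t.TriggerAsync(5000))
	t.wg.Wait() // only so this demo waits for the background output before exiting
}
```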

### Files Modified
- `pkg/search/search.go` - Added clustering methods and cluster-accelerated search
- `pkg/nornicdb/embed_queue.go` - Added `onQueueEmpty` callback
- `pkg/nornicdb/db.go` - Wires the components together behind the feature flag
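The README only states that `embed_queue.go` gained an `onQueueEmpty` callback, and that clustering is triggered automatically once a batch of embeddings completes. The simplified queue below is hypothetical and shows one way such a hook can fire after the queue drains; the callback is invoked outside the lock so a slow callback cannot stall callers that are enqueuing new work.

```go
package main

import (
	"fmt"
	"sync"
)

// embedQueue is a hypothetical, simplified stand-in for the queue in
// pkg/nornicdb/embed_queue.go. Only the onQueueEmpty hook is modelled.
type embedQueue struct {
	mu           sync.Mutex
	pending      []string
	onQueueEmpty func() // invoked after a batch drains the queue
}

func (q *embedQueue) Enqueue(id string) {
	q.mu.Lock()
	q.pending = append(q.pending, id)
	q.mu.Unlock()
}

// ProcessAll embeds every pending item, then calls onQueueEmpty once if at
// least one item was processed. The lock is released before each embed call
// and before the callback.
func (q *embedQueue) ProcessAll(embed func(id string)) {
	processed := 0
	for {
		q.mu.Lock()
		if len(q.pending) == 0 {
			q.mu.Unlock()
			break
		}
		id := q.pending[0]
		q.pending = q.pending[1:]
		q.mu.Unlock()

		embed(id)
		processed++
	}
	if processed > 0 && q.onQueueEmpty != nil {
		q.onQueueEmpty()
	}
}

func main() {
	q := &embedQueue{onQueueEmpty: func() { fmt.Println("queue empty: trigger clustering") }}
	q.Enqueue("node-1")
	q.ProcessAll(func(id string) { fmt.Println("embedding", id) })
}
```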
