M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository


KMEANS_TESTING.md•3.64 KiB

# K-Means Clustering Testing Guide

Quick reference for testing k-means clustering with NornicDB.

## Prerequisites

```bash
cd nornicdb
```

## 1. Generate Test Data

### Option A: Movie Dataset (Best for Semantic Testing)

```bash
# Generate 2000 movies with genre-specific content (will cluster by genre)
go run cmd/kmeans-test-data/main.go -mode movies -count 2000 -db ./data/movies-test
```

### Option B: Pre-clustered Embeddings (Best for K-Means Validation)

```bash
# Generate 5000 embeddings with 50 known clusters (ground truth)
go run cmd/kmeans-test-data/main.go -mode clusters -count 5000 -clusters 50 -db ./data/cluster-test
```

### Option C: Large Dataset (Stress Testing)

```bash
# Generate 10000 embeddings in 100 clusters
go run cmd/kmeans-test-data/main.go -mode clusters -count 10000 -clusters 100 -db ./data/stress-test
```

## 2. Start NornicDB with K-Means Enabled

```bash
# Enable k-means clustering
export NORNICDB_GPU_CLUSTERING_ENABLED=true

# For movie data (needs an embedder to generate embeddings)
export OLLAMA_BASE_URL=http://localhost:11434
go run cmd/nornicdb/main.go -data ./data/movies-test

# For pre-clustered data (embeddings are already present)
go run cmd/nornicdb/main.go -data ./data/cluster-test
```

## 3. Watch the Logs

You should see:

```
🔬 K-means clustering enabled for accelerated semantic search
✅ Search indexes built from existing data
[K-MEANS] ✅ Clustering ENABLED | mode=CPU clusters=100 max_iter=50 init=kmeans++
[K-MEANS] 🔄 STARTING | embeddings=5000
[K-MEANS] ✅ COMPLETE | clusters=100 embeddings=5000 iterations=12 duration=234ms
```

For movie data with an embedder:

```
🧠 Embed worker started
🔄 Processing node movie-00001 for embedding...
[K-MEANS] 🔬 Embedding batch complete (2000 processed), triggering k-means clustering...
```

## 4. Test Search

### Via HTTP API

```bash
# Semantic search (uses the cluster-accelerated path if available)
curl -X POST http://localhost:7474/nornicdb/search \
  -H "Content-Type: application/json" \
  -d '{"query": "space exploration aliens", "limit": 10}'

# Should see in logs:
# [K-MEANS] 🔍 SEARCH | mode=clustered clusters_searched=3 candidates=20 duration=1.2ms
```

### Via Cypher

```cypher
// Full-text search
CALL db.index.fulltext.queryNodes('default', 'horror scary')
YIELD node, score
RETURN node.title, node.genre, score
LIMIT 10

// Vector similarity search (if embeddings exist)
CALL db.index.vector.queryNodes('default', 10, 'romantic love story')
YIELD node, score
RETURN node.title, score
```

## 5. Verify Clustering is Working

Check these log messages:

| Log | Meaning |
|-----|---------|
| `mode=clustered` | ✅ Using k-means accelerated search |
| `mode=brute_force` | ❌ Falling back to brute force |
| `mode=brute_force_fallback` | ⚠️ Cluster search failed, using fallback |
| `reason=not_yet_clustered` | ⏳ Clustering hasn't run yet |
| `reason=too_few_embeddings` | Need 1000+ embeddings |

## 6. Minimum Requirements

- **1000+ embeddings** are required for k-means to trigger.
- With fewer embeddings, brute-force search is faster anyway.

## Quick Command Reference

```bash
# Generate + import movies
go run cmd/kmeans-test-data/main.go -mode movies -count 2000 -db ./data/test

# Generate + import clustered embeddings
go run cmd/kmeans-test-data/main.go -mode clusters -count 5000 -db ./data/test

# Just save to JSON (no import)
go run cmd/kmeans-test-data/main.go -mode movies -count 2000 -output ./data/export

# Run NornicDB with k-means
NORNICDB_GPU_CLUSTERING_ENABLED=true go run cmd/nornicdb/main.go -data ./data/test

# Test search
curl -X POST localhost:7474/nornicdb/search \
  -H "Content-Type: application/json" \
  -d '{"query":"test","limit":10}'
```

## Cleanup

```bash
rm -rf ./data/movies-test ./data/cluster-test ./data/stress-test
```

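For step 5, you can tally a whole run by its `mode=` and `reason=` values instead of inspecting individual log lines. The sketch below assumes the server output was saved to a file, for example `go run cmd/nornicdb/main.go -data ./data/test > nornicdb.log 2>&1`. It also assumes the keys appear exactly as written in the step 5 table. The program name `tally.go` and the functions `tallyLine` and `tallyKeys` are hypothetical. The sketch counts every lowercase `mode=`/`reason=` pair on any line, so the uppercase `mode=CPU` from the startup banner is skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"sort"
)

// keyPattern matches lowercase mode=/reason= values such as those in the
// step 5 table (mode=clustered, reason=too_few_embeddings, ...).
var keyPattern = regexp.MustCompile(`\b(mode|reason)=([a-z_]+)`)

// tallyLine adds every mode=/reason= pair found in one log line to counts.
func tallyLine(line string, counts map[string]int) {
	for _, m := range keyPattern.FindAllStringSubmatch(line, -1) {
		counts[m[1]+"="+m[2]]++
	}
}

// tallyKeys reads log lines from r and returns the accumulated counts.
func tallyKeys(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		tallyLine(scanner.Text(), counts)
	}
	return counts, scanner.Err()
}

func main() {
	counts, err := tallyKeys(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable, readable output order
	for _, k := range keys {
		fmt.Printf("%-32s %d\n", k, counts[k])
	}
}
```

Usage: `go run tally.go < nornicdb.log`. A healthy run on 1000+ embeddings should be dominated by `mode=clustered`.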

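The log fields in steps 4 to 6 (`mode=clustered`, `clusters_searched`, `candidates`, and the 1000-embedding threshold) match the common cluster-pruned search pattern. The query is compared against the k-means centroids first. Only the members of the nearest few clusters are then scanned. Search falls back to brute force when there are too few embeddings or clustering has not run yet. The Go sketch below illustrates that general technique and covers only the search half, assuming centroids and assignments already exist. It is not NornicDB's implementation. `index`, `Search`, `nProbe`, and `minEmbeddings` are hypothetical names; only the 1000 threshold comes from this guide.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// minEmbeddings mirrors the guide's "1000+ embeddings" requirement.
const minEmbeddings = 1000

// index is a hypothetical container for embeddings plus a k-means result.
type index struct {
	vectors   [][]float64
	centroids [][]float64 // empty until clustering has run
	members   [][]int     // members[c] = ids of vectors assigned to centroid c
}

// cosine returns the cosine similarity of a and b (0 for zero vectors).
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK sorts ids in place by similarity to q and keeps the best k.
func (ix *index) topK(q []float64, ids []int, k int) []int {
	sort.Slice(ids, func(i, j int) bool {
		return cosine(q, ix.vectors[ids[i]]) > cosine(q, ix.vectors[ids[j]])
	})
	if len(ids) > k {
		ids = ids[:k]
	}
	return ids
}

// Search returns up to k ids plus the mode used, echoing the log's mode= field.
func (ix *index) Search(q []float64, k, nProbe int) ([]int, string) {
	if len(ix.vectors) < minEmbeddings || len(ix.centroids) == 0 {
		all := make([]int, len(ix.vectors))
		for i := range all {
			all[i] = i
		}
		return ix.topK(q, all, k), "brute_force"
	}
	// Rank centroids by similarity and probe only the nProbe closest clusters.
	order := make([]int, len(ix.centroids))
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(i, j int) bool {
		return cosine(q, ix.centroids[order[i]]) > cosine(q, ix.centroids[order[j]])
	})
	if nProbe > len(order) {
		nProbe = len(order)
	}
	var candidates []int // fresh slice, so members is never reordered
	for _, c := range order[:nProbe] {
		candidates = append(candidates, ix.members[c]...)
	}
	return ix.topK(q, candidates, k), "clustered"
}

func main() {
	// Toy data: 1200 2-D points split between two well-separated groups.
	ix := &index{centroids: [][]float64{{1, 0}, {0, 1}}, members: [][]int{nil, nil}}
	for i := 0; i < 1200; i++ {
		jitter := float64(i%10) * 0.01
		if i%2 == 0 {
			ix.vectors = append(ix.vectors, []float64{1, jitter})
			ix.members[0] = append(ix.members[0], i)
		} else {
			ix.vectors = append(ix.vectors, []float64{jitter, 1})
			ix.members[1] = append(ix.members[1], i)
		}
	}
	ids, mode := ix.Search([]float64{1, 0}, 3, 1)
	fmt.Println(mode, ids)
}
```

In this sketch `nProbe` plays the role of `clusters_searched`. Raising it scans more candidates, which improves recall at the cost of speed. Below the threshold, the brute-force branch is taken, consistent with step 6.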