basic-memory

BENCHMARKS.md•6.6 KiB

# Performance Benchmarks

This directory contains performance benchmark tests for Basic Memory search indexing and retrieval.

## Purpose

These benchmarks measure baseline performance to track improvements from optimizations. They are particularly important for:

- Local semantic search throughput and query latency
- Large repositories (100s to 1000s of files)
- Validating optimization efforts before and after ranking/indexing changes

## Running Benchmarks

### Run all benchmarks (excluding slow ones)

```bash
pytest test-int/test_search_performance_benchmark.py -v -m "benchmark and not slow"
```

### Run specific benchmark

```bash
# Cold indexing throughput (300 notes)
pytest test-int/test_search_performance_benchmark.py::test_benchmark_search_index_cold_start_300_notes -v

# Query latency for fts/vector/hybrid
pytest test-int/test_search_performance_benchmark.py::test_benchmark_search_query_latency_by_mode -v

# Retrieval quality (hit@1, recall@5, mrr@10) for lexical/paraphrase suites
pytest test-int/test_search_performance_benchmark.py::test_benchmark_search_quality_recall_by_mode -v

# Incremental re-index (80 changed notes out of 800)
pytest test-int/test_search_performance_benchmark.py::test_benchmark_search_incremental_reindex_80_of_800_notes -v -m slow
```

### Run all benchmarks including slow ones

```bash
pytest test-int/test_search_performance_benchmark.py -v -m benchmark
```

### Write JSON benchmark artifacts

```bash
BASIC_MEMORY_BENCHMARK_OUTPUT=.benchmarks/search-benchmarks.jsonl \
pytest test-int/test_search_performance_benchmark.py -v -m benchmark
```

### Compare two benchmark runs

```bash
uv run python test-int/compare_search_benchmarks.py \
  .benchmarks/search-baseline.jsonl \
  .benchmarks/search-candidate.jsonl \
  --show-missing

# via just
just benchmark-compare .benchmarks/search-baseline.jsonl .benchmarks/search-candidate.jsonl table --show-missing
```

Optional filters:

```bash
uv run python test-int/compare_search_benchmarks.py \
  .benchmarks/search-baseline.jsonl \
  .benchmarks/search-candidate.jsonl \
  --benchmarks "cold index (300 notes),query latency (hybrid)"
```

Markdown output for PR comments:

```bash
uv run python test-int/compare_search_benchmarks.py \
  .benchmarks/search-baseline.jsonl \
  .benchmarks/search-candidate.jsonl \
  --format markdown
```

### Skip benchmarks in regular test runs

```bash
pytest -m "not benchmark"
```

### Optional guardrails (recommended for nightly runs only)

```bash
BASIC_MEMORY_BENCH_MIN_COLD_NOTES_PER_SEC=80 \
BASIC_MEMORY_BENCH_MIN_INCREMENTAL_NOTES_PER_SEC=60 \
BASIC_MEMORY_BENCH_MAX_FTS_P95_MS=30 \
BASIC_MEMORY_BENCH_MAX_VECTOR_P95_MS=45 \
BASIC_MEMORY_BENCH_MAX_HYBRID_P95_MS=60 \
pytest test-int/test_search_performance_benchmark.py -v -m benchmark
```

Guardrails are opt-in. When threshold environment variables are not set, tests only report metrics.

## Benchmark Output

Each benchmark provides detailed metrics including:

- **Performance Metrics**:
  - Total indexing/re-index time
  - Notes processed per second
  - Query latency percentiles (p50/p95/p99)
  - Retrieval quality metrics (hit@1, recall@5, mrr@10)
- **Database Metrics**:
  - Final SQLite database size for the benchmark run
- **Operation Counts**:
  - Notes indexed
  - Notes re-indexed
  - Queries executed per retrieval mode
- **Optional JSON Artifacts**:
  - One JSON object per benchmark test run when `BASIC_MEMORY_BENCHMARK_OUTPUT` is set
  - Includes benchmark name, UTC timestamp, and metric values

## Example Output

```
BENCHMARK: cold index (300 notes)
notes indexed: 300
elapsed (s): 11.4820
notes/sec: 26.13
sqlite size (MB): 4.83

BENCHMARK: query latency (hybrid)
queries executed: 32
avg latency (ms): 3.40
p50 latency (ms): 2.94
p95 latency (ms): 5.88
p99 latency (ms): 6.21
```

## Interpreting Results

### Good Performance Indicators

- **notes/sec stays stable across runs**: indexing path changes are not regressing
- **p95 query latency stays stable**: retrieval changes are not regressing tail latency
- **recall@5 and mrr@10 stay stable or improve**: relevance quality is not regressing
- **sqlite size growth stays proportional to note volume**: vector/index growth remains predictable

### Areas for Improvement

- **indexing throughput drops significantly**: inspect per-note indexing and vector chunking
- **p95/p99 latency spikes**: inspect fusion and vector candidate scans
- **quality metrics drop**: inspect ranking fusion and chunking strategy
- **db size growth is disproportionate**: inspect chunk sizing and duplicated indexed text

## Tracking Improvements

Before making optimizations:

1. Run benchmarks to establish baseline
2. Optionally set `BASIC_MEMORY_BENCHMARK_OUTPUT` to capture machine-readable metrics
3. Save output for comparison
4. Note any particular pain points (e.g., slow search indexing)

After optimizations:

1. Run the same benchmarks
2. Compare metrics:
   - Notes/sec should increase for indexing and incremental re-index
   - p95/p99 query latency should decrease or remain stable
   - SQLite size should remain proportional to note volume
3. Optionally run with guardrail env vars in nightly CI to catch regressions
4. Document improvements in PR

## Guardrail Environment Variables

- `BASIC_MEMORY_BENCH_MIN_COLD_NOTES_PER_SEC`
- `BASIC_MEMORY_BENCH_MAX_COLD_SQLITE_SIZE_MB`
- `BASIC_MEMORY_BENCH_MIN_INCREMENTAL_NOTES_PER_SEC`
- `BASIC_MEMORY_BENCH_MAX_INCREMENTAL_SQLITE_SIZE_MB`
- `BASIC_MEMORY_BENCH_MAX_FTS_P95_MS`
- `BASIC_MEMORY_BENCH_MAX_FTS_P99_MS`
- `BASIC_MEMORY_BENCH_MAX_VECTOR_P95_MS`
- `BASIC_MEMORY_BENCH_MAX_VECTOR_P99_MS`
- `BASIC_MEMORY_BENCH_MAX_HYBRID_P95_MS`
- `BASIC_MEMORY_BENCH_MAX_HYBRID_P99_MS`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_FTS_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_FTS_MRR_AT_10`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_VECTOR_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_VECTOR_MRR_AT_10`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_HYBRID_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_LEXICAL_HYBRID_MRR_AT_10`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_FTS_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_FTS_MRR_AT_10`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_VECTOR_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_VECTOR_MRR_AT_10`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_HYBRID_RECALL_AT_5`
- `BASIC_MEMORY_BENCH_MIN_PARAPHRASE_HYBRID_MRR_AT_10`

## Related Issues

- [#351: Performance: Optimize sync/indexing for cloud deployments](https://github.com/basicmachines-co/basic-memory/issues/351)

## Test File Generation

Benchmarks generate realistic markdown notes with:

- YAML frontmatter with tags
- Multiple markdown sections per note
- Repeated domain-specific terms for retrieval-mode comparisons
- Sufficient content length to exercise chunk-based semantic indexing
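
As a rough illustration of the note shape described under Test File Generation, the sketch below builds one synthetic note with YAML frontmatter, several sections, and a repeated topic term. It is not the generator used by the benchmark suite: the function name `build_benchmark_note`, the `DOMAIN_TERMS` vocabulary, and the title format are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical vocabulary; the terms used by the real benchmarks are not listed here.
DOMAIN_TERMS = ["authentication", "caching", "indexing", "migration", "scheduling"]


def build_benchmark_note(note_id: int, sections: int = 4, seed: int = 0) -> str:
    """Return one synthetic markdown note with YAML frontmatter and several sections."""
    # Seed per note so repeated runs produce identical content.
    rng = random.Random(seed + note_id)
    topic = DOMAIN_TERMS[note_id % len(DOMAIN_TERMS)]
    tags = sorted({topic, rng.choice(DOMAIN_TERMS), "benchmark"})

    # YAML frontmatter with a title and a tag list.
    lines = ["---", f"title: Benchmark Note {note_id}", "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# Benchmark Note {note_id}", ""]

    for section in range(1, sections + 1):
        lines.append(f"## Section {section}: {topic}")
        lines.append("")
        # Repeat the topic term in every sentence so each retrieval mode has
        # a consistent signal, and use several sentences to add body length.
        sentences = [
            f"This paragraph discusses {topic} together with {rng.choice(DOMAIN_TERMS)}."
            for _ in range(6)
        ]
        lines.append(" ".join(sentences))
        lines.append("")

    return "\n".join(lines)
```

Calling `print(build_benchmark_note(1))` shows a single note. Seeding the random generator per note keeps the corpus identical between a baseline and a candidate run, which is one way to keep throughput and quality numbers comparable.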
