# Performance Benchmarks
This directory contains performance benchmark tests for Basic Memory's sync/indexing operations.
## Purpose
These benchmarks establish a performance baseline so that gains from optimization work can be measured against it. They are particularly important for:
- Cloud deployments with ephemeral databases that need fast re-indexing
- Large repositories (100s to 1000s of files)
- Validating optimization efforts
## Running Benchmarks
### Run all benchmarks (excluding slow ones)
```bash
pytest test-int/test_sync_performance_benchmark.py -v -m "benchmark and not slow"
```
### Run specific benchmark
```bash
# 100 files (fast, ~10-30 seconds)
pytest test-int/test_sync_performance_benchmark.py::test_benchmark_sync_100_files -v
# 500 files (medium, ~1-3 minutes)
pytest test-int/test_sync_performance_benchmark.py::test_benchmark_sync_500_files -v
# 1000 files (slow, ~3-10 minutes)
pytest test-int/test_sync_performance_benchmark.py::test_benchmark_sync_1000_files -v
# Re-sync with no changes (tests scan performance)
pytest test-int/test_sync_performance_benchmark.py::test_benchmark_resync_no_changes -v
```
### Run all benchmarks including slow ones
```bash
pytest test-int/test_sync_performance_benchmark.py -v -m benchmark
```
### Skip benchmarks in regular test runs
```bash
pytest -m "not benchmark"
```
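For the marker expressions above to work without warnings, the `benchmark` and `slow` markers must be registered with pytest. If the project does not already declare them in its pytest configuration, a `conftest.py` hook along these lines would register them (a sketch; where Basic Memory actually registers its markers is an assumption):

```python
# conftest.py -- register the custom markers so pytest recognizes them.
# A sketch: the project may already declare these in pyproject.toml or pytest.ini.

def pytest_configure(config):
    config.addinivalue_line("markers", "benchmark: performance benchmark test")
    config.addinivalue_line("markers", "slow: long-running benchmark (500+ files)")
```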
## Benchmark Output
Each benchmark reports detailed metrics, including:
- **Performance Metrics**:
  - Total sync time
  - Files processed per second
  - Milliseconds per file
- **Database Metrics**:
  - Initial database size
  - Final database size
  - Database growth (total and per file)
- **Operation Counts**:
  - New files indexed
  - Modified files processed
  - Deleted files handled
  - Moved files tracked
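The derived values are straightforward arithmetic over the raw measurements. A minimal sketch (names are illustrative, not the benchmark's actual variables):

```python
# Sketch: deriving the reported metrics from raw measurements.
# Field names are illustrative and not the benchmark code's actual names.

def derive_metrics(total_seconds: float, file_count: int,
                   db_bytes_before: int, db_bytes_after: int) -> dict:
    growth = db_bytes_after - db_bytes_before
    return {
        "files_per_sec": file_count / total_seconds,
        "ms_per_file": (total_seconds * 1000) / file_count,
        "db_growth_bytes": growth,
        "db_growth_per_file_kb": growth / file_count / 1024,
    }

# Matches the example below: 100 files in 12.34s -> 8.1 files/sec, 123.4 ms/file.
```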
## Example Output
```
======================================================================
BENCHMARK: Sync 100 files (small repository)
======================================================================
Generating 100 test files...
Created files 0-100 (100/100)
File generation completed in 0.15s (666.7 files/sec)
Initial database size: 120.00 KB
Starting sync of 100 files...
----------------------------------------------------------------------
RESULTS:
----------------------------------------------------------------------
Files processed: 100
  New: 100
  Modified: 0
  Deleted: 0
  Moved: 0
Performance:
  Total time: 12.34s
  Files/sec: 8.1
  ms/file: 123.4
Database:
  Initial size: 120.00 KB
  Final size: 5.23 MB
  Growth: 5.11 MB
  Growth per file: 52.31 KB
======================================================================
```
## Interpreting Results
### Good Performance Indicators
- **Files/sec > 10**: Good indexing speed for small-medium repos
- **Files/sec > 5**: Acceptable for large repos with complex relations
- **DB growth < 100KB per file**: Reasonable index size
### Areas for Improvement
- **Files/sec < 5**: May benefit from batch operations
- **ms/file > 200**: High latency per file, check for N+1 queries
- **DB growth > 200KB per file**: Search index may be bloated (possibly from trigram tokenization)
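To apply these rules of thumb mechanically, a hypothetical helper could flag a result against the thresholds above (a sketch, not part of the benchmark suite):

```python
# Sketch: flag a benchmark result against the rule-of-thumb thresholds above.
# Hypothetical helper, not part of the actual benchmark suite.

def flag_result(files_per_sec: float, ms_per_file: float,
                growth_per_file_kb: float) -> list[str]:
    flags = []
    if files_per_sec < 5:
        flags.append("slow indexing: may benefit from batch operations")
    if ms_per_file > 200:
        flags.append("high per-file latency: check for N+1 queries")
    if growth_per_file_kb > 200:
        flags.append("bloated search index: check trigram tokenization")
    return flags
```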
## Tracking Improvements
Before making optimizations:
1. Run benchmarks to establish baseline
2. Save output for comparison
3. Note any particular pain points (e.g., slow search indexing)
After optimizations:
1. Run the same benchmarks
2. Compare metrics:
   - Files/sec should increase
   - ms/file should decrease
   - DB growth per file may decrease (with search optimizations)
3. Document the improvements in the PR (see the comparison sketch below)
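For the PR write-up, a small helper can diff a saved baseline against the post-optimization run. A sketch, assuming metric dictionaries shaped like the `derive_metrics` output above:

```python
# Sketch: compare baseline vs. optimized metrics for the PR write-up.
# Assumes dicts shaped like the derive_metrics() output sketched earlier.

def compare(baseline: dict, optimized: dict) -> None:
    for key in ("files_per_sec", "ms_per_file", "db_growth_per_file_kb"):
        before, after = baseline[key], optimized[key]
        change = (after - before) / before * 100
        print(f"{key}: {before:.1f} -> {after:.1f} ({change:+.1f}%)")
```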
## Related Issues
- [#351: Performance: Optimize sync/indexing for cloud deployments](https://github.com/basicmachines-co/basic-memory/issues/351)
## Test File Generation
Benchmarks generate realistic markdown files with:
- YAML frontmatter with tags
- 3-10 observations per file with categories
- 1-3 relations per file (including forward references)
- Varying content to simulate real usage
- Files organized in category subdirectories
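For reference, a generated file might look roughly like this (a sketch: the exact frontmatter fields, observation categories, and relation syntax are assumptions based on the description above, not the generator's actual output):

```python
# Sketch: the approximate shape of a generated benchmark file.
# Frontmatter fields, categories, and relation syntax are assumptions.

from pathlib import Path

def write_test_file(path: Path, index: int, category: str) -> None:
    body = f"""---
title: Test Note {index}
tags: [benchmark, {category}]
---

# Test Note {index}

## Observations
- [idea] Observation {index}.1 about this note
- [fact] Observation {index}.2 with varying content

## Relations
- relates_to [[Test Note {index + 1}]]
"""
    # The relation above points at a file that may not exist yet,
    # simulating the forward references mentioned in the list above.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body)
```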