CodeRAG

storage.md•11.6 KiB

# Storage CodeRAG provides two storage implementations: `PersistentStorage` for SQLite-backed persistence and `MemoryStorage` for in-memory usage. ## PersistentStorage SQLite-based storage using LibSQL and Drizzle ORM. Supports chunk-level indexing with BM25 scoring. ### Constructor ```typescript new PersistentStorage(config?: DbConfig) ``` #### Parameters **config** `DbConfig` (optional) ```typescript interface DbConfig { codebaseRoot?: string // Project root (default: process.cwd()) dbPath?: string // Custom database path (optional) } ``` If `dbPath` is not provided, uses `~/.coderag/projects/<hash>/` based on the codebase root path. #### Example ```typescript import { PersistentStorage } from '@sylphx/coderag' // Automatic path (recommended) const storage = new PersistentStorage({ codebaseRoot: '/path/to/project' }) // Creates: ~/.coderag/projects/a3d5f8b2/index.db // Custom path const storage = new PersistentStorage({ dbPath: '/custom/path/index.db' }) ``` ## Storage Interface Both `PersistentStorage` and `MemoryStorage` implement the `Storage` interface: ```typescript interface Storage { storeFile(file: CodebaseFile): Promise<void> storeFiles?(files: CodebaseFile[]): Promise<void> getFile(path: string): Promise<CodebaseFile | null> getAllFiles(): Promise<CodebaseFile[]> deleteFile(path: string): Promise<void> clear(): Promise<void> count(): Promise<number> getChunkCount?(): Promise<number> exists(path: string): Promise<boolean> } ``` ## File Operations ### storeFile() Store a single file in the index. ```typescript async storeFile(file: CodebaseFile): Promise<void> ``` #### Parameters ```typescript interface CodebaseFile { path: string content: string size: number mtime: number | Date language?: string hash: string } ``` #### Example ```typescript await storage.storeFile({ path: 'src/index.ts', content: 'export function hello() { }', size: 30, mtime: Date.now(), language: 'typescript', hash: 'abc123' }) ``` ### storeFiles() Store multiple files in a batch (more efficient than individual stores). ```typescript async storeFiles(files: CodebaseFile[]): Promise<void> ``` #### Example ```typescript const files: CodebaseFile[] = [ { path: 'src/a.ts', content: '...', /* ... */ }, { path: 'src/b.ts', content: '...', /* ... */ } ] await storage.storeFiles(files) ``` ### getFile() Retrieve a file by path. ```typescript async getFile(path: string): Promise<CodebaseFile | null> ``` #### Example ```typescript const file = await storage.getFile('src/index.ts') if (file) { console.log(file.content) } ``` ### getAllFiles() Get all indexed files. ```typescript async getAllFiles(): Promise<CodebaseFile[]> ``` #### Example ```typescript const files = await storage.getAllFiles() console.log(`Total files: ${files.length}`) ``` ### deleteFile() Delete a file from the index. ```typescript async deleteFile(path: string): Promise<void> ``` #### Example ```typescript await storage.deleteFile('src/removed.ts') ``` ### deleteFiles() Delete multiple files in a batch (PersistentStorage only). ```typescript async deleteFiles(paths: string[]): Promise<void> ``` #### Example ```typescript await storage.deleteFiles(['src/a.ts', 'src/b.ts']) ``` ### clear() Clear all files from the index. ```typescript async clear(): Promise<void> ``` #### Example ```typescript await storage.clear() ``` ### count() Get total number of indexed files. ```typescript async count(): Promise<number> ``` #### Example ```typescript const fileCount = await storage.count() console.log(`Indexed ${fileCount} files`) ``` ### exists() Check if a file exists in the index. ```typescript async exists(path: string): Promise<boolean> ``` #### Example ```typescript if (await storage.exists('src/index.ts')) { console.log('File is indexed') } ``` ## Chunk Operations These methods are specific to `PersistentStorage` and support chunk-level indexing. ### storeChunks() Store chunks for a file (replaces existing chunks). ```typescript async storeChunks(filePath: string, chunks: ChunkData[]): Promise<number[]> ``` #### Parameters ```typescript interface ChunkData { content: string type: string // AST node type (e.g., 'FunctionDeclaration') startLine: number endLine: number metadata?: Record<string, unknown> } ``` #### Returns `Promise<number[]>` - Array of chunk IDs. #### Example ```typescript const chunkIds = await storage.storeChunks('src/index.ts', [ { content: 'export function hello() { }', type: 'FunctionDeclaration', startLine: 1, endLine: 3 } ]) ``` ### storeManyChunks() Store chunks for multiple files in a batch. ```typescript async storeManyChunks( fileChunks: Array<{ filePath: string; chunks: ChunkData[] }> ): Promise<Map<string, number[]>> ``` #### Returns `Promise<Map<string, number[]>>` - Map of file paths to chunk ID arrays. #### Example ```typescript const chunkIdMap = await storage.storeManyChunks([ { filePath: 'src/a.ts', chunks: [{ content: '...', type: 'FunctionDeclaration', startLine: 1, endLine: 5 }] }, { filePath: 'src/b.ts', chunks: [{ content: '...', type: 'ClassDeclaration', startLine: 1, endLine: 10 }] } ]) const aChunkIds = chunkIdMap.get('src/a.ts') ``` ### getChunksForFile() Get all chunks for a file. ```typescript async getChunksForFile(filePath: string): Promise<StoredChunk[]> ``` #### Returns ```typescript interface StoredChunk extends ChunkData { id: number fileId: number filePath: string } ``` #### Example ```typescript const chunks = await storage.getChunksForFile('src/index.ts') for (const chunk of chunks) { console.log(`${chunk.type} (${chunk.startLine}-${chunk.endLine}): ${chunk.content}`) } ``` ### getChunkCount() Get total number of indexed chunks. ```typescript async getChunkCount(): Promise<number> ``` #### Example ```typescript const chunkCount = await storage.getChunkCount() console.log(`Indexed ${chunkCount} chunks`) ``` ## Vector Operations These methods manage TF-IDF vectors for chunk-level search. ### storeChunkVectors() Store TF-IDF vectors for a chunk. ```typescript async storeChunkVectors( chunkId: number, terms: Map<string, { tf: number; tfidf: number; rawFreq: number }>, tokenCount?: number ): Promise<void> ``` #### Example ```typescript const terms = new Map([ ['function', { tf: 0.5, tfidf: 1.2, rawFreq: 2 }], ['export', { tf: 0.25, tfidf: 0.8, rawFreq: 1 }] ]) await storage.storeChunkVectors(123, terms, 4) ``` ### storeManyChunkVectors() Store vectors for multiple chunks in a batch. ```typescript async storeManyChunkVectors( chunkVectors: Array<{ chunkId: number terms: Map<string, { tf: number; tfidf: number; rawFreq: number }> tokenCount?: number }> ): Promise<void> ``` ### getChunkVectors() Get TF-IDF vectors for a chunk. ```typescript async getChunkVectors( chunkId: number ): Promise<Map<string, { tf: number; tfidf: number; rawFreq: number }> | null> ``` ### getAllChunkVectors() Get all chunk vectors in a single query. ```typescript async getAllChunkVectors(): Promise< Map<number, Map<string, { tf: number; tfidf: number; rawFreq: number }>> > ``` Returns a map of chunk IDs to their term vectors. ## IDF Operations Manage inverse document frequency scores. ### storeIdfScores() Store IDF scores for all terms. ```typescript async storeIdfScores( idf: Map<string, number>, docFreq: Map<string, number> ): Promise<void> ``` ### getIdfScores() Get all IDF scores. ```typescript async getIdfScores(): Promise<Map<string, number>> ``` ### getIdfScoresForTerms() Get IDF scores for specific terms only. ```typescript async getIdfScoresForTerms(terms: string[]): Promise<Map<string, number>> ``` ### rebuildIdfScoresFromVectors() Recalculate IDF scores from stored vectors. ```typescript async rebuildIdfScoresFromVectors(): Promise<void> ``` Uses smoothed IDF formula: `log((N+1)/(df+1)) + 1` ## Search Operations ### searchByTerms() Search chunks by query terms using SQL. ```typescript async searchByTerms( queryTerms: string[], options?: { limit?: number } ): Promise<SearchCandidate[]> ``` #### Returns ```typescript interface SearchCandidate { chunkId: number filePath: string content: string type: string startLine: number endLine: number matchedTerms: Map<string, { tfidf: number; rawFreq: number }> magnitude: number tokenCount: number } ``` #### Example ```typescript const candidates = await storage.searchByTerms( ['function', 'export'], { limit: 20 } ) for (const candidate of candidates) { console.log(`${candidate.filePath}:${candidate.startLine}`) console.log(`Matched: ${Array.from(candidate.matchedTerms.keys()).join(', ')}`) } ``` ## Metadata Operations ### setMetadata() Store metadata key-value pair. ```typescript async setMetadata(key: string, value: string): Promise<void> ``` ### getMetadata() Retrieve metadata value. ```typescript async getMetadata(key: string): Promise<string | null> ``` ### getAverageDocLength() Get average chunk token count (for BM25). ```typescript async getAverageDocLength(): Promise<number> ``` ### updateAverageDocLength() Recalculate and store average chunk length. ```typescript async updateAverageDocLength(): Promise<number> ``` ## Maintenance Operations ### recalculateTfidfScores() Update all TF-IDF scores using current IDF values. ```typescript async recalculateTfidfScores(): Promise<void> ``` ### updateChunkMagnitudes() Recalculate pre-computed magnitudes for cosine similarity. ```typescript async updateChunkMagnitudes(): Promise<void> ``` Magnitude formula: `sqrt(sum(tfidf^2))` for each chunk. ### getAllFileMetadata() Get file metadata without content (for incremental updates). ```typescript async getAllFileMetadata(): Promise<Map<string, { mtime: number; hash: string }>> ``` ### getTermsForFiles() Get all terms used in chunks of specified files. ```typescript async getTermsForFiles(paths: string[]): Promise<Set<string>> ``` Useful for tracking affected terms during incremental updates. ## MemoryStorage In-memory storage implementation. Useful for testing or temporary indexes. ### Constructor ```typescript new MemoryStorage() ``` ### Example ```typescript import { MemoryStorage } from '@sylphx/coderag' const storage = new MemoryStorage() await storage.storeFile({ path: 'test.ts', content: 'console.log("test")', size: 20, mtime: Date.now(), hash: 'abc' }) const file = await storage.getFile('test.ts') ``` ## Database Schema PersistentStorage uses the following tables: **files** - id, path (unique), content, hash, size, mtime, language, indexed_at **chunks** - id, file_id, content, type, start_line, end_line, metadata, magnitude, token_count **document_vectors** - chunk_id, term, tf, tfidf, raw_freq **idf_scores** - term (unique), idf, document_frequency **index_metadata** - key (unique), value, updated_at ## Performance Tips ### Batch Operations Use batch methods for better performance: ```typescript // Good: Single transaction await storage.storeFiles(files) await storage.storeManyChunks(fileChunks) // Bad: Multiple transactions for (const file of files) { await storage.storeFile(file) } ``` ### Incremental Updates Use metadata comparison to avoid unnecessary work: ```typescript const metadata = await storage.getAllFileMetadata() const existing = metadata.get(filePath) if (existing?.hash === newHash) { // Skip, content unchanged } ``` ### Low Memory Mode For large codebases, use SQL-based search: ```typescript const indexer = new CodebaseIndexer({ storage: new PersistentStorage(), lowMemoryMode: true // Uses searchByTerms() instead of in-memory index }) ``` ## Related - [CodebaseIndexer](./indexer.md) - [Types](./types.md)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SylphxAI/coderag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

storage.md•11.6 KiB