index-vectors

Index project files for semantic search by generating vector embeddings, supporting providers like OpenAI, Azure, and Gemini, with options to specify paths and force re-indexing.

Instructions

Index project files for semantic search using vector embeddings

Input Schema

Name       Required   Description                      Default
force      No         Force re-indexing of all files   false
path       No         Project path to index            current directory
provider   No         Embedding provider to use        configured provider
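For illustration, a hypothetical set of arguments for this tool (all fields are optional; names follow the schema above):

```typescript
// Hypothetical example arguments for the index-vectors tool.
// The path value is illustrative, not from the source.
const exampleArgs = {
  path: "/home/user/my-project", // optional; defaults to the current directory
  provider: "openai",            // optional; one of "openai" | "azure" | "gemini"
  force: true,                   // optional; re-index every file when true
};
```

Omitting every field indexes the current directory with the configured provider and skips unchanged chunks.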

Implementation Reference

  • src/server.ts:446-465 (registration)
    Registers the 'index-vectors' MCP tool, providing title, description, input schema, and an inline handler function that dynamically imports and invokes handleIndexVectors from './handlers/vector' with processed arguments.
```typescript
server.registerTool(
  "index-vectors",
  {
    title: "Index Vectors",
    description: "Index project files for semantic search using vector embeddings",
    inputSchema: IndexVectorsSchema.shape,
  },
  async (args) => {
    const { handleIndexVectors } = await import("./handlers/vector");
    const result = await handleIndexVectors({
      path: args.path || process.cwd(),
      provider: args.provider,
      force: args.force || false,
    });
    return { content: [{ type: "text", text: result }] };
  }
);
```
  • Zod schema definition for 'index-vectors' tool input validation: path (optional), provider (enum), force (boolean).
```typescript
const IndexVectorsSchema = z.object({
  path: z
    .string()
    .optional()
    .describe("Project path to index (defaults to current directory)"),
  provider: z
    .enum(["openai", "azure", "gemini"])
    .optional()
    .describe("Embedding provider to use (defaults to configured provider)"),
  force: z.boolean().optional().describe("Force re-indexing of all files"),
});
```
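The defaulting that the server handler applies on top of this schema can be sketched without Zod as follows (`normalizeArgs` and `PROVIDERS` are illustrative names, not part of the source):

```typescript
// Plain-TypeScript sketch of the checks the Zod schema expresses,
// plus the defaulting the registered handler applies.
type IndexVectorsInput = {
  path?: string;
  provider?: "openai" | "azure" | "gemini";
  force?: boolean;
};

const PROVIDERS = ["openai", "azure", "gemini"] as const;

function normalizeArgs(args: IndexVectorsInput, cwd: string) {
  // Mirror the enum check z.enum(["openai", "azure", "gemini"]) performs.
  if (args.provider !== undefined && !PROVIDERS.includes(args.provider)) {
    throw new Error(`Unknown provider: ${args.provider}`);
  }
  return {
    path: args.path ?? cwd,   // handler falls back to process.cwd()
    provider: args.provider,  // undefined means "use configured provider"
    force: args.force ?? false,
  };
}
```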
  • Core handler logic for vector indexing: scans project files (respecting gitignore), splits into chunks, generates embeddings using provider, stores metadata and vectors in SQLite with VSS support or fallback.
```typescript
export async function indexProject(options: IndexingOptions): Promise<IndexingResult> {
  const startTime = Date.now();
  const { projectPath, provider, config, force = false, onProgress } = options;

  onProgress?.('Initializing vector database...');
  const { db, client } = await getVectorDB(projectPath);

  // Update .gitignore if needed
  await updateGitignore(projectPath);

  // Get files to index
  onProgress?.('Scanning project files...');
  const files = await getFilesToIndex(projectPath, config.filePatterns);
  if (files.length === 0) {
    logger.warn('No files found to index');
    return { filesIndexed: 0, chunksCreated: 0, timeMs: Date.now() - startTime };
  }
  onProgress?.(`Found ${files.length} files to process`);

  // Create text splitter
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: config.chunkSize,
    chunkOverlap: config.chunkOverlap,
  });

  let filesIndexed = 0;
  let chunksCreated = 0;

  // Process files in batches
  for (let i = 0; i < files.length; i += config.batchSize) {
    const batch = files.slice(i, i + config.batchSize);
    const batchChunks: Array<{
      id: string;
      relpath: string;
      chunk: string;
      hash: string;
      mtimeMs: number;
      embedding?: number[];
    }> = [];

    // Process batch
    for (const filePath of batch) {
      try {
        const relPath = relative(projectPath, filePath);
        const stats = await stat(filePath);
        const content = await readFile(filePath, 'utf-8');

        // Skip empty files
        if (!content.trim()) continue;

        // Split into chunks
        const chunks = await splitter.splitText(content);
        for (let idx = 0; idx < chunks.length; idx++) {
          const chunk = chunks[idx];
          const id = `${relPath}#${idx}`;
          const hash = createHash('sha256').update(chunk).digest('hex');

          // Check if chunk already exists with same hash
          if (!force) {
            const result = await client.execute({
              sql: `SELECT hash, mtime_ms FROM vector_chunks WHERE id = ?`,
              args: [id],
            });
            if (result.rows.length > 0) {
              const row = result.rows[0];
              const existing = { hash: row[0] as string, mtime_ms: row[1] as number };
              if (existing.hash === hash && existing.mtime_ms === stats.mtimeMs) {
                continue; // Skip unchanged chunk
              }
            }
          }

          batchChunks.push({
            id,
            relpath: relPath,
            chunk,
            hash,
            mtimeMs: stats.mtimeMs,
          });
        }
        filesIndexed++;
      } catch (error) {
        logger.error(`Error processing file ${filePath}:`, error);
      }
    }

    // Generate embeddings for batch
    if (batchChunks.length > 0) {
      onProgress?.(`Generating embeddings for batch ${Math.floor(i / config.batchSize) + 1}...`);
      try {
        const texts = batchChunks.map(c => c.chunk);
        const embeddings = await provider.getEmbeddings(texts);

        // Store chunks with embeddings (dual-table approach)
        for (let j = 0; j < batchChunks.length; j++) {
          const chunk = batchChunks[j];
          const embedding = embeddings[j];

          // Validate embedding dimensions
          if (embedding.length !== 1536) {
            logger.warn(`Embedding dimension mismatch for ${chunk.id}: expected 1536, got ${embedding.length}`);
            continue;
          }

          try {
            // 1. Insert/update metadata in main table
            const result = await client.execute({
              sql: `INSERT OR REPLACE INTO vector_chunks (id, relpath, chunk, hash, mtime_ms)
                    VALUES (?, ?, ?, ?, ?)`,
              args: [chunk.id, chunk.relpath, chunk.chunk, chunk.hash, chunk.mtimeMs],
            });

            // 2. Get the rowid for linking
            const rowidResult = await client.execute({
              sql: `SELECT rowid FROM vector_chunks WHERE id = ?`,
              args: [chunk.id],
            });

            if (rowidResult.rows.length > 0) {
              const rowid = rowidResult.rows[0][0] as number;

              // 3. Try to insert into VSS virtual table
              try {
                // VSS requires DELETE before INSERT for updates
                await client.execute({
                  sql: `DELETE FROM vss_vectors WHERE rowid = ?`,
                  args: [rowid],
                });
                await client.execute({
                  sql: `INSERT INTO vss_vectors (rowid, embedding) VALUES (?, ?)`,
                  args: [rowid, new Float32Array(embedding).buffer],
                });
              } catch (vssError) {
                // VSS not available, fall back to storing the embedding in the main table
                await client.execute({
                  sql: `UPDATE vector_chunks SET embedding = ? WHERE id = ?`,
                  args: [float32ArrayToBuffer(embedding), chunk.id],
                });
              }
              chunksCreated++;
            }
          } catch (error) {
            logger.error(`Error storing chunk ${chunk.id}:`, error);
          }
        }
      } catch (error) {
        logger.error('Error generating embeddings:', error);
        throw error;
      }
    }

    onProgress?.(`Processed ${Math.min(i + config.batchSize, files.length)} / ${files.length} files`);
  }

  const timeMs = Date.now() - startTime;
  onProgress?.(`Indexing complete: ${filesIndexed} files, ${chunksCreated} chunks in ${(timeMs / 1000).toFixed(1)}s`);
  return { filesIndexed, chunksCreated, timeMs };
}
```
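The change-detection scheme above can be sketched in isolation: each chunk is keyed by `${relPath}#${idx}`, fingerprinted with a SHA-256 hash, and skipped when both the stored hash and the file mtime are unchanged (`chunkId`, `chunkHash`, and `isUnchanged` are illustrative helper names, not from the source):

```typescript
import { createHash } from "node:crypto";

// Build the chunk's primary key, as in `${relPath}#${idx}` above.
function chunkId(relPath: string, index: number): string {
  return `${relPath}#${index}`;
}

// SHA-256 fingerprint of the chunk's text, hex-encoded.
function chunkHash(chunk: string): string {
  return createHash("sha256").update(chunk).digest("hex");
}

// A chunk is skipped only when a stored row exists AND both the
// content hash and the file mtime match.
function isUnchanged(
  stored: { hash: string; mtimeMs: number } | undefined,
  current: { hash: string; mtimeMs: number },
): boolean {
  return (
    stored !== undefined &&
    stored.hash === current.hash &&
    stored.mtimeMs === current.mtimeMs
  );
}
```

Hashing the chunk (rather than the whole file) means an edit near the end of a large file re-embeds only the chunks it actually touched.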
  • Helper that initializes the vector database at .ultra-mcp/vector-index-v1.sqlite3, creating a metadata table (vector_chunks) and a vector-search virtual table (vss_vectors), with a fallback embedding column when VSS is unavailable.
```typescript
export async function getVectorDB(projectPath: string): Promise<VectorDatabase> {
  const dbDir = join(projectPath, '.ultra-mcp');
  const dbPath = join(dbDir, 'vector-index-v1.sqlite3');

  // Create directory if needed
  if (!existsSync(dbDir)) {
    mkdirSync(dbDir, { recursive: true });
    logger.log(`Created vector database directory at ${dbDir}`);
  }

  try {
    const client = createClient({ url: `file:${dbPath}` });
    const db = drizzle(client);

    // Create main metadata table
    await db.run(`
      CREATE TABLE IF NOT EXISTS vector_chunks (
        id TEXT PRIMARY KEY,
        relpath TEXT NOT NULL,
        chunk TEXT NOT NULL,
        hash TEXT NOT NULL,
        mtime_ms INTEGER NOT NULL,
        created_at INTEGER DEFAULT (unixepoch('now', 'subsec') * 1000)
      );
    `);

    // Create regular indexes on metadata table
    await db.run(`CREATE INDEX IF NOT EXISTS relpath_idx ON vector_chunks(relpath);`);
    await db.run(`CREATE INDEX IF NOT EXISTS hash_idx ON vector_chunks(hash);`);

    // Create VSS virtual table for vector search
    try {
      await db.run(`
        CREATE VIRTUAL TABLE IF NOT EXISTS vss_vectors USING vss0(
          embedding(1536)
        );
      `);
      logger.log('Vector index created successfully');
    } catch (error) {
      // VSS extension might not be available in all libsql versions
      logger.warn('Could not create VSS virtual table, will use fallback search:', error);

      // Fallback: add embedding column to main table for fallback search
      try {
        await db.run(`ALTER TABLE vector_chunks ADD COLUMN embedding F32_BLOB(1536);`);
      } catch (alterError) {
        // Column might already exist
        logger.debug('Embedding column already exists or alter failed:', alterError);
      }
    }

    return { db, client, path: dbPath };
  } catch (error) {
    throw new Error(
      `Failed to initialize vector database: ${error instanceof Error ? error.message : String(error)}`
    );
  }
}
```
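When VSS is unavailable, embeddings land in the `embedding` column as raw float32 bytes. A minimal sketch of that serialization, assuming the same byte layout as the source's `float32ArrayToBuffer` / `bufferToFloat32Array` helpers (the names below are illustrative):

```typescript
// Serialize an embedding to the little-endian float32 byte layout
// implied by F32_BLOB(1536).
function float32ToBuffer(values: number[]): Buffer {
  return Buffer.from(new Float32Array(values).buffer);
}

// Deserialize, respecting the Buffer's offset into its backing
// ArrayBuffer (Buffers can be views over a shared pool).
function bufferToFloat32(buf: Buffer): number[] {
  return Array.from(
    new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4),
  );
}
```

Honoring `byteOffset` matters: Node may hand back a Buffer that is a window into a larger pooled ArrayBuffer, and constructing the Float32Array from `buf.buffer` alone would read the wrong bytes.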
  • Related helper for vector search: queries indexed chunks via the VSS virtual table, falling back to manual cosine similarity when the extension is unavailable (used by the search-vectors tool).
```typescript
export async function searchVectors(options: SearchOptions): Promise<SearchResult[]> {
  const { projectPath, query, provider, limit = 10, similarityThreshold = 0.7 } = options;

  // Get query embedding
  const queryEmbedding = await provider.getEmbedding(query);

  // Get database
  const { db, client } = await getVectorDB(projectPath);

  try {
    // Try to use native vector search if available
    let results: Array<{
      id: string;
      relpath: string;
      chunk: string;
      embedding: Buffer;
      distance: number;
    }>;

    try {
      // Try VSS virtual table first
      const vectorResult = await client.execute({
        sql: `SELECT vc.id, vc.relpath, vc.chunk, vs.distance
              FROM vss_vectors vs
              JOIN vector_chunks vc ON vc.rowid = vs.rowid
              WHERE vss_search(vs.embedding, ?)
              ORDER BY vs.distance
              LIMIT ?`,
        args: [new Float32Array(queryEmbedding).buffer, limit],
      });
      results = vectorResult.rows.map(row => ({
        id: row[0] as string,
        relpath: row[1] as string,
        chunk: row[2] as string,
        embedding: Buffer.alloc(0), // Not needed for VSS results
        distance: row[3] as number,
      }));
    } catch (error) {
      // Fall back to manual cosine similarity if the vector extension is not available
      logger.warn('Native vector search failed, using fallback:', error);

      // Try to get embeddings from the main table first (fallback storage)
      const fallbackResult = await client.execute({
        sql: 'SELECT id, relpath, chunk, embedding FROM vector_chunks WHERE embedding IS NOT NULL',
        args: [],
      });
      const allChunks = fallbackResult.rows.map(row => ({
        id: row[0] as string,
        relpath: row[1] as string,
        chunk: row[2] as string,
        embedding: row[3] as Buffer,
      }));

      // Calculate cosine similarity for each chunk
      const withSimilarity = allChunks.map(chunk => {
        // libsql returns Uint8Array, so go through .buffer to build a Float32Array
        const embedBuffer = chunk.embedding instanceof Buffer
          ? chunk.embedding
          : Buffer.from((chunk.embedding as Uint8Array).buffer);
        const chunkEmbedding = bufferToFloat32Array(embedBuffer);
        const similarity = cosineSimilarity(queryEmbedding, Array.from(chunkEmbedding));
        return {
          ...chunk,
          distance: 1 - similarity, // Convert similarity to distance
        };
      });

      // Sort by similarity and limit
      results = withSimilarity
        .sort((a, b) => a.distance - b.distance)
        .slice(0, limit);
    }

    // Convert distance to similarity and filter by threshold
    return results
      .map(result => ({
        relpath: result.relpath,
        chunk: result.chunk,
        similarity: 1 - result.distance,
        chunkId: result.id,
      }))
      .filter(result => result.similarity >= similarityThreshold);
  } catch (error) {
    throw new Error(`Vector search failed: ${error instanceof Error ? error.message : String(error)}`);
  }
}
```
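The fallback branch relies on a `cosineSimilarity` helper that is not shown above. A standard implementation would look like this (a sketch consistent with the `distance = 1 - similarity` conversion in the code, not necessarily the source's exact code):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the fallback loads every stored embedding and scores it in JavaScript, it is O(n) in the number of chunks; the VSS path exists precisely to avoid that scan.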

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/RealMikeChong/ultra-mcp'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.