Semantic Graph Search
graph_search

Find entities in your knowledge graph by describing them in natural language, even if your wording doesn't match stored names. Optionally explore connected nodes.
Instructions
Find entities semantically similar to a natural-language query, then optionally expand via graph traversal. Uses local sentence embeddings (bge-small-en, 384-dim), so no external API is called. Best when the user's wording doesn't match canonical entity names (e.g. "containers" → Docker, "AI tools" → Claude Code/Anthropic SDK). If the embedder is unavailable, the tool returns an error that suggests using graph_query instead.
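As an illustration of how a client might invoke the tool, here is a minimal sketch that builds an MCP tools/call request for graph_search. The helper name buildGraphSearchRequest and the GraphSearchArgs type are hypothetical, introduced only for this example; the argument names come from the input schema on this page.

```typescript
// Hypothetical argument type mirroring the documented input schema.
interface GraphSearchArgs {
  query: string;
  top_k?: number;
  min_similarity?: number;
  entity_types?: string[];
  expand?: boolean;
  expand_min_weight?: number;
}

// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "graph_search"; arguments: GraphSearchArgs };
}

// Hypothetical helper: wraps the arguments in a tools/call request.
function buildGraphSearchRequest(id: number, args: GraphSearchArgs): ToolCallRequest {
  if (args.query.length < 1) {
    // The schema requires a query of at least one character.
    throw new Error("query must be a non-empty string");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "graph_search", arguments: args },
  };
}

// Ask for the five closest entities to "containers", without neighbour expansion.
const request = buildGraphSearchRequest(1, {
  query: "containers",
  top_k: 5,
  expand: false,
});
console.log(JSON.stringify(request, null, 2));
```

Omitted arguments are left out of the request so that the server-side defaults apply.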
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Natural-language query, at least 1 character (any phrasing; synonyms and paraphrases work). | |
| top_k | No | How many semantically similar entities to retrieve as seeds (integer, 1-50). | 10 |
| min_similarity | No | Minimum cosine similarity threshold (0-1). | 0.5 |
| entity_types | No | Restrict results to these entity types (array of ENTITY_TYPES values). | |
| expand | No | If true, also return the immediate graph neighbours of the seeds (the handler expands only the top 5 seeds and returns at most 30 edges). | true |
| expand_min_weight | No | Minimum edge weight when expanding (0-1). | 0.3 |
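The defaults and ranges in the table can be sketched in plain TypeScript as follows. This is not the server's code (the server uses a Zod schema); resolveGraphSearchInput, checkRange, and both interfaces are hypothetical names used only to show how omitted fields resolve.

```typescript
// Hypothetical types mirroring the Input Schema table; not the server's own code.
interface GraphSearchInput {
  query: string;
  top_k?: number;
  min_similarity?: number;
  entity_types?: string[];
  expand?: boolean;
  expand_min_weight?: number;
}

interface ResolvedGraphSearchInput {
  query: string;
  top_k: number;
  min_similarity: number;
  entity_types?: string[];
  expand: boolean;
  expand_min_weight: number;
}

// Throws if value lies outside [min, max]; otherwise returns it unchanged.
function checkRange(name: string, value: number, min: number, max: number): number {
  if (!Number.isFinite(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}

// Fills in the documented defaults and enforces the documented ranges.
function resolveGraphSearchInput(input: GraphSearchInput): ResolvedGraphSearchInput {
  if (input.query.length < 1) {
    throw new RangeError("query must contain at least one character");
  }
  const topK = checkRange("top_k", input.top_k ?? 10, 1, 50);
  if (!Number.isInteger(topK)) {
    throw new RangeError(`top_k must be an integer, got ${topK}`);
  }
  return {
    query: input.query,
    top_k: topK,
    min_similarity: checkRange("min_similarity", input.min_similarity ?? 0.5, 0, 1),
    ...(input.entity_types ? { entity_types: input.entity_types } : {}),
    expand: input.expand ?? true,
    expand_min_weight: checkRange("expand_min_weight", input.expand_min_weight ?? 0.3, 0, 1),
  };
}

// Only query and min_similarity are supplied; everything else takes its default.
const resolved = resolveGraphSearchInput({ query: "AI tools", min_similarity: 0.4 });
console.log(resolved);
```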
Implementation Reference
- src/mcp-server/index.ts:1325-1412 (handler) The graph_search tool handler. It takes a natural-language query, embeds it via embedText(), runs a vector similarity search against the Neo4j entity_embedding index (tenant-scoped), and optionally expands the results via graph traversal. It returns seeds (similar entities) and, unless expand is false, an expansion (neighbour nodes and edges).

      }, async (args) => {
        try {
          // 1. Embed the query
          const { embedText } = await import("../shared/embeddings.js");
          let queryVec: number[];
          try {
            queryVec = await embedText(args.query);
          } catch (err) {
            const e = err instanceof Error ? err : new Error(String(err));
            return toolError(`graph_search: embedder unavailable (${e.message}). Try graph_query instead.`);
          }

          // 2. Vector similarity search → seeds (tenant-scoped)
          const tenantId = currentTenant();
          const seeds = await client.vectorSearch(tenantId, queryVec, {
            top_k: args.top_k ?? 10,
            min_similarity: args.min_similarity ?? 0.5,
            entity_types: args.entity_types as EntityType[] | undefined,
          });

          if (seeds.length === 0) {
            return toolResult({
              query: args.query,
              seeds: [],
              expansion: null,
              note: "No entities matched at the given similarity threshold. Try lowering min_similarity or check that embeddings have been backfilled (graph_stats > schema or check startup logs).",
            });
          }

          // 3. Optionally expand: for each seed, find the top edges
          const expansionEdges: Array<{ from: string; to: string; from_name: string; to_name: string; relation: string; weight: number }> = [];
          const expansionNodes = new Map<string, { id: string; name: string; type: string; from_seed: string }>();

          if (args.expand !== false) {
            const minWeight = args.expand_min_weight ?? 0.3;
            const seedIds = seeds.slice(0, 5).map((s) => s.id); // Expand only top 5 seeds to keep payload tight
            const expansionRows = await client.runReadQuery(
              `
              MATCH (a:Entity {tenant_id: $tenantId})-[r]-(b:Entity {tenant_id: $tenantId})
              WHERE a.id IN $seedIds AND r.weight > $minWeight
              RETURN a.id AS from_id, a.name AS from_name,
                     b.id AS to_id, b.name AS to_name,
                     [l IN labels(b) WHERE l <> 'Entity'][0] AS to_type,
                     type(r) AS relation, r.weight AS weight
              ORDER BY r.weight DESC
              LIMIT 30
              `,
              { tenantId, seedIds, minWeight },
            );
            for (const row of expansionRows) {
              const fromId = String(row["from_id"]);
              const toId = String(row["to_id"]);
              const toName = String(row["to_name"] ?? "");
              const toType = String(row["to_type"] ?? "?");
              expansionEdges.push({
                from: fromId,
                to: toId,
                from_name: String(row["from_name"] ?? ""),
                to_name: toName,
                relation: String(row["relation"] ?? ""),
                weight: Number(row["weight"] ?? 0),
              });
              // Don't include seeds themselves in the expansion node list
              if (!seeds.find((s) => s.id === toId)) {
                expansionNodes.set(toId, { id: toId, name: toName, type: toType, from_seed: fromId });
              }
            }
          }

          return toolResult({
            query: args.query,
            seeds,
            expansion: args.expand === false
              ? null
              : {
                  nodes: Array.from(expansionNodes.values()),
                  edges: expansionEdges,
                  node_count: expansionNodes.size,
                  edge_count: expansionEdges.length,
                },
          });
        } catch (err) {
          const e = err instanceof Error ? err : new Error(String(err));
          return toolError(`graph_search failed: ${e.message}`);
        }
      });

- src/mcp-server/index.ts:1280-1324 (registration) Registration of the graph_search tool via server.registerTool('graph_search', ...). Defines the title, description, inputSchema (query, top_k, min_similarity, entity_types, expand, expand_min_weight), and annotations (readOnlyHint).

      server.registerTool("graph_search", {
        title: "Semantic Graph Search",
        description:
          "Find entities semantically similar to a natural-language query, then optionally expand via " +
          "graph traversal. Uses local sentence embeddings (bge-small-en, 384-dim) — no external API. " +
          "Best when the user's wording doesn't match canonical entity names (e.g. \"containers\" → Docker, " +
          "\"AI tools\" → Claude Code/Anthropic SDK). Falls back to graph_query if no embeddings available.",
        inputSchema: {
          query: z
            .string()
            .min(1)
            .describe("Natural-language query (any phrasing — synonyms and paraphrases work)."),
          top_k: z
            .number()
            .int()
            .min(1)
            .max(50)
            .optional()
            .default(10)
            .describe("How many semantically similar entities to retrieve as seeds (default 10)."),
          min_similarity: z
            .number()
            .min(0)
            .max(1)
            .optional()
            .default(0.5)
            .describe("Minimum cosine similarity threshold (default 0.5)."),
          entity_types: z
            .array(z.enum(ENTITY_TYPES))
            .optional()
            .describe("Restrict results to these entity types."),
          expand: z
            .boolean()
            .optional()
            .default(true)
            .describe("If true (default), also return the immediate graph neighbours of each seed."),
          expand_min_weight: z
            .number()
            .min(0)
            .max(1)
            .optional()
            .default(0.3)
            .describe("Min edge weight when expanding (default 0.3)."),
        },
        annotations: { readOnlyHint: true },

- src/mcp-server/index.ts:1287-1323 (schema) Zod input schema for graph_search: query (string, required), top_k (1-50, default 10), min_similarity (0-1, default 0.5), entity_types (optional array of ENTITY_TYPES), expand (boolean, default true), expand_min_weight (0-1, default 0.3).

      inputSchema: {
        query: z
          .string()
          .min(1)
          .describe("Natural-language query (any phrasing — synonyms and paraphrases work)."),
        top_k: z
          .number()
          .int()
          .min(1)
          .max(50)
          .optional()
          .default(10)
          .describe("How many semantically similar entities to retrieve as seeds (default 10)."),
        min_similarity: z
          .number()
          .min(0)
          .max(1)
          .optional()
          .default(0.5)
          .describe("Minimum cosine similarity threshold (default 0.5)."),
        entity_types: z
          .array(z.enum(ENTITY_TYPES))
          .optional()
          .describe("Restrict results to these entity types."),
        expand: z
          .boolean()
          .optional()
          .default(true)
          .describe("If true (default), also return the immediate graph neighbours of each seed."),
        expand_min_weight: z
          .number()
          .min(0)
          .max(1)
          .optional()
          .default(0.3)
          .describe("Min edge weight when expanding (default 0.3)."),
      },

- src/shared/embeddings.ts:39-46 (helper) The embedText() helper used by graph_search to embed the natural-language query string into a 384-dim vector using bge-small-en-v1.5 (via the @huggingface/transformers pipeline). Returns a number array.

      export async function embedText(text: string): Promise<number[]> {
        const cleaned = text.trim();
        if (!cleaned) return new Array<number>(EMBEDDING_DIM).fill(0);
        const e = await getEmbedder();
        const result = await e(cleaned, { pooling: "mean", normalize: true });
        // result.data is a Float32Array of length 384
        return Array.from(result.data as Float32Array);
      }

- src/shared/neo4j-client.ts:1765-1808 (helper) The vectorSearch() method on Neo4jClient called by graph_search. Queries the 'entity_embedding' vector index using an over-request strategy (fetch a larger candidate pool, then filter by tenant) and returns entities with similarity scores.

      async vectorSearch(
        tenantId: string,
        queryEmbedding: number[],
        options: { top_k?: number; min_similarity?: number; entity_types?: EntityType[] } = {},
      ): Promise<Array<{ id: string; name: string; type: string; score: number; confidence: number }>> {
        const topK = options.top_k ?? 10;
        const minSim = options.min_similarity ?? 0.5;
        const candidatePool = Math.max(topK * 4, 40); // over-request, then filter by tenant
        const typeFilter =
          options.entity_types && options.entity_types.length > 0
            ? `AND ANY(l IN labels(node) WHERE l IN $types)`
            : "";
        const rows = await this.run(
          `
          CALL db.index.vector.queryNodes('entity_embedding', $candidatePool, $queryEmbedding)
          YIELD node, score
          WHERE node.tenant_id = $tenantId
            AND score >= $minSim
            ${typeFilter}
          RETURN node.id AS id,
                 node.name AS name,
                 [l IN labels(node) WHERE l <> 'Entity'][0] AS type,
                 node.confidence AS confidence,
                 score
          ORDER BY score DESC
          LIMIT $topK
          `,
          {
            tenantId,
            candidatePool,
            topK,
            queryEmbedding,
            minSim,
            ...(options.entity_types && options.entity_types.length > 0 && { types: options.entity_types }),
          },
        );
        return rows.map((r) => ({
          id: String(r["id"]),
          name: String(r["name"] ?? ""),
          type: String(r["type"] ?? "?"),
          confidence: Number(r["confidence"] ?? 0),
          score: Number(r["score"] ?? 0),
        }));
      }
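
The over-request strategy used by vectorSearch() can be illustrated with a small in-memory sketch. It assumes a flat array of entities instead of a Neo4j vector index, and it scores with raw cosine similarity, which may be scaled differently from the score a real vector index reports. The names StoredEntity, cosineSimilarity, and searchWithOverRequest are hypothetical.

```typescript
// Hypothetical in-memory stand-in for indexed entities.
interface StoredEntity {
  id: string;
  tenantId: string;
  embedding: number[];
}

interface ScoredEntity {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchWithOverRequest(
  entities: StoredEntity[],
  tenantId: string,
  queryVec: number[],
  topK = 10,
  minSim = 0.5,
): ScoredEntity[] {
  // Same pool size rule as the documented method: at least 40, or 4x top_k.
  const candidatePool = Math.max(topK * 4, 40);

  // Stage 1: nearest neighbours across ALL tenants (the index is not tenant-aware).
  const candidates = entities
    .map((entity) => ({ entity, score: cosineSimilarity(entity.embedding, queryVec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, candidatePool);

  // Stage 2: keep only this tenant's entities above the threshold, then cut to topK.
  return candidates
    .filter((c) => c.entity.tenantId === tenantId && c.score >= minSim)
    .slice(0, topK)
    .map((c) => ({ id: c.entity.id, score: c.score }));
}

const sampleEntities: StoredEntity[] = [
  { id: "e1", tenantId: "a", embedding: [1, 0] },
  { id: "e2", tenantId: "a", embedding: [0, 1] },
  { id: "e3", tenantId: "b", embedding: [1, 0] },
];
const hits = searchWithOverRequest(sampleEntities, "a", [1, 0], 1);
console.log(hits);
```

One consequence of filtering after the index lookup is that a tenant can receive fewer than top_k results when other tenants' entities occupy most of the candidate pool, even if more of its own entities would pass the threshold.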