Semantic Graph Search

graph_search

Read-only

Find entities in your knowledge graph by describing them in natural language, even if your wording doesn't match stored names. Optionally explore connected nodes.

Instructions

Find entities semantically similar to a natural-language query, then optionally expand via graph traversal. Uses local sentence embeddings (bge-small-en, 384-dim), so no external API is called. Best when the user's wording doesn't match canonical entity names (e.g. "containers" → Docker, "AI tools" → Claude Code/Anthropic SDK). If the embedder is unavailable, the tool returns an error that suggests using graph_query instead.

Input Schema


Name	Required	Description
`query`	Yes	Natural-language query (any phrasing — synonyms and paraphrases work).
`top_k`	No	How many semantically similar entities to retrieve as seeds (default 10).
`min_similarity`	No	Minimum cosine similarity threshold (default 0.5).
`entity_types`	No	Restrict results to these entity types.
`expand`	No	If true (default), also return the immediate graph neighbours of each seed.
`expand_min_weight`	No	Min edge weight when expanding (default 0.3).
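
As a rough illustration of how the optional parameters resolve, the sketch below fills in the documented defaults for a partial argument object. The `GraphSearchArgs` and `ResolvedArgs` types and the `resolveArgs` helper are hypothetical names introduced here, not part of the server; only the parameter names and default values come from the table above.

```typescript
// Hypothetical type mirroring the documented parameters
// (entity types are left as plain strings in this sketch).
interface GraphSearchArgs {
  query: string;
  top_k?: number;
  min_similarity?: number;
  entity_types?: string[];
  expand?: boolean;
  expand_min_weight?: number;
}

// Hypothetical type for the arguments after defaults are applied.
interface ResolvedArgs {
  query: string;
  top_k: number;
  min_similarity: number;
  entity_types: string[] | undefined;
  expand: boolean;
  expand_min_weight: number;
}

// Apply the defaults listed in the table: 10, 0.5, true, 0.3.
function resolveArgs(args: GraphSearchArgs): ResolvedArgs {
  return {
    query: args.query,
    top_k: args.top_k ?? 10,
    min_similarity: args.min_similarity ?? 0.5,
    entity_types: args.entity_types,
    expand: args.expand ?? true,
    expand_min_weight: args.expand_min_weight ?? 0.3,
  };
}

// Only query and top_k are supplied; everything else falls back to its default.
const resolved = resolveArgs({ query: "containers", top_k: 5 });
console.log(resolved);
```

Passing only `query` is therefore enough for a first call; the other fields matter mainly when results are too sparse or too noisy.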

Implementation Reference

src/mcp-server/index.ts:1325-1412 (handler)

The graph_search tool handler. It embeds the natural-language query via embedText(), runs a tenant-scoped vector similarity search against the Neo4j entity_embedding index, and, unless expand is false, expands the top 5 seeds by one hop along edges whose weight exceeds expand_min_weight, returning at most 30 edges. The result contains seeds (the similar entities) and expansion (neighbor nodes and edges), which is null when expansion is disabled or no seeds matched.

}, async (args) => {
  try {
    // 1. Embed the query
    const { embedText } = await import("../shared/embeddings.js");
    let queryVec: number[];
    try {
      queryVec = await embedText(args.query);
    } catch (err) {
      const e = err instanceof Error ? err : new Error(String(err));
      return toolError(`graph_search: embedder unavailable (${e.message}). Try graph_query instead.`);
    }

    // 2. Vector similarity search → seeds (tenant-scoped)
    const tenantId = currentTenant();
    const seeds = await client.vectorSearch(tenantId, queryVec, {
      top_k: args.top_k ?? 10,
      min_similarity: args.min_similarity ?? 0.5,
      entity_types: args.entity_types as EntityType[] | undefined,
    });

    if (seeds.length === 0) {
      return toolResult({
        query: args.query,
        seeds: [],
        expansion: null,
        note: "No entities matched at the given similarity threshold. Try lowering min_similarity or check that embeddings have been backfilled (graph_stats > schema or check startup logs).",
      });
    }

    // 3. Optionally expand: for each seed, find the top edges
    const expansionEdges: Array<{ from: string; to: string; from_name: string; to_name: string; relation: string; weight: number }> = [];
    const expansionNodes = new Map<string, { id: string; name: string; type: string; from_seed: string }>();

    if (args.expand !== false) {
      const minWeight = args.expand_min_weight ?? 0.3;
      const seedIds = seeds.slice(0, 5).map((s) => s.id); // Expand only top 5 seeds to keep payload tight

      const expansionRows = await client.runReadQuery(
        `
        MATCH (a:Entity {tenant_id: $tenantId})-[r]-(b:Entity {tenant_id: $tenantId})
        WHERE a.id IN $seedIds AND r.weight > $minWeight
        RETURN a.id AS from_id, a.name AS from_name,
               b.id AS to_id, b.name AS to_name,
               [l IN labels(b) WHERE l <> 'Entity'][0] AS to_type,
               type(r) AS relation, r.weight AS weight
        ORDER BY r.weight DESC
        LIMIT 30
        `,
        { tenantId, seedIds, minWeight },
      );

      for (const row of expansionRows) {
        const fromId = String(row["from_id"]);
        const toId = String(row["to_id"]);
        const toName = String(row["to_name"] ?? "");
        const toType = String(row["to_type"] ?? "?");

        expansionEdges.push({
          from: fromId,
          to: toId,
          from_name: String(row["from_name"] ?? ""),
          to_name: toName,
          relation: String(row["relation"] ?? ""),
          weight: Number(row["weight"] ?? 0),
        });

        // Don't include seeds themselves in the expansion node list
        if (!seeds.find((s) => s.id === toId)) {
          expansionNodes.set(toId, { id: toId, name: toName, type: toType, from_seed: fromId });
        }
      }
    }

    return toolResult({
      query: args.query,
      seeds,
      expansion: args.expand === false ? null : {
        nodes: Array.from(expansionNodes.values()),
        edges: expansionEdges,
        node_count: expansionNodes.size,
        edge_count: expansionEdges.length,
      },
    });
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    return toolError(`graph_search failed: ${e.message}`);
  }
});
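
To make the shape of the handler's result easier to picture, here is a minimal sketch of a consumer. The type names and the `summarize` helper are illustrative assumptions, and the sample data is made up; the field names follow the objects built in the handler above.

```typescript
// Shapes inferred from the handler above; the type names are illustrative.
interface Seed { id: string; name: string; type: string; score: number; confidence: number }
interface ExpansionNode { id: string; name: string; type: string; from_seed: string }
interface ExpansionEdge {
  from: string; to: string; from_name: string; to_name: string; relation: string; weight: number;
}
interface GraphSearchResult {
  query: string;
  seeds: Seed[];
  expansion: {
    nodes: ExpansionNode[];
    edges: ExpansionEdge[];
    node_count: number;
    edge_count: number;
  } | null;
  note?: string;
}

// Hypothetical helper: one line per seed, followed by the edges that start at it.
function summarize(result: GraphSearchResult): string[] {
  const lines: string[] = [];
  for (const seed of result.seeds) {
    lines.push(`${seed.name} (${seed.type}) score=${seed.score.toFixed(2)}`);
    // expansion is null when expand is false or no seeds matched.
    const edges = result.expansion?.edges.filter((e) => e.from === seed.id) ?? [];
    for (const e of edges) {
      lines.push(`  -[${e.relation} ${e.weight.toFixed(2)}]- ${e.to_name}`);
    }
  }
  return lines;
}

// Made-up sample result, for illustration only.
const sample: GraphSearchResult = {
  query: "containers",
  seeds: [{ id: "e1", name: "Docker", type: "Tool", score: 0.82, confidence: 0.9 }],
  expansion: {
    nodes: [{ id: "e2", name: "Kubernetes", type: "Tool", from_seed: "e1" }],
    edges: [{ from: "e1", to: "e2", from_name: "Docker", to_name: "Kubernetes", relation: "RELATED_TO", weight: 0.7 }],
    node_count: 1,
    edge_count: 1,
  },
};

const summaryLines = summarize(sample);
console.log(summaryLines.join("\n"));
```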

src/mcp-server/index.ts:1280-1324 (registration)

Registration of the graph_search tool via server.registerTool('graph_search', ...). Defines title, description, inputSchema (query, top_k, min_similarity, entity_types, expand, expand_min_weight), and annotations (readOnlyHint).

server.registerTool("graph_search", {
  title: "Semantic Graph Search",
  description:
    "Find entities semantically similar to a natural-language query, then optionally expand via " +
    "graph traversal. Uses local sentence embeddings (bge-small-en, 384-dim) — no external API. " +
    "Best when the user's wording doesn't match canonical entity names (e.g. \"containers\" → Docker, " +
    "\"AI tools\" → Claude Code/Anthropic SDK). Falls back to graph_query if no embeddings available.",
  inputSchema: {
    query: z
      .string()
      .min(1)
      .describe("Natural-language query (any phrasing — synonyms and paraphrases work)."),
    top_k: z
      .number()
      .int()
      .min(1)
      .max(50)
      .optional()
      .default(10)
      .describe("How many semantically similar entities to retrieve as seeds (default 10)."),
    min_similarity: z
      .number()
      .min(0)
      .max(1)
      .optional()
      .default(0.5)
      .describe("Minimum cosine similarity threshold (default 0.5)."),
    entity_types: z
      .array(z.enum(ENTITY_TYPES))
      .optional()
      .describe("Restrict results to these entity types."),
    expand: z
      .boolean()
      .optional()
      .default(true)
      .describe("If true (default), also return the immediate graph neighbours of each seed."),
    expand_min_weight: z
      .number()
      .min(0)
      .max(1)
      .optional()
      .default(0.3)
      .describe("Min edge weight when expanding (default 0.3)."),
  },
  annotations: { readOnlyHint: true },

src/mcp-server/index.ts:1287-1323 (schema)

Zod input schema for graph_search: query (string, required), top_k (1-50, default 10), min_similarity (0-1, default 0.5), entity_types (optional array of ENTITY_TYPES), expand (boolean, default true), expand_min_weight (0-1, default 0.3).

inputSchema: {
  query: z
    .string()
    .min(1)
    .describe("Natural-language query (any phrasing — synonyms and paraphrases work)."),
  top_k: z
    .number()
    .int()
    .min(1)
    .max(50)
    .optional()
    .default(10)
    .describe("How many semantically similar entities to retrieve as seeds (default 10)."),
  min_similarity: z
    .number()
    .min(0)
    .max(1)
    .optional()
    .default(0.5)
    .describe("Minimum cosine similarity threshold (default 0.5)."),
  entity_types: z
    .array(z.enum(ENTITY_TYPES))
    .optional()
    .describe("Restrict results to these entity types."),
  expand: z
    .boolean()
    .optional()
    .default(true)
    .describe("If true (default), also return the immediate graph neighbours of each seed."),
  expand_min_weight: z
    .number()
    .min(0)
    .max(1)
    .optional()
    .default(0.3)
    .describe("Min edge weight when expanding (default 0.3)."),
},
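
The bounds in this schema can be mirrored without Zod, which may help when building arguments on the client side. The sketch below is a dependency-free approximation under that assumption: `NumericArgs` and `validateArgs` are hypothetical names, and `entity_types` is not checked because the contents of ENTITY_TYPES are not shown here.

```typescript
// Hypothetical subset of the arguments that carry length or range constraints.
interface NumericArgs {
  query: string;
  top_k?: number;
  min_similarity?: number;
  expand_min_weight?: number;
}

// Returns a list of human-readable problems; an empty list means the input is within bounds.
function validateArgs(args: NumericArgs): string[] {
  const errors: string[] = [];

  // z.string().min(1): at least one character (whitespace still counts).
  if (args.query.length < 1) errors.push("query: must contain at least 1 character");

  // z.number().int().min(1).max(50)
  const k = args.top_k;
  if (k !== undefined && (!Number.isInteger(k) || k < 1 || k > 50)) {
    errors.push("top_k: must be an integer in [1, 50]");
  }

  // z.number().min(0).max(1) for both thresholds.
  const unitFields: Array<[string, number | undefined]> = [
    ["min_similarity", args.min_similarity],
    ["expand_min_weight", args.expand_min_weight],
  ];
  for (const [name, value] of unitFields) {
    if (value !== undefined && !(value >= 0 && value <= 1)) {
      errors.push(`${name}: must be in [0, 1]`);
    }
  }
  return errors;
}

console.log(validateArgs({ query: "", top_k: 51, min_similarity: 1.5 }));
```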

src/shared/embeddings.ts:39-46 (helper)

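The embedText() helper used by graph_search to embed the natural-language query string into a 384-dim vector using bge-small-en-v1.5 (via the @huggingface/transformers pipeline) with mean pooling and normalization. Returns a number array; empty or whitespace-only input yields an all-zero vector without calling the model.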

export async function embedText(text: string): Promise<number[]> {
  const cleaned = text.trim();
  if (!cleaned) return new Array<number>(EMBEDDING_DIM).fill(0);
  const e = await getEmbedder();
  const result = await e(cleaned, { pooling: "mean", normalize: true });
  // result.data is a Float32Array of length 384
  return Array.from(result.data as Float32Array);
}
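
Because the helper requests `normalize: true`, the returned vectors should have unit length, and for unit vectors cosine similarity reduces to a plain dot product. The sketch below demonstrates that with tiny 3-dim vectors rather than real 384-dim embeddings. The `dot`, `l2normalize`, and `cosine` helpers are hypothetical and written only for this illustration; how the score reported by the Neo4j index relates to this raw cosine value depends on the index configuration, which is not shown here.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Scale a vector to unit L2 norm; a zero vector is returned unchanged.
function l2normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Cosine similarity on raw vectors. Cosine is undefined for a zero vector
// (such as the one embedText returns for empty input); this sketch returns 0
// in that case, which is a convention chosen here, not taken from the source.
function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

const rawQuery = [1, 2, 2];
const rawDoc = [2, 1, 2];

// After normalization, the dot product equals the cosine of the raw vectors.
const viaDot = dot(l2normalize(rawQuery), l2normalize(rawDoc));
const viaCosine = cosine(rawQuery, rawDoc);
console.log(viaDot, viaCosine);
```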

src/shared/neo4j-client.ts:1765-1808 (helper)

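The vectorSearch() method on Neo4jClient called by graph_search. It over-requests from the 'entity_embedding' vector index (max(top_k × 4, 40) candidates), then filters the candidates by tenant, minimum similarity, and optional entity types, and returns up to top_k entities with their similarity scores.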

async vectorSearch(
  tenantId: string,
  queryEmbedding: number[],
  options: { top_k?: number; min_similarity?: number; entity_types?: EntityType[] } = {},
): Promise<Array<{ id: string; name: string; type: string; score: number; confidence: number }>> {
  const topK = options.top_k ?? 10;
  const minSim = options.min_similarity ?? 0.5;
  const candidatePool = Math.max(topK * 4, 40); // over-request, then filter by tenant

  const typeFilter = options.entity_types && options.entity_types.length > 0
    ? `AND ANY(l IN labels(node) WHERE l IN $types)`
    : "";

  const rows = await this.run(
    `
    CALL db.index.vector.queryNodes('entity_embedding', $candidatePool, $queryEmbedding)
    YIELD node, score
    WHERE node.tenant_id = $tenantId AND score >= $minSim ${typeFilter}
    RETURN node.id AS id,
           node.name AS name,
           [l IN labels(node) WHERE l <> 'Entity'][0] AS type,
           node.confidence AS confidence,
           score
    ORDER BY score DESC
    LIMIT $topK
    `,
    {
      tenantId,
      candidatePool,
      topK,
      queryEmbedding,
      minSim,
      ...(options.entity_types && options.entity_types.length > 0 && { types: options.entity_types }),
    },
  );

  return rows.map((r) => ({
    id: String(r["id"]),
    name: String(r["name"] ?? ""),
    type: String(r["type"] ?? "?"),
    confidence: Number(r["confidence"] ?? 0),
    score: Number(r["score"] ?? 0),
  }));
}
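
The over-request-then-filter strategy can be modelled in memory to see how it behaves. The sketch below is a simplified stand-in, not the Neo4j query itself: `Candidate`, `candidatePoolSize`, and `filterCandidates` are hypothetical names, the input list is assumed to be already ranked by score as an index would return it, and the data is made up. One consequence it illustrates is that filtering happens after the pool is fetched, so a pool dominated by other tenants can leave fewer than top_k results.

```typescript
interface Candidate { id: string; tenant_id: string; score: number }

// Pool size as computed in vectorSearch above.
function candidatePoolSize(topK: number): number {
  return Math.max(topK * 4, 40);
}

// In-memory stand-in for: index query, then WHERE, ORDER BY, LIMIT.
function filterCandidates(
  ranked: Candidate[],
  tenantId: string,
  topK: number,
  minSim: number,
): Candidate[] {
  return ranked
    .slice(0, candidatePoolSize(topK)) // what the index would hand back
    .filter((c) => c.tenant_id === tenantId && c.score >= minSim)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Mixed tenants: the filter keeps "mine" rows above the threshold, best first.
const mixed: Candidate[] = [
  { id: "a", tenant_id: "mine", score: 0.9 },
  { id: "b", tenant_id: "other", score: 0.85 },
  { id: "c", tenant_id: "mine", score: 0.7 },
  { id: "e", tenant_id: "mine", score: 0.6 },
  { id: "d", tenant_id: "mine", score: 0.4 },
];
const hits = filterCandidates(mixed, "mine", 2, 0.5);

// Crowded pool: 40 higher-scoring rows from another tenant fill the pool,
// so the matching "mine" row at rank 41 is never seen.
const crowded: Candidate[] = [
  ...Array.from({ length: 40 }, (_, i) => ({
    id: `other-${i}`,
    tenant_id: "other",
    score: 0.9 - i * 0.001,
  })),
  { id: "mine-1", tenant_id: "mine", score: 0.8 },
];
const crowdedOut = filterCandidates(crowded, "mine", 2, 0.5);

console.log(hits.map((h) => h.id), crowdedOut.length);
```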
