Validate Graph Entities

graph_validate
Read-only

Scan recently extracted entities and edges for quality issues: generic names, type mismatches, near-duplicates, and extreme confidence. Catch bad data before it settles into the graph.

Instructions

Scan recently extracted entities and edges for quality issues: generic names, reference language, type mismatches, near-duplicate names, and extreme confidence values. Call this after a dream process extraction batch to catch bad data before it settles into the graph. Returns up to max_issues records of shape {entity_id, name, type, issue, severity} where severity is high/medium/low. Read-only — pair with graph_delete or graph_unmerge to act on flagged items.
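For reference, a single flagged record with the shape described above might look like this (the entity values are illustrative, not taken from a real graph):

```typescript
type Severity = "high" | "medium" | "low";

// Shape of one flagged record, as stated in the tool description.
interface ValidationIssue {
  entity_id: string;
  name: string;
  type: string;
  issue: string;
  severity: Severity;
}

// Illustrative example (all values are made up):
const flagged: ValidationIssue = {
  entity_id: "ent_42",
  name: "thing",
  type: "Concept",
  issue: 'generic blocklisted name "thing"',
  severity: "high",
};

console.log(flagged.severity); // → high
```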

Input Schema

Name             Required  Default  Description
source_session   No        (none)   Limit checks to entities extracted in this session. Omit to scan the whole graph.
max_issues       No        50       Maximum number of issues to return.
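A minimal example of tool-call arguments scoped to one extraction session (the session id below is made up for illustration):

```typescript
// Illustrative arguments for graph_validate; omit source_session
// to scan the whole graph, and max_issues defaults to 50.
const args = {
  source_session: "sess_2024_06_01", // hypothetical session id
  max_issues: 25,
};

console.log(JSON.stringify(args));
```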

Implementation Reference

  • Registration of the 'graph_validate' tool via server.registerTool() with inputSchema, annotations, and handler.
    server.registerTool("graph_validate", {
      title: "Validate Graph Entities",
      description:
        "Scan recently extracted entities and edges for quality issues: generic names, reference language, " +
        "type mismatches, near-duplicate names, and extreme confidence values. " +
        "Call this after a dream process extraction batch to catch bad data before it settles into the graph. " +
        "Returns up to `max_issues` records of shape `{entity_id, name, type, issue, severity}` where severity is high/medium/low. " +
        "Read-only — pair with graph_delete or graph_unmerge to act on flagged items.",
      inputSchema: {
        source_session: z
          .string()
          .optional()
          .describe("Limit checks to entities extracted in this session. Omit to scan the whole graph."),
        max_issues: z
          .number()
          .int()
          .min(1)
          .max(200)
          .optional()
          .default(50)
          .describe("Maximum number of issues to return (default 50)."),
      },
      annotations: { readOnlyHint: true },
    }, async ({ source_session, max_issues = 50 }) => {
      const issues: Array<{ entity_id: string; name: string; type: string; issue: string; severity: "high" | "medium" | "low" }> = [];
    
      try {
        const tenantId = currentTenant();
        // Session filter: optional additional narrowing within the tenant.
        const sessionAndForOrphan = source_session
          ? `AND (n.source_session = $session OR EXISTS { MATCH (n)-[r]-() WHERE r.source_session = $session })`
          : "";
        const sessionAndForRest = source_session
          ? `AND (n.source_session = $session OR EXISTS { MATCH (n)-[r]-() WHERE r.source_session = $session })`
          : "";
        const params: Record<string, unknown> = source_session
          ? { tenantId, session: source_session }
          : { tenantId };
    
        // 1. Generic / blocklisted names (tenant-scoped)
        const genericRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE 1=1 ${sessionAndForRest}
          WITH n, toLower(trim(n.name)) AS lname
          WHERE size(lname) < 3
             OR lname IN $blocklist
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT $limit
        `, { ...params, blocklist: [...GENERIC_NAME_BLOCKLIST], limit: Math.ceil(max_issues / 4) });
    
        for (const row of genericRows) {
          const name = String(row["name"] ?? "");
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          const lname = name.toLowerCase().trim();
          const reason = lname.length < 3 ? "name too short (< 3 chars)" : `generic blocklisted name "${lname}"`;
          issues.push({ entity_id: String(row["id"]), name, type, issue: reason, severity: "high" });
        }
    
        // 2. Reference-language names (tenant-scoped)
        const allNameRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE 1=1 ${sessionAndForRest}
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT 2000
        `, params);
    
        for (const row of allNameRows) {
          if (issues.length >= max_issues) break;
          const name = String(row["name"] ?? "");
          const lname = name.toLowerCase().trim();
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          for (const prefix of REFERENCE_PREFIXES) {
            if (lname.startsWith(prefix) && lname.length < 40) {
              issues.push({
                entity_id: String(row["id"]),
                name,
                type,
                issue: `name starts with reference language "${prefix.trim()}" — extract the noun instead`,
                severity: "high",
              });
              break;
            }
          }
        }
    
        // 3. Orphaned new entities (tenant-scoped)
        const orphanRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE NOT (n)-[]-()
            AND n.confidence <= 0.4
            AND n.times_mentioned <= 1
            ${sessionAndForOrphan}
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT $limit
        `, { ...params, limit: Math.ceil(max_issues / 4) });
    
        for (const row of orphanRows) {
          if (issues.length >= max_issues) break;
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          issues.push({
            entity_id: String(row["id"]),
            name: String(row["name"]),
            type,
            issue: `isolated entity with no edges and confidence ${Number(row["confidence"] ?? 0).toFixed(2)} — may be a spurious extraction`,
            severity: "low",
          });
        }
    
        // 4. Near-duplicate names (tenant-scoped, case-insensitive)
        const dupRows = await client.runReadQuery(`
          MATCH (a:Entity {tenant_id: $tenantId}), (b:Entity {tenant_id: $tenantId})
          WHERE id(a) < id(b)
            AND toLower(trim(a.name)) = toLower(trim(b.name))
            AND a.id <> b.id
          RETURN a.id AS id_a, a.name AS name_a, labels(a) AS labels_a,
                 b.id AS id_b, b.name AS name_b, labels(b) AS labels_b
          LIMIT $limit
        `, { tenantId, limit: Math.ceil(max_issues / 4) });
    
        for (const row of dupRows) {
          if (issues.length >= max_issues) break;
          const typeA = ((row["labels_a"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          const typeB = ((row["labels_b"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          issues.push({
            entity_id: String(row["id_a"]),
            name: String(row["name_a"]),
            type: typeA,
            issue: `near-duplicate: same name as entity ${row["id_b"]} (${row["name_b"]}, type ${typeB}) — consider merging with graph_relate ALIAS_OF or deleting one`,
            severity: "medium",
          });
        }
    
        const summary = {
          total_issues: issues.length,
          by_severity: {
            high: issues.filter((i) => i.severity === "high").length,
            medium: issues.filter((i) => i.severity === "medium").length,
            low: issues.filter((i) => i.severity === "low").length,
          },
          scope: source_session ? `session:${source_session}` : "full graph",
          issues: issues.slice(0, max_issues),
        };
    
        return toolResult(summary);
      } catch (err) {
        const e = err instanceof Error ? err : new Error(String(err));
        return toolError(`graph_validate failed: ${e.message}`);
      }
    });
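One detail worth noting in the handler above: each of the four checks caps its own query with `Math.ceil(max_issues / 4)`, so no single check can consume the whole budget. A standalone sketch of that split:

```typescript
// Per-check query budget as used in the handler above:
// each of the four checks gets roughly a quarter of max_issues.
function perCheckLimit(maxIssues: number): number {
  return Math.ceil(maxIssues / 4);
}

console.log(perCheckLimit(50)); // → 13
// Four checks of 13 can overshoot 50 in total, which is why the
// final summary trims the list with issues.slice(0, max_issues).
```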
  • Supporting constants: GENERIC_NAME_BLOCKLIST (set of generic terms that should never be entity names) and REFERENCE_PREFIXES (prefixes indicating reference language).
    const GENERIC_NAME_BLOCKLIST = new Set([
      "it", "this", "that", "the", "a", "an", "some", "thing", "things",
      "item", "items", "something", "anything", "everything", "nothing",
      "one", "other", "another", "each", "all", "both", "they", "them",
      "we", "i", "you", "he", "she", "data", "info", "information",
      "here", "there", "now", "then", "later", "unknown", "various",
      "server", "client", "system", "process", "service", "tool",
    ]);
    
    // Prefixes that indicate reference language rather than entity names
    const REFERENCE_PREFIXES = ["the ", "this ", "that ", "a ", "an ", "my ", "our ", "your ", "their "];
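Taken together, these constants drive the first two checks. A condensed, standalone sketch of that name-screening logic (the blocklist here is abbreviated; the registration above holds the full set):

```typescript
// Abbreviated copies of the constants above, for a self-contained sketch.
const GENERIC_NAME_BLOCKLIST = new Set(["it", "this", "thing", "data", "server"]);
const REFERENCE_PREFIXES = ["the ", "this ", "a ", "my "];

// Returns an issue string if the name is problematic, or null if it passes.
function checkName(name: string): string | null {
  const lname = name.toLowerCase().trim();
  if (lname.length < 3) return "name too short (< 3 chars)";
  if (GENERIC_NAME_BLOCKLIST.has(lname)) return `generic blocklisted name "${lname}"`;
  for (const prefix of REFERENCE_PREFIXES) {
    // Short names with a determiner prefix are reference language, not entities.
    if (lname.startsWith(prefix) && lname.length < 40) {
      return `name starts with reference language "${prefix.trim()}"`;
    }
  }
  return null;
}

console.log(checkName("it"));              // → name too short (< 3 chars)
console.log(checkName("the server room")); // → name starts with reference language "the"
console.log(checkName("PostgreSQL"));      // → null
```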
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the readOnlyHint annotation, the description adds details about return shape, severity levels, and the types of issues checked, enhancing the agent's understanding of behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description front-loads its purpose and wastes no words, efficiently conveying purpose, usage, output, and related tools.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with only two parameters, no output schema, and a read-only annotation, the description fully covers the purpose, usage context, return shape, and companion tools, making it complete for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters have adequate descriptions in the schema. The tool description does not add significant new meaning beyond the schema (e.g., explaining source_session as limiting to a session), so a baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it scans for specific quality issues (generic names, reference language, type mismatches, etc.) and distinguishes from siblings like graph_delete and graph_unmerge by noting it is read-only.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clearly advises calling after a dream process extraction batch and suggests pairing with graph_delete or graph_unmerge to act on flagged items, providing explicit when-to-use and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
