Validate Graph Entities

graph_validate
Read-only

Scan recently extracted entities and edges for quality issues like generic names, type mismatches, and near-duplicates. Returns issues with severity ratings to help catch bad data before it enters the graph.

Instructions

Scan recently extracted entities and edges for quality issues: generic names, reference language, type mismatches, near-duplicate names, and extreme confidence values. Call this after a dream process extraction batch to catch bad data before it settles into the graph. Returns up to max_issues records of shape {entity_id, name, type, issue, severity} where severity is high/medium/low. Read-only — pair with graph_delete or graph_unmerge to act on flagged items.
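The record shape described above can be written down as a type, which makes it easier to triage results by severity before deciding what to delete or unmerge. The following is a minimal sketch: `ValidationIssue`, `groupBySeverity`, and the sample records are illustrative names and data, not part of the server.

```typescript
// Shape of one issue record, as described in the tool instructions.
type Severity = "high" | "medium" | "low";

interface ValidationIssue {
  entity_id: string;
  name: string;
  type: string;
  issue: string;
  severity: Severity;
}

// Hypothetical helper: bucket issues by severity so that high-severity
// items can be reviewed first.
function groupBySeverity(issues: ValidationIssue[]): Record<Severity, ValidationIssue[]> {
  const groups: Record<Severity, ValidationIssue[]> = { high: [], medium: [], low: [] };
  for (const issue of issues) {
    groups[issue.severity].push(issue);
  }
  return groups;
}

// Illustrative records only; the IDs, names, and types are made up.
const sampleIssues: ValidationIssue[] = [
  { entity_id: "e1", name: "it", type: "Concept", issue: "name too short (< 3 chars)", severity: "high" },
  { entity_id: "e2", name: "Acme", type: "Org", issue: "near-duplicate: same name as entity e3", severity: "medium" },
];

const grouped = groupBySeverity(sampleIssues);
```

Because the tool is read-only, nothing changes in the graph until a flagged `entity_id` is passed to a separate tool such as graph_delete or graph_unmerge.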

Input Schema

Name | Required | Description | Default
source_session | No | Limit checks to entities extracted in this session. Omit to scan the whole graph. | (none)
max_issues | No | Maximum number of issues to return (default 50). | 50
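As a rough illustration of how a caller might prepare arguments, the sketch below mirrors the schema constraints in plain TypeScript. The 1 to 200 integer range comes from the Zod schema shown in the Implementation Reference; `GraphValidateArgs`, `normalizeArgs`, and the session ID are hypothetical names used only for this example.

```typescript
// Arguments accepted by graph_validate, per the input schema.
interface GraphValidateArgs {
  source_session?: string;
  max_issues?: number;
}

// Hypothetical client-side check mirroring the schema constraints:
// max_issues must be an integer from 1 to 200 and defaults to 50.
function normalizeArgs(args: GraphValidateArgs): { source_session?: string; max_issues: number } {
  const max_issues = args.max_issues ?? 50;
  if (!Number.isInteger(max_issues) || max_issues < 1 || max_issues > 200) {
    throw new RangeError(`max_issues must be an integer from 1 to 200, got ${max_issues}`);
  }
  // Leave source_session out entirely when it is not set, which scans the whole graph.
  return args.source_session === undefined
    ? { max_issues }
    : { source_session: args.source_session, max_issues };
}

// Whole-graph scan with the default limit.
const wholeGraph = normalizeArgs({});

// Scan limited to one session; "session-123" is a made-up ID.
const oneSession = normalizeArgs({ source_session: "session-123", max_issues: 20 });
```

The server validates these values itself, so a check like this is only useful for failing early on the client side.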

Implementation Reference

  • Tool registration for graph_validate. Defines title, description, inputSchema (Zod-based), and readOnlyHint annotation. The handler is the async function starting at line 836.
    // ─── Tool: graph_validate ───
    
    // Single-word generic terms that should never be entity names
    const GENERIC_NAME_BLOCKLIST = new Set([
      "it", "this", "that", "the", "a", "an", "some", "thing", "things",
      "item", "items", "something", "anything", "everything", "nothing",
      "one", "other", "another", "each", "all", "both", "they", "them",
      "we", "i", "you", "he", "she", "data", "info", "information",
      "here", "there", "now", "then", "later", "unknown", "various",
      "server", "client", "system", "process", "service", "tool",
    ]);
    
    // Prefixes that indicate reference language rather than entity names
    const REFERENCE_PREFIXES = ["the ", "this ", "that ", "a ", "an ", "my ", "our ", "your ", "their "];
    
    server.registerTool("graph_validate", {
      title: "Validate Graph Entities",
      description:
        "Scan recently extracted entities and edges for quality issues: generic names, reference language, " +
        "type mismatches, near-duplicate names, and extreme confidence values. " +
        "Call this after a dream process extraction batch to catch bad data before it settles into the graph. " +
        "Returns up to `max_issues` records of shape `{entity_id, name, type, issue, severity}` where severity is high/medium/low. " +
        "Read-only — pair with graph_delete or graph_unmerge to act on flagged items.",
      inputSchema: {
        source_session: z
          .string()
          .optional()
          .describe("Limit checks to entities extracted in this session. Omit to scan the whole graph."),
        max_issues: z
          .number()
          .int()
          .min(1)
          .max(200)
          .optional()
          .default(50)
          .describe("Maximum number of issues to return (default 50)."),
      },
      annotations: { readOnlyHint: true },
    }, async ({ source_session, max_issues = 50 }) => {
  • GENERIC_NAME_BLOCKLIST constant — a Set of single-word generic terms that should never be entity names (e.g., 'it', 'this', 'data', 'unknown'). Used by the graph_validate handler to flag low-quality entities.
    // Single-word generic terms that should never be entity names
    const GENERIC_NAME_BLOCKLIST = new Set([
      "it", "this", "that", "the", "a", "an", "some", "thing", "things",
      "item", "items", "something", "anything", "everything", "nothing",
      "one", "other", "another", "each", "all", "both", "they", "them",
      "we", "i", "you", "he", "she", "data", "info", "information",
      "here", "there", "now", "then", "later", "unknown", "various",
      "server", "client", "system", "process", "service", "tool",
    ]);
  • REFERENCE_PREFIXES constant — prefixes that indicate reference language rather than entity names (e.g., 'the ', 'this ', 'a '). Used by the graph_validate handler to flag poorly named entities.
    // Prefixes that indicate reference language rather than entity names
    const REFERENCE_PREFIXES = ["the ", "this ", "that ", "a ", "an ", "my ", "our ", "your ", "their "];
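  • Main handler for the graph_validate tool. Runs four quality checks via Cypher queries: (1) generic or blocklisted names, including names shorter than 3 characters; (2) reference-language names; (3) orphaned new entities with low confidence (no edges, confidence <= 0.4, mentioned at most once); (4) near-duplicate names, matched case-insensitively after trimming. Checks 1, 3, and 4 are each capped at ceil(max_issues / 4) rows. Returns a summary object containing the total_issues count, a by_severity breakdown, the scope, and the issues array, where each issue has entity_id, name, type, issue description, and severity (high/medium/low). The excerpt shown here does not include a type-mismatch check, although the tool description lists one.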
    }, async ({ source_session, max_issues = 50 }) => {
      const issues: Array<{ entity_id: string; name: string; type: string; issue: string; severity: "high" | "medium" | "low" }> = [];
    
      try {
        const tenantId = currentTenant();
        // Session filter: optional additional narrowing within the tenant.
        const sessionAndForOrphan = source_session
          ? `AND (n.source_session = $session OR EXISTS { MATCH (n)-[r]-() WHERE r.source_session = $session })`
          : "";
        const sessionAndForRest = source_session
          ? `AND (n.source_session = $session OR EXISTS { MATCH (n)-[r]-() WHERE r.source_session = $session })`
          : "";
        const params: Record<string, unknown> = source_session
          ? { tenantId, session: source_session }
          : { tenantId };
    
        // 1. Generic / blocklisted names (tenant-scoped)
        const genericRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE 1=1 ${sessionAndForRest}
          WITH n, toLower(trim(n.name)) AS lname
          WHERE size(lname) < 3
             OR lname IN $blocklist
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT $limit
        `, { ...params, blocklist: [...GENERIC_NAME_BLOCKLIST], limit: Math.ceil(max_issues / 4) });
    
        for (const row of genericRows) {
          const name = String(row["name"] ?? "");
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          const lname = name.toLowerCase().trim();
          const reason = lname.length < 3 ? "name too short (< 3 chars)" : `generic blocklisted name "${lname}"`;
          issues.push({ entity_id: String(row["id"]), name, type, issue: reason, severity: "high" });
        }
    
        // 2. Reference-language names (tenant-scoped)
        const allNameRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE 1=1 ${sessionAndForRest}
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT 2000
        `, params);
    
        for (const row of allNameRows) {
          if (issues.length >= max_issues) break;
          const name = String(row["name"] ?? "");
          const lname = name.toLowerCase().trim();
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          for (const prefix of REFERENCE_PREFIXES) {
            if (lname.startsWith(prefix) && lname.length < 40) {
              issues.push({
                entity_id: String(row["id"]),
                name,
                type,
                issue: `name starts with reference language "${prefix.trim()}" — extract the noun instead`,
                severity: "high",
              });
              break;
            }
          }
        }
    
        // 3. Orphaned new entities (tenant-scoped)
        const orphanRows = await client.runReadQuery(`
          MATCH (n:Entity {tenant_id: $tenantId})
          WHERE NOT (n)-[]-()
            AND n.confidence <= 0.4
            AND n.times_mentioned <= 1
            ${sessionAndForOrphan}
          RETURN n.id AS id, n.name AS name, labels(n) AS labels, n.confidence AS confidence
          LIMIT $limit
        `, { ...params, limit: Math.ceil(max_issues / 4) });
    
        for (const row of orphanRows) {
          if (issues.length >= max_issues) break;
          const type = ((row["labels"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          issues.push({
            entity_id: String(row["id"]),
            name: String(row["name"]),
            type,
            issue: `isolated entity with no edges and confidence ${Number(row["confidence"] ?? 0).toFixed(2)} — may be a spurious extraction`,
            severity: "low",
          });
        }
    
        // 4. Near-duplicate names (tenant-scoped, case-insensitive)
        const dupRows = await client.runReadQuery(`
          MATCH (a:Entity {tenant_id: $tenantId}), (b:Entity {tenant_id: $tenantId})
          WHERE id(a) < id(b)
            AND toLower(trim(a.name)) = toLower(trim(b.name))
            AND a.id <> b.id
          RETURN a.id AS id_a, a.name AS name_a, labels(a) AS labels_a,
                 b.id AS id_b, b.name AS name_b, labels(b) AS labels_b
          LIMIT $limit
        `, { tenantId, limit: Math.ceil(max_issues / 4) });
    
        for (const row of dupRows) {
          if (issues.length >= max_issues) break;
          const typeA = ((row["labels_a"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          const typeB = ((row["labels_b"] as string[]) ?? []).find((l) => l !== "Entity") ?? "?";
          issues.push({
            entity_id: String(row["id_a"]),
            name: String(row["name_a"]),
            type: typeA,
            issue: `near-duplicate: same name as entity ${row["id_b"]} (${row["name_b"]}, type ${typeB}) — consider merging with graph_relate ALIAS_OF or deleting one`,
            severity: "medium",
          });
        }
    
        const summary = {
          total_issues: issues.length,
          by_severity: {
            high: issues.filter((i) => i.severity === "high").length,
            medium: issues.filter((i) => i.severity === "medium").length,
            low: issues.filter((i) => i.severity === "low").length,
          },
          scope: source_session ? `session:${source_session}` : "full graph",
          issues: issues.slice(0, max_issues),
        };
    
        return toolResult(summary);
      } catch (err) {
        const e = err instanceof Error ? err : new Error(String(err));
        return toolError(`graph_validate failed: ${e.message}`);
      }
    });
  • The slugify helper function used by graph_validate indirectly (for entity ID generation).
    const slugify = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
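The name checks in the handler can be tried out without a database. The sketch below reimplements checks 1, 2, and 4 as pure functions over in-memory data, assuming JavaScript's `toLowerCase` and `trim` behave like Cypher's `toLower` and `trim` for the names involved. `classifyName`, `findDuplicateNames`, and the shortened `BLOCKLIST` are hypothetical stand-ins, not code from the server.

```typescript
// Abbreviated copies of the two constants, for illustration only.
const BLOCKLIST = new Set(["it", "this", "data", "unknown", "server"]);
const PREFIXES = ["the ", "this ", "that ", "a ", "an ", "my ", "our ", "your ", "their "];

// Returns the first reason a name would be flagged, or null if it passes
// the short-name, blocklist, and reference-prefix checks.
function classifyName(name: string): string | null {
  const lname = name.toLowerCase().trim();
  if (lname.length < 3) return "name too short (< 3 chars)";
  if (BLOCKLIST.has(lname)) return `generic blocklisted name "${lname}"`;
  // Same condition as the handler: a known prefix and fewer than 40 characters.
  const prefix = PREFIXES.find((p) => lname.startsWith(p) && lname.length < 40);
  if (prefix !== undefined) return `name starts with reference language "${prefix.trim()}"`;
  return null;
}

// Groups entity IDs whose names are equal after lowercasing and trimming,
// which is the comparison the near-duplicate query uses.
function findDuplicateNames(entities: Array<{ id: string; name: string }>): string[][] {
  const byName = new Map<string, string[]>();
  for (const entity of entities) {
    const key = entity.name.toLowerCase().trim();
    const ids = byName.get(key) ?? [];
    ids.push(entity.id);
    byName.set(key, ids);
  }
  return Array.from(byName.values()).filter((ids) => ids.length > 1);
}
```

Note that the duplicate check only catches names that are identical once case and surrounding whitespace are ignored; spelling variants such as "Acme Inc" and "Acme Inc." would not be grouped.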
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only nature consistent with annotations, describes return shape (entity_id, name, type, issue, severity) and severity levels. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise paragraph with clear lead sentence. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage timing, return format, severity levels, and how to act on results. No output schema but description compensates fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameters are fully described in the schema (100% coverage). The description restates max_issues behavior but adds no new semantic info beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates graph entities/edges for quality issues, listing specific checks (generic names, type mismatches). It distinguishes from siblings by focusing on post-extraction quality validation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Call this after a dream process extraction batch', giving clear context for use. It also mentions pairing with graph_delete/graph_unmerge for acting on issues, though it doesn't explicitly state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/stevepridemore/graph-memory'
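The same request can be made from TypeScript. This is a minimal sketch that assumes the endpoint returns JSON and that a global `fetch` is available (for example Node 18 or later, or a browser); `serverUrl` and `fetchServerInfo` are hypothetical helper names, and the response is typed as `unknown` because its schema is not described here.

```typescript
// Base URL taken from the curl example above.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: build the URL for one server entry.
function serverUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch the entry and return the parsed body.
// Assumption: the endpoint responds with JSON.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`MCP directory API returned HTTP ${response.status}`);
  }
  return response.json();
}

// URL for the server that provides graph_validate.
const graphMemoryUrl = serverUrl("stevepridemore", "graph-memory");
```

Calling `fetchServerInfo("stevepridemore", "graph-memory")` issues the same GET request as the curl command.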

If you have feedback or need assistance with the MCP directory API, please join our Discord server