openmetadata-mcp-server

by us-all

semantic-search

Search OpenMetadata entities with natural language queries. Uses vector embeddings to find relevant tables, dashboards, and other metadata filtered by type, owner, tags, domains, tier, or service.

Instructions

Natural-language semantic search over OpenMetadata entities using vector embeddings (requires OM 1.12+ with semantic search enabled)

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Natural language search text. Example: 'customer demographics purchase history' | |
| size | No | Number of distinct entities to return (max 100) | 10 |
| k | No | KNN parameter: number of nearest neighbors to consider (max 10,000) | 500 |
| threshold | No | Minimum similarity score (0.0–1.0) to include in results | 0.0 |
| entityType | No | Filter by entity types. Example: ['table','dashboard'] | |
| owners | No | Filter by owner names | |
| tags | No | Filter by tag FQNs. Example: ['PII.Sensitive'] | |
| domains | No | Filter by domain names | |
| tier | No | Filter by tier. Example: ['Tier.Tier1'] | |
| serviceType | No | Filter by service type. Example: ['Postgres'] | |
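
As a hedged illustration of the parameters above, the following TypeScript sketch models the tool's arguments and checks them against the documented limits. The `SemanticSearchArgs` interface and the `checkArgs` helper are hypothetical names introduced for this example; they are not part of the server's code.

```typescript
// Hypothetical type mirroring the documented input parameters.
interface SemanticSearchArgs {
  query: string;
  size?: number;
  k?: number;
  threshold?: number;
  entityType?: string[];
  owners?: string[];
  tags?: string[];
  domains?: string[];
  tier?: string[];
  serviceType?: string[];
}

// Hypothetical helper: lists every documented limit that the arguments exceed.
function checkArgs(args: SemanticSearchArgs): string[] {
  const problems: string[] = [];
  if (args.size !== undefined && args.size > 100) {
    problems.push("size exceeds the documented maximum of 100");
  }
  if (args.k !== undefined && args.k > 10000) {
    problems.push("k exceeds the documented maximum of 10,000");
  }
  if (args.threshold !== undefined && (args.threshold < 0 || args.threshold > 1)) {
    problems.push("threshold is outside the documented range 0.0 to 1.0");
  }
  return problems;
}

// Example arguments built from the sample values in the schema descriptions.
const exampleArgs: SemanticSearchArgs = {
  query: "customer demographics purchase history",
  size: 5,
  threshold: 0.5,
  entityType: ["table", "dashboard"],
  tags: ["PII.Sensitive"],
};

console.log(checkArgs(exampleArgs));
```

The handler listed under Implementation Reference clamps `size` and `k` instead of rejecting them, so a check like this is optional and mainly useful for surfacing mistakes early.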

Implementation Reference

  • The main handler function for the semantic-search tool. It POSTs a query to /search/vector/query on the OpenMetadata API with optional filters for entityType, owners, tags, domains, tier, and serviceType.
    export async function semanticSearch(params: z.infer<typeof semanticSearchSchema>) {
      const filters: Record<string, string[]> = {};
      if (params.entityType?.length) filters.entityType = params.entityType;
      if (params.owners?.length) filters.owners = params.owners;
      if (params.tags?.length) filters.tags = params.tags;
      if (params.domains?.length) filters.domains = params.domains;
      if (params.tier?.length) filters.tier = params.tier;
      if (params.serviceType?.length) filters.serviceType = params.serviceType;
    
      return omClient.post("/search/vector/query", {
        query: params.query,
        size: Math.min(params.size ?? 10, 100),
        k: Math.min(params.k ?? 500, 10000),
        threshold: params.threshold ?? 0.0,
        filters: Object.keys(filters).length > 0 ? filters : undefined,
      });
    }
  • Zod schema defining the input parameters for semantic-search: query (string), size (default 10), k (default 500), threshold (default 0.0), and optional array filters: entityType, owners, tags, domains, tier, serviceType.
    export const semanticSearchSchema = z.object({
      query: z.string().describe("Natural language search text. Example: 'customer demographics purchase history'"),
      size: z.coerce.number().optional().default(10).describe("Number of distinct entities to return (max 100)"),
      k: z.coerce.number().optional().default(500).describe("KNN parameter — number of nearest neighbors to consider (max 10,000)"),
      threshold: z.coerce.number().optional().default(0.0).describe("Minimum similarity score (0.0–1.0) to include in results"),
      entityType: z.array(z.string()).optional().describe("Filter by entity types. Example: ['table','dashboard']"),
      owners: z.array(z.string()).optional().describe("Filter by owner names"),
      tags: z.array(z.string()).optional().describe("Filter by tag FQNs. Example: ['PII.Sensitive']"),
      domains: z.array(z.string()).optional().describe("Filter by domain names"),
      tier: z.array(z.string()).optional().describe("Filter by tier. Example: ['Tier.Tier1']"),
      serviceType: z.array(z.string()).optional().describe("Filter by service type. Example: ['Postgres']"),
    });
  • src/index.ts:171-171 (registration)
    Registration of the 'semantic-search' tool with the MCP server under the 'search' category, using semanticSearchSchema and wrapped semanticSearch handler.
    tool("semantic-search", "Natural-language semantic search over OpenMetadata entities using vector embeddings (requires OM 1.12+ with semantic search enabled)", semanticSearchSchema.shape, wrapToolHandler(semanticSearch));
  • The 'search' category declaration in the tool registry, listing semantic-search alongside search-metadata and suggest-metadata.
    "search",          // search-metadata, suggest-metadata, semantic-search
  • wrapToolHandler utility used to wrap the semanticSearch handler for error handling and redaction.
    export const wrapToolHandler = createWrapToolHandler({
      redactionPatterns: [/OPENMETADATA_TOKEN/i],
      errorExtractors: [
        {
          match: (error) => error instanceof WriteBlockedError,
          extract: (error) => ({
            kind: "passthrough",
            text: (error as WriteBlockedError).message,
          }),
        },
        {
          match: (error) => error instanceof OpenMetadataError,
          extract: (error) => {
            const err = error as OpenMetadataError;
            return {
              kind: "structured",
              data: {
                message: err.message,
                status: err.status,
                details: err.body,
              },
            };
          },
        },
      ],
    });
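
The clamping and filter handling in the `semanticSearch` handler can be tried without a network call. The sketch below re-creates only the request-body construction as a pure function; `SearchParams`, `VectorQueryBody`, and `buildVectorQueryBody` are hypothetical names for illustration, and the real handler passes the body to `omClient.post` instead of returning it.

```typescript
// Hypothetical parameter type; the real handler infers it from the Zod schema.
interface SearchParams {
  query: string;
  size?: number;
  k?: number;
  threshold?: number;
  entityType?: string[];
  owners?: string[];
  tags?: string[];
  domains?: string[];
  tier?: string[];
  serviceType?: string[];
}

// Hypothetical shape of the body sent to /search/vector/query.
interface VectorQueryBody {
  query: string;
  size: number;
  k: number;
  threshold: number;
  filters?: Record<string, string[]>;
}

const FILTER_KEYS = ["entityType", "owners", "tags", "domains", "tier", "serviceType"] as const;

function buildVectorQueryBody(params: SearchParams): VectorQueryBody {
  const filters: Record<string, string[]> = {};
  for (const key of FILTER_KEYS) {
    const values = params[key];
    // Empty or missing arrays are skipped, as in the original handler.
    if (values?.length) filters[key] = values;
  }
  return {
    query: params.query,
    size: Math.min(params.size ?? 10, 100),   // default 10, capped at 100
    k: Math.min(params.k ?? 500, 10000),      // default 500, capped at 10,000
    threshold: params.threshold ?? 0.0,       // default 0.0, not clamped
    filters: Object.keys(filters).length > 0 ? filters : undefined,
  };
}

const body = buildVectorQueryBody({
  query: "orders",
  size: 500,
  owners: [],
  tags: ["PII.Sensitive"],
});
console.log(JSON.stringify(body));
```

Two details stand out: only upper bounds are enforced on `size` and `k`, and `threshold` is passed through as given. When the body is serialized with `JSON.stringify`, an undefined `filters` key is omitted from the output.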
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It mentions 'search' and 'vector embeddings' but does not disclose whether the operation is read-only, has side effects, or requires specific permissions. The omission matters: the tool most likely performs read-only queries, but an agent cannot confirm that from the description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys essential information (what it does, how it works, prerequisites) without extraneous text. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 10 parameters, no output schema, and no annotations, the description is too brief. It does not explain what the output looks like (e.g., list of entities with scores), how filters interact, or limitations. For a search tool with many filtering options, this is incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers all parameters with descriptions (100% coverage), so the baseline is 3. The description adds no extra detail about parameters beyond the schema; it only describes the overall functionality. Minimal added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool performs natural-language semantic search using vector embeddings, explicitly distinct from sibling search tools like search-metadata or suggest-metadata. It mentions required version (OM 1.12+) and feature enablement, leaving no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the prerequisite environment (OM 1.12+, semantic search enabled) but does not provide guidance on when NOT to use this tool versus alternatives. No explicit exclusions or comparisons to other search methods are given, making it adequate but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/us-all/openmetadata-mcp-server'
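
The same request can be sketched in TypeScript. This is a minimal example under stated assumptions: the `{owner}/{server}` path pattern is inferred from the single URL above, the response is assumed to be JSON, and `serverUrl` and `fetchServerInfo` are hypothetical helper names. It relies on the global `fetch` available in Node 18+ and in browsers.

```typescript
// Base path taken from the curl example; the {owner}/{server} pattern is an assumption.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the directory URL for one server.
function serverUrl(owner: string, server: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(server)}`;
}

// Hypothetical helper: fetches the server entry, assuming a JSON response body.
async function fetchServerInfo(owner: string, server: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, server));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl("us-all", "openmetadata-mcp-server"));
```

The result is typed as `unknown` because the response shape is not documented here; callers should narrow it before use.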

If you have feedback or need assistance with the MCP directory API, please join our Discord server.