us-all
by us-all

failed-tests-summary

Aggregate dbt failed tests and data quality check failures by dataset, with the most recent failing rows. Replaces multiple tool calls for faster troubleshooting.

Instructions

Aggregated 24h-ish view: dbt failed tests + DQ checks failures grouped by dataset + most recent failing rows. Replaces 3+ tool calls (dbt-failed-tests + dq-failed-checks-by-dataset + dq-list-checks).

Input Schema

Name | Required | Description | Default
recentRuns | No | Look at last N dbt runs | 3
sinceHours | No | Recent window for DQ checks | 24

Implementation Reference

  • Main handler function that combines dbt failed tests, DQ check failures by dataset, and the most recent DQ failures into a single summary. It uses aggregate() to run all three data sources in parallel and returns the combined result with caveats. When DQ is not configured, both DQ lookups resolve to null and a caveat is added.
    export async function failedTestsSummary(
      args: z.infer<typeof failedTestsSummarySchema>,
    ): Promise<unknown> {
      const caveats: string[] = [];
      const { dbt, dqByDataset, dqLatest } = await aggregate(
        {
          dbt: () => dbtFailedTests({ recentRuns: args.recentRuns }),
          dqByDataset: () =>
            dqConfigured()
              ? dqFailedChecksByDataset({ sinceHours: args.sinceHours, topN: 20 })
              : Promise.resolve(null),
          dqLatest: () =>
            dqConfigured()
              ? dqListChecks({ sinceHours: args.sinceHours, status: "fail", limit: 50 })
              : Promise.resolve(null),
        },
        caveats,
      );
      if (!dqConfigured()) caveats.push("DQ_RESULTS_TABLE not configured — quality category skipped");
      return {
        window: { recentRuns: args.recentRuns, sinceHours: args.sinceHours },
        dbtFailures: dbt,
        dqFailuresByDataset: dqByDataset,
        dqRecentFailures: dqLatest,
        caveats,
      };
    }
  • Zod schema defining input parameters: recentRuns (N dbt runs, 1-20, default 3) and sinceHours (DQ window, 1-720, default 24).
    export const failedTestsSummarySchema = z.object({
      recentRuns: z.coerce.number().int().min(1).max(20).default(3).describe("Look at last N dbt runs"),
      sinceHours: z.coerce.number().int().min(1).max(720).default(24).describe("Recent window for DQ checks"),
    });
  • src/index.ts:110-110 (registration)
    Registration call that binds the tool name 'failed-tests-summary' with its description, schema, and handler on the MCP server via the local tool() helper function.
    tool("failed-tests-summary", "Aggregated 24h-ish view: dbt failed tests + DQ checks failures grouped by dataset + most recent failing rows. Replaces 3+ tool calls (dbt-failed-tests + dq-failed-checks-by-dataset + dq-list-checks).", failedTestsSummarySchema.shape, wrapToolHandler(failedTestsSummary));
  • src/index.ts:66-74 (registration)
    The tool() helper function registers the tool with the registry under the current category ('quality' for this tool) and adds it to the MCP server only if that category is enabled. The excerpt shows only the initial value of currentCategory, "dbt".
    let currentCategory: Category = "dbt";
    
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    function tool(name: string, description: string, schema: any, handler: any): void {
      registry.register(name, description, currentCategory);
      if (registry.isEnabled(currentCategory)) {
        server.tool(name, description, schema, handler);
      }
    }
  • Reference in the 'failed-tests-deep-dive' prompt that instructs the LLM to call failed-tests-summary as the first step of investigation.
              `1. Call \`failed-tests-summary\` with recentRuns=${runs}, sinceHours=${hours}. Capture dbt failures and dq failures by dataset.`,
              "2. For the top 5 failing datasets, call `dbt-list-models` filtered by schema=<dataset> and shortlist models with the highest failure count.",
              "3. For each top failing test, call `dbt-get-test` to read the test definition and `dbt-graph` (upstream depth=2) on the attached model — note any failed upstream models or stale sources.",
              "4. Cross-check: call `freshness-status` (failingOnly=true) to see if any of these tests sit downstream of a freshness violation.",
              "5. If @us-all/airflow-mcp is also installed, suggest the user call `airflow-list-runs` for the loading DAGs of the affected datasets to confirm scheduling/run-time issues.",
              "6. Produce a remediation report:",
              "   - Top failing datasets (with failure counts and severity).",
              "   - Failures classified as: 'upstream broken' / 'source stale' / 'schema drift' / 'data anomaly' / 'unknown'.",
              "   - Per-failure: test name, attached model, severity, last failure timestamp, message, suggested action.",
              "   - Owners-to-notify based on dbt model meta (if present).",
            ].join("\n"),
          },
        },
      ],
    };
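
The `aggregate()` helper used by the handler is not shown on this page. A minimal sketch of how such a helper could behave follows, assuming it starts every task concurrently and turns a rejected task into a `null` result plus a caveat string. The name `aggregateSketch`, its signature, the use of `Promise.allSettled`, and the caveat wording are all assumptions, not the project's actual code.

```typescript
// Hypothetical stand-in for aggregate(); the real helper is not shown on this page.
async function aggregateSketch<K extends string>(
  tasks: Record<K, () => Promise<unknown>>,
  caveats: string[],
): Promise<Record<K, unknown>> {
  const keys = Object.keys(tasks) as K[];
  // Call every task before awaiting any of them, so they run concurrently.
  const settled = await Promise.allSettled(keys.map((key) => tasks[key]()));
  const results = {} as Record<K, unknown>;
  settled.forEach((outcome, i) => {
    const key = keys[i];
    if (outcome.status === "fulfilled") {
      results[key] = outcome.value;
    } else {
      // Assumed behavior: a failed source becomes null and is reported as a caveat.
      results[key] = null;
      caveats.push(`${key} failed: ${String(outcome.reason)}`);
    }
  });
  return results;
}

// Usage: one task succeeds, one fails; the failure surfaces only as a caveat.
(async () => {
  const caveats: string[] = [];
  const out = await aggregateSketch(
    {
      dbt: async () => ["failing_test_a"],
      dq: async () => {
        throw new Error("warehouse unreachable");
      },
    },
    caveats,
  );
  console.log(out, caveats);
})();
```

Under this design a single unavailable source degrades the summary instead of failing the whole call, which would match the handler's habit of returning `caveats` alongside the data.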
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions a '24h-ish view' and 'most recent failing rows' but does not specify exact time boundaries, data freshness guarantees, ordering, pagination, or potential performance implications. The vagueness reduces transparency significantly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently communicates the tool's function and value proposition. No redundant words; every part earns its place by defining scope, inputs, and comparing to alternatives.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is no output schema, the description should explain return values. It mentions 'grouped by dataset' and 'most recent failing rows' but does not detail the exact structure, fields, or ordering. For an aggregation tool replacing three others, a more complete description of the output would be beneficial. A 3 reflects it provides partial but not full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
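
Since the page lists no output schema, the handler's return statement is the best available guide to the result shape. The interface below is a sketch derived from that return object. The names `FailedTestsSummaryResult` and `dqSectionMissing` are hypothetical, and the payload fields are typed as `unknown` because the outputs of the underlying tools are not shown here.

```typescript
// Hypothetical type name; the field names mirror the handler's return object.
interface FailedTestsSummaryResult {
  window: { recentRuns: number; sinceHours: number };
  dbtFailures: unknown; // output of dbtFailedTests (shape not shown on this page)
  dqFailuresByDataset: unknown; // null when DQ is not configured
  dqRecentFailures: unknown; // null when DQ is not configured
  caveats: string[];
}

// Reports whether both DQ fields are null, for example because DQ is not configured.
function dqSectionMissing(result: FailedTestsSummaryResult): boolean {
  return result.dqFailuresByDataset === null && result.dqRecentFailures === null;
}

// Example value with made-up contents, only to illustrate the shape.
const example: FailedTestsSummaryResult = {
  window: { recentRuns: 3, sinceHours: 24 },
  dbtFailures: [],
  dqFailuresByDataset: null,
  dqRecentFailures: null,
  caveats: ["example caveat (placeholder text)"],
};
console.log(dqSectionMissing(example));
```

A caller that checks `caveats` and the null DQ fields first can tell an empty result from a skipped one.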

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with both parameters well-described (recentRuns: 'Look at last N dbt runs', sinceHours: 'Recent window for DQ checks'). The description adds minimal extra meaning—'24h-ish view' aligns with the default sinceHours value. Baseline score of 3 is appropriate as the schema already provides clear parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
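
The documented constraints (integers only, `recentRuns` from 1 to 20 with default 3, `sinceHours` from 1 to 720 with default 24) can be illustrated without Zod. The sketch below is a hand-rolled approximation and may differ from the library in edge cases; `IntRule`, `parseBoundedInt`, and `parseArgs` are hypothetical names.

```typescript
// Hand-rolled approximation of the documented constraints; this is not Zod itself.
interface IntRule {
  min: number;
  max: number;
  fallback: number;
}

function parseBoundedInt(name: string, value: unknown, rule: IntRule): number {
  if (value === undefined) return rule.fallback; // a missing input takes the default
  const n = Number(value); // coerce, so the string "5" is accepted like the number 5
  if (!Number.isInteger(n)) throw new Error(`${name} must be an integer`);
  if (n < rule.min || n > rule.max) {
    throw new Error(`${name} must be between ${rule.min} and ${rule.max}`);
  }
  return n;
}

function parseArgs(input: { recentRuns?: unknown; sinceHours?: unknown }) {
  return {
    recentRuns: parseBoundedInt("recentRuns", input.recentRuns, { min: 1, max: 20, fallback: 3 }),
    sinceHours: parseBoundedInt("sinceHours", input.sinceHours, { min: 1, max: 720, fallback: 24 }),
  };
}

console.log(parseArgs({})); // defaults applied to both fields
console.log(parseArgs({ recentRuns: "5" })); // string input coerced to a number
```

The upper bound of 720 hours corresponds to 30 days, so the "24h-ish" wording describes only the default window, not the maximum.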

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides an aggregated view of dbt failed tests and DQ checks grouped by dataset with recent failing rows. It distinguishes from sibling tools by listing the three tools it replaces (dbt-failed-tests, dq-failed-checks-by-dataset, dq-list-checks), making its unique value explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says it 'Replaces 3+ tool calls' which implies using this tool instead of making multiple individual calls when a combined view is needed. While it lacks explicit when-not-to-use guidance, the mention of alternatives is strong. A 4 is appropriate as it gives clear context for when to opt for this tool over its siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/us-all/dbt-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.