KumoRFM MCP Server

Official
by kumo-ai

📊 Evaluating predictive query…

evaluate
Read-only, Idempotent

Evaluate predictive query performance by comparing predictions with known historical labels, returning metrics for classification, regression, or link prediction tasks.

Instructions

Evaluate a predictive query and return performance metrics that compare predictions against known ground-truth labels from historical examples.

The graph needs to be materialized and the session needs to be authenticated before the KumoRFM model can start evaluating.

Take the label distribution of the predictive query in the output logs into account when analyzing the returned metrics.
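
One way to take the label distribution into account is to compare a reported accuracy against what a trivial majority-class predictor would score. The sketch below assumes the label distribution has already been read out of the logs into a plain dictionary of counts; the exact log format is not documented here, so `label_counts` and both function names are hypothetical.

```python
def majority_baseline_accuracy(label_counts):
    """Accuracy of always predicting the most frequent label.

    `label_counts` is a hypothetical dict mapping label -> count,
    transcribed by hand from the evaluation logs.
    """
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("label_counts must contain at least one example")
    return max(label_counts.values()) / total


def accuracy_lift(reported_accuracy, label_counts):
    """How far the reported accuracy sits above the majority baseline."""
    return reported_accuracy - majority_baseline_accuracy(label_counts)


# Illustrative numbers only: 5% positives, 95% negatives.
counts = {"True": 50, "False": 950}
baseline = majority_baseline_accuracy(counts)  # 0.95
lift = accuracy_lift(0.96, counts)             # roughly 0.01
```

With a 95/5 split, an accuracy of 0.96 is only about one point better than always predicting the majority label, which is why threshold-free metrics such as 'auroc' or 'auprc' tend to be more informative on imbalanced queries.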

Important: Before executing or suggesting any predictive queries, read the documentation first at 'kumo://docs/predictive-query'.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | The predictive query string, e.g., 'PREDICT COUNT(orders.*, 0, 30, days)>0 FOR EACH users.user_id' or 'PREDICT users.age FOR EACH users.user_id' | |
| metrics | No | The metrics to use for evaluation. If `None`, will use a pre-selection of metrics depending on the given predictive query. The following metrics are supported. Binary classification: 'acc', 'precision', 'recall', 'f1', 'auroc', 'auprc', 'ap'. Multi-class classification: 'acc', 'precision', 'recall', 'f1', 'mrr'. Regression: 'mae', 'mape', 'mse', 'rmse', 'smape', 'r2'. Temporal link prediction: 'map@k', 'ndcg@k', 'mrr@k', 'precision@k', 'recall@k', 'f1@k', 'hit_ratio@k', where 'k' needs to be an integer between 1 and 100. | |
| anchor_time | No | The anchor time from which predictions into the future are made. If `None`, will use the maximum timestamp in the data as anchor time. If 'entity', will use the timestamp of the entity's time column as anchor time (only valid for static predictive queries for which the entity table contains a time column). This is useful to prevent future data leakage when imputing missing values on facts, e.g., predicting whether a transaction is fraudulent should happen at the point in time the transaction was created. | |
| run_mode | No | The run mode for the query. Trades runtime against model performance. The run mode dictates how many training/in-context examples are sampled to make a prediction, i.e., 1000 for 'fast', 5000 for 'normal', and 10000 for 'best'. | fast |
| num_neighbors | No | The number of neighbors to sample for each hop to create subgraphs. For example, `[24, 12]` samples 24 neighbors in the first hop and 12 neighbors in the second hop. If `None` (recommended), will use two-hop sampling with 32 neighbors in 'fast' mode, and 64 neighbors otherwise in each hop. Up to 6-hop subgraphs are supported. Decreasing the number of neighbors per hop can prevent oversmoothing. Increasing the number of neighbors per hop allows the model to look at a larger historical time window. Increasing the number of hops can improve performance in case important signal is far away from the entity table, but can result in massive subgraphs. We advise letting the number of neighbors gradually shrink in later hops to prevent recursive neighbor explosion, e.g., `num_neighbors=[32, 32, 4, 4, 2, 2]`, if more hops are required. | |
| max_pq_iterations | No | The maximum number of iterations to perform to collect valid training/in-context examples. It is advised to increase the number of iterations in case the model fails to find the upper bound of supported training examples w.r.t. the run mode, i.e., 1000 for 'fast', 5000 for 'normal', and 10000 for 'best'. | |
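
The advice on shrinking `num_neighbors` in later hops can be made concrete with a little arithmetic. The sketch below computes a rough upper bound on how many neighbors may be sampled around a single entity, under the simplifying assumption that every node at hop i expands to exactly `num_neighbors[i]` new nodes. The real sampler is not described here and may behave differently (for example, per edge type or with time constraints), so the function name and the bound itself are illustrative only.

```python
from itertools import accumulate
from operator import mul


def max_sampled_neighbors(num_neighbors):
    """Rough upper bound on sampled neighbors around one entity.

    Assumes full fan-out at every hop: the frontier at hop i has
    num_neighbors[0] * ... * num_neighbors[i] nodes, and the bound
    is the sum of all frontier sizes.
    """
    frontier_sizes = accumulate(num_neighbors, mul)
    return sum(frontier_sizes)


print(max_sampled_neighbors([24, 12]))               # 312
print(max_sampled_neighbors([32, 32]))               # 1056
print(max_sampled_neighbors([32, 32, 4, 4, 2, 2]))   # 119840
print(max_sampled_neighbors([32] * 6))               # 1108378656
```

Under this assumption, six hops at a constant fan-out of 32 would allow over a billion sampled nodes per entity, while the tapered schedule from the description stays near 120 thousand.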

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| metrics | No | The metric value for every metric | |
| logs | No | Evaluation-specific log messages such as number of context and test examples, the underlying task type and the label distribution | |

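
To tie the two schemas together, the following sketch assembles a JSON-RPC `tools/call` request for the `evaluate` tool, as an MCP client would send it. The helper names are hypothetical, `metrics` is assumed to be a list of strings, and the validation only mirrors the constraints stated in the input schema (the three run modes, and 'k' between 1 and 100 for link prediction metrics); the server may enforce more.

```python
import json
import re

RUN_MODES = {"fast", "normal", "best"}
# Link prediction metrics listed in the input schema, with a numeric k.
_AT_K = re.compile(r"^(map|ndcg|mrr|precision|recall|f1|hit_ratio)@(\d+)$")


def check_link_metric(metric):
    """Validate a '<name>@k' metric: known name and 1 <= k <= 100."""
    match = _AT_K.match(metric)
    if match is None:
        raise ValueError(f"not a supported '@k' metric: {metric!r}")
    k = int(match.group(2))
    if not 1 <= k <= 100:
        raise ValueError(f"k must be between 1 and 100, got {k}")
    return metric


def build_evaluate_request(query, metrics=None, run_mode="fast", request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' payload for the 'evaluate' tool."""
    if run_mode not in RUN_MODES:
        raise ValueError(f"run_mode must be one of {sorted(RUN_MODES)}")
    arguments = {"query": query, "run_mode": run_mode}
    if metrics is not None:
        for metric in metrics:
            if "@" in metric:  # only '@k' metrics carry a range constraint
                check_link_metric(metric)
        arguments["metrics"] = list(metrics)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "evaluate", "arguments": arguments},
    }


request = build_evaluate_request(
    "PREDICT COUNT(orders.*, 0, 30, days)>0 FOR EACH users.user_id",
    metrics=["auroc", "auprc"],
)
print(json.dumps(request, indent=2))
```

Omitting `metrics` leaves the choice to the server's pre-selection for the query's task type, which matches the `None` behavior described in the input schema.
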
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, idempotent behavior. The description adds valuable context about prerequisites (graph materialization, authentication) and a tip to consider label distribution, further guiding agent behavior without contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in 4 sentences with the main purpose upfront. It is concise but could be slightly tighter; the 'Important' line is relevant but somewhat separate.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists), the description covers prerequisites and a usage tip. Return values are not detailed but output schema exists, so this is acceptable. Minor gaps in describing output format are compensated by schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter description coverage, so the description adds no parameter details beyond the schema. A baseline score of 3 is appropriate because the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates a predictive query and returns performance metrics comparing predictions to ground truth. It distinguishes from sibling tools like 'predict' and 'explain' by focusing on post-hoc evaluation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides important prerequisites (materialized graph, authenticated session) and advises reading the documentation. It does not explicitly name alternatives or say when not to use the tool, but the context is clear for evaluation tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kumo-ai/kumo-rfm-mcp'
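
The same request can be issued from Python with only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command above, the helper names are hypothetical, and the response is assumed to be JSON, which is not stated on this page.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory URL for one server, escaping both path segments."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner, name, timeout=10.0):
    """GET the server entry and parse it, assuming a JSON response body."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.load(resp)


print(server_url("kumo-ai", "kumo-rfm-mcp"))
# https://glama.ai/api/mcp/v1/servers/kumo-ai/kumo-rfm-mcp
```

Calling `fetch_server("kumo-ai", "kumo-rfm-mcp")` performs the network request; it is left out of the example so the block runs offline.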

If you have feedback or need assistance with the MCP directory API, please join our Discord server