📊 Evaluating predictive query…
evaluate
Evaluate predictive query performance by comparing predictions with known historical labels, returning metrics for classification, regression, or link prediction tasks.
Instructions
Evaluate a predictive query and return performance metrics that compare predictions against known ground-truth labels from historical examples.
The graph needs to be materialized and the session needs to be authenticated before the KumoRFM model can start evaluating.
Take the label distribution of the predictive query in the output logs into account when analyzing the returned metrics.
Important: Before executing or suggesting any predictive queries, read the documentation first at 'kumo://docs/predictive-query'.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | The predictive query string, e.g., 'PREDICT COUNT(orders.*, 0, 30, days)>0 FOR EACH users.user_id' or 'PREDICT users.age FOR EACH users.user_id' | |
| metrics | No | The metrics to use for evaluation. If `None`, will use a pre-selection of metrics depending on the given predictive query. The following metrics are supported. Binary classification: 'acc', 'precision', 'recall', 'f1', 'auroc', 'auprc', 'ap'. Multi-class classification: 'acc', 'precision', 'recall', 'f1', 'mrr'. Regression: 'mae', 'mape', 'mse', 'rmse', 'smape', 'r2'. Temporal link prediction: 'map@k', 'ndcg@k', 'mrr@k', 'precision@k', 'recall@k', 'f1@k', 'hit_ratio@k', where 'k' must be an integer between 1 and 100. | |
| anchor_time | No | The anchor time from which the prediction into the future is made. If `None`, will use the maximum timestamp in the data as anchor time. If 'entity', will use the timestamp of the entity's time column as anchor time (only valid for static predictive queries for which the entity table contains a time column). This is useful to prevent future data leakage when imputing missing values on facts, e.g., predicting whether a transaction is fraudulent should happen at the point in time the transaction was created. | |
| run_mode | No | The run mode for the query. Trades runtime against model performance. The run mode dictates how many training/in-context examples are sampled to make a prediction, i.e., 1000 for 'fast', 5000 for 'normal', and 10000 for 'best'. | fast |
| num_neighbors | No | The number of neighbors to sample for each hop to create subgraphs. For example, `[24, 12]` samples 24 neighbors in the first hop and 12 neighbors in the second hop. If `None` (recommended), will use two-hop sampling with 32 neighbors per hop in 'fast' mode and 64 neighbors per hop otherwise. Up to 6-hop subgraphs are supported. Decreasing the number of neighbors per hop can prevent oversmoothing. Increasing the number of neighbors per hop allows the model to look at a larger historical time window. Increasing the number of hops can improve performance in case important signal is far away from the entity table, but can result in massive subgraphs. If more hops are required, we advise letting the number of neighbors shrink gradually in later hops to prevent recursive neighbor explosion, e.g., `num_neighbors=[32, 32, 4, 4, 2, 2]`. | |
| max_pq_iterations | No | The maximum number of iterations to perform to collect valid training/in-context examples. It is advised to increase the number of iterations in case the model fails to find the upper bound of supported training examples w.r.t. the run mode, *i.e.* 1000 for 'fast', 5000 for 'normal' and 10000 for 'best'. | |
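
Two of the constraints in the table above are easy to check before calling the tool: whether a metric name is valid for a task, and how large a subgraph a given `num_neighbors` setting can produce. The following is a minimal sketch, not part of the tool itself. The helper names and the task-type keys (`"binary_classification"`, `"link_prediction"`, and so on) are hypothetical, and the size bound assumes each sampled node expands to at most the listed number of neighbors along a single edge type, with no deduplication; the actual sampler may behave differently.

```python
import re
from typing import Sequence

# Metric names copied from the `metrics` row above, grouped by task type.
# The dictionary keys are hypothetical labels chosen for this sketch.
SUPPORTED_METRICS = {
    "binary_classification": {"acc", "precision", "recall", "f1", "auroc", "auprc", "ap"},
    "multiclass_classification": {"acc", "precision", "recall", "f1", "mrr"},
    "regression": {"mae", "mape", "mse", "rmse", "smape", "r2"},
}
# Temporal link prediction metrics carry an '@k' suffix with 1 <= k <= 100.
LINK_PREDICTION_BASES = {"map", "ndcg", "mrr", "precision", "recall", "f1", "hit_ratio"}
_AT_K = re.compile(r"^([a-z_0-9]+)@(\d+)$")


def is_supported_metric(name: str, task: str) -> bool:
    """Return True if `name` is listed for `task` in the table above."""
    if task == "link_prediction":
        match = _AT_K.match(name)
        if match is None:
            return False
        base, k = match.group(1), int(match.group(2))
        return base in LINK_PREDICTION_BASES and 1 <= k <= 100
    return name in SUPPORTED_METRICS.get(task, set())


def max_subgraph_nodes(num_neighbors: Sequence[int]) -> int:
    """Upper bound on nodes per entity: 1 + n1 + n1*n2 + n1*n2*n3 + ..."""
    total, frontier = 1, 1
    for n in num_neighbors:
        frontier *= n      # most nodes reachable at this hop
        total += frontier  # accumulate across hops
    return total
```

Under these assumptions, `max_subgraph_nodes([24, 12])` gives 313 and the shrinking schedule `[32, 32, 4, 4, 2, 2]` gives 119841, whereas six hops of 32 neighbors each would allow over a billion nodes per entity. This is the recursive neighbor explosion the `num_neighbors` description warns about.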
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| metrics | No | The metric value for every metric | |
| logs | No | Evaluation-specific log messages such as number of context and test examples, the underlying task type and the label distribution | |
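
The instructions ask for the label distribution to be taken into account when reading the metrics. One simple way to do that for classification is to compare the reported accuracy with what a model that always predicts the most frequent label would score. The sketch below assumes the label counts have already been read from the logs into a plain mapping; the exact log format is not specified here, so the input shape and both function names are hypothetical.

```python
from typing import Mapping


def majority_class_accuracy(label_counts: Mapping[str, int]) -> float:
    """Accuracy of always predicting the most frequent label."""
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("label_counts must contain at least one example")
    return max(label_counts.values()) / total


def accuracy_lift(acc: float, label_counts: Mapping[str, int]) -> float:
    """How far the reported 'acc' sits above the majority-class baseline."""
    return acc - majority_class_accuracy(label_counts)
```

For example, with 950 negative and 50 positive labels the baseline is 0.95, so a reported 'acc' of 0.95 carries no signal on its own. In such imbalanced cases, metrics like 'auprc' or 'f1' are usually more informative.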