predict
Execute predictive queries to generate model predictions for binary, multi-class, regression, or temporal link prediction tasks without retraining.
Instructions
Execute a predictive query and return model predictions.
The graph needs to be materialized and the session needs to be authenticated before the KumoRFM model can start generating predictions.
The output prediction format depends on the given task type.
Binary classification: | ENTITY | ANCHOR_TIMESTAMP | TARGET_PRED | False_PROB | True_PROB | where 'ENTITY' holds the entity ID, 'ANCHOR_TIMESTAMP' holds the anchor time of the prediction in unix format, 'TARGET_PRED' holds the final prediction based on a threshold of 0.5, and 'False_PROB' and 'True_PROB' hold the probabilities.
Multi-class classification: | ENTITY | ANCHOR_TIMESTAMP | CLASS | SCORE | PREDICTED | where 'ENTITY' holds the entity ID, 'ANCHOR_TIMESTAMP' holds the anchor time of the prediction in unix format. Each row corresponds to an (ENTITY, CLASS) pair (up to 10 classes are reported), where 'CLASS' holds the predicted value, 'SCORE' holds its probability, and 'PREDICTED' denotes whether the (ENTITY, CLASS) pair has the highest likelihood.
Regression: | ENTITY | ANCHOR_TIMESTAMP | TARGET_PRED | where 'ENTITY' holds the entity ID, 'ANCHOR_TIMESTAMP' holds the anchor time of the prediction in unix format, and 'TARGET_PRED' holds the predicted numerical value.
Temporal link prediction: | ENTITY | ANCHOR_TIMESTAMP | CLASS | SCORE | where 'ENTITY' holds the entity ID, 'ANCHOR_TIMESTAMP' holds the anchor time of the prediction in unix format. Each row corresponds to an (ENTITY, CLASS) pair, where 'CLASS' holds the recommended item and 'SCORE' holds its likelihood.
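The column layouts above can be post-processed with plain Python. The following is a minimal sketch, assuming the predictions have already been parsed into a list of dictionaries keyed by the documented column names. The sample rows, entity IDs, timestamps, and the helper names `rethreshold` and `top_k_links` are all hypothetical and not part of the tool.

```python
from collections import defaultdict

# Made-up rows in the binary classification layout described above.
binary_rows = [
    {"ENTITY": "u1", "ANCHOR_TIMESTAMP": 1700000000, "TARGET_PRED": True,
     "False_PROB": 0.28, "True_PROB": 0.72},
    {"ENTITY": "u2", "ANCHOR_TIMESTAMP": 1700000000, "TARGET_PRED": False,
     "False_PROB": 0.65, "True_PROB": 0.35},
]

# Made-up rows in the temporal link prediction layout described above.
link_rows = [
    {"ENTITY": "u1", "ANCHOR_TIMESTAMP": 1700000000, "CLASS": "itemA", "SCORE": 0.9},
    {"ENTITY": "u1", "ANCHOR_TIMESTAMP": 1700000000, "CLASS": "itemB", "SCORE": 0.4},
    {"ENTITY": "u1", "ANCHOR_TIMESTAMP": 1700000000, "CLASS": "itemC", "SCORE": 0.7},
    {"ENTITY": "u2", "ANCHOR_TIMESTAMP": 1700000000, "CLASS": "itemB", "SCORE": 0.6},
]


def rethreshold(rows, threshold):
    """Map each entity to a decision using a custom cutoff on True_PROB
    instead of the default 0.5 cutoff behind TARGET_PRED."""
    return {row["ENTITY"]: row["True_PROB"] >= threshold for row in rows}


def top_k_links(rows, k):
    """Group (ENTITY, CLASS) rows by entity and keep the k highest-scoring classes."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ENTITY"]].append((row["SCORE"], row["CLASS"]))
    return {
        entity: [cls for _, cls in sorted(pairs, key=lambda p: p[0], reverse=True)[:k]]
        for entity, pairs in grouped.items()
    }


decisions = rethreshold(binary_rows, threshold=0.6)
recommendations = top_k_links(link_rows, k=2)
```

Re-thresholding on 'True_PROB' is useful when the default cutoff of 0.5 does not match the cost of false positives in the application.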
Important: Before executing or suggesting any predictive queries, read the documentation first at 'kumo://docs/predictive-query'.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | The predictive query string, e.g., 'PREDICT COUNT(orders.*, 0, 30, days)>0 FOR EACH users.user_id' or 'PREDICT users.age FOR EACH users.user_id' | |
| indices | Yes | The primary keys (entity indices) to generate predictions for. Up to 1000 entities are supported for an individual query. Predictions will be generated for all indices, regardless of whether they match any entity filter constraints. | |
| anchor_time | No | The anchor time from which the prediction into the future is made. If `None`, will use the maximum timestamp in the data as anchor time. If 'entity', will use the timestamp of the entity's time column as anchor time (only valid for static predictive queries for which the entity table contains a time column), which is useful to prevent future data leakage when imputing missing values on facts, e.g., predicting whether a transaction is fraudulent should happen at the point in time the transaction was created. | |
| run_mode | No | The run mode for the query. Trades runtime for model performance. The run mode dictates how many training/in-context examples are sampled to make a prediction, i.e., 1000 for 'fast', 5000 for 'normal', and 10000 for 'best'. | fast |
| num_neighbors | No | The number of neighbors to sample for each hop to create subgraphs. For example, `[24, 12]` samples 24 neighbors in the first hop and 12 neighbors in the second hop. If `None` (recommended), will use two-hop sampling with 32 neighbors per hop in 'fast' mode, and 64 neighbors per hop otherwise. Up to 6-hop subgraphs are supported. Decreasing the number of neighbors per hop can prevent oversmoothing. Increasing the number of neighbors per hop allows the model to look at a larger historical time window. Increasing the number of hops can improve performance when important signal is far away from the entity table, but can result in massive subgraphs. If more hops are required, we advise letting the number of neighbors shrink gradually in later hops to prevent recursive neighbor explosion, e.g., `num_neighbors=[32, 32, 4, 4, 2, 2]`. | |
| max_pq_iterations | No | The maximum number of iterations to perform to collect valid training/in-context examples. It is advised to increase the number of iterations in case the model fails to find the upper bound of supported training examples w.r.t. the run mode, i.e., 1000 for 'fast', 5000 for 'normal', and 10000 for 'best'. | |
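The limits in the table above can be checked on the client side before the tool is called. This is a minimal sketch, not part of the tool: the helpers `build_predict_args` and `chunk_indices` are hypothetical, and the only rules they enforce are the ones stated in the table (up to 1000 indices per query, run modes 'fast', 'normal', and 'best', up to 6 hops). How the resulting dictionary is sent to the server depends on the MCP client in use.

```python
VALID_RUN_MODES = {"fast", "normal", "best"}
MAX_INDICES = 1000  # documented per-query limit on entities
MAX_HOPS = 6        # documented limit on subgraph depth


def build_predict_args(query, indices, anchor_time=None, run_mode="fast",
                       num_neighbors=None):
    """Assemble an argument dictionary for the predict tool (hypothetical helper)."""
    if len(indices) > MAX_INDICES:
        raise ValueError(f"at most {MAX_INDICES} indices are supported per query")
    if run_mode not in VALID_RUN_MODES:
        raise ValueError(f"run_mode must be one of {sorted(VALID_RUN_MODES)}")
    if num_neighbors is not None and len(num_neighbors) > MAX_HOPS:
        raise ValueError(f"at most {MAX_HOPS} hops are supported")

    args = {"query": query, "indices": list(indices), "run_mode": run_mode}
    # Optional parameters are only included when explicitly set,
    # so the documented defaults apply otherwise.
    if anchor_time is not None:
        args["anchor_time"] = anchor_time
    if num_neighbors is not None:
        args["num_neighbors"] = list(num_neighbors)
    return args


def chunk_indices(indices, size=MAX_INDICES):
    """Split a long list of primary keys into batches that respect the limit."""
    return [indices[i:i + size] for i in range(0, len(indices), size)]


query = "PREDICT COUNT(orders.*, 0, 30, days)>0 FOR EACH users.user_id"
batches = chunk_indices(list(range(2500)))
requests = [build_predict_args(query, batch) for batch in batches]
```

Batching this way issues one query per group of at most 1000 entities, so each call may sample its own in-context examples.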
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| predictions | No | The predictions, where each row holds information about the entity, the anchor time, and the prediction scores | |
| logs | No | Prediction-specific log messages, such as the number of context examples, the underlying task type, and the label distribution | |