optimize_contextual
Select the best option using a contextual bandit algorithm. Learns per-context preferences from features such as user demographics or time of day to maximize reward.
Instructions
Pick the best option given a situational context vector (LinUCB contextual bandit). Use when the best option depends on features that vary per call (user demographics, time of day, weather, market regime). Pass observed history so the model can learn per-context preferences. If you have no per-call context features, use optimize_bandit instead. Returns the selected arm together with its expected reward and confidence width.
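To make the selection rule concrete, here is a minimal sketch of disjoint LinUCB (one ridge-regression model per arm) in plain Python. It is an illustration under stated assumptions, not this tool's actual implementation: the function name `linucb_select`, the `(arm, context, reward)` shape of each history entry, and the identity-matrix regularizer are all hypothetical. Only the returned field names are taken from the output schema.

```python
import math

def linucb_select(arms, context, history=(), alpha=1.0):
    """Sketch of disjoint LinUCB. History entries are assumed to be
    (arm, context_vector, reward) tuples; the real tool may differ."""
    d = len(context)
    # Per-arm state: inverse of A = I + sum(x x^T), and b = sum(reward * x).
    a_inv = {arm: [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
             for arm in arms}
    b = {arm: [0.0] * d for arm in arms}

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    for arm, x, reward in history:
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T.
        ax = matvec(a_inv[arm], x)
        denom = 1.0 + dot(x, ax)
        for i in range(d):
            for j in range(d):
                a_inv[arm][i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            b[arm][i] += reward * x[i]

    best = None
    for arm in arms:
        theta = matvec(a_inv[arm], b[arm])            # ridge estimate of weights
        expected = dot(theta, context)                # point estimate of reward
        width = math.sqrt(dot(context, matvec(a_inv[arm], context)))
        score = expected + alpha * width              # upper confidence bound
        if best is None or score > best["score"]:
            best = {"selected": arm, "score": score,
                    "expectedReward": expected, "confidenceWidth": width}
    return best
```

With no history every arm has the same score, so this sketch falls back to the first arm listed. Once an arm has earned reward in a similar context, its expected reward rises while its confidence width shrinks.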
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| arms | Yes | ||
| context | Yes | Numeric feature vector describing the current situation. Length must match across calls. | |
| history | No | Optional past observations to seed the model. | |
| alpha | No | Exploration coefficient. Higher values mean more exploration. | 1.0 |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| selected | Yes | ||
| score | Yes | expectedReward + alpha * confidenceWidth. | |
| expectedReward | Yes | LinUCB point estimate of reward. | |
| confidenceWidth | Yes | Uncertainty bound on the estimate. | |
| algorithm | Yes | | |
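The score formula in the table explains why a larger alpha favors exploration. The short sketch below applies that formula to two made-up arms; the arm names and numeric estimates are hypothetical values chosen for illustration, not output from the tool.

```python
def ucb_score(expected_reward, confidence_width, alpha=1.0):
    # score = expectedReward + alpha * confidenceWidth, as in the output schema.
    return expected_reward + alpha * confidence_width

# Hypothetical per-arm estimates: (expectedReward, confidenceWidth).
estimates = {
    "well_known": (0.60, 0.05),    # higher estimate, little uncertainty
    "rarely_tried": (0.50, 0.30),  # lower estimate, much more uncertainty
}

def pick(alpha):
    # Choose the arm with the highest upper-confidence score.
    return max(estimates, key=lambda arm: ucb_score(*estimates[arm], alpha=alpha))
```

In this example a small alpha such as 0.1 keeps the well-understood arm on top, while alpha = 1.0 lets the uncertain arm win because its wide confidence interval is weighted more heavily.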