# optimize_bandit
Select optimal options using Multi-Armed Bandit algorithms (UCB1/Thompson/ε-Greedy) to balance exploration and exploitation for decision intelligence.
## Instructions

Multi-Armed Bandit (UCB1/Thompson/ε-Greedy). Selects the best option from a set with an optimal explore/exploit tradeoff. Typical latency is under 1 ms.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| arms | Yes | Array of options: `[{id, name, pulls, totalReward}]` | |
| algorithm | No | Algorithm to use: `ucb1`, `thompson`, or `epsilon-greedy` | ucb1 |
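To illustrate how the default UCB1 algorithm uses the `pulls` and `totalReward` fields above, here is a minimal sketch of UCB1 arm selection. The field names follow the schema; the function name and example data are assumptions, and this is not this tool's actual implementation:

```python
import math

def ucb1_select(arms):
    """Pick the arm id maximizing mean reward plus a UCB1 exploration bonus."""
    # Any arm that has never been pulled is selected first
    # (its exploration bonus is effectively infinite).
    for arm in arms:
        if arm["pulls"] == 0:
            return arm["id"]

    total_pulls = sum(a["pulls"] for a in arms)

    def score(a):
        mean = a["totalReward"] / a["pulls"]               # exploitation term
        bonus = math.sqrt(2 * math.log(total_pulls) / a["pulls"])  # exploration term
        return mean + bonus

    return max(arms, key=score)["id"]

# Hypothetical example: arm "a" has the higher mean reward (0.6 vs 0.4),
# but arm "b" has been pulled less, so its exploration bonus wins out.
arms = [
    {"id": "a", "name": "Variant A", "pulls": 10, "totalReward": 6.0},
    {"id": "b", "name": "Variant B", "pulls": 5, "totalReward": 2.0},
]
print(ucb1_select(arms))  # prints: b
```

The bonus term shrinks as an arm accumulates pulls, so well-sampled arms are judged mainly on their observed mean, while under-sampled arms keep getting occasional tries; this is the explore/exploit tradeoff the tool description refers to.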