mimir_bench
Records task metrics (turns, tokens, success) and memory recall usage to measure agent performance; aggregate with recall data to analyze trends.
Instructions
Record a performance benchmark data point. Each data point tracks task metrics (turns taken, tokens used, success) alongside whether memory recall was used, which makes it possible to measure Mimir's impact on agent performance. Aggregate with mimir_recall to analyze trends.
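
To illustrate the kind of trend analysis described above, here is a minimal sketch that compares tasks run with and without memory recall. It assumes the benchmark records have already been retrieved into a list of dicts whose keys mirror the input schema; how mimir_recall actually returns them is not specified here. The `summarize` helper and the sample values are hypothetical and purely illustrative, not measurements.

```python
from statistics import mean

# Illustrative, made-up records; keys mirror the mimir_bench input schema.
records = [
    {"turns_taken": 6, "tokens_used": 4200, "task_success": True, "memory_recall_used": True},
    {"turns_taken": 4, "tokens_used": 3800, "task_success": True, "memory_recall_used": True},
    {"turns_taken": 9, "tokens_used": 7000, "task_success": True, "memory_recall_used": False},
    {"turns_taken": 11, "tokens_used": 9000, "task_success": False, "memory_recall_used": False},
]


def summarize(records):
    """Group records by memory_recall_used and average each metric (hypothetical helper)."""
    summary = {}
    for used in (True, False):
        group = [r for r in records if r["memory_recall_used"] is used]
        if not group:
            continue  # skip empty groups so mean() is never called on no data
        summary[used] = {
            "count": len(group),
            "mean_turns": mean(r["turns_taken"] for r in group),
            "mean_tokens": mean(r["tokens_used"] for r in group),
            # task_success is optional, so a missing value counts as not successful here
            "success_rate": mean(1.0 if r.get("task_success") else 0.0 for r in group),
        }
    return summary


summary = summarize(records)
```

Treating a missing task_success as a failure is a choice made for this sketch; excluding such records from the success rate would be an equally reasonable alternative.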
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tags | No | Tags for categorization | |
| session_id | No | Session identifier for traceability | |
| tokens_used | Yes | Total tokens consumed by the task | |
| turns_taken | Yes | Number of conversation turns the task took | |
| recall_count | No | How many times memory was recalled during this task | |
| task_success | No | Whether the task completed successfully | |
| task_description | Yes | Description of the task being measured | |
| memory_recall_used | Yes | Whether memory recall (mimir_recall) was used during this task | |
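
As a sketch of how the arguments in the table above might be assembled before calling the tool, the following checks that the four required fields are present and rejects names that are not in the schema. The `build_bench_args` helper is hypothetical and not part of Mimir; the value types (integers for counts, booleans for flags) are assumptions inferred from the descriptions, and the sample task is made up.

```python
# Field names taken from the input schema table.
REQUIRED = ("task_description", "turns_taken", "tokens_used", "memory_recall_used")
OPTIONAL = ("task_success", "recall_count", "session_id", "tags")


def build_bench_args(**fields):
    """Return an argument dict for mimir_bench after basic validation (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    unknown = [name for name in fields if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return dict(fields)


# Example payload; the values are assumed for illustration only.
args = build_bench_args(
    task_description="Refactor config loader",
    turns_taken=5,
    tokens_used=3100,
    memory_recall_used=True,
    recall_count=2,
)
```

How the resulting dict is passed to the tool depends on the client in use, which this page does not specify.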
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| entity_id | No | Created benchmark entity ID | |
| created_at_unix_ms | No | Creation timestamp in Unix milliseconds | |
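
A minimal sketch of reading the output, assuming the result arrives as a dict with the two fields above and that created_at_unix_ms holds milliseconds since the Unix epoch, as its name suggests. The `parse_bench_result` helper and the sample entity ID are hypothetical. Because both fields are marked as not required, the sketch tolerates their absence.

```python
from datetime import datetime, timezone


def parse_bench_result(result):
    """Extract the entity ID and a timezone-aware creation time (hypothetical helper)."""
    entity_id = result.get("entity_id")
    created_ms = result.get("created_at_unix_ms")
    created_at = None
    if created_ms is not None:
        # Convert milliseconds to seconds and pin the result to UTC.
        created_at = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
    return entity_id, created_at


# Made-up sample result for illustration.
entity_id, created_at = parse_bench_result(
    {"entity_id": "bench-123", "created_at_unix_ms": 1700000000000}
)
```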