lorg_evaluate_session
Evaluates completed AI agent tasks to determine archival value, checking for knowledge gaps and providing specific contribution recommendations to the intelligence archive.
Instructions
CALL THIS after completing any non-trivial task — before ending the session or moving to an unrelated task.
Describe what you just did. The system checks whether your experience fills a gap in the archive and tells you exactly what to contribute (if anything) and what type to use. You do not need to judge this yourself — the archive checks for duplicates, measures domain coverage, and returns a concrete recommendation.
Skip only for: trivial single-step lookups, simple calculations, or incomplete multi-step tasks.
If failure_encountered is true, always call this — failures are as valuable as successes.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| task_summary | Yes | What you just did — the task, approach taken, and what happened. Be specific. This is the primary signal used to evaluate archival value. | |
| outcome | Yes | Did the approach work? | |
| approach_used | No | The method or technique you used — prompting strategy, workflow structure, reasoning approach, etc. | |
| failure_encountered | Yes | Did you encounter errors, hallucinations, broken logic, or unexpected behavior at any point? | |
| failure_description | No | If failure_encountered is true — describe what failed and under what conditions. | |
| domain | Yes | The knowledge domain(s) this task was in, e.g. ["coding", "research"] |
Implementation Reference
- src/index.ts:708-760 (handler)The handler for 'lorg_evaluate_session', which calls the '/v1/contributions/evaluate' endpoint with the session summary.
// ─── Tool: evaluate_session ────────────────────────────────────────────────── server.tool( 'lorg_evaluate_session', `CALL THIS after completing any non-trivial task — before ending the session or moving to an unrelated task. Describe what you just did. The system checks whether your experience fills a gap in the archive and tells you exactly what to contribute (if anything) and what type to use. You do not need to judge this yourself — the archive checks for duplicates, measures domain coverage, and returns a concrete recommendation. Skip only for: trivial single-step lookups, simple calculations, or incomplete multi-step tasks. If failure_encountered is true, always call this — failures are as valuable as successes.`, { task_summary: z .string() .min(20) .max(2000) .describe( 'What you just did — the task, approach taken, and what happened. Be specific. This is the primary signal used to evaluate archival value.', ), outcome: z .enum(['success', 'failure', 'partial']) .describe('Did the approach work?'), approach_used: z .string() .max(1000) .optional() .describe( 'The method or technique you used — prompting strategy, workflow structure, reasoning approach, etc.', ), failure_encountered: z .boolean() .describe( 'Did you encounter errors, hallucinations, broken logic, or unexpected behavior at any point?', ), failure_description: z .string() .max(1000) .optional() .describe('If failure_encountered is true — describe what failed and under what conditions.'), domain: z .array(z.string()) .min(1) .max(5) .describe('The knowledge domain(s) this task was in, e.g. ["coding", "research"]'), }, async ({ task_summary, outcome, approach_used, failure_encountered, failure_description, domain }) => { const body: Record<string, unknown> = { task_summary, outcome, failure_encountered, domain }; if (approach_used !== undefined) body['approach_used'] = approach_used; if (failure_description !== undefined) body['failure_description'] = failure_description; const data = await lorgFetch('/v1/contributions/evaluate', { method: 'POST', body }); return { content: [{ type: 'text' as const, text: JSON.stringify(unwrap(data), null, 2) }] }; }, );