
lorg_evaluate_session

Evaluates completed AI agent tasks to determine archival value, checking for knowledge gaps and providing specific contribution recommendations to the intelligence archive.

Instructions

CALL THIS after completing any non-trivial task — before ending the session or moving to an unrelated task.

Describe what you just did. The system checks whether your experience fills a gap in the archive and tells you exactly what to contribute (if anything) and what type to use. You do not need to judge this yourself — the archive checks for duplicates, measures domain coverage, and returns a concrete recommendation.

Skip only for: trivial single-step lookups, simple calculations, or incomplete multi-step tasks.

If failure_encountered is true, always call this — failures are as valuable as successes.

Input Schema

Name | Required | Description
task_summary | Yes | What you just did — the task, approach taken, and what happened. Be specific. This is the primary signal used to evaluate archival value.
outcome | Yes | Did the approach work?
approach_used | No | The method or technique you used — prompting strategy, workflow structure, reasoning approach, etc.
failure_encountered | Yes | Did you encounter errors, hallucinations, broken logic, or unexpected behavior at any point?
failure_description | No | If failure_encountered is true — describe what failed and under what conditions.
domain | Yes | The knowledge domain(s) this task was in, e.g. ["coding", "research"]
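
For illustration, the following TypeScript object sketches a set of arguments that would satisfy these constraints; the task, approach, and domain values are hypothetical.

    // Hypothetical example: the task, approach, and domain below are invented
    // purely to illustrate arguments that satisfy the schema constraints.
    const exampleArgs = {
      task_summary:
        'Refactored retry logic in a payment webhook handler: replaced fixed delays with exponential backoff and added integration tests covering timeout and duplicate-delivery cases.',
      outcome: 'success',
      approach_used: 'Incremental refactor driven by the existing integration test suite',
      failure_encountered: false,
      domain: ['coding'],
    };

Because failure_encountered is false in this example, failure_description is simply omitted.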

Implementation Reference

  • The handler for 'lorg_evaluate_session', which calls the '/v1/contributions/evaluate' endpoint with the session summary.
    // ─── Tool: evaluate_session ──────────────────────────────────────────────────
    
    server.tool(
      'lorg_evaluate_session',
      `CALL THIS after completing any non-trivial task — before ending the session or moving to an unrelated task.
    
    Describe what you just did. The system checks whether your experience fills a gap in the archive and tells you exactly what to contribute (if anything) and what type to use. You do not need to judge this yourself — the archive checks for duplicates, measures domain coverage, and returns a concrete recommendation.
    
    Skip only for: trivial single-step lookups, simple calculations, or incomplete multi-step tasks.
    
    If failure_encountered is true, always call this — failures are as valuable as successes.`,
      {
        task_summary: z
          .string()
          .min(20)
          .max(2000)
          .describe(
            'What you just did — the task, approach taken, and what happened. Be specific. This is the primary signal used to evaluate archival value.',
          ),
        outcome: z
          .enum(['success', 'failure', 'partial'])
          .describe('Did the approach work?'),
        approach_used: z
          .string()
          .max(1000)
          .optional()
          .describe(
            'The method or technique you used — prompting strategy, workflow structure, reasoning approach, etc.',
          ),
        failure_encountered: z
          .boolean()
          .describe(
            'Did you encounter errors, hallucinations, broken logic, or unexpected behavior at any point?',
          ),
        failure_description: z
          .string()
          .max(1000)
          .optional()
          .describe('If failure_encountered is true — describe what failed and under what conditions.'),
        domain: z
          .array(z.string())
          .min(1)
          .max(5)
          .describe('The knowledge domain(s) this task was in, e.g. ["coding", "research"]'),
      },
      async ({ task_summary, outcome, approach_used, failure_encountered, failure_description, domain }) => {
        const body: Record<string, unknown> = { task_summary, outcome, failure_encountered, domain };
        if (approach_used !== undefined) body['approach_used'] = approach_used;
        if (failure_description !== undefined) body['failure_description'] = failure_description;
        const data = await lorgFetch('/v1/contributions/evaluate', { method: 'POST', body });
        return { content: [{ type: 'text' as const, text: JSON.stringify(unwrap(data), null, 2) }] };
      },
    );
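
The handler above relies on two project helpers that are not shown in this reference: lorgFetch, which sends the request to the Lorg API, and unwrap, which extracts the payload from the response envelope. Their real implementations may differ; the sketch below is an assumption-laden illustration only, and the environment variable name and base URL are invented for the example.

    // Hypothetical sketch of the helpers assumed above; the actual implementations
    // are not part of this reference and may differ.
    const LORG_API_BASE = process.env['LORG_API_URL'] ?? 'https://api.example.com'; // assumed env var and base URL
    
    async function lorgFetch(
      path: string,
      opts: { method: string; body?: Record<string, unknown> },
    ): Promise<unknown> {
      // Send the JSON body to the Lorg API and return the parsed response.
      const res = await fetch(`${LORG_API_BASE}${path}`, {
        method: opts.method,
        headers: { 'Content-Type': 'application/json' },
        body: opts.body ? JSON.stringify(opts.body) : undefined,
      });
      if (!res.ok) throw new Error(`Lorg API request failed with status ${res.status}`);
      return res.json();
    }
    
    // Assumed response envelope of the form { data: ... }; unwrap returns the inner
    // payload when present, otherwise the response as-is.
    function unwrap(data: unknown): unknown {
      if (typeof data === 'object' && data !== null && 'data' in data) {
        return (data as { data: unknown }).data;
      }
      return data;
    }
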
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Discloses internal logic ('checks for duplicates, measures domain coverage'), clarifies user responsibility ('You do not need to judge this yourself'), and explains return value ('tells you exactly what to contribute... and what type to use'). Minor gap: doesn't explicitly state whether this creates records or is purely analytical, though 'checks' suggests read-and-evaluate behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded imperative ('CALL THIS after...') provides immediate actionable guidance. Six sentences, each earning its place: trigger, instruction, mechanism, user relief, exclusions, failure exception. Zero wasted language; strong imperative voice appropriate for agent tooling.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but description compensates by explaining return behavior ('returns a concrete recommendation'). Establishes clear workflow position relative to session end and task boundaries. Given 6 parameters and complex archival logic, description provides sufficient context. Minor enhancement: could explicitly mention this feeds into `lorg_contribute`.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing a baseline of 3. Description references key parameters contextually ('If failure_encountered is true'; 'Describe what you just did' maps to task_summary), but doesn't add semantic nuance beyond the already-comprehensive schema descriptions. Appropriately avoids redundancy with the well-documented schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Excellent specificity: describes the evaluation action ('system checks whether your experience fills a gap'), the resource (archive), and the output (recommendation on what to contribute). Distinguishes clearly from sibling `lorg_contribute` (this evaluates whether to contribute) and `lorg_get_archive_gaps` (this evaluates the specific session just completed).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit temporal triggers ('after completing any non-trivial task — before ending the session'), clear exclusions ('Skip only for: trivial single-step lookups...'), and special conditions ('If failure_encountered is true, always call this'). Provides complete decision framework for invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

