M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
ORCHESTRATOR_README.md
# Agent Validation Orchestrator

Automated testing and validation for agent preambles using LangChain + the GitHub Copilot API.

---

## 📁 Structure

```
src/orchestrator/
├── validate-agent.ts      # Main validation script
├── llm-client.ts          # GitHub Copilot client wrapper
├── report-generator.ts    # Report formatting
└── evaluators/
    └── index.ts           # LLM-as-judge evaluators
```

---

## 🚀 Usage

### Validate a Single Agent

```bash
npm run validate docs/agents/claudette-debug.md benchmarks/debug-benchmark.json
```

**Output**:

```
🔍 Validating agent: docs/agents/claudette-debug.md
📋 Benchmark: benchmarks/debug-benchmark.json

⚙️ Executing benchmark task...
✅ Task completed in 12,451 tokens

📊 Evaluating output against rubric...
   Bug Discovery: 32/35
   Root Cause Analysis: 18/20
   Methodology: 19/20
   Process Quality: 14/15
   Production Impact: 9/10

📈 Total score: 92/100
📄 Report saved to: validation-output/2025-10-15_claudette-debug.md
```

---

## 📦 Prerequisites

1. **GitHub Copilot proxy running**:
   ```bash
   copilot-api start &
   ```
2. **GitHub CLI authenticated**:
   ```bash
   gh auth status
   ```
3. **Dependencies installed**:
   ```bash
   npm install
   ```

---

## 🔧 Components

### `llm-client.ts`

- Wraps the GitHub Copilot Chat API via the copilot-api proxy
- Loads agent preambles as system prompts
- Executes benchmark tasks
- Captures conversation history and token usage

### `evaluators/index.ts`

- LLM-as-judge pattern for automated scoring
- Evaluates agent output against rubric categories
- Returns scores and feedback

### `report-generator.ts`

- Formats validation results as Markdown
- Includes scoring breakdown, agent output, and conversation history

### `validate-agent.ts`

- Main orchestration script and CLI entry point
- Ties all components together

---

## 📊 Output Files

### JSON (Raw Data)

```
validation-output/2025-10-15_claudette-debug.json
```

Contains:

- Timestamp
- Agent path
- Benchmark path
- Full conversation history
- Token usage
- Scores and feedback

### Markdown (Readable Report)

```
validation-output/2025-10-15_claudette-debug.md
```

Contains:

- Scoring breakdown
- Category feedback
- Agent output
- Token usage
- Full conversation transcript

---

## 🎯 Creating Benchmarks

See `benchmarks/debug-benchmark.json` for the example format:

```json
{
  "name": "Benchmark Name",
  "description": "What this tests",
  "task": "Instructions for the agent...",
  "rubric": {
    "categories": [
      {
        "name": "Category Name",
        "maxPoints": 35,
        "criteria": [
          "Criterion 1",
          "Criterion 2"
        ]
      }
    ]
  }
}
```

---

## 🐛 Troubleshooting

### "Connection refused" error

```bash
# Ensure copilot-api is running
copilot-api start &

# Check it's listening
curl http://localhost:4141/v1/models
```

### "Authentication failed" error

```bash
# Re-authenticate with the GitHub CLI
gh auth login

# Verify
gh auth status
```

### TypeScript errors

```bash
# Rebuild
npm run build

# Or run directly with ts-node
ts-node src/orchestrator/validate-agent.ts <agent> <benchmark>
```

---

## 📚 See Also

- **Setup Guide**: `tools/SETUP.md`
- **Full Design**: `docs/agents/VALIDATION_TOOL_DESIGN.md`
- **Benchmark Examples**: `benchmarks/`
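As described under `llm-client.ts` above, the client loads an agent preamble as the system prompt and then sends the benchmark task. The sketch below shows how such a request body could be assembled, assuming the copilot-api proxy exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the `buildChatRequest` helper and the model name are illustrative assumptions, not the repository's actual code.

```typescript
// Hypothetical sketch: assemble an OpenAI-style chat request whose
// system prompt is the agent preamble and whose user turn is the task.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

function buildChatRequest(
  preamble: string,
  task: string,
  model = "gpt-4" // illustrative model name
): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: preamble }, // agent preamble steers behavior
      { role: "user", content: task },       // benchmark task to execute
    ],
  };
}

// The request would then be POSTed to the proxy, e.g.:
// fetch("http://localhost:4141/v1/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildChatRequest(preamble, task)),
// });
```

Keeping request construction pure like this makes it easy to unit-test the system-prompt wiring without a running proxy.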
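The benchmark format shown above pairs each rubric category with a `maxPoints` budget. A small sketch of how a validator might type that schema and compute the maximum attainable score; the `Benchmark` types and `maxTotal` helper are illustrative assumptions, not the repository's actual code.

```typescript
// Hypothetical types mirroring the benchmark JSON format shown above.
interface RubricCategory {
  name: string;
  maxPoints: number;
  criteria: string[];
}

interface Benchmark {
  name: string;
  description: string;
  task: string;
  rubric: { categories: RubricCategory[] };
}

// Sum the per-category budgets to get the score ceiling.
function maxTotal(benchmark: Benchmark): number {
  return benchmark.rubric.categories.reduce((sum, c) => sum + c.maxPoints, 0);
}
```

For instance, the five categories in the sample run above (35 + 20 + 20 + 15 + 10) yield a 100-point ceiling, matching the reported "Total score: 92/100".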

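`report-generator.ts` is described above as formatting the scoring breakdown into Markdown. A minimal sketch of that idea, assuming per-category scores are already available from the evaluators; the `CategoryScore` shape and `renderScores` helper are hypothetical, not the repository's actual code.

```typescript
// Hypothetical sketch: render a scoring breakdown like the one in the
// sample console output as a Markdown table, with a computed total row.
interface CategoryScore {
  name: string;
  score: number;
  maxPoints: number;
}

function renderScores(scores: CategoryScore[]): string {
  const lines = [
    "| Category | Score |",
    "| --- | --- |",
    ...scores.map((s) => `| ${s.name} | ${s.score}/${s.maxPoints} |`),
  ];
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  const max = scores.reduce((sum, s) => sum + s.maxPoints, 0);
  lines.push(`| **Total** | **${total}/${max}** |`);
  return lines.join("\n");
}
```

Computing the total from the same array that renders the rows keeps the table and the headline score from drifting apart.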