M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

by orneryd
# Agent Validation Tools

Automated testing and validation for agent preambles using LangChain + the GitHub Copilot API.

---

## 🚀 Quick Start

```bash
# 1. Setup (10 minutes)
gh auth login
pip install langchain-github-copilot langchain-core
npm install @langchain/core @langchain/community langchain

# 2. Verify
python3 -c "from langchain_github_copilot import ChatGitHubCopilot; llm = ChatGitHubCopilot(); print('✅', llm.invoke('Hi').content)"

# 3. Build the validation tool
# See VALIDATION_TOOL_DESIGN.md for implementation code
```

---

## 📚 Documentation

- **[SETUP.md](./SETUP.md)** - 10-minute setup guide
- **[VALIDATION_TOOL_DESIGN.md](../docs/agents/VALIDATION_TOOL_DESIGN.md)** - Full implementation with code
- **[VALIDATION_SUMMARY.md](../docs/agents/VALIDATION_SUMMARY.md)** - Overview and architecture

---

## 🎯 What This Does

Automatically tests agent preambles by:

1. **Loading** the agent preamble as the system prompt
2. **Executing** a benchmark task via the GitHub Copilot API
3. **Capturing** the output and conversation history
4. **Scoring** the output against a rubric using LLM-as-judge
5. **Generating** detailed reports (JSON + Markdown)

---

## 🏗️ Architecture

```
TypeScript Tool → Python Bridge → GitHub Copilot API (GPT-4 + Claude)
```

**Why GitHub Copilot?**

- ✅ Uses your existing subscription (no new costs)
- ✅ High quality (GPT-4 + Claude models)
- ✅ Simple setup (just authenticate)
- ✅ Fast (cloud inference)

---

## 📦 Files to Create

```
tools/
├── llm-client.ts        # Copilot client (TypeScript → Python)
├── validate-agent.ts    # Main validation script
├── evaluators/
│   └── index.ts         # LLM-as-judge evaluators
└── report-generator.ts  # Report formatting
```

Full code is provided in `VALIDATION_TOOL_DESIGN.md`.
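The scoring step (step 4 above) can be sketched as a small aggregation helper. This is an illustrative sketch only: the `CriterionScore` shape is a hypothetical stand-in for the rubric schema defined in `VALIDATION_TOOL_DESIGN.md`, and the real evaluators call the LLM-as-judge to produce the per-criterion points.

```typescript
// Hypothetical rubric result shape -- the actual schema lives in
// VALIDATION_TOOL_DESIGN.md. Each criterion is scored by the
// LLM-as-judge evaluator before aggregation.
interface CriterionScore {
  name: string;
  score: number;    // points awarded by the judge
  maxScore: number; // points available for this criterion
}

// Aggregate per-criterion judge scores into the 0-100 total
// reported at the end of a validation run.
function totalScore(criteria: CriterionScore[]): number {
  const earned = criteria.reduce((sum, c) => sum + c.score, 0);
  const available = criteria.reduce((sum, c) => sum + c.maxScore, 0);
  return available === 0 ? 0 : Math.round((earned / available) * 100);
}

// Example: 92 of 100 available points -> total score 92
const example: CriterionScore[] = [
  { name: "correctness", score: 28, maxScore: 30 },
  { name: "reasoning", score: 37, maxScore: 40 },
  { name: "format", score: 27, maxScore: 30 },
];
```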
---

## 🎯 Usage Examples

### Validate a Single Agent

```bash
npm run validate docs/agents/claudette-debug.md benchmarks/debug-benchmark.json
```

### Test the Agentinator (Two-Hop)

```bash
npm run validate:agentinator -- \
  --agentinator docs/agents/claudette-agentinator.md \
  --requirement "Design debug agent" \
  --benchmark benchmarks/debug-benchmark.json \
  --baseline 92
```

---

## 📊 Output

### Terminal

```
🔍 Validating agent: claudette-debug.md
⚙️ Executing benchmark task...
✅ Task completed in 12,451 tokens
📊 Evaluating output against rubric...
📈 Total score: 92/100
📄 Report saved to: validation-output/2025-10-15_claudette-debug.md
```

### Files Generated

```
validation-output/
├── 2025-10-15_claudette-debug.json  # Raw data
└── 2025-10-15_claudette-debug.md    # Readable report
```

---

## ⏱️ Timeline

| Phase | Task | Time |
|-------|------|------|
| Setup | Authenticate + install | 10 min |
| Implement | Create tool files | 4 hours |
| Benchmarks | Define tasks + rubrics | 1 hour |
| Test | First validation | 30 min |
| **Total** | **Working system** | **5.5 hours** |

---

## 🔧 Requirements

- **Node.js** 18+ (for the TypeScript tool)
- **Python** 3.8+ (for the Copilot integration)
- **GitHub Copilot** subscription (you already have one)
- **GitHub CLI** (`gh`) for authentication

---

## 🚀 Next Steps

1. **Setup** (10 min): Run the commands in `SETUP.md`
2. **Implement** (4 hours): Copy the code from `VALIDATION_TOOL_DESIGN.md`
3. **Test** (30 min): Validate the `claudette-debug.md` baseline
4. **Iterate** (ongoing): Test Agentinator-generated agents

---

## 📖 See Also

- `docs/agents/AGENTIC_PROMPTING_FRAMEWORK.md` - Principles for agent design
- `docs/agents/claudette-agentinator.md` - Meta-agent that builds agents
- `docs/agents/claudette-debug.md` - Gold-standard debug agent (92/100)
- `benchmarks/RESEARCH_AGENT_BENCHMARK.md` - Benchmark example

---

**Status**: Design complete, ready for implementation.
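The dated report paths shown in the Output section (`validation-output/YYYY-MM-DD_<agent>.{json,md}`) follow directly from the agent filename and the run date. A minimal sketch of that naming logic, assuming the layout from the README (the `reportPaths` helper itself is hypothetical, not part of the tool's documented API):

```typescript
// Derive the JSON and Markdown report paths for a validation run,
// matching the layout shown under "Files Generated".
function reportPaths(agentFile: string, date: Date): { json: string; md: string } {
  // Strip the directory and the .md extension from the agent file.
  const base = agentFile.split("/").pop()!.replace(/\.md$/, "");
  // ISO date prefix, e.g. "2025-10-15".
  const stamp = date.toISOString().slice(0, 10);
  return {
    json: `validation-output/${stamp}_${base}.json`,
    md: `validation-output/${stamp}_${base}.md`,
  };
}

// Example from the README's terminal output:
const paths = reportPaths("docs/agents/claudette-debug.md", new Date("2025-10-15"));
// paths.md -> "validation-output/2025-10-15_claudette-debug.md"
```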
