# Agent Validation Tool Design (LangChain + GitHub Copilot API)

## πŸ“‹ Executive Summary

**Final Solution**: GitHub Copilot Chat API (not local LLM, not manual orchestration)

**Why This Approach**:
- βœ… **Leverages existing subscription** (no new costs)
- βœ… **High quality** (GPT-4 + Claude models)
- βœ… **Simple setup** (just authenticate, no 18 GB download)
- βœ… **Fast** (cloud inference)
- βœ… **Auto-updated** (always latest models)
- βœ… **Pure Node.js** (no Python needed!)

**Timeline**: ~6.5 hours end-to-end (5 min setup + 4 hrs build + ~2.5 hrs benchmarks and testing)

---

## Problem Statement

**Goal**: Automatically test agent preambles without manual orchestration.

**Requirements**:
- βœ… Load agent preambles as system prompts
- βœ… Execute benchmark tasks
- βœ… Capture outputs
- βœ… Score against rubrics
- βœ… Automated (no human-in-loop)
- βœ… Use existing GitHub Copilot access

**Solution**: LangChain + GitHub Copilot Chat API

---

## 🎯 Key Decision: Why LangChain + GitHub Copilot?

### Why Not Manual Testing?
- ❌ Slow (1-2 hours per agent)
- ❌ Not reproducible (human variance)
- ❌ Can't batch test
- ❌ No CI/CD integration

### Why Not Local LLM (Ollama)?
- ⚠️ Requires an ~18 GB model download
- ⚠️ Manual maintenance
- ⚠️ Lower quality than GPT-4/Claude
- βœ… **Copilot is better in every way**

### Why GitHub Copilot? βœ…
- βœ… Already have access (existing subscription)
- βœ… No setup (just authenticate)
- βœ… Best quality (GPT-4 + Claude)
- βœ… Fast (cloud inference)
- βœ… Auto-updated (always latest models)

---

## 🎯 WHY LANGCHAIN + GITHUB COPILOT

### What This Solution Provides

1. **GitHub Copilot Chat API Integration**
   - Use your existing Copilot subscription
   - Claude 3.5 Sonnet quality (Copilot serves GPT-4 and Claude models)
   - No additional API costs
   - Fast response times

2. **Agent Testing Framework** (LangChain)
   - Unit tests for components
   - Integration tests for full workflows
   - `agentevals` package for trajectory evaluation

3. **Orchestration** (LangChain)
   - Load custom system prompts (agent preambles)
   - Execute tasks programmatically
   - Capture conversation history

4. **Evaluation Tools**
   - LLM-as-judge for automated scoring
   - Custom evaluators for rubrics
   - Batch testing support

### Advantages Over Local LLM

| Feature | Local LLM (Ollama) | GitHub Copilot API |
|---------|-------------------|-------------------|
| **Quality** | Good (Qwen2.5 32B) | Excellent (GPT-4 + Claude) |
| **Setup** | Download ~18 GB model | Use existing auth |
| **Speed** | Medium (local inference) | Fast (cloud) |
| **Cost** | Free (local compute) | Included in Copilot |
| **Maintenance** | Manual updates | Auto-updated |

**Winner**: GitHub Copilot API - better quality, simpler setup, leverages existing subscription.

---

## πŸ—οΈ ARCHITECTURE

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Validation Tool (TypeScript)                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1. Load agent preamble from .md file          β”‚
β”‚ 2. Create LangChain agent with system         β”‚
β”‚    prompt = preamble content                  β”‚
β”‚ 3. Execute benchmark task                     β”‚
β”‚ 4. Capture output + conversation history      β”‚
β”‚ 5. Score with LLM-as-judge evaluator          β”‚
β”‚ 6. Generate report                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    ↓
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚ GitHub Copilot Chat API     β”‚
     β”‚ (GPT-4 + Claude models)     β”‚
     β”‚ - Uses existing Copilot     β”‚
     β”‚ - No setup required         β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    ↓
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚ Output                  β”‚
     β”‚ - Raw transcript        β”‚
     β”‚ - Scores (0-100)        β”‚
     β”‚ - Comparison reports    β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
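The six numbered steps map one-to-one onto `tools/validate-agent.ts` in the Implementation section. As a reading aid, here is a minimal sketch of that top-level loop; import paths assume the tool layout shown below, and error handling is omitted:

```typescript
// Preview of the validation flow (full implementations below).
import { CopilotAgentClient } from './tools/llm-client';
import { evaluateAgent } from './tools/evaluators';
import { generateReport } from './tools/report-generator';
import fs from 'fs/promises';

async function run(agentPath: string, benchmarkPath: string) {
  const benchmark = JSON.parse(await fs.readFile(benchmarkPath, 'utf-8'));
  const client = new CopilotAgentClient({ preamblePath: agentPath }); // 1-2: preamble becomes system prompt
  await client.loadPreamble(agentPath);
  const result = await client.execute(benchmark.task);                 // 3-4: run task, capture transcript
  const scores = await evaluateAgent(result.output, benchmark.rubric); // 5: LLM-as-judge scoring
  console.log(generateReport({ agent: agentPath, benchmark: benchmark.name, result, scores })); // 6: report
}
```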
---

## πŸ“¦ SETUP (One-Time)

### Step 1: Authenticate with GitHub Copilot

```bash
# Install GitHub CLI (if not already)
brew install gh

# Authenticate (one-time)
gh auth login

# Verify
gh auth status
```

### Step 2: Install Copilot API Proxy

```bash
cd /Users/timothysweet/src/GRAPH-RAG-TODO-main

# Install copilot-api globally (Pure Node.js!)
npm install -g copilot-api

# Start Copilot proxy server (runs in background)
copilot-api start &

# Verify it's running
curl http://localhost:4141/v1/models
```

**What this does**:
- Runs a local server that proxies the GitHub Copilot API
- Exposes it as an OpenAI-compatible endpoint at `http://localhost:4141`
- **No Python needed!** πŸŽ‰

### Step 3: Install LangChain Dependencies

```bash
cd /Users/timothysweet/src/GRAPH-RAG-TODO-main

# Install LangChain with OpenAI integration
npm install @langchain/core @langchain/openai langchain

# Install TypeScript tools
npm install -g ts-node typescript
```

### Step 4: Verify Setup

```bash
# Test with Node.js (dummy API key needed for OpenAI client, but not used by proxy)
node -e "const {ChatOpenAI} = require('@langchain/openai'); const llm = new ChatOpenAI({apiKey: 'dummy-key-not-used', configuration: {baseURL: 'http://localhost:4141/v1'}}); llm.invoke('Hello!').then(r => console.log('βœ… Copilot Response:', r.content));"

# Expected output: "βœ… Copilot Response: Hi! How can I assist you today?"
```

**Note**: The `apiKey` is required by the LangChain OpenAI client but is not actually used by the copilot-api proxy (which uses your `gh` authentication instead).
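One thing the one-liner glosses over: `ChatOpenAI` falls back to a default OpenAI model name, while the proxy only serves the models your Copilot plan exposes. A slightly longer check that pins the model explicitly might look like the sketch below; the `gpt-4o` id is an assumption, so substitute whatever `GET /v1/models` returned in Step 2:

```typescript
// verify-proxy.ts - minimal connectivity check against the copilot-api proxy.
// Assumes the proxy from Step 2 is running on localhost:4141.
import { ChatOpenAI } from '@langchain/openai';

async function main() {
  const llm = new ChatOpenAI({
    apiKey: 'dummy-key-not-used', // required by the client, ignored by the proxy
    model: 'gpt-4o',              // assumption: pick an id from GET /v1/models
    configuration: { baseURL: 'http://localhost:4141/v1' },
  });
  const response = await llm.invoke('Reply with the single word: pong');
  console.log('βœ… Proxy reachable:', response.content);
}

main().catch((err) => {
  console.error('❌ Proxy check failed - is copilot-api running?', err.message);
  process.exit(1);
});
```

Run it with `ts-node verify-proxy.ts` once the Step 3 dependencies are installed.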
---

## πŸ› οΈ IMPLEMENTATION

### Tool Structure

```
tools/
β”œβ”€β”€ validate-agent.ts        # Main validation script
β”œβ”€β”€ llm-client.ts            # GitHub Copilot + LangChain wrapper
β”œβ”€β”€ evaluators/
β”‚   β”œβ”€β”€ bug-discovery.ts     # Bug discovery evaluator
β”‚   β”œβ”€β”€ root-cause.ts        # Root cause analysis evaluator
β”‚   └── methodology.ts       # Methodology evaluator
└── report-generator.ts      # Report formatting

benchmarks/
β”œβ”€β”€ debug-benchmark.json     # Task + rubric
β”œβ”€β”€ research-benchmark.json
└── implementation-benchmark.json

validation-output/
β”œβ”€β”€ 2025-10-15_claudette-debug-v1.0.0.json   # Raw transcript
└── 2025-10-15_claudette-debug-v1.0.0.md     # Readable report
```

### Code: `tools/llm-client.ts`

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';
import fs from 'fs/promises';

export interface AgentConfig {
  preamblePath: string;
  temperature?: number;
  maxTokens?: number;
}

/**
 * Client for GitHub Copilot Chat API via copilot-api proxy
 * (Pure Node.js - no Python needed!)
 */
export class CopilotAgentClient {
  private llm: ChatOpenAI;
  private systemPrompt: string = '';

  constructor(config: AgentConfig) {
    // Use copilot-api proxy (OpenAI-compatible endpoint)
    this.llm = new ChatOpenAI({
      openAIApiKey: 'dummy-key-not-used', // Required by OpenAI client but not used by proxy
      configuration: {
        baseURL: 'http://localhost:4141/v1', // copilot-api proxy
      },
      temperature: config.temperature ?? 0.7, // ?? (not ||) so an explicit 0.0 is respected
      maxTokens: config.maxTokens ?? 8000,
    });
  }

  async loadPreamble(path: string): Promise<void> {
    this.systemPrompt = await fs.readFile(path, 'utf-8');
  }

  async execute(task: string): Promise<{
    output: string;
    conversationHistory: Array<{ role: string; content: string }>;
    tokens: { input: number; output: number };
  }> {
    const messages = [
      new SystemMessage(this.systemPrompt),
      new HumanMessage(task),
    ];

    const response = await this.llm.invoke(messages);

    return {
      output: response.content.toString(),
      conversationHistory: [
        { role: 'system', content: this.systemPrompt },
        { role: 'user', content: task },
        { role: 'assistant', content: response.content.toString() },
      ],
      tokens: {
        input: this.estimateTokens(this.systemPrompt + task),
        output: this.estimateTokens(response.content.toString()),
      },
    };
  }

  private estimateTokens(text: string): number {
    // Rough estimate: 1 token β‰ˆ 4 characters
    return Math.ceil(text.length / 4);
  }
}
```

### Code: `tools/validate-agent.ts`

```typescript
import { CopilotAgentClient } from './llm-client';
import { evaluateAgent } from './evaluators';
import { generateReport } from './report-generator';
import fs from 'fs/promises';
import path from 'path';

interface BenchmarkTask {
  name: string;
  description: string;
  task: string;
  rubric: {
    categories: Array<{
      name: string;
      maxPoints: number;
      criteria: string[];
    }>;
  };
}

// Exported so validate-agentinator.ts (and batch runners) can reuse it
export async function validateAgent(
  agentPath: string,
  benchmarkPath: string,
  outputDir: string
): Promise<void> {
  console.log(`\nπŸ” Validating agent: ${agentPath}`);
  console.log(`πŸ“‹ Benchmark: ${benchmarkPath}\n`);

  // 1. Load benchmark
  const benchmark: BenchmarkTask = JSON.parse(
    await fs.readFile(benchmarkPath, 'utf-8')
  );

  // 2. Initialize agent with GitHub Copilot
  const client = new CopilotAgentClient({
    preamblePath: agentPath,
    temperature: 0.0,
    maxTokens: 8000,
  });
  await client.loadPreamble(agentPath);

  // 3. Execute benchmark task
  console.log('βš™οΈ Executing benchmark task...');
  const result = await client.execute(benchmark.task);
  console.log(`βœ… Task completed in ${result.tokens.input + result.tokens.output} tokens\n`);

  // 4. Evaluate output
  console.log('πŸ“Š Evaluating output against rubric...');
  const scores = await evaluateAgent(result.output, benchmark.rubric);
  console.log(`πŸ“ˆ Total score: ${scores.total}/100\n`);

  // 5. Generate report
  const timestamp = new Date().toISOString().split('T')[0];
  const agentName = path.basename(agentPath, '.md');
  const outputPath = path.join(outputDir, `${timestamp}_${agentName}`);

  await fs.mkdir(outputDir, { recursive: true });

  // Save raw output
  await fs.writeFile(
    `${outputPath}.json`,
    JSON.stringify(
      {
        timestamp: new Date().toISOString(),
        agent: agentPath,
        benchmark: benchmarkPath,
        result,
        scores,
      },
      null,
      2
    )
  );

  // Save readable report
  const report = generateReport({
    agent: agentName,
    benchmark: benchmark.name,
    result,
    scores,
  });
  await fs.writeFile(`${outputPath}.md`, report);

  console.log(`πŸ“„ Report saved to: ${outputPath}.md`);
}

// CLI usage (guarded so importing this module doesn't trigger it)
if (require.main === module) {
  const [agentPath, benchmarkPath] = process.argv.slice(2);
  if (!agentPath || !benchmarkPath) {
    console.error('Usage: npm run validate <agent.md> <benchmark.json>');
    process.exit(1);
  }

  validateAgent(agentPath, benchmarkPath, 'validation-output').catch(console.error);
}
```
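Each benchmark file must match the `BenchmarkTask` interface above. A minimal `benchmarks/debug-benchmark.json` might look like this; the category names and point values mirror the example scores shown later in this document, but the task text and criteria are illustrative:

```json
{
  "name": "Debug Benchmark v1",
  "description": "Find and explain the bugs seeded in a small service.",
  "task": "You are given the following code... Identify every bug, explain the root cause of each, and describe how you would verify the fix.",
  "rubric": {
    "categories": [
      {
        "name": "Bug Discovery",
        "maxPoints": 35,
        "criteria": ["Identifies all seeded bugs", "Avoids false positives"]
      },
      {
        "name": "Root Cause Analysis",
        "maxPoints": 20,
        "criteria": ["Explains why each bug occurs, not just where"]
      },
      {
        "name": "Methodology",
        "maxPoints": 20,
        "criteria": ["Shows a systematic debugging process"]
      },
      {
        "name": "Process Quality",
        "maxPoints": 15,
        "criteria": ["Verifies fixes and cleans up instrumentation"]
      },
      {
        "name": "Production Impact",
        "maxPoints": 10,
        "criteria": ["Assesses severity and blast radius"]
      }
    ]
  }
}
```

Note that `maxPoints` across categories sums to 100 so the total score reads directly as a percentage.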
### Code: `tools/evaluators/index.ts`

```typescript
import { ChatOpenAI } from '@langchain/openai';

interface Rubric {
  categories: Array<{
    name: string;
    maxPoints: number;
    criteria: string[];
  }>;
}

interface Scores {
  categories: Record<string, number>;
  total: number;
  feedback: Record<string, string>;
}

export async function evaluateAgent(
  agentOutput: string,
  rubric: Rubric
): Promise<Scores> {
  // Use GitHub Copilot for evaluation (LLM-as-judge)
  const evaluator = new ChatOpenAI({
    openAIApiKey: 'dummy-key-not-used', // Required by OpenAI client but not used by proxy
    configuration: {
      baseURL: 'http://localhost:4141/v1', // copilot-api proxy
    },
    temperature: 0.0, // Deterministic scoring
  });

  const scores: Scores = {
    categories: {},
    total: 0,
    feedback: {},
  };

  // Evaluate each category
  for (const category of rubric.categories) {
    const evaluationPrompt = `
You are an expert evaluator. Score the following agent output against this rubric category:

**Category**: ${category.name} (Max: ${category.maxPoints} points)

**Criteria**:
${category.criteria.map((c, i) => `${i + 1}. ${c}`).join('\n')}

**Agent Output**:
${agentOutput}

**Instructions**:
1. Assign a score from 0 to ${category.maxPoints} based on how well the output meets the criteria.
2. Provide brief feedback explaining the score.
3. Format your response EXACTLY as:
SCORE: <number>
FEEDBACK: <explanation>
`.trim();

    const response = await evaluator.invoke(evaluationPrompt);
    const responseText = response.content.toString();

    // Parse score
    const scoreMatch = responseText.match(/SCORE:\s*(\d+)/);
    const feedbackMatch = responseText.match(/FEEDBACK:\s*(.+)/s);

    const score = scoreMatch ? parseInt(scoreMatch[1], 10) : 0;
    const feedback = feedbackMatch ? feedbackMatch[1].trim() : 'No feedback provided';

    scores.categories[category.name] = Math.min(score, category.maxPoints);
    scores.feedback[category.name] = feedback;
    scores.total += scores.categories[category.name];
  }

  return scores;
}
```
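Before wiring the judge into the full pipeline, it can be sanity-checked in isolation against a canned transcript. A possible smoke test, where the sample output and one-category rubric are made up for illustration:

```typescript
// judge-smoke-test.ts - exercises the LLM-as-judge on a canned transcript.
import { evaluateAgent } from './tools/evaluators';

async function main() {
  const cannedOutput = 'Bug 1: off-by-one in pagination loop. Root cause: ...';
  const rubric = {
    categories: [
      {
        name: 'Bug Discovery',
        maxPoints: 35,
        criteria: ['Identifies the pagination bug'],
      },
    ],
  };

  const scores = await evaluateAgent(cannedOutput, rubric);
  console.log(scores.categories, scores.feedback); // expect one scored category with feedback
}

main().catch(console.error);
```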
### Code: `tools/report-generator.ts`

```typescript
interface ReportData {
  agent: string;
  benchmark: string;
  result: {
    output: string;
    conversationHistory: Array<{ role: string; content: string }>;
    tokens: { input: number; output: number };
  };
  scores: {
    categories: Record<string, number>;
    total: number;
    feedback: Record<string, string>;
  };
}

export function generateReport(data: ReportData): string {
  return `
# Agent Validation Report

**Agent**: ${data.agent}
**Benchmark**: ${data.benchmark}
**Date**: ${new Date().toISOString().split('T')[0]}
**Total Score**: ${data.scores.total}/100

---

## Scoring Breakdown

${Object.entries(data.scores.categories)
  .map(
    ([category, score]) => `
### ${category}: ${score} points

**Feedback**: ${data.scores.feedback[category]}
`
  )
  .join('\n')}

---

## Agent Output

\`\`\`
${data.result.output}
\`\`\`

---

## Token Usage

- **Input**: ${data.result.tokens.input} tokens
- **Output**: ${data.result.tokens.output} tokens
- **Total**: ${data.result.tokens.input + data.result.tokens.output} tokens

---

## Conversation History

${data.result.conversationHistory
  .map(
    (msg) => `
### ${msg.role.toUpperCase()}

\`\`\`
${msg.content}
\`\`\`
`
  )
  .join('\n')}
`.trim();
}
```

---
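Phase 2 below wires these entry points up via `npm pkg set`. The resulting `scripts` block in `package.json` would look roughly like this; the `validate:batch`, `validate:compare`, and `validate:agentinator` file names are assumptions for the scripts referenced in the usage examples that follow:

```json
{
  "scripts": {
    "validate": "ts-node tools/validate-agent.ts",
    "validate:batch": "ts-node tools/validate-batch.ts",
    "validate:compare": "ts-node tools/validate-compare.ts",
    "validate:agentinator": "ts-node tools/validate-agentinator.ts"
  }
}
```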
## πŸ“‹ USAGE

### Validate Single Agent

```bash
npm run validate -- \
  docs/agents/claudette-debug.md \
  benchmarks/debug-benchmark.json
```

Output:

```
πŸ” Validating agent: docs/agents/claudette-debug.md
πŸ“‹ Benchmark: benchmarks/debug-benchmark.json

βš™οΈ Executing benchmark task...
βœ… Task completed in 12,451 tokens

πŸ“Š Evaluating output against rubric...
  Bug Discovery: 32/35
  Root Cause Analysis: 18/20
  Methodology: 19/20
  Process Quality: 14/15
  Production Impact: 9/10

πŸ“ˆ Total score: 92/100

πŸ“„ Report saved to: validation-output/2025-10-15_claudette-debug.md
```

### Batch Validate Multiple Agents

```bash
# Create batch validation script (a sketch appears after this section)
npm run validate:batch -- \
  docs/agents/claudette-debug.md \
  docs/agents/generated-debug-v1.md \
  docs/agents/generated-debug-v2.md
```

### Compare to Baseline

```bash
npm run validate:compare -- \
  --baseline docs/agents/claudette-debug.md \
  --candidate docs/agents/generated-debug-v1.md \
  --benchmark benchmarks/debug-benchmark.json
```

Output:

```
πŸ“Š Comparison Report

Baseline (claudette-debug):     92/100
Candidate (generated-debug-v1): 88/100
Delta: -4 points

Gaps:
- Bug Discovery: -3 pts (stopped after 6 bugs)
- Process Quality: -1 pt (missed cleanup)
```
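Neither `validate:batch` nor `validate:compare` is implemented above. A minimal sketch of the batch runner, reusing the exported `validateAgent`, could look like the following; the file name `tools/validate-batch.ts`, the hard-coded benchmark, and the sequential loop are all assumptions (agents could equally run in parallel with `Promise.all`):

```typescript
// tools/validate-batch.ts - hypothetical batch runner.
// Runs every agent passed on the CLI against one benchmark, sequentially
// to avoid hammering the proxy.
import { validateAgent } from './validate-agent';

async function main() {
  const agents = process.argv.slice(2);
  if (agents.length === 0) {
    console.error('Usage: npm run validate:batch -- <agent1.md> [agent2.md ...]');
    process.exit(1);
  }

  const benchmark = 'benchmarks/debug-benchmark.json'; // assumption: fixed benchmark
  for (const agent of agents) {
    await validateAgent(agent, benchmark, 'validation-output');
  }
}

main().catch(console.error);
```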
---

## 🎯 TESTING THE AGENTINATOR (Two-Hop)

### Automated Two-Hop Validation

```typescript
// tools/validate-agentinator.ts
import { CopilotAgentClient } from './llm-client';
import { validateAgent } from './validate-agent';
import fs from 'fs/promises';

async function validateAgentinator(
  agentinatorPath: string,
  requirement: string,
  benchmarkPath: string,
  baselineScore: number
): Promise<void> {
  console.log('πŸ” Two-Hop Validation: Agentinator\n');

  // Hop 1: Generate agent
  console.log('πŸ“ Hop 1: Generating agent with Agentinator...');
  const agentClient = new CopilotAgentClient({
    preamblePath: agentinatorPath,
  });
  await agentClient.loadPreamble(agentinatorPath);
  const generatedAgent = await agentClient.execute(requirement);

  // Save generated agent
  const generatedPath = 'generated-agents/debug-v1.md';
  await fs.writeFile(generatedPath, generatedAgent.output);
  console.log(`βœ… Agent generated: ${generatedPath}\n`);

  // Hop 2: Validate generated agent
  console.log('πŸ“Š Hop 2: Validating generated agent...');
  await validateAgent(generatedPath, benchmarkPath, 'validation-output');

  // Load scores
  const reportPath = `validation-output/${new Date().toISOString().split('T')[0]}_debug-v1.json`;
  const report = JSON.parse(await fs.readFile(reportPath, 'utf-8'));

  // Compare to baseline: pass if within 10 points below baseline (or better)
  const delta = report.scores.total - baselineScore;
  const success = delta >= -10;

  console.log(`\n🎯 Agentinator Validation Results:`);
  console.log(`   Baseline:  ${baselineScore}/100`);
  console.log(`   Generated: ${report.scores.total}/100`);
  console.log(`   Delta: ${delta > 0 ? '+' : ''}${delta} points`);
  console.log(`   Status: ${success ? 'βœ… PASS' : '❌ FAIL'}\n`);

  if (success) {
    console.log('βœ… Agentinator produces agents within 10 pts of baseline!');
  } else {
    console.log('❌ Agentinator needs improvement. Gap too large.');
  }
}
```

Usage:

```bash
npm run validate:agentinator -- \
  --agentinator docs/agents/claudette-agentinator.md \
  --requirement "Design a debug agent like claudette-debug" \
  --benchmark benchmarks/debug-benchmark.json \
  --baseline 92
```

---

## βœ… ADVANTAGES OF LANGCHAIN + GITHUB COPILOT

| Feature | Manual (Cursor) | LangChain + Copilot |
|---------|-----------------|---------------------|
| **Automation** | ❌ Manual | βœ… Fully automated |
| **Setup** | βœ… None needed | βœ… Just authenticate (5 min) |
| **Quality** | βœ… Claude Sonnet 4.5 | βœ… GPT-4 + Claude |
| **Batch Testing** | ❌ One at a time | βœ… Parallel |
| **Reproducibility** | ⚠️ Human variance | βœ… Deterministic |
| **Speed** | ⏱️ 1-2 hours/agent | ⚑ 5-10 min/agent |
| **Scoring** | ❌ Manual | βœ… LLM-as-judge |
| **CI/CD Integration** | ❌ No | βœ… Yes |
| **Cost** | βœ… Free (Cursor) | βœ… Included (Copilot) |

**Winner**: LangChain + GitHub Copilot for automated, fast, high-quality validation.

---

## πŸ“Š Quick Reference: Expected Outputs

### Single Agent Validation

```
πŸ” Validating agent: claudette-debug.md
βš™οΈ Executing benchmark task...
βœ… Task completed in 12,451 tokens
πŸ“Š Evaluating output against rubric...
  Bug Discovery: 32/35
  Root Cause Analysis: 18/20
  Methodology: 19/20
  Process Quality: 14/15
  Production Impact: 9/10
πŸ“ˆ Total score: 92/100
πŸ“„ Report saved to: validation-output/2025-10-15_claudette-debug.md
```

### Two-Hop Validation (Agentinator)

```
πŸ“ Hop 1: Generating agent...
βœ… Agent generated: generated-agents/debug-v1.md
πŸ“Š Hop 2: Validating generated agent...
πŸ“ˆ Generated agent score: 88/100
🎯 Delta: -4 points
βœ… PASS (within 10 pts of baseline)
```

---

## πŸš€ IMMEDIATE NEXT STEPS

### Phase 1: Setup (5 min) - Pure Node.js!

```bash
# Authenticate with GitHub Copilot
gh auth login

# Install copilot-api proxy (Pure Node.js!)
npm install -g copilot-api
copilot-api start &

# Install LangChain (TypeScript)
npm install @langchain/core @langchain/openai langchain

# Test connection
node -e "const {ChatOpenAI} = require('@langchain/openai'); const llm = new ChatOpenAI({apiKey: 'dummy-key-not-used', configuration: {baseURL: 'http://localhost:4141/v1'}}); llm.invoke('Hello!').then(r => console.log('βœ…', r.content));"
```

### Phase 2: Build Tool (4 hours)

```bash
# Create tool structure
mkdir -p tools/evaluators
touch tools/validate-agent.ts
touch tools/llm-client.ts
touch tools/evaluators/index.ts
touch tools/report-generator.ts

# Implement (copy code above)
# ...

# Add npm scripts to package.json
npm pkg set scripts.validate="ts-node tools/validate-agent.ts"
```

### Phase 3: Create Benchmarks (1 hour)

```bash
# Create benchmark specs
touch benchmarks/debug-benchmark.json

# Fill with task + rubric (structured JSON - see the sample benchmark above)
```

### Phase 4: Run First Validation (30 min)

```bash
# Test claudette-debug (baseline)
npm run validate -- docs/agents/claudette-debug.md benchmarks/debug-benchmark.json

# Review output
cat validation-output/2025-10-15_claudette-debug.md
```

### Phase 5: Test Agentinator (1 hour)

```bash
# Two-hop validation
npm run validate:agentinator -- \
  --agentinator docs/agents/claudette-agentinator.md \
  --requirement "Design a debug agent" \
  --benchmark benchmarks/debug-benchmark.json \
  --baseline 92
```
---

## πŸŽ“ WHY THIS IS THE CORRECT SOLUTION

**Your question**: "Why can't we use LangChain?"

**My answer**: **We absolutely should use LangChain!**

**Reasons**:
1. βœ… **No new API costs** - Runs through the existing Copilot subscription via a local proxy
2. βœ… **Automated** - No manual orchestration needed
3. βœ… **Industry standard** - LangChain is the de facto framework for agent testing
4. βœ… **Reproducible** - Same input β†’ same output
5. βœ… **Fast** - 5-10 min per agent vs. 1-2 hours manual
6. βœ… **CI/CD ready** - Can run in GitHub Actions
7. βœ… **LLM-as-judge** - Automated scoring against rubrics

**My initial mistake**: I dismissed LangChain too quickly without researching its orchestration and evaluation tools.

**Correct approach**: LangChain + GitHub Copilot is the right solution for automated, reproducible agent validation.

---

## πŸ“Š EXPECTED TIMELINE

| Phase | Task | Time |
|-------|------|------|
| 1 | Setup (Pure Node.js!) | 5 min |
| 2 | Build validation tool | 4 hours |
| 3 | Create benchmarks | 1 hour |
| 4 | Run first validation | 30 min |
| 5 | Test Agentinator | 1 hour |
| **Total** | **End-to-end working system** | **~6.5 hours** |

**Next step**: `gh auth login && npm install -g copilot-api && copilot-api start`
