# System Workflow - Based on Log Analysis

This document describes the actual workflow of the MCP server system based on observed logs. It shows how a user query flows through the system from input to output.

## System Initialization

When the server starts:

1. **Server Process Starts** - Uvicorn server initializes
2. **Agent Discovery** - System automatically discovers available agents
   - Discovers agent exports from the codebase
   - Found 2 agents: `InternalAgent` and `ExternalAgent`
3. **Agent Registration** - Each agent is registered with a unique ID:
   - `InternalAgent` → ID: `6ccf9095-09dc-55bd-b11f-91c64df1a532`
   - `ExternalAgent` → ID: `0939b671-5fb6-58f5-9e80-de95cd809d27`
4. **Server Ready** - Application startup complete, ready to accept requests

---

## Query Processing Workflow

### Step 0: Request Reception

**Log Evidence:**
```
🌐 SERVER: Received orchestrate request
   Query: [user's query text]
```

- Frontend sends a POST request to the `/orchestrate` endpoint
- Server receives the query and logs it
- Query is passed to the Orchestrator

---

### Step 1: Query Analysis & Agent Selection

**Log Evidence:**
```
🎯 ORCHESTRATOR: Starting to process query
🔍 STEP 1: Analyzing query and determining which agents to use...
📡 OLLAMA: Calling llama3:latest API...
   Prompt length: [X] chars
   System prompt: [Y] chars
✅ OLLAMA: Received response ([Z] chars)
✅ OLLAMA: Successfully parsed JSON response
✅ Query analysis complete!
   Agents needed: ['internal_agent', 'external_agent']
   Generated queries: {
     'internal_agent': '[optimized query 1]',
     'external_agent': '[optimized query 2]'
   }
```

**What Happens:**

1. Orchestrator receives the user query
2. Makes an LLM call to Ollama (using the `llama3:latest` model)
3. LLM analyzes the query and determines:
   - Which agents are needed (can be 0, 1, or multiple)
   - An optimized query for each selected agent
4. Response is parsed as JSON
   - If JSON parsing fails, fallback parsing is attempted
   - Fallback parsing extracts the agent names and queries from the malformed response

**Example from Logs:**

- Query: "can you tell me from my italy-xxx contract what i need to change for it to work in australia"
- Analysis Result:
  - Agents needed: `['internal_agent', 'external_agent']`
  - Internal agent query: "Find Italy-xxx contract terms and required changes for adaptation in Australia"
  - External agent query: "Retrieve Australian compliance standards and regional requirements for the adapted contract"
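As a rough illustration of Step 1 and the JSON fallback described above, here is a minimal Python sketch. The `analyze_query` function, the `llm_call` helper, and the exact prompt/response shapes are assumptions for illustration, not the project's actual code:

```python
import json
import re

def analyze_query(user_query: str, llm_call) -> dict:
    """Decide which agents to run and generate an optimized query for each.

    `llm_call` is a hypothetical helper that sends a prompt to Ollama
    (llama3:latest) and returns the raw response text.
    """
    raw = llm_call(
        system='Respond with JSON: {"agents": [...], "queries": {...}}',
        prompt=user_query,
    )
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback parsing: pull the first {...} block out of a response
        # that wrapped the JSON in extra prose, as the logs suggest.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise
```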
---

### Step 2: Agent Execution

**Log Evidence:**
```
🤖 STEP 2: Executing 2 agent(s)...
  → Executing internal_agent...
     Query: [optimized query for internal agent]
🔵 INTERNAL AGENT: Processing query...
📡 OLLAMA: Calling llama3:latest API...
✅ OLLAMA: Received response ([X] chars)
✅ INTERNAL AGENT: Got response ([X] chars)
  ✅ internal_agent completed successfully
  → Executing external_agent...
     Query: [optimized query for external agent]
🟢 EXTERNAL AGENT: Processing query...
📡 OLLAMA: Calling llama3:latest API...
✅ OLLAMA: Received response ([Y] chars)
✅ EXTERNAL AGENT: Got response ([Y] chars)
  ✅ external_agent completed successfully
```

**What Happens:**

1. Orchestrator executes agents **sequentially** (one after another)
2. For each agent:
   - Agent receives its optimized query
   - Agent makes its own LLM call to Ollama
   - Agent processes the response
   - Agent returns results to the orchestrator
3. All agent results are collected

**Execution Order:**

- Agents execute in the order determined in Step 1
- Each agent is independent and makes its own Ollama API call
- Agents can return responses of different sizes (observed: 1093 chars for internal, 1391 chars for external)

**Timing Observations:**

- Internal agent: ~13 seconds (16:30:55 → 16:31:08)
- External agent: ~15 seconds (16:31:08 → 16:31:23)
- Total agent execution: ~28 seconds for 2 agents

---

### Step 3: Result Synthesis

**Log Evidence:**
```
🔄 STEP 3: Comparing and synthesizing results...
📡 OLLAMA: Calling llama3:latest API...
   Prompt length: [X] chars (includes both agent results)
   System prompt: [Y] chars
✅ OLLAMA: Received response ([Z] chars)
✅ Result synthesis complete!
```

**What Happens:**

1. Orchestrator combines all agent results into a single prompt
2. Makes a final LLM call to Ollama with:
   - The original user query
   - Results from all executed agents
   - Instructions to compare and synthesize
3. LLM generates a comprehensive final answer
4. Response is typically longer than the individual agent responses (observed: 2970 chars vs 1093/1391)

**Timing Observations:**

- Synthesis takes ~38 seconds (16:31:23 → 16:32:02)
- This is the longest step, likely because it processes the most data
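Taken together, Steps 2 and 3 amount to a sequential loop followed by one final LLM call. A minimal sketch under the same assumptions as above (the `registry` of agent objects and their `run()` method are hypothetical interfaces, not taken from the codebase):

```python
def execute_and_synthesize(user_query: str, plan: dict,
                           registry: dict, llm_call) -> str:
    # Step 2: run each selected agent one after another (no parallelism);
    # every agent makes its own independent Ollama call inside run().
    results = {}
    for name in plan["agents"]:
        results[name] = registry[name].run(plan["queries"][name])

    # Step 3: hand the original query plus all collected agent results
    # back to the LLM with instructions to compare and synthesize.
    combined = "\n\n".join(
        f"## {name}\n{output}" for name, output in results.items()
    )
    return llm_call(
        system="Compare the agent results and synthesize one answer.",
        prompt=f"Question: {user_query}\n\nAgent results:\n{combined}",
    )
```

Because `plan["agents"]` is iterated in order, the agents run in exactly the order determined in Step 1, matching the sequential behavior observed in the logs.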
---

### Step 4: Response Return

**Log Evidence:**
```
✅ ORCHESTRATOR: Query processing complete!
🌐 SERVER: Returning response to client
INFO: 127.0.0.1:[port] - "POST /orchestrate HTTP/1.1" 200 OK
```

**What Happens:**

1. Orchestrator completes processing
2. Server returns the synthesized response to the frontend
3. HTTP 200 OK status indicates success
4. Frontend displays the response to the user

---

## Complete Timeline Example

Based on the logs for the query: "can you tell me from my italy-xxx contract what i need to change for it to work in australia"

| Time (start) | Step | Duration | Details |
|------|------|----------|---------|
| 16:30:46.130 | Request Received | - | Query received by server |
| 16:30:46.130 | Step 1: Analysis | ~9s | LLM determines agents needed |
| 16:30:55.264 | Step 2: Agent Exec | ~28s | Both agents execute sequentially |
| 16:30:55.264 | - Internal Agent | ~13s | Processes internal documents |
| 16:31:08.188 | - External Agent | ~15s | Processes external compliance |
| 16:31:23.843 | Step 3: Synthesis | ~38s | LLM synthesizes final answer |
| 16:32:02.030 | Step 4: Response | - | Response returned to client |
| **Total** | | **~76s** | End-to-end processing time |

---

## Key Observations from Logs

### 1. Error Handling
- JSON parsing failures are handled gracefully
- The fallback parsing mechanism recovers from malformed JSON
- The system continues processing even when initial parsing fails

### 2. Agent Selection
- The system can determine that 0 agents are needed (simple queries)
- The system can determine that multiple agents are needed (complex queries)
- Each agent gets an optimized, focused query

### 3. Sequential Execution
- Agents execute one after another (not in parallel)
- Each agent makes independent Ollama API calls
- Results are collected before synthesis

### 4. Response Sizes
- Analysis step: ~61-465 chars
- Agent responses: ~1093-1391 chars each
- Final synthesis: ~2490-2970 chars (largest)

### 5. Performance
- Total processing time: ~76 seconds for a complex query
- Most time is spent in the synthesis step (~38s)
- Agent execution: ~28s combined
- Analysis: ~9s

---

## Workflow Summary

```
User Query
    ↓
[Server Receives Request]
    ↓
[Step 1: Query Analysis]
    → LLM determines agents needed
    → Generates optimized queries
    ↓
[Step 2: Agent Execution]
    → Execute agents sequentially
    → Each agent calls LLM independently
    → Collect all results
    ↓
[Step 3: Result Synthesis]
    → Combine all agent results
    → LLM synthesizes comprehensive answer
    ↓
[Step 4: Response Return]
    → Return to frontend
    → Display to user
```

---

## Verification Status

✅ **System is Working Correctly**

Evidence from logs:

- ✅ Server starts and discovers agents successfully
- ✅ Queries are received and processed
- ✅ LLM calls are made successfully
- ✅ Agents execute and return results
- ✅ Results are synthesized into final answers
- ✅ Responses are returned to clients
- ✅ Error handling works (JSON parsing fallback)
- ✅ All steps complete successfully

The workflow is functioning as designed, processing queries through analysis, agent execution, and synthesis to produce comprehensive responses.
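The full workflow can be exercised end to end by calling the `/orchestrate` endpoint directly. A hedged Python example follows; the POST method and endpoint path come from the logs, but the host/port (Uvicorn's default bind) and the `query` field name in the request body are assumptions:

```python
import requests  # third-party HTTP client; any client would work

resp = requests.post(
    "http://127.0.0.1:8000/orchestrate",  # host/port assumed (Uvicorn default)
    json={"query": "can you tell me from my italy-xxx contract what i need "
                   "to change for it to work in australia"},  # field name assumed
    timeout=120,  # logged end-to-end time was ~76s, so allow headroom
)
resp.raise_for_status()
print(resp.json())  # assuming the synthesized answer comes back as JSON
```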
