MCP Ollama Consult Server
An intelligent MCP server for consulting with Ollama models and enabling multi-perspective AI reasoning
Overview
MCP Ollama Consult is a Model Context Protocol (MCP) server that enables AI agents to consult with multiple Ollama models for diverse perspectives, reasoning chains, and collaborative problem-solving. It provides powerful tools for sequential consultation workflows and persistent memory management.
Key Features
🤝 Multi-Model Consultation - Consult with any available Ollama model
📊 Model Comparison - Run identical prompts against multiple models simultaneously
🧠 Sequential Reasoning Chains - Execute complex multi-step reasoning workflows
💾 Persistent Memory - Store and retrieve consultation results across sessions
🔌 Flexible Integration - Works with any MCP-compatible client or framework
⚡ Timeout Management - Configurable timeouts for complex reasoning tasks
🎯 Demo Client - Built-in demo for testing and exploration
Installation
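A typical local install clones the repository and builds from source (the repository URL placeholder and npm scripts below are assumptions; check the project's package.json for the actual script names):

```shell
# Clone the repository and install dependencies
git clone <repository-url>
cd mcp-ollama-consult
npm install
npm run build
```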
Quick Start
As an MCP Server
Add to your MCP client configuration (e.g., Claude Desktop):
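A minimal Claude Desktop entry might look like the following (the server name, command, and path are assumptions; point them at wherever you built the server):

```json
{
  "mcpServers": {
    "ollama-consult": {
      "command": "node",
      "args": ["/path/to/mcp-ollama-consult/dist/index.js"]
    }
  }
}
```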
Running the Server
Make sure Ollama is running locally (default: http://localhost:11434).
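With Ollama up, the server can be launched directly (the entry point path is an assumption based on a standard TypeScript build layout):

```shell
# Start Ollama if it is not already running
ollama serve

# Launch the MCP server (entry point path is an assumption)
node dist/index.js
```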
Usage Examples
Basic Consultation
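A consult_ollama call might look like this sketch (the exact invocation envelope depends on your MCP client; the arguments follow the tool's parameters, and the model and prompt values are illustrative):

```json
{
  "tool": "consult_ollama",
  "arguments": {
    "model": "llama3.2",
    "prompt": "What are the trade-offs between REST and gRPC for internal services?"
  }
}
```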
Model Comparison
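To get side-by-side answers, compare_ollama_models takes an array of model names (the model names and prompt below are illustrative):

```json
{
  "tool": "compare_ollama_models",
  "arguments": {
    "models": ["llama3.2", "mistral"],
    "prompt": "Summarize the CAP theorem in two sentences."
  }
}
```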
Sequential Consultation Chain
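A chain is an ordered array of consultants; later prompts can reference earlier results with the {consultant_id} placeholder syntax. A sketch (models, ids, and prompts are illustrative):

```json
{
  "tool": "sequential_consultation_chain",
  "arguments": {
    "consultants": [
      {
        "id": "architect",
        "model": "llama3.2",
        "prompt": "Propose a caching strategy for a read-heavy API."
      },
      {
        "id": "critic",
        "model": "mistral",
        "prompt": "Find weaknesses in this proposal: {architect}",
        "timeoutMs": 180000
      }
    ]
  }
}
```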
Environment Variables
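The memory backend variables are documented in the Memory Configuration section below; a shell setup might look like this (OLLAMA_HOST is an assumption for pointing at a non-default Ollama instance):

```shell
export OLLAMA_HOST=http://localhost:11434
export MEMORY_DIR=/tmp/mcp-consult-memory
```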
Memory Configuration
The remember_consult tool supports flexible memory backend configuration. It attempts to connect to memory storage in this order:
1. REMEMBER_MCP_CONFIG environment variable (JSON config)
2. VS Code mcp.json entries (auto-detects remember/memory servers)
3. MEMORY_MCP_CMD/MEMORY_MCP_ARGS environment variables
4. Local file fallback at MEMORY_DIR (default: /tmp/mcp-consult-memory)
Example Memory Server Configuration
VS Code mcp.json (automatically detected):
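A memory server entry that the auto-detection could pick up might look like this (the server name matters for detection per the list above; the command and package are illustrative, using the official MCP memory server as an example):

```json
{
  "servers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
```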
Tool Reference
consult_ollama
Consult with a specific Ollama model.
Parameters:
- prompt (required): The consultation prompt
- model (required): Ollama model name (e.g., "llama3.2")
- context (optional): Additional context for the consultation
list_ollama_models
List all available models on the Ollama instance.
Parameters: None
compare_ollama_models
Run the same prompt against multiple models for comparison.
Parameters:
- prompt (required): The prompt to send to all models
- models (required): Array of model names to compare
- context (optional): Shared context for all models
remember_consult
Store consultation results in persistent memory.
Parameters:
- key (required): Unique identifier for the memory entry
- value (required): Content to store
- metadata (optional): Additional context about the stored data
sequential_consultation_chain
Execute multi-step reasoning chains where consultants build on previous responses.
Parameters:
- consultants (required): Array of consultant configurations; each consultant has:
  - id (required): Unique consultant identifier
  - model (required): Ollama model name
  - prompt (required): Consultation prompt (can reference previous consultants with {consultant_id})
  - timeoutMs (optional): Timeout in milliseconds (default: 120000)
Development
Project Structure
Running Tests
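Assuming the standard npm script names (check package.json for the project's actual scripts):

```shell
npm test
```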
Building
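Again assuming the conventional script name:

```shell
npm run build
```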
Sequential Consultation Chains
The sequential_consultation_chain tool enables complex multi-step reasoning by allowing consultants to reference and build upon previous responses. This creates powerful workflows for collaborative problem-solving.
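The {consultant_id} placeholder substitution described above can be sketched as a small helper. This is an illustrative model of the behavior, not the server's actual implementation; the function name and result shape are assumptions:

```typescript
// Sketch: how a chain step's prompt might interpolate earlier consultants'
// responses. Placeholders use the {consultant_id} syntax from the tool docs;
// unknown ids are left untouched so a typo is visible in the final prompt.
function fillPrompt(template: string, results: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match, id: string) =>
    id in results ? results[id] : match
  );
}

// After the "architect" consultant runs, its response feeds the next prompt:
const results = { architect: "Use a layered cache with short TTLs." };
const critiquePrompt = fillPrompt(
  "Find weaknesses in this proposal: {architect}",
  results
);
```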
Timeout Configuration
Configure timeouts based on task complexity:
Recommended timeouts:
Simple queries: 60-90 seconds
Code generation: 180-300 seconds
Complex analysis: 300-600 seconds
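For example, a consultant doing complex analysis could be given a five-minute budget via timeoutMs (values in milliseconds; the id, model, and prompt are illustrative):

```json
{
  "id": "analyzer",
  "model": "llama3.2",
  "prompt": "Perform a deep architectural review of the proposal above.",
  "timeoutMs": 300000
}
```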
Note: Avoid breaking complex questions into smaller parts, as this loses conversation context. Instead, increase the timeoutMs for consultants that need more processing time.
For detailed examples, see sequential_chain_demos.md.
Integration with Other MCP Tools
MCP Consult works seamlessly with other MCP servers:
mcp-optimist - Code optimization and analysis
mcp-tdd - Test-driven development workflows
Memory servers - Persistent data storage
Code analysis tools - Static analysis integration
Docker
Building the Image
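A typical build-and-run cycle, assuming a Dockerfile at the repository root (the image name and the OLLAMA_HOST variable are assumptions; host.docker.internal lets the container reach an Ollama server on the host):

```shell
docker build -t mcp-ollama-consult .
docker run --rm -e OLLAMA_HOST=http://host.docker.internal:11434 mcp-ollama-consult
```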
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Development Guidelines
Follow Test-Driven Development (TDD) practices
Maintain test coverage above 70%
Use TypeScript strict mode
Follow existing code style and formatting
Update documentation for new features
Architecture
For detailed technical architecture, see ARCHITECTURE.md.
License
MIT License - see LICENSE file for details.
Requirements
Node.js 18+
Ollama server running locally or accessible via HTTP
npm or pnpm for package management
Support
📖 Documentation
🐛 Issue Tracker
💬 Discussions
Links
Built with ❤️ using the Model Context Protocol and Test-Driven Development