Ollama MCP Server
A self-contained Model Context Protocol (MCP) server for local Ollama management, developed with Claude AI assistance. Features include listing local models, chatting, starting/stopping the server, and a 'local model advisor' to suggest the best local model for a given task. The server is designed to be a robust, dependency-free, and cross-platform tool for managing a local Ollama instance.
Current Testing Status
Currently tested on: Windows 11 with NVIDIA RTX 4090
Status: Beta on Windows, Other Platforms Need Testing
Cross-platform code: Ready for Linux and macOS but requires community testing
GPU support: NVIDIA fully tested; AMD, Intel, and Apple Silicon implemented but not yet validated
We welcome testers on different platforms and hardware configurations! Please report your experience via GitHub Issues.
Key Features
Self-Contained Architecture
Zero External Dependencies: No external MCP servers required
MIT License Ready: All code internally developed and properly licensed
Enterprise-Grade: Professional error handling with actionable troubleshooting
Universal Compatibility
Cross-Platform: Windows, Linux, macOS with automatic platform detection
Multi-GPU Support: NVIDIA, AMD, Intel detection with vendor-specific optimizations
Smart Installation Discovery: Automatic Ollama detection across platforms
Complete Local Ollama Management
Model Operations: List, suggest, and remove local models.
Server Control: Start and monitor the Ollama server with intelligent process management.
Direct Chat: Communicate with any locally installed model.
System Analysis: Assess hardware compatibility and monitor resources.
Quick Start
Installation
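The project does not prescribe a single install method in this section; the steps below are a minimal sketch, assuming you clone the repository and that it ships a standard Python package definition (the URL is a placeholder):

```bash
# Clone the repository (replace the placeholder URL with the actual project location)
git clone https://github.com/<owner>/ollama-mcp-server.git
cd ollama-mcp-server

# Create an isolated environment and install the package
python -m venv .venv && source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -e .                                     # assumes a pyproject.toml/setup at the repo root
```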
Configuration
Add to your MCP client configuration (e.g., Claude Desktop config.json):
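A minimal example for Claude Desktop's claude_desktop_config.json; the server key name ("ollama-mcp") and the python command are illustrative and should be adapted to your environment:

```json
{
  "mcpServers": {
    "ollama-mcp": {
      "command": "python",
      "args": ["C:\\path\\to\\ollama-mcp-server\\src\\ollama_mcp\\server.py"]
    }
  }
}
```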
Note: Adjust the path to match your installation directory. On Linux/macOS, use forward slashes: /path/to/ollama-mcp-server/src/ollama_mcp/server.py
Requirements
Python 3.10+ (required by MCP SDK dependency)
Ollama installed and accessible in PATH
MCP-compatible client (Claude Desktop, etc.)
Ollama Configuration Compatibility
This MCP server automatically respects your Ollama configuration. If you have customized your Ollama setup (e.g., changed the models folder via OLLAMA_MODELS environment variable), the MCP server will work seamlessly without any additional configuration.
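For example, pointing Ollama at a custom models directory requires no extra MCP-side configuration; the directories below are placeholders:

```bash
# Linux/macOS: custom models directory, picked up by Ollama and therefore by this server
export OLLAMA_MODELS=/data/ollama/models
ollama serve

# Windows (PowerShell)
$env:OLLAMA_MODELS = "D:\ollama\models"
ollama serve
```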
Available Tools
Model Management
list_local_models - List all locally installed models with their details.
local_llm_chat - Chat directly with any locally installed model.
remove_model - Safely remove a model from local storage.
suggest_models - Recommends the best locally installed model for a specific task (e.g., "suggest a model for coding").
Server and System Operations
start_ollama_server - Starts the Ollama server if it's not already running.
ollama_health_check - Performs a comprehensive health check of the Ollama server.
system_resource_check - Analyzes system hardware and resource availability.
Diagnostics
test_model_responsiveness - Checks the responsiveness of a specific local model by sending a test prompt, helping to diagnose performance issues.
select_chat_model - Presents a list of available local models to choose from before starting a chat.
How to Interact with Ollama-MCP
Ollama-MCP works through your MCP client (like Claude Desktop) - you don't interact with it directly. Instead, you communicate with your MCP client using natural language, and the client translates your requests into tool calls.
Basic Interaction Pattern
You speak to your MCP client in natural language, and it automatically uses the appropriate ollama-mcp tools:
Example Interactions
Model Management
"What models do I have installed?" ā
list_local_models"I need a model for creative writing, which of my models is best?" ā
suggest_models"Remove the old mistral model to save space" ā
remove_model
System Operations
"Start Ollama server" ā
start_ollama_server"Is my system capable of running large AI models?" ā
system_resource_check
AI Chat
"Chat with llama3.2: write a Python function to sort a list" ā
local_llm_chat"Use deepseek-coder to debug this code: [code snippet]" ā
local_llm_chat"Ask phi3.5 to explain quantum computing simply" ā
local_llm_chat
Key Points
No Direct Commands: You never call ollama_health_check() directly
Natural Language: Speak normally to your MCP client
Automatic Tool Selection: The client chooses the right tool based on your request (see the example call below)
Conversational: You can ask follow-up questions and the client maintains context
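Under the hood, the client turns a request like the ones above into a standard MCP tools/call message. The JSON-RPC framing below follows the MCP specification, while the argument names (model, message) are assumptions about this server's tool schema rather than confirmed parameters:

```json
{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "local_llm_chat",
    "arguments": {
      "model": "llama3.2",
      "message": "Write a Python function to sort a list"
    }
  }
}
```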
Real-World Use Cases
Daily Development Workflow
"I need to work on a coding project. Which of my local models is best for coding? Let's check its performance and then ask it a question."
This could trigger:
suggest_models - Recommends the best local model for "coding".
test_model_responsiveness - Checks if the recommended model is responsive.
local_llm_chat - Starts a chat with the model.
Model Management Session
"Show me what models I have and recommend one for writing a story. Then let's clean up any old models I don't need."
Triggers:
list_local_models - Current inventory of installed models.
suggest_models - Recommends a local model for "writing a story".
remove_model - Cleans up unwanted models.
Troubleshooting Session
"Ollama isn't working. Check what's wrong, try to fix it, and test with a simple chat."
Triggers:
ollama_health_check - Diagnoses the issue.
start_ollama_server - Attempts to start the server.
local_llm_chat - Verifies everything works with a test message.
Architecture
Design Principles
Self-Contained: Zero external MCP server dependencies
Fail-Safe: Comprehensive error handling with actionable guidance
Cross-Platform First: Universal Windows/Linux/macOS compatibility
Enterprise Ready: Professional-grade implementation and documentation
Technical Highlights
Internal Process Management: Advanced subprocess handling with timeout control
Multi-GPU Detection: Platform-specific GPU identification without confusing metrics
Intelligent Model Selection: Fallback to first available model when none specified
Progressive Health Monitoring: Smart server startup detection with detailed feedback (a sketch follows this list)
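As an illustration of the progressive health monitoring idea, here is a minimal sketch that polls the default Ollama endpoint until it responds or a deadline expires. The function name and polling strategy are illustrative, not the project's actual internals, and the example assumes the httpx package:

```python
# Sketch of progressive server-startup monitoring: poll the local Ollama API
# until it answers or the deadline passes.
import asyncio
import httpx

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

async def wait_for_ollama(timeout: float = 15.0, interval: float = 0.5) -> bool:
    """Return True once the Ollama server responds, False if the deadline expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    async with httpx.AsyncClient() as client:
        while loop.time() < deadline:
            try:
                resp = await client.get(f"{OLLAMA_URL}/api/version", timeout=2.0)
                if resp.status_code == 200:
                    return True  # server is up and answering
            except httpx.HTTPError:
                pass  # not reachable yet; keep polling
            await asyncio.sleep(interval)
    return False

if __name__ == "__main__":
    print("Ollama reachable:", asyncio.run(wait_for_ollama()))
```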
System Compatibility
Operating Systems
Windows: Full support with auto-detection in Program Files and AppData ✅ Tested
Linux: XDG configuration support with package manager integration ⚠️ Needs Testing
macOS: Homebrew detection with Apple Silicon GPU support ⚠️ Needs Testing
GPU Support
NVIDIA: Full detection via nvidia-smi with memory and utilization info ✅ Tested on RTX 4090 (see the sketch after this list)
AMD: ROCm support via vendor-specific tools ⚠️ Needs Testing
Intel: Basic detection via system tools ⚠️ Needs Testing
Apple Silicon: M1/M2/M3 detection with unified memory handling ⚠️ Needs Testing
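For reference, the NVIDIA path above can be as simple as querying nvidia-smi for device names. This is a sketch of that approach, not the project's actual detection code:

```python
# Query GPU names via nvidia-smi; returns an empty list when the tool is absent.
import shutil
import subprocess

def detect_nvidia_gpus() -> list[str]:
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver/tooling on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, timeout=10,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    print(detect_nvidia_gpus() or "No NVIDIA GPU detected")
```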
Hardware Requirements
Minimum: 4GB RAM, 2GB free disk space
Recommended: 8GB+ RAM, 10GB+ free disk space
GPU: Optional but recommended for model acceleration
Development
Project Structure
Key Technical Achievements
Self-Contained Implementation
Challenge: Eliminating the external desktop-commander dependency
Solution: Internal process management with advanced subprocess handling
Result: Zero external MCP dependencies, MIT license compatible
Intelligent GPU Detection
Challenge: Complex VRAM reporting causing user confusion
Solution: Simplified to GPU name display only
Result: Clean, reliable hardware identification
Enterprise Error Handling
Implementation: 6-level exception framework with specific error types (sketched below)
Coverage: Platform-specific errors, process failures, network issues
UX: Actionable troubleshooting steps for every error scenario
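The class names below are hypothetical; they only sketch what a layered exception framework with actionable hints can look like, not the server's actual API:

```python
# Hypothetical sketch of a layered exception framework with troubleshooting hints.
class OllamaMCPError(Exception):
    """Base error: every failure carries an actionable troubleshooting hint."""
    def __init__(self, message: str, hint: str = "") -> None:
        super().__init__(message)
        self.hint = hint

class PlatformError(OllamaMCPError):
    """Platform-specific failures (unsupported OS, missing install paths)."""

class InstallationNotFoundError(PlatformError):
    """The Ollama binary could not be located on this system."""

class ProcessError(OllamaMCPError):
    """Subprocess management failures (start, stop, timeout)."""

class NetworkError(OllamaMCPError):
    """The Ollama HTTP API is unreachable or returned an error."""
```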
Contributing
We welcome contributions! Areas where help is especially appreciated:
Platform Testing: Different OS and hardware configurations (high priority)
GPU Vendor Support: Additional vendor-specific detection
Performance Optimization: Startup time and resource usage improvements
Documentation: Usage examples and integration guides
Testing: Edge cases and error condition validation
Immediate Testing Needs
Linux: Ubuntu, Fedora, Arch with various GPU configurations
macOS: Intel and Apple Silicon Macs with different Ollama installations
GPU Vendors: AMD ROCm, Intel Arc, Apple unified memory
Edge Cases: Different Python versions, various Ollama installation methods
Development Setup
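A plausible flow, assuming the repository uses a standard pyproject.toml (the dev extra is an assumption) and a pytest-based test suite:

```bash
git clone https://github.com/<owner>/ollama-mcp-server.git
cd ollama-mcp-server
python -m venv .venv && source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -e ".[dev]"   # fall back to `pip install -e .` if there is no dev extra
pytest                    # run the test suite
```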
Troubleshooting
Common Issues
Ollama Not Found
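First confirm that the Ollama CLI is installed and visible on PATH:

```bash
# Verify the Ollama CLI is installed and on PATH
ollama --version

# If the command is not found, install Ollama (https://ollama.com) or add its
# installation directory to PATH, then restart your MCP client.
```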
Server Startup Failures
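Running the server by hand usually surfaces the underlying error, and a port check rules out conflicts on Ollama's default port 11434:

```bash
# Start the server manually to see its full error output
ollama serve

# Check whether another process already holds the default port
lsof -i :11434                   # Linux/macOS
netstat -ano | findstr 11434     # Windows
```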
Permission Issues
Windows: Run as Administrator if needed
Linux/macOS: Check user permissions for service management
Platform-Specific Issues
If you encounter issues on Linux or macOS, please report them via GitHub Issues with:
Operating system and version
Python version
Ollama version and installation method
GPU hardware (if applicable)
Complete error output
Performance
Typical Response Times (Windows RTX 4090)
Health Check: <500ms
Model List: <1 second
Server Start: 1-15 seconds (hardware dependent)
Model Chat: 2-30 seconds (model and prompt dependent)
Resource Usage
Memory: <50MB for MCP server process
CPU: Minimal when idle, scales with operations
Storage: Configuration files and logs only
Security
Data Flow: User → MCP Client (Claude) → ollama-mcp-server → Local Ollama → back through the chain
About This Project
This is my first MCP server, created by adapting a personal tool I had developed for my own Ollama management needs.
The Problem I Faced
I started using Claude to interact with Ollama because it allows me to use natural language instead of command-line interfaces. Claude also provides capabilities that Ollama alone doesn't have, particularly intelligent model suggestions based on both my system capabilities and specific needs.
My Solution
I built this MCP server to streamline my own workflow, and then refined it into a stable tool that others might find useful. The design reflects real usage patterns:
Self-contained: No external dependencies that can break
Intelligent error handling: Clear guidance when things go wrong
Cross-platform: Works consistently across different environments
Practical tools: Features I actually use in daily work
Design Philosophy
I initially developed this for my personal use to manage Ollama models more efficiently. When the MCP protocol became available, I transformed my personal tool into an MCP server to share it with others who might find it useful.
Development Approach: This project was developed with Claude using "vibe coding" - an iterative, conversational development process where AI assistance helped refine both the technical implementation and user experience. It's a practical example of AI-assisted development creating tools for AI management. Jules was also involved in the final refactoring phase.
License
MIT License - see LICENSE file for details.
Acknowledgments
Ollama Team: For the excellent local AI platform
MCP Project: For the Model Context Protocol specification
Claude Desktop/Code by Anthropic: Used as tools for MCP client integration, testing, and refactoring
Jules by Google: Used as a tool during refactoring
Support
Bug Reports: GitHub Issues
Feature Requests: GitHub Issues
Community Discussion: GitHub Discussions
Changelog
v0.9.0 (August 17, 2025): Critical bugfix release - Fixed datetime serialization issue that prevented model listing from working with Claude Desktop. All 9 tools now verified working correctly.
August 2025: Project refactoring and enhancements. Overhauled the architecture for modularity, implemented a fully asynchronous client, added a test suite, and refined the tool logic based on a "local-first" philosophy.
July 2025: Initial version created by Paolo Dalprato with Claude AI assistance.
For detailed changes, see CHANGELOG.md.
Status: Beta on Windows, Other Platforms Need Testing
Testing: Windows 11 + RTX 4090 validated, Linux/macOS require community validation
License: MIT
Dependencies: Zero external MCP servers required