# Ollama MCP Server

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)

A self-contained **Model Context Protocol (MCP) server** for local Ollama management, developed with Claude AI assistance. Features include listing local models, chatting, starting and monitoring the server, and a "local model advisor" that suggests the best local model for a given task. The server is designed to be a robust, dependency-free, cross-platform tool for managing a local Ollama instance.

## ⚠️ Current Testing Status

- **Currently tested on**: Windows 11 with NVIDIA RTX 4090
- **Status**: Beta on Windows; other platforms need testing
- **Cross-platform code**: Ready for Linux and macOS but requires community testing
- **GPU support**: NVIDIA fully tested; AMD/Intel/Apple Silicon implemented but needs validation

We welcome testers on different platforms and hardware configurations! Please report your experience via GitHub Issues.

## 🎯 Key Features

### 🔧 **Self-Contained Architecture**
- **Zero External Dependencies**: No external MCP servers required
- **MIT License Ready**: All code internally developed and properly licensed
- **Enterprise-Grade**: Professional error handling with actionable troubleshooting

### 🌐 **Universal Compatibility**
- **Cross-Platform**: Windows, Linux, macOS with automatic platform detection
- **Multi-GPU Support**: NVIDIA, AMD, Intel detection with vendor-specific optimizations
- **Smart Installation Discovery**: Automatic Ollama detection across platforms

### ⚡ **Complete Local Ollama Management**
- **Model Operations**: List, suggest, and remove local models.
- **Server Control**: Start and monitor the Ollama server with intelligent process management.
- **Direct Chat**: Communicate with any locally installed model.
- **System Analysis**: Assess hardware compatibility and monitor resources.

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/paolodalprato/ollama-mcp-server.git
cd ollama-mcp-server
pip install -e .
```

### Configuration

Add to your MCP client configuration (e.g., Claude Desktop `config.json`):

```json
{
  "mcpServers": {
    "ollama-mcp": {
      "command": "python",
      "args": [
        "X:\\PATH_TO\\ollama-mcp-server\\src\\ollama_mcp\\server.py"
      ],
      "env": {}
    }
  }
}
```

**Note**: Adjust the path to match your installation directory. On Linux/macOS, use forward slashes: `/path/to/ollama-mcp-server/src/ollama_mcp/server.py`

### Requirements

- **Python 3.10+** (required by the MCP SDK dependency)
- **Ollama installed** and accessible in PATH
- **MCP-compatible client** (Claude Desktop, etc.)

### Ollama Configuration Compatibility

This MCP server automatically respects your Ollama configuration. If you have customized your Ollama setup (e.g., changed the models folder via the `OLLAMA_MODELS` environment variable), the MCP server will work seamlessly without any additional configuration.

## 🛠️ Available Tools

### **Model Management**
- `list_local_models` - List all locally installed models with their details.
- `local_llm_chat` - Chat directly with any locally installed model.
- `remove_model` - Safely remove a model from local storage.
- `suggest_models` - Recommends the best **locally installed** model for a specific task (e.g., "suggest a model for coding").

### **Server and System Operations**
- `start_ollama_server` - Starts the Ollama server if it's not already running.
- `ollama_health_check` - Performs a comprehensive health check of the Ollama server.
- `system_resource_check` - Analyzes system hardware and resource availability.

### **Diagnostics**
- `test_model_responsiveness` - Checks the responsiveness of a specific local model by sending a test prompt, helping to diagnose performance issues.
- `select_chat_model` - Presents a list of available local models to choose from before starting a chat.
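You will normally reach these tools through natural-language requests in your MCP client (see the next section), but they can also be called programmatically, which is handy when validating the server on a new platform. The following is a minimal sketch assuming the official `mcp` Python SDK and an example install path (adjust it to your checkout); the exact response formatting depends on the server version.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Example location only; point this at your own checkout.
SERVER = StdioServerParameters(
    command="python",
    args=["/path/to/ollama-mcp-server/src/ollama_mcp/server.py"],
)


async def main() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools listed above, with their input schemas.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # List locally installed models.
            models = await session.call_tool("list_local_models", {})
            print(models.content)

            # Chat with a local model; argument names follow the README examples.
            reply = await session.call_tool(
                "local_llm_chat",
                {"model": "llama3.2", "message": "Explain machine learning briefly."},
            )
            print(reply.content)


asyncio.run(main())
```

The same calls are what an MCP client such as Claude Desktop issues on your behalf when you phrase the request in natural language.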
## 💬 How to Interact with Ollama-MCP

Ollama-MCP works **through your MCP client** (like Claude Desktop) - you don't interact with it directly. Instead, you communicate with your MCP client using **natural language**, and the client translates your requests into tool calls.

### **Basic Interaction Pattern**

You speak to your MCP client in natural language, and it automatically uses the appropriate ollama-mcp tools:

```
You: "List my installed Ollama models"
→ Client calls: list_local_models
→ You get: Formatted list of your models

You: "Chat with llama3.2: explain machine learning"
→ Client calls: local_llm_chat with model="llama3.2" and message="explain machine learning"
→ You get: AI response from your local model

You: "Check if Ollama is running"
→ Client calls: ollama_health_check
→ You get: Server status and troubleshooting if needed
```

### **Example Interactions**

#### **Model Management**
- *"What models do I have installed?"* → `list_local_models`
- *"I need a model for creative writing, which of my models is best?"* → `suggest_models`
- *"Remove the old mistral model to save space"* → `remove_model`

#### **System Operations**
- *"Start Ollama server"* → `start_ollama_server`
- *"Is my system capable of running large AI models?"* → `system_resource_check`

#### **AI Chat**
- *"Chat with llama3.2: write a Python function to sort a list"* → `local_llm_chat`
- *"Use deepseek-coder to debug this code: [code snippet]"* → `local_llm_chat`
- *"Ask phi3.5 to explain quantum computing simply"* → `local_llm_chat`

### **Key Points**
- **No Direct Commands**: You never call `ollama_health_check()` directly
- **Natural Language**: Speak normally to your MCP client
- **Automatic Tool Selection**: The client chooses the right tool based on your request
- **Conversational**: You can ask follow-up questions and the client maintains context

## 🎯 Real-World Use Cases

### **Daily Development Workflow**

*"I need to work on a coding project. Which of my local models is best for coding? Let's check its performance and then ask it a question."*

This could trigger:
1. `suggest_models` - Recommends the best local model for "coding".
2. `test_model_responsiveness` - Checks if the recommended model is responsive.
3. `local_llm_chat` - Starts a chat with the model.

### **Model Management Session**

*"Show me what models I have and recommend one for writing a story. Then let's clean up any old models I don't need."*

Triggers:
1. `list_local_models` - Current inventory
2. `suggest_models` - Recommends a local model for "writing a story".
3. `remove_model` - Clean up unwanted models.

### **Troubleshooting Session**

*"Ollama isn't working. Check what's wrong, try to fix it, and test with a simple chat."*

Triggers:
1. `ollama_health_check` - Diagnose issues
2. `start_ollama_server` - Attempt to start the server
3. `local_llm_chat` - Verify it's working with a test message
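If you are testing the server outside an MCP client, the daily-development workflow above can also be driven over the `ClientSession` from the earlier sketch. This is only an illustration: the `task` and `model` argument names for `suggest_models` and `test_model_responsiveness` are assumptions, so check the tool schemas returned by `list_tools()` for the authoritative names.

```python
from mcp import ClientSession


async def daily_coding_workflow(session: ClientSession) -> None:
    """Drive the three-step 'Daily Development Workflow' over an open MCP session."""
    # 1. Ask which installed model suits coding. The "task" argument name is a
    #    guess; inspect the schema from session.list_tools() for the real one.
    suggestion = await session.call_tool("suggest_models", {"task": "coding"})
    print(suggestion.content)

    # 2. Probe the responsiveness of the model you picked from the suggestion.
    #    The "model" argument name is likewise assumed from the tool description.
    probe = await session.call_tool(
        "test_model_responsiveness", {"model": "deepseek-coder"}
    )
    print(probe.content)

    # 3. Chat with it; these argument names come from the README examples above.
    answer = await session.call_tool(
        "local_llm_chat",
        {"model": "deepseek-coder", "message": "Write a Python function to sort a list."},
    )
    print(answer.content)
```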
## 🏗️ Architecture

### **Design Principles**
- **Self-Contained**: Zero external MCP server dependencies
- **Fail-Safe**: Comprehensive error handling with actionable guidance
- **Cross-Platform First**: Universal Windows/Linux/macOS compatibility
- **Enterprise Ready**: Professional-grade implementation and documentation

### **Technical Highlights**
- **Internal Process Management**: Advanced subprocess handling with timeout control
- **Multi-GPU Detection**: Platform-specific GPU identification without confusing metrics
- **Intelligent Model Selection**: Falls back to the first available model when none is specified
- **Progressive Health Monitoring**: Smart server startup detection with detailed feedback

## 📋 System Compatibility

### **Operating Systems**
- **Windows**: Full support with auto-detection in Program Files and AppData ✅ **Tested**
- **Linux**: XDG configuration support with package manager integration ⚠️ **Needs Testing**
- **macOS**: Homebrew detection with Apple Silicon GPU support ⚠️ **Needs Testing**

### **GPU Support**
- **NVIDIA**: Full detection via nvidia-smi with memory and utilization info ✅ **Tested RTX 4090**
- **AMD**: ROCm support via vendor-specific tools ⚠️ **Needs Testing**
- **Intel**: Basic detection via system tools ⚠️ **Needs Testing**
- **Apple Silicon**: M1/M2/M3 detection with unified memory handling ⚠️ **Needs Testing**

### **Hardware Requirements**
- **Minimum**: 4GB RAM, 2GB free disk space
- **Recommended**: 8GB+ RAM, 10GB+ free disk space
- **GPU**: Optional but recommended for model acceleration

## 🔧 Development

### **Project Structure**

```
ollama-mcp-server/
├── src/
│   ├── __init__.py              # Defines the package version
│   └── ollama_mcp/
│       ├── __init__.py          # Makes 'ollama_mcp' a package
│       ├── server.py            # Main MCP server implementation
│       ├── client.py            # Ollama API client
│       ├── config.py            # Configuration management
│       ├── model_manager.py     # Local model operations
│       ├── hardware_checker.py  # System hardware analysis
│       └── ...                  # (and other modules)
├── tests/
│   ├── test_client.py           # Unit tests for the client
│   └── test_tools.py            # Integration tests for tools
├── .gitignore                   # Specifies intentionally untracked files
└── pyproject.toml               # Project configuration and dependencies
```

### **Key Technical Achievements**

#### **Self-Contained Implementation**
- **Challenge**: Eliminate the external `desktop-commander` dependency
- **Solution**: Internal process management with advanced subprocess handling
- **Result**: Zero external MCP dependencies, MIT license compatible

#### **Intelligent GPU Detection**
- **Challenge**: Complex VRAM reporting causing user confusion
- **Solution**: Simplified to GPU name display only
- **Result**: Clean, reliable hardware identification

#### **Enterprise Error Handling**
- **Implementation**: 6-level exception framework with specific error types
- **Coverage**: Platform-specific errors, process failures, network issues
- **UX**: Actionable troubleshooting steps for every error scenario
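To make the process-management and progressive-health-monitoring ideas concrete, here is a simplified, illustrative sketch of launching `ollama serve` and polling Ollama's local HTTP API on its default port 11434 until it answers. This is not the project's actual implementation; the `/api/tags` endpoint is Ollama's standard model-list route, but the timeouts and poll interval are arbitrary assumptions.

```python
import subprocess
import time
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/tags"  # Ollama's model-list endpoint


def ollama_is_healthy(timeout: float = 2.0) -> bool:
    """Return True if the local Ollama server answers its HTTP API."""
    try:
        with urllib.request.urlopen(OLLAMA_URL, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def start_ollama(max_wait: float = 15.0) -> bool:
    """Launch `ollama serve` and poll until it responds or the wait expires."""
    if ollama_is_healthy():
        return True  # Already running; nothing to do.

    proc = subprocess.Popen(
        ["ollama", "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if ollama_is_healthy():
            return True
        if proc.poll() is not None:  # Process exited early: startup failed.
            return False
        time.sleep(0.5)
    return False


if __name__ == "__main__":
    print("Ollama healthy:", start_ollama())
```

The 1-15 second startup window quoted in the Performance section below is why a polling loop, rather than a single check, is needed.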
## 🤝 Contributing

We welcome contributions! Areas where help is especially appreciated:

- **Platform Testing**: Different OS and hardware configurations ⭐ **High Priority**
- **GPU Vendor Support**: Additional vendor-specific detection
- **Performance Optimization**: Startup time and resource usage improvements
- **Documentation**: Usage examples and integration guides
- **Testing**: Edge cases and error condition validation

### **Immediate Testing Needs**
- **Linux**: Ubuntu, Fedora, Arch with various GPU configurations
- **macOS**: Intel and Apple Silicon Macs with different Ollama installations
- **GPU Vendors**: AMD ROCm, Intel Arc, Apple unified memory
- **Edge Cases**: Different Python versions, various Ollama installation methods

### **Development Setup**

```bash
git clone https://github.com/paolodalprato/ollama-mcp-server.git
cd ollama-mcp-server

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Code formatting
black src/
isort src/

# Type checking
mypy src/
```

## 🐛 Troubleshooting

### **Common Issues**

#### **Ollama Not Found**
```bash
# Verify Ollama installation
ollama --version

# Check PATH configuration
which ollama  # Linux/macOS
where ollama  # Windows
```

#### **Server Startup Failures**
```bash
# Check port availability
netstat -an | grep 11434

# Manual server start for debugging
ollama serve
```

#### **Permission Issues**
- **Windows**: Run as Administrator if needed
- **Linux/macOS**: Check user permissions for service management

### **Platform-Specific Issues**

If you encounter issues on Linux or macOS, please report them via GitHub Issues with:
- Operating system and version
- Python version
- Ollama version and installation method
- GPU hardware (if applicable)
- Complete error output
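The shell commands under Common Issues assume a Unix-like environment (`grep` is not available in a plain Windows prompt). When preparing a report, the small cross-platform sketch below gathers several of the details requested above; it is a standalone helper for illustration, not one of the server's tools.

```python
import platform
import socket
import subprocess
import sys


def ollama_version() -> str:
    """Equivalent of `ollama --version`; reports if the binary is missing from PATH."""
    try:
        out = subprocess.run(
            ["ollama", "--version"], capture_output=True, text=True, check=True
        )
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        return f"not found ({exc})"


def api_port_open(host: str = "localhost", port: int = 11434) -> bool:
    """Equivalent of the netstat check: is the default Ollama API port reachable?"""
    with socket.socket() as sock:
        sock.settimeout(2)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("OS:             ", platform.platform())
    print("Python:         ", sys.version.split()[0])
    print("Ollama CLI:     ", ollama_version())
    print("Port 11434 open:", api_port_open())
```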
## 📊 Performance

### **Typical Response Times** *(Windows RTX 4090)*
- **Health Check**: <500ms
- **Model List**: <1 second
- **Server Start**: 1-15 seconds (hardware dependent)
- **Model Chat**: 2-30 seconds (model and prompt dependent)

### **Resource Usage**
- **Memory**: <50MB for the MCP server process
- **CPU**: Minimal when idle, scales with operations
- **Storage**: Configuration files and logs only

## 🔐 Security

- **Data Flow**: User → MCP Client (Claude) → ollama-mcp-server → Local Ollama → back through the chain

## 👨‍💻 About This Project

This is my first MCP server, created by adapting a personal tool I had developed for my own Ollama management needs.

### **The Problem I Faced**

I started using Claude to interact with Ollama because it allows me to use natural language instead of command-line interfaces. Claude also provides capabilities that Ollama alone doesn't have, particularly intelligent model suggestions based on both my system capabilities and specific needs.

### **My Solution**

I built this MCP server to streamline my own workflow, and then refined it into a stable tool that others might find useful. The design reflects real usage patterns:

- **Self-contained**: No external dependencies that can break
- **Intelligent error handling**: Clear guidance when things go wrong
- **Cross-platform**: Works consistently across different environments
- **Practical tools**: Features I actually use in daily work

### **Design Philosophy**

I initially developed this for my personal use to manage Ollama models more efficiently. When the MCP protocol became available, I transformed my personal tool into an MCP server to share it with others who might find it useful.

**Development Approach**: This project was developed with Claude using "vibe coding", an iterative, conversational development process in which AI assistance helped refine both the technical implementation and the user experience. It's a practical example of AI-assisted development creating tools for AI management. Jules was also involved in the final refactoring phase.

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **Ollama Team**: For the excellent local AI platform
- **MCP Project**: For the Model Context Protocol specification
- **Claude Desktop/Code by Anthropic**: Used as tools in MCP client implementation, testing, and refactoring
- **Jules by Google**: Used as a tool during refactoring

## 📞 Support

- **Bug Reports**: [GitHub Issues](https://github.com/paolodalprato/ollama-mcp-server/issues)
- **Feature Requests**: [GitHub Issues](https://github.com/paolodalprato/ollama-mcp-server/issues)
- **Community Discussion**: [GitHub Discussions](https://github.com/paolodalprato/ollama-mcp-server/discussions)

---

## Changelog

* **v0.9.0 (August 17, 2025):** Critical bugfix release - fixed a datetime serialization issue that prevented model listing from working with Claude Desktop. All 9 tools now verified working correctly.
* **August 2025:** Project refactoring and enhancements. Overhauled the architecture for modularity, implemented a fully asynchronous client, added a test suite, and refined the tool logic based on a "local-first" philosophy.
* **July 2025:** Initial version created by Paolo Dalprato with Claude AI assistance.

For detailed changes, see [CHANGELOG.md](CHANGELOG.md).

---

**Status**: Beta on Windows; other platforms need testing
**Testing**: Windows 11 + RTX 4090 validated; Linux/macOS require community validation
**License**: MIT
**Dependencies**: Zero external MCP servers required
