Massive Context MCP

by egoughnour

rlm_ollama_status

Check Ollama server status and available models to see if free local inference is available for processing large contexts.

Instructions

Check Ollama server status and available models.

Returns whether Ollama is running, list of available models, and if the default model (gemma3:12b) is available. Use this to determine if free local inference is available.

Args: force_refresh: Force refresh the cached status (default: false)
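
The effect of force_refresh can be pictured as a bypass around a time-to-live cache. The following is a minimal, synchronous sketch of that idea, not the server's own code: `TTLStatusCache`, `probe`, and the injectable `clock` are hypothetical names introduced for illustration, and the 60-second default is taken from the implementation notes further down this page.

```python
import time
from typing import Callable, Optional


class TTLStatusCache:
    """Hypothetical helper: keeps one status dict for ttl_seconds."""

    def __init__(
        self,
        probe: Callable[[], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._probe = probe      # function that performs the real status check
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the sketch can be tested
        self._checked_at: Optional[float] = None
        self._value: dict = {}

    def get(self, force_refresh: bool = False) -> dict:
        now = self._clock()
        fresh = (
            self._checked_at is not None
            and now - self._checked_at < self._ttl
        )
        if fresh and not force_refresh:
            # Serve the stored result and flag it as cached.
            return {**self._value, "cached": True}
        # Cache is empty, expired, or bypassed: probe again and store.
        self._value = self._probe()
        self._checked_at = now
        return {**self._value, "cached": False}
```

In this sketch a repeated call inside the TTL window returns the stored result with "cached" set to True, while force_refresh=True always triggers a new probe.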

Input Schema

Name             Required    Description    Default
force_refresh    No

Output Schema

Name    Required    Description    Default

No arguments
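
Since the output schema lists no fields, it may help to see the shape of the returned dictionary. The sketch below collects the keys that appear in the implementation shown on this page into a `TypedDict`. The names `OllamaStatus` and `is_local_inference_ready` are hypothetical, the field types are inferred from the code rather than from a published schema, and the type of `best_provider` in particular is an assumption.

```python
from typing import List, TypedDict


class OllamaStatus(TypedDict, total=False):
    """Keys observed in the handler and helper; not every key is always present."""

    running: bool
    models: List[str]
    default_model_available: bool
    cached: bool
    checked_at: float
    url: str
    model_count: int
    default_model: str
    error: str            # e.g. "connection_refused" or "check_failed"
    message: str
    recommendation: str
    best_provider: str    # assumed to be a string; the source does not show its type


def is_local_inference_ready(status: OllamaStatus) -> bool:
    """True when the server is running and the default model was found."""
    return bool(
        status.get("running", False)
        and status.get("default_model_available", False)
    )
```

A caller could pass the tool result to `is_local_inference_ready` before deciding whether to route sub-queries to the local model.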

Implementation Reference

  • The handler function for the rlm_ollama_status tool. It calls _check_ollama_status (with optional force_refresh), then adds a recommendation string based on whether Ollama is running and the default model is available, and also adds the best_provider field.
    @mcp.tool()
    async def rlm_ollama_status(force_refresh: bool = False) -> dict:
        """Check Ollama server status and available models.
    
        Returns whether Ollama is running, list of available models, and if the
        default model (gemma3:12b) is available. Use this to determine if free
        local inference is available.
    
        Args:
            force_refresh: Force refresh the cached status (default: false)
        """
        status = await _check_ollama_status(force_refresh=force_refresh)
    
        # Add recommendation based on status
        if status["running"] and status["default_model_available"]:
            status["recommendation"] = "Ollama is ready! Sub-queries will use free local inference by default."
        elif status["running"] and not status["default_model_available"]:
            default_model = DEFAULT_MODELS["ollama"]
            status["recommendation"] = f"Ollama is running but default model not found. Run: ollama pull {default_model}"
        else:
            status["recommendation"] = (
                "Ollama not available. Sub-queries will use Claude API. To enable free local inference, install Ollama and run: ollama serve"
            )
    
        # Add current best provider
        status["best_provider"] = _get_best_provider()
    
        return status
  • The only parameter is 'force_refresh' (bool, default False), which controls whether to bypass the cached Ollama status.
        force_refresh: Force refresh the cached status (default: false)
    """
  • The tool is registered with FastMCP via the @mcp.tool() decorator on line 1339 of the source file, which wraps the rlm_ollama_status async function.
    @mcp.tool()
  • The underlying helper function that actually performs the Ollama status check. It uses a TTL cache (60s), queries the Ollama API /api/tags endpoint via httpx, and handles connection errors gracefully. Called by the rlm_ollama_status handler.
    async def _check_ollama_status(force_refresh: bool = False) -> dict:
        """Check Ollama server status and available models. Cached with TTL."""
        import time
    
        cache = _ollama_status_cache
        now = time.time()
    
        # Return cached result if still valid
        if not force_refresh and cache["checked_at"] is not None:
            if now - cache["checked_at"] < cache["ttl_seconds"]:
                return {
                    "running": cache["running"],
                    "models": cache["models"],
                    "default_model_available": cache["default_model_available"],
                    "cached": True,
                    "checked_at": cache["checked_at"],
                }
    
        # Check Ollama status
        if not HAS_HTTPX:
            cache.update(
                {
                    "checked_at": now,
                    "running": False,
                    "models": [],
                    "default_model_available": False,
                }
            )
            return {
                "running": False,
                "error": "httpx not installed",
                "models": [],
                "default_model_available": False,
                "cached": False,
            }
    
        ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
    
        try:
            async with httpx.AsyncClient(timeout=5.0) as client:
                # Check if Ollama is running
                response = await client.get(f"{ollama_url}/api/tags")
                response.raise_for_status()
    
                data = response.json()
                models = [m.get("name", "") for m in data.get("models", [])]
    
                # Check if default model is available
                default_model = DEFAULT_MODELS["ollama"]
                # Handle model name variations (gemma3:12b vs gemma3:12b-instruct-q4_0)
                default_available = any(m.startswith(default_model.split(":")[0]) for m in models)
    
                cache.update(
                    {
                        "checked_at": now,
                        "running": True,
                        "models": models,
                        "default_model_available": default_available,
                    }
                )
    
                return {
                    "running": True,
                    "url": ollama_url,
                    "models": models,
                    "model_count": len(models),
                    "default_model": default_model,
                    "default_model_available": default_available,
                    "cached": False,
                    "checked_at": now,
                }
    
        except httpx.ConnectError:
            cache.update(
                {
                    "checked_at": now,
                    "running": False,
                    "models": [],
                    "default_model_available": False,
                }
            )
            return {
                "running": False,
                "url": ollama_url,
                "error": "connection_refused",
                "message": "Ollama server not running. Start with: ollama serve",
                "models": [],
                "default_model_available": False,
                "cached": False,
            }
        except Exception as e:
            cache.update(
                {
                    "checked_at": now,
                    "running": False,
                    "models": [],
                    "default_model_available": False,
                }
            )
            return {
                "running": False,
                "url": ollama_url,
                "error": "check_failed",
                "message": str(e),
                "models": [],
                "default_model_available": False,
                "cached": False,
            }
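
One detail of the helper above is worth illustrating: the availability check compares only the part of the default model name before the colon, so any installed model from the same family counts as a match. The sketch below reproduces that comparison next to a stricter variant. Both function names are hypothetical, and the stricter variant is an assumption shown for contrast, not something the project uses.

```python
from typing import List


def family_prefix_match(default_model: str, models: List[str]) -> bool:
    """Same comparison as the helper above: match on the text before ':'."""
    family = default_model.split(":")[0]
    return any(m.startswith(family) for m in models)


def full_tag_prefix_match(default_model: str, models: List[str]) -> bool:
    """Stricter variant (assumption): require the full name:tag as a prefix."""
    return any(m.startswith(default_model) for m in models)


# A quantized build of the default model matches under both rules.
print(family_prefix_match("gemma3:12b", ["gemma3:12b-instruct-q4_0"]))    # True
print(full_tag_prefix_match("gemma3:12b", ["gemma3:12b-instruct-q4_0"]))  # True

# A different size of the same family matches only under the family rule.
print(family_prefix_match("gemma3:12b", ["gemma3:4b"]))    # True
print(full_tag_prefix_match("gemma3:12b", ["gemma3:4b"]))  # False
```

Under the family rule, default_model_available can therefore be True even when only a different size of the default model is installed.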
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the caching behavior and the force_refresh parameter that bypasses the cache, which adds transparency. No side effects are mentioned, but none are expected for a read-only status check.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is succinct with three sentences and an Args section, each sentence providing essential information without redundancy. Well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema, the description covers the key return information: whether running, list of models, and default model availability. Complete for the intended use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description fully explains the sole parameter 'force_refresh' with its purpose and default value, adding meaningful semantics beyond the schema type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool checks Ollama server status and available models. It uses the specific verb 'Check' and the specific resource 'Ollama server status and available models', which distinguishes it from sibling tools such as setup commands.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this to determine if free local inference is available', which gives a clear usage context. It does not specify when not to use the tool, but the sibling tools imply alternatives for setup or analysis.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/egoughnour/massive-context-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.