Glama
180,147 tools. Last updated 2026-06-06 01:32

"llama.cpp" matching MCP tools:

  • Quantize a HuggingFace model to GGUF, GPTQ, or AWQ format at a selectable bit width (2-8 bits). Reduces model size for deployment on Ollama, vLLM, LM Studio, or llama.cpp.
    MIT
  • Swap to a different llama.cpp model in a running session while preserving conversation context. Unloads current model, loads the requested one, and waits for readiness.
    Apache 2.0

Matching MCP Servers

  • Analyze Git repository codebases by asking questions to understand architecture, debug issues, review security, or evaluate code quality with AI-powered insights.
    MIT
  • Identify the currently loaded llama.cpp model in an active Claude Code session to verify which model is active for inference.
    Apache 2.0
  • Change the active LLM backend for AI task routing. Specify a backend ID to switch between backends such as Ollama, llama.cpp, or Gemini for processing tasks.