Quantize a HuggingFace model to GGUF, GPTQ, or AWQ format with bit width selection (2-8). Reduces model size for deployment on Ollama, vLLM, LM Studio, or llama.cpp.
Swap to a different llama.cpp model in a running session while preserving conversation context. Unloads current model, loads the requested one, and waits for readiness.
Change the active LLM backend for AI task routing. Specify a backend ID to switch between different local models like Ollama, llama.cpp, or Gemini for processing tasks.