# mcp-turboquant
Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI tools are required -- all quantization logic is embedded in the server.
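For intuition, quantization maps each float weight onto a small set of integer levels plus a scale factor. A minimal symmetric per-tensor 4-bit sketch (illustrative only, not this server's implementation):

```python
import numpy as np

def quantize_4bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(weights).max() / 7.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.7, 0.33, 0.04], dtype=np.float32)
q, s = quantize_4bit(w)
# Reconstruction error for unclipped values is at most half a quantization step
print(q, np.abs(w - dequantize(q, s)).max())
```

Real schemes (GGUF K-quants, GPTQ, AWQ) use per-block scales and error-aware rounding, but the storage win is the same: 4 bits per weight instead of 16 or 32.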
## Install
```shell
pip install mcp-turboquant
```

Or run directly with uvx:

```shell
uvx mcp-turboquant
```

### Optional backends
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
```shell
# GGUF (Ollama, llama.cpp, LM Studio)
pip install mcp-turboquant[gguf]

# GPTQ (vLLM, TGI)
pip install mcp-turboquant[gptq]

# AWQ (vLLM, TGI)
pip install mcp-turboquant[awq]

# Everything
pip install mcp-turboquant[all]
```

## Configure
### Claude Code
Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
```

Or with uvx (no install needed):
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```

### Claude Desktop
Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```

## Tools
| Tool | Description | Heavy deps? |
| --- | --- | --- |
|  | Get model info from HuggingFace (params, size, architecture) | No |
|  | Check available quantization backends on the system | No |
|  | Hardware-aware recommendation for best format + bits | No |
|  | Quantize a model to GGUF/GPTQ/AWQ | Yes |
|  | Run perplexity evaluation on a quantized model | Yes |
|  | Push quantized model to HuggingFace Hub | No |
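The backend check can be reproduced with the standard library alone. A sketch of what such a probe presumably looks like; the module names below (`llama_cpp`, `auto_gptq`, `awq`) are assumptions based on the PyPI packages the extras pull in, not confirmed internals:

```python
import importlib.util

# Assumed importable module per backend:
# llama-cpp-python -> llama_cpp, auto-gptq -> auto_gptq, autoawq -> awq
BACKENDS = {"gguf": "llama_cpp", "gptq": "auto_gptq", "awq": "awq"}

def available_backends() -> dict[str, bool]:
    """Report which quantization backends are installed, without importing them."""
    return {fmt: importlib.util.find_spec(mod) is not None
            for fmt, mod in BACKENDS.items()}

print(available_backends())
```

`find_spec` only consults the import machinery, so the check stays fast even when the heavy backends are installed.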
## Examples
Once configured, ask Claude:

- "Get info on meta-llama/Llama-3.1-8B-Instruct"
- "What quantization format should I use for Mistral-7B on my machine?"
- "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
- "Check which quantization backends I have installed"
- "Evaluate the perplexity of my quantized model at /path/to/model.gguf"
- "Push my quantized model to myuser/model-GGUF on HuggingFace"
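The perplexity request above boils down to one formula: the exponential of the negative mean token log-likelihood over an evaluation text. Lower is better, and the useful number is the gap between the quantized model and the original. A sketch, assuming per-token log-probs are already available:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean log-likelihood of the evaluated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model assigning probability 0.25 to every token scores perplexity ~4
print(perplexity([math.log(0.25)] * 10))
```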
## How it works
```
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
```

All quantization logic runs in-process. No external CLI tools needed.
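The MCP hop speaks JSON-RPC 2.0 over the server's stdin/stdout; per the MCP specification, tool invocations use the `tools/call` method. A sketch of the framing a client sends (the tool name `model_info` is hypothetical, chosen for illustration):

```python
import json

def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

print(tool_call_request(1, "model_info", {"model_id": "meta-llama/Llama-3.1-8B"}))
```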
## Run directly
```shell
# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant
```

## License
MIT