mcp-turboquant
Quantizes HuggingFace models to the GGUF format so they can be run locally and deployed with the Ollama runtime.
Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI required -- all quantization logic is embedded.
Install
pip install mcp-turboquant
Or run directly with uvx:
uvx mcp-turboquant
Optional backends
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
# GGUF (Ollama, llama.cpp, LM Studio)
pip install mcp-turboquant[gguf]
# GPTQ (vLLM, TGI)
pip install mcp-turboquant[gptq]
# AWQ (vLLM, TGI)
pip install mcp-turboquant[awq]
# Everything
pip install mcp-turboquant[all]
Configure
Claude Code
Add to ~/.claude/settings.json:
{
"mcpServers": {
"turboquant": {
"command": "mcp-turboquant"
}
}
}
Or with uvx (no install needed):
{
"mcpServers": {
"turboquant": {
"command": "uvx",
"args": ["mcp-turboquant"]
}
}
}
Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"turboquant": {
"command": "uvx",
"args": ["mcp-turboquant"]
}
}
}
Tools
| Tool | Description | Heavy deps? |
|------|-------------|-------------|
| info | Get model info from HuggingFace (params, size, architecture) | No |
| check | Check available quantization backends on the system | No |
| recommend | Hardware-aware recommendation for best format + bits | No |
|  | Quantize a model to GGUF/GPTQ/AWQ | Yes |
|  | Run perplexity evaluation on a quantized model | Yes |
|  | Push quantized model to HuggingFace Hub | No |
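To give a feel for the arithmetic behind a format and bit-width recommendation, here is a minimal sketch of a weight-only size estimate. It is an illustration, not the logic the server actually uses: the function names, the candidate bit widths, and the 8e9 parameter count are all assumptions. Real quantized files store scales and metadata, so their effective bits per weight are somewhat higher, and running a model also needs memory for the KV cache and activations.

```python
def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough weight-only size in GB (10^9 bytes) for a given bit width."""
    return num_params * bits_per_weight / 8 / 1e9


def pick_bits(num_params: float, available_gb: float, candidates=(8, 5, 4, 3, 2)):
    """Return the largest candidate bit width whose estimate fits, or None."""
    for bits in candidates:  # candidates are ordered from highest precision down
        if estimated_size_gb(num_params, bits) <= available_gb:
            return bits
    return None


if __name__ == "__main__":
    params = 8e9  # assumed parameter count for an "8B" model
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{estimated_size_gb(params, bits):.1f} GB")
    print("largest width that fits in 6 GB:", pick_bits(params, 6.0))
```

Under these assumptions an 8B model drops from roughly 16 GB at 16 bits to roughly 4 GB at 4 bits, which is why 4-bit formats are a common choice for consumer hardware.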
Examples
Once configured, ask Claude:
"Get info on meta-llama/Llama-3.1-8B-Instruct"
"What quantization format should I use for Mistral-7B on my machine?"
"Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
"Check which quantization backends I have installed"
"Evaluate the perplexity of my quantized model at /path/to/model.gguf"
"Push my quantized model to myuser/model-GGUF on HuggingFace"
How it works
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
All quantization logic runs in-process. No external CLI tools needed.
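On the left-hand side of that pipeline, an MCP client talks to the server using JSON-RPC 2.0 messages, sent one per line over stdin/stdout. The sketch below builds a `tools/call` request for the `info` tool. The helper name `build_tool_call` and the `model` argument name are hypothetical and chosen for illustration; the tool's real input schema may differ. A real client also performs the MCP `initialize` handshake before calling any tool.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the result is a
    # single line, as the stdio transport expects.
    return json.dumps(message)


if __name__ == "__main__":
    # "model" is a hypothetical argument name, used here only for illustration.
    print(build_tool_call(1, "info", {"model": "meta-llama/Llama-3.1-8B-Instruct"}))
```

In practice Claude Code or Claude Desktop builds and sends these messages for you; the sketch only shows what travels over the stdio link.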
Run directly
# As a command
mcp-turboquant
# As a module
python -m mcp_turboquant
License
MIT