# mcp-turboquant
Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI tools are required -- all quantization logic is embedded in the server.
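For intuition, quantization maps each float weight onto a small set of integer levels plus a scale factor. A minimal symmetric per-tensor 4-bit sketch (illustrative only, not this server's implementation):

```python
import numpy as np

def quantize_4bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(weights).max() / 7.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the integers and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.7, 0.33, 0.04], dtype=np.float32)
q, s = quantize_4bit(w)
# Reconstruction error for unclipped values is at most half a quantization step
print(q, np.abs(w - dequantize(q, s)).max())
```

Real schemes (GGUF K-quants, GPTQ, AWQ) use per-block scales and error-aware rounding, but the storage win is the same: 4 bits per weight instead of 16 or 32.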
## Install
```shell
pip install mcp-turboquant
```

Or run directly with uvx:

```shell
uvx mcp-turboquant
```

### Optional backends
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
```shell
# GGUF (Ollama, llama.cpp, LM Studio)
pip install mcp-turboquant[gguf]

# GPTQ (vLLM, TGI)
pip install mcp-turboquant[gptq]

# AWQ (vLLM, TGI)
pip install mcp-turboquant[awq]

# Everything
pip install mcp-turboquant[all]
```

## Configure
### Claude Code
Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
```

Or with uvx (no install needed):
```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```

### Claude Desktop
Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
```

## Tools
| Tool | Description | Heavy deps? |
| --- | --- | --- |
|  | Get model info from HuggingFace (params, size, architecture) | No |
|  | Check available quantization backends on the system | No |
|  | Hardware-aware recommendation for best format + bits | No |
|  | Quantize a model to GGUF/GPTQ/AWQ | Yes |
|  | Run perplexity evaluation on a quantized model | Yes |
|  | Push quantized model to HuggingFace Hub | No |
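The backend check can be reproduced with the standard library alone. A sketch of what such a probe presumably looks like; the module names below (`llama_cpp`, `auto_gptq`, `awq`) are assumptions based on the PyPI packages the extras pull in, not confirmed internals:

```python
import importlib.util

# Assumed importable module per backend:
# llama-cpp-python -> llama_cpp, auto-gptq -> auto_gptq, autoawq -> awq
BACKENDS = {"gguf": "llama_cpp", "gptq": "auto_gptq", "awq": "awq"}

def available_backends() -> dict[str, bool]:
    """Report which quantization backends are installed, without importing them."""
    return {fmt: importlib.util.find_spec(mod) is not None
            for fmt, mod in BACKENDS.items()}

print(available_backends())
```

`find_spec` only consults the import machinery, so the check stays fast even when the heavy backends are installed.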
## Examples
Once configured, ask Claude:

- "Get info on meta-llama/Llama-3.1-8B-Instruct"
- "What quantization format should I use for Mistral-7B on my machine?"
- "Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
- "Check which quantization backends I have installed"
- "Evaluate the perplexity of my quantized model at /path/to/model.gguf"
- "Push my quantized model to myuser/model-GGUF on HuggingFace"
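The perplexity request above boils down to one formula: the exponential of the negative mean token log-likelihood over an evaluation text. Lower is better, and the useful number is the gap between the quantized model and the original. A sketch, assuming per-token log-probs are already available:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean log-likelihood of the evaluated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model assigning probability 0.25 to every token scores perplexity ~4
print(perplexity([math.log(0.25)] * 10))
```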
## How it works
```
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
```

All quantization logic runs in-process. No external CLI tools needed.
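The MCP hop speaks JSON-RPC 2.0 over the server's stdin/stdout; per the MCP specification, tool invocations use the `tools/call` method. A sketch of the framing a client sends (the tool name `model_info` is hypothetical, chosen for illustration):

```python
import json

def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

print(tool_call_request(1, "model_info", {"model_id": "meta-llama/Llama-3.1-8B"}))
```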
## Run directly
```shell
# As a command
mcp-turboquant

# As a module
python -m mcp_turboquant
```

## License
MIT