Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | { "listChanged": false }
prompts | { "listChanged": false }
resources | { "subscribe": false, "listChanged": false }
experimental | {}

Tools

Functions exposed to the LLM to take actions

Name | Description

info

Get model info from HuggingFace — parameters, size, architecture.

Lightweight call using the HuggingFace API. No GPU or heavy dependencies required.

Args:
  model: HuggingFace model ID (e.g. 'meta-llama/Llama-3.1-8B-Instruct') or local path to a model directory.

Returns: Model metadata including architecture, parameter count, size, hidden dimensions, number of layers, vocabulary size, and context length.
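As a back-of-envelope check on the size figures this tool reports, on-disk size follows directly from parameter count and bits per weight. A minimal sketch (the helper name and numbers are illustrative, not part of the server):

```python
def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameters times bits per weight, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

# An 8B-parameter model stored in fp16 (16 bits/weight): ~16 GB.
fp16_gb = estimated_size_gb(8e9, 16)
# The same model at 4 bits/weight: ~4 GB before format overhead.
q4_gb = estimated_size_gb(8e9, 4)
```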

check

Check available quantization backends on this system.

Reports which quantization engines (GGUF/GPTQ/AWQ) are installed, whether PyTorch and transformers are available, GPU information (CUDA or Apple MPS), and system RAM.

No arguments required. Lightweight system check.

Returns: Dictionary of available backends and hardware info.
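A backend check of this kind can be approximated with nothing but importability probes. A minimal sketch, where the package names below are assumptions standing in for whatever the server actually probes:

```python
import importlib.util

def detect_backends() -> dict[str, bool]:
    """Report which optional quantization/runtime packages are importable.

    Probing the import machinery avoids actually loading heavy packages.
    The module names here are illustrative, not the server's exact list.
    """
    candidates = {
        "torch": "torch",               # PyTorch runtime
        "transformers": "transformers",  # HuggingFace transformers
        "gguf": "gguf",                 # GGUF tooling
        "gptq": "gptqmodel",            # GPTQ backend
        "awq": "awq",                   # AWQ backend
    }
    return {name: importlib.util.find_spec(module) is not None
            for name, module in candidates.items()}
```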

recommend

Recommend best quantization format and bit width for a model.

Analyzes the model size and your hardware (GPU VRAM, Apple Silicon, system RAM) to suggest the optimal format (GGUF/GPTQ/AWQ) and bit width (2-8). Recommendations are ranked and include use-case explanations.

Args:
  model: HuggingFace model ID (e.g. 'meta-llama/Llama-3.1-8B-Instruct') or local path to a model directory.

Returns: Ranked recommendations with format, bits, reasoning, and use cases.
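The shape of such a recommendation can be sketched as picking the largest bit width whose quantized size fits the hardware budget. The 20% headroom and the thresholds below are illustrative assumptions, not the server's documented logic:

```python
def recommend_bits(model_size_fp16_gb: float, vram_gb: float) -> int:
    """Pick the largest bit width whose quantized model fits in VRAM,
    leaving ~20% headroom for activations and KV cache (heuristic)."""
    budget = vram_gb * 0.8
    for bits in (8, 5, 4, 3, 2):
        # Quantized size scales roughly with bits/16 relative to fp16.
        if model_size_fp16_gb * bits / 16 <= budget:
            return bits
    return 2  # Fall back to the most aggressive width if nothing fits.
```

For example, a 16 GB fp16 model on a 12 GB GPU fits at 8 bits, while an 8 GB GPU forces 5 bits or below.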

quantize

Quantize a HuggingFace model to GGUF, GPTQ, or AWQ format.

This is a heavy operation that downloads and compresses the model. Requires appropriate backend dependencies to be installed.

Args:
  model: HuggingFace model ID (e.g. 'meta-llama/Llama-3.1-8B-Instruct') or local path to a model directory.
  format: Output format: gguf, gptq, or awq. Default: gguf.
  bits: Quantization bit width: 2, 3, 4, 5, or 8. Default: 4.
  output_dir: Directory to write output files. Default: temp directory.
  target: Deployment target. ollama/llamacpp/lmstudio force GGUF; vllm forces AWQ.

Returns: Quantization result with file paths, sizes, and compression ratios.
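The compression ratios reported fall slightly short of the naive bits/16 figure, because group-wise formats store a scale per block of weights. A back-of-envelope sketch, where the group size and scale width are illustrative defaults rather than the server's exact format parameters:

```python
def effective_bits(weight_bits: int, group_size: int = 32,
                   scale_bits: int = 16) -> float:
    """Effective bits per weight for group quantization: each group of
    `group_size` weights shares one `scale_bits`-wide scale factor."""
    return weight_bits + scale_bits / group_size

# 4-bit weights with one fp16 scale per 32 weights: 4.5 bits/weight,
# i.e. roughly a 3.6x reduction from fp16 rather than a clean 4x.
bpw = effective_bits(4)
```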

evaluate

Run perplexity evaluation on a quantized model.

Measures model quality after quantization using perplexity scoring. Lower perplexity = better quality. Includes a quality assessment (EXCELLENT/GOOD/FAIR/DEGRADED/POOR).

Args:
  model_path: Path to the quantized model file (GGUF) or directory (GPTQ/AWQ).
  format: Format of the quantized model. One of 'gguf', 'gptq', 'awq'.
  bits: Bit width used during quantization (for quality context).

Returns: Perplexity score, quality assessment, and evaluation metadata.
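Perplexity itself is just the exponential of the mean per-token negative log-likelihood, which is why lower values mean the model is less "surprised" by the evaluation text. A minimal sketch of the metric (not the server's evaluation harness):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning every token probability 1/4 has NLL ln(4) per token,
# so its perplexity is ~4: it is as uncertain as a uniform 4-way choice.
ppl = perplexity([math.log(4)] * 10)
```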

push

Push a quantized model to HuggingFace Hub.

Uploads all model files from the output directory to a HuggingFace repository. Generates a model card (README.md) with metadata. Requires HuggingFace authentication (huggingface-cli login or HF_TOKEN).

Args:
  repo_id: HuggingFace repository ID (e.g. 'username/model-GGUF-4bit').
  model_dir: Local directory containing the quantized model files.
  model: Original model ID for the model card (optional).
  bits: Bit width used during quantization (for model card metadata).

Returns: Upload result with repository URL and file count.
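The generated model card is a README.md with YAML front matter linking back to the base model. A hypothetical sketch of what such a generator might emit; the exact fields the server writes are not documented here:

```python
def make_model_card(repo_id: str, base_model: str, fmt: str, bits: int) -> str:
    """Render a minimal HuggingFace model card: YAML front matter
    (base_model, tags) followed by a short markdown body."""
    return (
        "---\n"
        f"base_model: {base_model}\n"
        "tags:\n"
        f"- {fmt}\n"
        f"- {bits}-bit\n"
        "---\n"
        f"# {repo_id}\n\n"
        f"{bits}-bit {fmt.upper()} quantization of "
        f"[{base_model}](https://huggingface.co/{base_model}).\n"
    )
```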

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ShipItAndPray/mcp-turboquant'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.