HF MCP Server

A Model Context Protocol server that gives Claude (and any MCP-compatible client) direct access to the Hugging Face Hub — search models and datasets, fetch metadata, run inference on text, images and audio, all from a single conversation.

There is no official Hugging Face MCP server. This fills that gap.


What you can do

Ask Claude things like:

  • "Find the top 5 trending text-generation models on Hugging Face"

  • "Compare gpt2 and distilgpt2 — which has more downloads and likes?"

  • "What does the README of meta-llama/Llama-2-7b say about usage?"

  • "Is cardiffnlp/twitter-roberta-base-sentiment-latest ready for inference?"

  • "Classify the sentiment of: I absolutely loved this film"

  • "What's in this image?" (with an image URL)

  • "Transcribe this audio file" (with an audio URL or local path)


Tools

| Tool | Description |
| --- | --- |
| search_models | Search models by query, task, sort criteria |
| get_model_info | Full metadata for a specific model |
| get_model_readme | README of a model (usage docs, examples, paper) |
| compare_models | Side-by-side stats for a list of models |
| list_trending_models | Currently trending models, optionally filtered by task |
| get_inference_status | Check if a model is warm/cold/loading |
| run_inference | Run text inference (classification, QA, zero-shot, etc.) |
| run_image_inference | Image classification / object detection from URL or file |
| run_audio_inference | Speech-to-text / audio classification from URL or file |
| generate_text | Text generation with streaming (requires HF Pro) |
| list_datasets | Search datasets on the Hub |
| explain_model | Combined metadata + README in one call |


Requirements


Installation

# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/hf-mcp-server.git
cd hf-mcp-server

# 2. Create and activate a virtual environment
python -m venv venv

# Windows
venv\Scripts\activate

# macOS / Linux
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set your Hugging Face token
cp .env.example .env        # Windows (cmd.exe): copy .env.example .env
# Edit .env and replace the placeholder with your real token

Configuration

Edit .env:

HF_TOKEN=hf_your_token_here
LOG_LEVEL=INFO

Get your token at huggingface.co/settings/tokens. A Read token is enough for all tools.
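
As a rough illustration of how these two variables might be read at startup, here is a minimal sketch using only the standard library. The Settings class and load_settings function are hypothetical names, not taken from the project's config.py, and the real project may rely on a .env loader instead. The INFO default is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two variables in .env."""
    hf_token: str
    log_level: str


def load_settings(env=None) -> Settings:
    """Read HF_TOKEN and LOG_LEVEL from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == "hf_your_token_here":
        # Fail early if the placeholder from .env.example was never replaced.
        raise ValueError("HF_TOKEN is missing or still set to the placeholder")
    # Assumption: fall back to INFO when LOG_LEVEL is unset.
    return Settings(hf_token=token, log_level=env.get("LOG_LEVEL", "INFO").upper())
```

Passing a plain dict instead of os.environ keeps the function easy to exercise without touching the real environment.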


Connect to Claude Desktop

Open your Claude Desktop config file:

  • Windows: %APPDATA%\Claude\claude_desktop_config.json

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Add the mcpServers entry (adjust the path to match your setup):

{
  "mcpServers": {
    "huggingface": {
      "command": "/absolute/path/to/hf-mcp-server/venv/bin/python",
      "args": ["/absolute/path/to/hf-mcp-server/main.py"]
    }
  }
}

Windows example:

{
  "mcpServers": {
    "huggingface": {
      "command": "C:\\Users\\YourName\\Projects\\hf-mcp-server\\venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\YourName\\Projects\\hf-mcp-server\\main.py"]
    }
  }
}

Restart Claude Desktop. You should see the Hugging Face tools available in the toolbar.
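
If the config file already lists other servers, the new entry has to be merged in rather than pasted over the whole file. The sketch below shows that merge, assuming the JSON layout shown above. add_server is a hypothetical helper, not part of this project or of Claude Desktop.

```python
import json


def add_server(config: dict, name: str, python_path: str, script_path: str) -> dict:
    """Return a copy of the config with one more mcpServers entry."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": python_path, "args": [script_path]}
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already has another server registered.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_server(
    existing,
    "huggingface",
    "/absolute/path/to/hf-mcp-server/venv/bin/python",
    "/absolute/path/to/hf-mcp-server/main.py",
)
print(json.dumps(updated, indent=2))
```

When the result is written back with json.dumps, backslashes in Windows paths are escaped automatically, which avoids the most common hand-editing mistake.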


Running the tests

pytest tests/ -v

All tests mock the HF API — no network calls, no token needed.
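
For readers unfamiliar with this style of test, the sketch below shows the general pattern using unittest.mock.AsyncMock from the standard library. The get_model_info_tool function, the model_info client method, and the "ok" success payload are hypothetical stand-ins, not the project's real names or return shape.

```python
import asyncio
from unittest.mock import AsyncMock


async def get_model_info_tool(client, model_id: str) -> dict:
    """Hypothetical tool body: delegate to the client and wrap the result."""
    data = await client.model_info(model_id)
    return {"status": "ok", "model": data}


def test_get_model_info_uses_mocked_client():
    # The fake client returns canned data, so no request leaves the machine.
    client = AsyncMock()
    client.model_info.return_value = {"id": "gpt2", "downloads": 123}

    result = asyncio.run(get_model_info_tool(client, "gpt2"))

    client.model_info.assert_awaited_once_with("gpt2")
    assert result == {"status": "ok", "model": {"id": "gpt2", "downloads": 123}}
```

Because the client is injected, the same tool body runs unchanged against the real client in production and against the mock in tests.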


Architecture

hf-mcp-server/
├── main.py                  # FastMCP server — 12 tools defined with @mcp.tool()
├── config.py                # Environment variables and constants
├── src/
│   └── clients/
│       └── hf_client.py     # Async HF API wrapper
│           ├── HFClient     # Main client (httpx.AsyncClient)
│           ├── RateLimiter  # Sliding-window limiter (async, safe across concurrent coroutines)
│           └── TTLCache     # In-memory cache with TTL
└── tests/
    ├── test_hf_client.py    # Unit tests for RateLimiter and TTLCache
    └── test_tools.py        # Unit tests for all 12 MCP tools (mocked client)

Key design decisions:

  • Async throughout: httpx.AsyncClient + asyncio, no blocking requests calls.

  • Rate limiting — sliding window (not a fixed counter), implemented with asyncio.Lock so concurrent tool calls don't race each other.

  • TTL cache — all GET metadata calls are cached for 1 hour by default. Inference and inference-status calls skip the cache.

  • truststore — uses the OS native certificate store (needed on networks with TLS inspection/corporate proxies).

  • Error handling — every tool catches exceptions and returns {"status": "error", "error": "..."} instead of crashing the MCP connection.
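
The rate-limiting, caching, and error-handling decisions above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the code in hf_client.py. The class names follow the architecture tree, but the method names (acquire, get, set), the constructor parameters, and the safe_tool decorator are hypothetical. Note that asyncio.Lock serialises coroutines on one event loop and does not protect against access from multiple OS threads.

```python
import asyncio
import time
from collections import deque
from functools import wraps


class RateLimiter:
    """Sliding window: at most max_calls within any period-second span."""

    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self._calls = deque()        # monotonic timestamps of recent calls
        self._lock = asyncio.Lock()  # serialises coroutines, not OS threads

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop timestamps that have slid out of the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Window is full: wait until the oldest call leaves it.
                await asyncio.sleep(self.period - (now - self._calls[0]))


class TTLCache:
    """In-memory cache whose entries expire ttl seconds after being set."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def safe_tool(func):
    """Turn any exception into the error payload described above."""

    @wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            return {"status": "error", "error": str(exc)}

    return wrapper
```

A fixed counter that resets every period would allow a burst of up to twice max_calls across a reset boundary. Tracking individual timestamps avoids that, at the cost of storing one entry per recent call.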


Notes

  • generate_text uses Server-Sent Events streaming internally and returns the complete text when done. It requires an HF Pro account or inference credits, because most text-generation models are not available on the free tier.

  • run_image_inference and run_audio_inference accept both remote URLs and absolute local file paths.

  • The HF Inference API routes requests through router.huggingface.co/hf-inference. Not all models are available on all providers. If you get a "Model not supported by provider" error, try a different model or check the HF Inference docs.
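
To make the first note concrete, the sketch below collects streamed Server-Sent Events into a single string. It assumes an OpenAI-style chunk layout (choices[0].delta.content) terminated by a "data: [DONE]" line. That layout is an assumption, and the payload this server actually parses may differ. collect_sse_text is a hypothetical name.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas from a stream of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


stream = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(collect_sse_text(stream))  # Hello, world
```

Buffering the deltas and returning one string matches the behaviour described in the note: the caller sees the complete text only once the stream ends.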


License

MIT — see LICENSE.
