Provides access to Hugging Face's Inference API with 200,000+ machine learning models including text generation (LLMs like Llama, Mistral, Gemma), image generation (FLUX, Stable Diffusion), text classification, translation, summarization, question answering, speech recognition, text-to-speech, embeddings, and computer vision tasks like object detection and image captioning.
Hugging Face MCP Server
MCP server for accessing the Hugging Face Inference API. Run 200,000+ machine learning models including LLMs, image generation, text classification, embeddings, and more.
Features
Text Generation: LLMs like Llama-3, Mistral, Gemma
Image Generation: FLUX, Stable Diffusion XL, SD 2.1
Text Classification: Sentiment analysis, topic classification
Token Classification: Named entity recognition, POS tagging
Question Answering: Extract answers from context
Summarization: Condense long text
Translation: 200+ language pairs
Image-to-Text: Image captioning
Image Classification: Classify images into categories
Object Detection: Detect objects with bounding boxes
Text-to-Speech: Convert text to audio
Speech Recognition: Transcribe audio (Whisper)
Embeddings: Get text/sentence embeddings
And more: Fill-mask, sentence similarity
Setup
Prerequisites
Hugging Face account
API token (free or Pro)
Environment Variables
HUGGINGFACE_API_TOKEN (required): Your Hugging Face API token
How to get an API token:
Go to huggingface.co/settings/tokens
Click "New token"
Give it a name and select permissions ("Read" is sufficient for inference)
Copy the token (it starts with hf_)
Store it as the HUGGINGFACE_API_TOKEN environment variable
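A startup check can catch a missing or mistyped token before any request is sent. This is a minimal sketch, not this server's own code; the function name `load_token` is hypothetical, and the prefix test relies only on the `hf_` prefix noted above.

```python
import os
from collections.abc import Mapping


def load_token(env: Mapping[str, str] = os.environ) -> str:
    """Read HUGGINGFACE_API_TOKEN and fail early if it is missing or malformed."""
    token = env.get("HUGGINGFACE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUGGINGFACE_API_TOKEN is not set")
    # Hugging Face access tokens start with "hf_" (see the steps above).
    if not token.startswith("hf_"):
        raise ValueError("HUGGINGFACE_API_TOKEN does not look like a Hugging Face token")
    return token
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.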
Available Tools
Text Generation Tools
text_generation
Generate text using large language models.
Parameters:
prompt (string, required): Input text prompt
model_id (string, optional): Model ID (default: 'mistralai/Mistral-7B-Instruct-v0.3')
max_new_tokens (int, optional): Maximum tokens to generate
temperature (float, optional): Sampling temperature 0-2 (higher = more random)
top_p (float, optional): Nucleus sampling 0-1
top_k (int, optional): Top-k sampling
repetition_penalty (float, optional): Penalty for repetition
return_full_text (bool, optional): Return prompt + generation (default: False)
Popular models:
meta-llama/Llama-3.2-3B-Instruct - Meta's Llama 3.2
mistralai/Mistral-7B-Instruct-v0.3 - Mistral 7B
google/gemma-2-2b-it - Google Gemma 2
HuggingFaceH4/zephyr-7b-beta - Zephyr 7B
tiiuae/falcon-7b-instruct - Falcon 7B
Example:
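A minimal sketch of what a `text_generation` call looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`; the helper `build_tool_call` and the argument values are illustrative assumptions, not part of this server.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0) as a JSON string."""
    # Drop arguments left as None so the server falls back to its defaults.
    args = {key: value for key, value in arguments.items() if value is not None}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": args},
    }
    return json.dumps(request)


# Hypothetical arguments for the text_generation tool.
payload = build_tool_call(
    1,
    "text_generation",
    {
        "prompt": "Explain what a transformer model is in two sentences.",
        "model_id": "mistralai/Mistral-7B-Instruct-v0.3",
        "max_new_tokens": 128,
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": None,  # unset: omitted from the request
    },
)
print(payload)
```

In practice an MCP client such as a desktop assistant builds this request for you; the sketch only shows which parameter names travel inside `arguments`.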
Classification Tools
text_classification
Classify text into categories (sentiment, topics, etc.).
Parameters:
text (string, required): Text to classify
model_id (string, optional): Model ID (default: 'distilbert-base-uncased-finetuned-sst-2-english')
Popular models:
distilbert-base-uncased-finetuned-sst-2-english - Sentiment (positive/negative)
facebook/bart-large-mnli - Zero-shot classification
cardiffnlp/twitter-roberta-base-sentiment-latest - Twitter sentiment
finiteautomata/bertweet-base-sentiment-analysis - Tweet sentiment
Example:
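Text classification models on the Inference API typically return a list of `{"label": ..., "score": ...}` objects, sometimes wrapped in one extra list per input. The sketch below picks the most likely label, assuming the tool relays that shape; the function name `top_label` and the sample scores are hypothetical.

```python
from typing import Any


def top_label(result: Any) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    # Unwrap a single-input batch such as [[{...}, {...}]].
    if result and isinstance(result[0], list):
        result = result[0]
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical response for distilbert-base-uncased-finetuned-sst-2-english.
sample = [[{"label": "POSITIVE", "score": 0.9991}, {"label": "NEGATIVE", "score": 0.0009}]]
label, score = top_label(sample)
print(label, score)
```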
token_classification
Token-level classification for NER, POS tagging, etc.
Parameters:
text (string, required): Input text
model_id (string, optional): Model ID (default: 'dslim/bert-base-NER')
Popular models:
dslim/bert-base-NER - Named Entity Recognition
Jean-Baptiste/roberta-large-ner-english - Large NER model
dbmdz/bert-large-cased-finetuned-conll03-english - CoNLL-2003 NER
Example:
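NER output usually arrives as a flat list of entity objects. Aggregated results carry an `entity_group` such as `PER`, while raw token results carry an `entity` with a BIO prefix such as `B-PER`. A minimal sketch, assuming either shape with `score` and `word` keys, that groups confident entities by type; `group_entities` and the threshold are hypothetical.

```python
from collections import defaultdict


def group_entities(entities: list[dict], min_score: float = 0.5) -> dict[str, list[str]]:
    """Group recognized entity strings by type, skipping low-confidence hits."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue
        # Prefer the aggregated label; otherwise strip the B-/I- prefix.
        label = ent.get("entity_group") or ent["entity"].split("-")[-1]
        grouped[label].append(ent["word"])
    return dict(grouped)


# Hypothetical aggregated response from dslim/bert-base-NER.
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Ada Lovelace"},
    {"entity_group": "LOC", "score": 0.98, "word": "London"},
    {"entity_group": "MISC", "score": 0.30, "word": "Victorian"},
]
print(group_entities(sample))
```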
Question Answering & Text Processing
question_answering
Answer questions based on provided context.
Parameters:
question (string, required): Question to answer
context (string, required): Context containing the answer
model_id (string, optional): Model ID (default: 'deepset/roberta-base-squad2')
Popular models:
deepset/roberta-base-squad2 - RoBERTa on SQuAD 2.0
distilbert-base-cased-distilled-squad - DistilBERT on SQuAD
Example:
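Extractive QA models return the answer together with its character offsets into the context (`answer`, `score`, `start`, `end`). Assuming the tool passes those fields through, a quick sanity check confirms the span really comes from the supplied context; `answer_in_context` and the sample response are hypothetical.

```python
def answer_in_context(context: str, result: dict) -> bool:
    """Check that the predicted character span matches the returned answer."""
    return context[result["start"]:result["end"]] == result["answer"]


context = "The Eiffel Tower was completed in 1889 in Paris."
# Hypothetical response from deepset/roberta-base-squad2.
result = {"answer": "1889", "score": 0.97, "start": 34, "end": 38}
print(answer_in_context(context, result))
```

A low `score` is worth treating with suspicion, since SQuAD 2.0 models are trained to recognize questions the context cannot answer.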
summarization
Summarize long text into a shorter version.
Parameters:
text (string, required): Text to summarize
model_id (string, optional): Model ID (default: 'facebook/bart-large-cnn')
max_length (int, optional): Maximum summary length
min_length (int, optional): Minimum summary length
Popular models:
facebook/bart-large-cnn - BART CNN summarization
google/pegasus-xsum - PEGASUS XSum
sshleifer/distilbart-cnn-12-6 - Distilled BART
Example:
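For BART-style summarizers, `max_length` and `min_length` are generation limits counted in tokens rather than characters, and a minimum above the maximum cannot be satisfied. A minimal sketch that builds the tool arguments and rejects inconsistent bounds before calling; `summarization_args` is a hypothetical helper.

```python
from typing import Optional


def summarization_args(
    text: str,
    model_id: str = "facebook/bart-large-cnn",
    max_length: Optional[int] = None,
    min_length: Optional[int] = None,
) -> dict:
    """Build summarization tool arguments, rejecting inconsistent length bounds."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if max_length is not None and min_length is not None and min_length > max_length:
        raise ValueError("min_length must not exceed max_length")
    args = {"text": text, "model_id": model_id}
    # Only send bounds that were set; otherwise the model's defaults apply.
    if max_length is not None:
        args["max_length"] = max_length
    if min_length is not None:
        args["min_length"] = min_length
    return args


args = summarization_args("Long article text goes here.", max_length=130, min_length=30)
print(args)
```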
translation
Translate text between languages.
Parameters:
text (string, required): Text to translate
model_id (string, required): Model ID for the language pair
Popular models:
Helsinki-NLP/opus-mt-en-es - English to Spanish
Helsinki-NLP/opus-mt-es-en - Spanish to English
Helsinki-NLP/opus-mt-en-fr - English to French
Helsinki-NLP/opus-mt-en-de - English to German
facebook/mbart-large-50-many-to-many-mmt - Multilingual (50 languages)
Example:
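Because `model_id` is required and selects the language pair, it helps to derive it from language codes. The Helsinki-NLP models listed above follow an `opus-mt-{source}-{target}` naming pattern; the sketch below assumes that pattern, and since not every pair exists on the Hub, check the generated ID before relying on it. `opus_mt_model` is a hypothetical helper.

```python
def opus_mt_model(source: str, target: str) -> str:
    """Build a Helsinki-NLP OPUS-MT model ID from two language codes."""
    # Mirrors the naming of the models listed above; the pair may not exist.
    return f"Helsinki-NLP/opus-mt-{source.lower()}-{target.lower()}"


arguments = {"text": "Where is the train station?", "model_id": opus_mt_model("en", "es")}
print(arguments)
```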
Image Generation Tools
text_to_image
Generate images from text prompts.
Parameters:
prompt (string, required): Text description of desired image
model_id (string, optional): Model ID (default: 'black-forest-labs/FLUX.1-dev')
negative_prompt (string, optional): What to avoid in image
num_inference_steps (int, optional): Number of denoising steps
guidance_scale (float, optional): How closely to follow prompt
Popular models:
black-forest-labs/FLUX.1-dev - FLUX.1 (high quality)
stabilityai/stable-diffusion-xl-base-1.0 - SDXL
stabilityai/stable-diffusion-2-1 - SD 2.1
runwayml/stable-diffusion-v1-5 - SD 1.5
Example:
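How the generated image comes back depends on the server; assuming it is returned as a base64 string, the sketch below decodes it and picks a file extension from the PNG or JPEG signature bytes. `decode_image` is a hypothetical helper.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def decode_image(image_base64: str) -> tuple[bytes, str]:
    """Decode base64 image data and guess its file extension from magic bytes."""
    data = base64.b64decode(image_base64)
    if data.startswith(PNG_SIGNATURE):
        return data, "png"
    if data.startswith(JPEG_SIGNATURE):
        return data, "jpg"
    raise ValueError("unrecognized image format")
```

The decoded bytes can then be written with `pathlib.Path(f"output.{ext}").write_bytes(data)`.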
Computer Vision Tools
image_to_text
Generate text descriptions from images (captioning).
Parameters:
image_base64 (string, required): Base64-encoded image
model_id (string, optional): Model ID (default: 'Salesforce/blip-image-captioning-large')
Popular models:
Salesforce/blip-image-captioning-large - BLIP large
nlpconnect/vit-gpt2-image-captioning - ViT-GPT2
Example:
image_classification
Classify images into categories.
Parameters:
image_base64 (string, required): Base64-encoded image
model_id (string, optional): Model ID (default: 'google/vit-base-patch16-224')
Popular models:
google/vit-base-patch16-224 - Vision Transformer
microsoft/resnet-50 - ResNet-50
Example:
object_detection
Detect objects in images with bounding boxes.
Parameters:
image_base64 (string, required): Base64-encoded image
model_id (string, optional): Model ID (default: 'facebook/detr-resnet-50')
Popular models:
facebook/detr-resnet-50 - DETR with ResNet-50
hustvl/yolos-tiny - YOLOS tiny
Example:
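DETR-style detectors on the Inference API typically return a list of objects with `score`, `label`, and a pixel `box` holding `xmin`, `ymin`, `xmax`, `ymax`. Assuming the tool passes that shape through, this sketch keeps confident detections and computes each box's area; `confident_detections`, the threshold, and the sample data are hypothetical.

```python
def confident_detections(detections: list[dict], threshold: float = 0.9) -> list[tuple[str, float, int]]:
    """Keep detections at or above threshold as (label, score, box_area) tuples."""
    kept = []
    for det in detections:
        if det["score"] < threshold:
            continue
        box = det["box"]
        area = (box["xmax"] - box["xmin"]) * (box["ymax"] - box["ymin"])
        kept.append((det["label"], det["score"], area))
    # Highest-confidence detections first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


sample = [
    {"score": 0.998, "label": "cat", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.42, "label": "remote", "box": {"xmin": 5, "ymin": 5, "xmax": 15, "ymax": 25}},
    {"score": 0.95, "label": "couch", "box": {"xmin": 0, "ymin": 100, "xmax": 300, "ymax": 200}},
]
print(confident_detections(sample))
```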
Audio Tools
text_to_speech
Convert text to speech audio.
Parameters:
text (string, required): Text to synthesize
model_id (string, optional): Model ID (default: 'facebook/mms-tts-eng')
Popular models:
facebook/mms-tts-eng - MMS TTS English
espnet/kan-bayashi_ljspeech_vits - VITS LJSpeech
Example:
automatic_speech_recognition
Transcribe audio to text (speech recognition).
Parameters:
audio_base64 (string, required): Base64-encoded audio
model_id (string, optional): Model ID (default: 'openai/whisper-large-v3')
Popular models:
openai/whisper-large-v3 - Whisper large v3 (best quality)
openai/whisper-medium - Whisper medium (faster)
facebook/wav2vec2-base-960h - Wav2Vec 2.0
Example:
Embedding & Similarity Tools
sentence_similarity
Compute similarity between sentences.
Parameters:
source_sentence (string, required): Reference sentence
sentences (list, required): List of sentences to compare
model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')
Popular models:
sentence-transformers/all-MiniLM-L6-v2 - Fast, good quality
sentence-transformers/all-mpnet-base-v2 - Best quality
BAAI/bge-small-en-v1.5 - BGE small
Example:
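Sentence similarity returns one score per entry in `sentences`, in the same order. A minimal sketch, assuming that list-of-floats response, that pairs scores with their sentences and ranks them; `rank_sentences` and the scores are hypothetical.

```python
def rank_sentences(sentences: list[str], scores: list[float]) -> list[tuple[str, float]]:
    """Pair each candidate sentence with its score, most similar first."""
    if len(sentences) != len(scores):
        raise ValueError("expected one score per sentence")
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


arguments = {
    "source_sentence": "How do I reset my password?",
    "sentences": [
        "Steps to change your account password",
        "Our office opening hours",
        "Recovering access to a locked account",
    ],
}
# Hypothetical scores returned for the three sentences above.
scores = [0.82, 0.05, 0.61]
for sentence, score in rank_sentences(arguments["sentences"], scores):
    print(f"{score:.2f}  {sentence}")
```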
feature_extraction
Get embeddings (feature vectors) for text.
Parameters:
text (string, required): Input text
model_id (string, optional): Model ID (default: 'sentence-transformers/all-MiniLM-L6-v2')
Popular models:
sentence-transformers/all-MiniLM-L6-v2 - 384 dimensions
sentence-transformers/all-mpnet-base-v2 - 768 dimensions
BAAI/bge-small-en-v1.5 - 384 dimensions
Example:
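Embeddings are usually compared with cosine similarity. Assuming `feature_extraction` returns one pooled vector per input (384 numbers for all-MiniLM-L6-v2), this standard-library sketch compares two of them; vectors from different models have different dimensions and cannot be mixed.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```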
fill_mask
Fill in masked words in text.
Parameters:
text (string, required): Text containing the model's mask token ([MASK] for BERT and DistilBERT, <mask> for RoBERTa)
model_id (string, optional): Model ID (default: 'bert-base-uncased')
Popular models:
bert-base-uncased - BERT base
roberta-base - RoBERTa base
distilbert-base-uncased - DistilBERT
Example:
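BERT-family models use `[MASK]` while RoBERTa uses `<mask>`, so sending the wrong token fails. A small lookup avoids that. This is a sketch: the `MASK_TOKENS` table covers only the models listed above, and the `{mask}` placeholder convention is an assumption of this example.

```python
# Mask tokens for the models listed above.
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "distilbert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
}


def masked_text(template: str, model_id: str) -> str:
    """Replace a {mask} placeholder with the mask token the model expects."""
    return template.replace("{mask}", MASK_TOKENS[model_id])


arguments = {
    "text": masked_text("Paris is the {mask} of France.", "roberta-base"),
    "model_id": "roberta-base",
}
print(arguments)
```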
Model Loading & Cold Starts
Important: Models may take 20-60 seconds to load on first request (cold start). Subsequent requests are faster.
Tips:
Use popular models for faster loading
Implement retry logic for timeouts
Consider caching model responses
Use smaller models for faster inference
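Caching suits deterministic tasks such as embeddings or classification. Sampled text generation can legitimately return different outputs for identical arguments, so caching it changes behavior. A minimal in-memory sketch, where `call_tool` stands in for whatever function actually invokes the server (hypothetical here):

```python
import json
from typing import Any, Callable


class ResponseCache:
    """Memoize tool results keyed by tool name plus canonical JSON arguments."""

    def __init__(self, call_tool: Callable[[str, dict], Any]) -> None:
        self._call_tool = call_tool
        self._results: dict[str, Any] = {}

    def get(self, tool: str, arguments: dict) -> Any:
        # sort_keys=True makes the key independent of argument order.
        key = tool + ":" + json.dumps(arguments, sort_keys=True)
        if key not in self._results:
            self._results[key] = self._call_tool(tool, arguments)
        return self._results[key]


# Stand-in for a real MCP call that records how often the "server" is hit.
calls: list[str] = []


def fake_call_tool(tool: str, arguments: dict) -> dict:
    calls.append(tool)
    return {"tool": tool, "echo": arguments}


cache = ResponseCache(fake_call_tool)
cache.get("feature_extraction", {"text": "hello", "model_id": "sentence-transformers/all-MiniLM-L6-v2"})
cache.get("feature_extraction", {"model_id": "sentence-transformers/all-MiniLM-L6-v2", "text": "hello"})
print(len(calls))
```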
Rate Limits
Free Tier
Rate limited to prevent abuse
Suitable for testing and small projects
May experience queuing during high load
Pro Subscription ($9/month)
Higher rate limits than the free tier
Priority access to models
Faster inference
No queuing
Visit huggingface.co/pricing for details.
Base64 Encoding
For images and audio, provide the data as a base64-encoded string:
Python example:
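A minimal sketch using only the standard library. The helper names are illustrative, and the file names in the usage comments are placeholders.

```python
import base64
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return raw bytes as a base64 text string suitable for JSON arguments."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read an image or audio file from disk and base64-encode its contents."""
    return encode_bytes(Path(path).read_bytes())


# Usage (placeholder file names):
# image_args = {"image_base64": encode_file("photo.jpg")}
# audio_args = {"audio_base64": encode_file("meeting.wav")}
```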
Parameter Tuning
Text Generation
temperature (0-2): Higher = more creative/random, Lower = more focused/deterministic
top_p (0-1): Nucleus sampling, typically 0.9-0.95
top_k: Number of highest probability tokens to keep
repetition_penalty: Penalize repeated tokens (>1.0 reduces repetition)
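To make the first three settings concrete, here is the standard math behind them: temperature divides the logits before the softmax, top-k keeps the k most probable tokens, and top-p keeps the smallest set of top tokens whose probabilities sum to at least p. This is an illustrative sketch, not the code the model servers run, and it requires a positive temperature.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_indices(probs: list[float], k: int) -> list[int]:
    """Indices of the k most probable tokens (top-k keeps only these)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def top_p_indices(probs: list[float], p: float) -> list[int]:
    """Smallest most-probable set whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    print(t, [round(q, 3) for q in softmax_with_temperature(logits, t)])
```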
Image Generation
guidance_scale (1-20): Higher = follows prompt more strictly (typical: 7-7.5 for Stable Diffusion models; FLUX.1-dev is usually run lower, around 3.5)
num_inference_steps: More steps = higher quality but slower (typical: 20-50)
negative_prompt: Describe what you don't want in the image
Error Handling
Common errors:
503 Service Unavailable: Model is loading (cold start), retry after 20-60 seconds
401 Unauthorized: Invalid or missing API token
429 Too Many Requests: Rate limit exceeded (upgrade to Pro)
400 Bad Request: Invalid parameters or model ID
504 Gateway Timeout: Model took too long to respond
Retry logic example:
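A minimal sketch of capped exponential backoff that retries only the transient statuses from the list above (429, 503, 504) and fails fast on 400 or 401. `InferenceError` is a hypothetical exception type; adapt it to whatever your client raises. With the defaults, five attempts wait roughly 30 seconds in total, so raise `max_attempts` if cold starts take longer.

```python
import random
import time
from typing import Any, Callable

# Status codes from the list above that are usually transient.
RETRYABLE_STATUS = {429, 503, 504}


class InferenceError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry(
    fn: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying transient errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except InferenceError as err:
            # Give up on non-transient errors or after the last attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Delays of 2s, 4s, 8s, ... capped at max_delay, plus up to 1s of jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, 1))
```

Injecting `sleep` keeps the function testable without real waiting.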
Finding Models
Browse models:
Visit huggingface.co/models
Filter by task (Text Generation, Image Generation, etc.)
Sort by downloads, likes, or trending
Check model card for usage examples
Popular categories:
Text Generation: 50,000+ models
Text Classification: 30,000+ models
Image Generation: 10,000+ models
Translation: 5,000+ models
Embeddings: 3,000+ models
Best Practices
Use popular models: Faster loading and better maintained
Implement timeouts: Set appropriate timeouts (60-120 seconds)
Cache responses: Store results to reduce API calls
Handle cold starts: Implement retry logic for 503 errors
Monitor usage: Track API calls and costs
Test locally: Use the Hugging Face Transformers library for testing
Read model cards: Understand model capabilities and limitations
Optimize parameters: Tune settings for your use case
Use Cases
Chatbots: LLM-powered conversational AI
Content Generation: Blog posts, articles, creative writing
Image Creation: Art, illustrations, product images
Sentiment Analysis: Customer feedback analysis
Translation: Multi-language support
Transcription: Meeting notes, podcast transcripts
Semantic Search: Embedding-based search
Data Extraction: NER for document processing
Content Moderation: Text and image classification