FLUX MCP Server & CLI
A Model Context Protocol (MCP) server and command-line tool for generating images using FLUX.1-dev with automatic model unloading to save VRAM and power.
Features
High-Quality Image Generation - Uses FLUX.1-dev for state-of-the-art image synthesis
Lazy Loading - Model loads only when needed
Auto-Unload - Automatically unloads the model after a configurable inactivity period (MCP mode)
Memory Efficient - Uses bfloat16 for optimal VRAM usage (~12GB)
Reproducible - Seed-based generation for consistent results
Status Monitoring - Check model status and VRAM usage
Runtime Configuration - Adjust the auto-unload timeout without restarting
Dual Interface - Use via Claude Desktop (MCP) or the command line (CLI)
Requirements
Python 3.10+
NVIDIA GPU with 16GB+ VRAM (tested on RTX 4070 Ti Super)
CUDA toolkit installed
PyTorch with CUDA support
Installation
Clone the repository (or navigate to the project directory):
Install with UV (recommended):
Or install with pip:
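The install commands might look like this (the project directory name is a placeholder):

```shell
# With UV (recommended)
cd flux-mcp
uv sync

# Or with pip
pip install -e .
```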
Configure environment variables:
Configuration Options
Edit .env to customize:
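A sketch of a possible .env file. FLUX_OUTPUT_DIR and FLUX_MODEL_CACHE appear elsewhere in this README; the timeout variable name is an assumption:

```env
# Where generated images are saved
FLUX_OUTPUT_DIR=/home/$USER/flux_output

# HuggingFace model cache location
FLUX_MODEL_CACHE=/path/to/model/cache

# Auto-unload timeout in seconds, MCP mode (variable name assumed)
FLUX_UNLOAD_TIMEOUT=300
```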
MCP Server Registration
Add the server to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
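A typical entry might look like the following. The server key name ("flux") is arbitrary, and the uv run flux-mcp command mirrors the manual test shown under Troubleshooting:

```json
{
  "mcpServers": {
    "flux": {
      "command": "uv",
      "args": ["--directory", "/path/to/flux-mcp", "run", "flux-mcp"]
    }
  }
}
```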
Or if installed globally with pip:
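With a global pip install, the entry point can be invoked directly (key name and script name are assumptions):

```json
{
  "mcpServers": {
    "flux": {
      "command": "flux-mcp"
    }
  }
}
```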
After adding the configuration, restart Claude Desktop.
CLI Usage
In addition to the MCP server for Claude Desktop, you can use FLUX directly from the command line for completely offline and private image generation.
Quick Start
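A minimal invocation might look like this, a sketch based on the generate command documented below:

```shell
flux generate "a cozy cabin in a snowy forest at dusk"
```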
Generate Command
The main command for image generation:
Options:
--steps, -s INTEGER - Number of inference steps (default: 28)
--guidance, -g FLOAT - Guidance scale (default: 3.5)
--width, -w INTEGER - Image width in pixels, must be a multiple of 8 (default: 1024)
--height, -h INTEGER - Image height in pixels, must be a multiple of 8 (default: 1024)
--seed INTEGER - Random seed for reproducibility
--output, -o PATH - Custom output path (default: auto-generated)
--output-dir PATH - Override output directory
--interactive, -i - Interactive mode
--verbose, -v - Verbose output with debug info
Examples:
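A few illustrative invocations (prompts are placeholders):

```shell
# Default settings (1024x1024, 28 steps)
flux generate "a watercolor painting of a lighthouse"

# Custom size and steps, with a fixed seed for reproducibility
flux generate "a watercolor painting of a lighthouse" --width 768 --height 768 --steps 24 --seed 42

# Save to a specific file
flux generate "a watercolor painting of a lighthouse" -o lighthouse.png
```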
Interactive Mode
Interactive mode allows you to generate multiple images without reloading the model:
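Using the --interactive flag documented above:

```shell
flux generate --interactive
```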
Interactive workflow:
Enter your prompt
Configure parameters (steps, guidance, dimensions, seed)
Image generates and saves
Choose to generate another or exit
Model stays loaded between generations for faster subsequent images
Other Commands
Status Command:
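Assuming the subcommand is named status, matching the heading:

```shell
flux status
```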
Shows:
Model information
Output directory
CUDA availability
GPU name and VRAM usage
Model cache location
Config Command:
Displays current configuration from environment variables.
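Assuming the subcommand is named config, matching the heading:

```shell
flux config
```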
Open Output:
Opens the output directory in your file manager (Linux: xdg-open, macOS: open, Windows: explorer).
Output Files
Generated images are saved with metadata:
Image: flux_YYYYMMDD_HHMMSS_SEED.png
Metadata: flux_YYYYMMDD_HHMMSS_SEED.json
Metadata JSON contains:
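A sketch of what the metadata file might contain; the exact field names are assumptions based on the generation parameters described in this README:

```json
{
  "prompt": "a cozy cabin in a snowy forest at dusk",
  "seed": 42,
  "steps": 28,
  "guidance_scale": 3.5,
  "width": 1024,
  "height": 1024,
  "model": "black-forest-labs/FLUX.1-dev",
  "timestamp": "2025-01-26T14:30:52"
}
```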
CLI vs MCP Server
CLI Mode:
Completely offline and private (no Claude Desktop needed)
Direct control from the terminal
Batch generation with interactive mode
No auto-unload (the process terminates after generation)
Saves metadata JSON files
Rich terminal UI with progress bars
MCP Server Mode:
Integrated with Claude Desktop
Natural language interface
Auto-unload after timeout (saves power)
Persistent background process
Access from Claude conversations
Both modes share the same configuration, model cache, and output directory.
MCP Server Tools (Claude Desktop)
1. generate_image
Generate an image from a text prompt.
Parameters:
prompt (required): Text description of the image
steps (optional): Number of inference steps (default: 28, range: 20-50)
guidance_scale (optional): Guidance scale (default: 3.5, range: 1.0-10.0)
width (optional): Image width in pixels (default: 1024, range: 256-2048)
height (optional): Image height in pixels (default: 1024, range: 256-2048)
seed (optional): Random seed for reproducibility (random if not provided)
Example Usage in Claude:
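A request in a Claude conversation might be phrased like this (wording is illustrative, not a fixed syntax):

> Generate an image of a serene mountain lake at sunset, 1024x1024, 30 steps, seed 42.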
2. unload_model
Immediately unload the FLUX model from GPU memory.
Example Usage:
3. get_status
Check the current status of the FLUX generator.
Returns:
Model load status
Time remaining until auto-unload
Current VRAM usage
Last access time
Example Usage:
4. set_timeout
Change the auto-unload timeout at runtime.
Parameters:
timeout_seconds (required): New timeout in seconds (0 to disable)
Example Usage:
Usage Examples
Basic Image Generation
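For example (phrasing is illustrative):

> Generate an image of a futuristic city skyline at night.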
The server will:
Load the FLUX model (if not already loaded)
Generate the image
Save it to the output directory as YYYYMMDD_HHMMSS_{seed}.png
Return the file path, seed, and generation settings
Schedule auto-unload after 5 minutes (default)
Reproducible Generation
To generate the same image again, use the seed from a previous generation:
Custom Parameters
Memory Management
Check current status:
Manually unload to free VRAM:
Adjust auto-unload timeout:
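Illustrative Claude requests for each of the three actions above (phrasing is illustrative, not a fixed syntax):

> What's the current status of the FLUX generator?

> Unload the FLUX model to free up VRAM.

> Change the FLUX auto-unload timeout to 600 seconds.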
How It Works
Auto-Unload Mechanism
Lazy Loading: The model is NOT loaded when the server starts
On-Demand Loading: Model loads automatically on first generation request
Timer Reset: Each generation resets the auto-unload timer
Automatic Cleanup: After the configured timeout with no activity:
Model is removed from memory
GPU cache is cleared (torch.cuda.empty_cache())
Python garbage collection runs
Seamless Reload: Model automatically reloads on next request
Memory Management
The server uses several strategies to minimize VRAM usage:
bfloat16 precision instead of float32 (saves ~50% VRAM)
Explicit cache clearing when unloading
Threading for non-blocking auto-unload
Lock-based synchronization for thread-safe operation
Output Files
Generated images are saved as: YYYYMMDD_HHMMSS_{seed}.png
Example: 20250126_143052_42.png
Troubleshooting
CUDA Out of Memory
Problem: Error during generation: "CUDA out of memory"
Note: The generator automatically uses sequential CPU offloading to reduce VRAM usage from ~28GB to ~12GB. This should work on 16GB GPUs like RTX 4070 Ti Super.
If you still get OOM errors:
Close other GPU applications:

```shell
# Check what's using VRAM
nvidia-smi
```

Reduce image dimensions:

```shell
flux generate "prompt" --width 768 --height 768
# Or even smaller
flux generate "prompt" --width 512 --height 512
```

Reduce inference steps:

```shell
flux generate "prompt" --steps 20  # Default is 28
```

Restart the process if VRAM isn't fully freed:

```shell
# CLI: Just run again (process exits after generation)
# MCP: Restart Claude Desktop
```
Model Download Issues
Problem: Model download fails or times out
Solutions:
Check internet connection
Set a custom cache directory with more space: FLUX_MODEL_CACHE=/path/to/large/disk/cache
Download manually with the HuggingFace CLI:

```shell
huggingface-cli download black-forest-labs/FLUX.1-dev
```
Server Not Responding
Problem: Claude Desktop doesn't see the tools
Solutions:
Check Claude Desktop logs for errors
Verify the configuration path is absolute
Ensure UV is in PATH or use full path to UV binary
Restart Claude Desktop after config changes
Test the server manually:
```shell
cd /path/to/flux-mcp
uv run flux-mcp
```
Slow Generation
Problem: Image generation takes too long
Solutions:
Reduce the steps parameter (try 20-25 instead of 28)
Ensure the GPU is being used (check with nvidia-smi)
Close background applications to free GPU resources
Check that CUDA is properly installed
Permission Errors
Problem: Cannot write to output directory
Solutions:
Check directory permissions
Set a different output directory in .env: FLUX_OUTPUT_DIR=/home/$USER/flux_output
Create the directory manually:

```shell
mkdir -p ~/flux_output
chmod 755 ~/flux_output
```
Advanced Configuration
Custom Model Cache
To share the model cache across multiple projects or save space:
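For example, pointing the cache variable (documented under Troubleshooting) at a shared location; the path is a placeholder:

```env
FLUX_MODEL_CACHE=/mnt/storage/huggingface_cache
```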
Disable Auto-Unload
To keep the model loaded permanently (uses more power but faster):
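Per the set_timeout tool, a value of 0 disables auto-unload; the environment variable name here is an assumption:

```env
FLUX_UNLOAD_TIMEOUT=0
```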
Or at runtime:
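For example, asking Claude (phrasing is illustrative):

> Set the FLUX auto-unload timeout to 0.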
Logging
The server logs to stderr. To capture logs:
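One way to capture them with a standard stderr redirect:

```shell
uv run flux-mcp 2> flux-mcp.log
```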
Performance Tips
Optimal Settings for RTX 4070 Ti Super (16GB)
Resolution: Up to 1024x1024 comfortably
Steps: 25-30 for good quality
Batch size: 1 (model doesn't support batching well)
Timeout: 300s for occasional use, 600s for active sessions
Generation Time Expectations
1024x1024, 28 steps: ~20-40 seconds (depending on prompt complexity)
512x512, 20 steps: ~5-10 seconds
First generation: +10-15 seconds for model loading
Technical Details
Architecture
Key Components
FluxGenerator: Manages model lifecycle, threading, and GPU memory (shared between CLI and MCP)
Config: Loads environment variables and provides defaults (shared)
MCP Server: Exposes tools via Model Context Protocol for Claude Desktop
CLI Tool: Direct command-line interface for offline usage
Thread Safety
The generator uses a threading lock (threading.Lock) to ensure:
Only one generation at a time
Safe model loading/unloading
No race conditions with auto-unload timer
License
MIT License - see LICENSE file for details
Contributing
Contributions welcome! Please:
Fork the repository
Create a feature branch
Make your changes
Submit a pull request
Support
For issues and questions:
Check the Troubleshooting section above
Review server logs for errors
Open an issue on GitHub
Changelog
v0.1.0 (2025-01-26)
Initial release
FLUX.1-dev integration
Auto-unload functionality (MCP mode)
Four MCP tools (generate, unload, status, set_timeout)
CLI tool with interactive mode (flux command)
Shared architecture between CLI and MCP server
Comprehensive documentation with CLI and MCP usage examples