Provides comprehensive access to OpenAI's API capabilities including chat completions with GPT models, text and image embeddings, DALL-E image generation, text-to-speech conversion, Whisper speech-to-text transcription, GPT-4 Vision image analysis, content moderation, and model management.
OpenAI MCP Server
MCP server providing comprehensive access to OpenAI's API capabilities.
Features
Core Capabilities
Chat Completions - Generate responses using GPT-4o, GPT-4, and GPT-3.5 models
Embeddings - Create vector embeddings for semantic search and similarity
Image Generation - Generate images with DALL-E 3 and DALL-E 2
Text-to-Speech - Convert text to natural-sounding speech
Speech-to-Text - Transcribe audio using Whisper
Vision Analysis - Analyze images with GPT-4 Vision
Content Moderation - Check content against usage policies
Model Management - List and explore available models
Installation
Clone this repository
Install dependencies:
Set up your environment variables:
Configuration
Get Your OpenAI API Key
Sign in or create an account
Click "Create new secret key"
Copy the key and add it to your
.env
file
Running the Server
HTTP Mode (Recommended for NimbleBrain)
Start the server:
The server will be available at http://localhost:8000
Claude Desktop Configuration
Add to your claude_desktop_config.json
:
HTTP Configuration:
Alternative - Direct Python (stdio):
If you need stdio mode instead of HTTP, you can run directly:
Windows (%APPDATA%\Claude\claude_desktop_config.json
):
macOS (~/Library/Application Support/Claude/claude_desktop_config.json
):
Available Tools
chat_completion
Generate conversational responses using OpenAI's chat models.
Parameters:
messages
(required): List of message objects with 'role' and 'content'model
: Model name (default: "gpt-4o-mini")temperature
: Creativity level 0-2 (default: 1.0)max_tokens
: Maximum response lengthresponse_format
: Optional "json_object" for JSON responses
Example:
create_embedding
Generate vector embeddings for text.
Parameters:
text
(required): Text to embedmodel
: Embedding model (default: "text-embedding-3-small")
Use Cases:
Semantic search
Document similarity
Clustering and classification
Recommendation systems
generate_image
Create images from text descriptions using DALL-E.
Parameters:
prompt
(required): Description of desired imagemodel
: "dall-e-3" or "dall-e-2" (default: "dall-e-3")size
: Image dimensions (1024x1024, 1792x1024, 1024x1792)quality
: "standard" or "hd" (DALL-E 3 only)n
: Number of images (1-10, only 1 for DALL-E 3)
text_to_speech
Convert text to natural-sounding audio.
Parameters:
text
(required): Text to convertvoice
: alloy, echo, fable, onyx, nova, shimmer (default: "alloy")model
: "tts-1" or "tts-1-hd" (default: "tts-1")speed
: Speech rate 0.25-4.0 (default: 1.0)
Returns: Base64 encoded MP3 audio
transcribe_audio
Transcribe audio to text using Whisper.
Parameters:
audio_file_base64
(required): Base64 encoded audio filemodel
: "whisper-1"language
: Optional language code (auto-detected if not provided)response_format
: json, text, srt, vtt, verbose_json
analyze_image
Analyze images using GPT-4 Vision.
Parameters:
image_url
(required): URL of image to analyzeprompt
: Question about the image (default: "What's in this image?")model
: Vision model (default: "gpt-4o-mini")max_tokens
: Maximum response length
moderate_content
Check if content violates OpenAI's usage policies.
Parameters:
text
(required): Content to moderatemodel
: "text-moderation-latest" or "text-moderation-stable"
Returns: Flags and scores for various content categories
list_models
Get all available OpenAI models with metadata.
Usage Examples
Chat Conversation
Generate Marketing Image
Create Product Description Embeddings
Transcribe Meeting Recording
Model Recommendations
Chat Models
gpt-4o: Best overall, multimodal, fast
gpt-4o-mini: Cost-effective, very fast
gpt-4-turbo: High intelligence, good for complex tasks
gpt-3.5-turbo: Fast and affordable for simple tasks
Embedding Models
text-embedding-3-small: Best price/performance (1536 dimensions)
text-embedding-3-large: Highest quality (3072 dimensions)
Image Models
dall-e-3: Higher quality, better prompt following
dall-e-2: More images per request, lower cost
Error Handling
The server includes comprehensive error handling for:
Invalid API keys
Rate limiting
Invalid model names
Malformed requests
Network timeouts
Rate Limits
OpenAI enforces rate limits based on your account tier:
Free tier: Limited requests per minute
Pay-as-you-go: Higher limits based on usage
Enterprise: Custom limits
Check your limits at: https://platform.openai.com/account/limits
Pricing
Approximate costs (check OpenAI pricing page for current rates):
GPT-4o: ~$2.50 per 1M input tokens
GPT-4o-mini: ~$0.15 per 1M input tokens
Text Embeddings: ~$0.02 per 1M tokens
DALL-E 3: ~$0.04 per standard image
Whisper: ~$0.006 per minute
Security Notes
Never commit your API key to version control
Use environment variables for sensitive data
Rotate API keys regularly
Monitor usage on OpenAI dashboard
Set spending limits in your OpenAI account
Troubleshooting
"Invalid API Key" Error
Verify key is correct in
.env
fileEnsure no extra spaces or quotes
Check key is active at https://platform.openai.com/api-keys
Rate Limit Errors
Implement exponential backoff
Upgrade account tier if needed
Reduce request frequency
Timeout Errors
Increase timeout in httpx client
Check network connectivity
Try with smaller requests
Resources
License
MIT License - feel free to use in your projects!
This server cannot be installed
remote-capable server
The server can be hosted and run remotely because it primarily relies on remote services or has no dependency on the local environment.
Provides comprehensive access to OpenAI's API capabilities including chat completions, image generation, embeddings, text-to-speech, speech-to-text, vision analysis, and content moderation. Enables users to interact with GPT models, DALL-E, Whisper, and other OpenAI services through natural language commands.