Multimodal MCP Server

Environment variables used to configure the server. Only `OPENAI_API_KEY` is required; all others are optional.

Name	Required	Description
`LOG_LEVEL`	No	Log level (default INFO).
`ALLOW_MKDIR`	No	Allow mkdir (default false).
`MCP_TEMP_DIR`	No	Temporary directory (default system temp dir).
`OPENAI_ORG_ID`	No	OpenAI organization ID.
`OPENAI_API_KEY`	Yes	Your OpenAI API key.
`OPENAI_PROJECT`	No	OpenAI project ID.
`MAX_INPUT_BYTES`	No	Maximum input bytes (default 25MB).
`OPENAI_BASE_URL`	No	Base URL for OpenAI API.
`MAX_OUTPUT_BYTES`	No	Maximum output bytes (default 25MB).
`OPENAI_MODEL_STT`	No	Model for speech-to-text.
`OPENAI_MODEL_TTS`	No	Model for text-to-speech.
`ENABLE_REMOTE_URLS`	No	Enable remote URLs (default false).
`OPENAI_MODEL_IMAGE`	No	Model for image generation.
`ALLOW_INSECURE_HTTP`	No	Allow insecure HTTP (default false).
`OPENAI_MODEL_VISION`	No	Model for vision tasks.
`OPENAI_MODEL_IMAGE_EDIT`	No	Model for image editing.
`ENABLE_PRESIGNED_UPLOADS`	No	Enable presigned uploads (default false).
`OPENAI_MODEL_AUDIO_ANALYZE`	No	Model for audio analysis.
`OPENAI_MODEL_AUDIO_TRANSFORM`	No	Model for audio transformation.
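
As a rough illustration of how the table above could be read at startup, here is a minimal pre-flight sketch. It is not the server's own loader: the function names `load_config` and `_flag`, the accepted boolean spellings, the reading of "25MB" as 25 × 1024 × 1024 bytes, and the assumption that the byte limits are given as plain integers are all hypothetical.

```python
import os
import tempfile

# Assumption: "25MB" in the table means 25 * 1024 * 1024 bytes.
DEFAULT_MAX_BYTES = 25 * 1024 * 1024

BOOLEAN_FLAGS = (
    "ALLOW_MKDIR",
    "ENABLE_REMOTE_URLS",
    "ALLOW_INSECURE_HTTP",
    "ENABLE_PRESIGNED_UPLOADS",
)


def _flag(env, name, default=False):
    """Parse a boolean variable; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=None):
    """Collect the documented variables into a dict, applying the listed defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only variable marked as required in the table.
        raise RuntimeError("OPENAI_API_KEY is required")
    config = {
        "openai_api_key": api_key,
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "temp_dir": env.get("MCP_TEMP_DIR", tempfile.gettempdir()),
        "max_input_bytes": int(env.get("MAX_INPUT_BYTES", DEFAULT_MAX_BYTES)),
        "max_output_bytes": int(env.get("MAX_OUTPUT_BYTES", DEFAULT_MAX_BYTES)),
    }
    for name in BOOLEAN_FLAGS:
        config[name.lower()] = _flag(env, name)  # all default to false
    return config


if __name__ == "__main__":
    print(load_config({"OPENAI_API_KEY": "sk-example"}))
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching the real environment.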

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`experimental`	{ "tasks": { "list": {}, "cancel": {}, "requests": { "tools": { "call": {} }, "prompts": { "get": {} }, "resources": { "read": {} } } } }
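
In the MCP specification, `listChanged: true` under `tools` means the server may send `notifications/tools/list_changed`, so a client should be prepared to refresh its tool list. The sketch below shows one way a client might query the capability object listed above; the helper name `supports` and its treatment of an empty object as "present" are assumptions for illustration.

```python
import json

# Capability object copied from the table above.
CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": true},
  "prompts": {"listChanged": false},
  "resources": {"subscribe": false, "listChanged": false},
  "experimental": {
    "tasks": {
      "list": {},
      "cancel": {},
      "requests": {
        "tools": {"call": {}},
        "prompts": {"get": {}},
        "resources": {"read": {}}
      }
    }
  }
}
""")


def supports(caps, *path):
    """Return True if the nested key exists and is not explicitly false."""
    node = caps
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False


if __name__ == "__main__":
    print(supports(CAPABILITIES, "tools", "listChanged"))
    print(supports(CAPABILITIES, "resources", "subscribe"))
    print(supports(CAPABILITIES, "experimental", "tasks", "cancel"))
```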

Functions exposed to the LLM to take actions

Name	Description
`image_generate`	Generate an image from a prompt and write it to the output reference.
`image_analyze`	Analyze an image and return text or schema-validated JSON.
`image_edit`	Edit or inpaint an image and write the result to the output reference.
`image_extract`	Extract structured data from an image with schema validation.
`image_to_spec`	Convert an image into a structured textual spec.
`audio_transcribe`	Transcribe audio to text and optionally write the transcript to a file.
`audio_analyze`	Analyze audio content and return text or schema-validated JSON.
`audio_transform`	Transform speech audio based on an instruction and write output audio.
`audio_tts`	Generate speech audio from text and write it to the output reference.
`multimodal_chain`	Execute a deterministic sequence of multimodal steps.
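
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the image generation tool. The tool name `image_generate` follows the table above, but the argument names (`prompt`, `output_ref`) are hypothetical: this page does not list the input schemas, so confirm exact names and parameters with a `tools/list` request first.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tools_call_request(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tools_call_request(
        "image_generate",
        {
            # Hypothetical argument names; check the tool's input schema.
            "prompt": "A lighthouse at dusk, watercolor style",
            "output_ref": "file:///tmp/lighthouse.png",
        },
    )
    print(json.dumps(request, indent=2))
```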

Interactive templates invoked by user choice

Name	Description
No prompts

Contextual data attached and managed by the client

Name	Description
No resources

Metadata for this server is also available through the Glama MCP directory API:

curl -X GET 'https://glama.ai/api/mcp/v1/servers/soyrochus/m3cp'
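
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `server_url` and `fetch_server` are hypothetical, and the shape of the JSON response is not documented on this page, so the result is returned as parsed JSON without further interpretation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory API URL for a server, escaping each path segment."""
    return f"{API_ROOT}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner, name, timeout=10):
    """GET the server record and return the parsed JSON body."""
    with urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    record = fetch_server("soyrochus", "m3cp")
    print(json.dumps(record, indent=2))
```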