MusicGPT MCP Server

generate_music

Create custom AI-generated music from text prompts, producing songs with or without lyrics, instrumental tracks, or vocal-only versions based on specified styles.

Instructions

Generate custom music from a text prompt using AI. Can create songs with or without lyrics, instrumental tracks, or vocal-only versions.

Input Schema

Name	Required	Description
`prompt`	Yes	Natural language prompt for music generation (keep under 280 characters for best results)
`music_style`	No	Style of music to generate (e.g., Rock, Pop, Jazz, Hip-Hop)
`lyrics`	No	Custom lyrics for the generated music
`make_instrumental`	No	Whether to make the music instrumental (no vocals)
`vocal_only`	No	Whether to generate only vocals of output audio
`voice_id`	No	Voice model ID to use for vocals (use get_all_voices to find IDs)
`webhook_url`	No	URL for callback upon completion
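The table above can be read as a typed arguments object. The following is a minimal sketch: the `GenerateMusicArgs` interface and the helper function are illustrative names that do not appear in the server source, and the 280-character figure is the schema's recommendation, not a limit the server is shown to enforce.

```typescript
// Hypothetical type mirroring the input schema table; the name is illustrative.
interface GenerateMusicArgs {
  prompt: string; // the only required field
  music_style?: string;
  lyrics?: string;
  make_instrumental?: boolean;
  vocal_only?: boolean;
  voice_id?: string;
  webhook_url?: string;
}

// Example arguments for an instrumental track.
const exampleArgs: GenerateMusicArgs = {
  prompt: "Upbeat summer road-trip anthem with bright guitars",
  music_style: "Pop",
  make_instrumental: true,
};

// The schema recommends prompts under 280 characters "for best results".
function promptWithinRecommendedLength(args: GenerateMusicArgs): boolean {
  return args.prompt.length < 280;
}
```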

Implementation Reference

src/index.ts:829-852 (handler)

The core handler function that implements the generate_music tool logic by making an API call to the MusicGPT /MusicAI endpoint with the provided arguments and returning a status response.

```
private async handleGenerateMusic(args: any) {
  if (!args.prompt) {
    throw new McpError(ErrorCode.InvalidParams, "prompt is required");
  }

  const response = await this.axiosInstance.post("/MusicAI", {
    prompt: args.prompt,
    music_style: args.music_style,
    lyrics: args.lyrics,
    make_instrumental: args.make_instrumental || false,
    vocal_only: args.vocal_only || false,
    voice_id: args.voice_id,
    webhook_url: args.webhook_url,
  });

  return {
    content: [
      {
        type: "text",
        text: `Music generation started!\n\n${JSON.stringify(response.data, null, 2)}\n\nUse get_conversion_by_id with the task_id or conversion_id to check the status.`,
      },
    ],
  };
}
```
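To make the handler's defaulting behavior easier to see, here is a small standalone sketch of the payload construction. `buildMusicPayload` and `MusicArgs` are hypothetical names, and a plain `Error` stands in for `McpError`. The sketch assumes the request body is serialized as JSON, in which case optional fields left `undefined` are omitted while the two boolean flags are always sent.

```typescript
// Hypothetical helper; not part of src/index.ts.
interface MusicArgs {
  prompt?: string;
  music_style?: string;
  lyrics?: string;
  make_instrumental?: boolean;
  vocal_only?: boolean;
  voice_id?: string;
  webhook_url?: string;
}

function buildMusicPayload(args: MusicArgs) {
  if (!args.prompt) {
    // The real handler throws McpError(ErrorCode.InvalidParams, ...) here.
    throw new Error("prompt is required");
  }
  return {
    prompt: args.prompt,
    music_style: args.music_style,
    lyrics: args.lyrics,
    make_instrumental: args.make_instrumental || false, // missing flag becomes false
    vocal_only: args.vocal_only || false, // missing flag becomes false
    voice_id: args.voice_id,
    webhook_url: args.webhook_url,
  };
}

// JSON.stringify drops properties whose value is undefined.
const body = JSON.stringify(buildMusicPayload({ prompt: "Lo-fi beat for studying" }));
console.log(body);
// prints {"prompt":"Lo-fi beat for studying","make_instrumental":false,"vocal_only":false}
```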

src/index.ts:122-160 (schema)

The input schema and description for the generate_music tool, defining parameters like prompt, music_style, lyrics, etc., used for validation in MCP.

```
  name: "generate_music",
  description: "Generate custom music from a text prompt using AI. Can create songs with or without lyrics, instrumental tracks, or vocal-only versions.",
  inputSchema: {
    type: "object" as const,
    properties: {
      prompt: {
        type: "string",
        description: "Natural language prompt for music generation (keep under 280 characters for best results)",
      },
      music_style: {
        type: "string",
        description: "Style of music to generate (e.g., Rock, Pop, Jazz, Hip-Hop)",
      },
      lyrics: {
        type: "string",
        description: "Custom lyrics for the generated music",
      },
      make_instrumental: {
        type: "boolean",
        description: "Whether to make the music instrumental (no vocals)",
        default: false,
      },
      vocal_only: {
        type: "boolean",
        description: "Whether to generate only vocals of output audio",
        default: false,
      },
      voice_id: {
        type: "string",
        description: "Voice model ID to use for vocals (use get_all_voices to find IDs)",
      },
      webhook_url: {
        type: "string",
        description: "URL for callback upon completion",
      },
    },
    required: ["prompt"],
  },
},
```
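As an illustration of what checking arguments against this schema could look like, the sketch below hand-rolls a validator for the `required` list and the declared property types. The names `generateMusicProperties`, `requiredProperties`, and `validateArgs` are hypothetical; the handler shown earlier only checks for `prompt`, and this page does not show whether any further validation is applied.

```typescript
type PropertyType = "string" | "boolean";

// Trimmed copy of the schema's type information (descriptions omitted).
const generateMusicProperties: Record<string, PropertyType> = {
  prompt: "string",
  music_style: "string",
  lyrics: "string",
  make_instrumental: "boolean",
  vocal_only: "boolean",
  voice_id: "string",
  webhook_url: "string",
};
const requiredProperties = ["prompt"];

// Returns a list of human-readable problems; an empty list means the args pass.
function validateArgs(args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const name of requiredProperties) {
    if (args[name] === undefined) errors.push(`${name} is required`);
  }
  for (const [name, value] of Object.entries(args)) {
    const expected = generateMusicProperties[name];
    if (expected === undefined) continue; // the schema does not forbid extra keys
    if (value !== undefined && typeof value !== expected) {
      errors.push(`${name} must be a ${expected}`);
    }
  }
  return errors;
}
```

For example, `validateArgs({ prompt: "x", vocal_only: "yes" })` reports that `vocal_only` must be a boolean, while `validateArgs({})` reports only the missing `prompt`.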

src/index.ts:669-670 (registration)

The switch case in the main tool execution handler that registers and dispatches 'generate_music' calls to the specific handleGenerateMusic function.

```
case "generate_music":
  return await this.handleGenerateMusic(args);
```
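
For context, the following sketch shows the general shape of a switch-based dispatcher around that case. Everything except the `"generate_music"` case label is an assumption: `dispatchTool`, `ToolResult`, and the stub handler are illustrative names, and the real server's handling of unknown tool names is not shown on this page.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

// Stand-in for handleGenerateMusic so the sketch runs without network access.
async function handleGenerateMusicStub(args: Record<string, unknown>): Promise<ToolResult> {
  return {
    content: [{ type: "text", text: `generate_music called with prompt: ${String(args.prompt)}` }],
  };
}

// Routes a tool name to its handler; unknown names are rejected.
async function dispatchTool(name: string, args: Record<string, unknown>): Promise<ToolResult> {
  switch (name) {
    case "generate_music":
      return await handleGenerateMusicStub(args);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```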

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions AI-based generation and output types (songs, instrumental tracks, vocal-only versions), but fails to disclose critical behavioral traits such as processing time, rate limits, authentication needs, file formats, or error handling. This leaves significant gaps for an agent to understand operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
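
One way a server could close part of this gap is to declare MCP tool annotations alongside the description. The sketch below defines a local copy of the optional hint fields from the MCP specification and fills them in for `generate_music`. The values are assumptions made for illustration; the server declares none of them, and the actual behavior of the MusicGPT API may differ.

```typescript
// Local copy of the optional hint fields defined for MCP tool annotations.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Assumed values for generate_music; none of these are declared by the server.
const generateMusicAnnotations: ToolAnnotations = {
  title: "Generate Music",
  readOnlyHint: false, // the call starts a generation task on a remote service
  destructiveHint: false, // assumption: it creates new output and overwrites nothing
  idempotentHint: false, // assumption: repeating the call starts a separate task
  openWorldHint: true, // the handler calls the external MusicGPT API
};
```

Annotations are hints only, so details such as processing time, rate limits, and output format would still need to be stated in the description text.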

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized with two sentences that are front-loaded with the core purpose. Each sentence adds value: the first defines the tool, and the second elaborates on output variations. There is no redundant information, making it efficient, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a 7-parameter AI music generation tool with no annotations and no output schema, the description is incomplete. It lacks details on output format (e.g., audio file type, duration), error conditions, latency, or usage limits. This makes it inadequate for an agent to fully understand how to invoke and handle the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description adds minimal value beyond the schema by hinting at capabilities like creating songs with/without lyrics, which loosely relates to parameters like 'lyrics', 'make_instrumental', and 'vocal_only', but does not provide additional syntax, format details, or constraints beyond what the schema specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
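
To illustrate the kind of parameter relationship a description could spell out, the sketch below flags argument combinations that look contradictory on their face. This is purely an assumption: neither the schema nor the handler says how the MusicGPT API treats these combinations, and `findLikelyConflicts` is a hypothetical helper, not part of the server.

```typescript
interface ModeArgs {
  lyrics?: string;
  make_instrumental?: boolean;
  vocal_only?: boolean;
  voice_id?: string;
}

// Returns warnings for combinations that appear to conflict (assumed, not documented).
function findLikelyConflicts(args: ModeArgs): string[] {
  const warnings: string[] = [];
  if (args.make_instrumental && args.vocal_only) {
    warnings.push("make_instrumental and vocal_only are both set");
  }
  if (args.make_instrumental && args.lyrics) {
    warnings.push("lyrics supplied for an instrumental track");
  }
  if (args.make_instrumental && args.voice_id) {
    warnings.push("voice_id supplied for an instrumental track");
  }
  return warnings;
}
```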

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate custom music from a text prompt using AI.' It specifies the verb ('generate'), resource ('custom music'), and mechanism ('from a text prompt using AI'), but does not explicitly differentiate it from sibling tools like 'generate_lyrics' or 'text_to_speech', which also involve AI generation from text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by mentioning capabilities ('create songs with or without lyrics, instrumental tracks, or vocal-only versions'), which suggests when to use it for music generation. However, it lacks explicit guidance on when to choose this tool over alternatives like 'generate_lyrics' for lyrics-only tasks or 'text_to_speech' for speech, and does not specify prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

```
curl -X GET 'https://glama.ai/api/mcp/v1/servers/pasie15/mcp-server-musicgpt'
```
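
The same request can be sketched in TypeScript using the global `fetch` available in recent Node.js versions and browsers. The `servers/{owner}/{repo}` URL pattern is inferred from the single curl example above, the function names are illustrative, and the response is typed as `unknown` because its shape is not documented on this page.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the server URL; the owner/repo pattern is inferred from the curl example.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetches the directory entry and returns the parsed JSON body as-is.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchServerInfo("pasie15", "mcp-server-musicgpt")` issues the same GET request as the curl command.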

If you have feedback or need assistance with the MCP directory API, please join our Discord server.