MusicGPT MCP Server

sing_over_instrumental

Add AI-generated vocals to instrumental tracks by providing lyrics and selecting a voice model, enabling users to create complete songs with synthesized singing.

Instructions

Add AI-generated vocals over an instrumental track

Input Schema


Name	Required	Description
`instrumental_url`	Yes	URL of the instrumental audio file
`lyrics`	Yes	Lyrics to sing
`voice_id`	Yes	Voice model ID to use for singing (use get_all_voices to find IDs)
`webhook_url`	No	URL for callback upon completion
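As a rough illustration of how these parameters travel to the server, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send. The interface and function names are hypothetical, and the URL and voice ID are placeholders, not values taken from the MusicGPT API.

```typescript
// Argument shape taken from the input schema table above.
interface SingOverInstrumentalArgs {
  instrumental_url: string;
  lyrics: string;
  voice_id: string;
  webhook_url?: string; // optional callback URL
}

// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps the arguments in a tools/call request.
function buildSingOverInstrumentalCall(
  id: number,
  args: SingOverInstrumentalArgs
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "sing_over_instrumental", arguments: { ...args } },
  };
}

const exampleRequest = buildSingOverInstrumentalCall(1, {
  instrumental_url: "https://example.com/instrumental.mp3", // placeholder URL
  lyrics: "Hello from the other side of the mix",
  voice_id: "VOICE_ID_FROM_GET_ALL_VOICES", // placeholder; look up a real ID first
});
```

Because `webhook_url` is optional, it is simply absent from the serialized request when not supplied.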

Implementation Reference

src/index.ts:1183-1203 (handler)

Handler function that executes the tool: validates the required parameters (instrumental_url, lyrics, voice_id), forwards the request to the MusicGPT API endpoint '/sing_over_instrumental', and returns the API response as text together with a reminder to poll get_conversion_by_id using the returned task_id.

private async handleSingOverInstrumental(args: any) {
  if (!args.instrumental_url || !args.lyrics || !args.voice_id) {
    throw new McpError(ErrorCode.InvalidParams, "instrumental_url, lyrics, and voice_id are required");
  }

  const response = await this.axiosInstance.post("/sing_over_instrumental", {
    instrumental_url: args.instrumental_url,
    lyrics: args.lyrics,
    voice_id: args.voice_id,
    webhook_url: args.webhook_url,
  });

  return {
    content: [
      {
        type: "text",
        text: `Singing over instrumental started!\n\n${JSON.stringify(response.data, null, 2)}\n\nUse get_conversion_by_id with the task_id to check the status.`,
      },
    ],
  };
}
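To try the handler's flow without the server or a network connection, the same logic can be sketched as a standalone function that takes the HTTP call as a parameter. This is an illustration only: `PostFn`, `singOverInstrumental`, and `fakePost` are hypothetical names, a plain `Error` stands in for `McpError`, and the stubbed response payload is invented for the demo.

```typescript
// Minimal stand-in for the axios instance: only `post` is modelled.
type PostFn = (
  path: string,
  body: Record<string, unknown>
) => Promise<{ data: unknown }>;

async function singOverInstrumental(
  post: PostFn,
  args: Record<string, unknown>
) {
  // Same required-parameter check as the handler above.
  if (!args.instrumental_url || !args.lyrics || !args.voice_id) {
    // The real handler throws McpError(ErrorCode.InvalidParams, ...).
    throw new Error("instrumental_url, lyrics, and voice_id are required");
  }

  const response = await post("/sing_over_instrumental", {
    instrumental_url: args.instrumental_url,
    lyrics: args.lyrics,
    voice_id: args.voice_id,
    webhook_url: args.webhook_url,
  });

  return {
    content: [
      {
        type: "text",
        text: `Singing over instrumental started!\n\n${JSON.stringify(response.data, null, 2)}\n\nUse get_conversion_by_id with the task_id to check the status.`,
      },
    ],
  };
}

// Stub HTTP call; the payload shape is an assumption for demonstration.
const fakePost: PostFn = async (path) => ({
  data: { task_id: "demo-task", endpoint: path },
});
```

Calling `singOverInstrumental(fakePost, {})` rejects with the validation error, while a call with all three required fields resolves to a single text content item.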

src/index.ts:493-518 (schema)

Input schema definition for the tool, specifying required parameters: instrumental_url, lyrics, voice_id, and optional webhook_url.

{
  name: "sing_over_instrumental",
  description: "Add AI-generated vocals over an instrumental track",
  inputSchema: {
    type: "object" as const,
    properties: {
      instrumental_url: {
        type: "string",
        description: "URL of the instrumental audio file",
      },
      lyrics: {
        type: "string",
        description: "Lyrics to sing",
      },
      voice_id: {
        type: "string",
        description: "Voice model ID to use for singing (use get_all_voices to find IDs)",
      },
      webhook_url: {
        type: "string",
        description: "URL for callback upon completion",
      },
    },
    required: ["instrumental_url", "lyrics", "voice_id"],
  },
},
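The handler checks the required fields by hand, but the same check could be driven by the schema's `required` array. The following is a minimal sketch of that idea, not how the server does it; `ToolInputSchema` and `validateAgainstSchema` are hypothetical names, and the type check only covers primitive JSON Schema types whose names match JavaScript's `typeof` (such as `"string"`).

```typescript
interface ToolInputSchema {
  type: "object";
  properties: Record<string, { type: string; description: string }>;
  required: string[];
}

// Copy of the inputSchema shown above.
const singOverInstrumentalSchema: ToolInputSchema = {
  type: "object",
  properties: {
    instrumental_url: { type: "string", description: "URL of the instrumental audio file" },
    lyrics: { type: "string", description: "Lyrics to sing" },
    voice_id: {
      type: "string",
      description: "Voice model ID to use for singing (use get_all_voices to find IDs)",
    },
    webhook_url: { type: "string", description: "URL for callback upon completion" },
  },
  required: ["instrumental_url", "lyrics", "voice_id"],
};

// Returns a list of human-readable problems; an empty list means the
// arguments satisfy the required-field and basic type constraints.
function validateAgainstSchema(
  schema: ToolInputSchema,
  args: Record<string, unknown>
): string[] {
  const problems: string[] = [];
  for (const key of schema.required) {
    if (args[key] === undefined || args[key] === "") {
      problems.push(`missing required field: ${key}`);
    }
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    if (!prop || value === undefined) continue; // properties not in the schema are ignored
    if (typeof value !== prop.type) {
      problems.push(`${key} must be a ${prop.type}`);
    }
  }
  return problems;
}
```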

src/index.ts:709-710 (registration)

Registration in the tool dispatch switch statement: maps the tool name to the handleSingOverInstrumental handler.
```
case "sing_over_instrumental":
  return await this.handleSingOverInstrumental(args);
```
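For context, a dispatch switch of this kind might sit inside a function like the one sketched below. Only the `sing_over_instrumental` case comes from the source; the `dispatchTool` name, the stub handler, and the `default` branch are assumptions made to keep the example self-contained.

```typescript
type ToolResult = { content: { type: "text"; text: string }[] };

// Stub standing in for the real handler, which calls the MusicGPT API.
async function handleSingOverInstrumentalStub(
  args: Record<string, unknown>
): Promise<ToolResult> {
  return {
    content: [
      { type: "text", text: `stub: would sing over ${String(args.instrumental_url)}` },
    ],
  };
}

// Routes a tool call to its handler by name.
async function dispatchTool(
  name: string,
  args: Record<string, unknown>
): Promise<ToolResult> {
  switch (name) {
    case "sing_over_instrumental":
      return await handleSingOverInstrumentalStub(args);
    default:
      // Assumed behavior: reject names that have no registered handler.
      throw new Error(`Unknown tool: ${name}`);
  }
}
```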
src/index.ts:645-650 (registration)

Registration of the tools list handler, which includes the sing_over_instrumental tool schema via the TOOLS array.
```
this.server.setRequestHandler(
  ListToolsRequestSchema,
  async () => ({
    tools: TOOLS,
  })
);
```
src/index.ts:68 (helper)

Conversion-type constant used by the get_conversion_by_id tool when polling the status of sing_over_instrumental conversions.
```
"SING_OVER_INSTRUMENTAL",
```
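A polling loop built around this constant could look roughly like the sketch below. The status lookup is injected as a function because the exact arguments and response shape of get_conversion_by_id are not shown here; the `status` field and the `"COMPLETED"` and `"FAILED"` values are assumptions, as are all the names.

```typescript
// Assumed response shape: only a `status` string is relied upon.
type ConversionStatus = { status: string; [key: string]: unknown };

// Hypothetical lookup, e.g. a wrapper around get_conversion_by_id.
type FetchStatus = (
  taskId: string,
  conversionType: string
) => Promise<ConversionStatus>;

interface PollOptions {
  intervalMs?: number;
  maxAttempts?: number;
  terminalStatuses?: string[]; // assumed values; adjust to the real API
}

async function pollConversion(
  fetchStatus: FetchStatus,
  taskId: string,
  options: PollOptions = {}
): Promise<ConversionStatus> {
  const {
    intervalMs = 5000,
    maxAttempts = 60,
    terminalStatuses = ["COMPLETED", "FAILED"],
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchStatus(taskId, "SING_OVER_INSTRUMENTAL");
    if (terminalStatuses.includes(result.status)) return result;
    if (attempt < maxAttempts) {
      // Wait before asking again.
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Task ${taskId} did not finish after ${maxAttempts} attempts`);
}
```

Passing `webhook_url` when starting the job is the alternative to polling.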

Tool Definition Quality

C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but provides minimal behavioral context. It mentions 'AI-generated vocals' and 'callback upon completion' (via webhook_url parameter), but doesn't disclose execution time, rate limits, authentication needs, output format, or whether this is a synchronous/asynchronous operation. This leaves significant gaps for a tool that likely involves processing time and resource usage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
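One way to close these gaps is sketched below as a hypothetical revision of the tool definition. The description wording is a suggestion based only on what the handler shown earlier does (it starts a job and points to get_conversion_by_id); the annotation values are assumptions about the tool's behavior and not something the server declares.

```typescript
// Hypothetical revised definition; not the server's actual text.
const improvedToolDefinition = {
  name: "sing_over_instrumental",
  description: [
    "Add AI-generated vocals over an instrumental track.",
    "Starts an asynchronous conversion and returns the MusicGPT API response, including a task_id.",
    "Poll get_conversion_by_id with that task_id to check the status,",
    "or pass webhook_url to receive a callback upon completion.",
    "Call get_all_voices first to find a valid voice_id.",
  ].join(" "),
  annotations: {
    readOnlyHint: false, // assumption: each call starts a new conversion
    openWorldHint: true, // assumption: the tool calls an external API
  },
};
```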

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that directly states the tool's function without unnecessary words. It's front-loaded with the core purpose and efficiently communicates the essential action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain what the tool returns (e.g., a URL to the generated audio, job ID, or error formats), processing characteristics, or important behavioral aspects. The context signals indicate this is a non-trivial audio processing tool that needs more complete documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds no additional parameter semantics beyond what's in the schema (e.g., it doesn't explain URL formats, voice_id selection process beyond referencing get_all_voices, or webhook behavior). Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Add AI-generated vocals over an instrumental track', which specifies the action (add vocals) and the resource (instrumental track). It distinguishes the tool from siblings like 'text_to_speech' (no instrumental) and 'create_cover_song' (broader process), but doesn't explicitly contrast it with all alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives is provided. The description implies usage for adding vocals to instrumentals, but doesn't mention when to choose this over similar tools like 'create_cover_song' or 'voice_changer', nor does it specify prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pasie15/mcp-server-musicgpt'
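The same request can be made from TypeScript. This sketch assumes a runtime with a global `fetch` (for example Node 18+ or a browser), that the endpoint returns JSON, and that other servers follow the same owner/name path pattern as the URL above; the function names are hypothetical and the response is left untyped because its shape is not documented here.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1";

// Builds the server-details URL, following the pattern of the curl example.
function mcpServerUrl(owner: string, name: string): string {
  return `${MCP_API_BASE}/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetches the server record; the result is returned as `unknown`
// because its structure is not described in this document.
async function fetchMcpServer(owner: string, name: string): Promise<unknown> {
  const response = await fetch(mcpServerUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request when run):
// fetchMcpServer("pasie15", "mcp-server-musicgpt").then(console.log);
```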

If you have feedback or need assistance with the MCP directory API, please join our Discord server.