
generate_music

Create music from text descriptions by specifying genre, mood, and instruments. Generate audio tracks with customizable duration and lyrics support.

Instructions

Generate music from text descriptions. Use list_models with category='audio' to discover available models.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Description of the music (genre, mood, instruments) | |
| model | No | Model ID (e.g., 'fal-ai/lyria2', 'fal-ai/stable-audio-25/text-to-audio'). Use list_models to see options. | fal-ai/lyria2 |
| duration_seconds | No | Duration in seconds (5 to 300) | 30 |
| negative_prompt | No | What to avoid in the audio (e.g., 'vocals, distortion, noise') | |
| lyrics_prompt | No | Lyrics for vocal music generation. Only used with models that support lyrics (e.g., MiniMax). Format: `[verse]\nLyric line 1\n[chorus]\nChorus line` | |
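Assuming the constraints above (required `prompt`, `duration_seconds` between 5 and 300), a caller could pre-check a payload before invoking the tool. The function name below is hypothetical and not part of the server; it is a minimal sketch of the schema rules:

```python
# Sketch: pre-flight validation of a generate_music argument payload.
# Mirrors the schema constraints documented above; illustrative only.
from typing import Any, Dict, List


def validate_music_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the args look valid."""
    problems: List[str] = []
    if not isinstance(args.get("prompt"), str) or not args.get("prompt"):
        problems.append("'prompt' is required and must be a non-empty string")
    duration = args.get("duration_seconds", 30)  # schema default
    if not isinstance(duration, int) or not 5 <= duration <= 300:
        problems.append("'duration_seconds' must be an integer between 5 and 300")
    for key in ("model", "negative_prompt", "lyrics_prompt"):
        if key in args and not isinstance(args[key], str):
            problems.append(f"'{key}' must be a string")
    return problems


example = {
    "prompt": "mellow lo-fi hip hop with soft piano and vinyl crackle",
    "duration_seconds": 45,
    "negative_prompt": "vocals, distortion, noise",
}
print(validate_music_args(example))  # → []
```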

Implementation Reference

  • The main handler for the generate_music tool: it resolves the model, builds the request arguments, executes via the queue strategy, and extracts the audio URL (or returns an error message).
    async def handle_generate_music(
        arguments: Dict[str, Any],
        registry: ModelRegistry,
        queue_strategy: QueueStrategy,
    ) -> List[TextContent]:
        """Handle the generate_music tool."""
        model_input = arguments.get("model", "fal-ai/lyria2")
        try:
            model_id = await registry.resolve_model_id(model_input)
        except ValueError as e:
            return [
                TextContent(
                    type="text",
                    text=f"❌ {e}. Use list_models to see available options.",
                )
            ]
    
        duration = arguments.get("duration_seconds", 30)
    
        # Build arguments for the music model
        music_args: Dict[str, Any] = {
            "prompt": arguments["prompt"],
            "duration_seconds": duration,
        }
    
        # Add optional parameters if provided
        if "negative_prompt" in arguments:
            music_args["negative_prompt"] = arguments["negative_prompt"]
        if "lyrics_prompt" in arguments:
            music_args["lyrics_prompt"] = arguments["lyrics_prompt"]
    
        # Use queue strategy with timeout protection
        logger.info("Starting music generation with %s (%ds)", model_id, duration)
        try:
            music_result = await asyncio.wait_for(
                queue_strategy.execute(model_id, music_args, timeout=120),
                timeout=125,  # Slightly longer than internal timeout
            )
        except asyncio.TimeoutError:
            return [
                TextContent(
                    type="text",
                    text=f"❌ Music generation timed out after 120 seconds. Model: {model_id}",
                )
            ]
    
        if music_result is None:
            return [
                TextContent(
                    type="text",
                    text=f"❌ Music generation failed or timed out with {model_id}",
                )
            ]
    
        # Check for error in response
        if "error" in music_result:
            error_msg = music_result.get("error", "Unknown error")
            return [
                TextContent(
                    type="text",
                    text=f"❌ Music generation failed: {error_msg}",
                )
            ]
    
        # Extract audio URL from result
        audio_dict = music_result.get("audio", {})
        if isinstance(audio_dict, dict):
            audio_url = audio_dict.get("url")
        else:
            audio_url = music_result.get("audio_url")
    
        if audio_url:
            return [
                TextContent(
                    type="text",
                    text=f"🎵 Music generated with {model_id}: {audio_url}",
                )
            ]
    
        return [
            TextContent(
                type="text",
                text="❌ Music generation completed but no audio URL was returned. Please try again.",
            )
        ]
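Note the URL-extraction fallback at the end of the handler: it prefers a nested `{"audio": {"url": ...}}` object and only consults a flat `audio_url` key when `audio` is present but not a dict. That logic can be exercised in isolation; the helper name below is a hypothetical stand-alone mirror, not part of the server:

```python
# Sketch: mirror of the handler's audio-URL extraction fallback.
from typing import Any, Dict, Optional


def extract_audio_url(result: Dict[str, Any]) -> Optional[str]:
    """Prefer result["audio"]["url"]; fall back to a flat "audio_url" key
    only when "audio" exists but is not a dict (matching the handler)."""
    audio = result.get("audio", {})
    if isinstance(audio, dict):
        return audio.get("url")
    return result.get("audio_url")


print(extract_audio_url({"audio": {"url": "https://example.com/track.mp3"}}))
print(extract_audio_url({"audio": "raw", "audio_url": "https://example.com/alt.mp3"}))
```

One consequence worth knowing: if the response has no `audio` key at all, the default `{}` is a dict, so the flat `audio_url` fallback never runs and the helper returns `None`.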
  • Defines the input schema, description, and Tool object for the generate_music tool.
    AUDIO_TOOLS: List[Tool] = [
        Tool(
            name="generate_music",
            description="Generate music from text descriptions. Use list_models with category='audio' to discover available models.",
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "Description of the music (genre, mood, instruments)",
                    },
                    "model": {
                        "type": "string",
                        "default": "fal-ai/lyria2",
                        "description": "Model ID (e.g., 'fal-ai/lyria2', 'fal-ai/stable-audio-25/text-to-audio'). Use list_models to see options.",
                    },
                    "duration_seconds": {
                        "type": "integer",
                        "default": 30,
                        "minimum": 5,
                        "maximum": 300,
                        "description": "Duration in seconds",
                    },
                    "negative_prompt": {
                        "type": "string",
                        "description": "What to avoid in the audio (e.g., 'vocals, distortion, noise')",
                    },
                    "lyrics_prompt": {
                        "type": "string",
                        "description": "Lyrics for vocal music generation. Only used with models that support lyrics (e.g., MiniMax). Format: [verse]\\nLyric line 1\\n[chorus]\\nChorus line",
                    },
                },
                "required": ["prompt"],
            },
        ),
    ]
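The schema's `default` values (`fal-ai/lyria2`, `30`) are what the handler falls back to when a caller omits those fields. A generic way to apply such defaults from an inputSchema is sketched below; the helper is hypothetical, with a trimmed copy of the schema for illustration:

```python
# Sketch: fill in "default" values from an inputSchema, leaving
# caller-supplied values untouched. Hypothetical helper; the real
# server applies defaults inside the handler instead.
from typing import Any, Dict

INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "prompt": {"type": "string"},
        "model": {"type": "string", "default": "fal-ai/lyria2"},
        "duration_seconds": {"type": "integer", "default": 30},
    },
    "required": ["prompt"],
}


def apply_defaults(schema: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(args)
    for name, prop in schema.get("properties", {}).items():
        if name not in merged and "default" in prop:
            merged[name] = prop["default"]
    return merged


print(apply_defaults(INPUT_SCHEMA, {"prompt": "ambient drone"}))
# → {'prompt': 'ambient drone', 'model': 'fal-ai/lyria2', 'duration_seconds': 30}
```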
  • Registers the generate_music schema by including AUDIO_TOOLS in the complete list of available tools (ALL_TOOLS).
    ALL_TOOLS = (
        UTILITY_TOOLS + IMAGE_TOOLS + IMAGE_EDITING_TOOLS + VIDEO_TOOLS + AUDIO_TOOLS
    )
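Because ALL_TOOLS is a flat concatenation of several lists, a quick sanity check that no two registered tools share a name can catch registration mistakes early. A sketch with stand-in tool objects (the `Tool` dataclass below is a simplified stand-in for the real MCP `Tool` type):

```python
# Sketch: verify tool names are unique across concatenated tool lists.
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    name: str


UTILITY_TOOLS = [Tool("list_models"), Tool("get_pricing")]
AUDIO_TOOLS = [Tool("generate_music")]
ALL_TOOLS: List[Tool] = UTILITY_TOOLS + AUDIO_TOOLS

names = [t.name for t in ALL_TOOLS]
assert len(names) == len(set(names)), "duplicate tool name registered"
print(names)  # → ['list_models', 'get_pricing', 'generate_music']
```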
  • Registers the handler mapping for generate_music in the TOOL_HANDLERS dictionary used by the MCP server to route tool calls.
    TOOL_HANDLERS = {
        # Utility tools (no queue needed)
        "list_models": handle_list_models,
        "recommend_model": handle_recommend_model,
        "get_pricing": handle_get_pricing,
        "get_usage": handle_get_usage,
        "upload_file": handle_upload_file,
        # Image generation tools
        "generate_image": handle_generate_image,
        "generate_image_structured": handle_generate_image_structured,
        "generate_image_from_image": handle_generate_image_from_image,
        # Image editing tools
        "remove_background": handle_remove_background,
        "upscale_image": handle_upscale_image,
        "edit_image": handle_edit_image,
        "inpaint_image": handle_inpaint_image,
        "resize_image": handle_resize_image,
        "compose_images": handle_compose_images,
        # Video tools
        "generate_video": handle_generate_video,
        "generate_video_from_image": handle_generate_video_from_image,
        "generate_video_from_video": handle_generate_video_from_video,
        # Audio tools
        "generate_music": handle_generate_music,
    }
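Routing through this dictionary is a straightforward name lookup. The sketch below shows one plausible dispatch shape, with a simplified stub handler and signature (the real handlers also receive the registry and queue strategy):

```python
# Sketch: routing a tool call through a TOOL_HANDLERS-style mapping.
# Stub handler and dispatch signature are simplified stand-ins.
import asyncio
from typing import Any, Dict, List


async def handle_generate_music_stub(arguments: Dict[str, Any]) -> List[str]:
    return [f"would generate music for: {arguments['prompt']}"]


TOOL_HANDLERS = {"generate_music": handle_generate_music_stub}


async def dispatch(name: str, arguments: Dict[str, Any]) -> List[str]:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return [f"Unknown tool: {name}"]
    return await handler(arguments)


print(asyncio.run(dispatch("generate_music", {"prompt": "jazzy trip-hop"})))
# → ['would generate music for: jazzy trip-hop']
```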
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the need to discover models via list_models, which adds useful context about model dependencies. However, it doesn't describe output format (audio file type, size), latency, rate limits, authentication requirements, or error conditions, leaving significant behavioral gaps for a generative tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with just two sentences. The first sentence states the core purpose, and the second provides essential usage guidance about model discovery. Every word earns its place with no redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a generative AI tool with 5 parameters and no output schema, the description is incomplete. While it covers purpose and model discovery guidance well, it lacks information about output format, audio quality, generation limits, error handling, and cost implications. With no annotations and no output schema, more behavioral context would be needed for optimal agent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description doesn't add parameter-specific information beyond what's in the schema, but it does provide the high-level context that parameters are for 'text descriptions' of music, which aligns with the schema. With excellent schema coverage, the baseline is 3, but the description's guidance about model discovery adds some value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('generate music') and resource ('from text descriptions'), distinguishing it from sibling tools like generate_image, generate_video, etc. It explicitly identifies the domain (audio generation) and the input type (text descriptions).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: for generating music from text descriptions. It also specifies an alternative action ('Use list_models with category='audio' to discover available models'), giving clear direction for model selection, which is a key prerequisite for effective tool usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/raveenb/fal-mcp-server'
