Boost.Audio
Server Details
AI audio tools for music producers — stem splitting, vocal removal, BPM & key detection, audio-to-MIDI, format conversion, trimming, video-to-audio extraction and AI song generation.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.1/5 across 8 of 8 tools scored.
Each tool targets a unique audio processing task (e.g., BPM/key detection, format conversion, stem splitting) with no functional overlap, ensuring agents can easily distinguish between them.
All tool names follow a strict `boost_audio_verb_noun` pattern in snake_case, providing a predictable and clean naming convention across the entire set.
With exactly 8 tools, the server is well-scoped for an audio processing domain—comprehensive enough to cover key tasks without being overwhelming or sparse.
The tools cover essential audio operations (conversion, extraction, generation, splitting, trimming, etc.), but a merging or concatenation tool is missing, leaving a minor gap.
Available Tools
9 tools

boost_audio_bpm_key_finder: Boost Audio - BPM and Key Finder
Detect the BPM (tempo) and musical key of an audio file. Use this when the user asks for the tempo, key, scale or harmonic information of a song.
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. | |
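The parameter table implies a shared file-input contract: the audio should arrive through exactly one of `audio_url`, `audio_base64`, or `file_token`, and a base64 payload must be accompanied by `file_name` and `file_type`. A minimal sketch of a client-side validator for that contract — `build_audio_payload` is a hypothetical helper, not part of the server's API:

```python
def build_audio_payload(audio_url=None, audio_base64=None, file_token=None,
                        file_name=None, file_type=None):
    """Assemble tool-call arguments for a Boost Audio file-input tool.

    Enforces the contract stated in the parameter descriptions:
    exactly one audio source, and audio_base64 requires file_name/file_type.
    """
    sources = [s for s in (audio_url, audio_base64, file_token) if s]
    if len(sources) != 1:
        raise ValueError("provide exactly one of audio_url, audio_base64, file_token")
    if audio_base64 and not (file_name and file_type):
        raise ValueError("audio_base64 requires file_name and file_type")
    payload = {}
    if audio_url:
        payload["audio_url"] = audio_url
    if file_token:
        payload["file_token"] = file_token
    if audio_base64:
        payload.update(audio_base64=audio_base64,
                       file_name=file_name, file_type=file_type)
    return payload
```

The same guard applies to every tool below that shares this parameter set.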
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=false and destructiveHint=false, so the description need not repeat that. The description adds no additional behavioral context beyond basic functionality (e.g., no mention of side effects or permissions). It is adequate but not enhanced beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two front-loaded sentences: first defines purpose, second provides usage guidance. No redundant or extra text. Every sentence is earned.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the description covers when to use the tool, it does not describe the output format (e.g., how BPM and key are returned) or any limitations (e.g., supported audio formats). Given no output schema, more context on return values would be helpful. Adequate but incomplete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema covers all 5 parameters with descriptions (100% coverage). The description does not add new semantic meaning beyond what the schema provides. Baseline 3 is appropriate when schema carries the full burden.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool detects BPM and musical key, with specific verbs 'detect' and resource 'BPM and musical key'. It also provides usage context ('when user asks for tempo, key, scale or harmonic information'), differentiating it from sibling tools like boost_audio_converter or boost_audio_vocal_remover.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells when to use this tool: when the user asks for tempo, key, scale, or harmonic information. However, it does not mention when not to use it or provide direct comparisons to alternatives, though siblings are listed separately.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_converter: Boost Audio - Audio Converter
Convert audio files between MP3, WAV, FLAC, OGG and M4A. Use this when the user wants to change the format of an audio file (e.g. WAV to MP3, MP3 to FLAC).
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. | |
| target_format | No | Output format. Defaults to mp3. | mp3 |
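Since `target_format` defaults to `mp3` and the description limits output to five formats, a caller can normalize user input before invoking the tool. A small sketch, assuming the five formats listed in the description are the complete set (`normalize_target_format` is a hypothetical helper):

```python
# Formats listed in the tool description; assumed to be exhaustive.
SUPPORTED_FORMATS = {"mp3", "wav", "flac", "ogg", "m4a"}

def normalize_target_format(target_format=None):
    """Apply the documented default (mp3) and reject unsupported formats."""
    fmt = (target_format or "mp3").lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return fmt
```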
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=false, openWorldHint=true, destructiveHint=false. The description says 'Convert audio files,' which implies transformation. It does not disclose whether the original file is preserved, what happens on failure, or the exact behavior of the output. With openWorldHint=true, it likely creates a new file, but this is not stated. The description is consistent with annotations but adds minimal behavioral context beyond what annotations already imply.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with the action and supported formats. It is concise, with no unnecessary words. Every sentence serves a purpose: stating the function and specifying when to use it.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool is straightforward with no output schema and full parameter coverage. The description covers the core functionality and use case. It does not mention the return value or success/error handling, but for a simple converter this is acceptable. A more complete description might include that the tool returns the converted file or a download link, but the current version is sufficient for an agent to understand its purpose.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, meaning all 6 parameters have descriptions in the input schema. The description does not add any additional meaning beyond the schema. It lists the formats but does not elaborate on how parameters like audio_url, file_token, or audio_base64 interact. The baseline is 3 when schema coverage is high, and the description does not compensate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool converts audio files and lists the supported formats (MP3, WAV, FLAC, OGG, M4A). It also provides a specific use case: 'when the user wants to change the format of an audio file (e.g. WAV to MP3, MP3 to FLAC).' This distinguishes it from sibling tools like boost_audio_vocal_remover or boost_audio_stem_splitter.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly tells when to use the tool: 'Use this when the user wants to change the format of an audio file.' It does not explicitly state when not to use it, but the context and sibling tool names (e.g., boost_audio_extract_audio_from_video, boost_audio_vocal_remover) make the boundaries clear. A small improvement would be mentioning that for extracting audio from video, another tool should be used.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_extract_audio_from_video: Boost Audio - Extract Audio from Video
Extract the audio track from MP4, MOV, WebM or M4V video and download as MP3 or WAV. Use this when the user wants to get audio out of a video file.
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. | |
| target_format | No | Output audio format. Defaults to mp3. | mp3 |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses supported input video formats (MP4, MOV, WebM, M4V) and output audio formats (MP3, WAV), adding value beyond annotations. However, it does not discuss side effects, auth needs, or rate limits. Annotations already indicate non-read-only and non-destructive nature, so the description complements them adequately but not richly.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two concise sentences with no wasted words. The first sentence states the function and supported formats, and the second provides usage context. It is front-loaded and efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 6 optional parameters and no output schema, the description is reasonably complete. It covers the core function and output formats. The input parameters are well-documented in the schema, so the description does not need to repeat them. Minor omission: could mention that the user can provide audio via URL, base64, or file token, but schema handles that.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the baseline is 3. The description does not add any parameter-specific meaning beyond what the input schema provides; it merely restates the output formats which are already in the enum for target_format.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's function: extract audio track from video files (MP4, MOV, WebM, M4V) and download as MP3 or WAV. It uses a specific verb 'Extract' and identifies the resource, distinguishing it from sibling tools that handle BPM analysis, conversion, splitting, etc.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a clear usage guideline: 'Use this when the user wants to get audio out of a video file.' This directly tells when to invoke this tool. While it doesn't explicitly state when not to use it, the sibling tools implicitly cover alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_song_generator: Boost Audio - AI Song Generator
Generate a full audio track from a text prompt using Boost Audio AI. Returns a mix and optionally separated stems openable in the Boost Audio DAW. Use this when the user wants AI-generated music, an instrumental, a song with lyrics or a quick demo from an idea.
| Name | Required | Description | Default |
|---|---|---|---|
| bpm | No | Optional tempo in beats per minute (60-180). | |
| style | No | Legacy alias — folded into `inspiration` if provided (e.g. 'lofi hiphop'). Prefer `inspiration` and `styles` instead. | |
| _jobId | No | Internal: job id returned by 'start'. Required when `_action='poll'`. | |
| prompt | No | Idea for the song. When `lyrics_mode='auto'` the AI uses this as the subject; when `lyrics_mode='write'` the contents become the actual lyrics. Equivalent to the 'Pomysł lub tekst / Idea or lyrics' field on boost.audio. | |
| styles | No | Up to 6 style preset ids from `/tools/song-styles` (e.g. ['lofi-hip-hop','synthpop','ambient']). Use `_action='styles'` to fetch the catalogue. | |
| _action | No | Internal action selector used by the widget. 'start' kicks off generation, 'poll' returns the current status, 'lyrics' asks Boost Audio AI to draft lyrics, 'styles' returns the style preset catalogue. | |
| _inline | No | Internal flag set by the widget when it wants the MCP server to drive the generation pipeline. | |
| duration | No | Track length in seconds (30-180). Defaults to 120. | |
| key_scale | No | Optional musical key, e.g. 'A minor' or 'C major'. | |
| _pollSecret | No | Internal: poll secret returned by 'start'. Required when `_action='poll'`. | |
| inspiration | No | Style, inspiration and arrangement notes (e.g. 'modern pop with synths, intimate vocal, wide chorus'). Maps to the 'Styl, inspiracje i instrukcje / Style, inspiration and instructions' field on boost.audio. | |
| lyrics_mode | No | 'auto' = AI writes lyrics from your idea (default). 'write' = treat `prompt` as ready-to-sing lyrics. | |
| instrumental | No | If true, no vocals are generated. Defaults to false. | |
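The `_action`, `_jobId`, and `_pollSecret` parameters imply a two-step start/poll flow: `start` returns a job id and poll secret, and subsequent `poll` calls carry both until generation finishes. A sketch of that loop under those assumptions — `call` stands in for the real MCP tool invocation and the response keys (`jobId`, `pollSecret`, `done`) are illustrative, not confirmed by the schema:

```python
def generate_song(call, prompt, max_polls=10):
    """Drive the assumed start/poll cycle of boost_audio_song_generator.

    `call` is any callable that sends the argument dict to the tool and
    returns its JSON-like result.
    """
    started = call({"_action": "start", "prompt": prompt})
    for _ in range(max_polls):
        status = call({"_action": "poll",
                       "_jobId": started["jobId"],
                       "_pollSecret": started["pollSecret"]})
        if status.get("done"):
            return status
    raise TimeoutError("generation did not finish within max_polls")
```

In a real client the loop would sleep between polls; that is omitted here to keep the control flow visible.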
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate mutation (readOnlyHint=false) and external side effects (openWorldHint=true). The description adds that the output is a mix and optionally separated stems openable in a DAW, providing useful post-generation behavior. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences: first states core function, second adds output details, third gives usage context. No fluff, front-loaded, every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description explains what the tool returns and when to use it. The schema covers the 13 parameters in detail. Missing output schema is compensated by the description's output mention. Could add note about async/polling, but schema already captures that via _action.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, so each parameter is already documented in the schema with descriptions. The tool description adds no additional parameter-level detail beyond stating the overall purpose. Baseline 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states 'Generate a full audio track from a text prompt', using a specific verb (generate) and resource (full audio track). It distinguishes from sibling tools (all audio processing), leaving no ambiguity about this being the generation tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly tells when to use: 'when the user wants AI-generated music, an instrumental, a song with lyrics or a quick demo from an idea.' It does not contrast with alternatives, but the sibling list implicitly shows this is the only generation tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_stem_splitter: Boost Audio - Stem Splitter
Split a song into separate stems with Boost Audio AI: vocals, drums, bass and other (4-stem mode). Ultimate plan unlocks a 6-stem mode with separate guitar and piano. Use this when the user wants individual instrument stems.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | Stem separation mode. 6-stem requires Ultimate plan. | 4-stem |
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. | |
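The `mode` parameter couples a value to a plan entitlement: 6-stem mode is only valid on the Ultimate plan. A caller can check that before issuing the call rather than surfacing a server-side rejection. A minimal sketch (`choose_stem_mode` is a hypothetical helper; the plan check is client-side convenience, not the server's enforcement):

```python
def choose_stem_mode(requested=None, has_ultimate_plan=False):
    """Resolve the stem-splitter mode, applying the documented default
    (4-stem) and the Ultimate-plan gate on 6-stem mode."""
    if requested == "6-stem" and not has_ultimate_plan:
        raise PermissionError("6-stem mode requires the Ultimate plan")
    return requested or "4-stem"
```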
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses that 6-stem mode requires an Ultimate plan and explains the optional audio_url and widget upload behavior. Annotations already indicate non-destructive and open-world behavior. The description adds context beyond annotations but could elaborate on process duration or output format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences long, front-loaded with the primary function, and every sentence adds value. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description covers purpose, usage, and key constraints (plan requirement, upload options). However, it lacks details on the output format (e.g., returned as separate audio files or a zip), which would be helpful given the lack of an output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema coverage, the description adds value by linking the mode parameter to the plan requirement. It also reiterates the audio_url behavior, providing practical context. A slight improvement would be to clarify the file_token and audio_base64 usage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool splits a song into separate stems and lists the specific stems (vocals, drums, bass, other, and optionally guitar and piano). It clearly distinguishes from sibling tools like vocal remover and BPM key finder.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description directly states 'Use this when the user wants individual instrument stems,' providing a clear usage guideline. It does not explicitly mention when not to use it or name alternatives, but the context of sibling tools implies differentiation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_studio_builder: Boost Audio - Studio Builder
Open Soundra-style studio wizard inside this conversation. Asks the user 4 short questions (purpose, genre, budget, room size) and builds 3 studio proposals (budget / optimal / premium) with hardware per slot (microphone, audio interface, monitors, headphones, MIDI controller, DAW) and direct purchase links to supersound.pl with UTM tracking. Use when the user asks to recommend studio gear, plan a home studio for X PLN/EUR, swap a device, or compare price tiers. IMPORTANT: ALWAYS pass locale (pl/en/de) inferred from the user's chat language so prices are localized (PL=PLN, EN/DE=EUR) and product names + supersound.pl URLs are returned in the right language.
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | Setup name for `_action='save'`. Defaults to 'Setup z MCP — {timestamp}'. | |
| slot | No | Optional slot id for `_action='search'` (e.g. 'microphone', 'interface'). Goes into `utm_content` for finer attribution. | |
| items | No | Setup payload for `_action='save'`. Free-form object — typically `{ slots: {...}, accessories: [...], total_price: number }`. | |
| limit | No | Optional result limit for `_action='search'` (1-20, default 6). | |
| budget | No | Total budget in PLN (or EUR for en/de locale). Required for `_action='build'`. Distributed across slots according to `purpose`. Range: 500-50000. | |
| locale | No | Localizes prices (PL=PLN, EN/DE=EUR) and supersound.pl shop URL prefix (/eu/ for EN, /de/ for DE). | |
| search | No | Free-text search for `_action='search'`, e.g. 'Shure SM7B' or 'Focusrite Scarlett'. | |
| _action | No | Action selector. 'build' (default): generate 3 proposals from budget+purpose. 'search': find specific products by category. 'get_mine': list saved setups (OAuth). 'save': persist current setup to user's account (OAuth). | |
| purpose | No | Primary use case. Drives slot allocation: 'vocal'=mic/interface heavy, 'podcast'=mic 35%/interface 20%, 'beat-making'=MIDI 20%/monitors 25%, 'production'=balanced studio, 'general'=balanced. | |
| category | No | Product category for `_action='search'`. Aliases internally to WooCommerce category slugs. | |
| max_price | No | Optional max price filter for `_action='search'`. | |
| min_price | No | Optional min price filter for `_action='search'`. | |
| room_size | No | Room size hint (informational, used by widget for monitor pair size suggestions). | |
| total_price | No | Optional total price hint for `_action='save'`. | |
| conversation_id | No | Optional conversation/session id from the LLM client. Goes into `utm_term` for sales attribution per chat session. |
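The schema pins down two constraints worth validating client-side: `budget` must fall in 500-50000, and `locale` drives the currency (PL=PLN, EN/DE=EUR). A sketch of a request builder for `_action='build'` under those documented rules (`validate_build_request` is a hypothetical helper):

```python
# Currency mapping stated in the tool description: PL=PLN, EN/DE=EUR.
CURRENCY_BY_LOCALE = {"pl": "PLN", "en": "EUR", "de": "EUR"}

def validate_build_request(budget, locale="pl"):
    """Build the argument dict for _action='build', enforcing the
    documented budget range and locale set."""
    if not 500 <= budget <= 50000:
        raise ValueError("budget must be within 500-50000")
    if locale not in CURRENCY_BY_LOCALE:
        raise ValueError("locale must be one of pl, en, de")
    return {"_action": "build", "budget": budget, "locale": locale,
            "currency": CURRENCY_BY_LOCALE[locale]}
```

Passing `locale` explicitly, as the description insists, keeps prices and supersound.pl URLs in the user's language.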
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description discloses that some actions require OAuth ('save', 'get_mine') and that the tool generates proposals based on budget and purpose, which is consistent with annotations (readOnlyHint=false, destructiveHint=false). It could add more about rate limits or data handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph but contains all essential information without fluff. It could be broken into bullet points for clarity, but it remains efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 15 parameters and multiple actions, the description covers the primary use cases and how to invoke each action. The lack of output schema is compensated by describing return values ('three tier proposals... with purchase links').
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema already describes all 15 parameters (100% coverage), so the baseline is 3. The description adds value by explaining the interaction between 'budget', 'purpose', and '_action', e.g., what 'build' does versus 'search'.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs like 'build', 'recommend', 'plan' and clearly states the resource ('recording/production studio setup') and outputs ('three tier proposals with hardware picks and purchase links'). It distinguishes itself from sibling tools which are audio processing utilities.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly lists when to use this tool: for gear recommendations, home studio planning, equipment selection, device swapping, or price tier comparisons. It implicitly excludes use for other audio tasks by mentioning the sibling context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_to_midi: Boost Audio - Audio to MIDI
Convert a melody or drum track from audio to a downloadable MIDI file. Use midi_mode='melody' (default) for pitched instruments/vocals, 'drums' for drum transcription.
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| midi_mode | No | 'melody' for pitched instruments/vocals (default), 'drums' for drum transcription. | melody |
| stem_hint | No | Instrument type hint for melody mode — helps the transcription engine. Ignored when midi_mode is 'drums'. | other |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. | |
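The schema notes a parameter interaction: `stem_hint` only applies in melody mode and is ignored for drums. A client can reflect that by dropping the hint when it cannot take effect. A small sketch (`build_midi_args` is a hypothetical helper):

```python
def build_midi_args(midi_mode="melody", stem_hint=None):
    """Assemble boost_audio_to_midi arguments, omitting stem_hint in
    drums mode where the schema says it is ignored."""
    args = {"midi_mode": midi_mode}
    if midi_mode == "melody" and stem_hint:
        args["stem_hint"] = stem_hint
    return args
```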
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate the tool is not read-only and not destructive. The description adds that the output is a downloadable MIDI file, which is useful context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences with the main purpose front-loaded and mode guidance immediately following. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Description covers core functionality and mode selection. Lacks details on input methods (URL, upload, file_token) but those are covered in the schema. Adequate for the main use case.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are already documented. The description repeats the midi_mode options but doesn't add new information beyond what the schema provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it converts audio to a downloadable MIDI file, with explicit mention of melody and drum modes. It differentiates from sibling tools like converters and stem splitters.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear guidance on using midi_mode for melody vs drums. While it doesn't explicitly list when not to use this tool, the sibling tools cover other audio transformations, making the usage context clear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
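To make the mode guidance discussed above concrete, here is a minimal sketch of how an agent might assemble the call arguments. The argument names (`audio_url`, `midi_mode`) and the two mode values (`"melody"`, `"drums"`) are taken from the evaluation text; the tool's real schema may differ.

```python
# Sketch: build arguments for the audio-to-MIDI tool. Names are assumed
# from the evaluation text above, not from a published schema.
def build_midi_args(audio_url: str, want_drums: bool = False) -> dict:
    # The description documents exactly two modes: melody and drums.
    midi_mode = "drums" if want_drums else "melody"
    return {"audio_url": audio_url, "midi_mode": midi_mode}

args = build_midi_args("https://example.com/track.mp3", want_drums=True)
print(args["midi_mode"])  # → drums
```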
boost_audio_trimmer (Boost Audio - Audio Trimmer)
Trim a section of an MP3 or WAV file with optional fade in/out. Use this when the user wants to cut a fragment from a longer recording.
| Name | Required | Description | Default |
|---|---|---|---|
| end_sec | No | End of the kept range, in seconds. | |
| fade_in | No | Fade-in length in seconds (default 0). | |
| fade_out | No | Fade-out length in seconds (default 0). | |
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| start_sec | No | Start of the kept range, in seconds. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. |
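Given the parameter table above, a minimal sketch of assembling and sanity-checking a trim request. Parameter names come from the schema; the range checks are our assumption, since the server does not document its validation behavior. Only the `audio_url` input route is shown.

```python
# Sketch: assemble a boost_audio_trimmer call. The base64/file_token input
# routes are omitted; the validation rules below are assumptions.
def build_trim_args(audio_url, start_sec=0.0, end_sec=None,
                    fade_in=0.0, fade_out=0.0):
    if end_sec is not None and end_sec <= start_sec:
        raise ValueError("end_sec must be greater than start_sec")
    if fade_in < 0 or fade_out < 0:
        raise ValueError("fade lengths must be non-negative")
    args = {"audio_url": audio_url, "start_sec": start_sec,
            "fade_in": fade_in, "fade_out": fade_out}
    if end_sec is not None:
        args["end_sec"] = end_sec
    return args

# Keep 10.5s..42.0s of the file, with a 1.5s fade-out.
print(build_trim_args("https://example.com/mix.wav", 10.5, 42.0, fade_out=1.5))
```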
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=false and destructiveHint=false, so the description's mention of trimming is consistent. However, the description does not add behavioral details beyond what is implied (e.g., non-destructive, returns new file).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences: first defines the action, second provides usage context. No extraneous information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 9 optional parameters, no output schema, and adequate annotations, the description covers the main use case. It could mention the output format or return value, but the schema descriptions for input methods are sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters. The description mentions 'optional fade in/out' which aligns with fade_in/fade_out parameters, but adds no new meaning beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states the verb 'trim' and resource 'MP3 or WAV file', and differentiates from sibling audio tools by specifying 'cut a fragment from a longer recording'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicit usage context: 'Use this when the user wants to cut a fragment from a longer recording.' While no alternatives or when-not are mentioned, the sibling tools are distinct enough to avoid confusion.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
boost_audio_vocal_remover (Boost Audio - Vocal Remover)
Separate vocals from instrumental in MP3/WAV using Boost Audio AI. Use this when the user wants to isolate vocals (acapella) or get an instrumental version of a song.
| Name | Required | Description | Default |
|---|---|---|---|
| audio_url | No | Optional public URL to an audio/video file. If omitted, the user uploads the file in the rendered widget. | |
| file_name | No | Original file name. Required when audio_base64 is provided. | |
| file_type | No | Original mime type. Required when audio_base64 is provided. | |
| file_token | No | Token for a pre-uploaded large file (>22 MB). Obtained from POST /widget-api/upload-raw. Used instead of audio_base64 for large files. | |
| audio_base64 | No | Optional base64-encoded audio file payload. Used by the widget when the host iframe blocks CORS fetches. |
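The schema above implies three mutually exclusive input routes: a public `audio_url`, an inline `audio_base64` payload (which requires `file_name` and `file_type`), or a `file_token` for files over 22 MB. A sketch, under those assumptions, of picking one route; the preference order here is our own choice, not documented server behavior.

```python
import base64

# 22 MB threshold mentioned in the file_token schema description.
LARGE_FILE_BYTES = 22 * 1024 * 1024

def pick_input(audio_url=None, local_bytes=None, file_name=None,
               file_type=None, file_token=None):
    """Return the argument dict for exactly one input route (assumed logic)."""
    if audio_url:
        return {"audio_url": audio_url}
    if file_token or (local_bytes and len(local_bytes) > LARGE_FILE_BYTES):
        if not file_token:
            raise ValueError("large file: upload via POST /widget-api/upload-raw first")
        return {"file_token": file_token}
    if local_bytes is not None:
        if not (file_name and file_type):
            raise ValueError("file_name and file_type are required with audio_base64")
        return {"audio_base64": base64.b64encode(local_bytes).decode(),
                "file_name": file_name, "file_type": file_type}
    raise ValueError("no input provided")
```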
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate this is not read-only, not destructive, and not idempotent. The description adds that it uses Boost Audio AI but does not disclose processing time, file size limits, or output format. With annotations, the description provides minimal additional behavioral insight.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, front-loaded with the core action and use case. Every sentence is valuable with no unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema is present, but the description does not explain what the tool returns (e.g., separate audio files for vocals and instrumental). Complex input methods (multiple optional parameters) lack guidance on preferred approach. The description is insufficient for an agent to fully understand input/output expectations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with descriptions for all 5 parameters. The description does not add parameter-specific information beyond what the schema provides, so the baseline score of 3 applies.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Separate', the resource 'vocals from instrumental', and specifies supported formats MP3/WAV. It distinguishes from the sibling stem splitter by focusing on vocal/instrumental isolation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use the tool: to isolate vocals or to get an instrumental version. However, it does not mention when not to use it or point to alternatives, such as the sibling stem splitter for more detailed splitting.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
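As a local sanity check for the claim file described above, here is a short sketch. The schema URL and field names come from the snippet; the validation logic is only an illustration, not Glama's actual verifier.

```python
# Sketch: minimal local validation of a /.well-known/glama.json claim file.
# Field names follow the snippet above; Glama's real checks may differ.
def validate_claim(doc: dict) -> bool:
    if doc.get("$schema") != "https://glama.ai/mcp/schemas/connector.json":
        return False
    maintainers = doc.get("maintainers")
    return (isinstance(maintainers, list) and len(maintainers) > 0
            and all(isinstance(m, dict) and "email" in m for m in maintainers))

claim = {"$schema": "https://glama.ai/mcp/schemas/connector.json",
         "maintainers": [{"email": "your-email@example.com"}]}
print(validate_claim(claim))  # → True
```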
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.