
chroma_cqt

Extract chroma features from an audio time series for music analysis. Converts the signal into a chroma (pitch-class) representation that gives the amplitude of each note over time.

Instructions

Computes the chroma CQT of the given audio time series using librosa.
The chroma CQT is a representation of the audio signal in terms of its
chromatic content, which is useful for music analysis.
The chroma CQT is computed using the following parameters:
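- path_audio_time_series_y: The path to the audio time series (CSV file,
    read with ";" as the delimiter).
    It's sometimes better to take harmonics only, that is, to pass the
    harmonic component of the signal rather than the full mix.
- hop_length: The number of samples between frames (default is 512).
- fmin: The minimum frequency analyzed by the CQT (default is None, which
    falls back to librosa's default of C1, about 32.7 Hz).
- n_chroma: The number of chroma bins (default is 12).
- n_octaves: The number of octaves to include in the chroma feature (default is 7).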
The chroma CQT is saved to a CSV file with the following columns:
- note: The note name (C, C#, D, etc.).
- time: The time position of the note in seconds.
- amplitude: The amplitude of the note at that time.
The path to the CSV file is returned.
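
The output layout described above (one row per note and time frame) can be consumed with the standard library alone. The following is a minimal sketch, not part of the tool: the helper name `strongest_note_per_time` and the sample rows are made up for illustration, and it assumes only the documented `note,time,amplitude` header.

```python
import csv
import io


def strongest_note_per_time(csv_file):
    """Return {time: (note, amplitude)} for the loudest note at each time."""
    best = {}
    for row in csv.DictReader(csv_file):
        t = float(row["time"])
        amp = float(row["amplitude"])
        # Keep the row with the highest amplitude seen so far for this time.
        if t not in best or amp > best[t][1]:
            best[t] = (row["note"], amp)
    return best


# Tiny made-up sample in the documented note,time,amplitude layout.
sample = io.StringIO(
    "note,time,amplitude\n"
    "C,0.0,0.2\n"
    "C,0.5,0.9\n"
    "A,0.0,1.0\n"
    "A,0.5,0.3\n"
)
result = strongest_note_per_time(sample)
print(result)
```

With a real result, the `io.StringIO` sample would be replaced by `open(returned_path, newline="")`, where `returned_path` is the path the tool returns.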

Input Schema

Name                      Required  Description  Default
path_audio_time_series_y  Yes
hop_length                No
fmin                      No
n_chroma                  No
n_octaves                 No
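
As a rough illustration of how these inputs might be prepared, the sketch below writes a synthetic time series to disk and builds an argument dictionary for the tool. Several things here are assumptions rather than facts from this page: the one-sample-per-line layout (inferred from the handler reading the file with `np.loadtxt`), the 22050 Hz sample rate, the file name `example_y.csv`, and the helper `write_time_series`. The optional values mirror the defaults in the handler signature shown under Implementation Reference.

```python
import math
import os
import tempfile

SAMPLE_RATE = 22050  # assumed sample rate for the synthetic signal


def write_time_series(path, samples):
    """Write one sample per line (assumed layout for a mono time series)."""
    with open(path, "w") as f:
        for s in samples:
            f.write(f"{s}\n")


# One second of a 440 Hz sine wave as a stand-in for real audio.
samples = [
    math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)
]
input_path = os.path.join(tempfile.gettempdir(), "example_y.csv")
write_time_series(input_path, samples)

# Only path_audio_time_series_y is required; the rest are optional.
arguments = {
    "path_audio_time_series_y": input_path,
    "hop_length": 512,
    "n_chroma": 12,
    "n_octaves": 7,
}
```

Leaving `fmin` out of the dictionary keeps the handler's default of `None`.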

Implementation Reference

  • The chroma_cqt tool handler: loads the audio time series, computes the chroma CQT with librosa.feature.chroma_cqt, writes note, time, and amplitude columns to a CSV file, and returns its path.
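    import os
    import tempfile

    import librosa
    import numpy as np

    # `mcp` is the server's MCP instance, created elsewhere in the module.
    @mcp.tool()
    def chroma_cqt(
        path_audio_time_series_y: str,
        hop_length: int = 512,
        fmin: float | None = None,
        n_chroma: int = 12,
        n_octaves: int = 7,
    ) -> str: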
        """
        Computes the chroma CQT of the given audio time series using librosa.
        The chroma CQT is a representation of the audio signal in terms of its
        chromatic content, which is useful for music analysis.
        The chroma CQT is computed using the following parameters:
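        - path_audio_time_series_y: The path to the audio time series (CSV file,
            read with ";" as the delimiter).
            It's sometimes better to take harmonics only, that is, to pass the
            harmonic component of the signal rather than the full mix.
        - hop_length: The number of samples between frames (default is 512).
        - fmin: The minimum frequency analyzed by the CQT (default is None, which
            falls back to librosa's default of C1, about 32.7 Hz).
        - n_chroma: The number of chroma bins (default is 12).
        - n_octaves: The number of octaves to include in the chroma feature (default is 7).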
        The chroma CQT is saved to a CSV file with the following columns:
        - note: The note name (C, C#, D, etc.).
        - time: The time position of the note in seconds.
        - amplitude: The amplitude of the note at that time.
        The path to the CSV file is returned.
        """
        y = np.loadtxt(path_audio_time_series_y, delimiter=";")
        chroma_cq = librosa.feature.chroma_cqt(
            y=y,
            hop_length=hop_length,
            fmin=fmin,
            n_chroma=n_chroma,
            n_octaves=n_octaves,
        )
        # Save the chroma_cq to a CSV file
        # Derive the output name from the input file name in a portable way
        base = os.path.basename(path_audio_time_series_y)
        name = os.path.splitext(base)[0] + "_chroma_cqt"
        chroma_cq_path = os.path.join(tempfile.gettempdir(), name + ".csv")
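        # Row 0 of librosa's chroma output is C, so these names apply when
        # n_chroma is 12; for any other value, fall back to the bin index.
        notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
        n_bins = chroma_cq.shape[0]
        labels = notes if n_bins == len(notes) else [str(i) for i in range(n_bins)]
        # No sr is passed above, so librosa's default of 22050 Hz is assumed
        # both for the CQT and for the frame-to-time conversion.
        time_frames = np.arange(chroma_cq.shape[1])
        time_seconds = librosa.frames_to_time(time_frames, hop_length=hop_length)

        with open(chroma_cq_path, "w") as f:
            f.write("note,time,amplitude\n")
            for label, row in zip(labels, chroma_cq):
                for t, amplitude in zip(time_seconds, row):
                    f.write(f"{label},{t},{amplitude}\n")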
        # Return the path to the CSV file
        return chroma_cq_path
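
The `time` column in the handler above comes from converting frame indices to seconds. As a minimal sketch of that arithmetic, and not a replacement for `librosa.frames_to_time`, the conversion is frame index times hop length divided by the sample rate. The function name `frame_to_seconds` is hypothetical, and the default `sr=22050` reflects the assumption that the handler relies on librosa's default sample rate because it passes none.

```python
def frame_to_seconds(frame_index, hop_length=512, sr=22050):
    """Convert a frame index to a time in seconds: frame * hop_length / sr."""
    return frame_index * hop_length / sr


# With the handler's defaults, consecutive frames are 512 / 22050 s apart.
step = frame_to_seconds(1)
print(f"frame spacing: {step:.6f} s")

# A smaller hop length gives finer time resolution (and more CSV rows).
fine_step = frame_to_seconds(1, hop_length=256)
print(f"frame spacing at hop_length=256: {fine_step:.6f} s")
```

If the time series was sampled at a rate other than 22050 Hz, the reported times would be scaled accordingly, since the tool exposes no sample-rate parameter.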

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses that the tool saves output to a CSV file and returns the file path, which is useful behavioral context. However, it doesn't mention performance characteristics, memory usage, error conditions, or whether the operation is read-only/destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured with clear sections: purpose, parameters, and output format. Every sentence adds value, though the parameter explanations could be slightly more concise. It's front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters with 0% schema coverage and no output schema, the description does an excellent job explaining parameters and output format. It could improve by mentioning computational requirements or typical use cases, but it's largely complete for this audio processing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It provides detailed explanations for all 5 parameters beyond just their names, including practical advice ('It's sometimes better to take harmonics only' for path_audio_time_series_y) and default values. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes chroma CQT from audio time series using librosa for music analysis. It specifies the verb 'computes' and resource 'chroma CQT', but doesn't explicitly differentiate from sibling tools like mfcc or tempo which are also audio analysis tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions chroma CQT is 'useful for music analysis' but provides no guidance on when to use this specific tool versus alternatives like mfcc or tempo. No explicit when/when-not statements or comparison to sibling tools are included.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hugohow/mcp-music-analysis'
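
The same request can be sketched in Python with the standard library. This is an illustration only: the helper names `build_request` and `fetch_server_info` are hypothetical, and decoding the response as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/hugohow/mcp-music-analysis"


def build_request(url=API_URL):
    """Build the GET request that the curl command above sends."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url=API_URL, timeout=10):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_request()
print(request.get_method(), request.full_url)
```

Calling `fetch_server_info()` performs the network request; `build_request()` on its own does not.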

If you have feedback or need assistance with the MCP directory API, please join our Discord server.