Glama

kobold_tts

Convert text to speech audio using KoboldAI's TTS capabilities for applications requiring voice output.

Instructions

Generate text-to-speech audio

Input Schema

Name    Required  Description  Default
apiUrl  No        -            http://localhost:5001
text    Yes       -            -
voice   No        -            -
speed   No        -            -
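
The input schema above can be read as a TypeScript type. The following is a minimal sketch, not part of the server: the names `KoboldTtsArgs`, `DEFAULT_API_URL`, and `withDefaults` are hypothetical and exist only for illustration. Valid values for `voice` and `speed` are not documented on this page, so the sketch leaves them unconstrained.

```typescript
// Argument shape for kobold_tts, mirroring the input schema table.
type KoboldTtsArgs = {
  apiUrl?: string; // optional; the schema default is http://localhost:5001
  text: string;    // required
  voice?: string;  // optional; valid voice names are not documented here
  speed?: number;  // optional; the valid range is not documented here
};

// Default taken from the "Default" column of the table.
const DEFAULT_API_URL = "http://localhost:5001";

// Hypothetical helper: fills in the documented default for apiUrl.
function withDefaults(args: KoboldTtsArgs): KoboldTtsArgs & { apiUrl: string } {
  return { ...args, apiUrl: args.apiUrl ?? DEFAULT_API_URL };
}

// Only `text` is required; apiUrl falls back to the default.
const exampleArgs = withDefaults({ text: "Hello from KoboldAI" });
```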

Implementation Reference

  • Zod input schema definition for the kobold_tts tool, extending BaseConfigSchema with text, optional voice, and speed parameters.
    const TTSSchema = BaseConfigSchema.extend({
        text: z.string(),
        voice: z.string().optional(),
        speed: z.number().optional(),
    });
  • src/index.ts:218-222 (registration)
    Registration of the kobold_tts tool in the ListTools response, specifying name, description, and input schema.
    {
        name: "kobold_tts",
        description: "Generate text-to-speech audio",
        inputSchema: zodToJsonSchema(TTSSchema),
    },
  • src/index.ts:337 (registration)
    Internal mapping/registration of the kobold_tts tool to the KoboldAI endpoint '/api/extra/tts' and its schema within the POST endpoints dispatch table.
    kobold_tts: { endpoint: '/api/extra/tts', schema: TTSSchema },
  • Handler execution logic for kobold_tts (generic for POST tools): looks up endpoint and schema, validates input, proxies the POST request to the KoboldAI TTS endpoint via makeRequest, and returns the JSON response.
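    if (postEndpoints[name]) {
        const { endpoint, schema } = postEndpoints[name];
        const parsed = schema.safeParse(args);
        if (!parsed.success) {
            throw new Error(`Invalid arguments: ${parsed.error}`);
        }

        // Assumption: the excerpt omits where apiUrl and requestData are defined.
        // Splitting them out of the validated input is one plausible reading.
        const { apiUrl, ...requestData } = parsed.data;

        const result = await makeRequest(`${apiUrl}${endpoint}`, 'POST', requestData);
        return {
            content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
            isError: false,
        };
    }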
  • Helper function used by the handler to make HTTP requests to the KoboldAI backend API.
    async function makeRequest(url: string, method = 'GET', body: Record<string, unknown> | null = null) {
        const options: RequestInit = {
            method,
            headers: body ? { 'Content-Type': 'application/json' } : undefined,
        };
        
        if (body && method !== 'GET') {
            options.body = JSON.stringify(body);
        }
    
        const response = await fetch(url, options);
        if (!response.ok) {
            throw new Error(`KoboldAI API error: ${response.statusText}`);
        }
        
        return response.json();
    }
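
The pieces above (dispatch table, validation, request construction) can be sketched as one dependency-free function. This is an illustration under stated assumptions rather than the server's code: `parseTts` is a hand-written stand-in for `TTSSchema.safeParse`, `planRequest` is a hypothetical name, and the sketch assumes `apiUrl` is removed from the body before it is sent. It stops short of the network call so that it can run without a KoboldAI backend.

```typescript
// Validated input: the TTSSchema fields plus apiUrl with its default applied.
type TtsInput = { apiUrl: string; text: string; voice?: string; speed?: number };

type ParseResult =
  | { success: true; data: TtsInput }
  | { success: false; error: string };

// Hand-written stand-in for TTSSchema.safeParse (hypothetical).
function parseTts(args: unknown): ParseResult {
  if (typeof args !== "object" || args === null) {
    return { success: false, error: "arguments must be an object" };
  }
  const { apiUrl, text, voice, speed } = args as Record<string, unknown>;
  if (typeof text !== "string") return { success: false, error: "text must be a string" };
  if (apiUrl !== undefined && typeof apiUrl !== "string") return { success: false, error: "apiUrl must be a string" };
  if (voice !== undefined && typeof voice !== "string") return { success: false, error: "voice must be a string" };
  if (speed !== undefined && typeof speed !== "number") return { success: false, error: "speed must be a number" };

  const data: TtsInput = { apiUrl: typeof apiUrl === "string" ? apiUrl : "http://localhost:5001", text };
  if (typeof voice === "string") data.voice = voice;
  if (typeof speed === "number") data.speed = speed;
  return { success: true, data };
}

// Dispatch table: tool name -> endpoint and validator.
const postEndpoints: Record<string, { endpoint: string; parse: (args: unknown) => ParseResult }> = {
  kobold_tts: { endpoint: "/api/extra/tts", parse: parseTts },
};

// Looks up the tool, validates the arguments, and describes the POST to send.
function planRequest(
  toolName: string,
  args: unknown,
): { url: string; method: "POST"; body: Omit<TtsInput, "apiUrl"> } {
  const entry = postEndpoints[toolName];
  if (!entry) throw new Error(`Unknown tool: ${toolName}`);

  const parsed = entry.parse(args);
  if (parsed.success === false) throw new Error(`Invalid arguments: ${parsed.error}`);

  // Assumption: apiUrl selects the server and is not forwarded in the body.
  const { apiUrl, ...requestData } = parsed.data;
  return { url: `${apiUrl}${entry.endpoint}`, method: "POST", body: requestData };
}

const plan = planRequest("kobold_tts", { text: "Hello", speed: 1.2 });
```

Keeping the planning step free of I/O makes the lookup and validation easy to test; the resulting `url`, `method`, and `body` could then be handed to a helper such as `makeRequest`.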
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool generates audio but fails to mention critical traits: whether it's a read-only or mutating operation (though 'Generate' implies creation), potential rate limits, authentication needs (implied by apiUrl but not explicit), or what happens on failure. This is a significant gap for a tool with multiple parameters and no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
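
On the failure question specifically: the code in the Implementation Reference throws in two places, once on invalid arguments ("Invalid arguments: ...") and once on a non-OK HTTP response ("KoboldAI API error: ..."). How those exceptions reach the caller is not shown on this page. The sketch below assumes one common approach, converting a thrown error into a result flagged with `isError: true`; `ToolResult` and `toErrorResult` are hypothetical names, not the server's API.

```typescript
// Result shape matching what the handler returns on success, with isError flipped.
type ToolResult = { content: { type: "text"; text: string }[]; isError: boolean };

// Hypothetical helper: turns any thrown value into an error result for the caller.
function toErrorResult(err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return { content: [{ type: "text", text: message }], isError: true };
}

// The two failure messages that appear in the excerpted code.
const validationFailure = toErrorResult(new Error("Invalid arguments: text must be a string"));
const backendFailure = toErrorResult(new Error("KoboldAI API error: Internal Server Error"));
```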

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste—'Generate text-to-speech audio' is front-loaded and directly conveys the core function. Every word earns its place, making it easy to parse quickly without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, no output schema, no annotations), the description is incomplete. It doesn't cover behavioral aspects like error handling, output format (e.g., audio file type), or integration details (e.g., how apiUrl connects to Kobold). For a TTS tool with multiple configurable inputs, this minimal description fails to provide sufficient context for reliable use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate for undocumented parameters. It mentions 'text-to-speech' but adds no meaning beyond what the schema names imply (e.g., 'text' is the input, 'voice' and 'speed' affect output). It doesn't explain parameter roles, valid values (e.g., voice options), or defaults (apiUrl has a default in schema but not described). This leaves key semantics unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
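
To illustrate what non-zero description coverage could look like, here is a hedged sketch of the same four parameters as a JSON Schema object with a `description` on each. The description wording is invented for illustration; in particular, valid voice names and the speed range are not documented on this page, so the sketch says so rather than guessing. `undocumentedParams` is a hypothetical helper that lists parameters lacking a description.

```typescript
type PropertySchema = { type: "string" | "number"; description: string; default?: string };

// Illustrative descriptions only; the real tool currently ships none.
const ttsProperties: Record<string, PropertySchema> = {
  apiUrl: {
    type: "string",
    default: "http://localhost:5001",
    description: "Base URL of the KoboldAI server to send the request to.",
  },
  text: { type: "string", description: "Text to convert to speech." },
  voice: { type: "string", description: "Voice to use. Valid names are not documented here." },
  speed: { type: "number", description: "Speech speed. The valid range is not documented here." },
};

const ttsInputSchema = { type: "object", properties: ttsProperties, required: ["text"] };

// Returns the names of parameters whose description is missing or blank.
function undocumentedParams(props: Record<string, PropertySchema>): string[] {
  return Object.keys(props).filter((key) => (props[key]?.description ?? "").trim().length === 0);
}

const missingDescriptions = undocumentedParams(ttsProperties);
```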

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Generate text-to-speech audio' clearly states the verb ('Generate') and resource ('text-to-speech audio'), making the tool's purpose immediately understandable. It distinguishes itself from siblings like kobold_chat or kobold_transcribe by focusing on speech synthesis rather than text generation or transcription. However, it doesn't specify the exact output format or quality, keeping it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a running Kobold server), compare it to similar tools like kobold_generate for text, or specify use cases (e.g., converting text to audio for accessibility). This lack of context leaves the agent guessing about appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PhialsBasement/KoboldCPP-MCP-Server'
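
The same request can be sketched in TypeScript. The URL pattern is taken from the curl command above; the response format is not described on this page, so the sketch returns it as `unknown`. `serverUrl`, `fetchServerInfo`, and the `Fetcher` type are hypothetical names, and the fetch implementation is passed in so the URL logic can be checked without network access.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the directory URL for a server identified by owner and repository name.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Minimal subset of a fetch-style response used by this sketch.
type MinimalResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type Fetcher = (url: string) => Promise<MinimalResponse>;

// Issues the GET request through the supplied fetcher and returns the parsed JSON.
async function fetchServerInfo(owner: string, repo: string, fetcher: Fetcher): Promise<unknown> {
  const response = await fetcher(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`MCP directory API error: HTTP ${response.status}`);
  }
  return response.json();
}

const koboldServerUrl = serverUrl("PhialsBasement", "KoboldCPP-MCP-Server");
```

In a runtime that provides a global `fetch`, it can be passed as the third argument: `fetchServerInfo("PhialsBasement", "KoboldCPP-MCP-Server", fetch)`.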

If you have feedback or need assistance with the MCP directory API, please join our Discord server.