make_bot_speak

Generate speech for meeting bots using text-to-speech technology to vocalize text during video calls with customizable voice options.

Instructions

Make a bot speak text during a meeting using text-to-speech

Input Schema

Name                  Required  Description                       Default
bot_id                Yes       ID of the bot that should speak
text                  Yes       Text for the bot to speak
voice_language_code   No        Voice language code (optional)    en-US
voice_name            No        Voice name (optional)             en-US-Casual-K
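For illustration, a call from an MCP client might pass arguments shaped like this (the bot ID below is a made-up placeholder, not a real value):

```typescript
// Hypothetical example arguments for a make_bot_speak call.
// The optional voice fields may be omitted to use the defaults shown above.
const args = {
  bot_id: "bot_abc123", // illustrative placeholder
  text: "Hello everyone, the recording has started.",
  voice_language_code: "en-US",
  voice_name: "en-US-Casual-K",
};

console.log(JSON.stringify(args));
```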

Implementation Reference

  • The main handler function that validates inputs, constructs the speech data, makes the API POST request to /api/v1/bots/{bot_id}/speech, and returns a success message.
    private async makeBotSpeak(args: Record<string, unknown>) {
      const bot_id = args.bot_id as string;
      const text = args.text as string;
      const voice_language_code = (args.voice_language_code as string) || "en-US";
      const voice_name = (args.voice_name as string) || "en-US-Casual-K";
      
      if (!bot_id || typeof bot_id !== 'string') {
        throw new Error("Missing or invalid required parameter: bot_id");
      }
      
      if (!text || typeof text !== 'string') {
        throw new Error("Missing or invalid required parameter: text");
      }
      
      const speechData = {
        text,
        text_to_speech_settings: {
          google: {
            voice_language_code,
            voice_name
          }
        }
      };
    
      await this.makeApiRequest(`/api/v1/bots/${bot_id}/speech`, "POST", speechData);
    
      return {
        content: [
          {
            type: "text",
            text: `āœ… Bot ${bot_id} will speak: "${text}"\n\nšŸ”Š Voice: ${voice_name} (${voice_language_code})\nšŸ’” The bot should now be speaking in the meeting!`,
          },
        ],
      };
    }
  • The input schema defining parameters for the make_bot_speak tool, including required bot_id and text, and optional voice settings.
    inputSchema: {
      type: "object",
      properties: {
        bot_id: {
          type: "string",
          description: "ID of the bot that should speak",
        },
        text: {
          type: "string",
          description: "Text for the bot to speak",
        },
        voice_language_code: {
          type: "string",
          description: "Voice language code (optional, defaults to 'en-US')",
          default: "en-US",
        },
        voice_name: {
          type: "string",
          description: "Voice name (optional, defaults to 'en-US-Casual-K')",
          default: "en-US-Casual-K",
        },
      },
      required: ["bot_id", "text"],
    },
  • src/index.ts:266-293 (registration)
    The tool registration entry in the ListTools response, specifying name, description, and input schema.
    {
      name: "make_bot_speak",
      description: "Make a bot speak text during a meeting using text-to-speech",
      inputSchema: {
        type: "object",
        properties: {
          bot_id: {
            type: "string",
            description: "ID of the bot that should speak",
          },
          text: {
            type: "string",
            description: "Text for the bot to speak",
          },
          voice_language_code: {
            type: "string",
            description: "Voice language code (optional, defaults to 'en-US')",
            default: "en-US",
          },
          voice_name: {
            type: "string",
            description: "Voice name (optional, defaults to 'en-US-Casual-K')",
            default: "en-US-Casual-K",
          },
        },
        required: ["bot_id", "text"],
      },
    },
  • src/index.ts:410-411 (registration)
    The switch case in the CallToolRequest handler that dispatches calls to the makeBotSpeak handler.
    case "make_bot_speak":
      return await this.makeBotSpeak(args);
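The handler above delegates to `this.makeApiRequest`, which is not shown in the excerpts. A minimal sketch of such a helper, assuming a REST API authenticated with an API key (the base URL, environment variable names, and auth header format are assumptions, not taken from the source):

```typescript
// Hypothetical sketch of the makeApiRequest helper the handler relies on.
// Base URL, env var names, and the Authorization scheme are assumptions.
async function makeApiRequest(
  path: string,
  method: "GET" | "POST",
  body?: unknown,
): Promise<unknown> {
  const baseUrl = process.env.ATTENDEE_BASE_URL ?? "https://app.attendee.dev";
  const response = await fetch(`${baseUrl}${path}`, {
    method,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Token ${process.env.ATTENDEE_API_KEY ?? ""}`,
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`API request failed: ${response.status} ${response.statusText}`);
  }
  // Some endpoints return an empty body; guard before parsing JSON.
  const text = await response.text();
  return text ? JSON.parse(text) : null;
}
```

A helper like this centralizes auth and error handling, which is why the tool handler itself only needs to validate inputs and build the payload.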
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden but lacks behavioral detail. It doesn't disclose the permissions needed, rate limits, whether speech interrupts other audio, or what happens if the bot isn't in a meeting. 'Make a bot speak' implies a mutation, but no further context is given.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function without redundancy. It's front-loaded with the core purpose and uses minimal words to convey the essential action, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (mutating bot behavior in live meetings) and the lack of annotations or an output schema, the description is incomplete. It doesn't cover error conditions, response format, or integration with meeting context, leaving significant gaps in an agent's understanding of full usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional meaning beyond implying text-to-speech conversion, which is already clear from the tool name and schema. Baseline 3 is appropriate as the schema handles parameter semantics adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('make a bot speak') and the mechanism ('using text-to-speech'), specifying both verb and resource. It is distinct from siblings like send_chat_message or send_image_to_meeting in focusing on speech output, though it doesn't explicitly contrast itself with them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., bot must be in a meeting), exclusions, or comparisons to siblings like send_chat_message for text-based communication, leaving usage context implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
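Taken together, the review findings suggest a revised description along these lines. The wording below is one possible improvement, not the tool's actual description; it folds in the side effect, the prerequisite, and an explicit pointer to the chat-based sibling:

```typescript
// One possible revision of the tool description, addressing the gaps the
// review identifies (side effects, prerequisites, alternatives). Illustrative only.
const revisedDescription =
  "Make a bot speak text aloud in a live meeting using text-to-speech. " +
  "Side effect: the speech is audible to all participants, and the bot must " +
  "already be in the meeting. For silent, text-only communication, prefer a " +
  "chat-based tool such as send_chat_message.";

console.log(revisedDescription);
```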
