Glama

configure_voice_and_script

Configure voice model and narration script for a project, synchronize audio with slide pages using character-based marks, and optionally add timed pauses or additional voiceovers at specified positions.

Instructions

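Sets a project's voice configuration, narration script, and audio-visual sync pageMarks. Optionally accepts page_holds, which insert a "screen hold plus optional voiceover or silence" after a given TTS segment ends. If a hold contains readable narration, that text is synthesized as separate TTS and its length is the measured audio duration (intended for narrated demos); if narration is missing or whitespace-only, the hold is pure silence lasting durationMs. afterChar must equal the end index (charEnd) of some spoken segment. For pure silence, durationMs should be at least the duration of the embedded video (200~120000). The whole TTS plus holds is limited to a rough estimate of 10 minutes. For multi-page narration, you must partition page_marks yourself beforehand. Before rendering, the user must verify the result on the local preview page; call get_render_preview.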
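The hold rules above can be sketched as a small pre-flight check. This is a minimal sketch, not the server's own validation: it assumes each hold is a dict using the afterChar, durationMs, and narration keys named in the description, that a "segment end" (charEnd) is the end of a page_marks entry, and that 200~120000 is the accepted durationMs range in milliseconds. The function names are hypothetical.

```python
from typing import Any

# Assumed bounds for a silent hold, read from "200~120000" in the description.
MIN_HOLD_MS = 200
MAX_HOLD_MS = 120_000


def hold_kind(hold: dict[str, Any]) -> str:
    """Return 'narration' if the hold carries readable text, else 'silence'."""
    text = hold.get("narration") or ""
    return "narration" if text.strip() else "silence"


def check_page_holds(
    page_holds: list[dict[str, Any]],
    page_marks: list[dict[str, Any]],
) -> list[str]:
    """Collect readable problems; an empty list means none were found."""
    # Assumption: a hold may only follow the end of a page_marks segment.
    segment_ends = {mark["end"] for mark in page_marks}
    problems: list[str] = []
    for i, hold in enumerate(page_holds):
        after = hold.get("afterChar")
        if after not in segment_ends:
            problems.append(f"page_holds[{i}]: afterChar {after} is not a segment end")
        if hold_kind(hold) == "silence":
            # A silent hold has no audio to measure, so durationMs sets its length.
            ms = hold.get("durationMs")
            if not isinstance(ms, int) or not MIN_HOLD_MS <= ms <= MAX_HOLD_MS:
                problems.append(
                    f"page_holds[{i}]: durationMs {ms} is outside "
                    f"{MIN_HOLD_MS}..{MAX_HOLD_MS}"
                )
    return problems
```

A narrated hold skips the durationMs check because its length comes from the synthesized audio. The rough 10-minute total is not checked here, since it depends on measured TTS durations that are only known after synthesis.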

Input Schema

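| Name | Required | Description | Default |
| --- | --- | --- | --- |
| project_id | Yes | Project ID | |
| script | Yes | Full narration/voiceover script | |
| voice_model | Yes | Voice ID for single-speaker voiceover | |
| page_marks | No | Audio-visual sync: assigns the script to pages by character index. Each item is { start, end, pageIdx }; the range is half-open [start,end), must cover 0..script.length with no overlaps, and must be ordered by increasing start. pageIdx is the slide index and must be consistent with the page count from upload_html. Required for multi-page rendering. | |
| voice_assignments | No | Multi-speaker voiceover: assigns a different voice to each page, [{pageIdx, voiceId}] | |
| voice_speed | No | Global speech rate (0.5 ~ 2.0, default 1.0) | |
| speed_marks | No | Per-segment speed changes, [{start, end, speed}] | |
| page_holds | No | Optional: screen hold after a TTS segment. Readable narration is synthesized as voiceover TTS; otherwise the hold is silence lasting durationMs. afterChar = the end index of a spoken segment. Read together with get_material_guidelines. | |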
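The page_marks constraints (half-open ranges, full coverage of the script, no overlap, increasing start) lend themselves to a short check. The following is a sketch under stated assumptions: the helper names are hypothetical, pageIdx is assumed to be 0-based, and Python's `len()` counts code points whereas the schema's script.length may count UTF-16 units, so the two can differ for characters outside the Basic Multilingual Plane.

```python
from typing import Any


def build_script_and_marks(page_texts: list[str]) -> tuple[str, list[dict[str, Any]]]:
    """Concatenate per-page narration and derive half-open [start, end) marks."""
    marks: list[dict[str, Any]] = []
    cursor = 0
    for page_idx, text in enumerate(page_texts):
        end = cursor + len(text)
        marks.append({"start": cursor, "end": end, "pageIdx": page_idx})
        cursor = end
    return "".join(page_texts), marks


def check_page_marks(script: str, page_marks: list[dict[str, Any]]) -> None:
    """Raise ValueError unless the marks tile 0..len(script) exactly, in order."""
    cursor = 0
    for i, mark in enumerate(page_marks):
        start, end = mark["start"], mark["end"]
        # Requiring start == cursor rules out gaps, overlaps, and misordering.
        if start != cursor:
            raise ValueError(f"page_marks[{i}]: expected start {cursor}, got {start}")
        # Assumption: an empty segment is treated as an error.
        if end <= start:
            raise ValueError(f"page_marks[{i}]: empty or reversed range")
        cursor = end
    if cursor != len(script):
        raise ValueError(f"marks end at {cursor}, script length is {len(script)}")
```

Building the script from per-page text, rather than slicing a finished script by hand, keeps the indices consistent by construction; `check_page_marks` is then a guard for marks that come from elsewhere.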
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description fully discloses key behaviors: how narration vs silence holds work, actual audio length measurement for narration, pure silence duration recommendations, total time limit of 10 minutes, and strict requirements for afterChar and page_marks. It effectively communicates what the tool does under different conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, using a single paragraph that delivers essential information without redundancy. It is front-loaded with the main purpose and follows with details. While it could benefit from bullet points for readability, the current structure effectively conveys all needed information in a compact form.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 8 parameters and no output schema, the description is remarkably complete. It covers all necessary aspects: purpose, parameter usage, constraints, prerequisites (e.g., page_marks for multi-page), and a critical call-to-action (call get_render_preview before rendering). No gaps remain for an agent to misuse the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the schema covers 100% of parameters with descriptions, the tool description adds significant value by explaining relationships among parameters (e.g., page_marks must align with script, afterChar must match charEnd, narration triggers separate TTS). It provides context that the schema alone does not, such as durationMs ranges and the 10-minute limit, enhancing the agent's understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: configuring dubbing, script, and audio-visual synchronization with optional holds. It uses specific verbs and resources (setting the voice configuration, narration script, and audio-visual sync) and distinguishes it from sibling tools like get_render_preview or upload_html, which serve different functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool, including prerequisites (project_id, script, voice_model), constraints (afterChar must equal charEnd, page_marks must cover script), and recommendations (durationMs range, total TTS+holds limit). It also implicitly indicates when not to use it: page_marks are required for multi-page rendering, and get_render_preview must be called before rendering. No explicit alternative is needed, as sibling tools have distinct purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/trtian/flash-cast-mcp'
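The same request can be made from Python with only the standard library. This is a minimal sketch: the function names are hypothetical, the namespace/slug URL pattern is inferred from the single example above, and the response is assumed to be JSON whose shape is not documented here.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_url(namespace: str, slug: str) -> str:
    """Build the server URL (pattern assumed from the curl example)."""
    return f"{API_BASE}/{namespace}/{slug}"


def fetch_server(namespace: str, slug: str, timeout: float = 10.0) -> Any:
    """GET the server record and parse the body as JSON."""
    request = urllib.request.Request(
        build_server_url(namespace, slug),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("trtian", "flash-cast-mcp")` targets the same URL as the curl command.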

If you have feedback or need assistance with the MCP directory API, please join our Discord server.