
Baidu Digital Human MCP Server

Official
by baidu-xiling

getVoices

Retrieve available voice IDs for speech synthesis, including system voices and cloned voices, to select appropriate digital human voices for audio generation.

Instructions

Tool description: queries the available voice (speaker) IDs.

Example 1:

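User input: Which voices have I cloned before?
Reasoning:
1. The user wants to look up available voice IDs, so the "getVoices" tool is needed.
2. The tool takes a single parameter, isSys.
3. "Cloned" implies the user wants cloned voice IDs, so the parameter value is "false".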
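For Example 1, the agent ends up sending an MCP `tools/call` request for getVoices. The sketch below builds that JSON-RPC message, assuming the standard MCP request framing; the request `id` is arbitrary, `build_get_voices_call` is a hypothetical helper, and the argument name is `isSys` as declared in the input schema.

```python
import json
from typing import Any, Dict, Optional


def build_get_voices_call(request_id: int, is_sys: Optional[bool] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for the getVoices tool."""
    arguments: Dict[str, Any] = {}
    # Leave isSys out entirely when the user did not say which voices they want;
    # the tool then queries both system and cloned voices.
    if is_sys is not None:
        arguments["isSys"] = is_sys
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "getVoices", "arguments": arguments},
    }


# Example 1: "Which voices have I cloned before?" -> cloned voices only.
print(json.dumps(build_get_voices_call(1, is_sys=False)))
```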

Example 2:

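User input: I'd like a gentle young woman's voice, around twenty years old.
Reasoning:
1. The user wants to look up available voice IDs, so the "getVoices" tool is needed.
2. The tool takes a single parameter, isSys.
3. The user did not say where the voice should come from, so no value is passed.
4. From the returned results, prefer voices whose describe field mentions an age of about twenty ("二十岁") and whose gender is "female", and recommend those to the user first.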
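Step 4 of Example 2 happens on the agent side, after the tool returns. A minimal sketch, assuming each voice item in the raw API response is a dict with a `gender` value and a free-text `describe` field, as the example implies (the `VoiceInfo` model in the implementation reference only declares `perId`, `name`, and `gender`). Substring matching on "二十岁" is a deliberately crude stand-in for whatever judgment the agent actually applies.

```python
from typing import Any, Dict, List, Tuple


def rank_voices(voices: List[Dict[str, Any]], age_hint: str = "二十岁",
                gender: str = "female") -> List[Dict[str, Any]]:
    """Order voices so the best matches for the request come first."""
    def mismatch(voice: Dict[str, Any]) -> Tuple[bool, bool]:
        # Tuples compare element-wise and False < True, so a voice that matches
        # the gender and mentions the age hint in `describe` sorts to the front.
        return (voice.get("gender") != gender,
                age_hint not in (voice.get("describe") or ""))
    # sorted() is stable, so ties keep the order the API returned them in.
    return sorted(voices, key=mismatch)
```

Ranking instead of filtering keeps every voice available, which matches the example's wording of recommending matches first rather than discarding the rest.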

Input Schema

Name | Required | Description | Default
isSys | No | Whether to query system voices: true returns system voices, false returns cloned voices, empty (omitted) queries all voices. | None
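The three states of `isSys` map onto the `isSystem` query parameter of the underlying HTTP endpoint used by `get_voices` in the implementation reference. A small standard-library sketch of that mapping; `voices_path` is a hypothetical helper, and `urlencode` is used here only for illustration, whereas the original code formats the query string directly.

```python
from typing import Optional
from urllib.parse import urlencode

VOICES_PATH = "api/digitalhuman/open/v1/tts/persons"


def voices_path(is_sys: Optional[bool] = None) -> str:
    """Return the request path for a given isSys value."""
    # True -> "true" (system voices), False -> "false" (cloned voices),
    # None -> "" (empty isSystem value, which queries all voices).
    value = "" if is_sys is None else ("true" if is_sys else "false")
    return f"{VOICES_PATH}?{urlencode({'isSystem': value})}"
```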

Implementation Reference

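  • Registration of the 'getVoices' tool using the @mcp.tool decorator with name, description, and usage examples.
    @mcp.tool(
        name="getVoices",
        description=(
        """
    # Tool description: query the available voice (speaker) IDs.
    # Example 1:
    User input: Which voices have I cloned before?
    Reasoning:
    1. The user wants to look up available voice IDs, so the "getVoices" tool is needed.
    2. The tool takes a single parameter, isSys.
    3. "Cloned" implies the user wants cloned voice IDs, so the parameter value is "false".
    # Example 2:
    User input: I'd like a gentle young woman's voice, around twenty years old.
    Reasoning:
    1. The user wants to look up available voice IDs, so the "getVoices" tool is needed.
    2. The tool takes a single parameter, isSys.
    3. The user did not say where the voice should come from, so no value is passed.
    4. From the returned results, prefer voices whose describe field mentions an age of about twenty and whose gender is "female", and recommend those to the user first.
        """)
    )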
  • The handler function for the 'getVoices' tool. Accepts the isSys parameter, retrieves a DHApiClient, calls get_voices, handles exceptions, and returns an MCPVoicesResponse.
    async def getVoices(
            isSys: Annotated[Optional[bool],
                Field(description="Whether to list system voices: true returns system voices, false returns cloned voices, empty returns all voices", default=None)]
    ) -> MCPVoicesResponse:
        """
        Get the list of available voices via the DH API.
    
        Args:
            isSys: True for system voices, False for cloned voices, None for all voices.
    
        """
        try:
            client = await getDhClient()
            ret = await client.get_voices(isSys)
            return ret
        except Exception as e:
            # Errors are returned in the response's error field instead of being raised.
            return MCPVoicesResponse(error=str(e))
  • Helper method in DHApiClient that performs the actual HTTP GET request to the Baidu Digital Human API voices endpoint and transforms the response into an MCPVoicesResponse.
    async def get_voices(self, isSys: Optional[bool] = None) -> MCPVoicesResponse:
        """Get the list of available voices from the API."""
        async def api_call():
            # True -> "true" (system voices), False -> "false" (cloned voices),
            # None -> "" (an empty isSystem value queries all voices).
            param = "" if isSys is None else ("true" if isSys else "false")
            return await self._make_request(f"api/digitalhuman/open/v1/tts/persons?isSystem={param}")
    
        def transform_data(data, mcp_class):
            # Report an empty result as voices=None rather than an empty list.
            return mcp_class(voices=data if len(data) > 0 else None)
    
        return await self._handle_api_request(
            api_call=api_call,
            response_model_class=VoicesResponse,
            mcp_response_class=MCPVoicesResponse,
            error_msg="No voices found.",
            transform_func=transform_data,
        )
  • Output schema for the getVoices tool: MCPVoicesResponse, which includes a list of voices (VoiceInfo objects).
    class MCPVoicesResponse(BaseDHResponse):
        """ MCP voice list response """
        voices: Optional[List[VoiceInfo]] = None
  • VoiceInfo model used in MCPVoicesResponse for individual voice details (perId, name, gender).
    class VoiceInfo(BaseModel):
        """ Voice data """
        perId: str
        name: str
        gender: Annotated[Optional[str], Field(description="Gender", default=None)]
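The pieces above fit together as an error-as-value flow: the handler never raises, and an empty voice list becomes `voices=None`. The sketch below reproduces that flow with standard-library stand-ins so it runs without the server; `FakeDhClient`, `get_voices_tool`, and the dataclass models are hypothetical substitutes for `DHApiClient`, the handler, and the pydantic models, and the `error` field is assumed to live on `BaseDHResponse`, as the handler's `MCPVoicesResponse(error=str(e))` call suggests. It also assumes the raw API items reach `VoiceInfo` as plain dicts; in that case pydantic's default `extra="ignore"` drops undeclared keys such as the `describe` field that Example 2 relies on, which `to_voice` imitates.

```python
import asyncio
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class VoiceInfo:
    """Stand-in for the pydantic VoiceInfo model."""
    perId: str
    name: str
    gender: Optional[str] = None


@dataclass
class MCPVoicesResponse:
    """Stand-in; `error` is assumed to be inherited from BaseDHResponse."""
    voices: Optional[List[VoiceInfo]] = None
    error: Optional[str] = None


def to_voice(raw: Dict[str, Any]) -> VoiceInfo:
    # Keep only declared fields, like pydantic's default extra="ignore":
    # keys such as "describe" are dropped here.
    known = {f.name for f in fields(VoiceInfo)}
    return VoiceInfo(**{k: v for k, v in raw.items() if k in known})


class FakeDhClient:
    """Hypothetical in-memory replacement for DHApiClient."""

    def __init__(self, payload: Optional[List[Dict[str, Any]]] = None, fail: bool = False):
        self.payload = payload or []
        self.fail = fail

    async def get_voices(self, isSys: Optional[bool] = None) -> MCPVoicesResponse:
        # isSys is ignored by this fake; the real client turns it into a query string.
        if self.fail:
            raise RuntimeError("upstream request failed")
        data = [to_voice(item) for item in self.payload]
        # Same rule as transform_data: an empty list becomes voices=None.
        return MCPVoicesResponse(voices=data if len(data) > 0 else None)


async def get_voices_tool(client: FakeDhClient, isSys: Optional[bool] = None) -> MCPVoicesResponse:
    # Mirrors the getVoices handler: failures come back in `error`, never raised.
    try:
        return await client.get_voices(isSys)
    except Exception as e:
        return MCPVoicesResponse(error=str(e))


resp = asyncio.run(get_voices_tool(FakeDhClient(fail=True)))
print(resp.error)  # prints "upstream request failed"
```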
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It describes the tool's function and parameter usage through examples, but lacks critical behavioral details such as whether this is a read-only operation, potential rate limits, authentication requirements, or what the output format looks like. The examples add some context but don't fully compensate for the missing annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
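One way to close this gap is to declare MCP tool annotations next to the description. The dictionary below sketches a `tools/list` entry for getVoices using the `title`, `readOnlyHint`, `idempotentHint`, and `openWorldHint` annotation fields; the values are assumptions inferred from the implementation (an HTTP GET to an external Baidu API), since the server currently declares no annotations, and `GET_VOICES_TOOL` is a hypothetical name.

```python
from typing import Any, Dict

# Hypothetical tools/list entry for getVoices. Annotation values are inferred
# from the implementation; the server itself does not declare them.
GET_VOICES_TOOL: Dict[str, Any] = {
    "name": "getVoices",
    "description": "List the voice IDs available for speech synthesis.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "isSys": {
                "type": ["boolean", "null"],
                "description": "true: system voices; false: cloned voices; omitted: all voices",
            }
        },
    },
    "annotations": {
        "title": "List available voices",
        "readOnlyHint": True,    # only reads data, changes nothing
        "idempotentHint": True,  # repeated calls have no additional effect
        "openWorldHint": True,   # calls an external Baidu service
    },
}
```

Rate limits and authentication requirements have no standard annotation, so those would still need a sentence in the description itself.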

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose and poorly structured, consisting of lengthy example scenarios with step-by-step reasoning rather than a clear, front-loaded explanation. It spends space on hypothetical user interactions instead of concisely stating the tool's purpose and usage. Not every sentence earns its place; the examples could be condensed or replaced with direct guidelines.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
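As an illustration of this point, the two worked examples can be folded into a few direct rules. The string below is a hypothetical rewrite of the `description=` argument passed to `@mcp.tool`, keeping only what the examples actually convey; `GET_VOICES_DESCRIPTION` is an illustrative name.

```python
# Hypothetical condensed description for @mcp.tool(name="getVoices", description=...).
GET_VOICES_DESCRIPTION = (
    "List the voice (speaker) IDs available for speech synthesis. "
    "isSys=true returns system voices, isSys=false returns the user's cloned voices, "
    "and omitting isSys returns both. "
    "Use isSys=false when the user asks about voices they have cloned; omit it when "
    "the user does not say which kind they want, then recommend voices whose gender "
    "and description fit the request."
)
```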

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It explains parameter usage through examples but fails to describe the tool's behavior, output format, or any constraints. For a tool with one parameter and no structured output documentation, the description should provide more comprehensive context about what the tool returns and how to interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
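Although the tool does not advertise an output schema, the response shape can be read off the `MCPVoicesResponse` and `VoiceInfo` models in the implementation reference. A hand-written JSON Schema for it, which a server could publish as an MCP `outputSchema`, might look like the dict below. The `error` property is assumed from the handler's `MCPVoicesResponse(error=str(e))` call; any other `BaseDHResponse` fields are unknown and omitted. `missing_voice_fields` is a hypothetical stdlib check, not a full JSON Schema validator.

```python
from typing import Any, Dict, List, Tuple

VOICES_OUTPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "voices": {
            "type": ["array", "null"],  # None when no voices were found
            "items": {
                "type": "object",
                "properties": {
                    "perId": {"type": "string"},
                    "name": {"type": "string"},
                    "gender": {"type": ["string", "null"], "description": "Gender"},
                },
                "required": ["perId", "name"],
            },
        },
        "error": {"type": ["string", "null"]},  # assumed, from the handler's error path
    },
}


def missing_voice_fields(response: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (index, field) pairs for voices that lack a required field."""
    required = VOICES_OUTPUT_SCHEMA["properties"]["voices"]["items"]["required"]
    return [(i, f) for i, v in enumerate(response.get("voices") or [])
            for f in required if f not in v]
```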

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the parameter 'isSys' well-documented in the schema. The description adds value through examples that illustrate how to interpret and use the parameter in different scenarios (e.g., setting to 'false' for cloned voices, omitting for all voices), but doesn't provide additional semantic information beyond what's in the schema. Baseline 3 is appropriate given the high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states that the tool queries the available voice IDs, which is a clear purpose. However, it doesn't differentiate the tool from siblings like 'getFigures' or 'getVoiceCloneStatus' that might also retrieve voice-related data. The description focuses on examples rather than a concise statement of what the tool does.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance through two detailed examples, showing when to use the tool (to query voice IDs) and how to handle the parameter (isSystem). It implies usage for both cloned and system voices, but doesn't explicitly state when not to use it or compare to alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
