Baidu Digital Human MCP Server

Official
by baidu-xiling

generateLite2dGeneralVideo

Create digital human videos from uploaded recordings using general lip-sync animation for basic video production needs.

Instructions

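# Tool description (工具说明): Generates a digital human figure from an uploaded video recording of a real person. It is intended for basic video production only; the digital human is driven by general lip-sync.

Example 1:

User input: Use the video file with fileId xxx to generate a digital human named "zhangsan" with a male appearance.
Reasoning (思考过程):
1. The user wants to generate a digital human figure, so the "generateLite2dGeneralVideo" tool is needed.
2. The tool takes four parameters: name, gender, keepBackground, and templateVideoId.
3. The user gave the fileId xxx, so templateVideoId is xxx; name is zhangsan; a male appearance means gender is MALE; keeping the background was not mentioned, so keepBackground defaults to false.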
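The reasoning in the example maps directly onto a tool call. The following is a minimal sketch of the resulting request, assuming the standard MCP JSON-RPC `tools/call` envelope; the `id` value is arbitrary and `xxx` is the placeholder fileId from the example, not a real file.

```python
import json

# Arguments derived from the example; "xxx" is the placeholder fileId.
arguments = {
    "name": "zhangsan",
    "gender": "MALE",          # the schema accepts only "MALE" or "FEMALE"
    "templateVideoId": "xxx",
    "keepBackground": False,   # not mentioned by the user, so left at false
}

# MCP JSON-RPC envelope for invoking a tool; the id is an arbitrary request id.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "generateLite2dGeneralVideo", "arguments": arguments},
}

request_json = json.dumps(request, ensure_ascii=False, indent=2)
print(request_json)
```

maskVideoId is omitted here because the example never mentions a mask video.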

Input Schema

Name             Required  Description                                                     Default
name             No        Name
gender           No        Gender                                                          FEMALE
templateVideoId  No        Template (base) video file; the fileId returned by uploadFiles
keepBackground   No        Whether to keep the background
maskVideoId      No        Mask video corresponding to the template video
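To make the defaults and the allowed gender values concrete, here is a small sketch that mirrors the table with a standard-library dataclass. `Lite2dToolInput` and `ALLOWED_GENDERS` are hypothetical names used only for illustration; the server itself declares these parameters through type annotations. The table lists no default for keepBackground, so the sketch borrows `False` from the handler signature shown in the Implementation Reference.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_GENDERS = ("MALE", "FEMALE")  # values accepted by the gender parameter


@dataclass
class Lite2dToolInput:
    """Illustrative stand-in for the tool's input schema (not the server's class)."""
    name: Optional[str] = None
    gender: str = "FEMALE"
    templateVideoId: Optional[str] = None
    keepBackground: bool = False  # assumption: default taken from the handler signature
    maskVideoId: Optional[str] = None

    def __post_init__(self):
        # Reject anything outside the two documented values, including lowercase.
        if self.gender not in ALLOWED_GENDERS:
            raise ValueError(f"gender must be one of {ALLOWED_GENDERS}, got {self.gender!r}")


defaults = Lite2dToolInput()
example = Lite2dToolInput(name="zhangsan", gender="MALE", templateVideoId="xxx")
```

Note that every parameter is marked as not required, yet a call without templateVideoId gives the tool no video to work from, so callers will normally want to supply it.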

Implementation Reference

  • Main handler function that implements the core logic of the 'generateLite2dGeneralVideo' tool by constructing a Lite2dGenerateRequest and invoking the DH API client.
    async def generateLite2dGeneralVideo(
            name: Annotated[str, Field(description="名称", default=None)],
            gender: Annotated[Literal["MALE", "FEMALE"], Field(description="性别", default="FEMALE")],
            templateVideoId: Annotated[str, Field(description="视频:视频文件/底板视频,来自 uploadFiles 返回的fileId", default=None)],
            keepBackground: Annotated[bool, Field(description="是否保留背景", default=False)],
            maskVideoId: Annotated[str, Field(description="遮罩,底板视频对应的mask视频", default=None)]
    ) -> MCPLite2DGenerateResponse:
        """
        Generate a lite 2d general avatar video from template video file via the DH API.
    
        Args:
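            name (str): Name of the digital human
            gender (str): Gender, MALE or FEMALE, from getGender
            keepBackground (bool): Whether to keep the background
            templateVideoId (str): Template (base) video file; the fileId returned by uploadFiles
            maskVideoId (str): Optional mask video corresponding to the template video; the fileId returned by uploadFiles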
        """
        try:
            client = await getDhClient()
            req = Lite2dGenerateRequest(
                name=name,
                customizeType="LITE_2D_GENERAL",
                gender=gender,
                keepBackground=keepBackground,
                templateVideoId=templateVideoId,
                maskVideoId=maskVideoId if maskVideoId != "" else None,
            )
    
            ret = await client.generate_lite2d_video(req)
            return ret
        except Exception as e:
            return MCPLite2DGenerateResponse(error=str(e))
  • MCP tool registration decorator defining the tool name, description, and input parameters via type annotations.
    @mcp.tool(
        name="generateLite2dGeneralVideo",
        description=(
        """
    #工具说明:根据上传真人录制的视频生成数字人像,仅可用于基础视频制作,数字人使用通用口型驱动。
    # 样例1:
    用户输入:用fileid为xxx的视频文件,生成数字人,命名为“zhangsan”,是个男生的形象。
    思考过程:
    1.用户想要生成数字人像,需要使用“generateLite2dGeneralVideo”工具。
    2.工具需要参数,name,gender,keepBackground,templateVideoId四个参数。
    3.用户提到了fileID为xxx,所以templateVideoid的值为xxx,name为zhangsan,男生的形象,gender的值为male,未提到是否保留背景所以keepBackground默认为false。
        """)
    )
  • Pydantic model defining the input schema (Lite2dGenerateRequest) used in the tool request to the DH API.
    class Lite2dGenerateRequest(BaseModel):
        """Request for general lip-sync driven generation."""
        name: str
        customizeType: str
        gender: str
        keepBackground: bool = True
        templateVideoId: str
        lipVideoId: Optional[str] = None
        maskVideoId: Optional[str] = None
        callbackUrl: Optional[str] = None
  • Pydantic model for the MCP response schema (MCPLite2DGenerateResponse) returned by the tool.
    class MCPLite2DGenerateResponse(BaseDHResponse):
        """MCP response for a 2D few-shot digital human."""
        figureId: Optional[str] = None
  • DHApiClient helper method that performs the actual HTTP API call to generate the Lite 2D video.
    async def generate_lite2d_video(self, video_request: Lite2dGenerateRequest) -> MCPLite2DGenerateResponse:
        """Generate a lite 2d avatar video from template video file via the DH API."""
        async def api_call():
            return await self._make_request(
                "api/digitalhuman/open/v1/figure/lite2d/train", method="POST", data=video_request.model_dump()
            )
    
        ret = await self._handle_api_request(
            api_call=api_call,
            response_model_class=CommonDHResponse,
            mcp_response_class=MCPLite2DGenerateResponse,
            error_msg="No video generation data returned.",
            figureId=lambda d: d.get("figureId"),
        )
        return ret
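The listings above show how the handler turns tool arguments into the body posted to the `lite2d/train` endpoint. The sketch below reproduces only that payload-building step, using a standard-library dataclass as a stand-in for the Pydantic model; `Lite2dGenerateRequestSketch` and `build_payload` are hypothetical names, and `asdict` is used here in place of `model_dump()`.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Lite2dGenerateRequestSketch:
    """Stand-in for Lite2dGenerateRequest with the same field names and defaults."""
    name: str
    customizeType: str
    gender: str
    templateVideoId: str
    keepBackground: bool = True
    lipVideoId: Optional[str] = None
    maskVideoId: Optional[str] = None
    callbackUrl: Optional[str] = None


def build_payload(name, gender, template_video_id, keep_background=False, mask_video_id=None):
    """Mirror the handler: fixed customizeType, empty mask id normalized to None."""
    req = Lite2dGenerateRequestSketch(
        name=name,
        customizeType="LITE_2D_GENERAL",
        gender=gender,
        keepBackground=keep_background,
        templateVideoId=template_video_id,
        maskVideoId=mask_video_id if mask_video_id != "" else None,
    )
    return asdict(req)


payload = build_payload("zhangsan", "MALE", "xxx", mask_video_id="")
```

Two details are worth noticing. The handler defaults keepBackground to `False` while the request model defaults it to `True`; because the handler always passes the value explicitly, the handler's default is the one that takes effect. Fields the handler never sets (lipVideoId, callbackUrl) still appear in the dumped payload as `None`.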
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the tool is for '基础视频制作' (basic video production) and uses '通用口型驱动' (generic mouth shape driving), which adds some behavioral context. However, it lacks critical details: whether this is a read/write operation, permission requirements, rate limits, processing time, or what happens to the input video. For a tool that likely creates resources, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is poorly structured: it starts with a header '#工具说明:' followed by an example that dominates the text. The example includes unnecessary '思考过程' (thinking process) that doesn't belong in a tool description. While the core purpose is stated, the formatting wastes space and reduces clarity, making it less front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., a video ID, status object), error conditions, or prerequisites (e.g., needing uploadFiles first). For a 5-parameter tool that likely creates digital avatars, more context on behavior and results is needed to be fully helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
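Although the description is silent on return values, the Implementation Reference shows the shape of the result: the handler returns a response carrying a `figureId` on success, and it catches exceptions and returns them in an `error` field instead of raising. The sketch below illustrates how a caller might handle both outcomes. `ResponseSketch`, `call_tool_safely`, and `describe` are hypothetical names, and the two fake API calls stand in for the real HTTP request.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseSketch:
    """Stand-in for MCPLite2DGenerateResponse; only the two fields seen in the listing."""
    figureId: Optional[str] = None
    error: Optional[str] = None


async def call_tool_safely(api_call):
    """Mirror the handler's pattern: convert any exception into an error response."""
    try:
        return await api_call()
    except Exception as e:
        return ResponseSketch(error=str(e))


def describe(resp):
    """Turn a response into a short human-readable status."""
    if resp.error:
        return f"failed: {resp.error}"
    if resp.figureId is None:
        return "no figureId returned"
    return f"figure created: {resp.figureId}"


async def fake_success():
    return ResponseSketch(figureId="fig-123")  # made-up id for illustration


async def fake_failure():
    raise RuntimeError("upload not found")  # made-up error for illustration


ok_msg = describe(asyncio.run(call_tool_safely(fake_success)))
err_msg = describe(asyncio.run(call_tool_safely(fake_failure)))
```

Because failures arrive as ordinary responses, a caller has to inspect `error` explicitly; a missing exception does not mean the request succeeded.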

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters with descriptions. The description adds minimal value: it names parameters (name, gender, keepBackground, templateVideoId) in the example and implies templateVideoId comes from uploadFiles. However, it doesn't explain maskVideoId or provide deeper semantic context beyond what the schema offers, meeting the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: '根据上传真人录制的视频生成数字人像' (generate a digital avatar from uploaded real-person video). It specifies the resource (digital avatar) and verb (generate), and distinguishes it from siblings like generateText2Audio or generateVoiceCloneLite. However, it doesn't explicitly differentiate from generateDh123Video or generateDhVideo, which might have similar purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context: '仅可用于基础视频制作,数字人使用通用口型驱动' (only for basic video production, digital avatar uses generic mouth shape driving). This implies when to use it (basic video needs) but doesn't explicitly state when not to use it or name alternatives among siblings. The example helps illustrate usage but doesn't provide comprehensive guidelines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
