generateDhVideo
Create customized digital human videos from text or audio input, selecting the avatar figure, voice, resolution, background image, and camera position, with optional subtitles and automatically added motions.
Instructions
Tool description: generates a digital human video from the selected digital human figure ID and voice ID.
Example 1:
User input: "Using figure ID xxx and the voice with ID yyy, generate a 1080P digital human video. The narration content is '大家好,我是数字人播报的内容' ('Hello everyone, this is the digital human's narration'). Use the landscape full-body camera position and the background image 'https://digital-human-material.bj.bcebos.com/-%5BLjava.lang.String%3B%4046f6cc1e.png', with automatically added motions and subtitles enabled."

Reasoning:
1. The user wants to generate a digital human video from a figure ID and has requirements on voice, background, subtitles, and resolution, so this is not a plain digital human video; the "generateDhVideo" tool is needed.
2. The tool takes the parameters figureId, driveType, text, voiceId, inputAudioUrl, resolutionWidth, resolutionHeight, cameraId, subtitleEnable, backgroundImageUrl, and autoAnimoji.
3. figureId is the figure to use, so its value is xxx. The narration content is given as text, so driveType is the text drive type (TEXT) and text is "大家好,我是数字人播报的内容". The voice ID is already provided, so voiceId is yyy. Automatic motions are requested, so autoAnimoji is true; subtitles are requested, so subtitleEnable is true. The landscape full-body camera position corresponds to cameraId 2. A 1080P resolution splits into resolutionWidth 1920 and resolutionHeight 1080, and backgroundImageUrl is "https://digital-human-material.bj.bcebos.com/-%5BLjava.lang.String%3B%4046f6cc1e.png". The resulting argument set is sketched below.
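Putting the walkthrough together, Example 1 corresponds to roughly the following tool arguments (a sketch; xxx and yyy are placeholders for a real figure ID and voice ID from the account):

```python
# Tool arguments derived from Example 1. "xxx"/"yyy" are placeholders.
example_arguments = {
    "figureId": "xxx",
    "driveType": "TEXT",
    "text": "大家好,我是数字人播报的内容",  # "Hello everyone, this is the digital human's narration"
    "voiceId": "yyy",
    "resolutionWidth": 1920,   # 1080P
    "resolutionHeight": 1080,
    "cameraId": 2,             # landscape full-body
    "backgroundImageUrl": "https://digital-human-material.bj.bcebos.com/-%5BLjava.lang.String%3B%4046f6cc1e.png",
    "subtitleEnable": True,
    "autoAnimoji": True,
}
```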
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| autoAnimoji | No | Automatically add digital human motions | false |
| backgroundImageUrl | No | Background image URL | |
| backgroundTransparent | No | Whether the background is transparent | false |
| callbackUrl | No | Callback URL | |
| cameraId | No | Camera position: 0 = landscape half-body, 1 = portrait half-body, 2 = landscape full-body, 3 = portrait full-body | 3 |
| driveType | No | Drive type: TEXT = text-driven, VOICE = audio-driven | TEXT |
| figureId | No | Digital human figure ID | |
| inputAudioUrl | No | Driving audio URL (for the VOICE drive type) | |
| resolutionHeight | No | Resolution: height | 1280 |
| resolutionWidth | No | Resolution: width | 768 |
| subtitleEnable | No | Whether to enable subtitles | false |
| text | No | Narration content (for the TEXT drive type) | |
| voiceId | No | Voice (timbre) ID | |
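The defaults (768×1280 with cameraId 3) describe a portrait full-body video, while Example 1 uses 1920×1080 with the landscape full-body position. As a hypothetical client-side convenience (not part of this server), a label-to-resolution mapping might look like the sketch below; only the 1080P pair appears in the tool's own example, 720P follows the same 16:9 convention, and the portrait swap is inferred from the defaults rather than documented behavior:

```python
# Hypothetical helper: map a resolution label onto the
# resolutionWidth/resolutionHeight parameters the tool expects.
RESOLUTIONS = {
    "1080P": (1920, 1080),  # from the tool's Example 1
    "720P": (1280, 720),    # standard 16:9 convention (assumption)
}

PORTRAIT_CAMERA_IDS = {1, 3}  # portrait half-body, portrait full-body

def resolution_params(label: str, camera_id: int) -> dict:
    """Return width/height arguments, swapped for portrait camera positions."""
    width, height = RESOLUTIONS[label]
    if camera_id in PORTRAIT_CAMERA_IDS:
        width, height = height, width  # inferred from the 768x1280 defaults
    return {"resolutionWidth": width, "resolutionHeight": height}

# Example: resolution_params("1080P", camera_id=2)
# -> {"resolutionWidth": 1920, "resolutionHeight": 1080}
```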
Implementation Reference
- src/mcp_server_baidu_digitalhuman/dhserver.py:239-257 (registration): registers the generateDhVideo tool via the @mcp.tool decorator, with the tool name and a detailed usage description containing a worked example.

```python
@mcp.tool(
    name="generateDhVideo",
    description=(
        """
        # Tool description: generates a digital human video from the selected
        # digital human figure ID and voice ID.
        # Example 1:
        User input: using figure ID xxx and the voice with ID yyy, generate a
        1080P digital human video whose narration is "大家好,我是数字人播报的内容",
        with the landscape full-body camera position, the background image
        "https://digital-human-material.bj.bcebos.com/-%5BLjava.lang.String%3B%4046f6cc1e.png",
        automatically added motions enabled, and subtitles enabled.
        Reasoning:
        1. The user wants a digital human video from a figure ID, with
           requirements on voice, background, subtitles, and resolution, so
           the "generateDhVideo" tool is needed.
        2. The tool takes figureId, driveType, text, voiceId, inputAudioUrl,
           resolutionWidth, resolutionHeight, cameraId, subtitleEnable,
           backgroundImageUrl, and autoAnimoji.
        3. figureId is xxx; the narration is given as text, so driveType is
           TEXT and text is "大家好,我是数字人播报的内容"; voiceId is yyy;
           autoAnimoji and subtitleEnable are true; cameraId is 2 (landscape
           full-body); 1080P splits into resolutionWidth 1920 and
           resolutionHeight 1080; backgroundImageUrl is the given URL.
        """
    )
)
```
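For context, here is a minimal sketch of invoking the registered tool from a client over stdio with the MCP Python SDK. The launch command (`python -m mcp_server_baidu_digitalhuman`) is an assumption about how the server is started; adjust it to the actual entry point.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Assumed launch command; adapt to the server's real entry point.
    server = StdioServerParameters(
        command="python", args=["-m", "mcp_server_baidu_digitalhuman"]
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "xxx"/"yyy" are placeholders for a real figure ID and voice ID.
            result = await session.call_tool(
                "generateDhVideo",
                arguments={
                    "figureId": "xxx",
                    "voiceId": "yyy",
                    "text": "大家好,我是数字人播报的内容",
                },
            )
            print(result)  # the response is expected to carry the taskId

asyncio.run(main())
```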
- The core handler that implements the generateDhVideo tool logic. It declares its inputs as Pydantic Annotated fields, constructs a VideoGenerateRequest from the imported request types, calls the DHApiClient's generate_avatar_video method, and returns an MCPVideoGenerateResponse.

```python
# Excerpt from dhserver.py; relies on the module's imports (Annotated,
# Field, Literal, and the DH request/response types).
async def generateDhVideo(
    figureId: Annotated[str, Field(description="Digital human figure ID", default=None)],
    voiceId: Annotated[str, Field(description="Voice ID", default=None)],
    text: Annotated[str, Field(description="Narration content", default=None)],
    inputAudioUrl: Annotated[str, Field(description="Driving audio URL", default=None)],
    resolutionWidth: Annotated[int, Field(description="Resolution: width", default=768)],
    resolutionHeight: Annotated[int, Field(description="Resolution: height", default=1280)],
    backgroundTransparent: Annotated[bool, Field(description="Whether the background is transparent", default=False)],
    cameraId: Annotated[int, Field(description="Camera position: 0 = landscape half-body, 1 = portrait half-body, 2 = landscape full-body, 3 = portrait full-body", default=3)],
    backgroundImageUrl: Annotated[str, Field(description="Background image", default=None)],
    callbackUrl: Annotated[str, Field(description="Callback URL", default=None)],
    driveType: Annotated[Literal["TEXT", "VOICE"], Field(description="Drive type: TEXT = text-driven, VOICE = audio-driven", default="TEXT")],
    subtitleEnable: Annotated[bool, Field(description="Whether to enable subtitles", default=False)],
    autoAnimoji: Annotated[bool, Field(description="Automatically add digital human motions", default=False)],
) -> MCPVideoGenerateResponse:
    """
    Generate a new digital human video using the DH API.

    Args:
        figureId: digital human figure ID
        driveType: drive type, TEXT = text-driven, VOICE = audio-driven
        text: narration text content
        voiceId: voice (timbre) ID
        inputAudioUrl: driving audio URL
        resolutionWidth: resolution width
        resolutionHeight: resolution height
        backgroundTransparent: transparent background
        cameraId: 0 = landscape half-body, 1 = portrait half-body, 2 = landscape full-body, 3 = portrait full-body
        subtitleEnable: subtitles
        backgroundImageUrl: background image
        autoAnimoji: automatically add digital human motions
        callbackUrl: callback URL

    Returns:
        taskId: task ID
    """
    try:
        request = VideoGenerateRequest(
            figureId=figureId,
            driveType=driveType,
            text=text,
            ttsParams=TtsParams(person=str(voiceId), speed="5", volume="5", pitch="5"),
            inputAudioUrl=inputAudioUrl,
            videoParams=VideoParams(width=resolutionWidth, height=resolutionHeight, transparent=backgroundTransparent),
            dhParams=DHParams(cameraId=cameraId),
            subtitleParams=SubtitleParams(subtitlePolicy="SRT", enabled=True) if subtitleEnable else None,
            backgroundImageUrl=backgroundImageUrl,
            callbackUrl=callbackUrl,
            autoAnimoji=autoAnimoji,
        )
        client = await getDhClient()
        ret = await client.generate_avatar_video(request)
        return ret
    except Exception as e:
        return MCPVideoGenerateResponse(error=str(e))
```
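Note that the handler forwards text, ttsParams, and inputAudioUrl regardless of driveType, leaving consistency checks to the API. A hypothetical client-side pre-check (not present in the source) could catch mismatches early:

```python
# Hypothetical pre-flight check (not in the server source): verify that the
# input matching driveType was actually supplied before calling the tool.
def check_drive_inputs(drive_type: str, text: str | None, input_audio_url: str | None) -> None:
    if drive_type == "TEXT" and not text:
        raise ValueError("driveType TEXT requires `text` (the narration content)")
    if drive_type == "VOICE" and not input_audio_url:
        raise ValueError("driveType VOICE requires `inputAudioUrl` (the driving audio)")
```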
- Input schema: defined by the Pydantic Annotated parameters in the handler signature shown above, which carry each parameter's description and default (mirrored in the Input Schema table earlier on this page).