create_lipsync

Synchronize mouth movements in videos with audio using text-to-speech or custom audio upload. Works with human faces in real, 3D, or 2D videos to create lip-sync content.

Instructions

Create a lip-sync video by synchronizing mouth movements with audio. Supports both text-to-speech (TTS) with a choice of voices and custom audio upload. The original video must contain a clear, steady human face with a visible mouth. Works with real, 3D, or 2D human characters (not animals). Video length is limited to 10 seconds.

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| video_url | Yes | URL of the video to apply lip-sync to (must contain a clear human face) | |
| audio_url | No | URL of a custom audio file (mp3, wav, flac, ogg; max 20MB, 60s). If provided, TTS parameters are ignored | |
| tts_text | No | Text for text-to-speech synthesis (used only if audio_url is not provided) | |
| tts_voice | No | Voice style for TTS. Includes Chinese and English voice options | male-warm |
| tts_speed | No | Speech speed for TTS (0.5-2.0) | 1.0 |
| model_name | No | Model version to use | kling-v2-master |
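As a sketch of the two invocation modes described by the schema, the tool arguments look like the objects below (the URLs are placeholders, not real assets):

```typescript
// TTS mode: omit audio_url and supply tts_text (plus optional voice/speed).
const ttsArgs = {
  video_url: 'https://example.com/talking-head.mp4', // must show a clear, steady human face
  tts_text: 'Hello, welcome to our product demo.',
  tts_voice: 'female-professional',
  tts_speed: 1.0,
  model_name: 'kling-v2-master',
};

// Audio mode: supply audio_url; any TTS parameters are ignored.
const audioArgs = {
  video_url: 'https://example.com/talking-head.mp4',
  audio_url: 'https://example.com/narration.mp3', // mp3/wav/flac/ogg, max 20MB, 60s
};
```

Exactly one of `tts_text` or `audio_url` needs to be present; only `video_url` is required by the schema itself, and the handler enforces the rest at call time.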

Implementation Reference

  • Core handler function that implements the lip-sync tool logic by calling the Kling AI lip-sync API endpoint, processing video URLs, handling audio upload vs TTS modes, and returning the task ID.
```typescript
async createLipsync(request: LipsyncRequest): Promise<{ task_id: string }> {
  const path = '/v1/videos/lip-sync';

  // Process video URL
  const video_url = await this.processImageUrl(request.video_url);

  const input: any = {
    video_url: video_url!,
  };

  if (request.audio_url) {
    input.mode = 'audio2video';
    input.audio_type = 'url';
    input.audio_url = request.audio_url;
  } else if (request.tts_text) {
    input.mode = 'text2video';
    input.text = request.tts_text;
    input.voice_id = request.tts_voice || 'male-magnetic';
    input.voice_language = 'en';
    input.voice_speed = request.tts_speed || 1.0;
  } else {
    throw new Error('Either audio_url or tts_text must be provided');
  }

  const body = { input };

  try {
    const response = await this.axiosInstance.post(path, body);
    return response.data.data;
  } catch (error) {
    if (axios.isAxiosError(error)) {
      throw new Error(`Kling API error: ${error.response?.data?.message || error.message}`);
    }
    throw error;
  }
}
```
  • MCP protocol handler in the CallToolRequestSchema that parses arguments, validates inputs, calls klingClient.createLipsync, and formats the response with task ID.
```typescript
case 'create_lipsync': {
  const lipsyncRequest = {
    video_url: args.video_url as string,
    audio_url: args.audio_url as string | undefined,
    tts_text: args.tts_text as string | undefined,
    tts_voice: args.tts_voice as string | undefined,
    tts_speed: (args.tts_speed as number) ?? 1.0,
    model_name:
      (args.model_name as 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master' | undefined) ||
      'kling-v2-master',
  };

  // Validate that either audio_url or tts_text is provided
  if (!lipsyncRequest.audio_url && !lipsyncRequest.tts_text) {
    throw new Error('Either audio_url or tts_text must be provided for lip-sync');
  }

  const result = await klingClient.createLipsync(lipsyncRequest);

  return {
    content: [
      {
        type: 'text',
        text: `Lip-sync video creation started successfully!\nTask ID: ${result.task_id}\n\nThe video will be processed with ${lipsyncRequest.audio_url ? 'custom audio' : 'text-to-speech'}.\nUse the check_video_status tool with this task ID to check the progress.`,
      },
    ],
  };
}
```
  • Tool schema definition including name, description, and detailed inputSchema with properties, enums, and validation for the create_lipsync tool.
```typescript
{
  name: 'create_lipsync',
  description:
    'Create a lip-sync video by synchronizing mouth movements with audio. Supports both text-to-speech (TTS) with various voice options or custom audio upload. The original video must contain a clear, steady human face with visible mouth. Works with real, 3D, or 2D human characters (not animals). Video length limited to 10 seconds.',
  inputSchema: {
    type: 'object',
    properties: {
      video_url: {
        type: 'string',
        description: 'URL of the video to apply lip-sync to (must contain clear human face)',
      },
      audio_url: {
        type: 'string',
        description: 'URL of custom audio file (mp3, wav, flac, ogg; max 20MB, 60s). If provided, TTS parameters are ignored',
      },
      tts_text: {
        type: 'string',
        description: 'Text for text-to-speech synthesis (used only if audio_url is not provided)',
      },
      tts_voice: {
        type: 'string',
        enum: [
          'male-warm',
          'male-energetic',
          'female-gentle',
          'female-professional',
          'male-deep',
          'female-cheerful',
          'male-calm',
          'female-youthful',
        ],
        description: 'Voice style for TTS (default: male-warm). Includes Chinese and English voice options',
      },
      tts_speed: {
        type: 'number',
        description: 'Speech speed for TTS (0.5-2.0, default: 1.0)',
        minimum: 0.5,
        maximum: 2.0,
      },
      model_name: {
        type: 'string',
        enum: ['kling-v1', 'kling-v1.5', 'kling-v1.6', 'kling-v2-master'],
        description: 'Model version to use (default: kling-v2-master)',
      },
    },
    required: ['video_url'],
  },
},
```
  • TypeScript interface defining the LipsyncRequest parameters used by the createLipsync handler.
```typescript
export interface LipsyncRequest {
  video_url: string;
  audio_url?: string;
  tts_text?: string;
  tts_voice?: string;
  tts_speed?: number;
  model_name?: 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master';
}
```
  • src/index.ts:467-469 (registration)
    Registration of all tools including create_lipsync via the ListToolsRequestSchema handler that returns the TOOLS array containing the create_lipsync tool definition.
```typescript
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: TOOLS,
}));
```
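The validation rules spread across the handler and schema above can be summarized in one small standalone function. This is a hypothetical helper for illustration, not part of the server; the `LipsyncRequest` interface is repeated to keep the sketch self-contained:

```typescript
interface LipsyncRequest {
  video_url: string;
  audio_url?: string;
  tts_text?: string;
  tts_voice?: string;
  tts_speed?: number;
  model_name?: 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master';
}

// Hypothetical helper: collects every schema/handler rule violation for a request.
function validateLipsyncRequest(req: LipsyncRequest): string[] {
  const errors: string[] = [];
  if (!req.video_url) {
    errors.push('video_url is required');
  }
  // The handler rejects requests that carry neither an audio source nor TTS text.
  if (!req.audio_url && !req.tts_text) {
    errors.push('Either audio_url or tts_text must be provided');
  }
  // The schema bounds tts_speed to the 0.5-2.0 range.
  if (req.tts_speed !== undefined && (req.tts_speed < 0.5 || req.tts_speed > 2.0)) {
    errors.push('tts_speed must be between 0.5 and 2.0');
  }
  return errors;
}
```

Running such a check client-side before calling the tool avoids a round trip to the server for requests the handler would reject anyway.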
