
create_lipsync

Synchronize mouth movements in videos with audio using text-to-speech or custom audio upload. Works with human faces in real, 3D, or 2D videos to create lip-sync content.

Instructions

Create a lip-sync video by synchronizing mouth movements with audio. Supports either text-to-speech (TTS) with a choice of voice styles or a custom audio upload. The source video must contain a clear, steady human face with a visible mouth. Works with real, 3D, or 2D human characters (not animals). Video length is limited to 10 seconds.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| video_url | Yes | URL of the video to apply lip-sync to (must contain a clear human face) | |
| audio_url | No | URL of a custom audio file (mp3, wav, flac, ogg; max 20MB, 60s). If provided, TTS parameters are ignored | |
| tts_text | No | Text for text-to-speech synthesis (used only if audio_url is not provided) | |
| tts_voice | No | Voice style for TTS; includes Chinese and English voice options | male-warm |
| tts_speed | No | Speech speed for TTS (0.5-2.0) | 1.0 |
| model_name | No | Model version to use | kling-v2-master |
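For reference, a hypothetical tool call might pass arguments like the following; all URLs and values here are illustrative, not real endpoints:

```typescript
// Illustrative create_lipsync arguments using TTS mode.
const ttsArgs = {
  video_url: 'https://example.com/clips/speaker.mp4', // required
  tts_text: 'Welcome to the product tour.',
  tts_voice: 'female-professional', // one of the documented enum values
  tts_speed: 1.2,                   // must be within 0.5-2.0
  model_name: 'kling-v2-master',
};

// Alternatively, supply custom audio; TTS fields are then ignored.
const audioArgs = {
  video_url: 'https://example.com/clips/speaker.mp4',
  audio_url: 'https://example.com/audio/narration.mp3', // mp3/wav/flac/ogg, max 20MB, 60s
};
```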

Implementation Reference

  • Core handler function that implements the lip-sync tool logic by calling the Kling AI lip-sync API endpoint, processing video URLs, handling audio upload vs TTS modes, and returning the task ID.
    async createLipsync(request: LipsyncRequest): Promise<{ task_id: string }> {
      const path = '/v1/videos/lip-sync';
      
      // Process video URL
      const video_url = await this.processImageUrl(request.video_url);
      
      const input: any = {
        video_url: video_url!,
      };
    
      if (request.audio_url) {
        input.mode = 'audio2video';
        input.audio_type = 'url';
        input.audio_url = request.audio_url;
      } else if (request.tts_text) {
        input.mode = 'text2video';
        input.text = request.tts_text;
        input.voice_id = request.tts_voice || 'male-warm'; // documented default; 'male-magnetic' is not in the tts_voice enum
        input.voice_language = 'en';
        input.voice_speed = request.tts_speed || 1.0;
      } else {
        throw new Error('Either audio_url or tts_text must be provided');
      }
    
      const body = { input };
    
      try {
        const response = await this.axiosInstance.post(path, body);
        return response.data.data;
      } catch (error) {
        if (axios.isAxiosError(error)) {
          throw new Error(`Kling API error: ${error.response?.data?.message || error.message}`);
        }
        throw error;
      }
    }
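The branching above (custom audio takes precedence over TTS) can be sketched as a standalone helper. This is a simplified reconstruction for illustration, not an export of the package:

```typescript
interface LipsyncInputArgs {
  video_url: string;
  audio_url?: string;
  tts_text?: string;
  tts_voice?: string;
  tts_speed?: number;
}

// Mirrors the handler's mode selection: audio_url wins over tts_text,
// and at least one of the two must be present.
function buildLipsyncInput(args: LipsyncInputArgs): Record<string, unknown> {
  const input: Record<string, unknown> = { video_url: args.video_url };
  if (args.audio_url) {
    input.mode = 'audio2video';
    input.audio_type = 'url';
    input.audio_url = args.audio_url;
  } else if (args.tts_text) {
    input.mode = 'text2video';
    input.text = args.tts_text;
    input.voice_id = args.tts_voice ?? 'male-warm'; // documented default
    input.voice_language = 'en';
    input.voice_speed = args.tts_speed ?? 1.0;
  } else {
    throw new Error('Either audio_url or tts_text must be provided');
  }
  return input;
}
```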
  • MCP protocol handler (the create_lipsync case inside the CallToolRequestSchema handler) that parses arguments, validates inputs, calls klingClient.createLipsync, and formats the response with the task ID.
    case 'create_lipsync': {
      const lipsyncRequest = {
        video_url: args.video_url as string,
        audio_url: args.audio_url as string | undefined,
        tts_text: args.tts_text as string | undefined,
        tts_voice: args.tts_voice as string | undefined,
        tts_speed: (args.tts_speed as number) ?? 1.0,
        model_name: (args.model_name as 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master' | undefined) || 'kling-v2-master',
      };
    
      // Validate that either audio_url or tts_text is provided
      if (!lipsyncRequest.audio_url && !lipsyncRequest.tts_text) {
        throw new Error('Either audio_url or tts_text must be provided for lip-sync');
      }
    
      const result = await klingClient.createLipsync(lipsyncRequest);
      
      return {
        content: [
          {
            type: 'text',
            text: `Lip-sync video creation started successfully!\nTask ID: ${result.task_id}\n\nThe video will be processed with ${lipsyncRequest.audio_url ? 'custom audio' : 'text-to-speech'}.\nUse the check_video_status tool with this task ID to check the progress.`,
          },
        ],
      };
    }
  • Tool schema definition including name, description, and detailed inputSchema with properties, enums, and validation for the create_lipsync tool.
    {
      name: 'create_lipsync',
      description: 'Create a lip-sync video by synchronizing mouth movements with audio. Supports either text-to-speech (TTS) with a choice of voice styles or a custom audio upload. The source video must contain a clear, steady human face with a visible mouth. Works with real, 3D, or 2D human characters (not animals). Video length is limited to 10 seconds.',
      inputSchema: {
        type: 'object',
        properties: {
          video_url: {
            type: 'string',
            description: 'URL of the video to apply lip-sync to (must contain clear human face)',
          },
          audio_url: {
            type: 'string',
            description: 'URL of custom audio file (mp3, wav, flac, ogg; max 20MB, 60s). If provided, TTS parameters are ignored',
          },
          tts_text: {
            type: 'string',
            description: 'Text for text-to-speech synthesis (used only if audio_url is not provided)',
          },
          tts_voice: {
            type: 'string',
            enum: ['male-warm', 'male-energetic', 'female-gentle', 'female-professional', 'male-deep', 'female-cheerful', 'male-calm', 'female-youthful'],
            description: 'Voice style for TTS (default: male-warm). Includes Chinese and English voice options',
          },
          tts_speed: {
            type: 'number',
            description: 'Speech speed for TTS (0.5-2.0, default: 1.0)',
            minimum: 0.5,
            maximum: 2.0,
          },
          model_name: {
            type: 'string',
            enum: ['kling-v1', 'kling-v1.5', 'kling-v1.6', 'kling-v2-master'],
            description: 'Model version to use (default: kling-v2-master)',
          },
        },
        required: ['video_url'],
      },
    },
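Clients may want to pre-validate arguments against these constraints before issuing the call. A minimal sketch; the helper name and return shape are ours, not part of the server:

```typescript
const TTS_VOICES = [
  'male-warm', 'male-energetic', 'female-gentle', 'female-professional',
  'male-deep', 'female-cheerful', 'male-calm', 'female-youthful',
] as const;

interface RawArgs {
  video_url?: unknown;
  tts_voice?: unknown;
  tts_speed?: unknown;
}

// Returns a list of human-readable problems; an empty list means the
// arguments satisfy the schema constraints checked here.
function validateLipsyncArgs(args: RawArgs): string[] {
  const errors: string[] = [];
  if (typeof args.video_url !== 'string' || args.video_url.length === 0) {
    errors.push('video_url is required and must be a non-empty string');
  }
  if (args.tts_voice !== undefined &&
      !TTS_VOICES.includes(args.tts_voice as typeof TTS_VOICES[number])) {
    errors.push(`tts_voice must be one of: ${TTS_VOICES.join(', ')}`);
  }
  if (args.tts_speed !== undefined &&
      (typeof args.tts_speed !== 'number' ||
       args.tts_speed < 0.5 || args.tts_speed > 2.0)) {
    errors.push('tts_speed must be a number between 0.5 and 2.0');
  }
  return errors;
}
```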
  • TypeScript interface defining the LipsyncRequest parameters used by the createLipsync handler.
    export interface LipsyncRequest {
      video_url: string;
      audio_url?: string;
      tts_text?: string;
      tts_voice?: string;
      tts_speed?: number;
      model_name?: 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master';
    }
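Because audio_url and tts_text are individually optional yet one of them is required, a caller-side discriminated union can encode that invariant in the type system. This is a suggestion layered on top of the interface above, not part of the package:

```typescript
// Caller-side types that make the "either audio or TTS" rule unrepresentable
// as an invalid state; names here are ours, for illustration only.
type AudioLipsync = { video_url: string; audio_url: string };
type TtsLipsync = {
  video_url: string;
  tts_text: string;
  tts_voice?: string;
  tts_speed?: number;
};
type SafeLipsyncRequest = (AudioLipsync | TtsLipsync) & {
  model_name?: 'kling-v1' | 'kling-v1.5' | 'kling-v1.6' | 'kling-v2-master';
};

// Converting to the wire shape is then total: no runtime either/or check needed.
function toRequest(r: SafeLipsyncRequest): Record<string, unknown> {
  return { ...r };
}
```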
  • src/index.ts:467-469 (registration)
    Registration of all tools including create_lipsync via the ListToolsRequestSchema handler that returns the TOOLS array containing the create_lipsync tool definition.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: TOOLS,
    }));
