parse_transcript
Convert a raw, speaker-labeled plain-text transcript into word-level timestamps, with an optional time offset and an optional total video duration to improve accuracy.
Instructions
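Parse a raw, speaker-labeled plain-text transcript into word-level timestamps. Input format: 'Speaker (MM:SS)\ntext...\n\nSpeaker2 (MM:SS)\ntext...'. Each speaker turn is a header line with the speaker name and the start time in parentheses, followed by the turn's text; turns are separated by a blank line. The Python backend generates the word timings.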
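
To make the input format concrete, here is a minimal sketch that splits such a transcript into speaker turns. It only illustrates the expected shape of `raw_text`; it is not the tool's backend. The names `Turn`, `HEADER_RE`, and `split_turns` are hypothetical, and the sample transcript is made up.

```python
import re
from typing import NamedTuple

# Header line such as "Alice (01:23)": speaker name, then MM:SS in parentheses.
HEADER_RE = re.compile(r"^(?P<speaker>.+?)\s*\((?P<min>\d+):(?P<sec>\d{2})\)\s*$")


class Turn(NamedTuple):
    speaker: str
    start_seconds: int
    text: str


def split_turns(raw_text: str) -> list[Turn]:
    """Split a speaker-labeled transcript into (speaker, start, text) turns."""
    turns = []
    # Turns are separated by a blank line.
    for block in re.split(r"\n\s*\n", raw_text.strip()):
        header, _, body = block.partition("\n")
        match = HEADER_RE.match(header.strip())
        if match is None:
            continue  # skip blocks that do not start with a recognizable header
        start = int(match["min"]) * 60 + int(match["sec"])
        # Collapse line breaks inside the turn into single spaces.
        turns.append(Turn(match["speaker"], start, " ".join(body.split())))
    return turns


# Illustrative input only.
sample = "Alice (00:00)\nHello there.\n\nBob (00:05)\nHi Alice, how are you?"
turns = split_turns(sample)
```

A transcript that does not follow this layout (for example, a missing blank line between turns) would be split differently by this sketch, so it can serve as a quick sanity check before passing text to the tool.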
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| raw_text | Yes | Raw speaker-labeled transcript text | |
| file_path | Yes | Path to the video file the transcript belongs to | |
| time_adjust | No | Offset in seconds to add to all timestamps | |
| total_duration | No | Total video duration in seconds (helps accuracy) | |
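
As an illustration of the schema above, the following sketch builds an argument set and checks it against the table before a call. The parameter names come from the table; the helper names, the file path, and the numeric values are hypothetical, and how the arguments are actually sent depends on your client. Whether the numeric parameters accept floats or only integers is not stated here, so the values below are an assumption.

```python
# Parameter names taken from the Input Schema table.
REQUIRED = ("raw_text", "file_path")
OPTIONAL = ("time_adjust", "total_duration")


def missing_required(arguments: dict) -> list[str]:
    """Return the required parameter names absent from `arguments`."""
    return [name for name in REQUIRED if name not in arguments]


def unknown_parameters(arguments: dict) -> list[str]:
    """Return argument names that do not appear in the schema table."""
    return [name for name in arguments if name not in REQUIRED + OPTIONAL]


# Hypothetical example values; none of them come from the tool's documentation.
arguments = {
    "raw_text": "Alice (00:00)\nHello there.\n\nBob (00:05)\nHi Alice.",
    "file_path": "/videos/interview.mp4",
    "time_adjust": 2,        # seconds added to every timestamp
    "total_duration": 12,    # total video length in seconds
}

problems = missing_required(arguments) + unknown_parameters(arguments)
```

Checking the arguments locally like this catches a missing `raw_text` or `file_path`, or a misspelled optional name, before the request reaches the tool.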