
short-form-editor-mcp

Local MCP server that turns a long-form video into short-form (TikTok/Reel) clips by reasoning over a word-timestamped transcript. The agent reads the transcript and designs the edit, including non-contiguous reordering (open on the hook, cut back to the start, build, then land the hook again). The server provides accurate speech-to-text (STT), a cheap text-space validation loop, a silence-aware renderer, and an STT-based QA gate.

v1 scope: dialog/audio cues only. No smart reframe, source aspect ratio preserved, no burned-in captions. (Those are explicit later phases.)

How it works (the loop)

  1. create_project(video_path) — probe + extract 16kHz mono audio.

  2. transcribe(project_id) — WhisperX (word timestamps + silence map). Writes transcript.txt / transcript.json.

  3. Read transcript.txt and design one or more EDLs (edit decision lists). An EDL is an ordered list of segments, each a word-index range; segments may be reordered/reused.

  4. validate_edl(project_id, edl_obj) — snaps cuts to silence, returns the reconstructed dialog in designed order + join warnings. No render. Iterate here cheaply.

  5. render(project_id, edl_obj) — ffmpeg cut + concat, one re-encode, frame-accurate.

  6. verify_clip(project_id, edl_id) — re-STT the render and diff vs the intended dialog.
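Step 6 compares what the render actually says against what the EDL was meant to say. The sketch below shows one way such a diff could work, using Python's standard difflib on normalized word lists. It is an illustration only: the function names normalize and dialog_match are hypothetical, and the server's real comparison and scoring may differ.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation so STT formatting differences are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def dialog_match(intended: str, heard: str) -> tuple[float, list[str]]:
    """Return (similarity ratio, list of word-level differences).

    Hypothetical helper, not the server's API.
    """
    a, b = normalize(intended), normalize(heard)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            problems.append(f"{tag}: expected {a[i1:i2]} got {b[j1:j2]}")
    return matcher.ratio(), problems


if __name__ == "__main__":
    ratio, problems = dialog_match(
        "Whoops, it deleted everything.", "whoops it deleted"
    )
    print(round(ratio, 2), problems)
```

A ratio of 1.0 with an empty problem list means the re-transcribed clip matches the intended dialog word for word; anything else points at a clipped or doubled word near a join.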

EDL shape:

{ "edl_id": "hook-v1", "title": "Whoops it deleted everything",
  "segments": [
    {"from_word": 880, "to_word": 905, "label": "hook"},
    {"from_word": 0,   "to_word": 120, "label": "setup"}
  ] }
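To make the text-space idea concrete, here is a minimal sketch of rebuilding the dialog in designed order from an EDL. It assumes each transcript word is a dict with a "word" key and that to_word is inclusive; both are assumptions for illustration, not the confirmed transcript.json schema. The function name reconstruct_dialog is hypothetical.

```python
def reconstruct_dialog(words: list[dict], edl: dict) -> list[str]:
    """Return one line of text per EDL segment, in the EDL's order.

    Assumes to_word is inclusive (an assumption, not confirmed by the source).
    """
    lines = []
    for seg in edl["segments"]:
        lo, hi = seg["from_word"], seg["to_word"]
        if not (0 <= lo <= hi < len(words)):
            raise ValueError(f"segment {seg} is outside the transcript")
        text = " ".join(w["word"] for w in words[lo:hi + 1])
        lines.append(f"[{seg.get('label', '?')}] {text}")
    return lines


if __name__ == "__main__":
    # Toy transcript; real projects would load word entries from transcript.json.
    words = [{"word": w} for w in "so today the agent deleted everything".split()]
    edl = {
        "edl_id": "demo",
        "segments": [
            {"from_word": 4, "to_word": 5, "label": "hook"},
            {"from_word": 0, "to_word": 3, "label": "setup"},
        ],
    }
    for line in reconstruct_dialog(words, edl):
        print(line)
```

Because segments are just index ranges, reordering or reusing a range costs nothing until render time, which is what makes iterating at the validate step cheap.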


v2: reframe, captions & polish (render-layer, all optional on the EDL)

Styling is configured on the EDL and applied by render: cleanup -> cut -> reframe -> captions/title + loudnorm -> multi-aspect.

{ "edl_id":"clip","title":"...","segments":[...],
  "cleanup":   {"remove_fillers": true, "max_pause": 1.0},
  "reframe":   {"mode":"track","aspect":"9:16","zoom":{"hook_punch":true}},
  "captions":  {"enabled": true, "preset":"karaoke-bold"},
  "title_card":{"text":"AI gave itself all the water","hold_s":3},
  "loudnorm":  true,
  "export_aspects": ["9:16","1:1"] }
  • reframe mode: track (YOLO11n subject-follow + One-Euro smoothing, falling back to center), center, pad (blurred bars), none. track needs a visible person in frame.

  • captions presets: karaoke-bold (Anton, word-by-word pop), lower-third, minimal-top.

  • cleanup: drops filler words + splits at pauses > max_pause.

  • The clean cut is always at renders/<edl_id>.mp4 (stable audio for verify_clip); styled deliverables at renders/<edl_id>__<aspect>.mp4.

  • New tools: suggest_clips, extract_thumbnail, list_caption_presets, list_reframe_modes.

  • New deps: ultralytics, opencv-python. Bundled font: assets/fonts/Anton-Regular.ttf (OFL).
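The cleanup option above (drop filler words, split at pauses longer than max_pause) can be sketched as a pass over word timings that emits the word-index ranges worth keeping. This is an illustrative sketch: the FILLERS set, the "word"/"start"/"end" keys, and the function name cleanup_ranges are assumptions, not the server's actual filler list or schema.

```python
FILLERS = {"um", "uh", "er", "ah"}  # hypothetical filler list


def cleanup_ranges(words, from_word, to_word, max_pause=1.0, remove_fillers=True):
    """Split an inclusive word range into sub-ranges without fillers or long pauses."""
    ranges = []
    start = prev = None
    for i in range(from_word, to_word + 1):
        token = words[i]["word"].strip(".,!?").lower()
        if remove_fillers and token in FILLERS:
            # A filler ends the current range; the next kept word starts a new one.
            if start is not None:
                ranges.append((start, prev))
                start = None
            continue
        # A pause longer than max_pause also ends the current range.
        if start is not None and words[i]["start"] - words[prev]["end"] > max_pause:
            ranges.append((start, prev))
            start = None
        if start is None:
            start = i
        prev = i
    if start is not None:
        ranges.append((start, prev))
    return ranges


if __name__ == "__main__":
    words = [
        {"word": "So", "start": 0.0, "end": 0.3},
        {"word": "um,", "start": 0.4, "end": 0.6},
        {"word": "the", "start": 0.7, "end": 0.9},
        {"word": "agent", "start": 1.0, "end": 1.4},
        {"word": "deleted", "start": 3.0, "end": 3.5},
        {"word": "everything.", "start": 3.6, "end": 4.2},
    ]
    print(cleanup_ranges(words, 0, 5))
```

Each returned range can be treated as its own segment, so the pause or filler between two ranges simply never reaches the cut list.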

Learn resources (read these first)

The server exposes MCP learn:// resources that bake in the workflow and the lessons: learn://overview, learn://workflow, learn://hooks, learn://cutting, learn://gotchas. An agent should read learn://overview then learn://workflow before driving the tools.

Setup

Requires: ffmpeg/ffprobe on PATH, an NVIDIA GPU (for the default WhisperX large-v3).

# 1. venv (Python 3.11)
py -3.11 -m venv E:\FlowdotPlatform\short-form-editor-mcp\.venv
$py = "E:\FlowdotPlatform\short-form-editor-mcp\.venv\Scripts\python.exe"

# 2. install: this package, CUDA torch, whisperx, (optional) openai
& $py -m pip install --upgrade pip
& $py -m pip install -e E:\FlowdotPlatform\short-form-editor-mcp
& $py -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
& $py -m pip install whisperx openai

The first transcribe call downloads the WhisperX model and the wav2vec2 alignment model. Gotcha: depending on the WhisperX/pyannote version, the VAD model may need a one-time Hugging Face token. Set HF_TOKEN in the server env if the first run asks for it. We do not use diarization.

Register in .mcp.json

"short-form-editor": {
  "command": "E:\\FlowdotPlatform\\short-form-editor-mcp\\.venv\\Scripts\\python.exe",
  "args": ["-m", "short_form_editor_mcp"],
  "env": {
    "STT_BACKEND": "whisperx",
    "WHISPERX_MODEL": "large-v3",
    "DEVICE": "cuda",
    "WORKSPACE_ROOT": "E:\\FlowdotPlatform\\short-form-editor-mcp\\workspaces",
    "OPENAI_API_KEY": ""
  }
}
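The env block is how the server receives its settings. As a rough sketch of how such variables could be read and checked, the snippet below loads them with the defaults listed under Config (env vars). The Settings class and load_settings function are hypothetical, and COMPUTE_TYPE is left out because its default depends on the device.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    stt_backend: str
    whisperx_model: str
    device: str
    min_silence: float
    snap_margin: float
    crossfade_ms: int
    workspace_root: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ). Hypothetical helper."""
    env = os.environ if env is None else env
    settings = Settings(
        stt_backend=env.get("STT_BACKEND", "whisperx"),
        whisperx_model=env.get("WHISPERX_MODEL", "large-v3"),
        device=env.get("DEVICE", "cuda"),
        min_silence=float(env.get("MIN_SILENCE", "0.15")),
        snap_margin=float(env.get("SNAP_MARGIN", "0.08")),
        crossfade_ms=int(env.get("CROSSFADE_MS", "15")),
        workspace_root=env.get("WORKSPACE_ROOT", "./workspaces"),
    )
    if settings.stt_backend not in ("whisperx", "openai"):
        raise ValueError(f"unknown STT_BACKEND: {settings.stt_backend}")
    # The API key is only required when the openai backend is selected.
    if settings.stt_backend == "openai" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required for the openai backend")
    return settings


if __name__ == "__main__":
    print(load_settings({}))
```

Failing fast on a missing key keeps a misconfigured openai backend from surfacing only at the first transcribe call.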

Config (env vars)

| var | default | meaning |
|---|---|---|
| STT_BACKEND | whisperx | whisperx (free, local) or openai (whisper-1; 25 MB file cap) |
| WHISPERX_MODEL | large-v3 | model size |
| DEVICE | cuda | cuda or cpu |
| COMPUTE_TYPE | float16/int8 | CTranslate2 compute type |
| MIN_SILENCE | 0.15 | min gap (s) that counts as a clean cut boundary |
| SNAP_MARGIN | 0.08 | how far (s) into the silence to place the cut (capped at gap/2) |
| CROSSFADE_MS | 15 | per-join audio fade to kill clicks (0 = off) |
| WORKSPACE_ROOT | ./workspaces | where project data is stored |
| OPENAI_API_KEY | | required only for the openai backend |
| HF_TOKEN | | only if WhisperX VAD asks for it |
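MIN_SILENCE and SNAP_MARGIN together decide where a cut lands inside the gap between two words. The sketch below shows one plausible reading of those two settings: the cut is pushed SNAP_MARGIN seconds into the silence, never past the midpoint of the gap, and the join is flagged when the gap is shorter than MIN_SILENCE. The function name snap_cut and the "in"/"out" convention are assumptions for illustration; the server's exact snapping rule may differ.

```python
def snap_cut(speech_end, speech_start, side, min_silence=0.15, snap_margin=0.08):
    """Place one cut inside the gap between two words.

    speech_end:   end time (s) of the word before the gap
    speech_start: start time (s) of the word after the gap
    side:         "in" if the kept segment starts after the gap,
                  "out" if the kept segment ends before it
    Returns (cut_time, is_clean). Hypothetical helper, not the server's API.
    """
    gap = max(speech_start - speech_end, 0.0)
    margin = min(snap_margin, gap / 2)  # capped at gap/2
    if side == "in":
        cut = speech_start - margin  # keep a little silence before the first word
    else:
        cut = speech_end + margin    # keep a little silence after the last word
    return round(cut, 3), gap >= min_silence


if __name__ == "__main__":
    print(snap_cut(10.00, 10.40, "out"))  # wide gap: full margin applies
    print(snap_cut(5.00, 5.10, "out"))    # tight gap: margin capped, join flagged
```

Capping the margin at half the gap means a cut-out and a cut-in that share the same silence can never overlap into each other's words.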
