Start with a single agent that writes a precise image description for a reference image. Feed the reference image + the agent’s description into OpenAI (vision + image generation). Generate image sets per emotion and per angle, respecting fixed specs (puppet mechanics, mouth/tongue, fur/clothing consistency). Auto-detect extras (glasses/hat etc.), merge into identifiers, then QC against the spec. Deliver for review, then loop fixes.
0) Constants you can lock now
Angles (v1): front, left, right, back, 3q-left, 3q-right.
Emotions (v1): neutral, happy, sad, angry, surprised, disgust, fear, smirk.
Mouth states: closed, open-small, open-wide, tongue-out, teeth-showing.
Lighting: soft, even, studio-style, no harsh shadows.
Background: plain light gray (or transparent if you’ll composite later).
Output size: 1024×1024 (change if you need).
Style lock: “same character proportions, color palette, and materials across all shots.”
1) End-to-end pipeline (single agent orchestration)
Ingest
Inputs: reference_image, optional existing_identifiers.json.
Agent creates a Detailed Caption (objective description with anatomy, colors, materials, clothing, accessories, scars, markings).
Vision pass auto-detects extras (e.g., hat, glasses) and oddities (asymmetries, damage).
Build/merge Identifiers
Merge detected features into the Identifiers object (see schema below).
Freeze “fixed” specs (anatomy, proportions, fur pattern, palette, logos).
Shot list expansion
Cartesian expansion: angles × emotions × mouth states (configure which combos you actually need).
Generate a Shot Spec for each output image.
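For illustration only, a minimal sketch of that Cartesian expansion with itertools.product; the full make_plan() helper in section 7 adds the lighting, background, and framing fields.
# Sketch of the shot-list expansion; angles/emotions/mouths here are example subsets.
from itertools import product

angles = ["front", "3q-left"]
emotions = ["neutral", "happy", "angry"]
mouths = ["closed"]

shot_specs = [
    {"angle": a, "emotion": e, "mouth_state": m}
    for a, e, m in product(angles, emotions, mouths)
]
print(len(shot_specs))  # 2 x 3 x 1 = 6 shots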
Prompting for generation
For each shot: feed the reference_image + identifiers + shot spec to the image model.
Enforce: locked palette, proportions, accessories; background; lighting; camera angle; emotion.
QC (automatic)
Vision pass compares output vs. Identifiers + Shot Spec.
Score per rule (see QC rules). Tag any drift (palette off, accessories missing, wrong mouth state, angle misaligned).
Triage
If all rules pass: mark “Ready for Review”.
If minor drift: one automatic correction attempt (tighten prompt; add “do not change” reminders).
If still failing: route to user review with a compact diff report.
Delivery
Save images with deterministic names.
Emit qc_report.json and identifiers.final.json.
Produce a short review sheet.
Corrections loop
User comments → regenerate only failed shots with updated constraints.
2) Data contracts (copy-paste JSON Schemas)
2.1 Character Identifiers (authoritative)
{
  "character_id": "string",
  "name": "string",
  "source_reference": "path-or-url",
  "anatomy": {
    "species_or_type": "e.g., puppet gremlin",
    "height_relative": "e.g., small",
    "proportions_notes": "e.g., large head, short limbs",
    "silhouette_keywords": ["rounded ears", "tapered snout"]
  },
  "colors_materials": {
    "primary_palette": ["#A1B2C3", "#334455"],
    "secondary_palette": ["#..."],
    "materials": ["felt", "faux fur", "plastic eyes", "stitched mouth"]
  },
  "surface_features": {
    "fur_pattern": "describe zones and direction",
    "scars_markings": "none or details",
    "eye_details": {"iris_color": "hex", "pupil_shape": "round"},
    "mouth_teeth_tongue": {"teeth": "flat white", "tongue": "pink #F29CB2"}
  },
  "costume_baseline": {
    "garment": "yellow raincoat",
    "footwear": "none",
    "logo_text": null
  },
  "accessories": ["round glasses"],
  "mechanics": {
    "mouth_states_allowed": ["closed", "open-small", "open-wide", "tongue-out", "teeth-showing"],
    "jaw_hinge_visibility": "hidden",
    "ear_flex": "none",
    "eye_gaze_rules": "camera unless specified"
  },
  "forbidden_changes": [
    "do not change eye color",
    "no new scars",
    "preserve fur pattern zones"
  ],
  "notes": "any extra lock-ins"
}
2.2 Shot Spec (one per image to generate)
{
  "character_id": "string",
  "angle": "front | left | right | back | 3q-left | 3q-right",
  "emotion": "neutral | happy | sad | angry | surprised | disgust | fear | smirk",
  "mouth_state": "closed | open-small | open-wide | tongue-out | teeth-showing",
  "lighting": "soft even studio",
  "background": "plain light gray",
  "framing": "waist-up | full-body | bust",
  "camera_height": "eye-level",
  "notes": "any per-shot nuance"
}
2.3 QC Report (aggregated)
{
  "batch_id": "string",
  "pass_rate": 0.0,
  "items": [
    {
      "filename": "string",
      "shot_spec": { "angle": "front", "emotion": "happy", "mouth_state": "open-small" },
      "scores": {
        "palette_lock": 0.97,
        "proportions_lock": 0.94,
        "accessories_present": 1.0,
        "angle_match": 0.92,
        "emotion_match": 0.88,
        "mouth_state_match": 0.95,
        "artifact_check": 0.90,
        "background_lock": 1.0
      },
      "status": "pass | auto-retry | fail",
      "notes": "what drifted",
      "retry_prompt_delta": "extra constraints if auto-retry"
    }
  ]
}
3) Identifier checklist (what to capture up front)
Anatomy & silhouette: head/torso ratio, limb lengths, ear shape, tail, overall silhouette cues.
Color/palette: hex values for primary/secondary; eye/tongue/teeth color; fur zones.
Materials: fur/felt/fabric types; reflectivity; stitching lines; plastic parts.
Surface features: markings, freckles, scars, seams; fur direction; wear/tear.
Face specifics: eye shape, eyelid line, lash presence; muzzle/snout form.
Mouth/tongue/teeth: allowed states, tongue shape/length, tooth style.
Clothing baseline: exact garments, fasteners, logos, patterns.
Accessories: glasses, hats, jewelry; whether removable or always-on.
Mechanics locks: what must never change (jaw hinge visibility, ear flexibility).
Camera/lighting locks: even lighting, neutral background, camera height.
Forbidden list: any “never change” items (colors, logos, scars).
Oddities detector: hats/glasses/new stickers; damage; symmetry issues.
4) QC rules (pass/fail thresholds)
Palette lock ≥ 0.95 cosine similarity in HSV/histogram space.
Proportions lock: measured landmarks within ±3% of baseline ratios.
Accessories present: binary presence, else fail.
Angle match: head yaw/pitch/roll within target ±10° (3D estimate).
Emotion match: classifier confidence ≥ 0.80 (or model self-report with rationale).
Mouth state: classifier detects state requested; if mismatch → fail.
Background lock: uniformity > 0.98; no props.
Artifacts: no double pupils, extra fingers, melting edges; if found → auto-retry once.
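A minimal sketch of the palette-lock rule, assuming Pillow and NumPy: it compares normalized HSV histograms of the reference and a generated image via cosine similarity. The other rules (proportions, angle) would need a landmark or pose estimator, which this PoC leaves to the vision model.
import numpy as np
from PIL import Image

def hsv_histogram(path, bins=32):
    # Flattened, L2-normalized HSV histogram of an image.
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=np.float32)
    hist, _ = np.histogramdd(hsv.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 255),) * 3)
    hist = hist.ravel()
    return hist / (np.linalg.norm(hist) + 1e-9)

def palette_lock_score(reference_path, generated_path):
    # Cosine similarity in [0, 1]; compare against the 0.95 threshold above.
    return float(np.dot(hsv_histogram(reference_path), hsv_histogram(generated_path)))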
5) File layout (simple and visual)
C:\tools\Character-Pipeline\
  01_input\
    reference\              (drop your reference image here)
    identifiers.seed.json   (optional)
  02_captions\
    character.caption.json
  03_specs\
    identifiers.final.json
    shots.plan.json
  04_generations\
    <character_id>\angle=<X>\emotion=<Y>\mouth=<Z>\*.png
  05_qc\
    qc_report.json
    diffs\*.png
  06_delivery\
    review_sheet.md
6) Proof-of-concept (tiny batch)
Character: 1
Angles: front, 3q-left
Emotions: neutral, happy, angry
Mouth: closed only
Total: 6 images
Success = QC pass rate ≥ 80% on first pass; no palette drift.
7) Copy-paste Python (baseline; uses OpenAI Vision + Image Gen)
This is a starter: it does not do pixel-level landmarking and relies on the model for description and soft QC. Treat it as a PoC, and verify model names and API parameters against the current docs before running.
import os, json, uuid, base64, pathlib
from datetime import datetime
from PIL import Image
from openai import OpenAI
BASE = r"C:\tools\Character-Pipeline"
os.makedirs(BASE, exist_ok=True)
client = OpenAI() # requires OPENAI_API_KEY in env
def save_json(data, path):
    pathlib.Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

def load_json(path, default=None):
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return default
def image_data_url(path):
    # Chat Completions cannot read local file:// paths; embed the image as a base64 data URL.
    ext = pathlib.Path(path).suffix.lstrip(".").lower() or "png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:image/{ext};base64,{b64}"

# -------- 1) Detailed Caption via Vision --------
def describe_reference(ref_path):
    prompt = (
        "You are a meticulous character describer. "
        "Describe the character in the image for reproduction with image generation. "
        "Capture anatomy/silhouette, exact color palette (return hex swatches), materials, "
        "fur/skin patterns, eye details, mouth/teeth/tongue, clothing, accessories, seams/stitching, "
        "forbidden changes, and any oddities (hat, glasses, damage). "
        "Respond with plain prose followed by a 'palette' list of hex values."
    )
    msg = [
        {"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_data_url(ref_path)}},
        ]}
    ]
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",  # fast vision-capable; adjust per pricing/perf
        messages=msg,
        temperature=0.2,
    )
    text = resp.choices[0].message.content
    return {"raw": text}
# -------- 2) Build Identifiers --------
def build_identifiers(caption_json, seed_identifiers=None):
    sys = (
        "Merge the caption with the optional seed identifiers into the final Identifiers JSON "
        "using the schema given below. Fill hex colors from the caption palette. Preserve any 'forbidden_changes'. "
        "Return ONLY the JSON object, no commentary."
    )
    schema = load_json(os.path.join(BASE, "schema_identifiers.json")) or {}
    messages = [
        {"role": "system", "content": sys + "\nSCHEMA:\n" + json.dumps(schema)},
        {"role": "user", "content": json.dumps({
            "caption": caption_json,
            "seed": seed_identifiers or {},
        })},
    ]
    resp = client.chat.completions.create(
        model="o3-mini",  # reasoning model; does not accept a temperature parameter
        messages=messages,
    )
    text = resp.choices[0].message.content.strip()
    # Strip optional markdown code fences before parsing.
    if text.startswith("```"):
        text = text.split("```")[1]
        if text.lower().startswith("json"):
            text = text[4:]
    return json.loads(text)
# -------- 3) Expand Shots --------
ANGLES = ["front","3q-left","3q-right","left","right","back"]
EMOTIONS = ["neutral","happy","sad","angry","surprised","disgust","fear","smirk"]
MOUTHS = ["closed","open-small","open-wide","tongue-out","teeth-showing"]
def make_plan(char_id, angles, emotions, mouths):
    shots = []
    for a in angles:
        for e in emotions:
            for m in mouths:
                shots.append({
                    "character_id": char_id,
                    "angle": a, "emotion": e, "mouth_state": m,
                    "lighting": "soft even studio",
                    "background": "plain light gray",
                    "framing": "bust",
                    "camera_height": "eye-level",
                    "notes": "lock palette and proportions; no background props",
                })
    return {"shots": shots}
# -------- 4) Generate One Shot --------
def generate_shot(ref_path, identifiers, shot, out_path):
    # Compose an instruction that locks the character
    guardrails = [
        "Preserve exact colors and materials.",
        "Do not change eye color, fur pattern zones, or garment.",
        "Maintain proportions and silhouette.",
        "Plain light-gray background only.",
    ]
    prompt = f"""
Generate a clean studio image of the SAME character.
Angle: {shot['angle']}. Emotion: {shot['emotion']}. Mouth: {shot['mouth_state']}.
Framing: {shot['framing']}. Camera height: {shot['camera_height']}.
Lighting: {shot['lighting']}. Background: {shot['background']}.
Guardrails: {', '.join(guardrails)}.
Character Identifiers (authoritative): {json.dumps(identifiers, ensure_ascii=False)}
"""
    # Image generation conditioned on the reference: images.edit takes the reference
    # file plus the prompt. Verify current model/parameter names in the docs.
    with open(ref_path, "rb") as ref_file:
        img = client.images.edit(
            model="gpt-image-1",
            image=ref_file,
            prompt=prompt,
            size="1024x1024",
        )
    b64 = img.data[0].b64_json
    pathlib.Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(b64))
# -------- 5) QC (model-aided) --------
def qc_image(image_path, identifiers, shot):
    q = {
        "task": "Evaluate if the image matches identifiers and shot spec. "
                "Score each category 0..1 and explain any drift briefly.",
        "identifiers": identifiers, "shot": shot,
    }
    msg = [{"role": "user", "content": [
        {"type": "text", "text": json.dumps(q)},
        {"type": "image_url", "image_url": {"url": image_data_url(image_path)}},
    ]}]
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=msg,
        temperature=0,
    )
    analysis = resp.choices[0].message.content
    # naive parse; in practice instruct strict JSON and json.loads
    return {"analysis": analysis}
def main():
    ref = os.path.join(BASE, "01_input", "reference", "ref.png")
    caption = describe_reference(ref)
    save_json(caption, os.path.join(BASE, "02_captions", "character.caption.json"))
    seed = load_json(os.path.join(BASE, "01_input", "identifiers.seed.json"), {})
    # save schema locally for the merger step (optional)
    schema_path = os.path.join(BASE, "schema_identifiers.json")
    if not os.path.exists(schema_path):
        save_json({
            "fields": "(this file is only to provide structure cues; optional in PoC)"
        }, schema_path)
    identifiers = build_identifiers(caption, seed)
    save_json(identifiers, os.path.join(BASE, "03_specs", "identifiers.final.json"))
    char_id = identifiers.get("character_id", "char-" + uuid.uuid4().hex[:8])
    plan = make_plan(char_id, angles=["front", "3q-left"], emotions=["neutral", "happy", "angry"], mouths=["closed"])
    save_json(plan, os.path.join(BASE, "03_specs", "shots.plan.json"))
    qc_items = []
    for shot in plan["shots"]:
        out = os.path.join(BASE, "04_generations", char_id,
                           f"angle={shot['angle']}", f"emotion={shot['emotion']}", f"mouth={shot['mouth_state']}",
                           f"{char_id}_{shot['angle']}_{shot['emotion']}_{shot['mouth_state']}.png")
        generate_shot(ref, identifiers, shot, out)
        qc = qc_image(out, identifiers, shot)
        qc_items.append({"filename": out,
                         "shot_spec": {"angle": shot["angle"], "emotion": shot["emotion"], "mouth_state": shot["mouth_state"]},
                         "scores": {}, "status": "review", "notes": qc["analysis"]})
    save_json({"batch_id": uuid.uuid4().hex, "pass_rate": 0.0, "items": qc_items},
              os.path.join(BASE, "05_qc", "qc_report.json"))
    delivery_dir = os.path.join(BASE, "06_delivery")
    os.makedirs(delivery_dir, exist_ok=True)
    with open(os.path.join(delivery_dir, "review_sheet.md"), "w", encoding="utf-8") as f:
        f.write("# Review\n\nSee qc_report.json and images.")

if __name__ == "__main__":
    main()
Notes:
Replace models as needed. Model names and features evolve; confirm the current names in the docs before running.
Image generation with text + image inputs is documented under OpenAI Images/vision guides.
8) n8n outline (if you prefer visual automation)
Trigger: “New reference image dropped in 01_input/reference”
Node 1 (OpenAI Chat): Vision caption → character.caption.json
Node 2 (OpenAI Chat): Merge to identifiers.final.json (system prompt enforces schema)
Node 3 (Function): Build shots.plan.json
Node 4 (Loop): For each shot → Image Generate (OpenAI Images)
Node 5 (OpenAI Chat Vision): QC each image; write qc_report.json
Node 6 (If): Failures? → Auto-retry once with tightened prompt; else continue
Node 7 (Write Binary Files): Save images; Node 8 (Markdown): review_sheet.md
Node 9 (Notify): Send delivery folder link to you (Telegram/Email)
9) Limitations & failure points (and how to handle)
Consistency drift (palette/proportions): Use stronger “do not change” guardrails; include the reference image on every call; generate in small batches; keep background neutral.
Angle accuracy: The model may approximate. Add explicit yaw/pitch descriptors (“head turned ~30° to camera-left”).
Emotion clarity: Add describers (“raised cheeks, crow’s feet for happy; brows down for angry”).
Mouth mechanics: Explicitly name the tongue/teeth state; reiterate per prompt.
Accessories vanishing: Re-assert “accessories present: X” early and late in prompt; fail if missing.
Model updates: Model names/features change; confirm current names in the docs before running.
10) Quick start checklist (10 minutes)
Create folders as in section 5.
Put your reference image at 01_input/reference/ref.png.
(Optional) Draft identifiers.seed.json with any known locks.
Install openai, Pillow.
Set OPENAI_API_KEY (you have a master .env—ensure it’s loaded).
Run the script. Inspect 02_captions/character.caption.json.
Review 03_specs/identifiers.final.json for sanity.
Check 04_generations images.
Open 05_qc/qc_report.json—skim the notes.
If happy, use 06_delivery/review_sheet.md to approve or request fixes.
If you want, I can adapt this to your exact toolchain (your C:\tools orchestration, character DB, and naming scheme) and produce an n8n export JSON so you can import it in one go.
You said:
Don't we want a multi-agent system so that the agents can work on several flows modularly?
ChatGPT said:
Short answer: yes. A multi-agent system is the right move—modular, parallel, and easier to scale and debug. Below is a lean, visual, step-by-step plan you can implement now without overhauling everything.
Overview (what changes)
One orchestrator assigns work.
Small, single-purpose workers do one job well.
A queue sits between every stage so tasks run in parallel and can retry safely.
Artifacts (images, JSON, QC reports) are saved to disk with stable names so agents don’t block each other.
Minimal agent roster (start here)
Orchestrator
Splits the job into shots (angle × emotion × mouth). Pushes tasks to queues. Tracks status.
Describer (Vision)
Input: reference image. Output: 02_captions/character.caption.json + detected accessories/oddities.
Identifier Builder
Merges caption + seed into 03_specs/identifiers.final.json (the “law”).
Shot Planner
Builds 03_specs/shots.plan.json from angles/emotions/mouth states.
Generator (N parallel workers)
Input: one shot spec + identifiers + reference. Output: image file.
QC Inspector (M parallel workers)
Input: generated image + identifiers + shot spec. Output: QC scores + pass/fail + retry hints.
Auto-Fixer
Tightens prompt/constraints for failed items and re-queues once.
Publisher
Packages images, qc_report.json, and a review sheet; notifies you.
Visual flow (text diagram)
[Orchestrator]
  ├─> Q_caption ----> [Describer] -------------┐
  │                                            │
  ├─> Q_ident   ----> [Identifier Builder] ----┼--> identifiers.final.json
  │                                            │
  ├─> Q_plan    ----> [Shot Planner] ----------┘--> shots.plan.json
  │
  └─> Q_gen (per shot) ---> [Generator x N] ----> images
                                 └─> Q_qc ----> [QC x M] ---> pass/fail
                                                   ├─ pass → Q_publish → [Publisher]
                                                   └─ fail → Q_autofix → [Auto-Fixer] → Q_gen
Why this suits ADHD + scale
Parallel: Generators and QC run at the same time.
Modular: Each agent is focused; easier to debug.
Observable: Queues show where work is stuck.
Fail-safe: Crashes don’t kill the whole run; failed items are isolated and retried.
Data contracts (unchanged, just split per agent)
Identifiers JSON (authoritative spec).
Shot Spec JSON (one per image task).
QC Report JSON (scored, pass/fail, retry reason).
Folder layout (shared, agent-safe)
C:\tools\Character-Pipeline\
  01_input\reference\ref.png
  02_captions\character.caption.json
  03_specs\identifiers.final.json
  03_specs\shots.plan.json
  04_generations\<character_id>\angle=...\emotion=...\mouth=...\*.png
  05_qc\qc_report.json
  06_delivery\review_sheet.md
  .runtime\queues\   (lightweight queue state if not using Redis/Rabbit)
  .runtime\logs\
Queueing options (pick one)
Fastest to set up: File-based queues (JSON lines in .runtime\queues\*). Works on Windows, zero infra (see the claim-by-rename sketch after this list).
Better for scale: Redis streams (n8n supports Redis; easy to monitor).
n8n-only: Use n8n’s built-in “Execute Workflow” with concurrency limits per node.
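A minimal sketch of the file-queue option, assuming one JSON file per task (a slightly simpler variant of JSON lines that makes claiming trivial): a worker claims a task by renaming it from pending\ into processing\, and os.replace loses cleanly if another worker got there first.
import json, os, pathlib

QUEUE_ROOT = pathlib.Path(r"C:\tools\Character-Pipeline\.runtime\queues\Q_gen")

def enqueue(task: dict) -> None:
    # One JSON file per task; the filename doubles as the task id.
    pending = QUEUE_ROOT / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    name = f"{task['character_id']}_{task['shot']['angle']}_{task['shot']['emotion']}_{task['shot']['mouth_state']}.json"
    (pending / name).write_text(json.dumps(task, indent=2), encoding="utf-8")

def claim_next():
    # Move the oldest pending task into processing\; whoever wins the rename owns it.
    pending = QUEUE_ROOT / "pending"
    processing = QUEUE_ROOT / "processing"
    processing.mkdir(parents=True, exist_ok=True)
    for item in sorted(pending.glob("*.json")):
        claimed = processing / item.name
        try:
            os.replace(item, claimed)            # atomic on the same volume
        except (FileNotFoundError, PermissionError):
            continue                             # another worker got it first
        return claimed, json.loads(claimed.read_text(encoding="utf-8"))
    return None, None
The task dict shape matches the Q_gen item schema further below.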
n8n blueprint (concise)
Workflow A – Prep
Trigger: “file added” to 01_input\reference\
Node: Describer (OpenAI Vision) → write 02_captions\...
Node: Identifier Builder (OpenAI Chat) → write 03_specs\identifiers.final.json
Node: Shot Planner (Function) → write 03_specs\shots.plan.json
Node: For-Each (shots) → push shot specs to Q_gen (could be a simple Google Sheet row, Redis list, or local file queue)
Workflow B – Generator Worker (concurrency: 3–6)
Trigger: “new item in Q_gen”
Node: Generate Image (OpenAI Images/your model) → save to 04_generations\...
Node: Push to Q_qc
Workflow C – QC Worker (concurrency: 2–4)
Trigger: “new item in Q_qc”
Node: QC (OpenAI Vision/Text) → append results to 05_qc\qc_report.json
IF pass → Q_publish; IF fail and retry_count < 1 → Q_autofix; else → Q_publish (marked fail)
Workflow D – Auto-Fixer
Tighten prompts with explicit deltas (“re-enforce glasses, keep iris color #xxxxxx”)
Requeue to Q_gen with retry_count=1
Workflow E – Publisher
Gather passed images; collate qc_report.json; render review_sheet.md
Notify via Telegram/Email with folder link
Config file (copy-paste YAML)
character_pipeline:
  angles: [front, 3q-left, 3q-right, left, right, back]
  emotions: [neutral, happy, sad, angry, surprised, disgust, fear, smirk]
  mouth_states: [closed, open-small, open-wide, tongue-out, teeth-showing]
  lighting: "soft even studio"
  background: "plain light gray"
  framing: "bust"
  camera_height: "eye-level"
  generator:
    concurrency: 4
    model: "gpt-image-1"   # replace if you switch vendors
    size: "1024x1024"
  qc:
    concurrency: 3
    thresholds:
      palette_lock: 0.95
      proportions_lock: 0.97
      emotion_match: 0.80
      angle_tolerance_deg: 10
    max_retries: 1
  storage_root: "C:\\tools\\Character-Pipeline"
  queues:
    type: "files"   # files|redis
    path: ".runtime\\queues"
Message schemas between agents
Q_gen item
{
  "character_id": "char-xxxx",
  "shot": {"angle":"3q-left","emotion":"happy","mouth_state":"closed"},
  "identifiers_path": "03_specs/identifiers.final.json",
  "reference_path": "01_input/reference/ref.png",
  "retry_count": 0
}
Q_qc item
{
  "image_path": "04_generations/char-xxxx/angle=3q-left/emotion=happy/mouth=closed/char-xxxx_3q-left_happy_closed.png",
  "shot": {...},
  "identifiers_path": "03_specs/identifiers.final.json"
}
Rollout plan (low risk)
PoC multi-agent: Keep your existing single-script logic but split it into 3 workers: Generator x2 + QC x1, using file queues.
Add Orchestrator: Build shots.plan.json and push items to Q_gen; watch concurrency.
Add Auto-Fixer: Only when you see common failures (accessories missing, mouth mismatch).
Swap queues to Redis if/when file queues feel limiting.
Failure points to expect (and fixes)
Race conditions writing qc_report.json: Append via temp file + atomic rename, or store one report per image and merge at the end (see the sketch after this list).
Duplicate work: Workers must claim a task by moving the JSON from pending to processing folder (or use Redis XCLAIM).
Prompt drift under parallel load: Keep reference image + identifiers in every generation call; include “do not change” list twice (top and end).
Accessory dropouts: Add explicit presence checks in QC; Auto-Fixer adds: “glasses MUST be present and circular, black rim.”
Angle ambiguity: Encode head yaw/pitch in degrees in the shot spec (“target yaw +30°, pitch 0°, roll 0°”). Teach QC to estimate roughly.
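A minimal sketch of the "one report per image, merge at the end" approach from the first bullet above, using a temp file plus os.replace so a reader never sees a half-written qc_report.json.
import json, os, pathlib

QC_DIR = pathlib.Path(r"C:\tools\Character-Pipeline\05_qc")

def write_item_report(image_path: str, result: dict) -> None:
    # Each QC worker writes its own small file; no shared file, no lock needed.
    items_dir = QC_DIR / "items"
    items_dir.mkdir(parents=True, exist_ok=True)
    out = items_dir / (pathlib.Path(image_path).stem + ".qc.json")
    out.write_text(json.dumps(result, indent=2), encoding="utf-8")

def merge_reports(batch_id: str) -> None:
    # Publisher step: collect per-image reports, then swap the merged file in atomically.
    items = [json.loads(p.read_text(encoding="utf-8"))
             for p in sorted((QC_DIR / "items").glob("*.qc.json"))]
    passed = sum(1 for i in items if i.get("status") == "pass")
    report = {"batch_id": batch_id,
              "pass_rate": passed / len(items) if items else 0.0,
              "items": items}
    tmp = QC_DIR / "qc_report.json.tmp"
    tmp.write_text(json.dumps(report, indent=2), encoding="utf-8")
    os.replace(tmp, QC_DIR / "qc_report.json")   # atomic rename on the same volume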
Quick start (today)
Keep your earlier single-agent script for Describer/Identifiers/Planner.
Add two small Python scripts:
worker_generate.py → polls Q_gen, saves image, enqueues to Q_qc (sketched below).
worker_qc.py → polls Q_qc, writes per-image QC JSON, enqueues pass/fail.
Start 3 consoles: worker_generate.py (run twice), worker_qc.py (once). You’ve got a multi-agent system.
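A minimal sketch of worker_generate.py under stated assumptions: the filequeue and pipeline_core module names are placeholders for the file-queue helpers sketched in the queueing section and for generate_shot/load_json from the single-agent script; adjust them to your layout.
# worker_generate.py -- polls Q_gen, generates one image per task, hands it to Q_qc.
import time, pathlib

from filequeue import claim_next, enqueue_qc            # hypothetical module names
from pipeline_core import generate_shot, load_json      # hypothetical module names

BASE = pathlib.Path(r"C:\tools\Character-Pipeline")

# Defaults for Shot Spec fields that the compact Q_gen item does not carry.
SHOT_DEFAULTS = {"lighting": "soft even studio", "background": "plain light gray",
                 "framing": "bust", "camera_height": "eye-level",
                 "notes": "lock palette and proportions; no background props"}

def run_forever(poll_seconds=2.0):
    while True:
        claimed_path, task = claim_next()
        if task is None:
            time.sleep(poll_seconds)                     # queue empty; back off briefly
            continue
        shot = {**SHOT_DEFAULTS, **task["shot"], "character_id": task["character_id"]}
        identifiers = load_json(str(BASE / task["identifiers_path"]))
        out = (BASE / "04_generations" / task["character_id"]
               / f"angle={shot['angle']}" / f"emotion={shot['emotion']}" / f"mouth={shot['mouth_state']}"
               / f"{task['character_id']}_{shot['angle']}_{shot['emotion']}_{shot['mouth_state']}.png")
        generate_shot(str(BASE / task["reference_path"]), identifiers, shot, str(out))
        enqueue_qc({"image_path": str(out), "shot": shot,
                    "identifiers_path": task["identifiers_path"]})
        claimed_path.unlink()                            # done; drop the claimed task file

if __name__ == "__main__":
    run_forever()
worker_qc.py follows the same polling pattern, calling qc_image and write_item_report instead.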