mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-21 05:11:26 +00:00

skill(comfyui): add template-integrity reference from @purzbeats (#25828)

Adds references/template-integrity.md covering safe conversion of the
official comfyui-workflow-templates package from editor format to API
format — Reroute bypass via link tracing, dotted dynamic-input keys
(values.a, resize_type.width) that must NOT be flattened, server-error
"patch don't rebuild" loop, Cloud quirks (302 redirect to signed GCS
URL, free-tier 1 concurrent job, 1920x1080 OOM on RTX 5090), and a
Discord-compatible ffmpeg stitch recipe (yuv420p + xfade/acrossfade).

SKILL.md lists the new reference so the agent loads it when starting
from an official template. purzbeats added to author list and to
scripts/release.py AUTHOR_MAP.

Co-authored-by: purzbeats <97489706+purzbeats@users.noreply.github.com>

2026-05-14 09:34:10 -07:00


ComfyUI Workflow-Template Integrity

Authored by @purzbeats — adapted from purzbeats/hermes-agent-comfyui-helper. Use this reference when converting workflows from the official comfyui-workflow-templates package (editor format) into API format for submission via /api/prompt. The conversion has subtle gotchas that cause hard-to-diagnose validation errors if you don't follow these rules.

Background

The official ComfyUI template package (comfyui-workflow-templates, currently v0.9.69) is installed inside the ComfyUI venv at a path like:

<comfy-install>/.venv/lib/python3.*/site-packages/comfyui_workflow_templates_*/templates/

The exact path depends on how ComfyUI was installed (comfy-cli default, Comfy Desktop, manual venv, etc.). Find it once with:

comfy --workspace <ws> run-python -c "import comfyui_workflow_templates, pathlib; print(pathlib.Path(comfyui_workflow_templates.__file__).parent / 'templates')"

Templates ship in editor format — nodes / links arrays inside data['definitions']['subgraphs'][0]. They must be converted to API format (a node_id -> {class_type, inputs} mapping) before submission.

RULE #1: Use templates AS CLOSE TO ORIGINAL AS POSSIBLE

Never strip, simplify, or "minimize" nodes from a template.
Full template architecture (dual-pass pipelines, LoRA chains, distilled sigmas, conditioning paths) is intentional — removing any part breaks quality.
If an image-dependent path exists but the task is text-to-video, leave it wired with the bypass toggle enabled — don't remove the nodes.
Only change: prompt text, seed, and dimensions (when explicitly requested).
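One way to check that an edited workflow still respects this rule is to diff it against the untouched conversion. The sketch below is hypothetical: the helper name `changed_inputs` and the sample node IDs and class types are illustrative, not taken from a real template. It lists every input whose value differs, so anything other than prompt text, seed, or requested dimensions stands out.

```python
import copy

_MISSING = object()  # sentinel so a removed key is distinguishable from a None value


def changed_inputs(original, modified):
    """Return sorted (node_id, input_name) pairs that differ between two API-format workflows."""
    changes = []
    for node_id in sorted(set(original) | set(modified)):
        before = original.get(node_id, {}).get("inputs", {})
        after = modified.get(node_id, {}).get("inputs", {})
        for key in sorted(set(before) | set(after)):
            if before.get(key, _MISSING) != after.get(key, _MISSING):
                changes.append((node_id, key))
    return changes


# Illustrative two-node workflow (IDs and class types are placeholders).
original = {
    "10": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat", "clip": ["11", 0]}},
    "20": {"class_type": "KSampler", "inputs": {"seed": 1, "cfg": 4.0}},
}
modified = copy.deepcopy(original)
modified["10"]["inputs"]["text"] = "a dog"
modified["20"]["inputs"]["seed"] = 42

diff = changed_inputs(original, modified)
print(diff)  # [('10', 'text'), ('20', 'seed')]
```

A removed node also shows up here, because all of its inputs are reported as changed.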

RULE #2: Server validation errors are the source of truth

When a workflow submission fails, the server response looks like:

{
  "node_errors": {
    "238": {
      "errors": [{
        "message": "Required input is missing",
        "details": "width",
        "extra_info": { "input_name": "resize_type.width" }
      }]
    }
  }
}

The extra_info.input_name field tells you EXACTLY what JSON key the server wants. Use it literally. If it says "values.a" or "resize_type.width", those are the actual key names in the JSON object. Do not "simplify" them to flat names based on assumptions about what the field "should" be called.
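As a minimal sketch, the exact key names can be pulled out of a response shaped like the one above. The function name `missing_input_names` is hypothetical, and the code assumes only the fields shown in that sample response.

```python
def missing_input_names(response):
    """Collect (node_id, input_name, message) for each error that names an input."""
    found = []
    for node_id, info in response.get("node_errors", {}).items():
        for err in info.get("errors", []):
            name = err.get("extra_info", {}).get("input_name")
            if name is not None:
                found.append((node_id, name, err.get("message", "")))
    return found


# Same shape as the sample server response above.
response = {
    "node_errors": {
        "238": {
            "errors": [{
                "message": "Required input is missing",
                "details": "width",
                "extra_info": {"input_name": "resize_type.width"},
            }]
        }
    }
}

wanted = missing_input_names(response)
print(wanted)  # [('238', 'resize_type.width', 'Required input is missing')]
```

Note that the result carries `resize_type.width` from `extra_info.input_name`, not the shorter `width` from `details`.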

RULE #3: Don't rebuild from scratch — patch the failing nodes

Every regeneration from the template reintroduces the same bugs. Instead:

1. Submit the workflow once.
2. Read the server error details for exact key names.
3. Use targeted patch/fix calls against the workflow file on disk.
4. Resubmit and check if errors resolved.
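A targeted patch can be as small as the following sketch, which edits one input of one node in the API-format file and leaves everything else untouched. The helper `patch_input`, the file name, and the pairing of node 238 with ResizeImageMaskNode are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def patch_input(path, node_id, key, value):
    """Set one input on one node of an API-format workflow file, in place."""
    path = Path(path)
    workflow = json.loads(path.read_text(encoding="utf-8"))
    workflow[node_id]["inputs"][key] = value  # key is used literally, dots included
    path.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    return workflow


# Demo against a throwaway file; the node contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    wf_path = Path(tmp) / "workflow_api.json"
    start = {"238": {"class_type": "ResizeImageMaskNode",
                     "inputs": {"resize_type": "scale dimensions"}}}
    wf_path.write_text(json.dumps(start), encoding="utf-8")

    patch_input(wf_path, "238", "resize_type.width", 1920)
    patched = json.loads(wf_path.read_text(encoding="utf-8"))

print(patched["238"]["inputs"])
```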

Reroute nodes: bypass, don't delete

Most servers (local, Cloud) don't have a Reroute node type. When converting a template:

1. Find what feeds into the Reroute by looking at links where target_id = the Reroute node ID.
2. Replace all inputs referencing the Reroute with [source_node_id, source_slot].
3. Delete the Reroute node from the API mapping.

Real example — LTX 2.3 t2v template:

Reroute node 255 receives VAE from CheckpointLoaderSimple 236 slot 2.
Three nodes reference Reroute 255 for their VAE input: LTXVImgToVideoInplace (230), LTXVLatentUpsampler (253), VAEDecodeTiled (251).
Fix: replace all occurrences of vae: ["255", 0] with vae: ["236", 2].
CheckpointLoaderSimple slot 2 = VAE (not slot 0 = MODEL).


❌ Wrong	`vae: ["236", 0]` → `received_type(MODEL) mismatch input_type(VAE)`
✅ Correct	`vae: ["236", 2]`
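The same fix can be applied mechanically to an already-converted API mapping. This is a sketch under assumptions: `bypass_reroute` is a hypothetical helper, the checkpoint file name is a placeholder, and the consumer nodes are reduced to their `vae` input.

```python
def bypass_reroute(workflow, reroute_id, source):
    """Point every [reroute_id, slot] reference at `source` and drop the Reroute node."""
    patched = {}
    for node_id, node in workflow.items():
        if node_id == reroute_id:
            continue  # the Reroute itself is not submitted
        inputs = {}
        for key, value in node["inputs"].items():
            is_ref = (isinstance(value, list) and len(value) == 2
                      and str(value[0]) == reroute_id)
            inputs[key] = list(source) if is_ref else value
        patched[node_id] = {**node, "inputs": inputs}
    return patched


# Reduced version of the LTX 2.3 t2v example (other inputs omitted).
workflow = {
    "236": {"class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": "placeholder.safetensors"}},
    "255": {"class_type": "Reroute", "inputs": {"": ["236", 2]}},
    "230": {"class_type": "LTXVImgToVideoInplace", "inputs": {"vae": ["255", 0]}},
    "253": {"class_type": "LTXVLatentUpsampler", "inputs": {"vae": ["255", 0]}},
    "251": {"class_type": "VAEDecodeTiled", "inputs": {"vae": ["255", 0]}},
}

fixed = bypass_reroute(workflow, "255", ["236", 2])
print(sorted(fixed))
```

The function returns a new mapping rather than mutating the input, so the unpatched conversion stays available for comparison.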

Dynamic template nodes: dotted key names are correct

ComfyMathExpression (COMFY_AUTOGROW_V3)

{
  "class_type": "ComfyMathExpression",
  "inputs": {
    "expression": "a/2",
    "values.a": ["257", 0]
  }
}

values is a COMFY_AUTOGROW_V3 template.
Input names in links are values.a, values.b, etc.
Keep the dotted format as JSON keys.
Do NOT convert to {"values": {"a": ...}} or flatten to just "a".

ResizeImageMaskNode (COMFY_DYNAMICCOMBO_V3)

{
  "class_type": "ResizeImageMaskNode",
  "inputs": {
    "input": ["276", 0],
    "scale_method": "lanczos",
    "resize_type": "scale dimensions",
    "resize_type.width": 1920,
    "resize_type.height": 1088,
    "resize_type.crop": "center"
  }
}

resize_type is a COMFY_DYNAMICCOMBO_V3.
Mode-specific fields: resize_type.width, resize_type.height, resize_type.crop.
scale_method options: "nearest-exact", "bilinear", "area", "bicubic", "lanczos".
Keep the dotted format as JSON keys.
Do NOT flatten resize_type.width to just "width".
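To make the key shape concrete, here is a small sketch that builds both input dicts with the dotted names kept as plain string keys. The helper `dotted` is hypothetical; the values are the ones from the two examples above.

```python
def dotted(prefix, fields):
    """Prefix each field name with `prefix.` to form flat, dotted JSON keys."""
    return {f"{prefix}.{name}": value for name, value in fields.items()}


# ResizeImageMaskNode: the combo selection plus its mode-specific fields.
resize_inputs = {
    "input": ["276", 0],
    "scale_method": "lanczos",
    "resize_type": "scale dimensions",
    **dotted("resize_type", {"width": 1920, "height": 1088, "crop": "center"}),
}

# ComfyMathExpression: one autogrow value named "a".
math_inputs = {"expression": "a/2", **dotted("values", {"a": ["257", 0]})}

print(sorted(resize_inputs))
print(sorted(math_inputs))
```

Both results are single-level dicts: there is no nested `values` object and no bare `width` key.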

Conversion recipe

1. Load the template from the installed package path.
2. Parse the subgraph: sg = data['definitions']['subgraphs'][0].
3. For each node (skip Reroute):
   - Resolve linked inputs from sg['links'].
   - Map widgets_values to input field names.
   - Keep all dotted key names as-is from the template.
4. Bypass Reroute: trace the source, then replace references.
5. Change only: prompt text, seed values, and user-requested parameters.
6. Add a SaveVideo terminal node if the template uses only CreateVideo.
7. Submit → read errors → patch specific nodes → resubmit.
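The link-resolution and Reroute-tracing steps of the recipe might look like the sketch below. It rests on assumptions about the editor format that this reference does not spell out: each node carries `id`, `type`, and an `inputs` list of `{name, link}` entries, and each link is an object with `id`, `origin_id`, and `origin_slot`. It does not map `widgets_values`, and it does not handle links that cross the subgraph boundary.

```python
def convert_links(subgraph):
    """Build an API-style mapping from linked inputs only, tracing through Reroute nodes."""
    links = {link["id"]: link for link in subgraph["links"]}
    nodes = {node["id"]: node for node in subgraph["nodes"]}

    def source_of(link_id):
        link = links[link_id]
        # Follow the chain upstream while the origin is a Reroute.
        while nodes.get(link["origin_id"], {}).get("type") == "Reroute":
            link = links[nodes[link["origin_id"]]["inputs"][0]["link"]]
        return [str(link["origin_id"]), link["origin_slot"]]

    api = {}
    for node in subgraph["nodes"]:
        if node["type"] == "Reroute":
            continue
        inputs = {}
        for slot in node.get("inputs", []):
            if slot.get("link") is not None:
                inputs[slot["name"]] = source_of(slot["link"])  # name kept verbatim
        api[str(node["id"])] = {"class_type": node["type"], "inputs": inputs}
    return api


# Tiny hand-written subgraph (shape assumed, node 300 is a placeholder ID).
sg = {
    "nodes": [
        {"id": 236, "type": "CheckpointLoaderSimple", "inputs": []},
        {"id": 255, "type": "Reroute", "inputs": [{"name": "", "link": 1}]},
        {"id": 251, "type": "VAEDecodeTiled", "inputs": [{"name": "vae", "link": 2}]},
        {"id": 257, "type": "PrimitiveInt", "inputs": []},
        {"id": 300, "type": "ComfyMathExpression",
         "inputs": [{"name": "values.a", "link": 3}]},
    ],
    "links": [
        {"id": 1, "origin_id": 236, "origin_slot": 2, "target_id": 255, "target_slot": 0},
        {"id": 2, "origin_id": 255, "origin_slot": 0, "target_id": 251, "target_slot": 0},
        {"id": 3, "origin_id": 257, "origin_slot": 0, "target_id": 300, "target_slot": 0},
    ],
}

api = convert_links(sg)
print(api["251"], api["300"])
```

Because the slot name is copied verbatim, a dotted name such as `values.a` survives the conversion unchanged.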

What to NEVER change in a template

Element	Why
Node topology	Graph is designed for the specific model
Sigmas values	Tuned for the model/sampler combination
LoRA/distilled paths	Required for quality, even if they look unused
Model parameters (cfg, steps, shifts)	Model-specific
Conditioning chains (zero-out, crop guides)	Required for correct conditioning
Pass-through wiring	Don't remove nodes, bypass them

Cloud compatibility (verified May 2025)

The full LTX 2.3 T2V template (video_ltx2_3_t2v.json) runs without modification on Comfy Cloud.

Confirmed working on Cloud (all custom nodes available): ComfyMathExpression, ResizeImageMaskNode, ResizeImagesByLongerEdge, PrimitiveInt, PrimitiveStringMultiline, PrimitiveBoolean, SaveVideo, LTXVCropGuides, LTXVImgToVideoInplace, LTXVConcatAVLatent, LTXVSeparateAVLatent, LTXVLatentUpsampler, LTXVAudioVAELoader, LTXVAudioVAEDecode, LTXVEmptyLatentAudio, LTXVPreprocess, LTXVConditioning, ManualSigmas, LTXAVTextEncoderLoader, plus all core nodes.

Cloud vs Local for LTX 2.3 (768x512):

Cloud: ~39s per video (4x faster).
Local (RTX 5090): ~160s per video.
example.png placeholder works on Cloud for bypassed image-dependent paths.
Submission format is identical between local and Cloud: {"prompt": wf, "extra_data": {}} to /api/prompt.
Free tier = 1 concurrent job.
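The shared submission body can be sketched with the standard library alone. The helper `build_prompt_request` is hypothetical, the default base URL assumes a local ComfyUI on its usual port 8188, and the one-node workflow is a placeholder. The request is only constructed here, not sent, and any authentication header that Cloud requires is deliberately left out.

```python
import json
import urllib.request


def build_prompt_request(workflow, base_url="http://127.0.0.1:8188"):
    """Wrap an API-format workflow in the {"prompt": ..., "extra_data": {}} envelope."""
    body = json.dumps({"prompt": workflow, "extra_data": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow; a real one comes from the converted template.
wf = {"1": {"class_type": "PrimitiveInt", "inputs": {"value": 1}}}
request = build_prompt_request(wf)
print(request.get_method(), request.full_url)
```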

Cloud submission pitfalls:

/api/object_info/<node> returns 404 on free tier — can't query node schemas remotely, but the workflow runs fine anyway. Always probe object_info locally before building workflows.
Cloud is ~4x faster — prefer Cloud for batch runs unless local is needed for debugging.
Cloud /api/view returns 302 redirect to signed GCS URL — use curl -s -L to follow and download. Python urllib fails with 401 (forwards auth headers to GCS CDN).
COMFY_CLOUD_API_KEY is only in the terminal/bash env, not in the Python sandbox. Use subprocess or terminal scripts for Cloud API calls.
Cloud free tier processes jobs sequentially (1 at a time). Submit all, then poll history.
LTX 2.3 at 1920x1080 OOMs locally (even RTX 5090) — upscaler pass exceeds VRAM. Prefer Cloud for 1080p; use 1280x720 locally (~90s/video).
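Since the key lives only in the terminal environment and `curl -s -L` is the download path that works, one option is to build the curl command in Python and run it through subprocess. The sketch below only builds the argument list. Everything specific in it is an assumption: the helper name, the base URL, the `filename` and `type` query parameters, and above all the header, which this reference does not specify and which is shown as an obviously fake placeholder.

```python
from urllib.parse import urlencode


def view_download_cmd(base_url, filename, out_path, headers):
    """Build a curl command that follows the /api/view redirect and saves the file."""
    query = urlencode({"filename": filename, "type": "output"})
    cmd = ["curl", "-s", "-L"]  # -L follows the 302 to the signed URL
    for header in headers:
        cmd += ["-H", header]
    cmd += ["-o", out_path, f"{base_url}/api/view?{query}"]
    return cmd


cmd = view_download_cmd(
    "https://cloud.example.invalid",       # placeholder host
    "clip 01.mp4",
    "clip_01.mp4",
    ["X-Example-Auth: PLACEHOLDER_KEY"],   # hypothetical header, not the real one
)
print(cmd[-1])

# From a terminal-side script where COMFY_CLOUD_API_KEY is set:
# import subprocess; subprocess.run(cmd, check=True)
```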

FFmpeg stitch settings (Discord-compatible)

Generated ComfyUI videos often use yuv444p pixel format which does NOT work on Discord. Re-encode with:

ffmpeg -y -i input.mp4 \
  -c:v libx264 -profile:v main -preset medium -crf 13 -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  output_discord.mp4

Key settings:

-pix_fmt yuv420p — required for Discord, ComfyUI outputs yuv444p by default.
-crf 13 — high quality without massive file size (default 23 is too lossy).
-profile:v main — widely compatible.
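When batching, it can help to probe the pixel format first and re-encode only the files that need it. This is a sketch with hypothetical helper names. The ffmpeg arguments mirror the command above, and the ffprobe arguments print just the `pix_fmt` of the first video stream.

```python
def pix_fmt_probe_cmd(path):
    """ffprobe command that prints only the first video stream's pixel format."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=pix_fmt",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def needs_reencode(pix_fmt):
    """True when the probed pixel format is anything other than yuv420p."""
    return pix_fmt.strip() != "yuv420p"


def discord_reencode_cmd(src, dst):
    """Same settings as the re-encode command above, as an argument list."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-profile:v", "main", "-preset", "medium",
            "-crf", "13", "-pix_fmt", "yuv420p",
            "-c:a", "aac", "-b:a", "192k", dst]


probe = pix_fmt_probe_cmd("input.mp4")
encode = discord_reencode_cmd("input.mp4", "output_discord.mp4")
print(needs_reencode("yuv444p\n"), needs_reencode("yuv420p\n"))  # True False
```

Both lists can be passed to `subprocess.run`; the probe's stdout is what `needs_reencode` expects.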

For multi-video crossfade stitching, chain xfade (video) and acrossfade (audio):

ffmpeg -y -i a.mp4 -i b.mp4 -i c.mp4 \
  -filter_complex "[0:v][1:v]xfade=transition=fade:duration=1:offset=3.04[v1];[v1][2:v]xfade=transition=fade:duration=1:offset=6.08[vout];[0:a][1:a]acrossfade=duration=1:c1=tri:c2=tri[a1];[a1][2:a]acrossfade=duration=1:c1=tri:c2=tri[aout]" \
  -map "[vout]" -map "[aout]" \
  -c:v libx264 -profile:v main -crf 13 -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  output.mp4

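Offset for xfade #N (counting from 1, equal-length clips) = N × (clip_duration - overlap). The offsets above correspond to 4.04 s clips with a 1 s overlap: 1 × 3.04 = 3.04 and 2 × 3.04 = 6.08.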
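For more than a few clips, the offsets and the filter graph are easier to generate than to write by hand. The sketch below assumes each crossfade overlaps the running output by `fade` seconds, so the k-th xfade starts at the summed duration of the first k clips minus k × fade. For three 4.04 s clips and a 1 s fade that gives 3.04 and 6.08, matching the command above. The function names are hypothetical, and at least two clips are required.

```python
def xfade_offsets(durations, fade):
    """Offset of each xfade in the chain, one per clip boundary."""
    offsets = []
    elapsed = 0.0
    for k, duration in enumerate(durations[:-1], start=1):
        elapsed += duration
        offsets.append(round(elapsed - k * fade, 3))
    return offsets


def xfade_filter(durations, fade):
    """Build a -filter_complex string chaining xfade (video) and acrossfade (audio)."""
    offsets = xfade_offsets(durations, fade)
    video, audio = [], []
    prev_v, prev_a = "[0:v]", "[0:a]"
    for i, offset in enumerate(offsets, start=1):
        last = i == len(offsets)
        out_v = "[vout]" if last else f"[v{i}]"
        out_a = "[aout]" if last else f"[a{i}]"
        video.append(f"{prev_v}[{i}:v]xfade=transition=fade:"
                     f"duration={fade}:offset={offset}{out_v}")
        audio.append(f"{prev_a}[{i}:a]acrossfade=duration={fade}:c1=tri:c2=tri{out_a}")
        prev_v, prev_a = out_v, out_a
    return ";".join(video + audio)


offsets = xfade_offsets([4.04, 4.04, 4.04], 1)
graph = xfade_filter([4.04, 4.04, 4.04], 1)
print(offsets)  # [3.04, 6.08]
```

Clip durations can be read with ffprobe beforehand; unequal lengths are handled because the offsets use the running sum rather than a single clip length.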
