# ComfyUI Workflow-Template Integrity

> **Authored by [@purzbeats](https://github.com/purzbeats)**, adapted from
> [purzbeats/hermes-agent-comfyui-helper](https://github.com/purzbeats/hermes-agent-comfyui-helper).
> Use this reference when converting workflows from the official
> `comfyui-workflow-templates` package (editor format) into API format for
> submission via `/api/prompt`. The conversion has subtle gotchas that cause
> hard-to-diagnose validation errors if you don't follow these rules.

## Background

The official ComfyUI template package (`comfyui-workflow-templates`, currently
v0.9.69) is installed inside the ComfyUI venv at a path like:

```
<comfy-install>/.venv/lib/python3.*/site-packages/comfyui_workflow_templates_*/templates/
```

The exact path depends on how ComfyUI was installed (comfy-cli default,
Comfy Desktop, manual venv, etc.). Find it once with:

```bash
comfy --workspace <ws> run-python -c "import comfyui_workflow_templates, pathlib; print(pathlib.Path(comfyui_workflow_templates.__file__).parent / 'templates')"
```

Templates ship in **editor format**: `nodes` / `links` arrays inside
`data['definitions']['subgraphs'][0]`. They must be converted to **API
format** (a `node_id -> {class_type, inputs}` mapping) before submission.

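As a starting point, the editor-format subgraph can be pulled out of a parsed template like this. It is a minimal sketch: the helper names `extract_subgraph` and `count_node_types` are hypothetical, and it assumes each node entry carries a `type` field naming its class.

```python
import json
from collections import Counter
from pathlib import Path


def extract_subgraph(data):
    """Return data['definitions']['subgraphs'][0] or raise a readable error."""
    subgraphs = data.get("definitions", {}).get("subgraphs", [])
    if not subgraphs:
        raise ValueError("template has no definitions.subgraphs entry")
    return subgraphs[0]


def count_node_types(sg):
    """Count nodes per class; assumes every node dict has a 'type' field."""
    return dict(Counter(node.get("type", "?") for node in sg.get("nodes", [])))


def load_template(path):
    """Read a template JSON file from disk and return its first subgraph."""
    return extract_subgraph(json.loads(Path(path).read_text(encoding="utf-8")))
```

Counting node types before and after conversion is a cheap way to confirm that nothing other than `Reroute` nodes went missing.
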

---

## RULE #1: Use templates AS CLOSE TO ORIGINAL AS POSSIBLE

- **Never strip, simplify, or "minimize" nodes** from a template.
- Full template architecture (dual-pass pipelines, LoRA chains, distilled
  sigmas, conditioning paths) is intentional; removing any part breaks quality.
- If an image-dependent path exists but the task is text-to-video, **leave
  it wired with the bypass toggle enabled**; don't remove the nodes.
- Only change: prompt text, seed, and dimensions (when explicitly requested).

## RULE #2: Server validation errors are the source of truth

When a workflow submission fails, the server response looks like:

```json
{
  "node_errors": {
    "238": {
      "errors": [{
        "message": "Required input is missing",
        "details": "width",
        "extra_info": { "input_name": "resize_type.width" }
      }]
    }
  }
}
```

**The `extra_info.input_name` field tells you EXACTLY what JSON key the server
wants. Use it literally.** If it says `"values.a"` or `"resize_type.width"`,
those are the actual key names in the JSON object. Do not "simplify" them to
flat names based on assumptions about what the field "should" be called.

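A small helper can collect those literal key names from a failed response. This is a sketch that assumes the response has already been parsed into a dict shaped like the example above; `reported_input_keys` is a hypothetical name.

```python
def reported_input_keys(response):
    """Map node_id -> list of literal input keys named in extra_info.input_name."""
    wanted = {}
    for node_id, info in response.get("node_errors", {}).items():
        for err in info.get("errors", []):
            key = err.get("extra_info", {}).get("input_name")
            if key:
                # Keep the key exactly as reported, dots included.
                wanted.setdefault(node_id, []).append(key)
    return wanted


response = {
    "node_errors": {
        "238": {
            "errors": [{
                "message": "Required input is missing",
                "details": "width",
                "extra_info": {"input_name": "resize_type.width"},
            }]
        }
    }
}
print(reported_input_keys(response))
```

Note that `details` says `width` while `input_name` says `resize_type.width`; the helper deliberately reads only the latter.
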
## RULE #3: Don't rebuild from scratch, patch the failing nodes

Every regeneration from the template reintroduces the same bugs. Instead:

1. Submit the workflow once.
2. Read the server error details for exact key names.
3. Use targeted patch/fix calls against the workflow file on disk.
4. Resubmit and check whether the errors are resolved.

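One way to keep patches targeted is to touch only the named keys on the failing node. The sketch below assumes the workflow on disk is already in API format; `patch_node_inputs` and `patch_workflow_file` are hypothetical helpers, not part of any ComfyUI tooling.

```python
import json
from pathlib import Path


def patch_node_inputs(workflow, node_id, updates):
    """Set or overwrite specific input keys on one node, leaving the rest untouched."""
    node = workflow[str(node_id)]  # a KeyError here means the node id is wrong
    node.setdefault("inputs", {}).update(updates)
    return workflow


def patch_workflow_file(path, node_id, updates):
    """Apply the same patch to an API-format workflow stored on disk."""
    path = Path(path)
    workflow = json.loads(path.read_text(encoding="utf-8"))
    patch_node_inputs(workflow, node_id, updates)
    path.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
```

For the error shown under RULE #2, the call would be `patch_workflow_file("workflow_api.json", 238, {"resize_type.width": 1920})`, where the file name is a placeholder.
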

---

## Reroute nodes: bypass, don't delete

Most servers (local, Cloud) don't have a `Reroute` node type. When converting
a template:

1. Find what feeds into the Reroute by looking at links where
   `target_id` = the Reroute node ID.
2. Replace all inputs referencing the Reroute with
   `[source_node_id, source_slot]`.
3. Delete the Reroute node from the API mapping.

**Real example (LTX 2.3 t2v template):**

- Reroute node 255 receives VAE from `CheckpointLoaderSimple 236` slot 2.
- Three nodes reference Reroute 255 for their VAE input:
  `LTXVImgToVideoInplace` (230), `LTXVLatentUpsampler` (253),
  `VAEDecodeTiled` (251).
- Fix: replace all occurrences of `vae: ["255", 0]` with `vae: ["236", 2]`.
- `CheckpointLoaderSimple` slot 2 = VAE (not slot 0 = MODEL).

| Result | Wiring |
|---|---|
| ❌ Wrong | `vae: ["236", 0]` → `received_type(MODEL) mismatch input_type(VAE)` |
| ✅ Correct | `vae: ["236", 2]` |

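If Reroute entries have already made it into an API-format mapping, the rewiring can be sketched as follows. This assumes each Reroute has exactly one linked input and that links are `[node_id_string, slot]` pairs; `bypass_reroutes` is a hypothetical helper.

```python
def is_link(value):
    """API-format links are two-element lists: [source_node_id, source_slot]."""
    return isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)


def bypass_reroutes(workflow, reroute_class="Reroute"):
    """Return a copy with Reroute nodes removed and their consumers rewired upstream."""
    sources = {}
    for node_id, node in workflow.items():
        if node.get("class_type") == reroute_class:
            links = [v for v in node.get("inputs", {}).values() if is_link(v)]
            if len(links) != 1:
                raise ValueError(f"Reroute {node_id} needs exactly one linked input")
            sources[node_id] = links[0]

    def resolve(link):
        seen = set()
        while link[0] in sources:  # follow chains of Reroutes
            if link[0] in seen:
                raise ValueError("cycle of Reroute nodes")
            seen.add(link[0])
            link = sources[link[0]]
        return list(link)

    result = {}
    for node_id, node in workflow.items():
        if node_id in sources:
            continue  # drop the Reroute itself
        inputs = {k: resolve(v) if is_link(v) else v
                  for k, v in node.get("inputs", {}).items()}
        result[node_id] = {**node, "inputs": inputs}
    return result
```

Because the source slot is carried through from the Reroute's own input, the consumer ends up pointing at slot 2 of the loader rather than the Reroute's output slot 0.
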

---

## Dynamic template nodes: dotted key names are correct

### ComfyMathExpression (COMFY_AUTOGROW_V3)

```json
{
  "class_type": "ComfyMathExpression",
  "inputs": {
    "expression": "a/2",
    "values.a": ["257", 0]
  }
}
```

- `values` is a `COMFY_AUTOGROW_V3` template.
- Input names in links are `values.a`, `values.b`, etc.
- **Keep the dotted format as JSON keys.**
- Do NOT convert to `{"values": {"a": ...}}` or flatten to just `"a"`.

### ResizeImageMaskNode (COMFY_DYNAMICCOMBO_V3)

```json
{
  "class_type": "ResizeImageMaskNode",
  "inputs": {
    "input": ["276", 0],
    "scale_method": "lanczos",
    "resize_type": "scale dimensions",
    "resize_type.width": 1920,
    "resize_type.height": 1088,
    "resize_type.crop": "center"
  }
}
```

- `resize_type` is a `COMFY_DYNAMICCOMBO_V3`.
- Mode-specific fields: `resize_type.width`, `resize_type.height`, `resize_type.crop`.
- `scale_method` options: `"nearest-exact"`, `"bilinear"`, `"area"`, `"bicubic"`, `"lanczos"`.
- **Keep the dotted format as JSON keys.**
- Do NOT flatten `resize_type.width` to just `"width"`.

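When building these inputs in code, it helps to generate the dotted keys mechanically and to flag accidental nesting before submission. The helpers `dotted` and `nested_input_keys` below are hypothetical, and the check rests on the assumption that a dict-valued input is a mistake for these nodes.

```python
def dotted(prefix, fields):
    """Build flat 'prefix.name' keys, e.g. dotted('values', {'a': 1}) -> {'values.a': 1}."""
    return {f"{prefix}.{name}": value for name, value in fields.items()}


def nested_input_keys(workflow):
    """List (node_id, key) pairs whose value is a dict, a sign of accidental nesting."""
    return [(node_id, key)
            for node_id, node in workflow.items()
            for key, value in node.get("inputs", {}).items()
            if isinstance(value, dict)]


resize_inputs = {
    "input": ["276", 0],
    "scale_method": "lanczos",
    "resize_type": "scale dimensions",  # the selected mode stays a plain key
    **dotted("resize_type", {"width": 1920, "height": 1088, "crop": "center"}),
}
```

The selected mode (`resize_type`) and its mode-specific fields (`resize_type.*`) live side by side in the same flat object, which is why nesting them under one key cannot represent the node correctly.
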

---

## Conversion recipe

1. Load template from the installed package path.
2. Parse `data['definitions']['subgraphs'][0]`.
3. For each node (skip Reroute):
   - Resolve linked inputs from the `sg['links']` array.
   - Map `widgets_values` to input field names.
   - Keep all dotted key names as-is from the template.
4. Bypass Reroute: trace source, replace references.
5. Change only: prompt text, seed values, and user-requested parameters.
6. Add `SaveVideo` terminal node if template uses only `CreateVideo`.
7. Submit → read errors → patch specific nodes → resubmit.

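The link-resolution part of steps 3 and 4 might look like the sketch below. It covers linked inputs only and makes several assumptions that should be checked against a real template: link entries are dicts with `id`, `origin_id`, and `origin_slot` (a list form is also accepted), each node lists its inputs as `{"name", "link"}` dicts, and a Reroute has a single input. Mapping `widgets_values` to field names and links that cross the subgraph boundary are not handled here.

```python
def index_links(sg):
    """Build link_id -> (origin_id, origin_slot) from the subgraph's links array."""
    table = {}
    for link in sg.get("links", []):
        if isinstance(link, dict):
            table[link["id"]] = (link["origin_id"], link["origin_slot"])
        else:  # list form: [id, origin_id, origin_slot, target_id, target_slot, type]
            table[link[0]] = (link[1], link[2])
    return table


def resolve_linked_inputs(sg):
    """Return node_id -> {class_type, inputs} with linked inputs only, Reroutes bypassed."""
    links = index_links(sg)
    nodes = {node["id"]: node for node in sg["nodes"]}

    def source_of(link_id):
        origin_id, slot = links[link_id]
        origin = nodes.get(origin_id)
        while origin is not None and origin.get("type") == "Reroute":
            upstream = origin["inputs"][0].get("link")
            if upstream is None:
                raise ValueError(f"Reroute {origin_id} has no incoming link")
            origin_id, slot = links[upstream]
            origin = nodes.get(origin_id)
        return [str(origin_id), slot]

    api = {}
    for node in sg["nodes"]:
        if node.get("type") == "Reroute":
            continue
        inputs = {inp["name"]: source_of(inp["link"])
                  for inp in node.get("inputs", [])
                  if inp.get("link") is not None}
        api[str(node["id"])] = {"class_type": node["type"], "inputs": inputs}
    return api
```

Input names are copied verbatim from the template, so dotted names such as `values.a` survive unchanged.
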
## What to NEVER change in a template

| Element | Why |
|---------|-----|
| Node topology | Graph is designed for the specific model |
| Sigmas values | Tuned for the model/sampler combination |
| LoRA/distilled paths | Required for quality, even if they look unused |
| Model parameters (cfg, steps, shifts) | Model-specific |
| Conditioning chains (zero-out, crop guides) | Required for correct conditioning |
| Pass-through wiring | Don't remove nodes, bypass them |

---

## Cloud compatibility (verified May 2025)

The full LTX 2.3 T2V template (`video_ltx2_3_t2v.json`) runs **without
modification** on Comfy Cloud.

**Confirmed working on Cloud (all custom nodes available):**
`ComfyMathExpression`, `ResizeImageMaskNode`, `ResizeImagesByLongerEdge`,
`PrimitiveInt`, `PrimitiveStringMultiline`, `PrimitiveBoolean`, `SaveVideo`,
`LTXVCropGuides`, `LTXVImgToVideoInplace`, `LTXVConcatAVLatent`,
`LTXVSeparateAVLatent`, `LTXVLatentUpsampler`, `LTXVAudioVAELoader`,
`LTXVAudioVAEDecode`, `LTXVEmptyLatentAudio`, `LTXVPreprocess`,
`LTXVConditioning`, `ManualSigmas`, `LTXAVTextEncoderLoader`, plus all core
nodes.

**Cloud vs Local for LTX 2.3 (768x512):**

- Cloud: ~39s per video (4x faster).
- Local (RTX 5090): ~160s per video.
- `example.png` placeholder works on Cloud for bypassed image-dependent paths.
- Submission format is **identical** between local and Cloud:
  `{"prompt": wf, "extra_data": {}}` to `/api/prompt`.
- Free tier = 1 concurrent job.

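The submission body can be assembled without sending anything, which makes it easy to inspect before a run. This sketch targets a local server and assumes the default ComfyUI address `http://127.0.0.1:8188`; Cloud authentication is deliberately left out because the header format is not covered here, and `build_prompt_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_prompt_request(workflow, base_url="http://127.0.0.1:8188"):
    """Build (but do not send) a POST to /api/prompt carrying an API-format workflow."""
    body = json.dumps({"prompt": workflow, "extra_data": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


workflow = {"1": {"class_type": "PrimitiveInt", "inputs": {"value": 768}}}
request = build_prompt_request(workflow)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to the local server; a validation failure comes back as an HTTP error whose body holds the `node_errors` payload.
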
**Cloud submission pitfalls:**

- `/api/object_info/<node>` returns 404 on free tier, so node schemas can't
  be queried remotely, but the workflow runs fine anyway. Always probe
  `object_info` locally before building workflows.
- Cloud is ~4x faster; prefer Cloud for batch runs unless local is needed
  for debugging.
- Cloud `/api/view` returns a **302 redirect to a signed GCS URL**; use
  `curl -s -L` to follow and download. Python `urllib` fails with 401
  (it forwards auth headers to the GCS CDN).
- `COMFY_CLOUD_API_KEY` is only in the terminal/bash env, not in the Python
  sandbox. Use subprocess or terminal scripts for Cloud API calls.
- Cloud free tier processes jobs **sequentially** (1 at a time). Submit all,
  then poll history.
- LTX 2.3 at **1920x1080 OOMs locally** (even on an RTX 5090) because the
  upscaler pass exceeds VRAM. Prefer Cloud for 1080p; use 1280x720 locally
  (~90s/video).

---

## FFmpeg stitch settings (Discord-compatible)

Generated ComfyUI videos often use the `yuv444p` pixel format, which does NOT
work on Discord. Re-encode with:

```bash
ffmpeg -y -i input.mp4 \
  -c:v libx264 -profile:v main -preset medium -crf 13 -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  output_discord.mp4
```

Key settings:

- `-pix_fmt yuv420p`: **required for Discord**, ComfyUI outputs `yuv444p` by default.
- `-crf 13`: high quality without massive file size (default 23 is too lossy).
- `-profile:v main`: widely compatible.

For multi-video crossfade stitching, chain `xfade` (video) and `acrossfade`
(audio):

```bash
ffmpeg -y -i a.mp4 -i b.mp4 -i c.mp4 \
  -filter_complex "[0:v][1:v]xfade=transition=fade:duration=1:offset=3.04[v1];[v1][2:v]xfade=transition=fade:duration=1:offset=6.08[vout];[0:a][1:a]acrossfade=duration=1:c1=tri:c2=tri[a1];[a1][2:a]acrossfade=duration=1:c1=tri:c2=tri[aout]" \
  -map "[vout]" -map "[aout]" \
  -c:v libx264 -profile:v main -crf 13 -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  output.mp4
```

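Offset for xfade #N (counting from 1) = `N × (clip_length - fade_duration)`
for equal-length clips. The offsets above correspond to 4.04 s clips with a
1 s fade: 3.04 for the first transition and 6.08 for the second.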
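
The offset arithmetic for a crossfade chain can be sketched in Python, which also covers clips of unequal length: each offset is the running total of (clip length minus fade). The helper names `xfade_offsets` and `build_filter_complex` are hypothetical, and the 4.04 s clip length is inferred from the offsets in the command above.

```python
def xfade_offsets(clip_lengths, fade=1.0):
    """Offset of each xfade = accumulated (clip length - fade) of the clips before it."""
    offsets, elapsed = [], 0.0
    for length in clip_lengths[:-1]:
        elapsed += length - fade
        offsets.append(round(elapsed, 3))
    return offsets


def build_filter_complex(clip_lengths, fade=1.0):
    """Chain xfade (video) and acrossfade (audio) across all input clips."""
    if len(clip_lengths) < 2:
        raise ValueError("need at least two clips to crossfade")
    offsets = xfade_offsets(clip_lengths, fade)
    video, audio = [], []
    v_prev, a_prev = "[0:v]", "[0:a]"
    for i, offset in enumerate(offsets, start=1):
        last = i == len(offsets)
        v_out = "[vout]" if last else f"[v{i}]"
        a_out = "[aout]" if last else f"[a{i}]"
        video.append(f"{v_prev}[{i}:v]xfade=transition=fade:"
                     f"duration={fade:g}:offset={offset:g}{v_out}")
        audio.append(f"{a_prev}[{i}:a]acrossfade=duration={fade:g}:c1=tri:c2=tri{a_out}")
        v_prev, a_prev = v_out, a_out
    return ";".join(video + audio)


print(xfade_offsets([4.04, 4.04, 4.04]))  # [3.04, 6.08]
```

For three 4.04 s clips and a 1 s fade, `build_filter_complex` reproduces the `-filter_complex` string used in the command above.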