fix(skills/comfyui): bug fixes, cloud parity, expanded coverage, examples, tests

The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most user-visible (silent file overwrites, status-error
misclassified as success, X-API-Key leaked to S3 on /api/view redirect,
Cloud endpoints that 404 because they were renamed). v5.0.0 fixes those
and fills the gaps that previously forced users to write their own glue
(WebSocket monitoring, batch/sweep, img2img upload helper, dep auto-fix,
log fetch, health check, example workflows).

Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
  completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
  preserves server subfolder structure (no silent overwrites), and
  retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
  inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (subclasses
  requests.Session and overrides rebuild_auth) drops X-API-Key/Cookie on
  cross-host redirects — fixes a real key-leak path through Cloud's
  signed-URL download flow; covered by the cross-host security tests
- Cloud routing (verified live): /history → /history_v2,
  /models/<f> → /experiment/models/<f>, plus folder aliases for the
  unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
  403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
  KSampler.{positive,negative} connections (incl. through Reroute /
  Primitive nodes) instead of meta-title heuristic; symmetric
  duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
  Rosetta detection, WSL2, ROCm --json, disk-space check, optional
  PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
  fallback); idempotent — skips relaunch if server already up;
  configurable port/workspace; persistent log; SIGINT trap
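The cross-host redirect fix reduces to a host comparison before credentials
are resent; in the real code this lives in a `requests.Session` subclass
overriding `rebuild_auth`. A stdlib-only sketch with illustrative names:

```python
from urllib.parse import urlparse

# Illustrative set; the real list lives in _common.py
SENSITIVE_HEADERS = {"x-api-key", "cookie", "authorization"}

def redirect_safe_headers(headers: dict, old_url: str, new_url: str) -> dict:
    """Headers safe to resend after a redirect from old_url to new_url.

    Credential-bearing headers are dropped whenever the redirect crosses
    hosts, e.g. /api/view redirecting to a signed S3 URL."""
    if urlparse(old_url).hostname == urlparse(new_url).hostname:
        return dict(headers)
    return {k: v for k, v in headers.items()
            if k.lower() not in SENSITIVE_HEADERS}
```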

New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
  cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
  whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
  (comfy-cli + server + checkpoints + optional smoke test that cancels
  itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id
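The sweep mode amounts to a cartesian product over per-parameter value
lists. A minimal sketch (the function name is illustrative, not
run_batch.py's actual API):

```python
import itertools

def expand_sweep(sweep: dict) -> list[dict]:
    """Expand {"seed": [1, 2], "cfg": [5.0, 7.5]} into one argument
    dict per run, covering the full cartesian product."""
    keys = list(sweep)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(sweep[k] for k in keys))
    ]
```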

Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
  RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
  IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
  workflows can be retargeted per run

Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README

Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
  against live cloud API

Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
  (--ws, --input-image, --randomize-seed, --flat-output, etc.)
SHL0MS 2026-04-29 20:50:52 -04:00 committed by Teknium
parent 7d48a16f14
commit a7780fe05f
32 changed files with 6117 additions and 1372 deletions

# ComfyUI Workflow JSON Format
## Two Formats — Only API Format Is Executable
**API format** is required for `/api/prompt` and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
**cannot** be submitted directly.
### API Format
Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "a beautiful cat", "clip": ["4", 1]}
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "bad quality, ugly", "clip": ["4", 1]}
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}
  }
}
```
**Detection:** every top-level value has `class_type`. The skill's
`_common.is_api_format()` does this check.
### Editor Format (not directly executable)
Has `nodes[]` and `links[]` arrays — the visual graph. To convert: open in
ComfyUI's web UI and use **Workflow → Export (API)** (newer UI) or the
"Save (API Format)" button (older UI).
**Detection:** top-level has `"nodes"` and `"links"` keys.
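A minimal sketch of both detection checks (signatures assumed; the real
implementation lives in `_common.py`):

```python
def is_api_format(wf: dict) -> bool:
    """API format: every top-level value is a node dict with class_type."""
    return bool(wf) and all(
        isinstance(node, dict) and "class_type" in node for node in wf.values()
    )

def is_editor_format(wf: dict) -> bool:
    """Editor format: visual-graph export with nodes[] and links[] arrays."""
    return "nodes" in wf and "links" in wf
```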
---
## Input Connections
```json
"inputs": {
"text": "a cat", // literal — modifiable
"seed": 42, // literal — modifiable
"clip": ["4", 1] // link — wiring; do NOT overwrite
}
```
Links are length-2 arrays of `[upstream_node_id, output_slot]` — e.g.
`["4", 1]` means output slot 1 of node "4". Only literal values can be
modified by parameter injection; linked inputs are wiring. The skill's
parameter injector refuses to overwrite a link with a literal (it logs a
warning and skips the field).

---
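A sketch of that guard, with illustrative helper names:

```python
import logging

def looks_like_link(value) -> bool:
    """Links are 2-element [upstream_node_id, output_slot] arrays."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )

def inject_param(wf: dict, node_id: str, field: str, value) -> bool:
    """Set a literal input; refuse to clobber wiring. Returns True if set."""
    current = wf[node_id]["inputs"].get(field)
    if looks_like_link(current):
        logging.warning("refusing to overwrite link %s.%s with a literal",
                        node_id, field)
        return False
    wf[node_id]["inputs"][field] = value
    return True
```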
## Common Node Types and Their Controllable Parameters
The full catalog lives in `scripts/_common.py` (`PARAM_PATTERNS` and
`MODEL_LOADERS`). Highlights:
### Text Prompts
| Node Class | Key Fields |
|------------|------------|
| `CLIPTextEncode` | `text` |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
| `CLIPTextEncodeFlux` | `clip_l`, `t5xxl`, `guidance` |
To distinguish positive from negative, the skill traces `KSampler.negative`
back through Reroute / Primitive nodes to the source CLIPTextEncode, falling
back to `_meta.title` heuristics ("negative", "neg", "anti").
### Sampling
| Node Class | Key Fields |
|------------|------------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `noise_seed`, `cfg`, `sampler`, `sigmas` |
| `SamplerCustomAdvanced` | `noise_seed` (via RandomNoise input) |
| `RandomNoise` | `noise_seed` |
| `BasicScheduler` | `steps`, `scheduler`, `denoise` |
| `KSamplerSelect` | `sampler_name` |
| `BasicGuider` / `CFGGuider` | `cfg` |
| `ModelSamplingFlux` | `max_shift`, `base_shift`, `width`, `height` |
| `SDTurboScheduler` | `steps`, `denoise` |
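Because the seed input's name varies by sampler family, seed injection needs
a per-class lookup. A sketch assuming an illustrative subset of the table
above:

```python
# Illustrative subset: which input carries the seed per sampler family.
SEED_FIELDS = {
    "KSampler": "seed",
    "KSamplerAdvanced": "noise_seed",
    "SamplerCustom": "noise_seed",
    "RandomNoise": "noise_seed",
}

def set_seed_everywhere(wf: dict, seed: int) -> int:
    """Write the seed into every matching node; return how many changed."""
    changed = 0
    for node in wf.values():
        field = SEED_FIELDS.get(node.get("class_type"))
        if field is not None and field in node.get("inputs", {}):
            node["inputs"][field] = seed
            changed += 1
    return changed
```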
### Latent / Dimensions
| Node Class | Key Fields |
|------------|------------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |
| `EmptySD3LatentImage` | `width`, `height`, `batch_size` |
| `EmptyHunyuanLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyMochiLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyLTXVLatentVideo` | `width`, `height`, `length`, `batch_size` |
### Model Loading
| Node Class | Key Fields | Folder |
|------------|------------|--------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `LoraLoaderModelOnly` | `lora_name`, `strength_model` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `DualCLIPLoader` | `clip_name1`, `clip_name2` | `clip` |
| `TripleCLIPLoader` | `clip_name1/2/3` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
| `IPAdapterModelLoader` | `ipadapter_file` | `ipadapter` |
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` | `animatediff_models` |
### Image Input/Output
| Node Class | Key Fields |
|------------|------------|
| `LoadImage` | `image` (server-side filename, after upload) |
| `LoadImageMask` | `image`, `channel` (`red` / `green` / `blue` / `alpha`) |
| `VAEEncode` / `VAEDecode` | (no controllable fields) |
| `VAEEncodeForInpaint` | `grow_mask_by` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields, just previews) |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong` |
### ControlNet
| Node Class | Key Fields |
|------------|------------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### IPAdapter (community pack `comfyui_ipadapter_plus`)
| Node Class | Key Fields |
|------------|------------|
| `IPAdapterAdvanced` | `weight`, `start_at`, `end_at` |
| `IPAdapter` | `weight` |
---
### Embeddings (referenced inside prompt strings)
ComfyUI scans prompt text for `embedding:NAME` syntax. The skill's
`_common.iter_embedding_refs()` extracts these as model dependencies.
```text
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
```
`extract_schema.py` and `check_deps.py` surface these in
`embedding_dependencies` / `missing_embeddings`.
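A regex sketch of the extraction (the exact pattern used by
`_common.iter_embedding_refs()` may differ):

```python
import re

# Matches "embedding:NAME" with an optional ":weight" suffix, e.g.
# "embedding:goodvibes:1.2" -> "goodvibes".
_EMBEDDING_RE = re.compile(
    r"embedding:([A-Za-z0-9_.\-]+?)(?::[\d.]+)?(?=[,\s)]|$)")

def iter_embedding_refs(text: str):
    """Yield embedding names referenced in a prompt string."""
    for m in _EMBEDDING_RE.finditer(text):
        yield m.group(1)
```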
## Parameter Injection Pattern
To modify a workflow programmatically:
```python
import copy
import json

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating the original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```
---
`scripts/extract_schema.py` automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a `parameters` dict
that `run_workflow.py` reads to inject values from `--args`.
## Identifying Controllable Parameters (Heuristics)
For unknown workflows:
1. **Prompt text** — any `CLIPTextEncode.text`. Use connection tracing back
   from `KSampler.positive` / `.negative` to disambiguate (don't trust the
   meta-title alone).
2. **Seed** — `KSampler.seed` / `KSamplerAdvanced.noise_seed` /
   `RandomNoise.noise_seed`.
3. **Dimensions** — `Empty*LatentImage.width/height` (must be multiples of 8).
4. **Steps / CFG** — `KSampler.steps`, `KSampler.cfg`. Steps 20-50 typical;
   CFG 5-15 typical (Flux uses guidance, not CFG).
5. **Model / checkpoint** — `CheckpointLoaderSimple.ckpt_name`. Filename must
   match an installed file *exactly*.
6. **LoRA** — `LoraLoader.lora_name`, `.strength_model`.
7. **Images for img2img / inpaint** — `LoadImage.image`. Server-side filename
   after upload.
8. **Denoise** — `KSampler.denoise`. 0.0-1.0; 1.0 = ignore the input image,
   0.0 = pass it through. Sweet spot for img2img: 0.4-0.7.
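Heuristics 3 and 8 are cheap to enforce before submitting a prompt. A minimal
sketch with assumed helper names:

```python
def validate_dims(width: int, height: int) -> None:
    """Latent dimensions must be multiples of 8 (the VAE downscale factor)."""
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 8")

def clamp_denoise(denoise: float) -> float:
    """Denoise is only meaningful in 0.0-1.0."""
    return min(1.0, max(0.0, denoise))
```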
## Output Nodes
Output is produced by these node types. The skill's `OUTPUT_NODES` set
extends to common community packs.
| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `PreviewImage` | `images` | Temporary preview (not saved) |
| `VHS_VideoCombine` | `gifs` (older) or `videos`/`video` (newer cloud) | Video file refs |
| `SaveAudio` | `audio` | Audio file refs |
| `SaveAnimatedWEBP` / `SaveAnimatedPNG` | `images` | Animated images |
| `Save3D` | `3d` | 3D asset refs |
After execution, fetch outputs from `/history/{prompt_id}` (local) or
`/api/jobs/{prompt_id}` (cloud) → `outputs` → `{node_id}` → `{key}`.
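A sketch of flattening that structure into (node_id, file_ref) pairs, using
an illustrative subset of output keys:

```python
# Illustrative subset of output keys; see the table above for the full set.
OUTPUT_KEYS = ("images", "gifs", "videos", "video", "audio")

def collect_outputs(history_entry: dict):
    """Yield (node_id, file_ref) pairs from a /history entry.

    A file_ref is a {filename, subfolder, type} dict."""
    for node_id, node_out in history_entry.get("outputs", {}).items():
        for key in OUTPUT_KEYS:
            for ref in node_out.get(key, []):
                yield node_id, ref
```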
## Wrapper Variants
Some saved JSON files wrap the workflow under a `"prompt"` key (matching
the `/api/prompt` payload shape). The skill's `_common.unwrap_workflow()`
handles this — pass any of:
- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`
It rejects editor format with a clear error and a re-export instruction.
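A minimal sketch of the unwrap logic (details of the real
`_common.unwrap_workflow()` may differ):

```python
def unwrap_workflow(data: dict) -> dict:
    """Accept raw API format or a {"prompt": {...}} wrapper."""
    if "nodes" in data and "links" in data:
        raise ValueError(
            "Editor-format workflow; re-export with Workflow -> Export (API)")
    wf = data.get("prompt", data)
    if not (wf and all(isinstance(n, dict) and "class_type" in n
                       for n in wf.values())):
        raise ValueError("Not an API-format workflow")
    return wf
```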