The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most user-visible (silent file overwrites, status-error
misclassified as success, X-API-Key leaked to S3 on /api/view redirect,
Cloud endpoints that 404 because they were renamed). v5.0.0 fixes those
and fills the gaps that previously forced users to write their own glue
(WebSocket monitoring, batch/sweep, img2img upload helper, dep auto-fix,
log fetch, health check, example workflows).
Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
preserves server subfolder structure (no silent overwrites), and
retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (subclasses
  requests.Session and overrides rebuild_auth) drops X-API-Key/Cookie on
  cross-host redirects — fixes a real key-leak path through Cloud's
  signed-URL download flow; covered by tests
- Cloud routing (verified live): /history → /history_v2,
/models/<f> → /experiment/models/<f>, plus folder aliases for the
unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
KSampler.{positive,negative} connections (incl. through Reroute /
Primitive nodes) instead of meta-title heuristic; symmetric
duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
Rosetta detection, WSL2, ROCm --json, disk-space check, optional
PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
fallback); idempotent — skips relaunch if server already up;
configurable port/workspace; persistent log; SIGINT trap
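The cross-host redirect fix can be sketched as a requests.Session subclass that overrides rebuild_auth, the hook requests invokes on every redirect hop. This is an illustration of the idea only; the class name, header list, and details of the real _StripSensitiveOnRedirectSession may differ:

```python
import requests
from urllib.parse import urlparse

# Headers that must not follow a redirect to another host (illustrative list).
SENSITIVE_HEADERS = ("X-API-Key", "Authorization", "Cookie")

def crosses_host(old_url: str, new_url: str) -> bool:
    """True when a redirect target lives on a different host than the origin."""
    return urlparse(old_url).hostname != urlparse(new_url).hostname

class StripSensitiveOnRedirectSession(requests.Session):
    """Drop credential headers when a redirect leaves the original host."""

    def rebuild_auth(self, prepared_request, response):
        # response.request is the request that triggered the redirect;
        # prepared_request is the one about to be sent to the new location.
        if crosses_host(response.request.url, prepared_request.url):
            for header in SENSITIVE_HEADERS:
                prepared_request.headers.pop(header, None)
        super().rebuild_auth(prepared_request, response)
```

With a session like this, a GET against /api/view that 302s to a signed S3 URL arrives at S3 without the API key header while same-host redirects keep their credentials.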
New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
(comfy-cli + server + checkpoints + optional smoke test that cancels
itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id
Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
workflows can be retargeted per run
Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README
Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
against live cloud API
Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
(--ws, --input-image, --randomize-seed, --flat-output, etc.)
ComfyUI Workflow JSON Format
Two Formats — Only API Format Is Executable
API format is required for /api/prompt and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
cannot be submitted directly.
API Format
Top-level keys are string node IDs. Each node has class_type and inputs:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  }
}
```
Detection: every top-level value has class_type. The skill's
_common.is_api_format() does this check.
Editor Format (not directly executable)
Has nodes[] and links[] arrays — the visual graph. To convert: open in
ComfyUI's web UI and use Workflow → Export (API) (newer UI) or the
"Save (API Format)" button (older UI).
Detection: top-level has "nodes" and "links" keys.
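Both detection rules fit in a few lines. A minimal sketch of the checks (the skill's actual _common.is_api_format() may differ in details):

```python
def is_api_format(wf: dict) -> bool:
    """API format: every top-level value is a node dict with class_type."""
    return bool(wf) and all(
        isinstance(v, dict) and "class_type" in v for v in wf.values()
    )

def is_editor_format(wf: dict) -> bool:
    """Editor format: visual graph with nodes[] and links[] arrays."""
    return "nodes" in wf and "links" in wf
```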
Inputs: Literals vs Links
"inputs": {
"text": "a cat", // literal — modifiable
"seed": 42, // literal — modifiable
"clip": ["4", 1] // link — wiring; do NOT overwrite
}
Links are length-2 arrays of [upstream_node_id, output_slot]. The skill's
parameter injector refuses to overwrite a link with a literal (logs a
warning and skips).
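The guard can be illustrated with a hypothetical helper (inject_param is not the skill's actual injector, just a sketch of the rule):

```python
def inject_param(workflow: dict, node_id: str, field: str, value) -> bool:
    """Set a literal input; refuse to clobber a link ([node_id, slot])."""
    current = workflow[node_id]["inputs"].get(field)
    if isinstance(current, list) and len(current) == 2:
        print(f"warning: {node_id}.{field} is a link; skipping")
        return False
    workflow[node_id]["inputs"][field] = value
    return True
```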
Common Node Types and Their Controllable Parameters
The full catalog lives in scripts/_common.py (PARAM_PATTERNS and
MODEL_LOADERS). Highlights:
Text Prompts
| Node Class | Key Fields |
|---|---|
| CLIPTextEncode | text |
| CLIPTextEncodeSDXL | text_g, text_l, width, height |
| CLIPTextEncodeFlux | clip_l, t5xxl, guidance |
To distinguish positive from negative the skill traces KSampler.negative
back through Reroute / Primitive nodes to the source CLIPTextEncode. Falls
back to _meta.title heuristics ("negative", "neg", "anti").
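The trace can be sketched as follows, assuming pass-through nodes expose exactly one upstream link; node names and the cycle guard are simplified relative to the skill's trace_to_node:

```python
PASSTHROUGH = {"Reroute", "PrimitiveNode"}

def trace_to_source(workflow: dict, node_id: str, seen=None):
    """Follow links upstream through pass-through nodes to the real source."""
    seen = seen or set()
    if node_id in seen:          # cycle guard
        return None
    seen.add(node_id)
    node = workflow.get(node_id)
    if node is None:
        return None
    if node["class_type"] not in PASSTHROUGH:
        return node_id
    # Pass-through node: follow its upstream link.
    for value in node["inputs"].values():
        if isinstance(value, list) and len(value) == 2:
            return trace_to_source(workflow, value[0], seen)
    return None

def find_negative_prompt_node(workflow: dict):
    """Trace KSampler.negative back to the CLIPTextEncode feeding it."""
    for node in workflow.values():
        if node["class_type"] == "KSampler":
            link = node["inputs"].get("negative")
            if isinstance(link, list):
                return trace_to_source(workflow, link[0])
    return None
```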
Sampling
| Node Class | Key Fields |
|---|---|
| KSampler | seed, steps, cfg, sampler_name, scheduler, denoise |
| KSamplerAdvanced | noise_seed, steps, cfg, start_at_step, end_at_step |
| SamplerCustom | noise_seed, cfg, sampler, sigmas |
| SamplerCustomAdvanced | noise_seed (via RandomNoise input) |
| RandomNoise | noise_seed |
| BasicScheduler | steps, scheduler, denoise |
| KSamplerSelect | sampler_name |
| BasicGuider / CFGGuider | cfg |
| ModelSamplingFlux | max_shift, base_shift, width, height |
| SDTurboScheduler | steps, denoise |
Latent / Dimensions
| Node Class | Key Fields |
|---|---|
| EmptyLatentImage | width, height, batch_size |
| EmptySD3LatentImage | width, height, batch_size |
| EmptyHunyuanLatentVideo | width, height, length, batch_size |
| EmptyMochiLatentVideo | width, height, length, batch_size |
| EmptyLTXVLatentVideo | width, height, length, batch_size |
Model Loading
| Node Class | Key Fields | Folder |
|---|---|---|
| CheckpointLoaderSimple | ckpt_name | checkpoints |
| LoraLoader | lora_name, strength_model, strength_clip | loras |
| LoraLoaderModelOnly | lora_name, strength_model | loras |
| VAELoader | vae_name | vae |
| ControlNetLoader | control_net_name | controlnet |
| CLIPLoader | clip_name | clip |
| DualCLIPLoader | clip_name1, clip_name2 | clip |
| TripleCLIPLoader | clip_name1/2/3 | clip |
| UNETLoader | unet_name | unet |
| DiffusionModelLoader | model_name | diffusion_models |
| UpscaleModelLoader | model_name | upscale_models |
| IPAdapterModelLoader | ipadapter_file | ipadapter |
| ADE_AnimateDiffLoaderWithContext | model_name, motion_scale | animatediff_models |
Image Input/Output
| Node Class | Key Fields |
|---|---|
| LoadImage | image (server-side filename, after upload) |
| LoadImageMask | image, channel (red / green / blue / alpha) |
| VAEEncode / VAEDecode | (no controllable fields) |
| VAEEncodeForInpaint | grow_mask_by |
| SaveImage | filename_prefix |
| VHS_VideoCombine | frame_rate, format, filename_prefix, loop_count, pingpong |
ControlNet
| Node Class | Key Fields |
|---|---|
| ControlNetApply | strength |
| ControlNetApplyAdvanced | strength, start_percent, end_percent |
IPAdapter (community pack comfyui_ipadapter_plus)
| Node Class | Key Fields |
|---|---|
| IPAdapterAdvanced | weight, start_at, end_at |
| IPAdapter | weight |
Embeddings (referenced inside prompt strings)
ComfyUI scans prompt text for embedding:NAME syntax. The skill's
_common.iter_embedding_refs() extracts these as model dependencies.
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
extract_schema.py and check_deps.py surface these in
embedding_dependencies / missing_embeddings.
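A rough sketch of the extraction; the regex here is an approximation for illustration, not ComfyUI's exact embedding grammar, and the real iter_embedding_refs may be stricter:

```python
import re

# embedding:NAME with an optional :weight suffix, e.g. embedding:goodvibes:1.2
EMBEDDING_RE = re.compile(r"embedding:([A-Za-z0-9_.\-]+?)(?::[\d.]+)?(?=[,\s]|$)")

def iter_embedding_refs(text: str):
    """Yield embedding names referenced in a prompt string."""
    for match in EMBEDDING_RE.finditer(text):
        yield match.group(1)
```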
Parameter Injection Pattern
```python
import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

wf = copy.deepcopy(workflow)  # never mutate the loaded original
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```
scripts/extract_schema.py automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a parameters dict
that run_workflow.py reads to inject values from --args.
Identifying Controllable Parameters (Heuristics)
For unknown workflows:
- Prompt text — any `CLIPTextEncode.text`. Use connection tracing back from `KSampler.positive`/`.negative` to disambiguate (don't trust meta-title alone).
- Seed — `KSampler.seed` / `KSamplerAdvanced.noise_seed` / `RandomNoise.noise_seed`.
- Dimensions — `Empty*LatentImage.width`/`.height` (must be multiples of 8).
- Steps / CFG — `KSampler.steps`, `KSampler.cfg`. Steps 20–50 typical; CFG 5–15 typical (Flux uses guidance, not CFG).
- Model / checkpoint — `CheckpointLoaderSimple.ckpt_name`. Filename must match an installed file exactly.
- LoRA — `LoraLoader.lora_name`, `.strength_model`.
- Images for img2img / inpaint — `LoadImage.image` (server-side filename after upload).
- Denoise — `KSampler.denoise`, 0.0–1.0; 1.0 = ignore input image, 0.0 = pass through. Sweet spot for img2img: 0.4–0.7.
Output Nodes
Output is produced by these node types. The skill's OUTPUT_NODES set
extends to common community packs.
| Node | Output Key | Content |
|---|---|---|
| SaveImage | images | List of {filename, subfolder, type} |
| PreviewImage | images | Temporary preview (not saved) |
| VHS_VideoCombine | gifs (older) or videos/video (newer cloud) | Video file refs |
| SaveAudio | audio | Audio file refs |
| SaveAnimatedWEBP / SaveAnimatedPNG | images | Animated images |
| Save3D | 3d | 3D asset refs |
After execution, fetch outputs from /history/{prompt_id} (local) or
/api/jobs/{prompt_id} (cloud) → outputs → {node_id} → {key}.
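Walking that structure is mechanical. A sketch that flattens one history entry into file refs (the set of output keys scanned here is an assumption based on the table above):

```python
def collect_output_files(history_entry: dict):
    """Walk outputs -> {node_id} -> {key} and collect (node, key, filename).

    history_entry is the per-prompt dict returned by GET /history/{prompt_id}.
    """
    files = []
    for node_id, node_out in history_entry.get("outputs", {}).items():
        for key in ("images", "gifs", "videos", "video", "audio"):
            for ref in node_out.get(key, []):
                files.append((node_id, key, ref.get("filename")))
    return files
```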
Wrapper Variants
Some saved JSON files wrap the workflow under a "prompt" key (matching
the /api/prompt payload shape). The skill's _common.unwrap_workflow()
handles this — pass any of:
- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`
It rejects editor format with a clear error and a re-export instruction.
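A minimal sketch of that unwrapping logic (the real _common.unwrap_workflow's checks and error message may differ):

```python
def unwrap_workflow(data: dict) -> dict:
    """Accept raw API format or a {"prompt": {...}} wrapper; reject editor format."""
    if "nodes" in data and "links" in data:
        raise ValueError(
            "Editor-format workflow: re-export via Workflow -> Export (API)"
        )
    if isinstance(data.get("prompt"), dict):
        return data["prompt"]  # wrapped /api/prompt payload shape
    return data
```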