# ComfyUI Workflow JSON Format

## Two Formats

ComfyUI uses two workflow formats. Only the API format works for programmatic execution.

### API Format (what we use)

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "v1-5-pruned-emaonly.safetensors"
    }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a beautiful cat",
      "clip": ["4", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "bad quality, ugly",
      "clip": ["4", 1]
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["3", 0],
      "vae": ["4", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    }
  }
}
```
**How to detect:** Top-level keys are numeric strings; each value has a `class_type`.
### Editor Format (not directly executable)

Has `nodes[]` and `links[]` arrays — the visual graph data from the ComfyUI web editor.
This is what "Save" produces. For API use, export with "Save (API Format)" instead.

**How to detect:** Top-level has `"nodes"` and `"links"` keys.
## Input Connections

Inputs can be:

- Literal values: `"text": "a cat"`, `"seed": 42`, `"width": 512`
- Links to other nodes: `["node_id", output_index]` — e.g., `["4", 0]` means output slot 0 of node "4"

Only literal values can be modified by parameter injection. Linked inputs are wiring.
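Since a link is always a two-element `[node_id, output_index]` list, separating injectable literals from wiring is mechanical. A sketch (hypothetical helpers, not part of this skill's scripts):

```python
def is_link(value) -> bool:
    """True if an input value is a connection to another node's output."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )


def literal_inputs(node: dict) -> dict:
    """Return only the inputs that are safe to overwrite by injection."""
    return {k: v for k, v in node.get("inputs", {}).items() if not is_link(v)}
```

For example, on the `KSampler` node above, `seed` and `steps` survive the filter while `model` and `positive` are dropped as wiring.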
## Common Node Types and Their Controllable Parameters

### Text Prompts

| Node Class | Key Fields |
|---|---|
| `CLIPTextEncode` | `text` (the prompt string) |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
Usually the positive prompt goes through one `CLIPTextEncode` and the negative prompt through another. Distinguish them by checking the `_meta.title` field or by tracing which node feeds into the `positive` vs `negative` inputs of the sampler.
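The tracing approach can be sketched like this (a hypothetical helper; the link-following logic mirrors the `["node_id", output_index]` convention above):

```python
def find_prompt_nodes(workflow: dict) -> dict:
    """Map 'positive'/'negative' to the node IDs wired into a sampler."""
    roles = {}
    for node in workflow.values():
        if node.get("class_type") not in ("KSampler", "KSamplerAdvanced"):
            continue
        for role in ("positive", "negative"):
            link = node["inputs"].get(role)
            # A link is ["source_node_id", output_index]; the source node
            # is the prompt encoder playing this role.
            if isinstance(link, list) and link:
                roles[role] = link[0]
    return roles
```

On the example workflow above this resolves `positive` to node "6" and `negative` to node "7", without relying on `_meta.title` being set.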
### Sampling

| Node Class | Key Fields |
|---|---|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `cfg`, `sampler`, `sigmas` |
### Image Dimensions

| Node Class | Key Fields |
|---|---|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |
### Model Loading

| Node Class | Key Fields | Model Folder |
|---|---|---|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
### Image Input/Output

| Node Class | Key Fields |
|---|---|
| `LoadImage` | `image` (filename on server, after upload) |
| `LoadImageMask` | `image`, `channel` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields, just previews) |
### ControlNet

| Node Class | Key Fields |
|---|---|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### Video (AnimateDiff)

| Node Class | Key Fields |
|---|---|
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix` |
## Parameter Injection Pattern

To modify a workflow programmatically:

```python
import copy
import json

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating the original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42                    # seed
wf["3"]["inputs"]["steps"] = 30                   # steps
wf["5"]["inputs"]["width"] = 1024                 # width
wf["5"]["inputs"]["height"] = 1024                # height
```
The `scripts/extract_schema.py` script in this skill automates discovering which node IDs and fields correspond to which user-facing parameters.
## Identifying Controllable Parameters (Heuristics)

When analyzing an unknown workflow, these patterns identify user-facing params:

- **Prompt text**: any `CLIPTextEncode` → `text` field. Title/meta usually indicates positive vs negative.
- **Seed**: any `KSampler`/`KSamplerAdvanced` → `seed`/`noise_seed`. Randomizable — set to different values for variations.
- **Dimensions**: `EmptyLatentImage` → `width`, `height`. Common: 512, 768, 1024 (must be multiples of 8).
- **Steps**: `KSampler` → `steps`. More = higher quality + slower. 20-50 typical.
- **CFG scale**: `KSampler` → `cfg`. How closely to follow the prompt. 5-15 typical.
- **Model/checkpoint**: `CheckpointLoaderSimple` → `ckpt_name`. Must match an installed model filename exactly.
- **LoRA**: `LoraLoader` → `lora_name`, `strength_model`. Adapter name + weight.
- **Images for img2img**: `LoadImage` → `image`. Filename on server after upload.
- **Denoise strength**: `KSampler` → `denoise`. 0.0-1.0. Lower = closer to input image. Only relevant for img2img.
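These heuristics boil down to a table-driven scan. A minimal sketch (the real logic lives in `scripts/extract_schema.py`; the `CONTROLLABLE` map and `extract_params` name here are illustrative, mirroring the field tables above):

```python
# Class name -> fields a user typically wants to control (illustrative subset)
CONTROLLABLE = {
    "CLIPTextEncode": ["text"],
    "KSampler": ["seed", "steps", "cfg", "denoise"],
    "KSamplerAdvanced": ["noise_seed", "steps", "cfg"],
    "EmptyLatentImage": ["width", "height", "batch_size"],
    "CheckpointLoaderSimple": ["ckpt_name"],
    "LoraLoader": ["lora_name", "strength_model"],
    "LoadImage": ["image"],
}


def extract_params(workflow: dict) -> list:
    """List (node_id, field, current_value) for every controllable input."""
    params = []
    for node_id, node in workflow.items():
        for field in CONTROLLABLE.get(node.get("class_type"), []):
            # Only literal values present in the node's inputs qualify;
            # linked inputs never appear in the table above.
            if field in node.get("inputs", {}):
                params.append((node_id, field, node["inputs"][field]))
    return params
```

Run against the example workflow at the top of this document, this would surface the prompt texts, seed/steps/cfg, the latent dimensions, and the checkpoint name.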
## Output Nodes

Output is produced by these node types:

| Node | Output Key | Content |
|---|---|---|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `VHS_VideoCombine` | `gifs` or `videos` | Video file references |
| `SaveAudio` | `audio` | Audio file references |
| `PreviewImage` | `images` | Temporary preview (not saved) |
After execution, fetch outputs from `/history/{prompt_id}` → `outputs` → `{node_id}`.
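Flattening that nested structure into a list of file references can be sketched as follows (`collect_outputs` is a hypothetical helper; it assumes the output-key names from the table above and takes the per-prompt history entry, i.e. the value under the prompt ID):

```python
def collect_outputs(history_entry: dict) -> list:
    """Flatten a history entry into (node_id, filename, subfolder, type) tuples."""
    files = []
    for node_id, out in history_entry.get("outputs", {}).items():
        # SaveImage/PreviewImage use "images"; video and audio nodes
        # use "gifs"/"videos"/"audio" (see the Output Nodes table).
        for key in ("images", "gifs", "videos", "audio"):
            for ref in out.get(key, []):
                files.append((
                    node_id,
                    ref["filename"],
                    ref.get("subfolder", ""),
                    ref.get("type", "output"),
                ))
    return files
```

Each `(filename, subfolder, type)` tuple can then be passed to the `/view` endpoint to download the file, which is what `scripts/run_workflow.py` does after polling.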