# ComfyUI Workflow JSON Format

## Two Formats

ComfyUI uses two workflow formats. Only the API format works for programmatic execution.

### API Format (what we use)

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "a beautiful cat", "clip": ["4", 1]}
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "bad quality, ugly", "clip": ["4", 1]}
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["3", 0], "vae": ["4", 2]}
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}
  }
}
```
**How to detect:** Top-level keys are numeric strings, and each value has a `class_type` field.

### Editor Format (not directly executable)

Contains `nodes[]` and `links[]` arrays: the visual graph data from the ComfyUI web editor. This is what the editor's "Save" button produces. For API use, export with "Save (API Format)" instead.

**How to detect:** The top level has `nodes` and `links` keys.
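A minimal sketch of format detection following these two rules (the function name is illustrative, not part of any ComfyUI API):

```python
def detect_format(workflow: dict) -> str:
    """Classify a parsed workflow JSON as API or editor format."""
    if "nodes" in workflow and "links" in workflow:
        return "editor"
    if all(key.isdigit() and "class_type" in value
           for key, value in workflow.items()):
        return "api"
    return "unknown"
```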
## Input Connections

Inputs can be:

- **Literal values:** `"text": "a cat"`, `"seed": 42`, `"width": 512`
- **Links to other nodes:** `["node_id", output_index]`; e.g., `["4", 0]` means output slot 0 of node `"4"`

Only literal values can be modified by parameter injection. Linked inputs are wiring.
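A sketch of separating the two, assuming links are always encoded as a two-element `[node_id, output_index]` pair of a string and an integer:

```python
def literal_inputs(node: dict) -> dict:
    """Return only the inputs holding literal values (safe to inject).

    A link is a two-element [node_id, output_index] list, so anything
    of that shape is treated as wiring, not a parameter.
    """
    return {
        name: value
        for name, value in node["inputs"].items()
        if not (isinstance(value, list) and len(value) == 2
                and isinstance(value[0], str) and isinstance(value[1], int))
    }
```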
## Common Node Types and Their Controllable Parameters

### Text Prompts

| Node Class | Key Fields |
|---|---|
| `CLIPTextEncode` | `text` (the prompt string) |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
Usually the positive prompt is one `CLIPTextEncode` and the negative prompt is another. Distinguish them by checking the `_meta.title` field or by tracing which node feeds the sampler's `positive` vs `negative` inputs, as sketched below.
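A minimal sketch of the tracing approach, assuming a single `KSampler` (the function name is ours):

```python
def find_prompt_nodes(wf: dict):
    """Trace a KSampler's positive/negative links back to the nodes
    that feed them. Returns (positive_node_id, negative_node_id)."""
    for node in wf.values():
        if node.get("class_type") == "KSampler":
            pos = node["inputs"]["positive"][0]  # [node_id, slot] -> node_id
            neg = node["inputs"]["negative"][0]
            return pos, neg
    return None
```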
### Sampling

| Node Class | Key Fields |
|---|---|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `cfg`, `sampler`, `sigmas` |
### Image Dimensions

| Node Class | Key Fields |
|---|---|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |
### Model Loading

| Node Class | Key Fields | Model Folder |
|---|---|---|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
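Model names must match files installed on the server exactly. One way to discover valid values is the server's `/object_info` endpoint, which reports the accepted input values for each node class. A sketch, assuming a local server on the default port (the response shape shown matches current ComfyUI builds but is not guaranteed stable):

```python
import json
import urllib.request

def list_checkpoints(host: str = "http://127.0.0.1:8188") -> list[str]:
    """Ask the server which files CheckpointLoaderSimple will accept."""
    with urllib.request.urlopen(f"{host}/object_info/CheckpointLoaderSimple") as resp:
        info = json.load(resp)
    # Element 0 of the ckpt_name input spec is the list of valid filenames.
    return info["CheckpointLoaderSimple"]["input"]["required"]["ckpt_name"][0]
```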
### Image Input/Output

| Node Class | Key Fields |
|---|---|
| `LoadImage` | `image` (filename on server, after upload) |
| `LoadImageMask` | `image`, `channel` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields; just previews) |
### ControlNet

| Node Class | Key Fields |
|---|---|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### Video (AnimateDiff)

| Node Class | Key Fields |
|---|---|
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix` |
## Parameter Injection Pattern

To modify a workflow programmatically:

```python
import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating the original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42                    # seed
wf["3"]["inputs"]["steps"] = 30                   # steps
wf["5"]["inputs"]["width"] = 1024                 # width
wf["5"]["inputs"]["height"] = 1024                # height
```
The `scripts/extract_schema.py` script in this skill automates discovering which node IDs and fields correspond to which user-facing parameters.
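To execute the modified workflow, submit it to the server's `/prompt` endpoint, which returns a `prompt_id` used to poll for results. A minimal sketch, assuming a local server on the default port:

```python
import json
import urllib.request

def queue_prompt(wf: dict, host: str = "http://127.0.0.1:8188") -> str:
    """POST the workflow for execution; returns the prompt_id to poll."""
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": wf}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]
```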
## Identifying Controllable Parameters (Heuristics)

When analyzing an unknown workflow, these patterns identify user-facing params:

- **Prompt text:** any `CLIPTextEncode` → `text` field. Title/meta usually indicates positive vs negative.
- **Seed:** any `KSampler`/`KSamplerAdvanced` → `seed`/`noise_seed`. Randomizable; set to different values for variations.
- **Dimensions:** `EmptyLatentImage` → `width`, `height`. Common: 512, 768, 1024 (must be multiples of 8).
- **Steps:** `KSampler` → `steps`. More = higher quality + slower. 20-50 typical.
- **CFG scale:** `KSampler` → `cfg`. How closely to follow the prompt. 5-15 typical.
- **Model/checkpoint:** `CheckpointLoaderSimple` → `ckpt_name`. Must match an installed model filename exactly.
- **LoRA:** `LoraLoader` → `lora_name`, `strength_model`. Adapter name + weight.
- **Images for img2img:** `LoadImage` → `image`. Filename on server after upload.
- **Denoise strength:** `KSampler` → `denoise`. 0.0-1.0. Lower = closer to the input image. Only relevant for img2img.
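These heuristics are mechanical enough to automate. A minimal scanning sketch (the field table mirrors the list above; the helper is illustrative, not part of this skill's scripts):

```python
def find_controllable(wf: dict) -> dict:
    """Map (node_id, field) pairs to current values for common parameters."""
    interesting = {
        "CLIPTextEncode": ["text"],
        "KSampler": ["seed", "steps", "cfg", "denoise"],
        "KSamplerAdvanced": ["noise_seed", "steps", "cfg"],
        "EmptyLatentImage": ["width", "height"],
        "CheckpointLoaderSimple": ["ckpt_name"],
        "LoraLoader": ["lora_name", "strength_model"],
        "LoadImage": ["image"],
    }
    params = {}
    for node_id, node in wf.items():
        for field in interesting.get(node.get("class_type"), []):
            if field in node.get("inputs", {}):
                params[(node_id, field)] = node["inputs"][field]
    return params
```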
## Output Nodes

Output is produced by these node types:

| Node | Output Key | Content |
|---|---|---|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `VHS_VideoCombine` | `gifs` or `videos` | Video file references |
| `SaveAudio` | `audio` | Audio file references |
| `PreviewImage` | `images` | Temporary preview (not saved) |
After execution, fetch outputs from `/history/{prompt_id}` → `outputs` → `{node_id}`.
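A sketch of that fetch step using only the standard library (assumes the prompt has already finished; the files themselves can then be downloaded via the server's `/view` endpoint):

```python
import json
import urllib.request

def get_output_images(prompt_id: str,
                      host: str = "http://127.0.0.1:8188") -> list[dict]:
    """Collect image records from every output node of a finished prompt."""
    with urllib.request.urlopen(f"{host}/history/{prompt_id}") as resp:
        history = json.load(resp)
    outputs = history[prompt_id]["outputs"]  # keyed by node_id
    return [img for node in outputs.values() for img in node.get("images", [])]
```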