hermes-agent/optional-skills/creative/comfyui/references/workflow-format.md

# ComfyUI Workflow JSON Format

## Two Formats

ComfyUI uses two workflow formats. Only the API format works for programmatic execution.

### API Format (what we use)

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:

```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "v1-5-pruned-emaonly.safetensors"
    }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a beautiful cat",
      "clip": ["4", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "bad quality, ugly",
      "clip": ["4", 1]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    }
  }
}
```

How to detect: top-level keys are numeric strings, and each value has a `class_type`.

### Editor Format (not directly executable)

Has `nodes[]` and `links[]` arrays — the visual graph data from the ComfyUI web editor. This is what "Save" produces. For API use, export with "Save (API Format)" instead.

How to detect: top-level has `"nodes"` and `"links"` keys.
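The two detection rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill's scripts; the function name is assumed, and it expects an already-parsed dict:

```python
def detect_format(workflow: dict) -> str:
    """Classify a parsed ComfyUI workflow as 'api', 'editor', or 'unknown'."""
    # Editor format: top-level "nodes" and "links" arrays from the web UI.
    if "nodes" in workflow and "links" in workflow:
        return "editor"
    # API format: every top-level key is a numeric string and every
    # value is a node object carrying a class_type.
    if workflow and all(
        key.isdigit() and isinstance(node, dict) and "class_type" in node
        for key, node in workflow.items()
    ):
        return "api"
    return "unknown"
```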


## Input Connections

Inputs can be:

  • Literal values: `"text": "a cat"`, `"seed": 42`, `"width": 512`
  • Links to other nodes: `["node_id", output_index]` — e.g., `["4", 0]` means output slot 0 of node `"4"`

Only literal values can be modified by parameter injection. Linked inputs are wiring.
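The literal-vs-link distinction can be checked mechanically. A minimal sketch (function names are illustrative; note that a genuine literal two-element list would be misclassified, which is rare in practice):

```python
def is_link(value) -> bool:
    """A linked input is a two-element list: ["node_id", output_index]."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )

def literal_inputs(node: dict) -> dict:
    """Return only the inputs that are safe to override with new values."""
    return {k: v for k, v in node.get("inputs", {}).items() if not is_link(v)}
```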


## Common Node Types and Their Controllable Parameters

### Text Prompts

| Node Class | Key Fields |
|---|---|
| `CLIPTextEncode` | `text` (the prompt string) |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |

Usually: positive prompt → one `CLIPTextEncode`, negative prompt → another. Distinguish by checking the `_meta.title` field or by tracing which node feeds into the `positive` vs `negative` inputs of the sampler.
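The tracing approach can be sketched as follows. This is an illustration only; the function name and the restriction to `KSampler`/`KSamplerAdvanced` are assumptions:

```python
def find_prompt_nodes(workflow: dict) -> dict:
    """Map 'positive'/'negative' to the CLIPTextEncode node IDs wired
    into the first sampler node found."""
    roles = {}
    for node in workflow.values():
        if node.get("class_type") in ("KSampler", "KSamplerAdvanced"):
            for role in ("positive", "negative"):
                link = node["inputs"].get(role)
                if isinstance(link, list):
                    src = workflow.get(link[0], {})
                    if src.get("class_type", "").startswith("CLIPTextEncode"):
                        roles[role] = link[0]
            break
    return roles
```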

### Sampling

| Node Class | Key Fields |
|---|---|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `cfg`, `sampler`, `sigmas` |

### Image Dimensions

| Node Class | Key Fields |
|---|---|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |

### Model Loading

| Node Class | Key Fields | Model Folder |
|---|---|---|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |

### Image Input/Output

| Node Class | Key Fields |
|---|---|
| `LoadImage` | `image` (filename on server, after upload) |
| `LoadImageMask` | `image`, `channel` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields, just previews) |

### ControlNet

| Node Class | Key Fields |
|---|---|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |

### Video (AnimateDiff)

| Node Class | Key Fields |
|---|---|
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix` |

## Parameter Injection Pattern

To modify a workflow programmatically:

```python
import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"     # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"           # negative prompt
wf["3"]["inputs"]["seed"] = 42                       # seed
wf["3"]["inputs"]["steps"] = 30                      # steps
wf["5"]["inputs"]["width"] = 1024                    # width
wf["5"]["inputs"]["height"] = 1024                   # height
```

The `scripts/extract_schema.py` script in this skill automates discovering which node IDs and fields correspond to which user-facing parameters.


## Identifying Controllable Parameters (Heuristics)

When analyzing an unknown workflow, these patterns identify user-facing params:

  1. Prompt text: the `text` field of any `CLIPTextEncode` node. The title/meta usually indicates positive vs negative.

  2. Seed: `seed` / `noise_seed` on any `KSampler` / `KSamplerAdvanced`. Randomizable — set to different values for variations.

  3. Dimensions: `EmptyLatentImage` `width`, `height`. Common: 512, 768, 1024 (must be multiples of 8).

  4. Steps: `KSampler` `steps`. More = higher quality + slower. 20-50 typical.

  5. CFG scale: `KSampler` `cfg`. How closely to follow the prompt. 5-15 typical.

  6. Model/checkpoint: `CheckpointLoaderSimple` `ckpt_name`. Must match an installed model filename exactly.

  7. LoRA: `LoraLoader` `lora_name`, `strength_model`. Adapter name + weight.

  8. Images for img2img: `LoadImage` `image`. Filename on the server after upload.

  9. Denoise strength: `KSampler` `denoise`, 0.0-1.0. Lower = closer to the input image. Only relevant for img2img.
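These heuristics amount to a table lookup keyed on `class_type`. A simplified sketch of what `scripts/extract_schema.py` does (the function name, the `KNOWN` table's coverage, and the `kind` labels are all illustrative):

```python
def extract_params(workflow: dict) -> list:
    """Apply the heuristics above: return (node_id, field, kind)
    tuples for common user-facing parameters."""
    KNOWN = {
        "CLIPTextEncode": [("text", "prompt")],
        "KSampler": [("seed", "seed"), ("steps", "steps"),
                     ("cfg", "cfg"), ("denoise", "denoise")],
        "EmptyLatentImage": [("width", "width"), ("height", "height")],
        "CheckpointLoaderSimple": [("ckpt_name", "model")],
        "LoraLoader": [("lora_name", "lora"), ("strength_model", "lora_weight")],
        "LoadImage": [("image", "input_image")],
    }
    params = []
    for node_id, node in workflow.items():
        for field, kind in KNOWN.get(node.get("class_type"), []):
            # Only report fields actually present in this node.
            if field in node.get("inputs", {}):
                params.append((node_id, field, kind))
    return params
```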


## Output Nodes

Output is produced by these node types:

| Node | Output Key | Content |
|---|---|---|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `VHS_VideoCombine` | `gifs` or `videos` | Video file references |
| `SaveAudio` | `audio` | Audio file references |
| `PreviewImage` | `images` | Temporary preview (not saved) |

After execution, fetch outputs from `/history/{prompt_id}`; each history entry's `outputs` object maps node IDs to their produced files.
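Walking that structure can be sketched as below. The function operates on one already-fetched history entry rather than making the HTTP call itself; the name and the set of output keys checked (taken from the table above) are assumptions:

```python
def collect_outputs(history_entry: dict) -> list:
    """Walk the outputs section of one /history/{prompt_id} entry and
    return (node_id, filename, subfolder, type) for each produced file."""
    files = []
    for node_id, out in history_entry.get("outputs", {}).items():
        for key in ("images", "gifs", "videos", "audio"):
            for item in out.get(key, []):
                files.append((node_id, item.get("filename"),
                              item.get("subfolder", ""), item.get("type")))
    return files
```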