hermes-agent/optional-skills/creative/comfyui/references/workflow-format.md

# ComfyUI Workflow JSON Format

## Two Formats

ComfyUI uses two workflow formats. Only the API format works for programmatic execution.

### API Format (what we use)

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:

```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "v1-5-pruned-emaonly.safetensors"
    }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a beautiful cat",
      "clip": ["4", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "bad quality, ugly",
      "clip": ["4", 1]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    }
  }
}
```

How to detect: top-level keys are numeric strings, and each value has a `class_type`.

### Editor Format (not directly executable)

Has `nodes[]` and `links[]` arrays — the visual graph data from the ComfyUI web editor. This is what "Save" produces. For API use, export with "Save (API Format)" instead.

How to detect: top-level has `"nodes"` and `"links"` keys.
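The two detection rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill's scripts; the function name is assumed, and it expects an already-parsed dict:

```python
def detect_format(workflow: dict) -> str:
    """Classify a parsed ComfyUI workflow as 'api', 'editor', or 'unknown'."""
    # Editor format: top-level "nodes" and "links" arrays from the web UI.
    if "nodes" in workflow and "links" in workflow:
        return "editor"
    # API format: every top-level key is a numeric string and every
    # value is a node object carrying a class_type.
    if workflow and all(
        key.isdigit() and isinstance(node, dict) and "class_type" in node
        for key, node in workflow.items()
    ):
        return "api"
    return "unknown"
```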


## Input Connections

Inputs can be:

  • Literal values: `"text": "a cat"`, `"seed": 42`, `"width": 512`
  • Links to other nodes: `["node_id", output_index]` — e.g., `["4", 0]` means output slot 0 of node `"4"`

Only literal values can be modified by parameter injection. Linked inputs are wiring.
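The literal-vs-link distinction can be checked mechanically. A minimal sketch (function names are illustrative; note that a genuine literal two-element list would be misclassified, which is rare in practice):

```python
def is_link(value) -> bool:
    """A linked input is a two-element list: ["node_id", output_index]."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )

def literal_inputs(node: dict) -> dict:
    """Return only the inputs that are safe to override with new values."""
    return {k: v for k, v in node.get("inputs", {}).items() if not is_link(v)}
```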


## Common Node Types and Their Controllable Parameters

### Text Prompts

| Node Class | Key Fields |
|---|---|
| `CLIPTextEncode` | `text` (the prompt string) |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |

Usually: positive prompt → one `CLIPTextEncode`, negative prompt → another. Distinguish by checking the `_meta.title` field or by tracing which node feeds into the `positive` vs `negative` inputs of the sampler.
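The tracing approach can be sketched as follows. This is an illustration only; the function name and the restriction to `KSampler`/`KSamplerAdvanced` are assumptions:

```python
def find_prompt_nodes(workflow: dict) -> dict:
    """Map 'positive'/'negative' to the CLIPTextEncode node IDs wired
    into the first sampler node found."""
    roles = {}
    for node in workflow.values():
        if node.get("class_type") in ("KSampler", "KSamplerAdvanced"):
            for role in ("positive", "negative"):
                link = node["inputs"].get(role)
                if isinstance(link, list):
                    src = workflow.get(link[0], {})
                    if src.get("class_type", "").startswith("CLIPTextEncode"):
                        roles[role] = link[0]
            break
    return roles
```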

### Sampling

| Node Class | Key Fields |
|---|---|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `cfg`, `sampler`, `sigmas` |

### Image Dimensions

| Node Class | Key Fields |
|---|---|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |

### Model Loading

| Node Class | Key Fields | Model Folder |
|---|---|---|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |

### Image Input/Output

| Node Class | Key Fields |
|---|---|
| `LoadImage` | `image` (filename on server, after upload) |
| `LoadImageMask` | `image`, `channel` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields, just previews) |

### ControlNet

| Node Class | Key Fields |
|---|---|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |

### Video (AnimateDiff)

| Node Class | Key Fields |
|---|---|
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix` |

## Parameter Injection Pattern

To modify a workflow programmatically:

```python
import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"     # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"           # negative prompt
wf["3"]["inputs"]["seed"] = 42                       # seed
wf["3"]["inputs"]["steps"] = 30                      # steps
wf["5"]["inputs"]["width"] = 1024                    # width
wf["5"]["inputs"]["height"] = 1024                   # height
```

The `scripts/extract_schema.py` script in this skill automates discovering which node IDs and fields correspond to which user-facing parameters.


## Identifying Controllable Parameters (Heuristics)

When analyzing an unknown workflow, these patterns identify user-facing params:

  1. Prompt text: the `text` field of any `CLIPTextEncode` node. The title/meta usually indicates positive vs negative.

  2. Seed: `seed` / `noise_seed` on any `KSampler` / `KSamplerAdvanced`. Randomizable — set to different values for variations.

  3. Dimensions: `EmptyLatentImage` `width`, `height`. Common: 512, 768, 1024 (must be multiples of 8).

  4. Steps: `KSampler` `steps`. More = higher quality + slower. 20-50 typical.

  5. CFG scale: `KSampler` `cfg`. How closely to follow the prompt. 5-15 typical.

  6. Model/checkpoint: `CheckpointLoaderSimple` `ckpt_name`. Must match an installed model filename exactly.

  7. LoRA: `LoraLoader` `lora_name`, `strength_model`. Adapter name + weight.

  8. Images for img2img: `LoadImage` `image`. Filename on the server after upload.

  9. Denoise strength: `KSampler` `denoise`, 0.0-1.0. Lower = closer to the input image. Only relevant for img2img.
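These heuristics amount to a table lookup keyed on `class_type`. A simplified sketch of what `scripts/extract_schema.py` does (the function name, the `KNOWN` table's coverage, and the `kind` labels are all illustrative):

```python
def extract_params(workflow: dict) -> list:
    """Apply the heuristics above: return (node_id, field, kind)
    tuples for common user-facing parameters."""
    KNOWN = {
        "CLIPTextEncode": [("text", "prompt")],
        "KSampler": [("seed", "seed"), ("steps", "steps"),
                     ("cfg", "cfg"), ("denoise", "denoise")],
        "EmptyLatentImage": [("width", "width"), ("height", "height")],
        "CheckpointLoaderSimple": [("ckpt_name", "model")],
        "LoraLoader": [("lora_name", "lora"), ("strength_model", "lora_weight")],
        "LoadImage": [("image", "input_image")],
    }
    params = []
    for node_id, node in workflow.items():
        for field, kind in KNOWN.get(node.get("class_type"), []):
            # Only report fields actually present in this node.
            if field in node.get("inputs", {}):
                params.append((node_id, field, kind))
    return params
```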


## Output Nodes

Output is produced by these node types:

| Node | Output Key | Content |
|---|---|---|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `VHS_VideoCombine` | `gifs` or `videos` | Video file references |
| `SaveAudio` | `audio` | Audio file references |
| `PreviewImage` | `images` | Temporary preview (not saved) |

After execution, fetch outputs from `/history/{prompt_id}`; each history entry's `outputs` object maps node IDs to their produced files.
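Walking that structure can be sketched as below. The function operates on one already-fetched history entry rather than making the HTTP call itself; the name and the set of output keys checked (taken from the table above) are assumptions:

```python
def collect_outputs(history_entry: dict) -> list:
    """Walk the outputs section of one /history/{prompt_id} entry and
    return (node_id, filename, subfolder, type) for each produced file."""
    files = []
    for node_id, out in history_entry.get("outputs", {}).items():
        for key in ("images", "gifs", "videos", "audio"):
            for item in out.get(key, []):
                files.append((node_id, item.get("filename"),
                              item.get("subfolder", ""), item.get("type")))
    return files
```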