feat(comfyui): rewrite skill — official CLI + REST API, no third-party dependency

Complete rewrite of the ComfyUI skill to use:
- comfy-cli (official, Comfy-Org/comfy-cli) for lifecycle management:
  install, launch, stop, node management, model downloads
- Direct REST API + helper scripts for workflow execution:
  parameter injection, submission, monitoring, output download
- No dependency on comfyui-skill-cli or any unofficial tool

New files:
- SKILL.md: full rewrite with two-layer architecture, decision tree, pitfalls
- references/official-cli.md: complete comfy-cli command reference
- references/rest-api.md: all REST endpoints (local + cloud)
- references/workflow-format.md: API format spec, common nodes, param mapping
- scripts/extract_schema.py: analyze workflow → extract controllable params
- scripts/run_workflow.py: inject args, submit, poll, download outputs
- scripts/check_deps.py: check missing nodes/models against running server
- scripts/comfyui_setup.sh: full setup automation with official CLI

Removed:
- references/cli-reference.md (was for unofficial comfyui-skill-cli)
- references/api-notes.md (replaced by rest-api.md)

Addresses feedback from PR #17316 comment:
- Correct author attribution
- Remove references to unofficial OpenClaw project
- License field reflects hermes-agent repo (MIT)
alt-glitch 2026-04-30 00:43:59 +05:30 committed by Teknium
parent 258449c468
commit b81638d749
8 changed files with 1925 additions and 0 deletions

@@ -0,0 +1,218 @@
# ComfyUI Workflow JSON Format
## Two Formats
ComfyUI uses two workflow formats. **Only API format works for programmatic execution.**
### API Format (what we use)
Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "v1-5-pruned-emaonly.safetensors"
    }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 512, "height": 512, "batch_size": 1}
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "a beautiful cat",
      "clip": ["4", 1]
    }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": "bad quality, ugly",
      "clip": ["4", 1]
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["3", 0],
      "vae": ["4", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    }
  }
}
```
**How to detect:** Top-level keys are numeric strings, each value has `class_type`.
### Editor Format (not directly executable)
Has `nodes[]` and `links[]` arrays — the visual graph data from the ComfyUI web editor.
This is what "Save" produces. For API use, export with "Save (API Format)" instead.
**How to detect:** Top-level has `"nodes"` and `"links"` keys.
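These detection rules are easy to apply in code. A minimal sketch, assuming the file has already been written to disk; `detect_format` is an illustrative helper, not one of this skill's scripts:
```python
import json

def detect_format(path: str) -> str:
    """Return 'api', 'editor', or 'unknown' for a workflow JSON file."""
    with open(path) as f:
        wf = json.load(f)
    if isinstance(wf, dict) and "nodes" in wf and "links" in wf:
        return "editor"            # visual graph export from plain "Save"
    if isinstance(wf, dict) and wf and all(
        isinstance(v, dict) and "class_type" in v for v in wf.values()
    ):
        return "api"               # node-ID-keyed map, executable via /prompt
    return "unknown"
```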
---
## Input Connections
Inputs can be:
- **Literal values**: `"text": "a cat"`, `"seed": 42`, `"width": 512`
- **Links to other nodes**: `["node_id", output_index]` — e.g., `["4", 0]` means
output slot 0 of node "4"
Only literal values can be modified by parameter injection. Linked inputs are wiring.
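A rough programmatic test: an input value is wiring when it is a two-element `[node_id, output_index]` list, and a literal otherwise. The helper name below is illustrative, not part of the skill:
```python
def is_link(value) -> bool:
    """True for [node_id, output_index] pairs like ["4", 0]; False for literals."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )

# Example, using the KSampler node from the workflow above:
# is_link(42)        -> False  (literal seed, safe to overwrite)
# is_link(["4", 0])  -> True   (link to node "4", leave alone)
```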
---
## Common Node Types and Their Controllable Parameters
### Text Prompts
| Node Class | Key Fields |
|------------|-----------|
| `CLIPTextEncode` | `text` (the prompt string) |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
Usually: positive prompt → one CLIPTextEncode, negative prompt → another.
Distinguish by checking the `_meta.title` field or by tracing which feeds into
positive vs negative inputs of the sampler.
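One way to do that tracing in code. This sketch assumes the sampler's `positive`/`negative` inputs link directly to `CLIPTextEncode` nodes, which holds for simple text-to-image graphs but not for workflows that route conditioning through ControlNet nodes:
```python
def find_prompt_nodes(workflow: dict):
    """Return (positive_node_id, negative_node_id) for the first sampler found."""
    for node in workflow.values():
        if node.get("class_type") in ("KSampler", "KSamplerAdvanced"):
            positive_id = node["inputs"]["positive"][0]
            negative_id = node["inputs"]["negative"][0]
            return positive_id, negative_id
    return None, None

# In the example workflow above: find_prompt_nodes(wf) -> ("6", "7")
```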
### Sampling
| Node Class | Key Fields |
|------------|-----------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `cfg`, `sampler`, `sigmas` |
### Image Dimensions
| Node Class | Key Fields |
|------------|-----------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `LatentUpscale` | `width`, `height`, `upscale_method` |
### Model Loading
| Node Class | Key Fields | Model Folder |
|------------|-----------|-------------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
### Image Input/Output
| Node Class | Key Fields |
|------------|-----------|
| `LoadImage` | `image` (filename on server, after upload) |
| `LoadImageMask` | `image`, `channel` |
| `SaveImage` | `filename_prefix` |
| `PreviewImage` | (no controllable fields, just previews) |
### ControlNet
| Node Class | Key Fields |
|------------|-----------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### Video (AnimateDiff)
| Node Class | Key Fields |
|------------|-----------|
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix` |
---
## Parameter Injection Pattern
To modify a workflow programmatically:
```python
import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Deep copy to avoid mutating the original
wf = copy.deepcopy(workflow)

# Inject parameters by node ID + field name
wf["6"]["inputs"]["text"] = "a beautiful sunset"   # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"         # negative prompt
wf["3"]["inputs"]["seed"] = 42                     # seed
wf["3"]["inputs"]["steps"] = 30                    # steps
wf["5"]["inputs"]["width"] = 1024                  # width
wf["5"]["inputs"]["height"] = 1024                 # height
```
The `scripts/extract_schema.py` script in this skill automates discovering which
node IDs and fields correspond to which user-facing parameters.
---
## Identifying Controllable Parameters (Heuristics)
When analyzing an unknown workflow, these patterns identify the user-facing parameters (a sketch that applies them follows the list):
1. **Prompt text**: Any `CLIPTextEncode` → `text` field. The title/meta usually
   indicates positive vs negative.
2. **Seed**: Any `KSampler` / `KSamplerAdvanced` → `seed` / `noise_seed`.
   Randomizable; set different values for variations.
3. **Dimensions**: `EmptyLatentImage` → `width`, `height`. Common: 512, 768,
   1024 (must be multiples of 8).
4. **Steps**: `KSampler` → `steps`. More = higher quality + slower. 20-50 typical.
5. **CFG scale**: `KSampler` → `cfg`. How closely to follow the prompt. 5-15 typical.
6. **Model/checkpoint**: `CheckpointLoaderSimple` → `ckpt_name`. Must match an
   installed model filename exactly.
7. **LoRA**: `LoraLoader` → `lora_name`, `strength_model`. Adapter name + weight.
8. **Images for img2img**: `LoadImage` → `image`. Filename on server after upload.
9. **Denoise strength**: `KSampler` → `denoise`. 0.0-1.0; lower = closer to the input
   image. Only relevant for img2img.
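A sketch that applies these heuristics to an unknown workflow. The class-to-field map is a starting point drawn from the tables above, not an exhaustive list, and is what `scripts/extract_schema.py` does in a more complete form:
```python
CONTROLLABLE_FIELDS = {
    "CLIPTextEncode": ["text"],
    "KSampler": ["seed", "steps", "cfg", "denoise"],
    "KSamplerAdvanced": ["noise_seed", "steps", "cfg"],
    "EmptyLatentImage": ["width", "height", "batch_size"],
    "CheckpointLoaderSimple": ["ckpt_name"],
    "LoraLoader": ["lora_name", "strength_model", "strength_clip"],
    "LoadImage": ["image"],
    "SaveImage": ["filename_prefix"],
}

def list_controllable_params(workflow: dict):
    """Yield (node_id, title, field, current_value) for likely user-facing params."""
    for node_id, node in workflow.items():
        title = node.get("_meta", {}).get("title", node.get("class_type", ""))
        for field in CONTROLLABLE_FIELDS.get(node.get("class_type"), []):
            value = node.get("inputs", {}).get(field)
            if not isinstance(value, list):  # skip linked inputs (wiring)
                yield node_id, title, field, value
```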
---
## Output Nodes
Output is produced by these node types:
| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `VHS_VideoCombine` | `gifs` or `videos` | Video file references |
| `SaveAudio` | `audio` | Audio file references |
| `PreviewImage` | `images` | Temporary preview (not saved) |
After execution, fetch outputs from `/history/{prompt_id}` → `outputs` → `{node_id}`.
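A minimal end-to-end sketch against a local server (the default `http://127.0.0.1:8188` is assumed). It uses the standard `/prompt`, `/history/{prompt_id}`, and `/view` endpoints and stands in for what `scripts/run_workflow.py` does with more error handling:
```python
import json, time, uuid, urllib.parse, urllib.request

BASE = "http://127.0.0.1:8188"  # assumed default local server address

def submit(workflow: dict) -> str:
    """POST an API-format workflow to /prompt and return its prompt_id."""
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode()
    req = urllib.request.Request(f"{BASE}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]

def wait_for_outputs(prompt_id: str, poll_s: float = 1.0) -> dict:
    """Poll /history/{prompt_id} until the run appears, then return its outputs."""
    while True:
        with urllib.request.urlopen(f"{BASE}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]["outputs"]
        time.sleep(poll_s)

def download(file_ref: dict) -> bytes:
    """Fetch one {filename, subfolder, type} reference via /view."""
    query = urllib.parse.urlencode(file_ref)
    with urllib.request.urlopen(f"{BASE}/view?{query}") as resp:
        return resp.read()
```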