fix(skills): align article-illustrator with real Hermes tool capabilities

Addresses review feedback on #13193:

1. The reference-image flow no longer assumes write_file/read_file handle binaries. vision_analyze produces a textual description, and that description is what gets embedded in prompts; the binary is optionally copied via terminal (cp/curl).
2. image_generate's URL-only return is now explicit. Step 6 downloads the returned URL to local disk via terminal (curl -sSL -o ...), then verifies the file has non-zero size before proceeding.
3. Removed the "Please use nano banana pro..." line from prompts/system.md. The backend is user-configured and not agent-selectable, so routing hints in the prompt are misleading.

PORT_NOTES.md updated: prompts/system.md is no longer listed as verbatim, the file-operations row is corrected, and new backend-selection and reference-images rows reflect Hermes' actual tool surface (write_file/read_file for text, terminal for binaries and URL downloads, vision_analyze for reading images).
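A minimal sketch of the download-and-verify step from item 2, written as a bash function. `download_image` is a hypothetical helper name; `url` is assumed to be the `image` field already extracted from `image_generate`'s JSON result, and `dest` the `{output-dir}/NN-{type}-{slug}.png` path. It adds `-f` to the `curl -sSL` flags (an assumption, not part of this change) so an HTTP error body is not saved as a PNG, and falls back to `wget` as the workflow does.

```bash
#!/usr/bin/env bash
# Sketch only: download_image is a hypothetical helper, not part of the skill.
# $1: the image URL taken from image_generate's JSON result
# $2: the destination, e.g. {output-dir}/NN-{type}-{slug}.png
download_image() {
  local url="$1" dest="$2"
  mkdir -p "$(dirname "$dest")"
  if command -v curl >/dev/null 2>&1; then
    # -f (assumed addition): treat HTTP errors as failures instead of saving the error body
    curl -fsSL -o "$dest" "$url" || { rm -f "$dest"; return 1; }
  else
    wget -qO "$dest" "$url" || { rm -f "$dest"; return 1; }
  fi
  # Verify non-zero size; drop an empty file so it never reaches the article.
  if [ ! -s "$dest" ]; then
    rm -f "$dest"
    return 1
  fi
  echo ok
}
```

Called as `download_image "$image_url" "out/01-infographic-intro.png"`, it prints `ok` on success and returns non-zero (leaving no partial or empty file behind) on failure, which is the point where the single download retry would kick in.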
2026-07-13 14:02:16 +00:00 · 2026-04-20 21:28:42 -05:00 · 2026-04-20 21:28:42 -05:00 · a93de60b68
commit a93de60b68
parent 4bd297094a
4 changed files with 75 additions and 56 deletions
--- a/skills/creative/baoyu-article-illustrator/PORT_NOTES.md
+++ b/skills/creative/baoyu-article-illustrator/PORT_NOTES.md
@@ -4,7 +4,7 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.

 ## Changes from upstream

-`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` were adapted. The 21 style files, 4 palette files, and `prompts/system.md` are verbatim copies. The `references/config/` directory was removed entirely.
+`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` were adapted. The 23 style files and 4 palette files are verbatim copies. The `references/config/` directory was removed entirely.

 ### Adaptations

@@ -14,19 +14,20 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.
 | Trigger | `/baoyu-article-illustrator` slash command + CLI flags | Natural language skill matching |
 | User config | EXTEND.md (project/user/XDG paths) + first-time-setup | Removed — not part of Hermes infra |
 | User prompts | `AskUserQuestion` (batched, multi-question) | `clarify` tool (one question at a time) |
-| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`) | `image_generate` tool (describes references in prompt text) |
+| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`, writes to local path) | `image_generate` (returns URL only; agent downloads via `terminal`/`curl`) |
+| Backend selection | User picks provider via CLI flags | Not agent-selectable — `image_generate` uses the user-configured FAL model. Removed hardcoded "nano banana pro" line from `prompts/system.md`. |
+| Reference images | Passed to backend via `--ref`, copied via shell | `vision_analyze` extracts a textual description (binary never touched by `write_file`/`read_file`); description is embedded in prompts. Optional `terminal cp` for a local record. |
 | Platform support | Linux/macOS/Windows/WSL/PowerShell | Linux/macOS only |
-| File operations | Bash commands | Hermes file tools (`write_file`, `read_file`) |
+| File operations | Bash commands | Hermes file tools: `write_file`/`read_file` for text, `terminal` for binaries and URL downloads, `vision_analyze` for reading images |
 | Watermark | Driven by EXTEND.md `watermark.enabled` | Optional — user asks for it per-article |
 | Output directory | EXTEND.md `default_output_dir` (imgs-subdir / same-dir / illustrations-subdir / independent) | Defaults based on input type; user overrides in request |

 ### What was preserved

 - Type × Style × Palette three-dimension framework
-- All style definitions (23 files)
-- All palette definitions (4 files)
-- Core reference files (workflow, prompt-construction, styles, style-presets)
-- `prompts/system.md` (generation prompt template)
+- All style definitions (23 files, verbatim)
+- All palette definitions (4 files, verbatim)
+- Core reference files (workflow, prompt-construction, styles, style-presets) — adapted for Hermes tooling
 - Core principles and workflow structure (analyze → confirm → outline → prompts → generate)
 - Prompt-file-as-reproducibility-record discipline
 - Author, version, homepage attribution
@@ -44,4 +45,4 @@ curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu
 diff <(curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-article-illustrator/references/styles/blueprint.md) references/styles/blueprint.md
 ```

-`references/styles/*`, `references/palettes/*`, and `prompts/system.md` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` must be manually merged since they contain Hermes-specific adaptations.
+`references/styles/*` and `references/palettes/*` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` must be manually merged since they contain Hermes-specific adaptations (tool wiring, backend neutrality, removed EXTEND.md references).
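For the upstream sync described above, a sketch that runs the same comparison as the `diff` example against every verbatim style and palette file in one pass. It assumes it is run from the skill root, that `curl` is available, that the palette files are `.md` like `blueprint.md`, and that upstream uses the same relative paths. `check_drift` is a hypothetical helper; `UPSTREAM` defaults to the raw GitHub base from the `diff` example and can be pointed at a mirror.

```bash
#!/usr/bin/env bash
# Sketch only: check_drift is a hypothetical helper. Run from the skill root.
# UPSTREAM defaults to the raw GitHub base used in the diff example; override it to use a mirror.
UPSTREAM="${UPSTREAM:-https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-article-illustrator}"

check_drift() {
  local f status=0
  for f in references/styles/*.md references/palettes/*.md; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    # A file missing upstream yields empty input, so it is reported as changed too.
    if ! curl -fsSL "$UPSTREAM/$f" 2>/dev/null | diff -q - "$f" >/dev/null; then
      echo "changed: $f"
      status=1
    fi
  done
  return "$status"
}
```

A zero exit status means every verbatim file still matches upstream; any `changed:` line is a candidate for a direct overwrite, while the adapted files still need the manual merge.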
--- a/skills/creative/baoyu-article-illustrator/SKILL.md
+++ b/skills/creative/baoyu-article-illustrator/SKILL.md
@@ -91,11 +91,11 @@ If the user asks for a different layout (e.g., images alongside the article, or

 ### Step 1: Detect Reference Images

-If the user supplies reference images (paths pasted inline, attachments, or a list of files):
+If the user supplies reference images (paths pasted inline, attachments, or a URL):

-1. Copy each reference to `{output-dir}/references/NN-ref-{slug}.{ext}` using `write_file`.
-2. Create a sidecar `NN-ref-{slug}.md` describing the reference.
-3. If the user described a reference but can't provide a file path, extract style/palette verbally and record under `references/extracted-style.md` — do NOT add a `references:` field to prompt frontmatter in that case.
+1. For each reference, call `vision_analyze` with the path/URL and a question asking for style, palette, composition, and subject. Record the returned description in `{output-dir}/references/NN-ref-{slug}.md` via `write_file`.
+2. **Do not** try to copy the binary via `write_file` / `read_file` — those are text-only. If you want a local copy for the record, use `terminal` (`cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`). The skill itself never needs to read the binary; it works off the vision description.
+3. Since `image_generate` doesn't take image inputs, the vision description is what gets embedded in prompts during Step 5.

 Full procedures: [references/workflow.md](references/workflow.md#step-1-detect-reference-images).

@@ -156,11 +156,14 @@ For each illustration:

 ### Step 6: Generate Images

-Use the `image_generate` tool with the assembled prompt from each prompt file.
+For each prompt file:

-- Map aspect ratio to `image_generate` format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. For custom ratios, pick the closest named aspect.
-- Generate sequentially through the outline. On failure, auto-retry once.
-- Save each image to `{output-dir}/NN-{type}-{slug}.png`.
+1. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
+2. Call `image_generate(prompt=..., aspect_ratio=...)`. `image_generate` returns a JSON result containing an image URL; it does NOT write to disk and does NOT accept an output path.
+3. Download the returned URL to `{output-dir}/NN-{type}-{slug}.png` via `terminal` (e.g. `curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{url}"`).
+4. On generation failure, auto-retry once.
+
+Note: the underlying image-generation backend is user-configured (default: FAL FLUX 2 Klein 9B) and is NOT agent-selectable via `image_generate`. Do not write model names into prompts expecting them to route.

 ### Step 7: Finalize

@@ -199,3 +202,5 @@ Images: X/N generated
 3. **Don't illustrate metaphors literally** — visualize the underlying concept.
 4. **Prompt files are mandatory** — no image generation without a saved prompt file. The file is what lets you regenerate or switch backends later.
 5. **`image_generate` aspect ratios** — the tool supports `landscape`, `portrait`, and `square`. Custom ratios map to the nearest option.
+6. **`image_generate` returns a URL, not a local file** — always download via `terminal` (`curl`) before inserting local image paths into the article.
+7. **No backend selection from the agent** — `image_generate` uses whatever model the user configured (default: FAL FLUX 2 Klein 9B). Don't write `"use <model> to generate this"` into prompts expecting it to route.
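The aspect mapping leaves "nearest named aspect" undefined. One reasonable reading, sketched as a bash function (`map_aspect` is a hypothetical name): the three named ratios map exactly, and any other `W:H` goes to whichever of `landscape` (16:9), `portrait` (9:16), or `square` (1:1) has the closest width/height ratio on a log scale, so 2:1 and 1:2 are treated symmetrically. The log-scale metric is an assumption, not something the skill specifies.

```bash
#!/usr/bin/env bash
# Sketch only: map_aspect is a hypothetical helper.
# Prints landscape | portrait | square for a W:H ratio; returns 1 on malformed input.
map_aspect() {
  case "$1" in
    16:9) echo landscape; return 0 ;;
    9:16) echo portrait;  return 0 ;;
    1:1)  echo square;    return 0 ;;
  esac
  awk -v r="$1" 'BEGIN {
    n = split(r, p, ":")
    if (n != 2 || !(p[1] + 0 > 0 && p[2] + 0 > 0)) exit 1
    x = log(p[1] / p[2])                        # log of width/height
    names[1] = "square";    vals[1] = 0         # log(1/1)
    names[2] = "landscape"; vals[2] = log(16 / 9)
    names[3] = "portrait";  vals[3] = log(9 / 16)
    for (i = 1; i <= 3; i++) {
      d = x - vals[i]; if (d < 0) d = -d        # distance on the log scale
      if (i == 1 || d < bestd) { best = names[i]; bestd = d }
    }
    print best
  }'
}
```

With this metric `3:2` and `21:9` become `landscape`, `2:3` becomes `portrait`, and `5:4` becomes `square`. A ratio sitting exactly between two candidates (4:3 is the geometric midpoint of 1:1 and 16:9) is decided by floating-point rounding, so pin such cases explicitly if the choice matters.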
--- a/skills/creative/baoyu-article-illustrator/prompts/system.md
+++ b/skills/creative/baoyu-article-illustrator/prompts/system.md
@@ -29,4 +29,4 @@ Create a cartoon-style infographic illustration following these guidelines:

 ---

-Please use nano banana pro to generate the illustration based on the content provided below:
+Generate the illustration based on the content provided below:
--- a/skills/creative/baoyu-article-illustrator/references/workflow.md
+++ b/skills/creative/baoyu-article-illustrator/references/workflow.md
@@ -2,34 +2,39 @@

 ## Step 1: Detect Reference Images

-Check if the user provided reference images. Handle based on input type:
+If the user provides reference images (local path or URL), the goal is to produce **textual descriptions** that can be embedded in prompts — `image_generate` doesn't accept reference-image inputs, and Hermes' text file tools can't read or write binaries.
+
+**Tool rules**:
+
+| Task | Tool | Notes |
+|------|------|-------|
+| Analyze a reference image | `vision_analyze` | Accepts URL or local path. Ask for style, palette, composition, subject. |
+| Write the text description | `write_file` | Sidecar `.md` files only — never try to `write_file` a PNG/JPG. |
+| (Optional) Keep a local copy of the binary | `terminal` | `cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"` — purely for the record; the skill itself doesn't read the binary. |

 | Input Type | Action |
 |------------|--------|
-| Image file path provided | Copy to `{output-dir}/references/` → reference it by description in prompts |
-| Image in conversation (no path) | Ask user (via `clarify`) for a file path or a description |
-| User can't provide path | Extract style/palette verbally → append to prompts (no `references:` frontmatter) |
+| Image file path provided | `vision_analyze` → write sidecar `.md`. Optional `terminal cp` for a local record. |
+| Image URL provided | `vision_analyze` with the URL → write sidecar `.md`. |
+| Image in conversation (no path, no URL) | Ask via `clarify` for a path or URL, or for a verbal description. |
+| User can't provide either | Extract style/palette verbally from the user → write `references/extracted-style.md`. Do NOT add `references:` to prompt frontmatter. |

-**CRITICAL**: Only add a `references:` field to prompt frontmatter if files are ACTUALLY SAVED to the `references/` subdirectory.
+**Procedure** (when a path/URL is available):

-**If user provides a file path**:
-1. Copy to `{output-dir}/references/NN-ref-{slug}.png` using `write_file`
-2. Create description: `{output-dir}/references/NN-ref-{slug}.md`
-3. Verify files exist (via `read_file`) before proceeding
+1. Call `vision_analyze(image_url=..., question="Describe the style, color palette (with hex approximations), composition, and subject so this can be used as a style/palette reference for another illustration.")`.
+2. Write `{output-dir}/references/NN-ref-{slug}.md` via `write_file` with the description.
+3. (Optional) Run `terminal` with `cp` (or `curl -sSL -o ...` for URLs) to keep a local binary copy. Not required by the skill.
+4. Mark the reference in the outline with usage `direct` / `style` / `palette`. In Step 5.1 the description gets appended to the prompt body.

-**If user can't provide a path** (extracted verbally):
-1. Analyze the image visually, extract: colors, style, composition
-2. Create `{output-dir}/references/extracted-style.md` with extracted info
-3. Do NOT add `references:` to prompt frontmatter
-4. Instead, append extracted style/colors directly to prompt text
-
-**Description File Format** (only when file saved):
+**Sidecar File Format**:
 ```yaml
 ---
 ref_id: NN
-filename: NN-ref-{slug}.png
+source: "<original path or URL>"
+local_copy: "NN-ref-{slug}.{ext}" # omit if no copy made
+usage_hint: style                 # direct | style | palette
 ---
-[User's description or auto-generated description]
+[vision_analyze description — colors, style, composition, subject]
 ```

 ---
@@ -80,9 +85,9 @@ Save analysis to `{output-dir}/analysis.md` using `write_file`.
 - Decorative scenes
 - Generic illustrations

-### 2.5 Analyze Reference Images (if saved in Step 1)
+### 2.5 Plan Reference Image Usage (if analyzed in Step 1)

-For each reference image:
+For each reference image (use the `vision_analyze` description from Step 1):

 | Analysis | Description |
 |----------|-------------|
@@ -92,13 +97,13 @@ For each reference image:
 | Style match | Which illustration types/styles align |
 | Usage recommendation | `direct` / `style` / `palette` |

-| Usage | When to Use |
-|-------|-------------|
-| `direct` | Reference matches desired output closely |
-| `style` | Extract visual style characteristics only |
-| `palette` | Extract color scheme only |
+| Usage | When to Use | How it's applied in Step 5.1 |
+|-------|-------------|------------------------------|
+| `direct` | Reference matches desired output closely | Paste the description (composition + subject + style + palette) into the prompt body |
+| `style` | Extract visual style characteristics only | Append style traits to prompt body |
+| `palette` | Extract color scheme only | Append extracted hex colors to prompt body |

-Note: `image_generate` does not accept reference-image inputs. For `direct` usage, describe the reference in the prompt text (composition, subject, palette) rather than passing the file itself.
+Note: `image_generate` does not accept reference-image inputs under any usage type. Everything is mediated through the `vision_analyze` description.

 ---

@@ -255,32 +260,40 @@ For each illustration in the outline:
 8. **Backup rule**: If a prompt file exists, rename to `prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md`

 **CRITICAL - References in Frontmatter**:
-- Only add `references` field if files ACTUALLY EXIST in `{output-dir}/references/` directory
-- If style/palette was extracted verbally (no file), append info to prompt BODY instead
-- Before writing frontmatter, verify the reference file exists
+- Only add `references` field if a sidecar `.md` description exists in `{output-dir}/references/`
+- If style/palette was extracted verbally (no description file), append info to prompt BODY only
+- Before writing frontmatter, confirm the sidecar exists (try `read_file` on the `.md`)

-### 5.1 Process References (if references saved in Step 1)
+### 5.1 Process References (if analyzed in Step 1)

-Since `image_generate` doesn't accept reference-image inputs, convert every reference to a textual description and append it to the prompt body:
+Read the `vision_analyze` description from the sidecar `references/NN-ref-{slug}.md` (via `read_file`) and embed it in the prompt body. `image_generate` never receives the binary.

 | Usage | Action |
 |-------|--------|
-| `direct` | Describe the reference (composition, subject, style, palette) in the prompt body |
-| `style` | Append style traits to prompt: "Style: clean lines, gradient backgrounds..." |
-| `palette` | Append extracted colors to prompt: "Colors: #E8756D coral, #7ECFC0 mint..." |
+| `direct` | Paste the full reference description (composition, subject, style, palette) into the prompt body |
+| `style` | Append only the style traits: "Style: clean lines, gradient backgrounds..." |
+| `palette` | Append only the hex colors: "Colors: #E8756D coral, #7ECFC0 mint..." |

 ---

 ## Step 6: Generate Images

+`image_generate` returns a JSON blob with a URL (`{"success": true, "image": "<url>"}`). It does NOT save a local file, does NOT accept an output path, and does NOT let the agent pick a backend/model. Treat the URL as a temporary artifact and download it explicitly.
+
 For each prompt file:

 1. Read the prompt file (via `read_file`) and extract the assembled prompt
-2. Map the prompt's `ASPECT` to `image_generate`'s format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
-3. Call `image_generate` with the prompt text
-4. **Backup rule**: If an existing image file is present, rename to `NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png` before writing
-5. Save the resulting image to `{output-dir}/NN-{type}-{slug}.png`
-6. On failure, retry once, then log and continue. After each generation, report "Generated X/N".
+2. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
+3. Call `image_generate(prompt=<assembled>, aspect_ratio=<enum>)` and extract the `image` URL from the returned JSON.
+4. **Backup rule**: If `{output-dir}/NN-{type}-{slug}.png` already exists, rename it via `terminal` (`mv "{output-dir}/NN-{type}-{slug}.png" "{output-dir}/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png"`) before writing.
+5. Download the URL via `terminal`:
+   ```bash
+   curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{image_url}"
+   ```
+   If `curl` is unavailable, fall back to `wget -qO "{output-dir}/NN-{type}-{slug}.png" "{image_url}"`.
+6. Verify the file exists and has non-zero size (`terminal`: `test -s "{path}" && echo ok`).
+7. On generation failure, retry `image_generate` once. On download failure, retry `curl` once with a longer timeout. Then log and continue.
+8. After each generation, report "Generated X/N".

 ---
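The backup rule in step 4 can be sketched as a bash function (`backup_if_exists` is a hypothetical name). It assumes the target path ends in `.png` and uses local time for the `YYYYMMDD-HHMMSS` stamp. Two backups of the same file within one second would get the same name, and `mv` would overwrite the earlier one.

```bash
#!/usr/bin/env bash
# Sketch only: backup_if_exists is a hypothetical helper.
# Moves an existing image aside before a new download overwrites it and prints the backup path.
backup_if_exists() {
  local path="$1" stamp backup
  [ -e "$path" ] || return 0                 # nothing to back up
  stamp=$(date +%Y%m%d-%H%M%S)               # YYYYMMDD-HHMMSS, local time
  backup="${path%.png}-backup-${stamp}.png"  # NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png
  mv -- "$path" "$backup" && echo "$backup"
}
```

Running it on `{output-dir}/NN-{type}-{slug}.png` just before the `curl` download keeps the previous render next to the new one, and it prints nothing when there is no earlier image.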