fix(skills): align article-illustrator with real Hermes tool capabilities

Addresses review feedback on #13193:

1. Reference-image flow no longer assumes write_file/read_file handle
   binaries. vision_analyze produces a textual description; the binary
   is optionally copied via terminal (cp/curl). The description is what
   gets embedded in prompts.

2. image_generate's URL-only return is now explicit. Step 6 downloads
   the returned URL to local disk via terminal (curl -sSL -o ...), then
   verifies non-zero size before proceeding.

3. Removed "Please use nano banana pro..." line from prompts/system.md —
   the backend is user-configured and not agent-selectable, so routing
   hints in the prompt are misleading.

PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the
file-ops/backend-selection rows now reflect Hermes' actual tool surface
(write_file/read_file for text, terminal for binaries and URL downloads,
vision_analyze for reading images).
Jim Liu 宝玉 2026-04-20 21:28:42 -05:00 committed by Teknium
parent 4bd297094a
commit a93de60b68
4 changed files with 75 additions and 56 deletions


@@ -4,7 +4,7 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.
## Changes from upstream
`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` were adapted. The 21 style files, 4 palette files, and `prompts/system.md` are verbatim copies. The `references/config/` directory was removed entirely.
`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` were adapted. The 23 style files and 4 palette files are verbatim copies. The `references/config/` directory was removed entirely.
### Adaptations
@@ -14,19 +14,20 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.
| Trigger | `/baoyu-article-illustrator` slash command + CLI flags | Natural language skill matching |
| User config | EXTEND.md (project/user/XDG paths) + first-time-setup | Removed — not part of Hermes infra |
| User prompts | `AskUserQuestion` (batched, multi-question) | `clarify` tool (one question at a time) |
| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`) | `image_generate` tool (describes references in prompt text) |
| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`, writes to local path) | `image_generate` (returns URL only; agent downloads via `terminal`/`curl`) |
| Backend selection | User picks provider via CLI flags | Not agent-selectable — `image_generate` uses the user-configured FAL model. Removed hardcoded "nano banana pro" line from `prompts/system.md`. |
| Reference images | Passed to backend via `--ref`, copied via shell | `vision_analyze` extracts a textual description (binary never touched by `write_file`/`read_file`); description is embedded in prompts. Optional `terminal cp` for a local record. |
| Platform support | Linux/macOS/Windows/WSL/PowerShell | Linux/macOS only |
| File operations | Bash commands | Hermes file tools (`write_file`, `read_file`) |
| File operations | Bash commands | Hermes file tools: `write_file`/`read_file` for text, `terminal` for binaries and URL downloads, `vision_analyze` for reading images |
| Watermark | Driven by EXTEND.md `watermark.enabled` | Optional — user asks for it per-article |
| Output directory | EXTEND.md `default_output_dir` (imgs-subdir / same-dir / illustrations-subdir / independent) | Defaults based on input type; user overrides in request |
### What was preserved
- Type × Style × Palette three-dimension framework
- All style definitions (23 files)
- All palette definitions (4 files)
- Core reference files (workflow, prompt-construction, styles, style-presets)
- `prompts/system.md` (generation prompt template)
- All style definitions (23 files, verbatim)
- All palette definitions (4 files, verbatim)
- Core reference files (workflow, prompt-construction, styles, style-presets) — adapted for Hermes tooling
- Core principles and workflow structure (analyze → confirm → outline → prompts → generate)
- Prompt-file-as-reproducibility-record discipline
- Author, version, homepage attribution
@@ -44,4 +45,4 @@ curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu
diff <(curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-article-illustrator/references/styles/blueprint.md) references/styles/blueprint.md
```
`references/styles/*`, `references/palettes/*`, and `prompts/system.md` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` must be manually merged since they contain Hermes-specific adaptations.
`references/styles/*` and `references/palettes/*` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` must be manually merged since they contain Hermes-specific adaptations (tool wiring, backend neutrality, removed EXTEND.md references).


@@ -91,11 +91,11 @@ If the user asks for a different layout (e.g., images alongside the article, or
### Step 1: Detect Reference Images
If the user supplies reference images (paths pasted inline, attachments, or a list of files):
If the user supplies reference images (paths pasted inline, attachments, or a URL):
1. Copy each reference to `{output-dir}/references/NN-ref-{slug}.{ext}` using `write_file`.
2. Create a sidecar `NN-ref-{slug}.md` describing the reference.
3. If the user described a reference but can't provide a file path, extract style/palette verbally and record under `references/extracted-style.md` — do NOT add a `references:` field to prompt frontmatter in that case.
1. For each reference, call `vision_analyze` with the path/URL and a question asking for style, palette, composition, and subject. Record the returned description in `{output-dir}/references/NN-ref-{slug}.md` via `write_file`.
2. **Do not** try to copy the binary via `write_file` / `read_file` — those are text-only. If you want a local copy for the record, use `terminal` (`cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`). The skill itself never needs to read the binary; it works off the vision description.
3. Since `image_generate` doesn't take image inputs, the vision description is what gets embedded in prompts during Step 5.
Full procedures: [references/workflow.md](references/workflow.md#step-1-detect-reference-images).
@@ -156,11 +156,14 @@ For each illustration:
### Step 6: Generate Images
Use the `image_generate` tool with the assembled prompt from each prompt file.
For each prompt file:
- Map aspect ratio to `image_generate` format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. For custom ratios, pick the closest named aspect.
- Generate sequentially through the outline. On failure, auto-retry once.
- Save each image to `{output-dir}/NN-{type}-{slug}.png`.
1. Call `image_generate(prompt=..., aspect_ratio=...)`. `image_generate` returns a JSON result containing an image URL; it does NOT write to disk and does NOT accept an output path.
2. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
3. Download the returned URL to `{output-dir}/NN-{type}-{slug}.png` via `terminal` (e.g. `curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{url}"`).
4. On generation failure, auto-retry once.
Note: the underlying image-generation backend is user-configured (default: FAL FLUX 2 Klein 9B) and is NOT agent-selectable via `image_generate`. Do not write model names into prompts expecting them to route.
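The aspect mapping in step 2 can be sketched as a shell function for use via `terminal`. This is a minimal sketch and not part of the skill: `nearest_aspect` is a hypothetical helper name, and choosing the nearest named aspect by distance in log space is an assumption about what "nearest" should mean for custom ratios.

```bash
# Hypothetical helper (not part of the skill): map a "W:H" ratio to the
# nearest of image_generate's named aspects: landscape, portrait, square.
nearest_aspect() {
  local w="${1%%:*}" h="${1##*:}"
  awk -v w="$w" -v h="$h" 'BEGIN {
    if (w <= 0 || h <= 0) { exit 1 }
    r = log(w / h)
    wide = log(16 / 9)
    # Compare in log space so 16:9 and 9:16 sit symmetrically around 1:1.
    dl = r - wide; if (dl < 0) dl = -dl   # distance to landscape (16:9)
    dp = r + wide; if (dp < 0) dp = -dp   # distance to portrait (9:16)
    ds = r;        if (ds < 0) ds = -ds   # distance to square (1:1)
    best = "square"; d = ds
    if (dl < d) { best = "landscape"; d = dl }
    if (dp < d) { best = "portrait" }
    print best
  }'
}
```

For example, `nearest_aspect 3:2` prints `landscape` and `nearest_aspect 5:4` prints `square`. The result is what would be passed as `aspect_ratio` to `image_generate`.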
### Step 7: Finalize
@@ -199,3 +202,5 @@ Images: X/N generated
3. **Don't illustrate metaphors literally** — visualize the underlying concept.
4. **Prompt files are mandatory** — no image generation without a saved prompt file. The file is what lets you regenerate or switch backends later.
5. **`image_generate` aspect ratios** — the tool supports `landscape`, `portrait`, and `square`. Custom ratios map to the nearest option.
6. **`image_generate` returns a URL, not a local file** — always download via `terminal` (`curl`) before inserting local image paths into the article.
7. **No backend selection from the agent** — `image_generate` uses whatever model the user configured (default: FAL FLUX 2 Klein 9B). Don't write `"use <model> to generate this"` into prompts expecting it to route.


@@ -29,4 +29,4 @@ Create a cartoon-style infographic illustration following these guidelines:
---
Please use nano banana pro to generate the illustration based on the content provided below:
Generate the illustration based on the content provided below:


@@ -2,34 +2,39 @@
## Step 1: Detect Reference Images
Check if the user provided reference images. Handle based on input type:
If the user provides reference images (local path or URL), the goal is to produce **textual descriptions** that can be embedded in prompts — `image_generate` doesn't accept reference-image inputs, and Hermes' text file tools can't read or write binaries.
**Tool rules**:
| Task | Tool | Notes |
|------|------|-------|
| Analyze a reference image | `vision_analyze` | Accepts URL or local path. Ask for style, palette, composition, subject. |
| Write the text description | `write_file` | Sidecar `.md` files only — never try to `write_file` a PNG/JPG. |
| (Optional) Keep a local copy of the binary | `terminal` | `cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"` — purely for the record; the skill itself doesn't read the binary. |
| Input Type | Action |
|------------|--------|
| Image file path provided | Copy to `{output-dir}/references/` → reference it by description in prompts |
| Image in conversation (no path) | Ask user (via `clarify`) for a file path or a description |
| User can't provide path | Extract style/palette verbally → append to prompts (no `references:` frontmatter) |
| Image file path provided | `vision_analyze` → write sidecar `.md`. Optional `terminal cp` for a local record. |
| Image URL provided | `vision_analyze` with the URL → write sidecar `.md`. |
| Image in conversation (no path, no URL) | Ask via `clarify` for a path or URL, or for a verbal description. |
| User can't provide either | Extract style/palette verbally from the user → write `references/extracted-style.md`. Do NOT add `references:` to prompt frontmatter. |
**CRITICAL**: Only add a `references:` field to prompt frontmatter if files are ACTUALLY SAVED to the `references/` subdirectory.
**Procedure** (when a path/URL is available):
**If user provides a file path**:
1. Copy to `{output-dir}/references/NN-ref-{slug}.png` using `write_file`
2. Create description: `{output-dir}/references/NN-ref-{slug}.md`
3. Verify files exist (via `read_file`) before proceeding
1. Call `vision_analyze(image_url=..., question="Describe the style, color palette (with hex approximations), composition, and subject so this can be used as a style/palette reference for another illustration.")`.
2. Write `{output-dir}/references/NN-ref-{slug}.md` via `write_file` with the description.
3. (Optional) Run `terminal` with `cp` (or `curl -sSL -o ...` for URLs) to keep a local binary copy. Not required by the skill.
4. Mark the reference in the outline with usage `direct` / `style` / `palette`. In Step 5.1 the description gets appended to the prompt body.
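The optional local copy in step 3 of the procedure might look like the following when run through `terminal`. This is a sketch under assumptions: `keep_reference_copy` is a hypothetical helper name, and the `-f` flag (fail on HTTP errors) is an addition to the `curl -sSL -o ...` form named above.

```bash
# Hypothetical helper (not part of the skill): keep a local binary copy of a
# reference image. The skill never reads this file; it is only for the record.
keep_reference_copy() {
  local src="$1" dest="$2"
  mkdir -p "$(dirname "$dest")"
  case "$src" in
    # Remote reference: download it.
    http://*|https://*) curl -fsSL -o "$dest" "$src" ;;
    # Local reference: plain copy.
    *) cp "$src" "$dest" ;;
  esac
}
```

Usage would be along the lines of `keep_reference_copy "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`, with the placeholders filled in by the agent.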
**If user can't provide a path** (extracted verbally):
1. Analyze the image visually, extract: colors, style, composition
2. Create `{output-dir}/references/extracted-style.md` with extracted info
3. Do NOT add `references:` to prompt frontmatter
4. Instead, append extracted style/colors directly to prompt text
**Description File Format** (only when file saved):
**Sidecar File Format**:
```yaml
---
ref_id: NN
filename: NN-ref-{slug}.png
source: "<original path or URL>"
local_copy: "NN-ref-{slug}.png" # omit if no copy made
usage_hint: style # direct | style | palette
---
[User's description or auto-generated description]
[vision_analyze description — colors, style, composition, subject]
```
---
@@ -80,9 +85,9 @@ Save analysis to `{output-dir}/analysis.md` using `write_file`.
- Decorative scenes
- Generic illustrations
### 2.5 Analyze Reference Images (if saved in Step 1)
### 2.5 Plan Reference Image Usage (if analyzed in Step 1)
For each reference image:
For each reference image (use the `vision_analyze` description from Step 1):
| Analysis | Description |
|----------|-------------|
@@ -92,13 +97,13 @@ For each reference image:
| Style match | Which illustration types/styles align |
| Usage recommendation | `direct` / `style` / `palette` |
| Usage | When to Use |
|-------|-------------|
| `direct` | Reference matches desired output closely |
| `style` | Extract visual style characteristics only |
| `palette` | Extract color scheme only |
| Usage | When to Use | How it's applied in Step 5.1 |
|-------|-------------|------------------------------|
| `direct` | Reference matches desired output closely | Paste the description (composition + subject + style + palette) into the prompt body |
| `style` | Extract visual style characteristics only | Append style traits to prompt body |
| `palette` | Extract color scheme only | Append extracted hex colors to prompt body |
Note: `image_generate` does not accept reference-image inputs. For `direct` usage, describe the reference in the prompt text (composition, subject, palette) rather than passing the file itself.
Note: `image_generate` does not accept reference-image inputs under any usage type. Everything is mediated through the `vision_analyze` description.
---
@@ -255,32 +260,40 @@ For each illustration in the outline:
8. **Backup rule**: If a prompt file exists, rename to `prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md`
**CRITICAL - References in Frontmatter**:
- Only add `references` field if files ACTUALLY EXIST in `{output-dir}/references/` directory
- If style/palette was extracted verbally (no file), append info to prompt BODY instead
- Before writing frontmatter, verify the reference file exists
- Only add `references` field if a sidecar `.md` description exists in `{output-dir}/references/`
- If style/palette was extracted verbally (no description file), append info to prompt BODY only
- Before writing frontmatter, confirm the sidecar exists (try `read_file` on the `.md`)
### 5.1 Process References (if references saved in Step 1)
### 5.1 Process References (if analyzed in Step 1)
Since `image_generate` doesn't accept reference-image inputs, convert every reference to a textual description and append it to the prompt body:
Read the `vision_analyze` description from the sidecar `references/NN-ref-{slug}.md` (via `read_file`) and embed it in the prompt body. `image_generate` never receives the binary.
| Usage | Action |
|-------|--------|
| `direct` | Describe the reference (composition, subject, style, palette) in the prompt body |
| `style` | Append style traits to prompt: "Style: clean lines, gradient backgrounds..." |
| `palette` | Append extracted colors to prompt: "Colors: #E8756D coral, #7ECFC0 mint..." |
| `direct` | Paste the full reference description (composition, subject, style, palette) into the prompt body |
| `style` | Append only the style traits: "Style: clean lines, gradient backgrounds..." |
| `palette` | Append only the hex colors: "Colors: #E8756D coral, #7ECFC0 mint..." |
---
## Step 6: Generate Images
`image_generate` returns a JSON blob with a URL (`{"success": true, "image": "<url>"}`). It does NOT save a local file, does NOT accept an output path, and does NOT let the agent pick a backend/model. Treat the URL as a temporary artifact and download it explicitly.
For each prompt file:
1. Read the prompt file (via `read_file`) and extract the assembled prompt
2. Map the prompt's `ASPECT` to `image_generate`'s format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
3. Call `image_generate` with the prompt text
4. **Backup rule**: If an existing image file is present, rename to `NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png` before writing
5. Save the resulting image to `{output-dir}/NN-{type}-{slug}.png`
6. On failure, retry once, then log and continue. After each generation, report "Generated X/N".
2. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
3. Call `image_generate(prompt=<assembled>, aspect_ratio=<enum>)` and extract the `image` URL from the returned JSON.
4. **Backup rule**: If `{output-dir}/NN-{type}-{slug}.png` already exists, rename it via `terminal` (`mv "{output-dir}/NN-{type}-{slug}.png" "{output-dir}/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png"`) before writing.
5. Download the URL via `terminal`:
```bash
curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{image_url}"
```
If `curl` is unavailable, fall back to `wget -qO "{output-dir}/NN-{type}-{slug}.png" "{image_url}"`.
6. Verify the file exists and has non-zero size (`terminal`: `test -s "{path}" && echo ok`).
7. On generation failure, retry `image_generate` once. On download failure, retry `curl` once with a longer timeout. Then log and continue.
8. After each generation, report "Generated X/N".
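The backup, download, retry, and verify steps above could be combined into one `terminal` invocation roughly as follows. This is a minimal sketch, not the skill's implementation: `save_image` is a hypothetical helper name, the timeout values are illustrative, and `-f` (fail on HTTP errors) is an addition to the `curl -sSL` flags used above.

```bash
# Hypothetical helper (not part of the skill): save_image URL DEST
# Applies the backup rule, downloads the URL, retries the download once with
# a longer timeout, and verifies that the result is a non-empty file.
save_image() {
  local url="$1" dest="$2"
  mkdir -p "$(dirname "$dest")"
  # Backup rule: keep any earlier image under a timestamped name.
  if [ -e "$dest" ]; then
    mv "$dest" "${dest%.png}-backup-$(date +%Y%m%d-%H%M%S).png"
  fi
  # First attempt, then one retry with a longer timeout (values are illustrative).
  curl -fsSL --max-time 60 -o "$dest" "$url" ||
    curl -fsSL --max-time 180 -o "$dest" "$url" ||
    return 1
  # Succeeds only if the file exists and has non-zero size.
  test -s "$dest"
}
```

A non-zero exit status from `save_image` corresponds to the "log and continue" branch in step 7. Retrying `image_generate` itself stays with the agent, since that is a tool call rather than a shell command.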
---