mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
fix(skills): align article-illustrator with real Hermes tool capabilities
Addresses review feedback on #13193:

1. Reference-image flow no longer assumes write_file/read_file handle binaries. vision_analyze produces a textual description; the binary is optionally copied via terminal (cp/curl). The description is what gets embedded in prompts.
2. image_generate's URL-only return is now explicit. Step 6 downloads the returned URL to local disk via terminal (curl -sSL -o ...), then verifies non-zero size before proceeding.
3. Removed "Please use nano banana pro..." line from prompts/system.md: the backend is user-configured and not agent-selectable, so routing hints in the prompt are misleading.

PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the file-ops/backend-selection rows now reflect Hermes' actual tool surface (write_file/read_file for text, terminal for binaries and URL downloads, vision_analyze for reading images).
parent 4bd297094a
commit a93de60b68
4 changed files with 75 additions and 56 deletions
|
|
@ -4,7 +4,7 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.
|
|||
|
||||
## Changes from upstream
|
||||
|
||||
`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` were adapted. The 21 style files, 4 palette files, and `prompts/system.md` are verbatim copies. The `references/config/` directory was removed entirely.
|
||||
`SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` were adapted. The 23 style files and 4 palette files are verbatim copies. The `references/config/` directory was removed entirely.
|
||||
|
||||
### Adaptations
|
||||
|
||||
|
|
@ -14,19 +14,20 @@ Ported from [JimLiu/baoyu-skills](https://github.com/JimLiu/baoyu-skills) v1.57.
|
|||
| Trigger | `/baoyu-article-illustrator` slash command + CLI flags | Natural language skill matching |
|
||||
| User config | EXTEND.md (project/user/XDG paths) + first-time-setup | Removed — not part of Hermes infra |
|
||||
| User prompts | `AskUserQuestion` (batched, multi-question) | `clarify` tool (one question at a time) |
|
||||
| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`) | `image_generate` tool (describes references in prompt text) |
|
||||
| Image generation | `baoyu-imagine` (Bun/TypeScript, multi-provider, accepts `--ref`, writes to local path) | `image_generate` (returns URL only; agent downloads via `terminal`/`curl`) |
|
||||
| Backend selection | User picks provider via CLI flags | Not agent-selectable — `image_generate` uses the user-configured FAL model. Removed hardcoded "nano banana pro" line from `prompts/system.md`. |
|
||||
| Reference images | Passed to backend via `--ref`, copied via shell | `vision_analyze` extracts a textual description (binary never touched by `write_file`/`read_file`); description is embedded in prompts. Optional `terminal cp` for a local record. |
|
||||
| Platform support | Linux/macOS/Windows/WSL/PowerShell | Linux/macOS only |
|
||||
| File operations | Bash commands | Hermes file tools (`write_file`, `read_file`) |
|
||||
| File operations | Bash commands | Hermes file tools: `write_file`/`read_file` for text, `terminal` for binaries and URL downloads, `vision_analyze` for reading images |
|
||||
| Watermark | Driven by EXTEND.md `watermark.enabled` | Optional — user asks for it per-article |
|
||||
| Output directory | EXTEND.md `default_output_dir` (imgs-subdir / same-dir / illustrations-subdir / independent) | Defaults based on input type; user overrides in request |
|
||||
|
||||
### What was preserved
|
||||
|
||||
- Type × Style × Palette three-dimension framework
|
||||
- All style definitions (23 files)
|
||||
- All palette definitions (4 files)
|
||||
- Core reference files (workflow, prompt-construction, styles, style-presets)
|
||||
- `prompts/system.md` (generation prompt template)
|
||||
- All style definitions (23 files, verbatim)
|
||||
- All palette definitions (4 files, verbatim)
|
||||
- Core reference files (workflow, prompt-construction, styles, style-presets) — adapted for Hermes tooling
|
||||
- Core principles and workflow structure (analyze → confirm → outline → prompts → generate)
|
||||
- Prompt-file-as-reproducibility-record discipline
|
||||
- Author, version, homepage attribution
|
||||
|
|
@ -44,4 +45,4 @@ curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu
|
|||
diff <(curl -sL https://raw.githubusercontent.com/JimLiu/baoyu-skills/main/skills/baoyu-article-illustrator/references/styles/blueprint.md) references/styles/blueprint.md
|
||||
```
|
||||
|
||||
`references/styles/*`, `references/palettes/*`, and `prompts/system.md` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, and `references/prompt-construction.md` must be manually merged since they contain Hermes-specific adaptations.
|
||||
`references/styles/*` and `references/palettes/*` can be overwritten directly. `SKILL.md`, `references/workflow.md`, `references/usage.md`, `references/style-presets.md`, `references/styles.md`, `references/prompt-construction.md`, and `prompts/system.md` must be manually merged since they contain Hermes-specific adaptations (tool wiring, backend neutrality, removed EXTEND.md references).
|
||||
|
|
|
|||
|
|
@ -91,11 +91,11 @@ If the user asks for a different layout (e.g., images alongside the article, or
|
|||
|
||||
### Step 1: Detect Reference Images
|
||||
|
||||
If the user supplies reference images (paths pasted inline, attachments, or a list of files):
|
||||
If the user supplies reference images (paths pasted inline, attachments, or a URL):
|
||||
|
||||
1. Copy each reference to `{output-dir}/references/NN-ref-{slug}.{ext}` using `write_file`.
|
||||
2. Create a sidecar `NN-ref-{slug}.md` describing the reference.
|
||||
3. If the user described a reference but can't provide a file path, extract style/palette verbally and record under `references/extracted-style.md` — do NOT add a `references:` field to prompt frontmatter in that case.
|
||||
1. For each reference, call `vision_analyze` with the path/URL and a question asking for style, palette, composition, and subject. Record the returned description in `{output-dir}/references/NN-ref-{slug}.md` via `write_file`.
|
||||
2. **Do not** try to copy the binary via `write_file` / `read_file` — those are text-only. If you want a local copy for the record, use `terminal` (`cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`). The skill itself never needs to read the binary; it works off the vision description.
|
||||
3. Since `image_generate` doesn't take image inputs, the vision description is what gets embedded in prompts during Step 5.
|
||||
|
||||
Full procedures: [references/workflow.md](references/workflow.md#step-1-detect-reference-images).
|
||||
|
||||
|
|
@ -156,11 +156,14 @@ For each illustration:
|
|||
|
||||
### Step 6: Generate Images
|
||||
|
||||
Use the `image_generate` tool with the assembled prompt from each prompt file.
|
||||
For each prompt file:
|
||||
|
||||
- Map aspect ratio to `image_generate` format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. For custom ratios, pick the closest named aspect.
|
||||
- Generate sequentially through the outline. On failure, auto-retry once.
|
||||
- Save each image to `{output-dir}/NN-{type}-{slug}.png`.
|
||||
1. Call `image_generate(prompt=..., aspect_ratio=...)`. `image_generate` returns a JSON result containing an image URL; it does NOT write to disk and does NOT accept an output path.
|
||||
2. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
|
||||
3. Download the returned URL to `{output-dir}/NN-{type}-{slug}.png` via `terminal` (e.g. `curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{url}"`).
|
||||
4. On generation failure, auto-retry once.
|
||||
|
||||
Note: the underlying image-generation backend is user-configured (default: FAL FLUX 2 Klein 9B) and is NOT agent-selectable via `image_generate`. Do not write model names into prompts expecting them to route.
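The aspect mapping above leaves "nearest named aspect" undefined for custom ratios. As a rough sketch of one way to pin it down: the helper name `map_aspect` is hypothetical (it is not part of Hermes), and measuring distance in log space, so that `16:9` and `9:16` sit symmetrically around `1:1`, is an assumption about what "nearest" should mean here.

```bash
# Hypothetical helper: map a "W:H" ratio to landscape / portrait / square.
# Assumption: "nearest" means smallest distance between log(W/H) values.
map_aspect() {
  awk -v r="$1" 'BEGIN {
    # Reject anything that is not two positive numbers separated by ":".
    if (split(r, p, ":") != 2 || p[1] + 0 <= 0 || p[2] + 0 <= 0) exit 1
    x = log(p[1] / p[2])
    wide = log(16 / 9)
    # Absolute distances to 1:1, 16:9 and 9:16 in log space.
    d_square = (x < 0) ? -x : x
    d_land = (x - wide < 0) ? wide - x : x - wide
    d_port = (x + wide < 0) ? -(x + wide) : x + wide
    best = "square"; d = d_square
    if (d_land < d) { best = "landscape"; d = d_land }
    if (d_port < d) { best = "portrait" }
    print best
  }'
}
```

For example, `map_aspect 3:2` prints `landscape`, and a malformed argument makes the function return non-zero so the caller can fall back to asking the user.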
|
||||
|
||||
### Step 7: Finalize
|
||||
|
||||
|
|
@ -199,3 +202,5 @@ Images: X/N generated
|
|||
3. **Don't illustrate metaphors literally** — visualize the underlying concept.
|
||||
4. **Prompt files are mandatory** — no image generation without a saved prompt file. The file is what lets you regenerate or switch backends later.
|
||||
5. **`image_generate` aspect ratios** — the tool supports `landscape`, `portrait`, and `square`. Custom ratios map to the nearest option.
|
||||
6. **`image_generate` returns a URL, not a local file** — always download via `terminal` (`curl`) before inserting local image paths into the article.
|
||||
7. **No backend selection from the agent** — `image_generate` uses whatever model the user configured (default: FAL FLUX 2 Klein 9B). Don't write `"use <model> to generate this"` into prompts expecting it to route.
|
||||
|
|
|
|||
|
|
@ -29,4 +29,4 @@ Create a cartoon-style infographic illustration following these guidelines:
|
|||
|
||||
---
|
||||
|
||||
Please use nano banana pro to generate the illustration based on the content provided below:
|
||||
Generate the illustration based on the content provided below:
|
||||
|
|
|
|||
|
|
@ -2,34 +2,39 @@
|
|||
|
||||
## Step 1: Detect Reference Images
|
||||
|
||||
Check if the user provided reference images. Handle based on input type:
|
||||
If the user provides reference images (local path or URL), the goal is to produce **textual descriptions** that can be embedded in prompts — `image_generate` doesn't accept reference-image inputs, and Hermes' text file tools can't read or write binaries.
|
||||
|
||||
**Tool rules**:
|
||||
|
||||
| Task | Tool | Notes |
|
||||
|------|------|-------|
|
||||
| Analyze a reference image | `vision_analyze` | Accepts URL or local path. Ask for style, palette, composition, subject. |
|
||||
| Write the text description | `write_file` | Sidecar `.md` files only — never try to `write_file` a PNG/JPG. |
|
||||
| (Optional) Keep a local copy of the binary | `terminal` | `cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"` — purely for the record; the skill itself doesn't read the binary. |
|
||||
|
||||
| Input Type | Action |
|
||||
|------------|--------|
|
||||
| Image file path provided | Copy to `{output-dir}/references/` → reference it by description in prompts |
|
||||
| Image in conversation (no path) | Ask user (via `clarify`) for a file path or a description |
|
||||
| User can't provide path | Extract style/palette verbally → append to prompts (no `references:` frontmatter) |
|
||||
| Image file path provided | `vision_analyze` → write sidecar `.md`. Optional `terminal cp` for a local record. |
|
||||
| Image URL provided | `vision_analyze` with the URL → write sidecar `.md`. |
|
||||
| Image in conversation (no path, no URL) | Ask via `clarify` for a path or URL, or for a verbal description. |
|
||||
| User can't provide either | Extract style/palette verbally from the user → write `references/extracted-style.md`. Do NOT add `references:` to prompt frontmatter. |
|
||||
|
||||
**CRITICAL**: Only add a `references:` field to prompt frontmatter if files are ACTUALLY SAVED to the `references/` subdirectory.
|
||||
**Procedure** (when a path/URL is available):
|
||||
|
||||
**If user provides a file path**:
|
||||
1. Copy to `{output-dir}/references/NN-ref-{slug}.png` using `write_file`
|
||||
2. Create description: `{output-dir}/references/NN-ref-{slug}.md`
|
||||
3. Verify files exist (via `read_file`) before proceeding
|
||||
1. Call `vision_analyze(image_url=..., question="Describe the style, color palette (with hex approximations), composition, and subject so this can be used as a style/palette reference for another illustration.")`.
|
||||
2. Write `{output-dir}/references/NN-ref-{slug}.md` via `write_file` with the description.
|
||||
3. (Optional) Run `terminal` with `cp` (or `curl -sSL -o ...` for URLs) to keep a local binary copy. Not required by the skill.
|
||||
4. Mark the reference in the outline with usage `direct` / `style` / `palette`. In Step 5.1 the description gets appended to the prompt body.
|
||||
|
||||
**If user can't provide a path** (extracted verbally):
|
||||
1. Analyze the image visually, extract: colors, style, composition
|
||||
2. Create `{output-dir}/references/extracted-style.md` with extracted info
|
||||
3. Do NOT add `references:` to prompt frontmatter
|
||||
4. Instead, append extracted style/colors directly to prompt text
|
||||
|
||||
**Description File Format** (only when file saved):
|
||||
**Sidecar File Format**:
|
||||
```yaml
|
||||
---
|
||||
ref_id: NN
|
||||
filename: NN-ref-{slug}.png
|
||||
source: "<original path or URL>"
|
||||
local_copy: "NN-ref-{slug}.png" # omit if no copy made
|
||||
usage_hint: style # direct | style | palette
|
||||
---
|
||||
[User's description or auto-generated description]
|
||||
[vision_analyze description — colors, style, composition, subject]
|
||||
```
|
||||
|
||||
---
|
||||
|
|
@ -80,9 +85,9 @@ Save analysis to `{output-dir}/analysis.md` using `write_file`.
|
|||
- Decorative scenes
|
||||
- Generic illustrations
|
||||
|
||||
### 2.5 Analyze Reference Images (if saved in Step 1)
|
||||
### 2.5 Plan Reference Image Usage (if analyzed in Step 1)
|
||||
|
||||
For each reference image:
|
||||
For each reference image (use the `vision_analyze` description from Step 1):
|
||||
|
||||
| Analysis | Description |
|
||||
|----------|-------------|
|
||||
|
|
@ -92,13 +97,13 @@ For each reference image:
|
|||
| Style match | Which illustration types/styles align |
|
||||
| Usage recommendation | `direct` / `style` / `palette` |
|
||||
|
||||
| Usage | When to Use |
|
||||
|-------|-------------|
|
||||
| `direct` | Reference matches desired output closely |
|
||||
| `style` | Extract visual style characteristics only |
|
||||
| `palette` | Extract color scheme only |
|
||||
| Usage | When to Use | How it's applied in Step 5.1 |
|
||||
|-------|-------------|------------------------------|
|
||||
| `direct` | Reference matches desired output closely | Paste the description (composition + subject + style + palette) into the prompt body |
|
||||
| `style` | Extract visual style characteristics only | Append style traits to prompt body |
|
||||
| `palette` | Extract color scheme only | Append extracted hex colors to prompt body |
|
||||
|
||||
Note: `image_generate` does not accept reference-image inputs. For `direct` usage, describe the reference in the prompt text (composition, subject, palette) rather than passing the file itself.
|
||||
Note: `image_generate` does not accept reference-image inputs under any usage type. Everything is mediated through the `vision_analyze` description.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -255,32 +260,40 @@ For each illustration in the outline:
|
|||
8. **Backup rule**: If a prompt file exists, rename to `prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md`
|
||||
|
||||
**CRITICAL - References in Frontmatter**:
|
||||
- Only add `references` field if files ACTUALLY EXIST in `{output-dir}/references/` directory
|
||||
- If style/palette was extracted verbally (no file), append info to prompt BODY instead
|
||||
- Before writing frontmatter, verify the reference file exists
|
||||
- Only add `references` field if a sidecar `.md` description exists in `{output-dir}/references/`
|
||||
- If style/palette was extracted verbally (no description file), append info to prompt BODY only
|
||||
- Before writing frontmatter, confirm the sidecar exists (try `read_file` on the `.md`)
|
||||
|
||||
### 5.1 Process References (if references saved in Step 1)
|
||||
### 5.1 Process References (if analyzed in Step 1)
|
||||
|
||||
Since `image_generate` doesn't accept reference-image inputs, convert every reference to a textual description and append it to the prompt body:
|
||||
Read the `vision_analyze` description from the sidecar `references/NN-ref-{slug}.md` (via `read_file`) and embed it in the prompt body. `image_generate` never receives the binary.
|
||||
|
||||
| Usage | Action |
|
||||
|-------|--------|
|
||||
| `direct` | Describe the reference (composition, subject, style, palette) in the prompt body |
|
||||
| `style` | Append style traits to prompt: "Style: clean lines, gradient backgrounds..." |
|
||||
| `palette` | Append extracted colors to prompt: "Colors: #E8756D coral, #7ECFC0 mint..." |
|
||||
| `direct` | Paste the full reference description (composition, subject, style, palette) into the prompt body |
|
||||
| `style` | Append only the style traits: "Style: clean lines, gradient backgrounds..." |
|
||||
| `palette` | Append only the hex colors: "Colors: #E8756D coral, #7ECFC0 mint..." |
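To make the `palette` row concrete, here is a minimal sketch of pulling the hex colors out of a sidecar when working from `terminal`. The function names `sidecar_body` and `palette_line` are hypothetical, and the sketch assumes the sidecar starts with a `---`-delimited frontmatter block and that colors appear as six-digit `#RRGGBB` codes in the description.

```bash
# Hypothetical helpers for Step 5.1; the names are illustrative only.

# Print the sidecar body: everything after the closing '---' of the frontmatter.
sidecar_body() {
  awk 'n >= 2 { print } /^---$/ { n++ }' "$1"
}

# Build a one-line "Colors: ..." suffix from the six-digit hex codes in the body.
# Returns non-zero when the description contains no hex codes.
palette_line() {
  local colors
  colors=$(sidecar_body "$1" | grep -oE '#[0-9A-Fa-f]{6}' | sort -u | paste -sd' ' -)
  [ -n "$colors" ] && printf 'Colors: %s\n' "$colors"
}
```

The resulting line carries only the hex codes; color names such as "coral" or "mint" would still need to be added by hand from the description.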
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Generate Images
|
||||
|
||||
`image_generate` returns a JSON blob with a URL (`{"success": true, "image": "<url>"}`). It does NOT save a local file, does NOT accept an output path, and does NOT let the agent pick a backend/model. Treat the URL as a temporary artifact and download it explicitly.
|
||||
|
||||
For each prompt file:
|
||||
|
||||
1. Read the prompt file (via `read_file`) and extract the assembled prompt
|
||||
2. Map the prompt's `ASPECT` to `image_generate`'s format: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
|
||||
3. Call `image_generate` with the prompt text
|
||||
4. **Backup rule**: If an existing image file is present, rename to `NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png` before writing
|
||||
5. Save the resulting image to `{output-dir}/NN-{type}-{slug}.png`
|
||||
6. On failure, retry once, then log and continue. After each generation, report "Generated X/N".
|
||||
2. Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
|
||||
3. Call `image_generate(prompt=<assembled>, aspect_ratio=<enum>)` and extract the `image` URL from the returned JSON.
|
||||
4. **Backup rule**: If `{output-dir}/NN-{type}-{slug}.png` already exists, rename it via `terminal` (`mv "{output-dir}/NN-{type}-{slug}.png" "{output-dir}/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png"`) before writing.
|
||||
5. Download the URL via `terminal`:
|
||||
```bash
|
||||
curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{image_url}"
|
||||
```
|
||||
If `curl` is unavailable, fall back to `wget -qO "{output-dir}/NN-{type}-{slug}.png" "{image_url}"`.
|
||||
6. Verify the file exists and has non-zero size (`terminal`: `test -s "{path}" && echo ok`).
|
||||
7. On generation failure, retry `image_generate` once. On download failure, retry `curl` once with a longer timeout. Then log and continue.
|
||||
8. After each generation, report "Generated X/N".
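The backup, download, and verify steps above can be combined into a single `terminal` call. The following is a sketch under stated assumptions, not part of the skill: `save_image` is a hypothetical name, the timeout values are arbitrary, and `-f` is added to the `curl -sSL` flags so that an HTTP error status fails the download instead of saving the error page.

```bash
# Hypothetical helper: back up an existing image, download the URL, verify the result.
# Usage: save_image "<image_url>" "<output-dir>/NN-<type>-<slug>.png"
save_image() {
  local url="$1" dest="$2"
  # Backup rule: move an existing file aside with a timestamp suffix.
  if [ -e "$dest" ]; then
    mv "$dest" "${dest%.png}-backup-$(date +%Y%m%d-%H%M%S).png"
  fi
  # One attempt, then one retry with a longer timeout (both values are assumptions).
  curl -fsSL --max-time 60 -o "$dest" "$url" ||
    curl -fsSL --max-time 180 -o "$dest" "$url" ||
    return 1
  # The file must exist and have non-zero size.
  test -s "$dest"
}
```

A non-zero return tells the agent to log the failure and continue with the next illustration, as the last two steps require.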
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||