mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-02 07:11:49 +00:00

Jim Liu 宝玉 a93de60b68 fix(skills): align article-illustrator with real Hermes tool capabilities

Addresses review feedback on #13193:

1. Reference-image flow no longer assumes write_file/read_file handle
   binaries. vision_analyze produces a textual description; the binary
   is optionally copied via terminal (cp/curl). The description is what
   gets embedded in prompts.

2. image_generate's URL-only return is now explicit. Step 6 downloads
   the returned URL to local disk via terminal (curl -sSL -o ...), then
   verifies non-zero size before proceeding.

3. Removed "Please use nano banana pro..." line from prompts/system.md —
   the backend is user-configured and not agent-selectable, so routing
   hints in the prompt are misleading.

PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the
file-ops/backend-selection rows now reflect Hermes' actual tool surface
(write_file/read_file for text, terminal for binaries and URL downloads,
vision_analyze for reading images).

2026-05-18 18:28:56 -07:00


Detailed Workflow Procedures

Step 1: Detect Reference Images

If the user provides reference images (local path or URL), the goal is to produce textual descriptions that can be embedded in prompts — image_generate doesn't accept reference-image inputs, and Hermes' text file tools can't read or write binaries.

Tool rules:

| Task | Tool | Notes |
|------|------|-------|
| Analyze a reference image | `vision_analyze` | Accepts URL or local path. Ask for style, palette, composition, subject. |
| Write the text description | `write_file` | Sidecar `.md` files only; never try to `write_file` a PNG/JPG. |
| (Optional) Keep a local copy of the binary | `terminal` | `cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`, purely for the record; the skill itself doesn't read the binary. |

| Input Type | Action |
|------------|--------|
| Image file path provided | `vision_analyze` → write sidecar `.md`. Optional `terminal cp` for a local record. |
| Image URL provided | `vision_analyze` with the URL → write sidecar `.md`. |
| Image in conversation (no path, no URL) | Ask via `clarify` for a path or URL, or for a verbal description. |
| User can't provide either | Extract style/palette verbally from the user → write `references/extracted-style.md`. Do NOT add `references:` to prompt frontmatter. |

Procedure (when a path/URL is available):

1. Call vision_analyze(image_url=..., question="Describe the style, color palette (with hex approximations), composition, and subject so this can be used as a style/palette reference for another illustration.").
2. Write {output-dir}/references/NN-ref-{slug}.md via write_file with the description.
3. (Optional) Run terminal with cp (or curl -sSL -o ... for URLs) to keep a local binary copy. Not required by the skill.
4. Mark the reference in the outline with usage direct / style / palette. In Step 5.1 the description gets appended to the prompt body.

Sidecar File Format:

---
ref_id: NN
source: "<original path or URL>"
local_copy: "NN-ref-{slug}.png"   # omit if no copy made
usage_hint: style                 # direct | style | palette
---
[vision_analyze description — colors, style, composition, subject]
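The sidecar format above can be assembled as plain text before it is handed to write_file. The following is a minimal sketch, not part of the skill: the helper name `render_sidecar` and its parameters are hypothetical, and it assumes `source` contains no double quotes that would need YAML escaping.

```python
from typing import Optional


def render_sidecar(ref_id: int, source: str, description: str,
                   usage_hint: str = "style",
                   local_copy: Optional[str] = None) -> str:
    """Build the text of a references/NN-ref-{slug}.md sidecar (hypothetical helper)."""
    if usage_hint not in ("direct", "style", "palette"):
        raise ValueError(f"unknown usage_hint: {usage_hint!r}")
    lines = ["---", f"ref_id: {ref_id:02d}", f'source: "{source}"']
    if local_copy:
        # The key is omitted entirely when no binary copy was made.
        lines.append(f'local_copy: "{local_copy}"')
    lines += [f"usage_hint: {usage_hint}", "---", description.strip(), ""]
    return "\n".join(lines)
```

The returned string is what would be passed as the file content; the binary itself never goes through this path.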

Step 2: Analyze

2.1 Determine Output Directory

| Input | Output Directory | Source-save path |
|-------|------------------|------------------|
| Article file path | `{article-dir}/imgs/` (default) | None (read the article via `read_file`) |
| Pasted content | `illustrations/{topic-slug}/` (cwd) | `source-{slug}.{ext}` (save via `write_file`) |

If the user explicitly asked for a different layout (e.g., images in the article's folder, or an illustrations/ subdirectory), honor that.
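As an illustration of the default rule in the table, a small path helper might look like the sketch below. The function name `resolve_output_dir` and its arguments are assumptions made for this example; user-requested layouts would override its result.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_output_dir(article_path: Optional[str] = None,
                       topic_slug: Optional[str] = None) -> PurePosixPath:
    """Return the default output directory for the two input kinds (hypothetical helper)."""
    if article_path:
        # Article on disk: images go into an imgs/ folder next to the article.
        return PurePosixPath(article_path).parent / "imgs"
    if topic_slug:
        # Pasted content: images go under illustrations/ relative to the cwd.
        return PurePosixPath("illustrations") / topic_slug
    raise ValueError("need either an article path or a topic slug")
```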

2.2 Analyze Content

| Analysis | Description |
|----------|-------------|
| Content type | Technical / Tutorial / Methodology / Narrative |
| Illustration purpose | information / visualization / imagination |
| Core arguments | 2-5 main points to visualize |
| Visual opportunities | Positions where illustrations add value |
| Recommended type | Based on content signals and purpose |
| Recommended density | Based on length and complexity |

Save analysis to {output-dir}/analysis.md using write_file.

2.3 Extract Core Arguments

- Main thesis
- Key concepts the reader needs
- Comparisons and contrasts
- Framework or model proposed

CRITICAL: If the article uses metaphors (e.g., "电锯切西瓜", literally "cutting a watermelon with a chainsaw"), do NOT illustrate them literally. Visualize the underlying concept.

2.4 Identify Positions

Illustrate:

- Core arguments (REQUIRED)
- Abstract concepts
- Data comparisons
- Processes, workflows

Do NOT Illustrate:

- Metaphors literally
- Decorative scenes
- Generic illustrations

2.5 Plan Reference Image Usage (if analyzed in Step 1)

For each reference image (use the vision_analyze description from Step 1):

| Analysis | Description |
|----------|-------------|
| Visual characteristics | Style, colors, composition |
| Content/subject | What the reference depicts |
| Suitable positions | Which sections match this reference |
| Style match | Which illustration types/styles align |
| Usage recommendation | `direct` / `style` / `palette` |

| Usage | When to Use | How it's applied in Step 5.1 |
|-------|-------------|------------------------------|
| `direct` | Reference matches desired output closely | Paste the description (composition + subject + style + palette) into the prompt body |
| `style` | Extract visual style characteristics only | Append style traits to prompt body |
| `palette` | Extract color scheme only | Append extracted hex colors to prompt body |

Note: image_generate does not accept reference-image inputs under any usage type. Everything is mediated through the vision_analyze description.

Step 3: Confirm Settings

Use the clarify tool. Since clarify handles one question at a time, ask the most important question first. Skip any question the user already answered in their request.

Q1: Preset or Type (highest priority)

Based on the Step 2 content analysis, recommend a preset first (a preset sets both type and style). Look it up in the "Content Type → Preset Recommendations" table of style-presets.md.

- [Recommended preset]: [brief: type + style + why]
- [Alternative preset]: [brief]
- Or choose type manually: infographic / scene / flowchart / comparison / framework / timeline / mixed

If user picks a preset → skip Q3 (type & style both resolved). If user picks a type → Q3 is required.

Q2: Density

- minimal (1-2): Core concepts only
- balanced (3-5): Major sections
- per-section: At least 1 per section/chapter (Recommended)
- rich (6+): Comprehensive coverage

Q3: Style (skip if preset chosen in Q1)

Present Core Styles first:

- [Best compatible core style] (Recommended)
- [Other compatible core style 1]
- [Other compatible core style 2]
- Other (see full Style Gallery)

Core Styles (simplified selection):

| Core Style | Maps To | Best For |
|------------|---------|----------|
| `minimal-flat` | notion | General, knowledge sharing, SaaS |
| `sci-fi` | blueprint | AI, frontier tech, system design |
| `hand-drawn` | sketch/warm | Relaxed, reflective, casual |
| `editorial` | editorial | Processes, data, journalism |
| `scene` | warm/watercolor | Narratives, emotional, lifestyle |
| `poster` | screen-print | Opinion, editorial, cultural, cinematic |

Style selection is based on the Type × Style compatibility matrix (styles.md). In Step 5, read styles/<style>.md for visual elements and rendering rules.

Q4: Palette (optional)

If the preset did not specify a palette, offer:

- Default (use style's built-in colors) (Recommended)
- macaron: soft pastel blocks on warm cream
- warm: warm earth tones, no cool colors
- neon: vibrant neon on dark backgrounds

Skip if: preset already resolved palette, or user specified a palette in the request.

See Palette Gallery in styles.md and full specs in palettes/<palette>.md.

Q5: Image Text Language (only when ambiguous)

If the article language is different from the user's conversational language, ask which to use:

- Article language (match article content) (Recommended)
- User's conversational language

Skip if: languages match, or the user already specified in the request.
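Because clarify handles one question at a time, the skip rules for Q1 through Q5 amount to a small decision function. The sketch below is one possible reading of those rules, not the skill's implementation; the `answers` dictionary keys (`preset`, `type`, `density`, `style`, `palette`, `language`) and the two flags are hypothetical names chosen for this example.

```python
from typing import Optional


def next_question(answers: dict, preset_sets_palette: bool = False,
                  languages_match: bool = True) -> Optional[str]:
    """Return the next question to ask via clarify, or None when all are settled.

    `answers` holds whatever is already known from the request or earlier replies.
    """
    has_preset = "preset" in answers
    if not has_preset and "type" not in answers:
        return "Q1"  # preset or type, highest priority
    if "density" not in answers:
        return "Q2"
    if not has_preset and "style" not in answers:
        return "Q3"  # only needed when a type was picked manually
    if "palette" not in answers and not (has_preset and preset_sets_palette):
        return "Q4"
    if not languages_match and "language" not in answers:
        return "Q5"
    return None
```

Calling it again after each reply, with the new answer added to `answers`, walks through the remaining questions in priority order.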

Display Reference Usage (if references saved in Step 1)

When presenting the outline preview to the user, show reference assignments:

Reference Images:
| Ref | Filename | Recommended Usage |
|-----|----------|-------------------|
| 01 | 01-ref-diagram.png | direct → Illustration 1, 3 |
| 02 | 02-ref-chart.png | palette → Illustration 2 |

Step 4: Generate Outline

Save as {output-dir}/outline.md using write_file:

---
type: infographic
density: balanced
style: blueprint
image_count: 4
references:                    # Only if references provided
  - ref_id: 01
    filename: 01-ref-diagram.png
    description: "Technical diagram showing system architecture"
  - ref_id: 02
    filename: 02-ref-chart.png
    description: "Color chart with brand palette"
---

## Illustration 1

**Position**: [section] / [paragraph]
**Purpose**: [why this helps]
**Visual Content**: [what to show]
**Type Application**: [how type applies]
**References**: [01]                    # Optional: list ref_ids used
**Reference Usage**: direct             # direct | style | palette
**Filename**: 01-infographic-concept-name.png

## Illustration 2
...

Backup rule: If outline.md exists, rename to outline-backup-YYYYMMDD-HHMMSS.md before writing.
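The backup naming pattern is the same for outlines, prompts, and images, so it can be sketched once. The helper name `backup_path` is hypothetical, and the actual rename would still go through the agent's tools (for example `mv` in terminal); this only computes the target name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def backup_path(path: str, now: datetime) -> str:
    """Return the name-backup-YYYYMMDD-HHMMSS.ext sibling of `path` (hypothetical helper)."""
    p = PurePosixPath(path)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Insert the timestamp between the stem and the original suffix.
    return str(p.with_name(f"{p.stem}-backup-{stamp}{p.suffix}"))
```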

Requirements:

- Each position justified by content needs
- Type applied consistently
- Style reflected in descriptions
- Count matches density
- References assigned based on Step 2.5 analysis
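The "count matches density" requirement can be checked mechanically against the ranges offered in Q2. This is a sketch under assumptions: the function name and the `section_count` parameter are hypothetical, and "per-section" is read here as at least one image per section.

```python
def count_matches_density(density: str, image_count: int,
                          section_count: int = 0) -> bool:
    """Check an outline's image_count against the Q2 density ranges (hypothetical helper)."""
    if density == "minimal":
        return 1 <= image_count <= 2
    if density == "balanced":
        return 3 <= image_count <= 5
    if density == "per-section":
        # Assumed reading: at least one illustration for every section.
        return section_count > 0 and image_count >= section_count
    if density == "rich":
        return image_count >= 6
    raise ValueError(f"unknown density: {density!r}")
```

For the sample outline above (density: balanced, image_count: 4) the check passes.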

Step 5: Generate Prompts

BLOCKING: Every illustration must have a saved prompt file before any image is generated.

For each illustration in the outline:

1. Create prompt file: {output-dir}/prompts/NN-{type}-{slug}.md via write_file

2. Include YAML frontmatter:

---
illustration_id: 01
type: infographic
style: custom-flat-vector
---

3. Load style specs: Read styles/<style>.md (via read_file) for visual elements, style rules, and rendering instructions
4. Load palette specs (if palette specified): Read palettes/<palette>.md for colors and background. Palette colors replace the style's default Color Palette. If no palette is specified, use the style's built-in colors.
5. Follow the type-specific template from prompt-construction.md, using rendering from style + colors from palette (or style default)
6. Prompt quality requirements (all REQUIRED):
- Layout: Describe overall composition (grid / radial / hierarchical / left-right / top-down)
- ZONES: Describe each visual area with specific content, not vague descriptions
- LABELS: Use actual numbers, terms, metrics, quotes from the article, NOT generic placeholders
- COLORS: Specify hex codes from palette (or style default) with semantic meaning
- STYLE: Describe line treatment, texture, mood, character rendering per style rules
- ASPECT: Specify ratio (e.g., 16:9)
7. Apply defaults: composition requirements, character rendering, text guidelines
8. Backup rule: If a prompt file exists, rename it to prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md before writing
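Since every prompt must cover all six quality items before generation, a quick check can flag an incomplete prompt. The sketch below assumes each item appears in the prompt body as a labelled line such as `ZONES: ...` (optionally behind a list bullet); the source does not specify that layout, and the helper name `missing_sections` is hypothetical.

```python
import re
from typing import List

REQUIRED = ("LAYOUT", "ZONES", "LABELS", "COLORS", "STYLE", "ASPECT")


def missing_sections(prompt_body: str) -> List[str]:
    """List required labels that have no 'LABEL:' line in the prompt body."""
    # Match a word followed by a colon at the start of a line,
    # optionally preceded by a "-" or "*" list bullet.
    pattern = r"^[ \t]*(?:[-*][ \t]*)?([A-Za-z]+)[ \t]*:"
    found = {m.upper() for m in re.findall(pattern, prompt_body, flags=re.MULTILINE)}
    return [name for name in REQUIRED if name not in found]
```

An empty result means all six labels are present; it says nothing about whether their content is specific enough, which still needs judgment.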

CRITICAL - References in Frontmatter:

- Only add the references field if a sidecar .md description exists in {output-dir}/references/
- If style/palette was extracted verbally (no description file), append info to prompt BODY only
- Before writing frontmatter, confirm the sidecar exists (try read_file on the .md)

5.1 Process References (if analyzed in Step 1)

Read the vision_analyze description from the sidecar references/NN-ref-{slug}.md (via read_file) and embed it in the prompt body. image_generate never receives the binary.

| Usage | Action |
|-------|--------|
| `direct` | Paste the full reference description (composition, subject, style, palette) into the prompt body |
| `style` | Append only the style traits: "Style: clean lines, gradient backgrounds..." |
| `palette` | Append only the hex colors: "Colors: #E8756D coral, #7ECFC0 mint..." |
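The three usage modes differ only in how much of the sidecar description reaches the prompt body. The sketch below shows one way that selection could work. It assumes the vision_analyze description was saved with a labelled `Style:` line, which the source does not guarantee, and the helper name `reference_snippet` is hypothetical.

```python
import re


def reference_snippet(usage: str, description: str) -> str:
    """Pick the part of a sidecar description to append to the prompt body."""
    if usage == "direct":
        return description.strip()
    if usage == "palette":
        # Collect six-digit hex colors in order of first appearance, without repeats.
        colors = list(dict.fromkeys(re.findall(r"#[0-9A-Fa-f]{6}\b", description)))
        return "Colors: " + ", ".join(colors)
    if usage == "style":
        # Assumption: style traits sit on lines that start with "Style:".
        lines = [ln.strip() for ln in description.splitlines()
                 if ln.strip().lower().startswith("style:")]
        return "\n".join(lines)
    raise ValueError(f"unknown usage: {usage!r}")
```

When a description has no labelled style line, the agent would have to summarize the style traits itself instead of relying on this kind of string matching.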

Step 6: Generate Images

image_generate returns a JSON blob with a URL ({"success": true, "image": "<url>"}). It does NOT save a local file, does NOT accept an output path, and does NOT let the agent pick a backend/model. Treat the URL as a temporary artifact and download it explicitly.

For each prompt file:

1. Read the prompt file (via read_file) and extract the assembled prompt.
2. Map the prompt's ASPECT to image_generate's enum: 16:9 → landscape, 9:16 → portrait, 1:1 → square. Custom ratios → nearest named aspect.
3. Call image_generate(prompt=<assembled>, aspect_ratio=<enum>) and extract the image URL from the returned JSON.
4. Backup rule: If {output-dir}/NN-{type}-{slug}.png already exists, rename it via terminal (mv "{output-dir}/NN-{type}-{slug}.png" "{output-dir}/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png") before downloading.
5. Download the URL via terminal:
```
curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{image_url}"
```
If curl is unavailable, fall back to wget -qO "{output-dir}/NN-{type}-{slug}.png" "{image_url}".
6. Verify the file exists and has non-zero size (terminal: test -s "{path}" && echo ok).
7. On generation failure, retry image_generate once. On download failure, retry curl once with a longer timeout. If the retry also fails, log the failure and continue with the next illustration.
8. After each generation, report "Generated X/N".
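The aspect mapping in the steps above can be sketched as a small function. The source does not define "nearest" for custom ratios; this example assumes distance on a logarithmic scale, so that 2:1 and 1:2 are treated symmetrically. The function name `to_aspect_enum` is hypothetical, and ratios that sit on the boundary between two named aspects may fall either way.

```python
import math

# Named aspects from the mapping above, expressed as width / height.
NAMED = {"landscape": 16 / 9, "square": 1.0, "portrait": 9 / 16}


def to_aspect_enum(aspect: str) -> str:
    """Map a 'W:H' ratio string to the nearest named aspect (log-scale distance)."""
    w, h = (float(part) for part in aspect.split(":"))
    if w <= 0 or h <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect!r}")
    target = math.log(w / h)
    return min(NAMED, key=lambda name: abs(math.log(NAMED[name]) - target))
```

The three ratios named in the workflow map to themselves; something like 3:2 lands on landscape and 4:5 on square.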

Step 7: Finalize

7.1 Update Article

Insert each image after its corresponding paragraph, using a path relative to the article file:

| Input | Insert Path |
|-------|-------------|
| Article file path (default `imgs-subdir`) | `![description](imgs/NN-{type}-{slug}.png)` |
| Article file path (images alongside) | `![description](NN-{type}-{slug}.png)` |
| Article file path (`illustrations/` subdirectory) | `![description](illustrations/NN-{type}-{slug}.png)` |
| Pasted content | `![description](illustrations/{topic-slug}/NN-{type}-{slug}.png)` (relative to cwd) |

Alt text: concise description in the article's language.
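All rows of the insert-path table follow from one rule: express the image location relative to the article's directory, or to the cwd for pasted content. A minimal sketch of that rule follows; the helper name `image_markdown` and its parameters are hypothetical, and both paths are assumed to be given relative to the same working directory.

```python
import posixpath


def image_markdown(alt: str, image_path: str, article_path: str = "") -> str:
    """Build the Markdown image line with a path relative to the article."""
    # For pasted content there is no article file, so the cwd is the base.
    base = posixpath.dirname(article_path) if article_path else "."
    rel = posixpath.relpath(image_path, start=base or ".")
    return f"![{alt}]({rel})"
```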

7.2 Output Summary

Article Illustration Complete!

Article: [path]
Type: [type] | Density: [level] | Style: [style]
Location: [directory]
Images: X/N generated

Positions:
- 01-xxx.png → After "[Section]"
- 02-yyy.png → After "[Section]"

[If failures]
Failed:
- NN-zzz.png: [reason]
