Addresses review feedback on #13193:
1. Reference-image flow no longer assumes write_file/read_file handle binaries. vision_analyze produces a textual description; the binary is optionally copied via terminal (cp/curl). The description is what gets embedded in prompts.
2. image_generate's URL-only return is now explicit. Step 6 downloads the returned URL to local disk via terminal (curl -sSL -o ...), then verifies non-zero size before proceeding.
3. Removed "Please use nano banana pro..." line from prompts/system.md: the backend is user-configured and not agent-selectable, so routing hints in the prompt are misleading.
PORT_NOTES.md updated: prompts/system.md is no longer verbatim, and the file-ops/backend-selection rows now reflect Hermes' actual tool surface (write_file/read_file for text, terminal for binaries and URL downloads, vision_analyze for reading images).
Detailed Workflow Procedures
Step 1: Detect Reference Images
If the user provides reference images (local path or URL), the goal is to produce textual descriptions that can be embedded in prompts — image_generate doesn't accept reference-image inputs, and Hermes' text file tools can't read or write binaries.
Tool rules:
| Task | Tool | Notes |
|---|---|---|
| Analyze a reference image | `vision_analyze` | Accepts URL or local path. Ask for style, palette, composition, subject. |
| Write the text description | `write_file` | Sidecar `.md` files only; never try to `write_file` a PNG/JPG. |
| (Optional) Keep a local copy of the binary | `terminal` | `cp "$src" "{output-dir}/references/NN-ref-{slug}.{ext}"`. Purely for the record; the skill itself doesn't read the binary. |

| Input Type | Action |
|---|---|
| Image file path provided | `vision_analyze` → write sidecar `.md`. Optional `terminal` `cp` for a local record. |
| Image URL provided | `vision_analyze` with the URL → write sidecar `.md`. |
| Image in conversation (no path, no URL) | Ask via `clarify` for a path or URL, or for a verbal description. |
| User can't provide either | Extract style/palette verbally from the user → write `references/extracted-style.md`. Do NOT add `references:` to prompt frontmatter. |
Procedure (when a path/URL is available):
- Call `vision_analyze(image_url=..., question="Describe the style, color palette (with hex approximations), composition, and subject so this can be used as a style/palette reference for another illustration.")`.
- Write `{output-dir}/references/NN-ref-{slug}.md` via `write_file` with the description.
- (Optional) Run `terminal` with `cp` (or `curl -sSL -o ...` for URLs) to keep a local binary copy. Not required by the skill.
- Mark the reference in the outline with usage `direct` / `style` / `palette`. In Step 5.1 the description gets appended to the prompt body.
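The `NN-ref-{slug}.md` naming used in the procedure above can be sketched as a small helper. The function name and the slug rule (lowercase, hyphen-separated) are assumptions made for illustration; the source does not define how the slug is derived.

```python
import re


def reference_sidecar_path(output_dir: str, index: int, label: str) -> str:
    """Build the sidecar path for one reference image (hypothetical helper)."""
    # Assumption: the slug is a lowercase, hyphen-separated form of a short label.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-") or "ref"
    # NN is the zero-padded reference index, as in the naming pattern above.
    return f"{output_dir}/references/{index:02d}-ref-{slug}.md"
```

For example, `reference_sidecar_path("imgs", 1, "System Diagram")` yields `imgs/references/01-ref-system-diagram.md`.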
Sidecar File Format:
---
ref_id: NN
source: "<original path or URL>"
local_copy: "NN-ref-{slug}.png" # omit if no copy made
usage_hint: style # direct | style | palette
---
[vision_analyze description — colors, style, composition, subject]
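As a rough sketch, the sidecar text could be assembled before it is handed to `write_file`. `render_sidecar` is a hypothetical helper, not part of the skill; it does no YAML escaping, so it assumes `source` contains no double quotes.

```python
def render_sidecar(ref_id, source, description, usage_hint="style", local_copy=None):
    """Return the text of a reference sidecar file (hypothetical helper)."""
    if usage_hint not in ("direct", "style", "palette"):
        raise ValueError(f"unknown usage_hint: {usage_hint}")
    lines = ["---", f"ref_id: {ref_id:02d}", f'source: "{source}"']
    if local_copy:
        # The key is omitted entirely when no binary copy was made.
        lines.append(f'local_copy: "{local_copy}"')
    lines += [f"usage_hint: {usage_hint}", "---", description.strip()]
    return "\n".join(lines) + "\n"
```

The returned string is plain text, so it stays within what `write_file` is meant to handle.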
Step 2: Analyze
2.1 Determine Output Directory
| Input | Output Directory | Source-save path |
|---|---|---|
| Article file path | `{article-dir}/imgs/` (default) | None (read the article via `read_file`) |
| Pasted content | `illustrations/{topic-slug}/` (cwd) | `source-{slug}.{ext}` (save via `write_file`) |
If the user explicitly asked for a different layout (e.g., images in the article's folder, or an illustrations/ subdirectory), honor that.
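The directory rules above could be expressed roughly as follows. `resolve_output_dir` and its parameter names are hypothetical; the `override` argument stands in for any layout the user explicitly requested.

```python
from pathlib import PurePosixPath


def resolve_output_dir(article_path=None, topic_slug=None, override=None):
    """Pick the output directory per the table above (hypothetical helper)."""
    if override:
        # An explicit user request always wins.
        return str(PurePosixPath(override))
    if article_path:
        # Default for article files: an imgs/ folder next to the article.
        return str(PurePosixPath(article_path).parent / "imgs")
    if not topic_slug:
        raise ValueError("pasted content needs a topic slug")
    # Pasted content: illustrations/{topic-slug}/ relative to the cwd.
    return str(PurePosixPath("illustrations") / topic_slug)
```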
2.2 Analyze Content
| Analysis | Description |
|---|---|
| Content type | Technical / Tutorial / Methodology / Narrative |
| Illustration purpose | information / visualization / imagination |
| Core arguments | 2-5 main points to visualize |
| Visual opportunities | Positions where illustrations add value |
| Recommended type | Based on content signals and purpose |
| Recommended density | Based on length and complexity |
Save analysis to {output-dir}/analysis.md using write_file.
2.3 Extract Core Arguments
- Main thesis
- Key concepts reader needs
- Comparisons/contrasts
- Framework/model proposed
CRITICAL: If the article uses metaphors (e.g., "电锯切西瓜", literally "cutting a watermelon with a chainsaw"), do NOT illustrate them literally. Visualize the underlying concept.
2.4 Identify Positions
Illustrate:
- Core arguments (REQUIRED)
- Abstract concepts
- Data comparisons
- Processes, workflows
Do NOT Illustrate:
- Metaphors literally
- Decorative scenes
- Generic illustrations
2.5 Plan Reference Image Usage (if analyzed in Step 1)
For each reference image (use the vision_analyze description from Step 1):
| Analysis | Description |
|---|---|
| Visual characteristics | Style, colors, composition |
| Content/subject | What the reference depicts |
| Suitable positions | Which sections match this reference |
| Style match | Which illustration types/styles align |
| Usage recommendation | direct / style / palette |
| Usage | When to Use | How it's applied in Step 5.1 |
|---|---|---|
| `direct` | Reference matches desired output closely | Paste the description (composition + subject + style + palette) into the prompt body |
| `style` | Extract visual style characteristics only | Append style traits to prompt body |
| `palette` | Extract color scheme only | Append extracted hex colors to prompt body |
Note: image_generate does not accept reference-image inputs under any usage type. Everything is mediated through the vision_analyze description.
Step 3: Confirm Settings
Use the clarify tool. Since clarify handles one question at a time, ask the most important question first. Skip any question the user already answered in their request.
Q1: Preset or Type (highest priority)
Based on Step 2 content analysis, recommend a preset first (sets both type & style). Look up style-presets.md "Content Type → Preset Recommendations" table.
- [Recommended preset] — [brief: type + style + why]
- [Alternative preset] — [brief]
- Or choose type manually: infographic / scene / flowchart / comparison / framework / timeline / mixed
If user picks a preset → skip Q3 (type & style both resolved). If user picks a type → Q3 is required.
Q2: Density
- minimal (1-2) — Core concepts only
- balanced (3-5) — Major sections
- per-section — At least 1 per section/chapter (Recommended)
- rich (6+) — Comprehensive coverage
Q3: Style (skip if preset chosen in Q1)
Present Core Styles first:
- [Best compatible core style] (Recommended)
- [Other compatible core style 1]
- [Other compatible core style 2]
- Other (see full Style Gallery)
Core Styles (simplified selection):
| Core Style | Maps To | Best For |
|---|---|---|
| `minimal-flat` | `notion` | General, knowledge sharing, SaaS |
| `sci-fi` | `blueprint` | AI, frontier tech, system design |
| `hand-drawn` | `sketch`/`warm` | Relaxed, reflective, casual |
| `editorial` | `editorial` | Processes, data, journalism |
| `scene` | `warm`/`watercolor` | Narratives, emotional, lifestyle |
| `poster` | `screen-print` | Opinion, editorial, cultural, cinematic |
Style selection is based on the Type × Style compatibility matrix (styles.md).
In Step 5, read styles/<style>.md for visual elements and rendering rules.
Q4: Palette (optional)
If the preset did not specify a palette, offer:
- Default (use style's built-in colors) (Recommended)
- `macaron`: soft pastel blocks on warm cream
- `warm`: warm earth tones, no cool colors
- `neon`: vibrant neon on dark backgrounds
Skip if: preset already resolved palette, or user specified a palette in the request.
See Palette Gallery in styles.md and full specs in palettes/<palette>.md.
Q5: Image Text Language (only when ambiguous)
If the article language is different from the user's conversational language, ask which to use:
- Article language (match article content) (Recommended)
- User's conversational language
Skip if: languages match, or the user already specified in the request.
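The skip rules for Q1 to Q5 can be summarized in a small sketch. The function name, the `answered` keys, and the boolean flags are all assumptions for illustration. Because `clarify` handles one question at a time, the idea is to ask the first pending question and then recompute.

```python
def pending_questions(answered, preset_chosen, preset_sets_palette, article_lang, user_lang):
    """Return the questions still worth asking, in priority order (hypothetical helper).

    `answered` is a set of topics the user has already settled, e.g. {"density"}.
    """
    pending = []
    if "preset_or_type" not in answered:
        pending.append("Q1")
    if "density" not in answered:
        pending.append("Q2")
    # Q3 is skipped when a preset already resolved both type and style.
    if not preset_chosen and "style" not in answered:
        pending.append("Q3")
    # Q4 is skipped when the preset or the user already fixed the palette.
    if not preset_sets_palette and "palette" not in answered:
        pending.append("Q4")
    # Q5 only matters when the article and conversation languages differ.
    if article_lang != user_lang and "language" not in answered:
        pending.append("Q5")
    return pending
```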
Display Reference Usage (if references saved in Step 1)
When presenting the outline preview to the user, show reference assignments:
Reference Images:
| Ref | Filename | Recommended Usage |
|-----|----------|-------------------|
| 01 | 01-ref-diagram.png | direct → Illustration 1, 3 |
| 02 | 02-ref-chart.png | palette → Illustration 2 |
Step 4: Generate Outline
Save as {output-dir}/outline.md using write_file:
---
type: infographic
density: balanced
style: blueprint
image_count: 4
references:                    # Only if references provided
  - ref_id: 01
    filename: 01-ref-diagram.png
    description: "Technical diagram showing system architecture"
  - ref_id: 02
    filename: 02-ref-chart.png
    description: "Color chart with brand palette"
---
## Illustration 1
**Position**: [section] / [paragraph]
**Purpose**: [why this helps]
**Visual Content**: [what to show]
**Type Application**: [how type applies]
**References**: [01] # Optional: list ref_ids used
**Reference Usage**: direct # direct | style | palette
**Filename**: 01-infographic-concept-name.png
## Illustration 2
...
Backup rule: If outline.md exists, rename to outline-backup-YYYYMMDD-HHMMSS.md before writing.
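A minimal sketch of the backup naming, assuming the timestamp comes from the local clock. `backup_name` is a hypothetical helper; the rename itself would still be done through the agent's tools.

```python
from datetime import datetime
from pathlib import PurePosixPath


def backup_name(path, now=None):
    """Return the -backup-YYYYMMDD-HHMMSS sibling of `path` (hypothetical helper)."""
    now = now or datetime.now()
    p = PurePosixPath(path)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Keep the directory and extension; only the stem gains the suffix.
    return str(p.with_name(f"{p.stem}-backup-{stamp}{p.suffix}"))
```

The same pattern covers the prompt-file and image backups described in later steps, since only the stem changes.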
Requirements:
- Each position justified by content needs
- Type applied consistently
- Style reflected in descriptions
- Count matches density
- References assigned based on Step 2.5 analysis
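The "count matches density" requirement can be checked mechanically. This sketch uses the ranges from Q2; the function name and the `section_count` parameter are assumptions.

```python
def count_matches_density(density, image_count, section_count=0):
    """Check image_count against the density ranges from Q2 (hypothetical helper)."""
    if density == "minimal":
        return 1 <= image_count <= 2
    if density == "balanced":
        return 3 <= image_count <= 5
    if density == "rich":
        return image_count >= 6
    if density == "per-section":
        # At least one illustration per section or chapter.
        return image_count >= max(section_count, 1)
    raise ValueError(f"unknown density: {density}")
```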
Step 5: Generate Prompts
BLOCKING: Every illustration must have a saved prompt file before any image is generated.
For each illustration in the outline:
- Create prompt file: `{output-dir}/prompts/NN-{type}-{slug}.md` via `write_file`
- Include YAML frontmatter:

      ---
      illustration_id: 01
      type: infographic
      style: custom-flat-vector
      ---

- Load style specs: Read `styles/<style>.md` (via `read_file`) for visual elements, style rules, and rendering instructions
- Load palette specs (if palette specified): Read `palettes/<palette>.md` for colors and background. Palette colors replace the style's default Color Palette. If no palette specified, use the style's built-in colors.
- Follow type-specific template from prompt-construction.md, using rendering from style + colors from palette (or style default)
- Prompt quality requirements (all REQUIRED):
  - `Layout`: Describe overall composition (grid / radial / hierarchical / left-right / top-down)
  - `ZONES`: Describe each visual area with specific content, not vague descriptions
  - `LABELS`: Use actual numbers, terms, metrics, quotes from the article, NOT generic placeholders
  - `COLORS`: Specify hex codes from palette (or style default) with semantic meaning
  - `STYLE`: Describe line treatment, texture, mood, character rendering per style rules
  - `ASPECT`: Specify ratio (e.g., `16:9`)
- Apply defaults: composition requirements, character rendering, text guidelines
- Backup rule: If a prompt file exists, rename to `prompts/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.md`
CRITICAL - References in Frontmatter:
- Only add `references` field if a sidecar `.md` description exists in `{output-dir}/references/`
- If style/palette was extracted verbally (no description file), append info to prompt BODY only
- Before writing frontmatter, confirm the sidecar exists (try `read_file` on the `.md`)
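The gating rule above might look like this in practice. The helper name and the exact shape of the `references` entries in prompt frontmatter are assumptions; `confirmed_sidecars` stands for the ref ids whose sidecar `.md` was successfully read.

```python
def prompt_frontmatter(illustration_id, type_, style, ref_ids, confirmed_sidecars):
    """Build prompt frontmatter, listing only confirmed references (hypothetical helper)."""
    lines = [
        "---",
        f"illustration_id: {illustration_id:02d}",
        f"type: {type_}",
        f"style: {style}",
    ]
    # Keep only references whose sidecar description file is known to exist.
    confirmed = [r for r in ref_ids if r in confirmed_sidecars]
    if confirmed:
        lines.append("references:")
        lines += [f"  - ref_id: {r:02d}" for r in confirmed]
    lines.append("---")
    return "\n".join(lines)
```

When no sidecar was confirmed, the `references` key is left out entirely, and any verbally extracted style goes into the prompt body instead.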
5.1 Process References (if analyzed in Step 1)
Read the vision_analyze description from the sidecar references/NN-ref-{slug}.md (via read_file) and embed it in the prompt body. image_generate never receives the binary.
| Usage | Action |
|---|---|
| `direct` | Paste the full reference description (composition, subject, style, palette) into the prompt body |
| `style` | Append only the style traits: "Style: clean lines, gradient backgrounds..." |
| `palette` | Append only the hex colors: "Colors: #E8756D coral, #7ECFC0 mint..." |
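One way to picture the three usage types is a helper that picks which parts of the description reach the prompt body. It assumes the sidecar description has already been split into a dict with the keys `composition`, `subject`, `style`, and `palette`. That split is not specified by the source, and `reference_text` is a hypothetical name.

```python
LABELS = {
    "composition": "Composition",
    "subject": "Subject",
    "style": "Style",
    "palette": "Colors",
}


def reference_text(usage, parts):
    """Select the reference text to append to a prompt body (hypothetical helper)."""
    if usage == "direct":
        # Full description: every available part, in a fixed order.
        return "\n".join(f"{LABELS[k]}: {parts[k]}" for k in LABELS if k in parts)
    if usage == "style":
        return f"Style: {parts['style']}"
    if usage == "palette":
        return f"Colors: {parts['palette']}"
    raise ValueError(f"unknown usage: {usage}")
```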
Step 6: Generate Images
image_generate returns a JSON blob with a URL ({"success": true, "image": "<url>"}). It does NOT save a local file, does NOT accept an output path, and does NOT let the agent pick a backend/model. Treat the URL as a temporary artifact and download it explicitly.
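A small sketch of pulling the URL out of that result, assuming the tool output arrives either as a JSON string or as an already-parsed dict. `extract_image_url` is a hypothetical helper.

```python
import json


def extract_image_url(raw):
    """Return the image URL from an image_generate result (hypothetical helper)."""
    result = json.loads(raw) if isinstance(raw, str) else raw
    if not result.get("success"):
        raise RuntimeError("image_generate reported failure")
    url = result.get("image")
    if not url:
        raise RuntimeError("image_generate result has no 'image' URL")
    return url
```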
For each prompt file:
- Read the prompt file (via `read_file`) and extract the assembled prompt
- Map the prompt's `ASPECT` to `image_generate`'s enum: `16:9` → `landscape`, `9:16` → `portrait`, `1:1` → `square`. Custom ratios → nearest named aspect.
- Call `image_generate(prompt=<assembled>, aspect_ratio=<enum>)` and extract the `image` URL from the returned JSON.
- Backup rule: If `{output-dir}/NN-{type}-{slug}.png` already exists, rename it via `terminal` (`mv "{output-dir}/NN-{type}-{slug}.png" "{output-dir}/NN-{type}-{slug}-backup-YYYYMMDD-HHMMSS.png"`) before writing.
- Download the URL via `terminal`: `curl -sSL -o "{output-dir}/NN-{type}-{slug}.png" "{image_url}"`. If `curl` is unavailable, fall back to `wget -qO "{output-dir}/NN-{type}-{slug}.png" "{image_url}"`.
- Verify the file exists and has non-zero size (`terminal`: `test -s "{path}" && echo ok`).
- On generation failure, retry `image_generate` once. On download failure, retry `curl` once with a longer timeout. Then log and continue.
- After each generation, report "Generated X/N".
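Two parts of the loop above lend themselves to a sketch: the aspect mapping and the download command. Both helpers are hypothetical. "Nearest named aspect" is interpreted here as nearest in log-ratio space, which is an assumption, and the optional `--max-time` value for the retry is illustrative.

```python
import math
import shlex

# Named aspects from the mapping above, as width / height.
ASPECTS = {"landscape": 16 / 9, "portrait": 9 / 16, "square": 1.0}


def aspect_enum(ratio):
    """Map a 'W:H' string to the nearest named aspect (hypothetical helper)."""
    w, h = (float(x) for x in ratio.split(":"))
    value = math.log(w / h)
    # Log space treats wide and tall ratios symmetrically (assumption).
    return min(ASPECTS, key=lambda name: abs(value - math.log(ASPECTS[name])))


def download_command(url, dest, max_time=None):
    """Build the curl command line to run via the terminal tool (hypothetical helper)."""
    cmd = ["curl", "-sSL", "-o", dest]
    if max_time:
        # Used for the single retry with a longer timeout.
        cmd += ["--max-time", str(max_time)]
    cmd.append(url)
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(cmd)
```

Building the command with `shlex.join` keeps URLs containing `?` or `&` safely quoted when the string is passed to a shell.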
Step 7: Finalize
7.1 Update Article
Insert each image after the corresponding paragraph, using a path relative to the article file:
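| Input | Insert Path |
|---|---|
| Article file path (default imgs-subdir) | `![alt](imgs/NN-{type}-{slug}.png)` |
| Article file path (images alongside) | `![alt](NN-{type}-{slug}.png)` |
| Article file path (illustrations/ subdirectory) | `![alt](illustrations/{topic-slug}/NN-{type}-{slug}.png)` |
| Pasted content | `![alt](illustrations/{topic-slug}/NN-{type}-{slug}.png)` (relative to cwd) |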
Alt text: concise description in the article's language.
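The insert paths in the table are all the image path expressed relative to the article's directory. A minimal sketch, with `image_markdown` as a hypothetical helper that assumes POSIX-style paths:

```python
import posixpath


def image_markdown(article_path, image_path, alt):
    """Return the Markdown image line for an article (hypothetical helper)."""
    # An article in the cwd has an empty dirname, so fall back to ".".
    article_dir = posixpath.dirname(article_path) or "."
    rel = posixpath.relpath(image_path, start=article_dir)
    return f"![{alt}]({rel})"
```

For pasted content there is no article file, so the path stays relative to the cwd as the table describes.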
7.2 Output Summary
Article Illustration Complete!
Article: [path]
Type: [type] | Density: [level] | Style: [style]
Location: [directory]
Images: X/N generated
Positions:
- 01-xxx.png → After "[Section]"
- 02-yyy.png → After "[Section]"
[If failures]
Failed:
- NN-zzz.png: [reason]