`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.
Two root causes, both fixed:
1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
used substring matches ("... gateway" in cmdline) for lifecycle decisions,
so they false-matched `gateway status`/`dashboard` siblings and unrelated
processes like `python -m tui_gateway`, plus stale gateway.pid records.
Add a shared strict matcher `looks_like_gateway_command_line()` in
gateway/status.py that requires the real `gateway run` subcommand (or the
dedicated entrypoints), and route `_looks_like_gateway_process`,
`_record_looks_like_gateway`, and `_scan_gateway_pids` through it.
2. restart() race. Wait until the gateway is authoritatively gone
(`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
once if it lingers and raise rather than start a duplicate; verify the
relaunch produced a running gateway and raise loudly if not (no more
exit-0 silent outage).
Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the tautological test from the original PR (which asserted a
plain assignment it performed itself in the test body) with one that
exercises the actual contracts: _init_cached_agent_for_turn leaves
max_iterations untouched, and the per-turn IterationBudget rebuild
(turn_context.py) propagates a refreshed cap.
When a gateway agent is reused from cache, it retains the max_iterations
from its initial creation. If config.yaml agent.max_turns or HERMES_MAX_ITERATIONS
changed between turns, the cached agent's budget becomes stale.
Before reusing a cached agent, refresh agent.max_iterations from the
freshly-resolved value (read from env/config at line 14585).
Fixes partial issue from PR #48127: handles fresh agent creation + cached agent reuse.
The Discord fix (previous commit) handles dict-shaped clarify choices at the
Discord adapter only. The same dict-repr leak originates upstream at
tools/clarify_tool.py's str(c).strip() normalization — the single
platform-agnostic point both the CLI and every gateway adapter flow through.
When an LLM emits [{"description": "..."}] instead of bare strings, str(c)
produced {'description': '...'} which leaked onto the CLI panel
(cli.py:13048/13081), was returned verbatim as the user's answer
(cli.py:11945), and hit Telegram's numbered list too.
Add _flatten_choice (same label->description->text->title unwrap as the
Discord adapter, name/value excluded, keyless dicts dropped) and apply it at
the normalization line. Fixes CLI + Telegram + all platforms at the root;
the Discord smart-truncation now operates on already-clean text.
Adds johnjacobkenny to AUTHOR_MAP for the salvaged commit.
Two bugs surfaced from production usage in #37134:
1. Dict choices rendered as Python repr. LLMs sometimes emit
[{"description": "..."}] instead of bare strings; the old
str(c).strip() coercion turned the whole dict into
"{'description': '...'}" on the button label.
Fix: add a _flatten_choice helper that unwraps dicts against
the canonical LLM tool-call user-facing keys (label, description,
text, title) in that order. Dicts with none of those keys are
dropped. The "name" and "value" keys are deliberately NOT in the
priority list — they're Discord-component-shaped fields that
could appear in dicts that aren't meant to be choices (a
developer-error wiring that passes a Button-shaped object);
picking them would leak raw enum values or 4-char model
identifiers onto user-facing buttons.
2. Mid-word truncation on long button labels. The old
choice[:72] + "..." cut at position 72, mid-word. Worse, the
three-char ellipsis ate into the 80-char Discord label cap,
leaving only 75 chars of body.
Fix: budget-aware cut strategy with three tiers:
a. Last space in the trailing half of the budget (word boundary).
b. Last soft boundary (- , . )) in the trailing half — used
only when no word boundary exists.
c. Hard cut at the budget limit (last resort).
Use single U+2026 (…) to fit the cap. Cut AT soft boundaries
(inclusive) so the label ends on the boundary char rather than
on the alpha char that followed it.
Tests:
- test_unwraps_dict_choices_to_description: reproduces the
screenshot in #37134, asserts the Python repr is gone.
- test_unwrap_prefers_description_over_name_in_multi_key_dict:
regression guard for the name-key order in the unwrap list.
- test_unwrap_prefers_label_over_description: regression guard
for label winning over description.
- test_unwrap_does_not_pick_value_or_name_alone: regression
guard for the "name"/"value" fields being absent.
- test_truncates_long_choice_label: 200-char input, asserts
total <= 80 and U+2026.
- test_truncates_long_choice_label_breaks_on_word_boundary:
asserts the cut is on a space, not mid-word.
- test_truncates_long_no_space_choice_on_soft_boundary:
adversarial input where position 76 is mid-word alpha, asserts
the renderer falls back to a soft boundary.
Parity: telegram clarify suite (12 tests) still passes; the
helper is a Discord adapter local, not shared with the gateway.
Follow-up: gateway/platforms/telegram.py has the same str(c).strip()
pattern in its own send_clarify and will need a similar fix
(separate PR to keep this diff reviewable).
Fixes#37134
Unit-test `storedSessionIdForNotification`: runtime ids resolve to their
stored id, unknown ids and empty maps pass through unchanged, the right
stored id is picked among several sessions, and stored ids (map keys) are
never rewritten.
Native notifications (approval / sudo / secret / clarify) are tagged with
the gateway *runtime* session id — the key under which the session lives in
the gateway's in-memory `_sessions` map and the id every event carries
(`tui_gateway/server.py` `_emit(event, sid, ...)`). The chat route, however,
is keyed by the *stored* session id (`stored_session_id`), which is a
different value: a new chat gets its runtime id immediately but its stored id
only once the first turn persists.
`onFocusSession` navigated straight to `sessionRoute(<runtime id>)`, so
clicking a notification (e.g. an approval prompt) sent the route-resume path a
runtime id where it expects a stored id. `useRouteResume` then resumed it as a
stored session -> REST `/api/sessions/<runtime id>` 404 "session not found",
and the running session was navigated away, which the user experiences as the
session being destroyed.
Translate runtime -> stored before navigating via the existing
`runtimeIdByStoredSessionId` map (new `storedSessionIdForNotification`
helper), falling back to the id as-is when no mapping is known. The
Approve/Reject notification button path is untouched: `approval.respond` is
routed by the runtime id (`_sess()` -> `_sessions[session_id]`), so it must
keep carrying the runtime id.
_is_compression_ancestor walked parent links in a 100-hop Python loop
issuing two SELECTs per hop and hand-re-encoded the compression
continuation edge a fourth time. Collapse it into a single recursive CTE
that reuses the canonical _COMPRESSION_CHILD_SQL fragment (already shared
by _ephemeral_child_sql and set_session_archived), so the edge definition
lives in exactly one place. The UNION recursion also dedups visited nodes,
making it cycle-safe without the defensive hop cap. Behavior is unchanged
(all TestSessionTitleLineage + existing title-command tests pass).
Regression tests for renaming a compression continuation back to its base
title: single- and multi-level chains transfer the title off the ended
predecessor, while unrelated sessions and non-compression children (created
while the parent was live) still raise the uniqueness conflict.
When context compression rotates a session, the original is ended and the
continuation is auto-numbered (e.g. "name" -> "name #2"). The session list
projects the ended root behind its live tip, so the user never sees the
predecessor. But set_session_title's uniqueness check compared against ALL
sessions, so renaming the visible tip back to "name" dead-ended with
"Title 'name' is already in use by session <id the user can't find>".
When the conflicting title is held by a compression ancestor of the session
being renamed, transfer the title instead of raising: clear it from the
ended predecessor and apply it to the continuation. Uniqueness is preserved
(still exactly one session carries the title) and the parent-link lineage is
untouched, so resume-by-title and tip projection keep working. Genuine
conflicts with unrelated sessions, and with non-compression children
(delegate/branch), still raise as before.
TMUX is not forwarded over SSH, so a TUI launched on a remote host from
inside local tmux only sees TERM=tmux/tmux-256color with no TMUX var --
the cursor-drift bug still applies there. Extend supportsFastEchoTerminal()
to also fall back when TERM is tmux-flavored.
Deliberately scoped to tmux* only, NOT screen*: GNU screen sets the same
screen/screen-256color TERM and has no reported drift, so widening to
screen would disable the optimization for those users with no evidence of
a bug (matching the original PR's stated out-of-scope note).
Adds tests for tmux-flavored TERM (disabled) and screen/xterm TERM
(stays enabled) to guard against accidental widening.
When /goal (and other _PENDING_INPUT_COMMANDS: retry, queue, q, steer,
plan, undo) were typed in the TUI desktop app, slash.exec returned error
4018 instructing the frontend to fall back to command.dispatch. Some
clients failed that client-side fallback, leaving the command empty and
surfacing "empty command" — the user's typed text was silently dropped.
slash.exec now routes pending-input commands to command.dispatch
internally, eliminating the fragile client-side fallback hop. The
response is exactly what command.dispatch would have produced, so the
TUI client behaves identically once the round-trip succeeds.
Salvaged from #48944 — rebased onto current main. The original PR's
source change and test_goal_command.py update are correct, but it missed
the second test surface: tests/tui_gateway/test_protocol.py's
parametrized test_slash_exec_rejects_pending_input_commands still
asserted the old 4018 rejection for retry/queue/q/steer/plan, turning CI
red (5 failures). That test is rewritten here as a behavior contract:
slash.exec for a pending-input command must yield the same payload as a
direct command.dispatch call, and must no longer emit the old
"pending-input command" fallback rejection.
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
`hermes backup` walked every file under HERMES_HOME, excluding only
hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python
dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv
tool caches that live under HERMES_HOME were swept in file-by-file,
ballooning a backup to hundreds of thousands of entries that crawl for
hours — the reported "backup stuck for days / 426543 files" symptom.
Add the canonical regeneratable-dir names (.venv, venv, site-packages,
.tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring
agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's
exclusion set, used by both run_backup and the pre-update/pre-migration
_write_full_zip_backup. .archive is intentionally left in so the curator's
restorable archived skills still get backed up.
Tests cover each new dir name (excluded at any depth), that .archive and
cache-resembling files are kept, and an integration check that a planted
venv/site-packages/cache is pruned from the actual backup zip while
skills/config survive.
The batch tool_status values ('completed'/'error'/'pending') and the inbound
status alias sets were inline magic strings, duplicated across two checks in
_tool_result_status. Hoist them to module-level constants
(_TOOL_STATUS_* + _TOOL_STATUS_{ERROR,COMPLETED}_ALIASES) so the canonical
wire values and the alias->canonical mapping live in one place. Emitted
values are unchanged.
_messages_to_openviking_batch's pre-scan already parses and caches each
tool call's arguments into tool_calls_by_id. The pending-tool-call branch
re-parsed them via _tool_call_input(), a second parse and a second source
of truth. Reuse the cached tool_input when the id was cached (non-empty),
falling back to a parse only for the uncached empty-id case so arguments
are never dropped. No behavior change.
_OPENVIKING_RECALL_TOOL_NAMES hardcoded the three read-tool names as string
literals, which can silently desync from the *_SCHEMA["name"] constants on a
rename (the same drift the adjacent _CATEGORY_SUBDIR_MAP comment warns about).
Derive the set from SEARCH/READ/BROWSE_SCHEMA["name"] instead. Write tools
(viking_remember / viking_add_resource) remain intentionally excluded. Set
contents are unchanged.
The npm workspace pins a single npmDepsHash for fetchNpmDeps. Any change to
package-lock.json that doesn't also refresh that hash breaks the bundled
hermes-tui / hermes-desktop-renderer build for Nix flake consumers, and no
nix CI catches it — the workflow that ran fix-lockfiles was removed in
9eb0bcd6 ("change(ci): rip out nix ci for now").
Fetch the workspace deps with pkgs.importNpmLock instead. It resolves each
package from the lockfile's own integrity hashes, so package-lock.json is the
single source of truth and there is no separate hash to drift.
This also removes:
- the fix-lockfiles checker/refresher and its devShell wiring — it existed
only to keep npmDepsHash in sync, so it is dead once the hash is gone, and
its sole CI consumer was already removed in 9eb0bcd6;
- the patchPhase that normalized lockfile trailing newlines — importNpmLock's
npmConfigHook overwrites the lockfile rather than diffing it, so the
normalization is unnecessary.
npm-lockfile-fix is retained: importNpmLock requires an integrity-complete
lockfile, which that tool guarantees when the lockfile is regenerated.
Co-authored-by: ak2k <19240940+ak2k@users.noreply.github.com>
Two follow-up fixes on top of the cherry-picked structured-sync work:
- _messages_to_openviking_batch only added a recall tool result's id to
skipped_tool_ids when the id was non-empty. An empty tool_call_id (which
the canonical transcript can carry; agent_runtime_helpers defaults it to
"") poisoned the skip set with "", silently dropping any *other* tool
result that also lacked an id. Move the recall-skip add inside the
existing `if tool_id:` guard. Adds a regression test (mutation-checked:
fails on pre-fix code, passes after).
- _sync_trace_enabled() open-coded the canonical truthy-env check; reuse
utils.env_var_enabled (byte-identical {1,true,yes,on} semantics).
* feat(skills): add html-artifact skill, fold in sketch + architecture-diagram + concept-diagrams
Adds a unified `html-artifact` creative skill that produces self-contained,
single-file HTML artifacts — concept explainers, implementation plans,
status/incident reports, code-review walkthroughs, technical + educational
SVG diagrams, multi-variant design comparisons, and throwaway editors that
export their state back to the clipboard. Grounded in Anthropic's
html-effectiveness gallery (MIT); the house style (token block, serif/sans/
mono split, hand-rolled diffs, inline-SVG diagrams, graceful degradation) is
distilled from reading all 20 reference files.
Supersedes and removes three overlapping skills, folding their unique value in:
- sketch -> the fidelity dial (throwaway vs presentation) + the
multi-variant comparison layouts + the browser-vision
verify loop (references/fidelity-and-verify.md)
- architecture-diagram-> the dark "infra" token variant + double-rect masking +
semantic component palette (references/dark-tech.md,
templates/diagram.html infra mode)
- concept-diagrams -> the 9-ramp educational color system + the concept
archetype library (references/concept-archetypes.md,
the light design system in templates/diagram.html)
Structure:
- SKILL.md (description exactly 60 chars), 6 references, 3 templates
- templates verified by headless-Chrome render + vision inspection
- editor export logic (file://-safe clipboard, Promise-normalized) verified in node
Cross-references updated in claude-design (new disambiguation table row drawing
the design-taste vs information-artifact boundary), design-md, pretext, spike,
and kanban-video-orchestrator. Website skill docs + catalogs regenerated;
stale EN/zh-Hans per-skill pages pruned and i18n cross-refs fixed.
Not folded (intentionally orthogonal): excalidraw (.excalidraw JSON), p5js
(generative canvas), claude-design / popular-web-designs / design-md (visual
design taste / brand vocab / token spec).
* feat(skills): ship html-effectiveness gallery as fetched reference examples
Add scripts/fetch-examples.sh (idempotent clone/pull of Anthropic's MIT
html-effectiveness gallery) + references/examples.md mapping each of the 20
example files to a mode so the agent reads the right worked example. The clone
lands in references/examples/ and is gitignored (it's a 384KB upstream repo,
not vendored). SKILL.md workflow + reference list now point at it; falls back to
the distilled pattern references when offline.
* feat(skills): make reading a gallery example a required authoring step
Reading the matching html-effectiveness example is now workflow step 2 (was an
optional aside in step 3): fetch the gallery, read_file the file for your mode,
mirror its structure. Models skip optional steps; the examples are the ground
truth, so consulting one is mandatory. Added an 'Example' column to the
mode->build quick-reference table and a 'don't skip the example' pitfall.
Also dogfooded the skill: read 03-code-review-pr.html and 13-flowchart-diagram.html
raw and reconciled the distilled references against source — aligned diff-row tint
opacity to the source's 0.15 (was 0.18) and added the .ctx/.hunk rows in
house-style.md + base.html so they match 03-code-review-pr.html verbatim.
* docs(skills): explain the consolidation + bundled-vs-optional rationale
The supersession note only stated *what* was folded, not *why* the prune is
sound. Expand SKILL.md's intro into a 'Why this skill exists' section: the three
former skills emitted the same artifact and overlapped, so consolidating removes
which-one-do-I-load ambiguity; and the optional->bundled promotion of
concept-diagrams is footprint-safe because this skill has zero deps (only cost is
the 60-char description; everything else is progressive-disclosure). States the
bundling dividing line explicitly: zero install cost + broadly useful gets
bundled, real install cost (hyperframes: Node+FFmpeg+Chromium) stays optional.
Regenerated website per-skill page to match.
Follow-up to the salvaged hosted /exit fix. Instead of a separate 4-env-var
fingerprint (HERMES_TUI_INLINE + /opt/data HERMES_HOME + HERMES_WRITE_SAFE_ROOT
+ HERMES_DISABLE_LAZY_INSTALLS), gate /exit and /quit on the existing
DASHBOARD_TUI_MODE flag (HERMES_TUI_DASHBOARD) that the keyboard idle-exit
(useInputHandlers) and SIGINT-ignore (entry.tsx) paths already use. One hosted
detection mechanism instead of two divergent ones.
Extract the refusal text to an exported DASHBOARD_EXIT_DISABLED_MESSAGE so the
test asserts the same source of truth as production (no change-detector on the
literal). Test mocks only the DASHBOARD_TUI_MODE export via importActual so the
other env exports stay real.
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect
Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.
1. RELAY platform never enabled in config.platforms (gateway/config.py).
register_relay_adapter() puts the adapter in the platform_registry, but
start_gateway()'s connect loop iterates self.config.platforms — which never
contained Platform.RELAY. So the adapter was "registered" but never connected
(logs showed "relay adapter registered" then "No messaging platforms
enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
single-tenant gateways unaffected).
2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
The relay URL is configured once as the http(s):// base (used as-is for the
provision POST), but websockets.connect rejects http(s):// with "scheme isn't
ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.
3. /relay path not appended (same helper). The connector mounts its
WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
-> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
already carrying ws(s):// and/or /relay is unchanged, so provision's
_provision_url (which derives /relay/provision from either form) still works.
Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.
Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.
Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).
Relay-adapter lane. EXPERIMENTAL.
* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant
The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".
Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.
Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.
Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).
Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).
Relay-adapter lane. EXPERIMENTAL.
systemctl --user restart hermes-gateway run via the terminal tool is a
child of the gateway itself. When systemd delivers SIGTERM the gateway
kills this subprocess before it can complete, so the service may never
restart — reproducing issue #37453.
The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.
Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.
Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.
Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
* feat(image-gen): add image-to-image / editing to image_generate
Brings image generation to parity with video generation: the unified
image_generate tool now edits/transforms a source image (image-to-image)
when given image_url / reference_image_urls, routing to each backend's
edit endpoint, exactly as video_generate routes to image-to-video.
- ImageGenProvider ABC: generate() gains keyword-only image_url +
reference_image_urls; new capabilities() declares modalities +
max_reference_images (defaults to text-only, backward compatible).
success_response gains a modality field; adds normalize_reference_images.
- image_generate tool: schema exposes image_url + reference_image_urls;
dynamic schema reflects the active model's actual edit capability so the
agent knows when image_url is honored. Handler + plugin dispatch forward
the new inputs; legacy/text-only providers get a clear modality_unsupported
error instead of silently dropping the source image.
- In-tree FAL: 7 models gain edit endpoints (flux-2-klein, flux-2-pro,
nano-banana-pro, gpt-image-1.5, gpt-image-2, ideogram/v3, qwen-image)
with per-model edit_supports whitelists + reference caps; routes to the
/edit endpoint and skips the upscaler for edits.
- Plugins: openai (images.edit, 16 refs), xai (/v1/images/edits via
grok-imagine-image-quality, JSON body per xAI docs), krea
(image_style_references, 10 refs). openai-codex stays text-only and
rejects edits with an actionable error.
- Tests: 15 new (payload, routing, dispatch forwarding, dynamic schema,
capabilities); updated 2 change-detector/lambda tests for the new schema.
- Docs: image-generation feature page, image-gen provider plugin guide,
tools reference.
* fix(image-gen): preserve legacy passthrough in fal/krea plugin tests
Two existing plugin tests asserted pre-image-to-image behavior:
- fal: forward image_url/reference_image_urls only when supplied, so a
text-to-image delegation stays byte-identical (no None kwargs).
- krea: keep dict-shaped image_style_references refs verbatim (the unified
string refs go through normalize_reference_images; legacy non-string ref
objects pass through unchanged) — fixes KeyError when callers pass the
richer Krea ref-object shape.
* fix(image-gen): clearer not-capable message for text-to-image-only models
When a text-to-image-only model (incl. gpt-image-2 on the Codex OAuth path,
which can't do editing through the Responses image_generation tool) gets a
source image, say 'this model is not capable of image-to-image / editing —
provide a text-only prompt' rather than sending the user shopping for other
backends. Applies to the openai-codex guard, the in-tree FAL no-edit-endpoint
error, and the dynamic tool-schema text-only line.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.