* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
Surfaces the free variant alongside the paid minimax-m2.5 entry in
both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter
provider model list.
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:
HTTP 400: thinking is enabled but reasoning_content is missing in
assistant tool call message at index N
Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.
build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).
Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free,
and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and
arcee-thinking variants remain.
Fourth and final transport — completes the transport layer with all four
api_modes covered. Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.
Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction
Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response. Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it. Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.
Transport coverage — complete:
| api_mode | Transport | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport | ✅ | ✅ | ✅ |
| codex_responses | ResponsesApiTransport | ✅ | ✅ | ✅ |
| chat_completions | ChatCompletionsTransport | ✅ | ✅ | ✅ |
| bedrock_converse | BedrockTransport | ✅ | ✅ | ✅ |
17 new BedrockTransport tests pass. 117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass. Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
Restore the old-CLI contract where only complete failures tint Activity
red. Everything else is still visible for debugging but no longer
commandeers attention.
- gateway.stderr: always tone='info' (drops the ERRLIKE_RE regex)
- gateway.protocol_error: both pushes demoted to 'info'
- commands.catalog cold-start failure: demoted to 'info'
- approval.request: no longer duplicates the overlay into Activity
Kept as 'error': terminal `error` event, gateway.start_timeout,
gateway-exited, explicit status.update kinds.
Reverts the auto-expand-on-new-error effect added in 93b47d96. The
effect overrode the user's chosen detailsMode and visually interrupted
every turn. Red/yellow chevron tint remains as the passive signal —
click to read, just like Thinking and Tool calls.
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.
Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
ToolCall.provider_data — the original shim stripped it, causing 400 errors
on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
chat_completions, response.choices[0].message is already the right shape
with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
and per-tool-call .extra_content from the OpenAI SDK.
run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.
39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable
timeout (delegation.child_timeout_seconds, default 300s) to prevent
indefinite blocking when a subagent's API call or tool HTTP request hangs.
- Add heartbeat stale detection: if a child's api_call_count doesn't
advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching
the parent's activity timestamp so the gateway inactivity timeout
can fire as a last resort.
- Add 'timeout' as a new exit_reason/status alongside the existing
completed/max_iterations/interrupted states.
- Use shutdown(wait=False) on the timeout executor to avoid the
ThreadPoolExecutor.__exit__ deadlock when a child is stuck on
blocking I/O.
Closes#13768
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().
Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)
Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.
Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.
24 new tests. 343 codex/responses/transport tests pass (0 failures).
PR 4 of the provider transport refactor.
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:
- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
appends /v1/messages internally. /coding/v1 + SDK suffix produced
/coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
/coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
and rewrite /coding base URLs to /coding/v1 for the /models health
check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.
E2E verified:
sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic)
sk-<legacy> → https://api.moonshot.ai/v1/chat/completions (OpenAI)
UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>
- Add _is_kimi_coding_endpoint() to detect Kimi coding API
- Place Kimi check BEFORE _requires_bearer_auth to ensure User-Agent header is set
- Without this header, Kimi returns 403 on /coding/v1/messages
- Fixes kimi-2.5, kimi-for-coding, kimi-k2.6-code-preview all returning 403
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.
The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.
Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.
Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
* fix(skills/baoyu-comic): require absolute paths for curl -o downloads
When downloading generated images across several batches of image_generate
calls, relying on persistent-shell CWD is unsafe. The terminal tool's shell
can rotate (TERMINAL_LIFETIME_SECONDS expiry, a failed cd that leaves the
shell somewhere else), and 'curl -fsSL <url> -o relative.png' then silently
writes to the wrong directory with no error.
Update the skill's Step 7 Download step to require absolute -o paths (or
workdir= on the terminal tool) and add a matching pitfall entry referencing
the Apr 2026 incident where pages 06-09 of a 10-page comic landed at the
repo root instead of comic/<slug>/. The agent then spent several turns
claiming the files existed where they didn't.
* fix(skills/baoyu-comic): handle clarify timeouts correctly in Step 2
A clarify timeout returning 'Use your best judgement to make the choice
and proceed' is NOT user consent to default the entire Step 2 questionnaire.
It is a per-question default only. Add guidance at both instruction sites
(SKILL.md User Questions section, references/workflow.md Step 2 header)
telling the agent to:
1. Continue asking the remaining questions in the sequence after a
timeout — each question is an independent consent point.
2. Surface every defaulted choice in the next user-visible message
so the user can correct it when they return. An unreported default
is indistinguishable from never having asked.
Reported live Apr 2026: agent asked style question via clarify, got a
timeout response, and silently defaulted style + narrative focus +
audience + review flags in one pass. User only learned style had
defaulted to 'ohmsha' after the comic was fully generated.
website/src/pages/skills/index.tsx imports ../../data/skills.json, but
that file is git-ignored and generated at build time by
website/scripts/extract-skills.py. CI workflows (deploy-site.yml,
docs-site-checks.yml) run the script explicitly before 'npm run build',
so production and PR checks always work — but 'npm run build' on a
contributor's machine fails with:
Module not found: Can't resolve '../../data/skills.json'
because the extraction step was never wired into the npm scripts.
Adds a prebuild/prestart hook that runs extract-skills.py automatically.
If python3 or pyyaml aren't installed locally, writes an empty
skills.json instead of hard-failing — the Skills Hub page renders with
an empty state, the rest of the site builds normally, and CI (which
always has the deps) still generates the full catalog for production.
Fills the three gaps left by the orchestrator/width-depth salvage:
- configuration.md §Delegation: max_concurrent_children, max_spawn_depth,
orchestrator_enabled are now in the canonical config.yaml reference
with a paragraph covering defaults, clamping, role-degradation, and
the 3x3x3=27-leaf cost scaling.
- environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to
the Agent Behavior table.
- features/delegation.md: corrects stale 'default 5, cap 8' wording
(that was from the original PR; the salvage landed on default 3 with
no ceiling and a tool error on excess instead of truncation).
Page prompts are written in Step 5 from the text descriptions in
characters/characters.md — the PNG sheet generated in Step 7.1
cannot be used to write them. Reposition the PNG as a human-facing
review artifact (and reference for later regenerations / manual
edits), and drop the confusing "Character sheet | Strategy" tables
since the embedding rule is uniform.
- Remove PDF merge feature and scripts/ directory (no pdf-lib dep)
- Correct image_generate docs: prompt-only, returns URL; add
curl download step after every call
- Downgrade reference images to text-based trait extraction
(style/palette/scene); character sheet is agent-facing reference
- Unify source file naming on source-{slug}.md across SKILL.md
and workflow.md
Port the upstream baoyu-comic skill to Hermes' tool ecosystem, matching
the earlier baoyu-infographic adaptation:
- metadata namespace openclaw -> hermes (+ tags, homepage)
- drop EXTEND.md preferences system (references/config/ removed,
workflow Step 1.1 removed)
- user prompts via clarify (one question at a time) instead of
AskUserQuestion batches
- image generation via image_generate instead of baoyu-imagine, with
aspect-ratio mapping to landscape/portrait/square
- Windows/PowerShell/WSL shell snippets dropped
- file I/O referenced via Hermes write_file/read_file tools
- CLI-style --flags converted to natural-language options and
user-intent cues (skill matching has no slash command trigger)
Add PORT_NOTES.md documenting the adaptations and a sync procedure.
Art-style/tone/layout reference files are preserved verbatim from
upstream v1.56.1.
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:
- OpenAI: 4096
- xAI: 15000
- MiniMax: 10000
- ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
- Gemini: ~5000
- Edge: ~5000
The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).
New behavior:
- PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
encode the documented per-provider limits.
- _resolve_max_text_length(provider, cfg) resolves:
1. tts.<provider>.max_text_length user override
2. ElevenLabs model_id lookup
3. provider default
4. 4000 fallback
- text_to_speech_tool() and stream_tts_to_speaker() both call the
resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
- Schema description no longer hardcodes 4000.
Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
After the prior inline-diff fix, the gateway still prepends a literal
" ┊ review diff" line to inline_diff (it's terminal chrome written by
`_emit_inline_diff`). Wrapping that in a ```diff fence left that header
inside the code block. The agent also often narrates its own edit in a
second fenced diff, so the assistant message ended up stacking two
diff blocks for the same change.
- Strip the leading "┊ review diff" header from queued inline diffs
before fencing.
- Skip appending the fenced diff entirely when the assistant already
wrote its own ```diff (or ```patch) fence.
Keeps the single-surface diff UX even when the agent is chatty.
When tool.complete already carries inline_diff, the assistant message owns the full diff block. Suppress the tool-row summary/detail in that case so the turn shows one detailed diff surface instead of a rich diff plus a duplicated tool-detail payload.
Avoid duplicate diff rendering in #13729 flow. We now skip queued inline diffs that are already present in final assistant text and dedupe repeated queued diffs by exact content.
Follow-up for #13729: segment-level system artifacts still looked detached in real flow.\n\nInstead of appending inline_diff as a standalone segment/system row, queue sanitized diffs during tool.complete and append them as a fenced diff block to the assistant completion text on message.complete. This keeps the diff in the same message flow as the assistant response.
Follow-up on multiline arrow behavior: Up/Down now fall back to queue/history whenever there is no logical line above/below the caret (not only at absolute start/end character positions). This makes Up from the end of the top line cycle history, matching expected readline-ish behavior.
Follow-up on #13724: showing literally every source was too noisy.\n\n now fetches a wider window (, larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.
Follow-up on #13729 from blitz screenshot feedback.\n\n- When tool.complete carried inline_diff but no buffered assistant text existed, pending tool rows were still in streamPendingTools, so diff rendered above the tool row section. appendSegmentMessage now emits pending tool rows as a trail segment before appending the diff artifact.\n- Strip ANSI color escapes from inline_diff payloads so we don't render loud red/green terminal palettes in the transcript.
Follow-up on #13726 from blitz feedback: Up/Down history cycling should only trigger when the caret is at the start/end boundary (or the input is empty).\n\nPreviously useInputHandlers intercepted arrows whenever inputBuf was empty, which still stole Up/Down from normal multiline editing. textInput now publishes caret position through inputSelectionStore even with no active selection, and useInputHandlers gates history/queue cycling on those boundaries.
* feat(models): hide OpenRouter models that don't advertise tool support
Port from Kilo-Org/kilocode#9068.
hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.
Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).
Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.
Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.
* feat(delegate): cross-agent file state coordination for concurrent subagents
Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).
Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:
1. FileStateRegistry (tools/file_state.py) — process-wide singleton
tracking per-agent read stamps and the last writer globally.
check_stale() names the sibling subagent in the warning when a
non-owning agent wrote after this agent's last read.
2. Per-path threading.Lock wrapped around the read-modify-write
region in write_file_tool and patch_tool. Concurrent siblings on
the same path serialize; different paths stay fully parallel.
V4A multi-file patches lock in sorted path order (deadlock-free).
3. Delegate-completion reminder in tools/delegate_tool.py: after a
subagent returns, writes_since(parent, child_start, parent_reads)
appends '[NOTE: subagent modified files the parent previously
read — re-read before editing: ...]' to entry.summary when the
child touched anything the parent had already seen.
Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).
Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.
Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
Reported during TUI v2 blitz retest: code-review diffs from tool.complete
appeared at the top of the current interaction thread, out of sequence
with the agent's messages and tool rows below them.
Root cause — `sys(inline_diff)` appends to `historyItems`, which sits
above the `StreamingAssistant` pane that renders the active turn.
Until the turn closed, the diff visually floated above everything
else happening in the same turn.
Route the diff through `turnController.appendSegmentMessage` instead
so it flushes any pending streaming text first, then lands in the
segment stream beside assistant output and tool calls. On
`message.complete` the segment list is committed to history in emit
order (diff → final text), matching what the gateway sent.
Adds a regression test that exercises tool.complete → message.complete
with an inline_diff payload and asserts both the streaming and final
placement.
Reported during TUI v2 blitz retest: `/history` in the TUI only shows
prompts from non-TUI Hermes runs and can't scroll the window. Root
cause is the slash-worker subprocess: it's a detached HermesCLI that
never sees the TUI's turns, so its `conversation_history` starts empty
and `show_history` surfaces whatever was persisted from earlier CLI
sessions — not what the user just did inside the TUI.
Intercept `/history` as a local slash command so it dumps
`ctx.local.getHistoryItems()` — the TUI's own transcript — routed
through the pager (which scrolls after #13591). Accepts an optional
preview-length argument (default 400 chars per message).
Adds createSlashHandler coverage.
Reported during TUI v2 blitz retest: typing a multi-line message with
shift-Enter and then pressing Up to edit an earlier line swapped the
whole buffer for the previous history entry instead of moving the
cursor up a line. Down then restored the draft → the buffer appeared
to "flip" between the draft and a prior prompt.
`useInputHandlers` cycles history on Up/Down, but textInput only
checked `inputBuf.length` — that only counts lines committed with a
trailing backslash, not shift-Enter newlines inside `input` itself.
Fix: detect logical lines inside the input string and move the cursor
one line up/down preserving column offset (clamp to line end when the
destination is shorter, standard editor behavior). Only fall through
to history cycling when the cursor is already on the first line (Up)
or last line (Down).
Adds unit coverage for the new `lineNav` helper.
Reported during TUI v2 blitz retest: /resume modal only surfaced tui/cli
rows, even though `hermes --tui --resume <id>` with a pasted telegram
session id works fine. The handler double-fetched with explicit
`source="tui"` and `source="cli"` filters and dropped everything else on
the floor.
Drop the filter — list_sessions_rich(source=None) already excludes
child sessions (subagents, compression continuations) via its default,
and users want to resume messenger sessions from inside the TUI.
Adds gateway regression coverage.
- Drop the outer no-op capture group from INLINE_RE and restructure the
source as an ordered list of patterns-with-index-comments so each
alternative is individually greppable. Shift group indices in MdInline
down by one accordingly.
- Inline single-use helpers (parseFence, isFenceClose, isMarkdownFence,
trimBareUrl) and intermediate variables (path, lang, raw, prefix, body,
depth, task body, setext match, etc.).
- Hoist block-level regexes used inside MdImpl (FENCE_CLOSE_RE, SETEXT_RE,
BULLET_RE, TASK_RE, NUMBERED_RE, QUOTE_RE) to top-level consts so
they're compiled once instead of per-line.
- Collapse the duplicate compact-vs-normal blank-line branches into one
if/!compact gap call.
- Move Fence and MdProps types to the bottom per house style.
- Shorten splitTableRow → splitRow and use optional chaining in a few
match sites.
No behavior change; 162/162 tests pass. Net -22 LoC.
The inline markdown regex had `~([^~\s][^~]*?)~` for Pandoc-style subscript
(H~2~O, CO~2~). On models that decorate prose with kaomoji like `thing ~!`
and `cool ~?` — Kimi especially — the opener `~!` paired with the next
stray `~` on the line and dim-formatted everything between them with a
leading `_` character, mangling markdown output.
Tighten the pattern to short alphanumeric-only content (`~[A-Za-z0-9]{1,8}~`)
since real subscript never contains punctuation, spaces, or long runs.
Same tightening applied to stripInlineMarkup so width measurement stays
consistent. Classic CLI was unaffected because it renders these literally.