* feat(skills): add 'hermes skills reset' to un-stick bundled skills
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.
Adds an escape hatch for this case.
hermes skills reset <name>
Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
re-baselines against the user's current copy. Future 'hermes update'
runs accept upstream changes again. Non-destructive.
hermes skills reset <name> --restore
Also deletes the user's copy and re-copies the bundled version.
Use when you want the pristine upstream skill back.
Also available as /skills reset in chat.
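A minimal sketch of the reset flow (manifest layout and field names here are assumptions; the real logic lives in tools/skills_sync.py):

    import json
    import shutil
    from pathlib import Path

    SKILLS_DIR = Path.home() / ".hermes" / "skills"
    MANIFEST = SKILLS_DIR / ".bundled_manifest"

    def reset_bundled_skill(name: str, restore: bool = False,
                            bundled_dir: Path | None = None) -> None:
        manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
        if name not in manifest:
            raise KeyError(f"unknown bundled skill: {name}")
        del manifest[name]  # forget the stale origin hash; next sync re-baselines
        MANIFEST.write_text(json.dumps(manifest, indent=2))
        if restore and bundled_dir is not None:
            # --restore: drop the user's copy, re-copy the pristine bundled version
            user_copy = SKILLS_DIR / name
            shutil.rmtree(user_copy, ignore_errors=True)
            shutil.copytree(bundled_dir / name, user_copy)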
- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
repro, --restore, unknown-skill error, upstream-removed-skill, and
no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
section explaining the origin-hash mechanic + reset usage
* fix(nous): respect 'Skip (keep current)' after OAuth login
When a user already set up on another provider (e.g. OpenRouter) runs
`hermes model` and picks Nous Portal, OAuth succeeds and then a model
picker is shown. If the user picks 'Skip (keep current)', the previous
provider + model should be preserved.
Previously, _update_config_for_provider was called unconditionally after
login, which flipped config.yaml model.provider to 'nous' while keeping
the old model.default (e.g. anthropic/claude-opus-4.6 from OpenRouter),
leaving the user with a mismatched provider/model pair on the next
request.
Fix: snapshot the prior active_provider before login, and if no model is
selected (Skip, or no models available, or fetch failure), restore the
prior active_provider and leave config.yaml untouched. The Nous OAuth
tokens stay saved so future `hermes model` -> Nous works without
re-authenticating.
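The shape of the fix, as a hedged sketch (helper names are stand-ins, not the real setup-flow API):

    from typing import Callable, Optional

    def finish_nous_login(
        config: dict,
        oauth_login: Callable[[], dict],
        save_credentials: Callable[[str, dict], None],
        pick_model: Callable[[], Optional[str]],
    ) -> None:
        prior_provider = config.get("active_provider")  # snapshot before login
        save_credentials("nous", oauth_login())         # tokens persist either way
        model = pick_model()  # None on Skip / no models / fetch failure
        if model is None:
            config["active_provider"] = prior_provider  # restore; config.yaml untouched
            return
        config["active_provider"] = "nous"
        config["model"] = {"provider": "nous", "default": model}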
Test plan:
- New tests cover Skip path (preserves provider+model, saves creds),
pick-a-model path (switches to nous), and fresh-install Skip path
(active_provider cleared, not stuck as 'nous').
The cherry-picked SDK compat fix (previous commit) wired process() to
parse CallbackMessage.data into a ChatbotMessage, but _extract_text()
was still written against the pre-0.20 payload shape:
* message.text changed from dict {content: ...} → TextContent object.
The old code's str(text) fallback produced 'TextContent(content=...)'
as the agent's input, so every received message came in mangled.
* rich_text moved from message.rich_text (list) to
message.rich_text_content.rich_text_list.
This preserves legacy fallbacks (dict-shaped text, bare rich_text list)
while handling the current SDK layout via hasattr(text, 'content').
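Roughly the shape tolerance now in place (illustrative sketch; the real _extract_text handles more cases):

    def extract_text(message) -> str:
        text = getattr(message, "text", None)
        if hasattr(text, "content"):           # current SDK: TextContent object
            return text.content or ""
        if isinstance(text, dict):             # legacy: dict {content: ...}
            return text.get("content", "")
        rtc = getattr(message, "rich_text_content", None)
        if rtc is not None:                    # current: nested rich-text list
            return " ".join(str(p) for p in rtc.rich_text_list)
        rich = getattr(message, "rich_text", None)
        if isinstance(rich, list):             # legacy: bare rich_text list
            return " ".join(str(p) for p in rich)
        return ""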
Adds regression tests covering:
* webhook domain allowlist (api.*, oapi.*, and hostile lookalikes)
* _IncomingHandler.process is a coroutine function
* _extract_text against TextContent object, dict, rich_text_content,
legacy rich_text, and empty-message cases
Also adds kevinskysunny to scripts/release.py AUTHOR_MAP (release CI
blocks unmapped emails).
Three tests in tests/test_plugin_skills.py and tests/hermes_cli/test_plugins.py
used caplog.at_level(logging.WARNING) without specifying a logger. When another
test earlier in the same xdist worker touched propagation on tools.skills_tool
or hermes_cli.plugins, caplog would miss the warning and the assertion would
fail intermittently in CI.
These three tests accounted for 15 of the last ~30 Tests workflow failures
(5 each), including the recent main failure on commit 436a7359 (PR #11398).
Fix: pass logger="tools.skills_tool" / logger="hermes_cli.plugins" to
caplog.at_level() so the handler attaches directly to the logger under test
and capture is independent of global propagation state.
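The fix in miniature (pytest's caplog.at_level takes an optional logger name; the test body below is illustrative):

    import logging

    def test_injection_logged_but_served(caplog):
        # Handler attaches directly to the logger under test, so capture
        # no longer depends on whether propagation was left enabled.
        with caplog.at_level(logging.WARNING, logger="tools.skills_tool"):
            logging.getLogger("tools.skills_tool").warning("injection detected")
        assert any("injection" in r.message for r in caplog.records)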
Affected tests:
- tests/test_plugin_skills.py::TestSkillViewPluginGuards::test_injection_logged_but_served
- tests/hermes_cli/test_plugins.py::TestPluginCommands::test_register_command_empty_name_rejected
- tests/hermes_cli/test_plugins.py::TestPluginCommands::test_register_command_builtin_conflict_rejected
No production code change. Verified passing under xdist (-n 4) alongside
test_hermes_logging.py (the test most likely to poison the logger state).
Extends test_build_event_handler_registers_reaction_and_card_processors
to assert that register_p2_im_chat_access_event_bot_p2p_chat_entered_v1
and register_p2_im_message_recalled_v1 are called when building the
event handler, matching the production registrations.
Also adds Fatty911 to scripts/release.py AUTHOR_MAP for credit on the
salvaged event-handler fix.
* feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro
Upstream asked for these two upgrades ASAP — the old entries show
stale models when newer, higher-quality versions are available on FAL.
Recraft V3 → Recraft V4 Pro
ID: fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image
Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier)
Schema: V4 dropped the required `style` enum entirely; defaults
handle taste now. Added `colors` and `background_color`
to supports for brand-palette control. `seed` is not
supported by V4 per the API docs.
Nano Banana → Nano Banana Pro
ID: fal-ai/nano-banana → fal-ai/nano-banana-pro
Price: $0.08/image → $0.15/image (1K); $0.30 at 4K
Schema: Aspect ratio family unchanged. Added `resolution`
(1K/2K/4K, default 1K for billing predictability),
`enable_web_search` (real-time info grounding, +$0.015),
and `limit_generations` (force exactly 1 image).
Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality
and reasoning depth improved; slower (~6s → ~8s).
Migration: users who had the old IDs in `image_gen.model` will
fall through the existing 'unknown model → default' warning path
in `_resolve_fal_model()` and get the Klein 9B default on the next
run. Re-run `hermes tools` → Image Generation to pick the new
version. No silent cost-upgrade aliasing — the 2-6x price jump
on these tiers warrants explicit user re-selection.
Portal note: both new model IDs need to be allowlisted on the
Nous fal-queue-gateway alongside the previous 7 additions, or
users on Nous Subscription will see the 'managed gateway rejected
model' error we added previously (which is clear and
self-remediating, just noisy).
* docs: wrap '<1s' in backticks to unblock MDX compilation
Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and
'<1s' fails because '1' isn't a valid tag-name start character. This
was broken on main since PR #11265 (never noticed because
docs-site-checks was failing on OTHER issues at the time and we
admin-merged through it).
Wrapping in backticks also gives the cell monospace styling which
reads more cleanly alongside the inline-code model ID in the same row.
The other '<1s' occurrence (line 52) is inside a fenced code block
and is already safe — code fences bypass MDX parsing.
* feat(mcp-oauth): scaffold MCPOAuthManager
Central manager for per-server MCP OAuth state. Provides
get_or_build_provider (cached), remove (evicts cache + deletes
disk), invalidate_if_disk_changed (mtime watch, core fix for
external-refresh workflow), and handle_401 (dedup'd recovery).
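A skeleton of that surface with simplified signatures (handle_401 elided here; see Task 6):

    import os

    class MCPOAuthManager:
        """Per-server OAuth state: provider cache + disk mtime watch."""

        def __init__(self) -> None:
            self._providers: dict[str, object] = {}
            self._mtimes: dict[str, float] = {}

        def get_or_build_provider(self, server: str, build) -> object:
            # Cached: one provider instance per server name.
            if server not in self._providers:
                self._providers[server] = build()
            return self._providers[server]

        def remove(self, server: str, tokens_path: str) -> None:
            # Evict the cache and delete the on-disk tokens.
            self._providers.pop(server, None)
            try:
                os.remove(tokens_path)
            except FileNotFoundError:
                pass

        def invalidate_if_disk_changed(self, server: str, tokens_path: str) -> bool:
            # Core fix for the external-refresh workflow: if another process
            # rewrote the tokens file, force a re-read from storage.
            try:
                mtime = os.path.getmtime(tokens_path)
            except OSError:
                return False  # pre-auth state: no file yet
            if mtime != self._mtimes.get(server):
                self._mtimes[server] = mtime
                return True
            return False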
No behavior change yet — existing call sites still use
build_oauth_auth directly. Task 1 of 8 in the MCP OAuth
consolidation (fixes Cthulhu's BetterStack reliability issues).
* feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch
Subclasses the MCP SDK's OAuthClientProvider to inject a disk
mtime check before every async_auth_flow, via the central
manager. When a subclass instance is used, external token
refreshes (cron, another CLI instance) are picked up before
the next API call.
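Sketch of the subclass, assuming the SDK's OAuthClientProvider is an httpx.Auth driven through async_auth_flow (constructor kwargs are stand-ins; _initialized is the SDK flag the disk watch flips, per the Task 8 test notes):

    from mcp.client.auth import OAuthClientProvider  # MCP SDK base class

    class HermesMCPOAuthProvider(OAuthClientProvider):
        def __init__(self, *args, manager, server_name, tokens_path, **kwargs):
            super().__init__(*args, **kwargs)
            self._manager = manager
            self._server_name = server_name
            self._tokens_path = tokens_path

        async def async_auth_flow(self, request):
            # Pre-flow disk watch: pick up external token refreshes
            # (cron, another CLI instance) before this API call.
            if self._manager.invalidate_if_disk_changed(
                    self._server_name, self._tokens_path):
                self._initialized = False  # SDK re-reads tokens from storage
            # Delegate to the SDK flow, forwarding responses back in
            # (httpx auth flows are generators that receive responses).
            flow = super().async_auth_flow(request)
            req = await flow.__anext__()
            while True:
                response = yield req
                try:
                    req = await flow.asend(response)
                except StopAsyncIteration:
                    break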
Still dead code: the manager's _build_provider still delegates
to build_oauth_auth and returns the plain OAuthClientProvider.
Task 4 wires this subclass in. Task 2 of 8.
* refactor(mcp-oauth): extract build_oauth_auth helpers
Decomposes build_oauth_auth into _configure_callback_port,
_build_client_metadata, _maybe_preregister_client, and
_parse_base_url. Public API preserved. These helpers let
MCPOAuthManager._build_provider reuse the same logic in Task 4
instead of duplicating the construction dance.
Also updates the SDK version hint in the warning from 1.10.0 to
1.26.0 (which is what we actually require for the OAuth types
used here). Task 3 of 8.
* feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly
_build_provider constructs the disk-watching subclass using the
helpers from Task 3, instead of delegating to the plain
build_oauth_auth factory. Any consumer using the manager now gets
pre-flow disk-freshness checks automatically.
build_oauth_auth is preserved as the public API for backwards
compatibility. The code path is now:
MCPOAuthManager.get_or_build_provider ->
_build_provider ->
_configure_callback_port
_build_client_metadata
_maybe_preregister_client
_parse_base_url
HermesMCPOAuthProvider(...)
Task 4 of 8.
* feat(mcp): wire OAuth manager + add _reconnect_event
MCPServerTask gains _reconnect_event alongside _shutdown_event.
When set, _run_http / _run_stdio exit their async-with blocks
cleanly (no exception), and the outer run() loop re-enters the
transport to rebuild the MCP session with fresh credentials.
This is the recovery path for OAuth failures that the SDK's
in-place httpx.Auth cannot handle (e.g. cron externally consumed
the refresh_token, or server-side session invalidation).
_run_http now asks MCPOAuthManager for the OAuth provider
instead of calling build_oauth_auth directly. Config-time,
runtime, and reconnect paths all share one provider instance
with pre-flow disk-watch active.
shutdown() defensively sets both events so there is no race
between reconnect and shutdown signalling.
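The loop shape, as a simplified sketch (the real run() re-enters the _run_http / _run_stdio async-with blocks):

    import asyncio

    class MCPServerTask:
        def __init__(self, run_transport):
            self._run_transport = run_transport  # exits cleanly when either event is set
            self._shutdown_event = asyncio.Event()
            self._reconnect_event = asyncio.Event()

        async def run(self):
            while True:
                await self._run_transport(self._shutdown_event,
                                          self._reconnect_event)
                if self._shutdown_event.is_set():
                    break
                if self._reconnect_event.is_set():
                    # Re-enter the transport: rebuild the MCP session
                    # with fresh credentials, then keep serving.
                    self._reconnect_event = asyncio.Event()
                    continue
                break

        def shutdown(self):
            # Set both events so reconnect and shutdown cannot race.
            self._reconnect_event.set()
            self._shutdown_event.set()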
Task 5 of 8.
* feat(mcp): detect auth failures in tool handlers, trigger reconnect
All 5 MCP tool handlers (tool call, list_resources, read_resource,
list_prompts, get_prompt) now detect auth failures and route
through MCPOAuthManager.handle_401:
1. If the manager says recovery is viable (disk has fresh tokens,
or SDK can refresh in-place), signal MCPServerTask._reconnect_event
to tear down and rebuild the MCP session with fresh credentials,
then retry the tool call once.
2. If no recovery path exists, return a structured needs_reauth
JSON error so the model stops hallucinating manual refresh
attempts (the 'let me curl the token endpoint' loop Cthulhu
pasted from Discord).
_is_auth_error catches OAuthFlowError, OAuthTokenError,
OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth
exceptions still surface via the generic error path unchanged.
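The routing in sketch form (AuthError stands in for the _is_auth_error checks; wait_reconnected is a hypothetical placeholder for the rebuild handshake):

    import json

    class AuthError(Exception):
        """Stand-in for OAuthFlowError / OAuthTokenError / 401s."""

    async def call_with_auth_recovery(call, manager, task, server_name):
        try:
            return await call()
        except AuthError:
            if await manager.handle_401(server_name):
                # Recovery viable: rebuild the session, then retry once.
                task._reconnect_event.set()
                await task.wait_reconnected()  # hypothetical handshake
                return await call()
            # No recovery path: structured error instead of letting the
            # model improvise manual token refreshes.
            return json.dumps({
                "error": "needs_reauth",
                "hint": f"run `hermes mcp login {server_name}`",
            })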
Task 6 of 8.
* feat(mcp-cli): route add/remove through manager, add 'hermes mcp login'
cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager
instead of calling build_oauth_auth / remove_oauth_tokens
directly. This means CLI config-time state and runtime MCP
session state are backed by the same provider cache — removing
a server evicts the live provider, adding a server populates
the same cache the MCP session will read from.
New 'hermes mcp login <name>' command:
- Wipes both the on-disk tokens file and the in-memory
MCPOAuthManager cache
- Triggers a fresh OAuth browser flow via the existing probe
path
- Intended target for the needs_reauth error Task 6 returns
to the model
Task 7 of 8.
* test(mcp-oauth): end-to-end integration tests
Five new tests exercising the full consolidation with real file
I/O and real imports (no transport mocks):
1. external_refresh_picked_up_without_restart — Cthulhu's cron
workflow. External process writes fresh tokens to disk;
on the next auth flow the manager's mtime-watch flips
_initialized and the SDK re-reads from storage.
2. handle_401_deduplicates_concurrent_callers — 10 concurrent
handlers for the same failed token fire exactly ONE recovery
attempt (thundering-herd protection).
3. handle_401_returns_false_when_no_provider — defensive path
for unknown servers.
4. invalidate_if_disk_changed_handles_missing_file — pre-auth
state returns False cleanly.
5. provider_is_reused_across_reconnects — cache stickiness so
reconnects preserve the disk-watch baseline mtime.
Task 8 of 8 — consolidation complete.
Mirrors OpenRouter, which already lists anthropic/claude-opus-4.7 as
recommended. Surfaces the model in the `hermes model` picker and the
gateway /model flow for Nous Portal users.
Context length (1M) is already covered by the existing claude-opus-4.7
entry in agent/model_metadata.py DEFAULT_CONTEXT_LENGTHS.
* docs: fix ascii-guard border alignment errors
Three docs pages had ASCII diagram boxes with off-by-one column
alignment issues that failed docs-site-checks CI:
- architecture.md: outer box is 71 cols but inner-box content lines
and border corners were offset by 1 col, making content-line right
border at col 70/72 while top/bottom border was at col 71. Inner
boxes also had border corners at cols 19/36/53 but content pipes
at cols 20/37/54. Rewrote the diagram with consistent 71-col width
throughout, aligned inner boxes at cols 4-19, 22-37, 40-55 with
2-space gaps and 15-space trailing padding.
- gateway-internals.md: same class of issue — outer box at 51 cols,
inner content lines varied 52-54 cols. Rewrote with consistent
51-col width, inner boxes at cols 4-15, 18-29, 32-43. Also
restructured the bottom-half message flow so it's bare text
(not half-open box cells) matching the intent of the original.
- agent-loop.md line 112-114: box 2 (API thread) content lines had
one extra space pushing the right border to col 46 while the top
and bottom borders of that box sat at col 45. Trimmed one trailing
space from each of the three content lines.
All 123 docs files now pass `npm run lint:diagrams`:
✓ Errors: 0 (warnings: 6, non-fatal)
The alignment errors were pre-existing failures on main, unrelated to
any open PR.
* test(setup): accept description kwarg in prompt_choice mock lambdas
setup.py's `_curses_prompt_choice` gained an optional `description`
parameter (used for rendering context hints alongside the prompt).
`prompt_choice` forwards it via keyword arg. The two existing tests
mocked `_curses_prompt_choice` with lambdas that didn't accept the
new kwarg, so the forwarded call raised TypeError.
Fix: add `description=None` to both mock lambda signatures so they
absorb the new kwarg without changing behavior.
* test(matrix): update stale audio-caching assertion
test_regular_audio_has_http_url asserted that non-voice audio
messages keep their HTTP URL and are NOT downloaded/cached. That
was true when the caching code only triggered on
`is_voice_message`. Since bec02f37 (encrypted-media caching
refactor), matrix.py caches all media locally — photos, audio,
video, documents — so downstream tools can read them as real
files via media_urls. This applies to regular audio too.
Renamed the test to `test_regular_audio_is_cached_locally`,
flipped the assertions accordingly, and documented the
intentional behavior change in the docstring. Other tests in
the file (voice-specific caching, message-type detection,
reply-to threading) continue to pass.
* test(413): allow multi-pass preflight compression
run_agent.py's preflight compression runs up to 3 passes in a loop
for very large sessions (each pass summarizes the middle N turns,
then re-checks tokens). The loop breaks when a pass returns a
message list no shorter than its input (can't compress further).
test_preflight_compresses_oversized_history used a static mock that
returned the same 2 messages regardless of input,
so the loop ran pass 1 (41 -> 2) and pass 2 (2 -> 2 -> break),
making call_count == 2. The assert_called_once() assertion was
strictly wrong under the multi-pass design.
The invariant the test actually cares about is: preflight ran, and
its first invocation received the full oversized history. Replaced
the count assertion with those two invariants.
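Roughly the replacement assertions (the positional-arg index of the history is an assumption):

    assert compress_mock.call_count >= 1                     # preflight ran
    first_history = compress_mock.call_args_list[0].args[0]
    assert len(first_history) == 41                          # first pass saw the full history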
* docs: drop '...' from gateway diagram, merge side-by-side boxes
ascii-guard 2.3.0 flagged two remaining issues after the initial fix
pass:
1. gateway-internals.md L33: the '...' suffix after inner box 3's
right border got parsed as 'extra characters after inner-box right
border'. Dropped the '...' — the surrounding prose already conveys
'and more platforms' without needing the visual hint.
2. agent-loop.md: ascii-guard can't cleanly parse two side-by-side
boxes of different heights (main thread 7 rows, API thread 5 rows).
Even equalizing heights didn't help — the linter treats the left
box's right border as the end of the diagram. Merged into a single
54-char-wide outer box with both threads labeled as regions inside,
keeping the ▶ arrow to preserve the main→API flow direction.
Previous pass assumed both skills would always be loaded together, so
each description pointed at the other ('use concept-diagrams instead').
That breaks when only one skill is active — the agent reads 'use the
other skill' and there is no other skill.
Now each skill's description and scope section is fully self-contained:
- States what it's best suited for
- Lists subjects where a more specialized skill (if available) would be
a better fit, naming them only as 'consider X if available'
- Explicitly offers itself as a general SVG diagram fallback when no
more specialized skill exists
An agent loading either skill alone gets unambiguous guidance; an
agent with both loaded still gets useful routing via the 'consider X
if available' hints and the related_skills metadata.
Both skills generate SVG system diagrams, but for very different subjects
and aesthetics. The old descriptions didn't make the split clear, so an
agent loading either one couldn't confidently pick.
Changes:
- Rewrote both frontmatter descriptions to state the scope up front plus
an explicit 'for X, use the other skill instead' pointer.
- Added a symmetric 'When to use this skill vs <other>' decision table
to the top of each SKILL.md body, so the guidance is visible whether
the agent is reading frontmatter or full content.
- Added architecture-diagram <-> concept-diagrams to each other's
related_skills metadata.
Rule of thumb baked into both skills:
software/cloud infra -> architecture-diagram
physical / scientific / educational -> concept-diagrams
Salvage of PR #11045 (original by v1k22). Changes on top of the
original commit:
- Rename 'architecture-visualization-svg-diagrams' -> 'concept-diagrams'
to differentiate from the existing architecture-diagram skill.
architecture-diagram stays as the dark-themed Cocoon-style option for
software/infra; concept-diagrams covers physics, chemistry, math,
engineering, physical objects, and educational visuals.
- Trigger description scoped to actual use cases; removed the 'always
use this skill' language and long phrase-capture list to stop
colliding with architecture-diagram, excalidraw, generative-widgets,
manim-video.
- Default output is now a standalone self-contained HTML file (works
offline, no server). The preview server is opt-in and no longer part
of the default workflow.
- When the server IS used: bind to 127.0.0.1 instead of 0.0.0.0 (was a
LAN exposure hazard on shared networks) and let the OS pick a free
ephemeral port instead of hard-coding 22223 (collision prone).
- Shrink SKILL.md from 1540 to 353 lines by extracting reusable
material into linked files:
- templates/template.html (host page with full CSS design system)
- references/physical-shape-cookbook.md
- references/infrastructure-patterns.md
- references/dashboard-patterns.md
All 15 examples kept intact.
- Add dhandhalyabhavik@gmail.com -> v1k22 to AUTHOR_MAP.
Preserves v1k22's authorship on the underlying commit.
- SKILL.md with full SVG design system (color palette, typography, spacing, dark mode)
- 15 example diagrams covering flowcharts, physical structures, chemistry, charts, floor plans, and more
- Supports 8 diagram types: flowchart, structural, API map, microservice, data flow, physical, infrastructure, UI mockups
- Auto-hosts diagrams on 0.0.0.0:22223 as interactive web pages
Inbound Feishu messages arriving during brief windows when the adapter
loop is unavailable (startup/restart transitions, network-flap reconnect)
were silently dropped with a WARNING log. This matches the symptom in
issue #5499 — and users have reported seeing only a subset of their
messages reach the agent.
Fix: queue pending events in a thread-safe list and spawn a single
drainer thread that replays them once the loop becomes ready. Covers
these scenarios:
* Queue events instead of dropping when loop is None/closed
* Single drainer handles the full queue (not thread-per-event)
* Thread-safe with threading.Lock on the queue and schedule flag
* Handles mid-drain bursts (new events arrive while drainer is working)
* Handles RuntimeError if loop closes between check and submit
* Depth cap (1000) prevents unbounded growth during extended outages
* Drops queue cleanly on disconnect rather than holding forever
* Safety timeout (120s) prevents infinite retention on broken adapters
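A condensed sketch of the queue + drainer shape covering the scenarios above (wait_for_loop / dispatch are injected stand-ins for the adapter's real hooks):

    import asyncio
    import threading

    class PendingInboundQueue:
        MAX_DEPTH = 1000  # bound growth during extended outages

        def __init__(self, wait_for_loop, dispatch):
            self._wait_for_loop = wait_for_loop  # blocks until the loop is usable
            self._dispatch = dispatch            # coroutine factory per event
            self._lock = threading.Lock()
            self._pending = []
            self._drainer_running = False

        def enqueue(self, event) -> bool:
            with self._lock:
                if len(self._pending) >= self.MAX_DEPTH:
                    return False                 # drop under the depth cap
                self._pending.append(event)
                if self._drainer_running:
                    return True                  # running drainer picks it up
                self._drainer_running = True
            threading.Thread(target=self._drain, daemon=True).start()
            return True

        def _drain(self):
            loop = self._wait_for_loop()
            while True:
                with self._lock:
                    if not self._pending:        # re-check under lock: covers
                        self._drainer_running = False  # mid-drain bursts
                        return
                    event = self._pending.pop(0)
                try:
                    asyncio.run_coroutine_threadsafe(self._dispatch(event), loop)
                except RuntimeError:
                    return  # loop closed between check and submit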
Based on the approach proposed in #4789 by milkoor, rewritten for
thread-safety and correctness.
Test plan:
* 5 new unit tests (TestPendingInboundQueue) — all passing
* E2E test with real asyncio loop + fake WS thread: 10-event burst
before loop ready → all 10 delivered in order
* E2E concurrent burst test: 20 events queued, 20 more arrive during
drainer dispatch → all 40 delivered, no loss, no duplicates
* All 111 existing feishu tests pass
Related: #5499, #4789
Co-authored-by: milkoor <milkoor@users.noreply.github.com>
* feat(image_gen): multi-model FAL support with picker in hermes tools
Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.
Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)
Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
whitelist, and upscale flag. Three size families handled uniformly:
image_size_preset (flux family), aspect_ratio (nano-banana), and
gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
into model-specific payloads, merges defaults, applies caller overrides,
wires GPT quality_setting, then filters to the supports whitelist — so
models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
providers (Replicate, Stability, etc.); each provider entry tags itself
with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.
Config:
image_gen.model = fal-ai/flux-2/klein/9b (new)
image_gen.quality_setting = medium (new, GPT only)
image_gen.use_gateway = bool (existing)
Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.
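Illustrative sketch of the translate-merge-filter pipeline (the catalog entry shown is hypothetical and trimmed):

    # Hypothetical, trimmed catalog entry; the real FAL_MODELS declares more.
    FAL_MODELS = {
        "fal-ai/nano-banana": {
            "size_family": "aspect_ratio",
            "defaults": {"num_images": 1},
            "supports": {"prompt", "aspect_ratio", "num_images"},
        },
    }

    def build_fal_payload(model_id: str, prompt: str, aspect_ratio: str,
                          overrides: dict | None = None) -> dict:
        spec = FAL_MODELS[model_id]
        payload = dict(spec["defaults"])           # merge per-model defaults
        payload["prompt"] = prompt
        if spec["size_family"] == "aspect_ratio":  # translate unified size input
            payload["aspect_ratio"] = aspect_ratio
        # (image_size_preset and gpt_literal families translate differently)
        payload.update(overrides or {})            # caller overrides win
        # filter to the supports whitelist: models never see rejected keys
        return {k: v for k, v in payload.items() if k in spec["supports"]}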
Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.
Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.
Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).
* feat(image_gen): translate managed-gateway 4xx to actionable error
When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
1. The rejected model ID + HTTP status
2. Two remediation paths: set FAL_KEY for direct access, or
pick a different model via `hermes tools`
5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).
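The translation policy, sketched (message text illustrative):

    def translate_gateway_error(model_id: str, status: int | None,
                                original: Exception) -> Exception:
        # Only managed-gateway 4xx gets rewritten; 5xx and connection
        # errors keep their native messages (different root causes).
        if status is None or not (400 <= status < 500):
            return original
        return RuntimeError(
            f"Nous managed gateway rejected model '{model_id}' (HTTP {status}). "
            "Set FAL_KEY to use FAL directly, or pick a different model "
            "via `hermes tools`."
        )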
Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.
Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.
Docs: brief note in image-generation.md for Nous subscribers.
Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.
* feat(image_gen): pin GPT-Image quality to medium (no user choice)
Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:
1. Nous Portal billing complexity — the 22x cost spread between tiers
($0.009 low / $0.20 high) forces the gateway to meter per-tier per
user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
credit ~6x faster than `medium`.
This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:
- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
config.yaml, no code path reads it — always sends medium.
Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
(5 tests) — proves medium is baked in, config is ignored, flag is
removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
picker call fires when gpt-image is selected (no quality follow-up).
Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.
* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section
Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
The helper used ${var,,} (bash 4+ lowercase parameter expansion) and
[[ =~ ]], which fail on macOS default /bin/bash (3.2.57) with:
bash: ${default,,}: bad substitution
With 'set -e' at the top of the script, that aborts the whole
installer for macOS users who don't have a newer bash on PATH.
Replace the lowercase expansions with POSIX-style case patterns
(`[yY]|[yY][eE][sS]|...`) that behave identically and parse cleanly
on bash 3.2. Verified with a 15-case behavior test on both bash 3.2
and bash 5.2 — all pass.
Re-land of #10933, now guarded by the tests in #11266.
When a provider drops a TCP connection mid-stream, the socket can enter
CLOSE-WAIT and `epoll_wait` may never fire — no data or error signal
arrives, so the httpx read timeout never triggers and the agent hangs
indefinitely. The other defenses (`_force_close_tcp_sockets`, stale
stream detector) all ride on the socket layer reporting the dead
connection, which it never does without probes.
Inject `SO_KEEPALIVE` + `TCP_KEEPIDLE`/`KEEPINTVL`/`KEEPCNT`
into the httpx transport. Kernel probes after 30s idle, retries every
10s, gives up after 3 → dead peer detected within ~60s instead of
hanging forever. Platform-aware: `TCP_KEEPIDLE` on Linux,
`TCP_KEEPALIVE` on macOS. Silent no-op on Windows or anywhere
the socket options aren't available.
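Roughly what the injection looks like, assuming an httpx version whose transports accept socket_options:

    import socket
    import sys
    import httpx

    def keepalive_socket_options():
        # Silent no-op where the constants aren't available.
        if not hasattr(socket, "SO_KEEPALIVE"):
            return []
        opts = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
        if sys.platform == "darwin" and hasattr(socket, "TCP_KEEPALIVE"):
            opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 30))  # macOS idle
        elif hasattr(socket, "TCP_KEEPIDLE"):
            opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30))   # Linux idle
        if hasattr(socket, "TCP_KEEPINTVL"):
            opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10))  # probe interval
        if hasattr(socket, "TCP_KEEPCNT"):
            opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3))     # probes before give-up
        return opts

    # Newer httpx releases accept socket_options on their transports.
    transport = httpx.HTTPTransport(socket_options=keepalive_socket_options())
    client = httpx.Client(transport=transport)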
The original land (#10933) mutated `client_kwargs` in place when it
injected the `httpx.Client`. Since callers pass `self._client_kwargs`
by reference, the injected client leaked into the instance state. After
the first request, the OpenAI SDK closed its `http_client` — including
the injected one. The next `_create_openai_client` call re-read the
now-closed `httpx.Client` from `self._client_kwargs` and every
subsequent chat raised `APIConnectionError` with cause `RuntimeError:
Cannot send a request, as the client has been closed` (AlexKucera's
Discord report, 2026-04-16).
The defensive `client_kwargs = dict(client_kwargs)` copy already on
main (taeuk178's #10978) means this injection only lands in the
per-call local copy. Each `_create_openai_client` invocation gets
its OWN fresh `httpx.Client` whose lifetime is tied to the paired
`OpenAI` client. When that `OpenAI` client is closed (rebuild,
teardown, credential rotation), its `httpx.Client` closes with it
and the next call constructs a fresh one — no stale closed transport
can be reused.
Full 4-test matrix all green (unit + live with real OpenRouter round
trips, HERMES_LIVE_TESTS=1):
tests/run_agent/test_create_openai_client_kwargs_isolation.py PASS
tests/run_agent/test_create_openai_client_reuse.py PASS (2)
tests/run_agent/test_sequential_chats_live.py PASS
Socket options verified on the live httpx transport:
_socket_options: [(1, 9, 1), (6, 4, 30), (6, 5, 10), (6, 6, 3)]
= (SO_KEEPALIVE=1, TCP_KEEPIDLE=30s, TCP_KEEPINTVL=10s, TCP_KEEPCNT=3)
Sequential-chat reproduction of the #10933 failure was explicitly
run against this patch — the defensive copy on main prevents the
closed transport from leaking back into `self._client_kwargs`, so
every rebuild constructs a fresh transport.
Closes #10324
PR #4918 fixed the double-/v1 bug at fresh agent init by stripping the
trailing /v1 from OpenCode base URLs when api_mode is anthropic_messages
(so the Anthropic SDK's own /v1/messages doesn't land on /v1/v1/messages).
The same logic was missing from the /model mid-session switch path.
Repro: start a session on opencode-go with GLM-5 (or any chat_completions
model), then `/model minimax-m2.7`. switch_model() correctly sets
api_mode=anthropic_messages via opencode_model_api_mode(), but base_url
passes through as https://opencode.ai/zen/go/v1. The Anthropic SDK then
POSTs to https://opencode.ai/zen/go/v1/v1/messages, which returns the
OpenCode website 404 HTML page (title 'Not Found | opencode').
Same bug affects `/model claude-sonnet-4-6` on opencode-zen.
Verified upstream: POST /v1/messages returns clean JSON 401 with x-api-key
auth (route works), while POST /v1/v1/messages returns the exact HTML 404
users reported.
Fix mirrors runtime_provider.resolve_runtime_provider:
- hermes_cli/model_switch.py::switch_model() strips /v1 after the OpenCode
api_mode override when the resolved mode is anthropic_messages.
- run_agent.py::AIAgent.switch_model() applies the same strip as
defense-in-depth, so any direct caller can't reintroduce the double-/v1.
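The strip itself is small; a sketch under assumed naming:

    def normalize_opencode_base_url(base_url: str, api_mode: str) -> str:
        # The Anthropic SDK appends /v1/messages itself; chat_completions
        # and codex_responses modes keep their /v1.
        url = base_url.rstrip("/")
        if api_mode == "anthropic_messages" and url.endswith("/v1"):
            url = url[: -len("/v1")]
        return url

    # normalize_opencode_base_url("https://opencode.ai/zen/go/v1",
    #                             "anthropic_messages")
    # -> "https://opencode.ai/zen/go"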
Tests: 9 new regression tests in tests/hermes_cli/test_model_switch_opencode_anthropic.py
covering minimax on opencode-go, claude on opencode-zen, chat_completions
(GLM/Kimi/Gemini) keeping /v1 intact, codex_responses (GPT) keeping /v1
intact, trailing-slash handling, and the agent-level defense-in-depth.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:
tools/environments/daytona.py
- PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
against the same sandbox don't collide on the remote temp path
- Move sync_back() inside the cleanup lock and after the _sandbox-None
guard, with its own try/except. Previously a no-op cleanup (sandbox
already cleared) still fired sync_back → 3-attempt retry storm against
a nil sandbox (~6s of sleep). Now short-circuits cleanly.
tools/environments/file_sync.py
- Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
tar larger than the limit. Protects against runaway sandboxes
producing arbitrary-size archives.
- Add 'nothing previously pushed' guard at the top of sync_back(). If
_pushed_hashes and _synced_files are both empty, the FileSyncManager
was never initialized from the host side — there is nothing coherent
to sync back. Skips the retry/backoff machinery on uninitialized
managers and eliminates test-suite slowdown from pre-existing cleanup
tests that don't mock the sync layer.
tests/tools/test_file_sync_back.py
- Update _make_manager helper to seed a _pushed_hashes entry by default
so sync_back() exercises its real path. A seed_pushed_state=False
opt-out is available for noop-path tests.
- Add TestSyncBackSizeCap with positive and negative coverage of the
new cap.
tests/tools/test_sync_back_backends.py
- Update Daytona bulk download test to assert the PID-suffixed path
pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR #8018 by @alt-glitch onto current main.
On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.
Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- SIGINT trap + defer (prevents partial writes on Ctrl-C)
- fcntl.flock serialization (concurrent gateway sandboxes)
- Last-write-wins conflict resolution with warning
- New remote files pulled back via _infer_host_path prefix matching
Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()
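The diff-and-apply core, sketched (the real code routes new files through _infer_host_path and holds the flock):

    import hashlib
    from pathlib import Path

    def sha256_file(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def apply_changed_files(unpacked: Path, pushed_hashes: dict[str, str],
                            host_root: Path) -> list[Path]:
        applied = []
        for remote in unpacked.rglob("*"):
            if not remote.is_file():
                continue
            rel = remote.relative_to(unpacked)
            pushed = pushed_hashes.get(str(rel))
            # New files (pushed is None) always differ: skip hashing them.
            if pushed is not None and sha256_file(remote) == pushed:
                continue                       # unchanged since push
            dest = host_root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(remote.read_bytes())  # last-write-wins
            applied.append(dest)
        return applied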
Fixes applied during salvage (vs original PR #8018):
| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |
Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).
Based on #8018 by @alt-glitch.
Sticky prompt:
The loop was skipping `first` (the first row in the viewport) when
looking for a user message scrolled above the top edge. If `first`
itself was a user row that had just ticked above the viewport, we'd
fall through the early-return guard (`role === 'user' && !above`),
then walk from `first - 1` backward — never rechecking `first`, never
finding anything, returning '' and leaving the sticky empty. This is
why it felt "stuck" at the start: one-turn sessions with the user row
exactly at/near the top never surfaced the breadcrumb.
Collapsed the two branches into one loop starting at `first`: nearest
user wins — still-on-screen → empty (redundant to echo), already
above → text. Same semantics, covers the gap.
Scrollbar:
`useSyncExternalStore` snapshot was `scrollTop:vp:scrollHeight` —
scrollHeight ticks up by ~1 row on every streamed chunk, forcing a
re-render per chunk. Quantized snapshot to the displayed values
(`thumbTop:thumbSize:vp`) so we only re-render when the bar actually
changes. Drops render count per turn by ~100x during streaming and
stops the "constantly resizes" flicker.
The gateway was gating `reasoning.delta` and `reasoning.available`
behind `_reasoning_visible(sid)` (true iff `display.show_reasoning:
true` or `tool_progress_mode: verbose`). With the default config,
neither was true — so reasoning events never reached the TUI,
`turn.reasoning` stayed empty, `reasoningTokens` stayed 0, and the
Thinking expander showed no token label for the whole turn. Tools
still reported tokens because `tool.start` had no such gate.
Then `message.complete` fired with `payload.reasoning` populated, the
TUI saved it into `msg.thinking`, and the finalized row's expander
sprouted "~36 tokens" post-hoc. That's the "tokens appear after the
turn" jank.
Remove the gate on emission. The TUI is responsible for whether to
display reasoning content (detailsMode + collapsed expander already
handle that). Token counting becomes continuous throughout the turn,
matching how tools work.
Also dropped the now-unused `_reasoning_visible` and
`_session_show_reasoning` helpers. `show_reasoning` config key stays
in place — it's still toggled via `/reasoning show|hide` and read
elsewhere for potential future TUI-side gating.
Two improvements:
1. The progress ToolTrail and the streaming MessageLine were two
sibling JSX blocks in appLayout with hand-rolled margin glue
between them. Extracted into `<StreamingAssistant>`, a single
component that owns both the trail and the streaming body plus
the 1-row gap between them. appLayout just hands it `progress`
and theme; the layout logic lives in one place, matching the
mental model that these two pieces are one live assistant turn.
2. Thinking token label was hidden when `reasoningTokens === 0` even
if the live reasoning text was already populated (the
scheduleReasoning timer hadn't ticked, or the model sent no
reasoning but the text was coming in via reasoning.delta).
Changed the tokenCount fallback from `reasoningTokens !==
undefined ? reasoningTokens : estimate` to `reasoningTokens > 0 ?
... : estimate` so the label appears the moment text exists.
`appLayout` was passing `busy={ui.busy && !progress.streaming}` into
ToolTrail, so the moment `message.delta` fired and streaming began,
the panel internally saw `busy=false`. With the prior fix in place
(hasThinking = !!cot || reasoningActive || busy), that flipped
hasThinking to false and the Thinking expander vanished mid-turn —
reappearing only after message.complete when the finalized row
rendered with its own internal expander.
The `!progress.streaming` override was a defensive guard against the
panel implying "still thinking" once the response text was streaming.
But that's already handled inside ToolTrail — `streaming` prop on the
Thinking component uses `busy && reasoningStreaming`, and
reasoningStreaming is already falsey once recordMessageDelta calls
endReasoningPhase.
Pass plain `busy={ui.busy}`. Panel stays up start-to-finish; handoff
to the finalized-message row is continuous.
Finalized assistant messages rendered the thinking/tools trail inside
MessageLine with marginBottom=1 before the response body — giving a
clean blank line above the text. The streaming path rendered the
progress ToolTrail and the streaming MessageLine as two separate
siblings with no margin between, so the in-progress response butted
right up against the thinking panel. That's the "newline appears
after it's done" jank.
Wrap the streaming MessageLine in a Box with marginTop=1 whenever the
progress area is visible above it. Same spacing as the finalized
version, continuous through the handoff.
Previously `hasThinking = !!cot || reasoningActive || (busy && !hasTools)`
so the moment a tool started streaming (`hasTools` → true) the expander
vanished mid-turn. If the model also produced no `reasoning.delta`
events (reasoning-less models, or reasoning arriving after tools), the
whole turn ran with no Thinking row — then `message.complete`
populated `msg.thinking` from the payload's post-hoc reasoning trace
and the expander suddenly appeared in the transcript AFTER the turn.
Drop the `!hasTools` restriction. The Thinking row now anchors for the
entire `busy` window; tools and thinking coexist as sibling sections
(they already did — the exclusion was a UX mistake). Reasoning-less
models show a dim empty header; streaming models show live content;
tool-interleaved turns keep the anchor visible throughout.
The status bar was showing stale lifecycle text ("running…") while the
face+verb stream flickered through the thinking panel as Python pushed
thinking.delta events. That's backwards — the face ticker is the
primary "I'm alive" signal, it belongs in the status bar; the thinking
panel is for substantive reasoning and tool activity.
Status bar now reads `ui.busy`: when true, renders a local `<FaceTicker>`
cycling FACES × VERBS on a 2.5s interval, unaffected by server events.
When false, the bar shows the actual status string (ready, starting
agent…, interrupted, etc.).
Side effect: `scheduleThinkingStatus` still patches `ui.status` with
Python's face text, but while busy the bar ignores that string and uses
the ticker instead. No server-side changes needed — Python keeps
emitting thinking.delta as a liveness heartbeat, the TUI just doesn't
let it fight the status bar.
"PT" was internal shorthand for prompt_toolkit that leaked into
AGENTS.md and the TUI post-mortem. Spell it out.
- AGENTS.md: "PT CLI" → "classic (prompt_toolkit) CLI"
- docs/plans/2026-04-01-ink-gateway-tui-migration-plan.md: both hits
"Ink" is the React reconciler — implementation detail, not branding.
Consistent naming: the classic CLI is the CLI, the new one is the TUI.
Updated docs: user-guide/tui.md, user-guide/cli.md cross-link, quickstart,
cli-commands reference, environment-variables reference.
Updated code: main.py --tui help text, server.py user-visible setup
error, AGENTS.md "TUI Architecture" section.
Kept "Ink" only where it is literally the library (hermes-ink internal
source comments, AGENTS.md tree note flagging ui-tui/ as a React/Ink dir).
New primary guide at `user-guide/tui.md` covering launch, requirements,
keybindings, slash commands, status line, configuration, sessions, and
the revert path. Matches the voice of `user-guide/cli.md`.
Cross-links:
- `user-guide/cli.md`: tip callout pointing readers at the Ink TUI
- `getting-started/quickstart.md`: shows both `hermes` and `hermes --tui`
under "Start Chatting" so first-run users know they have the choice
- `reference/environment-variables.md`: new "Interface" section with
`HERMES_TUI` and `HERMES_TUI_DIR`
- `reference/cli-commands.md`: `--tui` and `--dev` added to global options
Sidebar: `user-guide/tui` slotted right after `user-guide/cli`.
All 61 TUI-related tests green across 3 consecutive xdist runs.
tests/tui_gateway/test_protocol.py:
- rename `get_messages` → `get_messages_as_conversation` on mock DB (method
was renamed in the real backend, test was still stubbing the old name)
- update tool-message shape expectation: `{role, name, context}` matches
current `_history_to_messages` output, not the legacy `{role, text}`
tests/hermes_cli/test_tui_resume_flow.py:
- `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes
setup" before `_launch_tui` was ever reached; 3 tests stubbed
`_resolve_last_session` + `_launch_tui` but not the gate
- factored a `main_mod` fixture that stubs `_has_any_provider_configured`,
reused by all three tests
tests/test_tui_gateway_server.py:
- `test_config_set_personality_resets_history_and_returns_info` was flaky
under xdist because the real `_write_config_key` touches
`~/.hermes/config.yaml`, racing with any other worker that writes
config. Stub it in the test.
The Discord voice receive path skipped RFC 3550 §5.1 padding handling,
passing padding-contaminated payloads into DAVE E2EE decrypt and Opus
decode. Symptoms in live VC sessions: deaf inbound speech, intermittent
empty STT results, "corrupted stream" decode errors — especially on the
first reply after join.
When the P bit is set in the RTP header, the last payload byte holds the
count of trailing padding bytes (including itself) that must be removed.
Receive pipeline now follows the spec order:
1. RTP header parse
2. NaCl transport decrypt (aead_xchacha20_poly1305_rtpsize)
3. strip encrypted RTP extension data from start
4. strip RTP padding from end if P bit set ← was missing
5. DAVE inner media decrypt
6. Opus decode
Drops malformed packets where pad_len is 0 or exceeds payload length.
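The strip in sketch form, straight from the RFC 3550 §5.1 rule above:

    def strip_rtp_padding(payload: bytes, p_bit: bool) -> bytes | None:
        """Returns the unpadded payload, or None for malformed packets."""
        if not p_bit:
            return payload
        if not payload:
            return None                # P set but nothing to count
        pad_len = payload[-1]          # trailing padding bytes, incl. itself
        if pad_len == 0 or pad_len > len(payload):
            return None                # malformed: drop the packet
        return payload[:-pad_len]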
Adds 7 integration tests covering valid padded packets, the X+P combined
case, padding under DAVE passthrough, and three malformed-padding paths.
Closes #11267
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the dashboard connects to a remote gateway via GATEWAY_HEALTH_URL,
display the URL instead of the remote PID (which is meaningless locally).
Falls back to PID display for local gateways as before.
- Backend: expose gateway_health_url in /api/status response
- Frontend: prefer gateway_health_url over PID in gatewayValue()
- Add truncate + title tooltip for long URLs that overflow the card
- Add min-w-0/overflow-hidden on status cards for proper truncation
- Tests: verify gateway_health_url in remote and no-URL scenarios
Two new tests in tests/run_agent/ that pin the user-visible invariant
behind AlexKucera's Discord report (2026-04-16): no matter how a future
keepalive / transport fix for #10324 plumbs sockets in, sequential
chats on the same AIAgent instance must all succeed.
test_create_openai_client_reuse.py (no network, runs in CI):
- test_second_create_does_not_wrap_closed_transport_from_first
back-to-back _create_openai_client calls must not hand the same
http_client (after an SDK close) to the second construction
- test_replace_primary_openai_client_survives_repeated_rebuilds
three sequential rebuilds via the real _replace_primary_openai_client
entrypoint must each install a live client
test_sequential_chats_live.py (opt-in, HERMES_LIVE_TESTS=1):
- test_three_sequential_chats_across_client_rebuild
real OpenRouter round trips, with an explicit
_replace_primary_openai_client call between turns 2 and 3.
Error-sentinel detector treats 'API call failed after 3 retries'
replies as failures instead of letting them pass the naive
truthy check (which is how a first draft of this test missed
the bug it was meant to catch).
Validation:
clean main (post-revert, defensive copy present)
-> all 4 tests PASS
broken #10933 state (keepalive injection, no defensive copy)
-> all 4 tests FAIL with precise messages pointing at #10933
Companion to taeuk178's test_create_openai_client_kwargs_isolation.py,
which pins the syntactic 'don't mutate input dict' half of the same
contract. Together they catch both the specific mechanism of #10933
and any other reimplementation that breaks the sequential-call
invariant.
The language switcher displayed the *other* language's flag (clicking
the Chinese flag switched to Chinese). This is dissonant — a flag reads
as a state indicator first, so seeing the Chinese flag while the UI is
in English feels wrong. Users expect the flag to reflect the current
language, like every other status indicator.
Flips the flag and label ternaries so English shows UK + EN, Chinese
shows CN + 中文. Tooltip text ("Switch to Chinese" / "切换到英文") still
communicates the click action, which is where that belongs.
* fix(cli): stop approval panel from clipping approve/deny off-screen
The dangerous-command approval panel had an unbounded Window height with
choices at the bottom. When tirith findings produced long descriptions or
the terminal was compact, HSplit clipped the bottom of the widget — which
is exactly where approve/session/always/deny live. Users were asked to
decide on commands without being able to see the choices (and sometimes
the command itself was hidden too).
Fix: reorder the panel so title → command → choices render first, with
description last. Budget vertical rows so the mandatory content (command
and every choice) always fits, and truncate the description to whatever
row budget is left. Handle three edge cases:
- Long description in a normal terminal: description gets truncated at
the bottom with a '… (description truncated)' marker. Command and
all four choices always visible.
- Compact terminal (≤ ~14 rows): description dropped entirely. Command
and choices are the only content, no overflow.
- /view on a giant command: command gets truncated with a marker so
choices still render. Keeps at least 2 rows of command.
Same row-budgeting pattern applied to the clarify widget, which had the
identical structural bug (long question would push choices off-screen).
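The budgeting idea, sketched (row accounting simplified; reserved_below is tuned in the follow-up commit below):

    def layout_panel(term_rows, command, choices, description,
                     reserved_below=4):
        # Budget top-down: title, command, choices always fit;
        # description gets whatever rows are left.
        rows = term_rows - reserved_below - 1      # 1 row for the title
        rows -= len(choices)                       # every choice must render
        if len(command) > rows:
            keep = max(2, rows - 1)                # keep at least 2 rows of command
            command = command[:keep] + ["… (command truncated)"]
        rows -= len(command)
        if rows >= len(description):
            pass                                   # full description fits
        elif rows > 1:
            description = description[: rows - 1] + ["… (description truncated)"]
        else:
            description = []                       # compact terminal: drop it
        return command + choices + description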
Adds regression tests covering all three scenarios.
* fix(cli): add compact chrome mode for approval/clarify panels on short terminals
Live PTY test at 100x14 rows revealed reserved_below=4 was too optimistic
— the spinner/tool-progress line, status bar, input area, separators, and
prompt symbol actually consume ~6 rows below the panel. At 14 rows, the
panel still got 'Deny' clipped off the bottom.
Fix: bump reserved_below to 6 (measured from live PTY output) and add a
compact-chrome mode that drops the blank separators between title/command
and command/choices when the full-chrome panel wouldn't fit. Chrome goes
from 5 rows to 3 rows in tight mode, keeping command + all 4 choices on
screen in terminals as small as ~13 rows.
Same compact-chrome pattern applied to the clarify widget.
Verified live in PTY hermes chat sessions at 100x14 (compact chrome
triggered, all choices visible) and 100x30 (full chrome with blanks, nice
spacing) by asking the agent to run 'rm -rf /tmp/sandbox'.
---------
Co-authored-by: Teknium <teknium@nousresearch.com>