Adds platform-level private notice delivery abstraction so operational
messages (e.g. sethome prompt) can be sent ephemerally on Slack when
configured with `slack.notice_delivery: private`.
Changes:
- gateway/config.py: _normalize_notice_delivery() + GatewayConfig.get_notice_delivery()
with per-platform config bridging
- gateway/platforms/base.py: send_private_notice() default implementation
(falls through to send())
- gateway/platforms/slack.py: send_private_notice() via chat_postEphemeral
- gateway/run.py: _deliver_platform_notice() helper replaces direct
adapter.send() for the sethome notice, with private→public fallback
- gateway/platforms/slack.py: app_mention handler now forwards to
_handle_slack_message (safe due to ts-based dedup) instead of no-op pass,
fixing edge-case Slack configs where mentions arrive only as app_mention
- gateway/platforms/slack.py format_message: negative lookbehind prevents
markdown images (![]()) from becoming broken Slack links; italic regex
now requires non-whitespace boundaries so 'a * b * c' stays literal
Based on PR #9340 by @probepark.
Slack slash commands (/q, /btw, /stop, /model, etc.) previously showed
no user-visible acknowledgement and posted command replies as public
channel messages. This diverged from Discord, which uses ephemeral
deferred responses for slash commands.
Changes:
- handle_hermes_command now passes response_type='ephemeral' and a
'Running /cmd…' text to ack(), giving the user immediate 'Only visible
to you' feedback when they invoke any native slash command.
- _handle_slash_command stashes the Slack response_url from the command
payload in a per-channel context dict before dispatching to
handle_message.
- send() checks for a pending slash context and, when found, POSTs to
the response_url with replace_original=true to swap the initial ack
with the real command reply (e.g. 'Queued for the next turn.'),
keeping it ephemeral.
- Stale slash contexts are garbage-collected on lookup (120s TTL).
- The response_url POST is non-fatal: if it fails, the user already saw
the initial ack, and send() returns success=True.
Fixes#18182
Long-running gateway processes that survive 'hermes update' keep
pre-update modules cached in sys.modules. When new tool files on
disk then try to 'from hermes_cli.config import cfg_get' (added in
PR #17304), the import resolves against the stale module object
and raises ImportError — hitting users on Matrix, Telegram, Feishu,
and other platforms.
Two defenses:
1. Gateway self-check (gateway/run.py). On __init__, snapshot the
newest mtime across sentinel source files (hermes_cli/config.py,
run_agent.py, gateway/run.py, etc.). On every inbound message,
re-read those mtimes; if any is newer than boot time + 2s slack,
request a graceful restart via the normal drain path and return
a one-line ack to the user. Idempotent, works regardless of how
the update happened (hermes update, manual git pull, installer).
2. Post-restart survivor sweep ('hermes update'). After the existing
restart loop, sleep 3s, rescan for gateway PIDs we already tried
to kill, and SIGKILL any survivors. The detached profile watchers
and systemd then relaunch with fresh code instead of waiting out
the 120s watcher timeout.
Closes#17648.
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes#18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
Prevents ghost sessions from accumulating in state.db when the TUI/web
dashboard is opened and closed without sending a message.
Changes:
- run_agent.py: Add _ensure_db_session() gate method, called at
run_conversation() entry. Remove eager create_session() from __init__.
Handle compression rotation flag correctly.
- tui_gateway/server.py: Remove eager db.create_session() in
_start_agent_build(). Add post-first-message pending_title re-apply.
- hermes_state.py: Extract _insert_session_row() shared helper (DRY).
Add prune_empty_ghost_sessions() for one-time migration.
- cli.py: One-time ghost session prune on startup. Fix _pending_title
to call _ensure_db_session() before set_session_title().
- hermes_cli/main.py: Guard TUI exit summary on message_count > 0.
- tests: Update test_860_dedup to call _ensure_db_session() before
direct _flush_messages_to_session_db() calls.
Closes: ghost session clutter in hermes sessions list and web dashboard.
Telegram's client does not display empty forum topics in the chat's
topic list. After createForumTopic succeeds, send a short pin message
into the new topic so it becomes immediately visible to the user.
Only fires for newly created topics (no thread_id in config yet).
Failure to send the seed is non-fatal (debug-logged, topic still works).
The bot-owner identity check inside OwnerCommandMiddleware was commented
out and replaced with a hardcoded `is_owner = True`, so any group member
could trigger allowlisted privileged commands (/approve, /deny, /stop,
/reset, /retry, /undo, /new, /background, /bg, /btw, /queue, /q) by
sending the slash command without @-mentioning the bot. The most severe
case is /approve: a non-owner could approve a dangerous tool call the
bot was waiting on the owner to confirm.
Re-enable the documented identity check (push.from_account ==
push.bot_owner_id) so only the configured owner can issue these
commands.
Adds a new top-of-sidebar docs page at /docs/user-stories that is a
masonry-style collage of 99 real user stories sourced from X/Twitter,
GitHub issues/PRs, Reddit, Hacker News, YouTube, blogs (Medium, Substack,
dev.to), podcasts, LinkedIn, GitHub Gists, and Product Hunt.
Every tile links to the original post/issue/video/gist where someone
described a specific use case: personal assistants, dev workflows,
trading bots, research briefs, family WhatsApp agents, Kubernetes
deployments, legal-domain self-hosted setups, and more.
- docs/user-stories.mdx: MDX entry mounting the collage component
- src/components/UserStoriesCollage: React component with category +
source filters, CSS-columns masonry layout, per-category accent colors
- src/data/userStories.json: source-of-truth dataset (force-added; the
root .gitignore's unanchored 'data/' rule would otherwise swallow it,
same reason skills.json is explicitly listed in website/.gitignore)
- sidebars.ts: link added at the top of the docs sidebar
Four callsites hardcoded Path.home() / '.hermes' with no HERMES_HOME
check, breaking Docker deployments and profile isolation (hermes -p):
- plugins/hermes-achievements/dashboard/plugin_api.py:
state_path(), snapshot_path(), checkpoint_path() bare-literal paths
- scripts/profile-tui.py:
DEFAULT_STATE_DB and DEFAULT_LOG defaults ignored HERMES_HOME
- hermes_cli/slack_cli.py:
except-Exception fallback for slack-manifest.json dump
- optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
--target argparse default
Use get_hermes_home() (with an ImportError shim for the standalone
scripts) or 'os.environ.get("HERMES_HOME") or str(Path.home()/".hermes")'
where importing hermes_constants is impractical.
E2E-verified: with HERMES_HOME=/tmp/x all three achievements paths and
both profile-tui defaults route under /tmp/x.
Salvaged from #18068 (original scope was broader mechanical cleanup
claiming 23 callsites were buggy; most were already respecting
HERMES_HOME via os.environ.get(key, default) — only these 4 had no env
check at all). Credit: @web-dev0521.
Two machine-readable entry points to the Hermes Agent docs:
/llms.txt curated index of every doc page, one link per page
with short descriptions. ~17 KB, safe to load into
an LLM context window.
/llms-full.txt every page under website/docs/ concatenated as markdown.
~1.8 MB. For one-shot ingestion by coding agents and
RAG pipelines.
Both files are also served from /docs/llms.txt and /docs/llms-full.txt
(Docusaurus serves website/static/ under baseUrl=/docs/). Some agents and
IDE plugins probe the classic site-root path; the deploy workflow now copies
both files to _site root so either URL works.
Conforms to the emerging llmstxt.org spec: H1 project name, blockquote
summary, short install command, GitHub link, then curated sections
mirroring the docs-site navigation (Getting Started, Using Hermes,
Features, Messaging, Integrations, Guides, Developer Guide, Reference).
Generated by website/scripts/generate-llms-txt.py. Wired into prebuild.mjs
so every 'npm run build' and 'npm run start' refreshes the files alongside
the existing skills.json extraction. Both outputs are gitignored (same
precedent as src/data/skills.json).
Descriptions in llms.txt are pulled from each page's frontmatter, so they
stay current automatically. All ~80 section slugs are validated against
the filesystem at generation time; an invalid slug would fail the prebuild.
Adds a proper feature page at user-guide/features/goals.md covering
the /goal slash command — Hermes' take on the Ralph loop shipped in
PR #18262. The slash-commands reference table had two table rows but
no narrative doc walking through the judge model, fail-open semantics,
turn budget, persistence, user-message preemption, or the aux-model
config override.
Adds a walkthrough example showing a multi-turn goal running to
completion, covers the two judge failure modes with how to recover,
and credits Codex CLI 0.128.0 / Eric Traut as prior art.
Also cross-links both slash-commands.md rows to the new page so
readers discovering /goal from the command reference can dive in.
The anyOf collapse in _repair_schema returned early, skipping the
nullable-strip and enum-cleanup steps. When a schema had anyOf
[{enum: [..., null, '']}, {type: null}] alongside a parent-level
'nullable: true', collapsing to the single non-null branch produced a
merged node that still had both 'nullable' and the bad enum values —
Moonshot would still 400 on it.
Fix: fall through to Rules 1/3 when the collapse produces a single
merged node; only return early for the multi-branch case (pure
anyOf preservation) or when there was no null branch to remove.
Adds a test that locks in the combined-case expectation.
When a schema node inside anyOf has enum values but no explicit 'type',
Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was
None and the enum was never cleaned. Moonshot then rejected the schema
with 'enum value (<nil>) does not match any type in [string]'.
Fix: reorder operations — fill missing type first, strip nullable,
then clean enum. This ensures enum cleanup always has a type to check.
Also fixes test expectation: empty string in enum is now correctly
stripped (Moonshot rejects it too).
Closes#16875
Add a standing-goal slash command that keeps Hermes working toward a
user-stated objective across turns until it is achieved, paused, or
the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI
0.128.0's /goal.
After each turn, a lightweight auxiliary-model judge call asks 'is
this goal satisfied by the assistant's last response?'. If not, and
we're under the turn budget (default 20), Hermes feeds a continuation
prompt back into the same session as a normal user message. Any real
user message preempts the continuation loop automatically.
Judge failures fail OPEN (continue) so a flaky judge never wedges
progress — the turn budget is the real backstop.
### Commands
- `/goal <text>` — set a standing goal (kicks off the first turn)
- `/goal` or `/goal status` — show current state
- `/goal pause` — pause the continuation loop
- `/goal resume` — resume (resets turn counter)
- `/goal clear` — drop the goal
Works on both CLI and gateway platforms via the central CommandDef
registry.
### Design invariants preserved
- **Prompt cache**: continuation prompts are regular user-role
messages appended to history. No system-prompt mutation, no toolset
swap.
- **Role alternation**: continuation is a user turn, never injected
mid-tool-loop.
- **Session persistence**: goal state lives in SessionDB.state_meta
keyed by `goal:<session_id>`, so `/resume` picks it up.
- **Mid-run safety**: on the gateway, `/goal status|pause|clear` are
allowed mid-run (control-plane only); setting a new goal requires
`/stop` first so we don't race a second continuation prompt against
the current turn.
### Files
- `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state
- `hermes_cli/commands.py` — CommandDef entry
- `hermes_cli/config.py` — `goals.max_turns` default
- `hermes_cli/web_server.py` — dashboard category merge
- `cli.py` — /goal handler + post-turn continuation hook in
process_loop
- `gateway/run.py` — /goal handler + post-turn continuation hook
wrapping _handle_message_with_agent
- `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing,
fail-open semantics, lifecycle, persistence, budget exhaustion
- `website/docs/reference/slash-commands.md` — docs entry
* docs(sidebar): collapse exploding skills tree to a single Skills node
The Skills sub-tree in the left sidebar expanded to 200+ entries
(22 bundled categories + 15 optional categories, every skill a page).
That's most of the nav on a first visit — docs for the actual product
get drowned in it.
Collapse the sidebar to:
Skills
godmode (hand-written spotlight)
google-workspace (hand-written spotlight)
Bundled catalog (reference/skills-catalog — table of all bundled)
Optional catalog (reference/optional-skills-catalog — table of all optional)
Per-skill pages still generate and are still reachable at their URLs;
they're linked from the two catalog tables and from the Skills overview
page. They just don't appear in the left nav anymore.
sidebars.ts goes from 649 lines to 247. generate-skill-docs.py loses
the bundled/optional sidebar render helpers.
Also picks up incidental generator output drift on current main
(comfyui skill content refresh; 4 new skill pages for
devops-kanban-orchestrator, devops-kanban-worker,
productivity-here-now, productivity-shopify; two catalog refreshes).
These are what the generator produces on main today — keeping them
committed avoids the next docs build showing 'working tree dirty'.
* docs(sidebar): drop godmode and google-workspace spotlight pages
Keep the Skills sidebar node strictly principled: two catalog links,
nothing else. There was no rule for which skills got spotlight pages
and which got auto-generated pages — just that these two happened to
be hand-written first.
Both pages still build and are still reachable at
/docs/user-guide/skills/godmode and
/docs/user-guide/skills/google-workspace. They're linked from the
catalog tables and the Skills overview page.
Sidebar Skills node now:
Skills
├── Bundled catalog
└── Optional catalog
hermes update had two interactive [Y/n] prompts with no bypass:
1. Config migration (after new env/config options are added)
2. Autostash restore (when uncommitted work was stashed before pull)
hermes uninstall already has --yes/-y; mirrors that.
Under --yes:
- Config-migrate prompt → auto-yes, migrate_config(interactive=False)
so new config fields are applied but API-key prompts are skipped
(user runs 'hermes config migrate' later for those). Matches
gateway-mode semantics.
- Stash-restore prompt → auto-yes, git stash apply runs automatically.
Closes the 'can I hermes update -y, No ! Fix' gap reported by @murelux.
Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles. After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.
Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.
Implementation:
- EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses
str so existing 'X' in response / response.startswith(...) checks in
tests and call sites keep working unchanged; isinstance() still
distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
(in base.py) call _unwrap_ephemeral() on the handler return, send
the unwrapped text, and schedule a detached delete task when the
TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
/yolo on/off, /restart success + "already in progress". Draining
notices and /help output left as plain strings — those are
informational and users want to read them.
Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users. Platforms without delete_message silently no-op.
When the curator consolidates skill X into umbrella Y, any cron job
that listed X in its skills field would fail to load X at run time —
the scheduler logs a warning and skips it, so the scheduled job runs
without the instructions it was scheduled to follow.
cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs
in-place: consolidated names route to the umbrella target (dedup
when umbrella is already present), pruned names are dropped.
agent.curator._write_run_report calls it after classification,
best-effort so a cron-side failure never breaks the curator itself.
Results are recorded in run.json (counts.cron_jobs_rewritten + full
cron_rewrites payload), a separate cron_rewrites.json for convenience
when jobs were touched, and a section in REPORT.md.
Reported by @tombielecki.
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:
The reasoning content in the thinking mode must be passed back to the API.
run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.
Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.
Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).
Refs #15250, #17400.
Closes#17341.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
When a user types /steer <text> on an ACP session that isn't actively
running a turn (and there's no interrupted-prompt salvage available),
_cmd_steer silently appended to state.queued_prompts and replied
"No active turn — queued for the next turn". That looks identical to
/queue output even though the user never typed /queue — @EddyLeeKhane
reported this as "/steer never works, gets queued instead".
Rewrite the payload to a plain user prompt before the slash-intercept
fires, matching the gateway's idle-/steer fallthrough in
gateway/run.py ~L4898.
`hermes update` ran the config migration (11 → 17) successfully then
crashed at `agent/skill_utils.py:340` during the post-migration
skill-config prompt. User @FlockonUS reported this on Twitter.
Root cause: `get_missing_skill_config_vars` in hermes_cli/config.py
only guarded the import of `discover_all_skill_config_vars`, not the
call. Any runtime exception inside the skill scan (malformed SKILL.md,
unreadable external skill dir, etc.) propagated up through
`migrate_config` and aborted `hermes update` after the version bump.
Wrap the call in try/except so skill-config prompting — which is a
post-migration nicety — can never block the migration itself.
The initial guardrail PR consolidated failure classification by pointing
display._detect_tool_failure at the new classify_tool_failure helper,
which was strictly broader: it flagged any JSON result with
"success": false / "failed": true / non-empty "error", plus plain-text
"traceback" and "error:" prefixes. That would uptick the user-visible
[error] tag on tools that return {"success": false} as a benign signal
(memory fullness, todo state, etc.) and feed the failure-streak counter
at the same time.
Restore display._detect_tool_failure to its pre-PR semantics verbatim.
Tighten classify_tool_failure (the guardrail's internal safety-fallback
used only when callers don't pass failed=) to match _detect_tool_failure
exactly, so the two never disagree. Production callers in run_agent.py
already pass an explicit failed= derived from _detect_tool_failure, so
the guardrail counter is driven by the same signal the CLI shows.
- Emit providers in CANONICAL_PROVIDERS order (matching hermes model)
with user-defined/custom providers appended after
- Remove digit quick-select (1-9,0) handler — inconsistent with
absolute row numbering and already removed from hint text
- Remove unused windowOffset import
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.
In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).
Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.
Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.
Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.
Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).
Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.
- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
SELECT in get_messages_as_conversation() was missing finish_reason, so
assistant messages round-tripped through replay (including /branch copies)
silently dropped the provider's stop signal. Adds it to the SELECT, restores
it on assistant rows, and locks it in with a round-trip test.
When running on a host with sudoers NOPASSWD configured for the current
user, interactive Hermes sessions were unnecessarily entering the
password prompt path before executing sudo commands. Outside Hermes,
`sudo -n true` exits 0 for that user.
Add `_sudo_nopasswd_works()` that probes `sudo -n true` and, when it
succeeds, lets `_transform_sudo_command()` return the command unchanged
with no stdin password. The probe:
- Is scoped to the `local` terminal backend only, so Docker/SSH/Modal
and other remote backends do not inherit host sudo state.
- Re-probes every call (no process-lifetime cache) so an expired sudo
timestamp cannot silently make a later command block waiting for a
password that Hermes never prompts for.
- Is bypassed entirely when `SUDO_PASSWORD` is configured or a cached
password already exists, preserving existing explicit-password flows.
Co-authored-by: Junting Wu <juntingpublic@gmail.com>