Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
Telegram does not echo a sendRichMessage's content back in
reply_to_message (.text/.caption empty, .api_kwargs None), so replies
to rich sends (briefings, the gateway's own rich finals) arrived with
no quotable text and the [Replying to: ...] injection was skipped.
Remember message_id -> text at send time in a best-effort JSON index
(gateway/rich_sent_store.py), and recover it on inbound when text and
caption are both empty. Best-effort and no-throw throughout: any
failure degrades to prior behavior and never breaks a send or message.
Salvaged from #47375 by @x1erra. Dropped the cross-platform run.py
reply-prefix rewrite (out of scope; bloated every reply on every
platform) and scrubbed a docstring reference to an out-of-repo script.
Kept the inbound reply_to logging enrichment used to verify the fix.
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.
SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.
Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.
Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
Add display.tool_progress_style setting to control how tool progress
messages are displayed in chat platforms:
- 'accumulate' (default): Edit a single message with all tool calls
(new v0.9.0 behavior)
- 'separate': Send each tool call as its own message, interleaved
with thinking messages (pre-v0.9 behavior, better readability)
The setting participates in the per-platform display override system
and can be set globally or per-platform.
Files: gateway/display_config.py, gateway/run.py
Background memory reviews now support three notification modes,
configured via display.memory_notifications in config.yaml:
off — no chat notification (still logged to stdout/HA log)
on — generic '💾 Memory updated' (default, unchanged behavior)
verbose — content preview with action indicators:
💾 Memory ➕ Hermes Repo liegt unter /config/amy/hermes-agent/...
💾 Memory ✏️ Updated repo path from claude-code to hermes-agent...
💾 Memory ➖ old entry about claude-code path...
Previews are truncated to 120 chars for adds/replaces, 60 for removes.
Each action gets its own line in verbose mode for readability.
Files: run_agent.py, gateway/run.py
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
The #45966 cross-process coherence guard snapshots a session's on-disk
message_count next to the cached agent and rebuilds the agent when the
count changes. But the snapshot is taken at agent-BUILD time — before
the turn writes its own user + assistant (+ tool) rows — and the cache
entry is never rewritten on a reuse. So this process's OWN turn grows
message_count, and the very next turn sees a mismatch and rebuilds the
agent. That happens every turn, for every conversation, silently
destroying the per-conversation prompt caching the cache exists to
protect (AGENTS.md: prompt caching is sacred).
Add _refresh_agent_cache_message_count(): after a turn completes and the
agent has flushed its rows to the SessionDB, re-baseline the stored count
to the now-current value. The guard then fires ONLY when a DIFFERENT
process changes the transcript — preserving the #45966 fix while keeping
the cache warm for normal single-process operation.
Tests drive the real SessionDB + the real guard condition: 5 consecutive
same-process turns now all REUSE the cached agent (0 before the fix); a
cross-process append still invalidates; and the re-baseline is fail-safe
(no DB, falsy session_id, raising probe, legacy 2-tuple, pending sentinel
all no-op).
Three changes to prevent infinite re-execution loops when a user sends
a new message while long-running tools are executing:
1. Filter interrupted tool results in _build_gateway_agent_history:
skip tool messages whose content contains [Command interrupted] or
exit_code 130 — they represent partial execution, not valid results.
2. Don't replay auto-continue notes as user messages: detect
gateway-injected [System note: ...] / [IMPORTANT: ...] prefixes
and skip them in _build_gateway_agent_history so the LLM doesn't
see 4+ messages from 'the user' telling it to finish old work.
3. Fix the wording: the system note now instructs the model to
address the user's NEW message FIRST, IGNORE pending results,
and NOT re-execute old tool calls.
Closes#45230
Gateway startup now queues real inbound messages until restart-interrupted auto-resume turns have completed, preventing duplicate agents for the same session after a restart.
A stale certifi CA bundle after a partial `hermes update` used to crash
the agent on the first outbound HTTPS call with a raw traceback and
trap the gateway in a retry loop.
This patch:
* Adds `agent/errors.py` with a typed `SSLConfigurationError`
* Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight
that asserts the bundle exists, is non-trivial in size, and can build
a working SSLContext. On macOS, it falls back to the system trust
store when the bundle is empty but the system store is healthy
(covers corporate proxies / MDM setups).
* Wires the guard into `run_agent.py` and `gateway/run.py` right
after the `hermes_bootstrap` import, inside a try/except so a bug
in the guard itself can never prevent startup.
* Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so
users can detect the failure with one command.
* Adds unit tests covering the healthy, missing, empty, skip-env, and
macOS-fallback paths.
* Adds an RCA document describing the failure mode and the recovery
path (`pip install -e .`).
When the bundle is broken the user sees:
\u26a0\ufe0f SSL certificate bundle issue detected.
Run: pip install -e .
`HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed
environments that ship their own trust store.
The salvaged fix's two regression tests mock adapter.handle_message, so
they only assert the pre-claimed sentinel is set/cleaned around a stub —
they never drive the real dispatch chain. Add a full-path test that
exercises _schedule_resume_pending_sessions -> _guarded_handle_message ->
adapter.handle_message -> _process_message_background -> _handle_message
and asserts the resumed session's agent runs EXACTLY ONCE: not zero (the
pre-claim must not self-bounce the resume into a queued no-op) and not
twice (the duplicate-agent bug #45456 the fix targets). Also assert no
leaked sentinel and no orphaned pending event after the drain settles.
Tighten the _guarded_handle_message docstring: on current main the real
sentinel is taken over inside _handle_message (not _process_message_background),
and note the `is _AGENT_PENDING_SENTINEL` guard only releases the slot we
ourselves placed, never one a live run owns.
When the gateway restarts and auto-resumes an interrupted session, an
inbound message arriving in the window between `asyncio.create_task()`
and the task's first await could spin up a second AIAgent for the same
session. Both agents would then process messages concurrently,
producing interleaved duplicate responses (#45456).
Fix: set `_AGENT_PENDING_SENTINEL` in `_running_agents` immediately
after the "already running" check, before creating the task. This
closes the race window — any inbound message sees the slot as occupied
and queues behind the auto-resume.
A `_guarded_handle_message` wrapper ensures the pre-claimed sentinel is
always released, even if `handle_message` raises before reaching
`_process_message_background` (whose `finally` block handles normal
cleanup).
(cherry picked from commit 85150c976b)
After compression exhaustion the auto-reset created a fresh session but
discarded reset_session()'s return value and left the Telegram topic
binding pointing at the oversized compressed child. The next inbound
message in that topic healed the binding forward and switch_session'd the
freshly-reset lane back onto the bloated transcript, re-triggering
compression exhaustion in a loop with a new session id each time.
Capture the fresh entry and re-sync the topic binding to it so the next
message starts clean. No-op on non-topic lanes.
Regression of the #9893/#10063 auto-reset fix.
Fixes#35809
Salvages PR #25747 by preserving gateway session rotation even when a post-compression model call fails before returning final content.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
* feat(billing): /usage → portal top-up browser handoff
Add the terminal side of the billing slice (phase 2a): start a top-up by
throwing the user to the portal billing page with the top-up modal open. The
terminal does not confirm, poll, or track payment — checkout completes in the
browser and the next /usage shows the new balance.
- nous_account.py: parse organisation.slug/name from /api/oauth/account into
NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned
{base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy
{base}/billing?topup=open (never /orgs/None/...).
- portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line
(Topping up as <email> / org <name>), browser open with printed-URL fallback,
no-wait closing copy. No polling/confirmation (deferred to 2b).
- account_usage.py: the shared /usage credits block now links the org-pinned
top-up URL (auto-opens the modal) + points to the command.
Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until
that is live on the target env; until then /api/oauth/account returns
organisation: { id } only and the URL falls back to legacy.
* feat(billing): /credits command for balance + top-up handoff
Replace the standalone `hermes portal topup` subcommand with an in-session
/credits slash command — a focused money surface (balance in, top-up out) that
works in the CLI, TUI, and every messaging platform from one registry entry.
- commands.py: register /credits (Info category). Slack is at its 50-slash cap,
so /credits is routed via /hermes credits on Slack only (new
_SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the
native list and breaking Telegram parity; native everywhere else.
- account_usage.py: build_credits_view() — one portal fetch → balance lines +
identity line + org-pinned top-up URL + depleted flag, consumed by all
surfaces. Reuses the same snapshot/URL builder as /usage so numbers match.
- cli.py: _show_credits() — balance block + identity line + 3-button panel
(Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal.
ASK, never auto-launch; headless falls back to printing the URL.
- gateway/slash_commands.py: _handle_credits_command() — renders the block +
tappable top-up URL + no-wait copy; works on button and plain-text platforms.
- /usage credits line now points to /credits.
- Retire `hermes portal topup` (portal_cli.py back to baseline); the engine
(slug/name parse + nous_portal_topup_url) stays as the shared core.
No polling, no payment confirmation (billing phase 2a). Depends on NAS #409.
* fix(credits): /credits works in the TUI slash-worker (non-interactive)
In the TUI, /credits runs in the slash-worker subprocess where there is no
live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called
the 3-button modal unconditionally, which fell back to reading stdin →
exception → slash.exec rejected → the command produced no output (only the
pre-existing 'Credit access paused' banner showed).
- _show_credits: when self._app is None (TUI worker / piped / non-interactive),
render the text variant — balance block + tappable top-up URL + no-wait line,
same affordance as the messaging surfaces — and skip the modal entirely. The
3-button panel still renders in the interactive CLI.
- Depleted banner copy: 'run /usage for balance' → 'run /credits to top up'
now that /credits is the dedicated money surface (+ tests).
- Regression tests: _show_credits with self._app=None renders text and never
invokes the modal; logged-out path.
* feat(tui): credits.view RPC for the /credits tappable top-up button
Add a credits.view JSON-RPC method returning the structured CreditsView
(logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can
render a clickable <Link> top-up button instead of plain text. Account-
independent (portal fetch gated on a logged-in Nous account), fail-open to
{logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern.
Frontend (TUI-local /credits command + Ink component) lands separately.
* feat(tui): /credits command with keyboard-driven top-up confirm
TUI-local /credits: fetches the structured balance via the credits.view RPC,
prints the balance + identity + top-up URL, then arms the EXISTING confirm
overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel).
Reuses ConfirmReq — no new overlay component/state/input handler. Headless
(openExternalUrl returns false) falls back to printing the URL.
- gatewayTypes.ts: CreditsViewResponse.
- commands/credits.ts: the command (mirrors /status's rpc+guarded pattern).
- registry.ts: register creditsCommands.
- test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases).
Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
Reworks the chat-line UX: pick a recipe by name and the agent asks you for
what it needs, one question at a time, instead of forcing you to hand-type a
slot=val command line.
- /cron-recipe -> lists the catalog
- /cron-recipe <name> -> forgiving name match (exact/prefix/substring/
fuzzy; ambiguous lists candidates), then seeds
the agent with a natural-language fill request
built from the recipe's typed slots + schedule
and prompt templates. The agent asks for each
value one at a time and calls the EXISTING
cronjob tool. No new tool.
- /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe ->
create_job) for the dashboard/docs/power user.
Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user
turn, never a synthetic injection):
- shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe()
and build_recipe_seed() are the new shared pieces.
- gateway: dispatch rewrites event.text to the seed and falls through to the
agent (the same pattern /steer uses).
- CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop
consumes it right after process_command() and runs it as the next turn.
The typed-slot schema stays the single source of truth (still validates the
form/inline path via fill_recipe); the agent path just renders those slots into
the questions to ask. Docs updated to lead with the name-then-ask flow.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Hermes can propose automations and let the user accept them with one tap
via /suggestions, instead of making them assemble cron jobs by hand. Every
proposal — wherever it originates — flows through one surface.
Sources (the 'where suggestions come from'):
- catalog: curated starter automations (daily briefing, important-mail
monitor, weekly review, workday-start reminder) via /suggestions catalog
- recipe: installing a skill that carries a metadata.hermes.recipe block
registers a suggestion instead of auto-scheduling
- usage / integration: reserved for the background-review detector and
account-connect triggers (sources defined; emitters land next)
Pieces:
- cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by
key (dismissed proposals never re-offered), pending cap so it can't become
a nag wall. Accepting calls the existing cron.jobs.create_job — there is
NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600).
- cron/suggestion_catalog.py — the curated set. The important-mail monitor
entry is where the old proactive-monitor poll->classify->surface engine
lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE
catalog automation rather than a standalone feature.
- tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes
a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single
translation both the direct and suggestion paths share.
- hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway
never drift); /suggestions [accept N|dismiss N|catalog|clear].
- Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py)
+ aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py).
Consent-first throughout: nothing auto-schedules; acceptance is always
explicit; dismissals latch.
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here
as a catalog entry and a suggestion source respectively.
Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency,
recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md
-> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.
Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached
restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so
the CLI's self-restart loop guard silently refuses 'hermes gateway restart'
and the gateway never comes back. Scrub the marker from the watcher env on
the POSIX branch as well, and extend the setsid test to assert it.
When the agent runs several terminal commands back-to-back, each
progress line repeated the '💻 terminal' header above its fenced code
block, cluttering the progress bubble. Now only the first terminal call
in a streak emits the header; subsequent consecutive terminal calls
render adjacent code blocks. Any other tool (or non-block preview)
resets the streak so the next terminal call gets a fresh header.
Follow-ups to #38199/#43354 found in post-merge review:
- Inline CLI memory approval never worked: the per-thread approval callback
was not passed to prompt_dangerous_approval, so the prompt_toolkit
fail-closed guard (#15216) denied every gated foreground write without
showing a prompt. Now invokes the registered callback directly; a crashed
prompt falls back to staging instead of a silent deny.
- Gateway sessions claimed inline support but prompt_dangerous_approval has
no gateway round-trip (that lives in the pending-approval queue), so gated
gateway memory writes hit the input() fallback and denied. Gateway
contexts now stage for /memory pending review.
- /skills pending|approve|reject|diff|approval now works on the gateway
(gateway_config_gate on skills.write_approval), so skills staged from a
messaging session can be reviewed there. Diff output truncated for chat.
- memory_tool validates required params before the gate so invalid writes
are rejected immediately instead of staged and failing at approve time.
- Stale tri-state write_mode docstrings updated to the boolean gate; docs
table corrected (inline prompt is interactive-CLI-only).
- 6 new tests covering the interactive approve/deny/error paths, gateway
staging, skills never-prompt invariant, and pre-gate validation.
Two fixes for the reported Slack thread approval UX:
1. Slack Block Kit approval/confirm sends silently overflowed the
3000-char section-block cap (flat 2900-char truncation + header +
reason), so long execute_code approvals failed with invalid_blocks
and fell back to the plain-text prompt with no buttons. Budget the
command preview against the rendered fixed parts so blocks never
exceed the cap (send_exec_approval + send_slash_confirm).
2. The text fallbacks told users to reply /approve — which Slack blocks
inside threads and Matrix clients reserve client-side. Add a
typed_command_prefix capability flag on BasePlatformAdapter
(default "/"; Slack and Matrix set "!" to match their existing
bang-prefix rewrite) and use it in the shared fallback prompt
builders (exec approval, update prompt, destructive slash confirm,
expensive-model confirm) plus Matrix's reaction-prompt text.
The slash-confirm text-intercept now also accepts bang-prefixed
replies (!always, !cancel) since those keywords aren't registered
commands and the adapters' rewrite doesn't touch them.
The voice inactivity timer (VOICE_TIMEOUT) only counted the bot's OWN audio
playback as activity. Under /voice off (text-only replies, but still in the
channel — leaving is /voice leave) nothing ever reset it, so every 300s the bot
disconnected and spammed "Left voice channel (inactivity timeout)."
The adapter now learns the live voice-reply mode via a getter wired from run.py
and skips the auto-disconnect while mode is off. It also resets the timer when a
user actually speaks to the bot, so an active listener (incl. voice-on
text-only sessions that never play audio) isn't dropped mid-conversation.
## What does this PR do?
The voice-during-active-run feature (#41984) changed
`_enrich_message_with_transcription` so that it returns a
`(enriched_text, successful_transcripts)` tuple instead of a bare string,
which lets callers echo the raw transcript back to the user. The signature
and every other return path were updated to match, but one branch was
missed: when a successfully transcribed clip arrives with the Discord
"empty content" placeholder as its caption, the method still returned the
prefix string on its own. All four call sites unpack the result with
`text, transcripts = await self._enrich_message_with_transcription(...)`,
so that path raised `ValueError: too many values to unpack (expected 2)`
and the inbound voice message was dropped instead of reaching the agent.
This is a real user-facing path rather than a corner case: a Discord voice
note sent without a caption is delivered as exactly that placeholder, so a
captionless voice message that transcribed correctly would crash the
handler precisely when transcription had worked. The fix returns the
proper tuple from that branch so the placeholder is still stripped while
the transcripts continue to flow back to the caller for the echo.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
- `gateway/run.py`: in `_enrich_message_with_transcription`, return
`(prefix, successful_transcripts)` instead of a bare `prefix` from the
empty-content-placeholder branch, so the contract matches the signature
and the other return paths.
- `tests/gateway/test_stt_config.py`: add
`test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`,
which drives a successful transcription with the placeholder caption and
asserts the placeholder is stripped while the transcript is still returned.
## How to Test
1. Check out `main` and run the new test — it fails with
`ValueError: too many values to unpack (expected 2)`, reproducing the
crash a captionless Discord voice note would trigger.
2. Apply this change and re-run
`pytest tests/gateway/test_stt_config.py -q` — all tests pass.
3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and
`python scripts/check-windows-footguns.py gateway/run.py
tests/gateway/test_stt_config.py` both pass.
## Checklist
### Code
- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Adds memory.write_mode and skills.write_mode (on|off|approve), applied to
both foreground turns and the background self-improvement review fork — the
source of the unprompted 'wrong assumption' saves users reported.
- on (default): write freely, unchanged behaviour
- off: never write; the tool returns a clean disabled result
- approve: don't commit. Memory foreground writes prompt inline (small,
reviewable in a chat bubble); background memory writes and ALL skill writes
stage to a pending store instead (a SKILL.md is too large to review inline,
and a daemon thread can't block on a prompt)
Review staged writes from CLI or any messaging platform:
/memory pending|approve|reject|mode
/skills pending|approve|reject|diff|mode
Skill review respects the size asymmetry: inline you see a one-line gist;
the full unified diff stays out-of-band (/skills diff, dashboard, or the
staged JSON file).
New: tools/write_approval.py (gate + pending store), hermes_cli/
write_approval_commands.py (shared CLI+gateway handlers). Gates wired at the
single entry points memory_tool() and skill_manage(), using the existing
write-origin ContextVar to distinguish foreground from background_review.
* fix(gateway): auto-start after container restart via planned-stop marker
On Docker (s6-overlay), the gateway runs as a dynamically-registered s6
service. When the container stops/restarts/upgrades, s6 sends the gateway
a plain SIGTERM. The shutdown path (_stop_impl) ended with an
unconditional _update_runtime_status("stopped"), persisting
gateway_state=stopped to the volume. container_boot.py reads that on the
next boot and only auto-starts gateways whose last state was "running"
(_AUTOSTART_STATES) — so after a routine `docker compose up
--force-recreate` the gateway stays down and messaging channels silently
go dark, with no error surfaced (issue #42675).
The codebase already distinguishes intentional stops from unexpected
signals via the planned-stop marker (write_planned_stop_marker /
consume_planned_stop_marker_for_self): `hermes gateway stop`,
systemd/launchd ExecStop, and Ctrl+C write a marker before signalling,
so the handler classifies them as planned. An unmarked SIGTERM
(container/s6 restart, OOM, bare kill) is signal-initiated.
This wires that existing classification through to the state persist,
rather than adding unreliable signal-source inference:
- run.py: GatewayRunner._signal_initiated_shutdown, set in
shutdown_signal_handler's unmarked-signal branch. In _stop_impl, a
signal-initiated (non-restart) teardown now persists "running" instead
of "stopped" — preserving the operator's run-intent and overwriting the
mid-shutdown "draining" marker so _AUTOSTART_STATES matches on reboot.
Operator stops and restarts persist "stopped" as before.
- service_manager.py: S6ServiceManager.stop() now writes the planned-stop
marker for the supervised PID (read from s6-svstat) before `s6-svc -d`,
so an in-container `hermes gateway stop` is correctly classified as
intentional (parity with the systemd/launchd/host stop paths, which
already mark). Best-effort: a marker-write failure falls back to the
safe signal-initiated path.
Tests: shutdown persist-decision table (signal→running, operator→stopped,
restart→stopped), s6 stop marker write + svstat PID parse + failure
tolerance. The signal→running and s6-marker tests fail without the
respective source change. Verified end-to-end against a container built
from this branch: an unmarked SIGTERM to the live gateway leaves
gateway_state=running (shutdown-context log confirms signal path);
existing real container-restart suite still green.
* docs(docker): clarify gateway autostart distinguishes operator-stop from container-kill
The per-profile-supervision section described the autostart-across-restart
contract as "running gateways come back, stopped stay stopped" without
spelling out what records 'stopped'. That contract was the source of
#42675 confusion: users expected a restart to bring the gateway back and
it didn't. With the write-side fix, only an explicit `hermes gateway stop`
records 'stopped'; container/s6 restart SIGTERMs (incl. image upgrades and
unexpected exits) leave the state 'running' so the gateway auto-starts.
Make that distinction explicit in both the multi-profile and
per-profile-supervision sections.
* test(docker): real-restart autostart E2E for #42675
Adds test_live_gateway_autostarts_after_real_restart_without_manual_state_stamp:
a live s6-supervised gateway is killed by an actual `docker restart`
SIGTERM (no manual gateway_state stamp, no planned-stop marker) and must
auto-start on the next boot. Exercises the WRITE side of the fix that the
existing stamp-based tests bypass.
Verified to FAIL against an origin/main image (reconciler logs
prior_state=stopped action=registered — the #42675 bug) and PASS against
the fixed image (prior_state=running action=started).
The markdown code-block change rendered args['command'] in full in both
verbose AND non-verbose (all/new) modes, so a long or multi-line terminal
command bypassed the tool_preview_length cap (default 40) and rendered as
a huge block. Non-verbose now collapses to a single line capped at the
preview length while keeping the fence; verbose keeps the full command.
image_generate returns its artifact as JSON ({"image": "/abs/path.png"})
with no MEDIA: tag, so the gateway auto-append path (which only recognized
text_to_speech MEDIA: tags) never delivered it — image delivery silently
depended on the model restating the path in its reply. Add image_generate to
the producer allowlist and extract the local path from its JSON result
(host_image > image > agent_visible_image), reusing the existing
extension-anchored matcher and history-dedupe so remote URLs, unknown
extensions, failures, and already-sent paths are rejected.
Closes the remaining unfixed path from #19105.