* feat(billing): /usage → portal top-up browser handoff
Add the terminal side of the billing slice (phase 2a): start a top-up by
throwing the user to the portal billing page with the top-up modal open. The
terminal does not confirm, poll, or track payment — checkout completes in the
browser and the next /usage shows the new balance.
- nous_account.py: parse organisation.slug/name from /api/oauth/account into
NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned
{base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy
{base}/billing?topup=open (never /orgs/None/...).
- portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line
(Topping up as <email> / org <name>), browser open with printed-URL fallback,
no-wait closing copy. No polling/confirmation (deferred to 2b).
- account_usage.py: the shared /usage credits block now links the org-pinned
top-up URL (auto-opens the modal) + points to the command.
Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until
that is live on the target env; until then /api/oauth/account returns
organisation: { id } only and the URL falls back to legacy.
* feat(billing): /credits command for balance + top-up handoff
Replace the standalone `hermes portal topup` subcommand with an in-session
/credits slash command — a focused money surface (balance in, top-up out) that
works in the CLI, TUI, and every messaging platform from one registry entry.
- commands.py: register /credits (Info category). Slack is at its 50-slash cap,
so /credits is routed via /hermes credits on Slack only (new
_SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the
native list and breaking Telegram parity; native everywhere else.
- account_usage.py: build_credits_view() — one portal fetch → balance lines +
identity line + org-pinned top-up URL + depleted flag, consumed by all
surfaces. Reuses the same snapshot/URL builder as /usage so numbers match.
- cli.py: _show_credits() — balance block + identity line + 3-button panel
(Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal.
ASK, never auto-launch; headless falls back to printing the URL.
- gateway/slash_commands.py: _handle_credits_command() — renders the block +
tappable top-up URL + no-wait copy; works on button and plain-text platforms.
- /usage credits line now points to /credits.
- Retire `hermes portal topup` (portal_cli.py back to baseline); the engine
(slug/name parse + nous_portal_topup_url) stays as the shared core.
No polling, no payment confirmation (billing phase 2a). Depends on NAS #409.
* fix(credits): /credits works in the TUI slash-worker (non-interactive)
In the TUI, /credits runs in the slash-worker subprocess where there is no
live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called
the 3-button modal unconditionally, which fell back to reading stdin →
exception → slash.exec rejected → the command produced no output (only the
pre-existing 'Credit access paused' banner showed).
- _show_credits: when self._app is None (TUI worker / piped / non-interactive),
render the text variant — balance block + tappable top-up URL + no-wait line,
same affordance as the messaging surfaces — and skip the modal entirely. The
3-button panel still renders in the interactive CLI.
- Depleted banner copy: 'run /usage for balance' → 'run /credits to top up'
now that /credits is the dedicated money surface (+ tests).
- Regression tests: _show_credits with self._app=None renders text and never
invokes the modal; logged-out path.
* feat(tui): credits.view RPC for the /credits tappable top-up button
Add a credits.view JSON-RPC method returning the structured CreditsView
(logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can
render a clickable <Link> top-up button instead of plain text. Account-
independent (portal fetch gated on a logged-in Nous account), fail-open to
{logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern.
Frontend (TUI-local /credits command + Ink component) lands separately.
* feat(tui): /credits command with keyboard-driven top-up confirm
TUI-local /credits: fetches the structured balance via the credits.view RPC,
prints the balance + identity + top-up URL, then arms the EXISTING confirm
overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel).
Reuses ConfirmReq — no new overlay component/state/input handler. Headless
(openExternalUrl returns false) falls back to printing the URL.
- gatewayTypes.ts: CreditsViewResponse.
- commands/credits.ts: the command (mirrors /status's rpc+guarded pattern).
- registry.ts: register creditsCommands.
- test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases).
Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
Reworks the chat-line UX: pick a recipe by name and the agent asks you for
what it needs, one question at a time, instead of forcing you to hand-type a
slot=val command line.
- /cron-recipe -> lists the catalog
- /cron-recipe <name> -> forgiving name match (exact/prefix/substring/
fuzzy; ambiguous lists candidates), then seeds
the agent with a natural-language fill request
built from the recipe's typed slots + schedule
and prompt templates. The agent asks for each
value one at a time and calls the EXISTING
cronjob tool. No new tool.
- /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe ->
create_job) for the dashboard/docs/power user.
Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user
turn, never a synthetic injection):
- shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe()
and build_recipe_seed() are the new shared pieces.
- gateway: dispatch rewrites event.text to the seed and falls through to the
agent (the same pattern /steer uses).
- CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop
consumes it right after process_command() and runs it as the next turn.
The typed-slot schema stays the single source of truth (still validates the
form/inline path via fill_recipe); the agent path just renders those slots into
the questions to ask. Docs updated to lead with the name-then-ask flow.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Hermes can propose automations and let the user accept them with one tap
via /suggestions, instead of making them assemble cron jobs by hand. Every
proposal — wherever it originates — flows through one surface.
Sources (the 'where suggestions come from'):
- catalog: curated starter automations (daily briefing, important-mail
monitor, weekly review, workday-start reminder) via /suggestions catalog
- recipe: installing a skill that carries a metadata.hermes.recipe block
registers a suggestion instead of auto-scheduling
- usage / integration: reserved for the background-review detector and
account-connect triggers (sources defined; emitters land next)
Pieces:
- cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by
key (dismissed proposals never re-offered), pending cap so it can't become
a nag wall. Accepting calls the existing cron.jobs.create_job — there is
NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600).
- cron/suggestion_catalog.py — the curated set. The important-mail monitor
entry is where the old proactive-monitor poll->classify->surface engine
lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE
catalog automation rather than a standalone feature.
- tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes
a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single
translation both the direct and suggestion paths share.
- hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway
never drift); /suggestions [accept N|dismiss N|catalog|clear].
- Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py)
+ aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py).
Consent-first throughout: nothing auto-schedules; acceptance is always
explicit; dismissals latch.
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here
as a catalog entry and a suggestion source respectively.
Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency,
recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md
-> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.
Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached
restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so
the CLI's self-restart loop guard silently refuses 'hermes gateway restart'
and the gateway never comes back. Scrub the marker from the watcher env on
the POSIX branch as well, and extend the setsid test to assert it.
When the agent runs several terminal commands back-to-back, each
progress line repeated the '💻 terminal' header above its fenced code
block, cluttering the progress bubble. Now only the first terminal call
in a streak emits the header; subsequent consecutive terminal calls
render adjacent code blocks. Any other tool (or non-block preview)
resets the streak so the next terminal call gets a fresh header.
Follow-ups to #38199/#43354 found in post-merge review:
- Inline CLI memory approval never worked: the per-thread approval callback
was not passed to prompt_dangerous_approval, so the prompt_toolkit
fail-closed guard (#15216) denied every gated foreground write without
showing a prompt. Now invokes the registered callback directly; a crashed
prompt falls back to staging instead of a silent deny.
- Gateway sessions claimed inline support but prompt_dangerous_approval has
no gateway round-trip (that lives in the pending-approval queue), so gated
gateway memory writes hit the input() fallback and denied. Gateway
contexts now stage for /memory pending review.
- /skills pending|approve|reject|diff|approval now works on the gateway
(gateway_config_gate on skills.write_approval), so skills staged from a
messaging session can be reviewed there. Diff output truncated for chat.
- memory_tool validates required params before the gate so invalid writes
are rejected immediately instead of staged and failing at approve time.
- Stale tri-state write_mode docstrings updated to the boolean gate; docs
table corrected (inline prompt is interactive-CLI-only).
- 6 new tests covering the interactive approve/deny/error paths, gateway
staging, skills never-prompt invariant, and pre-gate validation.
Two fixes for the reported Slack thread approval UX:
1. Slack Block Kit approval/confirm sends silently overflowed the
3000-char section-block cap (flat 2900-char truncation + header +
reason), so long execute_code approvals failed with invalid_blocks
and fell back to the plain-text prompt with no buttons. Budget the
command preview against the rendered fixed parts so blocks never
exceed the cap (send_exec_approval + send_slash_confirm).
2. The text fallbacks told users to reply /approve — which Slack blocks
inside threads and Matrix clients reserve client-side. Add a
typed_command_prefix capability flag on BasePlatformAdapter
(default "/"; Slack and Matrix set "!" to match their existing
bang-prefix rewrite) and use it in the shared fallback prompt
builders (exec approval, update prompt, destructive slash confirm,
expensive-model confirm) plus Matrix's reaction-prompt text.
The slash-confirm text-intercept now also accepts bang-prefixed
replies (!always, !cancel) since those keywords aren't registered
commands and the adapters' rewrite doesn't touch them.
The voice inactivity timer (VOICE_TIMEOUT) only counted the bot's OWN audio
playback as activity. Under /voice off (text-only replies, but still in the
channel — leaving is /voice leave) nothing ever reset it, so every 300s the bot
disconnected and spammed "Left voice channel (inactivity timeout)."
The adapter now learns the live voice-reply mode via a getter wired from run.py
and skips the auto-disconnect while mode is off. It also resets the timer when a
user actually speaks to the bot, so an active listener (incl. voice-on
text-only sessions that never play audio) isn't dropped mid-conversation.
## What does this PR do?
The voice-during-active-run feature (#41984) changed
`_enrich_message_with_transcription` so that it returns a
`(enriched_text, successful_transcripts)` tuple instead of a bare string,
which lets callers echo the raw transcript back to the user. The signature
and every other return path were updated to match, but one branch was
missed: when a successfully transcribed clip arrives with the Discord
"empty content" placeholder as its caption, the method still returned the
prefix string on its own. All four call sites unpack the result with
`text, transcripts = await self._enrich_message_with_transcription(...)`,
so that path raised `ValueError: too many values to unpack (expected 2)`
and the inbound voice message was dropped instead of reaching the agent.
This is a real user-facing path rather than a corner case: a Discord voice
note sent without a caption is delivered as exactly that placeholder, so a
captionless voice message that transcribed correctly would crash the
handler precisely when transcription had worked. The fix returns the
proper tuple from that branch so the placeholder is still stripped while
the transcripts continue to flow back to the caller for the echo.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
- `gateway/run.py`: in `_enrich_message_with_transcription`, return
`(prefix, successful_transcripts)` instead of a bare `prefix` from the
empty-content-placeholder branch, so the contract matches the signature
and the other return paths.
- `tests/gateway/test_stt_config.py`: add
`test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`,
which drives a successful transcription with the placeholder caption and
asserts the placeholder is stripped while the transcript is still returned.
## How to Test
1. Check out `main` and run the new test — it fails with
`ValueError: too many values to unpack (expected 2)`, reproducing the
crash a captionless Discord voice note would trigger.
2. Apply this change and re-run
`pytest tests/gateway/test_stt_config.py -q` — all tests pass.
3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and
`python scripts/check-windows-footguns.py gateway/run.py
tests/gateway/test_stt_config.py` both pass.
## Checklist
### Code
- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Adds memory.write_mode and skills.write_mode (on|off|approve), applied to
both foreground turns and the background self-improvement review fork — the
source of the unprompted 'wrong assumption' saves users reported.
- on (default): write freely, unchanged behaviour
- off: never write; the tool returns a clean disabled result
- approve: don't commit. Memory foreground writes prompt inline (small,
reviewable in a chat bubble); background memory writes and ALL skill writes
stage to a pending store instead (a SKILL.md is too large to review inline,
and a daemon thread can't block on a prompt)
Review staged writes from CLI or any messaging platform:
/memory pending|approve|reject|mode
/skills pending|approve|reject|diff|mode
Skill review respects the size asymmetry: inline you see a one-line gist;
the full unified diff stays out-of-band (/skills diff, dashboard, or the
staged JSON file).
New: tools/write_approval.py (gate + pending store), hermes_cli/
write_approval_commands.py (shared CLI+gateway handlers). Gates wired at the
single entry points memory_tool() and skill_manage(), using the existing
write-origin ContextVar to distinguish foreground from background_review.
* fix(gateway): auto-start after container restart via planned-stop marker
On Docker (s6-overlay), the gateway runs as a dynamically-registered s6
service. When the container stops/restarts/upgrades, s6 sends the gateway
a plain SIGTERM. The shutdown path (_stop_impl) ended with an
unconditional _update_runtime_status("stopped"), persisting
gateway_state=stopped to the volume. container_boot.py reads that on the
next boot and only auto-starts gateways whose last state was "running"
(_AUTOSTART_STATES) — so after a routine `docker compose up
--force-recreate` the gateway stays down and messaging channels silently
go dark, with no error surfaced (issue #42675).
The codebase already distinguishes intentional stops from unexpected
signals via the planned-stop marker (write_planned_stop_marker /
consume_planned_stop_marker_for_self): `hermes gateway stop`,
systemd/launchd ExecStop, and Ctrl+C write a marker before signalling,
so the handler classifies them as planned. An unmarked SIGTERM
(container/s6 restart, OOM, bare kill) is signal-initiated.
This wires that existing classification through to the state persist,
rather than adding unreliable signal-source inference:
- run.py: GatewayRunner._signal_initiated_shutdown, set in
shutdown_signal_handler's unmarked-signal branch. In _stop_impl, a
signal-initiated (non-restart) teardown now persists "running" instead
of "stopped" — preserving the operator's run-intent and overwriting the
mid-shutdown "draining" marker so _AUTOSTART_STATES matches on reboot.
Operator stops and restarts persist "stopped" as before.
- service_manager.py: S6ServiceManager.stop() now writes the planned-stop
marker for the supervised PID (read from s6-svstat) before `s6-svc -d`,
so an in-container `hermes gateway stop` is correctly classified as
intentional (parity with the systemd/launchd/host stop paths, which
already mark). Best-effort: a marker-write failure falls back to the
safe signal-initiated path.
Tests: shutdown persist-decision table (signal→running, operator→stopped,
restart→stopped), s6 stop marker write + svstat PID parse + failure
tolerance. The signal→running and s6-marker tests fail without the
respective source change. Verified end-to-end against a container built
from this branch: an unmarked SIGTERM to the live gateway leaves
gateway_state=running (shutdown-context log confirms signal path);
existing real container-restart suite still green.
* docs(docker): clarify gateway autostart distinguishes operator-stop from container-kill
The per-profile-supervision section described the autostart-across-restart
contract as "running gateways come back, stopped stay stopped" without
spelling out what records 'stopped'. That contract was the source of
#42675 confusion: users expected a restart to bring the gateway back and
it didn't. With the write-side fix, only an explicit `hermes gateway stop`
records 'stopped'; container/s6 restart SIGTERMs (incl. image upgrades and
unexpected exits) leave the state 'running' so the gateway auto-starts.
Make that distinction explicit in both the multi-profile and
per-profile-supervision sections.
* test(docker): real-restart autostart E2E for #42675
Adds test_live_gateway_autostarts_after_real_restart_without_manual_state_stamp:
a live s6-supervised gateway is killed by an actual `docker restart`
SIGTERM (no manual gateway_state stamp, no planned-stop marker) and must
auto-start on the next boot. Exercises the WRITE side of the fix that the
existing stamp-based tests bypass.
Verified to FAIL against an origin/main image (reconciler logs
prior_state=stopped action=registered — the #42675 bug) and PASS against
the fixed image (prior_state=running action=started).
The markdown code-block change rendered args['command'] in full in both
verbose AND non-verbose (all/new) modes, so a long or multi-line terminal
command bypassed the tool_preview_length cap (default 40) and rendered as
a huge block. Non-verbose now collapses to a single line capped at the
preview length while keeping the fence; verbose keeps the full command.
image_generate returns its artifact as JSON ({"image": "/abs/path.png"})
with no MEDIA: tag, so the gateway auto-append path (which only recognized
text_to_speech MEDIA: tags) never delivered it — image delivery silently
depended on the model restating the path in its reply. Add image_generate to
the producer allowlist and extract the local path from its JSON result
(host_image > image > agent_visible_image), reusing the existing
extension-anchored matcher and history-dedupe so remote URLs, unknown
extensions, failures, and already-sent paths are rejected.
Closes the remaining unfixed path from #19105.
Terminal tool progress on markdown-capable gateways (Telegram, Slack,
Discord, WhatsApp, Matrix, Weixin, Feishu) renders the full command in a
fenced code block again, in all/new AND verbose modes — gated on the
adapter's supports_code_blocks capability. Plain-text platforms keep the
short truncated preview.
No language tag is emitted: Slack mrkdwn renders a '```bash' fence with
'bash' as a literal first code line, so a bare '```' fence is used, which
renders correctly on every platform that supports blocks.
This restores the #41215 feature (removed in #41950 due to the command
showing in group chats) as the default. For a personal assistant the
command display is desired; the group-chat concern is a preference, not a
vulnerability.
When a platform adapter sets REQUIRES_EDIT_FINALIZE=True (e.g.
TelegramAdapter), tool progress edits now pass finalize=True so
format_message() is applied before sending to the platform.
Previously, the initial send() formatted the message correctly via
MarkdownV2, but subsequent edit_message() calls skipped formatting
(finalize=False), causing raw markdown (e.g. triple backticks for
bash code blocks) to render as plain text on Telegram.
Refs: #41955, #41732
#41215 rendered a terminal tool call as a native ```bash fenced block on
markdown platforms (Telegram, WhatsApp, Slack, and others), showing the full
command with no truncation, in both all/new and verbose modes. That posted
complete shell commands (heredocs, internal paths, destructive commands) into
the chat before the final answer, visible to everyone in it.
This restores the prior behavior: terminal progress shows the short, truncated
preview line that every other tool already uses, capped at tool_preview_length.
The supports_code_blocks capability flag is left in place for future use.
CLI/TUI rendering is a separate path and was unaffected.
Adds a regression test asserting terminal progress renders as a truncated
preview, not a fenced bash block, even on a markdown-capable gateway.
Fixes#41955
When the agent has its own SessionDB reference (_session_db is not None),
_flush_messages_to_session_db() persists user messages to SQLite during the
agent run. Two gateway fallback paths also wrote the same user message
without skip_db=True, creating duplicate entries in state.db:
1. agent_failed_early path (transient 429/timeout failures)
2. not-new-messages path (history_offset >= len(messages) edge case)
Move agent_persisted flag definition to before the if/elif/else block so
all paths can use it, and pass skip_db=agent_persisted to every fallback
append_to_transcript() call.
Fixes#42039
Lift the 4 inbound-message authorization methods out of GatewayRunner into
gateway/authz_mixin.py:GatewayAuthorizationMixin. Behavior-neutral; gateway/run.py
16200 -> 15812 LOC.
Methods moved (~389 LOC): _is_user_authorized, _get_unauthorized_dm_behavior,
_adapter_dm_policy, _adapter_enforces_own_access_policy. The two adapter-policy
helpers are private to _is_user_authorized, so the cluster is fully self-contained
(zero outside-cluster self.method calls after the lift). All self.* calls resolve
unchanged via the MRO (GatewayRunner(GatewayAuthorizationMixin, ...)).
Import split: 6 neutral deps (os, Optional, Platform, SessionSource, the two
whatsapp_identity helpers) at the mixin module top; the module-level logger is
imported lazily inside _is_user_authorized (from gateway.run import logger) so
the mixin never imports gateway.run at module scope -> no cycle. The lazy import
preserves the exact logger name (gateway.run) so log records are unchanged.
_evict_cached_agent (the chokepoint for /new, /model, /undo, session
resets — 17 call sites) only popped the cache entry, dropping the
AIAgent reference without releasing its httpx client pool. AIAgent
holds reference cycles (callbacks, tool state) so CPython refcounting
does not free the client promptly; under steady gateway traffic the
held sockets + buffers accumulate and RSS climbs (the leak class behind
Now the chokepoint pops AND schedules a soft release_clients() on a
daemon thread (mirrors the cap-enforcer / idle-sweeper). Soft release
frees the client pool + per-turn child subagents but preserves the
session's terminal sandbox / browser / bg processes for resumption.
Mid-turn agents are skipped so a running request is never torn down.
Also fixes the no-lock branch which previously never popped at all.
Long-lived gateways under heavy cron/build workloads grow steadily (~18 MB/hr
post-phantom-dispatch-fix) and eventually need a restart-or-OOM. Four retention
sites, all confirmed live on current main:
1. _evict_cached_agent() (/model, /reasoning, codex-runtime, /undo, etc.) popped
the cache entry without releasing the agent's OpenAI client, httpx transport,
SSL context, or conversation history. Only /new cleaned up first. Now releases
clients on a daemon thread, matching _enforce_agent_cache_cap.
2. _release_evicted_agent_soft() now clears _session_messages after
release_clients() — tool outputs (file reads, terminal output, search results)
can be tens of MB per 100+-tool-call session; the list is rebuilt from
persisted session JSON on resume, so dropping it on soft eviction is safe.
3. The session-expiry watcher (permanent finalization) now drops the session's
per-session control dicts (_session_model_overrides, _session_reasoning_overrides,
_pending_approvals, _update_prompt_pending, _pending_model_notes). These leaked
one entry per session per gateway lifetime. NOTE: this is the session-finalize
path, NOT idle agent-cache eviction — an idle-evicted session is still alive and
rebuilds its agent from these overrides, so pruning them there would silently
reset a user's /model choice.
4. _tool_defs_cache is now bounded (_TOOL_DEFS_CACHE_MAX=8) with oldest-first
eviction instead of growing unboundedly across the distinct toolset/config
fingerprints a gateway sees over its lifetime.
Salvaged from #25318 by Michael Steuer (@mssteuer); fix 3 redirected from the
idle-sweep to the session-finalize lifecycle, magic number 8 lifted to a named
constant, test ported.
Fixes#19251
Co-authored-by: Michael Steuer <michael@make.software>
Salvaged from #6600 (@kristianvast) — re-scoped to the voice half only and
rebased onto current main. The cascading-interrupt hang half of the original
PR landed independently in dd0d1222a, so this carries ONLY Problem 1.
When a voice/audio message arrives while the agent is busy on the same
session, it hit the interrupt path with empty text because STT only ran after
the running-agent guard — the voice was effectively lost. Now we transcribe
audio BEFORE signaling the agent (and on the fresh-message path), echo the raw
transcript back to the user (🎙️), and _enrich_message_with_transcription
returns (text, transcripts) so callers can echo. A new
_dequeue_pending_with_transcription drives the post-agent drain the same way.
Reapplied onto _prepare_inbound_message_text (inbound enrichment was extracted
from the inline dispatch block since the original PR).
Co-authored-by: Kristian Vastveit <kristian@agrointel.no>
The in-session slash commands (/model, /reset, /usage, /compress, /voice, ...)
— 42 _handle_*_command handlers, ~3,200 LOC — move out of gateway/run.py into a
mixin GatewayRunner inherits. self._handle_*_command dispatch + all test
references resolve unchanged via the MRO.
Neutral deps (MessageEvent, EphemeralReply, Platform, t, cfg_get, atomic_*_write,
account-usage helpers, stdlib) imported at the mixin top level. The ~10 run.py-
internal helpers (_hermes_home, _load_gateway_config, _resolve_gateway_model,
_AGENT_PENDING_SENTINEL, ...) imported lazily inside the handlers that need them
to avoid an import cycle.
gateway/run.py 19157 -> 15870 LOC; GatewayRunner direct methods 214 -> 172.
Behavior-neutral: voice/update/model/compress command test suites pass; all 42
resolve to the mixin via MRO.
When --replace force-kills an unresponsive old gateway, SIGKILL can fail
to reap it (uninterruptible sleep, zombie-reaping parent, etc.). The old
code unconditionally cleared the PID file and scoped locks and started a
fresh instance anyway, leaving two live gateways fighting over the same
bot token — a duplicate-gateway failure mode of #19471.
Re-verify the process is actually gone (via the Windows-safe _pid_exists
helper) after the force-kill; if it still appears alive, clear the
takeover marker and abort the replacement instead of duplicating.
Co-authored-by: Hermes <noreply@nousresearch.com>
gateway/run.py is the largest god file (20k LOC, GatewayRunner with 220
methods). This lifts the cohesive kanban-watcher cluster — _kanban_notifier_watcher,
_kanban_dispatcher_watcher, _kanban_advance/unsub/rewind, _deliver_kanban_artifacts
(~1,035 LOC, 6 methods) — into gateway/kanban_watchers.py as a mixin that
GatewayRunner inherits.
Mixin (not free functions) because the methods use only self state: inheriting
keeps every self._kanban_* call site working unchanged via the MRO, making this
a behavior-neutral move. The methods' lazy imports (_kb, _decomp, _load_config,
Platform) travel with them; the mixin needs only stdlib + a matching
logging.getLogger('gateway.run').
run.py 20187 -> 19157 LOC; GatewayRunner direct methods 220 -> 214.
Behavior-neutral: gateway test suite 6582 passed / 0 failed; start() still wires
both watchers via self._kanban_*; MRO resolves all 6 to the mixin. One test
(corrupt-board quarantine retry) keyed its time-travel mock on the caller's
filename being gateway/run.py — updated to also accept gateway/kanban_watchers.py.
Establishes the mixin-extraction pattern for further GatewayRunner decomposition
(the 2406-LOC _run_agent and 1164-LOC _handle_message remain, but their callback
closures need a context-object redesign — deferred).
Session-scoped /model and /reasoning overrides were silently lost on
Telegram DM/forum topics and after compression session splits (#30479).
Root cause: _handle_message_with_agent rewrites source.thread_id via
_recover_telegram_topic_thread_id (lobby/stripped reply -> the user's
bound topic) before deriving the session key. The /model and /reasoning
handlers derived their override key from the raw inbound event.source,
skipping that recovery, so the override was stored under one key and the
next message turn read a different key.
Fix: add _normalize_source_for_session_key (applies the same recovery a
message turn does) and use it in both handlers before deriving the key.
session_id rotation on compression was never the cause — overrides are
keyed by the durable session_key; the split path preserves it.
Author: teknium1 <127238744+teknium1@users.noreply.github.com>
Adds thread_id and chat_type to the agent:start/end plugin hook context
(via getattr with safe defaults; both are real `source` attrs already used
in gateway/run.py). agent:end inherits them via **hook_ctx. Purely additive
— no prompt/history mutation. Documents the full ctx dict in hooks.py.
Co-authored-by: SNooZyy2 <SNooZyy2@users.noreply.github.com>
Tool-progress now shows a terminal command in a ```bash fenced block —
full command, no surrounding quotes, no label, no 40-char truncation —
instead of the noisy `terminal: "cmd…"` line, on every platform that
renders markdown code blocks (Telegram, Slack, Matrix, WhatsApp, Feishu,
Weixin, Discord). Plain-text platforms keep the compact preview line.
Gated on a new `BasePlatformAdapter.supports_code_blocks` capability
(default False) rather than a hardcoded platform list, so plugin adapters
(Discord lives in plugins/platforms/) opt in by setting the flag. Applies
to both all/new and verbose progress modes, with a safe fallback when the
command arg is missing or blank.
* feat(onboarding): opt-in structured profile-build path on first contact
On a user's very first gateway message, Hermes now optionally offers to
build a short profile of them — then, only with consent, gathers durable
facts and persists them to the user-profile memory store (memory tool,
target="user") so future sessions start already knowing who they are.
Inspired by Poke's zero-input onboarding, but consent-first by design:
- The agent OFFERS, never assumes. Declining stops it immediately.
- Before ANY external lookup it states what it will look up and asks.
- It never reads connected accounts (email/calendar) silently — the
exact privacy concern that made naive implementations feel invasive.
Wiring reuses existing infrastructure end-to-end:
- gateway/run.py first-message hook (was a plain self-intro) now swaps in
the profile-build directive when enabled and not yet offered.
- agent/onboarding.py gains profile_build_mode()/profile_build_directive()
+ PROFILE_BUILD_FLAG, latched once via the existing onboarding.seen
mechanism so the offer fires at most once per install.
- config default onboarding.profile_build: "ask" (set "off" to disable).
Added to an existing section, so no _config_version bump needed.
No new storage layer, no new injection path, no prompt-cache impact.
* fix(dashboard): fold onboarding into agent tab to avoid 1-field category
onboarding.profile_build is the only schema-surfaced onboarding field
(onboarding.seen is an internal latch dict), so the dashboard CONFIG_SCHEMA
single-field-category invariant rejected it. Merge onboarding -> agent like
the other small categories.
SIMPLEX_ALLOWED_USERS silently denied every contact when operators
listed display names instead of numeric contactIds. The SimpleX UI
never surfaces the numeric id, so display names are what operators
naturally put in the env var. _is_user_authorized only compared
source.user_id (the contactId), so the allowlist never matched.
Expand check_ids to include source.user_name for the simplex platform,
mirroring the existing WhatsApp phone-LID aliasing pattern. Adds doc +
setup-prompt clarification and three regression tests.
Salvaged from PR #40393. Adds manishbyatroy to release.py AUTHOR_MAP.
Persist the inbound user turn before provider/tool execution so a crash
before run_conversation() (e.g. provider/httpx client init failure) keeps
the inbound message in the transcript. Repair stale/missing SSL_CERT_FILE
state on gateway startup, and avoid duplicate gateway fallback writes.
subprocess.run() and proc.wait() without timeout can hang indefinitely
if the child process becomes unresponsive. This blocks the calling
thread forever.
Fixed locations:
- tools/transcription_tools.py: ffmpeg conversion (timeout=300) and
user-configured STT commands with shell=True (timeout=300)
- gateway/run.py: helper script proc.wait() (timeout=3600)
Not fixed:
- agent/anthropic_adapter.py: interactive 'claude setup-token' —
user-driven, timeout would be inappropriate
On Windows with non-UTF-8 console encodings (e.g. cp949, cp1252),
StreamHandler emits raise UnicodeEncodeError when log messages contain
characters outside the console codepage — such as the em-dash (U+2014)
in the session hygiene message.
This crashed the gateway process silently, leaving no diagnostic output.
Fix: add _safe_stderr() helper that wraps sys.stderr in a TextIOWrapper
with encoding='utf-8' and errors='replace' when the console encoding
is not UTF-8. Applied to both:
- hermes_logging.py setup_verbose_logging() stderr handler
- gateway/run.py optional stderr handler
The wrapper ensures log lines are never lost — un-encodable characters
are replaced with '?' instead of crashing the process.
Fixes#40432
Home Assistant is a bundled plugin now (#40709) and declares
allow_update_command=True on its PlatformEntry. The registry fallback
in _handle_update_command already covers it, so the frozenset entry is
a redundant double-allow — same cleanup #40711 did for Discord and
Mattermost. Adds a registry-fallback test mirroring the existing
discord/mattermost cases.
`gateway/run.py::_UPDATE_ALLOWED_PLATFORMS` was a hardcoded frozenset
listing every messaging platform allowed to invoke the `/update` slash
command. Plugin-migrated platforms (currently Discord and Mattermost,
soon also Home Assistant via #32500) declare `allow_update_command=True`
on their `PlatformEntry`, and `_handle_update_command` already falls
back to the registry when a platform isn't in the frozenset. The result
was a silent redundancy: those entries said "allowed" twice, and the
registry flag was a no-op for them in practice.
- Removed `Platform.DISCORD` and `Platform.MATTERMOST` from the frozenset.
- Updated the docstring to make the split explicit (built-ins live in
the frozenset; plugins use `allow_update_command` on the registry entry).
The remaining frozenset entries are all still built-in platforms living
under `gateway/platforms/` today. Future plugin migrations should drop
their entry from the frozenset as part of the migration PR (or in a
sibling chore PR like this one).
Added a `TestUpdateCommandPlatformGate` test class that pins down all
three branches of the gate so future changes don't silently regress:
- Programmatic interfaces (`Platform.WEBHOOK`, `Platform.API_SERVER`)
must remain blocked.
- Plugin-migrated platforms (Discord, Mattermost) must pass via the
registry fallback.
- Built-in platforms in the hardcoded frozenset (Telegram) must
still pass without needing the registry.
The gate previously had zero direct test coverage — its only existing
coverage was `test_no_adapter_for_platform` which exercised a different
code path.
Move gateway/platforms/homeassistant.py into plugins/platforms/homeassistant/
following the same shape as the Mattermost and Discord migrations.
- Adapter file is renamed via git mv (history is preserved).
- register() exposes the platform via the plugin system instead of the
hardcoded Platform.HOMEASSISTANT elif in gateway/run.py::build_adapter().
- _standalone_send() replaces the legacy _send_homeassistant() helper in
tools/send_message_tool.py. Out-of-process cron delivery
(deliver=homeassistant from a cron process not co-located with the
gateway) now flows through the registry's standalone_sender_fn path
instead of the hardcoded elif.
- _is_connected() probes HASS_TOKEN via hermes_cli.gateway.get_env_value
so existing connected-platform checks behave identically.
The HASS_TOKEN / HASS_URL env-to-PlatformConfig seeding in
gateway/config.py stays in core — same pattern bluebubbles, mattermost,
and discord migrations followed. No setup_fn or apply_yaml_config_fn is
registered because Home Assistant has no _setup_homeassistant wizard in
hermes_cli/setup.py and no homeassistant: YAML block in config.yaml today;
setup runs through the existing hermes_cli/tools_config.py toolset wizard.
Test imports were rewritten across tests/gateway/test_homeassistant.py,
tests/integration/test_ha_integration.py, and
tests/tools/test_send_message_missing_platforms.py; the legacy
(token, extra, chat_id, message)-shaped _send_homeassistant call site is
preserved via a small SimpleNamespace shim in
test_send_message_missing_platforms.py (same approach used when
mattermost moved).
- Focused HA suites (64 tests across the three rewritten files) pass.
- Broader gateway/cron sweep produces 10 failures identical to main
baseline (telegram approval/model-picker xdist isolation flakes,
wecom_callback defusedxml issue, cron script_timeout fixture issue).
Zero net new failures.
* fix: respect disabled auto-compaction on context overflow
Port from anomalyco/opencode#30749.
When compression.enabled is false, NO automatic compaction trigger may
fire. The proactive token-threshold paths (preflight + post-response
should_compress gate) already honoured the setting, but the three
provider-overflow recovery paths in the agent loop — long-context-tier
429, 413 payload-too-large, and context-overflow — called
_compress_context() unconditionally, silently compressing and rotating
the session against the user's explicit choice.
Add a single guard at the top of the overflow-recovery dispatch: when
compression is disabled and the error is one of those three overflow
classes, surface a terminal error (compaction_disabled: True) telling the
user to /compress manually, /new, switch to a larger-context model, or
reduce attachments. Manual /compress (force=True) is unaffected — it never
enters this loop.
Tests: new TestOverflowWithCompactionDisabled (413 + 400 overflow don't
compress when disabled; control case still compresses when enabled).
Existing overflow-recovery tests updated to enable compaction explicitly
(they verify the recovery fires); fixture defaults flipped to True to
match production (compression.enabled defaults to True).
* fix(gateway): plain text while busy interrupts by default again
busy_input_mode (default 'interrupt') was advertised as the busy-behavior
knob, but a second knob added in 7abd62719 — busy_text_mode, defaulting to
'queue' — short-circuited every plain TEXT message before busy_input_mode
was consulted. Result: plain follow-ups silently queued instead of
interrupting, even with busy_input_mode left at its 'interrupt' default
(regression #38390, silent-queue #31588).
Collapse to one source of truth: busy_input_mode drives text handling.
busy_text_mode is kept only as a legacy explicit override for back-compat
(existing queue setups keep working); when unset it follows busy_input_mode.
All default fallbacks flipped queue->interrupt. The debounce mechanism is
preserved and now keyed off the resolved mode.
Fixes#38390, #31588.