hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-20 10:11:58 +00:00

Author	SHA1	Message	Date
Aleksandr Pasevin	8a4fe80f8d	fix(signal): skip reactions for unauthorized senders The on_processing_start hook fired a reaction emoji (👀) on every inbound Signal message before run.py's _is_user_authorized check. This meant contacts not in SIGNAL_ALLOWED_USERS would see the bot react to their messages even though Hermes silently dropped them — leaking the presence of the bot and causing confusing UX. Two changes to gateway/platforms/signal.py: 1. Read SIGNAL_ALLOWED_USERS into self.dm_allow_from in __init__ (mirrors the group_allow_from pattern already in place). 2. Add _reactions_enabled(event) — two-gate check: - SIGNAL_REACTIONS=false/0/no disables reactions globally - If SIGNAL_ALLOWED_USERS is set, only react to senders in the allowlist (skips unauthorized contacts) Both on_processing_start and on_processing_complete now call this guard before sending any reaction. Telegram already has an equivalent _reactions_enabled() guard (controlled by TELEGRAM_REACTIONS). This brings Signal to parity.	2026-05-04 01:38:21 -07:00
Siddharth Balyan	a11aed1acc	fix(cli): local backend CLI always uses launch directory, stops .env sync of TERMINAL_CWD (#19334 ) The old CWD heuristic was fooled by: 1. TERMINAL_CWD persisted to .env by `hermes config set terminal.cwd` 2. Inherited TERMINAL_CWD from parent hermes processes 3. Only resolved when config had a placeholder value (not explicit paths) Fix: - load_cli_config() unconditionally uses os.getcwd() for local backend - TERMINAL_CWD always force-exported in CLI mode (overrides stale values) - Gateway sets _HERMES_GATEWAY=1 marker so lazy cli.py imports don't clobber - Remove terminal.cwd from config-set .env sync map (prevents re-poisoning) - Clarify setup wizard label as 'Gateway working directory' Closes #19214	2026-05-04 11:36:19 +05:30
molvikar	74636f9c4a	fix(gateway): clear queued reload-skills notes on new/resume/branch	2026-05-03 17:00:31 -07:00
Kenny Wang	222767e5e8	fix: sanitize Telegram help command mentions	2026-05-03 17:00:09 -07:00
konsisumer	6fda92aa7f	fix(gateway): bridge top-level require_mention to Telegram config Users commonly place `require_mention: true` at the top level of config.yaml alongside `group_sessions_per_user`, expecting it to gate Telegram group messages. The key was silently ignored because the config loader only checked `yaml_cfg["telegram"]["require_mention"]`. When `require_mention` is found at the top level and no telegram-specific value is set, the fix now: - adds it to platforms_data["telegram"]["extra"] so _telegram_require_mention() picks it up via the primary config.extra path - sets TELEGRAM_REQUIRE_MENTION env var for the secondary fallback path A telegram-specific value (telegram.require_mention) still takes precedence over the top-level shorthand. Also corrects telegram.md: bare /cmd without @botname is rejected when require_mention is enabled; only /cmd@botname (bot-menu form) passes. Fixes #3979	2026-05-03 16:59:46 -07:00
clawbot	1bd975c0ba	fix(gateway): suppress duplicate voice transcripts Deduplicate exact and near-exact Discord voice STT transcripts per guild/user over a short window to avoid duplicate delayed agent replies. Adds regression tests for exact and near-duplicate voice transcript suppression.	2026-05-03 16:59:21 -07:00
Zyproth	a5cae16496	fix(api_server): fall back to default port on malformed API_SERVER_PORT	2026-05-03 15:27:03 -07:00
leprincep35700	b59bb4e351	fix(gateway): preserve home-channel thread targets across restart notifications	2026-05-03 08:47:49 -07:00
Teknium	d87fd9f039	fix(goals): make /goal work in TUI and fix gateway verdict delivery (#19209 ) /goal was silently broken outside the classic CLI. TUI: /goal was routed through the HermesCLI slash-worker subprocess, which set the goal row in SessionDB but then called _pending_input.put(state.goal) — the subprocess has no reader for that queue, so the kickoff message was discarded. No post-turn judge was wired into prompt.submit either, so even a manual kickoff would not continue the goal loop. Intercept /goal in command.dispatch instead, drive GoalManager directly, and return {type: send, notice, message} so the TUI client renders the Goal-set notice and fires the kickoff. Run the judge in _run_prompt_submit after message.complete, surface the verdict via status.update {kind: goal}, and chain the continuation turn after the running guard is released. Gateway: _post_turn_goal_continuation was gated on hasattr(adapter, 'send_message'), but adapters only expose send(). That branch was dead on every platform — users never saw '✓ Goal achieved', 'Continuing toward goal', or budget-exhausted messages. Replace the dead call with adapter.send(chat_id, content, metadata) and drop a broken reference to self._loop. Tests: - tests/tui_gateway/test_goal_command.py — full /goal dispatch matrix (set / status / pause / resume / clear / stop / done / whitespace) plus regressions for slash.exec → 4018 and 'goal' staying in _PENDING_INPUT_COMMANDS. - tests/gateway/test_goal_verdict_send.py — locks in the adapter.send path for done / continue / budget-exhausted and verifies the hook no-ops when no goal is set or the adapter lacks send().	2026-05-03 05:49:12 -07:00
charliekerfoot	1148c46241	fix(gateway): correct ws scheme conversion for https urls	2026-05-03 03:54:03 -07:00
Hermes Agent	934103476f	fix(gateway): send /new response before cancel_session_processing to avoid race (#18912 ) When /new is issued while an agent is actively processing, the confirmation response was never sent to the user because cancel_session_processing() was called before _send_with_retry(). Task cancellation side effects could silently drop the response. Fix: reorder to send the response BEFORE cancelling the old task. Add logging at the send point (matching the pattern at line 2800 in _process_message_background) so future failures are visible. Closes: #18912	2026-05-03 03:54:03 -07:00
millerc79	f1e0292517	fix(gateway): resume sessions after crash/restart instead of blanket suspend suspend_recently_active() was unconditionally setting suspended=True on startup, causing get_or_create_session() to wipe conversation history on every restart. Change to set resume_pending=True instead, so sessions auto-resume while still allowing stuck-loop escalation after 3 failures.	2026-05-03 03:54:03 -07:00
nftpoetrist	6c1322b997	fix(slack): close previous handler in connect() to prevent zombie Socket Mode connections SlackAdapter.connect() overwrote self._handler, self._app, and self._socket_mode_task without closing the prior AsyncSocketModeHandler first. If connect() was called a second time on the same adapter (e.g. during a gateway restart or in-process reconnect attempt), the old Socket Mode websocket stayed alive. Both the old and new connections received every Slack event and dispatched it twice — producing double responses with different wording, the same bug that affected DiscordAdapter (#18187, fixed in #18758). Fix: add a close-before-reassign guard at the start of the connection setup path, mirroring the guard DiscordAdapter.connect() already has. When self._handler is None (fresh adapter, first connect()) the block is a harmless no-op. Scoped to the handler/app fields only — no behavior change for any path that does not call connect() twice. Fixes #18980	2026-05-03 03:47:49 -07:00
0xyg3n	19ba9e43b6	fix(gateway/discord): require allowlist auth on slash commands Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member could invoke /background (RCE via terminal), /restart, /model, /skill, etc. CVSS 9.8 Critical. - _evaluate_slash_authorization mirrors on_message gates (user, role, channel, ignored channel) with fail-closed semantics - _check_slash_authorization sends ephemeral reject + logs + admin alert - Auth gate runs before defer() so rejections are ephemeral - /skill autocomplete returns [] for unauthorized users (no catalog leak) - Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker) now honor role allowlists via shared _component_check_auth helper - Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth - Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts Based on PR #18125 by @0xyg3n.	2026-05-03 03:44:55 -07:00
MottledShadow	a22465e07a	fix(weixin): send_weixin_direct cross-loop session check When send_message tool is called from inside a running gateway, the _run_async bridge spawns a worker thread with a separate event loop. send_weixin_direct then reuses the live adapter's aiohttp session which was created on the gateway's main loop. aiohttp's TimerContext checks asyncio.current_task(loop=session._loop) and sees None because we're executing on the worker thread's loop → raises 'Timeout context manager should be used inside a task'. Fix: skip the live-adapter shortcut when the session belongs to a different event loop, falling through to the fresh-session path.	2026-05-03 01:51:33 -07:00
teknium1	762eb79f1e	fix(gateway): tighten httpx keepalive and close whatsapp typing-response leak (#18451 ) Two mitigations for the CLOSE_WAIT accumulation reported against QQ Bot + Feishu on macOS behind Cloudflare Warp. 1. Shared httpx.Limits helper (gateway/platforms/_http_client_limits.py). Every long-lived platform adapter now constructs httpx.AsyncClient with max_keepalive_connections=10 and keepalive_expiry=2.0, vs httpx's default of unbounded keepalive pool and 5.0s expiry. On macOS/Warp the default 5s window let idle keepalive sockets sit in CLOSE_WAIT long enough for seven persistent adapters (QQ Bot, WeCom, DingTalk, Signal, BlueBubbles, WeCom-callback, plus the transient Feishu helper) to compound to the 256-fd ulimit. Tunable via HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY and HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE env vars. 2. whatsapp.send_typing aiohttp leak. The call was 'await self._http_session.post(...)' with no 'async with' and no variable capture — the ClientResponse went out of scope unclosed, holding its TCP socket in CLOSE_WAIT until GC. Fixed by wrapping in 'async with'. This was the only bare-await aiohttp leak in the gateway/tools/plugins tree per audit; all other aiohttp sites use the context-manager pattern correctly. The underlying reporter also saw Feishu SDK (lark-oapi) connections in CLOSE_WAIT — those are inside the SDK and out of our direct control, but tightening httpx keepalive across adapters reduces the aggregate pool pressure regardless of which individual adapter leaks.	2026-05-02 02:23:37 -07:00
beibi9966	38dd057e91	fix(feishu): finalize remote document downloads inside httpx.AsyncClient context (#18502 ) Snapshot Content-Type and body while the client context is still active so pooled connections fully release on exit. Previously the read happened after `async with httpx.AsyncClient(...)` returned — which works today only because httpx eagerly buffers non-streaming responses; a future refactor to `.stream()` would silently read- after-close. Part of the #18451 connection-hygiene audit. Salvage of #18502.	2026-05-02 02:23:37 -07:00
Teknium	1dce908930	fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761 ) * fix(gateway): config.yaml wins over .env for agent/display/timezone settings Regression from the silent config→env bridge. The bridge at module import time is correct for max_turns (unconditional overwrite), but every other agent., display., timezone, and security bridge key was guarded by 'if X not in os.environ' — so a stale .env entry from an old 'hermes setup' run would shadow the user's current config.yaml indefinitely. Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60 in .env from an old setup, and the gateway silently capped at 60 iterations per turn. Gateway logs confirmed api_calls never exceeded 60. Three changes: 1. gateway/run.py: drop the 'not in os.environ' guards for all agent., display., timezone, and security.* bridge keys. config.yaml is now authoritative for these settings — same semantics already in place for max_turns, terminal., and auxiliary.. Also surface the bridge failure (previously 'except Exception: pass') to stderr so operators see bridge errors instead of silently falling back to .env. 2. gateway/run.py: INFO-log the resolved max_iterations at gateway start so operators can verify the config→env bridge did the right thing instead of chasing a phantom budget ceiling. 3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in the setup wizard. config.yaml is the single source of truth. Also clean up any stale .env entry left behind by pre-fix setups. Regression tests in tests/gateway/test_config_env_bridge_authority.py guard each config→env key against the 'stale .env shadows config' bug. * fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) Three issues observed in production gateway.log during a rapid restart chain on 2026-05-02, all fixed here. 1. _send_restart_notification logged unconditional success adapter.send() catches provider errors (e.g. Telegram 'Chat not found') and returns SendResult(success=False); it never raises. The caller ignored the return value and always logged 'Sent restart notification to <chat>' at INFO, producing a misleading success line directly below the 'Failed to send Telegram message' traceback on every boot. Now inspects result.success and logs WARNING with the error otherwise. 2. WhatsApp bridge SIGTERM on shutdown classified as fatal error _check_managed_bridge_exit() saw the bridge's returncode -15 (our own SIGTERM from disconnect()) and fired the full fatal-error path, producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus 'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every planned shutdown, immediately before the normal '✓ whatsapp disconnected'. Adds a _shutting_down flag that disconnect() sets before the terminate, and _check_managed_bridge_exit() returns None for returncode in {0, -2, -15} while shutting down. OOM-kill (137) and other non-signal exits still hit the fatal path. 3. restart_drain_timeout default 60s → 180s On 2026-05-02 01:43:27 a user /restart fired while three agents were mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget expired and all three were force-interrupted. 180s covers realistic in-flight agent turns; users on very-long-reasoning models can still raise it further via agent.restart_drain_timeout in config.yaml. Existing explicit user values are preserved by deep-merge. Tests - tests/gateway/test_restart_notification.py: two new tests assert INFO is only logged on SendResult(success=True) and WARNING with the error string is logged on SendResult(success=False). - tests/gateway/test_whatsapp_connect.py: parametrized test for returncode in {0, -2, -15} proves shutdown-time exits are suppressed; separate test proves returncode 137 (SIGKILL/OOM) still surfaces as fatal even when _shutting_down is set. - _check_managed_bridge_exit() reads _shutting_down via getattr-with- default so existing _make_adapter() test helpers that bypass __init__ (pitfall #17 in AGENTS.md) keep working unmodified.	2026-05-02 02:08:06 -07:00
luyao618	292d2fb42f	fix(discord): close old client before reconnect to prevent zombie websockets (#18187 ) When DiscordAdapter.connect() is called during reconnect, it creates a new commands.Bot client without closing the previous one. The old client's websocket remains connected to Discord's gateway, causing both to fire on_message for every incoming event — resulting in double responses. Fix: before creating a new Bot instance, check if a previous client exists and close it. This ensures only one websocket connection is active at any time. Closes #18187	2026-05-02 02:04:14 -07:00
Teknium	10297fa23c	fix(discord): `/reload-skills` now refreshes the `/skill` autocomplete live (#18754 ) `_register_skill_group` captured the skill catalog in closure variables (`entries` and `skill_lookup`) so the single `tree.add_command` call at startup owned the only live copy. The closure is never re-entered after startup, so `/reload-skills` — which rescans the on-disk skills dir and refreshes the in-process `_skill_commands` registry — had no way to propagate results into the `/skill` autocomplete on Discord. New skills stayed invisible in the dropdown, and deleted skills returned "Unknown skill" when the stale autocomplete entry was clicked. The fix is purely a dataflow change: promote `entries` and `skill_lookup` to instance attributes (`_skill_entries`, `_skill_lookup`), split the collector-driven rebuild into a helper (`_refresh_skill_catalog_state`), and add a public `refresh_skill_group()` method that re-runs the helper and is safe to call at any point after the initial registration. The gateway's `_handle_reload_skills_command` then iterates `self.adapters` and calls `refresh_skill_group()` on any adapter that exposes it (currently only Discord). Both sync and async implementations are supported; adapters that don't override the method (Telegram's BotCommand menu, Slack subcommand map, etc.) are silently skipped — the in-process `reload_skills()` call covers them. No `tree.sync()` is required because Discord fetches autocomplete options dynamically on every keystroke — mutating the instance state the callbacks already read from is sufficient. That sidesteps the per-app command-bucket rate limit (~5 writes / 20 s) that made the previous bulk-sync-on-reload approach unusable (#16713 context). Tests: tests/gateway/test_reload_skills_discord_resync.py — five cases covering (1) refresh replaces entries, (2) entries stay sorted after refresh, (3) collector exception leaves cached state intact, (4) `_refresh_skill_catalog_state` populates the instance attrs, (5) orchestrator calls `refresh_skill_group()` on sync + async adapters and skips adapters that don't expose it.	2026-05-02 02:00:11 -07:00
Teknium	6ec74aec07	fix(gateway): match disabled/optional skills by frontmatter slug, not dir name (#18753 ) _check_unavailable_skill is meant to turn a typed "/foo" command that doesn't resolve into a specific hint — "disabled, enable with hermes skills config" or "available but not installed, install with hermes skills install …" — instead of the generic "unknown command" reply. It was doing the match with `skill_md.parent.name.lower().replace("_", "-")`, comparing that to the typed command. For every skill whose directory name drifted from its declared frontmatter `name:`, that comparison failed and the user got the unhelpful generic path. On a standard install today 19 skills have this drift, e.g.: dir: mlops/stable-diffusion frontmatter: name: Stable Diffusion Image Generation registered slug (what the user types): /stable-diffusion-image-generation dir: mlops/qdrant frontmatter: name: Qdrant Vector Search registered slug: /qdrant-vector-search dir: mlops/flash-attention frontmatter: name: Optimizing Attention Flash registered slug: /optimizing-attention-flash In every case, _check_unavailable_skill would fall through because "stable-diffusion" != "stable-diffusion-image-generation", even with the skill sitting right there on disk. Fix: extract a small `_skill_slug_from_frontmatter` helper that reads the SKILL.md frontmatter and normalizes exactly like scan_skill_commands (lower, spaces/underscores → hyphens, strip non-[a-z0-9-], collapse runs of hyphens, strip edges). Use it in both the disabled-skills branch and the optional-skills branch. The disabled-set membership check now uses the declared frontmatter name (which is what `hermes skills config` writes into skills.disabled / platform_disabled), not the slug. Tests: five cases in tests/gateway/test_unavailable_skill_hint.py — the drift case for the disabled branch, unknown-command negative, matched-but-not-disabled negative, non-alnum stripping, and the drift case for the optional-skills branch. All five fail against main and pass with the fix.	2026-05-02 02:00:09 -07:00
Jacob Lizarraga	2470434d60	fix(telegram): probe polling liveness after reconnect to detect wedged Updater After a transient Telegram 502, _handle_polling_network_error's stop()+start_polling() cycle can leave PTB's Updater with `running=True` but a wedged consumer task that never makes progress. No error_callback fires in that state, so the reconnect ladder never advances past attempt 1, the MAX_NETWORK_RETRIES fatal-error path is never reached, and the gateway sits silent indefinitely. Schedule a heartbeat probe (60s after a successful reconnect) that verifies Updater.running is still True and bot.get_me() responds within a tight asyncio.wait_for timeout. Either failure feeds back into the reconnect ladder so the existing escalation path fires. No PTB-internal coupling, no Application rebuild — minimal additive defense inside the existing reconnect abstraction. Tests cover healthy / Updater non-running / probe timeout / probe network error / already-fatal cases, plus an integration check that the probe is actually scheduled after a successful start_polling(). Closes the silent-wedge case observed in the wild after a transient Telegram 502; existing reconnect tests updated to mock bot.get_me() now that the success path schedules a heartbeat probe.	2026-05-02 01:55:04 -07:00
Amr Essam	d05a87e686	fix(gateway): clear slack assistant thread status	2026-05-01 14:01:26 -07:00
hinotoi-agent	a147164d3c	fix(slack): preserve per-user slash-command session isolation	2026-05-01 14:01:26 -07:00
nightq	5cdc39e29a	fix(gateway): preserve case-sensitive chat IDs in DeliveryTarget.parse Fixes NousResearch/hermes-agent#11768 Root cause: target.strip().lower() was lowercasing the entire target string, corrupting case-sensitive chat IDs like Slack C123ABC and Matrix !RoomABC. Fix: Only lowercase the platform prefix for case-insensitive matching; preserve the original case for chat_id and thread_id values.	2026-05-01 14:01:26 -07:00
YAMAGUCHI Seiji	2b3923ff13	fix(gateway): coerce scalar free_response_channels to str before split YAML loads a bare numeric value such as discord: free_response_channels: 1491973769726791812 as an int. _discord_free_response_channels() / _slack_free_response_channels() checked `isinstance(raw, list)` and `isinstance(raw, str)` in that order and then fell through to `return set()`, so a single-channel config that happened to be unquoted was silently dropped with no log line — the bot kept demanding @mentions even though the channel was configured to free-response. A multi-channel value like `1234567890,9876543210` does not trip this because the comma forces YAML to parse it as a string. Single-channel configs are the only case that breaks, which is exactly the footgun that's hardest to diagnose (the config "looks right" and the feature just doesn't activate). Note that the old-schema env-var bridge at gateway/config.py:614+ already runs `str(frc)` when forwarding to SLACK_/DISCORD_FREE_RESPONSE_CHANNELS, so the env-var fallback worked. The bug only surfaces on the `config.extra["free_response_channels"]` path populated by the `platforms:` bridge at gateway/config.py:576, which passes the raw YAML value through unchanged. Fix at the reader: treat any non-list value as a scalar, coerce with str(), then apply the same CSV split semantics. This keeps the public contract stable (list or str-like continues to work identically) while accepting the ints that the YAML loader is free to hand us. Added tests for both Discord and Slack covering: - bare int value in config.extra - list of ints in config.extra	2026-05-01 14:01:26 -07:00
kshitijk4poor	8fcc160f6b	fix(gateway/slack): review fixes — scope ephemeral to commands, user isolation Self-review fixes for the slash ephemeral ack: - Only stash response_url when text starts with '/' (gateway command). Free-form questions via '/hermes <question>' must produce public agent replies visible to the whole channel, not ephemeral. - Use a ContextVar (_slash_user_id) to thread the invoking user's ID from _handle_slash_command through to send(). _pop_slash_context now matches the exact (channel_id, user_id) key when the ContextVar is set, preventing concurrent users on the same channel from stealing each other's ephemeral context. ContextVars propagate to child asyncio.Tasks, so the value survives through handle_message → _process_message_background → _send_with_retry → send(). - Add truncate_message() in _send_slash_ephemeral to prevent silent failures on long responses (response_url has the same ~40k limit). - Log send_private_notice failures at debug level instead of bare except/pass — aids diagnostics without spamming. - Document app_mention dedup dependency on shared event ts. - Add tests: free-form question must NOT stash context, concurrent users on the same channel get isolated contexts, non-slash send() path fallback behavior.	2026-05-01 13:33:06 -07:00
probepark	0ab2d752ff	feat(gateway): private notice delivery and Slack format_message fixes Adds platform-level private notice delivery abstraction so operational messages (e.g. sethome prompt) can be sent ephemerally on Slack when configured with `slack.notice_delivery: private`. Changes: - gateway/config.py: _normalize_notice_delivery() + GatewayConfig.get_notice_delivery() with per-platform config bridging - gateway/platforms/base.py: send_private_notice() default implementation (falls through to send()) - gateway/platforms/slack.py: send_private_notice() via chat_postEphemeral - gateway/run.py: _deliver_platform_notice() helper replaces direct adapter.send() for the sethome notice, with private→public fallback - gateway/platforms/slack.py: app_mention handler now forwards to _handle_slack_message (safe due to ts-based dedup) instead of no-op pass, fixing edge-case Slack configs where mentions arrive only as app_mention - gateway/platforms/slack.py format_message: negative lookbehind prevents markdown images (![]()) from becoming broken Slack links; italic regex now requires non-whitespace boundaries so 'a * b * c' stays literal Based on PR #9340 by @probepark.	2026-05-01 13:33:06 -07:00
kshitijk4poor	7cda0e5224	fix(gateway/slack): ephemeral ack and routing for slash commands Slack slash commands (/q, /btw, /stop, /model, etc.) previously showed no user-visible acknowledgement and posted command replies as public channel messages. This diverged from Discord, which uses ephemeral deferred responses for slash commands. Changes: - handle_hermes_command now passes response_type='ephemeral' and a 'Running /cmd…' text to ack(), giving the user immediate 'Only visible to you' feedback when they invoke any native slash command. - _handle_slash_command stashes the Slack response_url from the command payload in a per-channel context dict before dispatching to handle_message. - send() checks for a pending slash context and, when found, POSTs to the response_url with replace_original=true to swap the initial ack with the real command reply (e.g. 'Queued for the next turn.'), keeping it ephemeral. - Stale slash contexts are garbage-collected on lookup (120s TTL). - The response_url POST is non-fatal: if it fails, the user already saw the initial ack, and send() returns success=True. Fixes #18182	2026-05-01 13:33:06 -07:00
Teknium	f99676e315	fix(gateway): auto-restart when source files change out from under us (#17648 ) (#18409 ) Long-running gateway processes that survive 'hermes update' keep pre-update modules cached in sys.modules. When new tool files on disk then try to 'from hermes_cli.config import cfg_get' (added in PR #17304), the import resolves against the stale module object and raises ImportError — hitting users on Matrix, Telegram, Feishu, and other platforms. Two defenses: 1. Gateway self-check (gateway/run.py). On __init__, snapshot the newest mtime across sentinel source files (hermes_cli/config.py, run_agent.py, gateway/run.py, etc.). On every inbound message, re-read those mtimes; if any is newer than boot time + 2s slack, request a graceful restart via the normal drain path and return a one-line ack to the user. Idempotent, works regardless of how the update happened (hermes update, manual git pull, installer). 2. Post-restart survivor sweep ('hermes update'). After the existing restart loop, sleep 3s, rescan for gateway PIDs we already tried to kill, and SIGKILL any survivors. The detached profile watchers and systemd then relaunch with fresh code instead of waiting out the 120s watcher timeout. Closes #17648.	2026-05-01 09:50:08 -07:00
Siddharth Balyan	75e1339d4c	fix(telegram): send seed message after creating DM topics (#18334 ) Telegram's client does not display empty forum topics in the chat's topic list. After createForumTopic succeeds, send a short pin message into the new topic so it becomes immediately visible to the user. Only fires for newly created topics (no thread_id in config yet). Failure to send the seed is non-fatal (debug-logged, topic still works).	2026-05-01 15:21:56 +05:30
UgwujaGeorge	b7ad3f478f	fix(yuanbao): enforce owner identity check on group slash commands The bot-owner identity check inside OwnerCommandMiddleware was commented out and replaced with a hardcoded `is_owner = True`, so any group member could trigger allowlisted privileged commands (/approve, /deny, /stop, /reset, /retry, /undo, /new, /background, /bg, /btw, /queue, /q) by sending the slash command without @-mentioning the bot. The most severe case is /approve: a non-owner could approve a dangerous tool call the bot was waiting on the owner to confirm. Re-enable the documented identity check (push.from_account == push.bot_owner_id) so only the configured owner can issue these commands.	2026-04-30 23:57:55 -07:00
Mikey O'Brien	1be3b74cfb	fix(gateway): honor MATRIX_HOME_ROOM in onboarding	2026-04-30 23:13:34 -07:00
Teknium	265bd59c1d	feat: /goal — persistent cross-turn goals (Ralph loop) (#18262 ) Add a standing-goal slash command that keeps Hermes working toward a user-stated objective across turns until it is achieved, paused, or the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI 0.128.0's /goal. After each turn, a lightweight auxiliary-model judge call asks 'is this goal satisfied by the assistant's last response?'. If not, and we're under the turn budget (default 20), Hermes feeds a continuation prompt back into the same session as a normal user message. Any real user message preempts the continuation loop automatically. Judge failures fail OPEN (continue) so a flaky judge never wedges progress — the turn budget is the real backstop. ### Commands - `/goal <text>` — set a standing goal (kicks off the first turn) - `/goal` or `/goal status` — show current state - `/goal pause` — pause the continuation loop - `/goal resume` — resume (resets turn counter) - `/goal clear` — drop the goal Works on both CLI and gateway platforms via the central CommandDef registry. ### Design invariants preserved - Prompt cache: continuation prompts are regular user-role messages appended to history. No system-prompt mutation, no toolset swap. - Role alternation: continuation is a user turn, never injected mid-tool-loop. - Session persistence: goal state lives in SessionDB.state_meta keyed by `goal:<session_id>`, so `/resume` picks it up. - Mid-run safety: on the gateway, `/goal status\|pause\|clear` are allowed mid-run (control-plane only); setting a new goal requires `/stop` first so we don't race a second continuation prompt against the current turn. ### Files - `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state - `hermes_cli/commands.py` — CommandDef entry - `hermes_cli/config.py` — `goals.max_turns` default - `hermes_cli/web_server.py` — dashboard category merge - `cli.py` — /goal handler + post-turn continuation hook in process_loop - `gateway/run.py` — /goal handler + post-turn continuation hook wrapping _handle_message_with_agent - `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing, fail-open semantics, lifecycle, persistence, budget exhaustion - `website/docs/reference/slash-commands.md` — docs entry	2026-04-30 23:10:20 -07:00
Teknium	4caad285a6	feat(gateway): auto-delete slash-command system notices after TTL (#18266 ) Adds opt-in auto-deletion for slash-command reply messages like "New session started!", "Restarting gateway…", "Stopped.", and YOLO toggles. After the TTL elapses the gateway calls the adapter's delete_message; on platforms without a delete API (everything except Telegram today) the TTL is silently ignored and the message stays. Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful real-time, but system notices clutter the thread once the agent finishes. Implementation: - EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses str so existing 'X' in response / response.startswith(...) checks in tests and call sites keep working unchanged; isinstance() still distinguishes it for the send path. - _process_message_background and both busy-session bypass paths (in base.py) call _unwrap_ephemeral() on the handler return, send the unwrapped text, and schedule a detached delete task when the TTL > 0 AND the adapter class overrides delete_message. - display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG. Handler can pass ttl_seconds explicitly to override. - Wrapped the highest-noise return sites: /new, /reset, /stop, /yolo on/off, /restart success + "already in progress". Draining notices and /help output left as plain strings — those are informational and users want to read them. Backward-compat: default TTL 0 → no scheduling, no behavior change for existing users. Platforms without delete_message silently no-op.	2026-04-30 23:05:48 -07:00
Teknium	f0dc919f92	fix(compression): include system prompt + tool schemas in token estimates (#18265 ) The user-visible /compress banner and the post-compression last_prompt_tokens writeback both counted only the raw message transcript (chars/4). With a 15KB system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of request pressure — a 234x gap. Two user-facing consequences: - Banner shows 'Compressing … (~45 tokens)…' while compression is actually firing on 10K+ tokens of real pressure, confusing users about why compression triggered (reported by @codecovenant on X; #6217). - Post-compression last_prompt_tokens writeback omits tool schemas, so the next should_compress() check compares real usage against a stale underestimate — compression triggers late, potentially past the model's context limit on small-context models (#14695). Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough() at every user-visible banner and at the post-compression writeback. estimate_request_tokens_rough() already existed for exactly this purpose and includes system prompt + tool schemas. Touched call sites: - run_agent.py: post-compression last_prompt_tokens writeback, post-tool call should_compress() fallback when provider usage is missing - cli.py: /compress banner + summary - gateway/run.py: gateway /compress banner + summary - tui_gateway/server.py: TUI /compress status + summary - acp_adapter/server.py: ACP /compact before/after Left intentionally alone: - Session-hygiene fallback and the 'no agent' /status path in gateway/run.py — no agent instance is in scope to query for system prompt/tools, and the existing 30-50% overestimate wobble on hygiene is safety-accepted. - Verbose-mode 'Request size' logging — informational only, already counts system prompt via api_messages[0]. Also relabels the feedback line from 'Rough transcript estimate' to 'Approx request size' so the metric label matches what it actually measures. Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217); user report @codecovenant on X (2026-04-30). Closes #14695 Closes #6217	2026-04-30 23:03:54 -07:00
Oxidane-bot	8d7500d80d	fix(gateway): snapshot callback generation after agent binds it, not before _process_message_background snapshotted callback_generation from the interrupt event at the TOP of the task — before the handler ran. _hermes_run_generation is only set on the event by GatewayRunner._bind_adapter_run_generation during _handle_message_with_agent, which runs DURING the handler await. The early snapshot always captured None, which then flowed into pop_post_delivery_callback(..., generation=None) in the finally block. In pop_post_delivery_callback, generation=None with a tuple-registered entry (generation, callback) bypasses the ownership check — it pops and fires the callback regardless of which run owns it. Result: a stale run could fire a fresher run's post-delivery callback (e.g. a background-review notification attributed to the wrong turn). Fix: move the snapshot into the finally block, after the handler has run and _hermes_run_generation has been bound to the current run. Regression test added: simulates a stale handler at generation=1 and a fresher callback registered at generation=2. Pre-fix: snapshot=None → pop fires the generation=2 callback under generation=1's ownership ("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched entry, callback stays in the dict for the correct run to claim. Verified: test FAILS on current main (captures "newer" in fired list), PASSES with this fix. Salvaged from PR #12565 (the callback-ownership portion only; the /status totals portion was already fixed on main in `7abc9ce4d` via #17158). Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>	2026-04-30 20:41:18 -07:00
Teknium	27ec74c68a	fix: coerce show_reasoning and guard_agent_created config bools Widens #16528 to two sibling sites that had the same quoted-boolean bug: a YAML string "false" (or "0", "no", "off") silently evaluated truthy under bool() / if-check. - gateway/run.py _load_show_reasoning: is_truthy_value wrap - tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap - regression tests for both	2026-04-30 20:40:46 -07:00
johnncenae	bb706c3f38	fix(gateway): coerce tool_progress_command as a real boolean	2026-04-30 20:40:46 -07:00
simbam99	7ba1a2b3df	fix(gateway): preserve assistant metadata when branching sessions	2026-04-30 20:40:28 -07:00
simbam99	ccfe6a47c3	fix(gateway): coerce StreamingConfig booleans and malformed numerics safely	2026-04-30 20:37:49 -07:00
hharry11	158eb32686	fix(gateway): preserve document type when merging queued events	2026-04-30 20:37:27 -07:00
Roy-oss1	b94cb8e2c4	feat(feishu): operator-configurable bot admission and mention policy Add two operator-facing toggles for inbound Feishu admission, enabling bot-to-bot scenarios such as A2A orchestration and inter-bot notifications: FEISHU_ALLOW_BOTS=none\|mentions\|all (default: none) Accept messages from other bots. `mentions` requires the peer bot to @-mention Hermes; `all` admits every peer-bot message. FEISHU_REQUIRE_MENTION=true\|false (default: true) Whether group messages must @-mention the bot. Override per-chat via `group_rules.<chat_id>.require_mention` in config.yaml. Defaults preserve prior behavior. Self-echo protection is always on: when the bot's identity is unresolved (auto-detection failed and FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed to avoid feedback loops. Admitted peer bots bypass the human-user allowlist (FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans still need an explicit allowlist entry. yaml feishu.allow_bots is bridged to the env var so the adapter and gateway auth layer share one source of truth. Resolving peer-bot display names requires the application:bot.basic_info:read scope; without it, peers still route but appear as their open_id. Test: tests/gateway/test_feishu_bot_admission.py covers the admission pipeline, group-policy bot-bypass, hydration, and event-dispatch plumbing as a parametrized matrix. Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256	2026-04-30 20:30:31 -07:00
buray	fa9fd26acb	fix(gateway): re-inject topic-bound skill after /new or /reset reset_session() creates a fresh SessionEntry with created_at == updated_at, but get_or_create_session() bumps updated_at on the next inbound message, causing _is_new_session in _handle_message_with_agent to evaluate False. The topic/channel skill auto-load gate (group_topics, channel_skill_bindings) silently skips the first message after a manual reset. Add an is_fresh_reset flag on SessionEntry, set by reset_session() and consumed once by the message handler. Kept distinct from was_auto_reset because that flag also drives a 'session expired due to inactivity' user-facing notice and a context-note prepend — both wrong for an explicit /new or /reset. Persisted through to_dict/from_dict so the flag survives gateway restart between /reset and the next message. Fixes #6508 Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com> Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>	2026-04-30 20:29:19 -07:00
Jezza Hehn	7abc9ce4df	fix(gateway): read /status token totals from SessionDB (#17158 ) /status was reading session_entry.total_tokens from the in-memory SessionStore (gateway/session.py), which the agent never writes to — so the token count was always 0. The agent already persists token deltas to the SQLite SessionDB (run_agent.py:11497) for every platform with a session_id. Route /status through that single source of truth instead of duplicating token writes into a second store. Fix: - gateway/run.py: _handle_status_command now calls self._session_db.get_session(session_id) and sums the five token component columns (input/output/cache_read/cache_write/reasoning). Falls back to 0 when no SessionDB is configured or no row exists. - Two new regression tests covering the populated-row and missing-row paths. Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>	2026-04-30 20:28:50 -07:00
Teknium	a178081468	fix(gateway): use _session_key_for_source for native image buffer write Minor follow-up to the native-image-buffer isolation fix. The write site in _prepare_inbound_message_text was calling build_session_key directly, while every other call site in gateway/run.py uses the _session_key_for_source helper — which consults session_store._generate_session_key first and falls back to build_session_key. Keeping the write key and consume key on the same helper prevents key drift if the session store ever overrides the default keying behavior.	2026-04-30 20:26:35 -07:00
Yukipukii1	bdb7edd89e	fix(gateway): isolate pending native image paths by session	2026-04-30 20:26:35 -07:00
Teknium	9a75743496	fix(gateway): apply agent.disabled_toolsets in gateway message loop Widens the cherry-picked fix from @jatingodnani (#17343) to the gateway path. On main, user_config.agent.disabled_toolsets was only honored by _get_platform_tools' name-level subtraction — it did not catch tools pulled in implicitly by a composite toolset (browser includes web_search, hermes-* platforms include most tools). Changes: - gateway/run.py: resolve disabled_toolsets alongside enabled_toolsets and pass to AIAgent at both user-facing construction sites (normal message loop + single-turn cron-like path). Hygiene/compression agents (fixed enabled_toolsets=[memory]) are intentionally untouched. - gateway/run.py: add (agent, disabled_toolsets) to _CACHE_BUSTING_CONFIG_KEYS so editing the list in config.yaml invalidates the cached AIAgent on the next message. - cli.py: drop unused 'import platform' left over from PR #17343's import churn; restore 'import sys' used throughout the file. - model_tools.py: drop unused 'import os, sys' added by PR #17343; fix comment reference from #15291 (unrelated OAuth issue) to #17309. Co-authored-by: jatin godnani <godnanijatin@gmail.com>	2026-04-30 20:24:39 -07:00
Teknium	01cc701e54	docs + nit: busy_ack_enabled follow-ups - Move the disabled-ack guard above the debounce so we don't stamp _busy_ack_ts[session_key] when no ack was actually sent. Harmless (never read when disabled) but cosmetically off. - Document display.busy_ack_enabled in user-guide/messaging/index.md and HERMES_GATEWAY_BUSY_ACK_ENABLED in reference/environment-variables.md. - Add JezzaHehn to scripts/release.py AUTHOR_MAP for contributor credit. Follow-up to #17491 (Jezza Hehn).	2026-04-30 20:22:30 -07:00
Jezza Hehn	2b512cbca4	feat(gateway): add busy_ack_enabled config option to suppress ack messages When a user sends a message while the gateway is busy processing, an acknowledgment message is sent. This can be spammy for users who send rapid messages. Add display.busy_ack_enabled config option (default: true) to allow users to suppress these busy-input acknowledgment messages. Fixes #17457	2026-04-30 20:22:30 -07:00

1 2 3 4 5 ...

1391 commits