hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Teknium	62c14d5513	refactor(gateway): extract WhatsApp identity helpers into shared module Follow-up to the canonical-identity session-key fix: pull the JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py instead of living in two places. gateway/session.py (session-key build) and gateway/run.py (authorisation allowlist) now both import from the shared module, so the two resolution paths can't drift apart. Also switches the auth path from module-level _hermes_home (cached at import time) to dynamic get_hermes_home() lookup, which matches the session-key path and correctly reflects HERMES_HOME env overrides. The lone test that monkeypatched gateway.run._hermes_home for the WhatsApp auth path is updated to set HERMES_HOME env var instead; all other tests that monkeypatch _hermes_home for unrelated paths (update, restart drain, shutdown marker, etc.) still work — the module-level _hermes_home is untouched.	2026-04-24 07:55:55 -07:00
Keira Voss	10deb1b87d	fix(gateway): canonicalize WhatsApp identity in session keys Hermes' WhatsApp bridge routinely surfaces the same person under either a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid), and may flip between the two for a single human within the same conversation. Before this change, build_session_key used the raw identifier verbatim, so the bridge reshuffling an alias form produced two distinct session keys for the same person — in two places: 1. DM chat_id — a user's DM sessions split in half, transcripts and per-sender state diverge. 2. Group participant_id (with group_sessions_per_user enabled) — a member's per-user session inside a group splits in half for the same reason. Add a canonicalizer that walks the bridge's lid-mapping-*.json files and picks the shortest/numeric-preferred alias as the stable identity. build_session_key now routes both the DM chat_id and the group participant_id through this helper when the platform is WhatsApp. All other platforms and chat types are untouched. Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier as public helpers. Plugins that need per-sender behaviour (role-based routing, per-contact authorization, policy gating) need the same identity resolution Hermes uses internally; without a public helper, each plugin would have to re-implement the walker against the bridge's internal on-disk format. Keeping this alongside build_session_key makes it authoritative and one refactor away if the bridge ever changes shape. _expand_whatsapp_aliases stays private — it's an implementation detail of how the mapping files are walked, not a contract callers should depend on.	2026-04-24 07:55:55 -07:00
Blind Dev	591aa159aa	feat: allow Telegram chat allowlists for groups and forums (#15027 ) * feat: allow Telegram chat allowlists for groups and forums * chore: map web3blind noreply email for release attribution --------- Co-authored-by: web3blind <web3blind@users.noreply.github.com>	2026-04-24 07:23:14 -07:00
Stefan Dimitrov	260ae62134	Invoke session finalize hooks on expiry flush	2026-04-24 05:40:52 -07:00
Tranquil-Flow	ee83a710f0	fix(gateway,cron): activate fallback_model when primary provider auth fails When the primary provider raises AuthError (expired OAuth token, revoked API key), the error was re-raised before AIAgent was created, so fallback_model was never consulted. Now both gateway/run.py and cron/scheduler.py catch AuthError specifically and attempt to resolve credentials from the fallback_providers/fallback_model config chain before propagating the error. Closes #7230	2026-04-24 05:35:43 -07:00
Nicecsh	fe34741f32	fix(model): repair Discord Copilot /model flow Keep Discord Copilot model switching responsive and current by refreshing picker data from the live catalog when possible, correcting the curated fallback list, and clearing stale controls before the switch completes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 03:33:29 -07:00
Teknium	b2e124d082	refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047 ) * refactor(commands): drop /provider and clean up slash registry * refactor(commands): drop /plan special handler — use plain skill dispatch	2026-04-24 03:10:52 -07:00
Teknium	8a1e247c6c	fix(discord): honor wildcard '' in ignored_channels and free_response_channels Follow-up to the allowed_channels wildcard fix in the preceding commit. The same '' literal trap affected two other Discord channel config lists: - DISCORD_IGNORED_CHANNELS: '' was stored as the literal string in the ignored set, and the intersection check never matched real channel IDs, so '' was a no-op instead of silencing every channel. - DISCORD_FREE_RESPONSE_CHANNELS: same shape — '' never matched, so the bot still required a mention everywhere. Add a '' short-circuit to both checks, matching the allowed_channels semantics. Extend tests/gateway/test_discord_allowed_channels.py with regression coverage for all three lists. Refs: #14920	2026-04-24 03:04:42 -07:00
Mrunmayee Rane	8598746e86	fix(discord): honor wildcard '' in DISCORD_ALLOWED_CHANNELS allowed_channels: "" in config (or DISCORD_ALLOWED_CHANNELS="" env var) is meant to allow all channels, but the check was comparing numeric channel IDs against the literal string set {""} via set intersection — always empty, so every message was silently dropped. Add a "*" short-circuit before the set intersection, consistent with every other platform's allowlist handling (Signal, Slack, Telegram all do this). Fixes #14920	2026-04-24 03:04:42 -07:00
Keira Voss	1ef1e4c669	feat(plugins): add pre_gateway_dispatch hook Introduces a new plugin hook `pre_gateway_dispatch` fired once per incoming MessageEvent in `_handle_message`, after the internal-event guard but before the auth / pairing chain. Plugins may return a dict to influence flow: {"action": "skip", "reason": "..."} -> drop (no reply) {"action": "rewrite", "text": "..."} -> replace event.text {"action": "allow"} / None -> normal dispatch Motivation: gateway-level message-flow patterns that don't fit cleanly into any single adapter — e.g. listen-only group-chat windows (buffer ambient messages, collapse on @mention), or human-handover silent ingest (record messages while an owner handles the chat manually). Today these require forking core; with this hook they can live in a single profile-agnostic plugin. Hook runs BEFORE auth so plugins can handle unauthorized senders (e.g. customer-service handover ingest) without triggering the pairing-code flow. Exceptions in plugin callbacks are caught and logged; the first non-None action dict wins, remaining results are ignored. Includes: - `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py` - Invocation block in `gateway/run.py::_handle_message` - 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py` (skip, rewrite, allow, exception safety, internal-event bypass) - 2 additional tests in `tests/hermes_cli/test_plugins.py` - Table entry in `website/docs/user-guide/features/plugins.md` Made-with: Cursor	2026-04-24 03:02:03 -07:00
Teknium	a9a4416c7c	fix(compress): don't reach into ContextCompressor privates from /compress (#15039 ) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	2026-04-24 02:55:43 -07:00
whitehatjr1001	9d147f7fde	fix(gateway): enhance message handling during agent tasks with queue mode support	2026-04-23 15:12:42 -07:00
Teknium	b61ac8964b	fix(gateway/discord): read permission attrs from AppCommand, canonicalize contexts Follow-up to Magaav's safe sync policy. Two gaps in the canonicalizer caused false diffs or silent drift: 1. discord.py's AppCommand.to_dict() omits nsfw, dm_permission, and default_member_permissions — those live only on attributes. The canonicalizer was reading them via payload.get() and getting defaults (False/True/None), while the desired side from Command.to_dict(tree) had the real values. Any command using non-default permissions false-diffed on every startup. Pull them from the AppCommand attributes via _existing_command_to_payload(). 2. contexts and integration_types weren't canonicalized at all, so drift in either was silently ignored. Added both to _canonicalize_app_command_payload (sorted for stable compare). Also normalized default_member_permissions to str-or-None since the server emits strings but discord.py stores ints locally. Added regression tests for both gaps.	2026-04-23 15:11:56 -07:00
Magaav	a1ff6b45ea	fix(gateway/discord): add safe startup slash sync policy Replaces blind tree.sync() on every Discord reconnect with a diff-based reconcile. In safe mode (default), fetch existing global commands, compare desired vs existing payloads, skip unchanged, PATCH changed, recreate when non-patchable metadata differs, POST missing, and delete stale commands one-by-one. Keeps 'bulk' for legacy behavior and 'off' to skip startup sync entirely. Fixes restart-heavy workflows that burn Discord's command write budget and can surface 429s when iterating on native slash commands. Env var: DISCORD_COMMAND_SYNC_POLICY (safe\|bulk\|off), default 'safe'. Co-authored-by: Codex <codex@openai.invalid>	2026-04-23 15:11:56 -07:00
hharry11	d0821b0573	fix(gateway): only clear locks belonging to the replaced process	2026-04-23 15:07:06 -07:00
Teknium	97b9b3d6a6	fix(gateway): drain-aware hermes update + faster still-working pings (#14736 ) cmd_update no longer SIGKILLs in-flight agent runs, and users get 'still working' status every 3 min instead of 10. Two long-standing sources of '@user — agent gives up mid-task' reports on Telegram and other gateways. Drain-aware update: - New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid, drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid, 0) until the process exits or the budget expires. - cmd_update's systemd loop now reads MainPID via 'systemctl show --property=MainPID --value' and tries the graceful path first. The gateway's existing SIGUSR1 handler -> request_restart(via_service= True) -> drain -> exit(75) is wired in gateway/run.py and is respawned by systemd's Restart=on-failure (and the explicit RestartForceExitStatus=75 on newer units). - Falls back to 'systemctl restart' when MainPID is unknown, the drain budget elapses, or the unit doesn't respawn after exit (older units missing Restart=on-failure). Old install behavior preserved. - Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the drain loop in run_agent + final exit have room before fallback fires. Composes with #14728's tool-subprocess reaping. Notification interval: - agent.gateway_notify_interval default 600 -> 180. - HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py matched. - 9-minute weak-model spinning runs now ping at 3 min and 6 min instead of 27 seconds before completion, removing the 'is the bot dead?' reflex that drives gateway-restart cycles. Tests: - Two new tests in tests/hermes_cli/test_update_gateway_restart.py: one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called when MainPID is known and the helper succeeds; one asserts the fallback fires when the helper returns False. - E2E: spawned detached bash processes confirm the helper returns True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring processes (timeout). Verified non-existent PID and pid=0 edge cases. - 41/41 in test_update_gateway_restart.py (was 39, +2 new). - 154/154 in shutdown-related suites including #14728's new tests. Reported by @GeoffWellman and @ANT_1515 on X.	2026-04-23 14:01:57 -07:00
Teknium	327b57da91	fix(gateway): kill tool subprocesses before adapter disconnect on drain timeout (#14728 ) Closes #8202. Root cause: stop() reclaimed tool-call bash/sleep children only at the very end of the shutdown sequence — after a 60s drain, 5s interrupt grace, and per-adapter disconnect. Under systemd (TimeoutStopSec bounded by drain_timeout), that meant the cgroup SIGKILL escalation fired first, and systemd reaped the bash/sleep children instead of us. Fix: - Extract tool-subprocess cleanup into a local helper _kill_tool_subprocesses() in _stop_impl(). - Invoke it eagerly right after _interrupt_running_agents() on the drain-timeout path, before adapter disconnect. - Keep the existing catch-all call at the end for the graceful path and defense in depth against mid-teardown respawns. - Bump generated systemd unit TimeoutStopSec to drain_timeout + 30s so cleanup + disconnect + DB close has headroom above the drain budget, matching the 'subprocess timeout > TimeoutStopSec + margin' rule from the skill. Tests: - New: test_gateway_stop_kills_tool_subprocesses_before_adapter_disconnect_on_timeout asserts kill_all() runs before disconnect() when drain times out. - New: test_gateway_stop_kills_tool_subprocesses_on_graceful_path guards that the final catch-all still fires when drain succeeds (regression guard against accidental removal during refactor). - Updated: existing systemd unit generator tests expect TimeoutStopSec=90 (= 60s drain + 30s headroom) with explanatory comment.	2026-04-23 13:59:29 -07:00
Teknium	5651a73331	fix(gateway): guard-match the finally-block _active_sessions delete Before this, _process_message_background's finally did an unconditional 'del self._active_sessions[session_key]' — even if a /stop/ /new command had already swapped in its own command_guard via _dispatch_active_session_command and cancelled us. The old task's unwind would clobber the newer guard, opening a race for follow-ups. Replace with _release_session_guard(session_key, guard=interrupt_event) so the delete only fires when the guard we captured is still the one installed. The sibling _session_tasks pop already had equivalent ownership matching via asyncio.current_task() identity; this closes the asymmetry. Adds two direct regressions in test_session_split_brain_11016: - stale guard reference must not clobber a newer guard by identity - guard=None default still releases unconditionally (for callers that don't have a captured guard to match against) Refs #11016	2026-04-23 05:15:52 -07:00
etcircle	b7bdf32d4e	fix(gateway): guard session slot ownership after stop/reset Closes the runner-side half of the split-brain described in issue #11016 by wiring the existing _session_run_generation counter through the session-slot promotion and release paths. Without this, an older async run could still: - promote itself from sentinel to real agent after /stop or /new invalidated its run generation - clear _running_agents on the way out, deleting a newer run's slot Both races leave _running_agents desynced from what the user actually has in flight, which is half of what shows up as 'No active task to stop' followed by late 'Interrupting current task...' acks. Changes: - track_agent() in _run_agent now calls _is_session_run_current() before writing the real agent into _running_agents[session_key]; if /stop or /new bumped the generation while the agent was spinning up, the slot is left alone (the newer run owns it). - _release_running_agent_state() gained an optional run_generation keyword. When provided, it only clears the slot if the generation is still current. The final cleanup at the tail of _run_agent passes the run's generation so an old unwind can't blow away a newer run's state. - Returns bool so callers can tell when a release was blocked. All the existing call sites that do NOT pass run_generation behave exactly as before — this is a strict additive guard. Refs #11016	2026-04-23 05:15:52 -07:00
dyxushuai	d72985b7ce	fix(gateway): serialize reset command handoff and heal stale session locks Closes the adapter-side half of the split-brain described in issue #11016 where _active_sessions stays live but nothing is processing, trapping the chat in repeated 'Interrupting current task...' while /stop reports no active task. Changes on BasePlatformAdapter: - Add _session_tasks: Dict[str, asyncio.Task] mapping session -> owner task so session-terminating commands can cancel the right task and old task finally blocks can't clobber a newer task's guard. - Add _release_session_guard(guard=...) that only releases if the guard Event still matches, preventing races where /stop or /new swaps in a temporary guard while the old task unwinds. - Add _session_task_is_stale() and _heal_stale_session_lock() for on-entry self-heal: when handle_message() sees an _active_sessions entry whose RECORDED owner task is done/cancelled, clear it and fall through to normal dispatch. No owner task recorded = not stale (some tests install guards directly and shouldn't be auto-healed). - Add cancel_session_processing() as the explicit adapter-side cancel API so /stop/ /new/ /reset can cleanly tear down in-flight work. - Route /stop, /new, /reset through _dispatch_active_session_command(): 1. install a temporary command guard so follow-ups stay queued 2. let the runner process the command 3. cancel the old adapter task AFTER the runner response is ready 4. release the command guard and drain the latest pending follow-up - _start_session_processing() replaces the inline create_task + guard setup in handle_message() so guard + owner-task entry land atomically. - cancel_background_tasks() also clears _session_tasks. Combined, this means: - /stop / /new / /reset actually cancel stuck work instead of leaving adapter state desynced from runner state. - A dead session lock self-heals on the next inbound message rather than persisting until gateway restart. - Follow-up messages after /new are processed in order, after the reset command's runner response lands. Refs #11016	2026-04-23 05:15:52 -07:00
David VV	39fcf1d127	fix(model_switch): group custom_providers by endpoint in /model picker (#9210 ) Multiple custom_providers entries sharing the same base_url + api_key are now grouped into a single picker row. A local Ollama host with per-model display names ("Ollama — GLM 5.1", "Ollama — Qwen3-coder", "Ollama — Kimi K2", "Ollama — MiniMax M2.7") previously produced four near-duplicate picker rows that differed only by suffix; now it appears as one "Ollama" row with four models. Key changes: - Grouping key changed from slug-by-name to (base_url, api_key). Names frequently differ per model while the endpoint stays the same. - When the grouped endpoint matches current_base_url, the row's slug is set to current_provider so picker-driven switches route through the live credential pipeline (no re-resolution needed). - Per-model suffix is stripped from the display name ("Ollama — X" → "Ollama") via em-dash / " - " separators. - Two groups with different api_keys at the same base_url (or otherwise colliding on cleaned name) are disambiguated with a numeric suffix (custom:openai, custom:openai-2) so both stay visible. - current_base_url parameter plumbed through both gateway call sites. Existing #8216, #11499, #13509 regressions covered (dict/list shapes of models:, section-3/section-4 dedup, normalized list-format entries). Salvaged from @davidvv's PR #9210 — the underlying code had diverged ~1400 commits since that PR was opened, so this is a reconstruction of the same approach on current main rather than a clean cherry-pick. Authorship preserved via --author on this commit. Closes #9210	2026-04-23 03:10:30 -07:00
phpoh	4c02e4597e	fix(status): catch OSError in os.kill(pid, 0) for Windows compatibility On Windows, os.kill(nonexistent_pid, 0) raises OSError with WinError 87 ("The parameter is incorrect") instead of ProcessLookupError. Without catching OSError, the acquire_scoped_lock() and get_running_pid() paths crash on any invalid PID check — preventing gateway startup on Windows whenever a stale PID file survives from a prior run. Adapted @phpoh's fix in #12490 onto current main. The main file was refactored in the interim (get_running_pid now iterates over (primary_record, fallback_record) with a per-iteration try/except), so the OSError catch is added as a new except clause after PermissionError (which is a subclass of OSError, so order matters: PermissionError must match first). Co-authored-by: phpoh <1352808998@qq.com>	2026-04-23 03:05:49 -07:00
Lind3ey	9dba75bc38	fix(feishu): issue where streaming edits in Feishu show extra leading newlines	2026-04-23 03:02:09 -07:00
roytian1217	8b1ff55f53	fix(wecom): strip @mention prefix in group chats for slash command recognition In WeCom group chats, messages sent as "@BotName /command" arrive with the @mention prefix intact. This causes is_command() to return False since the text does not start with "/". Strip the leading @mention in group messages before creating the MessageEvent, mirroring the existing behavior in the Telegram adapter.	2026-04-23 02:00:56 -07:00
Nan93	2f48c58b85	fix: normalize iOS unicode dashes in slash command args iOS auto-corrects -- to — (em dash) and - to – (en dash), causing commands like /model glm-4.7 —provider zai to fail with 'Model names cannot contain spaces'. Normalize at get_command_args().	2026-04-22 21:30:32 -07:00
fengtianyu88	ec7e92082d	fix(qqbot): add backoff upper-bound check for QQCloseError reconnect path The QQCloseError (non-4008) reconnect path in _listen_loop was missing the MAX_RECONNECT_ATTEMPTS upper-bound check that exists in both the Exception handler (line 546) and the 4008 rate-limit handler (line 486). Without this check, if _reconnect() fails permanently for any non-4008 close code, backoff_idx grows indefinitely and the bot retries forever at 60-second intervals instead of giving up cleanly. Fix: add the same guard after backoff_idx += 1 in the general QQCloseError branch, consistent with the existing Exception path.	2026-04-22 21:16:16 -07:00
fuleinist	e371af1df2	Add config option to disable Discord slash commands Add discord.slash_commands config option (default: true) to allow users to disable Discord slash command registration when running alongside other bots that use the same command names. When set to false in config.yaml: discord: slash_commands: false The _register_slash_commands() call is skipped while text-based parsing of /commands continues to work normally. Fixes #4881	2026-04-22 20:03:39 -07:00
huangke	6209e85e7d	feat: support document/archive extensions in MEDIA: tag extraction Add epub, pdf, zip, rar, 7z, docx, xlsx, pptx, txt, csv, apk, ipa to the MEDIA: path regex in extract_media(). These file types were already routed to send_document() in the delivery loop (base.py:1705), but the extraction regex only matched media extensions (audio/video/image), causing document paths to fall through to the generic \S+ branch which could fail silently in some cases. This explicit list ensures reliable matching and delivery for all common document formats.	2026-04-22 19:59:11 -07:00
Teknium	36730b90c4	fix(gateway): also clear session-scoped approval state on /new Follow-up to the /resume and /branch cleanup in the previous commit: /new is a conversation-boundary operation too, so session-scoped dangerous-command approvals and /yolo state must not survive it. Adds a scoped unit test for _clear_session_boundary_security_state that also covers the /new path (which calls the same helper).	2026-04-22 18:26:59 -07:00
Es1la	050aabe2d4	fix(gateway): reset approval and yolo state on session boundary	2026-04-22 18:26:59 -07:00
Teknium	9bd1518425	fix(feishu): correct identity model docs and prefer tenant-scoped user_id Feishu's open_id is app-scoped (same user gets different open_ids per bot app), not a canonical identity. Functionally correct for single-bot mode but semantically misleading. - Add comprehensive Feishu identity model documentation to module docstring - Prefer user_id (tenant-scoped) over open_id (app-scoped) in _resolve_sender_profile when both are available - Document bot_open_id usage for @mention matching - Update user_id_alt comment in SessionSource to be platform-generic Ref: closes analysis from PR #8388 (closed as over-scoped)	2026-04-22 18:06:22 -07:00
Xiping Hu	c0df4a0a7f	fix(email): accept **kwargs in send_document to handle metadata param	2026-04-22 17:34:05 -07:00
Evan	e67eb7ff4b	fix(gateway): add hermes-gateway script pattern to PID detection The _looks_like_gateway_process function was missing the hermes-gateway script pattern, causing dashboard to report gateway as not running even when the process was active. Patterns now cover all entry points: - hermes_cli.main gateway - hermes_cli/main.py gateway - hermes gateway - hermes-gateway (new) - gateway/run.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-22 17:24:15 -07:00
Teknium	402d048eb6	fix(gateway): also unlink stale PID + lock files on cleanup Follow-up for salvaged PR #14179. `_cleanup_invalid_pid_path` previously called `remove_pid_file()` for the default PID path, but that helper defensively refuses to delete a PID file whose pid field differs from `os.getpid()` (to protect --replace handoffs). Every realistic stale-PID scenario is exactly that case: a crashed/Ctrl+C'd gateway left behind a PID file owned by a now-dead foreign PID. Once `get_running_pid()` has confirmed the runtime lock is inactive, the on-disk metadata is known to belong to a dead process, so we can force-unlink both the PID file and the sibling `gateway.lock` directly instead of going through the defensive helper. Also adds a regression test with a dead foreign PID that would have failed against the previous cleanup logic.	2026-04-22 16:33:46 -07:00
helix4u	b52123eb15	fix(gateway): recover stale pid and planned restart state	2026-04-22 16:33:46 -07:00
Teknium	51ca575994	feat(gateway): expose plugin slash commands natively on all platforms + decision-capable command hook Plugin slash commands now surface as first-class commands in every gateway enumerator — Discord native slash picker, Telegram BotCommand menu, Slack /hermes subcommand map — without a separate per-platform plugin API. The existing 'command:<name>' gateway hook gains a decision protocol via HookRegistry.emit_collect(): handlers that return a dict with {'decision': 'deny'\|'handled'\|'rewrite'\|'allow'} can intercept slash command dispatch before core handling runs, unifying what would otherwise have been a parallel 'pre_gateway_command' hook surface. Changes: - gateway/hooks.py: add HookRegistry.emit_collect() that fires the same handler set as emit() but collects non-None return values. Backward compatible — fire-and-forget telemetry hooks still work via emit(). - hermes_cli/plugins.py: add optional 'args_hint' param to register_command() so plugins can opt into argument-aware native UI registration (Discord arg picker, future platforms). - hermes_cli/commands.py: add _iter_plugin_command_entries() helper and merge plugin commands into telegram_bot_commands() and slack_subcommand_map(). New is_gateway_known_command() recognizes both built-in and plugin commands so the gateway hook fires for either. - gateway/platforms/discord.py: extract _build_auto_slash_command helper from the COMMAND_REGISTRY auto-register loop and reuse it for plugin-registered commands. Built-in name conflicts are skipped. - gateway/run.py: before normal slash dispatch, call emit_collect on command:<canonical> and honor deny/handled/rewrite/allow decisions. Hook now fires for plugin commands too. - scripts/release.py: AUTHOR_MAP entry for @Magaav. - Tests: emit_collect semantics, plugin command surfacing per platform, decision protocol (deny/handled/rewrite/allow + non-dict tolerance), Discord plugin auto-registration + conflict skipping, is_gateway_known_command. Salvaged from #14131 (@Magaav). Original PR added a parallel 'pre_gateway_command' hook and a platform-keyed plugin command registry; this re-implementation reuses the existing 'command:<name>' hook and treats plugin commands as platform-agnostic so the same capability reaches Telegram and Slack without new API surface. Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>	2026-04-22 16:23:21 -07:00
Roy-oss1	e86acad8f1	feat(feishu): preserve @mention context on inbound messages Resolve Feishu @_user_N / @_all placeholders into display names plus a structured [Mentioned: Name (open_id=...), ...] hint so agents can both reason about who was mentioned and call Feishu OpenAPI tools with stable open_ids. Strip bot self-mentions only at message edges (leading unconditionally, trailing only before whitespace/terminal punctuation) so commands parse cleanly while mid-text references are preserved. Covers both plain-text and rich-post payloads. Also fixes a pre-existing hydration bug: Client.request no longer accepts the 'method' kwarg on lark-oapi 1.5.3, so bot identity silently failed to hydrate and self-filtering never worked. Migrate to the BaseRequest.builder() pattern and accept the 'app_name' field the API actually returns. Tighten identity matching precedence so open_id is authoritative when present on both sides.	2026-04-22 14:44:07 -07:00
Junass1	61d0a99c11	fix(debug): sweep expired pending pastes on slash debug paths	2026-04-22 11:59:39 -07:00
kshitijk4poor	1f216ecbb4	feat(gateway/slack): add SLACK_REACTIONS env toggle for reaction lifecycle Adds _reactions_enabled() gating to match Discord (DISCORD_REACTIONS) and Telegram (TELEGRAM_REACTIONS) pattern. Defaults to true to preserve existing behavior. Gates at three levels: - _handle_slack_message: skips _reacting_message_ids registration - on_processing_start: early return - on_processing_complete: early return Also adds config.yaml bridge (slack.reactions) and two new tests.	2026-04-22 08:49:24 -07:00
Roopak Nijhara	70a33708e7	fix(gateway/slack): align reaction lifecycle with Discord/Telegram pattern Slack reactions were placed around handle_message(), which returns immediately after spawning a background task. This caused the 👀 → ✅ swap to happen before any real work began. Fix: implement on_processing_start / on_processing_complete callbacks (matching Discord/Telegram) so reactions bracket actual _message_handler work driven by the base class. Also fixes missing stop_typing() for Slack's assistant thread status indicator, which left 'is thinking...' stuck in the UI after processing completed. - Add _reacting_message_ids set for DM/@mention-only gating - Add _active_status_threads dict for stop_typing lookup - Update test_reactions_in_message_flow for new callback pattern - Add test_reactions_failure_outcome and test_reactions_skipped_for_non_dm_non_mention	2026-04-22 08:49:24 -07:00
WideLee	cf55c738e7	refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py - Replace async create_bind_task/poll_bind_result with synchronous httpx.Client equivalents, eliminating manual event loop management - Move _render_qr and full qr_register() entry-point into onboard.py, mirroring the Feishu onboarding pattern - Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines); call site becomes a single qr_register() import - Fix potential segfault: previous code called loop.close() in the EXPIRED branch and again in the finally block (double-close crashed under uvloop)	2026-04-22 05:50:21 -07:00
Abner	b66644f0ec	feat(hindsight): richer session-scoped retain metadata - Add configurable retain_tags / retain_source / retain_user_prefix / retain_assistant_prefix knobs for native Hindsight. - Thread gateway session identity (user_name, chat_id, chat_name, chat_type, thread_id) through AIAgent and MemoryManager into MemoryProvider.initialize kwargs so providers can scope and tag retained memories. - Hindsight attaches the new identity fields as retain metadata, merges per-call tool tags with configured default tags, and uses the configurable transcript labels for auto-retained turns. Co-authored-by: Abner <abner.the.foreman@agentmail.to>	2026-04-22 05:27:10 -07:00
Teknium	b8663813b6	feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861 ) * feat(state): auto-prune old sessions + VACUUM state.db at startup state.db accumulates every session, message, and FTS5 index entry forever. A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM brought it to 43MB. The prune command and VACUUM are not wired to run automatically anywhere — sessions grew unbounded until users noticed. Changes: - hermes_state.py: new state_meta key/value table, vacuum() method, and maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in state_meta so it only actually executes once per min_interval_hours across all Hermes processes for a given HERMES_HOME. Never raises. - hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG (auto_prune=True, retention_days=90, vacuum_after_prune=True, min_interval_hours=24). Added to _KNOWN_ROOT_KEYS. - cli.py: call maintenance once at HermesCLI init (shared helper _run_state_db_auto_maintenance reads config and delegates to DB). - gateway/run.py: call maintenance once at GatewayRunner init. - Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section. Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays bloated forever. VACUUM only runs when the prune actually removed rows, so tight DBs don't pay the I/O cost. Tests: 10 new tests in tests/test_hermes_state.py covering state_meta, vacuum, idempotency, interval skipping, VACUUM-only-when-needed, corrupt-marker recovery. All 246 existing state/config/gateway tests still pass. Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG exposes the new block, load_config() returns it for fresh installs, first call prunes+vacuums, second call within min_interval_hours skips, and the state_meta marker persists across connection close/reopen. * sessions.auto_prune defaults to false (opt-in) Session history powers session_search recall across past conversations, so silently pruning on startup could surprise users. Ship the machinery disabled and let users opt in when they notice state.db is hurting performance. - DEFAULT_CONFIG.sessions.auto_prune: True → False - Call-site fallbacks in cli.py and gateway/run.py match the new default (so unmigrated configs still see off) - Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff	2026-04-22 05:21:49 -07:00
Teknium	b43524ecab	fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note Follow-ups on top of salvaged #13923 (@keifergu): - Print QR poll dot every 3s instead of every 18s so "Fetching configuration results..." doesn't look hung. - On "status=success but no bot_info" from the WeCom query endpoint, log the full payload at WARNING and tell the user we're falling back to manual entry (was previously a single opaque line). - Document in the qr_scan_for_bot_info() docstring that the work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI flow, not the public developer API, and may change without notice. Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so release notes attribute the feature correctly.	2026-04-22 05:15:32 -07:00
keifergu	8bcd77a9c2	feat(wecom): add QR scan flow and interactive setup wizard for bot credentials	2026-04-22 05:15:32 -07:00
helix4u	a7d78d3bfd	fix: preserve reasoning_content on Kimi replay	2026-04-22 04:31:59 -07:00
Teknium	2aa983e2f2	feat(gateway): recognize .pdf in MEDIA: tag extraction (#13683 ) PDFs emitted by tools (report generators, document exporters, etc.) now deliver as native attachments when wrapped in MEDIA: — same as images, audio, and video. Bare .pdf paths are intentionally NOT added to extract_local_files(), so the agent can still reference PDFs in text without auto-sending them.	2026-04-21 13:48:10 -07:00
Teknium	e889332c99	fix(gateway): always inject reply-to pointer, not just when quoted text is absent (#13676 ) The [Replying to: "..."] prefix is disambiguation, not deduplication. When a user explicitly replies to a prior message, the agent needs a pointer to which specific message they're referencing — even when the quoted text already exists somewhere in history. History can contain the same or similar text multiple times; without an explicit pointer the agent has to guess (or answer for both subjects), and the reply signal is silently dropped. Example: in a conversation comparing Japan and Italy, replying to the "Japan is great for culture..." message and asking "What's the best time to go?" — previously the found_in_history check suppressed the prefix because the quoted text was already in history, leaving the agent to guess which destination the user meant. Now the pointer is always present. Drops the found_in_history guard added in #1594. Token overhead is minimal (snippet capped at 500 chars on the new user turn; cached prefix unaffected). Behavior becomes deterministic: reply sent ⇒ pointer present. Thanks to smartyi for flagging this.	2026-04-21 13:33:02 -07:00
Teknium	16accd44bd	fix(telegram): require TELEGRAM_WEBHOOK_SECRET in webhook mode (#13527 ) When TELEGRAM_WEBHOOK_URL was set but TELEGRAM_WEBHOOK_SECRET was not, python-telegram-bot received secret_token=None and the webhook endpoint accepted any HTTP POST. Anyone who could reach the listener could inject forged updates — spoofed user IDs, spoofed chat IDs, attacker-controlled message text — and trigger handlers as if Telegram delivered them. The fix refuses to start the adapter in webhook mode without the secret. Polling mode (default, no webhook URL) is unaffected — polling is authenticated by the bot token directly. BREAKING CHANGE for webhook-mode deployments that never set TELEGRAM_WEBHOOK_SECRET. The error message explains remediation: export TELEGRAM_WEBHOOK_SECRET="$(openssl rand -hex 32)" and instructs registering it with Telegram via setWebhook's secret_token parameter. Release notes must call this out. Reported in GHSA-3vpc-7q5r-276h by @bupt-Yy-young. Hardening — not CVE per SECURITY.md §3 "Public Exposure: Deploying the gateway to the public internet without external authentication or network protection" covers the historical default, but shipping a fail-open webhook as the default was the wrong choice and the guard aligns us with the SECURITY.md threat model.	2026-04-21 06:23:09 -07:00
unlinearity	155b619867	fix(agent): normalize socks:// env proxies for httpx/anthropic WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed. Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them: - run_agent._get_proxy_from_env - agent.auxiliary_client._validate_proxy_env_urls - agent.anthropic_adapter.build_anthropic_client - gateway.platforms.base.resolve_proxy_url Regression coverage: - run_agent proxy env resolution - auxiliary proxy env normalization - gateway proxy URL resolution Verified with: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py 39 passed.	2026-04-21 05:52:46 -07:00

1 2 3 4 5 ...

1174 commits