hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

Author	SHA1	Message	Date
Kira	e1b0b135cb	fix(discord): accept .log attachments and raise document size limit	2026-04-09 02:26:33 -07:00
Teknium	d97f6cec7f	feat(gateway): add BlueBubbles iMessage platform adapter (#6437) Adds Apple iMessage as a gateway platform via BlueBubbles macOS server. Architecture: - Webhook-based inbound (event-driven, no polling/dedup needed) - Email/phone → chat GUID resolution for user-friendly addressing - Private API safety (checks helper_connected before tapback/typing) - Inbound attachment downloading (images, audio, documents cached locally) - Markdown stripping for clean iMessage delivery - Smart progress suppression for platforms without message editing Based on PR #5869 by @benjaminsehl (webhook architecture, GUID resolution, Private API safety, progress suppression) with inbound attachment downloading from PR #4588 by @1960697431 (attachment cache routing). Integration points: Platform enum, env config, adapter factory, auth maps, cron delivery, send_message routing, channel directory, platform hints, toolset definition, setup wizard, status display. 27 tests covering config, adapter, webhook parsing, GUID resolution, attachment download routing, toolset consistency, and prompt hints.	2026-04-08 23:54:03 -07:00
Teknium	241bd4fc7e	fix: add size cap to assistant thread metadata cache Prevents unbounded memory growth in _assistant_threads dict. Evicts oldest entries when exceeding _ASSISTANT_THREADS_MAX (5000), matching the pattern used by _mentioned_threads and _seen_messages.	2026-04-08 23:53:50 -07:00
helix4u	30a0fcaec8	fix(slack): handle assistant thread lifecycle events	2026-04-08 23:53:50 -07:00
Teknium	5449c01d26	fix: clean env vars in pairing regression test The test_non_internal_event_without_user_triggers_pairing test relied on no Discord auth env vars being set, but gateway/run.py loads dotenv at module level. In environments with DISCORD_ALLOW_ALL_USERS=True in .env, the auth check passed instead of triggering the pairing flow. Clear DISCORD_ALLOW_ALL_USERS, DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS, and GATEWAY_ALLOWED_USERS via monkeypatch to ensure test isolation.	2026-04-08 23:01:04 -07:00
xingkongliang	1d8d4f28ae	fix(gateway): prevent background process notifications from triggering false pairing requests When a background process with notify_on_complete=True finishes, the gateway injects a synthetic MessageEvent to notify the session. This event was constructed without user_id, causing _is_user_authorized() to reject it and — for DM-origin sessions — trigger the pairing flow, sending "Hi~ I don't recognize you yet!" with a pairing code to the chat owner. Add an `internal` flag to MessageEvent that bypasses authorization checks for system-generated synthetic events. Only the process watcher sets this flag; no external/adapter code path can produce it. Includes 4 regression tests covering the fix and the normal pairing path.	2026-04-08 23:01:04 -07:00
Helmi	092061711e	fix(gateway): add staged inactivity warning before timeout escalation Introduce gateway_timeout_warning (default 900s) as a pre-timeout alert layer. When inactivity reaches the warning threshold, a single notification is sent to the user offering to wait or reset. If inactivity continues to the gateway_timeout (default 1800s), the full timeout fires as before. This gives users a chance to intervene before work is lost on slow API providers without disabling the safety timeout entirely. Config: agent.gateway_timeout_warning in config.yaml, or HERMES_AGENT_TIMEOUT_WARNING env var (0 = disable warning).	2026-04-08 20:01:06 -07:00
Teknium	e26393ffc2	fix: Signal duplicate replies with streaming + per-platform tool_progress (#6348 ) Fixes #4647 — Signal replies duplicated when gateway streaming is enabled. Root cause: stream_consumer.py did not handle the case where send() returns success=True but no message_id (Signal behavior). Every stream delta produced a separate send() call (7+ messages instead of 2), plus the gateway sent another full duplicate since already_sent was never set. Changes: - stream_consumer.py: Add elif branch for success-without-message_id — enters fallback mode (sets already_sent, disables editing, sends only continuation) - signal.py send(): Extract timestamp from signal-cli RPC result as message_id so stream consumer follows normal edit→fallback path - signal.py: Add public stop_typing() delegating to _stop_typing_indicator() so base adapter's _keep_typing finally block can clean up typing tasks - gateway/run.py: Per-platform tool_progress_overrides (#6164) — lets users set e.g. signal: off while keeping telegram: all - hermes_cli/config.py: Add tool_progress_overrides to DEFAULT_CONFIG Refs: #4647, #6164	2026-04-08 17:39:45 -07:00
Teknium	7d26feb9a3	feat(discord): add DISCORD_REPLY_TO_MODE setting (#6333 ) Add configurable reply-reference behavior for Discord, matching the existing Telegram (TELEGRAM_REPLY_TO_MODE) and Mattermost (MATTERMOST_REPLY_MODE) implementations. Modes: - 'off': never reply-reference the original message - 'first': reply-reference on first chunk only (default, current behavior) - 'all': reply-reference on every chunk Set DISCORD_REPLY_TO_MODE=off in .env to disable reply-to messages. Changes: - gateway/config.py: parse DISCORD_REPLY_TO_MODE env var - gateway/platforms/discord.py: read reply_to_mode from config, respect it in send() — skip fetch_message entirely when 'off' - hermes_cli/config.py: add to OPTIONAL_ENV_VARS for hermes setup - 23 tests covering config, send behavior, env var override - docs: discord.md env var table + environment-variables.md reference Closes community request from Stuart on Discord.	2026-04-08 17:08:40 -07:00
Teknium	ab21fbfd89	fix: add gateway coverage for session boundary hooks, move test to tests/cli/ - Fire on_session_finalize and on_session_reset in gateway _handle_reset_command() - Fire on_session_finalize during gateway stop() for each active agent - Move CLI test from tests/ root to tests/cli/ (matches recent restructure) - Add 5 gateway tests covering reset hooks, ordering, shutdown, and error handling - Place on_session_reset after new session is guaranteed to exist (covers the get_or_create_session fallback path)	2026-04-08 04:27:34 -07:00
Teknium	30ea423ce8	fix: unify reasoning_effort to config.yaml only, remove HERMES_REASONING_EFFORT env var Gateway and cron had inconsistent reasoning_effort resolution: - CLI: config.yaml only (correct) - Gateway: config.yaml first, env var fallback - Cron: env var first, config.yaml fallback All three now read exclusively from agent.reasoning_effort in config.yaml. Removed HERMES_REASONING_EFFORT env var support entirely — .env is for secrets only, not behavioral config.	2026-04-08 03:36:44 -07:00
landy	383db35925	fix: improve streaming fallback after edit failures	2026-04-08 03:33:43 -07:00
Teknium	598c25d43e	feat(feishu): add interactive card approval buttons (#6043) Add button-based exec approval to the Feishu adapter, matching the existing Discord, Telegram, and Slack implementations. When the agent encounters a dangerous command, Feishu users now see an interactive card with four buttons instead of text instructions: - Allow Once (primary) - Allow Session - Always Allow - Deny (danger) Implementation: - send_exec_approval() sends an interactive card via the Feishu message API with buttons carrying hermes_action in their value dict - _handle_card_action_event() intercepts approval button clicks before routing them as synthetic commands, directly calling resolve_gateway_approval() to unblock the agent thread - _update_approval_card() replaces the orange approval card with a green (approved) or red (denied) status card showing who acted - _approval_state dict tracks pending approval_id → session_key mappings; cleaned up on resolution The gateway's existing routing in _approval_notify_sync already checks getattr(type(adapter), 'send_exec_approval', None) and will automatically use the button-based flow for Feishu. Tests: 16 new tests covering send, callback resolution, state management, card updates, and non-interference with existing card actions.	2026-04-07 22:45:14 -07:00
Teknium	50d1518df6	fix(tests): update tool_progress_callback test calls to new 4-arg signature Follow-up to sroecker's PR #5918 — test mocks were using the old 3-arg callback signature (name, preview, args) instead of the new (event_type, name, preview, args, **kwargs).	2026-04-07 17:56:01 -07:00
Teknium	efbe8d674a	docs: add Discord channel controls and Telegram reactions documentation - Discord: ignored_channels, no_thread_channels config reference + examples - Telegram: message reactions section with config, behavior notes - Environment variables reference updated for all new vars	2026-04-07 17:55:55 -07:00
Teknium	a6547f399f	test: add tests for Discord channel controls and Telegram reactions - 14 tests for ignored_channels, no_thread_channels, and config bridging - 17 tests for reaction enable/disable, API calls, error handling, and config	2026-04-07 17:55:55 -07:00
Teknium	469cd16fe0	fix(security): consolidated security hardening — SSRF, timing attack, tar traversal, credential leakage (#5944 ) Salvaged from PRs #5800 (memosr), #5806 (memosr), #5915 (Ruzzgar), #5928 (Awsh1). Changes: - Use hmac.compare_digest for API key comparison (timing attack prevention) - Apply provider env var blocklist to Docker containers (credential leakage) - Replace tar.extractall() with safe extraction in TerminalBench2 (CVE-2007-4559) - Add SSRF protection via is_safe_url to ALL platform adapters: base.py (cache_image_from_url, cache_audio_from_url), discord, slack, telegram, matrix, mattermost, feishu, wecom (Signal and WhatsApp protected via base.py helpers) - Update tests: mock is_safe_url in Mattermost download tests - Add security tests for tar extraction (traversal, symlinks, safe files)	2026-04-07 17:28:37 -07:00
Zainan Victor Zhou	0d41fb0827	fix(gateway): show full session id and title in /status	2026-04-07 17:27:09 -07:00
Jeff Escalante	4aef055805	fix(gateway/webhook): don't pop delivery_info on send The webhook adapter stored per-request `deliver`/`deliver_extra` config in `_delivery_info[chat_id]` during POST handling and consumed it via `.pop()` inside `send()`. That worked for routes whose agent run produced exactly one outbound message — the final response — but it broke whenever the agent emitted any interim status message before the final response. Status messages flow through the same `send(chat_id, ...)` path as the final response (see `gateway/run.py::_status_callback_sync` → `adapter.send(...)`). Common triggers include: - "🔄 Primary model failed — switching to fallback: ..." (run_agent.py::_emit_status when `fallback_providers` activates) - context-pressure / compression notices - any other lifecycle event routed through `status_callback` When any of those fired, the first `send()` call popped the entry, so the subsequent final-response `send()` saw an empty dict and silently downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to the default `"log"`. The agent's response was logged to the gateway log instead of being delivered to the configured cross-platform target — no warning, no error, just a missing message. This was easy to hit in practice. Any user with `fallback_providers` configured saw it the first time their primary provider hiccuped on a webhook-triggered run. Routes that worked perfectly in dev (where the primary stays healthy) silently dropped responses in prod. Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for the same `chat_id` all see the same delivery config. To keep the dict bounded without relying on per-send cleanup, add a parallel `_delivery_info_created` timestamp dict and a `_prune_delivery_info()` helper that drops entries older than `_idempotency_ttl` (1h, same window already used by `_seen_deliveries`). Pruning runs on each POST, mirroring the existing `_seen_deliveries` cleanup pattern. 
Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min = 1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller because most webhooks complete in seconds, not the full hour. Test changes: - `test_delivery_info_cleaned_after_send` is replaced with `test_delivery_info_survives_multiple_sends`, which is now the regression test for this bug — it asserts that two consecutive `send()` calls both see the delivery config. - A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup behavior. - The two integration tests that asserted `chat_id not in adapter._delivery_info` after `send()` now assert the opposite, with a comment explaining why. All 40 tests in `tests/gateway/test_webhook_adapter.py` and `tests/gateway/test_webhook_integration.py` pass. Verified end-to-end locally against a dynamic `hermes webhook subscribe` route configured with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as the primary (currently flaky) and `claude-opus-4.6` as the fallback, the fallback notification fires, the agent finishes, and the final response is delivered to Telegram as expected.	2026-04-07 17:27:09 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00
Teknium	99ff375f7a	fix(gateway): respect tool_preview_length in all/new progress modes (#5937) Previously, all/new tool progress modes always hard-truncated previews to 40 chars, ignoring the display.tool_preview_length config. This made it impossible for gateway users to see meaningful command/path info without switching to verbose mode (which shows too much detail). Now all/new modes read tool_preview_length from config: - tool_preview_length: 0 (default/unset) → 40 chars (no regression) - tool_preview_length: 120 → 120-char previews in all/new mode - verbose mode: unchanged (already respected the config) Users who want longer previews can set: display: tool_preview_length: 120 Reported by demontut_ on Discord.	2026-04-07 14:10:56 -07:00
Dilee	4a630c2071	fix(telegram): replace substring caption check with exact line-by-line match Captions in photo bursts and media group albums were silently dropped when a shorter caption happened to be a substring of an existing one (e.g. "Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption static helper that splits on "\n\n" and uses exact match with whitespace normalisation, then use it in both _enqueue_photo_event and _queue_media_group_event. Adds 13 unit tests covering the fixed bug scenarios. Cherry-picked from PR #2671 by Dilee.	2026-04-07 14:08:59 -07:00
Teknium	e49c8bbbbb	feat(slack): thread engagement — auto-respond in bot-started and mentioned threads (#5897 ) When the bot sends a message in a thread, track its ts in _bot_message_ts. When the bot is @mentioned in a thread, register it in _mentioned_threads. Both sets enable auto-responding to future messages in those threads without requiring repeated @mentions — making the bot behave like a team member that stays engaged once a conversation starts. Channel message gating now checks 4 signals (in order): 1. @mention in this message 2. Reply in a thread the bot started/participated in (_bot_message_ts) 3. Message in a thread where the bot was previously @mentioned (_mentioned_threads) 4. Existing session for this thread (_has_active_session_for_thread — survives restarts) Thread context fetching now triggers on ANY first-entry path (not just @mention), so the agent gets context whether it's entering via a mention, a bot-thread reply, or a mentioned-thread auto-trigger. Both tracking sets are bounded (5000 cap with prune-oldest-half) to prevent unbounded memory growth in long-running deployments. Salvaged from PR #5754 by @hhhonzik. Preserves our existing approval buttons, thread context fetching, and session key fix. Does NOT include the edit_message format_message() removal (that was a regression in the original PR). Tests: 4 new tests for bot-ts tracking and mentioned-thread bounds.	2026-04-07 11:12:08 -07:00
Teknium	1a2a03ca69	feat(gateway): approval buttons for Slack & Telegram + Slack thread context (#5890 ) Slack: - Add Block Kit interactive buttons for command approval (Allow Once, Allow Session, Always Allow, Deny) via send_exec_approval() - Register @app.action handlers for each approval button - Add _fetch_thread_context() — fetches thread history via conversations.replies when bot is first @mentioned mid-thread - Fix _has_active_session_for_thread() to use build_session_key() instead of manual key construction (fixes session key mismatch bug where thread_sessions_per_user flag was ignored, ref PR #5833) Telegram: - Add InlineKeyboard approval buttons via send_exec_approval() - Add ea:* callback handling in _handle_callback_query() - Uses monotonic counter + _approval_state dict to map button clicks back to session keys (avoids 64-byte callback_data limit) Both platforms now auto-detected by the gateway runner's _approval_notify_sync() — any adapter with send_exec_approval() on its class gets button-based approval instead of text fallback. Inspired by community PRs #3898 (LevSky22), #2953 (ygd58), #5833 (heathley). Implemented fresh on current main. Tests: 24 new tests covering button rendering, action handling, thread context fetching, session key fix, double-click prevention.	2026-04-07 11:03:14 -07:00
Teknium	caded0a5e7	fix: repair 57 failing CI tests across 14 files (#5823 ) * fix: repair 57 failing CI tests across 14 files Categories of fixes: Test isolation under xdist (-n auto): - test_hermes_logging: Strip ALL RotatingFileHandlers before each test to prevent handlers leaked from other xdist workers from polluting counts - test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal AuthError when another test leaks TERMINAL_ENV=modal - test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests - test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure model resolution works regardless of xdist worker state Matrix adapter tests (nio not installed in CI): - Add _make_fake_nio() helper with real response classes for isinstance() checks in production code - Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances - Wrap production method calls with patch.dict('sys.modules', {'nio': ...}) so import nio succeeds in method bodies - Use try/except instead of pytest.importorskip for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules) - test_matrix_voice: Skip entire file if nio is a mock, not just missing Stale test expectations: - test_cli_provider_resolution: _prompt_provider_choice now takes kwargs (default param added); mock getpass.getpass alongside input - test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input) - test_gemini_provider: Mock models.dev + OpenRouter API lookups to test hardcoded defaults without external API variance - test_code_execution: Add notify_on_complete to blocked terminal params - test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup' (new quick-setup path leads to _require_tty → sys.exit in CI) - test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR so _find_skill searches tmp_path, not real ~/.hermes/skills/ Missing attributes in object.__new__ test runners: - test_platform_reconnect: Add session_store to 
_make_runner() - test_session_race_guard: Add hooks, _running_agents_ts, session_store, delivery_router to _make_runner() Production bug fix (gateway/run.py): - Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately evicted by the stale-detection logic because sentinels have no get_activity_summary() method, causing _stale_idle=inf >= timeout. Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'. * fix: address remaining CI failures - test_setup_openclaw_migration: Also mock _offer_launch_chat (called at end of both quick and full setup paths) - test_code_execution: Move TERMINAL_ENV=local to module level to protect ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases, TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks - test_matrix: Use try/except for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules under xdist)	2026-04-07 09:58:45 -07:00
Teknium	8b861b77c1	refactor: remove browser_close tool — auto-cleanup handles it (#5792 ) * refactor: remove browser_close tool — auto-cleanup handles it The browser_close tool was called in only 9% of browser sessions (13/144 navigations across 66 sessions), always redundantly — cleanup_browser() already runs via _cleanup_task_resources() at conversation end, and the background inactivity reaper catches anything else. Removing it saves one tool schema slot in every browser-enabled API call. Also fixes a latent bug: cleanup_browser() now handles Camofox sessions too (previously only Browserbase). Camofox sessions were never auto-cleaned per-task because they live in a separate dict from _active_sessions. Files changed (13): - tools/browser_tool.py: remove function, schema, registry entry; add camofox cleanup to cleanup_browser() - toolsets.py, model_tools.py, prompt_builder.py, display.py, acp_adapter/tools.py: remove browser_close from all tool lists - tests/: remove browser_close test, update toolset assertion - docs/skills: remove all browser_close references * fix: repeat browser_scroll 5x per call for meaningful page movement Most backends scroll ~100px per call — barely visible on a typical viewport. Repeating 5x gives ~500px (~half a viewport), making each scroll tool call actually useful. Backend-agnostic approach: works across all 7+ browser backends without needing to configure each one's scroll amount individually. Breaks early on error for the agent-browser path. * feat: auto-return compact snapshot from browser_navigate Every browser session starts with navigate → snapshot. Now navigate returns the compact accessibility tree snapshot inline, saving one tool call per browser task. The snapshot captures the full page DOM (not viewport-limited), so scroll position doesn't affect it. browser_snapshot remains available for refreshing after interactions or getting full=true content. Both Browserbase and Camofox paths auto-snapshot. 
If the snapshot fails for any reason, navigation still succeeds — the snapshot is a bonus, not a requirement. Schema descriptions updated to guide models: navigate mentions it returns a snapshot, snapshot mentions it's for refresh/full content. * refactor: slim cronjob tool schema — consolidate model/provider, drop unused params Session data (151 calls across 67 sessions) showed several schema properties were never used by models. Consolidated and cleaned up: Removed from schema (still work via backend/CLI): - skill (singular): use skills array instead - reason: pause-only, unnecessary - include_disabled: now defaults to true - base_url: extreme edge case, zero usage - provider (standalone): merged into model object Consolidated: - model + provider → single 'model' object with {model, provider} fields. If provider is omitted, the current main provider is pinned at creation time so the job stays stable even if the user changes their default. Kept: - script: useful data collection feature - skills array: standard interface for skill loading Schema shrinks from 14 to 10 properties. All backend functionality preserved — the Python function signature and handler lambda still accept every parameter. * fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli, hermes-messaging, safe), which meant it appeared in every session for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS gate only works after running 'hermes tools' explicitly. Now MoA only appears when a user explicitly enables it via 'hermes tools'. The moa toolset definition and check_fn remain unchanged — it just needs to be opted into.	2026-04-07 03:28:44 -07:00
Teknium	eb7c408445	fix(gateway): /stop and /new bypass Level 1 active-session guard (#5765 ) * fix(gateway): /stop and /new bypass Level 1 active-session guard The base adapter's Level 1 guard intercepted ALL messages while an agent was running, including /stop and /new. These commands were queued as pending messages instead of being dispatched to the gateway runner's Level 2 handler. When the agent eventually stopped (via the interrupt mechanism), the command text leaked into the conversation as a user message — the model would receive '/stop' as input and respond to it. Fix: Add /stop, /new, and /reset to the bypass set in base.py alongside /approve, /deny, and /status. Consolidate the three separate bypass blocks into one. Commands in the bypass set are dispatched inline to the gateway runner, where Level 2 handles them correctly (hard-kill for /stop, session reset for /new). Also add a safety net in _run_agent's pending-message processing: if the pending text resolves to a known slash command, discard it instead of passing it to the agent. This catches edge cases where command text leaks through the interrupt_message fallback. Refs: #5244 * test: regression tests for command bypass of active-session guard 17 tests covering: - /stop, /new, /reset bypass the Level 1 guard when agent is running - /approve, /deny, /status bypass (existing behavior, now tested) - Regular text and unknown commands still queued (not bypassed) - File paths like '/path/to/file' not treated as commands - Telegram @botname suffix handled correctly - Safety net command resolution (resolve_command detects known commands)	2026-04-07 00:53:45 -07:00
Teknium	8dee82ea1e	fix: stream consumer creates new message after tool boundaries (#5739 ) When streaming was enabled on the gateway, the stream consumer created a single message at the start and kept editing it as tokens arrived. Tool progress messages were sent as separate messages below it. Since edits don't change message position on Telegram/Matrix/Discord, the final response ended up stuck above all tool progress messages — users had to scroll up past potentially dozens of tool call lines to read the answer. The agent already sends stream_delta_callback(None) at tool boundaries (before _execute_tool_calls). The stream consumer was ignoring this signal. Now it treats None as a segment break: finalizes the current message (removes cursor), resets _message_id, and the next text chunk creates a fresh message below the tool progress messages. Timeline before: [msg 1: 'Let me search...' → edits → 'Here is the answer'] ← top [msg 2: tool progress lines] ← bottom Timeline after: [msg 1: 'Let me search...'] ← top [msg 2: tool progress lines] [msg 3: 'Here is the answer'] ← bottom (visible) Reported by SkyLinx on Discord.	2026-04-06 23:00:14 -07:00
eizus	4ec615b0c2	feat(gateway): Enable Slack thread replies without explicit @mentions When a user replies in a Slack thread where the bot has an active conversation session, the bot now processes the message even without an explicit @mention. This improves UX for ongoing threaded discussions. Changes: - Added set_session_store() to BasePlatformAdapter for adapters to check active sessions - Modified SlackAdapter to detect thread replies and check if a session exists for that thread before requiring @mentions - Updated GatewayRunner to inject the session store into adapters - Added comprehensive tests for the new behavior Fixes: Thread replies without @jarvis are now processed if there is an active session, matching user expectations for conversation flow	2026-04-06 21:27:16 -07:00
jtuki	57abc99315	feat(gateway): add per-group access control for Feishu Add fine-grained authorization policies per Feishu group chat via platforms.feishu.extra configuration. - Add global bot-level admins that bypass all group restrictions - Add per-group policies: open, allowlist, blacklist, admin_only, disabled - Add default_group_policy fallback for chats without explicit rules - Thread chat_id through group message gate for per-chat rule selection - Match both open_id and user_id for backward compatibility - Preserve existing FEISHU_ALLOWED_USERS / FEISHU_GROUP_POLICY behavior - Add focused regression tests for all policy modes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	18727ca9aa	refactor(gateway): simplify Feishu websocket config helpers Consolidate coercion functions, extract loop readiness check, and deduplicate test mock setup to improve maintainability without changing behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	157d6184e3	fix(gateway): make Feishu websocket overrides effective at runtime Reapply local reconnect and ping settings after the Feishu SDK refreshes its client config so user-provided websocket tuning actually takes effect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	ea31d9077c	feat(gateway): add Feishu websocket ping timing overrides Allow Feishu websocket keepalive timing to be configured via platform extra config so disconnects can be detected faster in unstable networks. New optional extra settings: - ws_ping_interval - ws_ping_timeout These values are applied only when explicitly configured. Invalid values fall back to the websocket library defaults by leaving the options unset. This complements the reconnect timing settings added previously and helps reduce total recovery time after network interruptions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7d0bf15121	feat(gateway): add configurable Feishu websocket reconnect timing Allow users to configure websocket reconnect behavior via platform extra config to reduce reconnect latency in production environments. The official Feishu SDK defaults to: - First reconnect: random jitter 0-30 seconds - Subsequent retries: 120 second intervals This can cause 20-30 second delays before reconnection after network interruptions. This commit makes these values configurable while keeping the SDK defaults for backward compatibility. Configuration via ~/.hermes/config.yaml: ```yaml platforms: feishu: extra: ws_reconnect_nonce: 0 # Disable first-reconnect jitter (default: 30) ws_reconnect_interval: 3 # Retry every 3 seconds (default: 120) ``` Invalid values (negative numbers, non-integers) fall back to SDK defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
jtuki	7cf4bd06bf	fix(gateway): fix Feishu reconnect message drops and shutdown hang This commit fixes two critical bugs in the Feishu adapter that affect message reliability and process lifecycle. Bug Fix 1: Intermittent Message Drops Root cause: Event handler was created once in __init__ and reused across reconnects, causing callbacks to capture stale loop references. When the adapter disconnected and reconnected, old callbacks continued firing with invalid loop references, resulting in dropped messages with warnings: "[Feishu] Dropping inbound message before adapter loop is ready" Fix: - Rebuild event handler on each connect (websocket/webhook) - Clear handler on disconnect - Ensure callbacks always capture current valid loop - Add defensive loop.is_closed() checks with getattr for test compatibility - Unify webhook dispatch path to use same loop checks as websocket mode Bug Fix 2: Process Hangs on Ctrl+C / SIGTERM Root cause: Feishu SDK's websocket client runs in a background thread with an infinite _select() loop that never exits naturally. The thread was never properly joined on disconnect, causing processes to hang indefinitely after Ctrl+C or gateway stop commands. Fix: - Store reference to thread-local event loop (_ws_thread_loop) - On disconnect, cancel all tasks in thread loop and stop it gracefully via call_soon_threadsafe() - Await thread future with 10s timeout - Clean up pending tasks in thread's finally block before closing loop - Add detailed debug logging for disconnect flow Additional Improvements: - Add regression tests for disconnect cleanup and webhook dispatch - Ensure all event callbacks check loop readiness before dispatching Tested on Linux with websocket mode. All Feishu tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 16:54:16 -07:00
kshitijk4poor	05f9267938	fix(matrix): hard-fail E2EE when python-olm missing + stable MATRIX_DEVICE_ID Two issues caused Matrix E2EE to silently not work in encrypted rooms: 1. When matrix-nio is installed without the [e2e] extra (no python-olm / libolm), nio.crypto.ENCRYPTION_ENABLED is False and client.olm is never initialized. The adapter logged warnings but returned True from connect(), so the bot appeared online but could never decrypt messages. Now: check_matrix_requirements() and connect() both hard-fail with a clear error message when MATRIX_ENCRYPTION=true but E2EE deps are missing. 2. Without a stable device_id, the bot gets a new device identity on each restart. Other clients see it as "unknown device" and refuse to share Megolm session keys. Now: MATRIX_DEVICE_ID env var lets users pin a stable device identity that persists across restarts and is passed to nio.AsyncClient constructor + restore_login(). Changes: - gateway/platforms/matrix.py: add _check_e2ee_deps(), hard-fail in connect() and check_matrix_requirements(), MATRIX_DEVICE_ID support in constructor + restore_login - gateway/config.py: plumb MATRIX_DEVICE_ID into platform extras - hermes_cli/config.py: add MATRIX_DEVICE_ID to OPTIONAL_ENV_VARS Closes #3521	2026-04-06 16:54:16 -07:00
WAXLYY	1c0183ec71	fix(gateway): sanitize media URLs in base platform logs	2026-04-06 16:50:05 -07:00
ryanautomated	0f9aa57069	fix: silent memory flush failure on /new and /resume commands The _async_flush_memories() helper accepts (session_id) but both the /new and /resume handlers passed two arguments (session_id, session_key). The TypeError was silently swallowed at DEBUG level, so memory extraction never ran when users typed /new or /resume. One call site (the session expiry watcher) was already fixed in `9c96f669`, but /new and /resume were missed. - gateway/run.py:3247 — remove stray session_key from /new handler - gateway/run.py:4989 — remove stray session_key from /resume handler - tests/gateway/test_resume_command.py:222 — update test assertion	2026-04-06 16:49:42 -07:00
Nick	4f03b9a419	feat(webhook): add {__raw__} template token and thread_id passthrough for forum topics - {__raw__} in webhook prompt templates dumps the full JSON payload (truncated at 4000 chars) - _deliver_cross_platform now passes thread_id/message_thread_id from deliver_extra as metadata, enabling Telegram forum topic delivery - Tests for both features	2026-04-06 16:42:52 -07:00
Mikita Lisavets	9afb9a6cb2	fix: clear session-scoped model overrides during session reset	2026-04-06 13:20:01 -07:00
Dusk1e	1a2f109d8e	Ensure atomic writes for gateway channel directory cache to prevent truncation	2026-04-06 13:20:01 -07:00
kshitijk4poor	5e88eb2ba0	fix(signal): implement send_image_file, send_voice, and send_video for MEDIA: tag delivery The Signal adapter inherited base class defaults for send_image_file(), send_voice(), and send_video() which only sent the file path as text (e.g. '🖼️ Image: /tmp/chart.png') instead of actually delivering the file as a Signal attachment. When agent responses contain MEDIA:/path/to/file tags, the gateway media pipeline extracts them and routes through these methods by file type. Without proper overrides, image/audio/video files were never actually delivered to Signal users. Extract a shared _send_attachment() helper that handles all file validation, size checking, group/DM routing, and RPC dispatch. The four public methods (send_document, send_image_file, send_voice, send_video) now delegate to this helper, following the same pattern used by WhatsApp (_send_media_to_bridge) and Discord (_send_file_attachment). The helper also uses a single stat() call with try/except FileNotFoundError instead of the previous exists() + stat() two-syscall pattern, eliminating a TOCTOU race. As a bonus, send_document() now gains the 100MB size check that was previously missing (inconsistency with send_image). Add 25 tests covering all methods plus MEDIA: tag extraction integration, method-override guards, and send_document's new size check. Fixes #5105	2026-04-06 11:41:34 -07:00
Teknium	89c812d1d2	feat: shared thread sessions by default — multi-user thread support (#5391 ) Threads (Telegram forum topics, Discord threads, Slack threads) now default to shared sessions where all participants see the same conversation. This is the expected UX for threaded conversations where multiple users @mention the bot and interact collaboratively. Changes: - build_session_key(): when thread_id is present, user_id is no longer appended to the session key (threads are shared by default) - New config: thread_sessions_per_user (default: false) — opt-in to restore per-user isolation in threads if needed - Sender attribution: messages in shared threads are prefixed with [sender name] so the agent can tell participants apart - System prompt: shared threads show 'Multi-user thread' note instead of a per-turn User line (avoids busting prompt cache) - Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py - Regular group messages (no thread) remain per-user isolated (unchanged) - DM threads are unaffected (they have their own keying logic) Closes community request from demontut_ re: thread-based shared sessions.	2026-04-05 19:46:58 -07:00
Teknium	8d5226753f	fix: add missing ButtonStyle.grey to discord mock for test compatibility	2026-04-05 12:42:47 -07:00
Abhey	66d0fa1778	fix: avoid unnecessary Discord members intent on startup Only request the privileged members intent when DISCORD_ALLOWED_USERS includes non-numeric entries that need username resolution. Also release the Discord token lock when startup fails so retries and restarts are not blocked by a stale lock. Adds regression tests for conditional intents and startup lock cleanup.	2026-04-05 12:42:47 -07:00
MichaelWDanko	c6793d6fc3	fix(gateway): wrap cron helpers with staticmethod to prevent self-binding Plain functions imported as class attributes in APIServerAdapter get auto-bound as methods via Python's descriptor protocol. Every self._cron_() call injected self as the first positional argument, causing TypeError on all 8 cron API endpoints at runtime. Wrap each import with staticmethod() so self._cron_() calls dispatch correctly without modifying any call sites. Co-authored-by: teknium <teknium@nousresearch.com>	2026-04-05 12:31:10 -07:00
Mibayy	cc2b56b26a	feat(api): structured run events via /v1/runs SSE endpoint Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events for SSE streaming of typed lifecycle events (tool.started, tool.completed, message.delta, reasoning.available, run.completed, run.failed). Changes the internal tool_progress_callback signature from positional (tool_name, preview, args) to event-type-first (event_type, tool_name, preview, args, **kwargs). Existing consumers filter on event_type and remain backward-compatible. Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep. Fixes logic inversion in cli.py _on_tool_progress where the original PR would have displayed internal tools instead of non-internal ones. Co-authored-by: Mibayy <mibayy@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
analista	e8053e8b93	fix(gateway): surface unknown /commands instead of leaking them to the LLM Previously, typing a /command that isn't a built-in, plugin, or skill would silently fall through to the LLM as plain text. The model often interprets it as a loose instruction and invents unrelated tool calls — e.g. a stray /claude_code slipped through and the model fabricated a delegate_task invocation that got stuck in an OAuth loop. Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin / unavailable-skill lookups and return an actionable message pointing the user at /commands. The user gets feedback, and the agent doesn't waste a round-trip guessing what /foo-bar was supposed to mean.	2026-04-05 11:59:28 -07:00
Damian P	afccbf253c	fix: resolve listed messaging targets consistently	2026-04-05 11:59:28 -07:00
kshitijk4poor	1d2e34c7eb	Prevent Telegram polling handoffs and flood-control send failures Telegram polling can inherit a stale webhook registration when a deployment switches transport modes, which leaves getUpdates idle even though the gateway starts cleanly. Outbound send also treats Telegram retry_after responses as terminal errors, so brief flood control can drop tool progress and replies. Constraint: Keep the PR narrowly scoped to upstream/main Telegram adapter behavior Rejected: Port OpenClaw's broader polling supervisor and offset persistence | too broad for an isolated fix PR Confidence: high Scope-risk: narrow Reversibility: clean Directive: Polling mode should clear webhook state before starting getUpdates, and send-path retry logic must distinguish flood control from timeouts Tested: uv run --extra dev pytest tests/gateway/test_telegram_* -q Not-tested: Live Telegram webhook-to-polling migration and real Bot API 429 behavior	2026-04-05 11:59:28 -07:00
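
Note on 1d2e34c7eb: the directive that send-path retry logic "must distinguish flood control from timeouts" can be illustrated with a minimal sketch. This is not the adapter's actual code. `FloodControl` and `send_with_retry` are hypothetical names introduced here, and the only detail taken from the commit message is that a flood-control response carries a `retry_after` value.

```python
import asyncio


class FloodControl(Exception):
    """Hypothetical stand-in for a 429 response that carries retry_after."""

    def __init__(self, retry_after: float):
        super().__init__(f"flood control, retry after {retry_after}s")
        self.retry_after = retry_after


async def send_with_retry(send, *, max_attempts=3, sleep=asyncio.sleep):
    """Await send(); wait out flood control, let every other error propagate.

    `send` is any zero-argument coroutine function (an assumption made for
    this sketch). `sleep` is injectable so the wait can be faked in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await send()
        except FloodControl as exc:
            # Flood control is transient: honour the server's hint and retry,
            # unless the attempt budget is exhausted.
            if attempt == max_attempts:
                raise
            await sleep(exc.retry_after)
        # Timeouts and other exceptions are deliberately not caught here,
        # so the caller can treat them differently from flood control.
```

Keeping the flood-control branch separate means a brief 429 delays a reply instead of dropping it, while a timeout still surfaces immediately to the caller.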

335 commits