hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Teknium	7c10761dd2	fix(discord): shield text-batch flush from follow-up cancel (#12444) When Discord splits a long message at 2000 chars, _enqueue_text_event buffers each chunk and schedules a _flush_text_batch task with a short delay. If another chunk lands while the prior flush task is already inside handle_message, _enqueue_text_event calls prior_task.cancel() — and without asyncio.shield, CancelledError propagates from the flush task into handle_message → the agent's streaming request, aborting the response the user was waiting on. Reproducer: user sends a 3000-char prompt (split by Discord into 2 messages). Chunk 1 lands, flush delay starts, chunk 2 lands during the brief window when chunk 1's flush has already committed to handle_message. Agent's current streaming response is cancelled with CancelledError, user sees a truncated or missing reply. Fix (gateway/platforms/discord.py): - Wrap the handle_message call in asyncio.shield so the inner dispatch is protected from the outer task's cancel. - Add an except asyncio.CancelledError clause so the outer task still exits cleanly when cancel lands during the sleep window (before the pop) — semantics for that path are unchanged. The new flush task spawned by the follow-up chunk still handles its own batch via the normal pending-message / active-session machinery in base.py, so follow-ups are not lost. Tests: tests/gateway/test_text_batching.py — test_shield_protects_handle_message_from_cancel. Tracks a distinct first_handle_cancelled event so the assertion fails cleanly when the shield is missing (verified by stashing the fix and re-running). Live E2E on the live-loaded DiscordAdapter: first_handle_cancelled: False (shield worked) first_handle_completed: True (handle_message ran to completion)	2026-04-19 00:09:38 -07:00
Teknium	dca439fe92	fix(tui): scope session.interrupt pending-prompt release to the calling session (#12441) session.interrupt on session A was blast-resolving pending clarify/sudo/secret prompts on ALL sessions sharing the same tui_gateway process. Other sessions' agent threads unblocked with empty-string answers as if the user had cancelled — silent cross-session corruption. Root cause: _pending and _answers were globals keyed by random rid with no record of the owning session. _clear_pending() iterated every entry, so the session.interrupt handler had no way to limit the release to its own sid. Fix: - tui_gateway/server.py: _pending now maps rid to (sid, Event) tuples. _clear_pending takes an optional sid argument and filters by owner_sid when provided. session.interrupt passes the calling sid so unrelated sessions are untouched. _clear_pending(None) remains the shutdown path for completeness. - _block and _respond updated to pack/unpack the new tuple format. Tests (tests/test_tui_gateway_server.py): 4 new cases. - test_interrupt_only_clears_own_session_pending: two sessions with pending prompts, interrupting one must not release the other. - test_interrupt_clears_multiple_own_pending: same-sid multi-prompt release works. - test_clear_pending_without_sid_clears_all: shutdown path preserved. - test_respond_unpacks_sid_tuple_correctly: _respond handles the tuple format. Also updated tests/tui_gateway/test_protocol.py to use the new tuple format for test_block_and_respond and test_clear_pending. Live E2E against the live Python environment confirmed cross-session isolation: interrupting sid_a released its own pending prompt without touching sid_b's. All 78 related tests pass.	2026-04-19 00:03:58 -07:00
Teknium	ce410521b3	feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369) Agents can now send arbitrary CDP commands to the browser. The tool is gated on a reachable CDP endpoint at session start — it only appears in the toolset when BROWSER_CDP_URL is set (from '/browser connect') or 'browser.cdp_url' is configured in config.yaml. Backends that don't currently expose CDP to the Python side (Camofox, default local agent-browser, cloud providers whose per-session cdp_url is not yet surfaced) do not see the tool at all. Tool schema description links to the CDP method reference at https://chromedevtools.github.io/devtools-protocol/ so the agent can web_extract specific method docs on demand. Stateless per call. Browser-level methods (Target.*, Browser.*, Storage.*) omit target_id. Page-level methods attach to the target with flatten=true and dispatch the method on the returned sessionId. Clean errors when the endpoint becomes unreachable mid-session or the URL isn't a WebSocket. Tests: 19 unit (mock CDP server + gate checks) + E2E against real headless Chrome (Target.getTargets, Browser.getVersion, Runtime.evaluate with target_id, Page.navigate + re-eval, bogus method, bogus target_id, missing endpoint) + E2E of the check_fn gate (tool hidden without CDP URL, visible with it, hidden again after unset).	2026-04-19 00:03:10 -07:00
helix4u	7b1a11b971	fix(memory): keep Honcho provider opt-in	2026-04-18 22:50:55 -07:00
Erosika	21d5ef2f17	feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback Setup wizard now always writes dialecticCadence=2 on new configs and surfaces the reasoning level as an explicit step with all five options (minimal / low / medium / high / max), always writing dialecticReasoningLevel. Code keeps a backwards-compat fallback of 1 when dialecticCadence is unset so existing honcho.json configs that predate the setting keep firing every turn on upgrade. New setups via the wizard get 2 explicitly; docs show 2 as the default. Also scrubs editorial lines from code and docs ("max is reserved for explicit tool-path selection", "Unset → every turn; wizard pre-fills 2", and similar process-exposing phrasing) and adds an inline link to app.honcho.dev where the server-side observation sync is mentioned in honcho.md. Recommended cadence range updated to 1-5 across docs and wizard copy.	2026-04-18 22:50:55 -07:00
LeonSGP43	5b6792f04d	fix(honcho): scope gateway sessions by runtime user id	2026-04-18 22:50:55 -07:00
Erosika	ba7da73ca9	test(honcho): drop two first-turn tests subsumed by prewarm + smoke coverage - TestDialecticDepth::test_first_turn_runs_dialectic_synchronously: covered by TestSessionStartDialecticPrewarm::test_turn1_falls_back_to_sync_when_prewarm_missing (more realistic — exercises the empty-prewarm → sync-fallback path) - TestDialecticDepth::test_first_turn_dialectic_does_not_double_fire: covered by TestDialecticLifecycleSmoke (turn 1 flow) and TestDialecticCadenceAdvancesOnSuccess::test_empty_dialectic_result_does_not_advance_cadence Both predate the prewarm refactor and test paths that are now fallback behaviors already covered elsewhere.	2026-04-18 22:50:55 -07:00
Erosika	c630dfcdac	feat(honcho): dialectic liveness — stale-thread watchdog, stale-result discard, empty-streak backoff Hardens the dialectic lifecycle against three failure modes that could leave the prefetch pipeline stuck or injecting stale content: - Stale-thread watchdog: _thread_is_live() treats any prefetch thread older than timeout × 2.0 as dead. A hung Honcho call can no longer block subsequent fires indefinitely. - Stale-result discard: pending _prefetch_result is tagged with its fire turn. prefetch() discards the result if more than cadence × 2 turns passed before a consumer read it (e.g. a run of trivial-prompt turns between fire and read). - Empty-streak backoff: consecutive empty dialectic returns widen the effective cadence (dialectic_cadence + streak, capped at cadence × 8). A healthy fire resets the streak. Prevents the plugin from hammering the backend every turn when the peer graph is cold. - liveness_snapshot() on the provider exposes current turn, last fire, pending fire-at, empty streak, effective cadence, and thread status for in-process diagnostics. - system_prompt_block: nudge the model that honcho_reasoning accepts reasoning_level minimal/low/medium/high/max per call. - hermes honcho status: surface base reasoning level, cap, and heuristic toggle so config drift is visible at a glance. Tests: 550 passed. - TestDialecticLiveness (8 tests): stale-thread recovery, stale-result discard, fresh-result retention, backoff widening, backoff ceiling, streak reset on success, streak increment on empty, snapshot shape. - Existing TestDialecticCadenceAdvancesOnSuccess::test_in_flight_thread_is_not_stacked updated to set _prefetch_thread_started_at so it tests the fresh-thread-blocks branch (stale path covered separately). - test_cli TestCmdStatus fake updated with the new config attrs surfaced in the status block.	2026-04-18 22:50:55 -07:00
Erosika	5f9907c116	chore(honcho): drop docs from PR scope, scrub commentary - Revert website/docs and SKILL.md changes; docs unification handled separately - Scrub commit/PR refs and process narration from code comments and test docstrings (no behavior change)	2026-04-18 22:50:55 -07:00
Erosika	78586ce036	fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption Several correctness and cost-safety fixes to the Honcho dialectic path after a multi-turn investigation surfaced a chain of silent failures: - dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to 3 for cost, but existing installs with no explicit config silently went from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619 behavior; 3+ remains available for cost-conscious setups. Docs + wizard + status output updated to match. - Session-start prewarm now consumed. Previously fired a .chat() on init whose result landed in HonchoSessionManager._dialectic_cache and was never read — pop_dialectic_result had zero call sites. Turn 1 paid for a duplicate synchronous dialectic. Prewarm now writes directly to the plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with no extra call. - Prewarm is now dialecticDepth-aware. A single-pass prewarm can return weak output on cold peers; the multi-pass audit/reconcile cycle is exactly the case dialecticDepth was built for. Prewarm now runs the full configured depth in the background. - Silent dialectic failure no longer burns the cadence window. _last_dialectic_turn now advances only when the result is non-empty. Empty result → next eligible turn retries immediately instead of waiting the full cadence gap. - Thread pile-up guard. queue_prefetch skips when a prior dialectic thread is still in-flight, preventing stacked races on _prefetch_result. - First-turn sync timeout is recoverable. Previously on timeout the background thread's result was stored in a dead local list. Now the thread writes into _prefetch_result under lock so the next turn picks it up. - Cadence gate applies uniformly. At cadence=1 the old "cadence > 1" guard let first-turn sync + same-turn queue_prefetch both fire. Gate now always applies. - Restored query-length reasoning-level scaling, dropped in 9a0ab34c. Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars, +2 at ≥400), clamped at reasoningLevelCap. Two new config keys: `reasoningHeuristic` (bool, default true) and `reasoningLevelCap` (string, default "high"; previously parsed but never enforced). Respects dialecticDepthLevels and proportional lighter-early passes. - Restored short-prompt skip, dropped in `ef7f3156`. One-word acknowledgements ("ok", "y", "thanks") and slash commands bypass both injection and dialectic fire. - Purged dead code in session.py: prefetch_dialectic, _dialectic_cache, set_dialectic_result, pop_dialectic_result — all unused after prewarm refactor. Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py, and run_agent/test_run_agent.py. New coverage: - TestTrivialPromptHeuristic (classifier + prefetch/queue skip) - TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard) - TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback) - TestReasoningHeuristic (length bumps, cap clamp, interaction with depth) - TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)	2026-04-18 22:50:55 -07:00
Teknium	bf5d7462ba	fix(tui): reject history-mutating commands while session is running (#12416) Fixes silent data loss in the TUI when /undo, /compress, /retry, or rollback.restore runs during an in-flight agent turn. The version-guard at prompt.submit:1449 would fail the version check and silently skip writing the agent's result — UI showed the assistant reply but DB / backend history never received it, causing UI↔backend desync that persisted across session resume. Changes (tui_gateway/server.py): - session.undo, session.compress, /retry, rollback.restore (full-history only — file-scoped rollbacks still allowed): reject with 4009 when session.running is True. Users can /interrupt first. - prompt.submit: on history_version mismatch (defensive backstop), attach a 'warning' field to message.complete and log to stderr instead of silently dropping the agent's output. The UI can surface the warning to the user; the operator can spot it in logs. Tests (tests/test_tui_gateway_server.py): 6 new cases. - test_session_undo_rejects_while_running - test_session_undo_allowed_when_idle (regression guard) - test_session_compress_rejects_while_running - test_rollback_restore_rejects_full_history_while_running - test_prompt_submit_history_version_mismatch_surfaces_warning - test_prompt_submit_history_version_match_persists_normally (regression) Validated: against unpatched server.py the three 'rejects_while_running' tests fail and the version-mismatch test fails (no 'warning' field). With the fix, all 6 pass, all 33 tests in the file pass, 74 TUI tests in total pass. Live E2E against the live Python environment confirmed all 5 patches present and guards enforce 4009 exactly as designed.	2026-04-18 22:30:10 -07:00
Teknium	3a6351454b	fix(gateway): close pending-drain and late-arrival races in base adapter (#12371) Two related race conditions in gateway/platforms/base.py that could produce duplicate agent runs or silently drop messages. Neither is specific to any one platform — all adapters inherit this logic. R5 (HIGH) — duplicate agent spawn on turn chain In _process_message_background, the pending-drain path deleted _active_sessions[session_key] before awaiting typing_task.cancel() and then recursively awaiting _process_message_background for the queued event. During the typing_task await, a fresh inbound message M3 could pass the Level-1 guard (entry now missing), set its own Event, and spawn a second _process_message_background for the same session_key — two agents running simultaneously, duplicate responses, duplicate tool calls. Fix: keep the _active_sessions entry populated and only clear() the Event. The guard stays live, so any concurrent inbound message takes the busy-handler path (queue + interrupt) as intended. R6 (MED-HIGH) — message dropped during finally cleanup The finally block has two await points (typing_task, stop_typing) before it deletes _active_sessions. A message arriving in that window passes the guard (entry still live), lands in _pending_messages via the busy-handler — and then the unconditional del removes the guard with that message still queued. Nothing drains it; the user never gets a reply. Fix: before deleting _active_sessions in finally, pop any late pending_messages entry and spawn a drain task for it. Only delete _active_sessions when no pending is waiting. Tests: tests/gateway/test_pending_drain_race.py — three regression cases. Validated: without the fix, two of the three fail exactly where the races manifest (duplicate-spawn guard loses identity, late-arrival 'LATE' message not in processed list).	2026-04-18 19:32:26 -07:00
Teknium	762f7e9796	feat: configurable approval mode for cron jobs (approvals.cron_mode) Add approvals.cron_mode config option that controls how cron jobs handle dangerous commands. Previously, cron jobs silently auto-approved all dangerous commands because there was no user present to approve them. Now the behavior is configurable: - deny (default): block dangerous commands and return a message telling the agent to find an alternative approach. The agent loop continues — it just can't use that specific command. - approve: auto-approve all dangerous commands (previous behavior). When a command is blocked, the agent receives the same response format as a user denial in the CLI — exit_code=-1, status=blocked, with a message explaining why and pointing to the config option. This keeps the agent loop running and encourages it to adapt. Implementation: - config.py: add approvals.cron_mode to DEFAULT_CONFIG - scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs - approval.py: both check_command_approval() and check_all_command_guards() now check for cron sessions and apply the configured mode - 21 new tests covering config parsing, deny/approve behavior, and interaction with other bypass mechanisms (yolo, containers)	2026-04-18 19:24:35 -07:00
Teknium	b02833f32d	fix(codex): Hermes owns its own Codex auth; stop touching ~/.codex/auth.json (#12360) Codex OAuth refresh tokens are single-use and rotate on every refresh. Sharing them with the Codex CLI / VS Code via ~/.codex/auth.json made concurrent use of both tools a race: whoever refreshed last invalidated the other side's refresh_token. On top of that, the silent auto-import path picked up placeholder / aborted-auth data from ~/.codex/auth.json (e.g. literal {"access_token":"access-new","refresh_token":"refresh-new"}) and seeded it into the Hermes pool as an entry the selector could eventually pick. Hermes now owns its own Codex auth state end-to-end: Removed - agent/credential_pool.py: _sync_codex_entry_from_cli() method, its pre-refresh + retry + _available_entries call sites, and the post-refresh write-back to ~/.codex/auth.json. - agent/credential_pool.py: auto-import from ~/.codex/auth.json in _seed_from_singletons() — users now run `hermes auth openai-codex` explicitly. - hermes_cli/auth.py: silent runtime migration in resolve_codex_runtime_credentials() — now surfaces `codex_auth_missing` directly (message already points to `hermes auth`). - hermes_cli/auth.py: post-refresh write-back in _refresh_codex_auth_tokens(). - hermes_cli/auth.py: dead helper _write_codex_cli_tokens() and its 4 tests in test_auth_codex_provider.py. Kept - hermes_cli/auth.py: _import_codex_cli_tokens() — still used by the interactive `hermes auth openai-codex` setup flow for a user-gated one-time import (with "a separate login is recommended" messaging). User-visible impact - On existing installs with Hermes auth already present: no change. - On a fresh install where the user has only logged in via Codex CLI: `hermes chat --provider openai-codex` now fails with "No Codex credentials stored. Run `hermes auth` to authenticate." The interactive setup flow then detects ~/.codex/auth.json and offers a one-time import. - On an install where Codex CLI later refreshes its token: Hermes is unaffected (we no longer read from that file at runtime). Tests - tests/hermes_cli/test_auth_codex_provider.py: 15/15 pass. - tests/hermes_cli/test_auth_commands.py: 20/20 pass. - tests/agent/test_credential_pool.py: 31/31 pass. - Live E2E on openai-codex/gpt-5.4: 1 API call, 1.7s latency, 3 log lines, no refresh events, no auth drama. The related 14:52 refresh-loop bug (hundreds of rotations/minute on a single entry) is a separate issue — that requires a refresh-attempt cap on the auth-recovery path in run_agent.py, which remains open.	2026-04-18 19:19:46 -07:00
yeyitech	bd01ec7885	fix(cli): strip all reasoning tag variants from /resume recap HermesCLI._display_resumed_history() calls the module-level _strip_reasoning_tags() to clean assistant content before rendering the recap panel. The tag list was missing <thought> (Gemma 4) and there was no pass for stray orphan </tag> closes, so those variants leaked internal reasoning into the recap display (#11316). - Add <thought> to _REASONING_TAGS. - Add a third regex pass that strips orphan close tags (e.g. 'stuff</think>answer' → 'stuffanswer'). - Apply IGNORECASE to closed-pair and unclosed-pair passes so mixed-case variants (<THINK>, <Thinking>) are handled uniformly — previously both 'THINKING' and 'thinking' had to be listed explicitly as distinct tuple entries, which missed <Thinking>. 7 new regression tests in tests/cli/test_resume_display.py covering: <think>, <thinking>, <reasoning>, <thought>, unclosed <think>, multiple interleaved blocks, and orphan </think> close. Resolves #11316. Originally proposed as PR #11366.	2026-04-18 19:19:24 -07:00
Tranquil-Flow	ec48ec5530	fix(agent): strip <think> blocks from stored assistant content Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression. _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant. Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed. Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction. 3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip. Resolves #8878 and #9568. Originally proposed as PR #9250.	2026-04-18 19:19:24 -07:00
Teknium	9489d1577d	fix(agent): strip unterminated <think> blocks from visible content Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field. _strip_think_blocks()'s closed-pair regex is non-greedy so it only matches complete blocks — any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408). Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close. Everything from that tag to end of string is stripped. The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped. Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly — previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it. 6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs. The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom. This commit does not include the 💭/🧠 emoji regexes from that PR — those glyphs are Hermes CLI display decorations, not model content markers.	2026-04-18 19:19:24 -07:00
Teknium	beabbd87ef	fix(gateway): close adapter resources when connect() fails or raises (#12339) Gateway startup leaks aiohttp.ClientSession (and other partial-init resources) when an adapter's connect() returns False or raises. The adapter is never added to self.adapters, so the shutdown path at gateway/run.py:2426 never calls disconnect() on it — Python GC later logs 'Unclosed client session' at process exit. Seen on 2026-04-18 18:08:16 during a double --replace takeover cycle: one of the partial-init sessions survived past shutdown and emitted the warning right before status=75/TEMPFAIL. Fix: - New GatewayRunner._safe_adapter_disconnect() helper — calls adapter.disconnect() and swallows any exception. Used on error paths. - Connect loop calls it in both failure branches: success=False and except Exception. - Adapter disconnect() implementations are already expected to be idempotent and tolerate partial-init state (they all guard on self._http_session / self._bridge_process before touching them). Tests: tests/gateway/test_safe_adapter_disconnect.py — 3 cases verify the helper forwards to disconnect, swallows exceptions, and tolerates platform=None.	2026-04-18 18:53:31 -07:00
Teknium	632a807a3e	fix(gateway): slash commands never interrupt a running agent (#12334) Any recognized slash command now bypasses the Level-1 active-session guard instead of queueing + interrupting. A mid-run /model (or /reasoning, /voice, /insights, /title, /resume, /retry, /undo, /compress, /usage, /provider, /reload-mcp, /sethome, /reset) used to interrupt the agent AND get silently discarded by the slash-command safety net — zero-char response, dropped tool calls. Root cause: - Discord registers 41 native slash commands via tree.command(). - Only 14 were in ACTIVE_SESSION_BYPASS_COMMANDS. - The other ~15 user-facing ones fell through base.py:handle_message to the busy-session handler, which calls running_agent.interrupt() AND queues the text. - After the aborted run, gateway/run.py:9912 correctly identifies the queued text as a slash command and discards it — but the damage (interrupt + zero-char response) already happened. Fix: - should_bypass_active_session() now returns True for any resolvable slash command. ACTIVE_SESSION_BYPASS_COMMANDS stays as the subset with dedicated Level-2 handlers (documentation + tests). - gateway/run.py adds a catch-all after the dedicated handlers that returns a user-visible "agent busy — wait or /stop first" response for any other resolvable command. - Unknown text / file-path-like messages are unchanged — they still queue. Also: - gateway/platforms/discord.py logs the invoker identity on every slash command (user id + name + channel + guild) so future ghost-command reports can be triaged without guessing. Tests: - 15 new parametrized cases in test_command_bypass_active_session.py cover every previously-broken Discord slash command. - Existing tests for /stop, /new, /approve, /deny, /help, /status, /agents, /background, /steer, /update, /queue still pass. - test_steer.py's ACTIVE_SESSION_BYPASS_COMMANDS check still passes. Fixes #5057. Related: #6252, #10370, #4665.	2026-04-18 18:53:22 -07:00
Teknium	aa5f89d3ea	test: add coverage for from_user=None DM fallback Tests the three cases: - DM with from_user=None: user_id falls back to chat.id - Group with from_user=None: user_id stays None (safe default) - DM with from_user present: user_id uses from_user.id (no regression)	2026-04-18 18:18:01 -07:00
Teknium	c49a58a6d0	fix(gateway): mark only still-running sessions resume_pending on drain timeout (#12332) Follow-up to #12301. The drain-timeout branch of _stop_impl() was iterating the drain-start snapshot (active_agents) when marking sessions resume_pending. That snapshot can include sessions that finished gracefully during the drain window — marking them would give their next turn a stray 'your previous turn was interrupted by a gateway restart' system note even though the prior turn actually completed cleanly. Iterate self._running_agents at timeout time instead, mirroring _interrupt_running_agents() exactly: - only sessions still blocking the shutdown get marked - pending sentinels (AIAgent construction not yet complete) are skipped Changes: - gateway/run.py: swap active_agents.keys() for filtered self._running_agents.items() iteration in the drain-timeout mark loop. - tests/gateway/test_restart_resume_pending.py: two regression tests — finisher-during-drain not marked, pending sentinel not marked.	2026-04-18 17:40:34 -07:00
Teknium	cb4addacab	fix(gateway): auto-resume sessions after drain-timeout restart (#11852) (#12301) The shutdown banner promised "send any message after restart to resume where you left off" but the code did the opposite: a drain-timeout restart skipped the .clean_shutdown marker, which made the next startup call suspend_recently_active(), which marked the session suspended, which made get_or_create_session() spawn a fresh session_id with a 'Session automatically reset. Use /resume...' notice — contradicting the banner. Introduce a resume_pending state on SessionEntry that is distinct from suspended. Drain-timeout shutdown flags active sessions resume_pending instead of letting startup-wide suspension destroy them. The next message on the same session_key preserves the session_id, reloads the transcript, and the agent receives a reason-aware restart-resume system note that subsumes the existing tool-tail auto-continue note (PR #9934). Terminal escalation still flows through the existing .restart_failure_counts stuck-loop counter (PR #7536, threshold 3) — no parallel counter on SessionEntry. suspended still wins over resume_pending in get_or_create_session() so genuinely stuck sessions converge to a clean slate. Spec: PR #11852 (BrennerSpear). Implementation follows the spec with the approved correction (reuse .restart_failure_counts rather than adding a resume_attempts field). Changes: - gateway/session.py: SessionEntry.resume_pending/resume_reason/last_resume_marked_at + to_dict/from_dict; SessionStore.mark_resume_pending()/clear_resume_pending(); get_or_create_session() returns existing entry when resume_pending (suspended still wins); suspend_recently_active() skips resume_pending entries. - gateway/run.py: _stop_impl() drain-timeout branch marks active sessions resume_pending before _interrupt_running_agents(); _run_agent() injects reason-aware restart-resume system note that subsumes the tool-tail case; successful-turn cleanup also clears resume_pending next to _clear_restart_failure_count(); _notify_active_sessions_of_shutdown() softens the restart banner to 'I'll try to resume where you left off' (honest about stuck-loop escalation). - tests/gateway/test_restart_resume_pending.py: 29 new tests covering SessionEntry roundtrip, mark/clear helpers, get_or_create_session precedence (suspended > resume_pending), suspend_recently_active skip, drain-timeout mark reason (restart vs shutdown), system-note injection decision tree (including tool-tail subsumption), banner wording, and stuck-loop escalation override.	2026-04-18 17:32:17 -07:00
Teknium	6a3a6a0fb6	Merge pull request #12263 from NousResearch/bb/tui-audit-followup fix(tui): TUI v2 audit follow-up — registry, overlays, paste, reasoning, hyperlinks	2026-04-18 14:40:16 -07:00
helix4u	4e8f60fd11	fix(cli): use display width for wrapped spinner height	2026-04-18 14:34:05 -07:00
Brooklyn Nicholson	bfac5d039d	Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-audit-followup	2026-04-18 15:27:40 -05:00
Brooklyn Nicholson	93b4080b78	Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-audit-followup # Conflicts: # ui-tui/src/components/markdown.tsx # ui-tui/src/types/hermes-ink.d.ts	2026-04-18 14:52:54 -05:00
helix4u	ca32a2a60b	fix(gemini): restore bearer auth on openai route	2026-04-18 12:52:01 -07:00
helix4u	a7dd6a3449	fix(gemini): hide stale and low-TPM Google models	2026-04-18 12:52:01 -07:00
helix4u	2eab7ee15f	fix(gemini): hide low-TPM Gemma models from exposed lists	2026-04-18 12:52:01 -07:00
jarvischer	0f778f7768	fix: prevent tool name duplication in streaming accumulator (MiniMax/NVIDIA NIM) Based on #11984 by @maxchernin. Fixes #8259. Some providers (MiniMax M2.7 via NVIDIA NIM) resend the full function name in every streaming chunk instead of only the first. The old accumulator used += which concatenated them into 'read_fileread_file'. Changed to simple assignment (=), matching the OpenAI Node SDK, LiteLLM, and Vercel AI SDK patterns. Function names are atomic identifiers delivered complete — no provider splits them across chunks, so concatenation was never correct semantics.	2026-04-18 12:50:32 -07:00
Honghua Yang	3128d9fcd2	fix(context_compressor): keep tool-call arguments JSON valid when shrinking Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments` blobs by slicing the raw JSON string at byte 200 and appending the literal text `...[truncated]`. That routinely produced payloads like:: {"path": "/foo.md", "content": "# Long markdown ...[truncated] — an unterminated string with no closing brace. Strict providers (observed on MiniMax) reject this as `invalid function arguments json string` with a non-retryable 400. Because the broken call survives in the session history, every subsequent turn re-sends the same malformed payload and gets the same 400, locking the session into a re-send loop until the call falls out of the window. Fix: parse the arguments first, shrink long string leaves inside the parsed structure, and re-serialise. Non-string values (paths, ints, booleans, lists) pass through intact. Arguments that are not valid JSON to begin with (rare, some backends use non-JSON tool args) are returned unchanged rather than replaced with something neither we nor the provider can parse. Observed in the wild: a `write_file` with ~800 chars of markdown `content` triggered this on a real session against MiniMax-M2.7; every turn after compression got rejected until the session was manually reset. Tests: - 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON output, non-JSON pass-through, nested structures, non-string leaves, scalar JSON, and Unicode preservation - 1 end-to-end test through `_prune_old_tool_results` Pass 3 that reproduces the exact failure payload shape from the incident Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 12:40:56 -07:00
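The parse, shrink, re-serialise approach from 3128d9fcd2 could look roughly like the sketch below. It is an illustration under assumptions: `shrink_tool_args`, the 200-character limit applied per string leaf, and the marker placement are hypothetical and may differ from `_truncate_tool_call_args_json` in the repository.

```python
import json


def shrink_tool_args(raw, max_len=200, marker="...[truncated]"):
    """Shrink long string leaves while keeping the payload valid JSON."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        # Not valid JSON to begin with: return it unchanged.
        return raw

    def shrink(value):
        if isinstance(value, str) and len(value) > max_len:
            return value[:max_len] + marker
        if isinstance(value, list):
            return [shrink(v) for v in value]
        if isinstance(value, dict):
            return {k: shrink(v) for k, v in value.items()}
        return value  # ints, booleans, None, short strings pass through

    # ensure_ascii=False keeps non-ASCII text readable instead of \u-escaping it.
    return json.dumps(shrink(parsed), ensure_ascii=False)
```

Because truncation happens on the parsed structure, the output always has balanced quotes and braces, which is the property strict providers check.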
kshitij	c14b3b5880	fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144 ) * fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) The prior override only matched the literal model name "kimi-for-coding", but Moonshot's coding endpoint is hit with real model IDs such as `kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc. Those requests bypassed the override and kept the caller's temperature, so Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for this model" (or 1.0 for thinking variants). Match the whole kimi-k2.* family: * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode) * all other kimi-k2.* -> 0.6 (non-thinking / instant mode) Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so aggregator routings are covered. * refactor(kimi): whitelist-match kimi coding models instead of prefix Addresses review feedback on PR #12144. - Replace `startswith("kimi-k2")` with explicit frozensets sourced from Moonshot's kimi-for-coding model list. The prefix match would have also clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the separate non-coding K2 family with variable temperature (recommended 0.6 but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct). - Confirmed via platform.kimi.ai docs that all five coding models (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo) share the fixed-temperature lock, so the preview-model mapping is no longer an assumption. - Drop the fragile `"thinking" in bare` substring test for a set lookup. - Log a debug line on each override so operators can see when Hermes silently rewrites temperature. - Update class docstring. Extend the negative test to parametrize over kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future kimi-k2-experimental name — all must keep the caller's temperature.	2026-04-18 09:35:51 -07:00
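The whitelist match described in c14b3b5880 can be sketched as follows. The model names and temperatures are taken from the commit message; the set names, `fixed_temperature`, and the lower-casing step are assumptions made for illustration.

```python
# Model IDs as listed in the commit message; names of these sets are hypothetical.
THINKING_MODELS = frozenset({"kimi-k2-thinking", "kimi-k2-thinking-turbo"})
INSTANT_MODELS = frozenset(
    {"kimi-k2.5", "kimi-k2-turbo-preview", "kimi-k2-0905-preview"}
)


def fixed_temperature(model):
    """Return the forced temperature, or None to keep the caller's value."""
    # Drop an optional vendor prefix such as "moonshotai/" (case folding is an assumption).
    bare = model.rsplit("/", 1)[-1].lower()
    if bare in THINKING_MODELS:
        return 1.0
    if bare in INSTANT_MODELS:
        return 0.6
    return None
```

An explicit set lookup leaves `kimi-k2-instruct` and any future `kimi-k2-*` name untouched, which a `startswith("kimi-k2")` check would not.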
kshitijk4poor	656c375855	fix(tui): review follow-up — /retry, /plan, ANSI truncation, caching - /retry: use session['history'] instead of non-existent agent.conversation_history; truncate history at last user message to match CLI retry_last() behavior; add history_lock safety - /plan: pass user instruction (arg) to build_plan_path instead of session_key; add runtime_note so agent knows where to save the plan - ANSI tool results: render full text via <Ansi wrap=truncate-end> instead of slicing raw ANSI through compactPreview (which cuts mid-escape-sequence producing garbled output) - Move _PENDING_INPUT_COMMANDS frozenset to module level - Use get_skill_commands() (cached) instead of scan_skill_commands() (rescans disk) in slash.exec skill interception - Add 3 retry tests: happy path with history truncation verification, empty history error, multipart content extraction - Update test mock target from scan_skill_commands to get_skill_commands	2026-04-18 09:30:48 -07:00
kshitijk4poor	abc95338c2	fix(tui): slash.exec _pending_input commands, tool ANSI, terminal title Additional TUI fixes discovered in the same audit: 1. /plan slash command was silently lost — process_command() queues the plan skill invocation onto _pending_input which nobody reads in the slash worker subprocess. Now intercepted in slash.exec and routed through command.dispatch with a new 'send' dispatch type. Same interception added for /retry, /queue, /steer as safety nets (these already have correct TUI-local handlers in core.ts, but the server-side guard prevents regressions if the local handler is bypassed). 2. Tool results were stripping ANSI escape codes — the messageLine component used stripAnsi() + plain <Text> for tool role messages, losing all color/styling from terminal, search_files, etc. Now uses <Ansi> component (already imported) when ANSI is detected. 3. Terminal tab title now shows model + busy status via useTerminalTitle hook from @hermes/ink (was never used). Users can identify Hermes tabs and see at a glance whether the agent is busy or ready. 4. Added 'send' variant to CommandDispatchResponse type + asCommandDispatch parser + createSlashHandler handler for commands that need to inject a message into the conversation (plan, queue fallback, steer fallback).	2026-04-18 09:30:48 -07:00
kshitijk4poor	2da558ec36	fix(tui): clickable hyperlinks and skill slash command dispatch Two TUI fixes: 1. Hyperlinks are now clickable (Cmd+Click / Ctrl+Click) in terminals that support OSC 8. The markdown renderer was rendering links as plain colored text — now wraps them in the existing <Link> component from @hermes/ink which emits OSC 8 escape sequences. 2. Skill slash commands (e.g. /hermes-agent-dev) now work in the TUI. The slash.exec handler was delegating to the _SlashWorker subprocess which calls cli.process_command(). For skills, process_command() queues the invocation message onto _pending_input — a Queue that nobody reads in the worker subprocess. The skill message was lost. Now slash.exec detects skill commands early and rejects them so the TUI falls through to command.dispatch, which correctly builds and returns the skill payload for the client to send().	2026-04-18 09:30:48 -07:00
Brooklyn Nicholson	b82ec6419d	test(tui-gateway): cover mcp_servers field in _session_info output	2026-04-18 09:42:57 -05:00
Brooklyn Nicholson	a397b0fd4d	test(tui-gateway): assert quick_commands appear in commands.catalog output	2026-04-18 09:42:57 -05:00
Teknium	2edebedc9e	feat(steer): /steer <prompt> injects a mid-run note after the next tool call (#12116 ) * feat(steer): /steer <prompt> injects a mid-run note after the next tool call Adds a new slash command that sits between /queue (turn boundary) and interrupt. /steer <text> stashes the message on the running agent and the agent loop appends it to the LAST tool result's content once the current tool batch finishes. The model sees it as part of the tool output on its next iteration. No interrupt is fired, no new user turn is inserted, and no prompt cache invalidation happens beyond the normal per-turn tool-result churn. Message-role alternation is preserved — we only modify an existing role:"tool" message's content. Wiring ------ - hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS. - run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(), _apply_pending_steer_to_tool_results(); drain at end of both parallel and sequential tool executors; clear on interrupt; return leftover as result['pending_steer'] if the agent exits before another tool batch. - cli.py: /steer handler — route to agent.steer() when running, fall back to the regular queue otherwise; deliver result['pending_steer'] as next turn. - gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent path strips the prefix and forwards as a regular user message. - tui_gateway/server.py: new session.steer JSON-RPC method. - ui-tui: SessionSteerResponse type + local /steer slash command that calls session.steer when ui.busy, otherwise enqueues for the next turn. Fallbacks --------- - Agent exits mid-steer → surfaces in run_conversation result as pending_steer so CLI/gateway deliver it as the next user turn instead of silently dropping it. - All tools skipped after interrupt → re-stashes pending_steer for the caller. - No active agent → /steer reduces to sending the text as a normal message. 
Tests ----- - tests/run_agent/test_steer.py — accept/reject, concatenation, drain, last-tool-result injection, multimodal list content, thread safety, cleared-on-interrupt, registry membership, bypass-set membership. - tests/gateway/test_steer_command.py — running agent, pending sentinel, missing steer() method, rejected payload, empty payload. - tests/gateway/test_command_bypass_active_session.py — /steer bypasses the Level-1 base adapter guard. - tests/test_tui_gateway_server.py — session.steer RPC paths. 72/72 targeted tests pass under scripts/run_tests.sh. * feat(steer): register /steer in Discord's native slash tree Discord's app_commands tree is a curated subset of slash commands (not derived from COMMAND_REGISTRY like Telegram/Slack). /steer already works there as plain text (routes through handle_message → base adapter bypass → runner), but registering it here adds Discord's native autocomplete + argument hint UI so users can discover and type it like any other first-class command.	2026-04-18 04:17:18 -07:00
Teknium	9527707f80	fix(signal): back off sendTyping spam for unreachable recipients (#12118 ) base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture.	2026-04-18 04:13:32 -07:00
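The cooldown schedule in 9527707f80 (16s, 32s, 60s cap after three consecutive failures) is consistent with a doubling formula like the one below. The formula and the name `typing_cooldown` are assumptions; the commit states only the resulting values.

```python
def typing_cooldown(consecutive_failures, threshold=3, cap=60.0):
    """Seconds to skip sendTyping for; 0.0 means keep sending."""
    if consecutive_failures < threshold:
        return 0.0
    # 3 failures -> 16s, 4 -> 32s, anything beyond is capped at 60s.
    return min(cap, float(2 ** (consecutive_failures + 1)))
```

A successful send would reset the failure counter to zero, so the next outage starts again from the uncooled state.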
teknium1	3b69b2fd61	test(session-search): regression coverage for CJK LIKE fallback Twelve tests under TestCJKSearchFallback guarding: - CJK detection across Chinese/Japanese/Korean/Hiragana/Katakana ranges (including the full Hangul syllables block \uac00-\ud7af, to catch the shorter-range typo from one of the duplicate PRs) - Substring match for multi-char Chinese, Japanese, Korean queries - Filter preservation (source_filter, exclude_sources, role_filter) in the LIKE path — guards against the SQL-builder bug from another duplicate PR where filter clauses landed after LIMIT/OFFSET - Snippet centered on the matched term (instr-based substr window), not the leading 200 chars of content - English fast-path untouched - Empty/no-match cases - Mixed CJK+English queries Also: - hermes_state.py: LIKE-fallback snippet is now `substr(content, max(1, instr(content, ?) - 40), 120)`, centered on the match instead of the whole-content default. Credit goes to @iamagenius00 for the snippet idea in PR #11517. - scripts/release.py: add @iamagenius00 to AUTHOR_MAP so future release attribution resolves cleanly. Refs #11511, #11516, #11517, #11541. Co-authored-by: iamagenius00 <iamagenius00@users.noreply.github.com>	2026-04-18 01:57:57 -07:00
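A small, self-contained sketch of the CJK LIKE fallback from 3b69b2fd61, using the snippet expression quoted in the commit. The table name `messages`, the helper names, and the exact ideograph range are hypothetical; only the Hangul range `\uac00-\ud7af` and the `substr`/`instr` expression come from the commit message.

```python
import re
import sqlite3

# CJK ideographs, Hiragana, Katakana, and the full Hangul syllables block.
CJK_RE = re.compile(r"[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af]")


def contains_cjk(text):
    """True if the query should take the LIKE path instead of full-text search."""
    return bool(CJK_RE.search(text))


def like_search(conn, query):
    """Substring search returning a 120-char snippet centred on the match."""
    sql = (
        "SELECT substr(content, max(1, instr(content, ?) - 40), 120) "
        "FROM messages WHERE content LIKE ?"
    )
    return [row[0] for row in conn.execute(sql, (query, f"%{query}%"))]
```

The sketch does not escape `%` or `_` in the query and omits the source and role filters that the real query preserves.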
Teknium	8322b42c6c	fix(streaming): surface dropped tool-call on mid-stream stall (#12072 ) When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal. Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown. Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content — and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning). Validation: \| Scenario \| Before \| After \| \|---------------------------------------------\|---------------------------\|---------------------------------------------\| \| Stream dies mid tool-call, text already sent \| Silent exit, no indication \| User sees ⚠ warning naming the dropped tool \| \| Text-only partial stream \| Bare recovered text \| Unchanged \| \| tests/run_agent/test_streaming.py \| 24 passed \| 26 passed (2 new) \|	2026-04-18 01:52:06 -07:00
Teknium	285bb2b915	feat(execute_code): add project/strict execution modes, default to project (#11971 ) Weaker models (Gemma-class) repeatedly rediscover and forget that execute_code uses a different CWD and Python interpreter than terminal(), causing them to flip-flop on whether user files exist and to hit import errors on project dependencies like pandas. Adds a new 'code_execution.mode' config key (default 'project') that brings execute_code into line with terminal()'s filesystem/interpreter: project (new default): - cwd = session's TERMINAL_CWD (falls back to os.getcwd()) - python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python with a Python 3.8+ version check; falls back cleanly to sys.executable if no venv or the candidate fails - result : 'import pandas' works, '.env' resolves, matches terminal() strict (opt-in): - cwd = staging tmpdir (today's behavior) - python = sys.executable (today's behavior) - result : maximum reproducibility and isolation; project deps won't resolve Security-critical invariants are identical across both modes and covered by explicit regression tests: - env scrubbing (strips _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIAL, _PASSWD, *_AUTH substrings) - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no delegate_task, no MCP from inside scripts) - resource caps (5-min timeout, 50KB stdout, 50 tool calls) Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool descriptions (regression from commit `39b83f34` where agents on local backends falsely believed they were sandboxed and refused networking). Override via env var: HERMES_EXECUTE_CODE_MODE=strict\|project	2026-04-18 01:46:25 -07:00
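The env-scrubbing invariant listed in 285bb2b915 can be illustrated with a substring filter. The marker list is copied from the commit message; `scrub_env` and the case-insensitive match are assumptions, not the repository's implementation.

```python
SECRET_MARKERS = (
    "_API_KEY", "_TOKEN", "_SECRET", "_PASSWORD",
    "_CREDENTIAL", "_PASSWD", "_AUTH",
)


def scrub_env(env):
    """Return a copy of env without variables whose names look secret-bearing."""
    return {
        key: value
        for key, value in env.items()
        if not any(marker in key.upper() for marker in SECRET_MARKERS)
    }
```

A plain substring match is deliberately broad: in this sketch a name such as `GIT_AUTHOR_NAME` would also be dropped because it contains `_AUTH`.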
Teknium	598cba62ad	test: update stale tests to match current code (#11963 ) Seven test files were asserting against older function signatures and behaviors. CI has been red on main because of accumulated test debt from other PRs; this catches the tests up. - tests/agent/test_subagent_progress.py: _build_child_progress_callback now takes (task_index, goal, parent_agent, task_count=1); update all call sites and rewrite tests that assumed the old 'batch-only' relay semantics (now relays per-tool AND flushes a summary at BATCH_SIZE). Renamed test_thinking_not_relayed_to_gateway → test_thinking_relayed_to_gateway since thinking IS now relayed as subagent.thinking. - tests/tools/test_delegate.py: _build_child_agent now requires task_count; add task_count=1 to all 8 call sites. - tests/cli/test_reasoning_command.py: AIAgent gained _stream_callback; stub it on the two test agent helpers that use spec=AIAgent / __new__. - tests/hermes_cli/test_cmd_update.py: cmd_update now runs npm install in repo root + ui-tui/ + web/ and 'npm run build' in web/; assert all four subprocess calls in the expected order. - tests/hermes_cli/test_model_validation.py: dissimilar unknown models now return accepted=False (previously True with warning); update both affected tests. - tests/tools/test_registry.py: include feishu_doc_tool and feishu_drive_tool in the expected builtin tool set. - tests/gateway/test_voice_command.py: missing-voice-deps message now suggests 'pip install PyNaCl' not 'hermes-agent[messaging]'. 411/411 pass locally across these 7 files.	2026-04-17 21:35:30 -07:00
AviArora02-commits	994faacce8	fix: suppress Authorization: Bearer for Gemini provider to prevent HTTP 400 (#7893)	2026-04-17 21:30:17 -07:00
Teknium	8a59f8a9ed	fix(update): survive mid-update terminal disconnect (#11960 ) hermes update no longer dies when the controlling terminal closes (SSH drop, shell close) during pip install. SIGHUP is set to SIG_IGN for the duration of the update, and stdout/stderr are wrapped so writes to a closed pipe are absorbed instead of cascading into process exit. All update output is mirrored to ~/.hermes/logs/update.log so users can see what happened after reconnecting. SIGINT (Ctrl-C) and SIGTERM (systemd) are intentionally still honored — those are deliberate cancellations, not accidents. In gateway mode the helper is a no-op since the update is already detached. POSIX preserves SIG_IGN across exec(), so pip and git subprocesses inherit hangup protection automatically — no changes to subprocess spawning needed.	2026-04-17 21:29:24 -07:00
Teknium	45acd9beb5	fix(gateway): ignore redelivered /restart after PTB offset ACK fails (#11940 ) When a Telegram /restart fires and PTB's graceful-shutdown `get_updates` ACK call times out ("When polling for updates is restarted, updates may be received twice" in gateway.log), the new gateway receives the same /restart again and restarts a second time — a self-perpetuating loop. Record the triggering update_id in `.restart_last_processed.json` when handling /restart. On the next process, reject a /restart whose update_id <= the recorded one as a stale redelivery. 5-minute staleness guard so an orphaned marker can't block a legitimately new /restart. - gateway/platforms/base.py: add `platform_update_id` to MessageEvent - gateway/platforms/telegram.py: propagate `update.update_id` through _build_message_event for text/command/location/media handlers - gateway/run.py: write dedup marker in _handle_restart_command; _is_stale_restart_redelivery checks it before processing /restart - tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering fresh restart, redelivery, staleness window, cross-platform, malformed-marker resilience, and no-update_id (CLI) bypass Only active for Telegram today (the one platform with monotonic cross-session update ordering); other platforms return False from _is_stale_restart_redelivery and proceed normally.	2026-04-17 21:17:33 -07:00
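The redelivery check from 45acd9beb5 might be sketched like this. The marker shape (`update_id` and `ts` keys) and the function signature are hypothetical; the `<=` comparison, the 5-minute guard, and the fail-open behaviour for malformed markers and missing update IDs follow the commit message.

```python
import time

STALE_AFTER = 5 * 60  # seconds; the commit describes a 5-minute staleness guard


def is_stale_redelivery(update_id, marker, now=None):
    """marker: {'update_id': int, 'ts': float} or None (hypothetical shape)."""
    if update_id is None or not marker:
        return False  # e.g. CLI input, or no previous /restart recorded
    now = time.time() if now is None else now
    try:
        if now - float(marker["ts"]) > STALE_AFTER:
            return False  # an orphaned marker must not block a new /restart
        return update_id <= int(marker["update_id"])
    except (KeyError, TypeError, ValueError):
        return False  # malformed marker: process the command normally
```

Failing open in every doubtful case means the worst outcome of a bad marker is one extra restart, never a /restart that can no longer be issued.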
Teknium	20f2258f34	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907 ) * fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace interrupt() previously only flagged the agent's _execution_thread_id. Tools running inside _execute_tool_calls_concurrent execute on ThreadPoolExecutor worker threads whose tids are distinct from the agent's, so is_interrupted() inside those tools returned False no matter how many times the gateway called .interrupt() — hung ssh / curl / long make-builds ran to their own timeout. Changes: - run_agent.py: track concurrent-tool worker tids in a per-agent set, fan interrupt()/clear_interrupt() out to them, and handle the register-after-interrupt race at _run_tool entry. getattr fallback for the tracker so test stubs built via object.__new__ keep working. - tools/environments/base.py: opt-in _wait_for_process trace (ENTER, per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1. - tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target tid, set snapshot) behind the same env flag. - tests: new regression test runs a polling tool on a concurrent worker and asserts is_interrupted() flips to True within ~1s of interrupt(). Second new test guards clear_interrupt() clearing tracked worker bits. Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env subset 216 pass. * fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when quiet_mode=True (the CLI default). This would silently swallow every INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation added in the parent commit — confirmed by running hermes chat -q with the flag and finding zero trace lines in agent.log even though _wait_for_process was clearly executing (subprocess pid existed). 
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets its own logger level to INFO at import time, overriding the 'tools' parent-level filter. Scoped to the opt-in case only, so production (quiet_mode default) logs stay quiet as designed. Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes '_wait_for_process ENTER/EXIT' lines to agent.log as expected. * fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses Tool subprocesses spawned by the local environment backend use os.setsid so they run in their own process group. Before this fix, SIGTERM/SIGHUP to the hermes CLI killed the main thread via KeyboardInterrupt but the worker thread running _wait_for_process never got a chance to call _kill_process — Python exited, the child was reparented to init (PPID=1), and the subprocess ran to its natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM to the agent until manual cleanup). Changes: - cli.py _signal_handler (interactive) + _signal_handler_q (-q mode): route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll loop sees the per-thread interrupt flag and calls _kill_process (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default 1.5s) gives the worker time to complete its SIGTERM+SIGKILL escalation before KeyboardInterrupt unwinds main. - tools/environments/base.py _wait_for_process: wrap the poll loop in try/except (KeyboardInterrupt, SystemExit) so the cleanup fires even on paths the signal handlers don't cover (direct sys.exit, unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace line when HERMES_DEBUG_INTERRUPT=1. - New regression test: injects KeyboardInterrupt into a running _wait_for_process via PyThreadState_SetAsyncExc, verifies the subprocess process group is dead within 3s of the exception and that KeyboardInterrupt re-raises cleanly afterward. 
Validation:

| Before | After |
|---|---|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |

	2026-04-17 20:39:25 -07:00
Teknium	607be54a24	fix(discord): forum channel media + polish Extend forum support from PR #10145: - REST path (_send_discord): forum thread creation now uploads media files as multipart attachments on the starter message in a single call. Previously media files were silently dropped on the forum path. - Websocket media paths (_send_file_attachment, send_voice, send_image, send_animation — covers send_image_file, send_video, send_document transitively): forum channels now go through a new _forum_post_file helper that creates a thread with the file as starter content, instead of failing via channel.send(file=...) which forums reject. - _send_to_forum chunk follow-up failures are collected into raw_response['warnings'] so partial-send outcomes surface. - Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids GET /channels/{id} on every uncached send after the first. - Dedup of TestSendDiscordMedia that the PR merge-resolution left behind. - Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md. Tests: 117 passed (22 new for forum+media, probe cache, warnings).	2026-04-17 20:25:48 -07:00
ChimingLiu	e5333e793c	feat(discord): support forum channels	2026-04-17 20:25:48 -07:00
helix4u	148459716c	fix(kimi): cover remaining fixed-temperature bypasses	2026-04-17 20:25:42 -07:00


2075 commits