hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-26 17:38:36 +00:00

Author	SHA1	Message	Date
BarnacleBoy	550b72dd87	fix(cli): gate tool-rendering paths with tool_progress_mode, not quiet_mode quiet_mode was being used to suppress tool-result display when tool_progress_mode was 'off'. But quiet_mode also gates operational status messages, so users with /verbose + tool-progress off lost all status output. Adds a dedicated tool_progress_mode attribute to AIAgent; the tool_executor result-rendering path gates on tool_progress_mode != 'off'. The CLI passes its tool_progress_mode through agent setup and the tool-progress cycle command syncs it onto the live agent. Fixes #33860.	2026-06-08 11:29:53 -07:00
Robert Ban	4129092fda	fix(cli): strip OSC 8 hyperlink sequences in ChatConsole output prompt_toolkit's ANSI parser does not handle OSC escape sequences (\x1b]...\x07 / \x1b]...\x1b\), which caused Rich's [link=...] markup to leak raw OSC 8 payload into the banner title after /clear. Added _OSC_ESCAPE_RE to strip OSC sequences in ChatConsole.print() before routing through _cprint(). CSI/SGR color sequences are preserved. Visible text between OSC sequences is kept intact.	2026-06-08 11:29:53 -07:00
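The OSC-stripping idea in the commit above can be sketched as follows. This is a minimal illustration, not the commit's code: the pattern `OSC_ESCAPE_RE` and the helper `strip_osc` are hypothetical names, and the real `_OSC_ESCAPE_RE` may be written differently. It assumes an OSC sequence starts with `ESC ]` and ends with either BEL (`\x07`) or ST (`ESC \`).

```python
import re

# Hypothetical pattern; the commit's actual _OSC_ESCAPE_RE is not shown in the log.
# Matches ESC ] <payload> terminated by BEL (\x07) or ST (ESC \).
OSC_ESCAPE_RE = re.compile(r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")


def strip_osc(text: str) -> str:
    """Remove OSC sequences; CSI/SGR sequences (ESC [ ...) are left untouched."""
    return OSC_ESCAPE_RE.sub("", text)


# An OSC 8 hyperlink wrapped around bold text (the bold part is a CSI/SGR sequence).
linked = "\x1b]8;;https://example.com\x1b\\\x1b[1mHermes\x1b[0m\x1b]8;;\x07"
print(repr(strip_osc(linked)))
```

Because the pattern requires `]` right after ESC, color sequences such as `ESC [1m` never match, so the visible text and its styling survive.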
liuhao1024	8e4c447e5f	fix(gateway): prevent duplicate user messages in state.db When the agent has its own SessionDB reference (_session_db is not None), _flush_messages_to_session_db() persists user messages to SQLite during the agent run. Two gateway fallback paths also wrote the same user message without skip_db=True, creating duplicate entries in state.db: 1. agent_failed_early path (transient 429/timeout failures) 2. not-new-messages path (history_offset >= len(messages) edge case) Move agent_persisted flag definition to before the if/elif/else block so all paths can use it, and pass skip_db=agent_persisted to every fallback append_to_transcript() call. Fixes #42039	2026-06-08 11:29:53 -07:00
brooklyn!	9b1e0d6f70	feat(desktop): assignable themes per profile (#42286 ) * feat(desktop): assignable themes per profile The desktop skin was a single global preference, so every profile shared one look. Make the theme assignment per profile: picking a theme assigns it to the profile that's currently live, and switching profiles paints that profile's own skin. A profile with no assignment inherits the global default, so single-profile installs and existing setups are unchanged. - themes/context.tsx: per-profile skin record in localStorage; ThemeProvider follows $activeGatewayProfile; boot paint uses the last active profile's theme to avoid a flash on a non-default relaunch; setTheme assigns to the live profile (default profile also seeds the legacy global fallback). - settings/appearance-settings.tsx: caption noting the theme is saved per profile, shown only when more than one profile exists. - i18n: themeProfileNote string across en/zh/zh-hant/ja. - themes/profile-theme.test.ts: resolution + inheritance coverage. * feat(desktop): make light/dark mode per profile too The command palette / theme picker sets skin + mode together on each pick, so leaving mode global meant a profile couldn't actually remember the full look it was given (e.g. "Ember Dark" in one profile would render Ember Light if another profile last flipped the global mode). Mirror the per-profile skin record for light/dark mode: ThemeProvider resolves and applies the active profile's mode on switch, the boot paint uses it, and setMode assigns to the live profile (default profile also seeds the legacy global mode fallback). * refactor(desktop): collapse per-profile skin/mode into one helper Skin and mode were near-identical resolve/assign pairs with hand-rolled try/catch around localStorage. Fold both into a single profilePref<T> factory (resolve + assign, default profile seeds the legacy global) and lean on storedString/persistString for the error-swallowing. 
Tests go table-driven over both prefs since they share one contract. No behavior change; -89 LOC. * refactor(desktop): treat default profile as the global slot directly "default" isn't a real profile — it is the legacy global value. Stop double-writing (record['default'] + global) on assign; route default straight to the global. resolve is unchanged: a profile with no record entry already falls back to the global, so default reads it for free.	2026-06-08 17:42:17 +00:00
brooklyn!	395ed91891	fix(desktop): keep a just-finished session visible after switching away (#42285 ) A brand-new session's first turn persists to the SessionDB a beat after the gateway emits message.complete, so a refresh fired in that window gets a listSessions(min_messages=1) page that omits the new row. sessionsToKeep() already shields the active chat from this race, but a session you started and then navigated away from is — at the next refresh — neither working, pinned, nor active, so mergeSessionPage() evicts it. Nothing re-fetches afterward, so it stays gone until the app restarts. Track sessions whose turn just settled (a real working->idle transition) in a short, auto-expiring grace window and add them to the merge keep-set. This bridges the persist race for non-active chats without resurrecting deleted rows (mergeSessionPage only revives rows still in the in-memory list, which optimistic delete/archive already drop). Repro: start a new chat, send a message, then click another session before the reply lands — the new session vanishes from the sidebar.	2026-06-08 12:32:27 -05:00
kshitij	a38003be3d	Merge pull request #42143 from kshitijk4poor/salvage/tui-slash-worker-leak-35626	2026-06-08 10:07:18 -07:00
teknium1	365813a72b	fix: resolve rebase conflict in _teardown_session worker cleanup Main folded slash_worker.close() into _finalize_session (the single _finalized-guarded chokepoint) while #42143 was open. The rebase conflicted with the PR's worker-close in _teardown_session. Keep both — they target the same #38095 leak and _SlashWorker.close() is idempotent (_closed/poll()-guarded) — so callers reaching _teardown_session without the real _finalize_session (and the PR's own tests, which monkeypatch _finalize_session out) still reap the worker. Same for _shutdown_sessions, now routed through the unified _close_session_by_id funnel.	2026-06-08 10:02:05 -07:00
firefly	ae94ed1728	fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main) Salvaged from #35626 (banditburai) and re-scoped after maintainers landed the parent-death watchdog (slash_worker.py) and PTY process-group teardown (pty_bridge.py) directly on main. Those pieces are intentionally NOT included here — this carries only what is still missing: - C1 disconnect reap: ws.py's `finally` only re-pointed the dead transport at stdio. `_close_sessions_for_transport` now reaps `close_on_disconnect` sessions and schedules the grace-reap for the rest, offloaded via `asyncio.to_thread` so the blocking worker.close() + DB write never stalls the uvicorn loop. - C2 create/close orphan race: `_attach_worker` stores the worker iff `_sessions.get(sid) is session` under the lock (else closes it), applied at every spawn site incl. the post-turn `_restart_slash_worker`. - Single idempotent teardown funnel: session.close, WS disconnect, the generous-TTL idle reaper, shutdown, and the WS grace-reap all reach `_close_session_by_id` → `_teardown_session`; `_finalized`/`_closed` flags make concurrent/double teardown a no-op. `_sessions_lock` upgraded to RLock. - uvicorn `ws_ping_interval/timeout=20s` so a half-open socket (reverse-proxy 524) becomes a `WebSocketDisconnect` and the C1 path runs. Plus two review-driven hardening fixes (mine): - `session.active_list` now skips `_finalized` sessions so the footer "N sessions" count reflects attachable sessions instead of only ever growing until restart (#38950). Keys on `_finalized` only, NOT the stdio sentinel, so a standalone `hermes --tui` session stays visible. - `_schedule_ws_orphan_reap._reap` pops via `_close_session_by_id` (under `_sessions_lock`) instead of `_sessions.pop` under the unrelated `_session_resume_lock` (#39591); the resume_lock now only guards the orphan re-check against `session.resume`. 
- Float env knobs (`HERMES_SLASH_WATCHDOG_*`, `HERMES_TUI_SESSION_TTL_S`) parse with a fallback helper so a malformed value can't crash the worker at import. Fixes #32377 Fixes #38950 Addresses #22855 Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com> Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-08 10:02:05 -07:00
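The "parse with a fallback helper" item above could look roughly like this. The helper name `env_float` and the default of 3600.0 are assumptions made for illustration; only the environment variable name comes from the commit message.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back on a missing or malformed value."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # A malformed value must not raise at import time; use the default instead.
        return default


# Hypothetical default; the real TTL default is not stated in the log.
SESSION_TTL_S = env_float("HERMES_TUI_SESSION_TTL_S", 3600.0)
```

Reading the knob through a helper like this keeps a typo in the environment from crashing the worker at import.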
Teknium	9c9d9113a8	fix(auth): auto-detect OpenRouter credential from the pool, not just env (#42263) resolve_provider() auto-detection only checked OPENROUTER_API_KEY/OPENAI_API_KEY env vars, never the credential pool. A key added via `hermes auth add openrouter` (manual pool entry, no env var) was invisible: the provider failed to resolve or resolved with an empty api_key, so requests went out with no Authorization header and OpenRouter returned "HTTP 401: Missing Authentication header" while `hermes auth list` showed the credential. Closes #42130. - auth.py: check load_pool("openrouter").has_credentials() after the env check - dump.py: `debug share` shows 'openrouter set (auth pool)' instead of the misleading 'not set' when the key lives in the pool - add regression tests (pool credential auto-detects; empty pool still raises)	2026-06-08 10:01:47 -07:00
brooklyn!	de80d28f38	fix(desktop): require session ids for scoped gateway events (#42178) * fix(desktop): require session ids for scoped gateway events Drop unscoped stream, tool, and subagent events in the desktop renderer so async activity cannot attach to whichever chat is currently focused. * fix(desktop): preserve unscoped session info events Keep session.info out of the scoped-event drop list so global desktop runtime broadcasts still initialize UI state before a session is active.	2026-06-08 09:50:48 -07:00
teknium1	a77efada5f	refactor(cli): extract 18 model-flow wizard functions into model_setup_flows (god-file Phase 2) Lift the 18 _model_flow_* provider-setup wizard functions out of hermes_cli/main.py into hermes_cli/model_setup_flows.py. Behavior-neutral; main.py 14050 -> 11479 LOC. select_provider_and_model (the dispatcher) STAYS in main.py and re-imports the flows via an explicit 'from hermes_cli.model_setup_flows import (...)' block, so both its bare-name calls and existing test monkeypatches targeting hermes_cli.main._model_flow_* keep resolving against main's namespace unchanged. Imports: 3 neutral deps (argparse, os, subprocess) at the module top; the 14 main.py-internal helpers the flows call (_prompt_api_key, _save_custom_provider, the reasoning-effort/stepfun/qwen helpers, _run_anthropic_oauth_flow, ...) are lazy-imported per-flow (from hermes_cli.main import ...) so the new module never imports main at module scope -> no import cycle. Repointed one source-inspection change-detector (test_setup_ollama_cloud_force_refresh) to read the module the ollama-cloud branch moved to. Validation: 6563/6563 hermes_cli tests pass; live flow-dispatch probe confirms the lazy main-internal imports resolve at runtime.	2026-06-08 09:42:44 -07:00
teknium1	55b83c3d99	refactor(agent): extract run_conversation post-loop tail into finalize_turn (god-file Phase 1) Lift the post-loop finalization tail out of run_conversation into agent/turn_finalizer.py:finalize_turn. Behavior-neutral; run_conversation 4204 -> 3846 LOC, conversation_loop.py 4578 -> 4220. The region (everything after the main tool-calling while loop): budget-exhaustion summary, trajectory save, session persist, turn diagnostics, response transforms, result-dict assembly, steer drain, and the memory/skill review trigger. Lifted verbatim into a synchronous single-return free function; the 12 post-loop locals it reads are passed as keyword args and the assembled result dict is returned to run_conversation (which returns it to the caller). All agent.* side effects fire exactly as before. Imports: os + _summarize_user_message_for_log at module top; logger imported lazily from agent.conversation_loop (preserves the 'agent.conversation_loop' logger name, no import cycle). Validation: 1609/1609 tests/run_agent/ pass; live PTY agent turn PASS.	2026-06-08 09:42:23 -07:00
teknium1	a706a349b5	refactor(gateway): extract authorization cluster into GatewayAuthorizationMixin (god-file Phase 3) Lift the 4 inbound-message authorization methods out of GatewayRunner into gateway/authz_mixin.py:GatewayAuthorizationMixin. Behavior-neutral; gateway/run.py 16200 -> 15812 LOC. Methods moved (~389 LOC): _is_user_authorized, _get_unauthorized_dm_behavior, _adapter_dm_policy, _adapter_enforces_own_access_policy. The two adapter-policy helpers are private to _is_user_authorized, so the cluster is fully self-contained (zero outside-cluster self.method calls after the lift). All self.* calls resolve unchanged via the MRO (GatewayRunner(GatewayAuthorizationMixin, ...)). Import split: 6 neutral deps (os, Optional, Platform, SessionSource, the two whatsapp_identity helpers) at the mixin module top; the module-level logger is imported lazily inside _is_user_authorized (from gateway.run import logger) so the mixin never imports gateway.run at module scope -> no cycle. The lazy import preserves the exact logger name (gateway.run) so log records are unchanged.	2026-06-08 09:42:02 -07:00
teknium1	094aa85c37	refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) Lift the 5 agent-construction/session-resume methods out of HermesCLI into hermes_cli/cli_agent_setup_mixin.py:CLIAgentSetupMixin. Behavior-neutral; cli.py 14139 -> 13492 LOC. Methods moved (~647 LOC): _ensure_runtime_credentials, _resolve_turn_agent_config, _init_agent, _preload_resumed_session, _display_resumed_history. All self.* calls resolve unchanged via the MRO (HermesCLI(CLIAgentSetupMixin, CLICommandsMixin)). Import split (same recipe as #41942): 2 neutral deps (sys, _escape) imported at the mixin module top; 12 cli.py-internal helpers/constants (AIAgent, ChatConsole, CLI_CONFIG, _cprint, _DIM, _RST, _accent_hex, ...) imported lazily per-method (from cli import ...) so the mixin never imports cli at module scope -> no cycle. Repointed one source-inspection change-detector (test_callable_api_key.py) to read the mixin file where the method now lives.	2026-06-08 09:41:34 -07:00
qWait	cef00ae602	fix(tui): handle Windows PTY stdin and detached WS frames (#41953 ) Two narrow Windows desktop fixes: 1. tools/process_registry.py — PTY stdin writes are now platform-aware. pywinpty (Windows) expects str; ptyprocess (POSIX) expects bytes. Previously bytes was unconditionally passed, producing a TypeError on Windows ("'bytes' object cannot be converted to 'PyString'"). 2. tui_gateway/server.py + ws.py — Detached WebSocket sessions now park on a _DropTransport sink instead of _stdio_transport. In the desktop the gateway runs in-process and stdout is captured by Electron into desktop.log, so falling back to stdio leaked raw JSON-RPC frames into the desktop log after WS disconnects. Orphan-reap semantics are preserved via _ws_session_is_orphaned. Verified on a Windows desktop install: - pywinpty 2.0.15 rejects bytes / accepts str — reproduced exactly - Focused suite green (write_stdin × 2, write_json_drops_detached_ws_frames, ws_orphan_reap × 2) - All 6 CI test shards green, e2e green, nix (ubuntu/macos) green Salvage commit (`21be7ca`) fixes the new test referencing an undefined _ThreadUnsafeStdout — uses the existing _ChunkyStdout helper.	2026-06-08 09:41:20 -07:00
Teknium	74744795af	docs(tui): correct HERMES_TUI_GATEWAY_URL — dashboard-internal, not remote-attach (#42162 ) The TUI docs presented HERMES_TUI_GATEWAY_URL + /api/ws as a supported 'attach the TUI to a standalone running gateway' workflow. It isn't. /api/ws exists only inside the dashboard's FastAPI server (hermes_cli/web_server.py), which spawns its own embedded TUI child and injects the var as an internal wiring detail. The OpenAI-compat API server (api_server platform) deliberately does not serve /api/ws, so the documented ws://host:port/api/ws workflow 404s — the cause of #32882 and the two PRs (#32904, #32955) that tried to add the route to the wrong surface. Rewrites the section in en + zh-Hans to describe the var accurately and point users at shared state.db / dashboard embedded chat for multi-surface session sharing.	2026-06-08 09:37:03 -07:00
Teknium	399b8ee5f0	fix(anthropic): strip Responses-only kwargs before Messages SDK call (#31673) (#42155) A Responses-API-shaped payload carrying instructions=/input=/store=/parallel_tool_calls= can reach the native Anthropic messages.stream() / messages.create() call under a rare api_mode-flip race (e.g. a concurrent auxiliary vision call mutating a shared agent between the kwargs build and the stream dispatch). The Anthropic SDK rejects these with a non-retryable TypeError that kills the whole turn and propagates the entire fallback chain. Add sanitize_anthropic_kwargs() at both Anthropic dispatch sites: it drops the Responses-only keys in place and logs a WARNING (with #31673 breadcrumb) when one is present, so the underlying race stays visible in the wild instead of being silently papered over.	2026-06-08 09:36:38 -07:00
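A rough sketch of the sanitizing step described above, under stated assumptions: the function name `sanitize_kwargs`, its signature, and the log wording are illustrative and are not the repository's `sanitize_anthropic_kwargs()`. The four key names are the ones listed in the commit message.

```python
import logging

logger = logging.getLogger(__name__)

# Responses-API-only keys named in the commit message.
RESPONSES_ONLY_KEYS = ("instructions", "input", "store", "parallel_tool_calls")


def sanitize_kwargs(kwargs: dict) -> dict:
    """Drop Responses-only keys in place and warn so the underlying race stays visible."""
    dropped = [key for key in RESPONSES_ONLY_KEYS if key in kwargs]
    for key in dropped:
        del kwargs[key]
    if dropped:
        logger.warning("Dropped Responses-only kwargs before Messages call: %s", dropped)
    return kwargs


payload = {"model": "example-model", "messages": [], "store": True, "input": "hi"}
sanitize_kwargs(payload)
print(sorted(payload))
```

Mutating the dict in place means a call site needs no other change, and the warning keeps the symptom observable rather than hiding it.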
Teknium	47d5177a7d	fix(plugins): thread-safe lazy-singleton helpers; fix honcho TOCTOU (#24759) (#42150) * fix(plugins): add thread-safe lazy-singleton helpers, fix honcho TOCTOU (#24759) get_honcho_client() and fal's _load_fal_client() used unlocked check-then-init: racing threads both ran the expensive build and the loser's client (open connection) leaked. Rather than one-off locks, add plugins/plugin_utils.py with two reusable primitives every plugin author can drop in: - lazy_singleton: decorator for zero-arg accessors - SingletonSlot: manual slot for config-keyed accessors (first wins) Both use double-checked locking; factory runs at most once; failed builds aren't cached. honcho is the reference consumer; fal's sibling TOCTOU gets a matching double-checked lock. Plugin dev guide documents the pattern so future plugins don't reintroduce the race. Closes #24759 * test(honcho): update reset test for SingletonSlot internals test_reset_clears_singleton poked the removed _honcho_client module global directly. Assert through the slot's public peek() surface instead, matching the #24759 refactor.	2026-06-08 09:35:22 -07:00
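The double-checked locking pattern named above might be sketched like this. It is a hypothetical re-implementation for illustration only: the decorator below shares the name `lazy_singleton` with the commit's helper, but its body is an assumption, and `get_client` is a stand-in for an expensive client factory.

```python
import functools
import threading
from typing import Callable, TypeVar

T = TypeVar("T")
_UNSET = object()  # sentinel: "not built yet" (None could be a valid value)


def lazy_singleton(factory: Callable[[], T]) -> Callable[[], T]:
    """Wrap a zero-arg factory so it runs at most once, even under thread races."""
    lock = threading.Lock()
    value = _UNSET

    @functools.wraps(factory)
    def accessor() -> T:
        nonlocal value
        if value is _UNSET:              # first check: lock-free fast path
            with lock:
                if value is _UNSET:      # second check: under the lock
                    value = factory()    # if this raises, nothing is cached
        return value

    return accessor


build_count = []


@lazy_singleton
def get_client() -> object:
    build_count.append(1)  # stands in for an expensive client build
    return object()


threads = [threading.Thread(target=get_client) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The second check under the lock is what prevents the losing thread from building (and leaking) a second client after it waited for the winner.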
yoniebans	74239b4942	i18n(desktop): translate backend update apply status messages Two independent reviewers flagged that applyBackendUpdate's in-progress and error messages were inline English while the rest of the update overlay is i18n'd. Move them into updates.applyStatus (preparing/pulling/restarting/ notAvailable/failed/noReturn) across en, ja, zh, zh-hant + types.	2026-06-08 08:58:26 -07:00
yoniebans	b000e05b11	fix(desktop): don't claim the backend update succeeded when it never returns The no-return error said 'Backend updated but did not come back online' — but once the connection drops the client can't know the update's exit code, only that it was started and the backend is unreachable. Reword to not overclaim: the update may not have completed.	2026-06-08 08:58:26 -07:00
yoniebans	cd030f5f40	fix(desktop): close the backend update overlay on success; error on no-return Three rough edges in the remote backend apply flow: - On success the overlay dropped to IDLE, briefly re-rendering the pre-install 'update available' view and then the generic 'you're all set' before settling. Close the overlay outright once the backend is confirmed back instead of bouncing through the idle view. - If the backend never came back (a failed restart), the flow still reported success. waitForBackendReturn now returns whether the backend answered; finishBackendApply surfaces an error when it didn't. - The up-to-date copy said 'you're running the latest version', conflating client and backend. Backend target now reads 'the backend is running the latest version' — the client's own version is a separate pill.	2026-06-08 08:58:26 -07:00
yoniebans	81647458c7	fix(desktop): recover the backend update overlay after the remote restarts The backend Install path set stage:'restart' and stopped — in remote mode no boot-progress events arrive to carry the overlay to done, so it sat on the restarting spinner until a manual reload while the backend had already come back. Poll the backend until it answers again, then clear the overlay and refresh the backend status. Target-aware applying copy explains the remote restart + auto-reconnect instead of the local-updater-window wording. Also switch the apply poll sleeps from window.setTimeout to globalThis.setTimeout so the flow is exercisable off the renderer.	2026-06-08 08:58:26 -07:00
yoniebans	9b2a64fa6a	fix(desktop): reflect env-override remote in gateway connection state HERMES_DESKTOP_REMOTE_URL forces a remote connection but never writes connection.json, so the gateway panel read mode/url from persisted config and mislabelled an env-remote session as local with no url.	2026-06-08 08:58:26 -07:00
yoniebans	47518bc913	fix(desktop): check backend updates when the connection becomes remote The poller starts at mount, before the gateway connects, so its initial checkBackendUpdates() ran while mode was still unset and no-op'd via the remote-mode guard — leaving the backend button empty until the user clicked it. Subscribe to $connection and re-check the backend when mode resolves to remote.	2026-06-08 08:58:26 -07:00
yoniebans	cfaa46fcae	fix(desktop): pre-check backend updates in poller; client button first Two follow-ups from testing the two-button bar: - The background poller and focus handler only checked the client, so the backend behind-count and changelog stayed empty until the user opened the overlay — and the overlay's first render then hit the empty-commits fallback ('Improvements and fixes') instead of the real changelog. Check the backend alongside the client on poller start, interval, and focus so its state is ready before the button is clicked. - Order the status bar client-first, backend-second.	2026-06-08 08:58:26 -07:00
yoniebans	56be1a63a3	fix(desktop): split client and backend into two distinct update buttons The status bar merged both versions into one pill with a single click target, so there was no way to tell which artifact an update acted on — and the apply path was overloaded by connection mode. Separate them: - store: independent client (checkUpdates/applyUpdates) and backend (checkBackendUpdates/applyBackendUpdate) flows with their own status/apply atoms; openUpdateOverlayFor(target) drives the overlay. - status bar: two buttons — client vX (always) and backend vY (+N) (remote only), each with its own behind-count, opening the overlay for its target. - overlay: reads the active target's atoms; install/check route per target. Removes the version-bar merge helper (no longer merging the two versions).	2026-06-08 08:58:26 -07:00
yoniebans	9c264555b0	fix(desktop): name the update target in the overlay; honest no-changelog copy The updates overlay showed generic 'New update available / improvements and fixes' with no indication of whether it was updating the client or the backend. In remote mode it now reads 'Backend update available' and names the connected backend, and when there's no commit changelog (e.g. pip/non-git backend) it degrades to honest 'release notes aren't available for this install type' copy instead of filler. Copy selection extracted to a pure resolveUpdateCopy() helper (unit-tested); threads target ('client'\|'backend') from connection.mode through the overlay.	2026-06-08 08:58:26 -07:00
yoniebans	87ac7cac13	fix(dashboard): log update changelog against origin/main, not @{upstream} The behind-count (banner._check_via_local_git) measures HEAD..origin/main, but _recent_upstream_commits logged HEAD..@{upstream}. On a feature-branch checkout @{upstream} is the branch's own tip (0 commits), so the changelog came back empty while behind>0 — the overlay then showed generic filler instead of what changed. Pin the commit range to origin/main so count and changelog agree. Verified against a checkout 11 behind origin/main: now returns 11 commits.	2026-06-08 08:58:26 -07:00
yoniebans	64da518db4	feat(desktop): remote update overlay sourced from backend In remote mode, checkUpdates()/applyUpdates() branch on connection.mode and drive the existing updates overlay from the connected backend instead of the local Electron git bridge: - checkUpdates -> GET /api/hermes/update/check, mapped onto DesktopUpdateStatus (behind, commits, supported=can_apply, message). The overlay renders the commit list as 'what's changed' and shows guidance (not Install) when the backend install can't self-apply (docker/nix). - applyUpdates -> POST /api/hermes/update (the proven command-center path), polling the action to completion and handling the expected mid-update connection drop as the restart phase. Local mode is unchanged. Adds checkHermesUpdate() to hermes.ts and a BackendUpdateCheckResponse type.	2026-06-08 08:58:26 -07:00
yoniebans	ed1e2533b7	feat(desktop): show client and backend versions in status bar when remote In remote thin-client mode the Electron client and the backend it connects to are separate installs that drift independently. The status bar previously showed only the client version, hiding skew (e.g. client 0.15.1 talking to backend 0.16.0 looked fine). Add a pure resolveVersionBar() helper (unit-tested) that, gated on connection.mode === 'remote', renders both 'client vX · backend vY' from the desktop appVersion and StatusResponse.version, and flags skew. Local mode is byte-identical to before. Wire it into the status-bar version item.	2026-06-08 08:58:26 -07:00
yoniebans	2284147044	docs: document commits field on /api/hermes/update/check	2026-06-08 08:58:26 -07:00
yoniebans	9e360681f8	feat(dashboard): return recent commits from /api/hermes/update/check Add a best-effort `commits` list (sha/summary/author/at) to the update-check response for git/pip installs that are behind upstream, so the desktop's remote update overlay can show what's changed before applying. Additive and non-breaking: existing consumers (legacy dashboard, tests using subset assertions) ignore the new field. Leaves the shared check_for_updates() int contract untouched — commits come from a separate best-effort git call.	2026-06-08 08:58:26 -07:00
Teknium	fd1e7c2bc3	fix(tui): install the process.on('exit') terminal-mode backstop (#42165 ) #19194's fix added process.exit(0) to die()/dieWithCode() with a comment relying on a process.on('exit') handler in entry.tsx that resets terminal modes — but that handler was never installed. So /quit, Ctrl+C, Ctrl+D and every process.exit() path left DEC mouse tracking (?1000/1002/1003/1006) armed in the parent shell. The terminal then kept emitting mouse reports into stdin — read as keystrokes by the shell or a freshly relaunched TUI — surfacing as ...;...M garbage in the input box. Install the missing handler. 'exit' fires once on real termination and runs synchronous code only; resetTerminalModes() writes via writeSync, so the disable sequence lands before the process is gone. Fixes #28419	2026-06-08 08:21:19 -07:00
Siddharth Balyan	7230fcb7f2	revert(nix): drop the cp patchPhase workaround from #41867 (#42151 ) #41867 replaced mkNpmPassthru's patchPhase with `cp $npmDeps/package-lock.json package-lock.json`, on the theory that prefetch-npm-deps strips advisory fields (engines/os/cpu) from the cache lockfile. That diagnosis was wrong. prefetch-npm-deps copies the lockfile into the cache verbatim (prefetch-npm-deps/src/main.rs reads it and writes it unchanged). Building the cache fresh from the current root lockfile yields exactly the pinned npmDepsHash, and that cache's package-lock.json is byte-identical to the source (740 "engines" blocks on each side). With the hash correct, npmConfigHook's consistency check passes on its own — verified by building .#tui and .#default green with this (original) patchPhase. So the cp was unnecessary, and worse: it bypasses the consistency check wholesale, silently masking a genuinely stale npmDepsHash (a lockfile that changed without its hash being refreshed) instead of failing loudly. The original patchPhase keeps the check meaningful while still handling the one real cosmetic difference it was written for (trailing newlines); stale-hash drift is caught by the npmDepsHash itself plus the auto-fix workflow. Keeps the fix-lockfiles real-build verification and the nix-lockfile-fix.yml file-path fix from #41867 — only the patchPhase cp is reverted.	2026-06-08 20:29:41 +05:30
Siddharth Balyan	4219a91df5	fix(nix): make config.yaml group-writable under addToSystemPackages (#41940 ) addToSystemPackages exports HERMES_HOME system-wide and puts the hermes CLI on interactive users' PATH, so those users (in the hermes group) share the gateway's state — that's the option's whole purpose. But the activation script wrote config.yaml as 0640 (group read-only), so an interactive user saving a setting via the CLI/TUI hit: error: [Errno 13] Permission denied: '/var/lib/hermes/.hermes/config.yaml' Make the mode conditional: 0660 when addToSystemPackages is set (group hermes can write), else the previous 0640. .env stays 0640 either way — it holds secrets, not user-facing settings. The config merge already preserves user-added keys across rebuilds, so this simply lets interactive hermes-group users actually make those edits. Verified by evaluating the module's activation script for both option values: addToSystemPackages=true -> chmod 0660, false -> chmod 0640.	2026-06-08 20:10:47 +05:30
Teknium	a3fca26c56	fix(tui): close slash_worker inside _finalize_session (defense-in-depth, #38095 ) (#42149 ) Fold the slash-worker subprocess close into _finalize_session itself — the single _finalized-guarded session-end chokepoint — instead of relying on each caller (_teardown_session, _shutdown_sessions) to close it separately. A future code path that finalizes a session directly can no longer reintroduce the #38095 worker leak. Idempotent: _SlashWorker.close() is poll()-guarded and _finalize_session short-circuits on _finalized, so the existing teardown paths are unaffected. Drops the now-redundant separate close() in _shutdown_sessions. Note: the active leak this issue reported was already fixed on main (WS-orphan reaper #38591, _restart_slash_worker close, atexit shutdown). This addresses the residual defense-in-depth gap the reporter correctly identified in their follow-up comment.	2026-06-08 07:26:05 -07:00
Teknium	5e06c9ffef	fix(agent): clear _session_messages in AIAgent.close() (#42123 ) close() is the hard teardown for true session boundaries (/new, /reset, session expiry). It already closes the OpenAI client and child agents but left the conversation-history list intact. Mirror the soft-eviction path (_release_evicted_agent_soft clears _session_messages) so a held reference to a closed agent — e.g. a draining background task — doesn't pin tens of MB of tool outputs until the agent object itself is collected.	2026-06-08 07:03:39 -07:00
teknium1	cb13723f53	fix(pty-bridge): mark os.killpg/getpgid windows-footgun-ok (POSIX-only module)	2026-06-08 07:03:12 -07:00
teknium1	8cb1908e18	chore: map paulb26 in AUTHOR_MAP for #24135 salvage	2026-06-08 07:03:12 -07:00
firefly	8b6a8f667d	feat(slash-worker): self-terminate on parent death via create_time watchdog Daemon thread polls _is_orphaned (original ppid check + psutil create_time PID-reuse guard, no PR_SET_PDEATHSIG). On orphan, drains an in-flight command up to a grace window then os._exit(0). Started before the HermesCLI build to cover the spawn window. Task: swl-qrf.8	2026-06-08 07:03:12 -07:00
paulb26	b31c6c33b2	fix(pty-bridge): terminate PTY process groups on teardown	2026-06-08 07:03:12 -07:00
Teknium	e9c1e757fe	fix(gateway): release evicted agent clients to stop RSS leak (#29298 ) (#41974 ) _evict_cached_agent (the chokepoint for /new, /model, /undo, session resets — 17 call sites) only popped the cache entry, dropping the AIAgent reference without releasing its httpx client pool. AIAgent holds reference cycles (callbacks, tool state) so CPython refcounting does not free the client promptly; under steady gateway traffic the held sockets + buffers accumulate and RSS climbs (the leak class behind Now the chokepoint pops AND schedules a soft release_clients() on a daemon thread (mirrors the cap-enforcer / idle-sweeper). Soft release frees the client pool + per-turn child subagents but preserves the session's terminal sandbox / browser / bg processes for resumption. Mid-turn agents are skipped so a running request is never torn down. Also fixes the no-lock branch which previously never popped at all.	2026-06-08 06:44:51 -07:00
Michael Steuer	3d029a53ec	fix(gateway): close residual memory-leak sites under heavy scheduled workload Long-lived gateways under heavy cron/build workloads grow steadily (~18 MB/hr post-phantom-dispatch-fix) and eventually need a restart-or-OOM. Four retention sites, all confirmed live on current main: 1. _evict_cached_agent() (/model, /reasoning, codex-runtime, /undo, etc.) popped the cache entry without releasing the agent's OpenAI client, httpx transport, SSL context, or conversation history. Only /new cleaned up first. Now releases clients on a daemon thread, matching _enforce_agent_cache_cap. 2. _release_evicted_agent_soft() now clears _session_messages after release_clients() — tool outputs (file reads, terminal output, search results) can be tens of MB per 100+-tool-call session; the list is rebuilt from persisted session JSON on resume, so dropping it on soft eviction is safe. 3. The session-expiry watcher (permanent finalization) now drops the session's per-session control dicts (_session_model_overrides, _session_reasoning_overrides, _pending_approvals, _update_prompt_pending, _pending_model_notes). These leaked one entry per session per gateway lifetime. NOTE: this is the session-finalize path, NOT idle agent-cache eviction — an idle-evicted session is still alive and rebuilds its agent from these overrides, so pruning them there would silently reset a user's /model choice. 4. _tool_defs_cache is now bounded (_TOOL_DEFS_CACHE_MAX=8) with oldest-first eviction instead of growing unboundedly across the distinct toolset/config fingerprints a gateway sees over its lifetime. Salvaged from #25318 by Michael Steuer (@mssteuer); fix 3 redirected from the idle-sweep to the session-finalize lifecycle, magic number 8 lifted to a named constant, test ported. Fixes #19251 Co-authored-by: Michael Steuer <michael@make.software>	2026-06-08 06:32:42 -07:00
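Item 4 of the commit above describes a cache bounded by a named constant with oldest-first eviction. A minimal sketch of that idea follows; the class name `BoundedToolDefsCache` and its methods are hypothetical and are not taken from the gateway code. Only the constant name and its value of 8 come from the commit message.

```python
from collections import OrderedDict

# Name and value as stated in the commit message.
_TOOL_DEFS_CACHE_MAX = 8


class BoundedToolDefsCache:
    """Hypothetical illustration of a size-capped cache with oldest-first eviction."""

    def __init__(self, max_entries=_TOOL_DEFS_CACHE_MAX):
        self._max = max_entries
        self._entries = OrderedDict()  # remembers insertion order

    def get(self, fingerprint):
        # Lookups do not reorder entries: eviction is by insertion age, not recency.
        return self._entries.get(fingerprint)

    def put(self, fingerprint, tool_defs):
        # Re-assigning an existing key keeps its original position in the order.
        self._entries[fingerprint] = tool_defs
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest inserted entry

    def __len__(self):
        return len(self._entries)


cache = BoundedToolDefsCache()
for i in range(10):
    cache.put(f"fp{i}", {"tools": i})
```

After ten distinct fingerprints the cache holds only the eight most recently inserted ones. Oldest-first eviction is simpler than LRU and is enough when the goal is just to stop unbounded growth.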
teknium1	400e6e43ca	test(gateway): de-flake concurrent-compression lock test with a barrier test_concurrent_compressions_same_session_serialize relied on a time.sleep(0.25) inside the stubbed compressor to make the two threads overlap inside the per-session lock window. Under CI CPU starvation that sleep is insufficient: one thread can acquire -> compress -> rotate -> RELEASE the lock before the other reaches try_acquire, so both acquire on the shared session_id and both compress (the recurring 'Expected exactly one agent to compress, got 2' failure on shard test (1)). Replace the timing dependency with a threading.Barrier(2) wrapped around the shared db's try_acquire_compression_lock: both threads rendezvous immediately before the real (atomic) acquire, guaranteeing genuine simultaneous contention regardless of scheduling. The real lock logic is unchanged and still picks exactly one winner — this only fixes the test's overlap guarantee. Restored after join so the post-join lock-leak assertion hits the unwrapped method. Verified: 20/20 plain + 15/15 under all-core CPU stress (load avg ~4.6), where the old version flaked.	2026-06-08 06:32:23 -07:00
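The barrier technique described in the commit above can be sketched as follows. `FakeSessionDB`, `release_compression_lock`, and `run_contended` are hypothetical stand-ins, not the project's test code; only the method name `try_acquire_compression_lock` and the use of `threading.Barrier(2)` come from the commit message. In this sketch the winner keeps the lock until after the join, which keeps the outcome deterministic.

```python
import threading


class FakeSessionDB:
    """Hypothetical stand-in for the shared db; only the lock methods are modelled."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire_compression_lock(self, session_id):
        # Atomic, non-blocking acquire: exactly one caller per session_id wins.
        with self._guard:
            if session_id in self._held:
                return False
            self._held.add(session_id)
            return True

    def release_compression_lock(self, session_id):
        with self._guard:
            self._held.discard(session_id)


def run_contended(db, session_id="s1"):
    barrier = threading.Barrier(2)
    real_acquire = db.try_acquire_compression_lock

    def gated_acquire(sid):
        # Both threads rendezvous here, immediately before the real acquire.
        barrier.wait(timeout=5)
        return real_acquire(sid)

    # The instance attribute shadows the class method for the duration of the run.
    db.try_acquire_compression_lock = gated_acquire
    results = []
    results_guard = threading.Lock()

    def worker():
        won = db.try_acquire_compression_lock(session_id)
        with results_guard:
            results.append(won)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join(timeout=10)
    finally:
        # Restore the unwrapped method so later assertions hit the real one.
        del db.try_acquire_compression_lock
    db.release_compression_lock(session_id)
    return results
```

Calling `run_contended(FakeSessionDB())` returns one `True` and one `False` in whichever order the threads finish. The wrapper changes only when the threads arrive at the lock, not how the lock decides the winner.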
kshitij	b99c6c4277	Merge #42076: nested category plugin discovery + alias-normalized enable/disable (#41066) Lands the complete nested category plugin fix: - Discovery in `hermes plugins list` (from @islam666's #41076, carried in this PR) - Alias-normalized enable/disable mutation path so nested plugins can be toggled - Fixes the #41076 base breakages (web_server 6-tuple unpack + stale test fixtures) Co-authored work: discovery by @islam666 (#41076). Closes #41066.	2026-06-08 05:47:27 -07:00
kshitijk4poor	2b89afec79	fix(plugins): alias-normalize enable/disable for nested category plugins (follow-up to #41076) #41076 makes `hermes plugins list` discover nested category plugins (e.g. observability/nemo_relay). This adds the missing enable/disable mutation path so those plugins can actually be toggled, and fixes two incomplete-update breakages on the #41076 base. Before: `hermes plugins enable nemo_relay` -> "Plugin 'nemo_relay' is not installed or bundled." (exit 1), because cmd_enable/cmd_disable went through _plugin_exists(), which only checked top-level plugins/<name>/. Changes: - Add _resolve_plugin_key(): resolve a bare manifest/leaf name OR a full path-derived key (observability/nemo_relay) to the canonical key the runtime loader gates on, reusing #41076's _discover_all_plugins(). A bare leaf name ambiguous across two categories resolves to None rather than silently picking one. - cmd_enable/cmd_disable resolve first, persist the canonical key, and drop any stale legacy bare-name alias so the enabled/disabled lists can't drift into a contradictory state. _plugin_exists delegates to the same resolver. - Fix #41076 base breakages: _discover_all_plugins now returns 6-tuples, but web_server._merged_plugins_hub() still unpacked 5 (ValueError on the dashboard plugins-hub endpoint) and several test_plugins_cmd_list.py fixtures were still 5-tuples. Both updated; the hub status check is now key-aware. Verified e2e on the real CLI + runtime loader (isolated HERMES_HOME): `hermes plugins enable nemo_relay` writes observability/nemo_relay to config.yaml and the loader then loads it (enabled=True, error=None); a stale bare-name alias is cleared on disable; the dashboard _merged_plugins_hub() runs without crashing. Adds resolution + enable/disable tests; full tests/hermes_cli/test_plugins_cmd* + web_server plugin tests green. Follow-up to #41076 (#41066). Branched from that PR's head.	2026-06-08 17:57:37 +05:30
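The resolution rule described for `_resolve_plugin_key()` above (a full key matches itself, a bare leaf name matches only when unambiguous) can be sketched roughly as follows. The function `resolve_plugin_key` and its signature are hypothetical; the real helper reuses `_discover_all_plugins()` and also handles manifest names, which this sketch does not model. The plugin keys other than `observability/nemo_relay` are invented for illustration.

```python
def resolve_plugin_key(name, discovered_keys):
    """Hypothetical resolver: map a user-supplied name to a canonical plugin key.

    `discovered_keys` is assumed to be an iterable of path-derived keys such as
    "observability/nemo_relay" or a top-level "some_plugin".
    """
    keys = list(discovered_keys)
    # A full path-derived key (or a top-level plugin name) matches itself.
    if name in keys:
        return name
    # Otherwise treat the input as a bare leaf name and look for nested matches.
    matches = [key for key in keys if key.rsplit("/", 1)[-1] == name]
    # Ambiguous or unknown names resolve to None instead of picking one silently.
    return matches[0] if len(matches) == 1 else None


# Example keys; all but the first are made up for this sketch.
keys = ["observability/nemo_relay", "memory/foo", "search/foo", "top_level"]
resolved = resolve_plugin_key("nemo_relay", keys)
ambiguous = resolve_plugin_key("foo", keys)
```

Here `resolved` is the canonical key `observability/nemo_relay`, while `ambiguous` is `None` because `foo` exists under two categories. Returning `None` lets the caller report an error rather than toggle the wrong plugin.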
kshitij	c3055d6185	Merge pull request #41984 from kshitijk4poor/salvage/6600-stale-streaming-worker fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)	2026-06-08 02:51:25 -07:00
kshitijk4poor	f96eb857a5	chore: add kristianvast to AUTHOR_MAP	2026-06-08 15:16:20 +05:30
Kristian Vastveit	d55304c39f	fix(gateway): transcribe voice messages during active agent runs Salvaged from #6600 (@kristianvast) — re-scoped to the voice half only and rebased onto current main. The cascading-interrupt hang half of the original PR landed independently in `dd0d1222a`, so this carries ONLY Problem 1. When a voice/audio message arrives while the agent is busy on the same session, it hit the interrupt path with empty text because STT only ran after the running-agent guard — the voice was effectively lost. Now we transcribe audio BEFORE signaling the agent (and on the fresh-message path), echo the raw transcript back to the user (🎙️), and _enrich_message_with_transcription returns (text, transcripts) so callers can echo. A new _dequeue_pending_with_transcription drives the post-agent drain the same way. Reapplied onto _prepare_inbound_message_text (inbound enrichment was extracted from the inline dispatch block since the original PR). Co-authored-by: Kristian Vastveit <kristian@agrointel.no>	2026-06-08 15:16:20 +05:30
teknium1	00c46b8ff9	test(tui): cover heapdump opt-in gate + retention; add AUTHOR_MAP On-disk vitest coverage for the auto-heapdump disk-safety guard: opt-in gating (suppressed diagnostics-only path), truthy-spelling acceptance, manual-trigger passthrough, and the retention prune. Test approach adapted from #21780 (briandevans) and #21822 (LeonSGP43), reconciled to the merged gate semantics. Maps alarcritty into AUTHOR_MAP for CI.	2026-06-08 02:20:49 -07:00

11051 commits