Commit graph

12474 commits

Author SHA1 Message Date
valentt
77fdbbfe81 fix(whatsapp): validate bridge PID identity before killing stale pidfile entry
`_kill_stale_bridge_by_pidfile` SIGTERMed the PID recorded in `bridge.pid`
after only a bare liveness check. Once the bridge exits and is reaped the
kernel recycles that PID onto an unrelated process; because the WhatsApp bridge
crash-loops ("Bridge process died (exit code 1)" repeating), this cleanup ran
on every restart and could SIGTERM a recycled PID that had landed on the user's
browser — closing Firefox at irregular intervals with no crash and no coredump
(a clean kill of a stranger).

Same PID-recycling class as the MCP reaper (7bd1f8a2d) and the process-registry
host-PID guard (e6a99cef2); this was the third, and most actively-fired, path.

Fix: `_write_bridge_pidfile` now also records the leader's kernel start time
(line 2). `_kill_stale_bridge_by_pidfile` re-validates identity via
`_bridge_pid_is_ours` before signalling — the (pid, start time) pair must match,
or for legacy single-line pidfiles the live cmdline must name `node` + this
session's unique path. A recycled PID (different start time / cmdline) is logged
and skipped, never signalled. Legacy pidfiles stay readable.

Adds TestWhatsappBridgePidfile: real-process tests proving a genuine bridge is
reaped while a recycled PID (start-time mismatch, or non-bridge cmdline) is
spared. 7 new + 108 gateway/registry tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 17:23:33 -07:00
valentt
e447723149 fix(process-registry): re-validate PID identity before killing host processes
The background-process registry signalled host PIDs (recovery adoption,
detached-session kill, tree-kill) using a number captured at spawn, guarded
only by a bare liveness check. Once a session's process exits and is reaped the
kernel recycles that PID onto an unrelated process, so an alive-but-different
PID passed the check and got tree-killed.

Observed in the wild: a recycled background-session PID landed on Firefox's
session leader; a later kill/refresh walked its process tree and SIGTERMed
every tab — Firefox "closing" at irregular intervals with no crash/coredump.

This is the same PID/PGID-recycling class fixed for the MCP orphan reaper in
7bd1f8a2d, but the process_registry subsystem was never guarded — so the bug
persisted.

Fix: record each host process's kernel start time (/proc/<pid>/stat field 22)
at spawn, persist it in the checkpoint, and re-validate it before every signal
via `_host_pid_is_ours`. A PID whose start time no longer matches — or that is
gone — is never signalled:
  - recover_from_checkpoint: a recycled PID is not adopted as a session.
  - _refresh_detached_session: a recycled detached PID is marked exited.
  - kill_process / _terminate_host_pid: refuse to tree-kill a stranger.
Legacy checkpoints and platforms without /proc (no baseline) degrade to the
prior best-effort liveness behaviour, so nothing else changes.

Adds TestPidReuseGuard: real-process tests proving a mismatched start time
refuses termination while a matching one still kills, plus recovery/refresh
recycling paths. 74 registry + 22 MCP-stability tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 17:23:33 -07:00
Teknium
84e1d31e54
refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473)
The kanban-worker and kanban-orchestrator bundled skills existed only to
be force-loaded into dispatcher-spawned workers, gated by
environments:[kanban] so they wouldn't leak into normal CLI listings.
That gating was fragile (the leak that #50443 patched) and the
--skills auto-load was already best-effort — most workers ran without it
because the bundled skill isn't present in profile-scoped skills dirs.

Remove the skills entirely and promote their load-bearing content
(workspace kinds, deliverable artifacts, created-card integrity, profile
discovery) into KANBAN_GUIDANCE, which is already injected into every
kanban worker's system prompt. Net result: every worker reliably gets
the guidance, nothing can leak into a CLI/blank-slate session, and the
gating machinery is gone.

- agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE
- hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe
- hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card
- hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text
- delete skills/devops/kanban-{worker,orchestrator}
- docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references
- tests: update spawn-argv expectations; re-bound the guidance-size guard

Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).
2026-06-21 17:06:48 -07:00
Carl
e5e2583635 fix(desktop): relaunch on Linux after in-app update instead of hanging (#45205)
On a Linux source install the in-app updater ran the full backend update +
desktop rebuild successfully but never restarted the app — it hung forever on
the applying overlay with no close button. Two causes:

- applyUpdatesPosixInApp() only handled the macOS .app bundle swap;
  runningAppBundle() is null off macOS, so Linux fell through to
  { ok: true, backendUpdated: true } without ever relaunching.
- The renderer store had no terminal state for that result shape, so
  $updateApply stayed { applying: true } and the overlay's close button
  (hidden while applying) never appeared.

Fix (new electron/update-relaunch.cjs, pure + unit-tested):
- Decide the Linux outcome from whether the *running* binary is the one we
  just rebuilt (execPath under release/<plat>-unpacked, path-segment-aware so
  linux-unpacked-evil can't masquerade) and whether its chrome-sandbox helper
  is launchable (root:root + setuid, or an --no-sandbox / ELECTRON_DISABLE_SANDBOX
  opt-out):
    relaunch — detached watcher waits for this PID to exit (graceful, then
      SIGKILL), self-deletes, and re-execs the rebuilt binary with the original
      launch context (filtered args + HERMES_*/sandbox env + cwd) restored.
    guiSkew  — AppImage/.deb/.rpm/dev: backend updated but this GUI package was
      NOT changed; surface an honest closeable 'reinstall the desktop app'
      terminal state instead of lying that it loads next launch (#37541 skew).
    manual   — rebuilt binary but sandbox helper not launchable: keep the
      working window, don't quit into a dead app.
- store/updates.ts lands a terminal, closeable state for EVERY resolved apply
  outcome (handedOff / guiSkew / manualRestart / updated-not-relaunched / error)
  so the hang is impossible regardless of platform or result.
- New DesktopUpdateStage values (update/rebuild/done/guiSkew) + GuiSkewView so
  progress reads correctly and the skew state is closeable. i18n in all four
  locales (en/ja/zh/zh-hant) in parity.
- electron/update-relaunch.test.cjs (16 tests) + store outcome tests.

Salvaged from #45205 onto current main. Linux quit dwell uses the shared
UPDATE_HANDOFF_DWELL_MS (2.5s) from #50448 for consistency. Four-locale i18n
parity, AUTHOR_MAP entry, and the test wiring added on top.

Closes #45205.
2026-06-21 17:04:52 -07:00
teknium1
1f6994d1ee chore(release): add AUTHOR_MAP entry for #45205 salvage (EtherAura) 2026-06-21 17:04:52 -07:00
brooklyn!
1ec4fcf614
Merge pull request #50466 from NousResearch/bb/composer-popout-bounds
fix(desktop): keep the floating composer in-bounds (can't be lost off-screen)
2026-06-21 18:58:14 -05:00
Flownium
13ce811906
fix: show desktop approval fallback (#46548) 2026-06-21 18:57:18 -05:00
Dusk1e
84fcbbf6a9 fix(security): quote HERMES_TIMEZONE in remote code execution to prevent shell injection 2026-06-21 16:55:12 -07:00
liuhao1024
bef1d3e4ff
fix(desktop): filter undefined entries in AttachmentList to prevent refText crash on session switch (#49624)
* fix(desktop): filter undefined entries in AttachmentList to prevent refText crash on session switch

When switching sessions, the attachments array can contain stale/undefined
entries from the previous session's state. Accessing attachment.refText on
an undefined entry throws TypeError, breaking session switching entirely.

Fix: add .filter(Boolean) before .map() to skip undefined/null entries.

Fixes #49614

* fix(desktop): update I18nConfigClient usage in attachment test

The i18n config API changed from getLocale/saveLocale to
getConfig/saveConfig. Update the test fixture to match.
2026-06-21 18:54:09 -05:00
Brooklyn Nicholson
16aeba1707 fix(desktop): clamp composer peel-off under cursor
Keep the floating composer bounded from the first peel-off frame and leave titlebar clearance when recovering bad persisted positions.
2026-06-21 18:52:01 -05:00
Teknium
c768c4b71c fix(antigravity): move model flow to model_setup_flows + stop bare-alias hijack
CI on the salvage caught two issues the stale PR base masked:

1. The model-setup flows were extracted from main.py into
   hermes_cli/model_setup_flows.py after @pmos69 forked. The cherry-pick
   re-introduced a stale _model_flow_custom into main.py (duplicating the
   one main.py now imports) and put _model_flow_google_antigravity there too.
   Move the antigravity flow into model_setup_flows.py alongside its siblings
   and drop the stale _model_flow_custom dup. Fixes the getpass/stdin OSError
   in tests/cli/test_cli_provider_resolution.py.

2. google-antigravity re-exposes Claude/Gemini/GPT-OSS models, so its catalog
   was hijacking bare short aliases (`sonnet` -> google-antigravity instead of
   anthropic) in detect_static_provider_for_model via dict insertion order.
   Add _BORROWED_MODEL_PROVIDERS and defer those providers to a last-resort
   pass so a model's native vendor always wins alias/direct-catalog detection.
   Fixes tests/hermes_cli/test_models.py::test_short_alias_resolves_to_static_model.
2026-06-21 16:41:30 -07:00
Teknium
37c37c9dc5 fix(antigravity): register google-antigravity ProviderProfile + AUTHOR_MAP
The salvaged PR wired auth.py / providers.py / runtime_provider.py for
google-antigravity but never registered a ProviderProfile, so the provider
was invisible to list_providers() / the model picker / alias resolution.
Register it in the gemini model-provider plugin (alongside gemini and
google-gemini-cli) with the antigravity-pa:// scheme and aliases. Also add
@pmos69 to release.py AUTHOR_MAP (CI gate).
2026-06-21 16:41:30 -07:00
Teknium
b7a912ea45 fix(antigravity): bake in public OAuth client + default project fallback
Salvage follow-up on top of @pmos69's #29474. The PR resolved the
Antigravity OAuth client purely by discovering it from an installed `agy`
binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without
agy installed hit a hard 'client ID not available' error.

Antigravity's desktop OAuth client is a public, non-confidential installed-app
client (PKCE provides the security), baked into every copy of the Antigravity
CLI — same posture as the gemini-cli credentials Hermes already ships in
google_oauth.py. Bake it in as the final fallback (env -> discovery -> public
default) and add the public default Code Assist project as the discovery
fallback, matching the reference Antigravity flow. Now consumers can
authenticate directly without agy installed.
2026-06-21 16:41:30 -07:00
pmos69
8baa4e9976 feat(cli): add native Antigravity OAuth provider 2026-06-21 16:41:30 -07:00
xxxigm
29176ffecf test(gateway): cover no eager platform install on startup sweep
Pin the contract that ``_apply_env_overrides`` consults ``is_connected``
before the install-triggering ``check_fn``: an unconfigured platform is
skipped without calling ``check_fn`` (no lazy install), while a configured
platform still has ``check_fn`` run and is auto-enabled. The first assertion
fails on the pre-fix unconditional sweep.
2026-06-21 16:41:17 -07:00
xxxigm
242ec45f45 fix(gateway): don't lazy-install SDKs for unconfigured platforms on startup
For adapter plugins, ``PlatformEntry.check_fn`` doubles as a lazy installer:
calling it pip-installs the platform SDK as a side effect (see e.g.
``plugins/platforms/discord/adapter.py::check_discord_requirements``). The
enablement sweep in ``_apply_env_overrides`` called ``check_fn`` for every
registered plugin platform unconditionally, so a single
``load_gateway_config()`` — which the desktop/dashboard readiness probe
``GET /api/status`` awaits synchronously — pip-installed Discord, Telegram,
Slack, Feishu and Dingtalk even when the user configured none of them
(``platforms: none``). On a slow or restricted network the installs ran long
enough to block the event loop past the desktop's readiness timeouts, so the
app timed out, killed and re-spawned the backend, and boot-looped (stuck at
94%).

Consult the cheap ``is_connected`` credential check FIRST and only run the
install-triggering ``check_fn`` for platforms that are already enabled or
actually configured. Auto-enable-by-credentials is unchanged: a platform with
its token set still gets its SDK installed and enabled.
2026-06-21 16:41:17 -07:00
Dusk1e
8fcb8136bb fix(security): harden smart approval guard against prompt injection
# Conflicts:
#	tools/approval.py
2026-06-21 16:39:48 -07:00
JP Lew
c11ae8261b fix(codex): seed app-server sessions with configured cwd 2026-06-21 16:39:02 -07:00
Brooklyn Nicholson
7785655b4e fix(desktop): keep the floating composer in-bounds so it can't be lost off-screen
The pop-out position is a bottom-right corner inset; the old clamp only floored
it and capped each inset by a flat constant, so dragging left/up (or restoring a
position saved on a larger/other monitor) could push the box's width/height past
the left/top edges and strand it off-screen — unrecoverable since the bad spot
persisted to localStorage.

Now the clamp bounds the WHOLE box (accounting for its measured width/height plus
an edge margin) on all four sides. Applied on drag (measured size), on load
(clamped in readPosition), and via a mount + window-resize reclamp so a shrunk
window or stale persisted value always pulls the box back into view.
2026-06-21 18:35:33 -05:00
Teknium
745c4db235
feat(desktop/windows): show update-in-progress feedback before the desktop exits (#50419) (#50448)
Follow-up to #50238/#50381. The restart-loop is now SAFE (marker + launch
gate), but the trigger that lured users into relaunching mid-update remained:
on the in-app update hand-off the desktop window vanished almost immediately
(app.quit() 600ms after spawning the detached updater), before the updater's
own window appeared — a blank-screen gap that looks like a crash.

- Linger on the update overlay for UPDATE_HANDOFF_DWELL_MS (2.5s, was 600ms)
  before quitting, on BOTH hand-off paths (in-app update + Windows bootstrap
  recovery), so the message lands and bridges to the updater window.
- Strengthen the restart-stage copy and the overlay's applyingBody/applyingClose
  to explicitly tell the user the window will reopen automatically and NOT to
  reopen Hermes themselves while it updates. All four locales (en/ja/zh/zh-hant)
  updated in parity.

Pure UX; does not touch the #50381 marker/gate mutual-exclusion safety net.
2026-06-21 15:34:52 -07:00
teknium1
624580e836 fix(browser): verify daemon identity before orphan reaper kills a PID (#14073)
The browser orphan reaper reads a daemon PID from a `.pid` file in a
world-writable, predictably-named temp dir (`/tmp/agent-browser-h_*`) it
does not write itself, then tree-kills that PID via `_terminate_host_pid`
after only a liveness check. A same-user actor could plant a fake socket
dir whose `.pid` points at an arbitrary victim process, and OS PID reuse
after the real daemon exits could land the recorded PID on an unrelated
process — either way an arbitrary same-user process (and its whole tree)
gets SIGTERMed. Local DoS.

Add `_verify_reapable_browser_daemon()`, gated before the kill: via psutil
(a hard dep, fine cross-platform for the same-user processes the reaper can
signal) require both (1) identity — `agent-browser` in the process
name/cmdline — and (2) binding — the live process references *this* session's
socket dir in its cmdline or `AGENT_BROWSER_SOCKET_DIR`. The binding check is
the real spoof defense: a planted/recycled PID won't embed our exact socket
path. Fail-closed on any ambiguity (unreadable cmdline, no match), leaving the
process and its socket dir untouched for a later sweep.

Builds on @sgaofen's fix in #14394 (cmdline identity check); rewritten to use
psutil instead of `/proc`+`ps` (cross-platform, Windows-covered) and to add
the session-socket-dir binding check for recycled-PID / spoof resistance.

Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
2026-06-21 15:23:47 -07:00
teknium1
4d4ba0831e refactor(session): simplify traversal guard to a helper + logger, harden non-leading separators
Follow-up to the salvaged #9560 fix:
- Replace the _TRAVERSAL_RE regex with an explicit _is_path_unsafe() helper
  (drops the now-unused `import re`); catches a path separator ANYWHERE,
  not just leading, so a non-leading Windows backslash can't slip through.
- Switch the per-entry skip in _ensure_loaded_locked from print() to
  logger.warning to match the module's logging conventions.
- Add AUTHOR_MAP entry for the contributor.
- Add regression tests for the non-leading-separator case.
2026-06-21 15:23:36 -07:00
OrbisAI Security
aa2aac68b0 fix(V-009): reject Windows drive-letter paths in session field validation
Extends the CWE-22 path traversal guard to cover Windows absolute paths
of the form C:/... and D:\... — previously only leading / and \ were
checked, which missed drive-letter prefixes. Replaces the inline
startswith check with a compiled module-level regex (_TRAVERSAL_RE) that
covers all three attack patterns: .., leading /\, and leading X: drives.
Adds two regression tests for C:/windows/system32 and D:\\path\\to\\file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-21 15:23:36 -07:00
OrbisAI Security
3a6a43cb81 fix(V-009): reject path traversal in SessionEntry.from_dict and harden _ensure_loaded
Addresses PR #9560 review comments: applies the CWE-22 fix to current main
(post-PR #458 rebase) and adds the requested regression tests.

- SessionEntry.from_dict now raises ValueError for session_key or session_id
  containing '..' or starting with '/' or '\' (directory traversal guard)
- SessionStore._ensure_loaded moves per-entry validation inside the loop so
  one malicious/corrupt entry is skipped with a warning instead of aborting
  the entire sessions.json load
- Adds TestSessionEntryFromDictTraversalValidation (5 cases) and
  TestEnsureLoadedSkipsInvalidEntries covering the skip-not-abort behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-21 15:23:36 -07:00
orbisai0security
c8eb7cf843 fix: V-009 security vulnerability
Automated security fix generated by Orbis Security AI
2026-06-21 15:23:36 -07:00
ethernet
bb59075b25
Merge pull request #50398 from helix4u/fix/windows-npm-path-fallback
fix(windows): prefer cmd npm shim on PATH fallback
2026-06-21 18:55:02 -03:00
devorun
6f0ecf37da fix(redact): mask all Authorization schemes and x-api-key style headers
Secret redaction only matched `Authorization: Bearer <token>`. Other auth
headers passed through verbatim into logs, tool output, and transcripts:

- `Authorization: Basic <base64>` — leaks base64(user:password)
- `Authorization: token <pat>` / any non-Bearer scheme
- `Proxy-Authorization: ...`
- `x-api-key: <key>` (Anthropic and many providers) and `api-key`,
  `x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with
  no known vendor prefix were caught by nothing

A logged request or an echoed `curl -H "x-api-key: ..."` command therefore
leaked live credentials.

Generalize the Authorization rule to mask the credential for any scheme (and
Proxy-Authorization) while preserving the header name and scheme word for
debuggability, and add an api-key header rule for the single-opaque-value
headers. Bearer behavior is unchanged; plain prose containing the word
"authorization" (no colon-delimited value) is left untouched.

Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key
headers, including inside a curl command.
2026-06-21 14:08:06 -07:00
teknium1
87ab373381 test(url-safety): cover IPv6 scope-ID strip + fail-closed in URL guards
Follow-up to the salvaged #25961 fix: regression tests asserting that
scope-bearing IPv6 addresses (fe80::1%eth0, ::1%lo) are blocked by
is_safe_url after the scope is stripped, that a still-unparseable address
fails closed, and that a scoped IPv4-mapped IMDS address is caught by the
always-blocked floor.
2026-06-21 13:56:35 -07:00
sprmn24
ed966696eb fix(security): handle IPv6 scope IDs in URL safety checks to prevent bypass
ipaddress.ip_address() raises ValueError on IPv6 addresses with scope
IDs (e.g. 'fe80::1%eth0'). Both is_always_blocked_url() and is_safe_url()
silently skipped these via `except ValueError: continue`.

If ALL resolved addresses for a hostname carry scope IDs, every address
is skipped and the URL passes all safety checks — a potential SSRF
bypass vector against link-local or metadata endpoints.

Fix:
- Strip the scope ID (%eth0) before parsing in both functions
- is_safe_url(): fail closed (return False) with a warning log if still
  unparseable after stripping
- is_always_blocked_url(): use continue (not return False) to preserve
  multi-address scanning, with a warning log

Affected: tools/url_safety.py — is_always_blocked_url(), is_safe_url()
2026-06-21 13:56:35 -07:00
liuhao1024
b5b8a4cd56 fix(gateway): respect adapter decline of fresh-final to prevent double delivery
When a streamed Telegram reply finalizes, the stream consumer could take
the fresh-final path (send a new sendRichMessage + best-effort delete the
preview) purely because the time-based _should_send_fresh_final()
threshold elapsed — even though Telegram's prefers_fresh_final_streaming
returns False. The fresh Rich Message then overlapped the legacy
MarkdownV2 preview already on screen, leaving both visible (the #47048
table + bullet double-render).

Honor the adapter's decision: when prefers_fresh_final_streaming exists
on the adapter (checked on the class + instance __dict__ so MagicMock
auto-attrs don't false-positive) and declines, the time threshold no
longer overrides it. Adapters without the hook keep the time-based
fresh-final for backward compat.

Fixes #47048
2026-06-21 13:55:50 -07:00
teknium1
f79e0a7060 fix(email): mark missing-config as non-retryable + reject blank env vars (#40715)
Fold in the #40715 blank-env OOM fix on top of the host-resolution change:
- connect() now sets a non-retryable fatal error when required settings are
  missing, so the gateway stops reconnecting against an empty host instead of
  looping forever and leaking memory until the host OOM-kills.
- check_email_requirements() treats blank/whitespace-only EMAIL_* values as
  missing, so an abandoned setup with empty keys no longer enables the platform.

Credits the parallel fixes by zerone0x (#40745) and liuhao1024 (#40829).
2026-06-21 13:33:52 -07:00
teknium1
e921c4f826 chore(release): map devorun salvage author email 2026-06-21 13:33:52 -07:00
devorun
b7f6cb9c8b fix(email): resolve IMAP/SMTP host from config and validate before connecting
The email adapter read address/host purely from env vars and never stripped
them, so a missing or whitespace-padded EMAIL_IMAP_HOST reached
imaplib.IMAP4_SSL("") and surfaced as the misleading
"[Errno 8] nodename nor servname provided, or not known" — sending users down a
DNS rabbit hole when the real problem was an empty/dirty host string. A
config.yaml-only setup also left the host empty because __init__ ignored
PlatformConfig.extra, even though the "connected" check, the send helper, and
`hermes config show` already read address/imap_host/smtp_host from it.

Resolve address/imap_host/smtp_host from the env var first, then fall back to
config.extra, and strip surrounding whitespace — matching the send helper's
existing pattern. Validate the required settings at the start of connect() and
return False with an actionable message instead of attempting a connection with
an empty host.

Adds regression tests for whitespace stripping, config.extra fallback, and the
no-IMAP-attempt-on-missing-host path.
2026-06-21 13:33:52 -07:00
teknium1
4cff0360ea test(approval): regression for interrupt-unblocks-approval; AUTHOR_MAP
- Add thread-scoped regression test: interrupt on the waiting thread resolves
  the approval as deny well under the 300s timeout; a foreign-thread interrupt
  does NOT release the wait (interrupts are per-thread).
- Add panghuer023 to AUTHOR_MAP for the salvaged #37994 fix.
2026-06-21 13:33:48 -07:00
panghuer023
a9c8025984 fix(approval): honor interrupt in blocking gateway approval wait (#8697)
A dangerous-command gateway approval blocks the agent's execution thread
inside _await_gateway_decision() on threading.Event.wait() until the user
responds or the 5-minute approval timeout fires. The poll loop never checked
is_interrupted(), so /stop (which flags the agent's execution thread via
AIAgent.interrupt()) was silently ignored — the session stayed wedged until
timeout, even though /stop reported the session unlocked.

Check is_interrupted() at the top of the poll loop. The wait runs on the
agent's execution thread, the exact thread interrupt() flags, so the check
sees the signal and resolves the pending approval as deny — the agent loop
receives a normal denial and unwinds cleanly. Covers /stop, /new, and the
gateway inactivity-timeout interrupt through the single shared wait loop used
by both the terminal and execute_code guards.
2026-06-21 13:33:48 -07:00
Teknium
824c9d3812
fix(config): alias model.api_base -> model.base_url for custom providers (#50385)
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).

Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes #8919.
2026-06-21 13:33:41 -07:00
Teknium
bb77a8b0d5
fix(gateway): respawn unmapped Windows gateways after update (#50090) (#50373)
On Windows, _pause_windows_gateways_for_update() force-kills every running
gateway before mutating the venv. Gateways mapped to a profile (via
profile.path/gateway.pid) were respawned afterward, but gateways with NO
profile mapping — e.g. a Windows Scheduled Task running
"pythonw.exe -m hermes_cli.main gateway run" — were force-killed and only
told to restart manually. After an auto-update/bootstrap the Telegram bot
stayed dead until manual intervention.

Now we snapshot each unmapped gateway's argv (psutil, guarded by
looks_like_gateway_command_line) before the kill and replay it through the
same detached watcher used for profile gateways, so unmapped gateways come
back automatically too.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-21 13:33:26 -07:00
Teknium
99f3072aa0
fix(model-switch): a failed in-place swap must be a no-op, not a dead session (#50375)
When a /model switch resolves a valid model but the in-place agent swap
fails mid-conversation (expired key, unreachable base_url), the agent
rolls itself back to the old working model+client and re-raises. The
callers caught that re-raise, logged a warning, then committed the broken
switch anyway: wrote the failed model to the session DB, set
_session_model_overrides to the broken model/provider/key, and (gateway
direct path) evicted the working cached agent. The next message then
rebuilt a dead agent from the broken override -> permanently unusable
conversation (#50163).

Fix the whole caller class so a failed swap aborts the commit entirely:

- gateway/slash_commands.py (picker + direct /model paths): on swap
  failure, early-return an error message; skip DB persist, session
  override, cache eviction, and config write.
- cli.py (both /model handlers): snapshot CLI-level credential/runtime
  fields before mutating, restore them on swap failure, and abort the
  note + success print.
- tui_gateway/server.py: wrap the previously-unguarded swap; on failure
  raise a clean error and skip worker restart, runtime persist, switch
  marker, session model_override, and config persist.

The no-cached-agent path (apply-on-next-session) is unaffected.

Adds a gateway regression test that fails on the pre-fix behavior.
2026-06-21 13:33:23 -07:00
memosr
ed3d12a762 fix(security): fail-closed when WebSocket peer is empty in loopback mode
Per @egilewski's audit on this PR (#15544), the original fix was
correct but the file has refactored since: the four endpoint-local
empty-peer checks have been consolidated into _ws_client_is_allowed
and _ws_client_reason, but the helpers were left fail-open ('no peer
host known means allow' / 'no reason to block').

On a loopback-bound dashboard with auth disabled, an ASGI server
behind a misconfigured proxy or a unix-socket transport can deliver
ws.client == None or ws.client.host == ''. The helpers were treating
that as 'allowed', so the loopback-only peer gate could be bypassed
by anything that suppressed the client tuple in transit. All four
WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route
through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap
applied uniformly.

Fix:

* _ws_client_is_allowed: return False when client_host is empty
  instead of True. Only reached on loopback bind with auth disabled
  (auth_required=True and explicit non-loopback binds short-circuit
  earlier), so the fail-closed behavior is scoped to the surface
  that needs it.

* _ws_client_reason: return a 'missing_or_empty_peer bound=...'
  block reason instead of None, so the dispatcher's existing
  reason-based rejection path picks it up and the close gets logged
  with a machine-parseable token for diagnosability.

Behavior unchanged for:

* gated mode (auth_required=True) — early-returns True before the
  empty-peer check runs. The OAuth ticket is the auth at that point.
* explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN
  address, always with --insecure) — early-returns True before the
  empty-peer check runs. DNS-rebinding is still blocked by the
  Host/Origin guard in _ws_host_origin_is_allowed.
* legitimate loopback peers (client_host == '127.0.0.1' / '::1') —
  not affected by the empty-peer branch.

Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py:

* test_empty_client_host_rejected_in_loopback_mode
* test_missing_client_object_rejected_in_loopback_mode
* test_empty_client_host_reason_is_block

Plus two regression guards to ensure the fix does not over-reach:

* test_empty_client_host_still_allowed_in_insecure_public_mode
* test_empty_client_host_still_allowed_in_gated_mode

All three new fail-closed tests fail without this patch (the helpers
return True / None for an empty peer) and pass with it. The 45
pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
2026-06-21 13:33:18 -07:00
sgaofen
a4b1554c73 fix(whatsapp): normalize bare phone targets to JIDs before bridge send
Baileys' jidDecode crashes ("Cannot destructure property 'user' of
jidDecode(...) as it is undefined") when handed a bare phone number, so
sending a WhatsApp message to +50766715226 / 50766715226 returned HTTP
500 and never delivered (#8637).

Add to_whatsapp_jid() to gateway/whatsapp_identity.py — the outbound
inverse of normalize_whatsapp_identifier: it builds the JID a send must
use (bare phone -> <digits>@s.whatsapp.net) and passes through already
qualified JIDs (@g.us, @lid, status@broadcast, @newsletter) unchanged.
Wire it at every outbound bridge call site in the WhatsApp adapter
(send, edit, media, typing, get_chat_info, and the standalone cron /
send_message sender).

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-21 13:32:22 -07:00
Teknium
f72690825e
fix(desktop/windows): stop in-app update from cascading into a backend restart loop (#50381)
When a Windows user relaunches Hermes while an in-app update is still
running (the desktop vanished with no progress and looks crashed), the
fresh instance spawns its own dashboard backend. That backend re-locks
the venv shim, the updater's straggler cleanup (force_kill_other_hermes
-> taskkill /F /T /IM hermes.exe) kills it, the launch dies with the 45s
"backend didn't come up" timeout, and the user relaunches into the same
trap -- an infinite respawn/kill loop (#50238).

Root cause: no mutual exclusion between an applying update and a fresh
desktop spawning its own local backend.

Fix: the updater publishes a HERMES_HOME/.hermes-update-in-progress
marker (pid + start time) for the whole run via an RAII drop-guard that
removes it on every exit path (success, early return, panic). A
freshly-launched desktop checks the marker before spawning its local
backend and PARKS until the update finishes -- then brings the backend
up itself (it is the surviving instance; the updater's own relaunch hits
the single-instance lock and quits). A stale marker (dead pid or past a
20-minute ceiling) is pruned so a crashed updater can never strand
future launches. No rogue backend spawns mid-update, so
force_kill_other_hermes has nothing legitimate to kill.

Marker parse/staleness logic is extracted to update-marker.cjs and
unit-tested; the Rust guard has unit tests; the Rust-write <-> JS-read
contract is E2E-verified.
2026-06-21 13:10:32 -07:00
LeonSGP43
09a96ba0f6 fix(gateway): pause Telegram typing before stream finalize
In Telegram streaming, the typing indicator persisted through the slow
final rich-text/MarkdownV2 finalize edit, so the '...typing' bubble
lingered for seconds after the last streamed token. Add a one-shot
on_before_finalize hook to GatewayStreamConsumer, fired once when the
stream transitions into its finalization path, and wire it on both
Telegram streaming call sites to call pause_typing_for_chat() before
the final edit. Cover hook ordering and once-only behavior in tests.

Fixes #49712
2026-06-21 13:10:25 -07:00
teknium1
6902eb3913 fix(cli): make ZIP-update directory replace atomic so it can't delete ui-tui
Root cause of #49145: the Windows ZIP-update path did rmtree(dst) then
copytree(src, dst). If the copy failed partway — common on that path,
which only runs because file I/O is already flaky on the machine — the
directory was left deleted with nothing copied back. ui-tui/ vanishing
is what broke 'hermes --tui' (WinError 267), but the bug hit every
top-level directory.

_atomic_replace_dir stages the new copy into a sibling temp dir and only
swaps it in on full success, restoring the original on failure. A failed
update now leaves the live tree untouched instead of half-deleted.
2026-06-21 13:10:22 -07:00
teknium1
db097fb088 fix(cli): auto-restore a deleted ui-tui workspace from git before TUI launch
The Windows update path can leave tracked ui-tui/ files deleted in the
working tree (HEAD intact). The guard now self-heals: when ui-tui/ is
missing in a git checkout, run `git restore -- ui-tui` and continue,
falling back to the printed manual-recovery steps only when git can't
recover it (no checkout / restore failed).

Builds on konsisumer's missing-workspace guard.
2026-06-21 13:10:22 -07:00
konsisumer
537ad9ea9a fix(cli): guard missing ui-tui workspace before TUI launch 2026-06-21 13:10:22 -07:00
峯岸 亮
5b45fb269a fix(security): sanitize kanban markdown html 2026-06-21 13:10:17 -07:00
helix4u
7502d38bf9 fix(windows): prefer cmd npm shim on PATH fallback 2026-06-21 14:06:39 -06:00
Teknium
8e4d2fd23f
docs(plugins): document acting from hooks via ctx.profile_name + dispatch_tool (#50352)
Answers a recurring plugin-author question: how to read the active
profile and drive Hermes from inside a hook callback when ctx._cli_ref
is None (gateway, hermes chat -q, and kanban-spawned worker sessions).

- Adds a 'Act from inside a hook' section to the plugin guide covering
  ctx.profile_name and ctx.dispatch_tool as the session-agnostic APIs,
  with a kanban_task_blocked example, and notes there is no in-process
  slash-command bridge for headless workers (shell out via the terminal
  tool instead).
- Adds the three kanban lifecycle hooks to the hook reference table with
  their process semantics.
- Pins the contract with a regression test: ctx.dispatch_tool invokes a
  tool handler with _cli_ref=None (worker/hook context).

Requested by @Smithangshu on Discord.
2026-06-21 12:54:40 -07:00
teknium1
b6f03ab891 docs(ui-tui): add billing.step_up.verification event + perfPane.tsx to README
Follow-up on salvaged #50347: the event surface table was missing the
billing.step_up.verification switch case, and the File map omitted
lib/perfPane.tsx.
2026-06-21 12:52:22 -07:00
alelpoan
d7737bfd97 docs(ui-tui): fix file paths, add billing command, update file map 2026-06-21 12:52:22 -07:00