hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 10:52:21 +00:00

Author	SHA1	Message	Date
Teknium	bb7ff7dc30	revert(cron): return cron job storage to per-profile (reverts #32117 + #50993) (#51116) * Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)" This reverts commit `660e36f097`. * Revert "fix(cron): anchor cron storage at the default root home (not the active profile)" This reverts commit `a5c09fd176`.	2026-06-22 17:53:50 -07:00
Eri Barrett	ba9e3a491b	feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335 ) * feat(memory): OAuth token storage and refresh for the Honcho provider * feat(memory): refresh the Honcho OAuth token in the client and session * feat(memory): zero-CLI loopback OAuth authorization flow * feat(memory): generic memory-provider OAuth connect endpoints * feat(desktop): memory-provider OAuth connect link * feat(memory): CLI OAuth sign-in with source-tagged authorize links * fix(memory): IP-literal loopback redirect and consent config_path on the authorize link * fix(memory): profile-scope the memory-provider OAuth endpoints * refactor(desktop): generic memory-provider OAuth client functions * docs(memory): trim OAuth module docstrings to the invariants * docs(memory): document OAuth connect as an optional auth method * fix(memory): send home-relative display path to consent, not the absolute path * perf(memory): cache OAuth token expiry in memory to skip the hot-path disk read * fix(memory): log OAuth refresh failures at warning, not debug * feat(memory): fall back to an OS-assigned loopback port when 8765 is taken * test(memory): cover the desktop Connect launcher, status, and provider dispatch * fix(desktop): keep the memory-provider dropdown one size regardless of connect state * fix(desktop): move the memory connect link to the description line, leaving the dropdown untouched * refactor(memory): move OAuth connect routes out of web_server into a memory-layer router * refactor(desktop): import MemoryConnect directly, drop the single-export barrel * fix(memory): launch CLI OAuth sign-in right after the auth choice, not after the wizard * fix(desktop): auto-clear the OAuth error state instead of leaving it sticky * test(honcho): isolate auth-method prompt from deployment-shape wizard tests main's wizard suite scripts the cloud prompts without the OAuth auth-method step; auto-answer it in the shared helper so the answer lists stay shape-only. 
* docs(honcho): document query-adaptive reasoning level (reasoningHeuristic) README never mentioned reasoningHeuristic and listed reasoningLevelCap as an orphaned cap with the wrong default (— vs "high"). Add the query-adaptive scaling note + the reasoningHeuristic/reasoningLevelCap rows (grouped under Dialectic & Reasoning), matching the wording already on the hosted honcho.md page, and add a pointer from the memory-providers overview. * fix(honcho): default the CLI peer prompt to the OAuth consent name The CLI runs the grant with apply_config=False, so the peerName the user just entered at consent was dropped and the wizard's 'Your name' prompt fell back to $USER. Surface it as a transient OAuthCredential.consent_peer_name (set even when config isn't merged) and seed the prompt default from it. * feat(honcho): split OAuth client_id by surface (cli=hermes-agent, desktop=hermes-desktop) resolve_endpoints now picks the client_id from the initiating surface and threads it through authorize -> token exchange -> persisted grant -> refresh, so the CLI and desktop register as distinct OAuth clients. Surface-specific env overrides (HONCHO_OAUTH_CLIENT_ID_CLI/_DESKTOP) win over the generic HONCHO_OAUTH_CLIENT_ID, which still overrides every surface. * feat(honcho): show OAuth vs API key in status; detect existing OAuth in setup status now prints 'Auth: OAuth (clientId, token valid Xm/expired)' instead of masking the OAuth access token as a generic API key; setup notes an existing OAuth grant when re-run. * docs(honcho): drop 'shared pool' wording from unified observation mode help * fix(honcho): cross-process lock around OAuth refresh to prevent grant revocation The in-process threading lock can't stop a sibling process (another profile or the desktop app sharing honcho.json) from replaying the single-use refresh token and tripping reuse-detection, which revokes the whole grant. 
Guard the read-refresh-persist section with an OS file lock on <config>.lock so only one process rotates at a time; the others re-read the freshly-persisted token. Best-effort: platforms without flock degrade to in-process serialization. * refactor(honcho): one OAuth client (hermes-agent) for all surfaces Collapse the per-surface client_id split. CLI and desktop now use a single client_id (hermes-agent); consent branding/UI still adapt via the source query param. One grant identity means no clientId-vs-refresh-token desync that could get the grant revoked. HONCHO_OAUTH_CLIENT_ID still overrides for self-hosting. * fix(honcho): per-session resolves to session_id, never remapped by title Reorder resolve_session_name so stable identifiers win over labels: gateway per-chat key first, then the per-session session_id, then the cwd map / title. A (possibly auto-generated) title can no longer remap a live per-session conversation onto a second Honcho session mid-stream — fixes the desktop, which is per-conversation via session_id. Consequence: a gateway's per-chat key now also wins over a title (titles never remap a stable id).	2026-06-22 19:16:47 -05:00
brooklyn!	6780cee679	Merge pull request #51072 from NousResearch/bb/desktop-computer-use feat(computer-use): add a cross-platform readiness preflight to the desktop	2026-06-22 18:37:07 -05:00
Brooklyn Nicholson	2dfcead683	feat(computer-use): make the preflight cross-platform (win/linux) The card was macOS-only. cua-driver also runs on Windows and Linux, so fold `cua-driver doctor` (cross-platform binary/health probes) into a single OS-aware `ready` signal: - macOS: ready == both TCC grants; keeps the permission rows + grant flow. - Windows/Linux: no TCC toggles, so ready == driver health, with a per-OS note (SmartScreen/UIAccess on Windows; X11/XWayland on Linux). `computer_use_status()` replaces the macOS-only `permissions_status()` and surfaces `platform`, `ready`, `can_grant`, and the doctor `checks` (non-ok ones render as warnings). CLI `permissions status`, the REST endpoint, and the desktop card all key off the one payload. Grant stays macOS-only (400 elsewhere — nothing to grant).	2026-06-22 17:48:43 -05:00
Brooklyn Nicholson	0223ea5f59	feat(computer-use): surface macOS permission preflight in the desktop Computer Use already worked through the desktop backend (the cua-driver toolset enables + installs via Settings -> Skills & Tools), but there was no in-app way to see or grant the two macOS permissions it needs, so "give a model my Mac" was tribal knowledge. The grants attach to cua-driver's OWN TCC identity (com.trycua.driver / the installed CuaDriver.app), not Hermes -- so no app entitlement is involved. cua-driver 0.5+ exposes `permissions status/grant`, which we wrap: - tools/computer_use/permissions.py: thin client over the two subcommands - hermes computer-use permissions {status,grant}: CLI parity - GET /api/tools/computer-use/status, POST .../permissions/grant: desktop REST - ComputerUsePanel: live Accessibility + Screen Recording state with a Grant button (dialog attributed to CuaDriver), shown in the expanded Computer Use toolset row. Binary install stays in the existing provider post-setup runner. Follow-ups: i18n the card copy; a "Stop driver" control (cua-driver stop) for the runaway-`serve` case.	2026-06-22 17:33:52 -05:00
Teknium	87c4a5ebb8	feat(background-review): aux-model selector for the self-improvement review (#49252) Adds auxiliary.background_review.{provider,model} (default auto = main chat model, unchanged). Set it to a different, cheaper model and the post-turn self-improvement review runs there for ~3-5x lower cost. Cache-aware by design: the main chat is warm in the prompt cache, so the default full-history replay on the main model is cheap cache reads and is left exactly as-is. A different model can't reuse that cache (different key), so when (and only when) routed to a different model the fork replays a compact digest instead of the full transcript, minimising what it cold-writes on the aux model. Same model -> full replay; different model -> digest. Quality holds in benchmarks: memory capture identical, skill near-identical. Nothing changes unless you opt in by naming a different model. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-22 14:54:53 -07:00
Teknium	660e36f097	fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993 ) The #32091 fix moved every profile's cron jobs into one shared root store, but never wired the execution-scoping half it recommended: a job still ran under whichever profile's ticker picked it up, not its owning profile. So a job created under `hermes -p donna` could execute with the root profile's .env / config.yaml / credentials. - jobs.py: create_job auto-captures the active profile (explicit profile= override available) and stores it on the job; resolve_profile_home() maps a profile name to its HERMES_HOME; legacy jobs backfill to 'default'. - scheduler.py: run_job applies the job's profile via a scoped HERMES_HOME override (env var + in-process ContextVar) before any .env/config/script load, restored in finally. tick() routes profile-mismatched jobs to the single-worker sequential pool so the env mutation can't race. - cronjob tool threads profile through (NOT exposed in the model schema, to avoid cross-profile privilege escalation); hermes cron add gains --profile. E2E verified against a temp HERMES_HOME with a real profile dir: a root-profile ticker runs a profile='donna' job with HERMES_HOME=donna during execution and restores the ticker env afterward.	2026-06-22 14:54:28 -07:00
kshitijk4poor	0e69cd4b37	fix(memory): honor configured char limits in the no-agent on-disk store Follow-up to the /memory approve fresh-store fix. Both the CLI fallback and the messaging-gateway handler built a bare MemoryStore() with the hardcoded default char limits (2200/1375), ignoring the user's configured memory.memory_char_limit / user_char_limit. A live agent honors those overrides (agent/agent_init.py), so an approval applied without a live agent could accept a write the user's lower cap would reject, or vice versa. Extract a shared tools.memory_tool.load_on_disk_store() factory that reads the configured limits (falling back to defaults if config can't load) and wire both the CLI and gateway handlers to it, closing the gap on both surfaces and de-duplicating the construction block.	2026-06-23 03:10:53 +05:30
Max Hsu	3147cbb136	fix(memory): apply /memory approve against a fresh store when no live agent The CLI /memory slash handler (cli_commands_mixin._handle_memory_command) passed self.agent._memory_store straight through, which is None when the command runs without a live agent — e.g. /memory approve from the Desktop GUI. The shared write-approval handler then returns "memory store unavailable" and applies nothing, even with built-in memory enabled and pending writes present. Fall back to a freshly loaded on-disk MemoryStore when no live store is available, mirroring the gateway path (gateway/slash_commands.py). It persists to the same MEMORY/USER.md and creates MEMORY.md on the first approved write. Fixes #46783 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 03:10:53 +05:30
Francesco Bonacci	5f1d23cfb2	fix(computer-use): delete broken pre-install asset probe; trust the upstream installer `hermes computer-use install` refused to install on Linux, Windows, and macOS x86_64 because the pre-install asset probe was hitting the wrong GitHub endpoint AND duplicating tag-resolution logic the upstream installer already does correctly. `_check_cua_driver_asset_for_arch()` queried `https://api.github.com/repos/trycua/cua/releases/latest`. On trycua/cua: - cua-driver-rs releases (the binary the installer fetches) are marked prerelease on every cut. GitHub's `/releases/latest` explicitly skips prereleases. - The Python package releases (`cua-agent`, `cua-computer`, `cua-train`) are non-prerelease and end up as the "latest" instead. Live API check today: $ curl -sf https://api.github.com/repos/trycua/cua/releases/latest | jq '{tag:.tag_name, asset_count: (.assets|length)}' { "tag": "agent-v0.8.3", "asset_count": 0 } The probe sees zero assets, prints "Latest CUA release has no Linux x86_64 asset", and skips install on every Linux / Windows / macOS-x86_64 host, even though the cua-driver-rs-v0.6.0 release ships 19 binary assets covering all those platforms. Filtering `/releases?per_page=N` for the `cua-driver-rs-v` prefix fixes the bug, but it duplicates tag-resolution logic the upstream `_install-rust.sh` already does correctly via `CUA_DRIVER_RS_BAKED_VERSION` (auto-baked by CD on every release, with a `/releases?per_page=N` API fallback for dev checkouts). The right answer is to trust that contract instead of mirroring it in Python where it can drift. Two paths get the same outcome without the probe: 1. Fresh install: run `install.sh` directly. It has the baked release tag, fetches the right asset, and errors with a clear message on missing-arch downloads. No preflight needed. 2. 
Upgrade path: `cua_driver_update_check()` (separately added) shells `cua-driver check-update --json` against the installed binary, which returns the canonical update answer from the same source the installer uses. - `hermes_cli/tools_config.py`: delete `_check_cua_driver_asset_for_arch` and its two call sites in `install_cua_driver`. Replace with an inline comment near the top of the module explaining the rationale. - `tests/hermes_cli/test_install_cua_driver.py`: drop the `TestCheckCuaDriverAssetForArch` block. Add `TestArchProbeRemoval` with three regressions: - `test_probe_function_is_gone`: asserts the deleted helpers stay deleted. - `test_fresh_install_does_not_call_github_api`: asserts the install path doesn't hit GitHub directly from Python anymore. - `test_upgrade_with_binary_does_not_call_github_api_directly`: same for the upgrade path. All 9 `test_install_cua_driver` tests pass. Reported by @teknium1 while testing on a headed Ubuntu host.	2026-06-22 13:41:03 -07:00
Austin Pickett	2a58fee1a1	fix(api): allow dashboard updates for git checkouts in containers (#51005) Salvages #50469 by @libre-7. _dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update. Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case. Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution. Co-authored-by: libre-7 <libre-7@users.noreply.github.com>	2026-06-22 15:55:33 -04:00
Teknium	2ba1cfeb2e	feat(goals): completion contracts for /goal - evidence-based judging (#50501) Adds an optional structured completion contract to the standing-goal loop, adapted from OpenAI Codex's /goal guidance (a durable objective works best when it names what done means, how to prove it, what not to break, what's in scope, and when to stop). A contract has five optional fields: outcome, verification, constraints, boundaries, stop_when. When set, the continuation prompt tells the agent to target the verification surface and respect constraints, and the judge marks the goal done only when the verification criterion is met with concrete evidence (command result, file excerpt, test output) instead of a loose "looks done" claim. This tightens the most common /goal failure mode: premature completion / endless over-continuation on an underspecified goal. Two ways to set a contract, both backward compatible (bare /goal <text> behaves exactly as before): - /goal draft <objective>: expands plain text into a full contract via the goal_judge aux model (cache-safe side call), falls back to a free-form goal if the model is unavailable. - /goal <text> with inline 'field: value' lines (verify:, constraints:, boundaries:, stop when:, ...). Plain goals with an incidental colon are not mangled; only known field prefixes are pulled out. - /goal show prints the active contract. Contracts persist in SessionDB.state_meta alongside the goal (survive /resume), compose with /subgoal criteria, and old goal rows load unchanged. CLI + every gateway platform via the shared GoalManager engine; zero new model tools. Tests: +18 in tests/hermes_cli/test_goals.py (parse/serialize/judge-prompt/draft/fallback), 73/73 green; 42/42 across the broader goal test surface; live E2E roundtrip (set -> persist -> reload -> contract-aware prompts) green.	2026-06-22 12:20:09 -07:00
kshitij	5937b95192	Merge pull request #50773 from NousResearch/salvage/43719-dashboard-plugin-rce fix(security): restrict dashboard plugin backend auto-import to bundled plugins — defense-in-depth (#43719)	2026-06-22 22:57:33 +05:30
kshitijk4poor	e2bea0abe6	refactor(security): centralize non-bundled plugin sources in one constant /simplify-code (LOW, flagged by two reviewers): the source tags 'user' / 'project' / 'bundled' were bare string literals scattered across the discovery scrub and the two mount-time refuse guards. A typo in any one site (e.g. 'users') would SILENTLY disable a security gate with no error — the exact failure mode this RCE boundary must not have. Introduce a shared module-level _NON_BUNDLED_PLUGIN_SOURCES frozenset referenced by both the discovery scrub and the (now single) mount guard, so the auto-import policy lives in one place. The two mount guards collapse into one gate that still emits the distinct per-source operator message via a map (no loss of guidance). Behavior unchanged: 39 RCE-bypass tests pass, and the constant is mutation-checked (typo'ing it fails the bypass tests). Defence-in-depth (discovery scrub + mount refuse) is retained intentionally.	2026-06-22 22:48:37 +05:30
Teknium	f1e6d39a74	feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842 ) * feat(computer_use): disable cua-driver telemetry by default, add opt-in cua-driver ships anonymous PostHog usage telemetry ENABLED by default upstream (fires cua_driver_install / cua_driver_doctor events to eu.i.posthog.com). Hermes now disables it for our users unless they explicitly opt in. - New config key `computer_use.cua_telemetry` (default false) in DEFAULT_CONFIG. - `cua_backend.cua_driver_child_env()` injects `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is disabled (the default); leaves the var untouched on opt-in so the driver uses its own default. Reads config fail-safe — any error defaults to telemetry off. - Routed every cua-driver spawn site through the policy: MCP backend (StdioServerParameters env), `cua_driver_update_check`, doctor's health_report Popen, the install.sh/install.ps1 runner, and the `--version` / status probes. - Docs: new Telemetry subsection in computer-use.md (EN). - Tests: tests/computer_use/test_cua_telemetry.py — default disables, explicit-false disables, opt-in leaves var untouched, config-failure fails safe, inherited-enabled is overridden off. Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with the var=0 the driver reports "telemetry: disabled via CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs "sending event: cua_driver_doctor". 213 computer_use + install tests green. * fix(dashboard): fold computer_use config category into agent tab The new computer_use.cua_telemetry key created a single-field dashboard config category, tripping test_no_single_field_categories (web_server's invariant that categories with <2 fields must be merged to avoid tab sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the existing onboarding/telegram single-field folds.	2026-06-22 09:57:16 -07:00
teknium1	38c56a1e86	fix(computer_use): probe cua-driver-rs release tag, not monorepo releases/latest The install pre-flight asset probe queried trycua/cua's `releases/latest`, which floats across the monorepo's components (agent-, computer-, lume-, train-) — most ship zero binary assets. So the probe false-negatived and hard-blocked `install_cua_driver` (line 770: `if not probe: return False`) BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even though the installer it gates resolves the right tag and would have succeeded. Net effect: the normal enable path (`hermes tools` → Computer Use post-setup, and `hermes computer-use install`) refused to install on every platform this PR claims to support. Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v` tag, and match its assets on OS-token + arch — mirroring what the upstream `install.sh` already does. Fail open if no driver release surfaces (installer remains the source of truth). Adds an OS-token gate so a darwin asset can't satisfy a Linux probe. Tests: updated the install-probe fixtures to the list-of-releases shape with `cua-driver-rs-v` tags + OS-token asset names; added a regression guard (`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo floating-latest case. 25/25 install + 192 computer_use tests green. Verified live: probe returns True for all six platform/arch combos against the real GitHub releases API.	2026-06-22 06:42:30 -07:00
teknium1	e3505c7f73	fix(computer_use): reconcile Linux gate with stale "gated off" comments The runtime gate (check_computer_use_requirements) and the hermes tools platform_gate both enable linux alongside darwin/win32, but several docstrings/comments still described Linux as "alpha, gated off until it flips upstream" — contradicting the code that ships it. Bring the prose in line with the gate that's actually live: - tool.py / cua_backend.py module docstrings: Linux is enabled (X11 today, Wayland via XWayland), not gated off. - toolsets.py description and hermes tools display name: (macOS/Windows) -> (macOS/Windows/Linux). No behavior change — the gate already allowed all three platforms.	2026-06-22 06:42:30 -07:00
Francesco Bonacci	f2e37549c6	feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux) Make the computer_use toolset platform-agnostic by driving cua-driver on macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces (capability discovery, structuredContent AX tree, opaque element_token, click button enum, explicit mimeType, machine-readable manifest, structured list_windows, structured health_report), each degrading gracefully on older drivers. Adds `hermes computer-use doctor` (drives cua-driver health_report with a per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full typed wrappers for the previously-uncovered cua-driver tools plus a generic call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware system-prompt guidance (host-deterministic, cache-safe), and honors HERMES_CUA_DRIVER_CMD end-to-end. Replaces the macOS-only skills/apple/macos-computer-use skill with a cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans docs. Supersedes #44221 (Windows-enablement salvage of #30660). Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-22 06:42:30 -07:00
Teknium	ff85af3fc7	feat(goals): /goal wait <pid> — park the loop on a background process (#50503 ) * feat(goals): add /goal wait <pid> barrier to park the loop on a background process The /goal loop re-pokes the agent every turn via the post-turn judge. When a goal is gated on a long-running background process (CI poller, build, test matrix, deploy) that produces nothing to judge yet, this spins the agent into 'is it done?' busy-work and burns the turn budget. /goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is skipped, no turn is consumed, no continuation fires, and /goal status shows a parked indicator. The barrier auto-clears the moment the process exits (the agent's notify_on_complete watcher is the natural wake signal), then the next turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear drop it; a dead/stale PID can never wedge the loop. Wired across CLI, gateway, and the mid-run command guard for parity. Barrier persists in SessionDB.state_meta (survives /resume); GoalState gains backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new tests; docs updated. * fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0) The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the target's console process group (bpo-14484). Delegate to the canonical footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback) instead, with a direct-psutil last resort. * feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait Makes the wait barrier automatic. 
Every turn the judge is shown the agent's live background processes (pid, command, uptime, output tail from the process_registry) alongside the goal + response, and can return a new 'wait' verdict instead of continue: {"verdict":"wait","wait_on_pid":N} → park until that process exits {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes evaluate_after_turn acts on the directive (sets the barrier, parks the loop) so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a time-based waiting_until barrier alongside the pid barrier; both auto-clear and can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live registry in via gather_background_processes(). Manual /goal wait stays as an override. Judge verdict contract widened to (verdict, reason, parse_failed, wait_directive); legacy {"done":bool} shape still accepted. * test(goals): update kanban _fake_judge to the 4-tuple judge contract CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the 3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the 4-tuple (+ wait_directive). Update the fake to return None for the directive and accept the background_processes kwarg. * feat(goals): trigger-based wait — park on a process's own signal, not just exit Addresses two gaps in the judge-driven wait: (1) the judge could only express 'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the process's own watch_patterns/notify_on_complete trigger was invisible to the judge. Adds a session-based barrier (waiting_on_session) that releases on the process's OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if started with watch_patterns) its pattern matches — even while the process keeps running. 
list_sessions() now surfaces session_id + watch_patterns/watch_hit/notify_on_complete so the judge sees the trigger and is told to prefer wait_on_session for trigger processes. Judge verdict gains a {wait_on_session} directive (preferred over pid). Backward-compatible GoalState field; pid + time barriers unchanged. Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive, release on exit, unknown-session, full park→trigger→resume, parse, validation, backcompat load). 105 goal-surface + 85 process_registry tests green.	2026-06-22 06:27:29 -07:00
teknium1	a6ce9b2fbb	fix(picker): keep flat-namespace reseller first-party models in desktop picker OpenCode Go (and OpenCode Zen) showed only a subset of the models they serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of 19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash. Root cause: the picker dedup in build_models_payload strips any model from an aggregator row that overlaps a user-defined provider's catalog (so a local proxy isn't shadowed by OpenRouter). It gated on is_aggregator(), which is True for opencode-go/zen because their flat /v1/models returns bare IDs the model-switch resolver searches. But those are flat-namespace RESELLERS, not routing aggregators — every model they list is first-party, so deduping them against a user proxy that happens to serve a same-named model guts their own catalog. Fix: add is_routing_aggregator() (True only for true routers like OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the picker dedup on it. is_aggregator() is unchanged so model-switch flat catalog resolution keeps working. Both desktop entry points (model.options JSON-RPC and /api/model/options REST) and hermes model share build_models_payload, so all surfaces get the full list. Fixes #47077	2026-06-22 06:09:08 -07:00
Teknium	ef6492b648	fix(gateway): cold-start installed Windows gateway after update when none was running (#50804 ) The post-update gateway resume path (`_resume_windows_gateways_after_update`) only relaunched gateways that were running when the update began — it enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns exactly those. A gateway that had already died between updates (e.g. it was launched attached to a terminal/TUI that later closed, taking the child with it) was never brought back: the Startup-folder / Scheduled-Task autostart entry only fires on the next login, not after an in-place update. So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box whose gateway had quietly died would complete with no gateway running, and the user had no indication anything should have come up. Fix: when no gateway is running at pause time but an autostart entry is installed (`gateway_windows.is_installed()` — an explicit "I want a gateway" signal), return a `cold_start_if_installed` token. The resume step then does a fresh detached spawn via `gateway_windows._spawn_detached()` — the same windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start` uses. It re-checks liveness immediately before spawning so a concurrent start (autostart entry firing) can't produce a duplicate. Gateway-less users (no autostart entry) get nothing forced on them — the pause step still returns None for them. POSIX is unaffected: enabled systemd units already restart via `Restart=always`. Windows-only; best-effort throughout (logs at debug and no-ops on any error). Tests: pause returns the cold-start token only when installed, returns None when not installed, resume cold-starts on the token, and resume skips the cold-start when a gateway is already running.	2026-06-22 06:02:31 -07:00
Teknium	eecb5b9dd1	fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') (#50784) * chore: re-trigger CI (workflows did not dispatch on prior head) * fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind') Installer checkouts are shallow (git clone --depth 1). The CLI banner and hermes update --check both did a plain git fetch (silently unshallowing the repo) then git rev-list --count HEAD..origin/main, which counts across the shallow boundary and prints a huge nonsense number like '12492 commits behind'. Detect shallow up front, fetch with --depth 1 to preserve the boundary, and compare tip SHAs instead of counting: - banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind (renders as 'update available') instead of the bogus count. - _cmd_update_check: reports presence-only on shallow clones. Full clones keep the exact count path unchanged. Mirrors the desktop fix in apps/desktop/electron/main.cjs (commit `2950c6fa2`).	2026-06-22 05:39:11 -07:00
Eugeniusz Gilewski	8845f3316c	fix(security): restrict dashboard plugin backend import to bundled plugins (#43719 ) Defense-in-depth for the dashboard plugin auto-import path. The web server auto-imports and mounts the Python backend (dashboard/manifest.json -> api file) of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project), not just bundled plugins. So any plugin that reaches one of those dirs gets arbitrary Python executed on the next dashboard start. NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public --insecure dashboard + open API used to git clone a malicious repo into ~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026 hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and --insecure no longer bypasses the auth gate. This change is therefore NOT closing that (now-authenticated) network path; it removes the residual 'arbitrary code executes merely because a plugin is on disk' hazard, which still applies when a plugin arrives by other means: a socially-engineered git clone, a supply-chain drop, an authenticated-but-malicious actor, or a future regression in the auth gate. Untrusted on-disk code should not auto-execute. Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and project plugins may still extend the dashboard UI via static JS/CSS, but their api Python file is never auto-imported. Two layers: _discover_dashboard_plugins scrubs api/_api_file for user/project sources (and bundled wins name conflicts so a non-bundled plugin cannot shadow a trusted backend route); _mount_plugin_api_routes re-refuses user/project at mount time. Tightens the prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only. Salvaged from #44472 (@egilewski) onto current main.	2026-06-22 17:51:37 +05:30
sherman-yang	74a5905aea	fix(cron): layer enabled MCP servers onto per-job enabled_toolsets A cron job that sets `enabled_toolsets` to a list of native toolsets (e.g. `["web", "terminal"]`) silently got ZERO MCP tools, while a job with no per-job list got every globally-enabled MCP server. `_resolve_cron_enabled_toolsets` returned the per-job list verbatim, bypassing the MCP-merge that the platform-fallback branch performs via `_get_platform_tools`. So `discover_mcp_tools()` registered the MCP tools into the registry, but `get_tool_definitions(enabled_toolsets=...)` kept only the named native toolsets — the agent then rejected every `mcp_` call as "Unknown tool". (R2 of #23997.) Fix: `_merge_mcp_into_per_job_toolsets` layers MCP membership onto a per-job allowlist with the SAME semantics as `_get_platform_tools`: * `no_mcp` sentinel present -> no MCP servers (sentinel stripped) * one or more MCP server names already listed -> treat as an allowlist * otherwise -> union in every globally-enabled MCP server To avoid duplicating the "which MCP servers are enabled" computation (it already existed inline in `_get_platform_tools`), this extracts a shared `enabled_mcp_server_names(config)` helper in `hermes_cli.tools_config` and has BOTH the gateway/CLI platform resolver and the cron per-job resolver call it — so every path agrees on MCP membership (extend, don't duplicate). Note: the issue's headline — bare MCP server names rejected, registry never includes them — was already fixed on main (commits `c10fea8d2` + `04918345e`, both before the issue was filed). This PR closes the remaining cron-specific gap (R2). The `server:*` / `mcp:server` alias-notation rejection (R1) and the quiet-mode silent-drop (R3) are tracked separately. Salvaged from #32788 by sherman-yang (credited below). Reworked to reuse the shared `enabled_mcp_server_names` helper instead of re-implementing the MCP membership set in cron/scheduler.py. Fixes #23997 Co-authored-by: sherman-yang <58446328+sherman-yang@users.noreply.github.com>	2026-06-22 15:52:58 +05:30
Teknium	5ff11a689b	feat(cli): /timestamps command + timestamps in /history (#50506 ) display.timestamps already drove the [HH:MM] suffix on live submitted and streamed message labels, but there was no runtime command to toggle it and /history ignored the setting entirely. Add /timestamps [on\|off\|status] (alias /ts) and render [HH:MM] in /history for turns that carry a stored unix timestamp (resumed sessions). Live unsaved turns without a stored time are never given a fabricated one. Uses the existing sanctioned non-wire 'timestamp' message key (stripped before the API call in chat_completions), so message-alternation and prompt-cache invariants are untouched.	2026-06-21 22:44:25 -07:00
Shannon Sands	5dae502b86	Address email pairing review feedback	2026-06-21 22:43:57 -07:00
Shannon Sands	2455e1801b	Make email pairing opt-in	2026-06-21 22:43:57 -07:00
Shannon Sands	4b09903de5	fix Nous auth refresh for idle agents	2026-06-21 22:43:48 -07:00
teknium1	4314d451ca	fix(gateway): accept any inbound file type across all messaging platforms Authorization to message the agent is the gate, not the file extension. Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass at all on Telegram/Slack — so an .html (or any non-allowlisted type) was dropped or hard-rejected before the agent saw it. Now every authorized upload is cached and surfaced to the agent regardless of type: - base.cache_media_bytes(): unknown types cache as octet-stream (or the caller-supplied MIME) instead of returning None — fixes the chokepoint that Teams/Telegram-media route through. - discord/telegram/slack adapters: removed the allowlist reject/skip; any non-media attachment is typed DOCUMENT and cached. Known types keep their precise MIME. - Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text + code + config + markup) instead of a blind UTF-8 decode, so binary formats (PDF/zip/docx) with ASCII headers are never inlined. - gateway/run.py emits the path-pointing context note for every DOCUMENT, including non text/application MIME types. - discord.allow_any_attachment is now a documented no-op kept for config back-compat. Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types cache, known types stay precise, PDFs are not inlined.	2026-06-21 22:43:45 -07:00
Ben Barclay	6202fdfc35	fix(container): detect dashboard role under s6-overlay v3 (#49196 ) (#50600 ) * fix(gateway): walk /proc//cmdline to find main-wrapper.sh under s6-overlay v3 (#49196) (cherry picked from commit `3a108c2df0`) fix(container): peel s6-v3 rc.init prefix so dashboard role is detected kyssta-exe's preceding commit (#49238) fixed _read_container_argv() to locate the rc.init-launched main-wrapper.sh process under s6-overlay v3, but the skip still never fired: _strip_container_argv_prefix() only peeled a prefix when args[0] was init/main-wrapper.sh/hermes. Under s6 v3 the matched argv is /bin/sh -e /run/s6/basedir/scripts/rc.init top /opt/hermes/docker/main-wrapper.sh dashboard ... so args[0] stayed /bin/sh, _is_dashboard_container() returned False, and the dashboard container reconciled + started its own gateway-default — the exact dual Telegram getUpdates 409 in issue #49196. Fix: strip everything up to and including the main-wrapper.sh token (the stable boundary the image owns), covering both the v2 (/init ...) and v3 (/bin/sh ... rc.init top ...) shapes with one rule, instead of matching launcher tokens positionally. This also repairs _is_legacy_gateway_run_request() under v3, which shares the same strip helper (the issue called this out). Tests: extend the dashboard true/false parametrize sets with the s6-v3 argv shape, and add test_main_skips_reconcile_in_dashboard_container_s6v3 exercising main() end-to-end with the v3 argv. Verified via mutation that both new v3 assertions fail under the old positional strip and pass with the fix. --------- Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-22 15:35:38 +10:00
Teknium	e448b21414	feat(dashboard): interactive auth setup on no-provider non-loopback bind (#50551 ) When `hermes dashboard --host 0.0.0.0` is run interactively with the auth gate engaged but no DashboardAuthProvider configured, prompt to set up the bundled username/password provider on the spot (or point at `hermes dashboard register` for OAuth) instead of only emitting the fail-closed error. - main.py: `_maybe_setup_dashboard_auth_interactively()` runs before start_server. No-ops on loopback binds, when a provider is already registered, or when stdin/stdout isn't a TTY (Docker/s6, CI, piped runs) so the fail-closed SystemExit stays the backstop for unattended deploys. On the password path it writes dashboard.basic_auth.{username,password_hash,secret} to config.yaml (scrypt hash, never plaintext), then force-rediscovers plugins so the basic provider registers before the gate check. - web_server.py: fix the fail-closed hint — it told operators to set `dashboard_auth.basic.username` but the provider reads `dashboard.basic_auth`. - docs: note the interactive setup under Fail-closed semantics. No new env vars; reuses the existing dashboard.basic_auth config surface.	2026-06-21 20:21:48 -07:00
Teknium	9e96e70995	feat(cli): /prompt — compose your next prompt in $EDITOR (#50509 ) * feat(cli): /prompt — compose your next prompt in $EDITOR Adds /prompt (alias /compose): opens $VISUAL/$EDITOR on a temp markdown file so you can hand-edit a multi-line prompt, then sends the saved buffer as the next agent turn. Text after the command pre-seeds the buffer; an empty save cancels. Reuses the one-shot _pending_agent_seed the interactive loop already consumes (same mechanism as /blueprint), so no changes to the input event loop or message pipeline. CLI-only. * feat(tui): /prompt slash command opens $EDITOR (parity with CLI) The TUI already opens $EDITOR via Ctrl+G (openEditor), but had no /prompt slash command like the classic CLI. Wire openEditor into the slash handler context and register /prompt (alias /compose) to call it; inline text after the command is dropped into the composer first so it carries into the editor, matching the CLI's /prompt <text>.	2026-06-21 20:21:33 -07:00
Teknium	95d53c3bcb	feat(cli): /reasoning full — show complete thinking, not 10-line clamp (#50499 ) * feat(cli): /reasoning full to show complete thinking, not 10-line clamp The post-response Reasoning recap box hard-clamped long thinking to the first 10 lines, so there was no way to see the full reasoning trace after a turn (live streaming already shows it in full). Add display.reasoning_full (default off) plus /reasoning full\|clamp to toggle it at runtime; the clamp truncation note now points at the command. Addresses repeated user requests to show all thinking tokens. * test(gateway): de-snapshot /reasoning help assertion The test froze the exact args-hint literal '/reasoning [level\|show\|hide]', which the new full/clamp args change to '[level\|show\|hide\|full\|clamp]'. Convert to an invariant: assert /reasoning is in help and carries its core args, not the exact hint string. * feat(tui): /reasoning full\|clamp parity in tui_gateway The classic-CLI reasoning_full toggle had no TUI equivalent — typing /reasoning full in the TUI fell through to parse_reasoning_effort and errored. The TUI renders thinking as an expand/collapse section (no fixed 10-line recap), so map full -> sections.thinking=expanded (raw, uncapped via thinkingPreview mode='full') and clamp -> collapsed, persisting display.reasoning_full for cross-surface config consistency.	2026-06-21 20:21:11 -07:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
Teknium	5bf23ff251	fix(banner): don't advertise toolsets/skills the agent wasn't given (#50497 ) The welcome banner's 'Available Tools' merged in every toolset from the global check_tool_availability() registry walk, regardless of whether it was enabled for the current platform. On a Blank Slate CLI (file + terminal only) that surfaced discord / feishu / kanban tools the agent was never actually given — they are not in the agent's tool schema, but the banner displayed them, making it look like they were exposed. - Filter the unavailable-toolset merge to toolsets actually in enabled_toolsets (a toolset that's enabled but has unmet deps still legitimately shows as disabled/lazy). - Gate the 'Available Skills' section on the skills toolset being enabled — when it's off, the agent can't load any skill, so show 'Skills toolset disabled' instead of the on-disk catalog. When enabled_toolsets is empty (older callers), behavior is unchanged. Validation: blank-slate banner now shows only file + terminal and 'Skills toolset disabled'; a skills-enabled banner still lists the catalog. Added regression tests; full banner suite green (15/15).	2026-06-21 19:08:54 -07:00
teknium1	8cecaf0b29	feat(process): escalate SIGTERM->SIGKILL on host-pid termination after grace A daemon that ignores or stalls in its SIGTERM handler currently survives the process-registry reap and leaks until reboot (observed as agent-browser daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid now snapshots the tree, SIGTERMs it, waits a bounded grace window (terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs any survivor. The recycled-PID identity guard still gates the whole path, so escalation never reaches a stranger; Windows is unchanged (taskkill /F is already a hard kill). Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env var, per the .env-secrets-only policy. Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the current _terminate_host_pid tree-kill path (the original predated it) and config-gated instead of env-var-gated. Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>	2026-06-21 19:08:52 -07:00
teknium1	41fe086eb6	style(security-audit): add explicit encoding to read_text calls (ruff PLW1514)	2026-06-21 19:05:27 -07:00
teknium1	f45ace9318	feat(security): startup security posture audit (warn-on-load) Surface dangerous host/deployment posture at gateway startup so operators get the 'you're exposed' signal the June 2026 MCP-config persistence campaign victims never had. Warn-only — never blocks startup, never raises. Checks (each independently fail-safe): - Running as root (POSIX uid 0) - SSH daemon with PasswordAuthentication enabled (incl. the 'yes' default) - Running in a container with no persistent volume mount over HERMES_HOME - Network-accessible API server with no API_SERVER_KEY New module hermes_cli/security_audit_startup.py; invoked once per process from start_gateway() right after setup_logging(). Cross-platform (root/SSH checks no-op on Windows). Idea: @Cthulhu.	2026-06-21 19:05:27 -07:00
teknium1	7726ce3040	fix(security): close hermes-0day MCP-persistence attack surface Remove the dashboard --insecure auth-bypass, add an MCP persistence guard + IOC blocklist, and raise the API-server key entropy floor. Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media instance): scanners find exposed Hermes dashboards/API servers, drive the root agent to plant a 'command: bash' MCP entry that appends an attacker SSH key to authorized_keys, which cron + startup then re-execute every tick. - dashboard: --insecure no longer disables the auth gate. should_require_auth returns True for every non-loopback bind; a public bind ALWAYS requires an auth provider (bundled password provider or OAuth). --insecure kept as a warned no-op for backward compat. Fail-closed error now points at the password provider, not at --insecure. - mcp_security: validate_mcp_server_entry now also rejects shell payloads that write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/ rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key + source IPs) anywhere in command/args/env. Runs at save AND spawn time. - api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars; warn when a network-accessible API server runs an unsandboxed local backend.	2026-06-21 19:05:27 -07:00
Teknium	84e1d31e54	refactor(kanban): fold worker/orchestrator skills into injected guidance (#50473 ) The kanban-worker and kanban-orchestrator bundled skills existed only to be force-loaded into dispatcher-spawned workers, gated by environments:[kanban] so they wouldn't leak into normal CLI listings. That gating was fragile (the leak that #50443 patched) and the --skills auto-load was already best-effort — most workers ran without it because the bundled skill isn't present in profile-scoped skills dirs. Remove the skills entirely and promote their load-bearing content (workspace kinds, deliverable artifacts, created-card integrity, profile discovery) into KANBAN_GUIDANCE, which is already injected into every kanban worker's system prompt. Net result: every worker reliably gets the guidance, nothing can leak into a CLI/blank-slate session, and the gating machinery is gone. - agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE - hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe - hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card - hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text - delete skills/devops/kanban-{worker,orchestrator} - docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references - tests: update spawn-argv expectations; re-bound the guidance-size guard Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).	2026-06-21 17:06:48 -07:00
Teknium	c768c4b71c	fix(antigravity): move model flow to model_setup_flows + stop bare-alias hijack CI on the salvage caught two issues the stale PR base masked: 1. The model-setup flows were extracted from main.py into hermes_cli/model_setup_flows.py after @pmos69 forked. The cherry-pick re-introduced a stale _model_flow_custom into main.py (duplicating the one main.py now imports) and put _model_flow_google_antigravity there too. Move the antigravity flow into model_setup_flows.py alongside its siblings and drop the stale _model_flow_custom dup. Fixes the getpass/stdin OSError in tests/cli/test_cli_provider_resolution.py. 2. google-antigravity re-exposes Claude/Gemini/GPT-OSS models, so its catalog was hijacking bare short aliases (`sonnet` -> google-antigravity instead of anthropic) in detect_static_provider_for_model via dict insertion order. Add _BORROWED_MODEL_PROVIDERS and defer those providers to a last-resort pass so a model's native vendor always wins alias/direct-catalog detection. Fixes tests/hermes_cli/test_models.py::test_short_alias_resolves_to_static_model.	2026-06-21 16:41:30 -07:00
pmos69	8baa4e9976	feat(cli): add native Antigravity OAuth provider	2026-06-21 16:41:30 -07:00
Teknium	824c9d3812	fix(config): alias model.api_base -> model.base_url for custom providers (#50385 ) A bare custom provider configured via `model.api_base` (the intuitive name OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set` accepts any dotted key, so `model.api_base` got written and confirmed, but the runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter with an empty key -> 401, zero hits to the custom endpoint (issue #8919). Now api_base is migrated to base_url at load time (fixes existing broken configs) and at set time (with a notice), never overriding an explicit base_url. Closes #8919.	2026-06-21 13:33:41 -07:00
Teknium	bb77a8b0d5	fix(gateway): respawn unmapped Windows gateways after update (#50090 ) (#50373 ) On Windows, _pause_windows_gateways_for_update() force-kills every running gateway before mutating the venv. Gateways mapped to a profile (via profile.path/gateway.pid) were respawned afterward, but gateways with NO profile mapping — e.g. a Windows Scheduled Task running "pythonw.exe -m hermes_cli.main gateway run" — were force-killed and only told to restart manually. After an auto-update/bootstrap the Telegram bot stayed dead until manual intervention. Now we snapshot each unmapped gateway's argv (psutil, guarded by looks_like_gateway_command_line) before the kill and replay it through the same detached watcher used for profile gateways, so unmapped gateways come back automatically too. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-21 13:33:26 -07:00
memosr	ed3d12a762	fix(security): fail-closed when WebSocket peer is empty in loopback mode Per @egilewski's audit on this PR (#15544), the original fix was correct but the file has refactored since: the four endpoint-local empty-peer checks have been consolidated into _ws_client_is_allowed and _ws_client_reason, but the helpers were left fail-open ('no peer host known means allow' / 'no reason to block'). On a loopback-bound dashboard with auth disabled, an ASGI server behind a misconfigured proxy or a unix-socket transport can deliver ws.client == None or ws.client.host == ''. The helpers were treating that as 'allowed', so the loopback-only peer gate could be bypassed by anything that suppressed the client tuple in transit. All four WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap applied uniformly. Fix: * _ws_client_is_allowed: return False when client_host is empty instead of True. Only reached on loopback bind with auth disabled (auth_required=True and explicit non-loopback binds short-circuit earlier), so the fail-closed behavior is scoped to the surface that needs it. * _ws_client_reason: return a 'missing_or_empty_peer bound=...' block reason instead of None, so the dispatcher's existing reason-based rejection path picks it up and the close gets logged with a machine-parseable token for diagnosability. Behavior unchanged for: * gated mode (auth_required=True) — early-returns True before the empty-peer check runs. The OAuth ticket is the auth at that point. * explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN address, always with --insecure) — early-returns True before the empty-peer check runs. DNS-rebinding is still blocked by the Host/Origin guard in _ws_host_origin_is_allowed. * legitimate loopback peers (client_host == '127.0.0.1' / '::1') — not affected by the empty-peer branch. Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py: * test_empty_client_host_rejected_in_loopback_mode * test_missing_client_object_rejected_in_loopback_mode * test_empty_client_host_reason_is_block Plus two regression guards to ensure the fix does not over-reach: * test_empty_client_host_still_allowed_in_insecure_public_mode * test_empty_client_host_still_allowed_in_gated_mode All three new fail-closed tests fail without this patch (the helpers return True / None for an empty peer) and pass with it. The 45 pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.	2026-06-21 13:33:18 -07:00
teknium1	6902eb3913	fix(cli): make ZIP-update directory replace atomic so it can't delete ui-tui Root cause of #49145: the Windows ZIP-update path did rmtree(dst) then copytree(src, dst). If the copy failed partway — common on that path, which only runs because file I/O is already flaky on the machine — the directory was left deleted with nothing copied back. ui-tui/ vanishing is what broke 'hermes --tui' (WinError 267), but the bug hit every top-level directory. _atomic_replace_dir stages the new copy into a sibling temp dir and only swaps it in on full success, restoring the original on failure. A failed update now leaves the live tree untouched instead of half-deleted.	2026-06-21 13:10:22 -07:00
teknium1	db097fb088	fix(cli): auto-restore a deleted ui-tui workspace from git before TUI launch The Windows update path can leave tracked ui-tui/ files deleted in the working tree (HEAD intact). The guard now self-heals: when ui-tui/ is missing in a git checkout, run `git restore -- ui-tui` and continue, falling back to the printed manual-recovery steps only when git can't recover it (no checkout / restore failed). Builds on konsisumer's missing-workspace guard.	2026-06-21 13:10:22 -07:00
konsisumer	537ad9ea9a	fix(cli): guard missing ui-tui workspace before TUI launch	2026-06-21 13:10:22 -07:00
Teknium	d164ed0326	fix(kanban): make reclaim claim-lock-aware to stop task/run status desync (#50366 ) After a worker crash + reclaim + respawn, the board could show a task in the Ready lane while its task_run was 'running' and the new worker was actively executing (#36910). The dispatcher could then treat live work as available and double-assign. Root cause: the three reclaim paths (detect_crashed_workers, release_stale_claims heartbeat-stale backstop, enforce_max_runtime) each snapshot a task's worker_pid/claim_lock, do liveness work, then reset tasks.status back to 'ready' with only a 'WHERE status=running' guard. If the task was reclaimed AND re-claimed by a NEW worker in between (new run, new claim_lock, live pid), the stale UPDATE clobbered the live task: status flipped to 'ready' while the fresh run stayed 'running'. claim_task is the only writer that sets status='running', so nothing put it back — permanent desync. Fix: gate each reset on the snapshot's claim_lock (and worker_pid where available) so it only fires when the task is still owned by the worker the reclaim was computed for. A stale reclaim now no-ops (rowcount 0) instead of desyncing a re-claimed task. Genuine crashes (lock still matches) reclaim exactly as before. This is the same race class the in-gateway dispatch lock (single-writer ticks) mitigates, closed at the row level so a single dispatcher's fast reclaim->respawn across two ticks is also safe. Closes #36910.	2026-06-21 12:49:07 -07:00
memosr	ae46699905	fix(security): validate snapshot_id and file paths in restore_quick_snapshot to prevent path traversal	2026-06-21 12:44:22 -07:00
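
The stage-then-swap approach described in commit 6902eb3913 above (copy into a sibling temp dir, swap only on full success, restore the original on failure) can be sketched roughly as follows. This is an illustrative sketch only, not the project's `_atomic_replace_dir`: the function name `atomic_replace_dir_sketch`, the `.new-` / `.old` naming, and the error handling are assumptions made for the example.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_replace_dir_sketch(src: Path, dst: Path) -> None:
    """Replace dst with a copy of src; leave dst untouched if the copy fails.

    Hypothetical helper for illustration, not the code from the commit.
    """
    parent = dst.parent
    # Stage next to dst so the final rename stays on the same filesystem.
    staging = Path(tempfile.mkdtemp(prefix=dst.name + ".new-", dir=parent))
    backup = parent / (dst.name + ".old")

    try:
        # mkdtemp already created the directory, hence dirs_exist_ok=True.
        shutil.copytree(src, staging, dirs_exist_ok=True)
    except Exception:
        # Copy failed partway: discard the staged copy, live tree is intact.
        shutil.rmtree(staging, ignore_errors=True)
        raise

    had_original = dst.exists()
    if backup.exists():
        shutil.rmtree(backup)
    if had_original:
        # Move the live directory aside instead of deleting it.
        os.rename(dst, backup)
    try:
        os.rename(staging, dst)
    except Exception:
        # Swap failed: put the original back before re-raising.
        if had_original:
            os.rename(backup, dst)
        shutil.rmtree(staging, ignore_errors=True)
        raise

    if had_original:
        # Only now is it safe to drop the old copy.
        shutil.rmtree(backup, ignore_errors=True)
```

The point of the design is ordering: nothing is deleted until a complete replacement exists, so the failure mode of rmtree followed by a partial copytree (a missing directory with nothing copied back) cannot occur in this sketch.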

2969 commits