hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Teknium	bf0d8fed8e	fix(config): v32 migration flips baked-in verify_on_stop=true to false (#54740) The first ship of verify-on-stop (config v30) defaulted DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config persists defaults with strip_defaults=False — so every install that updated through v30 had verify_on_stop: true written into config.yaml as a literal. The v30->v31 migration only flipped missing/'auto' values to false and deliberately preserved an explicit bool, so it skipped that entire population and left verify-on-stop ON for everyone who had updated. A literal true was never a user choice: the feature had no off-switch worth setting it against until v31 introduced one, so a true persisted before v32 is always the old machine default. v32 migration flips a literal true -> false once, for both v30 (skipped v31) and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a deliberate opt-in and is never touched.	2026-06-29 01:51:08 -07:00
Ben	4125cc3b7c	fix(slack): subscribe to message.mpim + mpim scopes so group DMs work Group DMs (multi-person DMs, channel_type=mpim) were never delivered to the Slack bot. The adapter already classifies mpim as a DM and replies ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the generated app manifest only subscribed to message.im / im:history — the 1:1 DM pair. Without the message.mpim event subscription Slack drops group-DM messages before the adapter ever sees them, so 1:1 DMs worked while group-DM ambient mode was dead. Add message.mpim to bot_events and mpim:history (the scope that event requires per Slack docs) + mpim:read (mirrors im:read for the conversations.info classification call) to bot_scopes. Update the SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs (EN + zh-Hans: scope table, event table, troubleshooting) so existing installs are told to add the new scopes and reinstall. Reported by an enterprise customer. Note: this is a manifest/scope change, so it only takes effect after the app is reinstalled and the new scopes are accepted. Tests: assert message.mpim + mpim:history + mpim:read are in the manifest (with and without assistant mode); both fail on current main and pass with this change.	2026-06-29 01:02:53 -07:00
Ben	1c75e7c9d8	feat(dashboard): list & add arbitrary custom .env keys on the Keys page The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS or the provider catalog); any other key a user set in .env was invisible, and there was no way to add an arbitrary env var from the GUI (e.g. to inject a var a skill or MCP server needs). Backend: GET /api/env now also emits a row for every on-disk .env key that isn't in any catalog, flagged category="custom" + custom=true and password-masked (an unrecognised key could hold anything, so it's redacted and reveal-gated like any secret). Channel-managed credentials stay excluded. The write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD, PYTHONPATH, …) enforced server-side — no new write surface. Frontend: a new "Custom Keys" section lists those custom rows and carries an add-a-key form (client-side name validation mirroring the backend regex; the new row reuses the normal edit/save flow, so on save it round-trips back from the backend as a durable custom row). i18n added for en + zh + types. Tests: behavior-contract coverage that an unknown .env key surfaces as a masked custom row and a catalogued key does not — verified to fail on the pre-fix backend.	2026-06-28 22:53:56 -07:00
Shannon Sands	476875acb9	Add dashboard backup upload and download	2026-06-28 22:35:09 -07:00
brooklyn!	388268ecde	Merge pull request #54568 from NousResearch/bb/shared-websocket-layer refactor(desktop+dashboard): shared WebSocket layer + decouple desktop from dashboard (hermes serve)	2026-06-28 23:43:49 -05:00
Brooklyn Nicholson	1af109c79c	test(cli): drop pytest dep + use real sentinel handlers in serve test Clears the ty diff bot's warnings on the new test: pass real callables to build_dashboard_parser (not object()) and replace the pytest.mark.parametrize with a plain loop so the file is stdlib-only.	2026-06-28 23:24:45 -05:00
Ben Barclay	0943e2a272	fix(cron): don't report a false 'gateway not running' on external-provider instances (#54600) `hermes cron status` (and the create/list 'gateway not running' nag) judge whether cron will fire purely from the in-process ticker's heartbeat file + a live gateway PID. That heuristic is correct for the built-in ticker but WRONG for an external provider like Chronos: Chronos arms exactly one external one-shot per job and is fired by a NAS-mediated webhook (POST /api/cron/fire). Its `start()` returns immediately and it deliberately runs no 60s loop and writes no ticker heartbeat — that's the whole point of scale-to-zero (the machine is at zero between fires). So on a perfectly healthy Chronos instance, `cron status` always printed '✗ Gateway is not running — cron jobs will NOT fire' (or a STALLED-ticker warning), and `cron create` always appended the 'jobs won't fire automatically' nag — both false. Verified live on a staging Chronos instance: jobs fired and completed on schedule via the relay while `cron status` insisted the gateway wasn't running and the heartbeat was 370s+ stale. Fix: resolve the active provider (offline — `resolve_cron_scheduler`, whose `is_available()` contract forbids network) and, for any non-builtin provider, report the managed-scheduler state instead of the ticker heuristics, and suppress the ticker-only 'gateway not running' warning. The built-in path is byte-unchanged. Active-job summary is factored into a shared helper so both paths print it identically. New tests prove both directions (chronos: no false negative even with no gateway PID / no heartbeat; builtin: historical warning preserved) and fail without the fix.	2026-06-29 14:03:02 +10:00
lkevincc	163562bf88	fix: normalize lmstudio base urls	2026-06-28 20:46:44 -07:00
Brooklyn Nicholson	9d9a50c2bc	test(cli): pin the `hermes serve` decoupling contract Add a focused contract test for the headless `serve` command (routes to the shared dashboard handler, headless by default while `dashboard` is not, accepts the legacy --no-open, shares the same runtime/lifecycle flag surface). Also refresh the dashboard.py module docstring to cover both commands.	2026-06-28 22:11:48 -05:00
Ben	dee41d0716	feat(dashboard): catalogue all memory-provider API keys in OPTIONAL_ENV_VARS The dashboard Keys page and `hermes setup` render API-key rows from OPTIONAL_ENV_VARS, but only Honcho had an entry — so Hindsight, Supermemory, Mem0, RetainDB, ByteRover, and OpenViking read their keys straight from os.environ yet had no place to set them in the GUI. Add catalog entries (category=tool, password-masked, with get-key URLs and the tool each powers) for all six, plus the relevant base-URL/endpoint companions. Pure declaration: the generic GET /api/env endpoint, the save/reveal write path, and the sandbox env blocklist (which auto-derives from tool-category OPTIONAL_ENV_VARS) all pick these up with no further wiring. Adds a behavior-contract test asserting every memory provider's primary credential key is catalogued, tool-categorised, and password-masked.	2026-06-28 19:17:02 -07:00
Teknium	11183e8332	fix(profiles): validate custom alias names to prevent path traversal `hermes profile alias <profile> --name <custom>` accepted arbitrary strings and used them verbatim as a filename under ~/.local/bin. Because normalize_profile_name only lowercases/strips (no regex gate), a value like `../../.bashrc` escaped the wrapper directory and clobbered arbitrary user-writable files. remove_wrapper_script had the same sink. Add validate_alias_name (reusing the profile-id regex, which forbids `/`, `.`, and `..`) and wire it into check_alias_collision, create_wrapper_script, remove_wrapper_script, and the CLI alias action so the rejection surfaces a clear "Invalid alias name" error instead of silently writing or unlinking outside the wrapper dir. Co-authored-by: Gutslabs <gutslabsxyz@gmail.com> Co-authored-by: Xowiek <xowiekk@gmail.com>	2026-06-28 18:53:33 -07:00
Teknium	490f215a19	test: cover export-prefix stripping in .env parsers (PR #6659)	2026-06-28 18:53:00 -07:00
Teknium	9860d93f2a	fix(terminal): require approval for host-bound Docker commands (#54483) * fix(terminal): require approval for host-bound Docker commands The Docker terminal backend blanket-skips dangerous-command approval on the assumption that the container is isolated from the host. That holds only when nothing is bind-mounted in. Once a host path is exposed (via TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE or a host-path entry in TERMINAL_DOCKER_VOLUMES), a command like `rm -rf /workspace` reaches real host files but is still auto-approved. Detect host bind mounts and route those sessions through the normal approval flow. Isolated Docker keeps the fast path. The same gating is applied to the execute_code guard, which had the identical blanket skip. Co-authored-by: Hermes Agent <agent@nousresearch.com> * chore: add AUTHOR_MAP entry for PR #6436 salvage (Kolektori) * test: accept has_host_access kwarg in _check_all_guards mocks The host-bound Docker approval fix adds a has_host_access kwarg to the _check_all_guards wrapper. Six pre-existing tests monkeypatch it with a fixed (command, env_type) / (cmd, env) lambda signature, which now raises TypeError when terminal_tool passes the new kwarg. Widen those mock signatures to accept **kwargs. --------- Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com> Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-29 11:35:41 +10:00
Brooklyn Nicholson	f34cf7e3a4	test(gmi): stub profile fetch_models in static-fallback test The fallback test only mocked fetch_api_models; CI still hit the real GMI /v1/models endpoint via ProviderProfile.fetch_models and merged live models into the result.	2026-06-28 18:05:28 -05:00
brooklyn!	16ff1a3b93	Merge pull request #54457 from NousResearch/bb/windows-console-launcher-repair fix(windows): repair missing console script launchers	2026-06-28 17:15:56 -05:00
奥森木	e7d4ade8cf	fix(anthropic): ignore stale non-Anthropic base_url across all resolution paths A config left with `provider: anthropic` but a leftover `base_url: https://openrouter.ai/api/v1` (e.g. after a provider switch) would route Anthropic OAuth/setup-token traffic to OpenRouter and 404. Add `_anthropic_base_url_override_ok()` and gate the three native-Anthropic resolution branches (pool, explicit, native) on it. The guard honors a configured `model.base_url` only when it plausibly speaks the Anthropic Messages protocol — official `.anthropic.com` / `.claude.com` hosts, Azure Foundry endpoints, and `/anthropic`-suffixed or Kimi `/coding` proxies — and falls back to `https://api.anthropic.com` otherwise. Aggregator URLs like openrouter.ai / api.openai.com are treated as stale. Reconstructed from @clovericbot's PR #3661 onto current main: the original patched one branch with an anthropic-only allow-list, which would have broken Azure-via-anthropic; widened to all three sites and made Azure/proxy-safe.	2026-06-28 15:12:03 -07:00
Mibayy	b0b7ff0d75	fix(provider): auto+base_url bypasses cloud API when custom endpoint configured (#3846) When config.yaml has `provider: auto` and a non-cloud `base_url` (e.g. Ollama at localhost:11434), requests were silently sent to https://api.anthropic.com whenever ANTHROPIC_API_KEY was present in the environment, ignoring the configured local endpoint and returning HTTP 401 / "credit balance too low". Root cause: resolve_provider("auto") scans env vars and returns "anthropic" when ANTHROPIC_API_KEY is set, before config.model.base_url is ever consulted. In resolve_runtime_provider(), before calling resolve_provider(), short-circuit to the OpenAI-compatible resolver when no explicit creds were passed, provider is "auto"/unset, and a non-cloud base_url is configured. Well-known cloud roots (openrouter.ai, anthropic.com, openai.com) are matched on HOST (not substring) so look-alike hosts can't evade the bypass and leak a cloud credential. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-06-28 15:11:55 -07:00
Gille	df8e2523fa	fix(windows): verify launchers after primary install	2026-06-28 17:02:05 -05:00
HexLab98	76bb8f46a0	test(cli): cover Windows console script repair (#52931) Add unit tests for missing-shim detection and repair trigger in _verify_console_scripts_installed.	2026-06-28 17:01:31 -05:00
Brooklyn Nicholson	f9b469d7de	test(web_git): assert default branch invariant, not hardcoded main CI git init defaults to master on some runners; compare branch to defaultBranch instead of pinning a branch name.	2026-06-28 16:29:52 -05:00
Brooklyn Nicholson	4e9439cc3b	fix(desktop): route composer context picking through remote-aware fs Second pass on the remote-project flow: the project dialog and git cockpit were remote-aware, but the composer's Add file/folder context picker still called the native Electron picker directly. Route it through selectDesktopPaths so remote sessions use the backend-aware picker instead of local disk paths; preserve local multi-select behavior and keep remote folder selection single because the in-app remote picker only supports one directory. Also use readDesktopFileDataUrl for image previews so an already-known backend image path can be read through /api/fs/read-data-url, and add focused coverage for backend file-diff routing plus the plain-folder git init/worktree path.	2026-06-28 14:35:23 -05:00
Brooklyn Nicholson	fc86e35764	feat(desktop): make the git cockpit work over a remote gateway After the folder picker fix, an added remote folder was still half-usable: the desktop's git GUI (coding-rail status, worktree lanes, review pane, branch switch, file diff) all ran Electron-local git on the USER's machine, so against a remote-gateway repo they silently degraded to empty. Mirror the whole surface over the dashboard REST API so it acts on the BACKEND repo where sessions actually run: - hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/ create-pr, file-diff, worktree add/remove, branch switch) shelling to the system git, mirroring the Electron ops' shapes. - web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as /api/fs, executor-offloaded, mutations -> 400). - apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as window.hermesDesktop.git; coding-status / review / projects / model / desktop-fs route through desktopGit() so local stays Electron, remote hits /api/git/*. Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts, review classification, diff incl. untracked all-add, stage+commit roundtrip, worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).	2026-06-28 14:26:09 -05:00
ygd58	3e16176ba4	fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled _get_platform_tools() applies agent.disabled_toolsets as a final override AFTER reading platform_toolsets.<platform>, so a toolset listed there stays permanently OFF no matter what the toggle write path saves. Blank Slate installs pre-populate this list with ~27 toolsets, making most of the desktop Toolsets UI un-enableable (issue #49995). Fix: _save_platform_tools() now removes any toolset the user just explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets. Toolsets the user did not touch, or that remain disabled on other platforms, are left alone -- disabled_toolsets keeps working as a cross-platform suppression list for anything not actively re-enabled. Disabling a toolset (unchecking it) does not touch disabled_toolsets at all -- only enables reconcile it. Verified end-to-end with the exact repro from the issue: Blank Slate config (disabled_toolsets=['todo','memory','browser'], cli=['file', 'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools() now resolves 'todo' as enabled while 'memory'/'browser' (untouched) remain disabled. Added 4 regression tests. Full tools_config suite: 101 passed (97 existing + 4 new), no regressions. Fixes #49995	2026-06-28 21:59:03 +05:30
Teknium	0c2e6c0049	test: make active session cross-process race deterministic (#54248)	2026-06-28 05:49:21 -07:00
izumi0uu	c4719aa51c	fix(gateway): boot out stale launchd registration before restart bootstrap launchd restart can leave the gateway job stopped but still registered after update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a detached process. Booting the stale registration out before bootstrap keeps the launchd-managed restart path intact and locks it with a regression test. Constraint: Keep upstream-facing conventional commit style while preserving local decision context Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update Confidence: high Scope-risk: narrow Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart" Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd Not-tested: Manual macOS launchctl restart after hermes update	2026-06-28 04:17:13 -07:00
teknium1	463225caf1	fix(gateway): bypass legacy-unit prompt in non-TTY systemd install Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt not guarded by --start-now/--start-on-login — is also skipped in headless contexts. Updates systemd_install test mocks to accept the new kwarg and adds coverage for the legacy-unit-skip path.	2026-06-28 04:09:54 -07:00
liuhao1024	831d443b03	fix(gateway): honor --start-now/--start-on-login flags and support non-TTY headless installs When running `hermes gateway install` on Linux/systemd, the command unconditionally prompts with two `prompt_yes_no` questions, breaking headless installs (SSH, CI, provisioning scripts) and ignoring the existing --start-now / --start-on-login CLI flags that the Windows branch already respects. The fix mirrors the Windows path: read CLI flags first, prompt only when flags are not provided AND stdin is a TTY, and fall back to True defaults for non-TTY contexts. The argparse help strings are promoted from SUPPRESS to visible so users can discover the flags. Fixes #42065	2026-06-28 04:09:54 -07:00
Teknium	a06d0198cd	fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190) The /api/pty handler only closed the PtyBridge in the writer loop's finally. On child EOF the reader task closes the WebSocket, but if the handler task is cancelled the instant the socket closes, the writer's finally can be skipped and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted. Reap the bridge in the reader's EOF finally too (close() is idempotent), so the PTY is reaped independently of the writer-loop cancellation race. Harden the regression test to poll for teardown instead of asserting on the same tick. Was flaky on main (2/20); now 25/25.	2026-06-28 03:58:18 -07:00
yoniebans	204a67f0c8	fix(kanban): retry write_txn on transient SQLITE_BUSY	2026-06-28 02:44:04 -07:00
yoniebans	90c1dc0493	test(kanban): cover write_txn BUSY retry (currently failing)	2026-06-28 02:44:04 -07:00
Teknium	6d879d486b	fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123) * fix(dashboard): close PTY WebSocket on child EOF to stop FD leak The /api/pty handler's reader task returns on child EOF, but the writer loop stayed blocked on ws.receive() until the browser sent a disconnect. When the browser socket is half-open (no FIN delivered — common on macOS/launchd), that disconnect never arrives, so the handler never reaches its finally and the PTY master fd + child process leak. With dashboard auto-reconnect (#52962), every dropped socket then spawns a fresh PTY on top of the orphaned one, exhausting file descriptors within hours (EMFILE / Errno 24). Fix: the reader task now closes the WebSocket in a finally when the child EOFs or the send side breaks, which unblocks ws.receive() so the existing finally runs bridge.close(). The writer loop also guards ws.receive() against the RuntimeError Starlette raises once the socket is closed. Reported by @fifteenzhang. Fixes #54028 * docs: add infographic for #54028 PTY FD leak fix	2026-06-28 02:42:21 -07:00
teknium1	7c9cdad9fd	test(cli): cover Windows self-lock recovery guard + cmd-quote its hint Add two tests for the self-lock guard in _recover_from_interrupted_install: one asserting it clears the marker and skips install when hermes.exe is a process ancestor (breaking the #52378/#45542 loop), one asserting it falls through to a normal recovery install when the shim is NOT an ancestor. The guard's manual-recovery hint runs only inside the Windows branch, so quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform fallback hint at the end of the function is left POSIX-correct. Map Icather in scripts/release.py AUTHOR_MAP for the salvage.	2026-06-28 02:40:37 -07:00
PRATHAMESH75	e551da6ddb	fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart Long-lived helpers spawned indirectly by tool calls (adb, platform bridges) were left in the service cgroup after the gateway's main process exited. When the kernel rejected the deferred cgroup-wide kill with EINVAL, systemd blocked Restart=always for 6+ minutes, taking down all platforms and cron windows (#37454). Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks cgroup.procs and sends per-PID SIGKILLs — a different kernel code path than cgroup.kill, so it succeeds where the cgroup-wide write failed. KillMode=mixed is preserved so the gateway still reaps its own tool-call children before systemd intervenes (#8202). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 02:05:50 -07:00
xxxigm	093f567f0d	fix(agent,cli): surface empty-body API errors and fail oneshot exit code When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}), `_summarize_api_error` fell through to a bare `str(error)`, so users saw only "HTTP 400" with no provider detail (reported on Windows in #36109). The SDK leaves `body` empty in this case, but the httpx `response` still carries the payload in `.text`. - run_agent.py `_summarize_api_error`: when `body` is empty, fall back to `response.text` — parse a JSON `error.message`/`message` when present, else surface the raw (truncated) body. Platform-agnostic diagnostics. - hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns exit code 2 when the run is failed/partial with no usable final response, so scripts can detect LLM failures (still 0 when a response — incl. an error summary as output — is produced). Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests. NOTE: #36109's original root cause (Windows "all providers return empty 400") is not reproducible on current main (heavy provider-transport churn since v0.15.1). This change does not claim to fix that root cause — it makes any empty-body API error LEGIBLE so a future occurrence shows the real provider message instead of a bare HTTP 400. Relates to #36109 (does not close it).	2026-06-28 02:05:20 -07:00
teknium1	c918d42d88	feat(desktop): config-driven Electron launch flags + GPU policy Adds a desktop: section to config.yaml so headless/VM users can make `hermes desktop` launch correctly without a wrapper command: - desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11) appended to every launch. Accepts a list or a shell-split string. - desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU env var the Electron app already reads. An explicit env var still wins. cmd_gui() reads these via _desktop_launch_options() and applies them. This is the config.yaml form of the capability proposed as a raw env var in #38934 (@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var. Co-authored-by: ray <86501179+1RB@users.noreply.github.com>	2026-06-27 22:26:43 -07:00
Rafael Millan	54ea059919	fix: fall back to no-sandbox for desktop launch on restricted Linux hosts	2026-06-27 22:16:20 -07:00
Teknium	4626ceb747	fix(gateway): only offer system-scope gateway install to root sessions (#53975) Non-root users picking 'System service' in the setup wizard were handed a 'sudo hermes gateway install --system --run-as-user <you>' recipe that fails on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs), so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user toward a system install they shouldn't be doing from a user session. Now prompt_linux_gateway_install_scope() only offers system scope when os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to re-run as root for a boot service. The non-root branch in install_linux_gateway_from_setup becomes a defensive guard that refuses without printing any self-elevation recipe. Gated the matching deferral hint in setup.py behind root too.	2026-06-27 21:24:08 -07:00
teknium1	f54c52800a	fix(models): scope live-first picker merge to opencode aggregators only Follow-up to the salvaged #49129 commit. The original change flipped the shared generic-provider merge in provider_model_ids() to live-first unconditionally, which regressed curated-first for single providers (kimi/zai, #46309) — and the PR encoded that regression by flipping the kimi-coding and zai test assertions to expect live-first. Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set ({opencode-zen, opencode-go}); every other provider keeps curated-first. Also widen the uncapped picker + live-first sets to opencode-go, which has the same 70+ model catalog problem as opencode-zen. Restore the kimi-coding curated-first test and rewrite the merge-order test to assert the per-provider contract.	2026-06-27 21:23:25 -07:00
Afnath Ahamed	f98ffbc246	fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker	2026-06-27 21:23:25 -07:00
Teknium	3b23a984b5	feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973) Multi-agent boards leak staleness: a sibling worker's parent handoff, comment, or prior-attempt summary gets read by the next worker as live truth even when it's a day old. build_worker_context surfaced the text with (at best) a bare absolute timestamp, which an LLM reads as fact regardless of age — parent results had no timestamp at all. Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every recalled-state line and a one-line 'point-in-time snapshot, re-verify against source' frame on the parent-results section, so the worker sees when handoffs were produced and re-checks stale ones before acting.	2026-06-27 21:21:54 -07:00
kshitijk4poor	2af1678bfc	fix(auth): explicit provider intent beats stale OAuth active_provider (#29285) `resolve_provider("auto")` checked `auth.json` `active_provider` BEFORE the config.yaml `model.provider` and env-var API-key checks. So a user who was OAuth-logged-into one provider (e.g. Anthropic) but had set an explicit `model.provider` or exported an API key (e.g. `OPENAI_API_KEY`) was silently routed to the stale OAuth provider — the override was invisible and surprising. Reorder the auto-path so explicit intent wins (the order the issue asks for): 1. explicit CLI api_key/base_url 2. config.yaml `model.provider` (safety net — see below) 3. OPENAI_API_KEY / OPENROUTER_API_KEY env 4. OpenRouter credential pool 5. provider-specific API-key env vars 6. auth.json `active_provider` (OAuth) ← demoted to last-resort 7. AWS Bedrock credential chain 8. error `active_provider` is still honored — it's just a last-resort fallback chosen only when the user expressed no other preference, instead of overriding one. The normal chat/gateway/TUI/ACP/status path already resolves config.provider upstream in `resolve_requested_provider()` before "auto" is reached, so this duplicate config check is the safety net for the lone direct caller (`main.py` `resolve_provider("auto")`) and any future bypass. Because every surface funnels through this one resolver, the fix propagates everywhere with a single edit — no sibling path re-implements precedence. Also add a one-shot WARN when resolution lands on `active_provider` while a populated `model` config dict lacks a `provider` key — surfacing the silent override the issue reported without breaking first-install. Synthesizes the two competing PRs: #29615 (LifeJiggy — config-before-auth + the silent-override framing) and #29809 (Minksgo — the env-before-auth reorder). #29809 could not be merged directly (bundled unrelated, un-opt-in cost-tagging telemetry); its reorder idea is incorporated here and credited. Tests: tests/hermes_cli/test_provider_precedence.py — config/env beat stale OAuth, OAuth still used as last resort, explicit request short-circuits, WARN fires on silent fall-through. Full provider-resolution suites: 374 passed. Fixes #29285 Co-authored-by: LifeJiggy <141562589+LifeJiggy@users.noreply.github.com> Co-authored-by: Minksgo <153416856+Minksgo@users.noreply.github.com>	2026-06-27 19:49:02 -07:00
郝鹏宇	98488c4be4	fix(config): prevent save_config from materialising schema defaults Fixes #27354 Root cause: called during init (or by any code path that saves ) wrote injected schema defaults into config.yaml as if the user had authored them. Two fix layers: 1. now only injects when the user actually set somewhere (root or agent). A user who never set keeps it absent, so 's explicit-path detection won't treat it as user-authored. 2. gains a parameter and a new pass that removes keys matching unless those paths were explicitly present in the raw (pre-normalization) config on disk. Explicit-path detection uses on before any normalisation runs — preventing injected-in defaults from being mistaken for user-set values. All migration and edit-config call sites pass to preserve their intentional default-seeding behaviour. New helpers: - — collects leaf-key paths from a raw dict - — removes keys matching schema defaults Test coverage: 4 new regression tests (59 total, all passing).	2026-06-27 19:38:11 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal. - Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791 ) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/ redirect/check_output exempt) and POSIX-only-program-aware (launchctl/ systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/ windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
ailthrim	25ec01f79f	fix(desktop): don't purge Electron cache / mirror-retry after a late build failure `hermes desktop` / `hermes update` recover from a corrupt Electron download by purging the cached zip + re-downloading and retrying the pack, and then by falling back to a public mirror. That recovery is only meaningful when the packaged executable is MISSING — the signature of a partial/corrupt unpack. A LATE failure such as macOS code signing (#40187) leaves `Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place. Re-downloading Electron can't repair a signing failure, so the purge + slow mirror retry just grind through another identical failure before the build finally errors out. Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None` so a build that already produced the executable fails fast instead of triggering the destructive download recovery. The corrupt-download path (executable missing) is unchanged. Salvage of #42782, re-applied onto current main (the surrounding recovery was refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR was opened). Adds a regression test asserting no purge / mirror retry runs when the executable exists, and updates the existing retry/mirror tests to model the corrupt-download case (executable absent) the recovery is actually for. Related to #40187 (the residual cache-purge sub-issue; the signing failure itself is fixed by #52591).	2026-06-28 00:29:34 +05:30
teknium1	1ef19bad90	fix(model): show MoA preset picker on selection and label MoA in the banner Selecting 'Mixture of Agents' in the `hermes model` provider picker fell through silently — select_provider_and_model had no moa branch, so it just reprinted the current model/provider summary and exited. And the CLI session banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'), which is meaningless out of context. - Add _model_flow_moa: always lists the available presets (even one), then prints the full reference-models + aggregator breakdown for the selection and persists model.provider=moa / model.default=<preset> (dropping stale base_url + endpoint creds, since moa is a virtual local provider). - Wire the branch into select_provider_and_model. - build_welcome_banner takes provider; when 'moa' it renders 'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call sites pass self.provider. Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker persists the preset and clears stale base_url/api_key.	2026-06-27 11:45:07 -07:00
Teknium	27322612b4	fix(update): route loud build/installer output to update.log instead of the terminal (#53616 ) * fix(update): route loud build/installer output to update.log instead of the terminal hermes update flooded the terminal with the full vite asset dump, electron-builder logs, npm deprecation warnings from the desktop build, and the cua-driver installer's 'Next steps' wall. All of that is low-signal noise the user doesn't need on a successful update. - Capture the desktop --build-only subprocess (vite + electron-builder) into ~/.hermes/logs/update.log; print a one-line status, and on failure surface the last 15 lines + a pointer to the full log. - Capture the cua-driver installer's output when verbose=False (the hermes update refresh path); concise upgrade line is unchanged. - Add _log_only_write() / _run_logged_subprocess() helpers that write to the update.log handle without echoing to the terminal. The repo-root npm install keeps streaming (capture_output=False) — that is the deliberate #18840 guard so a slow postinstall download doesn't look hung. The desktop npm install is a separate Electron process with no such progress concern and is captured. * fix(update): persist full cua-driver installer output to update.log The captured cua-driver installer output was only sent to logger.debug (agent.log) on failure, so the 'Next steps' wall was lost from update.log entirely on success. Write the full captured output straight to the update.log handle (sys.stdout._log) on both success and failure, matching the desktop-build capture, so update.log keeps the complete record of everything an update did.	2026-06-27 11:43:01 -07:00
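The capture-and-tail behavior described in 27322612b4 (full output to the log, last 15 lines shown only on failure) might look something like the sketch below. The function name `run_logged` and its parameters are hypothetical; the real `_run_logged_subprocess()` helper may be structured quite differently.

```python
import subprocess
from pathlib import Path


def run_logged(cmd, log_path: Path, tail_lines: int = 15) -> int:
    """Hypothetical sketch: run cmd quietly, append its output to log_path,
    and echo only the last few lines to the terminal if it fails."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the log keeps one ordered stream
        text=True,
    )
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(result.stdout)
    if result.returncode != 0:
        tail = result.stdout.splitlines()[-tail_lines:]
        print("\n".join(tail))
        print(f"Full log: {log_path}")
    return result.returncode
```

A caller would pass something like `Path.home() / ".hermes" / "logs" / "update.log"` as `log_path`, matching the location named in the commit message.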
Teknium	917f6bdb00	fix(tools): let vision pick any provider+model, not just OpenRouter (#53606) * fix(tools): let vision pick any provider+model, not just OpenRouter hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY. It now offers the same any-provider surface as the model command: Auto (use main model / aggregator fallback), pick any authenticated provider + model, or a custom OpenAI-compatible endpoint. Selections persist to auxiliary.vision.{provider,model,base_url} — the keys the vision resolver already reads. Custom endpoint pins provider=custom so base_url routes correctly. Reconfigure path uses the same picker instead of re-prompting for OPENROUTER_API_KEY. * docs: add PR infographic for vision any-provider picker	2026-06-27 04:41:42 -07:00
ms-alan	16192103f4	fix(config): accept placeholder base_url in custom provider validation _normalize_custom_provider_entry() ran urlparse() on base_url and dropped any entry whose value was an un-expanded placeholder, so a caller reaching the normalizer with raw config (e.g. the Dockerized gateway path) silently skipped the provider with a 'not a valid URL' warning. Skip URL validation when the candidate contains a placeholder token — both ${ENV_VAR} env-refs and bare {region}-style templates — since those are expanded at runtime. Closes #14457	2026-06-27 04:15:27 -07:00
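The placeholder check described in 16192103f4 can be illustrated with a small sketch. The function name `base_url_is_acceptable`, the regex, and the fallback rule (http/https scheme plus a host) are assumptions made for illustration; the actual logic in `_normalize_custom_provider_entry()` is not shown in the commit message and may differ.

```python
import re
from urllib.parse import urlparse

# Matches ${ENV_VAR} env-refs and bare {region}-style templates (assumed pattern).
_PLACEHOLDER = re.compile(r"\$\{[^}]+\}|\{[^}]+\}")


def base_url_is_acceptable(base_url: str) -> bool:
    """Hypothetical validator: skip URL parsing when the value still holds a placeholder."""
    if _PLACEHOLDER.search(base_url):
        return True  # expanded at runtime, so it cannot be validated yet
    parsed = urlparse(base_url)
    # Assumed validity rule for already-expanded values.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With this shape, `"${MY_BASE_URL}"` and `"https://api.{region}.example.com/v1"` both pass through untouched, while a plain string with no scheme or host is still rejected.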


1720 commits