hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
Teknium	ee8cbfdc03	feat(web_extract): truncate-and-store instead of LLM summarization (#54843) * feat(web_extract): truncate-and-store instead of LLM summarization web_extract no longer runs an auxiliary LLM over scraped pages. The extract backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-stripped markdown, so we return it directly: pages within a char budget (default 15000, web.extract_char_limit) come back whole; larger pages get a head+tail window plus an explicit footer giving the stored full-text path and the read_file call to page through the omitted middle. The full clean text is written to cache/web (mounted read-only into remote backends like the other cache dirs), so nothing is lost. Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs dropped) while real http(s) image URLs are preserved as links so the agent can still web_extract/vision_analyze them. Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model + _resolve_web_extract_auxiliary. context_references._default_url_fetcher is updated to the truncate path and its stale data.documents shape read is fixed to results (it was silently returning empty). Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s -> 15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer recoverable from stored full text on every truncated page). web_search is unchanged. No own scraper added; no changes to web_search. * fix(web_extract): add char_limit to execute_code web_extract stub The new web_extract char_limit param must appear in the code_execution_tool _TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params fails: the stub schema must cover every real schema param.	2026-06-29 10:00:49 -07:00
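The head+tail truncation described in this commit could look roughly like the sketch below. It is illustrative only and not the repository's code: the function name `truncate_with_footer`, the even head/tail split, and the footer wording are assumptions. Only the 15000-character default comes from the commit message.

```python
def truncate_with_footer(text: str, stored_path: str, char_limit: int = 15000) -> str:
    """Return text whole if it fits the budget, else a head+tail window plus a footer.

    Hypothetical helper: the name, the 50/50 head/tail split, and the footer
    wording are assumptions, not the project's actual implementation.
    """
    if len(text) <= char_limit:
        return text
    head_len = char_limit // 2
    tail_len = char_limit - head_len
    omitted = len(text) - char_limit
    footer = (
        f"\n\n[truncated: {omitted} chars omitted from the middle; "
        f"full text stored at {stored_path}]"
    )
    # Slice the tail from an explicit start index so tail_len == 0 yields "".
    tail = text[len(text) - tail_len:]
    return text[:head_len] + "\n\n[...]\n\n" + tail + footer
```

Pointing the footer at the stored file is what lets a reader recover the omitted middle without re-fetching the page.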
Ben Barclay	f53ba9bb54	fix(s6): dot-prefix gateway staging dir so svscan ignores it mid-build (#54834) The register path builds each profile-gateway slot in a sibling staging dir under /run/service (the scandir s6-svscan watches), then atomically renames it to the live gateway-<profile> name. The staging dir was named gateway-<profile>.tmp (a NON-dotfile), so a concurrent `s6-svscanctl -a` rescan (fired by the cont-init reconciler registering gateway-default, or by a sibling register) would supervise the half-built slot the moment it had a valid type/run: s6-supervise spawns AS ROOT and mkdirs supervise/ root-owned 0700, then the in-flight _seed_supervise_skeleton early-returns on the now-existing supervise/ and the next `mkdir supervise/event` hits PermissionError.
That is the arm64-only CI flake on test_s6_unregister_removes_service_dir_in_live_container (PermissionError: /run/service/gateway-phase3test.tmp/supervise/event) — arm64-only because the native-arm runner's wider scheduling jitter lets the rescan land inside the ~ms seed window; amd64 ran 30/30 clean. Fix: dot-prefix the staging dir (.gateway-<profile>.tmp) in both register paths (S6ServiceManager.register_profile_gateway and container_boot._register_service). s6-svscan skips any scandir entry whose name begins with '.', so the half-built slot can never be supervised mid-build. The atomic rename to the dotless live name is unchanged. Verified on a real s6 image (amd64): a non-dotted staging dir is picked up by an svscanctl -a rescan (SUPERVISED owner=root) while a dot-prefixed one is ignored (NOT-SUPERVISED). Added a docker-harness regression test that asserts both, plus a unit test that the staging dir is dot-prefixed.	2026-06-29 21:33:00 +10:00
teknium1	61f56d27db	refactor(dashboard-auth): drop redundant _interactive_providers helper list_session_providers() already filters on supports_session=True, so the new helper re-filtered an already-filtered list. Call it directly at the single auto-SSO call site.	2026-06-29 04:25:18 -07:00
Ben	f5ecbe1ec6	feat(dashboard): auto-initiate portal SSO redirect on unauthenticated load When the dashboard gateway has no local session cookie, it rendered a click-through /login interstitial — even though the Nous portal's /oauth/authorize auto-approves any current member of the dashboard's org and is a silent 302 when the user already holds a portal session. For the common case (clicking a hosted-agent dashboard link while signed in to the portal) that interstitial click is pure friction. This makes the gate auto-initiate the OAuth redirect on an unauthenticated HTML document load instead of rendering the interstitial, when exactly one interactive provider is registered. A one-shot loop-guard cookie (hermes_sso_attempt, 60s TTL) ensures that a genuinely absent portal session (the portal bounces back still-unauthenticated) falls back to the /login page after exactly one bounce rather than ping-ponging forever. The marker is cleared on a successful callback and whenever the gate falls back to /login. Security: this removes a human CLICK, not a security check. The redirect lands on the existing /auth/login route and runs the unchanged PKCE auth-code flow; token verification, audience checks, redirect-URI match, and org-membership checks are all untouched. /api/* fetches still get the 401 JSON envelope (never a 302 a fetch() would follow opaquely), and with two or more providers the /login chooser still renders. Phase 1 of the cloud-auto-discovery work.	2026-06-29 04:25:18 -07:00
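The gate decision described in this commit can be summarised as a small pure function. This is a sketch under stated assumptions: the function name `gate_action`, its parameters, and the returned labels are hypothetical. Only the `hermes_sso_attempt` cookie name, the 60s TTL, and the routing rules come from the commit message.

```python
def gate_action(path: str, authenticated: bool, wants_html: bool,
                provider_count: int, has_attempt_cookie: bool) -> str:
    """Decide what the auth gate does for one request (hypothetical sketch)."""
    if authenticated:
        return "pass"
    if path.startswith("/api/"):
        # API fetches always get the 401 JSON envelope, never a redirect.
        return "401_json"
    if wants_html and provider_count == 1 and not has_attempt_cookie:
        # Caller would set hermes_sso_attempt (60s TTL) before redirecting.
        return "sso_redirect"
    # Second bounce, several providers, or non-HTML load: show /login.
    # Caller would clear hermes_sso_attempt here.
    return "login_page"
```

Because the cookie is checked before redirecting, a missing portal session costs exactly one bounce before the `/login` page appears.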
Sahil-SS9	1bb7b59c5d	fix: offload blocking profiles endpoints from asyncio event loop (#54523) (cherry picked from commit `09f10e2b77`)	2026-06-29 02:35:57 -07:00
chenxiang	d5eee133eb	perf(profiles): fix list_profiles O(N*M) wrapper rescan (6.4s -> 0.4s) find_alias_for_profile re-scanned the whole wrapper dir (~/.local/bin) and read_text every file for EACH profile, including large unrelated binaries (ffmpeg etc.) read 15x over. With 16 profiles this took ~6.4s, long enough that the desktop's per-request backend calls timed out (15s) and the sidebar rendered '全部智能体 0 / 会话 0' ("All agents 0 / Sessions 0"). - Add build_alias_map(): single-pass {profile -> alias} reverse map, reads only an 8KB head slice per wrapper, skips binaries via UnicodeDecodeError. - find_alias_for_profile now delegates to it (behavior preserved). - Cache _count_skills by skills-dir mtime signature (+30s TTL). list_profiles: 6.37s -> 0.84s cold / 0.44s warm. 138 profile tests pass. (cherry picked from commit `89e593749a`)	2026-06-29 02:35:57 -07:00
Telos	fa11b11cf5	fix: propagate key_env from custom_providers into ProviderDef resolve_custom_provider() previously returned api_key_env_vars=() for every custom provider entry, silently dropping the configured key_env field. This caused 401 errors for any custom provider that required an API key via environment variable (e.g. Xiaomi MiMo Token Plan, self-hosted OpenAI-compatible servers). The key_env field is already documented in _VALID_CUSTOM_PROVIDER_FIELDS and normalized by normalize_custom_provider_entry(), so this was just an oversight in the ProviderDef construction. Also adds a regression test that verifies key_env is properly propagated into the resolved ProviderDef.	2026-06-29 02:25:48 -07:00
Teknium	bf0d8fed8e	fix(config): v32 migration flips baked-in verify_on_stop=true to false (#54740 ) The first ship of verify-on-stop (config v30) defaulted DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config persists defaults with strip_defaults=False — so every install that updated through v30 had verify_on_stop: true written into config.yaml as a literal. The v30->v31 migration only flipped missing/'auto' values to false and deliberately preserved an explicit bool, so it skipped that entire population and left verify-on-stop ON for everyone who had updated. A literal true was never a user choice: the feature had no off-switch worth setting it against until v31 introduced one, so a true persisted before v32 is always the old machine default. v32 migration flips a literal true -> false once, for both v30 (skipped v31) and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a deliberate opt-in and is never touched.	2026-06-29 01:51:08 -07:00
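A one-shot migration of the kind this commit describes might look like the following sketch. The function name `migrate_verify_on_stop` and the `_config_version` key are assumptions made for illustration. The rule itself (flip a literal `true` persisted before v32, leave later values alone) is taken from the commit message.

```python
def migrate_verify_on_stop(config: dict, from_version: int) -> dict:
    """Flip a literal True written before v32 to False, exactly once.

    Hypothetical sketch: the function name and the "_config_version" key
    are assumptions, not the project's actual migration code.
    """
    agent = config.get("agent")
    if (
        from_version < 32
        and isinstance(agent, dict)
        and agent.get("verify_on_stop") is True
    ):
        # A True persisted before v32 is the old machine default, not a choice.
        agent["verify_on_stop"] = False
    config["_config_version"] = max(from_version, 32)
    return config
```

Keying the flip on the stored version is what leaves a `true` set after v32 untouched.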
teknium1	41095fdb04	fix(camofox): register CAMOFOX_API_KEY in OPTIONAL_ENV_VARS The auth-header fix reads CAMOFOX_API_KEY but it was never registered, so it didn't surface in `hermes setup` / `hermes tools`. Add it as an advanced password-category tool env var alongside CAMOFOX_URL.	2026-06-29 01:26:24 -07:00
Ben	4125cc3b7c	fix(slack): subscribe to message.mpim + mpim scopes so group DMs work Group DMs (multi-person DMs, channel_type=mpim) were never delivered to the Slack bot. The adapter already classifies mpim as a DM and replies ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the generated app manifest only subscribed to message.im / im:history — the 1:1 DM pair. Without the message.mpim event subscription Slack drops group-DM messages before the adapter ever sees them, so 1:1 DMs worked while group-DM ambient mode was dead. Add message.mpim to bot_events and mpim:history (the scope that event requires per Slack docs) + mpim:read (mirrors im:read for the conversations.info classification call) to bot_scopes. Update the SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs (EN + zh-Hans: scope table, event table, troubleshooting) so existing installs are told to add the new scopes and reinstall. Reported by an enterprise customer. Note: this is a manifest/scope change, so it only takes effect after the app is reinstalled and the new scopes are accepted. Tests: assert message.mpim + mpim:history + mpim:read are in the manifest (with and without assistant mode); both fail on current main and pass with this change.	2026-06-29 01:02:53 -07:00
Ben	1c75e7c9d8	feat(dashboard): list & add arbitrary custom .env keys on the Keys page The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS or the provider catalog); any other key a user set in .env was invisible, and there was no way to add an arbitrary env var from the GUI (e.g. to inject a var a skill or MCP server needs). Backend: GET /api/env now also emits a row for every on-disk .env key that isn't in any catalog, flagged category="custom" + custom=true and password-masked (an unrecognised key could hold anything, so it's redacted and reveal-gated like any secret). Channel-managed credentials stay excluded. The write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD, PYTHONPATH, …) enforced server-side — no new write surface. Frontend: a new "Custom Keys" section lists those custom rows and carries an add-a-key form (client-side name validation mirroring the backend regex; the new row reuses the normal edit/save flow, so on save it round-trips back from the backend as a durable custom row). i18n added for en + zh + types. Tests: behavior-contract coverage that an unknown .env key surfaces as a masked custom row and a catalogued key does not — verified to fail on the pre-fix backend.	2026-06-28 22:53:56 -07:00
Shannon Sands	476875acb9	Add dashboard backup upload and download	2026-06-28 22:35:09 -07:00
brooklyn!	388268ecde	Merge pull request #54568 from NousResearch/bb/shared-websocket-layer refactor(desktop+dashboard): shared WebSocket layer + decouple desktop from dashboard (hermes serve)	2026-06-28 23:43:49 -05:00
Ben Barclay	0943e2a272	fix(cron): don't report a false 'gateway not running' on external-provider instances (#54600 ) `hermes cron status` (and the create/list 'gateway not running' nag) judge whether cron will fire purely from the in-process ticker's heartbeat file + a live gateway PID. That heuristic is correct for the built-in ticker but WRONG for an external provider like Chronos: Chronos arms exactly one external one-shot per job and is fired by a NAS-mediated webhook (POST /api/cron/fire). Its `start()` returns immediately and it deliberately runs no 60s loop and writes no ticker heartbeat — that's the whole point of scale-to-zero (the machine is at zero between fires). So on a perfectly healthy Chronos instance, `cron status` always printed '✗ Gateway is not running — cron jobs will NOT fire' (or a STALLED-ticker warning), and `cron create` always appended the 'jobs won't fire automatically' nag — both false. Verified live on a staging Chronos instance: jobs fired and completed on schedule via the relay while `cron status` insisted the gateway wasn't running and the heartbeat was 370s+ stale. Fix: resolve the active provider (offline — `resolve_cron_scheduler`, whose `is_available()` contract forbids network) and, for any non-builtin provider, report the managed-scheduler state instead of the ticker heuristics, and suppress the ticker-only 'gateway not running' warning. The built-in path is byte-unchanged. Active-job summary is factored into a shared helper so both paths print it identically. New tests prove both directions (chronos: no false negative even with no gateway PID / no heartbeat; builtin: historical warning preserved) and fail without the fix.	2026-06-29 14:03:02 +10:00
lkevincc	163562bf88	fix: normalize lmstudio base urls	2026-06-28 20:46:44 -07:00
Brooklyn Nicholson	9d9a50c2bc	test(cli): pin the `hermes serve` decoupling contract Add a focused contract test for the headless `serve` command (routes to the shared dashboard handler, headless by default while `dashboard` is not, accepts the legacy --no-open, shares the same runtime/lifecycle flag surface). Also refresh the dashboard.py module docstring to cover both commands.	2026-06-28 22:11:48 -05:00
Brooklyn Nicholson	dff491a2b9	feat(cli): add headless `hermes serve` backend; desktop no longer launches `dashboard` The desktop app spawned `hermes dashboard --no-open` as its backend, which made the dashboard look like a desktop prerequisite. Add a dedicated headless `hermes serve` command that boots the same gateway (shared cmd_dashboard / start_server) but never opens a browser, and point the desktop backend spawn exclusively at it. dashboard and serve are now independent surfaces — neither launches the other. - subcommands/dashboard.py: factor shared server args; add `serve` parser (always headless; accepts legacy --no-open as a no-op) - main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log detection; extend stale-backend reaper patterns to match `serve` - desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs, update comments + windows-child-process test assertions - docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and cli-commands.md now describe `hermes serve` as the desktop/headless backend	2026-06-28 22:04:22 -05:00
Ben	dee41d0716	feat(dashboard): catalogue all memory-provider API keys in OPTIONAL_ENV_VARS The dashboard Keys page and `hermes setup` render API-key rows from OPTIONAL_ENV_VARS, but only Honcho had an entry — so Hindsight, Supermemory, Mem0, RetainDB, ByteRover, and OpenViking read their keys straight from os.environ yet had no place to set them in the GUI. Add catalog entries (category=tool, password-masked, with get-key URLs and the tool each powers) for all six, plus the relevant base-URL/endpoint companions. Pure declaration: the generic GET /api/env endpoint, the save/reveal write path, and the sandbox env blocklist (which auto-derives from tool-category OPTIONAL_ENV_VARS) all pick these up with no further wiring. Adds a behavior-contract test asserting every memory provider's primary credential key is catalogued, tool-categorised, and password-masked.	2026-06-28 19:17:02 -07:00
Teknium	11183e8332	fix(profiles): validate custom alias names to prevent path traversal `hermes profile alias <profile> --name <custom>` accepted arbitrary strings and used them verbatim as a filename under ~/.local/bin. Because normalize_profile_name only lowercases/strips (no regex gate), a value like `../../.bashrc` escaped the wrapper directory and clobbered arbitrary user-writable files. remove_wrapper_script had the same sink. Add validate_alias_name (reusing the profile-id regex, which forbids `/`, `.`, and `..`) and wire it into check_alias_collision, create_wrapper_script, remove_wrapper_script, and the CLI alias action so the rejection surfaces a clear "Invalid alias name" error instead of silently writing or unlinking outside the wrapper dir. Co-authored-by: Gutslabs <gutslabsxyz@gmail.com> Co-authored-by: Xowiek <xowiekk@gmail.com>	2026-06-28 18:53:33 -07:00
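A regex gate of the kind this commit adds could be sketched as below. The commit does not show the profile-id regex, so the pattern here is an assumption chosen only to forbid `/`, `.`, and `..`. The body of `validate_alias_name` is likewise illustrative rather than the project's code.

```python
import re

# Assumed pattern: the real profile-id regex is not shown in the commit.
_ALIAS_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def validate_alias_name(name: str) -> str:
    """Return name unchanged if it is a safe filename, else raise ValueError."""
    if not _ALIAS_RE.fullmatch(name):
        raise ValueError(f"Invalid alias name: {name!r}")
    return name
```

Using `fullmatch` rather than `match` matters here: a prefix match would accept `ok/../../.bashrc`.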
aaronagent	5c1ac6c70d	fix(config): strip `export` prefix in .env parsers across three modules All three .env parsers use `line.partition("=")` without stripping the bash-compatible `export ` prefix first. A line like `export API_KEY=sk-...` produces key `"export API_KEY"` instead of `"API_KEY"`, silently ignoring the variable and causing auth failures for users who copy-paste from bash profiles or follow tutorials that include `export`. - tools/skills_tool.py: `load_env()` for skill environment - hermes_cli/config.py: `load_env()` for core config - hermes_cli/main.py: `_has_any_provider_configured()` inline parser Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-28 18:53:00 -07:00
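The parsing fix can be illustrated with a minimal line parser. This is a sketch, not the code from any of the three modules: the helper name `parse_env_line` is hypothetical and quote handling is deliberately simplistic.

```python
def parse_env_line(line: str):
    """Parse one .env line into (key, value), or None for blanks and comments.

    Hypothetical helper; quote handling is deliberately minimal.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Drop a bash-style `export` prefix before splitting on "=".
    if stripped.startswith(("export ", "export\t")):
        stripped = stripped[len("export"):].lstrip()
    key, sep, value = stripped.partition("=")
    if not sep:
        return None
    return key.strip(), value.strip().strip("\"'")
```

Without the prefix strip, `export API_KEY=sk-...` would yield the key `"export API_KEY"`, which is the failure the commit describes.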
Brooklyn Nicholson	27f03243a0	fix(dashboard): stop ElevenLabs voice-list 401 log spam The /api/audio/elevenlabs/voices endpoint logged a WARNING on every failure, and the desktop re-polls it on each settings open/focus — a bad/expired/scoped ELEVENLABS_API_KEY floods agent/gui logs with identical "voice list failed: HTTP Error 401" lines indefinitely. Treat 401/403 as a persistent "integration unavailable" state: return {available: false, error: "unauthorized"} with a 200 (the dropdown already handles available:false) instead of a 502, and collapse repeated identical failures to a single log line via a small re-arming latch (logs again on recovery or when the error changes). Non-auth errors keep the 502 but are throttled the same way.	2026-06-28 17:59:28 -05:00
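One way to read the "re-arming latch" mentioned above is sketched here. The class and method names are assumptions, and the sketch interprets "logs again on recovery" as re-arming the latch after a success. The actual endpoint code may differ.

```python
import logging


class ReArmingLogLatch:
    """Log a failure once, then stay quiet until the error changes or we recover.

    Hypothetical sketch; names and behaviour details are assumptions.
    """

    def __init__(self, logger: logging.Logger):
        self._logger = logger
        self._last_error = None  # None means "armed": the next failure is logged

    def failure(self, error: str) -> bool:
        """Record a failure; return True if it was logged."""
        if error == self._last_error:
            return False  # identical repeat: suppressed
        self._last_error = error
        self._logger.warning("voice list failed: %s", error)
        return True

    def success(self) -> None:
        """Re-arm so the next failure is logged again."""
        self._last_error = None
```

With this shape, a client polling with a bad key produces one log line per distinct error instead of one per poll.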
Teknium	980622d0ec	perf(startup): parse config + plugin manifests with libyaml CSafeLoader (#54486 ) The startup config/manifest reads used PyYAML's pure-Python SafeLoader, which is ~8x slower than the libyaml-backed CSafeLoader C extension. config.yaml is parsed several times during launch (cli config, raw config, early interface/redaction bridge, logging config) and every plugin manifest is parsed once — all on the slow path. Add utils.fast_safe_load (CSafeLoader-preferring, pure-Python fallback, true drop-in for safe_load) and route the hot startup parse sites through it: hermes_cli/config.py (config + manifest reads), hermes_cli/plugins.py (manifest parse), env_loader, cli.load_cli_config, hermes_logging, and the two pre-config early YAML bridges in main.py. Behavior is identical (same restricted safe tag set); only speed changes. safe_load calls on the startup path drop from ~79 to ~0, cutting the YAML parse cost from ~0.9s to ~0.15s under profiling. Adds tests/test_fast_safe_load.py asserting equivalence with safe_load across input shapes, empty-doc falsiness, C-loader preference, and that python/object tags are still rejected (safe, not full loader).	2026-06-28 15:38:39 -07:00
brooklyn!	16ff1a3b93	Merge pull request #54457 from NousResearch/bb/windows-console-launcher-repair fix(windows): repair missing console script launchers	2026-06-28 17:15:56 -05:00
奥森木	e7d4ade8cf	fix(anthropic): ignore stale non-Anthropic base_url across all resolution paths A config left with `provider: anthropic` but a leftover `base_url: https://openrouter.ai/api/v1` (e.g. after a provider switch) would route Anthropic OAuth/setup-token traffic to OpenRouter and 404. Add `_anthropic_base_url_override_ok()` and gate the three native-Anthropic resolution branches (pool, explicit, native) on it. The guard honors a configured `model.base_url` only when it plausibly speaks the Anthropic Messages protocol — official `.anthropic.com` / `.claude.com` hosts, Azure Foundry endpoints, and `/anthropic`-suffixed or Kimi `/coding` proxies — and falls back to `https://api.anthropic.com` otherwise. Aggregator URLs like openrouter.ai / api.openai.com are treated as stale. Reconstructed from @clovericbot's PR #3661 onto current main: the original patched one branch with an anthropic-only allow-list, which would have broken Azure-via-anthropic; widened to all three sites and made Azure/proxy-safe.	2026-06-28 15:12:03 -07:00
Teknium	95f2919f91	perf(startup): lazy-load gateway platform adapters (#54448 ) Bundled platform plugins (telegram, discord, feishu, teams, ...) were eagerly imported at plugin-discovery time on every `hermes` invocation, including plain `hermes chat` which never touches a gateway platform. Their modules import heavy platform SDKs at module level (lark_oapi, microsoft_teams, discord.py, slack_bolt, ...) — feishu alone pulled in lark_oapi (~2.6s), teams pulled microsoft_teams (~1.9s). Discovery now registers a cheap deferred loader per platform in the platform_registry; the adapter module is imported only when the gateway / cron / setup / send_message path actually asks for that platform. is_registered() and the iterate-all accessors stay correct (deferred counts as registered; plugin_entries()/all_entries() materialize all deferred loaders, since those paths genuinely need every adapter). Cold start: ~4.4s -> ~2.45s to banner. discover_and_load: 2.0s -> 0.3s (warm), and the heavy SDKs are no longer imported at all in CLI mode. Every shipped platform remains available out of the box — it just loads on first use.	2026-06-28 15:11:59 -07:00
Mibayy	b0b7ff0d75	fix(provider): auto+base_url bypasses cloud API when custom endpoint configured (#3846 ) When config.yaml has `provider: auto` and a non-cloud `base_url` (e.g. Ollama at localhost:11434), requests were silently sent to https://api.anthropic.com whenever ANTHROPIC_API_KEY was present in the environment, ignoring the configured local endpoint and returning HTTP 401 / "credit balance too low". Root cause: resolve_provider("auto") scans env vars and returns "anthropic" when ANTHROPIC_API_KEY is set, before config.model.base_url is ever consulted. In resolve_runtime_provider(), before calling resolve_provider(), short-circuit to the OpenAI-compatible resolver when no explicit creds were passed, provider is "auto"/unset, and a non-cloud base_url is configured. Well-known cloud roots (openrouter.ai, anthropic.com, openai.com) are matched on HOST (not substring) so look-alike hosts can't evade the bypass and leak a cloud credential. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-06-28 15:11:55 -07:00
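The difference between matching on host and matching on substring can be shown with a short sketch. The helper name `is_cloud_base_url` and the constant `CLOUD_ROOTS` are hypothetical. The three root domains are the ones named in the commit message.

```python
from urllib.parse import urlparse

# Root domains named in the commit message; the constant name is an assumption.
CLOUD_ROOTS = ("openrouter.ai", "anthropic.com", "openai.com")


def is_cloud_base_url(base_url: str) -> bool:
    """True if the URL's host is a known cloud root or a subdomain of one."""
    host = (urlparse(base_url).hostname or "").lower()
    return any(host == root or host.endswith("." + root) for root in CLOUD_ROOTS)
```

A substring test would classify `https://openai.com.evil.example/v1` as a cloud endpoint, while the host comparison does not.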
Gille	df8e2523fa	fix(windows): verify launchers after primary install	2026-06-28 17:02:05 -05:00
HexLab98	95994bbc56	fix(windows): repair missing hermes.exe after pip install (#52931) On Windows, uv pip install -e . can register hermes.exe in package metadata while the launcher never lands on disk. Detect missing [project.scripts] shims and reinstall entry points under the existing quarantine path in hermes update and install.ps1.	2026-06-28 17:01:31 -05:00
brooklyn!	28097d9cd9	Merge pull request #54385 from NousResearch/bb/project-folder-picker-remote feat(desktop): remote-gateway-aware folder picker + git cockpit (status, review, worktrees)	2026-06-28 16:35:57 -05:00
Teknium	9a0010fd46	fix(windows): cover remaining console-flash spawn legs (#54417)	2026-06-28 13:49:08 -07:00
Brooklyn Nicholson	453f134b3b	refactor(desktop): centralize remote git REST routing Keep the remote git mirror as a thin facade: route all GETs through gitGet, all mutations through gitPost, and keep consumers on desktopGit(). On the backend, route git paths through a single _git_path helper instead of repeating str(_fs_path(...)) in every endpoint. Behavior unchanged.	2026-06-28 14:37:36 -05:00
Brooklyn Nicholson	e4cf3a2e9d	refactor(web_git): unify porcelain-v2 parsing into one walker Collapse the two near-duplicate status parsers (_parse_status_v2 + _iter_status_entries) into a single _walk_entries generator feeding the rail, review list, and commit flow; share the staged predicate; hoist `import re`. Behavior unchanged.	2026-06-28 14:29:59 -05:00
Brooklyn Nicholson	fc86e35764	feat(desktop): make the git cockpit work over a remote gateway After the folder picker fix, an added remote folder was still half-usable: the desktop's git GUI (coding-rail status, worktree lanes, review pane, branch switch, file diff) all ran Electron-local git on the USER's machine, so against a remote-gateway repo they silently degraded to empty. Mirror the whole surface over the dashboard REST API so it acts on the BACKEND repo where sessions actually run: - hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/ create-pr, file-diff, worktree add/remove, branch switch) shelling to the system git, mirroring the Electron ops' shapes. - web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as /api/fs, executor-offloaded, mutations -> 400). - apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as window.hermesDesktop.git; coding-status / review / projects / model / desktop-fs route through desktopGit() so local stays Electron, remote hits /api/git/*. Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts, review classification, diff incl. untracked all-add, stage+commit roundtrip, worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).	2026-06-28 14:26:09 -05:00
ygd58	3e16176ba4	fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled _get_platform_tools() applies agent.disabled_toolsets as a final override AFTER reading platform_toolsets.<platform>, so a toolset listed there stays permanently OFF no matter what the toggle write path saves. Blank Slate installs pre-populate this list with ~27 toolsets, making most of the desktop Toolsets UI un-enableable (issue #49995). Fix: _save_platform_tools() now removes any toolset the user just explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets. Toolsets the user did not touch, or that remain disabled on other platforms, are left alone -- disabled_toolsets keeps working as a cross-platform suppression list for anything not actively re-enabled. Disabling a toolset (unchecking it) does not touch disabled_toolsets at all -- only enables reconcile it. Verified end-to-end with the exact repro from the issue: Blank Slate config (disabled_toolsets=['todo','memory','browser'], cli=['file', 'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools() now resolves 'todo' as enabled while 'memory'/'browser' (untouched) remain disabled. Added 4 regression tests. Full tools_config suite: 101 passed (97 existing + 4 new), no regressions. Fixes #49995	2026-06-28 21:59:03 +05:30
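The reconcile step can be reduced to a small list filter, sketched here with the repro values from the commit message. The function name `reconcile_disabled_toolsets` is hypothetical. The real change lives inside `_save_platform_tools()` and is not shown in the log.

```python
def reconcile_disabled_toolsets(disabled: list[str], newly_enabled: set[str]) -> list[str]:
    """Drop toolsets the user just enabled from the cross-platform disabled list.

    Hypothetical sketch; untouched entries keep their order and stay disabled.
    """
    return [name for name in disabled if name not in newly_enabled]


# Repro from the commit message: enabling 'todo' leaves the others disabled.
remaining = reconcile_disabled_toolsets(["todo", "memory", "browser"], {"todo"})
```

Disabling a toolset passes an empty `newly_enabled` set, so the list is returned unchanged, matching the "only enables reconcile it" rule.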
Teknium	cb982ad997	fix(windows): hide console-window flash on backend git/gh/wmic/bash subprocess spawns The Windows desktop GUI runs its backend headless via pythonw.exe. Several auxiliary subprocess sites that run inside that windowless backend spawned console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill) WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and flashed a black window on screen — sometimes continuously (the dashboard Projects-tree git probe alone fired ~118 spawns in 60s on startup). The terminal tool, cron, browser, code_execution, and gateway-spawn paths already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs were missed. Wire the existing helper into them: - tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the cp950 UnicodeDecodeError on CJK paths from the same site) - agent/coding_context.py: _git (per-turn git status/log/diff) - agent/context_references.py: _run_git + _rg_files (@file/@ref resolution) - hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto) - hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan - hermes_cli/main.py: wmic stale-dashboard PID scan - gateway/status.py: taskkill /T /F force-kill windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls all pass creationflags=CREATE_NO_WINDOW). Scoped to the windowless-backend paths that cause the reported flashing. The Electron updater-handoff leg (main.cjs windowsHide:false) and the interactive-CLI banner probes (cli.py) are intentionally NOT touched here — the former needs a Windows-tested change of its own, the latter runs in a visible console anyway. Tracking: #54220 Refs: #53178 #53631 #53781 #53957 #49602 #52982 #53424 #53053 #53016	2026-06-28 05:28:45 -07:00
izumi0uu	c4719aa51c	fix(gateway): boot out stale launchd registration before restart bootstrap launchd restart can leave the gateway job stopped but still registered after update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a detached process. Booting the stale registration out before bootstrap keeps the launchd-managed restart path intact and locks it with a regression test. Constraint: Keep upstream-facing conventional commit style while preserving local decision context Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update Confidence: high Scope-risk: narrow Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart" Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd Not-tested: Manual macOS launchctl restart after hermes update	2026-06-28 04:17:13 -07:00
teknium1	463225caf1	fix(gateway): bypass legacy-unit prompt in non-TTY systemd install Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt not guarded by --start-now/--start-on-login — is also skipped in headless contexts. Updates systemd_install test mocks to accept the new kwarg and adds coverage for the legacy-unit-skip path.	2026-06-28 04:09:54 -07:00
liuhao1024	831d443b03	fix(gateway): honor --start-now/--start-on-login flags and support non-TTY headless installs When running `hermes gateway install` on Linux/systemd, the command unconditionally prompts with two `prompt_yes_no` questions, breaking headless installs (SSH, CI, provisioning scripts) and ignoring the existing --start-now / --start-on-login CLI flags that the Windows branch already respects. The fix mirrors the Windows path: read CLI flags first, prompt only when flags are not provided AND stdin is a TTY, and fall back to True defaults for non-TTY contexts. The argparse help strings are promoted from SUPPRESS to visible so users can discover the flags. Fixes #42065	2026-06-28 04:09:54 -07:00
Teknium	a06d0198cd	fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190) The /api/pty handler only closed the PtyBridge in the writer loop's finally. On child EOF the reader task closes the WebSocket, but if the handler task is cancelled the instant the socket closes, the writer's finally can be skipped and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted. Reap the bridge in the reader's EOF finally too (close() is idempotent), so the PTY is reaped independently of the writer-loop cancellation race. Harden the regression test to poll for teardown instead of asserting on the same tick. Was flaky on main (2/20); now 25/25.	2026-06-28 03:58:18 -07:00
yoniebans	204a67f0c8	fix(kanban): retry write_txn on transient SQLITE_BUSY	2026-06-28 02:44:04 -07:00
Teknium	6d879d486b	fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123) * fix(dashboard): close PTY WebSocket on child EOF to stop FD leak The /api/pty handler's reader task returns on child EOF, but the writer loop stayed blocked on ws.receive() until the browser sent a disconnect. When the browser socket is half-open (no FIN delivered — common on macOS/launchd), that disconnect never arrives, so the handler never reaches its finally and the PTY master fd + child process leak. With dashboard auto-reconnect (#52962), every dropped socket then spawns a fresh PTY on top of the orphaned one, exhausting file descriptors within hours (EMFILE / Errno 24). Fix: the reader task now closes the WebSocket in a finally when the child EOFs or the send side breaks, which unblocks ws.receive() so the existing finally runs bridge.close(). The writer loop also guards ws.receive() against the RuntimeError Starlette raises once the socket is closed. Reported by @fifteenzhang. Fixes #54028 * docs: add infographic for #54028 PTY FD leak fix	2026-06-28 02:42:21 -07:00
teknium1	7c9cdad9fd	test(cli): cover Windows self-lock recovery guard + cmd-quote its hint Add two tests for the self-lock guard in _recover_from_interrupted_install: one asserting it clears the marker and skips install when hermes.exe is a process ancestor (breaking the #52378/#45542 loop), one asserting it falls through to a normal recovery install when the shim is NOT an ancestor. The guard's manual-recovery hint runs only inside the Windows branch, so quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform fallback hint at the end of the function is left POSIX-correct. Map Icather in scripts/release.py AUTHOR_MAP for the salvage.	2026-06-28 02:40:37 -07:00
灵越羽毛	b6f592dbdc	fix(cli): detect self-lock in update recovery to break infinite retry loop on Windows	2026-06-28 02:40:37 -07:00
Teknium	fde1c8570f	fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126) When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a full traceback for every pending connection-lost callback — 50+ identical WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not actionable: they are the expected side effect of the peer hanging up before our writes drained. Install a loop exception handler on the gateway serving loop that collapses exactly this teardown class (ConnectionResetError/ConnectionAbortedError/ BrokenPipeError originating from _call_connection_lost) to a single debug line, forwarding every other loop error to the existing/default handler unchanged so genuine loop bugs still surface. Idempotent per loop.	2026-06-28 02:35:01 -07:00
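A rough sketch of the loop exception handler described in fde1c8570f, using only standard asyncio calls (`get_exception_handler`, `set_exception_handler`, `default_exception_handler`). The function names, the logger name, and the way the `_call_connection_lost` origin is detected (a substring check on the handle's repr) are assumptions for illustration; the commit may identify the callback differently.

```python
import asyncio
import logging

logger = logging.getLogger("ws_teardown_sketch")  # hypothetical logger name

_TEARDOWN_ERRORS = (ConnectionResetError, ConnectionAbortedError, BrokenPipeError)


def is_peer_hangup(context: dict) -> bool:
    """True for the teardown class: a connection error raised from _call_connection_lost."""
    if not isinstance(context.get("exception"), _TEARDOWN_ERRORS):
        return False
    # Assumption: the failing callback can be recognised from the handle's repr.
    return "_call_connection_lost" in repr(context.get("handle"))


def install_quiet_teardown_handler(loop: asyncio.AbstractEventLoop) -> None:
    """Install the filter once per loop; every other error is forwarded unchanged."""
    previous = loop.get_exception_handler()
    if getattr(previous, "_quiet_teardown", False):
        return  # already installed on this loop (idempotent)

    def handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
        if is_peer_hangup(context):
            logger.debug("peer hung up during teardown: %r", context.get("exception"))
            return
        if previous is not None:
            previous(loop, context)  # keep any handler that was already set
        else:
            loop.default_exception_handler(context)

    handler._quiet_teardown = True  # marker used by the idempotency check above
    loop.set_exception_handler(handler)
```

Chaining to the previously installed handler, rather than replacing it, is what keeps unrelated loop errors visible.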
PRATHAMESH75	e551da6ddb	fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart Long-lived helpers spawned indirectly by tool calls (adb, platform bridges) were left in the service cgroup after the gateway's main process exited. When the kernel rejected the deferred cgroup-wide kill with EINVAL, systemd blocked Restart=always for 6+ minutes, taking down all platforms and cron windows (#37454). Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks cgroup.procs and sends per-PID SIGKILLs — a different kernel code path than cgroup.kill, so it succeeds where the cgroup-wide write failed. KillMode=mixed is preserved so the gateway still reaps its own tool-call children before systemd intervenes (#8202). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 02:05:50 -07:00
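The per-PID cleanup that e551da6ddb describes (walk `cgroup.procs`, send SIGKILL to each PID) might look roughly like the following. This is a sketch under stated assumptions, not the contents of `gateway.cgroup_cleanup`: the function name and parameters are hypothetical, and the kill function is injectable so the walk can be tested without signalling real processes. It targets Linux, where `signal.SIGKILL` and cgroup files exist.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List, Optional


def kill_cgroup_procs(
    procs_file: Path,
    kill: Callable[[int, int], None] = os.kill,
    skip_pid: Optional[int] = None,
) -> List[int]:
    """Send SIGKILL to each PID listed in a cgroup.procs file; return the PIDs signalled."""
    if skip_pid is None:
        skip_pid = os.getpid()  # never signal the cleanup helper itself
    try:
        tokens = procs_file.read_text().split()
    except OSError:
        return []  # cgroup already gone: nothing left to reap
    killed: List[int] = []
    for token in tokens:
        if not token.isdigit():
            continue  # ignore anything that is not a plain PID
        pid = int(token)
        if pid == skip_pid:
            continue
        try:
            kill(pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            continue  # process exited in the meantime, or is not ours to kill
        killed.append(pid)
    return killed
```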
xxxigm	093f567f0d	fix(agent,cli): surface empty-body API errors and fail oneshot exit code When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}), `_summarize_api_error` fell through to a bare `str(error)`, so users saw only "HTTP 400" with no provider detail (reported on Windows in #36109). The SDK leaves `body` empty in this case, but the httpx `response` still carries the payload in `.text`. - run_agent.py `_summarize_api_error`: when `body` is empty, fall back to `response.text` — parse a JSON `error.message`/`message` when present, else surface the raw (truncated) body. Platform-agnostic diagnostics. - hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns exit code 2 when the run is failed/partial with no usable final response, so scripts can detect LLM failures (still 0 when a response — incl. an error summary as output — is produced). Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests. NOTE: #36109's original root cause (Windows "all providers return empty 400") is not reproducible on current main (heavy provider-transport churn since v0.15.1). This change does not claim to fix that root cause — it makes any empty-body API error LEGIBLE so a future occurrence shows the real provider message instead of a bare HTTP 400. Relates to #36109 (does not close it).	2026-06-28 02:05:20 -07:00
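The empty-body fallback described in 093f567f0d can be illustrated as follows. This is a simplified sketch, not the actual `_summarize_api_error`: the function name, the output format, the 500-character truncation limit, and the `FakeAPIError` stand-in for an SDK error are all assumptions made for the example.

```python
import json
from types import SimpleNamespace
from typing import Any


class FakeAPIError(Exception):
    """Hypothetical stand-in for an SDK error with .body and an httpx-like .response."""

    def __init__(self, summary: str, body: Any, text: str) -> None:
        super().__init__(summary)
        self.body = body
        self.response = SimpleNamespace(text=text)


def summarize_empty_body_error(error: Any, limit: int = 500) -> str:
    """When the parsed body is empty, recover provider detail from response.text."""
    if getattr(error, "body", None):
        return str(error)  # the non-empty-body path is out of scope for this sketch
    response = getattr(error, "response", None)
    text = (getattr(response, "text", "") or "").strip()
    if not text:
        return str(error)  # nothing more to show
    try:
        payload = json.loads(text)
    except ValueError:
        payload = None
    if isinstance(payload, dict):
        err = payload.get("error")
        message = err.get("message") if isinstance(err, dict) else None
        message = message or payload.get("message")
        if message:
            return f"{error}: {message}"
    return f"{error}: {text[:limit]}"  # raw body, truncated
```

For example, an error built as `FakeAPIError("HTTP 400", {}, '{"error": {"message": "model is required"}}')` would be summarized with the provider message appended instead of a bare "HTTP 400".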
teknium1	64972b6403	fix(config): canonicalize model.name/model.model to model.default (#34500) A custom_providers config that names the model under model.name (or model.model) resolved to an empty model, so the API request went out with model= — HTTP 400 from OpenAI-compatible backends. Display paths (hermes status/dump) already read model.name and showed the model, making the failure silent. The model id was read via 'default or model' at ~14 independent sites (cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none of which honored 'name'. Rather than patch every site, canonicalize at the single load/save chokepoint: _normalize_root_model_keys() now promotes model.model/model.name -> model.default (precedence default > model > name) and drops the stale alias, so every reader — present and future — sees a populated default and config.yaml is migrated canonical on next save. The gateway, which bypasses load_config(), replays the same normalization in _load_gateway_config(). Co-authored-by: Bartok9 <danielrpike9@gmail.com> Credit: root-cause analysis and fix direction from @Bartok9 (#34502, first) and @v86861062 (#34527).	2026-06-28 02:05:13 -07:00
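The promotion rule stated in 64972b6403 (model.model / model.name promoted to model.default, precedence default > model > name, aliases dropped) can be sketched like this. The function below is a hypothetical illustration operating on a plain dict; it is not the repository's `_normalize_root_model_keys()`, whose exact signature and edge-case handling are not shown in the log.

```python
from typing import Any, Dict


def normalize_model_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with model.model / model.name folded into model.default."""
    model = config.get("model")
    if not isinstance(model, dict):
        return config  # nothing to normalize in this sketch
    normalized = dict(model)
    # Iterating "model" before "name" gives the precedence default > model > name.
    for alias in ("model", "name"):
        value = normalized.pop(alias, None)  # always drop the stale alias
        if value and not normalized.get("default"):
            normalized["default"] = value
    return {**config, "model": normalized}
```

Because the aliases are removed on the way through, a reader that only looks at `model.default` sees a populated value regardless of which key the user originally wrote.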
Teknium	c9df4bc094	fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066) A restart now interrupts in-flight agents immediately rather than holding the gateway open for a grace window. The previous 180s default coupled two independently-set timers: the gateway's own drain timer and systemd's TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next startup exit immediately ('already running') — an infinite crash loop under Restart=on-failure (#31981). Setting drain to 0 makes the mismatch structurally impossible: with drain 0 the generated unit gets TimeoutStopSec=90 against a near-instant drain, so systemd never kills mid-cleanup. Contract: restart the gateway, in-flight work stops. A grace window large enough to 'save' a long agent turn would have to outlast an unbounded task, which is impossible. Also fixes the stale-unit warning's suggested command (hermes gateway service install --replace -> hermes gateway install --force); the former subcommand does not exist. Closes #31981	2026-06-28 01:14:34 -07:00
teknium1	aacc15b2c9	fix(clarify): raise default clarify_timeout to 3600s (#32762) The 600s default evicted the gateway clarify entry while users were still away (meeting/AFK); a later button tap then landed on a dead entry and the agent hung on 'running: clarify'. Raise the default to 1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback, documenting the running-agent-guard tradeoff. User overrides still win.	2026-06-28 01:07:53 -07:00
teknium1	c918d42d88	feat(desktop): config-driven Electron launch flags + GPU policy Adds a desktop: section to config.yaml so headless/VM users can make `hermes desktop` launch correctly without a wrapper command: - desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11) appended to every launch. Accepts a list or a shell-split string. - desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU env var the Electron app already reads. An explicit env var still wins. cmd_gui() reads these via _desktop_launch_options() and applies them. This is the config.yaml form of the capability proposed as a raw env var in #38934 (@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var. Co-authored-by: ray <86501179+1RB@users.noreply.github.com>	2026-06-27 22:26:43 -07:00
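The "list or shell-split string" handling for `desktop.electron_flags` in c918d42d88 could be implemented along these lines. The helper name is hypothetical and this is only a sketch of the accepted input shapes, using the standard library's `shlex.split`; how `_desktop_launch_options()` actually parses the value is not shown in the log.

```python
import shlex
from typing import Any, List


def normalize_electron_flags(value: Any) -> List[str]:
    """Turn a config value (None, string, or list) into a list of CLI flags."""
    if value is None:
        return []  # option not set in config.yaml
    if isinstance(value, str):
        return shlex.split(value)  # shell-style splitting, honours quotes
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value if str(item).strip()]
    raise TypeError("electron_flags must be a list or a string")
```

Under this sketch, the string `"--ozone-platform=x11 --disable-gpu"` and the list `["--ozone-platform=x11", "--disable-gpu"]` normalize to the same flag list.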


3142 commits