hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-15 09:21:36 +00:00

Author	SHA1	Message	Date
Teknium	61ee2dbfdb	fix(s6): make profile gateway log parent writable (#46291) * fix(gateway): chown logs/gateways parent so late-added profiles can log The per-profile log service script created $HERMES_HOME/logs/gateways/ via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When the first log service boots in root context, the gateways/ parent stays root:root; every profile registered later runs its log service as the dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a sub-second fatal crash-loop flooding the container log. The stage2 recursive heal does not catch it either: it is gated on needs_chown, which is false when the top-level $HERMES_HOME is already hermes-owned. Two complementary fixes: - service_manager._render_log_run: chown the gateways/ parent (non-recursively) before the leaf chown. Runs on every root-context boot, so it also heals volumes already poisoned by older images. - docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p block; cont-init runs before any service starts, so the parent already exists hermes-owned when the first log/run does 'mkdir -p'. The needs_chown repair loop needs no twin entry: it already chowns logs/ recursively, which covers logs/gateways. Fixes #45258 * chore(release): map salvaged contributor --------- Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>	2026-06-15 13:47:05 +10:00
Teknium	40d7c264f0	fix(s6): register profile gateways without auto-starting (#46266) * fix(s6): prevent profile create from auto-starting gateway service When hermes profile create runs inside an s6 container, _maybe_register_gateway_service() calls register_profile_gateway() which creates the service directory and triggers s6-svscanctl -a. Previously the service always started immediately, causing profiles that share the main gateway's bot token (e.g. Kanban worker profiles) to fail with a token-lock conflict and persist gateway_state: running — becoming zombies that resurrect on every container restart. Wire the existing start_now parameter through the S6 implementation: when start_now=False, write a marker file (same pattern as container_boot.py _register_gateway_slot) so s6-supervise leaves the service stopped until the user explicitly runs hermes -p <profile> gateway start. 4 files, +61/-6, 4 new tests (all passing). * test(docker): wait for gateway running state before restart --------- Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-15 11:43:23 +10:00
Teknium	4eb0ff639b	Remove is_container check when restarting over dashboard (#46290) Co-authored-by: IAvecilla <ignacio.avecilla@lambdaclass.com>	2026-06-15 11:09:23 +10:00
Teknium	f3fe99863d	revert(web): remove keyless Parallel search fallback (#46350) Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced. Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.	2026-06-14 16:47:57 -07:00
Teknium	a829e04d62	fix: migrate cloned profile configs (#46345)	2026-06-14 16:30:23 -07:00
Teknium	2a14e8957d	fix(kimi): surface K2.7 Code in native picker (#46309)	2026-06-14 14:01:03 -07:00
mr-r0b0t	bff78a34dc	feat(zai): add GLM-5.2 with verified 1M context window GLM-5.2 ships with a 1M (1,048,576) token context window. Without this entry, Hermes falls through to the generic 'glm' key (202,752 tokens), under-reporting the context bar and prematurely compressing conversations. The 1M limit was verified empirically via needle-in-a-haystack retrieval at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors, zero truncation, correct retrieval at every tested size (25K through 789K). Changes: - agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback - hermes_cli/models.py: add glm-5.2 to zai curated models - hermes_cli/setup.py: add glm-5.2 to setup wizard zai list - hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes - plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models - tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests	2026-06-14 13:50:36 -07:00
Teknium	a1f51feb72	fix(telegram): avoid rich final duplicate previews (#46206)	2026-06-14 11:13:38 -07:00
kshitijk4poor	ce19fdb7ce	fix(skills): apply global|platform disabled union to all resolution sites The platform-disabled fix landed only in agent.skill_utils.get_disabled_skill_names (the system-prompt path). Two sibling resolvers still used the old replace-not-union semantics, so the same skill could be hidden from the <available_skills> prompt yet reported enabled elsewhere: - hermes_cli/skills_config.get_disabled_skills (the 'hermes skills config' UI) returned only the platform list, so a globally-disabled skill showed as enabled (unchecked) on any platform with a platform_disabled entry. - tools/skills_tool._is_skill_disabled (gates whether skill_view loads a skill) ignored the global list when a platform list existed, so a globally-disabled skill could still be loaded on such a platform. Both now union the global list with the platform list, matching get_disabled_skill_names. An explicit empty platform list no longer re-enables a globally-disabled skill — global disables hold on every platform (#46201). Also: fix the now-stale get_disabled_skill_names docstring and drop a stray blank line. Regression tests added for both sites (proven to fail on the old replace semantics).	2026-06-14 22:54:54 +05:30
Teknium	7433d5f0eb	fix(gateway): scope early duplicate guard to pid file	2026-06-14 08:42:06 -07:00
konsisumer	1436793051	fix(gateway): block shell gateway run when a service supervises the profile	2026-06-14 08:42:06 -07:00
Diyon18	288f7026e3	fix(messaging): correct Weixin personal account labeling	2026-06-14 04:52:54 -07:00
Teknium	a27d7e68cc	fix(mcp): block suspicious stdio configs before probe (#46112)	2026-06-14 04:46:54 -07:00
Teknium	972a9885ee	fix(mcp): block exfil-shaped stdio server configs (#46083)	2026-06-14 04:24:14 -07:00
Teknium	723c2331bd	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
Teknium	0428945b5b	fix(desktop): keep profile homes out of bootstrap (#46073)	2026-06-14 03:08:52 -07:00
LeonSGP43	89bdb1e546	fix: read dashboard spa assets as utf-8 Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-06-14 02:31:04 -07:00
helix4u	d76a58bd15	fix(gateway): resolve sudo profile system installs	2026-06-14 02:20:55 -07:00
helix4u	4936a49a0c	fix(mcp): preserve loop during probes	2026-06-14 02:09:45 -07:00
helix4u	85e6232a07	fix(providers): support anthropic proxy v1 endpoints	2026-06-14 02:09:16 -07:00
Teknium	1b16c48170	fix: guard OAuth account removal	2026-06-13 21:47:13 -07:00
Justin Sunseri	12682d96b9	feat(telegram): restore rich messages opt-out Salvages PR #45840's client-compatibility opt-out while keeping rich messages enabled by default via telegram.extra.rich_messages: true.	2026-06-13 21:45:49 -07:00
chromalinx	a218a0f156	fix(agent,gateway,doctor): add SSL CA cert bundle fail-fast guard A stale certifi CA bundle after a partial `hermes update` used to crash the agent on the first outbound HTTPS call with a raw traceback and trap the gateway in a retry loop. This patch: * Adds `agent/errors.py` with a typed `SSLConfigurationError` * Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight that asserts the bundle exists, is non-trivial in size, and can build a working SSLContext. On macOS, it falls back to the system trust store when the bundle is empty but the system store is healthy (covers corporate proxies / MDM setups). * Wires the guard into `run_agent.py` and `gateway/run.py` right after the `hermes_bootstrap` import, inside a try/except so a bug in the guard itself can never prevent startup. * Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so users can detect the failure with one command. * Adds unit tests covering the healthy, missing, empty, skip-env, and macOS-fallback paths. * Adds an RCA document describing the failure mode and the recovery path (`pip install -e .`). When the bundle is broken the user sees: ⚠️ SSL certificate bundle issue detected. Run: pip install -e . `HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed environments that ship their own trust store.	2026-06-13 21:14:32 -07:00
Teknium	c8e5f34f24	fix(gemini): strip native self prefixes before generateContent (#36141) Strip `google/` and `gemini/` self-prefixes before native Gemini generateContent calls, and keep provider-normalization expectations aligned.	2026-06-13 13:47:08 -07:00
Teknium	08890d77e6	fix(plugins): normalize browser-pasted GitHub repo URLs (#33539) Accept common GitHub web URLs in `hermes plugins install` by normalizing repository views back to cloneable `.git` URLs, with focused parser coverage.	2026-06-13 13:23:59 -07:00
helix4u	78c11d99e3	fix(update): stop Windows gateways before mutating install	2026-06-13 10:46:08 -07:00
Teknium	9b5f7b63c6	fix(profile): make clone-from a full source selector	2026-06-13 07:33:58 -07:00
WompaJango	28bf8fb47d	feat(dashboard): clone profiles from any source	2026-06-13 07:33:58 -07:00
Que0x	3380563d94	fix(security): stop /api/status leaking host paths and PID on gated binds The dashboard's public /api/status liveness endpoint is in PUBLIC_API_PATHS and bypasses dashboard auth, yet it returned absolute hermes_home, config_path, env_path, the gateway PID, and the internal gateway health URL. That exceeds the shape its own allowlist documents as public ("version, gateway state, active session count, and the dashboard auth-gate shape. No bodies, no session content, no secrets"), leaking deployment recon to any unauthenticated caller on a network-exposed (gated) bind. Withhold host-local detail unless the bind is loopback / --insecure, where the dashboard is local-only and the caller is already inside the trust envelope -- the same split should_require_auth draws. The NAS liveness probe and the auth-gate badge are unaffected. Adds invariant tests for both modes (gated withholds, loopback keeps).	2026-06-13 07:18:59 -07:00
Teknium	d206e1f51d	fix(dashboard): keep local file browser on home	2026-06-13 06:39:38 -07:00
Teknium	74c5158b10	fix(model): show bare custom endpoints in gateway picker (#45597) Surface direct model.provider=custom endpoints in /model picker output and keep explicit bare custom switches on the current endpoint instead of requiring a named providers/custom_providers row.	2026-06-13 06:05:30 -07:00
Teknium	0333a99925	fix: merge session-only model analytics rows (#45582)	2026-06-13 05:52:42 -07:00
Adalsteinn Helgason	643dc82793	Fix custom provider identity loss in session persistence _runtime_model_config persisted the live agent's RESOLVED provider into the session row's model_config JSON. For any named providers:/ custom_providers: entry, agent.provider is the literal string "custom", so the entry name was lost (and the api_key is deliberately never persisted). On session.resume or _reset_session_agent the stored provider="custom" fed resolve_runtime_provider(requested="custom"), which cannot match a named entry — the rebuild either raised "No LLM provider configured" or silently resolved placeholder credentials against the patched-back base_url. Persist the REQUESTED/entry identity instead: a new reverse lookup find_custom_provider_identity(base_url) maps the endpoint URL back to the canonical custom:<name> menu key. _runtime_model_config stores that key; _make_agent performs the same recovery for rows persisted before the fix, falling back to passing the stored base_url as explicit_base_url so the direct-alias branch still targets the session's endpoint when no entry matches. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 05:51:05 -07:00
Teknium	cb125c2b3f	fix(kanban): pin assigned profile toolsets for workers (#45590)	2026-06-13 05:50:09 -07:00
Teknium	62b4618e9a	fix(dashboard): scope sessions and analytics to selected profile (#45598)	2026-06-13 05:42:38 -07:00
Teknium	aa0798352a	fix(auth): self-heal missing Codex access tokens Recover Codex singleton auth entries that have a refresh token but no access token by adopting a valid Codex CLI token pair, matching the cron-time failure mode before falling back to the credential pool.	2026-06-13 05:15:26 -07:00
Kennedy Umege	311ff967de	review: validate refresh_token, path-agnostic recovery log, map author email Addresses PR review feedback: - Validate refresh_token (not only access_token) before persisting the re-imported Codex token, so a half-token payload can't silently break the next refresh cycle. - Make the recovery log path-agnostic ("Codex CLI auth.json") since _import_codex_cli_tokens can read $CODEX_HOME, not only ~/.codex. - Add regression test: relogin-required + imported token missing refresh_token -> re-raise and persist nothing. - Map kenmege@yahoo.com -> Kenmege in scripts/release.py AUTHOR_MAP (fixes the check-attribution job). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 05:15:26 -07:00
Kennedy Umege	bd66e7e3fb	fix(auth): self-heal Codex refresh_token rotation by reimporting from ~/.codex Hermes keeps its own copy of the Codex OAuth token per profile and at the top level, separate from the Codex CLI's ~/.codex/auth.json. OAuth refresh_tokens are single-use, so when the Codex CLI (or another Hermes process) rotates the shared token, the frozen copy's refresh_token goes stale and refresh_codex_oauth_pure fails with a relogin-required error (invalid_grant / refresh_token_reused / 401). Today that surfaces as a hard 401 on the turn — idle profiles and desktop sessions 401 "token_expired" until a manual re-auth — even though ~/.codex/auth.json holds a fresh token. _refresh_codex_auth_tokens now falls back to _import_codex_cli_tokens() (the canonical Codex CLI store) when the stored refresh_token is rejected, adopts and persists the fresh token, and lets the in-flight retry succeed. This complements PR #6525 (force relogin on 401/403): we attempt automatic recovery before surfacing a relogin prompt. Transient failures (e.g. 429 quota, relogin_required=False) are never self-healed — the stored token is still valid there — so they re-raise unchanged, and the happy path is untouched. Adds tests/hermes_cli/test_auth_codex_self_heal.py covering: self-heal on invalid_grant, no self-heal on 429 quota, re-raise when ~/.codex is absent, and happy-path-unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 05:15:26 -07:00
Teknium	2681c5a12d	fix(photon): correct gateway start command (#45566)	2026-06-13 05:14:59 -07:00
xxxigm	5b857201b7	fix(profiles): correct misleading per-profile gateway port docstrings The s6 profile-gateway docstrings claimed the bind port comes from a `[gateway] port` key in config.yaml ("the single source of truth"). No such key exists or is read anywhere — the API server port is resolved by gateway/config.py from `API_SERVER_PORT` (or `platforms.api_server.extra.port`) and defaults to 8642. The wrong reference actively misled a Docker user into setting a non-functional `gateway.port`. Point both docstrings (`S6ServiceManager._render_run_script`, `_maybe_register_gateway_service`) at the real knob, and note the practical consequence: since each supervised profile gateway loads its own HERMES_HOME, two profiles left at the default both try to bind 8642 — each needs a distinct `API_SERVER_PORT` in its own `.env`.	2026-06-13 05:13:25 -07:00
Teknium	905ed413d1	fix(doctor): avoid unsafe npm audit fallback Root-level npm audit fix can crash with isDescendantOf on the same monorepo tree, so workspace audit advisories should explain the lockfile-bump path instead of recommending another manual npm fix command.	2026-06-13 05:09:56 -07:00
xxxigm	a5e9b17ce3	fix(doctor): stop recommending the npm-crashing audit fix, explain build-tool advisories `hermes doctor` flagged the web/ui-tui workspaces and told the user to run `npm audit fix --workspace <name>`, which crashes current npm with "Cannot read properties of null (reading 'edgesOut')" (an arborist bug with workspace-filtered audit fix). Recommend the root-level `npm audit fix` instead. Even the root form can hit a known npm arborist crash (edgesOut / isDescendantOf) on this monorepo tree, so add a note that these workspace advisories are build-time tooling (esbuild/vite, etc.) — not runtime code — and clear via a lockfile bump rather than a manual fix. This keeps doctor from handing users a command that errors out and from implying a broken Hermes install.	2026-06-13 05:09:56 -07:00
Teknium	8cf9d8689d	fix(desktop): keep composer usable during reconnect (#45488) * feat(cli): add --safe-mode troubleshooting flag Inspired by Claude Code v2.1.169 (June 2026): run Hermes with all customizations disabled to isolate setup problems from product bugs. --safe-mode implies --ignore-user-config and --ignore-rules, and additionally skips plugin discovery (hermes_cli/plugins.py) and MCP server loading (tools/mcp_tool.py) via the internal HERMES_SAFE_MODE env bridge. * fix(desktop): keep composer usable during reconnect	2026-06-13 02:36:09 -07:00
Teknium	bc060c7c1c	fix(models): remove unavailable claude-fable-5 (#45492)	2026-06-13 02:03:50 -07:00
Teknium	9688c1a94f	chore: add Kimi K2.7 code catalog slug (#45283)	2026-06-12 16:55:40 -07:00
Teknium	135fe90166	fix(profiles): backfill .env for pre-existing profiles on hermes update (#45247) Profiles created before #44792 have no .env. Now that the Channels/Keys endpoints are profile-scoped (no os.environ fallback), those profiles would show everything as unconfigured. hermes update now copies the default install's .env into each named profile that lacks one (0600, never overwrites, placeholder fallback when the root has no .env), so existing users keep the credentials they were effectively running with.	2026-06-12 15:42:14 -07:00
Teknium	7a318aae22	fix(profiles): exclude session history, backups, and snapshots from --clone-all (#45246) --clone-all copied the source profile's state.db, sessions/, backups/, state-snapshots/, and checkpoints/ into the new profile. These are per-profile history: a 49GB copy in practice (15GB snapshots + 11GB backup archives + 16GB state.db + 6.4GB sessions), and restoring a copied backup inside the clone would resurrect the SOURCE profile's state. A clone is a fresh workspace; history stays with the source. New _CLONE_ALL_HISTORY_EXCLUDE_ROOT set, applied at root level for ANY source profile (named profiles accumulate the same artifacts), unlike the default-gated infrastructure excludes. Nested same-name dirs still copy. Docs and the post-create CLI message updated to match; profile export / hermes backup remain the full-history paths.	2026-06-12 15:41:50 -07:00
Teknium	a118b94a85	fix(dashboard): skill installs from the dashboard silently auto-cancel (#45150) The dashboard's /api/skills/hub/install (and the new-profile hub_skills path) spawned `hermes skills install <id>` with stdin=DEVNULL but without --yes. do_install()'s 'Confirm [y/N]' prompt hit EOF, defaulted to 'n', and printed 'Installation cancelled.' into a background log the user never sees — every dashboard install no-opped. Pass --yes on both spawn sites, matching the uninstall endpoint which already passed --yes. The dashboard install button is the explicit user consent, same as the TUI/slash-command skip_confirm rationale. Repro: spawned the exact argv with stdin=DEVNULL against a temp HERMES_HOME — without --yes it cancels, with --yes the skill installs.	2026-06-12 12:58:36 -07:00
Teknium	bba9b519aa	fix(delegation): remove the default subagent wall-clock timeout (#45149) Subagents doing legitimate heavy work (deep code reviews, research fan-outs, slow reasoning models) were routinely killed at the blanket 600s child_timeout_seconds cap while making steady progress (e.g. 36 API calls completed when the axe fell). Failures should come from what the child is actually doing — API errors, tool errors, iteration budget — not a delegation-level stopwatch. - DEFAULT_CHILD_TIMEOUT: 600 -> None; Future.result(timeout=None) blocks until the child finishes - config default delegation.child_timeout_seconds: 600 -> 0 (0/negative = disabled; positive opts back in, floor 30s unchanged) - stuck-child protection unchanged: the heartbeat staleness monitor still stops refreshing parent activity so the gateway inactivity timeout fires on a truly wedged worker; the 0-API-call diagnostic dump still works when a cap is configured - docs updated (EN + zh-Hans)	2026-06-12 12:58:25 -07:00
Teknium	9b01c4d193	fix(update): never spawn an interactive polkit prompt when restarting a system-scope gateway (#45145) When hermes update restarts a hermes-gateway system service as a non-root user, the systemctl reset-failed/start/restart calls trigger polkit's org.freedesktop.systemd1.manage-units TTY authentication agent. That prompt runs inside a captured subprocess with a 10-15s timeout, so it flashes and dies before the user can answer, and the resulting TimeoutExpired was swallowed silently by the loop's blanket except — the restart phase just vanished with no output. - Resolve a manage-units command prefix up front: plain systemctl as root, sudo -n systemctl as non-root (with a targeted reset-failed probe so least-privilege sudoers entries scoped to hermes-gateway* qualify), or None when no non-interactive privilege path exists. - Add --no-ask-password to every manage-units call in the update restart path so polkit can never prompt inside a captured subprocess. - When unprivileged: after a graceful drain, rely on systemd's own RestartSec auto-restart (needs no privileges) with a message about the wait; skip the force-restart fallback with clear manual instructions instead of racing a doomed polkit prompt. - Surface TimeoutExpired in the restart loop instead of passing silently, and add sudo to the system-scope recovery hints. - Docs: headless-VM note recommending user service + enable-linger, or sudo updates / a scoped NOPASSWD sudoers entry for system services.	2026-06-12 12:38:15 -07:00
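
Commit bff78a34dc relies on lookup order. GLM-5.2 only gets its 1,048,576-token window if the `glm-5.2` entry is checked before the generic `glm` fallback (202,752 tokens). The minimal sketch below shows that first-match-wins idea. The tuple table, the substring matching, and the vendor-prefix handling are assumptions for illustration, not the actual contents of `agent/model_metadata.py`.

```python
from typing import Optional, Tuple

# Most specific keys first; token counts come from commit bff78a34dc.
# The table layout and matching rules here are illustrative assumptions.
CONTEXT_WINDOWS: Tuple[Tuple[str, int], ...] = (
    ("glm-5.2", 1_048_576),
    ("glm", 202_752),  # generic fallback for other GLM models
)


def context_window(model: str) -> Optional[int]:
    """Return the context length of the first key contained in the model name."""
    # Drop a vendor prefix such as "zai/" and compare case-insensitively.
    name = model.rsplit("/", 1)[-1].lower()
    for key, tokens in CONTEXT_WINDOWS:
        if key in name:
            return tokens
    return None  # unknown model: let the caller apply its own default


print(context_window("zai/GLM-5.2"))  # 1048576
print(context_window("glm-4.6"))      # 202752
```

Swapping the two rows would make every GLM-5.2 lookup stop at `glm`. That is the under-reporting the commit describes.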
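
Commit ce19fdb7ce settles on union semantics for disabled skills. The global list always applies, and a platform list can only add to it, so an explicitly empty platform list no longer re-enables anything. Below is a minimal sketch of that rule, using a hypothetical `resolve_disabled_skills` helper and plain lists in place of Hermes's real config objects.

```python
from typing import Dict, Iterable, List, Optional, Set


def resolve_disabled_skills(
    global_disabled: Iterable[str],
    platform_disabled: Dict[str, List[str]],
    platform: Optional[str] = None,
) -> Set[str]:
    """Hypothetical resolver: global disables hold everywhere; platform lists add to them."""
    disabled = set(global_disabled)
    if platform is not None:
        # Union, not replacement: an empty platform list leaves the global set intact.
        disabled |= set(platform_disabled.get(platform, []))
    return disabled


platforms = {"telegram": [], "discord": ["pdf-tools"]}
print(sorted(resolve_disabled_skills(["web-scraper"], platforms, "telegram")))  # ['web-scraper']
print(sorted(resolve_disabled_skills(["web-scraper"], platforms, "discord")))   # ['pdf-tools', 'web-scraper']
```

The commit updates each resolver (system prompt, `hermes skills config`, `skill_view` gate) to match. Routing all three through one shared function like this is one way to keep them from drifting apart again.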
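
Commit a218a0f156 describes a `verify_ca_bundle()` pre-flight. It checks that the CA bundle exists, is non-trivial in size, and can build a working `SSLContext`, with `HERMES_SKIP_SSL_GUARD=1` as an escape hatch. The sketch below follows those three checks using only the standard `ssl` module. It defines its own stand-in `SSLConfigurationError`. The 1 KiB size threshold and the signature are assumptions, and the macOS system-trust-store fallback is left out.

```python
import os
import ssl
from typing import Optional

MIN_BUNDLE_BYTES = 1024  # assumed threshold for "non-trivial in size"


class SSLConfigurationError(Exception):
    """Stand-in for the typed error the commit adds in agent/errors.py."""


def verify_ca_bundle(path: str) -> Optional[ssl.SSLContext]:
    """Sketch of a fail-fast CA bundle check; returns None when the guard is skipped."""
    if os.environ.get("HERMES_SKIP_SSL_GUARD") == "1":
        return None  # sandboxed environments that ship their own trust store
    if not os.path.isfile(path):
        raise SSLConfigurationError(f"CA bundle not found: {path}")
    if os.path.getsize(path) < MIN_BUNDLE_BYTES:
        raise SSLConfigurationError(f"CA bundle looks truncated: {path}")
    try:
        return ssl.create_default_context(cafile=path)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        raise SSLConfigurationError(f"CA bundle is unusable: {path}") from exc
```

Per the commit, the call site should sit inside a broad `try/except`. That way a bug in the guard itself never blocks startup, while a raised `SSLConfigurationError` can be turned into the `pip install -e .` hint.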
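
Commit 08890d77e6 lets `hermes plugins install` accept GitHub web URLs by mapping repository views back to cloneable `.git` URLs. The sketch below shows one way to do that with `urllib.parse`, keeping only the owner and repository path segments. The function name and the choice to pass non-GitHub URLs through unchanged are assumptions, not the commit's actual parser.

```python
from urllib.parse import urlparse


def normalize_github_url(url: str) -> str:
    """Map a GitHub web view (tree/blob/issues/...) to https://github.com/<owner>/<repo>.git."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return url  # not a GitHub web URL: leave it for other handlers
    segments = [part for part in parsed.path.split("/") if part]
    if len(segments) < 2:
        return url  # an owner page alone does not identify a repository
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"https://github.com/{owner}/{repo}.git"


print(normalize_github_url("https://github.com/NousResearch/hermes-agent/tree/main"))
```

Query strings and fragments (for example `?plain=1#L5` on a blob view) fall away because only `parsed.path` is read.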
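
Commit 3380563d94 keeps host-local detail out of the unauthenticated `/api/status` response unless the dashboard is bound to loopback or run with `--insecure`. That detail is `hermes_home`, `config_path`, `env_path`, the gateway PID, and the internal health URL. The filter below sketches that split with the standard `ipaddress` module. The dictionary keys and helper names are hypothetical, not the dashboard's actual response schema.

```python
import ipaddress
from typing import Any, Dict

# Hypothetical keys for the host-local fields named in the commit message.
HOST_LOCAL_FIELDS = frozenset(
    {"hermes_home", "config_path", "env_path", "gateway_pid", "gateway_health_url"}
)


def is_loopback_bind(host: str) -> bool:
    """True for localhost and loopback IP literals; anything else counts as exposed."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames may resolve to public interfaces


def public_status(status: Dict[str, Any], bind_host: str, insecure: bool = False) -> Dict[str, Any]:
    """Return the full payload only inside the local trust envelope."""
    if insecure or is_loopback_bind(bind_host):
        return dict(status)
    return {k: v for k, v in status.items() if k not in HOST_LOCAL_FIELDS}
```

Failing closed on unrecognized hostnames is a choice of this sketch: a bind it cannot prove is local gets the redacted shape.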
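
Commit 135fe90166 backfills a `.env` into each named profile that lacks one during `hermes update`. The rules are: copy the default install's `.env`, use mode 0600, never overwrite, and fall back to a placeholder when the root has no `.env`. The sketch below follows those rules; the function signature and the placeholder text are invented for illustration. Opening with `O_CREAT | O_EXCL` makes the no-overwrite check atomic and applies the 0600 mode at creation rather than tightening it afterwards.

```python
import os
from pathlib import Path
from typing import List

# Invented placeholder; the real fallback content is not described in the commit.
PLACEHOLDER = b"# Add this profile's API keys here.\n"


def backfill_profile_env(root_home: Path, profile_dirs: List[Path]) -> List[Path]:
    """Give each profile without a .env a private copy of the root one (sketch)."""
    source = root_home / ".env"
    data = source.read_bytes() if source.is_file() else PLACEHOLDER
    created: List[Path] = []
    for profile in profile_dirs:
        target = profile / ".env"
        try:
            # O_EXCL fails if the file exists, so an existing .env is never overwritten.
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        created.append(target)
    return created
```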
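
Commit 7a318aae22 excludes per-profile history from `--clone-all`: `state.db`, `sessions/`, `backups/`, `state-snapshots/`, and `checkpoints/`. The exclusion applies only at the profile root, so nested directories with the same names still copy. With `shutil.copytree`, a root-aware `ignore` callable expresses that rule. The set below is assembled from the names in the commit message and may not match `_CLONE_ALL_HISTORY_EXCLUDE_ROOT` exactly. `clone_profile` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Set

# Assembled from the names in commit 7a318aae22; the real exclude set may differ.
HISTORY_EXCLUDE_ROOT = frozenset(
    {"state.db", "sessions", "backups", "state-snapshots", "checkpoints"}
)


def clone_profile(src: Path, dst: Path) -> None:
    """Copy a profile tree, skipping per-profile history only at the top level."""
    root = os.path.realpath(src)

    def ignore(directory: str, names: Iterable[str]) -> Set[str]:
        if os.path.realpath(directory) != root:
            return set()  # nested dirs named "sessions" etc. are copied normally
        return {name for name in names if name in HISTORY_EXCLUDE_ROOT}

    shutil.copytree(src, dst, ignore=ignore)
```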
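
Commit bba9b519aa changes how `delegation.child_timeout_seconds` is interpreted. Zero or a negative value disables the cap, so `Future.result(timeout=None)` blocks until the child finishes; a positive value opts back in with a 30-second floor. The sketch below shows that mapping with `concurrent.futures`. The helper names are hypothetical, and treating "floor" as a minimum that small positive values are raised to is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional

MIN_CHILD_TIMEOUT_SECONDS = 30.0  # the "floor 30s" from the commit message


def resolve_child_timeout(configured: Optional[float]) -> Optional[float]:
    """Map the config value to a Future.result() timeout; None means wait indefinitely."""
    if configured is None or configured <= 0:
        return None
    return max(float(configured), MIN_CHILD_TIMEOUT_SECONDS)


def run_child(task: Callable[[], Any], configured: Optional[float] = 0) -> Any:
    """Run a stand-in child task and wait for it under the resolved timeout."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task)
        # A TimeoutError is only possible when a positive cap is configured.
        return future.result(timeout=resolve_child_timeout(configured))
```

In the commit, stuck-child protection comes from the heartbeat staleness monitor and the gateway inactivity timeout, not from this value, so the sketch does not model it.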
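
Commit 9b01c4d193 resolves a manage-units command prefix before restarting a system-scope gateway. As root, that is plain `systemctl`. As non-root, it is `sudo -n systemctl` when a targeted `reset-failed` probe succeeds. Otherwise there is no prefix, because no non-interactive privilege path exists. Every call also carries `--no-ask-password`, so polkit never prompts inside a captured subprocess. The sketch below captures that decision. The function names, the example unit name, and the injectable `euid`/`probe` parameters (there so the logic can be tested without root) are hypothetical.

```python
import os
import subprocess
from typing import Callable, List, Optional


def sudo_reset_failed_probe(unit: str) -> bool:
    """Return True if sudo can run a scoped systemctl call without prompting."""
    try:
        result = subprocess.run(
            ["sudo", "-n", "systemctl", "--no-ask-password", "reset-failed", unit],
            capture_output=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # no sudo binary, or the call hung
    return result.returncode == 0  # sudo -n exits non-zero instead of asking for a password


def manage_units_prefix(
    unit: str,
    euid: Optional[int] = None,
    probe: Callable[[str], bool] = sudo_reset_failed_probe,
) -> Optional[List[str]]:
    """Pick a non-interactive systemctl prefix, or None when none exists."""
    if euid is None:
        euid = os.geteuid()  # POSIX only
    if euid == 0:
        return ["systemctl", "--no-ask-password"]
    if probe(unit):
        return ["sudo", "-n", "systemctl", "--no-ask-password"]
    return None
```

A caller would append the action and unit, for example `prefix + ["restart", "hermes-gateway.service"]`. When the result is `None`, it would wait for systemd's `RestartSec` auto-restart and print manual instructions, as the commit describes.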

2752 commits