* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes#45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
* fix(s6): prevent profile create from auto-starting gateway service
When hermes profile create runs inside an s6 container,
_maybe_register_gateway_service() calls register_profile_gateway()
which creates the service directory and triggers s6-svscanctl -a.
Previously the service always started immediately, causing profiles
that share the main gateway's bot token (e.g. Kanban worker profiles)
to fail with a token-lock conflict and persist gateway_state: running
— becoming zombies that resurrect on every container restart.
Wire the existing start_now parameter through the S6 implementation:
when start_now=False, write a marker file (same pattern as
container_boot.py _register_gateway_slot) so s6-supervise leaves the
service stopped until the user explicitly runs hermes -p <profile>
gateway start.
4 files, +61/-6, 4 new tests (all passing).
* test(docker): wait for gateway running state before restart
---------
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced.
Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.
The platform-disabled fix landed only in agent.skill_utils.get_disabled_skill_names
(the system-prompt path). Two sibling resolvers still used the old
replace-not-union semantics, so the same skill could be hidden from the
<available_skills> prompt yet reported enabled elsewhere:
- hermes_cli/skills_config.get_disabled_skills (the 'hermes skills config' UI)
returned only the platform list, so a globally-disabled skill showed as
enabled (unchecked) on any platform with a platform_disabled entry.
- tools/skills_tool._is_skill_disabled (gates whether skill_view loads a skill)
ignored the global list when a platform list existed, so a globally-disabled
skill could still be loaded on such a platform.
Both now union the global list with the platform list, matching
get_disabled_skill_names. An explicit empty platform list no longer re-enables
a globally-disabled skill — global disables hold on every platform (#46201).
Also: fix the now-stale get_disabled_skill_names docstring and drop a stray
blank line. Regression tests added for both sites (proven to fail on the old
replace semantics).
build_skills_system_prompt() already resolved _platform_hint but called
get_disabled_skill_names() with no argument, so the resolved platform never
reached the filter and the prompt cache_key varied by platform while the
disabled set did not. Pass _platform_hint or None.
get_disabled_skill_names() also fully ignored the global 'disabled' list once
a platform-specific list was found. Return the union (global | platform) so a
globally-disabled skill stays disabled on every platform.
Salvaged from #46203 by @iborazzi; the unrelated apps/shared/tsconfig.json
ES2023 bump is intentionally dropped (one concern per PR).
The dashboard's public /api/status liveness endpoint is in PUBLIC_API_PATHS
and bypasses dashboard auth, yet it returned absolute hermes_home,
config_path, env_path, the gateway PID, and the internal gateway health URL.
That exceeds the shape its own allowlist documents as public ("version,
gateway state, active session count, and the dashboard auth-gate shape. No
bodies, no session content, no secrets"), leaking deployment recon to any
unauthenticated caller on a network-exposed (gated) bind.
Withhold host-local detail unless the bind is loopback / --insecure, where
the dashboard is local-only and the caller is already inside the trust
envelope -- the same split should_require_auth draws. The NAS liveness probe
and the auth-gate badge are unaffected.
Adds invariant tests for both modes (gated withholds, loopback keeps).
Surface direct model.provider=custom endpoints in /model picker output and keep explicit bare custom switches on the current endpoint instead of requiring a named providers/custom_providers row.
_runtime_model_config persisted the live agent's RESOLVED provider into
the session row's model_config JSON. For any named providers:/
custom_providers: entry, agent.provider is the literal string "custom",
so the entry name was lost (and the api_key is deliberately never
persisted). On session.resume or _reset_session_agent the stored
provider="custom" fed resolve_runtime_provider(requested="custom"),
which cannot match a named entry — the rebuild either raised "No LLM
provider configured" or silently resolved placeholder credentials
against the patched-back base_url.
Persist the REQUESTED/entry identity instead: a new reverse lookup
find_custom_provider_identity(base_url) maps the endpoint URL back to
the canonical custom:<name> menu key. _runtime_model_config stores that
key; _make_agent performs the same recovery for rows persisted before
the fix, falling back to passing the stored base_url as
explicit_base_url so the direct-alias branch still targets the
session's endpoint when no entry matches.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Recover Codex singleton auth entries that have a refresh token but no access token by adopting a valid Codex CLI token pair, matching the cron-time failure mode before falling back to the credential pool.
Addresses PR review feedback:
- Validate refresh_token (not only access_token) before persisting the
re-imported Codex token, so a half-token payload can't silently break the
next refresh cycle.
- Make the recovery log path-agnostic ("Codex CLI auth.json") since
_import_codex_cli_tokens can read $CODEX_HOME, not only ~/.codex.
- Add regression test: relogin-required + imported token missing refresh_token
-> re-raise and persist nothing.
- Map kenmege@yahoo.com -> Kenmege in scripts/release.py AUTHOR_MAP
(fixes the check-attribution job).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hermes keeps its own copy of the Codex OAuth token per profile and at the
top level, separate from the Codex CLI's ~/.codex/auth.json. OAuth
refresh_tokens are single-use, so when the Codex CLI (or another Hermes
process) rotates the shared token, the frozen copy's refresh_token goes
stale and refresh_codex_oauth_pure fails with a relogin-required error
(invalid_grant / refresh_token_reused / 401). Today that surfaces as a hard
401 on the turn — idle profiles and desktop sessions 401 "token_expired"
until a manual re-auth — even though ~/.codex/auth.json holds a fresh token.
_refresh_codex_auth_tokens now falls back to _import_codex_cli_tokens() (the
canonical Codex CLI store) when the stored refresh_token is rejected, adopts
and persists the fresh token, and lets the in-flight retry succeed. This
complements PR #6525 (force relogin on 401/403): we attempt automatic
recovery before surfacing a relogin prompt. Transient failures (e.g. 429
quota, relogin_required=False) are never self-healed — the stored token is
still valid there — so they re-raise unchanged, and the happy path is
untouched.
Adds tests/hermes_cli/test_auth_codex_self_heal.py covering: self-heal on
invalid_grant, no self-heal on 429 quota, re-raise when ~/.codex is absent,
and happy-path-unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root-level npm audit fix can crash with isDescendantOf on the same monorepo tree, so workspace audit advisories should explain the lockfile-bump path instead of recommending another manual npm fix command.
* feat(cli): add --safe-mode troubleshooting flag
Inspired by Claude Code v2.1.169 (June 2026): run Hermes with all
customizations disabled to isolate setup problems from product bugs.
--safe-mode implies --ignore-user-config and --ignore-rules, and
additionally skips plugin discovery (hermes_cli/plugins.py) and MCP
server loading (tools/mcp_tool.py) via the internal HERMES_SAFE_MODE
env bridge.
* fix(desktop): keep composer usable during reconnect
Profiles created before #44792 have no .env. Now that the Channels/Keys
endpoints are profile-scoped (no os.environ fallback), those profiles
would show everything as unconfigured. hermes update now copies the
default install's .env into each named profile that lacks one (0600,
never overwrites, placeholder fallback when the root has no .env), so
existing users keep the credentials they were effectively running with.
--clone-all copied the source profile's state.db, sessions/, backups/,
state-snapshots/, and checkpoints/ into the new profile. These are
per-profile history: a 49GB copy in practice (15GB snapshots + 11GB
backup archives + 16GB state.db + 6.4GB sessions), and restoring a
copied backup inside the clone would resurrect the SOURCE profile's
state. A clone is a fresh workspace; history stays with the source.
New _CLONE_ALL_HISTORY_EXCLUDE_ROOT set, applied at root level for ANY
source profile (named profiles accumulate the same artifacts), unlike
the default-gated infrastructure excludes. Nested same-name dirs still
copy. Docs and the post-create CLI message updated to match; profile
export / hermes backup remain the full-history paths.
The dashboard's /api/skills/hub/install (and the new-profile hub_skills
path) spawned `hermes skills install <id>` with stdin=DEVNULL but
without --yes. do_install()'s 'Confirm [y/N]' prompt hit EOF, defaulted
to 'n', and printed 'Installation cancelled.' into a background log the
user never sees — every dashboard install no-opped.
Pass --yes on both spawn sites, matching the uninstall endpoint which
already passed --yes. The dashboard install button is the explicit user
consent, same as the TUI/slash-command skip_confirm rationale.
Repro: spawned the exact argv with stdin=DEVNULL against a temp
HERMES_HOME — without --yes it cancels, with --yes the skill installs.
A user passing an image to `hermes send --file` got a raw
UnicodeDecodeError ('utf-8 codec can't decode byte 0x89...') with no
hint that media delivery goes through the MEDIA:<path> directive.
- send_cmd: catch UnicodeDecodeError separately and print a usage error
explaining --file is for text bodies, with copy-pasteable MEDIA: and
[[as_document]] examples using the user's own path
- --file help text + epilog now mention MEDIA:
- docs: new 'Sending images and other media' section on the hermes send
reference page
Replace the PortPool-based port reservation system (9120-9199 range) with OS-assigned ephemeral ports via --port 0.
Before: Desktop probed a hardcoded port range, reserved ports in-process to close TOCTOU races, and passed the chosen port to the dashboard via CLI arg.
After: Desktop spawns dashboard with --port 0, parses the actual port from a stdout announcement line (HERMES_DASHBOARD_READY port=<N>), and uses that for WebSocket connections.
Changes:
- web_server.py: add --port 0 support with SO_REUSEADDR pre-bind + announcement; add EADDRINUSE preflight for explicit ports
- main.cjs: remove PortPool, PORT_FLOOR/CEILING, pickPort(), isPortAvailable(); add waitForDashboardPort() stdout parser
- Delete port-pool.cjs and port-pool.test.cjs (106 lines removed)
Net effect: eliminates the entire TOCTOU-mitigation reservation infrastructure and arbitrary port range constraints. OS handles port allocation natively.
Two halves of the same community report (dashboard Profile Builder):
1. A fresh dashboard/CLI-created profile got no .env file unless cloned,
so it silently inherited API keys and messaging tokens from the shell
environment / root install. create_profile() now seeds a placeholder
.env (0600) for non-clone profiles, matching the SOUL.md seeding.
2. The Channels endpoints (/api/messaging/platforms GET/PUT/test) were
not profile-scoped: they read/wrote the dashboard process's own .env
via load_env()/save_env_value() regardless of the global profile
switcher. They now accept the standard optional profile param (body
beats query on the PUT, matching other scoped writes) and run inside
_profile_scope(). When scoped, the payload no longer falls back to
os.environ or load_gateway_config()'s env-override layer — both carry
the ROOT install's credentials and would misreport them as the
profile's. /api/messaging/platforms added to PROFILE_SCOPED_PREFIXES
so the sidebar switcher scopes the Channels page automatically.
* feat(billing): /usage → portal top-up browser handoff
Add the terminal side of the billing slice (phase 2a): start a top-up by
throwing the user to the portal billing page with the top-up modal open. The
terminal does not confirm, poll, or track payment — checkout completes in the
browser and the next /usage shows the new balance.
- nous_account.py: parse organisation.slug/name from /api/oauth/account into
NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned
{base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy
{base}/billing?topup=open (never /orgs/None/...).
- portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line
(Topping up as <email> / org <name>), browser open with printed-URL fallback,
no-wait closing copy. No polling/confirmation (deferred to 2b).
- account_usage.py: the shared /usage credits block now links the org-pinned
top-up URL (auto-opens the modal) + points to the command.
Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until
that is live on the target env; until then /api/oauth/account returns
organisation: { id } only and the URL falls back to legacy.
* feat(billing): /credits command for balance + top-up handoff
Replace the standalone `hermes portal topup` subcommand with an in-session
/credits slash command — a focused money surface (balance in, top-up out) that
works in the CLI, TUI, and every messaging platform from one registry entry.
- commands.py: register /credits (Info category). Slack is at its 50-slash cap,
so /credits is routed via /hermes credits on Slack only (new
_SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the
native list and breaking Telegram parity; native everywhere else.
- account_usage.py: build_credits_view() — one portal fetch → balance lines +
identity line + org-pinned top-up URL + depleted flag, consumed by all
surfaces. Reuses the same snapshot/URL builder as /usage so numbers match.
- cli.py: _show_credits() — balance block + identity line + 3-button panel
(Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal.
ASK, never auto-launch; headless falls back to printing the URL.
- gateway/slash_commands.py: _handle_credits_command() — renders the block +
tappable top-up URL + no-wait copy; works on button and plain-text platforms.
- /usage credits line now points to /credits.
- Retire `hermes portal topup` (portal_cli.py back to baseline); the engine
(slug/name parse + nous_portal_topup_url) stays as the shared core.
No polling, no payment confirmation (billing phase 2a). Depends on NAS #409.
* fix(credits): /credits works in the TUI slash-worker (non-interactive)
In the TUI, /credits runs in the slash-worker subprocess where there is no
live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called
the 3-button modal unconditionally, which fell back to reading stdin →
exception → slash.exec rejected → the command produced no output (only the
pre-existing 'Credit access paused' banner showed).
- _show_credits: when self._app is None (TUI worker / piped / non-interactive),
render the text variant — balance block + tappable top-up URL + no-wait line,
same affordance as the messaging surfaces — and skip the modal entirely. The
3-button panel still renders in the interactive CLI.
- Depleted banner copy: 'run /usage for balance' → 'run /credits to top up'
now that /credits is the dedicated money surface (+ tests).
- Regression tests: _show_credits with self._app=None renders text and never
invokes the modal; logged-out path.
* feat(tui): credits.view RPC for the /credits tappable top-up button
Add a credits.view JSON-RPC method returning the structured CreditsView
(logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can
render a clickable <Link> top-up button instead of plain text. Account-
independent (portal fetch gated on a logged-in Nous account), fail-open to
{logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern.
Frontend (TUI-local /credits command + Ink component) lands separately.
* feat(tui): /credits command with keyboard-driven top-up confirm
TUI-local /credits: fetches the structured balance via the credits.view RPC,
prints the balance + identity + top-up URL, then arms the EXISTING confirm
overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel).
Reuses ConfirmReq — no new overlay component/state/input handler. Headless
(openExternalUrl returns false) falls back to printing the URL.
- gatewayTypes.ts: CreditsViewResponse.
- commands/credits.ts: the command (mirrors /status's rpc+guarded pattern).
- registry.ts: register creditsCommands.
- test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases).
Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.
list_plugins() attribution diffed registry names against all already-loaded
plugins, so when a plugin registered a hook / middleware / tool name an
earlier plugin had already used, the shared name was credited to the first
plugin only and later plugins under-reported (0 hooks) in hermes plugins
list. commands_registered right beside it already attributed correctly by
plugin ownership.
Snapshot per-registry counts before register() and attribute the entries
this plugin's register() actually added (per-registration delta). Add a
regression test: two plugins registering the same hook name are each
credited with 1 hook.
discover_and_load(force=True) cleared every per-plugin registry except
_plugin_platform_names, which register_platform() populates. A platform
plugin disabled between force-rediscovers left a stale name behind, so the
set diverged from the real platform_registry / _plugins state and never
shrank across repeated force passes.
Add the missing clear() and a regression test that seeds every per-plugin
registry, forces a rediscover, and asserts they all empty (so a future
registry addition can't silently leak across a force pass either).
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.