`updates.backup_keep: 0` (or any negative value) wiped the freshly-
created pre-update zip:
_prune_pre_update_backups(backup_dir, keep=0):
backups = sorted(..., reverse=True) # newest first, includes
# the zip we just wrote
for p in backups[0:]: # = all of them
p.unlink()
The wrapper in `main.py` then printed `Saved: <path>` for a file that
no longer existed (the size lookup is wrapped in `try/except OSError`
which silently degrades to "0 B"), leaving operators believing they had
a recovery point when they had none.
This is a real footgun because some config systems treat 0 as "keep
unlimited"; here it does the opposite — every backup is destroyed
right after creation.
Fix: clamp `keep` to a minimum of 1 inside `_prune_pre_update_backups`
since that helper is only invoked immediately after a fresh backup
is written. Operators who genuinely want no backups should set
`updates.pre_update_backup: false` (which gates creation entirely)
rather than relying on `backup_keep: 0`.
Also extends the `backup_keep` config docstring to spell out the floor
and point at `pre_update_backup: false` as the off-switch.
## Tests
Three regression tests added in `TestPreUpdateBackup`:
- `test_keep_zero_does_not_delete_freshly_created_backup` —
asserts the file persists after `keep=0`
- `test_keep_negative_does_not_delete_freshly_created_backup` —
same for negative values
- `test_keep_zero_still_prunes_older_backups` — proves the floor
only protects the new backup; older ones are still rotated out
Verified the new tests fail on origin/main (without the floor) and
pass with it; full `tests/hermes_cli/test_backup.py` suite green
(84 tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the Codex auto-import UX. On successful Nous login (either
`hermes auth add nous --type oauth` or `hermes login nous`), tokens are
mirrored to `$HERMES_SHARED_AUTH_DIR/nous_auth.json` (default
`~/.hermes/shared/nous_auth.json`, outside any named profile's
HERMES_HOME). On next login in a new profile, the flow offers to import
those credentials ("Import these credentials? [Y/n]") and rehydrates via
a forced refresh+mint instead of running the full device-code flow.
Runtime refresh in any profile syncs the rotated refresh_token back to
the shared store so sibling profiles don't hit stale-token fallback
after rotation.
The volatile 24h agent_key is NOT persisted to the shared store —
only the long-lived OAuth tokens are cross-profile useful.
- `HERMES_SHARED_AUTH_DIR` env var for tests + custom layouts
- Pytest seat belt mirrors the existing `_auth_file_path` guard so
forgetting to redirect the store in a test fails loudly
- File mode 0600 where platform supports it
- Runtime credential resolution is unchanged — shared store is only
consulted during the login flow, so profile isolation at runtime is
preserved
- Stale refresh_token + portal-down cases gracefully fall back to
device-code
Addresses a user report from Mike Nguyen: running
`hermes --profile <name> auth add nous --type oauth` for every new
profile is unnecessary friction now that Codex has a shared-import
flow via `~/.codex/auth.json`.
Instead of an unhelpful CalledProcessError traceback when running
`hermes gateway start/stop/restart` without first installing the service,
check for the unit file and exit with an actionable install hint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up to @changchun989's cherry-pick: reverts the validate-via-
normalize change so validate_profile_name remains a strict regex check
on the input AS-GIVEN. Callers that accept mixed-case user input
(dashboard UI, CLI args, import flows) call normalize_profile_name()
first, then validate the result. This keeps validate honest about
what the on-disk directory name must look like — e.g. ' jules '
(trailing whitespace) is now rejected instead of silently trimmed
and accepted.
- validate_profile_name: strict lowercase/regex check again, 'UPPER'
back in the invalid-names parametrize
- 8 call sites in profiles.py (create_profile, delete_profile,
set_active_profile, export_profile, import_profile, rename_profile,
resolve_profile_env, plus the clone_from branch): swap the
normalize-then-validate order
- scripts/release.py: add changchun989@proton.me -> changchun989 to
AUTHOR_MAP so CI doesn't block on the unmapped contributor email
All kanban + profile tests pass (268 across test_profiles.py +
test_kanban_db.py + test_kanban_core_functionality.py, plus 73 in
test_kanban_tools.py + test_kanban_dashboard_plugin.py).
Closes#18498.
- Add normalize_profile_name() for lowercase canonical IDs and Default alias
- Use canonical names in create/delete/rename/export/import/set_active paths
- Canonicalize Kanban assignee on create/assign, list filter, and worker spawn
- Tests for mixed-case assignees and profile resolution (fixes#18498)
`hermes import` was creating secret files with the process umask
(typically 0644) instead of 0600. zipfile.open() does not honor the
Unix mode bits stored in zip member external_attr; the restore loop
used open(target, "wb") which always falls back to umask.
Threat: silent privilege downgrade after a routine restore on
multi-user systems (shared dev boxes, CI runners, jump hosts) — any
local user could read API keys and OAuth tokens from ~/.hermes/.
Fix mirrors the convention already used at file creation
(hermes_cli/auth.py: stat.S_IRUSR | stat.S_IWUSR for auth.json).
The quick-snapshot restore path (restore_quick_snapshot) is
unaffected — it uses shutil.copy2 which preserves perms via
copystat().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds first-class board support to kanban so users can separate unrelated
streams of work (projects, repos, domains) into isolated queues. Single-
project users stay on the 'default' board and see no UI change.
Isolation model
---------------
- Each board is a directory at `~/.hermes/kanban/boards/<slug>/` with
its own `kanban.db`, `workspaces/`, and `logs/`. The 'default' board
keeps its legacy path (`~/.hermes/kanban.db`) for back-compat — fresh
installs and pre-boards users get zero migration.
- Workers spawned by the dispatcher have `HERMES_KANBAN_BOARD` pinned in
their env alongside the existing `HERMES_KANBAN_DB` /
`HERMES_KANBAN_WORKSPACES_ROOT` pins, so workers physically cannot see
other boards' tasks.
- The gateway's single dispatcher loop now sweeps every board per tick;
per-tick cost is a few extra filesystem stats.
- CAS concurrency guarantees are preserved per-board (each board is its
own SQLite DB, same WAL+IMMEDIATE machinery as before).
CLI
---
hermes kanban boards list|create|switch|show|rename|rm
hermes kanban --board <slug> <any-subcommand>
Board resolution order: `--board` flag → `HERMES_KANBAN_BOARD` env →
`~/.hermes/kanban/current` file → `default`. Slug validation is strict:
lowercase alphanumerics + hyphens + underscores, 1-64 chars, starts with
alphanumeric. Uppercase is auto-downcased; slashes / dots / `..` /
control chars are rejected so boards can't name their way out of the
boards/ directory.
Passive discoverability: when more than one board exists, `hermes kanban
list` prints a one-line header ("Board: foo (2 other boards …)") so
users who stumble across multi-project never have to hunt for the
feature. Invisible for single-board installs.
Dashboard
---------
- New `BoardSwitcher` component at the top of the Kanban tab: dropdown
with all boards + task counts, `+ New board` button, `Archive`
button (non-default only). Hidden entirely when only `default` exists
and is empty — single-project users never see it.
- New `NewBoardDialog` modal: slug / display name / description / icon
+ "switch to this board after creating" checkbox.
- Selected board persists to `localStorage` so browser users don't
shift the CLI's active board out from under a terminal they left open.
- New `?board=<slug>` query param on every existing endpoint plus a
new `/boards` CRUD surface (`GET /boards`, `POST /boards`,
`PATCH /boards/<slug>`, `DELETE /boards/<slug>`,
`POST /boards/<slug>/switch`).
- Events WebSocket is pinned to a board at connection time; switching
opens a fresh WS against the new board.
Also fixes a pre-existing bug in the plugin's tenant / assignee
filters: the SDK's `Select` uses `onValueChange(value)`, not
native `onChange(event)`, so those filters silently didn't work.
New `selectChangeHandler` helper wires both signatures.
Tests
-----
49 new tests in `tests/hermes_cli/test_kanban_boards.py` covering:
slug validation (valid / invalid / auto-downcase), path resolution
(default = legacy path, named = `boards/<slug>/`, env var override),
current-board resolution chain (env > file > default), board CRUD +
archive / hard-delete, per-board connection isolation (tasks don't
leak), worker spawn env injection (`HERMES_KANBAN_BOARD`,
`HERMES_KANBAN_DB`, `HERMES_KANBAN_WORKSPACES_ROOT` all point at the
right board), and end-to-end CLI surface.
Regression surface: all 264 pre-existing kanban tests continue to pass.
Live-tested via the dashboard: created 3 boards (default,
hermes-agent, atm10-server), created tasks on each via both CLI
(`--board <slug> create`) and dashboard (inline create on the Ready
column), confirmed zero cross-board leakage, confirmed `BoardSwitcher`
+ `NewBoardDialog` work end-to-end in the browser.
The resilient restart settings from PR #18639 only took effect when
the gateway was started via `hermes gateway start` or `hermes gateway
restart` — both of which call refresh_systemd_unit_if_needed() which
writes the new unit and runs daemon-reload.
However, when the gateway self-restarts via exit-code-75 (stale-code
detection after `hermes update`, or the /restart command), systemd
respawns the process directly without going through any CLI function.
The unit file on disk stays stale, and systemd keeps using the old
cached settings (StartLimitBurst=5, RestartSec=30) until someone
manually runs `hermes gateway restart`.
This meant that after PR #18639 was deployed, users who never ran
`hermes gateway restart` manually were still vulnerable to the
permanent-death-on-network-outage bug.
Fix: call refresh_systemd_unit_if_needed() at the top of run_gateway()
(the foreground entry point that systemd's ExecStart invokes). This
ensures that on every boot — whether triggered by systemd restart,
exit-75 respawn, or manual foreground run — the unit definition and
daemon state are current. The call is best-effort (exceptions caught)
and a no-op when the unit is already current (one stat + string compare).
Allow users to start a fresh session and immediately set its title by
passing a name to /new (or /reset):
/new Refactor auth module
Changes:
- hermes_cli/commands.py: add args_hint='[name]' to /new command
- cli.py: parse title argument in process_command(), pass to new_session()
- cli.py: new_session() accepts title=None, sets title via SessionDB
- gateway/run.py: _handle_reset_command() parses title, sets on new entry
- gateway/session.py: reset_session() accepts optional display_name
- tests: add test_new_session_with_title, test_reset_command_with_title,
test_new_command_in_help_output
All 36 affected tests pass.
When agent-browser is globally installed via 'npm install -g agent-browser'
but not present in the local node_modules, doctor falsely warns that it's
not installed. Add shutil.which('agent-browser') as a fallback check after
the local path check.
Closes#15951
The auth check in list_authenticated_providers used mere key presence in
credential_pool to conclude a provider is authenticated. An empty entry
(pool_store key with no actual credentials) caused providers like
ollama-cloud to appear as authenticated in the model picker even when no
OLLAMA_API_KEY was set.
The user's picker then offered nemotron-3-super under Ollama Cloud;
selecting it routed every subsequent turn to https://ollama.com/v1, which
rejected the requests with HTTP 400.
Fix: drop the pool_store key-existence check from both section 2
(HERMES_OVERLAYS) and section 2b (CANONICAL_PROVIDERS). The following
load_pool().has_credentials() call already handles the legitimate pooled-
credential case; checking for an empty key just ahead of it was redundant
and actively harmful.
`_apply_profile_override()` scans `sys.argv` for `-p / --profile` at
module import time. When `hermes_cli.main` is imported inside pytest
with `-p no:xdist` on the command line, it picks up `'no:xdist'` as a
profile name candidate, then passes it to `resolve_profile_env()` which
raises `ValueError` (invalid format), and the function calls
`sys.exit(1)` — aborting test collection with an INTERNALERROR before
any test runs.
The same conflict affects any tool or wrapper that uses `-p` for its
own flag and then imports `hermes_cli.main`.
Fix: add a format guard immediately after step 1 (explicit flag scan).
If `consume == 2` (the value came from `-p <value>`, not
`--profile=value`) and the candidate doesn't match the canonical
profile-name pattern `[a-z0-9][a-z0-9_-]{0,63}` (mirrored from
`hermes_cli.profiles._PROFILE_ID_RE`), discard it and continue as if
no `-p` flag was found. The `active_profile` file-based fallback
(step 2) only reads a file written by hermes itself, so it always
produces valid names and needs no guard.
Regression guard: with the guard reverted, importing
`hermes_cli.main` with `sys.argv = ['pytest', '-p', 'no:xdist', ...]`
raises `SystemExit(1)`. With the guard in place, the import succeeds
and `sys.argv` is left intact for pytest. Legitimate `-p coder` still
flows through to `resolve_profile_env()` unchanged.
Rebased onto current `origin/main` (`e5dad4ac5`) — the prior branch
base (`4fade39c9`) was 824 commits behind and the PR was DIRTY /
CONFLICTING. The 1.5 HERMES_HOME-set early-return block has since
landed between the original insertion point and step 2; the new guard
is positioned correctly before the early return so a bogus `-p` value
no longer prevents the early return from kicking in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MiniMax China (api.minimaxi.com) does not expose a /v1/models endpoint.
The doctor command was probing it and reporting HTTP 404 as a warning,
even though the API works correctly for chat completions.
Set supports_health_check=False for MiniMax CN so doctor shows
"(key configured)" instead of the false 404 warning.
Refs #12768, #13757
Guard the save_env_value('AUXILIARY_VISION_MODEL', ...) call with
'if _selected_vision_model:' so blank input at the non-OpenAI vision
model prompt doesn't nuke existing values in .env.
save_env_value has no internal guard against empty strings — it
faithfully writes whatever it receives, including empty values that
shadow the previously-configured model.
Salvage of #15504 (core hunk). Contributor's test was dropped because
it collided with subsequent test refactors; the fix stands on its own.
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
Preserve explicit caller overrides, but backfill a sensible default
TERM=xterm-256color when missing or blank in the spawn env. CI often
runs without TERM in the parent process, which makes terminal probes
like 'tput cols' fail before winsize reads.
Salvage of #15278's core code fix only — the test changes conflict
with subsequent test refactors on main that now exercise TIOCGWINSZ
directly instead of via 'tput'.
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
Commands that open pickers (/model, /skin, /personality) previously
received a trailing space in their completions to keep the dropdown
visible in the classic CLI. However, the TUI's submit handler applies
the completion when Enter is pressed and the result differs from the
input — so '/model' + space became '/model ' and the command was never
executed.
Picker commands now omit the trailing space for exact matches, allowing
Enter to submit and open the picker. Non-picker commands (/help, etc.)
are unaffected.
_reconfigure_provider() updates cloud_provider/backend/tts.provider when
switching tool providers via "hermes setup tools → Reconfigure", but did
not update the matching use_gateway flag. _configure_provider() (the
initial-setup path) sets use_gateway on all three tool categories. The
omission in _reconfigure_provider leaves a stale value in config.yaml:
switching from a Nous-managed provider (use_gateway=True) to a self-hosted
one keeps use_gateway=True, continuing to route requests through the Nous
gateway; switching the other way leaves use_gateway unset so the managed
feature does not activate.
Fix: mirror _configure_provider's use_gateway = bool(managed_feature)
assignment in the tts, browser, and web blocks of _reconfigure_provider.
Symmetric across all three tool categories. No behavior change for any
provider that does not set tts_provider, browser_provider, or web_backend.
Fixes#15229
Create a timestamped backup (~/.hermes/config.yaml.bak.YYYYMMDD_HHMMSS)
before the setup wizard runs any configuration sections. After setup
completes, show the backup path and a restore command.
This protects user-customized values (compression thresholds, provider
routing, PII redaction, auxiliary model configs) from being silently
overwritten by setup defaults.
Addresses #3522
Two related fixes for custom_providers model switching:
1. validate_requested_model() now recognizes custom:<name> slugs
(e.g. custom:volcengine) as custom endpoints, not generic providers.
Previously only the bare 'custom' slug matched the relaxed validation
branch, causing model validation to fail with 'not found in provider
listing' for all named custom providers.
2. switch_model() now consults the custom_providers list when deciding
whether to override a validation rejection. If the requested model
matches the entry's 'model' field or any key in its 'models' dict,
the switch is accepted even when the remote /v1/models endpoint does
not list it.
Both changes are covered by existing tests (86 passed).
_scan_gateway_pids() uses ps-based pattern matching to find running
gateways. When invoked from the CLI (e.g. `hermes gateway status`),
the calling process itself matches gateway patterns, causing false
positives — the CLI is mistakenly counted as a running gateway.
Add _get_ancestor_pids() that walks the process tree from the current
PID up to init (PID 1). Merge this set into exclude_pids at the top
of _scan_gateway_pids() so the entire ancestor chain is filtered out.
This complements the existing os.getpid() exclusion in
_append_unique_pid() by also covering parent/grandparent processes
(e.g. when hermes is invoked via a wrapper script or shell).
Closes#13242
_setup_slack() was the only platform setup function that did not prompt
for a home channel. All four sibling setups (_setup_telegram,
_setup_discord, _setup_mattermost, _setup_bluebubbles) close with an
identical home-channel block, and setup_gateway() already checks for
SLACK_HOME_CHANNEL presence at the end of the wizard — but the value
was never collected, leaving cron delivery and cross-platform
notifications silently broken for Slack after a fresh hermes setup run.
Add the standard home-channel prompt at the end of _setup_slack(),
symmetric with the Discord implementation. Add two unit tests that
verify the prompt is saved when provided and skipped when left blank.
When multiple gateway profiles are running (e.g. default and wx1),
`hermes gateway status` can be misleading — stopping one profile's
gateway and checking status may still show the other profile's process
without indicating which profile it belongs to.
Add `_print_other_profiles_gateway_status()` which displays running
gateways from other profiles at the bottom of the status output:
Other profiles:
✓ wx1 — PID 166893
This uses the existing `find_profile_gateway_processes()` and
`get_active_profile_name()` — no new dependencies.
Closes#19113
Related: #4402, #4587
Path.home() / ".hermes" / "profiles" breaks custom-root deployments
(e.g. HERMES_HOME=/opt/data). Switch to get_default_hermes_root() so
profile discovery is consistent with kanban_db_path() and
workspaces_root() fixed in #18985.
Fixes#19017.
Related to #18442, #18985.
The old CWD heuristic was fooled by:
1. TERMINAL_CWD persisted to .env by `hermes config set terminal.cwd`
2. Inherited TERMINAL_CWD from parent hermes processes
3. Only resolved when config had a placeholder value (not explicit paths)
Fix:
- load_cli_config() unconditionally uses os.getcwd() for local backend
- TERMINAL_CWD always force-exported in CLI mode (overrides stale values)
- Gateway sets _HERMES_GATEWAY=1 marker so lazy cli.py imports don't clobber
- Remove terminal.cwd from config-set .env sync map (prevents re-poisoning)
- Clarify setup wizard label as 'Gateway working directory'
Closes#19214
Adds an optional dashboard side-process to the container entrypoint,
toggled by `HERMES_DASHBOARD=1` (also accepts `true` / `yes`). When set,
the entrypoint backgrounds `hermes dashboard` before `exec`-ing the main
command so the user's chosen foreground process (gateway, chat, `sleep
infinity`, …) remains PID-of-interest for the container runtime.
docker run -d \
-v ~/.hermes:/opt/data \
-p 8642:8642 -p 9119:9119 \
-e HERMES_DASHBOARD=1 \
nousresearch/hermes-agent gateway run
Defaults chosen for the container case:
- Host: 0.0.0.0 (reachable through published port; can override to
127.0.0.1 via HERMES_DASHBOARD_HOST for sidecar/reverse-proxy setups)
- Port: 9119 (matches `hermes dashboard`)
- Auto-adds `--insecure` when binding to non-localhost, matching the
dashboard's own safety gate for exposing API keys
- HERMES_DASHBOARD_TUI is read by `hermes dashboard` directly — no
entrypoint plumbing needed
Dashboard output is prefixed with `[dashboard]` via `stdbuf`+`sed -u` so
it's easy to separate from gateway logs in `docker logs`. No supervision:
if the dashboard crashes it stays down until the container restarts
(documented in the `:::note` panel).
Other changes bundled in:
- Deprecate GATEWAY_HEALTH_URL / GATEWAY_HEALTH_TIMEOUT env vars in
hermes_cli/web_server.py with a DEPRECATED block comment and a
`.. deprecated::` note on _probe_gateway_health. The feature still
works for this release; it'll be removed alongside the move to a
first-class dashboard config key.
- Rewrite the "Running the dashboard" doc section around the new
single-container pattern. Drops the previously-documented
dashboard-as-its-own-container setup — that pattern relied on the
deprecated env vars for cross-container gateway-liveness detection,
and without them the dashboard would permanently report the gateway
as "not running".
- Collapse the two-service Compose example (gateway + dashboard
container) into a single service with HERMES_DASHBOARD=1. Removes
the now-unnecessary bridge network and `depends_on`.
- Drop the ":::warning" caveat about "Running a dashboard container
alongside the gateway is safe" — that case no longer exists.
`_tui_need_npm_install()` compares the canonical `package-lock.json` against
the hidden `node_modules/.package-lock.json` to decide whether `npm install`
needs to re-run. npm 9 drops the `"peer": true` field from the hidden lock
on dev-deps that are *also* declared as peers (the canonical lock preserves
the dual annotation). That made the check flag 16 packages (`@babel/core`,
`@types/node`, `@types/react`, `@typescript-eslint/*`, `react`, `vite`,
`tsx`, `typescript`, …) as mismatched on every launch, triggering a runtime
`npm install`.
Inside the Docker image, that runtime install then fails with EACCES because
`/opt/hermes/ui-tui/node_modules/` is root-owned from build time, so
`docker run … hermes-agent --tui` prints:
Installing TUI dependencies…
npm install failed.
…and exits 1, with no preview. The empty preview is a second bug: the
launcher captured only stderr, but npm 9 writes EACCES to stdout, which
was DEVNULL'd.
Fixes:
- Add `"peer"` to `_NPM_LOCK_RUNTIME_KEYS` so the comparison ignores the
non-deterministic field, alongside the existing `"ideallyInert"`.
- Capture stdout as well as stderr in the install subprocess so future
failures surface a useful preview instead of a bare "failed." line.
Regression tests:
- `test_no_install_when_only_peer_annotation_differs` — the exact scenario
- `test_install_when_version_differs_even_with_peer_drop` — guards against
the peer-drop tolerance masking a real version skew
On-host impact: the same false-positive was firing on every `hermes --tui`
invocation from a normal checkout, silently running a no-op `npm install`
each time (it converged because the host's `node_modules/` is writable).
Startup time on the TUI should drop noticeably.
On Windows, services and terminals default to cp1252 encoding. The CLI
uses box-drawing characters (┌│├└─) in banners, doctor output, and
status displays. When print() tries to encode these under cp1252, an
unhandled UnicodeEncodeError crashes the gateway on startup.
This fix adds early UTF-8 enforcement in hermes_cli/__init__.py:
- Sets PYTHONUTF8=1 and PYTHONIOENCODING=utf-8
- Re-opens stdout/stderr with UTF-8 encoding if not already UTF-8
Runs at import time so it protects all CLI subcommands. No effect on
Unix (gated on sys.platform == "win32"). Backwards-compatible: on
systems already using UTF-8, the function is a no-op.
Fixes#10956
The MiniMax OAuth API endpoints have moved from api.minimax.io to
account.minimax.io and the old paths now respond with HTTP 307.
httpx defaults to follow_redirects=False (unlike requests), so the
device-code and token-refresh flows fail with "Temporary Redirect".
Adds follow_redirects=True to the two httpx.Client instances in
hermes_cli/auth.py used by the MiniMax OAuth flow. This is forward-
compatible -- if endpoints move again, the redirect chain is
followed automatically.
Repro before patch:
curl -i -X POST https://api.minimax.io/oauth/code # -> 307
curl -i -X POST https://api.minimax.io/oauth/token # -> 307
Verified end-to-end against a real MiniMax Plus account on macOS;
the existing tests/test_minimax_oauth.py suite (15 tests) still
passes.
Layers defense-in-depth on top of the shared-root anchoring (base commit).
Changes in hermes_cli/kanban_db.py:
- kanban_db_path() now honours HERMES_KANBAN_DB first, then falls through
to kanban_home()/kanban.db.
- workspaces_root() now honours HERMES_KANBAN_WORKSPACES_ROOT first, then
falls through to kanban_home()/kanban/workspaces.
- All three overrides (HERMES_KANBAN_HOME, HERMES_KANBAN_DB,
HERMES_KANBAN_WORKSPACES_ROOT) now call .expanduser() for consistency.
- _default_spawn() injects HERMES_KANBAN_DB and
HERMES_KANBAN_WORKSPACES_ROOT into the worker subprocess env. Even
when the worker's get_default_hermes_root() resolution somehow
disagrees with the dispatcher's (symlinks, unusual Docker layouts),
the two processes still open the same SQLite file.
Module docstring updated to describe all three overrides and the
dispatcher env-injection contract.
Tests (tests/hermes_cli/test_kanban_db.py, TestSharedBoardPaths):
- test_hermes_kanban_db_pin_beats_kanban_home
- test_hermes_kanban_workspaces_root_pin_beats_kanban_home
- test_empty_per_path_overrides_fall_through
- test_dispatcher_spawn_injects_kanban_db_and_workspaces_root
(monkeypatches subprocess.Popen, asserts both env vars reach the
child even after HERMES_HOME is rewritten by `hermes -p <profile>`.)
Docs: website/docs/reference/environment-variables.md gets entries
for the three kanban env vars.
This fusion is built on the cleanest of the seven competing PRs that
targeted issue #18442:
* Base commit (from PR #19350 by @GodsBoy): add `kanban_home()` helper
anchored at `get_default_hermes_root()`, reroute all 5 kanban path
sites through it (including the 3 sibling log-dir sites that the
other six PRs missed), 8-test regression class.
* Dispatcher env-var injection approach drawn from PRs #18300
(@quocanh261997) and #19100 (@cg2aigc).
* Per-path env overrides drawn from PR #19100 (@cg2aigc).
* get_default_hermes_root() resolution direction first proposed in
PR #18503 (@beibi9966) and PR #18985 (@Gosuj).
Closes the duplicate/competing PRs: #18300, #18503, #18670, #18985,
#19037, #19056, #19100. Fixes#18442 and #19348.
Co-authored-by: quocanh261997 <17986614+quocanh261997@users.noreply.github.com>
Co-authored-by: cg2aigc <232694053+cg2aigc@users.noreply.github.com>
Co-authored-by: beibi9966 <beibei1988@proton.me>
Co-authored-by: Gosuj <123411271+Gosuj@users.noreply.github.com>
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
The Kanban board is documented as shared across all Hermes profiles, but
`kanban_db_path()` and `workspaces_root()` resolved through `get_hermes_home()`,
which returns the active profile's HERMES_HOME. When the dispatcher spawned a
worker with `hermes -p <profile> --skills kanban-worker chat -q "work kanban
task <id>"`, the worker rewrote HERMES_HOME to the profile subdirectory before
kanban_db.py imported, opening a profile-local `kanban.db` that did not contain
the dispatcher's task. `kanban_show` and `kanban_complete` failed; the
dispatcher's row stayed `running` and was retried/crashed. The same defect
applied to `_default_spawn`'s log directory and `worker_log_path`, so
`hermes kanban tail` did not see the worker's output.
Add `kanban_home()` in `hermes_cli/kanban_db.py` that resolves through
`HERMES_KANBAN_HOME` (explicit override) then `get_default_hermes_root()`,
which already understands the `<root>/profiles/<name>` and Docker / custom
HERMES_HOME shapes. Reroute `kanban_db_path`, `workspaces_root`, the
`_default_spawn` log directory, `gc_worker_logs`, and `worker_log_path`
through it. Profile-specific config, `.env`, memory, and sessions stay
isolated as before; only the kanban surface is shared.
Add a `TestSharedBoardPaths` regression class to `tests/hermes_cli/test_kanban_db.py`
covering: default install, profile-worker convergence, Docker custom HERMES_HOME,
Docker profile layout, explicit `HERMES_KANBAN_HOME` override, and a real
SQLite round-trip across dispatcher and worker HERMES_HOME perspectives.
The dispatcher/worker convergence tests fail on origin/main and pass after
the fix.
Update the `kanban.md` user-guide page and the misleading docstrings in
`kanban_db.py` to describe the shared-root behavior.
Fixes#19348
CLI/TUI sessions on the local backend now unconditionally use
os.getcwd() as the working directory. The terminal.cwd config value is
only consumed by gateway/cron/delegation modes (where there's no shell
to cd from).
Previously, 'hermes setup' would write an absolute path (e.g. $HOME)
into terminal.cwd which then pinned the CLI to that directory regardless
of where the user launched hermes from. This was a silent foot-gun —
the user's 'cd' was being ignored.
Changes:
1. cli.py: Restructured CWD resolution — if TERMINAL_CWD is not already
set by the gateway, and the backend is local, always use os.getcwd().
Config terminal.cwd is irrelevant for interactive CLI/TUI sessions.
2. setup.py: Moved the cwd prompt from setup_terminal_backend() to
setup_gateway(). It now only appears when configuring messaging
platforms and is labeled 'Gateway working directory'.
3. Tests: Rewrote test_cwd_env_respect.py to validate the new behavior:
explicit config paths are ignored for CLI, gateway pre-set values are
preserved, non-local backends keep their config paths.
4. Docs: Updated configuration.md, profiles.md, and
environment-variables.md to clarify that terminal.cwd only affects
gateway/cron mode on local backend.
Closes#19214
Apply agent.redact.redact_sensitive_text with force=True to log content
captured by _capture_log_snapshot before it reaches upload_to_pastebin.
On-disk logs are untouched. Compatible with the off-by-default local
redaction policy from #16794: this is upload-time-only and applies
regardless of security.redact_secrets because the public paste service
is the leak surface. A visible banner is prepended to each uploaded log
paste so reviewers know redaction was applied. --no-redact preserves
deliberate unredacted sharing for maintainer-coordinated cases.
The bug-report, setup-help, and feature-request issue templates direct
users to run hermes debug share and paste the resulting public URLs.
With redaction off by default per #16794, those uploads have been
carrying credentials onto paste.rs and dpaste.com.
force=True is non-negotiable: without it, redact_sensitive_text
short-circuits at agent/redact.py:322 when the env var is unset, so the
fix would silently be a no-op for its target audience. A regression
test pins this down.
Fixes#19316
* feat: add video_analyze tool for native video understanding
Adds a video_analyze tool that sends video files to multimodal LLMs
(e.g. Gemini) for analysis via the OpenRouter-compatible video_url
content type. Mirrors vision_analyze in structure, error handling,
and registration pattern.
Key design:
- Base64 encodes entire video (no frame extraction, no ffmpeg dep)
- Uses 'video_url' content block type (OpenRouter standard)
- Supports mp4, webm, mov, avi, mkv, mpeg formats
- 50 MB hard cap, 20 MB warning threshold
- 180s minimum timeout (videos take longer than images)
- AUXILIARY_VIDEO_MODEL env override, falls back to AUXILIARY_VISION_MODEL
- Same SSRF protection, retry logic, and cleanup as vision_analyze
Default disabled: registered in 'video' toolset (not in _HERMES_CORE_TOOLS).
Users opt in via: hermes tools enable video, or enabled_toolsets=['video'].
* feat(video): add models.dev capability pre-check + CONFIGURABLE_TOOLSETS entry
- Pre-checks model video capability via models.dev modalities.input
before expensive base64 encoding. Fails early with helpful message
suggesting video-capable alternatives (gemini, mimo-v2.5-pro).
- Passes optimistically if model unknown or lookup fails.
- Adds ModelInfo.supports_video_input() helper.
- Adds 'video' to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS
so 'hermes tools enable video' works from CLI.
- 8 new tests for the capability check (37 total).
* refactor(video): remove models.dev capability pre-check
Removes _check_video_model_capability and ModelInfo.supports_video_input.
The vision_analyze tool doesn't pre-check image capability either — both
tools rely on the same pattern: send request, handle API errors gracefully
with categorized user-facing messages. The pre-check was inconsistent
(only worked for some providers/models) so drop it for parity.
* cleanup: compress comments, fix fragile timeout coupling
- Replace _VISION_DOWNLOAD_TIMEOUT * 2 with hardcoded 60s (no silent
breakage if vision timeout changes independently)
- Strip verbose comments and redundant log lines throughout
- No behavioral changes
- TestClampCommandNamesTriples: unit tests for 3-tuple support in
_clamp_command_names (short names, long names, collisions, multiple
entries, backward compat with 2-tuples)
- TestDiscordSkillCmdKeyDispatch: integration test through the full
discord_skill_commands pipeline verifying long skill names retain
their original cmd_key after clamping
- Add contributor CharlieKerfoot to AUTHOR_MAP
Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache
headers. When enabled, identical API requests return cached responses for
free (zero billing), reducing both latency and cost.
Configuration via config.yaml:
openrouter:
response_cache: true # default: on
response_cache_ttl: 300 # 1-86400 seconds
Changes:
- Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL)
- Add build_or_headers() in auxiliary_client.py that builds attribution
headers plus optional cache headers based on config
- Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites:
run_agent.py __init__, _apply_client_headers_for_base_url(), and
auxiliary_client.py _try_openrouter() + _to_async_client()
- Add _check_openrouter_cache_status() method to AIAgent that reads
X-OpenRouter-Cache-Status from streaming response headers and logs
HIT/MISS status
- Document in cli-config.yaml.example
- Add 28 tests (22 unit + 6 integration)
Ref: https://openrouter.ai/docs/guides/features/response-caching
Point users to xAI's custom voices feature — clone your voice in the
console, paste the voice_id into tts.xai.voice_id. No code changes
needed; the existing TTS pipeline already handles arbitrary voice IDs.
- config.py: link to xAI custom voices docs in voice_id comment
- setup.py: prompt accepts custom voice IDs during xAI TTS setup
- tts.md: short section linking to xAI console and docs
* fix(gateway): config.yaml wins over .env for agent/display/timezone settings
Regression from the silent config→env bridge. The bridge at module import
time is correct for max_turns (unconditional overwrite), but every other
agent.*, display.*, timezone, and security bridge key was guarded by
'if X not in os.environ' — so a stale .env entry from an old 'hermes setup'
run would shadow the user's current config.yaml indefinitely.
Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60
in .env from an old setup, and the gateway silently capped at 60
iterations per turn. Gateway logs confirmed api_calls never exceeded 60.
Three changes:
1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*,
display.*, timezone, and security.* bridge keys. config.yaml is now
authoritative for these settings — same semantics already in place
for max_turns, terminal.*, and auxiliary.*. Also surface the bridge
failure (previously 'except Exception: pass') to stderr so operators
see bridge errors instead of silently falling back to .env.
2. gateway/run.py: INFO-log the resolved max_iterations at gateway
start so operators can verify the config→env bridge did the right
thing instead of chasing a phantom budget ceiling.
3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in
the setup wizard. config.yaml is the single source of truth. Also
clean up any stale .env entry left behind by pre-fix setups.
Regression tests in tests/gateway/test_config_env_bridge_authority.py
guard each config→env key against the 'stale .env shadows config' bug.
* fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log)
Three issues observed in production gateway.log during a rapid restart
chain on 2026-05-02, all fixed here.
1. _send_restart_notification logged unconditional success
adapter.send() catches provider errors (e.g. Telegram 'Chat not found')
and returns SendResult(success=False); it never raises. The caller
ignored the return value and always logged 'Sent restart notification
to <chat>' at INFO, producing a misleading success line directly
below the 'Failed to send Telegram message' traceback on every boot.
Now inspects result.success and logs WARNING with the error otherwise.
2. WhatsApp bridge SIGTERM on shutdown classified as fatal error
_check_managed_bridge_exit() saw the bridge's returncode -15 (our own
SIGTERM from disconnect()) and fired the full fatal-error path,
producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus
'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every
planned shutdown, immediately before the normal '✓ whatsapp
disconnected'. Adds a _shutting_down flag that disconnect() sets
before the terminate, and _check_managed_bridge_exit() returns None
for returncode in {0, -2, -15} while shutting down. OOM-kill (137)
and other non-signal exits still hit the fatal path.
3. restart_drain_timeout default 60s → 180s
On 2026-05-02 01:43:27 a user /restart fired while three agents were
mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget
expired and all three were force-interrupted. 180s covers realistic
in-flight agent turns; users on very-long-reasoning models can still
raise it further via agent.restart_drain_timeout in config.yaml.
Existing explicit user values are preserved by deep-merge.
Tests
- tests/gateway/test_restart_notification.py: two new tests assert INFO
is only logged on SendResult(success=True) and WARNING with the error
string is logged on SendResult(success=False).
- tests/gateway/test_whatsapp_connect.py: parametrized test for
returncode in {0, -2, -15} proves shutdown-time exits are suppressed;
separate test proves returncode 137 (SIGKILL/OOM) still surfaces as
fatal even when _shutting_down is set.
- _check_managed_bridge_exit() reads _shutting_down via getattr-with-
default so existing _make_adapter() test helpers that bypass __init__
(pitfall #17 in AGENTS.md) keep working unmodified.
Discord's per-command name limit is 32 chars. When two skill slugs
share the same first 32 chars (or a skill slug clamps onto a reserved
gateway command name), only the first seen wins — the second is
dropped from the /skill autocomplete. The old behavior incremented a
``hidden`` counter silently, so skill authors had no way to discover
the drop short of noticing their skill was missing from the picker.
Not an actively-biting bug today (no collisions on the default catalog
as of 2026-05), but a landmine the moment someone ships a skill with a
long name. The earlier series in #18745 / #18753 / #18754 dropped the
other silent data-loss paths in the Discord /skill collector; this one
lights up the last remaining one.
Fix: promote ``_names_used`` from a set to a dict keyed by the clamped
name, mapping to the source cmd_key (or a ``"<reserved>"`` sentinel
for names inherited via ``reserved_names``). On collision, log a
WARNING naming both sides — the winner, the loser, the clamped name,
and what to rename.
Two phrasings:
* skill-vs-skill — "both clamp to X on Discord's 32-char command-name
limit; only the winner appears in /skill. Rename one skill's
frontmatter ``name:`` to differ in its first 32 chars."
* skill-vs-reserved — "collides with a reserved gateway command name;
the skill will not appear in /skill. Rename the skill's frontmatter
``name:``."
Tests: three cases in
``tests/hermes_cli/test_discord_skill_clamp_warning.py`` —
skill-vs-skill collision (warning names both cmd_keys + clamped prefix),
skill-vs-reserved collision (warning uses the distinct phrasing), and a
no-collision negative (zero warnings emitted).
``discord_skill_commands_by_category`` was lagging the flat
``discord_skill_commands`` collector on two counts. Both were actively
dropping skills from Discord's ``/skill`` autocomplete dropdown.
1. External-dir skills were filtered out. #18741 widened the flat
collector to accept ``SKILLS_DIR + skills.external_dirs`` but left
this sibling collector — the one ``_register_skill_group`` actually
uses on Discord — still matching ``SKILLS_DIR`` only. External
skills were visible in ``hermes skills list`` and the agent's
``/skill-name`` dispatch but silently absent from Discord's
``/skill`` picker. Widen the accepted roots to match, and derive
categories from whichever root the skill lives under so
``<ext>/mlops/foo/SKILL.md`` still lands in the ``mlops`` group.
2. 25-group × 25-subcommand caps were still applied. PR #11580
refactored ``/skill`` to a flat autocomplete (whose options Discord
fetches dynamically — no per-command payload concern) and its
docstring promises "no hidden skills." The collector kept the old
nested-layout caps anyway, silently dropping anything past the 25th
alphabetical category. On installs with 29 category dirs today (real
example: tail categories ``social-media``, ``software-development``,
``yuanbao`` going missing) this was biting immediately. Remove the
caps; ``hidden`` now reports only 32-char name-clamp collisions
against reserved names.
Tests: guard both behaviors. ``test_no_legacy_25x25_cap`` builds 30
categories × 30 skills each and asserts all 900 are returned.
``test_external_dirs_skills_included`` monkeypatches
``get_external_skills_dirs`` and asserts an external-dir skill makes
it into the result grouped under its own top-level directory.
Path.read_text() uses the system locale by default. On Windows CN/JP/KR
locales (GBK/CP932/CP949), reading a UTF-8 .env raises UnicodeDecodeError
as soon as it contains any non-ASCII byte (e.g. an em dash).
Pin encoding="utf-8" on every .env read in hermes_cli to match how the
rest of the codebase (load_dotenv at doctor.py:26) already decodes it.
Adds a regression test that monkeypatches Path.read_text to simulate a
GBK locale and asserts 'hermes doctor' no longer raises.
Refs #18637