Built-in commands with required args (e.g. /queue, /steer, /background)
were excluded from Telegram setMyCommands output, making them invisible
in the autocomplete menu. However, their handlers already return usage
text when invoked without arguments, so hiding them hurts discoverability.
This commit removes the _requires_argument filter for built-in commands
(COMMAND_REGISTRY) while keeping it for plugin-registered slash commands,
which may not provide a no-arg usage fallback.
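A minimal sketch of the filtering rule described above, assuming a simplified command record. `Command`, `telegram_menu_commands`, and the field names are hypothetical and are not the actual COMMAND_REGISTRY API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    description: str
    requires_argument: bool = False
    builtin: bool = True  # False for plugin-registered slash commands


def telegram_menu_commands(commands):
    """Return (name, description) pairs to publish via setMyCommands."""
    menu = []
    for cmd in commands:
        # Built-ins print usage text when invoked bare, so always list them.
        # Plugin commands may lack that fallback, so the filter stays for them.
        if not cmd.builtin and cmd.requires_argument:
            continue
        menu.append((cmd.name, cmd.description))
    return menu


commands = [
    Command("queue", "Queue a prompt", requires_argument=True),
    Command("help", "Show help"),
    Command("deploy", "Plugin deploy", requires_argument=True, builtin=False),
]
menu = telegram_menu_commands(commands)
```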
Closes #24312
* fix(install): use `--extra all` not `--all-extras`; drop lazy-covered extras from [all]
Two coupled fixes for the Windows install hang where uv sync built
python-olm from sdist and failed on missing make.
# Root cause: --all-extras vs --extra all (credit: ethernet)
`uv sync --all-extras` installs every key in
[project.optional-dependencies], bypassing the curated [all] extra
entirely. So even
when [all] excluded [matrix], [rl], [yc-bench], etc., the installer
pulled them anyway because they were still defined as extras. On
Windows that meant python-olm (no wheel, needs make to build from
sdist) and the install died there.
The right flag is `--extra all` — install just the [all] extra's
contents, respecting curation. Empirically verified via dry-run:
--all-extras: pulls python-olm, mautrix, ctranslate2, onnxruntime,
atroposlib, tinker, wandb, modal, daytona, vercel,
python-telegram-bot, discord.py, slack-bolt,
dingtalk-stream, lark-oapi, anthropic, boto3,
edge-tts, elevenlabs, exa-py, fal-client,
faster-whisper, firecrawl-py, honcho-ai, parallel-web
--extra all: pulls none of those — just [all]'s curated set
Dockerfile already uses `--extra all` (with comment explaining the
gotcha) — knowledge existed; the gap was install.sh / install.ps1 /
setup-hermes.sh.
Sites fixed: scripts/install.sh L1118, scripts/install.ps1 L809,
setup-hermes.sh L245.
# Companion fix: drop lazy-covered extras from [all]
`tools/lazy_deps.py` already covers anthropic, bedrock, exa,
firecrawl, parallel-web, fal, edge-tts, elevenlabs, modal, daytona,
vercel, all messaging platforms (telegram/discord/slack/matrix/
dingtalk/feishu), honcho, and faster-whisper. They were ALSO in
[all], which defeats the whole point of lazy-install — fresh
installs eager-pulled them and inherited whatever was broken
upstream (the matrix → python-olm → no Windows wheel chain being
the proximate symptom).
[all] now contains only what genuinely can't be lazy-installed:
cron, cli, dev, pty, mcp, homeassistant, sms, acp, google, web,
youtube. Same trim applied to [termux-all]. New regression test
asserts the contract: every extra in LAZY_DEPS must NOT also appear
in [all].
# Companion fix: surface uv progress + errors
setup-hermes.sh's hash-verified path swallowed uv's stderr to a
tempfile, identical to the install.sh bug fixed in PR #24504. Same
fix applied: stream stderr through directly so users see live
progress instead of staring at a frozen prompt.
# Files
- pyproject.toml: trim [all] and [termux-all] to non-lazy extras only.
- scripts/install.sh: --all-extras → --extra all; trim _ALL_EXTRAS /
_PYPI_EXTRAS to match.
- scripts/install.ps1: --all-extras → --extra all; trim $allExtras /
$pypiExtras to match.
- setup-hermes.sh: --all-extras → --extra all; stream stderr.
- tests/test_project_metadata.py: invert matrix-in-[all] assertion;
add lazy-coverage contract test.
- uv.lock: regenerated.
# Validation
5/5 metadata tests pass. 37/37 in update_autostash +
tool_token_estimation. `uv lock --check` passes. Empirical dry-run confirms
`--extra all` excludes python-olm + RL chain on the new lockfile.
* fix(install): parse [all] from pyproject.toml instead of mirroring it
ethernet's review point: the previous patch left two hand-mirrored
copies of [all]'s contents (in install.sh's $_ALL_EXTRAS and
install.ps1's $allExtras). That guarantees future drift the next
time pyproject.toml's [all] changes.
Now both scripts parse pyproject.toml at install time using stdlib
tomllib (Python 3.11+, which the bootstrap step already requires).
Single source of truth. The only purpose of the parsed list is to
build the 'Tier 2: [all] minus broken extras' fallback spec — so we
parse, filter against $brokenExtras, and rebuild the .[a,b,c] spec.
Also: removed redundant fallback tiers.
Before: Tier 1  [all]
        Tier 2  [all] minus broken
        Tier 3  PyPI-only extras (no git deps)
        Tier 4  [web,mcp,cron,cli,messaging,dev]
        Tier 5  .
After:  Tier 1  [all]
        Tier 2  [all] minus broken
        Tier 3  .
Tier 3 (PyPI-only) and Tier 4 (dashboard+core) used to dodge the [rl]
git+sdist deps and the [matrix] python-olm build. Both are no longer
in [all] post-2026-05-12 lazy-install migration, so the carve-out
tiers had no remaining content. Tier 4 also referenced [messaging],
which is now lazy-installed — the hardcoded fallback was actually
inconsistent with the new policy.
Defensive fallback: if tomllib parse fails (corrupted pyproject,
unexpected schema), Tier 2 collapses to '.[all]' (same as Tier 1) so
the broken-extras path becomes a no-op rather than crashing.
* fix(gateway): hide Matrix from setup picker on Windows
Matrix is the one messaging platform that has no working install path
on Windows: [matrix] -> mautrix[encryption] -> python-olm, which has
Linux-only wheels and needs make + libolm to build from sdist. The
[all] cleanup in this PR keeps mautrix out of fresh installs, but a
user who picked Matrix in 'hermes setup gateway' would still walk
into the same sdist build failure when the wizard tried to install
the extra.
Hide the option at the picker so users never get the chance to try.
The gate lives in _all_platforms() — single source of truth for the
setup wizard, the curses gateway-config menu, and any future picker.
Adapter loading at runtime is intentionally NOT gated: users who
already have MATRIX_* env vars set (e.g. config copied from a Linux
install) keep working if they somehow have python-olm available.
This is the lowest-friction fix — picker visibility only.
Tests cover linux/darwin/win32 and verify other platforms aren't
collateral damage.
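The gate can be pictured as below. This is a simplified sketch: the platform list and the `_UNSUPPORTED` table are hypothetical, and the real `_all_platforms()` signature may differ.

```python
import sys

# Hypothetical platform catalog; the real wizard's list is not shown here.
_PLATFORMS = ["telegram", "discord", "slack", "matrix"]

# Platforms with no working install path for a given sys.platform value.
_UNSUPPORTED = {"win32": {"matrix"}}


def all_platforms(platform=None):
    """Return the platforms the setup picker should offer."""
    platform = sys.platform if platform is None else platform
    hidden = _UNSUPPORTED.get(platform, set())
    return [p for p in _PLATFORMS if p not in hidden]
```

Taking `platform` as a parameter is only a convenience for testing linux/darwin/win32 without patching `sys.platform`.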
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
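A minimal sketch of the detection idea, assuming a simplified `Advisory` record. The field names and the injectable `lookup` parameter are illustrative and are not the module's actual interface. `importlib.metadata.version()` is the standard-library call named above.

```python
from dataclasses import dataclass
from importlib import metadata


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    package: str
    bad_versions: frozenset


ADVISORIES = [
    Advisory("shai-hulud-2026-05", "mistralai", frozenset({"2.4.6"})),
]


def installed_version(package):
    """Return the installed version string, or None when not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def detect_compromised(advisories=ADVISORIES, lookup=installed_version):
    """Return the advisories whose affected version is installed."""
    hits = []
    for adv in advisories:
        version = lookup(adv.package)
        if version is not None and version in adv.bad_versions:
            hits.append(adv)
    return hits
```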
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
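The spec safety check might look something like this. The pattern below is an assumed shape (PyPI name, optional extras, optional exact pin) and is not the regex shipped in tools/lazy_deps.py.

```python
import re

# Assumed shape: a PyPI name, optional [extras], optional ==X.Y.Z pin.
_SAFE_SPEC = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"      # project name
    r"(?:\[[A-Za-z0-9._-]+(?:,[A-Za-z0-9._-]+)*\])?"   # optional extras
    r"(?:==[0-9]+(?:\.[0-9]+)*)?"                      # optional exact pin
)


def is_safe_spec(spec):
    """Accept only PyPI-by-name specs; reject URLs, paths, flags, shell metas."""
    return isinstance(spec, str) and _SAFE_SPEC.fullmatch(spec) is not None
```

Using `fullmatch` instead of `match` means a trailing newline or appended shell fragment fails the check, not only a bad prefix.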
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
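Why no version bump is needed can be seen from a generic deep-merge. This is a sketch of the technique, not load_config's actual code, and the `redact_secrets` key is a hypothetical example of an older user setting.

```python
import copy


def deep_merge(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


DEFAULT_CONFIG = {
    "security": {"acked_advisories": [], "allow_lazy_installs": True},
}
old_user_config = {"security": {"redact_secrets": True}}  # hypothetical key
config = deep_merge(DEFAULT_CONFIG, old_user_config)
```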
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
User-facing change: none, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
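A check in the spirit of the pin cross-check could be sketched as below. The regex and the sample requirement strings (including their version numbers) are illustrative assumptions, not values taken from pyproject.toml.

```python
import re

# name, optional [extras], '==', a concrete version, optional '; marker'.
_EXACT_PIN = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[0-9][A-Za-z0-9.+!-]*(\s*;.*)?$"
)


def unpinned(requirements):
    """Return the requirement strings that are not exact ==X.Y.Z pins."""
    return [r for r in requirements if not _EXACT_PIN.match(r.strip())]


sample = [
    "httpx==0.28.1",
    "tzdata==2025.2; sys_platform == 'win32'",
    "rich>=13",
    "pyyaml==6.*",
]
offenders = unpinned(sample)
```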
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
This addresses the question 'what about the dependencies the dependencies
rely on?', which correctly notes that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer                               | Where             | Protects against                            |
|-------------------------------------|-------------------|---------------------------------------------|
| Exact pins in pyproject             | direct deps       | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install        | transitive graph  | transitive worm injection                   |
| Tier-0 hash-verified path           | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate           | every PR          | drift between pyproject and lockfile        |
| `hermes_cli/security_advisories.py` | runtime           | cleanup for users who already got hit       |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
Handle MiniMax OAuth expiry values consistently across CLI and dashboard
flows, fix CLI status/add behavior, and force pooled OAuth runtime
requests through Anthropic Messages.
- web_server._minimax_poller: parse expired_in via the shared resolver
so unix-ms absolute timestamps stop landing as TTL seconds and crashing
with 'year 583911 is out of range' when a user connects MiniMax OAuth
from the dashboard.
- auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on
the CLI login + refresh paths.
- auth.get_auth_status: dispatch minimax-oauth to its dedicated status
function instead of falling through.
- auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now
starts the device-code login flow and persists a pool entry with the
access + refresh tokens, instead of requiring credentials to already
exist.
- runtime_provider._resolve_runtime_from_pool_entry: pin pooled
minimax-oauth credentials to anthropic_messages so a stale
model.api_mode: chat_completions can't send requests to
/anthropic/chat/completions and trigger MiniMax nginx 404s.
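The overloaded `expired_in` value can be disambiguated along these lines. This is a sketch only: `resolve_expires_at` and the cutoff constant are assumptions, not the shared resolver's actual name or threshold.

```python
import time

# Values at or above this are treated as absolute unix-ms timestamps.
# Assumed cutoff: 10**11 ms is roughly March 1973, and 10**11 seconds
# would be a TTL of about 3,000 years, so the two ranges do not overlap.
_MS_TIMESTAMP_FLOOR = 10**11


def resolve_expires_at(expired_in, now=None):
    """Return an absolute expiry time in unix seconds."""
    now = time.time() if now is None else now
    value = float(expired_in)
    if value >= _MS_TIMESTAMP_FLOOR:
        return value / 1000.0   # absolute unix-ms timestamp
    return now + value          # seconds-from-now duration
```

Treating a unix-ms timestamp as a TTL adds roughly 1.7e12 seconds to the current time, which is how an out-of-range year such as the one in the error message can arise.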
Co-authored-by: Cursor <cursoragent@cursor.com>
Free-tier users were seeing 'No free models currently available.' in the
`hermes model` and post-login pickers even though qwen/qwen3.6-plus is
free on the Portal right now. Three independent breakages compounded:
1. The docs-hosted catalog manifest at website/static/api/model-catalog.json
was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users
fetching the manifest got a list that didn't include qwen/qwen3.6-plus.
2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip,
collapsing get_pricing_for_provider('nous') to {} and making every
curated model fall through the free-tier filter as 'paid'.
3. Even with healthy pricing, the picker only ever showed models from the
in-repo curated list intersected with live pricing — a Portal-flagged
free model not yet in the curated list could never appear.
Changes:
- hermes_cli/models.py: new union_with_portal_free_recommendations() that
augments the curated list with Portal freeRecommendedModels entries
(with synthetic free pricing so partition keeps them). The Portal's
/api/nous/recommended-models endpoint is now the source of truth for
free-tier surfacing — old Hermes builds will see new free models
without a CLI release.
- hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to
the public inference base URL when runtime cred resolution fails.
The /v1/models endpoint exposes pricing without auth, so silently
returning {} just because a refresh token expired was wrong.
- hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call
sites call union_with_portal_free_recommendations() before partition.
- tests/hermes_cli/test_models.py: 7 tests covering union behaviour
(prepend, dedup, end-to-end with stale pricing, empty/missing/error
payloads, invalid entries).
- tests/hermes_cli/test_model_catalog.py: drift guard
TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous']
or OPENROUTER_MODELS is edited without re-running
scripts/build_model_catalog.py. Verified empirically that removing a
manifest entry triggers an assertion with an actionable error message.
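The union step (prepend, dedup, tolerate bad payloads) could look roughly like this. `freeRecommendedModels` comes from the description above, while the `modelName` field and the function signature are assumptions about the payload shape.

```python
def union_with_free_recommendations(curated, portal_payload):
    """Prepend Portal-flagged free models to the curated list, deduplicated."""
    entries = None
    if isinstance(portal_payload, dict):
        entries = portal_payload.get("freeRecommendedModels")
    if not isinstance(entries, list):
        return list(curated)  # empty, missing, or error payload
    free_ids = []
    for entry in entries:
        # Assumed entry shape; invalid entries are skipped.
        model_id = entry.get("modelName") if isinstance(entry, dict) else None
        if isinstance(model_id, str) and model_id and model_id not in free_ids:
            free_ids.append(model_id)
    return free_ids + [m for m in curated if m not in free_ids]
```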
Validation:
- 133/133 targeted tests pass (test_models, test_model_catalog,
test_auth_nous_provider).
- Live E2E against the real Portal:
- Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no
qwen) → after union: ['qwen/qwen3.6-plus', ...] →
partition(free_tier=True): selectable=['qwen/qwen3.6-plus'].
- Simulated expired refresh token → anon fetch returns 403 pricing
entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}.
- ruff: clean.
cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall.
Two refresh points now:
- `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call.
- `hermes computer-use install --upgrade` for manual force-refresh.
The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed.
`hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing.
Fixes #22832.
## Root cause
`hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by
the catalog's `flow` field rather than provider id:
    if catalog_entry["flow"] == "pkce":
        return _start_anthropic_pkce()
The catalog had two `flow: "pkce"` entries — `anthropic` and
`minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's
Keys tab unconditionally launched the Anthropic/Claude PKCE flow.
## Fix
Three changes in `hermes_cli/web_server.py`:
1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to
`flow: "device_code"`. From a UX perspective MiniMax is a
verification-URI + user-code flow (open URL, enter code, backend
polls) — same shape as Nous's device-code flow. The PKCE bit
(verifier + challenge from `_minimax_pkce_pair`) is a security
extension that doesn't change the operator experience; the existing
dashboard modal already renders `device_code` correctly for this UX.
2. New MiniMax branch in `_start_device_code_flow`, mirroring the
existing Nous branch but calling MiniMax-specific helpers
(`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes
verifier + state in the session for the poller to consume. Handles
the overloaded `expired_in` field (could be unix-ms timestamp OR
seconds-from-now duration) the same way `_minimax_poll_token` does.
3. New `_minimax_poller` background thread mirroring `_nous_poller`.
Calls `_minimax_poll_token` → on success builds the same
`auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and
persists via `_minimax_save_auth_state` so the dashboard path leaves
the system in the same state as `hermes auth add minimax-oauth`.
Plus a dispatcher tightening to prevent regression: the `pkce` branch
now requires `provider_id == "anthropic"`, so any future PKCE provider
added without a proper start function gets a clean
`400 Unsupported flow` rather than silently launching Anthropic OAuth.
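The tightened dispatch can be sketched as follows, assuming a dict-based catalog and a table of start callables. `UnsupportedFlow`, `start_oauth_login`'s parameters, and the starter table are illustrative stand-ins for the web server's real handlers.

```python
class UnsupportedFlow(Exception):
    """Stand-in for an HTTP 400 'Unsupported flow' response."""


def start_oauth_login(provider_id, catalog, starters):
    flow = catalog[provider_id]["flow"]
    # Dispatch on provider id as well as flow, so a shared flow label
    # can no longer route one provider into another's login.
    if flow == "pkce" and provider_id == "anthropic":
        return starters["anthropic"]()
    if flow == "device_code" and provider_id in ("nous", "minimax-oauth"):
        return starters[provider_id]()
    raise UnsupportedFlow(f"Unsupported flow: {provider_id} ({flow})")


catalog = {
    "anthropic": {"flow": "pkce"},
    "minimax-oauth": {"flow": "device_code"},
    "future-provider": {"flow": "pkce"},  # hypothetical, no start function
}
starters = {
    "anthropic": lambda: "anthropic-pkce-started",
    "nous": lambda: "nous-device-code-started",
    "minimax-oauth": lambda: "minimax-device-code-started",
}
```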
## Test
New `tests/hermes_cli/test_web_oauth_dispatch.py`:
- Regression test asserting MiniMax start does NOT return claude.ai
- Sanity test that Anthropic PKCE still works after the dispatcher
tightening
- Forward-looking test: a hypothetical pkce-flagged provider without
an explicit branch is rejected cleanly rather than misrouted
## Limitations
- The dashboard MiniMax path defaults to `region="global"`. CN-region
operators can still use the CLI flow which supports `--region cn`.
Adding a region toggle to the dashboard UI is a follow-up.
The old mtime-tracking staleness machinery (_tui_build_needed,
_hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding
by comparing source timestamps to dist/entry.js. This was fragile and
added ~100 lines of code. Replace with three clear paths:
1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build
2. --dev mode: tsx src/entry.tsx, no build, hot reload
3. Normal: always npm run build (esbuild is ~1s, correctness > caching)
Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt
bundle has no source code to hot-reload).
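The three paths and the guard could be expressed roughly as below. `tui_launch_plan` is a hypothetical helper, and running `node dist/entry.js` after the build on the normal path is an assumption about what follows `npm run build`.

```python
def tui_launch_plan(env, dev):
    """Return the commands to run, in order, for the selected path."""
    tui_dir = env.get("HERMES_TUI_DIR")
    if tui_dir and dev:
        # A prebuilt bundle has no source tree to hot-reload.
        raise ValueError("HERMES_TUI_DIR cannot be combined with --dev")
    if tui_dir:
        return [["node", "dist/entry.js"]]   # prebuilt: no build step
    if dev:
        return [["tsx", "src/entry.tsx"]]    # dev: no build, hot reload
    return [["npm", "run", "build"], ["node", "dist/entry.js"]]
```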
The /model picker for Nous Portal users was returning the in-repo
_PROVIDER_MODELS["nous"] snapshot — which only updates on Hermes
releases — instead of the remote manifest published at
https://hermes-agent.nousresearch.com/docs/api/model-catalog.json.
OpenRouter already pulled from the manifest via fetch_openrouter_models;
"nous" was the only curated provider where the existing manifest
plumbing (get_curated_nous_model_ids → get_curated_nous_models) was
defined but not wired into the picker pipeline. Switch the curated
build in list_authenticated_providers to use it, with the same
graceful fallback to the in-repo snapshot when the manifest is
unreachable.
Test: tests/hermes_cli/test_model_catalog.py exercises the picker with
a patched manifest and asserts the manifest's nous list reaches
list_picker_providers. Falls-back-to-static path was already covered
by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog.
Follow-up to PR #23824. Adds two correctness fixes on top of the
contributor's salvaged commit:
1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard`
passes `fatal=True` and is the primary scenario this fallback is for
(issue #23817 — Windows Scheduled Task at logon). The previous gate
meant the fallback never fired in the case it was designed for.
2. `--skip-build` now verifies the dist actually exists before starting
the server. Without this, a misconfigured pre-build would launch the
dashboard pointing at a missing dist and silently serve 404s. We now
exit 1 with a clear "pre-build first: cd web && npm run build"
message, and on success print which dist directory is being used.
Verified end-to-end on Linux:
- build fails + stale dist (fatal=True) -> fallback fires
- build fails + no dist (fatal=True) -> exit 1 with stderr surfaced
- build fails + stale dist (fatal=False) -> fallback fires
- --skip-build + missing dist -> exit 1 with clear guidance
- --skip-build + valid dist -> 'Skipping web UI build...'
On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning.
This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows.
The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics.
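The decoding change amounts to passing `encoding` and `errors` explicitly to `subprocess.run`. The snippet below demonstrates the behavior with a child Python process standing in for npm. `run_captured` is a hypothetical wrapper name.

```python
import subprocess
import sys


def run_captured(cmd):
    """Capture output as UTF-8, replacing undecodable bytes instead of raising."""
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )


# The child writes the raw byte 0x85, which is not valid UTF-8 on its own.
child = "import sys; sys.stdout.buffer.write(b'ok\\x85')"
result = run_captured([sys.executable, "-c", child])
```

With `errors="replace"` the stray byte becomes U+FFFD in `result.stdout`, and the return code remains the only signal of build success or failure.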
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"
This reverts commit a63a2b7c78.
* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"
This reverts commit 4a080b1d5a.
* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"
This reverts commit 404640a2b7.
A YAML parse error in ~/.hermes/config.yaml caused load_config() to print
one line to stdout (Warning: Failed to load config: ...) and silently fall
back to DEFAULT_CONFIG, dropping every user override (auxiliary providers,
fallback chain, model settings). Users only noticed when downstream
behavior misbehaved — see issue #23570 where a tab-indent error in the
auxiliary section caused aux fallback to use OpenRouter (depleted) instead
of the configured Codex/MiniMax chain.
Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line
to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam,
and re-warn after the user edits the file. Both call sites (raw read +
merged load) route through the same helper.
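The dedup key can be sketched like this. `warn_config_parse_failure`, the logger name, and the message wording are assumptions, not the helper's real interface.

```python
import logging
import os
import sys
import threading

logger = logging.getLogger("hermes.config")
_warned = set()
_warned_lock = threading.Lock()


def warn_config_parse_failure(path, error):
    """Warn once per (path, mtime_ns, size); return True if a warning fired."""
    try:
        st = os.stat(path)
        key = (str(path), st.st_mtime_ns, st.st_size)
    except OSError:
        key = (str(path), None, None)
    with _warned_lock:
        if key in _warned:
            return False  # same file contents already reported
        _warned.add(key)
    logger.warning("Failed to load config %s: %s", path, error)
    print(f"WARNING: could not parse {path}; using defaults: {error}",
          file=sys.stderr)
    return True
```

Because the key includes mtime and size, editing the file produces a new key, so a still-broken config is reported again after the edit.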
Refs #23570
Surface ready tasks that nobody claims within a threshold (default
30 min) regardless of why. One identity-agnostic signal that catches:
- Operator typo'd the assignee
- Profile was deleted, leaving its tasks stranded
- External worker pool (Codex CLI lane, custom daemon) is down
- Dispatcher misconfigured (wrong board / wrong HERMES_HOME)
Today the dispatcher correctly skips these (no respawn loop, good)
but nothing surfaces the fact that operator-actionable work is
accumulating. The new `stranded_in_ready` rule does that without
requiring a manual lane registry — it reads the most recent ready-
transition event (`created` / `promoted` / `reclaimed` / `unblocked`)
and fires when (now - last_ready_ts) > threshold.
Severity escalates with age: warning at threshold, error at 2x,
critical at 6x. The cli_hint and reassign actions point operators
at the right next step.
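The escalation ladder can be written as a small pure function. This is a sketch: `stranded_severity` is a hypothetical name, and treating the 2x and 6x boundaries as inclusive is an assumption.

```python
def stranded_severity(age_seconds, threshold_seconds=30 * 60):
    """Map time spent unclaimed in 'ready' to a severity, or None if fresh."""
    if age_seconds <= threshold_seconds:
        return None            # rule fires only when age > threshold
    if age_seconds >= 6 * threshold_seconds:
        return "critical"
    if age_seconds >= 2 * threshold_seconds:
        return "error"
    return "warning"
```

With the default 30-minute threshold, a task stranded for 2 hours sits at 4x the threshold and maps to error.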
Out of scope deliberately:
- Lane registry (#20157 closed) — this signal supersedes it.
- Pushing the diagnostic into messaging gateways — diagnostics
are pull-only via 'hermes kanban diagnostics' for now; gateway
push is a separate UX decision.
Tests: 10 new + 461 existing kanban tests pass. Verified end-to-end
via 'hermes kanban diagnostics --json' against a 2h-old stranded
task: it surfaces as error severity with correct actions.
In live tests on gemini-3-flash-preview, the judge kept returning empty
or non-JSON content, tripping the consecutive-parse-failures
auto-pause. Free-form JSON output depends on the model cooperating;
tool-call schemas are enforced server-side by virtually every modern
provider.
Two new tools the judge calls:
- submit_checklist(items) — Phase A, decompose
- update_checklist(updates, new_items, reason) — Phase B, evaluate
Both phases now call the auxiliary client with tool_choice forcing
the right tool. read_file remains for Phase B history inspection,
with the loop exiting only when update_checklist is called or the
read budget is exhausted (at which point read_file is dropped from
the toolbox and update_checklist is forced).
Robustness:
- _call_judge_with_tool_choice falls back tool_choice forced→required→
auto if the provider rejects a particular shape.
- If a fully-broken provider still returns content instead of a tool
call, the legacy JSON-text parsers stay around as a last-ditch
backstop so we never silently lose a checklist.
- _normalize_update_args replaces the JSON parser for the apply
layer; same 1-based→0-based conversion + terminal-status filter.
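The forced→required→auto fallback might be structured as below. This is a sketch: `call` is a hypothetical callable wrapping the auxiliary client, and `ValueError` stands in for whatever error a provider raises when it rejects a `tool_choice` shape. The forced shape follows the OpenAI-style chat completions format.

```python
def call_with_tool_choice_fallback(call, tool_name):
    """Try progressively weaker tool_choice shapes until one is accepted."""
    shapes = [
        {"type": "function", "function": {"name": tool_name}},  # forced
        "required",
        "auto",
    ]
    last_error = None
    for shape in shapes:
        try:
            return call(tool_choice=shape)
        except ValueError as exc:  # stand-in for a provider rejection
            last_error = exc
    raise last_error


def fake_provider(tool_choice):
    # Hypothetical provider that rejects the forced dict shape.
    if isinstance(tool_choice, dict):
        raise ValueError("named tool_choice not supported")
    return {"used": tool_choice}


response = call_with_tool_choice_fallback(fake_provider, "submit_checklist")
```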
Live verification: same fizzbuzz goal that was hitting 'judge model
returned unparseable output 3 turns in a row' before now terminates
in 2 turns, all 11 items marked completed with item-specific
evidence, no auto-pause. Agent log shows
'produced 11 checklist items via tool call' instead of the JSON-
parse path.
Tests: 7 new cases for the tool-call path (Phase A success, Phase B
update only, Phase B read_file→update, JSON-content backstop,
empty-text item dropping, non-terminal status filter).
When run_agent's _compress_context fires mid-turn it ends the parent
session in SessionDB and creates a new continuation session with a
fresh session_id. The /goal state is keyed on session_id in
state_meta ("goal:<sid>"), so without forwarding the goal silently
disappears: _get_goal_manager() rebinds for the new session_id,
load_goal() returns None, mgr.is_active() is False, and the
continuation loop dies with no user-visible signal.
Fix: in the same SessionDB transaction block that creates the
continuation session, copy state_meta[goal:<old>] →
state_meta[goal:<new>] when present. No-op when the user has no
active goal. Logged at INFO so a stuck loop is debuggable.
Tests cover the round-trip via SessionDB and the no-op path.
Affects all three run-conversation surfaces (CLI, gateway, TUI
gateway) because _compress_context is the single rotation site.
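The forwarding step might look roughly like this. The sketch assumes a state_meta table with (key, value) columns and uses a hypothetical `forward_goal` helper; the real SessionDB schema and method names may differ.

```python
import sqlite3


def forward_goal(conn: sqlite3.Connection, old_sid: str, new_sid: str) -> bool:
    """Copy goal:<old_sid> to goal:<new_sid>; return False when no goal exists."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT value FROM state_meta WHERE key = ?", (f"goal:{old_sid}",)
        ).fetchone()
        if row is None:
            return False  # no active goal, nothing to forward
        conn.execute(
            "INSERT OR REPLACE INTO state_meta (key, value) VALUES (?, ?)",
            (f"goal:{new_sid}", row[0]),
        )
    return True
```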
* feat(goals): /goal checklist + /subgoal user controls
Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.
Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
more items not fewer. Falls through to legacy freeform judge when
decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
A bounded read_file tool (max 5 calls per turn, restricted to that
one file) lets the judge inspect history when the snippet is
ambiguous. Stickiness in code: terminal items are frozen, only the
user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
reverts to old prompt when empty.
- Status line shows M/N done counts.
CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.
Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.
* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning
Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:
1. Off-by-one bug — the judge sees the checklist rendered with 1-based
indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
state.checklist as 0-based. Result: every judge update landed on
the wrong item, evidence got attached to neighbouring rows, and
the genuine 'first pending' item (usually #1) never got marked.
Fix: convert 1-based → 0-based in _parse_evaluate_response. Also
tightened the user prompt to call out the 1-based scheme explicitly.
New tests cover the parser conversion + an end-to-end fake-judge
round-trip.
2. Conversation dump never happened — _extract_agent_messages tried
common AIAgent attribute names (.messages, .conversation_history,
etc.) but AIAgent doesn't expose the message list as an instance
attribute; it lives inside run_conversation()'s scope. Result: the
judge's read_file tool always saw history_path=unavailable. Fix:
added an explicit messages= kwarg to evaluate_after_turn that all
three call sites (CLI, gateway, TUI gateway) now pass directly.
Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH,
default to leaving items pending' wording made the judge refuse
to mark 'file exists' completed even after the agent ran ls,
test -f, os.path.isfile, and find — burning the entire 8-turn
budget on a fizzbuzz task. Softened to 'strict but not absurd'
with explicit guidance on what counts as evidence and a directive
not to require re-proving items already established earlier.
Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
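For finding 1, the index conversion can be illustrated as below. This is a sketch, not the code in _parse_evaluate_response: the `index` key, the helper name, and the choice to silently drop out-of-range entries are all assumptions.

```python
from typing import Any, Dict, List


def to_zero_based(updates: List[Dict[str, Any]], checklist_len: int) -> List[Dict[str, Any]]:
    """Convert judge-facing 1-based indices to 0-based list positions."""
    converted = []
    for update in updates:
        idx = update.get("index")
        if isinstance(idx, bool) or not isinstance(idx, int):
            continue  # malformed index, skip
        position = idx - 1  # the judge sees "1. [ ] foo", the list starts at 0
        if 0 <= position < checklist_len:
            converted.append({**update, "index": position})
    return converted
```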
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.
Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.
Fixes #22923
Workers running slow models (e.g. kimi-k2.6) can spend longer than
DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making
no tool calls and therefore not heartbeating. release_stale_claims
previously reclaimed these healthy workers, producing the
spawn-then-immediately-reclaim loop reported in #23025.
When a stale-by-TTL claim's host-local worker PID is still alive,
extend the claim (emit a claim_extended event) rather than killing
it. enforce_max_runtime / detect_crashed_workers remain the upper
bounds for genuinely wedged or dead workers. Reclaim events now also
record claim_expires, last_heartbeat_at, worker_pid, and host_local
so operators can see why a worker was killed.
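A rough sketch of the extend-versus-reclaim decision, assuming a POSIX host (signal 0 is a pure existence probe there; on Windows `os.kill(pid, 0)` behaves differently). `pid_alive` and `decide_stale_claim` are hypothetical names, not the functions in release_stale_claims.

```python
import os
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Return True if a host-local process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def decide_stale_claim(claim_expires: float, now: float,
                       worker_pid: Optional[int], host_local: bool) -> str:
    """Return "keep", "extend", or "reclaim" for a single claim."""
    if now < claim_expires:
        return "keep"  # not stale yet
    if host_local and worker_pid is not None and pid_alive(worker_pid):
        return "extend"  # healthy but quiet worker, e.g. a long LLM call
    return "reclaim"
```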
Follow-up to the previous commit's notifier behavior change. Two test fixes:
1. `tests/gateway/test_kanban_notifier.py` gains
`test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new
contract directly: a task that crashes, gets reclaimed, and crashes
again notifies the user BOTH times. Before #21398 the second crash
silently dropped because the subscription was already deleted.
2. `tests/hermes_cli/test_kanban_notify.py::
test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]`
is flipped. Those tests were added in the salvage of #22941 and
asserted the OLD behavior (subscription deleted after gave_up /
crashed / timed_out). They're now obsolete — the new contract is
"subscription survives a non-final terminal event so retries reach
the user." Updated docstring + asserts; the cursor-advance check is
added to confirm the dedup mechanism still works.
The `test_notifier_unsubs_after_completed_event` test stays untouched
because `completed` IS still a terminal event that triggers unsub
(the task hits `done` status, which is handled by the `task_terminal`
branch in the notifier loop).
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.
State machine on sessions table replaces the boolean flag:
None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.
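The transitions and the atomic claim can be sketched as follows. The column name `handoff_state` and the table layout are assumptions made for illustration; only the state names and the handoff_error column come from the description above.

```python
import sqlite3
from typing import Optional

# Allowed transitions: None -> pending -> running -> completed | failed
ALLOWED = {
    None: {"pending"},
    "pending": {"running"},
    "running": {"completed", "failed"},
}


def can_transition(current: Optional[str], new: str) -> bool:
    """Return True when current -> new is a legal handoff transition."""
    return new in ALLOWED.get(current, set())


def claim_handoff(conn: sqlite3.Connection, session_id: str) -> bool:
    """Flip pending -> running in one UPDATE so only one claimer can win."""
    with conn:
        cur = conn.execute(
            "UPDATE sessions SET handoff_state = 'running' "
            "WHERE id = ? AND handoff_state = 'pending'",
            (session_id,),
        )
    return cur.rowcount == 1
```

Guarding the UPDATE with the expected prior state is what makes the claim safe when more than one watcher polls the same database.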
CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.
Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:
1. Resolves the platform's home channel.
2. Asks the adapter for a fresh thread via the new
create_handoff_thread(parent_chat_id, name) capability so the
handed-off conversation gets its own scrollback. Adapters that
don't support threads (or fail) return None and the watcher
falls back to the home channel directly.
3. Constructs a SessionSource keyed as 'thread' when a thread was
created, 'dm' otherwise, then session_store.switch_session
re-binds the destination key to the CLI session_id. The full
role-aware transcript replays via load_transcript on the next
turn (no flat-text injection into context_prompt).
4. Forges a synthetic MessageEvent(internal=True) with the handoff
notice and dispatches through _handle_message; the agent runs
against the loaded transcript and adapter.send delivers the
reply.
5. Marks the row 'completed' on success, 'failed' (+error) on any
exception.
Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:
- Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
so DM topics (Bot API 9.4+) and forum supergroups both work.
- Discord (gateway/platforms/discord.py): parent.create_thread on
text channels with a seed-message + message.create_thread
fallback for permission edge cases. Skips DMs and other
non-thread-capable parents.
- Slack (gateway/platforms/slack.py): posts a seed message and
returns its ts as the thread anchor — Slack threads are
message-anchored.
In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.
CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).
Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.
Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
Adds /handoff <platform> CLI command that queues the current session for
resume on the configured home channel of any messaging platform.
CLI side:
- /handoff telegram — marks session in shared DB, sends summary to
the Telegram home channel via send_message
- /handoff discord — same for Discord
- Supports telegram, discord, slack, whatsapp, signal, matrix
Gateway side:
- On new session creation, checks for pending handoffs for the
incoming message's platform
- If found, loads the CLI session's full conversation history and
injects it into the context prompt as a handoff transcript
- Agent continues the conversation seamlessly
Files:
- hermes_state.py: handoff_pending, handoff_platform columns + helpers
- cli.py: _handle_handoff_command dispatch + handler
- hermes_cli/commands.py: CommandDef entry
- gateway/run.py: handoff detection in _handle_message_with_agent
- tests/hermes_cli/test_session_handoff.py: 8 tests
Follow-up to the previous commit's toolset-vs-skill validation.
The contributor's fix raises ValueError on the first toolset name found
in the skills list. That works for one mistake, but agents that confuse
skills with toolsets usually pass several at once
(`skills=["web", "browser", "terminal"]`) — and serial-correcting one
per failure round-trip wastes tokens. Collect all toolset-shaped
entries first, then raise once with the full list.
The error message is also slightly clearer:
'web', 'browser', 'terminal' are toolset names, not skill name(s).
Put toolsets in the assignee profile's `toolsets:` config instead of
per-task skills. Skills are named skill bundles (e.g. `kanban-worker`,
`blogwatcher`); toolsets are runtime capabilities (e.g. `web`,
`browser`, `terminal`).
vs. the previous "the assignee profile's toolsets" — explicitly naming
the YAML key (`toolsets:`) and giving concrete examples in both
categories closes the conceptual gap that produced the bug to begin
with.
Adds one regression test (test_create_task_skills_lists_all_toolset_typos)
covering the multi-name aggregation path. The single-typo test from
the original PR still passes (the loose `match="toolset name"` matches
both singular and plural forms).
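The collect-then-raise pattern, sketched under assumptions: `KNOWN_TOOLSETS` is an illustrative subset and `validate_skills` is a hypothetical name, not the validator in the kanban code.

```python
from typing import Iterable, List

KNOWN_TOOLSETS = {"web", "browser", "terminal"}  # illustrative subset only


def validate_skills(skills: Iterable[str]) -> List[str]:
    """Reject every toolset-shaped entry in one error instead of one per call."""
    skills = list(skills)
    offenders = [name for name in skills if name in KNOWN_TOOLSETS]
    if offenders:
        quoted = ", ".join(repr(name) for name in offenders)
        raise ValueError(
            f"{quoted} are toolset names, not skill name(s). Put toolsets in "
            "the assignee profile's `toolsets:` config instead of per-task skills."
        )
    return skills
```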
Follow-up to the previous commit's safe-int task_age fix.
The original PR shipped without test coverage. This commit adds:
- test_safe_int_accepts_int_and_int_string — sanity for the well-typed
path so the helper itself can't quietly start swallowing valid values.
- test_safe_int_returns_none_on_corrupt_inputs — the failure modes
(None, '%s', 'abc', '', '1.5', random objects). Covers both the
ValueError and TypeError catch branches.
- test_task_age_handles_corrupt_created_at — the headline regression:
a task with created_at='%s' used to raise ValueError and turn
GET /api/plugins/kanban/board into a 500.
- test_task_age_handles_corrupt_started_and_completed — confirms the
safe-int treatment is consistent across all three timestamp fields.
- test_task_age_well_formed_task — regression that the safe path
doesn't change observable output for normal data.
- test_task_dict_survives_corrupt_created_at — defense in depth.
Writes a corrupt row directly via SQL, reads it back through the
ORM, and confirms task_age + the surrounding plugin_api guard
degrade gracefully instead of crashing.
Also adds the AUTHOR_MAP entry for the contributor's GitHub-noreply
email so release notes credit @baocin (the commit was authored locally
as `aoi <aoi@hino.local>` — re-attributed during salvage to the
github noreply form).
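For reference, a helper with the behavior these tests describe could be as small as the sketch below. The name `safe_int` is illustrative; the real helper's name and signature are not shown in this commit message.

```python
from typing import Any, Optional


def safe_int(value: Any) -> Optional[int]:
    """Return int(value) for ints and int-like strings, None for anything else."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # ValueError: '%s', 'abc', '', '1.5'; TypeError: None, arbitrary objects
        return None
```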
Follow-up to the previous commit's contributor cherry-pick.
The cherry-picked change replaced the bare ``["hermes", ...]`` spawn with
``[sys.executable, "-m", "hermes", ...]``. The intent was right (avoid
PATH dependence — cron, systemd User= services, launchd jobs, and other
detached dispatcher invocations routinely run with a stripped $PATH that
doesn't include the venv's bin/, breaking the bare-shim spawn) but the
module name is wrong: there is no top-level ``hermes`` package. The
console-script entry point in pyproject.toml is
``hermes = "hermes_cli.main:main"``, and ``python -m hermes`` fails with
``No module named hermes``. The cherry-picked form would have replaced a
sometimes-broken spawn with an always-broken one.
This commit:
- Adds ``_resolve_hermes_argv()``, mirroring ``gateway.run._resolve_hermes_bin``.
Tries ``shutil.which("hermes")`` first (preferred — keeps existing ``ps``
output and log lines familiar in the common case) and falls back to
``[sys.executable, "-m", "hermes_cli.main"]`` when the shim is not on
PATH. The fallback goes through the running interpreter so it's
PATH-independent. Kept as a local helper rather than imported from
gateway because ``hermes_cli`` sits below ``gateway`` in the dependency
order.
- Switches the dispatcher's ``cmd`` list to use ``*_resolve_hermes_argv()``.
- Adds three regression tests:
* ``test_resolve_hermes_argv_prefers_path_shim`` — pins the PATH-first
branch so a future refactor doesn't silently flip the order.
* ``test_resolve_hermes_argv_falls_back_to_module_form_when_no_path_shim`` —
pins the correct module name (``hermes_cli.main``, NOT ``hermes``).
Direct regression guard for the form that shipped in the original PR.
* ``test_resolve_hermes_argv_module_actually_runs`` — runs the fallback
invocation as a real subprocess and asserts ``--version`` works, so
losing ``hermes_cli.main``'s ``__main__`` handling can't slip past the
string-match test.
Verified end-to-end: with the shim on PATH the resolver returns
``[/.../hermes]`` and ``--version`` works; with the shim removed the
resolver returns ``[python, -m, hermes_cli.main]`` and ``--version``
still works; the original PR's ``python -m hermes`` invocation fails as
expected (``No module named hermes``).
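The resolver described above amounts to something like this sketch. The injectable `which` parameter is an addition for testability and is not claimed to be part of the real ``_resolve_hermes_argv()``.

```python
import shutil
import sys
from typing import Callable, List, Optional


def resolve_hermes_argv(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Prefer the PATH shim; otherwise go through the running interpreter."""
    shim = which("hermes")
    if shim:
        return [shim]
    # PATH-independent fallback; note the module is hermes_cli.main, not hermes
    return [sys.executable, "-m", "hermes_cli.main"]
```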
Follow-up to the previous commit's middleware fix.
- plugins/kanban/dashboard/plugin_api.py: rewrite the "Security note"
docstring. The previous text said "/api/plugins/ is unauthenticated by
design" — that's now actively wrong and dangerously misleading. New
text explains that plugin routes flow through the same session-token
middleware as core API routes and that --host 0.0.0.0 is safe to use
on a LAN as a result.
- tests/hermes_cli/test_web_server.py: extend TestPluginAPIAuth to cover
the surfaces the original PR didn't pin:
* test_plugin_route_allows_auth now exercises a real plugin path
(/api/plugins/example/hello) instead of accepting 200 OR 404 from
a maybe-loaded kanban plugin — the assertion was effectively vacuous.
* test_plugin_patch_requires_auth + test_plugin_delete_requires_auth
cover non-GET mutation methods in case a future regression
whitelists them by accident.
* test_non_kanban_plugin_route_requires_auth proves the fix is
plugin-agnostic, not kanban-specific (hits hermes-achievements +
a non-existent plugin namespace; both 401 before route resolution).
* test_plugin_websocket_unaffected_by_http_middleware locks in that
the HTTP middleware change didn't accidentally start gating WS
upgrades — kanban /events still uses its own ?token= check.
Plus a cosmetic blank-line cleanup.
Remove the blanket /api/plugins/* exemption from auth_middleware so
plugin API routes (e.g. Kanban dashboard) require the same session
token as all other /api/ endpoints.
Fixes #19533
Two follow-ups from self-review:
1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The
primary resolution path for Spark goes through provider='openai-codex'
→ _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future
code path resolves Spark's context with a different provider (custom
proxy, generic fallthrough), the longest-substring-first lookup in
step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x.
Adding the explicit override is a cheap defensive correctness fix
matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic
gpt-5 entry.
2. Update test_openai_codex_model_validation_fallback.py docstring. The
bug it was originally written for (gpt-5.3-codex-spark missing from
listing) is now resolved by this PR's catalog restoration. The test
still validly exercises the soft-accept code path for any future
entitlement-gated Codex slug that ships before Hermes catalogs it,
but the framing was stale — clarified.
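Why the explicit entry in item 1 matters can be seen in a toy version of a longest-substring-first lookup. The table contents and the exact integers (128k and 400k written as 128_000 and 400_000) are assumptions for illustration, and `lookup_context_length` is a hypothetical stand-in for the real resolution step.

```python
from typing import Dict, Optional

CONTEXT_LENGTHS: Dict[str, int] = {
    "gpt-5": 400_000,                # generic entry
    "gpt-5.3-codex-spark": 128_000,  # explicit override shadows the generic one
}


def lookup_context_length(model: str, table: Dict[str, int]) -> Optional[int]:
    """Return the value of the longest key that is a substring of the model name."""
    for key in sorted(table, key=len, reverse=True):
        if key in model:
            return table[key]
    return None
```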
Two follow-ups from self-review:
1. Add unit test for _fetch_models_from_api covering the live HTTP path.
The salvaged PR #19530 dropped the supported_in_api:false filter in
both _fetch_models_from_api and _read_cache_models, but only the
cache path had a regression test. This adds the symmetric live-fetch
test (mocked httpx) so a future drive-by change to the HTTP path
can't silently re-introduce the filter.
2. Pin test_codex_picker_uses_live_codex_catalog to the cache fallback.
The test wrote a fake JWT and a CODEX_HOME cache, but provider_model_ids
('openai-codex') still issued a real 10s HTTP probe to
chatgpt.com/backend-api/codex/models before falling back to the cache.
That made the test slow and non-deterministic in restricted/CI
networks. Patch _fetch_models_from_api to return [] so we go straight
to the cache path the test actually means to exercise.
Closes #21794.
`/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h`
all returned broken output to the gateway and interactive CLI. Three
underlying bugs in `hermes_cli.kanban.run_slash`:
1. argparse writes help to **stdout** but `run_slash` only captured
stderr at parse time, so `-h` text was silently swallowed and
replaced with the `(usage error: 0)` sentinel.
2. The wrapping parser used `prog="/"` and routed via a synthetic
"_top → kanban" subparser, producing `usage: / kanban …` (stray
space) and `usage: /kanban kanban …` (doubled token) in error text.
3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB
usage tree, which reads as visual garbage in a chat bubble.
Fix: drive the kanban_parser directly (no double-wrap), rewrite prog
strings on every leaf subparser, capture stdout AND stderr around
parse_args, distinguish SystemExit(0) (help — return captured stdout)
from SystemExit(2) (error — return single-line ⚠-prefixed message),
and add an explicit chat-friendly short-help block returned for bare
invocation and the help aliases (`help`, `--help`, `-h`, `?`).
Added 5 regression tests covering bare invocation, every help alias,
subcommand help, unknown action, and missing required arg.
Affects every chat platform via gateway/run.py::_handle_kanban_command
and the interactive CLI via cli.py::_handle_kanban_command.
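The capture logic can be sketched like this. `parse_for_chat` is a hypothetical name and the return shape is an assumption; the argparse behavior it relies on (help goes to stdout and exits 0, errors go to stderr and exit 2) is standard.

```python
import argparse
import contextlib
import io
from typing import List, Optional, Tuple


def parse_for_chat(
    parser: argparse.ArgumentParser, argv: List[str]
) -> Tuple[Optional[argparse.Namespace], str]:
    """Return (args, "") on success, or (None, text) for help and errors."""
    out, err = io.StringIO(), io.StringIO()
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            args = parser.parse_args(argv)
    except SystemExit as exc:
        if exc.code in (0, None):
            return None, out.getvalue()  # -h / --help: hand back the help text
        lines = err.getvalue().strip().splitlines()
        # keep only the final "prog: error: ..." line for a chat bubble
        return None, "⚠ " + (lines[-1] if lines else "usage error")
    return args, ""
```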
Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com>
Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s
`_tick_once_for_board` called `_kb.connect(board=slug)` immediately
followed by `_kb.init_db(board=slug)`. Since `connect()` already runs
the schema + idempotent migration on first open per process, the
explicit `init_db()` was redundant — and worse, `init_db()` deliberately
busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration
on a *second* connection that races the first.
On every cold gateway start against a legacy DB this surfaced as either
`sqlite3.OperationalError: duplicate column name: <col>` or intermittent
`database is locked` errors logged at the first tick. The duplicate-column
case is now tolerated by `_add_column_if_missing` (commit 78698381a), but
the wasted second migration plus the database-is-locked race remain
fixable by skipping the redundant call entirely.
Drops `_kb.init_db(board=slug)` at both call sites and adds a regression
test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence
via source inspection plus a runtime spy.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Adds test_notifier_second_blocked_delivers to cover the case where a
task is blocked, unblocked, then blocked again — the second blocked
event must still deliver a gateway notification.
Currently fails because blocked is treated as a terminal event kind,
causing the subscription to be dropped after the first block.
* feat(curator): show rename map (where skills went) in user-visible summary
The full data has always been on disk in REPORT.md, but the user-visible
curator summary (gateway 💾 line, CLI session-start panel,
`hermes curator status`) was counts-only — "consolidated 4 into 2
umbrellas" with no names. Users only discovered renames when something
they expected was gone.
New `_build_rename_summary()` formats the rename map and appends it to
`final_summary`:
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
Empty on no-op ticks (no archives), so most ticks add zero log noise.
Cap of 10 entries keeps agent.log readable when a 50-skill
consolidation lands; the full list is always in REPORT.md.
`hermes curator status` indents continuation lines so the multi-line
summary reads as one logical field.
5 new tests in tests/agent/test_curator_classification.py covering
empty / consolidation / pruning / cap / mixed cases.
* feat(curator): show recent run summary once on `hermes update`
The rename map is now visible from where users actually look — the
update flow they explicitly run, instead of just the live gateway log
or transient CLI session-start panel.
Behavior:
- After `hermes update`, if the most recent curator run produced a
rename map (multi-line summary) that the user hasn't seen yet, print
it once with a 'last run Xh ago' header and a one-time-message
footer.
- Stamp `last_run_summary_shown_at = last_run_at` after printing so
subsequent `hermes update` invocations are silent until a newer
curator run lands.
- Silent on no-op runs (single-line summary like 'auto: no changes;
llm: no change'). Still stamps shown so we don't reconsider on
every update.
- Silent when the curator has never run (the existing first-run
notice handles that case).
Output:
ℹ Skill curator — last run 4h ago
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
(This message shows once per curator run. View anytime: hermes curator status)
State migration:
- `_default_state()` gains `last_run_summary_shown_at: None`. Existing
state files lack the field; `.get()` returns None; the comparison
treats any prior run as 'not yet shown' and prints once on next
update. Self-healing.
Wiring:
- Both `hermes update` paths in main.py call the new
`_print_curator_recent_run_notice()` right after the existing
first-run notice. Best-effort try/except so a state-load bug
never breaks the update flow.
6 tests in tests/hermes_cli/test_curator_recent_run_notice.py:
no-run / single-line / multi-line / show-once / new-run-resets /
time-formatter buckets.
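The show-once rule reduces to a comparison between two state fields. A sketch, assuming the summary text lives under a hypothetical `last_run_summary` key and using an illustrative function name; `last_run_at` and `last_run_summary_shown_at` are the fields named above.

```python
from typing import Any, Dict, Optional


def recent_run_notice(state: Dict[str, Any]) -> Optional[str]:
    """Return the summary to print, or None; stamps the state as a side effect."""
    last_run_at = state.get("last_run_at")
    if last_run_at is None:
        return None  # curator has never run; the first-run notice covers this
    if state.get("last_run_summary_shown_at") == last_run_at:
        return None  # this run was already shown (or already skipped)
    state["last_run_summary_shown_at"] = last_run_at  # stamp even for no-op runs
    summary = (state.get("last_run_summary") or "").strip()
    if "\n" not in summary:
        return None  # single-line summary means a no-op run, stay silent
    return summary
```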
Follow-up test fix for #22693 — the existing test for ps-failure +
pid-file fallback needed the /proc walk path stubbed too since /proc
is now consulted first.
Salvage of NousResearch/hermes-agent#7622.
Docker images often lack procps so `ps` is unavailable. Try reading
/proc/*/cmdline first (works in any Linux container) and fall back to
`ps -A eww` only when /proc is not present. PermissionError on
individual PIDs is silently skipped.
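A sketch of the /proc walk, assuming Linux semantics (NUL-separated argv in /proc/<pid>/cmdline). `iter_proc_cmdlines` and its `proc_root` parameter are hypothetical; the parameter exists only so the walk can be pointed at a fake tree in tests.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_proc_cmdlines(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, command line) pairs; yields nothing when /proc is absent."""
    root = Path(proc_root)
    if not root.is_dir():
        return  # caller is expected to fall back to `ps`
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/self
        try:
            raw = (entry / "cmdline").read_bytes()
        except (PermissionError, FileNotFoundError, ProcessLookupError):
            continue  # unreadable or already-exited process
        if raw:
            text = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
            yield int(entry.name), text
```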
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
run_gateway() calls refresh_systemd_unit_if_needed() on every invocation
so restart settings stay current after exit-code-75 respawns. The
user-scope unit path resolves under Path.home() (NOT sandboxed by
conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the
current HERMES_HOME into the unit's Environment= line.
Result: any test that exercises run_gateway() end-to-end on a real
Linux dev box silently rewrites the developer's installed
~/.config/systemd/user/hermes-gateway.service with a polluted
HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the
next reboot, systemd loads that unit, the gateway starts looking at an
empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging
platforms enabled' even though the user's real config is fine. Three
tests in tests/hermes_cli/test_gateway.py hit this path:
test_run_gateway_exits_cleanly_on_keyboard_interrupt,
test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and
test_run_gateway_root_guard_has_escape_hatch.
Two-layer fix:
1. _install_fake_gateway_run helper (covers all four run_gateway() call
sites in test_gateway.py and any future ones) now also stubs
supports_systemd_services and refresh_systemd_unit_if_needed.
2. refresh_systemd_unit_if_needed() itself sniffs the generated unit
body for /pytest-of- and /hermes_test markers and refuses to write
when present. Defense in depth so a future test that bypasses the
helper still can't corrupt the dev's gateway. Tests that legitimately
exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot)
patch generate_systemd_unit to return synthetic content that doesn't
carry those markers, so they keep working.
Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a
regression test for the source-side guard.
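The source-side guard in layer 2 is essentially a substring check on the generated unit body. A minimal sketch with hypothetical names; only the two marker strings come from the description above.

```python
PYTEST_MARKERS = ("/pytest-of-", "/hermes_test")


def looks_like_test_unit(unit_body: str) -> bool:
    """Return True when the unit text carries a pytest tmpdir marker."""
    return any(marker in unit_body for marker in PYTEST_MARKERS)


def should_write_unit(unit_body: str) -> bool:
    """Refuse to write a unit that would bake a test HERMES_HOME into it."""
    return not looks_like_test_unit(unit_body)
```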