Three state-loss bugs at the compression rotation boundary, fixed together
because they all live in the same ~80-line rotation block:
- #33618: a persistent /goal did not follow the rotation. load_goal does a
flat per-session lookup with no lineage walk, so a goal silently died when
compression minted a fresh child id. Added migrate_goal_to_session() and
call it after the child session is created (move-not-copy: the parent row
is archived as cleared so exactly one active goal row exists).
- #33906/#33907: if the child create_session raised (FK constraint,
contended write), the outer handler only warned and let the agent continue
on the NEW id — which has no row in state.db — producing an orphan session.
Now the rotation rolls agent.session_id back to the still-indexed parent
(reopening it) instead of stranding the conversation on a phantom id.
- #27633: the compaction-boundary on_session_start notification omitted the
platform kwarg, so context-engine plugins saw source=unknown for every
message after the boundary. Forward platform (matching the initial
session-start call in agent_init.py).
Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com>
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
On Windows, hermes writes writer.bat (@echo off / hermes -p writer %*)
with CRLF endings instead of the POSIX writer shell script. The test
hardcoded the POSIX path and exact bytes, so it failed on Windows hosts.
Assert on stripped non-empty lines per platform, making it line-ending-
and OS-independent.
Follow-up to the salvaged worktree-materialization fix. When a worktree
task has no explicit workspace_path, resolve the anchor from the board's
default_workdir (a git repo) and materialize <repo>/.worktrees/<id> per
task, instead of silently rooting under the dispatcher's CWD (whatever
directory launched the gateway, e.g. the Hermes checkout). If no
default_workdir is configured, raise with a clear message rather than
guessing from CWD.
Adds AUTHOR_MAP entry for the salvaged commit.
The dispatcher treated workspace_kind=worktree as metadata only and never
ran 'git worktree add', so every worktree task ran in the main repo checkout
instead of an isolated worktree — concurrent tasks silently shared one tree
and contaminated each other.
This materializes a real linked worktree at <repo>/.worktrees/<task_id> on
branch wt/<task_id> when resolve_workspace() handles a worktree task, treats a
repo-root workspace_path as shorthand for that location, persists the derived
workspace/branch back onto the task row, and — on rerun/redispatch — detects an
already-materialized linked worktree (via git-common-dir) and reuses it instead
of nesting a second .worktrees/<id> inside it.
A nous inference_base_url that fails the host allowlist (e.g. a stale
stg-inference-api.nousresearch.com persisted before the allowlist
existed) was only replaced 'if refreshed_url:' — so when the validator
rejected the URL it left the poisoned value in place. The 'falling back
to default' warning fired but never took effect: every subsequent call,
including the auxiliary compression call, kept hitting the dead staging
endpoint and 401'd.
Reset to DEFAULT_NOUS_INFERENCE_URL when validation returns None at both
refresh sites in resolve_nous_runtime_credentials, so a poisoned
auth.json self-heals on the next refresh. The proxy adapter already did
this correctly; this brings the two auth.py sites in line.
* feat(setup): Blank Slate setup mode — minimal agent, opt in to everything
Adds a third first-time setup option alongside Quick Setup and Full Setup.
Blank Slate forces ON only what an agent needs to run — provider & model,
the File Operations toolset, and the Terminal toolset — and turns
everything else OFF, then walks the user through opting each capability
back in.
What it does:
- platform_toolsets.cli = [file, terminal] (explicit, authoritative list)
- agent.disabled_toolsets = every other known toolset (web, browser,
code_execution, vision, memory, delegation, cronjob, skills, image_gen,
kanban, …). Applied last in the resolver, so it overrides the
non-configurable platform-toolset recovery that would otherwise re-add
toolsets like kanban — guaranteeing a true blank slate.
- Optional config features off: compression, memory + user-profile capture,
checkpoints, smart model routing, auto session reset.
- Bundled skills default to NONE (reuses the .no-bundled-skills marker);
offers to seed the full catalog.
- Walks through tools / plugins / MCP / messaging, all opt-in.
Proven end-to-end: with the Blank Slate config, model_tools.get_tool_definitions
emits exactly 6 schemas — patch, process, read_file, search_files, terminal,
write_file. Nothing else reaches the model.
Re-enable later via hermes tools / hermes skills opt-in --sync /
hermes setup agent.
Tests: tests/hermes_cli/test_setup_blank_slate.py (8 tests) pin the writers,
the resolver invariant ({file, terminal}), and the 6-schema end-to-end set.
Docs: getting-started/quickstart.md documents all three setup modes.
* feat(setup): Blank Slate fork — finish minimal, or walk through configs
After applying the minimal baseline (provider/model + file + terminal,
everything else off), Blank Slate now presents a choice instead of always
running the full walkthrough:
1. Start with everything disabled — finish now with the minimal agent.
2. Walk through all configurations — opt in to tools, skills, plugins, MCP,
and messaging.
Provider/model and terminal are still configured first either way (the agent
can't run without them). The finish-now path records the bundled-skill opt-out
so future `hermes update` runs don't re-inject skills. The walkthrough body
moved to a separate _blank_slate_walkthrough() helper.
Tests: TestBlankSlateFork covers both branches (finish-now applies baseline +
skill opt-out and skips the walkthrough; walkthrough path invokes it). Docs
updated to describe the fork.
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
Manual verification surfaced a second bypass class beyond the standalone
config loaders: several code paths bridge config.yaml values into os.environ
(HERMES_TIMEZONE, HERMES_REDACT_SECRETS, HERMES_MAX_ITERATIONS, TERMINAL_*,
network.force_ipv4, ...) by reading the raw user YAML, so the env the whole
process reads carried the USER's value even when an administrator pinned it —
e.g. a managed timezone was overridden because gateway/run.py wrote the user's
timezone into HERMES_TIMEZONE, and _resolve_timezone_name() checks the env var
first.
Wired the shared apply_managed_overlay() into every config→env bridge:
- gateway/run.py module-level startup bridge (timezone, redact_secrets,
max_turns, terminal, display, gateway.strict, ...)
- gateway/run.py _reload_runtime_env_preserving_config_authority (the per-turn
re-bridge that keeps config authoritative over reloaded .env — must keep
MANAGED authoritative on every turn, not just startup)
- hermes_cli/main.py early security.redact_secrets / network.force_ipv4 bridge
(runs before load_config is usable, at import time)
- hermes_cli/send_cmd.py top-level scalar config→env bridge
Verified end-to-end against a writable managed dir (12/12 checks incl. timezone,
logging, model, skin, gateway settings, write-guard) and in a clean process the
gateway per-turn bridge writes HERMES_TIMEZONE=<managed>. Adds an
order-independent regression test for the bridge overlay.
The skin bug was one instance of a class: several subsystems build their
config dict directly from config.yaml instead of routing through
hermes_cli.config.load_config (which carries the managed merge), so they
silently ignored administrator-pinned values. Audited every config.yaml
reader and fixed the behavioral-read bypasses:
- gateway/config.py load_gateway_config (messaging gateway: session_reset,
quick_commands, stt, model, ...)
- gateway/run.py _load_gateway_config (its read_raw_config fast path also
skipped the merge — read_raw_config returns raw user YAML)
- tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin,
reasoning_effort, service_tier, provider_routing)
- cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing)
- hermes_logging.py (logging.level/max_size_mb/backup_count)
- hermes_time.py (timezone)
- hermes_cli/doctor.py (memory-provider diagnostic reads effective config)
All route through a new shared managed_scope.apply_managed_overlay() helper
that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't
shadow a managed literal, root-model-string normalization, leaf-merge) and is
fail-open. cli.py's earlier inline fix is refactored onto the same helper.
Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile
distribution) are deliberately left reading raw user YAML — overlaying managed
values there would persist them into the user file. The dashboard
(web_server.py) already routes through load_config and needed no change.
TUI loader caches the RAW config so _save_cfg never writes managed values to
disk. Adds test_managed_scope_overlay.py (helper) and
test_managed_scope_loaders.py (per-surface integration); mutation-checked.
cli.py's load_cli_config() builds CLI_CONFIG independently of
hermes_cli.config._load_config_impl (it reads config.yaml directly and merges
into hardcoded defaults), so the Phase 2 managed merge never reached the
interactive CLI/TUI surface. Symptom: a managed display.skin (and any other
display/CLI pref read from CLI_CONFIG) was silently ignored by the TUI while
`hermes config`/`doctor`/write-guards — which go through load_config — correctly
honored it. Found via manual testing: the skin engine kept using 'default'.
Fix: overlay the managed config last in load_cli_config(), mirroring
_load_config_impl — expand against the process env only (so a user ${VAR} can't
shadow a managed literal), normalize the root model key so a managed
`model: x/y` string can't clobber the dict shape callers expect, then
leaf-merge. Fail-open so managed scope can never block CLI startup.
Adds tests/hermes_cli/test_managed_scope_cli_config.py locking that CLI_CONFIG
honors managed values, preserves user siblings, and is inert with no scope.
- show_config prints an administrator header naming the managed source and
lists the pinned config/env keys when a scope is active (silent otherwise).
- hermes doctor gains a managed_scope_check under Configuration Files that
reports the resolved managed dir + pinned key counts, and flags a
HERMES_MANAGED_DIR redirect (the documented foot-gun).
- set_config_value hard-rejects a managed config key (D2) and names the
source, exiting non-zero.
- save_env_value / remove_env_value refuse a managed env key.
- save_config strips managed leaves from a bulk write (mechanical safety net)
with a warning, so the unmanaged remainder still persists.
New _strip_dotted_keys helper drives the bulk-save pruning. All guards are
distinct from and layered after the existing is_managed() package-manager
write-lock.
load_hermes_dotenv now loads the managed-scope .env after user/project .env
and external secret sources, with override=True, so managed env values beat
the user .env and any pre-existing shell export. Reuses the existing dotenv
fallback + credential-sanitization path. Fail-open: no managed dir/.env is a
no-op and any error is swallowed so managed scope never blocks startup.
_load_config_impl now deep-merges the managed config.yaml on top of the
expanded user config so managed leaves win while sibling keys stay
user-controlled (leaf-level merge, D3). Managed values are expanded against
the process env only, never user-defined ${VAR}, so a user can't shadow a
managed literal. The managed file's (mtime,size) is folded into the load
cache key so editing it invalidates the cache. This inverts the usual
env-over-config precedence for pinned keys by design (see design doc §4.1).
New hermes_cli/managed_scope.py resolves a system-level managed directory
(HERMES_MANAGED_DIR override > /etc/hermes), parses managed config.yaml/.env
with fail-open semantics, and exposes is_key_managed/is_env_managed helpers.
The system default is ignored under pytest and HERMES_MANAGED_DIR is added to
the conftest env scrub so a real managed scope can't leak into the suite.
Not wired into the load paths yet (Phases 2-3).
release_stale_claims and detect_stale_running call _terminate_reclaimed_worker
and then release the task claim unconditionally, even when the termination did
not actually kill the worker. _terminate_reclaimed_worker already reports this
via its "terminated" flag, but the callers ignore it.
When a worker is parked in uninterruptible (D) state — for example throttled by
a cgroup memory.high limit — a pending SIGTERM/SIGKILL cannot be delivered until
the throttle lifts, so the kill is a no-op. The dispatcher then frees the claim
and spawns a fresh worker beside the still-alive one. Repeated every dispatch
tick this accumulates duplicate workers without bound, deepening the memory
pressure that caused the throttle in the first place — a self-reinforcing
runaway.
Fix: gate both automatic reclaim paths on _worker_survived_termination(). When
we attempted to kill our own host-local worker and it is still alive, defer the
reclaim (_defer_reclaim_for_live_worker extends the claim a short grace and
emits a reclaim_deferred event) instead of releasing. This guarantees at most
one live worker per task and is self-correcting: not spawning a duplicate is
what relieves the pressure so the pending signal lands and the worker dies, and
the next tick reclaims cleanly. Non-host-local claims and the operator-driven
reclaim_task() path keep their existing force-release behaviour.
Related: #41448 (concurrent dispatchers amplify this by doubling reclaim
frequency); #42858 (kill the worker rather than orphan it on archive).
Tests: defer-when-worker-survives, reclaim-when-killed,
release-when-not-host-local, and the detect_stale_running path.
PR #49056 set the default to 0, which reverts the #45592 idle-clock fix:
without a periodic invalidate, prompt_toolkit stops repainting the bottom
chrome during idle and the status bar goes stale/disappears after a turn.
Restore 1.0 as the default for everyone. The config knob stays — users on
emulators where the per-second redraw fights auto-scroll (#48309) can set
display.cli_refresh_interval: 0 to opt out.
Foundations for serving multiple profiles from one gateway process, inert
when off:
- gateway.multiplex_profiles config flag (default false), round-trips through
GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
resolve the namespace from the flag (legacy when off, active profile when on).
Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the
`hermes model` universe) must be configurable on a desktop Providers tab —
keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an
invariant against the live endpoints so the GUI can never silently drift from
the CLI again.
Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it
had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock
provider card. Anthropic is whitelisted as a legitimate dual-tab provider
(direct API key + subscription OAuth).
Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role
as the override base for _build_oauth_catalog().
/api/providers/oauth now unions the explicit hand-tuned OAuth cards
(_OAUTH_PROVIDER_CATALOG — bespoke flow/status/cli, plus the api-key Anthropic
PKCE card and synthetic claude-code row) with every accounts-tab provider in
provider_catalog(). Any OAuth/external provider in the `hermes model` universe
now appears automatically, closing the drift where google-gemini-cli and
copilot-acp had no Accounts card despite being CLI-configurable.
Adds read-only status cards for google-gemini-cli (via existing
get_gemini_oauth_auth_status) and copilot-acp (managed-by-CLI, like claude-code).
DELETE handler routes through the same _build_oauth_catalog() builder.
Parity test asserts the Accounts tab offers every accounts-tab catalog provider
as an invariant.
The Keys tab now surfaces every keys-tab provider in provider_catalog() (the
`hermes model` universe), synthesizing a card even when the env var has no hand
entry in OPTIONAL_ENV_VARS. Closes the drift where openai-api, kilocode, novita,
tencent-tokenhub, and copilot were CLI-configurable but invisible in the desktop
Providers → API keys tab.
Each provider row now carries backend-derived provider/provider_label grouping
hints so the desktop can group by the same provider identity the CLI picker
uses. Hand OPTIONAL_ENV_VARS prose still wins where present (enrichment, not a
gate). Shared non-provider credentials (e.g. tool-category GITHUB_TOKEN) are
explicitly not hijacked into a provider card — Copilot uses its provider-owned
COPILOT_GITHUB_TOKEN.
Adds hermes_cli/provider_catalog.py, deriving one descriptor per provider from
the CANONICAL_PROVIDERS universe (what `hermes model` renders, auto-extended
from provider plugins), joined with auth/env from PROVIDER_REGISTRY and display
metadata from ProviderProfile (with canonical/env fallbacks for the four
profile-less providers and the many profiles with blank display/signup fields).
Each descriptor is tagged with the desktop tab it belongs on (keys vs accounts)
by auth_type. This is the single source of truth the desktop Providers tabs will
derive membership from, so they can no longer drift from the CLI picker.
Tests assert the parity contract (catalog == hermes model universe) and tab
routing as invariants, not snapshots.
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).
Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):
/model <name> switch model (persists to config.yaml)
/model <name> --session switch for this session only
/model <name> --global switch and persist (explicit, unchanged)
The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
gui.log was registered in hermes_cli/logs.py::LOG_FILES (and surfaced by
`hermes logs gui`) but was never wired into `hermes debug share`. The share
report captured agent/errors/gateway/desktop tails plus full agent/gateway/
desktop logs — but nothing from gui.log, the surface the dashboard, TUI-over-
PTY bridge, and websocket layer (hermes_cli.web_server / pty_bridge /
tui_gateway) actually write to. A user reporting a dashboard or TUI bug shared
zero breadcrumbs from the broken surface.
Wire gui.log through all three share surfaces, matching the existing pattern:
- _capture_default_log_snapshots(): capture the gui snapshot (redacted like the rest)
- collect_debug_report(): add the gui.log summary tail block
- build_debug_share(): pull gui full_text, prepend dump header + redaction banner, add to the upload loop
- run_debug_share() --local branch: same, plus the local print block
- _PRIVACY_NOTICE: name gui.log in both bullets
Redaction is inherited for free — the gui snapshot goes through the same
_capture_log_snapshot(..., redact=redact) path, so secrets are scrubbed in
both the tail and full text (verified E2E: seeded key masked by default,
passes through under --no-redact, raw token never leaks).
Tests: seed gui.log in the fixture, add test_report_includes_gui_log, and bump
the upload-count tripwire 4->5 (test_share_uploads_five_pastes).
`hermes backup` walked every file under HERMES_HOME, excluding only
hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python
dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv
tool caches that live under HERMES_HOME were swept in file-by-file,
ballooning a backup to hundreds of thousands of entries that crawl for
hours — the reported "backup stuck for days / 426543 files" symptom.
Add the canonical regeneratable-dir names (.venv, venv, site-packages,
.tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring
agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's
exclusion set, used by both run_backup and the pre-update/pre-migration
_write_full_zip_backup. .archive is intentionally left in so the curator's
restorable archived skills still get backed up.
Tests cover each new dir name (excluded at any depth), that .archive and
cache-resembling files are kept, and an integration check that a planted
venv/site-packages/cache is pruned from the actual backup zip while
skills/config survive.
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
systemctl --user restart hermes-gateway run via the terminal tool is a
child of the gateway itself. When systemd delivers SIGTERM the gateway
kills this subprocess before it can complete, so the service may never
restart — reproducing issue #37453.
The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.
Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.
Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.
Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
Live-test finding: the Chronos fire webhook was only on the APIServerAdapter
(aiohttp), but hosted agents expose `hermes dashboard` (the FastAPI web_server
app on :9119) as their public URL — NOT the api_server adapter. So NAS's relay
callback to {callback_url}/api/cron/fire could never reach the verifier on a
hosted agent (the exact target environment). Two layers were wrong:
1. Wrong server: /api/cron/fire didn't exist on the dashboard app. Added
cron_fire_webhook there, alongside the existing /api/cron/* dashboard routes.
It resolves the job's profile (_find_cron_job_profile) and runs fire_due via
the resolved provider under the cron-profile retarget lock
(_fire_cron_job_for_profile, mirroring _call_cron_for_profile) so the CAS
claim + run_one_job operate on the right profile's jobs.json. Runs with no
live adapters (delivery falls back to the per-platform send path, like the
desktop cron path). 202 + background so a long turn never trips NAS's
timeout; the store CAS de-dupes a NAS retry. job-not-found -> 200 "gone".
2. Auth gate: the dashboard auth middleware 401s any non-cookie request before
the handler runs. Added /api/cron/fire to the shared PUBLIC_API_PATHS so the
NAS bearer-JWT callback reaches the verifier — the JWT (purpose=cron_fire),
not the cookie, is the real gate. One shared frozenset feeds both the
loopback and OAuth middlewares, so no drift.
Kept the APIServerAdapter route too (valid self-host api_server surface).
Contract doc updated to name the dashboard app as the hosted-agent callback
surface.
Tests: test_cron_fire_dashboard (6) — route registered on the dashboard app,
in PUBLIC_API_PATHS, 401 on bad token WITH the cookie gate engaged (proves it's
reachable past the gate + JWT is the gate), 400 missing job_id, 200 gone for
unknown job, 202 + fire_due invoked for the resolved profile on a valid token.
Full hermes_cli + cron + chronos + webhook suites green (7637).
Why the original tests missed it: the api_server webhook test built an
APIServerAdapter client directly and never asserted which server the hosted
public URL exposes — green-but-wrong-integration. The new test pins the route
to the dashboard app.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
* fix(desktop): show Hindsight memory provider
* feat(desktop): configure Hindsight memory provider
* fix(desktop): limit Hindsight modes to supported setup
* refactor(desktop): generic memory-provider config surface
Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.
- memory_providers.py: declarative registry. Each provider lists its
fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
mode is a select(cloud, local_external), so rejecting local_embedded
falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
GET returns declared fields + current values (secrets only as is_set,
never read back); PUT validates selects against their options, writes
plain fields to the provider config file, secrets to the env store,
and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
hindsight-settings.tsx and the memory.provider === 'hindsight'
conditional in config-settings.tsx — same pattern as
toolset-config-panel.tsx off env_vars.
Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.
Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.
Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
Follow-up to the cap-removal salvage. The contributor guarded the new
unlimited default with `[:max_models] if max_models else ...`, which conflates
max_models=0 (used by slug-only callers that want an empty model list) with
None (unlimited). Tighten to `is not None` at all five slicing sites in
list_authenticated_providers / list_picker_providers, and add a regression test
asserting the three-way contract: None=full, 0=empty, N=first N.
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.
Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
* web_server.py: get_model_options (Desktop /api/model/options)
* web_server.py: get_recommended_default_model
* model_switch.py: prewarm_picker_cache_async
* tui_gateway/server.py: model.options JSON-RPC
* cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
still passes max_models=50 explicitly — unchanged behavior.
The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.
Fixes#48279
* feat(billing): nous_billing http client + BillingState core (phase 2b)
Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
(state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
parsing (server emits decimal strings, not 2dp), fail-open builder,
idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
matrix, fail-open, amount validation).
Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md
* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)
- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
billing:manage, reusing the held credential's portal/inference URLs + client_id
(so a preview stays a preview), persists like _login_nous but WITHOUT the model
picker. Returns True iff the minted token carries the scope (False when NAS
silently downscopes a non-admin / unticked grant).
Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.
* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)
billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.
* feat(billing): /billing CLI handler + command registry (phase 2b)
- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
_SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
_prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.
* feat(billing): /billing Ink TUI screens + tests (phase 2b)
- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
(D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
settled/no_payment_method/step-up/limit/auto-reload/validation).
TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.
* docs(billing): scrub private cross-repo references
NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
generic 'the server' / 'a preview deployment'
* docs(billing): scrub final NAS reference in step-up docstring
* docs(billing): drop dangling plan-doc refs
The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.
* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes
Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.
- TUI: full /billing overlay state machine (overview to buy to confirm,
auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
via the TUI's own opener (the device flow runs in the headless gateway, so a
printed URL was being dropped); run the step-up handler off the main loop and
emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
/state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
(the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.
Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.
* docs(billing): add /billing TUI screenshots for PR
* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test
The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.
The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
The dashboard MCP catalog only showed name/description/transport and a
non-clickable source. Users couldn't see what an entry connects to or runs
before installing — the exact detail the docs trust model tells them to vet.
- /api/mcp/catalog now returns transport target (url, or command+args),
auth_type, git install source/ref + bootstrap commands, default-enabled
tool hint, and post-install guidance per entry.
- McpPage renders the endpoint URL (http) or command+args (stdio), the git
install source/ref, a collapsible bootstrap-commands list, setup notes,
and the source as a clickable link when it's a URL.
- Docs: drop the 'uv pip install -e .[mcp]' quick-start step (Hermes does
not support pip installs; MCP ships with the standard install) and note
the dashboard now surfaces this detail.
- Strengthen the catalog endpoint test to assert the new inspection fields.
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:
1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
All five call sites pass paths under SKILLS_DIR, so the guard is
a no-op for current code and a loud, recoverable failure for
any future regression (bad path join, malicious bundled
manifest, stale path in scope after an exception).
2. The default `updates.pre_update_backup` flips from false to
true in hermes_cli/config.py. A few minutes of zip per update
is negligible compared to silent total data loss. Still
overridable; --no-backup still works for one-off opt-out.
Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.
Closes#48200
Cleanup pass on the salvage (behavior-preserving):
- diff_bundled_skill now uses the existing _skill_file_list() helper
instead of reimplementing the rglob/is_file/relative_to file-set
enumeration inline (twice).
- Extract _is_tracked_user_modification(origin_hash, user_hash) and use
it in BOTH the sync loop and list_user_modified_bundled_skills() so the
'kept user edit' rule can't drift between the two sites.
- _read_text_for_diff -> _read_for_diff returns (bytes, text); the binary
branch now compares the bytes it already read instead of re-reading
both files from disk.
- Drop the unused 'user_present' key from diff_bundled_skill's return
contract (no consumer or test ever read it).
- test_update_modified_notice: drop the brittle '>= 2 sites' count-floor
so consolidating the two print paths into a shared helper stays a
welcome refactor; keep the per-site 'count notice => discovery hint'
invariant (still mutation-tested).