display.timestamps already drove the [HH:MM] suffix on live submitted and
streamed message labels, but there was no runtime command to toggle it and
/history ignored the setting entirely. Add /timestamps [on|off|status]
(alias /ts) and render [HH:MM] in /history for turns that carry a stored
unix timestamp (resumed sessions). Live unsaved turns without a stored time
are never given a fabricated one. Uses the existing sanctioned non-wire
'timestamp' message key (stripped before the API call in chat_completions),
so message-alternation and prompt-cache invariants are untouched.
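
A minimal sketch of the /history rendering rule described above, not the actual implementation. The function names and the UTC default are assumptions for illustration; only the 'timestamp' key and the [HH:MM] format come from this message.

```python
from datetime import datetime, timezone


def history_label(role: str, message: dict, enabled: bool, tz=timezone.utc) -> str:
    """Label one /history turn, adding [HH:MM] only for a stored unix timestamp."""
    ts = message.get("timestamp")
    if not enabled or not isinstance(ts, (int, float)):
        # No stored time (live unsaved turn) or feature off: never fabricate one.
        return role
    stamp = datetime.fromtimestamp(ts, tz=tz).strftime("%H:%M")
    return f"{role} [{stamp}]"


def strip_non_wire_keys(message: dict) -> dict:
    """Drop the local-only 'timestamp' key before a message goes on the wire."""
    return {k: v for k, v in message.items() if k != "timestamp"}
```

Because the key is removed before the request is built, the payload sent to the API is identical whether the display toggle is on or off.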
Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.
Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
caller-supplied MIME) instead of returning None — fixes the chokepoint
that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
non-media attachment is typed DOCUMENT and cached. Known types keep their
precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
code + config + markup) instead of a blind UTF-8 decode, so binary formats
(PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
back-compat.
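
A rough sketch of the two gates above, under stated assumptions: the extension set below is a small illustrative subset (the real _TEXT_INJECT_EXTENSIONS contents are not listed here), and should_inline_text / resolve_cache_mime are hypothetical helper names.

```python
from pathlib import PurePath
from typing import Optional

# Illustrative subset only; the real set covers text + code + config + markup.
_TEXT_INJECT_EXTENSIONS = frozenset({".txt", ".md", ".py", ".json", ".yaml", ".html"})


def should_inline_text(filename: str) -> bool:
    """Inline by extension, not by whether the bytes happen to decode as UTF-8."""
    return PurePath(filename).suffix.lower() in _TEXT_INJECT_EXTENSIONS


def resolve_cache_mime(known_mime: Optional[str], caller_mime: Optional[str] = None) -> str:
    """Known types keep their precise MIME; unknown ones fall back instead of failing."""
    return known_mime or caller_mime or "application/octet-stream"
```

The point of the extension gate is that a PDF or zip whose first bytes are ASCII is still cached and surfaced by path, but never pasted into the prompt.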
Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
* fix(gateway): walk /proc/*/cmdline to find main-wrapper.sh under s6-overlay v3 (#49196)
(cherry picked from commit 3a108c2df0)
* fix(container): peel s6-v3 rc.init prefix so dashboard role is detected
kyssta-exe's preceding commit (#49238) fixed _read_container_argv() to
locate the rc.init-launched main-wrapper.sh process under s6-overlay v3,
but the skip still never fired: _strip_container_argv_prefix() only peeled
a prefix when args[0] was init/main-wrapper.sh/hermes. Under s6 v3 the
matched argv is
/bin/sh -e /run/s6/basedir/scripts/rc.init top
/opt/hermes/docker/main-wrapper.sh dashboard ...
so args[0] stayed /bin/sh, _is_dashboard_container() returned False, and
the dashboard container reconciled + started its own gateway-default —
the exact dual Telegram getUpdates 409 in issue #49196.
Fix: strip everything up to and including the main-wrapper.sh token (the
stable boundary the image owns), covering both the v2 (/init ...) and v3
(/bin/sh ... rc.init top ...) shapes with one rule, instead of matching
launcher tokens positionally. This also repairs _is_legacy_gateway_run_request()
under v3, which shares the same strip helper (the issue called this out).
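
The strip rule can be sketched roughly as follows. This is an illustration, not the real _strip_container_argv_prefix(): the v2 argv in the usage note is an assumed shape, and returning the argv unchanged when no wrapper token is present is also an assumption.

```python
from typing import List

WRAPPER_NAME = "main-wrapper.sh"


def strip_container_argv_prefix(args: List[str]) -> List[str]:
    """Drop everything up to and including the main-wrapper.sh token."""
    for i, token in enumerate(args):
        # Match on the basename so the launcher prefix shape does not matter.
        if token.rsplit("/", 1)[-1] == WRAPPER_NAME:
            return args[i + 1:]
    return list(args)  # assumption: no wrapper token means nothing to peel


v3 = ["/bin/sh", "-e", "/run/s6/basedir/scripts/rc.init", "top",
      "/opt/hermes/docker/main-wrapper.sh", "dashboard", "--host", "0.0.0.0"]
v2 = ["/init", "/opt/hermes/docker/main-wrapper.sh", "dashboard"]  # assumed v2 shape
```

Both shapes reduce to an argv starting with `dashboard`, which is what the role check needs to see.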
Tests: extend the dashboard true/false parametrize sets with the s6-v3
argv shape, and add test_main_skips_reconcile_in_dashboard_container_s6v3
exercising main() end-to-end with the v3 argv. Verified via mutation that
both new v3 assertions fail under the old positional strip and pass with
the fix.
---------
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
When `hermes dashboard --host 0.0.0.0` is run interactively with the auth
gate engaged but no DashboardAuthProvider configured, prompt to set up the
bundled username/password provider on the spot (or point at `hermes dashboard
register` for OAuth) instead of only emitting the fail-closed error.
- main.py: `_maybe_setup_dashboard_auth_interactively()` runs before
start_server. No-ops on loopback binds, when a provider is already
registered, or when stdin/stdout isn't a TTY (Docker/s6, CI, piped runs) so
the fail-closed SystemExit stays the backstop for unattended deploys. On the
password path it writes dashboard.basic_auth.{username,password_hash,secret}
to config.yaml (scrypt hash, never plaintext), then force-rediscovers
plugins so the basic provider registers before the gate check.
- web_server.py: fix the fail-closed hint — it told operators to set
`dashboard_auth.basic.username` but the provider reads `dashboard.basic_auth`.
- docs: note the interactive setup under Fail-closed semantics.
No new env vars; reuses the existing dashboard.basic_auth config surface.
* feat(cli): /prompt — compose your next prompt in $EDITOR
Adds /prompt (alias /compose): opens $VISUAL/$EDITOR on a temp markdown
file so you can hand-edit a multi-line prompt, then sends the saved buffer
as the next agent turn. Text after the command pre-seeds the buffer; an
empty save cancels. Reuses the one-shot _pending_agent_seed the interactive
loop already consumes (same mechanism as /blueprint), so no changes to the
input event loop or message pipeline. CLI-only.
* feat(tui): /prompt slash command opens $EDITOR (parity with CLI)
The TUI already opens $EDITOR via Ctrl+G (openEditor), but had no /prompt
slash command like the classic CLI. Wire openEditor into the slash handler
context and register /prompt (alias /compose) to call it; inline text after
the command is dropped into the composer first so it carries into the editor,
matching the CLI's /prompt <text>.
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp
The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.
* test(gateway): de-snapshot /reasoning help assertion
The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.
* feat(tui): /reasoning full|clamp parity in tui_gateway
The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
The welcome banner's 'Available Tools' merged in every toolset from the
global check_tool_availability() registry walk, regardless of whether it
was enabled for the current platform. On a Blank Slate CLI (file +
terminal only) that surfaced discord / feishu / kanban tools the agent
was never actually given — they are not in the agent's tool schema, but
the banner displayed them, making it look like they were exposed.
- Filter the unavailable-toolset merge to toolsets actually in
enabled_toolsets (a toolset that's enabled but has unmet deps still
legitimately shows as disabled/lazy).
- Gate the 'Available Skills' section on the skills toolset being
enabled — when it's off, the agent can't load any skill, so show
'Skills toolset disabled' instead of the on-disk catalog.
When enabled_toolsets is empty (older callers), behavior is unchanged.
Validation: blank-slate banner now shows only file + terminal and
'Skills toolset disabled'; a skills-enabled banner still lists the
catalog. Added regression tests; full banner suite green (15/15).
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).
Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.
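
A sketch of the TERM, wait, KILL sequence for POSIX hosts. The signature is hypothetical: signal delivery and the liveness probe are injected so the flow can be exercised without real processes, the recycled-PID identity guard is omitted, and treating 0 as "no escalation" is this sketch's reading of "0 disables".

```python
import signal
import time
from typing import Callable, Iterable, List


def terminate_with_escalation(
    pids: Iterable[int],
    send_signal: Callable[[int, int], None],
    is_alive: Callable[[int], bool],
    grace_seconds: float = 2.0,
    poll_interval: float = 0.05,
) -> List[int]:
    """SIGTERM the snapshot, wait a bounded grace window, SIGKILL survivors."""
    tree = list(pids)  # snapshot first so late children cannot shift the target set
    for pid in tree:
        send_signal(pid, signal.SIGTERM)
    if grace_seconds <= 0:
        return []  # escalation disabled
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline and any(is_alive(p) for p in tree):
        time.sleep(poll_interval)
    survivors = [p for p in tree if is_alive(p)]
    for pid in survivors:
        send_signal(pid, signal.SIGKILL)
    return survivors
```

In production the two callables would be something like os.kill and a PID liveness probe; the return value lists the PIDs that needed the hard kill.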
Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.
Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
Surface dangerous host/deployment posture at gateway startup so operators get
the 'you're exposed' signal the June 2026 MCP-config persistence campaign
victims never had. Warn-only — never blocks startup, never raises.
Checks (each independently fail-safe):
- Running as root (POSIX uid 0)
- SSH daemon with PasswordAuthentication enabled (incl. the 'yes' default)
- Running in a container with no persistent volume mount over HERMES_HOME
- Network-accessible API server with no API_SERVER_KEY
New module hermes_cli/security_audit_startup.py; invoked once per process from
start_gateway() right after setup_logging(). Cross-platform (root/SSH checks
no-op on Windows). Idea: @Cthulhu.
Remove the dashboard --insecure auth-bypass, add an MCP persistence guard +
IOC blocklist, and raise the API-server key entropy floor.
Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media
instance): scanners find exposed Hermes dashboards/API servers, drive the
root agent to plant a 'command: bash' MCP entry that appends an attacker SSH
key to authorized_keys, which cron + startup then re-execute every tick.
- dashboard: --insecure no longer disables the auth gate. should_require_auth
returns True for every non-loopback bind; a public bind ALWAYS requires an
auth provider (bundled password provider or OAuth). --insecure kept as a
warned no-op for backward compat. Fail-closed error now points at the
password provider, not at --insecure.
- mcp_security: validate_mcp_server_entry now also rejects shell payloads that
write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/
rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key +
source IPs) anywhere in command/args/env. Runs at save AND spawn time.
- api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars;
warn when a network-accessible API server runs an unsandboxed local backend.
The kanban-worker and kanban-orchestrator bundled skills existed only to
be force-loaded into dispatcher-spawned workers, gated by
environments:[kanban] so they wouldn't leak into normal CLI listings.
That gating was fragile (the leak that #50443 patched) and the
--skills auto-load was already best-effort — most workers ran without it
because the bundled skill isn't present in profile-scoped skills dirs.
Remove the skills entirely and promote their load-bearing content
(workspace kinds, deliverable artifacts, created-card integrity, profile
discovery) into KANBAN_GUIDANCE, which is already injected into every
kanban worker's system prompt. Net result: every worker reliably gets
the guidance, nothing can leak into a CLI/blank-slate session, and the
gating machinery is gone.
- agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE
- hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe
- hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card
- hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text
- delete skills/devops/kanban-{worker,orchestrator}
- docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references
- tests: update spawn-argv expectations; re-bound the guidance-size guard
Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).
CI on the salvage caught two issues the stale PR base masked:
1. The model-setup flows were extracted from main.py into
hermes_cli/model_setup_flows.py after @pmos69 forked. The cherry-pick
re-introduced a stale _model_flow_custom into main.py (duplicating the
one main.py now imports) and put _model_flow_google_antigravity there too.
Move the antigravity flow into model_setup_flows.py alongside its siblings
and drop the stale _model_flow_custom dup. Fixes the getpass/stdin OSError
in tests/cli/test_cli_provider_resolution.py.
2. google-antigravity re-exposes Claude/Gemini/GPT-OSS models, so its catalog
was hijacking bare short aliases (`sonnet` -> google-antigravity instead of
anthropic) in detect_static_provider_for_model via dict insertion order.
Add _BORROWED_MODEL_PROVIDERS and defer those providers to a last-resort
pass so a model's native vendor always wins alias/direct-catalog detection.
Fixes tests/hermes_cli/test_models.py::test_short_alias_resolves_to_static_model.
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).
Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes #8919.
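
A minimal sketch of the migration rule, assuming the model section is a plain dict. migrate_api_base is a hypothetical name, and dropping the legacy key after migration is an assumption of this sketch.

```python
from typing import Tuple


def migrate_api_base(model_cfg: dict) -> Tuple[dict, bool]:
    """Return (new_cfg, migrated). An explicit base_url always wins."""
    cfg = dict(model_cfg)
    if "api_base" not in cfg:
        return cfg, False
    legacy = cfg.pop("api_base")  # assumption: the legacy key is dropped
    if cfg.get("base_url"):
        return cfg, False  # never override an explicit base_url
    cfg["base_url"] = legacy
    return cfg, True
```

The boolean lets the set-time caller print its notice only when a value was actually moved.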
On Windows, _pause_windows_gateways_for_update() force-kills every running
gateway before mutating the venv. Gateways mapped to a profile (via
profile.path/gateway.pid) were respawned afterward, but gateways with NO
profile mapping — e.g. a Windows Scheduled Task running
"pythonw.exe -m hermes_cli.main gateway run" — were force-killed and only
told to restart manually. After an auto-update/bootstrap the Telegram bot
stayed dead until manual intervention.
Now we snapshot each unmapped gateway's argv (psutil, guarded by
looks_like_gateway_command_line) before the kill and replay it through the
same detached watcher used for profile gateways, so unmapped gateways come
back automatically too.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Per @egilewski's audit on this PR (#15544), the original fix was
correct but the file has been refactored since: the four endpoint-local
empty-peer checks have been consolidated into _ws_client_is_allowed
and _ws_client_reason, but the helpers were left fail-open ('no peer
host known means allow' / 'no reason to block').
On a loopback-bound dashboard with auth disabled, an ASGI server
behind a misconfigured proxy or a unix-socket transport can deliver
ws.client == None or ws.client.host == ''. The helpers were treating
that as 'allowed', so the loopback-only peer gate could be bypassed
by anything that suppressed the client tuple in transit. All four
WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route
through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap
applied uniformly.
Fix:
* _ws_client_is_allowed: return False when client_host is empty
instead of True. Only reached on loopback bind with auth disabled
(auth_required=True and explicit non-loopback binds short-circuit
earlier), so the fail-closed behavior is scoped to the surface
that needs it.
* _ws_client_reason: return a 'missing_or_empty_peer bound=...'
block reason instead of None, so the dispatcher's existing
reason-based rejection path picks it up and the close gets logged
with a machine-parseable token for diagnosability.
Behavior unchanged for:
* gated mode (auth_required=True) — early-returns True before the
empty-peer check runs. The OAuth ticket is the auth at that point.
* explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN
address, always with --insecure) — early-returns True before the
empty-peer check runs. DNS-rebinding is still blocked by the
Host/Origin guard in _ws_host_origin_is_allowed.
* legitimate loopback peers (client_host == '127.0.0.1' / '::1') —
not affected by the empty-peer branch.
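
The decision order above can be sketched as below. This is a simplified stand-in for the real helpers: the keyword parameters are hypothetical, and the reason string reproduces only the leading token, not the full 'bound=...' suffix.

```python
from typing import Optional

_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1"})


def ws_client_is_allowed(client_host: Optional[str], *, auth_required: bool,
                         loopback_bind: bool) -> bool:
    # Gated mode and explicit non-loopback binds are handled by other checks.
    if auth_required or not loopback_bind:
        return True
    if not client_host:
        return False  # fail closed: an unknown peer is not a loopback peer
    return client_host in _LOOPBACK_HOSTS


def ws_client_reason(client_host: Optional[str]) -> Optional[str]:
    # Only the token prefix is modeled here; the real reason carries more detail.
    return "missing_or_empty_peer" if not client_host else None
```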
Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py:
* test_empty_client_host_rejected_in_loopback_mode
* test_missing_client_object_rejected_in_loopback_mode
* test_empty_client_host_reason_is_block
Plus two regression guards to ensure the fix does not over-reach:
* test_empty_client_host_still_allowed_in_insecure_public_mode
* test_empty_client_host_still_allowed_in_gated_mode
All three new fail-closed tests fail without this patch (the helpers
return True / None for an empty peer) and pass with it. The 45
pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
Root cause of #49145: the Windows ZIP-update path did rmtree(dst) then
copytree(src, dst). If the copy failed partway — common on that path,
which only runs because file I/O is already flaky on the machine — the
directory was left deleted with nothing copied back. ui-tui/ vanishing
is what broke 'hermes --tui' (WinError 267), but the bug hit every
top-level directory.
_atomic_replace_dir stages the new copy into a sibling temp dir and only
swaps it in on full success, restoring the original on failure. A failed
update now leaves the live tree untouched instead of half-deleted.
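
A sketch of the stage-then-swap idea, not the real _atomic_replace_dir. The staging directory layout and names are assumptions; the key property is that the copy happens beside the destination and the live directory is only renamed once the copy has fully succeeded.

```python
import os
import shutil
import tempfile


def atomic_replace_dir(src: str, dst: str) -> None:
    dst = os.path.abspath(dst)
    # Stage next to dst so the final renames stay on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=os.path.dirname(dst))
    staged = os.path.join(staging, "new")
    backup = os.path.join(staging, "old")
    try:
        shutil.copytree(src, staged)  # a partial failure here never touches dst
        had_original = os.path.exists(dst)
        if had_original:
            os.rename(dst, backup)
        try:
            os.rename(staged, dst)
        except OSError:
            if had_original:
                os.rename(backup, dst)  # put the original back
            raise
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

Compared with rmtree followed by copytree, the window in which the destination is missing shrinks to the gap between two renames.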
The Windows update path can leave tracked ui-tui/ files deleted in the
working tree (HEAD intact). The guard now self-heals: when ui-tui/ is
missing in a git checkout, run `git restore -- ui-tui` and continue,
falling back to the printed manual-recovery steps only when git can't
recover it (no checkout / restore failed).
Builds on konsisumer's missing-workspace guard.
After a worker crash + reclaim + respawn, the board could show a task in the
Ready lane while its task_run was 'running' and the new worker was actively
executing (#36910). The dispatcher could then treat live work as available and
double-assign.
Root cause: the three reclaim paths (detect_crashed_workers,
release_stale_claims heartbeat-stale backstop, enforce_max_runtime) each
snapshot a task's worker_pid/claim_lock, do liveness work, then reset
tasks.status back to 'ready' with only a 'WHERE status=running' guard. If the
task was reclaimed AND re-claimed by a NEW worker in between (new run, new
claim_lock, live pid), the stale UPDATE clobbered the live task: status flipped
to 'ready' while the fresh run stayed 'running'. claim_task is the only writer
that sets status='running', so nothing put it back — permanent desync.
Fix: gate each reset on the snapshot's claim_lock (and worker_pid where
available) so it only fires when the task is still owned by the worker the
reclaim was computed for. A stale reclaim now no-ops (rowcount 0) instead of
desyncing a re-claimed task. Genuine crashes (lock still matches) reclaim
exactly as before.
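
The ownership-gated reset can be illustrated with a compare-and-set UPDATE. The table layout and function name below are hypothetical; only the status and claim_lock columns are taken from the description above.

```python
import sqlite3


def reclaim_if_still_owned(conn: sqlite3.Connection, task_id: int,
                           snapshot_claim_lock: str) -> bool:
    """Reset to 'ready' only if the task is still held by the snapshotted claim."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'ready' "
        "WHERE id = ? AND status = 'running' AND claim_lock = ?",
        (task_id, snapshot_claim_lock),
    )
    conn.commit()
    # rowcount 0 means another worker re-claimed the task in between: a no-op.
    return cur.rowcount == 1
```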
This is the same race class the in-gateway dispatch lock (single-writer ticks)
mitigates, closed at the row level so a single dispatcher's fast
reclaim->respawn across two ticks is also safe.
Closes #36910.
connect() wrapped its entire body in an unbounded blocking flock(LOCK_EX) on
every call (_cross_process_init_lock). A single process stalled inside the
critical section — or a stale lock held by a wedged worker — blocked every
other connect(), including the long-lived gateway dispatcher's next-tick
connect, forever. No timeout, no traceback, no recovery: the board silently
stopped being worked until a manual restart (issue #36644).
Two fixes:
1. Fast-path skip: once THIS process has initialized a path, the expensive
first-open work (header validation, integrity probe, schema + additive
migrations) is already cached in _INITIALIZED_PATHS. The steady-state
connect has nothing for the cross-process lock to protect, so it now opens
the connection (WAL + pragmas) under only the cheap in-process _INIT_LOCK
and never touches the file lock. This removes the lock from the dispatcher's
hot path entirely — a stalled external 'hermes kanban list' can no longer
block ticks.
2. Bounded acquire: even on first-init, _cross_process_init_lock now retries a
non-blocking acquire up to a 10s deadline, then logs a WARNING and proceeds
WITHOUT the cross-process lock. Safe because the in-process _INIT_LOCK still
serializes same-process threads and the init work is idempotent
(CREATE TABLE IF NOT EXISTS + additive migrations) — worst case is redundant
work, not corruption. A bounded 'proceed anyway' beats an unbounded hang.
Windows path switched LK_LOCK -> LK_NBLCK (non-blocking) to match.
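
A platform-neutral sketch of the bounded acquire. The non-blocking lock attempt is passed in as a callable (hypothetical; the real code calls flock / msvcrt directly), so only the retry-until-deadline logic is shown.

```python
import time
from typing import Callable


def acquire_with_deadline(try_lock: Callable[[], bool], timeout: float = 10.0,
                          poll: float = 0.05) -> bool:
    """Retry a non-blocking acquire until it succeeds or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if try_lock():
            return True
        if time.monotonic() >= deadline:
            return False  # caller logs a WARNING and proceeds without the lock
        time.sleep(poll)
```

Returning False rather than raising matches the "proceed anyway" policy: the idempotent init work still runs, just without cross-process serialization.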
Closes #36644.
_default_spawn launched the worker subprocess with cwd=workspace and set
HERMES_KANBAN_WORKSPACE, but never set TERMINAL_CWD — so the worker inherited
the dispatching gateway's TERMINAL_CWD. That value takes precedence over the
process cwd in two places:
- tools/file_tools.py::_resolve_base_dir — a relative write_file path resolved
against the gateway user's home instead of the workspace, so artifacts
silently landed outside the workspace (#41312).
- agent_init's context-file loader — AGENTS.md was discovered relative to the
gateway's cwd, so under multi-profile dispatch a worker loaded whichever
gateway won the claim race's AGENTS.md, not the task's (#34619).
Both are the same root cause. Pinning TERMINAL_CWD to the workspace (where the
task's work actually happens) fixes both. Guarded on an existing absolute dir
because file_tools rejects relative/sentinel TERMINAL_CWD values — a non-dir
workspace leaves the inherited value rather than writing a meaningless one.
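
Roughly, the spawn environment is built like this. build_worker_env is a hypothetical helper name; the two environment variable names and the absolute-existing-directory guard come from the description above.

```python
import os
from typing import Dict, Mapping


def build_worker_env(parent_env: Mapping[str, str], workspace: str) -> Dict[str, str]:
    env = dict(parent_env)
    env["HERMES_KANBAN_WORKSPACE"] = workspace
    # Only pin TERMINAL_CWD to a real absolute directory; otherwise keep
    # whatever was inherited rather than writing a meaningless value.
    if os.path.isabs(workspace) and os.path.isdir(workspace):
        env["TERMINAL_CWD"] = workspace
    return env
```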
Closes #34619, closes #41312.
Plugins could observe session/tool/approval lifecycle but had no way to
observe kanban task transitions. Adds three observer hooks fired by the
board's claim/complete/block transitions:
- kanban_task_claimed (dispatcher process, before worker spawn)
- kanban_task_completed (worker process, carries summary)
- kanban_task_blocked (worker process, carries reason)
Each fires AFTER the DB write txn commits, so a plugin observes durable
state and a slow/hanging callback can never hold the SQLite write lock.
All firing is best-effort: a raising hook is logged and swallowed and
never breaks a board transition. profile_name is resolved from
HERMES_HOME so dispatcher- and worker-side hooks carry the right profile.
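
The commit-then-notify ordering can be sketched as follows. The table, function names, and callback signature are assumptions for illustration; only the hook name kanban_task_completed and its summary payload are taken from this message.

```python
import logging
import sqlite3
from typing import Callable, Iterable

logger = logging.getLogger("kanban.hooks")


def fire_hooks(callbacks: Iterable[Callable], event: str, **payload) -> None:
    """Best-effort: a raising hook is logged and swallowed."""
    for cb in callbacks:
        try:
            cb(event, **payload)
        except Exception:
            logger.exception("hook %r failed for %s", cb, event)


def complete_task(conn: sqlite3.Connection, task_id: int, summary: str,
                  callbacks: Iterable[Callable]) -> None:
    with conn:  # commits when the block exits without an exception
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
    # Hooks run only after the commit, so they never hold the write lock.
    fire_hooks(callbacks, "kanban_task_completed", task_id=task_id, summary=summary)
```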
Requested by @Smithangshu on Discord.
Plugins previously had no way to read the active profile name from the
PluginContext. The workaround in the wild — reaching into
ctx._manager._cli_ref — only works in an interactive CLI session;
_cli_ref is None in the gateway and in kanban-spawned worker sessions
(hermes -p <profile> chat -q ...), so the workaround breaks exactly
where multi-profile awareness matters most.
ctx.profile_name wraps hermes_cli.profiles.get_active_profile_name(),
which derives the name from HERMES_HOME and therefore works in every
execution context with zero dependency on _cli_ref.
Fixes a regression introduced by the prior approach (synchronous import
hermes_cli.gateway inside _lifespan) that caused a new failure mode:
the blocking import stalled the asyncio event loop before uvicorn could
bind its port, pushing HERMES_DASHBOARD_READY past the desktop shell's
45 s announcement deadline and triggering a respawn loop that accumulated
orphaned backend processes.
Two-part fix:
_lifespan: replace the blocking import with a fire-and-forget
run_in_executor call (_warm_gateway_module). The import runs in a
worker thread while the server socket is already open, so
HERMES_DASHBOARD_READY fires without delay.
get_status: replace the inline lazy import with
await run_in_executor(None, _resolve_restart_drain_timeout). This is
the root fix for the original 15 s socket-timeout: the blocking
.pyc-compilation + Defender scan is offloaded to a thread, keeping the
event loop free for every /api/status probe. After the first call the
module is in sys.modules and the executor returns in microseconds.
Both helpers are extracted as module-level sync functions so they can
be unit-tested independently of FastAPI or uvicorn.
Closes #50209
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A shell-launched 'hermes gateway run --replace' / 'gateway restart' on a
systemd/launchd host can leave an orphan gateway whose kanban dispatcher
escapes the service cgroup, survives 'systemctl restart', and becomes a
second long-lived writer on the shared kanban.db. Two dispatchers that each
believe they own the file both pass SQLite busy_timeout and then race on WAL
frames — the documented root cause of multi-writer corruption (issue #35240).
The existing _guard_supervised_gateway_conflict startup guard blocks the
common way an orphan is born, but does nothing once a second dispatcher
already exists. This adds the defense-in-depth: dispatch_once now wraps every
tick in a non-blocking, board-scoped flock (_dispatch_tick_lock). A losing
dispatcher returns DispatchResult(skipped_locked=True) and does zero DB writes
this tick — so two dispatchers can never run a reclaim/spawn/write sequence
concurrently regardless of how the second one got there.
- Non-blocking (LOCK_NB): never stalls the gateway's async watcher.
- Board-scoped: lock file is a .dispatch.lock sibling of each board's
kanban.db, so unrelated boards tick in parallel.
- POSIX + Windows (fcntl / msvcrt LK_NBLCK), no-op degrade where neither
exists — mirrors the existing _cross_process_init_lock pattern.
Verified with a real two-process orphan repro: while a separate process holds
the lock, dispatch_once skips; after release it runs.
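
A POSIX-only sketch of the per-tick lock. The context manager and the dict result are stand-ins (the real code returns a DispatchResult and also covers Windows via msvcrt); the lock file path is whatever sibling of kanban.db the caller picks.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def dispatch_tick_lock(lock_path: str):
    """Yield True if this process won the tick, False if another holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # never blocks
            acquired = True
        except BlockingIOError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def dispatch_once(lock_path: str, tick) -> dict:
    with dispatch_tick_lock(lock_path) as won:
        if not won:
            return {"skipped_locked": True}  # zero DB writes this tick
        tick()
        return {"skipped_locked": False}
```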
hermes backup only walks HERMES_HOME, so memory providers that keep
config/credentials in home-anchored dotdirs (honcho -> ~/.honcho,
hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data
across a backup/import cycle — the peer IDs, session pairings, and API
keys never made it into the archive.
Add an optional MemoryProvider.backup_paths() hook (default []). The
active provider declares its external paths; backup resolves them from
config only (no init, no network), archives the ones under the home dir
into a reserved _external/ subtree encoded relative to home, and import
restores them to their original location with a home-anchored traversal
guard and 0600 on credential-shaped files. Paths outside home are
skipped as non-portable.
honcho, hindsight, and openviking override the hook. E2E-validated full
backup->import cycle plus 7 new tests.
Rich messages are not ready for primetime: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.
Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not just one:
- cache_image/audio/video_from_bytes reject oversized payloads before writing
(video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
Content-Length header and re-checking the running total per chunk so an
absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
attachment is rejected before any bytes are pulled into memory.
Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.
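
A sketch of the header check plus running-total check for streamed bodies. read_capped and PayloadTooLarge are hypothetical names; the chunk iterable stands in for whatever the HTTP client yields.

```python
from typing import Iterable, Optional


class PayloadTooLarge(ValueError):
    pass


def read_capped(chunks: Iterable[bytes], max_bytes: int,
                content_length: Optional[int] = None) -> bytes:
    limited = max_bytes > 0  # 0 disables the cap
    # Cheap early rejection when the server declares an oversized body.
    if limited and content_length is not None and content_length > max_bytes:
        raise PayloadTooLarge(f"declared {content_length} bytes > cap {max_bytes}")
    buf = bytearray()
    total = 0
    for chunk in chunks:
        total += len(chunk)
        # Re-check per chunk: the header may be absent or wrong.
        if limited and total > max_bytes:
            raise PayloadTooLarge(f"body exceeded cap of {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)
```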
Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.
- Add redact_config_value(): recursively masks credential-shaped keys
(api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
(config set model.api_key routes to config.yaml, lowercase misses the
.env allowlist).
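
A minimal sketch of structural key-name masking. The key set is an illustrative subset and mask_secret here is a trivial stand-in for the real helper, whose output format is not described in this message.

```python
from typing import Any, Optional

# Illustrative subset; matching is exact on the key name.
_CREDENTIAL_KEYS = frozenset({"api_key", "token", "secret", "password"})


def mask_secret(value: str) -> str:
    return "***"  # stand-in; the real helper's format may differ


def redact_config_value(value: Any, key: Optional[str] = None) -> Any:
    if isinstance(value, dict):
        return {k: redact_config_value(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_config_value(v, key) for v in value]
    if key in _CREDENTIAL_KEYS and isinstance(value, str) and value:
        return mask_secret(value)
    return value
```

Masking by key name is what catches opaque tokens that no vendor-prefix pattern would recognize.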
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.
The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.
The value remains fully configurable via compression.hygiene_hard_message_limit.
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).
- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
counting both the chat/responses in-flight counter and the /v1/runs
stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.
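
One way to picture the shared gate, simplified to a single counter (the real gate counts the chat/responses in-flight counter and the /v1/runs stream set together). RunGate, the Retry-After value, and the error body shape are assumptions for illustration.

```python
import threading
from typing import Dict, Tuple


class RunGate:
    def __init__(self, max_concurrent: int = 10) -> None:
        self._max = max_concurrent  # 0 disables the cap
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._max > 0 and self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)


def too_many_requests(retry_after: int = 1) -> Tuple[int, Dict[str, str], dict]:
    """Status, headers, and body for a rejected request (shape is illustrative)."""
    body = {"error": {"message": "Too many concurrent runs", "type": "rate_limit_error"}}
    return 429, {"Retry-After": str(retry_after)}, body
```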
Closes #7483.
Second cleanup pass (simplify-code review of the first follow-up):
- write_runtime_status now clamps active_agents via parse_active_agents
instead of an inline max(0, int(...)). Removes the duplicated clamp the
helper's docstring acknowledged AND closes a write-side ValueError gap
(a non-numeric active_agents previously raised; now degrades to 0).
- hermes_cli/gateway.py draining-status line routes its active-agents count
through parse_active_agents too — the third coercion site of the same
persisted field, now consistent and non-raising with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
(a real float) instead of a blanket 'except Exception -> None'. None would
have violated the surfaced field's int/float contract and stripped NAS's
poll-deadline hint silently.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
handles the empty/None case) and tightened the parse_active_agents docstring
to describe the actual single-contract role (write + both reads).
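A sketch of the coercion contract the bullets above describe (clamped non-negative, non-raising, 0 for empty/None/garbage). The function name comes from this message; the body is an assumption, not the actual implementation in gateway.status.

```python
def parse_active_agents(raw) -> int:
    """Coerce a persisted active_agents value to a non-negative int.

    Assumed behaviour: anything that cannot be read as an integer
    degrades to 0 instead of raising.
    """
    try:
        return max(0, int(raw))
    except (TypeError, ValueError, OverflowError):
        # None, "", "garbage", float("inf"), ... all degrade to 0.
        return 0
```

Because the same helper is meant to sit on the write path and both read paths, a corrupt count can no longer surface differently depending on which endpoint is polled.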
Follow-up cleanups on top of the busy/idle readout (PR #50103):
- web_server.py /api/status reused the single drain-timeout resolver
hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
env -> agent.restart_drain_timeout config -> default) instead of inlining a
third hand-rolled copy of that precedence chain. Also fixes a subtle
divergence: the inline copy used os.environ.get() so a set-but-empty env var
was treated as a value rather than falling through to config; the shared
resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
(/api/status and /health/detailed) through it, so the exposed active_agents
field is consistently clamped non-negative. Previously /api/status clamped
while /health/detailed exposed the raw file value, diverging on a corrupt
count.
- Added TestParseActiveAgents covering the shared coercion contract.
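The env -> config -> default precedence from the first bullet above, including the set-but-empty fall-through, might look like this. The function name `resolve_restart_drain_timeout`, the injectable `environ` argument, the handling of unparseable values, and the placeholder default of 60.0 are assumptions; the real default is not stated in this message.

```python
import os

# Placeholder value only; the actual default is not given in the source.
DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT = 60.0


def resolve_restart_drain_timeout(config: dict, environ=None) -> float:
    """Hypothetical resolver: env var, then config, then default."""
    env = os.environ if environ is None else environ

    # .strip() makes a set-but-empty env var fall through to config.
    raw = (env.get("HERMES_RESTART_DRAIN_TIMEOUT") or "").strip()
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # assumption: an unparseable env value also falls through

    value = config.get("agent", {}).get("restart_drain_timeout")
    if value is not None:
        try:
            return float(value)
        except (TypeError, ValueError):
            pass

    return DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
```

The divergence fixed here is visible in the second check of the usage below: a whitespace-only env var no longer shadows the config value.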
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.
Add to /api/status (additive, non-breaking):
- active_agents — in-flight gateway-turn count (now refreshed
per-turn by the companion gateway-side commit)
- gateway_busy — running AND active_agents > 0
- gateway_drainable — running and live (a valid begin-drain target)
- restart_drain_timeout — resolved seconds, so the consumer can size its
poll deadline without out-of-band knowledge
(env HERMES_RESTART_DRAIN_TIMEOUT → config
agent.restart_drain_timeout → default)
The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.
Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
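A rough sketch of the busy/drainable derivation described above. The function names come from this message, but the signatures, the "running" state string, and the count coercion are assumptions made for illustration.

```python
def _coerce_count(raw) -> int:
    # Assumption: a corrupt count degrades to 0, never to a spurious "busy".
    try:
        return max(0, int(raw))
    except (TypeError, ValueError, OverflowError):
        return 0


def derive_gateway_drainable(state, gateway_running) -> bool:
    """Running and live: a valid begin-drain target.

    Liveness comes from gateway_running (a live PID/health probe),
    not from any timestamp in the status file.
    """
    return bool(gateway_running) and state == "running"


def derive_gateway_busy(state, gateway_running, active_agents) -> bool:
    """Running, live, and at least one in-flight gateway turn."""
    return (
        derive_gateway_drainable(state, gateway_running)
        and _coerce_count(active_agents) > 0
    )
```

A stale status file that still says "running" with a positive count yields `False` once the liveness probe fails, which is the safe direction for a consumer deciding whether to wait.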
Follow-up to the salvaged preflight-compression warning:
- Replace silent `except Exception: pass` at all 5 guard call sites
(cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with
`logger.debug(...)` so signature drift in the guard helper isn't hidden.
- tui_gateway/server.py: set the confirm dict's `warning` field to the
merged message (was bare expensive-model text) so it matches
`confirm_message` for any future consumer reading `warning`.
- Add trailing newlines to the two new files.
Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard
pattern. When a user switches models mid-session (Herm TUI picker, CLI,
or /model on Telegram/Discord) and the new model's compression threshold
is below the current session size, a warning surfaces on the existing
ModelSwitchResult.warning_message path used by the expensive-model
guard.
Partial fix for #23767 — addresses only the 'user-facing guardrail
when switching from a high-context provider to a substantially
lower-context provider' slice. The other proposed fixes from that
issue (hard preflight token guard, metadata cache invalidation on
switch, compression safety invariant, oversized tool-output handling)
are out of scope for this PR.
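The guard's core check can be sketched as below. Everything here is hypothetical: the function name, the idea that the compression threshold is a fraction of the context length, and the warning wording are assumptions, since this message does not describe the guard's internals.

```python
from typing import Optional


def context_switch_warning(
    session_tokens: int,
    new_context_length: int,
    compression_threshold: float = 0.5,  # assumed: fraction of the context window
) -> Optional[str]:
    """Return a warning string when the session already exceeds the
    new model's compression threshold, else None."""
    threshold_tokens = int(new_context_length * compression_threshold)
    if session_tokens > threshold_tokens:
        return (
            f"Current session (~{session_tokens} tokens) exceeds the new model's "
            f"compression threshold (~{threshold_tokens} tokens)."
        )
    return None
```

Returning a plain string keeps the check easy to merge with an existing warning message on the same result object.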
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now`
and returned success, relying on the scheduler ticker to actually execute the
job on its next tick. When no gateway/ticker is running — a CLI-only setup, or
the Windows case in #41037 — the job never executed: `run` reported success,
but `last_run_at` stayed null forever, no output, no delivery.
A manual `run` should actually run. `_execute_job_now` now:
- **claims the job via `claim_job_for_fire`** — the same at-most-once CAS the
scheduler/external-provider fire path uses. This both advances `next_run_at`
for recurring jobs and blocks a concurrently-running gateway ticker from
double-firing the same job; if the claim is lost, the run is skipped (the
tool reports `execution_skipped`). This closes the double-fire race that a
bare `advance_next_run` left open (a tick whose `get_due_jobs` already
captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`** — the single shared
execute→save→deliver→mark body the ticker and external providers use — so
failure delivery, `[SILENT]` handling, and live-adapter delivery stay
identical across paths and can't drift. (The original salvage re-implemented
this sequence inline and had already dropped failure delivery + `[SILENT]`.)
The tool response carries `executed`, `execution_success`, and either
`execution_error` or `execution_skipped`. The `hermes cron run` CLI message no
longer claims "It will run on the next scheduler tick" — it reports the actual
"Ran now: succeeded/failed" outcome (or the skip).
Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse
`claim_job_for_fire` + `run_one_job` per review rather than re-implementing the
fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip,
failure reporting, and exception capture.
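The claim-then-fire flow above, reduced to a sketch. The `claim` and `fire` callables are stand-ins for `claim_job_for_fire` and `run_one_job`, whose real signatures are not shown in this message; the response keys are taken from the text, but their exact values are assumptions.

```python
from typing import Callable


def execute_job_now(
    job_id: str,
    claim: Callable[[str], bool],  # stand-in for the at-most-once claim
    fire: Callable[[str], bool],   # stand-in for the shared fire body
) -> dict:
    """Hypothetical manual-run path: claim first, fire only if the claim wins."""
    if not claim(job_id):
        # Another firer (e.g. a running ticker) already took this run.
        return {"executed": False, "execution_skipped": "claim lost"}

    try:
        ok = bool(fire(job_id))
    except Exception as exc:
        # Exceptions are captured and reported, not propagated.
        return {
            "executed": True,
            "execution_success": False,
            "execution_error": str(exc),
        }

    result = {"executed": True, "execution_success": ok}
    if not ok:
        result["execution_error"] = "job reported failure"
    return result
```

Claiming before firing is what closes the double-fire race: whichever path loses the claim skips, so the job runs at most once per trigger.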
Fixes #41037
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The in-process cron ticker (cron/scheduler_provider.py) caught only
`Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt`
raised from a misbehaving provider SDK or agent retry path killed the
ticker thread silently. The gateway PROCESS stayed up, so `hermes cron
status` — which only checks `find_gateway_pids()` — kept reporting
"✓ jobs will fire automatically" while no jobs ever fired (#32612,
#32895).
This makes ticker death survivable and detectable:
- The ticker loop now catches `BaseException` and logs at ERROR with a
traceback, so a single bad tick no longer tears the thread down and
the failure is visible in the gateway log.
- The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds)
on startup and after every tick — best-effort, never raised into the
loop. Both ticker entry points (the gateway and the desktop fallback
in web_server.py) funnel through `InProcessCronScheduler.start`, so one
heartbeat site covers both.
- `hermes cron status` now reads the heartbeat age: if the gateway is
running but the heartbeat is stale (> 200s, i.e. several missed ~60s
ticks), it reports the ticker as STALLED and suggests a restart instead
of falsely claiming jobs will fire. A missing heartbeat (older build /
never ran) is treated as "unknown", not "dead".
Adds tests for BaseException survival, per-iteration heartbeat recording,
heartbeat round-trip/age, staleness detection, and silent-write-failure.
Salvaged from #49660 (BaseException survival on current structure),
extended with the heartbeat + honest-status reporting that the earlier
(pre-refactor) watchdog PRs #35616 and #33849 proposed.
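A compact sketch of the two mechanisms above: a loop that survives `BaseException`, and a status check based on heartbeat age. All names are hypothetical, the sleep between ticks is omitted, and the status strings are assumptions; only the 200s staleness bound and the "missing heartbeat is unknown" rule come from this message.

```python
import time
from typing import Callable, Optional

HEARTBEAT_STALE_SECONDS = 200  # several missed ~60s ticks


def safe_heartbeat(record: Callable[[], None]) -> None:
    # Best-effort: a failed heartbeat write must never reach the loop.
    try:
        record()
    except Exception:
        pass


def run_ticker(tick, record_heartbeat, should_stop, log_error) -> None:
    """Hypothetical ticker loop that outlives SystemExit/KeyboardInterrupt."""
    safe_heartbeat(record_heartbeat)  # heartbeat on startup
    while not should_stop():
        try:
            tick()
        except BaseException as exc:  # not just Exception
            log_error(exc)
        safe_heartbeat(record_heartbeat)  # heartbeat after every tick


def ticker_status(
    gateway_running: bool,
    heartbeat_epoch: Optional[float],
    now: Optional[float] = None,
) -> str:
    if not gateway_running:
        return "not running"
    if heartbeat_epoch is None:
        return "unknown"  # older build or never ran: not treated as dead
    current = time.time() if now is None else now
    age = current - heartbeat_epoch
    return "stalled" if age > HEARTBEAT_STALE_SECONDS else "ok"
```

Keeping the status check a pure function of the heartbeat timestamp makes the staleness rule easy to test without a running gateway.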
Fixes #32612
Fixes #32895
Co-authored-by: banditburai <promptsiren@gmail.com>
Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>
The terminal backend onboarding step pointed at
/docs/developer-guide/environments, which no longer exists. Point it at
the live docs page /docs/user-guide/configuration#terminal-backend-configuration.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
resolve_spotify_runtime_credentials() called _refresh_spotify_oauth_state()
without a try/except, so a terminal failure (HTTP 400/401, invalid_grant,
refresh_token_reused) raised AuthError but left the dead refresh_token in
auth.json. Every subsequent session re-read and retried the same token over
the network, failing identically each time.
Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_at, expires_in, obtained_at) and write a
last_auth_error quarantine marker to auth.json before re-raising. The next
call sees no access_token and fails fast with spotify_access_token_missing —
no network retry — and the user is prompted to re-authenticate.
Mirrors the quarantine pattern already in place for Nous, xAI-OAuth,
Codex-OAuth (#28116, #28118), and MiniMax-OAuth (#28119).
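The quarantine step described above, as a self-contained sketch. The `AuthError` class here is a local stand-in, and `refresh_with_quarantine`, the `refresh`/`save` callables, and the shape of the `last_auth_error` marker are assumptions; the cleared field names and the `relogin_required` condition come from this message.

```python
class AuthError(Exception):
    """Local stand-in for the project's AuthError."""

    def __init__(self, message: str, relogin_required: bool = False):
        super().__init__(message)
        self.relogin_required = relogin_required


DEAD_OAUTH_FIELDS = (
    "access_token", "refresh_token", "expires_at", "expires_in", "obtained_at",
)


def refresh_with_quarantine(state: dict, refresh, save) -> dict:
    """Hypothetical wrapper: clear dead tokens on a terminal refresh failure."""
    try:
        return refresh(state)
    except AuthError as exc:
        if exc.relogin_required and state.get("refresh_token"):
            for field in DEAD_OAUTH_FIELDS:
                state.pop(field, None)
            # Marker shape is an assumption; the source only names the key.
            state["last_auth_error"] = str(exc)
            save(state)  # persist before re-raising
        raise
```

After one terminal failure the stored state no longer holds a token to retry, so later calls can fail fast without touching the network.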
Three state-loss bugs at the compression rotation boundary, fixed together
because they all live in the same ~80-line rotation block:
- #33618: a persistent /goal did not follow the rotation. load_goal does a
flat per-session lookup with no lineage walk, so a goal silently died when
compression minted a fresh child id. Added migrate_goal_to_session() and
call it after the child session is created (move-not-copy: the parent row
is archived as cleared so exactly one active goal row exists).
- #33906/#33907: if the child create_session raised (FK constraint,
contended write), the outer handler only warned and let the agent continue
on the NEW id — which has no row in state.db — producing an orphan session.
Now the rotation rolls agent.session_id back to the still-indexed parent
(reopening it) instead of stranding the conversation on a phantom id.
- #27633: the compaction-boundary on_session_start notification omitted the
platform kwarg, so context-engine plugins saw source=unknown for every
message after the boundary. Forward platform (matching the initial
session-start call in agent_init.py).
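The rollback from the second bullet above can be sketched like this. The `RotationAgent` class, the `rotate_session` helper, and the `create_session`/`reopen_session` callables are hypothetical stand-ins; the actual rotation block is not shown in this message.

```python
import uuid


class RotationAgent:
    """Hypothetical stand-in holding only the session id."""

    def __init__(self, session_id: str):
        self.session_id = session_id


def rotate_session(agent, create_session, reopen_session, new_id=None) -> str:
    """Move to a fresh child session, or roll back to the parent on failure."""
    parent_id = agent.session_id
    child_id = new_id or uuid.uuid4().hex
    agent.session_id = child_id
    try:
        create_session(child_id, parent_id=parent_id)
    except Exception:
        # The child row was never written: return to the still-indexed
        # parent rather than continuing on a phantom id.
        agent.session_id = parent_id
        reopen_session(parent_id)
        return parent_id
    return child_id
```

The point of the rollback is that the agent's id always refers to a session that actually exists in the store, whichever branch is taken.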
Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com>
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>