* fix(api-server): stop silently promising async delivery on stateless HTTP path
terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.
There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:
- Add a per-adapter capability flag supports_async_delivery (default True;
APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
contextvar via async_delivery_supported(). Toggle on the adapter, not a
hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
notify_on_complete off, and return a notify_unsupported note telling the
agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
execution (work runs and returns in the same response) with a note, instead
of handing out a handle that never resolves.
CLI (in-process completion_queue) and the real gateway platforms are unchanged.
Fixes#10760
* refactor(api-server): route session binding through a single no-delivery chokepoint
Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.
The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
A turn forcibly interrupted by the drain-timeout escalation never reaches
turn_finalizer.finalize_turn (the only place that flushes the turn to
state.db). Its in-flight tool rounds live only in the in-memory
_session_messages, so the immediate pre-restart turn was silently dropped
from load_transcript() on resume.
_finalize_shutdown_agents now flushes _session_messages to the SQLite
session store before teardown. The flush is idempotent (identity-tracked
in _flush_messages_to_session_db), so agents that finished gracefully
re-flush nothing. The resume_pending / fresh-tool-tail branches in
_handle_message_with_agent already expect a transcript whose tail may be a
pending tool result.
Fixes#13121.
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.
Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:
- cache_image/audio/video_from_bytes reject oversized payloads before writing
(video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
Content-Length header and re-checking the running total per chunk so an
absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
attachment is rejected before any bytes are pulled into memory.
Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.
Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.
end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.
Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.
Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.
Closes#12029.
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.
The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.
The value remains fully configurable via compression.hygiene_hard_message_limit.
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).
- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
counting both the chat/responses in-flight counter and the /v1/runs
stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.
Closes#7483.
Second cleanup pass (simplify-code review of the first follow-up):
- write_runtime_status now clamps active_agents via parse_active_agents
instead of an inline max(0, int(...)). Removes the duplicated clamp the
helper's docstring acknowledged AND closes a write-side ValueError gap
(a non-numeric active_agents previously raised; now degrades to 0).
- hermes_cli/gateway.py draining-status line routes its active-agents count
through parse_active_agents too — the third coercion site of the same
persisted field, now consistent and non-raising with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
(a real float) instead of a blanket 'except Exception -> None'. None would
have violated the surfaced field's int/float contract and stripped NAS's
poll-deadline hint silently.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
handles the empty/None case) and tightened the parse_active_agents docstring
to describe the actual single-contract role (write + both reads).
Follow-up cleanups on top of the busy/idle readout (PR #50103):
- web_server.py /api/status reused the single drain-timeout resolver
hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
env -> agent.restart_drain_timeout config -> default) instead of inlining a
third hand-rolled copy of that precedence chain. Also fixes a subtle
divergence: the inline copy used os.environ.get() so a set-but-empty env var
was treated as a value rather than falling through to config; the shared
resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
(/api/status and /health/detailed) through it, so the exposed active_agents
field is consistently clamped non-negative. Previously /api/status clamped
while /health/detailed exposed the raw file value, diverging on a corrupt
count.
- Added TestParseActiveAgents covering the shared coercion contract.
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.
Add to /api/status (additive, non-breaking):
- active_agents — in-flight gateway-turn count (now refreshed
per-turn by the companion gateway-side commit)
- gateway_busy — running AND active_agents > 0
- gateway_drainable — running and live (a valid begin-drain target)
- restart_drain_timeout — resolved seconds, so the consumer can size its
poll deadline without out-of-band knowledge
(env HERMES_RESTART_DRAIN_TIMEOUT → config
agent.restart_drain_timeout → default)
The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.
Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
The gateway only rewrote gateway_state.json on lifecycle transitions
(start/connect/drain/stop), never on turn start/end. Live-verified on a
hosted agent: a confirmed end-to-end turn ran while gateway_updated_at
stayed frozen at boot and active_agents was absent — so any active_agents
read from the file between transitions is stale. That makes it unusable
as a busy/idle signal for an external consumer (NAS deciding whether it's
safe to restart/migrate/auto-update an agent mid-turn).
Add _persist_active_agents(), called at every turn boundary:
- turn start: both running-agent sentinel-claim sites (normal inbound
message path + startup-resume path)
- turn end: the central _release_running_agent_state() choke point
(covers normal completion, /stop, /reset, sentinel cleanup,
stale-eviction — every path that ends a running turn)
It passes ONLY active_agents to write_runtime_status, leaving
gateway_state (and every other field) _UNSET so the read-merge-write
preserves the current lifecycle state. Passing gateway_state=None would
clobber it — hence a dedicated helper rather than reusing
_update_runtime_status. The write is the same cheap JSON write done on
lifecycle transitions today; best-effort (a failed status write never
disrupts a turn).
Behaviour-contract test: an active_agents-only write preserves both
running and draining gateway_state, and the count clamps non-negative.
Follow-up to the salvaged preflight-compression warning:
- Replace silent `except Exception: pass` at all 5 guard call sites
(cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with
`logger.debug(...)` so signature drift in the guard helper isn't hidden.
- tui_gateway/server.py: set the confirm dict's `warning` field to the
merged message (was bare expensive-model text) so it matches
`confirm_message` for any future consumer reading `warning`.
- Add trailing newlines to the two new files.
Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard
pattern. When a user switches models mid-session (Herm TUI picker, CLI,
or /model on Telegram/Discord), the warning surfaces on the existing
ModelSwitchResult.warning_message path used by the expensive-model
guard if the new model's compression threshold is below the current
session size.
Partial fix for #23767 — addresses only the 'user-facing guardrail
when switching from a high-context provider to a substantially
lower-context provider' slice. The other proposed fixes from that
issue (hard preflight token guard, metadata cache invalidation on
switch, compression safety invariant, oversized tool-output handling)
are out of scope for this PR.
After context compression, the agent re-sent an already-delivered
generated image on every subsequent turn (#46627). The auto-append
fallback rescans full history when the message list shrinks (compression-
safe path), deduping against _history_media_paths — but that set was built
by scanning ONLY MEDIA: text tags in tool results. image_generate returns
its path in a JSON payload field (host_image/image/agent_visible_image),
never a MEDIA: tag, so generated-image paths never entered the dedup set
and were re-emitted after the boundary.
Extract the history-path collection into _collect_history_media_paths(),
which now covers BOTH delivery shapes: MEDIA: text tags AND image_generate
JSON-payload paths (mirroring what _collect_auto_append_media_tags
extracts). The inline block in _handle_message is replaced with a call to
the helper.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Gateway Session Hygiene auto-compression destroyed the original transcript
when the throwaway hygiene agent couldn't rotate the session (#21301, P1).
The _hyg_agent is built WITHOUT a session_db, so _compress_context cannot
end-and-fork the session (its rotate block is gated on agent._session_db).
The session_id stays unchanged, and the rewrite_transcript() call ran
UNCONDITIONALLY — replacing the full original transcript with just the
head+summary list. Permanent data loss on every hygiene compaction.
Guard the rewrite behind 'rotated OR in-place' exactly like the /compress
path already does (#44794/#39704): only overwrite when a new session id
was minted or in-place compaction succeeded; otherwise preserve the
original transcript and log a warning. The token/count bookkeeping that
followed the rewrite is moved inside the guard, with no-change values in
the preserve branch.
Co-authored-by: SandroHub013 <sandrohub013@gmail.com>
Co-authored-by: WuTianyi123 <wtyopenclaw@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
When the agent is busy and the user sends multiple text follow-ups, the
interrupt-mode and steer-fallback path stored them via
merge_pending_message_event(merge_text=True), which newline-joins
consecutive TEXT messages into a SINGLE pending turn — collapsing two
separate user messages into one mashed-together turn and destroying the
message boundaries the user sees (#43066 sub-bug 2).
Route that storage through _queue_or_replace_pending_event (the same FIFO
infrastructure used by busy queue-mode and /queue) so each follow-up gets
its own next-turn slot in arrival order, while still preserving
photo-burst / album merge semantics for media. Pure queue-mode already
used FIFO; this brings the interrupt/steer-fallback path in line.
The sibling defect in #43066 (assistant messages lost after compaction)
was already fixed on main by the identity-tracking flush rewrite (#46053)
plus the pre-rotation flush (#47202), so this only addresses the
remaining busy-message-merge half.
Co-authored-by: KiruyaMomochi <65301509+KiruyaMomochi@users.noreply.github.com>
In Docker the install tree (/opt/hermes) is read-only, so npm install for
the WhatsApp bridge fails with EACCES. Add resolve_whatsapp_bridge_dir() in
whatsapp_common.py: when the install dir is read-only, mirror the bridge
source into a writable HERMES_HOME location and use that. Both the
adapter and the 'hermes whatsapp' CLI resolve through the shared helper so
the install and runtime paths agree.
Fixes#49561
Async-delegation completions (delegate_task(background=true)) and
background-process completions (terminal notify_on_complete) re-enter the
originating session as internal MessageEvents. When the session was busy,
_handle_active_session_busy_message treated them like a user TEXT message and
the default busy_input_mode='interrupt' aborted the active turn (and sent a
'Interrupting current task' ack) — the opposite of the design invariant that a
completion surfaces as a new turn only when idle.
Short-circuit internal events to return False so the base adapter queues them
silently (it already excludes internal events from debounce), cascading them as
the next turn after the current one finishes.
Review (Codex + 3-agent parallel) found the first cut of in-place mode was
incomplete: it only updated the system prompt, so the persisted transcript
stayed 'full history + summary' and the next turn/resume reloaded the full
history and immediately re-compacted (a loop), and every downstream layer
that keyed off session-id rotation silently no-op'd. The session_id was
doing double duty as the 'compaction happened' signal. This wires the whole
path so removing rotation is actually complete:
Agent (agent/conversation_compression.py):
- In-place now DURABLY replaces the transcript: replace_messages(session_id,
compressed) on the same row (the canonical store the gateway reloads from),
not just update_system_prompt. Resume reloads the compacted set; no loop.
- Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids
cleared) so next-turn appends diff against the compacted transcript.
- Expose a rotation-independent signal: agent._last_compaction_in_place, and
in_place=True on the session:compress event.
- Fire the compaction-boundary hooks (context-engine on_session_start, memory
manager on_session_switch, reason='compression') in BOTH modes — in-place
passes the same id as parent so DAG/buffer state still checkpoints. Without
this, memory/context plugins miss every in-place compaction.
Gateway auto-compress (gateway/run.py):
- Read agent._last_compaction_in_place; set history_offset=0 on rotation OR
in-place (both return the compacted set, so slicing past the pre-compaction
length would drop everything). Carry compacted_in_place in the result dict.
- No extra rewrite needed: the agent shares the gateway's SessionDB, so its
replace_messages already updated the canonical store load_transcript reads.
Manual /compress (gateway/slash_commands.py):
- The throwaway /compress agent has no _session_db, so rewrite_transcript is
the durable write. Previously gated behind 'if rotated:' which treated
'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite
— making /compress a silent no-op in in-place mode. Now rewrites on rotated
OR in_place; the data-loss guard still fires only for the genuine
no-rotation-AND-not-in-place failure.
Hygiene auto-compress already writes _compressed to the same id
unconditionally (its agent has no _session_db, can't rotate) — correct for
in-place, no change.
Tests (tests/run_agent/test_in_place_compaction.py):
- Assert the DURABLE transcript IS the compacted set after reload
(get_messages_as_conversation == compacted), message_count==2, flush
identity reset, and the rotation-independent signal set on in-place /
unset on rotation. Rotation regression guard unchanged.
Verified: 64 tests green across in-place + rotation/persistence/boundary/
concurrent/failure-sync/command/cli suites; E2E both modes (durable replace,
gateway offset=0, rotation preserves old transcript); ruff clean. Still
default-off.
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element — so once
more than the cap (500) outbound timestamps are tracked, a still-recent
timestamp could be dropped while older ones survive, missing a genuine
reply-to-own-message. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.
Adds a regression test asserting oldest-first eviction + MRU promotion.
Review follow-up on the salvaged self-mention strip (#31217): the original
only stripped the bot's rendered @<number>/@<uuid> self-mention inside the
`require_mention=true` branch, so groups with require_mention=false still
leaked it into the agent text. Hoist the strip to run for every group message
(fixing the whole bug class), and collapse the doubled space a mid-sentence
removal leaves while preserving intentional newlines.
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.
Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:
- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts
Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
Review follow-up on the salvaged AAC + markdown changes:
- Fix an inaccurate comment claiming the STT layer has a sniff-and-remux
fallback (verified: no such fallback exists; the ffmpeg-absent path caches
raw ADTS and STT may reject it).
- Type the _markdown_to_signal wrapper as tuple[str, list[str]] to match the
shared helper instead of a bare tuple.
- Replace the hardcoded /home/pi/... test fixture with a runtime-generated
ADTS AAC sample so the remux round-trip actually runs in CI (skips only
when ffmpeg is absent) instead of always-skipping.
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.
Two changes:
1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
`layer`, and `protection_absent` into bits 3-0 of byte 1, where
`ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
`layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
cleanly separates them.
2. Remux raw ADTS AAC to MP4 container at the cache step via
`ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
that all major STT providers accept. ffmpeg is a transitive
dependency of many other Hermes features (TTS, video skills) so
this isn't a new install requirement; the remux degrades
gracefully to a no-op if ffmpeg is missing.
The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.
Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
When a tool call itself restarts the gateway (docker restart, systemctl
restart, and similar), the process is terminated mid-call — before the
tool result is persisted and before the orderly drain rewind can run. The
transcript tail is left as an assistant(tool_calls) with no matching tool
answer. On resume the model re-issues the unanswered call, taking the
gateway down again — an infinite loop (#49201).
Source fix: _build_gateway_agent_history now strips a trailing
assistant(tool_calls) block that has no tool answers
(_strip_dangling_tool_call_tail), so there is nothing for the model to
re-execute. This complements _strip_interrupted_tool_tails, which only
handles the case where a tool result row exists with an interrupt marker.
Cognitive backstop: the resume-pending system note now states that any
restart command in the history already ran and must not be re-executed or
verified, and the empty-message auto-resume startup turn reports recovery
and asks for instructions instead of the nonsensical "address the user's
NEW message" (there is no new message on that turn).
Reimplements the intent of #49243 by @JoaoMarcos44 at the replay layer.
Fixes#49201
#49066 made /model text and the CLI picker persist to config.yaml by
default, but the gateway (Telegram/Discord/Matrix) inline-keyboard picker
callback stayed session-only. Mirror the text path's persist block so a
tapped model survives across launches like a typed one.
Behavior-preserving cleanups on the managed-node resolver:
- Hoist _candidate_node_command_names() out of the inner dir loop in
find_hermes_node_executable (computed once, not per directory).
- Drop redundant os.environ.copy() at the two with_hermes_node_path(
os.environ.copy()) sites \u2014 the helper already copies os.environ when
called with no argument (verified env-equivalent).
- Add reciprocal keep-in-sync comments between iter_hermes_node_dirs()
(hermes_constants.py) and hermesManagedNodePathEntries() (electron
main.cjs), which mirror the same platform-ordering rule across the
Python/Node boundary.
Auto-generated session titles already rename the Telegram forum topic via
the title_callback path, but the /title command only wrote the session
title to the database. On a Telegram topic lane the visible topic kept its
auto-assigned name, so a user who ran /title to override it saw no change.
Propagate the user-chosen title to the topic by calling the existing
_schedule_telegram_topic_title_rename helper on a successful /title set. It
already no-ops off Telegram topic lanes and when auto-rename is disabled.
Third review pass (Hermes subagent) declared convergence: no BLOCKING, the
round-2 generation-aware publish / context-engine staging / CLI reload / ACP
routing all verified correct by hand and by test.
- agent_init: capture _tool_snapshot_generation immediately before the tool
snapshot (was ~425 lines earlier); removes a harmless skew window so the
recorded generation always matches the snapshot it describes.
- gateway/run.py _execute_mcp_reload: keep preserving each cached agent's
build-time enabled_toolsets EXACTLY (do NOT merge newly-connected servers like
CLI/TUI do) and document WHY — gateway sessions can be deliberately locked
down, and test_reload_mcp_preserves_per_agent_toolset_overrides asserts this.
A reviewer suggested "parity" here; it would have violated that contract.
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:
1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
commonly take 2-6s on a cold connect, so they missed the window and
their tools never entered the agent's snapshot. `thread.join(timeout)`
already returns the instant discovery completes, so raising the bound
costs ~0s for the common case (no MCP / fast servers) and only ever
blocks for a genuinely-pending server, capped so a dead server can't
freeze startup. The bound is now configurable via
`mcp_discovery_timeout` (config.yaml, default 5.0s).
2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
`reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
thread), and the late-refresh detected changes by tool COUNT — missing
an equal-size add/remove swap. Consolidated into one shared
`tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
tool NAME, mutates the agent under a lock (thread-safe), and respects
the agent's own enabled/disabled toolsets.
The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.
Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
Sets the Telegram bot's short description (the line under its name) to
"Online" on gateway connect and "Offline" on clean disconnect, gated
behind extra.status_indicator (off by default).
Telegram bots have no presence/online dot — that's a user-account
feature the Bot API doesn't expose for bots. The short description is
the closest available surface, so this gives users a way to tell whether
the gateway is up from the bot's profile.
- New extra.status_indicator flag (+ status_online/status_offline text
overrides), read in __init__ via config.extra — no config-schema change.
- _set_status_indicator() helper: best-effort, swallows API errors so it
never blocks connect/disconnect; truncates to Telegram's 120-char cap.
- Wired Online after _mark_connected(), Offline at top of disconnect()
while the bot HTTP client is still alive.
- 9 unit tests + Telegram docs section.
Requested by @ilTrumpista, cc @Teknium.
Manual verification surfaced a second bypass class beyond the standalone
config loaders: several code paths bridge config.yaml values into os.environ
(HERMES_TIMEZONE, HERMES_REDACT_SECRETS, HERMES_MAX_ITERATIONS, TERMINAL_*,
network.force_ipv4, ...) by reading the raw user YAML, so the env the whole
process reads carried the USER's value even when an administrator pinned it —
e.g. a managed timezone was overridden because gateway/run.py wrote the user's
timezone into HERMES_TIMEZONE, and _resolve_timezone_name() checks the env var
first.
Wired the shared apply_managed_overlay() into every config→env bridge:
- gateway/run.py module-level startup bridge (timezone, redact_secrets,
max_turns, terminal, display, gateway.strict, ...)
- gateway/run.py _reload_runtime_env_preserving_config_authority (the per-turn
re-bridge that keeps config authoritative over reloaded .env — must keep
MANAGED authoritative on every turn, not just startup)
- hermes_cli/main.py early security.redact_secrets / network.force_ipv4 bridge
(runs before load_config is usable, at import time)
- hermes_cli/send_cmd.py top-level scalar config→env bridge
Verified end-to-end against a writable managed dir (12/12 checks incl. timezone,
logging, model, skin, gateway settings, write-guard) and in a clean process the
gateway per-turn bridge writes HERMES_TIMEZONE=<managed>. Adds an
order-independent regression test for the bridge overlay.
The skin bug was one instance of a class: several subsystems build their
config dict directly from config.yaml instead of routing through
hermes_cli.config.load_config (which carries the managed merge), so they
silently ignored administrator-pinned values. Audited every config.yaml
reader and fixed the behavioral-read bypasses:
- gateway/config.py load_gateway_config (messaging gateway: session_reset,
quick_commands, stt, model, ...)
- gateway/run.py _load_gateway_config (its read_raw_config fast path also
skipped the merge — read_raw_config returns raw user YAML)
- tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin,
reasoning_effort, service_tier, provider_routing)
- cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing)
- hermes_logging.py (logging.level/max_size_mb/backup_count)
- hermes_time.py (timezone)
- hermes_cli/doctor.py (memory-provider diagnostic reads effective config)
All route through a new shared managed_scope.apply_managed_overlay() helper
that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't
shadow a managed literal, root-model-string normalization, leaf-merge) and is
fail-open. cli.py's earlier inline fix is refactored onto the same helper.
Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile
distribution) are deliberately left reading raw user YAML — overlaying managed
values there would persist them into the user file. The dashboard
(web_server.py) already routes through load_config and needed no change.
TUI loader caches the RAW config so _save_cfg never writes managed values to
disk. Adds test_managed_scope_overlay.py (helper) and
test_managed_scope_loaders.py (per-surface integration); mutation-checked.
ruff (unspecified-encoding) and the Windows-footgun checker both flag
open() in text mode without encoture=. Keep text mode (the Windows lock
path in _try_acquire_file_lock writes a str newline) and pass
encoding='utf-8'.
Two robustness gaps from community review (#44919):
1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status
_try_acquire_file_lock / _release_file_lock — already cross-platform
(msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock
helper.
2. Lock fd never released: stored handle is now released explicitly in
both exit paths — CancelledError handler and normal while-loop exit.
Allows in-process stop/restart (tests, embedded use).
Also tightened docstrings — 'corrupt the SQLite DBs' is now specific
(wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt
index pages), matching the module's own concurrency claims.
The gateway's embedded dispatcher has no guard against more than one dispatcher
running concurrently. dispatch_in_gateway defaults to true, so a second gateway
for the same profile (a restart race where the old process is slow to exit) — or
any deployment that runs multiple profile gateways with the default — starts a
second dispatcher loop. As #41448 describes, concurrent dispatchers each run
release_stale_claims() against the same boards, double reclaim frequency, and
re-dispatch slow workers before they finish. In practice they also corrupt the
shared kanban SQLite DBs under concurrent write load.
Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the
machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is
shared across profiles by design, so this serialises every gateway, not just one
profile). The first gateway to start its dispatcher holds the lock for its
process lifetime; any other gateway finds it contended, logs, and skips
dispatching while still running for messaging. Falls back to config-only control
on non-POSIX or filesystems without flock.
This is more robust than a per-profile guard because the documented model is
"one dispatcher sweeps all boards" — the contention is across profiles, not just
within one. Closes#41448.
Test: lock is exclusive (held, then contended while held, then held again after
release).