Follow-up cleanups on top of the busy/idle readout (PR #50103):
- web_server.py /api/status reused the single drain-timeout resolver
hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
env -> agent.restart_drain_timeout config -> default) instead of inlining a
third hand-rolled copy of that precedence chain. Also fixes a subtle
divergence: the inline copy used os.environ.get() so a set-but-empty env var
was treated as a value rather than falling through to config; the shared
resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
(/api/status and /health/detailed) through it, so the exposed active_agents
field is consistently clamped non-negative. Previously /api/status clamped
while /health/detailed exposed the raw file value, diverging on a corrupt
count.
- Added TestParseActiveAgents covering the shared coercion contract.
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.
Add to /api/status (additive, non-breaking):
- active_agents — in-flight gateway-turn count (now refreshed
per-turn by the companion gateway-side commit)
- gateway_busy — running AND active_agents > 0
- gateway_drainable — running and live (a valid begin-drain target)
- restart_drain_timeout — resolved seconds, so the consumer can size its
poll deadline without out-of-band knowledge
(env HERMES_RESTART_DRAIN_TIMEOUT → config
agent.restart_drain_timeout → default)
The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.
Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
The gateway only rewrote gateway_state.json on lifecycle transitions
(start/connect/drain/stop), never on turn start/end. Live-verified on a
hosted agent: a confirmed end-to-end turn ran while gateway_updated_at
stayed frozen at boot and active_agents was absent — so any active_agents
read from the file between transitions is stale. That makes it unusable
as a busy/idle signal for an external consumer (NAS deciding whether it's
safe to restart/migrate/auto-update an agent mid-turn).
Add _persist_active_agents(), called at every turn boundary:
- turn start: both running-agent sentinel-claim sites (normal inbound
message path + startup-resume path)
- turn end: the central _release_running_agent_state() choke point
(covers normal completion, /stop, /reset, sentinel cleanup,
stale-eviction — every path that ends a running turn)
It passes ONLY active_agents to write_runtime_status, leaving
gateway_state (and every other field) _UNSET so the read-merge-write
preserves the current lifecycle state. Passing gateway_state=None would
clobber it — hence a dedicated helper rather than reusing
_update_runtime_status. The write is the same cheap JSON write done on
lifecycle transitions today; best-effort (a failed status write never
disrupts a turn).
Behaviour-contract test: an active_agents-only write preserves both
running and draining gateway_state, and the count clamps non-negative.
The native echo recovery handles replies to most rich messages, but
messages sent before the bot's first rich send have no echo to read.
record() was only called on the fresh-send path (_try_send_rich); a
streamed final finalized via _try_edit_rich/editMessageText was never
indexed, so a reply to it had neither a native echo nor an index entry.
Mirror the fresh-send record() into the edit success path to close
that gap.
Telegram DOES echo a rich message's content back in
reply_to_message.api_kwargs['rich_message']['blocks'] when a user
replies to it. Read that native field first in _build_message_event,
keeping the local send-time index only as a fallback. Duck-type
api_kwargs via .get() since it is a mappingproxy, not a dict.
Fixes#49534
After context compression, the agent re-sent an already-delivered
generated image on every subsequent turn (#46627). The auto-append
fallback rescans full history when the message list shrinks (compression-
safe path), deduping against _history_media_paths — but that set was built
by scanning ONLY MEDIA: text tags in tool results. image_generate returns
its path in a JSON payload field (host_image/image/agent_visible_image),
never a MEDIA: tag, so generated-image paths never entered the dedup set
and were re-emitted after the boundary.
Extract the history-path collection into _collect_history_media_paths(),
which now covers BOTH delivery shapes: MEDIA: text tags AND image_generate
JSON-payload paths (mirroring what _collect_auto_append_media_tags
extracts). The inline block in _handle_message is replaced with a call to
the helper.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
test_matrix_voice flaked in CI (6/7 failing on some shards, passing on
others and on main) depending on leaked MATRIX_REQUIRE_MENTION env state.
Root cause: the adapter defaults require_mention=True (falling back to the
MATRIX_REQUIRE_MENTION env var). These tests fire a group-room audio event
with no @mention, so _resolve_message_context drops it before dispatch
('No event was captured') whenever require_mention resolves True — which
happens in a clean shard, but an earlier test in another shard can leave
MATRIX_REQUIRE_MENTION=false in os.environ and mask it. The plugin
migration (#5600105478 adapter→bundled plugin) shifted shard composition
and exposed it.
Pin require_mention: False in the test adapter config so these media-TYPE
detection tests are no longer gated by the mention requirement, regardless
of ambient env. Verified: 7/7 pass with MATRIX_REQUIRE_MENTION=true (the
failing condition) AND with the env unset.
Gateway Session Hygiene auto-compression destroyed the original transcript
when the throwaway hygiene agent couldn't rotate the session (#21301, P1).
The _hyg_agent is built WITHOUT a session_db, so _compress_context cannot
end-and-fork the session (its rotate block is gated on agent._session_db).
The session_id stays unchanged, and the rewrite_transcript() call ran
UNCONDITIONALLY — replacing the full original transcript with just the
head+summary list. Permanent data loss on every hygiene compaction.
Guard the rewrite behind 'rotated OR in-place' exactly like the /compress
path already does (#44794/#39704): only overwrite when a new session id
was minted or in-place compaction succeeded; otherwise preserve the
original transcript and log a warning. The token/count bookkeeping that
followed the rewrite is moved inside the guard, with no-change values in
the preserve branch.
Co-authored-by: SandroHub013 <sandrohub013@gmail.com>
Co-authored-by: WuTianyi123 <wtyopenclaw@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The subagent-demotion busy-handler test asserted the internal
merge_pending_message_event call, which the FIFO refactor replaced with
_queue_or_replace_pending_event. Assert the behavioral outcome (the
follow-up lands in the pending slot for the next turn) instead — same
fix already applied to the two steer-fallback tests.
When the agent is busy and the user sends multiple text follow-ups, the
interrupt-mode and steer-fallback path stored them via
merge_pending_message_event(merge_text=True), which newline-joins
consecutive TEXT messages into a SINGLE pending turn — collapsing two
separate user messages into one mashed-together turn and destroying the
message boundaries the user sees (#43066 sub-bug 2).
Route that storage through _queue_or_replace_pending_event (the same FIFO
infrastructure used by busy queue-mode and /queue) so each follow-up gets
its own next-turn slot in arrival order, while still preserving
photo-burst / album merge semantics for media. Pure queue-mode already
used FIFO; this brings the interrupt/steer-fallback path in line.
The sibling defect in #43066 (assistant messages lost after compaction)
was already fixed on main by the identity-tracking flush rewrite (#46053)
plus the pre-rotation flush (#47202), so this only addresses the
remaining busy-message-merge half.
Co-authored-by: KiruyaMomochi <65301509+KiruyaMomochi@users.noreply.github.com>
Follow-up for salvaged #49654: unit tests for resolve_whatsapp_bridge_dir()
(writable passthrough, read-only mirror, existing-mirror reuse) and the
AUTHOR_MAP entry for the contributor.
Async-delegation completions (delegate_task(background=true)) and
background-process completions (terminal notify_on_complete) re-enter the
originating session as internal MessageEvents. When the session was busy,
_handle_active_session_busy_message treated them like a user TEXT message and
the default busy_input_mode='interrupt' aborted the active turn (and sent a
'Interrupting current task' ack) — the opposite of the design invariant that a
completion surfaces as a new turn only when idle.
Short-circuit internal events to return False so the base adapter queues them
silently (it already excludes internal events from debounce), cascading them as
the next turn after the current one finishes.
test_telegram_webhook_secret reads telegram adapter source by path; point it
at plugins/platforms/telegram/adapter.py. test_windows_native_support
npm-spawn parametrization referenced gateway/platforms/whatsapp.py; point it at
plugins/platforms/whatsapp/adapter.py.
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element — so once
more than the cap (500) outbound timestamps are tracked, a still-recent
timestamp could be dropped while older ones survive, missing a genuine
reply-to-own-message. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.
Adds a regression test asserting oldest-first eviction + MRU promotion.
The hardened echo ring (#31250) changes _recent_sent_timestamps from a set
to an OrderedDict, so the reply-detection-cache regression test from the quote
salvage can no longer call .discard(); route it through the new
_consume_sent_timestamp() helper, which is the real echo-removal path.
Review follow-up on the salvaged AAC + markdown changes:
- Fix an inaccurate comment claiming the STT layer has a sniff-and-remux
fallback (verified: no such fallback exists; the ffmpeg-absent path caches
raw ADTS and STT may reject it).
- Type the _markdown_to_signal wrapper as tuple[str, list[str]] to match the
shared helper instead of a bare tuple.
- Replace the hardcoded /home/pi/... test fixture with a runtime-generated
ADTS AAC sample so the remux round-trip actually runs in CI (skips only
when ffmpeg is absent) instead of always-skipping.
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.
Two changes:
1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
`layer`, and `protection_absent` into bits 3-0 of byte 1, where
`ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
`layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
cleanly separates them.
2. Remux raw ADTS AAC to MP4 container at the cache step via
`ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
that all major STT providers accept. ffmpeg is a transitive
dependency of many other Hermes features (TTS, video skills) so
this isn't a new install requirement; the remux degrades
gracefully to a no-op if ffmpeg is missing.
The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.
Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
When a tool call itself restarts the gateway (docker restart, systemctl
restart, and similar), the process is terminated mid-call — before the
tool result is persisted and before the orderly drain rewind can run. The
transcript tail is left as an assistant(tool_calls) with no matching tool
answer. On resume the model re-issues the unanswered call, taking the
gateway down again — an infinite loop (#49201).
Source fix: _build_gateway_agent_history now strips a trailing
assistant(tool_calls) block that has no tool answers
(_strip_dangling_tool_call_tail), so there is nothing for the model to
re-execute. This complements _strip_interrupted_tool_tails, which only
handles the case where a tool result row exists with an interrupt marker.
Cognitive backstop: the resume-pending system note now states that any
restart command in the history already ran and must not be re-executed or
verified, and the empty-message auto-resume startup turn reports recovery
and asks for instructions instead of the nonsensical "address the user's
NEW message" (there is no new message on that turn).
Reimplements the intent of #49243 by @JoaoMarcos44 at the replay layer.
Fixes#49201
Teams overrode send_image/send_image_file but not send_video, send_voice,
or send_document — so when the gateway dispatched a video/voice/document
reply to a Teams chat it fell through to the base-class text fallback and
sent the local file path as plain text (same broken-UX class as the LINE
URL-image gap in #49298).
Extract the existing send_image attachment logic into a shared
_send_media_attachment helper (remote URL by reference, local file as a
base64 data URI, MIME guessed from the path) and route all four media
kinds through it. 5 new tests cover remote-URL, local-file base64,
no-app, and missing-file paths.
Simplify pass on the picker-persist coverage:
- Stub list_picker_providers + resolve_display_context_length so the
tests no longer make real outbound HTTP calls (OpenRouter catalog +
Ollama /api/show) during picker setup and confirmation rendering.
Runtime drops from ~11s to ~0.4s and the tests are now deterministic.
- Collapse the two positive persist cases into one parametrize over the
config seed (nested-dict vs flat-string), asserting the nested-dict
invariant in both.
- Assert the in-memory session override is applied in the --session
case, closing a 'passes for the wrong reason' gap (config untouched
AND the switch still took effect).
- _FakePickerResult -> types.SimpleNamespace.
Mutation re-checked on the final test: both persist cases fail on
pre-fix slash_commands.py; the --session case passes on both.
Add regression coverage for the picker persist fix: drive the real
_handle_model_command with a fake picker-capable adapter that captures
the on_model_selected callback, fire a 'tap', and assert config.yaml is
written (bare /model), left untouched (--session), and that a flat-string
model: is coerced to a nested dict on a tap.
Mutation-checked: the persist and coercion assertions fail on pre-fix
slash_commands.py and pass on the fix.
Auto-generated session titles already rename the Telegram forum topic via
the title_callback path, but the /title command only wrote the session
title to the database. On a Telegram topic lane the visible topic kept its
auto-assigned name, so a user who ran /title to override it saw no change.
Propagate the user-chosen title to the topic by calling the existing
_schedule_telegram_topic_title_rename helper on a successful /title set. It
already no-ops off Telegram topic lanes and when auto-rename is disabled.
* fix(discord): hydrate channel context when replying to a message
Replying to a message in a free-response (non-mention, threads-off)
channel previously received only the 500-char "[Replying to: ...]"
snippet — the history-backfill gate fired only for mention-gated
channels and threads, so a reply got no surrounding channel context.
Replies now route through the same _fetch_channel_context hydration
that threads use. When the user replied to a specific (often older)
message, a reply-anchored window is scanned ending at that message so
the agent sees the exchange around what was pointed at, even when the
target sits before the self-message partition. The two windows are
merged chronologically and de-duplicated by message id.
Also hardens the recent-window scan to skip non-conversational status
bumps before the self-message partition check, and makes author-name
resolution defensive against partial/deleted authors.
* fix(discord): duck-type reply-target resolution instead of isinstance(discord.Message)
The e2e suite stubs the discord module, so discord.Message is a MagicMock
and isinstance(_resolved, discord.Message) raises 'isinstance() arg 2 must
be a type'. Any object with an int .id works as a scan anchor, so resolve
the reply target by duck-typing on .id and fall back to a _Snowflake from
the reference message_id.
Sets the Telegram bot's short description (the line under its name) to
"Online" on gateway connect and "Offline" on clean disconnect, gated
behind extra.status_indicator (off by default).
Telegram bots have no presence/online dot — that's a user-account
feature the Bot API doesn't expose for bots. The short description is
the closest available surface, so this gives users a way to tell whether
the gateway is up from the bot's profile.
- New extra.status_indicator flag (+ status_online/status_offline text
overrides), read in __init__ via config.extra — no config-schema change.
- _set_status_indicator() helper: best-effort, swallows API errors so it
never blocks connect/disconnect; truncates to Telegram's 120-char cap.
- Wired Online after _mark_connected(), Offline at top of disconnect()
while the bot HTTP client is still alive.
- 9 unit tests + Telegram docs section.
Requested by @ilTrumpista, cc @Teknium.
Adds a Raft platform adapter as a bundled plugin (plugins/platforms/raft/)
connecting Hermes to Raft as an external agent via a wake-channel bridge.
The adapter starts a loopback HTTP endpoint, spawns 'raft agent bridge' as a
child process, and injects content-free wake hints into the gateway session
pipeline. The agent reads/sends messages through the Raft CLI; the adapter
never touches message bodies or delivery cursors. Activity observer hooks
report tool/LLM/session lifecycle events via a bounded at-most-once queue.
Auto-enables when RAFT_PROFILE is set.
Cherry-picked from PR #47629. Authored by skyzh (@xxchan).
Two robustness gaps from community review (#44919):
1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status
_try_acquire_file_lock / _release_file_lock — already cross-platform
(msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock
helper.
2. Lock fd never released: stored handle is now released explicitly in
both exit paths — CancelledError handler and normal while-loop exit.
Allows in-process stop/restart (tests, embedded use).
Also tightened docstrings — 'corrupt the SQLite DBs' is now specific
(wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt
index pages), matching the module's own concurrency claims.
The gateway's embedded dispatcher has no guard against more than one dispatcher
running concurrently. dispatch_in_gateway defaults to true, so a second gateway
for the same profile (a restart race where the old process is slow to exit) — or
any deployment that runs multiple profile gateways with the default — starts a
second dispatcher loop. As #41448 describes, concurrent dispatchers each run
release_stale_claims() against the same boards, double reclaim frequency, and
re-dispatch slow workers before they finish. In practice they also corrupt the
shared kanban SQLite DBs under concurrent write load.
Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the
machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is
shared across profiles by design, so this serialises every gateway, not just one
profile). The first gateway to start its dispatcher holds the lock for its
process lifetime; any other gateway finds it contended, logs, and skips
dispatching while still running for messaging. Falls back to config-only control
on non-POSIX or filesystems without flock.
This is more robust than a per-profile guard because the documented model is
"one dispatcher sweeps all boards" — the contention is across profiles, not just
within one. Closes#41448.
Test: lock is exclusive (held, then contended while held, then held again after
release).
- _guard_named_profile_under_multiplexer: when the default gateway is running
with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run' hard
-errors (pointing at the multiplexer) instead of double-binding that
profile's platforms. Inert unless all hold: this invocation is a named
profile, a default-profile gateway is alive, and its config has multiplexing
on. --force overrides. Wired into run_gateway's guard chain.
- write_runtime_status gains served_profiles: the secondary-adapter startup
records [active] + multiplexed profiles into runtime_status.json so
'hermes status' can show per-profile coverage without a second probe. Absent
for single-profile gateways.
Tests: served_profiles round-trips and is absent by default; guard is inert for
the default profile / under --force / when no default gateway is running.
Bring up adapters for every profile the gateway serves, not just the active
one. Keeps self.adapters as the default/active profile's map (the ~93 existing
self.adapters[...] sites are untouched) and adds secondary profiles under
self._profile_adapters[profile][platform].
- _start_secondary_profile_adapters loops profiles_to_serve(multiplex=True),
skips the active profile (handled by the primary startup loop), and for each
other profile loads its gateway config and creates+connects its enabled
adapters under that profile's _profile_runtime_scope (home + secret scope).
- Each secondary adapter gets _make_profile_message_handler(profile): stamps
source.profile (when unset) before delegating to the shared _handle_message,
so the agent turn and session key resolve to that profile.
- Same-platform credential-conflict detection: _adapter_credential_fingerprint
hashes the adapter's bot token (salted, truncated — never logs the token);
two profiles claiming the same (platform, token) refuse the duplicate with a
clear error naming both, since one token can't be polled twice.
- Port-binding hard-error: a SECONDARY profile that enables a port-binding
platform (webhook, api_server, msgraph_webhook, feishu, wecom_callback,
bluebubbles, sms) is a config error and aborts startup via MultiplexConfigError
— the default profile owns the single shared HTTP listener and serves every
profile through the /p/<profile>/ prefix, so a second bind can only collide.
Distinct from a transient connect failure (which logs + stays alive to retry):
a config error writes gateway_state=startup_failed and exits cleanly with an
actionable message (names the profile, the platform, and the fix). There is no
valid reason to bind a second port once you've opted into a multiplexer.
- Shutdown tears down secondary adapters alongside the primary ones.
- Defensive getattr guards keep partial-construction unit tests (stop(),
_run_agent on bare instances) working.
No-op when multiplex_profiles is off (self._profile_adapters stays empty).
Tests: fingerprint stability/log-safety/distinctness, profile message-handler
stamping (and not overriding an already-stamped source), port-binding hard-error
raises + names the profile/platform, non-binding platform is not rejected, and
the guard set covers every TCP-binding adapter.
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.
- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
omitted when unset so existing serialization is unchanged). It carries which
profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
existing /webhooks/{route_name}. _resolve_request_profile validates the
prefix against profiles_to_serve(): None when absent or multiplexing is off
(ignored, handled as default — no spurious 404), the profile name when valid,
_PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
source.profile: SessionStore._resolve_profile_for_key(source),
_session_key_for_source fallback, and _resolve_profile_home_for_source all
honor it (→ the agent turn resolves that profile's config/skills/credentials
via the Phase 2 _profile_runtime_scope).
Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.
Tests: SessionSource.profile round-trip + namespace drive; _resolve_request_
profile accept/reject/ignore matrix.
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).
- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
read RAISES UnscopedSecretError instead of falling back to os.environ — a
missed/new call site crashes loudly at that line rather than leaking a
cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
…) keep reading os.environ via an allowlist. load_env_file/build_profile_
secret_scope parse a profile .env into an isolated dict WITHOUT mutating
os.environ. Off by default => transparent os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
~/.hermes/.env-first preference is preserved and already profile-correct via
the home override).
- tools/mcp_tool.py: MCP config interpolation resolves through
get_secret, so a server's picks up the routed profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
reload is a no-op for credentials in multiplex mode (secrets come from the
scope, not global env); _profile_runtime_scope context manager combines the
HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
that scope (resolved via _resolve_profile_home_for_source) when multiplexing.
Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.
Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
Foundations for serving multiple profiles from one gateway process, inert
when off:
- gateway.multiplex_profiles config flag (default false), round-trips through
GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
resolve the namespace from the flag (legacy when off, active profile when on).
Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
Discord channel-history backfill partitions on Hermes' last self-authored
message. Asynchronous, non-conversational status sends (self-improvement
review bubbles, heartbeats, background-process notifications, update status,
gateway restart/online notices) land as ordinary bot messages, so a delayed
status bump becomes the history boundary and swallows real messages that
arrived after Hermes' actual reply.
Mark these sends at the source via metadata["non_conversational"] (Discord
only; other platforms' metadata is unchanged). The adapter no longer advances
the history-boundary cache for marked sends and persists their IDs to a
sidecar JSON so the cold-start scan can skip them by ID after a restart. A
narrow regex recognizer remains only as an upgrade bridge for status bumps
emitted by an older gateway that pre-dates the marking.
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).
Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):
/model <name> switch model (persists to config.yaml)
/model <name> --session switch for this session only
/model <name> --global switch and persist (explicit, unchanged)
The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
Address correctness gaps found in pre-PR review of the strict matcher:
- Profile selectors can appear on EITHER side of the `gateway` token
(`_apply_profile_override` strips `--profile`/`-p` from anywhere in argv
before argparse), so `hermes gateway --profile work run` and
`python -m hermes_cli.main gateway -p work run` are valid launches the
previous matcher wrongly rejected. Strip `--profile`/`-p`/`--profile=`/`-p=`
from anywhere before locating the subcommand.
- A profile literally named `gateway` (`hermes -p gateway gateway run`) made
the old token scan stop on the profile value; stripping the selector+value
first fixes it.
- Tokenize quote-aware with `shlex` so quoted Windows paths containing spaces
(`"C:\Program Files\Hermes\hermes-gateway.exe"`) are no longer split mid-path
and the dedicated-entrypoint match survives.
Without these, the matcher could MISS a real running gateway -> the opposite
failure (restart/status reporting "down" when up). Adds regression tests for
all three shapes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.
Two root causes, both fixed:
1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
used substring matches ("... gateway" in cmdline) for lifecycle decisions,
so they false-matched `gateway status`/`dashboard` siblings and unrelated
processes like `python -m tui_gateway`, plus stale gateway.pid records.
Add a shared strict matcher `looks_like_gateway_command_line()` in
gateway/status.py that requires the real `gateway run` subcommand (or the
dedicated entrypoints), and route `_looks_like_gateway_process`,
`_record_looks_like_gateway`, and `_scan_gateway_pids` through it.
2. restart() race. Wait until the gateway is authoritatively gone
(`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
once if it lingers and raise rather than start a duplicate; verify the
relaunch produced a running gateway and raise loudly if not (no more
exit-0 silent outage).
Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the tautological test from the original PR (which asserted a
plain assignment it performed itself in the test body) with one that
exercises the actual contracts: _init_cached_agent_for_turn leaves
max_iterations untouched, and the per-turn IterationBudget rebuild
(turn_context.py) propagates a refreshed cap.
Two bugs surfaced from production usage in #37134:
1. Dict choices rendered as Python repr. LLMs sometimes emit
[{"description": "..."}] instead of bare strings; the old
str(c).strip() coercion turned the whole dict into
"{'description': '...'}" on the button label.
Fix: add a _flatten_choice helper that unwraps dicts against
the canonical LLM tool-call user-facing keys (label, description,
text, title) in that order. Dicts with none of those keys are
dropped. The "name" and "value" keys are deliberately NOT in the
priority list — they're Discord-component-shaped fields that
could appear in dicts that aren't meant to be choices (a
developer-error wiring that passes a Button-shaped object);
picking them would leak raw enum values or 4-char model
identifiers onto user-facing buttons.
2. Mid-word truncation on long button labels. The old
choice[:72] + "..." cut at position 72, mid-word. Worse, the
three-char ellipsis ate into the 80-char Discord label cap,
leaving only 75 chars of body.
Fix: budget-aware cut strategy with three tiers:
a. Last space in the trailing half of the budget (word boundary).
b. Last soft boundary (- , . )) in the trailing half — used
only when no word boundary exists.
c. Hard cut at the budget limit (last resort).
Use single U+2026 (…) to fit the cap. Cut AT soft boundaries
(inclusive) so the label ends on the boundary char rather than
on the alpha char that followed it.
Tests:
- test_unwraps_dict_choices_to_description: reproduces the
screenshot in #37134, asserts the Python repr is gone.
- test_unwrap_prefers_description_over_name_in_multi_key_dict:
regression guard for the name-key order in the unwrap list.
- test_unwrap_prefers_label_over_description: regression guard
for label winning over description.
- test_unwrap_does_not_pick_value_or_name_alone: regression
guard for the "name"/"value" fields being absent.
- test_truncates_long_choice_label: 200-char input, asserts
total <= 80 and U+2026.
- test_truncates_long_choice_label_breaks_on_word_boundary:
asserts the cut is on a space, not mid-word.
- test_truncates_long_no_space_choice_on_soft_boundary:
adversarial input where position 76 is mid-word alpha, asserts
the renderer falls back to a soft boundary.
Parity: telegram clarify suite (12 tests) still passes; the
helper is a Discord adapter local, not shared with the gateway.
Follow-up: gateway/platforms/telegram.py has the same str(c).strip()
pattern in its own send_clarify and will need a similar fix
(separate PR to keep this diff reviewable).
Fixes#37134
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect
Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.
1. RELAY platform never enabled in config.platforms (gateway/config.py).
register_relay_adapter() puts the adapter in the platform_registry, but
start_gateway()'s connect loop iterates self.config.platforms — which never
contained Platform.RELAY. So the adapter was "registered" but never connected
(logs showed "relay adapter registered" then "No messaging platforms
enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
single-tenant gateways unaffected).
2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
The relay URL is configured once as the http(s):// base (used as-is for the
provision POST), but websockets.connect rejects http(s):// with "scheme isn't
ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.
3. /relay path not appended (same helper). The connector mounts its
WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
-> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
already carrying ws(s):// and/or /relay is unchanged, so provision's
_provision_url (which derives /relay/provision from either form) still works.
Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.
Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.
Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).
Relay-adapter lane. EXPERIMENTAL.
* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant
The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".
Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.
Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.
Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).
Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).
Relay-adapter lane. EXPERIMENTAL.