Commit graph

1352 commits

Author SHA1 Message Date
Sahil Saghir
226e9322e1 fix(kanban): cross-platform dispatcher lock + explicit release
Two robustness gaps from community review (#44919):

1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status
   _try_acquire_file_lock / _release_file_lock — already cross-platform
   (msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock
   helper.

2. Lock fd never released: stored handle is now released explicitly in
   both exit paths — CancelledError handler and normal while-loop exit.
   Allows in-process stop/restart (tests, embedded use).

Also tightened docstrings — 'corrupt the SQLite DBs' is now specific
(wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt
index pages), matching the module's own concurrency claims.
2026-06-19 07:35:33 -07:00
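The cross-platform lock and explicit release described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `try_acquire_file_lock` and `release_file_lock` are hypothetical stand-ins for the `gateway.status` helpers, and only the standard-library `fcntl.flock` and `msvcrt.locking` calls are assumed.

```python
import sys

if sys.platform == "win32":
    import msvcrt
else:
    import fcntl


def try_acquire_file_lock(path):
    """Return an open handle holding an exclusive lock, or None if contended.

    Hypothetical helper: the real gateway.status functions may differ.
    """
    handle = open(path, "a+")
    try:
        if sys.platform == "win32":
            handle.seek(0)
            # Non-blocking lock on the first byte of the file.
            msvcrt.locking(handle.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            # Exclusive, non-blocking advisory lock on the whole file.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def release_file_lock(handle):
    """Release the lock explicitly and close the handle."""
    try:
        if sys.platform == "win32":
            handle.seek(0)
            msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
        else:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    finally:
        handle.close()
```

Keeping the handle and releasing it in every exit path is what would allow a stop/restart within one process: a second acquire only succeeds once the first handle has been released.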
Sahil Saghir
dfa561092a fix(kanban): machine-global singleton lock for the embedded dispatcher (#41448)
The gateway's embedded dispatcher has no guard against more than one dispatcher
running concurrently. dispatch_in_gateway defaults to true, so a second gateway
for the same profile (a restart race where the old process is slow to exit) — or
any deployment that runs multiple profile gateways with the default — starts a
second dispatcher loop. As #41448 describes, concurrent dispatchers each run
release_stale_claims() against the same boards, double reclaim frequency, and
re-dispatch slow workers before they finish. In practice they also corrupt the
shared kanban SQLite DBs under concurrent write load.

Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the
machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is
shared across profiles by design, so this serialises every gateway, not just one
profile). The first gateway to start its dispatcher holds the lock for its
process lifetime; any other gateway finds it contended, logs, and skips
dispatching while still running for messaging. Falls back to config-only control
on non-POSIX or filesystems without flock.

This is more robust than a per-profile guard because the documented model is
"one dispatcher sweeps all boards" — the contention is across profiles, not just
within one. Closes #41448.

Test: lock is exclusive (held, then contended while held, then held again after
release).
2026-06-19 07:35:33 -07:00
Ben Barclay
1e70df5fdd feat(gateway): multiplex phase 4 — lifecycle guard + per-profile observability
- _guard_named_profile_under_multiplexer: when the default gateway is running
  with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run'
  hard-errors (pointing at the multiplexer) instead of double-binding that
  profile's platforms. Inert unless all hold: this invocation is a named
  profile, a default-profile gateway is alive, and its config has multiplexing
  on. --force overrides. Wired into run_gateway's guard chain.
- write_runtime_status gains served_profiles: the secondary-adapter startup
  records [active] + multiplexed profiles into runtime_status.json so
  'hermes status' can show per-profile coverage without a second probe. Absent
  for single-profile gateways.

Tests: served_profiles round-trips and is absent by default; guard is inert for
the default profile / under --force / when no default gateway is running.
2026-06-19 07:34:15 -07:00
Ben Barclay
d5d02eabb0 feat(gateway): multiplex phase 3 — secondary-profile adapter registry + conflict detection
Bring up adapters for every profile the gateway serves, not just the active
one. Keeps self.adapters as the default/active profile's map (the ~93 existing
self.adapters[...] sites are untouched) and adds secondary profiles under
self._profile_adapters[profile][platform].

- _start_secondary_profile_adapters loops profiles_to_serve(multiplex=True),
  skips the active profile (handled by the primary startup loop), and for each
  other profile loads its gateway config and creates+connects its enabled
  adapters under that profile's _profile_runtime_scope (home + secret scope).
- Each secondary adapter gets _make_profile_message_handler(profile): stamps
  source.profile (when unset) before delegating to the shared _handle_message,
  so the agent turn and session key resolve to that profile.
- Same-platform credential-conflict detection: _adapter_credential_fingerprint
  hashes the adapter's bot token (salted, truncated — never logs the token);
  two profiles claiming the same (platform, token) refuse the duplicate with a
  clear error naming both, since one token can't be polled twice.
- Port-binding hard-error: a SECONDARY profile that enables a port-binding
  platform (webhook, api_server, msgraph_webhook, feishu, wecom_callback,
  bluebubbles, sms) is a config error and aborts startup via MultiplexConfigError
  — the default profile owns the single shared HTTP listener and serves every
  profile through the /p/<profile>/ prefix, so a second bind can only collide.
  Distinct from a transient connect failure (which logs + stays alive to retry):
  a config error writes gateway_state=startup_failed and exits cleanly with an
  actionable message (names the profile, the platform, and the fix). There is no
  valid reason to bind a second port once you've opted into a multiplexer.
- Shutdown tears down secondary adapters alongside the primary ones.
- Defensive getattr guards keep partial-construction unit tests (stop(),
  _run_agent on bare instances) working.

No-op when multiplex_profiles is off (self._profile_adapters stays empty).

Tests: fingerprint stability/log-safety/distinctness, profile message-handler
stamping (and not overriding an already-stamped source), port-binding hard-error
raises + names the profile/platform, non-binding platform is not rejected, and
the guard set covers every TCP-binding adapter.
2026-06-19 07:34:15 -07:00
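The credential-conflict check in the commit above could look something like the sketch below. The salt, the fingerprint length, and the names `credential_fingerprint` and `find_credential_conflicts` are assumptions for illustration; the commit only states that the token is hashed with a salt, truncated, and never logged.

```python
import hashlib

_SALT = b"example-salt"  # hypothetical value, not taken from the project


def credential_fingerprint(platform, token, length=12):
    """Salted, truncated SHA-256 of a bot token; safe to log, unlike the token."""
    material = _SALT + platform.encode() + b"\0" + token.encode()
    return hashlib.sha256(material).hexdigest()[:length]


def find_credential_conflicts(claims):
    """claims: iterable of (profile, platform, token).

    Returns (first_owner, duplicate_profile, platform) for every repeated
    (platform, token) pair, so an error message can name both profiles.
    """
    owners = {}
    conflicts = []
    for profile, platform, token in claims:
        key = (platform, credential_fingerprint(platform, token))
        if key in owners:
            conflicts.append((owners[key], profile, platform))
        else:
            owners[key] = profile
    return conflicts
```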
Ben Barclay
f35abb122a feat(gateway): multiplex phase 1 — HTTP-inbound /p/<profile>/ routing (webhook)
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.

- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
  omitted when unset so existing serialization is unchanged). It carries which
  profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
  existing /webhooks/{route_name}. _resolve_request_profile validates the
  prefix against profiles_to_serve(): None when absent or multiplexing is off
  (ignored, handled as default — no spurious 404), the profile name when valid,
  _PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
  is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
  source.profile: SessionStore._resolve_profile_for_key(source),
  _session_key_for_source fallback, and _resolve_profile_home_for_source all
  honor it (→ the agent turn resolves that profile's config/skills/credentials
  via the Phase 2 _profile_runtime_scope).

Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.

Tests: SessionSource.profile round-trip + namespace drive;
_resolve_request_profile accept/reject/ignore matrix.
2026-06-19 07:34:15 -07:00
Ben Barclay
f538470cf4 feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A)
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).

- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
  FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
  read RAISES UnscopedSecretError instead of falling back to os.environ — a
  missed/new call site crashes loudly at that line rather than leaking a
  cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
  …) keep reading os.environ via an allowlist.
  load_env_file/build_profile_secret_scope parse a profile .env into an
  isolated dict WITHOUT mutating os.environ. Off by default => transparent
  os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
  through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
  ~/.hermes/.env-first preference is preserved and already profile-correct via
  the home override).
- tools/mcp_tool.py: MCP config variable interpolation resolves through
  get_secret, so a server's interpolated setting picks up the routed
  profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
  reload is a no-op for credentials in multiplex mode (secrets come from the
  scope, not global env); _profile_runtime_scope context manager combines the
  HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
  that scope (resolved via _resolve_profile_home_for_source) when multiplexing.

Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.

Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
2026-06-19 07:34:15 -07:00
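The fail-closed behavior described in the commit above might be sketched as follows. This is an assumption-laden illustration: the allowlist contents, the `secret_scope` context manager, and the module-level flag are hypothetical; the names `get_secret`, `set_multiplex_active`, and `UnscopedSecretError` are taken from the commit message, but their real signatures are not known from this log.

```python
import contextvars
import os
from contextlib import contextmanager


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read in multiplex mode with no scope installed."""


_scope = contextvars.ContextVar("secret_scope", default=None)
_multiplex_active = False
_GLOBAL_PREFIXES = ("HERMES_",)  # hypothetical allowlist
_GLOBAL_NAMES = {"PATH"}


def set_multiplex_active(active=True):
    global _multiplex_active
    _multiplex_active = active


@contextmanager
def secret_scope(secrets):
    """Install one profile's secrets for the current context only."""
    token = _scope.set(dict(secrets))
    try:
        yield
    finally:
        _scope.reset(token)


def get_secret(name, default=None):
    # Genuinely global variables always come from the process environment.
    if name in _GLOBAL_NAMES or name.startswith(_GLOBAL_PREFIXES):
        return os.environ.get(name, default)
    # Multiplexing off: behave like os.getenv.
    if not _multiplex_active:
        return os.environ.get(name, default)
    scope = _scope.get()
    if scope is None:
        # Fail closed: never fall back to os.environ here.
        raise UnscopedSecretError(f"unscoped read of {name!r} in multiplex mode")
    return scope.get(name, default)
```

Because the scope lives in a `contextvars.ContextVar`, a worker thread started with a copied context would see the same profile's secrets, which matches the commit's note about `copy_context()`.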
Ben Barclay
d82f9fa7f7 feat(gateway): multiplex phase 0 — config flag, profile enumeration, profile-stamped session keys
Foundations for serving multiple profiles from one gateway process, inert
when off:

- gateway.multiplex_profiles config flag (default false), round-trips through
  GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
  which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
  active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
  historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
  positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
  so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
  resolve the namespace from the flag (legacy when off, active profile when on).

Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
2026-06-19 07:34:15 -07:00
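A rough sketch of the profile namespace slot from the commit above is shown below. The `agent:main:` and `agent:<profile>:` prefixes come from the commit message; the `platform:chat_id` tail and the exact signature are hypothetical, since the real key layout is not shown in this log.

```python
def build_session_key(platform, chat_id, profile=None):
    """Build a session key with a profile namespace slot.

    None and "default" reuse the historical "main" literal, so keys created
    before multiplexing existed stay unchanged.
    """
    namespace = "main" if profile in (None, "default") else profile
    return f"agent:{namespace}:{platform}:{chat_id}"
```

Two profiles talking in the same chat on the same platform then get distinct keys, while the default profile keeps its old ones.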
snav
caaa916289 fix(gateway): don't let delayed Discord status messages partition history backfill
Discord channel-history backfill partitions on Hermes' last self-authored
message. Asynchronous, non-conversational status sends (self-improvement
review bubbles, heartbeats, background-process notifications, update status,
gateway restart/online notices) land as ordinary bot messages, so a delayed
status bump becomes the history boundary and swallows real messages that
arrived after Hermes' actual reply.

Mark these sends at the source via metadata["non_conversational"] (Discord
only; other platforms' metadata is unchanged). The adapter no longer advances
the history-boundary cache for marked sends and persists their IDs to a
sidecar JSON so the cold-start scan can skip them by ID after a restart. A
narrow regex recognizer remains only as an upgrade bridge for status bumps
emitted by an older gateway that pre-dates the marking.
2026-06-19 07:29:27 -07:00
Alex Yates
fad4b40d9d fix(model): persist /model switch by default across sessions
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).

Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):

  /model <name>             switch model (persists to config.yaml)
  /model <name> --session   switch for this session only
  /model <name> --global    switch and persist (explicit, unchanged)

The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
2026-06-19 07:07:06 -07:00
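The flag resolution described in the commit above reduces to a small decision function. The sketch below is illustrative only: the parameter names are hypothetical, and the choice to let `--session` win when both flags are given is an assumption, not something the commit states.

```python
def resolve_persist_behavior(session_flag, global_flag, persist_by_default=True):
    """Return True when a /model switch should be written to config.yaml.

    persist_by_default mirrors the model.persist_switch_by_default config key.
    """
    if session_flag:   # --session opts out
        return False
    if global_flag:    # --global opts in explicitly
        return True
    return persist_by_default
```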
Charles Power
715fa9ea1c fix(gateway): harden gateway command-line matcher (review findings)
Address correctness gaps found in pre-PR review of the strict matcher:

- Profile selectors can appear on EITHER side of the `gateway` token
  (`_apply_profile_override` strips `--profile`/`-p` from anywhere in argv
  before argparse), so `hermes gateway --profile work run` and
  `python -m hermes_cli.main gateway -p work run` are valid launches the
  previous matcher wrongly rejected. Strip `--profile`/`-p`/`--profile=`/`-p=`
  from anywhere before locating the subcommand.
- A profile literally named `gateway` (`hermes -p gateway gateway run`) made
  the old token scan stop on the profile value; stripping the selector+value
  first fixes it.
- Tokenize quote-aware with `shlex` so quoted Windows paths containing spaces
  (`"C:\Program Files\Hermes\hermes-gateway.exe"`) are no longer split mid-path
  and the dedicated-entrypoint match survives.

Without these, the matcher could MISS a real running gateway -> the opposite
failure (restart/status reporting "down" when up). Adds regression tests for
all three shapes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 06:31:56 -07:00
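The three matcher fixes in the commit above (selector stripping, a profile named `gateway`, quote-aware tokenizing) can be sketched like this. It is a simplified illustration with hypothetical function names; it covers only the `gateway run` subcommand and leaves out the dedicated-entrypoint check that the real `looks_like_gateway_command_line()` is said to perform.

```python
import shlex


def strip_profile_selectors(tokens):
    """Drop --profile/-p (and their values) from anywhere in the token list."""
    out = []
    skip_next = False
    for tok in tokens:
        if skip_next:
            skip_next = False
            continue
        if tok in ("--profile", "-p"):
            skip_next = True  # the following token is the profile name
            continue
        if tok.startswith("--profile=") or tok.startswith("-p="):
            continue
        out.append(tok)
    return out


def looks_like_gateway_run(cmdline):
    """True when the command line contains the real `gateway run` subcommand."""
    tokens = strip_profile_selectors(shlex.split(cmdline))
    return any(
        tok == "gateway" and tokens[i + 1] == "run"
        for i, tok in enumerate(tokens[:-1])
    )
```

Stripping the selector together with its value before scanning is what keeps a profile literally named `gateway` from being mistaken for the subcommand token.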
Charles Power
fd92a3a5c9 fix(gateway): Windows restart no longer causes a silent outage
`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.

Two root causes, both fixed:

1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
   used substring matches ("... gateway" in cmdline) for lifecycle decisions,
   so they false-matched `gateway status`/`dashboard` siblings and unrelated
   processes like `python -m tui_gateway`, plus stale gateway.pid records.
   Add a shared strict matcher `looks_like_gateway_command_line()` in
   gateway/status.py that requires the real `gateway run` subcommand (or the
   dedicated entrypoints), and route `_looks_like_gateway_process`,
   `_record_looks_like_gateway`, and `_scan_gateway_pids` through it.

2. restart() race. Wait until the gateway is authoritatively gone
   (`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
   once if it lingers and raise rather than start a duplicate; verify the
   relaunch produced a running gateway and raise loudly if not (no more
   exit-0 silent outage).

Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 06:31:56 -07:00
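The restart sequence from the commit above (wait until the old process is gone, force-kill once, refuse to start a duplicate, verify the relaunch) might be structured as in the following sketch. All names and timeouts are hypothetical; the process checks are passed in as callables so the logic can be read without any platform-specific code.

```python
import time


def wait_until_gone(is_running, timeout, poll_interval=0.5, sleep=time.sleep):
    """Poll until is_running() is False; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
    return True


def restart_gateway(is_running, stop, force_kill, start,
                    drain_timeout=180.0, kill_timeout=10.0, sleep=time.sleep):
    """Stop, wait for the old process to exit, then relaunch and verify."""
    stop()
    if not wait_until_gone(is_running, drain_timeout, sleep=sleep):
        force_kill()  # one forced attempt if the graceful drain lingers
        if not wait_until_gone(is_running, kill_timeout, sleep=sleep):
            raise RuntimeError("old gateway still running; refusing to start a duplicate")
    start()
    if not is_running():
        raise RuntimeError("relaunch did not produce a running gateway")
```

Compared with a fixed `sleep(1.0)`, the start call here can never run against a still-draining process, and a failed relaunch surfaces as an exception instead of a clean exit.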
teknium1
144834b2f7 test(gateway): real cached-agent max_iterations regression test
Replaces the tautological test from the original PR (which asserted a
plain assignment it performed itself in the test body) with one that
exercises the actual contracts: _init_cached_agent_for_turn leaves
max_iterations untouched, and the per-turn IterationBudget rebuild
(turn_context.py) propagates a refreshed cap.
2026-06-19 06:31:13 -07:00
infinitycrew39
dcac719527 test(gateway): cover runtime max_turns refresh
2026-06-19 06:31:13 -07:00
Kenny John Jacob
bce1e36b57 fix(discord): unwrap dict choices + soft-boundary truncate clarify buttons
Two bugs surfaced from production usage in #37134:

1. Dict choices rendered as Python repr. LLMs sometimes emit
   [{"description": "..."}] instead of bare strings; the old
   str(c).strip() coercion turned the whole dict into
   "{'description': '...'}" on the button label.

   Fix: add a _flatten_choice helper that unwraps dicts against
   the canonical LLM tool-call user-facing keys (label, description,
   text, title) in that order. Dicts with none of those keys are
   dropped. The "name" and "value" keys are deliberately NOT in the
   priority list — they're Discord-component-shaped fields that
   could appear in dicts that aren't meant to be choices (a
   developer-error wiring that passes a Button-shaped object);
   picking them would leak raw enum values or 4-char model
   identifiers onto user-facing buttons.

2. Mid-word truncation on long button labels. The old
   choice[:72] + "..." cut at position 72, mid-word. Worse, the
   three-char ellipsis ate into the 80-char Discord label cap,
   leaving only 75 chars of body.

   Fix: budget-aware cut strategy with three tiers:
     a. Last space in the trailing half of the budget (word boundary).
     b. Last soft boundary (- , . )) in the trailing half — used
        only when no word boundary exists.
     c. Hard cut at the budget limit (last resort).
   Use single U+2026 (…) to fit the cap. Cut AT soft boundaries
   (inclusive) so the label ends on the boundary char rather than
   on the alpha char that followed it.

Tests:
- test_unwraps_dict_choices_to_description: reproduces the
  screenshot in #37134, asserts the Python repr is gone.
- test_unwrap_prefers_description_over_name_in_multi_key_dict:
  regression guard for the name-key order in the unwrap list.
- test_unwrap_prefers_label_over_description: regression guard
  for label winning over description.
- test_unwrap_does_not_pick_value_or_name_alone: regression
  guard for the "name"/"value" fields being absent.
- test_truncates_long_choice_label: 200-char input, asserts
  total <= 80 and U+2026.
- test_truncates_long_choice_label_breaks_on_word_boundary:
  asserts the cut is on a space, not mid-word.
- test_truncates_long_no_space_choice_on_soft_boundary:
  adversarial input where position 76 is mid-word alpha, asserts
  the renderer falls back to a soft boundary.

Parity: telegram clarify suite (12 tests) still passes; the
helper is a Discord adapter local, not shared with the gateway.

Follow-up: gateway/platforms/telegram.py has the same str(c).strip()
pattern in its own send_clarify and will need a similar fix
(separate PR to keep this diff reviewable).

Fixes #37134
2026-06-19 06:31:08 -07:00
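The two fixes in the commit above can be sketched as a pair of small helpers. This is an illustration under assumptions: the function names are hypothetical, and "trailing half of the budget" is interpreted here as searching from the midpoint of the available characters onward.

```python
_CHOICE_KEYS = ("label", "description", "text", "title")  # priority order
_SOFT_BOUNDARIES = "-,.)"
_ELLIPSIS = "\u2026"  # single-character ellipsis


def flatten_choice(choice):
    """Unwrap a dict choice to its user-facing text; None means drop it."""
    if isinstance(choice, dict):
        for key in _CHOICE_KEYS:
            value = choice.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
        return None  # "name"/"value" are deliberately not consulted
    text = str(choice).strip()
    return text or None


def truncate_label(text, cap=80):
    """Shorten text to at most cap characters, preferring clean boundaries."""
    if len(text) <= cap:
        return text
    budget = cap - 1                      # leave room for the ellipsis
    head = text[:budget]
    half = budget // 2
    space = head.rfind(" ", half)         # tier a: word boundary
    if space != -1:
        return head[:space].rstrip() + _ELLIPSIS
    soft = max(head.rfind(ch, half) for ch in _SOFT_BOUNDARIES)
    if soft != -1:                        # tier b: cut AT the boundary char
        return head[: soft + 1] + _ELLIPSIS
    return head + _ELLIPSIS               # tier c: hard cut
```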
Ben Barclay
a64fc490fe fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828)
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect

Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.

1. RELAY platform never enabled in config.platforms (gateway/config.py).
   register_relay_adapter() puts the adapter in the platform_registry, but
   start_gateway()'s connect loop iterates self.config.platforms — which never
   contained Platform.RELAY. So the adapter was "registered" but never connected
   (logs showed "relay adapter registered" then "No messaging platforms
   enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
   relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
   or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
   single-tenant gateways unaffected).

2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
   The relay URL is configured once as the http(s):// base (used as-is for the
   provision POST), but websockets.connect rejects http(s):// with "scheme isn't
   ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.

3. /relay path not appended (same helper). The connector mounts its
   WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
   other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
   -> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
   already carrying ws(s):// and/or /relay is unchanged, so provision's
   _provision_url (which derives /relay/provision from either form) still works.

Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.

Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.

Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).

Relay-adapter lane. EXPERIMENTAL.

* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant

The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".

Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.

Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.

Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).

Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).

Relay-adapter lane. EXPERIMENTAL.
2026-06-19 16:30:24 +10:00
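The URL normalization described in items 2 and 3 of the commit above (scheme conversion plus the `/relay` path, applied idempotently) can be sketched with the standard library. The function name mirrors `_ws_dial_url` from the commit, but the body is an assumption about how such a helper could be written, and the host names in the usage are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def ws_dial_url(base_url):
    """Turn an http(s) relay base URL into the ws(s) URL ending in /relay."""
    parts = urlsplit(base_url)
    # https -> wss, http -> ws; ws/wss pass through unchanged.
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/")
    if not path.endswith("/relay"):
        path += "/relay"
    return urlunsplit((scheme, parts.netloc, path, parts.query, parts.fragment))
```

Applying the function to its own output returns the same URL, which is the idempotency property the commit's tests are said to cover.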
Ben Barclay
2c6e266e88 fix(relay): trigger self-provision on relay-config + NAS token, not is_managed() (#48724)
self_provision_if_managed() gated on is_managed(), but is_managed() means
"NixOS/package-manager-managed" (it keys on HERMES_MANAGED or a ~/.hermes/.managed
marker) — NOT "NAS-hosted". A NAS-provisioned Fly agent sets NEITHER, so the gate
was always False and relay self-provision SILENTLY no-oped on exactly the hosted
agents it was built for. Caught live: a staging agent with GATEWAY_RELAY_URL
correctly stamped logged "No messaging platforms enabled" and never dialed the
connector; HERMES_MANAGED was unset on the machine. The unit tests had mocked
is_managed()->True, so they passed while the real trigger never fired (mocked-
trigger blind spot).

Fix: drop the is_managed() gate and rename self_provision_if_managed ->
self_provision_relay. The real trigger is now "relay_url() set + no pinned secret
+ a resolvable NAS token", which is both NAS-independent and self-guarding:
  - NAS-hosted agent: GATEWAY_RELAY_URL + no pinned secret + bootstrapped NAS
    token -> self-provisions.
  - Self-hosted + `hermes gateway enroll`: pinned GATEWAY_RELAY_SECRET -> skipped
    (existing secret-present guard).
  - Self-hosted, unenrolled, no NAS identity: resolve_nous_access_token() fails
    -> graceful no-op (existing fail-soft path).

Security: unchanged trust model. The connector still derives tenant from the
validated NAS token; this only broadens WHEN the provision attempt fires, and
every broadened case is still guarded by token-resolution + pinned-secret-skip.

Tests: replaced the (wrong) "skips when not managed" test with a regression test
proving a NAS host where is_managed()==False STILL provisions; renamed all call
sites; added a "no NAS token -> non-fatal skip" test for the self-hosted branch.
88 relay tests pass.

Relay-adapter lane. EXPERIMENTAL.
2026-06-19 01:01:24 +00:00
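The new trigger from the commit above can be read as a three-condition check. The sketch below is a hypothetical reduction of it: `should_self_provision` and its parameters are illustrative names, and the token resolver is passed in as a callable standing in for `resolve_nous_access_token()`.

```python
def should_self_provision(relay_url, pinned_secret, resolve_token):
    """Decide whether to attempt relay self-provisioning.

    True only when a relay URL is set, no secret is pinned, and a NAS token
    can be resolved; a resolver failure is treated as a graceful skip.
    """
    if not relay_url:
        return False
    if pinned_secret:
        return False          # enrolled gateway: existing secret is respected
    try:
        token = resolve_token()
    except Exception:
        return False          # fail-soft: no NAS identity available
    return bool(token)
```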
Ben Barclay
d2c53ff558 feat(relay): WS-only inbound on the gateway adapter (Phase 3) (#48294)
The connector now delivers inbound (messages + interrupts) over the gateway's
OUTBOUND /relay WebSocket, not a signed HTTP POST to an inbound endpoint. The
gateway needs no inbound HTTP port — which is what makes hosted gateways (no
public IP) able to receive inbound at all.

- gateway/relay/adapter.py: connect() wires set_interrupt_inbound_handler(
  self.on_interrupt) so connector->gateway interrupt_inbound frames bridge into
  the existing per-session interrupt path (the inbound message handler was
  already wired). Removed _maybe_start_inbound_receiver() + the _inbound_runner
  lifecycle — there is no HTTP receiver anymore.
- gateway/relay/inbound_receiver.py: deleted (the signed-HTTP InboundDelivery
  receiver).
- gateway/relay/__init__.py: removed relay_inbound_config() (dead with the
  receiver gone). The delivery key is still set in-process by self-provision for
  forward-compat but is no longer consumed for inbound.
- docs/relay-connector-contract.md: §3 rewritten — inbound is the WS back-channel
  routed cross-instance via the connector's relay bus; §5 interrupt + §6 auth
  table updated; the old signed-HTTP-POST + per-tenant-delivery-key-signing path
  is documented as superseded. gatewayEndpoint noted as passthrough-plane only.

Tests: stub_connector grows set_interrupt_inbound_handler + push_interrupt;
new test_relay_interrupt case proves connect() wires BOTH inbound handlers and an
interrupt_inbound frame over the WS cancels the right session. Removed the
HTTP-receiver test; updated the crypto-shedding scan + self-provision delivery-key
assertion. 88 relay tests pass.

EXPERIMENTAL. Pairs with gateway-gateway (relay bus + WsGatewayDelivery) and the
NAS GATEWAY_RELAY_URL stamp. The cross-repo E2E (connector repo) proves the full
multi-instance path against this production adapter code.
2026-06-19 09:33:15 +10:00
Ben Barclay
0ddd21c74e feat(relay): managed-boot self-provision client (Phase 3, gateway side) (#48242)
The gateway half of relay Phase 3. On a MANAGED boot with relay configured and
no secret pinned, the runtime self-provisions its relay credentials IN-PROCESS:
resolve the agent's own Nous access token (resolve_nous_access_token) -> POST
the connector's /relay/provision asserting its own endpoint + route keys ->
set GATEWAY_RELAY_ID/SECRET/DELIVERY_KEY into os.environ so the immediately-
following register_relay_adapter() reads them and dials out authenticated.

No human, no enrollment token, no disk write — the creds live only in process
memory (save_env_value refuses under managed anyway, and keeping the secret off
any volume is the stronger posture). Stateless: process-env creds don't survive
a restart, so a managed container re-provisions every boot; the connector's
rotation window covers a still-connected prior instance. An explicitly-pinned
GATEWAY_RELAY_SECRET is respected (skip). Self-hosted is unchanged: humans keep
using `hermes gateway enroll`.

Endpoint provenance is gateway-asserted (GATEWAY_RELAY_ENDPOINT +
GATEWAY_RELAY_ROUTE_KEYS, env or gateway.relay_* config) — uniform code path
whether the operator sets it (self-hosted) or NAS stamps it (hosted, the only
case NAS knows the public URL). Both absent -> outbound-only provisioning
(credentials, no inbound routes). The connector scopes the asserted endpoint to
the verified tenant, so it stays within the security model.

- gateway/relay/__init__.py: relay_endpoint(), relay_route_keys(),
  _provision_url(), _post_provision(), self_provision_if_managed() (never
  raises — a provision failure logs and boots without relay auth).
- gateway/run.py: call self_provision_if_managed() immediately before
  register_relay_adapter() in the startup path.

Tests: 12 unit (trigger logic, respect-pinned-secret, in-process env wiring,
endpoint+routes vs outbound-only, fail-soft on token/connector failure);
mutation-checked (drop is_managed guard / pinned-secret guard -> tests fail).
Cross-repo live E2E driver lands on the connector side (depends on this).

EXPERIMENTAL: relay auth scheme may change until >=2 Class-1 platforms validate.
2026-06-18 15:25:29 +10:00
Ben Barclay
c276b017ad feat(relay): connector⇄gateway channel auth + signed-HTTP inbound receiver + enroll CLI (#48147)
* feat(relay): authenticate the connector⇄gateway WS channel

The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.

Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.

* feat(relay): signed-HTTP inbound delivery receiver

The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.

Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).

Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.

* feat(relay): hermes gateway enroll CLI

Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.

Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.

* docs(relay): inbound is signed HTTP, not WS; document channel auth

Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.

* test(relay): update build_gateway_parser callers for cmd_gateway_enroll

The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
2026-06-18 12:01:54 +10:00
Ben
acc8916ac7 test(gateway): live ws-transport round-trip + config-driven registration
- test_ws_transport.py: drives WebSocketRelayTransport against a REAL in-process
  websockets server (not a mock socket): handshake (hello->descriptor), inbound
  frame -> handler, outbound request/response correlation, follow_up routing,
  and clean disconnect failing pending waiters. Skips if websockets is absent.
- test_relay_registration.py: rewritten for the config-driven gate — registers
  when GATEWAY_RELAY_URL is set / an explicit url is passed / force=True; no-op
  without a URL; trailing slash stripped; adapter constructs through the registry.
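
The gate these tests exercise can be sketched roughly as follows. `relay_registration` is a hypothetical helper, and the precedence of an explicit url over the environment variable is an assumption; only the inputs (GATEWAY_RELAY_URL, explicit url, force) and the trailing-slash rule come from the commit message.

```python
from typing import Mapping, Optional, Tuple


def relay_registration(
    env: Mapping[str, str], url: Optional[str] = None, force: bool = False
) -> Tuple[bool, Optional[str]]:
    """Return (should_register, normalized_url) for the relay adapter."""
    # Assumption: an explicit url wins over the environment variable.
    resolved = (url or env.get("GATEWAY_RELAY_URL") or "").rstrip("/")
    return (bool(resolved) or force, resolved or None)
```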

Full relay suite: 57 passed.
2026-06-17 16:37:45 -07:00
Ben
3db9b3e616 feat(gateway): token-less follow_up outbound op (A2 capability action)
The relay outbound surface had send/edit/typing but no way to act on a
SHARED-identity capability (e.g. a Discord interaction follow-up token,
~15min) that the connector captured + stripped at the edge. Under A2 that
credential never reaches the gateway, so the gateway can't just 'send with
the token' — it needs a semantic op naming the session it's already in.

Adds the follow_up op end to end on the gateway side:
- RelayTransport.send_follow_up(action): protocol method. Action carries
  op='follow_up' + session_key + kind + content (+ metadata) and NO token.
- RelayAdapter.send_follow_up(session_key, kind, content, metadata): builds
  that action and returns a SendResult. The connector resolves the real
  capability (its resolveOutboundCapability), enforces the tenant match so
  tenant B can't wield tenant A's capability, and egresses; success=False
  when the capability is absent/expired/mismatched (nothing to retry — a
  leaked gateway holds zero capability material).
- StubConnector records follow_ups + a canned next_follow_up_result.
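
As an illustration of the wire shape described above, here is a sketch of an action builder. The field names (op, session_key, kind, content, metadata) follow the commit message; the helper name `build_follow_up_action` and the example values are hypothetical.

```python
from typing import Any, Dict, Optional


def build_follow_up_action(
    session_key: str,
    kind: str,
    content: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a follow_up action that names a session, never a credential."""
    action: Dict[str, Any] = {
        "op": "follow_up",
        "session_key": session_key,
        "kind": kind,  # a type reference, not the secret itself
        "content": content,
    }
    if metadata:
        action["metadata"] = dict(metadata)
    return action


# Hypothetical session key and kind values, for illustration only.
action = build_follow_up_action("relay:guild-1:chan-2", "interaction_followup", "done")
```

The connector, not the gateway, maps `session_key` + `kind` back to the real capability, so the dict never needs a token field.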

Tests: round-trips without a token; the wire action carries only session
refs (no credential value field — the 'kind' string is a type ref, not the
secret); failure surfaces when the connector can't resolve; no-transport
fails cleanly. 55 passed. §4 doc entry follows in the contract-rewrite commit.
2026-06-17 16:37:45 -07:00
Ben
c28a02b49d test(gateway): shed platform crypto from the relay path (A2 invariant)
Under the A2 trust model the connector is the SOLE crypto/identity
boundary: it verifies/decrypts every inbound platform payload at the edge
(it holds the tenant secrets), normalizes to a tenant-scoped MessageEvent,
and forwards only the sanitized event. The gateway re-validates nothing —
it cannot without being handed the shared signing secret, which on a
shared bot is itself the cross-tenant leak.

The relay path already imports no platform-crypto today; this locks that
in as an enforced invariant so nobody bolts re-validation (Discord
ed25519, Twilio HMAC, WeCom BizMsgCrypt, generic webhook signature checks)
onto the relay later and silently re-couples the gateway to platform
secrets it must never hold. Verification stays in the direct platform
adapters (gateway/platforms/*) which serve non-relay deployments.

- test_relay_package_imports_no_platform_crypto: AST-walks gateway/relay/*
  and fails on any import of a platform-crypto/verification module.
- test_relay_package_calls_no_signature_verification: fails on any
  verification-symbol reference (ed25519/hmac/bizmsg/verify_*).
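
A rough sketch of what an AST-based import guard like the first test could look like. The ban list and the `banned_imports` helper are hypothetical; the real tests walk the files under gateway/relay/ and use their own module list.

```python
import ast

# Hypothetical ban list; the real test defines its own.
BANNED_MODULES = {"wecom_crypto", "nacl.signing"}


def banned_imports(source: str, banned=BANNED_MODULES):
    """Return the sorted banned modules imported anywhere in `source`."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            for bad in banned:
                # Match the module itself or any of its submodules.
                if name == bad or name.startswith(bad + "."):
                    hits.add(bad)
    return sorted(hits)
```

Walking the AST rather than grepping the text means comments and docstrings that merely mention a crypto module do not trip the guard, while imports nested inside functions still do.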

Invariants (assert the relation 'relay re-validates nothing'), not frozen
snapshots. Verified the guard bites: injecting a wecom_crypto import makes
it fail, removing it goes green. docs §6 rewrite follows in a later commit.
2026-06-17 16:37:45 -07:00
Ben
e74577ed0f test(gateway): Telegram relay round-trip (Phase 1 generalization proof)
The Phase 1 exit gate requires BOTH Discord and Telegram to round-trip
through the relay stub, but test_relay_roundtrip.py only covered Discord.
Add the Telegram companion exercising its distinct discriminator profile:

- no guild_id — two chats isolate on chat_id alone
- forum topics share one chat_id and isolate by thread_id (the Telegram
  analog of Discord per-guild isolation), shared across participants by
  default (thread_sessions_per_user=False)
- DM isolation by chat_id
- utf16 len_unit + markdown_v2 dialect round-trip and configure the adapter
- outbound send round-trips through the stub

Proves the CapabilityDescriptor + build_session_key generalize beyond
Discord, not just the struct (which the descriptor unit tests already
covered).
2026-06-17 16:37:45 -07:00
Ben
5feec8b4cf test(gateway): enforce relay contract-doc ⟷ Python conformance
Add an invariant test pinning docs/relay-connector-contract.md to the
Python source of truth so the doc (which the connector repo mirrors by
hand) cannot silently drift:

- CapabilityDescriptor §2 table ⟷ dataclass fields + required/optional
- SessionSource wire keys (to_dict output) ⟷ §3 documented fields
- per-platform discriminator columns exist as real SessionSource fields
- guard that is_bot stays off the wire until deliberately promoted

Writing the test surfaced a real gap: §3 only enumerated 5 discriminators
in its per-platform table while to_dict() emits 12 keys. Seven wire keys
the connector must populate (chat_name, chat_topic, user_id_alt,
chat_id_alt, parent_chat_id, message_id, user_name) were undocumented —
a connector author reading the doc would never know to set them. Added a
complete SessionSource wire-field table to §3. The connector's existing
contract.ts already carries all 12, so no connector change is needed; the
doc was the lagging artifact.
2026-06-17 16:37:45 -07:00
Ben
c803661cec fix(gateway): register relay connection checker
The platform-connected-checker invariant test requires every built-in
Platform enum member to have either a generic token path or a bespoke
entry in _PLATFORM_CONNECTED_CHECKERS. Platform.RELAY was added without
one, so test_all_builtins_have_checker_or_generic_token_path failed.

Relay dials OUT to a connector and is 'connected' once an endpoint URL
is configured (extra['relay_url'] or extra['url']); the capability
descriptor is negotiated at handshake time, so the URL is the only
config-level signal in the experimental phase. Add the checker plus a
synthetic-config case exercising its True path.
2026-06-17 16:37:45 -07:00
Ben
c366466d70 test(relay): assert connector stub never leaks into production paths
CI guard: fails if gateway/ or plugins/ ever imports the test-only stub
connector or defines StubConnector. Matches code leaks (imports / class defs),
not prose mentions, so the transport.py docstring reference to the stub's path
is allowed.

Phase 1 complete. Task 1.6 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
a3cdd8c39d feat(relay): route mid-turn /stop over relay interrupt channel
RelayAdapter.on_interrupt(session_key, chat_id) bridges a connector-delivered
mid-turn /stop into the existing interrupt_session_activity path, setting the
per-session _active_sessions Event and clearing typing — cancelling exactly the
targeted session's turn without touching siblings (mirrors test_stop_thread_
sibling isolation). Transport.send_interrupt carries the gateway-side egress to
the connector for socket-owner routing.

Phase 1, Task 1.4 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
d0133fd8e4 feat(relay): register RelayAdapter through platform registry (flagged off by default)
register_relay_adapter() registers the generic 'relay' platform via the same
PlatformRegistry path as plugin adapters — no core dispatch changes. OFF by
default (dark-launch): only registers when HERMES_GATEWAY_RELAY is truthy (or
force=True for tests), so existing single-tenant/direct deployments are
unaffected. Factory builds a transport-less RelayAdapter with a placeholder
descriptor; the real descriptor is negotiated at handshake.

Phase 1, Task 1.3 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
259e78e175 feat(relay): transport protocol + test-only stub connector
Defines RelayTransport (lifecycle/handshake/inbound/outbound/interrupt) as the
gateway<->connector wire contract; RelayAdapter.connect now registers an inbound
handler that bridges connector-delivered MessageEvents into handle_message.
Adds an in-memory StubConnector under tests/ and an E2E round-trip proving:
connect registers the handler, inbound events reach the adapter, guild_id drives
build_session_key isolation (two guilds -> two keys; same guild/channel/user ->
one), outbound send round-trips, get_chat_info is proxied.

Phase 1, Task 1.2 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
b0999c82f3 feat(relay): generic RelayAdapter advertising negotiated capabilities
One BasePlatformAdapter subclass that reads its capability profile from a
CapabilityDescriptor: MAX_MESSAGE_LENGTH attribute, message_len_fn (table-driven
by len_unit: chars=len, utf16=Telegram-style code units), supports_draft_streaming.
Implements the four abstract methods (connect/disconnect/send/get_chat_info) by
delegating to an injected RelayTransport (full protocol lands in Task 1.2). Adds
Platform.RELAY enum member. No per-platform gateway code.
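
The table-driven length function can be sketched like this. The len_unit values (chars, utf16) come from the commit message; the names `utf16_len`, `LEN_FNS`, and the standalone `message_len_fn` are illustrative, and raising KeyError on an unknown unit is an assumption.

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units, the unit Telegram-style limits are measured in."""
    return len(text.encode("utf-16-le")) // 2


# One entry per len_unit the descriptor may advertise.
LEN_FNS = {"chars": len, "utf16": utf16_len}


def message_len_fn(len_unit: str):
    """Look up the length function for a descriptor's len_unit."""
    return LEN_FNS[len_unit]


sample = "a\U0001F600"  # one ASCII letter plus one emoji outside the BMP
chars_len = message_len_fn("chars")(sample)
utf16_units = message_len_fn("utf16")(sample)
```

The two units disagree whenever a message contains characters outside the Basic Multilingual Plane, since each of those takes two UTF-16 code units but counts as one Python character.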

Phase 1, Task 1.1 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
3db49381d6 feat(relay): derive descriptor from PlatformEntry
CapabilityDescriptor.from_platform_entry() projects an existing PlatformEntry
(label, max_message_length, emoji, platform_hint, pii_safe, name) into a
descriptor, proving the descriptor is a projection of existing config rather
than a parallel concept. Runtime-only capabilities (len_unit, draft/edit/
thread/markdown) are caller-supplied. max_message_length==0 ('no limit') maps
to the stream_consumer 4096 default.

Phase 0 complete. Task 0.3 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
53d9b98305 feat(relay): experimental CapabilityDescriptor schema
Frozen, JSON-serializable handshake payload the connector hands the future
RelayAdapter: char limit, draft-streaming/edit/threading flags, markdown
dialect, len_unit. Mostly a wire projection of PlatformEntry + the adapter
capability methods. contract_version gates additive-only evolution; declared
EXPERIMENTAL until >=2 Class-1 platforms validate it. from_json ignores
unknown keys (forward-compat) and fills optional defaults.
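
A minimal sketch of the forward-compatible parsing described above, using a stand-in class rather than the real CapabilityDescriptor. The field subset and defaults are assumptions, and `markdown_dialect` is a hypothetical field name.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SketchDescriptor:
    # Illustrative subset of fields; the real schema carries more.
    contract_version: int
    max_message_length: int
    len_unit: str = "chars"
    supports_draft_streaming: bool = False
    markdown_dialect: str = "plain"  # hypothetical field name and default

    @classmethod
    def from_json(cls, payload: str) -> "SketchDescriptor":
        data = json.loads(payload)
        known = {f.name for f in fields(cls)}
        # Drop keys this version does not know, so a newer connector can
        # add fields without breaking an older gateway.
        return cls(**{k: v for k, v in data.items() if k in known})


desc = SketchDescriptor.from_json(
    '{"contract_version": 1, "max_message_length": 2000, "future_flag": true}'
)
```

Missing optional keys fall back to the dataclass defaults, while a missing required key still raises TypeError, which keeps the additive-only evolution honest.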

Phase 0, Task 0.2 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
Ben
e9a2ce6585 test: lock gateway adapter capability surface (relay phase 0)
Behavioral regression harness locking the capability surface that the future
RelayAdapter must reproduce: the abstract-method set (connect/disconnect/send/
get_chat_info), message_len_fn default, supports_draft_streaming default, and
the stream_consumer MAX_MESSAGE_LENGTH attribute read. Passes on main before
any RelayAdapter exists.

Phase 0, Task 0.1 of the gateway-relay plan.
2026-06-17 16:37:45 -07:00
teknium
36ae958473 feat(gateway): gate message timestamps behind opt-in (default off)
Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
2026-06-16 15:49:59 -07:00
Wolfram Ravenwolf
bd7fc8fdcd feat(gateway): inject stable human-readable message timestamps
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
2026-06-16 15:49:59 -07:00
teknium1
8ed16a7a0c test(telegram): rich-reply recovery via send-time index
Cover #47375 fix: record-on-rich-send + lookup-on-reply round trip,
lookup miss leaving reply_to_text None, and precedence (native quote
and echoed caption both win over the index fallback).
2026-06-16 13:04:20 -07:00
Wolfram Ravenwolf
16fc717091 fix(mattermost): harden delivery hygiene
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.

SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.

Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.

Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
2026-06-16 06:34:54 -07:00
Rory Evans
e65d74bc6f fix(gateway): accept metadata kwarg in WhatsApp/email send_image
`BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to
`send_image` / `send_image_file` / `send_animation` on every send. The
WhatsApp and email `send_image` overrides stopped their signature at
`reply_to`, so any image delivered as a URL (the common case — image-gen
backends return URLs) raised:

    TypeError: send_image() got an unexpected keyword argument "metadata"

and the image silently failed to send. Their sibling overrides
(`send_image_file` / `send_video` / `send_voice` / `send_document`)
already absorb it via **kwargs, which is why only plain image-URL sends
broke.
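
The failure mode and the kind of signature sweep the new contract test performs can be reproduced in miniature. The class names and parameter lists below are hypothetical stand-ins, not the real adapters.

```python
import inspect


class BaseAdapter:
    def send_image(self, chat_id, image_url, caption=None, reply_to=None, metadata=None):
        return {"chat_id": chat_id, "image_url": image_url, "metadata": metadata}

    def send_multiple_images(self, chat_id, urls, metadata=None):
        # The base class always forwards metadata to the per-image method.
        return [self.send_image(chat_id, url, metadata=metadata) for url in urls]


class BrokenOverride(BaseAdapter):
    # Signature stops at reply_to, so the forwarded keyword is rejected.
    def send_image(self, chat_id, image_url, caption=None, reply_to=None):
        return {"chat_id": chat_id, "image_url": image_url}


class FixedOverride(BaseAdapter):
    def send_image(self, chat_id, image_url, caption=None, reply_to=None, metadata=None):
        return {"chat_id": chat_id, "image_url": image_url, "metadata": metadata}


def accepts_metadata(method) -> bool:
    """True if the method takes `metadata` explicitly or absorbs it via **kwargs."""
    params = inspect.signature(method).parameters
    return "metadata" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
```

Checking signatures with `inspect` catches the mismatch without sending anything, which is what lets a sweep over every adapter run as a plain unit test.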

- whatsapp/email `send_image`: accept `metadata` (matches the base
  signature); WhatsApp forwards it to the super() text fallback.
- Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp +
  email accept it, plus a best-effort sweep over every adapter so the next
  slip fails at test time instead of in production.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 06:23:53 -07:00
teknium
6373aba80f feat(gateway): rename to tool_progress_grouping, add config/docs/tests
Follow-up to salvaged PR #41620:
- Rename tool_progress_style -> tool_progress_grouping (clearer intent)
- Add display.tool_progress_grouping to DEFAULT_CONFIG (accumulate default)
- Document in messaging docs incl. 'separate is noisier, only where progress enabled'
- Add resolver tests (default/global/override/invalid/case)
2026-06-16 05:49:24 -07:00
Teknium
a6364bfa08 fix(telegram): edit streamed previews in place as rich (Bot API 10.1) (#46890)
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.

Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.

- _needs_rich_rendering(): rich reserved for table/task-list/details/math
  (adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
  mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
  pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.

Closes the root cause behind #45911 / #46009.
2026-06-16 05:26:04 -07:00
Teknium
5a0e0d35b9 fix(mattermost): preserve thread-local delivery hygiene
Salvage the valid thread-routing pieces from #41640:
- route Mattermost progress/status sends through metadata thread IDs
- treat top-level Mattermost channel posts as thread roots for progress
- preserve thread metadata through media/file sends
- allow flat fallback only for final notify-worthy replies on confirmed broken roots

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
2026-06-15 15:06:23 -07:00
kshitij
d2b34e89b0 Merge pull request #44431 from erosika/feat/honcho-identity-tree
feat(honcho): gateway-gated identity tree + canonicalize on pinUserPeer
2026-06-16 03:35:24 +05:30
kshitij
cffd6e3c8d Merge pull request #46078 from xxxigm/fix/discord-slash-command-100-cap
fix(discord): cap slash commands at Discord's 100-command limit
2026-06-16 02:05:31 +05:30
Austin Pickett
5f6be7f31b fix(teams): package Microsoft Teams SDK as an installable extra (salvage #43945) (#46764)
* fix(teams): package Microsoft Teams SDK as an installable extra

The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.

Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.

Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>

* chore: map rio-jeong contributor email for attribution (#43945)

* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)

The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:

- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
  a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
  rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
  enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
  status' probes never trigger a pip install.

The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.

Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.

---------

Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-15 14:35:15 -04:00
Teknium
3e7e9b24d4 fix: harden salvaged session and browser improvements
Polish salvaged contributor work before PR review:
- read browser inactivity timeout from config with documented fallback
- skip redundant v10 trigram backfill before v11 FTS rebuild
- show delegate_task goals safely in progress previews
- show gateway status model/context without redundant token wording
- wire gateway /sessions to shared session-listing helpers
- map Ravenwolf author emails for release attribution

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>
2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf
ead38107a2 feat(status): restore model and context in gateway status
PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat.

SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative.

Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py
2026-06-15 07:46:34 -07:00
Teknium
0d82060c74 fix: harden WhatsApp target alias salvage
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
2026-06-15 05:51:47 -07:00
Keiron McCammon
ea49a79633 fix(messaging): route WhatsApp group JIDs to the target, not the home DM
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:

1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
   user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
   broadcast/newsletter JIDs matched no pattern and fell through to
   `return None, None, False`, so the caller treated them as
   unresolvable and used the home channel. The bridge's /send endpoint
   accepts any chatId, so only the tool-side target parsing was at fault.
   Add a whatsapp branch that recognizes native JIDs as explicit targets.
   The pre-existing '+'-prefixed E.164 path is preserved.

2. WhatsApp groups have no human-friendly name — the channel directory
   is regenerated from session data on a timer, so a group shows up as
   its raw 18-digit JID and any hand-edit to channel_directory.json is
   clobbered on the next rebuild. Add a user-maintained alias overlay
   (~/.hermes/channel_aliases.json) re-applied on every build AND every
   load, giving durable friendly names and letting a freshly-created
   group be pre-named before its first message.
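
The parser branch from item 1 can be sketched as follows. `parse_whatsapp_target` and its two-value return are hypothetical simplifications of _parse_target_ref, and the `@broadcast` / `@newsletter` suffixes and the E.164 length bounds are assumptions; the commit message only names the @g.us, @s.whatsapp.net, and @lid suffixes explicitly.

```python
import re

# Native JIDs: an id part followed by a known server suffix.
_JID = re.compile(
    r"^[0-9A-Za-z._:-]+@(g\.us|s\.whatsapp\.net|lid|broadcast|newsletter)$"
)
# '+'-prefixed E.164 numbers (the pre-existing path).
_E164 = re.compile(r"^\+[1-9]\d{6,14}$")


def parse_whatsapp_target(ref: str):
    """Return (chat_id, is_explicit); (None, False) means 'not recognized'."""
    ref = ref.strip()
    if _JID.match(ref) or _E164.match(ref):
        return ref, True
    return None, False
```

Returning `is_explicit=True` for a native JID is what stops the caller from treating the target as unresolvable and falling back to the home channel.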

Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
2026-06-15 05:51:47 -07:00
Tharushka Dinujaya
ec05d2bc3e fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway
On Linux, systemd spawns core services (cron, nginx, sshd) with
deterministic PIDs and jiffy start_times across reboots. A service can
land on the exact same PID and start_time as a previous gateway, causing
acquire_scoped_lock to mistake it for a live gateway and block startup.

The existing stale-detection paths only covered:
  - start_times both non-None and different (clear mismatch)
  - start_times both None (macOS/Windows fallback to cmdline check)

The boot-time collision falls through both: times are non-None and
equal, so neither branch fired.

Add a third check: when both start_times are known and match but the
live process fails _looks_like_gateway_process, read its cmdline. If
the cmdline is readable (non-None), we have positive evidence of an
impostor and mark the lock stale. Requiring a readable cmdline keeps the
check conservative — if cmdline is unreadable we do not evict.
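
The three stale-detection branches can be summarized in a small decision function. This is a sketch under assumptions: `lock_is_stale` is a hypothetical name, the inputs are pre-computed rather than read from /proc, and the behavior when only one start_time is known is a guess at the conservative choice.

```python
from typing import Optional


def lock_is_stale(
    recorded_start: Optional[int],
    live_start: Optional[int],
    looks_like_gateway: bool,
    cmdline: Optional[str],
) -> bool:
    """Decide whether a lock whose PID is still alive should be evicted."""
    if recorded_start is not None and live_start is not None:
        if recorded_start != live_start:
            return True  # clear mismatch: the PID was reused
        # Times match: evict only with positive evidence of an impostor,
        # i.e. a readable cmdline that is not a gateway.
        return (not looks_like_gateway) and cmdline is not None
    if recorded_start is None and live_start is None:
        # No start times available: fall back to the cmdline check alone.
        return not looks_like_gateway
    # Assumption: with only one side known, stay conservative and keep the lock.
    return False
```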
2026-06-15 05:25:07 -07:00
Teknium
a1f51feb72 fix(telegram): avoid rich final duplicate previews (#46206)
2026-06-14 11:13:38 -07:00