hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-22 10:32:00 +00:00

Author	SHA1	Message	Date
Sahil Saghir	226e9322e1	fix(kanban): cross-platform dispatcher lock + explicit release Two robustness gaps from community review (#44919): 1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status _try_acquire_file_lock / _release_file_lock — already cross-platform (msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock helper. 2. Lock fd never released: stored handle is now released explicitly in both exit paths — CancelledError handler and normal while-loop exit. Allows in-process stop/restart (tests, embedded use). Also tightened docstrings — 'corrupt the SQLite DBs' is now specific (wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt index pages), matching the module's own concurrency claims.	2026-06-19 07:35:33 -07:00
Sahil Saghir	dfa561092a	fix(kanban): machine-global singleton lock for the embedded dispatcher (#41448 ) The gateway's embedded dispatcher has no guard against more than one dispatcher running concurrently. dispatch_in_gateway defaults to true, so a second gateway for the same profile (a restart race where the old process is slow to exit) — or any deployment that runs multiple profile gateways with the default — starts a second dispatcher loop. As #41448 describes, concurrent dispatchers each run release_stale_claims() against the same boards, double reclaim frequency, and re-dispatch slow workers before they finish. In practice they also corrupt the shared kanban SQLite DBs under concurrent write load. Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is shared across profiles by design, so this serialises every gateway, not just one profile). The first gateway to start its dispatcher holds the lock for its process lifetime; any other gateway finds it contended, logs, and skips dispatching while still running for messaging. Falls back to config-only control on non-POSIX or filesystems without flock. This is more robust than a per-profile guard because the documented model is "one dispatcher sweeps all boards" — the contention is across profiles, not just within one. Closes #41448. Test: lock is exclusive (held, then contended while held, then held again after release).	2026-06-19 07:35:33 -07:00
Ben Barclay	1e70df5fdd	feat(gateway): multiplex phase 4 — lifecycle guard + per-profile observability - _guard_named_profile_under_multiplexer: when the default gateway is running with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run' hard -errors (pointing at the multiplexer) instead of double-binding that profile's platforms. Inert unless all hold: this invocation is a named profile, a default-profile gateway is alive, and its config has multiplexing on. --force overrides. Wired into run_gateway's guard chain. - write_runtime_status gains served_profiles: the secondary-adapter startup records [active] + multiplexed profiles into runtime_status.json so 'hermes status' can show per-profile coverage without a second probe. Absent for single-profile gateways. Tests: served_profiles round-trips and is absent by default; guard is inert for the default profile / under --force / when no default gateway is running.	2026-06-19 07:34:15 -07:00
Ben Barclay	d5d02eabb0	feat(gateway): multiplex phase 3 — secondary-profile adapter registry + conflict detection Bring up adapters for every profile the gateway serves, not just the active one. Keeps self.adapters as the default/active profile's map (the ~93 existing self.adapters[...] sites are untouched) and adds secondary profiles under self._profile_adapters[profile][platform]. - _start_secondary_profile_adapters loops profiles_to_serve(multiplex=True), skips the active profile (handled by the primary startup loop), and for each other profile loads its gateway config and creates+connects its enabled adapters under that profile's _profile_runtime_scope (home + secret scope). - Each secondary adapter gets _make_profile_message_handler(profile): stamps source.profile (when unset) before delegating to the shared _handle_message, so the agent turn and session key resolve to that profile. - Same-platform credential-conflict detection: _adapter_credential_fingerprint hashes the adapter's bot token (salted, truncated — never logs the token); two profiles claiming the same (platform, token) refuse the duplicate with a clear error naming both, since one token can't be polled twice. - Port-binding hard-error: a SECONDARY profile that enables a port-binding platform (webhook, api_server, msgraph_webhook, feishu, wecom_callback, bluebubbles, sms) is a config error and aborts startup via MultiplexConfigError — the default profile owns the single shared HTTP listener and serves every profile through the /p/<profile>/ prefix, so a second bind can only collide. Distinct from a transient connect failure (which logs + stays alive to retry): a config error writes gateway_state=startup_failed and exits cleanly with an actionable message (names the profile, the platform, and the fix). There is no valid reason to bind a second port once you've opted into a multiplexer. - Shutdown tears down secondary adapters alongside the primary ones. - Defensive getattr guards keep partial-construction unit tests (stop(), _run_agent on bare instances) working. No-op when multiplex_profiles is off (self._profile_adapters stays empty). Tests: fingerprint stability/log-safety/distinctness, profile message-handler stamping (and not overriding an already-stamped source), port-binding hard-error raises + names the profile/platform, non-binding platform is not rejected, and the guard set covers every TCP-binding adapter.	2026-06-19 07:34:15 -07:00
Ben Barclay	f35abb122a	feat(gateway): multiplex phase 1 — HTTP-inbound /p/<profile>/ routing (webhook) Serve webhook inbound for multiple profiles off the one shared listener via a URL prefix, with no second port bound. - SessionSource gains a 'profile' field (round-trips through to_dict/from_dict; omitted when unset so existing serialization is unchanged). It carries which profile an inbound message was routed to. - WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the existing /webhooks/{route_name}. _resolve_request_profile validates the prefix against profiles_to_serve(): None when absent or multiplexing is off (ignored, handled as default — no spurious 404), the profile name when valid, _PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile is stamped onto the SessionSource. - session-key namespacing and the per-turn home/credential scope now prefer source.profile: SessionStore._resolve_profile_for_key(source), _session_key_for_source fallback, and _resolve_profile_home_for_source all honor it (→ the agent turn resolves that profile's config/skills/credentials via the Phase 2 _profile_runtime_scope). Constraint: routing inbound needs no per-profile platform credential, but the agent still needs the routed profile's provider key — delivered by Phase 2's secret scope. api_server (OpenAI-compatible surface) profile routing is a focused follow-on; its source-construction path differs from webhook's. Tests: SessionSource.profile round-trip + namespace drive; _resolve_request_ profile accept/reject/ignore matrix.	2026-06-19 07:34:15 -07:00
Ben Barclay	f538470cf4	feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A) The credential gate. When multiplexing is active, a profile's secrets resolve from a context-local scope, never the process-global os.environ (which in a multiplexer may hold another profile's keys, and is inherited by every subprocess spawned with env=dict(os.environ)). - agent/secret_scope.py: get_secret() backed by a secret-scope contextvar. FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped read RAISES UnscopedSecretError instead of falling back to os.environ — a missed/new call site crashes loudly at that line rather than leaking a cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths, …) keep reading os.environ via an allowlist. load_env_file/build_profile_ secret_scope parse a profile .env into an isolated dict WITHOUT mutating os.environ. Off by default => transparent os.getenv behavior. - hermes_cli/runtime_provider.py: all credential/provider/base-url reads go through _getenv -> get_secret. - agent/credential_pool.py: env fallbacks route through get_secret (the ~/.hermes/.env-first preference is preserved and already profile-correct via the home override). - tools/mcp_tool.py: MCP config interpolation resolves through get_secret, so a server's picks up the routed profile's value. - gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env reload is a no-op for credentials in multiplex mode (secrets come from the scope, not global env); _profile_runtime_scope context manager combines the HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in that scope (resolved via _resolve_profile_home_for_source) when multiplexing. Propagates into the agent worker thread for free via the existing copy_context() in _run_in_executor_with_context. Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove two profiles isolated, unscoped read raises, globals still read environ).	2026-06-19 07:34:15 -07:00
Ben Barclay	d82f9fa7f7	feat(gateway): multiplex phase 0 — config flag, profile enumeration, profile-stamped session keys Foundations for serving multiple profiles from one gateway process, inert when off: - gateway.multiplex_profiles config flag (default false), round-trips through GatewayConfig and load_gateway_config (top-level + nested gateway.* form). - hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan; active-profile-only when off, default + all named profiles when on. - build_session_key gains a profile= namespace slot. Default/None reuse the historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration, positional parsers unaffected); a named profile becomes 'agent:<profile>:...' so two profiles on the same platform/chat never collide. - SessionStore._resolve_profile_for_key + _session_key_for_source fallback resolve the namespace from the flag (legacy when off, active profile when on). Tests: byte-identical-when-off (parametrized), namespace isolation, positional layout preserved, config round-trip, profiles_to_serve enumeration.	2026-06-19 07:34:15 -07:00
snav	caaa916289	fix(gateway): don't let delayed Discord status messages partition history backfill Discord channel-history backfill partitions on Hermes' last self-authored message. Asynchronous, non-conversational status sends (self-improvement review bubbles, heartbeats, background-process notifications, update status, gateway restart/online notices) land as ordinary bot messages, so a delayed status bump becomes the history boundary and swallows real messages that arrived after Hermes' actual reply. Mark these sends at the source via metadata["non_conversational"] (Discord only; other platforms' metadata is unchanged). The adapter no longer advances the history-boundary cache for marked sends and persists their IDs to a sidecar JSON so the cold-start scan can skip them by ID after a restart. A narrow regex recognizer remains only as an upgrade bridge for status bumps emitted by an older gateway that pre-dates the marking.	2026-06-19 07:29:27 -07:00
Alex Yates	fad4b40d9d	fix(model): persist /model switch by default across sessions A plain /model <name> switch only lasted for the current session — every new session reverted to the previously-configured model, so users had to re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch). Persist-by-default is now the behavior across all three /model surfaces (CLI, gateway, TUI/dashboard), gated by a new config key model.persist_switch_by_default (default true): /model <name> switch model (persists to config.yaml) /model <name> --session switch for this session only /model <name> --global switch and persist (explicit, unchanged) The effective persistence is resolved once via resolve_persist_behavior() in hermes_cli/model_switch.py so --session opts out, --global opts in, and the config-gated default applies otherwise. --global remains a valid explicit no-op alias for the new default.	2026-06-19 07:07:06 -07:00
Charles Power	715fa9ea1c	fix(gateway): harden gateway command-line matcher (review findings) Address correctness gaps found in pre-PR review of the strict matcher: - Profile selectors can appear on EITHER side of the `gateway` token (`_apply_profile_override` strips `--profile`/`-p` from anywhere in argv before argparse), so `hermes gateway --profile work run` and `python -m hermes_cli.main gateway -p work run` are valid launches the previous matcher wrongly rejected. Strip `--profile`/`-p`/`--profile=`/`-p=` from anywhere before locating the subcommand. - A profile literally named `gateway` (`hermes -p gateway gateway run`) made the old token scan stop on the profile value; stripping the selector+value first fixes it. - Tokenize quote-aware with `shlex` so quoted Windows paths containing spaces (`"C:\Program Files\Hermes\hermes-gateway.exe"`) are no longer split mid-path and the dedicated-entrypoint match survives. Without these, the matcher could MISS a real running gateway -> the opposite failure (restart/status reporting "down" when up). Adds regression tests for all three shapes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 06:31:56 -07:00
Charles Power	fd92a3a5c9	fix(gateway): Windows restart no longer causes a silent outage `hermes gateway restart` on Windows could take the gateway offline with no replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful drain can run up to ~180s while the detached pythonw process stays alive. The 1s sleep let start() run against the still-draining old process; its "already running" guard then no-opped, and when the old process finally exited nothing relaunched it. Two root causes, both fixed: 1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers used substring matches ("... gateway" in cmdline) for lifecycle decisions, so they false-matched `gateway status`/`dashboard` siblings and unrelated processes like `python -m tui_gateway`, plus stale gateway.pid records. Add a shared strict matcher `looks_like_gateway_command_line()` in gateway/status.py that requires the real `gateway run` subcommand (or the dedicated entrypoints), and route `_looks_like_gateway_process`, `_record_looks_like_gateway`, and `_scan_gateway_pids` through it. 2. restart() race. Wait until the gateway is authoritatively gone (`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill once if it lingers and raise rather than start a duplicate; verify the relaunch produced a running gateway and raise loudly if not (no more exit-0 silent outage). Scoped to Windows; systemd/launchd restart paths are already drain-aware. Adds tests/gateway/test_gateway_command_line_matcher.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 06:31:56 -07:00
teknium1	144834b2f7	test(gateway): real cached-agent max_iterations regression test Replaces the tautological test from the original PR (which asserted a plain assignment it performed itself in the test body) with one that exercises the actual contracts: _init_cached_agent_for_turn leaves max_iterations untouched, and the per-turn IterationBudget rebuild (turn_context.py) propagates a refreshed cap.	2026-06-19 06:31:13 -07:00
infinitycrew39	dcac719527	test(gateway): cover runtime max_turns refresh	2026-06-19 06:31:13 -07:00
Kenny John Jacob	bce1e36b57	fix(discord): unwrap dict choices + soft-boundary truncate clarify buttons Two bugs surfaced from production usage in #37134: 1. Dict choices rendered as Python repr. LLMs sometimes emit [{"description": "..."}] instead of bare strings; the old str(c).strip() coercion turned the whole dict into "{'description': '...'}" on the button label. Fix: add a _flatten_choice helper that unwraps dicts against the canonical LLM tool-call user-facing keys (label, description, text, title) in that order. Dicts with none of those keys are dropped. The "name" and "value" keys are deliberately NOT in the priority list — they're Discord-component-shaped fields that could appear in dicts that aren't meant to be choices (a developer-error wiring that passes a Button-shaped object); picking them would leak raw enum values or 4-char model identifiers onto user-facing buttons. 2. Mid-word truncation on long button labels. The old choice[:72] + "..." cut at position 72, mid-word. Worse, the three-char ellipsis ate into the 80-char Discord label cap, leaving only 75 chars of body. Fix: budget-aware cut strategy with three tiers: a. Last space in the trailing half of the budget (word boundary). b. Last soft boundary (- , . )) in the trailing half — used only when no word boundary exists. c. Hard cut at the budget limit (last resort). Use single U+2026 (…) to fit the cap. Cut AT soft boundaries (inclusive) so the label ends on the boundary char rather than on the alpha char that followed it. Tests: - test_unwraps_dict_choices_to_description: reproduces the screenshot in #37134, asserts the Python repr is gone. - test_unwrap_prefers_description_over_name_in_multi_key_dict: regression guard for the name-key order in the unwrap list. - test_unwrap_prefers_label_over_description: regression guard for label winning over description. - test_unwrap_does_not_pick_value_or_name_alone: regression guard for the "name"/"value" fields being absent. - test_truncates_long_choice_label: 200-char input, asserts total <= 80 and U+2026. - test_truncates_long_choice_label_breaks_on_word_boundary: asserts the cut is on a space, not mid-word. - test_truncates_long_no_space_choice_on_soft_boundary: adversarial input where position 76 is mid-word alpha, asserts the renderer falls back to a soft boundary. Parity: telegram clarify suite (12 tests) still passes; the helper is a Discord adapter local, not shared with the gateway. Follow-up: gateway/platforms/telegram.py has the same str(c).strip() pattern in its own send_clarify and will need a similar fix (separate PR to keep this diff reviewable). Fixes #37134	2026-06-19 06:31:08 -07:00
Ben Barclay	a64fc490fe	fix(relay): make hosted gateways actually connect AND complete the inbound/outbound round-trip (#48828 ) * fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect Three bugs blocked a self-provisioned hosted gateway from ever establishing its inbound relay WS (found while standing up the live staging end-to-end). Each masked the next; all three are needed for inbound to work. 1. RELAY platform never enabled in config.platforms (gateway/config.py). register_relay_adapter() puts the adapter in the platform_registry, but start_gateway()'s connect loop iterates self.config.platforms — which never contained Platform.RELAY. So the adapter was "registered" but never connected (logs showed "relay adapter registered" then "No messaging platforms enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env) or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/ single-tenant gateways unaffected). 2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py). The relay URL is configured once as the http(s):// base (used as-is for the provision POST), but websockets.connect rejects http(s):// with "scheme isn't ws or wss". Fix: _ws_dial_url converts https->wss / http->ws. 3. /relay path not appended (same helper). The connector mounts its WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/" -> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL already carrying ws(s):// and/or /relay is unchanged, so provision's _provision_url (which derives /relay/provision from either form) still works. Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and its websockets.serve accepts ANY path, so neither the scheme nor the /relay path was exercised. Real connector needs both. Verified live on staging hermes-agent-stg-automated-perception-5054: after the fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" -> "Gateway running with 1 platform(s)" against wss://gateway-gateway.staging-nousresearch.com/relay, stable. Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py) and RELAY-platform-enablement cases for env + yaml + absent (test_config.py). Full gateway/relay + config suites green (191 passed). Relay-adapter lane. EXPERIMENTAL. * fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord -> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress was declined by the connector: "discord egress declined: target not routed to an onboarded tenant". Cause: the connector's routedEgressGuard resolves the owning tenant from the OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The gateway's generic delivery path builds outbound metadata via run.py _thread_metadata_for_source, which only carries thread_id (and returns None entirely for a non-threaded message) — so guild_id never reached the connector, tenant resolution failed, and the shared bot refused to post. Fix (relay-adapter-local, no perturbation of the generic delivery path or other platforms): RelayAdapter learns chat_id -> guild_id from each inbound event (_capture_scope) and re-attaches it to the outbound action's metadata in send() (_with_scope) when not already present. No-op for chats we never saw inbound (e.g. DMs) and never overwrites an explicit guild_id. Verified live on staging hermes-agent-stg-automated-perception-5054: an @mention in #general now produces a visible bot reply — full multi-tenant relay round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS -> reply egress -> Discord). Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id preserved (test_relay_adapter.py). Full relay + config suites green (160 passed). Relay-adapter lane. EXPERIMENTAL.	2026-06-19 16:30:24 +10:00
Ben Barclay	2c6e266e88	fix(relay): trigger self-provision on relay-config + NAS token, not is_managed() (#48724 ) self_provision_if_managed() gated on is_managed(), but is_managed() means "NixOS/package-manager-managed" (it keys on HERMES_MANAGED or a ~/.hermes/.managed marker) — NOT "NAS-hosted". A NAS-provisioned Fly agent sets NEITHER, so the gate was always False and relay self-provision SILENTLY no-oped on exactly the hosted agents it was built for. Caught live: a staging agent with GATEWAY_RELAY_URL correctly stamped logged "No messaging platforms enabled" and never dialed the connector; HERMES_MANAGED was unset on the machine. The unit tests had mocked is_managed()->True, so they passed while the real trigger never fired (mocked- trigger blind spot). Fix: drop the is_managed() gate and rename self_provision_if_managed -> self_provision_relay. The real trigger is now "relay_url() set + no pinned secret + a resolvable NAS token", which is both NAS-independent and self-guarding: - NAS-hosted agent: GATEWAY_RELAY_URL + no pinned secret + bootstrapped NAS token -> self-provisions. - Self-hosted + `hermes gateway enroll`: pinned GATEWAY_RELAY_SECRET -> skipped (existing secret-present guard). - Self-hosted, unenrolled, no NAS identity: resolve_nous_access_token() fails -> graceful no-op (existing fail-soft path). Security: unchanged trust model. The connector still derives tenant from the validated NAS token; this only broadens WHEN the provision attempt fires, and every broadened case is still guarded by token-resolution + pinned-secret-skip. Tests: replaced the (wrong) "skips when not managed" test with a regression test proving a NAS host where is_managed()==False STILL provisions; renamed all call sites; added a "no NAS token -> non-fatal skip" test for the self-hosted branch. 88 relay tests pass. Relay-adapter lane. EXPERIMENTAL.	2026-06-19 01:01:24 +00:00
Ben Barclay	d2c53ff558	feat(relay): WS-only inbound on the gateway adapter (Phase 3) (#48294 ) The connector now delivers inbound (messages + interrupts) over the gateway's OUTBOUND /relay WebSocket, not a signed HTTP POST to an inbound endpoint. The gateway needs no inbound HTTP port — which is what makes hosted gateways (no public IP) able to receive inbound at all. - gateway/relay/adapter.py: connect() wires set_interrupt_inbound_handler( self.on_interrupt) so connector->gateway interrupt_inbound frames bridge into the existing per-session interrupt path (the inbound message handler was already wired). Removed _maybe_start_inbound_receiver() + the _inbound_runner lifecycle — there is no HTTP receiver anymore. - gateway/relay/inbound_receiver.py: deleted (the signed-HTTP InboundDelivery receiver). - gateway/relay/__init__.py: removed relay_inbound_config() (dead with the receiver gone). The delivery key is still set in-process by self-provision for forward-compat but is no longer consumed for inbound. - docs/relay-connector-contract.md: §3 rewritten — inbound is the WS back-channel routed cross-instance via the connector's relay bus; §5 interrupt + §6 auth table updated; the old signed-HTTP-POST + per-tenant-delivery-key-signing path is documented as superseded. gatewayEndpoint noted as passthrough-plane only. Tests: stub_connector grows set_interrupt_inbound_handler + push_interrupt; new test_relay_interrupt case proves connect() wires BOTH inbound handlers and an interrupt_inbound frame over the WS cancels the right session. Removed the HTTP-receiver test; updated the crypto-shedding scan + self-provision delivery-key assertion. 88 relay tests pass. EXPERIMENTAL. Pairs with gateway-gateway (relay bus + WsGatewayDelivery) and the NAS GATEWAY_RELAY_URL stamp. The cross-repo E2E (connector repo) proves the full multi-instance path against this production adapter code.	2026-06-19 09:33:15 +10:00
Ben Barclay	0ddd21c74e	feat(relay): managed-boot self-provision client (Phase 3, gateway side) (#48242 ) The gateway half of relay Phase 3. On a MANAGED boot with relay configured and no secret pinned, the runtime self-provisions its relay credentials IN-PROCESS: resolve the agent's own Nous access token (resolve_nous_access_token) -> POST the connector's /relay/provision asserting its own endpoint + route keys -> set GATEWAY_RELAY_ID/SECRET/DELIVERY_KEY into os.environ so the immediately- following register_relay_adapter() reads them and dials out authenticated. No human, no enrollment token, no disk write — the creds live only in process memory (save_env_value refuses under managed anyway, and keeping the secret off any volume is the stronger posture). Stateless: process-env creds don't survive a restart, so a managed container re-provisions every boot; the connector's rotation window covers a still-connected prior instance. An explicitly-pinned GATEWAY_RELAY_SECRET is respected (skip). Self-hosted is unchanged: humans keep using `hermes gateway enroll`. Endpoint provenance is gateway-asserted (GATEWAY_RELAY_ENDPOINT + GATEWAY_RELAY_ROUTE_KEYS, env or gateway.relay_* config) — uniform code path whether the operator sets it (self-hosted) or NAS stamps it (hosted, the only case NAS knows the public URL). Both absent -> outbound-only provisioning (credentials, no inbound routes). The connector scopes the asserted endpoint to the verified tenant, so it stays within the security model. - gateway/relay/__init__.py: relay_endpoint(), relay_route_keys(), _provision_url(), _post_provision(), self_provision_if_managed() (never raises — a provision failure logs and boots without relay auth). - gateway/run.py: call self_provision_if_managed() immediately before register_relay_adapter() in the startup path. Tests: 12 unit (trigger logic, respect-pinned-secret, in-process env wiring, endpoint+routes vs outbound-only, fail-soft on token/connector failure); mutation-checked (drop is_managed guard / pinned-secret guard -> tests fail). Cross-repo live E2E driver lands on the connector side (depends on this). EXPERIMENTAL: relay auth scheme may change until >=2 Class-1 platforms validate.	2026-06-18 15:25:29 +10:00
Ben Barclay	c276b017ad	feat(relay): connector⇄gateway channel auth + signed-HTTP inbound receiver + enroll CLI (#48147 ) * feat(relay): authenticate the connector⇄gateway WS channel The relay gateway may be customer-managed and internet-exposed, so the connector⇄gateway channel is itself authenticated (distinct from the platform crypto the relay path sheds). Add gateway/relay/auth.py — a Python port of the connector's HMAC token + delivery-signature schemes (relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against the connector's compiled TypeScript via cross-language test vectors. Present an Authorization bearer on the /relay WS upgrade keyed by the per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET in env or config). The connector rejects an unauthenticated/invalid/ revoked upgrade with close 4401. * feat(relay): signed-HTTP inbound delivery receiver The connector delivers normalized inbound events to a tenant's gateway over a signed HTTP POST, not the outbound /relay WS: the connector instance owning a platform socket is generally not the instance a given gateway dialed out to, so inbound targets a tenant endpoint that may load-balance across gateway instances. Add gateway/relay/inbound_receiver.py — verifies x-relay-signature / x-relay-timestamp over the EXACT raw request bytes (re-serializing would break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces) against the per-tenant delivery key verify list within a 300s replay window, then dispatches messages to handle_message and interrupts to the interrupt handler. Wire it into the adapter lifecycle (start in connect() when a delivery key + bind port are configured, tear down in disconnect(); a purely-outbound dev gateway runs without it). Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from the platform-symbol scan but still banned from importing platform-crypto modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib. * feat(relay): hermes gateway enroll CLI Add the gateway half of zero-touch enrollment. `hermes gateway enroll` resolves a fresh Nous Portal access token (the tenant-proving identity), POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade; the per-tenant delivery key verifies signed inbound deliveries. Refuses under is_managed() (hosted installs get the secret stamped in by the orchestrator). Added as an 'enroll' subcommand on the existing gateway subparser — not a new top-level command. * docs(relay): inbound is signed HTTP, not WS; document channel auth Fix the stale contract: §3/§5 said inbound rode the WS socket (single- instance only, predates the multi-instance socket-ownership + channel-auth model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per- gateway WS-upgrade secret, per-tenant inbound delivery key) and how they differ from the platform crypto the relay path sheds. * test(relay): update build_gateway_parser callers for cmd_gateway_enroll The enroll subcommand added cmd_gateway_enroll as a required keyword-only arg to build_gateway_parser, but two existing parser-extraction tests still called it with only cmd_gateway/cmd_proxy — failing CI with TypeError. Thread the new handler through both call sites and add a test asserting `gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.	2026-06-18 12:01:54 +10:00
Ben	acc8916ac7	test(gateway): live ws-transport round-trip + config-driven registration - test_ws_transport.py: drives WebSocketRelayTransport against a REAL in-process websockets server (not a mock socket): handshake (hello->descriptor), inbound frame -> handler, outbound request/response correlation, follow_up routing, and clean disconnect failing pending waiters. Skips if websockets is absent. - test_relay_registration.py: rewritten for the config-driven gate — registers when GATEWAY_RELAY_URL is set / an explicit url is passed / force=True; no-op without a URL; trailing slash stripped; adapter constructs through the registry. Full relay suite: 57 passed.	2026-06-17 16:37:45 -07:00
Ben	3db9b3e616	feat(gateway): token-less follow_up outbound op (A2 capability action) The relay outbound surface had send/edit/typing but no way to act on a SHARED-identity capability (e.g. a Discord interaction follow-up token, ~15min) that the connector captured + stripped at the edge. Under A2 that credential never reaches the gateway, so the gateway can't just 'send with the token' — it needs a semantic op naming the session it's already in. Adds the follow_up op end to end on the gateway side: - RelayTransport.send_follow_up(action): protocol method. Action carries op='follow_up' + session_key + kind + content (+ metadata) and NO token. - RelayAdapter.send_follow_up(session_key, kind, content, metadata): builds that action and returns a SendResult. The connector resolves the real capability (its resolveOutboundCapability), enforces the tenant match so tenant B can't wield tenant A's capability, and egresses; success=False when the capability is absent/expired/mismatched (nothing to retry — a leaked gateway holds zero capability material). - StubConnector records follow_ups + a canned next_follow_up_result. Tests: round-trips without a token; the wire action carries only session refs (no credential value field — the 'kind' string is a type ref, not the secret); failure surfaces when the connector can't resolve; no-transport fails cleanly. 55 passed. §4 doc entry follows in the contract-rewrite commit.	2026-06-17 16:37:45 -07:00
Ben	c28a02b49d	test(gateway): shed platform crypto from the relay path (A2 invariant) Under the A2 trust model the connector is the SOLE crypto/identity boundary: it verifies/decrypts every inbound platform payload at the edge (it holds the tenant secrets), normalizes to a tenant-scoped MessageEvent, and forwards only the sanitized event. The gateway re-validates nothing — it cannot without being handed the shared signing secret, which on a shared bot is itself the cross-tenant leak. The relay path already imports no platform-crypto today; this locks that in as an enforced invariant so nobody bolts re-validation (Discord ed25519, Twilio HMAC, WeCom BizMsgCrypt, generic webhook signature checks) onto the relay later and silently re-couples the gateway to platform secrets it must never hold. Verification stays in the direct platform adapters (gateway/platforms/) which serve non-relay deployments. - test_relay_package_imports_no_platform_crypto: AST-walks gateway/relay/ and fails on any import of a platform-crypto/verification module. - test_relay_package_calls_no_signature_verification: fails on any verification-symbol reference (ed25519/hmac/bizmsg/verify_*). Invariants (assert the relation 'relay re-validates nothing'), not frozen snapshots. Verified the guard bites: injecting a wecom_crypto import makes it fail, removing it goes green. docs §6 rewrite follows in a later commit.	2026-06-17 16:37:45 -07:00
Ben	e74577ed0f	test(gateway): Telegram relay round-trip (Phase 1 generalization proof) The Phase 1 exit gate requires BOTH Discord and Telegram to round-trip through the relay stub, but test_relay_roundtrip.py only covered Discord. Add the Telegram companion exercising its distinct discriminator profile: - no guild_id — two chats isolate on chat_id alone - forum topics share one chat_id and isolate by thread_id (the Telegram analog of Discord per-guild isolation), shared across participants by default (thread_sessions_per_user=False) - DM isolation by chat_id - utf16 len_unit + markdown_v2 dialect round-trip and configure the adapter - outbound send round-trips through the stub Proves the CapabilityDescriptor + build_session_key generalize beyond Discord, not just the struct (which the descriptor unit tests already covered).	2026-06-17 16:37:45 -07:00
Ben	5feec8b4cf	test(gateway): enforce relay contract-doc ⟷ Python conformance Add an invariant test pinning docs/relay-connector-contract.md to the Python source of truth so the doc (which the connector repo mirrors by hand) cannot silently drift: - CapabilityDescriptor §2 table ⟷ dataclass fields + required/optional - SessionSource wire keys (to_dict output) ⟷ §3 documented fields - per-platform discriminator columns exist as real SessionSource fields - guard that is_bot stays off the wire until deliberately promoted Writing the test surfaced a real gap: §3 only enumerated 5 discriminators in its per-platform table while to_dict() emits 12 keys. Seven wire keys the connector must populate (chat_name, chat_topic, user_id_alt, chat_id_alt, parent_chat_id, message_id, user_name) were undocumented — a connector author reading the doc would never know to set them. Added a complete SessionSource wire-field table to §3. The connector's existing contract.ts already carries all 12, so no connector change is needed; the doc was the lagging artifact.	2026-06-17 16:37:45 -07:00
Ben	c803661cec	fix(gateway): register relay connection checker The platform-connected-checker invariant test requires every built-in Platform enum member to have either a generic token path or a bespoke entry in _PLATFORM_CONNECTED_CHECKERS. Platform.RELAY was added without one, so test_all_builtins_have_checker_or_generic_token_path failed. Relay dials OUT to a connector and is 'connected' once an endpoint URL is configured (extra['relay_url'] or extra['url']); the capability descriptor is negotiated at handshake time, so the URL is the only config-level signal in the experimental phase. Add the checker plus a synthetic-config case exercising its True path.	2026-06-17 16:37:45 -07:00
Ben	c366466d70	test(relay): assert connector stub never leaks into production paths CI guard: fails if gateway/ or plugins/ ever imports the test-only stub connector or defines StubConnector. Matches code leaks (imports / class defs), not prose mentions, so the transport.py docstring reference to the stub's path is allowed. Phase 1 complete. Task 1.6 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	a3cdd8c39d	feat(relay): route mid-turn /stop over relay interrupt channel RelayAdapter.on_interrupt(session_key, chat_id) bridges a connector-delivered mid-turn /stop into the existing interrupt_session_activity path, setting the per-session _active_sessions Event and clearing typing — cancelling exactly the targeted session's turn without touching siblings (mirrors test_stop_thread_ sibling isolation). Transport.send_interrupt carries the gateway-side egress to the connector for socket-owner routing. Phase 1, Task 1.4 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	d0133fd8e4	feat(relay): register RelayAdapter through platform registry (flagged off by default) register_relay_adapter() registers the generic 'relay' platform via the same PlatformRegistry path as plugin adapters — no core dispatch changes. OFF by default (dark-launch): only registers when HERMES_GATEWAY_RELAY is truthy (or force=True for tests), so existing single-tenant/direct deployments are unaffected. Factory builds a transport-less RelayAdapter with a placeholder descriptor; the real descriptor is negotiated at handshake. Phase 1, Task 1.3 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	259e78e175	feat(relay): transport protocol + test-only stub connector Defines RelayTransport (lifecycle/handshake/inbound/outbound/interrupt) as the gateway<->connector wire contract; RelayAdapter.connect now registers an inbound handler that bridges connector-delivered MessageEvents into handle_message. Adds an in-memory StubConnector under tests/ and an E2E round-trip proving: connect registers the handler, inbound events reach the adapter, guild_id drives build_session_key isolation (two guilds -> two keys; same guild/channel/user -> one), outbound send round-trips, get_chat_info is proxied. Phase 1, Task 1.2 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	b0999c82f3	feat(relay): generic RelayAdapter advertising negotiated capabilities One BasePlatformAdapter subclass that reads its capability profile from a CapabilityDescriptor: MAX_MESSAGE_LENGTH attribute, message_len_fn (table-driven by len_unit: chars=len, utf16=Telegram-style code units), supports_draft_streaming. Implements the four abstract methods (connect/disconnect/send/get_chat_info) by delegating to an injected RelayTransport (full protocol lands in Task 1.2). Adds Platform.RELAY enum member. No per-platform gateway code. Phase 1, Task 1.1 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	3db49381d6	feat(relay): derive descriptor from PlatformEntry CapabilityDescriptor.from_platform_entry() projects an existing PlatformEntry (label, max_message_length, emoji, platform_hint, pii_safe, name) into a descriptor, proving the descriptor is a projection of existing config rather than a parallel concept. Runtime-only capabilities (len_unit, draft/edit/ thread/markdown) are caller-supplied. max_message_length==0 ('no limit') maps to the stream_consumer 4096 default. Phase 0 complete. Task 0.3 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	53d9b98305	feat(relay): experimental CapabilityDescriptor schema Frozen, JSON-serializable handshake payload the connector hands the future RelayAdapter: char limit, draft-streaming/edit/threading flags, markdown dialect, len_unit. Mostly a wire projection of PlatformEntry + the adapter capability methods. contract_version gates additive-only evolution; declared EXPERIMENTAL until >=2 Class-1 platforms validate it. from_json ignores unknown keys (forward-compat) and fills optional defaults. Phase 0, Task 0.2 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
Ben	e9a2ce6585	test: lock gateway adapter capability surface (relay phase 0) Behavioral regression harness locking the capability surface that the future RelayAdapter must reproduce: the abstract-method set (connect/disconnect/send/ get_chat_info), message_len_fn default, supports_draft_streaming default, and the stream_consumer MAX_MESSAGE_LENGTH attribute read. Passes on main before any RelayAdapter exists. Phase 0, Task 0.1 of the gateway-relay plan.	2026-06-17 16:37:45 -07:00
teknium	36ae958473	feat(gateway): gate message timestamps behind opt-in (default off) Follow-up to salvaged PR #41633: the timestamp prefix injection was unconditional. Gate the in-context render behind gateway.message_timestamps.enabled (default false) at both the live-message and history-replay sites; timestamp metadata is still captured + persisted regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry, docs, and gate tests.	2026-06-16 15:49:59 -07:00
Wolfram Ravenwolf	bd7fc8fdcd	feat(gateway): inject stable human-readable message timestamps Consolidates these related Amy fork patches: - 429830f39 feat(gateway): inject message timestamps into user messages for LLM context - 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay - 2874f7725 feat: human-friendly timestamp format with weekday and timezone name - 3735f4c8b fix: render gateway message timestamps once	2026-06-16 15:49:59 -07:00
teknium1	8ed16a7a0c	test(telegram): rich-reply recovery via send-time index Cover #47375 fix: record-on-rich-send + lookup-on-reply round trip, lookup miss leaving reply_to_text None, and precedence (native quote and echoed caption both win over the index fallback).	2026-06-16 13:04:20 -07:00
Wolfram Ravenwolf	16fc717091	fix(mattermost): harden delivery hygiene PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise. SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata. Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check. Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.	2026-06-16 06:34:54 -07:00
Rory Evans	e65d74bc6f	fix(gateway): accept `metadata` kwarg in WhatsApp/email send_image `BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to `send_image` / `send_image_file` / `send_animation` on every send. The WhatsApp and email `send_image` overrides stopped their signature at `reply_to`, so any image delivered as a URL (the common case — image-gen backends return URLs) raised: TypeError: send_image() got an unexpected keyword argument "metadata" and the image silently failed to send. Their sibling overrides (`send_image_file` / `send_video` / `send_voice` / `send_document`) already absorb it via **kwargs, which is why only plain image-URL sends broke. - whatsapp/email `send_image`: accept `metadata` (matches the base signature); WhatsApp forwards it to the super() text fallback. - Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp + email accept it, plus a best-effort sweep over every adapter so the next slip fails at test time instead of in production. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 06:23:53 -07:00
teknium	6373aba80f	feat(gateway): rename to tool_progress_grouping, add config/docs/tests Follow-up to salvaged PR #41620: - Rename tool_progress_style -> tool_progress_grouping (clearer intent) - Add display.tool_progress_grouping to DEFAULT_CONFIG (accumulate default) - Document in messaging docs incl. 'separate is noisier, only where progress enabled' - Add resolver tests (default/global/override/invalid/case)	2026-06-16 05:49:24 -07:00
Teknium	a6364bfa08	fix(telegram): edit streamed previews in place as rich (Bot API 10.1) (#46890 ) Streamed Telegram replies that finalize through editMessageText were converted to MarkdownV2, which has no table syntax and rewrites pipe tables into bullet lists — users saw a table while streaming that collapsed to a list at the last moment. Finalize now edits the existing preview IN PLACE via Bot API 10.1's editMessageText rich_message parameter when the content has constructs the legacy path degrades (tables, task lists, <details>, block math). No fresh send + delete, so no duplicate-preview flicker — the reason #46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming stays False; the in-place edit replaces it. - _needs_rich_rendering(): rich reserved for table/task-list/details/math (adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2. - _try_edit_rich(): editMessageText + rich_message via do_api_request, mirroring _try_send_rich's fallback/latch/transient contract. - edit_message finalize tries rich in place before the 4,096 overflow pre-flight (rich cap is 32,768), falling back to legacy on rejection. - rich_messages default flipped back to True (DEFAULT_CONFIG + adapter). - docs (en + zh-Hans) + cli-config example updated to default-on. Closes the root cause behind #45911 / #46009.	2026-06-16 05:26:04 -07:00
Teknium	5a0e0d35b9	fix(mattermost): preserve thread-local delivery hygiene Salvage the valid thread-routing pieces from #41640: - route Mattermost progress/status sends through metadata thread IDs - treat top-level Mattermost channel posts as thread roots for progress - preserve thread metadata through media/file sends - allow flat fallback only for final notify-worthy replies on confirmed broken roots Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>	2026-06-15 15:06:23 -07:00
kshitij	d2b34e89b0	Merge pull request #44431 from erosika/feat/honcho-identity-tree feat(honcho): gateway-gated identity tree + canonicalize on pinUserPeer	2026-06-16 03:35:24 +05:30
kshitij	cffd6e3c8d	Merge pull request #46078 from xxxigm/fix/discord-slash-command-100-cap fix(discord): cap slash commands at Discord's 100-command limit	2026-06-16 02:05:31 +05:30
Austin Pickett	5f6be7f31b	fix(teams): package Microsoft Teams SDK as an installable extra (salvage #43945 ) (#46764 ) * fix(teams): package Microsoft Teams SDK as an installable extra The Teams adapter imports the microsoft-teams-apps SDK, but it was never declared as a dependency, so source/local installs hit ImportError and the adapter silently reported the SDK as unavailable. Add a 'teams' extra (microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'. Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added to [all] (they would break every fresh install on a quarantined release); the teams extra is installed on demand like the other platform backends. Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> * chore: map rio-jeong contributor email for attribution (#43945) * feat(teams): lazy-install the Teams SDK on demand (parity with other channels) The teams extra alone left Teams as the only messaging platform that wouldn't auto-install its SDK — every other channel (telegram, discord, slack, matrix, dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring Teams to parity: - Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp). - Replace the passive 'check_teams_requirements = check_requirements' alias with a real lazy-installer that calls ensure_and_bind('platform.teams', ...), rebinding all Teams SDK globals on success (mirrors check_slack_requirements). - Call check_teams_requirements() at the top of TeamsAdapter.connect() so enabling Teams installs the SDK on demand. - Keep the passive check_requirements() as the registry check_fn so 'gateway status' probes never trigger a pip install. The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'. Tests: rework the alias test into shortcircuit + lazy-install assertions, and update test_connect_fails_without_sdk to simulate an uninstallable SDK. --------- Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-15 14:35:15 -04:00
Teknium	3e7e9b24d4	fix: harden salvaged session and browser improvements Polish salvaged contributor work before PR review: - read browser inactivity timeout from config with documented fallback - skip redundant v10 trigram backfill before v11 FTS rebuild - show delegate_task goals safely in progress previews - show gateway status model/context without redundant token wording - wire gateway /sessions to shared session-listing helpers - map Ravenwolf author emails for release attribution Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de> Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>	2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf	ead38107a2	feat(status): restore model and context in gateway status PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat. SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative. Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py	2026-06-15 07:46:34 -07:00
Teknium	0d82060c74	fix: harden WhatsApp target alias salvage Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.	2026-06-15 05:51:47 -07:00
Keiron McCammon	ea49a79633	fix(messaging): route WhatsApp group JIDs to the target, not the home DM send_message(target="whatsapp:<group-jid>") silently delivered to the configured home DM instead of the requested group. Two gaps: 1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us), user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and broadcast/newsletter JIDs matched no pattern and fell through to `return None, None, False`, so the caller treated them as unresolvable and used the home channel. The bridge's /send endpoint accepts any chatId, so only the tool-side target parsing was at fault. Add a whatsapp branch that recognizes native JIDs as explicit targets. The pre-existing '+'-prefixed E.164 path is preserved. 2. WhatsApp groups have no human-friendly name — the channel directory is regenerated from session data on a timer, so a group shows up as its raw 18-digit JID and any hand-edit to channel_directory.json is clobbered on the next rebuild. Add a user-maintained alias overlay (~/.hermes/channel_aliases.json) re-applied on every build AND every load, giving durable friendly names and letting a freshly-created group be pre-named before its first message. Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser; TestChannelAliases (7 cases) for the overlay, plus an autouse fixture isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the existing directory tests.	2026-06-15 05:51:47 -07:00
Tharushka Dinujaya	ec05d2bc3e	fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway On Linux, systemd spawns core services (cron, nginx, sshd) with deterministic PIDs and jiffy start_times across reboots. A service can land on the exact same PID and start_time as a previous gateway, causing acquire_scoped_lock to mistake it for a live gateway and block startup. The existing stale-detection paths only covered: - start_times both non-None and different (clear mismatch) - start_times both None (macOS/Windows fallback to cmdline check) The boot-time collision falls through both: times are non-None and equal, so neither branch fired. Add a third check: when both start_times are known and match but the live process fails _looks_like_gateway_process, read its cmdline. If the cmdline is readable (non-None), we have positive evidence of an impostor and mark the lock stale. Requiring a readable cmdline keeps the check conservative — if cmdline is unreadable we do not evict.	2026-06-15 05:25:07 -07:00
Teknium	a1f51feb72	fix(telegram): avoid rich final duplicate previews (#46206 )	2026-06-14 11:13:38 -07:00

1 2 3 4 5 ...

1352 commits