Commit graph

12172 commits

Author SHA1 Message Date
Teknium
b266ad748c
chore(deps): npm audit fix — bump transitive undici to clear advisories (#49113)
Resolves the 2 npm audit advisories (1 high, 1 moderate), both from
transitive undici:
- undici 6.26.0 -> 6.27.0 (high: TLS bypass / header injection /
  response queue poisoning class, via node-gyp + ui-tui)
- jsdom's undici 7.27.2 -> 7.28.0 (moderate, via jsdom test dep)

Both are in-range bumps (no --force). Lockfile also reconciled two
pre-existing manifest drifts during the install: dompurify 3.4.10 ->
3.4.11 (in-range patch) and the web workspace's already-declared
vitest ^4.1.5 devDep. No package.json changes. npm audit reports 0
vulnerabilities in root, ui-tui, and apps/desktop after.
2026-06-19 08:20:03 -07:00
brooklyn!
0e8b76532e
fix(desktop): rename "Restart messaging" → "Restart gateway", surface restarts in the statusbar, make logs selectable (#49094)
* fix(desktop): rename "Restart messaging" -> "Restart gateway"

The Command Center control restarts the whole messaging gateway, yet was
labelled "Restart messaging" while the status line above it reads "Messaging
gateway running/stopped". Rename the i18n key to match what it does, across
all 4 locales.

* feat(desktop): restart the gateway from Cmd+K, with statusbar spinner feedback

Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a
new Cmd+K "Restart gateway" action. While a restart is in flight the
statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads
"restarting…", returning to its real state on completion — driven by a
$gatewayRestarting atom, not a transient toast or the generic "Agents
running" counter. The helper owns its error handling so fire-and-forget
callers can't leak an unhandled rejection; only a failure toasts.

* fix(desktop): offer a Restart gateway action on messaging save/toggle toasts

The "setup saved" and "platform enabled/disabled" toasts told users their
change needs a gateway restart but left it a separate hunt. Attach a "Restart
gateway" action (the shared runGatewayRestart), and reword the copy to state
the pending consequence ("...takes effect after a gateway restart") now that
the button carries the verb. Updated all 4 locales.

* fix(desktop): make rendered logs selectable so they can be copied

The global body { user-select: none } left log surfaces unselectable. Opt them
back in via the existing data-selectable-text convention — at the shared
LogView primitive (boot-failure + bootstrap install overlays) plus Command
Center recent logs, toolset post-setup output, notification detail, and
subagent stream/file lines.
2026-06-19 10:09:15 -05:00
Brooklyn Nicholson
929dbf7778 fix(desktop): make rendered logs selectable so they can be copied
The global body { user-select: none } left log surfaces unselectable. Opt them
back in via the existing data-selectable-text convention — at the shared
LogView primitive (boot-failure + bootstrap install overlays) plus Command
Center recent logs, toolset post-setup output, notification detail, and
subagent stream/file lines.
2026-06-19 10:03:46 -05:00
Brooklyn Nicholson
a1639921ac fix(desktop): offer a Restart gateway action on messaging save/toggle toasts
The "setup saved" and "platform enabled/disabled" toasts told users their
change needs a gateway restart but left it a separate hunt. Attach a "Restart
gateway" action (the shared runGatewayRestart), and reword the copy to state
the pending consequence ("...takes effect after a gateway restart") now that
the button carries the verb. Updated all 4 locales.
2026-06-19 10:03:24 -05:00
Brooklyn Nicholson
553cf4f977 feat(desktop): restart the gateway from Cmd+K, with statusbar spinner feedback
Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a
new Cmd+K "Restart gateway" action. While a restart is in flight the
statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads
"restarting…", returning to its real state on completion — driven by a
$gatewayRestarting atom, not a transient toast or the generic "Agents
running" counter. The helper owns its error handling so fire-and-forget
callers can't leak an unhandled rejection; only a failure toasts.
2026-06-19 10:02:54 -05:00
Brooklyn Nicholson
6308d3416a fix(desktop): rename "Restart messaging" -> "Restart gateway"
The Command Center control restarts the whole messaging gateway, yet was
labelled "Restart messaging" while the status line above it reads "Messaging
gateway running/stopped". Rename the i18n key to match what it does, across
all 4 locales.
2026-06-19 10:02:21 -05:00
Teknium
0d7abd555c
fix(dashboard): sort chat session switcher by most-recent activity (#49104)
The Chat-tab session switcher rendered rows in the API's default
order="created" (original start time) while each row displays
last_active — so a session you just messaged in could sit below an
older one, and the list looked unsorted against its own timestamps.

Pass order="recent" from ChatSessionList so the switcher sorts by
latest activity across the compression chain (most-recently-used at
top, ChatGPT-style; long conversations that auto-compressed into a new
continuation id stay on the first page). Adds an optional, defaulted
`order` arg to api.getSessions; the paginated Sessions page keeps the
stable created order.
2026-06-19 07:58:56 -07:00
Teknium
1b04e4ede5
fix(cli): status bar no longer stays hidden after resize during idle (#49105)
The classic CLI status bar could vanish for the rest of a session: any
terminal reflow (SIGWINCH from a tmux pane change, SSH window restore, font
zoom) set _status_bar_suppressed_after_resize=True, but the flag was ONLY
cleared on the next *submitted* user input. Resize then sit idle and the
bottom chrome rendered at height 0 on every repaint — even with the
refresh clock ticking — so the bar was gone until you typed and hit enter.

Fix: _recover_after_resize now schedules a debounced unsuppress timer that
clears the flag and repaints once the reflow settles (~0.35s), so the bar
returns on its own during idle. The next-submit clear stays as a fast path.
Fails open: any error in scheduling clears the flag immediately rather than
leaving the bar stuck hidden.
2026-06-19 07:53:58 -07:00
teknium1
7d86178cf5 fix(raft): set stdin=DEVNULL on bridge subprocess
Satisfies the repo-wide subprocess-stdin guard
(tests/tools/test_subprocess_stdin_guard.py); the long-lived bridge
child should not inherit the gateway's stdin.
2026-06-19 07:52:37 -07:00
teknium1
22ccb12c30 chore(release): map skyzh@mail.build to xxchan for Raft salvage
CI blocks PRs with unmapped commit-author emails.
2026-06-19 07:52:37 -07:00
skyzh
9026a8c789 feat(gateway): add Raft bundled platform plugin with activity hooks
Adds a Raft platform adapter as a bundled plugin (plugins/platforms/raft/)
connecting Hermes to Raft as an external agent via a wake-channel bridge.
The adapter starts a loopback HTTP endpoint, spawns 'raft agent bridge' as a
child process, and injects content-free wake hints into the gateway session
pipeline. The agent reads/sends messages through the Raft CLI; the adapter
never touches message bodies or delivery cursors. Activity observer hooks
report tool/LLM/session lifecycle events via a bounded at-most-once queue.
Auto-enables when RAFT_PROFILE is set.

Cherry-picked from PR #47629. Authored by skyzh (@xxchan).
2026-06-19 07:52:37 -07:00
Teknium
2a5e9d994a
Merge pull request #48275 from NousResearch/feat/cron-scheduler-provider-chronos
feat(cron): pluggable CronScheduler interface + Chronos managed-cron provider (scale-to-zero)
2026-06-19 07:51:59 -07:00
Ben
1928aa0443 fix(managed-scope): honor managed scope in config→env bridges too
Manual verification surfaced a second bypass class beyond the standalone
config loaders: several code paths bridge config.yaml values into os.environ
(HERMES_TIMEZONE, HERMES_REDACT_SECRETS, HERMES_MAX_ITERATIONS, TERMINAL_*,
network.force_ipv4, ...) by reading the raw user YAML, so the env the whole
process reads carried the USER's value even when an administrator pinned it —
e.g. a managed timezone was overridden because gateway/run.py wrote the user's
timezone into HERMES_TIMEZONE, and _resolve_timezone_name() checks the env var
first.

Wired the shared apply_managed_overlay() into every config→env bridge:

- gateway/run.py module-level startup bridge (timezone, redact_secrets,
  max_turns, terminal, display, gateway.strict, ...)
- gateway/run.py _reload_runtime_env_preserving_config_authority (the per-turn
  re-bridge that keeps config authoritative over reloaded .env — must keep
  MANAGED authoritative on every turn, not just startup)
- hermes_cli/main.py early security.redact_secrets / network.force_ipv4 bridge
  (runs before load_config is usable, at import time)
- hermes_cli/send_cmd.py top-level scalar config→env bridge

Verified end-to-end against a writable managed dir (12/12 checks incl. timezone,
logging, model, skin, gateway settings, write-guard) and in a clean process the
gateway per-turn bridge writes HERMES_TIMEZONE=<managed>. Adds an
order-independent regression test for the bridge overlay.
2026-06-19 07:46:33 -07:00
Ben
b0e47a98f9 fix(managed-scope): honor managed scope in all standalone config loaders
The skin bug was one instance of a class: several subsystems build their
config dict directly from config.yaml instead of routing through
hermes_cli.config.load_config (which carries the managed merge), so they
silently ignored administrator-pinned values. Audited every config.yaml
reader and fixed the behavioral-read bypasses:

- gateway/config.py load_gateway_config (messaging gateway: session_reset,
  quick_commands, stt, model, ...)
- gateway/run.py _load_gateway_config (its read_raw_config fast path also
  skipped the merge — read_raw_config returns raw user YAML)
- tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin,
  reasoning_effort, service_tier, provider_routing)
- cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing)
- hermes_logging.py (logging.level/max_size_mb/backup_count)
- hermes_time.py (timezone)
- hermes_cli/doctor.py (memory-provider diagnostic reads effective config)

All route through a new shared managed_scope.apply_managed_overlay() helper
that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't
shadow a managed literal, root-model-string normalization, leaf-merge) and is
fail-open. cli.py's earlier inline fix is refactored onto the same helper.

Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile
distribution) are deliberately left reading raw user YAML — overlaying managed
values there would persist them into the user file. The dashboard
(web_server.py) already routes through load_config and needed no change.

TUI loader caches the RAW config so _save_cfg never writes managed values to
disk. Adds test_managed_scope_overlay.py (helper) and
test_managed_scope_loaders.py (per-surface integration); mutation-checked.
2026-06-19 07:46:33 -07:00
Ben
732293cf87 fix(managed-scope): apply managed layer in cli.py's standalone config loader
cli.py's load_cli_config() builds CLI_CONFIG independently of
hermes_cli.config._load_config_impl (it reads config.yaml directly and merges
into hardcoded defaults), so the Phase 2 managed merge never reached the
interactive CLI/TUI surface. Symptom: a managed display.skin (and any other
display/CLI pref read from CLI_CONFIG) was silently ignored by the TUI while
`hermes config`/`doctor`/write-guards — which go through load_config — correctly
honored it. Found via manual testing: the skin engine kept using 'default'.

Fix: overlay the managed config last in load_cli_config(), mirroring
_load_config_impl — expand against the process env only (so a user ${VAR} can't
shadow a managed literal), normalize the root model key so a managed
`model: x/y` string can't clobber the dict shape callers expect, then
leaf-merge. Fail-open so managed scope can never block CLI startup.

Adds tests/hermes_cli/test_managed_scope_cli_config.py locking that CLI_CONFIG
honors managed values, preserves user siblings, and is inert with no scope.
2026-06-19 07:46:33 -07:00
Ben
9a24e41d0f docs: add managed scope admin guide + cross-link from configuration 2026-06-19 07:46:33 -07:00
Ben
ddd519ea70 feat(managed-scope): surface managed scope in config show and doctor
- show_config prints an administrator header naming the managed source and
  lists the pinned config/env keys when a scope is active (silent otherwise).
- hermes doctor gains a managed_scope_check under Configuration Files that
  reports the resolved managed dir + pinned key counts, and flags a
  HERMES_MANAGED_DIR redirect (the documented foot-gun).
2026-06-19 07:46:33 -07:00
Ben
4f9e15df97 feat(managed-scope): guard writes to managed config/env keys
- set_config_value hard-rejects a managed config key (D2) and names the
  source, exiting non-zero.
- save_env_value / remove_env_value refuse a managed env key.
- save_config strips managed leaves from a bulk write (mechanical safety net)
  with a warning, so the unmanaged remainder still persists.
New _strip_dotted_keys helper drives the bulk-save pruning. All guards are
distinct from and layered after the existing is_managed() package-manager
write-lock.
2026-06-19 07:46:33 -07:00
Ben
81a663abea feat(managed-scope): apply managed .env last with override
load_hermes_dotenv now loads the managed-scope .env after user/project .env
and external secret sources, with override=True, so managed env values beat
the user .env and any pre-existing shell export. Reuses the existing dotenv
fallback + credential-sanitization path. Fail-open: no managed dir/.env is a
no-op and any error is swallowed so managed scope never blocks startup.
2026-06-19 07:46:33 -07:00
Ben
b5ddd6e719 feat(managed-scope): managed config layer wins over user config
_load_config_impl now deep-merges the managed config.yaml on top of the
expanded user config so managed leaves win while sibling keys stay
user-controlled (leaf-level merge, D3). Managed values are expanded against
the process env only, never user-defined ${VAR}, so a user can't shadow a
managed literal. The managed file's (mtime,size) is folded into the load
cache key so editing it invalidates the cache. This inverts the usual
env-over-config precedence for pinned keys by design (see design doc §4.1).
2026-06-19 07:46:33 -07:00
Ben
9cbcc0c9c8 feat(managed-scope): add managed_scope module (resolver, loaders, key helpers)
New hermes_cli/managed_scope.py resolves a system-level managed directory
(HERMES_MANAGED_DIR override > /etc/hermes), parses managed config.yaml/.env
with fail-open semantics, and exposes is_key_managed/is_env_managed helpers.
The system default is ignored under pytest and HERMES_MANAGED_DIR is added to
the conftest env scrub so a real managed scope can't leak into the suite.

Not wired into the load paths yet (Phases 2-3).
2026-06-19 07:46:33 -07:00
Ben
bf9a0481fa test(config): pin config/env load behavior before managed scope 2026-06-19 07:46:33 -07:00
teknium1
a58287afcb
Merge remote-tracking branch 'origin/main' into pr48275-rebase
# Conflicts:
#	cron/scheduler.py
2026-06-19 07:40:29 -07:00
Teknium
35e7ca03d5 fix(kanban): treat already-gone worker as terminated, not survived
_terminate_reclaimed_worker early-returned on ProcessLookupError with
terminated=False. The new reclaim-defer guard reads that as 'worker
survived the kill' and defers the reclaim forever, so a stale task whose
worker is already dead never lands in result.stale. ProcessLookupError
means the process is gone — that IS a successful termination. Split it
from the generic OSError branch and set terminated=True.
2026-06-19 07:38:10 -07:00
Sahil Saghir
b9e521da23 fix(kanban): hold reclaim while the worker is still alive
release_stale_claims and detect_stale_running call _terminate_reclaimed_worker
and then release the task claim unconditionally, even when the termination did
not actually kill the worker. _terminate_reclaimed_worker already reports this
via its "terminated" flag, but the callers ignore it.

When a worker is parked in uninterruptible (D) state — for example throttled by
a cgroup memory.high limit — a pending SIGTERM/SIGKILL cannot be delivered until
the throttle lifts, so the kill is a no-op. The dispatcher then frees the claim
and spawns a fresh worker beside the still-alive one. Repeated every dispatch
tick this accumulates duplicate workers without bound, deepening the memory
pressure that caused the throttle in the first place — a self-reinforcing
runaway.

Fix: gate both automatic reclaim paths on _worker_survived_termination(). When
we attempted to kill our own host-local worker and it is still alive, defer the
reclaim (_defer_reclaim_for_live_worker extends the claim a short grace and
emits a reclaim_deferred event) instead of releasing. This guarantees at most
one live worker per task and is self-correcting: not spawning a duplicate is
what relieves the pressure so the pending signal lands and the worker dies, and
the next tick reclaims cleanly. Non-host-local claims and the operator-driven
reclaim_task() path keep their existing force-release behaviour.

Related: #41448 (concurrent dispatchers amplify this by doubling reclaim
frequency); #42858 (kill the worker rather than orphan it on archive).

Tests: defer-when-worker-survives, reclaim-when-killed,
release-when-not-host-local, and the detect_stale_running path.
2026-06-19 07:38:10 -07:00
teknium1
13d4b5fe2f fix(hindsight): align client version to 0.6.1 across all sources
The lazy_deps pin (memory.hindsight -> hindsight-client==0.6.1) was newer
than the plugin's stated floor (>=0.4.22). Align _MIN_CLIENT_VERSION,
the setup wizard dep string, plugin.yaml, and the README to 0.6.1 so the
floor check, auto-upgrade target, and runtime lazy-install all agree.
Also drops the redundant local _MIN_CLIENT_VERSION redefinition in
post_setup.
2026-06-19 07:36:28 -07:00
Ben
6c44471bfd fix(hindsight): lazy-install cloud client dependency 2026-06-19 07:36:28 -07:00
Sahil Saghir
db744e7d1e feat(simplify-code): add risk-tiered application, Chesterton's Fence, slop + silent failure detection
Five targeted enhancements to the upstream simplify-code skill:

1. Risk-tiered application (SAFE/CAREFUL/RISKY) — safe changes auto-applied,
   careful changes verified per-file, risky changes flagged for human review.
   Prevents auto-applying N+1 restructures and public API renames.

2. Chesterton's Fence — before flagging anything for removal, reviewers run
   'git blame' to understand why it exists. Low-confidence findings are
   escalated rather than guessed.

3. AI slop detection — Quality reviewer now catches: extra comments restating
   obvious code, unnecessary defensive null-checks on validated inputs, 'as any'
   casts, and patterns inconsistent with the rest of the file.

4. Silent failure detection — Efficiency reviewer now catches: empty catch
   blocks, ignored error returns, except:pass, .catch(()=>{}) with no handling,
   and error propagation gaps.

5. Structured reviewer output with confidence+risk tags — reviewers report in
   'file:line → problem → fix | confidence: H/M/L | risk: SAFE/CAREFUL/RISKY'
   format, enabling the orchestrator to tier the application.

Plus 3 new pitfalls: over-trusting dead code tools, public contract awareness,
and preserving intentional error handling.

Total: +45/-8 lines. Keeps the 212-line compact spirit.

Ref: #379
2026-06-19 07:35:36 -07:00
Teknium
ba50e86563 fix: open dispatcher lock file with explicit utf-8 encoding
ruff (unspecified-encoding) and the Windows-footgun checker both flag
open() in text mode without encoture=. Keep text mode (the Windows lock
path in _try_acquire_file_lock writes a str newline) and pass
encoding='utf-8'.
2026-06-19 07:35:33 -07:00
Sahil Saghir
226e9322e1 fix(kanban): cross-platform dispatcher lock + explicit release
Two robustness gaps from community review (#44919):

1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status
   _try_acquire_file_lock / _release_file_lock — already cross-platform
   (msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock
   helper.

2. Lock fd never released: stored handle is now released explicitly in
   both exit paths — CancelledError handler and normal while-loop exit.
   Allows in-process stop/restart (tests, embedded use).

Also tightened docstrings — 'corrupt the SQLite DBs' is now specific
(wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt
index pages), matching the module's own concurrency claims.
2026-06-19 07:35:33 -07:00
Sahil Saghir
dfa561092a fix(kanban): machine-global singleton lock for the embedded dispatcher (#41448)
The gateway's embedded dispatcher has no guard against more than one dispatcher
running concurrently. dispatch_in_gateway defaults to true, so a second gateway
for the same profile (a restart race where the old process is slow to exit) — or
any deployment that runs multiple profile gateways with the default — starts a
second dispatcher loop. As #41448 describes, concurrent dispatchers each run
release_stale_claims() against the same boards, double reclaim frequency, and
re-dispatch slow workers before they finish. In practice they also corrupt the
shared kanban SQLite DBs under concurrent write load.

Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the
machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is
shared across profiles by design, so this serialises every gateway, not just one
profile). The first gateway to start its dispatcher holds the lock for its
process lifetime; any other gateway finds it contended, logs, and skips
dispatching while still running for messaging. Falls back to config-only control
on non-POSIX or filesystems without flock.

This is more robust than a per-profile guard because the documented model is
"one dispatcher sweeps all boards" — the contention is across profiles, not just
within one. Closes #41448.

Test: lock is exclusive (held, then contended while held, then held again after
release).
2026-06-19 07:35:33 -07:00
Sahil Saghir
a5e06078b2 fix(cron): compact cron failure messages + repair bare repo dirs after git gc
Two small, focused fixes for the cron scheduler and checkpoint manager.

1. _summarize_cron_failure_for_delivery (cron/scheduler.py):
   Replaces the raw error dump in _process_job with a compact
   pattern-matched summary. Provider rate limits, timeouts, and
   authentication errors now produce a short human-readable message
   instead of dumping multi-KB provider JSON into the delivery channel.

2. _repair_bare_repo_dirs (tools/checkpoint_manager.py):
   Recreates refs/heads/ and branches/ directories after git gc
   --prune=now, which can remove empty dirs from bare repos and cause
   subsequent git add -A to fail with 'fatal: not a git repository'.
   Called after all four git gc call sites.

Both fixes use only standard library imports and plug into existing
call sites with no architectural changes.
2026-06-19 07:35:29 -07:00
Teknium
1958208744 chore(release): add Sahil-SS9 to AUTHOR_MAP for PRs #48466/#44919/#44909/#42209 2026-06-19 07:35:29 -07:00
Teknium
d7bff949af
fix(cli): default cli_refresh_interval to 1.0 to keep status bar alive (#49087)
PR #49056 set the default to 0, which reverts the #45592 idle-clock fix:
without a periodic invalidate, prompt_toolkit stops repainting the bottom
chrome during idle and the status bar goes stale/disappears after a turn.

Restore 1.0 as the default for everyone. The config knob stays — users on
emulators where the per-second redraw fights auto-scroll (#48309) can set
display.cli_refresh_interval: 0 to opt out.
2026-06-19 07:35:06 -07:00
Ben Barclay
2dd285f9b3 docs(gateway): document multiplexing opt-in + contract changes
Extend the 'Running Many Gateways at Once' user-guide page with a
'one gateway for all profiles (multiplexing)' section, kept to a single page:

- How to opt in (gateway.multiplex_profiles on the default profile) and when to
  prefer it vs one-process-per-profile.
- Every contract change a user sees when the flag is on:
  1. secondary-profile 'gateway start' is a hard error (--force escape hatch),
  2. HTTP-inbound reached via /p/<profile>/ prefix; secondary profiles must NOT
     enable a port-binding platform (webhook/api_server/msgraph_webhook/feishu/
     wecom_callback/bluebubbles/sms) — config error at startup,
  3. per-credential platforms still need their own token per profile,
  4. session keys namespaced agent:<profile>: (default stays agent:main:),
  5. single PID/lock + aggregated hermes status, per-profile runtime_status.json.
- What does NOT change: per-profile .env credential isolation (stricter, incl.
  MCP/Kanban subprocess env), Kanban, profile-scoped skills/memory/SOUL, routing.

All inert when the flag is off.
2026-06-19 07:34:15 -07:00
Ben Barclay
1e70df5fdd feat(gateway): multiplex phase 4 — lifecycle guard + per-profile observability
- _guard_named_profile_under_multiplexer: when the default gateway is running
  with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run' hard
  -errors (pointing at the multiplexer) instead of double-binding that
  profile's platforms. Inert unless all hold: this invocation is a named
  profile, a default-profile gateway is alive, and its config has multiplexing
  on. --force overrides. Wired into run_gateway's guard chain.
- write_runtime_status gains served_profiles: the secondary-adapter startup
  records [active] + multiplexed profiles into runtime_status.json so
  'hermes status' can show per-profile coverage without a second probe. Absent
  for single-profile gateways.

Tests: served_profiles round-trips and is absent by default; guard is inert for
the default profile / under --force / when no default gateway is running.
2026-06-19 07:34:15 -07:00
Ben Barclay
d5d02eabb0 feat(gateway): multiplex phase 3 — secondary-profile adapter registry + conflict detection
Bring up adapters for every profile the gateway serves, not just the active
one. Keeps self.adapters as the default/active profile's map (the ~93 existing
self.adapters[...] sites are untouched) and adds secondary profiles under
self._profile_adapters[profile][platform].

- _start_secondary_profile_adapters loops profiles_to_serve(multiplex=True),
  skips the active profile (handled by the primary startup loop), and for each
  other profile loads its gateway config and creates+connects its enabled
  adapters under that profile's _profile_runtime_scope (home + secret scope).
- Each secondary adapter gets _make_profile_message_handler(profile): stamps
  source.profile (when unset) before delegating to the shared _handle_message,
  so the agent turn and session key resolve to that profile.
- Same-platform credential-conflict detection: _adapter_credential_fingerprint
  hashes the adapter's bot token (salted, truncated — never logs the token);
  two profiles claiming the same (platform, token) refuse the duplicate with a
  clear error naming both, since one token can't be polled twice.
- Port-binding hard-error: a SECONDARY profile that enables a port-binding
  platform (webhook, api_server, msgraph_webhook, feishu, wecom_callback,
  bluebubbles, sms) is a config error and aborts startup via MultiplexConfigError
  — the default profile owns the single shared HTTP listener and serves every
  profile through the /p/<profile>/ prefix, so a second bind can only collide.
  Distinct from a transient connect failure (which logs + stays alive to retry):
  a config error writes gateway_state=startup_failed and exits cleanly with an
  actionable message (names the profile, the platform, and the fix). There is no
  valid reason to bind a second port once you've opted into a multiplexer.
- Shutdown tears down secondary adapters alongside the primary ones.
- Defensive getattr guards keep partial-construction unit tests (stop(),
  _run_agent on bare instances) working.

No-op when multiplex_profiles is off (self._profile_adapters stays empty).

Tests: fingerprint stability/log-safety/distinctness, profile message-handler
stamping (and not overriding an already-stamped source), port-binding hard-error
raises + names the profile/platform, non-binding platform is not rejected, and
the guard set covers every TCP-binding adapter.
2026-06-19 07:34:15 -07:00
Ben Barclay
f35abb122a feat(gateway): multiplex phase 1 — HTTP-inbound /p/<profile>/ routing (webhook)
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.

- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
  omitted when unset so existing serialization is unchanged). It carries which
  profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
  existing /webhooks/{route_name}. _resolve_request_profile validates the
  prefix against profiles_to_serve(): None when absent or multiplexing is off
  (ignored, handled as default — no spurious 404), the profile name when valid,
  _PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
  is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
  source.profile: SessionStore._resolve_profile_for_key(source),
  _session_key_for_source fallback, and _resolve_profile_home_for_source all
  honor it (→ the agent turn resolves that profile's config/skills/credentials
  via the Phase 2 _profile_runtime_scope).

Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.

Tests: SessionSource.profile round-trip + namespace drive; _resolve_request_
profile accept/reject/ignore matrix.
2026-06-19 07:34:15 -07:00
Ben Barclay
f538470cf4 feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A)
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).

- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
  FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
  read RAISES UnscopedSecretError instead of falling back to os.environ — a
  missed/new call site crashes loudly at that line rather than leaking a
  cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
  …) keep reading os.environ via an allowlist. load_env_file/build_profile_
  secret_scope parse a profile .env into an isolated dict WITHOUT mutating
  os.environ. Off by default => transparent os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
  through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
  ~/.hermes/.env-first preference is preserved and already profile-correct via
  the home override).
- tools/mcp_tool.py: MCP config  interpolation resolves through
  get_secret, so a server's  picks up the routed profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
  reload is a no-op for credentials in multiplex mode (secrets come from the
  scope, not global env); _profile_runtime_scope context manager combines the
  HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
  that scope (resolved via _resolve_profile_home_for_source) when multiplexing.

Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.

Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
2026-06-19 07:34:15 -07:00
Ben Barclay
d82f9fa7f7 feat(gateway): multiplex phase 0 — config flag, profile enumeration, profile-stamped session keys
Foundations for serving multiple profiles from one gateway process, inert
when off:

- gateway.multiplex_profiles config flag (default false), round-trips through
  GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
  which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
  active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
  historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
  positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
  so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
  resolve the namespace from the flag (legacy when off, active profile when on).

Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
2026-06-19 07:34:15 -07:00
alt-glitch
9e1f616136 fix(clarify): docstring — put options in choices[] only, never enumerate in question text
The model was enumerating options inside the question string (dead prose the UI
can't render as pickable rows). Schema description now spells out: choices[] is
REQUIRED for selectable options; question holds ONLY the question.
2026-06-19 07:34:02 -07:00
teknium1
df2420f571 fix(gateway): keep non-Discord home-channel startup send byte-identical
The salvaged non_conversational marking made the home-channel startup
no-metadata branch always pass metadata= explicitly; for non-Discord
platforms _non_conversational_metadata returns None, so Telegram/etc.
went from adapter.send(chat_id, message) to adapter.send(..., metadata=None).
Behaviorally identical but broke test_restart_notification's exact
assert_called_once_with. Only attach metadata when the marker applies
(Discord), restoring the original call shape elsewhere.
2026-06-19 07:29:27 -07:00
snav
caaa916289 fix(gateway): don't let delayed Discord status messages partition history backfill
Discord channel-history backfill partitions on Hermes' last self-authored
message. Asynchronous, non-conversational status sends (self-improvement
review bubbles, heartbeats, background-process notifications, update status,
gateway restart/online notices) land as ordinary bot messages, so a delayed
status bump becomes the history boundary and swallows real messages that
arrived after Hermes' actual reply.

Mark these sends at the source via metadata["non_conversational"] (Discord
only; other platforms' metadata is unchanged). The adapter no longer advances
the history-boundary cache for marked sends and persists their IDs to a
sidecar JSON so the cold-start scan can skip them by ID after a restart. A
narrow regex recognizer remains only as an upgrade bridge for status bumps
emitted by an older gateway that pre-dates the marking.
2026-06-19 07:29:27 -07:00
Teknium
b936f92b25
fix(desktop): render send/prefill directive notices (/goal, /undo) (#49073)
The desktop slash dispatcher dropped the `notice` field on `send` and
never handled `prefill` directives at all. `/goal <text>` returns
{type: send, notice: "⊙ Goal set …", message} from command.dispatch —
the desktop submitted the goal text as a plain prompt with no feedback,
so the goal looked like it did nothing. `/undo` returns a prefill
directive that fell through to "invalid response".

- types: add `notice?` to SendCommandDispatchResponse; add
  PrefillCommandDispatchResponse to the union.
- parseCommandDispatch: keep `notice` on send, parse prefill.
- runExec dispatcher: render the notice as a system line before acting,
  and handle prefill by dropping the message into the composer for
  editing (mirrors the TUI's createSlashHandler).

Tests: parseCommandDispatch send-notice / prefill cases.
2026-06-19 07:28:50 -07:00
Carlos Diosdado
e00b965406 feat(tts): add xAI TTS speed and optimize_streaming_latency config knobs
The xAI TTS REST endpoint (POST /v1/tts) accepts 'speed' (0.7-1.5)
and 'optimize_streaming_latency' (0/1/2) parameters, but the Hermes
built-in xAI provider was reading neither from config nor sending
either in the request body. Add them as tts.xai.speed and
tts.xai.optimize_streaming_latency config knobs (with global
tts.speed / tts.optimize_streaming_latency fallbacks).

- speed: float, clamped to 0.7-1.5. 1.0 (the API default) is omitted
  from the request body to preserve the existing minimal-payload
  contract.
- optimize_streaming_latency: int, clamped to 0-2. 0 (best quality,
  the API default) is omitted from the request body.

Resolver order: tts.xai.<knob> overrides the global tts.<knob>.
2026-06-19 07:26:56 -07:00
Teknium
8b7c89bff2
feat(dashboard): session switcher panel on the Chat tab (#49077)
Add a ChatGPT-style conversation list beside the embedded TUI on the
dashboard Chat tab so users can swap sessions without leaving the page.

- New ChatSessionList component: lists recent sessions for the active
  profile (title/preview, last-active, message count, source), a New chat
  button, and a refresh control. Best-effort like ChatSidebar.
- Selecting a row drives /chat?resume=<id>, which ChatPage already treats
  as part of the PTY identity, so the terminal respawns resuming that
  conversation. Active row is highlighted; New chat clears resume.
- Wired into ChatPage as a dedicated right-side column (desktop) and into
  the existing slide-over panel above model/tools (narrow screens).
- i18n: new sessions.newChat key across all locales.
- Read-only switcher by design — delete/rename/export stay on Sessions.

Docs: web-dashboard.md Chat section documents the switcher.
2026-06-19 07:26:53 -07:00
teknium1
06c7c2577f test(desktop): lock generic OAuth status fallthrough for catalog-only providers 2026-06-19 07:26:46 -07:00
teknium1
1d59d2dcae feat(desktop): resolve OAuth status for catalog-only account providers
Accounts-tab cards derived from the unified provider_catalog() carry
status_fn=None and had no hardcoded branch in _resolve_provider_status,
so any future OAuth/account provider plugin rendered permanently
logged-out. Fall through to the canonical hermes_cli.auth.get_auth_status
slug dispatcher and adapt its shape, so membership AND status both
auto-extend with the hermes model universe.
2026-06-19 07:26:46 -07:00
Austin Pickett
d91b8d8368 test(desktop): make keyVar a typed EnvVarInfo factory
Address review feedback on the keyVar test helper: it mocks one /api/env row
(an EnvVarInfo), so type it as such and mirror the sibling provider() factory's
base-plus-Partial-override shape instead of hardcoding positional args and
fabricated fields (description='X direct API', url=''). Route the WidgetAI test
through it too, removing the inline duplicate of the same object shape.
2026-06-19 07:26:46 -07:00
Austin Pickett
ee0de638d7 feat(desktop): add API-keys search; keep provider lists priority-sorted
- API-keys tab: a SearchField filters provider cards by name / env-var key /
  description, with a 'no providers match' empty state. Card order stays
  priority-then-name (curated PROVIDER_GROUPS priority floats recommended
  providers up; equal priority falls back to alphabetical).
- Accounts tab: 'Other providers' keep sortProviders order (priority, then
  name) — unchanged.

Adds searchKeys/noKeysMatch i18n strings across all four locales. Vitest covers
priority/name ordering + live filtering + empty state.
2026-06-19 07:26:46 -07:00