Commit graph

1532 commits

Author SHA1 Message Date
Ben Barclay
d82f9fa7f7 feat(gateway): multiplex phase 0 — config flag, profile enumeration, profile-stamped session keys
Foundations for serving multiple profiles from one gateway process, inert
when off:

- gateway.multiplex_profiles config flag (default false), round-trips through
  GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
  which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
  active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
  historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
  positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
  so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
  resolve the namespace from the flag (legacy when off, active profile when on).

Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
2026-06-19 07:34:15 -07:00
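The profile namespace slot described in d82f9fa7f7 can be sketched roughly as follows. This is an illustrative sketch, not the project's `build_session_key`: the `platform` and `chat_id` parameters and the tail layout after the namespace are assumptions, since the commit only shows the `'agent:main:...'` and `'agent:<profile>:...'` prefixes. It also assumes "Default" in the message means a profile literally named `default`.

```python
from typing import Optional

# From the commit: historical keys start with the literal 'agent:main:'.
LEGACY_NAMESPACE = "main"


def build_session_key(platform: str, chat_id: str, profile: Optional[str] = None) -> str:
    # None and "default" (assumed spelling) reuse the legacy namespace, so
    # keys produced with the flag off are byte-identical to the old ones.
    namespace = LEGACY_NAMESPACE if profile in (None, "default") else profile
    # Hypothetical tail layout: the real key's remaining fields are not shown.
    return f"agent:{namespace}:{platform}:{chat_id}"
```

With this shape, two profiles on the same platform and chat produce different keys, while the default path needs no session migration.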
teknium1
06c7c2577f test(desktop): lock generic OAuth status fallthrough for catalog-only providers 2026-06-19 07:26:46 -07:00
Austin Pickett
8fe7b52ebf test(desktop): lock GUI⊇hermes model provider parity; surface Bedrock
Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the
`hermes model` universe) must be configurable on a desktop Providers tab —
keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an
invariant against the live endpoints so the GUI can never silently drift from
the CLI again.

Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it
had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock
provider card. Anthropic is whitelisted as a legitimate dual-tab provider
(direct API key + subscription OAuth).

Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role
as the override base for _build_oauth_catalog().
2026-06-19 07:26:46 -07:00
Austin Pickett
60dfa0f31b feat(desktop): Accounts tab derives membership from unified provider catalog
/api/providers/oauth now unions the explicit hand-tuned OAuth cards
(_OAUTH_PROVIDER_CATALOG — bespoke flow/status/cli, plus the api-key Anthropic
PKCE card and synthetic claude-code row) with every accounts-tab provider in
provider_catalog(). Any OAuth/external provider in the `hermes model` universe
now appears automatically, closing the drift where google-gemini-cli and
copilot-acp had no Accounts card despite being CLI-configurable.

Adds read-only status cards for google-gemini-cli (via existing
get_gemini_oauth_auth_status) and copilot-acp (managed-by-CLI, like claude-code).
DELETE handler routes through the same _build_oauth_catalog() builder.

Parity test asserts the Accounts tab offers every accounts-tab catalog provider
as an invariant.
2026-06-19 07:26:46 -07:00
Austin Pickett
3be1326f8d feat(desktop): /api/env derives provider key membership from unified catalog
The Keys tab now surfaces every keys-tab provider in provider_catalog() (the
`hermes model` universe), synthesizing a card even when the env var has no hand
entry in OPTIONAL_ENV_VARS. Closes the drift where openai-api, kilocode, novita,
tencent-tokenhub, and copilot were CLI-configurable but invisible in the desktop
Providers → API keys tab.

Each provider row now carries backend-derived provider/provider_label grouping
hints so the desktop can group by the same provider identity the CLI picker
uses. Hand OPTIONAL_ENV_VARS prose still wins where present (enrichment, not a
gate). Shared non-provider credentials (e.g. tool-category GITHUB_TOKEN) are
explicitly not hijacked into a provider card — Copilot uses its provider-owned
COPILOT_GITHUB_TOKEN.
2026-06-19 07:26:46 -07:00
Austin Pickett
054b8c82fd feat: unified provider_catalog() — one source for CLI picker and desktop tabs
Adds hermes_cli/provider_catalog.py, deriving one descriptor per provider from
the CANONICAL_PROVIDERS universe (what `hermes model` renders, auto-extended
from provider plugins), joined with auth/env from PROVIDER_REGISTRY and display
metadata from ProviderProfile (with canonical/env fallbacks for the four
profile-less providers and the many profiles with blank display/signup fields).

Each descriptor is tagged with the desktop tab it belongs on (keys vs accounts)
by auth_type. This is the single source of truth the desktop Providers tabs will
derive membership from, so they can no longer drift from the CLI picker.

Tests assert the parity contract (catalog == hermes model universe) and tab
routing as invariants, not snapshots.
2026-06-19 07:26:46 -07:00
Alex Yates
fad4b40d9d fix(model): persist /model switch by default across sessions
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).

Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):

  /model <name>             switch model (persists to config.yaml)
  /model <name> --session   switch for this session only
  /model <name> --global    switch and persist (explicit, unchanged)

The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
2026-06-19 07:07:06 -07:00
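The precedence described in fad4b40d9d (`--session` opts out, `--global` opts in, config default otherwise) might look like the sketch below. The function name `resolve_persist` and its parameters are hypothetical; the real `resolve_persist_behavior()` signature is not shown in the commit. Letting `--session` win when both flags are passed is also an assumption.

```python
def resolve_persist(
    session_only: bool = False,
    explicit_global: bool = False,
    persist_by_default: bool = True,
) -> bool:
    """Return True when the /model switch should be written to config."""
    if session_only:
        # --session: never persist (assumed to win over --global).
        return False
    if explicit_global:
        # --global: always persist, regardless of the config default.
        return True
    # No flag: fall back to model.persist_switch_by_default.
    return persist_by_default
```

Resolving the decision in one place keeps the CLI, gateway, and TUI surfaces from each re-deriving the rule.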
teknium1
1cc915763b test(cli): cover cli_refresh_interval default; map salvaged author
Follow-up to the salvaged #48312 — adds the config-default test (ported
from #48319) and the AUTHOR_MAP entry for the cherry-picked commit.
2026-06-19 07:06:34 -07:00
kshitijk4poor
01a6f11896 fix(debug): include gui.log (dashboard/TUI/pty/websocket) in hermes debug share
gui.log was registered in hermes_cli/logs.py::LOG_FILES (and surfaced by
`hermes logs gui`) but was never wired into `hermes debug share`. The share
report captured agent/errors/gateway/desktop tails plus full agent/gateway/
desktop logs — but nothing from gui.log, the surface the dashboard, TUI-over-
PTY bridge, and websocket layer (hermes_cli.web_server / pty_bridge /
tui_gateway) actually write to. A user reporting a dashboard or TUI bug shared
zero breadcrumbs from the broken surface.

Wire gui.log through all three share surfaces, matching the existing pattern:
- _capture_default_log_snapshots(): capture the gui snapshot (redacted like the rest)
- collect_debug_report(): add the gui.log summary tail block
- build_debug_share(): pull gui full_text, prepend dump header + redaction banner, add to the upload loop
- run_debug_share() --local branch: same, plus the local print block
- _PRIVACY_NOTICE: name gui.log in both bullets

Redaction is inherited for free — the gui snapshot goes through the same
_capture_log_snapshot(..., redact=redact) path, so secrets are scrubbed in
both the tail and full text (verified E2E: seeded key masked by default,
passes through under --no-redact, raw token never leaks).

Tests: seed gui.log in the fixture, add test_report_includes_gui_log, and bump
the upload-count tripwire 4->5 (test_share_uploads_five_pastes).
2026-06-19 07:05:42 -07:00
xxxigm
e738c08336 fix(backup): exclude regeneratable dependency and cache dirs
`hermes backup` walked every file under HERMES_HOME, excluding only
hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python
dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv
tool caches that live under HERMES_HOME were swept in file by file,
ballooning a backup to hundreds of thousands of entries and making it crawl
for hours: the reported "backup stuck for days / 426543 files" symptom.

Add the canonical regeneratable-dir names (.venv, venv, site-packages,
.tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring
agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's
exclusion set, used by both run_backup and the pre-update/pre-migration
_write_full_zip_backup. .archive is intentionally left in so the curator's
restorable archived skills still get backed up.

Tests cover each new dir name (excluded at any depth), that .archive and
cache-resembling files are kept, and an integration check that a planted
venv/site-packages/cache is pruned from the actual backup zip while
skills/config survive.
2026-06-19 14:37:41 +05:30
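One common way to exclude directories "at any depth", as e738c08336 describes, is to prune them during the walk so their contents are never visited. The sketch below assumes an `os.walk`-based traversal, which the commit does not confirm; `iter_backup_files` is a hypothetical helper. The directory names are the ones listed in the message.

```python
import os
from pathlib import Path
from typing import Iterator

# Names taken from the commit message: previous exclusions plus the new ones.
EXCLUDED_DIR_NAMES = {
    "hermes-agent", "node_modules", "__pycache__", "backups", "checkpoints",
    ".venv", "venv", "site-packages", ".tox", ".nox",
    ".pytest_cache", ".mypy_cache", ".ruff_cache", ".cache",
}


def iter_backup_files(root: Path) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # excluded directories, so their files are never enumerated.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIR_NAMES]
        for name in filenames:
            yield Path(dirpath) / name
```

Because `.archive` is not in the set, archived skills are still yielded, matching the intent stated in the commit.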
kshitijk4poor
1ab6f34791 refactor(dashboard): align Slack allowlist validation with gateway parse
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
  interior comma (which the gateway silently tolerates in
  gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
  and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
2026-06-19 12:22:30 +05:30
kshitijk4poor
83c034bd5b fix(dashboard): accept Slack allow-all wildcard in allowed-users validation
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
2026-06-19 12:18:15 +05:30
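Taken together, the two Slack validation commits above (drop empty entries, accept `*`) suggest a validator along these lines. This is a sketch only: the real `_SLACK_MEMBER_ID_RE` pattern is not shown in the log, so the regex here is an assumed shape for Slack member IDs, and `validate_allowed_users` is a hypothetical name.

```python
import re

# Assumed pattern; the project's _SLACK_MEMBER_ID_RE may differ.
MEMBER_ID_RE = re.compile(r"^[UW][A-Z0-9]+$")


def validate_allowed_users(raw: str) -> list[str]:
    # Split on commas and drop empty entries, so trailing or interior
    # commas are tolerated the same way the gateway tolerates them.
    entries = [part.strip() for part in raw.split(",")]
    entries = [entry for entry in entries if entry]
    for entry in entries:
        # '*' is the allow-all wildcard the gateway honors at runtime.
        if entry != "*" and not MEMBER_ID_RE.match(entry):
            raise ValueError(f"invalid Slack member ID: {entry!r}")
    return entries
```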
Shannon Sands
d9190491a6 Add Slack setup hints and field validation 2026-06-19 12:16:23 +05:30
Shannon Sands
f741e70791 Add Slack allowed users setup field 2026-06-19 12:16:23 +05:30
kshitij
6278bca055
Merge pull request #48259 from NousResearch/fix/ns501-multipart-upload-salvage
fix(dashboard): clean up upload temp file on client disconnect + pin python-multipart (NS-501)
2026-06-19 12:03:58 +05:30
Shannon Sands
12dfcfdf73 fix(tui): restart dashboard chat on idle exit hotkeys 2026-06-19 12:02:22 +05:30
AhmetArif0
245b95b094 fix(terminal): block gateway lifecycle commands from inside the gateway process
A `systemctl --user restart hermes-gateway` command run via the terminal tool
is a child of the gateway itself. When systemd delivers SIGTERM, the gateway
kills this subprocess before it can complete, so the service may never
restart (reproducing issue #37453).

The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.

Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.

Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.

Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
2026-06-19 11:53:44 +05:30
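The flag-tolerant matching described in 245b95b094 could be approximated as below. The regex is an assumption written for illustration, not the contents of `_GATEWAY_LIFECYCLE_PATTERNS`, and `blocked_inside_gateway` is a hypothetical helper. Only the `_HERMES_GATEWAY=1` condition and the `restart`/`stop` actions come from the commit.

```python
import re
from typing import Mapping

# Allow any number of -x / --long[=value] flags between 'systemctl' and the
# action word, which the earlier pattern reportedly did not permit.
LIFECYCLE_RE = re.compile(
    r"\bsystemctl(?:\s+-{1,2}[\w-]+(?:=\S+)?)*\s+(?:restart|stop)\s+hermes-gateway\b"
)


def blocked_inside_gateway(command: str, env: Mapping[str, str]) -> bool:
    # The guard is only active inside the gateway process.
    if env.get("_HERMES_GATEWAY") != "1":
        return False
    return bool(LIFECYCLE_RE.search(command))
```

A read-only command such as `systemctl --user status hermes-gateway` does not match, so it passes through.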
Teknium
620fd59b8e
feat(model-picker): add Refresh Models control to bust stale model cache (#48691)
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.

- Thread refresh through model.options (RPC + REST /api/model/options) ->
  build_models_payload -> list_authenticated_providers, which calls
  clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
  sync icon). Normal opens leave refresh=false to stay snappy on the cache.

Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
2026-06-18 21:37:41 -07:00
kshitij
d06104a9ee
fix(dashboard): resolve chat TUI argv off event loop (#48561)
* fix(dashboard): resolve chat TUI argv off event loop

Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.

This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.

Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>

* chore: add 0xdany to AUTHOR_MAP

* fix(dashboard): bind chat-argv lock to app.state; cover error propagation

Self-review hardening on top of the salvaged fix:

- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
  `app.state` (initialised in `_lifespan`, lazy fallback via
  `_get_chat_argv_lock`), mirroring `event_lock`. A module-level
  `asyncio.Lock()` binds to whatever event loop is active at import time,
  which is the exact pattern `_get_event_state`'s docstring warns against
  (breaks across TestClient instances / uvicorn reloads). This keeps the
  lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
  `asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
  missing) and `HTTPException` (invalid profile) both propagate out of the
  worker thread and are caught by `pty_ws`'s existing handlers. The prior
  tests mocked `asyncio.to_thread` away and never covered this path.

* test(dashboard): dedupe pty error-propagation tests; assert close code

simplify-code cleanup pass on the salvage stack:

- Extract the shared scaffolding of the two pty_ws error-propagation tests
  into `_assert_pty_propagates`, keeping the two tests as distinct contracts
  for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
  the user-facing "Chat unavailable" notice wording — a behavior contract per
  the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
  rewording. The detail substring ("unknown profile") is still checked for the
  HTTPException case since proving the detail survives the thread hop is the
  point of that test.

No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.

---------

Co-authored-by: draihan <draihan@student.ubc.ca>
2026-06-18 22:20:52 -04:00
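The pattern in d06104a9ee (blocking resolver moved to a worker thread, serialized by an `asyncio.Lock` created on the running loop) can be illustrated with the standalone sketch below. `slow_resolver`, `resolve_argv_async`, and the returned argv are hypothetical stand-ins, not the project's `_resolve_chat_argv_async`.

```python
import asyncio
import threading
import time


def slow_resolver(state: dict) -> list[str]:
    # Stand-in for a blocking build step; records how many calls overlap.
    with state["guard"]:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.05)
    with state["guard"]:
        state["active"] -= 1
    return ["node", "tui.js"]  # hypothetical argv


async def resolve_argv_async(lock: asyncio.Lock, state: dict) -> list[str]:
    # The lock keeps one-build-at-a-time behavior; to_thread keeps the event
    # loop free for keepalives while the resolver runs.
    async with lock:
        return await asyncio.to_thread(slow_resolver, state)


async def main() -> dict:
    lock = asyncio.Lock()  # created inside the running loop, not at import
    state = {"guard": threading.Lock(), "active": 0, "peak": 0}
    await asyncio.gather(*(resolve_argv_async(lock, state) for _ in range(3)))
    return state
```

Running `asyncio.run(main())` with three concurrent callers should leave `state["peak"]` at 1, since the lock admits one resolver at a time.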
Ben
03d9a95a74
fix(desktop): show Hindsight memory provider (#37546)
* fix(desktop): show Hindsight memory provider

* feat(desktop): configure Hindsight memory provider

* fix(desktop): limit Hindsight modes to supported setup

* refactor(desktop): generic memory-provider config surface

Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.

- memory_providers.py: declarative registry. Each provider lists its
  fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
  mode is a select(cloud, local_external), so rejecting local_embedded
  falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
  GET returns declared fields + current values (secrets only as is_set,
  never read back); PUT validates selects against their options, writes
  plain fields to the provider config file, secrets to the env store,
  and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
  hindsight-settings.tsx and the memory.provider === 'hindsight'
  conditional in config-settings.tsx — same pattern as
  toolset-config-panel.tsx off env_vars.

Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
2026-06-18 16:48:47 -05:00
brooklyn!
2944b3c394
fix(desktop): make session delete idempotent and id-resolving (#48641)
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.

Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.

Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
2026-06-18 21:16:06 +00:00
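The "ensure it's gone" contract from 2944b3c394 might be sketched over an in-memory store like this. Both functions are hypothetical simplifications: the real `resolve_session_id` and endpoint are not shown, and treating an ambiguous prefix as "nothing to delete" is an assumption made for the sketch.

```python
from typing import Optional


def resolve_session_id(sessions: dict, id_or_prefix: str) -> Optional[str]:
    # Exact ids win; otherwise accept a prefix only if it is unique.
    if id_or_prefix in sessions:
        return id_or_prefix
    matches = [sid for sid in sessions if sid.startswith(id_or_prefix)]
    return matches[0] if len(matches) == 1 else None


def delete_session(sessions: dict, id_or_prefix: str) -> int:
    resolved = resolve_session_id(sessions, id_or_prefix)
    if resolved is not None:
        del sessions[resolved]
    # An already-absent session is an idempotent success, not a 404.
    return 200
```

Deleting the same id twice returns 200 both times, so an optimistic client never has to restore a row for a session that was reaped in the meantime.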
teknium1
3042045540 fix(picker): keep max_models=0 distinct from unlimited; lock cap semantics
Follow-up to the cap-removal salvage. The contributor guarded the new
unlimited default with `[:max_models] if max_models else ...`, which conflates
max_models=0 (used by slug-only callers that want an empty model list) with
None (unlimited). Tighten to `is not None` at all five slicing sites in
list_authenticated_providers / list_picker_providers, and add a regression test
asserting the three-way contract: None=full, 0=empty, N=first N.
2026-06-18 13:47:31 -07:00
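The three-way contract from 3042045540 (None=full, 0=empty, N=first N) comes down to testing `is not None` instead of truthiness. A minimal sketch, with `cap_models` as a hypothetical helper rather than one of the five real slicing sites:

```python
from typing import Optional, Sequence


def cap_models(models: Sequence[str], max_models: Optional[int] = None) -> list[str]:
    # 'if max_models' would treat 0 like None and return the full list;
    # 'is not None' keeps 0 meaning "empty" and None meaning "unlimited".
    if max_models is not None:
        return list(models[:max_models])
    return list(models)
```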
islam666
9705e7944a fix(picker): remove max_models=50 cap in interactive model pickers
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.

Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
  * web_server.py: get_model_options (Desktop /api/model/options)
  * web_server.py: get_recommended_default_model
  * model_switch.py: prewarm_picker_cache_async
  * tui_gateway/server.py: model.options JSON-RPC
  * cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
  still passes max_models=50 explicitly — unchanged behavior.

The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.

Fixes #48279
2026-06-18 13:47:31 -07:00
Siddharth Balyan
73cd8622f9
feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449)
* feat(billing): nous_billing http client + BillingState core (phase 2b)

Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
  (state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
  BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
  fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
  parsing (server emits decimal strings, not 2dp), fail-open builder,
  idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
  matrix, fail-open, amount validation).

Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md

* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)

- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
  substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
  billing:manage, reusing the held credential's portal/inference URLs + client_id
  (so a preview stays a preview), persists like _login_nous but WITHOUT the model
  picker. Returns True iff the minted token carries the scope (False when NAS
  silently downscopes a non-admin / unticked grant).

Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.

* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)

billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.

* feat(billing): /billing CLI handler + command registry (phase 2b)

- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
  _SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
  parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
  poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
  _prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
  deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
  poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
  403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.

* feat(billing): /billing Ink TUI screens + tests (phase 2b)

- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
  screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
  5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
  ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
  (D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
  arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
  portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
  settled/no_payment_method/step-up/limit/auto-reload/validation).

TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.

* docs(billing): scrub private cross-repo references

NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
  the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
  generic 'the server' / 'a preview deployment'

* docs(billing): scrub final NAS reference in step-up docstring

* docs(billing): drop dangling plan-doc refs

The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.

* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes

Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.

- TUI: full /billing overlay state machine (overview to buy to confirm,
  auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
  via the TUI's own opener (the device flow runs in the headless gateway, so a
  printed URL was being dropped); run the step-up handler off the main loop and
  emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
  /state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
  (the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
  instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
  no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
  re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.

Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.

* docs(billing): add /billing TUI screenshots for PR

* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test

The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.

The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
2026-06-19 01:53:32 +05:30
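The split-based scope check mentioned for `nous_token_has_billing_scope()` avoids a substring false positive. A minimal sketch, assuming the token's scope claim is a space-delimited string (the usual OAuth convention, not confirmed by the commit); `token_has_billing_scope` is a hypothetical name:

```python
BILLING_SCOPE = "billing:manage"  # scope name from the commit message


def token_has_billing_scope(scope_claim: str) -> bool:
    # Membership in the split list matches whole scopes only, so a scope
    # like 'billing:manage-readonly' does not count as 'billing:manage'.
    return BILLING_SCOPE in scope_claim.split()
```

A plain `BILLING_SCOPE in scope_claim` would return True for any scope that merely contains the text, which is the false positive the commit calls out.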
Teknium
c37fdec2d9
feat(dashboard): surface full per-MCP catalog detail; fix pip-install doc (#48520)
The dashboard MCP catalog only showed name/description/transport and a
non-clickable source. Users couldn't see what an entry connects to or runs
before installing — the exact detail the docs trust model tells them to vet.

- /api/mcp/catalog now returns transport target (url, or command+args),
  auth_type, git install source/ref + bootstrap commands, default-enabled
  tool hint, and post-install guidance per entry.
- McpPage renders the endpoint URL (http) or command+args (stdio), the git
  install source/ref, a collapsible bootstrap-commands list, setup notes,
  and the source as a clickable link when it's a URL.
- Docs: drop the 'uv pip install -e .[mcp]' quick-start step (Hermes does
  not support pip installs; MCP ships with the standard install) and note
  the dashboard now surfaces this detail.
- Strengthen the catalog endpoint test to assert the new inspection fields.
2026-06-18 09:40:56 -07:00
Kewe63
f1254c8eaf fix(skills): rmtree scope guard + default pre_update_backup to true (#48200)
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:

1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
   anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
   All five call sites pass paths under SKILLS_DIR, so the guard is
   a no-op for current code and a loud, recoverable failure for
   any future regression (bad path join, malicious bundled
   manifest, stale path in scope after an exception).

2. The default `updates.pre_update_backup` flips from false to
   true in hermes_cli/config.py. A few minutes of zip per update
   is negligible compared to silent total data loss. Still
   overridable; --no-backup still works for one-off opt-out.

Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.

Closes #48200
2026-06-18 08:53:35 -07:00
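The scope guard described in f1254c8eaf can be sketched as a containment check on resolved paths before any deletion. `rmtree_in_scope` is a hypothetical stand-in for `_rmtree_writable` and does not handle read-only files. Allowing the skills root itself is an assumption, since the commit does not say which way that test case goes.

```python
import shutil
from pathlib import Path


def rmtree_in_scope(path: Path, skills_dir: Path) -> None:
    # Resolve both sides so '..' segments and symlinks cannot escape the root.
    target = Path(path).resolve()
    root = Path(skills_dir).resolve()
    if target != root and root not in target.parents:
        # Loud, recoverable failure instead of a silent wipe.
        raise ValueError(f"refusing to remove {target}: outside {root}")
    shutil.rmtree(target)
```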
kshitijk4poor
f6fac60e66 refactor(skills): dedupe file-listing, share user-modified predicate, trim diff contract
Cleanup pass on the salvage (behavior-preserving):

- diff_bundled_skill now uses the existing _skill_file_list() helper
  instead of reimplementing the rglob/is_file/relative_to file-set
  enumeration inline (twice).
- Extract _is_tracked_user_modification(origin_hash, user_hash) and use
  it in BOTH the sync loop and list_user_modified_bundled_skills() so the
  'kept user edit' rule can't drift between the two sites.
- _read_text_for_diff -> _read_for_diff returns (bytes, text); the binary
  branch now compares the bytes it already read instead of re-reading
  both files from disk.
- Drop the unused 'user_present' key from diff_bundled_skill's return
  contract (no consumer or test ever read it).
- test_update_modified_notice: drop the brittle '>= 2 sites' count-floor
  so consolidating the two print paths into a shared helper stays a
  welcome refactor; keep the per-site 'count notice => discovery hint'
  invariant (still mutation-tested).
2026-06-18 12:42:58 +05:30
kshitijk4poor
6777916068 fix(skills): surface list-modified hint on both update paths + disambiguate diff
Salvage follow-up to the cherry-picked feat/test commits:

- W1: the unpack/install update path in main.py printed the
  '~ N user-modified (kept)' notice without the new
  'hermes skills list-modified' hint that the git-pull path got.
  Mirror the hint to both sites so the count is actionable
  regardless of which update path runs.
- W2: 'hermes skills diff <name>' (bundled-vs-stock) now shares the
  verb with the gateway write-approval 'diff <id>'. The gateway
  handler's docstring + truncation message pointed users to
  '/skills diff <id>' on the CLI, which now resolves a bundled skill
  by that name instead. Point at the pending JSON file and note the
  two diff commands are distinct.
- Add an invariant test asserting every 'user-modified (kept)' notice
  in main.py carries the discovery hint (guards sibling drift).
2026-06-18 12:28:11 +05:30
kshitij
832d5967f8
Merge pull request #48262 from kshitijk4poor/salvage-32445
feat(memory): improve OpenViking setup UX (salvage #32445)
2026-06-18 11:34:11 +05:30
kshitijk4poor
6752da9a77 fix(dashboard): clean up upload temp file on client disconnect + pin python-multipart (NS-501)
Follow-up to #47663 (streaming multipart upload), fixing two issues that
landed with it.

1. Temp file leaked on client disconnect. The streaming upload endpoint's
   except chain caught only HTTPException / PermissionError / OSError — all
   Exception subclasses. asyncio.CancelledError, raised when a browser aborts
   a large upload mid-stream (the exact NS-501 scenario), is a BaseException,
   so it bypassed every except clause and reached a finally that only closed
   the file handle and never unlinked the temp file. Every aborted large
   upload orphaned a partial `.{name}.*.upload` file (up to ~100 MB) in the
   target directory. Cleanup now lives in finally, keyed on a `renamed`
   success flag, so the temp file is removed on every non-success exit
   including BaseException paths. Added test_stream_upload_cleans_temp_on_cancellation,
   which fails on the pre-fix code (leaks the temp file) and passes with the fix.

2. python-multipart pinned to ==0.0.27 instead of ==0.0.20. The package was
   already resolved at 0.0.27 transitively (via daytona) before #47663; the
   explicit ==0.0.20 pin in the [web] extra and the tool.dashboard lazy-install
   set downgraded it. Bumped both to ==0.0.27 and regenerated with `uv lock`,
   keeping the lockfile coherent. The base dependency stays >=0.0.9,<1.
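
The cleanup pattern from item 1 can be sketched roughly as follows. This is a minimal illustration, not the endpoint's actual code: `stream_to_file` and its `chunks` argument are hypothetical names, and only the `finally` block keyed on a `renamed` flag mirrors what the commit describes.

```python
import asyncio
import os
import tempfile


async def stream_to_file(chunks, dest_path):
    """Write an async stream of byte chunks to dest_path via a sibling temp file.

    Hypothetical helper: only the finally/`renamed` pattern follows the commit.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(
        prefix="." + os.path.basename(dest_path) + ".",
        suffix=".upload",
        dir=dest_dir,
    )
    renamed = False
    try:
        with os.fdopen(fd, "wb") as handle:
            async for chunk in chunks:
                handle.write(chunk)
        os.replace(tmp_path, dest_path)  # atomic rename into place
        renamed = True
    finally:
        # finally runs for Exception and BaseException alike, so a cancelled
        # upload (asyncio.CancelledError) still removes the temp file.
        if not renamed:
            try:
                os.unlink(tmp_path)
            except FileNotFoundError:
                pass
```

Putting the unlink in `finally` rather than in an `except Exception` clause is the point: `asyncio.CancelledError` derives from `BaseException`, so an `except Exception` handler never sees it.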
2026-06-18 11:32:18 +05:30
kshitijk4poor
1153b42b24 Merge upstream/main into OpenViking setup-UX (salvage #32445)
Resolves conflicts from the OpenViking churn that merged after #32445 was
opened (#48042/#47662 session-switch + write hardening, #47311/#47973):

- plugins/memory/openviking/__init__.py: keep both __init__ field groups
  (the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down).
- tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new
  setup-validation tests and main's session-switch/concurrency tests (disjoint
  additions to the same region).

Two fixes layered while reconciling (contributor work otherwise preserved):

- Restore the merged tenant-header contract (#22414/#21232). The PR had changed
  _VikingClient defaults to '' and made empty account/user OMIT the tenant
  headers; main's contract is that empty falls back to 'default' and the
  X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them).
  Reverted the constructor to 'account or os.environ.get(..., "default")' and
  updated the two PR tests that asserted the omit-when-empty behavior.

- Close a secret-file TOCTOU in the setup writers. _write_env_vars and
  _write_ovcli_config wrote the api_key/root_api_key file and chmod 0600
  AFTERWARD, leaving a world-readable window on newly-created files. Added
  _precreate_secret_file() to create with 0600 before any secret bytes land.
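
A minimal POSIX-only sketch of the create-with-0600-first idea. The function names here are hypothetical stand-ins, not the real `_precreate_secret_file()` implementation, which may differ.

```python
import os


def precreate_secret_file(path):
    """Ensure `path` exists with mode 0600 before any secret bytes are written.

    Hypothetical stand-in for the commit's helper; POSIX only.
    """
    # The mode argument applies at creation time, and umask can only remove
    # bits, so a newly created file is never group- or world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.fchmod(fd, 0o600)  # also tighten a file that already existed
    finally:
        os.close(fd)


def write_secret(path, text):
    """Write `text` only after the restrictive mode is in place."""
    precreate_secret_file(path)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```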
2026-06-18 11:28:51 +05:30
Ben Barclay
c661634537
fix(dashboard): stream file uploads via multipart instead of base64 JSON (NS-501) (#47663)
* fix(dashboard): stream file uploads via multipart instead of base64 JSON

The dashboard file manager uploaded files (including backup/restore zip
archives) by reading them client-side with FileReader.readAsDataURL and
POSTing a base64 data URL inside a JSON body to /api/files/upload. For a
large backup this (a) inflates the payload ~33%, (b) buffers the whole
file plus its decoded copy in memory, and (c) reliably trips an upstream
proxy body-size/timeout limit, surfacing as a 502 with the upload
appearing to hang indefinitely (NS-501). Dashboard-only hosted users have
no shell fallback to place the archive, so backup restore was unusable.

Add a streaming multipart endpoint POST /api/files/upload-stream
(UploadFile + Form) that reads the request body in 1 MiB chunks straight
to a sibling temp file, enforces the existing 100 MB size cap as it
streams (413 on overflow, before buffering the whole file), and
atomically renames into place so a partial/aborted/over-limit upload
never clobbers an existing file. The frontend api.uploadFile now sends
multipart/form-data (raw bytes, no base64, browser-set boundary) and
FilesPage passes the File object directly; the dead readAsDataUrl helper
is removed. The legacy base64 JSON endpoint stays for backward compat.
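
The enforce-the-cap-while-streaming step might look roughly like the sketch below. `copy_with_cap` and `UploadTooLarge` are hypothetical names, and the byte value used for the "100 MB" cap is an assumption; the real endpoint works on a FastAPI `UploadFile` and answers 413.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB chunks, as described in the commit
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # ASSUMED byte value for the "100 MB" cap


class UploadTooLarge(Exception):
    """Hypothetical error; the endpoint maps this case to HTTP 413."""


def copy_with_cap(source, sink, limit=MAX_UPLOAD_BYTES, chunk_size=CHUNK_SIZE):
    """Copy source to sink chunk by chunk, failing as soon as limit is exceeded.

    Memory use is bounded by chunk_size because the body is never fully buffered.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > limit:
            # Raised before the overflowing chunk is written.
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        sink.write(chunk)
```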

FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in
by fastapi itself, so it is added to the base deps, the [web] extra, and
the tool.dashboard lazy-install set (kept in sync).

Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB,
over-limit 413 without clobbering + no temp-file leak, overwrite=false
conflict, forced-root traversal containment); existing base64 tests still
pass; web typecheck + vite build clean; and a real uvicorn server E2E
(5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a
30 MB TestClient roundtrip confirm constant-memory streaming end to end.

Reported via beta (NS-501).

* build(deps): regenerate uv.lock for python-multipart (NS-501)

CI ran uv lock --check / uv sync --locked which failed because the
python-multipart dependency add was not reflected in uv.lock. Regenerate
the lockfile (resolves to 0.0.20, matching the [web] extra pin) after
merging current main.
2026-06-18 15:54:32 +10:00
Ben Barclay
9c3c5da356
fix(backup): hermes import never overwrites volatile gateway runtime state (NS-501) (#48243)
Importing a backup wrote every file from the zip over the target home
wholesale. On a hosted instance this clobbered gateway_state.json with the
source machine's last recorded run/desired state — driving the container-boot
reconciler (container_boot._read_desired_state, which only auto-starts a
gateway whose state is "running") off stale/foreign state and leaving the
gateway stuck "starting", disconnected from the Nous portal.

Add _IMPORT_SKIP_NAMES (gateway_state.json, gateway.pid, cron.pid,
gateway.lock, processes.json) and skip them by basename in run_import, so both
the root profile and named profiles preserve the target's own runtime state.
This mirrors what container_boot._STALE_RUNTIME_FILES already sweeps on every
container boot, and protects against older backups that predate the
backup-side exclusions. The import summary reports which files were preserved.
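
A rough sketch of skip-by-basename during import. The file names come from the commit; `import_backup` and its return value are hypothetical, and the real `run_import` does more than this.

```python
import posixpath
import zipfile

# Names listed in the commit message for _IMPORT_SKIP_NAMES.
IMPORT_SKIP_NAMES = frozenset(
    {"gateway_state.json", "gateway.pid", "cron.pid", "gateway.lock", "processes.json"}
)


def import_backup(zip_path, target_dir):
    """Extract a backup, leaving the target's volatile runtime files untouched.

    Returns the archive members that were skipped (hypothetical summary shape).
    """
    preserved = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Matching on the basename covers the root profile and named
            # profiles alike, e.g. "profiles/work/gateway.pid".
            if posixpath.basename(info.filename) in IMPORT_SKIP_NAMES:
                preserved.append(info.filename)
                continue
            archive.extract(info, target_dir)
    return preserved
```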

This is the second half of NS-501 (filed separately as NS-508): the upload
502 was fixed in #47663; this fixes the import-breaks-the-instance half.
2026-06-18 15:27:45 +10:00
Ben Barclay
4440d77bf3
fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188)
The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.

When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.

Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
  the running code, immune to the shared data dir). It falls back to the legacy
  $HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
  not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
  (inside the immutable block); stage2-hook.sh no longer writes the home stamp
  and proactively removes a stale 'docker' one to heal existing shared homes.
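
The lookup order described above can be sketched as follows. The signature is an assumption made for illustration (the real `detect_install_method()` discovers its paths and container status itself), and returning `None` here stands in for falling through to the git/pip detection.

```python
from pathlib import Path

STAMP_NAME = ".install_method"


def _read_stamp(directory):
    """Return the stamp text in `directory`, or None when absent or empty."""
    try:
        text = (Path(directory) / STAMP_NAME).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return text or None


def detect_install_method(install_dir, hermes_home, containerized):
    """Hypothetical signature: prefer the code-scoped stamp over the home stamp."""
    code_stamp = _read_stamp(install_dir)
    if code_stamp:
        return code_stamp  # sits next to the running code, immune to shared homes
    home_stamp = _read_stamp(hermes_home)
    if home_stamp == "docker" and not containerized:
        # A 'docker' stamp in a shared data dir says nothing about a host
        # install, so it is ignored and the caller falls through.
        return None
    return home_stamp
```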

Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
2026-06-18 14:14:41 +10:00
Ben Barclay
c276b017ad
feat(relay): connector⇄gateway channel auth + signed-HTTP inbound receiver + enroll CLI (#48147)
* feat(relay): authenticate the connector⇄gateway WS channel

The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.

Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.

* feat(relay): signed-HTTP inbound delivery receiver

The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.

Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps adds spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).
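
A minimal sketch of verifying a signed delivery over the raw bytes. The signed-string layout (`"<timestamp>." + body`, hex HMAC-SHA256) is an ASSUMPTION for illustration only; the actual scheme lives in gateway/relay/auth.py and is not spelled out in this message. The 300s window and the verify list of keys come from the text above.

```python
import hashlib
import hmac
import time

REPLAY_WINDOW_SECONDS = 300  # replay window stated in the commit


def sign(key, timestamp, raw_body):
    # ASSUMED layout: HMAC-SHA256 over "<timestamp>." + raw body, hex-encoded.
    message = timestamp.encode("utf-8") + b"." + raw_body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_delivery(raw_body, signature, timestamp, keys, now=None):
    """Check the timestamp window, then the signature against each verify key."""
    now = time.time() if now is None else now
    try:
        sent_at = float(timestamp)
    except ValueError:
        return False
    if not abs(now - sent_at) <= REPLAY_WINDOW_SECONDS:  # also rejects NaN
        return False
    for key in keys:
        expected = sign(key, timestamp, raw_body)
        # Constant-time comparison on bytes avoids timing side channels.
        if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            return True
    return False
```

Verification has to run on the bytes exactly as received: `json.dumps(json.loads(b'{"a":1}'))` yields `'{"a": 1}'`, which no longer matches what the sender signed.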

Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.

* feat(relay): hermes gateway enroll CLI

Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.

Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.

* docs(relay): inbound is signed HTTP, not WS; document channel auth

Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.

* test(relay): update build_gateway_parser callers for cmd_gateway_enroll

The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
2026-06-18 12:01:54 +10:00
Ben Barclay
fcf6cb3d73
fix(docker): supervised gateway uses --replace to take over stale holder (NS-505) (#47555)
* fix(docker): supervised gateway uses --replace to take over stale holder

Inside the s6 container image the per-profile gateway service rendered a
bare `hermes gateway run` (no --replace). When a gateway is started
OUTSIDE s6 — a stray shell `hermes gateway run`, an agent action, or the
Open WebUI helper (scripts/setup_open_webui.sh) — it grabs the
per-HERMES_HOME PID lock first. The supervised slot then execs the bare
`gateway run`, hits the "Another gateway instance is already running"
guard, exits non-zero, and s6 restarts it: a restart loop that floods the
log every ~12s and never binds. The container looks up but the gateway is
permanently down, and dashboard-only users (no shell) cannot recover.

Render the supervised run script as `gateway run --replace` so s6 is
authoritative for its slot: it reaps the stale holder via the hardened
takeover path (takeover marker + SIGTERM->SIGKILL-with-confirmation +
scoped-lock cleanup in gateway/run.py) and binds. This matches the
systemd service path, which already builds its argv with --replace
(_build_gateway_argv / 'nohup hermes gateway run --replace'), and the
intent already documented in _maybe_redirect_run_to_s6_supervision. The
existing HERMES_S6_SUPERVISED_CHILD sentinel still prevents the
run->start->run redirect recursion. Each profile is scoped to its own
HERMES_HOME and s6 guarantees one supervised instance per slot, so there
is no legitimate supervised sibling for --replace to clobber.

Reported via beta (NS-505): gateway.log showed PID 17907 'running
(manual process)' with the guard error repeating every ~12s on
v2026.6.5.

Adds a regression test asserting every gateway-run exec line in the
rendered script (default + named profile, both privilege branches)
carries --replace, and updates the existing render-script assertion.

* fix(ci): remove stray .venv symlink committed into repo

The PR's commit accidentally tracked a .venv symlink pointing at the
developer's local venv (mode 120000 -> /home/ben/nous/hermes-agent/.venv).
The CI test/e2e/build jobs run `uv venv` to create .venv and failed with
`failed to create directory .venv: File exists (os error 17)` because the
checkout already contained the symlink. All test shards aborted in <15s
during setup, before any test ran.

Untrack the symlink and add a bare `.venv` entry to .gitignore (the
existing `.venv/` rule only matches a directory, so a symlink slipped
through).
2026-06-18 10:49:02 +10:00
Teknium
9ba4615db2
fix(dump): show commit date instead of release date in hermes debug (#48104)
* feat(mcp): raise default tool-call timeout 120s -> 300s

Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.

- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
  'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md

* fix(dump): show commit date instead of release date in hermes dump

The version line in `hermes dump` (the top of the /debug report) appended
the package release date in parentheses, which reads like a wall-clock
"generated at" timestamp and confuses support triage. Replace it with the
date the HEAD commit was actually made, resolved live via
`git log -1 --format=%cd --date=short`, kept next to the commit SHA.

On Docker/wheel installs with no .git the date resolves to '' and the
suffix is simply omitted (the baked SHA still identifies the build).
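
Resolving the date might look roughly like this. The git invocation is the one quoted above; `head_commit_date` and `format_version_line`, and the layout of the returned line, are hypothetical.

```python
import subprocess


def head_commit_date(repo_dir):
    """Return HEAD's commit date (YYYY-MM-DD), or '' when it cannot be resolved."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%cd", "--date=short"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=5,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return ""  # git missing, directory missing, or the call timed out
    return result.stdout.strip() if result.returncode == 0 else ""


def format_version_line(version, sha, repo_dir):
    """Hypothetical layout: omit the date suffix entirely when it is empty."""
    date = head_commit_date(repo_dir)
    suffix = f" ({date})" if date else ""
    return f"{version} {sha}{suffix}"
```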
2026-06-17 16:53:42 -07:00
brooklyn!
c1f9eb0ec4
fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091)
* fix(desktop): resolve electronDist dynamically + self-heal blocked installs

Supersedes the static-path approach (#48081) and the install-step self-heal
(#48082) with a fix that removes the whole failure class instead of chasing each
symptom. Three distinct faults converged into the June desktop-build outage; this
closes all three.

Root cause (the part #48081 left open — "Gap B"):
  build.electronDist was a static relative path in apps/desktop/package.json, but
  npm workspace hoisting is NOT deterministic — depending on the npm version and
  what else is installed, npm nests the workspace-only electron devDep under
  apps/desktop/node_modules/electron OR hoists it to the repo root. A static path
  matches only one layout, so a clean install intermittently fails with "The
  specified electronDist does not exist". #48081 re-pointed the path at the
  nested layout (correct today) but electron-builder reads electronDist
  STATICALLY, so any future hoist change silently breaks it again — only caught
  by a CI invariant, never self-corrected.

Fix:
- scripts/run-electron-builder.cjs: resolve electron the way Node's runtime does
  — require.resolve("electron/package.json") walks node_modules from the desktop
  project upward and finds electron wherever npm actually put it. The path can
  never drift out of sync with the install layout again, on any OS/npm version.
    * dist present -> pass -c.electronDist=<abs>/dist so electron-builder reuses
      the unpacked runtime (keeps the #38673 fast path that dodges the 26.8.x
      missing-binary re-unpack bug).
    * dist absent  -> omit electronDist; electron-builder fetches Electron itself
      via @electron/get honoring electronVersion + ELECTRON_MIRROR.
  package.json: builder script now runs the wrapper; the static build.electronDist
  is removed (the resolver owns it).
- main.py / install.sh / install.ps1: on a dependency-install failure where the
  electron package staged but its dist is missing (electron's install.js
  process.exit(1) on a blocked/throttled binary download — #47266/#47917/#48021),
  repopulate the dist via electron's downloader (canonical, then npmmirror.com)
  and CONTINUE to the build instead of aborting. npm runs postinstall LAST, so
  the only casualty is electron/dist; bailing here is what made the pack-time
  mirror self-heal unreachable on a blocked network. Hard-fail only when electron
  never staged at all (a genuine dependency error).
- The pack-time mirror fallback now retries the build even when the pre-fetch
  can't populate the dist: the wrapper lets electron-builder download Electron
  itself via the mirror, so the retry is no longer a no-op (it was, when
  electronDist was a static path).

The exact 40.10.2 pin (already on main) keeps the third mode — the native
@electron-internal/extract-zip win32 binding that 40.10.3/40.10.4 ship without a
published prebuild — from recurring.

Tests:
- test_desktop_electron_pin.py: replace the static-path-matches-lockfile
  invariant with contracts that there is no hardcoded electronDist to drift, the
  builder script routes through the resolver, and the resolver uses Node module
  resolution + injects -c.electronDist.
- test_gui_command.py: install-failure self-heal continues to build; genuine
  (electron-never-staged) install failure still hard-fails; pack retries under
  the mirror even when the pre-fetch is blocked.

Salvages/supersedes the overlapping community work in #48003 (sitkarev),
#48012 (omegazheng), #48033 (james47kjv), and #48082.

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>

* fix(desktop): narrow Electron self-heal to real missing-dist failures

Follow-up on #48091 to remove the remaining misdiagnosis risk from the
installer/build fallback path (#46785 concern): only take the Electron
repair/retry path when Electron's package files are staged and dist is actually
missing/corrupt.

- main.py: add _electron_pkg_staged_missing_dist() and use it to gate install
  failure recovery; fail fast for unrelated npm install errors.
- main.py/install.sh/install.ps1: run cache purge + retry only when dist is
  missing; do not retry unrelated tsc/vite/build failures under an
  Electron-specific narrative.
- install.sh/install.ps1: tighten install-stage self-heal guard to require both
  package.json + install.js and missing dist.
- tests: add coverage that install failure hard-fails when Electron dist already
  exists, and update retry test to reflect the tightened recovery condition.

Validation:
- Python tests: 64 passed
- install.sh-related tests included in the run
- Real mac build on this machine:
  - npm ci at repo root: success
  - cd apps/desktop && npm run pack: success
  - electron-builder packaged darwin arm64 and used custom unpacked Electron dist

* refactor(desktop): trim electron self-heal helpers and comments

Deduplicate mirror-retry into _try_redownload_electron_dist / shell
counterparts; shorten wrapper and install-script commentary without
changing recovery semantics.

---------

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
2026-06-17 18:48:35 -05:00
Teknium
f8098c6b6f
fix(desktop): resolve electronDist to the actual electron install location (#48081)
After the June lockfile regeneration (#46652) floated electron and reshuffled
npm workspace hoisting, the desktop pack fails with "The specified electronDist
does not exist". apps/desktop/package.json pointed electronDist at the repo
root (../../node_modules/electron/dist) while npm now installs electron nested
under apps/desktop/node_modules/electron. The two contradict, so a clean
install can never package the app (Windows + macOS).

- electronDist -> node_modules/electron/dist (resolved relative to apps/desktop,
  i.e. the workspace-local install npm actually produces).
- hermes_cli/main.py, scripts/install.sh, scripts/install.ps1: add a runtime
  electron-dir resolver that prefers apps/desktop/node_modules/electron and
  falls back to the root hoist, so dist checks + the mirror re-download work
  under either npm layout.
- patch-electron-builder-mac-binary.cjs: try the workspace-local Electron.app
  before the root hoist in the macOS binary-restore fallback (sibling site no
  PR touched).
- test: assert build.electronDist resolves to where the lockfile installs
  electron, so a future hoist change (root <-> nested) can't silently break it.

Salvages the overlapping work in #48003 (sitkarev), #48012 (omegazheng), and
#48033 (james47kjv).

Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
2026-06-17 18:08:01 -05:00
kshitij
49d7481dfb
Merge pull request #47706 from NousResearch/fix/cli-login-deprecation-graceful
fix(cli): deprecated `hermes login` fails gracefully for any provider
2026-06-17 23:02:32 +05:30
teknium1
aa6f77596b chore: add AUTHOR_MAP entry for #47904 salvage
2026-06-17 09:49:46 -07:00
definitelynotguru
eaddeaf2e6 feat(xai): add grok-composer-2.5-fast to xAI OAuth model picker
The model is callable via xAI OAuth but omitted from models.dev and
/v1/models listings. Merge it into the curated xAI catalog so it appears
in `hermes model` without requiring a custom model name.
2026-06-17 09:49:46 -07:00
Teknium
c6c8abbadb
refactor: remove agent-callable send_message tool (#47856)
* feat(mcp): raise default tool-call timeout 120s -> 300s

Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.

- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
  'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md

* refactor: remove agent-callable send_message tool

The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.

Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
2026-06-17 07:11:23 -07:00
Teknium
cbfa018aef
fix(auth): retry Codex device-code login on 429 with clear rate-limit message (#47860)
The OpenAI device-code login (POST auth.openai.com/.../deviceauth/usercode)
had no retry or 429 handling — a transient throttle from OpenAI surfaced as
a bare "Device code request returned status 429" with no guidance, reading
as a hard login failure.

- Retry the device-code request with capped exponential backoff (honoring
  Retry-After), up to 4 attempts.
- On persistent 429, raise a clear AuthError tagged CODEX_RATE_LIMITED_CODE
  (classified transient, not a credential problem) with a wait hint.
- Apply the same 429 classification to the token-exchange step (same bug
  class).
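
The retry policy above could be sketched as below. Only "up to 4 attempts", the capped exponential backoff, and honoring Retry-After come from the commit; the delay values, the `send` callable, and `RateLimitedError` are assumptions made for illustration.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the AuthError tagged CODEX_RATE_LIMITED_CODE."""


def request_with_backoff(send, max_attempts=4, base_delay=1.0, max_delay=30.0,
                         sleep=time.sleep):
    """Call send() until it stops returning 429 or the attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * 2 ** attempt)
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                delay = min(max_delay, max(0.0, float(retry_after)))
            except ValueError:
                pass  # the HTTP-date form is not handled in this sketch
        sleep(delay)
    raise RateLimitedError("rate limited by the server; wait a few minutes and retry")
```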

Unrelated to PR #47399 (Responses-API cache headers); this is the OAuth
device-code path in hermes_cli/auth.py.
2026-06-17 05:48:35 -07:00
Shannon Sands
674e8b098a Fix dashboard gateway profile scoping
2026-06-17 05:40:57 -07:00
Teknium
e48803daec
fix(gateway): defer macOS launchd reload when run inside the gateway tree (#47842)
When refresh_launchd_plist_if_needed() runs from inside the gateway's own
launchd process tree (agent-initiated self-update via the terminal tool), a
direct launchctl bootout tears down the service's process group — including
the CLI doing the refresh — before the follow-up bootstrap can run. The
gateway is left unloaded and KeepAlive can't revive it (#43842).

Detect in-service execution via gateway.status.get_running_pid() +
_is_pid_ancestor_of_current_process(), and delegate the bootout->bootstrap to
a detached (start_new_session=True) helper that survives the process-group
teardown. The normal out-of-tree CLI path is unchanged.

Fixes #43842.
2026-06-17 05:19:21 -07:00
kshitijk4poor
a7ec334448 fix(cli): deprecated hermes login fails gracefully for any provider
`hermes login` was removed in favor of `hermes auth` / `hermes model`, but
the subparser still validated `--provider` against a hardcoded choices list
(nous, openai-codex, xai-oauth). Running `hermes login --provider anthropic`
therefore crashed in argparse with `invalid choice: 'anthropic'` *before* the
deprecation handler could print the redirect to `hermes model` — so a user
trying to authenticate a perfectly valid provider just saw a hard error and
assumed the feature was broken rather than relocated.

- Drop the restrictive `choices=` so every `--provider` value reaches the
  deprecation handler (which ignores the value and prints guidance).
- Omit the subparser `help=` kwarg so the dead command no longer advertises
  itself in `hermes --help` (#24756). Avoids the `==SUPPRESS==` placeholder
  leak that `help=argparse.SUPPRESS` emits for a top-level subparser on 3.12+.
- `hermes login [--flags]` still reaches the actionable deprecation message
  for old scripts/aliases; `hermes login --help` shows the redirect.
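
A minimal argparse sketch of the change. The parser layout and the message text are hypothetical; what it illustrates is a subparser added without `help=` and a `--provider` option without `choices=`, so any value reaches the handler.

```python
import argparse


def cmd_login_deprecated(args):
    """Ignore args.provider and return the redirect (message text is illustrative)."""
    return "`hermes login` was removed; use `hermes auth` or `hermes model` instead."


def build_parser():
    parser = argparse.ArgumentParser(prog="hermes")
    subparsers = parser.add_subparsers(dest="command")
    # No help= kwarg: argparse then creates no entry for this command in the
    # subcommand help listing, and no SUPPRESS placeholder is involved.
    login = subparsers.add_parser("login")
    # No choices=: validation would otherwise fail before the handler runs.
    login.add_argument("--provider")
    login.set_defaults(func=cmd_login_deprecated)
    return parser
```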

Picks up the intent of the inactivity-closed #24902, rebased onto the
post-refactor parser location (hermes_cli/subcommands/login.py) and extended
to fix the whole bug class (any provider value), not just hiding from --help.

Tests: parametrized provider acceptance + help-suppression (no SUPPRESS leak).
2026-06-17 12:55:40 +05:30
kshitijk4poor
fbaad3031a test(cli): URL tokens must not trigger filesystem path completion
Regression coverage for the keystroke-latency fix: a URL token contains
"/", so the bare-slash path heuristic used to return it as a path word and
run os.listdir on every keystroke. Assert _extract_path_word rejects
http/https/ssh scheme tokens, that ordinary paths (incl. a bare colon) are
unaffected, and that the completer never touches the filesystem for a URL
under the cursor.
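
The rejection rule under test could be sketched like this. It is a hypothetical reimplementation, not the real `_extract_path_word`: the scheme regex and the path heuristic are assumptions chosen to match the behavior described above.

```python
import re

# A token that starts with "<scheme>://" is treated as a URL, not a path.
_URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def extract_path_word(token):
    """Return the token if it looks like a filesystem path, else None."""
    if _URL_SCHEME.match(token):
        return None  # never list directories for http://, https://, ssh://, ...
    if "/" in token or token.startswith(("~", ".")):
        return token  # a bare colon alone does not disqualify a path
    return None
```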
2026-06-17 12:33:56 +05:30
xxxigm
d1ecebcbfd
fix(desktop): re-download Electron binary via mirror when pack fails (#47266) (#47276)
* fix(desktop): re-download Electron binary via mirror when pack fails (#47266)

Since #38673 pinned build.electronDist to node_modules/electron/dist,
electron-builder reads the Electron binary straight from there and never
downloads it during `npm run pack`. That dist tree is only produced by the
electron package's postinstall (install.js) during `npm ci`. When that
download is blocked or throttled (GitHub's release host is unreachable in
some regions), the dist is missing and the build dies with:

    The specified electronDist does not exist: .../node_modules/electron/dist

The existing ELECTRON_MIRROR fallback in all three desktop-build paths
(scripts/install.ps1, scripts/install.sh, and `hermes desktop` in
hermes_cli/main.py) re-ran `npm run pack` with ELECTRON_MIRROR set — but
pack never downloads Electron anymore, so the mirror was never used and the
retry re-read the same missing dist. The fallback was effectively dead.

Drive the mirror through electron's own downloader instead:

- Add a dist-presence check + a downloader helper (Test-ElectronDist /
  Restore-ElectronDist, _electron_dist_ok / _restore_electron_dist,
  _electron_dist_ok / _redownload_electron_dist) that wipes a partial dist
  + the path.txt version marker (electron's install.js short-circuits on it)
  and re-runs `node install.js`, optionally via a mirror.
- On the first retry, repopulate a missing dist from the canonical source;
  on the mirror retry, re-fetch through npmmirror.com, then pack.
- Gate the re-download on the dist check so an unrelated build failure
  (tsc/vite) doesn't trigger a pointless ~200 MB refetch, and skip the final
  pack when the binary still can't be fetched instead of failing the same way.
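
The dist-presence check could be sketched as follows. The binary locations are the ones the electron npm package uses per platform; the function names and signatures are hypothetical and may not match `_electron_dist_ok` in hermes_cli/main.py.

```python
import os


def electron_binary_path(dist_dir, platform):
    """Where the Electron executable lives inside dist for a sys.platform value."""
    if platform == "darwin":
        return os.path.join(dist_dir, "Electron.app", "Contents", "MacOS", "Electron")
    if platform == "win32":
        return os.path.join(dist_dir, "electron.exe")
    return os.path.join(dist_dir, "electron")


def electron_dist_ok(dist_dir, platform):
    """True only when the binary itself exists, not merely the dist directory.

    This catches the partial-dist case left behind by an interrupted download.
    """
    return os.path.isfile(electron_binary_path(dist_dir, platform))
```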

* test(desktop): cover Electron dist re-download mirror fallback (#47266)

Add behavior coverage for the electronDist re-download fix:

- _electron_dist_ok across linux/win32/darwin, including the partial-dist
  case (dir present but binary missing) that makes the pinned electronDist
  fail.
- _redownload_electron_dist: no-op when the binary is present, bail when
  install.js is absent, wipe a stale dist + path.txt marker and run
  electron's downloader with ELECTRON_MIRROR injected, and report failure
  when the download still produces no binary.
- `hermes desktop`: the mirror fallback now drives electron's own downloader
  before re-running pack, and skips the final pack entirely when the binary
  can't be fetched.

Replaces the old mirror test that asserted the (now-fixed) dead behavior of
re-running `npm run pack` with ELECTRON_MIRROR set — pack never downloads
Electron under the pinned electronDist, so that retry could never help.
2026-06-16 15:40:55 -05:00
teknium1
db44af004c test(model-picker): cover two overlapping user-defined custom providers
Guards that two user-defined custom endpoints exposing an overlapping
model each keep their full catalog — the dedup must never cross-filter
two user-defined rows against each other.
2026-06-16 13:09:40 -07:00