hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-21 10:22:18 +00:00

Author	SHA1	Message	Date
kshitijk4poor	69716a2e6f	docs(compression): fix stale 'discarded' wording on in_place config flag Review nit (yoniebans): the config.py comment still said compaction is 'lossy: the pre-compaction transcript is discarded, matching Claude Code / Codex' — leftover from the original destructive design. The shipped behavior is soft-archive: lossy for the LIVE context (what the model reloads), but the pre-compaction turns are kept on disk (active=0, compacted=1), searchable via session_search and recoverable. Comment now says so. Comment-only; no behavior change.	2026-06-20 10:57:07 -07:00
kshitijk4poor	47fadc24d7	feat(compression): in-place compaction option that keeps one session id (#38763) Context compression today rewrites the message list AND rotates the session id — it ends the session, forks a parent_session_id child, and renumbers the title (name -> name #2). That moving identity key is the root cause of a whole bug cluster: /goal lost (#33618), pending response lost at the split (#14238), orphan sessions (#33907), TUI sid desync (#36777), FTS search gaps + duplicate sidebar entries (#45117), null continuation cwd (#42228), and title-rename dead-ends (#48989). It also forced a large defensive apparatus (compression lock, contextvar/env/logging triple-sync, orphan finalization, gateway SessionEntry re-propagation, tip projection) whose only job is surviving a mid-conversation id change. Add a compression.in_place config flag (default False during rollout). When True, compaction rewrites the transcript and rebuilds the system prompt but keeps the SAME session_id: no end_session, no child row, no title renumber, no contextvar/logging re-sync, no memory/context-engine session-switch. The conversation keeps one durable id for life, like Claude Code / Codex. Compaction is lossy by design — the pre-compaction transcript is summarized away, not archived. The rotation path is unchanged when the flag is off (moved verbatim into an else branch). Staged rollout: this PR ships the option behind a default-off flag for live validation; a follow-up flips the default and deletes the now-redundant rotation machinery, superseding the 14 open band-aid PRs in this area. - hermes_cli/config.py: add compression.in_place (default False), documented - agent/agent_init.py: resolve the flag -> agent.compression_in_place - agent/conversation_compression.py: branch compress_context() on the flag - tests/run_agent/test_in_place_compaction.py: in-place invariants + rotation regression guard + config default The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no boundary data loss. Prompt-cache invariant preserved: the system-prompt rebuild is the same single sanctioned invalidation that already happens during compaction — no NEW invalidation. Message alternation preserved.	2026-06-20 10:57:07 -07:00
teknium1	37a4dd4982	fix(auth): heal poisoned Nous inference URL on refresh instead of retaining it A nous inference_base_url that fails the host allowlist (e.g. a stale stg-inference-api.nousresearch.com persisted before the allowlist existed) was only replaced 'if refreshed_url:' — so when the validator rejected the URL it left the poisoned value in place. The 'falling back to default' warning fired but never took effect: every subsequent call, including the auxiliary compression call, kept hitting the dead staging endpoint and 401'd. Reset to DEFAULT_NOUS_INFERENCE_URL when validation returns None at both refresh sites in resolve_nous_runtime_credentials, so a poisoned auth.json self-heals on the next refresh. The proxy adapter already did this correctly; this brings the two auth.py sites in line.	2026-06-20 10:53:45 -07:00
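The self-healing behavior described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: `validate_inference_url`, `ALLOWED_HOSTS`, the placeholder default URL, and the two `refresh_*` functions are hypothetical stand-ins for the logic in `resolve_nous_runtime_credentials`.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical stand-ins; the real default URL and allowlist are defined in the project.
DEFAULT_NOUS_INFERENCE_URL = "https://inference.example.invalid/v1"
ALLOWED_HOSTS = {"inference.example.invalid"}


def validate_inference_url(url: Optional[str]) -> Optional[str]:
    """Return the URL when its host is allowlisted, otherwise None."""
    if not url:
        return None
    return url if urlparse(url).hostname in ALLOWED_HOSTS else None


def refresh_before_fix(state: dict) -> dict:
    """Old behavior: a rejected URL is never overwritten, so it stays poisoned."""
    refreshed_url = validate_inference_url(state.get("inference_base_url"))
    if refreshed_url:
        state["inference_base_url"] = refreshed_url
    return state


def refresh_after_fix(state: dict) -> dict:
    """New behavior: a rejected URL is reset to the default on refresh."""
    refreshed_url = validate_inference_url(state.get("inference_base_url"))
    state["inference_base_url"] = refreshed_url or DEFAULT_NOUS_INFERENCE_URL
    return state
```

With a stale staging URL in `state`, `refresh_before_fix` leaves it untouched while `refresh_after_fix` replaces it with the default.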
Teknium	11c6f4c7bc	feat(setup): Blank Slate setup mode — minimal agent, opt in to everything (#36733) * feat(setup): Blank Slate setup mode — minimal agent, opt in to everything Adds a third first-time setup option alongside Quick Setup and Full Setup. Blank Slate forces ON only what an agent needs to run — provider & model, the File Operations toolset, and the Terminal toolset — and turns everything else OFF, then walks the user through opting each capability back in. What it does: - platform_toolsets.cli = [file, terminal] (explicit, authoritative list) - agent.disabled_toolsets = every other known toolset (web, browser, code_execution, vision, memory, delegation, cronjob, skills, image_gen, kanban, …). Applied last in the resolver, so it overrides the non-configurable platform-toolset recovery that would otherwise re-add toolsets like kanban — guaranteeing a true blank slate. - Optional config features off: compression, memory + user-profile capture, checkpoints, smart model routing, auto session reset. - Bundled skills default to NONE (reuses the .no-bundled-skills marker); offers to seed the full catalog. - Walks through tools / plugins / MCP / messaging, all opt-in. Proven end-to-end: with the Blank Slate config, model_tools.get_tool_definitions emits exactly 6 schemas — patch, process, read_file, search_files, terminal, write_file. Nothing else reaches the model. Re-enable later via hermes tools / hermes skills opt-in --sync / hermes setup agent. Tests: tests/hermes_cli/test_setup_blank_slate.py (8 tests) pin the writers, the resolver invariant ({file, terminal}), and the 6-schema end-to-end set. Docs: getting-started/quickstart.md documents all three setup modes. * feat(setup): Blank Slate fork — finish minimal, or walk through configs After applying the minimal baseline (provider/model + file + terminal, everything else off), Blank Slate now presents a choice instead of always running the full walkthrough: 1. Start with everything disabled — finish now with the minimal agent. 2. Walk through all configurations — opt in to tools, skills, plugins, MCP, and messaging. Provider/model and terminal are still configured first either way (the agent can't run without them). The finish-now path records the bundled-skill opt-out so future `hermes update` runs don't re-inject skills. The walkthrough body moved to a separate _blank_slate_walkthrough() helper. Tests: TestBlankSlateFork covers both branches (finish-now applies baseline + skill opt-out and skips the walkthrough; walkthrough path invokes it). Docs updated to describe the fork.	2026-06-20 10:45:55 -07:00
Teknium	5600105478	refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite, wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained bundled plugins under plugins/platforms/<x>/, discovered via the platform registry. Strips the per-platform core touchpoints from gateway/run.py, gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and tools/send_message_tool.py. Carries forward the migration fixes (explicit enabled:false honored, get_connected_platforms forces discovery, plugin is_connected via gateway.get_env_value, logs --component gateway matches plugins.platforms.*, matrix hidden on Windows). Additionally ports config keys main added since the PR base: the matrix plugin's _apply_yaml_config now also covers allowed_users, ignore_user_patterns, process_notices, and session_scope (the inline gateway/config.py matrix block gained these in the 1340 commits the PR sat open; they would otherwise have been silently dropped on deletion).	2026-06-20 10:26:45 -07:00
kshitijk4poor	a7dd98c860	fix(env): guard remaining malformed int/float env var casts with utils helpers Widen the env_float() guard from #48735 across the whole bug class: a non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd port) raised an unhandled ValueError and crashed adapter/agent init. Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to the canonical utils.env_int / utils.env_float helpers (the established house pattern), instead of duplicating per-module helpers or inline try/except: - gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT - gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL - gateway/platforms/feishu.py: dedup cache + text/media batch settings - gateway/platforms/wecom.py, discord/adapter.py: text batch delays - gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT - gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT - hermes_cli/auth.py: CODEX/XAI refresh timeouts - agent/chat_completion_helpers.py: API/stream read/stale timeouts - run_agent.py, agent/auxiliary_client.py: API + nous timeouts Sites already guarded by try/except or local helpers are left untouched. The HERMES_MAX_ITERATIONS sites are already guarded on main via _current_max_iterations(), so they are not included.	2026-06-20 14:54:36 +05:30
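A guarded cast of the kind this commit standardizes might look like the sketch below. The names `env_int` and `env_float` come from the commit message, but the signatures and fallback behavior shown here are assumptions; the real `utils` helpers may log a warning or accept extra parameters.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back on malformed input."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return float(raw)
    except ValueError:
        # A stale or typo'd value such as "abc" no longer crashes init.
        return default


def env_int(name: str, default: int) -> int:
    """Read an int from the environment, falling back on malformed input."""
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw)
    except ValueError:
        return default
```

The point of a shared helper is that an unguarded `float(os.getenv(...))` raises `ValueError` at import or init time, whereas the guarded form degrades to the documented default.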
helix4u	c253b07380	fix(model): clear stale endpoint credentials across switches	2026-06-19 19:58:26 -07:00
helix4u	95a3affc2e	fix(model): keep Nous picker from restoring stale custom keys	2026-06-19 19:58:26 -07:00
Teknium	cf58f1a520	feat(titles): support language-aware title generation (#45296) Make auxiliary title prompts match the user language by default, with an optional pinned `auxiliary.title_generation.language` config.	2026-06-19 17:15:52 -07:00
teknium1	64b21e50fb	fix(cli): publish agent ref to cli module so memory on_session_end fires on exit The god-file Phase 4 refactor (`094aa85c37`) moved agent construction into CLIAgentSetupMixin, which set the atexit shutdown reference with a bare `global _active_agent_ref`. After extraction that global binds the mixin module's namespace, not cli.py's. cli._run_cleanup reads cli._active_agent_ref to decide whether to fire the memory provider's on_session_end hook — and it stayed None for the whole session, so the `if _active_agent_ref:` branch was dead and on_session_end never ran on /exit. Custom memory providers silently lost end-of-session extraction. Fix: publish the reference onto the cli module explicitly (`import cli as _cli; _cli._active_agent_ref = self.agent`), using the deferred-import pattern already established in the mixin. Regression test asserts cli._active_agent_ref is populated by the mixin's publish line and guards against a relapse to the bare `global` form. The existing shutdown tests passed only because they hand-assigned the ref, which is exactly what masked this.	2026-06-19 16:59:43 -07:00
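The scoping pitfall behind this fix can be reproduced in isolation. The sketch below uses two throwaway module objects (`cli_mod` and `mixin_mod`, both hypothetical) in place of `cli.py` and the mixin's module; it only demonstrates that a bare `global` statement binds in the defining module's namespace.

```python
import types

# Two throwaway modules standing in for cli.py and the mixin's module.
cli_mod = types.ModuleType("cli_mod")
mixin_mod = types.ModuleType("mixin_mod")
cli_mod._active_agent_ref = None

# Define the function with mixin_mod's namespace as its globals, as if it
# had been written inside that module's source file.
exec(
    "def publish_bare(agent):\n"
    "    global _active_agent_ref\n"
    "    _active_agent_ref = agent\n",
    mixin_mod.__dict__,
)


def publish_explicit(agent, target_module=cli_mod):
    """Write the reference onto the module that the cleanup code reads."""
    target_module._active_agent_ref = agent


mixin_mod.publish_bare("agent-A")
bare_result = cli_mod._active_agent_ref   # still None: wrong namespace
leaked_to = mixin_mod._active_agent_ref   # "agent-A" landed here instead

publish_explicit("agent-B")
explicit_result = cli_mod._active_agent_ref  # "agent-B"
```

After the bare-`global` call the reader module still sees `None`, which matches the dead `if _active_agent_ref:` branch the commit describes.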
kshitijk4poor	d4e7dd609d	refactor(windows): tidy managed-node resolver helpers Behavior-preserving cleanups on the managed-node resolver: - Hoist _candidate_node_command_names() out of the inner dir loop in find_hermes_node_executable (computed once, not per directory). - Drop redundant os.environ.copy() at the two with_hermes_node_path(os.environ.copy()) sites; the helper already copies os.environ when called with no argument (verified env-equivalent). - Add reciprocal keep-in-sync comments between iter_hermes_node_dirs() (hermes_constants.py) and hermesManagedNodePathEntries() (electron main.cjs), which mirror the same platform-ordering rule across the Python/Node boundary.	2026-06-20 02:12:16 +05:30
kshitijk4poor	fcc169057d	fix(windows): prefer managed npm for hermes update desktop-rebuild gate The `hermes update` desktop-rebuild gate still used a bare `shutil.which("npm")` presence check. On a Windows box where the only working npm is the Hermes-managed npm.cmd (not on PATH), the gate would skip the desktop rebuild even though _build_web_ui / cmd_gui can now find it via find_node_executable. Route the gate through the same resolver for full bug-class coverage. Surfaced during review of #49239.	2026-06-20 02:01:24 +05:30
helix4u	7a7b56d498	fix(windows): prefer managed node for whatsapp and desktop	2026-06-20 02:00:37 +05:30
teknium1	2bd1977d8f	chore: release v0.17.0 (2026.6.19)	2026-06-19 12:38:31 -07:00
alt-glitch	88d523220f	fix(mcp): address adversarial review round 2 (stale-publish race, parity holes) Second review pass (Codex + Hermes subagent). Codex reproduced a real race with a two-thread harness; both converged on the remaining issues. - Generation-aware publish (fixes a lost-update race): two refresh callers (the late-refresh daemon and the between-turns prologue around turn 1) could each compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry generation could acquire the publish lock after a newer caller and clobber it, deleting just-landed tools. refresh_agent_mcp_tools now captures registry._generation before computing and refuses to publish a stale set; agent._tool_snapshot_generation tracks the published generation. - Context-engine routing names (_context_engine_tool_names) are now staged on a local and published atomically with the snapshot, and only claimed when this rebuild actually appended the schema — matching agent_init's dedup so a registry/plugin tool of the same name keeps its own dispatch. (Previously mutated live, before the publish lock, and on no-change refreshes.) - CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a server newly ENABLED in config mid-session wasn't picked up (TUI already re-resolved). Merge now-connected MCP server names into the override (unless the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in sync. Closes the CLI/TUI parity hole. - ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th sibling rebuild that re-injected memory tools but NOT context-engine tools and bypassed the atomic/name-diff path (inert today, fragile). - mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG (single source of truth) instead of a stale hardcoded 5.0 literal. - Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout fallback uses DEFAULT_CONFIG.	2026-06-19 11:57:43 -07:00
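The generation-aware publish described in this commit can be sketched roughly as below. The `Registry` and `Agent` classes and `publish_snapshot` are simplified, hypothetical stand-ins; only the attribute names `_generation` and `_tool_snapshot_generation` are taken from the commit message.

```python
import threading


class Registry:
    """Toy tool registry that bumps a generation counter on every change."""

    def __init__(self):
        self._generation = 0
        self._tools = set()
        self._lock = threading.Lock()

    def register(self, name):
        with self._lock:
            self._tools.add(name)
            self._generation += 1

    def snapshot(self):
        # Capture the generation together with the tool set it describes.
        with self._lock:
            return self._generation, frozenset(self._tools)


class Agent:
    def __init__(self):
        self.tools = frozenset()
        self._tool_snapshot_generation = -1
        self._publish_lock = threading.Lock()


def publish_snapshot(agent, generation, tools):
    """Publish a snapshot unless a newer generation was already published."""
    with agent._publish_lock:
        if generation < agent._tool_snapshot_generation:
            return False  # stale: a newer caller got here first
        agent.tools = tools
        agent._tool_snapshot_generation = generation
        return True
```

A slow caller that computed its snapshot before a new tool landed is refused at publish time, so it cannot delete the tool a faster caller just published.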
alt-glitch	b6e2a54a94	fix(mcp): address adversarial review round 1 (cache parity, gates, races) Consolidated findings from three independent reviewers (Codex, Claude Code, a Hermes subagent w/ the hermes-agent-dev skill): - BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently dropping post-build-injected memory-provider (mem0/honcho/…) and context-engine (lcm_) tools on every refresh. Now additive-preserving: re-applies the same injectors agent_init uses, staged on locals and published atomically. - Re-injection now honors the #5544 enabled_toolsets gate for context-engine tools, so a restricted-toolset platform can't get lcm_ leaked back in. - Atomic read-diff-publish under one lock: the returned `added` set and the (tools, valid_tool_names) pair are consistent even under concurrent callers (no half-swap, no TOCTOU). - background_review fork opts out (_skip_mcp_refresh) so its byte-identical tools[] cache parity with the parent is preserved. - CLI /reload-mcp routed through the shared helper (was a 4th divergent copy with the same clobber bug + missing disabled_toolsets). - Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user just enabled in config this session is picked up; automatic paths reuse the agent's build-time selection. - mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the between-turns refresh, so the startup wait is only a small turn-1 UX bump rather than a heavy dead-server latency penalty. - has_registered_mcp_tools checks registered TOOLS (not connected servers) so a zero-tool/prompt-only server doesn't make the per-turn hook fire forever. - Tests: rewrote the thread-safety test to actually exercise the write path (alternating tool sets), added the #5544-gate regression, the memory/context preservation regression, and a "callable next turn via valid_tool_names" contract; removed a dead monkeypatch line.	2026-06-19 11:57:43 -07:00
alt-glitch	93d6e73028	fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway) MCP servers that connect after the agent's one-time tool snapshot were invisible for the whole session. Two root causes, fixed together: 1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers commonly take 2-6s on a cold connect, so they missed the window and their tools never entered the agent's snapshot. `thread.join(timeout)` already returns the instant discovery completes, so raising the bound costs ~0s for the common case (no MCP / fast servers) and only ever blocks for a genuinely-pending server, capped so a dead server can't freeze startup. The bound is now configurable via `mcp_discovery_timeout` (config.yaml, default 5.0s). 2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh thread), and the late-refresh detected changes by tool COUNT — missing an equal-size add/remove swap. Consolidated into one shared `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by tool NAME, mutates the agent under a lock (thread-safe), and respects the agent's own enabled/disabled toolsets. The late-binding refresh keeps its pre-first-turn cache-safety guard: it never rebuilds the tool list once a turn has started, so the cached prompt prefix is never invalidated mid-conversation. Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the name-based diff, in-place mutation, agent-scoped filtering, thread safety, and the config-driven discovery bound (incl. instant-return when nothing is pending). 75 passed across the touched areas.	2026-06-19 11:57:43 -07:00
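The difference between count-based and name-based change detection mentioned in this commit is easy to see in a small sketch. Both function names below are hypothetical and are not the project's helper.

```python
def changed_by_count(old_names, new_names):
    """Old approach: only notices a change when the number of tools differs."""
    return len(old_names) != len(new_names)


def diff_by_name(old_names, new_names):
    """Name-based diff: returns (added, removed) sets of tool names."""
    old_set, new_set = set(old_names), set(new_names)
    return new_set - old_set, old_set - new_set


# An equal-size swap: one tool removed, another added.
before = ["search", "fetch"]
after = ["search", "summarize"]

count_says_changed = changed_by_count(before, after)
added, removed = diff_by_name(before, after)
```

Here `count_says_changed` is `False` even though `added` and `removed` are both non-empty, which is the equal-size add/remove swap the count check missed.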
Teknium	2a5e9d994a	Merge pull request #48275 from NousResearch/feat/cron-scheduler-provider-chronos feat(cron): pluggable CronScheduler interface + Chronos managed-cron provider (scale-to-zero)	2026-06-19 07:51:59 -07:00
Ben	1928aa0443	fix(managed-scope): honor managed scope in config→env bridges too Manual verification surfaced a second bypass class beyond the standalone config loaders: several code paths bridge config.yaml values into os.environ (HERMES_TIMEZONE, HERMES_REDACT_SECRETS, HERMES_MAX_ITERATIONS, TERMINAL_*, network.force_ipv4, ...) by reading the raw user YAML, so the env the whole process reads carried the USER's value even when an administrator pinned it — e.g. a managed timezone was overridden because gateway/run.py wrote the user's timezone into HERMES_TIMEZONE, and _resolve_timezone_name() checks the env var first. Wired the shared apply_managed_overlay() into every config→env bridge: - gateway/run.py module-level startup bridge (timezone, redact_secrets, max_turns, terminal, display, gateway.strict, ...) - gateway/run.py _reload_runtime_env_preserving_config_authority (the per-turn re-bridge that keeps config authoritative over reloaded .env — must keep MANAGED authoritative on every turn, not just startup) - hermes_cli/main.py early security.redact_secrets / network.force_ipv4 bridge (runs before load_config is usable, at import time) - hermes_cli/send_cmd.py top-level scalar config→env bridge Verified end-to-end against a writable managed dir (12/12 checks incl. timezone, logging, model, skin, gateway settings, write-guard) and in a clean process the gateway per-turn bridge writes HERMES_TIMEZONE=<managed>. Adds an order-independent regression test for the bridge overlay.	2026-06-19 07:46:33 -07:00
Ben	b0e47a98f9	fix(managed-scope): honor managed scope in all standalone config loaders The skin bug was one instance of a class: several subsystems build their config dict directly from config.yaml instead of routing through hermes_cli.config.load_config (which carries the managed merge), so they silently ignored administrator-pinned values. Audited every config.yaml reader and fixed the behavioral-read bypasses: - gateway/config.py load_gateway_config (messaging gateway: session_reset, quick_commands, stt, model, ...) - gateway/run.py _load_gateway_config (its read_raw_config fast path also skipped the merge — read_raw_config returns raw user YAML) - tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin, reasoning_effort, service_tier, provider_routing) - cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing) - hermes_logging.py (logging.level/max_size_mb/backup_count) - hermes_time.py (timezone) - hermes_cli/doctor.py (memory-provider diagnostic reads effective config) All route through a new shared managed_scope.apply_managed_overlay() helper that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't shadow a managed literal, root-model-string normalization, leaf-merge) and is fail-open. cli.py's earlier inline fix is refactored onto the same helper. Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile distribution) are deliberately left reading raw user YAML — overlaying managed values there would persist them into the user file. The dashboard (web_server.py) already routes through load_config and needed no change. TUI loader caches the RAW config so _save_cfg never writes managed values to disk. Adds test_managed_scope_overlay.py (helper) and test_managed_scope_loaders.py (per-surface integration); mutation-checked.	2026-06-19 07:46:33 -07:00
Ben	ddd519ea70	feat(managed-scope): surface managed scope in config show and doctor - show_config prints an administrator header naming the managed source and lists the pinned config/env keys when a scope is active (silent otherwise). - hermes doctor gains a managed_scope_check under Configuration Files that reports the resolved managed dir + pinned key counts, and flags a HERMES_MANAGED_DIR redirect (the documented foot-gun).	2026-06-19 07:46:33 -07:00
Ben	4f9e15df97	feat(managed-scope): guard writes to managed config/env keys - set_config_value hard-rejects a managed config key (D2) and names the source, exiting non-zero. - save_env_value / remove_env_value refuse a managed env key. - save_config strips managed leaves from a bulk write (mechanical safety net) with a warning, so the unmanaged remainder still persists. New _strip_dotted_keys helper drives the bulk-save pruning. All guards are distinct from and layered after the existing is_managed() package-manager write-lock.	2026-06-19 07:46:33 -07:00
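The bulk-save pruning can be pictured with the sketch below. The commit names a `_strip_dotted_keys` helper but does not show it, so this version (its signature, its copy-then-prune behavior, and the sample keys) is an assumption made for illustration.

```python
import copy


def strip_dotted_keys(config, dotted_keys):
    """Return a copy of config with each dotted leaf path removed, if present."""
    result = copy.deepcopy(config)
    for dotted in dotted_keys:
        parts = dotted.split(".")
        node = result
        for part in parts[:-1]:
            node = node.get(part)
            if not isinstance(node, dict):
                break  # path does not exist; nothing to strip
        else:
            node.pop(parts[-1], None)
    return result


user_write = {"model": "x", "display": {"skin": "dark", "compact": True}}
managed_keys = ["display.skin", "missing.key"]
pruned = strip_dotted_keys(user_write, managed_keys)
```

Only the managed leaf is dropped; sibling keys under the same parent still persist, which matches the "unmanaged remainder still persists" behavior described above.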
Ben	81a663abea	feat(managed-scope): apply managed .env last with override load_hermes_dotenv now loads the managed-scope .env after user/project .env and external secret sources, with override=True, so managed env values beat the user .env and any pre-existing shell export. Reuses the existing dotenv fallback + credential-sanitization path. Fail-open: no managed dir/.env is a no-op and any error is swallowed so managed scope never blocks startup.	2026-06-19 07:46:33 -07:00
Ben	b5ddd6e719	feat(managed-scope): managed config layer wins over user config _load_config_impl now deep-merges the managed config.yaml on top of the expanded user config so managed leaves win while sibling keys stay user-controlled (leaf-level merge, D3). Managed values are expanded against the process env only, never user-defined ${VAR}, so a user can't shadow a managed literal. The managed file's (mtime,size) is folded into the load cache key so editing it invalidates the cache. This inverts the usual env-over-config precedence for pinned keys by design (see design doc §4.1).	2026-06-19 07:46:33 -07:00
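A leaf-level merge in which managed values win while sibling keys stay user-controlled could be sketched like this. `leaf_merge` and the sample config keys are hypothetical; the real `_load_config_impl` also handles env expansion and cache invalidation, which are omitted here.

```python
import copy


def leaf_merge(user, managed):
    """Return a new dict where managed leaves override user leaves."""
    merged = dict(user)
    for key, managed_value in managed.items():
        user_value = merged.get(key)
        if isinstance(managed_value, dict) and isinstance(user_value, dict):
            # Recurse so only the pinned leaves are replaced.
            merged[key] = leaf_merge(user_value, managed_value)
        else:
            merged[key] = copy.deepcopy(managed_value)
    return merged


user_cfg = {"display": {"skin": "dark", "compact": True}, "timezone": "UTC"}
managed_cfg = {"display": {"skin": "corporate"}}
effective = leaf_merge(user_cfg, managed_cfg)
```

In this example the administrator pins only `display.skin`; `display.compact` and `timezone` keep the user's values.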
Ben	9cbcc0c9c8	feat(managed-scope): add managed_scope module (resolver, loaders, key helpers) New hermes_cli/managed_scope.py resolves a system-level managed directory (HERMES_MANAGED_DIR override > /etc/hermes), parses managed config.yaml/.env with fail-open semantics, and exposes is_key_managed/is_env_managed helpers. The system default is ignored under pytest and HERMES_MANAGED_DIR is added to the conftest env scrub so a real managed scope can't leak into the suite. Not wired into the load paths yet (Phases 2-3).	2026-06-19 07:46:33 -07:00
teknium1	a58287afcb	Merge remote-tracking branch 'origin/main' into pr48275-rebase # Conflicts: # cron/scheduler.py	2026-06-19 07:40:29 -07:00
Teknium	35e7ca03d5	fix(kanban): treat already-gone worker as terminated, not survived _terminate_reclaimed_worker early-returned on ProcessLookupError with terminated=False. The new reclaim-defer guard reads that as 'worker survived the kill' and defers the reclaim forever, so a stale task whose worker is already dead never lands in result.stale. ProcessLookupError means the process is gone — that IS a successful termination. Split it from the generic OSError branch and set terminated=True.	2026-06-19 07:38:10 -07:00
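The exception split in this fix relies on `ProcessLookupError` being a subclass of `OSError`, so it has to be caught first. The sketch below is a simplified, hypothetical version of the classification only; the `kill` parameter is injected for testability and the returned dict shape is an assumption, not the signature of `_terminate_reclaimed_worker`.

```python
import os
import signal


def classify_termination(pid, kill=os.kill):
    """Send SIGTERM and report whether the worker can be treated as gone."""
    try:
        kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The process no longer exists: that is a successful termination.
        return {"terminated": True, "reason": "already_gone"}
    except OSError as exc:
        # Any other failure (for example a permission error) means we
        # cannot claim the worker was stopped.
        return {"terminated": False, "reason": type(exc).__name__}
    # Signal delivered; real code would still confirm that the process exited.
    return {"terminated": True, "reason": "signal_sent"}
```

If the two `except` clauses were swapped, the generic `OSError` branch would swallow the already-gone case and report `terminated=False`, which is the behavior the commit corrects.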
Sahil Saghir	b9e521da23	fix(kanban): hold reclaim while the worker is still alive release_stale_claims and detect_stale_running call _terminate_reclaimed_worker and then release the task claim unconditionally, even when the termination did not actually kill the worker. _terminate_reclaimed_worker already reports this via its "terminated" flag, but the callers ignore it. When a worker is parked in uninterruptible (D) state — for example throttled by a cgroup memory.high limit — a pending SIGTERM/SIGKILL cannot be delivered until the throttle lifts, so the kill is a no-op. The dispatcher then frees the claim and spawns a fresh worker beside the still-alive one. Repeated every dispatch tick this accumulates duplicate workers without bound, deepening the memory pressure that caused the throttle in the first place — a self-reinforcing runaway. Fix: gate both automatic reclaim paths on _worker_survived_termination(). When we attempted to kill our own host-local worker and it is still alive, defer the reclaim (_defer_reclaim_for_live_worker extends the claim a short grace and emits a reclaim_deferred event) instead of releasing. This guarantees at most one live worker per task and is self-correcting: not spawning a duplicate is what relieves the pressure so the pending signal lands and the worker dies, and the next tick reclaims cleanly. Non-host-local claims and the operator-driven reclaim_task() path keep their existing force-release behaviour. Related: #41448 (concurrent dispatchers amplify this by doubling reclaim frequency); #42858 (kill the worker rather than orphan it on archive). Tests: defer-when-worker-survives, reclaim-when-killed, release-when-not-host-local, and the detect_stale_running path.	2026-06-19 07:38:10 -07:00
Teknium	d7bff949af	fix(cli): default cli_refresh_interval to 1.0 to keep status bar alive (#49087) PR #49056 set the default to 0, which reverts the #45592 idle-clock fix: without a periodic invalidate, prompt_toolkit stops repainting the bottom chrome during idle and the status bar goes stale/disappears after a turn. Restore 1.0 as the default for everyone. The config knob stays — users on emulators where the per-second redraw fights auto-scroll (#48309) can set display.cli_refresh_interval: 0 to opt out.	2026-06-19 07:35:06 -07:00
Ben Barclay	1e70df5fdd	feat(gateway): multiplex phase 4 — lifecycle guard + per-profile observability - _guard_named_profile_under_multiplexer: when the default gateway is running with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run' hard-errors (pointing at the multiplexer) instead of double-binding that profile's platforms. Inert unless all hold: this invocation is a named profile, a default-profile gateway is alive, and its config has multiplexing on. --force overrides. Wired into run_gateway's guard chain. - write_runtime_status gains served_profiles: the secondary-adapter startup records [active] + multiplexed profiles into runtime_status.json so 'hermes status' can show per-profile coverage without a second probe. Absent for single-profile gateways. Tests: served_profiles round-trips and is absent by default; guard is inert for the default profile / under --force / when no default gateway is running.	2026-06-19 07:34:15 -07:00
Ben Barclay	f538470cf4	feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A) The credential gate. When multiplexing is active, a profile's secrets resolve from a context-local scope, never the process-global os.environ (which in a multiplexer may hold another profile's keys, and is inherited by every subprocess spawned with env=dict(os.environ)). - agent/secret_scope.py: get_secret() backed by a secret-scope contextvar. FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped read RAISES UnscopedSecretError instead of falling back to os.environ — a missed/new call site crashes loudly at that line rather than leaking a cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths, …) keep reading os.environ via an allowlist. load_env_file/build_profile_secret_scope parse a profile .env into an isolated dict WITHOUT mutating os.environ. Off by default => transparent os.getenv behavior. - hermes_cli/runtime_provider.py: all credential/provider/base-url reads go through _getenv -> get_secret. - agent/credential_pool.py: env fallbacks route through get_secret (the ~/.hermes/.env-first preference is preserved and already profile-correct via the home override). - tools/mcp_tool.py: MCP config interpolation resolves through get_secret, so a server's picks up the routed profile's value. - gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env reload is a no-op for credentials in multiplex mode (secrets come from the scope, not global env); _profile_runtime_scope context manager combines the HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in that scope (resolved via _resolve_profile_home_for_source) when multiplexing. Propagates into the agent worker thread for free via the existing copy_context() in _run_in_executor_with_context. Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove two profiles isolated, unscoped read raises, globals still read environ).	2026-06-19 07:34:15 -07:00
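The fail-closed lookup described in this commit might be sketched as follows. The names `get_secret`, `UnscopedSecretError`, and `set_multiplex_active` appear in the commit message; everything else (the `secret_scope` context manager, the allowlist contents, and the module-level flag) is a simplified assumption rather than the code in `agent/secret_scope.py`.

```python
import contextvars
import os
from contextlib import contextmanager


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read without a scope while multiplexing."""


_secret_scope = contextvars.ContextVar("secret_scope", default=None)
_multiplex_active = False

# Hypothetical allowlist of genuinely global variables.
_GLOBAL_PREFIXES = ("HERMES_",)
_GLOBAL_NAMES = {"PATH"}


def set_multiplex_active(active=True):
    global _multiplex_active
    _multiplex_active = active


def get_secret(name, default=None):
    if not _multiplex_active:
        return os.environ.get(name, default)  # transparent when off
    if name in _GLOBAL_NAMES or name.startswith(_GLOBAL_PREFIXES):
        return os.environ.get(name, default)
    scope = _secret_scope.get()
    if scope is None:
        # Fail closed: never fall back to the shared process environment.
        raise UnscopedSecretError(f"unscoped read of {name!r}")
    return scope.get(name, default)


@contextmanager
def secret_scope(values):
    """Install a per-profile secret dict for the current context."""
    token = _secret_scope.set(dict(values))
    try:
        yield
    finally:
        _secret_scope.reset(token)
```

Because the scope lives in a `ContextVar`, a context copied with `contextvars.copy_context()` carries it into a worker thread, which is consistent with the propagation note in the commit.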
Ben Barclay	d82f9fa7f7	feat(gateway): multiplex phase 0 — config flag, profile enumeration, profile-stamped session keys Foundations for serving multiple profiles from one gateway process, inert when off: - gateway.multiplex_profiles config flag (default false), round-trips through GatewayConfig and load_gateway_config (top-level + nested gateway.* form). - hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan; active-profile-only when off, default + all named profiles when on. - build_session_key gains a profile= namespace slot. Default/None reuse the historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration, positional parsers unaffected); a named profile becomes 'agent:<profile>:...' so two profiles on the same platform/chat never collide. - SessionStore._resolve_profile_for_key + _session_key_for_source fallback resolve the namespace from the flag (legacy when off, active profile when on). Tests: byte-identical-when-off (parametrized), namespace isolation, positional layout preserved, config round-trip, profiles_to_serve enumeration.	2026-06-19 07:34:15 -07:00
teknium1	1d59d2dcae	feat(desktop): resolve OAuth status for catalog-only account providers Accounts-tab cards derived from the unified provider_catalog() carry status_fn=None and had no hardcoded branch in _resolve_provider_status, so any future OAuth/account provider plugin rendered permanently logged-out. Fall through to the canonical hermes_cli.auth.get_auth_status slug dispatcher and adapt its shape, so membership AND status both auto-extend with the hermes model universe.	2026-06-19 07:26:46 -07:00
Austin Pickett	8fe7b52ebf	test(desktop): lock GUI⊇`hermes model` provider parity; surface Bedrock Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the `hermes model` universe) must be configurable on a desktop Providers tab — keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an invariant against the live endpoints so the GUI can never silently drift from the CLI again. Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock provider card. Anthropic is whitelisted as a legitimate dual-tab provider (direct API key + subscription OAuth). Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role as the override base for _build_oauth_catalog().	2026-06-19 07:26:46 -07:00
Austin Pickett	60dfa0f31b	feat(desktop): Accounts tab derives membership from unified provider catalog /api/providers/oauth now unions the explicit hand-tuned OAuth cards (_OAUTH_PROVIDER_CATALOG — bespoke flow/status/cli, plus the api-key Anthropic PKCE card and synthetic claude-code row) with every accounts-tab provider in provider_catalog(). Any OAuth/external provider in the `hermes model` universe now appears automatically, closing the drift where google-gemini-cli and copilot-acp had no Accounts card despite being CLI-configurable. Adds read-only status cards for google-gemini-cli (via existing get_gemini_oauth_auth_status) and copilot-acp (managed-by-CLI, like claude-code). DELETE handler routes through the same _build_oauth_catalog() builder. Parity test asserts the Accounts tab offers every accounts-tab catalog provider as an invariant.	2026-06-19 07:26:46 -07:00
Austin Pickett	3be1326f8d	feat(desktop): /api/env derives provider key membership from unified catalog The Keys tab now surfaces every keys-tab provider in provider_catalog() (the `hermes model` universe), synthesizing a card even when the env var has no hand entry in OPTIONAL_ENV_VARS. Closes the drift where openai-api, kilocode, novita, tencent-tokenhub, and copilot were CLI-configurable but invisible in the desktop Providers → API keys tab. Each provider row now carries backend-derived provider/provider_label grouping hints so the desktop can group by the same provider identity the CLI picker uses. Hand OPTIONAL_ENV_VARS prose still wins where present (enrichment, not a gate). Shared non-provider credentials (e.g. tool-category GITHUB_TOKEN) are explicitly not hijacked into a provider card — Copilot uses its provider-owned COPILOT_GITHUB_TOKEN.	2026-06-19 07:26:46 -07:00
Austin Pickett	054b8c82fd	feat: unified provider_catalog() — one source for CLI picker and desktop tabs Adds hermes_cli/provider_catalog.py, deriving one descriptor per provider from the CANONICAL_PROVIDERS universe (what `hermes model` renders, auto-extended from provider plugins), joined with auth/env from PROVIDER_REGISTRY and display metadata from ProviderProfile (with canonical/env fallbacks for the four profile-less providers and the many profiles with blank display/signup fields). Each descriptor is tagged with the desktop tab it belongs on (keys vs accounts) by auth_type. This is the single source of truth the desktop Providers tabs will derive membership from, so they can no longer drift from the CLI picker. Tests assert the parity contract (catalog == hermes model universe) and tab routing as invariants, not snapshots.	2026-06-19 07:26:46 -07:00
Alex Yates	fad4b40d9d	fix(model): persist /model switch by default across sessions A plain /model <name> switch only lasted for the current session — every new session reverted to the previously-configured model, so users had to re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch). Persist-by-default is now the behavior across all three /model surfaces (CLI, gateway, TUI/dashboard), gated by a new config key model.persist_switch_by_default (default true): /model <name> switch model (persists to config.yaml) /model <name> --session switch for this session only /model <name> --global switch and persist (explicit, unchanged) The effective persistence is resolved once via resolve_persist_behavior() in hermes_cli/model_switch.py so --session opts out, --global opts in, and the config-gated default applies otherwise. --global remains a valid explicit no-op alias for the new default.	2026-06-19 07:07:06 -07:00
OYLFLMH	c1ffd4c3b4	fix(cli): make refresh_interval configurable, default to 0 (disabled) Commit `6724daa2c` added refresh_interval=1.0 to keep the idle clock ticking, but unconditional 1 Hz redraws in non-fullscreen prompt_toolkit mode cause terminal emulators (Xshell, iTerm2, Windows Terminal) to auto-scroll to the bottom on every tick — breaking scroll-up to read history. Drive it from display.cli_refresh_interval (0 = disabled, the default) so users who want the ticking clock can opt in without affecting everyone. Fixes: #48309 Related: `6724daa2c`, `8972a151a`	2026-06-19 07:06:34 -07:00
kshitijk4poor	01a6f11896	fix(debug): include gui.log (dashboard/TUI/pty/websocket) in hermes debug share gui.log was registered in hermes_cli/logs.py::LOG_FILES (and surfaced by `hermes logs gui`) but was never wired into `hermes debug share`. The share report captured agent/errors/gateway/desktop tails plus full agent/gateway/ desktop logs — but nothing from gui.log, the surface the dashboard, TUI-over- PTY bridge, and websocket layer (hermes_cli.web_server / pty_bridge / tui_gateway) actually write to. A user reporting a dashboard or TUI bug shared zero breadcrumbs from the broken surface. Wire gui.log through all three share surfaces, matching the existing pattern: - _capture_default_log_snapshots(): capture the gui snapshot (redacted like the rest) - collect_debug_report(): add the gui.log summary tail block - build_debug_share(): pull gui full_text, prepend dump header + redaction banner, add to the upload loop - run_debug_share() --local branch: same, plus the local print block - _PRIVACY_NOTICE: name gui.log in both bullets Redaction is inherited for free — the gui snapshot goes through the same _capture_log_snapshot(..., redact=redact) path, so secrets are scrubbed in both the tail and full text (verified E2E: seeded key masked by default, passes through under --no-redact, raw token never leaks). Tests: seed gui.log in the fixture, add test_report_includes_gui_log, and bump the upload-count tripwire 4->5 (test_share_uploads_five_pastes).	2026-06-19 07:05:42 -07:00
Charles Power	fd92a3a5c9	fix(gateway): Windows restart no longer causes a silent outage `hermes gateway restart` on Windows could take the gateway offline with no replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful drain can run up to ~180s while the detached pythonw process stays alive. The 1s sleep let start() run against the still-draining old process; its "already running" guard then no-opped, and when the old process finally exited nothing relaunched it. Two root causes, both fixed: 1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers used substring matches ("... gateway" in cmdline) for lifecycle decisions, so they false-matched `gateway status`/`dashboard` siblings and unrelated processes like `python -m tui_gateway`, plus stale gateway.pid records. Add a shared strict matcher `looks_like_gateway_command_line()` in gateway/status.py that requires the real `gateway run` subcommand (or the dedicated entrypoints), and route `_looks_like_gateway_process`, `_record_looks_like_gateway`, and `_scan_gateway_pids` through it. 2. restart() race. Wait until the gateway is authoritatively gone (`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill once if it lingers and raise rather than start a duplicate; verify the relaunch produced a running gateway and raise loudly if not (no more exit-0 silent outage). Scoped to Windows; systemd/launchd restart paths are already drain-aware. Adds tests/gateway/test_gateway_command_line_matcher.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 06:31:56 -07:00
xxxigm	e738c08336	fix(backup): exclude regeneratable dependency and cache dirs `hermes backup` walked every file under HERMES_HOME, excluding only hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv tool caches that live under HERMES_HOME were swept in file-by-file, ballooning a backup to hundreds of thousands of entries that crawl for hours — the reported "backup stuck for days / 426543 files" symptom. Add the canonical regeneratable-dir names (.venv, venv, site-packages, .tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's exclusion set, used by both run_backup and the pre-update/pre-migration _write_full_zip_backup. .archive is intentionally left in so the curator's restorable archived skills still get backed up. Tests cover each new dir name (excluded at any depth), that .archive and cache-resembling files are kept, and an integration check that a planted venv/site-packages/cache is pruned from the actual backup zip while skills/config survive.	2026-06-19 14:37:41 +05:30
kshitijk4poor	1ab6f34791	refactor(dashboard): align Slack allowlist validation with gateway parse - Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or interior comma (which the gateway silently tolerates in gateway/platforms/slack.py) is no longer rejected at the dashboard. - Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant and note it stays in sync with the frontend SLACK_MEMBER_ID_RE. - Add a regression test for the trailing-comma case.	2026-06-19 12:22:30 +05:30
kshitijk4poor	83c034bd5b	fix(dashboard): accept Slack allow-all wildcard in allowed-users validation The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth, slash-confirm, and approval-button paths). Accept '*' as a valid list entry in both the API validator and the dashboard form so a value the runtime honors is no longer blocked at setup.	2026-06-19 12:18:15 +05:30
Shannon Sands	d9190491a6	Add Slack setup hints and field validation	2026-06-19 12:16:23 +05:30
Shannon Sands	f741e70791	Add Slack allowed users setup field	2026-06-19 12:16:23 +05:30
kshitij	6278bca055	Merge pull request #48259 from NousResearch/fix/ns501-multipart-upload-salvage fix(dashboard): clean up upload temp file on client disconnect + pin python-multipart (NS-501)	2026-06-19 12:03:58 +05:30
Shannon Sands	12dfcfdf73	fix(tui): restart dashboard chat on idle exit hotkeys	2026-06-19 12:02:22 +05:30
AhmetArif0	245b95b094	fix(terminal): block gateway lifecycle commands from inside the gateway process systemctl --user restart hermes-gateway run via the terminal tool is a child of the gateway itself. When systemd delivers SIGTERM the gateway kills this subprocess before it can complete, so the service may never restart — reproducing issue #37453. The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the cron-path guard (hermes_cli/cron.py) already block equivalent commands in their respective paths but the terminal tool had no such defense. Add a hard-block before command execution in terminal_tool: when _HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command, return an error immediately. force=True cannot bypass it — unlike the normal dangerous-command approval flow, here even a user-approved restart would fail because the SIGTERM propagates to child processes. Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags (e.g. systemctl --user restart) — the previous regex required the action word immediately after systemctl with no flags in between. Adds 9 regression tests: 6 blocked variants (parametrized), force bypass attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.	2026-06-19 11:53:44 +05:30
Ben	637aff46e7	Merge remote-tracking branch 'origin/main' into hermes/hermes-6fe26723	2026-06-19 15:17:13 +10:00
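
The persistence rule described in commit fad4b40d9d (--session opts out, --global opts in, the config-gated default applies otherwise) can be sketched roughly as below. This is an illustration only: the function name `sketch_resolve_persist` and its parameters are hypothetical and are not the signature of the real `resolve_persist_behavior()` in hermes_cli/model_switch.py. The precedence when both flags are passed is not stated in the commit message; the sketch assumes --session wins.

```python
def sketch_resolve_persist(
    session_flag: bool,
    global_flag: bool,
    persist_by_default: bool = True,
) -> bool:
    """Return True when a /model switch should be written to config.yaml.

    Hypothetical helper; mirrors the rule in the commit message, not the
    actual hermes_cli implementation.
    """
    # --session: switch for this session only, never persist.
    # ASSUMPTION: --session takes precedence if both flags are given.
    if session_flag:
        return False
    # --global: explicit opt-in, persists regardless of the config default.
    if global_flag:
        return True
    # Plain /model <name>: follow model.persist_switch_by_default.
    return persist_by_default
```

With the default config, a plain switch persists, and --global behaves as a no-op alias for that default, which matches the wording of the commit message.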

2891 commits
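
The profile-stamped session keys from commit d82f9fa7f7 can be illustrated with a minimal sketch. The commit message only gives the key prefix ('agent:main:...' for the default profile, 'agent:<profile>:...' for a named one); the trailing `platform` and `chat_id` fields, the function name `sketch_session_key`, and the treatment of a profile literally named "default" are assumptions made for illustration, not the real `build_session_key` layout.

```python
from typing import Optional


def sketch_session_key(
    platform: str,
    chat_id: str,
    profile: Optional[str] = None,
) -> str:
    """Build an illustrative session key with a profile namespace slot.

    Hypothetical helper; only the 'agent:<namespace>:' prefix comes from
    the commit message. The remaining fields are placeholders.
    """
    # Default/None keep the historical 'main' namespace so existing keys
    # stay unchanged and need no migration.
    # ASSUMPTION: the default profile is identified by the name "default".
    namespace = "main" if profile in (None, "default") else profile
    return f"agent:{namespace}:{platform}:{chat_id}"


legacy = sketch_session_key("telegram", "42")
named = sketch_session_key("telegram", "42", profile="work")
```

Here `legacy` keeps the 'agent:main:' prefix while `named` starts with 'agent:work:', so two profiles on the same platform and chat produce distinct keys.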