hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-31 19:16:29 +00:00

Author	SHA1	Message	Date
teknium	98ae28657f	feat(display): document and test memory_notifications setting Follow-up to salvaged PR #4684: - Add display.memory_notifications to DEFAULT_CONFIG (off\|on\|verbose, default on) - Document the setting in docs/user-guide/features/memory.md - Add resolver tests for off/on/verbose memory + skill paths	2026-06-16 05:45:40 -07:00
Wolfram Ravenwolf	4cf9d80fba	feat(display): verbose skill change notifications with content previews When display.memory_notifications is set to 'verbose', skill_manage notifications now show meaningful change details instead of just the generic tool message. Before (verbose mode): 💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement). After (verbose mode): 💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..." Changes: - skill_manager_tool.py: _patch_skill() now includes old/new string previews (truncated to 200 chars) in the result via '_change' key. _create_skill() and _edit_skill() include skill description from frontmatter for verbose create/edit notifications. - run_agent.py: Background review notification builder now reads the '_change' dict from skill tool results and formats descriptive notifications per action type (patch → old→new diff, create/edit → description preview). Falls back to generic message when _change data is unavailable (backwards compatible). This is especially useful when subagents patch skills, since neither the user nor the parent agent can see what the subagent changed.	2026-06-16 05:45:40 -07:00
Teknium	a6364bfa08	fix(telegram): edit streamed previews in place as rich (Bot API 10.1) (#46890 ) Streamed Telegram replies that finalize through editMessageText were converted to MarkdownV2, which has no table syntax and rewrites pipe tables into bullet lists — users saw a table while streaming that collapsed to a list at the last moment. Finalize now edits the existing preview IN PLACE via Bot API 10.1's editMessageText rich_message parameter when the content has constructs the legacy path degrades (tables, task lists, <details>, block math). No fresh send + delete, so no duplicate-preview flicker — the reason #46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming stays False; the in-place edit replaces it. - _needs_rich_rendering(): rich reserved for table/task-list/details/math (adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2. - _try_edit_rich(): editMessageText + rich_message via do_api_request, mirroring _try_send_rich's fallback/latch/transient contract. - edit_message finalize tries rich in place before the 4,096 overflow pre-flight (rich cap is 32,768), falling back to legacy on rejection. - rich_messages default flipped back to True (DEFAULT_CONFIG + adapter). - docs (en + zh-Hans) + cli-config example updated to default-on. Closes the root cause behind #45911 / #46009.	2026-06-16 05:26:04 -07:00
underthestars-zhy	5b3fa26366	fix(photon): unify project identifiers and update documentation for Spectrum provisioning Co-Authored-By: Marvin <marvin@photon.codes> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 05:25:56 -07:00
brooklyn!	c6b0eb4de0	fix(desktop): open remote-gateway artifacts via authenticated download (#46895 ) Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run Details Typecheck / typecheck (apps/desktop) (push) Waiting to run Details Typecheck / typecheck (apps/shared) (push) Waiting to run Details Typecheck / typecheck (ui-tui) (push) Waiting to run Details Typecheck / typecheck (web) (push) Waiting to run Details Typecheck / desktop-build (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Has been cancelled Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Has been cancelled Details OSV-Scanner / Scan lockfiles (push) Has been cancelled Details uv.lock check / uv lock --check (push) Has been cancelled Details On a remote gateway connection, agent-written files live on the gateway host, not the desktop's disk, so the Artifacts view's file:// hrefs failed ("Invalid external URL") and image thumbnails broke. Make mediaExternalUrl() remote-aware in one place: in remote mode it rewrites gateway-local paths to GET /api/files/download (a new endpoint that streams the file as a Content-Disposition: attachment). The artifacts view now resolves through it, and so do the existing chat-media and generated-image callers, for free. The download endpoint stays auth-gated; auth_middleware additionally accepts the session token as a ?token= query param for this one path so a shell/browser-opened download (which can't set the session header) still authenticates — the same query-token tradeoff as the /api/pty WebSocket. It is NOT added to PUBLIC_API_PATHS. Salvages #46663 (which carried ~19k lines of CRLF noise and made the endpoint public). Reimplemented on a clean LF base with the security hole closed and tests added. Co-authored-by: qingshan89 <qs2816661685@gmail.com>	2026-06-15 23:50:19 -05:00
Gille	0441b7f19f	fix(desktop): route global remote profile REST calls (#47011 ) * fix(desktop): route global remote profile REST calls * fix(dashboard): scope oauth provider routes by profile * test(tui): isolate notification poller queue	2026-06-15 23:24:55 -05:00
Shannon Sands	7cd71de1f4	Simplify dashboard update detection to containers	2026-06-15 20:08:39 -07:00
Shannon Sands	b1d6a57883	Detect containerized dashboard update management	2026-06-15 20:08:39 -07:00
Shannon Sands	0b6b29a30c	Hide hosted dashboard update controls	2026-06-15 20:08:39 -07:00
Teknium	2dbc3bd937	fix(skills): guard recursive skill delete against tree-escape (#46929 ) Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire working directory: a built-in-skill sentinel location resolved to the server cwd and the skill-removal endpoint ran a recursive delete on it. Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the agent-facing skill_manage(action='delete') path did a bare shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target(): refuse to rmtree a path that (1) isn't strictly inside a known skills root, (2) is a skills root itself, or (3) is reached via a symlink/junction. Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree all refused). E2E verified with real symlink + file I/O.	2026-06-15 17:14:59 -07:00
Teknium	5a0e0d35b9	fix(mattermost): preserve thread-local delivery hygiene Salvage the valid thread-routing pieces from #41640: - route Mattermost progress/status sends through metadata thread IDs - treat top-level Mattermost channel posts as thread roots for progress - preserve thread metadata through media/file sends - allow flat fallback only for final notify-worthy replies on confirmed broken roots Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>	2026-06-15 15:06:23 -07:00
kshitij	d2b34e89b0	Merge pull request #44431 from erosika/feat/honcho-identity-tree feat(honcho): gateway-gated identity tree + canonicalize on pinUserPeer	2026-06-16 03:35:24 +05:30
kshitij	cffd6e3c8d	Merge pull request #46078 from xxxigm/fix/discord-slash-command-100-cap fix(discord): cap slash commands at Discord's 100-command limit	2026-06-16 02:05:31 +05:30
Teknium	c66ecf0bc3	feat(delegation): async background subagents via delegate_task(background=true) (#40946 ) * feat(delegation): async background subagents via delegate_task(background=true) delegate_task(background=true) dispatches a subagent that runs in the background and returns a handle immediately, so the user and model keep working while it runs. The full result — plus the original task source — re-enters the conversation as a new turn when the subagent finishes, riding the same completion-queue rail as terminal background processes. - tools/async_delegation.py: daemon-executor registry, capacity cap, rich self-contained completion event pushed onto the shared process_registry.completion_queue (type='async_delegation'). - delegate_tool.py: background param + single-task dispatch branch; batch async rejected (v1). - process_registry.py: format_process_notification renders the rich task-source block (goal/context/toolsets/model/status/result). - gateway/run.py: dedicated _async_delegation_watcher drains + injects results into the originating session (idle + post-turn), session_key routing enrichment, shutdown interrupt of dangling delegations. - config: delegation.max_async_children (default 3). Reuses the existing idle-drain wiring rather than mutating a running agent loop, preserving message-role alternation and prompt-cache invariants. 13 targeted tests; CLI + gateway paths E2E-verified. * test(delegation): make async non-blocking tests environment-independent CI 'test (5)' flaked on a cold, 8-worker runner: the first delegate_task(background=true) call measured 2.27s of one-time setup (config load + child-agent construction + imports), tripping the elapsed < 1.0 wall-clock assertion. That assertion was testing setup overhead, not blocking. Replace the wall-clock thresholds with the real invariant: dispatch returns while the child is still gated (active_count == 1, completion queue empty), which a synchronous impl could not do. Keep only a loose 4s sanity backstop well under the runner's 5s gate. * fix(delegation): harden async background delegation Follow-up review fixes: - Detach background child from parent._active_children at dispatch — otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache evicts (release_clients), and session close (/new) kill/close the detached subagent mid-run, defeating the point of background mode. Lifecycle is owned by the async registry's interrupt_fn. - Make the capacity check atomic with the record insert (TOCTOU: two concurrent dispatches could both pass active_count() and exceed the cap). - TUI dedup: key async_delegation events by delegation_id — the fallthrough keyed them all as ("", type), suppressing every completion after the first in the desktop/TUI status feed. - CLI /stop now interrupts running background delegations and /agents lists them (they live outside the process registry and were invisible). - Drop stray unbalanced ']' line from the re-injection block and the unused _ASYNC_DEFAULT import. Tests: detach-at-dispatch + concurrent-capacity race added (15 total in test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch + 7 TUI dedup tests pass. * fix(delegation): harden async background completion drains	2026-06-15 13:33:12 -07:00
ethernet	39f479cba8	Merge pull request #46085 from xxxigm/fix/bundled-node-global-npm-path fix(install): make `npm install -g` packages reachable on PATH	2026-06-15 15:47:54 -04:00
Austin Pickett	ed20f5ed06	fix(desktop): let explicit model switches escape broken config providers (#42241 ) (#46796 ) When a desktop/dashboard session had no agent built yet and the user explicitly picked a provider in the model picker, config.set('model', ...) would first try to initialize the agent from the (possibly broken) config default provider — failing before the user's explicit switch could take effect, trapping them on a misconfigured default. config.set now pre-parses the model flags: if an explicit --provider is present and no agent exists yet, it skips the default-provider agent build and routes straight through _apply_model_switch with the explicit provider. _apply_model_switch gained a parsed_flags passthrough (avoids double-parsing) and only falls back to resolve_runtime_provider(requested=None) when no explicit provider was given. The desktop hook now sends config.set instead of slash.exec for active-session model changes, so errors from the selected provider surface to the user instead of being swallowed. Co-authored-by: rodboev <rod.boev@gmail.com>	2026-06-15 15:36:51 -04:00
xxxigm	2a08b8c86f	test(dump): cover terminal backend override reporting Verifies `hermes debug` surfaces a TERMINAL_ENV override of terminal.backend, reports the config value when no override is present, and emits no spurious note when env and config agree.	2026-06-15 12:31:23 -07:00
liuhao1024	60cc42e38b	fix(inventory): deduplicate models between user-defined and aggregator providers When a user-defined provider (e.g. litellm-proxy) and an aggregator (e.g. openrouter) both advertise the same model name, the Desktop/TUI model picker would show the model under both groups. Selecting it from the aggregator row silently set model.provider to the aggregator, breaking calls because the aggregator doesn't actually serve that model ID. Fix: after list_authenticated_providers() returns, collect all models from user-defined provider rows and filter them out of aggregator rows. Uses is_aggregator() from hermes_cli/providers.py to identify aggregators. Case-insensitive matching. Fixes #45954	2026-06-15 12:25:41 -07:00
liuhao1024	9df1a1a8de	fix(doctor): recognize nvidia as vendor-slug-accepting provider NVIDIA NIM API uses vendor-prefixed model IDs (e.g. qwen/qwen3.5-122b-a10b, nvidia/nemotron-3-super-120b-a12b). The doctor command incorrectly warns that vendor-prefixed slugs belong to aggregators like openrouter when nvidia is the configured provider. Add 'nvidia' to the providers_accepting_vendor_slugs set so doctor no longer raises false-positive warnings for valid NVIDIA NIM configurations. Fixes #35425	2026-06-15 12:24:46 -07:00
Austin Pickett	5f6be7f31b	fix(teams): package Microsoft Teams SDK as an installable extra (salvage #43945 ) (#46764 ) * fix(teams): package Microsoft Teams SDK as an installable extra The Teams adapter imports the microsoft-teams-apps SDK, but it was never declared as a dependency, so source/local installs hit ImportError and the adapter silently reported the SDK as unavailable. Add a 'teams' extra (microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'. Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added to [all] (they would break every fresh install on a quarantined release); the teams extra is installed on demand like the other platform backends. Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> * chore: map rio-jeong contributor email for attribution (#43945) * feat(teams): lazy-install the Teams SDK on demand (parity with other channels) The teams extra alone left Teams as the only messaging platform that wouldn't auto-install its SDK — every other channel (telegram, discord, slack, matrix, dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring Teams to parity: - Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp). - Replace the passive 'check_teams_requirements = check_requirements' alias with a real lazy-installer that calls ensure_and_bind('platform.teams', ...), rebinding all Teams SDK globals on success (mirrors check_slack_requirements). - Call check_teams_requirements() at the top of TeamsAdapter.connect() so enabling Teams installs the SDK on demand. - Keep the passive check_requirements() as the registry check_fn so 'gateway status' probes never trigger a pip install. The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'. Tests: rework the alias test into shortcircuit + lazy-install assertions, and update test_connect_fails_without_sdk to simulate an uninstallable SDK. --------- Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-15 14:35:15 -04:00
Austin Pickett	0bbf325a8f	fix(dashboard): scope chat sidebar model card to selected profile (#46665 ) * fix(dashboard): scope chat sidebar model card to selected profile The PTY already honors ?profile= on profile switch, but the JSON-RPC sidecar created sessions against the dashboard launch profile. Pass the management profile through session.create and reconnect on switch. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): sync active profile with management scope Align the sidebar switcher with the sticky active profile on load and when "Set as active" is clicked, so Chat and management pages match what the Profiles page shows as active. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): auto-reconnect chat sidebar on profile switch Bump the sidecar connection version when profile or PTY channel changes, matching the manual Reconnect path so gateway and events sockets come back without clicking the error banner. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): prevent model selector chevron overlapping label Use inline flex layout instead of Button suffix, which is absolutely positioned and overlapped truncated model names at px-0. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-15 12:50:19 -04:00
xxxigm	f02484feba	test(deps): guard @assistant-ui cluster on one tap version Lockfile invariant that would have caught the desktop build break: the single hoisted @assistant-ui/tap must satisfy every @assistant-ui/* package's declared tap requirement (deps or non-optional peer). It is a contract, not a snapshot -- no hardcoded versions -- so it stays green across routine bumps but fails the moment the cluster splits its tap requirement again.	2026-06-15 11:55:02 -04:00
Teknium	3e7e9b24d4	fix: harden salvaged session and browser improvements Polish salvaged contributor work before PR review: - read browser inactivity timeout from config with documented fallback - skip redundant v10 trigram backfill before v11 FTS rebuild - show delegate_task goals safely in progress previews - show gateway status model/context without redundant token wording - wire gateway /sessions to shared session-listing helpers - map Ravenwolf author emails for release attribution Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de> Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>	2026-06-15 07:46:34 -07:00
Wolfram Ravenwolf	ead38107a2	feat(status): restore model and context in gateway status PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat. SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative. Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py	2026-06-15 07:46:34 -07:00
Amy Ravenwolf	2f2e3616b4	fix(config): read browser inactivity timeout from config	2026-06-15 07:46:34 -07:00
Teknium	49e743985a	fix: route minimax m3 reasoning controls through profile Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes. Co-authored-by: goku94123 <gooku94123@gmail.com>	2026-06-15 07:08:43 -07:00
Teknium	be7c919bf9	fix(process): label background completion causes (#46659 ) Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.	2026-06-15 07:08:24 -07:00
Teknium	733472952a	fix: complete cron jobs lock salvage Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.	2026-06-15 06:29:00 -07:00
CiarasClaws	e5b4cf7bea	fix(cron): make jobs.json writes safe across processes `hermes cron pause`/`resume`/`remove` run in their own CLI process (CLI → cronjob tool → pause_job → update_job → save_jobs), entirely separate from the gateway process that also writes jobs.json (mark_job_run, advance_next_run, due-fast-forward in get_due_jobs). The only synchronization was a module-level `threading.Lock`, which serializes writers within a single process but does nothing across processes — and update_job/pause_job/remove_job/create_job did not even take it. The result is a classic lost update: a `cron pause` issued while the gateway is live loads jobs.json, sets enabled=False, and saves; concurrently the gateway loads the same file and saves back its run-bookkeeping, clobbering the pause. The CLI prints "Paused" (it succeeded against its own in-memory copy) but the job stays enabled and keeps firing, with no error surfaced. The scheduler's `.tick.lock` flock can't be reused for this — it is held for the entire tick, including multi-minute agent runs, so a CLI mutation would block for minutes. Add `_jobs_lock()`: a short-held cross-process advisory file lock (fcntl/msvcrt flock on `<hermes_home>/cron/.jobs.lock`) layered over the existing in-process lock, and wrap every load→modify→save critical section with it — create_job, update_job, remove_job, mark_job_run, advance_next_run, get_due_jobs, rewrite_skill_refs. The lock degrades to in-process-only if neither fcntl nor msvcrt is available, preserving prior behaviour. All critical sections are short (field edits, no agent execution), so contention resolves in milliseconds. Adds a regression test that proves the lock excludes a second process (an in-process threading.Lock cannot). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 06:29:00 -07:00
FT_IOxCS	92a456f711	fix(cli,deps): clear esbuild audit loop Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.	2026-06-15 06:18:27 -07:00
Teknium	0d82060c74	fix: harden WhatsApp target alias salvage Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.	2026-06-15 05:51:47 -07:00
Keiron McCammon	ea49a79633	fix(messaging): route WhatsApp group JIDs to the target, not the home DM send_message(target="whatsapp:<group-jid>") silently delivered to the configured home DM instead of the requested group. Two gaps: 1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us), user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and broadcast/newsletter JIDs matched no pattern and fell through to `return None, None, False`, so the caller treated them as unresolvable and used the home channel. The bridge's /send endpoint accepts any chatId, so only the tool-side target parsing was at fault. Add a whatsapp branch that recognizes native JIDs as explicit targets. The pre-existing '+'-prefixed E.164 path is preserved. 2. WhatsApp groups have no human-friendly name — the channel directory is regenerated from session data on a timer, so a group shows up as its raw 18-digit JID and any hand-edit to channel_directory.json is clobbered on the next rebuild. Add a user-maintained alias overlay (~/.hermes/channel_aliases.json) re-applied on every build AND every load, giving durable friendly names and letting a freshly-created group be pre-named before its first message. Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser; TestChannelAliases (7 cases) for the overlay, plus an autouse fixture isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the existing directory tests.	2026-06-15 05:51:47 -07:00
Veritas-7	febdddb41a	fix(auth): refresh xAI OAuth tokens earlier	2026-06-15 05:40:23 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
Teknium	a688d2a1bd	test: assert disk cleanup prunes protected walks	2026-06-15 05:25:27 -07:00
墨綠BG	40699c3292	🐛 fix(disk-cleanup): avoid brittle sweep review issues	2026-06-15 05:25:27 -07:00
墨綠BG	c1a70a5439	🐛 fix(disk-cleanup): prune protected cleanup walks	2026-06-15 05:25:27 -07:00
liuhao1024	2cddc9c895	fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream converse() and converse_stream() were added in boto3 1.34.59. When Hermes is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46), the system boto3 takes precedence and calls to converse_stream fail with AttributeError. Add an early version check in _require_boto3() that raises a clear RuntimeError with upgrade instructions.	2026-06-15 05:25:17 -07:00
Tharushka Dinujaya	ec05d2bc3e	fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway On Linux, systemd spawns core services (cron, nginx, sshd) with deterministic PIDs and jiffy start_times across reboots. A service can land on the exact same PID and start_time as a previous gateway, causing acquire_scoped_lock to mistake it for a live gateway and block startup. The existing stale-detection paths only covered: - start_times both non-None and different (clear mismatch) - start_times both None (macOS/Windows fallback to cmdline check) The boot-time collision falls through both: times are non-None and equal, so neither branch fired. Add a third check: when both start_times are known and match but the live process fails _looks_like_gateway_process, read its cmdline. If the cmdline is readable (non-None), we have positive evidence of an impostor and mark the lock stale. Requiring a readable cmdline keeps the check conservative — if cmdline is unreadable we do not evict.	2026-06-15 05:25:07 -07:00
Nicolò Boschi	a376ca0081	feat(hindsight): make observation scopes configurable on retain Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES env var) so retained memories can opt into per_tag / all_combinations / custom scoping instead of Hindsight's default combined pass. Threaded through _build_retain_kwargs so all three retain paths honor it: auto-retain and flush-on-switch already use aretain_batch; the tool retain path is switched from aretain to aretain_batch (functionally equivalent, aretain just wraps a single-item batch) since aretain doesn't accept the observation_scopes parameter.	2026-06-15 04:59:17 -07:00
kshitijk4poor	497352bc4e	fix(auth): write rotated xAI OAuth tokens back to global root (#43589 ) The salvaged read-side fix lets a profile resolve the xAI OAuth grant from the global-root auth store when it has no own providers.xai-oauth block. But _save_xai_oauth_tokens still wrote rotated tokens only to the active profile store. Because xAI rotates the refresh_token on every refresh, a profile that reads root's grant and refreshes it left root holding a now- revoked refresh token — killing every other profile reading the stale root grant with invalid_grant once its access token expired (#43589). Detect the read-from-root case (profile lacks its own providers.xai-oauth block) and, after the profile save, write the rotated chain back to the global root too via a best-effort, TOCTOU-safe write-through that reuses _save_auth_store with an explicit target path. A profile that genuinely shadows root (has its own block) is left untouched, classic mode is a no-op, and a failed root write never breaks the profile's own save. Pairs with the read fallback in the preceding commit so the cross-profile xAI grant stays coherent in both directions.	2026-06-15 17:08:19 +05:30
Andrew Walker	f1d6f04362	fix(auth): resolve xAI OAuth credentials across profiles (cherry picked from commit `8d8b9f50e4`)	2026-06-15 17:03:35 +05:30
helix4u	dcc3216955	fix(mcp): fail fast for noninteractive oauth without tokens	2026-06-15 04:22:07 -07:00
Teknium	aca11c227e	fix(docker): skip gateway reconciliation in dashboard container (autodetect) (#46293 ) * fix(docker): skip per-profile gateway reconciliation in dashboard container When gateway and dashboard containers share a bind-mounted HERMES_HOME, both run the cont-init.d profile reconciliation script, which creates s6-log processes for every persisted profile. These s6-log processes in different containers race to flock() the same log-directory lock files under logs/gateways/<profile>/lock, producing repeated "s6-log: fatal: unable to lock ... Resource busy" errors and a supervision restart storm. Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py and set it in the official docker-compose.yml dashboard service so the dashboard container no longer creates per-profile gateway s6 services it never uses. * chore(release): map salvaged contributor * refactor(docker): autodetect dashboard container instead of env-var gate Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role detection. A dashboard-only container never spawns or supervises per-profile gateways, so the reconcile boot hook now skips itself when /proc/1/cmdline is the dashboard command — no operator flag to set (or forget in a hand-written manifest, which would reintroduce the s6-log flock storm this prevents). - Extract _strip_container_argv_prefix() shared by the legacy-gateway and new dashboard detectors (DRY the init/wrapper/hermes peel). - Add _is_dashboard_container(); gate reconcile main() on it. - Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml. - Tests: argv matrix for both roles + main()-level skip/reconcile proof and a regression that the removed env var is now inert. Co-authored-by: 895252509 <895252509@qq.com> --------- Co-authored-by: zhouxiang <895252509@qq.com> Co-authored-by: Ben <ben@nousresearch.com>	2026-06-15 20:51:48 +10:00
Gille	d6a8d9dcab	fix(tools): respect session cwd in file tools	2026-06-15 14:00:42 +05:30
Ben Barclay	95715dcb03	fix(s6): reserved default gateway must not follow sticky active_profile (#46483 ) The supervised `gateway-default` s6 slot runs bare `hermes gateway run` (no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override` falls through its #22502 HERMES_HOME guard for the container root (/opt/data, whose parent is not `profiles`) and reads the sticky `active_profile` file. If the user set another profile active (e.g. via the dashboard), the reserved default gateway gets redirected into that profile — producing a duplicate gateway for the active profile and no real default gateway. The profile page and `gateway status` then correctly report default as "not running" because there genuinely isn't one. Guard step 2 (the sticky active_profile fallback) with the existing HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already exports. Supervised named-profile slots pass -p explicitly (step 1, never reaches step 2); only the bare default slot was affected. Inert outside the s6 container — the sentinel is never set elsewhere. Reported in the 'Docker & Profiles & Dashboard' support thread.	2026-06-15 05:36:20 +00:00
Ben Barclay	80f8ffc74c	fix(dashboard): pin machine-dashboard reroute to the machine root, not $HOME/.hermes (#46487 ) The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile dashboard launch as the machine dashboard and dropped HERMES_HOME from the child env with the comment "so the child binds the machine root". That holds for a standard install (root == ~/.hermes) but breaks the Docker layout: the published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty, auto-seeded home. Two user-visible symptoms, one root cause (reported via support): 1. Dashboard Profiles page shows only an empty `default` — the real default/oracle/saga profiles live under /opt/data/profiles, but the rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles. 2. The "Update Hermes" button runs `hermes update` inside the container repeatedly instead of bailing with the docker-update guidance. The Docker guard keys off detect_install_method(), which reads $HERMES_HOME/.install_method; the image stamps that at /opt/data, but the misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls through to "pip", so the guard never fires. The reporter's workaround was to bind-mount the host dir at both /opt/data and /opt/data/.hermes so the two paths converge (at the cost of a self-referential recursion). Fix: resolve the machine root explicitly with get_default_hermes_root() and set it on the child env instead of popping HERMES_HOME. That helper returns the root for both layouts — ~/.hermes for a standard install, and /opt/data for Docker (it strips a trailing profiles/<name>). Falls back to the old pop behaviour only if root resolution raises, so the reroute is never blocked. Regression tests in test_dashboard_unified_launch.py: the existing standard- install test now asserts the child carries HERMES_HOME == get_default_hermes_root() (not absent), and a new test_reexec_pins_docker_machine_root covers the Docker layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both fail against the pre-fix pop behaviour (mutation-verified).	2026-06-15 15:33:15 +10:00
Teknium	b770967263	fix(s6): persist profile gateway desired state (#46292 ) * fix: persist s6 gateway desired state * chore(release): map salvaged contributor --------- Co-authored-by: Alfred Smith <alfred@my-cloud.me> Co-authored-by: Ben <ben@nousresearch.com>	2026-06-15 14:02:10 +10:00
Teknium	61ee2dbfdb	fix(s6): make profile gateway log parent writable (#46291 ) * fix(gateway): chown logs/gateways parent so late-added profiles can log The per-profile log service script created $HERMES_HOME/logs/gateways/ via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When the first log service boots in root context, the gateways/ parent stays root:root; every profile registered later runs its log service as the dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a sub-second fatal crash-loop flooding the container log. The stage2 recursive heal does not catch it either: it is gated on needs_chown, which is false when the top-level $HERMES_HOME is already hermes-owned. Two complementary fixes: - service_manager._render_log_run: chown the gateways/ parent (non-recursively) before the leaf chown. Runs on every root-context boot, so it also heals volumes already poisoned by older images. - docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p block; cont-init runs before any service starts, so the parent already exists hermes-owned when the first log/run does 'mkdir -p'. The needs_chown repair loop needs no twin entry: it already chowns logs/ recursively, which covers logs/gateways. Fixes #45258 * chore(release): map salvaged contributor --------- Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>	2026-06-15 13:47:05 +10:00
Teknium	40d7c264f0	fix(s6): register profile gateways without auto-starting (#46266 ) * fix(s6): prevent profile create from auto-starting gateway service When hermes profile create runs inside an s6 container, _maybe_register_gateway_service() calls register_profile_gateway() which creates the service directory and triggers s6-svscanctl -a. Previously the service always started immediately, causing profiles that share the main gateway's bot token (e.g. Kanban worker profiles) to fail with a token-lock conflict and persist gateway_state: running — becoming zombies that resurrect on every container restart. Wire the existing start_now parameter through the S6 implementation: when start_now=False, write a marker file (same pattern as container_boot.py _register_gateway_slot) so s6-supervise leaves the service stopped until the user explicitly runs hermes -p <profile> gateway start. 4 files, +61/-6, 4 new tests (all passing). * test(docker): wait for gateway running state before restart --------- Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-15 11:43:23 +10:00

1 2 3 4 5 ...

5543 commits