hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
Teknium	5b5c79a8ef	feat(kanban): typed block reasons + unblock-loop breaker (#52848) * feat(kanban): typed block reasons + unblock-loop breaker Stops the kanban blocked-task loop: a worker blocks a task, a cron unblocks it, the worker re-blocks for the same reason, repeat forever. block_task now takes a typed kind and a persistent block_recurrences counter on the tasks table: - kind=dependency routes to todo (parent-gated, auto-resumed), never the human 'blocked' bucket a cron would keep unblocking. - needs_input/capability/transient/untyped land in blocked; each same-cause re-block after an unblock increments block_recurrences, and at BLOCK_RECURRENCE_LIMIT (default 2) the task routes to triage for a human instead of blocked. - unblock_task no longer resets block_recurrences (the amnesia that let the loop run unbounded); complete_task clears it on success. Wired through the worker kanban_block tool (new kind arg) and the hermes kanban block --kind CLI flag, both reporting where the task actually landed. Docs + 11 new tests; 536 existing kanban tests green. * test(kanban): make second-block notify test use a distinct block cause test_notifier_second_blocked_delivers blocked the same task twice with the same (untyped) reason, which now trips the new unblock-loop breaker and routes the second block to triage instead of blocked — so only one 'blocked' notification fired. The test's actual intent is that TWO distinct block cycles each notify; give the two cycles different kinds (needs_input then capability) so they're genuinely separate blocks. The same-cause loop→triage path is covered by test_kanban_block_kinds.py.	2026-06-25 21:46:58 -07:00
Teknium	0b7128582f	fix(state): detect and repair FTS write corruption that silently drops gateway history (#52798) A readable state.db can still reject every message write through the messages_fts* triggers when the FTS5 index is corrupt: base-table reads and PRAGMA integrity_check pass, but INSERT INTO messages fails with 'database disk image is malformed'. The gateway reloads conversation_history from disk each turn, so a silently-failed write hands the next turn stale/empty history even though the same cached AIAgent still holds the live transcript — causing immediate same-session amnesia. (#50502) - hermes_state.py: _db_opens_cleanly() now drives a rolled-back message write through the FTS triggers, so write-only corruption (which the read-only probe reported healthy) is detected. repair_state_db_schema() gains an in-place FTS5 'rebuild' strategy (tier 0) before the dedup/drop tiers, plus an already_healthy short-circuit. Both 'hermes sessions repair' and 'hermes doctor' route through these, so the fix covers the whole class. - hermes_cli/doctor.py: the state.db check runs the write-health probe even on the success (readable) path and repairs in place with --fix. - gateway/run.py: _select_cached_agent_history() prefers the cached agent's longer live _session_messages over a shorter persisted transcript, so an FTS write failure can't wipe in-session context. - tests: regressions for write-health detection, in-place repair preserving rows + resuming writes, the already_healthy shortcut, and the gateway guard. Combines the approaches from #50504 (@0-CYBERDYNE-SYSTEMS-0, issue author), #52165 (@davidgut1982), and #50576 (@trevorgordon981).	2026-06-25 21:18:41 -07:00
liuhao1024	56cf517ccd	fix(cron): detect partial job loss in restore_cron_jobs_if_emptied (#52144) The desktop scheduler can overwrite cron/jobs.json with its own small set of internally-tracked crons after an update/restart, causing partial loss of tool-created cron jobs. The previous guard only checked for total loss (live_count == 0), missing the case where live_count > 0 but less than the pre-update snapshot count. Compare live_count against snap_count instead of checking for zero, so both total loss (0 vs N) and partial loss (1 vs 19) trigger restoration. Salvaged from #52161 by @liuhao1024. Closes #52144	2026-06-25 18:49:18 -07:00
Brooklyn Nicholson	ff81365988	feat(desktop): in-app spot editor for the file preview pane Adds a CodeMirror 6 spot editor to the right-rail file preview so users can make quick edits in-app without leaving for an IDE. Entering edit mode is a pure in-place swap of the read view — same fixed-height header, same gutter geometry/typography (mirrors SourceView 1:1) so nothing shifts — toggled via the Edit button, a bare `e` when the pane is hovered/focused, or the tab. - Save path is transport-agnostic (writeDesktopFileText): local Electron IPC or a new hardened POST /api/fs/write-text on the dashboard server (path validation, parent-must-exist, regular-files-only, size cap, atomic temp-file + os.replace), behind the existing auth middleware. - Stale-on-disk guard re-reads before writing and offers overwrite vs discard-and-reload instead of clobbering external/agent edits. - VS Code-style modified dot on the tab; ⌘/Ctrl+S and ⌘/Ctrl+Enter save, Esc cancels; GitHub highlight style matched to the read view's Shiki theme. - Typing stays render-free (draft in a ref; dirty flips once at the boundary).	2026-06-25 19:50:25 -05:00
Teknium	208f0d7c3b	fix(update): default pre-update backup to off (#52729) The pre-update HERMES_HOME zip shipped on by default (DEFAULT_CONFIG + runtime fallback both True), so every `hermes update` zipped the entire ~/.hermes — sessions DB, caches, skills — adding minutes to each update. The shipped cli-config.yaml.example, the --backup help, and the example config all already said "off by default," so the live default contradicted its own documentation. Flip the default to off everywhere: DEFAULT_CONFIG, the runtime `.get(..., False)` fallback in _run_pre_update_backup, and the stale --backup help string. Users who want the #48200 safety net opt in via updates.pre_update_backup: true or --backup for a single run. Updated test_default_enabled_creates_backup -> test_default_disabled_is_silent to assert the new default (silent no-op, no zip).	2026-06-25 16:01:09 -07:00
kshitij	e4ff494860	fix(cron): add default retention to per-run job output (#52383) (#52646) * fix(cron): add default retention to per-run job output to bound disk usage (#52383) Per-run cron output (cron/output/<job>/<timestamp>.md) is written once per execution and was never pruned, so a frequently-scheduled job on a long-running deploy accumulates one file per run indefinitely and can fill the volume ('no space left on device'). save_job_output() now keeps the most recent N output files per job and removes older ones. N defaults to 50 and is configurable via cron.output_retention; a non-positive value disables pruning for operators who manage cleanup externally. Salvaged from #52402 by @0xDevNinja. Closes #52383 * fix(config): add cron.output_retention to DEFAULT_CONFIG Follow-up to #52383: the retention config key was functional via get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge wouldn't auto-populate it for new installs. Add it explicitly. --------- Co-authored-by: 0xDevNinja <manmit0x@gmail.com>	2026-06-25 16:00:13 -07:00
brooklyn!	ffa3d3c811	Merge pull request #49037 from NousResearch/bb/projects-paradigm feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools	2026-06-25 17:49:05 -05:00
Gille	e7d2f0b93c	fix(windows): suppress console flashes and harden gateway restarts	2026-06-25 14:42:38 -07:00
Brooklyn Nicholson	9f3aa1685c	fix(cli): register project command beside MoA	2026-06-25 16:40:27 -05:00
Brooklyn Nicholson	4e023f5bc9	feat(gateway): build authoritative project tree	2026-06-25 16:40:27 -05:00
Brooklyn Nicholson	e7811345c1	feat(kanban): link tasks to project worktrees	2026-06-25 16:40:26 -05:00
Brooklyn Nicholson	8a45ce2dd4	feat(projects): add per-profile project store	2026-06-25 16:40:26 -05:00
Teknium	c6575df927	feat(moa): expose MoA presets as selectable virtual models (#46081) * feat(moa): expose MoA presets as selectable virtual models Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools. Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses) — all references dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved, failures and the MoA-recursion guard stay isolated per reference. - Removed the old mixture_of_agents model tool and moa toolset. - Added moa as a virtual provider in the provider/model inventory. - /moa is shortcut behavior over model selection (default preset / named preset / one-shot prompt). - Dashboard + Desktop manage named presets; presets appear in model pickers. - Parallel reference fan-out in agent/moa_loop.py with regression test. * fix(moa): thread moa_config through _run_agent to _run_agent_inner The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it — _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path. Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4). * fix(moa): classify moa as a virtual provider in the catalog The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars — tripping two catalog invariants: - test_provider_catalog: api_key providers must expose a credential env var - test_provider_parity: every hermes-model provider must be desktop-configurable moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt — derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.	2026-06-25 13:52:06 -07:00
kshitij	ca714f6189	Merge pull request #52653 from kshitijk4poor/salvage/33814-env-quote-hash fix(config): quote .env values containing # to prevent token truncation (#30355)	2026-06-26 01:32:49 +05:30
kshitijk4poor	2107b86024	feat(compression): flip in_place default to True (#38763) [2/2] In-place compaction (single durable session id, non-destructive soft-archive) becomes the default. Rotation is now the opt-out fallback via compression.in_place: false. Prerequisite: #50098 (hygiene guard reads result flag not config flag) merged first — without it, flipping the default causes permanent transcript loss on gateway hygiene-compress and /compress when no session_db is available. Blast radius (empirically measured on current main): 7 rotation-asserting tests broke and are pinned to in_place=False in the companion test commit: - tests/agent/test_compression_concurrent_fork.py (2) - tests/agent/test_compression_logging_session_context.py (1) - tests/agent/test_compression_rotation_state.py (1) - tests/run_agent/test_compression_boundary_hook.py (2 _make_agent helpers) - tests/gateway/test_compression_concurrent_sessions.py (2) Rotation stays as a working fallback and deserves continued coverage. Plan: .hermes/plans/in-place-compaction-38763.md	2026-06-25 12:56:05 -07:00
sweetcornna	150afea942	fix(config): quote env values containing hash	2026-06-26 00:54:34 +05:30
Brooklyn Nicholson	c4c590e4a1	perf(desktop): make session switching fast under load Switching sessions in the desktop app could freeze the whole UI for several seconds on heavy, tool-rich chats. Root causes and fixes: - Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill build) before returning, and the desktop awaits that RPC before it paints — so the entire switch blocked on the build. Add an opt-in `defer_build` resume path (the contract `session.create` already uses): return the full display transcript immediately, register an upgradable live session, and pre-warm the agent on a short timer. The persisted runtime identity (model/provider/base_url/api_mode/reasoning/tier) is restored on the deferred build so it can't drop the provider. - Nothing bounded how many in-memory agents accumulate; a user who reconnects often piled up detached sessions for the full 6h TTL. Add a soft LRU cap (`max_live_sessions`, default 16) that evicts the least-recently-active DETACHED sessions (no live client) — never a running, awaiting-input, mid-build, or live-transport one. Reopening re-resumes from disk. - On the prefetch-hit cold-resume path, skip rebuilding a throwaway merged-message array (and its 1000-entry Map) when the prefetch already painted the exact transcript; the downstream sameMessageList guard already drops the publish, so it was pure main-thread cost. The desktop opts into `defer_build` for every non-watch cold resume; the eager path stays for CLI/TUI and existing callers.	2026-06-25 14:03:03 -05:00
Brooklyn Nicholson	1d9ed7f48a	fix(desktop): ad-hoc sign macOS self-update rebuilds The desktop self-updater rebuilds and re-signs the .app on each user's own machine (`hermes desktop --build-only` -> electron-builder `--dir`). With CSC_IDENTITY_AUTO_DISCOVERY on (its default), electron-builder signs the type=distribution, hardened-runtime bundle with whatever identity is in that user's keychain -- typically a personal "Apple Development" cert -- which stalls/fails the sign step (no Developer ID, no provisioning profile) or clobbers the original notarized signature with an unusable one, tripping Gatekeeper on every post-update launch. Force ad-hoc signing for the local packaged rebuild instead: deterministic, and exactly what _desktop_macos_relaunchable_fixup already finishes off. No-op for source runs, off-macOS, when a real identity is configured (CSC_LINK / APPLE_SIGNING_IDENTITY), or when the caller already pinned the flag.	2026-06-25 12:08:29 -05:00
Ben Barclay	736e981abf	fix(auth): honor NOUS_INFERENCE_BASE_URL env override for Nous OAuth sessions (#52270) The host-allowlist hardening (#30611) plus the refresh heal (#49735) left the documented NOUS_INFERENCE_BASE_URL dev/staging escape hatch unreachable for OAuth sessions, despite three code comments asserting it still works. Root cause — resolution precedence in resolve_nous_runtime_credentials: inference_base_url = ( _optional_base_url(state.get("inference_base_url")) # stored — wins or os.getenv("NOUS_INFERENCE_BASE_URL") # env — unreachable or DEFAULT_NOUS_INFERENCE_URL ) A staging OAuth login persists its inference_base_url, but the allowlist rejects the staging host and the refresh heal rewrites the stored value to the production default. The stored (now prod) value is then read BEFORE the env var, so the override never takes effect — every request 401s against prod or is pinned to prod, and setting the env var does nothing. Fix: the user-set env override is the most-trusted source, so consult it FIRST for the URL used to build the client / returned to callers — while keeping the PERSISTED value the validated, network-provenance one (the override is a runtime overlay, never written to auth.json, so unsetting it cleanly reverts to prod). Applied at both chokepoints: - resolve_nous_runtime_credentials (no-refresh read path AND refresh path) - the nous_portal proxy adapter, which re-validates the resolver's returned base_url against the prod allowlist as defense-in-depth and would otherwise reject a legitimate staging override at the forward boundary. New _nous_inference_env_override() / split of stored-vs-effective URL keep the threat model intact: Portal-returned URLs are still allowlist-validated at every network site, and the env path stays ungated (trusted OS user). Also folds in the no-refresh read-path heal (supersedes the approach in the open #50265): a poisoned stored staging host now heals to the prod default on read even when no refresh fires. Tests: TestEnvOverrideWins (env wins on read + refresh paths; override never persisted; poisoned stored heals) and TestProxyAdapterEnvOverride. Verified the 4 behavioral tests fail against pre-fix code and pass with the fix; full inference-validation + nous-provider suites green (85 passed). E2E-validated against a real temp HERMES_HOME exercising the real resolver + proxy adapter: resolver→staging, persisted→prod, proxy→staging, unset→reverts to prod.	2026-06-25 00:11:15 -07:00
kshitijk4poor	d6cf383d74	refactor(setup): simplify Z.AI picker — drop dead fallback, fix tests - Remove dead `chosen_base or effective_base` fallback; _select_zai_endpoint always returns a non-empty base URL (returns current_base on cancel). - Add .rstrip("/") to official-endpoint return for symmetry with custom-proxy path (both now return normalized URLs). - Replace magic index 4 with len(ZAI_ENDPOINTS) in custom-proxy tests so they don't break if a 5th endpoint is added to ZAI_ENDPOINTS.	2026-06-25 12:07:01 +05:30
kshitijk4poor	f3372d3407	feat(setup): wire Z.AI endpoint picker into _model_flow_api_key_provider When provider_id == 'zai', replace the plain text Base URL input with _select_zai_endpoint, which presents a curses picker offering Global, China, Coding Plan Global, Coding Plan China, and custom proxy options. Other API-key providers (MiniMax, DeepSeek, etc.) keep the text input.	2026-06-25 12:07:01 +05:30
kshitijk4poor	d0f9c4bcc6	feat(setup): add _select_zai_endpoint helper for Z.AI endpoint picker Presents a curses-based picker (via _prompt_provider_choice) offering the four official Z.AI endpoints — Global, China, Coding Plan Global, Coding Plan China — plus a custom-proxy option. Sourced from ZAI_ENDPOINTS in auth.py so it stays in sync with the probe list. Not yet wired into the setup flow; that comes in the next commit.	2026-06-25 12:07:01 +05:30
brooklyn!	d473e5d07a	Merge pull request #52296 from NousResearch/bb/verify-stop-loop Add verification stop loop	2026-06-24 23:10:03 -05:00
Brooklyn Nicholson	2f1a47b90e	feat(agent): require verification before finishing edits Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.	2026-06-24 23:02:48 -05:00
Victor Kyriazakos	b693bee100	feat(cron): thread-preferred continuable delivery (open a thread, mirror DM fallback) Continuable cron jobs (attach_to_session / cron.mirror_delivery, default OFF) now prefer a dedicated thread on thread-capable platforms, falling back to origin-DM mirroring where threads don't exist. - Thread-capable (Telegram topics, Discord/Slack threads): open a fresh thread for the job via the shipped adapter.create_handoff_thread, route the brief into it, and seed the thread-keyed session so the user's in-thread reply continues with full context. This is the 'continuable cron opens its own thread' interface. - DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None -> fall back to mirroring into the origin DM session (existing behaviour). Reuses existing infrastructure end-to-end — no new adapter surface, no provider-chain signature change: - adapter.create_handoff_thread (already implemented per-platform, returns None on unsupported platforms = the fallback signal) - the live SessionStore via adapter._session_store (already set on every adapter), reached without threading a new param through the frozen CronScheduler.start() contract - gateway.mirror.mirror_to_session for the seed/append - existing per-target delivery routing carries the new thread_id for free Mirrors GatewayRunner._process_handoff's open-thread-or-fallback + seed pattern, standalone for the cron delivery path. thread_seeded guards against a double-mirror after seeding. Scoped to the origin target only; fan-out/broadcast targets are never threaded or mirrored. Config docs updated (cron.mirror_delivery) + cronjob tool attach_to_session description reframed around continuable/thread-preferred. Tests: +5 (thread id returned on thread platform; None on DM platform; None without capability/loop; seed creates thread session + mirrors; seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests pass (4 failures pre-existing: croniter-not-installed + TZ).	2026-06-24 20:27:05 -07:00
Victor Kyriazakos	1b181724fa	feat(cron): optional mirror of cron delivery into target chat session Adds an opt-in path so a cron job's delivered output is also appended to the TARGET chat's gateway session transcript (as an assistant turn), so a user reply to a recurring delivery (daily brief, reminder) is answered with the delivery in context instead of 'what is that?' amnesia. - Reuses the shipped gateway.mirror.mirror_to_session — the same primitive interactive send_message mirroring already uses. No messaging-toolset change (cron still can't call send_message; this rides delivery). - Gated: per-job attach_to_session overrides global cron.mirror_delivery (config.yaml). Default OFF — historical isolation preserved byte-for-byte. - Mirrors the CLEAN agent output, not the cron header/footer wrapper. - Alternation/cache-safe: append lands at a turn boundary, never mid-loop, never mutates the cached system prompt. Cold-start (no target session) is a silent no-op; mirror errors never fail a successful delivery. - Surfaced on the cronjob tool (attach_to_session) + config schema. Driven by enterprise cron-as-control-plane use case. 10 new tests; full cron + cronjob-tool suites pass (600).	2026-06-24 20:27:05 -07:00
Teknium	411faf08bd	fix(soul): installers seed the real default persona, upgrade legacy empty templates (#52246) The desktop bootstrap (and curl/PowerShell/docker installs) seeded ~/.hermes/SOUL.md with a comment-only scaffold that contained no persona text. That shadowed the runtime default (_ensure_default_soul_md -> DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'. Result: every fresh installer install got the empty template instead of the documented Hermes persona; desktop just made it visible in onboarding. - install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD. - _ensure_default_soul_md() upgrades a SOUL.md still matching the known legacy scaffold in place; customized files (any deviation, incl. a persona appended below the comment) are never touched. - Detection normalizes CRLF/BOM so Windows-installer drift still matches.	2026-06-24 18:56:26 -07:00
Ben	d1cac0e5ef	feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0) The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero primitives (gateway-gateway Phase 5): the gateway decides it is idle and drives the relay transport dormant so the platform (Fly autostop:"suspend") can suspend the now-traffic-idle machine, which wakes on the connector's wakeUrl poke (decisions.md Q3=C', D1-D13). - gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds (config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2), messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)), is_idle (D2/D3/F7). - gateway/run.py: _last_inbound_at clock stamped on user inbound in _handle_message (F13); the arm-gate + idle predicate + the _scale_to_zero_watcher dormant sequence (mark draining -> adapter go_dormant() -> cooldown), started only when armed. Deliberately NOT the stop path and NOT mark_resume_pending (F12/D13). - tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7). - hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5. Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED). Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests), confirmed via a pristine baseline worktree.	2026-06-24 18:47:18 -07:00
helix4u	17beb55e3c	fix(telegram): gate rich draft previews separately	2026-06-24 18:11:14 -07:00
brooklyn!	7157b213f5	Merge pull request #47959 from NousResearch/bb/pets-gen Pet generation: frame-perfect hatch flow, backend picker, CPU-safe chroma, and CI-hardening	2026-06-24 19:41:34 -05:00
brooklyn!	b649cdee4a	Merge pull request #52203 from NousResearch/bb/update-drain-announce fix(update): announce gateway drain waits so desktop updates don't look hung	2026-06-24 19:28:44 -05:00
Ben	538c419d2e	fix(gateway): scope dashboard liveness fallback to the profile PR #52151 hardened the runtime-status liveness check to trust a readable live process command line over stale gateway_state.json argv, so a recycled PID now owned by an s6 supervisor no longer counts as a running gateway. That fix is correct but incomplete for the reported symptom: the web dashboard showed a named profile's gateway green while `hermes -p <name> gateway status` showed it stopped. Two further issues: 1. Cross-profile PID reuse. In per-profile Docker supervision, one profile's stale `gateway_state.json` can record a PID the OS later recycled onto a DIFFERENT profile's live gateway. That PID's command line still `looks_like_gateway`, so the dead profile was reported running. The recorded argv has its `-p <name>` selector stripped in-process by `_apply_profile_override`, so it cannot disambiguate; the live `/proc` cmdline still carries it. `get_runtime_status_running_pid` now accepts an `expected_home` and validates the live command line belongs to THAT profile (mirroring `hermes_cli.gateway._matches_current_profile`, the logic the CLI scan path already uses — which is why the CLI was correct). `_check_gateway_running` passes the enumerated profile dir. 2. The existing regression test `test_gateway_running_check_falls_back_to_runtime_state` used the live pytest PID with a gateway-shaped record; once the live cmdline became authoritative it no longer looked like a gateway. Updated to mock the live cmdline to the real separate-process scenario it describes. The active-profile path (`get_running_pid`) is intentionally left unscoped: it is lock-verified and any live gateway cmdline is acceptable there. Multiplex mode is unaffected — `running` state is only ever written to a gateway's own home, never a secondary served profile's. Adds coverage for: cross-profile PID reuse (named + default), matching profile cmdline (`-p`, `--profile`, explicit HERMES_HOME=), the bare default gateway, and the unreadable-cmdline cross-platform fallback. Each new cross-profile assertion fails without the profile scope and passes with it. Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>	2026-06-25 10:25:54 +10:00
AIalliAI	463bf2be25	fix(update): announce gateway drain waits so desktop updates don't look hung On macOS, the desktop updater's stage 1 (hermes update --gateway) ends by restarting running gateways. launchd_restart() SIGTERMs the gateway and silently waits up to agent.restart_drain_timeout (default 180s) for the drain; the manual profile-gateway loop waits its drain budget per gateway the same way. Neither path prints anything before the wait, so the desktop updater's live output goes dead for minutes right after '✓ Update complete!' — users read it as a hung update and force-kill their gateway processes to make it move (#44515). The systemd branch already announces its drain ('draining (up to Ns)...'); launchd and the manual loop did not. Print the stop/drain (with PID and budget) before the wait in both paths, mirroring the systemd branch, and assert the message in the existing launchd drain test. Fixes #44515	2026-06-24 19:12:44 -05:00
Brooklyn Nicholson	1fe013ee16	feat(pets): polish generate flow and reduce hatch CPU pressure Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.	2026-06-24 19:08:06 -05:00
Brooklyn Nicholson	aab49f6927	feat(pets): generation RPCs, non-blocking gallery + gateway plumbing - pet.generate / pet.hatch (parallel rows, off the reader thread) + cooperative pet.cancel; pet.export / pet.rename. - pet.gallery localOnly fast path + background manifest prefetch so the picker never blocks on petdex; rename follows the active-pet config. - gateway request gains optional timeout + AbortSignal for real Stop.	2026-06-24 13:48:38 -05:00
kshitij	c42d44cb2f	revert(plugins): restore user dashboard plugin backend API auto-import (#43719) (#51950) * Revert "refactor(security): centralize non-bundled plugin sources in one constant" This reverts commit `e2bea0abe6`. * Revert "fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)" This reverts commit `8845f3316c`.	2026-06-24 07:46:54 -07:00
Elshayib	1a435a6d5d	fix(model-switch): prevent custom-provider misattribution in model picker (#48305) When the current provider is a custom endpoint (custom or custom:), the model switch pipeline must NOT auto-switch to a native provider/OpenRouter based on a static-catalog match. The user explicitly configured their own endpoint and the same model name may be served there; silently rewriting model.provider destroys their config. - detect_static_provider_for_model(): skip the static-catalog scan when the current provider is custom/custom: - switch_model() Step e: extend is_custom to cover custom:* so the detect_provider_for_model() last-resort fallback cannot fire Salvaged from #48351 by Elshayib (authorship preserved). Fixes #48305	2026-06-24 19:34:33 +05:30
kshitij	2187fd884c	Merge pull request #51027 from NousResearch/salvage/typed-model-routing fix(model_switch): route typed configured models off openai-codex (#45006)	2026-06-24 19:32:35 +05:30
kshitijk4poor	1a174dfb50	fix(models): gate openai-codex/xai-oauth soft-accept to family-shaped slugs (#45006) Completes the #45006 fix. PR-base commit (configured-provider routing) handles the case where a typed model IS declared in user/custom provider config. This commit closes the other root: when a typed model is NOT in any config and the current provider is a soft-accepting one (openai-codex / xai-oauth), the hidden-model soft-accept (#16172 / #19729) would accept ANY unknown name as a hidden model — so `qwen3.5-4b` typed on a Codex-default session "succeeded" and mislabeled the provider as "OpenAI Codex" (the exact reported symptom), then 400'd on the next turn. Gate the soft-accept to slugs that plausibly belong to the provider's family (openai-codex -> gpt-/codex-/o1/o3/o4; xai-oauth -> grok-). Family-shaped unknown slugs are still soft-accepted (preserving the #16172 entitlement-gated hidden-model intent); unrelated names are rejected with actionable guidance to pin the right provider via `--provider <slug>` or the picker. Adds TestCodexSoftAcceptPlausibilityGate (5 tests): unrelated names rejected on codex/xai, family-shaped hidden slugs still accepted, real catalog models unaffected. Verified load-bearing.	2026-06-24 19:23:53 +05:30
xxxigm	33926eb315	fix(cli): honor non-interactive context in prompt_yes_no The dashboard/desktop spawn gateway actions with stdin=DEVNULL and HERMES_NONINTERACTIVE=1 (hermes_cli/web_server.py), but prompt_yes_no ignored that contract and called sys.exit(1) on the resulting EOFError. On Windows, `gateway start` asks "Install it now so the gateway starts on login? [Y/n]" when the scheduled task / startup entry is not yet installed. Spawned from the desktop app there is no stdin to answer it, so every desktop-triggered gateway restart aborted at that prompt and the gateway never started ("Gateway service is not installed"). Fall back to the prompt's default when HERMES_NONINTERACTIVE is set, and treat a bare EOFError as "accept default" rather than exiting. This lets the Windows start path proceed unattended (Startup-folder fallback + direct spawn) while interactive TTY usage is unchanged. Ctrl+C still exits.	2026-06-24 17:56:30 +05:30
Teknium	3c75e11571	fix(browser): validate agent-browser is runnable, not just present (#51740) After `hermes update`, a globally-installed agent-browser's npm postinstall (fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser) at our local node_modules binary. The next update wipes node_modules, leaving a dangling symlink that `which` still reports but exec fails on with exit 127, silently breaking every browser tool (#48521). Root cause is trust-on-presence: shutil.which/Path.exists accept a name that resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves the path + runs --version) and gate all four resolution sites on it: _find_agent_browser now skips a dead candidate and falls through to the next working one (extended PATH -> local .bin -> npx), self-healing the dangling link. dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link. Closes #48521.	2026-06-24 00:14:49 -07:00
yusekiotacode	2ee6449fe5	fix(anthropic): use platform.claude.com for OAuth token exchange Anthropic migrated the OAuth token endpoint from console.anthropic.com/v1/oauth/token (now returns HTTP 404) to platform.claude.com/v1/oauth/token. The token refresh path already iterated both hosts, but the two initial code-exchange call sites were hardcoded to the dead console host, so every new Claude OAuth login failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved no credentials. Fix the whole bug class: - Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live host for backward-compat with existing imports. - run_hermes_oauth_login_pure() (CLI flow) iterates the list, first success wins, mirroring the refresh path. - hermes_cli/web_server.py (desktop dashboard flow) imports the list and iterates it too, so the GUI login path is fixed identically. Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone), platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real Claude MAX OAuth login now succeeds end-to-end.	2026-06-23 23:59:40 -07:00
Teknium	be78fbd70e	Revert "fix(profiles): clone auth.json so OAuth credentials carry to cloned profiles (#51719)" (#51732) This reverts commit `f504aecffe`.	2026-06-23 23:58:43 -07:00
kyssta-exe	284d06cabf	fix(delegation): use target_model for bedrock api_mode routing (#49095)	2026-06-23 23:49:37 -07:00
teknium1	d4be583d98	fix(telegram): raise default command-menu cap to 60 so skills stay visible The 30-slot default could not fit Hermes's ~50 built-in commands, so every skill command (and 20 built-ins) was silently dropped from the Telegram `/` menu by default; they only worked when typed manually. Raising the default to 60 keeps all built-ins plus common skill commands visible out of the box while staying under Telegram's ~4KB payload limit. Users can still tune it via platforms.telegram.extra.command_menu.	2026-06-23 23:49:22 -07:00
Thestral	dbe14ce35d	feat(gateway): configure Telegram command menu priority Adds a configurable Telegram BotCommand menu cap and priority list via platforms.telegram.extra.command_menu (max_commands clamped 1..100; priority_mode prepend|append|replace). Default cap stays 30; hidden commands remain invokable when typed and /commands lists the full set. Salvaged from PR #42021. Cherry-picked onto current main; the original edited gateway/platforms/telegram.py, now relocated to plugins/platforms/telegram/adapter.py.	2026-06-23 23:49:22 -07:00
Teknium	f504aecffe	fix(profiles): clone auth.json so OAuth credentials carry to cloned profiles (#51719) Selective --clone / --clone-from / --clone-config copied .env but not auth.json, silently dropping the credential pool, including OAuth tokens (Anthropic `claude /login`, Codex, xAI) that never land in .env. A profile cloned from an OAuth-authenticated default therefore resolved a different provider (or none) than the source under provider: auto. --clone-all already carried auth.json via the full copytree; only the selective path missed it. Add auth.json to _CLONE_CONFIG_FILES and tighten it to 0o600 after copy, matching .env semantics.	2026-06-23 23:44:34 -07:00
Teknium	050bd01b7b	fix(dashboard): serve uvicorn on SelectorEventLoop on Windows (#50641) (#51717) On Windows, start_server() served uvicorn via a bare asyncio.run(_serve()), which uses the default ProactorEventLoop. uvicorn's socket-serving stack assumes a SelectorEventLoop on win32 (uvicorn/loops/asyncio.py forces it, and uvicorn.Server.run threads config.get_loop_factory() into its runner for exactly this reason). Driving uvicorn on the proactor loop makes server.startup() bind a socket that never accepts: the dashboard and desktop backend print "Skipping web UI build" then hang forever with the port LISTENING but no TCP handshake completing. Fix is win32-scoped to keep the blast radius minimal: POSIX keeps the exact asyncio.run(_serve()) it had (its default loop is already a SelectorEventLoop / uvloop, which is what uvicorn serves on). Only on Windows do we mirror uvicorn.Server.run and run on the loop factory uvicorn picks, with a fallback to WindowsSelectorEventLoopPolicy for uvicorn < 0.36. Fixes hermes dashboard and hermes desktop (the Electron app spawns a hermes dashboard backend). The gateway symptom in the report has a separate root cause (no uvicorn) and is not addressed here.	2026-06-23 23:43:24 -07:00
uperLu	0d4cecb352	fix(cron): avoid provider package shadowing core cron	2026-06-23 23:39:22 -07:00
Ben	31bced1607	fix(profiles): detect a separate-process gateway in profile status The dashboard Profiles view showed "Gateway stopped" for a gateway that is in fact running — while the sidebar status strip and `hermes gateway status` (CLI) both correctly showed it running. Reported on v0.17.0 running the gateway + dashboard in one Docker container. Root cause: three liveness surfaces with three detection strengths, all reading the same `gateway.pid`: - `hermes gateway status` -> find_gateway_pids() (process-table scan) - sidebar /api/status -> get_running_pid() + gateway_state.json PID fallback + health-URL probe - Profiles view -> _check_gateway_running() = get_running_pid() ONLY, no fallback `get_running_pid()` short-circuits to None the moment the runtime lock (`gateway.lock`) doesn't register as held by the calling process — which is always true when the reader is a separate process from the gateway (the dashboard is its own s6 service in the container), and also for any launch-service-managed gateway that left a fresh `gateway_state.json` but no live PID file. So the Profiles view alone reported the live gateway as stopped. Fix: give _check_gateway_running the same fallback the sidebar already has — after the pid-file/lock check misses, validate the PID recorded in that profile's gateway_state.json against the live process table via the existing get_runtime_status_running_pid(). read_runtime_status() gains an optional path arg so a profile's state file can be read without mutating the process-global HERMES_HOME (preserving the contextvar-based profile isolation the dashboard relies on). Backward compatible: every existing caller passes no argument. Tests: a regression test that fails pre-fix (live gateway, lock check returns None -> must still report running) and a guard test that a 'stopped' state file is never reported running even with a live PID.	2026-06-24 16:36:17 +10:00
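
The fallback described in commit 31bced1607 can be pictured roughly as follows. This is a minimal sketch, not the repository's code: the function names mirror the commit message, but the signatures, the state-file field names (`pid`, `gateway_state`), and the injected `pid_alive` probe are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def running_pid_from_state(
    state_path: Path, pid_alive: Callable[[int], bool]
) -> Optional[int]:
    """Return the PID recorded in a state file if it still looks live.

    Hypothetical schema: {"pid": <int>, "gateway_state": "running" | "stopped"}.
    """
    try:
        state = json.loads(state_path.read_text())
    except (OSError, ValueError):
        return None  # missing or unreadable state file: nothing to fall back on
    if not isinstance(state, dict):
        return None
    # Guard: a state file that says "stopped" is never reported as running,
    # even if its recorded PID happens to be alive (e.g. reused by the OS).
    if state.get("gateway_state") == "stopped":
        return None
    pid = state.get("pid")
    if isinstance(pid, int) and not isinstance(pid, bool) and pid > 0 and pid_alive(pid):
        return pid
    return None


def check_gateway_running(
    lock_pid: Optional[int],
    state_path: Path,
    pid_alive: Callable[[int], bool],
) -> bool:
    """Primary check first (pid-file/lock result), then the state-file fallback."""
    if lock_pid is not None:
        return True
    return running_pid_from_state(state_path, pid_alive) is not None
```

Passing the state-file path explicitly, instead of reading a process-global home directory, is what lets a reader in a separate process inspect one profile without disturbing another. In real use `pid_alive` would consult the process table; it is injected here so the sketch stays platform-neutral.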

3036 commits