hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-27 11:22:03 +00:00

Author	SHA1	Message	Date
Bartok9	710cd48fb1	fix(agent): validate context/memory tool schemas before wrapping Closes #47707 Context engines and memory providers expose tool schemas via get_tool_schemas(). agent_init.py wrapped each as {"type":"function","function":_schema} without validating that _schema carries a top-level name. A provider returning an entry already in OpenAI tool form ({"type":"function","function":{...}}) was then double-wrapped into a tool whose function has no name. Strict providers (e.g. DeepSeek) reject the entire request with HTTP 400 'tools[N].function: missing field name', so one malformed schema silently disables the whole toolset and breaks every turn. The schema was also never added to valid_tool_names, so even lenient providers could not call it. Add a shared normalize_tool_schema() helper that unwraps an already-wrapped entry and returns None for anything lacking a resolvable string name. Wire it into the agent_init context-engine loop and all three memory_manager surfaces (inject_memory_provider_tools, add_provider routing index, get_all_tool_schemas), so a single bad plugin schema is skipped with a warning instead of poisoning the request. Verification: 209 targeted agent/memory tests pass (incl. 9 new). New tests assert the unwrap + skip-nameless behavior and fail without the fix.	2026-06-25 02:17:29 +05:30
kshitij	77d2b50751	Merge pull request #52118 from NousResearch/salvage/36776-ddgs-timeout fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776)	2026-06-25 01:56:26 +05:30
kshitij	4d589b1e13	Merge pull request #52121 from NousResearch/salvage/43466-strip-cronjob-toolset fix(delegate): strip cronjob toolset from delegated children (#43466)	2026-06-25 01:54:37 +05:30
uzunkuyruk	489b85ee1e	fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776) A single ddgs (DuckDuckGo) search could hang indefinitely and block the shared agent loop, and therefore every platform (CLI, Telegram, Matrix...). The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's multi-engine retry loop has no overall cap, so a slow/rate-limited response could spin for 20+ minutes with no output and no error. Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap it with future.result(timeout=_SEARCH_TIMEOUT_SECS), where _SEARCH_TIMEOUT_SECS=30. On timeout, return a clear failure ("DuckDuckGo search timed out ... try a different provider") instead of blocking; the pool is shut down with cancel_futures so a hung worker is never awaited. Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on current main (the PR's provider.py base had diverged). Added a load-bearing timeout regression test (the original PR only updated the fake's constructor and had no timeout-behavior test), mutation-verified to fail without the cap. Closes #36776.	2026-06-25 01:45:06 +05:30
Riyasudeen Farook	1e4df599ec	fix(delegate): strip cronjob toolset from delegated children (#43466) _strip_blocked_tools used a hardcoded set missing 'cronjob'. Children on gateway platforms could inherit the cronjob toolset, scheduling persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS. Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for documentation consistency. Two regression tests lock the invariant. Salvaged from #43687 by @riyas22. Adapted test to current main (no 'messaging' toolset exists -- send_message is intentionally not registered as an agent tool). Closes #43466	2026-06-25 01:37:25 +05:30
kshitij	7a79a4447c	Merge pull request #52116 from NousResearch/fix/46994-session-load-bool-iterable fix(gateway): skip non-dict entries in session loading (#46994)	2026-06-25 01:33:36 +05:30
kshitij	8f0a12ce09	Merge pull request #52114 from NousResearch/salvage/27405-preflight-fewbig fix(agent): trigger preflight compression on few-but-huge sessions (#27405)	2026-06-25 01:27:07 +05:30
kshitijk4poor	9c994377ed	fix(gateway): skip non-dict entries in session loading (#46994) Corrupted sessions.json entries (e.g. a bare bool where a dict is expected) caused a TypeError on 'origin' in data, which escaped the (ValueError, KeyError) inner except and aborted loading ALL remaining sessions, not just the corrupted one. Two-layer fix: - Loop level: isinstance(entry_data, dict) guard before from_dict - from_dict: isinstance(data['origin'], dict) instead of bare truthiness - Added TypeError to the inner except as defense-in-depth Closes #46994	2026-06-25 01:26:13 +05:30
texhy	aacc6bb0a8	fix(agent): trigger preflight compression on few-but-huge sessions (#27405 ) The preflight-compression gate only ran the (expensive) token estimate when the message COUNT exceeded protect_first_n + protect_last_n + 1. A session with a handful of very large messages never tripped the count condition, so compression was never attempted and the turn eventually hit a hard context-overflow error. Add _should_run_preflight_estimate() with OR semantics: run the estimate when either the message count exceeds the protected ranges (the historical gate) OR a cheap char-based estimate already crosses the configured threshold. The downstream estimate_request_tokens_rough() stays authoritative — this is only a hint that decides whether to pay for the full estimate. Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current main: the preflight gate moved from conversation_loop.py to turn_context.py since the PR was opened, so the helper + gate are placed there; the test imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal. Closes #27405.	2026-06-25 01:20:23 +05:30
kshitijk4poor	e0272cfef2	Revert "fix(compression): make minimum context floor configurable (#31600)" This reverts commit `cae1ee44a7`.	2026-06-25 01:04:44 +05:30
kshitij	59acaa972f	Merge pull request #52053 from NousResearch/salvage/31600-minimum-context-length-configurable fix(compression): make minimum context floor configurable (#31600)	2026-06-25 01:02:52 +05:30
Tranquil-Flow	cae1ee44a7	fix(compression): make minimum context floor configurable (#31600 ) Add compression.minimum_context_floor config key that allows users to lower the compression threshold floor below the hardcoded 64K default, preventing infinite tool-call loops on models whose structured output degrades well before 64K tokens. - agent/model_metadata.py: add get_configurable_minimum_context() helper with 16K hard safety limit - agent/context_compressor.py: accept minimum_context_floor param, thread it through _compute_threshold_tokens - agent/conversation_compression.py: use compressor's floor for aux model context validation - agent/agent_init.py: read compression.minimum_context_floor from config and pass to ContextCompressor - gateway/run.py: cache-busting includes new key Salvaged from #31686 by @Tranquil-Flow onto current main. Resolves conflicts with in-place compaction (#38763) and max_tokens threshold computation (#43547) that landed after the original PR. Closes #31600	2026-06-25 00:56:04 +05:30
liuhao1024	25e2312230	fix(memory): skip drift guard for add (append-only) action (#42874) The drift guard (introduced for #26045) correctly protects replace/remove from clobbering un-roundtrippable content, but it also fires on the add path. Since add only appends and never overwrites, the guard is unnecessary and causes false positives when prior add() calls in the same session shift the byte count of the on-disk file. Add skip_drift parameter to _reload_target() and pass True from add(). Replace/remove continue to use the drift guard unchanged. Salvaged from #42880 by @liuhao1024. Closes #42874	2026-06-25 00:51:12 +05:30
Jeffrey Quesnelle	b13e2fd694	Merge pull request #52044 from NousResearch/fix/install-venv-kill-venv-processes fix(install): kill venv-resident gateway before recreating venv on Windows	2026-06-24 15:16:58 -04:00
kshitij	9214aa7dde	Merge pull request #52090 from NousResearch/salvage/35994-reset-deadlock fix(gateway): offload agent cleanup off the event loop in /new reset (#35994)	2026-06-25 00:34:21 +05:30
kshitijk4poor	0225480369	fix(gateway): offload agent cleanup off the event loop in /new reset (#35994 ) The /new (and /reset) confirmation-button callback runs the slash-confirm handler on the asyncio event loop (see _request_slash_confirm). That handler calls _handle_reset_command, which invoked the SYNCHRONOUS, potentially long-blocking _cleanup_agent_resources inline: agent.close() tears down terminal sandboxes, browser daemons and background processes (subprocess waits), and shutdown_memory_provider() can make a network call. A slow teardown wedged the entire event loop, so the bot went silent and stopped processing all messages until a manual restart. Offload _cleanup_agent_resources via the existing contextvar-preserving _run_in_executor_with_context helper, bounded by asyncio.wait_for with a named _RESET_CLEANUP_TIMEOUT_S (30s). The loop is never blocked; on timeout the reset proceeds and the worker thread is left to finish on its own (it cannot be cancelled). The text /new path is unaffected (already off-loop). Tests (tests/gateway/test_35994_reset_button_deadlock.py): the loop keeps ticking while close() blocks in its worker thread; a cleanup that raises is swallowed (warning logged) and the reset still rotates the session; a cleanup that times out degrades gracefully. All three are mutation-verified to fail without their respective production branch.	2026-06-25 00:27:22 +05:30
kshitij	de281bcebc	Merge pull request #52084 from NousResearch/salvage/31884-silent-drop-after-stop fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884)	2026-06-25 00:06:32 +05:30
kshitij	5b065e32ed	Merge pull request #51051 from NousResearch/salvage/cron-provider-pin fix(cron): fail closed when an unpinned job provider drifts from creation snapshot (#44585)	2026-06-25 00:05:52 +05:30
sweetcornna	b41d9b845d	fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884 ) After /stop, the next user message can hit a stale generation token and return with api_calls=0, no failure, no interruption. _normalize_empty_agent_response fell through to an empty string, so the gateway logged "response=0 chars" and sent nothing — the message was silently lost while internal work sometimes continued. Add the api_calls==0 / not-failed / not-interrupted / not-partial branch to the single normalization chokepoint so the user gets a short retry hint instead of silence. Regression test asserts the hint surfaces. Salvaged from #33851 (re-applied on current main; original was 1401 commits behind and the function had moved).	2026-06-24 23:51:31 +05:30
brooklyn!	35e9c63d89	Merge pull request #52008 from infinitycrew39/fix/desktop-nous-onboarding-stale-provider fix(desktop): stop Nous Portal onboarding from validating stale Anthropic config	2026-06-24 13:12:44 -05:00
emozilla	6638199c53	fix(install): harden venv-resident process sweep on Windows Follow-up to the salvaged venv-recreate fix. Three changes to the Install-Venv pre-delete sweep: - Match the venv path with a case-insensitive StartsWith instead of the PowerShell -like operator. A venv path containing wildcard metacharacters ('[', ']') — legal in a Windows user name — silently fails to match under -like, which would let the locking process slip through and reintroduce the exact access-denied failure this fix closes. - Retry Remove-Item once after a short pause. A force-killed process can take a moment to release its file handles, so the first delete may still hit a locked .pyd; retry before failing the stage. - Note in a comment that the gateway autostart task runs at LIMITED integrity as the current user, so the installer always runs at equal-or-higher integrity and can read the process executable path, and that Get-CimInstance is preferred over Get-Process because it returns a null path for an uninspectable process instead of throwing. Adds a regression test asserting the recreate branch sweeps by venv path prefix, uses StartsWith rather than -like, and runs the sweep before Remove-Item. Covers issues #47036, #47557, #47910.	2026-06-24 13:25:44 -04:00
infinitycrew39	d8fe1c0b41	test(desktop): cover scoped onboarding runtime readiness checks Assert setup.runtime_check honors provider params and that Nous OAuth onboarding persists model config before validating the connected provider.	2026-06-24 23:19:51 +07:00
kshitij	c42d44cb2f	revert(plugins): restore user dashboard plugin backend API auto-import (#43719) (#51950) * Revert "refactor(security): centralize non-bundled plugin sources in one constant" This reverts commit `e2bea0abe6`. * Revert "fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)" This reverts commit `8845f3316c`.	2026-06-24 07:46:54 -07:00
kshitij	7fb2027d85	Merge pull request #51881 from NousResearch/fix/29559-compression-abort-on-network-failure fix(compression): abort + preserve context on transient network summary failure (#29559, #25585)	2026-06-24 19:54:21 +05:30
Elshayib	1a435a6d5d	fix(model-switch): prevent custom-provider misattribution in model picker (#48305 ) When the current provider is a custom endpoint (custom or custom:), the model switch pipeline must NOT auto-switch to a native provider/OpenRouter based on a static-catalog match. The user explicitly configured their own endpoint and the same model name may be served there; silently rewriting model.provider destroys their config. - detect_static_provider_for_model(): skip the static-catalog scan when the current provider is custom/custom: - switch_model() Step e: extend is_custom to cover custom:* so the detect_provider_for_model() last-resort fallback cannot fire Salvaged from #48351 by Elshayib (authorship preserved). Fixes #48305	2026-06-24 19:34:33 +05:30
kyssta-exe	b85c460540	fix(tui): targeted save_config_value for model persistence (#48305 ) The TUI model-switch persistence (_persist_model_switch) rewrote the entire model config block via save_config(), destroying sibling keys the user set under model: (model_slots, model_fallback, base_url, ...) on every switch. Use targeted, atomic, comment-preserving save_config_value("model.default" / "model.provider" / "model.base_url") writes instead, so a model switch only touches the keys it changes. Salvaged from #48391 by kyssta-exe (authorship preserved). Fixes #48305	2026-06-24 19:34:33 +05:30
kshitij	2187fd884c	Merge pull request #51027 from NousResearch/salvage/typed-model-routing fix(model_switch): route typed configured models off openai-codex (#45006)	2026-06-24 19:32:35 +05:30
kshitijk4poor	1a174dfb50	fix(models): gate openai-codex/xai-oauth soft-accept to family-shaped slugs (#45006 ) Completes the #45006 fix. PR-base commit (configured-provider routing) handles the case where a typed model IS declared in user/custom provider config. This commit closes the other root: when a typed model is NOT in any config and the current provider is a soft-accepting one (openai-codex / xai-oauth), the hidden-model soft-accept (#16172 / #19729) would accept ANY unknown name as a hidden model — so `qwen3.5-4b` typed on a Codex-default session "succeeded" and mislabeled the provider as "OpenAI Codex" (the exact reported symptom), then 400'd on the next turn. Gate the soft-accept to slugs that plausibly belong to the provider's family (openai-codex -> gpt-/codex-/o1/o3/o4; xai-oauth -> grok-). Family-shaped unknown slugs are still soft-accepted (preserving the #16172 entitlement-gated hidden-model intent); unrelated names are rejected with actionable guidance to pin the right provider via `--provider <slug>` or the picker. Adds TestCodexSoftAcceptPlausibilityGate (5 tests): unrelated names rejected on codex/xai, family-shaped hidden slugs still accepted, real catalog models unaffected. Verified load-bearing.	2026-06-24 19:23:53 +05:30
kshitij	ae20c3fb90	Merge pull request #51025 from NousResearch/salvage/cron-autoreset-override fix(gateway): consume was_auto_reset so /model survives session auto-reset (#48031)	2026-06-24 19:20:11 +05:30
x7peeps	6879d77d74	fix(gateway): consume was_auto_reset so /model survives session auto-reset When `/model X` is the FIRST message after an idle/daily/suspended auto-reset, the slash-command path stores a session model override but leaves `session_entry.was_auto_reset = True` (it never passes through `_handle_message_with_agent`, which is where the flag was consumed). On the NEXT regular message, the auto-reset cleanup block pops the freshly-stored model/reasoning override BEFORE the flag is consumed — so the switch is silently lost and resolution falls back to the config default, while the session DB still shows the switched model (a two-sources-of-truth divergence). Consume the flag at both sites: 1. gateway/run.py — capture `was_auto_reset` into a local and set the attribute False immediately at the top of the cleanup block, so the cleanup can't re-fire on a later message and wipe an override stored between turns. Downstream reads use the captured local. 2. gateway/slash_commands.py — the model path consumes the flag before storing the override, so a /model-first-after-auto-reset isn't wiped by the next message's cleanup. Salvaged from #48062 by x7peeps (authorship preserved). Tests: tests/gateway/test_48031_model_switch_after_auto_reset.py — AST invariants pinning both consume sites (load-bearing; verified they fail when either consume is removed). Mirrors the AST-pin approach in test_35809_auto_reset_clean_context.py. Gateway session/reset suite: 16 passed. Fixes #48031	2026-06-24 19:12:44 +05:30
kshitij	d68a133458	Merge pull request #51890 from NousResearch/salvage/40695-handoff-watcher-async fix(gateway): offload handoff-watcher SQLite calls to avoid blocking the async heartbeat (#40695)	2026-06-24 19:10:52 +05:30
kshitij	7634488074	Merge pull request #51889 from NousResearch/salvage/41289-model-cmd-async fix(gateway): offload Discord /model provider-listing off the event loop (#41289)	2026-06-24 19:06:23 +05:30
kshitijk4poor	ab9134bf16	feat(openviking): add full recall prefetch policy Salvage of PR #48927 by @ehz0ah, which consolidates OpenViking recall work from #41706 (@huangxun375-stack), #33260, #49975, and #32444. Replaces stale background post-turn prefetch warming with synchronous current-query recall. The old queue_prefetch warmed the PREVIOUS user message while turn-start recall consumed the CURRENT one, so injected context was always about the wrong topic. Changes: - prefetch() now does session-aware /api/v1/search/search with the current query, falls back to /api/v1/search/find on failure - Contract-safe payloads: limit, score_threshold, context_type, session_id — no top_k, no search-body mode, no target_uri - L2 content reads for items with level=2 or empty abstracts, capped at full_read_limit (default 2) - Local ranking (score + query-token overlap + leaf boost), dedup, score threshold, and injected-char budget - queue_prefetch() is now a no-op (background warming removed) - Additive batched viking_read: uris param accepts up to 3 URIs - Per-request timeout support on _VikingClient.get/post/delete - Removes stale _prefetch_result/_prefetch_thread/_prefetch_generation state and _invalidate_prefetch_state() - Strengthened system_prompt_block guidance Salvage follow-up fixes: - Expose all 8 recall config knobs in get_config_schema() (PR #48927 had removed them; #41706 correctly exposed them). Env vars remain as internal mechanism but are now visible in setup wizard. - Lower default timeout 8s→4s, request_timeout 6s→3s, full_read_limit 3→2 to reduce per-turn blocking latency. Co-authored-by: Hao Zhe <haozhe4547@gmail.com> Co-authored-by: Eurekaxun <eurekaxun@163.com>	2026-06-24 18:53:49 +05:30
liuhao1024	721cf54fb1	fix(gateway): offload /model provider-listing off the event loop (#41289 ) The Discord/Telegram /model slash command listed providers synchronously on the gateway's async event loop. list_picker_providers / list_authenticated_providers are blocking and can fall through to a synchronous urllib HTTP fetch when the on-disk provider cache is stale, freezing the loop for 120-150s -> "application did not respond" and delayed agent starts. Port #41304's asyncio.to_thread offload to the current handler location. The handler moved from gateway/run.py to gateway/slash_commands.py (_handle_model_command); wrap BOTH blocking call sites so the whole bug class is covered: - picker path -> list_picker_providers - text-fallback path -> list_authenticated_providers asyncio.to_thread is already idiomatic in this module (and asyncio is imported), so the loop now stays responsive while the (possibly network-bound) listing runs on a worker thread. Adds tests/gateway/test_model_command_async_offload.py asserting the offload contract at the real handler seam for both paths (mutation- survivable: reverting either to_thread wrap fails the matching test). Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-24 18:40:52 +05:30
r266-tech	f0c5d812b0	fix(gateway): offload handoff watcher SessionDB polling off the event loop The Discord gateway heartbeat stalled ('Shard ID None heartbeat blocked for more than N seconds') because _handoff_watcher polled the synchronous, blocking SQLite-backed SessionDB directly on the asyncio event loop every 2s. Each list_pending/claim/complete/fail call performed blocking disk I/O on the loop thread, starving the Discord heartbeat coroutine. Wrap every blocking SessionDB call inside the watcher loop in asyncio.to_thread(...) so the SQLite work runs on a worker thread and the event loop (and heartbeat) stays responsive. These four call sites are the only synchronous self._session_db.* calls inside the watcher loop body. Adds tests/gateway/test_handoff_watcher_async_db.py asserting the watcher offloads its SessionDB calls via asyncio.to_thread (mutation-survivable: reverting any to_thread wrap fails the corresponding assertion). Fixes #40695 Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-06-24 18:40:23 +05:30
kshitijk4poor	ac822e4d36	fix(compression): abort (preserve context) on transient network summary failure (#29559 , #25585 ) When context compaction's summary generation fails, the compressor's default path (abort_on_summary_failure=False) drops the middle window and inserts a static 'summary unavailable' marker — destroying the compacted turns. #29559 reported the field impact: a Connection error at the compaction moment dropped 124->15 messages (110 lost) for a long browser-automation task; #25585 is the same failure mode (failed summary commits a destructive compaction anyway). compress() already has an EXCEPTION to the historical drop default: auth failures (401/403) ALWAYS abort and preserve the session, because rotating into a placeholder-summary child on a broken credential strands the user. A transient network/connection error is the same situation in reverse: it WILL recover, and retrying then is strictly better than discarding context for a momentary blip. Extend the always-abort carve-out to terminal connection/network failures: - new _last_summary_network_failure flag, set in _generate_summary's terminal failure branch when _is_connection_error(e) (reached only after any main-model fallback is exhausted), reset alongside the auth flag; - compress() aborts when it's set (returns messages unchanged, _last_compress_aborted=True), independent of abort_on_summary_failure; - a network-specific operator warning (distinct from the auth + config-flag messages). Scoped to connection errors only: a generic 500/400 still takes the historical fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green). Tests: network-failure detection + abort-despite-flag-false, both mutation-checked (removing the flag-set fails detection; removing the carve-out fails the abort).	2026-06-24 18:31:51 +05:30
xxxigm	89540d592b	test(cli): cover non-interactive prompt_yes_no fallback Regression coverage for the desktop gateway-restart hang: prompt_yes_no returns its default when HERMES_NONINTERACTIVE=1 or on a bare EOFError (closed/redirected stdin), and still exits on KeyboardInterrupt.	2026-06-24 17:56:30 +05:30
Ben	c93b9f9057	feat(relay): terminal 4401 (opt-out) → clean "Relay disabled" state Phase 7 Unit 7d-B. When an operator opts an instance OUT of the Team Gateway relay (Unit 7b deprovision), the connector revokes the per-gateway secret and closes the gateway's WS with 4401. The reconnect supervisor previously treated EVERY close as retryable, so the live process spun "retrying 4401" forever and the dashboard showed a red error, so opt-out looked like a failure. Now a 4401 close that arrives AFTER a successful handshake is recognized as a terminal credential revocation: - ws_transport.py: track `_handshake_succeeded` (set when a descriptor is received); on a 4401 close after a prior success, latch `auth_revoked` and do NOT spawn the reconnect supervisor. A 4401 BEFORE any successful handshake stays retryable (cold-start / not-yet-provisioned race, not a revocation).
New `auth_revoked` property + a websockets-version-safe close-code reader (prefers `.rcvd`/`.sent` Close frames; `.code` is deprecated in websockets 13+). - adapter.py: a revocation monitor turns `transport.auth_revoked` into a clean, NON-retryable `relay_disabled` fatal and notifies the gateway's fatal-error handler (so the adapter is removed and NOT queued for reconnection — the credential is dead until the instance is recreated). Monitor is cancelled on disconnect; only started when the transport exposes `auth_revoked` (prod WS). - run.py: `_handle_adapter_fatal_error` maps the `relay_disabled` code to a `disabled` platform_state (not `fatal`/`retrying`). - web: PlatformsCard renders the `disabled` state with a neutral outline badge, a PowerOff icon, and muted (not destructive-red) text + message. New optional `status.disabled` i18n string ("Disabled"). Also bundles the Phase 7 contract-doc update (this doc is authoritative in hermes-agent): docs/relay-connector-contract.md gains an "Author-first resolution + the account-link (DM) path" section documenting the multi-tenant-guild rule (D-7.2 — route by authenticated author binding, never by guild; unlinked → fail-closed), the `/link <code>` DM flow, and the connector-authoritative opt-out + terminal-4401 behavior this PR implements. Tests: +2 ws_transport (4401-after-handshake terminal / no-reconnect; 4401-before-handshake stays retryable) and +2 adapter (revocation → non-retryable relay_disabled fatal + handler fired; no-revocation → no fatal). 138 relay tests pass (incl. the contract-doc conformance test); ruff clean; web tsc clean. Phase 7 Unit 7d-B (relay-adapter solo lane). Q17 → Option 2; Option 3 (live de-register, no recreate) + the restart-re-provision hole deferred post-alpha.	2026-06-24 18:43:01 +10:00
Teknium	3c75e11571	fix(browser): validate agent-browser is runnable, not just present (#51740 ) After `hermes update`, a globally-installed agent-browser's npm postinstall (fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser) at our local node_modules binary. The next update wipes node_modules, leaving a dangling symlink that `which` still reports but exec fails on with exit 127 — silently breaking every browser tool (#48521). Root cause is trust-on-presence: shutil.which/Path.exists accept a name that resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves the path + runs --version) and gate all four resolution sites on it: _find_agent_browser now skips a dead candidate and falls through to the next working one (extended PATH -> local .bin -> npx), self-healing the dangling link. dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link. Closes #48521.	2026-06-24 00:14:49 -07:00
Chaz Dinkle	abc3662bf6	fix(gateway): detect launchd in /restart service-manager probe (#43475 ) On a launchd-managed gateway (macOS), /restart stopped the gateway but never relaunched it: the handler's service detection checks only INVOCATION_ID (systemd) and container markers, so under launchd it takes the detached path and exits 0 — which KeepAlive.SuccessfulExit=false treats as a deliberate stop. The gateway stays silently dead until a manual launchctl kickstart. Detect launchd via XPC_SERVICE_NAME, which launchd sets to the job label for processes it spawns. The probe deliberately excludes the literal "0": interactive macOS shells inherit XPC_SERVICE_NAME=0 (a truthy string), and routing an unsupervised interactive gateway to the service path would make it exit non-zero with nothing to revive it. Routing through via_service=True (rather than forcing a non-zero exit on the detached path) matters: the detached path also spawns a helper that relaunches the gateway, so exiting non-zero there would have BOTH the helper and launchd respawn it — two gateways racing for the same bot tokens. The service path spawns no helper; launchd is the single respawner. Fixes #43475. Supersedes the run.py-era probes in #19940/#33393 (the handler has since moved to gateway/slash_commands.py) and avoids the double-spawn risk in the exit-code-site approaches (#43498, #43596).	2026-06-24 00:14:25 -07:00
Tranquil-Flow	73a20a6ad6	fix(telegram): clip mid-stream overflow instead of splitting (#48648)	2026-06-24 00:00:46 -07:00
teknium1	ba50787180	test(anthropic-oauth): cover login token-endpoint host + fallback Add two regression tests for the salvaged #48706 fix: - login token exchange targets platform.claude.com first - falls back to console.anthropic.com when the new host is unreachable Also map the salvaged contributor's noreply email in release.py AUTHOR_MAP (CI author-map gate).	2026-06-23 23:59:40 -07:00
Teknium	be78fbd70e	Revert "fix(profiles): clone auth.json so OAuth credentials carry to cloned profiles (#51719)" (#51732) This reverts commit `f504aecffe`.	2026-06-23 23:58:43 -07:00
justemu	4aa793345e	fix(matrix): use member_count as DM signal for named DM rooms Most Matrix clients auto-set a room name when creating a DM (e.g. "Alice & Bot" from participant display names), so the old `is_direct and not has_explicit_name` heuristic classified virtually all client-created DM rooms as "room", forcing require_mention gating in legitimate one-on-one DMs. member_count is now the primary DM signal: <=2 members means the room is necessarily a 1:1 conversation, regardless of m.direct or an explicit name. A room that grew to 3+ members but is still in stale m.direct is still classified as a room (conflict flag set). Falls back to the m.direct + name heuristic when the count is unavailable. Also hardens _get_room_member_count with a joined_members API fallback when the cache-backed state_store is empty. Salvaged from #48554 by @justemu onto the current plugin adapter path (gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py). Fixes #48551	2026-06-23 23:57:38 -07:00
Teknium	0ef86febe2	docs(sessions): clarify sessions.json is the gateway routing index, not the session list (#51726) Users who inspect ~/.hermes/sessions/sessions.json see only gateway entries (e.g. agent:main:whatsapp:dm:...) and mistake it for the session index that hermes sessions list / /sessions read — which is actually state.db. Issue #49361 reported CLI sessions as 'invisible' on this premise. - gateway/session.py: write a self-documenting _README sentinel at the top of sessions.json explaining it's the gateway routing index and that ALL sessions (CLI/TUI/gateway) live in state.db; skip _-prefixed keys on load so the sentinel never round-trips into a SessionEntry. - Harden every sessions.json reader against the sentinel: mcp_serve loader, gateway/mirror.py, gateway/channel_directory.py all skip _-prefixed keys. - docs/user-guide/sessions.md: warning callout naming the exact symptom. - tests: assert prune ignores metadata sentinels; add round-trip coverage.	2026-06-23 23:56:36 -07:00
liuhao1024	7ff48a6291	fix(discord): check pairing store for component button auth Component button interactions (approve/deny, slash confirm, model picker, clarify) were not checking the pairing store for authorization. Users approved via `hermes pairing approve` could send messages and use slash commands (which go through the gateway authz_mixin), but button clicks were rejected because `_component_check_auth` only checked env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS, etc.) and not the pairing store. This was a regression from commit `f6f363662` which intentionally made component auth fail-closed when no allowlist is set (security fix for GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth. Fix: add a `PairingStore.is_approved("discord", uid)` check to `_component_check_auth`, mirroring `authz_mixin._check_authorization`. The pairing store check runs after all allowlist checks, preserving the fail-closed behavior for non-paired, non-allowed users. Fixes #50627	2026-06-23 23:55:18 -07:00
Teknium	0957d77187	test(agent): cover interrupt tool-tail alternation close (#48879) Regression coverage for the synthetic-assistant close: interrupt after a successful tool must persist an assistant tail (placeholder when no delivered text), real delivered text is preserved, and non-interrupted or non-tool tails are left untouched.	2026-06-23 23:52:28 -07:00
teknium1	53f8386587	test(delegation): regression for bedrock Claude target_model api_mode routing Asserts resolve_runtime_provider honors target_model over the stale persisted model.default when choosing the Bedrock dual-path api_mode: Claude target -> anthropic_messages, Nova target -> bedrock_converse. Both fail without the #49095 fix.	2026-06-23 23:49:37 -07:00
teknium1	d4be583d98	fix(telegram): raise default command-menu cap to 60 so skills stay visible The 30-slot default could not fit Hermes's ~50 built-in commands, so every skill command (and 20 built-ins) were silently dropped from the Telegram `/` menu by default — they only worked when typed manually. Raising the default to 60 keeps all built-ins plus common skill commands visible out of the box while staying under Telegram's ~4KB payload limit. Users can still tune it via platforms.telegram.extra.command_menu.	2026-06-23 23:49:22 -07:00
Thestral	dbe14ce35d	feat(gateway): configure Telegram command menu priority Adds a configurable Telegram BotCommand menu cap and priority list via platforms.telegram.extra.command_menu (max_commands clamped 1..100; priority_mode prepend|append|replace). Default cap stays 30; hidden commands remain invokable when typed and /commands lists the full set. Salvaged from PR #42021. Cherry-picked onto current main; the original edited gateway/platforms/telegram.py, now relocated to plugins/platforms/telegram/adapter.py.	2026-06-23 23:49:22 -07:00
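
The command-menu behavior described in the two Telegram commits above can be sketched roughly as follows. This is an illustrative sketch, not the code in plugins/platforms/telegram/adapter.py: the function names `clamp_max_commands` and `build_menu` are hypothetical, and the exact meaning of each `priority_mode` (priority list placed first, placed last, or used alone) is an assumption inferred from the mode names. Only the 1..100 clamp, the three mode names, and the default of 60 come from the commit messages.

```python
from typing import Iterable, List

# Default cap taken from the later commit (d4be583d98); the earlier one used 30.
DEFAULT_MAX_COMMANDS = 60


def clamp_max_commands(value: int) -> int:
    """Clamp a configured cap into the 1..100 range named in the commit message."""
    return max(1, min(100, value))


def _dedupe(names: Iterable[str]) -> List[str]:
    """Drop repeated command names, keeping the first occurrence of each."""
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def build_menu(
    default_order: List[str],
    priority: List[str],
    priority_mode: str = "prepend",
    max_commands: int = DEFAULT_MAX_COMMANDS,
) -> List[str]:
    """Return the command names that would be shown in the menu.

    Assumed semantics (hypothetical, inferred from the mode names):
      prepend: priority commands first, then the default order
      append:  default order first, then priority commands
      replace: priority commands only
    """
    if priority_mode == "prepend":
        ordered = priority + default_order
    elif priority_mode == "append":
        ordered = default_order + priority
    elif priority_mode == "replace":
        ordered = list(priority)
    else:
        raise ValueError(f"unknown priority_mode: {priority_mode!r}")
    # Commands past the cap are left out of the menu; per the commit message
    # they remain invokable when typed.
    return _dedupe(ordered)[: clamp_max_commands(max_commands)]
```

Under these assumptions, `build_menu(["help", "new", "model"], ["model"], "prepend", 2)` keeps `model` and `help` and leaves `new` out of the menu.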

6135 commits