hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
herbalizer404	3fe16e3cd5	fix(fallback): attach credential pool after provider switch When automatic fallback activates a provider that differs from the primary, try_activate_fallback() cleared the primary's pool (to avoid cross-provider base_url contamination, #33163) but never loaded the fallback provider's own pool. The fallback then ran with no pool, so rate_limit/billing/auth recovery couldn't rotate its credentials. After clearing a mismatched pool, load_pool(fb_provider) and attach it when it has credentials, so provider-specific rotation continues to work on the fallback target.	2026-06-27 04:39:26 -07:00
Tranquil-Flow	635841d210	fix(agent): reload credential pool on switch_model provider change (#52727) switch_model() swapped model/provider/base_url/api_key but never refreshed agent._credential_pool, which stays bound to the original provider. recover_with_credential_pool() then sees a pool.provider != agent.provider mismatch and short-circuits — so a 429/401 on the new provider gets no rotation and falls through to fallback instead. Reload load_pool(new_provider) inside switch_model when the provider changes (or the pool is missing). The reload is inside the protected swap block and the pool is added to the rollback snapshot, so a failed client rebuild restores the original pool. Fixes #16678, #52727.	2026-06-27 04:39:26 -07:00
Teknium	2002bb49a7	test(telegram): make config-bridge tests immune to ambient .env pollution (#53594) test_config_bridges_telegram_group_settings and test_config_bridges_telegram_user_allowlists asserted the YAML→env bridge via os.environ. A developer's real ~/.hermes/.env can repopulate TELEGRAM_* vars during load_gateway_config(): the microsoft_teams plugin runs load_dotenv(find_dotenv(usecwd=True)) at import time, which walks up from the cwd (under ~/.hermes/ in worktrees) and reloads the user's .env, defeating the env-over-YAML bridge for any key present there (e.g. TELEGRAM_GROUP_ALLOWED_CHATS). Assert the returned PlatformConfig.extra instead — it is parsed straight from the test's config.yaml and is immune to that ambient leak. free_response_chats is bridged to the env var only (not extra), and TELEGRAM_FREE_RESPONSE_CHATS doesn't appear in developer .env files, so it stays a deterministic os.environ assertion.	2026-06-27 04:36:45 -07:00
Teknium	d4c2217e87	fix(gateway): offload /model switch off the event loop (#53603) The Telegram/Discord /model command's actual switch calls switch_model() directly on the asyncio event loop. switch_model() can fall through to a synchronous models.dev HTTP fetch (requests.get, 15s timeout) on a cold or expired cache, freezing the gateway for up to 15s and dropping the Telegram connection while a user switches models. The picker provider-list and fallback text-list sites were already offloaded (#41289), but the two _switch_model() calls — the picker callback and the direct /model <name> path — were not. Wrap both in asyncio.to_thread. Closes #20525.	2026-06-27 04:36:22 -07:00
Teknium	caf4dcc7ad	fix(whatsapp): resolve phone↔LID aliases in adapter DM/group allowlist (#53588) The adapter-level intake gate (_is_dm_allowed / _is_group_allowed, reached via _should_process_message) did a raw set-membership check against the configured allowlist. WhatsApp now delivers inbound DM senders in LID form (<id>@lid) while operators configure allowlists with phone numbers, so the check never matched and every DM from an allowed contact was silently dropped before the gateway authz layer ran. Route both gates through the existing gateway.whatsapp_identity.expand_whatsapp_aliases helper (already used by gateway authz and session keys), which walks the bridge's lid-mapping-*.json session files. 
Phone and LID forms now resolve to each other in both directions; exact JID matches, wildcard, disabled/open policies, and empty-allowlist fail-closed behavior are all preserved. Fixes #14486	2026-06-27 04:17:12 -07:00
teknium1	38e7bd8a08	fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status path classified these unconditionally as rate_limit with should_rotate_credential=True, so an overloaded provider exhausted the credential pool after two errors — fatal for a single-key user, who has nothing to rotate to. The credential is valid; the server is just busy. Disambiguate the 429 body against a shared _OVERLOADED_PATTERNS list and route overload language to FailoverReason.overloaded (retryable, no rotation), matching the existing 503/529 path and the message-only path (#52890). Genuine rate limits (no overload language) still rotate. Extracted the inline overloaded tuple #52890 added into the shared _OVERLOADED_PATTERNS constant so the status-code and message paths use one list. Closes #14038.	2026-06-27 04:16:54 -07:00
ms-alan	16192103f4	fix(config): accept placeholder base_url in custom provider validation _normalize_custom_provider_entry() ran urlparse() on base_url and dropped any entry whose value was an un-expanded placeholder, so a caller reaching the normalizer with raw config (e.g. the Dockerized gateway path) silently skipped the provider with a 'not a valid URL' warning. Skip URL validation when the candidate contains a placeholder token — both ${ENV_VAR} env-refs and bare {region}-style templates — since those are expanded at runtime. Closes #14457	2026-06-27 04:15:27 -07:00
HiddenPuppy	b34771fc06	fix(cli): disable prompt_toolkit CPR queries to stop escape-sequence leak (#13870) prompt_toolkit's renderer sends ESC[6n cursor-position queries before painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R. Over SSH/cloudflared tunnels and slow PTYs these replies race past the input parser and land in the display as raw '20;1R21;1R' text, and the pending-CPR future can stall the renderer so the prompt freezes after the agent's final answer. Build the prompt_toolkit output with enable_cpr=False so CPR is marked NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause counterpart to the existing input-side _strip_leaked_terminal_responses scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its get_size setup and calls the constructor directly; it returns None on any failure so startup falls back to the default output. Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0, banner/UI render identically. Layout is unaffected — with CPR off the renderer sizes the prompt to its preferred height (the same fallback prompt_toolkit uses on any terminal that doesn't answer CPR). Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-27 04:15:20 -07:00
LeonSGP43	e7c013494d	fix(agent): preserve nested API error bodies	2026-06-27 04:13:53 -07:00
Teknium	5ab4136631	fix(webui): switch provider when Config-page model field changes (#53583) The dashboard Config tab's Model field is a flat string with no provider info. _denormalize_config_from_web only updated model.default and kept the stale provider, so picking an OpenRouter model while the default provider was ollama-local left provider=ollama-local and every call 404'd. When the model string actually changes, infer the serving provider — curated catalog first, then a vendor/model-slug heuristic for non-aggregator providers — and route the switch through the existing _normalize_main_model_assignment / _apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are cleared on a provider change and preserved on a same-provider re-pick. Saving an unchanged model never re-detects, so unrelated config saves keep an explicit provider. Closes #14058	2026-06-27 04:13:44 -07:00
teknium1	7ee0b68973	fix(gateway,feishu): refuse executor resurrection during real shutdown Add an explicit _closing guard to both owned executors so the recreate-on-shutdown path only recovers from an external teardown of the loop default — never resurrects a pool the gateway/adapter itself stopped. _shutdown_executor() sets the flag; _get_executor() raises if closing; feishu connect() re-arms on reconnect. Updates the gateway recreate test to assert the refusal contract and adds feishu coverage.	2026-06-27 04:13:09 -07:00
teknium1	b296915c82	fix(feishu): route blocking SDK calls through an adapter-owned executor Feishu SDK calls ran on asyncio's shared default executor, so a torn-down default executor wedged every send with 'Executor shutdown has been called' and left the gateway a zombie (#10849). The adapter now owns a ThreadPoolExecutor recreated on demand if shut down, mirroring the gateway-owned executor change. Routes all 17 self._client SDK calls through _run_blocking; shuts the pool down on disconnect.	2026-06-27 04:13:09 -07:00
konsisumer	1011c07966	fix(gateway): use owned executor for agent work	2026-06-27 04:13:09 -07:00
LeonSGP43	52a09d8faf	fix(byterover): honor auto extract config	2026-06-27 04:04:15 -07:00
teknium1	f062cf076b	fix(agent): also treat provider=ollama as an Ollama GLM backend Follow-up to the #13971 fix: a genuine native Ollama provider reached through a reverse proxy carries no ollama/:11434 URL signature, so the restricted detection would miss it. Add provider=="ollama" as an explicit True case (idea from #14789, @Tranquil-Flow) and cover both it and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.	2026-06-27 04:03:07 -07:00
YuShu	00a8252b7d	fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only The _is_ollama_glm_backend() function was too broad: any local endpoint running a GLM model was treated as Ollama, triggering the stop->length misreport heuristic introduced in `8011aa3`. This caused false truncation detection on sglang, vLLM, LM Studio, and other non-Ollama servers that correctly report finish_reason. When a GLM model on sglang/vLLM returned finish_reason='stop', the agent mistakenly reclassified it as 'length' if the response didn't end with a whitelisted punctuation character (ASCII or CJK). This particularly affected Chinese-language responses and Markdown-formatted text. Root cause: the is_local_endpoint() fallback assumed any local GLM endpoint = Ollama. But many non-Ollama servers also run on localhost. Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via its distinctive signatures (port 11434, 'ollama' in URL). All other local servers are assumed to report finish_reason correctly. This is the correct tradeoff because: - False negatives (Ollama at custom port, heuristic not triggered) only mean the user sees a truncated response — same as having no heuristic - False positives (non-Ollama server, heuristic wrongly triggered) inject spurious continuation messages into the conversation — strictly worse Adds two tests: - sglang GLM response is NOT reclassified as truncated - Ollama GLM on port 11434 still triggers the heuristic as before Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-06-27 04:03:07 -07:00
teknium1	ab1f9b94c5	fix(telegram): accept @username chat_id in delivery paths (#13206) TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid literal for int()'. The Telegram Bot API accepts both a numeric chat_id and an @username string; Hermes was force-coercing every chat_id with int(). Add normalize_telegram_chat_id() (returns int for numeric values, passes @username strings through) and apply it at the Bot API send/edit sites in the Telegram adapter and the send_message tool. Username targets are now recognized as explicit targets in _parse_target_ref. Reapplies the approach from #13274 (season179), whose branch predated the gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah). Co-authored-by: season179 <season.saw@gmail.com>	2026-06-27 04:01:58 -07:00
teknium1	f2ca3e3d84	fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in self._restart_task (a bare asyncio.create_task keeps only a weak reference, so a still-pending task can be GC'd mid-flight) and explicitly skips it in the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for the contributor and a regression test that fails when the task is cancellable. Refs #12875	2026-06-27 03:57:31 -07:00
zeapsu	1ce5d6d974	fix(gateway): exclude _run_restart from _background_tasks to prevent zombie on /restart When request_restart() adds _run_restart to _background_tasks, _stop_impl later cancels all entries in that set. Since _run_restart is awaiting _stop_task at that point, the CancelledError propagates into _stop_impl, interrupting cleanup before _shutdown_event.set() and _exit_code = 75 execute. This leaves the gateway as a zombie (alive but disconnected) or exiting with code 0 instead of 75, preventing systemd Restart=on-failure from restarting the service. Fix: don't add _run_restart to _background_tasks — it self-terminates in ~50ms and needs no lifecycle management. Fixes #12875	2026-06-27 03:57:31 -07:00
teknium1	08e131f77c	test(telegram): cover bot self-message ingestion guard (#11905) Regression tests for the self-author guard added in the salvaged fix: - bot-authored DM-topic watcher echo is dropped (the exact #11905 symptom) - bot self-messages dropped in groups/supergroups too - other bots in the same chat are still processed (self-id, not is_bot) - observe-unmentioned sibling path also rejects self-messages - missing from_user does not crash Test scaffolding ported from @cola-runner's PR #12817 and adapted to the current plugins/platforms/telegram/adapter.py and _is_own_message().	2026-06-27 03:56:52 -07:00
Teknium	d73078e7b0	fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570) A profile's cron jobs now provably live in AND execute under that profile's HERMES_HOME. A job authored under profile `coder` is stored at `~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env, config.yaml, scripts and skills — never the default root's. This was the de-facto behavior on main but only by accident: PR #50112 had re-anchored cron storage at the shared default root, and a later stale-branch squash merge (#52147) silently reverted it back to the profile home. Neither direction was guarded by a test, so it could flip again on the next stale merge. Changes: - cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT get_default_hermes_root) and why anchoring at the root leaks config/credentials/skills across profiles — the #4707 security boundary. - cron/scheduler.py, cron/suggestions.py: same intent documented at the dynamic resolution helper and the suggestions store. - tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and execution-home resolution to the active profile so a re-anchor can't regress. Verified E2E: jobs created under two profiles land in separate per-profile stores with zero cross-profile leakage and no shared-root store; scheduler execution-home follows the active profile. Full cron suite: 576/576.	2026-06-27 03:55:01 -07:00
Bartok	864d5521ad	test(curator): join straggler curator-review thread on fixture teardown The curator_env fixture left async review threads (synchronous=False spawns a daemon 'curator-review' thread that calls save_state() on completion) running past test teardown. save_state() resolves the state path from HERMES_HOME at write time, so a straggler could write into the next test's tmp home, corrupting test_state_file_survives_corrupt_read (and others) under CI load. Join the thread on teardown while HERMES_HOME is still pinned to this test's home.	2026-06-27 03:52:52 -07:00
Bartok9	45ce35ed72	fix(agent): classify message-only 'overloaded' as server overload Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the overloaded-classification fix, with a regression test that fails without it.	2026-06-27 03:52:52 -07:00
teknium1	151ae1e937	test(api-server): cover SSE failure finish_reason for both failure modes Lock the contract that a clean stream-queue termination followed by an agent failure never reports finish_reason: "stop". Covers the raised-exception case (#12422 repro), the flagged failed-result case, truncation (length), and the success happy path. Follow-up to the salvaged #12504 fix from @flobo3.	2026-06-27 03:52:44 -07:00
blaryx	76af2456a2	fix(dashboard): merge PUT /api/config with existing on-disk config The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate every root-level key the YAML supports. Most visibly, `custom_providers` is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend never sends it in the PUT body. The previous full-replace save() then silently wiped the key from disk every time the user clicked anything that triggered a save. Other casualties (less visible because defaults re-mask them on load) include `agent.personalities`, `agent.reasoning_effort`, `terminal.lifetime_seconds`, etc. Fix: read the raw on-disk config and deep-merge the incoming PUT body on top of it before saving. The frontend can only overwrite what it explicitly sends; everything else is preserved verbatim. Reuses the existing `_deep_merge` helper from `hermes_cli.config`. Tests: - `test_round_trip_preserves_custom_providers` exercises the exact bug: seed config with custom_providers, GET → drop the key → PUT, assert it's still on disk. - `test_round_trip_preserves_schema_invisible_nested_keys` covers the shallow-vs-deep-merge case for nested dicts under `agent` etc. Both fail on current main; both pass with this patch.	2026-06-27 03:48:18 -07:00
Teknium	ec769e49d2	fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564) The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not use markdown as it does not render' — factually wrong. Both adapters actively convert markdown to native formatting: - whatsapp_common.format_message(): bold, ~~strike~~, # headers, links, code blocks -> WhatsApp native syntax - signal_format.markdown_to_signal(): same conversions via bodyRanges, plus '- item' / '* item' bullets -> '• ' Unicode bullets The wrong hint made the agent strip bullets and bold the adapter would have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud: markdown is auto-converted, bullet lists work, tables are not supported. Added a contract test asserting markdown-converting platforms never forbid markdown in their hint.	2026-06-27 03:46:41 -07:00
dodo-reach	ed54469d06	fix(gateway): show MoA presets in model picker	2026-06-27 03:43:38 -07:00
teknium1	4e0788783b	refactor(gateway): extract MoA one-shot restore helper; restore #28686 comment; real-method tests Follow-up on the salvaged MoA restore fix: - Extract the finally-block restore into _restore_moa_one_shot() so the behavior is unit-testable without re-implementing it, and so the gateway /moa handler and the finally block share one implementation. - Restore the load-bearing #28686 zombie-eviction comment above _release_running_agent_state that the original diff dropped. - Rewrite the tests to call the real _restore_moa_one_shot helper (the originals re-implemented the restore logic inline, so they passed regardless of the production code).	2026-06-27 03:43:28 -07:00
srojk34	2f29e3cfc5	fix(gateway): restore MoA one-shot model override on failed turns The MoA one-shot restore ran inside the try block after _handle_message_with_agent returned. When that call raised an exception (agent init failure, interpreter shutdown, OOM), the restore was skipped and the MoA model override stayed permanently on _session_model_overrides — silently routing all subsequent messages through the MoA reference fan-out with no user-visible indication. Move the restore to the finally block so it fires on every exit path (success, exception, interrupt). The restore data lives on the per-turn event object and would be lost if not consumed here.	2026-06-27 03:43:28 -07:00
briandevans	17cb829991	test(moa): cover non-list/bare-dict reference_models normalization	2026-06-27 03:43:16 -07:00
Teknium	60f58a2b95	feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552) The verify-on-stop guard fired too eagerly — including on doc/markdown/skill edits with nothing to verify, where it pushed a pointless /tmp verification script. Three changes: 1. Default OFF for new installs: agent.verify_on_stop defaults to false (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31. 2. One-time migration (v30 -> v31): existing installs are switched off once, but only when the value is missing or still the "auto" sentinel — an explicit true/false the user set is preserved. 3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on the code paths. The legacy "auto" sentinel is still honored when set explicitly (ON for interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env override unchanged.	2026-06-27 03:23:22 -07:00
Versun	c655cdf2c1	feat(dashboard): expose cron job execution fields	2026-06-27 03:20:32 -07:00
teknium1	50f6855217	feat(moa): make /moa one-shot only; route preset switching through the model picker /moa no longer does a sticky model switch. It now always runs a single prompt through the default MoA preset and restores the prior model afterward; the whole argument is the prompt (no preset-name matching). To switch to a MoA preset for the session, select it from the model picker, where presets already surface under a virtual Mixture of Agents provider on every model-selection surface. Also fixes #53444: the TUI one-shot only set session[model_override], which the already-built cached agent ignored, so MoA silently never ran and the turn used the original model. The TUI now does a real in-place agent.switch_model() via _apply_model_switch() when a live agent exists (with a proper restore after the turn), and falls back to a model_override for lazy/unbuilt sessions. Removes the redundant sticky-switch branch from the CLI, gateway, and TUI /moa handlers; updates the command description, usage string, and docs.	2026-06-27 03:09:09 -07:00
diamondeyesfox	8df231c941	fix(agent): rebaseline in-place compression flushes	2026-06-27 03:04:26 -07:00
Mahesh Sanikommu	1b75b3fd90	feat(memory): add Supermemory setup connection summary Add post_setup() and get_status_config() to the Supermemory memory provider so `hermes memory setup` and `hermes memory status` print a one-line connection summary (container, profile fact count, auto_recall/auto_capture). Point API-key onboarding at the Hermes connect URL (app.supermemory.ai/integrations?connect=hermes). Salvage of #52988. Two fixes folded in: - Test isolation: the new probe/status tests mocked _SupermemoryClient but not the __import__("supermemory") guard inside _probe_supermemory_connection, so they passed only where the optional supermemory package was installed and failed on a clean checkout / CI (the PR shipped with red CI). Added _stub_supermemory_importable() mirroring the existing test_is_available_false_when_import_missing pattern; the suite now passes with supermemory absent. - post_setup: `if api_key and api_key not in os.environ` checked whether the key's value named an env var (always false in practice). Fixed to compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`. Verified: 38/38 in test_supermemory_provider.py and the full tests/plugins/memory/ suite green with supermemory not installed. Closes #52988	2026-06-27 15:07:34 +05:30
underthestars-zhy	8827300267	fix(photon): correlate tapbacks to bot message context Populate `reply_to_message_id`, `reply_to_text`, and `reply_to_is_own_message` on reaction events so the gateway injects `[Replying to your previous message: "..."]` when the agent receives a tapback. The sidecar now extracts a capped text preview from the hydrated reaction target (plain text and mixed group messages; null for attachment/voice-only targets), emitting it as `targetText` in the NDJSON reaction payload. The Python adapter reads this field and sets the reply correlation fields on the `MessageEvent`.	2026-06-27 00:51:34 -07:00
underthestars-zhy	4345b3e767	fix(photon): upgrade spectrum-ts sidecar to v8.0.0 v8 made `richlink` outbound-only; inbound rich links now arrive as plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage` branches from the iMessage mapper patch and update the fixture, lockfile, and README accordingly.	2026-06-27 00:51:34 -07:00
underthestars-zhy	5636c22828	feat(photon): upgrade spectrum-ts sidecar to v7.0.0 Update the Photon platform plugin's Node.js sidecar from spectrum-ts 3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*` packages with `spectrum-ts` as the umbrella re-export. - Bump exact pin in package.json/package-lock.json to 7.0.0 - Update mixed-attachments patch script to target the new `@spectrum-ts/imessage/dist/index.js` path and tab-indented output - Rewrite test fixture to match v7.x mapper shape (tab-indented, `const ... = async` declarations, single-line builder calls) and point at `@spectrum-ts/imessage/dist/index.js` - Update README upgrade guide to document the v5 package split and the postinstall patch validation step - Update comments in cli.py and index.mjs to reference v5/v7 changes	2026-06-27 00:51:34 -07:00
Teknium	d712a7fd73	fix(model-picker): surface the current custom/uncurated model in picker rows (#53457) A model selected via the CLI (e.g. /model openrouter/<uncurated-name>) was absent from every model picker — the main picker AND the MoA reference/aggregator slot pickers — because each provider row only carried its curated catalog. Inject the current model at the front of its provider's row so it is selectable and shown everywhere.	2026-06-27 00:06:34 -07:00
Ben Barclay	fbf748b282	fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399) The self-hosted OIDC provider fetched the discovery document with a bare httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or the requests library), so when an IDP answers GET /.well-known/openid-configuration with a 3xx — Authentik canonicalises the .well-known path, and any IDP behind a reverse proxy doing an http→https upgrade redirects too — the bare redirect (empty body) tripped the status != 200 guard and raised 'OIDC discovery returned 302', which routes.py maps to the provider_unreachable audit event and a 503. The browser surfaced 'Auth provider self-hosted unreachable'. The user's smoking gun (curl -o writing zero bytes from inside the container) is exactly a redirect with no body — the same wall the code hit. Add follow_redirects=True to the discovery GET only. It's safe: the issuer-pin check and _require_https_or_loopback still validate the resolved document and every endpoint, so a redirect can't smuggle in a bad issuer or a cleartext endpoint. The token/revocation POSTs deliberately keep the no-follow default (they carry an auth code / refresh token and the endpoint is already the canonical absolute URL). Existing discovery tests mocked httpx.get with a canned 200 and never exercised a real 3xx. Add a regression test that runs a real loopback server returning a 302 on the .well-known path — fails without the fix (ProviderError: discovery returned 302), passes with it.	2026-06-27 14:14:51 +10:00
ethernet	bcc3eb3419	fix(ci): rip out some xdist legacy stuff... how did these ever work??	2026-06-26 19:15:18 -07:00
ethernet	f0cb049217	change(ci): migrate docker smoketests to real tests	2026-06-26 19:15:18 -07:00
ethernet	fb1dd1bf91	change(ci): docker-publish.yml -> docker.yml	2026-06-26 19:15:18 -07:00
ethernet	c918d07b50	refactor(ci): rewrite docker tests to check built container	2026-06-26 19:15:18 -07:00
ethernet	638243726e	refactor(ci): faster docker builds via --link and chmod removal	2026-06-26 19:15:18 -07:00
Nacho Avecilla	dbe734beff	fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239) * Return None instead of erroring on drain login failure * Fix login on drain * Remove login for drained endpoints flow and clean the code * chore: drop unrelated credits changes from this PR * Remove extra comments that were not really necessary	2026-06-27 10:08:13 +10:00
kshitijk4poor	7475d125d2	test(mcp): stub mcp_oauth in backgrounding test to deflake CI The backgrounding-contract test (test_prepare_agent_startup_backgrounds_blocking_mcp_for_chat) failed intermittently on loaded CI shards: it stubs tools.mcp_tool.discover_mcp_tools but NOT tools.mcp_oauth, so the background discovery thread paid the real, cold ~0.75s 'import tools.mcp_oauth' (added by this PR's _discover_mcp_tools_without_interactive_oauth) before calling the stubbed discovery. On a slow/loaded runner that import plus thread scheduling exceeded the 1.0s polling deadline, leaving calls['mcp'] == 0. Fix: stub tools.mcp_oauth with a nullcontext suppress_interactive_oauth (the same no-op production falls back to when mcp_oauth is unavailable), so the test exercises the backgrounding contract without paying an unrelated cold import in its timing window. Bumped the poll deadline 1.0s -> 3.0s as belt-and-suspenders. Production behaviour is unchanged; the import cost was always off the main thread. Verified: 5/5 pass repeatedly via scripts/run_tests.sh (per-file isolation, matching CI), ruff clean.	2026-06-27 04:59:23 +05:30
zapabob	e55ddc3e33	fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927) When an MCP server requires OAuth, the interactive `hermes` TUI froze on startup: background MCP discovery hit the OAuth flow, which on an interactive TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the "paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That thread competes with the TUI's own stdin reader for the same terminal, so keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL ... the tui is freezing, not respond to typing." Add a thread-local `suppress_interactive_oauth()` context manager in tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the stdin paste-thread and prompt are never created. Background discovery (hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside that context, so OAuth-requiring servers soft-skip (raise OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin. A real `hermes mcp login` on the main thread is unaffected (thread-local). Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick; resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_discovery refactor, keeping both). Verified E2E: with suppression the paste prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive soft-skip); without it the prompt shows (the freeze). Mutation-verified (removing the suppress check in _is_interactive fails the regression test). 76 tests pass, ruff clean. Closes #35927. 
SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT propagate to the dedicated mcp-event-loop thread where OAuth actually runs (discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so the suppression was a NO-OP in production (the tests passed only by stubbing out the cross-thread dispatch). Converted to a contextvars.ContextVar, which asyncio copies onto the scheduled coroutine — empirically verified suppression now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path. Added a cross-thread regression test (fails on threading.local, passes on the ContextVar) so the no-op can't regress.	2026-06-27 04:59:23 +05:30
briandevans	2d8c44ac87	fix(hermes-home): only honour legacy dir layout when it has content get_hermes_dir(new_subpath, old_name) returned the legacy <old_name>/ location as soon as it existed on disk — even when empty. When an empty legacy stub is created on a profile that already has populated data at the new consolidated <new_subpath>/ (install scaffolds, profile init, a stray mkdir, or ensure_hermes_home() recreating legacy dirs), the resolver silently flipped to the empty legacy dir and the real data became invisible. No log, no error — the feature behaved as if state was wiped. Reproduced as a Discord pairing store losing every approved user when an empty pairing/ shadowed the populated platforms/pairing/. Resolve the legacy path only when it has content: a populated directory (any entry) or a non-directory file counts; an empty directory falls through to the new layout. Inspection failures (PermissionError on lstat/iterdir, or any OSError short of FileNotFoundError) are treated as "occupied" so a transient error never orphans legacy data — only a genuine FileNotFoundError counts as absent. The lstat()-based gate also fixes the prior exists()/is_dir() path swallowing PermissionError and mis-reading an unreadable legacy dir as absent. This hardens all 11+ call sites that share the resolver (pairing, image/audio/video/document caches, matrix/whatsapp session stores, vision/credential/tts/browser dirs). Adds TestGetHermesDir regression coverage (empty/populated/subdir/file/unreadable/unstatable cases) and updates test_credential_files to populate its legacy dirs so they still count as content. Closes #27602 Closes #27715	2026-06-27 04:57:15 +05:30
briandevans	c377e954fb	test(gateway): isolate secret-redaction layer from provider-error rewrite The existing test_chat_gateways_redact_secret_in_provider_error feeds a provider-error envelope (HTTP 401), which _sanitize_gateway_final_response rewrites wholesale to a generic category string. That rewrite strips the secret regardless of whether the redaction layer works, so the test cannot on its own prove _redact_gateway_user_facing_secrets is exercised. Add test_chat_gateways_redact_secret_in_non_error_body: ordinary assistant prose that echoes a bearer token but is NOT a provider-error envelope, so the rewrite path does not fire and secret redaction is the only defense. Verified fail-before (token leaks when _GATEWAY_SECRET_PATTERNS is emptied) and pass-after across whatsapp/slack/signal/matrix, while non-secret prose is preserved intact.	2026-06-27 04:47:10 +05:30
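
The self-review note on e55ddc3e33 turns on a detail that is easy to miss: a `threading.local()` flag set on the calling thread is not visible on a separate event-loop thread, while a `contextvars.ContextVar` is copied along when work is scheduled with `asyncio.run_coroutine_threadsafe`. A minimal sketch of that difference follows. The names `_suppress_cv`, `_suppress_tl`, `read_flags` and `demo` are hypothetical and are not the identifiers used in tools/mcp_oauth.py.

```python
import asyncio
import contextvars
import threading

# Hypothetical stand-ins for a "suppress interactive OAuth" flag.
_suppress_cv = contextvars.ContextVar("suppress_interactive_oauth", default=False)
_suppress_tl = threading.local()


async def read_flags():
    """Runs on the loop thread and reports what each mechanism sees there."""
    return _suppress_cv.get(), getattr(_suppress_tl, "active", False)


def demo():
    # A dedicated event-loop thread, similar in spirit to an "mcp-event-loop".
    loop = asyncio.new_event_loop()
    worker = threading.Thread(target=loop.run_forever, daemon=True)
    worker.start()
    try:
        # Set both flags on the calling thread, as a context manager would.
        token = _suppress_cv.set(True)
        _suppress_tl.active = True
        try:
            # The caller's context is captured when the call is scheduled,
            # so the coroutine sees the ContextVar value but not the
            # thread-local one.
            future = asyncio.run_coroutine_threadsafe(read_flags(), loop)
            return future.result(timeout=5)
        finally:
            _suppress_cv.reset(token)
            _suppress_tl.active = False
    finally:
        loop.call_soon_threadsafe(loop.stop)
        worker.join(timeout=5)
        loop.close()


if __name__ == "__main__":
    print(demo())
```

The first element of the returned tuple is the ContextVar as seen on the loop thread and the second is the thread-local, which is why a suppression built on `threading.local()` can pass same-thread tests and still be a no-op across the dispatch.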

6359 commits
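
Commit d4c2217e87 in the log above describes wrapping a blocking call in `asyncio.to_thread` so the event loop stays responsive. A minimal sketch of that pattern, assuming a stand-in `blocking_switch` function (hypothetical, not the project's `switch_model()`), with a heartbeat task that keeps ticking while the blocking work runs in a worker thread:

```python
import asyncio
import time


def blocking_switch(model: str) -> str:
    """Hypothetical stand-in for a synchronous call that blocks on I/O."""
    time.sleep(0.2)
    return f"switched to {model}"


async def handle_model_command(model: str):
    ticks = 0

    async def heartbeat():
        # Counts how often the loop gets to run while the switch is pending.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.02)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    try:
        # Run the blocking function in a worker thread instead of calling
        # it directly on the event loop.
        result = await asyncio.to_thread(blocking_switch, model)
    finally:
        hb.cancel()
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(handle_model_command("example-model")))
```

Calling `blocking_switch(model)` directly inside the coroutine would leave `ticks` at zero, because the heartbeat could not be scheduled until the sleep returned.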