hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
aqilaziz	1a82b7a1ff	fix(tests): stabilize xai env and provider parity	2026-05-17 11:55:25 -07:00
worlldz	73df329214	fix(doctor): flag missing credentials for active openrouter provider	2026-05-17 11:53:04 -07:00
vaddisrinivas	7847a58b3a	fix(docker): preload messaging gateway deps	2026-05-17 11:51:46 -07:00
Hoang V. Pham	4a7cd2e16d	fix(codex): allow kanban worker board writes	2026-05-17 11:50:43 -07:00
soynchux	280c63ce91	fix(mcp): prevent parallel-safe prefix collisions	2026-05-17 11:41:26 -07:00
Mind-Dragon	874dad5cc1	test(delegation): add regression test for runtime missing 'provider' key Addresses reviewer feedback: when resolve_runtime_provider returns a dict without the 'provider' key, the result must be None regardless of configured_provider. This guards against malformed runtime responses. Test: test_runtime_missing_provider_key_returns_none	2026-05-17 11:40:05 -07:00
Mind-Dragon	84667cbc21	fix(delegation): preserve configured_provider name when runtime returns 'custom' Named custom providers (e.g. crof.ai) resolve to provider='custom' at the runtime level, causing subagents to lose their intended provider identity. On retry/fallback, resolve_provider_client('custom', model=...) searches all providers advertising that model and picks non-deterministically, routing to Z.AI or Bailian instead of the configured target. The fix preserves configured_provider when runtime['provider'] == 'custom', restoring the original provider name so routing stays correct through retries. Adds a named constant _RUNTIME_PROVIDER_CUSTOM instead of a magic string. Adds three regression tests: - test_named_custom_provider_preserves_provider_name: the #26954 case - test_standard_provider_not_overwritten_by_configured_name: openrouter/nous must still return their own identity, not the configured name - test_custom_provider_with_empty_configured_provider_falls_back_to_runtime: empty provider triggers the early-return None path as before	2026-05-17 11:40:05 -07:00
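The provider-name rule described in the two delegation commits above can be sketched as follows. This is a minimal illustration, not the project's code: `effective_provider` is a hypothetical helper name, and the order of the guards is an assumption read off the test descriptions. Only `_RUNTIME_PROVIDER_CUSTOM` is named in the commit.

```python
from typing import Optional

# Named constant instead of a magic string, as the commit describes.
_RUNTIME_PROVIDER_CUSTOM = "custom"


def effective_provider(runtime: dict, configured_provider: str) -> Optional[str]:
    """Hypothetical helper: pick the provider name a subagent should keep."""
    if not configured_provider:
        return None  # empty configured provider: early-return None path
    provider = runtime.get("provider")
    if not provider:
        return None  # malformed runtime response without a 'provider' key
    if provider == _RUNTIME_PROVIDER_CUSTOM:
        return configured_provider  # keep the named custom provider, e.g. "crof.ai"
    return provider  # standard providers keep their own identity
```

With this shape, a retry sees "crof.ai" rather than the generic "custom", so it has no reason to search every provider that advertises the same model.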
QuenVix	2f28b60a47	fix(send_message): preserve Slack and Matrix thread targets resolved from channel directory	2026-05-17 11:38:55 -07:00
QuenVix	d5a0815c3d	fix(transports): use monotonic deadlines in codex app-server turn loop	2026-05-17 11:37:45 -07:00
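For readers unfamiliar with the technique named in the commit above, a monotonic deadline in a polling loop might look like the sketch below. `run_turn_loop` and its parameters are hypothetical; the actual codex app-server loop is not shown in this log.

```python
import time


def run_turn_loop(poll, timeout_s: float, interval_s: float = 0.01):
    """Poll until `poll()` returns a non-None value or the deadline passes."""
    # time.monotonic() never goes backwards, so NTP steps or manual clock
    # changes cannot shorten or extend the deadline.
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("turn did not complete before the deadline")
        time.sleep(min(interval_s, remaining))
```

A deadline computed from `time.time()` would be exposed to wall-clock adjustments, which is the usual reason for this kind of fix.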
EloquentBrush0x	d0f551b44e	fix(doctor): show xAI OAuth login state in hermes doctor Auth Providers section `hermes doctor` displayed OAuth status for Nous, Codex, Gemini, and MiniMax but silently omitted xAI OAuth, even though `get_xai_oauth_auth_status()` exists and the same information is already surfaced in `hermes status`. Add xAI OAuth as a separate try/except block so an import failure cannot silence the already-printed provider rows above it — consistent with the per-provider isolation introduced in the doctor fallback fix. Tests: - 9 new tests in TestDoctorXaiOAuthStatus covering: logged-in ok, not-logged-in warn, error line present/absent, import failure isolation, runtime exception and None-return safety. - 9 existing run_doctor helpers updated to mock get_xai_oauth_auth_status for deterministic output.	2026-05-17 11:35:57 -07:00
EloquentBrush0x	016893f5e4	feat(status): show xAI OAuth login state in hermes status hermes status listed Nous Portal, OpenAI Codex, Qwen OAuth, and MiniMax OAuth in the Auth Providers section but omitted xAI OAuth entirely. Users who authenticated via `hermes auth add xai-oauth` had no way to verify their session state from the status output. Add xAI OAuth display using the same field shape as OpenAI Codex: auth_store (Auth file:), last_refresh (Refreshed:), and error when not logged in. The import is isolated in its own try/except so an import failure cannot affect the already-printed rows above it. Tests cover: - logged in: check mark, auth_store, last_refresh, error suppressed - not logged in: login command hint, error shown, error absent = no line - resilience: import failure, status function raises, returns None - isolation: xAI import failure does not break Nous/MiniMax display	2026-05-17 11:35:57 -07:00
EloquentBrush0x	e10bb9dffa	fix(doctor): isolate per-provider OAuth imports to prevent fallback regression Shared try/except import block meant that if any one status function was missing, all providers lost their OAuth fallback suppression. Split into per-provider try/except so each branch is independently safe. Add end-to-end test for xAI: bad XAI_API_KEY with healthy OAuth does not surface a blocking issue in run_doctor output. Add tests for None return, import failure isolation (xAI missing does not break Gemini), and move test_returns_false_for_unknown_provider out of the xAI-specific class.	2026-05-17 11:35:57 -07:00
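The per-provider isolation described above can be illustrated with a small sketch. `collect_oauth_status` and the loader callables are hypothetical stand-ins for the per-provider import-and-call blocks; the point is only that each provider gets its own try/except.

```python
from typing import Callable, Dict, Optional


def collect_oauth_status(
    loaders: Dict[str, Callable[[], Optional[dict]]],
) -> Dict[str, bool]:
    """Return {provider: logged_in}, isolating each provider's failure."""
    healthy: Dict[str, bool] = {}
    for name, load in loaders.items():
        try:
            status = load()  # may raise ImportError or any runtime error
        except Exception:
            status = None  # one broken provider must not affect the others
        # A None return is treated the same as "not logged in".
        healthy[name] = bool(status and status.get("logged_in"))
    return healthy
```

With a single shared try/except, the first failing loader would have skipped every provider after it, which is the regression the commit describes.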
EloquentBrush0x	e89d78ff09	fix(doctor): suppress stale XAI_API_KEY issue when xAI OAuth is healthy _has_healthy_oauth_fallback_for_apikey_provider() covers Gemini and MiniMax (added by #26853) but omits xAI. The xAI provider profile (plugins/model-providers/xai/__init__.py) has auth_type="api_key" and env_vars=("XAI_API_KEY",), so it enters the generic API-key connectivity loop. When XAI_API_KEY fails a 401 probe but xAI OAuth is healthy, the failure is promoted to the blocking summary even though xAI works fine via OAuth — the same false-positive #26853 fixed for Gemini and MiniMax. Fix: import get_xai_oauth_auth_status alongside the existing two helpers and add the "xai" branch. get_xai_oauth_auth_status() already exists in hermes_cli/auth.py and returns {"logged_in": True} when a valid OAuth token is present. Symmetric with the Gemini and MiniMax branches introduced in #26853. No behavior change for providers without an OAuth path.	2026-05-17 11:35:57 -07:00
kshitijk4poor	c74ff2c8ef	fix(browser): self-review pass — dead-import, log levels, future-proofing Addresses findings from two self-review passes pre-merge. First pass (3-agent parallel review): 1. plugins/browser/browser_use/provider.py: drop the ``_ = managed_nous_tools_enabled`` dead-import-hider in _get_config_or_none(). The import was actively misleading — the helper IS used in _get_config() (separate method, separate import), not here. The "keep static analysis happy" comment was wrong about what the helper does in this scope. 2. agent/browser_provider.py: drop ``pragma: no cover`` from is_configured() / provider_name() backward-compat aliases. They ARE covered by ``TestLegacyAbcAliases`` — the pragma would have masked future regressions. 3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden() to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot instead of hardcoded set of 3 keys. Future maintainers adding a 4th built-in provider now just extend _PROVIDER_REGISTRY; the override detection adapts automatically. Previously the hardcoded ``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip True forever on any 4-key registry, silently routing every install onto the legacy fixture path. 4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set but the registry has no matching plugin (typo, uninstalled plugin, discovery failure), emit a WARNING with actionable text instead of silently falling through to auto-detect. Legacy code surfaced a typed credentials error via direct class instantiation; this log restores the signal in the post-migration path. 5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE documentation. Module docstring + 13-line block-comment + 5-line inline comment was repeating the same point. Kept the docstring and trimmed the block-comment to 5 lines. 6. 
agent/browser_registry.py: upgrade is_available()-raised logging from DEBUG to WARNING with exc_info=True. A provider's availability check throwing is unusual enough that users debugging "no cloud provider" need the traceback in logs. 7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level imports (os, shutil, tempfile — only referenced inside the SUBPROCESS_SCRIPT string literal that runs in a child process). Second pass (architecture + claim-verification review): 8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider auto-detect branch. Prior text claimed it "routes through the plugin registry's legacy preference walk so third-party plugins still get a chance to be selected when they're explicitly configured" — false on both counts. The branch uses module-level legacy class aliases (BrowserUseProvider / BrowserbaseProvider) directly; third-party plugins are intentionally reachable only via explicit ``browser.cloud_provider``. Corrected comment now matches behaviour and cross-references _LEGACY_PREFERENCE for the firecrawl gate rationale. 9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py: drop the unused ``get_active_browser_provider as _registry_get_active_browser_provider`` alias from the ``from agent.browser_registry import ...`` block. It was never referenced; matching test-stub line in the agent.browser_registry SimpleNamespace also dropped. ``get_provider`` is still imported (used by the explicit-config dispatch path at line 535). 10. plugins/browser/firecrawl/provider.py: align emergency_cleanup() with the early-guard pattern used in browserbase + browser_use plugins. Previously firecrawl tried the DELETE and relied on ``_headers()`` raising ValueError to trip a "missing credentials" warning; same final outcome but a different control flow that read like a bug to a maintainer skimming the three modules. 
Now: if is_available() is False, log+return early — identical shape to the other two providers. Verification: 54/54 unit tests + 13/13 parity scenarios still pass.	2026-05-17 04:04:15 -07:00
kshitijk4poor	1bb6f03724	fix(browser): ensure plugin discovery before registry lookup; parity harness Two changes that go together: 1. tools/browser_tool.py — add _ensure_browser_plugins_loaded() and call it from _get_cloud_provider() before consulting the registry. Normally model_tools triggers discover_plugins() as an import side-effect, but _get_cloud_provider() can be reached from contexts that haven't gone through model_tools (standalone scripts, certain unit-test paths, the new parity-sweep harness). Without the defensive call, the registry is empty and _registry_get_browser_provider() returns None — silently downgrading users to local mode when they explicitly configured a cloud provider with no credentials yet. The behavior-parity sweep below caught this as 4 scenario regressions (explicit-X-no-creds for all 3 providers, and explicit-firecrawl-with-creds). 2. tests/plugins/browser/check_parity_vs_main.py — subprocess harness that pins one Python invocation to origin/main and one to this PR's worktree via sys.path.insert(), runs _get_cloud_provider() across a 13-scenario config matrix, and diffs the reduced shape tuple (is_local, provider_name, is_available). Provider_name pulls from provider.provider_name() which is the legacy CloudBrowserProvider API and remains as a backward-compat alias on the new BrowserProvider ABC, so the comparison is apples-to-apples regardless of class identity. Final result: PARITY OK across 13 scenarios. 
The four observable config/credential matrices that exercise the dispatcher all match origin/main bit-for-bit: - no-config + no-env → local - explicit local + any env → local - explicit BB / BU / FC + no creds → provider returned with is_available()==False (so dispatcher surfaces typed credentials error; matches main exactly) - explicit BB / BU / FC + creds → provider returned with is_available()==True - no-config + BU creds → Browser Use - no-config + BB creds → Browserbase - no-config + both → Browser Use (legacy walk first hit) - no-config + FC only → local (firecrawl NOT in legacy walk) - no-config + FC + BB → Browserbase (legacy walk skips firecrawl) Per the dev skill's "behavior-parity for refactor PRs" rule — without this subprocess sweep, 31/31 unit tests pass while the production code path is silently broken for users who type `browser.cloud_provider: browserbase` and run a single browser command without prior model_tools import. Caught + fixed before push.	2026-05-17 04:04:15 -07:00
kshitijk4poor	fec0a0da98	test(plugins/browser): coverage for the 3-plugin migration Mirrors tests/plugins/web/test_web_search_provider_plugins.py from PR #25182. 31 tests across 5 classes: TestBundledPluginsRegister (8 tests) - Three plugins register (browserbase, browser-use, firecrawl) - Each plugin's name + display_name accessible - get_setup_schema() returns picker-shaped dict with post_setup hook - All three lifecycle methods (create_session, close_session, emergency_cleanup) overridden on every plugin TestIsAvailable (4 tests) - browserbase needs BOTH BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID - browserbase: api_key alone or project_id alone insufficient - browser-use satisfied by BROWSER_USE_API_KEY - firecrawl satisfied by FIRECRAWL_API_KEY TestRegistryResolution (8 tests), most valuable, locks down pre-migration semantics: - _resolve(None) with no creds returns None (local mode) - _resolve('local') short-circuits to None - _resolve('browserbase') returns provider even when unavailable (so dispatcher surfaces typed credentials error) - _resolve('firecrawl') same: explicit-config wins - _resolve('unknown') falls through to auto-detect - Legacy walk picks browser-use over browserbase - browserbase-only configuration: browserbase wins - Regression: firecrawl is NEVER auto-selected even when single-eligible (preserves pre-migration gate; FIRECRAWL_API_KEY shared with web firecrawl must not silently route to paid cloud browser) TestLegacyAbcAliases (6 tests) - is_configured() delegates to is_available() for all three plugins - provider_name() returns display_name for all three plugins TestPickerIntegration (3 tests) - _plugin_browser_providers() exposes all three plugins as rows - Each row carries post_setup='agent_browser' - browser_plugin_name marker matches browser_provider All tests use real imports (no mocking of provider classes), so the suite catches drift in the ABC, registry, picker injection, and plugin glue layer simultaneously. 31/31 passing.	2026-05-17 04:04:15 -07:00
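The resolution semantics that TestRegistryResolution and the parity sweep pin down can be condensed into a short sketch. `resolve` is a hypothetical, name-based simplification (the real `_resolve` returns provider objects), and `available` is an assumed mapping from registered plugin name to its `is_available()` result. `_LEGACY_PREFERENCE` is named in the commits; its exact contents here are inferred from the scenario list.

```python
from typing import Dict, Optional

# Order of the legacy auto-detect walk; firecrawl is deliberately absent.
_LEGACY_PREFERENCE = ("browser-use", "browserbase")


def resolve(configured: Optional[str], available: Dict[str, bool]) -> Optional[str]:
    """Return the chosen provider name, or None for local mode."""
    if configured == "local":
        return None  # explicit local short-circuits
    if configured in available:
        return configured  # explicit config wins even without credentials
    for name in _LEGACY_PREFERENCE:  # unset or unknown name: auto-detect
        if available.get(name):
            return name
    return None  # nothing eligible: local mode
```

Keeping firecrawl out of the walk is what prevents a `FIRECRAWL_API_KEY` set for web search from silently routing sessions to a paid cloud browser.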
kshitijk4poor	250caebeb1	refactor(browser): delete tools/browser_providers/ directory; migrate tests The four files in tools/browser_providers/ (base.py, browserbase.py, browser_use.py, firecrawl.py) have been migrated into plugins/browser/<vendor>/provider.py over the previous commits. No in-tree code references them anymore — the legacy class names (BrowserbaseProvider / BrowserUseProvider / FirecrawlProvider) are re-exported from tools.browser_tool as aliases to the plugin classes, so existing test patches keep working. Updates tests/tools/test_managed_browserbase_and_modal.py: - Adds _load_plugin_module() helper next to _load_tool_module(). - Reroutes five _load_tool_module('tools.browser_providers.X', ...) calls to _load_plugin_module('plugins.browser.X.provider', ...). - Renames BrowserbaseProvider/BrowserUseProvider -> the new plugin class names (BrowserbaseBrowserProvider / BrowserUseBrowserProvider). - Updates is_configured() -> is_available() on the one assertion that cared about the rename (the others stay on is_configured() via the BrowserProvider ABC's backward-compat alias). Net diff: -630 / +39 lines (tests + dead-code deletion). Verified 23/23 tests in test_browser_cloud_*.py + test_managed_browserbase_and_modal.py still pass. Closes the file-tree mismatch portion of #25214. Remaining work: new plugin-level test coverage under tests/plugins/browser/, behaviour parity subprocess sweep vs origin/main, and full tests/tools/ regression sweep before opening the PR.	2026-05-17 04:04:15 -07:00
kshitijk4poor	40fde853fa	refactor(browser): dispatch _get_cloud_provider through agent.browser_registry Switches tools.browser_tool's cloud-provider lookup from the hardcoded _PROVIDER_REGISTRY class-instantiation pattern to the agent.browser_registry singleton registry that plugins self-populate. Changes: - tools/browser_tool.py top imports: pull BrowserProvider from agent.browser_provider (re-exported as CloudBrowserProvider for legacy callers) and the three provider classes from plugins/browser/<vendor>/. Legacy class names (BrowserbaseProvider, BrowserUseProvider, FirecrawlProvider) remain on tools.browser_tool as re-export shims so existing test patches (monkeypatch.setattr(browser_tool, 'BrowserUseProvider', ...)) keep working. - _get_cloud_provider() now consults agent.browser_registry.get_provider() for explicit-config lookups. The auto-detect fallback still uses BrowserUseProvider() / BrowserbaseProvider() at the module level so the cache-policy test fixtures (which patch those names) keep driving the function. Test-time _PROVIDER_REGISTRY overrides are detected by class identity and routed through the legacy factory-call path. - agent/browser_provider.py: BrowserProvider grows is_configured() and provider_name() as thin backward-compat aliases for the legacy CloudBrowserProvider API. Subclasses MUST implement is_available() and name; the aliases delegate. This keeps ~6 caller sites in browser_tool.py working without churning them. - tests/tools/test_managed_browserbase_and_modal.py: _install_fake_tools_package grows stubs for agent.browser_provider / agent.browser_registry / plugins.browser.<vendor>.provider so the test's spec-loader path (sys.modules-reset + reload-tool-from-disk) can satisfy tools.browser_tool's top-level imports. Verified: all 23 existing tests in test_browser_cloud_*.py + test_managed_browserbase_and_modal.py still pass post-cutover. 
The legacy tools/browser_providers/ directory is NOT yet deleted; several tests still _load_tool_module() those files via spec_from_file_location. The deletion + test-path updates land in a later commit.	2026-05-17 04:04:15 -07:00
flamiinngo	63805965e7	fix(security): restore type safety and extract constant in shell hook block handler Address code review feedback on _parse_response: 1. Restore isinstance(raw, str) guard so non-string message/reason values (e.g. integers, lists) from a malformed hook response fall back to the default rather than being forwarded as-is. This keeps the contract that message in the returned dict is always a string. 2. Extract the repeated literal 'Blocked by shell hook.' into a module-level constant _DEFAULT_BLOCK_MESSAGE to avoid duplication and make it easy to change in one place. Four new unit tests added to tests/agent/test_shell_hooks.py covering: - action block with no message (uses default) - decision block with no reason (uses default) - action block with empty string message (uses default) - action block with non-string message, e.g. integer (uses default)	2026-05-17 02:31:18 -07:00
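The guard described above can be sketched in a few lines. `block_message` is a hypothetical reduction of `_parse_response`; the mapping of `message` to action-style responses and `reason` to decision-style responses follows the test descriptions, and the lookup order is an assumption.

```python
_DEFAULT_BLOCK_MESSAGE = "Blocked by shell hook."


def block_message(response: dict) -> str:
    """Return the block message from a hook response, always as a string."""
    # 'action: block' carries 'message'; 'decision: block' carries 'reason'.
    raw = response.get("message") or response.get("reason")
    if isinstance(raw, str) and raw:
        return raw
    return _DEFAULT_BLOCK_MESSAGE  # missing, empty, or non-string value
```

The `isinstance` check is what keeps integers or lists from a malformed hook from being forwarded as the message.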
godlin	6622277f11	fix ACP start events for polished tools	2026-05-17 02:31:18 -07:00
haran2001	d9abbe7fa4	fix(metadata): qwen3.6-plus has a 1M context window (#27008) qwen3.6-plus did not have an explicit entry in DEFAULT_CONTEXT_LENGTHS, so the longest-substring fallback matched the generic 'qwen': 131072 catch-all. That dropped the effective context limit from 1,048,576 tokens to 131,072, prematurely lowered the compression threshold, and produced misleading warnings about main/compression context mismatch in long sessions. Add an explicit 'qwen3.6-plus': 1048576 entry before the catch-all and cover it with a regression test (bare, qwen/, and dashscope/ prefixes). Note: PR #6599 also mentions touching model_metadata.py but the actual diff only edits hermes_cli/models.py, so this fix is independent and not duplicated by that PR. Closes #27008	2026-05-17 02:31:18 -07:00
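A longest-substring lookup of the kind the commit above refers to might be sketched like this. The two table entries come from the commit message; `context_length` is a hypothetical function name, and the real fallback may differ in detail (for example in how it handles case or ordering).

```python
DEFAULT_CONTEXT_LENGTHS = {
    "qwen3.6-plus": 1_048_576,  # explicit entry added by the fix
    "qwen": 131_072,            # generic catch-all
}


def context_length(model: str, default: int = 0) -> int:
    """Pick the entry whose key is the longest substring of the model name."""
    name = model.lower()
    matches = [key for key in DEFAULT_CONTEXT_LENGTHS if key in name]
    if not matches:
        return default
    return DEFAULT_CONTEXT_LENGTHS[max(matches, key=len)]
```

Without the explicit entry, "qwen" is the only key that matches "qwen3.6-plus", which is how the limit fell to 131,072.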
haran2001	5a2a858b84	test(restart_drain): assert i18n catalog resolved (#22266) The restart-drain test previously asserted equality between two calls to t("gateway.draining", count=1), which masked the original xdist failure mode in #22266: if the locale catalog is not resolved from the worker's import path, t() returns the bare key path and both sides of the equality still match. Add a guard that the resolved value is not the raw catalog key and contains the English placeholder substitution. This keeps the test loudly failing when locale resolution silently degrades.	2026-05-17 02:31:18 -07:00
Yanzhong Su	d87b27cff8	fix(gateway): add codex runtime telegram alias	2026-05-17 02:31:18 -07:00
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep, tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero), exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape: identical failure count, identical category, all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
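The PLR6201 rewrite and the hashability caveat mentioned above can be shown with a tiny example. The function names and status strings are made up for illustration and do not come from the repository.

```python
def is_terminal(status: str) -> bool:
    # Before the fix this would have read: status in ("done", "failed", "cancelled")
    return status in {"done", "failed", "cancelled"}


def unhashable_probe() -> str:
    """Show the one behavioral difference: unhashable operands."""
    value = ["done"]                         # a list is not hashable
    in_tuple = value in ("done", "failed")   # tuple scan compares with ==
    try:
        value in {"done", "failed"}          # set lookup must hash the operand
    except TypeError:
        return f"tuple: {in_tuple}, set: TypeError"
    return f"tuple: {in_tuple}, set: ok"
```

For hashable scalars the two forms agree, which is why the commit stresses that every changed value is a str, int, None, enum, or signal.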
EloquentBrush0x	ad00777f04	fix(mcp-oauth): print SSH tunnel hint in _redirect_handler When Hermes runs on a remote host over SSH, MCP OAuth loopback flows silently fail: the OAuth provider redirects the user's browser to http://127.0.0.1:<port>/callback, which reaches the callback server on the remote machine — not the local machine where the browser is running. _redirect_handler already detected SSH (via _can_open_browser) and printed "Headless environment detected — open the URL manually." but gave no guidance on how to actually reach the callback server. Users got silent timeouts or "Could not establish connection" errors. This is the same bug fixed for xAI-oauth and Spotify in #26592, which added _print_loopback_ssh_hint() in hermes_cli/auth.py. mcp_oauth.py uses the identical loopback callback pattern (http://127.0.0.1:<port>/callback via _configure_callback_port / _wait_for_callback) but was missing the hint. Fix: when SSH_CLIENT or SSH_TTY is set and _oauth_port is available, print the ssh -N -L port-forward command and the OAuth-over-SSH guide URL to stderr, consistent with the rest of _redirect_handler's output. Tests: 4 new cases in TestRedirectHandlerSshHint covering SSH_CLIENT, SSH_TTY, local session (no hint), and missing _oauth_port (no hint).	2026-05-17 02:29:37 -07:00
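The condition under which the hint is printed can be sketched as below. `loopback_ssh_hint` is a hypothetical helper, and the exact command text, including the `<user>@<remote-host>` placeholder, is an assumption; the commit only states that an `ssh -N -L` port-forward command is printed when `SSH_CLIENT` or `SSH_TTY` is set and the port is known.

```python
import os
from typing import Mapping, Optional


def loopback_ssh_hint(
    port: Optional[int], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return a port-forward hint for SSH sessions, or None when not needed."""
    in_ssh = bool(env.get("SSH_CLIENT") or env.get("SSH_TTY"))
    if not in_ssh or not port:
        return None  # local session, or callback port not configured yet
    # Forward the same port so http://127.0.0.1:<port>/callback opened in the
    # local browser reaches the callback server on the remote host.
    return f"ssh -N -L {port}:127.0.0.1:{port} <user>@<remote-host>"
```

The user runs the printed command on the machine where the browser lives, then completes the OAuth flow as usual.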
EloquentBrush0x	a9ba636d53	fix(tools): run post_setup in _reconfigure_provider() for env-var providers _configure_provider() calls _run_post_setup() after collecting env vars (line 2286). _reconfigure_provider() did not — providers with both env_vars and post_setup (Browserbase, Browser Use, Firecrawl, Camofox) skipped the installation step on reconfiguration. Fix: mirror the _configure_provider() call. post_setup hooks are idempotent (check before installing), so no behaviour change for users who already have the dependencies installed.	2026-05-17 02:21:06 -07:00
Teknium	ad1aa1a037	feat(x_search): auto-enable toolset when xAI OAuth or XAI_API_KEY is configured (#27376) The x_search toolset is gated on xAI credentials (SuperGrok OAuth or XAI_API_KEY), but it was staying off-by-default even for users who had already configured those credentials; they had to also click through `hermes tools` → X (Twitter) Search to flip it on. The HASS_TOKEN → homeassistant rule already handles the parallel case cleanly; x_search needs the same treatment. Why a separate code path from HASS_TOKEN: `ha_` tools live inside the `hermes-cli` composite, so the subset-inference loop picks them up and the HASS branch just unmasks default_off. `x_search` is its own one-tool toolset NOT in the composite, so the subset loop never adds it; it has to be injected directly. Add `_xai_credentials_present()`, a side-effect-free check for stored xAI OAuth tokens or XAI_API_KEY (dotenv or env). No network. * In `_get_platform_tools()` else branch (no explicit user config), inject `x_search` and carve a parallel hole in default_off. * Auto-enable does NOT fire when the user has saved an explicit toolset list via `hermes tools`; that list stays authoritative. * `agent.disabled_toolsets: [x_search]` still wins (global override). Tests: 4 new in test_tools_config.py covering OAuth path, API-key path, no-creds path, and explicit-config-respect. All pass alongside existing 70/70 in that file.	2026-05-17 02:19:38 -07:00
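The precedence rules listed in the commit above (explicit list is authoritative, auto-enable only without one, global disable always wins) can be sketched as follows. `effective_toolsets` and its parameters are hypothetical; the real logic lives in `_get_platform_tools()` and is more involved.

```python
from typing import List, Optional, Set


def effective_toolsets(
    defaults: Set[str],
    explicit: Optional[List[str]],
    disabled: Set[str],
    xai_credentials_present: bool,
) -> Set[str]:
    """Compute enabled toolsets under the three rules from the commit."""
    if explicit is not None:
        enabled = set(explicit)  # a saved list stays authoritative
    else:
        enabled = set(defaults)
        if xai_credentials_present:
            # x_search is not part of any composite, so it is injected directly.
            enabled.add("x_search")
    return enabled - disabled  # agent.disabled_toolsets still wins
```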
kshitij	519657aa98	fix(matrix): warn on clock-skew silent message drops (#12614) (#27330) The 5-second startup-grace filter in _on_room_message silently drops events where event_ts < startup_ts - 5. When the host clock is set ahead of real time, the comparison flips against every live event and the bot 'connects but never replies', exactly the symptom in #12614. Reporter Schnurzel700 chased this for several weeks before tracing it to their Debian VM's clock being out of sync. The current /1000.0 millisecond->second conversion is correct (mautrix returns ms); the failure mode is purely environmental. Add a one-shot WARNING that fires when: - we are >30s past startup (initial-sync replay window closed), AND - 3 consecutive drops share the same skew within 60s (a constant clock offset, not varied-age backfill from an invited room). State is reset in connect() so reconnects after fixing NTP rearm the detector. Includes the NTP fix instruction in the warning message itself and a new Troubleshooting entry in the Matrix docs. 5 new tests cover the happy path, initial-sync backfill, under-threshold drops, varied-age backfill, and the reconnect rearm path.	2026-05-17 00:28:24 -07:00
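The two firing conditions above can be sketched as a small detector. `ClockSkewDetector` is a hypothetical class; in particular, reading "share the same skew within 60s" as "the three skews differ by at most 60 seconds" is an interpretation, not something the commit spells out.

```python
from typing import List


class ClockSkewDetector:
    GRACE_S = 5.0            # startup-grace filter from the commit
    WARMUP_S = 30.0          # initial-sync replay window
    SKEW_TOLERANCE_S = 60.0  # assumed meaning of "same skew within 60s"
    REQUIRED_DROPS = 3

    def __init__(self, startup_ts: float) -> None:
        self.startup_ts = startup_ts
        self._skews: List[float] = []
        self.warned = False

    def observe(self, event_ts: float, now: float) -> bool:
        """Record one event; return True exactly once, when the warning fires."""
        if event_ts >= self.startup_ts - self.GRACE_S:
            self._skews.clear()  # event accepted: the streak is broken
            return False
        if now - self.startup_ts <= self.WARMUP_S:
            return False  # still inside the replay window
        self._skews = (self._skews + [now - event_ts])[-self.REQUIRED_DROPS:]
        steady = (
            len(self._skews) == self.REQUIRED_DROPS
            and max(self._skews) - min(self._skews) <= self.SKEW_TOLERANCE_S
        )
        if steady and not self.warned:
            self.warned = True  # one-shot
            return True
        return False
```

Creating a fresh detector on reconnect corresponds to the state reset in `connect()` that rearms the warning.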
teknium1	152d42d1a7	Merge origin/main into pr-27248 (resolving run_agent.py = ours) run_agent.py taken from HEAD (the extracted forwarder structure). The 25 run_agent.py fixes that landed on main during the PR's life need to be ported into the agent/* extracted modules in follow-up commits.	2026-05-16 23:16:52 -07:00
bitkyc08-arch	5631345b12	[agent] fix: harden api server response headers	2026-05-16 23:11:43 -07:00
Ambuj Kumar	a3017508bf	fix(gateway): preserve underscores in plain-text identifiers	2026-05-16 23:11:43 -07:00
subtract0	fdd455bc58	fix(gateway): avoid zsh status variable in update wrapper	2026-05-16 23:11:43 -07:00
Rahul	a52f014a8c	fix(tests): mock keychain in TestReadClaudeCodeCredentials to prevent credential leakage Tests in TestReadClaudeCodeCredentials were not mocking _read_claude_code_credentials_from_keychain, which was added after the tests were written. On macOS machines with real Claude Code credentials stored in the Keychain, the function returns live credentials instead of the test fixtures, causing assertions to fail and leaking real tokens in test output. Add an autouse fixture that stubs the keychain reader to None so all tests in the class exercise only the file-based credential path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 23:05:27 -07:00
Sylw3ster	8d4766afca	fix(api_server): coerce stringified booleans in request payloads	2026-05-16 23:02:02 -07:00
teknium1	47823790b0	refactor(run_agent): review fixes — keyword-forward __init__, drop dead code, tighten guards Four fixes from PR #27248 review: 1. __init__ forwarder is now keyword-forwarded (daimon-nous review). Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64 params positionally to agent.agent_init.init_agent, so adding a 65th param on main would require three lockstep edits (signature, init_agent signature, forwarder call) or silently shift every value. Keyword forwarding makes this trivially safe — adding a param now only needs the two signatures and one extra keyword line. 2. Drop dead _ra() in agent/codex_runtime.py (daimon-nous + Copilot). The lazy run_agent reference was defined but never called inside this module — the codex paths use agent.* accessors only. 3. Drop unused imports in agent/codex_runtime.py (Copilot): contextvars, threading, time, uuid, Optional. Carried over from run_agent.py during the original extraction. 4. Tighten three source-introspection test guards (Copilot): - test_memory_nudge_counter_hydration.py — was scanning the concatenated source of run_agent.py + agent/conversation_loop.py and matching self.X or agent.X form. Now asserts the hydration block lives in agent/conversation_loop.py specifically with the agent.X form — the body never moves back, so if it ever drifts a future re-introduction fails the guard. - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on agent.iteration_budget = IterationBudget exactly (was just iteration_budget = IterationBudget) so an unrelated identifier ending in iteration_budget can't match. - test_run_agent.py::TestMemoryProviderTurnStart — assert the agent._user_turn_count form directly (the extracted body uses agent.X, not self.X — accepting either was a transitional fudge). - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py only, not the concatenation. 
Not addressed in this commit: * Pre-existing bugs in agent/tool_executor.py (heartbeat index mismatch when calls are blocked, _current_tool clobber in result loop, blocked-counted-as-completed in spinner summary, dead result_preview computation). These were preserved byte-for-byte from the original _execute_tool_calls_concurrent — worth a separate follow-up PR with proper tests. * _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged by any of the original test patches (nothing actually does isinstance(x, OpenAI) against the proxy instance). * agent_init.py:949 mem_config potential NameError — pre-existing; only triggers if _agent_cfg.get('memory', {}) itself raises, which it can't with a stock dict. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing test_auxiliary_client failure (unchanged). run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded init call's verbosity). Final: 16083 -> 3937 (-12146, 75% reduction).	2026-05-16 22:55:49 -07:00
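Fix 1 in the review commit above (keyword forwarding in the `__init__` wrapper) can be illustrated with a three-parameter toy version. The parameter names are hypothetical; the real `init_agent` takes 64 parameters that are not listed in this log.

```python
def init_agent(agent, model, max_iterations=10, verbose=False):
    """Hypothetical extracted initializer with a tiny signature."""
    agent.model = model
    agent.max_iterations = max_iterations
    agent.verbose = verbose


class AIAgent:
    def __init__(self, model, max_iterations=10, verbose=False):
        # Keyword forwarding: reordering or inserting parameters in either
        # signature cannot silently shift values into the wrong slot.
        init_agent(
            self,
            model=model,
            max_iterations=max_iterations,
            verbose=verbose,
        )
```

With positional forwarding, inserting a parameter in the middle of one signature but not the call would shift every later value without raising an error; with keywords, a mismatch surfaces as a `TypeError` or is simply unaffected.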
kronexoi	ea2ee51f0b	fix(teams): fall back to default port on invalid port config	2026-05-16 22:54:40 -07:00
teknium1	773a0faca0	fix(deepseek): set default_aux_model on profile so aux warning stops firing Closes #26924 (and supersedes #26926) in spirit. DeepSeek was missing `default_aux_model` on its `ProviderProfile`, so `_get_aux_model_for_provider("deepseek")` returned an empty string and the compression / vision / session-search paths emitted "No auxiliary LLM provider configured -- context compression will drop middle turns without a summary." on every DeepSeek session, even when the user had perfectly working DeepSeek credentials. Fix lands at the profile layer rather than the legacy `_API_KEY_PROVIDER_AUX_MODELS_FALLBACK` dict the original PR targeted. Every modern provider (gemini, zai, minimax, anthropic, kimi-coding, stepfun, ollama-cloud, gmi, novita, kilocode, ai-gateway, opencode-zen) sets `default_aux_model` on its `ProviderProfile`; the fallback dict only exists for providers that predate the profiles system. Tests added under `tests/plugins/model_providers/test_deepseek_profile.py`: - `test_profile_advertises_deepseek_chat` -- pins the profile attribute - `test_consumer_api_returns_deepseek_chat` -- pins the consumer API behavior - `test_consumer_api_returns_non_empty` -- regression guard for the symptom in the issue Original diagnosis and aux-model choice from @kriscolab in PR #26926; moved one layer up. Co-authored-by: kriscolab <71590782+kriscolab@users.noreply.github.com>	2026-05-16 22:54:22 -07:00
0xchainer	782d743730	test(skills): add regression test for skill load failure returning None Add test_returns_none_when_skill_load_fails to verify that build_skill_invocation_message() returns None when a registered skill exists in the command cache but _load_skill_payload() fails. This guards against regression of the fix in `877d01b`.	2026-05-16 22:52:22 -07:00
0xchainer	57feef3201	test(gateway): add smoke test for logger init (regression guard for #27154) Verify that the module has a logger instance with the correct name, preventing regression of the NameError fixed in `a31d5aff`.	2026-05-16 22:43:08 -07:00
brooklyn!	9f182bd7b0	Merge pull request #27251 from NousResearch/bb/skin-render-magenta-bleed fix(tui): harden Terminal.app rendering and color paths	2026-05-16 23:07:19 -05:00
Brooklyn Nicholson	a65f723e68	fix(review): address Copilot follow-up on sanitizer and file decode errors Consume multi-byte non-CSI ESC sequences during ANSI sanitization and handle UnicodeDecodeError for `hermes send --file` so review findings are resolved without regressions.	2026-05-16 23:00:58 -05:00
Brooklyn Nicholson	290bf93104	fix(tui): harden Terminal.app render behavior Avoid Terminal.app paint corruption by disabling fast-echo in that terminal, sanitizing non-SGR control sequences before ANSI rendering, and defaulting Apple Terminal back to the safer 256-color path unless truecolor is explicitly requested.	2026-05-16 22:51:51 -05:00
Teknium	973f27e956	fix(run_agent): isolate background review fork from external memory plugins (#27190) Pass skip_memory=True to the AIAgent constructor used by _spawn_background_review() so the review fork's __init__ no longer rebuilds a _memory_manager wired to honcho / mem0 / supermemory / etc. under the parent's session_id. Before this change, the review fork ingested its harness prompt (the 'Review the conversation above and update the skill library...' text) into the user's real memory namespace via three sites in run_conversation(): (1) on_turn_start(turn_count, prompt): cadence + turn-message; (2) prefetch_all(prompt): recall query; (3) sync_all(prompt, review_output, ...): harness + review output recorded as a (user, assistant) pair. Built-in MEMORY.md / USER.md state is still rebound from the parent right after construction, so memory(action='add') writes from the review continue to land on disk; only the external-plugin side effects are removed. Reported by @Utku.	2026-05-16 20:33:38 -07:00
teknium1	407a11b419	feat(discord): allow_any_attachment config to accept arbitrary file types The Discord adapter silently dropped any attachment whose extension wasn't in the SUPPORTED_DOCUMENT_TYPES allowlist (PDF, text family, zip, office). Users uploading .wav / .bin / other unrecognized formats saw nothing in their conversation — the file got logged as 'Unsupported document type' and discarded before the agent ever saw it. Add discord.allow_any_attachment (default false) to bypass the allowlist. When on: - Any file is downloaded, cached under ~/.hermes/cache/documents/, and surfaced as a DOCUMENT-typed event with application/octet-stream MIME - gateway/run.py already emits a context note with the cached path, auto-translated via to_agent_visible_cache_path() for Docker/Modal sandboxed terminals - File body is NOT inlined — only the path — so binary uploads don't blow up the context window - Allowlisted text formats (.txt/.md/.log) keep their 100 KiB inline behavior unchanged Also adds discord.max_attachment_bytes (default 32 MiB matches the historical hardcoded cap; 0 = unlimited) since users opting into arbitrary types may want to raise the cap. The whole attachment is held in memory while being cached, so unlimited carries a real memory cost. Env overrides: DISCORD_ALLOW_ANY_ATTACHMENT, DISCORD_MAX_ATTACHMENT_BYTES. Discord-only by deliberate scope. Telegram has hard 20 MB API limits and Slack has its own caps — extending the same flag there is a separate follow-up if/when requested.	2026-05-16 20:26:18 -07:00
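The acceptance rule described in the commit above can be sketched as follows. This is a minimal illustration, not the adapter's code: `should_accept_attachment` and `ILLUSTRATIVE_ALLOWLIST` are hypothetical names, and the extension set is an assumed subset of the real `SUPPORTED_DOCUMENT_TYPES`. Only the two config semantics come from the commit message: `allow_any_attachment` bypasses the allowlist, and `max_attachment_bytes` defaults to 32 MiB with 0 meaning unlimited.

```python
import os

# Assumed subset of the allowlist; the real SUPPORTED_DOCUMENT_TYPES is not shown in the log.
ILLUSTRATIVE_ALLOWLIST = {".pdf", ".txt", ".md", ".log", ".zip", ".docx"}

# 32 MiB, the historical cap named in the commit message.
DEFAULT_MAX_ATTACHMENT_BYTES = 32 * 1024 * 1024


def should_accept_attachment(
    filename: str,
    size_bytes: int,
    allow_any_attachment: bool = False,
    max_attachment_bytes: int = DEFAULT_MAX_ATTACHMENT_BYTES,
) -> bool:
    """Hypothetical helper: decide whether an attachment would be kept."""
    # A cap of 0 means "unlimited"; any other value is a hard size limit.
    if max_attachment_bytes and size_bytes > max_attachment_bytes:
        return False
    extension = os.path.splitext(filename)[1].lower()
    # With the flag on, the extension allowlist is bypassed entirely.
    return allow_any_attachment or extension in ILLUSTRATIVE_ALLOWLIST
```

With the defaults, `should_accept_attachment("clip.wav", 1024)` rejects the file, which matches the old silent-drop behaviour; passing `allow_any_attachment=True` accepts it as long as it fits under the cap.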
teknium1	0530252384	refactor(run_agent): extract run_conversation to agent/conversation_loop.py The 3,877-line run_conversation body — the agent loop itself — moves out of run_agent.py into a dedicated module. AIAgent.run_conversation is now a thin forwarder that delegates to agent.conversation_loop.run_conversation with the AIAgent instance as the first argument. This is the largest single extraction in the run_agent.py refactor. The body keeps all 163 self.X references intact (rewritten as agent.X), all nested closures, all retry/backoff/compression machinery. Symbols that tests or callers patch on run_agent (_set_interrupt, handle_function_call, AIAgent class attrs) are resolved through _ra() inside the extracted module so the patch surface is preserved. Five tests doing inspect.getsource(AIAgent.run_conversation) updated to scan agent.conversation_loop.run_conversation. Two source-introspection tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart) updated to accept either self.X (legacy) or agent.X (extracted form) in the matched assertions. Live E2E verified on three model paths: * openai/gpt-5.4 (OpenAI chat completions via OpenRouter) * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter) * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path) Plus read_file tool execution, terminal tool, web_search. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client::test_custom_endpoint... — same as on main). run_agent.py: 9800 -> 5944 lines (-3856). Total reduction since baseline: 16083 -> 5944 (-10139, 63%).	2026-05-16 19:26:52 -07:00
teknium1	0430e71ec9	refactor(run_agent): extract streaming API caller (893 LOC) to agent/chat_completion_helpers.py Move _interruptible_streaming_api_call out of run_agent.py — the biggest single method in the file. Body lives next to interruptible_api_call in agent/chat_completion_helpers.py so streaming + non-streaming code share one home. Nested closures (_call_chat_completions, _call_anthropic, the codex stream branch) all come along with the body and still capture the parent function's locals as expected. AIAgent keeps a thin forwarder method. is_local_endpoint added to the import block (used by the stream stale-timeout disable logic). One source-introspection test in TestAnthropicInterruptHandler is updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call instead of AIAgent._interruptible_streaming_api_call. tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 12277 -> 11385 lines (-892).	2026-05-16 18:48:22 -07:00
teknium1	4b25619bc4	refactor(run_agent): extract chat-completion helpers to agent/chat_completion_helpers.py Six methods move into a new module — bodies live there, AIAgent keeps thin forwarder methods so call sites and tests are unchanged. * interruptible_api_call — non-streaming API call with interrupt handling * build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs * build_assistant_message — normalize assistant message dict (reasoning, tool_calls, codex passthrough fields, alibaba glm-4.7 quirk) * try_activate_fallback — provider fallback chain activation * handle_max_iterations — controlled stop when iteration budget exhausts * cleanup_task_resources — per-turn VM + browser teardown (skipped for persistent environments) Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed through _ra() so the patch surface is preserved. Two TestAnthropicInterruptHandler source-introspection tests were updated to scan agent.chat_completion_helpers.interruptible_api_call instead of AIAgent._interruptible_api_call — the body lives in the extracted module now. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13282 -> 12253 lines (-1029).	2026-05-16 18:41:44 -07:00
teknium1	79559214a6	refactor(run_agent): extract tool execution to agent/tool_executor.py Move the two big tool-dispatch methods out of run_agent.py: * execute_tool_calls_concurrent — 408-line concurrent path (interrupt pre-flight, guardrail+plugin block, callback fan-out, ContextVar- preserving ThreadPoolExecutor, periodic heartbeats for the gateway inactivity monitor, per-tool result handling with subdir hints + guardrail observations + checkpoint, /steer drain) * execute_tool_calls_sequential — 441-line sequential path (the original behavior used for single-tool batches and interactive tools) Both take the parent AIAgent as their first argument; AIAgent keeps thin forwarders so call sites unchanged. handle_function_call is routed through _ra() so tests that patch run_agent.handle_function_call keep working. _set_interrupt likewise. The AST guard in test_tool_executor_contextvar_propagation.py is updated to scan both run_agent.py AND agent/tool_executor.py so it still catches the executor.submit(_run_tool, ...) regression regardless of which file the body lives in. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14309 -> 13461 lines (-848).	2026-05-16 18:24:05 -07:00
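The "thin forwarder plus `_ra()`" pattern that this and the neighbouring refactor commits rely on can be sketched as below. The real `_ra()` is not shown in the log, so this assumes it is a lazy accessor that returns the `run_agent` module at call time. The module name `run_agent_demo` and every function body are stand-ins invented for the illustration.

```python
import sys
import types

# Stand-in for the original module that tests patch (hypothetical name).
run_agent_demo = types.ModuleType("run_agent_demo")


def handle_function_call(name: str) -> str:
    return f"real:{name}"


run_agent_demo.handle_function_call = handle_function_call
sys.modules["run_agent_demo"] = run_agent_demo


def _ra() -> types.ModuleType:
    """Assumed behaviour: resolve the original module lazily, at call time."""
    return sys.modules["run_agent_demo"]


def execute_tool_call(agent: "AIAgent", name: str) -> str:
    """Extracted body: takes the agent as its first argument."""
    # Looking the symbol up through _ra() on every call means a test that
    # patches run_agent_demo.handle_function_call still takes effect here.
    return _ra().handle_function_call(name)


class AIAgent:
    def _execute_tool_call(self, name: str) -> str:
        # Thin forwarder: call sites on the class stay unchanged.
        return execute_tool_call(self, name)
```

Had the extracted module imported `handle_function_call` by name at import time, a later patch on the original module would not reach it; the call-time lookup is what keeps the patch surface intact in this sketch.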
Teknium	3b39096904	Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189 ) After context compression, the protected tail messages retain their original image parts. When those include multi-MB pasted screenshots, every subsequent API request re-ships the same base-64 blobs forever — which can push the request past provider body-size limits and wedge the session even though compression 'succeeded'. Add _strip_historical_media() to agent/context_compressor.py. After the summary is built, find the newest user message that carries an image part and replace image parts in every earlier message with a short text placeholder ('[Attached image — stripped after compression]'). The newest image-bearing user turn keeps its media so the model can still analyse what the user just sent. Handles all three multimodal shapes: - OpenAI chat.completions image_url - OpenAI Responses API input_image - Anthropic native {type: image, source: ...} Includes 27 unit tests covering the helpers and the end-to-end compress() integration, plus a manual E2E check confirming a ~4MB two-image conversation shrinks to ~2MB after compression.	2026-05-16 17:18:25 -07:00
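The stripping step described in the commit above might look roughly like this. It is a sketch under stated assumptions, not the contents of `agent/context_compressor.py`: the function and helper names are hypothetical, only the three part types and the placeholder text come from the commit message, and the placeholder is emitted as a plain `text` part even though the Responses API shape would likely need a different text part type.

```python
from typing import Any

# Placeholder text quoted from the commit message (\u2014 is the dash it uses).
PLACEHOLDER = "[Attached image \u2014 stripped after compression]"

# The three multimodal shapes named in the commit message.
IMAGE_PART_TYPES = {"image_url", "input_image", "image"}


def _is_image_part(part: Any) -> bool:
    return isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES


def _has_image(message: dict) -> bool:
    content = message.get("content")
    return isinstance(content, list) and any(_is_image_part(p) for p in content)


def strip_historical_media(messages: list[dict]) -> list[dict]:
    """Hypothetical sketch: keep images only from the newest image-bearing user turn on."""
    newest = None
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].get("role") == "user" and _has_image(messages[index]):
            newest = index
            break
    if newest is None:
        return list(messages)  # no user images, nothing to strip

    result = []
    for index, message in enumerate(messages):
        if index < newest and _has_image(message):
            # Build a new message so the caller's list is left unmodified.
            stripped = [
                {"type": "text", "text": PLACEHOLDER} if _is_image_part(part) else part
                for part in message["content"]
            ]
            message = {**message, "content": stripped}
        result.append(message)
    return result
```

Messages after the newest image-bearing user turn are left untouched in this sketch, matching the commit's statement that only earlier messages are rewritten.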
Guillaume Meyer	5cbe0b1c4f	test(plugins): cover _discover_all_plugins recursion + cross-link loader Add a TestDiscoverAllPlugins class covering the six cases the recursive scan needs to handle: - flat plugin uses its manifest ``name:`` as the key - category-namespaced plugin keys off ``<category>/<dirname>`` even when the manifest ``name:`` is bare (regression test for the original bug — ``plugins/observability/langfuse/`` with ``name: langfuse`` must surface as ``observability/langfuse``, not ``langfuse``) - user-installed plugin overrides bundled on key collision - depth cap: anything below ``<root>/<category>/<plugin>/`` is ignored - bundled ``memory/`` and ``context_engine/`` are skipped (they have their own loaders), but user plugins under those category names are still scanned Also add an in-source comment next to the key derivation pointing at the loader's matching line (``PluginManager._parse_manifest`` in plugins.py:1027-1028), so future renames of one site flag the other. Both items raised in Copilot review on #27161. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 17:15:19 -07:00

3833 commits