hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
kshitij	1a74795735	feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003 ) Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.	2026-05-28 10:31:59 -07:00
Ben Heidorn	e8b9369a9d	feat(openrouter): pass session_id in extra_body for sticky routing OpenRouter supports a session_id field in extra_body that pins multi-turn conversations to the same provider endpoint, enabling prompt cache reuse across turns. The session_id was already threaded through to build_extra_body() but never included in the returned dict. Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>	2026-05-28 08:52:19 -07:00
kshitij	0554ef1aa3	fix(agent): fallback immediately on provider content-policy blocks (#33883 ) * fix(agent): fallback immediately on provider content-policy blocks Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible cybersecurity risk', OpenAI moderation 'violates our usage policies', Anthropic safety-system rejections, Azure content_filter) are deterministic decisions about a specific prompt. Retrying the same prompt up to api_max_retries times just reproduces the same refusal and burns paid attempts before surfacing the generic 'API failed after 3 retries — <provider message>' to Telegram / cron with no indication that the failure came from the model provider rather than Hermes itself. Classify these as a new FailoverReason.content_policy_blocked (non-retryable, should_fallback=True) and route them through the existing is_client_error path so the loop: - skips the 3x retry backoff - activates a configured fallback model immediately - emits a clear provider-safety message to the user (not the generic 'Non-retryable error (HTTP None)') and surfaces actionable guidance when no fallback is configured (rephrase, narrow context, or set fallback_model in hermes config) - returns a final_response that explicitly tells the user this came from the model provider, so gateway delivery is unambiguous and cron last_status reflects the safety block rather than a vague 'agent reported failure' Patterns are intentionally narrow — verbatim refusal phrasings keyed to specific provider safety pipelines, not generic words like 'policy' or 'violation' that would collide with billing / format / auth errors. Regression guards in test_18028_content_policy_blocked.py verify billing 402s, generic 400s, and OpenRouter account-level provider_policy_blocked remain distinct classifications. Salvaged from #18164 onto current main (file restructure: loop logic moved from run_agent.py to agent/conversation_loop.py, _emit_status → _buffer_status), broadened patterns beyond the original OpenAI Codex cybersecurity case to cover OpenAI moderation, Anthropic safety system, and Azure content_filter; added user-actionable guidance and a clear final_response so cron/gateway surfaces the policy block instead of a generic non-retryable error, and added a regression-guard test module mirroring the is_client_error predicate. Addresses #18028. Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * chore: add kchuang1015 to AUTHOR_MAP --------- Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>	2026-05-28 07:28:24 -07:00
kshitij	a82c88bac0	fix(xai-oauth): accept bare-code manual paste (state=None) (#26923 ) (#33880 ) xAI's consent page renders the authorization code in-page rather than redirecting through the 127.0.0.1 callback, so on remote/headless setups (GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only value the user can paste is the opaque code with no `code=`/`state=` query parameters. `_parse_pasted_callback` correctly returns `state=None` for that input, but `_xai_oauth_loopback_login` then validated state unconditionally and raised `xai_state_mismatch`, making the documented bare-code paste path unreachable. PKCE (code_verifier) still binds the token exchange to this client, so the local state-equality check is redundant when there is no state to compare. On the manual-paste path only, substitute the locally generated state when the callback returned none — the rest of the validation chain (code presence, error field, token exchange) is unchanged. The loopback HTTP-server path still requires a matching state (a real browser redirect always carries one). Also: clarify the manual-paste prompt to mention xAI's in-page code rendering so users know pasting the bare code on its own is expected. Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20). Tests ----- * test_xai_loopback_login_manual_paste_bare_code_succeeds — positive end-to-end through the token exchange with state=None. * test_xai_loopback_login_loopback_path_rejects_missing_state — the HTTP-server path still rejects state=None as a regression guard (the bare-code relaxation must NOT widen the loopback path). * Existing test_xai_loopback_login_manual_paste_state_mismatch_raises continues to verify wrong (non-None) state is rejected on manual-paste. Closes #26923.	2026-05-28 05:47:30 -07:00
helix4u	c0d04694ea	docs(email): clarify gateway vs Himalaya setup	2026-05-28 05:42:09 -07:00
Teknium	67011cc0d7	feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816 ) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter.	2026-05-28 04:53:27 -07:00
Teknium	e0572a6def	fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810 ) `hermes skills search` rendered the Identifier column with the default overflow behaviour, so long slugs (notably browse-sh — every browse-sh skill ends in a `-XXXXXX` hash that's part of the identifier) were cut to `browse-sh/weathe…`. Users copied the visible string into `hermes skills install` and got a not-found error because the hash was gone. Set overflow="fold" on the Identifier column in both search tables (`do_search` and the `_resolve_short_name` multi-match table) so long slugs wrap onto a second line instead of getting eaten. Also add a `--json` flag to `hermes skills search` (and the `/skills search` slash variant) for scripting — emits a list of {name, identifier, source, trust_level, description} objects with the full identifier, which is the right shape for copy-paste pipelines too. Closes #33674.	2026-05-28 04:53:13 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
teknium1	b243afb68b	fix(discord): skip backfill for auto-created threads and update test fakes When auto-threading kicked in, the broadened backfill gate ran on the freshly-created thread — but the thread has no prior context to fetch, and the parent-channel reference passed to _fetch_channel_context would have leaked unrelated context (see #31467). Skip backfill when auto_threaded_channel is set. Also teach the _FakeTextChannel / _FakeThreadChannel test doubles to expose a no-op history() async generator so the broadened gate doesn't trip AttributeError → discord.Forbidden (MagicMock) → TypeError in the existing auto-thread tests. Add a regression test that asserts auto-threaded messages do not trigger backfill.	2026-05-28 04:52:02 -07:00
teknium1	68ddd6b338	refactor(discord): inline backfill gate and document intent Drop the _needed_mention local variable now that it has only one use, inline its expression as _has_mention_gap, and add a comment explaining the three backfill cases (mention-gated channel, thread, DM skip). Behaviorally identical to the prior commit; cleanup only. Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>	2026-05-28 04:52:02 -07:00
Pluviobyte	eafe11d456	fix(gateway): backfill Discord thread context Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session. Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention. Fixes #33666 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 04:52:02 -07:00
Teknium	a1eaad2fc0	perf(skills-page): lazy-fetch the catalog instead of bundling 34MB into JS (#33809 ) PR #33748 grew the live skills index from ~2k skills to ~69k, which made the previous build-time bundling strategy untenable: the skills page's JS chunk was about to balloon from ~1MB to ~35MB. Initial page load on mobile became unusable, search lagged on every keystroke against the 68k-item array, and JSON.parse blocked the main thread at startup. Three changes: 1. extract-skills.py writes skills.json + skills-meta.json into website/static/api/ instead of website/src/data/. Static-served by Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that already serves skills-index.json. 2. skills/index.tsx drops the static import and fetches both files in parallel on mount. Loading state shows '…' for the count; failures surface a small error pill instead of blanking the page. 3. Search is debounced 150ms and runs against a precomputed lowercase haystack stamped onto each row at load time. Before: array-join + toLowerCase per row per keystroke on a 68k array. After: single .includes() per row, deferred until typing settles. Validation: \| \| before \| after \| \|---\|---\|---\| \| skills.json location \| src/data/ (bundled) \| static/api/ (CDN) \| \| Largest JS chunk \| would be ~35MB at 68k skills \| 659 KB \| \| Initial page render \| wait for full parse \| immediate, fetch async \| \| Per-keystroke filter \| join+lowercase x 68k rows \| single includes x 68k rows \| \| Debounce \| none \| 150ms \| Built locally for both en and zh-Hans locales; the 34MB skills.json now lives in build/api/ and is served separately rather than inlined into the page's bundle. skills.json and skills-meta.json added to .gitignore — they were already build artifacts, but the gitignore only listed skills-index.json before.	2026-05-28 03:41:43 -07:00
teknium1	6f9182cb34	fix(kanban): content-addressed corrupt-DB backup filename Repeated quarantines of an unchanged corrupt kanban.db used to amplify disk usage by N: the gateway dispatcher's 5-minute retry loop, multi- profile fleets sharing one DB, and manual reopen attempts each produced a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10 retries on a 100KB DB you had 11x the disk footprint of duplicate corrupt data. Derive the backup filename from a sha256 of the main DB instead of a timestamp + collision counter. Same bytes → same filename → skip the copy on retries. Different bytes (partial repair, further damage) → different filename → preserve separately. Sidecar (-wal/-shm) backups inherit the same content-addressed name. Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop the persistent JSON marker file, drop the atomic temp+fsync+rename helper (shutil.copy2 is fine for a quarantine-only path), drop the gateway-side WAL/SHM fingerprint extension (the existing (path, mtime, size) tuple still gives the 5-minute retry semantics it needs), and drop the gateway-side helper extraction. The backup file existing IS the marker; no separate state needed. Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup proves 10 retries on the same corrupt bytes produce 1 backup (was 11), and mutating the corrupt bytes produces a second backup with a different fingerprint. Refs #33529 Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>	2026-05-28 03:38:09 -07:00
Teknium	432a691758	fix(update): stream + idle-kill `npm run build` so a stalled webui-build can't soft-brick the install (#33803 ) `hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands after `pip install -e .` has already touched the install but before the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384). Changes: - New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update. - `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract. - Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step. Tests: - `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests. - `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step. - `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update. Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed. Fixes #33788.	2026-05-28 03:34:47 -07:00
teknium1	78be458608	fix(patch): widen new_string \t/\r unescape to all match strategies (#33733 ) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the matched region of the file: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733	2026-05-28 03:27:20 -07:00
liuhao1024	e9f3f2b34a	fix(tools): unescape common sequences in new_string when escape_normalized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733	2026-05-28 03:27:20 -07:00
Teknium	10ee4a729b	fix(gateway): drain on Windows `hermes gateway stop` so sessions survive restart (#33798 ) Sessions now survive `hermes gateway stop` / `restart` on native Windows. Previously the gateway died on schtasks `/End` + os.kill SIGTERM without ever running the drain loop, so the v0.13.0 session-resume feature (#21192) silently broke on Windows: `resume_pending=True` was never written, and the next boot started with a blank conversation history (issue #33778). Root cause is twofold and the reporter only identified half of it: 1. `hermes_cli/gateway_windows.py::stop()` did not write the `planned_stop_marker` before signalling. The reporter caught this. 2. The bigger reason: `asyncio.add_signal_handler` raises NotImplementedError for SIGTERM/SIGINT on Windows, so even if the marker had been written, the gateway's existing SIGTERM handler (which is what calls `runner.stop()` and the `mark_resume_pending` loop) was never invoked. Writing the marker would have been necessary-but-insufficient. The fix has two parts: * gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls for the planned-stop marker file every 0.5s. When the marker appears it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the same shutdown path a real SIGTERM would have driven, including the pre-drain `mark_resume_pending` writes (run.py:5977) and graceful drain wait. The existing signal handler already accepts `received_signal=None` and falls through to `consume_planned_stop_marker_for_self()`, so no handler changes needed. Runs on every platform as cheap belt-and-suspenders. * hermes_cli/gateway_windows.py: `stop()` now writes the marker for the running gateway PID and waits up to `agent.restart_drain_timeout` (default 30s) for the PID to exit cleanly. On clean drain, the kill sweep is non-forceful; on timeout, escalates to `kill_gateway_processes(force=True)` which routes to taskkill /T /F per `references/windows-native-support.md`. Validation: * 7 new tests in tests/gateway/test_planned_stop_watcher.py covering: marker→handler dispatch, no-marker idle, already-draining skip, not-yet-running skip, stop_event responsiveness, fire-once semantics, error tolerance. * 8 new tests in tests/hermes_cli/test_gateway_windows.py covering: marker-before-kill ordering, clean-drain skips force-kill, drain-timeout escalates to force=True, no-pid-skips-drain, invalid-pid handling, fast-exit success, timeout failure, marker-write-failure tolerance. * E2E (Linux, detached orphan): write_planned_stop_marker(pid) + `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the victim sees the marker and exits. Tested with a double-forked subprocess so the test parent isn't holding it as a zombie. * Targeted: tests/gateway/{restart_drain,restart_resume_pending, signal,signal_format,status,shutdown_forensics,approve_deny_commands, planned_stop_watcher} + tests/hermes_cli/{gateway_windows, gateway_service} → 519/519. What was wrong with the reporter's claim (for future archaeology): they described the symptom as "no `resume_pending=True` written to `sessions.json`" — but Hermes uses `state.db` (SQLite), not `sessions.json`, and `mark_resume_pending` is called regardless of the marker (the marker only affects exit code 0 vs 1 for systemd revival semantics). The real session-loss path is the missing drain on Windows, not a missing marker. Both halves are fixed here. Closes #33778.	2026-05-28 03:25:32 -07:00
teknium1	f8896dedc8	chore(release): map biser@bisko.be -> bisko in AUTHOR_MAP	2026-05-28 03:21:00 -07:00
Biser Perchinkov	b5495db701	fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers api_messages is built once before the retry loop while the primary provider is active. When a mid-conversation fallback switches to a require-side thinking provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary (e.g. Codex) go out without reasoning_content and the new provider rejects the request with HTTP 400 ("reasoning_content must be passed back"). Re-apply the echo-back pad against the current provider immediately before building the request kwargs. Idempotent and a no-op unless the active provider enforces echo-back, so it covers all fallback paths without affecting normal or reject-side operation. Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:21:00 -07:00
Indigo Karasu	9179396cb7	fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered In GatewayStreamConsumer._run(), _final_content_delivered was set to True based on the success of a mid-stream finalize edit, before the final finalize edit was attempted. When the final edit later failed (Telegram flood control, retry-after), _final_response_sent stayed False but _final_content_delivered was already True, so gateway/run.py suppressed its normal final send and the user saw a partial / fallback message instead of the real answer. Changes in gateway/stream_consumer.py: - Remove the premature _final_content_delivered = True at the top of the got_done block. - Set _final_content_delivered = True only when the actual final send / edit succeeds, in each finalize branch (no-finalize adapter, _message_id finalize, no-_already_sent send). - _send_fallback_final: don't set _final_response_sent = True when only some chunks were delivered; the gateway should still attempt a complete final send. Set _final_content_delivered = True alongside _final_response_sent on the success path and short-text path. - Cancellation handler: set _final_content_delivered = True alongside _final_response_sent when the best-effort final edit succeeds. Adds TestFinalContentDeliveredGuard with 3 regression tests covering the core bug scenario, the happy path, and partial fallback. Closes #33708 Closes #25010 Refs #29200 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-28 03:15:19 -07:00
Dusk1e	a91b1c8b31	fix(tirith): reject non-regular tar members during auto-install process	2026-05-28 02:49:26 -07:00
teknium1	247b24b49f	chore(release): add AUTHOR_MAP entry for AdityaRajeshGadgil	2026-05-28 02:45:25 -07:00
Aditya Rajesh Gadgil	031983bbf8	fix: limit pre-update state snapshots	2026-05-28 02:45:25 -07:00
Teknium	8b6beaab5f	docs: 30-day overhaul — correctness audit, PR coverage, Nous Portal weave, sidebar reorg (#33782 ) * docs(audit): correctness pass across getting-started, reference, features, messaging, developer-guide, guides, integrations, user-guide * docs: add PR coverage for last 30d + Nous Portal weave + nav reorg + build fixes - Add docs for top user-visible PRs that shipped without docs (api-server session control, kanban features, telegram pin/edit, provider client tag, xAI retired-model migration, cron name lookup, --branch update flag, etc.) - Apply Nous Portal weave across 23 pages (tasteful one-liners on getting-started/learning-path, configuration, overview, vision, x-search, credential-pools, provider-routing, cron, codex-runtime, profiles, docker, messaging/index, multiple guides, plus FAQ + index promotion) - Reorganize sidebar: split Messaging into Popular/M365/Chinese/Other, Reference into Command/Configuration/Tools-Skills sub-categories, add orphan developer-guide pages (web-search-provider-plugin, browser-supervisor), move features from Integrations back to Features, fold lone spotify into Media & Web. - Regenerate skill stubs + catalogs (kanban-codex-lane, hermes-s6-container- supervision, web-pentest) - Fix broken anchor links (security/cron, configuration/fallback, telegram large-files, adding-platform-adapters step-by-step)	2026-05-28 02:41:36 -07:00
teknium1	c7f7783e5c	test(xai-proxy): regression coverage for #28932 429 handling Three new tests in tests/hermes_cli/test_proxy.py: - xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case. Two-entry pool, 429 on first entry, must rotate to second entry AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant for rate limits). - xai_adapter_retry_returns_none_on_429_when_pool_exhausted — single-entry pool: 429 returns None so the rate-limit response flows back to the client unchanged (existing behavior preserved). - xai_adapter_retry_returns_none_for_unrelated_status — non-{401, 429} statuses must not trigger any retry path at all; guards against the gate becoming too broad in future changes. Each test asserts that refresh_xai_oauth_pure is never called on the 429 path — refresh is a 401-specific concern. 39/39 in tests/hermes_cli/test_proxy.py.	2026-05-28 02:36:37 -07:00
sprmn24	4ed482549f	fix(xai-proxy): handle 429 rate-limit responses in proxy retry path get_retry_credential only triggered on 401; a 429 Too Many Requests from xAI was silently streamed back with no key rotation or back-off signal. - server.py: widen retry gate from == 401 to in {401, 429} - xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate to stamp the 1-hour cooldown on the rate-limited key and return the next available credential. Returns None if pool is exhausted.	2026-05-28 02:36:37 -07:00
Dusk1e	aa3466063b	fix(android): reject unsafe tar members in psutil compatibility installer	2026-05-28 02:36:09 -07:00
Teknium	bb0ac5ced2	chore(release): AUTHOR_MAP entry for vynxevainglory-ai PR #29233 salvage.	2026-05-28 02:33:51 -07:00
Teknium	70abae8e3b	fix(kanban): show horizontal scrollbar instead of wrapping columns Salvage follow-up on top of @vynxevainglory-ai's PR #29233. Keep the column-body flex:1 + min-height:0 fix (tall columns scroll internally now), but drop the flex-wrap: wrap part — instead just stop hiding the existing horizontal scrollbar. PR #`523254b34` (sadiksaifi, May 18) deliberately moved the kanban board from a wrapping grid to a single-row pinned-width flex so the board stays as one stable horizontal row. The mistake in that PR was the scrollbar-width: none + ::-webkit-scrollbar { display: none } pair, which hid the affordance so columns past the viewport became visually inaccessible. Fixing that hidden-scrollbar bug while keeping the single-row design honors both contributors' intent.	2026-05-28 02:33:51 -07:00
Vynxe Vainglory	538f0fa339	fix(kanban): wrap columns into rows and fix vertical overflow Two CSS issues in the kanban dashboard: 1. Columns overflow horizontally with no way to reach them — the original scrollbar-width: none hid the scrollbar entirely, and even with a scrollbar, a wrapping layout is better UX for a board with 8+ columns. Changed to flex-wrap: wrap and removed the overflow-x: auto + hidden scrollbar rules. Columns now flow into multiple rows (~3 per row on a typical viewport) instead of running off-screen. 2. .hermes-kanban-column-body lacked flex: 1 and min-height: 0, so the flex child's implicit min-height: auto prevented it from shrinking below its content size. Columns with many cards pushed past the parent max-height instead of scrolling internally. Verified: 9 columns wrap into 3 rows, all visible without horizontal scroll. Done column (53 tasks) scrolls vertically within its column bounds.	2026-05-28 02:33:51 -07:00
teknium1	9b5dae17a5	feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68.	2026-05-28 01:45:30 -07:00
Teknium	fb9f3a4ef9	fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748 ) * fix(skills): pull full ClawHub catalog into the skills index The website was showing 200 ClawHub skills out of 20k+ because `ClawHubSource.search("")` for empty queries went straight to a single unpaginated request. ClawHub's API caps any single page at 200 items and returns a `nextCursor`; we grabbed page 1 and stopped, so the cached index served from hermes-agent.nousresearch.com had a silent 99% truncation. End users never hit clawhub.ai directly (the index is rebuilt twice daily by .github/workflows/skills-index.yml and served as a static JSON on the docs site), so the cap-and-cache architecture is correct — it just wasn't being filled. Changes: - `ClawHubSource.search(query="")` now routes through the existing `_load_catalog_index()` paginating walker instead of the unpaginated listing fallback (non-empty queries still hit the fast catalog search). - `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live catalog is ~20k as of May 2026, with headroom for growth). - `build_skills_index.py`: per-source crawl limits split out — ClawHub and LobeHub get 100k, others keep their effective caps. - `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination regression hard-fails the CI build instead of silently shipping a degenerate index. Test plan: - New unit test `test_search_empty_query_paginates_full_catalog` exercises the cursor-following path with three mocked pages (450 total items) and asserts all pages are walked. - Existing 9 ClawHub tests + 127 broader skills_hub tests all pass. - E2E against live ClawHub API: walker reached 9700+ skills across 49 pages before this commit landed, paginating well past the previous 50-page cap. * fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k E2E walk against live ClawHub API hit my initial 250-page cap at 49,698 skills with cursor=yes still pending. The catalog is roughly 2.5x larger than the docstring estimate. - max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None well before this in practice) - SOURCE_LIMITS['clawhub'] 100k → 200k - EXPECTED_FLOORS['clawhub'] 5000 → 20000	2026-05-28 01:42:19 -07:00
Teknium	09a5cd8084	fix(auth): sync manual:device_code Codex pool entries on re-auth (#33744 ) #33164 made _save_codex_tokens sync the singleton-seeded `device_code` pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed `manual:device_code` entries created by `hermes auth add openai-codex` (the recommended workaround for users who hit #33000 before #33164 landed). Every subsequent re-auth would refresh the device_code entry but leave the manual:device_code entry holding the consumed refresh token plus stale last_error_* markers — immediately recreating the 401 token_invalidated symptom on the next request, exactly as reported in #33538. Extend the refreshable source set to include `manual:device_code`. Completing the device-code OAuth flow proves the user owns the ChatGPT account, so it is safe to refresh every device-code-backed entry. Keep `manual:api_key` and other non-device-code manual sources untouched — those represent independent credentials. Closes #33538.	2026-05-28 01:33:10 -07:00
Dusk1e	43abc51f66	fix(security): require source CIDR allowlisting for public msgraph webhook binds	2026-05-28 01:26:18 -07:00
Teknium	986abb3cf7	docs: drop stale Kimi/DeepSeek vision example (#33736 ) Kimi K2.6 is natively multimodal — flagged by Shengyuan from the Kimi growth team. Replace the named-vendor example with a model-agnostic phrasing so the row doesn't go stale as more vendors ship vision.	2026-05-28 01:23:38 -07:00
Teknium	87e5b2fae0	feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721 ) Adds first-class `client_cert` / `client_key` config keys so MCP servers behind mTLS work without an external TLS-terminating proxy. Resolves inbound community question (Jeremy W.). Schema (per `mcp_servers.<name>`, HTTP/SSE only): - `client_cert: "/path/to/combined.pem"` — single PEM with cert + key - `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate - `client_cert: [cert, key]` or `[cert, key, password]` — list form, with optional passphrase for encrypted keys Paths support `~` expansion. Missing files raise a server-scoped `FileNotFoundError` at connect time rather than failing later with an opaque TLS handshake error. Wiring: - New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned `httpx.AsyncClient` alongside the existing `verify=` handling. - SSE path: routed through an `httpx_client_factory` that wraps the SDK's defaults (follow_redirects=True) and layers `verify` + `cert` on top. The factory is only injected when needed, so the SDK's built-in `create_mcp_http_client` keeps being used in the default case. - Deprecated mcp<1.24 path left untouched — that SDK's `streamablehttp_client` signature doesn't expose `cert`, and adding it would be dead code. Also documents the previously-undocumented `ssl_verify` key (bool or CA bundle path) in the MCP config reference. Tests: - `tests/tools/test_mcp_client_cert.py` (new, 19 tests): - `_resolve_client_cert` helper: all three input forms, `~` expansion, missing-file and validation errors. - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for string and tuple forms; absent when unset; missing-file error propagates. - SSE transport: factory only injected when cert or non-default verify is set; factory applies cert, custom CA bundle, and preserves `follow_redirects=True` + forwarded headers/auth. - Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py` still pass.	2026-05-28 00:55:55 -07:00
Stephen Schoettler	8595281f3c	fix: expose context engine tools with saved toolsets	2026-05-28 00:28:42 -07:00
Dusk1e	1a9ef83147	fix(security): require API_SERVER_KEY before dispatching API server work	2026-05-28 00:25:08 -07:00
LeonSGP43	442a9203c0	Fix xAI OAuth timeout manual fallback	2026-05-28 00:24:17 -07:00
helix4u	459d7694d3	fix(agent): preload jiter native parser	2026-05-28 00:20:11 -07:00
Robin Fernandes	dc52b82d53	test(auth): update entitlement CI expectations	2026-05-28 00:19:31 -07:00
Robin Fernandes	1cf5e639b3	fix(auth): refresh Nous entitlement in tool menus	2026-05-28 00:19:31 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
stephenschoettler	0bf9b867cf	fix(website): pin serialize-javascript and uuid via npm overrides Resolves the two Dependabot alerts currently open against the website lockfile: - serialize-javascript: pin to ^7.0.5 (was 6.0.2 — high-severity RCE via RegExp.flags + Date.prototype.to*, plus medium-severity DoS) - uuid: pin to ^14.0.0 (was 8.3.2 — medium buffer bounds check miss in v3/v5/v6 when buf is provided) Lockfile regenerated against current main (not the stale lockfile from the original PR — several Dependabot bumps for mermaid, webpack-dev-server, @babel/plugin-transform-modules-systemjs, fast-uri, lodash-es+langium, lodash, follow-redirects, and dompurify have landed since #30036 was opened, so the website portion was re-applied surgically on top of those). Salvaged the website half of PR #30036. The TUI test half landed on main separately, so this PR is web-only.	2026-05-28 00:07:54 -07:00
kshitijk4poor	7b778db472	chore(release): map MoonRay305 contributor email for #32759 salvage Adds `squiddy@2rook.ai → MoonRay305` to AUTHOR_MAP so contributor_audit.py passes for the salvaged commits in #33482-followup PR.	2026-05-27 23:28:51 -07:00
Squiddy	3ba8962738	fix(kanban): add Windows init lock guard	2026-05-27 23:28:51 -07:00
Squiddy	90b6b3d18f	fix(kanban): harden sqlite connection concurrency	2026-05-27 23:28:51 -07:00
Brian D. Evans	3ad46933d3	docs(voice): use `uv pip install faster-whisper` in STT install hints (#29800 ) * docs(voice): use `uv pip install faster-whisper` in STT install hints Three runtime messages told users to `pip install faster-whisper` (reported in #29782 for the gateway STT failure message under Telegram-in-Docker, where the user hit `bash: pip: command not found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv` with a uv-managed venv that doesn't ship `pip` on PATH; users on modern `uv tool install` / `uv venv` installs see the same problem. The canonical install command in this repo is `uv pip install` (see `tools/lazy_deps.py:509` `feature_install_command()`), which works in Docker (uv image), in `uv tool install` venvs, and in pip-based venvs that already have uv on PATH. Changed three locations to match: - `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice reply when no STT provider is configured. Suggests `uv pip install faster-whisper` and notes that `pip install faster-whisper` also works if `pip` is on PATH. - `tools/voice_mode.py` — `/voice` status line for missing STT. - `cli.py` — Voice-mode startup error, "Option 1". No behavior change beyond the user-facing text. No production code path was touched. * docs(voice): add pip fallback to cli + voice_mode STT hints Copilot flagged that cli.py and tools/voice_mode.py recommend `uv pip install faster-whisper` without a fallback for environments where uv isn't on PATH. The gateway/run.py message already lists `pip install faster-whisper` as an alternative; this commit aligns the two remaining call sites to match. Addresses inline Copilot review on #29800. --------- Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>	2026-05-28 16:23:14 +10:00
Teknium	4e702fe2d9	test(ci): harden two flaky tests against CI noise (#33675 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details Two unrelated transient failures on PR #33661's initial CI run, both pre-existing on main and recovered on rerun. Hardening: 1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning tests intend to exercise only the warning-log path, but _run_job_impl continues into provider resolution and MCP discovery after the warning. Both can spawn subprocesses / hit the network and pushed the test over its 30s budget under GHA load. 2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now catches AssertionError/TimeoutExpired, force-kills, and always reaps so no zombie escapes. Same hardening applied to the early-skip branch.	2026-05-27 23:15:41 -07:00
Ben	875d930ac7	test(docker-update): stub subprocess.run in git-install regression guard The regression-guard test `test_cmd_update_on_git_install_does_not_print_docker_message` mocked `is_managed` and `detect_install_method` but not `subprocess.run`, so once `cmd_update(check=True)` decided this was a git install it shelled out to a real `git fetch upstream` / `git fetch origin`. On CI runners the worktree has no `upstream` remote configured and the fetch hung past the 30s pytest-timeout — test (4) slice failed in #33659 CI. Fix: stub `subprocess.run` with a successful CompletedProcess-shaped object whose stdout is `"0\n"`, so: - no real git command is ever invoked - the rev-list parsing later in the flow (`int(stdout.strip())`) succeeds rather than `ValueError`-ing through the test's SystemExit catch - the flow proceeds far enough to confirm the docker banner is absent (the actual assertion) Also broaden the except clause to `(SystemExit, Exception)`: the only assertion in this test is the negative-banner check on captured stdout; any further failure in the rest of the update flow is irrelevant to that contract. Verified locally: all 7 tests in `tests/hermes_cli/test_cmd_update_docker.py` pass in 0.39s (previously the regression-guard test alone consumed 30s+ and got SIGTERM'd).	2026-05-28 15:50:25 +10:00

1 2 3 4 5 ...

9788 commits