hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
liuhao1024	4a2fa77c15	fix(cli): pre-check CUA release asset for Intel macOS before install The upstream cua-driver installer resolves the latest release and attempts to download an architecture-specific asset. When the release only ships arm64 builds (as of v0.1.6), the installer fails with a raw 404 on Intel macOS with no clear path forward. Add _check_cua_driver_asset_for_arch() that probes the GitHub Releases API before running the installer. If the latest release has no x86_64/amd64 asset, print a clear warning and link to the upstream issue. On arm64 or API failure, fail open and let the installer proceed as before. Fixes #24530	2026-05-21 19:17:45 -07:00
teknium1	9896e43db5	fix(skills): load Linux-tagged skills on Termux (android sys.platform) Reported by @LikiusInik in Discord: on Termux only 3 built-in skills appeared and /gh-pr-workflow + every other slash-skill from github/productivity/mlops was missing. Root cause: skill_matches_platform() compares sys.platform.startswith() against the skill's platforms list. Termux is a Linux userland on Android, but Python 3.13+ reports sys.platform == "android" instead of "linux" — so the ~60 built-in skills tagged platforms:[linux,macos, windows] (github-pr-workflow, google-workspace, github-auth, huggingface-hub, etc.) all got filtered out at the listing step in tools/skills_tool.py:_find_all_skills and never appeared as /slash commands or in skill_view. Fix: when is_termux() detects we're running inside Termux, accept "linux" platform tags regardless of whether sys.platform is "linux" (pre-3.13) or "android" (3.13+). Also accept explicit platforms:[termux] / [android] tags. macOS-only and Windows-only skills correctly remain excluded. E2E (simulated TERMUX_VERSION=set + sys.platform="android"): Before: _find_all_skills() returned ~3 skills. After: _find_all_skills() returns 84 skills including github-pr-workflow, google-workspace, github-auth, huggingface-hub. Apple-only skills remain excluded. Non-Termux Linux/macOS/Windows behavior unchanged (verified). Tests: tests/agent/test_skill_utils.py — 9 new cases covering android-as-Termux, the [linux,macos,windows] case, macOS-only exclusion, explicit termux/android tags, non-Termux Android safety, and unchanged behavior on real Linux/macOS.	2026-05-21 19:08:38 -07:00
briandevans	280dd4513a	fix(computer-use): address Copilot review on max_elements cap Four findings from Copilot's review on PR #22891, all in the AX elements-array cap added by 22fa1ed: 1. The truncation note ("response truncated to N of M elements") was appended unconditionally — including in the som/vision multimodal path, whose response carries a screenshot rather than an `elements` array. The note described a payload field that wasn't present. Moved the note into the AX-text branch where the array actually appears. 2. `_format_elements(cap.elements)` ran on the full untrimmed list with its own `max_lines=40` cap, so a caller passing `max_elements=10` would see summary lines referencing `#11..#40` even though the JSON `elements` array only held #1..#10. Format on `visible_elements` instead so the summary indices always exist in the response. 3. `_coerce_max_elements` enforced a lower bound but no upper bound, so `max_elements=10_000_000` silently disabled the safeguard and reintroduced the original context-blow-up. Added a hard cap (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values. 4. The schema string said "Default 100" but the property carried no `default` field, and claimed `max_elements` had no effect on som/ vision while the image-missing fallback path can still return an elements array. Added `"default": 100`, `"maximum": 1000`, and clarified the fallback-path wording. Each finding gets a regression test: - test_capture_ax_clamps_oversized_max_elements_to_hard_cap - test_capture_ax_summary_indices_match_returned_elements - test_capture_multimodal_summary_omits_truncation_note - test_schema_max_elements_documents_default_and_upper_bound Verified with `pytest tests/tools/test_computer_use.py` (53 passed, including the 5 new cases). Confirmed each new test fails on the pre-fix code path before applying the production change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 19:07:32 -07:00
briandevans	bb694bad42	fix(computer-use): cap AX `elements` array to prevent context blowup (#22865 ) `computer_use(action='capture', mode='ax')` returned the full AX element list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains UIs publish 500+ AX nodes (one reproduction in #22865 returned 597 elements against Obsidian), so a single capture could consume enough context to trigger compression failures or render the session unusable. The human-readable `_format_elements` summary is already capped at 40 lines, so the truncation gap was invisible to anyone reading the summary output. Add a `max_elements` argument to the tool schema, default 100, that trims the AX `elements` array. When the cap fires, the response surfaces `total_elements` and `truncated_elements` and appends a "raise max_elements or pass app= to narrow" hint to the summary so the model knows the JSON view is partial and can re-issue with a tighter scope. Validation is centralized in `_coerce_max_elements`: missing / non-integer / sub-1 inputs fall back to the default cap, so the protection can never be silently disabled by a malformed tool-call argument. The cap only affects AX-mode JSON; `mode='som'` and `mode='vision'` keep returning a screenshot + image-aware summary unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 19:07:32 -07:00
brooklyn!	a7cd254c29	feat(tui): mouse_tracking DEC mode presets (salvage of #26681 ) (#30084 ) * feat(tui): make display.mouse_tracking pick which DEC modes to enable Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006. Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row fire a clipboard probe that surfaces as "No image in clipboard" — sometimes dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too, since tmux's own scrollback is preempted by the alt-screen TUI. `display.mouse_tracking` (and `/mouse <preset>`) now accepts `off \| wheel \| buttons \| all` in addition to the legacy booleans. `wheel` is 1000+1006: scroll wheel + click only, no drag, no hover — the tmux-friendly subset. `buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.). * fix(tui): repaint + sync mouse mode when display.mouse_tracking changes Two interacting bugs left the TUI blank when `display.mouse_tracking` switched at runtime (config edit, /mouse <preset>): 1. AlternateScreen's effect re-runs on every `mouseTracking` change, tearing down and re-entering the alt screen. After re-entry, ink's frame buffers are reset by `resetFramesForAltScreen()` but nothing schedules the follow-up render — the alt screen sits blank until some other state change happens to trigger one. Add a `scheduleRender()` in `setAltScreenActive`'s active=true branch so the freshly-entered alt screen gets a full repaint immediately. 2. `setAltScreenActive` early-returns when `active` hasn't changed, which silently drops a `mouseTracking` change if the cleanup→setup pair somehow leaves `altScreenActive` already true. Call `setAltScreenMouseTracking` explicitly from the AlternateScreen effect so the in-memory mode and terminal DECSET sequence stay in sync regardless of how `setAltScreenActive` resolved (the call is a no-op when the mode is unchanged). * fix(tui): address copilot review #4341269705 - tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already centralizes the canonical preset set via its values; the separate constant added no behavior. - tests/test_tui_gateway_server.py: update the existing test_config_mouse_uses_documented_key_with_legacy_fallback to assert the new preset strings ('all'/'off' instead of 'on'/'off', display.mouse_tracking persisted as 'all' instead of True) and add test_config_mouse_accepts_preset_strings_and_aliases covering /mouse set with wheel/click/unknown (comment #3284802453). The on/off legacy config.set return shape was an implementation detail of the boolean flag, not a stable API — the slash command, gateway help text, and docs all advertise the preset values now. - ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the end of reenterAltScreen() (comment #3284802461). Mirrors the same fix in setAltScreenActive() from `ece0a2f4c` — without it, SIGCONT/resize self-heal/stdin-gap re-entry leaves the alt screen blank because every caller returns early after invoking us. * fix(tui): address copilot review #4341308478 round 2 - ui-tui/src/config/env.ts (comment #3284837577): the precedence comment was misleading. Actual behavior on origin/main is HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default > HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from main; the only change here was the wrong comment that claimed DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block to document the actual precedence ladder. - tui_gateway/server.py /mouse set (comment #3284837607): replaced 'str(value or "").strip().lower()' with the explicit None idiom already used for /indicator, so programmatic callers can pass 0 / False and have them route through _MOUSE_TRACKING_ALIASES → 'off' instead of collapsing to '' and triggering the toggle path. - ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx (comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before enableMouseTrackingFor(...) on mount. Otherwise selecting 'wheel'/'buttons' from a state where DEC 1003 was already asserted (crash, another app, debugger) would silently leave hover on. Also unconditionally DISABLE on unmount so a crash mid-mount can't leak DEC modes back to the host shell. * chore(release): map nat@nthrow.io to @nthrow for #26681 salvage * fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen Copilot review #4341356637 (comment #3284880417). The explicit setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true, mouseTracking) was defensive paranoia added in the previous fix commit that's not actually reachable in practice: - React's cleanup always runs before the next setup, so on any prop change (mouseTracking or writeRaw) the cleanup sets active=false first. Setup then sees active was false and applies the new mode via setAltScreenActive without early-returning. - On the impossible 'active stayed true' path, the writeRaw above has already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode) to the terminal, so the in-memory mode would lag but the visible state is already correct. Removing the redundant call means a single DEC sequence per mount. If the 'active stayed true' path ever manifests in practice, the right fix is in setAltScreenActive (track mode regardless of the active early-return), not here. * fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx Copilot review #4341379994 (comments #3284900825, #3284900840, #3284900852). Three remaining call sites in ink.tsx still re-enabled mouse tracking without first sending DISABLE_MOUSE_TRACKING: - handleResize alt-screen recovery (line ~577) - reassertTerminalModes stdin-gap re-assertion (line ~1351) - reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408) For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally- asserted DEC 1003 (other apps, prior crash, tmux state) still active and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING is idempotent and safe to send unconditionally — it resets all four modes. Matches the pattern already in setAltScreenMouseTracking and the AlternateScreen mount path. * fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen Copilot review #4341452823 (comment #3284959762). exitAlternateScreen() was the last call site in ink.tsx still re-enabling mouse tracking without DISABLE first. Editors (vim/nvim/less) and tmux can leave DEC 1003 hover asserted across the handoff back; without DISABLE, 'wheel'/'buttons' presets silently kept hover on after the editor quit. Now all five enableMouseTrackingFor() call sites in ink.tsx prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes, reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen. * fix(tui): add defensive default to enableMouseTrackingFor switch Copilot review #4341485231 (comment #3284979323). TS exhaustive switch returns string per the type system, but a JS caller / corrupted config / hot-reload-in-dev could reach the function with an unknown value at runtime. Without a default, that path returns undefined which then concatenates as the literal string 'undefined' into the terminal byte stream — visibly garbling output. Treat unknown as 'off' (no DEC sequences) so the worst case is silent input loss rather than a wrecked screen. --------- Co-authored-by: Nat Thrower <nat@nthrow.io>	2026-05-21 20:25:52 -05:00
xxxigm	bec2250d2c	test(computer_use): end-to-end regression for capture routing (#24015 ) Add tests/tools/test_computer_use_capture_routing.py — 13 integration tests that drive _capture_response end-to-end with deterministic stubs for the routing helper, _run_async, vision_analyze_tool, and get_hermes_dir, so the full code path is exercised without a live cua-driver, real auxiliary client, or network access. Coverage: * TestCaptureResponseDefaultPath (3 cases) - SOM PNG capture returns the legacy multimodal envelope when the routing helper says 'native' (image/png MIME). - Same path returns image/jpeg MIME for JPEG payloads (cua-driver can return either). - AX-only mode never even consults the routing helper because no PNG is present. * TestCaptureResponseRoutedToAuxVision (5 cases) - SOM capture with routing on returns a JSON string with the vision_analysis embedded, the AX/SOM index preserved, and NO image_url parts. Verifies the aux call receives a path under the configured cache and a prompt that grounds itself against the AX summary. - Temp screenshot file is unlinked after _capture_response returns, including when the aux call raises (the finally block runs). - Empty / malformed aux analysis falls back to the multimodal envelope so the user always gets something useful. * TestRoutingDecisionWiring (4 cases) - Explicit auxiliary.vision in config flips routing on regardless of main-model vision capability. - Vision-capable main + native tool-result support keeps multimodal. - Config load failure fails open (returns False, multimodal path continues to work). - Helper exception is swallowed and routes to legacy behaviour. * TestBugReproductionAnchor (1 case) - directly pins the #24015 contract: when routing is on, the response must NEVER contain a 'data:image' or 'image_url' substring. That is exactly what tripped the reporter's HTTP 404 ('No endpoints found that support image input') on tencent/hy3-preview before the fix. Bug-reproduction proof: $ git checkout upstream/main -- tools/computer_use/tool.py $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py ============================== 13 failed in 1.29s ============================== $ # restore tool.py to this branch's HEAD $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py ============================== 13 passed in 1.04s ============================== Total branch coverage: 85 passed across test_computer_use.py, test_computer_use_vision_routing.py, test_computer_use_capture_routing.py	2026-05-21 17:38:19 -07:00
xxxigm	5ce5fe3181	test(computer_use): cover capture vision-routing helper Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests that pin the contract of the new vision-routing helper introduced in the previous commit: * TestExplicitAuxVisionOverride (12 cases): mirror the auxiliary.vision detection rules used by agent.image_routing so the capture path and the user-attached-image path agree on what counts as an explicit override (provider/model/base_url with non-blank, non-'auto' values). * TestRouteDecision (7 cases): pin the policy itself — explicit override always wins, vision-capable + native-tool-result keeps multimodal, everything else fails closed and routes to aux. * TestLookupHelpers (5 cases): defensive paths for the models.dev / tool-result-support lookups (blank inputs, exceptions, missing caps). * TestModuleSurface (4 cases): pin the public/__all__ surface and keep internal helpers addressable so the integration test in the next commit can monkeypatch them deterministically. Run with: scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py	2026-05-21 17:38:19 -07:00
Teknium	2a474bcf72	fix(termux): resolve packed-refs and worktree refs in skill-sync fingerprint The bundled-skill sync stamp added in the cherry-picked salvage commit parsed .git/HEAD and looked for a loose ref file in the worktree gitdir only, so two real cases hit the unresolved branch: - repos after `git gc` where active refs live in packed-refs - linked worktrees, whose branch ref lives in <commondir>/refs/heads/ (verified on the worktree this salvage was built in) Both fell back to a constant-string fingerprint, so post-commit launches would never re-run the real skill sync. Now we resolve packed-refs and check both the worktree gitdir and the common dir for loose refs. Adds three tests covering: packed-refs resolution, worktree common-dir packed lookup, worktree common-dir loose lookup, and the explicit 'unresolved' marker (still stable + version-fallback-safe).	2026-05-21 17:19:05 -07:00
adybag14-cyber	6dbbf20ff4	perf(termux): speed up non-tui cli startup	2026-05-21 17:19:05 -07:00
briandevans	5aa4727f34	fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1) `CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back to the frontmost on-screen window when X matched no app — typically a menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than the requested app. The agent then received UI elements for the wrong app and clicked / typed into it. The root cause is a localized macOS app name mismatch: `list_windows` returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese system) but callers naturally pass the English name ("Calculator"). The substring filter doesn't match, and the code falls through to picking the frontmost window with no signal that the filter was effectively dropped. Fix: - `capture(app=…)`: when the filter matches nothing, return a `CaptureResult` with empty `app`/`elements` and a diagnostic `window_title` pointing the caller at `list_apps` and noting the localized-name convention. `_active_pid` / `_active_window_id` are left untouched so a subsequent action doesn't inadvertently hit the wrong process. - `focus_app(app=…)`: when the filter matches nothing, set `target = None` and let the existing `return ActionResult(ok=False, …, "No on-screen window found for app …")` path fire instead of falsely reporting success on the frontmost window. This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046; bugs 3 & 4 in #30032.	2026-05-21 17:15:35 -07:00
Bartok9	4cc18877c6	fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5) Bug 2 (capture_after=True loses app context): _maybe_follow_capture called backend.capture(mode='som') with no app=, causing cua-driver to capture the frontmost window instead of the app targeted by the preceding capture/focus_app. Fix: track _last_app on CuaDriverBackend and thread it through the follow-up capture call so the same app is re-captured regardless of which window has OS focus. Bug 5 (element labels stripped in capture results): _ELEMENT_LINE_RE matched the classic ' - [N] AXRole "label"' format but not the '[N] AXRole (order) id=Label' format introduced in cua-driver v0.1.6. All element labels were silently dropped as empty strings, making element identification impossible. Fix: extend regex to capture both group(3) (quoted label) and group(4) (id= label), and update _parse_elements_from_tree to use group(4) as fallback. Both old and new cua-driver output now produce populated UIElement.label values. focus_app() now also sets _last_app so that capture_after= on any subsequent action re-targets the focused app. 5 new regression tests added. Part of #24170 (bugs 1 and 3/4 addressed separately).	2026-05-21 14:19:09 -07:00
Teknium	3fde8c153d	fix(skills): prune dependency/venv dirs from all skill scanners (#30042 ) * fix(skills): skip dependency dirs in skill scan * fix(skills): widen sibling rglob scanners to use shared exclusion set Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS in the canonical walker (iter_skill_index_files), which fixes the user-visible discovery path. This commit sweeps the ~12 other rglob('SKILL.md') sites that did their own ad-hoc filtering — most only checked .git/.hub, some had no filter at all — so dependency dirs (.venv, node_modules, site-packages, etc.) cannot leak ghost skills through the secondary paths. Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates all 13 sites to use it. Removes 3 hardcoded duplicate filter sets. Sites touched: agent/curator_backup.py - skill backup file count gateway/run.py - disabled-skill response (2 sites) hermes_cli/dump.py - skill count in env dump hermes_cli/profile_describer.py- profile description (2 sites) hermes_cli/profile_distribution.py - profile install count hermes_cli/profiles.py - profile skill count hermes_cli/skills_hub.py - category detection tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper) tools/skill_usage.py - usage tracking + skill dir lookup (2 sites) tools/skills_hub.py - optional skills find + scan (2 sites) tools/skills_sync.py - bundled skills sync E2E verified with the exact reported shape (bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no sibling site picks up the ghost skill, all five legit-skill counts still return 1. * chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep --------- Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>	2026-05-21 14:18:02 -07:00
helix4u	3462b097e2	fix(voice): chunk oversized CLI recordings	2026-05-21 14:17:39 -07:00
Teknium	552e9c7881	feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035 ) * feat(secrets): Bitwarden Secrets Manager integration with lazy bws install Pull API keys from Bitwarden Secrets Manager at process startup instead of storing them all in plaintext in ~/.hermes/.env. One bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and rotating a credential becomes a single change in the Bitwarden web app. Bitwarden defaults to source of truth: secrets pulled from BSM overwrite any matching env vars on startup so rotations actually take effect. Set secrets.bitwarden.override_existing: false in config.yaml to invert. The bws binary is auto-downloaded into ~/.hermes/bin/bws on first use (pinned to v2.0.0, SHA-256 verified against the GitHub release checksum file). No apt, brew, or sudo required. New surfaces: hermes secrets bitwarden setup — interactive wizard hermes secrets bitwarden status — config + binary + token state hermes secrets bitwarden sync — dry-run fetch / --apply exports hermes secrets bitwarden disable — flip enabled: false hermes secrets bitwarden install — just download the binary Failures (missing binary, bad token, no network) never block Hermes startup — they emit a one-line warning to stderr and continue with whatever credentials .env already had. Docs: website/docs/user-guide/secrets/{index,bitwarden}.md Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws subprocess and HTTP downloads fully mocked) * chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid Generated for PR #30035 — Bitwarden Secrets Manager integration. Style picked via pick_pr_infographic_style.py rotation: layout: bento-grid style: retro-pop-grid aspect: 1:1 square Saved at infographic/bitwarden-secrets-manager/infographic.png	2026-05-21 14:10:34 -07:00
liuhao1024	18cd1e5c72	fix(computer_use): correct type_text MCP tool name and implement drag action Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars' which does not exist in current cua-driver. Changed to 'type_text' which is the correct MCP tool name. Bug 4: The drag() method returned a hardcoded 'not supported' error even though cua-driver exposes a 'drag' MCP tool. Implemented proper drag dispatching with coordinate-based and element-based targeting. Added dispatch-level validation for drag to ensure from/to coordinates or elements are provided before calling any backend. Fixes #24170 (bugs 3 and 4)	2026-05-21 14:08:28 -07:00
helix4u	ba9964ff0d	fix(custom): pass custom provider extra body Allow custom OpenAI-compatible providers declared under `custom_providers:` to set provider-specific `extra_body` fields and have Hermes merge them into chat-completions requests when the matching custom endpoint is active. This is a manual per-provider override rather than a model-name heuristic. OpenAI-compatible Gemma thinking support is real, but the on-wire payload shape is backend-specific: some servers want top-level `enable_thinking`, while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`. A per-provider override is safer than picking one assumed payload. Example config: ```yaml custom_providers: - name: gemma-local base_url: http://localhost:8080/v1 model: google/gemma-4-31b-it extra_body: enable_thinking: true reasoning_effort: high ``` For vLLM Gemma or NIM-style endpoints, use the nested shape those servers expect: ```yaml extra_body: chat_template_kwargs: enable_thinking: true ``` Changes: - `hermes_cli/config.py`: preserve `extra_body` in normalized `custom_providers:` entries and allow it in the validated field set. - `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body` as `request_overrides.extra_body` for named custom runtime resolution, including credential-pool paths. - `agent/agent_init.py`: at agent init, locate the matching custom-provider entry by `base_url` (+ optional model) and merge its `extra_body` into `AIAgent.request_overrides`, with caller-provided overrides winning on conflicting top-level keys. - `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile behavior (Ollama `num_ctx`, `think=False` when reasoning disabled); user-configured `extra_body` flows through `request_overrides`. - `website/docs/integrations/providers.md`: document the explicit `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant. - Tests cover config normalization, runtime propagation, model matching, trailing-slash equivalence, fallback when no `model` field is set, and caller-override merging precedence. Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`: configured `extra_body` reaches `kwargs.extra_body` on the wire request, and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`) without clobber. Salvaged from #29022 onto current `main`. Cosmetic typing edit in `plugins/model-providers/custom/__init__.py` and a stale-base docs revert in `providers.md` were dropped during cherry-pick. Closes #29022	2026-05-21 07:48:53 -07:00
ethernet	48be2e0e4d	test: use subprocesses for each test file (#29016 ) * ci(tests): install ripgrep from prebuilt tarball instead of apt apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu runners (the apt-get update against archive.ubuntu.com is the slow part; ripgrep itself is small). Switching to the upstream musl binary tarball cuts the step to a few seconds. - Pinned to ripgrep 15.1.0 with sha256 verification (same hash as published in the releases sha256 sidecar file). - Drops the `rg` binary into /usr/local/bin so it is on PATH for every subsequent step without GITHUB_PATH manipulation. - Applied to both the test and e2e jobs in tests.yml. * fix(cli): compile syntax check to tempdir, not source __pycache__ `_validate_critical_files_syntax` runs `py_compile.compile()` on each critical bootstrap file after a successful `git pull`. The default `py_compile` writes the resulting `.pyc` next to the source under `__pycache__/`, which causes two real problems: 1. Parallel test workers walking the same source tree (e.g. running the suite under per-file process isolation) can race against each other on the `__pycache__` write — manifests as flaky 'directory not empty' errors during teardown. 2. In production, the post-pull syntax check leaves a `.pyc` behind that the next interpreter run might pick up — fine when the interpreter version matches, sketchy if it doesn't. Fix: write the compiled output to a `tempfile.TemporaryDirectory()` that's discarded on function exit. We only care about the compile-or-not signal, not the artifact. * test(runner): per-file process isolation, drop manual state reset + xdist Replace fragile manual _reset_module_state test fixtures with robust per-file subprocess isolation. Each test file runs in a fresh `python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist, no custom pytest plugin, no shared worker state. Key changes: * scripts/run_tests_parallel.py — new runner: discovers test files, runs N in parallel via ThreadPoolExecutor, captures stdout per file, treats exit code 5 (no tests collected) as pass, kills all children on exit. Change from cpu_count to cpu_count2. The runner is I/O-bound (waiting on subprocess.communicate() from pytest children) The parent process does almost no CPU work, so 2x oversubscription keeps more pipes full. When a file fails, immediately show the last 30 lines of pytest output (stack traces + FAILED summary) plus a ready-to-copy repro command: python -m pytest tests/agent/test_auxiliary_client.py scripts/run_tests.sh — delegates to run_tests_parallel.py * .github/workflows/tests.yml — test step: python scripts/run_tests_parallel.py * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts * tests/conftest.py — remove ~200 lines of manual state-reset fixtures * AGENTS.md — update Testing section for per-file design * test(runner): speed gateway test antipattern scan up * fix(test): web search provider plugin test missing xai * fix(tests): make 14 test files pass under per-file subprocess isolation Tests that relied on cross-file state pollution from xdist workers fail when run in isolation (per-file subprocess model). Root causes and fixes: Tool registry not populated: - test_video_generation_tool_surface_matrix: add discover_builtin_tools() - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures registering all 8 bundled web providers, reset after each test - test_website_policy: same provider registration pattern - test_web_tools_tavily: same pattern across 3 dispatch test classes - Also add is_safe_url/check_website_access mocks where SSRF check blocks example.com (DNS resolution fails in isolated envs) Stale check_fn cache: - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache() in both kanban guidance tests (prior test cached False for kanban_show) - test_discord_tool: cache invalidation in setup/teardown - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries Module-level state pollution: - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache - test_skill_commands: set_session_vars() instead of patch.dict(os.environ) (ContextVar takes precedence over os.environ) - test_dm_topics: overwrite sys.modules + separate telegram.constants mock + force-reimport of gateway.platforms.telegram - test_terminal_tool_requirements: removed duplicate class declaration, autouse _clear_caches fixture * change(tests): run_tests.sh explicitly includes env vars instead of manually dropping some vars, now we just only include some * fix(tests): 5 more isolation/NixOS fixes - test_approval_plugin_hooks: isolate HERMES_HOME so real user's command_allowlist doesn't short-circuit the approval path - test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum (feature not merged on this branch) - test_write_deny: test systemd prefix against tmp_path instead of /etc/systemd which resolves to /nix/store on NixOS - test_pty_bridge: use shutil.which('cat') instead of /bin/cat (doesn't exist on NixOS) - profiles.py: rmtree onexc handler chmod's parent dirs too, fixing profile deletion when copytree preserved read-only modes from nix store * fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client * fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor * fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test * fix: address PR #29016 review feedback - Remove tracked .pytest-cache/ artifact and add to .gitignore - Fix stale 'xdist worker' comment in conftest.py - Deduplicate web provider registration into tests/tools/conftest.py shared helper (register_all_web_providers), replacing 8 copy-pasted blocks across 6 test files - Update PR description: remove stale recovered-test-files claim, fix worker count to match code (cpu_count2) fix: eliminate race in stale-cache achievements test The background scan thread could complete and overwrite _SNAPSHOT_CACHE before evaluate_all() returned the stale data — only 10 fake sessions made the scan finish instantly. Added scan_delay param to _FakeSessionDB and set it to 2s in the stale-cache test so the background thread can't win the race.	2026-05-21 16:40:04 +05:30
alt-glitch	87d9239009	chore: trim verbose comments/docstrings, add AUTHOR_MAP entry - Replace 18-line comment block with 3-line invariant statement - Trim test docstrings from multi-paragraph to single-line summaries - Trim assertion messages from 4-line to 2-line mismatch reports - Replace 5-line WHAT comments in stubs with 1-line WHY comments - Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP	2026-05-21 12:49:21 +05:30
Ziliang Peng	c3a09f7835	fix(background_review): propagate parent toolset config to keep tools[] cache-stable ## Summary The background skill/memory-review fork constructed a child `AIAgent` without propagating `enabled_toolsets` / `disabled_toolsets` from the parent. When the parent narrowed its toolset (via `hermes tools disable` or `config.yaml`), the fork's default `enabled_toolsets=None` expanded to "all registered tools" — and the fork's outbound request body sent a wider `tools[]` array than the parent's main-turn request. Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte, so this divergence forked the cache lineage on every nudge and forced a full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session this cost roughly 4.3 M cache-write tokens — about half of those attributable to the per-nudge alternation between the main turn's narrowed `tools[]` and the review fork's wider `tools[]`. ## Goal Extend the byte-stability invariant established by PR #17276 (which fixed `system`) to the `tools[]` slot of the request body, so the review fork's outbound request hits the parent's warmed Anthropic prefix cache regardless of how the parent's toolset is configured. ## Implementation Two-line change in `agent/background_review.py`: pass `enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside `_spawn_background_review`. Adds an explanatory block comment that calls out the cache-key dependency and the relationship to PR #17276. The post-construction runtime whitelist (`set_thread_tool_whitelist({memory, skills})`) is untouched — it still gates which tools the model is allowed to dispatch. This change aligns only what the request body transmits, not what the review is allowed to do, so the safety contract from issue #15204 remains intact. ## Testing - `tests/run_agent/test_background_review_cache_parity.py`: new `test_review_fork_inherits_parent_toolset_config` asserts the parent's `enabled_toolsets` and `disabled_toolsets` reach the review-fork constructor as kwargs. - `tests/run_agent/test_background_review_toolset_restriction.py`: the existing `test_background_review_does_not_narrow_toolset_schema` was inverted (its old "must NOT pass enabled_toolsets" rule was built on the assumption that the parent always ran with the registry default — wrong in practice when the parent is narrowed). Renamed to `test_background_review_matches_parent_toolset_config` and updated to assert the parent's value propagates verbatim. - Verified the new positive test fails without the fix and passes with it. - Full suite for `test_background_review*`: ``` $ python -m pytest tests/run_agent/test_background_review.py \ tests/run_agent/test_background_review_summary.py \ tests/run_agent/test_background_review_toolset_restriction.py \ tests/run_agent/test_background_review_cache_parity.py -q 18 passed in 1.85s ``` ## Scope - `agent/background_review.py`: 2 added kwargs + explanatory comment. - Two test files: one new positive test, one inverted existing test. - No production code paths outside the review fork; no schema changes; no public-API changes. Refs: ziliangpeng/hermes-agent#1 (root-cause analysis with wire-level cache-write measurements). Extends PR #17276's `system`-bytes invariant to the `tools[]` slot.	2026-05-21 12:49:21 +05:30
EloquentBrush0x	6c26727bb3	fix(gateway): extend observe+attribution to location and media handlers _handle_location_message and _handle_media_message were skipped when the observe-unmentioned-group-messages feature landed (`a9db0e2c7`). Both handlers now: 1. Check _should_observe_unmentioned_group_message on the skipped path and call _observe_unmentioned_group_message so group chatter is stored as shared session context even when the bot is not addressed. 2. Call _apply_telegram_group_observe_attribution on the triggered path so the dispatched event uses the shared (user_id=None) group session instead of the per-user session, letting the model see previously observed context. For stickers the attribution is applied after _handle_sticker completes (which overwrites event.text with the vision description); for all other media types it is applied once after caption cleaning. Four new tests cover the observe and attribution paths for both handlers.	2026-05-20 23:52:18 -07:00
0xsir0000	5edb346c75	security(file-safety): also write-deny <root>/.env when running under a profile (#15981 ) build_write_denied_paths() resolved the protected ``.env`` via get_hermes_home(), which is profile-aware. When a profile is active HERMES_HOME points at ``<root>/profiles/<name>`` and ``hermes_home / ".env"`` expands to the profile env file only — the global ``<root>/.env`` is left off the deny list and a write_file call against it succeeds. Since the top-level .env supplies credentials inherited by every profile, this is a P0 credential-exfiltration / overwrite path. Add a parallel ``_hermes_root_path()`` helper that returns the Hermes root (via the existing ``get_default_hermes_root()`` constant) and include ``<root>/.env`` in the deny list alongside ``<active_profile>/.env``. Both paths now refuse write_file/patch regardless of profile state. The active HERMES_HOME .env entry is preserved so the protection in non-profile mode is unchanged. A regression test exercises the profile-active scenario by pointing HERMES_HOME at ``<tmp>/profiles/coder`` and asserting that ``<tmp>/.env`` is denied. Fixes #15981	2026-05-20 23:37:37 -07:00
Teknium	32aea113f0	fix(agent): consult supports_vision override in auto-mode routing The contributor PR (#17936) only patched the strip path in `_model_supports_vision()`. The auto-mode router in `agent/image_routing._lookup_supports_vision` still only read models.dev, so a custom-provider model declared as vision-capable would still get its images routed through vision_analyze in the default `agent.image_input_mode: auto` setting. Users had to set both `supports_vision: true` AND `image_input_mode: native` to bypass the text pipeline. Single-knob behavior now: `supports_vision: true` alone is enough in auto mode. The strip path and the routing path consult the same resolver. - Extract override resolution into `_supports_vision_override()` in agent/image_routing.py and wire it into `_lookup_supports_vision()`. - Refactor `run_agent._model_supports_vision` to call the same helper (DRY, single source of truth for the resolution order). - Strict YAML boolean coercion: `supports_vision: "false"` (quoted — a common YAML mistake) no longer coerces to True via bool() truthiness. Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1. Unrecognised values return None and fall through to models.dev. - Add @CNSeniorious000 to AUTHOR_MAP for release attribution. Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing contributor tests + image_routing + vision_native_fast_path + native_image_buffer_isolation all green (92/92).	2026-05-20 23:27:10 -07:00
Muspi Merol	1c76689b28	fix(agent): resolve supports_vision override for named custom providers Named custom providers are rewritten to provider="custom" at runtime (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a config under providers.my-vllm.models.my-llava.supports_vision was unreachable via self.provider alone. Also try cfg.model.provider as a candidate provider key, covering both runtime and config naming. Adds a regression test for the named-provider path.	2026-05-20 23:27:10 -07:00
Muspi Merol	24c7ce0fb8	feat(agent): allow declaring supports_vision via user config Custom/local provider models absent from models.dev get classified as non-vision and have their image content stripped before reaching the upstream API. Surface a user-facing override: model: supports_vision: true providers: my-vllm: models: my-llava: supports_vision: true The override short-circuits the models.dev lookup in _model_supports_vision(), which is the single gate guarding image-strip preprocessing on every transport path. Refs #8731.	2026-05-20 23:27:10 -07:00
Teknium	b4afc6546e	fix(xai): restore encrypted reasoning replay across turns xAI partner integration requires Hermes to thread `encrypted_content` reasoning items back to the Responses API on every turn so Grok can maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this off for `is_xai_responses` on the theory that the OAuth/SuperGrok surface rejected replayed encrypted blobs and produced the multi-turn "Expected to have received \`response.created\` before \`error\`" failure. That diagnosis was wrong — the prelude-SSE fallback added in the same PR is what actually fixed that failure mode. Suppressing the replay was an unnecessary side-effect that broke the whole point of xAI's partnership integration. Changes: - agent/codex_responses_adapter.py — drop the `is_xai_responses` gate in `_chat_messages_to_responses_input`. Keep the kwarg in the signature for transport compatibility; update the docstring to document the May 2026 reversal. - agent/transports/codex.py — restore `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI Responses path so xAI echoes encrypted reasoning back to us. - tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST include encrypted_content in the request). - tests/agent/transports/test_codex_transport.py — flip the `include` assertions on `test_xai_reasoning_effort_passed` and `test_xai_grok_4_omits_reasoning_effort`; update the allowlist block comment. The prelude-SSE fallback and the entitlement-403 surfacing fixes from #26644 are untouched — they were independent fixes that happened to ride along with the reasoning-replay gate. Validation: - Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py + tests/agent/transports/test_codex_transport.py → 65/65 pass - Broader: tests/agent/transports/ + tests/run_agent/ → 1674 passed, 3 skipped, 0 failures - E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport build_kwargs): turn-1 request carries `include: ["reasoning.encrypted_content"]`; turn-2 input replays the encrypted_content blob from turn-1's `codex_reasoning_items`; native Codex unchanged.	2026-05-20 23:12:45 -07:00
liuhao1024	4ead464f97	fix(security): guard os.chmod(parent) against / and top-level dirs Five call sites do os.chmod(path.parent, 0o700) without checking that the parent resolves to a safe directory. If HERMES_HOME or another path env var resolves to /, the chmod strips traversal permission from the root inode and bricks the entire host. Add secure_parent_dir() to hermes_constants.py that refuses to chmod / or any top-level directory (depth < 2). Replace all 5 call sites with this helper. Fixes #25821	2026-05-20 22:56:55 -07:00
Markus	a9db0e2c74	Observe unmentioned Telegram group messages	2026-05-20 22:55:31 -07:00
Teknium	c6a992e3e3	fix(security): derive <VENDOR>_API_KEY from host as final credential fallback After #28660's host-gating fix, users with provider=custom and base_url pointed at a commercial endpoint (DeepSeek, Groq, Mistral, …) hit no-key-required even when they had the vendor-named env var set (DEEPSEEK_API_KEY, GROQ_API_KEY, …). The issue author flagged this as 'what users intuitively expect'. Adds _host_derived_api_key() to derive an env var name from the base URL host using the registrable label (second-to-last). Appended to all three api_key_candidates chains (_resolve_named_custom_runtime direct-alias path, named-custom path, _resolve_openrouter_runtime non-openrouter branch). Lookalike resistance: api.deepseek.com.attacker.test resolves to vendor label 'attacker', NOT 'deepseek' — DEEPSEEK_API_KEY stays put. IPs and loopback yield no vendor label. Already-handled vendors (OPENAI/OPENROUTER/ OLLAMA) are filtered to prevent bypass of the explicit host-gated paths. Adds 6 tests covering positive paths (DeepSeek, Groq), the lookalike attack, loopback rejection, the already-handled-vendor filter, and direct helper unit tests. Also adds erhnysr to AUTHOR_MAP.	2026-05-20 22:12:09 -07:00
Erhnysr	9514ddbee2	fix(security): address review feedback from pmos69 - Preserve OPENROUTER_API_KEY for explicit mirror/proxy configs when requested provider is openrouter and OPENROUTER_BASE_URL is set - Gate OPENAI_API_KEY and OPENROUTER_API_KEY in named custom provider path (_resolve_named_custom_runtime) on authoritative hosts - Gate same keys in direct-alias path - Update tests to reflect secure-by-default behavior for local endpoints	2026-05-20 22:12:09 -07:00
Erhnysr	59088228f6	fix(security): prevent API key leakage to non-authoritative custom endpoints Custom endpoint provider was forwarding OPENAI_API_KEY and OLLAMA_API_KEY to arbitrary hosts. Keys should only be sent to their authoritative domains (openai.com, ollama.com) or when explicitly configured via pool/env. - Gate OPENAI_API_KEY to openai.com hosts only - Gate OLLAMA_API_KEY to ollama.com hosts only - Return 'no-key-required' for unrecognized custom endpoints - Update tests to reflect secure-by-default behavior Closes #28660	2026-05-20 22:12:09 -07:00
helix4u	b9b6e034d5	fix(gateway): prioritize Telegram command menu	2026-05-20 19:14:21 -07:00
Teknium	c6a380eb6c	fix(skills-hub): widen identifier-dedup to GitHubSource + fix test patch path Sibling fix on top of @EloquentBrush0x's PR #29441. - tools/skills_hub.py GitHubSource.search() had the same r.name dedup bug. Two configured GitHub taps publishing same-named skills would collapse to one. - tests/hermes_cli/test_skills_hub.py:test_browse_skills_dedup_uses_identifier_not_name patched hermes_cli.skills_hub.create_source_router, but browse_skills() imports it locally from tools.skills_hub. Fixed patch path.	2026-05-20 15:04:01 -07:00
EloquentBrush0x	8f92327891	fix(skills-hub): fix dedup in browse_skills() programmatic API browse_skills() is the TUI gateway's API for the web UI skills browser (tui_gateway/server.py:6574). It had the same dedup-by-name bug as do_browse() and unified_search() fixed in the parent commit: r.name is not unique for browse-sh skills (Airbnb, Booking.com, Zillow all publish "search-listings"), so the dedup loop silently dropped all but the first skill with each task name. Switch to r.identifier, which is always globally unique. Add a regression test asserting that two browse-sh skills with the same name but different hostnames both appear in the browse_skills() result.	2026-05-20 15:04:01 -07:00
EloquentBrush0x	fc7e04e9ed	fix(skills-hub): deduplicate search results by identifier, not name Browse.sh exposes skills by task name (e.g. "search-listings"), which is shared across hundreds of sites. Deduplicating by name silently dropped every browse-sh skill after the first one with a given task name — e.g. only Airbnb's "search-listings" would survive, collapsing Booking.com, Zillow, and every other site's variant into nothing. Switch unified_search() and do_browse() to use r.identifier as the dedup key. identifier is always globally unique (e.g. "browse-sh/airbnb.com/search-listings-ddgioa"), so same-named skills from different browse-sh hostnames are preserved as distinct results. Update existing TestUnifiedSearchDedup tests to model the real scenario (same identifier appearing from two sources) and add a regression test that asserts browse-sh skills with the same name but different hostnames are never collapsed.	2026-05-20 15:04:01 -07:00
kshitij	3ce1cf2bb7	Merge pull request #29484 from kshitijk4poor/kp/x-search-degraded-flag Merged after self-review + local verification of date validation and degraded flag. All tests pass, claims confirmed end-to-end.	2026-05-20 14:39:27 -07:00
helix4u	1a7bb988fc	fix(gateway): harden kanban and provider cleanup races	2026-05-20 14:31:22 -07:00
kshitijk4poor	2a352f96ee	fix(x_search): surface degraded results + validate dates The xAI Responses API for x_search returns 200 OK with a synthesized fluff answer in two failure modes that callers currently cannot distinguish from a real, citation-backed result: 1. Any narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) was active, but the X index returned no matching posts. The model then answers from training data. 2. The date range is malformed, inverted, or pure-future (e.g. from_date=2030-01-01). The API call burns quota and Grok responds with a generic answer. Mitigations, both client-side: * Validate from_date / to_date before the HTTP call: - Strict YYYY-MM-DD. - from_date <= to_date when both set. - from_date <= today UTC (no posts in a window that hasn't started). to_date in the future remains allowed so callers can request 'from yesterday to tomorrow'. * Add 'degraded' + 'degraded_reason' to successful responses. degraded=True iff any narrowing filter was active AND both the top-level 'citations' array and inline 'url_citation' annotations came back empty. A broad query with no filters that returns no citations is not flagged degraded — that case is just an unsourced answer, not a filter miss. Tests cover all four validation paths plus six degraded-flag scenarios (each filter type, inline vs top-level citation recovery, broad query baseline). All existing tests continue to pass; the additions are purely additive on the success-path response shape. Discovered while testing the x_search toolset end-to-end: queries scoped to @Teknium1 returned confident-sounding generic text about Nous Research with zero citations, and from_date in 2030 produced sassy non-answers. Both are now detectable by the caller.	2026-05-21 02:38:45 +05:30
Teknium	31a0100104	feat(state.db): persist platform_message_id; restore yuanbao exact-id recall PR #29211 dropped JSONL gateway transcripts and noted that the platform's own `message_id` field (used by Yuanbao's recall guard to redact a message by exact platform id) was no longer preserved — falling back to content-match. That fallback works for the common case but redacts the wrong row when two messages share text (or fails to match when content is post-processed). Restore exact-id matching by giving state.db a column for it: - New `platform_message_id TEXT` column on the messages table (SCHEMA_VERSION bump 11 → 12; column added via declarative reconciler on existing DBs, no version-gated migration block needed) - Partial index `idx_messages_platform_msg_id` on (session_id, platform_message_id) to keep recall's point-lookup cheap even on large sessions - `append_message()` and `replace_messages()` accept the new value: the gateway-facing `append_to_transcript` in `gateway/session.py` forwards either `message["platform_message_id"]` or the legacy `message["message_id"]` key (yuanbao's existing convention) - `get_messages_as_conversation()` surfaces the column back on the message dict as `message_id` so platform code reads the same shape it used to read from JSONL - Yuanbao `_patch_transcript`: restore branch A1 (exact id match) ahead of A2 (content match) ahead of B (system-note). Both branches log which one fired so operators can tell from gateway.log whether recall hit the canonical path or had to fall back. Tests: - New low-level round-trip tests in `test_hermes_state.py` for both `append_message` and `replace_messages` paths - The PR's `test_yuanbao_recall_db_only.py` was rewritten to assert the new contract: branch A1 (id match) works against DB-only transcripts, and branch A2 (content match) still recovers rows that were observed without a platform id (e.g. agent-processed @bot messages where run.py doesn't carry msg_id through)	2026-05-20 13:00:57 -07:00
yoniebans	0cc1a1d2d9	refactor(yuanbao): drop dead branch A1 message_id loop + pin missing fixture PR #29211 review findings: 1. test_retry_replacement: pin DEFAULT_DB_PATH so SessionDB() doesn't write to the real ~/.hermes/state.db. Same fix as the other DB-only fixtures. 2. yuanbao recall branch A1 (message_id exact match) was structurally dead once load_transcript() became DB-only — state.db never preserves the platform message_id. Removed the dead loop, consolidated to a single content-match branch (renamed 'A: content match'). Branch B (system note) unchanged. Updated the test name + docstring to reflect this. Note: self._lock is no longer taken in append_to_transcript (was guarding the JSONL file append). SQLite append_message handles its own concurrency via WAL mode, so this is safe; flagging for awareness.	2026-05-20 13:00:57 -07:00
yoniebans	c634c07bcc	test(gateway): pin DEFAULT_DB_PATH in fixtures to prevent real state.db writes Fixtures that instantiate SessionStore() trigger SessionDB() with no args, which resolves to ~/.hermes/state.db via the DEFAULT_DB_PATH module constant (snapshot of get_hermes_home() at hermes_state import time). The autouse _hermetic_environment fixture in tests/conftest.py monkeypatches HERMES_HOME env, but DEFAULT_DB_PATH is already cached by then. Per-test monkeypatch.setattr(hermes_state, 'DEFAULT_DB_PATH', tmp_path/'state.db') forces the DB into tmp_path so the tests can't leak into the real profile. Verified by counting u1-prefixed sessions in real state.db before/after: delta=0.	2026-05-20 13:00:57 -07:00
yoniebans	b4b118c201	refactor(gateway): drop _append_to_jsonl from mirror Mirror messages are persisted via _append_to_sqlite. JSONL writer was a redundant dual-write. Updated test assertions from JSONL file checks to SQLite mock verification.	2026-05-20 13:00:57 -07:00
yoniebans	351fdcc6e6	refactor(gateway): stop writing JSONL in append_to_transcript / rewrite_transcript state.db is canonical. JSONL transcripts were a transition fallback; the fallback was removed in the previous commit. Existing *.jsonl files on disk are left untouched.	2026-05-20 13:00:57 -07:00
yoniebans	971cfaa38c	refactor(yuanbao): migrate recall to load_transcript() Yuanbao's recall feature was reading the gateway JSONL directly to look up messages by platform message_id, which state.db does not preserve. Migrated to use load_transcript() which returns DB messages. Recall branch A1 (message_id match) now falls through to A2 (content match) or B (system note) for all sessions — a documented degradation. Follow-up issue: add platform_message_id column to state.db messages to restore exact-id matching.	2026-05-20 13:00:57 -07:00
yoniebans	024a8e3ee9	refactor(gateway): drop JSONL fallback in load_transcript state.db is canonical. The 'use whichever source is longer' branch was defensive code for the pre-DB migration; on every real DB it has not fired (verified on a session corpus with 27 jsonl files / 950 sessions — zero jsonl-bigger cases). Test changes: - TestLoadTranscriptCorruptLines: deleted (tested dead JSONL code path) - TestLoadTranscriptPreferLongerSource: deleted (tested removed fallback) - Replaced with TestLoadTranscriptDBOnly (DB-only reads) - TestSessionStoreRewriteTranscript: fixture now creates DB session - test_gateway_retry_replaces_last_user_turn: fixture uses real DB	2026-05-20 13:00:57 -07:00
yoniebans	1d27be0ff3	test(gateway): pin SQLite-only load_transcript behaviour	2026-05-20 13:00:57 -07:00
Teknium	eeb747de25	feat(sessions): opt-in per-session JSON snapshot writer PR #29182 deleted the per-session JSON snapshot writer outright because state.db is canonical and the snapshots had no in-tree consumer. Some users have external tooling that reads `~/.hermes/sessions/session_{sid}.json` directly, so reintroduce the writer behind a config flag that defaults to off. - Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG - Restore `AIAgent._save_session_log` + `_clean_session_content` as gated methods. When the flag is off the call is a fast no-op; when on, the writer behaves as before (atomic write, truncation guard preserved, REASONING_SCRATCHPAD → think tag normalization) - Re-derive the target path from `agent.session_id` on each call so `/branch` and `/compress` re-points happen automatically — no need to restore the explicit re-point bookkeeping at call sites - Wire the single call site in `_persist_session` (the cleanup-on-exit hook). Did NOT restore the 7 intra-turn calls the original PR deleted — those were redundant writes within the same turn that doubled disk I/O without adding any persistence guarantee `_persist_session` does not already provide - Read the flag once at agent init via `load_config()`, cache as `agent._session_json_enabled` - Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn` to pin behavior: default off (no file), opt-in true (file written), no-op method on default agents, logs_dir retained unconditionally - Update CONTRIBUTING.md and the bundled `hermes-agent` skill to document the flag and its default	2026-05-20 11:44:10 -07:00
yoniebans	b6c6f650ee	test(session-log): pin no-session_json regression + drop trailing whitespace Adds TestNoSessionJsonSnapshot to lock the contract that session_log_file attribute, _save_session_log method, and the per-session JSON snapshot writer are gone. logs_dir is retained for request_dump_*.json. Also cleans up stray trailing whitespace in test_run_agent_codex_responses introduced when the _save_session_log stub line was deleted.	2026-05-20 11:44:10 -07:00
yoniebans	6f1a5f8597	refactor(session-log): delete dead _clean_session_content helper Only caller was the removed _save_session_log. Also removes the unused convert_scratchpad_to_think and has_incomplete_scratchpad imports from run_agent.py (both still used elsewhere via their own imports).	2026-05-20 11:44:10 -07:00
yoniebans	cebd480818	refactor(session-log): drop branch/compress re-point of session_log_file The attribute no longer exists; nothing to re-point.	2026-05-20 11:44:10 -07:00
yoniebans	ce26785187	refactor(session-log): delete _save_session_log and all callers state.db now stores every message field the JSON snapshot stored. Removed the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O. Body is in git history if it ever needs to come back.	2026-05-20 11:44:10 -07:00

1 2 3 4 5 ...

4097 commits