Commit graph

9176 commits

Author SHA1 Message Date
CipherFrame
b5c6d9ac08 fix: wire STT lazy-install into transcription_tools.py
The ensure('stt.faster_whisper') lazy-install mechanism was defined in
lazy_deps.py but never called from the STT code path. When
_HAS_FASTER_WHISPER (a module-level constant) evaluated to False at
import time, _get_provider() returned 'none' immediately without
attempting installation. On fresh container builds or venv recreations,
this meant voice message transcription broke silently until someone
manually installed faster-whisper.

Add _try_lazy_install_stt() helper that calls ensure() and
re-checks dynamically via importlib.util.find_spec. Wire it into
all three gates in transcription_tools.py:

- _get_provider() explicit 'local' path (line 221)
- _get_provider() auto-detect path (line 287)
- _transcribe_local() guard (line 405)

This ensures the first voice message after any fresh install triggers
auto-installation instead of failing permanently until a process restart.
2026-05-21 23:36:18 -07:00
helix4u
f6f25b9449 fix(agent): fail fast on small Ollama runtime context 2026-05-21 23:25:01 -07:00
Teknium
e77f1ed5f7 fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.

Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).

Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
2026-05-21 23:18:37 -07:00
lempkey
4c61fb6cf6 fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.

Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.

Gate condition (matches the natural meaning of enabled_toolsets):
  None                       → no filter, inject (backward compat)
  contains 'memory'          → user opted in, inject
  otherwise (including [])   → skip injection

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-21 23:18:37 -07:00
brooklyn!
1264fab156
fix(tui): surface verbose tool details (#30225)
Some checks failed
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix Lockfile Fix / auto-fix-main (push) Waiting to run
Nix Lockfile Fix / fix (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
Deploy Site / deploy-vercel (push) Has been cancelled
Deploy Site / deploy-docs (push) Has been cancelled
OSV-Scanner / Scan lockfiles (push) Has been cancelled
uv.lock check / uv lock --check (push) Has been cancelled
* fix(tui): surface verbose tool details

Emit redacted structured verbose args/results to the TUI so /verbose verbose can show full tool detail without reopening stdout, and fail closed if redaction is unavailable.

Salvages #29011.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>

* fix(tui): address verbose detail review

Label verbose tool failures as errors, cover forced verbose reasoning, and avoid new diff type warnings from the redaction regression tests.

* fix(tui): bound verbose tool payloads

Cap verbose tool detail text before emitting JSON-RPC events and preserve verbose results on inline diff completions.

* fix(tui): align termux argv test with gc flag

Update the stale TUI launch expectation so the Termux freshness path matches the current direct Node argv.

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-22 00:16:52 -05:00
Teknium
4e2c66a098 chore(release): add AUTHOR_MAP entry for Stark-X 2026-05-21 19:17:51 -07:00
Stark-X
eb51fb6f50 fix(ssh): keep bulk sync extraction scoped to .hermes 2026-05-21 19:17:51 -07:00
liuhao1024
4a2fa77c15 fix(cli): pre-check CUA release asset for Intel macOS before install
The upstream cua-driver installer resolves the latest release and attempts
to download an architecture-specific asset. When the release only ships
arm64 builds (as of v0.1.6), the installer fails with a raw 404 on Intel
macOS with no clear path forward.

Add _check_cua_driver_asset_for_arch() that probes the GitHub Releases API
before running the installer. If the latest release has no x86_64/amd64
asset, print a clear warning and link to the upstream issue. On arm64 or
API failure, fail open and let the installer proceed as before.

Fixes #24530
2026-05-21 19:17:45 -07:00
teknium1
9896e43db5 fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.

Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.

Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.

E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
  Before: _find_all_skills() returned ~3 skills.
  After:  _find_all_skills() returns 84 skills including
          github-pr-workflow, google-workspace, github-auth,
          huggingface-hub. Apple-only skills remain excluded.

Non-Termux Linux/macOS/Windows behavior unchanged (verified).

Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
2026-05-21 19:08:38 -07:00
adybag14-cyber
d08c2a016a fix(tui): termux-gate composer rendering tweaks for Ink TUI
Salvaged from #28942 (adybag14-cyber). Only the Ink TUI half is taken
here — the bundled "termux compatibility note" added to skills_tool.py
in the original PR did not address the actual user-reported bug
(skill_matches_platform() filtering Linux skills out on Termux) and
also regressed the EXCLUDED_SKILL_DIRS set used to prune nested
.venv/site-packages skills.

Changes:
- ui-tui/src/lib/prompt.ts: single-cell ASCII '>' marker in Termux mode
  to avoid ambiguous-width glyph artifacts while typing.
- ui-tui/src/components/appLayout.tsx: suppress profile prefix on
  narrow Termux panes (>=90 cols still shows it).
- ui-tui/src/lib/inputMetrics.ts + components/messageLine.tsx +
  lib/virtualHeights.ts: termux-aware transcript body width — drop
  the desktop 20-col floor on narrow mobile layouts, align virtual
  heights with actual rendered width.
- ui-tui/src/components/textInput.tsx: disable fast-echo bypass by
  default in Termux to avoid ghosting at soft-wrap boundaries.
  HERMES_TUI_TERMUX_FAST_ECHO=1 opts back in.

Tests: ui-tui/src/__tests__/{prompt,termuxComposerLayout,textInputFastEcho}.test.ts
(12 PR-added tests pass; 3 pre-existing wrapAnsi-bundling failures on
main are unrelated.)

The real skill-listing fix on Termux ('android' platform matching
Linux skills) ships as a follow-up commit on this branch.
2026-05-21 19:08:38 -07:00
Teknium
0e2873a77d fix(computer_use): build summary once before aux-vision routing branch
The cherry-pick of #22891 (max_elements cap) reshuffled _capture_response
so summary was assigned inside both the multimodal and AX branches,
but #30126's aux-vision routing call (_route_capture_through_aux_vision)
fires BEFORE either branch and references the not-yet-bound name.

Compute summary once up-front, keep the AX-branch rebuild for the
truncation note.
2026-05-21 19:07:32 -07:00
briandevans
280dd4513a fix(computer-use): address Copilot review on max_elements cap
Four findings from Copilot's review on PR #22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `#11..#40` even though the JSON
   `elements` array only held #1..#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
briandevans
bb694bad42 fix(computer-use): cap AX elements array to prevent context blowup (#22865)
`computer_use(action='capture', mode='ax')` returned the full AX element
list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains
UIs publish 500+ AX nodes (one reproduction in #22865 returned 597
elements against Obsidian), so a single capture could consume enough
context to trigger compression failures or render the session unusable.
The human-readable `_format_elements` summary is already capped at 40
lines, so the truncation gap was invisible to anyone reading the summary
output.

Add a `max_elements` argument to the tool schema, default 100, that
trims the AX `elements` array. When the cap fires, the response surfaces
`total_elements` and `truncated_elements` and appends a "raise
max_elements or pass app= to narrow" hint to the summary so the model
knows the JSON view is partial and can re-issue with a tighter scope.

Validation is centralized in `_coerce_max_elements`: missing /
non-integer / sub-1 inputs fall back to the default cap, so the
protection can never be silently disabled by a malformed tool-call
argument. The cap only affects AX-mode JSON; `mode='som'` and
`mode='vision'` keep returning a screenshot + image-aware summary
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
brooklyn!
9e30ef224d
fix(tui): preserve scrollback when branching sessions (#30162)
Keep the visible transcript mounted after /branch switches to the new session, since the backend already carries the copied history forward.
2026-05-21 21:01:04 -05:00
brooklyn!
a7cd254c29
feat(tui): mouse_tracking DEC mode presets (salvage of #26681) (#30084)
* feat(tui): make display.mouse_tracking pick which DEC modes to enable

Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006.
Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row
fire a clipboard probe that surfaces as "No image in clipboard" — sometimes
dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too,
since tmux's own scrollback is preempted by the alt-screen TUI.

`display.mouse_tracking` (and `/mouse <preset>`) now accepts `off | wheel |
buttons | all` in addition to the legacy booleans. `wheel` is 1000+1006:
scroll wheel + click only, no drag, no hover — the tmux-friendly subset.
`buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the
hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.).

* fix(tui): repaint + sync mouse mode when display.mouse_tracking changes

Two interacting bugs left the TUI blank when `display.mouse_tracking`
switched at runtime (config edit, /mouse <preset>):

1. AlternateScreen's effect re-runs on every `mouseTracking` change,
   tearing down and re-entering the alt screen. After re-entry, ink's
   frame buffers are reset by `resetFramesForAltScreen()` but nothing
   schedules the follow-up render — the alt screen sits blank until
   some other state change happens to trigger one. Add a
   `scheduleRender()` in `setAltScreenActive`'s active=true branch so
   the freshly-entered alt screen gets a full repaint immediately.

2. `setAltScreenActive` early-returns when `active` hasn't changed,
   which silently drops a `mouseTracking` change if the cleanup→setup
   pair somehow leaves `altScreenActive` already true. Call
   `setAltScreenMouseTracking` explicitly from the AlternateScreen
   effect so the in-memory mode and terminal DECSET sequence stay in
   sync regardless of how `setAltScreenActive` resolved (the call is a
   no-op when the mode is unchanged).

* fix(tui): address copilot review #4341269705

- tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES
  frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already
  centralizes the canonical preset set via its values; the separate
  constant added no behavior.
- tests/test_tui_gateway_server.py: update the existing
  test_config_mouse_uses_documented_key_with_legacy_fallback to assert
  the new preset strings ('all'/'off' instead of 'on'/'off',
  display.mouse_tracking persisted as 'all' instead of True) and add
  test_config_mouse_accepts_preset_strings_and_aliases covering /mouse
  set with wheel/click/unknown (comment #3284802453). The on/off legacy
  config.set return shape was an implementation detail of the boolean
  flag, not a stable API — the slash command, gateway help text, and
  docs all advertise the preset values now.
- ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the
  end of reenterAltScreen() (comment #3284802461). Mirrors the same fix
  in setAltScreenActive() from ece0a2f4c — without it, SIGCONT/resize
  self-heal/stdin-gap re-entry leaves the alt screen blank because
  every caller returns early after invoking us.

* fix(tui): address copilot review #4341308478 round 2

- ui-tui/src/config/env.ts (comment #3284837577): the precedence
  comment was misleading. Actual behavior on origin/main is
  HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default >
  HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from
  main; the only change here was the wrong comment that claimed
  DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block
  to document the actual precedence ladder.
- tui_gateway/server.py /mouse set (comment #3284837607): replaced
  'str(value or "").strip().lower()' with the explicit None idiom
  already used for /indicator, so programmatic callers can pass 0 /
  False and have them route through _MOUSE_TRACKING_ALIASES → 'off'
  instead of collapsing to '' and triggering the toggle path.
- ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx
  (comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before
  enableMouseTrackingFor(...) on mount. Otherwise selecting
  'wheel'/'buttons' from a state where DEC 1003 was already asserted
  (crash, another app, debugger) would silently leave hover on. Also
  unconditionally DISABLE on unmount so a crash mid-mount can't leak
  DEC modes back to the host shell.

* chore(release): map nat@nthrow.io to @nthrow for #26681 salvage

* fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen

Copilot review #4341356637 (comment #3284880417). The explicit
setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true,
mouseTracking) was defensive paranoia added in the previous fix commit
that's not actually reachable in practice:

- React's cleanup always runs before the next setup, so on any prop
  change (mouseTracking or writeRaw) the cleanup sets active=false
  first. Setup then sees active was false and applies the new mode
  via setAltScreenActive without early-returning.
- On the impossible 'active stayed true' path, the writeRaw above has
  already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode)
  to the terminal, so the in-memory mode would lag but the visible
  state is already correct.

Removing the redundant call means a single DEC sequence per mount.
If the 'active stayed true' path ever manifests in practice, the
right fix is in setAltScreenActive (track mode regardless of the
active early-return), not here.

* fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx

Copilot review #4341379994 (comments #3284900825, #3284900840,
#3284900852). Three remaining call sites in ink.tsx still re-enabled
mouse tracking without first sending DISABLE_MOUSE_TRACKING:

- handleResize alt-screen recovery (line ~577)
- reassertTerminalModes stdin-gap re-assertion (line ~1351)
- reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408)

For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally-
asserted DEC 1003 (other apps, prior crash, tmux state) still active
and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING
is idempotent and safe to send unconditionally — it resets all four
modes. Matches the pattern already in setAltScreenMouseTracking and
the AlternateScreen mount path.

* fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen

Copilot review #4341452823 (comment #3284959762). exitAlternateScreen()
was the last call site in ink.tsx still re-enabling mouse tracking
without DISABLE first. Editors (vim/nvim/less) and tmux can leave
DEC 1003 hover asserted across the handoff back; without DISABLE,
'wheel'/'buttons' presets silently kept hover on after the editor
quit. Now all five enableMouseTrackingFor() call sites in ink.tsx
prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes,
reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen.

* fix(tui): add defensive default to enableMouseTrackingFor switch

Copilot review #4341485231 (comment #3284979323). TS exhaustive switch
returns string per the type system, but a JS caller / corrupted config
/ hot-reload-in-dev could reach the function with an unknown value at
runtime. Without a default, that path returns undefined which then
concatenates as the literal string 'undefined' into the terminal byte
stream — visibly garbling output. Treat unknown as 'off' (no DEC
sequences) so the worst case is silent input loss rather than a
wrecked screen.

---------

Co-authored-by: Nat Thrower <nat@nthrow.io>
2026-05-21 20:25:52 -05:00
Ben Barclay
4d58e48cdb
Merge pull request #29387 from NousResearch/fix/no-docker-tag
fix(ci): stop pushing per-commit SHA tags to Docker Hub
2026-05-22 10:38:32 +10:00
xxxigm
bec2250d2c test(computer_use): end-to-end regression for capture routing (#24015)
Add tests/tools/test_computer_use_capture_routing.py — 13 integration
tests that drive _capture_response end-to-end with deterministic stubs
for the routing helper, _run_async, vision_analyze_tool, and
get_hermes_dir, so the full code path is exercised without a live
cua-driver, real auxiliary client, or network access.

Coverage:

  * TestCaptureResponseDefaultPath (3 cases)
    - SOM PNG capture returns the legacy multimodal envelope when the
      routing helper says 'native' (image/png MIME).
    - Same path returns image/jpeg MIME for JPEG payloads (cua-driver
      can return either).
    - AX-only mode never even consults the routing helper because no
      PNG is present.

  * TestCaptureResponseRoutedToAuxVision (5 cases)
    - SOM capture with routing on returns a JSON string with the
      vision_analysis embedded, the AX/SOM index preserved, and NO
      image_url parts. Verifies the aux call receives a path under
      the configured cache and a prompt that grounds itself against
      the AX summary.
    - Temp screenshot file is unlinked after _capture_response returns,
      including when the aux call raises (the finally block runs).
    - Empty / malformed aux analysis falls back to the multimodal
      envelope so the user always gets *something* useful.

  * TestRoutingDecisionWiring (4 cases)
    - Explicit auxiliary.vision in config flips routing on regardless of
      main-model vision capability.
    - Vision-capable main + native tool-result support keeps multimodal.
    - Config load failure fails open (returns False, multimodal path
      continues to work).
    - Helper exception is swallowed and routes to legacy behaviour.

  * TestBugReproductionAnchor (1 case) - directly pins the #24015
    contract: when routing is on, the response must NEVER contain a
    'data:image' or 'image_url' substring. That is exactly what tripped
    the reporter's HTTP 404 ('No endpoints found that support image
    input') on tencent/hy3-preview before the fix.

Bug-reproduction proof:
  $ git checkout upstream/main -- tools/computer_use/tool.py
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 failed in 1.29s ==============================

  $ # restore tool.py to this branch's HEAD
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 passed in 1.04s ==============================

Total branch coverage:
  85 passed across test_computer_use.py, test_computer_use_vision_routing.py,
  test_computer_use_capture_routing.py
2026-05-21 17:38:19 -07:00
xxxigm
e02a7e5e1c fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015)
When the active main model has no vision capability — or when the user
explicitly configured auxiliary.vision in config.yaml — sending the
captured screenshot back to the main model in a multimodal tool-result
envelope is the wrong move: it trips HTTP 404 / 400 at the provider
boundary (e.g. 'No endpoints found that support image input') and the
agent loop reports a hard tool failure for what should have been a
simple capture.

The reporter on #24015 hit this with:

  model:
    default: tencent/hy3-preview      # no vision support
    provider: openrouter
  auxiliary:
    vision:
      provider: openrouter
      model: google/gemini-2.5-flash  # explicitly configured

…and observed:

  computer_use(action='capture', mode='som')
  → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
     🔌 Provider: openrouter  Model: tencent/hy3-preview
     📝 Error: HTTP 404: No endpoints found that support image input

Fix: in tools/computer_use/tool.py::_capture_response, after a
screenshot is captured (modes 'som' / 'vision'), consult the routing
helper introduced earlier in this branch. When it says 'route to aux',
materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze
on it (which honours auxiliary.vision via the standard async_call_llm
task='vision' router), and return a text-only JSON tool result that
embeds the analysis alongside the existing AX/SOM index. The main
model never sees the pixels — it sees an actionable text description
plus the same set-of-mark element index it normally uses.

The two new helpers (_should_route_through_aux_vision,
_route_capture_through_aux_vision) keep the policy and the IO
separated so each can be tested in isolation. Both fail open: if the
config import fails, if the aux call raises, or if the analysis is
empty, we fall back to the existing multimodal envelope so the
behaviour is at worst the pre-fix status quo. Temp screenshot files
are cleaned up unconditionally in a finally block — even on aux call
failure — to avoid leaving residue under cache/vision/.

The end-to-end regression for #24015 is added in the next commit.
2026-05-21 17:38:19 -07:00
xxxigm
5ce5fe3181 test(computer_use): cover capture vision-routing helper
Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests
that pin the contract of the new vision-routing helper introduced in
the previous commit:

  * TestExplicitAuxVisionOverride (12 cases): mirror the
    auxiliary.vision detection rules used by agent.image_routing so
    the capture path and the user-attached-image path agree on what
    counts as an explicit override (provider/model/base_url with
    non-blank, non-'auto' values).
  * TestRouteDecision (7 cases): pin the policy itself — explicit
    override always wins, vision-capable + native-tool-result keeps
    multimodal, everything else fails closed and routes to aux.
  * TestLookupHelpers (5 cases): defensive paths for the models.dev /
    tool-result-support lookups (blank inputs, exceptions, missing
    caps).
  * TestModuleSurface (4 cases): pin the public/__all__ surface and
    keep internal helpers addressable so the integration test in the
    next commit can monkeypatch them deterministically.

Run with:
  scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py
2026-05-21 17:38:19 -07:00
xxxigm
531efe7208 fix(computer_use): add helper to decide capture vision routing
Add tools/computer_use/vision_routing.py with
should_route_capture_to_aux_vision(provider, model, cfg) — a small
policy helper that decides whether a captured screenshot should be
returned as a multimodal envelope (main model has native vision) or
pre-analysed through the auxiliary.vision pipeline so the main model
only sees text.

The decision mirrors agent.image_routing.decide_image_input_mode for
user-attached images, so the capture path and the user-turn path agree
on what counts as an explicit aux vision override:
  * provider/model/base_url under auxiliary.vision => explicit override
    => route through aux vision
  * provider+model accepts multimodal tool results AND main model
    reports supports_vision=True => keep multimodal envelope
  * everything else (no tool-result image support, non-vision model,
    metadata lookup failure) => fail closed and route through aux

No call sites are changed in this commit; the helper is added in
isolation so the routing decision can be unit-tested before it is
plumbed into _capture_response().
2026-05-21 17:38:19 -07:00
Teknium
2a474bcf72 fix(termux): resolve packed-refs and worktree refs in skill-sync fingerprint
The bundled-skill sync stamp added in the cherry-picked salvage commit
parsed .git/HEAD and looked for a loose ref file in the worktree gitdir
only, so two real cases hit the unresolved branch:

- repos after `git gc` where active refs live in packed-refs
- linked worktrees, whose branch ref lives in <commondir>/refs/heads/
  (verified on the worktree this salvage was built in)

Both fell back to a constant-string fingerprint, so post-commit launches
would never re-run the real skill sync. Now we resolve packed-refs and
check both the worktree gitdir and the common dir for loose refs.

Adds three tests covering: packed-refs resolution, worktree common-dir
packed lookup, worktree common-dir loose lookup, and the explicit
'unresolved' marker (still stable + version-fallback-safe).
2026-05-21 17:19:05 -07:00
adybag14-cyber
6dbbf20ff4 perf(termux): speed up non-tui cli startup 2026-05-21 17:19:05 -07:00
briandevans
5aa4727f34 fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1)
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.

Fix:

- `capture(app=…)`: when the filter matches nothing, return a
  `CaptureResult` with empty `app`/`elements` and a diagnostic
  `window_title` pointing the caller at `list_apps` and noting the
  localized-name convention. `_active_pid` / `_active_window_id` are left
  untouched so a subsequent action doesn't inadvertently hit the wrong
  process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
  and let the existing `return ActionResult(ok=False, …, "No on-screen
  window found for app …")` path fire instead of falsely reporting success
  on the frontmost window.

This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046;
bugs 3 & 4 in #30032.
2026-05-21 17:15:35 -07:00
Bartok9
4cc18877c6 fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5)
Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.

Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic '  - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.

Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.

focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.

5 new regression tests added.

Part of #24170 (bugs 1 and 3/4 addressed separately).
2026-05-21 14:19:09 -07:00
Teknium
3fde8c153d
fix(skills): prune dependency/venv dirs from all skill scanners (#30042)
* fix(skills): skip dependency dirs in skill scan

* fix(skills): widen sibling rglob scanners to use shared exclusion set

Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.

Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.

Sites touched:
  agent/curator_backup.py        - skill backup file count
  gateway/run.py                 - disabled-skill response (2 sites)
  hermes_cli/dump.py             - skill count in env dump
  hermes_cli/profile_describer.py- profile description (2 sites)
  hermes_cli/profile_distribution.py - profile install count
  hermes_cli/profiles.py         - profile skill count
  hermes_cli/skills_hub.py       - category detection
  tools/skill_manager_tool.py    - skill name lookup (already used set, now uses helper)
  tools/skill_usage.py           - usage tracking + skill dir lookup (2 sites)
  tools/skills_hub.py            - optional skills find + scan (2 sites)
  tools/skills_sync.py           - bundled skills sync

E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.

* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-21 14:18:02 -07:00
helix4u
3462b097e2 fix(voice): chunk oversized CLI recordings 2026-05-21 14:17:39 -07:00
Teknium
552e9c7881
feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035)
* feat(secrets): Bitwarden Secrets Manager integration with lazy bws install

Pull API keys from Bitwarden Secrets Manager at process startup
instead of storing them all in plaintext in ~/.hermes/.env.  One
bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and
rotating a credential becomes a single change in the Bitwarden web
app.

Bitwarden defaults to source of truth: secrets pulled from BSM
overwrite any matching env vars on startup so rotations actually
take effect.  Set secrets.bitwarden.override_existing: false in
config.yaml to invert.

The bws binary is auto-downloaded into ~/.hermes/bin/bws on first
use (pinned to v2.0.0, SHA-256 verified against the GitHub release
checksum file).  No apt, brew, or sudo required.

New surfaces:
  hermes secrets bitwarden setup    — interactive wizard
  hermes secrets bitwarden status   — config + binary + token state
  hermes secrets bitwarden sync     — dry-run fetch / --apply exports
  hermes secrets bitwarden disable  — flip enabled: false
  hermes secrets bitwarden install  — just download the binary

Failures (missing binary, bad token, no network) never block Hermes
startup — they emit a one-line warning to stderr and continue with
whatever credentials .env already had.

Docs: website/docs/user-guide/secrets/{index,bitwarden}.md
Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws
       subprocess and HTTP downloads fully mocked)

* chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid

Generated for PR #30035 — Bitwarden Secrets Manager integration.
Style picked via pick_pr_infographic_style.py rotation:
  layout: bento-grid
  style:  retro-pop-grid
  aspect: 1:1 square

Saved at infographic/bitwarden-secrets-manager/infographic.png
2026-05-21 14:10:34 -07:00
liuhao1024
18cd1e5c72 fix(computer_use): correct type_text MCP tool name and implement drag action
Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars'
which does not exist in current cua-driver. Changed to 'type_text' which is
the correct MCP tool name.

Bug 4: The drag() method returned a hardcoded 'not supported' error even
though cua-driver exposes a 'drag' MCP tool. Implemented proper drag
dispatching with coordinate-based and element-based targeting.

Added dispatch-level validation for drag to ensure from/to coordinates
or elements are provided before calling any backend.

Fixes #24170 (bugs 3 and 4)
2026-05-21 14:08:28 -07:00
github-actions[bot]
0ce12a9241 fix(nix): auto-refresh npm lockfile hashes
Source: 56b79f12ac

Run: https://github.com/NousResearch/hermes-agent/actions/runs/26250404490
2026-05-21 20:11:48 +00:00
Teknium
56b79f12ac
fix(dashboard): remove country flags from language picker (#29997)
Closes #29750. Reporter flagged that 繁體中文 displayed the TW flag
instead of the PRC flag. Rather than picking a side, drop the
language-flag pairings entirely — languages aren't countries
(English ≠ GB, Portuguese ≠ PT, Mandarin variants ≠ any single
jurisdiction), and endonyms are unambiguous.

- LOCALE_META: strip flagCountryCode field
- LanguageSwitcher: remove LocaleFlagIcon component + both call sites
- main.tsx: drop flag-icons CSS import
- package.json: uninstall flag-icons
2026-05-21 13:10:52 -07:00
teknium1
3d2f146460 fix(tui): also pass --expose-gc on the wheel-bundled launch path
The original PR fixed the ext_dir and built-tui paths but missed the
sibling pip-wheel path at line 1155. Without this, wheel installs would
lose --expose-gc entirely (the env-var append at the call site was
already removed). All three production node-launch sites now pass
--expose-gc via argv consistently.
2026-05-21 13:10:34 -07:00
teknium1
2e3f576298 chore(release): map yichengqiao21 to YarrowQiao 2026-05-21 13:10:34 -07:00
YarrowQiao
2ea7cf287e fix(tui): pass --expose-gc as node argv instead of NODE_OPTIONS
Node refuses to start when NODE_OPTIONS contains --expose-gc:

    node: --expose-gc is not allowed in NODE_OPTIONS

NODE_OPTIONS is restricted to a small allowlist of flags that are safe
to inject via env (since any process able to set env vars on a node
child could otherwise enable arbitrary capabilities). --expose-gc is
not on that list and never has been -- it must be passed as a direct
CLI flag.

_launch_tui() was appending --expose-gc to NODE_OPTIONS before spawning
the TUI's node process, which made `hermes --tui` fail to start on
every modern node release. The intent (manual GC for long sessions to
avoid fatal-OOM) is preserved by inserting --expose-gc directly into
the node argv in _make_tui_argv() -- same effect, but actually allowed.

--max-old-space-size=8192 stays in NODE_OPTIONS: it *is* allowlisted,
and keeping it there means downstream node spawns inherit the same
heap cap without having to re-thread the flag through every spawn site.

The dev paths (`tsx src/entry.tsx` and `npm start` fallback) are left
alone -- they don't accept node flags directly, and the production
dist path is the one users actually hit via `hermes --tui`.

Repro before fix:

    $ hermes --tui
    /usr/bin/node: --expose-gc is not allowed in NODE_OPTIONS
2026-05-21 13:10:34 -07:00
helix4u
ba9964ff0d fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.

This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.

Example config:

```yaml
custom_providers:
  - name: gemma-local
    base_url: http://localhost:8080/v1
    model: google/gemma-4-31b-it
    extra_body:
      enable_thinking: true
      reasoning_effort: high
```

For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:

```yaml
extra_body:
  chat_template_kwargs:
    enable_thinking: true
```

Changes:

- `hermes_cli/config.py`: preserve `extra_body` in normalized
  `custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
  as `request_overrides.extra_body` for named custom runtime resolution,
  including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
  entry by `base_url` (+ optional model) and merge its `extra_body` into
  `AIAgent.request_overrides`, with caller-provided overrides winning on
  conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
  behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
  user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
  `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
  trailing-slash equivalence, fallback when no `model` field is set, and
  caller-override merging precedence.

Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.

Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.

Closes #29022
2026-05-21 07:48:53 -07:00
ethernet
2fdefca570
Merge pull request #28269 from cresslank/chore/tui-remove-unused-babel-deps
chore(tui): remove unused Babel build deps
2026-05-21 10:21:31 -04:00
ethernet
48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Change from cpu_count to cpu_count*2. The runner is
    I/O-bound (waiting on subprocess.communicate() from pytest children)
    The parent process does almost no CPU work, so 2x oversubscription
    keeps more pipes full. When a file fails, immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

instead of manually dropping some vars, now we just only include some

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30
alt-glitch
87d9239009 chore: trim verbose comments/docstrings, add AUTHOR_MAP entry
- Replace 18-line comment block with 3-line invariant statement
- Trim test docstrings from multi-paragraph to single-line summaries
- Trim assertion messages from 4-line to 2-line mismatch reports
- Replace 5-line WHAT comments in stubs with 1-line WHY comments
- Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP
2026-05-21 12:49:21 +05:30
Ziliang Peng
c3a09f7835 fix(background_review): propagate parent toolset config to keep tools[] cache-stable
## Summary

The background skill/memory-review fork constructed a child `AIAgent`
without propagating `enabled_toolsets` / `disabled_toolsets` from the
parent. When the parent narrowed its toolset (via `hermes tools
disable` or `config.yaml`), the fork's default `enabled_toolsets=None`
expanded to "all registered tools" — and the fork's outbound request
body sent a wider `tools[]` array than the parent's main-turn request.

Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte,
so this divergence forked the cache lineage on every nudge and forced a
full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session
this cost roughly 4.3 M cache-write tokens — about half of those
attributable to the per-nudge alternation between the main turn's
narrowed `tools[]` and the review fork's wider `tools[]`.

## Goal

Extend the byte-stability invariant established by PR #17276 (which
fixed `system`) to the `tools[]` slot of the request body, so the
review fork's outbound request hits the parent's warmed Anthropic
prefix cache regardless of how the parent's toolset is configured.

## Implementation

Two-line change in `agent/background_review.py`: pass
`enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the
matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside
`_spawn_background_review`. Adds an explanatory block comment that
calls out the cache-key dependency and the relationship to PR #17276.

The post-construction runtime whitelist
(`set_thread_tool_whitelist({memory, skills})`) is untouched — it
still gates which tools the model is allowed to *dispatch*. This
change aligns only what the request body *transmits*, not what the
review is allowed to do, so the safety contract from issue #15204
remains intact.

## Testing

- `tests/run_agent/test_background_review_cache_parity.py`: new
  `test_review_fork_inherits_parent_toolset_config` asserts the
  parent's `enabled_toolsets` and `disabled_toolsets` reach the
  review-fork constructor as kwargs.
- `tests/run_agent/test_background_review_toolset_restriction.py`:
  the existing `test_background_review_does_not_narrow_toolset_schema`
  was inverted (its old "must NOT pass enabled_toolsets" rule was
  built on the assumption that the parent always ran with the
  registry default — wrong in practice when the parent is narrowed).
  Renamed to `test_background_review_matches_parent_toolset_config`
  and updated to assert the parent's value propagates verbatim.
- Verified the new positive test fails without the fix and passes
  with it.
- Full suite for `test_background_review*`:

  ```
  $ python -m pytest tests/run_agent/test_background_review.py \
                     tests/run_agent/test_background_review_summary.py \
                     tests/run_agent/test_background_review_toolset_restriction.py \
                     tests/run_agent/test_background_review_cache_parity.py -q
  18 passed in 1.85s
  ```

## Scope

- `agent/background_review.py`: 2 added kwargs + explanatory comment.
- Two test files: one new positive test, one inverted existing test.
- No production code paths outside the review fork; no schema changes;
  no public-API changes.

Refs: ziliangpeng/hermes-agent#1 (root-cause analysis with wire-level
cache-write measurements). Extends PR #17276's `system`-bytes
invariant to the `tools[]` slot.
2026-05-21 12:49:21 +05:30
EloquentBrush0x
6c26727bb3 fix(gateway): extend observe+attribution to location and media handlers
_handle_location_message and _handle_media_message were skipped when the
observe-unmentioned-group-messages feature landed (a9db0e2c7). Both handlers
now:

1. Check _should_observe_unmentioned_group_message on the skipped path and
   call _observe_unmentioned_group_message so group chatter is stored as
   shared session context even when the bot is not addressed.

2. Call _apply_telegram_group_observe_attribution on the triggered path so
   the dispatched event uses the shared (user_id=None) group session instead
   of the per-user session, letting the model see previously observed context.
   For stickers the attribution is applied after _handle_sticker completes
   (which overwrites event.text with the vision description); for all other
   media types it is applied once after caption cleaning.

Four new tests cover the observe and attribution paths for both handlers.
2026-05-20 23:52:18 -07:00
0xsir0000
5edb346c75 security(file-safety): also write-deny <root>/.env when running under a profile (#15981)
build_write_denied_paths() resolved the protected ``.env`` via
get_hermes_home(), which is profile-aware. When a profile is active
HERMES_HOME points at ``<root>/profiles/<name>`` and ``hermes_home / ".env"``
expands to the *profile* env file only — the global ``<root>/.env`` is left
off the deny list and a write_file call against it succeeds. Since the
top-level .env supplies credentials inherited by every profile, this is a
P0 credential-exfiltration / overwrite path.

Add a parallel ``_hermes_root_path()`` helper that returns the Hermes root
(via the existing ``get_default_hermes_root()`` constant) and include
``<root>/.env`` in the deny list alongside ``<active_profile>/.env``. Both
paths now refuse write_file/patch regardless of profile state. The active
HERMES_HOME .env entry is preserved so the protection in non-profile mode
is unchanged.

A regression test exercises the profile-active scenario by pointing
HERMES_HOME at ``<tmp>/profiles/coder`` and asserting that ``<tmp>/.env``
is denied.

Fixes #15981
2026-05-20 23:37:37 -07:00
teknium1
f722ec723f chore: add nycomar to AUTHOR_MAP 2026-05-20 23:27:38 -07:00
Omar B
be0728cacc fix: handle Discord typing indicator 429 gracefully
The typing indicator loop (send_typing) ran every 8s and died on any
exception, including Discord 429 rate limits.  Once a 429 killed the
loop, the indicator never restarted — and the raw exception bounce
could cascade into broader gateway instability.

Changes:
- Bump sleep interval from 8s to 12s (typing light lasts ~10s)
- On 429: extract retry_after, log a warning, sleep the backoff,
  and continue the loop
- On non-rate-limit errors: log debug and return (unchanged
  behaviour)
2026-05-20 23:27:38 -07:00
Teknium
975e13091e fix(cli): honour image-routing decision in quiet-mode -q --image path
The interactive CLI input path consults decide_image_input_mode() to pick
between native image_url attachment and the vision_analyze text pipeline,
but the non-interactive 'hermes chat -Q -q ... --image FOO' path
unconditionally called _preprocess_images_with_vision() — so even with
`model.supports_vision: true` set, --image always went through the
text-pipeline. Symptom: vision_analyze runs 4-5s per image and the model
sees a lossy text summary instead of the actual pixels.

Mirror the interactive path: load config, call decide_image_input_mode,
branch on native vs text. Falls back to the text-pipeline on any import
or build error (Pyright-clean: _build_parts guarded with `is not None`).

Live E2E (provider=custom, base_url=openrouter, anthropic/claude-haiku-4.5,
red 64x64 PNG):
  baseline (no override): vision_analyze called (8 log lines), 5.8s
  with supports_vision:   vision_analyze NOT called (0 log lines),  3.9s
Same model, same image, single knob flips text→native routing.
2026-05-20 23:27:10 -07:00
Teknium
32aea113f0 fix(agent): consult supports_vision override in auto-mode routing
The contributor PR (#17936) only patched the strip path in
`_model_supports_vision()`. The auto-mode router in
`agent/image_routing._lookup_supports_vision` still only read models.dev,
so a custom-provider model declared as vision-capable would still get its
images routed through vision_analyze in the default `agent.image_input_mode:
auto` setting. Users had to set both `supports_vision: true` AND
`image_input_mode: native` to bypass the text pipeline.

Single-knob behavior now: `supports_vision: true` alone is enough in auto
mode. The strip path and the routing path consult the same resolver.

- Extract override resolution into `_supports_vision_override()` in
  agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper
  (DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted —
  a common YAML mistake) no longer coerces to True via bool() truthiness.
  Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1.
  Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.

Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride,
TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing
contributor tests + image_routing + vision_native_fast_path +
native_image_buffer_isolation all green (92/92).
2026-05-20 23:27:10 -07:00
Muspi Merol
1c76689b28 fix(agent): resolve supports_vision override for named custom providers
Named custom providers are rewritten to provider="custom" at runtime
(hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a
config under providers.my-vllm.models.my-llava.supports_vision was
unreachable via self.provider alone. Also try cfg.model.provider as a
candidate provider key, covering both runtime and config naming.

Adds a regression test for the named-provider path.
2026-05-20 23:27:10 -07:00
Muspi Merol
24c7ce0fb8 feat(agent): allow declaring supports_vision via user config
Custom/local provider models absent from models.dev get classified as
non-vision and have their image content stripped before reaching the
upstream API. Surface a user-facing override:

  model:
    supports_vision: true

  providers:
    my-vllm:
      models:
        my-llava:
          supports_vision: true

The override short-circuits the models.dev lookup in
_model_supports_vision(), which is the single gate guarding image-strip
preprocessing on every transport path.

Refs #8731.
2026-05-20 23:27:10 -07:00
Teknium
b4afc6546e fix(xai): restore encrypted reasoning replay across turns
xAI partner integration requires Hermes to thread `encrypted_content`
reasoning items back to the Responses API on every turn so Grok can
maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this
off for `is_xai_responses` on the theory that the OAuth/SuperGrok
surface rejected replayed encrypted blobs and produced the multi-turn
"Expected to have received \`response.created\` before \`error\`"
failure. That diagnosis was wrong — the prelude-SSE fallback added in
the same PR is what actually fixed that failure mode. Suppressing the
replay was an unnecessary side-effect that broke the whole point of
xAI's partnership integration.

Changes:
- agent/codex_responses_adapter.py — drop the `is_xai_responses` gate
  in `_chat_messages_to_responses_input`. Keep the kwarg in the
  signature for transport compatibility; update the docstring to
  document the May 2026 reversal.
- agent/transports/codex.py — restore
  `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI
  Responses path so xAI echoes encrypted reasoning back to us.
- tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three
  xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST
  include encrypted_content in the request).
- tests/agent/transports/test_codex_transport.py — flip the
  `include` assertions on `test_xai_reasoning_effort_passed` and
  `test_xai_grok_4_omits_reasoning_effort`; update the allowlist
  block comment.

The prelude-SSE fallback and the entitlement-403 surfacing fixes from
#26644 are untouched — they were independent fixes that happened to
ride along with the reasoning-replay gate.

Validation:
- Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py +
  tests/agent/transports/test_codex_transport.py → 65/65 pass
- Broader: tests/agent/transports/ + tests/run_agent/ →
  1674 passed, 3 skipped, 0 failures
- E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport
  build_kwargs): turn-1 request carries
  `include: ["reasoning.encrypted_content"]`; turn-2 input replays
  the encrypted_content blob from turn-1's
  `codex_reasoning_items`; native Codex unchanged.
2026-05-20 23:12:45 -07:00
Teknium
127b56a61a style: docstring + whitespace cleanup on secure_parent_dir
- Drop two extra blank lines between display_hermes_home and secure_parent_dir
- Fix docstring saying 'depth < 2' (actual guard is parts < 3)
2026-05-20 22:56:55 -07:00
liuhao1024
4ead464f97 fix(security): guard os.chmod(parent) against / and top-level dirs
Five call sites do os.chmod(path.parent, 0o700) without checking that
the parent resolves to a safe directory. If HERMES_HOME or another
path env var resolves to /, the chmod strips traversal permission from
the root inode and bricks the entire host.

Add secure_parent_dir() to hermes_constants.py that refuses to chmod
/ or any top-level directory (depth < 2). Replace all 5 call sites
with this helper.

Fixes #25821
2026-05-20 22:56:55 -07:00
teknium1
3bbe980115 chore: add Glucksberg to AUTHOR_MAP 2026-05-20 22:55:31 -07:00