Commit graph

9202 commits

Author SHA1 Message Date
Pratik Rai
1f5219fda5 fix(security): protect Hermes control-plane files from prompt injection
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.

Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.

Fixes #14072
2026-05-22 04:32:14 -07:00
Teknium
6f436a463e infographic: PR #27784 anthropic adapter refactor salvage 2026-05-22 04:23:02 -07:00
kshitijk4poor
9d61408837 refactor: extract 7 helpers from convert_messages_to_anthropic
Split convert_messages_to_anthropic (complexity 79) into 7 focused helpers:

- _convert_assistant_message    — assistant msg to content blocks
- _convert_tool_message_to_result — tool msg to tool_result + merge
- _convert_user_message         — user msg validation + conversion
- _strip_orphaned_tool_blocks   — orphan tool_use + tool_result removal
- _merge_consecutive_roles      — role alternation enforcement
- _manage_thinking_signatures   — strip/preserve/downgrade by endpoint
- _evict_old_screenshots        — keep only 3 most recent images

Main function complexity: 79 → 10 (below C901 threshold).
Zero logic changes — pure extraction. Net -4 lines (refactor itself);
+45/-17 follow-up polish for annotation tightening (List[Dict] →
List[Dict[str, Any]]), restored rationale comments in
_manage_thinking_signatures (third-party endpoint examples, #13848/#16748
issue refs, redacted_thinking 'data'-as-signature note), and "Mutates
``result`` in place." docstring lines on the four mutating helpers.
2026-05-22 04:23:02 -07:00
Teknium
ec2ab5bfaf infographic: PR #8056 hash pairing codes salvage 2026-05-22 04:11:49 -07:00
Teknium
82c2035823 fix(pairing): handle legacy plaintext pending entries during upgrade
When an existing install upgrades to the hashed-pending schema, its
on-disk pending.json still has the old {code: entry} format with no
hash/salt fields. The original PR #8056 assumed every entry had both
fields and would have KeyErrored in approve_code, list_pending, and
_cleanup_expired.

Guard each consumer:
  - approve_code: skip entries that are not a dict, lack salt/hash,
    or have a non-hex salt. Legacy entries simply fail to match.
  - list_pending: tolerate missing 'hash' (show "legacy" placeholder)
    and non-numeric created_at (skip the row).
  - _cleanup_expired: treat malformed/legacy entries as expired so
    they get pruned on the next call rather than wedging the file.

Regression tests cover all three consumers plus a mixed-malformed
case.
2026-05-22 04:11:49 -07:00
Tom Qiao
2e509422ef fix(security): hash gateway pairing codes instead of storing plaintext
Pairing codes were stored as plaintext keys in JSON files. Now uses
sha256 + random salt hashing with constant-time comparison.

Fixes #8036

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-22 04:11:49 -07:00
0xDevNinja
3ac2125140 refactor(image_gen): port FAL backend to plugins/image_gen/fal
Mirrors the architecture established by the web (#25182), browser
(#25214), and video_gen (#25126) plugin migrations:

* `tools/fal_common.py` — stateless atoms shared by both FAL-backed
  plugins (image_gen + video_gen). Holds the lazy `fal_client` import
  helper, `_ManagedFalSyncClient`, `_normalize_fal_queue_url_format`,
  `_extract_http_status`. Stateful pieces (`fal_client` module global,
  `_managed_fal_client*` cache, `_submit_fal_request`,
  `_resolve_managed_fal_gateway`, `_get_managed_fal_client`)
  intentionally stay on `tools.image_generation_tool` so the existing
  `monkeypatch.setattr(image_tool, ...)` patch sites keep working
  unchanged.

* `plugins/video_gen/fal/__init__.py` — drops its inline
  `_load_fal_client` duplicate; consumes `tools.fal_common.import_fal_client`.

* `plugins/image_gen/fal/{plugin.yaml,__init__.py}` — new plugin.
  `FalImageGenProvider` is a thin registration adapter that resolves
  the legacy module via `import tools.image_generation_tool as _it`
  and calls `_it.image_generate_tool` + `_it._resolve_fal_model` at
  call time. The 18-model catalog, `_build_fal_payload`, managed-
  gateway selection, and Clarity Upscaler chaining all remain in
  `tools.image_generation_tool` as the single source of truth —
  the plugin is a registration adapter, not a parallel implementation.

* `tools/image_generation_tool.py::_dispatch_to_plugin_provider` —
  drops the `configured == "fal"` skip. Setting `image_gen.provider:
  fal` now routes through the registry like any other provider; the
  plugin re-enters this module's pipeline so behavior is identical.
  Unset `image_gen.provider` still falls through to the in-tree
  pipeline (preserves no-config-with-FAL_KEY UX from #15696).

* `hermes_cli/tools_config.py` — drops the hardcoded "FAL.ai" row from
  `TOOL_CATEGORIES["image_gen"]["providers"]` (now injected by
  `_plugin_image_gen_providers` like every other backend) and the
  `getattr(provider, "name") == "fal"` skip that protected against
  duplication with the hardcoded row. The "Nous Subscription" row
  stays as a setup-flow entry — same shape browser kept "Nous
  Subscription (Browser Use cloud)" after #25214.

* `tests/plugins/image_gen/test_fal_provider.py` — 14 cases covering
  the ABC surface, call-time indirection (verifying
  `monkeypatch.setattr(image_tool, "image_generate_tool", ...)` takes
  effect through the plugin), response-shape stamping, exception
  handling, and registry wiring.

* `tests/plugins/image_gen/check_parity_vs_main.py` — subprocess
  harness mirroring `tests/plugins/browser/check_parity_vs_main.py`.
  Pins one path to origin/main, one to the worktree; runs six
  scenarios (unset, explicit-fal-no-creds, explicit-fal-with-creds,
  explicit-fal-with-model, typo provider, managed-gateway-only) and
  diffs the reduced shape `{dispatch_kind, provider_name, model}`
  per scenario. The only acceptable diff is "legacy_fal → plugin
  (fal)" for explicit-FAL paths — every other delta is flagged as
  a regression.

* `tests/hermes_cli/test_image_gen_picker.py::test_fal_surfaced_alongside_other_plugins`
  — flips the previous `test_fal_skipped_to_avoid_duplicate` to
  match the new shape (FAL is a plugin now, no dedup needed).

Verified: 195/195 tests across
`tests/{tools/test_image_generation*,tools/test_managed_media_gateways,plugins/image_gen,plugins/video_gen,hermes_cli/test_image_gen_picker}.py`
pass on this branch with no test patches modified outside the picker
test that asserted the old skip behaviour.

Fixes #26241
2026-05-22 04:10:45 -07:00
Teknium
7dea33303a infographic: PR #30373 aux model picker parity salvage 2026-05-22 04:10:38 -07:00
Teknium
d246f9a278 fix(aux-picker): drop stale session_search slot
PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG (single-shape
tool now returns DB content directly without an aux LLM), but the slot
remained in _AUX_TASK_SLOTS (web_server.py) and AUX_TASKS (ModelsPage.tsx).
Removing the dead entries while we're touching these tables.
2026-05-22 04:10:38 -07:00
flooryyyy
c1e93aa331 fix: add missing aux model slots to model picker
triage_specifier, kanban_decomposer, profile_describer exist in
DEFAULT_CONFIG auxiliary section but weren't in _AUX_TASK_SLOTS,
_AUX_TASKS, or the dashboard AUX_TASKS array — so users couldn't
configure them through hermes model or the web dashboard.

9â\x86\x9212 aux slots across all three UI surfaces.
2026-05-22 04:10:38 -07:00
Teknium
8b49012a0a infographic: PR #8306 webhook HMAC bypass salvage 2026-05-22 03:45:21 -07:00
Teknium
3fc715ddf5 test(webhook): regression cases for empty-secret HMAC bypass
Covers _reload_dynamic_routes() rejecting empty or missing per-route
secrets when no global fallback exists, preserving the INSECURE_NO_AUTH
opt-in, inheriting a global secret when only the per-route value is
missing, and partial-skip when only one of multiple routes is bad.
2026-05-22 03:45:21 -07:00
memosr
9c90b3a597 fix(security): validate secret in _reload_dynamic_routes to prevent HMAC bypass 2026-05-22 03:45:21 -07:00
briandevans
22b0d6dc1a test(tools): centralize disable_lazy_stt_install fixture in conftest
Move the autouse `_disable_lazy_stt_install` fixture out of the three
transcription test files and into `tests/tools/conftest.py` as a regular
(non-autouse) fixture. Each transcription test module opts in once at
the top via `pytestmark = pytest.mark.usefixtures(...)`.

Why: addresses three Copilot inline review comments on this PR that
flagged the verbatim duplication across files. Centralizing also keeps
the patch target in a single place, so a future rename of
`_try_lazy_install_stt` only updates one location.

Why opt-in (not autouse in conftest): other `tests/tools/` files do not
patch `_HAS_FASTER_WHISPER` and have no reason to bypass the runtime
lazy-install probe; making the fixture autouse globally would silently
mask any future test that wants to exercise the real lazy-install path.
2026-05-22 03:33:01 -07:00
briandevans
5dc232a6e2 test(tools): disarm lazy-install probe so _HAS_FASTER_WHISPER patches work
`b5c6d9ac0` ("fix: wire STT lazy-install into transcription_tools.py")
added `_try_lazy_install_stt()`, which calls
`importlib.util.find_spec("faster_whisper")` after `ensure()` runs.
In the dev / CI environment `faster_whisper` is already installed, so
the probe returns truthy and `_get_provider()` returns "local" even
when the test has patched `_HAS_FASTER_WHISPER=False` to simulate
"not installed".

Add a per-file autouse fixture that patches `_try_lazy_install_stt`
to return False so the simulation stays accurate. The 16 baseline
failures across `test_transcription_tools.py`,
`test_transcription.py`, and `test_transcription_dotenv_fallback.py`
disappear; the production lazy-install path is unaffected at runtime.
2026-05-22 03:33:01 -07:00
Teknium
c25f9d1d36
feat(secrets): label detected credentials with their source (Bitwarden) (#30364)
When Bitwarden Secrets Manager supplies a provider key, 'hermes model'
and the setup wizard show 'credentials ✓' with no hint of where the
key came from — identical to the .env case. Users assume the integration
isn't wired up and re-enter the key (or hit Enter and cancel).

env_loader now tracks which env vars were injected by an external secret
source and exposes get_secret_source() / format_secret_source_suffix() so
the provider flows can render 'Anthropic credentials: sk-ant-... ✓
(from Bitwarden)' instead of an unlabeled checkmark.

Wired into _prompt_api_key (kimi, z.ai, minimax, opencode, ...), the
Anthropic provider flow, the Bedrock flow, and the GitHub Copilot token
display.

Future secret sources (Vault, 1Password, etc.) drop in by setting their
own label in _SECRET_SOURCES; format_secret_source_suffix() has a generic
fallback so no call sites need updating.
2026-05-22 03:32:58 -07:00
teknium1
d617858896 fix(openviking): target-aware mirror subdir, drop private-attr access, dedupe URI builder
- on_memory_write: map target='memory' -> patterns/, 'user' -> preferences/
  (was hardcoded to preferences/ for both)
- Replace client._user with self._user (no private-attr leakage)
- Extract _build_memory_uri() helper + module-level subdir maps
- Restore on_memory_write signature parity with MemoryProvider base
  (metadata kwarg; eliminates Pyright incompatible-override warning)
- AUTHOR_MAP entry for chrisdlc119@outlook.com
2026-05-22 01:27:52 -07:00
Christian de la Cruz
2d587c5662 fix(openviking): store memories via content/write API instead of session messages
_tool_remember and on_memory_write were posting memories as session
messages that depend on commit-time VLM extraction to persist. With
extraction_enabled: false (no VLM configured), the extraction pipeline
never processes these messages, causing memories to be silently lost.

Replace both paths with direct POST to /api/v1/content/write?mode=create,
which creates the file, stores the content, and queues vector indexing
in a single API call. Error reporting is immediate — no silent failures.

- Maps viking_remember category to viking:// subdirectory
- Generates UUID-based URIs via uuid4().hex[:12]
- Returns byte count in confirmation message
2026-05-22 01:27:52 -07:00
Teknium
caf0f30eab chore(release): add sgtworkman to AUTHOR_MAP 2026-05-22 01:24:11 -07:00
sgtworkman
70d53d8b75 fix: run computer use post-setup when enabling tool 2026-05-22 01:24:11 -07:00
Rodrigo
fbdca64f73 fix(computer-use): skip capture_after when action failed (ok=False)
_maybe_follow_capture() issued a follow-up screenshot unconditionally
when capture_after=True, even when res.ok=False. The model then received
a normal-looking screenshot alongside an error message, and in practice
it often ignored ok=False and proceeded as if the action had succeeded.

Fix: return _text_response(res) early when res.ok is False so the model
receives only the error and can decide how to recover.

Tests added:
- test_capture_after_skipped_when_action_failed: patches click to return
  ok=False and asserts no capture call is issued.
- test_capture_after_fires_when_action_succeeds: ensures the happy path
  still triggers the follow-up capture.
2026-05-22 01:19:01 -07:00
Teknium
07b7cf6fe4 chore(release): add rodrigoeqnit to AUTHOR_MAP 2026-05-22 01:14:15 -07:00
Rodrigo
c52cd48e25 fix(computer-use): add set_value to ComputerUseBackend ABC and _NoopBackend stub
_dispatch() routes action="set_value" to backend.set_value(), but:
- ComputerUseBackend did not declare set_value as @abstractmethod, so
  subclasses could silently omit it without a TypeError at class load time.
- _NoopBackend (the test/CI stub) had no set_value method at all, causing
  AttributeError in any test that exercises the set_value action path.

Fix:
- Add set_value as @abstractmethod to ComputerUseBackend in backend.py.
- Add a recording stub in _NoopBackend in tool.py.
- Add two TestDispatch cases: one verifying the call reaches the backend,
  one verifying the missing-value guard returns a clean error.
2026-05-22 01:14:15 -07:00
Tranquil-Flow
d3f62c6913 fix(cli): clamp curses color 8 for 8-color terminals (Docker)
curses.init_pair(N, 8, -1) uses extended color 8 ("bright black" /
dim gray) which does not exist on 8-color terminals (COLORS == 8,
valid range 0-7).  This crashes the entire plugins UI, session
browser, and radio picker in Docker containers with:

    curses.error: init_pair() : color number is greater than COLORS-1

Replace all 5 occurrences across plugins_cmd.py, main.py, and
curses_ui.py with min(8, curses.COLORS - 1), which falls back to
COLOR_WHITE (7) on 8-color terminals.

Closes #13688
2026-05-21 23:40:58 -07:00
Teknium
c769be344a
fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of
OpenAI-compatible servers) follow the OpenAI spec strictly and require
tool message `content` to be a string — they reject our list-type
content (text + image_url parts) with HTTP 400 'text is not set' /
'tool message content must be a string'.

Instead of an allowlist of known-good providers (maintenance burden,
guaranteed to miss aggregators like OpenRouter where the underlying
model determines support, not the aggregator name), this lands a
reactive recovery:

1. New `FailoverReason.multimodal_tool_content_unsupported` with a
   small pattern list covering the common 400 wordings.
2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API
   message list, downgrades any `role:tool` message whose content is
   list-with-image to a plain text summary (preserves text parts) in
   place, AND records the active (provider, model) in a session-scoped
   `_no_list_tool_content_models` set.
3. `_tool_result_content_for_active_model` short-circuits to a text
   summary when (provider, model) is in the cache — so after the first
   400 + retry, subsequent screenshots in the same session skip the
   round trip entirely.
4. Retry hook in `agent.conversation_loop` mirrors the existing
   `image_too_large` recovery: detect the reason, run the helper,
   retry once, fall through to the normal error path if no list-type
   tool content was actually present.

Cache is transient (per-session) by design — next session retries in
case the provider added support, no persistent state to maintain.

Fixes #27344. Closes #27351 (allowlist approach superseded by reactive
recovery).
2026-05-21 23:40:16 -07:00
teknium1
372e9a18cd fixup: log lazy-install errors at debug + AUTHOR_MAP for CipherFrame
Co-authored-by: CipherFrame <cipherframe@users.noreply.github.com>
2026-05-21 23:36:18 -07:00
CipherFrame
b5c6d9ac08 fix: wire STT lazy-install into transcription_tools.py
The ensure('stt.faster_whisper') lazy-install mechanism was defined in
lazy_deps.py but never called from the STT code path. When
_HAS_FASTER_WHISPER (a module-level constant) evaluated to False at
import time, _get_provider() returned 'none' immediately without
attempting installation. On fresh container builds or venv recreations,
this meant voice message transcription broke silently until someone
manually installed faster-whisper.

Add _try_lazy_install_stt() helper that calls ensure() and
re-checks dynamically via importlib.util.find_spec. Wire it into
all three gates in transcription_tools.py:

- _get_provider() explicit 'local' path (line 221)
- _get_provider() auto-detect path (line 287)
- _transcribe_local() guard (line 405)

This ensures the first voice message after any fresh install triggers
auto-installation instead of failing permanently until a process restart.
2026-05-21 23:36:18 -07:00
helix4u
f6f25b9449 fix(agent): fail fast on small Ollama runtime context 2026-05-21 23:25:01 -07:00
Teknium
e77f1ed5f7 fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.

Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).

Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
2026-05-21 23:18:37 -07:00
lempkey
4c61fb6cf6 fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.

Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.

Gate condition (matches the natural meaning of enabled_toolsets):
  None                       → no filter, inject (backward compat)
  contains 'memory'          → user opted in, inject
  otherwise (including [])   → skip injection

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-21 23:18:37 -07:00
brooklyn!
1264fab156
fix(tui): surface verbose tool details (#30225)
Some checks failed
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix Lockfile Fix / auto-fix-main (push) Waiting to run
Nix Lockfile Fix / fix (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Tests / test (push) Waiting to run
Tests / e2e (push) Waiting to run
Deploy Site / deploy-vercel (push) Has been cancelled
Deploy Site / deploy-docs (push) Has been cancelled
OSV-Scanner / Scan lockfiles (push) Has been cancelled
uv.lock check / uv lock --check (push) Has been cancelled
* fix(tui): surface verbose tool details

Emit redacted structured verbose args/results to the TUI so /verbose verbose can show full tool detail without reopening stdout, and fail closed if redaction is unavailable.

Salvages #29011.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>

* fix(tui): address verbose detail review

Label verbose tool failures as errors, cover forced verbose reasoning, and avoid new diff type warnings from the redaction regression tests.

* fix(tui): bound verbose tool payloads

Cap verbose tool detail text before emitting JSON-RPC events and preserve verbose results on inline diff completions.

* fix(tui): align termux argv test with gc flag

Update the stale TUI launch expectation so the Termux freshness path matches the current direct Node argv.

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-22 00:16:52 -05:00
Teknium
4e2c66a098 chore(release): add AUTHOR_MAP entry for Stark-X 2026-05-21 19:17:51 -07:00
Stark-X
eb51fb6f50 fix(ssh): keep bulk sync extraction scoped to .hermes 2026-05-21 19:17:51 -07:00
liuhao1024
4a2fa77c15 fix(cli): pre-check CUA release asset for Intel macOS before install
The upstream cua-driver installer resolves the latest release and attempts
to download an architecture-specific asset. When the release only ships
arm64 builds (as of v0.1.6), the installer fails with a raw 404 on Intel
macOS with no clear path forward.

Add _check_cua_driver_asset_for_arch() that probes the GitHub Releases API
before running the installer. If the latest release has no x86_64/amd64
asset, print a clear warning and link to the upstream issue. On arm64 or
API failure, fail open and let the installer proceed as before.

Fixes #24530
2026-05-21 19:17:45 -07:00
teknium1
9896e43db5 fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.

Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.

Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.

E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
  Before: _find_all_skills() returned ~3 skills.
  After:  _find_all_skills() returns 84 skills including
          github-pr-workflow, google-workspace, github-auth,
          huggingface-hub. Apple-only skills remain excluded.

Non-Termux Linux/macOS/Windows behavior unchanged (verified).

Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
2026-05-21 19:08:38 -07:00
adybag14-cyber
d08c2a016a fix(tui): termux-gate composer rendering tweaks for Ink TUI
Salvaged from #28942 (adybag14-cyber). Only the Ink TUI half is taken
here — the bundled "termux compatibility note" added to skills_tool.py
in the original PR did not address the actual user-reported bug
(skill_matches_platform() filtering Linux skills out on Termux) and
also regressed the EXCLUDED_SKILL_DIRS set used to prune nested
.venv/site-packages skills.

Changes:
- ui-tui/src/lib/prompt.ts: single-cell ASCII '>' marker in Termux mode
  to avoid ambiguous-width glyph artifacts while typing.
- ui-tui/src/components/appLayout.tsx: suppress profile prefix on
  narrow Termux panes (>=90 cols still shows it).
- ui-tui/src/lib/inputMetrics.ts + components/messageLine.tsx +
  lib/virtualHeights.ts: termux-aware transcript body width — drop
  the desktop 20-col floor on narrow mobile layouts, align virtual
  heights with actual rendered width.
- ui-tui/src/components/textInput.tsx: disable fast-echo bypass by
  default in Termux to avoid ghosting at soft-wrap boundaries.
  HERMES_TUI_TERMUX_FAST_ECHO=1 opts back in.

Tests: ui-tui/src/__tests__/{prompt,termuxComposerLayout,textInputFastEcho}.test.ts
(12 PR-added tests pass; 3 pre-existing wrapAnsi-bundling failures on
main are unrelated.)

The real skill-listing fix on Termux ('android' platform matching
Linux skills) ships as a follow-up commit on this branch.
2026-05-21 19:08:38 -07:00
Teknium
0e2873a77d fix(computer_use): build summary once before aux-vision routing branch
The cherry-pick of #22891 (max_elements cap) reshuffled _capture_response
so summary was assigned inside both the multimodal and AX branches,
but #30126's aux-vision routing call (_route_capture_through_aux_vision)
fires BEFORE either branch and references the not-yet-bound name.

Compute summary once up-front, keep the AX-branch rebuild for the
truncation note.
2026-05-21 19:07:32 -07:00
briandevans
280dd4513a fix(computer-use): address Copilot review on max_elements cap
Four findings from Copilot's review on PR #22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `#11..#40` even though the JSON
   `elements` array only held #1..#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
briandevans
bb694bad42 fix(computer-use): cap AX elements array to prevent context blowup (#22865)
`computer_use(action='capture', mode='ax')` returned the full AX element
list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains
UIs publish 500+ AX nodes (one reproduction in #22865 returned 597
elements against Obsidian), so a single capture could consume enough
context to trigger compression failures or render the session unusable.
The human-readable `_format_elements` summary is already capped at 40
lines, so the truncation gap was invisible to anyone reading the summary
output.

Add a `max_elements` argument to the tool schema, default 100, that
trims the AX `elements` array. When the cap fires, the response surfaces
`total_elements` and `truncated_elements` and appends a "raise
max_elements or pass app= to narrow" hint to the summary so the model
knows the JSON view is partial and can re-issue with a tighter scope.

Validation is centralized in `_coerce_max_elements`: missing /
non-integer / sub-1 inputs fall back to the default cap, so the
protection can never be silently disabled by a malformed tool-call
argument. The cap only affects AX-mode JSON; `mode='som'` and
`mode='vision'` keep returning a screenshot + image-aware summary
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
brooklyn!
9e30ef224d
fix(tui): preserve scrollback when branching sessions (#30162)
Keep the visible transcript mounted after /branch switches to the new session, since the backend already carries the copied history forward.
2026-05-21 21:01:04 -05:00
brooklyn!
a7cd254c29
feat(tui): mouse_tracking DEC mode presets (salvage of #26681) (#30084)
* feat(tui): make display.mouse_tracking pick which DEC modes to enable

Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006.
Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row
fire a clipboard probe that surfaces as "No image in clipboard" — sometimes
dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too,
since tmux's own scrollback is preempted by the alt-screen TUI.

`display.mouse_tracking` (and `/mouse <preset>`) now accepts `off | wheel |
buttons | all` in addition to the legacy booleans. `wheel` is 1000+1006:
scroll wheel + click only, no drag, no hover — the tmux-friendly subset.
`buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the
hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.).

* fix(tui): repaint + sync mouse mode when display.mouse_tracking changes

Two interacting bugs left the TUI blank when `display.mouse_tracking`
switched at runtime (config edit, /mouse <preset>):

1. AlternateScreen's effect re-runs on every `mouseTracking` change,
   tearing down and re-entering the alt screen. After re-entry, ink's
   frame buffers are reset by `resetFramesForAltScreen()` but nothing
   schedules the follow-up render — the alt screen sits blank until
   some other state change happens to trigger one. Add a
   `scheduleRender()` in `setAltScreenActive`'s active=true branch so
   the freshly-entered alt screen gets a full repaint immediately.

2. `setAltScreenActive` early-returns when `active` hasn't changed,
   which silently drops a `mouseTracking` change if the cleanup→setup
   pair somehow leaves `altScreenActive` already true. Call
   `setAltScreenMouseTracking` explicitly from the AlternateScreen
   effect so the in-memory mode and terminal DECSET sequence stay in
   sync regardless of how `setAltScreenActive` resolved (the call is a
   no-op when the mode is unchanged).

* fix(tui): address copilot review #4341269705

- tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES
  frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already
  centralizes the canonical preset set via its values; the separate
  constant added no behavior.
- tests/test_tui_gateway_server.py: update the existing
  test_config_mouse_uses_documented_key_with_legacy_fallback to assert
  the new preset strings ('all'/'off' instead of 'on'/'off',
  display.mouse_tracking persisted as 'all' instead of True) and add
  test_config_mouse_accepts_preset_strings_and_aliases covering /mouse
  set with wheel/click/unknown (comment #3284802453). The on/off legacy
  config.set return shape was an implementation detail of the boolean
  flag, not a stable API — the slash command, gateway help text, and
  docs all advertise the preset values now.
- ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the
  end of reenterAltScreen() (comment #3284802461). Mirrors the same fix
  in setAltScreenActive() from ece0a2f4c — without it, SIGCONT/resize
  self-heal/stdin-gap re-entry leaves the alt screen blank because
  every caller returns early after invoking us.

* fix(tui): address copilot review #4341308478 round 2

- ui-tui/src/config/env.ts (comment #3284837577): the precedence
  comment was misleading. Actual behavior on origin/main is
  HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default >
  HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from
  main; the only change here was the wrong comment that claimed
  DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block
  to document the actual precedence ladder.
- tui_gateway/server.py /mouse set (comment #3284837607): replaced
  'str(value or "").strip().lower()' with the explicit None idiom
  already used for /indicator, so programmatic callers can pass 0 /
  False and have them route through _MOUSE_TRACKING_ALIASES → 'off'
  instead of collapsing to '' and triggering the toggle path.
- ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx
  (comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before
  enableMouseTrackingFor(...) on mount. Otherwise selecting
  'wheel'/'buttons' from a state where DEC 1003 was already asserted
  (crash, another app, debugger) would silently leave hover on. Also
  unconditionally DISABLE on unmount so a crash mid-mount can't leak
  DEC modes back to the host shell.

* chore(release): map nat@nthrow.io to @nthrow for #26681 salvage

* fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen

Copilot review #4341356637 (comment #3284880417). The explicit
setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true,
mouseTracking) was defensive paranoia added in the previous fix commit
that's not actually reachable in practice:

- React's cleanup always runs before the next setup, so on any prop
  change (mouseTracking or writeRaw) the cleanup sets active=false
  first. Setup then sees active was false and applies the new mode
  via setAltScreenActive without early-returning.
- On the impossible 'active stayed true' path, the writeRaw above has
  already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode)
  to the terminal, so the in-memory mode would lag but the visible
  state is already correct.

Removing the redundant call means a single DEC sequence per mount.
If the 'active stayed true' path ever manifests in practice, the
right fix is in setAltScreenActive (track mode regardless of the
active early-return), not here.

* fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx

Copilot review #4341379994 (comments #3284900825, #3284900840,
#3284900852). Three remaining call sites in ink.tsx still re-enabled
mouse tracking without first sending DISABLE_MOUSE_TRACKING:

- handleResize alt-screen recovery (line ~577)
- reassertTerminalModes stdin-gap re-assertion (line ~1351)
- reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408)

For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally-
asserted DEC 1003 (other apps, prior crash, tmux state) still active
and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING
is idempotent and safe to send unconditionally — it resets all four
modes. Matches the pattern already in setAltScreenMouseTracking and
the AlternateScreen mount path.

* fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen

Copilot review #4341452823 (comment #3284959762). exitAlternateScreen()
was the last call site in ink.tsx still re-enabling mouse tracking
without DISABLE first. Editors (vim/nvim/less) and tmux can leave
DEC 1003 hover asserted across the handoff back; without DISABLE,
'wheel'/'buttons' presets silently kept hover on after the editor
quit. Now all five enableMouseTrackingFor() call sites in ink.tsx
prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes,
reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen.

* fix(tui): add defensive default to enableMouseTrackingFor switch

Copilot review #4341485231 (comment #3284979323). TS exhaustive switch
returns string per the type system, but a JS caller / corrupted config
/ hot-reload-in-dev could reach the function with an unknown value at
runtime. Without a default, that path returns undefined which then
concatenates as the literal string 'undefined' into the terminal byte
stream — visibly garbling output. Treat unknown as 'off' (no DEC
sequences) so the worst case is silent input loss rather than a
wrecked screen.

---------

Co-authored-by: Nat Thrower <nat@nthrow.io>
2026-05-21 20:25:52 -05:00
Ben Barclay
4d58e48cdb
Merge pull request #29387 from NousResearch/fix/no-docker-tag
fix(ci): stop pushing per-commit SHA tags to Docker Hub
2026-05-22 10:38:32 +10:00
xxxigm
bec2250d2c test(computer_use): end-to-end regression for capture routing (#24015)
Add tests/tools/test_computer_use_capture_routing.py — 13 integration
tests that drive _capture_response end-to-end with deterministic stubs
for the routing helper, _run_async, vision_analyze_tool, and
get_hermes_dir, so the full code path is exercised without a live
cua-driver, real auxiliary client, or network access.

Coverage:

  * TestCaptureResponseDefaultPath (3 cases)
    - SOM PNG capture returns the legacy multimodal envelope when the
      routing helper says 'native' (image/png MIME).
    - Same path returns image/jpeg MIME for JPEG payloads (cua-driver
      can return either).
    - AX-only mode never even consults the routing helper because no
      PNG is present.

  * TestCaptureResponseRoutedToAuxVision (5 cases)
    - SOM capture with routing on returns a JSON string with the
      vision_analysis embedded, the AX/SOM index preserved, and NO
      image_url parts. Verifies the aux call receives a path under
      the configured cache and a prompt that grounds itself against
      the AX summary.
    - Temp screenshot file is unlinked after _capture_response returns,
      including when the aux call raises (the finally block runs).
    - Empty / malformed aux analysis falls back to the multimodal
      envelope so the user always gets *something* useful.

  * TestRoutingDecisionWiring (4 cases)
    - Explicit auxiliary.vision in config flips routing on regardless of
      main-model vision capability.
    - Vision-capable main + native tool-result support keeps multimodal.
    - Config load failure fails open (returns False, multimodal path
      continues to work).
    - Helper exception is swallowed and routes to legacy behaviour.

  * TestBugReproductionAnchor (1 case) - directly pins the #24015
    contract: when routing is on, the response must NEVER contain a
    'data:image' or 'image_url' substring. That is exactly what tripped
    the reporter's HTTP 404 ('No endpoints found that support image
    input') on tencent/hy3-preview before the fix.

Bug-reproduction proof:
  $ git checkout upstream/main -- tools/computer_use/tool.py
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 failed in 1.29s ==============================

  $ # restore tool.py to this branch's HEAD
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 passed in 1.04s ==============================

Total branch coverage:
  85 passed across test_computer_use.py, test_computer_use_vision_routing.py,
  test_computer_use_capture_routing.py
2026-05-21 17:38:19 -07:00
xxxigm
e02a7e5e1c fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015)
When the active main model has no vision capability — or when the user
explicitly configured auxiliary.vision in config.yaml — sending the
captured screenshot back to the main model in a multimodal tool-result
envelope is the wrong move: it trips HTTP 404 / 400 at the provider
boundary (e.g. 'No endpoints found that support image input') and the
agent loop reports a hard tool failure for what should have been a
simple capture.

The reporter on #24015 hit this with:

  model:
    default: tencent/hy3-preview      # no vision support
    provider: openrouter
  auxiliary:
    vision:
      provider: openrouter
      model: google/gemini-2.5-flash  # explicitly configured

…and observed:

  computer_use(action='capture', mode='som')
  → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
     🔌 Provider: openrouter  Model: tencent/hy3-preview
     📝 Error: HTTP 404: No endpoints found that support image input

Fix: in tools/computer_use/tool.py::_capture_response, after a
screenshot is captured (modes 'som' / 'vision'), consult the routing
helper introduced earlier in this branch. When it says 'route to aux',
materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze
on it (which honours auxiliary.vision via the standard async_call_llm
task='vision' router), and return a text-only JSON tool result that
embeds the analysis alongside the existing AX/SOM index. The main
model never sees the pixels — it sees an actionable text description
plus the same set-of-mark element index it normally uses.

The two new helpers (_should_route_through_aux_vision,
_route_capture_through_aux_vision) keep the policy and the IO
separated so each can be tested in isolation. Both fail open: if the
config import fails, if the aux call raises, or if the analysis is
empty, we fall back to the existing multimodal envelope so the
behaviour is at worst the pre-fix status quo. Temp screenshot files
are cleaned up unconditionally in a finally block — even on aux call
failure — to avoid leaving residue under cache/vision/.

The end-to-end regression for #24015 is added in the next commit.
2026-05-21 17:38:19 -07:00
xxxigm
5ce5fe3181 test(computer_use): cover capture vision-routing helper
Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests
that pin the contract of the new vision-routing helper introduced in
the previous commit:

  * TestExplicitAuxVisionOverride (12 cases): mirror the
    auxiliary.vision detection rules used by agent.image_routing so
    the capture path and the user-attached-image path agree on what
    counts as an explicit override (provider/model/base_url with
    non-blank, non-'auto' values).
  * TestRouteDecision (7 cases): pin the policy itself — explicit
    override always wins, vision-capable + native-tool-result keeps
    multimodal, everything else fails closed and routes to aux.
  * TestLookupHelpers (5 cases): defensive paths for the models.dev /
    tool-result-support lookups (blank inputs, exceptions, missing
    caps).
  * TestModuleSurface (4 cases): pin the public/__all__ surface and
    keep internal helpers addressable so the integration test in the
    next commit can monkeypatch them deterministically.

Run with:
  scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py
2026-05-21 17:38:19 -07:00
xxxigm
531efe7208 fix(computer_use): add helper to decide capture vision routing
Add tools/computer_use/vision_routing.py with
should_route_capture_to_aux_vision(provider, model, cfg) — a small
policy helper that decides whether a captured screenshot should be
returned as a multimodal envelope (main model has native vision) or
pre-analysed through the auxiliary.vision pipeline so the main model
only sees text.

The decision mirrors agent.image_routing.decide_image_input_mode for
user-attached images, so the capture path and the user-turn path agree
on what counts as an explicit aux vision override:
  * provider/model/base_url under auxiliary.vision => explicit override
    => route through aux vision
  * provider+model accepts multimodal tool results AND main model
    reports supports_vision=True => keep multimodal envelope
  * everything else (no tool-result image support, non-vision model,
    metadata lookup failure) => fail closed and route through aux

No call sites are changed in this commit; the helper is added in
isolation so the routing decision can be unit-tested before it is
plumbed into _capture_response().
2026-05-21 17:38:19 -07:00
Teknium
2a474bcf72 fix(termux): resolve packed-refs and worktree refs in skill-sync fingerprint
The bundled-skill sync stamp added in the cherry-picked salvage commit
parsed .git/HEAD and looked for a loose ref file in the worktree gitdir
only, so two real cases hit the unresolved branch:

- repos after `git gc` where active refs live in packed-refs
- linked worktrees, whose branch ref lives in <commondir>/refs/heads/
  (verified on the worktree this salvage was built in)

Both fell back to a constant-string fingerprint, so post-commit launches
would never re-run the real skill sync. Now we resolve packed-refs and
check both the worktree gitdir and the common dir for loose refs.

Adds three tests covering: packed-refs resolution, worktree common-dir
packed lookup, worktree common-dir loose lookup, and the explicit
'unresolved' marker (still stable + version-fallback-safe).
2026-05-21 17:19:05 -07:00
adybag14-cyber
6dbbf20ff4 perf(termux): speed up non-tui cli startup 2026-05-21 17:19:05 -07:00
briandevans
5aa4727f34 fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1)
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.

Fix:

- `capture(app=…)`: when the filter matches nothing, return a
  `CaptureResult` with empty `app`/`elements` and a diagnostic
  `window_title` pointing the caller at `list_apps` and noting the
  localized-name convention. `_active_pid` / `_active_window_id` are left
  untouched so a subsequent action doesn't inadvertently hit the wrong
  process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
  and let the existing `return ActionResult(ok=False, …, "No on-screen
  window found for app …")` path fire instead of falsely reporting success
  on the frontmost window.

This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046;
bugs 3 & 4 in #30032.
2026-05-21 17:15:35 -07:00
Bartok9
4cc18877c6 fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5)
Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.

Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic '  - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.

Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.

focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.

5 new regression tests added.

Part of #24170 (bugs 1 and 3/4 addressed separately).
2026-05-21 14:19:09 -07:00