When the active main model has no vision capability — or when the user
explicitly configured auxiliary.vision in config.yaml — sending the
captured screenshot back to the main model in a multimodal tool-result
envelope is the wrong move: it trips HTTP 404 / 400 at the provider
boundary (e.g. 'No endpoints found that support image input') and the
agent loop reports a hard tool failure for what should have been a
simple capture.
The reporter on #24015 hit this with:
model:
default: tencent/hy3-preview # no vision support
provider: openrouter
auxiliary:
vision:
provider: openrouter
model: google/gemini-2.5-flash # explicitly configured
…and observed:
computer_use(action='capture', mode='som')
→ ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
🔌 Provider: openrouter Model: tencent/hy3-preview
📝 Error: HTTP 404: No endpoints found that support image input
Fix: in tools/computer_use/tool.py::_capture_response, after a
screenshot is captured (modes 'som' / 'vision'), consult the routing
helper introduced earlier in this branch. When it says 'route to aux',
materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze
on it (which honours auxiliary.vision via the standard async_call_llm
task='vision' router), and return a text-only JSON tool result that
embeds the analysis alongside the existing AX/SOM index. The main
model never sees the pixels — it sees an actionable text description
plus the same set-of-mark element index it normally uses.
The two new helpers (_should_route_through_aux_vision,
_route_capture_through_aux_vision) keep the policy and the IO
separated so each can be tested in isolation. Both fail open: if the
config import fails, if the aux call raises, or if the analysis is
empty, we fall back to the existing multimodal envelope so the
behaviour is at worst the pre-fix status quo. Temp screenshot files
are cleaned up unconditionally in a finally block — even on aux call
failure — to avoid leaving residue under cache/vision/.
The end-to-end regression for #24015 is added in the next commit.
Add tools/computer_use/vision_routing.py with
should_route_capture_to_aux_vision(provider, model, cfg) — a small
policy helper that decides whether a captured screenshot should be
returned as a multimodal envelope (main model has native vision) or
pre-analysed through the auxiliary.vision pipeline so the main model
only sees text.
The decision mirrors agent.image_routing.decide_image_input_mode for
user-attached images, so the capture path and the user-turn path agree
on what counts as an explicit aux vision override:
* provider/model/base_url under auxiliary.vision => explicit override
=> route through aux vision
* provider+model accepts multimodal tool results AND main model
reports supports_vision=True => keep multimodal envelope
* everything else (no tool-result image support, non-vision model,
metadata lookup failure) => fail closed and route through aux
No call sites are changed in this commit; the helper is added in
isolation so the routing decision can be unit-tested before it is
plumbed into _capture_response().
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.
The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.
Fix:
- `capture(app=…)`: when the filter matches nothing, return a
`CaptureResult` with empty `app`/`elements` and a diagnostic
`window_title` pointing the caller at `list_apps` and noting the
localized-name convention. `_active_pid` / `_active_window_id` are left
untouched so a subsequent action doesn't inadvertently hit the wrong
process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
and let the existing `return ActionResult(ok=False, …, "No on-screen
window found for app …")` path fire instead of falsely reporting success
on the frontmost window.
This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046;
bugs 3 & 4 in #30032.
Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.
Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic ' - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.
Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.
focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.
5 new regression tests added.
Part of #24170 (bugs 1 and 3/4 addressed separately).
* fix(skills): skip dependency dirs in skill scan
* fix(skills): widen sibling rglob scanners to use shared exclusion set
Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.
Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.
Sites touched:
agent/curator_backup.py - skill backup file count
gateway/run.py - disabled-skill response (2 sites)
hermes_cli/dump.py - skill count in env dump
hermes_cli/profile_describer.py- profile description (2 sites)
hermes_cli/profile_distribution.py - profile install count
hermes_cli/profiles.py - profile skill count
hermes_cli/skills_hub.py - category detection
tools/skill_manager_tool.py - skill name lookup (already used set, now uses helper)
tools/skill_usage.py - usage tracking + skill dir lookup (2 sites)
tools/skills_hub.py - optional skills find + scan (2 sites)
tools/skills_sync.py - bundled skills sync
E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.
* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep
---------
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars'
which does not exist in current cua-driver. Changed to 'type_text' which is
the correct MCP tool name.
Bug 4: The drag() method returned a hardcoded 'not supported' error even
though cua-driver exposes a 'drag' MCP tool. Implemented proper drag
dispatching with coordinate-based and element-based targeting.
Added dispatch-level validation for drag to ensure from/to coordinates
or elements are provided before calling any backend.
Fixes#24170 (bugs 3 and 4)
Five call sites do os.chmod(path.parent, 0o700) without checking that
the parent resolves to a safe directory. If HERMES_HOME or another
path env var resolves to /, the chmod strips traversal permission from
the root inode and bricks the entire host.
Add secure_parent_dir() to hermes_constants.py that refuses to chmod
/ or any top-level directory (depth < 2). Replace all 5 call sites
with this helper.
Fixes#25821
Sibling fix on top of @EloquentBrush0x's PR #29441.
- tools/skills_hub.py GitHubSource.search() had the same r.name dedup bug.
Two configured GitHub taps publishing same-named skills would collapse to one.
- tests/hermes_cli/test_skills_hub.py:test_browse_skills_dedup_uses_identifier_not_name
patched hermes_cli.skills_hub.create_source_router, but browse_skills() imports
it locally from tools.skills_hub. Fixed patch path.
Browse.sh exposes skills by task name (e.g. "search-listings"), which is
shared across hundreds of sites. Deduplicating by name silently dropped
every browse-sh skill after the first one with a given task name — e.g.
only Airbnb's "search-listings" would survive, collapsing Booking.com,
Zillow, and every other site's variant into nothing.
Switch unified_search() and do_browse() to use r.identifier as the dedup
key. identifier is always globally unique (e.g.
"browse-sh/airbnb.com/search-listings-ddgioa"), so same-named skills from
different browse-sh hostnames are preserved as distinct results.
Update existing TestUnifiedSearchDedup tests to model the real scenario
(same identifier appearing from two sources) and add a regression test
that asserts browse-sh skills with the same name but different hostnames
are never collapsed.
The xAI Responses API for x_search returns 200 OK with a
synthesized fluff answer in two failure modes that callers currently
cannot distinguish from a real, citation-backed result:
1. Any narrowing filter (allowed_x_handles, excluded_x_handles,
from_date, to_date) was active, but the X index returned no
matching posts. The model then answers from training data.
2. The date range is malformed, inverted, or pure-future (e.g.
from_date=2030-01-01). The API call burns quota and Grok
responds with a generic answer.
Mitigations, both client-side:
* Validate from_date / to_date before the HTTP call:
- Strict YYYY-MM-DD.
- from_date <= to_date when both set.
- from_date <= today UTC (no posts in a window that hasn't
started). to_date in the future remains allowed so callers
can request 'from yesterday to tomorrow'.
* Add 'degraded' + 'degraded_reason' to successful responses.
degraded=True iff any narrowing filter was active AND both the
top-level 'citations' array and inline 'url_citation'
annotations came back empty. A broad query with no filters that
returns no citations is *not* flagged degraded — that case is
just an unsourced answer, not a filter miss.
Tests cover all four validation paths plus six degraded-flag
scenarios (each filter type, inline vs top-level citation
recovery, broad query baseline). All existing tests continue to
pass; the additions are purely additive on the success-path
response shape.
Discovered while testing the x_search toolset end-to-end:
queries scoped to @Teknium1 returned confident-sounding generic
text about Nous Research with zero citations, and from_date in
2030 produced sassy non-answers. Both are now detectable by the
caller.
* fix(lint): skip per-file shell linter when LSP will handle the file
`_check_lint` ran `npx tsc --noEmit FILE.ts` after every `.ts`/`.tsx`
edit. `tsc` ignores `tsconfig.json` when given an explicit file argument
(documented quirk) and defaults to no-lib / ES5, so every ES2015+ stdlib
reference reports as missing:
- `Cannot find global value 'Promise'`
- `Cannot find name 'Map' / 'Set' / 'ReadonlySet' / 'Iterable'`
- `Property 'isFinite' does not exist on type 'NumberConstructor'`
- `Module 'phaser' can only be default-imported using esModuleInterop`
- `import.meta is only allowed when --module is es2020+`
On real TypeScript projects this floods the `lint` field on
WriteResult / PatchResult with up to 25K tokens of false positives
per edit. The delta filter in `_check_lint_delta` is supposed to mask
them, but a tiny edit shifts line numbers and every phantom resurfaces
as "introduced by this edit". The result is a 1MB+ phantom-error dump
on every patch that eats the agent's context budget. Same shape for
`.go` (`go vet` outside a module) and `.rs` (`rustfmt --check` outside
a Cargo project).
PR #24168 added an LSP tier on top of this — real `tsserver` / `gopls`
/ `rust-analyzer` diagnostics surface in the separate `lsp_diagnostics`
field. But the broken shell linter kept running underneath, so the
phantom-error dump kept happening even when LSP was giving us a clean
authoritative signal.
This change short-circuits the shell linter for the structurally-broken
extensions (`.ts`, `.tsx`, `.go`, `.rs`) when an LSP server is active
and claims the file via `LSPService.enabled_for(path)`. The LSP tier
runs as before and carries the real diagnostics in `lsp_diagnostics`.
Other shell linters (`py_compile`, `node --check`) keep running
unconditionally — they're fast, file-local, and correct.
Default behavior (LSP disabled, LSP misconfigured, remote backend, file
outside a workspace) is unchanged — the existing fallback paths trigger
when `_lsp_will_handle` returns False, so users who haven't opted into
LSP get the same shell-linter behavior they had before.
Drive-by: `.tsx` was missing from the `LINTERS` table entirely, so TS
React files got no post-edit syntax check at all. Added it for
symmetry; in practice it now hits the LSP-skip path.
Tests:
- `tests/agent/lsp/test_shell_linter_lsp_skip.py` — 14 tests covering:
* skip happens for each redundant extension when LSP claims the file
(asserted by patching `_exec` to raise on any shell-linter call)
* shell linter still runs when LSP is inactive (regression guard)
* `.py` / `.js` continue to run unconditionally even with LSP active
* `_lsp_will_handle` is exception-safe: returns False on None
service, remote backend, or `enabled_for` raising
* `.tsx` is in both `LINTERS` and `_SHELL_LINTER_LSP_REDUNDANT`
- All pre-existing tests in `tests/agent/lsp/` and
`tests/tools/test_file_operations*.py` still pass (233/233).
* fix(lint): address Copilot review on #29054
Two fixes from copilot-pull-request-reviewer on PR #29054:
1. `.tsx` regression with LSP disabled
(https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017282)
The first revision added `.tsx` to the `LINTERS` table so that
TypeScript React files would hit the LSP skip path. Side effect:
when LSP is *disabled* (the default), `.tsx` edits would suddenly
run `npx tsc --noEmit FILE.tsx` and inherit the same phantom-error
dump this PR is supposed to fix. Pre-PR behavior was implicit
`skipped` (no `LINTERS` entry); restore that.
- Remove `.tsx` from `LINTERS`.
- Remove `.tsx` from `_SHELL_LINTER_LSP_REDUNDANT` (the skip path
is unreachable without a `LINTERS` entry — falls through to
`ext not in LINTERS` first).
- When LSP IS enabled, `.tsx` is still covered by the LSP tier
via `_maybe_lsp_diagnostics` (typescript-language-server's
`extensions` tuple includes `.tsx`), so the diagnostics still
surface — just on the `lsp_diagnostics` channel, not `lint`.
- Update test_shell_linter_lsp_skip.py to reflect this contract
(drop `.tsx` from the parametrize lists; add
`test_tsx_stays_out_of_linters_table_for_default_compatibility`
and `test_tsx_default_check_lint_returns_skipped`).
2. V4A patches dropped `WriteResult.lsp_diagnostics`
(https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017295)
`tools/patch_parser.py::apply_v4a_operations` calls
`file_ops.write_file()` per operation, then calls `_check_lint()`
directly afterwards — but never propagates `WriteResult.lsp_diagnostics`
to the `PatchResult`. The shell-linter skip introduced in this PR
makes the gap visible: a `.ts` / `.go` / `.rs` V4A patch with LSP
active would return `lint = {f: {skipped: True}}` and zero
diagnostics from any channel.
- `_apply_add` and `_apply_update` now return
`Tuple[bool, str, Optional[str]]` where the third element is
`WriteResult.lsp_diagnostics` (or `None` on failure / no diags).
- `_apply_delete` and `_apply_move` stay 2-tuples — they don't
produce diagnostics, no write goes through `write_file`.
- `apply_v4a_operations` accumulates per-file diagnostics blocks
and surfaces a combined block on `PatchResult.lsp_diagnostics`.
Each block already carries its `<diagnostics file="...">` header
from `LSPService.report_for_file`, so concatenation preserves
per-file attribution.
Tests added (`test_patch_parser.py::TestV4ALspDiagnosticsPropagation`):
- ADD op: `WriteResult.lsp_diagnostics` flows to `PatchResult`
- UPDATE op: same
- No diagnostics → `PatchResult.lsp_diagnostics is None` (not "")
- Multi-file patch: combined block contains every per-file block
Verification:
- Targeted test scope: 257/257 pass
(tests/agent/lsp/, tests/tools/test_file_operations*.py,
tests/tools/test_patch_parser.py)
- Wider sweep: 5400 pass; 11 failures all pre-existing on origin/main
(file_staleness / file_read_guards / file_state_registry — unrelated
macOS /var/folders tmp-path sensitivity issues, confirmed by
re-running on a clean origin/main checkout)
* docs(test): align shell-linter LSP skip docstring with .tsx behavior
Copilot review feedback (review #4324947616, comment #3271049036):
the test module docstring still listed .tsx alongside .ts/.go/.rs in
the skip contract, but .tsx is now intentionally NOT in LINTERS or
_SHELL_LINTER_LSP_REDUNDANT. Updated the bullet list to drop .tsx from
the skip contract and added a paragraph documenting why .tsx is left
out (preserves pre-PR implicit-skip behavior for LSP-disabled users;
LSP coverage still happens via _maybe_lsp_diagnostics).
* test(lsp): drop unused tmp_path from _make_fops helper
Copilot review #3271069484: the helper accepted tmp_path but never
used it. Callers still need tmp_path themselves for the file they're
asserting against, so we just drop the helper's parameter.
Add browser CDP launch candidates for Chrome, Chromium, Brave, and Edge while preserving Chrome-first selection. Retry candidate launch failures instead of giving up after the first executable.
Update /browser CLI and TUI messaging, docs, and tool descriptions from Chrome-only wording to Chromium-family browser support. Add regression coverage for Brave/Edge paths, Chrome-first precedence, fallback launches, and CDP endpoint probing.
Commits 8bf09455d (Grogger, explicit creationflags=) and 95683c028
(nekwo, **_popen_kwargs via windows_hide_flags()) landed 77 minutes
apart and both injected creationflags into the same subprocess.Popen
call. nekwo's commit correctly replaced the explicit line in
tools/process_registry.py but only added the kwargs spread in
tools/environments/local.py -- leaving creationflags specified twice.
Result on Windows: every LocalEnvironment.init_session() raised
"subprocess.Popen() got multiple values for keyword argument
'creationflags'" and fell back to bash -l per command (much slower --
bashrc runs on every shell invocation).
Drop the explicit line so **_popen_kwargs is the single source.
`_wait_for_process()` was sleeping for a fixed 200ms between polls of
the subprocess exit status. For commands that complete in <50ms (echo,
pwd, date, cat short files, write_file with small content, read_file
with small content), the agent was stuck waiting for the next 200ms
tick to notice the process had exited. That floor was the dominant
component of per-tool latency for typical short commands.
Replace with adaptive backoff: start at 5ms, multiply by 1.5 each
iteration up to 200ms. Fast commands (the common case) return in
~6ms; long-running commands (builds, tests, sleeps) reach the 200ms
steady-state poll rate within ~12 iterations (~150ms total) and pay
identical CPU after that.
Tool-call wall time (deterministic microbench of `echo first`):
before: median 200ms min 200ms max 200ms
after: median 5ms min 5ms max 7ms
saved: ~195ms per terminal tool call
End-to-end chat -q with 3 sequential terminal tool calls
(`echo first`, `echo second`, `echo third`):
before: median 5.73s, min 5.61s
after: median 4.64s, min 4.60s
saved: ~1100ms wall per turn
Live tmux session: a typical 'write file, read it back' turn now
displays each tool as 0.1s in the spinner (was 0.9s before). The
agent observes the subprocess exit ~200ms faster per call. For chat
workflows that do 4-8 terminal/file calls per turn this saves
800ms-1.5s of pure wall-clock waiting.
Why it's safe:
- Interrupt and timeout checks still fire on every iteration (no
longer rate-limited to 5/sec)
- Activity callback fires on the same 'due' schedule (`touch_activity_if_due`)
- DEBUG_INTERRUPT heartbeat is unchanged (30s)
- Steady-state poll rate for long-running commands matches the old
200ms within ~150ms of startup
Tests:
- tests/tools/ — 5246 passed, 22 skipped, 2 pre-existing xdist flakes
(test_delegate.py::test_depth_limit, test_constants — pass in isolation)
- Live tmux: 2-turn conversation + multiple tool calls, no errors
Adds a new bundled web search provider plugin backed by xAI's agentic
Web Search tool (server-side `web_search` on the Responses API). Slots
in alongside the existing Firecrawl / Tavily / Exa / Brave / SearXNG /
DDGS providers; opt in via `web.backend: xai` (or auto-selected by the
registry's single-provider shortcut when it's the only available web
provider, matching every other backend's behavior).
Reuses the existing xAI HTTP credential plumbing (`tools/xai_http.py`)
so it works with both `hermes auth login xai-oauth` (SuperGrok OAuth)
and `XAI_API_KEY` — no new credential paths, no new env vars, no new
setup-wizard prompts. The existing `xai_grok` post_setup hook handles
credential collection.
Reference: https://docs.x.ai/developers/tools/web-search
Provider behavior
-----------------
- Sends a structured prompt to Grok with `tools=[{"type": "web_search"}]`
enabled and `include=["no_inline_citations"]`, then parses results
from a `{"results": [...]}` JSON block (primary), falling back to
`url_citation` annotations (secondary) and the top-level `citations`
list (last-ditch). Annotation fallback falls through to citations
when no rows are extractable, so future annotation types xAI may
add don't silently mask real data.
- HTTP 200 + `{"error": {...}}` envelopes (model-overload, refusal)
are surfaced as failures rather than masked as success-with-empty-
results.
- HTTP 401 on the OAuth path triggers a single `force_refresh=True`
retry — closes two gaps the resolver's proactive JWT-exp shortcut
doesn't cover: opaque (non-JWT) access tokens and mid-window
revocation. Env-var (`XAI_API_KEY`) credentials never retry; they
can't be refreshed and an immediate retry would just burn quota.
- `is_available()` is a cheap probe (env var OR auth.json read), never
invokes the OAuth resolver — required by the ABC contract because
it runs on every `hermes tools` repaint and at tool-registration time.
- Class docstring documents the LLM-in-a-trench-coat trust model so
callers piping untrusted input into `web_search` know returned URLs
are model-generated and should be validated before fetching.
Config (`config.yaml`):
web:
backend: xai
xai:
model: grok-4.3 # optional, defaults to grok-4.3
allowed_domains: # optional, max 5 — mutex with excluded_domains
- arxiv.org
excluded_domains: # optional, max 5
- example-spam.com
timeout: 90 # optional, seconds
Files
-----
- plugins/web/xai/plugin.yaml (new) plugin manifest
- plugins/web/xai/__init__.py (new) register(ctx) hook
- plugins/web/xai/provider.py (new) XAIWebSearchProvider impl
- tools/xai_http.py (+47) has_xai_credentials()
cheap-probe helper +
keyword-only force_refresh
arg on resolve_xai_http_
credentials() (backwards
compatible; all 9 other
call sites unaffected)
- tools/web_tools.py (+11) "xai" added to configured-
backend set + branch in
_is_backend_available()
- tests/tools/test_web_providers_xai.py (new, 39 tests) covers
identity, cheap-probe semantics,
JSON / annotation / citations
parse paths, request payload
shape, error envelopes, OAuth
force-refresh-on-401 retry,
env-var-no-retry guard, 500-not-
retried guard, refresh-returns-
same-token guard, OAuth runtime
resolution, and backend wiring.
Tests
-----
- 39 xai-suite passes
- 79 sibling web-provider tests (brave-free, ddgs, searxng, base) pass
- 119 cross-suite tests for other xai_http callers (transcription,
x_search, tts) pass — verifies the new keyword-only arg is BC
- scripts/check-windows-footguns.py: clean on all 5 modified files
No edits to run_agent.py, cli.py, gateway/, toolsets, config schema,
plugin core, or auth core.
The catalog's sourceUrl points at github.com/browserbase/browse.sh,
whose underlying repository is not always public — most raw URLs derived
from it 404. Use the per-skill detail endpoint instead, which returns a
skillMdUrl CDN blob that reliably resolves to the SKILL.md text. Fall
back to a raw.githubusercontent.com sourceUrl if the detail call fails.
- tools/skills_hub.py: rewrite BrowseShSource.fetch() to resolve via
/api/skills/{slug} -> skillMdUrl; drop the unreachable _to_raw_url
helper; expose the resolved URL in bundle.metadata.skill_md_url.
- tests/tools/test_skills_hub_browse_sh.py: match the real catalog
shape (name = task name, slug = host/task-id), exercise the
detail-endpoint -> blob two-call flow, and add a fallback test.
- scripts/release.py: map kylejeong21@gmail.com -> Kylejeong2.
Adds BrowseShSource — a new skill source adapter that integrates
Browserbase's browse.sh catalog (169+ site-specific SKILL.md files)
into the Hermes Skills Hub.
- BrowseShSource class in tools/skills_hub.py implementing SkillSource ABC
- Fetches browse.sh catalog API with 1h TTL cache
- Full-text search across name, title, description, hostname, category, tags
- fetch() downloads SKILL.md via sourceUrl (GitHub HTML -> raw URL conversion)
- Registered in create_source_router() after LobeHubSource
- Tests in tests/tools/test_skills_hub_browse_sh.py (7 tests, all passing)
Apply Windows CREATE_NO_WINDOW flags to foreground local terminal subprocesses and tracked background processes so Hermes operations do not flash or steal focus with extra console windows.
When 'hermes update' syncs bundled skills, the summary line only shows
the count of user-modified skills that were kept (e.g. '3 user-modified
(kept)'), but not *which* skills. Once the update finishes, the user
has no way to know which skills need triage.
Append the skill names to the summary line, truncated to 5 with a
'+N more' suffix for long lists:
Done: 12 new, 3 updated, 7 unchanged, 3 user-modified (kept):
hermes-agent, debugging-hermes-tui-commands, system-health.
25 total bundled.
Closes#28121
Sweep of all CI failures on origin/main, grouped by drift source:
Telegram allowlist gate (db50af910 added user-authz to _should_process_message):
- Hardcoded "[Telegram]" prefix in the logger.warning so the call no
longer dereferences self.name → self.platform, which test fixtures
built via object.__new__ never set.
- test_telegram_format / test_allowed_channels_widening fixtures stub
_is_callback_user_authorized → True so the new gate doesn't reject
guest-mode / allowed-channels test messages.
- test_telegram_approval_buttons::test_update_prompt_callback_not_affected
sets TELEGRAM_ALLOWED_USERS="*" so the fail-closed default doesn't
reject the callback before it writes .update_response.
Approval surface (6d495d9e7 renamed status, 214b95392 detached stdin):
- test_no_callback_returns_approval_required: status is now
"pending_approval" (was "approval_required").
- test_close_stdin_allows_eof_driven_process_to_finish: switch to
use_pty=True; non-PTY now uses stdin=DEVNULL.
Mattermost (send() now resolves root_id via _api_get first):
- test_send_with_thread_reply mocks _session.get with a thread-root
response so the new resolver doesn't TypeError on a bare AsyncMock.
Kanban (d8ad431de rename, f55d94a1e review column, _kanban_worker_skill_available):
- _safe_int → _to_epoch in the two test_kanban_db tests.
- Spawn-skills tests (×3) monkey-patch _kanban_worker_skill_available
to True since the isolated kanban_home fixture has no devops/kanban-worker tree.
- test_gateway_dispatcher_disables_corrupt_board: connect count
3 → 5 (review-column probe now also runs per tick).
Aux-config severity at_or_above (a94ddd807):
- test_diagnostics_endpoint_severity_filter expects warning filter to
include error+critical now (was exact-match).
Anthropic error handling (conversation loop extracted from run_agent):
- _no_backoff_wait fixture patches BOTH run_agent.jittered_backoff AND
agent.conversation_loop.jittered_backoff. The latter is the actual
call site; without the second patch tests burn ~2s per retry and
hit the 30s SIGALRM timeout on CI.
Other test pollution / drift:
- test_auto_does_not_select_copilot_from_github_token: patch
agent.bedrock_adapter.has_aws_credentials → False so boto3's
credential chain can't auto-pick Bedrock from developer ~/.aws.
- test_setup_openclaw_migration: patch hermes_cli.gateway.get_env_value
in addition to setup_mod.get_env_value — _platform_status reads
through the gateway module's binding.
- test_gateway_prefix: COMPONENT_PREFIXES["gateway"] now includes
"hermes_plugins" too.
- test_recommended_update_command_defaults_to_hermes_update: also
short-circuit get_managed_update_command in case a stray
~/.hermes/.managed marker is present.
- test_user_id_is_not_explicit: _parse_target_ref now returns
is_explicit=False for Slack U.../W... IDs (chat.postMessage rejects
them — a DM must be opened first via conversations.open).
1. trajectory_compressor.py: yaml.safe_load() returns None on empty
files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
adding `or {}` fallback. (HIGH — blocks startup with empty config)
2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
try/except: cron/scheduler.py, hermes_cli/auth.py,
agent/shell_hooks.py, tools/skill_usage.py,
tools/environments/file_sync.py, tools/memory_tool.py. If unlock
raises OSError, fd.close() is skipped and the lock is held forever.
The msvcrt branches already had try/except; the fcntl branches did
not. Fix by wrapping in try/except (OSError, IOError): pass.
3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
followed by path.read_text() with no try/except. If file is deleted
between the check and the read, FileNotFoundError propagates. Fix
by using try/except FileNotFoundError.
4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
can leave truncated JSON on crash, causing JSONDecodeError on next
load. Fix by writing to tempfile + fsync + os.replace (atomic).
The standalone _send_telegram path in send_message_tool lacked the
thread-not-found fallback that the gateway adapter has. When a forum
topic thread_id was stale or deleted, the send would fail entirely
instead of retrying to the General topic.
Changes:
- Add _is_telegram_thread_not_found() helper matching gateway adapter
- Add thread-not-found retry in text send path
- Add thread-not-found retry in media send path (with f.seek(0))
- Separate text_kwargs from thread_kwargs to prevent
disable_web_page_preview leaking into send_photo/send_video calls
Closes#27012
When sending messages containing @username patterns, auto-generate
MessageEntity(type='mention') entries so that the receiving bot's
require_mention filter can trigger. This enables proper bot-to-bot
interop where mention-based routing is used.
Background-process completion notifications (notify_on_complete) and
watch-pattern notifications were always delivered to the Telegram main
chat instead of the originating private-chat topic.
Hermes-created Telegram DM topic lanes only render a send when it carries
both message_thread_id and a reply anchor. The synthetic MessageEvent
injected on process completion had no message_id, so _reply_anchor_for_event
returned None and _thread_kwargs_for_send dropped message_thread_id
entirely — routing the notification to the main chat.
Capture the triggering message id at spawn time and thread it through to
the synthetic event so it can be reply-anchored back into the topic:
- session_context: add HERMES_SESSION_MESSAGE_ID context var
- telegram adapter: populate SessionSource.message_id on inbound messages
- terminal tool: persist watcher_message_id on the process session
- process registry: carry/persist message_id on watcher dicts + checkpoint
- gateway: set MessageEvent.message_id on injected notifications
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the send_message tool runs outside the gateway process (agent loop,
TUI, cron, etc.), _gateway_runner_ref() returns None and the standalone
path in _send_telegram constructs Bot(token=token) directly, bypassing
any configured proxy. In regions where api.telegram.org is blocked, the
send times out after ~5s with 'Telegram send failed: Timed out' and
nothing ever shows up in gateway.log because the request never reaches
the gateway.
Resolve TELEGRAM_PROXY (via gateway.platforms.base.resolve_proxy_url,
which also honours HTTPS_PROXY/HTTP_PROXY/ALL_PROXY and NO_PROXY) just
before constructing the Bot. When a proxy is found, attach an
HTTPXRequest(proxy=...) for both 'request' and 'get_updates_request',
matching what gateway/platforms/telegram.py already does for in-gateway
sends and what the Discord standalone sender already does. Any
exception attaching the proxy falls back cleanly to a direct connection,
preserving prior behaviour for users without a proxy configured.
Adds tests/tools/test_send_message_telegram_proxy.py covering both the
proxy-configured and no-proxy cases.
Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.
- Schema: tasks gains nullable session_id TEXT column with index
(additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
session_id.
Salvages #26897 by @loicnico96. The per-task model_override DB column
already exists on main, but it wasn't exposed in user-facing surfaces.
This adds:
- 'kanban show' prints 'model: <name>' when model_override is set
- kanban_show / kanban_list tool responses include the model_override field
Original branch was stale (PR was authored against an older field name
'model'); applied the substantive surface exposure manually using the
current 'model_override' field name.
Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current
tool registration: dispatcher-spawned task workers get task tools,
profiles that explicitly enable the kanban toolset get orchestrator
routing tools (kanban_list, kanban_unblock). Corrects failure-limit
text to current default of 2. Hardens the e2e subprocess script to
resolve repo root and use the spawnable default assignee. Updates the
diagnostics severity fixture to assert error below the critical
threshold.
Salvages #23738 by @LeonSGP43. Wheel installs were missing skills/ and
optional-skills/ because pyproject's [tool.setuptools.packages.find]
only includes Python packages — the skills directories don't have
__init__.py so they were silently dropped from the wheel.
Adds setup.py with data_files spec emitting skills/* and optional-skills/*
under hermes_agent-<v>.data/data/, and a get_bundled_skills_dir() helper
in hermes_constants that discovers the wheel-installed location via
sysconfig before falling back to a source-checkout path. tools/skills_sync
uses the helper so 'hermes update' works for pip-installed users.
Salvages #27526 by @shunsuke-hikiyama. Adds an --initial-status flag
(running|blocked, default running) to 'kanban create', threaded through
kanban_db.create_task() and the kanban_create tool schema. 'blocked'
parks the task directly in the blocked column for R3 human-ops review,
skipping the brief running-to-blocked transition.
Dropped the unrelated 'add' alias, WIFEXITED Windows compat, and
slash-handler error formatting changes that were bundled in the
original PR — those should ship as their own focused changes if still
wanted.
Salvages #25579 by @wesleysimplicio. Stamps task_runs.metadata.worker_session_id
from HERMES_SESSION_ID on kanban_complete. Cherry-picked the substantive
commit (not the AUTHOR_MAP fixup tip) onto current main.
The _SLACK_TARGET_RE regex only matched IDs starting with C (channel),
G (group), or D (direct message). Slack user IDs start with U, causing
'Could not resolve' errors when trying to send DMs to specific users.
Changes:
- Expand _SLACK_TARGET_RE to accept U-prefixed IDs (user IDs)
- Add conversations.open fallback to resolve user IDs to DM channel
IDs before sending, since chat.postMessage requires a conversation ID
Fixes #ISSUE_NUMBER
When a tool call requires user approval in the non-blocking gateway path,
the LLM previously received a result that was indistinguishable from a
failed tool call (exit_code=-1, error=message). The LLM could not tell
whether the tool was pending approval, had returned empty results, or had
failed silently — causing it to burn context on wrong hypotheses.
Fix changes the result format to include:
- status: pending_approval (clear state name)
- approval_pending: True (explicit boolean for LLMs to detect)
- error: cleared to empty string (removes misleading error signal)
This lets the LLM reason about approval latency vs actual errors,
short-circuiting the previous silent failure mode.
Fixes#14806
Background process non-PTY path used stdin=subprocess.PIPE unconditionally,
creating an orphan pipe that was never written to and never closed. Child
processes that read stdin would block indefinitely, competing with the
parent's prompt_toolkit event loop for terminal ownership and causing
complete keyboard lockout.
Change to stdin=subprocess.DEVNULL so children get immediate EOF on stdin
reads instead of blocking forever. For interactive stdin, the PTY path
(which has its own independent PTY via ptyprocess.PtyProcess.spawn) should
be used instead.
Fixes#17959
xAI's /v1/responses and /v1/chat/completions endpoints reject tool schemas
whose enum values contain a forward slash with a generic HTTP 400 'Invalid
arguments passed to the model.' before any token is emitted — the schema
compiler trips on the '/' character regardless of where it appears.
Most commonly hit by MCP-derived tools whose enum lists HuggingFace model
IDs ('Qwen/Qwen3.5-0.8B', 'openai/gpt-oss-20b') or owner/name environment
identifiers.
Mirrors the existing strip_pattern_and_format sanitizer (PR for #27197).
The new strip_slash_enum walks tool parameters and drops the entire enum
keyword when any value contains '/' — keeping it partial would still 400
since xAI's failure is all-or-nothing on the enum. The field description
still reaches the model so the prompting hint is preserved.
Wired in at both code paths for parity:
- agent/chat_completion_helpers.py (main agent xAI Responses path)
- agent/auxiliary_client.py (aux client xAI Responses path, matching
the same parity guarantee 2fae8fba9 established for pattern/format)
Salvaged from #28021 by @Slimydog21 — contributor's branch was severely
stale (would have reverted ~5000 LOC across azure/kanban/i18n); fix
re-applied surgically on current main with their sanitizer + 9 tests
preserved verbatim. Author noreply email used (original was a Mac
hostname leak).
Tirith flags .app domains with a lookalike_tld finding because the TLD
"can be confused with file extensions". This is a false positive for
legitimate production APIs (e.g. api.example.app, lark.app).
Add _is_app_tld_finding() and a post-parse suppression block in
check_command_security(): if the only finding(s) on a warn verdict are
lookalike_tld entries for .app, downgrade the action to allow.
Mixed findings (e.g. .app + shortened_url) and block verdicts are
unaffected. Non-.app lookalike_tld findings (.zip, .exe, etc.) are
preserved.
Add 15 regression tests covering: .app-only suppression, mixed-finding
preservation, non-.app TLD preservation, block-verdict invariance, and
the helper's field-name and case-insensitivity behaviour.
Closes#24461
* feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection
dep_ensure.py gains Windows awareness: PowerShell invocation, platform-
specific browser detection, (path, shell) tuple returns.
install.ps1 gains -Ensure/-PostInstall modes using npm -g --prefix
(aligned with install.sh) and agent-browser install for Chromium.
browser_tool.py gains node/ in candidate dirs for Windows .cmd shims.
Both install scripts bundled in pip wheel.
Tracking: #27826
* fix(install.ps1): add --ignore-scripts to npm install for camofox
@askjo/camofox-browser has a dependency (impit) whose postinstall
script runs `npx only-allow pnpm`, which fails under npm. Adding
--ignore-scripts avoids the spurious failure without affecting
functionality.
Tracking: #27826
* fix: remove duplicate install scripts from git
CI already copies scripts/install.{sh,ps1} into hermes_cli/scripts/
during wheel build. No need to commit copies — .gitignore keeps them
out, _find_install_script() falls back to scripts/ for git-clone users.
Tracking: #27826
* fix: address review — remove env_extra, fix ps1 error handling
- Remove unused env_extra parameter from ensure_dependency()
- Invoke-EnsureMode node case now uses Test-Node consistently
- Install-AgentBrowser uses throw instead of exit 1
The agent can now produce a chart, PDF, spreadsheet, or any other supported
file type and have it land in Slack / Discord / Telegram / WhatsApp / etc.
as a native attachment, just by mentioning the absolute path in its
response. Same primitive works for kanban-worker completions: workers
attach artifacts via kanban_complete(artifacts=[...]) and the gateway
notifier uploads them alongside the completion message.
Changes:
- gateway/platforms/base.py: extract_local_files now covers PDFs, docx,
spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives
(zip/tar/gz), audio (mp3/wav/...), and html — not just images and video.
Image/video extensions still embed inline; everything else routes to
send_document via the existing dispatch partition in gateway/run.py.
- tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains
an explicit ``artifacts`` parameter. The handler stashes it in
metadata.artifacts (for downstream workers) and the kernel promotes
it onto the completed-event payload so the notifier can find it
without a second SQL round-trip.
- gateway/run.py: _kanban_notifier_watcher now calls a new helper
_deliver_kanban_artifacts after sending the completion text. The
helper reads payload.artifacts (preferred), falls back to scanning
the payload summary and task.result with extract_local_files, then
partitions images / videos / documents and uploads each via
send_multiple_images / send_video / send_document.
- website/docs/user-guide/features/deliverable-mode.md + sidebars.ts:
user-facing docs page covering the extension list, the kanban
artifacts pattern, and the MCP-for-connector-breadth recommendation.
Tests:
- tests/gateway/test_extract_local_files.py: 7 new test cases
(documents, spreadsheets, presentations, audio, archives, html,
chart-pdf canonical case). 44 passing, 0 regressions.
- tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts
arg shape (list / string / merge with existing metadata / type
rejection). 17 passing.
- tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full
notifier → artifact-upload path and missing-file silent-skip. 12
passing.
- E2E (real files, real kanban kernel, real BasePlatformAdapter):
worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata +
event payload land → notifier helper partitions correctly →
send_multiple_images called once with the PNG, send_document called
twice with PDF + CSV.
What's NOT in this PR (deferred to follow-ups):
- Ad-hoc "research this for two hours, ping the thread when done"
slash command — covered today by kanban subscriptions; a dedicated
slash command can ride a follow-up PR if needed.
- Setup-wizard prompt for recommended MCP servers (Notion, GitHub,
Linear, etc.) — docs page lists them; UI is a separate change.
Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf
(local doc, not shipped).
* feat(session_search): single-shape tool with discovery, scroll, browse — no LLM
Replaces the LLM-summarized session_search with a single-shape tool that
returns actual messages from the DB. Three calling shapes inferred from
args (no mode parameter):
1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit,
all in one call. ~20ms on a real DB instead of ~90s for the previous
three aux-LLM calls.
2. Scroll — pass session_id + around_message_id. Returns a window
centered on the anchor. To paginate, re-anchor on the first/last id
of the returned window. Boundary message appears in both windows
as the orientation marker. ~1ms per scroll call.
3. Browse — no args. Recent sessions chronologically.
Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give
the agent goal + resolution on every discovery hit, so a single tool call
reconstructs a long session's arc without loading the whole transcript.
The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and
laundered FTS5 hits through a model that could confabulate when the right
session wasn't in the hit list. The merged shape returns byte-for-byte
content from SQLite.
History:
- PR #20238 (JabberELF) seeded the fast/summary dual-mode split.
- PR #26419 (yoniebans) expanded to fast/guided/summary with bookends,
multi-anchor drill-down, default-mode config, and a teaching skill.
This PR collapses that toolkit into one shape with explicit scroll
support, drops the summary path, drops the mode parameter, drops the
config knob, drops the skill. JabberELF's seed work is acknowledged via
the AUTHOR_MAP entry.
Validation:
- 38/38 tool tests pass (tests/tools/test_session_search.py)
- 12/12 get_messages_around tests pass (tests/hermes_state/)
- 11/11 get_anchored_view tests pass (tests/hermes_state/)
- Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main
(test ordering in test_delegate.py, unrelated)
- E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms;
pagination forward+backward works with boundary-message orientation;
error paths return clean tool_error responses
Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
* chore(session_search): prune dead LLM-summary config and docs
Companion to the single-shape rewrite. The auxiliary.session_search config
block, max_concurrency / extra_body tunables, and matching docs sections
all referenced the removed LLM summarization path. Removing them so users
don't try to tune knobs that nothing reads.
- hermes_cli/config.py: drop dead auxiliary.session_search block from
DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and
ignored.
- hermes_cli/tips.py: drop two tips referencing the removed
max_concurrency / extra_body knobs.
- website/docs/user-guide/configuration.md: drop 'Session Search Tuning'
section and the auxiliary.session_search block from the example.
- website/docs/user-guide/features/fallback-providers.md: drop session_search
rows from the auxiliary-tasks tables and the dedicated tuning subsection.
- website/docs/reference/tools-reference.md: rewrite the session_search
entry to describe the new three-shape behaviour.
- CONTRIBUTING.md: update the file-tree description.
- tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone
class and test_session_search_tool_guarded — both guard against an
unguarded .content.strip() call site in _summarize_session() that no
longer exists.
Validation: 97/97 targeted tests still pass (hermes_state + session_search +
llm_content_none_guard). Config tests 55/55.
---------
Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>