* feat(nous): unified client=hermes-client-v<version> tag on every Portal request
Every Hermes request to Nous Portal now carries the same
client=hermes-client-v<__version__> tag (e.g. client=hermes-client-v0.13.0
on this release), sourced live from hermes_cli.__version__. The release
script's regex bump auto-aligns it on every release.
Centralized in agent/portal_tags.py and wired into all four call sites:
- NousProfile.build_extra_body (main agent loop, every chat completion)
- auxiliary_client.NOUS_EXTRA_BODY + _build_call_kwargs (aux client)
- run_agent.py compression-summary fallback path
- tools/web_tools.py web_extract fallback
Replaces the client=aux marker added in #24194 with the unified version
tag. Tests assert against the helper output (invariant) rather than the
literal string, so they don't need updating on every release.
* feat(nous): cover /goal judge and kanban specify aux paths
Two aux-using surfaces bypassed call_llm by invoking
client.chat.completions.create() directly without extra_body, so they
were missing the unified Portal client tag:
- hermes_cli/goals.py — /goal standing-goal judge
- hermes_cli/kanban_specify.py — kanban triage specifier
Both now pass extra_body=get_auxiliary_extra_body() or None so they
inherit the version tag when the aux client points at Nous Portal, and
emit nothing otherwise (no tag leak to OpenRouter/Anthropic auxes).
The live adapter path in _send_via_adapter called adapter.send() without
passing thread_id, while the standalone fallback path correctly forwarded
it. For plugin platforms (google_chat, teams, irc, line) running with the
gateway in-process, this caused every threaded reply to land as a new
top-level message instead of continuing the thread.
Matches the pattern already used by _send_matrix_via_adapter and
_send_feishu: build metadata={"thread_id": thread_id} and pass it through.
In WSL2, sounddevice.query_devices() returns [] even when the
PulseAudio bridge is functional. The existing code already handled
the case where the query itself raises an exception, but it missed
the empty-list case.
This change treats an empty device list as non-fatal in WSL when
PULSE_SERVER is configured, matching the existing exception-handler
behavior.
Fixes: WSL users seeing 'No audio input/output devices detected'
even though paplay/arecord work fine.
Tavily's /crawl endpoint requires Authorization: Bearer <key> in the header,
unlike /search and /extract which accept api_key in the JSON body.
Without the header, crawl returns 401 Unauthorized.
_parse_target_ref() has no handler for XMPP JIDs (user@server or
room@conference.server), so they fall through to the final
`return None, None, False`. This causes send_message to fail when
targeting an XMPP chat by JID, since the JID is not numeric and
doesn't match any other platform pattern.
Add an explicit check for XMPP targets containing '@', matching the
existing Matrix pattern above it.
Three follow-ups to PR #24168 found during live E2E testing on TS/bash files:
1. typescript-language-server now installs the typescript SDK (tsserver)
alongside it. Without that sibling install, initialize() failed with
"Could not find a valid TypeScript installation" and the server was
marked broken — no diagnostics ever reached the agent. New extra_pkgs
field on INSTALL_RECIPES makes that explicit and reusable for future
peer-dep cases.
2. _check_lint now treats "linter command exists on PATH but cannot
actually run" as skipped instead of error. The motivating case is
npx tsc when typescript is not in node_modules — npx prints its
"This is not the tsc command you are looking for" banner and exits
non-zero, which previously blocked the LSP semantic tier (gated on
success or skipped). Pattern-matched per base command (npx,
rustfmt, go) so genuine lint errors still flow through normally.
3. hermes lsp status now surfaces a Backend warnings section when
bash-language-server is installed but shellcheck is missing. The
server itself spawns fine but bash-language-server delegates
diagnostics to shellcheck — without it on PATH the integration
looks alive but never reports any problems. Same warning is
logged once at server spawn time.
Validation:
- 12 new tests in tests/agent/lsp/test_install_and_lint_fixes.py:
* recipe carries typescript SDK
* _install_npm passes both pkg + extras to npm CLI
* backwards compat: recipes without extras still work
* _backend_warnings quiet when bash absent / both present
* _backend_warnings fires when bash installed without shellcheck
* status output includes the Backend warnings section
* _looks_like_linter_unusable catches the npx tsc banner
* real TS type errors not misclassified as unusable
* unfamiliar linters fall through normally
* _check_lint returns skipped on npx tsc unusable
* _check_lint returns error on real tsc type errors
- Full lsp + file_operations test suite: 245/245 pass
- Live E2E:
* try_install("typescript-language-server") installs both packages
into node_modules
* write_file(bad.ts, ...) returns lint=skipped + lsp_diagnostics
with two real TS errors (was lint=error, no lsp_diagnostics)
* hermes lsp status renders the shellcheck warning when bash is
installed but shellcheck is not on PATH
The clarify tool returned 'not available in this execution context' for
every gateway-mode agent because gateway/run.py never passed
clarify_callback into the AIAgent constructor. Schema actively encouraged
calling it; users never saw the question.
Changes:
- tools/clarify_gateway.py — new event-based primitive mirroring
tools/approval.py: register/wait_for_response/resolve_gateway_clarify
with per-session FIFO, threading.Event blocking with 1s heartbeat
slices (so the inactivity watchdog keeps ticking), and
clear_session for boundary cleanup.
- gateway/platforms/base.py — abstract send_clarify with a numbered-text
fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix,
etc.) gets a working clarify out of the box. Plus an active-session
bypass: when the agent is blocked on a text-awaiting clarify, the next
non-command message routes inline to the runner's intercept instead
of being queued + triggering an interrupt. Same shape as the /approve
deadlock fix from PR #4926.
- gateway/platforms/telegram.py — concrete send_clarify renders one
inline button per choice plus '✏️ Other (type answer)'. cl: callback
handler resolves numeric choices immediately, flips to text-capture
mode for Other, with the same authorization guards as exec/slash
approvals.
- gateway/run.py — clarify_callback wired at the cached-agent per-turn
callback assignment site (only the user-facing agent path; cron and
hygiene-compress agents have no human attached). Bridges sync→async
via run_coroutine_threadsafe, blocks with the configured timeout, and
returns a '[user did not respond within Xm]' sentinel on timeout so
the agent adapts rather than pinning the running-agent guard. Text-
intercept added to _handle_message before slash-confirm intercept
(skipping slash commands). clear_session called in the run's finally
to cancel any orphan entries.
- hermes_cli/config.py — agent.clarify_timeout default 600s.
- website/docs/user-guide/messaging/telegram.md — Interactive Prompts
section.
Tests:
- tests/tools/test_clarify_gateway.py (14 tests) — full primitive
coverage: button resolve, open-ended auto-await, Other flip, timeout
None, unknown-id idempotency, clear_session cancellation, FIFO
ordering, register/unregister notify, config default.
- tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render
paths (multi-choice/open-ended/long-label/HTML-escape/not-connected),
callback dispatch (numeric resolve/Other flip/already-resolved/
unauthorized/invalid-token), and base-adapter text fallback.
Out of scope: bot-to-bot, guest mode, checklists, poll media, live
photos. Closes#24191.
* feat(lsp): semantic diagnostics from real language servers in write_file/patch
Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.
LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.
The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.
New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}
Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged
Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).
Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.
Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.
Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
* feat(lsp): structured logging, backend gate, defensive walk caps
Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.
agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.
agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.
tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.
agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.
agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.
Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
silence (clean writes stay DEBUG), state-transition INFO-once
semantics (active for, no project root), action-required
WARNING-once (server unavailable), per-call WARNING (timeouts,
spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
_lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
the LSP layer for non-local backends and route correctly for
LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
exercising the 8 KiB cap.
Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
... reportReturnType (Pyright)" through the full path, then patch
fix removes it on the next call.
* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field
Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:
1. atexit cleanup of spawned language servers
----------------------------------------------------------------
``agent/lsp/__init__.get_service`` now registers an ``atexit``
handler on first creation that tears down the LSPService on
Python exit. Without this, every ``hermes chat`` exit was
leaking pyright/gopls/etc. processes for a few seconds while
their stdout buffers drained -- they got reaped by the kernel
eventually but a watchful ``ps aux`` would catch them.
The handler runs once per process (gated by
``_atexit_registered``); idempotent ``shutdown_service``
ensures double-fire is a no-op. Errors during shutdown are
swallowed at debug level since by the time atexit fires the
user has already seen the agent's final response.
2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------
Previously the LSP layer folded its diagnostic block into the
``lint.output`` string, conflating the syntax-check tier with
the semantic tier. The agent (and any downstream parsers) now
read syntax errors and semantic errors as independent signals:
{
"bytes_written": 42,
"lint": {"status": "ok", "output": ""},
"lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
}
``_check_lint_delta`` returns to its original two-tier shape
(syntax check + delta filter); ``write_file`` and
``patch_replace`` independently fetch LSP diagnostics via
``_maybe_lsp_diagnostics`` and pass them into the new field.
``patch_replace`` propagates the inner write_file's
``lsp_diagnostics`` so the outer PatchResult carries the patch's
delta correctly.
Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
fires once and only once across N get_service calls; the
registered callable is our internal shutdown wrapper;
shutdown_service is idempotent and safe when never started;
exceptions during shutdown are swallowed; inactive service is
cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
/ PatchResult dataclass shape, to_dict include/omit semantics,
channel separation (lint and lsp_diagnostics carry independent
signals), write_file populates the field via
_maybe_lsp_diagnostics only when the syntax tier is clean,
patch_replace propagates the field forward from its internal
write_file.
Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
error introduced -> lint clean (parses), lsp_diagnostics carries
the pyright reportReturnType block; patch fix -> both fields
clean again.
* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write
Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.
The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.
Fix:
- New ``_mark_broken_for_file`` helper at the service layer marks
the (server_id, workspace_root) pair broken from the OUTSIDE when
the outer timeout fires. Called from the except branches in
``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
+ generic Exception). Also kills any orphan client process that
survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True.
Files in already-broken (server_id, root) pairs short-circuit to
False, so the file_operations layer skips the LSP path entirely
with no spawn cost. Until the service is restarted (``hermes lsp
restart``) or the process exits.
- A single eventlog WARNING is emitted on first mark-broken so the
user knows which server gave up. Subsequent edits in the same
project stay silent.
Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.
Validation:
- Live retest of the wedged-binary scenario: 5 sequential writes,
first 8.88s (the one snapshot timeout), subsequent four ~0.84s
(no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
patch fix all behave correctly with the new broken-set logic.
Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
Daytona ships breaking SDK changes on June 10, 2026 — `list()` returns
an iterator and the `page=` offset parameter is removed. We pin
daytona==0.155.0 so we're past the May 24 hard-cutoff, but the
legacy-sandbox resume path in DaytonaEnvironment still passes `page=1`
and reads `.items` off the result.
Switch to `next(iter(results), None)` against a single-result
`list(labels=..., limit=1)` call. Update tests to use `iter([...])`
and drop the `page=1` kwarg from list() assertions.
Allow integrations to share a visible Camofox identity with Hermes and recover existing tabs without carrying local patches.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
The `mistralai` PyPI package was quarantined on 2026-05-12 after a
malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build,
CI run, install.sh first-run) currently fails on
`mistralai>=2.3.0,<3` because PyPI returns zero candidates.
Existing users running `hermes update` mostly didn't notice — `hermes
update` falls back from `.[all]` to per-extra retries and silently
skips mistral with a warning that scrolls past. But fresh installs
hard-fail or lose every other extra.
Changes:
- pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and
`[termux-all]`. The `mistral` extra itself is preserved so users
can opt back in once PyPI un-quarantines.
- hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the
`hermes tools` provider picker until restored.
- hermes_cli/web_server.py: drop "mistral" from dashboard STT options.
- tools/transcription_tools.py: explicit `provider: mistral` returns
"none" with a clear status message; auto-detect skips mistral.
- tools/tts_tool.py: dispatcher returns a clear "temporarily disabled"
error before any SDK import attempt (avoids cached-stale-package
surprises).
- tests/tools/: update three test files to assert the new disabled
behavior. Each test docstring records why and points at the rollback
trigger (PyPI un-quarantines mistralai).
Restore plan: revert this commit once the package is available on PyPI
again. The behavior change is intentional and documented in code
comments + test docstrings to make the rollback trivial.
Validation:
- scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' →
425/425 passing.
Refs: https://pypi.org/simple/mistralai/ (currently
"pypi:project-status: quarantined").
Set HERMES_SESSION_ID using the existing session_context.py ContextVar
system for concurrency safety (multiple gateway sessions in one process
won't cross-talk). Also writes os.environ as fallback for CLI mode.
Touchpoints:
- gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry
- run_agent.py: Set both ContextVar and os.environ at init and on
context-compression rotation
- tools/environments/local.py: Bridge ContextVars into subprocess env
in _make_run_env() (ContextVars don't propagate to child processes)
- tests/run_agent/test_session_id_env.py: 3 tests covering env, provided
ID, and ContextVar paths
execute_code subprocess already passes HERMES_* prefixed vars through
_scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_').
Primary use case: webhook-triggered agents that need to include a
`--resume <session_id>` takeover command in their output.
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
Adds the only #17873 category not covered by the in-flight PRs #17962
(briandevans, reverse shell + download-execute) and #7993 (SHL0MS,
credential reads + curl/wget exfiltration): sudo invocations that an
LLM-driven agent can drive without TTY interaction.
The agent has no TTY, so the sudo forms that succeed without human
involvement are those reading the password from stdin (`-S` / `--stdin`)
or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`)
and list-privileges (`-a`) flags are also gated since they are
privilege-relevant invocations the agent can chain after acquiring the
password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell).
Plain `sudo cmd` (no flag) is TTY-bound and excluded.
Two patterns:
1. Direct flag: `\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)`
The lazy `[^;|&\n]*?` consumes flag-arguments without spanning
command separators, so `sudo -u root -S whoami` matches (a textbook
offensive form that a strict `(?:\s+-[^\s]+)*` "leading flags only"
pattern would have missed because `root` is a flag-value not a flag).
2. Combined short flags: `\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b`
Catches packed forms like `sudo -nS id` where multiple flags share
a single `-X` token.
`_normalize_command_for_detection` lowercases input before pattern
matching (tools/approval.py:340), so case variants of S/s and A/a
collapse — both letter-pairs are gated since each is a privilege-
relevant invocation.
Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all
flag-order permutations including herestring source and printf-piped
forms; 9 negative including TTY-bound `sudo whoami`, interactive
`sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`,
package install, and the `pseudosudo` word-boundary edge case).
Empirical coverage: 11/11 attacks matched, 0/10 false positives.
Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download-
execute), #7993 (credential reads + curl/wget exfiltration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes#9590: Block explicit sudo -S (stdin password mode) commands
when the SUDO_PASSWORD environment variable is not configured.
The attack vector: the LLM constructs 'echo guessedpass | sudo -S cmd'
to brute-force sudo passwords, iterates based on sudo's error output
('Sorry, try again'). The existing _transform_sudo_command only
injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit
sudo -S must be treated as a guessing attempt.
Changes:
- Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when
SUDO_PASSWORD is absent, anchored to command-start positions
(^ ; && || | etc.) to avoid false positives on literal text
- Integrate into check_all_command_guards() above yolo/mode=off so
the block is unconditional (like the hardline floor)
- Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass,
integration with check_all_command_guards, yolo non-bypass,
container backend bypass
The kanban dispatcher sets HERMES_KANBAN_TASK on every spawned worker
but launches it with the assignee profile's HERMES_HOME (e.g.
~/.hermes/profiles/<name>/), which has no gateway.pid file. The
existing _check_send_message therefore returned False from the
is_gateway_running() fallback, even though the parent gateway is
alive and reachable.
Net effect: workers could call kanban_* tools (gated on
HERMES_KANBAN_TASK in _check_kanban_mode) but not send_message. This
breaks the natural pattern of "worker does the job, calls
send_message to deliver rich content to the originating chat, then
calls kanban_complete with a one-line summary" because the kanban
notifier's payload_summary is hard-truncated to the first line
(~200 chars) at gateway/run.py:3963 — anything richer has to ship
via send_message.
Honoring HERMES_KANBAN_TASK in _check_send_message — symmetric with
_check_kanban_mode in kanban_tools.py:42 — closes the gap. No new
state, no new env var, no profile-config changes required.
Two independent opt-in QoL toggles, both off by default.
terminal.docker_extra_args:
- List of extra flags appended verbatim to docker run after security
defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or
other docker run options not exposed by existing config keys.
- Non-string entries are logged and skipped.
- Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var.
display.timestamps:
- Appends [HH:MM] to user input bullet and the assistant response box
header. Single hub in _format_submitted_user_message_preview()
covers both single-line and multi-line user previews; assistant
response label gets the timestamp at box-open time.
Closes#1569 (timestamps).
Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.
Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.
Fixes#22923
Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a
fast path when a supervisor is alive for the current task_id. Replaces the
~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms
Runtime.evaluate over the supervisor's already-connected WebSocket.
Falls through to the existing agent-browser CLI path when no supervisor is
running (e.g. backends without CDP, or before the first browser_navigate
attaches one), so behaviour is unchanged where it can't apply.
JS-side exceptions surface directly without falling through to the
subprocess (the subprocess would just re-raise the same error, slower);
supervisor-side failures (loop down, no session) fall through cleanly.
Benchmark — 30 iterations of `1 + 1` against headless Chrome:
supervisor WS mean= 0.96ms median= 0.91ms
agent-browser subprocess mean=179.35ms median=167.73ms
→ 187x speedup mean
Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5
real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome
being installed). Browser test suite: 355 passed, 1 skipped.
Adapted from PR #20568 commit ce3518578 (Eric Litovsky / @kallidean).
Adds two-tier gating for the kanban tool surface so dispatcher-spawned
workers see only task-lifecycle tools (show/complete/block/heartbeat/
comment/create/link) while orchestrator profiles with `toolsets: [kanban]`
also see board-routing tools (kanban_list, kanban_unblock).
Workers shouldn't be enumerating or unblocking the board — they should
close their own task via the lifecycle tools. Hiding board-routing tools
from worker schemas keeps the worker focused and the toolset-isolation
contract honest.
Plus inherited from the same upstream commit:
- 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata.
- Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside
the orchestrator handlers in case a stale registration ever routes a
worker to one of them.
- Tests for the new gate, the stricter bound, and the fact that even a
worker with `toolsets: [kanban]` in config still doesn't see board
routing.
Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.
Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.
Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.
Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
capability table (_supports_media_in_tool_results). Schema description
updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
accepts the array form for multimodal tool results (was string-only).
Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
tools see the live CLI/gateway override, not the stale config.yaml
default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.
Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
table, envelope shape, fast-path gating (vision-capable model uses
fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
content becomes function_call_output.output array; preflight
preserves arrays and drops unknown part types.
Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
typed filepath, gets pixels back, reads exact text from images that
no aux description could capture (font color irony, multi-line
fruit-count list, etc.).
PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB
(32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach
in _write_to_sandbox put the entire tool result inside the 'bash -c'
arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument
list too long' before the heredoc ever ran. The caller logged a
warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the
warning never reached agent.log either, and the agent saw a 1.5 KB
preview tagged 'Full output could not be saved to sandbox'. Hits
delegate_task with 3+ subagent outputs routinely now.
Switch to passing content via env.execute(stdin_data=...). cmd is
now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight
payload travels through stdin where there is no argv-element limit.
E2E reproduced the user's exact 144,778-char delegate_task envelope:
old code OSError'd, new code round-trips cleanly to disk with all
three task summaries intact.
Problem: terminal.docker_env set in config.yaml was silently ignored.
Docker containers never received the user-specified env vars.
Root cause: docker_env was missing from all three config→env bridging
maps (cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync) and from the terminal_tool
_get_env_config() reader. _create_environment() consumed the key from
container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV
was never set.
Also extend the list-serialisation branches in cli.py and gateway/run.py
to handle dict values via json.dumps (lists already used json.dumps;
plain str() on a dict produces undecodable output).
Fix:
- cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings;
serialise dict values with json.dumps alongside existing list path
- gateway/run.py: same additions to _terminal_env_map and serialisation
- hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV"
to _config_to_env_sync so `hermes config set terminal.docker_env …`
persists to .env correctly
- tools/terminal_tool.py: add docker_env key to _get_env_config() reading
TERMINAL_DOCKER_ENV via _parse_env_var with default "{}"
Tests: add test_docker_env_is_bridged_everywhere to
tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on
origin/main, passes with fix.
Fixes#20537
After Popen succeeds with os.setsid (detached process group), 5 things
happen with no try/except: Thread construction, reader.start(), lock
acquisition, prune+register, checkpoint write. If any raises, the
Popen object goes unregistered and the detached process group leaks
indefinitely.
Wrap the post-spawn setup in try/except. On failure:
- os.killpg(getpgid(pid), SIGKILL) takes down the entire process
group (not just the shell - important because of detached PG +
-lic shell wrapper that may have spawned children)
- proc.kill() fallback for ProcessLookupError/PermissionError/OSError
- proc.wait(timeout=5) reaps with a bound
- re-raise to preserve original traceback
Nested try/except around cleanup so a secondary failure can't mask the
original.
Closes#2749.
Problem
=======
`tools.checkpoint_manager._touch_project` reads the project metadata
file with `json.loads(meta_path.read_text(...))`, then immediately does:
meta["workdir"] = str(_normalize_path(working_dir))
The `except` block only catches `(OSError, ValueError)`. When the file
parses successfully but returns a non-dict value (a list `[]`, `null`,
or a scalar from a corrupted or hand-truncated write), `json.loads`
succeeds without error and `meta` is set to, e.g., `[]`. The subsequent
subscript assignment then raises `TypeError: list indices must be
integers or slices, not str`, which is NOT caught by the narrow except
clause.
This TypeError propagates up through `_take` to `ensure_checkpoint`,
where the broad `except Exception` safety net swallows it. The effect
is that `ensure_checkpoint` silently returns False for the entire
session — all checkpoints are skipped for the affected working directory
without any user-visible error.
Root cause
==========
Missing `isinstance(meta, dict)` guard after `json.loads`, identical in
pattern to bugs fixed in `cron/jobs.py` (#22569) and
`tools/process_registry.py` (#22544). The same guard is already
present one function below in `_list_projects` (line 506), but was
inadvertently omitted in `_touch_project`.
Fix
===
Add two lines after the try/except:
```python
if not isinstance(meta, dict):
meta = {}
```
This matches the existing guard in `_list_projects` and ensures a fresh
empty dict is used whenever the persisted value is not a mapping —
preserving the `created_at` semantics via `setdefault` on the next line.
Tests
=====
`TestTouchProjectMalformedMeta` covers four non-dict root values
(`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file,
calls `_touch_project`, and asserts: (a) no exception raised, (b) the
metadata file is rewritten as a valid dict containing `last_touch` and
`workdir`. All four fail on main with `TypeError`, pass with fix.
Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.
`tools/image_generation_tool.py` did `import fal_client` at module
top, which pulled the entire fal_client + httpx + rich stack on every
process that ran `discover_builtin_tools()` — every `hermes` cold
start, even ones that never touch image generation.
Make the import lazy: replace the eager import with a placeholder
(`fal_client: Any = None`) and add an idempotent `_load_fal_client()`
that rebinds the module global on first use. Call it from the two
runtime entry points (`_ManagedFalSyncClient.__init__` and
`_submit_fal_request`) and from the SDK-presence check in
`check_image_generation_requirements`.
The loader short-circuits if the global is already truthy, which
preserves the test pattern of monkeypatching `fal_client` to install
a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)`
calls in test_image_generation.py keep working unchanged.
Measured impact (15-run min times, 9950X3D):
tools.image_generation_tool alone: 77 → 20 ms (-74%)
36 → 20 MB (-44%)
import cli (full): 734 → 720 ms (-2%)
import model_tools: 372 → 366 ms (-2%)
The microbench is dramatic but the full-CLI win is small — fal_client
shares its httpx + rich dependencies with the rest of the agent, so
on a real cold start most of the 16 MB / 64 ms is already paid by
other imports. The win matters mostly for processes that touch this
tool without otherwise loading httpx (rare) and for architectural
consistency with the previous lazy-load PRs (#22681 google_chat,
#22831 teams).
Tests: 55/55 `tests/tools/test_image_generation.py` pass, including
the cases that monkeypatch the module global to install a mock
fal_client. End-to-end verification confirms `import model_tools`
no longer pulls `fal_client` into `sys.modules`.
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
acp_command / acp_args descriptions previously primed the model to
populate them — "Per-task ACP command override (e.g. 'copilot')" —
even when no ACP CLI was installed. Models with weaker schema-following
discipline would set them and the spawn would fail.
Add explicit "Do NOT set unless the user has explicitly told you"
guidance at both the top-level acp_command and the per-task override.
Strengthen acp_args to mention it's empty unless acp_command is set.
Adds 2 tests pinning the descriptions.
Note: this is a cosmetic prompt-engineering fix — the params remain
exposed in the schema. The fully-correct fix is to gate them behind
a config flag or runtime ACP-CLI detection so the schema only emits
them when an ACP harness is available. Tracked as a follow-up; this
PR ships the low-cost stopgap.
Salvage of #22680 (delegate schema only). The original PR also
bundled unrelated fixes for #22548, #21944, #22150 — those
need separate PRs since #22548 and #21944 are already addressed
on main (#22780 + #22798 in flight) and #22150 deserves its own
review.
Closes#22013.
Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True`
before resolution. If credentials were briefly unavailable on the first
call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned
the entire process to local mode forever, even after credentials
self-healed seconds later.
Root cause: bookkeeping was set up-front, so any code path that fell
through to `return _cached_cloud_provider` (config read failure, no
credentials yet, explicit-provider instantiation failure) committed the
transient `None` to the cache permanently.
Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now
set only when (a) the user explicitly chose `cloud_provider: local`, or
(b) a provider was successfully resolved. All transient `None` paths
return without poisoning the cache, so the next call retries. Explicit
provider instantiation failures now log at warning level with stack
trace so operators can diagnose them.
Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py
covering explicit local, successful resolution, no-credentials-yet,
config read failure, and explicit provider instantiation failure.
Stash-verify confirmed the 3 transient-None tests fail without the fix.
All 320 existing browser tests still green.
Closes#22324
Returning users who enabled '🖱️ Computer Use (macOS)' via 'hermes tools'
saw '✓ Saved configuration' but no install — cua-driver was never on
PATH and the toolset failed at first use. Two compounding causes:
1. _toolset_needs_configuration_prompt fell through to _toolset_has_keys,
which returned True for any provider with empty env_vars. cua-driver
has no env vars, so the gate skipped _configure_toolset entirely and
_run_post_setup('cua_driver') never ran.
2. No stable CLI entry-point existed for re-running the install when
the picker no-op'd it (e.g. when toggling the toolset off+on inside
one picker session, where 'added' is empty).
Changes:
- hermes_cli/tools_config.py: add _POST_SETUP_INSTALLED registry
mapping post_setup keys to installed-state predicates. The gate
now returns True when any visible provider has a registered
post_setup whose predicate fails. cua_driver is the only opt-in
for now; other post_setup hooks keep their existing behaviour.
- hermes_cli/main.py: add 'hermes computer-use install' and
'hermes computer-use status' as a stable docs target. install
reuses the same _run_post_setup('cua_driver') path that the
picker invokes; status reports whether cua-driver is on PATH.
- tools/computer_use/cua_backend.py: install hint now points users
at 'hermes computer-use install' first.
- website/docs/user-guide/features/computer-use.md: document the
new command as the primary install path.
- website/docs/reference/cli-commands.md: catalog 'hermes
computer-use' alongside 'hermes tools'.
- tests/hermes_cli/test_post_setup_gating.py: regression coverage
for the gate predicate (missing -> setup forced, installed ->
setup skipped, broken predicate -> non-blocking, unregistered
keys -> behaviour unchanged).
Fixes#22737. Reported by @f-trycua.
The github-pr-workflow skill wraps the URL in double-quotes
('curl -H ... "https://api.github.com/..."'), which the original
allowlist regex (\s+https://api...) did not match. Without this,
the bundled github-pr-workflow skill is still blocked at every
cron tick despite #22605's fix landing for the bare-URL form.
Make the leading quote optional and add a regression test pinning
both single- and double-quoted forms.
The delegate_task tool description hardcoded 'default 3' / 'default 2' for
max_concurrent_children / max_spawn_depth, which misled the model on any
install that raised these limits — the schema text said 'default 3' even
when the user had set max_concurrent_children=15 / max_spawn_depth=3, so
the model would self-cap at 3 and never use the headroom.
Make the description dynamic. ToolEntry gains an optional
dynamic_schema_overrides callable; registry.get_definitions() merges its
output on top of the static schema before returning it. delegate_tool
registers a builder that reads the current delegation.* config and emits:
- 'up to N items concurrently for this user' (N = max_concurrent_children)
- 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)'
- 'orchestrator children can themselves delegate up to M more level(s)'
- 'orchestrator_enabled=false' when the kill switch is set
The model_tools cache key already includes config.yaml mtime+size, so
edits to delegation.* in config invalidate the cached tool definitions
without an explicit hook. CLI_CONFIG staleness within a process is a
pre-existing limitation of _load_config and out of scope here.
Static description / tasks.description / role.description in
DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger
cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME.
Plugin platforms (IRC, Teams, Google Chat) currently fail with
`No live adapter for platform '<name>'` when a `deliver=<plugin>` cron
job runs in a separate process from the gateway, even though the
platforms are eligible cron targets via `cron_deliver_env_var` (added
in #21306). Built-in platforms (Telegram, Discord, Slack, etc.) use
direct REST helpers in `tools/send_message_tool.py` so cron can deliver
without holding the gateway in the same process; plugin platforms
historically depended on `_gateway_runner_ref()` which returns `None`
out of process.
This change adds an optional `standalone_sender_fn` field to
`PlatformEntry` so plugins can register an ephemeral send path that
opens its own connection, sends, and closes without needing the live
adapter. The dispatch site in `_send_via_adapter` falls through to the
hook when the gateway runner is unavailable, with a descriptive error
when neither path applies. The hook is optional, so existing plugins
are unaffected.
Reference migrations land in the same change for IRC, Teams, and
Google Chat, exercising the hook across stdlib (asyncio + IRC protocol),
Bot Framework OAuth client_credentials, and Google service-account
flows respectively.
Security hardening on the new code paths:
* IRC: control-character stripping on chat_id and message body to
block CRLF command injection; bounded nick-collision retries; JOIN
before PRIVMSG so channels with the default `+n` mode accept the
delivery.
* Teams: TEAMS_SERVICE_URL validated against an allowlist of known
Bot Framework hosts (`smba.trafficmanager.net`,
`smba.infra.gov.teams.microsoft.us`) to block SSRF; chat_id and
tenant_id constrained to the documented Bot Framework character set;
per-request timeouts so a slow STS endpoint cannot starve the
activity POST.
* Google Chat: chat_id and thread_id validated against strict
resource-name regexes; service-account refresh wrapped in
`asyncio.wait_for` so a hung token endpoint cannot stall the
scheduler.
Test coverage: 20 new tests covering happy path, missing-config errors,
network failure modes, and each defensive validation. Existing tests
unchanged. `bash scripts/run_tests.sh tests/tools/test_send_message_tool.py
tests/gateway/test_irc_adapter.py tests/gateway/test_teams.py
tests/gateway/test_google_chat.py` reports 341 passed, 0 regressions.
Documentation: new "Out-of-process cron delivery" section in
website/docs/developer-guide/adding-platform-adapters.md and an entry
in gateway/platforms/ADDING_A_PLATFORM.md naming the hook.
Follow-up to PR #21293 (cli.py), which fixed the same anti-pattern.
`asyncio.get_event_loop()` is documented as effectively "always returns
the running loop when called from a coroutine" and emits
DeprecationWarning/RuntimeWarning in some interpreter configurations.
The Python docs explicitly recommend get_running_loop() inside coroutines.
Replaces the remaining 9 call sites that are unconditionally inside
async def bodies:
- tools/browser_cdp_tool.py — _cdp_call() (4 sites): deadline + remaining
computations inside the async websockets.connect context manager.
- hermes_cli/web_server.py — get_status, _start_device_code_flow,
submit_oauth_code (3 sites): all FastAPI async endpoints offloading
blocking httpx / PKCE work to run_in_executor.
- environments/agent_loop.py — HermesAgentLoop (1 site): tool dispatch
inside the async rollout loop.
- environments/benchmarks/terminalbench_2/terminalbench2_env.py —
rollout_and_score_eval (1 site): test verification thread offload.
All 9 sites are unconditionally inside async def bodies, so a running
loop is guaranteed and no try/except RuntimeError fallback is needed
(unlike the cli.py case in #21293, which ran from a background thread).
Behavior is identical on supported Python versions; aligns the codebase
with the post-#21293 idiom and avoids future warnings as the deprecation
hardens.
Salvaged from PR #21930 by @Zhekinmaksim onto current main (the
original branch was 109 commits behind and carried unintended
stale-branch reverts of unrelated landed changes — _tail_lines
encoding=utf-8 and the Windows PTY bridge guard). Only the 9 swaps
from the PR's intended scope are applied here.
Comments are injected into the next worker's system prompt by
build_worker_context() as '**{author}** (timestamp): {body}'. The
previous code accepted args['author'] as a free-form override and
exposed it on KANBAN_COMMENT_SCHEMA, which let a worker:
1. Receive a prompt-injection in a malicious task body.
2. Call kanban_comment with author='hermes-system' (or any other
authoritative-looking name) on a sibling task.
3. The next worker assigned to that sibling task sees the forged
comment in its boot context as what reads like a system-authored
directive.
Always derive author from HERMES_PROFILE (the dispatcher already sets
this per worker at hermes_cli/kanban_db.py:3718), and remove the
'author' property from the tool schema so the LLM can't see the
override surface.
Cross-task commenting itself remains unrestricted (see #19713) —
comments are the deliberate handoff channel between tasks; only the
author-override surface is closed.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>