On macOS (and Windows), /proc is unavailable so _get_process_start_time()
always returns None. When a gateway creates a scoped lock record with
start_time=None and then exits, macOS can reuse that PID for an unrelated
process. On restart, acquire_scoped_lock() sees:
1. os.kill(pid, 0) succeeds (PID is alive — but it's bluetoothuserd, not
the gateway)
2. existing.start_time is None and current_start is None, so the
start_time comparison is inconclusive
3. The lock is treated as active, blocking gateway startup with:
"Telegram bot token already in use (PID 873). Stop the other gateway
first."
Root cause: _read_process_cmdline() only reads /proc/<pid>/cmdline, which
doesn't exist on macOS. It always returns None, making
_looks_like_gateway_process() always return False, so the cmdline fallback
path in acquire_scoped_lock() was unreachable on macOS.
Fix (two parts):
1. _read_process_cmdline(): Add a ps(1) fallback for platforms without
/proc. When /proc/<pid>/cmdline doesn't exist, we now run
"ps -p <pid> -o command=" to retrieve the process command line. The
/proc path is tried first (preserving Linux performance); ps is only
invoked as a fallback.
2. acquire_scoped_lock(): When both the lock record's start_time and the
live process's start_time are None (the macOS case), fall back to
checking whether the live PID still looks like a Hermes gateway process
via _looks_like_gateway_process(). If it doesn't, the lock is stale.
Closes#16376
The c1eb2dcda tiered installer made two install paths look frozen on
slow networks or broken environments because both swallowed the
underlying tool's stderr.
scripts/install.sh, setup-hermes.sh:
curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null
printed only '✗ Failed to install uv' on failure with no diagnostic.
Common real causes (glibc mismatch on old distros, corp proxy / TLS
interception, missing curl, ~/.local/bin not writable, disk full)
were invisible. Also: piping curl into sh masks curl failures under
set -e (no pipefail) — sh exits 0 on empty stdin, so a network error
succeeded silently.
Fix: download installer to a tempfile first, then run it. Capture
curl + installer output to a log; on failure, indent and print it.
scripts/install.sh hash-verified tier:
uv sync --all-extras --locked 2>"$(mktemp)" silenced uv's progress
output, making a fresh-venv install (~50 transitives including
torch-class deps) look hung for 1-5 minutes — users see 'Trying tier:
hash-verified (uv.lock) ...' and assume it's frozen. The mktemp
substitution also wasn't saved to a variable, so the uv error on
failure was unreachable.
Fix: stream uv's stderr directly so users see live 'Resolved N /
Prepared / Installed' progress. Print an upfront note that the first
run takes 1-5 minutes.
Detect when write_file / patch calls fail during a turn and are never
superseded by a successful write to the same path. When the final
text response is delivered, append an advisory footer listing the
files that did NOT change — so models that over-claim 'patched 5 files'
after 4 silent failures can't hide the lie.
Catches the failure mode reported in Ben Eng's llm-wiki session:
grok-4.1-fast issued batches of parallel patches, half failed with
'Could not find old_string', and the agent summarised the turn
claiming every file was edited. The user had to manually run
'git status' each turn to catch it.
The verifier is a pure post-hoc check on tool results — no new LLM
calls, no synthetic messages injected into history (prompt cache
preserved), no changes to tool argument dispatch. Per-turn state is
keyed by path; a later successful write to the same path clears the
failure entry so single-file retry recovery is not flagged.
Wired into both _execute_tool_calls_concurrent and
_execute_tool_calls_sequential, so batched parallel patches and one-at-
a-time edits are both covered. Footer emission happens after the
agent loop exits, before transform_llm_output / post_llm_call plugin
hooks run, so plugins still see (and can modify) the augmented text.
Config: display.file_mutation_verifier (bool, default true) +
HERMES_FILE_MUTATION_VERIFIER env override.
31 unit tests in tests/run_agent/test_file_mutation_verifier.py cover
target extraction (write_file, patch-replace, patch-v4a single and
multi-file), error-preview extraction (JSON .error field and plain
string), per-turn state transitions (first-error-wins on repeated
failure, success supersedes failure), footer rendering (truncation
at 10 entries, user-actionable hint), and env/config precedence.
Companion docs updated: user-guide/configuration.md +
reference/environment-variables.md.
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback
Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.
# What this PR makes true
1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
detection banner with copy-pasteable remediation steps the moment
they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
a fresh install to 'core only' — the installer keeps every other
extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
lazy-install on first use under a strict allowlist, instead of
eagerly pulling everything at install time.
# Detection: hermes_cli/security_advisories.py
- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
re-banner after ack.
- Wired into:
* hermes doctor — runs first, prints full remediation block
* hermes doctor --ack <id> — dismisses an advisory
* cli.py interactive run() and single-query branches — short
stderr banner pointing at hermes doctor
* gateway/run.py startup — operator-visible warning in gateway.log
# Lazy-install framework: tools/lazy_deps.py
- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
* tools/tts_tool.py — _import_elevenlabs() calls ensure first
* plugins/memory/honcho/client.py — get_honcho_client lazy-installs
* tts.mistral / stt.mistral entries pre-registered for when PyPI
restores mistralai
# Installer fallback tiers
scripts/install.sh, scripts/install.ps1, setup-hermes.sh:
- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
the same _BROKEN_EXTRAS array so updates stay in sync.
Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).
# Config
hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: [] (advisory IDs the user has dismissed)
- allow_lazy_installs: True (security gate for ensure())
No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.
# Tests
tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
gateway_log_message
- shipped catalog well-formedness invariant
tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command
Combined: 63 new tests, all passing under scripts/run_tests.sh.
# Validation
- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
tests/hermes_cli/test_doctor_command_install.py
tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
9191 passed, 8 pre-existing failures (verified on origin/main
before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
+ gateway_log_message with mocked installed version → produces
copy-pasteable remediation output
# Community
Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md
Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md
Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>
* build(deps): pin every direct dep to ==X.Y.Z (no ranges)
Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.
Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.
What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.
Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.
Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.
mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.
LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.
Validation:
- Cross-checked all 77 pinned direct deps in pyproject.toml against
uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
→ 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.
* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra
You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.
# What this commit fixes
1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
uv.lock records SHA256 hashes for every transitive — a compromised
package with a different hash gets REJECTED. Falls through to the
existing `uv pip install` cascade if the lockfile is missing or
stale, with a loud warning that the fallback path does NOT
hash-verify transitives. Previously only `setup-hermes.sh` (the dev
path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
(the paths fresh users actually run) skipped it.
2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
project is fully quarantined right now — every version returns 404,
so any pin we wrote was unresolvable, which broke `uv lock --check`
in CI. Restoration is documented in pyproject.toml as a 5-step
checklist (verify, re-add extra, re-enable in 4 modules, regenerate
lock, optionally re-add to [all]).
3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
jsonpath-python pruned. `uv lock --check` now passes.
# Defense-in-depth view
| Layer | Where | Protects against |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject | direct deps | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph | transitive worm injection |
| Tier-0 hash-verified path | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate | every PR | drift between pyproject and lockfile |
| `hermes_cli/security_advisories.py` | runtime | cleanup for users who already got hit |
The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.
# Validation
- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
(test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.
* chore: remove community announcement drafts (PR body covers it)
* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)
Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.
Moved out of core dependencies = []:
- anthropic (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client (image gen; only when picked)
- edge-tts (default TTS but still optional)
New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].
New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.
Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.
Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).
Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
The `mistralai` PyPI package was quarantined on 2026-05-12 after a
malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build,
CI run, install.sh first-run) currently fails on
`mistralai>=2.3.0,<3` because PyPI returns zero candidates.
Existing users running `hermes update` mostly didn't notice — `hermes
update` falls back from `.[all]` to per-extra retries and silently
skips mistral with a warning that scrolls past. But fresh installs
hard-fail or lose every other extra.
Changes:
- pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and
`[termux-all]`. The `mistral` extra itself is preserved so users
can opt back in once PyPI un-quarantines.
- hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the
`hermes tools` provider picker until restored.
- hermes_cli/web_server.py: drop "mistral" from dashboard STT options.
- tools/transcription_tools.py: explicit `provider: mistral` returns
"none" with a clear status message; auto-detect skips mistral.
- tools/tts_tool.py: dispatcher returns a clear "temporarily disabled"
error before any SDK import attempt (avoids cached-stale-package
surprises).
- tests/tools/: update three test files to assert the new disabled
behavior. Each test docstring records why and points at the rollback
trigger (PyPI un-quarantines mistralai).
Restore plan: revert this commit once the package is available on PyPI
again. The behavior change is intentional and documented in code
comments + test docstrings to make the rollback trivial.
Validation:
- scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' →
425/425 passing.
Refs: https://pypi.org/simple/mistralai/ (currently
"pypi:project-status: quarantined").
Handle MiniMax OAuth expiry values consistently across CLI and dashboard
flows, fix CLI status/add behavior, and force pooled OAuth runtime
requests through Anthropic Messages.
- web_server._minimax_poller: parse expired_in via the shared resolver
so unix-ms absolute timestamps stop landing as TTL seconds and crashing
with 'year 583911 is out of range' when a user connects MiniMax OAuth
from the dashboard.
- auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on
the CLI login + refresh paths.
- auth.get_auth_status: dispatch minimax-oauth to its dedicated status
function instead of falling through.
- auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now
starts the device-code login flow and persists a pool entry with the
access + refresh tokens, instead of requiring credentials to already
exist.
- runtime_provider._resolve_runtime_from_pool_entry: pin pooled
minimax-oauth credentials to anthropic_messages so a stale
model.api_mode: chat_completions can't send requests to
/anthropic/chat/completions and trigger MiniMax nginx 404s.
Co-authored-by: Cursor <cursoragent@cursor.com>
Qwen models on Nous Portal (e.g. qwen3.6-plus) now get the same envelope-layout
cache_control markers and long-lived (1h cross-session) cache treatment as
Portal Claude. Portal proxies to OpenRouter with identical wire-format and
cache_control semantics, but the prior policy left Portal Qwen falling through
to the alibaba-family branch (which only matches provider=opencode/alibaba),
serving 0% cache hits and re-billing the full prompt every turn.
Scope is narrow: Portal Claude OR Portal Qwen. Other models on Portal keep
their existing behavior.
- _anthropic_prompt_cache_policy: add (is_nous_portal and qwen) -> (True, False)
- _supports_long_lived_anthropic_cache: drop Claude-only gate for Portal so
Qwen also gets the validated 1h cross-session layout
- tests cover both functions, both bare and vendored qwen slug forms, and
the rejection of non-Claude non-Qwen Portal traffic
* fix(tui-clipboard): skip native safety net on OSC52-capable terminals
On terminals with first-class OSC 52 support (Ghostty, kitty, WezTerm,
Windows Terminal, VS Code), setClipboard() currently fires both OSC 52
AND a parallel native-tool write (wl-copy / xclip / pbcopy). On Wayland
+ wl-copy this corrupts the clipboard: probeLinuxCopy() runs wl-copy
with empty stdin as an existence check (destructive — wipes clipboard
to empty string), and the subsequent real wl-copy invocation races
OSC 52 plus its own daemon's previous SIGTERM.
Symptom: user on Arch + Ghostty + wl-copy (Wayland, no tmux, no SSH)
had to press Ctrl+Shift+C three times before a selection landed.
env -u WAYLAND_DISPLAY -u DISPLAY HERMES_TUI_FORCE_OSC52=1 (which
short-circuits copyNative via the DISPLAY-absent early-return) made
every copy work instantly — proving OSC 52 alone is sufficient on
Ghostty and that copyNative() is actively destructive there.
Add OSC52_CAPABLE_TERMINALS allowlist to terminal.ts (same pattern as
the existing EXTENDED_KEYS_TERMINALS), and gate copyNative() on the
terminal NOT being on it. The native safety net continues to fire on
unrecognised terminals (xterm, GNOME Terminal, Konsole, Terminal.app,
etc.) where OSC 52 is less reliable.
* fix(tui-clipboard): address Copilot review feedback
- Move OSC52_CAPABLE_TERMINALS + supportsOsc52Clipboard() from
ink/terminal.ts to utils/env.ts. ink/terminal.ts already imports
link from ink/termio/osc.ts; importing back into termio/osc.ts
introduced a circular dependency. utils/env.ts has no deps on
either file and already owns terminal detection (detectTerminal()),
so the helper sits naturally next to it.
- Replace the inline gating (!SSH_CONNECTION && !supportsOsc52Clipboard())
with a pure shouldUseNativeClipboard(env, terminal) helper. The old
expression skipped native on allowlisted terminals even when
setClipboard() wouldn't actually emit OSC 52 (e.g. inside
TMUX/STY where we use tmux load-buffer instead, or when the user
has set HERMES_TUI_FORCE_OSC52=0). That made the clipboard write
a no-op in those configurations. The new helper:
1. SSH_CONNECTION set -> false (existing behaviour)
2. TMUX or STY set -> true (we go through load-buffer, no race)
3. shouldEmitClipboardSequence() false -> true (native is the
only path left when OSC 52 is suppressed)
4. Otherwise: skip native iff terminal is allowlisted.
- Add 11 tests for shouldUseNativeClipboard covering the SSH guard,
TMUX/STY tmux-inside-Ghostty case, HERMES_TUI_FORCE_OSC52=0
override, allowlisted vs non-allowlisted terminals, precedence,
and default-args smoke. Tests follow the package's existing
parameterised-helper style (no vi.mock; helpers accept env and
terminal as arguments).
- Update test imports to the new utils/env.js path.
* fix(tui-clipboard): address Copilot round 2 feedback
* fix(tui-clipboard): address Copilot round 3 feedback
* fix(tui-clipboard): address Copilot round 4 feedback
Free-tier users were seeing 'No free models currently available.' in the
`hermes model` and post-login pickers even though qwen/qwen3.6-plus is
free on the Portal right now. Three independent breakages compounded:
1. The docs-hosted catalog manifest at website/static/api/model-catalog.json
was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users
fetching the manifest got a list that didn't include qwen/qwen3.6-plus.
2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip,
collapsing get_pricing_for_provider('nous') to {} and making every
curated model fall through the free-tier filter as 'paid'.
3. Even with healthy pricing, the picker only ever showed models from the
in-repo curated list intersected with live pricing — a Portal-flagged
free model not yet in the curated list could never appear.
Changes:
- hermes_cli/models.py: new union_with_portal_free_recommendations() that
augments the curated list with Portal freeRecommendedModels entries
(with synthetic free pricing so partition keeps them). The Portal's
/api/nous/recommended-models endpoint is now the source of truth for
free-tier surfacing — old Hermes builds will see new free models
without a CLI release.
- hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to
the public inference base URL when runtime cred resolution fails.
The /v1/models endpoint exposes pricing without auth, so silently
returning {} just because a refresh token expired was wrong.
- hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call
sites call union_with_portal_free_recommendations() before partition.
- tests/hermes_cli/test_models.py: 7 tests covering union behaviour
(prepend, dedup, end-to-end with stale pricing, empty/missing/error
payloads, invalid entries).
- tests/hermes_cli/test_model_catalog.py: drift guard
TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous']
or OPENROUTER_MODELS is edited without re-running
scripts/build_model_catalog.py. Verified empirically that removing a
manifest entry triggers an assertion with an actionable error message.
Validation:
- 133/133 targeted tests pass (test_models, test_model_catalog,
test_auth_nous_provider).
- Live E2E against the real Portal:
- Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no
qwen) → after union: ['qwen/qwen3.6-plus', ...] →
partition(free_tier=True): selectable=['qwen/qwen3.6-plus'].
- Simulated expired refresh token → anon fetch returns 403 pricing
entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}.
- ruff: clean.
cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall.
Two refresh points now:
- `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call.
- `hermes computer-use install --upgrade` for manual force-refresh.
The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed.
`hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing.
Fixes#22832.
## Root cause
`hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by
the catalog's `flow` field rather than provider id:
if catalog_entry["flow"] == "pkce":
return _start_anthropic_pkce()
The catalog had two `flow: "pkce"` entries — `anthropic` and
`minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's
Keys tab unconditionally launched the Anthropic/Claude PKCE flow.
## Fix
Three changes in `hermes_cli/web_server.py`:
1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to
`flow: "device_code"`. From a UX perspective MiniMax is a
verification-URI + user-code flow (open URL, enter code, backend
polls) — same shape as Nous's device-code flow. The PKCE bit
(verifier + challenge from `_minimax_pkce_pair`) is a security
extension that doesn't change the operator experience; the existing
dashboard modal already renders `device_code` correctly for this UX.
2. New MiniMax branch in `_start_device_code_flow`, mirroring the
existing Nous branch but calling MiniMax-specific helpers
(`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes
verifier + state in the session for the poller to consume. Handles
the overloaded `expired_in` field (could be unix-ms timestamp OR
seconds-from-now duration) the same way `_minimax_poll_token` does.
3. New `_minimax_poller` background thread mirroring `_nous_poller`.
Calls `_minimax_poll_token` → on success builds the same
`auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and
persists via `_minimax_save_auth_state` so the dashboard path leaves
the system in the same state as `hermes auth add minimax-oauth`.
Plus a dispatcher tightening to prevent regression: the `pkce` branch
now requires `provider_id == "anthropic"`, so any future PKCE provider
added without a proper start function gets a clean
`400 Unsupported flow` rather than silently launching Anthropic OAuth.
## Test
New `tests/hermes_cli/test_web_oauth_dispatch.py`:
- Regression test asserting MiniMax start does NOT return claude.ai
- Sanity test that Anthropic PKCE still works after the dispatcher
tightening
- Forward-looking test: a hypothetical pkce-flagged provider without
an explicit branch is rejected cleanly rather than misrouted
## Limitations
- The dashboard MiniMax path defaults to `region="global"`. CN-region
operators can still use the CLI flow which supports `--region cn`.
Adding a region toggle to the dashboard UI is a follow-up.
Follow-up to #23863 (CJK table alignment). The realigner was
correctly padding pipes to identical column offsets, but when a
table's natural width exceeds terminal cells it produced lines that
the terminal soft-wrapped mid-cell, destroying column alignment
visually even though the bytes were perfectly padded. Reported as
'columns are not aligned' on tables containing one long row alongside
several short rows.
Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal
fallback: when realign_markdown_tables is given an available_width
budget and the rebuilt horizontal table exceeds it, render each body
row as 'Header: value' lines separated by a thin ─ rule. Word-wraps
oversize values at the budget with a 2-space continuation indent.
- agent/markdown_tables.py: realign_markdown_tables(text, available_width=None);
threshold check at the top of _render_block flips into a new
_render_vertical fallback. Includes _wrap_to_width with hard-break
for tokens longer than the budget.
- cli.py: helper _terminal_width_for_streaming() returns
shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell
safety margin; passed to all three realign call sites
(_render_final_assistant_content for strip+render Panel paths, and
the streaming flushers in _emit_stream_text / _flush_stream).
- tests/agent/test_markdown_tables.py: 4 new tests covering the
overflow-vertical fallback for ASCII + CJK content, the
'fits → keep horizontal' case, and the long-cell wrap with indent.
Live-verified: with COLUMNS=100, the user's reported 'long row in
ASCII table' case now renders as vertical key-value rows that all fit
the panel; the 6-column CJK comparison table still renders as an
aligned horizontal table because it fits inside 100 cols.
* feat(ui-tui): resolve links to readable page titles
Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.
* refactor(ui-tui): tighten link-title fallback handling
Clean up the link-title resolver by hardening in-flight cleanup and clarifying title length limits, while adding focused coverage for HTML entity decoding and markdown-label fallback behavior.
* fix(ui-tui): block private-network targets in title fetches
Prevent automatic link-title resolution from requesting local or private hosts by rejecting RFC1918, link-local, ULA, and intranet-style hostnames before fetch, and add regression coverage for blocked host patterns.
The old mtime-tracking staleness machinery (_tui_build_needed,
_hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding
by comparing source timestamps to dist/entry.js. This was fragile and
added ~100 lines of code. Replace with three clear paths:
1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build
2. --dev mode: tsx src/entry.tsx, no build, hot reload
3. Normal: always npm run build (esbuild is ~1s, correctness > caching)
Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt
bundle has no source code to hot-reload).
Based on PR #23950 by @nicoechaniz.
- Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding
- Gate OpenRouter metadata step behind "if not effective_provider":
known providers should not be overridden by community-maintained OR data
- Keep the targeted Kimi-family 32k guard as a secondary safety net
inside the OR gate (for unknown providers with Kimi models)
Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>
Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.
Three fixes in the context-length resolution chain:
1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
queries the Ollama native API for authoritative GGUF model_info
context_length. For hosted Ollama, prefers model_info over num_ctx
since users can't set their own num_ctx on Cloud. Added at step 5e
in get_model_context_length(), before the models.dev fallback.
2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
now also tries appending :cloud and -cloud suffixes when the bare
model name doesn't match. models.dev stores 'kimi-k2.6:cloud' but
users and the live API use bare 'kimi-k2.6'.
3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
through to hardcoded defaults ('kimi': 262144). OpenRouter reports
32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
Narrow filter — only 32768, only Kimi-family — becomes dead code
when OpenRouter updates its metadata.
---
Set HERMES_SESSION_ID using the existing session_context.py ContextVar
system for concurrency safety (multiple gateway sessions in one process
won't cross-talk). Also writes os.environ as fallback for CLI mode.
Touchpoints:
- gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry
- run_agent.py: Set both ContextVar and os.environ at init and on
context-compression rotation
- tools/environments/local.py: Bridge ContextVars into subprocess env
in _make_run_env() (ContextVars don't propagate to child processes)
- tests/run_agent/test_session_id_env.py: 3 tests covering env, provided
ID, and ContextVar paths
execute_code subprocess already passes HERMES_* prefixed vars through
_scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_').
Primary use case: webhook-triggered agents that need to include a
`--resume <session_id>` takeover command in their output.
Cuts input cost for first-turn Claude requests by ~85-90% on subsequent
sessions within an hour. Tools array (~13k tokens for default toolset) +
stable system prefix (~5-8k tokens) get a 1h cache_control marker; the
volatile suffix (memory, USER profile, timestamp, session id) sits in a
separate non-cached block at the end so it doesn't poison the cross-session
prefix when it changes.
Provider gate: Claude on native Anthropic (incl. OAuth subscription),
OpenRouter, and Nous Portal (which proxies to OpenRouter). All other
providers keep today's system_and_3 layout unchanged.
Layout (4 cache_control breakpoints, Anthropic max):
1. tools[-1] -> 1h (cross-session)
2. system content[0] -> 1h (cross-session, stable prefix)
3. messages[-2] -> 5m (within-session rolling)
4. messages[-1] -> 5m (within-session rolling)
Within-session rolling shrinks from 3 messages to 2 to free the breakpoint
budget. On Claude with realistic tool loadouts the long-lived tier carries
the bulk of cross-session value anyway.
System prompt is now always assembled cache-friendly: stable identity /
guidance / skills / platform hints first, then session-stable context
files (AGENTS.md, .cursorrules), then per-call volatile content. Old
single-string callers see the same logical content (same join order),
just reordered so volatile lives at the end.
Config knobs (defaults shown):
prompt_caching:
cache_ttl: "5m" # rolling-window TTL (unchanged)
long_lived_prefix: true # opt-out switch
long_lived_ttl: "1h" # cross-session prefix TTL
Live E2E (tests/agent/test_prompt_caching_live.py, gated on
OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:
Call 1 (cold): cache_write=13,415 cache_read=0
Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025
Cross-session reuse: 97.09%
Implementation:
* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived()
+ mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control()
preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards
cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict;
_build_system_prompt() joins them (backward compatible).
_supports_long_lived_anthropic_cache() policy added next to the existing
_anthropic_prompt_cache_policy() (which now also recognises Nous Portal
Claude — pre-existing gap fixed in passing).
_build_api_kwargs() resolves tools_for_api once and propagates the
marker through all four build paths (anthropic_messages, bedrock,
codex_responses, profile/legacy chat completions).
Long-lived flag plumbed into the runtime snapshot/restore + model-switch
+ fallback-promotion paths.
Tests:
* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache,
TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests
(TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes
+ a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when
OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing
flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry,
verified failing on pristine origin/main).
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
#23482 fixed cache poisoning in the sync path: when a Codex auxiliary
timeout closes the underlying OpenAI client, _evict_cached_client_instance
walks CodexAuxiliaryClient wrappers via their _real_client attribute and
drops the cache entry so the next aux call rebuilds.
The cache key includes async_mode (see _client_cache_key), so the sync and
async clients for the same provider live in two distinct entries pointing
at the same underlying transport. The fix walked the sync wrapper's
_real_client correctly but the async wrappers
(AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient,
AsyncGeminiNativeClient) never exposed _real_client at all, so the async
entry survived eviction and kept handing out the poisoned client.
Effect on async aux callers: one timeout now poisons every subsequent
async aux call (compression, vision, session_search, title_generation)
with 'Connection error' until gateway restart -- even while the sync
route recovered as designed in #23482.
Mirror the sync wrapper's _real_client onto each async wrapper so the
existing eviction helper finds them. Three changes, one per wrapper:
- AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client
(the underlying OpenAI client)
- AsyncAnthropicAuxiliaryClient: same shape
- AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's
native facade is itself the leaf; no OpenAI client beneath it)
Update _evict_cached_client_instance docstring to reflect that it now
covers both sync and async wrappers via the same attribute walk.
Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper
seeds both sync and async cache entries pointing at the same leaf and
asserts both are dropped on a single eviction call. Verified the test
fails without the wrapper changes ("async cache entry survived
eviction -- wrapper is missing _real_client") and passes with them.
Refs #23482, #23432
CJK and emoji glyphs render as two terminal cells but JS String#length
and the model's own padding count them as one, so any markdown table
with Chinese / Japanese / Korean cells drifts right per row when a
real terminal renders it. Both surfaces fix this with a display-cell
width measurement (wcswidth on the Python side, stringWidth on the
TUI side).
Changes:
- agent/markdown_tables.py: new helper. realign_markdown_tables(text)
detects markdown table blocks (header + |---| divider) and
rewrites the row padding using wcwidth.wcswidth so every pipe and
dash lines up across rows. No-op on text without tables.
- cli.py: hook the helper into _render_final_assistant_content for
strip / render modes (raw passes through untouched), and into the
streaming line emitter so live token-by-token rendering also
produces aligned tables. A small two-buffer state machine in
_emit_stream_text holds table rows until the block ends, then
flushes them through the realigner so all rows pad to a single
per-column width.
- ui-tui/src/components/markdown.tsx: renderTable now uses
stringWidth (Bun.stringWidth fast path + East-Asian-width-aware
fallback, already memoised in @hermes/ink) instead of UTF-16
String#length for both column-width measurement and per-cell
padding. Drops the comment that documented the bug as a deliberate
limitation.
Validation:
- New tests/agent/test_markdown_tables.py (11): every rebuilt block
shares pipe column offsets across rows for pure CJK, mixed
CJK+emoji, ragged-row, and multi-table inputs.
- Updated tests/cli/test_cli_markdown_rendering.py: the existing
strip-mode test asserted exact whitespace; rewritten to assert the
alignment contract (cell content survives + every rendered row
shares pipe offsets).
- New ui-tui markdown.test.ts case (1): rendered column-2 start
offset is identical for the header + every body row, including
the CJK row that drifted before the fix.
- Live: hermes chat -q with the user-reported screenshot prompt now
produces a perfectly aligned table on the wire (header, divider,
4 body rows including '通义千问', all pipes at identical columns).
The /model picker for Nous Portal users was returning the in-repo
_PROVIDER_MODELS["nous"] snapshot — which only updates on Hermes
releases — instead of the remote manifest published at
https://hermes-agent.nousresearch.com/docs/api/model-catalog.json.
OpenRouter already pulled from the manifest via fetch_openrouter_models;
"nous" was the only curated provider where the existing manifest
plumbing (get_curated_nous_model_ids → get_curated_nous_models) was
defined but not wired into the picker pipeline. Switch the curated
build in list_authenticated_providers to use it, with the same
graceful fallback to the in-repo snapshot when the manifest is
unreachable.
Test: tests/hermes_cli/test_model_catalog.py exercises the picker with
a patched manifest and asserts the manifest's nous list reaches
list_picker_providers. Falls-back-to-static path was already covered
by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog.
- getattr(self, '_slash_confirm_state', None) at the two read sites that
trip object.__new__(HermesCLI) test fixtures (test_cli_external_editor,
test_cli_skin_integration)
- _build_tui_layout_children: make slash_confirm_widget keyword-only with
default None to avoid breaking subclassing extension hook for wrapper
CLIs (test_cli_extension_hooks)
- AUTHOR_MAP entry for zhengyn0001
Follow-up to the salvaged commit ca1d4375a.
Follow-up to PR #23824. Adds two correctness fixes on top of the
contributor's salvaged commit:
1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard`
passes `fatal=True` and is the primary scenario this fallback is for
(issue #23817 — Windows Scheduled Task at logon). The previous gate
meant the fallback never fired in the case it was designed for.
2. `--skip-build` now verifies the dist actually exists before starting
the server. Without this, a misconfigured pre-build would launch the
dashboard pointing at a missing dist and silently serve 404s. We now
exit 1 with a clear "pre-build first: cd web && npm run build"
message, and on success print which dist directory is being used.
Verified end-to-end on Linux:
- build fails + stale dist (fatal=True) -> fallback fires
- build fails + no dist (fatal=True) -> exit 1 with stderr surfaced
- build fails + stale dist (fatal=False) -> fallback fires
- --skip-build + missing dist -> exit 1 with clear guidance
- --skip-build + valid dist -> 'Skipping web UI build...'
Three improvements for non-interactive contexts (Windows Scheduled
Tasks, CI/CD) where the web UI build may fail (issue #23817):
1. Retry build once after 3s — covers boot-time races (antivirus
scanning Node.js, npm cache not ready, transient disk I/O)
2. Fall back to existing dist when build fails (non-fatal mode) —
a stale UI is far better than no UI at all
3. Add --skip-build flag — lets callers pre-build in their wrapper
script and start the dashboard without internal build attempt
4. Surface npm stderr in build failure output for easier debugging
Fixes#23817
On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning.
This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows.
The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics.
When the user's main provider is openai-codex on the ChatGPT-account
backend (https://chatgpt.com/backend-api/codex), sending a native image
attachment encodes it as data:image/...base64,... in the input_image
field. The OpenAI Responses API on the public endpoint accepts that, but
the ChatGPT-account variant rejects it with HTTP 400:
Invalid 'input[N].content[K].image_url'. Expected a valid URL, but got
a value with an invalid format.
Hermes' image-rejection phrase list didn't include this wording, so the
error escaped the strip-and-retry branch and fell through to the generic
recovery path: model fallback → context-too-large → compression cascade
→ auxiliary OpenRouter 402 spam (issue #23570).
Add a NARROW phrase keyed on the field-path apostrophe used by the Codex
Responses error format: "image_url'. expected". This matches the actual
error format without false-tripping on generic 'Expected a valid URL'
errors from unrelated tools (webhooks, redirect_uri, etc.). Once matched,
the existing branch strips images from history, sets _vision_supported=
False for the session, and retries text-only.
Refs #23570 (1 of 3 image-replay improvements; persistence rewrite to
store image PATHS instead of inlined base64 is a separate follow-up)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"
This reverts commit a63a2b7c78.
* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"
This reverts commit 4a080b1d5a.
* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"
This reverts commit 404640a2b7.