Commit graph

809 commits

Author SHA1 Message Date
rob-maron
2863e9484a
Use nous portal as model metadata authority (#24502)
* nous portal metadata resolver

* minor fixes
2026-05-12 11:59:31 -07:00
Teknium
c1eb2dcda7
feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220)
* feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback

Three coordinated mitigations for the Mini Shai-Hulud worm hitting
mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package
compromise that follows.

# What this PR makes true

1. Users with the poisoned mistralai 2.4.6 in their venv get a loud
   detection banner with copy-pasteable remediation steps the moment
   they run hermes (and on every gateway startup).
2. One quarantined / yanked PyPI package can no longer silently demote
   a fresh install to 'core only' — the installer keeps every other
   extra and tells the user which tier landed.
3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can
   lazy-install on first use under a strict allowlist, instead of
   eagerly pulling everything at install time.

# Detection: hermes_cli/security_advisories.py

- ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for
  mistralai==2.4.6). Adding the next one is a single dataclass.
- detect_compromised() uses importlib.metadata.version() — no pip
  dependency, works in uv venvs that lack pip.
- Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits
  the startup banner to once per 24h per advisory.
- Acks persisted to security.acked_advisories in config.yaml; never
  re-banner after ack.
- Wired into:
  * hermes doctor — runs first, prints full remediation block
  * hermes doctor --ack <id> — dismisses an advisory
  * cli.py interactive run() and single-query branches — short
    stderr banner pointing at hermes doctor
  * gateway/run.py startup — operator-visible warning in gateway.log

# Lazy-install framework: tools/lazy_deps.py

- LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs,
  memory.honcho, provider.bedrock, etc.) to pip specs.
- ensure(feature) installs missing deps in the active venv via the
  uv → pip → ensurepip ladder (matches tools_config._pip_install).
- Strict spec safety regex rejects URLs, file paths, shell metas,
  pip flag injection, control chars — only PyPI-by-name accepted.
- Gated on security.allow_lazy_installs (default true) plus the
  HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs.
- Migrated three backends as proof of pattern:
  * tools/tts_tool.py — _import_elevenlabs() calls ensure first
  * plugins/memory/honcho/client.py — get_honcho_client lazy-installs
  * tts.mistral / stt.mistral entries pre-registered for when PyPI
    restores mistralai

# Installer fallback tiers

scripts/install.sh, scripts/install.ps1, setup-hermes.sh:

- Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one
  array when a transitive breaks; users keep every other extra.
- New 'all minus known-broken' tier between [all] and the existing
  PyPI-only-extras tier. Only kicks in when [all] fails resolve.
- All three tiers explicit: every fallback announces which tier
  landed and prints a re-run hint when not on Tier 1.
- install.ps1 and install.sh both regenerate their tier specs from
  the same _BROKEN_EXTRAS array so updates stay in sync.

Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral'
in its extra list — bug fixed by the refactor (mistral is filtered
out).

# Config

hermes_cli/config.py — DEFAULT_CONFIG.security gains:
- acked_advisories: []  (advisory IDs the user has dismissed)
- allow_lazy_installs: True  (security gate for ensure())

No config version bump needed — both keys nest under existing
security: block, and load_config's deep-merge picks up DEFAULT_CONFIG
defaults for users with older configs.

# Tests

tests/hermes_cli/test_security_advisories.py — 23 tests covering:
- detect_compromised matches/non-matches, wildcard frozenset
- ack persistence, idempotence, blank rejection, config-failure path
- banner cache rate limiting + 24h re-banner + ack-stops-banner
- short_banner_lines / full_remediation_text / render_doctor_section /
  gateway_log_message
- shipped catalog well-formedness invariant

tests/tools/test_lazy_deps.py — 40 tests covering:
- spec safety: 11 safe parametrized + 18 unsafe parametrized
- allowlist: unknown-feature rejection, namespace.name shape,
  every shipped spec passes the safety regex
- security gating: config flag, env var, default, fail-open
- ensure() happy/sad paths: already-satisfied, install success,
  pip stderr surfaced on failure, install-succeeds-but-still-missing
- is_available, feature_install_command

Combined: 63 new tests, all passing under scripts/run_tests.sh.

# Validation

- scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py
  tests/tools/test_lazy_deps.py → 63/63 passing
- scripts/run_tests.sh tests/hermes_cli/test_doctor.py
  tests/hermes_cli/test_doctor_command_install.py
  tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py
  tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing
- scripts/run_tests.sh tests/hermes_cli/ tests/tools/ →
  9191 passed, 8 pre-existing failures (verified on origin/main
  before this change)
- bash -n on install.sh and setup-hermes.sh → OK
- py_compile on all modified .py files → OK
- End-to-end smoke test of detect_compromised + render_doctor_section
  + gateway_log_message with mocked installed version → produces
  copy-pasteable remediation output

# Community

Full advisory + remediation steps:
website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md

Short-form post drafts (Discord, GitHub pinned issue, README banner):
scripts/community-announcement-shai-hulud.md

Refs: PR #24205 (mistral disabled), Socket Security advisory
<https://socket.dev/blog/mini-shai-hulud-worm-pypi>

* build(deps): pin every direct dep to ==X.Y.Z (no ranges)

Companion to the supply-chain advisory work: replace every >=/</~= range
in pyproject.toml's [project.dependencies] and [project.optional-dependencies]
with an exact ==X.Y.Z pin sourced from uv.lock.

Why: ranges allow PyPI to ship a fresh version of any direct dep at any
time without a code review on our side. With ranges, the malicious
mistralai 2.4.6 release would have been pulled by every fresh
'pip install -e .[all]' for the hours between upload and PyPI's
quarantine — exactly the install window we got hit on. Exact pins close
that window: the only way a new package version reaches a user is via
an intentional update on our end.

What the user-facing change is: nothing, behavior-wise. Every package
resolves to the same version it was already resolving to via uv.lock —
the pins just remove the resolver's freedom to pick a different one.

Cost: any user installing Hermes alongside another package that requires
a newer pin gets a resolver conflict. Acceptable for our isolated-venv
install path; documented in the new comment block.

Build-system requires line (setuptools>=61.0) is intentionally left
as a range — pinning the build backend would block fresh pip from
bootstrapping the build on architectures where that exact wheel isn't
available.

mistral extra (mistralai==2.3.0) is pinned but stays out of [all]
(per PR #24205). 'uv lock' regeneration will fail until PyPI restores
mistralai; lockfile regeneration is gated behind that, NOT on every PR.

LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy-
install pathway can never resolve a different version than the one
declared in pyproject.toml.

Validation:

- Cross-checked all 77 pinned direct deps in pyproject.toml against
  uv.lock — every pin matches the resolved version exactly.
- Cross-checked all LAZY_DEPS specs against uv.lock — same.
- 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly.
- tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py
  → 63/63 passing (every shipped spec passes the safety regex).
- Doctor + TTS + transcription targeted suite → 146/146 passing.

* build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra

You asked: 'what about the dependencies the dependencies rely on?' —
correctly noting that exact-pinning direct deps in pyproject.toml does
NOT cover the transitive graph. `pip install` and `uv pip install` both
re-resolve transitives fresh from PyPI at install time, so a compromised
transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would
still hit our users even with every direct dep exact-pinned.

# What this commit fixes

1. **Both real installer scripts now prefer `uv sync --locked` as Tier 0.**
   uv.lock records SHA256 hashes for every transitive — a compromised
   package with a different hash gets REJECTED. Falls through to the
   existing `uv pip install` cascade if the lockfile is missing or
   stale, with a loud warning that the fallback path does NOT
   hash-verify transitives. Previously only `setup-hermes.sh` (the dev
   path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1`
   (the paths fresh users actually run) skipped it.

2. **Removed the `[mistral]` extra entirely.** The `mistralai` PyPI
   project is fully quarantined right now — every version returns 404,
   so any pin we wrote was unresolvable, which broke `uv lock --check`
   in CI. Restoration is documented in pyproject.toml as a 5-step
   checklist (verify, re-add extra, re-enable in 4 modules, regenerate
   lock, optionally re-add to [all]).

3. **Regenerated uv.lock.** 262 packages, mistralai/eval-type-backport/
   jsonpath-python pruned. `uv lock --check` now passes.

# Defense-in-depth view

| Layer                      | Where             | Protects against                          |
|----------------------------|-------------------|-------------------------------------------|
| Exact pins in pyproject    | direct deps       | new mistralai 2.4.6-style direct compromise |
| uv.lock + `--locked` install | transitive graph  | transitive worm injection                  |
| Tier-0 hash-verified path  | install.sh / .ps1 | actually USE the lockfile in fresh installs |
| `uv lock --check` CI gate  | every PR          | drift between pyproject and lockfile      |
| `hermes_cli/security_advisories.py` | runtime  | cleanup for users who already got hit      |

The exact pinning + hash verification together close the supply-chain
gap. Without the lockfile path, exact pins alone are theater.

# Validation

- `uv lock --check` → passes (262 packages resolved, no drift).
- `bash -n` on install.sh + setup-hermes.sh → OK.
- 209/209 tests passing across new + adjacent test files
  (test_lazy_deps.py, test_security_advisories.py, test_doctor.py,
  test_tts_mistral.py, test_transcription_tools.py).
- TOML parse OK.

* chore: remove community announcement drafts (PR body covers it)

* build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard)

Extends the lazy-install framework to cover everything that's not used by
every hermes session. Base install drops from ~60 packages to 45.

Moved out of core dependencies = []:
- anthropic   (only when provider=anthropic native, not via aggregators)
- exa-py, firecrawl-py, parallel-web (search backends; only when picked)
- fal-client  (image gen; only when picked)
- edge-tts    (default TTS but still optional)

New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web]
[fal] [edge-tts]. All added to [all].

New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel},
tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix},
terminal.{modal,daytona,vercel}, tool.dashboard.

Each import site now calls ensure() before importing the SDK. Where the
module had a top-level try/except (telegram, discord, fastapi), the
graceful-fallback pattern was extended to lazy-install on first
check_*_requirements() call and re-bind module globals.

Updated test_windows_native_support.py tzdata check from snapshot
(>=2023.3 literal) to invariant (any version + win32 marker).

Validation:
- Base install: 45 packages (was ~60); 6 newly-extracted packages absent
- uv lock --check: passes (262 packages, no drift)
- 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing
- py_compile clean on all 12 modified modules
2026-05-12 01:02:25 -07:00
Robin Fernandes
94d9db72ba add client marker tag on aux inference requests 2026-05-11 22:30:42 -07:00
rob-maron
32abe742fa fix comment 2026-05-11 21:30:29 -07:00
rob-maron
f0c2964f0b remove comments 2026-05-11 21:30:29 -07:00
rob-maron
057fc7b073 fix guard 2026-05-11 21:30:29 -07:00
rob-maron
528bba6734 fix kimi 2026-05-11 21:30:29 -07:00
Teknium
ea1d0462cf
fix(cli): vertical fallback for markdown tables wider than terminal (#23948)
Follow-up to #23863 (CJK table alignment). The realigner was
correctly padding pipes to identical column offsets, but when a
table's natural width exceeds terminal cells it produced lines that
the terminal soft-wrapped mid-cell, destroying column alignment
visually even though the bytes were perfectly padded. Reported as
'columns are not aligned' on tables containing one long row alongside
several short rows.

Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal
fallback: when realign_markdown_tables is given an available_width
budget and the rebuilt horizontal table exceeds it, render each body
row as 'Header: value' lines separated by a thin ─ rule. Word-wraps
oversize values at the budget with a 2-space continuation indent.

- agent/markdown_tables.py: realign_markdown_tables(text, available_width=None);
  threshold check at the top of _render_block flips into a new
  _render_vertical fallback. Includes _wrap_to_width with hard-break
  for tokens longer than the budget.
- cli.py: helper _terminal_width_for_streaming() returns
  shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell
  safety margin; passed to all three realign call sites
  (_render_final_assistant_content for strip+render Panel paths, and
  the streaming flushers in _emit_stream_text / _flush_stream).
- tests/agent/test_markdown_tables.py: 4 new tests covering the
  overflow-vertical fallback for ASCII + CJK content, the
  'fits → keep horizontal' case, and the long-cell wrap with indent.

Live-verified: with COLUMNS=100, the user's reported 'long row in
ASCII table' case now renders as vertical key-value rows that all fit
the panel; the 6-column CJK comparison table still renders as an
aligned horizontal table because it fits inside 100 cols.
2026-05-11 16:49:13 -07:00
nicoechaniz
e2b713cced fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV
Based on PR #23950 by @nicoechaniz.

- Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding
- Gate OpenRouter metadata step behind "if not effective_provider":
  known providers should not be overridden by community-maintained OR data
- Keep the targeted Kimi-family 32k guard as a secondary safety net
  inside the OR gate (for unknown providers with Kimi models)

Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>
2026-05-11 13:16:07 -07:00
kshitijk4poor
91eef6255e fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding
Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.

Three fixes in the context-length resolution chain:

1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
   queries the Ollama native API for authoritative GGUF model_info
   context_length.  For hosted Ollama, prefers model_info over num_ctx
   since users can't set their own num_ctx on Cloud.  Added at step 5e
   in get_model_context_length(), before the models.dev fallback.

2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
   now also tries appending :cloud and -cloud suffixes when the bare
   model name doesn't match.  models.dev stores 'kimi-k2.6:cloud' but
   users and the live API use bare 'kimi-k2.6'.

3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
   exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
   through to hardcoded defaults ('kimi': 262144).  OpenRouter reports
   32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
   Narrow filter — only 32768, only Kimi-family — becomes dead code
   when OpenRouter updates its metadata.

---
2026-05-11 13:16:07 -07:00
Teknium
7b76366552
feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828)
Cuts input cost for first-turn Claude requests by ~85-90% on subsequent
sessions within an hour. Tools array (~13k tokens for default toolset) +
stable system prefix (~5-8k tokens) get a 1h cache_control marker; the
volatile suffix (memory, USER profile, timestamp, session id) sits in a
separate non-cached block at the end so it doesn't poison the cross-session
prefix when it changes.

Provider gate: Claude on native Anthropic (incl. OAuth subscription),
OpenRouter, and Nous Portal (which proxies to OpenRouter). All other
providers keep today's system_and_3 layout unchanged.

Layout (4 cache_control breakpoints, Anthropic max):
  1. tools[-1]              -> 1h (cross-session)
  2. system content[0]      -> 1h (cross-session, stable prefix)
  3. messages[-2]           -> 5m (within-session rolling)
  4. messages[-1]           -> 5m (within-session rolling)

Within-session rolling shrinks from 3 messages to 2 to free the breakpoint
budget. On Claude with realistic tool loadouts the long-lived tier carries
the bulk of cross-session value anyway.

System prompt is now always assembled cache-friendly: stable identity /
guidance / skills / platform hints first, then session-stable context
files (AGENTS.md, .cursorrules), then per-call volatile content. Old
single-string callers see the same logical content (same join order),
just reordered so volatile lives at the end.

Config knobs (defaults shown):
  prompt_caching:
    cache_ttl: "5m"           # rolling-window TTL (unchanged)
    long_lived_prefix: true    # opt-out switch
    long_lived_ttl: "1h"       # cross-session prefix TTL

Live E2E (tests/agent/test_prompt_caching_live.py, gated on
OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:
  Call 1 (cold):              cache_write=13,415  cache_read=0
  Call 2 (NEW agent + msg):   cache_write=391     cache_read=13,025
  Cross-session reuse:        97.09%

Implementation:
* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived()
  + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control()
  preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards
  cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict;
  _build_system_prompt() joins them (backward compatible).
  _supports_long_lived_anthropic_cache() policy added next to the existing
  _anthropic_prompt_cache_policy() (which now also recognises Nous Portal
  Claude — pre-existing gap fixed in passing).
  _build_api_kwargs() resolves tools_for_api once and propagates the
  marker through all four build paths (anthropic_messages, bedrock,
  codex_responses, profile/legacy chat completions).
  Long-lived flag plumbed into the runtime snapshot/restore + model-switch
  + fallback-promotion paths.

Tests:
* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache,
  TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests
  (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes
  + a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when
  OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing
  flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry,
  verified failing on pristine origin/main).
2026-05-11 11:14:56 -07:00
kshitij
2ec8d2b42f
chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace  with  for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.

608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
2026-05-11 11:13:25 -07:00
wuli666
111b859e49 fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482)
#23482 fixed cache poisoning in the sync path: when a Codex auxiliary
timeout closes the underlying OpenAI client, _evict_cached_client_instance
walks CodexAuxiliaryClient wrappers via their _real_client attribute and
drops the cache entry so the next aux call rebuilds.

The cache key includes async_mode (see _client_cache_key), so the sync and
async clients for the same provider live in two distinct entries pointing
at the same underlying transport. The fix walked the sync wrapper's
_real_client correctly but the async wrappers
(AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient,
AsyncGeminiNativeClient) never exposed _real_client at all, so the async
entry survived eviction and kept handing out the poisoned client.

Effect on async aux callers: one timeout now poisons every subsequent
async aux call (compression, vision, session_search, title_generation)
with 'Connection error' until gateway restart -- even while the sync
route recovered as designed in #23482.

Mirror the sync wrapper's _real_client onto each async wrapper so the
existing eviction helper finds them. Three changes, one per wrapper:

- AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client
  (the underlying OpenAI client)
- AsyncAnthropicAuxiliaryClient: same shape
- AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's
  native facade is itself the leaf; no OpenAI client beneath it)

Update _evict_cached_client_instance docstring to reflect that it now
covers both sync and async wrappers via the same attribute walk.

Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper
seeds both sync and async cache entries pointing at the same leaf and
asserts both are dropped on a single eviction call. Verified the test
fails without the wrapper changes ("async cache entry survived
eviction -- wrapper is missing _real_client") and passes with them.

Refs #23482, #23432
2026-05-11 11:13:20 -07:00
Teknium
1d00716754
fix(cli,tui): align CJK / wide-char markdown tables (#23863)
CJK and emoji glyphs render as two terminal cells but JS String#length
and the model's own padding count them as one, so any markdown table
with Chinese / Japanese / Korean cells drifts right per row when a
real terminal renders it. Both surfaces fix this with a display-cell
width measurement (wcswidth on the Python side, stringWidth on the
TUI side).

Changes:
- agent/markdown_tables.py: new helper. realign_markdown_tables(text)
  detects markdown table blocks (header + |---| divider) and
  rewrites the row padding using wcwidth.wcswidth so every pipe and
  dash lines up across rows. No-op on text without tables.
- cli.py: hook the helper into _render_final_assistant_content for
  strip / render modes (raw passes through untouched), and into the
  streaming line emitter so live token-by-token rendering also
  produces aligned tables. A small two-buffer state machine in
  _emit_stream_text holds table rows until the block ends, then
  flushes them through the realigner so all rows pad to a single
  per-column width.
- ui-tui/src/components/markdown.tsx: renderTable now uses
  stringWidth (Bun.stringWidth fast path + East-Asian-width-aware
  fallback, already memoised in @hermes/ink) instead of UTF-16
  String#length for both column-width measurement and per-cell
  padding. Drops the comment that documented the bug as a deliberate
  limitation.

Validation:
- New tests/agent/test_markdown_tables.py (11): every rebuilt block
  shares pipe column offsets across rows for pure CJK, mixed
  CJK+emoji, ragged-row, and multi-table inputs.
- Updated tests/cli/test_cli_markdown_rendering.py: the existing
  strip-mode test asserted exact whitespace; rewritten to assert the
  alignment contract (cell content survives + every rendered row
  shares pipe offsets).
- New ui-tui markdown.test.ts case (1): rendered column-2 start
  offset is identical for the header + every body row, including
  the CJK row that drifted before the fix.
- Live: hermes chat -q with the user-reported screenshot prompt now
  produces a perfectly aligned table on the wire (header, divider,
  4 body rows including '通义千问', all pipes at identical columns).
2026-05-11 11:13:06 -07:00
kshitij
657874460f
chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances — else: if: → elif:
PLR1730 (if-stmt-min-max):   15 instances — if x<y: x=y → x=max(x,y)
C420   (dict.fromkeys):       2 instances — dictcomp → dict.fromkeys
PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix)
C414   (unnecessary-list):    1 instance — sorted(list(x)) → sorted(x)

28 files, -44 net lines. All mechanical, zero logic changes.
17,211 tests pass, zero regressions.
2026-05-11 11:03:29 -07:00
Teknium
228b7d27bd
fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms (#23597)
When an auxiliary provider returns HTTP 402 (credit / payment), every
subsequent compression / title-gen / session-search / vision call still
re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402
again, then falling back. On a long Discord/LCM session that meant dozens
of doomed 402s per minute (issue #23570).

Add a per-process unhealthy-provider cache with a 10 min TTL. When any
caller observes a payment error against a provider, the label is marked
unhealthy and skipped by:
  * _resolve_auto Step-1 (main provider use-as-aux path)
  * _resolve_auto Step-2 (aggregator/fallback chain)
  * _try_payment_fallback (used by call_llm/acall_llm on first 402)

Skip-logs are throttled to once per minute per label so a bursty session
doesn't spam agent.log. Entries auto-expire so a topped-up account
recovers without manual intervention. The cache is in-process only by
design — multi-profile users with different keys per profile must each
hit the 402 once.

Refs #23570
2026-05-10 22:43:14 -07:00
Teknium
e5bce320db
fix(auxiliary): evict cached client on timeout/connection error (#23482)
A Codex auxiliary timeout closes the underlying OpenAI client (so the
streaming hang doesn't sit until the user kills the session), but the
cached wrapper kept pointing at the now-dead transport. Subsequent
auxiliary calls (compression retry, memory flush, background review,
title generation routed via provider: main) reused that closed client
and failed fast with 'Connection error' until the gateway restarted —
even though the main agent route was healthy the whole time.

Sync `_get_cached_client` had no liveness check (async did, via loop
identity), and the connection-error fallback in `call_llm` only fired
on the auto provider path, so an explicit provider — including the
common `auxiliary.compression.provider: main` shape — never evicted.

Three fixes:

* New `_evict_cached_client_instance(target)` helper that drops the
  cache entry whose stored client is target (or wraps it via
  `_real_client`, for `CodexAuxiliaryClient`).
* `_CodexCompletionsAdapter._close_client_on_timeout` evicts the
  wrapper after closing the inner OpenAI client.
* `call_llm` and `async_call_llm` evict on `_is_connection_error`
  before re-raising, regardless of whether the provider is auto.

Net effect: one timeout costs one summary attempt + the existing 30s
compressor cooldown; the next compaction rebuilds the client and
works. Non-connection errors (4xx/5xx) do not evict, so cache hits
stay stable.

Closes #23432
2026-05-10 18:55:05 -07:00
Teknium1
ae83a54be4 docs(kanban): worker lane contract page + review-required convention
Closes the architectural-pin part of #19931. Most of what that issue
asked for is already implemented (logs under kanban root, env-pinned
workspace, dispatcher routing of unknown assignees, lifecycle
ownership, structured handoff conventions). What was missing:

1. A written contract integrators can point at when adding a new
   worker lane shape, and
2. The "code-changing workers should not auto-promote success to
   done" convention.

This commit ships both as docs+convention layered on existing primitives.
No kernel changes — the kanban_complete / kanban_block / kanban_comment
surfaces already support the review-required pattern; we just hadn't
written it down or made it visible to workers.

Changes:

- `agent/prompt_builder.py::KANBAN_GUIDANCE`: append the review-required
  exception to step 5 of the lifecycle. Workers get the cue
  auto-injected into their system prompt — drop structured metadata
  into a kanban_comment first, then end with
  kanban_block(reason="review-required: <summary>") instead of
  kanban_complete when the work needs review. Total prompt size went
  from ~3000 to ~3275 chars; well under the 4096 budget enforced by
  test_kanban_guidance_size.

- `skills/devops/kanban-worker/SKILL.md`: add a worked example to the
  existing "Good summary + metadata shapes" section between the
  Coding-task and Research-task examples. Same shape as the others
  (kanban_comment with structured handoff JSON, then kanban_block with
  the human-readable reason). Plus a one-line guide on when to use
  kanban_complete vs the review-required pattern.

- `website/docs/user-guide/features/kanban-worker-lanes.md` (new): the
  integrator-facing contract. Covers the hierarchy, the three things
  every lane must provide (assignee, spawn mechanism, lifecycle
  terminator), the env vars the dispatcher injects, the
  review-required convention, the failure modes the kernel handles
  for free, and an explicit "external CLI worker lane" deferred-
  pending-concrete-asker section that links to #19931 and #19924.

- `website/sidebars.ts`: link the new page under user-guide/features.

The "specialist worker lanes for external CLI tools (Codex / Claude
Code / OpenCode)" runner is NOT shipped here. The dispatcher's
spawn_fn parameter already supports plugin-shaped extension; the
per-CLI integration work (auth, sandbox policy, exit-code mapping)
needs a concrete asker. The new docs page tells would-be integrators
the contract any such lane must satisfy.

Refs #19931
2026-05-10 18:15:52 -07:00
Teknium
d6e1fadbf5
fix(xai): omit reasoning.effort for grok models that reject it (#23435)
xAI's Responses API returns HTTP 400 ("Model X does not support
parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-*,
grok-4-1-fast-*, grok-3, grok-4.20-0309-*, and grok-code-fast-1 — even
though those models reason natively. Hermes was unconditionally sending
`reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking
direct `--provider xai` for the entire grok-4 line.

Add a substring allowlist predicate (verified live against api.x.ai
2026-05-10) covering the only Grok families that accept the effort dial:
grok-3-mini*, grok-4.20-multi-agent*, grok-4.3*. The Responses transport
omits the `reasoning` key entirely for everything else while still
including `reasoning.encrypted_content` so we capture native reasoning
tokens.

Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709`
went from HTTP 400 to a successful reply.
2026-05-10 15:21:30 -07:00
Teknium
c39168453d
feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914)
* feat(i18n): localize /model command output

Reported by @tianma8888: when Chinese users run /model, the labels
("Provider:", "Context:", "_session only_", etc.) are still English.
This routes the static prose through the existing i18n catalog so it
follows display.language / HERMES_LANGUAGE.

Changes:
- locales/{en,zh,ja,de,es,fr,tr,uk}.yaml: add 17 keys under
  gateway.model.* covering switched/provider/context/max_output/cost/
  capabilities/prompt_caching/warning/saved_global/session_only_hint/
  current_label/current_tag/more_models_suffix/usage_*.
- gateway/run.py _handle_model_command: replace hardcoded f-strings in
  the picker callback, the text-list fallback, and the direct-switch
  confirmation block with t("gateway.model.<key>", ...).

What stays English:
- model IDs, provider slugs, capability strings, cost figures, and the
  "[Note: model was just switched...]" prepended to the model's next
  prompt (LLM-facing, not user-facing).
- The two slightly-different session-only hints unify on a single key
  with the em-dash phrasing.

Validation: tests/agent/test_i18n.py 27/27 passing (parity contract
holds), tests/gateway/ -k 'model or i18n' 74/74 passing.

* feat(i18n): localize all gateway slash command outputs

Expands the i18n catalog from 7 strings to 234 keys across 35 gateway
slash command handlers, so non-English users see localized output for
\`/profile\`, \`/status\`, \`/help\`, \`/personality\`, \`/voice\`, \`/reset\`,
\`/agents\`, \`/restart\`, \`/commands\`, \`/goal\`, \`/retry\`, \`/undo\`,
\`/sethome\`, \`/title\`, \`/yolo\`, \`/background\`, \`/approve\`, \`/deny\`,
\`/insights\`, \`/debug\`, \`/rollback\`, \`/reasoning\`, \`/fast\`,
\`/verbose\`, \`/footer\`, \`/compress\`, \`/topic\`, \`/kanban\`,
\`/resume\`, \`/branch\`, \`/usage\`, \`/reload-mcp\`, \`/reload-skills\`,
\`/update\`, \`/stop\` (plus the \`/model\` block already added in the
previous commit).

Reported by @tianma8888 — Chinese users want command output prose in
their language, not just the labels we already had.

Translations are hand-written for all 8 supported locales (en, zh, ja,
de, es, fr, tr, uk), matching each catalog's existing style: full-width
punctuation in zh, em-dashes in zh/ja/uk, French spaced colons,
German noun capitalization, etc.

What stays English (unchanged):
- Identifiers/values: model IDs, file paths, profile names, session IDs,
  command flag names like --global, URLs, config keys.
- Backtick code spans: \`/foo\`, \`config.yaml\`.
- Log messages (logger.info/warning/error).
- LLM-facing system notes prepended to next prompt (e.g. [Note: model
  was just switched...]).
- Strings produced by external modules (gateway_help_lines,
  format_gateway, manual_compression_feedback) — those have their
  own surfaces.

New shared keys for cross-handler boilerplate:
- gateway.shared.session_db_unavailable (5 call sites: branch, title,
  resume, topic, _disable_telegram_topic_mode_for_chat)
- gateway.shared.session_not_found (1 site)
- gateway.shared.warn_passthrough (2 sites in /title's f"⚠️ {e}" pattern)

YAML gotcha fixed: \`yolo.on\` and \`yolo.off\` were originally written
unquoted, which YAML 1.1 parses as boolean True/False keys. Renamed to
\`yolo.enabled\` / \`yolo.disabled\` for both safety and clarity.

Test fix: tests/agent/test_i18n.py::test_t_missing_key_in_non_english_falls_back_to_english
now resets the catalog cache on teardown, so the fake "foo: English Foo"
locale doesn't poison the module-level cache for subsequent tests in
the same xdist worker. (Without this, every gateway slash command test
that shares a worker with the i18n suite would see the fake catalog.)

Validation:
- tests/agent/test_i18n.py: 27/27 (parity contract — every key in every
  locale, matching placeholder tokens).
- tests/gateway/: 5077 passed, 0 failed (full gateway suite).
- 180 t() call sites added across 35 handlers; 1872 catalog entries
  total (234 keys × 8 locales).

* feat(i18n): add 8 new locales — af, ko, it, ga, zh-hant, pt, ru, hu

Expands the static-message catalog from 8 → 16 languages, each with full
270-key parity against the English source-of-truth.  Every locale now
covers the same surface PR #22914 added: approval prompts plus all 35
gateway slash command outputs.

New locales:
- af  Afrikaans      (community ask in #21961 by @GodsBoy; PRs #21962, #21970)
- ko  Korean         (PRs #20297 by @tmdgusya, #22285 by @project820)
- it  Italian        (PR #20371 by @leprincep35700)
- ga  Irish/Gaeilge  (PR #20962 by @ryanmcc09-dot)
- zh-hant Traditional Chinese (PRs #20523 by @jackey8616, #13140 by @anomixer)
- pt  Portuguese     (PRs #20443 by @pedroborges, #15737 by @carloshenriquecarniatto, #22063 by @Magaav)
- ru  Russian        (PR #22770 by @DrMaks22)
- hu  Hungarian      (PR #22336 by @lunasec007)

Each locale uses native-quality translations matching the existing tone
and conventions of the older 8 locales:
- zh-hant uses 繁體 characters with TW/HK technical vocabulary (軟體
  not 软件, 連線 not 连接, 設定 not 设置, 訊息 not 消息, 工作階段 not 会话, 程式
  not 程序, 預設 not 默认, 伺服器 not 服务器), full-width punctuation 「:()」.
- ko uses formal 합니다체 (습니다/합니다) register throughout.
- pt uses European Portuguese as baseline with neutral PT/BR vocabulary
  where possible.
- ga uses standard An Caighdeán Oifigiúil; English loanwords retained
  for tech terms without good Irish equivalents (gateway, API, JSON).
- All preserve {placeholder} tokens, backtick code spans, slash commands,
  brand names (Hermes, MCP, TTS, YOLO, OpenAI, Telegram, etc.), and emoji.

Aliases added in agent/i18n.py:
- af-za, Afrikaans → af
- ko-kr, Korean, 한국어 → ko
- it-it, italiano → it
- ga-ie, Irish, Gaeilge → ga
- zh-tw, zh-hk, zh-mo, traditional-chinese → zh-hant (note: zh-tw used to
  alias to zh; now aliases to its own zh-hant catalog)
- zh-cn, zh-hans, zh-sg → zh (unchanged from before)
- pt-pt, pt-br, brazilian, portuguese → pt
- ru-ru, Russian, русский → ru
- hu-hu, Magyar → hu

The zh-tw alias re-routing is intentional: previously typing 'zh-TW' got
the Simplified Chinese catalog (wrong vocabulary for Taiwan/HK users).
Now those users get the proper Traditional Chinese catalog.

Validation:
- tests/agent/test_i18n.py: 43/43 (parity contract holds for all 16
  languages × 270 keys = 4320 catalog entries, with matching placeholder
  tokens).
- E2E alias resolution verified for all 19 alias inputs (Afrikaans, ko-KR,
  한국어, italiano, Gaeilge, zh-TW, zh-HK, traditional-chinese, pt-BR,
  brazilian, Magyar, etc.).
- tests/gateway/: 5198 passed (3 pre-existing TTS routing failures
  unrelated to i18n).

Credit to all contributors whose PRs surfaced these language requests.
Their original PRs may now be closed as superseded with credit.

* feat(dashboard-i18n): add 14 web dashboard locales matching the static catalog

Brings the React dashboard (web/src/) up to the same 16-language
coverage the static catalog already has after the previous commits in
this PR. The Translations interface is TypeScript-typed, so every new
locale must provide every key — tsc -b is the parity guard.

Languages added (each is a complete 429-line locale file):
- af  Afrikaans
- ja  Japanese        (PR #22513 by @snuffxxx surfaced this)
- de  German          (PR #21749 by @mag1art)
- es  Spanish         (PR #21749)
- fr  French          (PRs #21749, #10310 by @foXaCe)
- tr  Turkish
- uk  Ukrainian
- ko  Korean          (PRs #21749, #18894 by @ovstng, #22285 by @project820)
- it  Italian
- ga  Irish (Gaeilge)
- zh-hant Traditional Chinese (PR #13140 by @anomixer)
- pt  Portuguese      (PRs #22063 by @Magaav, #22182 by @wesleysimplicio, #15737 by @carloshenriquecarniatto)
- ru  Russian         (PRs #21749, #22770 by @DrMaks22)
- hu  Hungarian       (PR #22336 by @lunasec007)

Each translation covers all 15 namespaces with full key parity vs en.ts,
preserves every {placeholder} token verbatim, keeps identifiers
untranslated (brand names, file paths, cron expressions, code spans),
translates the language.switchTo tooltip into the target language, and
matches existing tone conventions (zh-hant uses TW/HK vocab; ja uses
formal desu/masu; ko uses formal seumnida register; ga uses An
Caighdean Oifigiuil with English loanwords for tech vocab without good
Irish equivalents).

Plumbing:
- web/src/i18n/types.ts: Locale union expanded to all 16 codes.
- web/src/i18n/context.tsx: imports all 16 catalogs; exports
  LOCALE_META (endonym + flag per locale); isLocale() type guard.
- web/src/i18n/index.ts: re-export LOCALE_META.
- web/src/components/LanguageSwitcher.tsx: replaced two-state EN-ZH
  toggle with a click-to-open dropdown listing all 16 languages.

Note: zh-hant.ts exports zhHant (camelCase) since hyphen is invalid in
a JS identifier; the canonical 'zh-hant' string keys it in TRANSLATIONS.

Validation:
- npx tsc -b: 0 errors. Every locale satisfies Translations.
- npm run build (tsc + vite production): green, 2062 modules.
- Each locale file is exactly 429 lines.

Out of scope: plugin dashboards (kanban/achievements ship as prebuilt
bundles with no source in repo); Docusaurus docs (separate surface);
TUI (no i18n yet).

* feat(plugin-i18n): localize achievements + kanban plugin dashboards across all 16 locales

Brings the two shipped plugin dashboards (hermes-achievements, kanban)
under the same i18n umbrella as the core dashboard PR #22914 just
established.  Both bundles now read user-facing strings from the host's
i18n catalog via SDK.useI18n() instead of hardcoded English.

## Approach

Plugin dashboards ship as prebuilt IIFE bundles in
plugins/<name>/dashboard/dist/index.js — no build step, no source in
repo (upstream-authored, vendored as compiled JS).  Earlier contributor
PRs (#22594, #22595, #18747) tried direct edits but didn't actually
wire the bundles to read translations.

This change does the wiring properly:

1.  Each bundle gets a useI18n shim at IIFE scope:
        const useI18n = SDK.useI18n
          || function () { return { t: { kanban: null }, locale: "en" }; };
    Older host SDKs without useI18n still load the bundle and render
    English fallbacks.

2.  A small tx(t, path, fallback, vars) helper resolves dotted keys
    under the plugin's namespace (t.kanban.* or t.achievements.*) and
    interpolates {placeholder} tokens.

3.  Every React component starts with const { t } = useI18n() and
    each user-visible string is wrapped in tx(t, "key", "English fallback").
    Helpers called outside React components (window.prompt callers,
    constants used during init) take t as a parameter.

4.  Top-level constants that were English dictionaries (COLUMN_LABEL,
    COLUMN_HELP, DESTRUCTIVE_TRANSITIONS, DIAGNOSTIC_EVENT_LABELS in
    kanban) become getColumnLabel(t, status)-style functions backed by
    FALLBACK_* dictionaries.

## Translations added

Two new top-level namespaces added to the dashboard's TypeScript-typed
Translations interface:

- achievements: ~70 keys covering the hero, scan banner, achievement
  card, share dialog, stats, filters, and empty states.
- kanban: ~145 keys covering the board, columns (with nested
  columnLabels and columnHelp sub-dicts), card detail panel,
  bulk-actions toolbar, dependency editor, board switcher, and
  diagnostic callouts.

Each key is provided across all 16 supported locales:
en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu.

Total new translation entries: ~3,440 (215 keys × 16 locales).

## What stays English (deliberate)

- API paths, CSS class names, data-* attributes, JSON keys, regex
  strings, URLs, file paths (~/.hermes/kanban.db, boards/_archived/).
- State identifier strings used as lookup keys (triage / todo / ready /
  running / blocked / done / archived) — labels translate, key strings
  don't.
- The PNG share-card text rendered to canvas in the achievements
  ShareDialog (HERMES AGENT watermark, UNLOCKED stamp, tier names) —
  these become part of a globally-shared image and stay English.
- localStorage keys (hermes.kanban.selectedBoard).
- Brand names (Kanban, Hermes, WebSocket, Nous Research).

## Contributor credit

PR #22594 by @02356abc and PR #22595 by @02356abc supplied the
en + zh kanban namespace skeleton (145 keys); used as the en source-
of-truth in this commit and translated to the other 14 locales.

PR #18747 by @laolaoshiren first surfaced the achievements
localization request.

## Validation

- npx tsc -b: 0 errors. All 16 locale .ts files satisfy the
  Translations type with full key parity.
- npm run build (tsc + vite production build): green, 2062 modules,
  1.56MB JS / 95KB CSS, ~2.5s build.
- node --check on both plugin bundles: parse cleanly.
- 126 tx() call sites in kanban, 46 in achievements.

## Out of scope

- TUI (ui-tui/) has no i18n infrastructure yet.
- Docusaurus docs (website/i18n/) — already had zh-Hans; expanding
  is a separate translation workstream (Thai / Korean / Hindi PRs).
2026-05-10 07:14:14 -07:00
Teknium
5aa755e4e6
feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)
* feat(plugins): host-owned LLM access via ctx.llm

Plugins can now ask the host to run a one-shot chat or structured
completion against the user's active model and auth, without ever
seeing an OAuth token or API key. Closes the gap where plugins that
needed bounded structured inference (receipts, CRM extraction,
support classification) had to either bring their own provider keys
or register a tool the agent had to call.

New surface on PluginContext:
- ctx.llm.complete(messages, ...)
- ctx.llm.complete_structured(instructions, input, json_schema, ...)
- async siblings ctx.llm.acomplete / acomplete_structured

Backed by the existing auxiliary_client.call_llm pipeline — every
provider, fallback chain, vision routing, and timeout policy Hermes
already supports applies automatically.

Trust gate (fail-closed by default):
- plugins.entries.<id>.llm.allow_model_override
- plugins.entries.<id>.llm.allowed_models (allowlist; '*' = any)
- plugins.entries.<id>.llm.allow_agent_id_override
- plugins.entries.<id>.llm.allow_profile_override

Embedded model@profile shorthand goes through the same gate as
explicit profile=, so it can't bypass the auth-profile policy.
Conflicting explicit and embedded profiles fail closed.

Also lands:
- plugins/plugin-llm-example/ — reference plugin that registers
  /receipt-extract, demonstrating image+text structured input,
  jsonschema validation, and the trust-gate config.
- website/docs/developer-guide/plugin-llm-access.md — full API docs.
- 45 unit tests covering trust gates, JSON parsing, schema
  validation, image encoding, async surface, and config loading.

Validation:
- 2628 tests pass in tests/agent/
- E2E: bundled plugin loaded with isolated HERMES_HOME, slash
  command produced parsed JSON via stubbed call_llm
- response_format extra_body wired correctly for both json_object
  and json_schema modes

* docs(plugin-llm): rewrite quickstart and framing

The quickstart now uses a meeting-notes-to-tasks example instead of
a receipt extractor, and the page leads with hook-time / gateway
pre-filter / scheduled-job framing rather than the OpenClaw
KB/support/CRM/finance/migration enumeration that the original
upstream PR used. Receipt example moved to a separate worked
example link so the docs page itself doesn't echo any of the
upstream framing.

Also clarifies where ctx.llm fits in the broader plugin surface
(table comparing register_tool / register_platform / register_hook
/ etc.) and what makes this lane different from auxiliary_client
internals.

No code change.

* docs(plugin-llm): reframe as any LLM call, not just structured output

The original draft leaned heavily on complete_structured() and made
the chat lane (complete() / acomplete()) feel like a footnote.
Restructure so:

- The page title and description say 'any LLM call.'
- The lead shows BOTH a plain chat call (error rewriter) AND a
  structured call (triage scorer) up top.
- Quick start has two complete plugin examples — /tldr (chat) and
  /paste-to-tasks (structured).
- New 'When to use which' table for choosing complete() vs
  complete_structured() vs the async siblings.
- Trust-gate sections explicitly note 'all four methods,' and the
  request-shaping list calls out chat-only fields (messages) and
  structured-only fields (instructions, input, json_schema)
  alongside each other.
- The 'Where this fits' section now says 'for any reason,
  structured or not.'

The receipt-extractor reference plugin still exists under
plugins/plugin-llm-example/ — but the docs page no longer treats
it as the canonical surface example. It's now described as 'a third
worked example, this time with image input.'

No code change.

* feat(plugin-llm): split provider/model into independent explicit kwargs

The first cut accepted a single 'provider/model' slug on every method
and split it internally. That looked clean but broke under live test:
the model-override path tried to use the slug's vendor prefix as a
literal Hermes provider id, which silently switched the user off
their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user
who routes through OpenRouter — host attempted to call the 'openai'
provider directly, failed because OPENAI_API_KEY wasn't set).

New shape mirrors the host's main config:

  ctx.llm.complete(
      messages=[...],
      provider='openrouter',         # gated, optional
      model='openai/gpt-4o-mini',    # gated, optional
      profile='work',                # gated, optional
      ...
  )

Each is independently gated by its own allow_*_override flag.
Granting model-override does NOT auto-grant provider-override.
Allowlists are now per-axis (allowed_providers, allowed_models)
matched literally against whatever string the plugin sends.

Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes
doesn't use that pattern anywhere else; profile= is its own kwarg.

Live E2E (against real OpenRouter via Teknium's config) confirms:
- zero-config call works
- default-deny blocks each override with a helpful error
- model-only override stays on user's active provider (the bug)
- provider+model override switches cleanly
- allowlist refuses non-listed entries
- structured output round-trip parses + schema-validates

Tests: 49 cases (up from 45); all green. Docs updated to match the
new shape, including a 'most plugins never need this section' callout
on the trust-gate config block.

* fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core

Three integration fixes for the ctx.llm surface:

1. Attribution bug — result.provider and result.model now reflect
   what call_llm actually used, not placeholder fallbacks ('auto',
   'default'). New _resolve_attribution() helper:

     - explicit overrides win (what the call targeted)
     - response.model wins for the recorded model (provider
       canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.)
     - falls back to _read_main_provider() / _read_main_model()
       when no override is set, so audit logs reflect the user's
       active main provider/model
     - 'auto' / 'default' only when EVERYTHING is empty

   Live verified: zero-config call now records
   provider='openrouter', model='anthropic/claude-4.7-opus-20260416'
   instead of provider='auto', model='default'.

2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete
   works from inside a registered post_tool_call callback. The
   docs page promised hook integration; now there's a test that
   exercises the lazy-import path through the real invoke_hook
   machinery. Two cases: traceback-rewrite hook with conditional
   ctx.llm.complete, and minimal hook regression for the
   sync-hook + sync-llm path.

3. Reference plugin moved out of core. plugins/plugin-llm-example/
   is gone from hermes-agent — it now lives in the new
   NousResearch/hermes-example-plugins companion repo. The docs
   page links there. Hermes' bundled plugins should be plugins
   users actually run; reference / docs-companion plugins live
   externally.

Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/
+ tests/gateway/ + tests/tools/ + tests/agent/ shows 16770
passing; the 12 failures are all pre-existing on origin/main
(verified by stashing this branch's changes and re-running) —
kanban-boards, delegate-task, gateway-restart, tts-routing —
none touch the plugin_llm surface.

* chore(plugins): move all example plugins to companion repo

Reference / docs-companion plugins now live exclusively in
NousResearch/hermes-example-plugins, not bundled with the core repo:

- example-dashboard
- strike-freedom-cockpit

A new fourth example, plugin-llm-async-example, was added to that
repo demonstrating ctx.llm's async surface (acomplete()) with
asyncio.gather() — registers /translate <lang>: <text> which fires
forward translation + sentiment classifier in parallel, then a
back-translation for QA. Live-tested at 2.5s for three real
provider round-trips (would be ~5-6s sequential).

Docs updated:
- developer-guide/plugin-llm-access.md links both sync and async
  examples in the Reference section
- user-guide/features/extending-the-dashboard.md repoints both demo
  sections to the companion repo with corrected install paths
- user-guide/features/built-in-plugins.md drops the two demo rows
- AGENTS.md notes that example plugins live in the companion repo

Net: hermes-agent's plugins/ directory now contains only plugins
users actually run (memory providers, dashboard tabs that ship real
features, the disk-cleanup hook, platform adapters). All four
demo / reference plugins live externally where they can be cloned
on demand instead of inflating the core install.
2026-05-10 07:09:28 -07:00
Teknium
7312f7f849
feat(curator): hint at hermes curator pin in the rename block (#23212)
Surfaces the pin command at the moment users care about it: when a
consolidation just landed against their skill library and they're
looking at the umbrella name in the curator output. Previously `hermes
curator pin` existed but had no discovery surface — users only learned
it existed by reading docs or stumbling onto `hermes curator --help`.

The hint:

    archived 3 skill(s):
      • docx-extraction → document-tools
      • pdf-extraction → document-tools
      • old-stale — pruned (stale)
    full report: hermes curator status
    keep an umbrella stable: hermes curator pin document-tools

Gated on having at least one consolidation that produced an umbrella.
Pruned-only runs (nothing surviving to pin) skip the hint. When
multiple umbrellas were produced, picks alphabetically first as a
concrete example rather than listing them all.

3 new tests in tests/agent/test_curator_classification.py covering:
consolidation produces hint with real umbrella name, pruned-only run
omits it, multi-umbrella picks one example.
2026-05-10 06:44:53 -07:00
kshitijk4poor
44cdf555a8 fix(codex-spark): defensive 128k entry in DEFAULT_CONTEXT_LENGTHS + clarify validation test docstring
Two follow-ups from self-review:

1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The
   primary resolution path for Spark goes through provider='openai-codex'
   → _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future
   code path resolves Spark's context with a different provider (custom
   proxy, generic fallthrough), the longest-substring-first lookup in
   step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x.
   Adding the explicit override is a cheap defensive correctness fix
   matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic
   gpt-5 entry.

2. Update test_openai_codex_model_validation_fallback.py docstring. The
   bug it was originally written for (gpt-5.3-codex-spark missing from
   listing) is now resolved by this PR's catalog restoration. The test
   still validly exercises the soft-accept code path for any future
   entitlement-gated Codex slug that ships before Hermes catalogs it,
   but the framing was stale — clarified.
2026-05-09 23:17:25 -07:00
kshitij
9ee9a4297d docs(codex-spark): document ChatGPT Pro entitlement gating
PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was
unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed
via the Codex OAuth backend at chatgpt.com/backend-api/codex/models —
not via the public OpenAI API.

Add explanatory comments in:
  - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py)
  - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py)
  - list_authenticated_providers' live-discovery branch (model_switch.py)

so future maintainers don't strip the entry again. Also documents the
intentional asymmetry that Spark stays out of the "openai" provider
catalog (it isn't on the public API) and why the supported_in_api
filter is *not* applied for the openai-codex route.
2026-05-09 23:17:25 -07:00
olegdater
c6dc295a35 fix(model-metadata): set codex-spark fallback context to 128k 2026-05-09 23:17:25 -07:00
olegdater
2a6f3deb50 fix(model-metadata): restore gpt-5.3-codex-spark fallback context 2026-05-09 23:17:25 -07:00
Teknium
3800972dd0
feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.

Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.

Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.

Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
  capability table (_supports_media_in_tool_results). Schema description
  updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
  accepts the array form for multimodal tool results (was string-only).
  Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
  tools see the live CLI/gateway override, not the stale config.yaml
  default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
  start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.

Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
  table, envelope shape, fast-path gating (vision-capable model uses
  fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
  content becomes function_call_output.output array; preflight
  preserves arrays and drops unknown part types.

Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
  typed filepath, gets pixels back, reads exact text from images that
  no aux description could capture (font color irony, multi-line
  fruit-count list, etc.).

PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
2026-05-09 21:06:19 -07:00
Teknium
4375b82cd9
feat(curator): show rename map in user-visible summary (#22910)
* feat(curator): show rename map (where skills went) in user-visible summary

The full data has always been on disk in REPORT.md, but the user-visible
curator summary (gateway 💾 line, CLI session-start panel,
`hermes curator status`) was counts-only — "consolidated 4 into 2
umbrellas" with no names. Users only discovered renames when something
they expected was gone.

New `_build_rename_summary()` formats the rename map and appends it to
`final_summary`:

    auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
    archived 3 skill(s):
      • docx-extraction → document-tools
      • pdf-extraction → document-tools
      • old-stale-thing — pruned (stale)
    full report: hermes curator status

Empty on no-op ticks (no archives), so most ticks add zero log noise.
Cap of 10 entries keeps agent.log readable when a 50-skill
consolidation lands; the full list is always in REPORT.md.

`hermes curator status` indents continuation lines so the multi-line
summary reads as one logical field.

5 new tests in tests/agent/test_curator_classification.py covering
empty / consolidation / pruning / cap / mixed cases.

* feat(curator): show recent run summary once on `hermes update`

The rename map is now visible from where users actually look — the
update flow they explicitly run, instead of just the live gateway log
or transient CLI session-start panel.

Behavior:
- After `hermes update`, if the most recent curator run produced a
  rename map (multi-line summary) that the user hasn't seen yet, print
  it once with a 'last run Xh ago' header and a one-time-message
  footer.
- Stamp `last_run_summary_shown_at = last_run_at` after printing so
  subsequent `hermes update` invocations are silent until a newer
  curator run lands.
- Silent on no-op runs (single-line summary like 'auto: no changes;
  llm: no change'). Still stamps shown so we don't reconsider on
  every update.
- Silent when the curator has never run (the existing first-run
  notice handles that case).

Output:

    ℹ Skill curator — last run 4h ago
      auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
      archived 3 skill(s):
        • docx-extraction → document-tools
        • pdf-extraction → document-tools
        • old-stale-thing — pruned (stale)
      full report: hermes curator status
      (This message shows once per curator run. View anytime: hermes curator status)

State migration:
- `_default_state()` gains `last_run_summary_shown_at: None`. Existing
  state files lack the field; `.get()` returns None; the comparison
  treats any prior run as 'not yet shown' and prints once on next
  update. Self-healing.

Wiring:
- Both `hermes update` paths in main.py call the new
  `_print_curator_recent_run_notice()` right after the existing
  first-run notice. Best-effort try/except so a state-load bug
  never breaks the update flow.

6 tests in tests/hermes_cli/test_curator_recent_run_notice.py:
no-run / single-line / multi-line / show-once / new-run-resets /
time-formatter buckets.
2026-05-09 18:43:40 -07:00
Wesley Simplicio
4f8d8ad912 fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664)
RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible
shim was falling through to FailoverReason.unknown, surfacing as 'Empty
response from model' and burning 3 retry slots on the same failing
endpoint. _classify_by_message had no timeout-message branch — only
billing/rate_limit/auth/context_overflow/model_not_found patterns. The
type-based check at line 565 also requires isinstance(error, (TimeoutError,
ConnectionError, OSError)) — a plain RuntimeError doesn't match.

Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded',
'request timed out', 'operation timed out', 'upstream timed out', 'turn
timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True)
when any pattern matches.

Salvage of #22664's classifier portion. The original PR also bundled a
fallback self-selection guard which is now redundant (already on main
via #22780) plus DeepSeek thinking and session_search fixes that are
their own separate concerns.

Follow-up to #22780 — fixes the still-broken classification of
generic-typed provider-shim timeouts that #22780's dedup didn't cover.
2026-05-09 17:54:07 -07:00
Wesley Simplicio
35f773c459 fix(context_compressor): treat streaming premature-close as transient error
Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore
raises RemoteProtocolError: "incomplete chunked read", "peer closed
connection", "response ended prematurely", etc.), _generate_summary
would not classify it as a transient error.  Instead of retrying on the
main model, it entered the generic 60-second cooldown, leaving context
growing unbounded until the cooldown expired.  Issue #18458.

Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's
streaming premature-close error substrings.  context_compressor.py's
_generate_summary except block never called _is_connection_error, so
those errors fell through to the 60-second generic cooldown rather than
triggering the retry-on-main fallback path used for timeouts.

Fix:
1. auxiliary_client.py — extend _is_connection_error keyword list with:
   "incomplete chunked read", "peer closed connection",
   "response ended prematurely", "unexpected eof",
   "remoteprotocolerror", "localprotocolerror".
   Also guard the `from openai import ...` with try/except ImportError
   so the function works in environments without the openai package.
2. context_compressor.py — import _is_connection_error and call it in
   _generate_summary's except block as _is_streaming_closed.  Include
   _is_streaming_closed in the fallback-to-main condition (alongside
   _is_model_not_found, _is_timeout, _is_json_decode) and use the
   shorter 30s transient cooldown for streaming-closed errors.

Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown  (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 17:52:51 -07:00
Teknium
c7f0aab949
feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.

- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
  it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
  on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
  [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
  AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
  path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
  both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
  [0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
  tui_gateway/server.py: propagate the kwarg (mirrors providers_order
  plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
  unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md

Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
2026-05-09 14:47:00 -07:00
Ninso112
883e11f0a0 fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708)
Pass session_id through to provider profile build_api_kwargs_extras so
the OpenRouter profile can attach an xAI cache-affinity header
(x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt
cache requires server affinity via this header — without it the cache
is poisoned and Grok prompt-cache hit rates drop dramatically on
multi-turn sessions.

Carve-out of #22708 by Ninso112. The original PR bundled a /diff
slash command, a zsh completion fix (already on main via #22802),
and holographic memory null-guards. This salvage keeps just the
Grok header work — small, targeted, and well-tested. Other
contributors and changes preserved for separate review.

Closes #22705.
2026-05-09 13:38:52 -07:00
Teknium
1c9ffb177c
fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805)
Two co-located fixes:

1. agent/model_metadata.py: bump hy3-preview static fallback from
   256000 to 262144 (256 * 1024) to match OpenRouter live metadata
   so cache and offline both agree (issue #22268).

2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the
   exact-value change-detector (assert ctx == 256000) with an
   invariant assertion (registered + >= 4096). Per AGENTS.md
   'Don't write change-detector tests': pinning the upstream-controlled
   context length is exactly the test class the rule forbids — it
   breaks every time the provider bumps the published value, with
   zero behavioral coverage gained.

Salvage of #22574 with a redirect on the test approach. The
contributor's diff bumped the integer and added a SECOND
change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144,
which would re-break on the next published bump. We instead delete
the change-detector entirely and assert the relationship.

Closes #22268.
2026-05-09 13:37:19 -07:00
Maxim Esipov
17d8914850 fix(auxiliary): rotate pooled auth after quota failures 2026-05-09 13:35:04 -07:00
Teknium
775c0e22cf
perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808)
`fetch_models_dev()` is on the hot path of every `AIAgent.__init__`
(via `context_compressor → get_model_context_length`). The previous
policy was "always try network first, only fall back to disk if
network fails," so every fresh `hermes chat` / `hermes gateway` /
batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry
that was already on disk from earlier runs.

Add a stage 2 between in-mem and network: if
`models_dev_cache.json` exists and its mtime is younger than the
existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache
already uses), load from disk and skip the network call.

The in-mem TTL is anchored to the disk file's age, so a 50-min-old
cache stays in-memory for only 10 more minutes — no surprise
extension of staleness window.

Invariants preserved:
- `force_refresh=True` still always hits the network and only falls
  back to disk on failure (`hermes config refresh` semantics).
- Missing disk cache → fall through to network (first-ever run).
- Stale disk cache (mtime > TTL) → fall through to network.
- Negative file age (clock skew) → fall through to network.
- Network failure → existing stage-4 stale-disk fallback unchanged.

Measured impact (3-run medians, 9950X3D, fresh process per run):
  fetch_models_dev cold:  256 → 17 ms  (-93%)
  hermes chat -q wall:   4.00 → 3.73 s (-7% median)
                         3.99 → 3.60 s (-10% min)

The chat-end-to-end win is bounded below by API latency variance, but
the fetch_models_dev microbenchmark is the cleanest signal: 239 ms
shaved off every fresh-process agent construction.

Win compounds with the previous perf PRs:
  #22681 google_chat lazy-load
  #22766 doctor parallel + IMDS off
  #22790 gateway.platforms PEP 562

Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones
covering the new disk-cache-first path, force_refresh override, stale
disk fallback, and missing-disk-cache fall-through). Full `tests/agent/`
suite: 2560 passed, 0 failed.
2026-05-09 13:32:38 -07:00
Julien Talbot
cd712b176a feat(transports/codex): pass reasoning.effort to xAI Responses API
The is_xai_responses branch only sent include=[reasoning.encrypted_content]
without forwarding the resolved reasoning_effort. Other Responses providers
(OpenAI, GitHub) already get effort forwarded — this aligns the xAI path.

Without this, agent.reasoning_effort is silently dropped on the xAI direct
path, making Hermes unable to control reasoning depth on grok-4.x via
api.x.ai. Tests added to TestCodexBuildKwargs cover effort passthrough,
disabled state, and minimal-clamp parity with non-xAI.
2026-05-09 13:23:02 -07:00
Teknium
6e5489c9f3
fix(memory): tighten MEMORY_GUIDANCE against ephemeral PR/issue/SHA notes (#22781)
The model regularly writes session-outcome facts to MEMORY.md despite
the existing 'Do NOT save task progress' line — entries like
'Submitted PR #22577 for the kanban dedup fix' or 'Fixed bug X in
file Y'. These are stale within days, pollute the system prompt,
and crowd out durable user preferences (the issue #22563 reporter
saw 9 sections of bug-fix notes injected on a brand-new task).

Add explicit examples of what NOT to save (PR numbers, issue
numbers, commit SHAs, 'fixed/submitted/Phase N done', file counts)
plus the 7-day-staleness heuristic so the model has a concrete
calibration target rather than guessing what counts as 'task progress'.

Closes #22563 (the prompt-side, low-risk portion). The bigger
relevance-based-injection / vector-retrieval feature requested in
#22563 is tracked under #2184 (Richer local memory). Per skill rule
on prompt caching, dynamic memory injection breaks the frozen-snapshot
invariant and needs a separate design call.
2026-05-09 12:48:25 -07:00
obafemiferanmi1999
0f1d41a88c fix(transports): use PEP 604 annotation for ToolCall.extra_content
`ToolCall.extra_content` was annotated `Optional[Dict[str, Any]]`,
but neither `Optional` nor `Dict` are imported at the top of
`agent/transports/types.py` — only `Any` is.  The rest of the file
consistently uses PEP 604 / 585 syntax (e.g. `str | None`,
`dict[str, Any] | None`).

The file has `from __future__ import annotations`, so the missing
names don't crash class definition.  But the annotation IS evaluated
when anything calls `typing.get_type_hints(ToolCall)` —
introspection raises `NameError: name 'Optional' is not defined`.

ruff catches it cleanly:

    F821 Undefined name `Optional`  agent/transports/types.py:65:32
    F821 Undefined name `Dict`      agent/transports/types.py:65:41

Switch the annotation to `dict[str, Any] | None` to match the
rest of the file's style.  No new imports needed.

Verified:
  - ruff F-checks now pass on the file
  - `typing.get_type_hints(ToolCall)` succeeds where it raised before
  - 166/166 tests in tests/agent/transports/ pass on Windows + Python 3.12
2026-05-09 02:25:37 -07:00
qWaitCrypto
2c8c48fbc7 fix(webui): clarify MEDIA absolute-path hint 2026-05-09 02:22:40 -07:00
qWaitCrypto
aad5490e74 fix(webui): add platform hint for MEDIA rendering
WebUI sessions construct AIAgent(platform="webui") but PLATFORM_HINTS
had no "webui" entry, so the agent received no platform hint at all.
The WebUI frontend supports rich MEDIA:/absolute/path previews for
images, audio, video, PDF, HTML, CSV, diffs, and Excalidraw, but
without a hint the agent either ignores MEDIA: or falls back to
Markdown image syntax which silently fails for local files.

Add a webui hint that documents the MEDIA: render path and warns
against ![alt](/path) for local files.

Fixes #21883
2026-05-09 02:22:40 -07:00
kshitij
c7e8add120
fix(context): handle JSON decode errors in compression — salvage of #22248 (#22416)
When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON
body with `Content-Type: application/json` — e.g. an HTML 502 page from a
misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw
`json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose
message contains "expecting value"). Previously this fell through to the
unknown-error branch and entered a 60s cooldown without retrying on the
main model, dropping the middle conversation turns instead.

This change folds JSON-decode detection into the existing fast-path
fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring
match for "expecting value", retry once on the main model, and use a
shorter 30s cooldown when already on main (the body shape tends to flip
back to valid quickly when the upstream proxy recovers).

The three duplicated fallback bodies (model-not-found, unknown-error,
JSON-decode) are consolidated into a single `_fallback_to_main_for_compression`
helper that handles the shared bookkeeping (record aux-model failure for
`/usage`-style callers, clear summary_model, clear cooldown).

Also adds three unit tests covering: raw `JSONDecodeError` retries on main,
substring-match for wrapped exceptions, and the 30s cooldown when already
on main.

Salvage of #22248 by @0xharryriddle. Closes #22244.

Co-authored-by: Harry Riddle <ntconguit@gmail.com>
2026-05-09 01:47:15 -07:00
Teknium
0ec052ca24
perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)
Interactive `hermes` launch drops from ~21s to ~2.5s. Three independent
fixes, each targets a distinct hot spot in the banner / tool-registration
path that fires on every CLI invocation.

1. `get_external_skills_dirs()` in-process mtime cache (~10s saved)
   The function re-read + YAML-parsed the full ~/.hermes/config.yaml on
   every call. Banner build invokes it once per skill to resolve the
   category column, which on a 120-skill install meant ~120 reparses of
   a 15 KB config (~85 ms each). Added a
   `(config_path, mtime_ns) -> list[Path]` memo; stat() is ~2 us vs
   ~85 ms for the parse. Edits to config.yaml invalidate the cache on
   the next call via mtime.

2. Feishu availability probe uses `importlib.util.find_spec` (~5.2s saved)
   `tools/feishu_doc_tool.py::_check_feishu` and the identical helper in
   `feishu_drive_tool.py` were calling `import lark_oapi` purely to
   detect whether the SDK was installed. Executing the real import pulls
   in websockets + dispatcher + every v2 API model — ~5 seconds of work
   that fires at every tool-registry bootstrap. `find_spec` answers the
   same question ("is lark_oapi importable?") without executing the
   module. The actual tool handlers still do the real import on invoke,
   so runtime behavior is unchanged.

3. `_web_requires_env` no longer triggers Nous portal refresh (~800ms saved)
   `tools/web_tools.py::_web_requires_env` used
   `managed_nous_tools_enabled()` to gate four gateway env-var names in
   the returned list. The gate called `get_nous_auth_status()` ->
   `resolve_nous_runtime_credentials()` -> live HTTP POST to the portal
   on every tool-registry bootstrap. But the list is pure metadata — if
   the env var is set at runtime, the tool lights up; otherwise it
   doesn't. Including the four names unconditionally is harmless for
   unsubscribed users (vars just aren't set) and eliminates the sync
   HTTP round trip from startup.

Test:
- tests/agent/test_external_skills_dirs_cache.py (new, 6 cases):
  returns config'd dir, caches on second call (yaml_load patched to
  raise — never invoked), invalidates on mtime bump, empty when config
  missing, returned list is a defensive copy, per-HERMES_HOME cache key
  isolation.
- Existing tests/agent/test_external_skills.py and tests/tools/
  continue to pass modulo pre-existing flakes on main (test_delegate,
  test_send_message — unrelated, pass in isolation).

Measured: bare `hermes` (cold → REPL ready) 21,519ms -> 2,618ms on
Teknium's install (119 skills, 15 KB config.yaml, Nous auth logged in,
lark_oapi installed). 8x faster.
2026-05-08 16:39:32 -07:00
Teknium
cc38282b04 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 14:27:40 -07:00
Teknium
40e7a71c35 feat: enrich system-prompt environment hints with host + terminal-backend info
build_environment_hints() now emits a factual block describing the
execution environment on every prompt build:

* Local backend: host OS, $HOME, and cwd — so the agent stops guessing
  paths from the hostname. Windows also gets two specific callouts:
  - hostname != username (prevents C:\Users\<hostname>\... bugs)
  - `terminal` shells out to bash (git-bash/MSYS), not PowerShell

* Remote backend (docker/singularity/modal/daytona/ssh/vercel_sandbox):
  host info is SUPPRESSED — the agent's tools can't touch the host, so
  showing it is misleading. Instead we probe the backend once per
  process with `uname/whoami/pwd` and cache the result. On probe
  failure, fall back to a per-backend description that states only what
  we know from the backend choice itself (container type + likely OS
  family) without inventing user/cwd/$HOME.

Linux/Mac local users now get a small helpful 3-line host block instead
of an empty string. Zero change to the existing WSL hint paragraph.

Tests: 8 new/updated in TestEnvironmentHints, including a regression
guard that fails if a new remote backend is added without listing it in
_REMOTE_TERMINAL_BACKENDS.
2026-05-08 14:27:40 -07:00
Teknium
cbce5e93fc codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.

Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs).  That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.

After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly.  Works identically on every platform
and every locale, no surprise behavior.

Mechanical sweep via:
  ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix     --exclude 'tests,venv,.venv,node_modules,website,optional-skills,               skills,tinker-atropos,plugins' .

All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8').  Nothing
else changed.  Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).

Scope notes:
  - tests/ excluded: test fixtures can use locale encoding intentionally
    (exercising edge cases).  If we want to tighten tests later that's
    a separate PR.
  - plugins/ excluded: plugin-specific conventions may differ; plugin
    authors own their code.
  - optional-skills/ and skills/ excluded: skill scripts are user-authored
    and we don't want to mass-edit them.
  - website/ and tinker-atropos/ excluded: vendored / generated content.

46 files touched, 89 +/- lines (symmetric replacement).  No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
2026-05-08 14:27:40 -07:00
ddupont
e31f3b3c56 feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.

## cua_backend.py

**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.

**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.

**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.

**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).

**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.

**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.

**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.

**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.

## schema.py

Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.

## tool.py

Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.

## agent/display.py + run_agent.py

Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
2026-05-08 11:07:38 -07:00
Teknium
850413f120 feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.

Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.

- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
  CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
  Actions: capture (som/vision/ax), click, double_click, right_click,
  middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
  `content: [text, image_url]` parts) that flows through
  handle_function_call into the tool message. Anthropic adapter converts
  into native `tool_result` image blocks; OpenAI-compatible providers
  get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
  recent screenshots carry real image data; older ones become text
  placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
  their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
  instead of its base64 char length (~1MB would have registered as
  ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
  is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
  and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
  through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
  (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
  force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
  workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
  entries.

44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.

469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.

- `model_tools.py` provider-gating: the tool is available to every
  provider. Providers without multi-part tool message support will see
  text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
  client-side eviction + compressor pruning cover the same cost ceiling
  without a beta header.

- macOS only. cua-driver uses private SkyLight SPIs
  (SLEventPostToPid, SLPSPostEventRecordTo,
  _AXObserverAddNotificationAndCheckRemote) that can break on any macOS
  update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
  prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
2026-05-08 11:07:38 -07:00
kshitijk4poor
81928f03ab refactor(gmi): move User-Agent to profile.default_headers
The previous revision of this PR added six GMI-specific branches
(`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across
run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS
constant in auxiliary_client.py.

ProviderProfile already has a `default_headers: dict[str, str]` field
commented as 'Client-level quirks (set once at client construction)'.
Other plugins (ai-gateway, kimi-coding) already use it. Two of the four
auxiliary_client sites we previously patched already had a generic
`else: profile.default_headers` fallback that picked it up (so did
both run_agent sites).

This revision:

* Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the
  GMI profile in plugins/model-providers/gmi/__init__.py.
* Reverts all six GMI-specific branches in run_agent.py and
  auxiliary_client.py.
* Adds the generic profile-fallback `else` block to the two
  auxiliary_client sites (`_to_async_client`, `resolve_provider_client`)
  that didn't have it yet. This benefits every provider whose profile
  declares default_headers, not just GMI — e.g. Vercel AI Gateway's
  HTTP-Referer/X-Title now flow through the async client path too.
* Replaces the GMI-specific URL-branch tests with a profile-level
  assertion and keeps the run_agent integration test (with
  `provider='gmi'` so the fallback picks up the profile).

Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin,
two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and
tests. No core files change.

Based on #20907 by @isaachuangGMICLOUD.
2026-05-08 03:22:11 -07:00
Austin Pickett
d87c7b99e2
fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455)
- Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and
  Haiku 4.5 with updated source URLs (platform.claude.com)
- Add _normalize_anthropic_model_name() to handle dot-notation variants
  (e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups
- Fix silent token loss: ensure session row exists before UPDATE in both
  run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent)
- Log token persistence failures at DEBUG level instead of swallowing
  them silently — makes undercounted analytics diagnosable
- Surface reasoning tokens in CLI /usage and TUI usage panel
- Add 'reasoning' and 'cost_status' fields to TUI Usage type
2026-05-07 13:24:31 -07:00
LeonSGP43
fc88eec926 fix(compressor): soften summary prompt for content filters 2026-05-07 06:42:32 -07:00