hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-02 12:13:05 +00:00

Author	SHA1	Message	Date
rob-maron	2863e9484a	Use nous portal as model metadata authority (#24502 ) * nous portal metadata resolver * minor fixes	2026-05-12 11:59:31 -07:00
Teknium	c1eb2dcda7	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220 ) * feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows. # What this PR makes true 1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup). 2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only' — the installer keeps every other extra and tells the user which tier landed. 3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time. # Detection: hermes_cli/security_advisories.py - ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass. - detect_compromised() uses importlib.metadata.version() — no pip dependency, works in uv venvs that lack pip. - Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory. - Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack. - Wired into: * hermes doctor — runs first, prints full remediation block * hermes doctor --ack <id> — dismisses an advisory * cli.py interactive run() and single-query branches — short stderr banner pointing at hermes doctor * gateway/run.py startup — operator-visible warning in gateway.log # Lazy-install framework: tools/lazy_deps.py - LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs. - ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install). - Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars — only PyPI-by-name accepted. - Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs. - Migrated three backends as proof of pattern: * tools/tts_tool.py — _import_elevenlabs() calls ensure first * plugins/memory/honcho/client.py — get_honcho_client lazy-installs * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai # Installer fallback tiers scripts/install.sh, scripts/install.ps1, setup-hermes.sh: - Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra. - New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve. - All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1. - install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync. Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list — bug fixed by the refactor (mistral is filtered out). # Config hermes_cli/config.py — DEFAULT_CONFIG.security gains: - acked_advisories: [] (advisory IDs the user has dismissed) - allow_lazy_installs: True (security gate for ensure()) No config version bump needed — both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs. # Tests tests/hermes_cli/test_security_advisories.py — 23 tests covering: - detect_compromised matches/non-matches, wildcard frozenset - ack persistence, idempotence, blank rejection, config-failure path - banner cache rate limiting + 24h re-banner + ack-stops-banner - short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message - shipped catalog well-formedness invariant tests/tools/test_lazy_deps.py — 40 tests covering: - spec safety: 11 safe parametrized + 18 unsafe parametrized - allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex - security gating: config flag, env var, default, fail-open - ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing - is_available, feature_install_command Combined: 63 new tests, all passing under scripts/run_tests.sh. # Validation - scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing - scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing - scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change) - bash -n on install.sh and setup-hermes.sh → OK - py_compile on all modified .py files → OK - End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output # Community Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md Refs: PR #24205 (mistral disabled), Socket Security advisory <https://socket.dev/blog/mini-shai-hulud-worm-pypi> * build(deps): pin every direct dep to ==X.Y.Z (no ranges) Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock. Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine — exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end. What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock — the pins just remove the resolver's freedom to pick a different one. Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block. Build-system requires line (setuptools>=61.0) is intentionally left as a range — pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available. mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR. LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy- install pathway can never resolve a different version than the one declared in pyproject.toml. Validation: - Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock — every pin matches the resolved version exactly. - Cross-checked all LAZY_DEPS specs against uv.lock — same. - 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly. - tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex). - Doctor + TTS + transcription targeted suite → 146/146 passing. * build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra You asked: 'what about the dependencies the dependencies rely on?' — correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned. # What this commit fixes 1. Both real installer scripts now prefer `uv sync --locked` as Tier 0. uv.lock records SHA256 hashes for every transitive — a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it. 2. Removed the `[mistral]` extra entirely. The `mistralai` PyPI project is fully quarantined right now — every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]). 3. Regenerated uv.lock. 262 packages, mistralai/eval-type-backport/ jsonpath-python pruned. `uv lock --check` now passes. # Defense-in-depth view \| Layer \| Where \| Protects against \| \|----------------------------\|-------------------\|-------------------------------------------\| \| Exact pins in pyproject \| direct deps \| new mistralai 2.4.6-style direct compromise \| \| uv.lock + `--locked` install \| transitive graph \| transitive worm injection \| \| Tier-0 hash-verified path \| install.sh / .ps1 \| actually USE the lockfile in fresh installs \| \| `uv lock --check` CI gate \| every PR \| drift between pyproject and lockfile \| \| `hermes_cli/security_advisories.py` \| runtime \| cleanup for users who already got hit \| The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater. # Validation - `uv lock --check` → passes (262 packages resolved, no drift). - `bash -n` on install.sh + setup-hermes.sh → OK. - 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py). - TOML parse OK. * chore: remove community announcement drafts (PR body covers it) * build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard) Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45. Moved out of core dependencies = []: - anthropic (only when provider=anthropic native, not via aggregators) - exa-py, firecrawl-py, parallel-web (search backends; only when picked) - fal-client (image gen; only when picked) - edge-tts (default TTS but still optional) New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all]. New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard. Each import site now calls ensure() before importing the SDK. Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals. Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker). Validation: - Base install: 45 packages (was ~60); 6 newly-extracted packages absent - uv lock --check: passes (262 packages, no drift) - 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing - py_compile clean on all 12 modified modules	2026-05-12 01:02:25 -07:00
Robin Fernandes	94d9db72ba	add client marker tag on aux inference requests	2026-05-11 22:30:42 -07:00
rob-maron	32abe742fa	fix comment	2026-05-11 21:30:29 -07:00
rob-maron	f0c2964f0b	remove comments	2026-05-11 21:30:29 -07:00
rob-maron	057fc7b073	fix guard	2026-05-11 21:30:29 -07:00
rob-maron	528bba6734	fix kimi	2026-05-11 21:30:29 -07:00
Teknium	ea1d0462cf	fix(cli): vertical fallback for markdown tables wider than terminal (#23948 ) Follow-up to #23863 (CJK table alignment). The realigner was correctly padding pipes to identical column offsets, but when a table's natural width exceeds terminal cells it produced lines that the terminal soft-wrapped mid-cell, destroying column alignment visually even though the bytes were perfectly padded. Reported as 'columns are not aligned' on tables containing one long row alongside several short rows. Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal fallback: when realign_markdown_tables is given an available_width budget and the rebuilt horizontal table exceeds it, render each body row as 'Header: value' lines separated by a thin ─ rule. Word-wraps oversize values at the budget with a 2-space continuation indent. - agent/markdown_tables.py: realign_markdown_tables(text, available_width=None); threshold check at the top of _render_block flips into a new _render_vertical fallback. Includes _wrap_to_width with hard-break for tokens longer than the budget. - cli.py: helper _terminal_width_for_streaming() returns shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell safety margin; passed to all three realign call sites (_render_final_assistant_content for strip+render Panel paths, and the streaming flushers in _emit_stream_text / _flush_stream). - tests/agent/test_markdown_tables.py: 4 new tests covering the overflow-vertical fallback for ASCII + CJK content, the 'fits → keep horizontal' case, and the long-cell wrap with indent. Live-verified: with COLUMNS=100, the user's reported 'long row in ASCII table' case now renders as vertical key-value rows that all fit the panel; the 6-column CJK comparison table still renders as an aligned horizontal table because it fits inside 100 cols.	2026-05-11 16:49:13 -07:00
nicoechaniz	e2b713cced	fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV Based on PR #23950 by @nicoechaniz. - Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding - Gate OpenRouter metadata step behind "if not effective_provider": known providers should not be overridden by community-maintained OR data - Keep the targeted Kimi-family 32k guard as a secondary safety net inside the OR gate (for unknown providers with Kimi models) Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>	2026-05-11 13:16:07 -07:00
kshitijk4poor	91eef6255e	fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K, tripping the 64K minimum-context guard and preventing use of the model on Ollama Cloud and Kimi Coding / Moonshot providers. Three fixes in the context-length resolution chain: 1. Ollama Cloud native /api/show query: new _query_ollama_api_show() queries the Ollama native API for authoritative GGUF model_info context_length. For hosted Ollama, prefers model_info over num_ctx since users can't set their own num_ctx on Cloud. Added at step 5e in get_model_context_length(), before the models.dev fallback. 2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context() now also tries appending :cloud and -cloud suffixes when the bare model name doesn't match. models.dev stores 'kimi-k2.6:cloud' but users and the live API use bare 'kimi-k2.6'. 3. Kimi-family 32K guard: after the OpenRouter metadata step, reject exactly 32768 for Kimi-named models (kimi-, moonshot) and fall through to hardcoded defaults ('kimi': 262144). OpenRouter reports 32768 for moonshotai/kimi-k2.6 but the model actually supports 262K. Narrow filter — only 32768, only Kimi-family — becomes dead code when OpenRouter updates its metadata. ---	2026-05-11 13:16:07 -07:00
Teknium	7b76366552	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828 ) Cuts input cost for first-turn Claude requests by ~85-90% on subsequent sessions within an hour. Tools array (~13k tokens for default toolset) + stable system prefix (~5-8k tokens) get a 1h cache_control marker; the volatile suffix (memory, USER profile, timestamp, session id) sits in a separate non-cached block at the end so it doesn't poison the cross-session prefix when it changes. Provider gate: Claude on native Anthropic (incl. OAuth subscription), OpenRouter, and Nous Portal (which proxies to OpenRouter). All other providers keep today's system_and_3 layout unchanged. Layout (4 cache_control breakpoints, Anthropic max): 1. tools[-1] -> 1h (cross-session) 2. system content[0] -> 1h (cross-session, stable prefix) 3. messages[-2] -> 5m (within-session rolling) 4. messages[-1] -> 5m (within-session rolling) Within-session rolling shrinks from 3 messages to 2 to free the breakpoint budget. On Claude with realistic tool loadouts the long-lived tier carries the bulk of cross-session value anyway. System prompt is now always assembled cache-friendly: stable identity / guidance / skills / platform hints first, then session-stable context files (AGENTS.md, .cursorrules), then per-call volatile content. Old single-string callers see the same logical content (same join order), just reordered so volatile lives at the end. Config knobs (defaults shown): prompt_caching: cache_ttl: "5m" # rolling-window TTL (unchanged) long_lived_prefix: true # opt-out switch long_lived_ttl: "1h" # cross-session prefix TTL Live E2E (tests/agent/test_prompt_caching_live.py, gated on OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset: Call 1 (cold): cache_write=13,415 cache_read=0 Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025 Cross-session reuse: 97.09% Implementation: * agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived() + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control() preserved verbatim for the fallback path. * agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards cache_control onto each Anthropic-format tool dict. * run_agent.py: _build_system_prompt_parts() returns the 3-tier dict; _build_system_prompt() joins them (backward compatible). _supports_long_lived_anthropic_cache() policy added next to the existing _anthropic_prompt_cache_policy() (which now also recognises Nous Portal Claude — pre-existing gap fixed in passing). _build_api_kwargs() resolves tools_for_api once and propagates the marker through all four build paths (anthropic_messages, bedrock, codex_responses, profile/legacy chat completions). Long-lived flag plumbed into the runtime snapshot/restore + model-switch + fallback-promotion paths. Tests: * tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache, TestApplyAnthropicCacheControlLongLived). * tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes + a fallback-target case). * tests/agent/test_prompt_caching_live.py: new live E2E (skipif when OPENROUTER_API_KEY is unset; runs outside the hermetic suite). * Targeted suites: 327/327 pass (caching/adapter/policy/builder). * tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry, verified failing on pristine origin/main).	2026-05-11 11:14:56 -07:00
kshitij	2ec8d2b42f	chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937 ) Replace with for all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).	2026-05-11 11:13:25 -07:00
wuli666	111b859e49	fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482 ) #23482 fixed cache poisoning in the sync path: when a Codex auxiliary timeout closes the underlying OpenAI client, _evict_cached_client_instance walks CodexAuxiliaryClient wrappers via their _real_client attribute and drops the cache entry so the next aux call rebuilds. The cache key includes async_mode (see _client_cache_key), so the sync and async clients for the same provider live in two distinct entries pointing at the same underlying transport. The fix walked the sync wrapper's _real_client correctly but the async wrappers (AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient, AsyncGeminiNativeClient) never exposed _real_client at all, so the async entry survived eviction and kept handing out the poisoned client. Effect on async aux callers: one timeout now poisons every subsequent async aux call (compression, vision, session_search, title_generation) with 'Connection error' until gateway restart -- even while the sync route recovered as designed in #23482. Mirror the sync wrapper's _real_client onto each async wrapper so the existing eviction helper finds them. Three changes, one per wrapper: - AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client (the underlying OpenAI client) - AsyncAnthropicAuxiliaryClient: same shape - AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's native facade is itself the leaf; no OpenAI client beneath it) Update _evict_cached_client_instance docstring to reflect that it now covers both sync and async wrappers via the same attribute walk. Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper seeds both sync and async cache entries pointing at the same leaf and asserts both are dropped on a single eviction call. Verified the test fails without the wrapper changes ("async cache entry survived eviction -- wrapper is missing _real_client") and passes with them. Refs #23482, #23432	2026-05-11 11:13:20 -07:00
Teknium	1d00716754	fix(cli,tui): align CJK / wide-char markdown tables (#23863 ) CJK and emoji glyphs render as two terminal cells but JS String#length and the model's own padding count them as one, so any markdown table with Chinese / Japanese / Korean cells drifts right per row when a real terminal renders it. Both surfaces fix this with a display-cell width measurement (wcswidth on the Python side, stringWidth on the TUI side). Changes: - agent/markdown_tables.py: new helper. realign_markdown_tables(text) detects markdown table blocks (header + \|---\| divider) and rewrites the row padding using wcwidth.wcswidth so every pipe and dash lines up across rows. No-op on text without tables. - cli.py: hook the helper into _render_final_assistant_content for strip / render modes (raw passes through untouched), and into the streaming line emitter so live token-by-token rendering also produces aligned tables. A small two-buffer state machine in _emit_stream_text holds table rows until the block ends, then flushes them through the realigner so all rows pad to a single per-column width. - ui-tui/src/components/markdown.tsx: renderTable now uses stringWidth (Bun.stringWidth fast path + East-Asian-width-aware fallback, already memoised in @hermes/ink) instead of UTF-16 String#length for both column-width measurement and per-cell padding. Drops the comment that documented the bug as a deliberate limitation. Validation: - New tests/agent/test_markdown_tables.py (11): every rebuilt block shares pipe column offsets across rows for pure CJK, mixed CJK+emoji, ragged-row, and multi-table inputs. - Updated tests/cli/test_cli_markdown_rendering.py: the existing strip-mode test asserted exact whitespace; rewritten to assert the alignment contract (cell content survives + every rendered row shares pipe offsets). - New ui-tui markdown.test.ts case (1): rendered column-2 start offset is identical for the header + every body row, including the CJK row that drifted before the fix. - Live: hermes chat -q with the user-reported screenshot prompt now produces a perfectly aligned table on the wire (header, divider, 4 body rows including '通义千问', all pipes at identical columns).	2026-05-11 11:13:06 -07:00
kshitij	657874460f	chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926 ) PLR5501 (collapsible-else-if): 28 instances — else: if: → elif: PLR1730 (if-stmt-min-max): 15 instances — if x<y: x=y → x=max(x,y) C420 (dict.fromkeys): 2 instances — dictcomp → dict.fromkeys PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix) C414 (unnecessary-list): 1 instance — sorted(list(x)) → sorted(x) 28 files, -44 net lines. All mechanical, zero logic changes. 17,211 tests pass, zero regressions.	2026-05-11 11:03:29 -07:00
Teknium	228b7d27bd	fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms (#23597 ) When an auxiliary provider returns HTTP 402 (credit / payment), every subsequent compression / title-gen / session-search / vision call still re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402 again, then falling back. On a long Discord/LCM session that meant dozens of doomed 402s per minute (issue #23570). Add a per-process unhealthy-provider cache with a 10 min TTL. When any caller observes a payment error against a provider, the label is marked unhealthy and skipped by: * _resolve_auto Step-1 (main provider use-as-aux path) * _resolve_auto Step-2 (aggregator/fallback chain) * _try_payment_fallback (used by call_llm/acall_llm on first 402) Skip-logs are throttled to once per minute per label so a bursty session doesn't spam agent.log. Entries auto-expire so a topped-up account recovers without manual intervention. The cache is in-process only by design — multi-profile users with different keys per profile must each hit the 402 once. Refs #23570	2026-05-10 22:43:14 -07:00
Teknium	e5bce320db	fix(auxiliary): evict cached client on timeout/connection error (#23482 ) A Codex auxiliary timeout closes the underlying OpenAI client (so the streaming hang doesn't sit until the user kills the session), but the cached wrapper kept pointing at the now-dead transport. Subsequent auxiliary calls (compression retry, memory flush, background review, title generation routed via provider: main) reused that closed client and failed fast with 'Connection error' until the gateway restarted — even though the main agent route was healthy the whole time. Sync `_get_cached_client` had no liveness check (async did, via loop identity), and the connection-error fallback in `call_llm` only fired on the auto provider path, so an explicit provider — including the common `auxiliary.compression.provider: main` shape — never evicted. Three fixes: * New `_evict_cached_client_instance(target)` helper that drops the cache entry whose stored client is target (or wraps it via `_real_client`, for `CodexAuxiliaryClient`). * `_CodexCompletionsAdapter._close_client_on_timeout` evicts the wrapper after closing the inner OpenAI client. * `call_llm` and `async_call_llm` evict on `_is_connection_error` before re-raising, regardless of whether the provider is auto. Net effect: one timeout costs one summary attempt + the existing 30s compressor cooldown; the next compaction rebuilds the client and works. Non-connection errors (4xx/5xx) do not evict, so cache hits stay stable. Closes #23432	2026-05-10 18:55:05 -07:00
Teknium1	ae83a54be4	docs(kanban): worker lane contract page + review-required convention Closes the architectural-pin part of #19931. Most of what that issue asked for is already implemented (logs under kanban root, env-pinned workspace, dispatcher routing of unknown assignees, lifecycle ownership, structured handoff conventions). What was missing: 1. A written contract integrators can point at when adding a new worker lane shape, and 2. The "code-changing workers should not auto-promote success to done" convention. This commit ships both as docs+convention layered on existing primitives. No kernel changes — the kanban_complete / kanban_block / kanban_comment surfaces already support the review-required pattern; we just hadn't written it down or made it visible to workers. Changes: - `agent/prompt_builder.py::KANBAN_GUIDANCE`: append the review-required exception to step 5 of the lifecycle. Workers get the cue auto-injected into their system prompt — drop structured metadata into a kanban_comment first, then end with kanban_block(reason="review-required: <summary>") instead of kanban_complete when the work needs review. Total prompt size went from ~3000 to ~3275 chars; well under the 4096 budget enforced by test_kanban_guidance_size. - `skills/devops/kanban-worker/SKILL.md`: add a worked example to the existing "Good summary + metadata shapes" section between the Coding-task and Research-task examples. Same shape as the others (kanban_comment with structured handoff JSON, then kanban_block with the human-readable reason). Plus a one-line guide on when to use kanban_complete vs the review-required pattern. - `website/docs/user-guide/features/kanban-worker-lanes.md` (new): the integrator-facing contract. Covers the hierarchy, the three things every lane must provide (assignee, spawn mechanism, lifecycle terminator), the env vars the dispatcher injects, the review-required convention, the failure modes the kernel handles for free, and an explicit "external CLI worker lane" deferred- pending-concrete-asker section that links to #19931 and #19924. - `website/sidebars.ts`: link the new page under user-guide/features. The "specialist worker lanes for external CLI tools (Codex / Claude Code / OpenCode)" runner is NOT shipped here. The dispatcher's spawn_fn parameter already supports plugin-shaped extension; the per-CLI integration work (auth, sandbox policy, exit-code mapping) needs a concrete asker. The new docs page tells would-be integrators the contract any such lane must satisfy. Refs #19931	2026-05-10 18:15:52 -07:00
Teknium	d6e1fadbf5	fix(xai): omit reasoning.effort for grok models that reject it (#23435 ) xAI's Responses API returns HTTP 400 ("Model X does not support parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-, grok-4-1-fast-, grok-3, grok-4.20-0309-, and grok-code-fast-1 — even though those models reason natively. Hermes was unconditionally sending `reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking direct `--provider xai` for the entire grok-4 line. Add a substring allowlist predicate (verified live against api.x.ai 2026-05-10) covering the only Grok families that accept the effort dial: grok-3-mini, grok-4.20-multi-agent, grok-4.3. The Responses transport omits the `reasoning` key entirely for everything else while still including `reasoning.encrypted_content` so we capture native reasoning tokens. Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709` went from HTTP 400 to a successful reply.	2026-05-10 15:21:30 -07:00
Teknium	c39168453d	feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914 ) * feat(i18n): localize /model command output Reported by @tianma8888: when Chinese users run /model, the labels ("Provider:", "Context:", "_session only_", etc.) are still English. This routes the static prose through the existing i18n catalog so it follows display.language / HERMES_LANGUAGE. Changes: - locales/{en,zh,ja,de,es,fr,tr,uk}.yaml: add 17 keys under gateway.model.* covering switched/provider/context/max_output/cost/ capabilities/prompt_caching/warning/saved_global/session_only_hint/ current_label/current_tag/more_models_suffix/usage_. - gateway/run.py _handle_model_command: replace hardcoded f-strings in the picker callback, the text-list fallback, and the direct-switch confirmation block with t("gateway.model.<key>", ...). What stays English: - model IDs, provider slugs, capability strings, cost figures, and the "[Note: model was just switched...]" prepended to the model's next prompt (LLM-facing, not user-facing). - The two slightly-different session-only hints unify on a single key with the em-dash phrasing. Validation: tests/agent/test_i18n.py 27/27 passing (parity contract holds), tests/gateway/ -k 'model or i18n' 74/74 passing. feat(i18n): localize all gateway slash command outputs Expands the i18n catalog from 7 strings to 234 keys across 35 gateway slash command handlers, so non-English users see localized output for \`/profile\`, \`/status\`, \`/help\`, \`/personality\`, \`/voice\`, \`/reset\`, \`/agents\`, \`/restart\`, \`/commands\`, \`/goal\`, \`/retry\`, \`/undo\`, \`/sethome\`, \`/title\`, \`/yolo\`, \`/background\`, \`/approve\`, \`/deny\`, \`/insights\`, \`/debug\`, \`/rollback\`, \`/reasoning\`, \`/fast\`, \`/verbose\`, \`/footer\`, \`/compress\`, \`/topic\`, \`/kanban\`, \`/resume\`, \`/branch\`, \`/usage\`, \`/reload-mcp\`, \`/reload-skills\`, \`/update\`, \`/stop\` (plus the \`/model\` block already added in the previous commit). Reported by @tianma8888 — Chinese users want command output prose in their language, not just the labels we already had. Translations are hand-written for all 8 supported locales (en, zh, ja, de, es, fr, tr, uk), matching each catalog's existing style: full-width punctuation in zh, em-dashes in zh/ja/uk, French spaced colons, German noun capitalization, etc. What stays English (unchanged): - Identifiers/values: model IDs, file paths, profile names, session IDs, command flag names like --global, URLs, config keys. - Backtick code spans: \`/foo\`, \`config.yaml\`. - Log messages (logger.info/warning/error). - LLM-facing system notes prepended to next prompt (e.g. [Note: model was just switched...]). - Strings produced by external modules (gateway_help_lines, format_gateway, manual_compression_feedback) — those have their own surfaces. New shared keys for cross-handler boilerplate: - gateway.shared.session_db_unavailable (5 call sites: branch, title, resume, topic, _disable_telegram_topic_mode_for_chat) - gateway.shared.session_not_found (1 site) - gateway.shared.warn_passthrough (2 sites in /title's f"⚠️ {e}" pattern) YAML gotcha fixed: \`yolo.on\` and \`yolo.off\` were originally written unquoted, which YAML 1.1 parses as boolean True/False keys. Renamed to \`yolo.enabled\` / \`yolo.disabled\` for both safety and clarity. Test fix: tests/agent/test_i18n.py::test_t_missing_key_in_non_english_falls_back_to_english now resets the catalog cache on teardown, so the fake "foo: English Foo" locale doesn't poison the module-level cache for subsequent tests in the same xdist worker. (Without this, every gateway slash command test that shares a worker with the i18n suite would see the fake catalog.) Validation: - tests/agent/test_i18n.py: 27/27 (parity contract — every key in every locale, matching placeholder tokens). - tests/gateway/: 5077 passed, 0 failed (full gateway suite). - 180 t() call sites added across 35 handlers; 1872 catalog entries total (234 keys × 8 locales). * feat(i18n): add 8 new locales — af, ko, it, ga, zh-hant, pt, ru, hu Expands the static-message catalog from 8 → 16 languages, each with full 270-key parity against the English source-of-truth. Every locale now covers the same surface PR #22914 added: approval prompts plus all 35 gateway slash command outputs. New locales: - af Afrikaans (community ask in #21961 by @GodsBoy; PRs #21962, #21970) - ko Korean (PRs #20297 by @tmdgusya, #22285 by @project820) - it Italian (PR #20371 by @leprincep35700) - ga Irish/Gaeilge (PR #20962 by @ryanmcc09-dot) - zh-hant Traditional Chinese (PRs #20523 by @jackey8616, #13140 by @anomixer) - pt Portuguese (PRs #20443 by @pedroborges, #15737 by @carloshenriquecarniatto, #22063 by @Magaav) - ru Russian (PR #22770 by @DrMaks22) - hu Hungarian (PR #22336 by @lunasec007) Each locale uses native-quality translations matching the existing tone and conventions of the older 8 locales: - zh-hant uses 繁體 characters with TW/HK technical vocabulary (軟體 not 软件, 連線 not 连接, 設定 not 设置, 訊息 not 消息, 工作階段 not 会话, 程式 not 程序, 預設 not 默认, 伺服器 not 服务器), full-width punctuation 「：（）」. - ko uses formal 합니다체 (습니다/합니다) register throughout. - pt uses European Portuguese as baseline with neutral PT/BR vocabulary where possible. - ga uses standard An Caighdeán Oifigiúil; English loanwords retained for tech terms without good Irish equivalents (gateway, API, JSON). - All preserve {placeholder} tokens, backtick code spans, slash commands, brand names (Hermes, MCP, TTS, YOLO, OpenAI, Telegram, etc.), and emoji. Aliases added in agent/i18n.py: - af-za, Afrikaans → af - ko-kr, Korean, 한국어 → ko - it-it, italiano → it - ga-ie, Irish, Gaeilge → ga - zh-tw, zh-hk, zh-mo, traditional-chinese → zh-hant (note: zh-tw used to alias to zh; now aliases to its own zh-hant catalog) - zh-cn, zh-hans, zh-sg → zh (unchanged from before) - pt-pt, pt-br, brazilian, portuguese → pt - ru-ru, Russian, русский → ru - hu-hu, Magyar → hu The zh-tw alias re-routing is intentional: previously typing 'zh-TW' got the Simplified Chinese catalog (wrong vocabulary for Taiwan/HK users). Now those users get the proper Traditional Chinese catalog. Validation: - tests/agent/test_i18n.py: 43/43 (parity contract holds for all 16 languages × 270 keys = 4320 catalog entries, with matching placeholder tokens). - E2E alias resolution verified for all 19 alias inputs (Afrikaans, ko-KR, 한국어, italiano, Gaeilge, zh-TW, zh-HK, traditional-chinese, pt-BR, brazilian, Magyar, etc.). - tests/gateway/: 5198 passed (3 pre-existing TTS routing failures unrelated to i18n). Credit to all contributors whose PRs surfaced these language requests. Their original PRs may now be closed as superseded with credit. * feat(dashboard-i18n): add 14 web dashboard locales matching the static catalog Brings the React dashboard (web/src/) up to the same 16-language coverage the static catalog already has after the previous commits in this PR. The Translations interface is TypeScript-typed, so every new locale must provide every key — tsc -b is the parity guard. Languages added (each is a complete 429-line locale file): - af Afrikaans - ja Japanese (PR #22513 by @snuffxxx surfaced this) - de German (PR #21749 by @mag1art) - es Spanish (PR #21749) - fr French (PRs #21749, #10310 by @foXaCe) - tr Turkish - uk Ukrainian - ko Korean (PRs #21749, #18894 by @ovstng, #22285 by @project820) - it Italian - ga Irish (Gaeilge) - zh-hant Traditional Chinese (PR #13140 by @anomixer) - pt Portuguese (PRs #22063 by @Magaav, #22182 by @wesleysimplicio, #15737 by @carloshenriquecarniatto) - ru Russian (PRs #21749, #22770 by @DrMaks22) - hu Hungarian (PR #22336 by @lunasec007) Each translation covers all 15 namespaces with full key parity vs en.ts, preserves every {placeholder} token verbatim, keeps identifiers untranslated (brand names, file paths, cron expressions, code spans), translates the language.switchTo tooltip into the target language, and matches existing tone conventions (zh-hant uses TW/HK vocab; ja uses formal desu/masu; ko uses formal seumnida register; ga uses An Caighdean Oifigiuil with English loanwords for tech vocab without good Irish equivalents). Plumbing: - web/src/i18n/types.ts: Locale union expanded to all 16 codes. - web/src/i18n/context.tsx: imports all 16 catalogs; exports LOCALE_META (endonym + flag per locale); isLocale() type guard. - web/src/i18n/index.ts: re-export LOCALE_META. - web/src/components/LanguageSwitcher.tsx: replaced two-state EN-ZH toggle with a click-to-open dropdown listing all 16 languages. Note: zh-hant.ts exports zhHant (camelCase) since hyphen is invalid in a JS identifier; the canonical 'zh-hant' string keys it in TRANSLATIONS. Validation: - npx tsc -b: 0 errors. Every locale satisfies Translations. - npm run build (tsc + vite production): green, 2062 modules. - Each locale file is exactly 429 lines. Out of scope: plugin dashboards (kanban/achievements ship as prebuilt bundles with no source in repo); Docusaurus docs (separate surface); TUI (no i18n yet). * feat(plugin-i18n): localize achievements + kanban plugin dashboards across all 16 locales Brings the two shipped plugin dashboards (hermes-achievements, kanban) under the same i18n umbrella as the core dashboard PR #22914 just established. Both bundles now read user-facing strings from the host's i18n catalog via SDK.useI18n() instead of hardcoded English. ## Approach Plugin dashboards ship as prebuilt IIFE bundles in plugins/<name>/dashboard/dist/index.js — no build step, no source in repo (upstream-authored, vendored as compiled JS). Earlier contributor PRs (#22594, #22595, #18747) tried direct edits but didn't actually wire the bundles to read translations. This change does the wiring properly: 1. Each bundle gets a useI18n shim at IIFE scope: const useI18n = SDK.useI18n \|\| function () { return { t: { kanban: null }, locale: "en" }; }; Older host SDKs without useI18n still load the bundle and render English fallbacks. 2. A small tx(t, path, fallback, vars) helper resolves dotted keys under the plugin's namespace (t.kanban.* or t.achievements.) and interpolates {placeholder} tokens. 3. Every React component starts with const { t } = useI18n() and each user-visible string is wrapped in tx(t, "key", "English fallback"). Helpers called outside React components (window.prompt callers, constants used during init) take t as a parameter. 4. Top-level constants that were English dictionaries (COLUMN_LABEL, COLUMN_HELP, DESTRUCTIVE_TRANSITIONS, DIAGNOSTIC_EVENT_LABELS in kanban) become getColumnLabel(t, status)-style functions backed by FALLBACK_ dictionaries. ## Translations added Two new top-level namespaces added to the dashboard's TypeScript-typed Translations interface: - achievements: ~70 keys covering the hero, scan banner, achievement card, share dialog, stats, filters, and empty states. - kanban: ~145 keys covering the board, columns (with nested columnLabels and columnHelp sub-dicts), card detail panel, bulk-actions toolbar, dependency editor, board switcher, and diagnostic callouts. Each key is provided across all 16 supported locales: en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu. Total new translation entries: ~3,440 (215 keys × 16 locales). ## What stays English (deliberate) - API paths, CSS class names, data-* attributes, JSON keys, regex strings, URLs, file paths (~/.hermes/kanban.db, boards/_archived/). - State identifier strings used as lookup keys (triage / todo / ready / running / blocked / done / archived) — labels translate, key strings don't. - The PNG share-card text rendered to canvas in the achievements ShareDialog (HERMES AGENT watermark, UNLOCKED stamp, tier names) — these become part of a globally-shared image and stay English. - localStorage keys (hermes.kanban.selectedBoard). - Brand names (Kanban, Hermes, WebSocket, Nous Research). ## Contributor credit PR #22594 by @02356abc and PR #22595 by @02356abc supplied the en + zh kanban namespace skeleton (145 keys); used as the en source- of-truth in this commit and translated to the other 14 locales. PR #18747 by @laolaoshiren first surfaced the achievements localization request. ## Validation - npx tsc -b: 0 errors. All 16 locale .ts files satisfy the Translations type with full key parity. - npm run build (tsc + vite production build): green, 2062 modules, 1.56MB JS / 95KB CSS, ~2.5s build. - node --check on both plugin bundles: parse cleanly. - 126 tx() call sites in kanban, 46 in achievements. ## Out of scope - TUI (ui-tui/) has no i18n infrastructure yet. - Docusaurus docs (website/i18n/) — already had zh-Hans; expanding is a separate translation workstream (Thai / Korean / Hindi PRs).	2026-05-10 07:14:14 -07:00
Teknium	5aa755e4e6	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194 ) * feat(plugins): host-owned LLM access via ctx.llm Plugins can now ask the host to run a one-shot chat or structured completion against the user's active model and auth, without ever seeing an OAuth token or API key. Closes the gap where plugins that needed bounded structured inference (receipts, CRM extraction, support classification) had to either bring their own provider keys or register a tool the agent had to call. New surface on PluginContext: - ctx.llm.complete(messages, ...) - ctx.llm.complete_structured(instructions, input, json_schema, ...) - async siblings ctx.llm.acomplete / acomplete_structured Backed by the existing auxiliary_client.call_llm pipeline — every provider, fallback chain, vision routing, and timeout policy Hermes already supports applies automatically. Trust gate (fail-closed by default): - plugins.entries.<id>.llm.allow_model_override - plugins.entries.<id>.llm.allowed_models (allowlist; '' = any) - plugins.entries.<id>.llm.allow_agent_id_override - plugins.entries.<id>.llm.allow_profile_override Embedded model@profile shorthand goes through the same gate as explicit profile=, so it can't bypass the auth-profile policy. Conflicting explicit and embedded profiles fail closed. Also lands: - plugins/plugin-llm-example/ — reference plugin that registers /receipt-extract, demonstrating image+text structured input, jsonschema validation, and the trust-gate config. - website/docs/developer-guide/plugin-llm-access.md — full API docs. - 45 unit tests covering trust gates, JSON parsing, schema validation, image encoding, async surface, and config loading. Validation: - 2628 tests pass in tests/agent/ - E2E: bundled plugin loaded with isolated HERMES_HOME, slash command produced parsed JSON via stubbed call_llm - response_format extra_body wired correctly for both json_object and json_schema modes docs(plugin-llm): rewrite quickstart and framing The quickstart now uses a meeting-notes-to-tasks example instead of a receipt extractor, and the page leads with hook-time / gateway pre-filter / scheduled-job framing rather than the OpenClaw KB/support/CRM/finance/migration enumeration that the original upstream PR used. Receipt example moved to a separate worked example link so the docs page itself doesn't echo any of the upstream framing. Also clarifies where ctx.llm fits in the broader plugin surface (table comparing register_tool / register_platform / register_hook / etc.) and what makes this lane different from auxiliary_client internals. No code change. * docs(plugin-llm): reframe as any LLM call, not just structured output The original draft leaned heavily on complete_structured() and made the chat lane (complete() / acomplete()) feel like a footnote. Restructure so: - The page title and description say 'any LLM call.' - The lead shows BOTH a plain chat call (error rewriter) AND a structured call (triage scorer) up top. - Quick start has two complete plugin examples — /tldr (chat) and /paste-to-tasks (structured). - New 'When to use which' table for choosing complete() vs complete_structured() vs the async siblings. - Trust-gate sections explicitly note 'all four methods,' and the request-shaping list calls out chat-only fields (messages) and structured-only fields (instructions, input, json_schema) alongside each other. - The 'Where this fits' section now says 'for any reason, structured or not.' The receipt-extractor reference plugin still exists under plugins/plugin-llm-example/ — but the docs page no longer treats it as the canonical surface example. It's now described as 'a third worked example, this time with image input.' No code change. * feat(plugin-llm): split provider/model into independent explicit kwargs The first cut accepted a single 'provider/model' slug on every method and split it internally. That looked clean but broke under live test: the model-override path tried to use the slug's vendor prefix as a literal Hermes provider id, which silently switched the user off their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user who routes through OpenRouter — host attempted to call the 'openai' provider directly, failed because OPENAI_API_KEY wasn't set). New shape mirrors the host's main config: ctx.llm.complete( messages=[...], provider='openrouter', # gated, optional model='openai/gpt-4o-mini', # gated, optional profile='work', # gated, optional ... ) Each is independently gated by its own allow__override flag. Granting model-override does NOT auto-grant provider-override. Allowlists are now per-axis (allowed_providers, allowed_models) matched literally against whatever string the plugin sends. Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes doesn't use that pattern anywhere else; profile= is its own kwarg. Live E2E (against real OpenRouter via Teknium's config) confirms: - zero-config call works - default-deny blocks each override with a helpful error - model-only override stays on user's active provider (the bug) - provider+model override switches cleanly - allowlist refuses non-listed entries - structured output round-trip parses + schema-validates Tests: 49 cases (up from 45); all green. Docs updated to match the new shape, including a 'most plugins never need this section' callout on the trust-gate config block. fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core Three integration fixes for the ctx.llm surface: 1. Attribution bug — result.provider and result.model now reflect what call_llm actually used, not placeholder fallbacks ('auto', 'default'). New _resolve_attribution() helper: - explicit overrides win (what the call targeted) - response.model wins for the recorded model (provider canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.) - falls back to _read_main_provider() / _read_main_model() when no override is set, so audit logs reflect the user's active main provider/model - 'auto' / 'default' only when EVERYTHING is empty Live verified: zero-config call now records provider='openrouter', model='anthropic/claude-4.7-opus-20260416' instead of provider='auto', model='default'. 2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete works from inside a registered post_tool_call callback. The docs page promised hook integration; now there's a test that exercises the lazy-import path through the real invoke_hook machinery. Two cases: traceback-rewrite hook with conditional ctx.llm.complete, and minimal hook regression for the sync-hook + sync-llm path. 3. Reference plugin moved out of core. plugins/plugin-llm-example/ is gone from hermes-agent — it now lives in the new NousResearch/hermes-example-plugins companion repo. The docs page links there. Hermes' bundled plugins should be plugins users actually run; reference / docs-companion plugins live externally. Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/ + tests/gateway/ + tests/tools/ + tests/agent/ shows 16770 passing; the 12 failures are all pre-existing on origin/main (verified by stashing this branch's changes and re-running) — kanban-boards, delegate-task, gateway-restart, tts-routing — none touch the plugin_llm surface. * chore(plugins): move all example plugins to companion repo Reference / docs-companion plugins now live exclusively in NousResearch/hermes-example-plugins, not bundled with the core repo: - example-dashboard - strike-freedom-cockpit A new fourth example, plugin-llm-async-example, was added to that repo demonstrating ctx.llm's async surface (acomplete()) with asyncio.gather() — registers /translate <lang>: <text> which fires forward translation + sentiment classifier in parallel, then a back-translation for QA. Live-tested at 2.5s for three real provider round-trips (would be ~5-6s sequential). Docs updated: - developer-guide/plugin-llm-access.md links both sync and async examples in the Reference section - user-guide/features/extending-the-dashboard.md repoints both demo sections to the companion repo with corrected install paths - user-guide/features/built-in-plugins.md drops the two demo rows - AGENTS.md notes that example plugins live in the companion repo Net: hermes-agent's plugins/ directory now contains only plugins users actually run (memory providers, dashboard tabs that ship real features, the disk-cleanup hook, platform adapters). All four demo / reference plugins live externally where they can be cloned on demand instead of inflating the core install.	2026-05-10 07:09:28 -07:00
Teknium	7312f7f849	feat(curator): hint at `hermes curator pin` in the rename block (#23212 ) Surfaces the pin command at the moment users care about it: when a consolidation just landed against their skill library and they're looking at the umbrella name in the curator output. Previously `hermes curator pin` existed but had no discovery surface — users only learned it existed by reading docs or stumbling onto `hermes curator --help`. The hint: archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale — pruned (stale) full report: hermes curator status keep an umbrella stable: hermes curator pin document-tools Gated on having at least one consolidation that produced an umbrella. Pruned-only runs (nothing surviving to pin) skip the hint. When multiple umbrellas were produced, picks alphabetically first as a concrete example rather than listing them all. 3 new tests in tests/agent/test_curator_classification.py covering: consolidation produces hint with real umbrella name, pruned-only run omits it, multi-umbrella picks one example.	2026-05-10 06:44:53 -07:00
kshitijk4poor	44cdf555a8	fix(codex-spark): defensive 128k entry in DEFAULT_CONTEXT_LENGTHS + clarify validation test docstring Two follow-ups from self-review: 1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The primary resolution path for Spark goes through provider='openai-codex' → _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future code path resolves Spark's context with a different provider (custom proxy, generic fallthrough), the longest-substring-first lookup in step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x. Adding the explicit override is a cheap defensive correctness fix matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic gpt-5 entry. 2. Update test_openai_codex_model_validation_fallback.py docstring. The bug it was originally written for (gpt-5.3-codex-spark missing from listing) is now resolved by this PR's catalog restoration. The test still validly exercises the soft-accept code path for any future entitlement-gated Codex slug that ships before Hermes catalogs it, but the framing was stale — clarified.	2026-05-09 23:17:25 -07:00
kshitij	9ee9a4297d	docs(codex-spark): document ChatGPT Pro entitlement gating PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed via the Codex OAuth backend at chatgpt.com/backend-api/codex/models — not via the public OpenAI API. Add explanatory comments in: - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py) - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py) - list_authenticated_providers' live-discovery branch (model_switch.py) so future maintainers don't strip the entry again. Also documents the intentional asymmetry that Spark stays out of the "openai" provider catalog (it isn't on the public API) and why the supported_in_api filter is not applied for the openai-codex route.	2026-05-09 23:17:25 -07:00
olegdater	c6dc295a35	fix(model-metadata): set codex-spark fallback context to 128k	2026-05-09 23:17:25 -07:00
olegdater	2a6f3deb50	fix(model-metadata): restore gpt-5.3-codex-spark fallback context	2026-05-09 23:17:25 -07:00
Teknium	3800972dd0	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 ) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).	2026-05-09 21:06:19 -07:00
Teknium	4375b82cd9	feat(curator): show rename map in user-visible summary (#22910 ) * feat(curator): show rename map (where skills went) in user-visible summary The full data has always been on disk in REPORT.md, but the user-visible curator summary (gateway 💾 line, CLI session-start panel, `hermes curator status`) was counts-only — "consolidated 4 into 2 umbrellas" with no names. Users only discovered renames when something they expected was gone. New `_build_rename_summary()` formats the rename map and appends it to `final_summary`: auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status Empty on no-op ticks (no archives), so most ticks add zero log noise. Cap of 10 entries keeps agent.log readable when a 50-skill consolidation lands; the full list is always in REPORT.md. `hermes curator status` indents continuation lines so the multi-line summary reads as one logical field. 5 new tests in tests/agent/test_curator_classification.py covering empty / consolidation / pruning / cap / mixed cases. * feat(curator): show recent run summary once on `hermes update` The rename map is now visible from where users actually look — the update flow they explicitly run, instead of just the live gateway log or transient CLI session-start panel. Behavior: - After `hermes update`, if the most recent curator run produced a rename map (multi-line summary) that the user hasn't seen yet, print it once with a 'last run Xh ago' header and a one-time-message footer. - Stamp `last_run_summary_shown_at = last_run_at` after printing so subsequent `hermes update` invocations are silent until a newer curator run lands. - Silent on no-op runs (single-line summary like 'auto: no changes; llm: no change'). Still stamps shown so we don't reconsider on every update. - Silent when the curator has never run (the existing first-run notice handles that case). Output: ℹ Skill curator — last run 4h ago auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status (This message shows once per curator run. View anytime: hermes curator status) State migration: - `_default_state()` gains `last_run_summary_shown_at: None`. Existing state files lack the field; `.get()` returns None; the comparison treats any prior run as 'not yet shown' and prints once on next update. Self-healing. Wiring: - Both `hermes update` paths in main.py call the new `_print_curator_recent_run_notice()` right after the existing first-run notice. Best-effort try/except so a state-load bug never breaks the update flow. 6 tests in tests/hermes_cli/test_curator_recent_run_notice.py: no-run / single-line / multi-line / show-once / new-run-resets / time-formatter buckets.	2026-05-09 18:43:40 -07:00
Wesley Simplicio	4f8d8ad912	fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664 ) RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible shim was falling through to FailoverReason.unknown, surfacing as 'Empty response from model' and burning 3 retry slots on the same failing endpoint. _classify_by_message had no timeout-message branch — only billing/rate_limit/auth/context_overflow/model_not_found patterns. The type-based check at line 565 also requires isinstance(error, (TimeoutError, ConnectionError, OSError)) — a plain RuntimeError doesn't match. Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded', 'request timed out', 'operation timed out', 'upstream timed out', 'turn timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True) when any pattern matches. Salvage of #22664's classifier portion. The original PR also bundled a fallback self-selection guard which is now redundant (already on main via #22780) plus DeepSeek thinking and session_search fixes that are their own separate concerns. Follow-up to #22780 — fixes the still-broken classification of generic-typed provider-shim timeouts that #22780's dedup didn't cover.	2026-05-09 17:54:07 -07:00
Wesley Simplicio	35f773c459	fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:52:51 -07:00
Teknium	c7f0aab949	feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838 ) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set.	2026-05-09 14:47:00 -07:00
Ninso112	883e11f0a0	fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708 ) Pass session_id through to provider profile build_api_kwargs_extras so the OpenRouter profile can attach an xAI cache-affinity header (x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt cache requires server affinity via this header — without it the cache is poisoned and Grok prompt-cache hit rates drop dramatically on multi-turn sessions. Carve-out of #22708 by Ninso112. The original PR bundled a /diff slash command, a zsh completion fix (already on main via #22802), and holographic memory null-guards. This salvage keeps just the Grok header work — small, targeted, and well-tested. Other contributors and changes preserved for separate review. Closes #22705.	2026-05-09 13:38:52 -07:00
Teknium	1c9ffb177c	fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805 ) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. We instead delete the change-detector entirely and assert the relationship. Closes #22268.	2026-05-09 13:37:19 -07:00
Maxim Esipov	17d8914850	fix(auxiliary): rotate pooled auth after quota failures	2026-05-09 13:35:04 -07:00
Teknium	775c0e22cf	perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808 ) `fetch_models_dev()` is on the hot path of every `AIAgent.__init__` (via `context_compressor → get_model_context_length`). The previous policy was "always try network first, only fall back to disk if network fails," so every fresh `hermes chat` / `hermes gateway` / batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry that was already on disk from earlier runs. Add a stage 2 between in-mem and network: if `models_dev_cache.json` exists and its mtime is younger than the existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache already uses), load from disk and skip the network call. The in-mem TTL is anchored to the disk file's age, so a 50-min-old cache stays in-memory for only 10 more minutes — no surprise extension of staleness window. Invariants preserved: - `force_refresh=True` still always hits the network and only falls back to disk on failure (`hermes config refresh` semantics). - Missing disk cache → fall through to network (first-ever run). - Stale disk cache (mtime > TTL) → fall through to network. - Negative file age (clock skew) → fall through to network. - Network failure → existing stage-4 stale-disk fallback unchanged. Measured impact (3-run medians, 9950X3D, fresh process per run): fetch_models_dev cold: 256 → 17 ms (-93%) hermes chat -q wall: 4.00 → 3.73 s (-7% median) 3.99 → 3.60 s (-10% min) The chat-end-to-end win is bounded below by API latency variance, but the fetch_models_dev microbenchmark is the cleanest signal: 239 ms shaved off every fresh-process agent construction. Win compounds with the previous perf PRs: #22681 google_chat lazy-load #22766 doctor parallel + IMDS off #22790 gateway.platforms PEP 562 Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones covering the new disk-cache-first path, force_refresh override, stale disk fallback, and missing-disk-cache fall-through). Full `tests/agent/` suite: 2560 passed, 0 failed.	2026-05-09 13:32:38 -07:00
Julien Talbot	cd712b176a	feat(transports/codex): pass reasoning.effort to xAI Responses API The is_xai_responses branch only sent include=[reasoning.encrypted_content] without forwarding the resolved reasoning_effort. Other Responses providers (OpenAI, GitHub) already get effort forwarded — this aligns the xAI path. Without this, agent.reasoning_effort is silently dropped on the xAI direct path, making Hermes unable to control reasoning depth on grok-4.x via api.x.ai. Tests added to TestCodexBuildKwargs cover effort passthrough, disabled state, and minimal-clamp parity with non-xAI.	2026-05-09 13:23:02 -07:00
Teknium	6e5489c9f3	fix(memory): tighten MEMORY_GUIDANCE against ephemeral PR/issue/SHA notes (#22781 ) The model regularly writes session-outcome facts to MEMORY.md despite the existing 'Do NOT save task progress' line — entries like 'Submitted PR #22577 for the kanban dedup fix' or 'Fixed bug X in file Y'. These are stale within days, pollute the system prompt, and crowd out durable user preferences (the issue #22563 reporter saw 9 sections of bug-fix notes injected on a brand-new task). Add explicit examples of what NOT to save (PR numbers, issue numbers, commit SHAs, 'fixed/submitted/Phase N done', file counts) plus the 7-day-staleness heuristic so the model has a concrete calibration target rather than guessing what counts as 'task progress'. Closes #22563 (the prompt-side, low-risk portion). The bigger relevance-based-injection / vector-retrieval feature requested in #22563 is tracked under #2184 (Richer local memory). Per skill rule on prompt caching, dynamic memory injection breaks the frozen-snapshot invariant and needs a separate design call.	2026-05-09 12:48:25 -07:00
obafemiferanmi1999	0f1d41a88c	fix(transports): use PEP 604 annotation for ToolCall.extra_content `ToolCall.extra_content` was annotated `Optional[Dict[str, Any]]`, but neither `Optional` nor `Dict` are imported at the top of `agent/transports/types.py` — only `Any` is. The rest of the file consistently uses PEP 604 / 585 syntax (e.g. `str \| None`, `dict[str, Any] \| None`). The file has `from __future__ import annotations`, so the missing names don't crash class definition. But the annotation IS evaluated when anything calls `typing.get_type_hints(ToolCall)` — introspection raises `NameError: name 'Optional' is not defined`. ruff catches it cleanly: F821 Undefined name `Optional` agent/transports/types.py:65:32 F821 Undefined name `Dict` agent/transports/types.py:65:41 Switch the annotation to `dict[str, Any] \| None` to match the rest of the file's style. No new imports needed. Verified: - ruff F-checks now pass on the file - `typing.get_type_hints(ToolCall)` succeeds where it raised before - 166/166 tests in tests/agent/transports/ pass on Windows + Python 3.12	2026-05-09 02:25:37 -07:00
qWaitCrypto	2c8c48fbc7	fix(webui): clarify MEDIA absolute-path hint	2026-05-09 02:22:40 -07:00
qWaitCrypto	aad5490e74	fix(webui): add platform hint for MEDIA rendering WebUI sessions construct AIAgent(platform="webui") but PLATFORM_HINTS had no "webui" entry, so the agent received no platform hint at all. The WebUI frontend supports rich MEDIA:/absolute/path previews for images, audio, video, PDF, HTML, CSV, diffs, and Excalidraw, but without a hint the agent either ignores MEDIA: or falls back to Markdown image syntax which silently fails for local files. Add a webui hint that documents the MEDIA: render path and warns against ![alt](/path) for local files. Fixes #21883	2026-05-09 02:22:40 -07:00
kshitij	c7e8add120	fix(context): handle JSON decode errors in compression — salvage of #22248 (#22416 ) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` — e.g. an HTML 502 page from a misconfigured gateway — the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of #22248 by @0xharryriddle. Closes #22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>	2026-05-09 01:47:15 -07:00
Teknium	0ec052ca24	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138 ) Interactive `hermes` launch drops from ~21s to ~2.5s. Three independent fixes, each targets a distinct hot spot in the banner / tool-registration path that fires on every CLI invocation. 1. `get_external_skills_dirs()` in-process mtime cache (~10s saved) The function re-read + YAML-parsed the full ~/.hermes/config.yaml on every call. Banner build invokes it once per skill to resolve the category column, which on a 120-skill install meant ~120 reparses of a 15 KB config (~85 ms each). Added a `(config_path, mtime_ns) -> list[Path]` memo; stat() is ~2 us vs ~85 ms for the parse. Edits to config.yaml invalidate the cache on the next call via mtime. 2. Feishu availability probe uses `importlib.util.find_spec` (~5.2s saved) `tools/feishu_doc_tool.py::_check_feishu` and the identical helper in `feishu_drive_tool.py` were calling `import lark_oapi` purely to detect whether the SDK was installed. Executing the real import pulls in websockets + dispatcher + every v2 API model — ~5 seconds of work that fires at every tool-registry bootstrap. `find_spec` answers the same question ("is lark_oapi importable?") without executing the module. The actual tool handlers still do the real import on invoke, so runtime behavior is unchanged. 3. `_web_requires_env` no longer triggers Nous portal refresh (~800ms saved) `tools/web_tools.py::_web_requires_env` used `managed_nous_tools_enabled()` to gate four gateway env-var names in the returned list. The gate called `get_nous_auth_status()` -> `resolve_nous_runtime_credentials()` -> live HTTP POST to the portal on every tool-registry bootstrap. But the list is pure metadata — if the env var is set at runtime, the tool lights up; otherwise it doesn't. Including the four names unconditionally is harmless for unsubscribed users (vars just aren't set) and eliminates the sync HTTP round trip from startup. Test: - tests/agent/test_external_skills_dirs_cache.py (new, 6 cases): returns config'd dir, caches on second call (yaml_load patched to raise — never invoked), invalidates on mtime bump, empty when config missing, returned list is a defensive copy, per-HERMES_HOME cache key isolation. - Existing tests/agent/test_external_skills.py and tests/tools/ continue to pass modulo pre-existing flakes on main (test_delegate, test_send_message — unrelated, pass in isolation). Measured: bare `hermes` (cold → REPL ready) 21,519ms -> 2,618ms on Teknium's install (119 skills, 15 KB config.yaml, Nous auth logged in, lark_oapi installed). 8x faster.	2026-05-08 16:39:32 -07:00
Teknium	cc38282b04	feat(cross-platform): psutil for PID/process management + Windows footgun checker ## Why Hermes supports Linux, macOS, and native Windows, but the codebase grew up POSIX-first and has accumulated patterns that silently break (or worse, silently kill!) on Windows: - `os.kill(pid, 0)` as a liveness probe — on Windows this maps to CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console process group (bpo-14484, open since 2012). - `os.killpg` — doesn't exist on Windows at all (AttributeError). - `os.setsid` / `os.getuid` / `os.geteuid` — same. - `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr errors at runtime on Windows. - `open(path)` / `open(path, "r")` without explicit encoding= — inherits the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX), causing mojibake round-tripping between hosts. - `wmic` — removed from Windows 10 21H1+. This commit does three things: 1. Makes `psutil` a core dependency and migrates critical callsites to it. 2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that blocks new instances of any of the above patterns. 3. Fixes every existing instance in the codebase so the baseline is clean. ## What changed ### 1. psutil as a core dependency (pyproject.toml) Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical cross-platform answer for "is this PID alive" and "kill this process tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on Windows (NOT a signal call), and its `Process.children(recursive=True)` + `.kill()` combo replaces `os.killpg()` portably. ### 2. `gateway/status.py::_pid_exists` Rewrote to call `psutil.pid_exists()` first, falling back to the hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows (and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing — e.g. during the scaffold phase of a fresh install before pip finishes. ### 3. `os.killpg` migration to psutil (7 callsites, 5 files) - `tools/code_execution_tool.py` - `tools/process_registry.py` - `tools/tts_tool.py` - `tools/environments/local.py` (3 sites kept as-is, suppressed with `# windows-footgun: ok` — the pgid semantics psutil can't replicate, and the calls are already Windows-guarded at the outer branch) - `gateway/platforms/whatsapp.py` ### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines) Grep-based checker with 11 rules covering every Windows cross-platform footgun we've hit so far: 1. `os.kill(pid, 0)` — the silent killer 2. `os.setsid` without guard 3. `os.killpg` (recommends psutil) 4. `os.getuid` / `os.geteuid` / `os.getgid` 5. `os.fork` 6. `signal.SIGKILL` 7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT` 8. `subprocess` shebang script invocation 9. `wmic` without `shutil.which` guard 10. Hardcoded `~/Desktop` (OneDrive trap) 11. `asyncio.add_signal_handler` without try/except 12. `open()` without `encoding=` on text mode Features: - Triple-quoted-docstring aware (won't flag prose inside docstrings) - Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments) - Guard-hint aware (skips lines with `hasattr(os, ...)`, `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.) - Inline suppression with `# windows-footgun: ok — <reason>` - `--list` to print all rules with fixes - `--all` / `--diff <ref>` / staged-files (default) modes - Scans 380 files in under 2 seconds ### 5. CI integration A GitHub Actions workflow that runs the checker on every PR and push is staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this commit because the GH token on the push machine lacks `workflow` scope. A maintainer with `workflow` permissions should add it as `.github/workflows/windows-footguns.yml` in a follow-up. Content: ```yaml name: Windows footgun check on: push: branches: [main] pull_request: branches: [main] jobs: check: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-python@v5 with: {python-version: "3.11"} - run: python scripts/check-windows-footguns.py --all ``` ### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion Expanded from 5 to 16 rules, each with message, example, and fix. Recommends psutil as the preferred API for PID / process-tree operations. ### 7. Baseline cleanup (91 → 0 findings) - 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or `encoding='utf-8-sig'` (user-editable files that Notepad may BOM) - 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin tool subprocess management → annotated with `# windows-footgun: ok — <reason>` - 7 `os.killpg` sites → migrated to psutil (see §3 above) ## Verification ``` $ python scripts/check-windows-footguns.py --all ✓ No Windows footguns found (380 file(s) scanned). $ python -c "from gateway.status import _pid_exists; import os > print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))" self: True bogus: False ``` Proof-of-repro that `os.kill(pid, 0)` was actually killing processes before this fix — see commit ``1cbe39914`` and bpo-14484. This commit removes the last hand-rolled ctypes path from the hot liveness-check path and defers to the best-maintained cross-platform answer.	2026-05-08 14:27:40 -07:00
Teknium	40e7a71c35	feat: enrich system-prompt environment hints with host + terminal-backend info build_environment_hints() now emits a factual block describing the execution environment on every prompt build: * Local backend: host OS, $HOME, and cwd — so the agent stops guessing paths from the hostname. Windows also gets two specific callouts: - hostname != username (prevents C:\Users\<hostname>\... bugs) - `terminal` shells out to bash (git-bash/MSYS), not PowerShell * Remote backend (docker/singularity/modal/daytona/ssh/vercel_sandbox): host info is SUPPRESSED — the agent's tools can't touch the host, so showing it is misleading. Instead we probe the backend once per process with `uname/whoami/pwd` and cache the result. On probe failure, fall back to a per-backend description that states only what we know from the backend choice itself (container type + likely OS family) without inventing user/cwd/$HOME. Linux/Mac local users now get a small helpful 3-line host block instead of an empty string. Zero change to the existing WSL hint paragraph. Tests: 8 new/updated in TestEnvironmentHints, including a regression guard that fails if a new remote backend is added without listing it in _REMOTE_TERMINAL_BACKENDS.	2026-05-08 14:27:40 -07:00
Teknium	cbce5e93fc	codebase: add encoding='utf-8' to all bare open() calls (PLW1514) Closes the last Python-on-Windows UTF-8 exposure by making every text-mode open() call explicit about its encoding. Before: on Windows, bare open(path, 'r') defaults to the system locale encoding (cp1252 on US-locale installs). That means reading any config/yaml/markdown/json file with non-ASCII content either crashes with UnicodeDecodeError or silently mis-decodes bytes. After: all 89 affected call sites in production code now pass encoding='utf-8' explicitly. Works identically on every platform and every locale, no surprise behavior. Mechanical sweep via: ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' . All 89 fixes have the same shape: open(x) or open(x, mode) became open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing else changed. Every modified file still parses and the Windows/sandbox test suite is still green (85 passed, 14 skipped, 0 failed across tests/tools/test_code_execution_windows_env.py + tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py + tests/test_hermes_bootstrap.py). Scope notes: - tests/ excluded: test fixtures can use locale encoding intentionally (exercising edge cases). If we want to tighten tests later that's a separate PR. - plugins/ excluded: plugin-specific conventions may differ; plugin authors own their code. - optional-skills/ and skills/ excluded: skill scripts are user-authored and we don't want to mass-edit them. - website/ and tinker-atropos/ excluded: vendored / generated content. 46 files touched, 89 +/- lines (symmetric replacement). No behavior change on POSIX or on Windows when the file is ASCII; bug fix on Windows when the file contains non-ASCII.	2026-05-08 14:27:40 -07:00
ddupont	e31f3b3c56	feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection Extends the cua-driver computer-use backend to drive backgrounded macOS windows without stealing keyboard or mouse focus from the foreground app. All changes target the cua-driver MCP backend and the shared dispatcher. ## cua_backend.py Window-aware capture: capture() now calls list_windows + get_window_state instead of the removed capture tool. Prefers structuredContent.windows (MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back to regex-parsed text for older builds. Stores the selected (pid, window_id) as sticky context so subsequent action calls do not need a redundant round-trip. Action routing: click/scroll/type_text/key all carry the sticky pid (and window_id for element-indexed clicks). type_text routes through type_text_chars (individual key events) rather than AX attribute write -- WebKit AXTextFields reject attribute writes from backgrounded processes. Key parsing: _parse_key_combo splits cmd+s-style strings into (key, [modifiers]) and routes to hotkey (modifier present) or press_key (bare key) -- cua-driver actual tool names. set_value method: new set_value(value, element) calls the cua-driver set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari, AXPress opens the native macOS popup which closes immediately when the app is non-frontmost; set_value AX-presses the matching child option directly (no menu required, no focus steal). focus_app: reimplemented as a pure window-selector (enumerates list_windows, sets sticky pid/window_id) without ever raising the window or stealing focus. list_apps: fixed tool name from listApps to list_apps; handles plain-text response via regex when structured data is absent. Structured-content extraction: _extract_tool_result now surfaces structuredContent from MCP results, enabling the list_windows window array without text parsing. Helpers: _parse_windows_from_text, _parse_elements_from_tree, _split_tree_text, _parse_key_combo extracted as module-level functions. ## schema.py Added set_value to the action enum with a description explaining when to prefer it over click (select/popup elements, sliders, no focus steal). Added value field for set_value payloads. ## tool.py Routed set_value action through _dispatch to backend.set_value. Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated). Fixed MIME-type detection in _capture_response: cua-driver may return JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png) rather than hardcoding image/png. ## agent/display.py + run_agent.py Guard _detect_tool_failure and result-preview logic against non-string function_result values: multimodal tool results (dicts with _multimodal=True) are not string-sliceable; treat them as successes and fall back to str() for length/preview.	2026-05-08 11:07:38 -07:00
Teknium	850413f120	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl\|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-05-08 11:07:38 -07:00
kshitijk4poor	81928f03ab	refactor(gmi): move User-Agent to profile.default_headers The previous revision of this PR added six GMI-specific branches (`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS constant in auxiliary_client.py. ProviderProfile already has a `default_headers: dict[str, str]` field commented as 'Client-level quirks (set once at client construction)'. Other plugins (ai-gateway, kimi-coding) already use it. Two of the four auxiliary_client sites we previously patched already had a generic `else: profile.default_headers` fallback that picked it up (so did both run_agent sites). This revision: * Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the GMI profile in plugins/model-providers/gmi/__init__.py. * Reverts all six GMI-specific branches in run_agent.py and auxiliary_client.py. * Adds the generic profile-fallback `else` block to the two auxiliary_client sites (`_to_async_client`, `resolve_provider_client`) that didn't have it yet. This benefits every provider whose profile declares default_headers, not just GMI — e.g. Vercel AI Gateway's HTTP-Referer/X-Title now flow through the async client path too. * Replaces the GMI-specific URL-branch tests with a profile-level assertion and keeps the run_agent integration test (with `provider='gmi'` so the fallback picks up the profile). Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin, two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and tests. No core files change. Based on #20907 by @isaachuangGMICLOUD.	2026-05-08 03:22:11 -07:00
Austin Pickett	d87c7b99e2	fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455 ) - Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and Haiku 4.5 with updated source URLs (platform.claude.com) - Add _normalize_anthropic_model_name() to handle dot-notation variants (e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups - Fix silent token loss: ensure session row exists before UPDATE in both run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent) - Log token persistence failures at DEBUG level instead of swallowing them silently — makes undercounted analytics diagnosable - Surface reasoning tokens in CLI /usage and TUI usage panel - Add 'reasoning' and 'cost_status' fields to TUI Usage type	2026-05-07 13:24:31 -07:00
LeonSGP43	fc88eec926	fix(compressor): soften summary prompt for content filters	2026-05-07 06:42:32 -07:00

1 2 3 4 5 ...

809 commits