hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-05 12:42:30 +00:00

Author	SHA1	Message	Date
rob-maron	c23a87bc16	union paid recs from nous portal with static list (#24509 )	2026-05-12 12:16:17 -07:00
rob-maron	2863e9484a	Use nous portal as model metadata authority (#24502 ) * nous portal metadata resolver * minor fixes	2026-05-12 11:59:31 -07:00
Teknium	c594a23047	feat(agent): per-turn file-mutation verifier footer (#24498 ) Detect when write_file / patch calls fail during a turn and are never superseded by a successful write to the same path. When the final text response is delivered, append an advisory footer listing the files that did NOT change — so models that over-claim 'patched 5 files' after 4 silent failures can't hide the lie. Catches the failure mode reported in Ben Eng's llm-wiki session: grok-4.1-fast issued batches of parallel patches, half failed with 'Could not find old_string', and the agent summarised the turn claiming every file was edited. The user had to manually run 'git status' each turn to catch it. The verifier is a pure post-hoc check on tool results — no new LLM calls, no synthetic messages injected into history (prompt cache preserved), no changes to tool argument dispatch. Per-turn state is keyed by path; a later successful write to the same path clears the failure entry so single-file retry recovery is not flagged. Wired into both _execute_tool_calls_concurrent and _execute_tool_calls_sequential, so batched parallel patches and one-at- a-time edits are both covered. Footer emission happens after the agent loop exits, before transform_llm_output / post_llm_call plugin hooks run, so plugins still see (and can modify) the augmented text. Config: display.file_mutation_verifier (bool, default true) + HERMES_FILE_MUTATION_VERIFIER env override. 31 unit tests in tests/run_agent/test_file_mutation_verifier.py cover target extraction (write_file, patch-replace, patch-v4a single and multi-file), error-preview extraction (JSON .error field and plain string), per-turn state transitions (first-error-wins on repeated failure, success supersedes failure), footer rendering (truncation at 10 entries, user-actionable hint), and env/config precedence. Companion docs updated: user-guide/configuration.md + reference/environment-variables.md.	2026-05-12 11:54:13 -07:00
Teknium	c1eb2dcda7	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220 ) * feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows. # What this PR makes true 1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup). 2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only' — the installer keeps every other extra and tells the user which tier landed. 3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time. # Detection: hermes_cli/security_advisories.py - ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass. - detect_compromised() uses importlib.metadata.version() — no pip dependency, works in uv venvs that lack pip. - Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory. - Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack. - Wired into: * hermes doctor — runs first, prints full remediation block * hermes doctor --ack <id> — dismisses an advisory * cli.py interactive run() and single-query branches — short stderr banner pointing at hermes doctor * gateway/run.py startup — operator-visible warning in gateway.log # Lazy-install framework: tools/lazy_deps.py - LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs. - ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install). - Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars — only PyPI-by-name accepted. - Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs. - Migrated three backends as proof of pattern: * tools/tts_tool.py — _import_elevenlabs() calls ensure first * plugins/memory/honcho/client.py — get_honcho_client lazy-installs * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai # Installer fallback tiers scripts/install.sh, scripts/install.ps1, setup-hermes.sh: - Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra. - New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve. - All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1. - install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync. Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list — bug fixed by the refactor (mistral is filtered out). # Config hermes_cli/config.py — DEFAULT_CONFIG.security gains: - acked_advisories: [] (advisory IDs the user has dismissed) - allow_lazy_installs: True (security gate for ensure()) No config version bump needed — both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs. # Tests tests/hermes_cli/test_security_advisories.py — 23 tests covering: - detect_compromised matches/non-matches, wildcard frozenset - ack persistence, idempotence, blank rejection, config-failure path - banner cache rate limiting + 24h re-banner + ack-stops-banner - short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message - shipped catalog well-formedness invariant tests/tools/test_lazy_deps.py — 40 tests covering: - spec safety: 11 safe parametrized + 18 unsafe parametrized - allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex - security gating: config flag, env var, default, fail-open - ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing - is_available, feature_install_command Combined: 63 new tests, all passing under scripts/run_tests.sh. # Validation - scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing - scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing - scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change) - bash -n on install.sh and setup-hermes.sh → OK - py_compile on all modified .py files → OK - End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output # Community Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md Refs: PR #24205 (mistral disabled), Socket Security advisory <https://socket.dev/blog/mini-shai-hulud-worm-pypi> * build(deps): pin every direct dep to ==X.Y.Z (no ranges) Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock. Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine — exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end. What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock — the pins just remove the resolver's freedom to pick a different one. Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block. Build-system requires line (setuptools>=61.0) is intentionally left as a range — pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available. mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR. LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy- install pathway can never resolve a different version than the one declared in pyproject.toml. Validation: - Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock — every pin matches the resolved version exactly. - Cross-checked all LAZY_DEPS specs against uv.lock — same. - 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly. - tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex). - Doctor + TTS + transcription targeted suite → 146/146 passing. * build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra You asked: 'what about the dependencies the dependencies rely on?' — correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned. # What this commit fixes 1. Both real installer scripts now prefer `uv sync --locked` as Tier 0. uv.lock records SHA256 hashes for every transitive — a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it. 2. Removed the `[mistral]` extra entirely. The `mistralai` PyPI project is fully quarantined right now — every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]). 3. Regenerated uv.lock. 262 packages, mistralai/eval-type-backport/ jsonpath-python pruned. `uv lock --check` now passes. # Defense-in-depth view \| Layer \| Where \| Protects against \| \|----------------------------\|-------------------\|-------------------------------------------\| \| Exact pins in pyproject \| direct deps \| new mistralai 2.4.6-style direct compromise \| \| uv.lock + `--locked` install \| transitive graph \| transitive worm injection \| \| Tier-0 hash-verified path \| install.sh / .ps1 \| actually USE the lockfile in fresh installs \| \| `uv lock --check` CI gate \| every PR \| drift between pyproject and lockfile \| \| `hermes_cli/security_advisories.py` \| runtime \| cleanup for users who already got hit \| The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater. # Validation - `uv lock --check` → passes (262 packages resolved, no drift). - `bash -n` on install.sh + setup-hermes.sh → OK. - 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py). - TOML parse OK. * chore: remove community announcement drafts (PR body covers it) * build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard) Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45. Moved out of core dependencies = []: - anthropic (only when provider=anthropic native, not via aggregators) - exa-py, firecrawl-py, parallel-web (search backends; only when picked) - fal-client (image gen; only when picked) - edge-tts (default TTS but still optional) New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all]. New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard. Each import site now calls ensure() before importing the SDK. Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals. Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker). Validation: - Base install: 45 packages (was ~60); 6 newly-extracted packages absent - uv lock --check: passes (262 packages, no drift) - 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing - py_compile clean on all 12 modified modules	2026-05-12 01:02:25 -07:00
Teknium	99ad2d1372	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205 ) The `mistralai` PyPI package was quarantined on 2026-05-12 after a malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build, CI run, install.sh first-run) currently fails on `mistralai>=2.3.0,<3` because PyPI returns zero candidates. Existing users running `hermes update` mostly didn't notice — `hermes update` falls back from `.[all]` to per-extra retries and silently skips mistral with a warning that scrolls past. But fresh installs hard-fail or lose every other extra. Changes: - pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and `[termux-all]`. The `mistral` extra itself is preserved so users can opt back in once PyPI un-quarantines. - hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the `hermes tools` provider picker until restored. - hermes_cli/web_server.py: drop "mistral" from dashboard STT options. - tools/transcription_tools.py: explicit `provider: mistral` returns "none" with a clear status message; auto-detect skips mistral. - tools/tts_tool.py: dispatcher returns a clear "temporarily disabled" error before any SDK import attempt (avoids cached-stale-package surprises). - tests/tools/: update three test files to assert the new disabled behavior. Each test docstring records why and points at the rollback trigger (PyPI un-quarantines mistralai). Restore plan: revert this commit once the package is available on PyPI again. The behavior change is intentional and documented in code comments + test docstrings to make the rollback trivial. Validation: - scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' → 425/425 passing. Refs: https://pypi.org/simple/mistralai/ (currently "pypi:project-status: quarantined").	2026-05-11 23:02:15 -07:00
Austin Pickett	58e2109f10	fix(minimax): harden OAuth dashboard and runtime Handle MiniMax OAuth expiry values consistently across CLI and dashboard flows, fix CLI status/add behavior, and force pooled OAuth runtime requests through Anthropic Messages. - web_server._minimax_poller: parse expired_in via the shared resolver so unix-ms absolute timestamps stop landing as TTL seconds and crashing with 'year 583911 is out of range' when a user connects MiniMax OAuth from the dashboard. - auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on the CLI login + refresh paths. - auth.get_auth_status: dispatch minimax-oauth to its dedicated status function instead of falling through. - auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now starts the device-code login flow and persists a pool entry with the access + refresh tokens, instead of requiring credentials to already exist. - runtime_provider._resolve_runtime_from_pool_entry: pin pooled minimax-oauth credentials to anthropic_messages so a stale model.api_mode: chat_completions can't send requests to /anthropic/chat/completions and trigger MiniMax nginx 404s. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 22:15:16 -07:00
Teknium	7993e03c06	fix(cache): route Nous Portal Qwen through Portal-Claude cache pathway (#24151 ) Qwen models on Nous Portal (e.g. qwen3.6-plus) now get the same envelope-layout cache_control markers and long-lived (1h cross-session) cache treatment as Portal Claude. Portal proxies to OpenRouter with identical wire-format and cache_control semantics, but the prior policy left Portal Qwen falling through to the alibaba-family branch (which only matches provider=opencode/alibaba), serving 0% cache hits and re-billing the full prompt every turn. Scope is narrow: Portal Claude OR Portal Qwen. Other models on Portal keep their existing behavior. - _anthropic_prompt_cache_policy: add (is_nous_portal and qwen) -> (True, False) - _supports_long_lived_anthropic_cache: drop Claude-only gate for Portal so Qwen also gets the validated 1h cross-session layout - tests cover both functions, both bare and vendored qwen slug forms, and the rejection of non-Claude non-Qwen Portal traffic	2026-05-11 21:04:55 -07:00
Teknium	e85592591e	fix(nous): surface Portal-flagged free models in picker even when curated list is stale (#24082 ) Free-tier users were seeing 'No free models currently available.' in the `hermes model` and post-login pickers even though qwen/qwen3.6-plus is free on the Portal right now. Three independent breakages compounded: 1. The docs-hosted catalog manifest at website/static/api/model-catalog.json was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users fetching the manifest got a list that didn't include qwen/qwen3.6-plus. 2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip, collapsing get_pricing_for_provider('nous') to {} and making every curated model fall through the free-tier filter as 'paid'. 3. Even with healthy pricing, the picker only ever showed models from the in-repo curated list intersected with live pricing — a Portal-flagged free model not yet in the curated list could never appear. Changes: - hermes_cli/models.py: new union_with_portal_free_recommendations() that augments the curated list with Portal freeRecommendedModels entries (with synthetic free pricing so partition keeps them). The Portal's /api/nous/recommended-models endpoint is now the source of truth for free-tier surfacing — old Hermes builds will see new free models without a CLI release. - hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to the public inference base URL when runtime cred resolution fails. The /v1/models endpoint exposes pricing without auth, so silently returning {} just because a refresh token expired was wrong. - hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call sites call union_with_portal_free_recommendations() before partition. - tests/hermes_cli/test_models.py: 7 tests covering union behaviour (prepend, dedup, end-to-end with stale pricing, empty/missing/error payloads, invalid entries). - tests/hermes_cli/test_model_catalog.py: drift guard TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous'] or OPENROUTER_MODELS is edited without re-running scripts/build_model_catalog.py. Verified empirically that removing a manifest entry triggers an assertion with an actionable error message. Validation: - 133/133 targeted tests pass (test_models, test_model_catalog, test_auth_nous_provider). - Live E2E against the real Portal: - Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no qwen) → after union: ['qwen/qwen3.6-plus', ...] → partition(free_tier=True): selectable=['qwen/qwen3.6-plus']. - Simulated expired refresh token → anon fetch returns 403 pricing entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}. - ruff: clean.	2026-05-11 18:08:16 -07:00
Teknium	ced1990c1c	feat(computer-use): refresh cua-driver on `hermes update` + add `install --upgrade` (#24063 ) cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall. Two refresh points now: - `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call. - `hermes computer-use install --upgrade` for manual force-refresh. The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed. `hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing.	2026-05-11 17:10:58 -07:00
Ahmed Badr	05bad7b1e7	fix(dashboard): MiniMax 'Login' button launched Claude OAuth (#22832 ) Fixes #22832. ## Root cause `hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by the catalog's `flow` field rather than provider id: if catalog_entry["flow"] == "pkce": return _start_anthropic_pkce() The catalog had two `flow: "pkce"` entries — `anthropic` and `minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's Keys tab unconditionally launched the Anthropic/Claude PKCE flow. ## Fix Three changes in `hermes_cli/web_server.py`: 1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to `flow: "device_code"`. From a UX perspective MiniMax is a verification-URI + user-code flow (open URL, enter code, backend polls) — same shape as Nous's device-code flow. The PKCE bit (verifier + challenge from `_minimax_pkce_pair`) is a security extension that doesn't change the operator experience; the existing dashboard modal already renders `device_code` correctly for this UX. 2. New MiniMax branch in `_start_device_code_flow`, mirroring the existing Nous branch but calling MiniMax-specific helpers (`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes verifier + state in the session for the poller to consume. Handles the overloaded `expired_in` field (could be unix-ms timestamp OR seconds-from-now duration) the same way `_minimax_poll_token` does. 3. New `_minimax_poller` background thread mirroring `_nous_poller`. Calls `_minimax_poll_token` → on success builds the same `auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and persists via `_minimax_save_auth_state` so the dashboard path leaves the system in the same state as `hermes auth add minimax-oauth`. Plus a dispatcher tightening to prevent regression: the `pkce` branch now requires `provider_id == "anthropic"`, so any future PKCE provider added without a proper start function gets a clean `400 Unsupported flow` rather than silently launching Anthropic OAuth. ## Test New `tests/hermes_cli/test_web_oauth_dispatch.py`: - Regression test asserting MiniMax start does NOT return claude.ai - Sanity test that Anthropic PKCE still works after the dispatcher tightening - Forward-looking test: a hypothetical pkce-flagged provider without an explicit branch is rejected cleanly rather than misrouted ## Limitations - The dashboard MiniMax path defaults to `region="global"`. CN-region operators can still use the CLI flow which supports `--region cn`. Adding a region toggle to the dashboard UI is a follow-up.	2026-05-11 16:51:09 -07:00
Teknium	ea1d0462cf	fix(cli): vertical fallback for markdown tables wider than terminal (#23948 ) Follow-up to #23863 (CJK table alignment). The realigner was correctly padding pipes to identical column offsets, but when a table's natural width exceeds terminal cells it produced lines that the terminal soft-wrapped mid-cell, destroying column alignment visually even though the bytes were perfectly padded. Reported as 'columns are not aligned' on tables containing one long row alongside several short rows. Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal fallback: when realign_markdown_tables is given an available_width budget and the rebuilt horizontal table exceeds it, render each body row as 'Header: value' lines separated by a thin ─ rule. Word-wraps oversize values at the budget with a 2-space continuation indent. - agent/markdown_tables.py: realign_markdown_tables(text, available_width=None); threshold check at the top of _render_block flips into a new _render_vertical fallback. Includes _wrap_to_width with hard-break for tokens longer than the budget. - cli.py: helper _terminal_width_for_streaming() returns shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell safety margin; passed to all three realign call sites (_render_final_assistant_content for strip+render Panel paths, and the streaming flushers in _emit_stream_text / _flush_stream). - tests/agent/test_markdown_tables.py: 4 new tests covering the overflow-vertical fallback for ASCII + CJK content, the 'fits → keep horizontal' case, and the long-cell wrap with indent. Live-verified: with COLUMNS=100, the user's reported 'long row in ASCII table' case now renders as vertical key-value rows that all fit the panel; the 6-column CJK comparison table still renders as an aligned horizontal table because it fits inside 100 cols.	2026-05-11 16:49:13 -07:00
ethernet	825bd50e6b	Merge pull request #18036 from NousResearch/fix/bundle-size ui-tui: bundle with esbuild, drop runtime node_modules	2026-05-11 17:46:19 -04:00
ethernet	c6ca11618a	refactor(tui): simplify TUI build logic, remove stale staleness checks The old mtime-tracking staleness machinery (_tui_build_needed, _hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding by comparing source timestamps to dist/entry.js. This was fragile and added ~100 lines of code. Replace with three clear paths: 1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build 2. --dev mode: tsx src/entry.tsx, no build, hot reload 3. Normal: always npm run build (esbuild is ~1s, correctness > caching) Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt bundle has no source code to hot-reload).	2026-05-11 17:04:34 -04:00
ethernet	3197b4de6d	Merge remote-tracking branch 'origin/main' into fix/bundle-size	2026-05-11 16:01:04 -04:00
Siddharth Balyan	271883447e	feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847 ) Set HERMES_SESSION_ID using the existing session_context.py ContextVar system for concurrency safety (multiple gateway sessions in one process won't cross-talk). Also writes os.environ as fallback for CLI mode. Touchpoints: - gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry - run_agent.py: Set both ContextVar and os.environ at init and on context-compression rotation - tools/environments/local.py: Bridge ContextVars into subprocess env in _make_run_env() (ContextVars don't propagate to child processes) - tests/run_agent/test_session_id_env.py: 3 tests covering env, provided ID, and ContextVar paths execute_code subprocess already passes HERMES_* prefixed vars through _scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_'). Primary use case: webhook-triggered agents that need to include a `--resume <session_id>` takeover command in their output.	2026-05-12 00:16:45 +05:30
Teknium	7b76366552	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828 ) Cuts input cost for first-turn Claude requests by ~85-90% on subsequent sessions within an hour. Tools array (~13k tokens for default toolset) + stable system prefix (~5-8k tokens) get a 1h cache_control marker; the volatile suffix (memory, USER profile, timestamp, session id) sits in a separate non-cached block at the end so it doesn't poison the cross-session prefix when it changes. Provider gate: Claude on native Anthropic (incl. OAuth subscription), OpenRouter, and Nous Portal (which proxies to OpenRouter). All other providers keep today's system_and_3 layout unchanged. Layout (4 cache_control breakpoints, Anthropic max): 1. tools[-1] -> 1h (cross-session) 2. system content[0] -> 1h (cross-session, stable prefix) 3. messages[-2] -> 5m (within-session rolling) 4. messages[-1] -> 5m (within-session rolling) Within-session rolling shrinks from 3 messages to 2 to free the breakpoint budget. On Claude with realistic tool loadouts the long-lived tier carries the bulk of cross-session value anyway. System prompt is now always assembled cache-friendly: stable identity / guidance / skills / platform hints first, then session-stable context files (AGENTS.md, .cursorrules), then per-call volatile content. Old single-string callers see the same logical content (same join order), just reordered so volatile lives at the end. Config knobs (defaults shown): prompt_caching: cache_ttl: "5m" # rolling-window TTL (unchanged) long_lived_prefix: true # opt-out switch long_lived_ttl: "1h" # cross-session prefix TTL Live E2E (tests/agent/test_prompt_caching_live.py, gated on OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset: Call 1 (cold): cache_write=13,415 cache_read=0 Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025 Cross-session reuse: 97.09% Implementation: * agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived() + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control() preserved verbatim for the fallback path. * agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards cache_control onto each Anthropic-format tool dict. * run_agent.py: _build_system_prompt_parts() returns the 3-tier dict; _build_system_prompt() joins them (backward compatible). _supports_long_lived_anthropic_cache() policy added next to the existing _anthropic_prompt_cache_policy() (which now also recognises Nous Portal Claude — pre-existing gap fixed in passing). _build_api_kwargs() resolves tools_for_api once and propagates the marker through all four build paths (anthropic_messages, bedrock, codex_responses, profile/legacy chat completions). Long-lived flag plumbed into the runtime snapshot/restore + model-switch + fallback-promotion paths. Tests: * tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache, TestApplyAnthropicCacheControlLongLived). * tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes + a fallback-target case). * tests/agent/test_prompt_caching_live.py: new live E2E (skipif when OPENROUTER_API_KEY is unset; runs outside the hermetic suite). * Targeted suites: 327/327 pass (caching/adapter/policy/builder). * tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry, verified failing on pristine origin/main).	2026-05-11 11:14:56 -07:00
wuli666	111b859e49	fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482 ) #23482 fixed cache poisoning in the sync path: when a Codex auxiliary timeout closes the underlying OpenAI client, _evict_cached_client_instance walks CodexAuxiliaryClient wrappers via their _real_client attribute and drops the cache entry so the next aux call rebuilds. The cache key includes async_mode (see _client_cache_key), so the sync and async clients for the same provider live in two distinct entries pointing at the same underlying transport. The fix walked the sync wrapper's _real_client correctly but the async wrappers (AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient, AsyncGeminiNativeClient) never exposed _real_client at all, so the async entry survived eviction and kept handing out the poisoned client. Effect on async aux callers: one timeout now poisons every subsequent async aux call (compression, vision, session_search, title_generation) with 'Connection error' until gateway restart -- even while the sync route recovered as designed in #23482. Mirror the sync wrapper's _real_client onto each async wrapper so the existing eviction helper finds them. Three changes, one per wrapper: - AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client (the underlying OpenAI client) - AsyncAnthropicAuxiliaryClient: same shape - AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's native facade is itself the leaf; no OpenAI client beneath it) Update _evict_cached_client_instance docstring to reflect that it now covers both sync and async wrappers via the same attribute walk. Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper seeds both sync and async cache entries pointing at the same leaf and asserts both are dropped on a single eviction call. Verified the test fails without the wrapper changes ("async cache entry survived eviction -- wrapper is missing _real_client") and passes with them. Refs #23482, #23432	2026-05-11 11:13:20 -07:00
Teknium	1d00716754	fix(cli,tui): align CJK / wide-char markdown tables (#23863 ) CJK and emoji glyphs render as two terminal cells but JS String#length and the model's own padding count them as one, so any markdown table with Chinese / Japanese / Korean cells drifts right per row when a real terminal renders it. Both surfaces fix this with a display-cell width measurement (wcswidth on the Python side, stringWidth on the TUI side). Changes: - agent/markdown_tables.py: new helper. realign_markdown_tables(text) detects markdown table blocks (header + \|---\| divider) and rewrites the row padding using wcwidth.wcswidth so every pipe and dash lines up across rows. No-op on text without tables. - cli.py: hook the helper into _render_final_assistant_content for strip / render modes (raw passes through untouched), and into the streaming line emitter so live token-by-token rendering also produces aligned tables. A small two-buffer state machine in _emit_stream_text holds table rows until the block ends, then flushes them through the realigner so all rows pad to a single per-column width. - ui-tui/src/components/markdown.tsx: renderTable now uses stringWidth (Bun.stringWidth fast path + East-Asian-width-aware fallback, already memoised in @hermes/ink) instead of UTF-16 String#length for both column-width measurement and per-cell padding. Drops the comment that documented the bug as a deliberate limitation. Validation: - New tests/agent/test_markdown_tables.py (11): every rebuilt block shares pipe column offsets across rows for pure CJK, mixed CJK+emoji, ragged-row, and multi-table inputs. - Updated tests/cli/test_cli_markdown_rendering.py: the existing strip-mode test asserted exact whitespace; rewritten to assert the alignment contract (cell content survives + every rendered row shares pipe offsets). - New ui-tui markdown.test.ts case (1): rendered column-2 start offset is identical for the header + every body row, including the CJK row that drifted before the fix. - Live: hermes chat -q with the user-reported screenshot prompt now produces a perfectly aligned table on the wire (header, divider, 4 body rows including '通义千问', all pipes at identical columns).	2026-05-11 11:13:06 -07:00
Teknium	8e2eb4b511	fix(/model): surface Nous Portal models from remote catalog manifest (#23912 ) The /model picker for Nous Portal users was returning the in-repo _PROVIDER_MODELS["nous"] snapshot — which only updates on Hermes releases — instead of the remote manifest published at https://hermes-agent.nousresearch.com/docs/api/model-catalog.json. OpenRouter already pulled from the manifest via fetch_openrouter_models; "nous" was the only curated provider where the existing manifest plumbing (get_curated_nous_model_ids → get_curated_nous_models) was defined but not wired into the picker pipeline. Switch the curated build in list_authenticated_providers to use it, with the same graceful fallback to the in-repo snapshot when the manifest is unreachable. Test: tests/hermes_cli/test_model_catalog.py exercises the picker with a patched manifest and asserts the manifest's nous list reaches list_picker_providers. Falls-back-to-static path was already covered by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog.	2026-05-11 10:15:30 -07:00
zhengyuna	054f568578	fix: use TUI modal for slash confirmations	2026-05-11 10:02:03 -07:00
Teknium1	283381b1ce	fix(dashboard): validate dist exists when --skip-build is set Follow-up to PR #23824. Adds two correctness fixes on top of the contributor's salvaged commit: 1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard` passes `fatal=True` and is the primary scenario this fallback is for (issue #23817 — Windows Scheduled Task at logon). The previous gate meant the fallback never fired in the case it was designed for. 2. `--skip-build` now verifies the dist actually exists before starting the server. Without this, a misconfigured pre-build would launch the dashboard pointing at a missing dist and silently serve 404s. We now exit 1 with a clear "pre-build first: cd web && npm run build" message, and on success print which dist directory is being used. Verified end-to-end on Linux: - build fails + stale dist (fatal=True) -> fallback fires - build fails + no dist (fatal=True) -> exit 1 with stderr surfaced - build fails + stale dist (fatal=False) -> fallback fires - --skip-build + missing dist -> exit 1 with clear guidance - --skip-build + valid dist -> 'Skipping web UI build...'	2026-05-11 09:27:05 -07:00
文森.Z	a479ec01ed	fix: make web UI build output decoding robust on Windows On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning. This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows. The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics.	2026-05-11 08:14:03 -07:00
Teknium	7026af4e23	fix(agent): catch ChatGPT-account Codex data-URL rejection so images are stripped instead of cascading to compression (#23602 ) When the user's main provider is openai-codex on the ChatGPT-account backend (https://chatgpt.com/backend-api/codex), sending a native image attachment encodes it as data:image/...base64,... in the input_image field. The OpenAI Responses API on the public endpoint accepts that, but the ChatGPT-account variant rejects it with HTTP 400: Invalid 'input[N].content[K].image_url'. Expected a valid URL, but got a value with an invalid format. Hermes' image-rejection phrase list didn't include this wording, so the error escaped the strip-and-retry branch and fell through to the generic recovery path: model fallback → context-too-large → compression cascade → auxiliary OpenRouter 402 spam (issue #23570). Add a NARROW phrase keyed on the field-path apostrophe used by the Codex Responses error format: "image_url'. expected". This matches the actual error format without false-tripping on generic 'Expected a valid URL' errors from unrelated tools (webhooks, redirect_uri, etc.). Once matched, the existing branch strips images from history, sets _vision_supported= False for the session, and retries text-only. Refs #23570 (1 of 3 image-replay improvements; persistence rewrite to store image PATHS instead of inlined base64 is a separate follow-up)	2026-05-11 07:37:22 -07:00
Teknium	3e7145e0bb	revert: roll back /goal checklist + /subgoal feature stack (#23813 ) * Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)" This reverts commit `a63a2b7c78`. * Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)" This reverts commit `4a080b1d5a`. * Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)" This reverts commit `404640a2b7`.	2026-05-11 07:06:27 -07:00
fr33d3m0n	976d8e27ad	fix(approval): catch sudo with stdin/askpass/shell privilege flags Adds the only #17873 category not covered by the in-flight PRs #17962 (briandevans, reverse shell + download-execute) and #7993 (SHL0MS, credential reads + curl/wget exfiltration): sudo invocations that an LLM-driven agent can drive without TTY interaction. The agent has no TTY, so the sudo forms that succeed without human involvement are those reading the password from stdin (`-S` / `--stdin`) or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`) and list-privileges (`-a`) flags are also gated since they are privilege-relevant invocations the agent can chain after acquiring the password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell). Plain `sudo cmd` (no flag) is TTY-bound and excluded. Two patterns: 1. Direct flag: `\bsudo\b[^;\|&\n]?\s+(?:-s\b\|--stdin\b\|-a\b\|--askpass\b)` The lazy `[^;\|&\n]?` consumes flag-arguments without spanning command separators, so `sudo -u root -S whoami` matches (a textbook offensive form that a strict `(?:\s+-[^\s]+)` "leading flags only" pattern would have missed because `root` is a flag-value not a flag). 2. Combined short flags: `\bsudo\b[^;\|&\n]?\s+-[a-z][sa][a-z]\b` Catches packed forms like `sudo -nS id` where multiple flags share a single `-X` token. `_normalize_command_for_detection` lowercases input before pattern matching (tools/approval.py:340), so case variants of S/s and A/a collapse — both letter-pairs are gated since each is a privilege- relevant invocation. Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all flag-order permutations including herestring source and printf-piped forms; 9 negative including TTY-bound `sudo whoami`, interactive `sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`, package install, and the `pseudosudo` word-boundary edge case). Empirical coverage: 11/11 attacks matched, 0/10 false positives. Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download- execute), #7993 (credential reads + curl/wget exfiltration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 06:56:30 -07:00
OpenClaw Agent	9520a1ccdf	fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set Fixes #9590: Block explicit sudo -S (stdin password mode) commands when the SUDO_PASSWORD environment variable is not configured. The attack vector: the LLM constructs 'echo guessedpass \| sudo -S cmd' to brute-force sudo passwords, iterates based on sudo's error output ('Sorry, try again'). The existing _transform_sudo_command only injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit sudo -S must be treated as a guessing attempt. Changes: - Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when SUDO_PASSWORD is absent, anchored to command-start positions (^ ; && \|\| \| etc.) to avoid false positives on literal text - Integrate into check_all_command_guards() above yolo/mode=off so the block is unconditional (like the hardline floor) - Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass, integration with check_all_command_guards, yolo non-bypass, container backend bypass	2026-05-11 06:56:30 -07:00
kshitijk4poor	494824fb11	chore: remove unused sentinel in test_send_message_tool	2026-05-11 06:44:58 -07:00
Dominikh	379e7dd014	test(send_message): cover _check_send_message gating paths Adds a TestCheckSendMessage class with 7 focused tests pinning the four passing conditions and the failure modes: - HERMES_KANBAN_TASK grants access (the new branch) - HERMES_KANBAN_TASK short-circuits before consulting session_context or gateway.status (so workers don't depend on those import paths being healthy) - HERMES_SESSION_PLATFORM=telegram grants access - HERMES_SESSION_PLATFORM=local falls through to gateway check - is_gateway_running()=True grants access - All signals absent → False - gateway.status ImportError is swallowed → False Pinning the short-circuit (test #2) is the load-bearing one — it documents the contract that worker-side availability cannot regress to depending on gateway-side state lookups.	2026-05-11 06:44:58 -07:00
Sylw3ster	641e40c4bd	fix(kanban): restore HERMES_KANBAN_BOARD after scoped slash override	2026-05-11 06:44:58 -07:00
liuhao1024	2b3bf17dfa	fix(kanban): call kanban_block on iteration-budget exhaustion to prevent protocol violation When a kanban worker subprocess hits the iteration budget, the agent loop strips tools and asks the model for a summary. The model cannot call kanban_block itself at that point, so the process exits rc=0 without calling kanban_complete or kanban_block — a protocol violation that the dispatcher detects as a fatal error, giving up after 1 failure and stranding downstream tasks. Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK and call kanban_block with a reason describing the exhaustion. The dispatcher then sees a clean block transition instead of a protocol violation, and the task can be retried or escalated by a human. Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget exhaustion without calling kanban_complete or kanban_block #23216	2026-05-11 06:44:58 -07:00
Frowtek	f6d4f3c37d	fix(kanban): route gateway create auto-subscribe to explicit board	2026-05-11 06:44:58 -07:00
Teknium	228b7d27bd	fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms (#23597 ) When an auxiliary provider returns HTTP 402 (credit / payment), every subsequent compression / title-gen / session-search / vision call still re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402 again, then falling back. On a long Discord/LCM session that meant dozens of doomed 402s per minute (issue #23570). Add a per-process unhealthy-provider cache with a 10 min TTL. When any caller observes a payment error against a provider, the label is marked unhealthy and skipped by: * _resolve_auto Step-1 (main provider use-as-aux path) * _resolve_auto Step-2 (aggregator/fallback chain) * _try_payment_fallback (used by call_llm/acall_llm on first 402) Skip-logs are throttled to once per minute per label so a bursty session doesn't spam agent.log. Entries auto-expire so a topped-up account recovers without manual intervention. The cache is in-process only by design — multi-profile users with different keys per profile must each hit the 402 once. Refs #23570	2026-05-10 22:43:14 -07:00
0xbyt4	ace1c4ea8c	fix(discord): typing indicator task not cleaned up after API error When the Discord typing API call fails (rate limit, network error, 403), _typing_loop returns early but the stale task remains in _typing_tasks. Subsequent send_typing calls see the stale entry and skip, leaving no typing indicator for the rest of the agent invocation. Add finally block to _typing_loop to always remove the task from _typing_tasks on exit, whether from cancellation, error, or normal completion. This allows send_typing to create a fresh task. 3 new tests in test_discord_send.py: - Task removed after API error - Typing restartable after failure - stop_typing cleans up	2026-05-10 22:41:26 -07:00
Teknium	228a4d11ae	fix(config): warn loudly on YAML parse failure instead of silent default fallback (#23585 ) A YAML parse error in ~/.hermes/config.yaml caused load_config() to print one line to stdout (Warning: Failed to load config: ...) and silently fall back to DEFAULT_CONFIG, dropping every user override (auxiliary providers, fallback chain, model settings). Users only noticed when downstream behavior misbehaved — see issue #23570 where a tab-indent error in the auxiliary section caused aux fallback to use OpenRouter (depleted) instead of the configured Codex/MiniMax chain. Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam, and re-warn after the user edits the file. Both call sites (raw read + merged load) route through the same helper. Refs #23570	2026-05-10 22:36:19 -07:00
Gutslabs	3af3c4eb8c	fix(misc): three small defensive fixes from PR #1974 Salvages the three substantive low-severity fixes from Gutslabs' #1974 "misc bug fixes" bundle. The other 8 claims in that PR were either already fixed on main with superior implementations (state lock, firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema migrations) or did not survive review. - run_agent: `_materialize_data_url_for_vision` uses `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a corrupt data URL the temp file would persist forever. Wrap the write in try/except and `os.unlink` the temp on failure. - gateway/session: `append_to_transcript` JSONL write had no error handling, so disk-full / read-only-fs / permission errors crashed the message handler. The SQLite write above is the primary store, so swallow OSError on the JSONL fallback with a debug log. - gateway/status: `_read_pid_record` reads `pid_path.read_text()` after an `exists()` check; if the PID file is deleted between the two calls (concurrent gateway restart) we hit an unhandled OSError. Catch it and return None. Adds a regression test for the tempfile cleanup; the other two paths are defensive try/excepts on infrequent OSError that don't warrant dedicated tests. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-10 22:28:01 -07:00
teknium1	edb4a2bda5	test(telegram): cover env-clamped helper + adaptive text-batch tiers - New tests/gateway/test_telegram_text_batch_perf.py: TestEnvFloatClamped — 7 tests covering default-when-unset, valid parse, garbage fallback, NaN rejection, Inf rejection, min-clamp, max-clamp. Asserts asyncio.sleep() always gets a finite number. TestAdaptiveTextBatchTiers — 4 tests covering the tier-constant invariants and the min(cap, tier_delay) composition rule. - tests/gateway/test_display_config.py: update assertions for Telegram's new tool_progress='new' default.	2026-05-10 22:22:25 -07:00
Hugo Sqr	f2e8ed2405	Add unit tests for hyperliquid skill functionality - Implement tests for normalizing perpetual markets and DEXs. - Validate JSON output for main commands including markets, candles, and review. - Ensure environment variable resolution and dotenv file reading are covered. - Test export functionality for market data with expected output structure.	2026-05-10 22:15:04 -07:00
Teknium1	28b4fe6007	test: stabilize quick-command redaction test against xdist ordering agent.redact._REDACT_ENABLED is snapshotted at import time from HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker can flip it, so test_exec_command_output_is_redacted was order-dependent. Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.	2026-05-10 22:12:23 -07:00
0xbyt4	f6736ced81	fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages 1. Quick command exec ran in the gateway process's full environment without env sanitization or output redaction. A quick command like "env" or "printenv" would leak all API keys, OAuth tokens, and bot credentials to the messaging user. Fix: apply _sanitize_subprocess_env() before exec and redact_sensitive_text() on output before returning. 2. GatewayRunner._pending_messages was written on every interrupt (lines 1331-1334) but never read or consumed anywhere. The actual interrupt delivery uses adapter._pending_messages (a separate dict). Removed the write-only accumulation to prevent unbounded growth.	2026-05-10 22:12:23 -07:00
teknium1	82352e54c4	test(telegram): regression coverage for edit overflow split-and-deliver Two new tests: - tests/gateway/test_telegram_format.py test_message_too_long_splits_into_continuations_not_silent_truncation: asserts edit_message returns success=True with continuation_message_ids populated and message_id pointing at the last continuation when content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original fail-on-overflow assertion with the split-and-deliver contract. - tests/gateway/test_stream_consumer.py TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver: asserts the consumer side updates _message_id to the latest continuation, clears _last_sent_text, and fires on_new_message when the adapter reports a split-and-deliver result.	2026-05-10 22:02:56 -07:00
Teknium	3b122cc1ac	feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578 ) Surface ready tasks that nobody claims within a threshold (default 30 min) regardless of why. One identity-agnostic signal that catches: - Operator typo'd the assignee - Profile was deleted, leaving its tasks stranded - External worker pool (Codex CLI lane, custom daemon) is down - Dispatcher misconfigured (wrong board / wrong HERMES_HOME) Today the dispatcher correctly skips these (no respawn loop, good) but nothing surfaces the fact that operator-actionable work is accumulating. The new `stranded_in_ready` rule does that without requiring a manual lane registry — it reads the most recent ready- transition event (`created` / `promoted` / `reclaimed` / `unblocked`) and fires when (now - last_ready_ts) > threshold. Severity escalates with age: warning at threshold, error at 2x, critical at 6x. The cli_hint and reassign actions point operators at the right next step. Out of scope deliberately: - Lane registry (#20157 closed) — this signal supersedes it. - Pushing the diagnostic into messaging gateways — diagnostics are pull-only via 'hermes kanban diagnostics' for now; gateway push is a separate UX decision. Tests: 10 new + 461 existing kanban tests pass. E2E verified end- to-end via 'hermes kanban diagnostics --json' against a 2h-old stranded task — surfaces as error severity with correct actions.	2026-05-10 21:58:44 -07:00
eloklam	b60462a205	test(kanban): remove stale t.summary assertion from search test Task.summary was never a real field; latest_summary already covers it. Matches the haystack cleanup in commit f3015e6ab.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	0ea234e093	feat(kanban): dashboard batch QOL upgrade - Shift-click range selection, column select-all, select-all-visible - Multi-card drag/drop via selectedIds + /tasks/bulk - Expanded bulk actions: todo/ready/blocked/unblock/complete/archive, priority setter, reassign with reclaim_first checkbox - Partial failure card highlight (failedIds + hermes-kanban-card--failed) - Search expanded to body, result, latest_summary, summary - Clear filters button + reset all filters on board switch - Accessibility: larger checkbox hit target, tabIndex/role/aria-label, Enter/Space/Esc keyboard handlers - Fix temporal-dead-zone bug: move clearSelected before moveSelected	2026-05-10 21:44:37 -07:00
Teknium	a63a2b7c78	fix(goals): force judge to use tool calls instead of JSON-text replies (#23547 ) Live-tested on gemini-3-flash-preview the judge kept returning empty or non-JSON content, tripping the consecutive-parse-failures auto- pause. Free-form JSON output is hopeful; tool-call schemas are enforced server-side by virtually every modern provider. Two new tools the judge calls: - submit_checklist(items) — Phase A, decompose - update_checklist(updates, new_items, reason) — Phase B, evaluate Both phases now call the auxiliary client with tool_choice forcing the right tool. read_file remains for Phase B history inspection, with the loop exiting only when update_checklist is called or the read budget is exhausted (at which point read_file is dropped from the toolbox and update_checklist is forced). Robustness: - _call_judge_with_tool_choice falls back tool_choice forced→required→ auto if the provider rejects a particular shape. - If a fully-broken provider still returns content instead of a tool call, the legacy JSON-text parsers stay around as a last-ditch backstop so we never silently lose a checklist. - _normalize_update_args replaces the JSON parser for the apply layer; same 1-based→0-based conversion + terminal-status filter. Live verification: same fizzbuzz goal that was hitting 'judge model returned unparseable output 3 turns in a row' before now terminates in 2 turns, all 11 items marked completed with item-specific evidence, no auto-pause. Agent log shows 'produced 11 checklist items via tool call' instead of the JSON- parse path. Tests: 7 new cases for the tool-call path (Phase A success, Phase B update only, Phase B read_file→update, JSON-content backstop, empty-text item dropping, non-terminal status filter).	2026-05-10 20:51:40 -07:00
Teknium	4a080b1d5a	fix(goals): forward standing /goal state on auto-compression session rotation (#23530 ) When run_agent's _compress_context fires mid-turn it ends the parent session in SessionDB and creates a new continuation session with a fresh session_id. The /goal state is keyed on session_id in state_meta ("goal:<sid>"), so without forwarding the goal silently disappears: _get_goal_manager() rebinds for the new session_id, load_goal() returns None, mgr.is_active() is False, and the continuation loop dies with no user-visible signal. Fix: in the same SessionDB transaction block that creates the continuation session, copy state_meta[goal:<old>] → state_meta[goal:<new>] when present. No-op when the user has no active goal. Logged at INFO so a stuck loop is debuggable. Tests cover the round-trip via SessionDB and the no-op path. Affects all three run-conversation surfaces (CLI, gateway, TUI gateway) because _compress_context is the single rotation site.	2026-05-10 20:41:53 -07:00
Mike Nguyen	ba5640fa11	fix(gateway): route kanban notifications to creator profile	2026-05-10 20:04:53 -07:00
teknium1	7f90141c63	test(telegram): native-draft transport coverage + docs Added tests/gateway/test_stream_consumer_draft.py with 11 tests covering: - Transport selection: auto+dm-supported -> draft; auto+group -> edit; explicit edit; explicit draft on unsupported adapter -> edit; MagicMock adapter -> edit (back-compat for the existing test suite). - Happy path: DM stream animates draft frames with a single shared draft_id, then finalizes via a regular adapter.send. - Group fallback: drafts entirely skipped in non-DM chats. - Failure fallback: send_draft returning success=False disables drafts for the rest of the response. - Draft_id lifecycle: consecutive responses use distinct ids; tool boundaries bump the id so post-tool text animates fresh below the tool-progress bubble (the openclaw #32535 leak guard). - _already_sent contract: drafts must NOT set the flag so the gateway's fallback final-send still fires (drafts have no message_id). Updated website/docs/user-guide/messaging/telegram.md with a 'Streaming transport' section explaining auto\|draft\|edit\|off, the DM-only constraint, and the per-response fallback behaviour.	2026-05-10 20:02:50 -07:00
Teknium	771b8c4a36	test(conftest): plug every gateway-kill leak path (#23486 ) The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg and a narrow subset of subprocess invocations. Tests still SIGTERMed the live gateway today (May 10) because the guard had structural holes. Plug them all: - subprocess: also wrap getoutput, getstatusoutput - os.system, os.popen - completely unwrapped before - pty.spawn - completely unwrapped before - asyncio.create_subprocess_exec / create_subprocess_shell - bypassed the subprocess module entirely; now wrapped - Subprocess command inspection now looks at the WHOLE command string, not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c 'systemctl', setsid systemctl, /usr/bin/systemctl, etc. - New process-killer block: pkill / killall / taskkill / fuser targeting hermes/python patterns is now refused - os.kill PID 0 (own group) allowed; PID -1 (every process we can signal) refused - subprocess.Popen wrapper preserves __class_getitem__ so third-party packages that use Popen[bytes] as a type annotation still import Coverage is locked in by tests/test_live_system_guard_self_test.py - exercises every primitive against a guaranteed-foreign PID and asserts the guard fires. Adding a new kill primitive without updating the guard breaks CI. scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py when present (developer-machine convenience), so even worktrees that predate this commit get the protection on subsequent test runs through the canonical wrapper.	2026-05-10 18:55:28 -07:00
Teknium	e5bce320db	fix(auxiliary): evict cached client on timeout/connection error (#23482 ) A Codex auxiliary timeout closes the underlying OpenAI client (so the streaming hang doesn't sit until the user kills the session), but the cached wrapper kept pointing at the now-dead transport. Subsequent auxiliary calls (compression retry, memory flush, background review, title generation routed via provider: main) reused that closed client and failed fast with 'Connection error' until the gateway restarted — even though the main agent route was healthy the whole time. Sync `_get_cached_client` had no liveness check (async did, via loop identity), and the connection-error fallback in `call_llm` only fired on the auto provider path, so an explicit provider — including the common `auxiliary.compression.provider: main` shape — never evicted. Three fixes: * New `_evict_cached_client_instance(target)` helper that drops the cache entry whose stored client is target (or wraps it via `_real_client`, for `CodexAuxiliaryClient`). * `_CodexCompletionsAdapter._close_client_on_timeout` evicts the wrapper after closing the inner OpenAI client. * `call_llm` and `async_call_llm` evict on `_is_connection_error` before re-raising, regardless of whether the provider is auto. Net effect: one timeout costs one summary attempt + the existing 30s compressor cooldown; the next compaction rebuilds the client and works. Non-connection errors (4xx/5xx) do not evict, so cache hits stay stable. Closes #23432	2026-05-10 18:55:05 -07:00
rahimsais	737314fe91	fix(telegram): normalize dm threads and retry control sends Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id issue (#3206): 1. _build_message_event filters DM thread_ids: only preserve thread_id for real topic messages (is_topic_message=True). Telegram puts message_thread_id on every DM that is a reply, but reply-chain ids route to nonexistent threads on send. 2. _send_message_with_thread_fallback helper: control sends (send_update_prompt, send_exec_approval / send_slash_confirm, send_model_picker) retry once without message_thread_id when Telegram returns BadRequest 'Message thread not found'. Mirrors the pattern PR #3390 added for the streaming send path. Salvage notes: - Conflict 1 (line ~4099): merged the contributor's DM is_topic_message filter with the existing forum General-topic default from #22423, preserving both behaviors. - Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416) alongside the new helper. Tightened the helper's exception catch from bare 'Exception' to use the existing _is_bad_request_error + _is_thread_not_found_error helpers (line 484-496) for consistency with the streaming send path. - Widened the fix to send_update_prompt (was bare self._bot.send_message, same bug class). Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@ local commit author).	2026-05-10 18:09:31 -07:00

1 2 3 4 5 ...

3610 commits