hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

Author	SHA1	Message	Date
tymrtn	d1fc748def	fix(kanban): /kanban slash command emits argparse garbage instead of help Closes #21794. `/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h` all returned broken output to the gateway and interactive CLI. Three underlying bugs in `hermes_cli.kanban.run_slash`: 1. argparse writes help to stdout but `run_slash` only captured stderr at parse time, so `-h` text was silently swallowed and replaced with the `(usage error: 0)` sentinel. 2. The wrapping parser used `prog="/"` and routed via a synthetic "_top → kanban" subparser, producing `usage: / kanban …` (stray space) and `usage: /kanban kanban …` (doubled token) in error text. 3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB usage tree, which reads as visual garbage in a chat bubble. Fix: drive the kanban_parser directly (no double-wrap), rewrite prog strings on every leaf subparser, capture stdout AND stderr around parse_args, distinguish SystemExit(0) (help — return captured stdout) from SystemExit(2) (error — return single-line ⚠-prefixed message), and add an explicit chat-friendly short-help block returned for bare invocation and the help aliases (`help`, `--help`, `-h`, `?`). Added 5 regression tests covering bare invocation, every help alias, subcommand help, unknown action, and missing required arg. Affects every chat platform via gateway/run.py::_handle_kanban_command and the interactive CLI via cli.py::_handle_kanban_command. Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com>	2026-05-09 22:49:29 -07:00
li0near	6f2d60559e	fix(kanban): drop redundant init_db() in gateway watchers (#21378 ) Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s `_tick_once_for_board` called `_kb.connect(board=slug)` immediately followed by `_kb.init_db(board=slug)`. Since `connect()` already runs the schema + idempotent migration on first open per process, the explicit `init_db()` was redundant — and worse, `init_db()` deliberately busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration on a second connection that races the first. On every cold gateway start against a legacy DB this surfaced as either `sqlite3.OperationalError: duplicate column name: <col>` or intermittent `database is locked` errors logged at the first tick. The duplicate-column case is now tolerated by `_add_column_if_missing` (commit `78698381a`), but the wasted second migration plus the database-is-locked race remain fixable by skipping the redundant call entirely. Drops `_kb.init_db(board=slug)` at both call sites and adds a regression test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence via source inspection plus a runtime spy. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-09 22:38:01 -07:00
Teknium	68e44642c8	fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993 ) Subagent stream drops were spamming the parent terminal with two lines per blip ('Connection dropped...' + 'Reconnected...') while leaving zero breadcrumb in agent.log to debug them. Two underlying bugs, fixed together: 1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which filters records before root-logger file handlers see them. The comment claimed 'File handlers still capture everything' — that was wrong. Removed in both run_agent.py and cli.py; console quietness already comes from hermes_logging not installing a console StreamHandler in non-verbose mode. 2. The stream-retry blocks emitted two _emit_status calls per drop ('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected — resuming…') with no provider name, so multi-provider sessions had to dig through agent.log to attribute a drop. Replaced both call sites with a single _emit_stream_drop helper that emits ONE line naming the provider and error class, and always writes a structured WARNING to agent.log with subagent_id, depth, provider, base_url, error_type. Net UX change: 6 lines per triple-subagent drop → 3 lines, each naming the provider. agent.log now has a structured breadcrumb per retry that didn't exist before. Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py covering the logger-level guard, structured WARNING content, single status line per drop (no Reconnected follow-up), and provider naming.	2026-05-09 22:35:35 -07:00
Teknium	3800972dd0	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 ) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).	2026-05-09 21:06:19 -07:00
Clooooode	998676dd0c	chore(test): comment of test case rewrite to english Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (push) Waiting to run Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details	2026-05-09 19:31:41 -07:00
Clooooode	dd49d50389	test(kanban): assert re-block notification is delivered after unblock cycle Adds test_notifier_second_blocked_delivers to cover the case where a task is blocked, unblocked, then blocked again — the second blocked event must still deliver a gateway notification. Currently fails because blocked is treated as a terminal event kind, causing the subscription to be dropped after the first block.	2026-05-09 19:31:41 -07:00
Tranquil-Flow	8954537f95	fix(kanban): request default board explicitly (#21819 )	2026-05-09 19:31:32 -07:00
Teknium	08ec602770	fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913 ) Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB (32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach in _write_to_sandbox put the entire tool result inside the 'bash -c' arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument list too long' before the heredoc ever ran. The caller logged a warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the warning never reached agent.log either, and the agent saw a 1.5 KB preview tagged 'Full output could not be saved to sandbox'. Hits delegate_task with 3+ subagent outputs routinely now. Switch to passing content via env.execute(stdin_data=...). cmd is now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight payload travels through stdin where there is no argv-element limit. E2E reproduced the user's exact 144,778-char delegate_task envelope: old code OSError'd, new code round-trips cleanly to disk with all three task summaries intact.	2026-05-09 18:44:58 -07:00
Teknium	4375b82cd9	feat(curator): show rename map in user-visible summary (#22910 ) * feat(curator): show rename map (where skills went) in user-visible summary The full data has always been on disk in REPORT.md, but the user-visible curator summary (gateway 💾 line, CLI session-start panel, `hermes curator status`) was counts-only — "consolidated 4 into 2 umbrellas" with no names. Users only discovered renames when something they expected was gone. New `_build_rename_summary()` formats the rename map and appends it to `final_summary`: auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status Empty on no-op ticks (no archives), so most ticks add zero log noise. Cap of 10 entries keeps agent.log readable when a 50-skill consolidation lands; the full list is always in REPORT.md. `hermes curator status` indents continuation lines so the multi-line summary reads as one logical field. 5 new tests in tests/agent/test_curator_classification.py covering empty / consolidation / pruning / cap / mixed cases. * feat(curator): show recent run summary once on `hermes update` The rename map is now visible from where users actually look — the update flow they explicitly run, instead of just the live gateway log or transient CLI session-start panel. Behavior: - After `hermes update`, if the most recent curator run produced a rename map (multi-line summary) that the user hasn't seen yet, print it once with a 'last run Xh ago' header and a one-time-message footer. - Stamp `last_run_summary_shown_at = last_run_at` after printing so subsequent `hermes update` invocations are silent until a newer curator run lands. - Silent on no-op runs (single-line summary like 'auto: no changes; llm: no change'). Still stamps shown so we don't reconsider on every update. - Silent when the curator has never run (the existing first-run notice handles that case). Output: ℹ Skill curator — last run 4h ago auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1 archived 3 skill(s): • docx-extraction → document-tools • pdf-extraction → document-tools • old-stale-thing — pruned (stale) full report: hermes curator status (This message shows once per curator run. View anytime: hermes curator status) State migration: - `_default_state()` gains `last_run_summary_shown_at: None`. Existing state files lack the field; `.get()` returns None; the comparison treats any prior run as 'not yet shown' and prints once on next update. Self-healing. Wiring: - Both `hermes update` paths in main.py call the new `_print_curator_recent_run_notice()` right after the existing first-run notice. Best-effort try/except so a state-load bug never breaks the update flow. 6 tests in tests/hermes_cli/test_curator_recent_run_notice.py: no-run / single-line / multi-line / show-once / new-run-resets / time-formatter buckets.	2026-05-09 18:43:40 -07:00
ming	85383c6363	fix(cli): preserve config comments on setting writes	2026-05-09 17:55:12 -07:00
Teknium	4ca7c2104d	test(gateway): stub /proc unavailability in find_gateway_pids fallback test Follow-up test fix for #22693 — the existing test for ps-failure + pid-file fallback needed the /proc walk path stubbed too since /proc is now consulted first.	2026-05-09 17:54:17 -07:00
Wesley Simplicio	6bf7ac3185	fix(gateway): detect gateway process via /proc in Docker without procps Salvage of NousResearch/hermes-agent#7622. Docker images often lack procps so `ps` is unavailable. Try reading /proc/*/cmdline first (works in any Linux container) and fall back to `ps -A eww` only when /proc is not present. PermissionError on individual PIDs is silently skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:54:17 -07:00
Teknium	2ffef15675	fix(test_gateway): stop run_gateway() tests from rewriting the dev's installed systemd unit (#22900 ) run_gateway() calls refresh_systemd_unit_if_needed() on every invocation so restart settings stay current after exit-code-75 respawns. The user-scope unit path resolves under Path.home() (NOT sandboxed by conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the current HERMES_HOME into the unit's Environment= line. Result: any test that exercises run_gateway() end-to-end on a real Linux dev box silently rewrites the developer's installed ~/.config/systemd/user/hermes-gateway.service with a polluted HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the next reboot, systemd loads that unit, the gateway starts looking at an empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging platforms enabled' even though the user's real config is fine. Three tests in tests/hermes_cli/test_gateway.py hit this path: test_run_gateway_exits_cleanly_on_keyboard_interrupt, test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and test_run_gateway_root_guard_has_escape_hatch. Two-layer fix: 1. _install_fake_gateway_run helper (covers all four run_gateway() call sites in test_gateway.py and any future ones) now also stubs supports_systemd_services and refresh_systemd_unit_if_needed. 2. refresh_systemd_unit_if_needed() itself sniffs the generated unit body for /pytest-of- and /hermes_test markers and refuses to write when present. Defense in depth so a future test that bypasses the helper still can't corrupt the dev's gateway. Tests that legitimately exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot) patch generate_systemd_unit to return synthetic content that doesn't carry those markers, so they keep working. Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a regression test for the source-side guard.	2026-05-09 17:54:09 -07:00
Wesley Simplicio	4f8d8ad912	fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664 ) RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible shim was falling through to FailoverReason.unknown, surfacing as 'Empty response from model' and burning 3 retry slots on the same failing endpoint. _classify_by_message had no timeout-message branch — only billing/rate_limit/auth/context_overflow/model_not_found patterns. The type-based check at line 565 also requires isinstance(error, (TimeoutError, ConnectionError, OSError)) — a plain RuntimeError doesn't match. Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded', 'request timed out', 'operation timed out', 'upstream timed out', 'turn timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True) when any pattern matches. Salvage of #22664's classifier portion. The original PR also bundled a fallback self-selection guard which is now redundant (already on main via #22780) plus DeepSeek thinking and session_search fixes that are their own separate concerns. Follow-up to #22780 — fixes the still-broken classification of generic-typed provider-shim timeouts that #22780's dedup didn't cover.	2026-05-09 17:54:07 -07:00
Wesley Simplicio	6ddc48b058	fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665 ) Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being resolved by either the init-time fallback path (line ~1660) or the runtime _try_activate_fallback path (line ~8045). Only literal 'api_key' was honored; the snake_case 'api_key_env' alias documented elsewhere in the config was silently dropped, so a 'provider: custom' fallback with base_url + api_key_env worked as primary but failed as fallback with 'no endpoint credentials found' / 401. Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in both call sites, with empty-string-to-None coercion so unset env vars don't poison the resolver. Salvage of #22665's fallback portion. The original PR also bundled gateway-degrade-on-no-adapters changes (those land via the carve-out in #22853 which is the same code) and run_agent.py memory-nudge counter hydration (issue #22357 territory, not mentioned in the title). Drops both bundled pieces; keeps just the api_key_env fix. Closes #5392.	2026-05-09 17:53:56 -07:00
Wesley Simplicio	246c676c2b	fix(gateway): degrade gracefully when all platform adapters are missing When connected_count == 0 AND enabled_platform_count > 0, the gateway treated 'all adapters returned None' identically to 'all adapters failed to connect' — both as fatal startup errors. The 'returned None' case happens when imports fail silently or when adapters are present in config but their dependencies aren't installed (e.g. discord.py missing). Cron jobs and other gateway-runtime work would unnecessarily fail to start. Split: only return False when startup_retryable_errors is non-empty (real connection attempt failed). When the list is empty AND enabled > 0, log a warning and continue running, matching the 'no platforms enabled' cron path. Salvage of #22642's gateway slice. Drops the bundled run_agent.py memory-nudge counter hydration block (issue #22357 territory) which wasn't mentioned in the PR description. Closes #5196.	2026-05-09 17:53:46 -07:00
Wesley Simplicio	116a1446a4	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. Fixes #20537	2026-05-09 17:53:35 -07:00
Wesley Simplicio	53ec32819c	fix(process_registry): kill orphaned Popen on post-spawn setup failure After Popen succeeds with os.setsid (detached process group), 5 things happen with no try/except: Thread construction, reader.start(), lock acquisition, prune+register, checkpoint write. If any raises, the Popen object goes unregistered and the detached process group leaks indefinitely. Wrap the post-spawn setup in try/except. On failure: - os.killpg(getpgid(pid), SIGKILL) takes down the entire process group (not just the shell - important because of detached PG + -lic shell wrapper that may have spawned children) - proc.kill() fallback for ProcessLookupError/PermissionError/OSError - proc.wait(timeout=5) reaps with a bound - re-raise to preserve original traceback Nested try/except around cleanup so a secondary failure can't mask the original. Closes #2749.	2026-05-09 17:53:24 -07:00
adybag14-cyber	6d5d467d39	fix(update): use termux-all uv fallback path on Termux	2026-05-09 17:53:15 -07:00
Wesley Simplicio	2245879af0	fix(checkpoint): guard _touch_project against non-dict project metadata Problem ======= `tools.checkpoint_manager._touch_project` reads the project metadata file with `json.loads(meta_path.read_text(...))`, then immediately does: meta["workdir"] = str(_normalize_path(working_dir)) The `except` block only catches `(OSError, ValueError)`. When the file parses successfully but returns a non-dict value (a list `[]`, `null`, or a scalar from a corrupted or hand-truncated write), `json.loads` succeeds without error and `meta` is set to, e.g., `[]`. The subsequent subscript assignment then raises `TypeError: list indices must be integers or slices, not str`, which is NOT caught by the narrow except clause. This TypeError propagates up through `_take` to `ensure_checkpoint`, where the broad `except Exception` safety net swallows it. The effect is that `ensure_checkpoint` silently returns False for the entire session — all checkpoints are skipped for the affected working directory without any user-visible error. Root cause ========== Missing `isinstance(meta, dict)` guard after `json.loads`, identical in pattern to bugs fixed in `cron/jobs.py` (#22569) and `tools/process_registry.py` (#22544). The same guard is already present one function below in `_list_projects` (line 506), but was inadvertently omitted in `_touch_project`. Fix === Add two lines after the try/except: ```python if not isinstance(meta, dict): meta = {} ``` This matches the existing guard in `_list_projects` and ensures a fresh empty dict is used whenever the persisted value is not a mapping — preserving the `created_at` semantics via `setdefault` on the next line. Tests ===== `TestTouchProjectMalformedMeta` covers four non-dict root values (`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file, calls `_touch_project`, and asserts: (a) no exception raised, (b) the metadata file is rewritten as a valid dict containing `last_touch` and `workdir`. All four fail on main with `TypeError`, pass with fix. Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.	2026-05-09 17:53:13 -07:00
Wesley Simplicio	058c50816c	fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494 ) The FTS5 trigram tokenizer requires >=3 CJK characters per individual token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江" has cjk_count=6 (passes the existing >=3 guard) but each token is only 2 CJK chars, so the trigram index returns 0 results. Fix: - Add per-token check: if any non-operator CJK token has <3 CJK chars, force the LIKE fallback path regardless of total cjk_count. - Expand the LIKE fallback to build one LIKE condition per non-operator token joined with OR, so each term is matched independently. Regression tests added in TestCJKSearchFallback: - test_cjk_or_combined_short_tokens_returns_results - test_cjk_short_token_or_query_preserves_filters	2026-05-09 17:53:02 -07:00
Wesley Simplicio	35f773c459	fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:52:51 -07:00
heathley	0c5c4d1b8d	fix(skills-hub): cover remaining SSRF fetch paths after #10029	2026-05-09 17:52:12 -07:00
Teknium	70bfd429e5	fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839 ) PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details, codex_reasoning_items) for the gateway's simple-text replay branch. Three more fields were added to the DB later but the whitelist was never updated: - reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api promotes 'reasoning' -> 'reasoning_content' at send time only when the strings happen to match. Carrying the original verbatim avoids loss for providers that return them as distinct fields (DeepSeek/Kimi/ Moonshot thinking modes), and preserves the empty-string sentinel that DeepSeek V4 Pro requires for thinking-mode replay. - codex_message_items: exact assistant message items with 'phase'. OpenAI docs: 'preserve and resend phase on all assistant messages — dropping it can degrade performance.' Required for prefix cache hits. No recovery path exists — once dropped, gone. - finish_reason: informational; cheap to keep so transcripts replay identically across CLI and gateway. The CLI is unaffected because cli.py keeps the live in-memory message list across turns (cli.py:10046 'self.conversation_history = result["messages"]'). The gateway rebuilds agent_history from the SQLite transcript on every turn, so any field stripped during replay is silently lost. Refactors the inline whitelist into a module-level _build_replay_entry() helper so the contract can be unit-tested. 16 new tests pin the field set and falsy-value handling. Verified end-to-end: DB stores all 8 fields, replay now preserves all 8 (was preserving only 5 for assistant text turns).	2026-05-09 14:47:33 -07:00
Teknium	c7f0aab949	feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838 ) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set.	2026-05-09 14:47:00 -07:00
Henkey	b349ae1e4c	fix(acp): honor task cwd for foreground terminal commands	2026-05-09 14:46:34 -07:00
HenkDz	840ebe063e	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
helix4u	9c26297c80	fix(gateway): preserve Ctrl+C for Windows foreground runs	2026-05-09 14:34:18 -07:00
Ninso112	883e11f0a0	fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708 ) Pass session_id through to provider profile build_api_kwargs_extras so the OpenRouter profile can attach an xAI cache-affinity header (x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt cache requires server affinity via this header — without it the cache is poisoned and Grok prompt-cache hit rates drop dramatically on multi-turn sessions. Carve-out of #22708 by Ninso112. The original PR bundled a /diff slash command, a zsh completion fix (already on main via #22802), and holographic memory null-guards. This salvage keeps just the Grok header work — small, targeted, and well-tested. Other contributors and changes preserved for separate review. Closes #22705.	2026-05-09 13:38:52 -07:00
Denis	236f3b0521	feat(gateway): add Telegram notification mode to suppress intermediate push notifications Add a configurable notifications mode for the Telegram platform adapter that controls which messages trigger push notifications. - display.platforms.telegram.notifications: "all" (default) \| "important" - HERMES_TELEGRAM_NOTIFICATIONS env var override - In "important" mode, all sends use disable_notification=True except: - Approvals (send_exec_approval) and slash confirmations - Final response messages (metadata["notify"]=True) - Zero overhead in default "all" mode - Zero impact on non-Telegram platforms Closes #22771	2026-05-09 13:38:25 -07:00
Wesley Simplicio	ca13993217	fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680 ) acp_command / acp_args descriptions previously primed the model to populate them — "Per-task ACP command override (e.g. 'copilot')" — even when no ACP CLI was installed. Models with weaker schema-following discipline would set them and the spawn would fail. Add explicit "Do NOT set unless the user has explicitly told you" guidance at both the top-level acp_command and the per-task override. Strengthen acp_args to mention it's empty unless acp_command is set. Adds 2 tests pinning the descriptions. Note: this is a cosmetic prompt-engineering fix — the params remain exposed in the schema. The fully-correct fix is to gate them behind a config flag or runtime ACP-CLI detection so the schema only emits them when an ACP harness is available. Tracked as a follow-up; this PR ships the low-cost stopgap. Salvage of #22680 (delegate schema only). The original PR also bundled unrelated fixes for #22548, #21944, #22150 — those need separate PRs since #22548 and #21944 are already addressed on main (#22780 + #22798 in flight) and #22150 deserves its own review. Closes #22013.	2026-05-09 13:37:30 -07:00
Teknium	1c9ffb177c	fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805 ) Two co-located fixes: 1. agent/model_metadata.py: bump hy3-preview static fallback from 256000 to 262144 (256 * 1024) to match OpenRouter live metadata so cache and offline both agree (issue #22268). 2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the exact-value change-detector (assert ctx == 256000) with an invariant assertion (registered + >= 4096). Per AGENTS.md 'Don't write change-detector tests': pinning the upstream-controlled context length is exactly the test class the rule forbids — it breaks every time the provider bumps the published value, with zero behavioral coverage gained. Salvage of #22574 with a redirect on the test approach. The contributor's diff bumped the integer and added a SECOND change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144, which would re-break on the next published bump. We instead delete the change-detector entirely and assert the relationship. Closes #22268.	2026-05-09 13:37:19 -07:00
Wesley Simplicio	1dd0790654	fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346 ) Problem ------- `hermes doctor` ran two health checks for Anthropic: a dedicated one with the correct `x-api-key` + `anthropic-version` headers, and a generic Bearer-auth one driven by the pluggable `ProviderProfile` for "anthropic". The generic check called `https://api.anthropic.com/v1/models` with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404, producing a noisy duplicate warning even when the dedicated check passed. Root cause ---------- `hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles against a `_known_canonical` set built from the static list (Z.AI/GLM, Kimi, DeepSeek, …). Providers with their own dedicated check above the generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so their profiles were appended and ran a second, broken check. Fix --- Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and also skip profiles whose aliases match any of those names (e.g. `claude`, `claude-oauth` → anthropic). Tests ----- tests/hermes_cli/test_doctor_dedicated_provider_skip.py: - test_build_apikey_providers_list_skips_dedicated_check_providers: asserts the assembled list does not contain anthropic, openrouter, or bedrock entries. - test_build_apikey_providers_list_includes_non_dedicated_providers: sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive. Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter leaking, pass post-fix). Fixes #22346	2026-05-09 13:36:33 -07:00
Wesley Simplicio	78698381af	fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a snapshot of PRAGMA table_info taken at function entry. When the gateway dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board and once via init_db's discard-and-reconnect path), a second connection can run the same migration before the first one commits, causing: sqlite3.OperationalError: duplicate column name: consecutive_failures This crashed the dispatcher on every first tick after a gateway restart (subsequent ticks succeeded because the columns were then present). Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a try/except that swallows OperationalError whose message contains 'duplicate column name'. All ALTER TABLE calls in _migrate_add_optional_columns are routed through this helper. Closes #21708	2026-05-09 13:36:23 -07:00
Wesley Simplicio	68854cdcdb	fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro DeepSeek V4 Pro returns thinking content as typed blocks inside the content array rather than as a top-level reasoning_content field: [{"type": "thinking", "thinking": "..."}, {"type": "output", ...}] _extract_reasoning only handled content as a plain string, so the thinking text was silently dropped. On the next turn the session was replayed without the thinking block, causing: HTTP 400: The content[].thinking in the thinking mode must be passed back to the API. Fix: when content is a list and no structured reasoning field was found, scan for items with type=='thinking' and accumulate their 'thinking' (or 'text') value into reasoning_parts. Structured fields (reasoning, reasoning_content, reasoning_details) still take priority so existing provider behaviour is unchanged. Closes #21944	2026-05-09 13:36:12 -07:00
Wesley Simplicio	98e94beb1b	fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra skills/media/youtube-content/scripts/fetch_transcript.py and optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py both import youtube-transcript-api at runtime, but the package was not listed in pyproject.toml. A fresh `uv sync` therefore omits it, and both skills fail on first invocation with: ModuleNotFoundError: No module named 'youtube_transcript_api' Add a new [youtube] optional-dependency group with youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already use) and include it in [all] so standard installs pick it up. Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra is present in pyproject.toml and included in [all]. Closes #22243 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 13:36:01 -07:00
Wesley Simplicio	3fd4ccbd8b	fix(email): send IMAP ID extension to support 163/NetEase mailbox 163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe Login` unless the client first identifies itself via the RFC 2971 ID command after LOGIN. Without this, the email gateway logs in OK but then fails on the very first poll and the connection is torn down. Send the ID payload best-effort after both `imap.login()` sites (`EmailAdapter.connect` and `_fetch_new_messages`). Failures are swallowed at debug level so non-supporting IMAP servers (Gmail, Outlook, Fastmail, Yahoo, etc.) keep working unchanged. Closes #22271	2026-05-09 13:35:50 -07:00
Wesley Simplicio	48bf0ea249	fix(browser_tool): fall through to autodetect on config read failure	2026-05-09 13:35:39 -07:00
Wesley Simplicio	3170c8d448	fix(browser_tool): do not cache transient None cloud provider resolution Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True` before resolution. If credentials were briefly unavailable on the first call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned the entire process to local mode forever, even after credentials self-healed seconds later. Root cause: bookkeeping was set up-front, so any code path that fell through to `return _cached_cloud_provider` (config read failure, no credentials yet, explicit-provider instantiation failure) committed the transient `None` to the cache permanently. Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now set only when (a) the user explicitly chose `cloud_provider: local`, or (b) a provider was successfully resolved. All transient `None` paths return without poisoning the cache, so the next call retries. Explicit provider instantiation failures now log at warning level with stack trace so operators can diagnose them. Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py covering explicit local, successful resolution, no-credentials-yet, config read failure, and explicit provider instantiation failure. Stash-verify confirmed the 3 transient-None tests fail without the fix. All 320 existing browser tests still green. Closes #22324	2026-05-09 13:35:39 -07:00
Maxim Esipov	17d8914850	fix(auxiliary): rotate pooled auth after quota failures	2026-05-09 13:35:04 -07:00
Teknium	775c0e22cf	perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808 ) `fetch_models_dev()` is on the hot path of every `AIAgent.__init__` (via `context_compressor → get_model_context_length`). The previous policy was "always try network first, only fall back to disk if network fails," so every fresh `hermes chat` / `hermes gateway` / batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry that was already on disk from earlier runs. Add a stage 2 between in-mem and network: if `models_dev_cache.json` exists and its mtime is younger than the existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache already uses), load from disk and skip the network call. The in-mem TTL is anchored to the disk file's age, so a 50-min-old cache stays in-memory for only 10 more minutes — no surprise extension of staleness window. Invariants preserved: - `force_refresh=True` still always hits the network and only falls back to disk on failure (`hermes config refresh` semantics). - Missing disk cache → fall through to network (first-ever run). - Stale disk cache (mtime > TTL) → fall through to network. - Negative file age (clock skew) → fall through to network. - Network failure → existing stage-4 stale-disk fallback unchanged. Measured impact (3-run medians, 9950X3D, fresh process per run): fetch_models_dev cold: 256 → 17 ms (-93%) hermes chat -q wall: 4.00 → 3.73 s (-7% median) 3.99 → 3.60 s (-10% min) The chat-end-to-end win is bounded below by API latency variance, but the fetch_models_dev microbenchmark is the cleanest signal: 239 ms shaved off every fresh-process agent construction. Win compounds with the previous perf PRs: #22681 google_chat lazy-load #22766 doctor parallel + IMDS off #22790 gateway.platforms PEP 562 Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones covering the new disk-cache-first path, force_refresh override, stale disk fallback, and missing-disk-cache fall-through). Full `tests/agent/` suite: 2560 passed, 0 failed.	2026-05-09 13:32:38 -07:00
Julien Talbot	cd712b176a	feat(transports/codex): pass reasoning.effort to xAI Responses API The is_xai_responses branch only sent include=[reasoning.encrypted_content] without forwarding the resolved reasoning_effort. Other Responses providers (OpenAI, GitHub) already get effort forwarded — this aligns the xAI path. Without this, agent.reasoning_effort is silently dropped on the xAI direct path, making Hermes unable to control reasoning depth on grok-4.x via api.x.ai. Tests added to TestCodexBuildKwargs cover effort passthrough, disabled state, and minimal-clamp parity with non-xAI.	2026-05-09 13:23:02 -07:00
Teknium	dcff23a25f	test(xai-image): regression-guard literal '1k'/'2k' resolution payload The xAI image-gen provider was DOA from PR #14765 onward — every request 422'd because the resolution param was being mapped to '1024'/'2048' but xAI's API expects the literal strings '1k'/'2k'. PR #18678 fixed the mapping; this test asserts the wire payload carries the literal so the regression cannot recur silently.	2026-05-09 13:07:46 -07:00
Teknium	8f711f79a4	fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765 ) Returning users who enabled '🖱️ Computer Use (macOS)' via 'hermes tools' saw '✓ Saved configuration' but no install — cua-driver was never on PATH and the toolset failed at first use. Two compounding causes: 1. _toolset_needs_configuration_prompt fell through to _toolset_has_keys, which returned True for any provider with empty env_vars. cua-driver has no env vars, so the gate skipped _configure_toolset entirely and _run_post_setup('cua_driver') never ran. 2. No stable CLI entry-point existed for re-running the install when the picker no-op'd it (e.g. when toggling the toolset off+on inside one picker session, where 'added' is empty). Changes: - hermes_cli/tools_config.py: add _POST_SETUP_INSTALLED registry mapping post_setup keys to installed-state predicates. The gate now returns True when any visible provider has a registered post_setup whose predicate fails. cua_driver is the only opt-in for now; other post_setup hooks keep their existing behaviour. - hermes_cli/main.py: add 'hermes computer-use install' and 'hermes computer-use status' as a stable docs target. install reuses the same _run_post_setup('cua_driver') path that the picker invokes; status reports whether cua-driver is on PATH. - tools/computer_use/cua_backend.py: install hint now points users at 'hermes computer-use install' first. - website/docs/user-guide/features/computer-use.md: document the new command as the primary install path. - website/docs/reference/cli-commands.md: catalog 'hermes computer-use' alongside 'hermes tools'. - tests/hermes_cli/test_post_setup_gating.py: regression coverage for the gate predicate (missing -> setup forced, installed -> setup skipped, broken predicate -> non-blocking, unregistered keys -> behaviour unchanged). Fixes #22737. Reported by @f-trycua.	2026-05-09 13:02:25 -07:00
Teknium	e7c0d6ee53	fix(fallback): skip chain entries matching current provider/model/base_url (#22780 ) _try_activate_fallback() walked the chain by index without comparing the candidate entry against the currently-failing backend. So a misconfigured chain that listed the same provider+model as the primary, or two custom_providers entries pointing at the same shim URL, would loop the same failure 3x for the same backend. After the fix, advance() skips: - entries where (provider, model) match the current agent's - entries with a base_url + model matching the current backend (catches two custom_providers names pointing at the same shim) Recursing through self._try_activate_fallback() continues to the next chain entry; if everything matches, returns False and the caller moves on without retrying the same broken path. 3 regression tests covering same-provider-same-model skip, same-base_url- same-model skip, and the all-self-matching-returns-False exhaustion path. Closes #22548 (the Hermes-side portion). The 120s timeout itself in the downstream claude-cli shim is a deployment concern documented in that issue's wherewolf87 comment.	2026-05-09 12:48:19 -07:00
Teknium	70bc52e408	fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777 ) Native Windows, WSL, SSH sessions, and Windows Terminal all send Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on every POSIX platform, so Ctrl+Enter submitted instead of inserting a newline on those terminals. Reported in #22379. Add _preserve_ctrl_enter_newline() predicate that detects the environments where Ctrl+Enter must produce a newline (sys.platform == 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION, WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the c-j-as-submit binding off in those environments and gate the c-j-as-newline handler on. Local POSIX TTYs without those markers (docker exec, plain ssh from a Mac) keep c-j as submit so plain Enter still works on thin PTYs. Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter ('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the (Escape, ControlM) tuple Alt+Enter produces. This lets Kitty / mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter newline through the existing Alt+Enter handler. 9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix to cover bare-local vs SSH branches. Closes #22379.	2026-05-09 12:48:14 -07:00
Teknium	2124ad72a2	fix(api-server): emit length/error finish_reason for truncation/failure (#22775 ) Non-streaming /v1/chat/completions wrapped any AIAgent result \u2014 including partial/failed runs \u2014 as a successful 200 with finish_reason='stop' and the internal failure string substituted into message.content. API clients had no way to distinguish 'agent answered: X' from 'agent crashed and the X you see is its error message'. After the fix: - completed: True \u2192 200 finish_reason='stop' (unchanged) - partial + truncated text \u2192 200 finish_reason='length' + hermes extras - partial + no text / failed \u2192 502 OpenAI error envelope (SDKs raise) - other failures \u2192 200 finish_reason='error' + hermes extras Adds X-Hermes-Completed / X-Hermes-Partial / X-Hermes-Error headers plus a 'hermes' extras object on partial responses for clients that want the full picture. Closes #22496.	2026-05-09 12:48:08 -07:00
Teknium	86f69e8c2a	fix(agent): hydrate memory-nudge counters from conversation_history (#22774 ) Gateway creates a fresh AIAgent per inbound message in several common scenarios: cache miss, idle eviction (1h TTL), config-signature mismatch, process restart. A freshly-built AIAgent has _turns_since_memory=0 and _user_turn_count=0, so the memory.nudge_interval trigger ('_turns_since_memory >= _memory_nudge_interval') can never be reached when these reconstructions happen on roughly the cadence of the interval. A user can chat for hours on Telegram without ever seeing a self-improvement review fire. Reconstruct the counters from conversation_history at the top of run_conversation(), right after the existing _hydrate_todo_store call. Idempotent guard ('if self._user_turn_count == 0') means a cached agent that already accumulated counters keeps them; only freshly-built agents hydrate. Modulo arithmetic preserves the original 1-in-N cadence rather than firing a review immediately on resume. 7 regression tests pinning the contract (mid-cycle history, modulo wrap, idempotency, zero-interval skip, role==user filtering, production-code anchor). Closes #22357.	2026-05-09 12:48:03 -07:00
Teknium	ade5981429	fix(kanban): sanitize comment author rendering in build_worker_context (#22769 ) Operator-controlled HERMES_PROFILE values were rendered as '${author} (${ts}):' — markdown bold with no provenance prefix. Worker comment bodies render directly underneath. A misleading profile name like 'hermes-system' or 'operator' could be misread by the next worker as a system directive above attacker-influenced content (confused-deputy primitive gated on operator misconfig). The LLM-controlled author-forgery surface was already closed in #22435 (author removed from KANBAN_COMMENT_SCHEMA). This is defense-in-depth: render with an explicit 'comment from worker `<author>` at <ts>:' prefix so even 'hermes-system' resolves to 'comment from worker `hermes-system` at ...' — parseable as worker-comment metadata, not a system directive. Strip backticks from author so they can't break out of the fence. Update test_build_worker_context_caps_comments to count by body regex since the rendered author line now also starts with 'comment '. Closes #22452.	2026-05-09 12:47:58 -07:00
Teknium	e90aa7f280	fix(agent): notify context engine on commit_memory_session (#22764 ) When session_id rotates (e.g. /new), commit_memory_session was firing MemoryManager.on_session_end but skipping ContextEngine.on_session_end. Engines that accumulate per-session state (LCM-style DAGs, summary stores) leaked that state from the rotated-out session into whatever continued under the same compressor instance. Mirror the call shutdown_memory_provider already makes — same lifecycle moment, same hook contract ("real session boundaries (CLI exit, /reset, gateway expiry)"). /new is a real boundary for the old session_id; providers keep their state but the rotated-out session_id is done. 6 regression tests covering both-hooks-fire, no-memory-manager, no-context-engine, both failure-tolerant paths. Closes #22394.	2026-05-09 12:28:42 -07:00

... 4 5 6 7 8 ...

3764 commits