hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
Hao Zhe	2b6345cee3	fix(memory): harden OpenViking local path uploads	2026-05-07 05:21:50 -07:00
Hao Zhe	187951ec6b	test(memory): harden OpenViking local upload coverage	2026-05-07 05:21:50 -07:00
nan	7137cccbd1	fix(memory): support OpenViking local resource uploads	2026-05-07 05:21:50 -07:00
0oAstro	abe5a3c937	fix(model_switch): live model discovery for custom_providers in /model picker custom_providers entries (section 4 of list_authenticated_providers) only read the static models: dict from config.yaml, ignoring the live /v1/models endpoint. This means gateways like Bifrost that expose hundreds of models only show the handful explicitly listed in config. Add live discovery via fetch_api_models() for custom_providers entries that have api_key + base_url, matching the existing behavior for user providers: entries (section 3). When the endpoint is reachable and returns models, the live list replaces the static subset. Fixes: /model picker showing only 9 models from a Bifrost gateway that actually exposes 581.	2026-05-07 05:21:26 -07:00
Teknium	e82f3b0c41	test: update send_message_tool mocks for force_document kwarg	2026-05-07 05:20:10 -07:00
leon7609	d34f03c32a	feat(gateway): support [[as_document]] directive for skill media routing Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB) currently lose quality in Telegram delivery because `_IMAGE_EXTS` membership routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ 1280px max edge. The original bytes only survive when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach for files whose extension is NOT in `_IMAGE_EXTS`. This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from the user-visible cleaned text in `extract_media` so it never leaks. Granularity is intentionally all-or-nothing per response, matching [[audio_as_voice]]'s scope. Skills that need fine control can split into two responses. Verified the targeted use case: info-graph emits 信息图已生成（...） [[as_document]] MEDIA:/tmp/info-graph-x/infographic.jpg → Telegram receives `infographic.jpg` via sendDocument, original 1MB JPEG bytes preserved, no recompression. Forwarding and download filenames stay clean (`infographic.jpg`). Tests: +3 cases in TestExtractMedia covering directive strip, isolation from voice flag, and coexistence with [[audio_as_voice]]. All 113 pre-existing media/extract/send tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 05:20:10 -07:00
Molvikar	8d363f8d54	fix(bedrock): preserve reasoningContent across converse normalization	2026-05-07 05:17:16 -07:00
badfriend	4f364c4e99	fix(mcp): give 'mcp add --command' a distinct argparse dest The --command flag of `hermes mcp add` shared its argparse dest with the top-level subparser (`dest="command"` in `hermes_cli/_parser.py`). When the flag was omitted, argparse still wrote `args.command = None`, clobbering the top-level value of `"mcp"`. The dispatcher then saw `args.command is None` and fell through to interactive chat, so `hermes mcp add ...` silently launched chat instead of registering the server. `cmd_mcp_add` was never reached. Use `dest="mcp_command"` on the flag and read it from `cmd_mcp_add`. The user-facing CLI flag `--command` is unchanged; only the in-memory namespace attribute moves. Also updates the `_make_args` helper in `tests/hermes_cli/test_mcp_config.py` to populate the new dest, and adds `tests/hermes_cli/test_mcp_add_command_dest.py` with a parser- level regression test. Closes #19785. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 05:17:03 -07:00
teknium1	333598cb0e	fix(gateway): cap cached session sources with LRU eviction Follow-up on top of Zyproth's session-source cache: swap the unbounded dict for an OrderedDict with a 512-entry LRU cap so long-running gateways can't accumulate stale entries for dead sessions forever. - self._session_sources is now an OrderedDict - _cache_session_source() move_to_end + popitem(last=False) above cap - _get_cached_session_source() move_to_end on hit (LRU read bump) - restart_test_helpers.py wires OrderedDict + _session_sources_max	2026-05-07 05:16:38 -07:00
Zyproth	176b93575a	fix(gateway): preserve thread routing from cached live session sources	2026-05-07 05:16:38 -07:00
Teknium	042eb930e2	fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194 ) `_save_auth_store`, `_save_qwen_cli_tokens`, and `_write_shared_nous_state` all created the temp file via `Path.open('w')` / `Path.write_text` and only tightened permissions to 0o600 afterward. Between create and chmod the file existed at the process umask (commonly 0o644 = world-readable on multi-user hosts), briefly exposing OAuth access/refresh tokens for Nous, Codex, Copilot, Claude, Qwen, Gemini, and every other native OAuth provider that flows through auth.json. Switch all three to `os.open(O_WRONLY\|O_CREAT\|O_EXCL, 0o600)` + `os.fdopen` + `fsync` so the file is atomic at 0o600 on creation. Tighten each parent directory (`~/.hermes/`, Qwen auth dir, Nous shared auth dir) to 0o700 so siblings can't traverse to the creds. `_save_auth_store` also gains a per-process random temp suffix to match `agent/google_oauth.py` (#19673) and `tools/mcp_oauth.py` (#21148). Adds `tests/hermes_cli/test_auth_toctou_file_modes.py` asserting final file mode 0o600 and parent dir mode 0o700 across all three writers, plus an explicit `os.open(flags, mode)` check on the main auth.json writer that would fail if anyone reintroduces the `Path.open('w')` pattern. POSIX-only (mode bits skipped on Windows).	2026-05-07 05:12:05 -07:00
Brian Su	8b32a9d0f1	feat: add Discord message deletion action	2026-05-07 05:11:09 -07:00
Teknium	fb1ce793e6	feat(security): enable secret redaction by default (#17691 , #20785 ) (#21193 ) Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation).	2026-05-07 05:10:33 -07:00
Teknium	ecaafe5f22	test(weixin): update timeout assertion for asyncio.wait_for migration	2026-05-07 05:10:04 -07:00
Zyproth	6e8f1e09a9	fix(gateway): use monotonic deadlines in QR onboarding flows	2026-05-07 05:09:39 -07:00
thelumiereguy	8a96fa48c1	fix(gateway): avoid duplicated responses history	2026-05-07 05:07:59 -07:00
Michael Nguyen	a84e56d4c6	fix(auth): sync shared Nous refresh tokens	2026-05-07 05:07:06 -07:00
Teknium	38b1c7dce5	refactor(gateway): simplify auto-resume + extend to crash recovery Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape, wider coverage. Changes: - Drop the synthetic '[System note: ...]' in the internal MessageEvent. The existing _is_resume_pending branch in _handle_message_with_agent (run.py ~L13738) already injects a reason-aware recovery system note on the next turn. With kyan's text in place the model saw two stacked system notes. Now the event text is empty and the existing injection path owns the wording. - Drop SessionStore.list_resume_pending() as a new public method. The filter is 8 lines inline in _schedule_resume_pending_sessions() — one caller, no other pluggability need. - Add 'restart_interrupted' to the auto-resume reason set. That's the reason SessionStore.suspend_recently_active() stamps on sessions recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker). Previously those sessions had to wait for a real user message to auto-resume; now they continue automatically at startup like drain-timeout interruptions do. - Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so future reasons (e.g. 'manual_resume_request') can be opted in with one line. Test coverage added: - drain-timeout + crash-recovery both scheduled - stale entries skipped (outside freshness window) - suspended entries skipped (suspended > resume_pending) - originless entries skipped (no routing target) - disallowed reasons skipped (graceful forward-compat) E2E verified end-to-end with a real on-disk SessionStore: 2 eligible sessions scheduled, 2 ineligible skipped, empty-text internal events delivered to the adapter. Co-authored-by: Kevin Yan <kevyan1998@gmail.com>	2026-05-07 05:05:34 -07:00
Kevin Yan	961a3535fa	fix(gateway): preserve resume marker on interrupted restart	2026-05-07 05:05:34 -07:00
Kevin Yan	fad684b1f3	feat(gateway): auto-resume interrupted sessions after restart	2026-05-07 05:05:34 -07:00
mwnickerson	411cfa26e3	fix: auto-block repeated kanban retries	2026-05-07 05:05:20 -07:00
LeonSGP43	06f24351c5	fix(kanban): stop reclaimed workers before retry	2026-05-07 05:05:20 -07:00
stephen0110	40b51c93a2	fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at The kanban_heartbeat tool called heartbeat_worker but never heartbeat_claim, so a worker that loops the tool while a single tool call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got reclaimed by release_stale_claims. The function name and heartbeat_claim's own docstring imply otherwise: "Workers that know they'll exceed 15 minutes should call this every few minutes to keep ownership." But there was no caller in the worker tool path. Workers couldn't invoke heartbeat_claim themselves either — it isn't exposed as a tool. Fix: _handle_heartbeat now calls heartbeat_claim first, reading HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins this in _default_spawn). Falls back to _claimer_id() for locally- driven workers that didn't go through dispatcher spawn. Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires rewinds claim_expires into the past, calls the tool, and asserts the new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to fail against the unfixed code (claim_expires stays at the rewound value). Closes the root cause underlying the symptom in #21141 (15-min respawns of long-running workers). #21141 separately addresses post-reclaim cleanup; this fixes the upstream "shouldn't have been reclaimed in the first place" half.	2026-05-07 05:05:20 -07:00
Teknium	bf843adf05	feat(gateway): opt-in cleanup of temporary progress bubbles (#21186 ) When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress) is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still working...' notices, and status-callback messages after the final response is delivered successfully. Currently effective on adapters that implement delete_message (Telegram); silently no-ops elsewhere. Off by default. Failed runs skip cleanup so bubbles stay as breadcrumbs. Minimal plumbing: base.py's existing post_delivery_callback slot now chains new registrations onto any existing callback (with per-callback exception isolation) rather than clobbering. Stale-generation registrations are rejected so they can't step on a fresher run's callbacks. This lets the cleanup callback coexist with the background-review release hook already registered on the same slot. Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>	2026-05-07 05:04:37 -07:00
Tranquil-Flow	d4de7d4179	test(skills): cover additional rescan paths in skill_commands cache (#14536 ) The rescan-on-platform-change fix landed in #18739 ships one regression test that exercises the HERMES_PLATFORM env-var path. Three other code paths in get_skill_commands / _resolve_skill_commands_platform have no direct coverage; this commit adds a regression test for each. - Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the resolver consults get_session_env after HERMES_PLATFORM, and the gateway sets that variable through set_session_vars (a ContextVar), not os.environ. The test uses set_session_vars / clear_session_vars to drive the actual gateway signal, and the disabled-skill stub reads the same value via get_session_env. A regression that swapped get_session_env for plain os.getenv would still pass an env-var-based test but break concurrent gateway sessions, which is the bug the ContextVar plumbing exists to prevent. - Returning to no-platform-scope (CLI / cron / RL rollouts after a gateway session): the cached telegram view must be dropped and the unfiltered scan repopulated when HERMES_PLATFORM is unset again. - Same-platform cache hit: consecutive calls under the same platform scope must NOT rescan. The rescan trigger is change in scope, not "always re-resolve" — a gateway serving many consecutive telegram requests should pay the scan cost once, not per request. The third test wraps scan_skill_commands with a spy after the cache is primed, so the assertion is on call_count == 0 across three subsequent get_skill_commands() calls. All 39 tests in tests/agent/test_skill_commands.py pass under scripts/run_tests.sh.	2026-05-07 04:59:43 -07:00
briandevans	11b9b146f1	fix(image-routing): expose attached image paths in native multimodal text part In native image mode (vision-capable models like gpt-4o, claude-sonnet-4), build_native_content_parts() previously emitted only the user's caption plus image_url parts. The local file path of each attached image never appeared in the conversation text, so the model could see the pixels but had no string handle for tools that take image_url: str (custom MCP tools, vision_analyze on a re-look, attach-to-tracker workflows). The text-mode path already injects an equivalent hint via Runner._enrich_message_with_vision ("...vision_analyze using image_url: <path>..."). This brings native mode to parity by appending one "[Image attached at: <path>]" line per successfully attached image to the user-text part of the multimodal turn. Skipped (unreadable) paths are NOT advertised, so the model is never told a non-existent file is attached. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 04:58:00 -07:00
Sanjay Santhanam	1f27ca638f	test(update): teach restart-mocks about the post-update survivor sweep Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`: ~3s after issuing graceful/SIGTERM restarts, the code re-queries `find_gateway_pids` and SIGKILLs anything still alive. That's the right fix for stuck-drain gateways in production, but it broke three unit tests that assumed `find_gateway_pids` would keep returning the same PIDs forever: FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways AssertionError: Expected 'kill' to not have been called. Called 1 times. Calls: [call(12345, <Signals.SIGKILL: 9>)]. FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm AssertionError: Expected 'kill' to have been called once. Called 2 times. Calls: [call(12345, SIGTERM), call(12345, SIGKILL)]. FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid assert 2 == 1 manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)] In each test `os.kill` is mocked, so the simulated PID never actually exits \u2014 the sweep finds it again and escalates. The production code is correct; the tests just need to model OS behaviour properly. Two-test fix (profile-manual restart cases): use `side_effect=[[12345], []]` so the first `find_gateway_pids` call returns the live PID and the second (the sweep) returns nothing, as if the OS had reaped the process. Service-PID-exclusion fix: track which PIDs got killed in a closure set, and exclude them on subsequent `fake_find` calls. `os.kill` gets a `side_effect` that records the kill instead of swallowing it silently. Now the sweep doesn't re-find the manual PID, no SIGKILL escalation, `manual_kills == 1`. Validation: $ pytest tests/hermes_cli/test_update_gateway_restart.py -q 43 passed in 4.13s No production code change. Fixes the three failures observed on `main` (run 25250051126): test_update_restarts_profile_manual_gateways test_update_profile_manual_gateway_falls_back_to_sigterm test_update_kills_manual_pid_but_not_service_pid Refs: #17648 (post-update survivor sweep that the tests didn't model).	2026-05-07 04:56:25 -07:00
Gutslabs	7d36e8346b	fix(security): close TOCTOU window when saving MCP OAuth credentials _write_json (the persistence helper used by HermesTokenStorage for both tokens and client_info) created the temp file via Path.write_text and only chmod'd it to 0o600 afterward. Between create and chmod the file existed on disk at the process umask (commonly 0o644 = world-readable), briefly exposing MCP OAuth access/refresh tokens to other local users. Use os.open with O_WRONLY\|O_CREAT\|O_EXCL and an explicit S_IRUSR\|S_IWUSR mode so the file is created atomically at 0o600, plus tighten the parent dir to 0o700 so siblings can't traverse to the creds file. The temp name also gains a per-process random suffix to avoid collisions between concurrent writers and stale leftovers from a crashed prior write. Mirrors the fix shipped for agent/google_oauth.py in #19673. Adds a regression test asserting the resulting file mode is 0o600 and the parent directory is 0o700 (skipped on Windows where POSIX mode bits aren't enforced).	2026-05-07 04:56:13 -07:00
Sanjay Santhanam	595bcc89fc	test(update): patch isatty on real streams to fix xdist-flaky --yes tests Two CI tests for the new `--yes` update flag (#18261) flaked under `pytest-xdist` on Linux/Python 3.11 even though they passed every local run on macOS/Python 3.14.4: FAILED tests/hermes_cli/test_update_yes_flag.py ::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty `AssertionError: assert <MagicMock 'input'>.called is False` FAILED tests/hermes_cli/test_update_yes_flag.py ::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting `AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False` Captured stdout for the first failure shows `cmd_update` taking the "Non-interactive session \u2014 skipping config migration prompt." branch \u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at `hermes_cli/main.py:7118` evaluated to `False` despite the test doing: with patch("hermes_cli.main.sys") as mock_sys: mock_sys.stdin.isatty.return_value = True mock_sys.stdout.isatty.return_value = True The whole-module mock is fragile under xdist worker reuse: a sibling test that imports `hermes_cli.main` first can leave another `sys` reference resolved inside the function (re-import in a helper, etc.), and the wholesale module replacement never gets consulted. Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and the same for `stdout`). That patches the attribute on the real stream object \u2014 every call site, no matter how it reached `sys.stdin`, hits the patched method. Same fix applied to the stash-restore test (it took the "non-TTY \u2192 skip restore prompt" branch for the same reason). Validation: $ pytest tests/hermes_cli/test_update_yes_flag.py -q 3 passed in 5.47s No production code change. Fixes the two failures observed on `main` (run 25250051126): `tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty` `tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting` Refs: #18261 (added the `--yes` flag + these tests).	2026-05-07 04:54:57 -07:00
Sanjay Santhanam	033e533d05	test(docker): align Dockerfile contract tests with simplified TUI flow The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics in favour of letting npm workspaces resolve the bundled package naturally. Two contract tests still asserted the older flow: `test_dockerfile_installs_tui_dependencies` required: 'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text …but the lockfile is no longer COPIED individually \u2014 the entire `ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace reference from `ui-tui/package.json` is `file:` so npm needs the real source, not just a manifest stub). `test_dockerfile_materializes_local_tui_ink_package` required a 7-clause conjunction matching specific `rm -rf` / `npm install --omit=dev` `--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations that were stripped out when the workspace resolution was simplified. Update the assertions to pin the contract the image actually has to carry rather than the exact shell incantations the old flow used: * TUI deps install: ui-tui/package.json + ui-tui/package-lock.json + ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm install/ci step runs in ui-tui. * Bundled hermes-ink: the workspace package source is COPIED (so `await import('@hermes/ink')` resolves at runtime). This keeps the spirit of #15012 / #16690 (zombie reaping + bundled workspace materialisation must continue to work) without locking the Dockerfile into one specific implementation flavour. Validation: $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q 6 passed in 1.43s No production code change. Fixes the two failures observed on `main` (run 25250051126): `tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies` `tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`	2026-05-07 04:53:10 -07:00
mrcoferland	bd0c54d171	fix: route Telegram image documents through photo handling	2026-05-07 04:51:46 -07:00
Teknium	51f9953e69	feat(profiles): --no-skills flag for empty profile creation (#20986 ) Adds `hermes profile create <name> --no-skills` to create a profile with zero bundled skills. Writes a `.no-bundled-skills` marker file in the profile root so `hermes update`'s all-profile skill sync loop also skips the profile — without the marker, every update would re-seed skills and the user would have to delete them again. Use case (from @hiut1u): orchestrator profiles and narrow-task profiles don't need 100+ bundled skills polluting their system prompt. - create_profile() gains a `no_skills` param, mutually exclusive with `--clone` / `--clone-all` (cloning explicitly copies skills). - seed_profile_skills() no-ops on opted-out profiles and returns `{skipped_opt_out: True}` so callers can report cleanly. - Web API (POST /api/profiles) accepts `no_skills: bool`. - Delete `.no-bundled-skills` to opt back in — next `hermes update` re-seeds normally. 6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion with clone, seed_profile_skills opt-out, fresh profile unaffected, and delete-marker-re-enables-seeding.	2026-05-07 04:34:38 -07:00
Teknium	5a3cadf6eb	fix(discord): narrow rate-limit catch and move sync state under gateway/ Two follow-ups on top of helix4u's slash-command sync hardening: - Only suppress exceptions that are actually Discord 429 rate limits (discord.RateLimited, HTTPException with status 429, or a clearly rate-limit-named duck type). Arbitrary failures that happen to expose a retry_after attribute now re-raise to the outer handler instead of silently swallowing a cooldown. - Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root stops collecting ad-hoc runtime files. Added a test verifying unrelated exceptions don't get misclassified as rate limits.	2026-05-06 18:12:35 -07:00
helix4u	d797755a1c	fix(gateway): wait for systemd restart readiness	2026-05-06 18:12:35 -07:00
Teknium	3cdbf334d5	fix(gateway): don't dead-end setup wizard when only system-scope unit is installed The setup wizard dropped non-root users at a bare shell prompt when trying to start a system-scope gateway service. Previously _require_root_for_system_service called sys.exit(1), which the wizard's `except Exception` guards cannot catch (SystemExit is a BaseException). Users with a pre-existing /etc/systemd/system unit (e.g. from an earlier `sudo hermes setup` run) hit this whenever they re-ran `hermes setup` as a regular user. - Convert _require_root_for_system_service to raise a typed SystemScopeRequiresRootError (RuntimeError subclass) instead of sys.exit(1). The direct CLI path (`hermes gateway install\|start\|stop\| restart\|uninstall` without sudo) still exits 1 cleanly via a new catch at the top of gateway_command, matching the existing UserSystemdUnavailableError pattern. - Add _system_scope_wizard_would_need_root() pre-check and _print_system_scope_remediation() helper. Both setup wizards (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now detect the dead-end before prompting and print actionable guidance: either `sudo systemctl start <service>` this time, or uninstall the system unit and install a per-user one. - Defense-in-depth: all 5 wizard prompt sites also catch SystemScopeRequiresRootError and fall back to the remediation helper if the pre-check is bypassed (race, etc.). Tests: 12 new tests in TestSystemScopeRequiresRootError, TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and TestGatewayCommandCatchesSystemScopeError covering the exception contract, pre-check matrix (root vs non-root, system-only vs user-present vs none vs explicit system=True), remediation output for each action, and the direct-CLI exit-1 path.	2026-05-06 15:58:02 -07:00
brooklyn!	04cf4788cc	fix(tui): restore voice push-to-talk parity (#20897 ) * fix(tui): restore classic CLI voice push-to-talk parity (cherry picked from commit `93b9ae301b`) * fix(tui): harden voice push-to-talk stop flow Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests. * fix(tui): preserve silent voice strike tracking Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking. * fix(tui): clean up voice stop failure path Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV. * fix(tui): report busy voice capture starts Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up. * fix(tui): handle busy voice record responses Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request. * fix(tui): clear voice recording on null response Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors. * fix(tui): count silent manual voice stops Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard. --------- Co-authored-by: Montbra <montbra@gmail.com>	2026-05-06 15:49:59 -07:00
brooklyn!	5044e1cbf1	fix(cli): submit LF enter in thin PTYs (#20896 )	2026-05-06 13:51:13 -07:00
Guillaume Meyer	7df6115199	feat(gateway): also gate pre-restart "Gateway restarting" notification Extend the gateway_restart_notification flag to cover _notify_active_sessions_of_shutdown — the message that fires just before drain ("⚠️ Gateway restarting — Your current task will be interrupted. Send any message after restart and I'll try to resume where you left off.") sent to active sessions and home channels. Same operator/end-user reasoning: on a Slack workspace shared with end users, "Gateway restarting" reads as "the bot is broken" — the operator should be able to suppress it consistently with the other two lifecycle pings rather than having a partial opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
Guillaume Meyer	b71f80e6ce	feat(gateway): per-platform gateway_restart_notification flag Adds an opt-out toggle on PlatformConfig that gates both restart lifecycle pings: the "♻ Gateway restarted" message sent to the chat that issued /restart, and the "♻️ Gateway online" home-channel startup notification. Defaults to True so existing deployments are unaffected. The motivating split is operator vs. end-user surfaces: a back-channel like Telegram should keep these pings, while a Slack workspace shared with end users should not surface gateway lifecycle noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
Teknium	33bf5f6292	fix(auth): fall back to global-root auth.json for providers missing in profile Profile processes (kanban workers, cron subprocesses, delegated subagents) read the profile's auth.json only. If a provider was authenticated at the global root but not inside the profile, the profile's credential_pool comes back empty and the process fails with 'No LLM provider configured' — even though the credentials are sitting in ~/.hermes/auth.json. #18594 propagated HERMES_HOME correctly, which is what surfaced this: workers now land in the right profile, and the profile turns out to shadow global with no fallback. Semantics (read-only, per-provider shadowing): * Profile has any entries for provider X → use profile only (global ignored). * Profile has zero entries for provider X → fall back to global. * Writes (write_credential_pool, _save_auth_store) still target the profile. * Classic mode (HERMES_HOME == global root) skips the fallback entirely — _global_auth_file_path() returns None. Also mirrors the fallback in get_provider_auth_state so OAuth singletons (nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous shared-token store (PR #19712) remains the authoritative path for Nous OAuth rotation, this just makes the read side consistent with it. Seat belt: _load_global_auth_store() refuses to read the real user's ~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather than Path.home() (which fixtures often monkeypatch to a tmp root). Reported by @SeedsForbidden on Twitter as the credential_pool shadowing follow-up to the #18594 fix.	2026-05-06 13:29:54 -07:00
kshitijk4poor	a2ff193050	chore: follow-up cleanup for Kanban migration fix - Expand migration comment to name the primary failure mode (missing column OperationalError from #20842) ahead of the secondary SQLite schema-reparse concern; also document the stale-cols-snapshot invariant - Add clarifying comments on from_row() legacy fallback branches noting they are belt-and-suspenders dead code post-migration - Add task_events comment in existing test explaining why the table is required by the migrator - Add test_legacy_migration_no_legacy_columns_at_all: Scenario A — explicitly asserts the exact #20842 crash no longer occurs and that consecutive_failures defaults to 0 on a DB that never had spawn_failures - Add test_legacy_migration_both_columns_already_present: Scenario D — asserts the migration is a no-op when both columns already exist, preserving the existing counter value	2026-05-06 11:25:16 -07:00
helix4u	b1d420e75f	fix(kanban): avoid fragile failure-column renames	2026-05-06 11:25:16 -07:00
Yuqian	441ef75d15	fix(feishu): keep topic replies in threads Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.	2026-05-06 10:52:51 -07:00
kshitij	5c906d7026	feat(web): add SearXNG as a native search-only backend Adds SearXNG as a free, self-hosted web search provider. SearXNG is a privacy-respecting metasearch engine that requires no API key — just a running instance and SEARXNG_URL pointing at it. ## What this adds - `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing `WebSearchProvider` (search only; no extract capability) - `_is_backend_available("searxng")` — gates on SEARXNG_URL - `_get_backend()` — accepts "searxng" as a configured value; adds it to auto-detect candidates (lower priority than paid services) - `web_search_tool` — dispatches to SearXNG when it is the active backend - `check_web_api_key()` — includes SearXNG in availability check - `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"] - `tools_config.py` — SearXNG appears in the `hermes tools` provider picker - `nous_subscription.py` — `direct_searxng` detection, web_active / web_available - `setup.py` — SEARXNG_URL listed in the missing-credential hint - 23 tests covering: is_configured, happy-path search, score sorting, limit, HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key ## Config ```yaml # Use SearXNG for search, any paid provider for extract web: search_backend: "searxng" extract_backend: "firecrawl" # Or: SearXNG as the sole backend (web_extract will use the next available) web: backend: "searxng" ``` SearXNG is search-only — it does not implement WebExtractProvider. Users who only configure SEARXNG_URL get web_search available; web_extract falls back to the next available extract provider (or is unavailable if none). Closes #19198 (Phase 2 Task 4 — SearXNG provider) Ref: #11562 (original SearXNG PR)	2026-05-06 10:05:29 -07:00
kshitij	cd2cbc73b7	refactor(web): per-capability backend selection for search/extract split Introduce the foundation for independently selecting web search and extract backends — enabling future combinations like SearXNG for search + Firecrawl for extract. Architecture: - tools/web_providers/base.py: WebSearchProvider and WebExtractProvider ABCs with normalized result contracts (mirrors CloudBrowserProvider) - tools/web_tools.py: _get_search_backend() and _get_extract_backend() read per-capability config keys, fall through to shared web.backend - hermes_cli/config.py: web.search_backend and web.extract_backend in DEFAULT_CONFIG (empty = inherit from web.backend) Behavioral change: - web_search_tool() now dispatches via _get_search_backend() - web_extract_tool() now dispatches via _get_extract_backend() - When per-capability keys are empty (default), behavior is identical to before — _get_search_backend() falls through to _get_backend() This is purely structural — no new backends are added. SearXNG and other search-only/extract-only providers can now be added as simple drop-in modules in follow-up PRs. 12 new tests, 49 existing tests pass with zero regressions. Ref: #19198	2026-05-06 09:16:25 -07:00
Teknium	a24789d738	fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802 ) OpenCode Go and OpenCode Zen are flat-namespace model resellers — their /v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the inference API rejects vendor-prefixed names with HTTP 401 'Model not supported'. Two bugs fixed: 1. `switch_model` in hermes_cli/model_switch.py was silently switching the user off opencode-go to native deepseek when they typed `/model deepseek-v4-flash`. Step d found the model in opencode-go's live catalog, but step e (detect_provider_for_model) still ran and matched the bare name against deepseek's static catalog. Fix: track whether the live catalog resolved it; skip step e when it did. 2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator slugs into fallback_model configs) intact — causing HTTP 401s when the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY leading vendor prefix because their APIs are flat-namespace. Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py covering both normalization (prefix stripping, regression guards for opencode-zen Claude hyphenation and openrouter vendor-prepending) and switch_model (bare-name resolution on opencode-go's live catalog must not trigger cross-provider hijack). Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai has no overlapping entry in a native provider's static catalog. Deepseek and minimax failed because their v4/v2.7 names existed in the native deepseek/minimax catalogs.	2026-05-06 09:08:33 -07:00
Cleo	906881c38b	fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands When the user pastes a long slash command like \`/goal <long prose>\` into \`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose \`starts_like_path\` prefilter accepts anything starting with \`/\` and forwards it to \`_resolve_attachment_path()\`. That helper calls \`Path.exists()\` which invokes \`os.stat()\`, which raises \`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the candidate exceeds NAME_MAX (typically 255 bytes). The OSError propagates up to the broad \`except Exception\` in \`process_loop\` (cli.py:11798), gets logged at WARNING level, and the user's input is silently dropped. From the user's POV the chat prompt hangs — the only signal is in agent.log: WARNING cli: process_loop unhandled error (msg may be lost): [Errno 63] File name too long: "/goal Drive the space board..." This affects any slash command with prose-length arguments — \`/goal\` in particular but also \`/skill\`, \`/cron\`, custom user commands. Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so structurally-invalid path candidates cleanly return None. The slash- command dispatch path downstream (cli.py:11718) then handles the input correctly. Tests: two new regression cases in test_cli_file_drop.py cover the original \`/goal\` reproducer and a synthetic long path. All 35 file- drop tests pass. Reproducer (without the fix): python -c "from cli import _detect_file_drop; _detect_file_drop('/goal ' + 'a'*300)" → OSError: [Errno 63] File name too long	2026-05-06 06:34:48 -07:00
Teknium	a0fedfbb1b	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709 ) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, .so/.dylib/.dll, .mp4/.mov, .zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). - run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks. - Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration). - Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'. E2E validated ------------- - Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/. - Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice). - max_snapshots=3 actually enforced: after 6 commits, list shows 3. - Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects. - max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files. - hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running. Breaking / migration -------------------- No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.	2026-05-06 05:44:35 -07:00
helix4u	76074d9ee6	fix(cli): recover classic CLI output after resize	2026-05-06 04:20:54 -07:00
adybag14-cyber	e45df2e81e	fix(ui): reduce status-line jitter while scrolling	2026-05-06 04:02:09 -07:00

1 2 3 4 5 ...

3288 commits