hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-13 14:02:16 +00:00

Author	SHA1	Message	Date
QuenVix	7245bc77eb	fix(fallback): merge fallback_providers with legacy fallback_model configurations	2026-05-23 05:24:57 -07:00
Teknium	7f1b2b4569	fix(approval): pin 'silence is not consent' contract on timeout/deny (#24912 ) (#30879 ) User incident (Slack, 2026-05-13): user walked away mid-conversation, agent requested approval to run `rm -rf .git`, the prompt timed out after the gateway_timeout (default 300s), and the agent removed the .git folder on its own. Corroborated by an independent report from a Telegram user. The underlying code path was correct — `check_all_command_guards` returns `approved=False` with a BLOCKED message on both timeout and explicit deny, and `terminal_tool` surfaces that as `status=blocked` to the agent. The bug is at the model-interface layer: the message "BLOCKED: Command timed out. Do NOT retry this command." reads to some models as "try a different command achieving the same outcome." This commit changes only the model-facing message + the structured return shape: - Timeout message now explicitly names the three evasion paths the agent must avoid: retry, rephrase, AND achieve the same outcome via a different command. Ends with "Silence is not consent." - Explicit deny gets the same shape minus the silence-is-not-consent line (it WAS an explicit deny, not silence). - New structured fields on the return dict: `outcome` ("timeout" or "denied") and `user_consent` (always False on this branch) so plugins, hooks, and audit pipelines don't have to string-parse the message to distinguish the two cases. The mechanism that should already have prevented the original incident — timeout treated as deny, BLOCKED result, post hook fires with `choice="timeout"` — is unchanged. This commit hardens only the agent's reading of the result. Tests: - test_timeout_returns_approved_false_with_no_consent — pins the return shape on the Slack-shaped notify_cb-registered path - test_timeout_message_is_emphatic_against_retry_and_rephrase — pins the exact phrases the message must contain - test_explicit_deny_carries_same_no_consent_shape — same contract on explicit /deny - test_timeout_emits_post_hook_with_timeout_outcome — pins the post_approval_response hook payload so audit plugins can act 329 approval tests passing (4 new + 325 existing). Fixes #24912	2026-05-23 02:59:13 -07:00
Teknium	6855d17753	fix(memory): guard against external drift in MEMORY.md/USER.md (#26045 ) (#30877 ) Reproduction (production, 2026-05-14): two concurrent sessions on the same agent. Session A patches MEMORY.md directly via the patch tool, appending ~8KB of structured content (Vendor Master, Standing Orders, Pin Board) — none of it through the memory tool, so no § delimiters. Session B starts later with stale in-memory state (1 entry, ~331 chars). Session B calls memory(action=replace) on its one known entry. The tool's _read_file parses A's content as a single 8KB 'entry' (no § splits), then replace truncates that entry to B's new 333-byte content. ~8KB of structured content silently destroyed. The atomic-rename write path is fine in isolation. The bug is the implicit contract: the tool assumes MEMORY.md is exclusively a §-delimited list of small entries it wrote, but the v0.13 install runbook itself uses 'cat >> MEMORY.md' for onboarding, the patch tool edits the file directly, and operators do too. Fix: a drift guard in MemoryStore._detect_external_drift that fires on either signal: 1. Re-parse + re-serialize doesn't produce identical bytes (catches oddly-encoded delimiters / partial writes). 2. Any single parsed entry exceeds the store's whole-file char limit. The tool budgets the ENTIRE store against that limit (2200 chars for memory, 1375 for user), so no tool-written entry can legitimately be larger. An entry bigger than the store limit means an external writer dropped free-form content into what the tool will treat as one entry. When drift fires, _reload_target writes a .bak.<ts> snapshot of the on-disk file, then add/replace/remove refuse to flush. The original file stays untouched. The error dict surfaces the .bak path AND a remediation string ('integrate missing entries via memory(add=...) one at a time, then rewrite the file clean') so the model can act on it without escalating to the operator. Tests: - test_replace_refuses_on_drift, test_add_refuses_on_drift, test_remove_refuses_on_drift — all three mutators refuse - test_clean_file_does_not_trigger_drift — false-positive check - test_error_message_points_at_remediation — error string shape - test_drift_guard_also_protects_user_target — USER.md too - test_drift_backup_filename_is_unique_per_invocation — bak.<ts> naming pin 144 memory tests passing (was 137; +7). Fixes #26045	2026-05-23 02:51:29 -07:00
xxxigm	b5ea6a5c80	test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344 ) Eleven new tests pinning the #29344 fix. Layout mirrors the existing "Fix D" entitlement section so the bad-credentials disambiguator sits alongside the entitlement-block tests it complements. Classifier-level coverage: * ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix`` — verbatim shape from the reporter's wire capture (``{code: 'caller does not have permission', error: 'OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]'}``) ↦ classifier must return False so the refresh path runs. * ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape`` — same body after ``_extract_api_error_context`` has rewritten it to ``{reason, message}``. The disambiguator must fire in BOTH shapes; without this guard the production call site at ``_recover_with_credential_pool`` (which goes through the normalised extractor) would still misclassify. * ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant`` — parametrised forward-compat: ``bad-credentials``, ``expired-token``, ``revoked``, ``some-future-reason``. xAI documents the prefix as stable, the suffix after the colon as a reason code that can grow; every variant under ``unauthenticated:`` must route to refresh. * ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone`` — belt-and-braces guard: if a future API revision drops the WKE suffix but keeps "OAuth2 access token could not be validated", we still classify correctly. * ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords`` — defensive: if a body ever carries BOTH the WKE suffix and entitlement language, the WKE signal wins. Auth is recoverable; entitlement isn't, and a refreshed token will resurface the entitlement message on the next request. * ``test_is_entitlement_failure_case_insensitive_wke_match`` — pins that the classifier lowercases the haystack so a future xAI build that uppercases the prefix doesn't reintroduce the bug. Recovery-path coverage (end-to-end through ``_recover_with_credential_pool``): * ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403`` — the headline test the reporter requested: a bad-credentials 403 with the exact wire body must call ``try_refresh_current()`` exactly once and ``_swap_credential`` once. Pre-fix this returned ``(False, _)`` because the entitlement classifier over-matched and short-circuited the refresh path. * ``test_recover_with_credential_pool_still_blocks_real_entitlement`` — companion regression guard for #26847: a pure unsubscribed- account body (no WKE suffix, no OAuth2-validation phrase) must still surface as entitlement and skip refresh. The new disambiguator must not weaken the original loop-protection it was added to preserve. The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the existing ``MagicMock`` patterns from the surrounding tests so the new section reads as a natural extension of "Fix D" rather than a separate test file.	2026-05-23 02:48:13 -07:00
Teknium	9acf949e34	feat(telegram): edit status messages in place instead of appending (#30864 ) Closes #30045. Based on @qike-ms's PR #30141. Telegram status callbacks (lifecycle, compression, context-pressure) used to append a fresh bubble on every emit. Now adapter tracks {(chat_id, status_key) -> message_id}; first call sends, subsequent calls edit. Failed edits drop the cache entry and fall through to a fresh send. - gateway/platforms/telegram.py: send_or_update_status() (+34 LOC) - gateway/run.py: route _status_callback_sync through it when the adapter supports it; plain adapter.send() otherwise (+15 LOC) - 5 tests covering first send / edit-in-place / edit-failure fallback / distinct key & chat isolation	2026-05-23 02:42:10 -07:00
Teknium	4b6d68bd64	test(fast-command): stub _load_gateway_runtime_config too PR `2362cc468` ("fix(gateway): enforce env variable template expansion on runtime config loaders") refactored `_load_service_tier` to read config via the new `_load_gateway_runtime_config` wrapper instead of opening `_hermes_home/config.yaml` directly. The `test_run_agent_passes_priority_processing_to_gateway_agent` test still only stubbed `_load_gateway_config` (the inner loader), so the runtime wrapper saw an empty config and `_load_service_tier` returned None, breaking the test: FAILED tests/gateway/test_fast_command.py::test_run_agent_passes_priority_processing_to_gateway_agent - AssertionError: assert None == 'priority' Fix: also stub `_load_gateway_runtime_config` to return the expected `agent.service_tier=fast` config, so the test once again drives the priority routing path it was written to verify. Confirmed reproducing on current main before the patch and passing after.	2026-05-23 02:40:33 -07:00
Zyrixtrex	61ac118724	fix(webhook): enforce INSECURE_NO_AUTH safety rail on dynamic route reloads	2026-05-23 02:39:12 -07:00
Teknium	6942b1836e	fix(skills_guard): explain why --force is rejected on dangerous verdicts Follow-up to @sprmn24's verdict-logic fix. The previous block-message ended in 'Use --force to override' regardless of verdict — but as of the --force fix above, dangerous community/trusted skills can't be overridden by --force at all. The misleading hint sends users in a loop. Replace it with a specific message that tells them what the documented behavior actually is. Adds two regression tests covering the dangerous-verdict message shape and one that pins the existing --force hint for non-dangerous blocks.	2026-05-23 02:37:30 -07:00
sprmn24	789043b691	fix(security): update tests for verdict and --force changes	2026-05-23 02:37:30 -07:00
Teknium	db489a315f	fix(tests): allowlist tmp_path for kanban_notify artifact delivery (#30852 ) `_deliver_kanban_artifacts` routes candidates through `BasePlatformAdapter.filter_local_delivery_paths` (added in `41d2c758c`), which rejects paths outside `MEDIA_DELIVERY_SAFE_ROOTS`. The two artifact-delivery tests create fixtures under `tmp_path`, which lives outside the cache roots — so under CI's hermetic HOME the filter silently dropped both fake files and the assertions on `images_uploaded` / `documents_uploaded` failed. Fix: monkeypatch `HERMES_MEDIA_ALLOW_DIRS=str(tmp_path)` in both tests so the safety filter accepts the fixtures. Production behaviour unchanged; test-side fix only. CI fail repro on origin/main: test (6) shard, both test_notifier_uploads_artifacts_on_completion and test_notifier_artifact_delivery_skips_missing_files.	2026-05-23 02:34:34 -07:00
xxxigm	5b6f0b695b	test(tls-fd-recycle): pin shutdown-only + thread-aware close contract (#29507 ) Ten regressions across both prongs of the #29507 fix, organised so each test names exactly which way the bug could come back: Prong 1 — ``force_close_tcp_sockets``: * ``shutdown_only_no_close`` is the smoking-gun assertion. If a future refactor adds back ``sock.close()`` to this helper, the FD-recycling race that wrote TLS bytes on top of ``kanban.db`` is back, and this trips. * ``uses_shut_rdwr`` pins that both halves are shut down (a half-close wouldn't unblock a worker stuck in ``recv``). * ``swallows_oserror_on_shutdown`` covers the already-shutdown case. * ``handles_multiple_pool_entries`` walks all pool connections. Prong 2 — thread-aware ``_close_request_client_once``: * ``stranger_thread_aborts_only_no_close`` simulates the asyncio_0 → Thread-1616 interrupt path: stranger drives abort, holder stays populated for the worker's eventual finally. * ``owner_thread_pops_and_full_close`` is the worker-thread path: pops + full close. * ``stranger_then_owner_close_sequence_runs_full_close_exactly_once`` replays the reporter's exact timeline at object level: abort runs once, full close runs once, holder ends empty. Agent surface: * ``_abort_request_openai_client_does_not_call_client_close`` pins that the new entrypoint shuts sockets and emits the ``deferred_close=stranger_thread`` marker but never calls ``client.close()``. * ``_abort_request_openai_client_null_client_is_noop`` defensive. End-to-end: * ``fd_recycle_window_closed_by_shutdown_only`` reproduces the race at object level — runs the abort path from a stranger thread and asserts that no ``close()`` ever fires, so the kernel can never recycle the FD under the owner's still-active reference.	2026-05-23 02:31:10 -07:00
xxxigm	e2a7d73a66	fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507 ) The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by ``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own ``shutdown()`` is safe from any thread — it only sends FIN and breaks pending ``recv``/``send`` — but ``close()`` releases the FD integer to the kernel. When the helper runs on a stranger thread (the interrupt loop, the stale-call detector) the FD release races the owning httpx worker thread that still has the same integer cached inside the SSL BIO. The kernel then recycles that integer to the next ``open()`` call — in production, kanban dispatcher's ``kanban.db`` — and the worker's delayed TLS flush writes a 24-byte TLS application-data record on top of the SQLite header. Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx worker's own unwind will close the underlying socket via the same Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1 before issuing ``close(2)`` — no FD-aliasing window. The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so existing dashboards / log parsers keep working.	2026-05-23 02:31:10 -07:00
walli	0e7448d63a	fix(qqbot): use original attachment filename for cached files Add original_name parameter to _download_and_cache, preferring the attachment metadata filename over the CDN URL path basename. Previously files were cached with meaningless QQ CDN hash names (e.g. qqdownload_...oadftnv5), causing ugly filenames when sent back to users. Aligns with qqbot-agent-sdk's AttachmentDownloader.download_document.	2026-05-23 02:27:17 -07:00
walli	a54f5afc70	fix(qqbot): handle op 7/9 and expand fatal close code set 1. Handle op 7 (Server Reconnect): close WS to trigger reconnect loop while preserving session for Resume 2. Handle op 9 (Invalid Session): check d value to determine if session is resumable; clear session only when not resumable 3. Remove 4009 from session-clearing set (connection timeout is resumable) 4. Expand fatal close codes: 4001/4002/4010-4014 now stop reconnect immediately instead of retrying uselessly 5. Add unit tests	2026-05-23 02:27:17 -07:00
walli	bbd77d165c	fix(qqbot): add INTERACTION intent and expose video/file cached paths 1. Add INTERACTION intent bit (1<<26) to _send_identify, fixing approval button clicks not being received (INTERACTION_CREATE events were never dispatched by the gateway) 2. Include local cached path in video/file attachment descriptions so the LLM can reference files for re-sending to users 3. Add unit tests (TestIdentifyIntents, TestProcessAttachmentsPathExposure)	2026-05-23 02:27:17 -07:00
QuenVix	2362cc4688	fix(gateway): enforce env variable template expansion on runtime config loaders	2026-05-23 02:27:08 -07:00
QuenVix	d21ac579e9	fix(gateway): honor key_env in auth-failure fallback resolution	2026-05-23 02:25:53 -07:00
Teknium	99671a8634	test(kanban): allow tmp_path artifacts past media-delivery validator PR #`41d2c758c` ("Fix unsafe gateway media path delivery") tightened `validate_media_delivery_path` so that artifacts emitted by the agent must live inside `MEDIA_DELIVERY_SAFE_ROOTS` (Hermes-managed cache dirs) or an operator-allowlisted root via `HERMES_MEDIA_ALLOW_DIRS`. Two kanban-notifier tests put their PDFs and PNGs under pytest's `tmp_path`, which is correctly rejected by the new validator. They started failing on main as soon as that PR landed: FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_uploads_artifacts_on_completion FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_artifact_delivery_skips_missing_files Symptom in logs: "Skipping unsafe local file path outside allowed roots". The validator is doing exactly what it should — the tests were relying on the looser pre-fix behaviour. Fix: add `HERMES_MEDIA_ALLOW_DIRS=tmp_path` to the `kanban_home` fixture so artifacts under `tmp_path` are recognised as safe. This is the same allowlist mechanism the operator-facing env var documents.	2026-05-23 02:25:09 -07:00
teknium1	70aaa774be	fix(opencode-go): emit Kimi reasoning_effort, match KimiProfile shape The Kimi K2 branch added in the prior commit only emitted extra_body.thinking and dropped reasoning_effort entirely. KimiProfile (api.moonshot.ai/v1) sends both fields, and OpenCode Go proxies to the same Moonshot backend. Mirror that shape on the Go path so /reasoning effort actually reaches Kimi. - low/medium/high pass through verbatim - xhigh/max clamp to high (Moonshot's max supported value) - minimal / unknown effort → omit reasoning_effort, keep thinking on - disabled / no config → unchanged - DeepSeek branch unchanged	2026-05-23 02:20:28 -07:00
Harish Kukreja	3589960e03	fix(provider): expose OpenCode Go reasoning controls	2026-05-23 02:20:28 -07:00
helix4u	71291d83cd	test: keep tirith checks hermetic	2026-05-23 02:20:14 -07:00
QuenVix	52a368fa72	fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips	2026-05-23 01:46:34 -07:00
Teknium	3127a41cb1	test(acp): pin parse_model_input in slash-command tests The two ACP slash-command tests that exercise `provider:model` routing (`test_set_session_model_accepts_provider_prefixed_choice` and `test_model_switch_uses_requested_provider`) relied on the live `hermes_cli.models._KNOWN_PROVIDER_NAMES` / `_PROVIDER_ALIASES` module state to parse `anthropic:claude-sonnet-4-6` into `("anthropic", "claude-sonnet-4-6")`. If any earlier test in the same xdist worker registers a custom provider that shadows `anthropic` or otherwise mutates those globals, the parser falls into the `detect_provider_for_model` branch and resolves to `custom` instead. Observed once in CI on run 26326728502 / job 77505732299 as `AssertionError: assert 'custom' == 'anthropic'` — could not reproduce locally under per-file isolation, so the failing in-file order was specific to a particular xdist scheduling. Monkeypatching `parse_model_input` + `detect_provider_for_model` for both tests removes the global-catalog dependency, so the tests now only exercise what they were written to verify (the `requested_provider -> runtime -> AIAgent kwargs` plumbing).	2026-05-23 01:44:56 -07:00
xxxigm	da636e982b	test(plugins): regression coverage for project-plugin RCE chain (#29156 ) 35 new tests across 5 classes covering every layer of the GHSA-5qr3-c538-wm9j defence. Each class corresponds to one chokepoint so a regression in any single layer is caught by the named class: * ``TestProjectPluginsEnvGate`` (13 cases) — parametrised over both the documented truthy values (``1`` / ``true`` / ``yes`` / ``on`` + uppercase variants) and the previously-bypassing falsy strings (``0`` / ``false`` / ``no`` / ``off`` / ``""`` / ``False``). The falsy half is the direct env-bypass repro: pre-fix any non-empty string enabled the project source. * ``TestApiPathSanitizer`` (16 cases) — unit-level coverage of the new ``_safe_plugin_api_relpath`` helper. Absolute paths (``/etc/passwd``, ``/tmp/payload.py``, ``/usr/bin/python``), ``..``-traversal payloads (including nested ``subdir/../../..``), and non-string / empty / whitespace-only values must all return ``None``. Safe relative paths (``api.py``, ``backend/routes.py``) round-trip unchanged so legitimate plugins keep working. * ``TestDiscoveryScrubsApiField`` (3 cases) — end-to-end through ``_discover_dashboard_plugins`` with a real manifest on disk. Verifies that the cached plugin entry's ``_api_file`` is scrubbed at discovery time (``None`` + ``has_api: False``) so any downstream consumer can't be tricked into re-deriving the unsafe path from cache. * ``TestMountApiRoutesRefusesUntrusted`` (3 cases) — pokes synthetic plugin entries with each refusal vector directly into the cache and patches ``importlib.util.spec_from_file_location`` to assert it is not invoked for project-source / traversal payloads, and is invoked normally for bundled / user plugins. * ``TestEndToEndPocBlocked`` (1 case) — reproduces the original advisory PoC: operator sets ``HERMES_ENABLE_PROJECT_PLUGINS=0`` believing project plugins are off, attacker plants a manifest in CWD's ``.hermes/plugins/`` with ``api`` pointing at an absolute payload path. Asserts that the importer is never called against the payload path and that ``hermes_dashboard_plugin_evil`` is not in ``sys.modules`` after the mount routine runs. An autouse fixture busts ``_dashboard_plugins_cache`` before and after each test so the production cache (populated by the import-time ``_mount_plugin_api_routes()`` call) can't bleed in. All 12 pre-existing dashboard-plugin tests in ``test_web_server.py`` still pass unchanged.	2026-05-23 01:43:52 -07:00
Eugeniusz Gilewski	41d2c758c3	Fix unsafe gateway media path delivery	2026-05-23 01:40:35 -07:00
Markus	4a91e36495	fix(gateway): separate observed Telegram group context	2026-05-23 01:33:42 -07:00
Teknium	97e975edd2	fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root Extends @briandevans's PR #17659 from {auth.json, auth.lock, .anthropic_oauth.json} to also cover: - HERMES_HOME/.env (provider API keys) - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets) - HERMES_HOME/mcp-tokens/ (OAuth token directory; dir + everything inside) …AND iterates over both _hermes_home_path() AND _hermes_root_path() so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block <root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the write-deny side already does (#15981, #14157). Explicitly NOT a security boundary. Per the personal-assistant trust model, the terminal tool runs as the same OS user and can `cat auth.json` directly. This read-deny exists as defense-in-depth: - Models that respect tool denials empirically tend to stop rather than reach for the shell. - The denial surfaces an audit trail when something tries to read credentials — easier to spot in logs than a generic `cat`. Docstring + error message both flag this as defense-in-depth so future contributors don't mistake it for a real security boundary and don't re-decline reports that propose the same fix shape. Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel PR #8055 (closed-as-duplicate, credited). Co-authored-by: Tom Qiao <zqiao@microsoft.com>	2026-05-22 20:15:09 -07:00
briandevans	567ea61298	fix(file-safety): block auth.json read via TERMINAL_CWD relative path read_file_tool resolves relative paths against TERMINAL_CWD (or the task's live terminal cwd), but the prior call passed the original unresolved string to get_read_block_error. That function's own resolve() is anchored at the Python process cwd, so when a task's TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file on the relative path "auth.json", the credential-store denylist was never reached and the file was read normally. Pass the already-resolved absolute path string at the file_tools call site, document the contract on get_read_block_error, and add a read_file_tool-level regression test that pins the relative-path case under TERMINAL_CWD == HERMES_HOME. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 20:15:09 -07:00
briandevans	056e00a77e	fix(file-safety): block read_file on HERMES_HOME credential stores (#17656 ) `get_read_block_error` previously only denied reads inside `${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE tokens) directly readable by the agent. A prompt-injection reaching `read_file` could exfiltrate active provider credentials in plaintext. Mode-0600 file permissions only protect against other Unix users — the agent runs as the file's owner, so `read_file` is unaffected. Extend the existing deny list with the three credential paths identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`). The check uses the same `Path.resolve()` pattern as `skills/.hub`, so symlink/path-traversal indirection is caught too. The agent doesn't need to read these directly — `auxiliary_client` and `credential_pool` consume them through process env / OAuth flows that bypass `read_file`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 20:15:09 -07:00
Teknium	3f78d8073c	fix(skills): make content_hash filename-sensitive too (symmetric with bundle_content_hash) PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a filename swap between two files in a bundle changes the digest. But it only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py`` (the on-disk equivalent) still hashed file contents only. These two functions need to stay symmetric: ``check_for_skill_updates`` compares the disk hash of an installed skill against the bundle hash of the upstream copy. With the asymmetric fix, every clean install showed as drifted because the digests no longer matched (2 existing tests in ``test_skills_hub.py`` started failing as soon as the contributor's change landed). Apply the same ``rel_path + \x00 + content`` shape to the disk-side function. Both functions now produce the same digest for the same skill content laid out two ways. Documented the symmetry invariant in the docstring so a future change to either function knows to touch both. Also adds tests/tools/test_pr_6656_regressions.py with 10 regression tests covering all three fixes salvaged in PR #6656: - uninstall_skill path traversal (4 cases: parent segments, absolute paths, symlink escape, legitimate skill) - bundle_content_hash filename swap detection (4 cases: in-memory swap, identity, disk-side swap, bundle↔disk symmetry) - list_pending lock contract (2 cases: source-grep contract, smoke) Also fixes AUTHOR_MAP entry for @aaronlab — their commit email (1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub login, so changelog @mentions would 404.	2026-05-22 19:59:24 -07:00
teknium1	8cf977c8b1	fix(plugins): widen _sanitize_plugin_name for category-namespaced names Follow-up to PR #28832 — the dashboard plugin routes now accept slashed names like `observability/langfuse` and `image_gen/openai`, but `_sanitize_plugin_name` still rejected forward slash and so dashboard update + remove on those plugins fell through to '404 not found' even though they exist on disk. Adds an opt-in `allow_subdir=True` flag that: - Permits internal forward slashes (category-namespaced plugin keys emitted by `_discover_all_plugins`). - Strips leading and trailing slashes. - Still rejects `..` and backslash, and still asserts the resolved target lives inside `plugins_dir`. Opted in at the two read-paths that operate on installed plugins: `_require_installed_plugin` (CLI update/remove) and `_user_installed_plugin_dir` (dashboard update/remove). The install path keeps the default (`allow_subdir=False`) because freshly-cloned plugins always land top-level under `~/.hermes/plugins/<name>/`. Adds 6 targeted unit tests covering the new flag's allow/reject matrix.	2026-05-22 19:50:32 -07:00
ethernet	f89afdbd17	fix(test): deflake two intermittent CI failures - test_browser_secret_exfil: mock _run_browser_command instead of launching real Chrome (secret check is pre-launch, browser is irrelevant to the assertion) - test_web_server: add time.sleep(0.05) after pub.send_text() to yield the event loop before receive_text(). TestClient's sync mode can race the broadcast handler otherwise, hanging the test.	2026-05-22 19:46:18 -07:00
Teknium	a84cec61ca	fix(minimax-oauth): refresh short-lived access tokens per request (#30619 ) * fix(minimax-oauth): refresh short-lived access tokens per request MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches api_key as a static string at client construction, so a session that resolves credentials once at startup keeps sending the same bearer until MiniMax returns 401 mid-session. Swap the static string for a callable token provider, reusing the existing Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state, which is a no-op when the token still has more than 60s of life left and refreshes proactively otherwise. Refreshes persist to auth.json so other processes (gateway, cron) see them immediately. The wire-up lives at the agent-init / model-switch boundary rather than in resolve_runtime_provider, so aux client paths that hand the api_key string to OpenAI(api_key=...) are unaffected. * docs: add infographic for minimax-oauth token refresh	2026-05-22 15:16:15 -07:00
adybag14-cyber	a3beee475b	perf(termux): speed up bare cli prompt startup	2026-05-22 14:27:38 -07:00
adybag14-cyber	6c3fd9714f	perf(termux): fast-path cli version startup	2026-05-22 14:27:38 -07:00
kshitijk4poor	cc8e5ec2af	refactor(gateway): migrate Discord adapter to bundled plugin (full Teams parity) First migration of an existing built-in platform adapter to the plugin system established by IRC / Teams / LINE / Google Chat. Closes #24325; advances the umbrella refactor in #3823. Matches Teams' shape exactly — adapter under ``plugins/platforms/discord/`` with the standard ``__init__.py`` / ``adapter.py`` / ``plugin.yaml`` shell, ``register(ctx)`` entry point, no back-compat shim at the old import path, and full parity for the four hooks Teams uses plus the ``apply_yaml_config_fn`` hook that landed in #25443 (the Discord plugin is the first consumer of that hook): * ``standalone_sender_fn`` — out-of-process cron delivery via REST API * ``setup_fn`` — interactive ``hermes setup gateway`` wizard * ``apply_yaml_config_fn`` — translate ``config.yaml`` ``discord:`` keys into ``DISCORD_`` env vars (replaces the hardcoded block in ``gateway/config.py``) ``is_connected`` — declares connection state from ``DISCORD_BOT_TOKEN`` * ``check_fn`` — lazy-installs ``discord.py`` on demand * plus ``allowed_users_env``, ``allow_all_env``, ``cron_deliver_env_var``, ``max_message_length``, ``emoji``, ``required_env``, ``install_hint`` * ``gateway/platforms/discord.py`` (5,101 LOC) → ``plugins/platforms/discord/adapter.py`` (git rename, R090). * New ``plugins/platforms/discord/{__init__.py, plugin.yaml}`` with ``requires_env`` / ``optional_env`` declarations. * Append ``register(ctx)`` block + new hook implementations (``_standalone_send``, ``interactive_setup``, ``_apply_yaml_config``, ``_clean_discord_user_ids``, ``_is_connected``, ``_build_adapter``, plus helpers ``_DISCORD_CHANNEL_TYPE_PROBE_CACHE`` etc.) to the adapter. * Replace the ``Platform.DISCORD elif`` branch in ``GatewayRunner._create_adapter()`` (−9 LOC) with a generic post-creation hook (+6 LOC) in the registry path: any plugin adapter that declares a ``gateway_runner`` attribute now gets it auto-injected. Webhook's built-in branch is unchanged (it doesn't go through the registry path). * Move ``_send_discord`` (190 LOC) and helpers (``_DISCORD_CHANNEL_TYPE_PROBE_CACHE``, ``_remember_channel_is_forum``, ``_probe_is_forum_cached``, ``_derive_forum_thread_name``) from ``tools/send_message_tool.py`` into the plugin as ``_standalone_send``. * Wire via ``standalone_sender_fn=_standalone_send`` (Teams pattern; same gap fixed in #21804 for other plugin platforms). * Replace the Discord ``elif`` in ``tools/send_message_tool.py`` ``_send_to_platform`` with a 10-line registry-hook dispatch. * Drop the ``DiscordAdapter`` import and the ``Platform.DISCORD: DiscordAdapter.MAX_MESSAGE_LENGTH`` ``_MAX_LENGTHS`` entry — the registry's ``max_message_length=2000`` covers it. * Move ``_setup_discord`` and ``_clean_discord_user_ids`` (68 LOC) from ``hermes_cli/setup.py`` into the plugin as ``interactive_setup``. * Wire via ``setup_fn=interactive_setup``. CLI helpers (``prompt``, ``print_info``, etc.) are lazy-imported so the plugin's module-load surface stays minimal. * Remove ``"discord": _s._setup_discord`` from ``hermes_cli/gateway.py::_builtin_setup_fn``. * Remove the entire 32-line ``_PLATFORMS["discord"]`` static dict entry — Discord's setup metadata is now discovered dynamically via ``_all_platforms()`` from the registry entry. * Move the 59-line ``discord_cfg`` YAML→env bridge from ``gateway/config.py::load_gateway_config()`` into the plugin as ``_apply_yaml_config``. Covers ``require_mention``, ``thread_require_mention``, ``free_response_channels``, ``auto_thread``, ``reactions``, ``ignored_channels``, ``allowed_channels``, ``no_thread_channels``, ``allow_mentions.{everyone,roles,users, replied_user}``, and ``reply_to_mode`` (including the YAML 1.1 ``off``-as-False coercion and the ``extra.reply_to_mode`` fallback). * Wire via ``apply_yaml_config_fn=_apply_yaml_config``. * The hook runs BEFORE ``_apply_env_overrides`` and after the generic shared-key loop, exactly as documented in ``website/docs/developer-guide/adding-platform-adapters.md``. * Behavior is preserved exactly — every assignment still uses ``not os.getenv(...)`` guards so env vars take precedence over YAML. All 78 references to the old import path are rewritten — no back-compat shim: * 51 ``from gateway.platforms.discord import X`` → ``from plugins.platforms.discord.adapter import X`` * 5 ``import gateway.platforms.discord as discord_platform`` → ``import plugins.platforms.discord.adapter as discord_platform`` * 1 ``from gateway.platforms import discord as discord_mod`` → ``from plugins.platforms.discord import adapter as discord_mod`` * 21 ``mock.patch("gateway.platforms.discord.X")`` strings → ``mock.patch("plugins.platforms.discord.adapter.X")`` * 1 docstring reference in ``hermes_cli/commands.py`` * 1 import in ``tools/send_message_tool.py`` (now removed entirely) The import-safety test in ``tests/gateway/test_discord_imports.py`` is updated to purge the new canonical module name from ``sys.modules``. 38 files changed, +621 / −473 — net positive due to the YAML hook implementation (89 new LOC in the plugin trading for 59 deleted in core), but every line moved has a clear plugin home now. The git rename is detected at R090 because the adapter gained ~340 LOC of moved-in hook implementations (``_standalone_send`` + ``interactive_setup`` + ``_apply_yaml_config`` + helpers). * All 568 Discord-specific tests pass across 25 ``test_discord_.py`` files plus voice/send/text-batching/reload-skills/stream-consumer/ integration tests. All 147 tests in the YAML-touching subset (``test_discord_reply_mode``, ``test_discord_free_response``, ``test_discord_allowed_channels``, ``test_discord_allowed_mentions``, ``test_discord_channel_controls``, ``test_discord_reactions``, ``test_discord_thread_persistence``, ``test_runtime_footer``) pass — this is the strongest signal that the YAML→env hook behaves identically to the legacy block. * Broader gateway/cron/integration sweep (1297 tests) introduces zero new failures vs ``main``. Pre-existing failures in ``tests/gateway/test_tts_media_routing.py`` and ``tests/e2e/test_platform_commands.py`` reproduce identically on the unchanged ``main`` revision. * Plugin discovery sanity check confirms Discord registers alongside the other four platform plugins: Registered platforms: ['discord', 'google_chat', 'irc', 'line', 'teams'] These Discord-shaped tendrils in core were deliberately not moved — they are generic platform-registry concerns affecting every platform, not Discord-specific: * ``gateway/config.py:1205`` ``DISCORD_BOT_TOKEN → config.token`` env enablement — same shape Telegram has. The existing ``env_enablement_fn`` registry hook only seeds ``extra``, not ``.token``, so it can't replace this without an adapter refactor to read from ``extra["bot_token"]``. * ``gateway/run.py`` voice-mode hooks (``self.adapters.get(Platform.DISCORD)`` for ``start_voice_mode``/``stop_voice_mode``), role-based auth, ``DISCORD_ALLOW_BOTS`` branch in ``_is_user_authorized``, ``_UPDATE_ALLOWED_PLATFORMS`` frozenset, and the per-platform allowlist maps — generic platform-registry concerns. * ``Platform.DISCORD`` enum literal — stable identifier used as dict keys throughout the codebase; removing it is a separate refactor with no real benefit. * ``tools/discord_tool.py`` and ``tools/environments/local.py`` — first-class agent tools and env-passthrough config, neither is the gateway adapter. Each of these is worth its own scoping issue when the time comes.	2026-05-22 14:21:41 -07:00
Teknium	e32d2ffc1d	fix(security): wire Nous URL allowlist into refresh / mint persistence sites @memosr's PR #27612 put the inference_base_url allowlist check only at the Nous proxy adapter forward boundary. The poisoned URL, however, lands in ``auth.json`` upstream of that — at five refresh / agent-key-mint payload read sites inside ``resolve_nous_runtime_credentials`` and ``_extend_state_from_refresh``. Without gating those sites, a single MITM on a refresh response persists the attacker's URL across restarts, even if the proxy adapter's defense-in-depth check would later catch it on the way out. Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network`` at all five Portal-network reads: - hermes_cli/auth.py L4840 (refresh-only access-token path) - hermes_cli/auth.py L4876 (mint payload path) - hermes_cli/auth.py L5154 (terminal-runtime access-token refresh) - hermes_cli/auth.py L5262 (cross-process serialized refresh) - hermes_cli/auth.py L5317 (terminal-runtime mint payload) The state-read path at L5025 (``state.get("inference_base_url")``) is deliberately NOT gated — pre-existing state in ``auth.json`` is either already validated (it came from one of the five network sites above) or set by a trusted local actor (manual edit, ``_setup_nous_auth`` test fixture, ``hermes login nous`` against a staging endpoint via the documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file / patch tampering with auth.json is independently blocked by PR #14157. Adds tests/hermes_cli/test_nous_inference_url_validation.py covering: - validator https + host + edge-case rules (12 cases) - all 5 network call sites grep contracts (no _optional_base_url regression possible without test failure) - proxy adapter defense-in-depth check still present - env override path NOT gated (documented dev/staging behaviour) 18 new tests, all 119 Nous-auth tests green.	2026-05-22 14:17:40 -07:00
Julien Talbot	09afafb87e	fix(xai): resolve Grok Build context for OAuth	2026-05-22 13:05:36 -07:00
Teknium	42104218e0	fix(file-safety): also write-deny <root>/control-files in profile mode PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME, which is fine in non-profile mode but leaves a gap once a profile is active: HERMES_HOME points at <root>/profiles/<name>, so the global <root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json + <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR #15981 closed via _hermes_root_path(). Apply the same widening pattern here. The control-file/mcp-tokens check now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes when they coincide in non-profile mode). Also tightens the mcp-tokens check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep" so writing the directory entry itself is blocked, not just files inside. Regression tests cover both protections in a real profile-mode layout (<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).	2026-05-22 04:32:14 -07:00
Pratik Rai	1f5219fda5	fix(security): protect Hermes control-plane files from prompt injection Adds active-HERMES_HOME control-plane files to the write deny list: auth.json, config.yaml, webhook_subscriptions.json, and any path under mcp-tokens/. realpath() resolves before comparison so directory-traversal and symlink targets are normalised, preventing trivial deny-list bypass via ../ tricks. Without this, a prompt-injected agent could rewrite Hermes' own auth state or routing config via write_file / patch — without triggering the terminal dangerous-command approval — and persist attacker-controlled behaviour across sessions. Fixes #14072	2026-05-22 04:32:14 -07:00
Teknium	82c2035823	fix(pairing): handle legacy plaintext pending entries during upgrade When an existing install upgrades to the hashed-pending schema, its on-disk pending.json still has the old {code: entry} format with no hash/salt fields. The original PR #8056 assumed every entry had both fields and would have KeyErrored in approve_code, list_pending, and _cleanup_expired. Guard each consumer: - approve_code: skip entries that are not a dict, lack salt/hash, or have a non-hex salt. Legacy entries simply fail to match. - list_pending: tolerate missing 'hash' (show "legacy" placeholder) and non-numeric created_at (skip the row). - _cleanup_expired: treat malformed/legacy entries as expired so they get pruned on the next call rather than wedging the file. Regression tests cover all three consumers plus a mixed-malformed case.	2026-05-22 04:11:49 -07:00
Tom Qiao	2e509422ef	fix(security): hash gateway pairing codes instead of storing plaintext Pairing codes were stored as plaintext keys in JSON files. Now uses sha256 + random salt hashing with constant-time comparison. Fixes #8036 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-22 04:11:49 -07:00
0xDevNinja	3ac2125140	refactor(image_gen): port FAL backend to plugins/image_gen/fal Mirrors the architecture established by the web (#25182), browser (#25214), and video_gen (#25126) plugin migrations: * `tools/fal_common.py` — stateless atoms shared by both FAL-backed plugins (image_gen + video_gen). Holds the lazy `fal_client` import helper, `_ManagedFalSyncClient`, `_normalize_fal_queue_url_format`, `_extract_http_status`. Stateful pieces (`fal_client` module global, `_managed_fal_client` cache, `_submit_fal_request`, `_resolve_managed_fal_gateway`, `_get_managed_fal_client`) intentionally stay on `tools.image_generation_tool` so the existing `monkeypatch.setattr(image_tool, ...)` patch sites keep working unchanged. `plugins/video_gen/fal/__init__.py` — drops its inline `_load_fal_client` duplicate; consumes `tools.fal_common.import_fal_client`. * `plugins/image_gen/fal/{plugin.yaml,__init__.py}` — new plugin. `FalImageGenProvider` is a thin registration adapter that resolves the legacy module via `import tools.image_generation_tool as _it` and calls `_it.image_generate_tool` + `_it._resolve_fal_model` at call time. The 18-model catalog, `_build_fal_payload`, managed- gateway selection, and Clarity Upscaler chaining all remain in `tools.image_generation_tool` as the single source of truth — the plugin is a registration adapter, not a parallel implementation. * `tools/image_generation_tool.py::_dispatch_to_plugin_provider` — drops the `configured == "fal"` skip. Setting `image_gen.provider: fal` now routes through the registry like any other provider; the plugin re-enters this module's pipeline so behavior is identical. Unset `image_gen.provider` still falls through to the in-tree pipeline (preserves no-config-with-FAL_KEY UX from #15696). * `hermes_cli/tools_config.py` — drops the hardcoded "FAL.ai" row from `TOOL_CATEGORIES["image_gen"]["providers"]` (now injected by `_plugin_image_gen_providers` like every other backend) and the `getattr(provider, "name") == "fal"` skip that protected against duplication with the hardcoded row. The "Nous Subscription" row stays as a setup-flow entry — same shape browser kept "Nous Subscription (Browser Use cloud)" after #25214. * `tests/plugins/image_gen/test_fal_provider.py` — 14 cases covering the ABC surface, call-time indirection (verifying `monkeypatch.setattr(image_tool, "image_generate_tool", ...)` takes effect through the plugin), response-shape stamping, exception handling, and registry wiring. * `tests/plugins/image_gen/check_parity_vs_main.py` — subprocess harness mirroring `tests/plugins/browser/check_parity_vs_main.py`. Pins one path to origin/main, one to the worktree; runs six scenarios (unset, explicit-fal-no-creds, explicit-fal-with-creds, explicit-fal-with-model, typo provider, managed-gateway-only) and diffs the reduced shape `{dispatch_kind, provider_name, model}` per scenario. The only acceptable diff is "legacy_fal → plugin (fal)" for explicit-FAL paths — every other delta is flagged as a regression. * `tests/hermes_cli/test_image_gen_picker.py::test_fal_surfaced_alongside_other_plugins` — flips the previous `test_fal_skipped_to_avoid_duplicate` to match the new shape (FAL is a plugin now, no dedup needed). Verified: 195/195 tests across `tests/{tools/test_image_generation*,tools/test_managed_media_gateways,plugins/image_gen,plugins/video_gen,hermes_cli/test_image_gen_picker}.py` pass on this branch with no test patches modified outside the picker test that asserted the old skip behaviour. Fixes #26241	2026-05-22 04:10:45 -07:00
Teknium	3fc715ddf5	test(webhook): regression cases for empty-secret HMAC bypass Covers _reload_dynamic_routes() rejecting empty or missing per-route secrets when no global fallback exists, preserving the INSECURE_NO_AUTH opt-in, inheriting a global secret when only the per-route value is missing, and partial-skip when only one of multiple routes is bad.	2026-05-22 03:45:21 -07:00
briandevans	22b0d6dc1a	test(tools): centralize disable_lazy_stt_install fixture in conftest Move the autouse `_disable_lazy_stt_install` fixture out of the three transcription test files and into `tests/tools/conftest.py` as a regular (non-autouse) fixture. Each transcription test module opts in once at the top via `pytestmark = pytest.mark.usefixtures(...)`. Why: addresses three Copilot inline review comments on this PR that flagged the verbatim duplication across files. Centralizing also keeps the patch target in a single place, so a future rename of `_try_lazy_install_stt` only updates one location. Why opt-in (not autouse in conftest): other `tests/tools/` files do not patch `_HAS_FASTER_WHISPER` and have no reason to bypass the runtime lazy-install probe; making the fixture autouse globally would silently mask any future test that wants to exercise the real lazy-install path.	2026-05-22 03:33:01 -07:00
briandevans	5dc232a6e2	test(tools): disarm lazy-install probe so _HAS_FASTER_WHISPER patches work ``b5c6d9ac0`` ("fix: wire STT lazy-install into transcription_tools.py") added `_try_lazy_install_stt()`, which calls `importlib.util.find_spec("faster_whisper")` after `ensure()` runs. In the dev / CI environment `faster_whisper` is already installed, so the probe returns truthy and `_get_provider()` returns "local" even when the test has patched `_HAS_FASTER_WHISPER=False` to simulate "not installed". Add a per-file autouse fixture that patches `_try_lazy_install_stt` to return False so the simulation stays accurate. The 16 baseline failures across `test_transcription_tools.py`, `test_transcription.py`, and `test_transcription_dotenv_fallback.py` disappear; the production lazy-install path is unaffected at runtime.	2026-05-22 03:33:01 -07:00
Teknium	c25f9d1d36	feat(secrets): label detected credentials with their source (Bitwarden) (#30364 ) When Bitwarden Secrets Manager supplies a provider key, 'hermes model' and the setup wizard show 'credentials ✓' with no hint of where the key came from — identical to the .env case. Users assume the integration isn't wired up and re-enter the key (or hit Enter and cancel). env_loader now tracks which env vars were injected by an external secret source and exposes get_secret_source() / format_secret_source_suffix() so the provider flows can render 'Anthropic credentials: sk-ant-... ✓ (from Bitwarden)' instead of an unlabeled checkmark. Wired into _prompt_api_key (kimi, z.ai, minimax, opencode, ...), the Anthropic provider flow, the Bedrock flow, and the GitHub Copilot token display. Future secret sources (Vault, 1Password, etc.) drop in by setting their own label in _SECRET_SOURCES; format_secret_source_suffix() has a generic fallback so no call sites need updating.	2026-05-22 03:32:58 -07:00
sgtworkman	70d53d8b75	fix: run computer use post-setup when enabling tool	2026-05-22 01:24:11 -07:00
Rodrigo	fbdca64f73	fix(computer-use): skip capture_after when action failed (ok=False) _maybe_follow_capture() issued a follow-up screenshot unconditionally when capture_after=True, even when res.ok=False. The model then received a normal-looking screenshot alongside an error message, and in practice it often ignored ok=False and proceeded as if the action had succeeded. Fix: return _text_response(res) early when res.ok is False so the model receives only the error and can decide how to recover. Tests added: - test_capture_after_skipped_when_action_failed: patches click to return ok=False and asserts no capture call is issued. - test_capture_after_fires_when_action_succeeds: ensures the happy path still triggers the follow-up capture.	2026-05-22 01:19:01 -07:00
Rodrigo	c52cd48e25	fix(computer-use): add set_value to ComputerUseBackend ABC and _NoopBackend stub _dispatch() routes action="set_value" to backend.set_value(), but: - ComputerUseBackend did not declare set_value as @abstractmethod, so subclasses could silently omit it without a TypeError at class load time. - _NoopBackend (the test/CI stub) had no set_value method at all, causing AttributeError in any test that exercises the set_value action path. Fix: - Add set_value as @abstractmethod to ComputerUseBackend in backend.py. - Add a recording stub in _NoopBackend in tool.py. - Add two TestDispatch cases: one verifying the call reaches the backend, one verifying the missing-value guard returns a clean error.	2026-05-22 01:14:15 -07:00

1 2 3 4 5 ...

4154 commits