hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-10 08:32:09 +00:00

Author	SHA1	Message	Date
Bartok9	08c0b22417	fix(gateway): scope tool-result MEDIA scan to current turn The post-run scan that appends tool-emitted MEDIA: tags to the final response iterated every tool/function message in the full conversation and relied solely on path-based dedup against paths reconstructed from the replayable transcript. When that reconstruction does not byte-match the in-memory tool content (timestamp stripping, observed-context withholding, compression rewrites), a stale path emitted several turns earlier is absent from the dedup set and leaks onto a later text-only reply (Telegram 'Sending media group of 1 photo(s)' with no MEDIA directive present). Scope the scan to this turn's new messages by slicing result['messages'] at len(agent_history) (agent_history is passed as conversation_history into run_conversation, so the returned list is history + this turn). Retain path-based dedup as a secondary guard and as the sole guard on the compression-shrink fallback, preserving the #160 behaviour. Closes #34608	2026-05-29 13:13:34 -07:00
teknium1	38c4f8c371	test(gateway): update system-unit cwd assertion to HERMES_HOME anchor test_system_unit_has_no_root_paths asserted the system unit's WorkingDirectory was the remapped checkout path (/home/alice/.hermes/hermes-agent). That is the brittle pin this PR fixes — the system unit now anchors cwd at the target user's HERMES_HOME (/home/alice/.hermes). The test's intent (no root-home leak, target-user paths present) is unchanged and still holds.	2026-05-29 12:36:59 -07:00
teknium1	a1cb5fa2c7	fix(gateway): anchor service WorkingDirectory at HERMES_HOME, not the source checkout The systemd unit (and launchd plist) pinned WorkingDirectory to PROJECT_ROOT (the checkout the unit was generated from). When that checkout is transient — a git worktree, or a clone hermes update later relocates/removes — the path rots. systemd then fails the start at the CHDIR step (status=200/CHDIR) BEFORE Python loads, so the on-boot refresh_systemd_unit_if_needed() self-heal never runs and Restart=always crash-loops forever on a dead directory. Observed in the wild: a gateway that crash-looped 153 times overnight, bot offline until a manual 'hermes gateway restart' regenerated the unit. Anchor cwd at HERMES_HOME instead — it never moves, always exists, and the gateway never needed cwd to be the checkout (ExecStart uses an absolute python + -m hermes_cli.main). Existing broken units now differ from the generated unit and self-heal on the next start/restart/update.	2026-05-29 12:36:59 -07:00
Teknium	45b00bb49a	fix(packaging): ship hermes_cli subpackages in wheel (#34811) [tool.setuptools.packages.find] listed 'hermes_cli' without the 'hermes_cli.*' wildcard, so the wheel shipped hermes_cli/*.py but dropped the dashboard_auth and proxy subpackages. The dashboard died on every install with ModuleNotFoundError: No module named 'hermes_cli.dashboard_auth' (#34701); 'hermes proxy' was equally broken. Add the wildcard, and add a regression test that drives setuptools' own find_packages against the live tree so any future subpackage dropped from the include list fails CI instead of a user's container.	2026-05-29 12:36:09 -07:00
teknium1	8836b3a113	fix(cli): widen Windows .bat wrapper fix to custom-name alias path The profile alias --name path in main.py rewrote the wrapper with a hardcoded #!/bin/sh script right after create_wrapper_script(), clobbering the .bat on Windows and reintroducing the exact bug for custom aliases. create_wrapper_script() now takes an optional target so the alias file is named after the alias while the -p content references the profile — one platform-aware code path, no post-hoc rewrite.	2026-05-29 12:32:47 -07:00
liuhao1024	6312dd8c3a	fix(cli): create .bat wrapper on Windows instead of POSIX shell script On Windows, hermes profile create produced a #!/bin/sh script that the shell cannot execute. Now creates a .bat file with @echo off + %* on Windows, and keeps the POSIX shell script on macOS/Linux. Also fixes check_alias_collision to use 'where' instead of 'which' on Windows, and remove_wrapper_script to find .bat files. Fixes #34708	2026-05-29 12:32:47 -07:00
zapabob	aa283d1e4f	fix(model): isolate custom provider picker credentials	2026-05-29 12:32:35 -07:00
Teknium	27a2c4f36f	fix(mcp): stop reporting false OAuth success when no token was obtained (#34807) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(mcp): stop reporting false OAuth success when no token was obtained `hermes mcp login` reported "Authenticated — N tool(s) available" for servers that serve tools/list without auth (e.g. Google's official Drive MCP server) even when the OAuth flow never completed — dynamic client registration 400'd because the provider doesn't support RFC 7591, so no token was ever acquired. Every real tool call then hung until timeout with no indication of why. Login now verifies a token actually landed on disk after the probe. When it didn't, it warns that authentication didn't complete and shows the config needed to supply a pre-registered client_id/client_secret (the existing, already-supported workaround for DCR-less providers). Adds a docs pitfall for Google Drive / Atlassian-style providers. Fixes #34775	2026-05-29 12:32:19 -07:00
Teknium	1cb850b674	fix(api_server): emit per-turn transcript on run.completed (#34703) (#34804) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(api_server): emit per-turn transcript on run.completed (#34703) WebUI clients lost intermediate (pre-tool-call) assistant text after switching session pages mid-stream. The session-chat SSE stream delivers all assistant text as assistant.delta events under one message_id interleaved with tool.* events, then a single assistant.completed carrying only the final reply — so a client accumulating deltas into one buffer cannot reconstruct intermediate text segments that preceded tool calls, and they vanish from the live view (state.db persists them correctly). run.completed now carries the authoritative per-turn transcript (assistant + tool messages for this turn, in client-safe shape) so any SSE consumer can reconcile its live view against ground truth without a separate GET /messages round-trip. Purely additive — clients that ignore the field are unaffected.	2026-05-29 12:27:49 -07:00
Teknium	b6ed3913d2	feat(skills): categorize tap skills from skills.sh.json grouping sidecar A GitHub tap can ship a repo-root skills.sh.json (the published skills.sh schema) declaring category groupings. The Skills Hub now reads it at index time and uses each grouping title as the skill's category label, instead of the tag-derived guess. Generic: any tap that ships the file gets real categorization — NVIDIA's groupings (Inference AI, Decision Optimization, GPU Development, etc.) flow through automatically. - GitHubSource: _get_skillsh_groupings() fetches+caches the sidecar per repo; _parse_skillsh_groupings() flattens it to {skill_name: title}; _list_skills_in_repo() stamps meta.extra['category']; _meta_to_dict now serializes extra so the category survives the index cache round-trip. - extract-skills.py: prefers extra['category'] over the tag heuristic and exempts sidecar categories from the small-category to Other collapse. - Docs + 12 tests.	2026-05-29 12:24:39 -07:00
Teknium	4de8009ce4	feat(skills): integrate NVIDIA/skills as a trusted skills hub tap NVIDIA/skills is now a default trusted tap in the Hermes Skills Hub — discoverable, browsable, searchable, and auto-updating through the same pipeline that already serves OpenAI, Anthropic, and HuggingFace skills. Rebased onto current main.	2026-05-29 12:24:39 -07:00
kshitij	7379f17556	fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749 ) * fix(gateway): only fire planned-stop watcher for markers targeting self Salvaged from #34599 — rebased onto current main. The planned-stop watcher now only fires shutdown for a marker that targets the current process, instead of any marker that exists on disk. Fixes the Windows crash loop (#34597) where a stale marker from a previous Gateway instance kills a freshly booted Gateway ~400ms after start with a false "Received UNKNOWN — initiating shutdown". Co-authored-by: Bartok9 <danielrpike9@gmail.com> * fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable Follow-up to the #34599 salvage. The watcher's non-destructive probe (planned_stop_marker_targets_self) already falls back to PID equality when a process start_time is unavailable, but the authoritative consume it gates (_consume_pid_marker_for_self) still required a non-None start_time match. _get_process_start_time reads /proc/<pid>/stat and returns None on macOS and native Windows — the only platform the planned-stop watcher exists for. So on Windows the probe would fire the shutdown handler (PID matches) but the handler's consume_planned_stop_marker_for_self() would return False, and a legitimate 'hermes gateway stop' was still misclassified as an unexpected UNKNOWN exit (exit 1) and revived by the service manager — a residual half of the #34597 crash loop on the legitimate-stop path. Align the consume with the probe: when both start_times are known they must match (PID-reuse guard preserved on Linux); when either is unavailable, fall back to PID equality alone, bounded by the existing short marker TTL. This also fixes the parallel --replace takeover consume on Windows, which shares the same helper. 
Adds regression tests for the Windows (None start_time) path, the foreign-PID rejection under that fallback, and confirmation the start_time-mismatch guard still rejects when both are known. --------- Co-authored-by: Bartok9 <danielrpike9@gmail.com>	2026-05-29 17:36:58 +00:00
alt-glitch	0563ab0652	fix(test): add fal_client.submit stub to surface matrix test The plugin switched from fal_client.subscribe() to submit()+handle.get(). The test mock only had subscribe, causing CI failures.	2026-05-29 22:26:24 +05:30
alt-glitch	3183b2e28c	fix(video_gen): veo3.1 duration format and 4k resolution FAL veo3.1 API expects duration as "4s"/"6s"/"8s" (with unit suffix), not bare "4"/"6"/"8" like other families. Add per-family duration_suffix field and apply it in _build_payload. Also add "4k" to veo3.1 resolutions per FAL API docs. Note: the managed gateway currently rejects the "4s" format (expects integer duration). Gateway-side fix needed for veo3.1 to work through the Nous subscription path.	2026-05-29 22:26:24 +05:30
alt-glitch	a4c18f65d4	feat(video_gen): wire Nous subscription override into hermes tools UX Add the same managed-gateway UX that image_gen already has: - TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row with managed_nous_feature='video_gen' + video_gen_plugin_name='fal' - NousSubscriptionFeatures gains a video_gen property + feature state computation (managed/active/available using the fal-queue gateway) - _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS, _get_gateway_direct_credentials, opted_in all include video_gen - apply_nous_managed_defaults and apply_gateway_defaults handle video_gen - _is_toolset_satisfied checks Nous features for video_gen - _is_provider_active detects managed video_gen (use_gateway + fal provider) - _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated from all 4 call sites in _configure_provider when managed_feature is set - hermes setup status shows 'Video Generation (FAL via Nous subscription)' Users on a Nous subscription can now pick 'Nous Subscription' under hermes tools → Video Generation, which sets video_gen.provider=fal + video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway then routes through the managed queue gateway — no FAL_KEY needed.	2026-05-29 22:26:24 +05:30
alt-glitch	b6294ea9f1	test(video_gen): cover gateway decision matrix gaps and 4xx error path - Add test for 4xx ValueError with actionable remediation message - Add test for is_available() returning True via managed gateway - Add test for prefers_gateway overriding direct FAL_KEY - Add test for is_available() via gateway in plugin test file	2026-05-29 22:26:24 +05:30
alt-glitch	d04b3c193e	feat(video_gen): route FAL video gen through managed Nous gateway Wire plugins/video_gen/fal/__init__.py to use the same _ManagedFalSyncClient pattern that image gen already uses. Changes: - Add managed gateway resolution, client caching, and _submit_fal_video_request() that routes between direct FAL_KEY and Nous gateway modes - Update is_available() to return True when either FAL_KEY or the managed gateway is reachable - Update generate() to use submit+get handle pattern instead of fal_client.subscribe() directly - Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches the tool-gateway allowlist from fal-video-gen branch) - Surface actionable error on 4xx gateway rejections Tests: - 4 new tests in test_managed_media_gateways.py (gateway routing, client reuse, direct mode fallback, alibaba namespace) - Updated existing test_fal_plugin.py fixture to use submit/handle pattern and patch _resolve_managed_fal_video_gateway for isolation	2026-05-29 22:26:24 +05:30
kshitijk4poor	38695254f8	perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize' The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of incremental b-tree segments — one per trigger-driven insert batch. SQLite's automerge caps at ~16 segments, so a long-lived store keeps scanning many segments per MATCH and never collapses them unless the special 'optimize' command runs. Nothing in the codebase ever ran it: vacuum() only fired after a prune that deleted rows, and even then never merged FTS segments. Changes: - SessionDB.optimize_fts(): merges each FTS5 index to a single segment, probing for the (optional/lazy) trigram table first so it is safe to call unconditionally. Layout-only — search results and snippet() are unchanged. - vacuum() now calls optimize_fts() before VACUUM so freed index pages are returned to the OS in the same pass. - 'hermes sessions optimize' CLI subcommand for on-demand reclamation + segment compaction (previously there was no way to compact the store without a prune deleting rows), with before/after size reporting. Benchmark (8000 msgs, fragmented to 8 segments/index): - segments 8 -> 1 on both indexes - porter MATCH 5.5x faster (0.449 -> 0.081 ms/q) - trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q) - 8000 matches before == 8000 after, identical row ids (no functional change) Orthogonal to the structural FTS-size PRs (#20239 external-content, #27770 optional trigram) — segment merge helps regardless of those. Tests: TestOptimizeFts covers index count, search+snippet preservation, missing-trigram path, and idempotency. Full test_hermes_state.py green (227).	2026-05-29 05:09:56 -07:00
teknium1	1c53d39eaa	test: deflake process-registry kill + PTY resize tests Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).	2026-05-29 04:22:41 -07:00
teknium1	fd09b2c55e	fix(gateway): trust adapter-owned access policy over env default-deny (#34515) Config-driven platform policies (dm_policy / group_policy / allow_from / group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without also setting a PLATFORM_ALLOWED_USERS env var. These adapters enforce their access policy at intake — a message is dropped inside the adapter and never dispatched unless it already passed the policy. The gateway's env-based check (_is_user_authorized) ran afterward and, with no env allowlist set, fell through to an env-only default-deny — silently rejecting `dm_policy: open` and config-only allowlists the adapter had already authorized. Rather than re-implement each adapter's policy a second time in run.py (which would drift), adapters that own their gate now declare it via a new BasePlatformAdapter.enforces_own_access_policy property (default False). The gateway trusts that flag and skips the env-only default-deny for those platforms. Env allowlists still take precedence when set. Also resolves unauthorized DM behavior from config dm_policy so allowlist / disabled policies drop unauthorized DMs silently instead of leaking pairing codes, while an explicit pairing policy opts back in. Co-authored-by: Frowtek <frowte3k@gmail.com>	2026-05-29 04:22:41 -07:00
Teknium	e4b9532c18	feat: embedder environment-hint hook for the system prompt (#34574 ) * fix(security): block AWS SDK creds from subprocess env * fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp> * feat: embedder environment-hint hook for the system prompt Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint) so a host wrapping Hermes (sandbox runner, managed platform) can describe the runtime environment — proxy, credential handling, mount layout — in the system prompt's environment-hints block, without editing the identity slot (SOUL.md). Read once at prompt-build time, so it lands in the stable, cache-safe portion of the system prompt. 
Env var overrides the config key (build-time/container mechanism). Empty by default — no behavior change for existing installs. --------- Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 04:10:05 -07:00
briandevans	6e179c44b1	fix(web): ensure plugin discovery before web_*_tool registry lookups Web search/extract dispatch read agent.web_search_registry before plugin discovery had run, so in any process that hadn't imported model_tools.py (subprocess agent runs, delegate children, standalone scripts) the registry was empty: get_provider('firecrawl') returned None and the dispatcher emitted the misleading 'No web extract provider configured' error even with web.extract_backend set and FIRECRAWL_API_KEY exported. Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of both the web_search_tool and web_extract_tool dispatch sites before the registry lookup. Fixes #27580. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>	2026-05-29 04:00:00 -07:00
teknium1	c77a697fa4	refactor(vision): consolidate native fast-path gate into one shared helper The fast-path decision (native routing + provider allowlist OR supports_vision override) lived inline in vision_analyze and was copied into browser_vision. Extract it to _should_use_native_vision_fast_path() so both tools share one source of truth. - vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines - browser_tool: thin envelope decoration over the shared helper, not a copy - browser_vision typed Union[str, Dict] to match its real return shape - tests slimmed to target the override path + text-mode-wins invariant	2026-05-29 03:58:56 -07:00
tillfalko	2402ec5e7b	test: extend test coverage to native image routing	2026-05-29 03:58:56 -07:00
EloquentBrush0x	784d8dd2c2	fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty The _on_reaction approval handler used: if self._allowed_user_ids and sender not in self._allowed_user_ids: When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an empty set. The short-circuit on the empty set caused the deny block to never execute, allowing any Matrix room member to approve or deny tool calls via ✅/❎ reactions — even users that run.py's _is_user_authorized would reject for regular messages. Fix mirrors the Telegram _is_callback_user_authorized fix (commit `89d32052e`, PR #28494): deny by default when no allowlist is configured, unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.	2026-05-29 03:58:45 -07:00
teknium1	3171845479	fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub Follow-up mitigation for the #27303 env-scrub tightening. Dropping the broad HERMES_ prefix in favor of a 4-var operational allowlist is correct hardening, but a sandbox script that imports a repo module reading a non-allowlisted HERMES_* var at import time would otherwise see it silently unset. _scrub_child_env now emits a one-shot debug log naming the dropped non-secret HERMES_* vars and pointing at the env_passthrough opt-in escape hatch. Secret-shaped vars are never named in the log. Tests: dropped vars are logged + env_passthrough named; no log when nothing is dropped; secret vars excluded from the diagnostic.	2026-05-29 03:44:49 -07:00
firefly	4bdae34771	test(code-exec): regression suite for the approval-bypass cluster Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
firefly	1083977261	fix(code-exec): restore approval context in execute_code RPC threads + guard entry Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle. Refs #4146, #27303, #30882, #33057	2026-05-29 03:44:49 -07:00
firefly	21aeefe5fd	fix(code-exec): propagate agent-turn context into tool worker threads Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape. Refs #33057	2026-05-29 03:44:49 -07:00
kshitijk4poor	a22c250001	refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params After the legacy session-key path was removed, two parameters became dead surface on the Nous runtime-resolution chain: - min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through / telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state, _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the now-deleted agent-key mint TTL and drives no behavior. - inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally identical; the value only fed _normalize_nous_inference_auth_mode validation and oauth trace output, never a branch. Removing inference_auth_mode orphaned its whole supporting cluster (NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES, _normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here. Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter, runtime_provider, web_server, main, auth_commands, setup) and pruned the matching test kwargs. Deleted two tests that exercised the removed surface (test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode). No behavior change: net -134 LOC of dead code.	2026-05-29 02:24:48 -07:00
Robin Fernandes	4e4984a11a	test(auth): update nous jwt-only expectations	2026-05-29 02:24:48 -07:00
Robin Fernandes	7e958dafc2	fix(auth): address Nous JWT fallback review	2026-05-29 02:24:48 -07:00
Robin Fernandes	41ff6e5937	refactor(auth): Disable Nous legacy session key fallback	2026-05-29 02:24:48 -07:00
teknium1	18c9e89106	test: update _invoke_tool dispatch assertion for new toolset-scope kwargs The scoping fix added enabled_toolsets/disabled_toolsets to the agent_runtime_helpers sequential dispatch into handle_function_call, so test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with (exact match) needs the two new kwargs. Both are None for the default agent fixture.	2026-05-29 02:04:12 -07:00
teknium1	7427b9d581	fix(tool-search): scope bridge catalog + dispatch to the session's toolsets Tool Search read its catalog from the global registry (get_tool_definitions with no toolset scope = 'start with everything'), so a restricted-toolset session — subagent, kanban worker, curated gateway session — could: 1. tool_search the entire process registry, not just its granted tools, and 2. tool_call any registered plugin/MCP tool it was never given, because registry.dispatch() has no enabled_tools gate for non-execute_code tools. A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26 and successfully invoked an out-of-scope plugin tool via tool_call. Fix: - handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them (also stops polluting the process-global _last_resolved_tool_names with out-of-scope tools, which leaked into execute_code's sandbox-tool fallback). - A defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog. - tool_executor's unwrap (both concurrent + sequential paths) enforces the same scope before dispatch, since it unwraps tool_call -> underlying name and bypasses the bridge branch. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope. - New scoped_deferrable_names() helper in tool_search.py shared by both sites. Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped catalog, out-of-scope tool_call rejection, no global pollution, helper).	2026-05-29 02:04:12 -07:00
teknium1	369075dc95	feat(tools): progressive tool disclosure for MCP and plugin tools Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred. Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable. Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report: - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool. - Catalog is stateless across turns — rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry. - tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823). - The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too. - Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly). - Tools mode only — no JS-sandbox code-mode. Keeps the surface small. - Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision. Files: - tools/tool_search.py — new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper). 
- tests/tools/test_tool_search.py — 35 tests including the OpenClaw #84141 regression guard. - model_tools.py — wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools. - agent/tool_executor.py — unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name. - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block. - website/docs/user-guide/features/tool-search.md — user docs. Validation: - 35/35 new tests pass. - Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green. - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees. - Reserved-name recursion guard verified live. - Core-tool refusal via tool_call verified live.	2026-05-29 02:04:12 -07:00
teknium1	73d73f1f0d	fix(codex): relax no-byte TTFB watchdog default from 12s to 120s The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in backend admission / prompt prefill before emitting its first SSE event. The 12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as 'Codex stream produced no bytes within 12s' through all retries (Discord reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was ~50x more aggressive than the transport layer would have tolerated. Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default to 120s so it no longer clamps the new default back to 20s. Disabling stays available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable, _STRICT override, and post-first-event idle watchdog are unchanged. Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>	2026-05-29 02:02:25 -07:00
teknium1	6bebab4761	fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 01:48:08 -07:00
zapabob	95b5b72404	fix(security): block AWS SDK creds from subprocess env	2026-05-29 01:48:08 -07:00
Teknium	db2ce9e7d2	fix(compression): fail open when lock subsystem is missing (version skew) (#34475) A process running mismatched module versions (conversation_compression.py re-imported with the post-#34351 lock code while a long-lived hermes_state.SessionDB stays bound to the pre-#34351 class in memory) has the try_acquire_compression_lock call site but not the method. The AttributeError it raises is NOT a sqlite3.Error, so the method's own fail-open guard never runs; the exception escapes to the outer agent loop, which prints the error and retries. Compression never succeeds, the token count never drops, and the loop re-triggers compaction forever (the 'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock' spin a user hit after an update). Wrap the lock acquire so any unexpected exception fails OPEN: skip locking and proceed with compression. Skipping the lock risks a rare concurrent-compression session fork; an infinite no-progress loop that never compresses at all is strictly worse. The remediation hint in the log points at the real fix (restart / hermes update to resync the stale module). Also guards get_compression_lock_holder against the same skew. Adds a regression test simulating the version skew (real SessionDB wrapped so only the lock methods raise AttributeError); it asserts that _compress_context proceeds and rotates instead of raising.	2026-05-29 01:32:32 -07:00
Teknium	e28a668b40	fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard Operators can now see which MEDIA path was dropped and why, generated artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver, and a crafted ~\x00 path no longer aborts the whole attachment batch. - MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos, documents,screenshots} alongside the legacy *_cache dirs (#31733). - filter_media/local_delivery_paths: log the rejected path (was a blind "outside allowed roots") via _log_safe_path, which strips control chars and Unicode line separators so a model-emitted path can't forge a log line. - validate_media_delivery_path + extract_media: guard os.path.expanduser so a ~\x00 path returns None / is skipped instead of raising and dropping every other attachment in the response. Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy, the parts-eliding redactor, and the extension-partition hoist are dropped in favor of logging the path directly. All three findings were verified and reproduced by the contributor. Co-authored-by: wysie <wysie@users.noreply.github.com>	2026-05-29 01:23:35 -07:00
teknium1	2765b02021	fix(packaging): ship bundled plugin.yaml manifests in wheel and sdist The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero plugins and ALL gateway platforms failed with "No adapter available for <platform>" (discord, slack, mattermost, ...). Same gap also dropped the web-search provider manifests (#28149). Declare manifest coverage in both packaging channels: - wheel: [tool.setuptools.package-data] plugins += **/plugin.yaml, **/plugin.yml - sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml (Homebrew and other downstream packagers build from the sdist) Verified by building the wheel before/after: plugin.yaml count went 0 -> 69, discord's manifest now ships. Adds a regression test asserting both channels cover manifests. Fixes #34034 Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com> Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com> Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com> Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>	2026-05-29 01:23:28 -07:00
Teknium	c01a2df0a3	fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479) OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env vars). On a headless/CLI-only Linux box with no GUI browser, none of those trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links) and launched it INSIDE the terminal, hijacking the user's TTY with the xAI 'Account Management' login page instead of letting them copy the URL. Add _can_open_graphical_browser(): returns False when webbrowser would resolve to a known console browser, when $BROWSER names one, when there's no display server on Linux, or when no browser resolves at all. Gate all 5 OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device code, Anthropic, Google) on it in addition to the existing remote check. Headless boxes now print the URL / fall through to manual-paste instead.	2026-05-29 01:23:06 -07:00
wysie	f32b66c758	fix: improve plugins list usability	2026-05-29 00:59:42 -07:00
Blake	26b83a5f5f	fix(cli): ignore terminal focus reports (salvage of #16780) Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab, etc.) can deliver terminal focus reports (CSI I / CSI O) to the running TUI. prompt_toolkit does not map those sequences by default, so its parser falls back to literal key presses (ESC, [, I/O) and inserts `[I` / `[O` into the prompt buffer after the ESC byte is handled. Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at parser level, plus a no-op kb.add(Keys.Ignore) handler so the default self-insert path never inserts focus-report bytes. Salvage notes: original PR put the helper in cli.py. Salvaged into hermes_cli/pt_input_extras.py alongside install_shift_enter_alias / install_ctrl_enter_alias to match the established pattern for ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user registration wins. Closes #16780	2026-05-29 00:31:44 -07:00
moikapy	f6a2ba6261	fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials when an OAuth2 access token has expired or is invalid. The existing _is_auth_error() only checked for 401 status codes, so these tokens were never refreshed and the 403 propagated as a generic permission denied error. Three fixes: 1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as an auth failure, triggering token refresh instead of silent failure. 2. _refresh_provider_credentials: Add xai-oauth branch with pool-level refresh (try_refresh_current with select to ensure current entry) then fallback to singleton resolver with force_refresh=True. 3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth pool for auto-resolved providers, matching existing pattern for openai-codex/openrouter/nous/anthropic. Includes 14 tests covering the new detection logic, host mapping, and graceful fallback behavior. Signed-off-by: moikapy <moikapy@devmoi.com>	2026-05-29 00:28:02 -07:00
teknium1	bc736ff543	test(model-catalog): use exact URL equality in fallback tests CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring checks as py/incomplete-url-substring-sanitization. The rule is about URL allowlist checks in production code, not test routing — there's no security boundary here. Switch to url == self.PRIMARY / self.FALLBACK, which is the same semantic and silences the rule.	2026-05-29 00:25:36 -07:00
teknium1	f2d88c820c	fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for non-browser User-Agents — including urllib (what the CLI uses) and curl. When that happens, get_catalog() falls back to the stale disk cache and new model releases (Opus 4.8, etc.) never reach the /model picker even though they're already in OPENROUTER_MODELS and the live OpenRouter API. Adds a fallback URL chain: when the primary catalog URL fails, walk DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays reachable through Vercel firewall hiccups. Per-provider override URLs keep their direct-fetch semantics (operators configure those specifically, no implicit fallback). Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the OpenRouter + Nous Portal curated picker lists. Native stepfun provider configuration (api.stepfun.ai) is left alone — that depends on what stepfun.ai itself serves, not what OpenRouter routes. Test plan: 5 new TestFallbackChain tests cover primary-success, primary-failure-fallback-success, all-fail, primary==fallback-dedup, and end-to-end get_catalog routing through the new helper. Existing 23 tests in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/ sweep: 5701/5701 pass.	2026-05-29 00:25:36 -07:00
Rohit Sharma	9d4fda9952	feat(kanban): add POST /runs/{run_id}/terminate endpoint Closes the termination-control gap left by PR #28432, which shipped the read-only sibling endpoints (/workers/active, /runs/{run_id}, /runs/{run_id}/inspect) but no way to stop a misbehaving worker from the dashboard without dropping to the CLI. The new endpoint resolves run_id -> task_id and delegates to the existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL escalation, run-outcome bookkeeping, and event-log append all match POST /tasks/{task_id}/reclaim exactly. No new termination semantics introduced. Responses: 200 {ok, run_id, task_id} on success 404 unknown run_id 409 run already ended OR task no longer reclaimable Refs: #23762	2026-05-29 00:21:54 -07:00
teknium1	7d10105918	test(kanban): update iteration-exhaustion tests for #29747 gap 2 The two tests in TestRunConversation now verify the new behavior: - test_kanban_block_called_on_iteration_exhaustion → verifies _record_task_failure(outcome='timed_out') is called instead of kanban_block - test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge is a no-op when HERMES_KANBAN_TASK is unset The function names are kept for diff stability; both assert against _record_task_failure now, which is the correct contract per the gap-2 fix in this PR.	2026-05-29 00:13:29 -07:00
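
The fallback URL chain described in commit f2d88c820c (try the primary catalog URL, then walk DEFAULT_CATALOG_FALLBACK_URLS, skipping a fallback that equals the primary) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: the function name `fetch_with_fallback`, the injected `fetch` callable, the choice of caught exception types, and the example URLs are all hypothetical. Returning None when every URL fails is likewise an assumption, chosen so the caller can fall back to its disk cache.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    primary: str,
    fallbacks: Iterable[str],
    fetch: Callable[[str], dict],
) -> Optional[dict]:
    """Try the primary URL, then each fallback once; return None if all fail.

    `fetch` is a hypothetical callable that returns the parsed catalog or
    raises OSError (network/HTTP failure) or ValueError (bad payload).
    """
    seen = set()
    for url in [primary, *fallbacks]:
        if url in seen:
            # A fallback identical to an earlier URL is not retried.
            continue
        seen.add(url)
        try:
            return fetch(url)
        except (OSError, ValueError):
            # This URL failed; move on to the next candidate.
            continue
    return None


if __name__ == "__main__":
    PRIMARY = "https://primary.example/catalog.json"    # hypothetical
    FALLBACK = "https://fallback.example/catalog.json"  # hypothetical

    def fake_fetch(url: str) -> dict:
        # Simulate the primary host rejecting non-browser clients.
        if url == PRIMARY:
            raise OSError("HTTP 403")
        return {"source": url}

    print(fetch_with_fallback(PRIMARY, [PRIMARY, FALLBACK], fake_fetch))
```

Injecting `fetch` keeps the sketch testable without network access, and the `seen` set covers the primary==fallback case listed in the commit's test plan.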

4572 commits