Commit graph

4572 commits

Author SHA1 Message Date
Bartok9
08c0b22417 fix(gateway): scope tool-result MEDIA scan to current turn
The post-run scan that appends tool-emitted MEDIA: tags to the final
response iterated every tool/function message in the full conversation
and relied solely on path-based dedup against paths reconstructed from
the replayable transcript. When that reconstruction does not byte-match
the in-memory tool content (timestamp stripping, observed-context
withholding, compression rewrites), a stale path emitted several turns
earlier is absent from the dedup set and leaks onto a later text-only
reply (Telegram 'Sending media group of 1 photo(s)' with no MEDIA
directive present).

Scope the scan to this turn's new messages by slicing result['messages']
at len(agent_history) (agent_history is passed as conversation_history
into run_conversation, so the returned list is history + this turn).
Retain path-based dedup as a secondary guard and as the sole guard on
the compression-shrink fallback, preserving the #160 behaviour.

Closes #34608
2026-05-29 13:13:34 -07:00
teknium1
38c4f8c371 test(gateway): update system-unit cwd assertion to HERMES_HOME anchor
test_system_unit_has_no_root_paths asserted the system unit's
WorkingDirectory was the remapped *checkout* path
(/home/alice/.hermes/hermes-agent). That is the brittle pin this PR
fixes — the system unit now anchors cwd at the target user's HERMES_HOME
(/home/alice/.hermes). The test's intent (no root-home leak, target-user
paths present) is unchanged and still holds.
2026-05-29 12:36:59 -07:00
teknium1
a1cb5fa2c7 fix(gateway): anchor service WorkingDirectory at HERMES_HOME, not the source checkout
The systemd unit (and launchd plist) pinned WorkingDirectory to PROJECT_ROOT
(the checkout the unit was generated from). When that checkout is transient —
a git worktree, or a clone hermes update later relocates/removes — the path
rots. systemd then fails the start at the CHDIR step (status=200/CHDIR) BEFORE
Python loads, so the on-boot refresh_systemd_unit_if_needed() self-heal never
runs and Restart=always crash-loops forever on a dead directory. Observed in
the wild: a gateway that crash-looped 153 times overnight, bot offline until a
manual 'hermes gateway restart' regenerated the unit.

Anchor cwd at HERMES_HOME instead — it never moves, always exists, and the
gateway never needed cwd to be the checkout (ExecStart uses an absolute python
+ -m hermes_cli.main). Existing broken units now differ from the generated unit
and self-heal on the next start/restart/update.
2026-05-29 12:36:59 -07:00
Teknium
45b00bb49a
fix(packaging): ship hermes_cli subpackages in wheel (#34811)
[tool.setuptools.packages.find] listed 'hermes_cli' without the
'hermes_cli.*' wildcard, so the wheel shipped hermes_cli/*.py but
dropped the dashboard_auth and proxy subpackages. The dashboard died
on every install with ModuleNotFoundError: No module named
'hermes_cli.dashboard_auth' (#34701); 'hermes proxy' was equally
broken.

Add the wildcard, and add a regression test that drives setuptools'
own find_packages against the live tree so any future subpackage
dropped from the include list fails CI instead of a user's container.
2026-05-29 12:36:09 -07:00
teknium1
8836b3a113 fix(cli): widen Windows .bat wrapper fix to custom-name alias path
The profile alias --name path in main.py rewrote the wrapper with a
hardcoded #!/bin/sh script right after create_wrapper_script(), clobbering
the .bat on Windows and reintroducing the exact bug for custom aliases.

create_wrapper_script() now takes an optional target so the alias file is
named after the alias while the -p content references the profile — one
platform-aware code path, no post-hoc rewrite.
2026-05-29 12:32:47 -07:00
liuhao1024
6312dd8c3a fix(cli): create .bat wrapper on Windows instead of POSIX shell script
On Windows, hermes profile create produced a #!/bin/sh script that the
shell cannot execute.  Now creates a .bat file with @echo off + %* on
Windows, and keeps the POSIX shell script on macOS/Linux.

Also fixes check_alias_collision to use 'where' instead of 'which' on
Windows, and remove_wrapper_script to find .bat files.

Fixes #34708
2026-05-29 12:32:47 -07:00
zapabob
aa283d1e4f fix(model): isolate custom provider picker credentials 2026-05-29 12:32:35 -07:00
Teknium
27a2c4f36f
fix(mcp): stop reporting false OAuth success when no token was obtained (#34807)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(mcp): stop reporting false OAuth success when no token was obtained

`hermes mcp login` reported "Authenticated — N tool(s) available" for
servers that serve tools/list without auth (e.g. Google's official Drive
MCP server) even when the OAuth flow never completed — dynamic client
registration 400'd because the provider doesn't support RFC 7591, so no
token was ever acquired. Every real tool call then hung until timeout
with no indication of why.

Login now verifies a token actually landed on disk after the probe. When
it didn't, it warns that authentication didn't complete and shows the
config needed to supply a pre-registered client_id/client_secret (the
existing, already-supported workaround for DCR-less providers).

Adds a docs pitfall for Google Drive / Atlassian-style providers.

Fixes #34775
2026-05-29 12:32:19 -07:00
Teknium
1cb850b674
fix(api_server): emit per-turn transcript on run.completed (#34703) (#34804)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(api_server): emit per-turn transcript on run.completed (#34703)

WebUI clients lost intermediate (pre-tool-call) assistant text after
switching session pages mid-stream. The session-chat SSE stream delivers
all assistant text as assistant.delta events under one message_id
interleaved with tool.* events, then a single assistant.completed
carrying only the final reply — so a client accumulating deltas into one
buffer cannot reconstruct intermediate text segments that preceded tool
calls, and they vanish from the live view (state.db persists them
correctly).

run.completed now carries the authoritative per-turn transcript
(assistant + tool messages for this turn, in client-safe shape) so any
SSE consumer can reconcile its live view against ground truth without a
separate GET /messages round-trip. Purely additive — clients that ignore
the field are unaffected.
2026-05-29 12:27:49 -07:00
Teknium
b6ed3913d2 feat(skills): categorize tap skills from skills.sh.json grouping sidecar
A GitHub tap can ship a repo-root skills.sh.json (the published skills.sh
schema) declaring category groupings. The Skills Hub now reads it at index
time and uses each grouping title as the skill's category label, instead of
the tag-derived guess. Generic: any tap that ships the file gets real
categorization — NVIDIA's groupings (Inference AI, Decision Optimization,
GPU Development, etc.) flow through automatically.

- GitHubSource: _get_skillsh_groupings() fetches+caches the sidecar per repo;
  _parse_skillsh_groupings() flattens it to {skill_name: title};
  _list_skills_in_repo() stamps meta.extra['category']; _meta_to_dict now
  serializes extra so the category survives the index cache round-trip.
- extract-skills.py: prefers extra['category'] over the tag heuristic and
  exempts sidecar categories from the small-category to Other collapse.
- Docs + 12 tests.
2026-05-29 12:24:39 -07:00
Teknium
4de8009ce4 feat(skills): integrate NVIDIA/skills as a trusted skills hub tap
NVIDIA/skills is now a default trusted tap in the Hermes Skills Hub —
discoverable, browsable, searchable, and auto-updating through the same
pipeline that already serves OpenAI, Anthropic, and HuggingFace skills.

Rebased onto current main.
2026-05-29 12:24:39 -07:00
kshitij
7379f17556
fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749)
* fix(gateway): only fire planned-stop watcher for markers targeting self

Salvaged from #34599 — rebased onto current main.

The planned-stop watcher now only fires shutdown for a marker that targets
the current process, instead of any marker that exists on disk. Fixes the
Windows crash loop (#34597) where a stale marker from a previous Gateway
instance kills a freshly booted Gateway ~400ms after start with a false
"Received UNKNOWN — initiating shutdown".

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

* fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable

Follow-up to the #34599 salvage. The watcher's non-destructive probe
(planned_stop_marker_targets_self) already falls back to PID equality when
a process start_time is unavailable, but the authoritative consume it gates
(_consume_pid_marker_for_self) still required a non-None start_time match.

_get_process_start_time reads /proc/<pid>/stat and returns None on macOS and
native Windows — the only platform the planned-stop watcher exists for. So on
Windows the probe would fire the shutdown handler (PID matches) but the
handler's consume_planned_stop_marker_for_self() would return False, and a
legitimate 'hermes gateway stop' was still misclassified as an unexpected
UNKNOWN exit (exit 1) and revived by the service manager — a residual half of
the #34597 crash loop on the legitimate-stop path.

Align the consume with the probe: when both start_times are known they must
match (PID-reuse guard preserved on Linux); when either is unavailable, fall
back to PID equality alone, bounded by the existing short marker TTL. This
also fixes the parallel --replace takeover consume on Windows, which shares
the same helper.

Adds regression tests for the Windows (None start_time) path, the foreign-PID
rejection under that fallback, and confirmation the start_time-mismatch guard
still rejects when both are known.

---------

Co-authored-by: Bartok9 <danielrpike9@gmail.com>
2026-05-29 17:36:58 +00:00
alt-glitch
0563ab0652 fix(test): add fal_client.submit stub to surface matrix test
The plugin switched from fal_client.subscribe() to submit()+handle.get().
The test mock only had subscribe, causing CI failures.
2026-05-29 22:26:24 +05:30
alt-glitch
3183b2e28c fix(video_gen): veo3.1 duration format and 4k resolution
FAL veo3.1 API expects duration as "4s"/"6s"/"8s" (with unit suffix),
not bare "4"/"6"/"8" like other families. Add per-family duration_suffix
field and apply it in _build_payload. Also add "4k" to veo3.1 resolutions
per FAL API docs.

Note: the managed gateway currently rejects the "4s" format (expects
integer duration). Gateway-side fix needed for veo3.1 to work through
the Nous subscription path.
2026-05-29 22:26:24 +05:30
alt-glitch
a4c18f65d4 feat(video_gen): wire Nous subscription override into hermes tools UX
Add the same managed-gateway UX that image_gen already has:

- TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row
  with managed_nous_feature='video_gen' + video_gen_plugin_name='fal'
- NousSubscriptionFeatures gains a video_gen property + feature state
  computation (managed/active/available using the fal-queue gateway)
- _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS,
  _get_gateway_direct_credentials, opted_in all include video_gen
- apply_nous_managed_defaults and apply_gateway_defaults handle video_gen
- _is_toolset_satisfied checks Nous features for video_gen
- _is_provider_active detects managed video_gen (use_gateway + fal provider)
- _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated
  from all 4 call sites in _configure_provider when managed_feature is set
- hermes setup status shows 'Video Generation (FAL via Nous subscription)'

Users on a Nous subscription can now pick 'Nous Subscription' under
hermes tools → Video Generation, which sets video_gen.provider=fal +
video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway
then routes through the managed queue gateway — no FAL_KEY needed.
2026-05-29 22:26:24 +05:30
alt-glitch
b6294ea9f1 test(video_gen): cover gateway decision matrix gaps and 4xx error path
- Add test for 4xx ValueError with actionable remediation message
- Add test for is_available() returning True via managed gateway
- Add test for prefers_gateway overriding direct FAL_KEY
- Add test for is_available() via gateway in plugin test file
2026-05-29 22:26:24 +05:30
alt-glitch
d04b3c193e feat(video_gen): route FAL video gen through managed Nous gateway
Wire plugins/video_gen/fal/__init__.py to use the same
_ManagedFalSyncClient pattern that image gen already uses.

Changes:
- Add managed gateway resolution, client caching, and
  _submit_fal_video_request() that routes between direct FAL_KEY
  and Nous gateway modes
- Update is_available() to return True when either FAL_KEY or the
  managed gateway is reachable
- Update generate() to use submit+get handle pattern instead of
  fal_client.subscribe() directly
- Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches
  the tool-gateway allowlist from fal-video-gen branch)
- Surface actionable error on 4xx gateway rejections

Tests:
- 4 new tests in test_managed_media_gateways.py (gateway routing,
  client reuse, direct mode fallback, alibaba namespace)
- Updated existing test_fal_plugin.py fixture to use submit/handle
  pattern and patch _resolve_managed_fal_video_gateway for isolation
2026-05-29 22:26:24 +05:30
kshitijk4poor
38695254f8 perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize'
The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of
incremental b-tree segments — one per trigger-driven insert batch. SQLite's
automerge caps at ~16 segments, so a long-lived store keeps scanning many
segments per MATCH and never collapses them unless the special 'optimize'
command runs. Nothing in the codebase ever ran it: vacuum() only fired after
a prune that deleted rows, and even then never merged FTS segments.

Changes:
- SessionDB.optimize_fts(): merges each FTS5 index to a single segment,
  probing for the (optional/lazy) trigram table first so it is safe to call
  unconditionally. Layout-only — search results and snippet() are unchanged.
- vacuum() now calls optimize_fts() before VACUUM so freed index pages are
  returned to the OS in the same pass.
- 'hermes sessions optimize' CLI subcommand for on-demand reclamation +
  segment compaction (previously there was no way to compact the store
  without a prune deleting rows), with before/after size reporting.

Benchmark (8000 msgs, fragmented to 8 segments/index):
- segments 8 -> 1 on both indexes
- porter MATCH 5.5x faster (0.449 -> 0.081 ms/q)
- trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q)
- 8000 matches before == 8000 after, identical row ids (no functional change)

Orthogonal to the structural FTS-size PRs (#20239 external-content,
#27770 optional trigram) — segment merge helps regardless of those.

Tests: TestOptimizeFts covers index count, search+snippet preservation,
missing-trigram path, and idempotency. Full test_hermes_state.py green (227).
2026-05-29 05:09:56 -07:00
teknium1
1c53d39eaa test: deflake process-registry kill + PTY resize tests
Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch;
pre-existing host-dependent flakes):

1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup
   tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But
   spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL),
   falling back to proc.kill() only on ProcessLookupError/PermissionError/
   OSError. When the fake PID happens to exist on a busy host, os.getpgid
   succeeds, os.killpg fires against an UNRELATED real process group, and
   proc.kill() is never reached -> flaky AssertionError (and a real risk of
   SIGKILLing an innocent process group from a unit test). Patch os.getpgid
   to raise ProcessLookupError so the fallback path runs deterministically
   and no real killpg is ever issued.

2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls
   the blocking conn.receive_bytes() with no exception guard. Once the child
   prints its winsize and exits, the PTY closes; on a missed-marker run the
   next recv blocks until the 30s pytest-timeout instead of failing fast.
   Add a try/except break (matching the working sibling tests) and bump the
   child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first.

Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced
(os.getpgid(1) succeeds -> old code skips proc.kill).
2026-05-29 04:22:41 -07:00
teknium1
fd09b2c55e fix(gateway): trust adapter-owned access policy over env default-deny (#34515)
Config-driven platform policies (dm_policy / group_policy / allow_from /
group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without
also setting a PLATFORM_ALLOWED_USERS env var.

These adapters enforce their access policy at intake — a message is dropped
inside the adapter and never dispatched unless it already passed the policy.
The gateway's env-based check (_is_user_authorized) ran afterward and, with
no env allowlist set, fell through to an env-only default-deny — silently
rejecting `dm_policy: open` and config-only allowlists the adapter had
already authorized.

Rather than re-implement each adapter's policy a second time in run.py
(which would drift), adapters that own their gate now declare it via a new
BasePlatformAdapter.enforces_own_access_policy property (default False). The
gateway trusts that flag and skips the env-only default-deny for those
platforms. Env allowlists still take precedence when set.

Also resolves unauthorized DM behavior from config dm_policy so allowlist /
disabled policies drop unauthorized DMs silently instead of leaking pairing
codes, while an explicit pairing policy opts back in.

Co-authored-by: Frowtek <frowte3k@gmail.com>
2026-05-29 04:22:41 -07:00
Teknium
e4b9532c18
feat: embedder environment-hint hook for the system prompt (#34574)
* fix(security): block AWS SDK creds from subprocess env

* fix(security): narrow Bedrock subprocess strip to inference bearer token only

Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>

* feat: embedder environment-hint hook for the system prompt

Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint)
so a host wrapping Hermes (sandbox runner, managed platform) can describe the
runtime environment — proxy, credential handling, mount layout — in the system
prompt's environment-hints block, without editing the identity slot (SOUL.md).

Read once at prompt-build time, so it lands in the stable, cache-safe portion
of the system prompt. Env var overrides the config key (build-time/container
mechanism). Empty by default — no behavior change for existing installs.

---------

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 04:10:05 -07:00
briandevans
6e179c44b1 fix(web): ensure plugin discovery before web_*_tool registry lookups
Web search/extract dispatch read agent.web_search_registry before plugin
discovery had run, so in any process that hadn't imported model_tools.py
(subprocess agent runs, delegate children, standalone scripts) the registry
was empty: get_provider('firecrawl') returned None and the dispatcher emitted
the misleading 'No web extract provider configured' error even with
web.extract_backend set and FIRECRAWL_API_KEY exported.

Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors
tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of
both the web_search_tool and web_extract_tool dispatch sites before the
registry lookup.

Fixes #27580.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-29 04:00:00 -07:00
teknium1
c77a697fa4 refactor(vision): consolidate native fast-path gate into one shared helper
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.

- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
2026-05-29 03:58:56 -07:00
tillfalko
2402ec5e7b test: extend test coverage to native image routing 2026-05-29 03:58:56 -07:00
EloquentBrush0x
784d8dd2c2 fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty
The _on_reaction approval handler used:

    if self._allowed_user_ids and sender not in self._allowed_user_ids:

When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an
empty set. The short-circuit on the empty set caused the deny block to
never execute, allowing any Matrix room member to approve or deny tool
calls via / reactions — even users that run.py's _is_user_authorized
would reject for regular messages.

Fix mirrors the Telegram _is_callback_user_authorized fix (commit
89d32052e, PR #28494): deny by default when no allowlist is configured,
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
2026-05-29 03:58:45 -07:00
teknium1
3171845479 fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub
Follow-up mitigation for the #27303 env-scrub tightening. Dropping the
broad HERMES_ prefix in favor of a 4-var operational allowlist is correct
hardening, but a sandbox script that imports a repo module reading a
non-allowlisted HERMES_* var at import time would otherwise see it
silently unset. _scrub_child_env now emits a one-shot debug log naming the
dropped non-secret HERMES_* vars and pointing at the env_passthrough
opt-in escape hatch. Secret-shaped vars are never named in the log.

Tests: dropped vars are logged + env_passthrough named; no log when
nothing is dropped; secret vars excluded from the diagnostic.
2026-05-29 03:44:49 -07:00
firefly
4bdae34771 test(code-exec): regression suite for the approval-bypass cluster
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
firefly
1083977261 fix(code-exec): restore approval context in execute_code RPC threads + guard entry
Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
firefly
21aeefe5fd fix(code-exec): propagate agent-turn context into tool worker threads
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape.

Refs #33057
2026-05-29 03:44:49 -07:00
kshitijk4poor
a22c250001 refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params
After the legacy session-key path was removed, two parameters became dead
surface on the Nous runtime-resolution chain:

- min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through /
  telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state,
  _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the
  now-deleted agent-key mint TTL and drives no behavior.
- inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally
  identical; the value only fed _normalize_nous_inference_auth_mode validation and
  oauth trace output, never a branch.

Removing inference_auth_mode orphaned its whole supporting cluster
(NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES,
_normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here.

Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter,
runtime_provider, web_server, main, auth_commands, setup) and pruned the matching
test kwargs. Deleted two tests that exercised the removed surface
(test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode).

No behavior change: net -134 LOC of dead code.
2026-05-29 02:24:48 -07:00
Robin Fernandes
4e4984a11a test(auth): update nous jwt-only expectations 2026-05-29 02:24:48 -07:00
Robin Fernandes
7e958dafc2 fix(auth): address Nous JWT fallback review 2026-05-29 02:24:48 -07:00
Robin Fernandes
41ff6e5937 refactor(auth): Disable Nous legacy session key fallback 2026-05-29 02:24:48 -07:00
teknium1
18c9e89106 test: update _invoke_tool dispatch assertion for new toolset-scope kwargs
The scoping fix added enabled_toolsets/disabled_toolsets to the
agent_runtime_helpers sequential dispatch into handle_function_call, so
test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with
(exact match) needs the two new kwargs. Both are None for the default agent
fixture.
2026-05-29 02:04:12 -07:00
teknium1
7427b9d581 fix(tool-search): scope bridge catalog + dispatch to the session's toolsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:

  1. tool_search the entire process registry, not just its granted tools, and
  2. tool_call any registered plugin/MCP tool it was never given, because
     registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
  dispatch scopes get_tool_definitions to them (also stops polluting the
  process-global _last_resolved_tool_names with out-of-scope tools, which
  leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
  deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
  same scope before dispatch, since it unwraps tool_call -> underlying name
  and bypasses the bridge branch. New _tool_search_scoped_names() helper,
  cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
2026-05-29 02:04:12 -07:00
teknium1
369075dc95 feat(tools): progressive tool disclosure for MCP and plugin tools
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.

Design carefully reflects the OpenClaw production failure modes
documented in the openclaw-tool-search-report:

  - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
    'tools silently missing from isolated cron turns' regression class
    (openclaw#84141) by construction: there is no code path that can
    drop a core tool.
  - Catalog is stateless across turns — rebuilt from the live tool-defs
    list on every assembly. No session-keyed Map that can drift out of
    sync with the registry.
  - tool_call unwraps the bridge call before any hook fires, so plugin
    pre/post hooks, guardrails, approval flows, and the activity feed
    all see the underlying tool name, not the bridge (addresses
    openclaw#85588 and the verbose-mode complaint on openclaw#79823).
  - The unwrap happens in both the parallel and sequential paths of
    agent/tool_executor.py and also in handle_function_call, so direct
    callers (sandboxed code, eval harnesses) are covered too.
  - Bridge tools cannot invoke each other (recursion guard) and cannot
    invoke core tools (those must be called directly).
  - Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
  - Token estimation via cheap char/4 heuristic; precision isn't needed
    for the threshold decision.

Files:
  - tools/tool_search.py — new module (BM25 retrieval, classification,
    threshold gate, bridge dispatch, unwrap helper).
  - tests/tools/test_tool_search.py — 35 tests including the OpenClaw
    #84141 regression guard.
  - model_tools.py — wires assembly into _compute_tool_definitions as the
    final step, adds skip_tool_search_assembly kwarg so the bridge can
    see the real catalog, dispatches the three bridge tools.
  - agent/tool_executor.py — unwraps tool_call in both parallel and
    sequential parsing loops so checkpointing, guardrails, plugin hooks,
    and tool-progress callbacks all observe the underlying tool name.
  - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
  - website/docs/user-guide/features/tool-search.md — user docs.

Validation:
  - 35/35 new tests pass.
  - Existing tool/registry/model_tools/config/coercion/executor tests
    (82 + 74 + small adjacents) green.
  - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
    3 bridges, tool_search returns top 3 hits, tool_describe returns
    full schema, tool_call dispatches to the real underlying handler
    and the underlying result is what the model sees.
  - Reserved-name recursion guard verified live.
  - Core-tool refusal via tool_call verified live.
2026-05-29 02:04:12 -07:00
teknium1
73d73f1f0d fix(codex): relax no-byte TTFB watchdog default from 12s to 120s
The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in
backend admission / prompt prefill before emitting its first SSE event. The
12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as
'Codex stream produced no bytes within 12s' through all retries (Discord
reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was
~50x more aggressive than the transport layer would have tolerated.

Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default
to 120s so it no longer clamps the new default back to 20s. Disabling stays
available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable,
_STRICT override, and post-first-event idle watchdog are unchanged.

Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>
2026-05-29 02:02:25 -07:00
teknium1
6bebab4761 fix(security): narrow Bedrock subprocess strip to inference bearer token only
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 01:48:08 -07:00
zapabob
95b5b72404 fix(security): block AWS SDK creds from subprocess env 2026-05-29 01:48:08 -07:00
Teknium
db2ce9e7d2
fix(compression): fail open when lock subsystem is missing (version skew) (#34475)
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).

Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).

Also guards get_compression_lock_holder against the same skew.

Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
2026-05-29 01:32:32 -07:00
Teknium
e28a668b40 fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.

- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
  documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
  "outside allowed roots") via _log_safe_path, which strips control chars
  and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
  so a ~\x00 path returns None / is skipped instead of raising and dropping
  every other attachment in the response.

Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.

Co-authored-by: wysie <wysie@users.noreply.github.com>
2026-05-29 01:23:35 -07:00
teknium1
2765b02021 fix(packaging): ship bundled plugin.yaml manifests in wheel and sdist
The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its
plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero
plugins and ALL gateway platforms failed with "No adapter available for
<platform>" (discord, slack, mattermost, ...). Same gap also dropped the
web-search provider manifests (#28149).

Declare manifest coverage in both packaging channels:
- wheel: [tool.setuptools.package-data] plugins += **/plugin.yaml, **/plugin.yml
- sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml
  (Homebrew and other downstream packagers build from the sdist)

Verified by building the wheel before/after: plugin.yaml count went 0 -> 69,
discord's manifest now ships. Adds a regression test asserting both channels
cover manifests.

Fixes #34034

Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com>
Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com>
Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com>
Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>
2026-05-29 01:23:28 -07:00
Teknium
c01a2df0a3
fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479)
OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env
vars). On a headless/CLI-only Linux box with no GUI browser, none of those
trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links)
and launched it INSIDE the terminal — hijacking the user's TTY with the
xAI 'Account Management' login page instead of letting them copy the URL.

Add _can_open_graphical_browser(): returns False when webbrowser would
resolve to a known console browser, when $BROWSER names one, when there's
no display server on Linux, or when no browser resolves at all. Gate all 5
OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device
code, Anthropic, Google) on it in addition to the existing remote check.
Headless boxes now print the URL / fall through to manual-paste instead.
2026-05-29 01:23:06 -07:00
wysie
f32b66c758 fix: improve plugins list usability 2026-05-29 00:59:42 -07:00
Blake
26b83a5f5f fix(cli): ignore terminal focus reports (salvage of #16780)
Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab,
etc.) can deliver terminal focus reports (CSI I / CSI O) to the
running TUI. prompt_toolkit does not map those sequences by default,
so its parser falls back to literal key presses (ESC, [, I/O) and
inserts `[I` / `[O` into the prompt buffer after the ESC byte is
handled.

Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at
parser level, plus a no-op kb.add(Keys.Ignore) handler so the
default self-insert path never inserts focus-report bytes.

Salvage notes: original PR put the helper in cli.py. Salvaged into
hermes_cli/pt_input_extras.py alongside install_shift_enter_alias /
install_ctrl_enter_alias to match the established pattern for
ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user
registration wins.

Closes #16780
2026-05-29 00:31:44 -07:00
moikapy
f6a2ba6261 fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error
xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials
when an OAuth2 access token has expired or is invalid. The existing
_is_auth_error() only checked for 401 status codes, so these tokens
were never refreshed and the 403 propagated as a generic permission
denied error.

Three fixes:

1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as
   an auth failure, triggering token refresh instead of silent failure.

2. _refresh_provider_credentials: Add xai-oauth branch with
   pool-level refresh (try_refresh_current with select to ensure
   current entry) then fallback to singleton resolver with
   force_refresh=True.

3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth
   pool for auto-resolved providers, matching existing pattern for
   openai-codex/openrouter/nous/anthropic.

Includes 14 tests covering the new detection logic, host mapping,
and graceful fallback behavior.

Signed-off-by: moikapy <moikapy@devmoi.com>
2026-05-29 00:28:02 -07:00
teknium1
bc736ff543 test(model-catalog): use exact URL equality in fallback tests
CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring
checks as py/incomplete-url-substring-sanitization. The rule is about URL
allowlist checks in production code, not test routing — there's no
security boundary here. Switch to url == self.PRIMARY / self.FALLBACK,
which is the same semantic and silences the rule.
2026-05-29 00:25:36 -07:00
teknium1
f2d88c820c fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous
The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot
mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for
non-browser User-Agents — including urllib (what the CLI uses) and curl.
When that happens, get_catalog() falls back to the stale disk cache and
new model releases (Opus 4.8, etc.) never reach the /model picker even
though they're already in OPENROUTER_MODELS and the live OpenRouter API.

Adds a fallback URL chain: when the primary catalog URL fails, walk
DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com
copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays
reachable through Vercel firewall hiccups. Per-provider override URLs
keep their direct-fetch semantics (operators configure those specifically,
no implicit fallback).

Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the
OpenRouter + Nous Portal curated picker lists. Native stepfun provider
configuration (api.stepfun.ai) is left alone — that depends on what
stepfun.ai itself serves, not what OpenRouter routes.

Test plan: 5 new TestFallbackChain tests cover primary-success,
primary-failure-fallback-success, all-fail, primary==fallback-dedup, and
end-to-end get_catalog routing through the new helper. Existing 23 tests
in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/
sweep: 5701/5701 pass.
2026-05-29 00:25:36 -07:00
Rohit Sharma
9d4fda9952 feat(kanban): add POST /runs/{run_id}/terminate endpoint
Closes the termination-control gap left by PR #28432, which shipped the
read-only sibling endpoints (/workers/active, /runs/{run_id},
/runs/{run_id}/inspect) but no way to stop a misbehaving worker from
the dashboard without dropping to the CLI.

The new endpoint resolves run_id -> task_id and delegates to the
existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL
escalation, run-outcome bookkeeping, and event-log append all match
POST /tasks/{task_id}/reclaim exactly. No new termination semantics
introduced.

Responses:
  200 {ok, run_id, task_id} on success
  404 unknown run_id
  409 run already ended OR task no longer reclaimable

Refs: #23762
2026-05-29 00:21:54 -07:00
teknium1
7d10105918 test(kanban): update iteration-exhaustion tests for #29747 gap 2
The two tests in TestRunConversation now verify the new behavior:
  - test_kanban_block_called_on_iteration_exhaustion → verifies
    _record_task_failure(outcome='timed_out') is called instead of
    kanban_block
  - test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge
    is a no-op when HERMES_KANBAN_TASK is unset

The function names are kept for diff stability; both assert against
_record_task_failure now, which is the correct contract per the gap-2
fix in this PR.
2026-05-29 00:13:29 -07:00