Commit graph

980 commits

Author SHA1 Message Date
Teknium
cc93053b42 fix(xai-oauth): apply WKE disambiguator to recovery-path catch-all (#29344)
_recover_with_credential_pool had a second classification site that blanket-
treated any 403 against xai-oauth as entitlement (defense-in-depth for
#26847).  That override defeated the new _is_entitlement_failure
disambiguator from the parent commit — bad-credentials 403s still
short-circuited the refresh path.

Apply the same WKE-unauthenticated / OAuth2-validation-phrase guard at
the override site so xAI's authoritative 'this is auth, not entitlement'
signal wins there too.  The #26847 catch-all still triggers for genuine
entitlement bodies that don't carry the disambiguator.

Closes the end-to-end gap exposed by
test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403.
2026-05-23 02:48:13 -07:00
xxxigm
30c22f1158 fix(api-call): defer client.close() to owning worker thread on interrupt (#29507)
Layer-2 defense for the FD-recycling race: even with
``force_close_tcp_sockets`` reduced to shutdown-only, the followup
``client.close()`` in ``_close_openai_client`` still walks the httpx
pool and closes sockets — and if called from a stranger thread (the
interrupt-check loop, the stale-call detector) it has the same
FD-recycling exposure that wrote a TLS record on top of ``kanban.db``.

Stamp the request_client_holder with the owning thread's ident at
``_set_request_client`` time. In ``_close_request_client_once``:

* Owning thread (the worker's ``finally``) → pop + ``client.close()``
  via ``_close_request_openai_client``, exactly as before.
* Stranger thread → ``_abort_request_openai_client`` (new): only
  ``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close
  marker. The holder stays populated so the worker's eventual
  ``finally`` performs the real close from its own thread context,
  where the FD release races nothing.

Applied symmetrically to both the non-streaming
``interruptible_api_call`` and the streaming variant — both routinely
get hit by stranger-thread interrupts.

The log field ``tcp_force_closed=N`` keeps its existing shape; the new
abort path adds ``deferred_close=stranger_thread`` so production
triage can distinguish the two close kinds.
2026-05-23 02:31:10 -07:00
xxxigm
e2a7d73a66 fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507)
The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by
``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own
``shutdown()`` is safe from any thread — it only sends FIN and breaks
pending ``recv``/``send`` — but ``close()`` releases the FD integer to
the kernel. When the helper runs on a stranger thread (the interrupt
loop, the stale-call detector) the FD release races the owning httpx
worker thread that still has the same integer cached inside the SSL
BIO. The kernel then recycles that integer to the next ``open()`` call
— in production, kanban dispatcher's ``kanban.db`` — and the worker's
delayed TLS flush writes a 24-byte TLS application-data record on top
of the SQLite header.

Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx
worker's own unwind will close the underlying socket via the same
Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1
before issuing ``close(2)`` — no FD-aliasing window.

The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so
existing dashboards / log parsers keep working.
2026-05-23 02:31:10 -07:00
sprmn24
53cb6d32be fix(agent): use atomic_json_write for request debug dumps instead of bare write_text 2026-05-23 02:30:57 -07:00
sprmn24
b2e6fdd3bf fix(agent): log warning when fallback model normalization fails instead of silently swallowing 2026-05-23 02:23:24 -07:00
Teknium
97e975edd2 fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:

  - HERMES_HOME/.env                       (provider API keys)
  - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
  - HERMES_HOME/mcp-tokens/                (OAuth token directory; dir
                                            + everything inside)

…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).

Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:

  - Models that respect tool denials empirically tend to stop rather
    than reach for the shell.
  - The denial surfaces an audit trail when something tries to read
    credentials — easier to spot in logs than a generic `cat`.

Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.

Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).

Co-authored-by: Tom Qiao <zqiao@microsoft.com>
2026-05-22 20:15:09 -07:00
briandevans
567ea61298 fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.

Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
briandevans
056e00a77e fix(file-safety): block read_file on HERMES_HOME credential stores (#17656)
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.

Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.

Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
Teknium
a84cec61ca
fix(minimax-oauth): refresh short-lived access tokens per request (#30619)
* fix(minimax-oauth): refresh short-lived access tokens per request

MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches
api_key as a static string at client construction, so a session that
resolves credentials once at startup keeps sending the same bearer until
MiniMax returns 401 mid-session.

Swap the static string for a callable token provider, reusing the existing
Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable
re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state,
which is a no-op when the token still has more than 60s of life left and
refreshes proactively otherwise. Refreshes persist to auth.json so other
processes (gateway, cron) see them immediately.

The wire-up lives at the agent-init / model-switch boundary rather than in
resolve_runtime_provider, so aux client paths that hand the api_key string
to OpenAI(api_key=...) are unaffected.

* docs: add infographic for minimax-oauth token refresh
2026-05-22 15:16:15 -07:00
Julien Talbot
09afafb87e fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
Teknium
42104218e0 fix(file-safety): also write-deny <root>/control-files in profile mode
PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME,
which is fine in non-profile mode but leaves a gap once a profile is
active: HERMES_HOME points at <root>/profiles/<name>, so the global
<root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json
+ <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR
#15981 closed via _hermes_root_path().

Apply the same widening pattern here. The control-file/mcp-tokens check
now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes
when they coincide in non-profile mode). Also tightens the mcp-tokens
check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep"
so writing the directory entry itself is blocked, not just files inside.

Regression tests cover both protections in a real profile-mode layout
(<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).
2026-05-22 04:32:14 -07:00
Pratik Rai
1f5219fda5 fix(security): protect Hermes control-plane files from prompt injection
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.

Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.

Fixes #14072
2026-05-22 04:32:14 -07:00
kshitijk4poor
9d61408837 refactor: extract 7 helpers from convert_messages_to_anthropic
Split convert_messages_to_anthropic (complexity 79) into 7 focused helpers:

- _convert_assistant_message    — assistant msg to content blocks
- _convert_tool_message_to_result — tool msg to tool_result + merge
- _convert_user_message         — user msg validation + conversion
- _strip_orphaned_tool_blocks   — orphan tool_use + tool_result removal
- _merge_consecutive_roles      — role alternation enforcement
- _manage_thinking_signatures   — strip/preserve/downgrade by endpoint
- _evict_old_screenshots        — keep only 3 most recent images

Main function complexity: 79 → 10 (below C901 threshold).
Zero logic changes — pure extraction. Net -4 lines (refactor itself);
+45/-17 follow-up polish for annotation tightening (List[Dict] →
List[Dict[str, Any]]), restored rationale comments in
_manage_thinking_signatures (third-party endpoint examples, #13848/#16748
issue refs, redacted_thinking 'data'-as-signature note), and "Mutates
``result`` in place." docstring lines on the four mutating helpers.
2026-05-22 04:23:02 -07:00
Teknium
c769be344a
fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of
OpenAI-compatible servers) follow the OpenAI spec strictly and require
tool message `content` to be a string — they reject our list-type
content (text + image_url parts) with HTTP 400 'text is not set' /
'tool message content must be a string'.

Instead of an allowlist of known-good providers (maintenance burden,
guaranteed to miss aggregators like OpenRouter where the underlying
model determines support, not the aggregator name), this lands a
reactive recovery:

1. New `FailoverReason.multimodal_tool_content_unsupported` with a
   small pattern list covering the common 400 wordings.
2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API
   message list, downgrades any `role:tool` message whose content is
   list-with-image to a plain text summary (preserves text parts) in
   place, AND records the active (provider, model) in a session-scoped
   `_no_list_tool_content_models` set.
3. `_tool_result_content_for_active_model` short-circuits to a text
   summary when (provider, model) is in the cache — so after the first
   400 + retry, subsequent screenshots in the same session skip the
   round trip entirely.
4. Retry hook in `agent.conversation_loop` mirrors the existing
   `image_too_large` recovery: detect the reason, run the helper,
   retry once, fall through to the normal error path if no list-type
   tool content was actually present.

Cache is transient (per-session) by design — next session retries in
case the provider added support, no persistent state to maintain.

Fixes #27344. Closes #27351 (allowlist approach superseded by reactive
recovery).
2026-05-21 23:40:16 -07:00
helix4u
f6f25b9449 fix(agent): fail fast on small Ollama runtime context 2026-05-21 23:25:01 -07:00
Teknium
e77f1ed5f7 fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.

Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).

Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
2026-05-21 23:18:37 -07:00
lempkey
4c61fb6cf6 fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.

Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.

Gate condition (matches the natural meaning of enabled_toolsets):
  None                       → no filter, inject (backward compat)
  contains 'memory'          → user opted in, inject
  otherwise (including [])   → skip injection

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-21 23:18:37 -07:00
teknium1
9896e43db5 fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.

Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.

Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.

E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
  Before: _find_all_skills() returned ~3 skills.
  After:  _find_all_skills() returns 84 skills including
          github-pr-workflow, google-workspace, github-auth,
          huggingface-hub. Apple-only skills remain excluded.

Non-Termux Linux/macOS/Windows behavior unchanged (verified).

Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
2026-05-21 19:08:38 -07:00
Teknium
3fde8c153d
fix(skills): prune dependency/venv dirs from all skill scanners (#30042)
* fix(skills): skip dependency dirs in skill scan

* fix(skills): widen sibling rglob scanners to use shared exclusion set

Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.

Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.

Sites touched:
  agent/curator_backup.py        - skill backup file count
  gateway/run.py                 - disabled-skill response (2 sites)
  hermes_cli/dump.py             - skill count in env dump
  hermes_cli/profile_describer.py- profile description (2 sites)
  hermes_cli/profile_distribution.py - profile install count
  hermes_cli/profiles.py         - profile skill count
  hermes_cli/skills_hub.py       - category detection
  tools/skill_manager_tool.py    - skill name lookup (already used set, now uses helper)
  tools/skill_usage.py           - usage tracking + skill dir lookup (2 sites)
  tools/skills_hub.py            - optional skills find + scan (2 sites)
  tools/skills_sync.py           - bundled skills sync

E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.

* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-21 14:18:02 -07:00
Teknium
552e9c7881
feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035)
* feat(secrets): Bitwarden Secrets Manager integration with lazy bws install

Pull API keys from Bitwarden Secrets Manager at process startup
instead of storing them all in plaintext in ~/.hermes/.env.  One
bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and
rotating a credential becomes a single change in the Bitwarden web
app.

Bitwarden defaults to source of truth: secrets pulled from BSM
overwrite any matching env vars on startup so rotations actually
take effect.  Set secrets.bitwarden.override_existing: false in
config.yaml to invert.

The bws binary is auto-downloaded into ~/.hermes/bin/bws on first
use (pinned to v2.0.0, SHA-256 verified against the GitHub release
checksum file).  No apt, brew, or sudo required.

New surfaces:
  hermes secrets bitwarden setup    — interactive wizard
  hermes secrets bitwarden status   — config + binary + token state
  hermes secrets bitwarden sync     — dry-run fetch / --apply exports
  hermes secrets bitwarden disable  — flip enabled: false
  hermes secrets bitwarden install  — just download the binary

Failures (missing binary, bad token, no network) never block Hermes
startup — they emit a one-line warning to stderr and continue with
whatever credentials .env already had.

Docs: website/docs/user-guide/secrets/{index,bitwarden}.md
Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws
       subprocess and HTTP downloads fully mocked)

* chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid

Generated for PR #30035 — Bitwarden Secrets Manager integration.
Style picked via pick_pr_infographic_style.py rotation:
  layout: bento-grid
  style:  retro-pop-grid
  aspect: 1:1 square

Saved at infographic/bitwarden-secrets-manager/infographic.png
2026-05-21 14:10:34 -07:00
helix4u
ba9964ff0d fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.

This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.

Example config:

```yaml
custom_providers:
  - name: gemma-local
    base_url: http://localhost:8080/v1
    model: google/gemma-4-31b-it
    extra_body:
      enable_thinking: true
      reasoning_effort: high
```

For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:

```yaml
extra_body:
  chat_template_kwargs:
    enable_thinking: true
```

Changes:

- `hermes_cli/config.py`: preserve `extra_body` in normalized
  `custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
  as `request_overrides.extra_body` for named custom runtime resolution,
  including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
  entry by `base_url` (+ optional model) and merge its `extra_body` into
  `AIAgent.request_overrides`, with caller-provided overrides winning on
  conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
  behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
  user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
  `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
  trailing-slash equivalence, fallback when no `model` field is set, and
  caller-override merging precedence.

Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.

Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.

Closes #29022
2026-05-21 07:48:53 -07:00
alt-glitch
87d9239009 chore: trim verbose comments/docstrings, add AUTHOR_MAP entry
- Replace 18-line comment block with 3-line invariant statement
- Trim test docstrings from multi-paragraph to single-line summaries
- Trim assertion messages from 4-line to 2-line mismatch reports
- Replace 5-line WHAT comments in stubs with 1-line WHY comments
- Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP
2026-05-21 12:49:21 +05:30
Ziliang Peng
c3a09f7835 fix(background_review): propagate parent toolset config to keep tools[] cache-stable
## Summary

The background skill/memory-review fork constructed a child `AIAgent`
without propagating `enabled_toolsets` / `disabled_toolsets` from the
parent. When the parent narrowed its toolset (via `hermes tools
disable` or `config.yaml`), the fork's default `enabled_toolsets=None`
expanded to "all registered tools" — and the fork's outbound request
body sent a wider `tools[]` array than the parent's main-turn request.

Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte,
so this divergence forked the cache lineage on every nudge and forced a
full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session
this cost roughly 4.3 M cache-write tokens — about half of those
attributable to the per-nudge alternation between the main turn's
narrowed `tools[]` and the review fork's wider `tools[]`.

## Goal

Extend the byte-stability invariant established by PR #17276 (which
fixed `system`) to the `tools[]` slot of the request body, so the
review fork's outbound request hits the parent's warmed Anthropic
prefix cache regardless of how the parent's toolset is configured.

## Implementation

Two-line change in `agent/background_review.py`: pass
`enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the
matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside
`_spawn_background_review`. Adds an explanatory block comment that
calls out the cache-key dependency and the relationship to PR #17276.

The post-construction runtime whitelist
(`set_thread_tool_whitelist({memory, skills})`) is untouched — it
still gates which tools the model is allowed to *dispatch*. This
change aligns only what the request body *transmits*, not what the
review is allowed to do, so the safety contract from issue #15204
remains intact.

## Testing

- `tests/run_agent/test_background_review_cache_parity.py`: new
  `test_review_fork_inherits_parent_toolset_config` asserts the
  parent's `enabled_toolsets` and `disabled_toolsets` reach the
  review-fork constructor as kwargs.
- `tests/run_agent/test_background_review_toolset_restriction.py`:
  the existing `test_background_review_does_not_narrow_toolset_schema`
  was inverted (its old "must NOT pass enabled_toolsets" rule was
  built on the assumption that the parent always ran with the
  registry default — wrong in practice when the parent is narrowed).
  Renamed to `test_background_review_matches_parent_toolset_config`
  and updated to assert the parent's value propagates verbatim.
- Verified the new positive test fails without the fix and passes
  with it.
- Full suite for `test_background_review*`:

  ```
  $ python -m pytest tests/run_agent/test_background_review.py \
                     tests/run_agent/test_background_review_summary.py \
                     tests/run_agent/test_background_review_toolset_restriction.py \
                     tests/run_agent/test_background_review_cache_parity.py -q
  18 passed in 1.85s
  ```

## Scope

- `agent/background_review.py`: 2 added kwargs + explanatory comment.
- Two test files: one new positive test, one inverted existing test.
- No production code paths outside the review fork; no schema changes;
  no public-API changes.

Refs: ziliangpeng/hermes-agent#1 (root-cause analysis with wire-level
cache-write measurements). Extends PR #17276's `system`-bytes
invariant to the `tools[]` slot.
2026-05-21 12:49:21 +05:30
0xsir0000
5edb346c75 security(file-safety): also write-deny <root>/.env when running under a profile (#15981)
build_write_denied_paths() resolved the protected ``.env`` via
get_hermes_home(), which is profile-aware. When a profile is active
HERMES_HOME points at ``<root>/profiles/<name>`` and ``hermes_home / ".env"``
expands to the *profile* env file only — the global ``<root>/.env`` is left
off the deny list and a write_file call against it succeeds. Since the
top-level .env supplies credentials inherited by every profile, this is a
P0 credential-exfiltration / overwrite path.

Add a parallel ``_hermes_root_path()`` helper that returns the Hermes root
(via the existing ``get_default_hermes_root()`` constant) and include
``<root>/.env`` in the deny list alongside ``<active_profile>/.env``. Both
paths now refuse write_file/patch regardless of profile state. The active
HERMES_HOME .env entry is preserved so the protection in non-profile mode
is unchanged.

A regression test exercises the profile-active scenario by pointing
HERMES_HOME at ``<tmp>/profiles/coder`` and asserting that ``<tmp>/.env``
is denied.

Fixes #15981
2026-05-20 23:37:37 -07:00
Teknium
32aea113f0 fix(agent): consult supports_vision override in auto-mode routing
The contributor PR (#17936) only patched the strip path in
`_model_supports_vision()`. The auto-mode router in
`agent/image_routing._lookup_supports_vision` still only read models.dev,
so a custom-provider model declared as vision-capable would still get its
images routed through vision_analyze in the default `agent.image_input_mode:
auto` setting. Users had to set both `supports_vision: true` AND
`image_input_mode: native` to bypass the text pipeline.

Single-knob behavior now: `supports_vision: true` alone is enough in auto
mode. The strip path and the routing path consult the same resolver.

- Extract override resolution into `_supports_vision_override()` in
  agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper
  (DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted —
  a common YAML mistake) no longer coerces to True via bool() truthiness.
  Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1.
  Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.

Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride,
TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing
contributor tests + image_routing + vision_native_fast_path +
native_image_buffer_isolation all green (92/92).
2026-05-20 23:27:10 -07:00
Teknium
b4afc6546e fix(xai): restore encrypted reasoning replay across turns
xAI partner integration requires Hermes to thread `encrypted_content`
reasoning items back to the Responses API on every turn so Grok can
maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this
off for `is_xai_responses` on the theory that the OAuth/SuperGrok
surface rejected replayed encrypted blobs and produced the multi-turn
"Expected to have received \`response.created\` before \`error\`"
failure. That diagnosis was wrong — the prelude-SSE fallback added in
the same PR is what actually fixed that failure mode. Suppressing the
replay was an unnecessary side-effect that broke the whole point of
xAI's partnership integration.

Changes:
- agent/codex_responses_adapter.py — drop the `is_xai_responses` gate
  in `_chat_messages_to_responses_input`. Keep the kwarg in the
  signature for transport compatibility; update the docstring to
  document the May 2026 reversal.
- agent/transports/codex.py — restore
  `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI
  Responses path so xAI echoes encrypted reasoning back to us.
- tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three
  xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST
  include encrypted_content in the request).
- tests/agent/transports/test_codex_transport.py — flip the
  `include` assertions on `test_xai_reasoning_effort_passed` and
  `test_xai_grok_4_omits_reasoning_effort`; update the allowlist
  block comment.

The prelude-SSE fallback and the entitlement-403 surfacing fixes from
#26644 are untouched — they were independent fixes that happened to
ride along with the reasoning-replay gate.

Validation:
- Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py +
  tests/agent/transports/test_codex_transport.py → 65/65 pass
- Broader: tests/agent/transports/ + tests/run_agent/ →
  1674 passed, 3 skipped, 0 failures
- E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport
  build_kwargs): turn-1 request carries
  `include: ["reasoning.encrypted_content"]`; turn-2 input replays
  the encrypted_content blob from turn-1's
  `codex_reasoning_items`; native Codex unchanged.
2026-05-20 23:12:45 -07:00
liuhao1024
4ead464f97 fix(security): guard os.chmod(parent) against / and top-level dirs
Five call sites do os.chmod(path.parent, 0o700) without checking that
the parent resolves to a safe directory. If HERMES_HOME or another
path env var resolves to /, the chmod strips traversal permission from
the root inode and bricks the entire host.

Add secure_parent_dir() to hermes_constants.py that refuses to chmod
/ or any top-level directory (depth < 2). Replace all 5 call sites
with this helper.

Fixes #25821
2026-05-20 22:56:55 -07:00
helix4u
1a7bb988fc fix(gateway): harden kanban and provider cleanup races 2026-05-20 14:31:22 -07:00
Teknium
eeb747de25 feat(sessions): opt-in per-session JSON snapshot writer
PR #29182 deleted the per-session JSON snapshot writer outright because
state.db is canonical and the snapshots had no in-tree consumer.  Some
users have external tooling that reads `~/.hermes/sessions/session_{sid}.json`
directly, so reintroduce the writer behind a config flag that defaults
to off.

- Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG
- Restore `AIAgent._save_session_log` + `_clean_session_content` as
  gated methods.  When the flag is off the call is a fast no-op; when
  on, the writer behaves as before (atomic write, truncation guard
  preserved, REASONING_SCRATCHPAD → think tag normalization)
- Re-derive the target path from `agent.session_id` on each call so
  `/branch` and `/compress` re-points happen automatically — no need
  to restore the explicit re-point bookkeeping at call sites
- Wire the single call site in `_persist_session` (the cleanup-on-exit
  hook).  Did NOT restore the 7 intra-turn calls the original PR deleted
  — those were redundant writes within the same turn that doubled disk
  I/O without adding any persistence guarantee `_persist_session` does
  not already provide
- Read the flag once at agent init via `load_config()`, cache as
  `agent._session_json_enabled`
- Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn`
  to pin behavior: default off (no file), opt-in true (file written),
  no-op method on default agents, logs_dir retained unconditionally
- Update CONTRIBUTING.md and the bundled `hermes-agent` skill to
  document the flag and its default
2026-05-20 11:44:10 -07:00
yoniebans
cebd480818 refactor(session-log): drop branch/compress re-point of session_log_file
The attribute no longer exists; nothing to re-point.
2026-05-20 11:44:10 -07:00
yoniebans
c547392fd4 refactor(session-log): stop initializing session_log_file attribute 2026-05-20 11:44:10 -07:00
yoniebans
ce26785187 refactor(session-log): delete _save_session_log and all callers
state.db now stores every message field the JSON snapshot stored. Removed
the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O.
Body is in git history if it ever needs to come back.
2026-05-20 11:44:10 -07:00
teknium1
42c4288411 fix(chat_completions): broaden tool_name strip docstring + AUTHOR_MAP
Salvage follow-up to PR #28958 (savanne-kham):

- convert_messages() docstring now explicitly documents the tool_name strip
  alongside Codex fields, names which providers reject it (Fireworks,
  Moonshot/Kimi), and why permissive providers (OpenRouter, MiniMax)
  masked the bug.
- AUTHOR_MAP entry for savanne.kham@protonmail.com -> savanne-kham.
2026-05-20 02:44:08 -07:00
Savanne Kham
258965663c fix(chat_completions): strip tool_name from messages for strict providers
The 'tool_name' key on role=tool messages is an internal Hermes field
(stored in the messages.tool_name SQLite column for FTS indexing) that
is not part of the OpenAI Chat Completions schema. Strict OpenAI-compatible
providers — notably Moonshot AI (Kimi) — reject it with HTTP 400:

  Error from provider: Extra inputs are not permitted,
  field: 'messages[N].tool_name', value: 'execute_code'

Add 'tool_name' to the sanitize block in ChatCompletionsTransport.convert_messages
alongside the existing Codex Responses API fields (codex_reasoning_items,
codex_message_items) so it is popped before the request is sent.

Reproducer:
  hermes chat --model kimi-k2.6
  > list the top 5 Hacker News stories
  -> assistant emits tool_call(execute_code)
  -> tool result message gets tool_name='execute_code'
  -> next turn's payload includes messages[N].tool_name -> 400

Permissive backends (MiniMax, OpenRouter on most routes) ignore the extra
field and were masking the bug.
2026-05-20 02:44:08 -07:00
Teknium
6cb9917c73
perf(compression): defer feasibility check to first compression attempt (#28957)
`AIAgent.__init__` was eagerly calling
`_check_compression_model_feasibility()` which probes the auxiliary
provider chain and runs `get_model_context_length()` (potentially
network-bound) to decide whether the configured auxiliary model can
fit a full compression-threshold window. That cost ~440ms cold on
every agent construction.

Most `chat -q` invocations finish in 1-5 seconds and never accumulate
enough context to trip the compression threshold, so the feasibility
check is pure overhead. The result is also only consumed when
compression actually fires (the function adjusts the live threshold
downward if the aux model can't fit; absent that mutation, the gate
in `conversation_loop.py:442` would never fire anyway).

Defer to first `compress_context()` call via
`agent._compression_feasibility_checked` sentinel. Runs at most once
per agent lifetime, just before the first compression pass. The
warning storage (`_compression_warning`) and gateway replay
machinery is unchanged — it still emits to status_callback on the
first turn that actually needs compression.

E2E timing (chat -q 'hi', 3 runs each):
                BEFORE   AFTER    delta
  median wall   2.03s    1.86s    -8% (-169ms)
  min wall      1.92s    1.63s    -15% (-293ms)

Real cold-start observation (synthetic 31-turn agent loop): identical
behavior since feasibility check fires once on first compression and
caches. No semantic difference for sessions that DO compress.

UX trade-off: users with broken auxiliary-provider config no longer
see the warning at session start. They see it when compression first
fires — which is exactly when it matters. For users with working
config (the vast majority), the warning never fires anyway, so the
deferral is invisible.

Tests:
- tests/run_agent/test_compression_feasibility.py — 16/16 pass
  (the one test that asserted call-at-init was updated to drive the
  lazy check explicitly via agent._check_compression_model_feasibility())
- Live tmux session: 2-turn conversation + tool call completes clean,
  zero errors in agent.log
2026-05-19 17:27:17 -07:00
Teknium
64a9a199bb
fix(xai-oauth): pin inference base_url to x.ai origin (#28952)
XAI_BASE_URL / HERMES_XAI_BASE_URL let users repoint the OAuth-authenticated
inference endpoint, but the env override was an unguarded credential-leak
vector: a tampered .env or hostile shell init setting
XAI_BASE_URL=https://attacker.example/v1 would silently ship the SuperGrok
OAuth bearer to a third party on every request.

Add _xai_validate_inference_base_url() that pins the host to x.ai or a
*.x.ai subdomain and rejects non-HTTPS. On rejection, fall back to the
default with a warning rather than raise — a bad env var should not
deadlock auth, but should never leak the bearer either.

Apply at all three sites that read the env override for xai-oauth:
- hermes_cli/auth.py resolve_xai_oauth_runtime_credentials (main path)
- hermes_cli/auth.py _xai_oauth_loopback_login (initial login)
- agent/auxiliary_client.py _resolve_xai_oauth_for_aux (aux client)

E2E validated against four scenarios: attacker.example, lookalike
api.x.ai.evil.com, http:// downgrade on api.x.ai, and legit custom.x.ai
subdomain (which still resolves correctly).

Discovered while comparing against the opencode-grok-auth plugin
(github.com/ysnock404/opencode-grok-auth), which highlighted the same
guard on the OpenCode side.
2026-05-19 14:51:21 -07:00
Teknium
544c31b50b
perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)
* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):
  load_config()          265us/call  (125us deepcopy + 140us infra)
  load_config_readonly() 138us/call  (~48% faster)

`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that
mutate the result before save_config(), are not in the hot path, or
where the safety guarantee matters more than the perf.

Profile A/B (cached config, 21-turn agent loop):
                                BEFORE  AFTER   delta
  get_provider_request_timeout  55ms    16ms    -71%
  total function calls          399k    160k    -60%
  deepcopy calls (in hotspots)  34,398  ~0      ~elim

Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

* perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match
its leading anchor:

- `_PREFIX_RE`        gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE`    gated on `=` in text
- `_JSON_FIELD_RE`    gated on `:` and `"` in text
- `_AUTH_HEADER_RE`   gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE`      gated on `:` in text
- `_PRIVATE_KEY_RE`   gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE`    gated on `://` in text
- `_JWT_RE`           gated on `eyJ` in text
- URL userinfo/query  gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE`  gated on `+`

Microbench (5 typical log records, 20k iterations each):
                              BEFORE  AFTER  delta
  redact_sensitive_text per call  5.63us  1.79us  -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so
the chain saves ~1ms of CPU per conversation. Bigger win is the
reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).

Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):
                                      BEFORE  AFTER  delta
  _needs_thinking_reasoning_pad cum    18ms    1ms    -94%
  _copy_reasoning_content_for_api cum  17ms    1ms    -94%
  base_url_host_matches calls          3,342   372    -89%
  urlparse calls                       3,373   403    -88%
  total function calls                 296k    223k   -25%

Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped
2026-05-19 14:25:10 -07:00
justincc
a61420952e fix(agent): set tool_name on tool-result messages at construction time
Introduces make_tool_result_message() in tool_dispatch_helpers.py as the
single place where tool-result message dicts are built. All six construction
sites in tool_executor.py, agent_runtime_helpers.py, and mini_swe_runner.py
now use it, so tool_name is set in memory from the moment a message is
created rather than relying on fallback logic in the flush paths.

Fixes blank tool_name in both state.db and JSON session logs.

Adds tests.
2026-05-19 20:49:11 +01:00
Teknium
88ee58f7d2
fix(kanban): stale reclaim must not tick failure counter (#28680)
Follow-up to #28452. detect_stale_running() was calling
_record_task_failure() on every reclaim, which ticked the
consecutive_failures counter. With the default failure_limit=2,
two legitimately long-running tasks (>4 h without explicit
heartbeat) would auto-block via the spawn-failure circuit
breaker — even though no worker actually failed.

Stale reclaim is dispatcher-side absence-of-heartbeat detection,
not a worker fault. Removed the _record_task_failure() call;
the 'stale' event in task_events is still the audit surface,
but the failure counter is now reserved for spawn_failed /
timed_out / crashed (real failures).

Also documents the heartbeat requirement:
- KANBAN_GUIDANCE in agent/prompt_builder.py now states the
  rule ('call kanban_heartbeat at least once an hour for tasks
  running longer than 1 hour') so workers learn the contract.
- kanban.md adds the stale event row to the events table and
  flags the heartbeat requirement in the worker lifecycle list.

New regression test: test_detect_stale_does_not_tick_failure_counter
locks in the new behaviour.
2026-05-19 03:15:18 -07:00
vanthinh6886
62573f44cf fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes
1. trajectory_compressor.py: yaml.safe_load() returns None on empty
   files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
   adding `or {}` fallback. (HIGH — blocks startup with empty config)

2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
   try/except: cron/scheduler.py, hermes_cli/auth.py,
   agent/shell_hooks.py, tools/skill_usage.py,
   tools/environments/file_sync.py, tools/memory_tool.py. If unlock
   raises OSError, fd.close() is skipped and the lock is held forever.
   The msvcrt branches already had try/except; the fcntl branches did
   not. Fix by wrapping in try/except (OSError, IOError): pass.

3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
   followed by path.read_text() with no try/except. If file is deleted
   between the check and the read, FileNotFoundError propagates. Fix
   by using try/except FileNotFoundError.

4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
   can leave truncated JSON on crash, causing JSONDecodeError on next
   load. Fix by writing to tempfile + fsync + os.replace (atomic).
2026-05-19 00:12:41 -07:00
ooovenenoso
d759a67c0f fix: add recovery hints to loop guard warnings 2026-05-19 00:12:12 -07:00
Teknium
b5c1fe78aa
feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.
2026-05-18 21:38:05 -07:00
RyanRana
206f595f66 perf(prompt): cache kanban worker guidance at session init
Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens)
is session-static — the dispatcher decides at spawn time whether the
process is a kanban worker via the kanban_show tool's check_fn (gated
on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in
valid_tool_names and re-loading the reference on every system-prompt
rebuild (init + each context compression) is wasted work.

Caches the resolved string on agent._kanban_worker_guidance once in
agent_init and consumes it in system_prompt.build_system_prompt(),
with a getattr fallback for code paths that bypass agent_init.
2026-05-18 20:56:44 -07:00
xxxigm
3b6f57fa66 fix(run-agent): treat any 403 on xai-oauth as entitlement to stop refresh-loop
The existing ``_is_entitlement_failure`` heuristic only fires when
the response body contains specific substrings ("do not have an
active Grok subscription", etc.). xAI has been seen to 403 standard
SuperGrok subscribers with a terser body that doesn't match those
keywords (#26847), and the recovery path would then mint a fresh
token, get a fresh 403, and loop until Ctrl+C.

Add a defense-in-depth check at the recovery call site: any 403 on
``provider == "xai-oauth"`` short-circuits ``try_refresh_current``
so the error surfaces immediately with the friendly hint from
``_summarize_api_error``. Keeps the existing keyword path for all
other providers untouched.
2026-05-18 20:08:09 -07:00
briandevans
756900723a fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without
enforcement steering — agents narrate "calling tool X" without actually
emitting a tool call, or run partial loops. Both model families fit the
same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected
for (gpt, codex, gemini, gemma, grok, glm).

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>

Squashed salvage of:
- 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
- 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test

Fixes #28079.
2026-05-18 20:06:49 -07:00
AceWattGit
25e0f4d465 fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace
The conversation_loop.py references _pool_may_recover_from_rate_limit which
was defined in run_agent.py. After the conversation-loop extraction refactor,
the helper was no longer in the same module scope. Wrap the call as
_ra()._pool_may_recover_from_rate_limit() to route through the run_agent
monkeypatch namespace where the helper is available.

Adds regression test in test_gemini_fast_fallback.py.

Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.
2026-05-18 20:04:57 -07:00
zccyman
af78449acd feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644)
The background review prompts (_SKILL_REVIEW_PROMPT and
_COMBINED_REVIEW_PROMPT) now include explicit protection rules
for bundled, hub-installed, and pinned skills — aligning with
the curator's existing policy at curator.py L345/350.

Before this change, bg-review could freely rewrite bundled skills
like 'hermes-agent' or pinned skills, while the 7-day curator
explicitly skips them.

The review agent now sees:
  • Bundled skills (shipped with Hermes)
  • Hub-installed skills (installed via hermes skills install)
  • Pinned skills (marked via hermes curator pin)
If only protected skills need updating, the review says
'Nothing to save.' and stops.

Fixes #27644
2026-05-18 20:02:22 -07:00
burjorjee
52b049b560 fix: treat inline-shell timeout guard as timeout 2026-05-18 19:36:04 -07:00
墨綠BG
50e93f23f2 🐛 fix(memory): require newline after context tag 2026-05-18 10:53:08 -07:00
墨綠BG
341c8d3030 🐛 fix(memory): keep inline memory-context mentions visible 2026-05-18 10:53:08 -07:00