Commit graph

9577 commits

Author SHA1 Message Date
ygd58
51013268cf fix(cron): clarify schedule is required for create in tool schema
Grok models (and other LLMs) sometimes omit the schedule parameter
when calling the cronjob tool with action=create because the schema
only listed 'action' in required[] and the schedule description did
not explicitly state it was mandatory (issue #32427).

Fix: update schema descriptions to clearly state schedule is REQUIRED
for action=create, making this explicit for models that rely on
description text for parameter compliance.

Fixes #32427
2026-05-26 14:09:37 -07:00
Teknium
ccd3d04fc5
chore(models): swap qwen3.6-plus → qwen3.7-max in openrouter+nous lists (#32809)
Updates curated picker lists for both the OpenRouter fallback snapshot
(`OPENROUTER_MODELS`) and the Nous Portal list (`_PROVIDER_MODELS['nous']`).
Regenerates website/static/api/model-catalog.json via
`scripts/build_model_catalog.py` to keep the docs-hosted manifest in
sync (drift guard in `test_in_repo_lists_match_manifest`).

tests/hermes_cli/test_models.py fixtures updated — they pinned the
old model id as their live-fetch sample.
2026-05-26 14:01:47 -07:00
Teknium
8b69ec03af
feat(mcp): Nous-approved MCP catalog with interactive picker (#30870)
* feat(mcp): Nous-approved MCP catalog with interactive picker

Adds an optional-mcps/ directory mirroring optional-skills/: curated,
Nous-approved MCP servers shipped with the repo but disabled by default.
Presence in optional-mcps/ = approval. No community tier, no trust signals.
Entries are added by merging a PR.

New surface:
  hermes mcp                       Interactive catalog picker (default)
  hermes mcp catalog               Plain-text list, scriptable
  hermes mcp install <name>        Install a catalog entry

Picker behavior:
  not installed   -> install (clone/bootstrap if needed, prompt for creds)
  installed/off   -> enable
  installed/on    -> menu (disable / uninstall / reinstall)

Manifest schema (manifest_version: 1) supports:
- transport: stdio (command/args, ${INSTALL_DIR} substitution) or http (url)
- install: optional git clone + bootstrap commands (for repos that need
  local venv setup, like the n8n bridge); omit for npx/uvx servers
- auth: api_key (prompts -> ~/.hermes/.env), oauth (provider-mediated
  or native MCP), or none

Catalog entries are never auto-updated. Users re-run `hermes mcp install`
to refresh. Credentials always go to ~/.hermes/.env (the .env-is-for-secrets
rule), never to per-server env blocks.

Ships n8n as the reference manifest (https://github.com/CyberSamuraiX/hermes-n8n-mcp).

Tests: 19 catalog tests + E2E install/uninstall round-trip via the shipped
manifest.

* feat(mcp): tool-selection checklist + Linear catalog entry

Adds install-time tool selection so users only enable the MCP tools they
actually want, and ships Linear as a second reference catalog entry to
demonstrate the http+oauth path alongside n8n's stdio+api_key+git-bootstrap.

Tool selection flow:
  install (clone/auth/credentials) ->
  probe server for available tools ->
  curses checklist with pre-checked rows ->
  write mcp_servers.<name>.tools.include

Pre-check priority:
  1. user's prior tools.include  (reinstall preserves selection)
  2. manifest's tools.default_enabled  (curated subset)
  3. all probed tools  (default)

Probe-failure fallback (server unreachable, OAuth not yet complete,
backing service offline):
  - manifest declared default_enabled -> applied directly
  - no default declared -> no filter written (all-on when reachable)
  - both cases point user at hermes mcp configure <name>

Manifest schema additions:
  tools:
    default_enabled: [list, of, tool, names]   # optional

Updates:
  - optional-mcps/linear/manifest.yaml -- new reference entry (http+oauth)
  - optional-mcps/n8n/manifest.yaml -- tools.default_enabled set to the
    8 read-mostly tools; mutating tools (activate/deactivate, container_logs)
    pruned by default
  - docs: new 'Tool selection at install time' section in features/mcp.md

Tests: 7 new tests in TestToolSelection covering probe-success / probe-fail
matrix, manifest-default filtering, reinstall-preserves-selection, and
invalid-default-enabled rejection. 26 catalog tests + 32 existing
mcp_config tests passing.

* feat(mcp): polish — picker unification, include-mode convergence, hardening

Addresses review findings on PR #30870. Lands all improvements that
belong in this PR before merge; defers separate cleanup (consolidating
two probe implementations, change-detector tests) to follow-ups.

Picker UX (mcp_picker.py)
- Unifies catalog + custom (user-added) MCPs in one view with distinct
  status badges (available / enabled / installed (disabled) /
  custom — enabled / custom — disabled)
- Adds 'Configure tools (probe server + re-pick)' action to both the
  catalog-installed and custom-row submenus — the existing
  hermes mcp configure flow was previously unreachable from the picker
- Loops until ESC/q so the user can manage several entries in one
  session instead of having to re-launch
- Uninstall message now mentions .env credentials are preserved with a
  pointer to clean them up manually if no longer needed
- Surfaces a 'requires a newer Hermes' warning per future-manifest
  entry instead of silently hiding it

Catalog (mcp_catalog.py)
- catalog_diagnostics() exposes which manifests were skipped and why
  (future_manifest vs invalid) so UIs can give actionable feedback
- _do_git_install detects SHA-shaped refs (regex /[0-9a-f]{7,40}/)
  and skips the doomed 'git clone --branch <sha>' attempt — clone --branch
  only accepts branches/tags, so SHAs always failed noisily before
  falling back to the full-clone path
- Probe-success all-tools-enabled message now mentions that new tools
  the server adds later will be auto-enabled (no-filter mode)

Convergence (tools_config.py)
- _configure_mcp_tools_interactive now writes tools.include (whitelist)
  instead of tools.exclude (blacklist), matching the catalog flow and
  hermes mcp configure. The on-disk config shape no longer depends on
  which UI the user touched last
- Two existing tests updated to assert the new include-mode contract

Discoverability
- Setup wizard final step now prints 'Browse curated MCPs: hermes mcp'
- Three tip-corpus entries pointing at the new catalog
- Docs updated with: trust model (manifests run code locally, gated by
  PR review, but read before installing), runtime ${ENV_VAR} substitution
  semantics, and the manifest_version forward-compat behavior

Tests
- 7 new tests covering future-manifest diagnostics, custom MCP picker
  rows, SHA-ref git-install path, branch-ref git-install path, and the
  tools_config include-mode write contract
- 80 MCP-related tests passing across test_mcp_catalog.py,
  test_mcp_config.py, test_mcp_tools_config.py

* fix(mcp): drop setup-wizard catalog hint to satisfy supply-chain scanner

The wizard line 'Browse curated MCPs: hermes mcp' triggered the
CI supply-chain scanner because it pattern-matches on edits to any
file named hermes_cli/setup.py — that filename matches the Python
'install-hook file' heuristic even though this setup.py is the
user-facing 'hermes setup' wizard, not a packaging install hook.

The catalog is already surfaced via three tip-corpus entries in
hermes_cli/tips.py (which the scanner doesn't flag), so dropping the
wizard mention loses no discoverability. Worth revisiting after a
scanner allowlist for this specific file lands.
2026-05-26 12:48:14 -07:00
Teknium
2517917de3
fix(cli): restore fallback paste collapse + handle long single-line pastes (#32447)
Follow-up to #32087 after community report from @ethernet that 8000-char
single-line pastes get dumped raw into the input box.

A) Fallback regression revert
   paste_collapse_threshold_fallback default: 0 -> 5
   #32087 disabled the fallback handler by default. The fallback path
   has been always-on with line_count >= 5 since #3065 (March 2026);
   the previous shape was the salvaged contributor's design and didn't
   match pre-existing behavior for terminals without bracketed paste
   support (Windows terminals, some SSH setups). Restoring the original
   on-by-default.

B) Long single-line paste guard
   New config key: paste_collapse_char_threshold (default 2000)
   Bracketed-paste handler and fallback handler now BOTH collapse when
   line count >= line threshold OR total char length >= char threshold.
   Catches the case ethernet hit: ~8000 chars of minified JSON / log
   output on a single line dumped raw into the buffer.
   TUI mirrors the same config via uiStore.pasteCollapseChars.
   Set 0 to disable.

Defaults verified:
  paste_collapse_threshold: 5
  paste_collapse_threshold_fallback: 5
  paste_collapse_char_threshold: 2000

Tests:
  tests/hermes_cli/test_config.py: 87/87 pass
  ui-tui useConfigSync.test.ts: 34/34 pass
  ui-tui useComposerState.test.ts: 9/9 pass
  tsc: 0 new errors in touched files
2026-05-25 23:49:01 -07:00
Teknium
31c8d5ff5f chore(wecom): make defusedxml dep acquireable and tolerant of absence
Follow-up on top of @TheOnlyMika's #32155 cherry-pick. The defusedxml
hardening import was unconditional, which would break the gateway for
anyone running a WeComCallback adapter without the (transitive-only)
defusedxml present.

- Wrap the import in the same try/except pattern as aiohttp/httpx in
  the same file. Sets DEFUSEDXML_AVAILABLE flag.
- Extend check_wecom_callback_requirements() to gate on the flag, so
  the gateway logs the actual missing dep and skips the adapter
  instead of crashing.
- Add [wecom] extra to pyproject.toml with defusedxml==0.7.1.
- Register platform.wecom_callback in tools/lazy_deps.py so users get
  prompted to install it on first WeComCallback configuration, same
  pattern as discord/slack/matrix.

defusedxml is still the right call for pre-auth XML parsing — this
commit just makes the dep declarative and recoverable instead of a
hard import-time crash.
2026-05-25 23:30:43 -07:00
TheOnlyMika
5744b17579 harden: restrict markdown link schemes; parse untrusted XML with defusedxml
Two small defensive-hardening changes:

- web/src/components/Markdown.tsx: render links only for http(s)/mailto
  schemes; other schemes (javascript:, data:, vbscript:) are dropped to
  plain text so a crafted link in rendered content can't execute on click.

- gateway/platforms/wecom_callback.py: parse the untrusted, pre-auth WeCom
  callback request body with defusedxml instead of xml.etree, blocking
  entity-expansion / billion-laughs (and XXE) on the parse path. defusedxml
  is already a dependency (uv.lock); response-building XML in
  wecom_crypto.py is unchanged (it is not parsed from untrusted input).

Verified: dashboard typechecks and builds; defusedxml blocks an
entity-expansion payload while valid WeCom envelopes still parse.
2026-05-25 23:30:43 -07:00
dearmayo
f4953bc648 fix(subdirectory_hints): prevent loading AGENTS.md outside workspace
SubdirectoryHintTracker was scanning directories outside the active
working directory, allowing files like ~/.codex/AGENTS.md or
~/.claude/CLAUDE.md to be loaded and injected into the agent context.
This causes cross-agent context contamination and instruction mixup.

Add _is_ancestor_or_same() helper and a path boundary check in
_is_valid_subdir(): only directories within the working directory tree
(i.e. path.is_relative_to(working_dir)) are allowed.

Also add exist_ok=True to mkdir() calls in new tests to prevent
pytest-xdist race conditions when workers share the same tmp_path parent.

Tests added:
- test_outside_working_dir_rejected: verifies sibling dirs are blocked
- test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked
- test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected
- test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace
2026-05-25 23:17:33 -07:00
Krisli Dimo
9d10c45e32 fix(telegram): tighten table row-group spacing and drop redundant first bullet
The GFM → Telegram-row-group rewriter previously joined every line in
every row with a blank line ("\n\n".join(rendered_rows)), which made
multi-column tables explode into one-bullet-per-paragraph walls on
mobile.  It also emitted the row heading twice when the table had no
row-label column: once as the standalone bold heading and once again
as the first labeled bullet (heading == headers[0] == data_cells[0]).

This commit:

* Uses single newlines between the heading and its bullets within a
  row-group, and a blank line only BETWEEN row-groups.
* Skips any bullet whose value duplicates the heading text when the
  table has no row-label column (the heading already carries that
  information).  Tables WITH a row-label column are unaffected since
  the heading comes from the label cell and never duplicates a header.

Updated existing test assertions accordingly and added two regression
tests: one that reproduces the screenshot bug (wide five-column "Plays"
comparison table) and one that pins the row-label-column behavior so
the dedup logic doesn't accidentally swallow real data.

tests/gateway/test_telegram_format.py: 101 passed
2026-05-25 23:16:00 -07:00
kshitij
66851dc413
chore: add krislidimo to AUTHOR_MAP for PR #29775 (#32434) 2026-05-25 23:15:56 -07:00
Teknium
d8703e27f5
feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345)
Layered safety so the Skills Hub at /docs/skills stays in sync without
silent rot. Three pieces:

1. build_skills_index.py — refuses to ship a degenerate index.
   EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50,
   official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source
   collapsing to zero (the silent OpenAI breakage that hid for weeks) now
   fails the workflow loud — broken index never reaches the live site.

2. extract-skills.py + the React page — visible freshness signal.
   Sidecar website/src/data/skills-meta.json carries the index's
   generated_at timestamp, plus per-source counts. Skills Hub renders a
   'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under
   the hero copy. If the cron stalls, users see the staleness immediately.

3. .github/workflows/skills-index-freshness.yml — watchdog cron.
   Every 4 hours, fetches the live /docs/api/skills-index.json, validates
   shape, checks age (>26h is stale), checks the same per-source floors,
   and opens (or appends to) a GitHub issue when anything is off. The
   issue is title-prefixed [skills-index-watchdog] so subsequent failures
   append a comment instead of spamming new issues.

Net effect:
- A silent regression like 'OpenAI tap moved its skills' now fails the
  build instead of shipping a quietly broken catalog.
- A stuck cron (like the landingpage breakage that ran red for weeks) now
  files an issue within 4 hours.
- Users see how fresh the catalog is on the page itself.

Test plan:
- Local: built skills-meta.json from the live index → 'Catalog refreshed
  N minutes ago' rendered correctly in the static HTML.
- Probe logic dry-run against the live index: total=2456, all 6 sources
  above floor, age 0.1h — issues=NONE.
- Triggered skills-index.yml manually; both jobs green, deploy-site.yml
  dispatch fired.
2026-05-25 23:10:45 -07:00
Teknium
cea87d9139
fix(skills-hub): show every catalog source on /docs/skills (skills.sh, ClawHub, browse.sh, OpenAI, …) (#32336)
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run
Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Build Skills Index / build-index (push) Waiting to run
Build Skills Index / trigger-deploy (push) Blocked by required conditions
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
The Skills Hub page was stuck on a stale Feb 25 snapshot, showing only Built-in
+ Optional + Anthropic + LobeHub. The unified index already has 2078 skills
from skills.sh / ClawHub / LobeHub / GitHub taps / Claude Marketplace, and
BrowseShSource adds another ~330 — none of it was reaching the page.

Changes:

- website/scripts/extract-skills.py: read website/static/api/skills-index.json
  (the unified multi-source catalog, rebuilt twice daily) as the canonical
  external source. Keep the legacy skills/index-cache/ fallback for offline
  builds. Add friendly per-source labels (skills.sh, ClawHub, browse.sh,
  OpenAI, HuggingFace, Anthropic, LobeHub, etc.) and per-entry installCmd.
- website/src/pages/skills/index.tsx: add source pills + ordering for the 11
  new sources; render installCmd from the index entry.
- website/scripts/prebuild.mjs: when no local skills-index.json exists, fetch
  the live one from hermes-agent.nousresearch.com so local 'npm run build'
  matches production without burning GitHub API quota.
- scripts/build_skills_index.py: crawl BrowseShSource so browse.sh entries
  land in the unified index. Adjust source_order.
- tools/skills_hub.py: GitHubSource.DEFAULT_TAPS — openai/skills moved its
  skills into skills/.curated/ and skills/.system/, so add both as explicit
  taps (the listing code skips dotted dirs by design). Drop
  VoltAgent/awesome-agent-skills (README-only, no SKILL.md files) and
  MiniMax-AI/cli (singular skill, not a tap directory). Net effect: github
  source jumps from 83 → 143 skills, with OpenAI properly included.
- .github/workflows/deploy-site.yml: build the unified index BEFORE running
  extract-skills.py — previous order meant extract-skills always fell back
  to the legacy cache. Drop the 'skip if file exists' guard; the file is
  gitignored and must be rebuilt every deploy.
- .github/workflows/skills-index.yml: drop the broken 'deploy-with-index'
  job (it cp'd 'landingpage/\*' which no longer exists, failing every cron
  run since the landingpage move). Replace it with a workflow_dispatch
  trigger of deploy-site.yml so the index refresh still reaches production
  on schedule.
- website/docs/user-guide/features/skills.md: drop VoltAgent from the
  default-taps doc list to match the code.

Before: 695 skills (Built-in 90, Optional 84, Anthropic 16, LobeHub 505).
After:  2168 skills across 9 source pills, including the 1212 skills.sh
        entries the user expected to see.
2026-05-25 18:34:54 -07:00
MorAlekss
c26af46811 fix(skills): reject symlinks in skill bundles before install 2026-05-25 18:33:02 -07:00
Teknium
fe9744cbee chore(release): map ffr31mr + TheOnlyMika in AUTHOR_MAP
Pre-salvage prep for the must-have security cluster (#32103, #32155).
#32103 author commit uses dearmayo@localhost; PR opener is ffr31mr —
same pattern as the existing holynn-q localhost mapping.
2026-05-25 18:33:02 -07:00
Teknium
ccd899318e
fix(cron): split scanner into two tiers so skill prose stops false-positiving (#32339)
The runtime cron prompt scanner (added in #3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of #3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
2026-05-25 18:20:45 -07:00
Teknium
e3236e99a4 fix(anthropic): API-key path skips OAuth autodiscovery + prunes stale entries
When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude
Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY
to ~/.hermes/.env and zeros ANTHROPIC_TOKEN.  That env-var pattern is the
user's explicit choice of auth method — API key, not OAuth.

But the anthropic credential pool's autodiscovery (_seed_from_singletons)
unconditionally read ~/.claude/.credentials.json from the Claude Code CLI
and any saved hermes_pkce creds, and added them to the SAME anthropic
pool as the user's API key.  Two problems:

  1. Even with the API key at higher priority, a 401/429 on the API key
     would rotate the session onto an autodiscovered OAuth credential,
     silently flipping the agent into the Claude Code masquerade
     mid-conversation: 'You are Claude Code' system block, every tool
     renamed to mcp_*, claude-cli User-Agent header.

  2. Switching OAuth → API key at `hermes setup` cleared the env vars
     but left previously-seeded OAuth entries dormant in auth.json,
     where rotation could revive them.

The user picking the API-key path is explicitly opting OUT of the
masquerade.  Mixing OAuth credentials into their pool defeats that
choice.

Fix: in `_seed_from_singletons` for provider='anthropic', detect the
API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and:
  - Skip calling read_claude_code_credentials() and
    read_hermes_oauth_credentials() entirely
  - Prune any stale hermes_pkce / claude_code entries that may already
    be in the on-disk pool

OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery
continues to fire as before.

Tests: 3 new regression tests (api-key skips autodiscovery, api-key
prunes stale entries, oauth path still autodiscovers).  Full file 70/70.
2026-05-25 17:41:40 -07:00
Teknium
2c6bbaf352
fix(gateway): coerce scalar model: to dict before /model --global persist (#32272)
Reported via AskClaw. When config.yaml has `model: <name>` (flat string)
instead of the nested `model: {default: ..., provider: ...}` form, every
gateway `/model X --global` crashed silently with

    TypeError: 'str' object does not support item assignment

The persist block did:

    model_cfg = cfg.setdefault("model", {})
    model_cfg["default"] = result.new_model

`setdefault` returns the existing scalar, and the next assignment blows
up. The 'switch failed' warning was logged at WARNING level and the user
never saw why their persist didn't stick.

Coerce scalar/None `model:` into a dict before mutation, in both the
gateway path (`gateway/run.py`) and the sister site in
`hermes_cli/doctor.py --fix` (same setdefault-on-string flaw). The CLI
`/model` path is unaffected because it goes through `_set_nested` which
already replaces scalar leaves with dicts.

Regression test `tests/gateway/test_model_command_flat_string_config.py`
covers the flat-string, missing, and proper-dict cases. Without the fix,
the flat-string case fails with the exact original TypeError.
2026-05-25 15:22:23 -07:00
Teknium
de76f4dbcf
fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271)
`load_hermes_dotenv()` is called at module-import time from cli.py,
hermes_cli/main.py, run_agent.py, trajectory_compressor.py, gateway/run.py,
tui_gateway/server.py, acp_adapter/entry.py, and a few others. Each call
triggered `_apply_external_secret_sources()`, which re-parsed config,
re-fetched from Bitwarden Secrets Manager (its own 300s cache mostly absorbed
this), re-ran the ASCII sanitization sweep, and reprinted

  Bitwarden Secrets Manager: applied N secret(s) (...)

to stderr. Users saw the status line 3-5x per CLI startup.

Guard the function with a process-level set of HERMES_HOME paths that have
already had external secrets applied. Subsequent calls for the same home_path
are no-ops. `reset_secret_source_cache()` lets tests (and any future
long-running consumer that wants to refresh after a config change) force a
re-pull.
2026-05-25 15:18:55 -07:00
Teknium
6bd0be30be
feat(patch): indentation preservation, CRLF preservation, per-file failure escalation (#507) (#32273)
Three granular patch-tool refinements from the Roo Code deep-dive (#507).

## Indentation preservation (fuzzy_match.py)

When fuzzy_find_and_replace matches via a non-exact strategy, the file's
indentation may differ from what the LLM sent in old_string/new_string
(common case: model sends zero-indent old/new for a method body that
lives inside an 8-space-indented class). Before this commit the
replacement was spliced in verbatim, producing a file with a broken
indent level that may still parse but is logically wrong.

The fix computes the indent delta between old_string's first meaningful
line and the matched region's first meaningful line, then re-indents
every line of new_string by that delta. Exact-strategy matches are
untouched (passthrough). Same approach as Roo Code's
multi-search-replace.ts:466-500.

## CRLF preservation (file_operations.py)

Models nearly always send tool args with bare LF endings (JSON-encoded),
but the file on disk may have CRLF (Windows-line-ending configs, .bat,
.cmd, .ini files). Before this commit:

- write_file silently normalized CRLF to LF on every overwrite
- patch produced mixed-ending files: the substituted region had LF,
  the surrounding context kept CRLF

The fix detects the file's existing line endings (via pre_content if
already read for lint/LSP, otherwise a tiny head -c 4096 probe), and
normalizes the entire write to that ending. New files are written
verbatim (no detection possible).

## Per-file failure escalation (file_tools.py)

When the agent fails to patch the same file 3+ times in a row, the
existing 'old_string not found' hint isn't strong enough — the model
keeps retrying with variations against a stale view of the file.

The fix tracks consecutive failures per (task_id, resolved_path) and
injects an escalating hint after 3 failures: 'This is failure #N
patching X. Stop retrying. Either re-read fresh, use longer context,
or fall back to write_file.' Counter resets on a successful patch to
the same path.

## Validation

- 22 new tests across tests/tools/test_fuzzy_match.py (5),
  test_line_ending_preservation.py (12), test_patch_failure_tracking.py (5)
- All existing tests pass (165/165 in the touched files)
- E2E verified with real _handle_patch / _handle_write_file calls
  against real CRLF files and real failure loops

Closes part of #507. The remaining open items in #507 (2b start_line
hint, behavioral rules) were declined after audit:
- 2b adds schema bloat for a problem the existing 'multiple matches'
  contract already handles
- Behavioral rules conflict with the personality system

Items 1, 2d, 2e, 3, 4 of #507 were already landed in earlier work.
2026-05-25 15:18:45 -07:00
Teknium
c2aa235328
fix(agent): log outer-loop exceptions at ERROR with traceback (#32264)
The outer 'except Exception' guard in run_conversation() captures
exceptions raised inside the agent loop (during streaming, tool
dispatch, message construction, etc.) and prints a one-line summary
to the screen.  The traceback was only logged at DEBUG, so it never
landed in errors.log (WARNING+) and was lost.

For intermittent failures — the most important kind to debug — users
saw 'Error during OpenAI-compatible API call #N: <message>' on
screen with no way to recover the call site.  Switching to
logger.exception() emits the full traceback at ERROR so it goes to
both agent.log and errors.log automatically.

This is a pure logging change; control flow is unchanged.
2026-05-25 15:16:54 -07:00
Teknium
30928f945f
fix(dashboard): suffix-allowlist plugin assets + denylist subprocess-influencing env vars (#32277)
Two posture fixes surfaced by the web-pentest skill self-test against
the dashboard (issue #32267).

1. /dashboard-plugins/<name>/<path> previously returned 200 for any
   file inside the plugin's dashboard directory — including
   plugin_api.py and __pycache__/*.pyc. The path is unauthenticated by
   architecture (SPA loads JS via <script src> and CSS via <link href>,
   neither of which can attach a custom auth header), so the fix is
   not "require token" — it's "restrict to browser-fetchable suffixes."
   Allowlist now: .js .mjs .css .json .html .svg .png .jpg .jpeg .gif
   .webp .ico .woff .woff2 .ttf .otf .map. Everything else → 404.

   This stops a private user-installed plugin's Python source from
   being readable by anyone reachable on the dashboard's loopback port
   (other local users on a shared box, sidecar containers sharing the
   host netns).

2. save_env_value() now refuses to persist env-var names that
   influence how the next subprocess executes: LD_PRELOAD,
   LD_LIBRARY_PATH, LD_AUDIT, DYLD_*, PYTHONPATH, PYTHONHOME,
   PYTHONSTARTUP, NODE_OPTIONS, NODE_PATH, PATH, SHELL, EDITOR,
   VISUAL, PAGER, BROWSER, GIT_SSH_COMMAND, GIT_EXEC_PATH; plus
   HERMES_HOME / HERMES_PROFILE / HERMES_CONFIG / HERMES_ENV.

   PUT /api/env is authed but the session token lives in the SPA HTML
   where any future plugin XSS or local process can read it. Without
   this gate, a token-holder could plant LD_PRELOAD in .env and the
   next hermes process start would load attacker code via the dotenv
   to os.environ chain. This is enforced on write only — pre-existing
   .env values are left alone (the gate is in save_env_value, not in
   load_env). PUT /api/env now returns 400 with the explanatory
   message instead of an opaque 500.

   IMPORTANT: HERMES_* overall is NOT blocked — only the four runtime
   location names. Integration credentials following the HERMES_*
   convention (HERMES_GEMINI_*, HERMES_LANGFUSE_*, HERMES_SPOTIFY_*,
   HERMES_QWEN_BASE_URL, ...) keep working.

Regression tests cover both fixes (30 new test cases). No existing
tests changed; 257 passing in tests/hermes_cli/.

Closes #32267.
2026-05-25 15:07:19 -07:00
teknium1
27df4b3882 fix(telegram): exempt reply_to_mode=off DM topic sends from anchor-required guard
Salvage follow-up. The new private-DM-topic fail-loud contract from
PR #27107 hits 'requires a reply anchor' when reply_to_mode='off' is
configured, even though commit 21a15b671 (PR #23994) verified that
message_thread_id alone routes correctly on python-telegram-bot's
reference client when the user has explicitly opted out of quote
bubbles. Carve out the explicit opt-in path so users on reply_to_mode
'off' aren't regressed — the new guard now only applies to callers
that didn't ask for the anchor to be suppressed.
2026-05-25 14:54:02 -07:00
teknium1
926da69b45 test(telegram): switch transient-flake retry test to group chat
Salvage follow-up. The transient thread-not-found retry test was
exercising chat_id='123' (positive, looks-like-private) which now
hits the new private-DM-topic fail-closed contract. The test's
intent is the transient-flake retry on real forum topics in groups,
so use -100123 to make the scenario unambiguous.
2026-05-25 14:54:02 -07:00
stepanov1975
5b1c75d662 refactor: simplify Telegram DM topic refresh
(cherry picked from commit bf8048ad87)
2026-05-25 14:54:02 -07:00
stepanov1975
c394e7919d fix: refresh stale Telegram DM topic threads
(cherry picked from commit 26b87057ad)
2026-05-25 14:54:02 -07:00
stepanov1975
dcd504cea4 fix: auto-create Telegram DM topics for delivery
(cherry picked from commit 5cde0614e8)
2026-05-25 14:54:02 -07:00
stepanov1975
96c71d8c46 fix: require anchors for Telegram DM topic deliveries
(cherry picked from commit 6daafb3fd4)
2026-05-25 14:54:02 -07:00
stepanov1975
6b7da11749 test: isolate API server env in gateway tests
(cherry picked from commit 3d585f8db5)
2026-05-25 14:54:02 -07:00
stepanov1975
415be55394 fix: route Telegram DM topic deliveries directly
(cherry picked from commit ad8f97db6c)
2026-05-25 14:54:02 -07:00
Teknium
0dee92df22
feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269)
Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:

1. tools/threat_patterns.py — single source of truth for injection/promptware
   patterns. Replaces the duplicated pattern lists in prompt_builder.py and
   memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
   heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
   override, known framework names). Three scopes — 'all' (narrow, classic
   injection), 'context' (adds promptware/role-play, broader detection),
   'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).

2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
   Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
   frozen system-prompt snapshot. Live state keeps the original so the
   user can still inspect + remove via memory(action=read/remove). Scan is
   deterministic from disk bytes — prefix-cache invariant holds.

3. make_tool_result_message() wraps results from high-risk tools
   (web_extract, web_search, browser_*, mcp_*) in
   <untrusted_tool_result source="...">...</untrusted_tool_result>
   delimiters with framing prose telling the model the content is data,
   not instructions. Architectural defense against indirect injection
   from poisoned web pages, GitHub issues, MCP responses — does NOT
   regex-scan tool results (pattern arms race + per-iteration latency).
   Multimodal content lists pass through unwrapped to preserve adapter
   compatibility.

Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.

Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
  test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
  path, blocked from MEMORY.md snapshot, wrapped in delimiters when
  arriving via web_extract. Legitimate 'you must follow conventions'
  phrasing not flagged.

Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
  block-with-placeholder — there's no warn mode that makes sense)

Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.
2026-05-25 14:52:24 -07:00
Teknium
b6ce7a451f chore(release): add ronhi for PR #29523 salvage
Maps the machine-local commit email (ronhi@buildabear1.localdomain) to
the GitHub login RonHillDev so the attribution check passes.
2026-05-25 14:51:43 -07:00
ronhi
bbc8f2f961 chore(models): drop retired grok-4-1-fast from metadata, tests, docs
xAI retired grok-4-1-fast. hermes_cli/models.py already removed it from
the static fallback in an earlier commit, but the context-length
metadata, the tests pinning those values, and the provider doc still
referenced the retired ID. Clean those up so retired model names stop
appearing in user-facing output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 14:51:43 -07:00
Teknium
263e008d6b
feat(skills): add web-pentest optional skill (#32265)
Adds optional-skills/security/web-pentest/ — an authorized web app
penetration testing skill adapted from Shannon's methodology (concepts
only; AGPL-clean fresh implementation).

Phased: recon (read-only) → vuln analysis (delegate_task per OWASP
class) → proof-based exploitation → report.

Guardrails baked in:
- Authorization gate before first active scan (templates/authorization.md)
- Scope allowlist (scope.txt) consulted by recon-scan.sh and
  documented as the rule for every active request
- Aux-client leakage warning (compression + title gen replay history;
  payloads/creds must not enter chat verbatim)
- Bypass-exhaustion discipline before false-positive classification
- L3/L4 (proof-required) for reportable findings; L1/L2 listed as
  candidates only

Closes #400. Supersedes #21845 (plugin-shaped proposal; skill-shaped is
cheaper and matches the existing optional-skills/security/ pattern).
2026-05-25 14:51:41 -07:00
teknium1
386f245d9d feat(skills): add optional openhands skill — closes #477
Adds an optional autonomous-ai-agents skill that delegates coding tasks
to the OpenHands CLI (https://github.com/All-Hands-AI/OpenHands). Sits
alongside claude-code / codex / opencode and is the model-agnostic
option in that family — any LiteLLM-supported provider works.

This is a ground-truth rewrite of #19325 by @xzessmedia (Tim Koepsel).
The original PR's SKILL.md was drafted by the OpenHands agent itself and
hallucinated several flags that don't exist in the real CLI (\`--model\`,
\`--max-iterations\`, \`--workspace\`, \`--sandbox docker\`), pointed at
the wrong PyPI package (\`openhands-ai\`, which is the legacy V0 SDK),
and claimed native Windows support that the upstream docs explicitly
disclaim. Rather than cherry-pick and rewrite half the lines under
contributor authorship, the SKILL.md was rebuilt against a verified
install (\`uv tool install openhands --python 3.12\`) and a real
end-to-end \`--headless --json\` run against openrouter/openai/gpt-4o-mini.

Authorship credited via the \`author:\` frontmatter field and an
AUTHOR_MAP entry in scripts/release.py.

Changes:
- optional-skills/autonomous-ai-agents/openhands/SKILL.md (new)
- website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-openhands.md (auto-gen)
- website/docs/reference/optional-skills-catalog.md (one new row)
- website/sidebars.ts (one new entry under Optional → Autonomous AI Agents)
- scripts/release.py (AUTHOR_MAP entry for xzessmedia)

Pitfalls documented in the SKILL came from running the tool, not from
the upstream README: LiteLLM bedrock/sagemaker stderr noise on every
invocation, banner spam (\`OPENHANDS_SUPPRESS_BANNER=1\` required),
\`--override-with-envs\` mandatory or the CLI ignores LLM_* env vars
entirely, the dashed-vs-undashed Conversation ID footgun for \`--resume\`,
LiteLLM model-slug double-prefix when going through OpenRouter.
2026-05-25 14:49:34 -07:00
Teknium
5671461c0c
feat(skills): add code-wiki skill — closes #486 (#32240)
* feat(skills): add code-wiki skill — closes #486

Bundled skill at skills/software-development/code-wiki/ that generates
comprehensive documentation for any codebase: project overview, architecture
walkthrough with Mermaid flowchart, per-module deep-dives, class diagram,
sequence diagrams, getting-started guide, and (when applicable) API reference.

Output defaults to ~/.hermes/wikis/<repo-name>/ (external to repo, like
Google CodeWiki); in-repo output supported when user explicitly requests it.

Uses only existing Hermes tools (terminal, read_file, search_files,
write_file) — no Docker, no external services, no extra dependencies. Works
on local repos and GitHub URLs (shallow-clones to a temp dir). Bounded scope
defaults (depth 3, cap 10 modules) keep token cost reasonable on large repos.

* refactor(skills): move code-wiki to optional-skills

Per the 'when in doubt, optional' rule — wiki generation is a 'I want this
big thing right now' capability, not daily-driver behavior. Lines up with
finance/research/blockchain skills as install-on-demand rather than always
loaded.

Install via: hermes skills install official/software-development/code-wiki
2026-05-25 14:48:53 -07:00
Teknium
5caeb65a08 test(tts): regression coverage for #29417 double-[pause] fix
Three new tests in tests/tools/test_tts_xai_speech_tags.py:

- multi_paragraph_emits_single_pause — the headline #29417 case.
  Requires a first sentence of 12+ chars to hit the
  _XAI_FIRST_SENTENCE_RE length floor; the trivial 'Hello.\\n\\nWorld.'
  case dodged the bug by accident, which is why the PR's quoted
  repro didn't reproduce.  Uses the longer 'Welcome to the demo of
  our new product line.\\n\\nIt has many features.' shape that
  actually trips the bug.
- single_paragraph_still_gets_first_sentence_pause — sanity guard
  that the fix only suppresses the first-sentence pass when a
  paragraph pass injected [pause], so plain single-paragraph input
  still gets its leading pause.
- single_newline_still_gets_first_sentence_pause — single newline
  isn't a paragraph break, no [pause] from the paragraph pass, so
  the first-sentence pause MUST still fire.  Catches over-broad
  fixes.
2026-05-25 14:30:06 -07:00
EloquentBrush0x
1d73d5facc fix(tts): prevent double [pause] in xAI auto speech tags for multi-paragraph text
_apply_xai_auto_speech_tags runs two independent transformations:
  1. paragraph breaks (\n\n) → " [pause] "
  2. first-sentence boundary → " [pause] "

Both fired unconditionally, so multi-paragraph input produced
"Hello world. [pause] [pause] Second paragraph." — an unnatural
double pause in the TTS audio.

Guard the first-sentence substitution with _XAI_SPEECH_TAG_RE.search(clean):
if the paragraph pass already inserted a [pause] tag, skip the
first-sentence pass. Single-paragraph behavior is unchanged.
2026-05-25 14:30:06 -07:00
alt-glitch
b62af47da8 chore: drop stale line-number reference in PRIORITY path comment
The cherry-pick comment referenced 'line ~6771' for the /stop handler,
but on current main the handler is at a different offset. Remove the
hard-coded line number — the 'above' reference is sufficient.
2026-05-25 16:23:24 +00:00
xxxigm
737ee81167 test(gateway): regression tests for #30170 subagent interrupt protection
17 new tests in tests/gateway/test_subagent_protection_30170.py pin
down both the detection helper and the demotion behaviour:

  * TestAgentHasActiveSubagents — 11 cases covering the precision and
    defensiveness of _agent_has_active_subagents:
      - returns False for None, _AGENT_PENDING_SENTINEL, and stub
        agents that lack the _active_children attribute;
      - returns False for an empty list (the steady state of an idle
        AIAgent);
      - returns True for one or many children;
      - works when _active_children_lock is None (test stubs);
      - rejects truthy MagicMock auto-attributes — this is the
        regression-guard for "every MagicMock-based gateway test
        suddenly demotes to queue mode" (which is how this was
        originally found);
      - accepts list/tuple/set as the children container.

  * TestBusyHandlerDemotesInterruptForSubagents — 6 cases driving
    _handle_active_session_busy_message directly:
      - parent.interrupt is NOT called when subagents are active,
        message is still merged into the pending queue;
      - ack copy mentions "Subagent working", "queued", and the
        /stop escape hatch — and does NOT mention "Interrupting";
      - with no subagents, behaviour is byte-identical to the
        pre-#30170 interrupt path (parent.interrupt called with the
        user text, ack says "Interrupting");
      - configured queue mode keeps its vanilla "Queued for the next
        turn" ack (the #30170 demotion-specific copy must NOT fire);
      - configured steer mode still routes to running_agent.steer()
        even when subagents are active (the guard is interrupt-only);
      - _AGENT_PENDING_SENTINEL does not trigger demotion.

Refs #30170.
2026-05-25 16:23:24 +00:00
xxxigm
99d62f6ba1 fix(gateway): protect in-flight subagents from busy-mode interrupts (#30170)
When a user sends a conversational follow-up while delegate_task is
running, gateway/run.py calls running_agent.interrupt(event.text) on
the PARENT agent. AIAgent.interrupt() then cascades synchronously
through self._active_children and calls interrupt() on every child
subagent, aborting in-flight delegate_task work. The user sees the
fallback cascade with no root-cause in the gateway log, and minutes of
subagent progress are destroyed — the exact failure mode reported in

Add GatewayRunner._agent_has_active_subagents(running_agent) — a
static helper that returns True iff the parent is currently driving
subagents via delegate_task. The helper is type-defensive: it ignores
truthy MagicMock auto-attributes (so this doesn't accidentally fire
in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL
placeholder, and missing locks.

Wire the helper into both interrupt branches:

  1. _handle_active_session_busy_message — the adapter-level busy
     handler. When busy_input_mode == 'interrupt' AND the parent has
     active subagents, demote to 'queue' semantics: skip the
     parent.interrupt() call, merge the message into the pending
     queue, and surface a dedicated ack (" Subagent working — your
     message is queued for when it finishes (use /stop to cancel
     everything).") so the operator knows the message wasn't lost and
     discovers the explicit escape hatch.

  2. The PRIORITY interrupt branch inside _handle_message — the
     non-command fast path. Same rationale, same demotion. Routes
     through _queue_or_replace_pending_event so the next-turn pickup
     stays unchanged.

Explicit /stop and /new commands take a completely different path
(_interrupt_and_clear_session in the slash-command dispatch at line
~6771) and are NOT affected by this guard — the operator still has a
way to force-cancel everything when they actually mean it. Configured
'queue' and 'steer' modes are also untouched: 'queue' already does the
right thing, and 'steer' goes through running_agent.steer() which does
NOT cascade to children (so subagents survive a steer too).

This is Phase 1 of the fix outlined in #30170 — the minimum viable
change that stops subagent loss. Phase 2 (delegation-aware steer
forwarding to active children) and Phase 3 (async delegation, #11508)
are intentionally out of scope.

Refs #30170.
2026-05-25 16:23:24 +00:00
brooklyn!
50aaf0c4ad
fix(tui): delineate assistant responses from details (#31087)
* fix(tui): delineate assistant responses from details

Add a muted Response marker before assistant text when thinking/tool details are visible so reasoning and final output do not visually run together.

* fix(tui): account for response separator height

Keep virtual transcript estimates aligned with the new response separator and avoid allocating trimmed copies of long assistant text.

* fix(tui): gate response separator estimate on details

Only add response-separator height when assistant details actually render, and use a non-allocating body-text check.

* fix(tui): skip empty detail height estimates

Do not add virtual transcript height for assistant details when no thinking or tool detail UI will render.

* fix(tui): estimate details by section visibility

Pass resolved thinking/tool visibility into virtual height estimates so hidden detail sections do not reserve response-separator rows.
2026-05-25 10:23:03 -05:00
brooklyn!
0ec0cafdd0
Merge pull request #31084 from NousResearch/bb/tui-right-click-copy-selection
fix(tui): right-click copies active transcript selection
2026-05-25 10:22:43 -05:00
Savanne Kham
4117fc3645 fix(credential-pool): correct pool rotation when weekly usage limit is reached
After key #1 is marked exhausted the retry still called the API with key #1
due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials.
Fix: peek the pool and pass the active entry's key as explicit_api_key.
Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry
under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError;
extract_api_error_context parses "Resets in Xhr Ymin".
2026-05-25 06:32:30 -07:00
Teknium
8f19485f53 chore(release): map kylekahraman email to GitHub login
Required by CI author validation after salvaging PR #29723.
2026-05-25 06:23:18 -07:00
kylekahraman
ab42658dfc feat: configurable paste collapse thresholds (TUI + CLI)
Adds two new config keys:
- paste_collapse_threshold (default: 5) — line count threshold for
  bracketed paste collapse in both TUI and CLI
- paste_collapse_threshold_fallback (default: 0, disabled) — same for
  the fallback heuristic in terminals without bracketed paste support

TUI frontend reads these from config.get full via applyDisplay/patchUiState.
CLI reads from self.config at paste-handling time.

Closes #5626
Related: #5623
2026-05-25 06:23:18 -07:00
zccyman
973bb124a4 fix(credential-pool): rotate immediately when credential already exhausted
Closes #26145.

When the user interrupts the retry loop between two 429s (Ctrl-C in
interactive mode, /new, gateway disconnect), the local has_retried_429
flag dies with the recovery function. On the next user prompt the agent
restarts with has_retried_429=False, hits 429 on the exhausted credential,
sets the flag, returns 'retry once'. Repeat forever — the second 429 that
would trigger rotation is never reached, and healthy entries (priority>0
free/paid accounts) are never tried.

Fix: in recover_with_credential_pool's rate_limit branch, pre-check
pool.current().last_status before running the retry-once dance. If the
current entry is already STATUS_EXHAUSTED, rotate immediately. Uses
getattr() for the attribute read so existing tests with SimpleNamespace
mocks (which only set 'label') keep working.

Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
2026-05-25 06:21:28 -07:00
Teknium
0a6a0ba527 test(skills): widen assertion in PR#6656 regression to accept new validator msg
The new install-path validator from this PR raises 'Unsafe install path:
...' earlier in the pipeline than the previous resolve-then-check path.
Behavior is identical (ok=False, victim untouched, refused before
rmtree) — only the error string changed.
2026-05-25 06:13:36 -07:00
峯岸 亮
3b9b9a7ad7 fix(skills): guard uninstall lock paths
Validate Skills Hub lock-file install paths at both ends of the
lifecycle so a poisoned or malformed lock.json entry cannot drive
shutil.rmtree to a location outside SKILLS_DIR:

- HubLockFile.record_install rejects empty/'.'/absolute/traversal/
  Windows-drive paths at write time, and requires the final path
  component to match the skill name (shape: '<skill>' or
  '<category>/<skill>').
- install_from_quarantine resolves its destination through the same
  validator, catching symlink/junction redirects inside skills/.
- uninstall_skill resolves the lock entry through the new validator
  before rmtree. Refuses anything that resolves to SKILLS_DIR itself
  (empty/dot paths) or to a target outside SKILLS_DIR (absolute paths,
  traversal, symlinked dirs in skills/ pointing outward).
- 14 focused regression tests covering each rejection class plus a
  symlink-redirect case.

E2E verified: hand-crafted poisoned lock.json entries (absolute path,
empty install_path, traversal) all refuse and leave the targeted
victim untouched; legitimate uninstall still succeeds.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-25 06:13:36 -07:00
Teknium
0d137f1039
feat(errors): actionable guidance for Nous OAuth 401s (#32082)
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path),
but the non-retryable-401 guidance branch only covered openai-codex and
xai-oauth. A Nous 401 fell through to the generic 'Your API key was
rejected... run hermes setup' message, which is wrong advice — the user
needs hermes auth add nous --type oauth, not an API key.

Also flag the case where the failing model slug ends in :free (OpenRouter
syntax) while provider is nous. Without that hint, users re-OAuth
successfully and then hit the same 401 on the next message because Nous
Portal doesn't carry the OpenRouter free-tier slug.

Reported by ashh — debug dump showed Nous device_code exhausted +
deepseek/deepseek-v4-flash:free as the model.
2026-05-25 06:06:51 -07:00
wysie
dbe5d84972 fix(auxiliary): universal main-model fallback for aux tasks (#31845)
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override.  The
expectation in that case is universal: 'use my main model for side
tasks too.'

Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend.  That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.

The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):

    1. caller-passed model (caller knew what they wanted)
    2. provider's catalog default (cheap aux model, if registered)
    3. user's main model from config.yaml

Behaviour by provider class:

- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
  step 3 applies.  Title gen runs on grok-4.3 / gpt-5.4 against the
  user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
  default wins at step 2, preserving the original 'cheap aux model'
  behaviour.  Anthropic users still get claude-haiku-4-5 for titles,
  not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
  callers) — caller wins at step 1, no surprise switching.

Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically.  The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.

4 new tests in TestResolveProviderClientUniversalModelFallback:

- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks

365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.

Co-authored-by: wysie <wysie@users.noreply.github.com>
2026-05-25 05:50:56 -07:00
Teknium
46c1ae8b24
fix(tests): four pre-existing flakes from the security cluster merge (#32072)
All four failures were broken by the security cluster (#10082 / #10133 /
#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when #32042 and #32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
2026-05-25 05:50:29 -07:00