hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-19 15:18:03 +00:00

Author	SHA1	Message	Date
ygd58	51013268cf	fix(cron): clarify schedule is required for create in tool schema Grok models (and other LLMs) sometimes omit the schedule parameter when calling the cronjob tool with action=create because the schema only listed 'action' in required[] and the schedule description did not explicitly state it was mandatory (issue #32427). Fix: update schema descriptions to clearly state schedule is REQUIRED for action=create, making this explicit for models that rely on description text for parameter compliance. Fixes #32427	2026-05-26 14:09:37 -07:00
Teknium	ccd3d04fc5	chore(models): swap qwen3.6-plus → qwen3.7-max in openrouter+nous lists (#32809 ) Updates curated picker lists for both the OpenRouter fallback snapshot (`OPENROUTER_MODELS`) and the Nous Portal list (`_PROVIDER_MODELS['nous']`). Regenerates website/static/api/model-catalog.json via `scripts/build_model_catalog.py` to keep the docs-hosted manifest in sync (drift guard in `test_in_repo_lists_match_manifest`). tests/hermes_cli/test_models.py fixtures updated — they pinned the old model id as their live-fetch sample.	2026-05-26 14:01:47 -07:00
Teknium	8b69ec03af	feat(mcp): Nous-approved MCP catalog with interactive picker (#30870 ) * feat(mcp): Nous-approved MCP catalog with interactive picker Adds an optional-mcps/ directory mirroring optional-skills/: curated, Nous-approved MCP servers shipped with the repo but disabled by default. Presence in optional-mcps/ = approval. No community tier, no trust signals. Entries are added by merging a PR. New surface: hermes mcp Interactive catalog picker (default) hermes mcp catalog Plain-text list, scriptable hermes mcp install <name> Install a catalog entry Picker behavior: not installed -> install (clone/bootstrap if needed, prompt for creds) installed/off -> enable installed/on -> menu (disable / uninstall / reinstall) Manifest schema (manifest_version: 1) supports: - transport: stdio (command/args, ${INSTALL_DIR} substitution) or http (url) - install: optional git clone + bootstrap commands (for repos that need local venv setup, like the n8n bridge); omit for npx/uvx servers - auth: api_key (prompts -> ~/.hermes/.env), oauth (provider-mediated or native MCP), or none Catalog entries are never auto-updated. Users re-run `hermes mcp install` to refresh. Credentials always go to ~/.hermes/.env (the .env-is-for-secrets rule), never to per-server env blocks. Ships n8n as the reference manifest (https://github.com/CyberSamuraiX/hermes-n8n-mcp). Tests: 19 catalog tests + E2E install/uninstall round-trip via the shipped manifest. * feat(mcp): tool-selection checklist + Linear catalog entry Adds install-time tool selection so users only enable the MCP tools they actually want, and ships Linear as a second reference catalog entry to demonstrate the http+oauth path alongside n8n's stdio+api_key+git-bootstrap. Tool selection flow: install (clone/auth/credentials) -> probe server for available tools -> curses checklist with pre-checked rows -> write mcp_servers.<name>.tools.include Pre-check priority: 1. user's prior tools.include (reinstall preserves selection) 2. manifest's tools.default_enabled (curated subset) 3. all probed tools (default) Probe-failure fallback (server unreachable, OAuth not yet complete, backing service offline): - manifest declared default_enabled -> applied directly - no default declared -> no filter written (all-on when reachable) - both cases point user at hermes mcp configure <name> Manifest schema additions: tools: default_enabled: [list, of, tool, names] # optional Updates: - optional-mcps/linear/manifest.yaml -- new reference entry (http+oauth) - optional-mcps/n8n/manifest.yaml -- tools.default_enabled set to the 8 read-mostly tools; mutating tools (activate/deactivate, container_logs) pruned by default - docs: new 'Tool selection at install time' section in features/mcp.md Tests: 7 new tests in TestToolSelection covering probe-success / probe-fail matrix, manifest-default filtering, reinstall-preserves-selection, and invalid-default-enabled rejection. 26 catalog tests + 32 existing mcp_config tests passing. * feat(mcp): polish — picker unification, include-mode convergence, hardening Addresses review findings on PR #30870. Lands all improvements that belong in this PR before merge; defers separate cleanup (consolidating two probe implementations, change-detector tests) to follow-ups. Picker UX (mcp_picker.py) - Unifies catalog + custom (user-added) MCPs in one view with distinct status badges (available / enabled / installed (disabled) / custom — enabled / custom — disabled) - Adds 'Configure tools (probe server + re-pick)' action to both the catalog-installed and custom-row submenus — the existing hermes mcp configure flow was previously unreachable from the picker - Loops until ESC/q so the user can manage several entries in one session instead of having to re-launch - Uninstall message now mentions .env credentials are preserved with a pointer to clean them up manually if no longer needed - Surfaces a 'requires a newer Hermes' warning per future-manifest entry instead of silently hiding it Catalog (mcp_catalog.py) - catalog_diagnostics() exposes which manifests were skipped and why (future_manifest vs invalid) so UIs can give actionable feedback - _do_git_install detects SHA-shaped refs (regex /[0-9a-f]{7,40}/) and skips the doomed 'git clone --branch <sha>' attempt — clone --branch only accepts branches/tags, so SHAs always failed noisily before falling back to the full-clone path - Probe-success all-tools-enabled message now mentions that new tools the server adds later will be auto-enabled (no-filter mode) Convergence (tools_config.py) - _configure_mcp_tools_interactive now writes tools.include (whitelist) instead of tools.exclude (blacklist), matching the catalog flow and hermes mcp configure. The on-disk config shape no longer depends on which UI the user touched last - Two existing tests updated to assert the new include-mode contract Discoverability - Setup wizard final step now prints 'Browse curated MCPs: hermes mcp' - Three tip-corpus entries pointing at the new catalog - Docs updated with: trust model (manifests run code locally, gated by PR review, but read before installing), runtime ${ENV_VAR} substitution semantics, and the manifest_version forward-compat behavior Tests - 7 new tests covering future-manifest diagnostics, custom MCP picker rows, SHA-ref git-install path, branch-ref git-install path, and the tools_config include-mode write contract - 80 MCP-related tests passing across test_mcp_catalog.py, test_mcp_config.py, test_mcp_tools_config.py * fix(mcp): drop setup-wizard catalog hint to satisfy supply-chain scanner The wizard line 'Browse curated MCPs: hermes mcp' triggered the CI supply-chain scanner because it pattern-matches on edits to any file named hermes_cli/setup.py — that filename matches the Python 'install-hook file' heuristic even though this setup.py is the user-facing 'hermes setup' wizard, not a packaging install hook. The catalog is already surfaced via three tip-corpus entries in hermes_cli/tips.py (which the scanner doesn't flag), so dropping the wizard mention loses no discoverability. Worth revisiting after a scanner allowlist for this specific file lands.	2026-05-26 12:48:14 -07:00
Teknium	2517917de3	fix(cli): restore fallback paste collapse + handle long single-line pastes (#32447 ) Follow-up to #32087 after community report from @ethernet that 8000-char single-line pastes get dumped raw into the input box. A) Fallback regression revert paste_collapse_threshold_fallback default: 0 -> 5 #32087 disabled the fallback handler by default. The fallback path has been always-on with line_count >= 5 since #3065 (March 2026); the previous shape was the salvaged contributor's design and didn't match pre-existing behavior for terminals without bracketed paste support (Windows terminals, some SSH setups). Restoring the original on-by-default. B) Long single-line paste guard New config key: paste_collapse_char_threshold (default 2000) Bracketed-paste handler and fallback handler now BOTH collapse when line count >= line threshold OR total char length >= char threshold. Catches the case ethernet hit: ~8000 chars of minified JSON / log output on a single line dumped raw into the buffer. TUI mirrors the same config via uiStore.pasteCollapseChars. Set 0 to disable. Defaults verified: paste_collapse_threshold: 5 paste_collapse_threshold_fallback: 5 paste_collapse_char_threshold: 2000 Tests: tests/hermes_cli/test_config.py: 87/87 pass ui-tui useConfigSync.test.ts: 34/34 pass ui-tui useComposerState.test.ts: 9/9 pass tsc: 0 new errors in touched files	2026-05-25 23:49:01 -07:00
Teknium	31c8d5ff5f	chore(wecom): make defusedxml dep acquireable and tolerant of absence Follow-up on top of @TheOnlyMika's #32155 cherry-pick. The defusedxml hardening import was unconditional, which would break the gateway for anyone running a WeComCallback adapter without the (transitive-only) defusedxml present. - Wrap the import in the same try/except pattern as aiohttp/httpx in the same file. Sets DEFUSEDXML_AVAILABLE flag. - Extend check_wecom_callback_requirements() to gate on the flag, so the gateway logs the actual missing dep and skips the adapter instead of crashing. - Add [wecom] extra to pyproject.toml with defusedxml==0.7.1. - Register platform.wecom_callback in tools/lazy_deps.py so users get prompted to install it on first WeComCallback configuration, same pattern as discord/slack/matrix. defusedxml is still the right call for pre-auth XML parsing — this commit just makes the dep declarative and recoverable instead of a hard import-time crash.	2026-05-25 23:30:43 -07:00
TheOnlyMika	5744b17579	harden: restrict markdown link schemes; parse untrusted XML with defusedxml Two small defensive-hardening changes: - web/src/components/Markdown.tsx: render links only for http(s)/mailto schemes; other schemes (javascript:, data:, vbscript:) are dropped to plain text so a crafted link in rendered content can't execute on click. - gateway/platforms/wecom_callback.py: parse the untrusted, pre-auth WeCom callback request body with defusedxml instead of xml.etree, blocking entity-expansion / billion-laughs (and XXE) on the parse path. defusedxml is already a dependency (uv.lock); response-building XML in wecom_crypto.py is unchanged (it is not parsed from untrusted input). Verified: dashboard typechecks and builds; defusedxml blocks an entity-expansion payload while valid WeCom envelopes still parse.	2026-05-25 23:30:43 -07:00
dearmayo	f4953bc648	fix(subdirectory_hints): prevent loading AGENTS.md outside workspace SubdirectoryHintTracker was scanning directories outside the active working directory, allowing files like ~/.codex/AGENTS.md or ~/.claude/CLAUDE.md to be loaded and injected into the agent context. This causes cross-agent context contamination and instruction mixup. Add _is_ancestor_or_same() helper and a path boundary check in _is_valid_subdir(): only directories within the working directory tree (i.e. path.is_relative_to(working_dir)) are allowed. Also add exist_ok=True to mkdir() calls in new tests to prevent pytest-xdist race conditions when workers share the same tmp_path parent. Tests added: - test_outside_working_dir_rejected: verifies sibling dirs are blocked - test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked - test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected - test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace	2026-05-25 23:17:33 -07:00
Krisli Dimo	9d10c45e32	fix(telegram): tighten table row-group spacing and drop redundant first bullet The GFM → Telegram-row-group rewriter previously joined every line in every row with a blank line ("\n\n".join(rendered_rows)), which made multi-column tables explode into one-bullet-per-paragraph walls on mobile. It also emitted the row heading twice when the table had no row-label column: once as the standalone bold heading and once again as the first labeled bullet (heading == headers[0] == data_cells[0]). This commit: * Uses single newlines between the heading and its bullets within a row-group, and a blank line only BETWEEN row-groups. * Skips any bullet whose value duplicates the heading text when the table has no row-label column (the heading already carries that information). Tables WITH a row-label column are unaffected since the heading comes from the label cell and never duplicates a header. Updated existing test assertions accordingly and added two regression tests: one that reproduces the screenshot bug (wide five-column "Plays" comparison table) and one that pins the row-label-column behavior so the dedup logic doesn't accidentally swallow real data. tests/gateway/test_telegram_format.py: 101 passed	2026-05-25 23:16:00 -07:00
kshitij	66851dc413	chore: add krislidimo to AUTHOR_MAP for PR #29775 (#32434 )	2026-05-25 23:15:56 -07:00
Teknium	d8703e27f5	feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345 ) Layered safety so the Skills Hub at /docs/skills stays in sync without silent rot. Three pieces: 1. build_skills_index.py — refuses to ship a degenerate index. EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50, official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source collapsing to zero (the silent OpenAI breakage that hid for weeks) now fails the workflow loud — broken index never reaches the live site. 2. extract-skills.py + the React page — visible freshness signal. Sidecar website/src/data/skills-meta.json carries the index's generated_at timestamp, plus per-source counts. Skills Hub renders a 'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under the hero copy. If the cron stalls, users see the staleness immediately. 3. .github/workflows/skills-index-freshness.yml — watchdog cron. Every 4 hours, fetches the live /docs/api/skills-index.json, validates shape, checks age (>26h is stale), checks the same per-source floors, and opens (or appends to) a GitHub issue when anything is off. The issue is title-prefixed [skills-index-watchdog] so subsequent failures append a comment instead of spamming new issues. Net effect: - A silent regression like 'OpenAI tap moved its skills' now fails the build instead of shipping a quietly broken catalog. - A stuck cron (like the landingpage breakage that ran red for weeks) now files an issue within 4 hours. - Users see how fresh the catalog is on the page itself. Test plan: - Local: built skills-meta.json from the live index → 'Catalog refreshed N minutes ago' rendered correctly in the static HTML. - Probe logic dry-run against the live index: total=2456, all 6 sources above floor, age 0.1h — issues=NONE. - Triggered skills-index.yml manually; both jobs green, deploy-site.yml dispatch fired.	2026-05-25 23:10:45 -07:00
Teknium	cea87d9139	fix(skills-hub): show every catalog source on /docs/skills (skills.sh, ClawHub, browse.sh, OpenAI, …) (#32336 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details Build Skills Index / build-index (push) Waiting to run Details Build Skills Index / trigger-deploy (push) Blocked by required conditions Details Tests / test (1) (push) Waiting to run Details Tests / test (2) (push) Waiting to run Details Tests / test (3) (push) Waiting to run Details Tests / test (4) (push) Waiting to run Details Tests / test (5) (push) Waiting to run Details Tests / test (6) (push) Waiting to run Details Tests / save-durations (push) Blocked by required conditions Details Tests / e2e (push) Waiting to run Details The Skills Hub page was stuck on a stale Feb 25 snapshot, showing only Built-in + Optional + Anthropic + LobeHub. The unified index already has 2078 skills from skills.sh / ClawHub / LobeHub / GitHub taps / Claude Marketplace, and BrowseShSource adds another ~330 — none of it was reaching the page. Changes: - website/scripts/extract-skills.py: read website/static/api/skills-index.json (the unified multi-source catalog, rebuilt twice daily) as the canonical external source. Keep the legacy skills/index-cache/ fallback for offline builds. Add friendly per-source labels (skills.sh, ClawHub, browse.sh, OpenAI, HuggingFace, Anthropic, LobeHub, etc.) and per-entry installCmd. - website/src/pages/skills/index.tsx: add source pills + ordering for the 11 new sources; render installCmd from the index entry. - website/scripts/prebuild.mjs: when no local skills-index.json exists, fetch the live one from hermes-agent.nousresearch.com so local 'npm run build' matches production without burning GitHub API quota. - scripts/build_skills_index.py: crawl BrowseShSource so browse.sh entries land in the unified index. Adjust source_order. - tools/skills_hub.py: GitHubSource.DEFAULT_TAPS — openai/skills moved its skills into skills/.curated/ and skills/.system/, so add both as explicit taps (the listing code skips dotted dirs by design). Drop VoltAgent/awesome-agent-skills (README-only, no SKILL.md files) and MiniMax-AI/cli (singular skill, not a tap directory). Net effect: github source jumps from 83 → 143 skills, with OpenAI properly included. - .github/workflows/deploy-site.yml: build the unified index BEFORE running extract-skills.py — previous order meant extract-skills always fell back to the legacy cache. Drop the 'skip if file exists' guard; the file is gitignored and must be rebuilt every deploy. - .github/workflows/skills-index.yml: drop the broken 'deploy-with-index' job (it cp'd 'landingpage/\*' which no longer exists, failing every cron run since the landingpage move). Replace it with a workflow_dispatch trigger of deploy-site.yml so the index refresh still reaches production on schedule. - website/docs/user-guide/features/skills.md: drop VoltAgent from the default-taps doc list to match the code. Before: 695 skills (Built-in 90, Optional 84, Anthropic 16, LobeHub 505). After: 2168 skills across 9 source pills, including the 1212 skills.sh entries the user expected to see.	2026-05-25 18:34:54 -07:00
MorAlekss	c26af46811	fix(skills): reject symlinks in skill bundles before install	2026-05-25 18:33:02 -07:00
Teknium	fe9744cbee	chore(release): map ffr31mr + TheOnlyMika in AUTHOR_MAP Pre-salvage prep for the must-have security cluster (#32103, #32155). #32103 author commit uses dearmayo@localhost; PR opener is ffr31mr — same pattern as the existing holynn-q localhost mapping.	2026-05-25 18:33:02 -07:00
Teknium	ccd899318e	fix(cron): split scanner into two tiers so skill prose stops false-positiving (#32339 ) The runtime cron prompt scanner (added in #3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the assembled prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as descriptive prose about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of #3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.	2026-05-25 18:20:45 -07:00
Teknium	e3236e99a4	fix(anthropic): API-key path skips OAuth autodiscovery + prunes stale entries When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY to ~/.hermes/.env and zeros ANTHROPIC_TOKEN. That env-var pattern is the user's explicit choice of auth method — API key, not OAuth. But the anthropic credential pool's autodiscovery (_seed_from_singletons) unconditionally read ~/.claude/.credentials.json from the Claude Code CLI and any saved hermes_pkce creds, and added them to the SAME anthropic pool as the user's API key. Two problems: 1. Even with the API key at higher priority, a 401/429 on the API key would rotate the session onto an autodiscovered OAuth credential, silently flipping the agent into the Claude Code masquerade mid-conversation: 'You are Claude Code' system block, every tool renamed to mcp_*, claude-cli User-Agent header. 2. Switching OAuth → API key at `hermes setup` cleared the env vars but left previously-seeded OAuth entries dormant in auth.json, where rotation could revive them. The user picking the API-key path is explicitly opting OUT of the masquerade. Mixing OAuth credentials into their pool defeats that choice. Fix: in `_seed_from_singletons` for provider='anthropic', detect the API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and: - Skip calling read_claude_code_credentials() and read_hermes_oauth_credentials() entirely - Prune any stale hermes_pkce / claude_code entries that may already be in the on-disk pool OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery continues to fire as before. Tests: 3 new regression tests (api-key skips autodiscovery, api-key prunes stale entries, oauth path still autodiscovers). Full file 70/70.	2026-05-25 17:41:40 -07:00
Teknium	2c6bbaf352	fix(gateway): coerce scalar `model:` to dict before /model --global persist (#32272 ) Reported via AskClaw. When config.yaml has `model: <name>` (flat string) instead of the nested `model: {default: ..., provider: ...}` form, every gateway `/model X --global` crashed silently with TypeError: 'str' object does not support item assignment The persist block did: model_cfg = cfg.setdefault("model", {}) model_cfg["default"] = result.new_model `setdefault` returns the existing scalar, and the next assignment blows up. The 'switch failed' warning was logged at WARNING level and the user never saw why their persist didn't stick. Coerce scalar/None `model:` into a dict before mutation, in both the gateway path (`gateway/run.py`) and the sister site in `hermes_cli/doctor.py --fix` (same setdefault-on-string flaw). The CLI `/model` path is unaffected because it goes through `_set_nested` which already replaces scalar leaves with dicts. Regression test `tests/gateway/test_model_command_flat_string_config.py` covers the flat-string, missing, and proper-dict cases. Without the fix, the flat-string case fails with the exact original TypeError.	2026-05-25 15:22:23 -07:00
Teknium	de76f4dbcf	fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271 ) `load_hermes_dotenv()` is called at module-import time from cli.py, hermes_cli/main.py, run_agent.py, trajectory_compressor.py, gateway/run.py, tui_gateway/server.py, acp_adapter/entry.py, and a few others. Each call triggered `_apply_external_secret_sources()`, which re-parsed config, re-fetched from Bitwarden Secrets Manager (its own 300s cache mostly absorbed this), re-ran the ASCII sanitization sweep, and reprinted Bitwarden Secrets Manager: applied N secret(s) (...) to stderr. Users saw the status line 3-5x per CLI startup. Guard the function with a process-level set of HERMES_HOME paths that have already had external secrets applied. Subsequent calls for the same home_path are no-ops. `reset_secret_source_cache()` lets tests (and any future long-running consumer that wants to refresh after a config change) force a re-pull.	2026-05-25 15:18:55 -07:00
Teknium	6bd0be30be	feat(patch): indentation preservation, CRLF preservation, per-file failure escalation (#507 ) (#32273 ) Three granular patch-tool refinements from the Roo Code deep-dive (#507). ## Indentation preservation (fuzzy_match.py) When fuzzy_find_and_replace matches via a non-exact strategy, the file's indentation may differ from what the LLM sent in old_string/new_string (common case: model sends zero-indent old/new for a method body that lives inside an 8-space-indented class). Before this commit the replacement was spliced in verbatim, producing a file with a broken indent level that may still parse but is logically wrong. The fix computes the indent delta between old_string's first meaningful line and the matched region's first meaningful line, then re-indents every line of new_string by that delta. Exact-strategy matches are untouched (passthrough). Same approach as Roo Code's multi-search-replace.ts:466-500. ## CRLF preservation (file_operations.py) Models nearly always send tool args with bare LF endings (JSON-encoded), but the file on disk may have CRLF (Windows-line-ending configs, .bat, .cmd, .ini files). Before this commit: - write_file silently normalized CRLF to LF on every overwrite - patch produced mixed-ending files: the substituted region had LF, the surrounding context kept CRLF The fix detects the file's existing line endings (via pre_content if already read for lint/LSP, otherwise a tiny head -c 4096 probe), and normalizes the entire write to that ending. New files are written verbatim (no detection possible). ## Per-file failure escalation (file_tools.py) When the agent fails to patch the same file 3+ times in a row, the existing 'old_string not found' hint isn't strong enough — the model keeps retrying with variations against a stale view of the file. The fix tracks consecutive failures per (task_id, resolved_path) and injects an escalating hint after 3 failures: 'This is failure #N patching X. Stop retrying. Either re-read fresh, use longer context, or fall back to write_file.' Counter resets on a successful patch to the same path. ## Validation - 22 new tests across tests/tools/test_fuzzy_match.py (5), test_line_ending_preservation.py (12), test_patch_failure_tracking.py (5) - All existing tests pass (165/165 in the touched files) - E2E verified with real _handle_patch / _handle_write_file calls against real CRLF files and real failure loops Closes part of #507. The remaining open items in #507 (2b start_line hint, behavioral rules) were declined after audit: - 2b adds schema bloat for a problem the existing 'multiple matches' contract already handles - Behavioral rules conflict with the personality system Items 1, 2d, 2e, 3, 4 of #507 were already landed in earlier work.	2026-05-25 15:18:45 -07:00
Teknium	c2aa235328	fix(agent): log outer-loop exceptions at ERROR with traceback (#32264 ) The outer 'except Exception' guard in run_conversation() captures exceptions raised inside the agent loop (during streaming, tool dispatch, message construction, etc.) and prints a one-line summary to the screen. The traceback was only logged at DEBUG, so it never landed in errors.log (WARNING+) and was lost. For intermittent failures — the most important kind to debug — users saw 'Error during OpenAI-compatible API call #N: <message>' on screen with no way to recover the call site. Switching to logger.exception() emits the full traceback at ERROR so it goes to both agent.log and errors.log automatically. This is a pure logging change; control flow is unchanged.	2026-05-25 15:16:54 -07:00
Teknium	30928f945f	fix(dashboard): suffix-allowlist plugin assets + denylist subprocess-influencing env vars (#32277 ) Two posture fixes surfaced by the web-pentest skill self-test against the dashboard (issue #32267). 1. /dashboard-plugins/<name>/<path> previously returned 200 for any file inside the plugin's dashboard directory — including plugin_api.py and __pycache__/.pyc. The path is unauthenticated by architecture (SPA loads JS via <script src> and CSS via <link href>, neither of which can attach a custom auth header), so the fix is not "require token" — it's "restrict to browser-fetchable suffixes." Allowlist now: .js .mjs .css .json .html .svg .png .jpg .jpeg .gif .webp .ico .woff .woff2 .ttf .otf .map. Everything else → 404. This stops a private user-installed plugin's Python source from being readable by anyone reachable on the dashboard's loopback port (other local users on a shared box, sidecar containers sharing the host netns). 2. save_env_value() now refuses to persist env-var names that influence how the next subprocess executes: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_, PYTHONPATH, PYTHONHOME, PYTHONSTARTUP, NODE_OPTIONS, NODE_PATH, PATH, SHELL, EDITOR, VISUAL, PAGER, BROWSER, GIT_SSH_COMMAND, GIT_EXEC_PATH; plus HERMES_HOME / HERMES_PROFILE / HERMES_CONFIG / HERMES_ENV. PUT /api/env is authed but the session token lives in the SPA HTML where any future plugin XSS or local process can read it. Without this gate, a token-holder could plant LD_PRELOAD in .env and the next hermes process start would load attacker code via the dotenv to os.environ chain. This is enforced on write only — pre-existing .env values are left alone (the gate is in save_env_value, not in load_env). PUT /api/env now returns 400 with the explanatory message instead of an opaque 500. IMPORTANT: HERMES_* overall is NOT blocked — only the four runtime location names. Integration credentials following the HERMES_* convention (HERMES_GEMINI_, HERMES_LANGFUSE_, HERMES_SPOTIFY_*, HERMES_QWEN_BASE_URL, ...) keep working. Regression tests cover both fixes (30 new test cases). No existing tests changed; 257 passing in tests/hermes_cli/. Closes #32267.	2026-05-25 15:07:19 -07:00
teknium1	27df4b3882	fix(telegram): exempt reply_to_mode=off DM topic sends from anchor-required guard Salvage follow-up. The new private-DM-topic fail-loud contract from PR #27107 hits 'requires a reply anchor' when reply_to_mode='off' is configured, even though commit `21a15b671` (PR #23994) verified that message_thread_id alone routes correctly on python-telegram-bot's reference client when the user has explicitly opted out of quote bubbles. Carve out the explicit opt-in path so users on reply_to_mode 'off' aren't regressed — the new guard now only applies to callers that didn't ask for the anchor to be suppressed.	2026-05-25 14:54:02 -07:00
teknium1	926da69b45	test(telegram): switch transient-flake retry test to group chat Salvage follow-up. The transient thread-not-found retry test was exercising chat_id='123' (positive, looks-like-private) which now hits the new private-DM-topic fail-closed contract. The test's intent is the transient-flake retry on real forum topics in groups, so use -100123 to make the scenario unambiguous.	2026-05-25 14:54:02 -07:00
stepanov1975	5b1c75d662	refactor: simplify Telegram DM topic refresh (cherry picked from commit `bf8048ad87`)	2026-05-25 14:54:02 -07:00
stepanov1975	c394e7919d	fix: refresh stale Telegram DM topic threads (cherry picked from commit `26b87057ad`)	2026-05-25 14:54:02 -07:00
stepanov1975	dcd504cea4	fix: auto-create Telegram DM topics for delivery (cherry picked from commit `5cde0614e8`)	2026-05-25 14:54:02 -07:00
stepanov1975	96c71d8c46	fix: require anchors for Telegram DM topic deliveries (cherry picked from commit `6daafb3fd4`)	2026-05-25 14:54:02 -07:00
stepanov1975	6b7da11749	test: isolate API server env in gateway tests (cherry picked from commit `3d585f8db5`)	2026-05-25 14:54:02 -07:00
stepanov1975	415be55394	fix: route Telegram DM topic deliveries directly (cherry picked from commit `ad8f97db6c`)	2026-05-25 14:54:02 -07:00
Teknium	0dee92df22	feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269 ) Hardens the context window against Brainworm-class promptware attacks (see #496). Three changes: 1. tools/threat_patterns.py — single source of truth for injection/promptware patterns. Replaces the duplicated pattern lists in prompt_builder.py and memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration, heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity override, known framework names). Three scopes — 'all' (narrow, classic injection), 'context' (adds promptware/role-play, broader detection), 'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes). 2. MemoryStore.load_from_disk() now scans entries at snapshot-build time. Poisoned entries are replaced with [BLOCKED: ...] placeholders in the frozen system-prompt snapshot. Live state keeps the original so the user can still inspect + remove via memory(action=read/remove). Scan is deterministic from disk bytes — prefix-cache invariant holds. 3. make_tool_result_message() wraps results from high-risk tools (web_extract, web_search, browser_, mcp_) in <untrusted_tool_result source="...">...</untrusted_tool_result> delimiters with framing prose telling the model the content is data, not instructions. Architectural defense against indirect injection from poisoned web pages, GitHub issues, MCP responses — does NOT regex-scan tool results (pattern arms race + per-iteration latency). Multimodal content lists pass through unwrapped to preserve adapter compatibility. Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack behavior, NOT on bossy English. Dropped patterns suggested in #496 that would have tripped legitimate content: standalone 'you are obligated to', 'do not respond immediately', 'you must X' without a C2-verb anchor. Validation: - 257/257 targeted tests pass (test_threat_patterns + test_memory_tool + test_tool_dispatch_helpers + test_prompt_builder) - E2E run with real Brainworm payload: blocked from AGENTS.md context-file path, blocked from MEMORY.md snapshot, wrapped in delimiters when arriving via web_extract. Legitimate 'you must follow conventions' phrasing not flagged. Explicitly NOT in this PR (per #496 discussion): - Per-tool-result regex scanning (pattern arms race) - SessionBehaviorMonitor / polling-loop detection (wrong layer) - Outbound network gating (Docker backend already covers this) - security.context_scanning warn\|block knob (current behavior is always block-with-placeholder — there's no warn mode that makes sense) Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2. Phase 3 stays in tracking issue territory.	2026-05-25 14:52:24 -07:00
Teknium	b6ce7a451f	chore(release): add ronhi for PR #29523 salvage Maps the machine-local commit email (ronhi@buildabear1.localdomain) to the GitHub login RonHillDev so the attribution check passes.	2026-05-25 14:51:43 -07:00
ronhi	bbc8f2f961	chore(models): drop retired grok-4-1-fast from metadata, tests, docs xAI retired grok-4-1-fast. hermes_cli/models.py already removed it from the static fallback in an earlier commit, but the context-length metadata, the tests pinning those values, and the provider doc still referenced the retired ID. Clean those up so retired model names stop appearing in user-facing output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:51:43 -07:00
Teknium	263e008d6b	feat(skills): add web-pentest optional skill (#32265 ) Adds optional-skills/security/web-pentest/ — an authorized web app penetration testing skill adapted from Shannon's methodology (concepts only; AGPL-clean fresh implementation). Phased: recon (read-only) → vuln analysis (delegate_task per OWASP class) → proof-based exploitation → report. Guardrails baked in: - Authorization gate before first active scan (templates/authorization.md) - Scope allowlist (scope.txt) consulted by recon-scan.sh and documented as the rule for every active request - Aux-client leakage warning (compression + title gen replay history; payloads/creds must not enter chat verbatim) - Bypass-exhaustion discipline before false-positive classification - L3/L4 (proof-required) for reportable findings; L1/L2 listed as candidates only Closes #400. Supersedes #21845 (plugin-shaped proposal; skill-shaped is cheaper and matches the existing optional-skills/security/ pattern).	2026-05-25 14:51:41 -07:00
teknium1	386f245d9d	feat(skills): add optional openhands skill — closes #477 Adds an optional autonomous-ai-agents skill that delegates coding tasks to the OpenHands CLI (https://github.com/All-Hands-AI/OpenHands). Sits alongside claude-code / codex / opencode and is the model-agnostic option in that family — any LiteLLM-supported provider works. This is a ground-truth rewrite of #19325 by @xzessmedia (Tim Koepsel). The original PR's SKILL.md was drafted by the OpenHands agent itself and hallucinated several flags that don't exist in the real CLI (\`--model\`, \`--max-iterations\`, \`--workspace\`, \`--sandbox docker\`), pointed at the wrong PyPI package (\`openhands-ai\`, which is the legacy V0 SDK), and claimed native Windows support that the upstream docs explicitly disclaim. Rather than cherry-pick and rewrite half the lines under contributor authorship, the SKILL.md was rebuilt against a verified install (\`uv tool install openhands --python 3.12\`) and a real end-to-end \`--headless --json\` run against openrouter/openai/gpt-4o-mini. Authorship credited via the \`author:\` frontmatter field and an AUTHOR_MAP entry in scripts/release.py. Changes: - optional-skills/autonomous-ai-agents/openhands/SKILL.md (new) - website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-openhands.md (auto-gen) - website/docs/reference/optional-skills-catalog.md (one new row) - website/sidebars.ts (one new entry under Optional → Autonomous AI Agents) - scripts/release.py (AUTHOR_MAP entry for xzessmedia) Pitfalls documented in the SKILL came from running the tool, not from the upstream README: LiteLLM bedrock/sagemaker stderr noise on every invocation, banner spam (\`OPENHANDS_SUPPRESS_BANNER=1\` required), \`--override-with-envs\` mandatory or the CLI ignores LLM_* env vars entirely, the dashed-vs-undashed Conversation ID footgun for \`--resume\`, LiteLLM model-slug double-prefix when going through OpenRouter.	2026-05-25 14:49:34 -07:00
Teknium	5671461c0c	feat(skills): add code-wiki skill — closes #486 (#32240 ) * feat(skills): add code-wiki skill — closes #486 Bundled skill at skills/software-development/code-wiki/ that generates comprehensive documentation for any codebase: project overview, architecture walkthrough with Mermaid flowchart, per-module deep-dives, class diagram, sequence diagrams, getting-started guide, and (when applicable) API reference. Output defaults to ~/.hermes/wikis/<repo-name>/ (external to repo, like Google CodeWiki); in-repo output supported when user explicitly requests it. Uses only existing Hermes tools (terminal, read_file, search_files, write_file) — no Docker, no external services, no extra dependencies. Works on local repos and GitHub URLs (shallow-clones to a temp dir). Bounded scope defaults (depth 3, cap 10 modules) keep token cost reasonable on large repos. * refactor(skills): move code-wiki to optional-skills Per the 'when in doubt, optional' rule — wiki generation is a 'I want this big thing right now' capability, not daily-driver behavior. Lines up with finance/research/blockchain skills as install-on-demand rather than always loaded. Install via: hermes skills install official/software-development/code-wiki	2026-05-25 14:48:53 -07:00
Teknium	5caeb65a08	test(tts): regression coverage for #29417 double-[pause] fix Three new tests in tests/tools/test_tts_xai_speech_tags.py: - multi_paragraph_emits_single_pause — the headline #29417 case. Requires a first sentence of 12+ chars to hit the _XAI_FIRST_SENTENCE_RE length floor; the trivial 'Hello.\\n\\nWorld.' case dodged the bug by accident, which is why the PR's quoted repro didn't reproduce. Uses the longer 'Welcome to the demo of our new product line.\\n\\nIt has many features.' shape that actually trips the bug. - single_paragraph_still_gets_first_sentence_pause — sanity guard that the fix only suppresses the first-sentence pass when a paragraph pass injected [pause], so plain single-paragraph input still gets its leading pause. - single_newline_still_gets_first_sentence_pause — single newline isn't a paragraph break, no [pause] from the paragraph pass, so the first-sentence pause MUST still fire. Catches over-broad fixes.	2026-05-25 14:30:06 -07:00
EloquentBrush0x	1d73d5facc	fix(tts): prevent double [pause] in xAI auto speech tags for multi-paragraph text _apply_xai_auto_speech_tags runs two independent transformations: 1. paragraph breaks (\n\n) → " [pause] " 2. first-sentence boundary → " [pause] " Both fired unconditionally, so multi-paragraph input produced "Hello world. [pause] [pause] Second paragraph." — an unnatural double pause in the TTS audio. Guard the first-sentence substitution with _XAI_SPEECH_TAG_RE.search(clean): if the paragraph pass already inserted a [pause] tag, skip the first-sentence pass. Single-paragraph behavior is unchanged.	2026-05-25 14:30:06 -07:00
alt-glitch	b62af47da8	chore: drop stale line-number reference in PRIORITY path comment The cherry-pick comment referenced 'line ~6771' for the /stop handler, but on current main the handler is at a different offset. Remove the hard-coded line number — the 'above' reference is sufficient.	2026-05-25 16:23:24 +00:00
xxxigm	737ee81167	test(gateway): regression tests for #30170 subagent interrupt protection 17 new tests in tests/gateway/test_subagent_protection_30170.py pin down both the detection helper and the demotion behaviour: * TestAgentHasActiveSubagents — 11 cases covering the precision and defensiveness of _agent_has_active_subagents: - returns False for None, _AGENT_PENDING_SENTINEL, and stub agents that lack the _active_children attribute; - returns False for an empty list (the steady state of an idle AIAgent); - returns True for one or many children; - works when _active_children_lock is None (test stubs); - rejects truthy MagicMock auto-attributes — this is the regression-guard for "every MagicMock-based gateway test suddenly demotes to queue mode" (which is how this was originally found); - accepts list/tuple/set as the children container. * TestBusyHandlerDemotesInterruptForSubagents — 6 cases driving _handle_active_session_busy_message directly: - parent.interrupt is NOT called when subagents are active, message is still merged into the pending queue; - ack copy mentions "Subagent working", "queued", and the /stop escape hatch — and does NOT mention "Interrupting"; - with no subagents, behaviour is byte-identical to the pre-#30170 interrupt path (parent.interrupt called with the user text, ack says "Interrupting"); - configured queue mode keeps its vanilla "Queued for the next turn" ack (the #30170 demotion-specific copy must NOT fire); - configured steer mode still routes to running_agent.steer() even when subagents are active (the guard is interrupt-only); - _AGENT_PENDING_SENTINEL does not trigger demotion. Refs #30170.	2026-05-25 16:23:24 +00:00
xxxigm	99d62f6ba1	fix(gateway): protect in-flight subagents from busy-mode interrupts (#30170 ) When a user sends a conversational follow-up while delegate_task is running, gateway/run.py calls running_agent.interrupt(event.text) on the PARENT agent. AIAgent.interrupt() then cascades synchronously through self._active_children and calls interrupt() on every child subagent, aborting in-flight delegate_task work. The user sees the fallback cascade with no root-cause in the gateway log, and minutes of subagent progress are destroyed — the exact failure mode reported in Add GatewayRunner._agent_has_active_subagents(running_agent) — a static helper that returns True iff the parent is currently driving subagents via delegate_task. The helper is type-defensive: it ignores truthy MagicMock auto-attributes (so this doesn't accidentally fire in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL placeholder, and missing locks. Wire the helper into both interrupt branches: 1. _handle_active_session_busy_message — the adapter-level busy handler. When busy_input_mode == 'interrupt' AND the parent has active subagents, demote to 'queue' semantics: skip the parent.interrupt() call, merge the message into the pending queue, and surface a dedicated ack ("⏳ Subagent working — your message is queued for when it finishes (use /stop to cancel everything).") so the operator knows the message wasn't lost and discovers the explicit escape hatch. 2. The PRIORITY interrupt branch inside _handle_message — the non-command fast path. Same rationale, same demotion. Routes through _queue_or_replace_pending_event so the next-turn pickup stays unchanged. Explicit /stop and /new commands take a completely different path (_interrupt_and_clear_session in the slash-command dispatch at line ~6771) and are NOT affected by this guard — the operator still has a way to force-cancel everything when they actually mean it. Configured 'queue' and 'steer' modes are also untouched: 'queue' already does the right thing, and 'steer' goes through running_agent.steer() which does NOT cascade to children (so subagents survive a steer too). This is Phase 1 of the fix outlined in #30170 — the minimum viable change that stops subagent loss. Phase 2 (delegation-aware steer forwarding to active children) and Phase 3 (async delegation, #11508) are intentionally out of scope. Refs #30170.	2026-05-25 16:23:24 +00:00
brooklyn!	50aaf0c4ad	fix(tui): delineate assistant responses from details (#31087 ) * fix(tui): delineate assistant responses from details Add a muted Response marker before assistant text when thinking/tool details are visible so reasoning and final output do not visually run together. * fix(tui): account for response separator height Keep virtual transcript estimates aligned with the new response separator and avoid allocating trimmed copies of long assistant text. * fix(tui): gate response separator estimate on details Only add response-separator height when assistant details actually render, and use a non-allocating body-text check. * fix(tui): skip empty detail height estimates Do not add virtual transcript height for assistant details when no thinking or tool detail UI will render. * fix(tui): estimate details by section visibility Pass resolved thinking/tool visibility into virtual height estimates so hidden detail sections do not reserve response-separator rows.	2026-05-25 10:23:03 -05:00
brooklyn!	0ec0cafdd0	Merge pull request #31084 from NousResearch/bb/tui-right-click-copy-selection fix(tui): right-click copies active transcript selection	2026-05-25 10:22:43 -05:00
Savanne Kham	4117fc3645	fix(credential-pool): correct pool rotation when weekly usage limit is reached After key #1 is marked exhausted the retry still called the API with key #1 due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials. Fix: peek the pool and pass the active entry's key as explicit_api_key. Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError; extract_api_error_context parses "Resets in Xhr Ymin".	2026-05-25 06:32:30 -07:00
Teknium	8f19485f53	chore(release): map kylekahraman email to GitHub login Required by CI author validation after salvaging PR #29723.	2026-05-25 06:23:18 -07:00
kylekahraman	ab42658dfc	feat: configurable paste collapse thresholds (TUI + CLI) Adds two new config keys: - paste_collapse_threshold (default: 5) — line count threshold for bracketed paste collapse in both TUI and CLI - paste_collapse_threshold_fallback (default: 0, disabled) — same for the fallback heuristic in terminals without bracketed paste support TUI frontend reads these from config.get full via applyDisplay/patchUiState. CLI reads from self.config at paste-handling time. Closes #5626 Related: #5623	2026-05-25 06:23:18 -07:00
zccyman	973bb124a4	fix(credential-pool): rotate immediately when credential already exhausted Closes #26145. When the user interrupts the retry loop between two 429s (Ctrl-C in interactive mode, /new, gateway disconnect), the local has_retried_429 flag dies with the recovery function. On the next user prompt the agent restarts with has_retried_429=False, hits 429 on the exhausted credential, sets the flag, returns 'retry once'. Repeat forever — the second 429 that would trigger rotation is never reached, and healthy entries (priority>0 free/paid accounts) are never tried. Fix: in recover_with_credential_pool's rate_limit branch, pre-check pool.current().last_status before running the retry-once dance. If the current entry is already STATUS_EXHAUSTED, rotate immediately. Uses getattr() for the attribute read so existing tests with SimpleNamespace mocks (which only set 'label') keep working. Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>	2026-05-25 06:21:28 -07:00
Teknium	0a6a0ba527	test(skills): widen assertion in PR#6656 regression to accept new validator msg The new install-path validator from this PR raises 'Unsafe install path: ...' earlier in the pipeline than the previous resolve-then-check path. Behavior is identical (ok=False, victim untouched, refused before rmtree) — only the error string changed.	2026-05-25 06:13:36 -07:00
峯岸亮	3b9b9a7ad7	fix(skills): guard uninstall lock paths Validate Skills Hub lock-file install paths at both ends of the lifecycle so a poisoned or malformed lock.json entry cannot drive shutil.rmtree to a location outside SKILLS_DIR: - HubLockFile.record_install rejects empty/'.'/absolute/traversal/ Windows-drive paths at write time, and requires the final path component to match the skill name (shape: '<skill>' or '<category>/<skill>'). - install_from_quarantine resolves its destination through the same validator, catching symlink/junction redirects inside skills/. - uninstall_skill resolves the lock entry through the new validator before rmtree. Refuses anything that resolves to SKILLS_DIR itself (empty/dot paths) or to a target outside SKILLS_DIR (absolute paths, traversal, symlinked dirs in skills/ pointing outward). - 14 focused regression tests covering each rejection class plus a symlink-redirect case. E2E verified: hand-crafted poisoned lock.json entries (absolute path, empty install_path, traversal) all refuse and leave the targeted victim untouched; legitimate uninstall still succeeds. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-25 06:13:36 -07:00
Teknium	0d137f1039	feat(errors): actionable guidance for Nous OAuth 401s (#32082 ) Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path), but the non-retryable-401 guidance branch only covered openai-codex and xai-oauth. A Nous 401 fell through to the generic 'Your API key was rejected... run hermes setup' message, which is wrong advice — the user needs hermes auth add nous --type oauth, not an API key. Also flag the case where the failing model slug ends in :free (OpenRouter syntax) while provider is nous. Without that hint, users re-OAuth successfully and then hit the same 401 on the next message because Nous Portal doesn't carry the OpenRouter free-tier slug. Reported by ashh — debug dump showed Nous device_code exhausted + deepseek/deepseek-v4-flash:free as the model.	2026-05-25 06:06:51 -07:00
wysie	dbe5d84972	fix(auxiliary): universal main-model fallback for aux tasks (#31845 ) Aux callers (title generation, vision, session search, etc.) can reach resolve_provider_client() without an explicit model when the user picked their main provider via 'hermes model' and didn't bother configuring a per-task auxiliary.<task>.model override. The expectation in that case is universal: 'use my main model for side tasks too.' Before, the OAuth providers (xai-oauth, openai-codex) silently returned (None, None) on an empty model — both lack a catalog default because their accepted-model lists drift on the backend. That caused _resolve_auto to drop to its Step-2 fallback chain (OpenRouter / Nous / etc.), so aux tasks billed against the wrong subscription without warning. The fix is at the top of resolve_provider_client() — a single 3-step universal fallback that runs before any provider branch, so no provider-specific empty-model guards are needed (now or for any future provider we add): 1. caller-passed model (caller knew what they wanted) 2. provider's catalog default (cheap aux model, if registered) 3. user's main model from config.yaml Behaviour by provider class: - OAuth providers (xai-oauth, openai-codex) — no catalog default, so step 3 applies. Title gen runs on grok-4.3 / gpt-5.4 against the user's actual subscription instead of leaking to OpenRouter. - API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog default wins at step 2, preserving the original 'cheap aux model' behaviour. Anthropic users still get claude-haiku-4-5 for titles, not opus. - Explicit-model callers (auxiliary.<task>.model config, programmatic callers) — caller wins at step 1, no surprise switching. Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch specifically. The universal shape supersedes the per-branch fix and covers openai-codex (same bug class) plus any future OAuth providers. 4 new tests in TestResolveProviderClientUniversalModelFallback: - empty_model_for_oauth_provider_falls_back_to_main_model - empty_model_for_codex_also_uses_main_model - empty_model_for_catalog_provider_uses_catalog_default - explicit_model_takes_precedence_over_fallbacks 365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py. Co-authored-by: wysie <wysie@users.noreply.github.com>	2026-05-25 05:50:56 -07:00
Teknium	46c1ae8b24	fix(tests): four pre-existing flakes from the security cluster merge (#32072 ) All four failures were broken by the security cluster (#10082 / #10133 / #4609 / symlink-reject batch) merging on May 25. They were red on origin/main HEAD when #32042 and #32061 ran, gating PRs that touched unrelated code. 1) tests/hermes_cli/test_update_zip_symlink_reject.py test_update_via_zip_accepts_normal_member called the real _update_via_zip without sandboxing PROJECT_ROOT — so the function's shutil.copytree() actually copied the fake README from the test ZIP over the real repo's README.md, which then made test_readme_mentions_powershell_installer fail in any test run that happened to pick this test up earlier. Mock PROJECT_ROOT to an isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall doesn't actually run, and assert the fake README lands in the sandbox (not the real tree). 2) tests/tools/test_windows_native_support.py test_readme_mentions_powershell_installer was the victim of (1) — nothing wrong with the test itself, the fix in (1) clears it. 3) tests/tools/test_file_read_guards.py test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3') expecting False. But _is_blocked_device runs realpath() and on pytest xdist workers fd 3 happens to be dup'd to /dev/urandom (because the worker subprocess inherits open fds from pytest's collection pipe machinery). Switch to the lower-level _is_blocked_device_path which is the path-pattern check the test actually means to exercise; realpath-resolution coverage already lives in test_symlink_to_blocked_device_is_blocked. 4) tests/tools/test_transcription_tools.py Module installed a faster_whisper stub via sys.modules without setting __spec__, then later @pytest.mark.skipif called importlib.util.find_spec('faster_whisper') which raises 'ValueError: __spec__ is None' for modules with a None spec attr. Set __spec__ on the stub to a real ModuleSpec. Validation: 195/195 green across the 4 affected files.	2026-05-25 05:50:29 -07:00

1 2 3 4 5 ...

9577 commits