Commit graph

507 commits

Author SHA1 Message Date
helix4u
71291d83cd test: keep tirith checks hermetic 2026-05-23 02:20:14 -07:00
Teknium
97e975edd2 fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:

  - HERMES_HOME/.env                       (provider API keys)
  - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
  - HERMES_HOME/mcp-tokens/                (OAuth token directory; dir
                                            + everything inside)

…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).

Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:

  - Models that respect tool denials empirically tend to stop rather
    than reach for the shell.
  - The denial surfaces an audit trail when something tries to read
    credentials — easier to spot in logs than a generic `cat`.

Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.

Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).

Co-authored-by: Tom Qiao <zqiao@microsoft.com>
2026-05-22 20:15:09 -07:00
briandevans
567ea61298 fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.

Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
briandevans
056e00a77e fix(file-safety): block read_file on HERMES_HOME credential stores (#17656)
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.

Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.

Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
Julien Talbot
09afafb87e fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
Teknium
c769be344a
fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of
OpenAI-compatible servers) follow the OpenAI spec strictly and require
tool message `content` to be a string — they reject our list-type
content (text + image_url parts) with HTTP 400 'text is not set' /
'tool message content must be a string'.

Instead of an allowlist of known-good providers (maintenance burden,
guaranteed to miss aggregators like OpenRouter where the underlying
model determines support, not the aggregator name), this lands a
reactive recovery:

1. New `FailoverReason.multimodal_tool_content_unsupported` with a
   small pattern list covering the common 400 wordings.
2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API
   message list, downgrades any `role:tool` message whose content is
   list-with-image to a plain text summary (preserves text parts) in
   place, AND records the active (provider, model) in a session-scoped
   `_no_list_tool_content_models` set.
3. `_tool_result_content_for_active_model` short-circuits to a text
   summary when (provider, model) is in the cache — so after the first
   400 + retry, subsequent screenshots in the same session skip the
   round trip entirely.
4. Retry hook in `agent.conversation_loop` mirrors the existing
   `image_too_large` recovery: detect the reason, run the helper,
   retry once, fall through to the normal error path if no list-type
   tool content was actually present.

Cache is transient (per-session) by design — next session retries in
case the provider added support, no persistent state to maintain.

Fixes #27344. Closes #27351 (allowlist approach superseded by reactive
recovery).
2026-05-21 23:40:16 -07:00
Teknium
e77f1ed5f7 fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.

Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).

Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
2026-05-21 23:18:37 -07:00
lempkey
4c61fb6cf6 fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.

Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.

Gate condition (matches the natural meaning of enabled_toolsets):
  None                       → no filter, inject (backward compat)
  contains 'memory'          → user opted in, inject
  otherwise (including [])   → skip injection

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-21 23:18:37 -07:00
teknium1
9896e43db5 fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.

Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.

Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.

E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
  Before: _find_all_skills() returned ~3 skills.
  After:  _find_all_skills() returns 84 skills including
          github-pr-workflow, google-workspace, github-auth,
          huggingface-hub. Apple-only skills remain excluded.

Non-Termux Linux/macOS/Windows behavior unchanged (verified).

Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
2026-05-21 19:08:38 -07:00
Teknium
3fde8c153d
fix(skills): prune dependency/venv dirs from all skill scanners (#30042)
* fix(skills): skip dependency dirs in skill scan

* fix(skills): widen sibling rglob scanners to use shared exclusion set

Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.

Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.

Sites touched:
  agent/curator_backup.py        - skill backup file count
  gateway/run.py                 - disabled-skill response (2 sites)
  hermes_cli/dump.py             - skill count in env dump
  hermes_cli/profile_describer.py- profile description (2 sites)
  hermes_cli/profile_distribution.py - profile install count
  hermes_cli/profiles.py         - profile skill count
  hermes_cli/skills_hub.py       - category detection
  tools/skill_manager_tool.py    - skill name lookup (already used set, now uses helper)
  tools/skill_usage.py           - usage tracking + skill dir lookup (2 sites)
  tools/skills_hub.py            - optional skills find + scan (2 sites)
  tools/skills_sync.py           - bundled skills sync

E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.

* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-21 14:18:02 -07:00
helix4u
ba9964ff0d fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.

This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.

Example config:

```yaml
custom_providers:
  - name: gemma-local
    base_url: http://localhost:8080/v1
    model: google/gemma-4-31b-it
    extra_body:
      enable_thinking: true
      reasoning_effort: high
```

For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:

```yaml
extra_body:
  chat_template_kwargs:
    enable_thinking: true
```

Changes:

- `hermes_cli/config.py`: preserve `extra_body` in normalized
  `custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
  as `request_overrides.extra_body` for named custom runtime resolution,
  including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
  entry by `base_url` (+ optional model) and merge its `extra_body` into
  `AIAgent.request_overrides`, with caller-provided overrides winning on
  conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
  behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
  user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
  `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
  trailing-slash equivalence, fallback when no `model` field is set, and
  caller-override merging precedence.

Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.

Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.

Closes #29022
2026-05-21 07:48:53 -07:00
ethernet
48be2e0e4d
test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Change from cpu_count to cpu_count*2. The runner is
    I/O-bound (waiting on subprocess.communicate() from pytest children)
    The parent process does almost no CPU work, so 2x oversubscription
    keeps more pipes full. When a file fails, immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

instead of manually dropping some vars, now we just only include some

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30
Teknium
32aea113f0 fix(agent): consult supports_vision override in auto-mode routing
The contributor PR (#17936) only patched the strip path in
`_model_supports_vision()`. The auto-mode router in
`agent/image_routing._lookup_supports_vision` still only read models.dev,
so a custom-provider model declared as vision-capable would still get its
images routed through vision_analyze in the default `agent.image_input_mode:
auto` setting. Users had to set both `supports_vision: true` AND
`image_input_mode: native` to bypass the text pipeline.

Single-knob behavior now: `supports_vision: true` alone is enough in auto
mode. The strip path and the routing path consult the same resolver.

- Extract override resolution into `_supports_vision_override()` in
  agent/image_routing.py and wire it into `_lookup_supports_vision()`.
- Refactor `run_agent._model_supports_vision` to call the same helper
  (DRY, single source of truth for the resolution order).
- Strict YAML boolean coercion: `supports_vision: "false"` (quoted —
  a common YAML mistake) no longer coerces to True via bool() truthiness.
  Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1.
  Unrecognised values return None and fall through to models.dev.
- Add @CNSeniorious000 to AUTHOR_MAP for release attribution.

Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride,
TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing
contributor tests + image_routing + vision_native_fast_path +
native_image_buffer_isolation all green (92/92).
2026-05-20 23:27:10 -07:00
Teknium
b4afc6546e fix(xai): restore encrypted reasoning replay across turns
xAI partner integration requires Hermes to thread `encrypted_content`
reasoning items back to the Responses API on every turn so Grok can
maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this
off for `is_xai_responses` on the theory that the OAuth/SuperGrok
surface rejected replayed encrypted blobs and produced the multi-turn
"Expected to have received \`response.created\` before \`error\`"
failure. That diagnosis was wrong — the prelude-SSE fallback added in
the same PR is what actually fixed that failure mode. Suppressing the
replay was an unnecessary side-effect that broke the whole point of
xAI's partnership integration.

Changes:
- agent/codex_responses_adapter.py — drop the `is_xai_responses` gate
  in `_chat_messages_to_responses_input`. Keep the kwarg in the
  signature for transport compatibility; update the docstring to
  document the May 2026 reversal.
- agent/transports/codex.py — restore
  `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI
  Responses path so xAI echoes encrypted reasoning back to us.
- tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three
  xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST
  include encrypted_content in the request).
- tests/agent/transports/test_codex_transport.py — flip the
  `include` assertions on `test_xai_reasoning_effort_passed` and
  `test_xai_grok_4_omits_reasoning_effort`; update the allowlist
  block comment.

The prelude-SSE fallback and the entitlement-403 surfacing fixes from
#26644 are untouched — they were independent fixes that happened to
ride along with the reasoning-replay gate.

Validation:
- Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py +
  tests/agent/transports/test_codex_transport.py → 65/65 pass
- Broader: tests/agent/transports/ + tests/run_agent/ →
  1674 passed, 3 skipped, 0 failures
- E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport
  build_kwargs): turn-1 request carries
  `include: ["reasoning.encrypted_content"]`; turn-2 input replays
  the encrypted_content blob from turn-1's
  `codex_reasoning_items`; native Codex unchanged.
2026-05-20 23:12:45 -07:00
Savanne Kham
258965663c fix(chat_completions): strip tool_name from messages for strict providers
The 'tool_name' key on role=tool messages is an internal Hermes field
(stored in the messages.tool_name SQLite column for FTS indexing) that
is not part of the OpenAI Chat Completions schema. Strict OpenAI-compatible
providers — notably Moonshot AI (Kimi) — reject it with HTTP 400:

  Error from provider: Extra inputs are not permitted,
  field: 'messages[N].tool_name', value: 'execute_code'

Add 'tool_name' to the sanitize block in ChatCompletionsTransport.convert_messages
alongside the existing Codex Responses API fields (codex_reasoning_items,
codex_message_items) so it is popped before the request is sent.

Reproducer:
  hermes chat --model kimi-k2.6
  > list the top 5 Hacker News stories
  -> assistant emits tool_call(execute_code)
  -> tool result message gets tool_name='execute_code'
  -> next turn's payload includes messages[N].tool_name -> 400

Permissive backends (MiniMax, OpenRouter on most routes) ignore the extra
field and were masking the bug.
2026-05-20 02:44:08 -07:00
brooklyn!
5e743559e0
fix(lint): skip per-file shell linter when LSP will handle the file (#29054)
* fix(lint): skip per-file shell linter when LSP will handle the file

`_check_lint` ran `npx tsc --noEmit FILE.ts` after every `.ts`/`.tsx`
edit. `tsc` ignores `tsconfig.json` when given an explicit file argument
(documented quirk) and defaults to no-lib / ES5, so every ES2015+ stdlib
reference reports as missing:

  - `Cannot find global value 'Promise'`
  - `Cannot find name 'Map' / 'Set' / 'ReadonlySet' / 'Iterable'`
  - `Property 'isFinite' does not exist on type 'NumberConstructor'`
  - `Module 'phaser' can only be default-imported using esModuleInterop`
  - `import.meta is only allowed when --module is es2020+`

On real TypeScript projects this floods the `lint` field on
WriteResult / PatchResult with up to 25K tokens of false positives
per edit. The delta filter in `_check_lint_delta` is supposed to mask
them, but a tiny edit shifts line numbers and every phantom resurfaces
as "introduced by this edit". The result is a 1MB+ phantom-error dump
on every patch that eats the agent's context budget. Same shape for
`.go` (`go vet` outside a module) and `.rs` (`rustfmt --check` outside
a Cargo project).

PR #24168 added an LSP tier on top of this — real `tsserver` / `gopls`
/ `rust-analyzer` diagnostics surface in the separate `lsp_diagnostics`
field. But the broken shell linter kept running underneath, so the
phantom-error dump kept happening even when LSP was giving us a clean
authoritative signal.

This change short-circuits the shell linter for the structurally-broken
extensions (`.ts`, `.tsx`, `.go`, `.rs`) when an LSP server is active
and claims the file via `LSPService.enabled_for(path)`. The LSP tier
runs as before and carries the real diagnostics in `lsp_diagnostics`.
Other shell linters (`py_compile`, `node --check`) keep running
unconditionally — they're fast, file-local, and correct.

Default behavior (LSP disabled, LSP misconfigured, remote backend, file
outside a workspace) is unchanged — the existing fallback paths trigger
when `_lsp_will_handle` returns False, so users who haven't opted into
LSP get the same shell-linter behavior they had before.

Drive-by: `.tsx` was missing from the `LINTERS` table entirely, so TS
React files got no post-edit syntax check at all. Added it for
symmetry; in practice it now hits the LSP-skip path.

Tests:
  - `tests/agent/lsp/test_shell_linter_lsp_skip.py` — 14 tests covering:
    * skip happens for each redundant extension when LSP claims the file
      (asserted by patching `_exec` to raise on any shell-linter call)
    * shell linter still runs when LSP is inactive (regression guard)
    * `.py` / `.js` continue to run unconditionally even with LSP active
    * `_lsp_will_handle` is exception-safe: returns False on None
      service, remote backend, or `enabled_for` raising
    * `.tsx` is in both `LINTERS` and `_SHELL_LINTER_LSP_REDUNDANT`
  - All pre-existing tests in `tests/agent/lsp/` and
    `tests/tools/test_file_operations*.py` still pass (233/233).

* fix(lint): address Copilot review on #29054

Two fixes from copilot-pull-request-reviewer on PR #29054:

1. `.tsx` regression with LSP disabled
   (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017282)

   The first revision added `.tsx` to the `LINTERS` table so that
   TypeScript React files would hit the LSP skip path. Side effect:
   when LSP is *disabled* (the default), `.tsx` edits would suddenly
   run `npx tsc --noEmit FILE.tsx` and inherit the same phantom-error
   dump this PR is supposed to fix. Pre-PR behavior was implicit
   `skipped` (no `LINTERS` entry); restore that.

   - Remove `.tsx` from `LINTERS`.
   - Remove `.tsx` from `_SHELL_LINTER_LSP_REDUNDANT` (the skip path
     is unreachable without a `LINTERS` entry — falls through to
     `ext not in LINTERS` first).
   - When LSP IS enabled, `.tsx` is still covered by the LSP tier
     via `_maybe_lsp_diagnostics` (typescript-language-server's
     `extensions` tuple includes `.tsx`), so the diagnostics still
     surface — just on the `lsp_diagnostics` channel, not `lint`.
   - Update test_shell_linter_lsp_skip.py to reflect this contract
     (drop `.tsx` from the parametrize lists; add
     `test_tsx_stays_out_of_linters_table_for_default_compatibility`
     and `test_tsx_default_check_lint_returns_skipped`).

2. V4A patches dropped `WriteResult.lsp_diagnostics`
   (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017295)

   `tools/patch_parser.py::apply_v4a_operations` calls
   `file_ops.write_file()` per operation, then calls `_check_lint()`
   directly afterwards — but never propagates `WriteResult.lsp_diagnostics`
   to the `PatchResult`. The shell-linter skip introduced in this PR
   makes the gap visible: a `.ts` / `.go` / `.rs` V4A patch with LSP
   active would return `lint = {f: {skipped: True}}` and zero
   diagnostics from any channel.

   - `_apply_add` and `_apply_update` now return
     `Tuple[bool, str, Optional[str]]` where the third element is
     `WriteResult.lsp_diagnostics` (or `None` on failure / no diags).
   - `_apply_delete` and `_apply_move` stay 2-tuples — they don't
     produce diagnostics, no write goes through `write_file`.
   - `apply_v4a_operations` accumulates per-file diagnostics blocks
     and surfaces a combined block on `PatchResult.lsp_diagnostics`.
     Each block already carries its `<diagnostics file="...">` header
     from `LSPService.report_for_file`, so concatenation preserves
     per-file attribution.

Tests added (`test_patch_parser.py::TestV4ALspDiagnosticsPropagation`):

- ADD op: `WriteResult.lsp_diagnostics` flows to `PatchResult`
- UPDATE op: same
- No diagnostics → `PatchResult.lsp_diagnostics is None` (not "")
- Multi-file patch: combined block contains every per-file block

Verification:

- Targeted test scope: 257/257 pass
  (tests/agent/lsp/, tests/tools/test_file_operations*.py,
  tests/tools/test_patch_parser.py)
- Wider sweep: 5400 pass; 11 failures all pre-existing on origin/main
  (file_staleness / file_read_guards / file_state_registry — unrelated
  macOS /var/folders tmp-path sensitivity issues, confirmed by
  re-running on a clean origin/main checkout)

* docs(test): align shell-linter LSP skip docstring with .tsx behavior

Copilot review feedback (review #4324947616, comment #3271049036):
the test module docstring still listed .tsx alongside .ts/.go/.rs in
the skip contract, but .tsx is now intentionally NOT in LINTERS or
_SHELL_LINTER_LSP_REDUNDANT. Updated the bullet list to drop .tsx from
the skip contract and added a paragraph documenting why .tsx is left
out (preserves pre-PR implicit-skip behavior for LSP-disabled users;
LSP coverage still happens via _maybe_lsp_diagnostics).

* test(lsp): drop unused tmp_path from _make_fops helper

Copilot review #3271069484: the helper accepted tmp_path but never
used it. Callers still need tmp_path themselves for the file they're
asserting against, so we just drop the helper's parameter.
2026-05-20 01:46:40 -05:00
ooovenenoso
d759a67c0f fix: add recovery hints to loop guard warnings 2026-05-19 00:12:12 -07:00
Teknium
b5c1fe78aa
feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)
Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that
group several skills under one slash command. Invoking /<bundle-name>
from any surface (CLI, TUI, dashboard, any gateway platform) loads
every referenced skill into a single combined user message.

Use cases:
- /backend-dev → loads github-code-review + test-driven-development
  + github-pr-workflow as one bundle.
- /research → loads several research skills together.
- Team task profiles shared via dotfiles.

Behavior:
- Bundles take precedence over individual skills when slugs collide.
- Missing skills are skipped with a note, not fatal.
- No system-prompt mutation — bundles generate a fresh user message
  at invocation time, the same way /<skill> does. Prompt cache stays
  intact.
- Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI),
  /help display.

Schema (~/.hermes/skill-bundles/<slug>.yaml):
    name: backend-dev
    description: Backend feature work.
    skills:
      - github-code-review
      - test-driven-development
    instruction: |
      Optional extra guidance prepended to the loaded skills.

New module: agent/skill_bundles.py — load, scan, resolve, build
invocation message, save, delete. yaml.safe_load only; broken
bundles log a warning and are skipped, never raise.

New CLI subcommand: hermes bundles {list,show,create,delete,reload}.
Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py.
'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it.

New in-session slash command: /bundles lists installed bundles in
both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py)
and gateway (gateway/run.py) before the existing /<skill-name> path.

Autocomplete: SlashCommandCompleter gained an optional
skill_bundles_provider parameter that defaults to None — the prompt
shows '▣ <description> (N skills)' for bundles vs '' for skills.

Tests:
- tests/agent/test_skill_bundles.py — 33 tests covering slugify,
  scan/cache freshness, resolve (including underscore→hyphen
  Telegram alias), build_bundle_invocation_message (loading, missing
  skills, user/bundle instruction injection, dedup), save/delete,
  reload diff, list sort.
- tests/hermes_cli/test_bundles.py — 8 tests for the CLI
  subcommand (create/list/show/delete/reload, --force, missing
  bundle errors).
- tests/gateway/test_bundles_command.py — 4 tests for the gateway
  handler and bundle resolution priority.

Live E2E: verified subprocess invocations of hermes bundles
{list,create,show,reload,delete} round-trip correctly against an
isolated HERMES_HOME.

Docs:
- website/docs/user-guide/features/skills.md — new 'Skill Bundles'
  section with quick example, YAML schema, management commands,
  behavior notes.
- website/docs/reference/cli-commands.md — 'hermes bundles' added to
  the top-level command table and given its own subcommand section.
2026-05-18 21:38:05 -07:00
briandevans
756900723a fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without
enforcement steering — agents narrate "calling tool X" without actually
emitting a tool call, or run partial loops. Both model families fit the
same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected
for (gpt, codex, gemini, gemma, grok, glm).

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>

Squashed salvage of:
- 403e567ce fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS
- 9433eabe7 test(agent): use realistic qwen-plus identifier in enforcement test

Fixes #28079.
2026-05-18 20:06:49 -07:00
AceWattGit
25e0f4d465 fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace
The conversation_loop.py references _pool_may_recover_from_rate_limit which
was defined in run_agent.py. After the conversation-loop extraction refactor,
the helper was no longer in the same module scope. Wrap the call as
_ra()._pool_may_recover_from_rate_limit() to route through the run_agent
monkeypatch namespace where the helper is available.

Adds regression test in test_gemini_fast_fallback.py.

Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.
2026-05-18 20:04:57 -07:00
墨綠BG
50e93f23f2 🐛 fix(memory): require newline after context tag 2026-05-18 10:53:08 -07:00
墨綠BG
341c8d3030 🐛 fix(memory): keep inline memory-context mentions visible 2026-05-18 10:53:08 -07:00
EloquentBrush0x
b570e0fdd0 fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions
When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403,
token revoked or reused), _mark_exhausted was called but auth.json was left with
the dead credentials. On the next session, _seed_from_singletons re-read
auth.json and re-seeded the pool with the same revoked token, triggering the
same terminal failure in a loop.

Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine
block in _refresh_entry: when a terminal error is detected and auth.json holds
no newer tokens, clear access_token/refresh_token from auth.json and remove all
device_code-sourced pool entries from memory. Mirrors the Nous quarantine added
in c90556262 and the xAI quarantine in #28116.

Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure,
matching the xAI and Nous patterns, to avoid refresh_token_reused races when
multiple Hermes processes share the same auth.json singleton.

Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely
stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems);
fix re-applied surgically on current main with their predicate and tests preserved.
2026-05-18 10:31:40 -07:00
Teknium
9aae59feab
fix(compress): make abort-on-summary-failure opt-in via config flag (#28117)
PR #28102 made the summary-failure abort path the unconditional default,
changing established behavior. Gate it behind config.yaml flag
`compression.abort_on_summary_failure` (default False = historical
fallback-placeholder behavior).

- hermes_cli/config.py: new `compression.abort_on_summary_failure` key,
  default False, documented inline.
- agent/agent_init.py: read the flag from compression config and pass to
  ContextCompressor.
- agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure`
  (default False). `compress()` failure branch gates the abort on the
  flag; when False, falls through to the restored legacy fallback path
  (static "summary unavailable" placeholder + drop middle window).
- tests: restore original fallback expectations as default; add new
  TestAbortOnSummaryFailure class for the opt-in mode.

Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort
detection, locale `gateway.compress.aborted` key) from PR #28102 stays
intact — those paths only fire when `_last_compress_aborted` is True,
which now only happens when the flag is enabled.
2026-05-18 10:28:20 -07:00
EloquentBrush0x
5e40f83cb7 fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions
When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403,
i.e. revoked or reused refresh token), _refresh_entry's existing race-
recovery path re-syncs from auth.json and returns if another process has
already rotated the tokens.  If auth.json still holds the same stale
token pair, the function fell through to _mark_exhausted — leaving the
dead credentials in auth.json.  On the next Hermes startup _seed_from_singletons
re-seeded the pool from those stale tokens, causing the same failure loop
on every session.

Fix: after the auth.json re-sync check in the xAI-oauth error handler,
detect terminal errors with the new _is_terminal_xai_oauth_refresh_error
helper and apply a quarantine:
- Clear access_token and refresh_token from providers["xai-oauth"]["tokens"]
  in auth.json so they are not re-seeded.
- Write a last_auth_error entry for hermes doctor / auth status diagnostics.
- Remove all loopback_pkce entries from the in-memory pool so the current
  session stops retrying with the dead credentials.

Mirrors the identical quarantine already in place for Nous OAuth
(c90556262).

Closes the parity gap introduced when c90556262 added Nous-only terminal
error handling without a corresponding xAI-oauth path.
2026-05-18 10:28:09 -07:00
flamiinngo
5613dfea93 fix(security): redact xAI (Grok) API keys in logs
xAI is a first-class provider in hermes-agent with its own credential
pool entry (XAI_API_KEY / xai-oauth). API keys follow the format
xai-<60+ alphanumeric chars> and were absent from _PREFIX_PATTERNS in
agent/redact.py.

When a key appears raw in log output, tool results, or error messages,
it passed through completely unmasked. The ENV-assignment and Bearer
header patterns catch the most common cases, but a raw token in a
stack trace or debug print had no protection.

Verified before fix:
  redact_sensitive_text("using key xai-ABCD...rstu to call xAI", force=True)
  # "using key xai-ABCD...rstu to call xAI"  <- exposed

After fix:
  # "using key xai-AB...rstu to call xAI"    <- masked

Five unit tests added to TestXaiToken covering bare token masking,
env assignment, short-prefix false positive, company name false
positive, and visible prefix in masked output.
2026-05-18 10:21:22 -07:00
Teknium
1634397ddb
fix(compress): abort instead of dropping messages when summary LLM fails (#28102)
When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
2026-05-18 10:19:40 -07:00
glennc
9df9816dab feat(azure-foundry): add Microsoft Entra ID auth
Use azure-identity DefaultAzureCredential for keyless Foundry auth.

Preserve refreshable callable credentials through OpenAI and Anthropic client paths.

Add setup, doctor, auth status, docs, and tests for Entra auth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-18 10:14:38 -07:00
teknium
f0c6d59148 fix(anthropic): scope MiniMax beta-strip to MiniMax only
Cherry-pick of @sharziki's #27022 routed Azure Foundry through
_requires_bearer_auth, which also triggered the MiniMax-specific
beta-strip in _common_betas_for_base_url — dropping the 1M-context
beta from Azure even though Azure needs it for 1M context.

Split the strip predicate: introduce _is_minimax_anthropic_endpoint
so the fine-grained-tool-streaming and context-1m strips only fire
for MiniMax hosts, leaving Azure's bearer-auth header swap intact
without losing 1M context.

Also add a regression test that asserts Azure gets Bearer auth,
the api-version query param, and the context-1m-2025-08-07 beta.
2026-05-18 09:27:18 -07:00
wysie
ff078738ea fix(skills): load symlinked skill slash commands 2026-05-18 00:34:29 -07:00
teknium1
4a3f13b47b perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.

Three changes:

1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
   System prompt now byte-stable for the full day. The model can still
   query exact time via tools when it actually needs it. Credit:
   @iamfoz (PR #20451).

2. Loud logging on session DB write failures. The update_system_prompt
   call used to log at DEBUG, hiding disk-full / locked-database / schema
   drift behind a silent fall-through that forced fresh rebuilds on
   every subsequent turn. Now WARN with the session id and exception so
   persistent issues show up in agent.log without verbose mode.

3. Three-way stored-state distinction on read. The previous
   'session_row.get("system_prompt") or None' collapsed three states
   into one (missing row / null column / empty string). Now we tell them
   apart and WARN when a continuing session lands on null/empty (which
   means the previous turn's write never persisted — every subsequent
   turn rebuilds and the prefix cache misses every time).

The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.

E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.

Tests:
  - tests/agent/test_system_prompt_restore.py (10 new tests)
  - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
        test_datetime_is_date_only_not_minute_precision

Closes #20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).

Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
2026-05-17 23:20:37 -07:00
teknium1
766f263bd2 test(auxiliary): cover layered fallback (chain → main agent → warn)
7 new tests:

TestAuxiliaryFallbackLayering (3):
  - configured_chain succeeds → main agent fallback NOT consulted
  - chain returns nothing → main agent fallback runs and succeeds
  - both exhausted → user-visible 'all fallbacks exhausted' warning
    fires before the original error is re-raised

TestTryMainAgentModelFallback (4):
  - returns (None, None, "") when main provider is 'auto'
  - returns (None, None, "") when failed provider == main provider
    (no point retrying the same backend)
  - resolves the main provider's client when configured correctly
  - skips when main provider is marked unhealthy
2026-05-17 17:15:31 -07:00
teknium1
ec096cfbd8 test(auxiliary): adapt eviction tests to capacity-error fallback
The two TestAuxiliaryClientPoisonedCacheEviction tests were written
when explicit-provider users got no fallback at all on connection
errors — they asserted ConnectionError propagated after eviction
because the fallback gate blocked the auto chain.

After the #26803 fix in the previous commit, capacity errors
(payment/quota/connection) now DO trigger fallback even on explicit
providers. The tests still verify cache eviction (their actual
contract) but now stub _try_payment_fallback so the fallback
machinery does not attempt a real network call.
2026-05-17 17:15:31 -07:00
Bartok9
24c209f112 fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers
Closes #26803

Root causes:
1. _is_payment_error() checked for billing keywords (credits, insufficient
   funds, billing, payment required) but missed daily token quota exhaustion
   phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g.
   'Too many tokens per day', 'quota exceeded', 'resource exhausted',
   'daily limit'. These are functionally identical to credit exhaustion
   (provider cannot serve the request) but don't trigger fallback.

2. The call_llm() fallback chain was gated on resolved_provider == 'auto'.
   When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM
   proxy, or 'openrouter'), capacity failures (payment/quota/connection)
   silently raise instead of trying alternatives. This is overly conservative:
   capacity errors mean the provider *cannot* serve the request regardless of
   user intent, so alternatives should always be tried.

Fixes:
- Add quota-related keywords to _is_payment_error(): quota_exceeded,
  too many tokens per day, daily limit, tokens per day, daily quota,
  resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when
  resolved_provider is not 'auto'. Rate-limit fallback stays gated on
  is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
2026-05-17 17:15:31 -07:00
Robin Fernandes
569bc94b59 fix(auth) fix a few cases where refresh tokens were not rotated. 2026-05-17 16:56:37 -07:00
Robin Fernandes
0bac7dd05b refactor(auth): collapse Nous inference fallback controls 2026-05-17 16:56:37 -07:00
Robin Fernandes
89a3d038cf Switch to JWT token for inference against Nous, falling back to old opaque token on failure. 2026-05-17 16:56:37 -07:00
Robin Fernandes
c905562623 fix(auth): stop replaying invalid Nous refresh tokens
Quarantine Nous OAuth state when refresh fails with terminal invalid_grant/invalid_token errors. Clear local and shared refresh material across runtime, managed access-token, proxy, and credential-pool paths so Hermes stops retrying revoked refresh sessions.
2026-05-17 16:56:37 -07:00
Hoang V. Pham
4a7cd2e16d fix(codex): allow kanban worker board writes 2026-05-17 11:50:43 -07:00
QuenVix
d5a0815c3d fix(transports): use monotonic deadlines in codex app-server turn loop 2026-05-17 11:37:45 -07:00
flamiinngo
63805965e7 fix(security): restore type safety and extract constant in shell hook block handler
Address code review feedback on _parse_response:

1. Restore isinstance(raw, str) guard so non-string message/reason values
   (e.g. integers, lists) from a malformed hook response fall back to the
   default rather than being forwarded as-is. This keeps the contract that
   message in the returned dict is always a string.

2. Extract the repeated literal 'Blocked by shell hook.' into a module-level
   constant _DEFAULT_BLOCK_MESSAGE to avoid duplication and make it easy to
   change in one place.

Four new unit tests added to tests/agent/test_shell_hooks.py covering:
- action block with no message (uses default)
- decision block with no reason (uses default)
- action block with empty string message (uses default)
- action block with non-string message, e.g. integer (uses default)
2026-05-17 02:31:18 -07:00
haran2001
d9abbe7fa4 fix(metadata): qwen3.6-plus has a 1M context window (#27008)
qwen3.6-plus did not have an explicit entry in DEFAULT_CONTEXT_LENGTHS,
so the longest-substring fallback matched the generic 'qwen': 131072
catch-all. That dropped the effective context limit from 1,048,576
tokens to 131,072, prematurely lowered the compression threshold, and
produced misleading warnings about main/compression context mismatch
in long sessions.

Add an explicit 'qwen3.6-plus': 1048576 entry before the catch-all and
cover it with a regression test (bare, qwen/, and dashscope/ prefixes).

Note: PR #6599 also mentions touching model_metadata.py but the actual
diff only edits hermes_cli/models.py, so this fix is independent and
not duplicated by that PR.

Closes #27008
2026-05-17 02:31:18 -07:00
kshitij
5fba236644
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.

All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.

Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
  gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
  tests/tools/: 18187 passed, 59 pre-existing failures (verified against
  origin/main with the same shape — identical failure count, identical
  category — all xdist test-order flakes unrelated to this change)

Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
2026-05-17 02:29:41 -07:00
Rahul
a52f014a8c fix(tests): mock keychain in TestReadClaudeCodeCredentials to prevent credential leakage
Tests in TestReadClaudeCodeCredentials were not mocking
_read_claude_code_credentials_from_keychain, which was added after the
tests were written. On macOS machines with real Claude Code credentials
stored in the Keychain, the function returns live credentials instead of
the test fixtures, causing assertions to fail and leaking real tokens in
test output.

Add an autouse fixture that stubs the keychain reader to None so all
tests in the class exercise only the file-based credential path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 23:05:27 -07:00
0xchainer
782d743730 test(skills): add regression test for skill load failure returning None
Add test_returns_none_when_skill_load_fails to verify that
build_skill_invocation_message() returns None when a registered
skill exists in the command cache but _load_skill_payload() fails.
This guards against regression of the fix in 877d01b.
2026-05-16 22:52:22 -07:00
Teknium
3b39096904
Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189)
After context compression, the protected tail messages retain their
original image parts. When those include multi-MB pasted screenshots,
every subsequent API request re-ships the same base-64 blobs forever —
which can push the request past provider body-size limits and wedge the
session even though compression 'succeeded'.

Add _strip_historical_media() to agent/context_compressor.py. After the
summary is built, find the newest user message that carries an image
part and replace image parts in every earlier message with a short
text placeholder ('[Attached image — stripped after compression]').
The newest image-bearing user turn keeps its media so the model can
still analyse what the user just sent.

Handles all three multimodal shapes:
  - OpenAI chat.completions image_url
  - OpenAI Responses API input_image
  - Anthropic native {type: image, source: ...}

Includes 27 unit tests covering the helpers and the end-to-end
compress() integration, plus a manual E2E check confirming a ~4MB
two-image conversation shrinks to ~2MB after compression.
2026-05-16 17:18:25 -07:00
Teknium
93e109a1d5
fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104)
Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects
two shapes that the rest of the JSON Schema ecosystem accepts:

1. $ref nodes with sibling keywords. Moonshot expands the reference before
   validation and then rejects the node if keys like `description`, `type`,
   or `default` appear alongside $ref. MCP-sourced tool schemas commonly
   put a `description` on $ref-typed properties so the model sees the
   field hint — which worked on every provider except Moonshot.

2. Tuple-style `items` arrays (positional element schemas). Moonshot's
   engine requires ONE schema applied to every array element. Common in
   tool schemas generated from Go/Protobuf that model fixed-length arrays
   as `[{type:number}, {type:number}]`.

Repairs applied in `agent/moonshot_schema.py`:

- Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only
  (strip every sibling). The referenced definition still carries its own
  description on the target node, which Moonshot accepts.
- Rule 4: when `items` is a list, collapse to the first element schema
  (falling back to `{}` which is then filled by the generic missing-type
  rule). Preserves `minItems` / `maxItems` / other siblings.

Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems,
plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type
still passes (it asserted plain $ref passes through; now it passes through
as exactly `{"$ref": "..."}` which is strictly compatible).

All 35 tests in test_moonshot_schema.py pass.
2026-05-16 13:02:19 -07:00
teknium1
72f94f4a7c test(security): regression guard for OAuth PKCE state/verifier separation
Two unit tests for run_hermes_oauth_login_pure():

1. test_authorization_url_state_is_not_pkce_verifier — asserts state in the
   auth URL is independent from the PKCE code_verifier sent in the token
   exchange, and that the verifier never appears in the URL.

2. test_callback_state_mismatch_aborts — asserts the flow returns None
   (no token exchange) when the callback state does not match the value
   we generated.

Negative control verified: reintroducing the b17e5c10 vulnerable pattern
(state = verifier, no callback validation) makes both tests fail.

Also adds AUTHOR_MAP entry for shaun0927 (contributor of the fix).
2026-05-16 02:38:02 -07:00
teknium1
374dc81c23 fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint
Follow-up improvements on top of @konsisumer's cherry-picked fix for #10648:

1. Deprecation patterns required BOTH a product fingerprint ('gh-copilot') and
   a deprecation marker. The previous list included 'copilot-cli' and bare
   'deprecation', which would false-positive on stderr from the NEW
   @github/copilot CLI — whose repo is literally github.com/github/copilot-cli
   and which legitimately surfaces those substrings in its own messages.

2. Replace the deprecation hint. The user in #10648 installed
   'gh extension install github/gh-copilot' (the deprecated extension)
   thinking that's what ACP mode uses, when ACP actually spawns the new
   'copilot' binary from '@github/copilot'. The hint now points users at the
   correct install command ('npm install -g @github/copilot') with the new
   CLI's repo URL, and demotes provider-switching to a fallback alternative.

3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the
   'github-models' alias to the canonical 'copilot' provider id, matching the
   convention used by every other entry in the table.

4. Sharpen the 413 hint message. The free tier's ~8K cap is below the
   system-prompt floor, so this endpoint is fundamentally incompatible with
   an agentic loop — not a 'use a different URL' problem.

Tests:
- New parametrized false-positive coverage for the new CLI's stderr shape.
- Updated assertion to require canonical 'copilot' provider mapping.
- All 14 deprecation/URL tests pass.
2026-05-16 02:24:48 -07:00
konsisumer
b85b938b1f test: add tests for copilot ACP deprecation detection and Azure URL mapping
Cover the deprecation pattern matching against real gh-copilot stderr
output, verify the GitHub Models Azure URL is in _URL_TO_PROVIDER, and
confirm _is_github_models_base_url recognises the Azure endpoint.
2026-05-16 02:24:48 -07:00