hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Savanne Kham	258965663c	fix(chat_completions): strip tool_name from messages for strict providers The 'tool_name' key on role=tool messages is an internal Hermes field (stored in the messages.tool_name SQLite column for FTS indexing) that is not part of the OpenAI Chat Completions schema. Strict OpenAI-compatible providers — notably Moonshot AI (Kimi) — reject it with HTTP 400: Error from provider: Extra inputs are not permitted, field: 'messages[N].tool_name', value: 'execute_code' Add 'tool_name' to the sanitize block in ChatCompletionsTransport.convert_messages alongside the existing Codex Responses API fields (codex_reasoning_items, codex_message_items) so it is popped before the request is sent. Reproducer: hermes chat --model kimi-k2.6 > list the top 5 Hacker News stories -> assistant emits tool_call(execute_code) -> tool result message gets tool_name='execute_code' -> next turn's payload includes messages[N].tool_name -> 400 Permissive backends (MiniMax, OpenRouter on most routes) ignore the extra field and were masking the bug.	2026-05-20 02:44:08 -07:00
brooklyn!	5e743559e0	fix(lint): skip per-file shell linter when LSP will handle the file (#29054 ) * fix(lint): skip per-file shell linter when LSP will handle the file `_check_lint` ran `npx tsc --noEmit FILE.ts` after every `.ts`/`.tsx` edit. `tsc` ignores `tsconfig.json` when given an explicit file argument (documented quirk) and defaults to no-lib / ES5, so every ES2015+ stdlib reference reports as missing: - `Cannot find global value 'Promise'` - `Cannot find name 'Map' / 'Set' / 'ReadonlySet' / 'Iterable'` - `Property 'isFinite' does not exist on type 'NumberConstructor'` - `Module 'phaser' can only be default-imported using esModuleInterop` - `import.meta is only allowed when --module is es2020+` On real TypeScript projects this floods the `lint` field on WriteResult / PatchResult with up to 25K tokens of false positives per edit. The delta filter in `_check_lint_delta` is supposed to mask them, but a tiny edit shifts line numbers and every phantom resurfaces as "introduced by this edit". The result is a 1MB+ phantom-error dump on every patch that eats the agent's context budget. Same shape for `.go` (`go vet` outside a module) and `.rs` (`rustfmt --check` outside a Cargo project). PR #24168 added an LSP tier on top of this — real `tsserver` / `gopls` / `rust-analyzer` diagnostics surface in the separate `lsp_diagnostics` field. But the broken shell linter kept running underneath, so the phantom-error dump kept happening even when LSP was giving us a clean authoritative signal. This change short-circuits the shell linter for the structurally-broken extensions (`.ts`, `.tsx`, `.go`, `.rs`) when an LSP server is active and claims the file via `LSPService.enabled_for(path)`. The LSP tier runs as before and carries the real diagnostics in `lsp_diagnostics`. Other shell linters (`py_compile`, `node --check`) keep running unconditionally — they're fast, file-local, and correct. Default behavior (LSP disabled, LSP misconfigured, remote backend, file outside a workspace) is unchanged — the existing fallback paths trigger when `_lsp_will_handle` returns False, so users who haven't opted into LSP get the same shell-linter behavior they had before. Drive-by: `.tsx` was missing from the `LINTERS` table entirely, so TS React files got no post-edit syntax check at all. Added it for symmetry; in practice it now hits the LSP-skip path. Tests: - `tests/agent/lsp/test_shell_linter_lsp_skip.py` — 14 tests covering: * skip happens for each redundant extension when LSP claims the file (asserted by patching `_exec` to raise on any shell-linter call) * shell linter still runs when LSP is inactive (regression guard) * `.py` / `.js` continue to run unconditionally even with LSP active * `_lsp_will_handle` is exception-safe: returns False on None service, remote backend, or `enabled_for` raising * `.tsx` is in both `LINTERS` and `_SHELL_LINTER_LSP_REDUNDANT` - All pre-existing tests in `tests/agent/lsp/` and `tests/tools/test_file_operations.py` still pass (233/233). fix(lint): address Copilot review on #29054 Two fixes from copilot-pull-request-reviewer on PR #29054: 1. `.tsx` regression with LSP disabled (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017282) The first revision added `.tsx` to the `LINTERS` table so that TypeScript React files would hit the LSP skip path. Side effect: when LSP is disabled (the default), `.tsx` edits would suddenly run `npx tsc --noEmit FILE.tsx` and inherit the same phantom-error dump this PR is supposed to fix. Pre-PR behavior was implicit `skipped` (no `LINTERS` entry); restore that. - Remove `.tsx` from `LINTERS`. - Remove `.tsx` from `_SHELL_LINTER_LSP_REDUNDANT` (the skip path is unreachable without a `LINTERS` entry — falls through to `ext not in LINTERS` first). - When LSP IS enabled, `.tsx` is still covered by the LSP tier via `_maybe_lsp_diagnostics` (typescript-language-server's `extensions` tuple includes `.tsx`), so the diagnostics still surface — just on the `lsp_diagnostics` channel, not `lint`. - Update test_shell_linter_lsp_skip.py to reflect this contract (drop `.tsx` from the parametrize lists; add `test_tsx_stays_out_of_linters_table_for_default_compatibility` and `test_tsx_default_check_lint_returns_skipped`). 2. V4A patches dropped `WriteResult.lsp_diagnostics` (https://github.com/NousResearch/hermes-agent/pull/29054#discussion_r3271017295) `tools/patch_parser.py::apply_v4a_operations` calls `file_ops.write_file()` per operation, then calls `_check_lint()` directly afterwards — but never propagates `WriteResult.lsp_diagnostics` to the `PatchResult`. The shell-linter skip introduced in this PR makes the gap visible: a `.ts` / `.go` / `.rs` V4A patch with LSP active would return `lint = {f: {skipped: True}}` and zero diagnostics from any channel. - `_apply_add` and `_apply_update` now return `Tuple[bool, str, Optional[str]]` where the third element is `WriteResult.lsp_diagnostics` (or `None` on failure / no diags). - `_apply_delete` and `_apply_move` stay 2-tuples — they don't produce diagnostics, no write goes through `write_file`. - `apply_v4a_operations` accumulates per-file diagnostics blocks and surfaces a combined block on `PatchResult.lsp_diagnostics`. Each block already carries its `<diagnostics file="...">` header from `LSPService.report_for_file`, so concatenation preserves per-file attribution. Tests added (`test_patch_parser.py::TestV4ALspDiagnosticsPropagation`): - ADD op: `WriteResult.lsp_diagnostics` flows to `PatchResult` - UPDATE op: same - No diagnostics → `PatchResult.lsp_diagnostics is None` (not "") - Multi-file patch: combined block contains every per-file block Verification: - Targeted test scope: 257/257 pass (tests/agent/lsp/, tests/tools/test_file_operations.py, tests/tools/test_patch_parser.py) - Wider sweep: 5400 pass; 11 failures all pre-existing on origin/main (file_staleness / file_read_guards / file_state_registry — unrelated macOS /var/folders tmp-path sensitivity issues, confirmed by re-running on a clean origin/main checkout) docs(test): align shell-linter LSP skip docstring with .tsx behavior Copilot review feedback (review #4324947616, comment #3271049036): the test module docstring still listed .tsx alongside .ts/.go/.rs in the skip contract, but .tsx is now intentionally NOT in LINTERS or _SHELL_LINTER_LSP_REDUNDANT. Updated the bullet list to drop .tsx from the skip contract and added a paragraph documenting why .tsx is left out (preserves pre-PR implicit-skip behavior for LSP-disabled users; LSP coverage still happens via _maybe_lsp_diagnostics). * test(lsp): drop unused tmp_path from _make_fops helper Copilot review #3271069484: the helper accepted tmp_path but never used it. Callers still need tmp_path themselves for the file they're asserting against, so we just drop the helper's parameter.	2026-05-20 01:46:40 -05:00
ooovenenoso	d759a67c0f	fix: add recovery hints to loop guard warnings	2026-05-19 00:12:12 -07:00
Teknium	b5c1fe78aa	feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373 ) Skill bundles are tiny YAML files in ~/.hermes/skill-bundles/ that group several skills under one slash command. Invoking /<bundle-name> from any surface (CLI, TUI, dashboard, any gateway platform) loads every referenced skill into a single combined user message. Use cases: - /backend-dev → loads github-code-review + test-driven-development + github-pr-workflow as one bundle. - /research → loads several research skills together. - Team task profiles shared via dotfiles. Behavior: - Bundles take precedence over individual skills when slugs collide. - Missing skills are skipped with a note, not fatal. - No system-prompt mutation — bundles generate a fresh user message at invocation time, the same way /<skill> does. Prompt cache stays intact. - Works in CLI dispatch, gateway dispatch, autocomplete (CLI + TUI), /help display. Schema (~/.hermes/skill-bundles/<slug>.yaml): name: backend-dev description: Backend feature work. skills: - github-code-review - test-driven-development instruction: \| Optional extra guidance prepended to the loaded skills. New module: agent/skill_bundles.py — load, scan, resolve, build invocation message, save, delete. yaml.safe_load only; broken bundles log a warning and are skipped, never raise. New CLI subcommand: hermes bundles {list,show,create,delete,reload}. Implementation in hermes_cli/bundles.py; wired in hermes_cli/main.py. 'bundles' added to _BUILTIN_SUBCOMMANDS so plugin discovery skips it. New in-session slash command: /bundles lists installed bundles in both CLI and gateway. /<bundle-name> dispatch added to CLI (cli.py) and gateway (gateway/run.py) before the existing /<skill-name> path. Autocomplete: SlashCommandCompleter gained an optional skill_bundles_provider parameter that defaults to None — the prompt shows '▣ <description> (N skills)' for bundles vs '⚡' for skills. Tests: - tests/agent/test_skill_bundles.py — 33 tests covering slugify, scan/cache freshness, resolve (including underscore→hyphen Telegram alias), build_bundle_invocation_message (loading, missing skills, user/bundle instruction injection, dedup), save/delete, reload diff, list sort. - tests/hermes_cli/test_bundles.py — 8 tests for the CLI subcommand (create/list/show/delete/reload, --force, missing bundle errors). - tests/gateway/test_bundles_command.py — 4 tests for the gateway handler and bundle resolution priority. Live E2E: verified subprocess invocations of hermes bundles {list,create,show,reload,delete} round-trip correctly against an isolated HERMES_HOME. Docs: - website/docs/user-guide/features/skills.md — new 'Skill Bundles' section with quick example, YAML schema, management commands, behavior notes. - website/docs/reference/cli-commands.md — 'hermes bundles' added to the top-level command table and given its own subcommand section.	2026-05-18 21:38:05 -07:00
briandevans	756900723a	fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS Qwen3.x and DeepSeek-V3.x default to chatty/hallucinatory tool use without enforcement steering — agents narrate "calling tool X" without actually emitting a tool call, or run partial loops. Both model families fit the same failure pattern TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for (gpt, codex, gemini, gemma, grok, glm). Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Squashed salvage of: - `403e567ce` fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS - `9433eabe7` test(agent): use realistic qwen-plus identifier in enforcement test Fixes #28079.	2026-05-18 20:06:49 -07:00
AceWattGit	25e0f4d465	fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace The conversation_loop.py references _pool_may_recover_from_rate_limit which was defined in run_agent.py. After the conversation-loop extraction refactor, the helper was no longer in the same module scope. Wrap the call as _ra()._pool_may_recover_from_rate_limit() to route through the run_agent monkeypatch namespace where the helper is available. Adds regression test in test_gemini_fast_fallback.py. Fixes: MAILROOM Email Triage NameError, OPS Execution Monitor NameError.	2026-05-18 20:04:57 -07:00
墨綠BG	50e93f23f2	🐛 fix(memory): require newline after context tag	2026-05-18 10:53:08 -07:00
墨綠BG	341c8d3030	🐛 fix(memory): keep inline memory-context mentions visible	2026-05-18 10:53:08 -07:00
EloquentBrush0x	b570e0fdd0	fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403, token revoked or reused), _mark_exhausted was called but auth.json was left with the dead credentials. On the next session, _seed_from_singletons re-read auth.json and re-seeded the pool with the same revoked token, triggering the same terminal failure in a loop. Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine block in _refresh_entry: when a terminal error is detected and auth.json holds no newer tokens, clear access_token/refresh_token from auth.json and remove all device_code-sourced pool entries from memory. Mirrors the Nous quarantine added in `c90556262` and the xAI quarantine in #28116. Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure, matching the xAI and Nous patterns, to avoid refresh_token_reused races when multiple Hermes processes share the same auth.json singleton. Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems); fix re-applied surgically on current main with their predicate and tests preserved.	2026-05-18 10:31:40 -07:00
Teknium	9aae59feab	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117 ) PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact — those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled.	2026-05-18 10:28:20 -07:00
EloquentBrush0x	5e40f83cb7	fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403, i.e. revoked or reused refresh token), _refresh_entry's existing race- recovery path re-syncs from auth.json and returns if another process has already rotated the tokens. If auth.json still holds the same stale token pair, the function fell through to _mark_exhausted — leaving the dead credentials in auth.json. On the next Hermes startup _seed_from_singletons re-seeded the pool from those stale tokens, causing the same failure loop on every session. Fix: after the auth.json re-sync check in the xAI-oauth error handler, detect terminal errors with the new _is_terminal_xai_oauth_refresh_error helper and apply a quarantine: - Clear access_token and refresh_token from providers["xai-oauth"]["tokens"] in auth.json so they are not re-seeded. - Write a last_auth_error entry for hermes doctor / auth status diagnostics. - Remove all loopback_pkce entries from the in-memory pool so the current session stops retrying with the dead credentials. Mirrors the identical quarantine already in place for Nous OAuth (`c90556262`). Closes the parity gap introduced when `c90556262` added Nous-only terminal error handling without a corresponding xAI-oauth path.	2026-05-18 10:28:09 -07:00
flamiinngo	5613dfea93	fix(security): redact xAI (Grok) API keys in logs xAI is a first-class provider in hermes-agent with its own credential pool entry (XAI_API_KEY / xai-oauth). API keys follow the format xai-<60+ alphanumeric chars> and were absent from _PREFIX_PATTERNS in agent/redact.py. When a key appears raw in log output, tool results, or error messages, it passed through completely unmasked. The ENV-assignment and Bearer header patterns catch the most common cases, but a raw token in a stack trace or debug print had no protection. Verified before fix: redact_sensitive_text("using key xai-ABCD...rstu to call xAI", force=True) # "using key xai-ABCD...rstu to call xAI" <- exposed After fix: # "using key xai-AB...rstu to call xAI" <- masked Five unit tests added to TestXaiToken covering bare token masking, env assignment, short-prefix false positive, company name false positive, and visible prefix in masked output.	2026-05-18 10:21:22 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102 ) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
teknium	f0c6d59148	fix(anthropic): scope MiniMax beta-strip to MiniMax only Cherry-pick of @sharziki's #27022 routed Azure Foundry through _requires_bearer_auth, which also triggered the MiniMax-specific beta-strip in _common_betas_for_base_url — dropping the 1M-context beta from Azure even though Azure needs it for 1M context. Split the strip predicate: introduce _is_minimax_anthropic_endpoint so the fine-grained-tool-streaming and context-1m strips only fire for MiniMax hosts, leaving Azure's bearer-auth header swap intact without losing 1M context. Also add a regression test that asserts Azure gets Bearer auth, the api-version query param, and the context-1m-2025-08-07 beta.	2026-05-18 09:27:18 -07:00
wysie	ff078738ea	fix(skills): load symlinked skill slash commands	2026-05-18 00:34:29 -07:00
teknium1	4a3f13b47b	perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR #20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes #20451 (date-only), #18547 (prefix stabilization), #8689 (stabilize timestamp across compression), #15866 (timestamp caching question), #8687 (compression timestamp), #27339 (claim #3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>	2026-05-17 23:20:37 -07:00
teknium1	766f263bd2	test(auxiliary): cover layered fallback (chain → main agent → warn) 7 new tests: TestAuxiliaryFallbackLayering (3): - configured_chain succeeds → main agent fallback NOT consulted - chain returns nothing → main agent fallback runs and succeeds - both exhausted → user-visible 'all fallbacks exhausted' warning fires before the original error is re-raised TestTryMainAgentModelFallback (4): - returns (None, None, "") when main provider is 'auto' - returns (None, None, "") when failed provider == main provider (no point retrying the same backend) - resolves the main provider's client when configured correctly - skips when main provider is marked unhealthy	2026-05-17 17:15:31 -07:00
teknium1	ec096cfbd8	test(auxiliary): adapt eviction tests to capacity-error fallback The two TestAuxiliaryClientPoisonedCacheEviction tests were written when explicit-provider users got no fallback at all on connection errors — they asserted ConnectionError propagated after eviction because the fallback gate blocked the auto chain. After the #26803 fix in the previous commit, capacity errors (payment/quota/connection) now DO trigger fallback even on explicit providers. The tests still verify cache eviction (their actual contract) but now stub _try_payment_fallback so the fallback machinery does not attempt a real network call.	2026-05-17 17:15:31 -07:00
Bartok9	24c209f112	fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers Closes #26803 Root causes: 1. _is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g. 'Too many tokens per day', 'quota exceeded', 'resource exhausted', 'daily limit'. These are functionally identical to credit exhaustion (provider cannot serve the request) but don't trigger fallback. 2. The call_llm() fallback chain was gated on resolved_provider == 'auto'. When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM proxy, or 'openrouter'), capacity failures (payment/quota/connection) silently raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider cannot serve the request regardless of user intent, so alternatives should always be tried. Fixes: - Add quota-related keywords to _is_payment_error(): quota_exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted (Vertex AI gRPC code). - Allow fallback for capacity errors (payment + connection) even when resolved_provider is not 'auto'. Rate-limit fallback stays gated on is_auto to honour explicit provider constraints for transient limits. - Apply both fixes to sync call_llm() and async acall_llm() paths. - Add 6 targeted tests for the new quota-error detection cases.	2026-05-17 17:15:31 -07:00
Robin Fernandes	569bc94b59	fix(auth) fix a few cases where refresh tokens were not rotated.	2026-05-17 16:56:37 -07:00
Robin Fernandes	0bac7dd05b	refactor(auth): collapse Nous inference fallback controls	2026-05-17 16:56:37 -07:00
Robin Fernandes	89a3d038cf	Switch to JWT token for inference against Nous, falling back to old opaque token on failure.	2026-05-17 16:56:37 -07:00
Robin Fernandes	c905562623	fix(auth): stop replaying invalid Nous refresh tokens Quarantine Nous OAuth state when refresh fails with terminal invalid_grant/invalid_token errors. Clear local and shared refresh material across runtime, managed access-token, proxy, and credential-pool paths so Hermes stops retrying revoked refresh sessions.	2026-05-17 16:56:37 -07:00
Hoang V. Pham	4a7cd2e16d	fix(codex): allow kanban worker board writes	2026-05-17 11:50:43 -07:00
QuenVix	d5a0815c3d	fix(transports): use monotonic deadlines in codex app-server turn loop	2026-05-17 11:37:45 -07:00
flamiinngo	63805965e7	fix(security): restore type safety and extract constant in shell hook block handler Address code review feedback on _parse_response: 1. Restore isinstance(raw, str) guard so non-string message/reason values (e.g. integers, lists) from a malformed hook response fall back to the default rather than being forwarded as-is. This keeps the contract that message in the returned dict is always a string. 2. Extract the repeated literal 'Blocked by shell hook.' into a module-level constant _DEFAULT_BLOCK_MESSAGE to avoid duplication and make it easy to change in one place. Four new unit tests added to tests/agent/test_shell_hooks.py covering: - action block with no message (uses default) - decision block with no reason (uses default) - action block with empty string message (uses default) - action block with non-string message, e.g. integer (uses default)	2026-05-17 02:31:18 -07:00
haran2001	d9abbe7fa4	fix(metadata): qwen3.6-plus has a 1M context window (#27008 ) qwen3.6-plus did not have an explicit entry in DEFAULT_CONTEXT_LENGTHS, so the longest-substring fallback matched the generic 'qwen': 131072 catch-all. That dropped the effective context limit from 1,048,576 tokens to 131,072, prematurely lowered the compression threshold, and produced misleading warnings about main/compression context mismatch in long sessions. Add an explicit 'qwen3.6-plus': 1048576 entry before the catch-all and cover it with a regression test (bare, qwen/, and dashscope/ prefixes). Note: PR #6599 also mentions touching model_metadata.py but the actual diff only edits hermes_cli/models.py, so this fix is independent and not duplicated by that PR. Closes #27008	2026-05-17 02:31:18 -07:00
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355 ) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
Rahul	a52f014a8c	fix(tests): mock keychain in TestReadClaudeCodeCredentials to prevent credential leakage Tests in TestReadClaudeCodeCredentials were not mocking _read_claude_code_credentials_from_keychain, which was added after the tests were written. On macOS machines with real Claude Code credentials stored in the Keychain, the function returns live credentials instead of the test fixtures, causing assertions to fail and leaking real tokens in test output. Add an autouse fixture that stubs the keychain reader to None so all tests in the class exercise only the file-based credential path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 23:05:27 -07:00
0xchainer	782d743730	test(skills): add regression test for skill load failure returning None Add test_returns_none_when_skill_load_fails to verify that build_skill_invocation_message() returns None when a registered skill exists in the command cache but _load_skill_payload() fails. This guards against regression of the fix in `877d01b`.	2026-05-16 22:52:22 -07:00
Teknium	3b39096904	Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189 ) After context compression, the protected tail messages retain their original image parts. When those include multi-MB pasted screenshots, every subsequent API request re-ships the same base-64 blobs forever — which can push the request past provider body-size limits and wedge the session even though compression 'succeeded'. Add _strip_historical_media() to agent/context_compressor.py. After the summary is built, find the newest user message that carries an image part and replace image parts in every earlier message with a short text placeholder ('[Attached image — stripped after compression]'). The newest image-bearing user turn keeps its media so the model can still analyse what the user just sent. Handles all three multimodal shapes: - OpenAI chat.completions image_url - OpenAI Responses API input_image - Anthropic native {type: image, source: ...} Includes 27 unit tests covering the helpers and the end-to-end compress() integration, plus a manual E2E check confirming a ~4MB two-image conversation shrinks to ~2MB after compression.	2026-05-16 17:18:25 -07:00
Teknium	93e109a1d5	fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104 ) Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects two shapes that the rest of the JSON Schema ecosystem accepts: 1. $ref nodes with sibling keywords. Moonshot expands the reference before validation and then rejects the node if keys like `description`, `type`, or `default` appear alongside $ref. MCP-sourced tool schemas commonly put a `description` on $ref-typed properties so the model sees the field hint — which worked on every provider except Moonshot. 2. Tuple-style `items` arrays (positional element schemas). Moonshot's engine requires ONE schema applied to every array element. Common in tool schemas generated from Go/Protobuf that model fixed-length arrays as `[{type:number}, {type:number}]`. Repairs applied in `agent/moonshot_schema.py`: - Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only (strip every sibling). The referenced definition still carries its own description on the target node, which Moonshot accepts. - Rule 4: when `items` is a list, collapse to the first element schema (falling back to `{}` which is then filled by the generic missing-type rule). Preserves `minItems` / `maxItems` / other siblings. Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems, plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type still passes (it asserted plain $ref passes through; now it passes through as exactly `{"$ref": "..."}` which is strictly compatible). All 35 tests in test_moonshot_schema.py pass.	2026-05-16 13:02:19 -07:00
teknium1	72f94f4a7c	test(security): regression guard for OAuth PKCE state/verifier separation Two unit tests for run_hermes_oauth_login_pure(): 1. test_authorization_url_state_is_not_pkce_verifier — asserts state in the auth URL is independent from the PKCE code_verifier sent in the token exchange, and that the verifier never appears in the URL. 2. test_callback_state_mismatch_aborts — asserts the flow returns None (no token exchange) when the callback state does not match the value we generated. Negative control verified: reintroducing the `b17e5c10` vulnerable pattern (state = verifier, no callback validation) makes both tests fail. Also adds AUTHOR_MAP entry for shaun0927 (contributor of the fix).	2026-05-16 02:38:02 -07:00
teknium1	374dc81c23	fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint Follow-up improvements on top of @konsisumer's cherry-picked fix for #10648: 1. Deprecation patterns required BOTH a product fingerprint ('gh-copilot') and a deprecation marker. The previous list included 'copilot-cli' and bare 'deprecation', which would false-positive on stderr from the NEW @github/copilot CLI — whose repo is literally github.com/github/copilot-cli and which legitimately surfaces those substrings in its own messages. 2. Replace the deprecation hint. The user in #10648 installed 'gh extension install github/gh-copilot' (the deprecated extension) thinking that's what ACP mode uses, when ACP actually spawns the new 'copilot' binary from '@github/copilot'. The hint now points users at the correct install command ('npm install -g @github/copilot') with the new CLI's repo URL, and demotes provider-switching to a fallback alternative. 3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the 'github-models' alias to the canonical 'copilot' provider id, matching the convention used by every other entry in the table. 4. Sharpen the 413 hint message. The free tier's ~8K cap is below the system-prompt floor, so this endpoint is fundamentally incompatible with an agentic loop — not a 'use a different URL' problem. Tests: - New parametrized false-positive coverage for the new CLI's stderr shape. - Updated assertion to require canonical 'copilot' provider mapping. - All 14 deprecation/URL tests pass.	2026-05-16 02:24:48 -07:00
konsisumer	b85b938b1f	test: add tests for copilot ACP deprecation detection and Azure URL mapping Cover the deprecation pattern matching against real gh-copilot stderr output, verify the GitHub Models Azure URL is in _URL_TO_PROVIDER, and confirm _is_github_models_base_url recognises the Azure endpoint.	2026-05-16 02:24:48 -07:00
helix4u	97a32afdc4	fix(auxiliary): resolve xai oauth compression from pool	2026-05-15 19:53:37 -07:00
Teknium	31ba2b0cbc	fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s (#26644 ) Three fixes for the May 2026 xAI OAuth (SuperGrok / X Premium) rollout failures: - _run_codex_stream: when openai SDK raises RuntimeError("Expected to have received `response.created` before `<type>`"), retry once then fall back to responses.create(stream=True) — same path used for missing-response.completed postlude. Fallback surfaces the real provider error with body+status_code intact. Also fixes #8133 (response.in_progress prelude on custom relays) and #14634 (codex.rate_limits prelude on codex-lb). - _summarize_api_error: when error body matches xAI's entitlement shape, append a one-line hint pointing to https://grok.com and /model. Once-only, applies to both auxiliary warnings and main-loop error surfacing. - _chat_messages_to_responses_input: new is_xai_responses kwarg drops replayed codex_reasoning_items (encrypted_content) before they reach xAI. Also drops reasoning.encrypted_content from the xAI include array. Native Codex behavior unchanged. Grok still reasons natively each turn; coherence rides on visible message text alone. Closes #8133, #14634.	2026-05-15 16:35:12 -07:00
kchantharuan	13c3d4b4ef	feat(nvidia): add NIM billing origin header	2026-05-15 14:06:51 -07:00
Teknium	4e89c53082	fix(async): close unscheduled coroutines in all threadsafe bridges (#26584 ) Wraps every sync->async coroutine-scheduling site in the codebase with a new agent.async_utils.safe_schedule_threadsafe() helper that closes the coroutine on scheduling failure (closed loop, shutdown race, etc.) instead of leaking it as 'coroutine was never awaited' RuntimeWarnings plus reference leaks. 22 production call sites migrated across the codebase: - acp_adapter/events.py, acp_adapter/permissions.py - agent/lsp/manager.py - cron/scheduler.py (media + text delivery paths) - gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper which now delegates to safe_schedule_threadsafe) - gateway/run.py (10 sites: telegram rename, agent:step hook, status callback, interim+bg-review, clarify send, exec-approval button+text, temp-bubble cleanup, channel-directory refresh) - plugins/memory/hindsight, plugins/platforms/google_chat - tools/browser_supervisor.py (3), browser_cdp_tool.py, computer_use/cua_backend.py, slash_confirm.py - tools/environments/modal.py (_AsyncWorker) - tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to factory-style so the coroutine is never constructed on a dead loop) - tui_gateway/ws.py Tests: new tests/agent/test_async_utils.py covers helper behavior under live loop, dead loop, None loop, and scheduling exceptions. Regression tests added at three PR-original sites (acp events, acp permissions, mcp loop runner) mirroring contributor's intent. Live-tested end-to-end: - Helper stress test: 1500 schedules across live/dead/race scenarios, zero leaked coroutines - Race exercised: 5000 schedules with loop killed mid-flight, 100 ok / 4900 None returns, zero leaks - hermes chat -q with terminal tool call (exercises step_callback bridge) - MCP probe against failing subprocess servers + factory path - Real gateway daemon boot + SIGINT shutdown across multiple platform adapter inits - WSTransport 100 live + 50 dead-loop writes - Cron delivery path live + dead loop Salvages PR #2657 — adopts contributor's intent over a much wider site list and a single centralized helper instead of inline try/except at each site. 3 of the original PR's 6 sites no longer exist on main (environments/patches.py deleted, DingTalk refactored to native async); the equivalent fix lives in tools/environments/modal.py instead. Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>	2026-05-15 14:00:01 -07:00
Jaaneek	7fdc16dd4a	refactor(transports/codex): trim duplicated cache-key comments The xAI prompt_cache_key block carried two long comment paragraphs that either restated setdefault semantics, narrated the SDK type-validation mechanism, or recapped the historical motivation for the extra_body indirection — all already covered by the test docstring at test_xai_responses_sends_cache_key_via_extra_body (which links to the xAI docs). Also restored the truncated link in the body-injection comment. No behavior change.	2026-05-15 12:11:32 -07:00
Jaaneek	b62c997973	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider Adds a new authentication provider that lets SuperGrok subscribers sign in to Hermes with their xAI account via the standard OAuth 2.0 PKCE loopback flow, instead of pasting a raw API key from console.x.ai. Highlights ---------- * OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery, state/nonce, and a strict CORS-origin allowlist on the callback. * Authorize URL carries `plan=generic` (required for non-allowlisted loopback clients) and `referrer=hermes-agent` for best-effort attribution in xAI's OAuth server logs. * Token storage in `auth.json` with file-locked atomic writes; JWT `exp`-based expiry detection with skew; refresh-token rotation synced both ways between the singleton store and the credential pool so multi-process / multi-profile setups don't tear each other's refresh tokens. * Reactive 401 retry: on a 401 from the xAI Responses API, the agent refreshes the token, swaps it back into `self.api_key`, and retries the call once. Guarded against silent account swaps when the active key was sourced from a different (manual) pool entry. * Auxiliary tasks (curator, vision, embeddings, etc.) route through a dedicated xAI Responses-mode auxiliary client instead of falling back to OpenRouter billing. * Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen plugin) resolve credentials through a unified runtime → singleton → env-var fallback chain so xai-oauth users get them for free. * `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are wired through the standard auth-commands surface; remove cleans up the singleton loopback_pkce entry so it doesn't silently reinstate. * `hermes model` provider picker shows "xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls back to pool credentials when the singleton is missing. Hardening --------- * Discovery and refresh responses validate the returned `token_endpoint` host against the same `.x.ai` allowlist as the authorization endpoint, blocking MITM persistence of a hostile endpoint. Discovery / refresh / token-exchange `response.json()` calls are wrapped to raise typed `AuthError` on malformed bodies (captive portals, proxy error pages) instead of leaking JSONDecodeError tracebacks. * `prompt_cache_key` is routed through `extra_body` on the codex transport (sending it as a top-level kwarg trips xAI's SDK with a TypeError). * Credential-pool sync-back preserves `active_provider` so refreshing an OAuth entry doesn't silently flip the active provider out from under the running agent. Testing ------- * New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests) covers JWT expiry, OAuth URL params (plan + referrer), CORS origins, redirect URI validation, singleton↔pool sync, concurrency races, refresh error paths, runtime resolution, and malformed-JSON guards. * Extended `test_credential_pool.py`, `test_codex_transport.py`, and `test_run_agent_codex_responses.py` cover the pool sync-back, `extra_body` routing, and 401 reactive refresh paths. * 165 tests passing on this branch via `scripts/run_tests.sh`.	2026-05-15 12:11:32 -07:00
ethernet	1702a94c88	Merge pull request #25957 from stephenschoettler/fix/main-ci-unblocker-after-21012 fix(ci): stabilize shared test state after 21012	2026-05-14 21:26:52 -04:00
Teknium	19071529f6	fix(lsp): shift baseline diagnostics into post-edit coordinates (#25978 ) Pre-existing diagnostics below an edit point used to surface as 'LSP diagnostics introduced by this edit' whenever the edit deleted or inserted lines. The delta-filter key included the diagnostic's range, so the same logical error reported at a different line in the post-edit snapshot looked like a brand new diagnostic. Concrete case: deleting 14 lines in cli.py caused Pyright errors at lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be reported as introduced by it. Fix: build a piecewise-linear line-shift map (via difflib's SequenceMatcher) from pre and post content, and remap baseline diagnostics into post-edit coordinates before the set-difference. Diagnostics in deleted regions drop out cleanly; diagnostics below the edit shift by the right amount; diagnostics above are untouched. The strict (range-aware) equality key stays — so a genuinely new instance of an identical error class at a different line still surfaces as new. Pieces: - agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range, shift_baseline. Pure functions, no LSP state. - agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an optional line_shift kwarg; baseline is shift_baseline'd before computing the seen-set. _diag_key keeps the strict range key. - tools/file_operations.py — write_file captures pre_content for any LSP-handled extension (not just LINTERS_INPROC) and passes pre/post to _maybe_lsp_diagnostics, which builds the shift map. - New _lsp_handles_extension helper guards the pre_content read. Trade-offs preserved: - Genuinely new same-class errors at different lines still surface (content-only key would have swallowed them). - Pre-existing errors at unshifted positions still get filtered (covered by the strict-key path with no shift). - Best-effort: when pre_content can't be captured (file didn't exist, permissions), the unshifted comparison still catches most pre-existing errors; the edge case it misses is a new file with a non-empty baseline, which is structurally impossible.	2026-05-14 15:56:07 -07:00
Teknium	fe83c4001b	fix(codex-app-server): attach redacted stderr tail to generic failures (#25929 ) When codex app-server fails outside the OAuth-classified path (non-auth turn/start errors, plain TimeoutErrors, generic turn-ended status, subprocess silently exits, hard deadline timeout), the user got a bare 'Internal error' / 'turn/start failed: ...' with no context. Diagnosing config/provider/auth-bridge issues forced a re-run with verbose codex flags. Add a _format_error_with_stderr helper that appends the last few stderr lines via agent.redact.redact_sensitive_text(force=True), and use it at every catch-all error site: - ensure_started() failures (codex init / thread/start) now return a TurnResult.error with should_retire=True instead of bubbling - non-OAuth turn/start CodexAppServerError / TimeoutError - subprocess-died branch (previously dumped raw stderr_blob[-300:] with no redaction — a leak risk) - turn ended with non-completed status - hard turn-timeout deadline OAuth-classified failures and the post-tool quiet watchdog already produce clean hints and stay unchanged. The redactor catches sk-, gh_*, Authorization: Bearer, query-string tokens, JWTs, private keys, etc., so provider error payloads can't leak into chat output or trajectories. Inspired by openclaw#80718, adapted for our app-server transport.	2026-05-14 14:55:23 -07:00
Stephen Schoettler	5ce0067c08	fix(ci): stabilize shared test state after 21012	2026-05-14 14:28:14 -07:00
ethernet	cd64bed55e	Merge pull request #21012 from stephenschoettler/fix/ci-pr-check-unblock fix(ci): unblock shared PR checks	2026-05-14 16:16:42 -04:00
EthanGuo-coder	26933c2f59	fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks _make_stream_chunk built delta_kwargs with only `role`, so a reasoning-only chunk produced a SimpleNamespace without a `.content` attribute. Downstream consumers that read `delta.content` then raised AttributeError on Gemini 2.5 Flash, where the thinking delta arrives before any content delta. Seed `content`, `tool_calls`, `reasoning`, and `reasoning_content` as None up front, matching the pattern already used in gemini_native_adapter.py. Key-present arguments still override the defaults. Fixes #24974 References: Related open PR #24984 (luyao618) applies the same 1-line fix; this PR adds a regression test that #24984 omits Co-Authored-By: Claude <noreply@anthropic.com>	2026-05-14 08:03:56 -07:00
Teknium	12f755c9eb	fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769 ) Mirrors openclaw beta.8's app-server resilience fixes so a stuck codex subprocess can't burn the full turn deadline and so users get a `codex login` pointer instead of raw RPC errors when their token expires. - TurnResult.should_retire signals the caller to drop+respawn codex. - Deadline-hit path and dead-subprocess detection set should_retire so the next turn doesn't ride a CPU-spinning or auth-broken process. - Post-tool watchdog (post_tool_quiet_timeout=90s): if a tool item completes and codex goes silent past the threshold without further output or turn/completed, fast-fail instead of waiting the full 600s. Resets on any non-tool activity so normal think-after-tool flows are not affected. - <turn_aborted> and <turn_aborted/> in agent text are treated as terminal — some codex builds tear down a turn that way without emitting turn/completed. - _classify_oauth_failure() inspects RPC error message + stderr tail for invalid_grant / token refresh / 401 / etc. and rewrites user-facing errors to 'run codex login'. Conservative: generic failures still surface verbatim. Fires at turn/start failure, turn/completed failure, and dead-subprocess paths. - thread/start cross-fill: tolerate thread.id, thread.sessionId, top-level sessionId/threadId so future codex schema drift doesn't KeyError us at handshake. - run_agent.py: when run_turn returns should_retire=True OR raises, close + null self._codex_session so the next turn respawns. Tests: +30 cases across session + integration suites. tests/agent/transports/test_codex_app_server_session.py 50/50 pass tests/run_agent/test_codex_app_server_integration.py 27/27 pass Broader codex scope (transports + cli runtime/migration) 376/376 pass	2026-05-14 07:55:09 -07:00
teknium1	4ceab16893	fix(compression): keep default protect_first_n at 3 + align ABC Follow-up on the salvaged feat commit: - Keep the constructor / config / yaml-example default at 3 so existing gateway and CLI users see no behavioural change. PR #13754 (which this builds on) had lowered the default to 2 to chase pre-feature parity in the system-prompt-present case, at the cost of quietly halving the protected head for the gateway path (which strips the system prompt before calling compress()). With the new "system prompt is implicit" semantics, default 3 gives every caller a stable head shape. - agent/context_engine.py: bring the ABC's protect_first_n docstring in line with the new semantics so plugin context engines interpret the config key the same way the built-in compressor does. - tests: adjust the default-value test (3, not 2) and a stale comment; per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is since those tests fix concrete head shapes.	2026-05-13 22:25:16 -07:00

1 2 3 4 5 ...

493 commits