Commit graph

6430 commits

Author SHA1 Message Date
konsisumer
e4b69bf149 fix(gateway): guard against None request_overrides in _build_api_kwargs 2026-04-28 06:57:23 -07:00
Teknium
1d8b9e6458
fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027)
Auxiliary tasks (title_generation, vision, compression, web_extract,
session_search) now pick the correct wire protocol based on the
endpoint, not just on which resolve_provider_client branch built the
client.  Fixes 404s on Kimi Coding Plan and any other named provider
whose endpoint speaks Anthropic Messages.

Root cause: the 'api_key' branch of resolve_provider_client (and the
Step 2 fallback chain inside _resolve_auto) always built a plain
OpenAI client regardless of what the endpoint actually spoke.  For
provider=kimi-coding + model=kimi-for-coding, that meant:

    POST https://api.kimi.com/coding/v1/chat/completions
    { "model": "kimi-for-coding", ... }
    → 404 resource_not_found_error

The /coding route only accepts the Anthropic Messages shape (the main
agent already uses api_mode=anthropic_messages for it).  Earlier fixes
(#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and
external-process branches — but the named api_key branch (kimi-coding,
minimax, zai, future /anthropic providers) was the fourth sibling and
never got the same treatment.

Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a
plain OpenAI client in AnthropicAuxiliaryClient when:

  - api_mode is explicitly 'anthropic_messages', OR
  - the URL ends in '/anthropic', OR
  - the host is api.kimi.com + path contains '/coding', OR
  - the host is api.anthropic.com.

Wired into _wrap_if_needed (covers all resolve_provider_client
branches that already go through it) and into the Step 2 api_key
fallback chain inside _resolve_auto.  Explicit api_mode still wins:
passing api_mode='chat_completions' forces OpenAI wire, and already-
wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass
through unchanged.

E2E verified:
- resolve_provider_client('kimi-coding', 'kimi-for-coding')
  → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd)
- _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient
- resolve_provider_client('openrouter', ...) → plain OpenAI (no regression)
- api_mode='chat_completions' override → plain OpenAI (explicit wins)

Tests:
- tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests
  covering URL detection, wrap decisions, and integration.
- 204/205 existing auxiliary tests pass (1 pre-existing failure on
  main, unrelated to this change).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:50:14 -07:00
Teknium
e123f4ecf0
feat(gateway): opt-in runtime-metadata footer on final replies (#17026)
Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL
message of each turn, disabled by default (display.runtime_footer.enabled).
Answers the Telegram-side parity ask: runtime context that the CLI status
bar already shows is now available in messaging replies when enabled.

Wiring:
- gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer +
  build_footer_line. Pure-function renderer; per-platform overrides under
  display.platforms.<platform>.runtime_footer.
- gateway/run.py: appends footer to response right after reasoning prepend
  so it lands only on the final message (never tool progress or streaming
  chunks). When streaming already delivered the body (already_sent), the
  footer is sent as a small trailing message instead.
- agent_result now exposes context_length alongside last_prompt_tokens so
  the footer can compute the pct; both gateway return paths updated.
- /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway
  (gateway/run.py both running-agent bypass and main dispatch). Global
  toggle only; per-platform overrides via config.yaml.

Graceful degradation:
- Missing context_length (unknown model) → pct field silently dropped
  (no '?%' artifact).
- Empty final_response → no footer appended.
- Unknown field names in config → silently ignored.

Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E
harness covering streaming vs non-streaming branches, per-platform override,
and the exact argument contract gateway/run.py uses.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:50:04 -07:00
Teknium
6085d7a93e
chore: remove unused imports and dead locals (ruff F401, F841) (#17010)
Mechanical cleanup across 43 files — removes 46 unused imports
(F401) and 14 unused local variables (F841) detected by
`ruff check --select F401,F841`. Net: -49 lines.

Also fixes a latent NameError in rl_cli.py where `get_hermes_home()`
was called at module line 32 before its import at line 65 — the
module never imported successfully on main. The ruff audit surfaced
this because it correctly saw the symbol as imported-but-unused
(the call happened before the import ran); the fix moves the import
to the top of the file alongside other stdlib imports.

One `# noqa: F401` kept in hermes_cli/status.py for `subprocess`:
tests monkeypatch `hermes_cli.status.subprocess` as a regression
guard that systemctl isn't called on Termux, so the name must
exist at module scope even though the module body doesn't reference
it. Docstring explains the reason.

Also fixes an invalid `# noqa:` directive in
gateway/platforms/discord.py:308 that lacked a rule code.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:46:45 -07:00
teknium1
3d8be2c617 fix(install): widen /dev/tty open-probe to sibling gates (#16746)
The contributor's PR (#16750) scoped the fix to run_setup_wizard() and
explicitly punted the two sibling sites. Both have the identical
[ -e /dev/tty ] pattern followed by a < /dev/tty redirect and crash in
Docker the same way:

- scripts/install.sh:732 install_system_packages() -- apt sudo prompt
  fallback. sudo ... < /dev/tty dies with the same ENXIO.
- scripts/install.sh:1395 maybe_start_gateway() -- gateway-install gate,
  same function path as the wizard reproducer.

Fix both with the same (: </dev/tty) 2>/dev/null probe, and parametrize
the regression test over all three gated functions so any future
regression is caught regardless of which site breaks.
2026-04-28 06:45:55 -07:00
briandevans
89e8c87354 test(install): regex-based gate assertions per copilot review on #16750
Address the three Copilot inline findings on the regression test:

- Switch _extract_run_setup_wizard() from str.index() with hard-coded
  markers (which raises ValueError if `maybe_start_gateway()` is renamed
  or the marker leaks into a comment) to an anchored regex on the
  function-definition + closing-brace boundaries.
- Match `[ -e /dev/tty ]` with surrounding whitespace, optional quoting,
  and the `test -e /dev/tty` form so the regression guard catches every
  spelling of the existence-only check, not just the exact substring.
- Replace the literal `(: </dev/tty)` substring assertion with a
  higher-level invariant — the gate must be an `if`/`if !` whose test
  redirects stdin from /dev/tty — so equivalent open-based probes
  (`exec 3</dev/tty` + close, brace-grouped variants, etc.) keep the
  test green while the bare existence check stays caught.

Verified guard: both tests still pass on the fix and both fail on
`origin/main` with the documented messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:45:55 -07:00
briandevans
20c9340c34 fix(install): probe /dev/tty by opening it, not bare existence (#16746)
In Docker builds the `/dev/tty` device node is present in the mount
namespace, so `[ -e /dev/tty ]` returns true — but opening it fails
with `ENXIO: No such device or address`. Under the old gate the
"no terminal available" skip never triggered, the setup wizard ran,
and the build aborted a few lines later when bash tried `< /dev/tty`:

    /tmp/install.sh: line 1347: /dev/tty: No such device or address

Replace the existence check with `(: </dev/tty) 2>/dev/null`, which
actually attempts to open /dev/tty in a subshell. The probe succeeds
when piped from `curl | bash` on a real terminal (the wizard's intended
use case) and fails cleanly in Docker build / CI contexts so the skip
kicks in before the redirect can crash.

Add a regression test that statically asserts run_setup_wizard does not
gate on the bare existence check and that the open-based probe is in
place.

Fixes #16746.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:45:55 -07:00
teknium1
b2339c87e4 chore(release): map dejie.guo@gmail.com -> JayGwod 2026-04-28 06:45:35 -07:00
Dejie Guo
8cced33784 fix(model): prefer live models for user providers 2026-04-28 06:45:35 -07:00
Teknium
69b8fa65d4
docs(delegate_task): clarify that it is synchronous and not durable (#17022)
delegate_task runs inside the parent turn and is cancelled when the parent is interrupted (new user message, /stop, /new). The child status payload (status=interrupted, exit_reason=interrupted) is already honest, but the tool schema and user-facing docs did not set the expectation, so users reasonably assumed delegated subagents would keep running in the background after interrupting the parent.

Updates:

- tools/delegate_tool.py DELEGATE_TASK_SCHEMA description adds a WHEN NOT TO USE bullet pointing at cronjob / terminal(background=True, notify_on_complete=True) for durable long-running work.

- website/docs/user-guide/features/delegation.md gains a Lifetime and Durability callout above Key Properties.

- website/docs/guides/delegation-patterns.md expands the Use something else list and the Constraints section with the same guidance.

Reported by LizLiz (@lizliz404) via Teknium.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:45:15 -07:00
Teknium
5f84eac451
feat(gateway): bust cached agent on compression/context_length config edits (#17008)
The gateway caches one AIAgent per session to preserve prompt-cache hits,
keyed by _agent_config_signature().  The signature previously only
fingerprinted model/credentials/toolsets/ephemeral-prompt — NOT the
compression or context_length config.  As a result, users who edited
model.context_length or compression.threshold in config.yaml on a
long-lived gateway saw no effect until they triggered an unrelated
cache eviction (/model switch, /reset, gateway restart).

Add a new cache_keys parameter to _agent_config_signature and a
_CACHE_BUSTING_CONFIG_KEYS registry listing config values the agent
bakes in at construction time.  Call sites read the current config and
pass it through — next gateway message with an edited config
rebuilds the agent.

Keys registered:
- model.context_length
- compression.enabled
- compression.threshold
- compression.target_ratio
- compression.protect_last_n

Reported by @OP (Apr 26 feedback bundle).

## Changes
- gateway/run.py: new _CACHE_BUSTING_CONFIG_KEYS tuple,
  _extract_cache_busting_config classmethod, cache_keys kwarg on
  _agent_config_signature, call site passes the extracted dict
- tests/gateway/test_agent_cache.py: 11 new tests
  (5 on _agent_config_signature behavior, 6 on _extract_cache_busting_config)

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:37:42 -07:00
kshitijk4poor
b5905f0d4a chore: add Mirac1eSky to AUTHOR_MAP 2026-04-28 06:37:22 -07:00
Siwen Wang
d6137453ac fix(gateway): drain stale httpx polling connections on Telegram reconnect
Network errors through proxies (e.g. sing-box) can leave httpx
connections in a half-closed state occupying pool slots.  After enough
reconnect cycles the 256-connection default fills up entirely, causing
Pool timeout: All connections in the connection pool are occupied.

Fix: cycle only the getUpdates request object (_request[0]) via
shut-down + re-initialize before restarting polling.  This drains stale
connections without touching the general request (_request[1]) that
concurrent send_message / edit_message calls rely on.

The drain is applied to both _handle_polling_network_error and
_handle_polling_conflict reconnect paths via a shared
_drain_polling_connections() helper.  Failures in the drain are
swallowed so reconnect always proceeds.

Based on #16466 by @Mirac1eSky.
2026-04-28 06:37:22 -07:00
Teknium
391f1ca1f4
feat(aux): translate extra_body.reasoning into Codex Responses API (#17004)
Auxiliary callers that configure reasoning via
auxiliary.<task>.extra_body.reasoning were having that config silently
dropped by the Codex Responses adapter — it only forwarded
messages/model/tools through to responses.stream(), never translating
chat.completions-shaped reasoning hints into the Responses API's
top-level reasoning + include fields.

Mirror the main-agent translation from agent/transports/codex.py:
- extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"}
- 'minimal' → 'low' clamp (Codex backend rejects 'minimal')
- Always include ['reasoning.encrypted_content'] when reasoning is enabled
- {'enabled': False} → omit reasoning and include entirely
- Non-dict reasoning values are ignored defensively

Reported by @OP (Apr 26 feedback bundle).

## Changes
- agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads
  and translates extra_body.reasoning before calling responses.stream()
- tests/agent/test_auxiliary_client.py: 9 new tests covering all effort
  levels, the minimal→low clamp, the disabled path, the no-op paths,
  and defensive handling of wrong-shape inputs

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:47:42 -07:00
Teknium
72dea9f4f7
feat(gateway): make hygiene hard message limit configurable (#17000)
The gateway session-hygiene pre-compression safety valve had a hardcoded
400-message threshold. On long-lived sessions with short turns this was
either too high (users with aggressive compression preferences) or too
low (users with very large context models who want to keep more history
in-flight).

Add compression.hygiene_hard_message_limit (default 400) so it can be
tuned without forking the gateway.

Reported by @OP (Apr 26 feedback bundle).

## Changes
- hermes_cli/config.py: new DEFAULT_CONFIG key with 400 default
- gateway/run.py: read compression.hygiene_hard_message_limit at
  hygiene-time, fall back to 400 if missing/invalid
- tests/gateway/test_session_hygiene.py: two tests — override fires at
  the configured limit, default does not fire below 400

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:43:12 -07:00
Teknium
06164a7b28
fix(codex): resync pool entry from auth.json after reauth (#17001)
When openai-codex tokens expire or the ChatGPT account hits a 429
window, the pool entry gets marked STATUS_EXHAUSTED with
last_error_reset_at many hours in the future. If the user then runs
`hermes model` / `hermes auth openai-codex` to reauth, fresh tokens
land in ~/.hermes/auth.json but the pool entry stayed frozen behind
its reset_at — every request kept failing with 'credential pool: no
available entries (all exhausted or empty)' until the original window
elapsed.

_available_entries() already had auth.json/credentials-file resync
branches for anthropic/claude_code and nous/device_code; openai-codex
was missing. Added _sync_codex_entry_from_auth_store() mirroring the
nous version (reads state["tokens"][{access,refresh}_token] +
state["last_refresh"]) and wired it into the exhausted-entry resync
loop.

Also softens the 'codex CLI not found' doctor warning — native
device-code OAuth does not require the Codex binary, only
importing existing Codex CLI tokens does. Downgraded to an info line.

Reported on Discord by p1aceho1der: Codex stalled indefinitely after
a rate-limit reset, reauth didn't help, and doctor falsely warned
that the codex CLI was required.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 05:43:09 -07:00
teknium1
529eb29b6a fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set
Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel
values. The salvaged bridge was forwarding Hermes' "minimal" effort to
Flash verbatim, which is not a documented Gemini level and risks a 400
from the native adapter.

Clamp minimal->low on Flash (matching how Pro already clamps minimal+low
down), and funnel anything outside {low, medium, high} into medium to
keep the request valid by construction. No behaviour change for the
documented effort levels.
2026-04-28 05:38:23 -07:00
Nanako0129
dbbe2d1973 fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes 2026-04-28 05:38:23 -07:00
teknium1
315a11a76f chore(prompt): tell telegram models to prefer bullets over tables
Telegram has no native table syntax. The gateway auto-rewrites pipe
tables into row-group bullets (see previous commit), but letting models
know up front means they emit the clean form directly instead of
relying on post-processing to synthesize headings.

Also helps users whose MEMORY.md formatting policies were being
overridden — the platform hint now carries the guidance.
2026-04-28 05:37:50 -07:00
LeonSGP43
a3b9343f08 feat(telegram): render markdown tables as row groups 2026-04-28 05:37:50 -07:00
helix4u
d8c5573ffe fix(profiles): migrate Honcho host on rename 2026-04-28 05:37:09 -07:00
teknium1
c69310c625 fix(weixin): raise descriptive error when rate-limit retries exhaust
The rate-limit branch added by the original PR did sleep+continue with
no attempt to record the last error, so persistent iLink -2 responses
exhausted the retry loop and hit 'assert last_error is not None',
raising AssertionError instead of a descriptive RuntimeError.

Record last_error = RuntimeError(...) before continuing, and break out
of the loop on the final attempt instead of sleeping uselessly.
2026-04-28 05:21:58 -07:00
teknium1
d3a9c69e9b chore(release): map leihaibo1992 author for #16757 salvage 2026-04-28 05:21:58 -07:00
Leihb
a54106bbc8 fix(weixin): split long messages (>2000 chars) into chunks to prevent truncation
- Change MAX_MESSAGE_LENGTH from 4000 to 2000 to match Weixin iLink API limit
- Add RATE_LIMIT_ERRCODE = -2 handling with 3x backoff retry
- Increase default send_chunk_delay_seconds from 0.35 to 1.5 to avoid rate limits
- Increase default send_chunk_retries from 2 to 4 for better reliability
- Use _split_text() in send() to chunk long messages before delivery

Fixes #16411
2026-04-28 05:21:58 -07:00
teknium1
1a4289b6b7 chore(release): map revar@users.noreply.github.com -> revaraver 2026-04-28 05:21:49 -07:00
revar
052b3449e5 test(cli): regression test for manual /compress system_message
Add tests/test_cli_manual_compress.py verifying _manual_compress passes
None (not the cached system prompt) to _compress_context, forwards the
/compress <topic> focus string, rotates CLI session_id to the new child
session, and clears the pending title.

Co-authored-by: revar <revar@users.noreply.github.com>
2026-04-28 05:21:49 -07:00
ygd58
fb112d6a73 fix(cli): pass None as system_message in manual compress to prevent duplication
_manual_compress() passed self.agent._cached_system_prompt to
_compress_context() as the system_message argument. _compress_context
calls _build_system_prompt(system_message), which appends system_message
to prompt_parts that already contain the agent identity block — causing
the identity to appear twice in the new session's system prompt
(20,957 -> 42,303 chars, +102% as reported in issue #15281).

Fix: pass None instead of _cached_system_prompt. _build_system_prompt(None)
rebuilds the system prompt correctly from scratch without appending a
pre-built prompt on top of the identity layers.

Fixes #15281
2026-04-28 05:21:49 -07:00
teknium1
7444e49d4e fix(gateway): use transcript timestamp for auto-continue freshness
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`).  At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.

Changes:
- Read the freshness signal from the RAW `history` list (via new
  `_last_transcript_timestamp()` helper) BEFORE the strip.  Both the
  resume_pending branch and the tool-tail branch use this single signal,
  replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
  `_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`.  The 15-minute default was
  shorter than the default `gateway_timeout` of 30 min, so a legitimate
  long-running turn interrupted near its timeout boundary and resumed
  shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
  (bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
  pattern as `gateway_timeout`).  Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
  subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
  `history` → `_build_agent_history` strip → freshness decision.  A
  regression guard (`test_stale_tool_tail_with_production_data_shape`)
  asserts `agent_history` tool rows carry NO timestamp, protecting
  against someone "fixing" the original bug by re-adding the stripped
  field (which would break the OpenAI tool-result message contract).

Add BeliefanX to scripts/release.py AUTHOR_MAP.

E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
2026-04-28 05:20:35 -07:00
beliefanx
93feffbcfa fix(gateway): avoid stale interrupted turn auto-continue 2026-04-28 05:20:35 -07:00
Teknium
b61d9b297a refactor: consolidate symlink-safe atomic replace into shared helper
Extract the islink/realpath guard from the 16743 fix into a single
atomic_replace() helper in utils.py, then migrate every os.replace()
call site in the codebase to use it.

The original PR #16777 correctly identified and fixed the bug, but
only patched 9 of ~24 call sites. The same bug class (managed
deployments that symlink state files silently losing the link on
every write) still existed at auth.json, sessions file, gateway
config, env_loader, webhook subscriptions, debug store, model
catalog, pairing, google OAuth, nous rate guard, and more.

Rather than add another 10+ copies of the same three-line guard,
consolidate into atomic_replace(tmp, target) which:
- resolves symlinks via os.path.realpath before os.replace
- returns the resolved real path so callers can re-apply permissions
- is a drop-in replacement for os.replace at the use sites

Changes:
- utils.py: new atomic_replace() helper + atomic_json_write /
  atomic_yaml_write now call it instead of inlining the guard
- 16 files: all os.replace() call sites migrated to atomic_replace()
  - agent/{google_oauth, nous_rate_guard, shell_hooks}.py
  - cron/jobs.py
  - gateway/{pairing, session, platforms/telegram}.py
  - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py
  - tools/{memory_tool, skill_manager_tool, skills_sync}.py

Tests: tests/test_atomic_replace_symlinks.py pins the invariant for
atomic_replace + atomic_json_write + atomic_yaml_write, covers plain
files, first-time creates, broken symlinks, and permission preservation.

Refs #16743
Builds on #16777 by @vominh1919.
2026-04-28 04:58:22 -07:00
vominh1919
3ab97a32d1 fix: preserve symlinks during atomic file writes (#16743)
os.replace(tmp, path) replaces the symlink itself with a regular file,
breaking users who symlink config.yaml, SOUL.md, or .env from ~/.hermes/
to a dotfiles repo or managed profile package.

Fix: resolve symlinks via os.path.realpath() before os.replace(), so the
real file is overwritten in-place while the symlink survives.

Fixed in 7 files covering all os.replace call sites:
- utils.py (atomic_json_write, atomic_yaml_write — fixes save_config)
- hermes_cli/config.py (env sanitizer, save_env_value, remove_env_value)
- tools/skill_manager_tool.py (_atomic_write_text — SOUL.md writes)
- tools/memory_tool.py (memory file writes)
- tools/skills_sync.py (manifest writes)
- cron/jobs.py (job state + output file writes)
- agent/shell_hooks.py (hook file writes)

Fixes NousResearch/hermes-agent#16743
2026-04-28 04:58:22 -07:00
teknium1
1369dae226 test(openclaw-migration): cover alias reverse-lookup for real OpenClaw schema
Real OpenClaw configs key agents.defaults.models by full provider/model
API ID with an 'alias' field on the value (e.g.
{'anthropic/claude-opus-4-6': {'alias': 'Claude Opus 4.6'}}).  Add
regression tests for issue #16745 covering:

- reverse-lookup of alias against real schema (keyed by API ID)
- alias resolution when model is a bare string vs {'primary': ...}
- passthrough when the value is already a provider/model API ID
- passthrough when the alias has no catalog match
- string-valued catalog entries (belt-and-suspenders)
- no catalog at all
2026-04-28 04:58:13 -07:00
vominh1919
7996c14795 fix: resolve model aliases during claw migrate (#16745)
`hermes claw migrate` copied OpenClaw's model setting verbatim, which
could be a display alias (e.g. "Claude Opus 4.6") instead of the actual
API ID (e.g. "claude-opus-4-6"). Hermes then sent the alias to the API,
causing HTTP 404 model not found.

Fix: look up the model string in agents.defaults.models (plural) alias
catalog. If found, use the resolved "id" field, prepending the provider
prefix if needed. If not found (already an API ID), pass through unchanged.

Fixes NousResearch/hermes-agent#16745
2026-04-28 04:58:13 -07:00
阿泥豆
4aa0a7c195 fix(error-classifier): add insufficient balance to billing patterns
DeepSeek API returns HTTP 400 with 'Insufficient Balance' message when
account funds are depleted. This pattern was not in _BILLING_PATTERNS,
causing the error to be misclassified instead of triggering billing
exhaustion handling (e.g., fallback to alternate provider).

Suggested by teknium1 in PR review of #15586.
2026-04-28 04:58:09 -07:00
Teknium
7428abd54e chore(release): map mtf201013@gmail.com -> ma-pony 2026-04-28 04:58:03 -07:00
Teknium
0f473d643d refactor(schema): consolidate nullable-union stripping in schema_sanitizer
Adds tools.schema_sanitizer.strip_nullable_unions as the single
implementation for collapsing anyOf/oneOf nullable unions.  Both the
MCP input-schema normalizer and the Anthropic tool-schema guard now
delegate to it instead of re-implementing the same walk three times.

The global sanitizer also gains a final pass so any tool that slips
past the two earlier hooks (plugin tools, non-MCP custom tools with
Pydantic-shaped schemas) still gets safe input_schemas on Anthropic.

- tools/schema_sanitizer.py:
    * New public strip_nullable_unions(schema, keep_nullable_hint=True).
    * _sanitize_single_tool() calls it as a final pass (hint preserved
      so coerce_tool_args can still map string "null" to None).
- tools/mcp_tool.py: _normalize_mcp_input_schema delegates.
- agent/anthropic_adapter.py: _normalize_tool_input_schema delegates
  with keep_nullable_hint=False (Anthropic does not recognize nullable).

No behavioral change for the fix itself; tests (73/73 targeted +
E2E across MCP→sanitizer→Anthropic paths) pass.
2026-04-28 04:58:03 -07:00
Pony.Ma
aa94883288 fix(mcp): preserve nullable schema coercion 2026-04-28 04:58:03 -07:00
Pony.Ma
1350d12b0b fix: keep mcp dynamic refresh tasks tracked 2026-04-28 04:58:03 -07:00
Pony.Ma
02ae152222 fix(mcp): normalize nullable tool schemas 2026-04-28 04:58:03 -07:00
teknium1
9cd02b1698 chore(release): map r.filgueiras@apheris.com -> rfilgueiras 2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras
37551ee53e test(bedrock): add model picker and region routing tests
25 new tests (all Bedrock API calls mocked, no real AWS creds needed):

tests/hermes_cli/test_bedrock_model_picker.py (20 tests):
  - provider_model_ids("bedrock") uses live discovery, returns regional
    model IDs, falls back gracefully on empty/exception, resolves all
    bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery
  - list_authenticated_providers() section 2: bedrock appears with AWS
    creds, model list from discover_bedrock_models(), total_models
    matches, is_current flag works, absent creds hides bedrock, discovery
    failure does not crash, no duplicate entries
  - Region routing: botocore profile eu-central-1 yields eu.* model IDs
    end-to-end; env var takes priority over botocore profile
  - providers.py overlay: exists with correct transport/auth_type, label
    is non-empty, all aliases normalize to bedrock

tests/agent/test_bedrock_adapter.py (5 tests):
  - resolve_bedrock_region() botocore profile fallback, botocore failure
    fallback, us-east-1 hard fallback (with botocore mocked)
2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras
a23f18cc3e fix(bedrock): add live model discovery and region resolution for non-US regions
provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS
table containing only hardcoded us.* model IDs.  Users configured for
non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no
models in /model and autocomplete.

Root causes fixed:

1. models.py: provider_model_ids() now calls discover_bedrock_models()
   keyed by the resolved region before falling back to the static table.
   A new bedrock_model_ids_or_none() helper in bedrock_adapter.py
   consolidates the discover -> extract IDs -> fallback pattern used by
   all three call sites.

2. providers.py: registers bedrock in HERMES_OVERLAYS with
   transport=bedrock_converse and auth_type=aws_sdk so
   get_provider("bedrock") and resolve_provider_full("bedrock") work.

3. model_switch.py: list_authenticated_providers() sections 2 and 3
   detect AWS credentials via has_aws_credentials() for aws_sdk
   overlays and use live discovery for the model list.

4. bedrock_adapter.py: resolve_bedrock_region() reads the configured
   region from botocore.session before falling back to us-east-1,
   covering users who set their region in ~/.aws/config via a named
   profile rather than env vars.

5. tui_gateway/server.py: passes provider= to get_model_context_length()
   so context window lookups work correctly for the Bedrock provider.
2026-04-28 03:53:11 -07:00
Teknium
023f5c74b1
fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)
* fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path

OAuth requests now identify as Hermes on the wire. Removed:

  - "You are Claude Code, Anthropic's official CLI for Claude." system
    prompt prepend
  - Hermes Agent → Claude Code / Nous Research → Anthropic
    system-prompt substitutions
  - mcp_ tool-name prefix on outgoing tool schemas + message history
  - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path
    removed from AnthropicTransport.normalize_response, + all 5 call
    sites in run_agent.py and auxiliary_client.py)
  - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on
    the Messages API client

Added:

  - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth
    requests carrying it with HTTP 400 'This authentication style is
    incompatible with the long context beta header.'

Kept (auth plumbing, not identity spoofing):

  - _is_oauth_token classifier and is_oauth flag threading
  - Bearer vs x-api-key auth routing
  - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend
    requires these on the OAuth-gated Messages endpoint
  - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth
    creds to third parties; this is the only way the login flow works
  - claude-cli/<v> User-Agent on the OAuth token exchange + refresh
    endpoints at platform.claude.com/v1/oauth/token — bare requests get
    Cloudflare 1010 blocked

Verified live against api.anthropic.com with a fresh sk-ant-oat01-*
token:

  - claude-haiku-4-5 simple message: HTTP 200, 'OK' response
  - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool
    named 'terminal' (no mcp_ prefix) round-tripped correctly
  - Outgoing wire: no user-agent, no x-app, real Hermes identity in
    system prompt, real tool name in schema

Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer
needed since the mcp_ round-trip is gone).

* fix(anthropic): resolve_anthropic_token() reads credential pool first

Close the gap where ~/.hermes/auth.json → credential_pool.anthropic
(where hermes login + dashboard PKCE flow write OAuth tokens) was not
in resolve_anthropic_token()'s source list.

Before: users who authed via hermes login got the token written into
the pool, but legacy fallback code paths (auxiliary_client, models
catalog fetch, explicit-runtime path) that call resolve_anthropic_token()
saw None and raised 'No Anthropic credentials found' — even though the
token was sitting in auth.json.

New priority 1: pool.select() with env-sourced entries skipped. Skipping
env:* entries preserves the existing env-var priority logic further
down the chain (static env OAuth → refreshable Claude Code upgrade via
_prefer_refreshable_claude_code_token).

Surfaced while writing the hermes-agent-dev skill playbook for
'finding a live OAuth token for an E2E test'.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 03:51:17 -07:00
Teknium
2b728e1274
fix(agent): drop thinking-only assistant turns before provider call (#16959)
Adds a pre-call sanitizer that detects assistant messages containing only
reasoning (reasoning / reasoning_content, no visible content, no
tool_calls) and drops them from the API copy. Adjacent user messages
left behind are merged so role alternation is preserved for the
provider.

Mirrors Claude Code's approach in src/utils/messages.ts
(filterOrphanedThinkingOnlyMessages + mergeAdjacentUserMessages). We
drop the whole turn rather than fabricate stub text (the '.' /
'(continued)' pattern from contributor PRs #11098, #13010, #16842 that
were rejected because they put words in the model's mouth).

The stored conversation history (self.messages) is never mutated — only
the per-call api_messages copy. Users still see the reasoning block in
the CLI/gateway transcript; only the wire copy is cleaned. Session
persistence keeps the full trace.

Two call sites covered:
- Main agent loop, after _sanitize_api_messages (catches every turn).
- Iteration-limit-summary fallback path.

Tests: tests/run_agent/test_thinking_only_sanitizer.py — 25 cases
covering detection (string/list content, whitespace-only, tool_calls,
reasoning_details list form), drop behavior, adjacent-user merge
(string+string, list+list, mixed), non-mutation of input dicts, and
system-message handling.

E2E live-tested against 5 providers with a poisoned history (empty
assistant message + reasoning_content): OpenRouter→Anthropic/OpenAI/
DeepSeek-R1/Qwen, native Gemini. All 5 accepted the cleaned request.
Happy-path regression (5/5) confirms the sanitizer is a noop when no
thinking-only turn exists.

Related: #16823 (wontfix — stub-text approach rejected).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 03:50:51 -07:00
teknium1
5316ce95de chore(release): map simonweng@tencent.com -> Contentment003111
AUTHOR_MAP entry for the tencent-tokenhub provider PR #16860 contributor.
2026-04-28 03:45:52 -07:00
simonweng
a6a6cf047d feat(providers): add tencent-tokenhub provider support
Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a
new API-key provider with model tencent/hy3-preview (256K context).

- PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars
- Aliases: tencent, tokenhub, tencent-cloud, tencentmaas
- openai_chat transport with is_tokenhub branch for top-level
  reasoning_effort (Hy3 is a reasoning model)
- tencent/hy3-preview:free added to OpenRouter curated list
- 60+ tests (provider registry, aliases, runtime resolution,
  credentials, model catalog, URL mapping, context length)
- Docs: integrations/providers.md, environment-variables.md,
  model-catalog.json

Author: simonweng <simonweng@tencent.com>
Salvaged from PR #16860 onto current main (resolved conflicts with
#16935 Azure Anthropic env-var hint tests and the --provider choices=
list removal in chat_parser).
2026-04-28 03:45:52 -07:00
Teknium
bd10acd747
fix(providers): honor key_env/api_key_env on Azure Anthropic + accept alias in normalizer (#16935)
Three related fixes around custom env-var-name hints for provider entries.

1. Azure Anthropic path: previously hardcoded to look up AZURE_ANTHROPIC_KEY
   then ANTHROPIC_API_KEY with no way to override.  If a user wrote
     model:
       provider: anthropic
       base_url: https://my-resource.services.ai.azure.com/anthropic
       key_env: MY_CUSTOM_KEY
   the key_env hint was silently ignored and the resolver raised
   'No Azure Anthropic API key found' even when MY_CUSTOM_KEY was set
   in the environment.  The runtime now checks, in order:
     (1) os.getenv(model_cfg.key_env)
     (2) os.getenv(model_cfg.api_key_env)    # docs alias
     (3) model_cfg.api_key                     # inline value
     (4) AZURE_ANTHROPIC_KEY                   # historical default
     (5) ANTHROPIC_API_KEY                     # historical default
   Error message updated to mention key_env as an option.

2. Provider entry normalizer (_normalize_custom_provider_entry): accept
   'api_key_env' as a snake_case alias for 'key_env', and 'apiKeyEnv' as a
   camelCase alias.  Adds both to the _KNOWN_KEYS set so the 'unknown
   config keys ignored' warning doesn't fire on valid configs.

3. _VALID_CUSTOM_PROVIDER_FIELDS: add 'key_env'.  That set documents
   supported custom_providers entry fields; it was drifting from reality
   since key_env has been read at runtime in auxiliary_client.py,
   runtime_provider.py, and main.py for a while.

Docs: website/docs/guides/azure-foundry.md now uses the canonical key_env
field and notes that api_key_env / keyEnv / apiKeyEnv are accepted as
aliases.

Validation: 12 new tests in test_runtime_provider_resolution.py covering
all 5 Azure Anthropic resolution paths + 4 normalizer-alias tests.  Pass
rate across related suites (165 + 46 tests): 100%.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 02:12:08 -07:00
teknium1
4148e85b3a docs(web): document web_search limit parameter and query operators 2026-04-28 02:09:30 -07:00
墨綠BG
4462b349b2 feat(web): expose search result limit 2026-04-28 02:09:30 -07:00
Teknium
4e5ebf07ea
fix(matrix): stop tagging the user on every reply (#16932)
The mention_user_id injection from #38a6bada9 unconditionally attached an
@user:server mention pill + MSC3952 m.mentions.user_ids payload to every
outbound reply and every tool-progress status update. The stated intent
was push notifications in muted rooms, but shipped as always-on in every
room, DM or group, muted or not — so every reply pinged the user.

- gateway/platforms/base.py: stop injecting mention_user_id into send
  metadata on every reply; restore the original _thread_metadata passthrough.
- gateway/run.py: drop mention_user_id from status-thread metadata.
- gateway/platforms/matrix.py: drop the mention-pill append block in
  _send_text that consumed the metadata. Keep the reaction-based exec
  approval half of #38a6bada9 and the inbound/outbound m.mentions
  handling (unrelated to the per-reply ping).

Reported by Elkim [NOUS] on Discord.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 02:00:37 -07:00