hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-29 18:46:59 +00:00

Author	SHA1	Message	Date
konsisumer	e4b69bf149	fix(gateway): guard against None request_overrides in _build_api_kwargs	2026-04-28 06:57:23 -07:00
Teknium	1d8b9e6458	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027) Auxiliary tasks (title_generation, vision, compression, web_extract, session_search) now pick the correct wire protocol based on the endpoint, not just on which resolve_provider_client branch built the client. Fixes 404s on Kimi Coding Plan and any other named provider whose endpoint speaks Anthropic Messages. Root cause: the 'api_key' branch of resolve_provider_client (and the Step 2 fallback chain inside _resolve_auto) always built a plain OpenAI client regardless of what the endpoint actually spoke. For provider=kimi-coding + model=kimi-for-coding, that meant: POST https://api.kimi.com/coding/v1/chat/completions { "model": "kimi-for-coding", ... } → 404 resource_not_found_error. The /coding route only accepts the Anthropic Messages shape (the main agent already uses api_mode=anthropic_messages for it). Earlier fixes (#16819, #`22ddac4b1`) patched the anonymous-custom, named-custom, and external-process branches, but the named api_key branch (kimi-coding, minimax, zai, future /anthropic providers) was the fourth sibling and never got the same treatment. Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a plain OpenAI client in AnthropicAuxiliaryClient when: - api_mode is explicitly 'anthropic_messages', OR - the URL ends in '/anthropic', OR - the host is api.kimi.com + path contains '/coding', OR - the host is api.anthropic.com. Wired into _wrap_if_needed (covers all resolve_provider_client branches that already go through it) and into the Step 2 api_key fallback chain inside _resolve_auto. Explicit api_mode still wins: passing api_mode='chat_completions' forces OpenAI wire, and already-wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass through unchanged. 
E2E verified: - resolve_provider_client('kimi-coding', 'kimi-for-coding') → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd) - _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient - resolve_provider_client('openrouter', ...) → plain OpenAI (no regression) - api_mode='chat_completions' override → plain OpenAI (explicit wins) Tests: - tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests covering URL detection, wrap decisions, and integration. - 204/205 existing auxiliary tests pass (1 pre-existing failure on main, unrelated to this change). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:14 -07:00
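Note on 1d8b9e6458: the four detection rules in the message above can be sketched as a pure function. This is only an illustration of the rules as the commit states them; `looks_like_anthropic_endpoint` is a hypothetical name, not the repository's `_maybe_wrap_anthropic()`, and the real helper also wraps the client object, which is omitted here.

```python
from typing import Optional
from urllib.parse import urlparse


def looks_like_anthropic_endpoint(base_url: str, api_mode: Optional[str] = None) -> bool:
    """Hypothetical stand-in for the URL rules listed in the commit message."""
    # An explicit api_mode wins over URL sniffing in either direction.
    if api_mode is not None:
        return api_mode == "anthropic_messages"
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    # Rule: the URL ends in '/anthropic'.
    if path.endswith("/anthropic"):
        return True
    # Rule: host is api.kimi.com and the path contains '/coding'.
    if host == "api.kimi.com" and "/coding" in path:
        return True
    # Rule: host is api.anthropic.com.
    return host == "api.anthropic.com"
```

Keeping the decision separate from the wrapping step makes the explicit-override case easy to test without constructing a client.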
Teknium	e123f4ecf0	feat(gateway): opt-in runtime-metadata footer on final replies (#17026) Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL message of each turn, disabled by default (display.runtime_footer.enabled). Answers the Telegram-side parity ask: runtime context that the CLI status bar already shows is now available in messaging replies when enabled. Wiring: - gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer + build_footer_line. Pure-function renderer; per-platform overrides under display.platforms.<platform>.runtime_footer. - gateway/run.py: appends footer to response right after reasoning prepend so it lands only on the final message (never tool progress or streaming chunks). When streaming already delivered the body (already_sent), the footer is sent as a small trailing message instead. - agent_result now exposes context_length alongside last_prompt_tokens so the footer can compute the pct; both gateway return paths updated. - /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway (gateway/run.py both running-agent bypass and main dispatch). Global toggle only; per-platform overrides via config.yaml. Graceful degradation: - Missing context_length (unknown model) → pct field silently dropped (no '?%' artifact). - Empty final_response → no footer appended. - Unknown field names in config → silently ignored. Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E harness covering streaming vs non-streaming branches, per-platform override, and the exact argument contract gateway/run.py uses. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:04 -07:00
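Note on e123f4ecf0: a minimal sketch of the footer rendering and its graceful-degradation rule (percentage dropped when the context length is unknown). `render_footer_sketch` and its parameters are hypothetical; the assumption that the percentage is last_prompt_tokens divided by context_length is inferred from the message, not confirmed by the source.

```python
from typing import Optional


def render_footer_sketch(model: str,
                         last_prompt_tokens: Optional[int],
                         context_length: Optional[int],
                         cwd: str) -> str:
    """Hypothetical renderer for a 'model · 68% · ~/projects/hermes' style footer."""
    parts = [model]
    # Only show a percentage when both numbers are known and the
    # denominator is non-zero; otherwise the field is silently dropped.
    if last_prompt_tokens is not None and context_length:
        pct = round(100 * last_prompt_tokens / context_length)
        parts.append(f"{pct}%")
    parts.append(cwd)
    return " · ".join(parts)
```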
Teknium	6085d7a93e	chore: remove unused imports and dead locals (ruff F401, F841) (#17010 ) Mechanical cleanup across 43 files — removes 46 unused imports (F401) and 14 unused local variables (F841) detected by `ruff check --select F401,F841`. Net: -49 lines. Also fixes a latent NameError in rl_cli.py where `get_hermes_home()` was called at module line 32 before its import at line 65 — the module never imported successfully on main. The ruff audit surfaced this because it correctly saw the symbol as imported-but-unused (the call happened before the import ran); the fix moves the import to the top of the file alongside other stdlib imports. One `# noqa: F401` kept in hermes_cli/status.py for `subprocess`: tests monkeypatch `hermes_cli.status.subprocess` as a regression guard that systemctl isn't called on Termux, so the name must exist at module scope even though the module body doesn't reference it. Docstring explains the reason. Also fixes an invalid `# noqa:` directive in gateway/platforms/discord.py:308 that lacked a rule code. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:46:45 -07:00
teknium1	3d8be2c617	fix(install): widen /dev/tty open-probe to sibling gates (#16746) The contributor's PR (#16750) scoped the fix to run_setup_wizard() and explicitly punted the two sibling sites. Both have the identical [ -e /dev/tty ] pattern followed by a < /dev/tty redirect and crash in Docker the same way: - scripts/install.sh:732 install_system_packages() -- apt sudo prompt fallback. sudo ... < /dev/tty dies with the same ENXIO. - scripts/install.sh:1395 maybe_start_gateway() -- gateway-install gate, same function path as the wizard reproducer. Fix both with the same (: </dev/tty) 2>/dev/null probe, and parametrize the regression test over all three gated functions so any future regression is caught regardless of which site breaks.	2026-04-28 06:45:55 -07:00
briandevans	89e8c87354	test(install): regex-based gate assertions per copilot review on #16750 Address the three Copilot inline findings on the regression test: - Switch _extract_run_setup_wizard() from str.index() with hard-coded markers (which raises ValueError if `maybe_start_gateway()` is renamed or the marker leaks into a comment) to an anchored regex on the function-definition + closing-brace boundaries. - Match `[ -e /dev/tty ]` with surrounding whitespace, optional quoting, and the `test -e /dev/tty` form so the regression guard catches every spelling of the existence-only check, not just the exact substring. - Replace the literal `(: </dev/tty)` substring assertion with a higher-level invariant — the gate must be an `if`/`if !` whose test redirects stdin from /dev/tty — so equivalent open-based probes (`exec 3</dev/tty` + close, brace-grouped variants, etc.) keep the test green while the bare existence check stays caught. Verified guard: both tests still pass on the fix and both fail on `origin/main` with the documented messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 06:45:55 -07:00
briandevans	20c9340c34	fix(install): probe /dev/tty by opening it, not bare existence (#16746) In Docker builds the `/dev/tty` device node is present in the mount namespace, so `[ -e /dev/tty ]` returns true, but opening it fails with `ENXIO: No such device or address`. Under the old gate the "no terminal available" skip never triggered, the setup wizard ran, and the build aborted a few lines later when bash tried `< /dev/tty`: /tmp/install.sh: line 1347: /dev/tty: No such device or address Replace the existence check with `(: </dev/tty) 2>/dev/null`, which actually attempts to open /dev/tty in a subshell. The probe succeeds when piped from `curl | bash` on a real terminal (the wizard's intended use case) and fails cleanly in Docker build / CI contexts so the skip kicks in before the redirect can crash. Add a regression test that statically asserts run_setup_wizard does not gate on the bare existence check and that the open-based probe is in place. Fixes #16746. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 06:45:55 -07:00
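Note on 20c9340c34: the actual fix is the shell probe `(: </dev/tty) 2>/dev/null` in scripts/install.sh. The same idea, attempting the open instead of checking existence, can be sketched in Python for illustration; `tty_is_openable` is a hypothetical helper and is not part of the repository.

```python
import os


def tty_is_openable(path: str = "/dev/tty") -> bool:
    """Return True only if the device can actually be opened.

    An existence check (os.path.exists) is not enough: the node can be
    present while open() fails with ENXIO when there is no controlling
    terminal, which is the Docker-build case described in the commit.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    os.close(fd)
    return True
```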
teknium1	b2339c87e4	chore(release): map dejie.guo@gmail.com -> JayGwod	2026-04-28 06:45:35 -07:00
Dejie Guo	8cced33784	fix(model): prefer live models for user providers	2026-04-28 06:45:35 -07:00
Teknium	69b8fa65d4	docs(delegate_task): clarify that it is synchronous and not durable (#17022) delegate_task runs inside the parent turn and is cancelled when the parent is interrupted (new user message, /stop, /new). The child status payload (status=interrupted, exit_reason=interrupted) is already honest, but the tool schema and user-facing docs did not set the expectation, so users reasonably assumed delegated subagents would keep running in the background after interrupting the parent. Updates: - tools/delegate_tool.py DELEGATE_TASK_SCHEMA description adds a WHEN NOT TO USE bullet pointing at cronjob / terminal(background=True, notify_on_complete=True) for durable long-running work. - website/docs/user-guide/features/delegation.md gains a Lifetime and Durability callout above Key Properties. - website/docs/guides/delegation-patterns.md expands the Use something else list and the Constraints section with the same guidance. Reported by LizLiz (@lizliz404) via Teknium. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:45:15 -07:00
Teknium	5f84eac451	feat(gateway): bust cached agent on compression/context_length config edits (#17008 ) The gateway caches one AIAgent per session to preserve prompt-cache hits, keyed by _agent_config_signature(). The signature previously only fingerprinted model/credentials/toolsets/ephemeral-prompt — NOT the compression or context_length config. As a result, users who edited model.context_length or compression.threshold in config.yaml on a long-lived gateway saw no effect until they triggered an unrelated cache eviction (/model switch, /reset, gateway restart). Add a new cache_keys parameter to _agent_config_signature and a _CACHE_BUSTING_CONFIG_KEYS registry listing config values the agent bakes in at construction time. Call sites read the current config and pass it through — next gateway message with an edited config rebuilds the agent. Keys registered: - model.context_length - compression.enabled - compression.threshold - compression.target_ratio - compression.protect_last_n Reported by @OP (Apr 26 feedback bundle). ## Changes - gateway/run.py: new _CACHE_BUSTING_CONFIG_KEYS tuple, _extract_cache_busting_config classmethod, cache_keys kwarg on _agent_config_signature, call site passes the extracted dict - tests/gateway/test_agent_cache.py: 11 new tests (5 on _agent_config_signature behavior, 6 on _extract_cache_busting_config) Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:37:42 -07:00
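Note on 5f84eac451: a rough sketch of how a registry of dotted config keys can feed a cache signature so that editing any of them yields a different key. The key list is taken from the message above; `extract_cache_keys`, `config_signature`, and the SHA-256/JSON encoding are assumptions made for illustration and may differ from `_extract_cache_busting_config` and `_agent_config_signature` in gateway/run.py.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# Keys listed in the commit message as baked in at agent construction time.
CACHE_BUSTING_KEYS: Tuple[str, ...] = (
    "model.context_length",
    "compression.enabled",
    "compression.threshold",
    "compression.target_ratio",
    "compression.protect_last_n",
)


def extract_cache_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Walk each dotted path; missing or non-dict nodes resolve to None."""
    out: Dict[str, Any] = {}
    for dotted in CACHE_BUSTING_KEYS:
        node: Any = config
        for part in dotted.split("."):
            node = node.get(part) if isinstance(node, dict) else None
        out[dotted] = node
    return out


def config_signature(model: str, cache_keys: Dict[str, Any]) -> str:
    """Hypothetical signature: a stable hash over model plus cache-busting values."""
    payload = json.dumps({"model": model, "cache_keys": cache_keys},
                         sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two configs that differ only in `compression.threshold` produce different signatures, so the cached agent would be rebuilt on the next message.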
kshitijk4poor	b5905f0d4a	chore: add Mirac1eSky to AUTHOR_MAP	2026-04-28 06:37:22 -07:00
Siwen Wang	d6137453ac	fix(gateway): drain stale httpx polling connections on Telegram reconnect Network errors through proxies (e.g. sing-box) can leave httpx connections in a half-closed state occupying pool slots. After enough reconnect cycles the 256-connection default fills up entirely, causing Pool timeout: All connections in the connection pool are occupied. Fix: cycle only the getUpdates request object (_request[0]) via shut-down + re-initialize before restarting polling. This drains stale connections without touching the general request (_request[1]) that concurrent send_message / edit_message calls rely on. The drain is applied to both _handle_polling_network_error and _handle_polling_conflict reconnect paths via a shared _drain_polling_connections() helper. Failures in the drain are swallowed so reconnect always proceeds. Based on #16466 by @Mirac1eSky.	2026-04-28 06:37:22 -07:00
Teknium	391f1ca1f4	feat(aux): translate extra_body.reasoning into Codex Responses API (#17004 ) Auxiliary callers that configure reasoning via auxiliary.<task>.extra_body.reasoning were having that config silently dropped by the Codex Responses adapter — it only forwarded messages/model/tools through to responses.stream(), never translating chat.completions-shaped reasoning hints into the Responses API's top-level reasoning + include fields. Mirror the main-agent translation from agent/transports/codex.py: - extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"} - 'minimal' → 'low' clamp (Codex backend rejects 'minimal') - Always include ['reasoning.encrypted_content'] when reasoning is enabled - {'enabled': False} → omit reasoning and include entirely - Non-dict reasoning values are ignored defensively Reported by @OP (Apr 26 feedback bundle). ## Changes - agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads and translates extra_body.reasoning before calling responses.stream() - tests/agent/test_auxiliary_client.py: 9 new tests covering all effort levels, the minimal→low clamp, the disabled path, the no-op paths, and defensive handling of wrong-shape inputs Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:47:42 -07:00
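Note on 391f1ca1f4: the translation rules in the message above, sketched as a standalone function. `translate_reasoning` is a hypothetical name; the behaviour when `effort` is absent (emit nothing) is an assumption, since the message only describes that case as one of the "no-op paths".

```python
from typing import Any, Dict


def translate_reasoning(extra_body: Any) -> Dict[str, Any]:
    """Map chat.completions-style extra_body.reasoning to Responses-style kwargs."""
    if not isinstance(extra_body, dict):
        return {}
    reasoning = extra_body.get("reasoning")
    # Non-dict reasoning values are ignored defensively.
    if not isinstance(reasoning, dict):
        return {}
    # {'enabled': False} omits both `reasoning` and `include`.
    if reasoning.get("enabled") is False:
        return {}
    effort = reasoning.get("effort")
    if not isinstance(effort, str):
        return {}  # assumption: no effort means nothing to translate
    # The commit states the Codex backend rejects 'minimal', so clamp to 'low'.
    if effort == "minimal":
        effort = "low"
    return {
        "reasoning": {"effort": effort, "summary": "auto"},
        "include": ["reasoning.encrypted_content"],
    }
```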
Teknium	72dea9f4f7	feat(gateway): make hygiene hard message limit configurable (#17000 ) The gateway session-hygiene pre-compression safety valve had a hardcoded 400-message threshold. On long-lived sessions with short turns this was either too high (users with aggressive compression preferences) or too low (users with very large context models who want to keep more history in-flight). Add compression.hygiene_hard_message_limit (default 400) so it can be tuned without forking the gateway. Reported by @OP (Apr 26 feedback bundle). ## Changes - hermes_cli/config.py: new DEFAULT_CONFIG key with 400 default - gateway/run.py: read compression.hygiene_hard_message_limit at hygiene-time, fall back to 400 if missing/invalid - tests/gateway/test_session_hygiene.py: two tests — override fires at the configured limit, default does not fire below 400 Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:43:12 -07:00
Teknium	06164a7b28	fix(codex): resync pool entry from auth.json after reauth (#17001 ) When openai-codex tokens expire or the ChatGPT account hits a 429 window, the pool entry gets marked STATUS_EXHAUSTED with last_error_reset_at many hours in the future. If the user then runs `hermes model` / `hermes auth openai-codex` to reauth, fresh tokens land in ~/.hermes/auth.json but the pool entry stayed frozen behind its reset_at — every request kept failing with 'credential pool: no available entries (all exhausted or empty)' until the original window elapsed. _available_entries() already had auth.json/credentials-file resync branches for anthropic/claude_code and nous/device_code; openai-codex was missing. Added _sync_codex_entry_from_auth_store() mirroring the nous version (reads state["tokens"][{access,refresh}_token] + state["last_refresh"]) and wired it into the exhausted-entry resync loop. Also softens the 'codex CLI not found' doctor warning — native device-code OAuth does not require the Codex binary, only importing existing Codex CLI tokens does. Downgraded to an info line. Reported on Discord by p1aceho1der: Codex stalled indefinitely after a rate-limit reset, reauth didn't help, and doctor falsely warned that the codex CLI was required. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:43:09 -07:00
teknium1	529eb29b6a	fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel values. The salvaged bridge was forwarding Hermes' "minimal" effort to Flash verbatim, which is not a documented Gemini level and risks a 400 from the native adapter. Clamp minimal->low on Flash (matching how Pro already clamps minimal+low down), and funnel anything outside {low, medium, high} into medium to keep the request valid by construction. No behaviour change for the documented effort levels.	2026-04-28 05:38:23 -07:00
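Note on 529eb29b6a: the Flash clamp described above fits in a few lines. The function name is hypothetical; the mapping (minimal to low, anything outside the documented set to medium) is taken directly from the message.

```python
FLASH_LEVELS = {"low", "medium", "high"}


def clamp_flash_thinking_level(effort: str) -> str:
    """Keep the request within the documented low/medium/high set."""
    if effort == "minimal":
        return "low"
    # Unknown values are funnelled into 'medium' so the request stays valid.
    return effort if effort in FLASH_LEVELS else "medium"
```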
Nanako0129	dbbe2d1973	fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes	2026-04-28 05:38:23 -07:00
teknium1	315a11a76f	chore(prompt): tell telegram models to prefer bullets over tables Telegram has no native table syntax. The gateway auto-rewrites pipe tables into row-group bullets (see previous commit), but letting models know up front means they emit the clean form directly instead of relying on post-processing to synthesize headings. Also helps users whose MEMORY.md formatting policies were being overridden — the platform hint now carries the guidance.	2026-04-28 05:37:50 -07:00
LeonSGP43	a3b9343f08	feat(telegram): render markdown tables as row groups	2026-04-28 05:37:50 -07:00
helix4u	d8c5573ffe	fix(profiles): migrate Honcho host on rename	2026-04-28 05:37:09 -07:00
teknium1	c69310c625	fix(weixin): raise descriptive error when rate-limit retries exhaust The rate-limit branch added by the original PR did sleep+continue with no attempt to record the last error, so persistent iLink -2 responses exhausted the retry loop and hit 'assert last_error is not None', raising AssertionError instead of a descriptive RuntimeError. Record last_error = RuntimeError(...) before continuing, and break out of the loop on the final attempt instead of sleeping uselessly.	2026-04-28 05:21:58 -07:00
teknium1	d3a9c69e9b	chore(release): map leihaibo1992 author for #16757 salvage	2026-04-28 05:21:58 -07:00
Leihb	a54106bbc8	fix(weixin): split long messages (>2000 chars) into chunks to prevent truncation - Change MAX_MESSAGE_LENGTH from 4000 to 2000 to match Weixin iLink API limit - Add RATE_LIMIT_ERRCODE = -2 handling with 3x backoff retry - Increase default send_chunk_delay_seconds from 0.35 to 1.5 to avoid rate limits - Increase default send_chunk_retries from 2 to 4 for better reliability - Use _split_text() in send() to chunk long messages before delivery Fixes #16411	2026-04-28 05:21:58 -07:00
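Note on a54106bbc8: a minimal sketch of chunking a long message under a 2000-character limit. `split_text` here is a hypothetical stand-in for the adapter's `_split_text()`; preferring a newline boundary and dropping the newline at the cut are assumptions, not confirmed behaviour.

```python
from typing import List

MAX_MESSAGE_LENGTH = 2000  # limit stated in the commit message


def split_text(text: str, limit: int = MAX_MESSAGE_LENGTH) -> List[str]:
    """Split text into chunks of at most `limit` characters."""
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        # Prefer to cut at the last newline that keeps the chunk within the limit.
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks
```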
teknium1	1a4289b6b7	chore(release): map revar@users.noreply.github.com -> revaraver	2026-04-28 05:21:49 -07:00
revar	052b3449e5	test(cli): regression test for manual /compress system_message Add tests/test_cli_manual_compress.py verifying _manual_compress passes None (not the cached system prompt) to _compress_context, forwards the /compress <topic> focus string, rotates CLI session_id to the new child session, and clears the pending title. Co-authored-by: revar <revar@users.noreply.github.com>	2026-04-28 05:21:49 -07:00
ygd58	fb112d6a73	fix(cli): pass None as system_message in manual compress to prevent duplication _manual_compress() passed self.agent._cached_system_prompt to _compress_context() as the system_message argument. _compress_context calls _build_system_prompt(system_message), which appends system_message to prompt_parts that already contain the agent identity block — causing the identity to appear twice in the new session's system prompt (20,957 -> 42,303 chars, +102% as reported in issue #15281). Fix: pass None instead of _cached_system_prompt. _build_system_prompt(None) rebuilds the system prompt correctly from scratch without appending a pre-built prompt on top of the identity layers. Fixes #15281	2026-04-28 05:21:49 -07:00
teknium1	7444e49d4e	fix(gateway): use transcript timestamp for auto-continue freshness Follow-up to PR #16802 (BeliefanX). The original fix read `agent_history[-1].get("timestamp")` for the tool-tail freshness gate, but `gateway/run.py` strips the `timestamp` field off all tool/tool_call rows when building `agent_history` from the raw transcript (see `clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At runtime the tool-tail branch always saw `None` and silently took the legacy-fresh path — the stale-guard never fired for the tool-tail case it was supposed to cover. Changes: - Read the freshness signal from the RAW `history` list (via new `_last_transcript_timestamp()` helper) BEFORE the strip. Both the resume_pending branch and the tool-tail branch use this single signal, replacing the two divergent ones. - Default window bumped 15 min → 1 hour via new `_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was shorter than the default `gateway_timeout` of 30 min, so a legitimate long-running turn interrupted near its timeout boundary and resumed shortly after would have been misclassified as stale. - Configurable via `config.yaml` `agent.gateway_auto_continue_freshness` (bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same pattern as `gateway_timeout`). Set to 0 to disable the gate. - `_coerce_gateway_timestamp` now explicitly rejects bool (which is a subclass of int and would otherwise coerce to 0.0/1.0). - Tests rewritten to exercise the real production data shape: raw `history` → `_build_agent_history` strip → freshness decision. A regression guard (`test_stale_tool_tail_with_production_data_shape`) asserts `agent_history` tool rows carry NO timestamp, protecting against someone "fixing" the original bug by re-adding the stripped field (which would break the OpenAI tool-result message contract). Add BeliefanX to scripts/release.py AUTHOR_MAP. 
E2E verified: config.yaml → env var bridge → helper returns configured value; default 1h window; malformed/empty env var falls back to default; ISO-Z timestamps parse; ms-epoch coerced; bool rejected.	2026-04-28 05:20:35 -07:00
beliefanx	93feffbcfa	fix(gateway): avoid stale interrupted turn auto-continue	2026-04-28 05:20:35 -07:00
Teknium	b61d9b297a	refactor: consolidate symlink-safe atomic replace into shared helper Extract the islink/realpath guard from the 16743 fix into a single atomic_replace() helper in utils.py, then migrate every os.replace() call site in the codebase to use it. The original PR #16777 correctly identified and fixed the bug, but only patched 9 of ~24 call sites. The same bug class (managed deployments that symlink state files silently losing the link on every write) still existed at auth.json, sessions file, gateway config, env_loader, webhook subscriptions, debug store, model catalog, pairing, google OAuth, nous rate guard, and more. Rather than add another 10+ copies of the same three-line guard, consolidate into atomic_replace(tmp, target) which: - resolves symlinks via os.path.realpath before os.replace - returns the resolved real path so callers can re-apply permissions - is a drop-in replacement for os.replace at the use sites Changes: - utils.py: new atomic_replace() helper + atomic_json_write / atomic_yaml_write now call it instead of inlining the guard - 16 files: all os.replace() call sites migrated to atomic_replace() - agent/{google_oauth, nous_rate_guard, shell_hooks}.py - cron/jobs.py - gateway/{pairing, session, platforms/telegram}.py - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py - tools/{memory_tool, skill_manager_tool, skills_sync}.py Tests: tests/test_atomic_replace_symlinks.py pins the invariant for atomic_replace + atomic_json_write + atomic_yaml_write, covers plain files, first-time creates, broken symlinks, and permission preservation. Refs #16743 Builds on #16777 by @vominh1919.	2026-04-28 04:58:22 -07:00
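Note on b61d9b297a: the core of a symlink-safe replace, as described above, is resolving the target with `os.path.realpath` before calling `os.replace`. The sketch below follows that description (resolve, replace, return the resolved path); it is not copied from utils.py and the real helper may handle more cases.

```python
import os


def atomic_replace(tmp_path: str, target_path: str) -> str:
    """Replace the file a symlink points at, leaving the symlink itself intact.

    os.replace(tmp, link) would swap the link for a regular file; resolving
    first means the real file is overwritten instead. The temp file should
    live on the same filesystem as the resolved target for the rename to
    be atomic.
    """
    real_target = os.path.realpath(target_path)
    os.replace(tmp_path, real_target)
    return real_target
```

Returning the resolved path lets a caller re-apply permissions to the real file rather than to the link.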
vominh1919	3ab97a32d1	fix: preserve symlinks during atomic file writes (#16743 ) os.replace(tmp, path) replaces the symlink itself with a regular file, breaking users who symlink config.yaml, SOUL.md, or .env from ~/.hermes/ to a dotfiles repo or managed profile package. Fix: resolve symlinks via os.path.realpath() before os.replace(), so the real file is overwritten in-place while the symlink survives. Fixed in 7 files covering all os.replace call sites: - utils.py (atomic_json_write, atomic_yaml_write — fixes save_config) - hermes_cli/config.py (env sanitizer, save_env_value, remove_env_value) - tools/skill_manager_tool.py (_atomic_write_text — SOUL.md writes) - tools/memory_tool.py (memory file writes) - tools/skills_sync.py (manifest writes) - cron/jobs.py (job state + output file writes) - agent/shell_hooks.py (hook file writes) Fixes NousResearch/hermes-agent#16743	2026-04-28 04:58:22 -07:00
teknium1	1369dae226	test(openclaw-migration): cover alias reverse-lookup for real OpenClaw schema Real OpenClaw configs key agents.defaults.models by full provider/model API ID with an 'alias' field on the value (e.g. {'anthropic/claude-opus-4-6': {'alias': 'Claude Opus 4.6'}}). Add regression tests for issue #16745 covering: - reverse-lookup of alias against real schema (keyed by API ID) - alias resolution when model is a bare string vs {'primary': ...} - passthrough when the value is already a provider/model API ID - passthrough when the alias has no catalog match - string-valued catalog entries (belt-and-suspenders) - no catalog at all	2026-04-28 04:58:13 -07:00
vominh1919	7996c14795	fix: resolve model aliases during claw migrate (#16745) `hermes claw migrate` copied OpenClaw's model setting verbatim, which could be a display alias (e.g. "Claude Opus 4.6") instead of the actual API ID (e.g. "claude-opus-4-6"). Hermes then sent the alias to the API, causing HTTP 404 model not found. Fix: look up the model string in agents.defaults.models (plural) alias catalog. If found, use the resolved "id" field, prepending the provider prefix if needed. If not found (already an API ID), pass through unchanged. Fixes NousResearch/hermes-agent#16745	2026-04-28 04:58:13 -07:00
阿泥豆	4aa0a7c195	fix(error-classifier): add insufficient balance to billing patterns DeepSeek API returns HTTP 400 with 'Insufficient Balance' message when account funds are depleted. This pattern was not in _BILLING_PATTERNS, causing the error to be misclassified instead of triggering billing exhaustion handling (e.g., fallback to alternate provider). Suggested by teknium1 in PR review of #15586.	2026-04-28 04:58:09 -07:00
Teknium	7428abd54e	chore(release): map mtf201013@gmail.com -> ma-pony	2026-04-28 04:58:03 -07:00
Teknium	0f473d643d	refactor(schema): consolidate nullable-union stripping in schema_sanitizer Adds tools.schema_sanitizer.strip_nullable_unions as the single implementation for collapsing anyOf/oneOf nullable unions. Both the MCP input-schema normalizer and the Anthropic tool-schema guard now delegate to it instead of re-implementing the same walk three times. The global sanitizer also gains a final pass so any tool that slips past the two earlier hooks (plugin tools, non-MCP custom tools with Pydantic-shaped schemas) still gets safe input_schemas on Anthropic. - tools/schema_sanitizer.py: * New public strip_nullable_unions(schema, keep_nullable_hint=True). * _sanitize_single_tool() calls it as a final pass (hint preserved so coerce_tool_args can still map string "null" to None). - tools/mcp_tool.py: _normalize_mcp_input_schema delegates. - agent/anthropic_adapter.py: _normalize_tool_input_schema delegates with keep_nullable_hint=False (Anthropic does not recognize nullable). No behavioral change for the fix itself; tests (73/73 targeted + E2E across MCP→sanitizer→Anthropic paths) pass.	2026-04-28 04:58:03 -07:00
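Note on 0f473d643d: a simplified sketch of collapsing nullable `anyOf`/`oneOf` unions. It mirrors the public signature quoted above, `strip_nullable_unions(schema, keep_nullable_hint=True)`, but the body is an assumption: a naive recursive walk that treats every nested dict as a schema, with the outer keys taking precedence over the collapsed variant's keys.

```python
from typing import Any


def _is_null_schema(node: Any) -> bool:
    return isinstance(node, dict) and node.get("type") == "null"


def strip_nullable_unions(schema: Any, keep_nullable_hint: bool = True) -> Any:
    """Return a copy where [X, {'type': 'null'}] unions are collapsed to X."""
    if isinstance(schema, list):
        return [strip_nullable_unions(s, keep_nullable_hint) for s in schema]
    if not isinstance(schema, dict):
        return schema
    # Rebuild the dict so the caller's schema is never mutated.
    out = {k: strip_nullable_unions(v, keep_nullable_hint) for k, v in schema.items()}
    for key in ("anyOf", "oneOf"):
        variants = out.get(key)
        if not isinstance(variants, list):
            continue
        non_null = [v for v in variants if not _is_null_schema(v)]
        # Collapse only when exactly one real variant remains after
        # removing at least one null variant.
        if len(non_null) == 1 and len(variants) > 1 and isinstance(non_null[0], dict):
            out.pop(key)
            out = {**non_null[0], **out}
            if keep_nullable_hint:
                out["nullable"] = True
    return out
```

Passing `keep_nullable_hint=False` matches the Anthropic case in the message, where the `nullable` keyword is not recognized.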
Pony.Ma	aa94883288	fix(mcp): preserve nullable schema coercion	2026-04-28 04:58:03 -07:00
Pony.Ma	1350d12b0b	fix: keep mcp dynamic refresh tasks tracked	2026-04-28 04:58:03 -07:00
Pony.Ma	02ae152222	fix(mcp): normalize nullable tool schemas	2026-04-28 04:58:03 -07:00
teknium1	9cd02b1698	chore(release): map r.filgueiras@apheris.com -> rfilgueiras	2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras	37551ee53e	test(bedrock): add model picker and region routing tests 25 new tests (all Bedrock API calls mocked, no real AWS creds needed): tests/hermes_cli/test_bedrock_model_picker.py (20 tests): - provider_model_ids("bedrock") uses live discovery, returns regional model IDs, falls back gracefully on empty/exception, resolves all bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery - list_authenticated_providers() section 2: bedrock appears with AWS creds, model list from discover_bedrock_models(), total_models matches, is_current flag works, absent creds hides bedrock, discovery failure does not crash, no duplicate entries - Region routing: botocore profile eu-central-1 yields eu.* model IDs end-to-end; env var takes priority over botocore profile - providers.py overlay: exists with correct transport/auth_type, label is non-empty, all aliases normalize to bedrock tests/agent/test_bedrock_adapter.py (5 tests): - resolve_bedrock_region() botocore profile fallback, botocore failure fallback, us-east-1 hard fallback (with botocore mocked)	2026-04-28 03:53:11 -07:00
Ruda Porto Filgueiras	a23f18cc3e	fix(bedrock): add live model discovery and region resolution for non-US regions provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS table containing only hardcoded us.* model IDs. Users configured for non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no models in /model and autocomplete. Root causes fixed: 1. models.py: provider_model_ids() now calls discover_bedrock_models() keyed by the resolved region before falling back to the static table. A new bedrock_model_ids_or_none() helper in bedrock_adapter.py consolidates the discover -> extract IDs -> fallback pattern used by all three call sites. 2. providers.py: registers bedrock in HERMES_OVERLAYS with transport=bedrock_converse and auth_type=aws_sdk so get_provider("bedrock") and resolve_provider_full("bedrock") work. 3. model_switch.py: list_authenticated_providers() sections 2 and 3 detect AWS credentials via has_aws_credentials() for aws_sdk overlays and use live discovery for the model list. 4. bedrock_adapter.py: resolve_bedrock_region() reads the configured region from botocore.session before falling back to us-east-1, covering users who set their region in ~/.aws/config via a named profile rather than env vars. 5. tui_gateway/server.py: passes provider= to get_model_context_length() so context window lookups work correctly for the Bedrock provider.	2026-04-28 03:53:11 -07:00
Teknium	023f5c74b1	fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957 ) * fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path OAuth requests now identify as Hermes on the wire. Removed: - "You are Claude Code, Anthropic's official CLI for Claude." system prompt prepend - Hermes Agent → Claude Code / Nous Research → Anthropic system-prompt substitutions - mcp_ tool-name prefix on outgoing tool schemas + message history - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path removed from AnthropicTransport.normalize_response, + all 5 call sites in run_agent.py and auxiliary_client.py) - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on the Messages API client Added: - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth requests carrying it with HTTP 400 'This authentication style is incompatible with the long context beta header.' Kept (auth plumbing, not identity spoofing): - _is_oauth_token classifier and is_oauth flag threading - Bearer vs x-api-key auth routing - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend requires these on the OAuth-gated Messages endpoint - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth creds to third parties; this is the only way the login flow works - claude-cli/<v> User-Agent on the OAuth token exchange + refresh endpoints at platform.claude.com/v1/oauth/token — bare requests get Cloudflare 1010 blocked Verified live against api.anthropic.com with a fresh sk-ant-oat01-* token: - claude-haiku-4-5 simple message: HTTP 200, 'OK' response - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool named 'terminal' (no mcp_ prefix) round-tripped correctly - Outgoing wire: no user-agent, no x-app, real Hermes identity in system prompt, real tool name in schema Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer needed since the mcp_ round-trip is gone). 
* fix(anthropic): resolve_anthropic_token() reads credential pool first Close the gap where ~/.hermes/auth.json → credential_pool.anthropic (where hermes login + dashboard PKCE flow write OAuth tokens) was not in resolve_anthropic_token()'s source list. Before: users who authed via hermes login got the token written into the pool, but legacy fallback code paths (auxiliary_client, models catalog fetch, explicit-runtime path) that call resolve_anthropic_token() saw None and raised 'No Anthropic credentials found' — even though the token was sitting in auth.json. New priority 1: pool.select() with env-sourced entries skipped. Skipping env:* entries preserves the existing env-var priority logic further down the chain (static env OAuth → refreshable Claude Code upgrade via _prefer_refreshable_claude_code_token). Surfaced while writing the hermes-agent-dev skill playbook for 'finding a live OAuth token for an E2E test'. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 03:51:17 -07:00
Teknium	2b728e1274	fix(agent): drop thinking-only assistant turns before provider call (#16959) Adds a pre-call sanitizer that detects assistant messages containing only reasoning (reasoning / reasoning_content, no visible content, no tool_calls) and drops them from the API copy. Adjacent user messages left behind are merged so role alternation is preserved for the provider. Mirrors Claude Code's approach in src/utils/messages.ts (filterOrphanedThinkingOnlyMessages + mergeAdjacentUserMessages). We drop the whole turn rather than fabricate stub text (the '.' / '(continued)' pattern from contributor PRs #11098, #13010, #16842 that were rejected because they put words in the model's mouth). The stored conversation history (self.messages) is never mutated; only the per-call api_messages copy is changed. Users still see the reasoning block in the CLI/gateway transcript; only the wire copy is cleaned. Session persistence keeps the full trace. Two call sites covered: - Main agent loop, after _sanitize_api_messages (catches every turn). - Iteration-limit-summary fallback path. Tests: tests/run_agent/test_thinking_only_sanitizer.py, 25 cases covering detection (string/list content, whitespace-only, tool_calls, reasoning_details list form), drop behavior, adjacent-user merge (string+string, list+list, mixed), non-mutation of input dicts, and system-message handling. E2E live-tested against 5 providers with a poisoned history (empty assistant message + reasoning_content): OpenRouter→Anthropic/OpenAI/DeepSeek-R1/Qwen, native Gemini. All 5 accepted the cleaned request. Happy-path regression (5/5) confirms the sanitizer is a no-op when no thinking-only turn exists. Related: #16823 (wontfix; stub-text approach rejected). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 03:50:51 -07:00
teknium1	5316ce95de	chore(release): map simonweng@tencent.com -> Contentment003111 AUTHOR_MAP entry for the tencent-tokenhub provider PR #16860 contributor.	2026-04-28 03:45:52 -07:00
simonweng	a6a6cf047d	feat(providers): add tencent-tokenhub provider support Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a new API-key provider with model tencent/hy3-preview (256K context). - PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars - Aliases: tencent, tokenhub, tencent-cloud, tencentmaas - openai_chat transport with is_tokenhub branch for top-level reasoning_effort (Hy3 is a reasoning model) - tencent/hy3-preview:free added to OpenRouter curated list - 60+ tests (provider registry, aliases, runtime resolution, credentials, model catalog, URL mapping, context length) - Docs: integrations/providers.md, environment-variables.md, model-catalog.json Author: simonweng <simonweng@tencent.com> Salvaged from PR #16860 onto current main (resolved conflicts with #16935 Azure Anthropic env-var hint tests and the --provider choices= list removal in chat_parser).	2026-04-28 03:45:52 -07:00
Teknium	bd10acd747	fix(providers): honor key_env/api_key_env on Azure Anthropic + accept alias in normalizer (#16935) Three related fixes around custom env-var-name hints for provider entries. 1. Azure Anthropic path: previously hardcoded to look up AZURE_ANTHROPIC_KEY then ANTHROPIC_API_KEY with no way to override. If a user wrote model: {provider: anthropic, base_url: https://my-resource.services.ai.azure.com/anthropic, key_env: MY_CUSTOM_KEY}, the key_env hint was silently ignored and the resolver raised 'No Azure Anthropic API key found' even when MY_CUSTOM_KEY was set in the environment. The runtime now checks, in order: (1) os.getenv(model_cfg.key_env); (2) os.getenv(model_cfg.api_key_env), the docs alias; (3) model_cfg.api_key, the inline value; (4) AZURE_ANTHROPIC_KEY, a historical default; (5) ANTHROPIC_API_KEY, a historical default. Error message updated to mention key_env as an option. 2. Provider entry normalizer (_normalize_custom_provider_entry): accept 'api_key_env' as a snake_case alias for 'key_env', and 'apiKeyEnv' as a camelCase alias. Adds both to the _KNOWN_KEYS set so the 'unknown config keys ignored' warning doesn't fire on valid configs. 3. _VALID_CUSTOM_PROVIDER_FIELDS: add 'key_env'. That set documents supported custom_providers entry fields; it was drifting from reality since key_env has been read at runtime in auxiliary_client.py, runtime_provider.py, and main.py for a while. Docs: website/docs/guides/azure-foundry.md now uses the canonical key_env field and notes that api_key_env / keyEnv / apiKeyEnv are accepted as aliases. Validation: 12 new tests in test_runtime_provider_resolution.py covering all 5 Azure Anthropic resolution paths + 4 normalizer-alias tests. Pass rate across related suites (165 + 46 tests): 100%. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 02:12:08 -07:00
teknium1	4148e85b3a	docs(web): document web_search limit parameter and query operators	2026-04-28 02:09:30 -07:00
墨綠BG	4462b349b2	✨ feat(web): expose search result limit	2026-04-28 02:09:30 -07:00
Teknium	4e5ebf07ea	fix(matrix): stop tagging the user on every reply (#16932) The mention_user_id injection from #38a6bada9 unconditionally attached an @user:server mention pill + MSC3952 m.mentions.user_ids payload to every outbound reply and every tool-progress status update. The stated intent was push notifications in muted rooms, but it shipped as always-on in every room, DM or group, muted or not, so every reply pinged the user. - gateway/platforms/base.py: stop injecting mention_user_id into send metadata on every reply; restore the original _thread_metadata passthrough. - gateway/run.py: drop mention_user_id from status-thread metadata. - gateway/platforms/matrix.py: drop the mention-pill append block in _send_text that consumed the metadata. Keep the reaction-based exec approval half of #38a6bada9 and the inbound/outbound m.mentions handling (unrelated to the per-reply ping). Reported by Elkim [NOUS] on Discord. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 02:00:37 -07:00
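
The pre-call sanitizer described in commit 2b728e1274 above (drop assistant turns that carry only reasoning, then merge the user messages left adjacent by the drop, without mutating the stored history) might look roughly like the sketch below. This is not the repository's code: the function names, the `"\n\n"` join, and the rule that any non-text content part counts as visible are assumptions made for illustration.

```python
from typing import Any

Message = dict[str, Any]


def _has_visible_content(content: Any) -> bool:
    """Return True if content holds non-whitespace text or a non-text part."""
    if isinstance(content, str):
        return bool(content.strip())
    if isinstance(content, list):
        for part in content:
            if isinstance(part, str) and part.strip():
                return True
            if isinstance(part, dict):
                # Assumption: anything that is not a text part (e.g. an image) is visible.
                if part.get("type", "text") != "text":
                    return True
                if str(part.get("text", "")).strip():
                    return True
    return False  # None, empty string, or only whitespace text parts


def is_thinking_only(msg: Message) -> bool:
    """Assistant turn with reasoning but no visible content and no tool calls."""
    if msg.get("role") != "assistant" or msg.get("tool_calls"):
        return False
    if _has_visible_content(msg.get("content")):
        return False
    return bool(
        msg.get("reasoning")
        or msg.get("reasoning_content")
        or msg.get("reasoning_details")
    )


def _as_parts(content: Any) -> list:
    """Normalize string or list content to a new list of parts."""
    if isinstance(content, list):
        return list(content)
    return [{"type": "text", "text": content or ""}]


def _merge_user(first: Message, second: Message) -> Message:
    """Build a new user message combining both contents; inputs are untouched."""
    merged = dict(first)
    a, b = first.get("content"), second.get("content")
    if isinstance(a, str) and isinstance(b, str):
        merged["content"] = a + "\n\n" + b  # hypothetical separator
    else:
        merged["content"] = _as_parts(a) + _as_parts(b)
    return merged


def drop_thinking_only_turns(messages: list[Message]) -> list[Message]:
    """Return a cleaned copy of messages for the API call."""
    cleaned: list[Message] = []
    just_dropped = False
    for msg in messages:
        if is_thinking_only(msg):
            just_dropped = True
            continue
        prev_is_user = bool(cleaned) and cleaned[-1].get("role") == "user"
        if just_dropped and prev_is_user and msg.get("role") == "user":
            # Merge only user messages made adjacent by a drop.
            cleaned[-1] = _merge_user(cleaned[-1], msg)
        else:
            cleaned.append(dict(msg))  # shallow copy keeps the input dicts intact
        just_dropped = False
    return cleaned
```

Because merging happens only directly after a drop, a history with no thinking-only turn comes back as an equal copy, which matches the no-op behavior the commit message reports for the happy path.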


6430 commits
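
The five-step key lookup order listed in commit bd10acd747 above for the Azure Anthropic path can be sketched as follows. This is a minimal illustration, not the project's resolver: `resolve_azure_anthropic_key` is a hypothetical name, the config is modeled as a plain mapping, the `env` parameter exists only to make the sketch testable, and the exact error wording after the first sentence is assumed.

```python
import os
from typing import Mapping, Optional


def resolve_azure_anthropic_key(
    model_cfg: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first API key found, following the order in the commit message."""
    env = os.environ if env is None else env

    # (1) key_env and (2) api_key_env name an environment variable to read.
    for hint in ("key_env", "api_key_env"):
        var_name = model_cfg.get(hint)
        if var_name and env.get(var_name):
            return env[var_name]

    # (3) Inline value in the config entry.
    if model_cfg.get("api_key"):
        return model_cfg["api_key"]

    # (4) and (5) Historical default environment variables.
    for var_name in ("AZURE_ANTHROPIC_KEY", "ANTHROPIC_API_KEY"):
        if env.get(var_name):
            return env[var_name]

    # Assumed wording; the commit only says the message now mentions key_env.
    raise RuntimeError(
        "No Azure Anthropic API key found; set key_env, api_key, "
        "AZURE_ANTHROPIC_KEY, or ANTHROPIC_API_KEY"
    )
```

For example, with `MY_CUSTOM_KEY` set in the environment and `key_env: MY_CUSTOM_KEY` in the config, the custom variable wins over `AZURE_ANTHROPIC_KEY` even when both are present.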