provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS
table containing only hardcoded us.* model IDs. Users configured for
non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no
models in /model and autocomplete.
Root causes fixed:
1. models.py: provider_model_ids() now calls discover_bedrock_models()
keyed by the resolved region before falling back to the static table.
A new bedrock_model_ids_or_none() helper in bedrock_adapter.py
consolidates the discover -> extract IDs -> fallback pattern used by
all three call sites.
2. providers.py: registers bedrock in HERMES_OVERLAYS with
transport=bedrock_converse and auth_type=aws_sdk so
get_provider("bedrock") and resolve_provider_full("bedrock") work.
3. model_switch.py: list_authenticated_providers() sections 2 and 3
detect AWS credentials via has_aws_credentials() for aws_sdk
overlays and use live discovery for the model list.
4. bedrock_adapter.py: resolve_bedrock_region() reads the configured
region from botocore.session before falling back to us-east-1,
covering users who set their region in ~/.aws/config via a named
profile rather than env vars.
5. tui_gateway/server.py: passes provider= to get_model_context_length()
so context window lookups work correctly for the Bedrock provider.
* fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path
OAuth requests now identify as Hermes on the wire. Removed:
- "You are Claude Code, Anthropic's official CLI for Claude." system
prompt prepend
- Hermes Agent → Claude Code / Nous Research → Anthropic
system-prompt substitutions
- mcp_ tool-name prefix on outgoing tool schemas + message history
- Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path
removed from AnthropicTransport.normalize_response, + all 5 call
sites in run_agent.py and auxiliary_client.py)
- user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on
the Messages API client
Added:
- OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth
requests carrying it with HTTP 400 'This authentication style is
incompatible with the long context beta header.'
Kept (auth plumbing, not identity spoofing):
- _is_oauth_token classifier and is_oauth flag threading
- Bearer vs x-api-key auth routing
- _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend
requires these on the OAuth-gated Messages endpoint
- _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth
creds to third parties; this is the only way the login flow works
- claude-cli/<v> User-Agent on the OAuth token exchange + refresh
endpoints at platform.claude.com/v1/oauth/token — bare requests get
Cloudflare 1010 blocked
Verified live against api.anthropic.com with a fresh sk-ant-oat01-*
token:
- claude-haiku-4-5 simple message: HTTP 200, 'OK' response
- claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool
named 'terminal' (no mcp_ prefix) round-tripped correctly
- Outgoing wire: no user-agent, no x-app, real Hermes identity in
system prompt, real tool name in schema
Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer
needed since the mcp_ round-trip is gone).
* fix(anthropic): resolve_anthropic_token() reads credential pool first
Close the gap where ~/.hermes/auth.json → credential_pool.anthropic
(where hermes login + dashboard PKCE flow write OAuth tokens) was not
in resolve_anthropic_token()'s source list.
Before: users who authed via hermes login got the token written into
the pool, but legacy fallback code paths (auxiliary_client, models
catalog fetch, explicit-runtime path) that call resolve_anthropic_token()
saw None and raised 'No Anthropic credentials found' — even though the
token was sitting in auth.json.
New priority 1: pool.select() with env-sourced entries skipped. Skipping
env:* entries preserves the existing env-var priority logic further
down the chain (static env OAuth → refreshable Claude Code upgrade via
_prefer_refreshable_claude_code_token).
Surfaced while writing the hermes-agent-dev skill playbook for
'finding a live OAuth token for an E2E test'.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Adds a pre-call sanitizer that detects assistant messages containing only
reasoning (reasoning / reasoning_content, no visible content, no
tool_calls) and drops them from the API copy. Adjacent user messages
left behind are merged so role alternation is preserved for the
provider.
Mirrors Claude Code's approach in src/utils/messages.ts
(filterOrphanedThinkingOnlyMessages + mergeAdjacentUserMessages). We
drop the whole turn rather than fabricate stub text (the '.' /
'(continued)' pattern from contributor PRs #11098, #13010, #16842 that
were rejected because they put words in the model's mouth).
The stored conversation history (self.messages) is never mutated — only
the per-call api_messages copy. Users still see the reasoning block in
the CLI/gateway transcript; only the wire copy is cleaned. Session
persistence keeps the full trace.
Two call sites covered:
- Main agent loop, after _sanitize_api_messages (catches every turn).
- Iteration-limit-summary fallback path.
Tests: tests/run_agent/test_thinking_only_sanitizer.py — 25 cases
covering detection (string/list content, whitespace-only, tool_calls,
reasoning_details list form), drop behavior, adjacent-user merge
(string+string, list+list, mixed), non-mutation of input dicts, and
system-message handling.
E2E live-tested against 5 providers with a poisoned history (empty
assistant message + reasoning_content): OpenRouter→Anthropic/OpenAI/
DeepSeek-R1/Qwen, native Gemini. All 5 accepted the cleaned request.
Happy-path regression (5/5) confirms the sanitizer is a noop when no
thinking-only turn exists.
Related: #16823 (wontfix — stub-text approach rejected).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a
new API-key provider with model tencent/hy3-preview (256K context).
- PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars
- Aliases: tencent, tokenhub, tencent-cloud, tencentmaas
- openai_chat transport with is_tokenhub branch for top-level
reasoning_effort (Hy3 is a reasoning model)
- tencent/hy3-preview:free added to OpenRouter curated list
- 60+ tests (provider registry, aliases, runtime resolution,
credentials, model catalog, URL mapping, context length)
- Docs: integrations/providers.md, environment-variables.md,
model-catalog.json
Author: simonweng <simonweng@tencent.com>
Salvaged from PR #16860 onto current main (resolved conflicts with
#16935 Azure Anthropic env-var hint tests and the --provider choices=
list removal in chat_parser).
Three related fixes around custom env-var-name hints for provider entries.
1. Azure Anthropic path: previously hardcoded to look up AZURE_ANTHROPIC_KEY
then ANTHROPIC_API_KEY with no way to override. If a user wrote
model:
provider: anthropic
base_url: https://my-resource.services.ai.azure.com/anthropic
key_env: MY_CUSTOM_KEY
the key_env hint was silently ignored and the resolver raised
'No Azure Anthropic API key found' even when MY_CUSTOM_KEY was set
in the environment. The runtime now checks, in order:
(1) os.getenv(model_cfg.key_env)
(2) os.getenv(model_cfg.api_key_env) # docs alias
(3) model_cfg.api_key # inline value
(4) AZURE_ANTHROPIC_KEY # historical default
(5) ANTHROPIC_API_KEY # historical default
Error message updated to mention key_env as an option.
2. Provider entry normalizer (_normalize_custom_provider_entry): accept
'api_key_env' as a snake_case alias for 'key_env', and 'apiKeyEnv' as a
camelCase alias. Adds both to the _KNOWN_KEYS set so the 'unknown
config keys ignored' warning doesn't fire on valid configs.
3. _VALID_CUSTOM_PROVIDER_FIELDS: add 'key_env'. That set documents
supported custom_providers entry fields; it was drifting from reality
since key_env has been read at runtime in auxiliary_client.py,
runtime_provider.py, and main.py for a while.
Docs: website/docs/guides/azure-foundry.md now uses the canonical key_env
field and notes that api_key_env / keyEnv / apiKeyEnv are accepted as
aliases.
Validation: 12 new tests in test_runtime_provider_resolution.py covering
all 5 Azure Anthropic resolution paths + 4 normalizer-alias tests. Pass
rate across related suites (165 + 46 tests): 100%.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The mention_user_id injection from #38a6bada9 unconditionally attached an
@user:server mention pill + MSC3952 m.mentions.user_ids payload to every
outbound reply and every tool-progress status update. The stated intent
was push notifications in muted rooms, but shipped as always-on in every
room, DM or group, muted or not — so every reply pinged the user.
- gateway/platforms/base.py: stop injecting mention_user_id into send
metadata on every reply; restore the original _thread_metadata passthrough.
- gateway/run.py: drop mention_user_id from status-thread metadata.
- gateway/platforms/matrix.py: drop the mention-pill append block in
_send_text that consumed the metadata. Keep the reaction-based exec
approval half of #38a6bada9 and the inbound/outbound m.mentions
handling (unrelated to the per-reply ping).
Reported by Elkim [NOUS] on Discord.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
* feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup
Adopts four design patterns from OpenClaw's reciprocal migrate-hermes
importer so both migration paths have the same safety posture.
- **Refuse-on-conflict apply.** 'hermes claw migrate' now refuses to
execute when the plan has any conflict items, unless --overwrite is
set. Previously the user could say 'yes, proceed' and end up with a
silent partial migration that skipped every conflicting item.
- **Engine-level secret redaction.** The report.json and summary.md
written to disk (and --json stdout) run through a redactor that
matches OpenClaw's key-name markers and value-shape patterns
(sk-*, ghp_*, xox*-, AIza*, Bearer *). Prevents accidental API key
leakage in bug reports and support channels.
- **Pre-migration tarball snapshot.** Apply creates one timestamped
restore-point archive of ~/.hermes/ at ~/.hermes/migration/pre-migration-backups/
before any mutation, excluding regenerable directories
(sessions, logs, cache). Opt out with --no-backup.
- **Blocked-by-earlier-conflict sequencing.** If a config.yaml write
hits conflict/error mid-apply, subsequent config-mutating options
are marked skipped with reason 'blocked by earlier apply conflict'
rather than attempting partial writes.
- **Structured warnings[] and next_steps[] on the report** — actionable
guidance surfaces in both JSON output and summary.md.
- **--json output mode** — emits the redacted report on stdout for CI.
Also flips --preset full to NOT auto-enable --migrate-secrets. Users
now have to opt in to secret import explicitly, mirroring OpenClaw's
two-phase posture.
Status/kind/action constants are defined (STATUS_MIGRATED etc) with
values that match the existing strings the script emits, so the
report schema is backward-compatible. ItemResult gains a 'sensitive'
bool field that redaction and consumers can key off.
Validation: 26 new unit tests + 1 updated test in tests/skills/
test_openclaw_migration_hardening.py and test_claw.py cover redaction
(key markers, value patterns, recursion, on-disk), warnings/next_steps,
blocked-by-earlier sequencing, --json mode, and the preset-flip.
Manual E2E against a fake $HERMES_HOME with real-shaped secrets
confirmed: (1) secrets never appear in stdout or on disk,
(2) _cmd_migrate refuses apply when plan has conflicts,
(3) --overwrite proceeds past the guard and the backup tarball is
created, (4) --no-backup skips the archive.
Related docs: website/docs/guides/migrate-from-openclaw.md and
website/docs/reference/cli-commands.md updated to reflect the
preset-flip and new --no-backup flag.
* refactor(claw-migrate): reuse hermes backup system for pre-migration snapshot
Drops the inline tarball in hermes_cli/claw.py in favor of
hermes_cli.backup.create_pre_migration_backup(), which shares an
implementation with create_pre_update_backup via a new
_write_full_zip_backup helper. Benefits:
- Consistent exclusion rules with hermes backup (_EXCLUDED_DIRS,
_EXCLUDED_SUFFIXES, _EXCLUDED_NAMES — single source of truth).
- SQLite safe-copy via _safe_copy_db (state.db restores cleanly).
- Zip format restorable with 'hermes import <archive>'.
- Lives under ~/.hermes/backups/pre-migration-*.zip alongside
pre-update-*.zip — one place for all snapshot archives.
- Auto-prune rotation with separate keep counters (pre-migration
keeps 5, pre-update keeps 5, they don't touch each other's files).
7 new tests in tests/hermes_cli/test_backup.py lock the contract:
directory location, shared exclusion rules, _validate_backup_zip
acceptance (i.e. restorable with 'hermes import'), non-recursive
into prior backups, rotation, missing-home handling, and the
invariant that pre-migration rotation never touches pre-update
backups.
Help text and docs updated — the restore hint now says
'hermes import <name>' instead of 'tar -xzf <archive> -C ~/'.
* chore(claw-migrate): use backup._format_size and drop duplicate output line
Minor polish using another existing primitive from hermes_cli.backup:
- Show backup archive size with _format_size (e.g. '(245 B)' or '(2.4 MB)')
matching the format hermes backup already uses.
- Drop the duplicate 'Pre-migration backup saved' line after Migration
Results — the earlier 'Pre-migration backup: <path> (<size>)' line
already surfaces the path before apply runs.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Follow-up to the static list refresh: replace the hardcoded xAI entries
with _xai_curated_models(), mirroring the _codex_curated_models()
pattern from PR #7844. The helper reads $HERMES_HOME/models_dev_cache.json
at import time (no network call) and falls back to a small static list
when the cache is missing or malformed.
Why: _PROVIDER_MODELS["xai"] has drifted once already (issue #16699) and
will drift again next time xAI renames a model. Hermes already maintains
the models.dev cache and uses it for context-length lookups; pointing
_PROVIDER_MODELS at the same source means the /model picker self-heals on
the next cache refresh instead of requiring a PR.
Behavior:
- With cache populated (normal user): shows every current xAI model ID,
picks up renames automatically on next refresh.
- Without cache (fresh install, offline): falls back to a static snapshot
of the 9 current flagship IDs.
- Malformed cache / unexpected shape: same static fallback, no crash.
Import time verified <20ms — disk read only, no HTTP.
Addresses the structural piece of #16699 ("consider a single
_provider_models(provider) resolver") for xAI. Other per-provider lists
can adopt the same pattern as drift is observed.
_PROVIDER_MODELS["xai"] was pointing at model IDs the xAI direct API
no longer accepts:
- grok-4.20-reasoning
- grok-4-1-fast-reasoning
Replaced with the actual current xAI catalog IDs from models.dev
($HERMES_HOME/models_dev_cache.json, mirror of https://models.dev/api.json):
grok-4.20-0309-reasoning
grok-4.20-0309-non-reasoning
grok-4.20-multi-agent-0309
grok-4-1-fast
grok-4-1-fast-non-reasoning
grok-4-fast
grok-4-fast-non-reasoning
grok-4
grok-code-fast-1
The xAI-direct API (https://api.x.ai/v1) serves the dated IDs shown
above; the bare aliases (grok-4.20, grok-4.1-fast, etc.) are
OpenRouter/Vercel-gateway normalizations and are not accepted on
xAI-direct. Those gateways remain unaffected.
Fixes#16699
Follow-up to PR #16819 applying the same treatment to the two sibling
fallback sites in resolve_provider_client() that carry the identical bug
class as the anonymous-custom branch:
- Named custom provider (providers: / custom_providers: config entries):
apply _to_openai_base_url() on the OpenAI-wire path (chat_completions /
codex_responses), leave custom_base untouched on the anthropic_messages
path where the /anthropic surface is intentional. Prefer
main_runtime.get('model') over _read_main_model() so the entry model
still wins first. The ImportError fallback for anthropic_messages now
redoes query-param extraction against the rewritten URL so the final
OpenAI client hits /v1.
- external_process branch (copilot-acp): same main_runtime.get('model')
fallback before _read_main_model() so auxiliary tasks on this provider
track live /model switches instead of stale config.yaml.
Keeps the fix consistent across all three custom-endpoint fallback sites
in resolve_provider_client().
Three related issues prevented user-defined providers in `providers:` and
`model_aliases:` from being reachable through standard CLI flags. Requests
silently routed to the configured `model.base_url` instead of the user-
intended endpoint.
* hermes_cli/model_switch.py — root cause of the silent misrouting:
`_ensure_direct_aliases()` rebound `DIRECT_ALIASES` to a freshly-loaded
dict, leaving every `from hermes_cli.model_switch import DIRECT_ALIASES`
caller stuck on the stale empty original. Switched to `.update()` so
module attribute references stay valid.
* hermes_cli/main.py — chat subcommand `--provider` had `choices=[...]`
hardcoded to built-in providers, rejecting valid keys from user
`providers:` config. Dropped the choices list; runtime resolution
validates correctly downstream.
* hermes_cli/oneshot.py — `-m <alias>` only resolved the model name; the
alias's base_url was never propagated. Now consults `DIRECT_ALIASES`
before falling through to `detect_provider_for_model`, and threads the
alias's base_url to `resolve_runtime_provider(explicit_base_url=...)`.
* hermes_cli/runtime_provider.py — `_resolve_named_custom_runtime` now
honors `(provider="custom", explicit_base_url=...)` so a base_url
propagated from a direct-alias resolution actually builds a runtime
instead of falling through to provider-registry handlers that don't
know about ad-hoc local endpoints.
Verified: `hermes chat --provider <user-key> -m <model> -q "..."` and
`hermes -m <user-alias> -z "..."` both route to the user-intended
endpoint, observable via the target server's request log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.
Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.
Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.
## cua_backend.py
**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.
**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.
**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.
**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).
**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.
**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.
**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.
## schema.py
Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.
## tool.py
Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.
## agent/display.py + run_agent.py
Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Opt-in Langfuse tracing for Hermes conversations — LLM calls, tool
usage, usage/cost breakdown per span. Hooks into pre/post_api_request,
pre/post_llm_call, pre/post_tool_call. SDK is optional; missing SDK or
credentials renders the plugin inert.
Salvaged from PR #16845 by @kshitijk4poor, who wrote the plugin
(~875 LOC, 6 hooks, Langfuse usage-details/cost-details normalization,
read_file payload summarization).
Salvage scope (why this isn't PR #16845 as-authored):
- Lives at plugins/observability/langfuse/ (standalone kind, opt-in via
plugins.enabled) instead of a new parallel optional-plugins/
directory. Standalone bundled plugins are already opt-in — only their
plugin.yaml is scanned at startup; the Python module is not imported
unless the user enables it. The premise of optional-plugins/ (avoid
import cost for users who don't want it) is already solved by the
existing plugin system.
- Dropped the triple activation gate (plugins.enabled +
plugins.langfuse.enabled + HERMES_LANGFUSE_ENABLED). The Hermes plugin
system's own enable/disable is authoritative; runtime credentials
gate whether the hook actually traces.
- Rewrote _is_enabled() → cached _get_langfuse() with an _INIT_FAILED
sentinel. The original called hermes_cli.config.load_config() from
every hook invocation (full yaml parse + deep merge + env expansion
on every pre/post_tool_call, potentially 100+ times per turn). The
cached version reads env once and returns the cached client or None
on every subsequent call with zero further work.
- hermes tools → Langfuse Observability post-setup adds
observability/langfuse to plugins.enabled directly (via
_save_enabled_set) instead of going through an install-copy flow.
Enable:
hermes tools # interactive
hermes plugins enable observability/langfuse # manual
Required env (set by `hermes tools` or in ~/.hermes/.env):
HERMES_LANGFUSE_PUBLIC_KEY
HERMES_LANGFUSE_SECRET_KEY
HERMES_LANGFUSE_BASE_URL # optional
Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>
Narrow plaintext shortcut that rewrites a tiny set of admin phrases
("restart gateway", "restart the gateway", "restart hermes") into the
/restart slash command, but only in DMs. Scope is intentionally tight:
- DM text messages only — group chats keep natural-language semantics
- Exact restart-style phrases only
- Skips anything already starting with "/"
Without this, the LLM can receive "restart gateway" as a user turn and
try to satisfy it via the terminal tool (systemctl restart ...). That
kills the gateway while the originating agent is still running, which
leaves systemd in "draining" state waiting on a process it's about to
kill. Routing the phrase to the slash-command dispatcher bypasses the
agent loop and uses the existing restart machinery (request_restart).
Called once, at the adapter level in BasePlatformAdapter.handle_message,
so every platform gets it for free and pending-message reinjection is
covered by the same call site.
Adds 2 Telegram-parametrized e2e tests: DM routes to request_restart,
group chats fall through to the normal agent path.
Runtime already supports list-form fallback_model (run_agent.py:1459
iterates fallback_chain; fallback_cmd.py migrates legacy single-dict
configs to list format). The config validator and save_config comment
gate still assumed single-dict form and flagged list-form configs as
errors. Fix both:
- validate_config_structure: when fallback_model is a list, validate
each entry has provider+model; keep the existing single-dict path.
- save_config: suppress the "add fallback_model" comment when any list
entry is well-formed.
Adds 4 list-form validator tests.
PR #16858's session-scoped interactive sudo password cache falls back to
a thread-identity scope when no HERMES_SESSION_KEY is bound. ACP never
set that contextvar, so two ACP sessions landing on the same reused
ThreadPoolExecutor thread still shared the cache — the exact scenario
the PR headlined.
acp_adapter/server.py now:
- binds HERMES_SESSION_KEY=<session_id> via gateway.session_context
inside _run_agent() (and clears on exit)
- wraps the loop.run_in_executor(_executor, _run_agent) call in a fresh
contextvars.copy_context() so concurrent ACP sessions don't stomp on
each other's ContextVar writes (executor pool threads would otherwise
share a context).
Adds tests/acp/test_approval_isolation.py::
test_sudo_password_cache_isolated_across_acp_sessions_on_same_pool_thread
which drives two back-to-back sessions through a 1-worker ThreadPoolExecutor
and asserts B does not observe A's cached password.
Follow-up on top of the cherry-picked contributor commit for #16751:
1. Delete triggers: the original PR switched FTS5 from external to inline
content mode and concatenated content || tool_name || tool_calls in
the insert/update triggers, but left the delete triggers passing
old.content to the FTS5 delete-command. FTS5 inline delete requires
the content to match what was stored, so every DELETE on messages
raised 'SQL logic error'. Replaced with plain DELETE FROM ... WHERE
rowid = old.id on all four delete paths (normal + trigram, delete +
update-delete).
2. v11 migration: existing DBs have the old external-content FTS tables
and triggers. Because CREATE VIRTUAL TABLE IF NOT EXISTS / CREATE
TRIGGER IF NOT EXISTS skip when the objects already exist, upgraders
would have kept the broken behavior forever. Bumped SCHEMA_VERSION
to 11 and added a migration that drops both FTS tables + all 6 old
triggers, recreates them via FTS_SQL / FTS_TRIGRAM_SQL, and backfills
from messages using the same concatenation expression.
3. Regression tests: 6 new tests cover INSERT / UPDATE / DELETE paths
for tool_name + tool_calls indexing plus the full v10 -> v11 upgrade
path on a hand-built legacy DB.
The FTS5 virtual tables (messages_fts, messages_fts_trigram) previously
only indexed the content column via external content mode. Tool calls
and tool names stored in the tool_calls (JSON) and tool_name columns
were invisible to FTS5 search.
Root cause: FTS5 triggers only INSERTed new.content into the index.
Changes:
- Switch FTS5 tables from external content (content=messages) to inline
mode so that trigger-inserted content is both indexed and stored
- Update all 6 FTS5 triggers to concatenate content, tool_name, and
tool_calls when indexing new messages
- Extend the short-CJK LIKE fallback to also search tool_name and
tool_calls columns
Closes: #16751
FTS5 default tokenizer splits 'sp_new1' into tokens 'sp' and 'new1'.
Without quoting, a search for 'sp_new' becomes an AND query
('sp AND new') that fails to match rows indexed as 'sp_new1'.
Fix: add underscore to the character class in Step 5 regex
([.-] -> [._-]) so underscored terms are wrapped in double quotes.
Also adds test_sanitize_fts5_quotes_underscored_terms.
Both keys are documented in cli-config.yaml.example and read at runtime by
hermes_cli/timeouts.py (get_provider_request_timeout and get_provider_stale_timeout),
but the provider-entry validator in config.py flagged them as unknown, producing
noisy warnings on every CLI invocation for users who followed the documented config.
Fixes#16779
- App.tsx doc comment: replace stale ChatPageHost reference with
'persistent chat host block rendered inline near the bottom of this
file' so readers can find the actual code.
- App.tsx persistent host: show a small spinner on /chat while plugin
manifests are loading instead of a blank content area. Direct
/chat deep-links used to paint empty for up to ~2s in the worst
case (plugin-registration safety timeout) because both the route
sink (null) and the persistent host (!pluginsLoading gate) render
nothing during that window. Non-chat routes stay empty as before.
- ChatPage.tsx: rename setter to match the 'raw' state — useState
now destructures as [mobilePanelOpenRaw, setMobilePanelOpenRaw],
and all four call sites (closeMobilePanel, matchMedia listener,
open-button onClick, plus destructure) updated accordingly. No
behavior change; matches the 'raw vs derived' convention the
original comment set up.
The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back. React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child -
so the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.
Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.
* Pull ChatPage out of Routes into a sibling always-mounted host that
toggles visibility via display:none keyed off the current route. A
tiny ChatRouteSink still claims /chat so the catch-all redirect
does not fire.
* xterm instance, WebSocket, PTY child, and TUI/agent state all
survive; returning to /chat shows the exact conversation the user
left.
* Respect plugin `/chat` overrides: if a plugin manifest declares
`tab.override: "/chat"`, the Routes tree already swaps the element
for <PluginPage /> — we additionally suppress the persistent host
so the two don't paint on top of each other. Preserves the
pre-persistence contract that a plugin owning /chat replaces the
built-in chat UI entirely.
* Wait for usePlugins() to finish loading before mounting the
persistent host. Manifests arrive asynchronously from
/api/dashboard/plugins, so without the `!pluginsLoading` gate the
host would mount with manifests=[], spawn a PTY, and then unmount
mid-session when the manifest list resolves and reveals a /chat
override. Typical delay is <50ms; worst case is the 2s plugin-
registration safety timeout. Cheaper than killing someone's
conversation underneath them.
* Gate page-header slot (`setEnd`), the mobile sheet's portalled
render, and body-scroll lock on a new `isActive` prop so the hidden
ChatPage doesn't fight the active page for shared state. The
scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
`isActive && mobilePanelOpenRaw`) rather than the raw state — that
way tab-switch flips the dep false, fires the cleanup, and releases
`document.body.style.overflow`. Keying on the raw state would leave
body.overflow="hidden" stuck on /sessions and every other tab until
the user navigated back to /chat and explicitly closed the sheet.
* When isActive flips false to true, force a double-rAF fit:
display:none collapses the host box and ResizeObserver does not fire
on display changes, so xterm would otherwise stay at a stale or 1x1
grid. Also early-return from syncTerminalMetrics when the host has
zero area, since fit() on a zero-sized element produces a 1x1
terminal.
* Focus handling on tab return: only steal focus into the terminal if
focus wasn't already parked somewhere inside ChatPage (e.g. the
sidebar model picker, a tool-call entry). Yanking focus away from
whatever the user last clicked is surprising and a screen-reader
foot-gun; the typical "first activation" case still focuses the
terminal because document.activeElement is <body> at that point.
Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime. The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap —
no paint work, but not free). Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.
Lint clean on touched files. tsc -b && vite build pass.
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():
- Document the new config key under a "Global Toolset Disable" section
in website/docs/user-guide/configuration.md, including the precedence
note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
Previously, agent.disabled_toolsets in config.yaml only worked for CLI
mode (run_agent.py --disabled_toolsets). The gateway always passed
enabled_toolsets to AIAgent, and get_tool_definitions() ignored
disabled_toolsets when enabled_toolsets was set.
Fix: _get_platform_tools() now reads agent.disabled_toolsets from config
and excludes those toolsets from the returned set. This runs last so it
overrides everything above.
Added 3 tests covering cross-platform suppression, explicit platform
config override, and empty/missing config no-op behavior.
Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.
The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).
Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:
- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
_copy_reasoning_content_for_api's cross-provider leak guard (#15748),
promote-from-'reasoning' tier, and thinking-pad tier remain live at
replay time.
- No empty '' gets eagerly written on every assistant turn (which would
have bypassed the read-side ladder and triggered empty thinking-block
insertion in the Anthropic adapter).
Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.
Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.
Refs #16844, #16884, #15250, #15353, #15748.
Address Copilot review on #16868:
1. Tighten pool iteration. ``validate_copilot_token`` only rejects empty
strings and classic PATs (``ghp_*``); a malformed/unsupported ``gho_*``
token at ``credential_pool.copilot[0]`` would pass the gate and short-
circuit the loop, hiding a later valid entry. Switch to calling
``exchange_copilot_token`` directly: only entries that actually exchange
into a live Copilot API token are returned. Bad/expired entries fall
through to the next, and an exhausted pool returns ``""`` so the picker
falls back to the curated list (existing behaviour).
2. Reword the docstring + test module docstring to describe the pool seed
path accurately — ``hermes auth add copilot`` adds an api-key-typed
credential whose ``access_token`` field stores the pasted token, and
``_seed_from_env`` mirrors ``COPILOT_GITHUB_TOKEN`` from
``~/.hermes/.env`` into the pool. The previous wording implied
``auth add copilot`` itself ran the device-code flow, which it does
not (the device-code flow lives in ``hermes model``).
Two new tests cover the iteration change:
- ``test_skips_pool_entry_that_fails_to_exchange`` — pool[0] raises,
pool[1] succeeds, picker uses pool[1].
- ``test_all_pool_entries_fail_exchange_returns_empty`` — every entry
raises, return ``""``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users whose only Copilot credential is the OAuth `access_token` saved by
`hermes auth add copilot` (device-code flow) saw the `/model` picker drop
back to a stale hardcoded list. Reason: `_resolve_copilot_catalog_api_key`
only consulted env vars (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` /
`GITHUB_TOKEN`) and the `gh auth token` CLI fallback, never the credential
pool that Hermes's own login flow writes into `auth.json`. With no token,
the live catalog fetch silently 401s and the picker hides current models
(claude-opus-4.7, claude-sonnet-4.6, gpt-5.5, grok-code-fast-1) — even
though `/model <id>` works fine because runtime inference reads the pool
through a different code path.
Mirror the Codex catalog resolver pattern: env-var first (unchanged), then
walk `read_credential_pool("copilot")` for the first entry with a
supported `access_token` (`gho_*` / `github_pat_*` / `ghu_*`). Run it
through `get_copilot_api_token()` so the catalog request uses the same
exchanged token the runtime path uses. Classic PATs (`ghp_*`) are still
rejected up-front via `validate_copilot_token` since the Copilot API
doesn't accept them.
Strictly additive: env still wins, and a missing/locked auth.json (or any
exception during pool read) still returns "" so the caller falls through
to the curated catalog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).
The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread. A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.
Fix: remove the module-level call. Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:
- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools)
before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()
Closes#16856.
_handle_set_home_command wrote FEISHU_HOME_CHANNEL / DISCORD_HOME_CHANNEL /
etc. as top-level keys into config.yaml, but load_gateway_config() only
reads home channels from env vars. After every gateway restart the home
channel was lost — on every platform, not just Feishu.
Fix: switch /sethome to save_env_value(), which atomically writes to
~/.hermes/.env and updates the current process env in one shot. The
handler builds the env key from platform_name.upper(), so one line
change repairs /sethome for every platform that has a HOME_CHANNEL
env var.
Also widen _EXTRA_ENV_KEYS in hermes_cli/config.py so HOME_CHANNEL and
HOME_CHANNEL_NAME for every platform are treated as managed env vars:
SIGNAL, SLACK, SMS, DINGTALK, BLUEBUBBLES, FEISHU, WECOM, YUANBAO, plus
the missing *_NAME variants for DISCORD/TELEGRAM/MATTERMOST.
Closes#16806
Co-authored-by: teknium1 <screenmachine@gmail.com>
`/new` after `/model <custom-provider>:<model>` silently reverted to a
native provider whose static catalog happened to contain the same model
name (e.g. `deepseek-v4-pro` → native `deepseek` → 401).
Root cause at the `/model` writeback site: `HERMES_INFERENCE_PROVIDER`
was set unconditionally but `HERMES_TUI_PROVIDER` was only mirrored when
it was already set. On sessions launched without `--provider`,
`HERMES_TUI_PROVIDER` stayed unset, so `_resolve_startup_runtime()` on
`/new` skipped the explicit-provider early return and fell through to
`detect_static_provider_for_model()`.
Fix: set `HERMES_TUI_PROVIDER` unconditionally alongside
`HERMES_INFERENCE_PROVIDER` when `/model` lands. Keeps #15755's
invariant intact — `HERMES_TUI_PROVIDER` remains the canonical
"explicit this process" carrier, `HERMES_INFERENCE_PROVIDER` remains
ambient and does not short-circuit startup resolution.
Bug report and diagnosis: @Bartok9 in #16857 / #16873.
Fixes#16857
Replace the Linux/macOS pgrep regex ("hermes.*dashboard") with a ps
scan + the same explicit patterns list already used on the Windows
branch and in hermes_cli.gateway._scan_gateway_pids:
hermes dashboard
hermes_cli.main dashboard
hermes_cli/main.py dashboard
The old greedy regex would match any cmdline containing both words —
e.g. a chat session whose argv mentions "dashboard" or an unrelated
grafana/dashboard-server process. Added regression tests for both.
Follow-up tightening on #16881.
The dashboard is a long-lived server process users start and forget.
When hermes update replaces files on disk, the running process holds
the old Python backend in memory while the JS bundle gets updated,
producing a silent frontend/backend mismatch (e.g. v0.11.0 changed
the session token header -- old backends reject every API call).
Scan for running dashboard processes after a successful update (both
git and ZIP paths) and print a warning with their PIDs and restart
instructions. Mirrors the existing pattern for gateway processes.
Fixes#16872