Commit graph

26 commits

Author SHA1 Message Date
Teknium
d9d937b7f7
fix: detect Claude Code version dynamically for OAuth user-agent
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

* fix(whatsapp): support LID format in self-chat mode (#1556)

WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.

- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
  messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting

Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).

* fix: detect Claude Code version dynamically for OAuth user-agent

The _CLAUDE_CODE_VERSION was hardcoded to '2.1.2' but Anthropic
rejects OAuth requests when the spoofed user-agent version is too
far behind the current Claude Code release. The error is a generic
400 with just 'Error' as the message, making it very hard to diagnose.

Fix: detect the installed version via 'claude --version' at import
time, falling back to a bumped static constant (2.1.74) when Claude
Code isn't installed. This means users who keep Claude Code updated
never hit stale-version rejections.

Reported by Jack — changing the version string to match the installed
claude binary fixed persistent OAuth 400 errors immediately.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
2026-03-17 02:48:33 -07:00
teknium1
19c8ad3d3d fix: add Claude Code user-agent to OAuth token exchange/refresh requests
Anthropic's token endpoint is behind Cloudflare which blocks Python's
default urllib user-agent (Python-urllib/3.x). Without a proper
user-agent, the token exchange returns 403 (Cloudflare error 1010).

Adds 'claude-cli/2.1.2 (external, cli)' user-agent to all three
OAuth HTTP requests:
- Initial token exchange (authorization_code grant)
- Hermes token refresh (refresh_token grant)
- Claude Code credential refresh (refresh_token grant)

Verified: full OAuth PKCE flow now works end-to-end.
2026-03-16 23:26:43 -07:00
teknium1
bd3b0c712b fix: make OAuth login URL prominent for SSH/headless users
The URL is now the primary element — displayed in a bordered box
before the browser auto-open attempt. Works for users who SSH into
remote servers where webbrowser.open() silently fails.
2026-03-16 23:21:30 -07:00
teknium1
b798062501 fix: improve OAuth login UX for headless/SSH users
Put the authorization URL front and center instead of treating it as
a fallback. Most Hermes users run on remote servers via SSH where
webbrowser.open() silently fails.
2026-03-16 23:17:29 -07:00
teknium1
63e88326a8 feat: Hermes-native PKCE OAuth flow for Claude Pro/Max subscriptions
Adds our own OAuth login and token refresh flow, independent of Claude
Code CLI. Mirrors the PKCE flow used by pi-ai (clawdbot) and OpenCode:

- run_hermes_oauth_login(): full PKCE authorization code flow
  - Opens browser to claude.ai/oauth/authorize
  - User pastes code#state back
  - Exchanges for access + refresh tokens
  - Stores in ~/.hermes/.anthropic_oauth.json (our own file)
  - Also writes to ~/.claude/.credentials.json for backward compat

- refresh_hermes_oauth_token(): automatic token refresh
  - POST to console.anthropic.com/v1/oauth/token with refresh_token
  - Updates both credential files on success

- Credential resolution priority updated:
  1. ANTHROPIC_TOKEN env var
  2. CLAUDE_CODE_OAUTH_TOKEN env var
  3. Hermes OAuth credentials (~/.hermes/.anthropic_oauth.json) ← NEW
  4. Claude Code credentials (~/.claude/.credentials.json)
  5. ANTHROPIC_API_KEY env var

Uses same CLIENT_ID, endpoints, scopes, and PKCE parameters as
Claude Code / OpenCode / pi-ai. Token refresh happens automatically
before each API call via _try_refresh_anthropic_client_credentials.
2026-03-16 23:15:56 -07:00
Teknium
2158c44efd
fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting (#1597)
Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)
2026-03-16 17:08:22 -07:00
teknium1
62abb453d3 Merge origin/main into hermes/hermes-daa73839 2026-03-14 23:44:47 -07:00
teknium1
735a6e7651 fix: convert anthropic image content blocks 2026-03-14 23:41:20 -07:00
teknium1
f4e8772de4 fix: require oauth creds for native Anthropic 2026-03-14 22:11:21 -07:00
teknium1
db9e512424 fix: fall back from managed Anthropic keys 2026-03-14 21:44:39 -07:00
teknium1
db362dbd4c feat: add native Anthropic auxiliary vision 2026-03-14 21:14:20 -07:00
teknium1
e052c74727 fix: refresh Anthropic OAuth before stale env tokens 2026-03-14 19:22:31 -07:00
Teknium
bfb82b5cee
fix: preserve Anthropic cache markers through adapter (#1205)
Keep assistant cache-control blocks intact when converting OpenAI-format
messages to Anthropic format, and propagate tool-message cache markers onto
generated tool_result blocks.

Adds regression tests covering assistant and tool cache marker preservation
through convert_messages_to_anthropic().
2026-03-13 13:27:03 -07:00
Teknium
c097e56142
Merge pull request #1149 from NousResearch/hermes/hermes-d28bf447
feat: Agentic On-Policy Distillation (OPD) environment
2026-03-13 03:09:43 -07:00
teknium1
ef3f3f9c08 fix: normalize dot-versioned model names for Anthropic API
anthropic/claude-opus-4.6 (OpenRouter format) was being sent as
claude-opus-4.6 to the Anthropic API, which expects claude-opus-4-6
(hyphens, not dots).

normalize_model_name() now converts dots to hyphens after stripping
the provider prefix, matching Anthropic's naming convention.

Fixes 404: 'model: claude-opus-4.6 was not found'
2026-03-13 03:08:14 -07:00
kshitijk4poor
bb3f5ed32a fix: separate Anthropic OAuth tokens from API keys
Persist OAuth/setup tokens in ANTHROPIC_TOKEN instead of ANTHROPIC_API_KEY.
Reserve ANTHROPIC_API_KEY for regular Console API keys.

Changes:
- anthropic_adapter: reorder resolve_anthropic_token() priority —
  ANTHROPIC_TOKEN first, ANTHROPIC_API_KEY as legacy fallback
- config: add save_anthropic_oauth_token() / save_anthropic_api_key() helpers
  that clear the opposing slot to prevent priority conflicts
- config: show_config() prefers ANTHROPIC_TOKEN for display
- setup: OAuth login and pasted setup-tokens write to ANTHROPIC_TOKEN
- setup: API key entry writes to ANTHROPIC_API_KEY and clears ANTHROPIC_TOKEN
- main: same fixes in _run_anthropic_oauth_flow() and _model_flow_anthropic()
- main: _has_any_provider_configured() checks ANTHROPIC_TOKEN
- doctor: use _is_oauth_token() for correct auth method validation
- runtime_provider: updated error message
- run_agent: simplified client init to use resolve_anthropic_token()
- run_agent: updated 401 troubleshooting messages
- status: prefer ANTHROPIC_TOKEN in status display
- tests: updated priority test, added persistence helper tests

Cherry-picked from PR #1141 by kshitijk4poor, rebased onto current main
with unrelated changes (web_policy config, blocklist CLI) removed.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-13 02:09:52 -07:00
Teknium
d24bcad90b
fix: Anthropic OAuth — beta header, token refresh, config contamination, reauthentication (#1132)
Fixes Anthropic OAuth/subscription authentication end-to-end:

Auth failures (401 errors):
- Add missing 'claude-code-20250219' beta header for OAuth tokens. Both
  clawdbot and OpenCode include this alongside 'oauth-2025-04-20' — without
  it, Anthropic's API rejects OAuth tokens with 401 authentication errors.
- Fix _fetch_anthropic_models() to use canonical beta headers from
  _COMMON_BETAS + _OAUTH_ONLY_BETAS instead of hardcoding.

Token refresh:
- Add _refresh_oauth_token() — when Claude Code credentials from
  ~/.claude/.credentials.json are expired but have a refresh token,
  automatically POST to console.anthropic.com/v1/oauth/token to get
  a new access token. Uses the same client_id as Claude Code / OpenCode.
- Add _write_claude_code_credentials() — writes refreshed tokens back
  to ~/.claude/.credentials.json, preserving other fields.
- resolve_anthropic_token() now auto-refreshes expired tokens before
  returning None.

Config contamination:
- Anthropic's _model_flow_anthropic() no longer saves base_url to config.
  Since resolve_runtime_provider() always hardcodes Anthropic's URL, the
  stale base_url was contaminating other providers when users switched
  without re-running 'hermes model' (e.g., Codex hitting api.anthropic.com).
- _update_config_for_provider() now pops base_url when passed empty string.
- Same fix in setup.py.

Flow/UX (hermes model command):
- CLAUDE_CODE_OAUTH_TOKEN env var now checked in credential detection
- Reauthentication option when existing credentials found
- run_oauth_setup_token() runs 'claude setup-token' as interactive
  subprocess, then auto-detects saved credentials
- Clean has_creds/needs_auth flow in both main.py and setup.py

Tests (14 new):
- Beta header assertions for claude-code-20250219
- Token refresh: successful refresh with credential writeback, failed
  refresh returns None, no refresh token returns None
- Credential writeback: new file creation, preserving existing fields
- Auto-refresh integration in resolve_anthropic_token()
- CLAUDE_CODE_OAUTH_TOKEN fallback, credential file auto-discovery
- run_oauth_setup_token() (5 scenarios)
2026-03-12 20:45:50 -07:00
teknium1
638136e353 fix(anthropic): skip thinking params for Haiku models
Haiku models don't support extended thinking at all. Without this
guard, claude-haiku-4-5-20251001 would receive type=enabled +
budget_tokens and return a 400 error.

Incorporates the fix from PR #1127 (by frizynn) on top of #1128's
adaptive thinking refactor.

Verified live with Claude Code OAuth:
  claude-opus-4-6       → adaptive thinking ✓
  claude-haiku-4-5      → no thinking params ✓
  claude-sonnet-4       → enabled thinking ✓
2026-03-12 19:34:55 -07:00
Ahmad Ragab
3dc148ab6f fix: use adaptive thinking without budget_tokens for Claude 4.6 models
For Claude 4.6 models (Opus and Sonnet), the Anthropic API rejects
budget_tokens when thinking.type is 'adaptive'. This was causing a
400 error: 'thinking.adaptive.budget_tokens: Extra inputs are not
permitted'.

Changes:
- Send thinking: {type: 'adaptive'} without budget_tokens for 4.6
- Move effort control to output_config: {effort: ...} per Anthropic docs
- Map Hermes effort levels to Anthropic effort levels (xhigh->max, etc.)
- Narrow adaptive detection to 4.6 models only (4.5 still uses manual)
- Add tests for adaptive thinking on 4.6 and manual thinking on pre-4.6

Fixes #1126
2026-03-13 03:21:13 +01:00
teknium1
aaaba78126 fix(anthropic): final polish — tool ID sanitization, crash guards, temp=1
Remaining issues from deep scan:

Adapter (agent/anthropic_adapter.py):
- Add _sanitize_tool_id() — Anthropic requires IDs matching [a-zA-Z0-9_-],
  now strips invalid chars and ensures non-empty (both tool_use and tool_result)
- Empty tool result content → '(no output)' placeholder (Anthropic rejects empty)
- Set temperature=1 when thinking type='enabled' on older models (required)
- normalize_model_name now case-insensitive for 'Anthropic/' prefix
- Fix stale docstrings referencing only ~/.claude/.credentials.json

Agent loop (run_agent.py):
- Guard memory flush path (line ~2684) — was calling self.client.chat.completions
  which is None in anthropic_messages mode. Now routes through Anthropic client.
- Guard summary generation path (line ~3171) — same crash when reaching
  iteration limit. Now builds proper Anthropic kwargs and normalizes response.
- Guard retry summary path (line ~3200) — same fix for the summary retry loop.

All three self.client.chat.completions.create() calls outside the main
loop now have anthropic_messages branches to prevent NoneType crashes.
2026-03-12 17:23:09 -07:00
teknium1
4068f20ce9 fix(anthropic): deep scan fixes — auth, retries, edge cases
Fixes from comprehensive code review and cross-referencing with
clawdbot/OpenCode implementations:

CRITICAL:
- Add one-shot guard (anthropic_auth_retry_attempted) to prevent
  infinite 401 retry loops when credentials keep changing
- Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT
  regular API keys (don't start with sk-ant-api). Inverted the logic:
  only sk-ant-api* is treated as API key auth, everything else uses
  Bearer auth + oauth beta headers

HIGH:
- Wrap json.loads(args) in try/except in message conversion — malformed
  tool_call arguments no longer crash the entire conversation
- Raise AuthError in runtime_provider when no Anthropic token found
  (was silently passing empty string, causing confusing API errors)
- Remove broken _try_anthropic() from auxiliary vision chain — the
  centralized router creates an OpenAI client for api_key providers
  which doesn't work with Anthropic's Messages API

MEDIUM:
- Handle empty assistant message content — Anthropic rejects empty
  content blocks, now inserts '(empty)' placeholder
- Fix setup.py existing_key logic — set to 'KEEP' sentinel instead
  of None to prevent falling through to the auth choice prompt
- Add debug logging to _fetch_anthropic_models on failure

Tests: 43 adapter tests (2 new for token detection), 3197 total passed
2026-03-12 17:14:22 -07:00
teknium1
cd4e995d54 fix(anthropic): live model fetching + adaptive thinking for 4.5+ models
- Add _fetch_anthropic_models() to hermes_cli/models.py — hits the
  Anthropic /v1/models endpoint to get the live model catalog. Handles
  both API key and OAuth token auth headers.

- Wire it into provider_model_ids() so both 'hermes model' and
  'hermes setup model' show the live list instead of a stale static one.

- Update static _PROVIDER_MODELS fallback with full current catalog:
  opus-4-6, sonnet-4-6, opus-4-5, sonnet-4-5, opus-4, sonnet-4, haiku-4-5

- Update model_metadata.py with context lengths for all current models.

- Fix thinking parameter for 4.5+ models: use type='adaptive' instead
  of type='enabled' (Anthropic deprecated 'enabled' for newer models,
  warns at runtime). Detects model version from the model name string.

Verified live:
  hermes model → Anthropic → auto-detected creds → shows 7 live models
  hermes chat --provider anthropic --model claude-opus-4-6 → works
2026-03-12 17:04:31 -07:00
teknium1
d51243b6d3 fix(anthropic): read credentials from ~/.claude.json (native binary v2.x)
The critical bug: read_claude_code_credentials() only looked at
~/.claude/.credentials.json, but Claude Code's native binary (v2.x,
Bun-compiled) stores credentials in ~/.claude.json at the top level
as 'primaryApiKey'. The .credentials.json file is only written by
older npm-based installs.

Now checks both locations in priority order:
  1. ~/.claude.json → primaryApiKey (native binary, v2.x)
  2. ~/.claude/.credentials.json → claudeAiOauth.accessToken (legacy)

Verified live: hermes model → Anthropic → auto-detected credentials →
claude-sonnet-4-20250514 → 'Hello there, how are you?' (5 words)
2026-03-12 16:43:31 -07:00
teknium1
7086fde37e fix(anthropic): revert inline vision, add hermes model flow, wire vision aux
Feedback fixes:

1. Revert _convert_vision_content — vision is handled by the vision_analyze
   tool, not by converting image blocks inline in conversation messages.
   Removed the function and its tests.

2. Add Anthropic to 'hermes model' (cmd_model in main.py):
   - Added to provider_labels dict
   - Added to providers selection list
   - Added _model_flow_anthropic() with Claude Code credential auto-detection,
     API key prompting, and model selection from catalog.

3. Wire up Anthropic as a vision-capable auxiliary provider:
   - Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4
     as the vision model (Claude natively supports multimodal)
   - Added to the get_vision_auxiliary_client() auto-detection chain
     (after OpenRouter/Nous, before Codex/custom)

Cache tracking note: the Anthropic cache metrics branch in run_agent.py
(cache_read_input_tokens / cache_creation_input_tokens) is in the correct
place — it's response-level parsing, same location as the existing
OpenRouter cache tracking. auxiliary_client.py has no cache tracking.
2026-03-12 16:09:04 -07:00
teknium1
d7adfe8f61 fix(anthropic): address gaps found in deep-dive audit
After studying clawdbot (OpenClaw) and OpenCode implementations:

## Beta headers
- Add interleaved-thinking-2025-05-14 and fine-grained-tool-streaming-2025-05-14
  as common betas (sent with ALL auth types, not just OAuth)
- OAuth tokens additionally get oauth-2025-04-20
- API keys now also get the common betas (previously got none)

## Vision/image support
- Add _convert_vision_content() to convert OpenAI multimodal format
  (image_url blocks) to Anthropic format (image blocks with base64/url source)
- Handles both data: URIs (base64) and regular URLs

## Role alternation enforcement
- Anthropic strictly rejects consecutive same-role messages (400 error)
- Add post-processing step that merges consecutive user/assistant messages
- Handles string, list, and mixed content types during merge

## Tool choice support
- Add tool_choice parameter to build_anthropic_kwargs()
- Maps OpenAI values: auto→auto, required→any, none→omit, name→tool

## Cache metrics tracking
- Anthropic uses cache_read_input_tokens / cache_creation_input_tokens
  (different from OpenRouter's prompt_tokens_details.cached_tokens)
- Add api_mode-aware branch in run_agent.py cache stats logging

## Credential refresh on 401
- On 401 error during anthropic_messages mode, re-read credentials
  via resolve_anthropic_token() (picks up refreshed Claude Code tokens)
- Rebuild client if new token differs from current one
- Follows same pattern as Codex/Nous 401 refresh handlers

## Tests
- 44 adapter tests (8 new: vision conversion, role alternation, tool choice)
- Updated beta header tests to verify new structure
- Full suite: 3198 passed, 0 regressions
2026-03-12 16:00:46 -07:00
teknium1
5e12442b4b feat: native Anthropic provider with Claude Code credential auto-discovery
Add Anthropic as a first-class inference provider, bypassing OpenRouter
for direct API access. Uses the native Anthropic SDK with a full format
adapter (same pattern as the codex_responses api_mode).

## Auth (three methods, priority order)
1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-*)
2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-*)
3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription)
   - Reads Claude Code's OAuth credentials
   - Checks token expiry with 60s buffer
   - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header
   - Regular API keys use standard x-api-key header

## Changes by file

### New files
- agent/anthropic_adapter.py — Client builder, message/tool/response
  format conversion, Claude Code credential reader, token resolver.
  Handles system prompt extraction, tool_use/tool_result blocks,
  thinking/reasoning, orphaned tool_use cleanup, cache_control.
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic

### Modified files
- pyproject.toml — Add anthropic>=0.39.0 dependency
- hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with
  three env vars, plus 'claude'/'claude-code' aliases
- hermes_cli/models.py — Add model catalog, labels, aliases, provider order
- hermes_cli/main.py — Add 'anthropic' to --provider CLI choices
- hermes_cli/runtime_provider.py — Add Anthropic branch returning
  api_mode='anthropic_messages' (before generic api_key fallthrough)
- hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code
  credential auto-discovery, model selection, OpenRouter tools prompt
- agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model
- agent/model_metadata.py — Add bare Claude model context lengths
- run_agent.py — Add anthropic_messages api_mode:
  * Client init (Anthropic SDK instead of OpenAI)
  * API call dispatch (_anthropic_client.messages.create)
  * Response validation (content blocks)
  * finish_reason mapping (stop_reason -> finish_reason)
  * Token usage (input_tokens/output_tokens)
  * Response normalization (normalize_anthropic_response)
  * Client interrupt/rebuild
  * Prompt caching auto-enabled for native Anthropic
- tests/test_run_agent.py — Update test_anthropic_base_url_accepted to
  expect native routing, add test_prompt_caching_native_anthropic
2026-03-12 15:47:45 -07:00