Four fixes to auxiliary_client.py:
1. Respect explicit provider as hard constraint (#7559)
When auxiliary.{task}.provider is explicitly set (not 'auto'),
connection/payment errors no longer silently fallback to cloud
providers. Local-only users (Ollama, vLLM) will no longer get
unexpected OpenRouter billing from auxiliary tasks.
2. Eliminate model='default' sentinel (#7512)
_resolve_api_key_provider() no longer sends literal 'default' as
model name to APIs. Providers without a known aux model in
_API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
model_not_supported errors.
3. Add payment/connection fallback to async_call_llm (#7512)
async_call_llm now mirrors sync call_llm's fallback logic for
payment (402) and connection errors. Previously, async consumers
(session_search, web_tools, vision) got hard failures with no
recovery. Also fixes hardcoded 'openrouter' fallback to use the
full auto-detection chain.
4. Use accurate error reason in fallback logs (#7512)
_try_payment_fallback() now accepts a reason parameter and uses
it in log messages. Connection timeouts are no longer misleadingly
logged as 'payment error'.
Closes#7559Closes#7512
The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.
Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
(AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
clients in CodexAuxiliaryClient when api_mode=codex_responses or
when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
API-key provider return paths
- Update test mocks for the new 5-tuple return format
Users can now set:
auxiliary:
compression:
model: gpt-5.3-codex
base_url: https://api.openai.com/v1
api_mode: codex_responses
Closes#6800
_discover_bundled_skills() used the directory name to identify skills,
but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md
frontmatter. This mismatch caused 9 builtin skills whose directory name
differs from their SKILL.md name to be written to .bundled_manifest
under the wrong key, so `hermes skills list` showed them as "local"
instead of "builtin".
Read the frontmatter name field (with directory-name fallback) so the
manifest keys match what the rest of the codebase expects.
Closes#6835
Aligns MiniMax provider with official API documentation. Fixes 6 bugs:
transport mismatch (openai_chat -> anthropic_messages), credential leak
in switch_model(), prompt caching sent to non-Anthropic endpoints,
dot-to-hyphen model name corruption, trajectory compressor URL routing,
and stale doctor health check.
Also corrects context window (204,800), thinking support (manual mode),
max output (131,072), and model catalog (M2 family only on /anthropic).
Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Two fixes for the honcho memory plugin: (1) initOnSessionStart — opt-in eager session init in tools mode so sync_turn() works from turn 1 (default false, non-breaking). (2) peerName fix — gateway user_id no longer silently overwrites an explicitly configured peerName. 11 new tests. Contributed by @Kathie-yu.
_is_oauth_token() returned True for any key not starting with 'sk-ant-api',
which means MiniMax and Alibaba API keys were falsely treated as Anthropic
OAuth tokens. This triggered the Claude Code compatibility path:
- All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search)
- System prompt injected with 'You are Claude Code' identity
- 'Hermes Agent' replaced with 'Claude Code' throughout
Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by
their key format instead of using a broad catch-all:
- sk-ant-* (but not sk-ant-api-*) -> setup tokens, managed keys
- eyJ* -> JWTs from Anthropic OAuth flow
- Everything else -> False (MiniMax, Alibaba, etc.)
Reported by stefan171.
* fix: circuit breaker stops CPU-burning restart loops on persistent errors
When a gateway session hits a non-retryable error (e.g. invalid model
ID → HTTP 400), the agent fails and returns. But if the session keeps
receiving messages (or something periodically recreates agents), each
attempt spawns a new AIAgent — reinitializing MCP server connections,
burning CPU — only to hit the same 400 error again. On a 4-core server,
this pegs an entire core per stuck session and accumulates 300+ minutes
of CPU time over hours.
Fix: add a per-session consecutive failure counter in the gateway runner.
- Track consecutive non-retryable failures per session key
- After 3 consecutive failures (_MAX_CONSECUTIVE_FAILURES), block
further agent creation for that session and notify the user:
'⚠️ This session has failed N times in a row with a non-retryable
error. Use /reset to start a new session.'
- Evict the cached agent when the circuit breaker engages to prevent
stale state from accumulating
- Reset the counter on successful agent runs
- Clear the counter on /reset and /new so users can recover
- Uses getattr() pattern so bare GatewayRunner instances (common in
tests using object.__new__) don't crash
Tests:
- 8 new tests in test_circuit_breaker.py covering counter behavior,
threshold, reset, session isolation, and bare-runner safety
Addresses #7130.
* Revert "fix: circuit breaker stops CPU-burning restart loops on persistent errors"
This reverts commit d848ea7109.
* fix: don't evict cached agent on failed runs — prevents MCP restart loop
When a run fails (e.g. invalid model ID → 400) and fallback activated,
the gateway was evicting the cached agent to 'retry primary next time.'
But evicting a failed agent forces a full AIAgent recreation on the next
message — reinitializing MCP server connections, spawning stdio
processes — only to hit the same 400 again. This created a CPU-burning
loop (91%+ for hours, #7130).
The fix: add `and not _run_failed` to the fallback-eviction check.
Failed runs keep the cached agent. The next message reuses it (no MCP
reinit), hits the same error, returns it to the user quickly. The user
can /reset or /model to fix their config.
Successful fallback runs still evict as before so the next message
retries the primary model.
Addresses #7130.
- Remove unreachable `if not content_sample` branch inside the truthy
`if content_sample` block in `_is_likely_binary()` (dead code that
could never execute).
- Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)`
in `_check_lint()` so file paths containing curly braces (e.g.
`src/{test}.py`) no longer raise KeyError/ValueError.
- Add 16 unit tests covering both fixes and edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:
model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint
resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.
Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.
Add delegation.reasoning_effort config key so subagents can run at a
different thinking level than the parent agent. When set, overrides
the parent's reasoning_config; when empty, inherits as before.
Valid values: xhigh, high, medium, low, minimal, none (disables thinking).
Config path: delegation.reasoning_effort in config.yaml
Files changed:
- tools/delegate_tool.py: resolve override in _build_child_agent
- hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG
- tests/tools/test_delegate.py: 4 new tests covering all cases
Follow-up fixes for the matrix-nio → mautrix migration:
1. Module-level mautrix.types import now wrapped in try/except with
proper stub classes. Without this, importing gateway.platforms.matrix
crashes the entire gateway when mautrix isn't installed — even for
users who don't use Matrix. The stubs mirror mautrix's real attribute
names so tests that exercise adapter methods (send, reactions, etc.)
work without the real SDK.
2. Removed _ensure_mautrix_mock() from test_matrix_mention.py — it
permanently installed MagicMock modules in sys.modules via setdefault(),
polluting later tests in the suite. No longer needed since the module
imports cleanly without mautrix.
3. Fixed thread persistence tests to use direct class reference in
monkeypatch.setattr() instead of string-based paths, which broke
when the module was reimported by other tests.
4. Moved the module-importability test to a subprocess to prevent it
from polluting sys.modules (reimporting creates a second module object
with different __dict__, breaking patch.object in subsequent tests).
matrix-nio pulls in peewee -> atomicwrites (sdist-only, archived,
missing build-system metadata) which breaks nix flake builds.
mautrix-python publishes wheels, has a leaner dep tree, and its
[encryption] extra uses the same python-olm without the problematic
transitive chain.
- Add shared is_wsl() to hermes_constants (like is_termux)
- Update supports_systemd_services() to verify systemd is actually
running on WSL before returning True
- Add WSL-specific guidance in gateway install/start/setup/status
for both cases: WSL+systemd and WSL without systemd
- Improve help strings: 'run' now says recommended for WSL/Docker,
'start'/'install' now mention systemd/launchd explicitly
- Add WSL gateway FAQ section with tmux/nohup/Task Scheduler tips
- Update CLI commands docs with WSL tip
- Deduplicate _is_wsl() from clipboard.py to shared hermes_constants
- Fix clipboard tests to reset hermes_constants cache
- 20 new WSL-specific tests covering detection, systemd check,
supports_systemd_services integration, and command output
Motivated by user feedback: took 1 hour to figure out run vs start
on WSL, Telegram bot kept disconnecting due to flaky WSL systemd.
Cover the three key behaviors:
- bulk_upload_fn is called instead of per-file upload_fn
- Fallback to upload_fn when bulk_upload_fn is None
- Rollback on bulk upload failure retries all files
- Remove auto-activation: when context.engine is 'compressor' (default),
plugin-registered engines are NOT used. Users must explicitly set
context.engine to a plugin name to activate it.
- Add curses_radiolist() to curses_ui.py: single-select radio picker
with keyboard nav + text fallback, matching curses_checklist pattern.
- Rewrite cmd_toggle() as composite plugins UI:
Top section: general plugins with checkboxes (existing behavior)
Bottom section: provider plugin categories (Memory Provider, Context Engine)
with current selection shown inline. ENTER/SPACE on a category opens
a radiolist sub-screen for single-select configuration.
- Add provider discovery helpers: _discover_memory_providers(),
_discover_context_engines(), config read/save for memory.provider
and context.engine.
- Add tests: radiolist non-TTY fallback, provider config save/load,
discovery error handling, auto-activation removal verification.
- PluginContext.register_context_engine() lets plugins replace the
built-in ContextCompressor with a custom ContextEngine implementation
- PluginManager stores the registered engine; only one allowed
- run_agent.py checks for a plugin engine at init before falling back
to the default ContextCompressor
- reset_session_state() now calls engine.on_session_reset() instead of
poking internal attributes directly
- ContextCompressor.on_session_reset() handles its own internals
(_context_probed, _previous_summary, etc.)
- 19 new tests covering ABC contract, defaults, plugin slot registration,
rejection of duplicates/non-engines, and compressor reset behavior
- All 34 existing compressor tests pass unchanged
When models return empty responses (no content, no tool calls, no
reasoning), Hermes previously retried 3 times silently then fell through
to '(empty)' — without ever trying the fallback provider chain. Users on
GLM-4.5-Air and similar models experienced what appeared to be a
complete hang, especially in gateway (Telegram/Discord) contexts where
the silent retries produced zero feedback.
Changes:
- After exhausting 3 empty retries, attempt _try_activate_fallback()
before giving up with '(empty)'. If fallback succeeds, reset retry
counter and continue the conversation loop with the new provider.
- Replace all _vprint() calls in recovery paths with _emit_status(),
which surfaces messages through both CLI (_vprint with force=True)
and gateway (status_callback -> adapter.send). Users now see:
* '⚠️ Empty response from model — retrying (N/3)' during retries
* '⚠️ Model returning empty responses — switching to fallback...'
* '↻ Switched to fallback: <model> (<provider>)' on success
* '❌ Model returned no content after all retries [and fallback]'
- Add logger.warning() throughout empty response paths for log file
visibility (model name, provider, retry counts).
- Upgrade _last_content_with_tools fallback from logger.debug to
logger.info + _emit_status so recovery is visible.
- Upgrade thinking-only prefill continuation to use _emit_status.
Tests:
- test_empty_response_triggers_fallback_provider: verifies fallback
activation after 3 empty retries produces content from fallback model
- test_empty_response_fallback_also_empty_returns_empty: verifies
graceful degradation when fallback also returns empty
- test_empty_response_emits_status_for_gateway: verifies _emit_status
is called during retries so gateway users see feedback
Addresses #7180.
Tool progress markers (e.g. `⏰ list`) were injected directly into
SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI,
LobeChat, etc.) store delta.content verbatim as the assistant message
and send it back on subsequent requests. After enough turns, the model
learns to emit these markers as plain text instead of issuing real tool
calls — silently hallucinating tool results without ever running them.
Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE
event instead of mixing it into delta.content. Per the SSE spec, clients
that don't understand a custom event type silently ignore it, so this is
backward-compatible. Frontends that want to render progress indicators
can listen for the custom event without persisting it to conversation
history.
The /v1/runs endpoint already uses structured events — this aligns the
/v1/chat/completions streaming path with the same principle.
Closes#6972
* fix(nix): gate matrix extra to Linux in [all] profile
matrix-nio[e2e] depends on python-olm which is upstream-broken on modern
macOS (Clang 21+, archived libolm). Previously the [matrix] extra was
completely excluded from [all], meaning NixOS users (who install via [all])
had no Matrix support at all.
Add a sys_platform == 'linux' marker so [all] pulls in [matrix] on Linux
(where python-olm builds fine) while still skipping it on macOS. This
fixes the NixOS setup path without breaking macOS installs.
Update the regression test to verify the Linux-gated marker is present
rather than just checking matrix is absent from [all].
Fixes#4594
* chore: regenerate uv.lock with matrix-on-linux in [all]
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Add 9 tests covering the full zombie process prevention chain:
- TestZombieReproduction: demonstrates that processes survive when
references are dropped without explicit cleanup (the original bug)
- TestAgentCloseMethod: verifies close() calls all cleanup functions,
is idempotent, propagates to children, and continues cleanup even
when individual steps fail
- TestGatewayCleanupWiring: verifies stop() calls close() and that
_evict_cached_agent() does NOT call close() (since it's also used
for non-destructive cache refreshes)
- TestDelegationCleanup: calls the real _run_single_child function and
verifies close() is called on the child agent
Ref: #7131
Add is_network_accessible() helper using Python's ipaddress module to
robustly classify bind addresses (IPv4/IPv6 loopback, wildcards,
mapped addresses, hostname resolution with DNS-failure-fails-closed).
The API server connect() now refuses to start when the bind address is
network-accessible and no API_SERVER_KEY is set, preventing RCE from
other machines on the network.
Co-authored-by: entropidelic <entropidelic@users.noreply.github.com>
- Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update,
preventing silent truncation of files >2000 lines and corruption of lines
>2000 chars; add read_file_raw to FileOperations abstract interface and
ShellFileOperations
- Bug 2: split apply_v4a_operations into validate-then-apply phases; if any
hunk fails validation, zero writes occur (was: continue after failure,
leaving filesystem partially modified)
- Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops,
empty file paths, and moves missing a destination (was: always returned
error=None)
- Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold
from 0.10 to 0.50, eliminating false-positive matches in repetitive code
- Bug 5: add _strategy_unicode_normalized (new strategy 7) with position
mapping via _build_orig_to_norm_map; smart quotes and em-dashes in
LLM-generated patches now match via strategies 1-6 before falling through
to fuzzy strategies
- Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count,
error, strategy); update all 5 call sites across patch_parser.py,
file_operations.py, and skill_manager_tool.py
- Bug 7: guard in _apply_update returns error when addition-only context hint
is ambiguous (>1 occurrences); validation phase errors on both 0 and >1
- Bug 8: _apply_delete returns error (not silent success) on missing file
- Bug 9: _validate_operations checks source existence and destination absence
for MOVE operations before any write occurs
When enabled, @mentioning the bot in a DM creates a thread (default:
false). Supports both env var and YAML config (matrix.dm_mention_threads).
6 new tests, docs updated.
From #6957
When _build_api_kwargs() throws an exception, the except handler in
the retry loop referenced api_kwargs before it was assigned. This
caused an UnboundLocalError that masked the real error, making
debugging impossible for the user.
Two _dump_api_request_debug() calls in the except block (non-retryable
client error path and max-retries-exhausted path) both accessed
api_kwargs without checking if it was assigned.
Fix: initialize api_kwargs = None before the retry loop and guard both
dump calls. Now the real error surfaces instead of the masking
UnboundLocalError.
Reported by Discord user gruman0.
HERMES_OVERLAYS keys use models.dev IDs (e.g. 'github-copilot') but
_PROVIDER_MODELS curated lists and config.yaml use Hermes provider IDs
('copilot'). list_authenticated_providers() Section 2 was using the
overlay key directly for model lookups and is_current checks, causing:
- 0 models shown for copilot, kimi, kilo, opencode, vercel
- is_current never matching the config provider
Fix: build reverse mapping from PROVIDER_TO_MODELS_DEV to translate
overlay keys to Hermes slugs before curated list lookup and result
construction. Also adds 'kimi-for-coding' alias in auth.py so the
picker's returned slug resolves correctly in resolve_provider().
Fixes#5223. Based on work by HearthCore (#6492) and linxule (#6287).
Co-authored-by: HearthCore <HearthCore@users.noreply.github.com>
Co-authored-by: linxule <linxule@users.noreply.github.com>
Adds xAI as a first-class provider: ProviderConfig in auth.py,
HermesOverlay in providers.py, 11 curated Grok models, URL mapping
in model_metadata.py, aliases (x-ai, x.ai), and env var tests.
Uses standard OpenAI-compatible chat completions.
Closes#7050
`delegate_task` silently truncated batch tasks to 3 — the model sends
5 tasks, gets results for 3, never told 2 were dropped. Now returns a
clear tool_error explaining the limit and how to fix it.
The limit is configurable via:
- delegation.max_concurrent_children in config.yaml (priority 1)
- DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2)
- default: 3
Uses the same _load_config() path as the rest of delegate_task for
consistent config priority. Clamps to min 1, warns on non-integer
config values.
Also removes the hardcoded maxItems: 3 from the JSON schema — the
schema was blocking the model from even attempting >3 tasks before
the runtime check could fire. The runtime check gives a much more
actionable error message.
Backwards compatible: default remains 3, existing configs unchanged.
Isolate system tool configs (git, ssh, gh, npm) per profile by injecting
a per-profile HOME into subprocess environments only. The Python
process's own os.environ['HOME'] and Path.home() are never modified,
preserving all existing profile infrastructure.
Activation is directory-based: when {HERMES_HOME}/home/ exists on disk,
subprocesses see it as HOME. The directory is created automatically for:
- Docker: entrypoint.sh bootstraps it inside the persistent volume
- Named profiles: added to _PROFILE_DIRS in profiles.py
Injection points (all three subprocess env builders):
- tools/environments/local.py _make_run_env() — foreground terminal
- tools/environments/local.py _sanitize_subprocess_env() — background procs
- tools/code_execution_tool.py child_env — execute_code sandbox
Single source of truth: hermes_constants.get_subprocess_home()
Closes#4426
Broaden the UnicodeEncodeError recovery to handle systems with ASCII-only
locale (LANG=C, Chromebooks) where ANY non-ASCII character causes encoding
failure, not just lone surrogates.
Changes:
- Add _strip_non_ascii() and _sanitize_messages_non_ascii() helpers that
strip all non-ASCII characters from message content, name, and tool_calls
- Update the UnicodeEncodeError handler to detect ASCII codec errors and
fall back to non-ASCII sanitization after surrogate check fails
- Sanitize tool_calls arguments and name fields (not just content)
- Fix bare .encode() in cli.py suspend handler to use explicit utf-8
- Add comprehensive test suite (17 tests)