Introduce gateway_timeout_warning (default 900s) as a pre-timeout alert
layer. When inactivity reaches the warning threshold, a single
notification is sent to the user offering to wait or reset. If
inactivity continues to the gateway_timeout (default 1800s), the full
timeout fires as before.
This gives users a chance to intervene before work is lost on slow
API providers without disabling the safety timeout entirely.
Config: agent.gateway_timeout_warning in config.yaml, or
HERMES_AGENT_TIMEOUT_WARNING env var (0 = disable warning).
Step c in switch_model() blindly converted the first colon to a slash for
aggregator providers, even when the model name already contained a slash
(vendor/model format). This mangled variant tags like :free into /free,
causing 400 Bad Request from the API.
Fix: skip the colon→slash conversion when the model already has a slash,
since the colon is a variant tag, not a vendor separator. The module
docstring already documented this intent (line 17-18) but the
implementation didn't enforce it.
Reported via Discord. Related to PR #6088 (which identified the same bug
but placed the fix in model_normalize.py instead of model_switch.py where
the actual mangling occurs).
Local inference providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
for prefill on large contexts. The 180s stale stream detector was killing
these connections while the provider was still processing.
Uses the existing is_local_endpoint() (proper URL parsing with RFC-1918,
localhost, WSL detection) instead of ad-hoc substring matching. The stale
timeout is only disabled when the user hasn't explicitly set
HERMES_STREAM_STALE_TIMEOUT — explicit user config is always honored.
Fixes#5889
Documents the per-platform tool_progress_overrides config key added in
PR #6348. Shows example YAML with Signal set to 'off' while Telegram
stays on 'verbose'. Lists all valid platform keys.
The _call_llm() and direct OpenAI fallback paths in flush_memories() both
hardcoded timeout=30.0, ignoring the user-configurable value at
auxiliary.flush_memories.timeout in config.yaml.
Remove the explicit timeout from the auxiliary _call_llm() call so that
_get_task_timeout('flush_memories') reads from config. For the direct
OpenAI fallback, import and use _get_task_timeout() instead of the
hardcoded value.
Add two regression tests verifying both code paths respect the config.
Fixes#6154
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Fixes#4647 — Signal replies duplicated when gateway streaming is enabled.
Root cause: stream_consumer.py did not handle the case where send() returns
success=True but no message_id (Signal behavior). Every stream delta produced
a separate send() call (7+ messages instead of 2), plus the gateway sent
another full duplicate since already_sent was never set.
Changes:
- stream_consumer.py: Add elif branch for success-without-message_id — enters
fallback mode (sets already_sent, disables editing, sends only continuation)
- signal.py send(): Extract timestamp from signal-cli RPC result as message_id
so stream consumer follows normal edit→fallback path
- signal.py: Add public stop_typing() delegating to _stop_typing_indicator()
so base adapter's _keep_typing finally block can clean up typing tasks
- gateway/run.py: Per-platform tool_progress_overrides (#6164) — lets users
set e.g. signal: off while keeping telegram: all
- hermes_cli/config.py: Add tool_progress_overrides to DEFAULT_CONFIG
Refs: #4647, #6164
- Docker env tests: verify _build_init_env_args() instead of per-execute
Popen flags (env forwarding is now init-time only)
- Docker: preserve explicit forward_env bypass of blocklist from main
- Daytona tests: adapt to SDK-native timeout, _ThreadedProcessHandle,
base.py interrupt handling, HERMES_STDIN_ heredoc prefix
- Modal tests: fix _load_module to include _ThreadedProcessHandle stub,
check ensurepip in _resolve_modal_image instead of __init__
- SSH tests: mock time.sleep on base module instead of removed ssh import
- Add missing BaseEnvironment attributes to __new__()-based test fixtures
Add configurable reply-reference behavior for Discord, matching the
existing Telegram (TELEGRAM_REPLY_TO_MODE) and Mattermost
(MATTERMOST_REPLY_MODE) implementations.
Modes:
- 'off': never reply-reference the original message
- 'first': reply-reference on first chunk only (default, current behavior)
- 'all': reply-reference on every chunk
Set DISCORD_REPLY_TO_MODE=off in .env to disable reply-to messages.
Changes:
- gateway/config.py: parse DISCORD_REPLY_TO_MODE env var
- gateway/platforms/discord.py: read reply_to_mode from config, respect
it in send() — skip fetch_message entirely when 'off'
- hermes_cli/config.py: add to OPTIONAL_ENV_VARS for hermes setup
- 23 tests covering config, send behavior, env var override
- docs: discord.md env var table + environment-variables.md reference
Closes community request from Stuart on Discord.
Two linked fixes for MiniMax Anthropic-compatible fallback:
1. Normalize httpx.URL to str before calling .rstrip() in auth/provider
detection helpers. Some client objects expose base_url as httpx.URL,
not str — crashed with AttributeError in _requires_bearer_auth() and
_is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback()
to use the already-stringified fb_base_url instead of raw httpx.URL.
2. Strip Anthropic-proprietary thinking block signatures when targeting
third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry,
self-hosted proxies). These endpoints cannot validate Anthropic's
signatures and reject them with HTTP 400 'Invalid signature in
thinking block'. Now threads base_url through convert_messages_to_anthropic()
→ build_anthropic_kwargs() so signature management is endpoint-aware.
Based on PR #4945 by kshitijk4poor (rstrip fix).
Fixes#4944.
Users were setting model.provider to "main" after reading the auxiliary
provider docs, causing "Unknown provider" errors. The "main" alias is
only valid inside auxiliary:, compression:, and fallback_model: configs
where it means "use the same provider as my main agent chat."
Added warning admonitions and inline clarifications to:
- configuration.md: Auxiliary Models provider list and Provider Options table
- fallback-providers.md: Provider Options for Auxiliary Tasks table
Reported by community member cn on Discord.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
* fix(nix): export HERMES_HOME system-wide when addToSystemPackages is true
The `addToSystemPackages` option's documentation (and the `:::tip` block in
`website/docs/getting-started/nix-setup.md`) promises that enabling it both
puts the `hermes` CLI on PATH and sets `HERMES_HOME` system-wide so interactive
shells share state with the gateway service. The module only did the former,
so running `hermes` in a user shell silently created a separate `~/.hermes/`
directory instead of the managed `${stateDir}/.hermes`.
Implement the documented behavior by also setting
`environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes"` in the same
mkIf block, and update the option description to match.
Fixes#6044
* fix(nix): preserve group-readable permissions in managed mode
The NixOS module sets HERMES_HOME directories to 0750 and files to 0640
so interactive users in the hermes group can share state with the gateway
service. Two issues prevented this from working:
1. hermes_cli/config.py: _secure_dir() unconditionally chmod'd HERMES_HOME
to 0700 on every startup, overwriting the NixOS module's 0750. Similarly,
_secure_file() forced 0600 on config files. Both now skip in managed mode
(detected via .managed marker or HERMES_MANAGED env var).
2. nix/nixosModules.nix: the .env file was created with 0600 (owner-only),
while config.yaml was already 0640 (group-readable). Changed to 0640 for
consistency — users granted hermes group membership should be able to read
the managed .env.
Verified with a NixOS VM integration test: a normal user in the hermes group
can now run `hermes version` and `hermes config` against the managed
HERMES_HOME without PermissionError.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: zerone0x <zerone0x@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
- Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing
despite being referenced in code)
- Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS
(non-aggregator providers use their main model for aux tasks automatically)
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
Fixes#6259. Three bugs fixed:
1. Config key mismatch: _get_client() and _start_daemon() read
'llmApiKey' (camelCase) but save_config() stores 'llm_api_key'
(snake_case). The config value was never read — only the env var
fallback worked.
2. Missing base URL support: users on OpenRouter or custom endpoints
had no way to configure HINDSIGHT_API_LLM_BASE_URL through setup.
Added llm_base_url to config schema with empty default, passed
conditionally to HindsightEmbedded constructor.
3. Daemon config change detection: config_changed now also checks
HINDSIGHT_API_LLM_BASE_URL, and the daemon profile .env includes
the base URL when set.
Keeps HINDSIGHT_API_LLM_API_KEY (with double API) in the daemon
profile .env — this matches the upstream hindsight .env.example
convention.
Documents the proxy env var support added in PR #3591 (salvage of #3411
by @kufufu9). Covers HTTPS_PROXY/HTTP_PROXY/ALL_PROXY precedence,
configuration methods, and scope.
* fix(tools): skip camofox auto-cleanup when managed persistence is enabled
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
* fix(doctor): only check the active memory provider, not all providers unconditionally
hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that
always ran regardless of the user's memory.provider config setting. After
the swappable memory provider update (#4623), users with leftover Honcho
config but no active provider saw false 'broken' errors.
Replaced both sections with a single Memory Provider section that reads
memory.provider from config.yaml and only checks the configured provider.
Users with no external provider see a green 'Built-in memory active' check.
Reported by community user michaelruiz001, confirmed by Eri (Honcho).
---------
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Problem: hermes -w sessions accumulated 37+ worktrees and 1200+ orphaned
branches because:
- _cleanup_worktree bailed on any dirty working tree, but agent sessions
almost always leave untracked files/artifacts behind
- _prune_stale_worktrees had the same dirty-check, so stale worktrees
survived indefinitely
- pr-* and hermes/* branches from PR review had zero cleanup mechanism
Changes:
- _cleanup_worktree: check for unpushed commits instead of dirty state.
Agent work lives in pushed commits/PRs — dirty working tree without
unpushed commits is just artifacts, safe to remove.
- _prune_stale_worktrees: three-tier age system:
- Under 24h: skip (session may be active)
- 24h-72h: remove if no unpushed commits
- Over 72h: force remove regardless
- New _prune_orphaned_branches: on each -w startup, deletes local
hermes/hermes-* and pr-* branches with no corresponding worktree.
Protects main, checked-out branch, and active worktree branches.
Tests: 42 pass (6 new covering unpushed-commit logic, force-prune
tier, and orphaned branch cleanup).
- Fire on_session_finalize and on_session_reset in gateway _handle_reset_command()
- Fire on_session_finalize during gateway stop() for each active agent
- Move CLI test from tests/ root to tests/cli/ (matches recent restructure)
- Add 5 gateway tests covering reset hooks, ordering, shutdown, and error handling
- Place on_session_reset after new session is guaranteed to exist (covers
the get_or_create_session fallback path)
Plugins can now subscribe to session boundary events via
ctx.register_hook('on_session_finalize', ...) and
ctx.register_hook('on_session_reset', ...).
on_session_finalize — fires during CLI exit (/quit, Ctrl-C) and
before /new or /reset, giving plugins a chance to flush or clean up.
on_session_reset — fires after a new session is created via
/new or /reset, so plugins can initialize per-session state.
Closes#5592
Hermes Agent identified and patched its own prompting blind spots through
automated self-evaluation — running 64+ tool-use benchmarks across GPT-5.4
and Codex-5.3, diagnosing 5 failure modes, writing targeted prompt patches,
and verifying the fix in a closed loop.
Failure modes discovered and fixed:
- Mental arithmetic (wrong answers: 39,152,053 vs correct 39,151,253)
- User profile hallucination ('Windows 11' when running on Linux)
- Time guessing without verification
- Clarification-seeking instead of acting ('open where?' for port checks)
- Hash computation from memory (SHA-256, encodings)
- Confusing system RAM with agent's own persistent memory store
Two new XML sections added to OPENAI_MODEL_EXECUTION_GUIDANCE:
- <mandatory_tool_use>: explicit categories that must always use tools
- <act_dont_ask>: default to action on obvious interpretations
Results:
gpt-5.4: 68.8% → 100% tool compliance (+31.2pp)
gpt-5.3-codex: 62.5% → 100% tool compliance (+37.5pp)
Regression: 0/8 conversational prompts over-tooled
Anthropic signs thinking blocks against the full turn content. Any
upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
'Invalid signature in thinking block' — especially in long-lived
gateway sessions.
Strategy (following clawdbot/OpenClaw pattern):
1. Strip thinking/redacted_thinking from all assistant messages EXCEPT
the last one — preserves reasoning continuity on the current
tool-use chain while avoiding stale signature errors on older turns.
2. Downgrade unsigned thinking blocks to plain text — Anthropic can't
validate them, but the reasoning content is preserved.
3. Strip cache_control from thinking/redacted_thinking blocks to
prevent cache markers from interfering with signature validation.
4. Drop thinking blocks from the second message when merging
consecutive assistant messages (role alternation enforcement).
5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking',
strip all reasoning_details from the conversation and retry once.
This is the safety net for edge cases the proactive stripping
misses.
Addresses the issue reported in PR #6086 by @mingginwan while
preserving reasoning continuity (their PR stripped ALL thinking
blocks unconditionally).
Files changed:
- agent/anthropic_adapter.py: thinking block management in
convert_messages_to_anthropic (strip old turns, downgrade unsigned,
strip cache_control, merge-time strip)
- run_agent.py: one-shot signature error recovery in retry loop
- tests/test_anthropic_adapter.py: 10 new tests covering all cases
Gateway and cron had inconsistent reasoning_effort resolution:
- CLI: config.yaml only (correct)
- Gateway: config.yaml first, env var fallback
- Cron: env var first, config.yaml fallback
All three now read exclusively from agent.reasoning_effort in config.yaml.
Removed HERMES_REASONING_EFFORT env var support entirely — .env is for
secrets only, not behavioral config.
Previously crash recovery recreated detached sessions as if they were
fully managed, so polls and kills could lie about liveness and the
checkpoint could forget recovered jobs after the next restart.
This commit refreshes recovered host-backed sessions from real PID
state, keeps checkpoint data durable, and preserves notify watcher
metadata while treating sandbox-only PIDs as non-recoverable.
- Persist `pid_scope` in `tools/process_registry.py` and skip
recovering sandbox-backed entries without a host-visible PID handle
- Refresh detached sessions on access so `get`/`poll`/`wait` and active
session queries observe exited processes instead of hanging forever
- Allow recovered host PIDs to be terminated honestly and requeue
`notify_on_complete` watchers during checkpoint recovery
- Add regression tests for durable checkpoints, detached exit/kill
behavior, sandbox skip logic, and recovered notify watchers
Import hermes_constants.get_hermes_home at module level so it is
available in _start_daemon() when local mode starts the embedded
daemon. Previously the import was only inside _load_config(), causing
NameError when _start_daemon() referenced get_hermes_home().
Fixes#5993
Co-Authored-By: 史官 <historian@slock.team>
Add documentation for cocktailpeanut's hermes-mod community tool —
a web UI for creating and managing Hermes skins visually. Covers
installation (Pinokio, npx, manual), usage walkthrough, and feature
overview including ASCII art generation from images.
Ref: https://github.com/cocktailpeanut/hermes-mod
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
Add BudgetConfig dataclass to centralize and make overridable the
hardcoded constants (50K per-result, 200K per-turn, 2K preview) that
control when tool outputs get persisted to sandbox. Configurable at
the RL environment level via HermesAgentEnvConfig fields, threaded
through HermesAgentLoop to the storage layer.
Resolution: pinned (read_file=inf) > env config overrides > registry
per-tool > default. CLI override: --env.turn_budget_chars 80000