When installing a system service via sudo, ExecStart, WorkingDirectory,
VIRTUAL_ENV, and PATH entries were not remapped to the target user's
home — only HERMES_HOME was. This caused the service to fail with
status=200/CHDIR because the target user cannot access /root/.
Adds _remap_path_for_user() helper and applies it to all path variables
in the system branch of generate_systemd_unit().
Closes#6989
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
ValueError: Invalid model size 'whisper-1'
Changes:
- gateway/run.py, discord.py: stop passing model override — let
transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
nested section (stt.local.model, stt.openai.model, etc.); ignores
legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
provider section and removes the flat key
Reported by community user on Discord.
Follow-up to cherry-picked PR #6592:
- Extract _webhook_url property to deduplicate URL construction
- Add _find_registered_webhooks() helper for reuse
- Crash resilience: check for existing registration before POSTing
(handles restart after unclean shutdown without creating duplicates)
- Accept 200-299 status range (not just 200) for webhook creation
- Unregister removes ALL matching registrations (cleans up orphaned dupes)
- Add 17 tests covering register/unregister/find/edge cases
Add Discord thread support to cron delivery and send_message_tool.
- _parse_target_ref: handle discord platform with chat_id:thread_id format
- _send_discord: add thread_id param, route to /channels/{thread_id}/messages
- _send_to_platform: pass thread_id through for Discord
- Discord adapter send(): read thread_id from metadata for gateway path
- Update tool schema description to document Discord thread targets
Cherry-picked from PR #7046 by pandacooming (maxyangcn).
Follow-up fixes:
- Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was
accidentally deleted — would have caused NameError at runtime
- Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE
via _NUMERIC_TOPIC_RE alias (identical pattern)
- Fix misleading test comments about Discord negative snowflake IDs
(Discord uses positive snowflakes; negative IDs are a Telegram convention)
- Rewrite misleading scheduler test that claimed to exercise home channel
fallback but actually tested the explicit platform:chat_id parsing path
- Add custom_provider_slug() to hermes_cli/providers.py as the single
source of truth for building 'custom:<name>' slugs.
- Use it in resolve_custom_provider() and list_authenticated_providers()
instead of duplicated inline slug construction.
- Add _session_model_overrides and _voice_mode to gateway test runner
for object.__new__() safety.
Custom providers defined in config.yaml under were
completely invisible to the /model command in both gateway (Telegram,
Discord, etc.) and CLI. The provider listing skipped them and explicit
switching via --provider failed with "Unknown provider".
Root cause: gateway/run.py, cli.py, and model_switch.py only read the
dict from config, ignoring entirely.
Changes:
- providers.py: add resolve_custom_provider() and extend
resolve_provider_full() to check custom_providers after user_providers
- model_switch.py: propagate custom_providers through switch_model(),
list_authenticated_providers(), and get_authenticated_provider_slugs();
add custom provider section to provider listings
- gateway/run.py: read custom_providers from config, pass to all
model-switch calls
- cli.py: hoist config loading, pass custom_providers to listing and
switch calls
Tests: 4 new regression tests covering listing, resolution, and gateway
command handler. All 71 tests pass.
xAI /v1/models does not return context_length metadata, so Hermes
probes down to the 128k default whenever a user configures a custom
provider pointing at https://api.x.ai/v1. This forces every xAI user
to manually override model.context_length in config.yaml (2M for
Grok 4.20 / 4.1-fast / 4-fast) or lose most of the usable context
window.
Add DEFAULT_CONTEXT_LENGTHS entries for the Grok family so the
fallback lookup returns the correct value via substring matching.
Values sourced from models.dev (2026-04) and cross-checked against
the xAI /v1/models listing:
- grok-4.20-* 2,000,000 (reasoning, non-reasoning, multi-agent)
- grok-4-1-fast-* 2,000,000
- grok-4-fast-* 2,000,000
- grok-4 / grok-4-0709 256,000
- grok-code-fast-1 256,000
- grok-3* 131,072
- grok-2 / latest 131,072
- grok-2-vision* 8,192
- grok (catch-all) 131,072
Keys are ordered longest-first so that specific variants match before
the catch-all, consistent with the existing Claude/Gemma/MiniMax entries.
Add TestDefaultContextLengths.test_grok_models_context_lengths and
test_grok_substring_matching to pin the values and verify the full
lookup path. All 77 tests in test_model_metadata.py pass.
The session store was copying the ENTIRE parent DM transcript into new
thread sessions. This caused unrelated conversations to bleed across
threads in Slack DMs.
The Slack adapter already handles thread context correctly via
_fetch_thread_context() (conversations.replies API), which fetches
only the actual thread messages. The session-level seeding was both
redundant and harmful.
No other platform (Telegram, Discord) uses DM threads, so the seeding
code path was only triggered by Slack — where it conflicted with the
adapter-level context.
Tests updated to assert thread isolation: all thread sessions start
empty, platform adapters are responsible for injecting thread context.
Salvage of PR #5868 (jarvisxyz). Reported by norbert on Discord.
- Modal snapshot tests: accept **kw in iter_skills_files/iter_cache_files
mock lambdas to match new container_base kwarg
- SSH preflight test: mock _detect_remote_home, _ensure_remote_dirs,
init_session, and FileSyncManager added in file sync PR
Replace per-backend ad-hoc file sync with a shared FileSyncManager
that handles mtime-based change detection, remote deletion of
locally-removed files, and transactional state updates.
- New FileSyncManager class (tools/environments/file_sync.py)
with callbacks for upload/delete, rate limiting, and rollback
- Shared iter_sync_files() eliminates 3 duplicate implementations
- SSH: replace unconditional rsync with scp + mtime skip
- Modal/Daytona: replace inline _synced_files dict with manager
- All 3 backends now sync credentials + skills + cache uniformly
- Remote deletion: files removed locally are cleaned from remote
- HERMES_FORCE_FILE_SYNC=1 env var for debugging
- Base class _before_execute() simplified to empty hook
- 12 unit tests covering mtime skip, deletion, rollback, rate limiting
Change behavior from silent clamping to returning an error when the
model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT.
This forces the model to use background=true for long-running commands
rather than silently changing its intent.
- Config default timeouts above the cap are NOT rejected (user's choice)
- Only explicit model-requested timeouts trigger rejection
- Added boundary test for timeout exactly at the limit
When the model calls terminal() in foreground mode without background=true
(e.g. to start a server), the tool call blocks until the command exits or
the timeout expires. Without an upper bound the model can request arbitrarily
high timeouts (the schema had minimum=1 but no maximum), blocking the entire
agent session for hours until the gateway idle watchdog kills it.
Changes:
- Add FOREGROUND_MAX_TIMEOUT (600s, configurable via
TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout
- Clamp effective_timeout to the cap when background=false and timeout
exceeds the limit
- Include a timeout_note in the tool result when clamped, nudging the
model to use background=true for long-running processes
- Update schema description to show the max timeout value
- Remove dead clamping code in the background branch that could never
fire (max_timeout was set to effective_timeout, so timeout > max_timeout
was always false)
- Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap
edge case, background bypass, default timeout, constant value, and
schema content
Self-review fixes:
- Fixed bug where timeout_note said 'Requested timeout Nones' when
clamping fired from config default exceeding cap (timeout param is
None). Now uses unclamped_timeout instead of the raw timeout param.
- Removed unused pytest import from test file
- Extracted test config dict into _make_env_config() helper
- Fixed tautological test_default_value assertion
- Added missing test for config default > cap with no model timeout
The gateway /model command stored session overrides in
_session_model_overrides but run_sync() never consulted them when
resolving the model and runtime for the next message. It always read
from config.yaml, so the switch was lost as soon as a new agent was
created.
Two fixes:
1. In run_sync(), apply _session_model_overrides after resolving from
config.yaml/env — the override takes precedence for model, provider,
api_key, base_url, and api_mode.
2. In post-run fallback detection, check whether the model mismatch
(agent.model != config_model) is due to an intentional /model switch
before evicting the cached agent. Without this, the first message
after /model would work (cached agent reused) but the fallback
detector would evict it, causing the next message to revert.
Affects all gateway platforms (Telegram, Discord, Slack, WhatsApp,
Signal, Matrix, BlueBubbles, HomeAssistant) since they all share
GatewayRunner._run_agent().
Fixes#6213
The gateway /usage handler only looked in _running_agents for the agent
object, which is only populated while the agent is actively processing a
message. Between turns (when users actually type /usage), the dict is
empty and the handler fell through to a rough message-count estimate.
The agent object actually lives in _agent_cache between turns (kept for
prompt caching). This fix checks both dicts, with _running_agents taking
priority (mid-turn) and _agent_cache as the between-turns fallback.
Also brings the gateway output to parity with the CLI /usage:
- Model name
- Detailed token breakdown (input, output, cache read, cache write)
- Cost estimation (estimated amount or 'included' for subscriptions)
- Cache token lines hidden when zero (cleaner output)
This fixes Nous Portal rate limit headers not showing up for gateway
users — the data was being captured correctly but the handler could
never see it.
Extends the /fast command to support Anthropic's Fast Mode beta in addition
to OpenAI Priority Processing. When enabled on Claude Opus 4.6, adds
speed:"fast" and the fast-mode-2026-02-01 beta header to API requests for
~2.5x faster output token throughput.
Changes:
- hermes_cli/models.py: Add _ANTHROPIC_FAST_MODE_MODELS registry,
model_supports_fast_mode() now recognizes Claude Opus 4.6,
resolve_fast_mode_overrides() returns {speed: fast} for Anthropic
vs {service_tier: priority} for OpenAI
- agent/anthropic_adapter.py: Add _FAST_MODE_BETA constant,
build_anthropic_kwargs() accepts fast_mode=True which injects
speed:fast + beta header via extra_headers (skipped for third-party
Anthropic-compatible endpoints like MiniMax)
- run_agent.py: Pass fast_mode to build_anthropic_kwargs in the
anthropic_messages path of _build_api_kwargs()
- cli.py: Update _handle_fast_command with provider-aware messaging
(shows 'Anthropic Fast Mode' vs 'Priority Processing')
- hermes_cli/commands.py: Update /fast description to mention both
providers
- tests: 13 new tests covering Anthropic model detection, override
resolution, CLI availability, routing, adapter kwargs, and
third-party endpoint safety
When `hermes update` stashes local changes and the restore hits merge
conflicts, the old code prompted the user to reset or keep conflict
markers. If the user declined the reset, git conflict markers
(<<<<<<< Updated upstream) were left in source files, making hermes
completely unrunnable with a SyntaxError on the next invocation.
Additionally, the interactive path called sys.exit(1), which killed
the entire update process before pip dependency install, skill sync,
and gateway restart could finish — even though the code pull itself
had succeeded.
Changes:
- Always auto-reset to clean state when stash restore conflicts
- Remove the "Reset working tree?" prompt (footgun)
- Remove sys.exit(1) — return False so cmd_update continues normally
- User's changes remain safely in the stash for manual recovery
Also fixes a secondary bug where the conflict handling prompt used
bare input() instead of the input_fn parameter, which would hang
in gateway mode.
Tests updated: replaced prompt/sys.exit assertions with auto-reset
behavior checks; removed the "user declines reset" test (path no
longer exists).
After mid-loop compression (triggered by 413, context_overflow, or Anthropic
long-context tier errors), _compress_context() creates a new session in SQLite
and resets _last_flushed_db_idx=0. However, conversation_history was not cleared,
so _flush_messages_to_session_db() computed:
flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200
messages[200:] → empty (compressed messages < 200)
This resulted in zero messages being written to the new session's SQLite store.
On resume, the user would see 'Session found but has no messages.'
The preflight compression path (line 7311) already had the fix:
conversation_history = None
This commit adds the same clearing to the three mid-loop compression sites:
- Anthropic long-context tier overflow
- HTTP 413 payload too large
- Generic context_overflow error
Reported by Aaryan (Nous community).
Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages
dispatch immediately (bypassing async batching). This preserves the
existing synchronous assertion patterns while the batching logic is
tested separately in test_text_batching.py.
22 tests covering:
- Single message dispatch after delay
- Split message aggregation (2-way and 3-way)
- Different chats/rooms not merged
- Adaptive delay for near-limit chunks
- State cleanup after flush
- Split continuation merging
All 5 platform adapters tested.
Raise the default httpx stream read timeout from 60s to 120s for all
providers. Additionally, auto-detect local LLM endpoints (Ollama,
llama.cpp, vLLM) and raise the read timeout to HERMES_API_TIMEOUT
(1800s) since local models can take minutes for prefill on large
contexts before producing the first token.
The stale stream timeout already had this local auto-detection pattern;
the httpx read timeout was missing it — causing a hard 60s wall that
users couldn't find (HERMES_STREAM_READ_TIMEOUT was undocumented).
Changes:
- Default HERMES_STREAM_READ_TIMEOUT: 60s -> 120s
- Auto-detect local endpoints -> raise to 1800s (user override respected)
- Document HERMES_STREAM_READ_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT
- Add 10 parametrized tests
Reported-by: Pavan Srinivas (@pavanandums)
When the model mentions <think> as literal text in its response (e.g.
"(/think not producing <think> tags)"), the streaming display treated it
as a reasoning block opener and suppressed everything after it. The
response box would close with truncated content and no error — the API
response was complete but the display ate it.
Root cause: _stream_delta() matched <think> anywhere in the text stream
regardless of position. Real reasoning blocks always start at the
beginning of a line; mentions in prose appear mid-sentence.
Fix: track line position across streaming deltas with a
_stream_last_was_newline flag. Only enter reasoning suppression when
the tag appears at a block boundary (start of stream, after a newline,
or after only whitespace on the current line). Add a _flush_stream()
safety net that recovers buffered content if no closing tag is found
by end-of-stream.
Also fixes three related issues discovered during investigation:
- anthropic_adapter: _get_anthropic_max_output() now normalizes dots to
hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key
(was returning 32K instead of 128K)
- run_agent: send explicit max_tokens for Claude models on Nous Portal,
same as OpenRouter — both proxy to Anthropic's API which requires it.
Without it the backend defaults to a low limit that truncates responses.
- run_agent: reset truncated_tool_call_retries after successful tool
execution so a single truncation doesn't poison the entire conversation.
Previously /fast only supported gpt-5.4 and forced a provider switch to
openai-codex. Now supports all 13 models from OpenAI's Priority Processing
pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini,
gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini).
Key changes:
- Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset
- Removed provider-forcing logic — service_tier is now injected into whatever
API path the user is already on (Codex Responses, Chat Completions, or
OpenRouter passthrough)
- Added request_overrides support to chat_completions path in run_agent.py
- Updated messaging from 'Codex inference tier' to 'Priority Processing'
- Expanded test coverage for all supported models
Add /fast slash command to toggle OpenAI Codex service_tier between
normal and priority ('fast') inference. Only exposed for models
registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4).
- Registry-based backend config for extensibility
- Dynamic command visibility (hidden from help/autocomplete for
non-supported models) via command_filter on SlashCommandCompleter
- service_tier flows through request_overrides from route resolution
- Omit max_output_tokens for Codex backend (rejects it)
- Persists to config.yaml under agent.service_tier
Salvage cleanup: removed simple_term_menu/input() menu (banned),
bare /fast now shows status like /reasoning. Removed redundant
override resolution in _build_api_kwargs — single source of truth
via request_overrides from route.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
MiniMax's Anthropic-compatible endpoints reject requests that include
the fine-grained-tool-streaming beta header — every tool-use message
triggers a connection error (~18s timeout). Regular chat works fine.
Add _common_betas_for_base_url() that filters out the tool-streaming
beta for Bearer-auth (MiniMax) endpoints while keeping all other betas.
All four client-construction branches now use the filtered list.
Based on #6528 by @HiddenPuppy.
Original cherry-picked from PR #6688 by kshitijk4poor.
Fixes#6510, fixes#6555.
* feat: API server model name derived from profile name
For multi-user setups (e.g. OpenWebUI), each profile's API server now
advertises a distinct model name on /v1/models:
- Profile 'lucas' -> model ID 'lucas'
- Profile 'admin' -> model ID 'admin'
- Default profile -> 'hermes-agent' (unchanged)
Explicit override via API_SERVER_MODEL_NAME env var or
platforms.api_server.model_name config for custom names.
Resolves friction where OpenWebUI couldn't distinguish multiple
hermes-agent connections all advertising the same model name.
* docs: multi-user setup with profiles for API server + Open WebUI
- api-server.md: added Multi-User Setup section, API_SERVER_MODEL_NAME
to config table, updated /v1/models description
- open-webui.md: added Multi-User Setup with Profiles section with
step-by-step guide, updated model name references
- environment-variables.md: added API_SERVER_MODEL_NAME entry
When a streaming response is cut mid-tool-call (connection drop, timeout),
the accumulated function.arguments is invalid JSON. The mock response
builder defaulted finish_reason to 'stop', so the agent loop treated it
as a valid completed turn and tried to execute tools with broken args.
Fix: validate tool call arguments with json.loads() during mock response
reconstruction. If any are invalid JSON, override finish_reason to
'length'. In the main loop's length handler, if tool calls are present,
refuse to execute and return partial=True with a clear error instead of
silently failing or wasting retries.
Also fixes _thinking_exhausted to not short-circuit when tool calls are
present — truncated tool calls are not thinking exhaustion.
Original cherry-picked from PR #6776 by AIandI0x1.
Closes#6638.
The test was mocking _vprint entirely, bypassing the suppress guard.
Switch to capturing _print_fn output so the real _vprint runs and
the guard suppresses retry noise as intended.
_classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so
messages like 'usage limit exceeded, try again in 5 minutes' arriving
without an HTTP status code fell through to FailoverReason.unknown
instead of rate_limit.
Apply the same billing/rate-limit disambiguation that _classify_402
already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit,
USAGE_LIMIT_PATTERNS alone → billing.
Add 4 tests covering the no-status-code usage-limit path.
* feat(nix): shared-state permission model for interactive CLI users
Enable interactive CLI users in the hermes group to share full
read-write state (sessions, memories, logs, cron) with the gateway
service via a setgid + group-writable permission model.
Changes:
nix/nixosModules.nix:
- Directories use setgid 2770 (was 0750) so new files inherit the
hermes group. home/ stays 0750 (no interactive write needed).
- Activation script creates HERMES_HOME subdirs (cron, sessions, logs,
memories) — previously Python created them but managed mode now skips
mkdir.
- Activation migrates existing runtime files to group-writable (chmod
g+rw). Nix-managed files (config.yaml, .env, .managed) stay 0640/0644.
- Gateway systemd unit gets UMask=0007 so files it creates are 0660.
hermes_cli/config.py:
- ensure_hermes_home() splits into managed/unmanaged paths. Managed mode
verifies dirs exist (raises RuntimeError if not) instead of creating
them. Scoped umask(0o007) ensures SOUL.md is created as 0660.
hermes_logging.py:
- _ManagedRotatingFileHandler subclass applies chmod 0660 after log
rotation in managed mode. RotatingFileHandler.doRollover() creates new
files via open() which uses the process umask (0022 → 0644), not the
scoped umask from ensure_hermes_home().
Verified with a 13-subtest NixOS VM integration test covering setgid,
interactive writes, file ownership, migration, and gateway coexistence.
Refs: #6044
* Fix managed log file mode on initial open
Co-authored-by: Siddharth Balyan <alt-glitch@users.noreply.github.com>
* refactor: simplify managed file handler and merge activation loops
- Cache is_managed() result in handler __init__ instead of lazy-importing
on every _open()/_chmod_if_managed() call. Avoids repeated stat+env
checks on log rotation.
- Merge two for-loops over the same subdir list in activation script
into a single loop (mkdir + chown + chmod + find in one pass).
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Siddharth Balyan <alt-glitch@users.noreply.github.com>