MiniMax's Anthropic-compatible endpoints reject requests that include
the fine-grained-tool-streaming beta header — every tool-use message
triggers a connection error (~18s timeout). Regular chat works fine.
Add _common_betas_for_base_url() that filters out the tool-streaming
beta for Bearer-auth (MiniMax) endpoints while keeping all other betas.
All four client-construction branches now use the filtered list.
Based on #6528 by @HiddenPuppy.
Original cherry-picked from PR #6688 by kshitijk4poor.
Fixes#6510, fixes#6555.
* feat: API server model name derived from profile name
For multi-user setups (e.g. OpenWebUI), each profile's API server now
advertises a distinct model name on /v1/models:
- Profile 'lucas' -> model ID 'lucas'
- Profile 'admin' -> model ID 'admin'
- Default profile -> 'hermes-agent' (unchanged)
Explicit override via API_SERVER_MODEL_NAME env var or
platforms.api_server.model_name config for custom names.
Resolves friction where OpenWebUI couldn't distinguish multiple
hermes-agent connections all advertising the same model name.
* docs: multi-user setup with profiles for API server + Open WebUI
- api-server.md: added Multi-User Setup section, API_SERVER_MODEL_NAME
to config table, updated /v1/models description
- open-webui.md: added Multi-User Setup with Profiles section with
step-by-step guide, updated model name references
- environment-variables.md: added API_SERVER_MODEL_NAME entry
When a streaming response is cut mid-tool-call (connection drop, timeout),
the accumulated function.arguments is invalid JSON. The mock response
builder defaulted finish_reason to 'stop', so the agent loop treated it
as a valid completed turn and tried to execute tools with broken args.
Fix: validate tool call arguments with json.loads() during mock response
reconstruction. If any are invalid JSON, override finish_reason to
'length'. In the main loop's length handler, if tool calls are present,
refuse to execute and return partial=True with a clear error instead of
silently failing or wasting retries.
Also fixes _thinking_exhausted to not short-circuit when tool calls are
present — truncated tool calls are not thinking exhaustion.
Original cherry-picked from PR #6776 by AIandI0x1.
Closes#6638.
The test was mocking _vprint entirely, bypassing the suppress guard.
Switch to capturing _print_fn output so the real _vprint runs and
the guard suppresses retry noise as intended.
_classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so
messages like 'usage limit exceeded, try again in 5 minutes' arriving
without an HTTP status code fell through to FailoverReason.unknown
instead of rate_limit.
Apply the same billing/rate-limit disambiguation that _classify_402
already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit,
USAGE_LIMIT_PATTERNS alone → billing.
Add 4 tests covering the no-status-code usage-limit path.
* feat(nix): shared-state permission model for interactive CLI users
Enable interactive CLI users in the hermes group to share full
read-write state (sessions, memories, logs, cron) with the gateway
service via a setgid + group-writable permission model.
Changes:
nix/nixosModules.nix:
- Directories use setgid 2770 (was 0750) so new files inherit the
hermes group. home/ stays 0750 (no interactive write needed).
- Activation script creates HERMES_HOME subdirs (cron, sessions, logs,
memories) — previously Python created them but managed mode now skips
mkdir.
- Activation migrates existing runtime files to group-writable (chmod
g+rw). Nix-managed files (config.yaml, .env, .managed) stay 0640/0644.
- Gateway systemd unit gets UMask=0007 so files it creates are 0660.
hermes_cli/config.py:
- ensure_hermes_home() splits into managed/unmanaged paths. Managed mode
verifies dirs exist (raises RuntimeError if not) instead of creating
them. Scoped umask(0o007) ensures SOUL.md is created as 0660.
hermes_logging.py:
- _ManagedRotatingFileHandler subclass applies chmod 0660 after log
rotation in managed mode. RotatingFileHandler.doRollover() creates new
files via open() which uses the process umask (0022 → 0644), not the
scoped umask from ensure_hermes_home().
Verified with a 13-subtest NixOS VM integration test covering setgid,
interactive writes, file ownership, migration, and gateway coexistence.
Refs: #6044
* Fix managed log file mode on initial open
Co-authored-by: Siddharth Balyan <alt-glitch@users.noreply.github.com>
* refactor: simplify managed file handler and merge activation loops
- Cache is_managed() result in handler __init__ instead of lazy-importing
on every _open()/_chmod_if_managed() call. Avoids repeated stat+env
checks on log rotation.
- Merge two for-loops over the same subdir list in activation script
into a single loop (mkdir + chown + chmod + find in one pass).
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Siddharth Balyan <alt-glitch@users.noreply.github.com>
Chrome auto-launch now passes --user-data-dir, --no-first-run, and
--no-default-browser-check so the debug instance doesn't conflict with
an already-running Chrome using the default profile. The profile dir
lives at {hermes_home}/chrome-debug/.
Also updates the fallback manual instructions to include the same flags
and removes the stale 'close existing Chrome windows' hint.
Chrome auto-launch now passes --user-data-dir, --no-first-run, and
--no-default-browser-check so the debug instance doesn't conflict with
an already-running Chrome using the default profile. The profile dir
lives at {hermes_home}/chrome-debug/.
Also updates the fallback manual instructions to include the same flags
and removes the stale 'close existing Chrome windows' hint.
- Rewrite test_google_workspace_api.py: test bridge token handling
and calendar date range instead of removed get_credentials()
- Update test_google_oauth_setup.py: partial scopes now accepted
with warning instead of rejected with SystemExit
- Add resolve_proxy_url() to base.py — shared by all platform adapters
- Check HTTPS_PROXY / HTTP_PROXY / ALL_PROXY env vars first
- Fall back to macOS system proxy via scutil --proxy (zero-config)
- Pass proxy= to discord.py commands.Bot() for gateway connectivity
- Refactor telegram_network.py to use shared resolver
- Update test fixtures to accept proxy kwarg
Fixes blockquote > escaping, edit_message raw markdown, ***bold italic***
handling, HTML entity double-escaping (&amp;), Wikipedia URL parens
truncation, and step numbering format. Also adds format_message to the
tool-layer _send_to_platform for consistent formatting across all
delivery paths.
Changes:
- Protect Slack entities (<@user>, <https://...|label>, <!here>) from
escaping passes
- Protect blockquote > markers before HTML entity escaping
- Unescape-before-escape for idempotent HTML entity handling
- ***bold italic*** → *_text_* conversion (before **bold** pass)
- URL regex upgraded to handle balanced parentheses
- mrkdwn:True flag on chat_postMessage payloads
- format_message applied in edit_message and send_message_tool
- 52 new tests (format, edit, streaming, splitting, tool chunking)
- Use reversed(dict) idiom for placeholder restoration
Based on PR #3715 by dashed, cherry-picked onto current main.
Port the mention gating pattern from Telegram, Discord, WhatsApp, and
Matrix adapters to the Slack platform adapter.
- Add _slack_require_mention() with explicit-false parsing and env var
fallback (SLACK_REQUIRE_MENTION)
- Add _slack_free_response_channels() with env var fallback
(SLACK_FREE_RESPONSE_CHANNELS)
- Replace hardcoded mention check with configurable gating logic
- Bridge slack config.yaml settings to env vars
- Bridge free_response_channels through the generic platform bridging loop
- Add 26 tests covering config parsing, env fallback, gating logic
Config usage:
slack:
require_mention: false
free_response_channels:
- "C0AQWDLHY9M"
Default behavior unchanged: channels require @mention (backward compatible).
Based on PR #5885 by dorukardahan, cherry-picked and adapted to current main.
* fix(tests): mock is_safe_url in tests that use example.com
Tests using example.com URLs were failing because is_safe_url does a real DNS lookup which fails in environments where example.com doesn't resolve, causing the request to be blocked before reaching the already-mocked HTTP client. This should fix around 17 failing tests.
These tests test logic, caching, etc. so mocking this method should not modify them in any way. TestMattermostSendUrlAsFile was already doing this so we follow the same pattern.
* fix(test): use case-insensitive lookup for model context length check
DEFAULT_CONTEXT_LENGTHS uses inconsistent casing (MiniMax keys are lowercase, Qwen keys are mixed-case) so the test was broken in some cases since it couldn't find the model.
* fix(test): patch is_linux in systemd gateway restart test
The test only patched is_macos to False but didn't patch is_linux to True. On macOS hosts, is_linux() returns False and the systemd restart code path is skipped entirely, making the assertion fail.
* fix(test): use non-blocklisted env var in docker forward_env tests
GITHUB_TOKEN is in api_key_env_vars and thus in _HERMES_PROVIDER_ENV_BLOCKLIST so the env var is silently dropped, we replace it with a non-blocked one like DATABASE_URL so the tests actually work.
* fix(test): fully isolate _has_any_provider_configured from host env
_has_any_provider_configured() checks all env vars from PROVIDER_REGISTRY (not just the 5 the tests were clearing) and also calls get_auth_status() which detects gh auth token for Copilot. On machines with any of these set, the function returns True before reaching the code path under test.
Clear all registry vars and mock get_auth_status so host credentials don't interfere.
* fix(test): correct path to hermes_base_env.py in tool parser tests
Path(__file__).parent.parent resolved to tests/, not the project root.
The file lives at environments/hermes_base_env.py so we need one more parent level.
* fix(test): accept optional HTML fields in Matrix send payload
_send_matrix sometimes adds format and formatted_body when the markdown library is installed. The test was doing an exact dict equality check which broke. Check required fields instead.
* fix(test): add config.yaml to codex vision requirements test
The test only wrote auth.json but not config.yaml, so _read_main_provider() returned empty and vision auto-detect never tried the codex provider. Add a config.yaml pointing at openai-codex so the fallback path actually resolves the client.
* fix(test): clear OPENROUTER_API_KEY in _isolate_hermes_home
run_agent.py calls load_hermes_dotenv() at import time, which injects API keys from ~/.hermes/.env into os.environ before any test fixture runs. This caused test_agent_loop_tool_calling to make real API calls instead of skipping, which ends up making some tests fail.
* fix(test): add get_rate_limit_state to agent mock in usage report tests
_show_usage now calls agent.get_rate_limit_state() for rate limit
display. The SimpleNamespace mock was missing this method.
* fix(test): update expected Camofox config version from 12 to 13
* fix(test): mock _get_enabled_platforms in nous managed defaults test
Importing gateway.run leaks DISCORD_BOT_TOKEN into os.environ, which makes _get_enabled_platforms() return ["cli", "discord"] instead of just ["cli"]. tools_command loops per platform, so apply_nous_managed_defaults
runs twice: the first call sets config values, the second sees them as
already configured and returns an empty set, causing the assertion to
fail.
The setup wizard's OpenClaw migration previously ran immediately with
aggressive defaults (overwrite=True, preset=full) after a single
'Would you like to import?' prompt. This caused several problems:
- Config values with different semantics (e.g. tool_call_execution:
'auto' in OpenClaw vs 'off' for Hermes yolo mode) were imported
without translation
- Gateway tokens were hijacked from OpenClaw without warning, taking
over Telegram/Slack/Discord channels
- Instruction files (.md) containing OpenClaw-specific setup/restart
procedures were copied, causing Hermes restart failures
Now the migration:
1. Asks 'Would you like to see what can be imported?' (softer framing)
2. Runs a dry-run preview showing everything that would be imported
3. Displays categorized warnings for high-impact items (gateway
takeover, config value differences, instruction files)
4. Asks for explicit confirmation with default=No
5. Executes with overwrite=False (preserves existing Hermes config)
Also extracts _load_openclaw_migration_module() for reuse and adds
_print_migration_preview() with keyword-based warning detection.
Tests updated for two-phase behavior + new test for decline-after-preview.
When the API returns "max_tokens too large given prompt" (input tokens
are within the context window, but input + requested output > window),
the old code incorrectly routed through the same handler as "prompt too
long" errors, calling get_next_probe_tier() and permanently halving
context_length. This made things worse: the window was fine, only the
requested output size needed trimming for that one call.
Two distinct error classes now handled separately:
Prompt too long — input itself exceeds context window.
Fix: compress history + halve context_length (existing behaviour,
unchanged).
Output cap too large — input OK, but input + max_tokens > window.
Fix: parse available_tokens from the error message, set a one-shot
_ephemeral_max_output_tokens override for the retry, and leave
context_length completely untouched.
Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error()
that detects Anthropic's "available_tokens: N" error format and returns
the available output budget, or None for all other error types.
- run_agent.py: call the new parser first in the is_context_length_error
block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
safety margin) and break to retry without touching context_length.
_build_api_kwargs consumes the ephemeral value exactly once then clears
it so subsequent calls use self.max_tokens normally.
- agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to
clearly document the max_tokens (output cap) vs context_length (total
window) distinction, which is a persistent source of confusion due to
the OpenAI-inherited "max_tokens" name.
- cli-config.yaml.example: add inline comments explaining both keys side
by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
the parser, build_anthropic_kwargs clamping, ephemeral one-shot
consumption, and the invariant that context_length is never mutated
on output-cap errors.
/pr <anything> silently resolved to /prompt via the shortest-match
tiebreaker in prefix expansion, permanently overwriting the system
prompt and persisting to config. The command's functionality (setting
agent.system_prompt) is available via config.yaml and /personality
covers the common use case.
Removes: CommandDef, dispatch branch, _handle_prompt_command handler,
docs references, and updates subcommand extraction test.
The error classifier's generic-400 heuristic only extracted err_body_msg from
the nested body structure (body['error']['message']), missing the flat body
format used by OpenAI's Responses API (body['message']). This caused
descriptive 400 errors like 'Invalid input[index].name: string does not match
pattern' to appear generic when the session was large, misclassifying them as
context overflow and triggering an infinite compression loop.
Added flat-body fallback in _classify_400() consistent with the parent
classify_api_error() function's existing handling at line 297-298.
Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.
- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases
Headers captured per response:
x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}
Example CLI display:
Nous Rate Limits (captured just now):
Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s)
Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)
Wrap is_dir() in _is_valid_subdir() and is_file() in
_load_hints_for_directory() with OSError handlers so that
inaccessible directories (e.g. /root from a non-root Daytona
host user) are silently skipped instead of crashing the agent.
The existing PermissionError PRs for prompt_builder.py (#6247,
#6321, #6355) do not cover subdirectory_hints.py, which was
identified as a separate crash path in the #6214 comments.
Ref: #6214
prune_sessions and delete_session only handled direct children when
satisfying the parent_session_id FK constraint. Multi-level chains
(A -> B -> C) caused IntegrityError because deleting B while C still
referenced it was blocked by the FK.
Fix: NULL out parent_session_id for any session whose parent is about
to be deleted. This orphans children instead of cascade-deleting them,
which also respects the prune retention window — newer child sessions
are no longer deleted just because an ancestor is old.
Reported by Aaryan2304 in PR #6463.
The 24-hour default cooldown for 402-exhausted credentials was far too
aggressive — if a user tops up credits or the 402 was caused by an
oversized max_tokens request rather than true billing exhaustion, they
shouldn't have to wait a full day. Reduce to 1 hour (matching the
existing 429 TTL).
Inspired by PR #6493 (michalkomar).
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).
Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant. Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.
This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)
Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text. That differs from truly empty.
Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
Tests for the new behavior paths:
- Large tool outputs no longer block compaction (motivating scenario)
- Hard minimum of 3 tail messages always protected
- 1.5x soft ceiling for oversized messages
- Small conversations still compress (min 8 messages)
- Token-budget prune path in _prune_old_tool_results
- Fallback to message-count when no token budget
PR #6240 changed tail protection from protect_last_n to min(3, ...)
which increased the minimum compressible message count and shifted
tail boundaries. Three tests broke:
- test_summary_role_avoids_consecutive_user_messages: 6→8 msgs
- test_double_collision_user_head_assistant_tail: 7→8 msgs
- test_no_collision_scenarios_still_work: 6→8 msgs
All tests now exceed the new min_for_compress threshold (6) and
maintain proper role alternation in both head and tail sections.
Port missing features from the hindsight-hermes external integration
package into the native plugin. Only touches plugin files — no core
changes.
Features:
- Tags on retain/recall (tags, recall_tags, recall_tags_match)
- Recall config (recall_max_tokens, recall_max_input_chars, recall_types,
recall_prompt_preamble)
- Retain controls (retain_every_n_turns, auto_retain, auto_recall,
retain_async via aretain_batch, retain_context)
- Bank config via Banks API (bank_mission, bank_retain_mission)
- Structured JSON retain with per-message timestamps
- Full session accumulation with document_id for dedup
- Custom post_setup() wizard with curses picker
- Mode-aware dep install (hindsight-client for cloud, hindsight-all for local)
- local_external mode and openai_compatible LLM provider
- OpenRouter support with auto base URL
- Auto-upgrade of hindsight-client to >=0.4.22 on session start
- Comprehensive debug logging across all operations
- 46 unit tests
- Updated README and website docs
Prevents unbounded memory growth in _assistant_threads dict.
Evicts oldest entries when exceeding _ASSISTANT_THREADS_MAX (5000),
matching the pattern used by _mentioned_threads and _seen_messages.
The test_non_internal_event_without_user_triggers_pairing test relied on
no Discord auth env vars being set, but gateway/run.py loads dotenv at
module level. In environments with DISCORD_ALLOW_ALL_USERS=True in .env,
the auth check passed instead of triggering the pairing flow.
Clear DISCORD_ALLOW_ALL_USERS, DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS,
and GATEWAY_ALLOWED_USERS via monkeypatch to ensure test isolation.
When a background process with notify_on_complete=True finishes, the
gateway injects a synthetic MessageEvent to notify the session. This
event was constructed without user_id, causing _is_user_authorized()
to reject it and — for DM-origin sessions — trigger the pairing flow,
sending "Hi~ I don't recognize you yet!" with a pairing code to the
chat owner.
Add an `internal` flag to MessageEvent that bypasses authorization
checks for system-generated synthetic events. Only the process watcher
sets this flag; no external/adapter code path can produce it.
Includes 4 regression tests covering the fix and the normal pairing path.