Wrap is_dir() in _is_valid_subdir() and is_file() in
_load_hints_for_directory() with OSError handlers so that
inaccessible directories (e.g. /root from a non-root Daytona
host user) are silently skipped instead of crashing the agent.
The existing PermissionError PRs for prompt_builder.py (#6247,
#6321, #6355) do not cover subdirectory_hints.py, which was
identified as a separate crash path in the #6214 comments.
Ref: #6214
prune_sessions and delete_session only handled direct children when
satisfying the parent_session_id FK constraint. Multi-level chains
(A -> B -> C) caused IntegrityError because deleting B while C still
referenced it was blocked by the FK.
Fix: NULL out parent_session_id for any session whose parent is about
to be deleted. This orphans children instead of cascade-deleting them,
which also respects the prune retention window — newer child sessions
are no longer deleted just because an ancestor is old.
Reported by Aaryan2304 in PR #6463.
Updated the logic for determining the probed_url in the probe_api_models function to use the first tried URL instead of the last. This change ensures that the most relevant URL is returned when probing for models. Additionally, improved the output message in the _model_flow_custom function to provide clearer guidance based on the suggested_base_url.
The 24-hour default cooldown for 402-exhausted credentials was far too
aggressive — if a user tops up credits or the 402 was caused by an
oversized max_tokens request rather than true billing exhaustion, they
shouldn't have to wait a full day. Reduce to 1 hour (matching the
existing 429 TTL).
Inspired by PR #6493 (michalkomar).
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).
Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant. Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.
This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)
Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text. That differs from truly empty.
Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
The BlueBubbles adapter was merged but missing setup wizard support:
- Add _setup_bluebubbles() guided setup (server URL, password, allowlist,
home channel, webhook port)
- Add to _GATEWAY_PLATFORMS registry so it appears in 'hermes setup gateway'
- Add to any_messaging check and home channel missing warning
- Add to gateway status display in 'hermes setup'
- Add BLUEBUBBLES_SERVER_URL, BLUEBUBBLES_PASSWORD, BLUEBUBBLES_ALLOWED_USERS
to OPTIONAL_ENV_VARS with descriptions and categories
Previously the only way to configure BlueBubbles was manually editing .env.
Two issues resolved:
1. Add opencode.ai to _URL_TO_PROVIDER mapping so base_url routes through
models.dev lookup (which has mimo-v2-pro at 1M context) instead of
falling back to probing /models (404) and defaulting to 128K.
2. Fix _format_context_length to round cleanly: 1048576 → '1M' instead
of '1.048576M'. Applies same rounding logic to K values.
Add QEMU cross-compilation and multi-arch manifest support so Apple
Silicon (M1/M2/M3) and other ARM-based systems get native images.
- Add docker/setup-qemu-action for arm64 emulation on amd64 runners
- Smoke test stays amd64-only (load:true can't export multi-arch)
- Both push steps (main + release) now build linux/amd64,linux/arm64
- Bump timeout 30->60min for QEMU cross-compilation overhead
- Add permissions: contents: read (least-privilege hardening)
Salvaged from PR #3998 by Mibayy. Also addresses #5005 and #3913.
Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
Tests for the new behavior paths:
- Large tool outputs no longer block compaction (motivating scenario)
- Hard minimum of 3 tail messages always protected
- 1.5x soft ceiling for oversized messages
- Small conversations still compress (min 8 messages)
- Token-budget prune path in _prune_old_tool_results
- Fallback to message-count when no token budget
PR #6240 changed tail protection from protect_last_n to min(3, ...)
which increased the minimum compressible message count and shifted
tail boundaries. Three tests broke:
- test_summary_role_avoids_consecutive_user_messages: 6→8 msgs
- test_double_collision_user_head_assistant_tail: 7→8 msgs
- test_no_collision_scenarios_still_work: 6→8 msgs
All tests now exceed the new min_for_compress threshold (6) and
maintain proper role alternation in both head and tail sections.
Tail protection was effectively message-count based despite having a
token budget, because protect_last_n=20 acted as a hard floor. A single
50K-token tool output would cause all 20 recent messages to be
preserved regardless of budget, leaving little room for summarization.
Changes:
- _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20)
to 3; token budget is now the primary criterion
- Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message
- _prune_old_tool_results: accepts optional protect_tail_tokens so
pruning also respects the token budget instead of a fixed count
- compress() minimum message check relaxed from protect_first_n +
protect_last_n + 1 to protect_first_n + 3 + 1
- Tool group alignment (no splitting tool_call/result) preserved
Port missing features from the hindsight-hermes external integration
package into the native plugin. Only touches plugin files — no core
changes.
Features:
- Tags on retain/recall (tags, recall_tags, recall_tags_match)
- Recall config (recall_max_tokens, recall_max_input_chars, recall_types,
recall_prompt_preamble)
- Retain controls (retain_every_n_turns, auto_retain, auto_recall,
retain_async via aretain_batch, retain_context)
- Bank config via Banks API (bank_mission, bank_retain_mission)
- Structured JSON retain with per-message timestamps
- Full session accumulation with document_id for dedup
- Custom post_setup() wizard with curses picker
- Mode-aware dep install (hindsight-client for cloud, hindsight-all for local)
- local_external mode and openai_compatible LLM provider
- OpenRouter support with auto base URL
- Auto-upgrade of hindsight-client to >=0.4.22 on session start
- Comprehensive debug logging across all operations
- 46 unit tests
- Updated README and website docs
Prevents unbounded memory growth in _assistant_threads dict.
Evicts oldest entries when exceeding _ASSISTANT_THREADS_MAX (5000),
matching the pattern used by _mentioned_threads and _seen_messages.
The test_non_internal_event_without_user_triggers_pairing test relied on
no Discord auth env vars being set, but gateway/run.py loads dotenv at
module level. In environments with DISCORD_ALLOW_ALL_USERS=True in .env,
the auth check passed instead of triggering the pairing flow.
Clear DISCORD_ALLOW_ALL_USERS, DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS,
and GATEWAY_ALLOWED_USERS via monkeypatch to ensure test isolation.
When a background process with notify_on_complete=True finishes, the
gateway injects a synthetic MessageEvent to notify the session. This
event was constructed without user_id, causing _is_user_authorized()
to reject it and — for DM-origin sessions — trigger the pairing flow,
sending "Hi~ I don't recognize you yet!" with a pairing code to the
chat owner.
Add an `internal` flag to MessageEvent that bypasses authorization
checks for system-generated synthetic events. Only the process watcher
sets this flag; no external/adapter code path can produce it.
Includes 4 regression tests covering the fix and the normal pairing path.
`_cleanup_task_resources` was unconditionally calling `cleanup_vm()` at
the end of every `run_conversation` (i.e. every user turn), tearing down
the docker/daytona/modal sandbox container regardless of its
`persistent_filesystem` setting. This contradicted the documented intent
of `terminal.lifetime_seconds` (idle reaper) and `container_persistent`,
and caused per-turn loss of `/workspace`, `~/.config`, agent CLI auth
state, and any other content living inside the sandbox.
The unconditional teardown was introduced in fbd3a2fd ("prevent leakage
of morph instances between tasks", 2025-11-04) to plug a Morph backend
leak, two days after `lifetime_seconds` shipped in faecbddd. It was
later refactored into `_cleanup_task_resources` in 70dd3a16 without
changing semantics. Code and docs have disagreed since.
Fix: introduce `terminal_tool.is_persistent_env(task_id)` and skip the
per-turn `cleanup_vm` when the active env is persistent. The idle reaper
(`_cleanup_inactive_envs`) still tears persistent envs down once
`terminal.lifetime_seconds` is exceeded. Non-persistent backends (Morph)
are unchanged — still torn down per turn, preserving the original
leak-prevention intent.
Combines the approaches from PR #6309 (duan78) and PR #5963 (KUSH42):
Tiered warnings (from #5963):
- Replaces boolean _context_pressure_warned with float _context_pressure_warned_at
- Fires at 85% (orange) and re-fires at 95% (red/critical)
- Adds 'compacting context...' status message before compression
Gateway dedup (from #6309):
- Class-level dict _context_pressure_last_warned survives across AIAgent
instances (gateway creates a new instance per message)
- 5-minute cooldown per session prevents warning spam
- Higher-tier warnings bypass the cooldown (85% → 95% always fires)
- Compression reset clears the dedup entry for the session
- Stale entries evicted (older than 2x cooldown) to prevent memory leak
Does NOT inject into messages — purely user-facing via _safe_print (CLI)
and status_callback (gateway). Zero prompt cache impact.
Fixes#6309. Fixes#5963.
ORIGINAL INCIDENT:
Discord forum descriptions (the topic field on ForumChannel) were invisible
to the agent. When a user set project instructions in a forum's description
(e.g. tool-evaluations), threads created in that forum had no Channel Topic
in their session context. Discovered while evaluating per-forum auto-context
injection for web-tap-terminal development threads.
ISSUE IN THE CODE:
In gateway/platforms/discord.py, all three session entry points
(_handle_message, _build_slash_event, _dispatch_thread_session) read
chat_topic via getattr(channel, 'topic', None). Discord Thread objects
don't carry a topic — only the parent ForumChannel does. So chat_topic
was always None for forum threads, and the Channel Topic line was never
injected into build_session_context_prompt output. The infrastructure to
handle this was already in place — _is_forum_parent() detects forum
channels, _format_thread_chat_name() traverses to the parent, and
build_session_context_prompt() renders Channel Topic when present. The
forum parent was being identified; its topic just wasn't being read.
HOW THIS COMMIT FIXES IT:
Adds _get_effective_topic(channel, is_thread) helper that reads
channel.topic first, then falls back to the parent forum's topic when
the channel is a thread inside a forum. All three session entry points
now call this helper instead of inlining getattr(channel, 'topic', None).
Existing tests pass unchanged.
Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
The new degradation warning reads compression_count as an int,
but the existing test's MagicMock returns a MagicMock object
for that attribute, causing '>=' comparison to fail.
Three targeted improvements to the compression system:
1. Replace hardcoded truncation limits with named class constants
(_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500,
_TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits
(3000/500) heavily truncated the summarizer's input — a 200-line
edit got cut to 3000 chars before the summarizer ever saw it.
2. Add '## Tools & Patterns' section to both compression prompt
templates (first-pass and iterative). Preserves working tool
invocations, preferred flags, and tool-specific discoveries
across compaction boundaries.
3. Warn users on 2nd+ compression: 'Session compressed N times —
accuracy may degrade. Consider /new to start fresh.'
Ref #499
- Bind exception in warning send handler (was using stale _ne from outer scope)
- Calculate remaining time until timeout correctly: (timeout - warning) // 60
instead of warning // 60 (which equals elapsed time, not remaining)
Introduce gateway_timeout_warning (default 900s) as a pre-timeout alert
layer. When inactivity reaches the warning threshold, a single
notification is sent to the user offering to wait or reset. If
inactivity continues to the gateway_timeout (default 1800s), the full
timeout fires as before.
This gives users a chance to intervene before work is lost on slow
API providers without disabling the safety timeout entirely.
Config: agent.gateway_timeout_warning in config.yaml, or
HERMES_AGENT_TIMEOUT_WARNING env var (0 = disable warning).
Step c in switch_model() blindly converted the first colon to a slash for
aggregator providers, even when the model name already contained a slash
(vendor/model format). This mangled variant tags like :free into /free,
causing 400 Bad Request from the API.
Fix: skip the colon→slash conversion when the model already has a slash,
since the colon is a variant tag, not a vendor separator. The module
docstring already documented this intent (line 17-18) but the
implementation didn't enforce it.
Reported via Discord. Related to PR #6088 (which identified the same bug
but placed the fix in model_normalize.py instead of model_switch.py where
the actual mangling occurs).
Local inference providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
for prefill on large contexts. The 180s stale stream detector was killing
these connections while the provider was still processing.
Uses the existing is_local_endpoint() (proper URL parsing with RFC-1918,
localhost, WSL detection) instead of ad-hoc substring matching. The stale
timeout is only disabled when the user hasn't explicitly set
HERMES_STREAM_STALE_TIMEOUT — explicit user config is always honored.
Fixes#5889
Documents the per-platform tool_progress_overrides config key added in
PR #6348. Shows example YAML with Signal set to 'off' while Telegram
stays on 'verbose'. Lists all valid platform keys.
The _call_llm() and direct OpenAI fallback paths in flush_memories() both
hardcoded timeout=30.0, ignoring the user-configurable value at
auxiliary.flush_memories.timeout in config.yaml.
Remove the explicit timeout from the auxiliary _call_llm() call so that
_get_task_timeout('flush_memories') reads from config. For the direct
OpenAI fallback, import and use _get_task_timeout() instead of the
hardcoded value.
Add two regression tests verifying both code paths respect the config.
Fixes#6154
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Fixes#4647 — Signal replies duplicated when gateway streaming is enabled.
Root cause: stream_consumer.py did not handle the case where send() returns
success=True but no message_id (Signal behavior). Every stream delta produced
a separate send() call (7+ messages instead of 2), plus the gateway sent
another full duplicate since already_sent was never set.
Changes:
- stream_consumer.py: Add elif branch for success-without-message_id — enters
fallback mode (sets already_sent, disables editing, sends only continuation)
- signal.py send(): Extract timestamp from signal-cli RPC result as message_id
so stream consumer follows normal edit→fallback path
- signal.py: Add public stop_typing() delegating to _stop_typing_indicator()
so base adapter's _keep_typing finally block can clean up typing tasks
- gateway/run.py: Per-platform tool_progress_overrides (#6164) — lets users
set e.g. signal: off while keeping telegram: all
- hermes_cli/config.py: Add tool_progress_overrides to DEFAULT_CONFIG
Refs: #4647, #6164
- Docker env tests: verify _build_init_env_args() instead of per-execute
Popen flags (env forwarding is now init-time only)
- Docker: preserve explicit forward_env bypass of blocklist from main
- Daytona tests: adapt to SDK-native timeout, _ThreadedProcessHandle,
base.py interrupt handling, HERMES_STDIN_ heredoc prefix
- Modal tests: fix _load_module to include _ThreadedProcessHandle stub,
check ensurepip in _resolve_modal_image instead of __init__
- SSH tests: mock time.sleep on base module instead of removed ssh import
- Add missing BaseEnvironment attributes to __new__()-based test fixtures
Add configurable reply-reference behavior for Discord, matching the
existing Telegram (TELEGRAM_REPLY_TO_MODE) and Mattermost
(MATTERMOST_REPLY_MODE) implementations.
Modes:
- 'off': never reply-reference the original message
- 'first': reply-reference on first chunk only (default, current behavior)
- 'all': reply-reference on every chunk
Set DISCORD_REPLY_TO_MODE=off in .env to disable reply-to messages.
Changes:
- gateway/config.py: parse DISCORD_REPLY_TO_MODE env var
- gateway/platforms/discord.py: read reply_to_mode from config, respect
it in send() — skip fetch_message entirely when 'off'
- hermes_cli/config.py: add to OPTIONAL_ENV_VARS for hermes setup
- 23 tests covering config, send behavior, env var override
- docs: discord.md env var table + environment-variables.md reference
Closes community request from Stuart on Discord.
Two linked fixes for MiniMax Anthropic-compatible fallback:
1. Normalize httpx.URL to str before calling .rstrip() in auth/provider
detection helpers. Some client objects expose base_url as httpx.URL,
not str — crashed with AttributeError in _requires_bearer_auth() and
_is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback()
to use the already-stringified fb_base_url instead of raw httpx.URL.
2. Strip Anthropic-proprietary thinking block signatures when targeting
third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry,
self-hosted proxies). These endpoints cannot validate Anthropic's
signatures and reject them with HTTP 400 'Invalid signature in
thinking block'. Now threads base_url through convert_messages_to_anthropic()
→ build_anthropic_kwargs() so signature management is endpoint-aware.
Based on PR #4945 by kshitijk4poor (rstrip fix).
Fixes#4944.
Users were setting model.provider to "main" after reading the auxiliary
provider docs, causing "Unknown provider" errors. The "main" alias is
only valid inside auxiliary:, compression:, and fallback_model: configs
where it means "use the same provider as my main agent chat."
Added warning admonitions and inline clarifications to:
- configuration.md: Auxiliary Models provider list and Provider Options table
- fallback-providers.md: Provider Options for Auxiliary Tasks table
Reported by community member cn on Discord.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
* fix(nix): export HERMES_HOME system-wide when addToSystemPackages is true
The `addToSystemPackages` option's documentation (and the `:::tip` block in
`website/docs/getting-started/nix-setup.md`) promises that enabling it both
puts the `hermes` CLI on PATH and sets `HERMES_HOME` system-wide so interactive
shells share state with the gateway service. The module only did the former,
so running `hermes` in a user shell silently created a separate `~/.hermes/`
directory instead of the managed `${stateDir}/.hermes`.
Implement the documented behavior by also setting
`environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes"` in the same
mkIf block, and update the option description to match.
Fixes#6044
* fix(nix): preserve group-readable permissions in managed mode
The NixOS module sets HERMES_HOME directories to 0750 and files to 0640
so interactive users in the hermes group can share state with the gateway
service. Two issues prevented this from working:
1. hermes_cli/config.py: _secure_dir() unconditionally chmod'd HERMES_HOME
to 0700 on every startup, overwriting the NixOS module's 0750. Similarly,
_secure_file() forced 0600 on config files. Both now skip in managed mode
(detected via .managed marker or HERMES_MANAGED env var).
2. nix/nixosModules.nix: the .env file was created with 0600 (owner-only),
while config.yaml was already 0640 (group-readable). Changed to 0640 for
consistency — users granted hermes group membership should be able to read
the managed .env.
Verified with a NixOS VM integration test: a normal user in the hermes group
can now run `hermes version` and `hermes config` against the managed
HERMES_HOME without PermissionError.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: zerone0x <zerone0x@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
- Add HERMES_QWEN_BASE_URL to OPTIONAL_ENV_VARS in config.py (was missing
despite being referenced in code)
- Remove redundant qwen-oauth entry from _API_KEY_PROVIDER_AUX_MODELS
(non-aggregator providers use their main model for aux tasks automatically)
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
Fixes#6259. Three bugs fixed:
1. Config key mismatch: _get_client() and _start_daemon() read
'llmApiKey' (camelCase) but save_config() stores 'llm_api_key'
(snake_case). The config value was never read — only the env var
fallback worked.
2. Missing base URL support: users on OpenRouter or custom endpoints
had no way to configure HINDSIGHT_API_LLM_BASE_URL through setup.
Added llm_base_url to config schema with empty default, passed
conditionally to HindsightEmbedded constructor.
3. Daemon config change detection: config_changed now also checks
HINDSIGHT_API_LLM_BASE_URL, and the daemon profile .env includes
the base URL when set.
Keeps HINDSIGHT_API_LLM_API_KEY (with double API) in the daemon
profile .env — this matches the upstream hindsight .env.example
convention.