Both /queue and /quit registered 'q' as an alias. Since /quit appeared
later in COMMAND_REGISTRY, _build_command_lookup() silently overwrote
/queue's claim, making the documented /queue shorthand unusable.
Fix: remove 'q' from /quit's aliases. /quit already has 'exit' as an
alias plus the full '/quit' command. /queue has no other short alias.
Closes#10467
The recovery block previously only retried (continue) when one of the
per-component sanitization checks (messages, tools, system prompt,
headers, credentials) found and stripped non-ASCII content. When the
non-ASCII lived only in api_messages' reasoning_content field (which
is built from messages['reasoning'] and not checked by the original
_sanitize_messages_non_ascii), all checks returned False and the
recovery fell through to the normal error path — burning a retry
attempt despite _force_ascii_payload being set.
Now the recovery always continues (retries) when _is_ascii_codec is
detected. The _force_ascii_payload flag guarantees the next iteration
runs _sanitize_structure_non_ascii(api_kwargs) on the full API payload,
catching any remaining non-ASCII regardless of where it lives.
Also adds test for the 'reasoning' field on canonical messages.
Fixes#6843
The ASCII-locale recovery path in run_agent.py sanitized the canonical
'messages' list but left 'api_messages' untouched. api_messages is a
separate API-copy built before the retry loop and may carry extra fields
(reasoning_content, extra_body entries) that are not present in
'messages'. This caused the retry to still raise UnicodeEncodeError even
after the 'System encoding is ASCII — stripped...' log line appeared.
Two changes:
- _sanitize_messages_non_ascii now walks all extra top-level string fields
in each message dict (any key not in {content, name, tool_calls, role})
so reasoning_content and future extras are cleaned in both 'messages'
and 'api_messages'.
- The ASCII-codec recovery block now also calls sanitize on api_messages
and api_kwargs so no non-ASCII survives into the next retry attempt.
Adds regression tests covering:
- reasoning_content with non-ASCII in api_messages
- extra_body with non-ASCII in api_kwargs
- canonical messages clean but api_messages dirty
Fixes#6843
Commentary messages (interim assistant status updates like "Using browser
tool...") are sent via _send_commentary(), which was incorrectly setting
_already_sent = True on success. This caused the final response to be
suppressed when there were multiple tool calls, because the gateway checks
already_sent to decide whether to skip re-sending the response.
The fix: commentary messages are interim status updates, not the final
response, so _already_sent should not be set when they succeed. This
ensures the final response is always delivered regardless of how many
commentary messages were sent during the turn.
Fixes: #10454
Route kimi-coding-cn through _resolve_kimi_base_url() in both
get_api_key_provider_status() and resolve_api_key_provider_credentials()
so CN users with sk-kimi- prefixed keys get auto-detected to the Kimi
Coding Plan endpoint, matching the existing behavior for kimi-coding.
Also update the kimi-coding display label to accurately reflect the
dual-endpoint setup (Kimi Coding Plan + Moonshot API).
Salvaged from PR #10525 by kkikione999.
_install_tirith() uses shutil.move() to place the binary from tmpdir
to ~/.hermes/bin/. When these are on different filesystems (common in
Docker, NFS), shutil.move() falls back to copy2 + unlink, but copy2's
metadata step can raise PermissionError. This exception propagated
past the fail_open guard, crashing the terminal tool entirely.
Additionally, a failed install could leave a non-executable tirith
binary at the destination, causing a retry loop on every subsequent
terminal command.
Fix:
- Catch OSError from shutil.move() and fall back to shutil.copy()
(skips metadata/xattr copying that causes PermissionError)
- If even copy fails, clean up the partial dest file to prevent
the non-executable retry loop
- Return (None, 'cross_device_copy_failed') so the failure routes
through the existing install-failure caching and fail_open logic
Closes#10127
Clarifies that tool-level access restrictions are not security boundaries
when the agent has unrestricted terminal access. Deny lists only matter
when paired with equivalent terminal-side restrictions (like WRITE_DENIED_PATHS
pairs with the dangerous command approval system).
After clear_session_vars() reset contextvars to their default (''),
get_session_env() treated the empty string as falsy and fell through
to os.environ — resurrecting stale HERMES_SESSION_* values from CLI
startup, cron, or previous sessions. This broke session isolation
in the gateway where concurrent messages could see each other's
stale environment values.
Fix: use a sentinel (_UNSET) as the contextvar default instead of ''.
get_session_env() now checks 'value is not _UNSET' instead of
truthiness. Three states are cleanly distinguished:
- _UNSET (never set): fall back to os.environ (CLI/cron compat)
- '' (explicitly cleared): return '' — no os.environ fallback
- 'telegram' (actively set): return the value
clear_session_vars() now uses var.set('') instead of var.reset(token)
to mark vars as explicitly cleared rather than reverting to _UNSET.
Closes#10304
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery. The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.
Two changes:
1. gateway/run.py return path: skip already_sent suppression when the
final_response is "(empty)" or empty — the user needs to know the
agent finished even if streaming sent partial content earlier.
2. gateway/run.py response handler: convert the internal "(empty)"
sentinel to a user-friendly warning instead of delivering the raw
sentinel string.
Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
Memory provider discovery (discover_memory_providers, load_memory_provider)
only scanned the bundled plugins/memory/ directory. User-installed providers
at $HERMES_HOME/plugins/<name>/ were invisible, forcing users to symlink
into the repo source tree — which broke on hermes update and created a
dual-registration path causing duplicate tool names (400 errors on strict
providers like Xiaomi MiMo).
Changes:
- Add _get_user_plugins_dir(), _is_memory_provider_dir(), _iter_provider_dirs(),
and find_provider_dir() helpers to plugins/memory/__init__.py
- discover_memory_providers() now scans both bundled and user dirs
- load_memory_provider() uses find_provider_dir() (bundled-first)
- discover_plugin_cli_commands() uses find_provider_dir()
- _install_dependencies() in memory_setup.py uses find_provider_dir()
- User plugins use _hermes_user_memory namespace to avoid sys.modules collisions
- Non-memory user plugins filtered via source text heuristic
- Bundled providers always take precedence on name collisions
Fixes#4956, #9099. Supersedes #4987, #9123, #9130, #9132, #9982.
Discord's _register_slash_commands() had a hardcoded list of ~27 commands
while COMMAND_REGISTRY defines 34+ gateway-available commands. Missing
commands (debug, branch, rollback, snapshot, profile, yolo, fast, reload,
commands) were invisible in Discord's / autocomplete — users couldn't
discover them.
Add a dynamic catch-all loop after the explicit registrations that
iterates COMMAND_REGISTRY, skips already-registered commands, and
auto-registers the rest using discord.app_commands.Command(). Commands
with args_hint get an optional string parameter; parameterless commands
get a simple callback.
This ensures any future commands added to COMMAND_REGISTRY automatically
appear on Discord without needing a manual entry in discord.py.
Telegram and Slack already derive dynamically from COMMAND_REGISTRY
via telegram_bot_commands() and slack_subcommand_map() — no changes
needed there.
update_job() assumed the schedule value was always a pre-parsed dict
and called .get() on it directly. When the API passes a raw string
like "every 10m", this crashed with AttributeError.
The create path already handles this correctly by calling
parse_schedule() on the incoming string. The fix adds the same
normalization to the update path: if the schedule is a string,
parse it into a dict before proceeding.
Closes#10129
When a user runs /browser connect to attach browser tools to their real
Chrome instance via CDP, the BROWSER_CDP_URL env var is set. However,
every browser tool function checks _is_camofox_mode() first, which
short-circuits to the Camofox backend before _get_session_info() ever
checks for the CDP override.
Fix: is_camofox_mode() now returns False when BROWSER_CDP_URL is set,
so the explicit CDP connection takes priority. This is the correct
behavior — /browser connect is an intentional user override.
Reported by SkyLinx on Discord.
Models (especially open-source like qwen3.5-plus) may send non-int values
for the limit parameter — None (JSON null), string, or even a type object.
This caused TypeError: '<=' not supported between instances of 'int' and
'type' when the value reached min()/comparison operations.
Changes:
- Add defensive int coercion at session_search() entry with fallback to 3
- Clamp limit to [1, 5] range (was only capped at 5, not floored)
- Add tests for None, type object, string, negative, and zero limit values
Reported by community user ludoSifu via Discord.
Memory provider plugins (e.g. Mnemosyne) can register tools via two paths:
1. Plugin system (ctx.register_tool) → tool registry → get_tool_definitions()
2. Memory manager → get_all_tool_schemas() → direct append in AIAgent.__init__
Path 2 blindly appended without checking if path 1 already added the same
tool names. This created duplicate function names in the tools array sent
to the API. Most providers silently handle duplicates, but Xiaomi MiMo
(via Nous Portal) strictly rejects them with a 400 Bad Request.
Fix: build a set of existing tool names before memory manager injection
and skip any tool whose name is already present.
Confirmed via live testing against Nous Portal:
- Unique tool names → 200 OK
- Duplicate tool names → 400 'Provider returned error'
Python's json.dumps() defaults to ensure_ascii=True, escaping non-ASCII
characters to \uXXXX sequences. For CJK characters this inflates
token count 3-4x — a single Chinese character like '中' becomes
'\u4e2d' (6 chars vs 3 bytes, ~6 tokens vs ~1 token).
Since MCP tool results feed directly into the model's conversation
context, this silently multiplied API costs for Chinese, Japanese,
and Korean users.
Fix: add ensure_ascii=False to all 20 json.dumps calls in mcp_tool.py.
Raw UTF-8 is valid JSON per RFC 8259 and all downstream consumers
(LLM APIs, display) handle it correctly.
Closes#10234
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
notices, and the delete subcommand
Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
Two gateway fixes:
1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)
Previously, is_duplicate() returned True for any previously seen ID
without checking its age — expired entries were only purged when cache
size exceeded max_size. On normal workloads that never overflow, message
IDs stayed deduplicated forever instead of expiring after the TTL.
Fix: check `now - timestamp < ttl` before returning True. Expired
entries are removed and treated as new messages.
2. Gateway --config flag now uses yaml.safe_load() (#10216)
The --config CLI flag in gateway/run.py main() used json.load() to
parse config files. YAML is the only documented config format and
every other config loader uses yaml.safe_load(). A YAML config file
passed via --config would crash with json.JSONDecodeError.
Closes#10306Closes#10216
The on_memory_write bridge that notifies external memory providers
(ClawMem, retaindb, supermemory, etc.) of built-in memory writes was
only present in the concurrent tool execution path (_invoke_tool).
The sequential path (_execute_tool_calls_sequential) — which handles
all single tool calls, the common case — was missing it entirely.
This meant external memory providers silently missed every single-call
memory write, which is the vast majority of memory operations.
Fix: add the identical bridge block to the sequential path, right
after the memory_tool call returns.
Closes#10174
Multiple gaps in activity tracking could cause the gateway's inactivity
timeout to fire while the agent is actively working:
1. Streaming wait loop had no periodic heartbeat — the outer thread only
touched activity when the stale-stream detector fired (180-300s), and
for local providers (Ollama) the stale timeout was infinity, meaning
zero heartbeats. Now touches activity every 30s.
2. Concurrent tool execution never set the activity callback on worker
threads (threading.local invisible across threads) and never set
_current_tool. Workers now set the callback, and the concurrent wait
uses a polling loop with 30s heartbeats.
3. Modal backend's execute() override had its own polling loop without
any activity callback. Now matches _wait_for_process cadence (10s).
The _last_content_with_tools fallback was firing indiscriminately for ALL
content+tool turns, including mid-task narration alongside substantive
tools (terminal, search_files, etc.). This caused the agent to exit
the loop with 'I'll scan the directory...' as the final answer instead
of nudging the model to continue processing tool results.
The fix restricts the fallback to housekeeping-only turns (memory, todo,
skill_manage, session_search) where the content genuinely IS the final
answer. When substantive tools are present, the existing post-tool
nudge mechanism now fires instead, prompting the model to continue.
Affected models: xiaomi/mimo-v2-pro, GLM-5, and other weaker models
that intermittently return empty after tool results.
Reported by user Renaissance on Discord.
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config. In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.
Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop. If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry. This bounds cache growth
to at most one entry per unique provider config.
Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.
Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).
Closes#10200
OV transparently handles message history across /new and /compress: old
messages stay in the same session and extraction is idempotent, so there's
no need to rebind providers to a new session_id. The only thing the
session boundary actually needs is to trigger extraction.
- MemoryProvider / MemoryManager: remove on_session_reset hook
- OpenViking: remove on_session_reset override (nothing to do)
- AIAgent: replace rotate_memory_session with commit_memory_session
(just calls on_session_end, no rebind)
- cli.py / run_agent.py: single commit_memory_session call at the
session boundary before session_id rotates
- tests: replace on_session_reset coverage with routing tests for
MemoryManager.on_session_end
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hasattr-forked OpenViking-specific paths with a proper base-class
hook. Collapse the two agent wrappers into a single rotate_memory_session
so callers don't orchestrate commit + rebind themselves.
- MemoryProvider: add on_session_reset(new_session_id) as a default no-op
- MemoryManager: on_session_reset fans out unconditionally (no hasattr,
no builtin skip — base no-op covers it)
- OpenViking: rename reset_session -> on_session_reset; drop the explicit
POST /api/v1/sessions (OV auto-creates on first message) and the two
debug raise_for_status wrappers
- AIAgent: collapse commit_memory_session + reinitialize_memory_session
into rotate_memory_session(new_sid, messages)
- cli.py / run_agent.py: replace hasattr blocks and the split calls with
a single unconditional rotate_memory_session call; compression path
now passes the real messages list instead of []
- tests: align with on_session_reset, assert reset does NOT POST /sessions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The OpenViking memory provider extracts memories when its session is
committed (POST /api/v1/sessions/{id}/commit). Before this fix, the
CLI had two code paths that changed the active session_id without ever
committing the outgoing OpenViking session:
1. /new (new_session() in cli.py) — called flush_memories() to write
MEMORY.md, then immediately discarded the old session_id. The
accumulated OpenViking session was never committed, so all context
from that session was lost before extraction could run.
2. /compress and auto-compress (_compress_context() in run_agent.py) —
split the SQLite session (new session_id) but left the OpenViking
provider pointing at the old session_id with no commit, meaning all
messages synced to OpenViking were silently orphaned.
The gateway already handles session commit on /new and /reset via
shutdown_memory_provider() on the cached agent; the CLI path did not.
Fix: introduce a lightweight session-transition lifecycle alongside
the existing full shutdown path:
- OpenVikingMemoryProvider.reset_session(new_session_id): waits for
in-flight background threads, resets per-session counters, and
creates the new OV session via POST /api/v1/sessions — without
tearing down the HTTP client (avoids connection overhead on /new).
- MemoryManager.restart_session(new_session_id): calls reset_session()
on providers that implement it; falls back to initialize() for
providers that do not. Skips the builtin provider (no per-session
state).
- AIAgent.commit_memory_session(messages): wraps
memory_manager.on_session_end() without shutdown — commits OV session
for extraction but leaves the provider alive for the next session.
- AIAgent.reinitialize_memory_session(new_session_id): wraps
memory_manager.restart_session() — transitions all external providers
to the new session after session_id has been assigned.
Call sites:
- cli.py new_session(): commit BEFORE session_id changes, reinitialize
AFTER — ensuring OV extraction runs on the correct session and the
new session is immediately ready for the next turn.
- run_agent._compress_context(): same pattern, inside the
if self._session_db: block where the session_id split happens.
/compress and auto-compress are functionally identical at this layer:
both call _compress_context(), so both are fixed by the same change.
Tests added to tests/agent/test_memory_provider.py:
- TestMemoryManagerRestartSession: reset_session() routing, builtin
skip, initialize() fallback, failure tolerance, empty-manager noop.
- TestOpenVikingResetSession: session_id update, per-session state
clear, POST /api/v1/sessions call, API failure tolerance, no-client
noop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix copy-paste bug: `self._agent = user` → `self._agent = agent`
with new `agent` parameter in `_VikingClient.__init__`
- Read account/user/agent env vars in `initialize()` and pass them
to all 4 `_VikingClient` instantiations so identity headers are
consistently applied across health check, prefetch, sync, and
memory write paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change default OPENVIKING_ACCOUNT from 'root' to 'default'
- Add account and user config options to get_config_schema()
- Add session creation in initialize()
- Add reset_session() method
- Update docstring to reflect new default
This is a breaking change: existing users who relied on the 'root' account will need to either:
1. Set OPENVIKING_ACCOUNT=root in their environment, or
2. Migrate their data to the 'default' account
Future release will add support for OPENVIKING_ACCOUNT and OPENVIKING_USER in setup when API key is provided.
update desc for key setup
Gold #FFD700 has 1.4:1 contrast ratio on white — barely visible.
Replace with dark amber palette (#8B6508 primary, #7A5800 links)
that passes WCAG AA (5.3:1 and 6.5:1 respectively).
Changes:
- :root primary palette → dark amber tones for light mode
- Explicit light mode link colors (#7A5800 / #5A4100 hover)
- Light mode sidebar active state with amber accent
- Light mode table header/border styling
- Footer hover color split by theme (gold for dark, amber for light)
Dark mode is completely unchanged.
Reported by @AbrahamMat7632
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
- Populate watcher_* routing fields for watch-only processes (not just
notify_on_complete), so watch-pattern events carry direct metadata
instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
(empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
Follow-up to #10459 (salvage of #7527). The copy_context() fix propagates
ALL ContextVars into the cron worker thread, including credential_files.
This test verifies that skill-declared required_credential_files are
visible inside the worker thread, matching the existing env_passthrough
regression test.