Fixes several outright-wrong facts and gaps vs current main:
- venv activation: .venv is preferred, venv is fallback (per run_tests.sh)
- AIAgent default model is "" (empty, resolved from config), not hardcoded opus
- Test suite is ~15k tests / ~700 files, not ~3000
- tools/mcp_tool.py is 2.6k LOC, not 1050
- Remove stale "currently 5" config_version note; the real bump-trigger rule
is migration-only, not every new key
- Remove MESSAGING_CWD as the messaging cwd — it's been removed in favor of
terminal.cwd in config.yaml (gateway bridges to TERMINAL_CWD env var)
- .env is secrets-only; non-secret settings belong in config.yaml
- simple_term_menu pitfall: existing sites are legacy fallback, rule is
no new usage
Incomplete/missing sections filled in:
- Gateway platforms list updated to reflect actual adapters (matrix,
mattermost, email, sms, dingtalk, wecom, weixin, feishu, bluebubbles,
webhook, api_server, etc.)
- New 'Plugins' section covering general plugins, memory-provider plugins,
and dashboard/context-engine/image-gen plugin directories — including
the May 2026 rule that plugins must not touch core files
- New 'Skills' section covering skills/ vs optional-skills/ split and
SKILL.md frontmatter fields
- Logs section pointing at ~/.hermes/logs/ and 'hermes logs' CLI
- Prompt-cache policy now explicitly mentions --now / deferred slash-command
invalidation pattern
- Two new pitfalls: gateway two-guard dispatch rule, squash-merge-from-stale
branch silent revert, don't-wire-dead-code rule
Tree layout trimmed to load-bearing entry points — per-file subtrees were
~70% stale so replaced with directory-level notes pointing readers at the
filesystem as the source of truth.
- Drop broken tinker-atropos submodule instructions: no .gitmodules exists,
tinker-atropos/ is empty, and atroposlib + tinker are regular pip deps in
pyproject.toml pulled in by .[all,dev]. Replace with a one-line note.
- CLI vs Messaging table: /skills is cli_only=True in COMMAND_REGISTRY, so
remove it from the messaging column. /<skill-name> still works there.
- Point contributors at scripts/run_tests.sh (the canonical runner enforcing
CI-parity env) instead of bare pytest.
Follow-up to Magaav's safe sync policy. Two gaps in the canonicalizer
caused false diffs or silent drift:
1. discord.py's AppCommand.to_dict() omits nsfw, dm_permission, and
default_member_permissions — those live only on attributes. The
canonicalizer was reading them via payload.get() and getting defaults
(False/True/None), while the desired side from Command.to_dict(tree)
had the real values. Any command using non-default permissions
false-diffed on every startup. Pull them from the AppCommand
attributes via _existing_command_to_payload().
2. contexts and integration_types weren't canonicalized at all, so
drift in either was silently ignored. Added both to
_canonicalize_app_command_payload (sorted for stable compare).
Also normalized default_member_permissions to str-or-None since the
server emits strings but discord.py stores ints locally.
Added regression tests for both gaps.
Replaces blind tree.sync() on every Discord reconnect with a diff-based
reconcile. In safe mode (default), fetch existing global commands,
compare desired vs existing payloads, skip unchanged, PATCH changed,
recreate when non-patchable metadata differs, POST missing, and delete
stale commands one-by-one. Keeps 'bulk' for legacy behavior and 'off'
to skip startup sync entirely.
Fixes restart-heavy workflows that burn Discord's command write budget
and can surface 429s when iterating on native slash commands.
Env var: DISCORD_COMMAND_SYNC_POLICY (safe|bulk|off), default 'safe'.
Co-authored-by: Codex <codex@openai.invalid>
- _stdio_pids: set → Dict[int,str] tracks pid→server_name
- SIGTERM-first with 2s grace before SIGKILL escalation
- hasattr guard for SIGKILL on platforms without it
- Updated tests for dict-based tracking and 3-phase kill sequence
float(os.getenv(...)) at module level raises ValueError on any
non-numeric value, crashing the web server at import before it starts.
Wrap in try/except with a warning log and fallback to 3.0s.
The original regex only matched relative paths (./foo/.env or bare
.env), so the exact command from the bug report —
`cp /opt/data/.env.local /opt/data/.env` — did not trigger approval.
Broaden the leading-path prefix to accept an absolute leading slash
alongside ./ and ../, and add regressions for the bug-report command
and its redirection variant.
cmd_update no longer SIGKILLs in-flight agent runs, and users get
'still working' status every 3 min instead of 10. Two long-standing
sources of '@user — agent gives up mid-task' reports on Telegram and
other gateways.
Drain-aware update:
- New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid,
drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid,
0) until the process exits or the budget expires.
- cmd_update's systemd loop now reads MainPID via 'systemctl show
--property=MainPID --value' and tries the graceful path first. The
gateway's existing SIGUSR1 handler -> request_restart(via_service=
True) -> drain -> exit(75) is wired in gateway/run.py and is
respawned by systemd's Restart=on-failure (and the explicit
RestartForceExitStatus=75 on newer units).
- Falls back to 'systemctl restart' when MainPID is unknown, the
drain budget elapses, or the unit doesn't respawn after exit (older
units missing Restart=on-failure). Old install behavior preserved.
- Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the
drain loop in run_agent + final exit have room before fallback
fires. Composes with #14728's tool-subprocess reaping.
Notification interval:
- agent.gateway_notify_interval default 600 -> 180.
- HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py
matched.
- 9-minute weak-model spinning runs now ping at 3 min and 6 min
instead of 27 seconds before completion, removing the 'is the bot
dead?' reflex that drives gateway-restart cycles.
Tests:
- Two new tests in tests/hermes_cli/test_update_gateway_restart.py:
one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called
when MainPID is known and the helper succeeds; one asserts the
fallback fires when the helper returns False.
- E2E: spawned detached bash processes confirm the helper returns
True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring
processes (timeout). Verified non-existent PID and pid=0 edge cases.
- 41/41 in test_update_gateway_restart.py (was 39, +2 new).
- 154/154 in shutdown-related suites including #14728's new tests.
Reported by @GeoffWellman and @ANT_1515 on X.
Closes#11616.
The agent's API retry loop hardcoded max_retries = 3, so users with
fallback providers on flaky primaries burned through ~3 × provider
timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a
chance to kick in.
Expose a new config key:
agent:
api_max_retries: 3 # default unchanged
Set it to 1 for fast failover when you have fallback providers, or
raise it if you prefer longer tolerance on a single provider. Values
< 1 are clamped to 1 (single attempt, no retry); non-integer values
fall back to the default.
This wraps the Hermes-level retry loop only — the OpenAI SDK's own
low-level retries (max_retries=2 default) still run beneath this for
transient network errors.
Changes:
- hermes_cli/config.py: add agent.api_max_retries default 3 with comment.
- run_agent.py: read self._api_max_retries in AIAgent.__init__; replace
hardcoded max_retries = 3 in the retry loop with self._api_max_retries.
- cli-config.yaml.example: documented example entry.
- hermes_cli/tips.py: discoverable tip line.
- tests/run_agent/test_api_max_retries_config.py: 4 tests covering
default, override, clamp-to-one, and invalid-value fallback.
Closes#8202.
Root cause: stop() reclaimed tool-call bash/sleep children only at the
very end of the shutdown sequence — after a 60s drain, 5s interrupt
grace, and per-adapter disconnect. Under systemd (TimeoutStopSec bounded
by drain_timeout), that meant the cgroup SIGKILL escalation fired first,
and systemd reaped the bash/sleep children instead of us.
Fix:
- Extract tool-subprocess cleanup into a local helper
_kill_tool_subprocesses() in _stop_impl().
- Invoke it eagerly right after _interrupt_running_agents() on the
drain-timeout path, before adapter disconnect.
- Keep the existing catch-all call at the end for the graceful path
and defense in depth against mid-teardown respawns.
- Bump generated systemd unit TimeoutStopSec to drain_timeout + 30s
so cleanup + disconnect + DB close has headroom above the drain
budget, matching the 'subprocess timeout > TimeoutStopSec + margin'
rule from the skill.
Tests:
- New: test_gateway_stop_kills_tool_subprocesses_before_adapter_disconnect_on_timeout
asserts kill_all() runs before disconnect() when drain times out.
- New: test_gateway_stop_kills_tool_subprocesses_on_graceful_path
guards that the final catch-all still fires when drain succeeds
(regression guard against accidental removal during refactor).
- Updated: existing systemd unit generator tests expect TimeoutStopSec=90
(= 60s drain + 30s headroom) with explanatory comment.
Previously delegate_task exposed 'max_iterations' in its JSON schema and used
`max_iterations or default_max_iter` — so a model guessing conservatively (or
copy-pasting a docstring hint like 'Only set lower for simple tasks') could
silently shrink a subagent's budget below the user's configured
delegation.max_iterations. One such call this session capped a deep forensic
audit at 40 iterations while the user's config was set to 250.
Changes:
- Drop 'max_iterations' from DELEGATE_TASK_SCHEMA['parameters']['properties'].
Models can no longer emit it.
- In delegate_task(): ignore any caller-supplied max_iterations, always use
delegation.max_iterations from config. Log at debug if a stale schema or
internal caller still passes one through.
- Keep the Python kwarg on the function signature for internal callers
(_build_child_agent tests pass it through the plumbing layer).
- Update test_schema_valid to assert the param is now absent (intentional
contract change, not a change-detector).
A test in tests/agent/test_credential_pool.py
(test_try_refresh_current_updates_only_current_entry) monkeypatched
refresh_codex_oauth_pure() to return the literal fixture strings
'access-new'/'refresh-new', then executed the real production code path
in agent/credential_pool.py::try_refresh_current which calls
_sync_device_code_entry_to_auth_store → _save_provider_state → writes
to `providers.openai-codex.tokens`. That writer resolves the target via
get_hermes_home()/auth.json. If the test ran with HERMES_HOME unset (direct
pytest invocation, IDE runner bypassing conftest discovery, or any other
sandbox escape), it would overwrite the real user's auth store with the
fixture strings.
Observed in the wild: Teknium's ~/.hermes/auth.json providers.openai-codex.tokens
held 'access-new'/'refresh-new' for five days. His CLI kept working because
the credential_pool entries still held real JWTs, but `hermes model`'s live
discovery path (which reads via resolve_codex_runtime_credentials →
_read_codex_tokens → providers.tokens) was silently 401-ing.
Fixes:
- Delete test_try_refresh_current_updates_only_current_entry. It was the
only test that exercised a writer hitting providers.openai-codex.tokens
with literal stub tokens. The entry-level rotation behavior it asserted
is still covered by test_mark_exhausted_and_rotate_persists_status above.
- Add a seat belt in hermes_cli.auth._auth_file_path(): if PYTEST_CURRENT_TEST
is set AND the resolved path equals the real ~/.hermes/auth.json, raise
with a clear message. In production (no PYTEST_CURRENT_TEST), a single
dict lookup. Any future test that forgets to monkeypatch HERMES_HOME
fails loudly instead of corrupting the user's credentials.
Validation:
- production (no PYTEST_CURRENT_TEST): returns real path, unchanged behavior
- pytest + HERMES_HOME unset (points at real home): raises with message
- pytest + HERMES_HOME=/tmp/...: returns tmp path, tests pass normally
Dashboard themes now control typography and layout, not just colors.
Each built-in theme picks its own fonts, base size, radius, and density
so switching produces visible changes beyond hue.
Schema additions (per theme):
- typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize,
lineHeight, letterSpacing. fontUrl is injected as <link> on switch
so Google/Bunny/self-hosted stylesheets all work.
- layout — radius (any CSS length) and density
(compact | comfortable | spacious, multiplies Tailwind spacing).
- colorOverrides (optional) — pin individual shadcn tokens that would
otherwise derive from the palette.
Built-in themes are now distinct beyond palette:
- default — system stack, 15px, 0.5rem radius, comfortable
- midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable
- ember — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem
- mono — IBM Plex Sans + Mono, 13px, 0 radius, compact
- cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact
- rose — Fraunces (serif) + DM Mono, 16px, 1rem, spacious
Also fixes two bugs:
1. Custom user themes silently fell back to default. ThemeProvider
only applied BUILTIN_THEMES[name], so YAML files in
~/.hermes/dashboard-themes/ showed in the picker but did nothing.
Server now ships the full normalised definition; client applies it.
2. Docs documented a 21-token flat colors schema that never matched
the code (applyPalette reads a 3-layer palette). Rewrote the
Themes section against the actual shape.
Implementation:
- web/src/themes/types.ts: extend DashboardTheme with typography,
layout, colorOverrides; ThemeListEntry carries optional definition.
- web/src/themes/presets.ts: 6 built-ins with distinct typography+layout.
- web/src/themes/context.tsx: applyTheme() writes palette+typography+
layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the
fallback-to-default bug via resolveTheme(name).
- web/src/index.css: html/body/code read the new theme-font vars;
--radius-sm/md/lg/xl derive from --theme-radius; --spacing scales
with --theme-spacing-mul so Tailwind utilities shift with density.
- hermes_cli/web_server.py: _normalise_theme_definition() parses loose
YAML (bare hex strings, partial blocks) into the canonical wire
shape; /api/dashboard/themes ships full definitions for user themes.
- tests/hermes_cli/test_web_server.py: 16 new tests covering the
normaliser and discovery (rejection cases, clamping, defaults).
- website/docs/user-guide/features/web-dashboard.md: rewrite Themes
section with real schema, per-model tables, full YAML example.
OpenAI launched GPT-5.5 on Codex today (Apr 23 2026). Adds it to the static
catalog and pipes the user's OAuth access token into the openai-codex path of
provider_model_ids() so /model mid-session and the gateway picker hit the
live ChatGPT codex/models endpoint — new models appear for each user
according to what ChatGPT actually lists for their account, without a Hermes
release.
Verified live: 'gpt-5.5' returns priority 0 (featured) from the endpoint,
400k context per OpenAI's launch article. 'hermes chat --provider
openai-codex --model gpt-5.5' completes end-to-end.
Changes:
- hermes_cli/codex_models.py: add gpt-5.5 to DEFAULT_CODEX_MODELS + forward-compat
- agent/model_metadata.py: 400k context length entry
- hermes_cli/models.py: resolve codex OAuth token before calling
get_codex_model_ids() in provider_model_ids('openai-codex')
Trim comment noise, remove redundant typing, normalize sticky prompt viewport args to top→bottom order, and reuse one sticky viewport helper instead of duplicating the math.
Sticky prompt selection only considered the top edge of the viewport, so it could keep showing an older user prompt even when a newer one was already visible lower down. Suppress sticky output whenever a user message is visible in the viewport and cover it with a regression test.
Renderer-driven follow-to-bottom was restoring the viewport to the tail without notifying ScrollBox subscribers, so StickyPromptTracker could stay stale-visible. Notify on render-time scroll/sticky changes and treat near-bottom as bottom for prompt hiding.
When the viewport is away from the bottom, keep the last visible progress snapshot instead of rebuilding the streaming/thinking subtree on every turn-store update. This cuts scroll-time churn while preserving live updates near the tail and on turn completion.
Commit 43de1ca8 removed the _nr_to_assistant_message shim in favor of
duck-typed properties on the ToolCall dataclass. However, the
extra_content property (which carries the Gemini thought_signature) was
omitted from the ToolCall definition. This caused _build_assistant_message
to silently drop the signature via getattr(tc, 'extra_content', None)
returning None, leading to HTTP 400 errors on subsequent turns for all
Gemini 3 thinking models.
Add the extra_content property to ToolCall (matching the existing
call_id and response_item_id pattern) so the thought_signature round-trips
correctly through the transport → agent loop → API replay path.
Credit to @celttechie for identifying the root cause and providing the fix.
Closes#14488
Broaden the settle repaint from xterm.js-only to all alt-screen terminals. Ink upstream and ConPTY/xterm reports point to resize/reflow desync as a general stale-cell class, not a host-specific quirk.
Copilot flagged the variable as unused. LogUpdate.render only sees prev/next, so a simulated "physical terminal" has no hook in the public API. Kept the narrative in the comment and tightened the assertion to demonstrate the test's actual invariant: identical prev/next emits no heal patches.
Replace 28-line guard + nested queueMicrotask + pendingResizeRender flag-reuse with a named canAltScreenRepaint predicate and a single flat paint. setTimeout already drained the burst coalescer; the nested defer and flag dance were paranoia.
Three bugs fixed in model alias resolution:
1. resolve_alias() returned the FIRST catalog match with no version
preference. '/model mimo' picked mimo-v2-omni (index 0 in dict)
instead of mimo-v2.5-pro. Now collects all prefix matches, sorts
by version descending with pro/max ranked above bare names, and
returns the highest.
2. models.dev registry missing newly added models (e.g. v2.5 for
native xiaomi). resolve_alias() now merges static _PROVIDER_MODELS
entries into the catalog so models resolve immediately without
waiting for models.dev to sync.
3. hermes model picker showed only models.dev results (3 xiaomi models),
hiding curated entries (5 total). The picker now merges curated
models into the models.dev list so all models appear.
Also fixes a trailing-dot float parsing edge case in _model_sort_key
where '5.4.' failed float() and multi-dot versions like '5.4.1'
weren't parsed correctly.
- run the xterm.js settle-heal pass through a full render commit instead of diff-only scheduleRender
- guard against overlapping resize renders and clear settle timers on unmount
## Merged
Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard.
### Changes
- Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144)
- Provider lists: xiaomi, opencode-go, setup wizard
- Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal)
- Config description updated for XIAOMI_API_KEY
- Tests updated for new vision model preference
### Verification
- 4322 tests passed, 0 new regressions
- Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass
- Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)
- force full alt-screen damage in xterm.js hosts to avoid stale glyph artifacts
- skip incremental scroll optimization there and repaint from a cleared screen atomically
Replaces the blanket 'always allow' change from the previous commit with
an opt-in config flag so users who want belt-and-suspenders security can
still get the keyword scan on skill_manage output.
## Default behavior (flag off)
skill_manage(action='create'|'edit'|'patch') no longer runs the keyword
scanner. The agent can write skills that mention risky keywords in prose
(documenting what reviewers should watch for, describing cache-bust
semantics in a PR-review skill, referencing AGENTS.md, etc.) without
getting blocked.
Rationale: the agent can already execute the same code paths via
terminal() with no gate, so the scan adds friction without meaningful
security against a compromised or malicious agent.
## Opt-in behavior (flag on)
Set skills.guard_agent_created: true in config.yaml to get the original
behavior back. Scanner runs on every skill_manage write; dangerous
verdicts surface as a tool error the agent can react to (retry without
the flagged content).
## External hub installs unaffected
trusted/community sources (hermes skills install) always get scanned
regardless of this flag. The gate is specifically for skill_manage,
which only agents call.
## Changes
- hermes_cli/config.py: add skills.guard_agent_created: False to DEFAULT_CONFIG
- tools/skill_manager_tool.py: _guard_agent_created_enabled() reads the flag;
_security_scan_skill() short-circuits to None when the flag is off
- tools/skills_guard.py: restore INSTALL_POLICY['agent-created'] =
('allow', 'allow', 'ask') so the scan remains strict when it does run
- tests/tools/test_skills_guard.py: restore original ask/force tests
- tests/tools/test_skill_manager_tool.py: new TestSecurityScanGate class
covering both flag states + config error handling
## Validation
- tests/tools/test_skills_guard.py + test_skill_manager_tool.py: 115/115 pass
- E2E: flagged-keyword skill creates with default config, blocks with flag on
The security scanner is meant to protect against hostile external skills
pulled from GitHub via hermes skills install — trusted/community policies
block or ask on dangerous verdicts accordingly. But agent-created skills
(from skill_manage) run in the same process as the agent that wrote them.
The agent can already execute the same code paths via terminal() with no
gate, so the ask-on-dangerous policy adds friction without meaningful
security.
Concrete trigger: an agent writing a PR-review skill that describes
cache-busting or persistence semantics in prose gets blocked because
those words appear in the patterns list. The skill isn't actually doing
anything dangerous — it's just documenting what reviewers should watch
for in other PRs.
Change: agent-created dangerous verdict maps to 'allow' instead of 'ask'.
External hub installs (trusted/community) keep their stricter policies
intact. Tests updated: renamed test_dangerous_agent_created_asks →
test_dangerous_agent_created_allowed; renamed force-override test and
updated assertion since force is now a no-op for agent-created (the allow
branch returns first).
Before this, _process_message_background's finally did an unconditional
'del self._active_sessions[session_key]' — even if a /stop/ /new
command had already swapped in its own command_guard via
_dispatch_active_session_command and cancelled us. The old task's
unwind would clobber the newer guard, opening a race for follow-ups.
Replace with _release_session_guard(session_key, guard=interrupt_event)
so the delete only fires when the guard we captured is still the one
installed. The sibling _session_tasks pop already had equivalent
ownership matching via asyncio.current_task() identity; this closes the
asymmetry.
Adds two direct regressions in test_session_split_brain_11016:
- stale guard reference must not clobber a newer guard by identity
- guard=None default still releases unconditionally (for callers that
don't have a captured guard to match against)
Refs #11016
Covers all three layers of the salvaged fix:
1. Adapter-side cancellation: /stop, /new, /reset cancel the in-flight
adapter task, release the guard, and let follow-up messages through;
/new keeps the guard installed until the runner response lands, then
drains the queued follow-up in order.
2. Adapter-side self-heal: a split-brain guard (done owner task, lock
still live) is healed on the next inbound message and the user gets
a reply instead of being trapped in infinite busy acks. A guard
with no recorded owner task is NOT auto-healed (protects fixtures
that install guards directly).
3. Runner-side generation guard: stale async runs whose generation was
bumped by /stop or /new cannot clear a newer run's _running_agents
slot on the way out.
11 tests, all green.
Refs #11016
Closes the runner-side half of the split-brain described in issue #11016
by wiring the existing _session_run_generation counter through the
session-slot promotion and release paths.
Without this, an older async run could still:
- promote itself from sentinel to real agent after /stop or /new
invalidated its run generation
- clear _running_agents on the way out, deleting a newer run's slot
Both races leave _running_agents desynced from what the user actually
has in flight, which is half of what shows up as 'No active task to
stop' followed by late 'Interrupting current task...' acks.
Changes:
- track_agent() in _run_agent now calls _is_session_run_current() before
writing the real agent into _running_agents[session_key]; if /stop or
/new bumped the generation while the agent was spinning up, the slot
is left alone (the newer run owns it).
- _release_running_agent_state() gained an optional run_generation
keyword. When provided, it only clears the slot if the generation is
still current. The final cleanup at the tail of _run_agent passes the
run's generation so an old unwind can't blow away a newer run's state.
- Returns bool so callers can tell when a release was blocked.
All the existing call sites that do NOT pass run_generation behave
exactly as before — this is a strict additive guard.
Refs #11016