Covers PR #18256 fix for issue #18254 — when OPENROUTER_API_KEY is set in
BOTH os.environ (stale from parent shell) and ~/.hermes/.env (fresh),
_seed_from_env must prefer the .env value. Also guards the fallback case
where .env omits the key entirely (Docker/K8s/systemd deployments that
only inject via runtime env).
`_register_skill_group` captured the skill catalog in closure variables
(`entries` and `skill_lookup`) so the single `tree.add_command` call at
startup owned the only live copy. The closure is never re-entered after
startup, so `/reload-skills` — which rescans the on-disk skills dir and
refreshes the in-process `_skill_commands` registry — had no way to
propagate results into the `/skill` autocomplete on Discord. New skills
stayed invisible in the dropdown, and deleted skills returned
"Unknown skill" when the stale autocomplete entry was clicked.
The fix is purely a dataflow change: promote `entries` and `skill_lookup`
to instance attributes (`_skill_entries`, `_skill_lookup`), split the
collector-driven rebuild into a helper (`_refresh_skill_catalog_state`),
and add a public `refresh_skill_group()` method that re-runs the helper
and is safe to call at any point after the initial registration.
The gateway's `_handle_reload_skills_command` then iterates
`self.adapters` and calls `refresh_skill_group()` on any adapter that
exposes it (currently only Discord). Both sync and async implementations
are supported; adapters that don't override the method (Telegram's
BotCommand menu, Slack subcommand map, etc.) are silently skipped — the
in-process `reload_skills()` call covers them.
No `tree.sync()` is required because Discord fetches autocomplete
options dynamically on every keystroke — mutating the instance state the
callbacks already read from is sufficient. That sidesteps the per-app
command-bucket rate limit (~5 writes / 20 s) that made the previous
bulk-sync-on-reload approach unusable (#16713 context).
Tests: tests/gateway/test_reload_skills_discord_resync.py — five cases
covering (1) refresh replaces entries, (2) entries stay sorted after
refresh, (3) collector exception leaves cached state intact, (4)
`_refresh_skill_catalog_state` populates the instance attrs, (5)
orchestrator calls `refresh_skill_group()` on sync + async adapters and
skips adapters that don't expose it.
_check_unavailable_skill is meant to turn a typed "/foo" command that
doesn't resolve into a specific hint — "disabled, enable with hermes
skills config" or "available but not installed, install with hermes
skills install …" — instead of the generic "unknown command" reply.
It was doing the match with `skill_md.parent.name.lower().replace("_", "-")`,
comparing that to the typed command. For every skill whose directory name
drifted from its declared frontmatter `name:`, that comparison failed and
the user got the unhelpful generic path. On a standard install today 19
skills have this drift, e.g.:
dir: mlops/stable-diffusion
frontmatter: name: Stable Diffusion Image Generation
registered slug (what the user types): /stable-diffusion-image-generation
dir: mlops/qdrant
frontmatter: name: Qdrant Vector Search
registered slug: /qdrant-vector-search
dir: mlops/flash-attention
frontmatter: name: Optimizing Attention Flash
registered slug: /optimizing-attention-flash
In every case, _check_unavailable_skill would fall through because
"stable-diffusion" != "stable-diffusion-image-generation", even with the
skill sitting right there on disk.
Fix: extract a small `_skill_slug_from_frontmatter` helper that reads the
SKILL.md frontmatter and normalizes exactly like scan_skill_commands
(lower, spaces/underscores → hyphens, strip non-[a-z0-9-], collapse
runs of hyphens, strip edges). Use it in both the
disabled-skills branch and the optional-skills branch. The disabled-set
membership check now uses the declared frontmatter name (which is what
`hermes skills config` writes into skills.disabled / platform_disabled),
not the slug.
Tests: five cases in tests/gateway/test_unavailable_skill_hint.py —
the drift case for the disabled branch, unknown-command negative,
matched-but-not-disabled negative, non-alnum stripping, and the drift
case for the optional-skills branch. All five fail against main and
pass with the fix.
``discord_skill_commands_by_category`` was lagging the flat
``discord_skill_commands`` collector on two counts. Both were actively
dropping skills from Discord's ``/skill`` autocomplete dropdown.
1. External-dir skills were filtered out. #18741 widened the flat
collector to accept ``SKILLS_DIR + skills.external_dirs`` but left
this sibling collector — the one ``_register_skill_group`` actually
uses on Discord — still matching ``SKILLS_DIR`` only. External
skills were visible in ``hermes skills list`` and the agent's
``/skill-name`` dispatch but silently absent from Discord's
``/skill`` picker. Widen the accepted roots to match, and derive
categories from whichever root the skill lives under so
``<ext>/mlops/foo/SKILL.md`` still lands in the ``mlops`` group.
2. 25-group × 25-subcommand caps were still applied. PR #11580
refactored ``/skill`` to a flat autocomplete (whose options Discord
fetches dynamically — no per-command payload concern) and its
docstring promises "no hidden skills." The collector kept the old
nested-layout caps anyway, silently dropping anything past the 25th
alphabetical category. On installs with 29 category dirs today (real
example: tail categories ``social-media``, ``software-development``,
``yuanbao`` going missing) this was biting immediately. Remove the
caps; ``hidden`` now reports only 32-char name-clamp collisions
against reserved names.
Tests: guard both behaviors. ``test_no_legacy_25x25_cap`` builds 30
categories × 30 skills each and asserts all 900 are returned.
``test_external_dirs_skills_included`` monkeypatches
``get_external_skills_dirs`` and asserts an external-dir skill makes
it into the result grouped under its own top-level directory.
After a transient Telegram 502, _handle_polling_network_error's
stop()+start_polling() cycle can leave PTB's Updater with `running=True`
but a wedged consumer task that never makes progress. No error_callback
fires in that state, so the reconnect ladder never advances past attempt
1, the MAX_NETWORK_RETRIES fatal-error path is never reached, and the
gateway sits silent indefinitely.
Schedule a heartbeat probe (60s after a successful reconnect) that
verifies Updater.running is still True and bot.get_me() responds within
a tight asyncio.wait_for timeout. Either failure feeds back into the
reconnect ladder so the existing escalation path fires.
No PTB-internal coupling, no Application rebuild — minimal additive
defense inside the existing reconnect abstraction.
Tests cover healthy / Updater non-running / probe timeout / probe
network error / already-fatal cases, plus an integration check that the
probe is actually scheduled after a successful start_polling().
Closes the silent-wedge case observed in the wild after a transient
Telegram 502; existing reconnect tests updated to mock bot.get_me() now
that the success path schedules a heartbeat probe.
Providers like Google Vertex, Azure, and Amazon Bedrock reject API
requests with duplicate tool names (HTTP 400: 'Tool names must be
unique'). The upstream injection paths in run_agent.py already dedup
after PR #17335, but two API-boundary functions pass tools through
without checking:
- agent/auxiliary_client.py: _build_call_kwargs() (all non-Anthropic
providers in chat_completions mode)
- agent/anthropic_adapter.py: convert_tools_to_anthropic() (Anthropic
Messages API path)
Add defensive dedup guards at both sites. Duplicates are dropped with
a warning log, converting a hard 400 failure into a recoverable
condition. This is intentionally conservative — the root-cause dedup
in run_agent.py is the primary defense; these guards add resilience
against future injection-path regressions.
Includes 8 new tests covering unique passthrough, duplicate removal,
empty/None edge cases.
Closes#18478
When HERMES_HOME is unset but ~/.hermes/active_profile names a non-default
profile, any data this process writes lands in the default profile — not the
one the operator expects. Before this change the fallback was silent, so
cross-profile contamination (#18594) was invisible until a user noticed
their memory/state ended up in the wrong place.
Now we emit a one-shot warning to stderr the first time this happens in
a process. No raise — there are 30+ module-level callers of get_hermes_home()
and raising from any of them would brick import. Behavior is otherwise
unchanged; subprocess spawners (systemd template, kanban dispatcher, docker
entrypoint) already propagate HERMES_HOME correctly.
Bypasses logging.getLogger() because this runs before logging is configured
in a significant fraction of callers (module import time).
Refs #18594. Credit to @liuhao1024 for surfacing the silent-fallback case
in PR #18600; we kept the diagnostic signal without the import-time raise.
Path.read_text() uses the system locale by default. On Windows CN/JP/KR
locales (GBK/CP932/CP949), reading a UTF-8 .env raises UnicodeDecodeError
as soon as it contains any non-ASCII byte (e.g. an em dash).
Pin encoding="utf-8" on every .env read in hermes_cli to match how the
rest of the codebase (load_dotenv at doctor.py:26) already decodes it.
Adds a regression test that monkeypatches Path.read_text to simulate a
GBK locale and asserts 'hermes doctor' no longer raises.
Refs #18637
Skills configured through `skills.external_dirs` in config.yaml were
visible via `hermes skills list`, `get_skill_commands()`, and the
agent's `/skill-name` dispatch, but silently excluded from the
Telegram and Discord slash-command menus. The filter in
`_collect_gateway_skill_entries` only accepted skills whose
`skill_md_path` started with `SKILLS_DIR`, so anything under an
external directory fell through.
Widen the accepted-prefix set to include all configured external
dirs alongside the local skills dir. Every prefix is now
slash-terminated so `/my-skills` cannot also admit
`/my-skills-extra`. Also guard against empty `skill_md_path`
values so they can't accidentally match.
Fixes#8110
Salvages #8790 by luyao618.
Co-authored-by: Yao <34041715+luyao618@users.noreply.github.com>
The process-global `_skill_commands` dict in agent/skill_commands.py
was seeded by whichever platform scanned first, and
`get_skill_commands()` only rescanned when the cache was empty. In a
long-lived gateway process serving multiple platforms (Telegram +
Discord + Slack), the first platform's
`skills.platform_disabled` view was silently inherited by the
others — so a skill disabled for Telegram would also disappear from
Discord's slash menu, and vice versa.
Track the platform scope the cache was populated for
(`_skill_commands_platform`) and rescan in `get_skill_commands()`
when the currently-active platform no longer matches. Platform
resolution uses the same precedence as `_is_skill_disabled`:
`HERMES_PLATFORM` env var then `HERMES_SESSION_PLATFORM` from the
gateway session context.
Fixes#14536
Salvages #14570 by LeonSGP43.
Co-authored-by: LeonSGP <leon@sgp43.com>
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes#18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it
after snapshot; their choice
is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it
after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
reports 'not captured'
(older pre-feature snapshots
keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
FixesNousResearch/hermes-agent#11768
Root cause: target.strip().lower() was lowercasing the entire target string,
corrupting case-sensitive chat IDs like Slack C123ABC and Matrix !RoomABC.
Fix: Only lowercase the platform prefix for case-insensitive matching;
preserve the original case for chat_id and thread_id values.
YAML loads a bare numeric value such as
discord:
free_response_channels: 1491973769726791812
as an int. _discord_free_response_channels() / _slack_free_response_channels()
checked `isinstance(raw, list)` and `isinstance(raw, str)` in that order and
then fell through to `return set()`, so a single-channel config that happened
to be unquoted was silently dropped with no log line — the bot kept demanding
@mentions even though the channel was configured to free-response.
A multi-channel value like `1234567890,9876543210` does not trip this because
the comma forces YAML to parse it as a string. Single-channel configs are
the only case that breaks, which is exactly the footgun that's hardest to
diagnose (the config "looks right" and the feature just doesn't activate).
Note that the old-schema env-var bridge at gateway/config.py:614+ already
runs `str(frc)` when forwarding to SLACK_/DISCORD_FREE_RESPONSE_CHANNELS,
so the env-var fallback worked. The bug only surfaces on the
`config.extra["free_response_channels"]` path populated by the `platforms:`
bridge at gateway/config.py:576, which passes the raw YAML value through
unchanged.
Fix at the reader: treat any non-list value as a scalar, coerce with str(),
then apply the same CSV split semantics. This keeps the public contract
stable (list or str-like continues to work identically) while accepting
the ints that the YAML loader is free to hand us.
Added tests for both Discord and Slack covering:
- bare int value in config.extra
- list of ints in config.extra
Slack has built-in slash commands (e.g. /status, /me, /join) that apps
cannot register. When running `hermes slack manifest --write`, the
generated manifest included /status, causing Slack to reject the entire
manifest with a reserved-command error.
Add _SLACK_RESERVED_COMMANDS frozenset of all known Slack built-ins and
skip them in slack_native_slashes(). Affected commands remain reachable
via /hermes <command>.
Tests updated:
- New test_excludes_slack_reserved_commands validates no leaks
- test_includes_canonical_commands no longer asserts /status
- test_telegram_parity accounts for expected Slack-only exclusions
Self-review fixes for the slash ephemeral ack:
- Only stash response_url when text starts with '/' (gateway command).
Free-form questions via '/hermes <question>' must produce public agent
replies visible to the whole channel, not ephemeral.
- Use a ContextVar (_slash_user_id) to thread the invoking user's ID
from _handle_slash_command through to send(). _pop_slash_context now
matches the exact (channel_id, user_id) key when the ContextVar is
set, preventing concurrent users on the same channel from stealing
each other's ephemeral context. ContextVars propagate to child
asyncio.Tasks, so the value survives through handle_message →
_process_message_background → _send_with_retry → send().
- Add truncate_message() in _send_slash_ephemeral to prevent silent
failures on long responses (response_url has the same ~40k limit).
- Log send_private_notice failures at debug level instead of bare
except/pass — aids diagnostics without spamming.
- Document app_mention dedup dependency on shared event ts.
- Add tests: free-form question must NOT stash context, concurrent
users on the same channel get isolated contexts, non-slash send()
path fallback behavior.
Adds platform-level private notice delivery abstraction so operational
messages (e.g. sethome prompt) can be sent ephemerally on Slack when
configured with `slack.notice_delivery: private`.
Changes:
- gateway/config.py: _normalize_notice_delivery() + GatewayConfig.get_notice_delivery()
with per-platform config bridging
- gateway/platforms/base.py: send_private_notice() default implementation
(falls through to send())
- gateway/platforms/slack.py: send_private_notice() via chat_postEphemeral
- gateway/run.py: _deliver_platform_notice() helper replaces direct
adapter.send() for the sethome notice, with private→public fallback
- gateway/platforms/slack.py: app_mention handler now forwards to
_handle_slack_message (safe due to ts-based dedup) instead of no-op pass,
fixing edge-case Slack configs where mentions arrive only as app_mention
- gateway/platforms/slack.py format_message: negative lookbehind prevents
markdown images (![]()) from becoming broken Slack links; italic regex
now requires non-whitespace boundaries so 'a * b * c' stays literal
Based on PR #9340 by @probepark.
Slack slash commands (/q, /btw, /stop, /model, etc.) previously showed
no user-visible acknowledgement and posted command replies as public
channel messages. This diverged from Discord, which uses ephemeral
deferred responses for slash commands.
Changes:
- handle_hermes_command now passes response_type='ephemeral' and a
'Running /cmd…' text to ack(), giving the user immediate 'Only visible
to you' feedback when they invoke any native slash command.
- _handle_slash_command stashes the Slack response_url from the command
payload in a per-channel context dict before dispatching to
handle_message.
- send() checks for a pending slash context and, when found, POSTs to
the response_url with replace_original=true to swap the initial ack
with the real command reply (e.g. 'Queued for the next turn.'),
keeping it ephemeral.
- Stale slash contexts are garbage-collected on lookup (120s TTL).
- The response_url POST is non-fatal: if it fails, the user already saw
the initial ack, and send() returns success=True.
Fixes#18182
Long-running gateway processes that survive 'hermes update' keep
pre-update modules cached in sys.modules. When new tool files on
disk then try to 'from hermes_cli.config import cfg_get' (added in
PR #17304), the import resolves against the stale module object
and raises ImportError — hitting users on Matrix, Telegram, Feishu,
and other platforms.
Two defenses:
1. Gateway self-check (gateway/run.py). On __init__, snapshot the
newest mtime across sentinel source files (hermes_cli/config.py,
run_agent.py, gateway/run.py, etc.). On every inbound message,
re-read those mtimes; if any is newer than boot time + 2s slack,
request a graceful restart via the normal drain path and return
a one-line ack to the user. Idempotent, works regardless of how
the update happened (hermes update, manual git pull, installer).
2. Post-restart survivor sweep ('hermes update'). After the existing
restart loop, sleep 3s, rescan for gateway PIDs we already tried
to kill, and SIGKILL any survivors. The detached profile watchers
and systemd then relaunch with fresh code instead of waiting out
the 120s watcher timeout.
Closes#17648.
* fix(curator): defer first run and add --dry-run preview (#18373)
Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.
Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
False. The next real pass fires one full interval_hours later (7 days
by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
applying automatic transitions OR permitting the LLM to call
skill_manage / terminal mv. A DRY-RUN banner is prepended to the
prompt and the caller skips apply_automatic_transitions. State is
NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
--dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
behavior and warn that hand-written SKILL.md files share the
'agent-created' bucket, with guidance to pin or preview before the
first pass.
Tests:
- test_first_run_defers replaces the old 'first run always eligible'
assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
banner injection, and apply_automatic_transitions skipping.
Fixes#18373.
* feat(curator): pre-run backup + rollback (#18373)
Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.
Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
by the skills hub). Extract refuses absolute paths and .. components,
and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
pre-rollback safety snapshot FIRST, stages the current tree into
.rollback-staging-<ts>/ so the extract lands in an empty dir, and
cleans the staging dir on success. A failed extract restores the
staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
snapshot_skills(reason='pre-curator-run') before apply_automatic_
transitions. Best-effort — a failed snapshot logs at debug and the
run continues (a transient disk issue shouldn't silently disable
curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not
All curator tests: 210 passing locally.
Prevents ghost sessions from accumulating in state.db when the TUI/web
dashboard is opened and closed without sending a message.
Changes:
- run_agent.py: Add _ensure_db_session() gate method, called at
run_conversation() entry. Remove eager create_session() from __init__.
Handle compression rotation flag correctly.
- tui_gateway/server.py: Remove eager db.create_session() in
_start_agent_build(). Add post-first-message pending_title re-apply.
- hermes_state.py: Extract _insert_session_row() shared helper (DRY).
Add prune_empty_ghost_sessions() for one-time migration.
- cli.py: One-time ghost session prune on startup. Fix _pending_title
to call _ensure_db_session() before set_session_title().
- hermes_cli/main.py: Guard TUI exit summary on message_count > 0.
- tests: Update test_860_dedup to call _ensure_db_session() before
direct _flush_messages_to_session_db() calls.
Closes: ghost session clutter in hermes sessions list and web dashboard.
The anyOf collapse in _repair_schema returned early, skipping the
nullable-strip and enum-cleanup steps. When a schema had anyOf
[{enum: [..., null, '']}, {type: null}] alongside a parent-level
'nullable: true', collapsing to the single non-null branch produced a
merged node that still had both 'nullable' and the bad enum values —
Moonshot would still 400 on it.
Fix: fall through to Rules 1/3 when the collapse produces a single
merged node; only return early for the multi-branch case (pure
anyOf preservation) or when there was no null branch to remove.
Adds a test that locks in the combined-case expectation.
When a schema node inside anyOf has enum values but no explicit 'type',
Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was
None and the enum was never cleaned. Moonshot then rejected the schema
with 'enum value (<nil>) does not match any type in [string]'.
Fix: reorder operations — fill missing type first, strip nullable,
then clean enum. This ensures enum cleanup always has a type to check.
Also fixes test expectation: empty string in enum is now correctly
stripped (Moonshot rejects it too).
Closes#16875
Add a standing-goal slash command that keeps Hermes working toward a
user-stated objective across turns until it is achieved, paused, or
the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI
0.128.0's /goal.
After each turn, a lightweight auxiliary-model judge call asks 'is
this goal satisfied by the assistant's last response?'. If not, and
we're under the turn budget (default 20), Hermes feeds a continuation
prompt back into the same session as a normal user message. Any real
user message preempts the continuation loop automatically.
Judge failures fail OPEN (continue) so a flaky judge never wedges
progress — the turn budget is the real backstop.
### Commands
- `/goal <text>` — set a standing goal (kicks off the first turn)
- `/goal` or `/goal status` — show current state
- `/goal pause` — pause the continuation loop
- `/goal resume` — resume (resets turn counter)
- `/goal clear` — drop the goal
Works on both CLI and gateway platforms via the central CommandDef
registry.
### Design invariants preserved
- **Prompt cache**: continuation prompts are regular user-role
messages appended to history. No system-prompt mutation, no toolset
swap.
- **Role alternation**: continuation is a user turn, never injected
mid-tool-loop.
- **Session persistence**: goal state lives in SessionDB.state_meta
keyed by `goal:<session_id>`, so `/resume` picks it up.
- **Mid-run safety**: on the gateway, `/goal status|pause|clear` are
allowed mid-run (control-plane only); setting a new goal requires
`/stop` first so we don't race a second continuation prompt against
the current turn.
### Files
- `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state
- `hermes_cli/commands.py` — CommandDef entry
- `hermes_cli/config.py` — `goals.max_turns` default
- `hermes_cli/web_server.py` — dashboard category merge
- `cli.py` — /goal handler + post-turn continuation hook in
process_loop
- `gateway/run.py` — /goal handler + post-turn continuation hook
wrapping _handle_message_with_agent
- `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing,
fail-open semantics, lifecycle, persistence, budget exhaustion
- `website/docs/reference/slash-commands.md` — docs entry
hermes update had two interactive [Y/n] prompts with no bypass:
1. Config migration (after new env/config options are added)
2. Autostash restore (when uncommitted work was stashed before pull)
hermes uninstall already has --yes/-y; mirrors that.
Under --yes:
- Config-migrate prompt → auto-yes, migrate_config(interactive=False)
so new config fields are applied but API-key prompts are skipped
(user runs 'hermes config migrate' later for those). Matches
gateway-mode semantics.
- Stash-restore prompt → auto-yes, git stash apply runs automatically.
Closes the 'can I hermes update -y, No ! Fix' gap reported by @murelux.
Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles. After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.
Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.
Implementation:
- EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses
str so existing 'X' in response / response.startswith(...) checks in
tests and call sites keep working unchanged; isinstance() still
distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
(in base.py) call _unwrap_ephemeral() on the handler return, send
the unwrapped text, and schedule a detached delete task when the
TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
/yolo on/off, /restart success + "already in progress". Draining
notices and /help output left as plain strings — those are
informational and users want to read them.
Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users. Platforms without delete_message silently no-op.
When the curator consolidates skill X into umbrella Y, any cron job
that listed X in its skills field would fail to load X at run time —
the scheduler logs a warning and skips it, so the scheduled job runs
without the instructions it was scheduled to follow.
cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs
in-place: consolidated names route to the umbrella target (dedup
when umbrella is already present), pruned names are dropped.
agent.curator._write_run_report calls it after classification,
best-effort so a cron-side failure never breaks the curator itself.
Results are recorded in run.json (counts.cron_jobs_rewritten + full
cron_rewrites payload), a separate cron_rewrites.json for convenience
when jobs were touched, and a section in REPORT.md.
Reported by @tombielecki.
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:
The reasoning content in the thinking mode must be passed back to the API.
run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.
Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.
Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).
Refs #15250, #17400.
Closes#17341.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
When a user types /steer <text> on an ACP session that isn't actively
running a turn (and there's no interrupted-prompt salvage available),
_cmd_steer silently appended to state.queued_prompts and replied
"No active turn — queued for the next turn". That looks identical to
/queue output even though the user never typed /queue — @EddyLeeKhane
reported this as "/steer never works, gets queued instead".
Rewrite the payload to a plain user prompt before the slash-intercept
fires, matching the gateway's idle-/steer fallthrough in
gateway/run.py ~L4898.
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.
In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).
Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.
Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.
Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.
Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).
Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.
- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
SELECT in get_messages_as_conversation() was missing finish_reason, so
assistant messages round-tripped through replay (including /branch copies)
silently dropped the provider's stop signal. Adds it to the SELECT, restores
it on assistant rows, and locks it in with a round-trip test.
When running on a host with sudoers NOPASSWD configured for the current
user, interactive Hermes sessions were unnecessarily entering the
password prompt path before executing sudo commands. Outside Hermes,
`sudo -n true` exits 0 for that user.
Add `_sudo_nopasswd_works()` that probes `sudo -n true` and, when it
succeeds, lets `_transform_sudo_command()` return the command unchanged
with no stdin password. The probe:
- Is scoped to the `local` terminal backend only, so Docker/SSH/Modal
and other remote backends do not inherit host sudo state.
- Re-probes every call (no process-lifetime cache) so an expired sudo
timestamp cannot silently make a later command block waiting for a
password that Hermes never prompts for.
- Is bypassed entirely when `SUDO_PASSWORD` is configured or a cached
password already exists, preserving existing explicit-password flows.
Co-authored-by: Junting Wu <juntingpublic@gmail.com>
The fix for this bug (isinstance guard) was merged via commit 3ff9e010,
but test coverage was not included. Adding 4 tests:
- dict metadata with hermes keys (normal case)
- string metadata (bug case — previously caused AttributeError)
- None metadata
- missing metadata key
Proves token A's detected capabilities do not leak to token B after the
fix in the preceding commit. Before the fix this test would have seen
both tokens return token A's cached value.