When a platform-bundle name (e.g. `hermes-yuanbao`, or any `hermes-*`) lands
in `agent.disabled_toolsets`, the shared tool-assembly path
(`model_tools._compute_tool_definitions`, used by the gateway, cron, AND the
CLI) subtracted the WHOLE bundle from the enabled set. Because every platform
bundle is defined as `_HERMES_CORE_TOOLS + [platform extras]`, and core tools
are shared by every other enabled toolset, the subtraction emptied the tool
list entirely — the model received `tools: []` / `tool_choice: null` and
started replying "I cannot execute shell commands" with no error, no warning,
and `hermes tools list` / `hermes doctor` still green. For unattended cron
jobs this fails silently for days. (#33924)
(The original report framed this as gateway-only; it actually affects every
caller of `_compute_tool_definitions`, including the CLI — the reporter's
follow-up confirms this. Fixing the shared chokepoint covers all paths.)
Fix: for a `hermes-*` bundle in `disabled_toolsets`, subtract only its
*non-core delta* (its platform-specific tools plus those of any `includes`),
leaving `_HERMES_CORE_TOOLS` intact. Disabling a bundle now removes its
platform tools (e.g. the `yb_*` tools for `hermes-yuanbao`) while terminal,
read_file, web, etc. survive. A `logger.warning` notes that core tools are
preserved and that bundle names usually belong in `toolsets:`, not
`disabled_toolsets` — informative, not destructive (the subtraction still
behaves sensibly).
Salvaged from #33941 by @liuhao1024 (authorship preserved). Extracted the
inline bundle-resolution into a module-level `_bundle_non_core_tools` helper
(was re-importing `toolsets` inside the disable loop), and added the
informative warning folding in the UX intent of #34073 (@ousiaresearch)
without its hard "ignore the bundle name" behavior — which would have undone
this fix's sensible-subtraction.
Verified empirically: disabling `hermes-yuanbao` from a gateway-style enabled
set keeps all core tools (18→18) and would remove only the 5 `yb_*` tools;
disabling `hermes-discord` removes only `discord`/`discord_admin`.
Fixes#33924
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Two regression tests for the agentmemory reconnect-loop:
- _is_method_not_found_error matches the plain 'Unknown method: ping'
phrasing (no structural -32601 code).
- _keepalive_probe latches _ping_unsupported and falls back to list_tools
when send_ping raises 'Unknown method: ping', instead of propagating
(which would reconnect-loop).
PR #22410 added three-mode Telegram topic routing to the live message path
(TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery
path never got it. cron/scheduler.py::_deliver_result sent through the live
adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone
_send_telegram, neither of which addresses Bot API Direct Messages topics
correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a
bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a
private DM topic landed in the General topic instead of the requested lane.
Fix: the cron live-adapter branch now routes the text send through the
gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path
live messages use — so it inherits all three Telegram routing modes:
1. Forum/supergroup (negative chat_id) -> message_thread_id
2. Bot API DM topics (private chat_id + numeric topic id) ->
direct_messages_topic_id (the case #22773 reported)
3. Hermes-created named private DM-topic lanes -> ensure_dm_topic +
reply anchor
For mode 2, a private-chat target with a numeric topic id is passed as
``direct_messages_topic_id`` metadata (verified end-to-end:
TelegramAdapter._thread_kwargs_for_send turns it into
``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a
bare message_thread_id. Forum/supergroup and home-channel deliveries are
unchanged. The standalone fallback (gateway down) is preserved.
No new config knob and no duplicated routing logic — this reuses the existing
DeliveryRouter rather than reimplementing topic routing in the cron path.
Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both
diagnosed the missing three-mode routing in the cron/standalone path;
reimplemented onto the canonical DeliveryRouter that landed since those PRs
were opened.
Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com>
Co-authored-by: devsart95 <devsart95@gmail.com>
A recurring cron job persists `next_run_at` as an absolute timestamp with a
UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however,
describe *local wall-clock* intent ("run at 21:00"). When Hermes/system
timezone changes after the timestamp was persisted, the stored instant is
re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`,
which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then
`compute_next_run` advances it via croniter to `21:00+02:00` the same day,
producing a SECOND fire. (#28934, recurrence of #24289.)
`_get_due_jobs_locked` now detects this precise migration case before the
due check: for a `cron` job whose converted instant looks due, whose stored
UTC offset differs from the current zone's, AND whose stored *wall-clock*
time is still in the future (distinguishing a migrated offset from a
genuinely missed run), it recomputes `next_run_at` from the schedule and
skips the early fire — preserving the local wall-clock intent.
Verified against the issue's reproducer: stored `21:00+10` under runtime
`+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of
firing early + again.
Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over
the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match)
because UTC-normalization does not change the absolute-instant comparison and
so does not fix the early fire, and this guard is the tightest: it only acts
when all four conditions hold and reuses the existing `compute_next_run`.
Fixes#28934
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now`
and returned success, relying on the scheduler ticker to actually execute the
job on its next tick. When no gateway/ticker is running — a CLI-only setup, or
the Windows case in #41037 — the job never executed: `run` reported success,
but `last_run_at` stayed null forever, no output, no delivery.
A manual `run` should actually run. `_execute_job_now` now:
- **claims the job via `claim_job_for_fire`** — the same at-most-once CAS the
scheduler/external-provider fire path uses. This both advances `next_run_at`
for recurring jobs and blocks a concurrently-running gateway ticker from
double-firing the same job; if the claim is lost, the run is skipped (the
tool reports `execution_skipped`). This closes the double-fire race that a
bare `advance_next_run` left open (a tick whose `get_due_jobs` already
captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`** — the single shared
execute→save→deliver→mark body the ticker and external providers use — so
failure delivery, `[SILENT]` handling, and live-adapter delivery stay
identical across paths and can't drift. (The original salvage re-implemented
this sequence inline and had already dropped failure delivery + `[SILENT]`.)
The tool response carries `executed`, `execution_success`, and either
`execution_error` or `execution_skipped`. The `hermes cron run` CLI message no
longer claims "It will run on the next scheduler tick" — it reports the actual
"Ran now: succeeded/failed" outcome (or the skip).
Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse
`claim_job_for_fire` + `run_one_job` per review rather than re-implementing the
fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip,
failure reporting, and exception capture.
Fixes#41037
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The in-process cron ticker (cron/scheduler_provider.py) caught only
`Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt`
raised from a misbehaving provider SDK or agent retry path killed the
ticker thread silently. The gateway PROCESS stayed up, so `hermes cron
status` — which only checks `find_gateway_pids()` — kept reporting
"✓ jobs will fire automatically" while no jobs ever fired (#32612,
#32895).
This makes ticker death survivable and detectable:
- The ticker loop now catches `BaseException` and logs at ERROR with a
traceback, so a single bad tick no longer tears the thread down and
the failure is visible in the gateway log.
- The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds)
on startup and after every tick — best-effort, never raised into the
loop. Both ticker entry points (the gateway and the desktop fallback
in web_server.py) funnel through `InProcessCronScheduler.start`, so one
heartbeat site covers both.
- `hermes cron status` now reads the heartbeat age: if the gateway is
running but the heartbeat is stale (> 200s, i.e. several missed ~60s
ticks), it reports the ticker as STALLED and suggests a restart instead
of falsely claiming jobs will fire. A missing heartbeat (older build /
never ran) is treated as "unknown", not "dead".
Adds tests for BaseException survival, per-iteration heartbeat recording,
heartbeat round-trip/age, staleness detection, and silent-write-failure.
Salvaged from #49660 (BaseException survival on current structure),
extended with the heartbeat + honest-status reporting that the earlier
(pre-refactor) watchdog PRs #35616 and #33849 proposed.
Fixes#32612Fixes#32895
Co-authored-by: banditburai <promptsiren@gmail.com>
Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>
Consolidates three cron-delivery defects in cron/scheduler.py::_deliver_result
that all stem from how the live-adapter send result is interpreted.
#38922 — duplicate message on confirmation timeout.
future.result(timeout=60) raising TimeoutError bubbled to the outer
except handler, which left delivered=False, so `if not delivered:` re-sent
the identical message via the standalone path. future.cancel() cannot
un-send a request already in flight on the wire, so a slow confirmation
deterministically produced a duplicate. The send was already dispatched onto
the gateway loop, so a bare timeout is now treated as delivered
(assume-delivered is safer than guaranteed-duplicate) and the standalone
fallback is skipped. The live-adapter media attempt is also skipped on
timeout since the contended loop would re-block each 30s media budget.
#47056 — silent drop when the gateway has an active session.
The old check `if send_result is None or not getattr(send_result,
"success", True)` let a result object missing a `success` attribute default
to True = counted as a successful delivery, so the scheduler logged
"delivered via live adapter" while the gateway never processed the message.
Delivery is now confirmed via _confirm_adapter_delivery(): only an explicit,
truthy `success` attribute counts; None or a `success`-less object falls
through to the standalone path so the message actually arrives.
A genuine send Exception (not a slow confirmation) still falls through to
the standalone path, and is caught by run_job's outer handler — it is
recorded as the job's last_error and never crashes the cron ticker.
#43014 — deliver=origin fails to resolve in CLI sessions.
A CLI-created job has no {platform, chat_id} origin, so deliver=origin (and
auto-detect / deliver=None) was unresolvable and emitted "no delivery target
resolved" on every run. An unresolvable origin with no configured home
channel is now treated as local (output stays in last_output), matching the
documented auto-deliver contract; a concrete unresolvable platform target
still reports a real error.
Salvaged from #41007 (timeout discriminator), folding in #47127's
_confirm_adapter_delivery hardening and #38937 / #43063's origin→local
fallback. Tests rewritten as behavior contracts (timeout => no duplicate;
None / success-less result => standalone fallback; confirmed success => no
fallback; CLI origin => local, explicit platform => still errors).
Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Cron jobs created without an explicit `model` are stored as `model: null`.
At fire time `run_job` resolved `model = job.get("model") or os.getenv(
"HERMES_MODEL") or ""` and then `_model_cfg.get("default", model)`, so when
config.yaml had no `model.default` (or `model: {default: null}`) an empty
string flowed straight to the provider and surfaced as an opaque HTTP 400
("Model parameter is required" / "model: String should have at least 1
character"). The operator had to inspect jobs.json to discover the job was
stored with a null model.
This change makes cron model resolution robust and symmetric with the CLI:
- Coerce `model: null`/missing config to `{}` so a falsy default never
overwrites an already-resolved env value with `None`.
- Only overwrite `model` from `model.default` when the resolved value is
truthy; accept a `model.model` alias key, mirroring the sibling resolvers
in hermes_cli/oneshot.py, fallback_cmd.py and prompt_size.py.
- Resolve AFTER the managed-scope overlay so an administrator-pinned model
still wins.
- Fail fast with an actionable error (caught by run_job's outer handler and
recorded as the job's last_error — the cron ticker is unaffected) instead
of letting an empty model reach the API.
- The per-job model is re-read every tick, so a `cronjob action=update
model=...` after a failed run takes effect on the next tick (no cache).
Adds tests/cron/conftest.py pinning a default HERMES_MODEL so existing
run_job tests don't trip the new guard, plus regression tests covering env
fallback, config.default fallback, string-form config, the model alias key,
null-default-no-clobber, corrupt-config graceful degradation, fail-fast,
and the no-cache re-read property.
Salvaged from #24005, rebased onto current main, with additional test
coverage folded in from #45550 and the alias-key behavior from #43952.
Fixes#43899Fixes#23979Fixes#22761
Co-authored-by: szzhoujiarui-sketch <szzhoujiarui@gmail.com>
Co-authored-by: rayjun <rayjun0412@gmail.com>
protect_first_n keeps the first N non-system messages verbatim through
compaction so the original task framing survives. But it was applied on
EVERY compression pass: the same early user turns were re-copied into each
child session and never summarized away, so across a long, repeatedly-
compressed session those old messages became immortal and grew the
protected head unboundedly (#11996, P1).
Decay it: protect_first_n applies on the FIRST compaction only. Once the
session has been compressed at least once (compression_count >= 1, or a
handoff summary already exists), the early turns are captured in the
summary, so _effective_protect_first_n() returns 0 and only the system
prompt stays protected. The decay is read at compress_start computation
time, before compression_count/_previous_summary are mutated at the end of
compress(), so the first pass still protects correctly.
Co-authored-by: truenorth-lj <liliangjya@gmail.com>
Co-authored-by: davidvv <david.vv@icloud.com>
Single-op replace/remove failed with a dead-end 'old_text is required'
error when a structured-output client omitted the optional old_text field
(it can't be schema-required without a top-level if/then combinator that
OpenAI's Codex backend 400s on). The model couldn't recover.
Now a missing old_text returns the current entry inventory plus a retry
instruction (mirroring the batch path's _batch_error), so the model can
reissue the call with old_text set. Also sharpens the old_text schema
description to state it's required for replace/remove.
Fixes#49466, #43412.
The native echo recovery handles replies to most rich messages, but
messages sent before the bot's first rich send have no echo to read.
record() was only called on the fresh-send path (_try_send_rich); a
streamed final finalized via _try_edit_rich/editMessageText was never
indexed, so a reply to it had neither a native echo nor an index entry.
Mirror the fresh-send record() into the edit success path to close
that gap.
Telegram DOES echo a rich message's content back in
reply_to_message.api_kwargs['rich_message']['blocks'] when a user
replies to it. Read that native field first in _build_message_event,
keeping the local send-time index only as a fallback. Duck-type
api_kwargs via .get() since it is a mappingproxy, not a dict.
Fixes#49534
- Remove the 'you only log in once per machine' claim from spotify.md
and document the ~6-month refresh token expiry with re-auth instructions
- Add test_client_wraps_invalid_grant_as_spotify_auth_required_error to
confirm SpotifyClient wraps AuthError(code=spotify_refresh_invalid_grant)
into SpotifyAuthRequiredError with a user-facing message
Refs: #28155
resolve_spotify_runtime_credentials() called _refresh_spotify_oauth_state()
without a try/except, so a terminal failure (HTTP 400/401, invalid_grant,
refresh_token_reused) raised AuthError but left the dead refresh_token in
auth.json. Every subsequent session re-read and retried the same token over
the network, failing identically each time.
Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_at, expires_in, obtained_at) and write a
last_auth_error quarantine marker to auth.json before re-raising. The next
call sees no access_token and fails fast with spotify_access_token_missing —
no network retry — and the user is prompted to re-authenticate.
Mirrors the quarantine pattern already in place for Nous, xAI-OAuth,
Codex-OAuth (#28116, #28118), and MiniMax-OAuth (#28119).
Clarify that session_search is secondary context and direct source identifiers must be inspected first when accessible. Add regression coverage for the tool description.
After context compression, the agent re-sent an already-delivered
generated image on every subsequent turn (#46627). The auto-append
fallback rescans full history when the message list shrinks (compression-
safe path), deduping against _history_media_paths — but that set was built
by scanning ONLY MEDIA: text tags in tool results. image_generate returns
its path in a JSON payload field (host_image/image/agent_visible_image),
never a MEDIA: tag, so generated-image paths never entered the dedup set
and were re-emitted after the boundary.
Extract the history-path collection into _collect_history_media_paths(),
which now covers BOTH delivery shapes: MEDIA: text tags AND image_generate
JSON-payload paths (mirroring what _collect_auto_append_media_tags
extracts). The inline block in _handle_message is replaced with a call to
the helper.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
When LLM summarization fails, the deterministic fallback summary rendered
the latest user ask (active_task = "User asked: '<ask>'") verbatim under
THREE headings — Historical Task Snapshot, Historical In-Progress State,
and Historical Pending User Asks. Re-presenting an already-handled ask as
unresolved in-progress/pending work made the model re-answer it AND treat
the resurrected ask as the active turn, burying the genuinely-new
post-compaction user message (#49307: answer repetition + new-instruction
loss, P1).
Keep the latest ask once, under Task Snapshot, as historical context only.
The In-Progress and Pending-Asks sections now say 'Unknown / None
recoverable from deterministic fallback' (consistent with the Active
State / Key Decisions / Resolved Questions sections) and explicitly note
the ask is historical, not outstanding. The raw turn text still appears in
the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn
record, not a re-labeled instruction.
Note: the separate role=assistant standalone-summary regurgitation
(#33256) is left as-is — that role choice is constrained by strict message
alternation (user collides with a user-ending head) and is already
mitigated by the summary end-marker; forcing the role would risk the
alternation invariant.
Co-authored-by: r266-tech <r2668940489@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Context compression is atomic, but a gateway interrupt (an incoming user
message while the agent is busy) could abort the in-flight summary call.
The Codex Responses aux stream polls the thread interrupt flag and raised
InterruptedError unconditionally — so compression fell back to a degraded
static 'summary unavailable' marker, losing the real handoff (#23975).
Add a thread-local interrupt-protection flag (aux_interrupt_protection
context manager) in auxiliary_client; the Codex stream's cancellation
check honors it. The compressor wraps its summary call_llm in the context
manager. Timeouts still fire (a hung call must die) and all other aux
tasks (vision, web_extract, title_generation, …) stay interruptible.
Re-entrant, so the main-model retry recursion is safe.
Co-authored-by: konsisumer <der@konsi.org>
test_matrix_voice flaked in CI (6/7 failing on some shards, passing on
others and on main) depending on leaked MATRIX_REQUIRE_MENTION env state.
Root cause: the adapter defaults require_mention=True (falling back to the
MATRIX_REQUIRE_MENTION env var). These tests fire a group-room audio event
with no @mention, so _resolve_message_context drops it before dispatch
('No event was captured') whenever require_mention resolves True — which
happens in a clean shard, but an earlier test in another shard can leave
MATRIX_REQUIRE_MENTION=false in os.environ and mask it. The plugin
migration (#5600105478 adapter→bundled plugin) shifted shard composition
and exposed it.
Pin require_mention: False in the test adapter config so these media-TYPE
detection tests are no longer gated by the mention requirement, regardless
of ambient env. Verified: 7/7 pass with MATRIX_REQUIRE_MENTION=true (the
failing condition) AND with the env unset.
Gateway Session Hygiene auto-compression destroyed the original transcript
when the throwaway hygiene agent couldn't rotate the session (#21301, P1).
The _hyg_agent is built WITHOUT a session_db, so _compress_context cannot
end-and-fork the session (its rotate block is gated on agent._session_db).
The session_id stays unchanged, and the rewrite_transcript() call ran
UNCONDITIONALLY — replacing the full original transcript with just the
head+summary list. Permanent data loss on every hygiene compaction.
Guard the rewrite behind 'rotated OR in-place' exactly like the /compress
path already does (#44794/#39704): only overwrite when a new session id
was minted or in-place compaction succeeded; otherwise preserve the
original transcript and log a warning. The token/count bookkeeping that
followed the rewrite is moved inside the guard, with no-change values in
the preserve branch.
Co-authored-by: SandroHub013 <sandrohub013@gmail.com>
Co-authored-by: WuTianyi123 <wtyopenclaw@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Three state-loss bugs at the compression rotation boundary, fixed together
because they all live in the same ~80-line rotation block:
- #33618: a persistent /goal did not follow the rotation. load_goal does a
flat per-session lookup with no lineage walk, so a goal silently died when
compression minted a fresh child id. Added migrate_goal_to_session() and
call it after the child session is created (move-not-copy: the parent row
is archived as cleared so exactly one active goal row exists).
- #33906/#33907: if the child create_session raised (FK constraint,
contended write), the outer handler only warned and let the agent continue
on the NEW id — which has no row in state.db — producing an orphan session.
Now the rotation rolls agent.session_id back to the still-indexed parent
(reopening it) instead of stranding the conversation on a phantom id.
- #27633: the compaction-boundary on_session_start notification omitted the
platform kwarg, so context-engine plugins saw source=unknown for every
message after the boundary. Forward platform (matching the initial
session-start call in agent_init.py).
Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com>
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
The subagent-demotion busy-handler test asserted the internal
merge_pending_message_event call, which the FIFO refactor replaced with
_queue_or_replace_pending_event. Assert the behavioral outcome (the
follow-up lands in the pending slot for the next turn) instead — same
fix already applied to the two steer-fallback tests.
When the agent is busy and the user sends multiple text follow-ups, the
interrupt-mode and steer-fallback path stored them via
merge_pending_message_event(merge_text=True), which newline-joins
consecutive TEXT messages into a SINGLE pending turn — collapsing two
separate user messages into one mashed-together turn and destroying the
message boundaries the user sees (#43066 sub-bug 2).
Route that storage through _queue_or_replace_pending_event (the same FIFO
infrastructure used by busy queue-mode and /queue) so each follow-up gets
its own next-turn slot in arrival order, while still preserving
photo-burst / album merge semantics for media. Pure queue-mode already
used FIFO; this brings the interrupt/steer-fallback path in line.
The sibling defect in #43066 (assistant messages lost after compaction)
was already fixed on main by the identity-tracking flush rewrite (#46053)
plus the pre-rotation flush (#47202), so this only addresses the
remaining busy-message-merge half.
Co-authored-by: KiruyaMomochi <65301509+KiruyaMomochi@users.noreply.github.com>
On Windows, hermes writes writer.bat (@echo off / hermes -p writer %*)
with CRLF endings instead of the POSIX writer shell script. The test
hardcoded the POSIX path and exact bytes, so it failed on Windows hosts.
Assert on stripped non-empty lines per platform, making it line-ending-
and OS-independent.
Follow-up to the salvaged worktree-materialization fix. When a worktree
task has no explicit workspace_path, resolve the anchor from the board's
default_workdir (a git repo) and materialize <repo>/.worktrees/<id> per
task, instead of silently rooting under the dispatcher's CWD (whatever
directory launched the gateway, e.g. the Hermes checkout). If no
default_workdir is configured, raise with a clear message rather than
guessing from CWD.
Adds AUTHOR_MAP entry for the salvaged commit.
The dispatcher treated workspace_kind=worktree as metadata only and never
ran 'git worktree add', so every worktree task ran in the main repo checkout
instead of an isolated worktree — concurrent tasks silently shared one tree
and contaminated each other.
This materializes a real linked worktree at <repo>/.worktrees/<task_id> on
branch wt/<task_id> when resolve_workspace() handles a worktree task, treats a
repo-root workspace_path as shorthand for that location, persists the derived
workspace/branch back onto the task row, and — on rerun/redispatch — detects an
already-materialized linked worktree (via git-common-dir) and reuses it instead
of nesting a second .worktrees/<id> inside it.
Follow-up for salvaged #49654: unit tests for resolve_whatsapp_bridge_dir()
(writable passthrough, read-only mirror, existing-mirror reuse) and the
AUTHOR_MAP entry for the contributor.
Add a regression test for #47868 asserting convert_messages strips the
internal per-message timestamp field, plus the identity-return path for
timestamp-free message lists. Map x7peeps for the release attribution gate.
When the auxiliary summary call fails with an authentication/permission
error (HTTP 401/403), context compression now ABORTS and preserves the
session unchanged instead of rotating into a child session with a
placeholder summary.
Before: a 401 (invalid/blocked key, or a token pointed at the wrong
inference host) fell through every transient-error check to 'return
None', and because compression.abort_on_summary_failure defaults False,
compress() took the static-fallback path and rotated the session anyway
(messages N->N). The user landed on a fresh-but-broken session that kept
failing the same way — paying for a full-context API call each turn with
no useful compression.
After: _generate_summary classifies 401/403 as a non-recoverable auth
failure (_last_summary_auth_failure) and compress() aborts on it
regardless of abort_on_summary_failure. A distinct auxiliary summary_model
that 401s still retries once on the main model first (its dedicated creds
may be the only broken thing); the abort only sticks when the main model
itself auth-fails or the fallback also auth-fails. The existing
_last_compress_aborted handling in conversation_compression.py already
skips rotation and emits a warning, so no session rotation occurs.
Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite
flag=False, non-auth failures keep the historical fallback path, and
aux-model auth failure recovers on main without aborting.
When the active provider returns a 401/403 that survives its per-provider
credential-refresh attempt (revoked OAuth, blocked/expired key, or an
account pinned to a dead/staging inference endpoint), the conversation
loop now escalates to the configured fallback chain instead of dead-ending.
Before: the generic failover dispatch fired only for {rate_limit, billing};
auth/auth_permanent fell through to 'switch providers manually' advice and
never called _try_activate_fallback(). A user whose primary credential was
broken kept thrashing on the same dead credential every turn — the main
agent appeared 'stuck in fallback mode' while never actually failing over.
This also affected auxiliary tasks (compression, vision, title-gen), since
auto-resolved aux follows the main provider.
After: a persistent auth failure with a configured fallback chain switches
to the next provider (mirroring the rate-limit/billing failover path),
guarded one-shot per attempt by TurnRetryState.auth_failover_attempted.
When no fallback is configured the behavior is unchanged — it falls through
to the existing terminal handling and provider-specific troubleshooting
guidance.
Tests: test_auth_provider_failover.py — 401/403 classify as auth, the
gating condition fires only with a chain present + guard unset, the guard
blocks repeats, and non-auth (500) errors do not trigger auth failover.
* feat(delegation): single-task delegate_task always runs in the background
The model no longer decides whether a subagent runs in the background — a
single-task delegate_task from the top-level agent is now always dispatched
async, so the parent turn returns immediately and the subagent's result
re-enters the conversation when it finishes.
- run_agent._dispatch_delegate_task (the live model path) forces
background=True for top-level single-task calls; the schema-level
`background` param is ignored.
- A batch (tasks with >1 item) stays synchronous (fan-out can't go async).
- A delegation from an orchestrator subagent (depth > 0) stays synchronous —
it needs its workers' results within its own turn.
- The function-level default is unchanged, so direct Python callers/tests keep
the historical synchronous behavior.
- On async-pool capacity rejection, single-task now falls through to a
synchronous run instead of erroring (the child stays attached for interrupt
propagation; detach happens only on a successful dispatch).
- Schema `background` param marked deprecated/ignored; tool description
updated to state the always-background single-task rule.
* feat(delegation): all delegate_task fan-out runs in the background
Extend the always-background behavior to the full fan-out. A batch is now
dispatched as N independent async subagents (one handle each), instead of
running synchronously. Single task and batch both return immediately; each
subagent's result re-enters the conversation as its own message when it
finishes.
- delegate_task: when background is set, loop over ALL built children and
dispatch each via dispatch_async_delegation; return a combined handle block
(count + per-task delegation_ids). Children the async pool rejects (at
capacity) run synchronously inline and are reported alongside the dispatched
handles, so nothing is silently dropped.
- run_agent._dispatch_delegate_task + registry handler: force background for
any top-level model delegation (single OR batch); orchestrator subagents
(depth > 0) still run synchronously since they need workers' results within
their own turn.
- Removed the v1 'batch async not supported' rejection.
- Tool description updated: BOTH MODES RUN IN THE BACKGROUND.
- Tests updated to assert batch fan-out dispatches each task async (verified
E2E: 3-task batch -> 3 independent completion-queue events).
* fix(delegation): background fan-out joins and returns one consolidated block
Correct the fan-out semantics: a backgrounded batch is dispatched as ONE
async unit (one handle, one async-pool slot), not N independent dispatches.
The unit runs all children in parallel, waits on every one, and emits a
SINGLE completion event carrying the consolidated per-task results. The chat
is never blocked; when all subagents finish, their full summaries re-enter
the conversation together as one message.
- async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch
occupies one slot; its runner returns the combined {results:[...]} dict and
one event with the full results list is pushed to the completion queue.
- delegate_tool: extract the sync execution+aggregation into
_execute_and_aggregate(); background dispatches it via the batch unit and
returns one handle; on pool-capacity rejection it runs the batch inline.
- process_registry._format_async_delegation: render a consolidated multi-task
block (TASK i/N + per-task summary) when the event carries is_batch/results.
- Tests updated; E2E verified: 3-task batch -> immediate return -> one combined
completion block with all three summaries.
Async-delegation completions (delegate_task(background=true)) and
background-process completions (terminal notify_on_complete) re-enter the
originating session as internal MessageEvents. When the session was busy,
_handle_active_session_busy_message treated them like a user TEXT message and
the default busy_input_mode='interrupt' aborted the active turn (and sent a
'Interrupting current task' ack) — the opposite of the design invariant that a
completion surfaces as a new turn only when idle.
Short-circuit internal events to return False so the base adapter queues them
silently (it already excludes internal events from debounce), cascading them as
the next turn after the current one finishes.
Follow-up to the soft-archive durability fix. Reusing the rewind/undo active=0
flag for compaction-archived turns inherited the wrong search semantics: undo
rows are intentionally HIDDEN from session_search (the user took them back), but
compaction-archived turns must stay DISCOVERABLE — that is the whole point of
Teknium's "searchable / recoverable" requirement. As built, search_messages
defaulted to WHERE active=1, so after in-place compaction the pre-compaction
turns were in the FTS index but filtered out of the default search. (The earlier
"searchable" claim only held for a raw FTS query / include_inactive=True, not
the actual session_search tool.)
Empirically confirmed the gap: search 'HMAC' returned 2 hits before compaction,
1 after (only the summary's mention) — the originals were hidden.
Fix — a `compacted` flag distinct from `active`, giving a 3-way state:
- active=1, compacted=0 → live context (normal)
- active=0, compacted=1 → compaction-archived: OUT of live context, IN search
- active=0, compacted=0 → rewind/undo: OUT of live context, OUT of search
Changes:
- messages.compacted INTEGER NOT NULL DEFAULT 0 added to SCHEMA_SQL. Declarative
_reconcile_columns adds it on existing DBs — no version bump (plain column add).
- archive_and_compact: UPDATE … SET active=0, compacted=1 (was active=0 only).
- search_messages: default WHERE active=1 → (active=1 OR compacted=1), on BOTH
the main FTS5 path and the trigram CJK path. include_inactive=True still
returns everything. The short-CJK LIKE fallback already returns all rows
(no active filter) — unchanged.
- Docstrings on archive_and_compact + search_messages document the 3-way state.
Verified: after compaction, session_search default finds the archived originals
(ids 1 & 4); rewind/undo rows stay hidden by default (recoverable via
include_inactive); live context still excludes both. 322 in-place + hermes_state
tests and 46 session_search tests green; ruff clean. Mutation check: reverting
the search WHERE to active-only fails the new searchable test.
(Surfaced by the question "is search semantic or only FTS?" — answer: session
search is FTS5 keyword/BM25 only, no embeddings over the transcript; semantic
retrieval lives in the optional memory-provider layer. Tracing that confirmed
the active-only filter gap above.)
Teknium review: keeping one durable session id must NOT come at the cost of
destroying history. The prior in-place implementation used replace_messages,
which hard-DELETEs the pre-compaction turns (they also drop out of the FTS
index) — same id, but the original conversation is gone with no recovery path
and the summary becomes the only record. Rotation today is non-destructive
(the old session's full transcript survives under the old id); in-place must
match that durability contract, not weaken it.
Fix: compact in place by SOFT-ARCHIVING, reusing the existing messages.active
flag (the /undo soft-delete mechanic), instead of deleting:
- New SessionDB.archive_and_compact(session_id, compacted): in one atomic
write, UPDATE messages SET active=0 on the live turns, then insert the
compacted set as fresh active=1 rows. Nothing is deleted.
- The insert loop is extracted into a shared _insert_message_rows() helper so
archive_and_compact and replace_messages don't duplicate the 60-line
column/encoding block (extend-don't-duplicate).
- Agent in-place branch calls archive_and_compact instead of replace_messages.
Durability outcome (proven by test + E2E across repeated compactions):
- Live context load (get_messages_as_conversation / get_messages) filters
active=1, so a resume reloads ONLY the compacted set — compaction still
shrinks the live session.
- The pre-compaction turns stay on disk at active=0, recoverable via
get_messages(include_inactive=True) / restore_rewound.
- They remain FTS-searchable: the messages_fts* triggers index on INSERT and
remove on DELETE only — they do NOT key on active, and active=0 is a
content-preserving UPDATE. session_search still finds them.
- Verified across TWO successive compactions: the 1st compaction's originals
are still recoverable + searchable after the 2nd (answers the "no recovery
path after the next compaction" concern directly).
message_count now reflects the LIVE (active/compacted) count, matching the
live load. replace_messages keeps its DELETE semantics (still correct for
/retry, /undo) and gains a docstring note pointing compaction at the
non-destructive method.
Tests: test_in_place_keeps_same_session_id strengthened to assert the 8
seeded originals survive at active=0 alongside the 2 compacted rows AND stay
FTS-searchable. Mutation check: swapping archive_and_compact back to a hard
DELETE fails the test, so the non-destructive contract is bound. 285
hermes_state + in-place tests green; rotation/persistence/compress-command/cli
suites green; ruff clean.
Parallel 3-reviewer cleanup of the in-place compaction code. Findings applied:
- perf: in-place mode no longer pre-flushes current-turn messages. The flush
ran INSERTs that the immediately-following replace_messages(compressed)
DELETE+reinsert discarded -- pure wasted writes per compaction. The
current-turn tail survives via the compressor's compressed output
(protect_last_n), not the flush. Verified no data loss; rotation still
pre-flushes (its old session row is preserved, so the flush is real there).
- quality: hoist the two shared post-write steps (update_system_prompt +
_last_flushed_db_idx = 0) below the if/else -- they ran in both branches
against agent.session_id. Removes the easiest divergence bug.
- quality: compute the compaction-boundary locals (_old_sid, _is_boundary,
_boundary_parent) ONCE instead of recomputing locals().get('old_session_id')
and the "_old_sid or agent.session_id or ''" chain three times.
- quality: initialize compacted_in_place up front and assign
agent._last_compaction_in_place directly, dropping the fragile
locals().get('compacted_in_place') reflection.
- reuse: parse the in_place config flag with utils.is_truthy_value (the
project's canonical truthy coerce) instead of a hand-rolled
str().lower() in {...} (agent_init already imports from utils).
Dropped as false positives / out of scope: gateway getattr of agent internals
(established session_id pattern), dual result-dict carry (mirrors history_offset
etc.), stringly-typed "compression" (codebase-wide convention, no constant).
Behavior-preserving: 7 in-place tests (incl. 2 new flush-guard tests) + 26
rotation/boundary/persistence/command tests green; mutation check confirms the
durable-replace guard still binds (removing replace_messages fails the test);
ruff clean. Added test_in_place_skips_redundant_preflush /
test_rotation_still_preflushes to guard the perf change.
Review (Codex + 3-agent parallel) found the first cut of in-place mode was
incomplete: it only updated the system prompt, so the persisted transcript
stayed 'full history + summary' and the next turn/resume reloaded the full
history and immediately re-compacted (a loop), and every downstream layer
that keyed off session-id rotation silently no-op'd. The session_id was
doing double duty as the 'compaction happened' signal. This wires the whole
path so removing rotation is actually complete:
Agent (agent/conversation_compression.py):
- In-place now DURABLY replaces the transcript: replace_messages(session_id,
compressed) on the same row (the canonical store the gateway reloads from),
not just update_system_prompt. Resume reloads the compacted set; no loop.
- Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids
cleared) so next-turn appends diff against the compacted transcript.
- Expose a rotation-independent signal: agent._last_compaction_in_place, and
in_place=True on the session:compress event.
- Fire the compaction-boundary hooks (context-engine on_session_start, memory
manager on_session_switch, reason='compression') in BOTH modes — in-place
passes the same id as parent so DAG/buffer state still checkpoints. Without
this, memory/context plugins miss every in-place compaction.
Gateway auto-compress (gateway/run.py):
- Read agent._last_compaction_in_place; set history_offset=0 on rotation OR
in-place (both return the compacted set, so slicing past the pre-compaction
length would drop everything). Carry compacted_in_place in the result dict.
- No extra rewrite needed: the agent shares the gateway's SessionDB, so its
replace_messages already updated the canonical store load_transcript reads.
Manual /compress (gateway/slash_commands.py):
- The throwaway /compress agent has no _session_db, so rewrite_transcript is
the durable write. Previously gated behind 'if rotated:' which treated
'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite
— making /compress a silent no-op in in-place mode. Now rewrites on rotated
OR in_place; the data-loss guard still fires only for the genuine
no-rotation-AND-not-in-place failure.
Hygiene auto-compress already writes _compressed to the same id
unconditionally (its agent has no _session_db, can't rotate) — correct for
in-place, no change.
Tests (tests/run_agent/test_in_place_compaction.py):
- Assert the DURABLE transcript IS the compacted set after reload
(get_messages_as_conversation == compacted), message_count==2, flush
identity reset, and the rotation-independent signal set on in-place /
unset on rotation. Rotation regression guard unchanged.
Verified: 64 tests green across in-place + rotation/persistence/boundary/
concurrent/failure-sync/command/cli suites; E2E both modes (durable replace,
gateway offset=0, rotation preserves old transcript); ruff clean. Still
default-off.
Context compression today rewrites the message list AND rotates the
session id — it ends the session, forks a parent_session_id child, and
renumbers the title (name -> name #2). That moving identity key is the
root cause of a whole bug cluster: /goal lost (#33618), pending response
lost at the split (#14238), orphan sessions (#33907), TUI sid desync
(#36777), FTS search gaps + duplicate sidebar entries (#45117), null
continuation cwd (#42228), and title-rename dead-ends (#48989). It also
forced a large defensive apparatus (compression lock, contextvar/env/
logging triple-sync, orphan finalization, gateway SessionEntry
re-propagation, tip projection) whose only job is surviving a
mid-conversation id change.
Add a compression.in_place config flag (default False during rollout).
When True, compaction rewrites the transcript and rebuilds the system
prompt but keeps the SAME session_id: no end_session, no child row, no
title renumber, no contextvar/logging re-sync, no memory/context-engine
session-switch. The conversation keeps one durable id for life, like
Claude Code / Codex. Compaction is lossy by design — the pre-compaction
transcript is summarized away, not archived.
The rotation path is unchanged when the flag is off (moved verbatim into
an else branch). Staged rollout: this PR ships the option behind a
default-off flag for live validation; a follow-up flips the default and
deletes the now-redundant rotation machinery, superseding the 14 open
band-aid PRs in this area.
- hermes_cli/config.py: add compression.in_place (default False), documented
- agent/agent_init.py: resolve the flag -> agent.compression_in_place
- agent/conversation_compression.py: branch compress_context() on the flag
- tests/run_agent/test_in_place_compaction.py: in-place invariants +
rotation regression guard + config default
The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no
boundary data loss. Prompt-cache invariant preserved: the system-prompt
rebuild is the same single sanctioned invalidation that already happens
during compaction — no NEW invalidation. Message alternation preserved.
A nous inference_base_url that fails the host allowlist (e.g. a stale
stg-inference-api.nousresearch.com persisted before the allowlist
existed) was only replaced 'if refreshed_url:' — so when the validator
rejected the URL it left the poisoned value in place. The 'falling back
to default' warning fired but never took effect: every subsequent call,
including the auxiliary compression call, kept hitting the dead staging
endpoint and 401'd.
Reset to DEFAULT_NOUS_INFERENCE_URL when validation returns None at both
refresh sites in resolve_nous_runtime_credentials, so a poisoned
auth.json self-heals on the next refresh. The proxy adapter already did
this correctly; this brings the two auth.py sites in line.
The session-stable system prompt embeds Model:/Provider: identity lines,
but mid-turn failover (try_activate_fallback) swaps the runtime without
touching them, so a fallback model misreports itself as the primary when
asked "what model are you?".
rewrite_prompt_model_identity() rewrites the last occurrence of each line
on _cached_system_prompt when a fallback activates (and back on restore,
byte-identical so the primary's prefix cache still hits). The rewrite is
never persisted to the session DB. _sync_failover_system_message() patches
the in-flight api_messages[0] at all 8 failover sites so the current turn
ships the corrected identity. Cache-safe: the fallback's prefix cache is
cold on a model switch anyway.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
* feat(setup): Blank Slate setup mode — minimal agent, opt in to everything
Adds a third first-time setup option alongside Quick Setup and Full Setup.
Blank Slate forces ON only what an agent needs to run — provider & model,
the File Operations toolset, and the Terminal toolset — and turns
everything else OFF, then walks the user through opting each capability
back in.
What it does:
- platform_toolsets.cli = [file, terminal] (explicit, authoritative list)
- agent.disabled_toolsets = every other known toolset (web, browser,
code_execution, vision, memory, delegation, cronjob, skills, image_gen,
kanban, …). Applied last in the resolver, so it overrides the
non-configurable platform-toolset recovery that would otherwise re-add
toolsets like kanban — guaranteeing a true blank slate.
- Optional config features off: compression, memory + user-profile capture,
checkpoints, smart model routing, auto session reset.
- Bundled skills default to NONE (reuses the .no-bundled-skills marker);
offers to seed the full catalog.
- Walks through tools / plugins / MCP / messaging, all opt-in.
Proven end-to-end: with the Blank Slate config, model_tools.get_tool_definitions
emits exactly 6 schemas — patch, process, read_file, search_files, terminal,
write_file. Nothing else reaches the model.
Re-enable later via hermes tools / hermes skills opt-in --sync /
hermes setup agent.
Tests: tests/hermes_cli/test_setup_blank_slate.py (8 tests) pin the writers,
the resolver invariant ({file, terminal}), and the 6-schema end-to-end set.
Docs: getting-started/quickstart.md documents all three setup modes.
* feat(setup): Blank Slate fork — finish minimal, or walk through configs
After applying the minimal baseline (provider/model + file + terminal,
everything else off), Blank Slate now presents a choice instead of always
running the full walkthrough:
1. Start with everything disabled — finish now with the minimal agent.
2. Walk through all configurations — opt in to tools, skills, plugins, MCP,
and messaging.
Provider/model and terminal are still configured first either way (the agent
can't run without them). The finish-now path records the bundled-skill opt-out
so future `hermes update` runs don't re-inject skills. The walkthrough body
moved to a separate _blank_slate_walkthrough() helper.
Tests: TestBlankSlateFork covers both branches (finish-now applies baseline +
skill opt-out and skips the walkthrough; walkthrough path invokes it). Docs
updated to describe the fork.
test_telegram_webhook_secret reads telegram adapter source by path; point it
at plugins/platforms/telegram/adapter.py. test_windows_native_support
npm-spawn parametrization referenced gateway/platforms/whatsapp.py; point it at
plugins/platforms/whatsapp/adapter.py.
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element — so once
more than the cap (500) outbound timestamps are tracked, a still-recent
timestamp could be dropped while older ones survive, missing a genuine
reply-to-own-message. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.
Adds a regression test asserting oldest-first eviction + MRU promotion.
The hardened echo ring (#31250) changes _recent_sent_timestamps from a set
to an OrderedDict, so the reply-detection-cache regression test from the quote
salvage can no longer call .discard(); route it through the new
_consume_sent_timestamp() helper, which is the real echo-removal path.