Adds auxiliary.background_review.{provider,model} (default auto = main chat
model — unchanged). Set it to a different, cheaper model and the post-turn
self-improvement review runs there for ~3-5x lower cost.
Cache-aware by design: the main chat is warm in the prompt cache, so the
default full-history replay on the main model is cheap cache reads — left
exactly as-is. A different model can't reuse that cache (different key), so
when (and only when) routed to a different model the fork replays a compact
digest instead of the full transcript, minimising what it cold-writes on the
aux model. Same model -> full replay; different model -> digest.
Quality holds in benchmarks: memory capture identical, skill near-identical.
Nothing changes unless you opt in by naming a different model.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
The background-review fork (fires ~every 10 turns) pins
review_agent.session_id = agent.session_id — the parent's LIVE id — for
prefix-cache parity, then calls close(). With session finalization now in
close(), that would end the still-active parent session mid-conversation.
Set _end_session_on_close = False on the fork so the real owner (CLI close /
gateway reset / cron) finalizes the session instead.
Follow-up to the #12029 fix.
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):
- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
dropping post-build-injected memory-provider (mem0/honcho/…) and context-
engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
(tools, valid_tool_names) pair are consistent even under concurrent callers
(no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
just enabled in config this session is picked up; automatic paths reuse the
agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
between-turns refresh, so the startup wait is only a small turn-1 UX bump
rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
(alternating tool sets), added the #5544-gate regression, the memory/context
preservation regression, and a "callable next turn via valid_tool_names"
contract; removed a dead monkeypatch line.
The memory tool was strictly one-op-per-call. With the store running near
its char limit by design, a new add that would overflow gets rejected with
'consolidate now, then retry' -- but the model could not consolidate and add
in one call. It had to remove/replace across several turns, then retry the
add, each turn re-sending the whole conversation context. Expensive thrash.
Add an 'operations' array: a list of add/replace/remove ops applied
atomically against the FINAL char budget. The model frees space and adds new
entries in ONE call, even when an add alone would overflow. All-or-nothing:
any bad op aborts the whole batch, nothing written.
Root-cause note: the two agent-level memory interception sites
(agent_runtime_helpers.py, tool_executor.py) silently dropped any param not
in their explicit kwarg list, so 'operations' never reached the handler and
batch calls failed with 'Unknown action None'. Both now pass it through and
bridge each add/replace op to external memory providers.
Also: success response is now terminal (done=true + 'do not repeat' note,
no full-entries echo that invited re-edits); schema rewritten to lead with
the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars).
Live-verified: near-full consolidate-and-add went 7 calls -> 1 call,
stable across 3 reps. 103 memory/approval tests + 398 background-review/
run_agent tests green; 6 new batch tests added.
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.
Before (verbose mode):
💾📝 Patched SKILL.md in skill 'gogcli' (1 replacement).
After (verbose mode):
💾📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."
Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
previews (truncated to 200 chars) in the result via '_change' key.
_create_skill() and _edit_skill() include skill description from
frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
'_change' dict from skill tool results and formats descriptive
notifications per action type (patch → old→new diff, create/edit →
description preview). Falls back to generic message when _change
data is unavailable (backwards compatible).
This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
Background memory reviews now support three notification modes,
configured via display.memory_notifications in config.yaml:
off — no chat notification (still logged to stdout/HA log)
on — generic '💾 Memory updated' (default, unchanged behavior)
verbose — content preview with action indicators:
💾 Memory ➕ Hermes Repo liegt unter /config/amy/hermes-agent/...
💾 Memory ✏️ Updated repo path from claude-code to hermes-agent...
💾 Memory ➖ old entry about claude-code path...
Previews are truncated to 120 chars for adds/replaces, 60 for removes.
Each action gets its own line in verbose mode for readability.
Files: run_agent.py, gateway/run.py
The per-session compression lock prevents same-window concurrent forks but
not cross-turn ones: the background-review fork shares the parent's
session_id, so if it won a compression race its new child session was never
adopted by the gateway (the fork is single-lifecycle). The next foreground
turn then started from the stale parent and compressed it again, leaving the
same parent with two sibling children.
Set review_agent.compression_enabled = False so the fork never triggers
compression. Both trigger sites in conversation_loop.py gate on
compression_enabled before calling _compress_context, so the fork can never
rotate the shared parent. Review needs full context anyway — compressing
would degrade the memory/skill summary.
The per-session lock is kept as defense-in-depth for any future shared-session
path. Adds a regression test that fails without the flag and passes with it.
Closes#38727
Snapshot review_agent._session_messages before teardown so close() can
clean per-session state without dropping the user-visible
self-improvement summary. Adds two regressions:
- bg-review summarizer receives captured review-agent tool messages
after review_agent.close() runs
- context-compressor protected-head handoff rehydration populates
_previous_summary and keeps the old handoff out of newly summarized
turns
Salvaged from PR #26039 onto current main after agent/background_review.py
extraction. Original commit 63eaf6055; bg-review test updated to patch
the module-level summarize_background_review_actions in
agent.background_review instead of the now-forwarder
AIAgent._summarize_background_review_actions.
The post-turn background reviewer prompt listed pinned skills under
'Protected skills (DO NOT edit these)' alongside bundled and
hub-installed skills, with the instruction to say 'Nothing to save.'
if only protected skills needed updating. This meant the reviewer
would refuse to patch a pinned skill even when the user explicitly
wanted that skill improved.
The underlying tool layer already gets this right: skill_manage's
_pinned_guard only fires on delete; patch/edit/write_file go through
on pinned skills. Curator archive/consolidation still skips pinned
at the data layer (agent/curator.py), which is the correct place for
that protection — pin's job is anti-deletion, not anti-improvement.
Both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT now explicitly
tell the reviewer that pinned skills can be patched, with rationale,
so it doesn't bail out of an improvement just because the target is
pinned.
- Replace 18-line comment block with 3-line invariant statement
- Trim test docstrings from multi-paragraph to single-line summaries
- Trim assertion messages from 4-line to 2-line mismatch reports
- Replace 5-line WHAT comments in stubs with 1-line WHY comments
- Add ziliangdotme@gmail.com -> ziliangpeng to AUTHOR_MAP
## Summary
The background skill/memory-review fork constructed a child `AIAgent`
without propagating `enabled_toolsets` / `disabled_toolsets` from the
parent. When the parent narrowed its toolset (via `hermes tools
disable` or `config.yaml`), the fork's default `enabled_toolsets=None`
expanded to "all registered tools" — and the fork's outbound request
body sent a wider `tools[]` array than the parent's main-turn request.
Anthropic's prompt-cache key includes the `tools[]` array byte-for-byte,
so this divergence forked the cache lineage on every nudge and forced a
full prefix rewrite. On a captured ~4 hour Claude-via-Hermes session
this cost roughly 4.3 M cache-write tokens — about half of those
attributable to the per-nudge alternation between the main turn's
narrowed `tools[]` and the review fork's wider `tools[]`.
## Goal
Extend the byte-stability invariant established by PR #17276 (which
fixed `system`) to the `tools[]` slot of the request body, so the
review fork's outbound request hits the parent's warmed Anthropic
prefix cache regardless of how the parent's toolset is configured.
## Implementation
Two-line change in `agent/background_review.py`: pass
`enabled_toolsets=getattr(agent, "enabled_toolsets", None)` and the
matching `disabled_toolsets` kwarg into the `AIAgent(...)` call inside
`_spawn_background_review`. Adds an explanatory block comment that
calls out the cache-key dependency and the relationship to PR #17276.
The post-construction runtime whitelist
(`set_thread_tool_whitelist({memory, skills})`) is untouched — it
still gates which tools the model is allowed to *dispatch*. This
change aligns only what the request body *transmits*, not what the
review is allowed to do, so the safety contract from issue #15204
remains intact.
## Testing
- `tests/run_agent/test_background_review_cache_parity.py`: new
`test_review_fork_inherits_parent_toolset_config` asserts the
parent's `enabled_toolsets` and `disabled_toolsets` reach the
review-fork constructor as kwargs.
- `tests/run_agent/test_background_review_toolset_restriction.py`:
the existing `test_background_review_does_not_narrow_toolset_schema`
was inverted (its old "must NOT pass enabled_toolsets" rule was
built on the assumption that the parent always ran with the
registry default — wrong in practice when the parent is narrowed).
Renamed to `test_background_review_matches_parent_toolset_config`
and updated to assert the parent's value propagates verbatim.
- Verified the new positive test fails without the fix and passes
with it.
- Full suite for `test_background_review*`:
```
$ python -m pytest tests/run_agent/test_background_review.py \
tests/run_agent/test_background_review_summary.py \
tests/run_agent/test_background_review_toolset_restriction.py \
tests/run_agent/test_background_review_cache_parity.py -q
18 passed in 1.85s
```
## Scope
- `agent/background_review.py`: 2 added kwargs + explanatory comment.
- Two test files: one new positive test, one inverted existing test.
- No production code paths outside the review fork; no schema changes;
no public-API changes.
Refs: ziliangpeng/hermes-agent#1 (root-cause analysis with wire-level
cache-write measurements). Extends PR #17276's `system`-bytes
invariant to the `tools[]` slot.
The background review prompts (_SKILL_REVIEW_PROMPT and
_COMBINED_REVIEW_PROMPT) now include explicit protection rules
for bundled, hub-installed, and pinned skills — aligning with
the curator's existing policy at curator.py L345/350.
Before this change, bg-review could freely rewrite bundled skills
like 'hermes-agent' or pinned skills, while the 7-day curator
explicitly skips them.
The review agent now sees:
• Bundled skills (shipped with Hermes)
• Hub-installed skills (installed via hermes skills install)
• Pinned skills (marked via hermes curator pin)
If only protected skills need updating, the review says
'Nothing to save.' and stops.
Fixes#27644
Original commit 973f27e95 by Teknium targeted _spawn_background_review in
pre-refactor run_agent.py. The body now lives in
agent/background_review._spawn_background_review — re-applied there.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT,
_SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move
out of the AIAgent class body and into agent/background_review.py where
they're consumed.
AIAgent re-exposes them as class attributes via 'from ... import' inside
the class body — Python binds those names into the class namespace so
existing AIAgent._MEMORY_REVIEW_PROMPT references keep working.
spawn_background_review_thread also falls back to the module-level
constants if an agent doesn't have the attribute (preserves the test
pattern of mocking these on the agent).
tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).
run_agent.py: 9986 -> 9800 lines (-186).
Move the background-review subsystem (the self-improvement loop — see the
README) out of run_agent.py into a dedicated module.
* summarize_background_review_actions — was the @staticmethod that builds
the user-facing action summary
* spawn_background_review_thread — builds the thread target + prompt;
the actual review loop body (forked AIAgent, runtime inheritance,
tool whitelist, suppression, teardown) lives in _run_review_in_thread
* build_memory_write_metadata — provenance for external memory mirrors
AIAgent keeps thin wrappers for backward compatibility AND because tests
patch run_agent.threading.Thread to assert lifecycle behavior — the
threading.Thread construction stays in AIAgent._spawn_background_review,
the inner work moves out.
tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client.py::test_custom_endpoint... — confirmed failing
on main before this change). 3 skipped.
run_agent.py: 15272 -> 14972 lines (-300).