mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
feat(compression): make protect_first_n configurable
The number of head messages preserved verbatim across context compactions was previously hardcoded to 3 in AIAgent.__init__. Expose it as `compression.protect_first_n` in config, matching the existing `protect_last_n` pattern. Motivation: users who rely on rolling compaction for long-running sessions had the opening user/assistant exchange pinned as head forever, which doesn't always match how they want the session framed after many compactions. Lowering to 1 preserves the system prompt + first non-system message; lowering to 0 preserves only the system prompt and lets the entire first exchange age out naturally through the summary. Semantics: `protect_first_n` counts non-system head messages protected **in addition to** the system prompt, which is always implicitly protected when present. Same meaning across both code paths: protect_first_n=0 → system prompt only (or nothing if no system message) protect_first_n=2 → system prompt + first 2 non-system messages (default) This unifies the CLI path (which reads messages with the system prompt at position 0) and the gateway path (where the gateway /compress handler strips the system prompt before calling compress() — see gateway/run.py L9150-9154 on the parent fork). Previously these two paths disagreed: CLI path: protect_first_n=1 → protect system prompt only Gateway path: protect_first_n=1 → protect first USER turn forever In practice on long-running gateway sessions the old semantics pinned whatever stale aside happened to be the first user message, reinserting it into every compaction summary indefinitely. Default chosen as 2 (not 3) so that the effective protected head count remains 3 messages in the common case — assuming a system prompt is present, default protection becomes system + 2 non-system = 3 total, matching the pre-feature behaviour where `protect_first_n` was hardcoded to protect 3 messages total. Sessions without a system prompt will see a small behaviour change (2 protected head messages instead of 3), but this is the rare path and the new semantics make the system-prompt-present case the well-defined one. Changes: - agent/context_compressor.py: redefine protect_first_n as the count of non-system head messages protected beyond the implicit system-prompt guarantee; both paths converge. Constructor default updated to 2. - hermes_cli/config.py: add `compression.protect_first_n` default (2), matching the new semantics. `show_config` label tweaked to 'Protect first: N non-system head messages' for clarity. - run_agent.py: read protect_first_n from config; 0 is now valid (system prompt is always implicitly protected). - cli-config.yaml.example: document the new key and rationale. - tests/agent/test_context_compressor.py: cover default, override, the end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour, the no-system-prompt (gateway) path, and the new shared-semantics regression test. Fixes #13751 Tested on Ubuntu 24.04.
This commit is contained in:
parent
ffbc21100d
commit
dee71a31e5
5 changed files with 149 additions and 11 deletions
|
|
@ -405,7 +405,7 @@ class ContextCompressor(ContextEngine):
|
|||
self,
|
||||
model: str,
|
||||
threshold_percent: float = 0.50,
|
||||
protect_first_n: int = 3,
|
||||
protect_first_n: int = 2,
|
||||
protect_last_n: int = 20,
|
||||
summary_target_ratio: float = 0.20,
|
||||
quiet_mode: bool = False,
|
||||
|
|
@ -1185,6 +1185,26 @@ The user has requested that this compaction PRIORITISE preserving all informatio
|
|||
idx += 1
|
||||
return idx
|
||||
|
||||
def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
|
||||
"""Total count of head messages to protect.
|
||||
|
||||
``protect_first_n`` is defined as *additional* messages protected
|
||||
beyond the system prompt. The system prompt (if present at index 0)
|
||||
is always implicitly protected — it's load-bearing context that
|
||||
must never be summarised away. This keeps semantics stable across
|
||||
call paths where the system prompt may or may not be included in
|
||||
the ``messages`` list (e.g. the gateway ``/compress`` handler
|
||||
strips it before calling compress()).
|
||||
|
||||
Examples:
|
||||
protect_first_n=0 → system prompt only (or nothing if no system msg)
|
||||
protect_first_n=3 → system + first 3 non-system messages
|
||||
"""
|
||||
head = 0
|
||||
if messages and messages[0].get("role") == "system":
|
||||
head = 1
|
||||
return head + self.protect_first_n
|
||||
|
||||
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
|
||||
"""Pull a compress-end boundary backward to avoid splitting a
|
||||
tool_call / result group.
|
||||
|
|
@ -1343,7 +1363,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
|
|||
skip the LLM call when the transcript is still entirely inside
|
||||
the protected head/tail.
|
||||
"""
|
||||
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
|
||||
compress_start = self._align_boundary_forward(messages, self._protect_head_size(messages))
|
||||
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
|
||||
return compress_start < compress_end
|
||||
|
||||
|
|
@ -1379,7 +1399,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
|
|||
self._last_aux_model_failure_model = None
|
||||
n_messages = len(messages)
|
||||
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
|
||||
_min_for_compress = self.protect_first_n + 3 + 1
|
||||
_min_for_compress = self._protect_head_size(messages) + 3 + 1
|
||||
if n_messages <= _min_for_compress:
|
||||
if not self.quiet_mode:
|
||||
logger.warning(
|
||||
|
|
@ -1399,7 +1419,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
|
|||
logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
|
||||
|
||||
# Phase 2: Determine boundaries
|
||||
compress_start = self.protect_first_n
|
||||
compress_start = self._protect_head_size(messages)
|
||||
compress_start = self._align_boundary_forward(messages, compress_start)
|
||||
|
||||
# Use token-budget tail protection instead of fixed message count
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue