Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-06-23 10:42:00 +00:00
The thinking-signature recovery in agent/conversation_loop.py popped
reasoning_details from messages and then retried. That approach had two
defects.
First, the strip never reached the wire payload. api_messages is built
once at the start of the turn by shallow-copying every entry in messages
(around line 919). Each api_messages entry is a new dict, but it holds a
reference to the same reasoning_details list as the matching messages
entry. build_api_kwargs runs on every retry iteration of the inner
while-loop and reads api_messages, not messages. Popping
reasoning_details from messages therefore left api_messages untouched,
so the retry's request still carried the same thinking blocks Anthropic
had just rejected. The classifier latched thinking_sig_retry_attempted =
True after the first attempt, and the loop terminated with
max_retries_exhausted on the same 400.
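A minimal sketch of the first defect, assuming plain dicts stand in for the real message objects. build_request is a hypothetical stand-in for build_api_kwargs that, like the real function as described above, reads only api_messages; old_recovery is a simplified rendering of the previous strip, not the actual code.

```python
def build_request(api_messages):
    # Hypothetical stand-in for build_api_kwargs: the payload is derived
    # from api_messages only, never from the canonical messages list.
    return {"messages": [dict(m) for m in api_messages]}


def old_recovery(messages):
    # Simplified version of the buggy strip: it mutates the canonical
    # list, not the list the request is built from.
    for m in messages:
        m.pop("reasoning_details", None)


rd = [{"type": "thinking", "thinking": "r", "signature": "sig"}]
messages = [
    {"role": "user", "content": "q"},
    {"role": "assistant", "content": "a", "reasoning_details": rd},
]
# Built once per turn: new outer dicts that share their inner values.
api_messages = [m.copy() for m in messages]

old_recovery(messages)
retry_payload = build_request(api_messages)

# The retry still carries the rejected thinking blocks...
assert retry_payload["messages"][1]["reasoning_details"] is rd
# ...while the canonical list has lost them.
assert "reasoning_details" not in messages[1]
```

Both failure modes show up at once: the wire payload is unchanged, and the only list that was modified is the one that gets persisted.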
Second, the pop mutated the canonical message list. messages is the
same list _persist_session writes to state.db and the session
transcript, so a single recovery permanently wiped every signed
thinking block from the stored conversation. Subsequent turns reloaded
the stripped state, hit the same 400 ('invalid signature' or 'cannot
be modified', see #24107), and the agent stopped responding entirely.
Cascading compaction-ended sessions then chained off the corrupted
parent, and the affected chat could not produce a response on any
future turn.
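The persistence side can be sketched the same way. persist_session and load_session are hypothetical stand-ins for _persist_session and the next turn's reload; JSON is assumed here only to make the round trip concrete and says nothing about how state.db actually stores messages.

```python
import json


def persist_session(messages):
    # Hypothetical stand-in for _persist_session: serialize whatever is
    # currently in the canonical list.
    return json.dumps(messages)


def load_session(blob):
    # Hypothetical reload at the start of the next turn.
    return json.loads(blob)


messages = [
    {"role": "user", "content": "q"},
    {
        "role": "assistant",
        "content": "a",
        "reasoning_details": [
            {"type": "thinking", "thinking": "r", "signature": "sig"}
        ],
    },
]

# Old recovery: mutate the canonical list in place.
for m in messages:
    m.pop("reasoning_details", None)

reloaded = load_session(persist_session(messages))

# Every later turn starts from the stripped history; the signed
# thinking block cannot be recovered from storage.
assert all("reasoning_details" not in m for m in reloaded)
```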
Move the strip onto api_messages, the API-call-time list that is
rebuilt into kwargs on every retry. messages is no longer touched, so
the persisted state stays intact and the recovery actually reaches the
wire.
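A sketch of the corrected flow, under the same plain-dict assumption. build_api_kwargs and call_api here are hypothetical simplifications (the fake API rejects any request that still carries reasoning_details), and the single-retry latch only approximates how thinking_sig_retry_attempted is used in agent/conversation_loop.py.

```python
def build_api_kwargs(api_messages):
    # Hypothetical stand-in: kwargs are rebuilt from api_messages on
    # every iteration of the retry loop.
    return {"messages": [dict(m) for m in api_messages]}


def call_api(kwargs):
    # Hypothetical fake API: returns 400 while any signed thinking
    # blocks are still present, 200 otherwise.
    if any("reasoning_details" in m for m in kwargs["messages"]):
        return 400
    return 200


def run_turn(messages, max_retries=3):
    api_messages = [m.copy() for m in messages]  # built once per turn
    thinking_sig_retry_attempted = False
    for _ in range(max_retries):
        status = call_api(build_api_kwargs(api_messages))
        if status == 200:
            return status
        if thinking_sig_retry_attempted:
            break
        thinking_sig_retry_attempted = True
        # The fix: strip the API-call-time list, leave messages alone.
        for m in api_messages:
            if isinstance(m, dict) and "reasoning_details" in m:
                m.pop("reasoning_details", None)
    return "max_retries_exhausted"


rd = [{"type": "thinking", "thinking": "r", "signature": "sig"}]
messages = [
    {"role": "user", "content": "q"},
    {"role": "assistant", "content": "a", "reasoning_details": rd},
]
assert run_turn(messages) == 200
assert messages[1]["reasoning_details"] is rd  # canonical list intact
```

Because kwargs are rebuilt from api_messages on each iteration, the strip takes effect on the very next request, while messages still holds the original reasoning_details list for persistence.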
Observed against the native Anthropic Messages API on claude-opus-4-7
and claude-opus-4-8 with the interleaved-thinking-2025-05-14 beta on
hermes-agent 0.12.0 and 0.14.0. PR #24107 narrows the trigger; this
change makes the recovery do what it always claimed to do, and
prevents the destructive aftermath.
Tests cover the api_messages strip in isolation: popping from a shallow
copy does not affect the source, the canonical messages list survives
the strip, a second strip on a duplicate firing path is idempotent, and
the strip is a no-op when no messages carry reasoning_details.
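The shallow-copy property these tests rely on has a limit. A short sketch, not part of the committed tests: because the copies share the inner reasoning_details list, a strip that emptied the list in place (for example with list.clear()) would reach the canonical message just as the old pop did. make_message is a hypothetical helper.

```python
import copy


def make_message():
    # Hypothetical helper: a fresh assistant message with one signed block.
    rd = [{"type": "thinking", "thinking": "r", "signature": "s"}]
    return {"role": "assistant", "content": "x", "reasoning_details": rd}


# Removing the key from a shallow copy leaves the source intact.
src = make_message()
cp = src.copy()
cp.pop("reasoning_details", None)
assert len(src["reasoning_details"]) == 1

# Emptying the shared list in place does not: both dicts point at the
# same list object, so the canonical message loses its signed block.
src = make_message()
cp = src.copy()
cp["reasoning_details"].clear()
assert src["reasoning_details"] == []

# A deep copy gives the inner list its own identity, so in-place edits
# stay local (at the cost of copying every thinking block).
src = make_message()
cp = copy.deepcopy(src)
cp["reasoning_details"].clear()
assert len(src["reasoning_details"]) == 1
```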
Related: #24107, #26959, #17861.
93 lines
3.4 KiB
Python
"""Regression tests for the thinking-block signature recovery.

The recovery in ``agent/conversation_loop.py`` strips ``reasoning_details``
from ``api_messages`` (the API-call-time list rebuilt on every retry) and
leaves ``messages`` (the canonical store) untouched. The previous
implementation popped from ``messages`` directly, which never reached
``api_messages`` because each entry in ``api_messages`` was a shallow
copy of the corresponding entry in ``messages``, and the mutation also
landed in ``state.db`` on the next ``_persist_session`` call, corrupting
the conversation.

These tests cover the surface that the recovery touches in isolation:
shallow copies share inner field references; popping a key from one dict
does not remove it from the other; and a list of shallow copies behaves
the same way.
"""


def _shallow_copies(messages):
    return [m.copy() for m in messages]


def test_pop_on_shallow_copy_does_not_affect_source():
    rd = [{"type": "thinking", "thinking": "r", "signature": "s"}]
    src = {"role": "assistant", "content": "x", "reasoning_details": rd}
    cp = src.copy()

    cp.pop("reasoning_details", None)

    assert "reasoning_details" not in cp
    assert "reasoning_details" in src
    assert src["reasoning_details"] is rd


def test_strip_api_messages_leaves_canonical_messages_intact():
    """Mirrors the recovery: pop reasoning_details from api_messages only.

    The canonical ``messages`` list keeps its reasoning_details so future
    persists carry the original signed blocks.
    """
    rd_one = [{"type": "thinking", "thinking": "one", "signature": "sig_one"}]
    rd_two = [{"type": "thinking", "thinking": "two", "signature": "sig_two"}]
    messages = [
        {"role": "user", "content": "q1"},
        {"role": "assistant", "content": "a1", "reasoning_details": rd_one},
        {"role": "user", "content": "q2"},
        {"role": "assistant", "content": "a2", "reasoning_details": rd_two},
    ]
    api_messages = _shallow_copies(messages)

    stripped = 0
    for m in api_messages:
        if isinstance(m, dict) and "reasoning_details" in m:
            m.pop("reasoning_details", None)
            stripped += 1

    assert stripped == 2
    assert all("reasoning_details" not in m for m in api_messages)
    canonical_rd = [
        m.get("reasoning_details") for m in messages if m["role"] == "assistant"
    ]
    assert canonical_rd == [rd_one, rd_two]


def test_strip_is_idempotent_when_run_twice():
    """A second strip is a no-op when reasoning_details has already been
    removed from api_messages. Guards against a duplicate firing path.
    """
    api_messages = [
        {"role": "assistant", "content": "a", "reasoning_details": [{"x": 1}]},
        {"role": "user", "content": "q"},
    ]
    for _ in range(2):
        for m in api_messages:
            if isinstance(m, dict) and "reasoning_details" in m:
                m.pop("reasoning_details", None)

    assert all("reasoning_details" not in m for m in api_messages)


def test_strip_skips_messages_without_reasoning_details():
    api_messages = [
        {"role": "user", "content": "q"},
        {"role": "assistant", "content": "a"},
        {"role": "tool", "tool_call_id": "1", "content": "ok"},
    ]
    snapshot = [dict(m) for m in api_messages]

    for m in api_messages:
        if isinstance(m, dict) and "reasoning_details" in m:
            m.pop("reasoning_details", None)

    assert api_messages == snapshot