mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-29 01:31:41 +00:00
fix(gateway): use transcript timestamp for auto-continue freshness
Follow-up to PR #16802 (BeliefanX). The original fix read `agent_history[-1].get("timestamp")` for the tool-tail freshness gate, but `gateway/run.py` strips the `timestamp` field off all tool/tool_call rows when building `agent_history` from the raw transcript (see `clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At runtime the tool-tail branch always saw `None` and silently took the legacy-fresh path — the stale-guard never fired for the tool-tail case it was supposed to cover. Changes: - Read the freshness signal from the RAW `history` list (via new `_last_transcript_timestamp()` helper) BEFORE the strip. Both the resume_pending branch and the tool-tail branch use this single signal, replacing the two divergent ones. - Default window bumped 15 min → 1 hour via new `_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was shorter than the default `gateway_timeout` of 30 min, so a legitimate long-running turn interrupted near its timeout boundary and resumed shortly after would have been misclassified as stale. - Configurable via `config.yaml` `agent.gateway_auto_continue_freshness` (bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same pattern as `gateway_timeout`). Set to 0 to disable the gate. - `_coerce_gateway_timestamp` now explicitly rejects bool (which is a subclass of int and would otherwise coerce to 0.0/1.0). - Tests rewritten to exercise the real production data shape: raw `history` → `_build_agent_history` strip → freshness decision. A regression guard (`test_stale_tool_tail_with_production_data_shape`) asserts `agent_history` tool rows carry NO timestamp, protecting against someone "fixing" the original bug by re-adding the stripped field (which would break the OpenAI tool-result message contract). Add BeliefanX to scripts/release.py AUTHOR_MAP. E2E verified: config.yaml → env var bridge → helper returns configured value; default 1h window; malformed/empty env var falls back to default; ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
This commit is contained in:
parent
93feffbcfa
commit
7444e49d4e
4 changed files with 407 additions and 35 deletions
|
|
@ -411,6 +411,20 @@ DEFAULT_CONFIG = {
|
|||
# (60+ tool iterations with tiny output) before users assume the
|
||||
# bot is dead and /restart.
|
||||
"gateway_notify_interval": 180,
|
||||
# Freshness window for the gateway auto-continue note (seconds).
|
||||
# After a gateway crash/restart/SIGTERM mid-run, the next user
|
||||
# message gets a "[System note: your previous turn was
|
||||
# interrupted — process the unfinished tool result(s) first]"
|
||||
# prepended so the model picks up where it left off. That's the
|
||||
# right behaviour while the interruption is fresh, but stale
|
||||
# markers (transcript last touched hours or days ago) can revive
|
||||
# an unrelated old task when the user's next message starts new
|
||||
# work. This window is the max age of the last persisted
|
||||
# transcript row for which we still inject the continue note.
|
||||
# Default 3600s comfortably covers a long turn (gateway_timeout
|
||||
# default is 1800s) plus runtime slack. Set to 0 to disable the
|
||||
# gate and restore pre-fix behaviour (always inject).
|
||||
"gateway_auto_continue_freshness": 3600,
|
||||
# How user-attached images are presented to the main model on each turn.
|
||||
# "auto" — attach natively when the active model reports
|
||||
# supports_vision=True AND the user hasn't explicitly
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue