fix: gateway token double-counting with cached agents (#3306)

The cached agent accumulates session_input_tokens across messages, so
run_conversation() returns cumulative totals. However, update_session()
added each returned total to the stored count with += (increment), so
every message after the first was double-counted.
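
The arithmetic can be sketched as follows. This is a minimal
illustration, not the gateway code: the token sizes and both helper
functions are hypothetical and only mimic a source that reports running
totals being stored by increment versus by assignment.

```python
# Hypothetical per-message token counts; not taken from the real gateway.
per_message_tokens = [100, 150, 200]


def cumulative_totals(deltas):
    """Mimic a cached agent that reports a running total after each message."""
    total = 0
    for delta in deltas:
        total += delta
        yield total


def stored_total(reported, absolute):
    """Apply each reported value to a stored counter.

    absolute=True assigns the reported value; absolute=False adds it,
    which is only correct when the reported values are deltas.
    """
    stored = 0
    for value in reported:
        stored = value if absolute else stored + value
    return stored


reported = list(cumulative_totals(per_message_tokens))  # [100, 250, 450]
buggy = stored_total(reported, absolute=False)  # 100 + 250 + 450 = 800
fixed = stored_total(reported, absolute=True)   # 450, the true session total
```

The first message is stored correctly either way (100), which is why
the error only appears from the second message onward.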

- session.py: change in-memory entry updates from += to = (direct
  assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts()
  that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged — it passes per-API-call deltas directly to
update_token_counts() with the default absolute=False (increment).
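
A rough sketch of the two update modes, assuming a SQLite-style store.
The table name, column name, and function signature below are
hypothetical stand-ins; only the absolute flag and the two SET forms
come from this commit.

```python
import sqlite3


def update_token_counts(conn, session_id, input_tokens, absolute=False):
    """Hypothetical stand-in for the real function; schema is assumed."""
    if absolute:
        # Caller supplies a cumulative total: overwrite the stored value.
        sql = "UPDATE sessions SET input_tokens = ? WHERE id = ?"
    else:
        # Caller supplies a per-call delta: add it to the stored value.
        sql = "UPDATE sessions SET input_tokens = input_tokens + ? WHERE id = ?"
    conn.execute(sql, (input_tokens, session_id))
    conn.commit()


def read_total(conn, session_id):
    """Return the stored input token count for one session."""
    row = conn.execute(
        "SELECT input_tokens FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "id TEXT PRIMARY KEY, input_tokens INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO sessions (id) VALUES ('gateway'), ('cli')")

# Gateway-style path: cumulative totals, written with absolute=True.
for total in (100, 250, 450):
    update_token_counts(conn, "gateway", total, absolute=True)

# CLI-style path: per-call deltas, written with the default increment.
for delta in (100, 150, 200):
    update_token_counts(conn, "cli", delta)

gateway_total = read_total(conn, "gateway")
cli_total = read_total(conn, "cli")
```

Both sessions end at the same stored total, since each caller uses the
mode that matches the kind of value it reports.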

Reported by @zaycruz in #3222. Closes #3222.
Teknium 2026-03-26 19:04:53 -07:00 committed by GitHub
parent 1519c4d477
commit a8df7f9964
3 changed files with 45 additions and 10 deletions

@@ -858,6 +858,7 @@ class TestLastPromptTokens:
billing_provider=None,
billing_base_url=None,
model="openai/gpt-5.4",
absolute=True,
)