hermes-agent/acp_adapter
Teknium f0dc919f92
fix(compression): include system prompt + tool schemas in token estimates (#18265)
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.

Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
  firing on 10K+ tokens of real pressure, confusing users about why
  compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
  next should_compress() check compares real usage against a stale
  underestimate — compression triggers late, potentially past the model's
  context limit on small-context models (#14695).

Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.

Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
  call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after

Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
  — no agent instance is in scope to query for system prompt/tools, and the
  existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
  system prompt via api_messages[0].

Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.

Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).

Closes #14695
Closes #6217
2026-04-30 23:03:54 -07:00
..
__init__.py feat: restore ACP server implementation from PR #949 (#1254) 2026-03-14 00:09:05 -07:00
__main__.py feat: restore ACP server implementation from PR #949 (#1254) 2026-03-14 00:09:05 -07:00
auth.py feat: restore ACP server implementation from PR #949 (#1254) 2026-03-14 00:09:05 -07:00
entry.py fix(mcp): move discovery out of model_tools import side effect (#16856) (#16899) 2026-04-28 01:17:58 -07:00
events.py fix(acp): improve zed integration 2026-04-17 13:29:26 -07:00
permissions.py fix(permissions): handle None response from ACP request_permission 2026-04-21 05:57:23 -07:00
server.py fix(compression): include system prompt + tool schemas in token estimates (#18265) 2026-04-30 23:03:54 -07:00
session.py fix(acp): normalize Windows cwd for WSL tool execution 2026-04-30 20:55:14 -07:00
tools.py fix(acp): improve zed integration 2026-04-17 13:29:26 -07:00