hermes-agent/agent
Teknium 6cb9917c73
perf(compression): defer feasibility check to first compression attempt (#28957)
`AIAgent.__init__` was eagerly calling
`_check_compression_model_feasibility()` which probes the auxiliary
provider chain and runs `get_model_context_length()` (potentially
network-bound) to decide whether the configured auxiliary model can
fit a full compression-threshold window. That cost ~440ms cold on
every agent construction.

Most `chat -q` invocations finish in 1-5 seconds and never accumulate
enough context to trip the compression threshold, so the feasibility
check is pure overhead. The result is also only consumed when
compression actually fires (the function adjusts the live threshold
downward if the aux model can't fit; absent that mutation, the gate
in `conversation_loop.py:442` would never fire anyway).

Defer to first `compress_context()` call via
`agent._compression_feasibility_checked` sentinel. Runs at most once
per agent lifetime, just before the first compression pass. The
warning storage (`_compression_warning`) and gateway replay
machinery is unchanged — it still emits to status_callback on the
first turn that actually needs compression.

E2E timing (chat -q 'hi', 3 runs each):
                BEFORE   AFTER    delta
  median wall   2.03s    1.86s    -8% (-169ms)
  min wall      1.92s    1.63s    -15% (-293ms)

Real cold-start observation (synthetic 31-turn agent loop): identical
behavior since feasibility check fires once on first compression and
caches. No semantic difference for sessions that DO compress.

UX trade-off: users with broken auxiliary-provider config no longer
see the warning at session start. They see it when compression first
fires — which is exactly when it matters. For users with working
config (the vast majority), the warning never fires anyway, so the
deferral is invisible.

Tests:
- tests/run_agent/test_compression_feasibility.py — 16/16 pass
  (the one test that asserted call-at-init was updated to drive the
  lazy check explicitly via agent._check_compression_model_feasibility())
- Live tmux session: 2-turn conversation + tool call completes clean,
  zero errors in agent.log
2026-05-19 17:27:17 -07:00
..
lsp chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
transports fix(codex): allow kanban worker board writes 2026-05-17 11:50:43 -07:00
__init__.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
account_usage.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
agent_init.py perf(compression): defer feasibility check to first compression attempt (#28957) 2026-05-19 17:27:17 -07:00
agent_runtime_helpers.py fix(agent): set tool_name on tool-result messages at construction time 2026-05-19 20:49:11 +01:00
anthropic_adapter.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
async_utils.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
auxiliary_client.py fix(xai-oauth): pin inference base_url to x.ai origin (#28952) 2026-05-19 14:51:21 -07:00
azure_identity_adapter.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
background_review.py feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644) 2026-05-18 20:02:22 -07:00
bedrock_adapter.py chore(deps): lazy-install boto3/botocore for bedrock adapter 2026-05-17 02:31:18 -07:00
browser_provider.py fix(browser): self-review pass — dead-import, log levels, future-proofing 2026-05-17 04:04:15 -07:00
browser_registry.py fix(browser): self-review pass — dead-import, log levels, future-proofing 2026-05-17 04:04:15 -07:00
chat_completion_helpers.py fix(xai-responses): strip enum values containing '/' from tool schemas 2026-05-18 10:37:35 -07:00
codex_responses_adapter.py fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s (#26644) 2026-05-15 16:35:12 -07:00
codex_runtime.py fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184) 2026-05-16 23:41:09 -07:00
context_compressor.py fix(compress): make abort-on-summary-failure opt-in via config flag (#28117) 2026-05-18 10:28:20 -07:00
context_engine.py fix(compression): keep default protect_first_n at 3 + align ABC 2026-05-13 22:25:16 -07:00
context_references.py fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
conversation_compression.py perf(compression): defer feasibility check to first compression attempt (#28957) 2026-05-19 17:27:17 -07:00
conversation_loop.py fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace 2026-05-18 20:04:57 -07:00
copilot_acp_client.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
credential_pool.py fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions 2026-05-18 10:31:40 -07:00
credential_sources.py feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider 2026-05-15 12:11:32 -07:00
curator.py feat(curator): hint at hermes curator pin in the rename block (#23212) 2026-05-10 06:44:53 -07:00
curator_backup.py fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731) 2026-05-02 01:29:57 -07:00
display.py chore: remove Atropos RL environments and tinker-atropos integration (#26106) 2026-05-15 10:36:38 +05:30
error_classifier.py fix(error_classifier): classify xAI Grok entitlement SSE errors as auth 2026-05-18 10:24:13 -07:00
file_safety.py fix(security): apply file safety to copilot acp fs 2026-04-21 01:31:58 -07:00
gemini_cloudcode_adapter.py fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks 2026-05-14 08:03:56 -07:00
gemini_native_adapter.py fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482) 2026-05-11 11:13:20 -07:00
gemini_schema.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
google_code_assist.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
google_oauth.py fix(google_oauth): close TOCTOU window when saving credentials 2026-05-04 03:16:19 -07:00
i18n.py feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914) 2026-05-10 07:14:14 -07:00
image_gen_provider.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
image_gen_registry.py fix(plugins): filter resolution by is_available() in web + image_gen registries 2026-05-13 22:31:28 -07:00
image_routing.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
insights.py Merge branch 'main' into feat/dashboard-skill-analytics 2026-04-20 05:25:49 -07:00
iteration_budget.py refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget 2026-05-16 17:59:32 -07:00
lmstudio_reasoning.py feat(agent): add lmstudio integration 2026-04-28 12:27:36 -07:00
manual_compression_feedback.py fix(compression): include system prompt + tool schemas in token estimates (#18265) 2026-04-30 23:03:54 -07:00
markdown_tables.py fix(cli): vertical fallback for markdown tables wider than terminal (#23948) 2026-05-11 16:49:13 -07:00
memory_manager.py 🐛 fix(memory): require newline after context tag 2026-05-18 10:53:08 -07:00
memory_provider.py docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings 2026-05-05 13:33:49 -07:00
message_sanitization.py refactor(run_agent): extract message sanitization to agent/message_sanitization.py 2026-05-16 17:41:09 -07:00
model_metadata.py fix(metadata): qwen3.6-plus has a 1M context window (#27008) 2026-05-17 02:31:18 -07:00
models_dev.py feat: add NovitaAI as LLM provider 2026-05-13 23:51:15 -07:00
moonshot_schema.py fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104) 2026-05-16 13:02:19 -07:00
nous_rate_guard.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
onboarding.py docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507) 2026-04-29 08:08:36 -07:00
plugin_llm.py feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194) 2026-05-10 07:09:28 -07:00
portal_tags.py feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779) 2026-05-12 20:49:20 -07:00
process_bootstrap.py refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget 2026-05-16 17:59:32 -07:00
prompt_builder.py fix(kanban): stale reclaim must not tick failure counter (#28680) 2026-05-19 03:15:18 -07:00
prompt_caching.py fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778) 2026-05-12 20:46:04 -07:00
rate_limit_tracker.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
redact.py perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866) 2026-05-19 14:25:10 -07:00
retry_utils.py feat(agent): add jittered retry backoff 2026-04-08 00:41:36 -07:00
shell_hooks.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
skill_bundles.py feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) 2026-05-18 21:38:05 -07:00
skill_commands.py fix(skills): load symlinked skill slash commands 2026-05-18 00:34:29 -07:00
skill_preprocessing.py fix: treat inline-shell timeout guard as timeout 2026-05-18 19:36:04 -07:00
skill_utils.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
stream_diag.py refactor(run_agent): extract stream diagnostics to agent/stream_diag.py 2026-05-16 18:28:17 -07:00
subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
system_prompt.py perf(prompt): cache kanban worker guidance at session init 2026-05-18 20:56:44 -07:00
think_scrubber.py fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184) 2026-05-05 04:33:38 -07:00
title_generator.py fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
tool_dispatch_helpers.py fix(agent): set tool_name on tool-result messages at construction time 2026-05-19 20:49:11 +01:00
tool_executor.py fix(agent): set tool_name on tool-result messages at construction time 2026-05-19 20:49:11 +01:00
tool_guardrails.py fix: add recovery hints to loop guard warnings 2026-05-19 00:12:12 -07:00
tool_result_classification.py fix: classify landed file mutations with diagnostics 2026-05-13 06:46:23 -07:00
trajectory.py Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
usage_pricing.py fix(pricing): add deepseek-v4-pro to official docs pricing table 2026-05-12 16:32:57 -07:00
video_gen_provider.py feat(video_gen): unified video_generate tool with pluggable provider backends (#25126) 2026-05-13 16:39:41 -07:00
video_gen_registry.py feat(video_gen): unified video_generate tool with pluggable provider backends (#25126) 2026-05-13 16:39:41 -07:00
web_search_provider.py fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup 2026-05-13 22:31:28 -07:00
web_search_registry.py fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup 2026-05-13 22:31:28 -07:00