hermes-agent/hermes_cli
Teknium 544c31b50b
perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)
* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):
  load_config()          265us/call  (125us deepcopy + 140us infra)
  load_config_readonly() 138us/call  (~48% faster)

`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that
mutate the result before save_config(), are not in the hot path, or
where the safety guarantee matters more than the perf.

Profile A/B (cached config, 21-turn agent loop):
                                BEFORE  AFTER   delta
  get_provider_request_timeout  55ms    16ms    -71%
  total function calls          399k    160k    -60%
  deepcopy calls (in hotspots)  34,398  ~0      ~elim

Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

* perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match
its leading anchor:

- `_PREFIX_RE`        gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE`    gated on `=` in text
- `_JSON_FIELD_RE`    gated on `:` and `"` in text
- `_AUTH_HEADER_RE`   gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE`      gated on `:` in text
- `_PRIVATE_KEY_RE`   gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE`    gated on `://` in text
- `_JWT_RE`           gated on `eyJ` in text
- URL userinfo/query  gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE`  gated on `+`

Microbench (5 typical log records, 20k iterations each):
                              BEFORE  AFTER  delta
  redact_sensitive_text per call  5.63us  1.79us  -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so
the chain saves ~1ms of CPU per conversation. Bigger win is the
reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).

Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):
                                      BEFORE  AFTER  delta
  _needs_thinking_reasoning_pad cum    18ms    1ms    -94%
  _copy_reasoning_content_for_api cum  17ms    1ms    -94%
  base_url_host_matches calls          3,342   372    -89%
  urlparse calls                       3,373   403    -88%
  total function calls                 296k    223k   -25%

Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped
2026-05-19 14:25:10 -07:00
..
proxy feat(proxy): add xai upstream adapter for Grok via OAuth 2026-05-18 20:09:32 -07:00
__init__.py chore: release v0.14.0 (2026.5.16) (#26862) 2026-05-16 02:58:57 -07:00
_parser.py fix: add dashboard to CLI help epilogue and Docker CI smoke test 2026-05-07 06:16:23 -07:00
_subprocess_compat.py feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags 2026-05-08 14:27:40 -07:00
auth.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
auth_commands.py feat(cli): wire --manual-paste into `hermes auth add and hermes model` 2026-05-18 20:10:52 -07:00
azure_detect.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
backup.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
banner.py refactor: DRY cleanup from code review 2026-05-15 14:45:43 -07:00
browser_connect.py fix(browser): address Copilot review on /browser connect 2026-04-28 22:11:10 -07:00
bundles.py feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) 2026-05-18 21:38:05 -07:00
callbacks.py fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902) 2026-04-14 16:11:37 -07:00
checkpoints.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
claw.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
cli_output.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
clipboard.py fix(clipboard): only read PNG signature bytes, not entire file 2026-05-13 22:54:21 -07:00
codex_models.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
codex_runtime_plugin_migration.py fix(codex-runtime): de-dup [plugins.X] tables and stop leaking HERMES_HOME into config.toml 2026-05-15 02:31:30 -07:00
codex_runtime_switch.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py Revert "feat(telegram): support quick-command-only menus" 2026-05-18 23:59:57 -07:00
completion.py test(cli): strengthen zsh completion regression coverage 2026-05-13 09:34:15 -07:00
config.py perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866) 2026-05-19 14:25:10 -07:00
copilot_auth.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
cron.py feat: add cron job profile support 2026-05-18 17:39:50 +00:00
curator.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
curses_ui.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
debug.py fix(debug): redact log content at upload time in hermes debug share 2026-05-03 11:42:20 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
dep_ensure.py feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection (#27845) 2026-05-18 16:34:24 +05:30
dingtalk_auth.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
doctor.py fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975 2026-05-19 00:14:39 -07:00
dump.py refactor(env): use shared Hermes dotenv loader 2026-05-05 10:13:13 -07:00
env_loader.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
fallback_cmd.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
gateway.py fix(gateway): harden Windows gateway install lifecycle 2026-05-19 11:23:15 -07:00
gateway_windows.py fix(gateway): harden Windows gateway install lifecycle 2026-05-19 11:23:15 -07:00
goals.py feat: inject current time into goal judge prompt 2026-05-16 23:05:27 -07:00
hooks.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
inventory.py refactor(inventory): extract shared ConfigContext + build_models_payload 2026-05-13 22:31:11 -07:00
kanban.py feat(kanban): add scheduled status for delayed follow-ups 2026-05-18 21:39:03 -07:00
kanban_db.py fix(kanban): also hoist idx_events_run + drop redundant inner create 2026-05-19 08:09:11 -07:00
kanban_decompose.py fix: assign single-task kanban decompositions 2026-05-18 20:26:02 -07:00
kanban_diagnostics.py fix(kanban): honor severity thresholds in diagnostics 2026-05-18 20:47:01 -07:00
kanban_specify.py fix(cli): make kanban specify max_tokens configurable 2026-05-18 20:15:20 -07:00
kanban_swarm.py feat(cli): add kanban swarm topology helper 2026-05-18 21:10:12 -07:00
logs.py feat: component-separated logging with session context and filtering (#7991) 2026-04-11 17:23:36 -07:00
main.py fix: register browse-sh in per-source limits and --source choices 2026-05-19 14:17:38 -07:00
mcp_config.py fix(mcp): pre-compile env-var regex and unify interpolation 2026-05-15 01:43:54 -07:00
memory_setup.py fix: restrict .env file permissions to 0600 2026-05-14 07:59:38 -07:00
model_catalog.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
model_normalize.py fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802) 2026-05-06 09:08:33 -07:00
model_switch.py fix(model-switch): mark bare custom provider as current 2026-05-19 10:57:35 -07:00
models.py fix: detect gh-copilot deprecation and improve GitHub Models 413 errors (#10648) 2026-05-16 02:24:48 -07:00
nous_subscription.py feat(web): add SearXNG as a native search-only backend 2026-05-06 10:05:29 -07:00
oneshot.py fix(oneshot): pass fallback_providers from profile config to AIAgent 2026-05-18 20:37:23 -07:00
pairing.py fix(pairing): enforce lockout on approve_code, not just generate_code (#10195) (#21325) 2026-05-07 07:18:21 -07:00
platforms.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
plugins.py feat(browser): add BrowserProvider ABC mirroring web_search_provider template 2026-05-17 04:04:15 -07:00
plugins_cmd.py test(plugins): cover _discover_all_plugins recursion + cross-link loader 2026-05-16 17:15:19 -07:00
profile_describer.py feat(kanban): orchestrator-driven auto-decomposition on triage (#27572) 2026-05-17 13:54:12 -07:00
profile_distribution.py feat(profile): shareable profile distributions via git (#20831) 2026-05-08 10:04:32 -07:00
profiles.py feat(kanban): orchestrator-driven auto-decomposition on triage (#27572) 2026-05-17 13:54:12 -07:00
providers.py fix: add default base_url_override for ollama-cloud provider 2026-05-18 14:31:37 -07:00
pt_input_extras.py fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777) 2026-05-09 12:48:14 -07:00
pty_bridge.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
relaunch.py fix(windows): prefer npm.cmd over npm.ps1, skip .py argv0 in relaunch 2026-05-08 14:27:40 -07:00
runtime_provider.py fix(runtime): treat 'ollama'/'vllm'/'llamacpp' aliases like 'custom' for base_url trust (#27132) 2026-05-19 14:23:19 -07:00
security_advisories.py feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220) 2026-05-12 01:02:25 -07:00
send_cmd.py fix(review): address Copilot follow-up on sanitizer and file decode errors 2026-05-16 23:00:58 -05:00
session_recap.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
setup.py fix(cli): preserve setup config picker writes 2026-05-19 14:23:19 -07:00
skills_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
skills_hub.py fix: register browse-sh in per-source limits and --source choices 2026-05-19 14:17:38 -07:00
skin_engine.py fix(tui): improve charizard completion menu contrast 2026-05-18 20:05:23 -07:00
slack_cli.py fix(slack): enable writable app home DMs in manifest 2026-05-08 17:01:12 -07:00
status.py feat(status): show xAI OAuth login state in hermes status 2026-05-17 11:35:57 -07:00
stdio.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00
timeouts.py perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866) 2026-05-19 14:25:10 -07:00
tips.py feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590) 2026-05-17 23:28:45 -07:00
tools_config.py feat(tools): mirror image_gen plugin-injection in Browser Automation picker 2026-05-17 04:04:15 -07:00
uninstall.py docs(windows): avoid piping installer directly into iex 2026-05-18 20:05:47 -07:00
vercel_auth.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
voice.py fix(tui): restore voice push-to-talk parity (#20897) 2026-05-06 15:49:59 -07:00
web_server.py fix(dashboard): use browser scrollback for chat wheel 2026-05-19 00:07:33 -07:00
webhook.py chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) 2026-05-11 11:13:25 -07:00