perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)

* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()` —
both invoked once per turn from `_resolved_api_call_timeout()` in
run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):
  load_config()          265us/call  (125us deepcopy + 140us infra)
  load_config_readonly() 138us/call  (~48% faster)

`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working — caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that
mutate the result before save_config(), are not in the hot path, or
where the safety guarantee matters more than the perf.

Profile A/B (cached config, 21-turn agent loop):
                                BEFORE  AFTER   delta
  get_provider_request_timeout  55ms    16ms    -71%
  total function calls          399k    160k    -60%
  deepcopy calls (in hotspots)  34,398  ~0      ~elim

Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config*.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

* perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line — including DB connection strings, JWTs,
Discord mentions, Signal phone numbers, etc. — even for typical clean
log records like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match
its leading anchor:

- `_PREFIX_RE`        gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE`    gated on `=` in text
- `_JSON_FIELD_RE`    gated on `:` and `"` in text
- `_AUTH_HEADER_RE`   gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE`      gated on `:` in text
- `_PRIVATE_KEY_RE`   gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE`    gated on `://` in text
- `_JWT_RE`           gated on `eyJ` in text
- URL userinfo/query  gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE`  gated on `+`

Microbench (5 typical log records, 20k iterations each):
                              BEFORE  AFTER  delta
  redact_sensitive_text per call  5.63us  1.79us  -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so
the chain saves ~1ms of CPU per conversation. Bigger win is the
reduction in regex execution and GC pressure during heavy logging
sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.)
verified to produce identical redacted output before/after. All 75
existing tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case
that 'leaks' is pre-existing behavior — the URL query redaction
requires a well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad`
fires 495 times (~16 per turn) and each call ran 3 helper methods, each
hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost:
3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for
~36ms of agent-loop overhead (~7% of the entire post-network work).

Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation — both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the
key is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):
                                      BEFORE  AFTER  delta
  _needs_thinking_reasoning_pad cum    18ms    1ms    -94%
  _copy_reasoning_content_for_api cum  17ms    1ms    -94%
  base_url_host_matches calls          3,342   372    -89%
  urlparse calls                       3,373   403    -88%
  total function calls                 296k    223k   -25%

Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped
This commit is contained in:
Teknium 2026-05-19 14:25:10 -07:00 committed by GitHub
parent 784febe1cf
commit 544c31b50b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 165 additions and 42 deletions

View file

@ -4331,7 +4331,38 @@ def load_config() -> Dict[str, Any]:
The cache is keyed on ``str(config_path)`` so profile switches
(which change ``HERMES_HOME`` and therefore ``get_config_path()``)
don't collide.
Read-only callers should use ``load_config_readonly()`` to skip the
defensive deepcopy that path matters in agent-loop hot spots like
``get_provider_request_timeout`` which is called once per API turn.
"""
return _load_config_impl(want_deepcopy=True)
def load_config_readonly() -> Dict[str, Any]:
"""Fast-path variant of ``load_config()`` for callers that ONLY READ.
Returns the cached config dict directly without the defensive deepcopy
that ``load_config()`` applies. **Mutating the returned dict (or any
nested structure) corrupts the in-process cache for every subsequent
caller** only use this when you are absolutely sure your code path
will not write to the result. If you need to mutate or pass to
``save_config``, call ``load_config()`` instead.
Why this exists: ``load_config()`` cache-hit cost is ~265us per call,
half of which (~135us) is the defensive deepcopy. The agent loop calls
into config reads (timeouts, thresholds, feature flags) ~20-50x per
conversation; skipping deepcopy here removes a measurable allocation
source and the GC pressure that comes with it.
Note: this returns a plain ``dict`` (not ``MappingProxyType``) so
existing ``isinstance(x, dict)`` guards downstream keep working. The
safety guarantee is purely documented, not enforced be careful.
"""
return _load_config_impl(want_deepcopy=False)
def _load_config_impl(*, want_deepcopy: bool) -> Dict[str, Any]:
with _CONFIG_LOCK:
ensure_hermes_home()
config_path = get_config_path()
@ -4345,7 +4376,7 @@ def load_config() -> Dict[str, Any]:
cached = _LOAD_CONFIG_CACHE.get(path_key)
if cached is not None and cache_key is not None and cached[:2] == cache_key:
return copy.deepcopy(cached[2])
return copy.deepcopy(cached[2]) if want_deepcopy else cached[2]
config = copy.deepcopy(DEFAULT_CONFIG)
@ -4369,9 +4400,24 @@ def load_config() -> Dict[str, Any]:
expanded = _expand_env_vars(normalized)
_LAST_EXPANDED_CONFIG_BY_PATH[path_key] = copy.deepcopy(expanded)
if cache_key is not None:
_LOAD_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], copy.deepcopy(expanded))
# Cache stores a separate deepcopy so subsequent ``load_config()``
# (deepcopy=True) callers can mutate freely without affecting the
# cached value, and ``load_config_readonly()`` (deepcopy=False)
# callers all see the same stable cached object.
cached_copy = copy.deepcopy(expanded)
_LOAD_CONFIG_CACHE[path_key] = (cache_key[0], cache_key[1], cached_copy)
# On the readonly path return the same cached object subsequent
# calls will see — keeps "two readonly calls return the same
# object" invariant that callers may rely on for identity checks.
if not want_deepcopy:
return cached_copy
else:
_LOAD_CONFIG_CACHE.pop(path_key, None)
# First-load result is a fresh dict (not aliased to the cache); safe
# to return directly. For the deepcopy=True path this is the
# canonical "freshly-built mutable result" the function has always
# returned. For the deepcopy=False path with no cache (e.g. config
# file missing), it's also fine — callers get an isolated object.
return expanded