hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Teknium 544c31b50b perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)

* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via
`get_provider_request_timeout()` and `get_provider_stale_timeout()`, both
invoked once per turn from `_resolved_api_call_timeout()` in run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:

- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):

  load_config()           265us/call (125us deepcopy + 140us infra)
  load_config_readonly()  138us/call (~48% faster)

`load_config_readonly()` returns the cached dict directly without the
defensive deepcopy. Documented contract: caller must not mutate. Returns
plain dict (not MappingProxyType) so downstream `isinstance(x, dict)`
guards keep working; this was caught during initial implementation when
MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent
turn). load_config() is unchanged for the 263 other call sites that mutate
the result before save_config(), are not in the hot path, or where the
safety guarantee matters more than the perf.
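The read-only fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from hermes_cli/config.py: the module-level `_CONFIG_CACHE`, its contents, and the `get_request_timeout()` helper are assumptions made up for the example. Only the names `load_config()` and `load_config_readonly()` come from the commit message.

```python
import copy

# Hypothetical stand-in for the parsed, cached config. The real cache
# structure is not shown in the commit message.
_CONFIG_CACHE = {"providers": {"example": {"request_timeout": 30}}}


def load_config():
    """Return an isolated deep copy that callers may mutate freely."""
    return copy.deepcopy(_CONFIG_CACHE)


def load_config_readonly():
    """Return the shared cached dict. Contract: callers must not mutate it."""
    return _CONFIG_CACHE


def get_request_timeout(provider, default=60):
    """Hypothetical read-only lookup, so the deep copy can be skipped."""
    config = load_config_readonly()
    providers = config.get("providers")
    # A plain dict keeps isinstance guards like this one working.
    if not isinstance(providers, dict):
        return default
    entry = providers.get(provider)
    if not isinstance(entry, dict):
        return default
    return entry.get("request_timeout", default)
```

In this sketch, callers that only read go through `load_config_readonly()`, while anything that edits and saves the config keeps using `load_config()` and never touches the shared dict.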
Profile A/B (cached config, 21-turn agent loop):

                                  BEFORE    AFTER    delta
  get_provider_request_timeout    55ms      16ms     -71%
  total function calls            399k      160k     -60%
  deepcopy calls (in hotspots)    34,398    ~0       ~elim

Verified:

- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls
`redact_sensitive_text`, which historically ran ALL 13 secret-pattern
regexes against every line (including DB connection strings, JWTs, Discord
mentions, Signal phone numbers, etc.), even for typical clean log records
like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives
still run the regex (which then matches nothing); false negatives are
impossible because every pattern requires the gated substring to match its
leading anchor:

- `_PREFIX_RE` gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE` gated on `=` in text
- `_JSON_FIELD_RE` gated on `:` and `"` in text
- `_AUTH_HEADER_RE` gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE` gated on `:` in text
- `_PRIVATE_KEY_RE` gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE` gated on `://` in text
- `_JWT_RE` gated on `eyJ` in text
- URL userinfo/query gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE` gated on `+`

Microbench (5 typical log records, 20k iterations each):

                                    BEFORE    AFTER     delta
  redact_sensitive_text per call    5.63us    1.79us    -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so the
chain saves ~1ms of CPU per conversation.
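The substring gating listed above can be illustrated with a small sketch. The two patterns below are simplified stand-ins written for this example, not the project's actual `_JWT_RE` or `_DB_CONNSTR_RE`, and the replacement strings are likewise assumptions. The point is only that each gate is a literal the pattern cannot match without, so skipping the regex never changes the output.

```python
import re

# Simplified, hypothetical patterns; the real ones are not shown in the commit.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
DB_CONNSTR_RE = re.compile(r"(\w+://[^:/\s]+:)([^@\s]+)(@)")


def redact_ungated(text):
    """Baseline: run every regex on every input."""
    text = JWT_RE.sub("[REDACTED_JWT]", text)
    text = DB_CONNSTR_RE.sub(r"\1***\3", text)
    return text


def redact_gated(text):
    """Run a regex only if the literal it requires is present."""
    if "eyJ" in text:  # every JWT_RE match starts with "eyJ"
        text = JWT_RE.sub("[REDACTED_JWT]", text)
    if "://" in text:  # every DB_CONNSTR_RE match contains "://"
        text = DB_CONNSTR_RE.sub(r"\1***\3", text)
    return text


samples = [
    "INFO run_agent: API call completed",
    "token eyJhbGci.eyJzdWIi.sig",
    "dsn=postgres://user:secret@db:5432/app",
]
for line in samples:
    # Gating is purely an optimization: both versions agree on every input.
    assert redact_gated(line) == redact_ungated(line)
```

A clean log line such as the first sample fails both `in` checks and returns without touching the regex engine at all, which is where the per-call saving comes from.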
Bigger win is the reduction in regex execution and GC pressure during heavy
logging sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB
connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.) verified
to produce identical redacted output before/after. All 75 existing
tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case that
'leaks' is pre-existing behavior: the URL query redaction requires a
well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows
`_needs_thinking_reasoning_pad` fires 495 times (~16 per turn) and each
call ran 3 helper methods, each hitting `base_url_host_matches` 1-4 times
via `urlparse`. Total cost: 3,342 base_url_host_matches calls + 3,373
urlparse calls accounting for ~36ms of agent-loop overhead (~7% of the
entire post-network work).

Provider / model / base_url don't change during a conversation except via
`switch_model` and fallback activation, both of which already overwrite
those attributes atomically. Cache the result on a tuple key; since the key
is derived from the very fields that would change, the cache
auto-invalidates on the next read after a switch. No manual invalidation
needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):

                                         BEFORE    AFTER    delta
  _needs_thinking_reasoning_pad cum      18ms      1ms      -94%
  _copy_reasoning_content_for_api cum    17ms      1ms      -94%
  base_url_host_matches calls            3,342     372      -89%
  urlparse calls                         3,373     403      -88%
  total function calls                   296k      223k     -25%

Verified:

- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped

2026-05-19 14:25:10 -07:00
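The tuple-key cache from the run_agent change can be sketched as below. The `Agent` class, its `needs_pad()` method, the host check, and the `compute_calls` counter are all hypothetical and exist only to show the pattern. The commit message names `_needs_thinking_reasoning_pad` and `switch_model` but does not show their bodies.

```python
from urllib.parse import urlparse


class Agent:
    """Hypothetical minimal agent; only the caching pattern matters here."""

    def __init__(self, provider, model, base_url):
        self.provider = provider
        self.model = model
        self.base_url = base_url
        self._pad_cache_key = None
        self._pad_cache_value = None
        self.compute_calls = 0  # demo instrumentation, not part of the pattern

    def _compute_needs_pad(self):
        # Stand-in for the expensive helper chain that parses base_url.
        self.compute_calls += 1
        host = urlparse(self.base_url).hostname or ""
        return host == "api.example.com"

    def needs_pad(self):
        # The key is built from the same fields the result depends on, so a
        # changed field produces a different key and forces a recompute.
        key = (self.provider, self.model, self.base_url)
        if key != self._pad_cache_key:
            self._pad_cache_value = self._compute_needs_pad()
            self._pad_cache_key = key
        return self._pad_cache_value

    def switch_model(self, provider, model, base_url):
        # No explicit invalidation: overwriting the fields is enough.
        self.provider, self.model, self.base_url = provider, model, base_url


agent = Agent("example", "model-a", "https://api.example.com/v1")
for _ in range(16):
    agent.needs_pad()
# Sixteen reads on unchanged fields trigger a single computation.
assert agent.compute_calls == 1
```

Keeping a single slot rather than a dict keyed by tuple is a deliberate simplification in this sketch: only the current (provider, model, base_url) combination is ever queried, so older entries would never be read again.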
..
proxy
__init__.py
_parser.py
_subprocess_compat.py
auth.py	fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes	2026-05-19 00:12:41 -07:00
auth_commands.py
azure_detect.py
backup.py
banner.py
browser_connect.py
bundles.py	feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373 )	2026-05-18 21:38:05 -07:00
callbacks.py
checkpoints.py
claw.py
cli_output.py
clipboard.py
codex_models.py
codex_runtime_plugin_migration.py
codex_runtime_switch.py
colors.py
commands.py	Revert "feat(telegram): support quick-command-only menus"	2026-05-18 23:59:57 -07:00
completion.py
config.py	perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)	2026-05-19 14:25:10 -07:00
copilot_auth.py
cron.py
curator.py
curses_ui.py
debug.py
default_soul.py
dep_ensure.py
dingtalk_auth.py
doctor.py	fix(doctor): attach codex CLI hint to OpenAI Codex auth warning for #27975	2026-05-19 00:14:39 -07:00
dump.py
env_loader.py
fallback_cmd.py
gateway.py	fix(gateway): harden Windows gateway install lifecycle	2026-05-19 11:23:15 -07:00
gateway_windows.py	fix(gateway): harden Windows gateway install lifecycle	2026-05-19 11:23:15 -07:00
goals.py
hooks.py
inventory.py
kanban.py	feat(kanban): add scheduled status for delayed follow-ups	2026-05-18 21:39:03 -07:00
kanban_db.py	fix(kanban): also hoist idx_events_run + drop redundant inner create	2026-05-19 08:09:11 -07:00
kanban_decompose.py
kanban_diagnostics.py	fix(kanban): honor severity thresholds in diagnostics	2026-05-18 20:47:01 -07:00
kanban_specify.py
kanban_swarm.py	feat(cli): add kanban swarm topology helper	2026-05-18 21:10:12 -07:00
logs.py
main.py	fix: register browse-sh in per-source limits and --source choices	2026-05-19 14:17:38 -07:00
mcp_config.py
memory_setup.py
model_catalog.py
model_normalize.py
model_switch.py	fix(model-switch): mark bare custom provider as current	2026-05-19 10:57:35 -07:00
models.py
nous_subscription.py
oneshot.py
pairing.py
platforms.py
plugins.py
plugins_cmd.py
profile_describer.py
profile_distribution.py
profiles.py
providers.py
pt_input_extras.py
pty_bridge.py
relaunch.py
runtime_provider.py	fix(runtime): treat 'ollama'/'vllm'/'llamacpp' aliases like 'custom' for base_url trust (#27132)	2026-05-19 14:23:19 -07:00
security_advisories.py
send_cmd.py
session_recap.py
setup.py	fix(cli): preserve setup config picker writes	2026-05-19 14:23:19 -07:00
skills_config.py
skills_hub.py	fix: register browse-sh in per-source limits and --source choices	2026-05-19 14:17:38 -07:00
skin_engine.py
slack_cli.py
status.py
stdio.py
timeouts.py	perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)	2026-05-19 14:25:10 -07:00
tips.py
tools_config.py
uninstall.py
vercel_auth.py
voice.py
web_server.py	fix(dashboard): use browser scrollback for chat wheel	2026-05-19 00:07:33 -07:00
webhook.py