hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00


Teknium 544c31b50b perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)

* perf(config): add load_config_readonly() fast path for hot agent loop

`load_config()` is called from the agent loop's per-API-call hot path via `get_provider_request_timeout()` and `get_provider_stale_timeout()`, both invoked once per turn from `_resolved_api_call_timeout()` in run_agent.py.

Profiling a synthetic 20-tool-call agent run revealed:
- 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop)
- 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain)
- 8,652 `_expand_env_vars` invocations (~412 per turn)

Microbench (cache-hit, real config.yaml present):
  load_config()           265us/call (125us deepcopy + 140us infra)
  load_config_readonly()  138us/call (~48% faster)

`load_config_readonly()` returns the cached dict directly without the defensive deepcopy. Documented contract: caller must not mutate. Returns plain dict (not MappingProxyType) so downstream `isinstance(x, dict)` guards keep working; this was caught during initial implementation when MappingProxyType broke get_provider_request_timeout's guard logic.

Wired into hermes_cli/timeouts.py (the two functions called per agent turn). load_config() is unchanged for the 263 other call sites that mutate the result before save_config(), are not in the hot path, or where the safety guarantee matters more than the perf.
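The split between a copying loader and a read-only fast path can be sketched roughly as below. This is a minimal illustration, not the code in hermes_cli: `_CONFIG_CACHE`, its contents, and `get_request_timeout` are hypothetical stand-ins, and only the names `load_config` and `load_config_readonly` come from the commit message.

```python
import copy

# Hypothetical module-level cache standing in for the parsed config.yaml.
_CONFIG_CACHE = {"providers": {"example": {"request_timeout": 120}}}


def load_config() -> dict:
    """Return an isolated deep copy that callers may mutate freely."""
    return copy.deepcopy(_CONFIG_CACHE)


def load_config_readonly() -> dict:
    """Return the cached dict itself; by contract callers must not mutate it."""
    return _CONFIG_CACHE


def get_request_timeout(provider: str, default: float = 60.0) -> float:
    """Hypothetical hot-path reader that relies on isinstance(x, dict) guards."""
    cfg = load_config_readonly()
    providers = cfg.get("providers")
    if not isinstance(providers, dict):
        return default
    entry = providers.get(provider)
    if not isinstance(entry, dict):
        return default
    return float(entry.get("request_timeout", default))
```

Returning a plain dict keeps the `isinstance(..., dict)` guards working. A `types.MappingProxyType` wrapper would fail those checks, which matches the problem the commit message reports.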
Profile A/B (cached config, 21-turn agent loop):
                                 BEFORE   AFTER   delta
  get_provider_request_timeout   55ms     16ms    -71%
  total function calls           399k     160k    -60%
  deepcopy calls (in hotspots)   34,398   ~0      ~elim

Verified:
- isinstance(load_config_readonly(), dict) is True
- timeout/stale resolutions correct
- load_config() still returns isolated mutable deepcopies
- tests/hermes_cli/test_config.py / test_timeouts.py: 102/102 pass
- tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass

perf(redact): substring pre-screens skip non-matching regex chains

Every log record passes through `RedactingFormatter.format` which calls `redact_sensitive_text`, which historically ran ALL 13 secret-pattern regexes against every line (including DB connection strings, JWTs, Discord mentions, Signal phone numbers, etc.), even for typical clean log records like 'INFO run_agent: API call completed'.

Add cheap substring pre-checks before each regex pass. False positives still run the regex (which then matches nothing); false negatives are impossible because every pattern requires the gated substring to match its leading anchor:
- `_PREFIX_RE` gated on any of 33 known credential prefix substrings
- `_ENV_ASSIGN_RE` gated on `=` in text
- `_JSON_FIELD_RE` gated on `:` and `"` in text
- `_AUTH_HEADER_RE` gated on `uthorization`/`UTHORIZATION` in text
- `_TELEGRAM_RE` gated on `:` in text
- `_PRIVATE_KEY_RE` gated on `BEGIN` and `-----`
- `_DB_CONNSTR_RE` gated on `://` in text
- `_JWT_RE` gated on `eyJ` in text
- URL userinfo/query gated on `://`
- `_redact_form_body` gated on `&` and `=`
- `_DISCORD_MENTION_RE` gated on `<@`
- `_SIGNAL_PHONE_RE` gated on `+`

Microbench (5 typical log records, 20k iterations each):
                                   BEFORE   AFTER    delta
  redact_sensitive_text per call   5.63us   1.79us   -68%

Real-world impact: ~244 log records emitted in a 30-turn agent loop, so the chain saves ~1ms of CPU per conversation.
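The substring gating can be illustrated with two of the gates named in the commit (`eyJ` for JWTs, `://` for connection strings). This is a sketch under assumptions: the regex bodies, the replacement markers, and the function name `redact` are hypothetical, since the actual patterns in redact.py are not shown here.

```python
import re

# Hypothetical patterns; every possible match contains the gated substring.
_JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
_DB_CONNSTR_RE = re.compile(r"(\w+://[^:/\s]+:)([^@\s]+)(@)")


def redact(text: str) -> str:
    """Run each regex only when its required substring is present."""
    # A cheap `in` check skips the regex engine for clean log lines.
    if "eyJ" in text:
        text = _JWT_RE.sub("[REDACTED_JWT]", text)
    if "://" in text:
        # Keep scheme and user, mask only the password group.
        text = _DB_CONNSTR_RE.sub(r"\1***\3", text)
    return text
```

Because each pattern can only match text that contains its gate, skipping the regex when the gate is absent cannot change the output. A gate that fires on harmless text only costs one regex pass that matches nothing.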
Bigger win is the reduction in regex execution and GC pressure during heavy logging sessions (verbose logging, gateway message processing).

Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.) verified to produce identical redacted output before/after. All 75 existing tests/agent/test_redact.py cases pass.

The `?access_token=foo&code=bar` (bare query string, no scheme) case that 'leaks' is pre-existing behavior: the URL query redaction requires a well-formed URL with scheme+host. Not a regression.

* perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url)

Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad` fires 495 times (~16 per turn) and each call ran 3 helper methods, each hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost: 3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for ~36ms of agent-loop overhead (~7% of the entire post-network work).

Provider / model / base_url don't change during a conversation except via `switch_model` and fallback activation, both of which already overwrite those attributes atomically. Cache the result on a tuple key; since the key is derived from the very fields that would change, the cache auto-invalidates on the next read after a switch. No manual invalidation needed in switch_model / _try_activate_fallback.

Profile A/B (31-turn cached-config agent run):
                                        BEFORE   AFTER   delta
  _needs_thinking_reasoning_pad cum     18ms     1ms     -94%
  _copy_reasoning_content_for_api cum   17ms     1ms     -94%
  base_url_host_matches calls           3,342    372     -89%
  urlparse calls                        3,373    403     -88%
  total function calls                  296k     223k    -25%

Verified:
- tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass
- tests/run_agent/ (full): 1383/1383 pass + 3 skipped

2026-05-19 14:25:10 -07:00
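The tuple-key caching described in the perf(run_agent) entry above might look roughly like this. It is a sketch only: the `Agent` class, the `needs_pad` method, the `compute_calls` counter, and the host/model rule are hypothetical and are not the logic in run_agent.py.

```python
from urllib.parse import urlparse


class Agent:
    """Hypothetical agent holding the three fields that form the cache key."""

    def __init__(self, provider: str, model: str, base_url: str) -> None:
        self.provider = provider
        self.model = model
        self.base_url = base_url
        self._pad_cache_key = None
        self._pad_cache_value = False
        self.compute_calls = 0  # demo-only counter of slow-path runs

    def _compute_needs_pad(self) -> bool:
        # Stand-in for the expensive helper chain (urlparse + host matching).
        self.compute_calls += 1
        host = urlparse(self.base_url).hostname or ""
        return host.endswith("deepseek.example") and "reasoner" in self.model

    def needs_pad(self) -> bool:
        # The key is built from the same fields a model switch overwrites,
        # so a changed field produces a new key and forces a recompute.
        key = (self.provider, self.model, self.base_url)
        if key != self._pad_cache_key:
            self._pad_cache_value = self._compute_needs_pad()
            self._pad_cache_key = key
        return self._pad_cache_value
```

Because the key is rebuilt on every read, code that switches provider, model, or base_url needs no explicit invalidation call; the next read sees a different key and recomputes.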
lsp	chore: ruff auto-fix PLR6201 resweep - tuple → set in membership tests (#27355)	2026-05-17 02:29:41 -07:00
transports	fix(codex): allow kanban worker board writes	2026-05-17 11:50:43 -07:00
__init__.py	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00
account_usage.py	chore: ruff auto-fix PLR6201 - tuple → set in membership tests (#23937)	2026-05-11 11:13:25 -07:00
agent_init.py	perf(prompt): cache kanban worker guidance at session init	2026-05-18 20:56:44 -07:00
agent_runtime_helpers.py	fix(agent): set tool_name on tool-result messages at construction time	2026-05-19 20:49:11 +01:00
anthropic_adapter.py	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
async_utils.py	fix(async): close unscheduled coroutines in all threadsafe bridges (#26584)	2026-05-15 14:00:01 -07:00
auxiliary_client.py	fix(xai-responses): strip enum values containing '/' from tool schemas	2026-05-18 10:37:35 -07:00
azure_identity_adapter.py	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
background_review.py	feat(bg-review): add bundled/pinned skill protection rules to review prompts (#27644)	2026-05-18 20:02:22 -07:00
bedrock_adapter.py	chore(deps): lazy-install boto3/botocore for bedrock adapter	2026-05-17 02:31:18 -07:00
browser_provider.py	fix(browser): self-review pass — dead-import, log levels, future-proofing	2026-05-17 04:04:15 -07:00
browser_registry.py	fix(browser): self-review pass — dead-import, log levels, future-proofing	2026-05-17 04:04:15 -07:00
chat_completion_helpers.py	fix(xai-responses): strip enum values containing '/' from tool schemas	2026-05-18 10:37:35 -07:00
codex_responses_adapter.py	fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s (#26644)	2026-05-15 16:35:12 -07:00
codex_runtime.py	fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184)	2026-05-16 23:41:09 -07:00
context_compressor.py	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117)	2026-05-18 10:28:20 -07:00
context_engine.py	fix(compression): keep default protect_first_n at 3 + align ABC	2026-05-13 22:25:16 -07:00
context_references.py	fix(agent): fall back when rg is blocked for @folder references	2026-04-20 01:56:41 -07:00
conversation_compression.py	fix(compress): abort instead of dropping messages when summary LLM fails (#28102)	2026-05-18 10:19:40 -07:00
conversation_loop.py	fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace	2026-05-18 20:04:57 -07:00
copilot_acp_client.py	fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes	2026-05-19 00:12:41 -07:00
credential_pool.py	fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions	2026-05-18 10:31:40 -07:00
credential_sources.py	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider	2026-05-15 12:11:32 -07:00
curator.py	feat(curator): hint at `hermes curator pin` in the rename block (#23212)	2026-05-10 06:44:53 -07:00
curator_backup.py	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)	2026-05-02 01:29:57 -07:00
display.py	chore: remove Atropos RL environments and tinker-atropos integration (#26106 )	2026-05-15 10:36:38 +05:30
error_classifier.py	fix(error_classifier): classify xAI Grok entitlement SSE errors as auth	2026-05-18 10:24:13 -07:00
file_safety.py	fix(security): apply file safety to copilot acp fs	2026-04-21 01:31:58 -07:00
gemini_cloudcode_adapter.py	fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks	2026-05-14 08:03:56 -07:00
gemini_native_adapter.py	fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482)	2026-05-11 11:13:20 -07:00
gemini_schema.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
google_code_assist.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
google_oauth.py	fix(google_oauth): close TOCTOU window when saving credentials	2026-05-04 03:16:19 -07:00
i18n.py	feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914)	2026-05-10 07:14:14 -07:00
image_gen_provider.py	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)	2026-04-21 21:30:10 -07:00
image_gen_registry.py	fix(plugins): filter resolution by is_available() in web + image_gen registries	2026-05-13 22:31:28 -07:00
image_routing.py	chore: ruff auto-fix PLR6201 - tuple → set in membership tests (#23937)	2026-05-11 11:13:25 -07:00
insights.py	Merge branch 'main' into feat/dashboard-skill-analytics	2026-04-20 05:25:49 -07:00
iteration_budget.py	refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget	2026-05-16 17:59:32 -07:00
lmstudio_reasoning.py	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
manual_compression_feedback.py	fix(compression): include system prompt + tool schemas in token estimates (#18265)	2026-04-30 23:03:54 -07:00
markdown_tables.py	fix(cli): vertical fallback for markdown tables wider than terminal (#23948)	2026-05-11 16:49:13 -07:00
memory_manager.py	🐛 fix(memory): require newline after context tag	2026-05-18 10:53:08 -07:00
memory_provider.py	docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings	2026-05-05 13:33:49 -07:00
message_sanitization.py	refactor(run_agent): extract message sanitization to agent/message_sanitization.py	2026-05-16 17:41:09 -07:00
model_metadata.py	fix(metadata): qwen3.6-plus has a 1M context window (#27008)	2026-05-17 02:31:18 -07:00
models_dev.py	feat: add NovitaAI as LLM provider	2026-05-13 23:51:15 -07:00
moonshot_schema.py	fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104)	2026-05-16 13:02:19 -07:00
nous_rate_guard.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
onboarding.py	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507)	2026-04-29 08:08:36 -07:00
plugin_llm.py	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)	2026-05-10 07:09:28 -07:00
portal_tags.py	feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779)	2026-05-12 20:49:20 -07:00
process_bootstrap.py	refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget	2026-05-16 17:59:32 -07:00
prompt_builder.py	fix(kanban): stale reclaim must not tick failure counter (#28680)	2026-05-19 03:15:18 -07:00
prompt_caching.py	fix(cache): kill long-lived prefix layout - system prompt is now byte-static within a session (#24778)	2026-05-12 20:46:04 -07:00
rate_limit_tracker.py	refactor: remove dead code - 1,784 lines across 77 files (#9180)	2026-04-13 16:32:04 -07:00
redact.py	perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866)	2026-05-19 14:25:10 -07:00
retry_utils.py	feat(agent): add jittered retry backoff	2026-04-08 00:41:36 -07:00
shell_hooks.py	fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes	2026-05-19 00:12:41 -07:00
skill_bundles.py	feat(skills): add skill bundles - alias /<name> loads multiple skills (#28373)	2026-05-18 21:38:05 -07:00
skill_commands.py	fix(skills): load symlinked skill slash commands	2026-05-18 00:34:29 -07:00
skill_preprocessing.py	fix: treat inline-shell timeout guard as timeout	2026-05-18 19:36:04 -07:00
skill_utils.py	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)	2026-05-08 16:39:32 -07:00
stream_diag.py	refactor(run_agent): extract stream diagnostics to agent/stream_diag.py	2026-05-16 18:28:17 -07:00
subdirectory_hints.py	fix(agent): catch PermissionError in subdirectory hint discovery	2026-04-09 03:10:30 -07:00
system_prompt.py	perf(prompt): cache kanban worker guidance at session init	2026-05-18 20:56:44 -07:00
think_scrubber.py	fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184)	2026-05-05 04:33:38 -07:00
title_generator.py	fix: improve telegram topic mode setup	2026-05-04 12:07:17 -07:00
tool_dispatch_helpers.py	fix(agent): set tool_name on tool-result messages at construction time	2026-05-19 20:49:11 +01:00
tool_executor.py	fix(agent): set tool_name on tool-result messages at construction time	2026-05-19 20:49:11 +01:00
tool_guardrails.py	fix: add recovery hints to loop guard warnings	2026-05-19 00:12:12 -07:00
tool_result_classification.py	fix: classify landed file mutations with diagnostics	2026-05-13 06:46:23 -07:00
trajectory.py	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00
usage_pricing.py	fix(pricing): add deepseek-v4-pro to official docs pricing table	2026-05-12 16:32:57 -07:00
video_gen_provider.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
video_gen_registry.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
web_search_provider.py	fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup	2026-05-13 22:31:28 -07:00
web_search_registry.py	fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup	2026-05-13 22:31:28 -07:00