perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)

Interactive `hermes` launch drops from ~21s to ~2.5s. Three independent
fixes, each targets a distinct hot spot in the banner / tool-registration
path that fires on every CLI invocation.

1. `get_external_skills_dirs()` in-process mtime cache (~10s saved)
   The function re-read + YAML-parsed the full ~/.hermes/config.yaml on
   every call. Banner build invokes it once per skill to resolve the
   category column, which on a 120-skill install meant ~120 reparses of
   a 15 KB config (~85 ms each). Added a
   `(config_path, mtime_ns) -> list[Path]` memo; stat() is ~2 us vs
   ~85 ms for the parse. Edits to config.yaml invalidate the cache on
   the next call via mtime.

2. Feishu availability probe uses `importlib.util.find_spec` (~5.2s saved)
   `tools/feishu_doc_tool.py::_check_feishu` and the identical helper in
   `feishu_drive_tool.py` were calling `import lark_oapi` purely to
   detect whether the SDK was installed. Executing the real import pulls
   in websockets + dispatcher + every v2 API model — ~5 seconds of work
   that fires at every tool-registry bootstrap. `find_spec` answers the
   same question ("is lark_oapi importable?") without executing the
   module. The actual tool handlers still do the real import on invoke,
   so runtime behavior is unchanged.

3. `_web_requires_env` no longer triggers Nous portal refresh (~800ms saved)
   `tools/web_tools.py::_web_requires_env` used
   `managed_nous_tools_enabled()` to gate four gateway env-var names in
   the returned list. The gate called `get_nous_auth_status()` ->
   `resolve_nous_runtime_credentials()` -> live HTTP POST to the portal
   on every tool-registry bootstrap. But the list is pure metadata — if
   the env var is set at runtime, the tool lights up; otherwise it
   doesn't. Including the four names unconditionally is harmless for
   unsubscribed users (vars just aren't set) and eliminates the sync
   HTTP round trip from startup.

Test:
- tests/agent/test_external_skills_dirs_cache.py (new, 6 cases):
  returns config'd dir, caches on second call (yaml_load patched to
  raise — never invoked), invalidates on mtime bump, empty when config
  missing, returned list is a defensive copy, per-HERMES_HOME cache key
  isolation.
- Existing tests/agent/test_external_skills.py and tests/tools/
  continue to pass modulo pre-existing flakes on main (test_delegate,
  test_send_message — unrelated, pass in isolation).

Measured: bare `hermes` (cold → REPL ready) 21,519ms -> 2,618ms on
Teknium's install (119 skills, 15 KB config.yaml, Nous auth logged in,
lark_oapi installed). 8x faster.
This commit is contained in:
Teknium 2026-05-08 16:39:32 -07:00 committed by GitHub
parent d606df8126
commit 0ec052ca24
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 220 additions and 20 deletions

View file

@ -28,10 +28,12 @@ def get_client():
def _check_feishu():
# See ``tools/feishu_doc_tool.py::_check_feishu`` — ``find_spec`` keeps
# CLI startup fast (the SDK itself takes ~5s to import eagerly).
import importlib.util
try:
import lark_oapi # noqa: F401
return True
except ImportError:
return importlib.util.find_spec("lark_oapi") is not None
except (ImportError, ValueError):
return False