perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage (#17046)

* perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage

Four heavy SDK/module imports are now deferred off the hot startup path.
Net savings on cold module imports:

  cli                       1200 → 958 ms  (-242)
  run_agent                 1220 → 901 ms  (-319)
  tools.web_tools            711 → 423 ms  (-288)
  agent.anthropic_adapter    230 →  15 ms  (-215)
  agent.auxiliary_client     253 →  68 ms  (-185)

Four independent changes in one PR since they all use the same pattern
and share the same risk profile (heavy SDK import → lazy proxy or
function-local import):

1. tools/web_tools.py:
   'from firecrawl import Firecrawl' moved into _get_firecrawl_client(),
   which is only called when backend='firecrawl'. Users on Exa/Tavily/
   Parallel pay zero firecrawl cost.

2. cli.py + gateway/run.py:
   'from agent.account_usage import ...' moved into the /limits handlers.
   account_usage transitively pulls the OpenAI SDK chain; only needed
   when the user runs /limits.

3. agent/anthropic_adapter.py:
   'try: import anthropic as _anthropic_sdk' replaced with a cached
   '_get_anthropic_sdk()' accessor. The three usage sites
   (build_anthropic_client, build_anthropic_bedrock_client,
   read_claude_code_credentials_from_keychain) now resolve via the
   accessor. All pre-existing test patches of
   'agent.anthropic_adapter._anthropic_sdk' keep working because the
   accessor respects any value already in module globals.

4. agent/auxiliary_client.py AND run_agent.py:
   'from openai import OpenAI' replaced with an '_OpenAIProxy()' module-
   level object that looks like the OpenAI class but imports the SDK on
   first call/isinstance check. This preserves:
     - 15+ in-module OpenAI(...) construction sites in auxiliary_client
       and the single site in run_agent's _create_openai_client (Python's
       function-scope name lookup finds the proxy, forwards the call);
     - 'patch("agent.auxiliary_client.OpenAI", ...)' and
       'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test
       files (patch replaces the module attribute as usual).
   Tried two alternatives first:
     - 'from openai._client import OpenAI' — doesn't skip openai/__init__.py
       (the audit's hypothesis here was wrong).
     - Module-level __getattr__ — works for external access but Python
       function-scope name resolution skips __getattr__, so in-module
       OpenAI(...) calls NameError.

Note: 'openai' still loads on 'import cli' because
cli.py -> neuter_async_httpx_del() -> openai._base_client, and
run_agent.py -> code_execution_tool.py (module-level
build_execute_code_schema) -> _load_config() -> 'from cli import
CLI_CONFIG'. Deferring those is a separate, larger change — out of scope
for this PR. The savings above all come from avoiding the openai/*,
anthropic/*, and firecrawl/* top-level type-tree imports on paths that
don't need them.

Verified:
- 302/302 tests in tests/agent/{test_anthropic_adapter,
  test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain}
  pass. Two pre-existing failures on main unchanged.
- 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail).
- 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py,
  test_plugin_context_engine_init.py, test_invalid_context_length_warning.py,
  test_api_max_retries_config.py,
  tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py
  pass (1 pre-existing fail).
- Live hermes chat smoke: 2 turns + /model switch + tool calls, zero
  errors in the 57-line agent.log window.
- Module-level import of run_agent + auxiliary_client + anthropic_adapter
  no longer pulls 'anthropic' or 'firecrawl' at all.

* fix(gateway): restore top-level account_usage import for test-patch surface

CI caught two failures in tests/gateway/test_usage_command.py that I
missed locally:

    AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage'

The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...)
to inject a fake account-fetch call. Moving the import inside the
handler deleted that module-level attribute, breaking the patch surface.

Restoring the top-level import in gateway/run.py gives up the ~230 ms
gateway-boot savings from that one lazy, but:

  1. the gateway is a long-running daemon — boot cost is paid once per
     install, not per turn;
  2. the other four lazy-imports (firecrawl, openai, anthropic, cli's
     account_usage) remain in place and still account for the bulk of
     the savings reported in the PR body;
  3. preserving the patch surface keeps the established
     'gateway.run.fetch_account_usage' monkeypatch pattern working
     without touching tests.

Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed.
Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent):
2332 passed, 4 failed — all 4 pre-existing on main.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
This commit is contained in:
Teknium 2026-04-28 09:38:42 -07:00 committed by GitHub
parent df51ad7973
commit b5128a751b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 125 additions and 9 deletions

View file

@ -22,10 +22,25 @@ from hermes_constants import get_hermes_home
from typing import Any, Dict, List, Optional, Tuple
from utils import normalize_proxy_env_vars
try:
import anthropic as _anthropic_sdk
except ImportError:
_anthropic_sdk = None # type: ignore[assignment]
# NOTE: `import anthropic` is deliberately NOT at module top — the SDK pulls
# ~220 ms of imports (anthropic.types, anthropic.lib.tools._beta_runner, etc.)
# and the 3 usage sites (build_anthropic_client, build_anthropic_bedrock_client,
# read_claude_code_credentials_from_keychain) are all on cold user-triggered
# paths. Access via the `_get_anthropic_sdk()` accessor below, which caches
# the module after the first call and returns None on ImportError.
_anthropic_sdk: Any = ... # sentinel — None means "tried and missing"
def _get_anthropic_sdk():
"""Return the ``anthropic`` SDK module, importing lazily. None if not installed."""
global _anthropic_sdk
if _anthropic_sdk is ...:
try:
import anthropic as _sdk
_anthropic_sdk = _sdk
except ImportError:
_anthropic_sdk = None
return _anthropic_sdk
logger = logging.getLogger(__name__)
@ -395,6 +410,7 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
Returns an anthropic.Anthropic instance.
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Anthropic provider. "
@ -492,6 +508,7 @@ def build_anthropic_bedrock_client(region: str):
Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
"""
_anthropic_sdk = _get_anthropic_sdk()
if _anthropic_sdk is None:
raise ImportError(
"The 'anthropic' package is required for the Bedrock provider. "

View file

@ -41,10 +41,57 @@ import threading
import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional, Tuple, TYPE_CHECKING
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
# NOTE: `from openai import OpenAI` is deliberately NOT at module top — the
# openai SDK pulls a large type tree (~240 ms cold, including responses/*,
# graders/*). We expose `OpenAI` here as a thin proxy that imports the SDK on
# first call and forwards, so:
# (a) the 15+ in-module `OpenAI(...)` construction sites work unchanged
# (Python's function-scope name lookup resolves `OpenAI` to the proxy
# object bound in module globals here, without triggering any import);
# (b) external code can still do `auxiliary_client.OpenAI` or
# `patch("agent.auxiliary_client.OpenAI", ...)` — tests see the proxy,
# and patch replaces the module attribute as usual;
# (c) `OpenAI` as a type annotation resolves at runtime to the proxy class
# (which is harmless — annotations aren't type-checked at runtime).
# See tests/agent/test_auxiliary_client.py for patch patterns this supports.
if TYPE_CHECKING:
from openai import OpenAI # noqa: F401 — type hints only
_OPENAI_CLS_CACHE: Optional[type] = None
def _load_openai_cls() -> type:
"""Import and cache ``openai.OpenAI``."""
global _OPENAI_CLS_CACHE
if _OPENAI_CLS_CACHE is None:
from openai import OpenAI as _cls
_OPENAI_CLS_CACHE = _cls
return _OPENAI_CLS_CACHE
class _OpenAIProxy:
"""Module-level proxy that looks like the ``openai.OpenAI`` class.
Forwards ``OpenAI(...)`` calls and ``isinstance(x, OpenAI)`` checks to the
real SDK class, importing the SDK lazily on first use.
"""
__slots__ = ()
def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)
def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())
def __repr__(self):
return "<lazy openai.OpenAI proxy>"
OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home

6
cli.py
View file

@ -69,7 +69,9 @@ from agent.usage_pricing import (
format_duration_compact,
format_token_count_compact,
)
from agent.account_usage import fetch_account_usage, render_account_usage_lines
# NOTE: `from agent.account_usage import ...` is deliberately NOT at module
# top — it transitively pulls the OpenAI SDK chain (~230 ms cold) and is only
# needed when the user runs `/limits`. Lazy-imported inside the handler below.
from hermes_cli.banner import _format_context_length, format_banner_version_label
_COMMAND_SPINNER_FRAMES = ("", "", "", "", "", "", "", "", "", "")
@ -7285,6 +7287,8 @@ class HermesCLI:
provider = getattr(agent, "provider", None) or getattr(self, "provider", None)
base_url = getattr(agent, "base_url", None) or getattr(self, "base_url", None)
api_key = getattr(agent, "api_key", None) or getattr(self, "api_key", None)
# Lazy import — pulls the OpenAI SDK chain, only needed here.
from agent.account_usage import fetch_account_usage, render_account_usage_lines
account_snapshot = None
if provider:
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as _pool:

View file

@ -31,6 +31,12 @@ from pathlib import Path
from datetime import datetime
from typing import Dict, Optional, Any, List
# account_usage imports the OpenAI SDK chain (~230 ms). Only needed by
# /usage; we still import it at module top in the gateway because test
# patches (tests/gateway/test_usage_command.py) target
# `gateway.run.fetch_account_usage` as a module-level attribute. The
# gateway is a long-running daemon, so its boot cost matters less than
# preserving the established test-patch surface.
from agent.account_usage import fetch_account_usage, render_account_usage_lines
# --- Agent cache tuning ---------------------------------------------------

View file

@ -41,13 +41,48 @@ import urllib.request
import uuid
from typing import List, Dict, Any, Optional
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
# NOTE: `from openai import OpenAI` is deliberately NOT at module top — the
# SDK pulls ~240 ms of imports. We expose `OpenAI` as a thin proxy object
# that imports the SDK on first call/isinstance check. This preserves:
# (a) the single in-module `OpenAI(**client_kwargs)` call site at
# _create_openai_client, and
# (b) `patch("run_agent.OpenAI", ...)` test patterns used by ~28 test files.
import fire
from datetime import datetime
from pathlib import Path
from hermes_constants import get_hermes_home
_OPENAI_CLS_CACHE: Optional[type] = None
def _load_openai_cls() -> type:
"""Import and cache ``openai.OpenAI``."""
global _OPENAI_CLS_CACHE
if _OPENAI_CLS_CACHE is None:
from openai import OpenAI as _cls
_OPENAI_CLS_CACHE = _cls
return _OPENAI_CLS_CACHE
class _OpenAIProxy:
"""Module-level proxy that looks like ``openai.OpenAI`` but imports lazily."""
__slots__ = ()
def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)
def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())
def __repr__(self):
return "<lazy openai.OpenAI proxy>"
OpenAI = _OpenAIProxy()
# Load .env from ~/.hermes/.env first, then project root as dev fallback.
# User-managed env files should override stale shell exports on restart.
from hermes_cli.env_loader import load_hermes_dotenv
@ -5243,6 +5278,8 @@ class AIAgent:
keepalive_http = self._build_keepalive_http_client(client_kwargs.get("base_url", ""))
if keepalive_http is not None:
client_kwargs["http_client"] = keepalive_http
# Uses the module-level `OpenAI` name, resolved lazily on first
# access via __getattr__ below. Tests patch via `run_agent.OpenAI`.
client = OpenAI(**client_kwargs)
logger.info(
"OpenAI client created (%s, shared=%s) %s",

View file

@ -47,7 +47,10 @@ import re
import asyncio
from typing import List, Dict, Any, Optional
import httpx
from firecrawl import Firecrawl
# NOTE: `from firecrawl import Firecrawl` is deliberately NOT at module top —
# the SDK pulls ~200 ms of imports (httpcore, firecrawl.v1/v2 type trees) and
# we only need it when the backend is actually "firecrawl". See
# _get_firecrawl_client() below for the lazy import.
from agent.auxiliary_client import (
async_call_llm,
extract_content_or_reasoning,
@ -236,6 +239,8 @@ def _get_firecrawl_client():
if _firecrawl_client is not None and _firecrawl_client_config == client_config:
return _firecrawl_client
# Lazy import — ~200 ms of SDK init, only paid when firecrawl is actually used.
from firecrawl import Firecrawl # noqa: E402
_firecrawl_client = Firecrawl(**kwargs)
_firecrawl_client_config = client_config
return _firecrawl_client