feat(agent): make API retry count configurable via agent.api_max_retries (#14730)

Closes #11616. The agent's API retry loop hardcoded max_retries = 3, so users with fallback providers on flaky primaries burned through ~3 × provider timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a chance to kick in. Expose a new config key: agent: api_max_retries: 3 # default unchanged Set it to 1 for fast failover when you have fallback providers, or raise it if you prefer longer tolerance on a single provider. Values < 1 are clamped to 1 (single attempt, no retry); non-integer values fall back to the default. This wraps the Hermes-level retry loop only — the OpenAI SDK's own low-level retries (max_retries=2 default) still run beneath this for transient network errors. Changes: - hermes_cli/config.py: add agent.api_max_retries default 3 with comment. - run_agent.py: read self._api_max_retries in AIAgent.__init__; replace hardcoded max_retries = 3 in the retry loop with self._api_max_retries. - cli-config.yaml.example: documented example entry. - hermes_cli/tips.py: discoverable tip line. - tests/run_agent/test_api_max_retries_config.py: 4 tests covering default, override, clamp-to-one, and invalid-value fallback.
2026-04-25 00:51:20 +00:00 · 2026-04-23 13:59:32 -07:00 · 2026-04-23 13:59:32 -07:00 · 165b2e481a
commit 165b2e481a
parent 327b57da91
5 changed files with 94 additions and 1 deletions
--- a/run_agent.py
+++ b/run_agent.py
@ -1548,6 +1548,17 @@ class AIAgent:
            _agent_section = {}
        self._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")

+        # App-level API retry count (wraps each model API call).  Default 3,
+        # overridable via agent.api_max_retries in config.yaml.  See #11616.
+        try:
+            _raw_api_retries = _agent_section.get("api_max_retries", 3)
+            _api_retries = int(_raw_api_retries)
+            if _api_retries < 1:
+                _api_retries = 1  # 1 = no retry (single attempt)
+        except (TypeError, ValueError):
+            _api_retries = 3
+        self._api_max_retries = _api_retries
+
        # Initialize context compressor for automatic context management
        # Compresses conversation when approaching model's context limit
        # Configuration via config.yaml (compression section)
@ -9259,7 +9270,7 @@ class AIAgent:
            
            api_start_time = time.time()
            retry_count = 0
-            max_retries = 3
+            max_retries = self._api_max_retries
            primary_recovery_attempted = False
            max_compression_attempts = 3
            codex_auth_retry_attempted=False