hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-01 07:01:41 +00:00

Author	SHA1	Message	Date
Teknium	67011cc0d7	feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816 ) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter.	2026-05-28 04:53:27 -07:00
Sanghyuk Seo	283bb810e7	fix(agent): tolerate large codex stream prefill	2026-05-27 11:19:55 -07:00
EvilHumphrey	4243b6dc45	fix(codex): update silent-hang workaround hint	2026-05-27 01:52:34 -07:00
adam91holt	8601c4d44c	fix(codex): add time-to-first-byte watchdog for stalled Codex streams The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode where it accepts the connection but never emits a single stream event — the socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP status), and a fresh reconnect then succeeds in ~2s. Today the only guard is the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival connection is held for the full stale window (90-900s depending on context / config) before the retry loop can reconnect — minutes of wasted wall time per stall, at a rate of ~20% of calls during affected windows. Add a TTFB watchdog scoped to the codex_responses path: - codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on every stream event (not just output-text deltas), so reasoning-only and tool-call-only turns are not mistaken for a stall. - interruptible_api_call resets that marker before the worker starts and, while it is still None, kills the connection once elapsed exceeds the TTFB cutoff (default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The raised TimeoutError flows through the existing retry path unchanged. Once any event has arrived the stream is healthy and only the existing wall-clock stale timeout applies, so legitimate long generations are never interrupted. Gated to codex_responses; the chat_completions non-stream, anthropic and bedrock branches have no first-event signal and are untouched. Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the events-flowing pass-through, and the env-disable path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 05:34:42 -07:00

4 commits