mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
|
||
|---|---|---|
| .. | ||
| transports | ||
| __init__.py | ||
| test_anthropic_adapter.py | ||
| test_anthropic_keychain.py | ||
| test_auxiliary_client.py | ||
| test_auxiliary_client_anthropic_custom.py | ||
| test_auxiliary_config_bridge.py | ||
| test_auxiliary_main_first.py | ||
| test_auxiliary_named_custom_providers.py | ||
| test_bedrock_adapter.py | ||
| test_bedrock_integration.py | ||
| test_codex_cloudflare_headers.py | ||
| test_compress_focus.py | ||
| test_context_compressor.py | ||
| test_context_engine.py | ||
| test_context_references.py | ||
| test_copilot_acp_client.py | ||
| test_credential_pool.py | ||
| test_credential_pool_routing.py | ||
| test_crossloop_client_cache.py | ||
| test_direct_provider_url_detection.py | ||
| test_display.py | ||
| test_display_emoji.py | ||
| test_error_classifier.py | ||
| test_external_skills.py | ||
| test_gemini_cloudcode.py | ||
| test_gemini_free_tier_gate.py | ||
| test_gemini_native_adapter.py | ||
| test_gemini_schema.py | ||
| test_image_gen_registry.py | ||
| test_insights.py | ||
| test_kimi_coding_anthropic_thinking.py | ||
| test_local_stream_timeout.py | ||
| test_memory_provider.py | ||
| test_memory_user_id.py | ||
| test_minimax_auxiliary_url.py | ||
| test_minimax_provider.py | ||
| test_model_metadata.py | ||
| test_model_metadata_local_ctx.py | ||
| test_model_metadata_ssl.py | ||
| test_models_dev.py | ||
| test_moonshot_schema.py | ||
| test_nous_rate_guard.py | ||
| test_prompt_builder.py | ||
| test_prompt_caching.py | ||
| test_proxy_and_url_validation.py | ||
| test_rate_limit_tracker.py | ||
| test_redact.py | ||
| test_shell_hooks.py | ||
| test_shell_hooks_consent.py | ||
| test_skill_commands.py | ||
| test_subagent_progress.py | ||
| test_subagent_stop_hook.py | ||
| test_subdirectory_hints.py | ||
| test_title_generator.py | ||
| test_usage_pricing.py | ||
| test_vision_resolved_args.py | ||