Six tests in test_bedrock_adapter.py import botocore.exceptions
directly (ConnectionClosedError, EndpointConnectionError,
ReadTimeoutError, ClientError) without guarding the import. When
botocore is not installed (it's an optional dependency), these tests
fail with ModuleNotFoundError instead of being gracefully skipped.
Added pytest.importorskip('botocore') to each affected test function,
following the same pattern used elsewhere in the test suite (e.g.
test_voice_mode.py for numpy, test_mcp_oauth.py for mcp).
Tests affected:
- TestIsStaleConnectionError: 3 tests
- TestCallConverseInvalidatesOnStaleError: 3 tests
Before: 6 FAIL with ModuleNotFoundError
After: 6 SKIP with reason message
25 new tests (all Bedrock API calls mocked, no real AWS creds needed):
tests/hermes_cli/test_bedrock_model_picker.py (20 tests):
- provider_model_ids("bedrock") uses live discovery, returns regional
model IDs, falls back gracefully on empty/exception, resolves all
bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery
- list_authenticated_providers() section 2: bedrock appears with AWS
creds, model list from discover_bedrock_models(), total_models
matches, is_current flag works, absent creds hides bedrock, discovery
failure does not crash, no duplicate entries
- Region routing: botocore profile eu-central-1 yields eu.* model IDs
end-to-end; env var takes priority over botocore profile
- providers.py overlay: exists with correct transport/auth_type, label
is non-empty, all aliases normalize to bedrock
tests/agent/test_bedrock_adapter.py (5 tests):
- resolve_bedrock_region() botocore profile fallback, botocore failure
fallback, us-east-1 hard fallback (with botocore mocked)
## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>