hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-09 08:21:50 +00:00

Teknium a5bd56eae3 fix: eliminate provider hang dead zones in retry/timeout architecture (#8985)

Three targeted changes to close the gaps between retry layers that caused users to experience 'No response from provider for 580s' and 'No activity for 15 minutes' despite having 5 layers of retry:

1. Remove non-streaming fallback from streaming path
   Previously, when all 3 stream retries exhausted, the code fell back to _interruptible_api_call() which had no stale detection and no activity tracking — a black hole that could hang for up to 1800s. Now errors propagate to the main retry loop which has richer recovery (credential rotation, provider fallback, backoff). For 'stream not supported' errors, sets _disable_streaming flag so the main retry loop automatically switches to non-streaming on the next attempt.

2. Add _touch_activity to recovery dead zones
   The gateway inactivity monitor relies on _touch_activity() to know the agent is alive, but activity was never touched during:
   - Stale stream detection/kill cycles (180-300s gaps)
   - Stream retry connection rebuilds
   - Main retry backoff sleeps (up to 120s)
   - Error recovery classification
   Now all these paths touch activity every ~30s, keeping the gateway informed during recovery cycles.

3. Add stale-call detector to non-streaming path
   _interruptible_api_call() now has the same stale detection pattern as the streaming path: kills hung connections after 300s (default, configurable via HERMES_API_CALL_STALE_TIMEOUT), scaled for large contexts (450s for 50K+ tokens, 600s for 100K+ tokens), disabled for local providers. Also touches activity every ~30s during the wait so the gateway monitor stays informed.

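The activity-touching behaviour described in change 2 can be illustrated with a small sketch. This is not the code from the commit: `sleep_with_heartbeat`, its parameters, and the injected `sleep` callable are hypothetical names chosen for illustration. Only the idea of touching activity roughly every 30s during a backoff sleep comes from the commit message.

```python
import time
from typing import Callable


def sleep_with_heartbeat(
    total_seconds: float,
    touch_activity: Callable[[], None],
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep for total_seconds, calling touch_activity before each chunk.

    Hypothetical helper: the long sleep is split into chunks of at most
    `interval` seconds so an inactivity monitor never sees a gap longer
    than `interval`. Returns the number of heartbeats sent.
    """
    remaining = float(total_seconds)
    touches = 0
    while remaining > 0:
        touch_activity()  # tell the monitor we are still alive
        touches += 1
        chunk = min(interval, remaining)
        sleep(chunk)
        remaining -= chunk
    return touches
```

With a 120s backoff and the default 30s interval, this sketch sends four heartbeats instead of going silent for the whole sleep.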
Env vars:
- HERMES_API_CALL_STALE_TIMEOUT: non-streaming stale timeout (default 300s)
- HERMES_STREAM_STALE_TIMEOUT: unchanged (default 180s)

Before: worst case ~2+ hours of sequential retries with no feedback
After: worst case bounded by gateway inactivity timeout (default 1800s) with continuous activity reporting

2026-04-13 04:55:20 -07:00
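As a rough illustration of the timeout rules in the commit message, the selection logic might look like the sketch below. The function name `resolve_stale_timeout` and its parameters are hypothetical. How an explicit HERMES_API_CALL_STALE_TIMEOUT value interacts with the large-context scaling is not stated in the commit message; taking the larger of the two is an assumption made here.

```python
import os
from typing import Mapping, Optional


def resolve_stale_timeout(
    context_tokens: int,
    is_local_provider: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[float]:
    """Return the non-streaming stale timeout in seconds, or None if disabled.

    Hypothetical sketch based on the commit message:
    - disabled for local providers
    - 300s default, overridable via HERMES_API_CALL_STALE_TIMEOUT
    - 450s for 50K+ tokens, 600s for 100K+ tokens
    """
    env = os.environ if env is None else env
    if is_local_provider:
        return None  # stale detection is disabled for local providers
    base = float(env.get("HERMES_API_CALL_STALE_TIMEOUT", "300"))
    # ASSUMPTION: large contexts raise the timeout but never lower it.
    if context_tokens >= 100_000:
        return max(base, 600.0)
    if context_tokens >= 50_000:
        return max(base, 450.0)
    return base
```

For example, `resolve_stale_timeout(60_000, False, env={})` falls in the 50K+ tier, while any call with `is_local_provider=True` returns `None`.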
acp	fix(acp): declare session load and resume capabilities in initialize response (#6985)	2026-04-10 03:45:36 -07:00
agent	fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max	2026-04-13 04:24:07 -07:00
cli	fix: show full last assistant response when resuming a session (#8724)	2026-04-12 19:07:14 -07:00
cron	feat(cron): support Discord thread_id in deliver targets	2026-04-10 03:20:05 -07:00
e2e	refactor: extract shared helpers to deduplicate repeated code patterns (#7917)	2026-04-11 13:59:52 -07:00
environments/benchmarks	fix(security): consolidated security hardening — SSRF, timing attack, tar traversal, credential leakage (#5944)	2026-04-07 17:28:37 -07:00
fakes	fix: streaming tool call parsing, error handling, and fake HA state mutation	2026-03-14 14:27:20 +03:00
gateway	fix: reopen resumed gateway sessions in sqlite	2026-04-13 04:54:07 -07:00
hermes_cli	feat: fix SQLite safety in hermes backup + add --quick snapshots + /snapshot command (#8971)	2026-04-13 04:46:13 -07:00
honcho_plugin	feat(honcho): add opt-in initOnSessionStart for tools mode and respect explicit peerName (#6995)	2026-04-11 00:43:27 -07:00
integration	refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804)	2026-03-24 07:30:25 -07:00
plugins	feat(hindsight): feature parity, setup wizard, and config improvements	2026-04-08 23:54:15 -07:00
run_agent	fix: eliminate provider hang dead zones in retry/timeout architecture (#8985)	2026-04-13 04:55:20 -07:00
skills	fix(migration): don't auto-archive OpenClaw source directory	2026-04-12 00:33:54 -07:00
tools	test: add multi-word query tests for truncation match strategy	2026-04-13 04:54:42 -07:00
__init__.py	A bit of restructuring for simplicity and organization	2025-10-01 23:29:25 +00:00
conftest.py	fix(tests): fix several failing/flaky tests on main (#6777)	2026-04-09 13:17:06 -07:00
run_interrupt_test.py	fix: thread safety for concurrent subagent delegation (#1672)	2026-03-17 02:53:33 -07:00
test_batch_runner_checkpoint.py	fix: sanitize chat payloads and provider precedence	2026-03-13 23:59:12 -07:00
test_cli_file_drop.py	fix(gateway): reject file paths in get_command() + file-drop tests (#7356)	2026-04-10 13:06:02 -07:00
test_cli_skin_integration.py	fix: CLI/UX batch — ChatConsole errors, curses scroll, skin-aware banner, git state banner (#5974)	2026-04-07 17:59:42 -07:00
test_ctx_halving_fix.py	fix(compaction): don't halve context_length on output-cap-too-large errors	2026-04-09 11:27:41 -07:00
test_empty_model_fallback.py	fix: fall back to provider's default model when model config is empty (#8303)	2026-04-12 03:53:30 -07:00
test_evidence_store.py	feat: add OSS Security Forensics skill (Skills Hub) (#1482)	2026-03-15 21:59:53 -07:00
test_hermes_constants.py	fix(gateway): harden Docker/container gateway pathway	2026-04-12 16:36:11 -07:00
test_hermes_logging.py	feat: component-separated logging with session context and filtering (#7991)	2026-04-11 17:23:36 -07:00
test_hermes_state.py	fix(state): orphan children instead of cascade-deleting in prune/delete (#6513)	2026-04-09 02:41:56 -07:00
test_honcho_client_config.py	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623)	2026-04-02 15:33:51 -07:00
test_ipv4_preference.py	feat: add network.force_ipv4 config to fix IPv6 timeout issues (#8196)	2026-04-11 23:12:11 -07:00
test_mcp_serve.py	feat: add MCP server mode — hermes mcp serve (#3795)	2026-03-29 15:47:19 -07:00
test_minisweagent_path.py	chore: remove all remaining mini-swe-agent references	2026-03-24 08:19:23 -07:00
test_model_picker_scroll.py	fix: CLI/UX batch — ChatConsole errors, curses scroll, skin-aware banner, git state banner (#5974)	2026-04-07 17:59:42 -07:00
test_model_tools.py	Add request-scoped plugin lifecycle hooks	2026-04-05 23:31:29 -07:00
test_model_tools_async_bridge.py	fix: use per-thread persistent event loops in worker threads	2026-03-20 15:41:06 -04:00
test_ollama_num_ctx.py	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)	2026-04-07 22:23:28 -07:00
test_packaging_metadata.py	chore: prepare Hermes for Homebrew packaging (#4099)	2026-03-30 17:34:43 -07:00
test_project_metadata.py	refactor(matrix): swap matrix-nio for mautrix-python dependency	2026-04-10 21:15:59 -07:00
test_retry_utils.py	feat(agent): add jittered retry backoff	2026-04-08 00:41:36 -07:00
test_sql_injection.py	fix(security): eliminate SQL string formatting in execute() calls	2026-03-19 15:16:35 +01:00
test_subprocess_home_isolation.py	fix: per-profile subprocess HOME isolation (#4426) (#7357)	2026-04-10 13:37:45 -07:00
test_timezone.py	fix: remove 115 verified dead code symbols across 46 production files	2026-04-10 03:44:43 -07:00
test_toolset_distributions.py	test: add unit tests for 8 modules (batch 2)	2026-02-26 13:54:20 +03:00
test_toolsets.py	fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs	2026-03-09 23:27:19 -07:00
test_trajectory_compressor.py	fix: URL-based auth for third-party Anthropic endpoints + CI test fixes (#4148)	2026-03-30 20:36:56 -07:00
test_trajectory_compressor_async.py	fix: create AsyncOpenAI lazily in trajectory_compressor to avoid closed event loop (#4013)	2026-03-30 13:16:16 -07:00
test_utils_truthy_values.py	Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.	2026-03-30 13:28:10 +09:00