hermes-agent/hermes_cli
Teknium fdb9e0f6a6
fix(kanban): auto-block workers that exit without completing (#20894) (#21214)
When a kanban worker subprocess exits rc=0 but its task is still in
status='running', the agent almost certainly answered the task
conversationally without calling kanban_complete or kanban_block. The
dispatcher used to classify this as a generic crash and respawn, which
loops forever on small local models (gemma4-e2b q4 etc.) that keep
returning clean but unproductive output.

Dispatcher changes:
- The waitpid reap loop at the top of dispatch_once now records each
  reaped child's raw exit status in a bounded module registry
  (_recent_worker_exits, TTL 600s, size cap 4096).
- _classify_worker_exit distinguishes clean_exit / nonzero_exit /
  signaled / unknown using os.WIFEXITED / WIFSIGNALED.
- detect_crashed_workers consults the classification when a worker
  is found dead. clean_exit → protocol_violation event + immediate
  circuit-breaker trip (failure_limit=1). Everything else keeps the
  existing crashed-event + counter behavior.
- DispatchResult.auto_blocked now includes protocol-violation trips.

Gateway fix (Bug A in #20894):
- gateway.run._notify_active_sessions_of_shutdown snapshots
  self.adapters with list(...) before iterating. adapter.send() can
  hit a fatal-error path that pops the adapter from the dict, which
  was raising 'RuntimeError: dictionary changed size during iteration'
  during shutdown.

Regression tests:
- test_detect_crashed_workers_protocol_violation_auto_blocks verifies
  rc=0 + still-running → status=blocked on first occurrence with
  protocol_violation + gave_up events and NO crashed event.
- test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies
  non-zero exits keep the existing 2-strike behavior.

Closes #20894.
2026-05-07 05:24:16 -07:00
..
__init__.py fix(windows): enforce UTF-8 stdout/stderr to prevent UnicodeEncodeError crash 2026-05-03 16:58:25 -07:00
_parser.py refactor(cli): derive relaunch flag table from argparse introspection 2026-04-29 20:33:29 -07:00
auth.py fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194) 2026-05-07 05:12:05 -07:00
auth_commands.py feat(nous): persist Nous OAuth across profiles via shared token store (#19712) 2026-05-04 04:54:55 -07:00
azure_detect.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
backup.py fix(backup): floor pre-update backup_keep to 1 so the new backup survives 2026-05-04 05:07:13 -07:00
banner.py fix(banner): show correct update status on nix-built hermes (#17550) 2026-04-30 07:03:00 +05:30
browser_connect.py fix(browser): address Copilot review on /browser connect 2026-04-28 22:11:10 -07:00
callbacks.py fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902) 2026-04-14 16:11:37 -07:00
checkpoints.py feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) 2026-05-06 05:44:35 -07:00
claw.py fix(claw): handle missing dir in _scan_workspace_state 2026-05-05 06:08:14 -07:00
cli_output.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
clipboard.py feat: fix img pasting in new ink plus newline after tools 2026-04-11 13:14:32 -05:00
codex_models.py feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720) 2026-04-23 13:32:43 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py feat(telegram): /topic off + help + auth gate + screenshot debounce 2026-05-04 12:07:17 -07:00
completion.py fix: preserve profile name completion in dynamic shell completion 2026-04-14 10:45:42 -07:00
config.py feat(security): enable secret redaction by default (#17691, #20785) (#21193) 2026-05-07 05:10:33 -07:00
copilot_auth.py fix(oauth,gateway): monotonic deadlines for polling/timeout loops 2026-05-07 05:09:39 -07:00
cron.py feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709) 2026-05-04 12:31:01 -07:00
curator.py feat(curator): add archive and prune subcommands (#20200) 2026-05-05 05:15:54 -07:00
curses_ui.py fix: treat ctrl-c as curses cancel 2026-05-04 01:36:44 -07:00
debug.py fix(debug): redact log content at upload time in hermes debug share 2026-05-03 11:42:20 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
dingtalk_auth.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
doctor.py docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749) 2026-05-06 07:24:42 -07:00
dump.py refactor(env): use shared Hermes dotenv loader 2026-05-05 10:13:13 -07:00
env_loader.py refactor: consolidate symlink-safe atomic replace into shared helper 2026-04-28 04:58:22 -07:00
fallback_cmd.py feat(cli): add 'hermes fallback' command to manage fallback providers (#16052) 2026-04-26 06:19:04 -07:00
gateway.py fix(oauth,gateway): monotonic deadlines for polling/timeout loops 2026-05-07 05:09:39 -07:00
goals.py feat: /goal — persistent cross-turn goals (Ralph loop) (#18262) 2026-04-30 23:10:20 -07:00
hooks.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
kanban.py fix: auto-block repeated kanban retries 2026-05-07 05:05:20 -07:00
kanban_db.py fix(kanban): auto-block workers that exit without completing (#20894) (#21214) 2026-05-07 05:24:16 -07:00
kanban_diagnostics.py fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410) 2026-05-05 13:55:37 -07:00
logs.py feat: component-separated logging with session context and filtering (#7991) 2026-04-11 17:23:36 -07:00
main.py fix(mcp): give 'mcp add --command' a distinct argparse dest 2026-05-07 05:17:03 -07:00
mcp_config.py fix(mcp): give 'mcp add --command' a distinct argparse dest 2026-05-07 05:17:03 -07:00
memory_setup.py fix(cli): decode .env as UTF-8 to avoid GBK crash on Windows 2026-05-02 01:40:31 -07:00
model_catalog.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
model_normalize.py fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802) 2026-05-06 09:08:33 -07:00
model_switch.py fix(model_switch): live model discovery for custom_providers in /model picker 2026-05-07 05:21:26 -07:00
models.py docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749) 2026-05-06 07:24:42 -07:00
nous_subscription.py feat(web): add SearXNG as a native search-only backend 2026-05-06 10:05:29 -07:00
oneshot.py fix(tui): honor launch toolsets (#17623) 2026-04-29 16:55:27 -07:00
pairing.py fix(pairing): handle null user_name in pairing list display 2026-04-23 02:34:11 -07:00
platforms.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
plugins.py feat(providers): make all 33 providers pluggable under plugins/model-providers/ 2026-05-05 13:40:01 -07:00
plugins_cmd.py feat(dashboard): add Plugins page with enable/disable, auth status, install/remove 2026-04-30 20:29:37 -04:00
profiles.py feat(profiles): --no-skills flag for empty profile creation (#20986) 2026-05-07 04:34:38 -07:00
providers.py fix: prevent bare 'custom' slug in model.provider (#17478) 2026-04-30 04:32:11 -07:00
pty_bridge.py fix(pty): default TERM for resize probes 2026-05-04 02:38:54 -07:00
relaunch.py remove relaunch_chat 2026-04-29 20:33:29 -07:00
runtime_provider.py fix(fallback): let custom_providers shadow built-in aliases 2026-04-30 20:18:44 -07:00
setup.py fix(gateway): don't dead-end setup wizard when only system-scope unit is installed 2026-05-06 15:58:02 -07:00
skills_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
skills_hub.py feat(skills): install skills from a direct HTTP(S) URL (#16323) 2026-04-26 20:57:10 -07:00
skin_engine.py fix(tui): honor skin highlight colors (#20895) 2026-05-06 14:01:56 -07:00
slack_cli.py fix(paths): route achievements plugin + profile-tui through HERMES_HOME 2026-04-30 23:21:54 -07:00
status.py fix(status): add missing popular provider API keys to hermes status display 2026-05-04 05:14:13 -07:00
timeouts.py refactor(timeouts): drop redundant ImportError in except clause 2026-04-26 20:48:20 -07:00
tips.py docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms 2026-05-05 13:45:47 -07:00
tools_config.py feat(web): add SearXNG as a native search-only backend 2026-05-06 10:05:29 -07:00
uninstall.py feat(uninstall): offer to remove named profiles when uninstalling from default 2026-04-18 19:18:13 -07:00
vercel_auth.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
voice.py fix(tui): restore voice push-to-talk parity (#20897) 2026-05-06 15:49:59 -07:00
web_server.py fix(oauth,gateway): monotonic deadlines for polling/timeout loops 2026-05-07 05:09:39 -07:00
webhook.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00