mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-11 08:42:11 +00:00
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe Fix Slack Socket Mode reliability by adding a watchdog/reconnect path so silent socket task drops no longer leave the adapter stuck. Harden Windows gateway lifecycle by avoiding desktop-binary path collisions, making gateway PID scans case/extension tolerant, and reusing in-flight restart actions to prevent duplicate gateway spawns. * test(slack): add Socket Mode watchdog/reconnect behavioural coverage Drive the new Slack Socket Mode self-healing logic through a fake AsyncSocketModeHandler so we can simulate the P0 silent-hang failure mode (task exit, transport disconnected, intentional shutdown, concurrent reconnect attempts) without touching real Slack. * fix(slack,desktop): address Copilot review on watchdog races and path normalization - connect(): explicitly cancel + await the prior socket watchdog before flipping _running, so an old monitor cannot exit between teardown and respawn (Copilot #1) - _socket_watchdog_loop: wrap the body in try/except + add a done-callback that respawns on unexpected crash, so a transient bug cannot permanently disable self-healing (Copilot #2) - normalizeExecutablePathForCompare: use the resolved path for realpathSync so non-string inputs cannot leak through (Copilot #3) - Add tests for crash-recovery and atomic watchdog replacement across reconnects * fix(slack): tighten connect() error path and clarify watchdog test intent Address Copilot review round 2. - connect(): wrap _start_socket_mode_handler/_ensure_socket_watchdog in a focused try/except so any failure rolls back partially-started handler/task state and leaves _running=False, ensuring the platform lock is always released by the outer finally - Defer _running=True until after the handler is actually started so the watchdog observes a live socket task immediately and never spins against a half-built adapter - Rename test_watchdog_self_restarts_after_unexpected_crash to test_watchdog_cancellation_does_not_respawn (matches what it actually asserts) and add test_watchdog_unexpected_exit_respawns_via_done_callback that drives a real RuntimeError through _on_socket_watchdog_done and verifies a fresh task replaces the crashed one * fix(web_server): serialize action spawn check+store under a threading lock Address Copilot review round 3. FastAPI runs sync handlers on its threadpool, so two near-simultaneous /api/gateway/restart (or /api/hermes/update) requests could both observe "no live process" in _spawn_hermes_action's poll-based dedupe and double-spawn. Add a module-level _ACTION_SPAWN_LOCK around the entire check + Popen + _ACTION_PROCS store sequence so the dedupe is atomic across threads. * fix: address Copilot review round 4 - slack.disconnect(): mirror connect()'s defensive cleanup — catch the broad Exception path on watchdog await so handler shutdown and lock release still run if the watchdog raised before cancellation took effect - web_server._spawn_hermes_action: wrap subprocess.Popen in try/except so a missing executable / permission error closes the log file handle, writes a failure marker, and re-raises instead of leaking a file descriptor - gateway._scan_gateway_pids: drop the over-broad "hermes.exe --profile" / "hermes.exe -p" patterns that would match any Hermes CLI subcommand using a profile flag (e.g. `hermes.exe --profile foo dashboard`); rely on the "hermes.exe gateway" + "hermes-gateway.exe" tokens instead - tests: tighten _fake_create_task to assert coroutine input and return a real asyncio.Task that stays pending until pytest teardown, and update the three callsites whose mocked AsyncSocketModeHandler.start_async returned a non-coroutine value * fix(slack): reset multi-workspace state on reconnect Address Copilot review round 5. connect() is reentrant (gateway restart, in-process reconnect), but it was leaving _bot_user_id / _team_clients / _team_bot_user_ids populated from the previous session. A reconnect that rotated the primary token or dropped a workspace would silently keep the stale bot user id and stale workspace client maps, leading to dispatch against gone workspaces. Clear these three pieces of state right after _stop_socket_mode_handler() and before the auth_test loop, then let the loop repopulate from the current tokens. Add test_reconnect_refreshes_multi_workspace_state to lock it in. |
||
|---|---|---|
| .. | ||
| proxy | ||
| __init__.py | ||
| _parser.py | ||
| _subprocess_compat.py | ||
| auth.py | ||
| auth_commands.py | ||
| azure_detect.py | ||
| backup.py | ||
| banner.py | ||
| browser_connect.py | ||
| callbacks.py | ||
| checkpoints.py | ||
| claw.py | ||
| cli_output.py | ||
| clipboard.py | ||
| codex_models.py | ||
| codex_runtime_plugin_migration.py | ||
| codex_runtime_switch.py | ||
| colors.py | ||
| commands.py | ||
| completion.py | ||
| config.py | ||
| copilot_auth.py | ||
| cron.py | ||
| curator.py | ||
| curses_ui.py | ||
| debug.py | ||
| default_soul.py | ||
| dep_ensure.py | ||
| dingtalk_auth.py | ||
| doctor.py | ||
| dump.py | ||
| env_loader.py | ||
| fallback_cmd.py | ||
| gateway.py | ||
| gateway_windows.py | ||
| goals.py | ||
| hooks.py | ||
| inventory.py | ||
| kanban.py | ||
| kanban_db.py | ||
| kanban_decompose.py | ||
| kanban_diagnostics.py | ||
| kanban_specify.py | ||
| logs.py | ||
| main.py | ||
| mcp_config.py | ||
| memory_setup.py | ||
| model_catalog.py | ||
| model_normalize.py | ||
| model_switch.py | ||
| models.py | ||
| nous_subscription.py | ||
| oneshot.py | ||
| pairing.py | ||
| platforms.py | ||
| plugins.py | ||
| plugins_cmd.py | ||
| profile_describer.py | ||
| profile_distribution.py | ||
| profiles.py | ||
| providers.py | ||
| pt_input_extras.py | ||
| pty_bridge.py | ||
| relaunch.py | ||
| runtime_provider.py | ||
| security_advisories.py | ||
| send_cmd.py | ||
| session_recap.py | ||
| setup.py | ||
| skills_config.py | ||
| skills_hub.py | ||
| skin_engine.py | ||
| slack_cli.py | ||
| status.py | ||
| stdio.py | ||
| timeouts.py | ||
| tips.py | ||
| tools_config.py | ||
| uninstall.py | ||
| vercel_auth.py | ||
| voice.py | ||
| web_server.py | ||
| webhook.py | ||