feat(goals): /goal wait <pid> — park the loop on a background process (#50503)

* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a goal is gated on a long-running background process (CI poller, build, test matrix, deploy) that produces nothing to judge yet, this spins the agent into 'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is skipped, no turn is consumed, no continuation fires, and /goal status shows a parked indicator. The barrier auto-clears the moment the process exits (the agent's notify_on_complete watcher is the natural wake signal), then the next turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier persists in SessionDB.state_meta (survives /resume); GoalState gains backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive: on Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the target's console process group (bpo-14484). Delegate to the canonical footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback) instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait (the loop parks itself, no manual /goal wait)

Makes the wait barrier automatic. Every turn the judge is shown the agent's live background processes (pid, command, uptime, output tail from the process_registry) alongside the goal + response, and can return a new 'wait' verdict instead of continue:

    {"verdict":"wait","wait_on_pid":N}       → park until that process exits
    {"verdict":"wait","wait_for_seconds":N}  → park until the deadline passes

evaluate_after_turn acts on the directive (sets the barrier, parks the loop) so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a time-based waiting_until barrier alongside the pid barrier; both auto-clear and can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live registry in via gather_background_processes(). Manual /goal wait stays as an override.

Judge verdict contract widened to (verdict, reason, parse_failed, wait_directive); legacy {"done":bool} shape still accepted.

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the 3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the 4-tuple (+ wait_directive). Update the fake to return None for the directive and accept the background_processes kwarg.

* feat(goals): trigger-based wait (park on a process's own signal, not just exit)

Addresses two gaps in the judge-driven wait:

(1) the judge could only express 'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that fires a trigger MID-RUN (and may never exit) couldn't be waited on;
(2) the process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if started with watch_patterns) its pattern matches, even while the process keeps running.

list_sessions() now surfaces session_id + watch_patterns/watch_hit/notify_on_complete so the judge sees the trigger and is told to prefer wait_on_session for trigger processes. Judge verdict gains a {wait_on_session} directive (preferred over pid). Backward-compatible GoalState field; pid + time barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive, release on exit, unknown-session, full park→trigger→resume, parse, validation, backcompat load). 105 goal-surface + 85 process_registry tests green.
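For reviewers, the widened judge contract described above could be sketched roughly as follows. This is not the parser from hermes_cli.goals (that file is not shown here): the function names `parse_judge_verdict` and `_wait_directive`, the verdict strings "done" and "continue", and the fall-back-to-continue behaviour on bad input are all assumptions made for illustration. Only the 4-tuple shape, the three directive keys, the session-over-pid preference, and the legacy {"done":bool} form come from the message.

```python
import json
from typing import Optional, Tuple

# (verdict, reason, parse_failed, wait_directive), per the commit message.
Verdict = Tuple[str, str, bool, Optional[dict]]


def _wait_directive(data: dict) -> Optional[dict]:
    """Hypothetical helper: pick one wait target, preferring session over pid."""
    session = data.get("wait_on_session")
    if isinstance(session, str) and session:
        return {"wait_on_session": session}
    pid = data.get("wait_on_pid")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(pid, int) and not isinstance(pid, bool) and pid > 0:
        return {"wait_on_pid": pid}
    seconds = data.get("wait_for_seconds")
    if (isinstance(seconds, (int, float)) and not isinstance(seconds, bool)
            and seconds > 0):
        return {"wait_for_seconds": float(seconds)}
    return None


def parse_judge_verdict(raw: str) -> Verdict:
    """Hypothetical parser for the judge reply (assumed behaviour, see lead-in)."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return ("continue", "unparseable judge reply", True, None)
    if not isinstance(data, dict):
        return ("continue", "unparseable judge reply", True, None)

    reason = str(data.get("reason", ""))

    # Legacy shape: {"done": bool} with no "verdict" key.
    if "verdict" not in data and isinstance(data.get("done"), bool):
        return ("done" if data["done"] else "continue", reason, False, None)

    verdict = data.get("verdict")
    if verdict in ("done", "continue"):
        return (verdict, reason, False, None)
    if verdict == "wait":
        directive = _wait_directive(data)
        if directive is not None:
            return ("wait", reason, False, directive)
        # Assumption: a wait with no usable target degrades to continue,
        # so a malformed directive cannot park the loop indefinitely.
        return ("continue", reason or "wait verdict without a usable target",
                False, None)
    return ("continue", "unrecognised verdict", True, None)
```

A caller that previously unpacked three values would now unpack four, for example `verdict, reason, parse_failed, directive = parse_judge_verdict(reply)`, which is the same change the kanban `_fake_judge` fix above accounts for.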
2026-06-24 10:52:21 +00:00 · 2026-06-22 06:27:29 -07:00 · 2026-06-22 06:27:29 -07:00 · ff85af3fc7
commit ff85af3fc7
parent d4fa2db1c5
13 changed files with 1139 additions and 104 deletions
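The release rules that is_session_waiting() adds in the tools/process_registry.py hunk can be exercised in isolation with a small stand-in. The sketch below is a simplification for review purposes, not the registry itself: `FakeSession` and the free function are hypothetical, the private attributes are renamed without their leading underscore, and the lock and the `_refresh_detached_session` call are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeSession:
    """Hypothetical stand-in for a registry session (only the fields used here)."""
    exited: bool = False
    watch_patterns: List[str] = field(default_factory=list)
    watch_disabled: bool = False
    watch_hits: int = 0


def is_session_waiting(sessions: Dict[str, FakeSession], session_id: str) -> bool:
    """Return True only while a parked goal loop should stay parked."""
    if not session_id:
        return False
    session = sessions.get(session_id)
    if session is None:
        # Unknown session: never wait, so a stale barrier cannot wedge the loop.
        return False
    if session.exited:
        return False
    # Watch-pattern process: the first delivered match satisfies the wait,
    # even though the process itself keeps running.
    if session.watch_patterns and not session.watch_disabled and session.watch_hits > 0:
        return False
    return True


sessions = {
    "build": FakeSession(),                              # plain process, still running
    "server": FakeSession(watch_patterns=["Listening"]),  # long-lived watcher
}

assert is_session_waiting(sessions, "build") is True
assert is_session_waiting(sessions, "server") is True

sessions["server"].watch_hits = 1   # pattern matched mid-run
assert is_session_waiting(sessions, "server") is False

sessions["build"].exited = True     # process finished
assert is_session_waiting(sessions, "build") is False

assert is_session_waiting(sessions, "missing") is False
```

Every path that cannot be confirmed as "still running and not yet triggered" returns False, which matches the docstring's stated goal that a stale or already-triggered barrier never blocks the loop.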
--- a/tools/process_registry.py
+++ b/tools/process_registry.py
@@ -1055,6 +1055,42 @@ class ProcessRegistry:
        """Check if a completion notification was already consumed via wait/log."""
        return session_id in self._completion_consumed

+    def is_session_waiting(self, session_id: str) -> bool:
+        """Whether a goal loop parked on this session should still be parked.
+
+        Used by the goal-loop wait barrier (``hermes_cli.goals``) to support
+        waiting on a process's OWN trigger, not just its exit. A session is
+        "still waiting" when:
+          - it is still running, AND
+          - if it has ``watch_patterns``, none has matched yet (so a
+            long-lived watcher that fires a trigger mid-run — and may never
+            exit — unblocks the moment its pattern hits, not on exit).
+
+        Returns False (don't wait) when the session has exited, its watch
+        pattern has already fired, or the session is unknown — so a stale or
+        already-triggered barrier can never wedge the loop.
+        """
+        if not session_id:
+            return False
+        with self._lock:
+            session = self._running.get(session_id) or self._finished.get(session_id)
+        if session is None:
+            return False
+        # Refresh detached/remote state so .exited is current.
+        try:
+            self._refresh_detached_session(session)
+        except Exception:
+            pass
+        if session.exited:
+            return False
+        # Watch-pattern process: the trigger is a pattern match, not exit.
+        # Once any match has been delivered, the wait is satisfied even though
+        # the process keeps running (server/daemon/watcher case).
+        if session.watch_patterns and not session._watch_disabled:
+            if session._watch_hits > 0:
+                return False
+        return True
+
    def _drain_should_skip(self, session_id: str) -> bool:
        """Whether the CLI drain should skip a completion event for this session.

@@ -1500,6 +1536,14 @@ class ProcessRegistry:
                "status": "exited" if s.exited else "running",
                "output_preview": s.output_buffer[-200:] if s.output_buffer else "",
            }
+            # Trigger metadata so a goal-loop judge can decide to wait on this
+            # process's OWN signal (a watch-pattern match or completion), not
+            # just its exit. A watcher with watch_patterns may never exit.
+            if s.watch_patterns and not s._watch_disabled:
+                entry["watch_patterns"] = list(s.watch_patterns)
+                entry["watch_hit"] = s._watch_hits > 0
+            if s.notify_on_complete:
+                entry["notify_on_complete"] = True
            if s.exited:
                entry["exit_code"] = s.exit_code
            if s.detached: