feat(goals): /goal wait <pid> — park the loop on a background process (#50503)

* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait

Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
  {"verdict":"wait","wait_on_pid":N}      → park until that process exits
  {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.

* feat(goals): trigger-based wait — park on a process's own signal, not just exit

Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
This commit is contained in:
Teknium 2026-06-22 06:27:29 -07:00 committed by GitHub
parent d4fa2db1c5
commit ff85af3fc7
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
13 changed files with 1139 additions and 104 deletions

12
cli.py
View file

@ -8460,7 +8460,17 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
if not last_response.strip():
return
decision = mgr.evaluate_after_turn(last_response, user_initiated=True)
try:
from hermes_cli.goals import gather_background_processes as _gather_bg
_bg_procs = _gather_bg()
except Exception:
_bg_procs = None
decision = mgr.evaluate_after_turn(
last_response,
user_initiated=True,
background_processes=_bg_procs,
)
msg = decision.get("message") or ""
if msg:
_cprint(f" {msg}")

View file

@ -7768,16 +7768,24 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if _cmd_def_inner and _cmd_def_inner.name == "kanban":
return await self._handle_kanban_command(event)
# /goal is safe mid-run for status/pause/clear (inspection and
# control-plane only — doesn't interrupt the running turn).
# /goal is safe mid-run for status/pause/clear/wait (inspection
# and control-plane only — doesn't interrupt the running turn).
# Setting a new goal text mid-run is rejected with the same
# "wait or /stop" message as /model so we don't race a second
# continuation prompt against the current turn.
if _cmd_def_inner and _cmd_def_inner.name == "goal":
_goal_arg = (event.get_command_args() or "").strip().lower()
if not _goal_arg or _goal_arg in {"status", "pause", "resume", "clear", "stop", "done"}:
_goal_verb = _goal_arg.split(None, 1)[0] if _goal_arg else ""
# Exact-match control verbs (unchanged semantics), plus the
# wait/unwait barrier verbs which take a pid argument.
_is_control = (
not _goal_arg
or _goal_arg in {"status", "pause", "resume", "clear", "stop", "done", "unwait"}
or _goal_verb == "wait"
)
if _is_control:
return await self._handle_goal_command(event)
return "Agent is running — use /goal status / pause / clear mid-run, or /stop before setting a new goal."
return "Agent is running — use /goal status / pause / clear / wait mid-run, or /stop before setting a new goal."
# /subgoal is safe mid-run — it only modifies the goal's
# subgoals list, which the judge reads at the next turn
@ -10634,7 +10642,17 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
if not mgr.is_active():
return
decision = mgr.evaluate_after_turn(final_response or "", user_initiated=True)
try:
from hermes_cli.goals import gather_background_processes as _gather_bg
_bg_procs = _gather_bg()
except Exception:
_bg_procs = None
decision = mgr.evaluate_after_turn(
final_response or "",
user_initiated=True,
background_processes=_bg_procs,
)
msg = decision.get("message") or ""
# Defer the status line until after the adapter has delivered the

View file

@ -1808,6 +1808,30 @@ class GatewaySlashCommandsMixin:
logger.debug("goal clear: pending continuation cleanup failed: %s", exc)
return t("gateway.goal_cleared") if had else t("gateway.no_active_goal")
# /goal wait <pid> [reason] — park the loop on a background process.
if lower == "wait" or lower.startswith("wait "):
wait_arg = args[len("wait"):].strip()
if not wait_arg:
return "Usage: /goal wait <pid> [reason]"
wtokens = wait_arg.split(None, 1)
try:
pid = int(wtokens[0])
except ValueError:
return "/goal wait: <pid> must be an integer process id."
reason = wtokens[1].strip() if len(wtokens) > 1 else ""
try:
mgr.wait_on(pid, reason=reason)
except (RuntimeError, ValueError) as exc:
return f"/goal wait: {exc}"
rtxt = f" ({reason})" if reason else ""
return f"⏳ Goal parked on pid {pid}{rtxt}. Loop pauses until it exits."
# /goal unwait — clear the wait barrier.
if lower == "unwait":
if mgr.stop_waiting():
return "▶ Wait barrier cleared — goal loop resumes."
return "No wait barrier set."
# Otherwise — treat the remaining text as the new goal.
try:
state = mgr.set(args)

View file

@ -1821,6 +1821,38 @@ class CLICommandsMixin:
_cprint(f" {_DIM}No active goal.{_RST}")
return
# /goal wait <pid> [reason] — park the loop on a background process so
# it stops re-poking the agent every turn while it waits on CI / a
# build / a long job. The barrier auto-clears when the PID exits.
if lower == "wait" or lower.startswith("wait "):
wait_arg = arg[len("wait"):].strip()
if not wait_arg:
_cprint(" Usage: /goal wait <pid> [reason]")
return
wtokens = wait_arg.split(None, 1)
try:
pid = int(wtokens[0])
except ValueError:
_cprint(" /goal wait: <pid> must be an integer process id.")
return
reason = wtokens[1].strip() if len(wtokens) > 1 else ""
try:
mgr.wait_on(pid, reason=reason)
except (RuntimeError, ValueError) as exc:
_cprint(f" /goal wait: {exc}")
return
rtxt = f" ({reason})" if reason else ""
_cprint(f" ⏳ Goal parked on pid {pid}{rtxt}. Loop pauses until it exits.")
return
# /goal unwait — drop the wait barrier and resume normal looping.
if lower == "unwait":
if mgr.stop_waiting():
_cprint(" ▶ Wait barrier cleared — goal loop resumes.")
else:
_cprint(f" {_DIM}No wait barrier set.{_RST}")
return
# Otherwise treat the arg as the goal text.
try:
state = mgr.set(arg)

View file

@ -108,7 +108,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("steer", "Inject a message after the next tool call without interrupting", "Session",
args_hint="<prompt>"),
CommandDef("goal", "Set a standing goal Hermes works on across turns until achieved", "Session",
args_hint="[text | pause | resume | clear | status]"),
args_hint="[text | pause | resume | clear | status | wait <pid> | unwait]"),
CommandDef("subgoal", "Add or manage extra criteria on the active goal", "Session",
args_hint="[text | remove N | clear]"),
CommandDef("status", "Show session, model, token, and context info", "Session"),

View file

@ -94,25 +94,59 @@ CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE = (
JUDGE_SYSTEM_PROMPT = (
"You are a strict judge evaluating whether an autonomous agent has "
"achieved a user's stated goal. You receive the goal text and the "
"agent's most recent response. Your only job is to decide whether "
"the goal is fully satisfied based on that response.\n\n"
"A goal is DONE only when:\n"
"achieved a user's stated goal. You receive the goal text, the agent's "
"most recent response, and — when present — a list of background "
"processes the agent has running. Decide one of three verdicts.\n\n"
"DONE — the goal is fully satisfied:\n"
"- The response explicitly confirms the goal was completed, OR\n"
"- The response clearly shows the final deliverable was produced, OR\n"
"- The response explains the goal is unachievable / blocked / needs "
"user input (treat this as DONE with reason describing the block).\n\n"
"Otherwise the goal is NOT done — CONTINUE.\n\n"
"Reply ONLY with a single JSON object on one line:\n"
'{\"done\": <true|false>, \"reason\": \"<one-sentence rationale>\"}'
"WAIT — the goal is NOT done, but the next step is to wait for async "
"work to finish rather than act again. Choose this ONLY when the agent's "
"progress is genuinely gated on something running on its own:\n"
"- A background process listed below is still running AND the response "
"shows the agent is waiting on its result (e.g. a CI poller, build, "
"test run, deploy). If the process has a session id, return it in "
"``wait_on_session`` — that releases when the process exits OR its "
"watch_patterns trigger fires (use this for a long-lived watcher that "
"signals mid-run and may never exit). Otherwise return its pid in "
"``wait_on_pid`` (releases on exit only).\n"
"- The agent says it is rate-limited / backing off / must wait a fixed "
"period — return seconds in ``wait_for_seconds``.\n"
"Picking WAIT parks the loop without burning a turn; it resumes "
"automatically when the pid exits or the time elapses. Do NOT pick WAIT "
"just because work remains — only when re-poking now would be pure "
"busy-work because the agent can't progress until the async thing "
"finishes.\n\n"
"CONTINUE — not done, and there is a concrete next step the agent can "
"take right now. This is the default when in doubt.\n\n"
"Reply ONLY with a single JSON object on one line. Shapes:\n"
'{"verdict": "done", "reason": "<one sentence>"}\n'
'{"verdict": "continue", "reason": "<one sentence>"}\n'
'{"verdict": "wait", "wait_on_session": "<id>", "reason": "<one sentence>"}\n'
'{"verdict": "wait", "wait_on_pid": <int>, "reason": "<one sentence>"}\n'
'{"verdict": "wait", "wait_for_seconds": <int>, "reason": "<one sentence>"}\n'
"The legacy shape {\"done\": <true|false>, \"reason\": \"...\"} is still "
"accepted (true=done, false=continue)."
)
# Rendered into the judge prompt when the agent has background processes
# running. Gives the judge the context it needs to decide WAIT vs CONTINUE
# (and which pid to wait on) without it having to probe anything itself.
JUDGE_BACKGROUND_BLOCK_TEMPLATE = (
"Background processes the agent currently has running (it may be waiting "
"on one of these):\n{background_lines}\n\n"
)
JUDGE_USER_PROMPT_TEMPLATE = (
"Goal:\n{goal}\n\n"
"Agent's most recent response:\n{response}\n\n"
"{background_block}"
"Current time: {current_time}\n\n"
"Is the goal satisfied?"
"Is the goal satisfied — done, continue, or wait?"
)
# Used when the user has added /subgoal criteria. The judge must
@ -122,6 +156,7 @@ JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE = (
"Additional criteria the user added mid-loop (all must also be "
"satisfied for the goal to be DONE):\n{subgoals_block}\n\n"
"Agent's most recent response:\n{response}\n\n"
"{background_block}"
"Current time: {current_time}\n\n"
"Decision: For each numbered criterion above, find concrete "
"evidence in the agent's response that the criterion is "
@ -129,7 +164,8 @@ JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE = (
"met' or 'implying it was done' — require specific evidence (a "
"file contents excerpt, an output line, a command result). If "
"ANY criterion lacks specific evidence in the response, the goal "
"is NOT done — return CONTINUE.\n\n"
"is NOT done — return CONTINUE (or WAIT if blocked on a listed "
"background process).\n\n"
"Is the goal AND every additional criterion satisfied?"
)
@ -159,6 +195,30 @@ class GoalState:
# them into the verdict. Backwards-compatible: defaults to empty so
# old state_meta rows load unchanged.
subgoals: List[str] = field(default_factory=list)
# Wait barrier: when the agent is blocked on long-running async work
# (CI poller, build, test run, deploy, rate-limit cooldown) the goal loop
# PARKS instead of being re-poked every turn into busy-work. Two barrier
# kinds, set automatically by the judge (which now sees the live
# background-process list and can return a ``wait`` verdict) or manually
# via ``/goal wait``:
# • ``waiting_on_pid`` — park until that process exits.
# • ``waiting_on_session`` — park until that process_registry session's
# OWN trigger fires: it exits, OR (if it has watch_patterns) its
# pattern matches. Covers long-lived watchers/servers that signal
# mid-run via a trigger and may never exit. Preferred over raw pid
# when the agent set up a watch_patterns/notify_on_complete process.
# • ``waiting_until`` — park until this wall-clock epoch (time backoff).
# While ANY is active, ``evaluate_after_turn`` short-circuits to
# should_continue=False without burning a turn or calling the judge. The
# barrier auto-clears when the pid exits / the trigger fires / the deadline
# passes, then the next turn resumes normal judging. Cleared by that,
# ``/goal unwait``, pause, resume, or clear. Backwards-compatible: old
# state_meta rows load with no barrier.
waiting_on_pid: Optional[int] = None
waiting_on_session: Optional[str] = None
waiting_until: float = 0.0
waiting_reason: Optional[str] = None
waiting_since: float = 0.0
def to_json(self) -> str:
return json.dumps(asdict(self), ensure_ascii=False)
@ -182,6 +242,11 @@ class GoalState:
paused_reason=data.get("paused_reason"),
consecutive_parse_failures=int(data.get("consecutive_parse_failures", 0) or 0),
subgoals=subgoals,
waiting_on_pid=(int(data["waiting_on_pid"]) if data.get("waiting_on_pid") else None),
waiting_on_session=(str(data["waiting_on_session"]) if data.get("waiting_on_session") else None),
waiting_until=float(data.get("waiting_until", 0.0) or 0.0),
waiting_reason=data.get("waiting_reason"),
waiting_since=float(data.get("waiting_since", 0.0) or 0.0),
)
# --- subgoals helpers -------------------------------------------------
@ -330,6 +395,52 @@ def _truncate(text: str, limit: int) -> str:
return text[:limit] + "… [truncated]"
def _pid_alive(pid: int) -> bool:
"""Return True if a process with ``pid`` is currently alive.
Delegates to ``gateway.status._pid_exists`` the canonical,
cross-platform, footgun-safe liveness check (psutil with a ctypes /
POSIX fallback). Critically this avoids ``os.kill(pid, 0)``, which on
Windows is NOT a no-op: it routes to ``CTRL_C_EVENT`` and hard-kills the
target's console process group (bpo-14484). Any error resolves to False
(treat unknown as dead) so a stale barrier never wedges the loop the
worst case is the goal resumes one turn early, which is safe.
"""
if not pid or pid <= 0:
return False
try:
from gateway.status import _pid_exists
return bool(_pid_exists(int(pid)))
except Exception:
pass
# Last-resort fallback if gateway.status is unavailable: psutil directly.
try:
import psutil # type: ignore
return bool(psutil.pid_exists(int(pid)))
except Exception:
return False
def _session_waiting(session_id: str) -> bool:
"""Whether a goal parked on a process_registry session should stay parked.
Delegates to ``process_registry.is_session_waiting`` True while the
session is running and (if it has watch_patterns) its trigger hasn't fired.
Fail-safe: any import/registry error yields False (don't wait) so a stale
barrier can never wedge the loop.
"""
if not session_id:
return False
try:
from tools.process_registry import process_registry
return bool(process_registry.is_session_waiting(session_id))
except Exception:
return False
_JSON_OBJECT_RE = re.compile(r"\{.*?\}", re.DOTALL)
@ -357,17 +468,25 @@ def _goal_judge_max_tokens() -> int:
return DEFAULT_JUDGE_MAX_TOKENS
def _parse_judge_response(raw: str) -> Tuple[bool, str, bool]:
"""Parse the judge's reply. Fail-open to ``(False, "<reason>", parse_failed)``.
def _parse_judge_response(raw: str) -> Tuple[str, str, bool, Optional[Dict[str, Any]]]:
"""Parse the judge's reply. Fail-open on unusable output.
Returns ``(done, reason, parse_failed)``. ``parse_failed`` is True when the
judge returned output that couldn't be interpreted as the expected JSON
verdict (empty body, prose, malformed JSON). Callers use that flag to
auto-pause after N consecutive parse failures so a weak judge model
doesn't silently burn the turn budget.
Returns ``(verdict, reason, parse_failed, wait_directive)`` where:
- ``verdict`` is ``"done"``, ``"continue"``, or ``"wait"``.
- ``parse_failed`` is True when the judge returned output that couldn't
be interpreted as the expected JSON verdict (empty body, prose,
malformed JSON). Callers use it to auto-pause after N consecutive
parse failures so a weak judge model doesn't silently burn the budget.
- ``wait_directive`` is set only for ``verdict == "wait"``: a dict with
``{"pid": int}`` or ``{"seconds": int}`` (whichever the judge supplied).
``None`` otherwise. If a wait verdict carries neither a usable pid nor
seconds, it is downgraded to ``continue`` (can't park on nothing).
Accepts both the new ``{"verdict": ...}`` shape and the legacy
``{"done": <bool>}`` shape.
"""
if not raw:
return False, "judge returned empty response", True
return "continue", "judge returned empty response", True, None
text = raw.strip()
@ -393,17 +512,103 @@ def _parse_judge_response(raw: str) -> Tuple[bool, str, bool]:
data = None
if not isinstance(data, dict):
return False, f"judge reply was not JSON: {_truncate(raw, 200)!r}", True
return "continue", f"judge reply was not JSON: {_truncate(raw, 200)!r}", True, None
done_val = data.get("done")
if isinstance(done_val, str):
done = done_val.strip().lower() in {"true", "yes", "1", "done"}
reason = str(data.get("reason") or "").strip() or "no reason provided"
# Determine verdict — prefer the explicit "verdict" field, fall back to
# the legacy "done" boolean.
verdict_raw = data.get("verdict")
if isinstance(verdict_raw, str):
verdict = verdict_raw.strip().lower()
else:
done = bool(done_val)
reason = str(data.get("reason") or "").strip()
if not reason:
reason = "no reason provided"
return done, reason, False
done_val = data.get("done")
if isinstance(done_val, str):
done = done_val.strip().lower() in {"true", "yes", "1", "done"}
else:
done = bool(done_val)
verdict = "done" if done else "continue"
if verdict not in {"done", "continue", "wait"}:
verdict = "continue"
if verdict != "wait":
return verdict, reason, False, None
# Wait verdict: extract a concrete directive (pid or seconds). Accept a
# few key spellings the model might emit.
def _first_int(*keys: str) -> Optional[int]:
for k in keys:
v = data.get(k)
if v is None:
continue
try:
iv = int(v)
if iv > 0:
return iv
except (TypeError, ValueError):
continue
return None
# Prefer a session-id directive (releases on the process's own trigger —
# exit OR watch-pattern match), then pid (exit only), then seconds.
sess = data.get("wait_on_session") or data.get("session_id") or data.get("wait_session")
if isinstance(sess, str) and sess.strip():
return "wait", reason, False, {"session_id": sess.strip()}
pid = _first_int("wait_on_pid", "pid", "wait_pid")
if pid is not None:
return "wait", reason, False, {"pid": pid}
seconds = _first_int("wait_for_seconds", "seconds", "wait_seconds")
if seconds is not None:
return "wait", reason, False, {"seconds": seconds}
# Wait with no usable target — can't park on nothing; treat as continue.
return "continue", f"{reason} (wait verdict had no target — continuing)", False, None
def _render_background_block(background_processes: Optional[List[Dict[str, Any]]]) -> str:
"""Render the live background-process list for the judge prompt.
Each entry is a ``process_registry.list_sessions()`` dict. Only RUNNING
processes are worth showing (an exited one is nothing to wait on). Returns
an empty string when there's nothing running, so the judge prompt is
byte-identical to the no-background case (no behavior change for the
common path).
"""
if not background_processes:
return ""
lines: List[str] = []
for p in background_processes:
if not isinstance(p, dict):
continue
if p.get("status") == "exited":
continue
pid = p.get("pid")
if not pid:
continue
cmd = _truncate(str(p.get("command") or "").replace("\n", " ").strip(), 120)
uptime = p.get("uptime_seconds")
tail = _truncate(str(p.get("output_preview") or "").replace("\n", " ").strip(), 120)
sid = p.get("session_id")
line = f"- pid {pid}"
if sid:
line += f" / session {sid}"
line += f": {cmd}"
if uptime is not None:
line += f" (running {uptime}s)"
# Surface the process's own trigger so the judge can wait on a
# mid-run signal (watch-pattern) or completion, not just exit.
wps = p.get("watch_patterns")
if wps:
hit = " [already matched]" if p.get("watch_hit") else ""
line += f" | watch_patterns={wps}{hit}"
elif p.get("notify_on_complete"):
line += " | notify_on_complete"
if tail:
line += f" | recent output: {tail}"
lines.append(line)
if not lines:
return ""
return JUDGE_BACKGROUND_BLOCK_TEMPLATE.format(background_lines="\n".join(lines))
def judge_goal(
@ -412,11 +617,14 @@ def judge_goal(
*,
timeout: float = DEFAULT_JUDGE_TIMEOUT,
subgoals: Optional[List[str]] = None,
) -> Tuple[str, str, bool]:
background_processes: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[str, str, bool, Optional[Dict[str, Any]]]:
"""Ask the auxiliary model whether the goal is satisfied.
Returns ``(verdict, reason, parse_failed)`` where verdict is ``"done"``,
``"continue"``, or ``"skipped"`` (when the judge couldn't be reached).
Returns ``(verdict, reason, parse_failed, wait_directive)`` where verdict
is ``"done"``, ``"continue"``, ``"wait"``, or ``"skipped"`` (when the
judge couldn't be reached). ``wait_directive`` is set only for ``"wait"``
(``{"pid": int}`` or ``{"seconds": int}``); ``None`` otherwise.
``parse_failed`` is True only when the judge call succeeded but its output
was unusable (empty or non-JSON). API/transport errors return False they
@ -425,37 +633,39 @@ def judge_goal(
``DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURES``).
``subgoals`` is an optional list of user-added criteria (from
``/subgoal``) that the judge must also factor into its DONE/CONTINUE
decision. When non-empty the prompt switches to the with-subgoals
template; otherwise behavior is identical to the original judge.
``/subgoal``) factored into the verdict. ``background_processes`` is the
live ``process_registry.list_sessions()`` snapshot; when the agent is
waiting on one (a CI poller, build, etc.) the judge can return a ``wait``
verdict naming its pid, parking the loop instead of re-poking.
This is deliberately fail-open: any error returns ``("continue", "...", False)``
This is deliberately fail-open: any error returns ``("continue", ..., False, None)``
so a broken judge doesn't wedge progress — the turn budget and the
consecutive-parse-failures auto-pause are the backstops.
"""
if not goal.strip():
return "skipped", "empty goal", False
return "skipped", "empty goal", False, None
if not last_response.strip():
# No substantive reply this turn — almost certainly not done yet.
return "continue", "empty response (nothing to evaluate)", False
return "continue", "empty response (nothing to evaluate)", False, None
try:
from agent.auxiliary_client import get_auxiliary_extra_body, get_text_auxiliary_client
except Exception as exc:
logger.debug("goal judge: auxiliary client import failed: %s", exc)
return "continue", "auxiliary client unavailable", False
return "continue", "auxiliary client unavailable", False, None
try:
client, model = get_text_auxiliary_client("goal_judge")
except Exception as exc:
logger.debug("goal judge: get_text_auxiliary_client failed: %s", exc)
return "continue", "auxiliary client unavailable", False
return "continue", "auxiliary client unavailable", False, None
if client is None or not model:
return "continue", "no auxiliary client configured", False
return "continue", "no auxiliary client configured", False, None
# Build the prompt — pick the with-subgoals variant when applicable.
clean_subgoals = [s.strip() for s in (subgoals or []) if s and s.strip()]
background_block = _render_background_block(background_processes)
current_time = datetime.now(tz=timezone.utc).astimezone().strftime("%Y-%m-%d %H:%M:%S %Z")
if clean_subgoals:
subgoals_block = "\n".join(
@ -465,12 +675,14 @@ def judge_goal(
goal=_truncate(goal, 2000),
subgoals_block=_truncate(subgoals_block, 2000),
response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
background_block=background_block,
current_time=current_time,
)
else:
prompt = JUDGE_USER_PROMPT_TEMPLATE.format(
goal=_truncate(goal, 2000),
response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
background_block=background_block,
current_time=current_time,
)
@ -488,17 +700,40 @@ def judge_goal(
)
except Exception as exc:
logger.info("goal judge: API call failed (%s) — falling through to continue", exc)
return "continue", f"judge error: {type(exc).__name__}", False
return "continue", f"judge error: {type(exc).__name__}", False, None
try:
raw = resp.choices[0].message.content or ""
except Exception:
raw = ""
done, reason, parse_failed = _parse_judge_response(raw)
verdict = "done" if done else "continue"
logger.info("goal judge: verdict=%s reason=%s", verdict, _truncate(reason, 120))
return verdict, reason, parse_failed
verdict, reason, parse_failed, wait_directive = _parse_judge_response(raw)
logger.info(
"goal judge: verdict=%s reason=%s%s",
verdict, _truncate(reason, 120),
f" wait={wait_directive}" if wait_directive else "",
)
return verdict, reason, parse_failed, wait_directive
def gather_background_processes(task_id: Optional[str] = None) -> List[Dict[str, Any]]:
"""Return the live background-process snapshot for the goal judge.
Thin, fail-safe wrapper over ``process_registry.list_sessions(task_id)``.
Returns only RUNNING processes (an exited one is nothing to wait on) and
never raises any import/registry failure yields ``[]`` so the goal loop
degrades to its pre-wait-barrier behavior (judge just won't see processes).
The drivers (CLI + gateway) call this and pass the result into
``GoalManager.evaluate_after_turn(background_processes=...)``.
"""
try:
from tools.process_registry import process_registry
sessions = process_registry.list_sessions(task_id=task_id) or []
except Exception as exc:
logger.debug("gather_background_processes failed: %s", exc)
return []
return [s for s in sessions if isinstance(s, dict) and s.get("status") != "exited"]
# ──────────────────────────────────────────────────────────────────────
@ -547,6 +782,16 @@ class GoalManager:
turns = f"{s.turns_used}/{s.max_turns} turns"
sub = f", {len(s.subgoals)} subgoal{'s' if len(s.subgoals) != 1 else ''}" if s.subgoals else ""
if s.status == "active":
if s.waiting_on_session and _session_waiting(s.waiting_on_session):
wr = s.waiting_reason or f"session {s.waiting_on_session}"
return f"⏳ Goal (parked on {wr}, {turns}{sub}): {s.goal}"
if s.waiting_on_pid and _pid_alive(s.waiting_on_pid):
wr = s.waiting_reason or f"pid {s.waiting_on_pid}"
return f"⏳ Goal (parked on {wr}, {turns}{sub}): {s.goal}"
if s.waiting_until and time.time() < s.waiting_until:
remaining = int(s.waiting_until - time.time())
wr = s.waiting_reason or f"{remaining}s"
return f"⏳ Goal (parked {remaining}s — {wr}, {turns}{sub}): {s.goal}"
return f"⊙ Goal (active, {turns}{sub}): {s.goal}"
if s.status == "paused":
extra = f"{s.paused_reason}" if s.paused_reason else ""
@ -578,6 +823,12 @@ class GoalManager:
return None
self._state.status = "paused"
self._state.paused_reason = reason
# A wait barrier is meaningless once paused — drop it.
self._state.waiting_on_pid = None
self._state.waiting_on_session = None
self._state.waiting_until = 0.0
self._state.waiting_reason = None
self._state.waiting_since = 0.0
save_goal(self.session_id, self._state)
return self._state
@ -586,6 +837,12 @@ class GoalManager:
return None
self._state.status = "active"
self._state.paused_reason = None
# Resuming starts fresh — clear any stale barrier.
self._state.waiting_on_pid = None
self._state.waiting_on_session = None
self._state.waiting_until = 0.0
self._state.waiting_reason = None
self._state.waiting_since = 0.0
if reset_budget:
self._state.turns_used = 0
save_goal(self.session_id, self._state)
@ -653,6 +910,123 @@ class GoalManager:
return "(no subgoals — use /subgoal <text> to add criteria)"
return self._state.render_subgoals_block()
# --- /goal wait barrier -------------------------------------------
def wait_on(self, pid: int, reason: str = "") -> GoalState:
"""Park the goal loop on a background process PID.
While the PID is alive, ``evaluate_after_turn`` returns
``should_continue=False`` without burning a turn or calling the
judge the loop quiesces instead of re-poking the agent into busy
work. The barrier auto-clears when the process exits. Requires an
active goal. For a process with a watch_patterns/notify_on_complete
trigger, prefer ``wait_on_session`` so a mid-run trigger (not just
exit) releases the barrier.
"""
if self._state is None or self._state.status != "active":
raise RuntimeError("no active goal to park")
pid = int(pid)
if pid <= 0:
raise ValueError("pid must be a positive integer")
self._state.waiting_on_pid = pid
self._state.waiting_on_session = None
self._state.waiting_until = 0.0
self._state.waiting_reason = (reason or "").strip() or None
self._state.waiting_since = time.time()
save_goal(self.session_id, self._state)
return self._state
def wait_on_session(self, session_id: str, reason: str = "") -> GoalState:
"""Park the goal loop on a process_registry session's OWN trigger.
Unlike ``wait_on`` (which releases only on PID exit), this releases
when the session's trigger fires: it exits, OR — if it was started
with ``watch_patterns`` its pattern matches. This is the right
barrier for a long-lived watcher/server/poller that signals mid-run
and may never exit. Requires an active goal.
"""
if self._state is None or self._state.status != "active":
raise RuntimeError("no active goal to park")
session_id = str(session_id or "").strip()
if not session_id:
raise ValueError("session_id must be a non-empty string")
self._state.waiting_on_session = session_id
self._state.waiting_on_pid = None
self._state.waiting_until = 0.0
self._state.waiting_reason = (reason or "").strip() or None
self._state.waiting_since = time.time()
save_goal(self.session_id, self._state)
return self._state
def wait_for_seconds(self, seconds: int, reason: str = "") -> GoalState:
"""Park the goal loop until ``seconds`` from now have elapsed.
Time-based counterpart to ``wait_on`` for backoff / cooldown waits
where there's no process to track (e.g. the agent is rate-limited).
The barrier auto-clears once the deadline passes. Requires an active
goal.
"""
if self._state is None or self._state.status != "active":
raise RuntimeError("no active goal to park")
seconds = int(seconds)
if seconds <= 0:
raise ValueError("seconds must be a positive integer")
self._state.waiting_on_pid = None
self._state.waiting_on_session = None
self._state.waiting_until = time.time() + seconds
self._state.waiting_reason = (reason or "").strip() or None
self._state.waiting_since = time.time()
save_goal(self.session_id, self._state)
return self._state
def stop_waiting(self) -> bool:
"""Clear any active wait barrier (pid / session / time). Returns True
if one was cleared."""
if self._state is None:
return False
if (
self._state.waiting_on_pid is None
and self._state.waiting_on_session is None
and not self._state.waiting_until
):
return False
self._state.waiting_on_pid = None
self._state.waiting_on_session = None
self._state.waiting_until = 0.0
self._state.waiting_reason = None
self._state.waiting_since = 0.0
save_goal(self.session_id, self._state)
return True
def is_waiting(self) -> bool:
"""True iff a barrier is set AND not yet satisfied.
Session barrier: active until the process exits or its watch-pattern
trigger fires. Pid barrier: active while the process is alive. Time
barrier: active until the deadline passes. Side effect: a satisfied
barrier is cleared here (lazy auto-clear) so the next evaluation
resumes normal judging.
"""
s = self._state
if s is None:
return False
if s.waiting_on_session is not None:
if _session_waiting(s.waiting_on_session):
return True
self.stop_waiting() # session exited or trigger fired
return False
if s.waiting_on_pid is not None:
if _pid_alive(s.waiting_on_pid):
return True
self.stop_waiting() # process gone
return False
if s.waiting_until:
if time.time() < s.waiting_until:
return True
self.stop_waiting() # deadline passed
return False
return False
# --- the main entry point called after every turn -----------------
def evaluate_after_turn(
@ -660,6 +1034,7 @@ class GoalManager:
last_response: str,
*,
user_initiated: bool = True,
background_processes: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
"""Run the judge and update state. Return a decision dict.
@ -667,11 +1042,16 @@ class GoalManager:
continuation prompt we fed ourselves (False). Both increment
``turns_used`` because both consume model budget.
``background_processes`` is the live ``process_registry.list_sessions()``
snapshot for this session. It's handed to the judge so it can decide
to WAIT on an in-flight process (CI poller, build, ...) instead of
re-poking the agent the automatic counterpart to ``/goal wait``.
Decision keys:
- ``status``: current goal status after update
- ``should_continue``: bool caller should fire another turn
- ``continuation_prompt``: str or None
- ``verdict``: "done" | "continue" | "skipped" | "inactive"
- ``verdict``: "done" | "continue" | "wait" | "skipped" | "inactive"
- ``reason``: str
- ``message``: user-visible one-liner to print/send
"""
@ -686,12 +1066,36 @@ class GoalManager:
"message": "",
}
# Wait barrier: if the loop is parked (on a live process OR a time
# deadline that hasn't passed), quiesce — do NOT burn a turn or call
# the judge. Resumes automatically once the barrier clears.
if self.is_waiting():
if state.waiting_on_session is not None:
tgt = f"session {state.waiting_on_session}"
elif state.waiting_on_pid is not None:
tgt = f"pid {state.waiting_on_pid}"
else:
remaining = max(0, int(state.waiting_until - time.time()))
tgt = f"{remaining}s remaining"
reason = state.waiting_reason or tgt
return {
"status": "active",
"should_continue": False,
"continuation_prompt": None,
"verdict": "waiting",
"reason": reason,
"message": f"⏳ Goal parked — waiting on {tgt}: {reason}",
}
# Count the turn that just finished.
state.turns_used += 1
state.last_turn_at = time.time()
verdict, reason, parse_failed = judge_goal(
state.goal, last_response, subgoals=state.subgoals or None
verdict, reason, parse_failed, wait_directive = judge_goal(
state.goal,
last_response,
subgoals=state.subgoals or None,
background_processes=background_processes,
)
state.last_verdict = verdict
state.last_reason = reason
@ -704,6 +1108,31 @@ class GoalManager:
else:
state.consecutive_parse_failures = 0
# WAIT verdict: the judge decided the agent is blocked on async work
# and re-poking now would be busy-work. Set the barrier and park —
# the turn we just counted stands (the judge call happened), but no
# continuation fires. The loop resumes automatically when the pid
# exits or the deadline passes (next evaluate_after_turn falls through
# the is_waiting() short-circuit once the barrier clears).
if verdict == "wait" and wait_directive:
if wait_directive.get("session_id"):
self.wait_on_session(str(wait_directive["session_id"]), reason=reason)
tgt = f"session {wait_directive['session_id']}"
elif wait_directive.get("pid"):
self.wait_on(int(wait_directive["pid"]), reason=reason)
tgt = f"pid {wait_directive['pid']}"
else:
self.wait_for_seconds(int(wait_directive["seconds"]), reason=reason)
tgt = f"{wait_directive['seconds']}s"
return {
"status": "active",
"should_continue": False,
"continuation_prompt": None,
"verdict": "wait",
"reason": reason,
"message": f"⏳ Goal parked (judge) — waiting on {tgt}: {reason}",
}
if verdict == "done":
state.status = "done"
save_goal(self.session_id, state)
@ -889,7 +1318,12 @@ def run_kanban_goal_loop(
return {"outcome": "stopped", "turns_used": turns_used, "reason": f"status={status}"}
# Still open — judge whether the latest response satisfies the card.
verdict, reason, _parse_failed = judge_goal(goal_text, last_response)
# The kanban worker loop has no wait-barrier concept (workers finish
# via kanban_complete / kanban_block, not by parking), so a WAIT
# verdict is treated as CONTINUE here.
verdict, reason, _parse_failed, _wait = judge_goal(goal_text, last_response)
if verdict == "wait":
verdict = "continue"
_log(f"kanban goal loop: turn {turns_used}/{max_turns} verdict={verdict} reason={_truncate(reason, 120)}")
if verdict == "done":

View file

@ -169,7 +169,7 @@ class TestHealthyTurnStillRuns:
# Force the judge to say "continue" without touching the network.
with patch(
"hermes_cli.goals.judge_goal",
return_value=("continue", "needs more steps", False),
return_value=("continue", "needs more steps", False, None),
):
cli._maybe_continue_goal_after_turn()
@ -189,7 +189,7 @@ class TestHealthyTurnStillRuns:
with patch(
"hermes_cli.goals.judge_goal",
return_value=("done", "goal satisfied", False),
return_value=("done", "goal satisfied", False, None),
):
cli._maybe_continue_goal_after_turn()

View file

@ -107,7 +107,7 @@ async def test_goal_verdict_done_sent_via_adapter_send(hermes_home):
mgr = GoalManager(session_entry.session_id)
mgr.set("ship the feature")
with patch("hermes_cli.goals.judge_goal", return_value=("done", "the feature shipped", False)):
with patch("hermes_cli.goals.judge_goal", return_value=("done", "the feature shipped", False, None)):
await runner._post_turn_goal_continuation(
session_entry=session_entry,
source=src,
@ -136,7 +136,7 @@ async def test_goal_verdict_continue_enqueues_continuation(hermes_home):
mgr = GoalManager(session_entry.session_id)
mgr.set("polish the docs")
with patch("hermes_cli.goals.judge_goal", return_value=("continue", "still needs work", False)):
with patch("hermes_cli.goals.judge_goal", return_value=("continue", "still needs work", False, None)):
await runner._post_turn_goal_continuation(
session_entry=session_entry,
source=src,
@ -164,7 +164,7 @@ async def test_goal_verdict_budget_exhausted_sends_pause(hermes_home):
state.turns_used = 2
save_goal(session_entry.session_id, state)
with patch("hermes_cli.goals.judge_goal", return_value=("continue", "keep going", False)):
with patch("hermes_cli.goals.judge_goal", return_value=("continue", "keep going", False, None)):
await runner._post_turn_goal_continuation(
session_entry=session_entry,
source=src,
@ -211,7 +211,7 @@ async def test_goal_verdict_survives_adapter_without_send(hermes_home):
runner.adapters[Platform.TELEGRAM] = _NoSendAdapter()
with patch("hermes_cli.goals.judge_goal", return_value=("done", "ok", False)):
with patch("hermes_cli.goals.judge_goal", return_value=("done", "ok", False, None)):
# must not raise
await runner._post_turn_goal_continuation(
session_entry=session_entry,

View file

@ -3,6 +3,7 @@
from __future__ import annotations
import json
import time
from unittest.mock import patch, MagicMock
import pytest
@ -40,23 +41,25 @@ class TestParseJudgeResponse:
def test_clean_json_done(self):
from hermes_cli.goals import _parse_judge_response
done, reason, _ = _parse_judge_response('{"done": true, "reason": "all good"}')
assert done is True
verdict, reason, _pf, wait = _parse_judge_response('{"done": true, "reason": "all good"}')
assert verdict == "done"
assert reason == "all good"
assert wait is None
def test_clean_json_continue(self):
from hermes_cli.goals import _parse_judge_response
done, reason, _ = _parse_judge_response('{"done": false, "reason": "more work needed"}')
assert done is False
verdict, reason, _pf, wait = _parse_judge_response('{"done": false, "reason": "more work needed"}')
assert verdict == "continue"
assert reason == "more work needed"
assert wait is None
def test_json_in_markdown_fence(self):
from hermes_cli.goals import _parse_judge_response
raw = '```json\n{"done": true, "reason": "done"}\n```'
done, reason, _ = _parse_judge_response(raw)
assert done is True
verdict, reason, _pf, _w = _parse_judge_response(raw)
assert verdict == "done"
assert "done" in reason
def test_json_embedded_in_prose(self):
@ -64,33 +67,79 @@ class TestParseJudgeResponse:
from hermes_cli.goals import _parse_judge_response
raw = 'Looking at this... the agent says X. Verdict: {"done": false, "reason": "partial"}'
done, reason, _ = _parse_judge_response(raw)
assert done is False
verdict, reason, _pf, _w = _parse_judge_response(raw)
assert verdict == "continue"
assert reason == "partial"
def test_string_done_values(self):
from hermes_cli.goals import _parse_judge_response
for s in ("true", "yes", "done", "1"):
done, _, _ = _parse_judge_response(f'{{"done": "{s}", "reason": "r"}}')
assert done is True
verdict, _, _, _ = _parse_judge_response(f'{{"done": "{s}", "reason": "r"}}')
assert verdict == "done"
for s in ("false", "no", "not yet"):
done, _, _ = _parse_judge_response(f'{{"done": "{s}", "reason": "r"}}')
assert done is False
verdict, _, _, _ = _parse_judge_response(f'{{"done": "{s}", "reason": "r"}}')
assert verdict == "continue"
def test_malformed_json_fails_open(self):
"""Non-JSON → not done, with error-ish reason (so judge_goal can map to continue)."""
def test_new_verdict_shape(self):
"""The explicit {"verdict": ...} shape is honored."""
from hermes_cli.goals import _parse_judge_response
done, reason, _ = _parse_judge_response("this is not json at all")
assert done is False
v, _, _, _ = _parse_judge_response('{"verdict": "done", "reason": "r"}')
assert v == "done"
v, _, _, _ = _parse_judge_response('{"verdict": "continue", "reason": "r"}')
assert v == "continue"
def test_wait_verdict_with_pid(self):
from hermes_cli.goals import _parse_judge_response
v, reason, pf, wait = _parse_judge_response(
'{"verdict": "wait", "wait_on_pid": 4242, "reason": "CI running"}'
)
assert v == "wait"
assert pf is False
assert wait == {"pid": 4242}
assert reason == "CI running"
def test_wait_verdict_with_seconds(self):
from hermes_cli.goals import _parse_judge_response
v, _, _, wait = _parse_judge_response(
'{"verdict": "wait", "wait_for_seconds": 90, "reason": "rate limited"}'
)
assert v == "wait"
assert wait == {"seconds": 90}
def test_wait_verdict_without_target_downgrades_to_continue(self):
"""A wait verdict with no pid/seconds can't park on anything → continue."""
from hermes_cli.goals import _parse_judge_response
v, _, pf, wait = _parse_judge_response('{"verdict": "wait", "reason": "vague"}')
assert v == "continue"
assert wait is None
assert pf is False
def test_unknown_verdict_falls_back_to_continue(self):
from hermes_cli.goals import _parse_judge_response
v, _, _, _ = _parse_judge_response('{"verdict": "maybe", "reason": "r"}')
assert v == "continue"
def test_malformed_json_fails_open(self):
"""Non-JSON → continue + parse_failed, with error-ish reason."""
from hermes_cli.goals import _parse_judge_response
verdict, reason, parse_failed, _w = _parse_judge_response("this is not json at all")
assert verdict == "continue"
assert parse_failed is True
assert reason # non-empty
def test_empty_response(self):
from hermes_cli.goals import _parse_judge_response
done, reason, _ = _parse_judge_response("")
assert done is False
verdict, reason, parse_failed, _w = _parse_judge_response("")
assert verdict == "continue"
assert parse_failed is True
assert reason
@ -103,13 +152,13 @@ class TestJudgeGoal:
def test_empty_goal_skipped(self):
from hermes_cli.goals import judge_goal
verdict, _, _ = judge_goal("", "some response")
verdict, _, _, _wd = judge_goal("", "some response")
assert verdict == "skipped"
def test_empty_response_continues(self):
from hermes_cli.goals import judge_goal
verdict, _, _ = judge_goal("ship the thing", "")
verdict, _, _, _wd = judge_goal("ship the thing", "")
assert verdict == "continue"
def test_no_aux_client_continues(self):
@ -120,7 +169,7 @@ class TestJudgeGoal:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(None, None),
):
verdict, _, _ = goals.judge_goal("my goal", "my response")
verdict, _, _, _wd = goals.judge_goal("my goal", "my response")
assert verdict == "continue"
def test_api_error_continues(self):
@ -133,7 +182,7 @@ class TestJudgeGoal:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(fake_client, "judge-model"),
):
verdict, reason, _ = goals.judge_goal("goal", "response")
verdict, reason, _, _wd = goals.judge_goal("goal", "response")
assert verdict == "continue"
assert "judge error" in reason.lower()
@ -152,7 +201,7 @@ class TestJudgeGoal:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(fake_client, "judge-model"),
):
verdict, reason, _ = goals.judge_goal("goal", "agent response")
verdict, reason, _, _wd = goals.judge_goal("goal", "agent response")
assert verdict == "done"
assert reason == "achieved"
@ -171,7 +220,7 @@ class TestJudgeGoal:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(fake_client, "judge-model"),
):
verdict, reason, _ = goals.judge_goal("goal", "agent response")
verdict, reason, _, _wd = goals.judge_goal("goal", "agent response")
assert verdict == "continue"
assert reason == "not yet"
@ -260,7 +309,7 @@ class TestGoalManager:
mgr = GoalManager(session_id="eval-sid-1")
mgr.set("ship it")
with patch.object(goals, "judge_goal", return_value=("done", "shipped", False)):
with patch.object(goals, "judge_goal", return_value=("done", "shipped", False, None)):
decision = mgr.evaluate_after_turn("I shipped the feature.")
assert decision["verdict"] == "done"
@ -276,7 +325,7 @@ class TestGoalManager:
mgr = GoalManager(session_id="eval-sid-2", default_max_turns=5)
mgr.set("a long goal")
with patch.object(goals, "judge_goal", return_value=("continue", "more work", False)):
with patch.object(goals, "judge_goal", return_value=("continue", "more work", False, None)):
decision = mgr.evaluate_after_turn("made some progress")
assert decision["verdict"] == "continue"
@ -294,7 +343,7 @@ class TestGoalManager:
mgr = GoalManager(session_id="eval-sid-3", default_max_turns=2)
mgr.set("hard goal")
with patch.object(goals, "judge_goal", return_value=("continue", "not yet", False)):
with patch.object(goals, "judge_goal", return_value=("continue", "not yet", False, None)):
d1 = mgr.evaluate_after_turn("step 1")
assert d1["should_continue"] is True
assert mgr.state.turns_used == 1
@ -371,28 +420,28 @@ class TestJudgeParseFailureAutoPause:
def test_parse_response_flags_empty_as_parse_failure(self):
from hermes_cli.goals import _parse_judge_response
done, reason, parse_failed = _parse_judge_response("")
assert done is False
verdict, reason, parse_failed, _w = _parse_judge_response("")
assert verdict == "continue"
assert parse_failed is True
assert "empty" in reason.lower()
def test_parse_response_flags_non_json_as_parse_failure(self):
from hermes_cli.goals import _parse_judge_response
done, reason, parse_failed = _parse_judge_response(
verdict, reason, parse_failed, _w = _parse_judge_response(
"Let me analyze whether the goal is fully satisfied based on the agent's response..."
)
assert done is False
assert verdict == "continue"
assert parse_failed is True
assert "not json" in reason.lower()
def test_parse_response_clean_json_is_not_parse_failure(self):
from hermes_cli.goals import _parse_judge_response
done, _, parse_failed = _parse_judge_response(
verdict, _, parse_failed, _w = _parse_judge_response(
'{"done": false, "reason": "more work"}'
)
assert done is False
assert verdict == "continue"
assert parse_failed is False
def test_api_error_does_not_count_as_parse_failure(self):
@ -405,7 +454,7 @@ class TestJudgeParseFailureAutoPause:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(fake_client, "judge-model"),
):
verdict, _, parse_failed = goals.judge_goal("goal", "response")
verdict, _, parse_failed, _wd = goals.judge_goal("goal", "response")
assert verdict == "continue"
assert parse_failed is False
@ -421,7 +470,7 @@ class TestJudgeParseFailureAutoPause:
"agent.auxiliary_client.get_text_auxiliary_client",
return_value=(fake_client, "judge-model"),
):
verdict, _, parse_failed = goals.judge_goal("goal", "response")
verdict, _, parse_failed, _wd = goals.judge_goal("goal", "response")
assert verdict == "continue"
assert parse_failed is True
@ -435,7 +484,7 @@ class TestJudgeParseFailureAutoPause:
mgr.set("do a thing")
with patch.object(
goals, "judge_goal", return_value=("continue", "judge returned empty response", True)
goals, "judge_goal", return_value=("continue", "judge returned empty response", True, None)
):
d1 = mgr.evaluate_after_turn("step 1")
assert d1["should_continue"] is True
@ -464,7 +513,7 @@ class TestJudgeParseFailureAutoPause:
# Two parse failures…
with patch.object(
goals, "judge_goal", return_value=("continue", "not json", True)
goals, "judge_goal", return_value=("continue", "not json", True, None)
):
mgr.evaluate_after_turn("step 1")
mgr.evaluate_after_turn("step 2")
@ -472,7 +521,7 @@ class TestJudgeParseFailureAutoPause:
# …then one clean reply resets the counter.
with patch.object(
goals, "judge_goal", return_value=("continue", "making progress", False)
goals, "judge_goal", return_value=("continue", "making progress", False, None)
):
d = mgr.evaluate_after_turn("step 3")
assert d["should_continue"] is True
@ -487,7 +536,7 @@ class TestJudgeParseFailureAutoPause:
mgr.set("goal")
with patch.object(
goals, "judge_goal", return_value=("continue", "judge error: RuntimeError", False)
goals, "judge_goal", return_value=("continue", "judge error: RuntimeError", False, None)
):
for _ in range(5):
d = mgr.evaluate_after_turn("still going")
@ -506,7 +555,7 @@ class TestJudgeParseFailureAutoPause:
mgr.set("persistent goal")
with patch.object(
goals, "judge_goal", return_value=("continue", "empty", True)
goals, "judge_goal", return_value=("continue", "empty", True, None)
):
mgr.evaluate_after_turn("r")
mgr.evaluate_after_turn("r")
@ -714,7 +763,7 @@ class TestJudgeGoalWithSubgoals:
return_value=(_FakeClient, "fake-model")), \
patch("agent.auxiliary_client.get_auxiliary_extra_body",
return_value=None):
verdict, reason, parse_failed = goals.judge_goal(
verdict, reason, parse_failed, _wd = goals.judge_goal(
"ship the feature",
"ok shipped",
subgoals=["write tests", "update docs"],
@ -778,3 +827,395 @@ class TestStatusLineSubgoalCount:
mgr.add_subgoal("b")
line = mgr.status_line()
assert "2 subgoals" in line
# ──────────────────────────────────────────────────────────────────────
# Wait barrier — parking the goal loop on a background process
# ──────────────────────────────────────────────────────────────────────
class TestWaitBarrier:
"""The /goal wait barrier parks the loop on a live PID and resumes when
the process exits, without burning turns or calling the judge."""
@staticmethod
def _spawn_sleeper():
"""Start a short-lived child process; return its Popen handle."""
import subprocess
import sys
return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
@staticmethod
def _dead_pid():
"""A PID that is essentially guaranteed not to be running."""
return 2_000_000_000
def test_wait_on_requires_active_goal(self, hermes_home):
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="wb-noactive")
with pytest.raises(RuntimeError):
mgr.wait_on(12345)
def test_wait_on_rejects_bad_pid(self, hermes_home):
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="wb-badpid")
mgr.set("g")
with pytest.raises(ValueError):
mgr.wait_on(0)
def test_parked_on_live_pid_does_not_continue_or_judge(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
try:
mgr = GoalManager(session_id="wb-live")
mgr.set("ship it", max_turns=5)
mgr.wait_on(proc.pid, reason="CI green")
assert mgr.is_waiting() is True
# The judge must NOT be called while parked, and no turn is burned.
judge = MagicMock(return_value=("continue", "x", False, None))
with patch.object(goals, "judge_goal", judge):
decision = mgr.evaluate_after_turn("still waiting on CI")
judge.assert_not_called()
assert decision["verdict"] == "waiting"
assert decision["should_continue"] is False
assert decision["continuation_prompt"] is None
assert mgr.state.turns_used == 0 # no turn consumed while parked
assert "CI green" in decision["message"]
assert mgr.state.status == "active" # still active, just parked
finally:
proc.terminate()
proc.wait(timeout=10)
def test_barrier_auto_clears_when_process_exits_and_loop_resumes(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
mgr = GoalManager(session_id="wb-exit")
mgr.set("ship it", max_turns=5)
mgr.wait_on(proc.pid, reason="build")
assert mgr.is_waiting() is True
# Kill the process — barrier should auto-clear and judging resumes.
proc.terminate()
proc.wait(timeout=10)
assert mgr.is_waiting() is False # lazy auto-clear
assert mgr.state.waiting_on_pid is None
with patch.object(goals, "judge_goal", return_value=("continue", "more", False, None)):
decision = mgr.evaluate_after_turn("process finished, here are results")
assert decision["verdict"] == "continue"
assert decision["should_continue"] is True
assert mgr.state.turns_used == 1 # now a turn IS consumed
def test_dead_pid_never_parks(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="wb-dead")
mgr.set("g", max_turns=5)
mgr.wait_on(self._dead_pid(), reason="already-dead")
# is_waiting clears the stale barrier immediately.
assert mgr.is_waiting() is False
with patch.object(goals, "judge_goal", return_value=("continue", "go", False, None)):
decision = mgr.evaluate_after_turn("response")
assert decision["should_continue"] is True
def test_stop_waiting_clears_barrier(self, hermes_home):
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
try:
mgr = GoalManager(session_id="wb-stop")
mgr.set("g")
mgr.wait_on(proc.pid)
assert mgr.is_waiting() is True
assert mgr.stop_waiting() is True
assert mgr.state.waiting_on_pid is None
assert mgr.is_waiting() is False
assert mgr.stop_waiting() is False # idempotent
finally:
proc.terminate()
proc.wait(timeout=10)
def test_pause_and_resume_clear_barrier(self, hermes_home):
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
try:
mgr = GoalManager(session_id="wb-pause")
mgr.set("g")
mgr.wait_on(proc.pid)
mgr.pause()
assert mgr.state.waiting_on_pid is None
mgr.resume()
assert mgr.state.waiting_on_pid is None
finally:
proc.terminate()
proc.wait(timeout=10)
def test_barrier_persists_and_reloads(self, hermes_home):
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
try:
mgr = GoalManager(session_id="wb-persist")
mgr.set("g")
mgr.wait_on(proc.pid, reason="deploy")
# Fresh manager loads the persisted barrier.
mgr2 = GoalManager(session_id="wb-persist")
assert mgr2.state.waiting_on_pid == proc.pid
assert mgr2.state.waiting_reason == "deploy"
assert mgr2.is_waiting() is True
finally:
proc.terminate()
proc.wait(timeout=10)
def test_old_state_row_loads_without_barrier_fields(self, hermes_home):
"""Backwards-compat: a state_meta row written before the barrier
existed must load with no barrier."""
from hermes_cli.goals import GoalState
legacy = json.dumps({
"goal": "old goal",
"status": "active",
"turns_used": 2,
"max_turns": 20,
})
st = GoalState.from_json(legacy)
assert st.goal == "old goal"
assert st.waiting_on_pid is None
assert st.waiting_reason is None
assert st.waiting_since == 0.0
assert st.waiting_until == 0.0
# ──────────────────────────────────────────────────────────────────────
# Judge-driven auto-wait — the judge parks the loop on its own
# ──────────────────────────────────────────────────────────────────────
class TestJudgeDrivenWait:
"""The judge returns a `wait` verdict (given live background-process
context) and the loop parks automatically no manual /goal wait."""
@staticmethod
def _spawn_sleeper():
import subprocess, sys
return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
def test_judge_wait_pid_parks_loop(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
proc = self._spawn_sleeper()
try:
mgr = GoalManager(session_id="jw-pid", default_max_turns=10)
mgr.set("ship the PR")
# Judge sees the running process and says wait-on-pid.
with patch.object(
goals, "judge_goal",
return_value=("wait", "CI watcher still running", False, {"pid": proc.pid}),
):
decision = mgr.evaluate_after_turn(
"Pushed the PR, watching CI.",
background_processes=[{
"pid": proc.pid, "command": "wait_for_pr_green.sh",
"status": "running", "uptime_seconds": 12,
}],
)
assert decision["verdict"] == "wait"
assert decision["should_continue"] is False
assert decision["continuation_prompt"] is None
assert mgr.state.waiting_on_pid == proc.pid
assert mgr.is_waiting() is True
# Next turn while still parked: judge must NOT be called again.
judge = MagicMock()
with patch.object(goals, "judge_goal", judge):
d2 = mgr.evaluate_after_turn("still going")
judge.assert_not_called()
assert d2["verdict"] == "waiting"
assert d2["should_continue"] is False
finally:
proc.terminate()
proc.wait(timeout=10)
def test_judge_wait_seconds_parks_loop(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="jw-secs", default_max_turns=10)
mgr.set("retry after backoff")
with patch.object(
goals, "judge_goal",
return_value=("wait", "rate limited", False, {"seconds": 120}),
):
decision = mgr.evaluate_after_turn("Hit a 429, backing off.")
assert decision["verdict"] == "wait"
assert decision["should_continue"] is False
assert mgr.state.waiting_until > 0
assert mgr.state.waiting_on_pid is None
assert mgr.is_waiting() is True
def test_time_barrier_clears_after_deadline(self, hermes_home):
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="jw-deadline")
mgr.set("g")
mgr.wait_for_seconds(120, reason="backoff")
assert mgr.is_waiting() is True
# Force the deadline into the past → barrier auto-clears.
mgr.state.waiting_until = time.time() - 1
assert mgr.is_waiting() is False
assert mgr.state.waiting_until == 0.0
def test_continue_verdict_still_continues_with_background(self, hermes_home):
"""A running process present but judge says continue → normal loop."""
from hermes_cli import goals
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="jw-cont", default_max_turns=10)
mgr.set("do work")
with patch.object(
goals, "judge_goal",
return_value=("continue", "more to do", False, None),
):
decision = mgr.evaluate_after_turn(
"made progress",
background_processes=[{"pid": 999999, "command": "x", "status": "running"}],
)
assert decision["verdict"] == "continue"
assert decision["should_continue"] is True
assert mgr.state.waiting_on_pid is None
# ──────────────────────────────────────────────────────────────────────
# Session/trigger barrier — wait on a process's OWN trigger, not just exit
# ──────────────────────────────────────────────────────────────────────
class TestSessionTriggerBarrier:
"""The session barrier (wait_on_session) releases when a process's own
trigger fires a watch_patterns match mid-run (process may never exit)
OR exit not only on PID exit. CI-safe: uses synthetic registry session
objects, no real child processes."""
@staticmethod
def _inject(sid, *, watch_patterns=None, exited=False):
import time as _t
from tools.process_registry import process_registry, ProcessSession
s = ProcessSession(id=sid, command="watcher.sh", task_id="t",
session_key="", cwd="/tmp", started_at=_t.time())
if watch_patterns:
s.watch_patterns = list(watch_patterns)
s.exited = exited
if exited:
process_registry._finished[sid] = s
else:
process_registry._running[sid] = s
return s, process_registry
def test_registry_is_session_waiting_running_unmatched(self, hermes_home):
s, reg = self._inject("proc_t1", watch_patterns=["READY"])
assert reg.is_session_waiting("proc_t1") is True
def test_registry_releases_on_watch_match_while_alive(self, hermes_home):
s, reg = self._inject("proc_t2", watch_patterns=["READY"])
assert reg.is_session_waiting("proc_t2") is True
s._watch_hits = 1 # what _check_watch_patterns sets on a match
# Released even though the process is STILL running (never exited).
assert s.exited is False
assert reg.is_session_waiting("proc_t2") is False
def test_registry_releases_on_exit_plain_session(self, hermes_home):
s, reg = self._inject("proc_t3") # no watch pattern
assert reg.is_session_waiting("proc_t3") is True
s.exited = True
assert reg.is_session_waiting("proc_t3") is False
def test_registry_unknown_session_never_waits(self, hermes_home):
from tools.process_registry import process_registry
assert process_registry.is_session_waiting("proc_does_not_exist") is False
def test_goal_parks_on_session_and_releases_on_trigger(self, hermes_home):
from hermes_cli import goals
from hermes_cli.goals import GoalManager
s, reg = self._inject("proc_t4", watch_patterns=["BUILD SUCCESSFUL"])
mgr = GoalManager(session_id="st-goal", default_max_turns=10)
mgr.set("wait for the build to succeed")
with patch.object(
goals, "judge_goal",
return_value=("wait", "blocked on build", False, {"session_id": "proc_t4"}),
):
decision = mgr.evaluate_after_turn(
"Started the build watcher.",
background_processes=[{
"session_id": "proc_t4", "pid": 4242, "command": "watcher.sh",
"status": "running", "watch_patterns": ["BUILD SUCCESSFUL"],
"watch_hit": False,
}],
)
assert decision["verdict"] == "wait"
assert mgr.state.waiting_on_session == "proc_t4"
assert mgr.is_waiting() is True
# Judge must NOT be called again while parked.
judge = MagicMock()
with patch.object(goals, "judge_goal", judge):
d2 = mgr.evaluate_after_turn("still building")
judge.assert_not_called()
assert d2["should_continue"] is False
# Trigger fires mid-run (process still alive) → barrier releases.
s._watch_hits = 1
assert mgr.is_waiting() is False
assert mgr.state.waiting_on_session is None
# Loop resumes with a real judge verdict.
with patch.object(goals, "judge_goal",
return_value=("continue", "build done", False, None)):
d3 = mgr.evaluate_after_turn("build succeeded")
assert d3["should_continue"] is True
def test_wait_on_session_validation(self, hermes_home):
from hermes_cli.goals import GoalManager
mgr = GoalManager(session_id="st-val")
# No active goal → RuntimeError
try:
mgr.wait_on_session("proc_x")
assert False, "expected RuntimeError"
except RuntimeError:
pass
mgr.set("g")
try:
mgr.wait_on_session("")
assert False, "expected ValueError"
except ValueError:
pass
def test_session_directive_parsed_from_judge(self, hermes_home):
from hermes_cli.goals import _parse_judge_response
v, _, pf, wd = _parse_judge_response(
'{"verdict": "wait", "wait_on_session": "proc_abc", "reason": "r"}'
)
assert v == "wait"
assert pf is False
assert wd == {"session_id": "proc_abc"}
def test_old_state_loads_without_session_field(self, hermes_home):
from hermes_cli.goals import GoalState
st = GoalState.from_json(json.dumps({
"goal": "g", "status": "active", "turns_used": 0, "max_turns": 20,
}))
assert st.waiting_on_session is None

View file

@ -179,9 +179,10 @@ def _patch_judge(monkeypatch, verdicts):
"""Make judge_goal return a scripted sequence of verdicts."""
seq = list(verdicts)
def _fake_judge(goal, response, subgoals=None):
def _fake_judge(goal, response, subgoals=None, background_processes=None, **_kw):
v = seq.pop(0) if seq else "done"
return v, f"scripted:{v}", False
# 4-tuple contract: (verdict, reason, parse_failed, wait_directive)
return v, f"scripted:{v}", False, None
monkeypatch.setattr(goals, "judge_goal", _fake_judge)

View file

@ -1055,6 +1055,42 @@ class ProcessRegistry:
"""Check if a completion notification was already consumed via wait/log."""
return session_id in self._completion_consumed
def is_session_waiting(self, session_id: str) -> bool:
"""Whether a goal loop parked on this session should still be parked.
Used by the goal-loop wait barrier (``hermes_cli.goals``) to support
waiting on a process's OWN trigger, not just its exit. A session is
"still waiting" when:
- it is still running, AND
- if it has ``watch_patterns``, none has matched yet (so a
long-lived watcher that fires a trigger mid-run and may never
exit unblocks the moment its pattern hits, not on exit).
Returns False (don't wait) when the session has exited, its watch
pattern has already fired, or the session is unknown so a stale or
already-triggered barrier can never wedge the loop.
"""
if not session_id:
return False
with self._lock:
session = self._running.get(session_id) or self._finished.get(session_id)
if session is None:
return False
# Refresh detached/remote state so .exited is current.
try:
self._refresh_detached_session(session)
except Exception:
pass
if session.exited:
return False
# Watch-pattern process: the trigger is a pattern match, not exit.
# Once any match has been delivered, the wait is satisfied even though
# the process keeps running (server/daemon/watcher case).
if session.watch_patterns and not session._watch_disabled:
if session._watch_hits > 0:
return False
return True
def _drain_should_skip(self, session_id: str) -> bool:
"""Whether the CLI drain should skip a completion event for this session.
@ -1500,6 +1536,14 @@ class ProcessRegistry:
"status": "exited" if s.exited else "running",
"output_preview": s.output_buffer[-200:] if s.output_buffer else "",
}
# Trigger metadata so a goal-loop judge can decide to wait on this
# process's OWN signal (a watch-pattern match or completion), not
# just its exit. A watcher with watch_patterns may never exit.
if s.watch_patterns and not s._watch_disabled:
entry["watch_patterns"] = list(s.watch_patterns)
entry["watch_hit"] = s._watch_hits > 0
if s.notify_on_complete:
entry["notify_on_complete"] = True
if s.exited:
entry["exit_code"] = s.exit_code
if s.detached:

View file

@ -6716,9 +6716,15 @@ def _run_prompt_submit(rid, sid: str, session: dict, text: Any) -> None:
default_max_turns=goal_max_turns,
)
if goal_mgr.is_active():
try:
from hermes_cli.goals import gather_background_processes as _gather_bg
_bg_procs = _gather_bg()
except Exception:
_bg_procs = None
decision = goal_mgr.evaluate_after_turn(
raw,
user_initiated=True,
background_processes=_bg_procs,
)
verdict_msg = decision.get("message") or ""
if verdict_msg:

View file

@ -44,6 +44,8 @@ What you'll see:
| `/goal pause` | Stop the auto-continuation loop without clearing the goal. |
| `/goal resume` | Resume the loop (resets the turn counter back to zero). |
| `/goal clear` | Drop the goal entirely. |
| `/goal wait <pid> [reason]` | Park the loop on a background process — it stops re-poking the agent every turn while the process runs, and auto-resumes when it exits. |
| `/goal unwait` | Drop the wait barrier and resume the loop immediately. |
Works identically on the CLI and every gateway platform (Telegram, Discord, Slack, Matrix, Signal, WhatsApp, SMS, iMessage, Webhook, API server, and the web dashboard).
@ -62,6 +64,29 @@ Subgoals are persisted alongside the goal in `SessionDB.state_meta`, so they sur
Use this when you start a loop ("fix the failing tests") and notice partway through that you also want it to "and add a regression test for the bug you just patched" — `/subgoal add a regression test` tightens the success criteria without breaking the running loop.
## Parking on a background process: automatic, with a manual override
Some goals are gated on something that takes minutes and runs on its own — CI on a pushed PR, a long build, a test matrix, a deploy, a rate-limit cooldown. Without help, the goal loop would re-poke the agent every turn into "is it done yet?" busy-work while it waits.
**This is handled automatically.** Every turn, the judge is shown the agent's live background processes (the `terminal(background=true)` registry — pid, session id, command, uptime, recent output, and any `watch_patterns` / `notify_on_complete` trigger) alongside the goal and the agent's response. When the agent's progress is genuinely gated on one of them, the judge returns a **`wait`** verdict instead of `continue`, and the loop **parks**: the next turns are skipped (no judge call, no continuation, no turn consumed) until the wait is satisfied — then it resumes normally with the result in hand. The judge can also park on a **time** basis (`wait_for_seconds`) for backoff/cooldown waits. `/goal status` shows `⏳ Goal (parked …)` while parked.
The judge picks the right kind of wait from the process's own signal:
- **`wait_on_session <id>`** — releases when the process's *own trigger* fires: it exits, **or** (if it was started with `watch_patterns`) its pattern matches. This is the one for a long-lived watcher / server / poller that signals **mid-run** (e.g. a build process that prints `BUILD SUCCESSFUL` and keeps running, or a `notify_on_complete` watcher) and may never exit on its own.
- **`wait_on_pid <pid>`** — releases on process exit only.
- **`wait_for_seconds <n>`** — releases after a fixed delay.
You don't type anything for this — it's the judge's decision, made from the process context the loop hands it. The manual commands exist as an override:
| Command | What it does |
|---|---|
| `/goal wait <pid> [reason]` | Manually park the loop until the process with that PID exits. |
| `/goal unwait` | Clear any wait barrier (judge- or manually-set) and resume immediately. |
The barrier (pid- or time-based) is persisted with the goal in `SessionDB.state_meta`, so it survives `/resume`. `/goal pause`, `/goal resume`, and `/goal clear` all drop it. If the PID is already dead when the barrier is set (or dies while parked), or the time deadline passes, the barrier clears on the next check — a stale barrier can never wedge the loop.
Typical flow: the agent pushes a PR, starts a CI watcher with `terminal(background=true, notify_on_complete=true)`, and reports "watching CI." The judge sees the watcher process still running, returns `wait` on its pid, and the loop goes quiet — then picks back up the instant CI finishes and judges the goal against the actual result.
## Behavior details
### The judge
@ -94,7 +119,7 @@ Any real message you send while a goal is active takes priority over the continu
### Mid-run safety (gateway)
While an agent is already running, `/goal status`, `/goal pause`, and `/goal clear` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
While an agent is already running, `/goal status`, `/goal pause`, `/goal clear`, `/goal wait`, and `/goal unwait` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal <new text>`) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
### Persistence