feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)

* feat(goals): /subgoal — user-added criteria appended to active /goal Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging — the existing judge model just sees a richer prompt. Forms: /subgoal show current subgoals /subgoal <text> append a criterion /subgoal remove <n> drop subgoal n (1-based) /subgoal clear wipe all subgoals How it integrates: - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows. - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?' - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target. - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary — no race with the running turn. - Status line shows '... , N subgoals' when present. Surface: - hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave - hermes_cli/commands.py — /subgoal CommandDef - cli.py — _handle_subgoal_command - gateway/run.py — _handle_subgoal_command + mid-run dispatch - tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering) 77 goal-related tests passing across goals + cli + gateway + tui. * fix(goals): slash commands don't preempt the goal-continuation hook Two findings from live-testing /subgoal: 1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped — but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present. 2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.' Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited.
2026-05-18 04:41:56 +00:00 · 2026-05-13 22:55:09 -07:00 · 2026-05-13 22:55:09 -07:00 · 8f19078c6a
commit 8f19078c6a
parent d110ce4493
5 changed files with 531 additions and 14 deletions
--- a/cli.py
+++ b/cli.py
@ -7647,6 +7647,8 @@ class HermesCLI:
                _cprint(f"  No agent running; queued as next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
        elif canonical == "goal":
            self._handle_goal_command(cmd_original)
+        elif canonical == "subgoal":
+            self._handle_subgoal_command(cmd_original)
        elif canonical == "skin":
            self._handle_skin_command(cmd_original)
        elif canonical == "voice":
@ -8245,6 +8247,81 @@ class HermesCLI:
        except Exception:
            pass

+    def _handle_subgoal_command(self, cmd: str) -> None:
+        """Dispatch /subgoal subcommands.
+
+        Forms:
+          /subgoal                              show current subgoals
+          /subgoal <text>                       append a criterion
+          /subgoal remove <n>                   drop subgoal n (1-based)
+          /subgoal clear                        wipe all subgoals
+
+        Subgoals are extra criteria the user adds mid-loop. They get
+        appended to both the judge prompt (verdict must consider them)
+        and the continuation prompt (agent sees them) on the next turn
+        boundary. No special kick — the running turn finishes, the next
+        judge call includes them.
+        """
+        parts = (cmd or "").strip().split(None, 2)
+        arg = " ".join(parts[1:]).strip() if len(parts) > 1 else ""
+
+        mgr = self._get_goal_manager()
+        if mgr is None:
+            _cprint(f"  {_DIM}Goals unavailable (no active session).{_RST}")
+            return
+
+        if not mgr.has_goal():
+            _cprint(f"  {_DIM}No active goal. Set one with /goal <text>.{_RST}")
+            return
+
+        # No args → list current subgoals.
+        if not arg:
+            _cprint(f"  {mgr.status_line()}")
+            _cprint(f"  {mgr.render_subgoals()}")
+            return
+
+        tokens = arg.split(None, 1)
+        verb = tokens[0].lower()
+        rest = tokens[1].strip() if len(tokens) > 1 else ""
+
+        if verb == "remove":
+            if not rest:
+                _cprint("  Usage: /subgoal remove <n>")
+                return
+            try:
+                idx = int(rest.split()[0])
+            except ValueError:
+                _cprint("  /subgoal remove: <n> must be an integer (1-based index).")
+                return
+            try:
+                removed = mgr.remove_subgoal(idx)
+            except (IndexError, RuntimeError) as exc:
+                _cprint(f"  /subgoal remove: {exc}")
+                return
+            _cprint(f"  ✓ Removed subgoal {idx}: {removed}")
+            return
+
+        if verb == "clear":
+            try:
+                prev = mgr.clear_subgoals()
+            except RuntimeError as exc:
+                _cprint(f"  /subgoal clear: {exc}")
+                return
+            if prev:
+                _cprint(f"  ✓ Cleared {prev} subgoal{'s' if prev != 1 else ''}.")
+            else:
+                _cprint(f"  {_DIM}No subgoals to clear.{_RST}")
+            return
+
+        # Otherwise — append the whole arg as a new subgoal.
+        try:
+            text = mgr.add_subgoal(arg)
+        except (ValueError, RuntimeError) as exc:
+            _cprint(f"  /subgoal: {exc}")
+            return
+        idx = len(mgr.state.subgoals) if mgr.state else 0
+        _cprint(f"  ✓ Added subgoal {idx}: {text}")
+
    def _maybe_continue_goal_after_turn(self) -> None:
        """Hook run after every CLI turn. Judges + maybe re-queues.

@ -8271,10 +8348,36 @@ class HermesCLI:

        # If a real user message is already queued, don't inject a
        # continuation prompt on top — let the user's turn go first.
+        # Slash commands don't count as "real user messages" for this
+        # check: they're inspection/mutation (e.g. /subgoal added mid-
+        # run) and the process_loop dispatches them via process_command,
+        # not via chat(). If we treat a queued /subgoal as preempting,
+        # the goal loop silently stalls — we'd return here, then the
+        # slash command consumes its queue slot via process_command()
+        # which never re-fires the goal hook. Peek at all queued entries
+        # and only defer when there's a non-slash payload.
        try:
-            if getattr(self, "_pending_input", None) is not None \
-                    and not self._pending_input.empty():
-                return
+            pending = getattr(self, "_pending_input", None)
+            if pending is not None and not pending.empty():
+                has_real_message = False
+                try:
+                    # Queue.queue is the underlying deque — direct peek
+                    # without disturbing FIFO order.
+                    for entry in list(pending.queue):
+                        # Bundled payloads are (text, images) tuples;
+                        # unpack for inspection.
+                        if isinstance(entry, tuple) and entry:
+                            entry = entry[0]
+                        if isinstance(entry, str) and _looks_like_slash_command(entry):
+                            continue
+                        has_real_message = True
+                        break
+                except Exception:
+                    # Fallback: if we can't introspect the queue, behave
+                    # like the old check and defer to be safe.
+                    has_real_message = True
+                if has_real_message:
+                    return
        except Exception:
            pass