feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)

* feat(goals): /subgoal: user-added criteria appended to active /goal

  Layers a /subgoal command on top of the existing freeform Ralph judge loop. The user can append extra criteria mid-loop; the judge factors them into its done/continue verdict and the continuation prompt surfaces them to the agent. No new tool, no agent self-judging: the existing judge model just sees a richer prompt.

  Forms:
    /subgoal               show current subgoals
    /subgoal <text>        append a criterion
    /subgoal remove <n>    drop subgoal n (1-based)
    /subgoal clear         wipe all subgoals

  How it integrates:
  - GoalState gains `subgoals: List[str]` (default []), backwards-compat for existing state_meta rows.
  - judge_goal accepts an optional subgoals kwarg; non-empty switches to JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as numbered criteria and asks 'is the goal AND every additional criterion satisfied?'
  - next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE when non-empty so the agent sees what to target.
  - /subgoal is allowed mid-run on the gateway since it only touches the state the judge reads at turn boundary, so there is no race with the running turn.
  - Status line shows '... , N subgoals' when present.

  Surface:
  - hermes_cli/goals.py: field, prompt blocks, manager methods, judge weave
  - hermes_cli/commands.py: /subgoal CommandDef
  - cli.py: _handle_subgoal_command
  - gateway/run.py: _handle_subgoal_command + mid-run dispatch
  - tests/hermes_cli/test_goals.py: 15 new tests (backcompat, mutation, persistence, prompt template selection, judge-prompt content via mock, status-line rendering)

  77 goal-related tests passing across goals + cli + gateway + tui.

* fix(goals): slash commands don't preempt the goal-continuation hook

  Two findings from live-testing /subgoal:

  1. Slash commands queued while the agent is running landed in _pending_input (same queue as real user messages). The goal hook's 'is a real user message pending?' check returned True and silently skipped, but the slash command consumes its queue slot via process_command() which never re-fires the goal hook, so the loop stalls indefinitely. Now the hook peeks the queue and only defers when a non-slash payload is present.

  2. The with-subgoals judge prompt was too soft: opus 4.7 said 'done, implying all requirements met' without verifying. Tightened to demand specific per-criterion evidence (file contents, output line, command result) and explicitly reject phrases like 'implying it was done.'

  Live verified: /subgoal injected mid-loop now correctly forces the judge to refuse done until the new criterion is met. Agent gets the continuation prompt with subgoals listed, updates the script, judge confirms done with specific evidence cited.
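
The queue-peek fix described in finding 1 is not part of the test diff shown here, so the following is only a minimal sketch of the idea, not the actual hook code. The names `is_slash_command` and `should_defer_goal_hook` are hypothetical, and the sketch assumes the pending queue can be inspected as a `deque` of strings; the real `_pending_input` may be a different container.

```python
from collections import deque
from typing import Deque


def is_slash_command(payload: str) -> bool:
    """Hypothetical helper: treat any payload starting with '/' as a slash command."""
    return payload.lstrip().startswith("/")


def should_defer_goal_hook(pending: Deque[str]) -> bool:
    """Return True only when a real (non-slash) user message is waiting.

    Iterating the deque peeks at its contents without consuming any entry,
    so queued slash commands are still handled by the normal command path.
    """
    return any(not is_slash_command(item) for item in pending)


# A queued /subgoal alone no longer blocks the continuation hook.
only_command = deque(["/subgoal write tests"])
mixed = deque(["/subgoal write tests", "actually, stop and explain"])
```

With this shape, `should_defer_goal_hook(only_command)` is false and the loop continues, while `mixed` still defers to the real user message. One limitation of the prefix check as sketched: an ordinary message that happens to begin with `/` (a file path, say) would be classified as a command.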
2026-05-18 04:41:56 +00:00 · 2026-05-13 22:55:09 -07:00 · 2026-05-13 22:55:09 -07:00 · 8f19078c6a
commit 8f19078c6a
parent d110ce4493
5 changed files with 531 additions and 14 deletions
--- a/tests/hermes_cli/test_goals.py
+++ b/tests/hermes_cli/test_goals.py
@@ -514,3 +514,227 @@ class TestJudgeParseFailureAutoPause:
        reloaded = load_goal("parse-fail-sid-4")
        assert reloaded is not None
        assert reloaded.consecutive_parse_failures == 2
+
+
+# ──────────────────────────────────────────────────────────────────────
+# /subgoal — user-added criteria
+# ──────────────────────────────────────────────────────────────────────
+
+
+class TestGoalStateSubgoalsBackcompat:
+    def test_old_state_meta_row_loads_without_subgoals(self):
+        """A goal serialized BEFORE the subgoals field existed must
+        round-trip with an empty list, not crash."""
+        import json
+        from hermes_cli.goals import GoalState
+
+        legacy = json.dumps({
+            "goal": "do a thing",
+            "status": "active",
+            "turns_used": 2,
+            "max_turns": 20,
+            "created_at": 1.0,
+            "last_turn_at": 2.0,
+            "consecutive_parse_failures": 0,
+        })
+        state = GoalState.from_json(legacy)
+        assert state.goal == "do a thing"
+        assert state.subgoals == []
+
+    def test_subgoals_round_trip(self):
+        from hermes_cli.goals import GoalState
+        state = GoalState(goal="g", subgoals=["a", "b", "c"])
+        rt = GoalState.from_json(state.to_json())
+        assert rt.subgoals == ["a", "b", "c"]
+
+
+class TestGoalManagerSubgoals:
+    def test_add_subgoal(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-add")
+        mgr.set("main goal")
+        text = mgr.add_subgoal("  use bullet points  ")
+        assert text == "use bullet points"
+        assert mgr.state.subgoals == ["use bullet points"]
+
+    def test_add_subgoal_requires_active_goal(self, hermes_home):
+        import pytest
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-noactive")
+        with pytest.raises(RuntimeError):
+            mgr.add_subgoal("oops")
+
+    def test_add_empty_subgoal_rejected(self, hermes_home):
+        import pytest
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-empty")
+        mgr.set("g")
+        with pytest.raises(ValueError):
+            mgr.add_subgoal("   ")
+
+    def test_remove_subgoal(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-remove")
+        mgr.set("g")
+        mgr.add_subgoal("first")
+        mgr.add_subgoal("second")
+        mgr.add_subgoal("third")
+        removed = mgr.remove_subgoal(2)
+        assert removed == "second"
+        assert mgr.state.subgoals == ["first", "third"]
+
+    def test_remove_subgoal_out_of_range(self, hermes_home):
+        import pytest
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-oob")
+        mgr.set("g")
+        mgr.add_subgoal("only")
+        with pytest.raises(IndexError):
+            mgr.remove_subgoal(5)
+        with pytest.raises(IndexError):
+            mgr.remove_subgoal(0)
+
+    def test_clear_subgoals(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-clear")
+        mgr.set("g")
+        mgr.add_subgoal("a")
+        mgr.add_subgoal("b")
+        prev = mgr.clear_subgoals()
+        assert prev == 2
+        assert mgr.state.subgoals == []
+
+    def test_subgoals_persist_across_reloads(self, hermes_home):
+        """Subgoals stored in SessionDB survive a fresh GoalManager."""
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sub-persist")
+        mgr.set("g")
+        mgr.add_subgoal("first")
+        mgr.add_subgoal("second")
+
+        mgr2 = GoalManager(session_id="sub-persist")
+        assert mgr2.state.subgoals == ["first", "second"]
+
+
+class TestContinuationPromptWithSubgoals:
+    def test_empty_subgoals_uses_original_template(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="cp-empty")
+        mgr.set("ship the feature")
+        prompt = mgr.next_continuation_prompt()
+        assert prompt is not None
+        assert "ship the feature" in prompt
+        assert "Additional criteria" not in prompt
+
+    def test_with_subgoals_includes_them(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="cp-with")
+        mgr.set("ship the feature")
+        mgr.add_subgoal("write tests")
+        mgr.add_subgoal("update docs")
+        prompt = mgr.next_continuation_prompt()
+        assert prompt is not None
+        assert "ship the feature" in prompt
+        assert "Additional criteria" in prompt
+        assert "1. write tests" in prompt
+        assert "2. update docs" in prompt
+
+
+class TestJudgeGoalWithSubgoals:
+    def test_judge_uses_subgoals_template_when_provided(self, hermes_home):
+        """judge_goal switches templates when subgoals is non-empty.
+
+        We don't actually call the model — we patch the aux client to
+        capture the prompt that would be sent.
+        """
+        from unittest.mock import patch
+        from hermes_cli import goals
+
+        captured = {}
+
+        class _FakeMsg:
+            content = '{"done": true, "reason": "all done"}'
+        class _FakeChoice:
+            message = _FakeMsg()
+        class _FakeResp:
+            choices = [_FakeChoice()]
+        class _FakeClient:
+            class chat:
+                class completions:
+                    @staticmethod
+                    def create(**kwargs):
+                        captured.update(kwargs)
+                        return _FakeResp()
+
+        with patch.object(goals, "get_text_auxiliary_client",
+                          return_value=(_FakeClient, "fake-model"), create=True), \
+             patch.object(goals, "get_auxiliary_extra_body",
+                          return_value=None, create=True), \
+             patch("agent.auxiliary_client.get_text_auxiliary_client",
+                   return_value=(_FakeClient, "fake-model")), \
+             patch("agent.auxiliary_client.get_auxiliary_extra_body",
+                   return_value=None):
+            verdict, reason, parse_failed = goals.judge_goal(
+                "ship the feature",
+                "ok shipped",
+                subgoals=["write tests", "update docs"],
+            )
+
+        # The aux client was called with a prompt that includes the subgoals.
+        sent_messages = captured.get("messages") or []
+        user_msg = next((m["content"] for m in sent_messages if m["role"] == "user"), "")
+        assert "Additional criteria" in user_msg
+        assert "1. write tests" in user_msg
+        assert "2. update docs" in user_msg
+        assert "every additional criterion" in user_msg
+        assert verdict == "done"
+
+    def test_judge_uses_original_template_when_no_subgoals(self, hermes_home):
+        from unittest.mock import patch
+        from hermes_cli import goals
+
+        captured = {}
+
+        class _FakeMsg:
+            content = '{"done": true, "reason": "ok"}'
+        class _FakeChoice:
+            message = _FakeMsg()
+        class _FakeResp:
+            choices = [_FakeChoice()]
+        class _FakeClient:
+            class chat:
+                class completions:
+                    @staticmethod
+                    def create(**kwargs):
+                        captured.update(kwargs)
+                        return _FakeResp()
+
+        with patch("agent.auxiliary_client.get_text_auxiliary_client",
+                   return_value=(_FakeClient, "fake-model")), \
+             patch("agent.auxiliary_client.get_auxiliary_extra_body",
+                   return_value=None):
+            goals.judge_goal("ship it", "done", subgoals=None)
+
+        sent_messages = captured.get("messages") or []
+        user_msg = next((m["content"] for m in sent_messages if m["role"] == "user"), "")
+        assert "Additional criteria" not in user_msg
+        assert "ship it" in user_msg
+
+
+class TestStatusLineSubgoalCount:
+    def test_status_line_no_subgoals(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sl-empty")
+        mgr.set("ship it")
+        line = mgr.status_line()
+        assert "ship it" in line
+        assert "subgoal" not in line.lower()
+
+    def test_status_line_with_subgoals(self, hermes_home):
+        from hermes_cli.goals import GoalManager
+        mgr = GoalManager(session_id="sl-with")
+        mgr.set("ship it")
+        mgr.add_subgoal("a")
+        mgr.add_subgoal("b")
+        line = mgr.status_line()
+        assert "2 subgoals" in line
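
The tests above pin down a small contract: legacy rows load with an empty list, added text is stripped, removal is 1-based with both 0 and out-of-range rejected, and clearing returns the previous count. As a rough illustration of code that would satisfy that contract, here is a standalone sketch. `SubgoalState` and `render_criteria` are hypothetical stand-ins, not the real `GoalState` or `GoalManager`, and persistence to SessionDB is left out entirely.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SubgoalState:
    """Hypothetical stand-in for GoalState, reduced to the subgoal behaviour."""

    goal: str
    subgoals: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "SubgoalState":
        data = json.loads(raw)
        # Rows written before the field existed have no "subgoals" key.
        return cls(goal=data["goal"], subgoals=list(data.get("subgoals") or []))

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    def add_subgoal(self, text: str) -> str:
        cleaned = text.strip()
        if not cleaned:
            raise ValueError("subgoal text is empty")
        self.subgoals.append(cleaned)
        return cleaned

    def remove_subgoal(self, n: int) -> str:
        # n is 1-based. Checking the lower bound matters: without it,
        # n == 0 would become pop(-1) and silently drop the last subgoal.
        if not 1 <= n <= len(self.subgoals):
            raise IndexError(f"no subgoal {n}")
        return self.subgoals.pop(n - 1)

    def clear_subgoals(self) -> int:
        count = len(self.subgoals)
        self.subgoals.clear()
        return count


def render_criteria(subgoals: List[str]) -> str:
    """Number the criteria the way the prompt assertions expect ('1. ...')."""
    return "\n".join(f"{i}. {text}" for i, text in enumerate(subgoals, start=1))
```

The explicit lower-bound check in `remove_subgoal` is the design point worth noting: `test_remove_subgoal_out_of_range` expects `IndexError` for 0, which a plain `pop(n - 1)` would not raise.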