fix(terminal): warn at call time when background=true runs silently (#31289)

`terminal(background=true)` without `notify_on_complete=true` or
`watch_patterns` runs the process SILENTLY — the agent has no way
to learn it finished short of calling `process(action='poll')`
explicitly. That's correct for genuine long-lived processes (servers,
watchers, daemons) but is a footgun for every bounded task (tests,
builds, deploys, CI pollers, batch jobs), which is the vast majority
of background uses.

Hit on May 23, 2026 (PR #31231 incident): the agent launched a
CI-watch loop with `background=true` only. The poller ran fine and
exited green 6 minutes later, but the agent never noticed. The user
had to surface 'we are green CI, you can merge.' Memory and skill
docs said *what* to do (poll in background) but not *how* to receive
the result. The `notify_on_complete=true` flag exists and works, but
is easy to forget when `background=true` seems sufficient on its own.

Two changes here, mutually reinforcing:

1. Runtime nudge: the tool result for `background=true` without
   `notify_on_complete` or `watch_patterns` now includes a `hint`
   field explaining the silent-process failure mode and pointing at
   the corrective flag. The agent sees it on the same turn and can
   self-correct without needing the user to surface anything. Cost
   for legitimate server cases is one ignored read (~50 tokens); cost
   for forgot-notify cases is prevented blindness (potentially many
   turns, or a user nudge). A false positive is far cheaper than a
   false negative.

2. Schema/description rewrite: top-level TERMINAL_TOOL_DESCRIPTION
   and the `background` field description now lead with 'Almost
   always pair with notify_on_complete=true' instead of presenting
   it as one of two equally-likely patterns. The two legitimate
   non-notify shapes (long-lived servers; watch_patterns mid-process
   signals) are still documented, but as the minority case.
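
As an illustration of the caller side of change 1, here is a minimal
sketch of how an agent loop might react to the `hint` field. The helper
name `follow_up_for`, the `bounded` flag, the shape of the returned
action dict, and the idea that `process` is addressed by `session_id`
are assumptions made for this sketch; only the `hint` field, the
`session_id`, and `process(action='wait')` are named in this commit.

```python
from typing import Any, Dict, Optional


def follow_up_for(result: Dict[str, Any], bounded: bool) -> Optional[Dict[str, Any]]:
    """Pick a follow-up for a terminal() result (hypothetical helper).

    Returns None when nothing needs doing, otherwise a description of
    the next tool call. The dict shape is an assumption of this sketch.
    """
    if "hint" not in result:
        # Foreground, notify_on_complete, or watch_patterns: nothing to fix.
        return None
    if not bounded:
        # Genuine long-lived process (server, watcher, daemon): ignore the hint.
        return None
    # Bounded task launched silently: wait on it so the exit is observed.
    return {
        "tool": "process",
        "action": "wait",
        "session_id": result.get("session_id"),
    }


if __name__ == "__main__":
    silent = {"session_id": "abc123", "hint": "runs SILENTLY"}
    print(follow_up_for(silent, bounded=True))
    print(follow_up_for(silent, bounded=False))
```

Re-launching with `notify_on_complete=true` is the other corrective
path the hint offers; waiting is shown here only because it needs no
second launch.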

Tests cover all four shapes: bg-only emits hint, bg+notify doesn't,
bg+watch_patterns doesn't, foreground doesn't. 4 new tests; full
suite of background/process tests stays green (160/160 across the
relevant 6 test files).
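
The four shapes can be sketched as a table-driven check. This is not the
test code from the commit: `wants_silent_hint` is a hypothetical
standalone mirror of the condition added to `terminal_tool`, and the
`watch_patterns` value (a list of strings) is an assumption about its
type.

```python
from typing import Any, Dict, List, Optional, Tuple


def wants_silent_hint(
    background: bool,
    notify_on_complete: bool = False,
    watch_patterns: Optional[List[str]] = None,
) -> bool:
    """Standalone mirror of the hint condition (hypothetical helper)."""
    return bool(background and not notify_on_complete and not watch_patterns)


# (label, call arguments, hint expected?) for the four shapes listed above.
CASES: List[Tuple[str, Dict[str, Any], bool]] = [
    ("bg-only", {"background": True}, True),
    ("bg+notify", {"background": True, "notify_on_complete": True}, False),
    ("bg+watch_patterns", {"background": True, "watch_patterns": ["ERROR"]}, False),
    ("foreground", {"background": False}, False),
]


def failing_cases() -> List[str]:
    """Return the labels of any shapes whose outcome differs from the table."""
    return [
        label
        for label, kwargs, expected in CASES
        if wants_silent_hint(**kwargs) != expected
    ]


if __name__ == "__main__":
    print(failing_cases())
```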
Teknium 2026-05-23 21:02:14 -07:00 committed by GitHub
parent 39b8d1d313
commit d97c324473
2 changed files with 185 additions and 4 deletions

@@ -904,9 +904,9 @@ Do NOT use echo/cat heredoc to create files — use write_file instead.
 Reserve terminal for: builds, installs, git, processes, scripts, network, package managers, and anything that needs a shell.
 Foreground (default): Commands return INSTANTLY when done, even if the timeout is high. Set timeout=300 for long builds/scripts you'll still get the result in seconds if it's fast. Prefer foreground for short commands.
-Background: Set background=true to get a session_id. Two patterns:
-(1) Long-lived processes that never exit (servers, watchers).
-(2) Long-running tasks with notify_on_complete=true you can keep working on other things and the system auto-notifies you when the task finishes. Great for test suites, builds, deployments, or anything that takes more than a minute.
+Background: Set background=true to get a session_id. Almost always pair with notify_on_complete=true bg without notify runs SILENTLY and you have no way to learn it finished short of calling process(action='poll') yourself. Two legitimate uses:
+(1) Long-lived processes that never exit (servers, watchers, daemons) silent is correct, there's no exit to notify on.
+(2) Long-running bounded tasks (tests, builds, deploys, CI pollers, batch jobs) MUST set notify_on_complete=true. Without it you'll either forget to poll or sit blocked waiting for the user to surface the result.
 For servers/watchers, do NOT use shell-level background wrappers (nohup/disown/setsid/trailing '&') in foreground mode. Use background=true so Hermes can track lifecycle and output.
 After starting a server, verify readiness with a health check or log signal, then run tests in a separate terminal() call. Avoid blind sleep loops.
 Use process(action="poll") for progress checks, process(action="wait") to block until done.
@@ -1959,6 +1959,32 @@ def terminal_tool(
 if pty_disabled_reason:
     result_data["pty_note"] = pty_disabled_reason
+# Nudge: background=True without notify_on_complete=True OR
+# watch_patterns is a silent process. The agent has NO way to
+# learn it finished short of calling process(action="poll"/"wait")
+# explicitly. That's correct only for genuine long-lived
+# processes that never exit (servers, watchers). For every
+# bounded task (tests, builds, CI pollers, deploys, batch
+# jobs) the agent almost certainly wanted notification and
+# forgot the flag. May 2026 PR #31231 incident: bg CI poller
+# ran fine, exited green, agent never noticed — user had to
+# surface the result. Cheap nudge here costs ~one read for
+# server cases (false positive) and prevents silent
+# blindness for bounded-task cases (false negative).
+if background and not notify_on_complete and not watch_patterns:
+    result_data["hint"] = (
+        "background=true without notify_on_complete=true means "
+        "this process runs SILENTLY — you will not be told when "
+        "it exits. If this is a bounded task (test suite, build, "
+        "CI poller, deploy, anything with a defined end), you "
+        "almost certainly wanted notify_on_complete=true so the "
+        "system pings you on exit. Re-launch with "
+        "notify_on_complete=true, or call process(action='poll') "
+        "/ process(action='wait') yourself to learn the outcome. "
+        "Only ignore this hint for genuine long-lived processes "
+        "that never exit (servers, watchers, daemons)."
+    )
 # Populate routing metadata on the session so that
 # watch-pattern and completion notifications can be
 # routed back to the correct chat/thread.
@@ -2322,7 +2348,7 @@ TERMINAL_SCHEMA = {
 },
 "background": {
     "type": "boolean",
-    "description": "Run the command in the background. Two patterns: (1) Long-lived processes that never exit (servers, watchers). (2) Long-running tasks paired with notify_on_complete=true — you can keep working and get notified when the task finishes. For short commands, prefer foreground with a generous timeout instead.",
+    "description": "Run the command in the background. Almost always pair with notify_on_complete=true — without it, the process runs silently and you'll have no way to learn it finished short of calling process(action='poll') yourself (easy to forget, leading to silent blindness on long jobs). Two legitimate patterns: (1) Long-lived processes that never exit (servers, watchers, daemons) — these stay silent because there's no exit to notify on. (2) Long-running bounded tasks (tests, builds, deploys, CI pollers, batch jobs) — these MUST set notify_on_complete=true. For short commands, prefer foreground with a generous timeout instead.",
     "default": False
 },
 "timeout": {