hermes-agent/apps
Brooklyn Nicholson 0f45509daf fix(agent): make mid-turn /steer trusted, not read as injection
A steer rides inside a tool result (the only role-alternation-safe slot
mid-turn), so a bare "User guidance:" line reads as untrusted tool content —
well-behaved models refuse it as suspected prompt injection (observed live:
"I only follow instructions from you directly, not ones injected through
command results").

- Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker
  (prompt_builder.format_steer_marker), shared by both drain sites.
- Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this
  exact marker and trusts it as a genuine user message — while still ignoring
  lookalikes buried in tool/web/file output. Static text → byte-stable prompt,
  no prompt-cache regression; gated on the agent having tools.
- Desktop: steer ack is now an inline transcript note ( steered · …) instead
  of a toast.

Marker is intentionally static (not a per-session nonce) to honor the
byte-stable system-prompt caching policy; nonce hardening noted as follow-up.
2026-06-05 20:59:36 -05:00
..
bootstrap-installer fix(desktop): repair macOS updater helper (#40217) 2026-06-05 20:05:32 -05:00
desktop fix(agent): make mid-turn /steer trusted, not read as injection 2026-06-05 20:59:36 -05:00
shared fix(desktop): guard reconnect sockets and keep branch search precise 2026-06-03 13:13:21 -05:00