mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
A steer rides inside a tool result (the only role-alternation-safe slot
mid-turn), so a bare "User guidance:" line reads as untrusted tool content —
well-behaved models refuse it as suspected prompt injection (observed live:
"I only follow instructions from you directly, not ones injected through
command results").
- Wrap steers in a bounded, self-describing [OUT-OF-BAND USER MESSAGE] marker
(prompt_builder.format_steer_marker), shared by both drain sites.
- Add STEER_CHANNEL_NOTE to the core system prompt so the model expects this
exact marker and trusts it as a genuine user message — while still ignoring
lookalikes buried in tool/web/file output. Static text → byte-stable prompt,
no prompt-cache regression; gated on the agent having tools.
- Desktop: steer ack is now an inline transcript note (⏩ steered · …) instead
of a toast.
Marker is intentionally static (not a per-session nonce) to honor the
byte-stable system-prompt caching policy; nonce hardening noted as follow-up.
|
||
|---|---|---|
| .. | ||
| bootstrap-installer | ||
| desktop | ||
| shared | ||