fix: move pre_llm_call plugin context to user message, preserve prompt cache (#5146)

Plugin context from pre_llm_call hooks was injected into the system prompt, breaking the prompt cache prefix every turn when content changed (typical for memory plugins). Now all plugin context goes into the current turn's user message — the system prompt stays identical across turns, preserving cached tokens. The system prompt is reserved for Hermes internals. Plugins contribute context alongside the user's input. Also adds comprehensive documentation for all 6 plugin hooks: pre_tool_call, post_tool_call, pre_llm_call, post_llm_call, on_session_start, on_session_end — each with full callback signatures, parameter tables, firing conditions, and examples. Supersedes #5138 which identified the same cache-busting bug and proposed an uncached system suffix approach. This fix goes further by removing system prompt injection entirely. Co-identified-by: OutThisLife (PR #5138)
2026-04-25 00:51:20 +00:00 · 2026-04-04 16:55:44 -07:00 · 2026-04-04 16:55:44 -07:00 · 5879b3ef82
commit 5879b3ef82
parent 96e96a79ad
6 changed files with 653 additions and 57 deletions
--- a/website/docs/user-guide/features/plugins.md
+++ b/website/docs/user-guide/features/plugins.md
@ -103,12 +103,12 @@ Plugins can register callbacks for these lifecycle events. See the **[Event Hook

 | Hook | Fires when |
 |------|-----------|
-| `pre_tool_call` | Before any tool executes |
-| `post_tool_call` | After any tool returns |
-| `pre_llm_call` | Once per turn, before the LLM loop — can return `{"context": "..."}` to inject into the system prompt |
-| `post_llm_call` | Once per turn, after the LLM loop completes |
-| `on_session_start` | New session created (first turn only) |
-| `on_session_end` | End of every `run_conversation` call |
+| [`pre_tool_call`](/docs/user-guide/features/hooks#pre_tool_call) | Before any tool executes |
+| [`post_tool_call`](/docs/user-guide/features/hooks#post_tool_call) | After any tool returns |
+| [`pre_llm_call`](/docs/user-guide/features/hooks#pre_llm_call) | Once per turn, before the LLM loop — can return `{"context": "..."}` to [inject context into the user message](/docs/user-guide/features/hooks#pre_llm_call) |
+| [`post_llm_call`](/docs/user-guide/features/hooks#post_llm_call) | Once per turn, after the LLM loop (successful turns only) |
+| [`on_session_start`](/docs/user-guide/features/hooks#on_session_start) | New session created (first turn only) |
+| [`on_session_end`](/docs/user-guide/features/hooks#on_session_end) | End of every `run_conversation` call + CLI exit handler |

 ## Managing plugins