Ensure failed plugin-config clear operations still re-arm managed reinitialization on the next Hermes session.
Add focused regression coverage for successful init, failed final-session clear, and next-session recovery.
Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
Clear NeMo Relay plugin-config observability only after the last active Hermes session finalizes.
Use the plugin's async-safe awaitable helper for both initialize and clear so session rotation remains safe under active event loops.
Disable the direct ATIF fallback when plugins.toml already owns the ATIF exporter lifecycle to avoid duplicate trajectory export on finalization.
Adds backend-neutral observer hooks for plugins: session, turn, API
request, tool, approval, and subagent lifecycle events with stable
correlation IDs (session_id, task_id, turn_id, api_request_id,
tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with
api_request_error and subagent_start.
Hot path is zero-cost when no plugin subscribes: has_hook()/presence
checks gate all payload construction, request payloads are returned
by reference when no middleware rewrites, and the sanitized response
payload no longer embeds raw response objects.
Bundles the optional NeMo-Relay observability plugin
(plugins/observability/nemo_relay) as an in-repo consumer of the new
hooks, peer to the existing langfuse plugin. Fails open when the
optional nemo-relay package is not installed.
Authored-by: Bryan Bednarski <bbednarski@nvidia.com>
Salvaged from #29722 onto current main.