fix(nemo-relay): align adaptive config with tool_parallelism mode

Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
mnajafian-nv 2026-06-08 11:48:19 -07:00
parent a38003be3d
commit 021d1034d0
GPG key ID: C0C3EEEE9FB11E38
3 changed files with 110 additions and 25 deletions

View file

@@ -173,8 +173,8 @@ include an adaptive component in the same `plugins.toml`:
kind = "adaptive"
enabled = true
-[components.config]
-mode = "route"
+[components.config.tool_parallelism]
+mode = "observe_only"
```
When the adaptive component is enabled and the installed NeMo Relay runtime
@@ -182,15 +182,16 @@ exposes `llm.execute(...)` / `tools.execute(...)`, Hermes routes LLM and tool
execution through those middleware boundaries. The observer hooks still emit
session, turn, approval, and subagent marks; the plugin skips its manual
`llm.call` and `tools.call` spans for executions that are already managed by
-NeMo Relay.
+NeMo Relay. `tool_parallelism.mode = "observe_only"` keeps tool scheduling
+observational while still wrapping the real execution boundary.
For the full generic Hermes middleware contract, see
[`docs/middleware/README.md`](../../../docs/middleware/README.md).
## Canonical Local Examples
-The examples below use the official `nemo-relay==0.3` distribution and a local
-Ollama model served through the OpenAI-compatible API.
+The observe-only examples in this section use the official `nemo-relay==0.3`
+distribution and a local Ollama model served through the OpenAI-compatible API.
```bash
pip install "nemo-relay==0.3"
@@ -404,8 +405,8 @@ version = 1
kind = "adaptive"
enabled = true
-[components.config]
-mode = "route"
+[components.config.tool_parallelism]
+mode = "observe_only"
```
Enable it for Hermes:
@@ -438,11 +439,12 @@ for the same execution.
### Local Adaptive E2E
This example enables both NeMo Relay observability export and adaptive execution
-middleware for a local Hermes run.
+middleware for a local Hermes run. This path requires a NeMo Relay runtime that
+supports `[components.config.tool_parallelism]`; the `nemo-relay==0.3`
+install used by the earlier observability-only examples does not support this
+adaptive config.
```bash
-pip install "nemo-relay==0.3"
export HERMES_HOME=/tmp/hermes-middleware-test/hermes-home
mkdir -p "$HERMES_HOME" /tmp/hermes-middleware-test/nemo-relay
@@ -484,8 +486,8 @@ agent_version = "local"
kind = "adaptive"
enabled = true
-[components.config]
-mode = "route"
+[components.config.tool_parallelism]
+mode = "observe_only"
TOML
export HERMES_NEMO_RELAY_PLUGINS_TOML=/tmp/hermes-middleware-test/nemo-relay/plugins.toml
@@ -510,8 +512,8 @@ middleware_execution_ok
Expected ATOF shape:
```jsonl
-{"kind":"scope","category":"llm","name":"custom","scope_category":"start","metadata":{"session_id":"middleware-demo-session"},"data":{"mode":"route"}}
-{"kind":"scope","category":"tool","name":"terminal","scope_category":"start","metadata":{"session_id":"middleware-demo-session","tool_call_id":"call_terminal"},"data":{"mode":"route"}}
+{"kind":"scope","category":"llm","name":"custom","scope_category":"start","metadata":{"session_id":"middleware-demo-session"},"data":{"mode":"observe_only"}}
+{"kind":"scope","category":"tool","name":"terminal","scope_category":"start","metadata":{"session_id":"middleware-demo-session","tool_call_id":"call_terminal"},"data":{"mode":"observe_only"}}
{"kind":"scope","category":"tool","name":"terminal","scope_category":"end","metadata":{"session_id":"middleware-demo-session","tool_call_id":"call_terminal","status":"ok"},"data":"{\"output\":\"middleware_execution_ok\",\"exit_code\":0,\"error\":null}"}
```
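Review note: the expected ATOF shape can be checked mechanically. The following is a minimal sketch, assuming the export is newline-delimited JSON shaped like the sample records in this hunk. `start_scope_modes` is a hypothetical helper written for illustration; it is not part of the plugin or of NeMo Relay.

```python
import json


def start_scope_modes(lines):
    """Yield (category, name, mode) for each start-scope record that carries a mode."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if record.get("kind") != "scope" or record.get("scope_category") != "start":
            continue
        data = record.get("data")
        # In the sample, the end record stores `data` as a JSON string, so only
        # dict payloads are inspected for a mode.
        if isinstance(data, dict) and "mode" in data:
            yield record.get("category"), record.get("name"), data["mode"]


# Sample records copied from the post-change expected shape in this commit.
sample = [
    r'{"kind":"scope","category":"llm","name":"custom","scope_category":"start","metadata":{"session_id":"middleware-demo-session"},"data":{"mode":"observe_only"}}',
    r'{"kind":"scope","category":"tool","name":"terminal","scope_category":"start","metadata":{"session_id":"middleware-demo-session","tool_call_id":"call_terminal"},"data":{"mode":"observe_only"}}',
    r'{"kind":"scope","category":"tool","name":"terminal","scope_category":"end","metadata":{"session_id":"middleware-demo-session","tool_call_id":"call_terminal","status":"ok"},"data":"{\"output\":\"middleware_execution_ok\",\"exit_code\":0,\"error\":null}"}',
]

modes = list(start_scope_modes(sample))
for category, name, mode in modes:
    print(category, name, mode)
```

Against a real run, the same helper could be fed an open file handle for the ATOF output instead of the `sample` list.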

View file

@@ -44,7 +44,7 @@ class _Settings:
     plugins_toml_path: str = ""
     plugins_config: dict[str, Any] | None = None
     adaptive_enabled: bool = False
-    adaptive_mode: str = "observe"
+    adaptive_mode: str = "observe_only"
     atof_enabled: bool = False
     atof_output_directory: str = ""
     atof_filename: str = "hermes-atof.jsonl"
@@ -611,11 +611,16 @@ def _enabled_component_config(
 def _adaptive_mode(config: dict[str, Any] | None) -> str:
     if not isinstance(config, dict):
-        return "observe"
+        return "observe_only"
+    tool_parallelism = config.get("tool_parallelism")
+    if isinstance(tool_parallelism, dict):
+        mode = tool_parallelism.get("mode")
+        if isinstance(mode, str) and mode.strip():
+            return mode.strip()
     mode = config.get("mode")
     if isinstance(mode, str) and mode.strip():
         return mode.strip()
-    return "observe"
+    return "observe_only"
 def _env(name: str) -> str:
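
Review note: the lookup order introduced by this hunk is the nested `tool_parallelism.mode`, then the legacy top-level `mode`, then the `"observe_only"` default. It can be exercised in isolation. The sketch below copies the post-change function body under the name `adaptive_mode`; the input dicts are illustrative assumptions about what a parsed `[components.config]` table might look like, not captured from a real `plugins.toml`.

```python
from __future__ import annotations

from typing import Any


def adaptive_mode(config: dict[str, Any] | None) -> str:
    """Standalone copy of the post-change `_adaptive_mode` lookup."""
    if not isinstance(config, dict):
        return "observe_only"
    # Preferred location: [components.config.tool_parallelism] mode = "..."
    tool_parallelism = config.get("tool_parallelism")
    if isinstance(tool_parallelism, dict):
        mode = tool_parallelism.get("mode")
        if isinstance(mode, str) and mode.strip():
            return mode.strip()
    # Legacy location: [components.config] mode = "..."
    mode = config.get("mode")
    if isinstance(mode, str) and mode.strip():
        return mode.strip()
    return "observe_only"


# Hypothetical parsed configs, chosen to cover each branch.
cases = [
    None,
    {},
    {"tool_parallelism": {"mode": " observe_only "}},
    {"mode": "route"},
    {"mode": "route", "tool_parallelism": {"mode": "observe_only"}},
    {"mode": "route", "tool_parallelism": {"mode": "   "}},
]

for case in cases:
    print(repr(case), "->", adaptive_mode(case))
```

One consequence worth noting in review: a config that still sets only the legacy top-level `mode` keeps its value, so the new default applies only when neither location provides a non-empty string.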