docs(execute_code): document project/strict execution modes (#12073)

Follow-up to PR #11971. Documents the new code_execution.mode config key and what each mode actually does.

- user-guide/configuration.md: add mode: project to the yaml example, explain project vs strict, and call out that security invariants are identical across modes.
- user-guide/features/code-execution.md: new 'Execution Mode' section with a comparison table and usage guidance; update the 'temporary directory' note so it reflects that script.py runs in the session CWD in project mode (staging dir stays on PYTHONPATH for imports); drop stale 'sandboxed' framing from the intro and skill-passthrough paragraph.
- getting-started/learning-path.md: update the one-line Code Execution summary to match (no longer 'sandboxed environments'; the default runs in the session's real working directory).

No code changes.
2026-06-30 11:52:04 +00:00 · 2026-04-18 01:53:09 -07:00 · 2026-04-18 01:53:09 -07:00 · a2c9f5d0a7
commit a2c9f5d0a7
parent 8322b42c6c
3 changed files with 45 additions and 7 deletions
--- a/website/docs/user-guide/features/code-execution.md
+++ b/website/docs/user-guide/features/code-execution.md
@@ -1,12 +1,12 @@
 ---
 sidebar_position: 8
 title: "Code Execution"
-description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
+description: "Programmatic Python execution with RPC tool access — collapse multi-step workflows into a single turn"
 ---

 # Code Execution (Programmatic Tool Calling)

-The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
+The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a child process on the agent host, communicating with Hermes over a Unix domain socket RPC.

 ## How It Works

@@ -27,7 +27,7 @@ for r in results["data"]["web"]:
 print(summary)
 ```

-**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).
+**Available tools inside scripts:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).

 ## When the Agent Uses This

@@ -126,6 +126,35 @@ report = {
 print(json.dumps(report, indent=2))
 ```

+## Execution Mode
+
+`execute_code` has two execution modes controlled by `code_execution.mode` in `~/.hermes/config.yaml`:
+
+| Mode | Working directory | Python interpreter |
+|------|-------------------|--------------------|
+| **`project`** (default) | The session's working directory (same as `terminal()`) | Active `VIRTUAL_ENV` / `CONDA_PREFIX` python, falling back to Hermes's own python |
+| `strict` | A temp staging directory isolated from the user's project | `sys.executable` (Hermes's own python) |
+
+**When to leave it on `project`:** you want `import pandas`, `from my_project import foo`, or relative paths like `open(".env")` to work the same way they do in `terminal()`. This is almost always what you want.
+
+**When to flip to `strict`:** you need maximum reproducibility — you want the same interpreter every session regardless of which venv the user activated, and you want scripts quarantined from the project tree (no risk of accidentally reading project files through a relative path).
+
+```yaml
+# ~/.hermes/config.yaml
+code_execution:
+  mode: project   # or "strict"
+```
+
+Fallback behavior in `project` mode: if `VIRTUAL_ENV` / `CONDA_PREFIX` is unset, broken, or points at a Python older than 3.8, the resolver falls back cleanly to `sys.executable` — it never leaves the agent without a working interpreter.
+
+Security-critical invariants are identical across both modes:
+
+- environment scrubbing (API keys, tokens, credentials stripped)
+- tool whitelist (scripts cannot call `execute_code` recursively, `delegate_task`, or MCP tools)
+- resource limits (timeout, stdout cap, tool-call cap)
+
+Switching mode changes where scripts run and which interpreter runs them, not what credentials they can see or which tools they can call.
+
 ## Resource Limits

 | Resource | Limit | Notes |
@@ -140,6 +169,7 @@ All limits are configurable via `config.yaml`:
 ```yaml
 # In ~/.hermes/config.yaml
 code_execution:
+  mode: project      # project (default) | strict
   timeout: 300       # Max seconds per script (default: 300)
   max_tool_calls: 50 # Max tool calls per execution (default: 50)
 ```
@@ -176,7 +206,7 @@ Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTI

 ### Skill Environment Variable Passthrough

-When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` sandboxes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.
+When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` child processes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.

 For non-skill use cases, you can explicitly allowlist variables in `config.yaml`:

@@ -189,7 +219,7 @@ terminal:

 See the [Security guide](/docs/user-guide/security#environment-variable-passthrough) for full details.

-The script runs in a temporary directory that is cleaned up after execution. The child process runs in its own process group so it can be cleanly killed on timeout or interruption.
+Hermes always writes the script and the auto-generated `hermes_tools.py` RPC stub into a temp staging directory that is cleaned up after execution. In `strict` mode the script also *runs* there; in `project` mode it runs in the session's working directory (the staging directory stays on `PYTHONPATH` so imports still resolve). The child process runs in its own process group so it can be cleanly killed on timeout or interruption.

 ## execute_code vs terminal
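
The interpreter fallback described in the new "Execution Mode" section can be illustrated with a short sketch. This is not the Hermes resolver: the function names (`resolve_interpreter`, `_candidate_from_env`, `_version_ok`), the `<prefix>/bin/python` POSIX layout, the order in which `VIRTUAL_ENV` and `CONDA_PREFIX` are tried, and the 5-second probe timeout are all assumptions made for illustration. Only the documented behavior is taken from the diff: `strict` uses `sys.executable`, and `project` falls back to it when the environment's Python is unset, broken, or older than 3.8.

```python
import os
import subprocess
import sys
from pathlib import Path

MIN_VERSION = (3, 8)  # the docs say anything older than 3.8 triggers the fallback


def _candidate_from_env(env):
    """Return the first usable-looking python under VIRTUAL_ENV / CONDA_PREFIX, or None."""
    # Assumption: VIRTUAL_ENV is tried before CONDA_PREFIX, POSIX layout only.
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        prefix = env.get(var)
        if not prefix:
            continue
        candidate = Path(prefix) / "bin" / "python"
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None


def _version_ok(python):
    """Ask the candidate interpreter for its version; treat any failure as 'broken'."""
    probe = "import sys; print('%d.%d' % sys.version_info[:2])"
    try:
        out = subprocess.run(
            [python, "-c", probe],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout.strip()
        major, minor = (int(part) for part in out.split("."))
    except (OSError, subprocess.SubprocessError, ValueError):
        return False
    return (major, minor) >= MIN_VERSION


def resolve_interpreter(mode, env=None):
    """Pick the interpreter for a script: env python in 'project' mode, else sys.executable."""
    env = os.environ if env is None else env
    if mode == "project":
        candidate = _candidate_from_env(env)
        if candidate and _version_ok(candidate):
            return candidate
    # 'strict' mode, or any failure in 'project' mode, ends up here.
    return sys.executable


if __name__ == "__main__":
    print("project:", resolve_interpreter("project"))
    print("strict: ", resolve_interpreter("strict"))
```

Probing the candidate in a subprocess, rather than trusting that the path exists, is what lets a broken or outdated environment degrade to `sys.executable` instead of failing the script launch.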