changes from feedback

emozilla 2026-05-05 22:45:12 -04:00
parent 401aadb5b8
commit 0d1cbc2dda


@@ -31,11 +31,31 @@ through the private security channel.
 ## 2. Trust Model
-Hermes is a single-tenant personal agent. Its posture is layered, and
-the layers are not equally load-bearing. Reporters and operators
-should reason about them in the same terms.
+Hermes Agent is a single-tenant personal agent. Its posture is
+layered, and the layers are not equally load-bearing. Reporters and
+operators should reason about them in the same terms.
-### 2.1 The Boundary: OS-Level Isolation
+### 2.1 Definitions
+- **Agent process.** The Python interpreter running Hermes Agent,
+including any Python modules it has loaded (skills, plugins,
+hook handlers).
+- **Terminal backend.** A pluggable execution target for the
+`terminal()` tool. The default runs commands directly on the host.
+Other backends run commands inside a container, cloud sandbox, or
+remote host.
+- **Input surface.** Any channel through which content enters the
+agent's context: operator input, web fetches, email, gateway
+messages, file reads, MCP server responses, tool results.
+- **Trust envelope.** The set of resources an operator has implicitly
+granted Hermes Agent access to by running it — typically, whatever
+the operator's own user account can reach on the host.
+- **Stance.** An explicit statement in Hermes Agent's documentation
+or code about how a consuming layer (adapter, UI, file writer,
+shell) should treat agent output — e.g. "the dashboard renders
+agent output as inert HTML."
+### 2.2 The Boundary: OS-Level Isolation
 **The only security boundary against an adversarial LLM is the
 operating system.** Nothing inside the agent process constitutes
@@ -44,51 +64,76 @@ pattern scanner, not any tool allowlist. Any in-process component
 that screens LLM output is a heuristic operating on an
 attacker-influenced string, and this policy treats it as such.
-Hermes supports two OS-level isolation postures. They address
+Hermes Agent supports two OS-level isolation postures. They address
 different threats and an operator should choose deliberately.
-**Terminal-backend isolation** sandboxes the shell tool. A
-non-default terminal backend runs LLM-emitted shell commands inside
-a container, remote host, or cloud sandbox. This confines the blast
-radius of destructive shell — but only of shell. The Python process
-running the agent itself stays on the host, along with every code
-path that doesn't go through the shell tool: the code-execution
-tool, MCP subprocesses, file tools, plugin loading, hook dispatch,
-skill loading. This is the right posture when the concern is
-LLM-emitted destructive shell and the operator is otherwise
-trusted.
-**Whole-process wrapping** sandboxes the agent itself. The agent
-runs inside an external runtime that enforces filesystem, network,
-process, and inference policies across the entire agent process
-tree. [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell) is
-the reference deployment. Under this posture, every code path in
-the agent is subject to the same policy, and the in-process
-heuristics in §2.3 become accident-prevention layered on top of a
-real boundary. This is the supported posture when the agent
-ingests content from surfaces the operator does not control — the
-open web, inbound email, multi-user channels, untrusted MCP
-servers — and for production or shared deployments.
+#### Terminal-backend isolation
+A non-default terminal backend runs LLM-emitted shell commands
+inside a container, remote host, or cloud sandbox. The file tools
+(`read_file`, `write_file`, `patch`) also run through this backend,
+since they are implemented on top of the shell contract — they
+cannot reach paths the backend doesn't expose.
+What this confines: anything the agent does by issuing shell or
+file operations. What this does **not** confine: everything the
+agent does in its own Python process. That includes the
+code-execution tool (spawned as a host subprocess), MCP subprocesses
+(spawned from the agent's environment), plugin loading, hook
+dispatch, and skill loading (all imported into the agent
+interpreter).
+Terminal-backend isolation is the right posture when the concern is
+LLM-emitted destructive shell or unwanted file-tool writes, and the
+operator is otherwise trusted.
+#### Whole-process wrapping
+Whole-process wrapping runs the entire agent process tree inside a
+sandbox. Every code path — shell, code-execution, MCP, file tools,
+plugins, hooks, skill loading — is subject to the same filesystem,
+network, process, and (where applicable) inference policy.
+Hermes Agent supports this in two ways:
+- **Hermes Agent's own Docker image and Compose setup.** Lighter-
+weight; the agent runs in a standard container with operator-
+configured mounts and network policy.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
+OpenShell provides per-session sandboxes with declarative policy
+across filesystem, network (L7 egress), process/syscall, and
+inference-routing layers. Network and inference policies are
+hot-reloadable. Credentials are injected from a Provider store
+and never touch the sandbox filesystem.
+Under a whole-process wrapper, Hermes Agent's in-process heuristics
+(§2.4) function as accident-prevention layered on top of a real
+boundary. This is the supported posture when the agent ingests
+content from surfaces the operator does not control — the open web,
+inbound email, multi-user channels, untrusted MCP servers — and for
+production or shared deployments.
 Operators running the default local backend with untrusted input
 surfaces, or running a terminal-backend sandbox and expecting it to
 contain code paths that don't go through the shell, are operating
 outside the supported security posture.
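Reviewer's sketch: the revised isolation text can be read as a lookup from code path to whether a given posture confines it. The snippet below only illustrates that reading. `Posture`, `CODE_PATHS`, `CONFINED`, and `unconfined_paths` are hypothetical names, not Hermes Agent APIs, and the sets follow the new (+) wording, in which the file tools run through the terminal backend.

```python
from enum import Enum


class Posture(Enum):
    LOCAL = "local"                        # default backend, no OS-level isolation
    TERMINAL_BACKEND = "terminal_backend"  # shell and file tools sandboxed
    WHOLE_PROCESS = "whole_process"        # entire agent process tree sandboxed


# Code paths named in the policy text (labels are illustrative).
CODE_PATHS = frozenset({
    "shell", "file_tools", "code_execution", "mcp",
    "plugins", "hooks", "skills",
})

# Which code paths each posture confines, per the revised wording.
CONFINED = {
    Posture.LOCAL: frozenset(),
    Posture.TERMINAL_BACKEND: frozenset({"shell", "file_tools"}),
    Posture.WHOLE_PROCESS: CODE_PATHS,
}


def unconfined_paths(posture: Posture) -> list[str]:
    """Return the code paths that still run on the host under `posture`."""
    return sorted(CODE_PATHS - CONFINED[posture])
```

Under this reading, `unconfined_paths(Posture.TERMINAL_BACKEND)` still lists the code-execution tool, MCP subprocesses, plugins, hooks, and skills, which is why the text treats a terminal-backend sandbox plus untrusted input as outside the supported posture.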
-### 2.2 Credential Scoping
+### 2.3 Credential Scoping
-Hermes filters the environment it passes to its lower-trust
+Hermes Agent filters the environment it passes to its lower-trust
 in-process components: shell subprocesses, MCP subprocesses, and
 the code-execution child. Credentials like provider API keys and
 gateway tokens are stripped by default; variables explicitly
 declared by the operator or by a loaded skill are passed through.
-This reduces casual exfiltration. It is not containment. A
-component with code-execution primitives can always reach
-filesystem-resident credentials that the agent process itself can
-read.
+This reduces casual exfiltration. It is not containment. Any
+component running inside the agent process (skills, plugins, hook
+handlers) can read whatever the agent itself can read, including
+in-memory credentials. The mitigation against a compromised
+in-process component is operator review before install (§2.4,
+§2.5), not environment scrubbing.
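Reviewer's sketch: the environment filtering described in the Credential Scoping section could look roughly like the following. This is a minimal illustration under stated assumptions, not the project's implementation: `SENSITIVE_MARKERS`, `scrubbed_env`, and `run_child` are hypothetical names, and the real strip list and passthrough mechanism are not shown in this diff.

```python
import subprocess

# Hypothetical name fragments that mark a variable as credential-like.
SENSITIVE_MARKERS = ("API_KEY", "TOKEN", "SECRET")


def scrubbed_env(env, passthrough=frozenset()):
    """Return a copy of `env` without credential-looking variables.

    Names in `passthrough` (declared by the operator or a loaded skill)
    are kept even if they match a sensitive marker.
    """
    return {
        name: value
        for name, value in env.items()
        if name in passthrough
        or not any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    }


def run_child(argv, env, passthrough=frozenset()):
    """Run a lower-trust subprocess with the scrubbed environment."""
    # This limits what the child inherits. It does not stop in-process
    # code from reading the parent's own environment or memory.
    return subprocess.run(
        argv,
        env=scrubbed_env(env, passthrough),
        capture_output=True,
        text=True,
        check=True,
    )
```

The filter only changes what a child process inherits. Code imported into the agent interpreter never passes through it, which matches the text's point that scrubbing reduces casual exfiltration but is not containment.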
-### 2.3 In-Process Heuristics
+### 2.4 In-Process Heuristics
 The following components screen or warn about LLM behavior. They
 are useful. They are not boundaries.
@@ -102,35 +147,75 @@ are useful. They are not boundaries.
 A motivated output producer will defeat it.
 - **Skills Guard** scans installable skill content for injection
 patterns. It is a review aid; the boundary for third-party skills
-is operator review before install.
+is operator review before install. Reviewing a skill means
+reading its Python code and scripts, not just its SKILL.md
+description — skills execute arbitrary Python at import time.
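Reviewer's sketch: the claim that a skill runs code at import time follows from how Python modules work, since every top-level statement executes when the module is loaded. The example below demonstrates this with a made-up module. `SKILL_SOURCE` and `load_module_from_source` are hypothetical, and how Hermes Agent actually loads skills is not shown in this diff.

```python
import importlib.util
import tempfile
from pathlib import Path

# A made-up skill module: the top-level statements run on import,
# before anyone calls the documented entry point.
SKILL_SOURCE = '''
SIDE_EFFECTS = []
SIDE_EFFECTS.append("ran at import time")


def documented_entry_point():
    return "what the description advertises"
'''


def load_module_from_source(source, name="demo_skill"):
    """Write `source` to a temporary file and import it as a module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes every top-level statement
        return module


module = load_module_from_source(SKILL_SOURCE)
```

After the import, `module.SIDE_EFFECTS` is already populated even though `documented_entry_point` was never called, so a review that reads only the description cannot see what the import itself does.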
-### 2.4 Gateway Authorization
-When the gateway integrates with a messaging platform, each platform
-adapter authenticates callers against an operator-configured
-allowlist. **An allowlist is required for every enabled adapter.**
-Adapters should refuse to dispatch agent work, resolve approvals, or
-relay output until an allowlist is set; code paths that fail open
-when no allowlist is configured are code bugs in scope under §3.1.
-Within the allowlist, all authorized callers are equally trusted.
-Session identifiers are routing handles, not authorization
-boundaries.
-### 2.5 Agent-Loaded Content
-Hermes chooses, by design, to load and execute content from specific
-on-disk locations at its own initiative — skills, hooks, plugins,
-operator-configured shortcuts. Content placed in these locations
-becomes code the agent runs on its next session, hook dispatch, or
-command invocation.
-Hermes does not claim these locations are protected files.
-Filesystem-level protection is whatever the OS provides under the
-operator's chosen isolation posture (§2.1). What Hermes commits to
-is narrower and different: **attacker-influenced input must not be
-chainable into a write that Hermes would later load and execute on
-its own initiative**. The concern is not what the filesystem
-allows; it is what Hermes loads.
+### 2.5 Plugin Trust Model
+Plugins load into the agent process and run with full agent
+privileges: they can read the same credentials, call the same
+tools, register the same hooks, and import the same modules as
+anything shipped in-tree. The boundary for third-party plugins is
+operator review before install — the same rule as skills (§2.4),
+called out separately because plugins are architecturally heavier
+and often ship their own background services, network listeners,
+and dependencies.
+A malicious or buggy plugin is not a vulnerability in Hermes Agent
+itself. Bugs in Hermes Agent's plugin-install or plugin-discovery
+path that prevent the operator from seeing what they're installing
+are in scope under §3.1.
+### 2.6 External Surfaces
+An **external surface** is any channel outside the local agent
+process through which a caller can dispatch agent work, resolve
+approvals, or receive agent output. Each surface has its own
+authorization model, but the rules below apply uniformly.
+**Surfaces in Hermes Agent:**
+- **Gateway platform adapters.** Messaging integrations in
+`gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+and analogous adapters shipped as plugins.
+- **Network-exposed HTTP surfaces.** The API server adapter, the
+dashboard plugin, the kanban plugin's HTTP endpoints, and any
+other plugin that binds a listening socket.
+- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
+equivalent integrations that accept requests from a local client
+process.
+- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
+Ink terminal UI, reached over local IPC.
+**Uniform rules:**
+1. **Authorization is required at every surface that crosses a
+trust boundary.** For messaging and network HTTP surfaces, the
+boundary is the network: authorization means an operator-
+configured caller allowlist. For editor and local-IPC surfaces
+(ACP, TUI gateway), the boundary is the host's user account:
+authorization means relying on OS-level access control (file
+permissions, loopback-only binds) and not exposing the surface
+beyond the local user without an explicit network auth layer.
+2. **An allowlist is required for every enabled network-exposed
+adapter.** Adapters must refuse to dispatch agent work, resolve
+approvals, or relay output until an allowlist is set. Code paths
+that fail open when no allowlist is configured are code bugs in
+scope under §3.1.
+3. **Session identifiers are routing handles, not authorization
+boundaries.** Knowing another caller's session ID does not grant
+access to their approvals or output; authorization is always
+re-checked against the allowlist (or OS-level equivalent).
+4. **Within the authorized set, all callers are equally trusted.**
+Hermes Agent does not model per-caller capabilities inside a
+single adapter. Operators who need capability separation should
+run separate agent instances with separate allowlists.
+5. **Binding a local-only surface to a non-loopback interface is a
+break-glass operator decision (§3.2).** The dashboard and other
+plugin HTTP servers default to loopback; exposing them via
+`--host 0.0.0.0` or equivalent makes public-exposure hardening
+(§4) the operator's responsibility.
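Reviewer's sketch: rules 2, 3, and 5 above describe behavior that is easy to get wrong in code, so a small fail-closed example may help. Everything here is an assumption for illustration: `Adapter`, `NotAuthorized`, `authorize`, and `is_loopback` are hypothetical names and do not reflect the actual adapter interfaces in `gateway/platforms/`.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


class NotAuthorized(Exception):
    """Raised when a caller may not dispatch work or receive output."""


@dataclass
class Adapter:
    name: str
    allowlist: frozenset = frozenset()  # empty means "not configured"

    def authorize(self, caller_id: str, session_id: Optional[str] = None) -> None:
        # Rule 2: fail closed when no allowlist is configured.
        if not self.allowlist:
            raise NotAuthorized(f"{self.name}: no allowlist configured")
        # Rule 3: session_id is a routing handle and is deliberately
        # ignored here; only the caller identity is checked.
        if caller_id not in self.allowlist:
            raise NotAuthorized(f"{self.name}: caller {caller_id!r} not allowed")


def is_loopback(host: str) -> bool:
    """Rule 5 helper: True only for loopback bind addresses."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostnames are treated as non-local
```

An adapter with an empty allowlist rejects every caller, a known session ID does not help a caller who is not on the list, and `is_loopback("0.0.0.0")` is false, so a server started with that bind address would be treated as publicly exposed.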
 ---
@@ -138,60 +223,71 @@ allows; it is what Hermes loads.
 ### 3.1 In Scope
-- Escape from a declared OS-level isolation posture (§2.1): an
+- Escape from a declared OS-level isolation posture (§2.2): an
 attacker-controlled code path reaching state that the posture
 claimed to confine.
-- Unauthorized gateway access: a caller outside the configured
-allowlist dispatching work, receiving output, or resolving
-approvals (§2.4).
+- Unauthorized external-surface access: a caller outside the
+configured authorization set (allowlist, or OS-level equivalent
+for local-IPC surfaces) dispatching work, receiving output, or
+resolving approvals (§2.6).
 - Credential exfiltration: leakage of operator credentials or
 session authorization material to a destination outside the
-operator's trust envelope.
-- Untrusted input chaining into agent-loaded content: an untrusted
-input surface chains into a write whose target is a location that
-Hermes loads and executes on its own initiative (§2.5).
-- Output integrity failures into external platforms: agent output
-rendered on a receiving platform with unintended authority —
-broadcast-mention passthrough, content that fetches attacker
-resources for every recipient, markup injection into hosted UIs.
+trust envelope, via a mechanism that should have prevented it
+(environment scrubbing bug, adapter logging, transport error
+flushes credentials to an upstream, etc.).
 - Trust-model documentation violations: code behaving contrary to
-what this policy states, where an operator relying on the policy
-would reasonably expect otherwise.
+what this policy, Hermes Agent's own documentation, or reasonable
+operator expectations would predict — including cases where
+Hermes Agent has documented a stance about how its output should
+be rendered by a consuming layer (dashboard, gateway adapter,
+file writer, shell) and a code path breaks that stance.
 ### 3.2 Out of Scope
 "Out of scope" here means "not a security vulnerability under this
 policy." It does not mean "not worth reporting." Improvements to the
 in-process heuristics, hardening ideas, and UX fixes are welcome as
-regular issues or pull requests — we can always make the approval
-gate catch more patterns, make redaction smarter, or tighten adapter
-behavior. These items just don't go through the private-disclosure
-channel and don't receive advisories.
+regular issues or pull requests — the approval gate can always catch
+more patterns, redaction can always get smarter, adapter behavior
+can always be tightened. These items just don't go through the
+private-disclosure channel and don't receive advisories.
-- **Bypasses of in-process heuristics (§2.3)** — approval-gate regex
+- **Bypasses of in-process heuristics (§2.4)** — approval-gate regex
 bypasses, redaction bypasses, Skills Guard pattern bypasses, and
 analogous reports against future heuristics. These components are
 not boundaries; defeating them is not a vulnerability under this
 policy.
-- **Prompt injection that does not chain to a §3.1 outcome.** Getting
-the LLM to emit unusual text or "ignore previous instructions" is
-not itself a vulnerability; it becomes one only when it results in
-something §3.1 describes.
+- **Prompt injection per se.** Getting the LLM to emit unusual
+output — via injected content, hallucination, training artifacts,
+or any other cause — is not itself a vulnerability. "I achieved
+prompt injection" without a chained §3.1 outcome is not an
+actionable report under this policy.
 - **Consequences of a chosen isolation posture.** Reports that a
 code path operating within its posture's scope can do what that
-posture permits are not vulnerabilities. Examples: shell tools
-reaching host state under the local backend; code-execution or
-file tools reaching host state under terminal-backend isolation
-that only sandboxes shell; reports whose preconditions require
-pre-existing write access to operator-owned configuration or
-credential files (those are already inside the operator's trust
-envelope).
+posture permits are not vulnerabilities. Examples: shell or file
+tools reaching host state under the local backend; code-execution
+or MCP subprocesses reaching host state under terminal-backend
+isolation that only sandboxes shell; reports whose preconditions
+require pre-existing write access to operator-owned configuration
+or credential files (those are already inside the trust envelope).
+- **Documented break-glass settings.** Operator-selected trade-offs
+that explicitly disable protections: `--insecure` and equivalent
+flags on the dashboard or other components, disabled approvals,
+local backend in production, development profiles that bypass
+hermes-home security, and similar. Reports against those
+configurations are not vulnerabilities — that's the flag's job.
+- **Community-contributed skills and plugins.** Third-party skills
+(including the community skills repository) and third-party
+plugins are in the operator's review surface, not Hermes Agent's
+trust surface (§2.4, §2.5). A skill or plugin doing something
+malicious is the expected failure mode of one that wasn't
+reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes
+Agent's skill-install or plugin-install path that prevent the
+operator from seeing what they're installing are in scope under
+§3.1.
 - **Public exposure without external controls.** Exposing the
 gateway or API to the public internet without authentication,
 VPN, or firewall.
-- **Documented break-glass settings.** Disabled approvals, local
-backend in production, development profiles that bypass
-hermes-home security, and similar operator-selected trade-offs.
 - **Tool-level read/write restrictions on a posture where shell is
 permitted.** If a path is reachable via the terminal tool, reports
 that other file tools can reach it add nothing.
@@ -201,25 +297,26 @@ channel and don't receive advisories.
 ## 4. Deployment Hardening
 The single most important hardening decision is matching isolation
-(§2.1) to the trust of the content the agent will ingest. Beyond
+(§2.2) to the trust of the content the agent will ingest. Beyond
 that:
 - Run the agent as a non-root user. The supplied container image
 does this by default.
 - Keep credentials in the operator credential file with tight
 permissions, never in the main config, never in version control.
-Under OpenShell, use its Provider store rather than an on-disk
+Under OpenShell, use the Provider store rather than an on-disk
 credential file.
 - Do not expose the gateway or API to the public internet without
 VPN, Tailscale, or firewall protection. Under OpenShell, use the
 network policy layer to restrict egress.
-- Configure a caller allowlist for every gateway adapter you enable
-(§2.4).
-- Review third-party skills before install. Skills Guard reports and
-the install audit log are the review surface.
-- The OSV malware database is consulted before launching
-ecosystem-resolved MCP servers. Additional supply-chain guards
-on dependency and bundled-package changes run in CI; see
+- Configure a caller allowlist for every network-exposed adapter
+you enable (§2.6).
+- Review third-party skills and plugins before install (§2.4,
+§2.5). For skills, this means reading the Python and scripts,
+not just SKILL.md. Skills Guard reports and the install audit
+log are the review surface.
+- Hermes Agent includes supply-chain guards for MCP server
+launches and for dependency / bundled-package changes in CI; see
 `CONTRIBUTING.md` for specifics.
 ---