changes from feedback

2026-07-02 12:13:05 +00:00 · 2026-05-05 22:45:12 -04:00 · 2026-05-05 22:45:12 -04:00 · 0d1cbc2dda
commit 0d1cbc2dda
parent 401aadb5b8
1 changed file with 196 additions and 99 deletions
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -31,11 +31,31 @@ through the private security channel.

 ## 2. Trust Model

-Hermes is a single-tenant personal agent. Its posture is layered, and
-the layers are not equally load-bearing. Reporters and operators
-should reason about them in the same terms.
+Hermes Agent is a single-tenant personal agent. Its posture is
+layered, and the layers are not equally load-bearing. Reporters and
+operators should reason about them in the same terms.

-### 2.1 The Boundary: OS-Level Isolation
+### 2.1 Definitions
+
+- **Agent process.** The Python interpreter running Hermes Agent,
+  including any Python modules it has loaded (skills, plugins,
+  hook handlers).
+- **Terminal backend.** A pluggable execution target for the
+  `terminal()` tool. The default runs commands directly on the host.
+  Other backends run commands inside a container, cloud sandbox, or
+  remote host.
+- **Input surface.** Any channel through which content enters the
+  agent's context: operator input, web fetches, email, gateway
+  messages, file reads, MCP server responses, tool results.
+- **Trust envelope.** The set of resources an operator has implicitly
+  granted Hermes Agent access to by running it — typically, whatever
+  the operator's own user account can reach on the host.
+- **Stance.** An explicit statement in Hermes Agent's documentation
+  or code about how a consuming layer (adapter, UI, file writer,
+  shell) should treat agent output — e.g. "the dashboard renders
+  agent output as inert HTML."
+
+### 2.2 The Boundary: OS-Level Isolation

 **The only security boundary against an adversarial LLM is the
 operating system.** Nothing inside the agent process constitutes
@@ -44,51 +64,76 @@ pattern scanner, not any tool allowlist. Any in-process component
 that screens LLM output is a heuristic operating on an
 attacker-influenced string, and this policy treats it as such.

-Hermes supports two OS-level isolation postures. They address
+Hermes Agent supports two OS-level isolation postures. They address
 different threats and an operator should choose deliberately.

-**Terminal-backend isolation** sandboxes the shell tool. A
-non-default terminal backend runs LLM-emitted shell commands inside
-a container, remote host, or cloud sandbox. This confines the blast
-radius of destructive shell — but only of shell. The Python process
-running the agent itself stays on the host, along with every code
-path that doesn't go through the shell tool: the code-execution
-tool, MCP subprocesses, file tools, plugin loading, hook dispatch,
-skill loading. This is the right posture when the concern is
-LLM-emitted destructive shell and the operator is otherwise
-trusted.
+#### Terminal-backend isolation

-**Whole-process wrapping** sandboxes the agent itself. The agent
-runs inside an external runtime that enforces filesystem, network,
-process, and inference policies across the entire agent process
-tree. [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell) is
-the reference deployment. Under this posture, every code path in
-the agent is subject to the same policy, and the in-process
-heuristics in §2.3 become accident-prevention layered on top of a
-real boundary. This is the supported posture when the agent
-ingests content from surfaces the operator does not control — the
-open web, inbound email, multi-user channels, untrusted MCP
-servers — and for production or shared deployments.
+A non-default terminal backend runs LLM-emitted shell commands
+inside a container, remote host, or cloud sandbox. The file tools
+(`read_file`, `write_file`, `patch`) also run through this backend,
+since they are implemented on top of the shell contract — they
+cannot reach paths the backend doesn't expose.
+
+What this confines: anything the agent does by issuing shell or
+file operations. What this does **not** confine: everything the
+agent does in its own Python process. That includes the
+code-execution tool (spawned as a host subprocess), MCP subprocesses
+(spawned from the agent's environment), plugin loading, hook
+dispatch, and skill loading (all imported into the agent
+interpreter).
+
+Terminal-backend isolation is the right posture when the concern is
+LLM-emitted destructive shell or unwanted file-tool writes, and the
+operator is otherwise trusted.
+
+#### Whole-process wrapping
+
+Whole-process wrapping runs the entire agent process tree inside a
+sandbox. Every code path — shell, code-execution, MCP, file tools,
+plugins, hooks, skill loading — is subject to the same filesystem,
+network, process, and (where applicable) inference policy.
+
+Hermes Agent supports this in two ways:
+
+- **Hermes Agent's own Docker image and Compose setup.** The
+  lighter-weight option: the agent runs in a standard container
+  with operator-configured mounts and network policy.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell).**
+  OpenShell provides per-session sandboxes with declarative policy
+  across filesystem, network (L7 egress), process/syscall, and
+  inference-routing layers. Network and inference policies are
+  hot-reloadable. Credentials are injected from a Provider store
+  and never touch the sandbox filesystem.
+
+Under a whole-process wrapper, Hermes Agent's in-process heuristics
+(§2.4) function as accident-prevention layered on top of a real
+boundary. This is the supported posture when the agent ingests
+content from surfaces the operator does not control — the open web,
+inbound email, multi-user channels, untrusted MCP servers — and for
+production or shared deployments.

 Operators running the default local backend with untrusted input
 surfaces, or running a terminal-backend sandbox and expecting it to
 contain code paths that don't go through the shell, are operating
 outside the supported security posture.
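The two postures differ only in which code paths fall inside the OS-level sandbox, so the confinement rules in this section can be read as a lookup. The sketch below is hypothetical: the `Posture` enum, the code-path labels, and `is_confined` are illustrative names invented here, not Hermes Agent APIs.

```python
from enum import Enum


class Posture(Enum):
    LOCAL = "local"                # default backend: commands run on the host
    TERMINAL_BACKEND = "terminal"  # non-default backend: shell and file tools
    WHOLE_PROCESS = "whole"        # Docker/Compose or OpenShell wrapper


# Code paths named in this section. Shell and file tools go through the
# terminal backend; the rest run in, or are spawned from, the agent process.
BACKEND_PATHS = frozenset({"shell", "file_tools"})
AGENT_PROCESS_PATHS = frozenset({
    "code_execution", "mcp_subprocess",
    "plugin_loading", "hook_dispatch", "skill_loading",
})


def is_confined(posture: Posture, code_path: str) -> bool:
    """Return True if the posture's OS-level sandbox covers code_path."""
    if code_path not in BACKEND_PATHS | AGENT_PROCESS_PATHS:
        raise ValueError(f"unknown code path: {code_path!r}")
    if posture is Posture.WHOLE_PROCESS:
        return True  # every code path shares one sandbox policy
    if posture is Posture.TERMINAL_BACKEND:
        return code_path in BACKEND_PATHS
    return False  # local backend: no code path is sandboxed
```

For example, `is_confined(Posture.TERMINAL_BACKEND, "mcp_subprocess")` is `False`: that is the terminal-backend case §3.2 treats as a consequence of the chosen posture rather than a vulnerability.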

-### 2.2 Credential Scoping
+### 2.3 Credential Scoping

-Hermes filters the environment it passes to its lower-trust
+Hermes Agent filters the environment it passes to its lower-trust
 in-process components: shell subprocesses, MCP subprocesses, and
 the code-execution child. Credentials like provider API keys and
 gateway tokens are stripped by default; variables explicitly
 declared by the operator or by a loaded skill are passed through.

-This reduces casual exfiltration. It is not containment. A
-component with code-execution primitives can always reach
-filesystem-resident credentials that the agent process itself can
-read.
+This reduces casual exfiltration. It is not containment. A filtered
+child with code-execution primitives can still read any
+filesystem-resident credentials reachable from where it runs. Any
+component running inside the agent process (skills, plugins, hook
+handlers) can read whatever the agent itself can read, including
+in-memory credentials. The mitigation against a compromised
+in-process component is operator review before install (§2.4,
+§2.5), not environment scrubbing.
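The default-strip, declared-pass-through rule for child environments can be sketched as a pure function over the parent environment. This is a minimal sketch under assumptions: the name-pattern heuristic, `filter_child_env`, and the example variables are hypothetical, and Hermes Agent's actual filter may decide differently.

```python
import re
from typing import Iterable, Mapping

# Assumed heuristic for "this variable holds a credential". The real
# Hermes Agent filter is not specified here and may work differently.
CREDENTIAL_NAME = re.compile(r"API_KEY|TOKEN|SECRET|PASSWORD", re.IGNORECASE)


def filter_child_env(parent_env: Mapping[str, str],
                     declared: Iterable[str]) -> dict[str, str]:
    """Build the environment passed to a lower-trust child process.

    Credential-looking variables are stripped by default. Names the
    operator or a loaded skill explicitly declared pass through anyway.
    """
    allowed = set(declared)
    return {
        name: value
        for name, value in parent_env.items()
        if name in allowed or not CREDENTIAL_NAME.search(name)
    }


parent = {
    "PATH": "/usr/bin",
    "HOME": "/home/operator",
    "OPENAI_API_KEY": "sk-example",  # stripped: looks like a credential
    "GITHUB_TOKEN": "ghp-example",   # kept: declared by a loaded skill
}
child = filter_child_env(parent, declared=["GITHUB_TOKEN"])
# child == {"PATH": "/usr/bin", "HOME": "/home/operator",
#           "GITHUB_TOKEN": "ghp-example"}
```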

-### 2.3 In-Process Heuristics
+### 2.4 In-Process Heuristics

 The following components screen or warn about LLM behavior. They
 are useful. They are not boundaries.
@@ -102,35 +147,75 @@ are useful. They are not boundaries.
  A motivated output producer will defeat it.
 - **Skills Guard** scans installable skill content for injection
  patterns. It is a review aid; the boundary for third-party skills
-  is operator review before install.
+  is operator review before install. Reviewing a skill means
+  reading its Python code and scripts, not just its SKILL.md
+  description — skills execute arbitrary Python at import time.
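That skills run code at import time follows from ordinary Python import semantics: a module's top-level statements execute as soon as it is imported. A minimal, self-contained demonstration follows. It does not use Hermes Agent's loader, and `example_skill` and `import_untrusted_skill` are hypothetical names. Merely importing the stand-in module writes a file, before any of its functions is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a third-party skill's Python module (hypothetical). Its
# top-level statements run during import; no function call is needed.
SKILL_SOURCE = """
from pathlib import Path
Path(__file__).with_name("side_effect.txt").write_text("ran at import")
"""


def import_untrusted_skill(directory: str, module_name: str):
    """Import module_name from directory, roughly as a loader might."""
    importlib.invalidate_caches()  # the source file was created at runtime
    sys.path.insert(0, directory)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example_skill.py").write_text(SKILL_SOURCE)
    import_untrusted_skill(tmp, "example_skill")
    # This file exists only because importing the module executed it.
    side_effect = Path(tmp, "side_effect.txt").read_text()

print(side_effect)  # ran at import
```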

-### 2.4 Gateway Authorization
+### 2.5 Plugin Trust Model

-When the gateway integrates with a messaging platform, each platform
-adapter authenticates callers against an operator-configured
-allowlist. **An allowlist is required for every enabled adapter.**
-Adapters should refuse to dispatch agent work, resolve approvals, or
-relay output until an allowlist is set; code paths that fail open
-when no allowlist is configured are code bugs in scope under §3.1.
-Within the allowlist, all authorized callers are equally trusted.
-Session identifiers are routing handles, not authorization
-boundaries.
+Plugins load into the agent process and run with full agent
+privileges: they can read the same credentials, call the same
+tools, register the same hooks, and import the same modules as
+anything shipped in-tree. The boundary for third-party plugins is
+operator review before install — the same rule as skills (§2.4),
+called out separately because plugins are architecturally heavier
+and often ship their own background services, network listeners,
+and dependencies.

-### 2.5 Agent-Loaded Content
+A malicious or buggy plugin is not a vulnerability in Hermes Agent
+itself. Bugs in Hermes Agent's plugin-install or plugin-discovery
+path that prevent the operator from seeing what they're installing
+are in scope under §3.1.

-Hermes chooses, by design, to load and execute content from specific
-on-disk locations at its own initiative — skills, hooks, plugins,
-operator-configured shortcuts. Content placed in these locations
-becomes code the agent runs on its next session, hook dispatch, or
-command invocation.
+### 2.6 External Surfaces

-Hermes does not claim these locations are protected files.
-Filesystem-level protection is whatever the OS provides under the
-operator's chosen isolation posture (§2.1). What Hermes commits to
-is narrower and different: **attacker-influenced input must not be
-chainable into a write that Hermes would later load and execute on
-its own initiative**. The concern is not what the filesystem
-allows; it is what Hermes loads.
+An **external surface** is any channel outside the local agent
+process through which a caller can dispatch agent work, resolve
+approvals, or receive agent output. Each surface has its own
+authorization model, but the rules below apply uniformly.
+
+**Surfaces in Hermes Agent:**
+
+- **Gateway platform adapters.** Messaging integrations in
+  `gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+  and analogous adapters shipped as plugins.
+- **Network-exposed HTTP surfaces.** The API server adapter, the
+  dashboard plugin, the kanban plugin's HTTP endpoints, and any
+  other plugin that binds a listening socket.
+- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
+  equivalent integrations that accept requests from a local client
+  process.
+- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
+  Ink terminal UI, reached over local IPC.
+
+**Uniform rules:**
+
+1. **Authorization is required at every surface that crosses a
+   trust boundary.** For messaging and network HTTP surfaces, the
+   boundary is the network: authorization means an
+   operator-configured caller allowlist. For editor and local-IPC surfaces
+   (ACP, TUI gateway), the boundary is the host's user account:
+   authorization means relying on OS-level access control (file
+   permissions, loopback-only binds) and not exposing the surface
+   beyond the local user without an explicit network auth layer.
+2. **An allowlist is required for every enabled network-exposed
+   adapter.** Adapters must refuse to dispatch agent work, resolve
+   approvals, or relay output until an allowlist is set. Code paths
+   that fail open when no allowlist is configured are code bugs in
+   scope under §3.1.
+3. **Session identifiers are routing handles, not authorization
+   boundaries.** Knowing another caller's session ID does not grant
+   access to their approvals or output; authorization is always
+   re-checked against the allowlist (or OS-level equivalent).
+4. **Within the authorized set, all callers are equally trusted.**
+   Hermes Agent does not model per-caller capabilities inside a
+   single adapter. Operators who need capability separation should
+   run separate agent instances with separate allowlists.
+5. **Binding a local-only surface to a non-loopback interface is a
+   break-glass operator decision (§3.2).** The dashboard and other
+   plugin HTTP servers default to loopback; exposing them via
+   `--host 0.0.0.0` or equivalent makes public-exposure hardening
+   (§4) the operator's responsibility.
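Rules 2 through 5 translate into a few per-request checks. The sketch below is hypothetical: `AdapterAuth`, `NotAuthorized`, and `is_loopback_bind` are illustrative names rather than Hermes Agent's adapter API, and treating an empty allowlist like a missing one is an assumption. It deliberately has no session-ownership check, since rule 4 treats all allowlisted callers as equally trusted.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


class NotAuthorized(Exception):
    """Raised when a caller may not use this surface."""


@dataclass
class AdapterAuth:
    # None (or an empty set) means the operator has not configured one yet.
    allowlist: Optional[frozenset[str]] = None
    # Live session IDs: a routing table only, never a permission table.
    sessions: set[str] = field(default_factory=set)

    def authorize(self, caller_id: str) -> None:
        # Rule 2: fail closed until an allowlist exists.
        if not self.allowlist:
            raise NotAuthorized("no allowlist configured; refusing all callers")
        if caller_id not in self.allowlist:
            raise NotAuthorized(f"caller {caller_id!r} is not allowlisted")

    def resolve_approval(self, caller_id: str, session_id: str) -> str:
        # Rule 3: re-check the caller on every request; knowing a
        # session ID is never a substitute for authorization.
        self.authorize(caller_id)
        if session_id not in self.sessions:
            raise KeyError(f"unknown session {session_id!r}")
        # Rule 4: no per-caller ownership check. Every allowlisted
        # caller is equally trusted within a single adapter.
        return f"approval resolved in session {session_id}"


def is_loopback_bind(host: str) -> bool:
    """Rule 5: True only for loopback binds (127.0.0.0/8, ::1, localhost)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname: treat as non-loopback
```

For example, `AdapterAuth().authorize("alice")` raises `NotAuthorized` because no allowlist is set, and `is_loopback_bind("0.0.0.0")` returns `False`, which is the break-glass case in rule 5.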

 ---

@@ -138,60 +223,71 @@ allows; it is what Hermes loads.

 ### 3.1 In Scope

-- Escape from a declared OS-level isolation posture (§2.1): an
+- Escape from a declared OS-level isolation posture (§2.2): an
  attacker-controlled code path reaching state that the posture
  claimed to confine.
-- Unauthorized gateway access: a caller outside the configured
-  allowlist dispatching work, receiving output, or resolving
-  approvals (§2.4).
+- Unauthorized external-surface access: a caller outside the
+  configured authorization set (allowlist, or OS-level equivalent
+  for local-IPC surfaces) dispatching work, receiving output, or
+  resolving approvals (§2.6).
 - Credential exfiltration: leakage of operator credentials or
  session authorization material to a destination outside the
-  operator's trust envelope.
-- Untrusted input chaining into agent-loaded content: an untrusted
-  input surface chains into a write whose target is a location
-  Hermes loads and executes on its own initiative (§2.5).
-- Output integrity failures into external platforms: agent output
-  rendered on a receiving platform with unintended authority —
-  broadcast-mention passthrough, content that fetches attacker
-  resources for every recipient, markup injection into hosted UIs.
+  trust envelope, through a path Hermes Agent should have closed
+  (an environment-scrubbing bug, credentials written to adapter
+  logs, a transport error that flushes credentials to an upstream,
+  etc.).
 - Trust-model documentation violations: code behaving contrary to
-  what this policy states, where an operator relying on the policy
-  would reasonably expect otherwise.
+  what this policy, Hermes Agent's own documentation, or reasonable
+  operator expectations would predict — including cases where
+  Hermes Agent has documented a stance about how its output should
+  be rendered by a consuming layer (dashboard, gateway adapter,
+  file writer, shell) and a code path breaks that stance.

 ### 3.2 Out of Scope

 "Out of scope" here means "not a security vulnerability under this
 policy." It does not mean "not worth reporting." Improvements to the
 in-process heuristics, hardening ideas, and UX fixes are welcome as
-regular issues or pull requests — we can always make the approval
-gate catch more patterns, make redaction smarter, or tighten adapter
-behavior. These items just don't go through the private-disclosure
-channel and don't receive advisories.
+regular issues or pull requests — the approval gate can always catch
+more patterns, redaction can always get smarter, adapter behavior
+can always be tightened. These items just don't go through the
+private-disclosure channel and don't receive advisories.

-- **Bypasses of in-process heuristics (§2.3)** — approval-gate regex
+- **Bypasses of in-process heuristics (§2.4)** — approval-gate regex
  bypasses, redaction bypasses, Skills Guard pattern bypasses, and
  analogous reports against future heuristics. These components are
  not boundaries; defeating them is not a vulnerability under this
  policy.
-- **Prompt injection that does not chain to a §3.1 outcome.** Getting
-  the LLM to emit unusual text or "ignore previous instructions" is
-  not itself a vulnerability; it becomes one only when it results in
-  something §3.1 describes.
+- **Prompt injection per se.** Getting the LLM to emit unusual
+  output — via injected content, hallucination, training artifacts,
+  or any other cause — is not itself a vulnerability. "I achieved
+  prompt injection" without a chained §3.1 outcome is not an
+  actionable report under this policy.
 - **Consequences of a chosen isolation posture.** Reports that a
  code path operating within its posture's scope can do what that
-  posture permits are not vulnerabilities. Examples: shell tools
-  reaching host state under the local backend; code-execution or
-  file tools reaching host state under terminal-backend isolation
-  that only sandboxes shell; reports whose preconditions require
-  pre-existing write access to operator-owned configuration or
-  credential files (those are already inside the operator's trust
-  envelope).
+  posture permits are not vulnerabilities. Examples: shell or file
+  tools reaching host state under the local backend; code-execution
+  or MCP subprocesses reaching host state under terminal-backend
+  isolation, which sandboxes only shell and file tools; reports whose preconditions
+  require pre-existing write access to operator-owned configuration
+  or credential files (those are already inside the trust envelope).
+- **Documented break-glass settings.** Operator-selected trade-offs
+  that explicitly disable protections: `--insecure` and equivalent
+  flags on the dashboard or other components, disabled approvals,
+  local backend in production, development profiles that bypass
+  hermes-home security, and similar. Reports against those
+  configurations are not vulnerabilities: disabling the protection is the setting's purpose.
+- **Community-contributed skills and plugins.** Third-party skills
+  (including the community skills repository) and third-party
+  plugins are in the operator's review surface, not Hermes Agent's
+  trust surface (§2.4, §2.5). A skill or plugin doing something
+  malicious is the expected failure mode of one that wasn't
+  reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes
+  Agent's skill-install or plugin-install path that prevent the
+  operator from seeing what they're installing are in scope under
+  §3.1.
 - **Public exposure without external controls.** Exposing the
  gateway or API to the public internet without authentication,
  VPN, or firewall.
-- **Documented break-glass settings.** Disabled approvals, local
-  backend in production, development profiles that bypass
-  hermes-home security, and similar operator-selected trade-offs.
 - **Tool-level read/write restrictions on a posture where shell is
  permitted.** If a path is reachable via the terminal tool, reports
  that other file tools can reach it add nothing.
@@ -201,25 +297,26 @@ channel and don't receive advisories.
 ## 4. Deployment Hardening

 The single most important hardening decision is matching isolation
-(§2.1) to the trust of the content the agent will ingest. Beyond
+(§2.2) to the trust of the content the agent will ingest. Beyond
 that:

 - Run the agent as a non-root user. The supplied container image
  does this by default.
 - Keep credentials in the operator credential file with tight
  permissions, never in the main config, never in version control.
-  Under OpenShell, use its Provider store rather than an on-disk
+  Under OpenShell, use the Provider store rather than an on-disk
  credential file.
 - Do not expose the gateway or API to the public internet without
  VPN, Tailscale, or firewall protection. Under OpenShell, use the
  network policy layer to restrict egress.
-- Configure a caller allowlist for every gateway adapter you enable
-  (§2.4).
-- Review third-party skills before install. Skills Guard reports and
-  the install audit log are the review surface.
-- The OSV malware database is consulted before launching
-  ecosystem-resolved MCP servers. Additional supply-chain guards
-  on dependency and bundled-package changes run in CI; see
+- Configure a caller allowlist for every network-exposed adapter
+  you enable (§2.6).
+- Review third-party skills and plugins before install (§2.4,
+  §2.5). For skills, this means reading the Python code and scripts,
+  not just SKILL.md. Skills Guard reports and the install audit
+  log support that review; they do not replace it.
+- Hermes Agent includes supply-chain guards for MCP server
+  launches and for dependency / bundled-package changes in CI; see
  `CONTRIBUTING.md` for specifics.
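The credential-file item above asks for tight permissions without defining "tight". A minimal POSIX sketch, assuming it means owned by the user running the agent with no group or other permission bits (for example mode 0600); `credential_file_problems` and the file name are hypothetical, and this is not presented as a check Hermes Agent ships.

```python
import os
import stat
import tempfile
from pathlib import Path


def credential_file_problems(path: Path) -> list[str]:
    """Return reasons why `path` is not a tightly permissioned credential file.

    POSIX-only sketch: "tight" is assumed to mean owned by the current user
    and carrying no group or other permission bits (e.g. mode 0600).
    """
    problems = []
    info = path.stat()
    mode = stat.S_IMODE(info.st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"group/other access bits set (mode {oct(mode)})")
    if info.st_uid != os.getuid():
        problems.append("not owned by the user running the agent")
    return problems


# Usage: a world-readable file is flagged; chmod 600 clears the finding.
with tempfile.TemporaryDirectory() as tmp:
    cred = Path(tmp, "credentials")  # hypothetical file name
    cred.write_text("PROVIDER_API_KEY=example\n")
    cred.chmod(0o644)
    loose = credential_file_problems(cred)
    cred.chmod(0o600)
    tight = credential_file_problems(cred)
```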

 ---