mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-01 07:01:41 +00:00
New opt-in plugin that scans the content passed to write_file / patch /
skill_manage for 25 known-dangerous code patterns (pickle.load,
yaml.load, eval(, os.system, subprocess(shell=True), child_process.exec,
dangerouslySetInnerHTML, innerHTML/outerHTML/document.write/
insertAdjacentHTML, crypto.createCipher (no IV), AES ECB,
TLS verification disabled, XXE-prone xml.etree/minidom parsers,
<script src=//...> without SRI, torch.load without weights_only=True,
GitHub Actions ${{ github.event.* }} injection) and appends a
"Security guidance" warning block to the tool result via the
transform_tool_result hook.

Default behaviour is non-blocking: the file is written and the warning
rides back to the model in the next turn so it can self-correct or
document why the construct is safe. SECURITY_GUIDANCE_BLOCK=1 upgrades
to refusing the write entirely; SECURITY_GUIDANCE_DISABLE=1 is the
kill switch.

Pattern data (patterns.py) is a verbatim Apache-2.0 fork of
Anthropic's claude-plugins-official/plugins/security-guidance/hooks/
patterns.py at commit 0bde168 (2026-05-26). LICENSE and NOTICE
preserve attribution. The Hermes-side plugin glue (__init__.py,
plugin.yaml, README.md, tests) is original work.

Plugin is opt-in like all bundled plugins:

    hermes plugins enable security-guidance

Inspired by https://x.com/ClaudeDevs/status/1927108527247... (Anthropic
shipped this as their security-guidance plugin for Claude Code on
2026-05-26 with a measured 30-40% reduction in security-related PR
comments on internal rollout).

What's NOT ported (deferred):

* Layer 2 (LLM diff review on turn end): would route through main
  model by default on Hermes, real money on reasoning models. A
  follow-up can wire it to a cheap aux model with explicit opt-in.
* Layer 3 (agentic commit-time review): agent can run this on
  demand via delegate_task today.
* .hermes/security-guidance.md project-rules file: only used by
  layers 2/3 upstream.

88 lines
4.5 KiB
Markdown
# security-guidance

Pattern-matched security warnings for code the agent writes. When the agent
calls `write_file`, `patch`, or `skill_manage` with content that matches a
known-dangerous code pattern (`eval`, `pickle.load`, `yaml.load`, `os.system`,
subprocess with `shell=True`, `dangerouslySetInnerHTML`, `verify=False`, ECB
mode, GitHub Actions `${{ github.event.* }}` injection, `torch.load` without
`weights_only=True`, ...), the plugin appends a warning to the tool's result.
The file is still written; the model sees the warning in the next turn and
can fix the code or briefly document why the construct is safe.

This is layer 1 of Anthropic's `security-guidance` plugin design: a fast
first pass that runs locally with zero LLM tokens spent. Layers 2 and 3 (LLM
diff review on turn end, agentic commit review) are not ported; the agent
can already run those kinds of reviews on demand via `delegate_task`.

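As a rough illustration of the warn-by-default flow described above, the sketch below appends a warning block to a tool result. It is not the plugin's actual code: `append_guidance`, the `findings` list, and the assumption that a tool result is a plain string are all hypothetical, and the real `transform_tool_result` hook signature may differ.

```python
def append_guidance(result: str, findings: list[str]) -> str:
    """Return the tool result with a warning block appended (hypothetical helper).

    Assumes the tool result is a plain string and each finding is a short,
    human-readable message. The write itself is never blocked here.
    """
    if not findings:
        # Nothing matched: hand the result back untouched.
        return result
    lines = ["", "⚠️ Security guidance"]
    lines += [f"- {finding}" for finding in findings]
    return result + "\n".join(lines)
```

Because the original result is returned unchanged when nothing matches, a clean write costs nothing extra in the next turn.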
## Coverage (25 rules)

The pattern set is forked verbatim from Anthropic's `claude-plugins-official`
under Apache-2.0. Categories:

| Category | Rules |
|---|---|
| Unsafe deserialization | `pickle.load`, `cPickle/cloudpickle/dill.load`, `marshal.loads`, `shelve.open`, `yaml.load`, `yaml.unsafe_load`, `torch.load` (without `weights_only=True`), `joblib.load`, `pandas.read_pickle`, `numpy.load(allow_pickle=True)` |
| Command injection | `os.system`, `subprocess(..., shell=True)`, JS `child_process.exec`, Go `exec.Command("sh"...)` |
| Code injection | `eval(`, JS `new Function(...)` |
| XSS sinks | `.innerHTML =`, `.outerHTML =`, `.insertAdjacentHTML(`, `document.write`, React `dangerouslySetInnerHTML` |
| Crypto footguns | AES ECB mode, Node `crypto.createCipher` (no IV), TLS verification disabled (`verify=False`, `rejectUnauthorized: false`, `InsecureSkipVerify: true`, ...) |
| XXE | `xml.etree`, `minidom`, `xml.sax` without `defusedxml` |
| Supply chain | `<script src="https://..."` without `integrity=` SRI hash |
| CI/CD injection | GitHub Actions workflow files using `${{ github.event.* }}` in `run:` |

The pattern data uses Python regex plus literal-substring matching. Each rule
carries a per-extension `path_filter` lambda: Python-only rules skip `.js`,
JS rules skip `.py`, and all rules skip `.md/.txt/.rst/.json/.yaml`. Lookbehind
assertions exclude method calls (so `model.eval()` and `redis.eval()` don't
trip the `eval(` rule). The false-positive rate is mediocre but tolerable; the
plugin is warn-by-default precisely because of that.

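The combination of a path filter and a lookbehind can be sketched as follows. This is a minimal illustration, not the contents of `patterns.py`: the `Rule` class, the `scan` function, and the exact regexes are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable

# Documentation and data files that every rule skips (list taken from the README).
DOC_SUFFIXES = {".md", ".txt", ".rst", ".json", ".yaml"}


def suffix(path: str) -> str:
    """Lower-cased file extension, e.g. '.py'."""
    return PurePosixPath(path).suffix.lower()


@dataclass(frozen=True)
class Rule:  # hypothetical structure, for illustration only
    name: str
    pattern: re.Pattern
    path_filter: Callable[[str], bool]


RULES = [
    # Negative lookbehind: do not match when "eval(" follows a word character
    # or a dot, so model.eval() and literal_eval() are left alone.
    Rule("eval", re.compile(r"(?<![\w.])eval\("),
         lambda p: suffix(p) not in DOC_SUFFIXES),
    # Literal-substring match, restricted to Python files.
    Rule("os.system", re.compile(re.escape("os.system(")),
         lambda p: suffix(p) == ".py"),
]


def scan(path: str, content: str) -> list[str]:
    """Names of the rules that apply to this path and match the content."""
    return [r.name for r in RULES
            if r.path_filter(path) and r.pattern.search(content)]
```

With these assumed rules, `scan("train.py", "model.eval()")` returns an empty list while `scan("app.py", "x = eval(user_input)")` reports the `eval` rule.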
## Enabling

Plugins are opt-in. Add it to your allow-list:

```bash
hermes plugins enable security-guidance
# or edit ~/.hermes/config.yaml manually:
plugins:
  enabled:
    - security-guidance
```

## Modes

| Env var | Default | Effect |
|---|---|---|
| (none) | warn | Appends a `⚠️ Security guidance` block to the tool result. The file is written. |
| `SECURITY_GUIDANCE_BLOCK=1` | unset | Refuses the write entirely with the warning as the block reason. Use for stricter environments. |
| `SECURITY_GUIDANCE_DISABLE=1` | unset | Kill switch: the plugin loads but does nothing. |

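One way the table above could translate into code is sketched below. The function name `resolve_mode`, the mode labels, and the choice to let the kill switch take precedence when both variables are set are assumptions; the README does not specify how the plugin resolves a conflict.

```python
import os
from typing import Mapping, Optional


def resolve_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Map the environment to 'off', 'block', or 'warn' (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the kill switch wins if both variables are set.
    if env.get("SECURITY_GUIDANCE_DISABLE") == "1":
        return "off"
    if env.get("SECURITY_GUIDANCE_BLOCK") == "1":
        return "block"
    # Neither variable set: warn and still write the file.
    return "warn"
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.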
## What it does **not** do (yet)

* **No LLM diff review.** Anthropic's layer 2 spawns an auxiliary LLM call
  on every agent turn that touched files. On Hermes that would route
  through the main model by default (`auxiliary_client._resolve_auto()` is
  main-model-first), which is real money on reasoning models. A separate
  PR can wire layer 2 to a cheap auxiliary model with explicit opt-in.
* **No agentic commit review.** Anthropic's layer 3 spawns an SDK subagent
  with `Read`/`Grep`/`Glob` to trace data flow on `git commit`. That's a
  follow-up that would build on `delegate_task`.
* **No project-local rules file.** Anthropic's `.claude/claude-security-guidance.md`
  is read by their layer 2/3 LLM prompts, not the pattern scanner. We can
  add an analogous `.hermes/security-guidance.md` once layer 2 lands.

## Limitations

This is a best-effort assistive tool. Pattern matching can miss
vulnerabilities and produce false positives. Treat warnings as suggestions,
not a substitute for code review, SAST, dependency scanning, or pen testing.

## Attribution and licensing

* `patterns.py` is a verbatim fork from
  [`anthropics/claude-plugins-official`](https://github.com/anthropics/claude-plugins-official/tree/main/plugins/security-guidance/hooks)
  (commit `0bde168`, 2026-05-26), licensed under the
  [Apache License 2.0](./LICENSE). See [NOTICE](./NOTICE) for the full
  attribution.
* `__init__.py`, `plugin.yaml`, `README.md`, and tests are original work by
  NousResearch, MIT-licensed alongside the rest of hermes-agent.