hermes-agent

mirrors/hermes-agent

Fork 0

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-02 12:13:05 +00:00

Commit graph

Author	SHA1	Message	Date
MarioYounger	3b2bb30c5d	fix(security): harden heredoc approval, NFKC homograph fold, env-var filter Three independent security-scanner hardenings, re-homed onto the current shared threat-pattern architecture (tools/threat_patterns.py): - approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The existing heredoc pattern only covered python/perl/ruby/node, so `bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines whose inner commands don't individually match a pattern — with no prompt. - threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before pattern matching so full-width / compatibility homographs (e.g. `ｃａｔ ~/.hermes/.env`) are folded to ASCII and no longer bypass the keyword scanners. Invisible-char detection still runs on the raw content first (NFKC can strip those codepoints). - code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the sandbox env. PASS was intentionally dropped from the original proposal — it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while PASSWORD/PASSWD already cover the credential cases. The original PR also proposed a 'synonym' injection pattern block (overlook/forget/set aside/bypass/discard + developer-mode); dropped here because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't forget to follow the rules", "run in developer mode"), exactly the bossy-English class threat_patterns.py is documented to avoid. Salvaged from #9028. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-30 02:59:46 -07:00
Teknium	099df3cd89	fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925 ) The known_c2_framework threat pattern included 'praxis' in its alternation alongside genuine offensive-security tool brands (Cobalt Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those distinctive brand names, 'praxis' is a common English word (Greek for practice/action) and a legitimate agent name, so any context file that mentioned an agent named Praxis matched at 'context' scope and the whole AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it reached the system prompt. Remove 'praxis' from the alternation and add a guard comment: every token in this list must be a distinctive tool brand, not a common word. Real C2 brands still fire.	2026-06-26 00:36:01 -07:00
Teknium	0dee92df22	feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269 ) Hardens the context window against Brainworm-class promptware attacks (see #496). Three changes: 1. tools/threat_patterns.py — single source of truth for injection/promptware patterns. Replaces the duplicated pattern lists in prompt_builder.py and memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration, heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity override, known framework names). Three scopes — 'all' (narrow, classic injection), 'context' (adds promptware/role-play, broader detection), 'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes). 2. MemoryStore.load_from_disk() now scans entries at snapshot-build time. Poisoned entries are replaced with [BLOCKED: ...] placeholders in the frozen system-prompt snapshot. Live state keeps the original so the user can still inspect + remove via memory(action=read/remove). Scan is deterministic from disk bytes — prefix-cache invariant holds. 3. make_tool_result_message() wraps results from high-risk tools (web_extract, web_search, browser_, mcp_) in <untrusted_tool_result source="...">...</untrusted_tool_result> delimiters with framing prose telling the model the content is data, not instructions. Architectural defense against indirect injection from poisoned web pages, GitHub issues, MCP responses — does NOT regex-scan tool results (pattern arms race + per-iteration latency). Multimodal content lists pass through unwrapped to preserve adapter compatibility. Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack behavior, NOT on bossy English. Dropped patterns suggested in #496 that would have tripped legitimate content: standalone 'you are obligated to', 'do not respond immediately', 'you must X' without a C2-verb anchor. Validation: - 257/257 targeted tests pass (test_threat_patterns + test_memory_tool + test_tool_dispatch_helpers + test_prompt_builder) - E2E run with real Brainworm payload: blocked from AGENTS.md context-file path, blocked from MEMORY.md snapshot, wrapped in delimiters when arriving via web_extract. Legitimate 'you must follow conventions' phrasing not flagged. Explicitly NOT in this PR (per #496 discussion): - Per-tool-result regex scanning (pattern arms race) - SessionBehaviorMonitor / polling-loop detection (wrong layer) - Outbound network gating (Docker backend already covers this) - security.context_scanning warn\|block knob (current behavior is always block-with-placeholder — there's no warn mode that makes sense) Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2. Phase 3 stays in tracking issue territory.	2026-05-25 14:52:24 -07:00

Author

SHA1

Message

Date

MarioYounger

3b2bb30c5d

fix(security): harden heredoc approval, NFKC homograph fold, env-var filter

Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):

- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
  existing heredoc pattern only covered python/perl/ruby/node, so
  `bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
  whose inner commands don't individually match a pattern — with no prompt.

- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
  pattern matching so full-width / compatibility homographs (e.g.
  `ｃａｔ ~/.hermes/.env`) are folded to ASCII and no longer bypass the
  keyword scanners. Invisible-char detection still runs on the raw content
  first (NFKC can strip those codepoints).

- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
  vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
  sandbox env. PASS was intentionally dropped from the original proposal —
  it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
  PASSWORD/PASSWD already cover the credential cases.

The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.

Salvaged from #9028.

Co-authored-by: Hermes Agent <agent@nousresearch.com>

2026-06-30 02:59:46 -07:00

Teknium

099df3cd89

fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925 )

The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.

Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.

2026-06-26 00:36:01 -07:00

Teknium

0dee92df22

feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269 )

Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:

1. tools/threat_patterns.py — single source of truth for injection/promptware
   patterns. Replaces the duplicated pattern lists in prompt_builder.py and
   memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
   heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
   override, known framework names). Three scopes — 'all' (narrow, classic
   injection), 'context' (adds promptware/role-play, broader detection),
   'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).

2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
   Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
   frozen system-prompt snapshot. Live state keeps the original so the
   user can still inspect + remove via memory(action=read/remove). Scan is
   deterministic from disk bytes — prefix-cache invariant holds.

3. make_tool_result_message() wraps results from high-risk tools
   (web_extract, web_search, browser_*, mcp_*) in
   <untrusted_tool_result source="...">...</untrusted_tool_result>
   delimiters with framing prose telling the model the content is data,
   not instructions. Architectural defense against indirect injection
   from poisoned web pages, GitHub issues, MCP responses — does NOT
   regex-scan tool results (pattern arms race + per-iteration latency).
   Multimodal content lists pass through unwrapped to preserve adapter
   compatibility.

Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.

Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
  test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
  path, blocked from MEMORY.md snapshot, wrapped in delimiters when
  arriving via web_extract. Legitimate 'you must follow conventions'
  phrasing not flagged.

Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
  block-with-placeholder — there's no warn mode that makes sense)

Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.

2026-05-25 14:52:24 -07:00

3 commits