Commit graph

3 commits

Author SHA1 Message Date
MarioYounger
3b2bb30c5d fix(security): harden heredoc approval, NFKC homograph fold, env-var filter
Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):

- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
  existing heredoc pattern only covered python/perl/ruby/node, so
  `bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
  whose inner commands don't individually match a pattern — with no prompt.

- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
  pattern matching so full-width / compatibility homographs (e.g.
  `cat ~/.hermes/.env`) are folded to ASCII and no longer bypass the
  keyword scanners. Invisible-char detection still runs on the raw content
  first (NFKC can strip those codepoints).

- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
  vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
  sandbox env. PASS was intentionally dropped from the original proposal —
  it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
  PASSWORD/PASSWD already cover the credential cases.

The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.

Salvaged from #9028.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-30 02:59:46 -07:00
Teknium
099df3cd89
fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925)
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.

Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
2026-06-26 00:36:01 -07:00
Teknium
0dee92df22
feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269)
Hardens the context window against Brainworm-class promptware attacks
(see #496). Three changes:

1. tools/threat_patterns.py — single source of truth for injection/promptware
   patterns. Replaces the duplicated pattern lists in prompt_builder.py and
   memory_tool.py. Adds ~15 new Brainworm/C2 patterns (node registration,
   heartbeat/beacon, pull tasking, anti-forensic disk avoidance, identity
   override, known framework names). Three scopes — 'all' (narrow, classic
   injection), 'context' (adds promptware/role-play, broader detection),
   'strict' (adds persistence/SSH-backdoor patterns for user-mediated writes).

2. MemoryStore.load_from_disk() now scans entries at snapshot-build time.
   Poisoned entries are replaced with [BLOCKED: ...] placeholders in the
   frozen system-prompt snapshot. Live state keeps the original so the
   user can still inspect + remove via memory(action=read/remove). Scan is
   deterministic from disk bytes — prefix-cache invariant holds.

3. make_tool_result_message() wraps results from high-risk tools
   (web_extract, web_search, browser_*, mcp_*) in
   <untrusted_tool_result source="...">...</untrusted_tool_result>
   delimiters with framing prose telling the model the content is data,
   not instructions. Architectural defense against indirect injection
   from poisoned web pages, GitHub issues, MCP responses — does NOT
   regex-scan tool results (pattern arms race + per-iteration latency).
   Multimodal content lists pass through unwrapped to preserve adapter
   compatibility.

Pattern philosophy: anchor on C2-specific vocabulary or unambiguous attack
behavior, NOT on bossy English. Dropped patterns suggested in #496 that
would have tripped legitimate content: standalone 'you are obligated to',
'do not respond immediately', 'you must X' without a C2-verb anchor.

Validation:
- 257/257 targeted tests pass (test_threat_patterns + test_memory_tool +
  test_tool_dispatch_helpers + test_prompt_builder)
- E2E run with real Brainworm payload: blocked from AGENTS.md context-file
  path, blocked from MEMORY.md snapshot, wrapped in delimiters when
  arriving via web_extract. Legitimate 'you must follow conventions'
  phrasing not flagged.

Explicitly NOT in this PR (per #496 discussion):
- Per-tool-result regex scanning (pattern arms race)
- SessionBehaviorMonitor / polling-loop detection (wrong layer)
- Outbound network gating (Docker backend already covers this)
- security.context_scanning warn|block knob (current behavior is always
  block-with-placeholder — there's no warn mode that makes sense)

Closes #496 for Phase 1 + the architectural delimiter piece of Phase 2.
Phase 3 stays in tracking issue territory.
2026-05-25 14:52:24 -07:00