Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-04-25 00:51:20 +00:00
fix(security): catch multi-word prompt injection in skills_guard
The regex `ignore\s+(previous|all|...)\s+instructions` only matched a single keyword between 'ignore' and 'instructions'. Phrases like 'ignore all prior instructions' bypassed the scanner entirely. Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions` to allow arbitrary words before the keyword.
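You can see the bypass by running both patterns against the phrase from the message above. This is a minimal sketch using Python's standard `re` module. The two pattern strings are copied from the diff. The `OLD`/`NEW` names and the sample phrases are illustrative only, and the sketch assumes the scanner applies each pattern with `re.search`-style matching.

```python
import re

# Pattern strings before and after this commit (copied from the diff).
OLD = r'ignore\s+(previous|all|above|prior)\s+instructions'
NEW = r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions'

# Illustrative inputs, not taken from the project's test suite.
samples = [
    "ignore previous instructions",              # one keyword: both match
    "ignore all prior instructions",             # extra word: only NEW matches
    "please ignore your previous instructions",  # filler word: only NEW matches
]

for text in samples:
    old_hit = re.search(OLD, text) is not None
    new_hit = re.search(NEW, text) is not None
    print(f"{text!r}: old={old_hit} new={new_hit}")
```

The added group is non-capturing (`(?:...)`), so `group(1)` still returns the matched keyword. For "ignore all prior instructions" that keyword is `"prior"`, and any code that reads the first capture group keeps working unchanged.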
Parent: 6366177118
Commit: 4ea29978fc
1 changed file with 1 addition and 1 deletion
@@ -157,7 +157,7 @@ THREAT_PATTERNS = [
      "markdown link with variable interpolation"),
 
     # ── Prompt injection ──
-    (r'ignore\s+(previous|all|above|prior)\s+instructions',
+    (r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions',
      "prompt_injection_ignore", "critical", "injection",
      "prompt injection: ignore previous instructions"),
     (r'you\s+are\s+now\s+',
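Each `THREAT_PATTERNS` entry in the hunk is a 5-tuple: a regex, a pattern id, a severity, a category, and a description. This diff does not show how skills_guard consumes the list. The scanner below is a hypothetical sketch of one way such a list could be applied. The `Finding` type, the `scan` function, and the use of `re.IGNORECASE` are all assumptions made for illustration, not the project's code.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    # Hypothetical result type; field names are assumptions.
    pattern_id: str
    severity: str
    category: str
    description: str
    matched: str


# One entry in the (regex, id, severity, category, description) shape from the diff.
THREAT_PATTERNS = [
    (r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions',
     "prompt_injection_ignore", "critical", "injection",
     "prompt injection: ignore previous instructions"),
]

# Compile each regex once. Case-insensitive matching is an assumption of this sketch.
_COMPILED = [(re.compile(p, re.IGNORECASE), *meta) for p, *meta in THREAT_PATTERNS]


def scan(text: str) -> list[Finding]:
    """Hypothetical scanner: one Finding per pattern that matches anywhere in text."""
    findings = []
    for regex, pid, severity, category, description in _COMPILED:
        m = regex.search(text)
        if m:
            findings.append(Finding(pid, severity, category, description, m.group(0)))
    return findings
```

For example, `scan("Please IGNORE all prior instructions now")` returns a single `Finding` with id `prompt_injection_ignore`. Precompiling keeps the per-scan cost to one `search` call per pattern.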