mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-30 06:41:51 +00:00
fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
This commit is contained in:
parent
b207dc28b3
commit
d3c167b644
7 changed files with 846 additions and 19 deletions
|
|
@ -254,3 +254,148 @@ def get_read_block_error(path: str) -> Optional[str]:
|
|||
)
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cross-profile write guard (#TBD)
|
||||
#
|
||||
# Hermes profiles are separate HERMES_HOME dirs under
|
||||
# ``<root>/profiles/<name>/``. Each profile has its own skills/, plugins/,
|
||||
# cron/, memories/. When an agent runs under one profile, writing into
|
||||
# ANOTHER profile's directories is almost always wrong — those skills /
|
||||
# plugins / cron jobs / memories affect a different session the user runs
|
||||
# from a different shell.
|
||||
#
|
||||
# Soft guard, NOT a security boundary: the agent runs as the same OS user
|
||||
# and has unrestricted terminal access, so this returns a warning the model
|
||||
# can choose to honor or override with ``cross_profile=True``. Same shape
|
||||
# as the dangerous-command approval flow — the agent is told the boundary
|
||||
# exists, and explicit user direction is required to cross it.
|
||||
#
|
||||
# Reference: May 2026 incident where a hermes-security profile session
|
||||
# edited skills under both ``~/.hermes/profiles/hermes-security/skills/``
|
||||
# AND ``~/.hermes/skills/`` (the default profile's skills) without realizing
|
||||
# the second path belonged to a different profile.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Profile-scoped directories under HERMES_HOME / <root> / <root>/profiles/<X>/
|
||||
# that should be guarded. Adding a new area here extends the guard with no
|
||||
# other code change.
|
||||
PROFILE_SCOPED_AREAS = ("skills", "plugins", "cron", "memories")
|
||||
|
||||
|
||||
def _resolve_active_profile_name() -> str:
|
||||
"""Return the active profile name derived from HERMES_HOME.
|
||||
|
||||
``~/.hermes`` -> ``"default"``
|
||||
``~/.hermes/profiles/X`` -> ``"X"``
|
||||
|
||||
Falls back to ``"default"`` on any resolution failure so the guard
|
||||
never raises into the tool path.
|
||||
"""
|
||||
try:
|
||||
home_real = _hermes_home_path().resolve()
|
||||
root_real = _hermes_root_path().resolve()
|
||||
except (OSError, RuntimeError):
|
||||
return "default"
|
||||
profiles_dir = root_real / "profiles"
|
||||
try:
|
||||
rel = home_real.relative_to(profiles_dir)
|
||||
parts = rel.parts
|
||||
if len(parts) >= 1:
|
||||
return parts[0]
|
||||
except ValueError:
|
||||
pass
|
||||
return "default"
|
||||
|
||||
|
||||
def classify_cross_profile_target(path: str) -> Optional[dict]:
|
||||
"""Classify a write target as cross-profile if it lands in another
|
||||
profile's scoped area (skills/plugins/cron/memories).
|
||||
|
||||
Returns ``None`` when the target is outside Hermes scope, or is inside
|
||||
the ACTIVE profile, or doesn't hit a profile-scoped area. Otherwise
|
||||
returns a dict with:
|
||||
|
||||
* ``active_profile``: name of the profile the agent is running as
|
||||
* ``target_profile``: name of the profile the path belongs to
|
||||
* ``area``: which scoped area (``"skills"``, ``"plugins"``, etc.)
|
||||
* ``target_path``: the resolved path string
|
||||
|
||||
The caller decides what to do with the result — surface a warning to
|
||||
the model, prompt the user, or (with explicit consent /
|
||||
``cross_profile=True``) proceed anyway.
|
||||
"""
|
||||
try:
|
||||
target = Path(os.path.expanduser(str(path))).resolve()
|
||||
root_real = _hermes_root_path().resolve()
|
||||
except (OSError, RuntimeError):
|
||||
return None
|
||||
|
||||
target_profile: Optional[str] = None
|
||||
area: Optional[str] = None
|
||||
|
||||
try:
|
||||
rel = target.relative_to(root_real)
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
parts = rel.parts
|
||||
if not parts:
|
||||
return None
|
||||
|
||||
if parts[0] in PROFILE_SCOPED_AREAS:
|
||||
# ``<root>/<area>/...`` → default profile.
|
||||
target_profile = "default"
|
||||
area = parts[0]
|
||||
elif (
|
||||
parts[0] == "profiles"
|
||||
and len(parts) >= 3
|
||||
and parts[2] in PROFILE_SCOPED_AREAS
|
||||
):
|
||||
# ``<root>/profiles/<name>/<area>/...`` → named profile.
|
||||
target_profile = parts[1]
|
||||
area = parts[2]
|
||||
else:
|
||||
return None
|
||||
|
||||
active_profile = _resolve_active_profile_name()
|
||||
if target_profile == active_profile:
|
||||
# In-profile write — not a cross-profile event.
|
||||
return None
|
||||
|
||||
return {
|
||||
"active_profile": active_profile,
|
||||
"target_profile": target_profile,
|
||||
"area": area,
|
||||
"target_path": str(target),
|
||||
}
|
||||
|
||||
|
||||
def get_cross_profile_warning(path: str) -> Optional[str]:
|
||||
"""Return a model-facing warning string when ``path`` is cross-profile.
|
||||
|
||||
Returns ``None`` when the write is in-scope (same profile) or outside
|
||||
Hermes entirely. Caller is expected to surface the warning to the
|
||||
agent as a tool-result error, NOT to silently allow the write — the
|
||||
agent must either get explicit user direction to proceed, or pass
|
||||
``cross_profile=True`` to its write tool.
|
||||
|
||||
This is defense-in-depth: the terminal tool runs as the same OS user
|
||||
and can write any of these paths without going through this guard.
|
||||
Treat the guard as a confusion-reducer, not a security boundary.
|
||||
"""
|
||||
info = classify_cross_profile_target(path)
|
||||
if info is None:
|
||||
return None
|
||||
return (
|
||||
f"Cross-profile write blocked by soft guard: {info['target_path']} "
|
||||
f"belongs to Hermes profile {info['target_profile']!r}, but the "
|
||||
f"agent is running under profile {info['active_profile']!r}. "
|
||||
f"Editing another profile's {info['area']}/ will affect that "
|
||||
f"profile's future sessions, not the one you are currently in. "
|
||||
f"Confirm with the user before proceeding. To bypass this guard "
|
||||
f"after explicit user direction, retry the call with "
|
||||
f"``cross_profile=True``. (Defense-in-depth — not a security "
|
||||
f"boundary; the terminal tool can still bypass.)"
|
||||
)
|
||||
|
|
|
|||
|
|
@ -205,6 +205,40 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
|
|||
if _env_hints:
|
||||
stable_parts.append(_env_hints)
|
||||
|
||||
# Active-profile hint — names the Hermes profile the agent is running
|
||||
# under so it doesn't conflate ~/.hermes/skills/ (default profile) with
|
||||
# ~/.hermes/profiles/<active>/skills/ (this profile's). Deterministic
|
||||
# for the lifetime of the agent — profile name doesn't change
|
||||
# mid-session, so this doesn't break the prompt cache.
|
||||
# See file_safety._resolve_active_profile_name + classify_cross_profile_target
|
||||
# for the matching tool-side guard.
|
||||
try:
|
||||
from agent.file_safety import _resolve_active_profile_name
|
||||
active_profile = _resolve_active_profile_name()
|
||||
except Exception:
|
||||
active_profile = "default"
|
||||
if active_profile == "default":
|
||||
stable_parts.append(
|
||||
"Active Hermes profile: default. Other profiles (if any) live "
|
||||
"under ~/.hermes/profiles/<name>/. Each profile has its own "
|
||||
"skills/, plugins/, cron/, and memories/ that affect a different "
|
||||
"session than this one. Do not modify another profile's "
|
||||
"skills/plugins/cron/memories unless the user explicitly directs "
|
||||
"you to."
|
||||
)
|
||||
else:
|
||||
stable_parts.append(
|
||||
f"Active Hermes profile: {active_profile}. This session reads "
|
||||
f"and writes ~/.hermes/profiles/{active_profile}/. The default "
|
||||
f"profile's data lives at ~/.hermes/skills/, ~/.hermes/plugins/, "
|
||||
f"~/.hermes/cron/, ~/.hermes/memories/ — those belong to a "
|
||||
f"different session run from a different shell. Do NOT modify "
|
||||
f"another profile's skills/plugins/cron/memories unless the user "
|
||||
f"explicitly directs you to. The cross-profile write guard will "
|
||||
f"refuse such writes by default; pass cross_profile=True only "
|
||||
f"after explicit direction."
|
||||
)
|
||||
|
||||
platform_key = (agent.platform or "").lower().strip()
|
||||
if platform_key in PLATFORM_HINTS:
|
||||
stable_parts.append(PLATFORM_HINTS[platform_key])
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue