fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)

* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint

Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.

Three layers:

A. agent/file_safety.classify_cross_profile_target

   Classifies a write target against the active HERMES_HOME. Returns a
   {active_profile, target_profile, area, target_path} dict when the
   path lands in another profile's scoped area.
   PROFILE_SCOPED_AREAS = (skills, plugins, cron, memories).
   get_cross_profile_warning() wraps it into a model-facing error
   string that names both profiles, names the area, and points at the
   cross_profile=True bypass.

   Defense-in-depth, NOT a security boundary: the terminal tool runs as
   the same OS user and can write any of these paths directly. The
   guard exists to prevent confused-agent corruption, not to stop a
   determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
   still applies.

   Wired into tools/file_tools.write_file_tool and patch_tool with a
   cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
   advertise cross_profile so the model can pass it after explicit user
   direction. patch_tool extracts target paths from V4A patch bodies
   before checking (same shape as the existing sensitive-path check).

   skill_manage is already scoped to the active profile's SKILLS_DIR by
   construction, so no extra guard wiring is needed there. The D-side
   error message (below) still names other profiles when the skill
   exists elsewhere.

B. agent/system_prompt

   One deterministic line near the environment-hints block names the
   active profile and tells the model not to modify another profile's
   skills/plugins/cron/memories without explicit direction. Profile
   name is stable for the lifetime of the AIAgent, so the line is
   prompt-cache-safe.

D. tools/skill_manager_tool._skill_not_found_error

   Replaces the bare "Skill 'X' not found." with a message that:
   - names the active profile,
   - searches OTHER profiles' skills dirs for the same name,
   - names the profile(s) where the skill exists and the path,
   - suggests `hermes -p <name>` to switch profiles, or
     cross_profile=True for an explicit edit.

   All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
   write_file, remove_file) now go through the helper.

Reference incident (May 2026): a hermes-security profile session edited
skills under both ~/.hermes/profiles/hermes-security/skills/ AND
~/.hermes/skills/ (the default profile's skills) without realizing the
second path belonged to a different profile. Three of the four skill
files needed manual restoration afterward.

What this PR does NOT do:

  * No hard block. The terminal tool can still touch any of these paths
    with no guard, the same posture as the dangerous-command approval
    flow. SECURITY.md §3.2 applies.
  * No regex sweep on terminal commands for cross-profile paths. That
    direction is a Skills-Guard-style arms race (cd + relative paths,
    base64, etc.) and would false-positive on legitimate cross-profile
    reads. Filed as a follow-up.
  * No on-disk path migration. ~/.hermes/skills/ remains the default
    profile's skills dir; this PR is about telling the agent about that
    boundary, not changing the layout.
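
The classification rule described under layer A can be illustrated with a small, filesystem-free sketch. It is not the code from this commit: the function name `classify` and its explicit `root` and `active_profile` parameters are hypothetical stand-ins, and it compares pure path components only, so it does not resolve symlinks or `..` segments the way a real implementation would need to.

```python
from pathlib import PurePosixPath
from typing import Optional

# Mirrors the PROFILE_SCOPED_AREAS tuple named in the commit message.
SCOPED_AREAS = ("skills", "plugins", "cron", "memories")


def classify(target: str, root: str, active_profile: str) -> Optional[dict]:
    """Hypothetical stand-in for the classifier: pure path logic, no disk access."""
    try:
        parts = PurePosixPath(target).relative_to(PurePosixPath(root)).parts
    except ValueError:
        return None  # target is outside the Hermes root entirely

    if parts and parts[0] in SCOPED_AREAS:
        # <root>/<area>/... belongs to the default profile.
        owner, area = "default", parts[0]
    elif len(parts) >= 3 and parts[0] == "profiles" and parts[2] in SCOPED_AREAS:
        # <root>/profiles/<name>/<area>/... belongs to a named profile.
        owner, area = parts[1], parts[2]
    else:
        return None  # inside the root, but not a profile-scoped area

    if owner == active_profile:
        return None  # in-profile write, nothing to flag

    return {
        "active_profile": active_profile,
        "target_profile": owner,
        "area": area,
        "target_path": target,
    }
```

Under these assumptions, calling `classify("/home/u/.hermes/skills/dev/SKILL.md", "/home/u/.hermes", "hermes-security")` reproduces the incident shape: the target is attributed to the `default` profile while the session runs as `hermes-security`, so a dict is returned rather than `None`.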
Tests:

tests/agent/test_file_safety_cross_profile.py (16 tests)
  - _resolve_active_profile_name covers default/named/failure paths
  - classify_cross_profile_target covers all four scoped areas, both
    directions (default → named, named → default, named → named),
    non-Hermes paths, and root-level config files
  - get_cross_profile_warning covers in-profile no-op, cross-profile
    message shape, and the defense-in-depth self-documentation

tests/tools/test_cross_profile_guard.py (12 tests)
  - write_file: in-profile allow, cross-profile block,
    cross_profile=True bypass, non-Hermes pass-through
  - patch: replace-mode block, cross_profile=True bypass, V4A patch
    path extraction
  - skill_manage: error names the other profile (single + multiple),
    missing-everywhere falls back to skills_list hint
  - system prompt: contract-level checks (both branches present,
    cross_profile=True mentioned, ~/.hermes/profiles/ referenced)

All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.

E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.

* fix(code_execution): add cross_profile to write_file/patch stubs

The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().

Caught by CI on PR #31290.
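
The soft-guard wiring that the write tools are described as using (refuse by default, proceed when `cross_profile=True` is passed) might look roughly like the sketch below. Everything here is an assumption for illustration: `guarded_write`, the injected `warn` callable, and the JSON result shape are hypothetical and are not taken from tools/file_tools.

```python
import json
from typing import Callable, Optional


def guarded_write(
    path: str,
    content: str,
    warn: Callable[[str], Optional[str]],
    cross_profile: bool = False,
) -> str:
    """Hypothetical write tool that consults a soft guard before writing.

    `warn` plays the role of a warning helper: it returns a message when
    the path is cross-profile and None otherwise.
    """
    if not cross_profile:
        warning = warn(path)
        if warning is not None:
            # Surface the warning as a tool-result error; do not write.
            return json.dumps({"error": warning})

    # Either the path is in-profile or the caller explicitly opted in.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return json.dumps({"ok": True, "bytes": len(content.encode("utf-8"))})
```

Passing the guard in as a callable keeps the sketch testable without a real HERMES_HOME. The opt-in flag skips the check entirely, which matches the stated posture that this is a confusion-reducer and not a security boundary.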
2026-07-14 14:12:44 +00:00 · 2026-05-24 00:38:17 -07:00 · 2026-05-24 00:38:17 -07:00 · d3c167b644
commit d3c167b644
parent b207dc28b3
7 changed files with 846 additions and 19 deletions
--- a/agent/file_safety.py
+++ b/agent/file_safety.py
@@ -254,3 +254,148 @@ def get_read_block_error(path: str) -> Optional[str]:
        )

    return None
+
+
+# ---------------------------------------------------------------------------
+# Cross-profile write guard (#TBD)
+#
+# Hermes profiles are separate HERMES_HOME dirs under
+# ``<root>/profiles/<name>/``. Each profile has its own skills/, plugins/,
+# cron/, memories/. When an agent runs under one profile, writing into
+# ANOTHER profile's directories is almost always wrong — those skills /
+# plugins / cron jobs / memories affect a different session the user runs
+# from a different shell.
+#
+# Soft guard, NOT a security boundary: the agent runs as the same OS user
+# and has unrestricted terminal access, so this returns a warning the model
+# can choose to honor or override with ``cross_profile=True``. Same shape
+# as the dangerous-command approval flow — the agent is told the boundary
+# exists, and explicit user direction is required to cross it.
+#
+# Reference: May 2026 incident where a hermes-security profile session
+# edited skills under both ``~/.hermes/profiles/hermes-security/skills/``
+# AND ``~/.hermes/skills/`` (the default profile's skills) without realizing
+# the second path belonged to a different profile.
+# ---------------------------------------------------------------------------
+
+# Profile-scoped directories under HERMES_HOME / <root> / <root>/profiles/<X>/
+# that should be guarded. Adding a new area here extends the guard with no
+# other code change.
+PROFILE_SCOPED_AREAS = ("skills", "plugins", "cron", "memories")
+
+
+def _resolve_active_profile_name() -> str:
+    """Return the active profile name derived from HERMES_HOME.
+
+    ``~/.hermes``              -> ``"default"``
+    ``~/.hermes/profiles/X``  -> ``"X"``
+
+    Falls back to ``"default"`` on any resolution failure so the guard
+    never raises into the tool path.
+    """
+    try:
+        home_real = _hermes_home_path().resolve()
+        root_real = _hermes_root_path().resolve()
+    except (OSError, RuntimeError):
+        return "default"
+    profiles_dir = root_real / "profiles"
+    try:
+        rel = home_real.relative_to(profiles_dir)
+        parts = rel.parts
+        if len(parts) >= 1:
+            return parts[0]
+    except ValueError:
+        pass
+    return "default"
+
+
+def classify_cross_profile_target(path: str) -> Optional[dict]:
+    """Classify a write target as cross-profile if it lands in another
+    profile's scoped area (skills/plugins/cron/memories).
+
+    Returns ``None`` when the target is outside Hermes scope, or is inside
+    the ACTIVE profile, or doesn't hit a profile-scoped area. Otherwise
+    returns a dict with:
+
+      * ``active_profile``: name of the profile the agent is running as
+      * ``target_profile``: name of the profile the path belongs to
+      * ``area``: which scoped area (``"skills"``, ``"plugins"``, etc.)
+      * ``target_path``: the resolved path string
+
+    The caller decides what to do with the result — surface a warning to
+    the model, prompt the user, or (with explicit consent /
+    ``cross_profile=True``) proceed anyway.
+    """
+    try:
+        target = Path(os.path.expanduser(str(path))).resolve()
+        root_real = _hermes_root_path().resolve()
+    except (OSError, RuntimeError):
+        return None
+
+    target_profile: Optional[str] = None
+    area: Optional[str] = None
+
+    try:
+        rel = target.relative_to(root_real)
+    except ValueError:
+        return None
+
+    parts = rel.parts
+    if not parts:
+        return None
+
+    if parts[0] in PROFILE_SCOPED_AREAS:
+        # ``<root>/<area>/...`` → default profile.
+        target_profile = "default"
+        area = parts[0]
+    elif (
+        parts[0] == "profiles"
+        and len(parts) >= 3
+        and parts[2] in PROFILE_SCOPED_AREAS
+    ):
+        # ``<root>/profiles/<name>/<area>/...`` → named profile.
+        target_profile = parts[1]
+        area = parts[2]
+    else:
+        return None
+
+    active_profile = _resolve_active_profile_name()
+    if target_profile == active_profile:
+        # In-profile write — not a cross-profile event.
+        return None
+
+    return {
+        "active_profile": active_profile,
+        "target_profile": target_profile,
+        "area": area,
+        "target_path": str(target),
+    }
+
+
+def get_cross_profile_warning(path: str) -> Optional[str]:
+    """Return a model-facing warning string when ``path`` is cross-profile.
+
+    Returns ``None`` when the write is in-scope (same profile) or outside
+    Hermes entirely. Caller is expected to surface the warning to the
+    agent as a tool-result error, NOT to silently allow the write — the
+    agent must either get explicit user direction to proceed, or pass
+    ``cross_profile=True`` to its write tool.
+
+    This is defense-in-depth: the terminal tool runs as the same OS user
+    and can write any of these paths without going through this guard.
+    Treat the guard as a confusion-reducer, not a security boundary.
+    """
+    info = classify_cross_profile_target(path)
+    if info is None:
+        return None
+    return (
+        f"Cross-profile write blocked by soft guard: {info['target_path']} "
+        f"belongs to Hermes profile {info['target_profile']!r}, but the "
+        f"agent is running under profile {info['active_profile']!r}. "
+        f"Editing another profile's {info['area']}/ will affect that "
+        f"profile's future sessions, not the one you are currently in. "
+        f"Confirm with the user before proceeding. To bypass this guard "
+        f"after explicit user direction, retry the call with "
+        f"``cross_profile=True``. (Defense-in-depth — not a security "
+        f"boundary; the terminal tool can still bypass.)"
+    )
--- a/agent/system_prompt.py
+++ b/agent/system_prompt.py
@@ -205,6 +205,40 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
    if _env_hints:
        stable_parts.append(_env_hints)

+    # Active-profile hint — names the Hermes profile the agent is running
+    # under so it doesn't conflate ~/.hermes/skills/ (default profile) with
+    # ~/.hermes/profiles/<active>/skills/ (this profile's). Deterministic
+    # for the lifetime of the agent — profile name doesn't change
+    # mid-session, so this doesn't break the prompt cache.
+    # See file_safety._resolve_active_profile_name + classify_cross_profile_target
+    # for the matching tool-side guard.
+    try:
+        from agent.file_safety import _resolve_active_profile_name
+        active_profile = _resolve_active_profile_name()
+    except Exception:
+        active_profile = "default"
+    if active_profile == "default":
+        stable_parts.append(
+            "Active Hermes profile: default. Other profiles (if any) live "
+            "under ~/.hermes/profiles/<name>/. Each profile has its own "
+            "skills/, plugins/, cron/, and memories/ that affect a different "
+            "session than this one. Do not modify another profile's "
+            "skills/plugins/cron/memories unless the user explicitly directs "
+            "you to."
+        )
+    else:
+        stable_parts.append(
+            f"Active Hermes profile: {active_profile}. This session reads "
+            f"and writes ~/.hermes/profiles/{active_profile}/. The default "
+            f"profile's data lives at ~/.hermes/skills/, ~/.hermes/plugins/, "
+            f"~/.hermes/cron/, ~/.hermes/memories/ — those belong to a "
+            f"different session run from a different shell. Do NOT modify "
+            f"another profile's skills/plugins/cron/memories unless the user "
+            f"explicitly directs you to. The cross-profile write guard will "
+            f"refuse such writes by default; pass cross_profile=True only "
+            f"after explicit direction."
+        )
+
    platform_key = (agent.platform or "").lower().strip()
    if platform_key in PLATFORM_HINTS:
        stable_parts.append(PLATFORM_HINTS[platform_key])