hermes-agent/tests/agent
Teknium d3c167b644
fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint

Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:

A. agent/file_safety.classify_cross_profile_target
   Classifies a write target against the active HERMES_HOME. Returns
   a {active_profile, target_profile, area, target_path} dict when the
   path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
   (skills, plugins, cron, memories). get_cross_profile_warning()
   wraps it into a model-facing error string that names both profiles,
   names the area, and points at the cross_profile=True bypass.

   Defense-in-depth, NOT a security boundary — the terminal tool runs
   as the same OS user and can write any of these paths directly. The
   guard exists to prevent confused-agent corruption, not to stop a
   determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
   still applies.

   Wired into tools/file_tools.write_file_tool and patch_tool with a
   cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
   advertise cross_profile so the model can pass it after explicit
   user direction. patch_tool extracts target paths from V4A patch
   bodies before checking (same shape as the existing sensitive-path
   check).

   skill_manage is already scoped to the active profile's SKILLS_DIR
   by construction, so no extra guard wiring is needed there. The
   D-side error message (below) still names other profiles when the
   skill exists elsewhere.

B. agent/system_prompt
   One deterministic line near the environment-hints block names the
   active profile and tells the model not to modify another profile's
   skills/plugins/cron/memories without explicit direction. Profile
   name is stable for the lifetime of the AIAgent, so the line is
   prompt-cache-safe.

D. tools/skill_manager_tool._skill_not_found_error
   Replaces the bare "Skill 'X' not found." with a message that:
     - names the active profile,
     - searches OTHER profiles' skills dirs for the same name,
     - names the profile(s) where the skill exists and the path,
     - suggests `hermes -p <name>` to switch profiles, or
       cross_profile=True for an explicit edit.

   All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
   write_file, remove_file) now go through the helper.

Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.

What this PR does NOT do:

  * No hard block. The terminal tool can still touch any of these
    paths with no guard — same posture as the dangerous-command
    approval flow. SECURITY.md §3.2 applies.
  * No regex sweep on terminal commands for cross-profile paths.
    That direction is a Skills-Guard-style arms race (cd + relative
    paths, base64, etc.) and would false-positive on legitimate
    cross-profile reads. Filed as a follow-up.
  * No on-disk path migration. ~/.hermes/skills/ remains the
    default profile's skills dir; this PR is about telling the
    agent about that boundary, not changing the layout.

Tests:
  tests/agent/test_file_safety_cross_profile.py (16 tests)
    - _resolve_active_profile_name covers default/named/failure paths
    - classify_cross_profile_target covers all four scoped areas,
      both directions (default → named, named → default, named → named),
      non-Hermes paths, and root-level config files
    - get_cross_profile_warning covers in-profile no-op, cross-profile
      message shape, and the defense-in-depth self-documentation

  tests/tools/test_cross_profile_guard.py (12 tests)
    - write_file: in-profile allow, cross-profile block, cross_profile=True
      bypass, non-Hermes pass-through
    - patch: replace-mode block, cross_profile=True bypass, V4A patch
      path extraction
    - skill_manage: error names the other profile (single + multiple),
      missing-everywhere falls back to skills_list hint
    - system prompt: contract-level checks (both branches present,
      cross_profile=True mentioned, ~/.hermes/profiles/ referenced)

All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.

E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.

* fix(code_execution): add cross_profile to write_file/patch stubs

The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().

Caught by CI on PR #31290.
2026-05-24 00:38:17 -07:00
..
lsp fix(lint): skip per-file shell linter when LSP will handle the file (#29054) 2026-05-20 01:46:40 -05:00
transports fix(tui): handle images with codex app-server 2026-05-23 20:40:09 -07:00
__init__.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_anthropic_adapter.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
test_anthropic_keychain.py fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain 2026-04-24 07:14:00 -07:00
test_anthropic_oauth_pkce.py test(security): regression guard for OAuth PKCE state/verifier separation 2026-05-16 02:38:02 -07:00
test_arcee_trinity_overrides.py test(arcee): cover Trinity Large Thinking temperature + compression overrides 2026-05-05 17:23:45 -07:00
test_async_utils.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
test_auxiliary_client.py test: use subprocesses for each test file (#29016) 2026-05-21 16:40:04 +05:30
test_auxiliary_client_anthropic_custom.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_auxiliary_client_azure_foundry.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
test_auxiliary_config_bridge.py feat(plugins): add register_auxiliary_task() to PluginContext API 2026-05-23 17:49:47 -07:00
test_auxiliary_main_first.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_auxiliary_named_custom_providers.py fix(fallback): let custom_providers shadow built-in aliases 2026-04-30 20:18:44 -07:00
test_auxiliary_transport_autodetect.py fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027) 2026-04-28 06:50:14 -07:00
test_azure_identity_adapter.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
test_bedrock_1m_context.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
test_bedrock_adapter.py test(ci): stabilize shared optional dependency baselines 2026-05-13 17:32:22 -07:00
test_bedrock_integration.py test(ci): stabilize shared optional dependency baselines 2026-05-13 17:32:22 -07:00
test_codex_cloudflare_headers.py fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765) 2026-04-29 23:23:50 -07:00
test_compress_focus.py fix: resolve CI test failures — add missing functions, fix stale tests (#9483) 2026-04-14 01:43:45 -07:00
test_compressor_historical_media.py Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189) 2026-05-16 17:18:25 -07:00
test_compressor_image_tokens.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
test_context_compressor.py test: keep tirith checks hermetic 2026-05-23 02:20:14 -07:00
test_context_compressor_summary_continuity.py fix(ci): stabilize shared test state after 21012 2026-05-14 14:28:14 -07:00
test_context_engine.py feat: wire context engine plugin slot into agent and plugin system 2026-04-10 19:15:50 -07:00
test_context_references.py fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
test_copilot_acp_client.py fix(ci): recover 38 failing tests on main (#17642) 2026-04-29 20:05:32 -07:00
test_copilot_acp_deprecation.py fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint 2026-05-16 02:24:48 -07:00
test_credential_pool.py fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions 2026-05-18 10:31:40 -07:00
test_credential_pool_routing.py refactor: remove smart_model_routing feature (#12732) 2026-04-19 18:12:55 -07:00
test_crossloop_client_cache.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_curator.py fix(skills): keep manual skills out of curator 2026-05-04 02:19:28 -07:00
test_curator_activity.py fix: use skill activity in curator status 2026-04-30 10:31:47 -07:00
test_curator_backup.py fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731) 2026-05-02 01:29:57 -07:00
test_curator_classification.py feat(curator): hint at hermes curator pin in the rename block (#23212) 2026-05-10 06:44:53 -07:00
test_curator_reports.py fix(curator): rewrite cron job skill refs after consolidation (#18253) 2026-04-30 23:04:50 -07:00
test_custom_provider_extra_body.py fix(custom): pass custom provider extra body 2026-05-21 07:48:53 -07:00
test_deepseek_anthropic_thinking.py chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) 2026-05-17 02:29:41 -07:00
test_direct_provider_url_detection.py fix: restrict provider URL detection to exact hostname matches 2026-04-20 22:14:29 -07:00
test_display.py fix: classify landed file mutations with diagnostics 2026-05-13 06:46:23 -07:00
test_display_emoji.py feat(tools): centralize tool emoji metadata in registry + skin integration 2026-03-15 20:21:21 -07:00
test_display_todo_progress.py feat(cli): show todo progress as done/total fraction 2026-05-23 21:03:51 -07:00
test_display_tool_failure.py test(display): cover failure-suffix rendering + update scrollback test 2026-05-23 21:03:51 -07:00
test_error_classifier.py fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259) 2026-05-21 23:40:16 -07:00
test_external_skills.py feat(skills): support external skill directories via config (#3678) 2026-03-29 00:33:30 -07:00
test_external_skills_dirs_cache.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
test_file_safety_credentials.py fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root 2026-05-22 20:15:09 -07:00
test_file_safety_cross_profile.py fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290) 2026-05-24 00:38:17 -07:00
test_gemini_cloudcode.py fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks 2026-05-14 08:03:56 -07:00
test_gemini_fast_fallback.py fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace 2026-05-18 20:04:57 -07:00
test_gemini_free_tier_gate.py feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100) 2026-04-24 04:46:17 -07:00
test_gemini_native_adapter.py fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) 2026-04-24 05:35:17 -07:00
test_gemini_schema.py fix(gemini): drop integer/number/boolean enums from tool schemas (#15082) 2026-04-24 03:40:00 -07:00
test_i18n.py feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914) 2026-05-10 07:14:14 -07:00
test_image_gen_registry.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
test_image_routing.py fix(agent): consult supports_vision override in auto-mode routing 2026-05-20 23:27:10 -07:00
test_insights.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_kimi_coding_anthropic_thinking.py fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455) 2026-04-29 06:35:42 -07:00
test_last_total_tokens.py fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency 2026-05-23 17:38:19 -07:00
test_local_stream_timeout.py fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts 2026-04-22 14:46:10 -07:00
test_markdown_tables.py fix(cli): vertical fallback for markdown tables wider than terminal (#23948) 2026-05-11 16:49:13 -07:00
test_memory_provider.py fix(agent): widen toolset gate to context engine tools (#5544 sibling) 2026-05-21 23:18:37 -07:00
test_memory_session_switch.py feat(hindsight): probe API for update_mode='append' support, dedupe across processes 2026-05-05 15:09:59 -07:00
test_memory_user_id.py feat(hindsight): richer session-scoped retain metadata 2026-04-22 05:27:10 -07:00
test_minimax_auxiliary_url.py fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983) 2026-04-07 22:23:28 -07:00
test_minimax_provider.py feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path 2026-05-05 13:40:01 -07:00
test_model_metadata.py fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
test_model_metadata_local_ctx.py fix(tui): show correct context length 2026-04-28 12:27:36 -07:00
test_model_metadata_ssl.py fix(auth): honor SSL CA env vars across httpx + requests callsites 2026-04-24 03:00:33 -07:00
test_models_dev.py fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
test_moonshot_schema.py fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104) 2026-05-16 13:02:19 -07:00
test_nous_rate_guard.py fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898) 2026-04-26 04:53:42 -07:00
test_onboarding.py docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507) 2026-04-29 08:08:36 -07:00
test_openrouter_response_cache.py fix(openrouter): use canonical X-Title attribution header 2026-05-05 10:13:34 -07:00
test_plugin_llm.py feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194) 2026-05-10 07:09:28 -07:00
test_portal_tags.py feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779) 2026-05-12 20:49:20 -07:00
test_prompt_builder.py fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS 2026-05-18 20:06:49 -07:00
test_prompt_caching.py fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778) 2026-05-12 20:46:04 -07:00
test_proxy_and_url_validation.py fix(agent): normalize socks:// env proxies for httpx/anthropic 2026-04-21 05:52:46 -07:00
test_rate_limit_tracker.py feat: capture provider rate limit headers and show in /usage (#6541) 2026-04-09 03:43:14 -07:00
test_redact.py fix(security): redact xAI (Grok) API keys in logs 2026-05-18 10:21:22 -07:00
test_shell_hooks.py fix(security): restore type safety and extract constant in shell hook block handler 2026-05-17 02:31:18 -07:00
test_shell_hooks_consent.py fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322) 2026-04-26 20:48:35 -07:00
test_skill_bundles.py feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) 2026-05-18 21:38:05 -07:00
test_skill_commands.py test: use subprocesses for each test file (#29016) 2026-05-21 16:40:04 +05:30
test_skill_commands_reload.py refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool 2026-04-29 21:07:47 -07:00
test_skill_utils.py fix(skills): load Linux-tagged skills on Termux (android sys.platform) 2026-05-21 19:08:38 -07:00
test_streaming_context_scrubber.py 🐛 fix(memory): require newline after context tag 2026-05-18 10:53:08 -07:00
test_subagent_progress.py feat(delegate): orchestrator role and configurable spawn depth (default flat) 2026-04-21 14:23:45 -07:00
test_subagent_stop_hook.py feat: shell hooks — wire shell scripts as Hermes hook callbacks 2026-04-20 20:53:51 -07:00
test_subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
test_system_prompt_restore.py perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging 2026-05-17 23:20:37 -07:00
test_think_scrubber.py fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184) 2026-05-05 04:33:38 -07:00
test_title_generator.py fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
test_tool_guardrails.py fix: add recovery hints to loop guard warnings 2026-05-19 00:12:12 -07:00
test_tool_result_classification.py fix: classify landed file mutations with diagnostics 2026-05-13 06:46:23 -07:00
test_unsupported_parameter_retry.py test: remove 50 stale/broken tests to unblock CI (#22098) 2026-05-08 14:55:40 -07:00
test_unsupported_temperature_retry.py refactor(memory): remove flush_memories entirely (#15696) 2026-04-25 08:21:14 -07:00
test_usage_pricing.py fix(pricing): add deepseek-v4-pro to official docs pricing table 2026-05-12 16:32:57 -07:00
test_video_gen_registry.py feat(video_gen): unified video_generate tool with pluggable provider backends (#25126) 2026-05-13 16:39:41 -07:00
test_vision_resolved_args.py fix(vision): preserve explicit provider auth with custom base_url 2026-05-04 05:05:43 -07:00