hermes-agent/tests/tools
Teknium 285bb2b915
feat(execute_code): add project/strict execution modes, default to project (#11971)
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.

Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:

  project (new default):
    - cwd       = session's TERMINAL_CWD (falls back to os.getcwd())
    - python    = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
                  with a Python 3.8+ version check; falls back cleanly to
                  sys.executable if no venv or the candidate fails
    - result    : 'import pandas' works, '.env' resolves, matches terminal()

  strict (opt-in):
    - cwd       = staging tmpdir (today's behavior)
    - python    = sys.executable (today's behavior)
    - result    : maximum reproducibility and isolation; project deps
                  won't resolve

Security-critical invariants are identical across both modes and covered by
explicit regression tests:

  - env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
    *_CREDENTIAL, *_PASSWD, *_AUTH substrings)
  - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
    delegate_task, no MCP from inside scripts)
  - resource caps (5-min timeout, 50KB stdout, 50 tool calls)

Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).

Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
2026-04-18 01:46:25 -07:00
..
__init__.py test: reorganize test structure and add missing unit tests 2026-02-26 03:20:08 +03:00
test_accretion_caps.py fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839) 2026-04-17 15:53:57 -07:00
test_ansi_strip.py fix: strip ANSI at the source — clean terminal output before it reaches the model 2026-03-23 07:43:12 -07:00
test_approval.py fix(kimi): cover remaining fixed-temperature bypasses 2026-04-17 20:25:42 -07:00
test_approval_heartbeat.py fix(approval): heartbeat activity during gateway approval wait (#11245) 2026-04-16 14:48:50 -07:00
test_base_environment.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
test_browser_camofox.py fix: /browser connect CDP override now takes priority over Camofox (#10523) 2026-04-15 14:11:18 -07:00
test_browser_camofox_persistence.py docs: remove nonexistent CAMOFOX_PROFILE_DIR env var references (#10976) 2026-04-16 04:07:11 -07:00
test_browser_camofox_state.py feat(discord): add channel_prompts config 2026-04-15 16:31:28 -07:00
test_browser_cdp_override.py Support browser CDP URL from config 2026-04-17 16:05:04 -07:00
test_browser_cleanup.py fix(doctor): only check the active memory provider, not all providers unconditionally (#6285) 2026-04-08 13:44:58 -07:00
test_browser_cloud_fallback.py fix(browser): runtime fallback to local Chromium when cloud provider fails 2026-04-16 04:19:34 -07:00
test_browser_console.py fix: add browser_console to browser toolset and core tools list (#1084) 2026-03-17 02:02:57 -07:00
test_browser_content_none_guard.py fix(browser): guard LLM response content against None in snapshot and vision (#3642) 2026-03-28 17:25:04 -07:00
test_browser_hardening.py fix(browser): hardening — dead code, caching, scroll perf, security, thread safety 2026-04-10 13:05:44 -07:00
test_browser_homebrew_paths.py fix(browser): add termux PATH fallbacks 2026-04-14 16:55:55 -07:00
test_browser_orphan_reaper.py fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843) 2026-04-17 18:46:30 -07:00
test_browser_secret_exfil.py fix: rewrite test mock secrets and add redaction fixture 2026-04-01 12:03:56 -07:00
test_browser_ssrf_local.py fix(browser): skip SSRF check for local backends (Camofox, headless Chromium) (#4292) 2026-03-31 10:40:13 -07:00
test_budget_config.py test(tools): add unit tests for budget_config module 2026-04-11 02:58:48 -07:00
test_checkpoint_manager.py fix(checkpoints): isolate shadow git repo from user's global config (#11261) 2026-04-16 16:06:49 -07:00
test_clarify_tool.py test(tools): add unit tests for clarify_tool.py 2026-02-27 03:29:26 -05:00
test_clipboard.py feat: fix img pasting in new ink plus newline after tools 2026-04-11 13:14:32 -05:00
test_code_execution.py fix: follow-up for salvaged PR #10854 2026-04-16 06:42:45 -07:00
test_code_execution_modes.py feat(execute_code): add project/strict execution modes, default to project (#11971) 2026-04-18 01:46:25 -07:00
test_command_guards.py fix: remove 115 verified dead code symbols across 46 production files 2026-04-10 03:44:43 -07:00
test_config_null_guard.py fix: guard config.get() against YAML null values to prevent AttributeError (#3377) 2026-03-27 04:03:00 -07:00
test_credential_files.py fix: remove 115 verified dead code symbols across 46 production files 2026-04-10 03:44:43 -07:00
test_cron_prompt_injection.py fix: cron prompt injection scanner bypass for multi-word variants 2026-02-26 13:55:54 +03:00
test_cronjob_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_daytona_environment.py fix: update tests for unified spawn-per-call execution model 2026-04-08 17:23:15 -07:00
test_debug_helpers.py fix(tests): isolate HERMES_HOME in tests and adjust log directory for debug session 2026-03-02 04:34:21 -08:00
test_delegate.py test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
test_delegate_toolset_scope.py fix(security): restrict subagent toolsets to parent's enabled set (#3269) 2026-03-26 14:50:26 -07:00
test_docker_environment.py fix(tests): fix several failing/flaky tests on main (#6777) 2026-04-09 13:17:06 -07:00
test_docker_find.py feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066) 2026-04-14 21:20:37 -07:00
test_env_passthrough.py fix: remove 115 verified dead code symbols across 46 production files 2026-04-10 03:44:43 -07:00
test_feishu_tools.py feat: add Feishu document comment intelligent reply with 3-tier access control 2026-04-17 19:04:11 -07:00
test_file_operations.py fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs 2026-04-10 16:47:44 -07:00
test_file_operations_edge_cases.py fix(tools): remove dead code in _is_likely_binary and harden _check_lint against brace paths 2026-04-10 21:16:53 -07:00
test_file_ops_cwd_tracking.py fix(file-ops): follow terminal env's live cwd in _exec instead of init-time cached cwd (#11912) 2026-04-17 19:26:40 -07:00
test_file_read_guards.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_file_staleness.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_file_sync.py test(file_sync): add tests for bulk_upload_fn callback 2026-04-10 21:14:32 -07:00
test_file_sync_back.py fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards 2026-04-16 19:39:21 -07:00
test_file_sync_perf.py test: add reproducible perf benchmark for file sync overhead 2026-04-10 03:01:46 -07:00
test_file_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_file_tools_live.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
test_file_write_safety.py fix(file_tools): block /private/etc writes on macOS symlink bypass 2026-04-13 05:15:05 -07:00
test_force_dangerous_override.py fix(skills): honor policy table for dangerous verdicts 2026-03-14 11:27:02 -07:00
test_fuzzy_match.py fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs 2026-04-10 16:47:44 -07:00
test_hidden_dir_filter.py fix: use Path.parts for hidden directory filter in skill listing 2026-03-04 18:34:16 +03:00
test_homeassistant_tool.py fix: clean up description escaping, add string-data tests 2026-04-13 04:45:07 -07:00
test_image_generation.py feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406) 2026-04-16 22:05:41 -07:00
test_interrupt.py fix: resolve remaining 4 CI test failures (#9543) 2026-04-14 02:18:38 -07:00
test_llm_content_none_guard.py fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449) 2026-03-27 15:28:19 -07:00
test_local_env_blocklist.py fix(providers): complete NVIDIA NIM parity with other providers 2026-04-17 13:47:46 -07:00
test_local_interrupt_cleanup.py fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907) 2026-04-17 20:39:25 -07:00
test_local_tempdir.py fix(termux): honor temp dirs for local temp artifacts 2026-04-09 16:24:53 -07:00
test_managed_browserbase_and_modal.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_managed_media_gateways.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_managed_modal_environment.py fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501) 2026-04-15 13:29:05 -07:00
test_managed_server_tool_support.py fix(tests): fix several failing/flaky tests on main (#6777) 2026-04-09 13:17:06 -07:00
test_managed_tool_gateway.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_mcp_dynamic_discovery.py fix(mcp): make server aliases explicit 2026-04-14 17:19:20 -07:00
test_mcp_oauth.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_oauth_integration.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_oauth_manager.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_probe.py fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness 2026-04-04 10:18:57 -07:00
test_mcp_reconnect_signal.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_stability.py fix: add vLLM/local server error patterns + MCP initial connection retry (#9281) 2026-04-13 18:46:14 -07:00
test_mcp_structured_content.py fix(mcp): combine content and structuredContent when both present (#7118) 2026-04-10 03:44:35 -07:00
test_mcp_tool.py fix(mcp): make server aliases explicit 2026-04-14 17:19:20 -07:00
test_mcp_tool_401_handling.py fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383) 2026-04-16 21:57:10 -07:00
test_mcp_tool_issue_948.py fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness 2026-04-04 10:18:57 -07:00
test_memory_tool.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_memory_tool_import_fallback.py fix(tools): keep memory tool available when fcntl is unavailable 2026-04-14 10:18:05 -07:00
test_mixture_of_agents_tool.py refactor: tighten MoA traceback logging scope (#1307) 2026-03-14 07:53:56 -07:00
test_modal_bulk_upload.py perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014) 2026-04-12 06:18:05 +05:30
test_modal_sandbox_fixes.py fix: update tests for unified spawn-per-call execution model 2026-04-08 17:23:15 -07:00
test_modal_snapshot_isolation.py fix(tests): update mocks for file sync changes 2026-04-10 03:01:46 -07:00
test_notify_on_complete.py fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228) 2026-04-12 00:36:22 -07:00
test_osv_check.py feat: OSV malware check for MCP extension packages (#5305) 2026-04-05 12:46:07 -07:00
test_parse_env_var.py fix(docker): add explicit env allowlist for container credentials (#1436) 2026-03-17 02:34:35 -07:00
test_patch_parser.py fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs 2026-04-10 16:47:44 -07:00
test_process_registry.py fix(gateway): propagate user identity through process watcher pipeline 2026-04-11 13:46:16 -07:00
test_read_loop_detection.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_registry.py test: update stale tests to match current code (#11963) 2026-04-17 21:35:30 -07:00
test_rl_training_tool.py fix: call _stop_training_run on early-return failure paths 2026-03-10 17:09:51 -07:00
test_search_hidden_dirs.py fix: exclude hidden directories from find/grep search backends (#1558) 2026-03-17 02:02:57 -07:00
test_send_message_missing_platforms.py fix(send_message): deliver Matrix media via adapter 2026-04-15 17:37:43 -07:00
test_send_message_tool.py fix(discord): forum channel media + polish 2026-04-17 20:25:48 -07:00
test_session_search.py fix(session_search): coerce limit to int to prevent TypeError with non-int values (#10522) 2026-04-15 14:11:05 -07:00
test_singularity_preflight.py fix(tests): use case-insensitive regex in singularity preflight tests 2026-03-16 19:01:39 +03:00
test_skill_env_passthrough.py fix: remove 115 verified dead code symbols across 46 production files 2026-04-10 03:44:43 -07:00
test_skill_improvements.py feat(skills): size limits for agent writes + fuzzy matching for patch (#4414) 2026-04-01 04:19:19 -07:00
test_skill_manager_tool.py refactor: extract shared helpers to deduplicate repeated code patterns (#7917) 2026-04-11 13:59:52 -07:00
test_skill_size_limits.py feat(skills): size limits for agent writes + fuzzy matching for patch (#4414) 2026-04-01 04:19:19 -07:00
test_skill_view_path_check.py refactor: use Path.is_relative_to() for skill_view boundary check 2026-03-04 05:30:43 -08:00
test_skill_view_traversal.py fix(security): block path traversal in skill_view file_path (fixes #220) 2026-03-02 02:00:09 -08:00
test_skills_guard.py fix(skills): preserve trust for skills-sh identifiers + reduce resolution churn (#3251) 2026-03-26 13:40:21 -07:00
test_skills_hub.py fix: update 6 test files broken by dead code removal 2026-04-10 03:44:43 -07:00
test_skills_hub_clawhub.py fix: improve clawhub skill search matching 2026-03-14 23:15:04 -07:00
test_skills_sync.py feat(skills): add 'hermes skills reset' to un-stick bundled skills (#11468) 2026-04-17 00:41:31 -07:00
test_skills_tool.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_ssh_bulk_upload.py perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014) 2026-04-12 06:18:05 +05:30
test_ssh_environment.py fix(tests): update mocks for file sync changes 2026-04-10 03:01:46 -07:00
test_symlink_prefix_confusion.py fix: use is_relative_to() for symlink boundary check in skills_guard 2026-03-04 17:23:23 +03:00
test_sync_back_backends.py fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards 2026-04-16 19:39:21 -07:00
test_terminal_exit_semantics.py feat: add exit code context for common CLI tools in terminal results (#5144) 2026-04-04 16:57:24 -07:00
test_terminal_foreground_timeout_cap.py fix: reject foreground timeout above cap instead of clamping 2026-04-10 02:58:54 -07:00
test_terminal_none_command_guard.py fix(terminal): guard invalid command values 2026-04-08 21:37:51 -07:00
test_terminal_requirements.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_terminal_timeout_output.py fix(terminal): preserve partial output when command times out (#3868) 2026-03-29 21:51:44 -07:00
test_terminal_tool.py fix terminal workdir validation for Windows paths 2026-04-15 15:06:51 -07:00
test_terminal_tool_pty_fallback.py feat: add tested Termux install path and EOF-aware gh auth 2026-04-09 16:24:53 -07:00
test_terminal_tool_requirements.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_threaded_process_handle.py feat(environments): unified spawn-per-call execution layer 2026-04-08 17:23:15 -07:00
test_tirith_security.py fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626) 2026-03-16 23:39:41 -07:00
test_todo_tool.py fix(tools): enforce ID uniqueness in TODO store during replace operations 2026-04-11 16:22:50 -07:00
test_tool_backend_helpers.py feat: ungate Tool Gateway — subscription-based access with per-tool opt-in 2026-04-16 12:36:49 -07:00
test_tool_call_parsers.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_tool_result_storage.py fix(tools): neutralize shell injection in _write_to_sandbox via path quoting (#7940) 2026-04-11 14:26:11 -07:00
test_transcription.py feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647) 2026-03-31 03:10:01 -07:00
test_transcription_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_tts_gemini.py feat(tts): add Google Gemini TTS provider (#11229) 2026-04-16 14:23:16 -07:00
test_tts_mistral.py feat(tools): add Voxtral TTS provider (Mistral AI) 2026-04-11 01:56:55 -07:00
test_tts_speed.py test(tts): add speed config tests for Edge, OpenAI, and MiniMax 2026-04-12 16:46:18 -07:00
test_url_safety.py fix: allow trusted QQ CDN benchmark IP resolution 2026-04-17 04:22:40 -07:00
test_vision_tools.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_voice_cli_integration.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00
test_voice_mode.py fix(termux): tighten voice setup and mobile chat UX 2026-04-09 16:24:53 -07:00
test_watch_patterns.py fix(gateway): route synthetic background events by session 2026-04-15 11:16:01 -07:00
test_web_tools_config.py test: remove 169 change-detector tests across 21 files (#11472) 2026-04-17 01:05:09 -07:00
test_web_tools_tavily.py fix(tests): fix several failing/flaky tests on main (#6777) 2026-04-09 13:17:06 -07:00
test_website_policy.py fix: resolve 7 failing CI tests (#3936) 2026-03-30 08:10:14 -07:00
test_windows_compat.py fix: guard POSIX-only process functions for Windows compatibility 2026-03-01 01:54:27 +03:00
test_write_deny.py fix: resolve symlink bypass in write deny list on macOS 2026-02-26 13:30:55 +03:00
test_yolo_mode.py fix(gateway): scope /yolo to the active session 2026-04-10 03:38:44 -07:00
test_zombie_process_cleanup.py fix(tests): fix 78 CI test failures and remove dead test (#9036) 2026-04-13 10:50:24 -07:00