hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

History

ethernet 48be2e0e4d test: use subprocesses for each test file (#29016 ) * ci(tests): install ripgrep from prebuilt tarball instead of apt apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu runners (the apt-get update against archive.ubuntu.com is the slow part; ripgrep itself is small). Switching to the upstream musl binary tarball cuts the step to a few seconds. - Pinned to ripgrep 15.1.0 with sha256 verification (same hash as published in the releases sha256 sidecar file). - Drops the `rg` binary into /usr/local/bin so it is on PATH for every subsequent step without GITHUB_PATH manipulation. - Applied to both the test and e2e jobs in tests.yml. * fix(cli): compile syntax check to tempdir, not source __pycache__ `_validate_critical_files_syntax` runs `py_compile.compile()` on each critical bootstrap file after a successful `git pull`. The default `py_compile` writes the resulting `.pyc` next to the source under `__pycache__/`, which causes two real problems: 1. Parallel test workers walking the same source tree (e.g. running the suite under per-file process isolation) can race against each other on the `__pycache__` write — manifests as flaky 'directory not empty' errors during teardown. 2. In production, the post-pull syntax check leaves a `.pyc` behind that the next interpreter run might pick up — fine when the interpreter version matches, sketchy if it doesn't. Fix: write the compiled output to a `tempfile.TemporaryDirectory()` that's discarded on function exit. We only care about the compile-or-not signal, not the artifact. * test(runner): per-file process isolation, drop manual state reset + xdist Replace fragile manual _reset_module_state test fixtures with robust per-file subprocess isolation. Each test file runs in a fresh `python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist, no custom pytest plugin, no shared worker state. Key changes: * scripts/run_tests_parallel.py — new runner: discovers test files, runs N in parallel via ThreadPoolExecutor, captures stdout per file, treats exit code 5 (no tests collected) as pass, kills all children on exit. Change from cpu_count to cpu_count2. The runner is I/O-bound (waiting on subprocess.communicate() from pytest children) The parent process does almost no CPU work, so 2x oversubscription keeps more pipes full. When a file fails, immediately show the last 30 lines of pytest output (stack traces + FAILED summary) plus a ready-to-copy repro command: python -m pytest tests/agent/test_auxiliary_client.py scripts/run_tests.sh — delegates to run_tests_parallel.py * .github/workflows/tests.yml — test step: python scripts/run_tests_parallel.py * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts * tests/conftest.py — remove ~200 lines of manual state-reset fixtures * AGENTS.md — update Testing section for per-file design * test(runner): speed gateway test antipattern scan up * fix(test): web search provider plugin test missing xai * fix(tests): make 14 test files pass under per-file subprocess isolation Tests that relied on cross-file state pollution from xdist workers fail when run in isolation (per-file subprocess model). Root causes and fixes: Tool registry not populated: - test_video_generation_tool_surface_matrix: add discover_builtin_tools() - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures registering all 8 bundled web providers, reset after each test - test_website_policy: same provider registration pattern - test_web_tools_tavily: same pattern across 3 dispatch test classes - Also add is_safe_url/check_website_access mocks where SSRF check blocks example.com (DNS resolution fails in isolated envs) Stale check_fn cache: - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache() in both kanban guidance tests (prior test cached False for kanban_show) - test_discord_tool: cache invalidation in setup/teardown - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries Module-level state pollution: - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache - test_skill_commands: set_session_vars() instead of patch.dict(os.environ) (ContextVar takes precedence over os.environ) - test_dm_topics: overwrite sys.modules + separate telegram.constants mock + force-reimport of gateway.platforms.telegram - test_terminal_tool_requirements: removed duplicate class declaration, autouse _clear_caches fixture * change(tests): run_tests.sh explicitly includes env vars instead of manually dropping some vars, now we just only include some * fix(tests): 5 more isolation/NixOS fixes - test_approval_plugin_hooks: isolate HERMES_HOME so real user's command_allowlist doesn't short-circuit the approval path - test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum (feature not merged on this branch) - test_write_deny: test systemd prefix against tmp_path instead of /etc/systemd which resolves to /nix/store on NixOS - test_pty_bridge: use shutil.which('cat') instead of /bin/cat (doesn't exist on NixOS) - profiles.py: rmtree onexc handler chmod's parent dirs too, fixing profile deletion when copytree preserved read-only modes from nix store * fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client * fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor * fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test * fix: address PR #29016 review feedback - Remove tracked .pytest-cache/ artifact and add to .gitignore - Fix stale 'xdist worker' comment in conftest.py - Deduplicate web provider registration into tests/tools/conftest.py shared helper (register_all_web_providers), replacing 8 copy-pasted blocks across 6 test files - Update PR description: remove stale recovered-test-files claim, fix worker count to match code (cpu_count2) fix: eliminate race in stale-cache achievements test The background scan thread could complete and overwrite _SNAPSHOT_CACHE before evaluate_all() returned the stale data — only 10 fake sessions made the scan finish instantly. Added scan_delay param to _FakeSessionDB and set it to 2s in the stale-cache test so the background thread can't win the race.		2026-05-21 16:40:04 +05:30
..
browser	fix(browser): self-review pass — dead-import, log levels, future-proofing	2026-05-17 04:04:15 -07:00
image_gen	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider	2026-05-15 12:11:32 -07:00
memory	fix(memory): skip OpenViking upload symlinks	2026-05-14 07:48:03 -07:00
model_providers	fix(deepseek): set default_aux_model on profile so aux warning stops firing	2026-05-16 22:54:22 -07:00
video_gen	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355 )	2026-05-17 02:29:41 -07:00
web	test: use subprocesses for each test file (#29016 )	2026-05-21 16:40:04 +05:30
__init__.py	fix: mem0 API v2 compat, prefetch context fencing, secret redaction (#5423 )	2026-04-05 22:43:33 -07:00
test_achievements_plugin.py	test: use subprocesses for each test file (#29016 )	2026-05-21 16:40:04 +05:30
test_disk_cleanup_plugin.py	feat(plugins): make all plugins opt-in by default	2026-04-20 04:46:45 -07:00
test_google_meet_audio.py	feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364 )	2026-04-27 06:22:25 -07:00
test_google_meet_node.py	feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364 )	2026-04-27 06:22:25 -07:00
test_google_meet_plugin.py	feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364 )	2026-04-27 06:22:25 -07:00
test_google_meet_realtime.py	feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364 )	2026-04-27 06:22:25 -07:00
test_kanban_dashboard_plugin.py	fix(kanban-dashboard): restore implementations dropped during salvages (#28481 )	2026-05-18 21:54:56 -07:00
test_kanban_worker_runs.py	feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect)	2026-05-18 21:01:47 -07:00
test_langfuse_plugin.py	fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342 , #22763 ) (#26320 )	2026-05-15 05:04:02 -07:00
test_retaindb_plugin.py	test: speed up slow tests (backoff + subprocess + IMDS network) (#11797 )	2026-04-17 14:21:22 -07:00
test_teams_pipeline_plugin.py	fix(teams-pipeline): fill in missing delivery URL in adapter-reuse test	2026-05-08 12:00:09 -07:00