hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
ethernet	48be2e0e4d	test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt
apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu runners (the apt-get update against archive.ubuntu.com is the slow part; ripgrep itself is small). Switching to the upstream musl binary tarball cuts the step to a few seconds.
- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.
* fix(cli): compile syntax check to tempdir, not source __pycache__
`_validate_critical_files_syntax` runs `py_compile.compile()` on each critical bootstrap file after a successful `git pull`. The default `py_compile` writes the resulting `.pyc` next to the source under `__pycache__/`, which causes two real problems:
1. Parallel test workers walking the same source tree (e.g. running the suite under per-file process isolation) can race against each other on the `__pycache__` write, which manifests as flaky 'directory not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind that the next interpreter run might pick up: fine when the interpreter version matches, sketchy if it doesn't.
Fix: write the compiled output to a `tempfile.TemporaryDirectory()` that's discarded on function exit. We only care about the compile-or-not signal, not the artifact.
* test(runner): per-file process isolation, drop manual state reset + xdist
Replace fragile manual _reset_module_state test fixtures with robust per-file subprocess isolation. Each test file runs in a fresh `python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist, no custom pytest plugin, no shared worker state.
Key changes:
* scripts/run_tests_parallel.py - new runner: discovers test files, runs N in parallel via ThreadPoolExecutor, captures stdout per file, treats exit code 5 (no tests collected) as pass, kills all children on exit. Change from cpu_count to cpu_count*2. The runner is I/O-bound (waiting on subprocess.communicate() from pytest children). The parent process does almost no CPU work, so 2x oversubscription keeps more pipes full. When a file fails, immediately show the last 30 lines of pytest output (stack traces + FAILED summary) plus a ready-to-copy repro command: python -m pytest tests/agent/test_auxiliary_client.py
* scripts/run_tests.sh - delegates to run_tests_parallel.py
* .github/workflows/tests.yml - test step: python scripts/run_tests_parallel.py
* pyproject.toml - drop pytest-xdist, pytest-split; simplify addopts
* tests/conftest.py - remove ~200 lines of manual state-reset fixtures
* AGENTS.md - update Testing section for per-file design
* test(runner): speed gateway test antipattern scan up
* fix(test): web search provider plugin test missing xai
* fix(tests): make 14 test files pass under per-file subprocess isolation
Tests that relied on cross-file state pollution from xdist workers fail when run in isolation (per-file subprocess model).
Root causes and fixes:
Tool registry not populated:
- test_video_generation_tool_surface_matrix: add discover_builtin_tools()
- test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures registering all 8 bundled web providers, reset after each test
- test_website_policy: same provider registration pattern
- test_web_tools_tavily: same pattern across 3 dispatch test classes
- Also add is_safe_url/check_website_access mocks where SSRF check blocks example.com (DNS resolution fails in isolated envs)
Stale check_fn cache:
- test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache() in both kanban guidance tests (prior test cached False for kanban_show)
- test_discord_tool: cache invalidation in setup/teardown
- test_homeassistant_tool: invalidate_check_fn_cache() before registry queries
Module-level state pollution:
- test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
- test_skill_commands: set_session_vars() instead of patch.dict(os.environ) (ContextVar takes precedence over os.environ)
- test_dm_topics: overwrite sys.modules + separate telegram.constants mock + force-reimport of gateway.platforms.telegram
- test_terminal_tool_requirements: removed duplicate class declaration, autouse _clear_caches fixture
* change(tests): run_tests.sh explicitly includes env vars. Instead of manually dropping some vars, we now pass through only an explicit set.
* fix(tests): 5 more isolation/NixOS fixes
- test_approval_plugin_hooks: isolate HERMES_HOME so real user's command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing profile deletion when copytree preserved read-only modes from nix store
* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client
* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor
* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test
* fix: address PR #29016 review feedback
- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py shared helper (register_all_web_providers), replacing 8 copy-pasted blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim, fix worker count to match code (cpu_count*2)
* fix: eliminate race in stale-cache achievements test
The background scan thread could complete and overwrite _SNAPSHOT_CACHE before evaluate_all() returned the stale data, because only 10 fake sessions made the scan finish instantly. Added scan_delay param to _FakeSessionDB and set it to 2s in the stale-cache test so the background thread can't win the race.	2026-05-21 16:40:04 +05:30
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep - tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero), exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape: identical failure count, identical category, all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
teknium1	1dca6a6960	feat(discord): render clarify choices as buttons
Brings Discord to parity with Telegram on the clarify tool's interactive UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach a button view when choices are present.
- ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's 25-component view cap leaves one slot for Other) plus a final 'Other (type answer)' button.
- Numeric click -> tools.clarify_gateway.resolve_gateway_clarify(clarify_id, choice_text) using the canonical choice text from the gateway entry (falls back to the button label if the entry vanished).
- Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so the gateway's text-intercept captures the next user message in this session as the response.
- Auth via the shared _component_check_auth helper (same OR-semantics as ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView).
- Open-ended (no choices) path renders the prompt as a plain embed and relies on the existing text-intercept resolution.
- Single-use: first valid click disables every button and updates the embed footer with who answered and what they chose.
No changes to BasePlatformAdapter.send_clarify or the gateway's clarify_callback wiring -- the existing scaffolding already drives all adapters; Discord just inherits the default text fallback today and gains buttons by virtue of this override.
Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs so tests can construct embedded views without monkey-patching per-test.
Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's work onto current main's clarify infrastructure (clarify_id + entry-based resolution shared with Telegram, instead of a parallel on_answer-closure mechanism). The button view structure and UX shape are preserved.
Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py. 391/391 existing Discord gateway tests still pass.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>	2026-05-14 07:26:43 -07:00
Teknium	26787ce638	test(gateway): isolate plugin adapter imports and guard the anti-pattern
Fixes the xdist collision that broke CI on PR #17764, and structurally prevents future plugin-adapter tests from reintroducing it.
Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py (already on main) both followed the same anti-pattern:
    sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
    from adapter import <Adapter>
Every platform plugin ships its own adapter.py, so the bare 'from adapter import ...' races for sys.modules['adapter']. Whichever test collected first in a given xdist worker won; the other crashed at collection with ImportError, and the polluted sys.path cascaded into 19 unrelated test failures across tools/, hermes_cli/, and run_agent/ in the same worker.
Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py via importlib.util under the unique module name plugin_adapter_<name>. Zero sys.path mutation, no possibility of collision.
2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py: migrated to the helper. All 'from adapter import ...' statements (including the ones inside test methods) are replaced with module-level attribute access on the loaded module.
3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans every test_*.py under tests/gateway/ at session start and fails the run with a pointer to the helper if any test uses sys.path.insert into plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'. Runs on the xdist controller only (skipped in workers). The next plugin adapter test that tries to reintroduce this pattern gets rejected at collection time with a clear remediation message.
4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to AUTHOR_MAP so the check-attribution workflow passes.
Validation
----------
    scripts/run_tests.sh tests/gateway/                      4194 passed
    scripts/run_tests.sh tests/gateway/test_{teams,irc}      72 passed (both orderings)
    scripts/run_tests.sh <11 prev-failing test files>        398 passed
Guard triggers correctly on both Path-operator and string-literal forms of the anti-pattern.	2026-04-30 01:19:34 -07:00
Teknium	42d6ab5082	test(gateway): unify discord mock via shared conftest; drop duplicated mock in model_picker test
The cherry-picked model_picker test installed its own discord mock at module-import time via a local _ensure_discord_mock(), overwriting sys.modules['discord'] with a mock that lacked attributes other gateway tests needed (Intents.default(), File, app_commands.Choice). On pytest-xdist workers that collected test_discord_model_picker.py first, the shared mock in tests/gateway/conftest.py got clobbered and downstream tests failed with AttributeError / TypeError against missing mock attrs. Classic sys.modules cross-test pollution (see xdist-cross-test-pollution skill).
Fix:
- Extend the canonical _ensure_discord_mock() in tests/gateway/conftest.py to cover everything the model_picker test needs: real View/Select/Button/SelectOption classes (not MagicMock sentinels), an Embed class that preserves title/description/color kwargs for assertion, and Color.greyple.
- Strip the duplicated mock-setup block from test_discord_model_picker.py and rely on the shared mock that conftest installs at collection time.
Regression check:
    scripts/run_tests.sh tests/gateway/ tests/hermes_cli/ -k 'discord or model or copilot or provider' -o 'addopts='
    1291 passed (was 1288 passed + 3 xdist-ordered failures before this commit).	2026-04-24 03:33:29 -07:00
Teknium	77bdad5b02	fix(tests): resolve 12 CI failures + 10 errors across 6 root causes (#11040)
Group A (3 tests): 'No LLM provider configured' RuntimeError
- test_user_message_surrogates_sanitized, test_counters_initialized_in_init, test_openai_prompt_tokens_unchanged
- Root cause: AIAgent.__init__ now requires base_url alongside api_key to skip resolve_provider_client() (which returns None when API keys are blanked in CI). Added base_url='http://localhost:1234/v1' to test agent construction.
Group B (5 tests): Discord slash command auto-registration
- test_auto_registers_missing_gateway_commands, test_auto_registered_command_*, test_register_skill_group_*
- Root cause: xdist workers that loaded a discord mock WITHOUT app_commands.Command/Group caused _register_slash_commands() to fail silently. Added comprehensive shared discord mock in tests/gateway/conftest.py (same pattern as existing telegram mock).
Group C (5 errors): Discord reply mode 'NoneType has no DMChannel'
- All TestReplyToText tests
- Root cause: FakeDMChannel was not a subclass of real discord.DMChannel, so isinstance() checks in _handle_message failed when running in full suite (real discord installed). Made FakeDMChannel inherit from discord.DMChannel when available. Removed fragile monkeypatch approach.
Group D (2 tests): detect_provider_for_model wrong provider
- test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_openrouter_slug (got 'copilot')
- Root cause: ai-gateway, copilot, and kilocode are multi-vendor aggregators that list other providers' models (OpenRouter-style slugs). They were being matched in Step 1 before OpenRouter. Added all three to _AGGREGATORS set so they're skipped like nous/openrouter.
Group E (1 test): model_flow_custom StopIteration
- test_model_flow_custom_saves_verified_v1_base_url
- Root cause: 'Display name' prompt was added after the test was written. The input iterator had 5 answers but the flow now asks 6 questions. Added 6th empty string answer.
Group F (1 test): Telegram proxy env assertion
- test_uses_proxy_env_for_primary_and_fallback_transports
- Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first (via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this env var, allowing potential leakage from other tests in xdist workers. Added TELEGRAM_PROXY to the cleanup list.	2026-04-16 06:49:36 -07:00
kshitijk4poor	ff5bf0d6c8	fix(tests): resolve CI test failures - pool auto-seeding, stale assertions, mock isolation
Salvaged from PR #10643 by kshitijk4poor, updated for current main.
Root causes fixed:
1. Telegram xdist mock pollution: new tests/gateway/conftest.py with shared mock that runs at collection time (prevents ChatType=None caching)
2. VIRTUAL_ENV env var leak: monkeypatch.delenv in _detect_venv_dir tests
3. Copilot base_url missing: add fallback in _resolve_runtime_from_pool_entry
4. Stale vision model assertion: zai now uses glm-5v-turbo
5. Reasoning item id intentionally stripped: assert 'id' not in (store=False)
6. Context length warning unreachable: pass base_url to AIAgent in test
7. Kimi provider label updated: 'Kimi / Kimi Coding Plan' matches models.py
8. Google Workspace calendar tests: rewritten for current production code, properly mock subprocess on api_module, removed stale +agenda assertions
9. Credential pool auto-seeding: mock _select_pool_entry / _resolve_auto / _import_codex_cli_tokens to prevent real credentials from leaking into tests	2026-04-15 22:05:21 -07:00
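Commit 48be2e0e4d describes a runner that executes each test file in its own `python -m pytest <file>` subprocess through a ThreadPoolExecutor, treats exit code 5 as a pass, and prints the last 30 lines of output plus a repro command on failure. The sketch below is only an illustration of that shape, not the contents of scripts/run_tests_parallel.py: the names `run_one`, `run_all`, `pytest_command` and the `build_command` parameter are hypothetical, and child cleanup on exit is omitted.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

NO_TESTS_COLLECTED = 5  # pytest exit code when a file collects no tests


def pytest_command(test_file):
    """Command line for one isolated run (also used as the repro hint)."""
    return [sys.executable, "-m", "pytest", str(test_file)]


def run_one(test_file, build_command=pytest_command, tail_lines=30):
    """Run one file in a fresh interpreter; return (file, passed, output tail)."""
    proc = subprocess.run(
        build_command(test_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    passed = proc.returncode in {0, NO_TESTS_COLLECTED}
    tail = "\n".join(proc.stdout.splitlines()[-tail_lines:])
    return test_file, passed, tail


def run_all(test_files, build_command=pytest_command, workers=None):
    """Run every file in parallel; return [(file, output tail)] for failures."""
    # Threads only wait on child processes, so 2x oversubscription is cheap.
    workers = workers or 2 * (os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda f: run_one(f, build_command), test_files))
    failures = [(f, tail) for f, passed, tail in results if not passed]
    for f, tail in failures:
        print(f"FAILED {f}\n{tail}\nrepro: {' '.join(pytest_command(f))}")
    return failures
```

Because every file gets a fresh interpreter, module-level caches and `sys.modules` entries cannot leak between files, which is the property the later fixes in that commit rely on.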

7 commits
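Commit 26787ce638 replaces the `sys.path.insert` + `from adapter import ...` pattern with a helper that loads each plugin's adapter.py via importlib.util under the unique module name plugin_adapter_<name>. A minimal sketch of that technique follows. It is an assumption about how such a helper could look, not the code in tests/gateway/_plugin_adapter_loader.py; in particular the `plugins_root` parameter is hypothetical (the commit describes a one-argument `load_plugin_adapter('<name>')`).

```python
import importlib.util
import sys
from pathlib import Path


def load_plugin_adapter(name, plugins_root):
    """Import <plugins_root>/<name>/adapter.py as module plugin_adapter_<name>.

    sys.path is never modified, so two plugins that both ship an
    adapter.py cannot compete for sys.modules['adapter'].
    """
    module_name = f"plugin_adapter_{name}"
    if module_name in sys.modules:
        return sys.modules[module_name]

    adapter_path = Path(plugins_root) / name / "adapter.py"
    spec = importlib.util.spec_from_file_location(module_name, adapter_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load adapter from {adapter_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing the module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(module_name, None)  # do not leave a half-initialized module
        raise
    return module
```

A test would then use attribute access on the returned module, for example `irc = load_plugin_adapter("irc", root)` followed by `irc.SomeAdapter`, where `SomeAdapter` stands in for whatever class the plugin defines.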