hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
hengm3467	c6b1ef4e58	feat: add Step Plan provider support (salvage #6005 ) Adds a first-class 'stepfun' API-key provider surfaced as Step Plan: - Support Step Plan setup for both International and China regions - Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable - Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization - Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence Based on #6005 by @hengm3467. Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>	2026-04-22 02:59:58 -07:00
Teknium	b2ba351380	fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches: - KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1). The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK appends /v1/messages internally. /coding/v1 + SDK suffix produced /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields /coding/v1/messages correctly. - kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are already redirected to api.kimi.com/coding via _resolve_kimi_base_url. - doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0) and rewrite /coding base URLs to /coding/v1 for the /models health check (Anthropic surface has no /models). - test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var. E2E verified: sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic) sk-<legacy> → https://api.moonshot.ai/v1/chat/completions (OpenAI) UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>	2026-04-21 19:48:39 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363 ) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
asurla	3b569ff576	feat(providers): add native NVIDIA NIM provider Adds NVIDIA NIM as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, curated models (Nemotron plus other open source models hosted on build.nvidia.com), URL mapping in model_metadata.py, aliases (nim, nvidia-nim, build-nvidia, nemotron), and env var tests. Docs updated: providers page, quickstart table, fallback providers table, and README provider list.	2026-04-17 13:47:46 -07:00
Teknium	d404849351	test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577 ) * test: make test env hermetic; enforce CI parity via scripts/run_tests.sh Fixes the recurring 'works locally, fails in CI' (and vice versa) class of flakes by making tests hermetic and providing a canonical local runner that matches CI's environment. ## Layer 1 — hermetic conftest.py (tests/conftest.py) Autouse fixture now unsets every credential-shaped env var before every test, so developer-local API keys can't leak into tests that assert 'auto-detect provider when key present'. Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID, FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that change auto-detect behavior. Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET, HERMES_SESSION_, etc.) that mutate agent behavior. Also: - Redirects HOME to a per-test tempdir (not just HERMES_HOME), so code reading ~/.hermes/ directly can't touch the real dir. - Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to match CI's deterministic runtime. The old _isolate_hermes_home fixture name is preserved as an alias so any test that yields it explicitly still works. ## Layer 2 — scripts/run_tests.sh canonical runner 'Always use scripts/run_tests.sh, never call pytest directly' is the new rule (documented in AGENTS.md). The script: - Unsets all credential env vars (belt-and-suspenders for callers who bypass conftest — e.g. IDE integrations) - Pins TZ/LANG/PYTHONHASHSEED - Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on a 20-core workstation surfaces test-ordering flakes CI will never see, causing the infamous 'passes in CI, fails locally' drift) - Finds the venv in .venv, venv, or main checkout's venv - Passes through arbitrary pytest args Installs pytest-split on demand so the script can also be used to run matrix-split subsets locally for debugging. ## Remove 3 module-level dotenv stubs that broke test isolation tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a module-level: if 'dotenv' not in sys.modules: fake_dotenv = types.ModuleType('dotenv') fake_dotenv.load_dotenv = lambda a, kw: None sys.modules['dotenv'] = fake_dotenv This patches sys.modules['dotenv'] to a fake at import time with no teardown. Under pytest-xdist LoadScheduling, whichever worker collected one of these files first poisoned its sys.modules; subsequent tests in the same worker that imported load_dotenv transitively (e.g. test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and saw their assertions fail. dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml), so the defensive stub was never needed. Removed. ## Validation - tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4 failures in test_env_loader.py before this fix) - tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py, tests/test_hermes_logging.py combined: 123 passed (the caplog regression tests from PR #11453 still pass) - Local full run shows no F/E clusters in the 0-55% range that were previously present before the conftest hardening ## Background See AGENTS.md 'Testing' section for the full list of drift sources this closes. Matrix split (closed as #11566) will be re-attempted once this foundation lands — cross-test pollution was the root cause of the shard-3 hang in that PR. fix(conftest): don't redirect HOME — it broke CI subprocesses PR #11577's autouse fixture was setting HOME to a per-test tempdir. CI started timing out at 97% complete with dozens of E/F markers and orphan python processes at cleanup — tests (or transitive deps) spawn subprocesses that expect a stable HOME, and the redirect broke them in non-obvious ways. Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift fixes) are unchanged and still in place. HERMES_HOME redirection is also unchanged — that's the canonical way to isolate tests from ~/.hermes/, not HOME. Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"` instead of `get_hermes_home()` is a bug to fix at the callsite, not something to paper over in conftest.	2026-04-17 06:09:09 -07:00
Teknium	5621fc449a	chore: rename AI Gateway → Vercel AI Gateway, move Xiaomi to #5 (#9326 ) - Rename 'AI Gateway' to 'Vercel AI Gateway' across auth, models, doctor, setup, and tests. - Move Xiaomi MiMo to position #5 in the provider picker.	2026-04-13 19:51:54 -07:00
Teknium	587eeb56b9	chore: remove duplicate dead _try_gh_cli_token / _gh_cli_candidates from auth.py These functions were duplicated between auth.py and copilot_auth.py. The auth.py copies had zero production callers — only copilot_auth.py's versions are used. Redirect the test import to the live copy and update monkeypatch targets accordingly.	2026-04-13 05:12:36 -07:00
Julien Talbot	8bcb8b8e87	feat(providers): add native xAI provider Adds xAI as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, 11 curated Grok models, URL mapping in model_metadata.py, aliases (x-ai, x.ai), and env var tests. Uses standard OpenAI-compatible chat completions. Closes #7050	2026-04-10 13:40:38 -07:00
Carlos	38ccd9eb95	Harden setup provider flows Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-10 02:57:39 -07:00
Dylan Socolobsky	c6dba918b3	fix(tests): fix several failing/flaky tests on main (#6777 ) * fix(tests): mock is_safe_url in tests that use example.com Tests using example.com URLs were failing because is_safe_url does a real DNS lookup which fails in environments where example.com doesn't resolve, causing the request to be blocked before reaching the already-mocked HTTP client. This should fix around 17 failing tests. These tests test logic, caching, etc. so mocking this method should not modify them in any way. TestMattermostSendUrlAsFile was already doing this so we follow the same pattern. * fix(test): use case-insensitive lookup for model context length check DEFAULT_CONTEXT_LENGTHS uses inconsistent casing (MiniMax keys are lowercase, Qwen keys are mixed-case) so the test was broken in some cases since it couldn't find the model. * fix(test): patch is_linux in systemd gateway restart test The test only patched is_macos to False but didn't patch is_linux to True. On macOS hosts, is_linux() returns False and the systemd restart code path is skipped entirely, making the assertion fail. * fix(test): use non-blocklisted env var in docker forward_env tests GITHUB_TOKEN is in api_key_env_vars and thus in _HERMES_PROVIDER_ENV_BLOCKLIST so the env var is silently dropped, we replace it with a non-blocked one like DATABASE_URL so the tests actually work. * fix(test): fully isolate _has_any_provider_configured from host env _has_any_provider_configured() checks all env vars from PROVIDER_REGISTRY (not just the 5 the tests were clearing) and also calls get_auth_status() which detects gh auth token for Copilot. On machines with any of these set, the function returns True before reaching the code path under test. Clear all registry vars and mock get_auth_status so host credentials don't interfere. * fix(test): correct path to hermes_base_env.py in tool parser tests Path(__file__).parent.parent resolved to tests/, not the project root. The file lives at environments/hermes_base_env.py so we need one more parent level. * fix(test): accept optional HTML fields in Matrix send payload _send_matrix sometimes adds format and formatted_body when the markdown library is installed. The test was doing an exact dict equality check which broke. Check required fields instead. * fix(test): add config.yaml to codex vision requirements test The test only wrote auth.json but not config.yaml, so _read_main_provider() returned empty and vision auto-detect never tried the codex provider. Add a config.yaml pointing at openai-codex so the fallback path actually resolves the client. * fix(test): clear OPENROUTER_API_KEY in _isolate_hermes_home run_agent.py calls load_hermes_dotenv() at import time, which injects API keys from ~/.hermes/.env into os.environ before any test fixture runs. This caused test_agent_loop_tool_calling to make real API calls instead of skipping, which ends up making some tests fail. * fix(test): add get_rate_limit_state to agent mock in usage report tests _show_usage now calls agent.get_rate_limit_state() for rate limit display. The SimpleNamespace mock was missing this method. * fix(test): update expected Camofox config version from 12 to 13 * fix(test): mock _get_enabled_platforms in nous managed defaults test Importing gateway.run leaks DISCORD_BOT_TOKEN into os.environ, which makes _get_enabled_platforms() return ["cli", "discord"] instead of just ["cli"]. tools_command loops per platform, so apply_nous_managed_defaults runs twice: the first call sets config values, the second sees them as already configured and returns an empty set, causing the assertion to fail.	2026-04-09 13:17:06 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00

11 commits