hermes-agent

5689 commits 895 branches 10 tags 1.7 GiB

Author	SHA1	Message	Date
Teknium	2ff1ef6ae6	fix(surrogates): sanitize reasoning/reasoning_content/reasoning_details fields (#11628 ) Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone surrogates in reasoning output. The proactive sanitizer walked content/ name/tool_calls but not extra fields like reasoning or the nested reasoning_details array. Surrogates in those fields survived the proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery block's _sanitize_messages_surrogates(messages) call also didn't check those fields — so 'found' was False, no retry happened, and after 3 attempts the user saw: API call failed after 3 retries. 'utf-8' codec can't encode characters in position N-M: surrogates not allowed Changes: - _sanitize_messages_surrogates: walk any extra string fields (reasoning, reasoning_content, etc.) and recurse into nested dict/list values (reasoning_details). Mirrors _sanitize_messages_non_ascii coverage added in PR #10537. - _sanitize_structure_surrogates: new recursive walker, mirror of _sanitize_structure_non_ascii but for surrogate recovery. - UnicodeEncodeError recovery block: also sanitize api_messages, api_kwargs, and prefill_messages (not just the canonical messages list — the API-copy carries reasoning_content transformed from reasoning and that's what the SDK actually serializes). Always retry on detected surrogate errors, not only when we found something to strip — gate on error type per PR #10537's pattern. Tests: extended tests/cli/test_surrogate_sanitization.py with coverage for reasoning, reasoning_content, reasoning_details (flat and deeply nested), structure walker, and an integration case that reproduces the exact api_messages shape that was crashing.	2026-04-17 13:30:47 -07:00
Teknium	77bdad5b02	fix(tests): resolve 12 CI failures + 10 errors across 6 root causes (#11040 ) Group A (3 tests): 'No LLM provider configured' RuntimeError - test_user_message_surrogates_sanitized, test_counters_initialized_in_init, test_openai_prompt_tokens_unchanged - Root cause: AIAgent.__init__ now requires base_url alongside api_key to skip resolve_provider_client() (which returns None when API keys are blanked in CI). Added base_url='http://localhost:1234/v1' to test agent construction. Group B (5 tests): Discord slash command auto-registration - test_auto_registers_missing_gateway_commands, test_auto_registered_command_, test_register_skill_group_ - Root cause: xdist workers that loaded a discord mock WITHOUT app_commands.Command/Group caused _register_slash_commands() to fail silently. Added comprehensive shared discord mock in tests/gateway/conftest.py (same pattern as existing telegram mock). Group C (5 errors): Discord reply mode 'NoneType has no DMChannel' - All TestReplyToText tests - Root cause: FakeDMChannel was not a subclass of real discord.DMChannel, so isinstance() checks in _handle_message failed when running in full suite (real discord installed). Made FakeDMChannel inherit from discord.DMChannel when available. Removed fragile monkeypatch approach. Group D (2 tests): detect_provider_for_model wrong provider - test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_ openrouter_slug (got 'copilot') - Root cause: ai-gateway, copilot, and kilocode are multi-vendor aggregators that list other providers' models (OpenRouter-style slugs). They were being matched in Step 1 before OpenRouter. Added all three to _AGGREGATORS set so they're skipped like nous/openrouter. Group E (1 test): model_flow_custom StopIteration - test_model_flow_custom_saves_verified_v1_base_url - Root cause: 'Display name' prompt was added after the test was written. The input iterator had 5 answers but the flow now asks 6 questions. Added 6th empty string answer. Group F (1 test): Telegram proxy env assertion - test_uses_proxy_env_for_primary_and_fallback_transports - Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first (via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this env var, allowing potential leakage from other tests in xdist workers. Added TELEGRAM_PROXY to the cleanup list.	2026-04-16 06:49:36 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00

Author

SHA1

Message

Date

Teknium

2ff1ef6ae6

fix(surrogates): sanitize reasoning/reasoning_content/reasoning_details fields (#11628 )

Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone
surrogates in reasoning output. The proactive sanitizer walked content/
name/tool_calls but not extra fields like reasoning or the nested
reasoning_details array. Surrogates in those fields survived the
proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery
block's _sanitize_messages_surrogates(messages) call also didn't check
those fields — so 'found' was False, no retry happened, and after 3
attempts the user saw:

  API call failed after 3 retries. 'utf-8' codec can't encode characters
  in position N-M: surrogates not allowed

Changes:
- _sanitize_messages_surrogates: walk any extra string fields (reasoning,
  reasoning_content, etc.) and recurse into nested dict/list values
  (reasoning_details). Mirrors _sanitize_messages_non_ascii coverage
  added in PR #10537.
- _sanitize_structure_surrogates: new recursive walker, mirror of
  _sanitize_structure_non_ascii but for surrogate recovery.
- UnicodeEncodeError recovery block: also sanitize api_messages,
  api_kwargs, and prefill_messages (not just the canonical messages
  list — the API-copy carries reasoning_content transformed from
  reasoning and that's what the SDK actually serializes). Always
  retry on detected surrogate errors, not only when we found
  something to strip — gate on error type per PR #10537's pattern.

Tests: extended tests/cli/test_surrogate_sanitization.py with
coverage for reasoning, reasoning_content, reasoning_details (flat
and deeply nested), structure walker, and an integration case that
reproduces the exact api_messages shape that was crashing.

2026-04-17 13:30:47 -07:00

Teknium

77bdad5b02

fix(tests): resolve 12 CI failures + 10 errors across 6 root causes (#11040 )

Group A (3 tests): 'No LLM provider configured' RuntimeError
- test_user_message_surrogates_sanitized, test_counters_initialized_in_init,
  test_openai_prompt_tokens_unchanged
- Root cause: AIAgent.__init__ now requires base_url alongside api_key to
  skip resolve_provider_client() (which returns None when API keys are
  blanked in CI). Added base_url='http://localhost:1234/v1' to test
  agent construction.

Group B (5 tests): Discord slash command auto-registration
- test_auto_registers_missing_gateway_commands, test_auto_registered_command_*,
  test_register_skill_group_*
- Root cause: xdist workers that loaded a discord mock WITHOUT
  app_commands.Command/Group caused _register_slash_commands() to fail
  silently. Added comprehensive shared discord mock in
  tests/gateway/conftest.py (same pattern as existing telegram mock).

Group C (5 errors): Discord reply mode 'NoneType has no DMChannel'
- All TestReplyToText tests
- Root cause: FakeDMChannel was not a subclass of real discord.DMChannel,
  so isinstance() checks in _handle_message failed when running in full
  suite (real discord installed). Made FakeDMChannel inherit from
  discord.DMChannel when available. Removed fragile monkeypatch approach.

Group D (2 tests): detect_provider_for_model wrong provider
- test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_
  openrouter_slug (got 'copilot')
- Root cause: ai-gateway, copilot, and kilocode are multi-vendor
  aggregators that list other providers' models (OpenRouter-style slugs).
  They were being matched in Step 1 before OpenRouter. Added all three
  to _AGGREGATORS set so they're skipped like nous/openrouter.

Group E (1 test): model_flow_custom StopIteration
- test_model_flow_custom_saves_verified_v1_base_url
- Root cause: 'Display name' prompt was added after the test was written.
  The input iterator had 5 answers but the flow now asks 6 questions.
  Added 6th empty string answer.

Group F (1 test): Telegram proxy env assertion
- test_uses_proxy_env_for_primary_and_fallback_transports
- Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first
  (via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this
  env var, allowing potential leakage from other tests in xdist workers.
  Added TELEGRAM_PROXY to the cleanup list.

2026-04-16 06:49:36 -07:00

Siddharth Balyan

f3006ebef9

refactor(tests): re-architect tests + fix CI failures (#5946 )

* refactor: re-architect tests to mirror the codebase

* Update tests.yml

* fix: add missing tool_error imports after registry refactor

* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist

patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.

* fix(tests): fix update_check and telegram xdist failures

- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
  monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
  directly, it uses get_hermes_home() from hermes_constants.

- test_telegram_conflict/approval_buttons: provide real exception classes
  for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
  except clause in connect() doesn't fail with "catching classes that do
  not inherit from BaseException" when xdist pollutes sys.modules.

* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock

2026-04-07 17:19:07 -07:00

Renamed from tests/test_surrogate_sanitization.py (Browse further)

3 commits