hermes-agent

4951 commits 895 branches 10 tags 1.7 GiB

Author	SHA1	Message	Date
jarvischer	0f778f7768	fix: prevent tool name duplication in streaming accumulator (MiniMax/NVIDIA NIM) Based on #11984 by @maxchernin. Fixes #8259. Some providers (MiniMax M2.7 via NVIDIA NIM) resend the full function name in every streaming chunk instead of only the first. The old accumulator used += which concatenated them into 'read_fileread_file'. Changed to simple assignment (=), matching the OpenAI Node SDK, LiteLLM, and Vercel AI SDK patterns. Function names are atomic identifiers delivered complete — no provider splits them across chunks, so concatenation was never correct semantics.	2026-04-18 12:50:32 -07:00
Teknium	8322b42c6c	fix(streaming): surface dropped tool-call on mid-stream stall (#12072 ) When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal. Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown. Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content — and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning). Validation: \| Scenario \| Before \| After \| \|---------------------------------------------\|---------------------------\|---------------------------------------------\| \| Stream dies mid tool-call, text already sent \| Silent exit, no indication \| User sees ⚠ warning naming the dropped tool \| \| Text-only partial stream \| Bare recovered text \| Unchanged \| \| tests/run_agent/test_streaming.py \| 24 passed \| 26 passed (2 new) \|	2026-04-18 01:52:06 -07:00
Teknium	d0e1388ca9	fix(tests): make AIAgent constructor calls self-contained (#11755 ) * fix(tests): make AIAgent constructor calls self-contained (no env leakage) Tests in tests/run_agent/ were constructing AIAgent() without passing both api_key and base_url, then relying on leaked state from other tests in the same xdist worker (or process-level env vars) to keep provider resolution happy. Under hermetic conftest + pytest-split, that state is gone and the tests fail with 'No LLM provider configured'. Fix: pass both api_key and base_url explicitly on 47 AIAgent() construction sites across 13 files. AIAgent.__init__ with both set takes the direct-construction path (line 960 in run_agent.py) and skips the resolver entirely. One call site (test_none_base_url_passed_as_none) left alone — that test asserts behavior for base_url=None specifically. This is a prerequisite for any future matrix-split or stricter isolation work, and lands cleanly on its own. Validation: - tests/run_agent/ full: 760 passed, 0 failed (local) - Previously relied on cross-test pollution; now self-contained * fix(tests): update opencode-go model order assertion to match kimi-k2.5-first commit `78a74bb` promoted kimi-k2.5 to first position in model suggestion lists but didn't update this test, which has been failing on main since. Reorder expected list to match the new canonical order.	2026-04-17 12:32:03 -07:00
yongtenglei	2773b18b56	fix(run_agent): refresh activity during streaming responses Previously, long-running streamed responses could be incorrectly treated as idle by the gateway/cron inactivity timeout even while tokens were actively arriving. The _touch_activity() call (which feeds get_activity_summary() polled by the external timeout) was either called only on the first chunk (chat completions) or not at all (Anthropic, Codex, Codex fallback). Add _touch_activity() on every chunk/event in all four streaming paths so the inactivity monitor knows data is still flowing. Fixes #8760	2026-04-13 10:55:51 -07:00
Teknium	a5bd56eae3	fix: eliminate provider hang dead zones in retry/timeout architecture (#8985 ) Three targeted changes to close the gaps between retry layers that caused users to experience 'No response from provider for 580s' and 'No activity for 15 minutes' despite having 5 layers of retry: 1. Remove non-streaming fallback from streaming path Previously, when all 3 stream retries exhausted, the code fell back to _interruptible_api_call() which had no stale detection and no activity tracking — a black hole that could hang for up to 1800s. Now errors propagate to the main retry loop which has richer recovery (credential rotation, provider fallback, backoff). For 'stream not supported' errors, sets _disable_streaming flag so the main retry loop automatically switches to non-streaming on the next attempt. 2. Add _touch_activity to recovery dead zones The gateway inactivity monitor relies on _touch_activity() to know the agent is alive, but activity was never touched during: - Stale stream detection/kill cycles (180-300s gaps) - Stream retry connection rebuilds - Main retry backoff sleeps (up to 120s) - Error recovery classification Now all these paths touch activity every ~30s, keeping the gateway informed during recovery cycles. 3. Add stale-call detector to non-streaming path _interruptible_api_call() now has the same stale detection pattern as the streaming path: kills hung connections after 300s (default, configurable via HERMES_API_CALL_STALE_TIMEOUT), scaled for large contexts (450s for 50K+ tokens, 600s for 100K+ tokens), disabled for local providers. Also touches activity every ~30s during the wait so the gateway monitor stays informed. Env vars: - HERMES_API_CALL_STALE_TIMEOUT: non-streaming stale timeout (default 300s) - HERMES_STREAM_STALE_TIMEOUT: unchanged (default 180s) Before: worst case ~2+ hours of sequential retries with no feedback After: worst case bounded by gateway inactivity timeout (default 1800s) with continuous activity reporting	2026-04-13 04:55:20 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00

Renamed from tests/test_streaming.py (Browse further)

6 commits