hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	e2ea8934d4	feat: ensure feature parity once again	2026-04-11 14:02:36 -05:00
Brooklyn Nicholson	bf6af95ff5	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-11 13:14:36 -05:00
Brooklyn Nicholson	3fd5cf6e3c	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
kshitijk4poor	50bb4fe010	fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection Cherry-picked from PR #7749 by kshitijk4poor with modifications: - Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider) - Send images at full resolution first; only auto-resize to 5 MB on API failure - Add _is_image_size_error() helper to detect size-related API rejections - Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction - Fix get_model_capabilities() to check modalities.input for vision support - Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent) - Applied retry-with-resize to both vision_analyze_tool and browser_vision Closes #7740	2026-04-11 11:12:50 -07:00
Teknium	06e1d9cdd4	fix: resolve three high-impact community bugs (#5819 , #6893 , #3388 ) (#7881 ) Matrix gateway: fix sync loop never dispatching events (#5819) - _sync_loop() called client.sync() but never called handle_sync() to dispatch events to registered callbacks — _on_room_message was registered but never fired for new messages - Store next_batch token from initial sync and pass as since= to subsequent incremental syncs (was doing full initial sync every time) - 17 comments, confirmed by multiple users on matrix.org Feishu docs: add interactive card configuration for approvals (#6893) - Error 200340 is a Feishu Developer Console configuration issue, not a code bug — users need to enable Interactive Card capability and configure Card Request URL - Added required 3-step setup instructions to feishu.md - Added troubleshooting entry for error 200340 - 17 comments from Feishu users Copilot provider drift: detect GPT-5.x Responses API requirement (#3388) - GPT-5.x models are rejected on /v1/chat/completions by both OpenAI and OpenRouter (unsupported_api_for_model error) - Added _model_requires_responses_api() to detect models needing Responses API regardless of provider - Applied in __init__ (covers OpenRouter primary users) and in _try_activate_fallback() (covers Copilot->OpenRouter drift) - Fixed stale comment claiming gateway creates fresh agents per message (it caches them via _agent_cache since the caching was added) - 7 comments, reported on Copilot+Telegram gateway	2026-04-11 11:12:20 -07:00
Siddharth Balyan	69f3aaa1d6	fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21 (#7848 ) * fix(matrix): pass required args to MemoryCryptoStore for mautrix ≥0.21 MemoryCryptoStore.__init__() now requires account_id and pickle_key positional arguments as of mautrix 0.21. The migration from matrix-nio (commit `1850747`) didn't account for this, causing E2EE initialization to fail with: MemoryCryptoStore.__init__() missing 2 required positional arguments: 'account_id' and 'pickle_key' Pass self._user_id as account_id and derive pickle_key from the same user_id:device_id pair already used for the on-disk HMAC signature. Update the test stub to accept the new parameters. Fixes #7803 * fix: use consistent fallback for pickle_key derivation Address review: _pickle_key now uses _acct_id (which has the 'hermes' fallback) instead of raw self._user_id, so both values stay consistent when user_id is empty. --------- Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-11 10:43:49 -07:00
Brooklyn Nicholson	b04248f4d5	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor # Conflicts: # gateway/platforms/base.py # gateway/run.py # tests/gateway/test_command_bypass_active_session.py	2026-04-11 11:39:47 -05:00
kshitijk4poor	af9caec44f	fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal Based on PR #7285 by @kshitijk4poor. Two bugs affecting Qwen OAuth users: 1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M. Added specific entries before the generic qwen catch-all: - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per official Alibaba Cloud docs and OpenRouter) - qwen3-coder: 262,144 2. Random stopping — max_tokens was suppressed for Qwen Portal, so the server applied its own low default. Reasoning models exhaust that on thinking tokens. Now: honor explicit max_tokens, default to 65536 when unset. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-11 03:29:31 -07:00
Teknium	f459214010	feat: background process monitoring — watch_patterns for real-time output alerts * feat: add watch_patterns to background processes for output monitoring Adds a new 'watch_patterns' parameter to terminal(background=true) that lets the agent specify strings to watch for in process output. When a matching line appears, a notification is queued and injected as a synthetic message — triggering a new agent turn, similar to notify_on_complete but mid-process. Implementation: - ProcessSession gets watch_patterns field + rate-limit state - _check_watch_patterns() in ProcessRegistry scans new output chunks from all three reader threads (local, PTY, env-poller) - Rate limited: max 8 notifications per 10s window - Sustained overload (45s) permanently disables watching for that process - watch_queue alongside completion_queue, same consumption pattern - CLI drains watch_queue in both idle loop and post-turn drain - Gateway drains after agent runs via _inject_watch_notification() - Checkpoint persistence + crash recovery includes watch_patterns - Blocked in execute_code sandbox (like other bg params) - 20 new tests covering matching, rate limiting, overload kill, checkpoint persistence, schema, and handler passthrough Usage: terminal( command='npm run dev', background=true, watch_patterns=['ERROR', 'WARN', 'listening on port'] ) * refactor: merge watch_queue into completion_queue Unified queue with 'type' field distinguishing 'completion', 'watch_match', and 'watch_disabled' events. Extracted _format_process_notification() in CLI and gateway to handle all event types in a single drain loop. Removes duplication across both CLI drain sites and the gateway.	2026-04-11 03:13:23 -07:00
Hygaard	a2f9f04c06	fix: honor session-scoped gateway model overrides	2026-04-11 03:11:34 -07:00
luyao618	50ad66aee6	test(tools): add unit tests for budget_config module Cover default constants, BudgetConfig defaults, frozen immutability, custom construction, and the resolve_threshold() priority chain (pinned > tool_overrides > registry > default). 20 tests total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:58:48 -07:00
luyao618	80d82c2f5c	test(tools): add unit tests for tool_backend_helpers module Cover all public functions with 50 test cases: - managed_nous_tools_enabled() feature flag toggling - normalize_browser_cloud_provider() coercion and defaults - coerce_modal_mode() / normalize_modal_mode() validation - has_direct_modal_credentials() env vars and config file detection - resolve_modal_backend_state() full backend selection matrix - resolve_openai_audio_api_key() priority chain and edge cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:58:48 -07:00
Teknium	7241e6134b	fix: remove stale test (missing pop_pending), add headers to FakeResponse Follow-up fixes for cherry-pick conflicts: - Removed test_context_keeps_pending_approval test that referenced pop_pending() which doesn't exist on current main - Added headers attribute to FakeResponse in vision test (needed after #6949 added Content-Length check)	2026-04-11 02:03:20 -07:00
Kenny Xie	ae9a713a0a	test(approval): clear leaked bypass state	2026-04-11 02:03:20 -07:00
Kenny Xie	eb8071bbc1	test(gateway): isolate blocking approval env	2026-04-11 02:03:20 -07:00
Kenny Xie	086d92a0e0	test(tools): isolate approval and audio gateway env	2026-04-11 02:03:20 -07:00
Tranquil-Flow	4e56eacdce	fix(vision): reject oversized images before API call, handle file:// URIs, improve 400 errors Three fixes for vision_analyze returning cryptic 400 "Invalid request data": 1. Pre-flight base64 size check — base64 inflates data ~33%, so a 3.8 MB file exceeds the 5 MB API limit. Reject early with a clear message instead of letting the provider return a generic 400. 2. Handle file:// URIs — strip the scheme and resolve as a local path. Previously file:///path/to/image.png fell through to the "invalid image source" error since it matched neither is_file() nor http(s). 3. Separate invalid_request errors from "does not support vision" errors so the user gets actionable guidance (resize/compress/retry) instead of a misleading "model does not support vision" message. Closes #6677	2026-04-11 02:03:20 -07:00
kagura-agent	4d1f1dccf9	fix: normalize numeric MCP server names to str (fixes #6901) YAML parses bare numeric keys (e.g. `12306:`) as int, causing TypeError when sorted() is called on mixed int/str collections. Changes: - Normalize toolset_names entries to str in _get_platform_tools() - Cast MCP server name to str(name) when building enabled_mcp_servers - Add regression test	2026-04-11 02:03:20 -07:00
jjovalle99	640441b865	feat(tools): add Voxtral TTS provider (Mistral AI)	2026-04-11 01:56:55 -07:00
Teknium	424b62aa16	fix: update async fallback test mock to 5-tuple for api_mode	2026-04-11 01:52:58 -07:00
kshitijk4poor	c89719ad9c	fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)	2026-04-11 01:52:58 -07:00
kshitijk4poor	eeb8b4b00f	fix(auxiliary): harden fallback behavior for non-OpenRouter users Four fixes to auxiliary_client.py: 1. Respect explicit provider as hard constraint (#7559) When auxiliary.{task}.provider is explicitly set (not 'auto'), connection/payment errors no longer silently fallback to cloud providers. Local-only users (Ollama, vLLM) will no longer get unexpected OpenRouter billing from auxiliary tasks. 2. Eliminate model='default' sentinel (#7512) _resolve_api_key_provider() no longer sends literal 'default' as model name to APIs. Providers without a known aux model in _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing model_not_supported errors. 3. Add payment/connection fallback to async_call_llm (#7512) async_call_llm now mirrors sync call_llm's fallback logic for payment (402) and connection errors. Previously, async consumers (session_search, web_tools, vision) got hard failures with no recovery. Also fixes hardcoded 'openrouter' fallback to use the full auto-detection chain. 4. Use accurate error reason in fallback logs (#7512) _try_payment_fallback() now accepts a reason parameter and uses it in log messages. Connection timeouts are no longer misleadingly logged as 'payment error'. Closes #7559 Closes #7512	2026-04-11 01:52:58 -07:00
kshitijk4poor	ffbd80f5fc	fix(auxiliary): honor api_mode in auxiliary client (#6800 ) The auxiliary client always calls client.chat.completions.create(), ignoring the api_mode config flag. This breaks codex-family models (e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the /v1/responses endpoint. Changes: - Expand _resolve_task_provider_model to return api_mode (5-tuple) - Read api_mode from auxiliary.{task}.api_mode config and env vars (AUXILIARY_{TASK}_API_MODE) - Pass api_mode through _get_cached_client to resolve_provider_client - Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI clients in CodexAuxiliaryClient when api_mode=codex_responses or when auto-detection finds api.openai.com + codex model pattern - Apply wrapping at all custom endpoint, named custom provider, and API-key provider return paths - Update test mocks for the new 5-tuple return format Users can now set: auxiliary: compression: model: gpt-5.3-codex base_url: https://api.openai.com/v1 api_mode: codex_responses Closes #6800	2026-04-11 01:52:58 -07:00
jamesarch	704488b207	fix(setup): relaunch chat in a fresh process	2026-04-11 01:47:48 -07:00
konsisumer	b87e0f59cc	fix(skills): read name from SKILL.md frontmatter in skills_sync _discover_bundled_skills() used the directory name to identify skills, but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md frontmatter. This mismatch caused 9 builtin skills whose directory name differs from their SKILL.md name to be written to .bundled_manifest under the wrong key, so `hermes skills list` showed them as "local" instead of "builtin". Read the frontmatter name field (with directory-name fallback) so the manifest keys match what the rest of the codebase expects. Closes #6835	2026-04-11 01:21:20 -07:00
kshitijk4poor	d442f25a2f	fix: align MiniMax provider with official API docs Aligns MiniMax provider with official API documentation. Fixes 6 bugs: transport mismatch (openai_chat -> anthropic_messages), credential leak in switch_model(), prompt caching sent to non-Anthropic endpoints, dot-to-hyphen model name corruption, trajectory compressor URL routing, and stale doctor health check. Also corrects context window (204,800), thinking support (manual mode), max output (131,072), and model catalog (M2 family only on /anthropic). Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-04-11 01:04:41 -07:00
Kathie1ee	d9f53dba4c	feat(honcho): add opt-in initOnSessionStart for tools mode and respect explicit peerName (#6995) Two fixes for the honcho memory plugin: (1) initOnSessionStart — opt-in eager session init in tools mode so sync_turn() works from turn 1 (default false, non-breaking). (2) peerName fix — gateway user_id no longer silently overwrites an explicitly configured peerName. 11 new tests. Contributed by @Kathie-yu.	2026-04-11 00:43:27 -07:00
Teknium	caf371da18	fix: MiniMax/Alibaba incorrectly detected as Anthropic OAuth, causing mcp_ tool prefix (#7509 ) _is_oauth_token() returned True for any key not starting with 'sk-ant-api', which means MiniMax and Alibaba API keys were falsely treated as Anthropic OAuth tokens. This triggered the Claude Code compatibility path: - All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search) - System prompt injected with 'You are Claude Code' identity - 'Hermes Agent' replaced with 'Claude Code' throughout Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by their key format instead of using a broad catch-all: - sk-ant-* (but not sk-ant-api-) -> setup tokens, managed keys - eyJ -> JWTs from Anthropic OAuth flow - Everything else -> False (MiniMax, Alibaba, etc.) Reported by stefan171.	2026-04-11 00:43:01 -07:00
Kenny Xie	ecfae98152	fix(gateway): address restart review feedback	2026-04-10 21:18:34 -07:00
aquaright1	a55c044ca8	fix(gateway): self-request service restarts when invoked in-process	2026-04-10 21:18:34 -07:00
Kenny Xie	3163731289	fix(gateway): drain in-flight work before restart	2026-04-10 21:18:34 -07:00
Teknium	241032455c	fix: don't evict cached agent on failed runs — prevents MCP restart loop (#7539 ) * fix: circuit breaker stops CPU-burning restart loops on persistent errors When a gateway session hits a non-retryable error (e.g. invalid model ID → HTTP 400), the agent fails and returns. But if the session keeps receiving messages (or something periodically recreates agents), each attempt spawns a new AIAgent — reinitializing MCP server connections, burning CPU — only to hit the same 400 error again. On a 4-core server, this pegs an entire core per stuck session and accumulates 300+ minutes of CPU time over hours. Fix: add a per-session consecutive failure counter in the gateway runner. - Track consecutive non-retryable failures per session key - After 3 consecutive failures (_MAX_CONSECUTIVE_FAILURES), block further agent creation for that session and notify the user: '⚠️ This session has failed N times in a row with a non-retryable error. Use /reset to start a new session.' - Evict the cached agent when the circuit breaker engages to prevent stale state from accumulating - Reset the counter on successful agent runs - Clear the counter on /reset and /new so users can recover - Uses getattr() pattern so bare GatewayRunner instances (common in tests using object.__new__) don't crash Tests: - 8 new tests in test_circuit_breaker.py covering counter behavior, threshold, reset, session isolation, and bare-runner safety Addresses #7130. * Revert "fix: circuit breaker stops CPU-burning restart loops on persistent errors" This reverts commit `d848ea7109`. * fix: don't evict cached agent on failed runs — prevents MCP restart loop When a run fails (e.g. invalid model ID → 400) and fallback activated, the gateway was evicting the cached agent to 'retry primary next time.' But evicting a failed agent forces a full AIAgent recreation on the next message — reinitializing MCP server connections, spawning stdio processes — only to hit the same 400 again. 
This created a CPU-burning loop (91%+ for hours, #7130). The fix: add `and not _run_failed` to the fallback-eviction check. Failed runs keep the cached agent. The next message reuses it (no MCP reinit), hits the same error, returns it to the user quickly. The user can /reset or /model to fix their config. Successful fallback runs still evict as before so the next message retries the primary model. Addresses #7130.	2026-04-10 21:16:56 -07:00
Kenny Xie	1ffd92cc94	fix(gateway): make manual compression feedback truthful	2026-04-10 21:16:53 -07:00
Kenny Xie	d6c2ad7e41	fix(gateway): make compress responses truthful	2026-04-10 21:16:53 -07:00
luyao618	fc06a0147e	fix(tools): remove dead code in _is_likely_binary and harden _check_lint against brace paths - Remove unreachable `if not content_sample` branch inside the truthy `if content_sample` block in `_is_likely_binary()` (dead code that could never execute). - Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)` in `_check_lint()` so file paths containing curly braces (e.g. `src/{test}.py`) no longer raise KeyError/ValueError. - Add 16 unit tests covering both fixes and edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi	c1af614289	fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks GPT-5+ models (except gpt-5-mini) are only accessible via the Responses API on Copilot. When these models were configured as the compression summary_model (or any auxiliary task), the plain OpenAI client sent them to /chat/completions which returned a 400 error: model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint resolve_provider_client() now checks _should_use_copilot_responses_api() for the copilot provider and wraps the client in CodexAuxiliaryClient when needed, routing calls through responses.stream() transparently. Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping (gpt-4.1-mini) paths.	2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi	718e8ad6fa	feat(delegation): add configurable reasoning_effort for subagents Add delegation.reasoning_effort config key so subagents can run at a different thinking level than the parent agent. When set, overrides the parent's reasoning_config; when empty, inherits as before. Valid values: xhigh, high, medium, low, minimal, none (disables thinking). Config path: delegation.reasoning_effort in config.yaml Files changed: - tools/delegate_tool.py: resolve override in _build_child_agent - hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG - tests/tools/test_delegate.py: 4 new tests covering all cases	2026-04-10 21:16:53 -07:00
Teknium	be9198f1e1	fix: guard mautrix imports for gateway-safe fallback + fix test isolation Follow-up fixes for the matrix-nio → mautrix migration: 1. Module-level mautrix.types import now wrapped in try/except with proper stub classes. Without this, importing gateway.platforms.matrix crashes the entire gateway when mautrix isn't installed — even for users who don't use Matrix. The stubs mirror mautrix's real attribute names so tests that exercise adapter methods (send, reactions, etc.) work without the real SDK. 2. Removed _ensure_mautrix_mock() from test_matrix_mention.py — it permanently installed MagicMock modules in sys.modules via setdefault(), polluting later tests in the suite. No longer needed since the module imports cleanly without mautrix. 3. Fixed thread persistence tests to use direct class reference in monkeypatch.setattr() instead of string-based paths, which broke when the module was reimported by other tests. 4. Moved the module-importability test to a subprocess to prevent it from polluting sys.modules (reimporting creates a second module object with different __dict__, breaking patch.object in subsequent tests).	2026-04-10 21:15:59 -07:00
alt-glitch	d5be23aed7	docs(matrix): update all references from matrix-nio to mautrix	2026-04-10 21:15:59 -07:00
alt-glitch	417e28f941	test(matrix): update all test mocks for mautrix-python API Rewrite mock infrastructure across three test files: - test_matrix.py: replace fake nio module with fake mautrix module tree, update all client method mocks to new API names and return types - test_matrix_voice.py: update event construction, download/upload mocks, handler invocation (single event arg, no room object) - test_matrix_mention.py: update mock module, event construction, DM detection via _dm_rooms cache instead of room.member_count 157 tests passing.	2026-04-10 21:15:59 -07:00
alt-glitch	1850747172	refactor(matrix): swap matrix-nio for mautrix-python dependency matrix-nio pulls in peewee -> atomicwrites (sdist-only, archived, missing build-system metadata) which breaks nix flake builds. mautrix-python publishes wheels, has a leaner dep tree, and its [encryption] extra uses the same python-olm without the problematic transitive chain.	2026-04-10 21:15:59 -07:00
Teknium	a8fd7257b1	feat(gateway): WSL-aware gateway with smart systemd detection (#7510 ) - Add shared is_wsl() to hermes_constants (like is_termux) - Update supports_systemd_services() to verify systemd is actually running on WSL before returning True - Add WSL-specific guidance in gateway install/start/setup/status for both cases: WSL+systemd and WSL without systemd - Improve help strings: 'run' now says recommended for WSL/Docker, 'start'/'install' now mention systemd/launchd explicitly - Add WSL gateway FAQ section with tmux/nohup/Task Scheduler tips - Update CLI commands docs with WSL tip - Deduplicate _is_wsl() from clipboard.py to shared hermes_constants - Fix clipboard tests to reset hermes_constants cache - 20 new WSL-specific tests covering detection, systemd check, supports_systemd_services integration, and command output Motivated by user feedback: took 1 hour to figure out run vs start on WSL, Telegram bot kept disconnecting due to flaky WSL systemd.	2026-04-10 21:15:47 -07:00
Hermes Agent	97bb64dbbf	test(file_sync): add tests for bulk_upload_fn callback Cover the three key behaviors: - bulk_upload_fn is called instead of per-file upload_fn - Fallback to upload_fn when bulk_upload_fn is None - Rollback on bulk upload failure retries all files	2026-04-10 21:14:32 -07:00
Teknium	436dfd5ab5	fix: no auto-activation + unified hermes plugins UI with provider categories - Remove auto-activation: when context.engine is 'compressor' (default), plugin-registered engines are NOT used. Users must explicitly set context.engine to a plugin name to activate it. - Add curses_radiolist() to curses_ui.py: single-select radio picker with keyboard nav + text fallback, matching curses_checklist pattern. - Rewrite cmd_toggle() as composite plugins UI: Top section: general plugins with checkboxes (existing behavior) Bottom section: provider plugin categories (Memory Provider, Context Engine) with current selection shown inline. ENTER/SPACE on a category opens a radiolist sub-screen for single-select configuration. - Add provider discovery helpers: _discover_memory_providers(), _discover_context_engines(), config read/save for memory.provider and context.engine. - Add tests: radiolist non-TTY fallback, provider config save/load, discovery error handling, auto-activation removal verification.	2026-04-10 19:15:50 -07:00
Stephen Schoettler	92382fb00e	feat: wire context engine plugin slot into agent and plugin system - PluginContext.register_context_engine() lets plugins replace the built-in ContextCompressor with a custom ContextEngine implementation - PluginManager stores the registered engine; only one allowed - run_agent.py checks for a plugin engine at init before falling back to the default ContextCompressor - reset_session_state() now calls engine.on_session_reset() instead of poking internal attributes directly - ContextCompressor.on_session_reset() handles its own internals (_context_probed, _previous_summary, etc.) - 19 new tests covering ABC contract, defaults, plugin slot registration, rejection of duplicates/non-engines, and compressor reset behavior - All 34 existing compressor tests pass unchanged	2026-04-10 19:15:50 -07:00
Teknium	842e669a13	fix: activate fallback provider on repeated empty responses + user-visible status (#7505 ) When models return empty responses (no content, no tool calls, no reasoning), Hermes previously retried 3 times silently then fell through to '(empty)' — without ever trying the fallback provider chain. Users on GLM-4.5-Air and similar models experienced what appeared to be a complete hang, especially in gateway (Telegram/Discord) contexts where the silent retries produced zero feedback. Changes: - After exhausting 3 empty retries, attempt _try_activate_fallback() before giving up with '(empty)'. If fallback succeeds, reset retry counter and continue the conversation loop with the new provider. - Replace all _vprint() calls in recovery paths with _emit_status(), which surfaces messages through both CLI (_vprint with force=True) and gateway (status_callback -> adapter.send). Users now see: * '⚠️ Empty response from model — retrying (N/3)' during retries * '⚠️ Model returning empty responses — switching to fallback...' * '↻ Switched to fallback: <model> (<provider>)' on success * '❌ Model returned no content after all retries [and fallback]' - Add logger.warning() throughout empty response paths for log file visibility (model name, provider, retry counts). - Upgrade _last_content_with_tools fallback from logger.debug to logger.info + _emit_status so recovery is visible. - Upgrade thinking-only prefill continuation to use _emit_status. Tests: - test_empty_response_triggers_fallback_provider: verifies fallback activation after 3 empty retries produces content from fallback model - test_empty_response_fallback_also_empty_returns_empty: verifies graceful degradation when fallback also returns empty - test_empty_response_emits_status_for_gateway: verifies _emit_status is called during retries so gateway users see feedback Addresses #7180.	2026-04-10 19:15:41 -07:00
Bartok Moltbot	992422910c	fix(api): send tool progress as custom SSE event to prevent model corruption (#6972 ) Tool progress markers (e.g. `⏰ list`) were injected directly into SSE delta.content chunks. OpenAI-compatible frontends (Open WebUI, LobeChat, etc.) store delta.content verbatim as the assistant message and send it back on subsequent requests. After enough turns, the model learns to emit these markers as plain text instead of issuing real tool calls — silently hallucinating tool results without ever running them. Fix: Send tool progress as a custom `event: hermes.tool.progress` SSE event instead of mixing it into delta.content. Per the SSE spec, clients that don't understand a custom event type silently ignore it, so this is backward-compatible. Frontends that want to render progress indicators can listen for the custom event without persisting it to conversation history. The /v1/runs endpoint already uses structured events — this aligns the /v1/chat/completions streaming path with the same principle. Closes #6972	2026-04-10 18:55:26 -07:00
Siddharth Balyan	9a0c44f908	fix(nix): gate matrix extra to Linux in [all] profile (#7461 ) * fix(nix): gate matrix extra to Linux in [all] profile matrix-nio[e2e] depends on python-olm which is upstream-broken on modern macOS (Clang 21+, archived libolm). Previously the [matrix] extra was completely excluded from [all], meaning NixOS users (who install via [all]) had no Matrix support at all. Add a sys_platform == 'linux' marker so [all] pulls in [matrix] on Linux (where python-olm builds fine) while still skipping it on macOS. This fixes the NixOS setup path without breaking macOS installs. Update the regression test to verify the Linux-gated marker is present rather than just checking matrix is absent from [all]. Fixes #4594 * chore: regenerate uv.lock with matrix-on-linux in [all]	2026-04-11 05:59:56 +05:30
0xFrank-eth	e8034e2f6a	fix(gateway): replace os.environ session state with contextvars for concurrency safety When two gateway messages arrived concurrently, _set_session_env wrote HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global os.environ. Because asyncio tasks share the same process, Message B would overwrite Message A's values mid-flight, causing background-task notifications and tool calls to route to the wrong thread/chat. Replace os.environ with Python's contextvars.ContextVar. Each asyncio task (and any run_in_executor thread it spawns) gets its own copy, so concurrent messages never interfere. Changes: - New gateway/session_context.py with ContextVar definitions, set/clear/get helpers, and os.environ fallback for CLI/cron/test backward compatibility - gateway/run.py: _set_session_env returns reset tokens, _clear_session_env accepts them for proper cleanup in finally blocks - All tool consumers updated: cronjob_tools, send_message_tool, skills_tool, terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool, agent/skill_utils, agent/prompt_builder - Tests updated for new contextvar-based API Fixes #7358 Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-04-10 17:04:38 -07:00
Dylan Socolobsky	dab5ec8245	test(e2e): add Slack to parametrized e2e platform tests	2026-04-10 16:51:44 -07:00
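
Commit e8034e2f6a in the table above replaces process-global `os.environ` session state with `contextvars.ContextVar` so that concurrent gateway messages stop overwriting each other's values. The following is a minimal sketch of that isolation property only, not the repository's code: the variable name `session_chat_id` and the function `handle_message` are hypothetical, and the real definitions (per the commit message) live in `gateway/session_context.py`.

```python
import asyncio
import contextvars

# Hypothetical stand-in for one of the gateway's session variables.
session_chat_id = contextvars.ContextVar("session_chat_id", default=None)


async def handle_message(chat_id: str, delay: float) -> str:
    # Each asyncio task runs in its own copy of the context, so this set()
    # is not visible to other tasks running concurrently.
    token = session_chat_id.set(chat_id)
    try:
        # Yield control so the other task can run and set its own value.
        await asyncio.sleep(delay)
        return session_chat_id.get()
    finally:
        # Restore the previous value, mirroring the reset-token cleanup
        # the commit message describes for finally blocks.
        session_chat_id.reset(token)


async def main() -> list:
    # gather() wraps each coroutine in its own task (and context copy).
    return await asyncio.gather(
        handle_message("chat-A", 0.02),
        handle_message("chat-B", 0.01),
    )


results = asyncio.run(main())
print(results)
```

Each handler reads back the value it set itself even though the two tasks interleave. With a shared `os.environ` entry, the later writer would have replaced the earlier value before the first handler read it.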


1611 commits