* feat(steer): /steer <prompt> injects a mid-run note after the next tool call
Adds a new slash command that sits between /queue (turn boundary) and
interrupt. /steer <text> stashes the message on the running agent and
the agent loop appends it to the LAST tool result's content once the
current tool batch finishes. The model sees it as part of the tool
output on its next iteration.
No interrupt is fired, no new user turn is inserted, and no prompt
cache invalidation happens beyond the normal per-turn tool-result
churn. Message-role alternation is preserved — we only modify an
existing role:"tool" message's content.
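A minimal sketch of the injection step, assuming OpenAI-style role:"tool"
messages — the helper name matches the wiring below, but its exact signature
in run_agent.py is an assumption:

```python
# Sketch only: appends the stashed steer note to the LAST tool result.
def _apply_pending_steer_to_tool_results(tool_results, steer_text):
    if not steer_text or not tool_results:
        return
    last = tool_results[-1]
    note = f"\n\n[User steer]: {steer_text}"
    content = last.get("content")
    if isinstance(content, list):
        # Multimodal list content: append a text part instead of flattening.
        content.append({"type": "text", "text": note})
    else:
        last["content"] = (content or "") + note
```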
Wiring
------
- hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS.
- run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(),
_apply_pending_steer_to_tool_results(); drain at end of both parallel and
sequential tool executors; clear on interrupt; return leftover as
result['pending_steer'] if the agent exits before another tool batch.
- cli.py: /steer handler — route to agent.steer() when running, fall back to
the regular queue otherwise; deliver result['pending_steer'] as next turn.
- gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent
path strips the prefix and forwards as a regular user message.
- tui_gateway/server.py: new session.steer JSON-RPC method.
- ui-tui: SessionSteerResponse type + local /steer slash command that calls
session.steer when ui.busy, otherwise enqueues for the next turn.
Fallbacks
---------
- Agent exits mid-steer → surfaces in run_conversation result as pending_steer
so CLI/gateway deliver it as the next user turn instead of silently dropping it.
- All tools skipped after interrupt → re-stashes pending_steer for the caller.
- No active agent → /steer reduces to sending the text as a normal message.
Tests
-----
- tests/run_agent/test_steer.py — accept/reject, concatenation, drain,
last-tool-result injection, multimodal list content, thread safety,
cleared-on-interrupt, registry membership, bypass-set membership.
- tests/gateway/test_steer_command.py — running agent, pending sentinel,
missing steer() method, rejected payload, empty payload.
- tests/gateway/test_command_bypass_active_session.py — /steer bypasses
the Level-1 base adapter guard.
- tests/test_tui_gateway_server.py — session.steer RPC paths.
72/72 targeted tests pass under scripts/run_tests.sh.
* feat(steer): register /steer in Discord's native slash tree
Discord's app_commands tree is a curated subset of slash commands (not
derived from COMMAND_REGISTRY like Telegram/Slack). /steer already
works there as plain text (routes through handle_message → base
adapter bypass → runner), but registering it here adds Discord's
native autocomplete + argument hint UI so users can discover and
type it like any other first-class command.
base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.
- _rpc() accepts log_failures=False so callers can route repeated
expected failures (typing) to DEBUG while keeping send/receive at
WARNING.
- send_typing() tracks consecutive failures per chat. First failure
still logs WARNING so transport issues remain visible; subsequent
failures log at DEBUG. After three consecutive failures we skip the
RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
hammering signal-cli for a recipient it can't deliver to. A
successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
turn starts fresh.
E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.
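A minimal sketch of the backoff described above — thresholds from this
commit; the dict-based state is illustrative, not the adapter's actual
fields:

```python
import time

_fails = {}     # chat_id -> consecutive sendTyping failures
_cooldown = {}  # chat_id -> earliest time to try the RPC again

def send_typing(chat_id, rpc):
    if time.time() < _cooldown.get(chat_id, 0.0):
        return  # inside the cooldown window: skip the RPC entirely
    if rpc("sendTyping", chat_id, log_failures=False):
        _fails.pop(chat_id, None)    # success resets the counters
        _cooldown.pop(chat_id, None)
        return
    n = _fails[chat_id] = _fails.get(chat_id, 0) + 1
    # First failure logs at WARNING elsewhere; later ones at DEBUG.
    if n >= 3:  # 16s, 32s, then capped at 60s
        _cooldown[chat_id] = time.time() + min(16 * 2 ** (n - 3), 60)
```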
Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
Twelve tests under TestCJKSearchFallback guarding:
- CJK detection across Chinese/Japanese/Korean/Hiragana/Katakana ranges
(including the full Hangul syllables block \uac00-\ud7af, to catch
the shorter-range typo from one of the duplicate PRs)
- Substring match for multi-char Chinese, Japanese, Korean queries
- Filter preservation (source_filter, exclude_sources, role_filter)
in the LIKE path — guards against the SQL-builder bug from another
duplicate PR where filter clauses landed after LIMIT/OFFSET
- Snippet centered on the matched term (instr-based substr window),
not the leading 200 chars of content
- English fast-path untouched
- Empty/no-match cases
- Mixed CJK+English queries
Also:
- hermes_state.py: LIKE-fallback snippet is now
`substr(content, max(1, instr(content, ?) - 40), 120)`, centered on
the match instead of the whole-content default. Credit goes to
@iamagenius00 for the snippet idea in PR #11517.
- scripts/release.py: add @iamagenius00 to AUTHOR_MAP so future
release attribution resolves cleanly.
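For context, the LIKE path's snippet now comes out of a query shaped roughly
like this — the substr/instr expression is the one quoted above; the SELECT
shape and table/column names are assumptions:

```python
import sqlite3

def like_fallback_search(conn: sqlite3.Connection, term: str):
    # Snippet is a 120-char window starting 40 chars before the match,
    # clamped to the start of the content.
    sql = (
        "SELECT session_id, "
        "substr(content, max(1, instr(content, ?) - 40), 120) AS snippet "
        "FROM messages WHERE content LIKE '%' || ? || '%' LIMIT 20"
    )
    return conn.execute(sql, (term, term)).fetchall()
```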
Refs #11511, #11516, #11517, #11541.
Co-authored-by: iamagenius00 <iamagenius00@users.noreply.github.com>
When streaming died after text was already delivered to the user but
before a tool-call's arguments finished streaming, the partial-stream
stub at the end of _interruptible_streaming_api_call silently set
`tool_calls=None` on the returned message and kept `finish_reason=stop`.
The agent treated the turn as complete, the session exited cleanly with
code 0, and the attempted action was lost with zero user-facing signal.
Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task:
agent streamed 'Let me write the audit:', started emitting a write_file
tool call, MiniMax stalled for 240s mid-arguments, the stale-stream
detector killed the connection, the stub fired, session ended, no file
written, no error shown.
Fix: the streaming accumulator now records each tool-call's name into
`result['partial_tool_names']` as soon as the name is known. When the
stub builder fires after a partial delivery and finds any recorded tool
names, it appends a human-visible warning to the stub's content — and
also fires it as a live stream delta so the user sees it immediately,
not only in the persisted transcript. The next turn's model also sees
the warning in conversation history and can retry on its own. Text-only
partial streams keep the original bare-recovery behaviour (no warning).
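A sketch of the stub-builder branch — `partial_tool_names` is the field this
commit adds; the message shape and the delta hook are assumptions:

```python
def build_partial_stream_stub(result, emit_delta):
    # emit_delta is a stand-in for the live stream-delta hook.
    content = result.get("partial_text", "")
    names = result.get("partial_tool_names") or []
    if names:
        warning = ("\n\n⚠ Stream ended before these tool call(s) finished: "
                   + ", ".join(names) + ". The action was NOT executed.")
        content += warning   # persisted transcript + next-turn history
        emit_delta(warning)  # live delta so the user sees it immediately
    return {"role": "assistant", "content": content,
            "tool_calls": None, "finish_reason": "stop"}
```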
Validation:
| Scenario | Before | After |
|---------------------------------------------|---------------------------|---------------------------------------------|
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream | Bare recovered text | Unchanged |
| tests/run_agent/test_streaming.py | 24 passed | 26 passed (2 new) |
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.
Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:
project (new default):
- cwd = session's TERMINAL_CWD (falls back to os.getcwd())
- python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
with a Python 3.8+ version check; falls back cleanly to
sys.executable if no venv or the candidate fails
- result : 'import pandas' works, '.env' resolves, matches terminal()
strict (opt-in):
- cwd = staging tmpdir (today's behavior)
- python = sys.executable (today's behavior)
- result : maximum reproducibility and isolation; project deps
won't resolve
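A sketch of the project-mode interpreter resolution; the helper name and
exact probe are assumptions:

```python
import os
import subprocess
import sys

def _resolve_project_python() -> str:
    for env_var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        prefix = os.environ.get(env_var)
        if not prefix:
            continue
        candidate = os.path.join(prefix, "bin", "python")
        try:
            probe = subprocess.run(
                [candidate, "-c",
                 "import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)"],
                timeout=5,
            )
            if probe.returncode == 0:
                return candidate
        except (OSError, subprocess.TimeoutExpired):
            pass  # missing or broken candidate: keep falling through
    return sys.executable  # clean fallback, same interpreter as 'strict' mode
```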
Security-critical invariants are identical across both modes and covered by
explicit regression tests:
- env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
*_CREDENTIAL, *_PASSWD, *_AUTH substrings)
- SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
delegate_task, no MCP from inside scripts)
- resource caps (5-min timeout, 50KB stdout, 50 tool calls)
Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).
Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
Seven test files were asserting against older function signatures and
behaviors. CI has been red on main because of accumulated test debt
from other PRs; this catches the tests up.
- tests/agent/test_subagent_progress.py: _build_child_progress_callback
now takes (task_index, goal, parent_agent, task_count=1); update all
call sites and rewrite tests that assumed the old 'batch-only' relay
semantics (now relays per-tool AND flushes a summary at BATCH_SIZE).
Renamed test_thinking_not_relayed_to_gateway → test_thinking_relayed_to_gateway
since thinking IS now relayed as subagent.thinking.
- tests/tools/test_delegate.py: _build_child_agent now requires
task_count; add task_count=1 to all 8 call sites.
- tests/cli/test_reasoning_command.py: AIAgent gained _stream_callback;
stub it on the two test agent helpers that use spec=AIAgent / __new__.
- tests/hermes_cli/test_cmd_update.py: cmd_update now runs npm install
in repo root + ui-tui/ + web/ and 'npm run build' in web/; assert
all four subprocess calls in the expected order.
- tests/hermes_cli/test_model_validation.py: dissimilar unknown models
now return accepted=False (previously True with warning); update
both affected tests.
- tests/tools/test_registry.py: include feishu_doc_tool and
feishu_drive_tool in the expected builtin tool set.
- tests/gateway/test_voice_command.py: missing-voice-deps message now
suggests 'pip install PyNaCl' not 'hermes-agent[messaging]'.
411/411 pass locally across these 7 files.
hermes update no longer dies when the controlling terminal closes
(SSH drop, shell close) during pip install. SIGHUP is set to SIG_IGN
for the duration of the update, and stdout/stderr are wrapped so writes
to a closed pipe are absorbed instead of cascading into process exit.
All update output is mirrored to ~/.hermes/logs/update.log so users can
see what happened after reconnecting.
SIGINT (Ctrl-C) and SIGTERM (systemd) are intentionally still honored —
those are deliberate cancellations, not accidents. In gateway mode the
helper is a no-op since the update is already detached.
POSIX preserves SIG_IGN across exec(), so pip and git subprocesses
inherit hangup protection automatically — no changes to subprocess
spawning needed.
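A minimal sketch of the hangup guard — the real helper also wraps
stdout/stderr and tees output to update.log:

```python
import signal
from contextlib import contextmanager

@contextmanager
def sighup_ignored():
    # POSIX preserves SIG_IGN across exec(), so pip/git children inherit it.
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)
    try:
        yield
    finally:
        signal.signal(signal.SIGHUP, previous)
```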
When a Telegram /restart fires and PTB's graceful-shutdown `get_updates`
ACK call times out ("When polling for updates is restarted, updates may
be received twice" in gateway.log), the new gateway receives the same
/restart again and restarts a second time — a self-perpetuating loop.
Record the triggering update_id in `.restart_last_processed.json` when
handling /restart. On the next process, reject a /restart whose
update_id <= the recorded one as a stale redelivery. 5-minute staleness
guard so an orphaned marker can't block a legitimately new /restart.
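A sketch of the dedup check; marker field names are assumptions consistent
with the description above:

```python
import json
import time

def _is_stale_restart_redelivery(marker_path, update_id) -> bool:
    if update_id is None:
        return False  # CLI and platforms without update ids bypass the guard
    try:
        with open(marker_path) as f:
            marker = json.load(f)
    except (OSError, ValueError):
        return False  # missing or malformed marker never blocks a restart
    if time.time() - marker.get("ts", 0) > 300:
        return False  # 5-minute staleness window: orphaned markers expire
    return update_id <= marker.get("update_id", -1)
```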
- gateway/platforms/base.py: add `platform_update_id` to MessageEvent
- gateway/platforms/telegram.py: propagate `update.update_id` through
_build_message_event for text/command/location/media handlers
- gateway/run.py: write dedup marker in _handle_restart_command;
_is_stale_restart_redelivery checks it before processing /restart
- tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering
fresh restart, redelivery, staleness window, cross-platform,
malformed-marker resilience, and no-update_id (CLI) bypass
Only active for Telegram today (the one platform with monotonic
cross-session update ordering); other platforms return False from
_is_stale_restart_redelivery and proceed normally.
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
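A sketch of the fan-out — a per-agent tid set plus the per-thread flag
setter from tools/interrupt.py; class and method names are illustrative:

```python
import threading

def set_interrupt(tid):
    # Stand-in for tools/interrupt.py's per-thread interrupt flag setter.
    pass

class _InterruptFanout:
    def __init__(self, execution_tid):
        self._execution_tid = execution_tid
        self._worker_tids = set()
        self._lock = threading.Lock()
        self._interrupted = False

    def register_worker(self, tid):
        with self._lock:
            self._worker_tids.add(tid)
            if self._interrupted:
                set_interrupt(tid)  # register-after-interrupt race: flag late joiners

    def interrupt(self):
        with self._lock:
            self._interrupted = True
            for tid in {self._execution_tid, *self._worker_tids}:
                set_interrupt(tid)
```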
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires; process group killed |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
Extend forum support from PR #10145:
- REST path (_send_discord): forum thread creation now uploads media
files as multipart attachments on the starter message in a single
call. Previously media files were silently dropped on the forum
path.
- Websocket media paths (_send_file_attachment, send_voice, send_image,
send_animation — covers send_image_file, send_video, send_document
transitively): forum channels now go through a new _forum_post_file
helper that creates a thread with the file as starter content,
instead of failing via channel.send(file=...) which forums reject.
- _send_to_forum chunk follow-up failures are collected into
raw_response['warnings'] so partial-send outcomes surface.
- Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids
GET /channels/{id} on every uncached send after the first.
- Dedup of TestSendDiscordMedia that the PR merge-resolution left
behind.
- Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md.
Tests: 117 passed (22 new for forum+media, probe cache, warnings).
Follow-up to #11909: surface the legacy-unit warning where users are most
likely to see it. After a 'hermes update', if a pre-rename hermes.service
is still installed alongside the current hermes-gateway.service, print
the list of legacy units + the 'hermes gateway migrate-legacy' command.
Profile-safe: reuses _find_legacy_hermes_units() which is an explicit
allowlist of hermes.service only — profile units never match.
Platform-gated: only prints on systemd hosts (the rename is Linux-only).
Non-blocking: just prints, never prompts, so gateway-spawned
hermes update --gateway runs aren't affected.
* fix(gateway): detect legacy hermes.service units from pre-rename installs
Older Hermes installs used a different service name (hermes.service) before
the rename to hermes-gateway.service. When both units remain installed, they
fight over the same bot token — after PR #5646's signal-recovery change,
this manifests as a 30-second SIGTERM flap loop between the two services.
Detection is an explicit allowlist (no globbing) plus an ExecStart content
check, so profile units (hermes-gateway-<profile>.service) and unrelated
third-party services named 'hermes' are never matched.
Wired into systemd_install, systemd_status, gateway_setup wizard, and the
main hermes setup flow — anywhere we already warn about scope conflicts now
also warns about legacy units.
* feat(gateway): add migrate-legacy command + install-time removal prompt
- New hermes_cli.gateway.remove_legacy_hermes_units() removes legacy
unit files with stop → disable → unlink → daemon-reload. Handles user
and system scopes separately; system scope returns path list when not
running as root so the caller can tell the user to re-run with sudo.
- New 'hermes gateway migrate-legacy' subcommand (with --dry-run and -y)
routes to remove_legacy_hermes_units via gateway_command dispatch.
- systemd_install now offers to remove legacy units BEFORE installing
the new hermes-gateway.service, preventing the SIGTERM flap loop that
hits users who still have pre-rename hermes.service around.
Profile units (hermes-gateway-<profile>.service) remain untouched in
all paths — the legacy allowlist is explicit (_LEGACY_SERVICE_NAMES)
and the ExecStart content check further narrows matches.
* fix(gateway): mark --replace SIGTERM as planned so target exits 0
PR #5646 made SIGTERM exit the gateway with code 1 so systemd's
Restart=on-failure revives it after unexpected kills. But when a user has
two gateway units fighting for the same bot token (e.g. legacy
hermes.service + hermes-gateway.service from a pre-rename install), the
--replace takeover itself becomes the 'unexpected' SIGTERM — the loser
exits 1, systemd revives it 30s later, and the cycle flaps indefinitely.
Before calling terminate_pid(), --replace now writes a short-lived marker
file naming the target PID + start_time. The target's shutdown_signal_handler
consumes the marker and, when it names this process, leaves
_signal_initiated_shutdown=False so the final exit code stays 0.
Staleness defences:
- PID + start_time combo prevents PID reuse matching an old marker
- Marker older than 60s is treated as stale and discarded
- Marker is unlinked on first read even if it doesn't match this process
- Replacer clears the marker post-loop + on permission-denied give-up
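A sketch of the marker handshake with the staleness defences above; path and
field names are assumptions:

```python
import json
import os
import time

def write_replace_marker(path, target_pid, target_start_time):
    with open(path, "w") as f:
        json.dump({"pid": target_pid, "start_time": target_start_time,
                   "written_at": time.time()}, f)

def shutdown_is_planned(path, my_pid, my_start_time) -> bool:
    try:
        with open(path) as f:
            marker = json.load(f)
    except (OSError, ValueError):
        return False
    try:
        os.unlink(path)  # unlink on first read, even if it names someone else
    except OSError:
        pass
    if time.time() - marker.get("written_at", 0) > 60:
        return False  # stale marker: treat the SIGTERM as unplanned
    # PID + start_time together defeat PID-reuse false positives.
    return (marker.get("pid"), marker.get("start_time")) == (my_pid, my_start_time)
```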
Cherry-picked from #10985 by pedh, adapted to current main:
* Keeps main's full group-chat gating (require_mention + allowed_users +
free_response_chats + mention_patterns) — PR's simpler subset dropped.
* Keeps main's fire-and-forget process() dispatch + session_webhook
fallback for SDK >= 0.24.
* Picks up PR's REQUIRES_EDIT_FINALIZE capability flag on
BasePlatformAdapter + finalize kwarg on edit_message(), plumbed through
stream_consumer. Default False so Telegram/Slack/Discord/Matrix stay
on the zero-overhead fast path.
* DingTalk AI Card lifecycle: per-chat _message_contexts, two-card flow
(tool-progress + final response) with sibling auto-close driven by
  reply_to, idempotent 🤔Thinking → 🥳Done swap, and the
  alibabacloud-dingtalk SDK for media URL resolution (replaces raw HTTP
  that was 403-ing).
* pyproject: dingtalk extra now dingtalk-stream>=0.20,<1 +
alibabacloud-dingtalk>=2.0.0 + qrcode.
Closes #10991
Co-authored-by: pedh
ShellFileOperations captured the terminal env's cwd at __init__ time and
used that stale value for every subsequent _exec() call. When the user
ran `cd` via the terminal tool, `env.cwd` updated but `ops.cwd` did not.
Relative paths passed to patch_replace / read_file / write_file / search
then targeted the ORIGINAL directory instead of the current one.
Observed symptom in agent sessions:
terminal: cd .worktrees/my-branch
patch hermes_cli/main.py <old> <new>
→ returns {"success": true} with a plausible unified diff
→ but `git diff` in the worktree shows nothing
→ the patch landed in the main repo's checkout of main.py instead
The diff looked legitimate because patch_replace computes it from the
IN-MEMORY content vs new_content, not by re-reading the file. The
write itself DID succeed — it just wrote to the wrong directory's copy
of the same-named file.
Fix: _exec() now resolves cwd from live sources in this order:
1. Explicit `cwd` arg (if provided by the caller)
2. Live `self.env.cwd` (tracks `cd` commands run via terminal)
3. Init-time `self.cwd` (fallback when env has no cwd attribute)
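A minimal sketch of that resolution order — attribute names match the
description; the real _exec() does more:

```python
def _resolve_cwd(self, cwd=None):
    if cwd:
        return cwd                           # 1. explicit caller arg wins
    env_cwd = getattr(self.env, "cwd", None)
    if env_cwd:
        return env_cwd                       # 2. live env.cwd tracks `cd`
    return self.cwd                          # 3. init-time fallback
```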
Includes a 5-test regression suite covering:
- cd followed by relative read follows live cwd
- the exact reported bug: patch_replace with relative path after cd
- explicit cwd= arg still wins over env.cwd
- env without cwd attribute falls back to init-time cwd
- patch_replace success reflects real file state (safety rail)
Co-authored-by: teknium1 <teknium@nousresearch.com>
persist_nous_credentials() now accepts an optional label kwarg which
gets embedded in providers.nous under the 'label' key.
_seed_from_singletons() prefers the embedded label over the
auto-derived label_from_token() fingerprint when materialising the
pool entry, so re-seeding on every load_pool('nous') preserves the
user's chosen label.
auth_commands.py threads --label through to the helper, restoring
parity with how other OAuth providers (anthropic, codex, google,
qwen) honor the flag.
Tests: 4 new (embed, reseed-survives, no-label fallback, end-to-end
through auth_add_command). All 390 nous/auth/credential_pool tests
pass.
Review feedback on the original commit: the helper wrote a pool entry
with source `manual:device_code` while `_seed_from_singletons()` upserts
with `device_code` (no `manual:` prefix), so the pool grew a duplicate
row on every `load_pool()` after login.
Normalise: the helper now writes `providers.nous` and delegates the pool
write entirely to `_seed_from_singletons()` via a follow-up
`load_pool()` call. The canonical source is `device_code`; the helper
never materialises a parallel `manual:device_code` entry.
- `persist_nous_credentials()` loses its `label` and `source` kwargs —
both are now derived by the seed path from the singleton state.
- CLI and web dashboard call sites simplified accordingly.
- New test `test_persist_nous_credentials_idempotent_no_duplicate_pool_entries`
asserts that two consecutive persists leave exactly one pool row and
no stray `manual:` entries.
- Existing `test_auth_add_nous_oauth_persists_pool_entry` updated to
assert the canonical source and single-entry invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.
Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.
Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR #6856/#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash). Non-aggregator users already got their main
model. Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.
After: 'auto' means "use my main chat model" for every user,
regardless of provider type. Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers). Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.
Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model). Aggregator strict vision
backends are fallbacks, not the primary path.
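A sketch of the resulting resolution order — resolve_provider_client is the
real entry point per the Changes below; the stub body, config shape, and
function name here are illustrative:

```python
def resolve_provider_client(provider, model):
    # Stand-in for the real resolver in agent/auxiliary_client.py.
    return None

def resolve_aux_client(task_cfg, main_provider, main_model):
    # Explicit per-task overrides in config.yaml are a hard constraint.
    if task_cfg.get("provider"):
        return resolve_provider_client(task_cfg["provider"], task_cfg.get("model"))
    # 'auto': use the main chat model, regardless of provider type.
    client = resolve_provider_client(main_provider, main_model)
    if client is not None:
        return client
    # Fallback chain only when the main provider has no working client.
    for provider in ("openrouter", "nous", "custom", "codex"):
        client = resolve_provider_client(provider, None)
        if client is not None:
            return client
    return None  # API-key providers would follow in the real chain
```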
Changes:
- agent/auxiliary_client.py: _resolve_auto() drops the
`_AGGREGATOR_PROVIDERS` guard. resolve_vision_provider_client()
auto branch unifies aggregator and exotic-provider paths —
everyone goes through resolve_provider_client() with main_model.
Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
the guard we just removed).
- hermes_cli/main.py: aux config menu copy updated to reflect
the new semantics ("'auto' means 'use my main model'").
- tests/agent/test_auxiliary_main_first.py: 12 regression tests
covering OpenRouter/Nous/DeepSeek main paths, runtime-override
wins, explicit-config wins, vision override preservation for
exotic providers, and fallback-chain activation when the main
provider has no working client.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up polish on top of the cherry-picked #11023 commit.
- feishu_comment_rules.py: replace import-time "~/.hermes" expanduser fallback
with get_hermes_home() from hermes_constants (canonical, profile-safe).
- tools/feishu_doc_tool.py, tools/feishu_drive_tool.py: drop the
asyncio.get_event_loop().run_until_complete(asyncio.to_thread(...)) dance.
Tool handlers run synchronously in a worker thread with no running loop, so
the RuntimeError branch was always the one that executed. Calls client.request
directly now. Unused asyncio import removed.
- tests/gateway/test_feishu.py: add register_p2_customized_event to the mock
EventDispatcher builder so the existing adapter test matches the new handler
registration for drive.notice.comment_add_v1.
- scripts/release.py: map liujinkun@bytedance.com -> liujinkun2025 for
contributor attribution on release notes.
- Full comment handler: parse drive.notice.comment_add_v1 events, build
timeline, run agent, deliver reply with chunking support.
- 5 tools: feishu_doc_read, feishu_drive_list_comments,
feishu_drive_list_comment_replies, feishu_drive_reply_comment,
feishu_drive_add_comment.
- 3-tier access control rules (exact doc > wildcard "*" > top-level >
defaults) with per-field fallback. Config via
~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload.
- Self-reply filter using generalized self_open_id (supports future
user-identity subscriptions). Receiver check: only process events
where the bot is the @mentioned target.
- Smart timeline selection, long text chunking, semantic text extraction,
session sharing per document, wiki link resolution.
Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9
Follow-up to the cherry-picked contributor fix:
- Extract `_remember_chat_req_id()` and bound it at DEDUP_MAX_SIZE like
`_reply_req_ids` — the unbounded dict would grow forever on a long-
running gateway with many chats.
- Move the cache write to AFTER the group/DM policy check so we don't
cache req_ids from blocked senders.
- Revert the undocumented `is_group` change: the contributor flipped
`chattype == 'group'` to `bool(chatid)`, which wasn't mentioned in
the PR description and weakens the signal (chattype is the explicit
hint; relying on chatid presence assumes DMs never carry it). Keep
the original check.
- Drop the defensive `getattr(self, '_last_chat_req_ids', {})` reads
at both send sites — the attribute is initialized in __init__.
- Update `test_send_uses_passive_reply_stream_...` → `_markdown_...`
to match the new msgtype, and add a new TestWeComZombieSessionFix
class covering device_id presence in subscribe, per-chat req_id
caching + bounding, blocked-sender cache exclusion, and the group
APP_CMD_RESPONSE fallback path.
Previously users had to hand-edit config.yaml to route individual auxiliary
tasks (vision, compression, web_extract, etc.) to a specific provider+model.
Add a first-class picker reachable from the bottom of the existing `hermes
model` provider list.
Flow:
hermes model
→ Configure auxiliary models...
→ <task picker: 9 tasks, shows current setting inline>
→ <provider picker: authenticated providers + auto + custom>
→ <model picker: curated list + live pricing>
The aux picker does NOT re-run credential/OAuth setup; users authenticate
providers through the normal `hermes model` flow, then route aux tasks to
them here. `list_authenticated_providers()` gates the list to providers
the user has configured.
Also:
- 'Cancel' entry relabeled 'Leave unchanged' (sentinel still 'cancel'
internally, so dispatch logic is unchanged)
- 'Reset all to auto' entry to bulk-clear aux overrides; preserves
user-tuned timeout / download_timeout values
- Adds `title_generation` task to DEFAULT_CONFIG.auxiliary — the task
was called from agent/title_generator.py but was missing from defaults,
so config-backed timeout overrides never worked for it
Co-authored-by: teknium1 <teknium@nousresearch.com>
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
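A sketch of the sweep under those rules — entry shape from the description
above, with requests as a stand-in HTTP client:

```python
import json
import os
import time

import requests

PENDING = os.path.expanduser("~/.hermes/pastes/pending.json")

def _sweep_expired_pastes():
    try:
        with open(PENDING) as f:
            entries = json.load(f)
    except (OSError, ValueError):
        return
    now, keep = time.time(), []
    for entry in entries:
        if entry["expire_at"] > now:
            keep.append(entry)  # not due yet
            continue
        try:
            requests.delete(entry["url"], timeout=10)
        except requests.RequestException:
            if now - entry["expire_at"] < 24 * 3600:
                keep.append(entry)  # retry for up to 24h, then give up
    with open(PENDING, "w") as f:
        json.dump(keep, f)
```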
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
Two accretion-over-time leaks that compound over long CLI / gateway
lifetimes. Both were flagged in the memory-leak audit.
## file_tools._read_tracker
_read_tracker[task_id] holds three sub-containers that grew unbounded:
read_history set of (path, offset, limit) tuples — 1 per unique read
dedup dict of (path, offset, limit) → mtime — same growth pattern
read_timestamps dict of resolved_path → mtime — 1 per unique path
A CLI session uses one stable task_id for its lifetime, so these were
uncapped. A 10k-read session accumulated ~1.5MB of tracker state that
the tool no longer needed (only the most recent reads are relevant for
dedup, consecutive-loop detection, and write/patch external-edit
warnings).
Fix: _cap_read_tracker_data() enforces hard caps on each container
after every add. Defaults: read_history=500, dedup=1000,
read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict
guarantee) for the dicts; arbitrary for the set (which only feeds
diagnostic summaries).
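A sketch of the cap enforcement — container names from the description
above, caps at the stated defaults:

```python
def _cap_read_tracker_data(tracker, history_cap=500, dedup_cap=1000, ts_cap=1000):
    history = tracker.get("read_history", set())
    while len(history) > history_cap:
        history.pop()  # arbitrary eviction: the set only feeds diagnostics
    for key, cap in (("dedup", dedup_cap), ("read_timestamps", ts_cap)):
        container = tracker.get(key, {})
        while len(container) > cap:
            # Python 3.7+ dicts preserve insertion order: evict the oldest.
            container.pop(next(iter(container)))
```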
## process_registry._completion_consumed
Module-level set that recorded every session_id ever polled / waited /
logged. No pruning. Each entry is ~20 bytes, so the absolute leak is
small, but on a gateway processing thousands of background commands
per day the set grows until process exit.
Fix: _prune_if_needed() now discards _completion_consumed entries
alongside the session dict evictions it already performs (both the
TTL-based prune and the LRU-over-cap prune). Adds a final
belt-and-suspenders pass that drops any dangling entries whose
session_id no longer appears in _running or _finished.
Tests: tests/tools/test_accretion_caps.py — 9 cases
* Each container bound respected, oldest evicted
* No-op when under cap (no unnecessary work)
* Handles missing sub-containers without crashing
* Live read_file_tool path enforces caps end-to-end
* _completion_consumed pruned on TTL expiry
* _completion_consumed pruned on LRU eviction
* Dangling entries (no backing session) cleared
Broader suite: 3486 tests/tools + tests/cli pass. The single flake
(test_alias_command_passes_args) reproduces on unchanged main — known
cross-test pollution under suite-order load.
Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit
path (status_code on the exception, Retry-After preserved via error.response)
instead of being opaque RuntimeErrors. User sees a one-line capacity message
instead of a 500-char JSON dump.
Changes
- CodeAssistError grows status_code / response / retry_after / details attrs.
_extract_status_code in error_classifier picks up status_code and classifies
429 as FailoverReason.rate_limit, so fallback_providers triggers the same
way it does for SDK errors. run_agent.py line ~10428 already walks
error.response.headers for Retry-After — preserving the response means that
path just works.
- _gemini_http_error parses the Google error envelope (error.status +
error.details[].reason from google.rpc.ErrorInfo, retryDelay from
google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404
model-not-found each produce a human-readable message; unknown shapes fall
back to the previous raw-body format.
- Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and
agent/model_metadata.py — Google returned 404 for it today in local repro.
Kept gemma-4-31b-it (capacity-constrained but not retired).
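A sketch of the envelope parsing inside _gemini_http_error — the google.rpc
detail types are real, the helper shape is an assumption:

```python
def _parse_code_assist_error(body: dict):
    err = body.get("error", {})
    status, retry_after, reasons = err.get("status"), None, []
    for detail in err.get("details", []):
        type_url = detail.get("@type", "")
        if type_url.endswith("google.rpc.ErrorInfo"):
            reasons.append(detail.get("reason"))  # e.g. MODEL_CAPACITY_EXHAUSTED
        elif type_url.endswith("google.rpc.RetryInfo"):
            delay = detail.get("retryDelay", "0s")  # proto Duration, e.g. "37s"
            retry_after = float(delay.rstrip("s") or 0)
    return status, reasons, retry_after
```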
Validation
| | Before | After |
|---------------------------|--------------------------------|-------------------------------------------|
| Error message | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error | None (opaque RuntimeError) | 429 |
| Classifier reason | unknown (string-match fallback) | FailoverReason.rate_limit |
| Retry-After honored | ignored | extracted from RetryInfo or header |
| gemma-4-26b-it picker | advertised (404s on Google) | removed |
Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found,
Retry-After header fallback, malformed body, and classifier integration.
Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full
tests/hermes_cli (2203 tests) green.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up to WideLee's salvaged PR #11582.
Back-compat for QQ_HOME_CHANNEL → QQBOT_HOME_CHANNEL rename:
- gateway/config.py reads QQBOT_HOME_CHANNEL, falls back to QQ_HOME_CHANNEL
with a one-shot deprecation warning so users on the old name aren't
silently broken.
- cron/scheduler.py: _HOME_TARGET_ENV_VARS['qqbot'] now maps to the new
name; _get_home_target_chat_id falls back to the legacy name via a
_LEGACY_HOME_TARGET_ENV_VARS table.
- hermes_cli/status.py + hermes_cli/setup.py: honor both names when
displaying or checking for missing home channels.
- hermes_cli/config.py: keep legacy QQ_HOME_CHANNEL[_NAME] in
_EXTRA_ENV_KEYS so .env sanitization still recognizes them.
Scope cleanup:
- Drop qrcode from core dependencies and requirements.txt (remains in
messaging/dingtalk/feishu extras). _qqbot_render_qr already degrades
gracefully when qrcode is missing, printing a 'pip install qrcode' tip
and falling back to URL-only display.
- Restore @staticmethod on QQAdapter._detect_message_type (it doesn't
use self). Revert the test change that was only needed when it was
converted to an instance method.
- Reset uv.lock to origin/main; the PR's stale lock also included
unrelated changes (atroposlib source URL, hermes-agent version bump,
fastapi additions) that don't belong.
Verified E2E:
- Existing user (QQ_HOME_CHANNEL set): gateway + cron both pick up the
legacy name; deprecation warning logs once.
- Fresh user (QQBOT_HOME_CHANNEL set): gateway + cron use new name,
no warning.
- Both set: new name wins on both surfaces.
Targeted tests: 296 passed, 4 skipped (qqbot + cron + hermes_cli).
- Re-export _ssrf_redirect_guard from __init__.py
- Fix _parse_json @staticmethod using self._log_tag
- Update test_detect_message_type to call as instance method
- Fix mock.patch path for httpx.AsyncClient in adapter submodule
Three closely-related fixes for shutdown / lifecycle hygiene.
1. _release_running_agent_state(session_key) helper
----------------------------------------------------
Per-running-agent state lived in three dicts that drifted out of sync
across cleanup sites:
self._running_agents — AIAgent per session_key
self._running_agents_ts — start timestamp per session_key
self._busy_ack_ts — last busy-ack timestamp per session_key
Inventory before this PR:
8 sites: del self._running_agents[key]
— only 1 (stale-eviction) cleaned all three
— 1 cleaned _running_agents + _running_agents_ts only
— 6 cleaned _running_agents only
Each missed entry was a (str, float) tuple per session per gateway
lifetime — small, persistent, accumulates across thousands of
sessions over months. Per-platform leaks compounded.
This change adds a single helper that pops all three dicts in
lockstep, and replaces every bare 'del self._running_agents[key]'
site with it. Per-session state that PERSISTS across turns
(_session_model_overrides, _voice_mode, _pending_approvals,
_update_prompt_pending) is intentionally NOT touched here — those
have their own lifecycles tied to user actions, not turn boundaries.
2. _running_agents_ts cleared in _stop_impl
----------------------------------------
Was being missed alongside _running_agents.clear(); now included.
3. SessionDB close() in _stop_impl
---------------------------------
The SQLite WAL write lock stayed held by the old gateway connection
until Python actually exited — causing 'database is locked' errors
when --replace launched a new gateway against the same file. We
now explicitly close both self._db and self.session_store._db
inside _stop_impl, with try/except so a flaky close on one doesn't
block the other.
Tests
-----
tests/gateway/test_session_state_cleanup.py — 10 cases covering:
* helper pops all three dicts atomically
* idempotent on missing/empty keys
* preserves other sessions
* tolerates older runners without _busy_ack_ts attribute
* thread-safe under concurrent release
* regression guard: scans gateway/run.py and fails if a future
contributor reintroduces 'del self._running_agents[...]'
outside docstrings
* SessionDB close called on both holders during shutdown
* shutdown tolerates missing session_store
* shutdown tolerates close() raising on one db (other still closes)
Broader gateway suite: 3108 passed (vs 3100 on baseline), a net +8;
the 10 remaining failures are pre-existing
cross-test pollution / missing optional deps (matrix needs olm,
signal/telegram approval flake, dingtalk Mock wiring), all reproduce
on stashed baseline.
Telegram's MarkdownV2 has no table syntax — pipes get backslash-escaped
and tables render as noisy unaligned text. format_message now detects
GFM-style pipe tables (header row + delimiter row + optional body) and
wraps them in ``` fences before the existing MarkdownV2 conversion runs.
Telegram renders fenced code blocks as monospace preformatted text with
columns intact.
Tables already inside an existing code block are left alone. Plain
prose with pipes, lone '---' horizontal rules, and non-table content
are unaffected.
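A sketch of the detection + fencing pass — the delimiter-row regex is
illustrative, and the real code additionally skips tables already inside
code fences (omitted here):

```python
import re

_DELIM_ROW = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$")

def fence_pipe_tables(text: str) -> str:
    lines, out, i = text.split("\n"), [], 0
    while i < len(lines):
        # A GFM table = header row with pipes followed by a delimiter row.
        if "|" in lines[i] and i + 1 < len(lines) and _DELIM_ROW.match(lines[i + 1]):
            j = i + 2
            while j < len(lines) and "|" in lines[j]:
                j += 1  # optional body rows
            out += ["```", *lines[i:j], "```"]
            i = j
        else:
            out.append(lines[i])
            i += 1
    return "\n".join(out)
```

A lone '---' never matches the delimiter-row regex (the repeated group
requires at least two columns), which is what keeps horizontal rules and
plain prose with pipes untouched.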
Closes the recurring community request to stop having to ask the agent
to re-render tables as code blocks manually.
Cuts shard-3 local runtime in half by neutralizing real wall-clock
waits across three classes of slow test:
## 1. Retry backoff mocks
- tests/run_agent/conftest.py (NEW): autouse fixture mocks
jittered_backoff to 0.0 so the `while time.time() < sleep_end`
busy-loop exits immediately. No global time.sleep mock (would
  break threading tests). See the sketch at the end of this section.
- test_anthropic_error_handling, test_413_compression,
test_run_agent_codex_responses, test_fallback_model: per-file
fixtures mock time.sleep / asyncio.sleep for retry / compression
paths.
- test_retaindb_plugin: cap the retaindb module's bound time.sleep
to 0.05s via a per-test shim (background writer-thread retries
sleep 2s after errors; tests don't care about exact duration).
Plus replace arbitrary time.sleep(N) waits with short polling
loops bounded by deadline.
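The autouse fixture from the first item, sketched — the module path for
jittered_backoff is an assumption:

```python
import pytest

@pytest.fixture(autouse=True)
def _zero_backoff(monkeypatch):
    # Backoff returns 0.0, so `while time.time() < sleep_end` exits at once.
    monkeypatch.setattr("run_agent.jittered_backoff", lambda *a, **kw: 0.0)
```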
## 2. Subprocess sleeps in production code
- test_update_gateway_restart: mock time.sleep. Production code
does time.sleep(3) after `systemctl restart` to verify the
  service survived. Tests mock subprocess.run — nothing actually
  restarts — so the wait is dead time.
## 3. Network / IMDS timeouts (biggest single win)
- tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus
AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back
to IMDS (169.254.169.254) when no AWS creds are set. Any test
hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g.
test_status, test_setup_copilot_acp, anything that touches
provider auto-detect) burned ~2-4s waiting for that to time out.
- test_exit_cleanup_interrupt: explicitly mock
resolve_runtime_provider which was doing real network auto-detect
  (~4s). Tests don't care about provider resolution — the agent
is already mocked.
- test_timezone: collapse the 3-test "TZ env in subprocess" suite
into 2 tests by checking both injection AND no-leak in the same
  subprocess spawn (was 3 × 3.2s, now 2 × 4s).
## Validation
| Test | Before | After |
|---|---|---|
| test_anthropic_error_handling (8 tests) | ~80s | ~15s |
| test_413_compression (14 tests) | ~18s | 2.3s |
| test_retaindb_plugin (67 tests) | ~13s | 1.3s |
| test_status_includes_tavily_key | 4.0s | 0.05s |
| test_setup_copilot_acp_skips_same_provider_pool_step | 8.0s | 0.26s |
| test_update_gateway_restart (5 tests) | ~18s total | ~0.35s total |
| test_exit_cleanup_interrupt (2 tests) | 8s | 1.5s |
| **Matrix shard 3 local** | **108s** | **50s** |
No behavioral contract changed — tests still verify retry happens,
service restart logic runs, etc.; they just don't burn real seconds
waiting for it.
Supersedes PR #11779 (those changes are included here).
SessionStore._entries grew unbounded. Every unique
(platform, chat_id, thread_id, user_id) tuple ever seen was kept in
RAM and rewritten to sessions.json on every message. A Discord bot
in 100 servers x 100 channels x ~100 rotating users accumulates on
the order of 10^5 entries after a few months; each sessions.json
write becomes an O(n) fsync. Nothing trimmed this — there was no
TTL, no cap, no eviction path.
Changes
-------
* SessionStore.prune_old_entries(max_age_days) — drops entries whose
updated_at is older than the cutoff. Preserves:
- suspended entries (user paused them via /stop for later resume)
- entries with an active background process attached
Pruning is functionally identical to a natural reset-policy expiry:
SQLite transcript stays, session_key -> session_id mapping dropped,
returning user gets a fresh session.
* GatewayConfig.session_store_max_age_days (default 90; 0 disables).
Serialized in to_dict/from_dict, coerced from bad types / negatives
to safe defaults. No migration needed — missing field -> 90 days.
* _session_expiry_watcher calls prune_old_entries once per hour
(first tick is immediate). Uses the existing watcher loop so no
new background task is created.
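A sketch of the prune pass — entry attribute names are assumptions matching
the description above:

```python
import time

def prune_old_entries(self, max_age_days: int) -> int:
    if max_age_days <= 0:
        return 0  # 0 disables pruning
    cutoff = time.time() - max_age_days * 86400
    doomed = [
        key for key, entry in self._entries.items()
        if entry.updated_at < cutoff
        and not entry.suspended             # /stop'd for later resume
        and not entry.has_active_process    # background process attached
    ]
    for key in doomed:
        del self._entries[key]              # transcript stays in SQLite
    if doomed:
        self._save()                        # write iff something was removed
    return len(doomed)
```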
Why not more aggressive
-----------------------
90 days is long enough that legitimate long-idle users (seasonal,
vacation, etc.) aren't surprised — pruning just means they get a
fresh session on return, same outcome they'd get from any other
reset-policy trigger. Admins can lower it via config; 0 disables.
Tests
-----
tests/gateway/test_session_store_prune.py — 17 cases covering:
* entry age based on updated_at, not created_at
* max_age_days=0 disables; negative coerces to 0
* suspended + active-process entries are skipped
* _save fires iff something was removed
* disk JSON reflects post-prune state
* thread safety against concurrent readers
* config field roundtrips + graceful fallback on bad values
* watcher gate logic (first tick prunes, subsequent within 1h don't)
119 broader session/gateway tests remain green.
Follow-up on the native NVIDIA NIM provider salvage. The original PR wired
PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints
required for full parity with other OpenAI-compatible providers (xai,
huggingface, deepseek, zai).
Gaps closed:
- hermes_cli/main.py:
- Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so
selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key
provider flow (previously fell through silently).
- Add 'nvidia' to `hermes chat --provider` argparse choices so the
documented test command (`hermes chat --provider nvidia --model ...`)
parses successfully.
- hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in
OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're
auto-added to the subprocess env blocklist.
- hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so
`hermes doctor` probes https://integrate.api.nvidia.com/v1/models.
- hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for
`hermes dump` credential masking.
- tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture
with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses.
- agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry
so all Nemotron variants get 128K context via substring match (rather
than falling back to MINIMUM_CONTEXT_LENGTH).
- hermes_cli/models.py: Fix hallucinated model ID
'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b'
(verified against live integrate.api.nvidia.com/v1/models catalog).
Expand curated list from 5 to 9 agentic models mapping to OpenRouter
defaults per provider-guide convention: add qwen3.5-397b-a17b,
deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b.
- cli-config.yaml.example: Document 'nvidia' provider option.
- scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP
for CI attribution.
E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's
endpoint (returns 401 with bogus key instead of argparse error);
`hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.
Adds NVIDIA NIM as a first-class provider: ProviderConfig in
auth.py, HermesOverlay in providers.py, curated models
(Nemotron plus other open source models hosted on
build.nvidia.com), URL mapping in model_metadata.py, aliases
(nim, nvidia-nim, build-nvidia, nemotron), and env var tests.
Docs updated: providers page, quickstart table, fallback
providers table, and README provider list.
Commit 4b1567f4 (anthhub) added qrcode to the messaging extra for Weixin's
QR login. The same package is needed by:
* hermes_cli/dingtalk_auth.py — QR device-flow auth shipped in #11574
* gateway/platforms/feishu.py:3962 — Feishu QR login
These extras are independent of [messaging] (users can install
hermes-agent[dingtalk] or hermes-agent[feishu] without [messaging]),
so the dep needs to be declared on each.
Pin matches anthhub's choice (>=7.0,<8) for consistency. The [all]
extra inherits from all three extras, so it picks up qrcode transitively.
Adds parallel tests to tests/test_project_metadata.py — same shape
as test_messaging_extra_includes_qrcode_for_weixin_setup.
Refs #9431.
Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone
surrogates in reasoning output. The proactive sanitizer walked content/
name/tool_calls but not extra fields like reasoning or the nested
reasoning_details array. Surrogates in those fields survived the
proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery
block's _sanitize_messages_surrogates(messages) call also didn't check
those fields — so 'found' was False, no retry happened, and after 3
attempts the user saw:
API call failed after 3 retries. 'utf-8' codec can't encode characters
in position N-M: surrogates not allowed
Changes:
- _sanitize_messages_surrogates: walk any extra string fields (reasoning,
reasoning_content, etc.) and recurse into nested dict/list values
(reasoning_details). Mirrors _sanitize_messages_non_ascii coverage
added in PR #10537.
- _sanitize_structure_surrogates: new recursive walker, mirror of
_sanitize_structure_non_ascii but for surrogate recovery.
- UnicodeEncodeError recovery block: also sanitize api_messages,
api_kwargs, and prefill_messages (not just the canonical messages
list — the API-copy carries reasoning_content transformed from
reasoning and that's what the SDK actually serializes). Always
retry on detected surrogate errors, not only when we found
something to strip — gate on error type per PR #10537's pattern.
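A sketch of the recursive walker — the (value, found) return convention is
an assumption; the real one mirrors _sanitize_structure_non_ascii:

```python
def _sanitize_structure_surrogates(value):
    """Return (sanitized_value, found_any_surrogate)."""
    if isinstance(value, str):
        # Lone surrogates make UTF-8 encoding fail; 'replace' strips them.
        cleaned = value.encode("utf-8", "replace").decode("utf-8")
        return cleaned, cleaned != value
    if isinstance(value, list):
        items = [_sanitize_structure_surrogates(v) for v in value]
        return [v for v, _ in items], any(f for _, f in items)
    if isinstance(value, dict):
        items = {k: _sanitize_structure_surrogates(v) for k, v in value.items()}
        return ({k: v for k, (v, _) in items.items()},
                any(f for _, f in items.values()))
    return value, False
```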
Tests: extended tests/cli/test_surrogate_sanitization.py with
coverage for reasoning, reasoning_content, reasoning_details (flat
and deeply nested), structure walker, and an integration case that
reproduces the exact api_messages shape that was crashing.