hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

Author	SHA1	Message	Date
helix4u	47010e0757	fix(gateway): allow systemd-backed distrobox services	2026-04-17 19:24:30 -07:00
Teknium	2297c5f5ce	fix(auth): restore --label for hermes auth add nous --type oauth persist_nous_credentials() now accepts an optional label kwarg which gets embedded in providers.nous under the 'label' key. _seed_from_singletons() prefers the embedded label over the auto-derived label_from_token() fingerprint when materialising the pool entry, so re-seeding on every load_pool('nous') preserves the user's chosen label. auth_commands.py threads --label through to the helper, restoring parity with how other OAuth providers (anthropic, codex, google, qwen) honor the flag. Tests: 4 new (embed, reseed-survives, no-label fallback, end-to-end through auth_add_command). All 390 nous/auth/credential_pool tests pass.	2026-04-17 19:13:40 -07:00
Antoine Khater	c7fece1f9d	fix: normalise Nous device-code pool source to avoid duplicates Review feedback on the original commit: the helper wrote a pool entry with source `manual:device_code` while `_seed_from_singletons()` upserts with `device_code` (no `manual:` prefix), so the pool grew a duplicate row on every `load_pool()` after login. Normalise: the helper now writes `providers.nous` and delegates the pool write entirely to `_seed_from_singletons()` via a follow-up `load_pool()` call. The canonical source is `device_code`; the helper never materialises a parallel `manual:device_code` entry. - `persist_nous_credentials()` loses its `label` and `source` kwargs — both are now derived by the seed path from the singleton state. - CLI and web dashboard call sites simplified accordingly. - New test `test_persist_nous_credentials_idempotent_no_duplicate_pool_entries` asserts that two consecutive persists leave exactly one pool row and no stray `manual:` entries. - Existing `test_auth_add_nous_oauth_persists_pool_entry` updated to assert the canonical source and single-entry invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:13:40 -07:00
Antoine Khater	c096a6935f	fix(auth): mirror Nous OAuth credentials to providers.nous on CLI login `hermes auth add nous --type oauth` only wrote credential_pool.nous, leaving providers.nous empty. When the Nous agent_key's 24h TTL expired, run_agent.py's 401-recovery path called resolve_nous_runtime_credentials (which reads providers.nous), got AuthError "Hermes is not logged into Nous Portal", caught it as logger.debug (suppressed at INFO level), and the agent died with "Non-retryable client error" — no signal to the user that recovery even tried. Introduce persist_nous_credentials() as the single source of truth for Nous device-code login persistence. Both auth_commands (CLI) and web_server (dashboard) now route through it, so pool and providers stay in sync at write time. Why: CLI-provisioned profiles couldn't recover from agent_key expiry, producing silent daily outages 24h after first login. PR #6856/#6869 addressed adjacent issues but assumed providers.nous was populated; this one wasn't being written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:13:40 -07:00
Teknium	a155b4a159	feat(auxiliary): default 'auto' routing to main model for all users (#11900 ) Before: aggregator users (OpenRouter / Nous Portal) running 'auto' routing for auxiliary tasks — compression, vision, web extraction, session search, etc. — got routed to a cheap provider-side default model (Gemini Flash). Non-aggregator users already got their main model. Behavior was inconsistent and surprising — users picked Claude / GPT / their preferred model, but side tasks ran on Gemini Flash. After: 'auto' means "use my main chat model" for every user, regardless of provider type. Only when the main provider has no working client does the fallback chain run (OpenRouter → Nous → custom → Codex → API-key providers). Explicit per-task overrides in config.yaml (auxiliary.<task>.provider / .model) still win — they are a hard constraint, not subject to the auto policy. Vision auto-detection follows the same policy: try main provider + main model first (with _PROVIDER_VISION_MODELS overrides preserved for providers like xiaomi and zai that ship a dedicated multimodal model distinct from their chat model). Aggregator strict vision backends are fallbacks, not the primary path. Changes: - agent/auxiliary_client.py: _resolve_auto() drops the `_AGGREGATOR_PROVIDERS` guard. resolve_vision_provider_client() auto branch unifies aggregator and exotic-provider paths — everyone goes through resolve_provider_client() with main_model. Dead _AGGREGATOR_PROVIDERS constant removed (was only used by the guard we just removed). - hermes_cli/main.py: aux config menu copy updated to reflect the new semantics ("'auto' means 'use my main model'"). - tests/agent/test_auxiliary_main_first.py: 12 regression tests covering OpenRouter/Nous/DeepSeek main paths, runtime-override wins, explicit-config wins, vision override preservation for exotic providers, and fallback-chain activation when the main provider has no working client. Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 19:13:23 -07:00
Teknium	b449a0e049	fix(feishu-comment): use get_hermes_home(); drop dead asyncio wrapper; AUTHOR_MAP Follow-up polish on top of the cherry-picked #11023 commit. - feishu_comment_rules.py: replace import-time "~/.hermes" expanduser fallback with get_hermes_home() from hermes_constants (canonical, profile-safe). - tools/feishu_doc_tool.py, tools/feishu_drive_tool.py: drop the asyncio.get_event_loop().run_until_complete(asyncio.to_thread(...)) dance. Tool handlers run synchronously in a worker thread with no running loop, so the RuntimeError branch was always the one that executed. Calls client.request directly now. Unused asyncio import removed. - tests/gateway/test_feishu.py: add register_p2_customized_event to the mock EventDispatcher builder so the existing adapter test matches the new handler registration for drive.notice.comment_add_v1. - scripts/release.py: map liujinkun@bytedance.com -> liujinkun2025 for contributor attribution on release notes.	2026-04-17 19:04:11 -07:00
liujinkun	85cdb04bd4	feat: add Feishu document comment intelligent reply with 3-tier access control - Full comment handler: parse drive.notice.comment_add_v1 events, build timeline, run agent, deliver reply with chunking support. - 5 tools: feishu_doc_read, feishu_drive_list_comments, feishu_drive_list_comment_replies, feishu_drive_reply_comment, feishu_drive_add_comment. - 3-tier access control rules (exact doc > wildcard "*" > top-level > defaults) with per-field fallback. Config via ~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload. - Self-reply filter using generalized self_open_id (supports future user-identity subscriptions). Receiver check: only process events where the bot is the @mentioned target. - Smart timeline selection, long text chunking, semantic text extraction, session sharing per document, wiki link resolution. Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9	2026-04-17 19:04:11 -07:00
Teknium	9b14b76eb3	fix(wecom): bound req_id cache, revert undocumented is_group change, add tests Follow-up to the cherry-picked contributor fix: - Extract `_remember_chat_req_id()` and bound it at DEDUP_MAX_SIZE like `_reply_req_ids` — the unbounded dict would grow forever on a long- running gateway with many chats. - Move the cache write to AFTER the group/DM policy check so we don't cache req_ids from blocked senders. - Revert the undocumented `is_group` change: the contributor flipped `chattype == 'group'` to `bool(chatid)`, which wasn't mentioned in the PR description and weakens the signal (chattype is the explicit hint; relying on chatid presence assumes DMs never carry it). Keep the original check. - Drop the defensive `getattr(self, '_last_chat_req_ids', {})` reads at both send sites — the attribute is initialized in __init__. - Update `test_send_uses_passive_reply_stream_...` → `_markdown_...` to match the new msgtype, and add a new TestWeComZombieSessionFix class covering device_id presence in subscribe, per-chat req_id caching + bounding, blocked-sender cache exclusion, and the group APP_CMD_RESPONSE fallback path.	2026-04-17 19:03:29 -07:00
Teknium	04a0c3cb95	fix(config): preserve env refs when save_config rewrites config (#11892 ) Co-authored-by: binhnt92 <84617813+binhnt92@users.noreply.github.com>	2026-04-17 19:03:26 -07:00
Teknium	8444f66890	feat(hermes model): add Configure auxiliary models UI to `hermes model` (#11891 ) Previously users had to hand-edit config.yaml to route individual auxiliary tasks (vision, compression, web_extract, etc.) to a specific provider+model. Add a first-class picker reachable from the bottom of the existing `hermes model` provider list. Flow: hermes model → Configure auxiliary models... → <task picker: 9 tasks, shows current setting inline> → <provider picker: authenticated providers + auto + custom> → <model picker: curated list + live pricing> The aux picker does NOT re-run credential/OAuth setup; users authenticate providers through the normal `hermes model` flow, then route aux tasks to them here. `list_authenticated_providers()` gates the list to providers the user has configured. Also: - 'Cancel' entry relabeled 'Leave unchanged' (sentinel still 'cancel' internally, so dispatch logic is unchanged) - 'Reset all to auto' entry to bulk-clear aux overrides; preserves user-tuned timeout / download_timeout values - Adds `title_generation` task to DEFAULT_CONFIG.auxiliary — the task was called from agent/title_generator.py but was missing from defaults, so config-backed timeout overrides never worked for it Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 19:02:06 -07:00
Sara Reynolds	8ab1aa2efc	fix(gateway): fix discrepancies in gateway status	2026-04-17 18:58:29 -07:00
Xowiek	511ed4dacc	fix(gateway): bypass active-session guard for gateway-handled slash commands	2026-04-17 18:58:03 -07:00
helix4u	016ae5c334	fix(kimi): force 0.6 on main chat path	2026-04-17 18:47:01 -07:00
Teknium	304fb921bf	fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843 ) Both fixes close process leaks observed in production (18+ orphaned agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters accumulated over ~3 days, ~2.7 GB RSS). ## agent-browser daemon leak Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran from _start_browser_cleanup_thread, which is only invoked on the first browser tool call in a process. Hermes sessions that never used the browser never swept orphans, and the cross-process orphan detection relied on in-process _active_sessions, which doesn't see other hermes PIDs' sessions (race risk). - Write <session>.owner_pid alongside the socket dir recording the hermes PID that owns the daemon (extracted into _write_owner_pid for direct testability). - Reaper prefers owner_pid liveness over in-process _active_sessions. Cross-process safe: concurrent hermes instances won't reap each other's daemons. Legacy tracked_names fallback kept for daemons that predate owner_pid. - atexit handler (_emergency_cleanup_all_sessions) now always runs the reaper, not just when this process had active sessions — every clean hermes exit sweeps accumulated orphans. ## paste.rs auto-delete leak _schedule_auto_delete spawned a detached Python subprocess per call that slept 6 hours then issued DELETE requests. No dedup, no tracking — every 'hermes debug share' invocation added ~20 MB of resident Python interpreters that stuck around until the sleep finished. - Replaced the spawn with ~/.hermes/pastes/pending.json: records {url, expire_at} entries. - _sweep_expired_pastes() synchronously DELETEs past-due entries on every 'hermes debug' invocation (run_debug() dispatcher). - Network failures stay in pending.json for up to 24h, then give up (paste.rs's own retention handles the 'user never runs hermes again' edge case). - Zero subprocesses; regression test asserts subprocess/Popen/time.sleep never appear in the function source (skipping docstrings via AST). ## Validation \| \| Before \| After \| \|------------------------------\|---------------\|--------------\| \| Orphan agent-browser daemons \| 18 accumulated\| 2 (live) \| \| paste.rs sleep interpreters \| 15 accumulated\| 0 \| \| RSS reclaimed \| - \| ~2.7 GB \| \| Targeted tests \| - \| 2253 pass \| E2E verified: alive-owner daemons NOT reaped; dead-owner daemons SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired entries without spawning subprocesses.	2026-04-17 18:46:30 -07:00
helix4u	64b354719f	Support browser CDP URL from config	2026-04-17 16:05:04 -07:00
brooklyn!	e9b8ece103	Merge pull request #4692 from NousResearch/feat/ink-refactor Feat/ink refactor	2026-04-17 18:02:37 -05:00
Teknium	3f43aec15d	fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839 ) Two accretion-over-time leaks that compound over long CLI / gateway lifetimes. Both were flagged in the memory-leak audit. ## file_tools._read_tracker _read_tracker[task_id] holds three sub-containers that grew unbounded: read_history set of (path, offset, limit) tuples — 1 per unique read dedup dict of (path, offset, limit) → mtime — same growth pattern read_timestamps dict of resolved_path → mtime — 1 per unique path A CLI session uses one stable task_id for its lifetime, so these were uncapped. A 10k-read session accumulated ~1.5MB of tracker state that the tool no longer needed (only the most recent reads are relevant for dedup, consecutive-loop detection, and write/patch external-edit warnings). Fix: _cap_read_tracker_data() enforces hard caps on each container after every add. Defaults: read_history=500, dedup=1000, read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict guarantee) for the dicts; arbitrary for the set (which only feeds diagnostic summaries). ## process_registry._completion_consumed Module-level set that recorded every session_id ever polled / waited / logged. No pruning. Each entry is ~20 bytes, so the absolute leak is small, but on a gateway processing thousands of background commands per day the set grows until process exit. Fix: _prune_if_needed() now discards _completion_consumed entries alongside the session dict evictions it already performs (both the TTL-based prune and the LRU-over-cap prune). Adds a final belt-and-suspenders pass that drops any dangling entries whose session_id no longer appears in _running or _finished. Tests: tests/tools/test_accretion_caps.py — 9 cases * Each container bound respected, oldest evicted * No-op when under cap (no unnecessary work) * Handles missing sub-containers without crashing * Live read_file_tool path enforces caps end-to-end * _completion_consumed pruned on TTL expiry * _completion_consumed pruned on LRU eviction * Dangling entries (no backing session) cleared Broader suite: 3486 tests/tools + tests/cli pass. The single flake (test_alias_command_passes_args) reproduces on unchanged main — known cross-test pollution under suite-order load.	2026-04-17 15:53:57 -07:00
Brooklyn Nicholson	aa583cb14e	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 17:51:40 -05:00
helix4u	2b60478fc2	fix(kimi): force kimi-for-coding temperature to 0.6	2026-04-17 15:49:14 -07:00
Teknium	c6fd2619f7	fix(gemini-cli): surface MODEL_CAPACITY_EXHAUSTED cleanly + drop retired gemma-4-26b (#11833 ) Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit path (status_code on the exception, Retry-After preserved via error.response) instead of being opaque RuntimeErrors. User sees a one-line capacity message instead of a 500-char JSON dump. Changes - CodeAssistError grows status_code / response / retry_after / details attrs. _extract_status_code in error_classifier picks up status_code and classifies 429 as FailoverReason.rate_limit, so fallback_providers triggers the same way it does for SDK errors. run_agent.py line ~10428 already walks error.response.headers for Retry-After — preserving the response means that path just works. - _gemini_http_error parses the Google error envelope (error.status + error.details[].reason from google.rpc.ErrorInfo, retryDelay from google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404 model-not-found each produce a human-readable message; unknown shapes fall back to the previous raw-body format. - Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and agent/model_metadata.py — Google returned 404 for it today in local repro. Kept gemma-4-31b-it (capacity-constrained but not retired). Validation \| \| Before \| After \| \|---------------------------\|--------------------------------\|-------------------------------------------\| \| Error message \| 'Code Assist returned HTTP 429: {500 chars JSON}' \| 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' \| \| status_code on error \| None (opaque RuntimeError) \| 429 \| \| Classifier reason \| unknown (string-match fallback) \| FailoverReason.rate_limit \| \| Retry-After honored \| ignored \| extracted from RetryInfo or header \| \| gemma-4-26b-it picker \| advertised (404s on Google) \| removed \| Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found, Retry-After header fallback, malformed body, and classifier integration. Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full tests/hermes_cli (2203 tests) green. Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 15:34:12 -07:00
Teknium	d2206c69cc	fix(qqbot): add back-compat for env var rename; drop qrcode core dep Follow-up to WideLee's salvaged PR #11582. Back-compat for QQ_HOME_CHANNEL → QQBOT_HOME_CHANNEL rename: - gateway/config.py reads QQBOT_HOME_CHANNEL, falls back to QQ_HOME_CHANNEL with a one-shot deprecation warning so users on the old name aren't silently broken. - cron/scheduler.py: _HOME_TARGET_ENV_VARS['qqbot'] now maps to the new name; _get_home_target_chat_id falls back to the legacy name via a _LEGACY_HOME_TARGET_ENV_VARS table. - hermes_cli/status.py + hermes_cli/setup.py: honor both names when displaying or checking for missing home channels. - hermes_cli/config.py: keep legacy QQ_HOME_CHANNEL[_NAME] in _EXTRA_ENV_KEYS so .env sanitization still recognizes them. Scope cleanup: - Drop qrcode from core dependencies and requirements.txt (remains in messaging/dingtalk/feishu extras). _qqbot_render_qr already degrades gracefully when qrcode is missing, printing a 'pip install qrcode' tip and falling back to URL-only display. - Restore @staticmethod on QQAdapter._detect_message_type (it doesn't use self). Revert the test change that was only needed when it was converted to an instance method. - Reset uv.lock to origin/main; the PR's stale lock also included unrelated changes (atroposlib source URL, hermes-agent version bump, fastapi additions) that don't belong. Verified E2E: - Existing user (QQ_HOME_CHANNEL set): gateway + cron both pick up the legacy name; deprecation warning logs once. - Fresh user (QQBOT_HOME_CHANNEL set): gateway + cron use new name, no warning. - Both set: new name wins on both surfaces. Targeted tests: 296 passed, 4 skipped (qqbot + cron + hermes_cli).	2026-04-17 15:31:14 -07:00
WideLee	103beea7a6	fix(qqbot): fix test failures after package refactor - Re-export _ssrf_redirect_guard from __init__.py - Fix _parse_json @staticmethod using self._log_tag - Update test_detect_message_type to call as instance method - Fix mock.patch path for httpx.AsyncClient in adapter submodule	2026-04-17 15:31:14 -07:00
Teknium	31e7276474	fix(gateway): consolidate per-session cleanup; close SessionDB on shutdown (#11800 ) Three closely-related fixes for shutdown / lifecycle hygiene. 1. _release_running_agent_state(session_key) helper ---------------------------------------------------- Per-running-agent state lived in three dicts that drifted out of sync across cleanup sites: self._running_agents — AIAgent per session_key self._running_agents_ts — start timestamp per session_key self._busy_ack_ts — last busy-ack timestamp per session_key Inventory before this PR: 8 sites: del self._running_agents[key] — only 1 (stale-eviction) cleaned all three — 1 cleaned _running_agents + _running_agents_ts only — 6 cleaned _running_agents only Each missed entry was a (str, float) tuple per session per gateway lifetime — small, persistent, accumulates across thousands of sessions over months. Per-platform leaks compounded. This change adds a single helper that pops all three dicts in lockstep, and replaces every bare 'del self._running_agents[key]' site with it. Per-session state that PERSISTS across turns (_session_model_overrides, _voice_mode, _pending_approvals, _update_prompt_pending) is intentionally NOT touched here — those have their own lifecycles tied to user actions, not turn boundaries. 2. _running_agents_ts cleared in _stop_impl ---------------------------------------- Was being missed alongside _running_agents.clear(); now included. 3. SessionDB close() in _stop_impl --------------------------------- The SQLite WAL write lock stayed held by the old gateway connection until Python actually exited — causing 'database is locked' errors when --replace launched a new gateway against the same file. We now explicitly close both self._db and self.session_store._db inside _stop_impl, with try/except so a flaky close on one doesn't block the other. Tests ----- tests/gateway/test_session_state_cleanup.py — 10 cases covering: * helper pops all three dicts atomically * idempotent on missing/empty keys * preserves other sessions * tolerates older runners without _busy_ack_ts attribute * thread-safe under concurrent release * regression guard: scans gateway/run.py and fails if a future contributor reintroduces 'del self._running_agents[...]' outside docstrings * SessionDB close called on both holders during shutdown * shutdown tolerates missing session_store * shutdown tolerates close() raising on one db (other still closes) Broader gateway suite: 3108 passed (vs 3100 on baseline) — failure delta is +8 net passes; the 10 remaining failures are pre-existing cross-test pollution / missing optional deps (matrix needs olm, signal/telegram approval flake, dingtalk Mock wiring), all reproduce on stashed baseline.	2026-04-17 15:18:23 -07:00
Teknium	036dacf659	feat(telegram): auto-wrap markdown tables in code blocks (#11794 ) Telegram's MarkdownV2 has no table syntax — pipes get backslash-escaped and tables render as noisy unaligned text. format_message now detects GFM-style pipe tables (header row + delimiter row + optional body) and wraps them in ``` fences before the existing MarkdownV2 conversion runs. Telegram renders fenced code blocks as monospace preformatted text with columns intact. Tables already inside an existing code block are left alone. Plain prose with pipes, lone '---' horizontal rules, and non-table content are unaffected. Closes the recurring community request to stop having to ask the agent to re-render tables as code blocks manually.	2026-04-17 14:27:26 -07:00
Teknium	3207b9bda0	test: speed up slow tests (backoff + subprocess + IMDS network) (#11797 ) Cuts shard-3 local runtime in half by neutralizing real wall-clock waits across three classes of slow test: ## 1. Retry backoff mocks - tests/run_agent/conftest.py (NEW): autouse fixture mocks jittered_backoff to 0.0 so the `while time.time() < sleep_end` busy-loop exits immediately. No global time.sleep mock (would break threading tests). - test_anthropic_error_handling, test_413_compression, test_run_agent_codex_responses, test_fallback_model: per-file fixtures mock time.sleep / asyncio.sleep for retry / compression paths. - test_retaindb_plugin: cap the retaindb module's bound time.sleep to 0.05s via a per-test shim (background writer-thread retries sleep 2s after errors; tests don't care about exact duration). Plus replace arbitrary time.sleep(N) waits with short polling loops bounded by deadline. ## 2. Subprocess sleeps in production code - test_update_gateway_restart: mock time.sleep. Production code does time.sleep(3) after `systemctl restart` to verify the service survived. Tests mock subprocess.run \u2014 nothing actually restarts \u2014 so the wait is dead time. ## 3. Network / IMDS timeouts (biggest single win) - tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back to IMDS (169.254.169.254) when no AWS creds are set. Any test hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g. test_status, test_setup_copilot_acp, anything that touches provider auto-detect) burned ~2-4s waiting for that to time out. - test_exit_cleanup_interrupt: explicitly mock resolve_runtime_provider which was doing real network auto-detect (~4s). Tests don't care about provider resolution \u2014 the agent is already mocked. - test_timezone: collapse the 3-test "TZ env in subprocess" suite into 2 tests by checking both injection AND no-leak in the same subprocess spawn (was 3 \u00d7 3.2s, now 2 \u00d7 4s). ## Validation \| Test \| Before \| After \| \|---\|---\|---\| \| test_anthropic_error_handling (8 tests) \| ~80s \| ~15s \| \| test_413_compression (14 tests) \| ~18s \| 2.3s \| \| test_retaindb_plugin (67 tests) \| ~13s \| 1.3s \| \| test_status_includes_tavily_key \| 4.0s \| 0.05s \| \| test_setup_copilot_acp_skips_same_provider_pool_step \| 8.0s \| 0.26s \| \| test_update_gateway_restart (5 tests) \| ~18s total \| ~0.35s total \| \| test_exit_cleanup_interrupt (2 tests) \| 8s \| 1.5s \| \| Matrix shard 3 local \| 108s \| 50s \| No behavioral contract changed \u2014 tests still verify retry happens, service restart logic runs, etc.; they just don't burn real seconds waiting for it. Supersedes PR #11779 (those changes are included here).	2026-04-17 14:21:22 -07:00
Teknium	eb07c05646	fix(gateway): prune stale SessionStore entries to bound memory + disk (#11789 ) SessionStore._entries grew unbounded. Every unique (platform, chat_id, thread_id, user_id) tuple ever seen was kept in RAM and rewritten to sessions.json on every message. A Discord bot in 100 servers x 100 channels x ~100 rotating users accumulates on the order of 10^5 entries after a few months; each sessions.json write becomes an O(n) fsync. Nothing trimmed this — there was no TTL, no cap, no eviction path. Changes ------- * SessionStore.prune_old_entries(max_age_days) — drops entries whose updated_at is older than the cutoff. Preserves: - suspended entries (user paused them via /stop for later resume) - entries with an active background process attached Pruning is functionally identical to a natural reset-policy expiry: SQLite transcript stays, session_key -> session_id mapping dropped, returning user gets a fresh session. * GatewayConfig.session_store_max_age_days (default 90; 0 disables). Serialized in to_dict/from_dict, coerced from bad types / negatives to safe defaults. No migration needed — missing field -> 90 days. * _session_expiry_watcher calls prune_old_entries once per hour (first tick is immediate). Uses the existing watcher loop so no new background task is created. Why not more aggressive ----------------------- 90 days is long enough that legitimate long-idle users (seasonal, vacation, etc.) aren't surprised — pruning just means they get a fresh session on return, same outcome they'd get from any other reset-policy trigger. Admins can lower it via config; 0 disables. Tests ----- tests/gateway/test_session_store_prune.py — 17 cases covering: * entry age based on updated_at, not created_at * max_age_days=0 disables; negative coerces to 0 * suspended + active-process entries are skipped * _save fires iff something was removed * disk JSON reflects post-prune state * thread safety against concurrent readers * config field roundtrips + graceful fallback on bad values * watcher gate logic (first tick prunes, subsequent within 1h don't) 119 broader session/gateway tests remain green.	2026-04-17 13:48:49 -07:00
Teknium	f362083c64	fix(providers): complete NVIDIA NIM parity with other providers Follow-up on the native NVIDIA NIM provider salvage. The original PR wired PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints required for full parity with other OpenAI-compatible providers (xai, huggingface, deepseek, zai). Gaps closed: - hermes_cli/main.py: - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key provider flow (previously fell through silently). - Add 'nvidia' to `hermes chat --provider` argparse choices so the documented test command (`hermes chat --provider nvidia --model ...`) parses successfully. - hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're auto-added to the subprocess env blocklist. - hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so `hermes doctor` probes https://integrate.api.nvidia.com/v1/models. - hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for `hermes dump` credential masking. - tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses. - agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry so all Nemotron variants get 128K context via substring match (rather than falling back to MINIMUM_CONTEXT_LENGTH). - hermes_cli/models.py: Fix hallucinated model ID 'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b' (verified against live integrate.api.nvidia.com/v1/models catalog). Expand curated list from 5 to 9 agentic models mapping to OpenRouter defaults per provider-guide convention: add qwen3.5-397b-a17b, deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b. - cli-config.yaml.example: Document 'nvidia' provider option. - scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP for CI attribution. E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's endpoint (returns 401 with bogus key instead of argparse error); `hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.	2026-04-17 13:47:46 -07:00
asurla	3b569ff576	feat(providers): add native NVIDIA NIM provider Adds NVIDIA NIM as a first-class provider: ProviderConfig in auth.py, HermesOverlay in providers.py, curated models (Nemotron plus other open source models hosted on build.nvidia.com), URL mapping in model_metadata.py, aliases (nim, nvidia-nim, build-nvidia, nemotron), and env var tests. Docs updated: providers page, quickstart table, fallback providers table, and README provider list.	2026-04-17 13:47:46 -07:00
Brooklyn Nicholson	bd09e42eac	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 15:44:57 -05:00
Teknium	cc3aa76675	build(deps): add qrcode to dingtalk + feishu extras (parity with messaging) (#11627 ) #`4b1567f4` (anthhub) added qrcode to the messaging extra for Weixin's QR login. The same package is needed by: * hermes_cli/dingtalk_auth.py — QR device-flow auth shipped in #11574 * gateway/platforms/feishu.py:3962 — Feishu QR login These extras are independent of [messaging] (users can install hermes-agent[dingtalk] or hermes-agent[feishu] without [messaging]), so the dep needs to be declared on each. Pin matches anthhub's choice (>=7.0,<8) for consistency. The all extra inherits from all three, so it picks up qrcode transitively. Adds parallel tests to tests/test_project_metadata.py — same shape as test_messaging_extra_includes_qrcode_for_weixin_setup. Refs #9431.	2026-04-17 13:31:53 -07:00
Teknium	2ff1ef6ae6	fix(surrogates): sanitize reasoning/reasoning_content/reasoning_details fields (#11628 ) Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone surrogates in reasoning output. The proactive sanitizer walked content/ name/tool_calls but not extra fields like reasoning or the nested reasoning_details array. Surrogates in those fields survived the proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery block's _sanitize_messages_surrogates(messages) call also didn't check those fields — so 'found' was False, no retry happened, and after 3 attempts the user saw: API call failed after 3 retries. 'utf-8' codec can't encode characters in position N-M: surrogates not allowed Changes: - _sanitize_messages_surrogates: walk any extra string fields (reasoning, reasoning_content, etc.) and recurse into nested dict/list values (reasoning_details). Mirrors _sanitize_messages_non_ascii coverage added in PR #10537. - _sanitize_structure_surrogates: new recursive walker, mirror of _sanitize_structure_non_ascii but for surrogate recovery. - UnicodeEncodeError recovery block: also sanitize api_messages, api_kwargs, and prefill_messages (not just the canonical messages list — the API-copy carries reasoning_content transformed from reasoning and that's what the SDK actually serializes). Always retry on detected surrogate errors, not only when we found something to strip — gate on error type per PR #10537's pattern. Tests: extended tests/cli/test_surrogate_sanitization.py with coverage for reasoning, reasoning_content, reasoning_details (flat and deeply nested), structure walker, and an integration case that reproduces the exact api_messages shape that was crashing.	2026-04-17 13:30:47 -07:00
Henkey	cb883f9e97	fix(acp): improve zed integration	2026-04-17 13:29:26 -07:00
Brooklyn Nicholson	d5b9db8b4a	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 15:13:36 -05:00
Teknium	d0e1388ca9	fix(tests): make AIAgent constructor calls self-contained (#11755 ) * fix(tests): make AIAgent constructor calls self-contained (no env leakage) Tests in tests/run_agent/ were constructing AIAgent() without passing both api_key and base_url, then relying on leaked state from other tests in the same xdist worker (or process-level env vars) to keep provider resolution happy. Under hermetic conftest + pytest-split, that state is gone and the tests fail with 'No LLM provider configured'. Fix: pass both api_key and base_url explicitly on 47 AIAgent() construction sites across 13 files. AIAgent.__init__ with both set takes the direct-construction path (line 960 in run_agent.py) and skips the resolver entirely. One call site (test_none_base_url_passed_as_none) left alone — that test asserts behavior for base_url=None specifically. This is a prerequisite for any future matrix-split or stricter isolation work, and lands cleanly on its own. Validation: - tests/run_agent/ full: 760 passed, 0 failed (local) - Previously relied on cross-test pollution; now self-contained * fix(tests): update opencode-go model order assertion to match kimi-k2.5-first commit `78a74bb` promoted kimi-k2.5 to first position in model suggestion lists but didn't update this test, which has been failing on main since. Reorder expected list to match the new canonical order.	2026-04-17 12:32:03 -07:00
Brooklyn Nicholson	0dd5055d59	fix(tui): first-run setup preflight + actionable no-provider panel - tui_gateway: new `setup.status` RPC that reuses CLI's `_has_any_provider_configured()`, so the TUI can ask the same question the CLI bootstrap asks before launching a session - useSessionLifecycle: preflight `setup.status` before both `newSession` and `resumeById`, and render a clear "Setup Required" panel when no provider is configured instead of booting a session that immediately fails with `agent init failed` - createGatewayEventHandler: drop duplicate startup resume logic in favor of the preflighted `resumeById`, and special-case the no-provider agent-init error as a last-mile fallback to the same setup panel - add regression tests for both paths	2026-04-17 10:58:01 -05:00
Brooklyn Nicholson	5b386ced71	fix(tui): approval flow + input ergonomics + selection perf - tui_gateway: route approvals through gateway callback (HERMES_GATEWAY_SESSION/ HERMES_EXEC_ASK) so dangerous commands emit approval.request instead of silently falling through the CLI input() path and auto-denying - approval UX: dedicated PromptZone between transcript and composer, safer defaults (sel=0, numeric quick-picks, no Esc=deny), activity trail line, outcome footer under the cost row - text input: Ctrl+A select-all, real forward Delete, Ctrl+W always consumed (fixes Ctrl+Backspace at cursor 0 inserting literal w) - hermes-ink selection: swap synchronous onRender() for throttled scheduleRender() on drag, and only notify React subscribers on presence change — no more per-cell paint/subscribe spam - useConfigSync: silence config.get polling failures instead of surfacing 'error: timeout: config.get' in the transcript	2026-04-17 10:37:48 -05:00
Brooklyn Nicholson	1f37ef2fd1	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 08:59:33 -05:00
Young Sherlock	8dcd08d8bb	Fix Weixin media uploads and refresh lockfile	2026-04-17 06:50:36 -07:00
anthhub	4b1567f425	fix(packaging): include qrcode in messaging extra	2026-04-17 06:50:36 -07:00
Teknium	3f3d8a7b24	fix(discord): strip mention syntax from auto-thread names Previously a message like `<@&1490963422786093149> help` would spawn a thread literally named `<@&1490963422786093149> help`, exposing raw Discord mention markers in the thread list. Only user mentions (`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and channel mentions (`<#id>`) leaked through. Fix: strip all three mention patterns in `_auto_create_thread` before building the thread name. Collapse runs of whitespace left by the removal. If the entire content was mention-only, fall back to 'Hermes' instead of an empty title. Fixes #6336. Tests: two new regression guards in test_discord_slash_commands.py covering mixed-mention content and mention-only content.	2026-04-17 06:46:52 -07:00
sgaofen	32a694ad5f	fix(discord): fall back when auto-thread creation fails	2026-04-17 06:46:52 -07:00
OwenYWT	f5dc4e905d	fix(discord): skip auto-threading reply messages	2026-04-17 06:46:52 -07:00
Matteo De Agazio	93fe4b357d	fix(discord): free-response channels skip auto-threading Free-response channels already bypassed the @mention gate so users could chat inline with the bot, but auto-threading still fired on every message — spinning off a thread per message and defeating the lightweight-chat purpose. Fix: fold `is_free_channel` into `skip_thread` so threading is skipped whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or discord.free_response_channels in config.yaml). Net change: one line in _handle_message + one regression test. Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650; the bundled 'smart' auto-thread mode from that PR was dropped in favor of deterministic true/false semantics).	2026-04-17 06:46:52 -07:00
Teknium	8d7b7feb0d	fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction (#11565 ) * fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction The per-session AIAgent cache was unbounded. Each cached AIAgent holds LLM clients, tool schemas, memory providers, and a conversation buffer. In a long-lived gateway serving many chats/threads, cached agents accumulated indefinitely — entries were only evicted on /new, /model, or session reset. Changes: - Cache is now an OrderedDict so we can pop least-recently-used entries. - _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64 when a new agent is inserted. LRU order is refreshed via move_to_end() on cache hits. - _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing _session_expiry_watcher so no new background task is created. - The expiry watcher now also pops the cache entry after calling _cleanup_agent_resources on a flushed session — previously the agent was shut down but its reference stayed in the cache dict. - Evicted agents have _cleanup_agent_resources() called on a daemon thread so the cache lock isn't held during slow teardown. Both tuning constants live at module scope so tests can monkeypatch them without touching class state. Tests: 7 new cases in test_agent_cache.py covering LRU eviction, move_to_end refresh, cleanup thread dispatch, idle TTL sweep, defensive handling of agents without _last_activity_ts, and plain-dict test fixture tolerance. * tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128 * fix(gateway): never evict mid-turn agents; live spillover tests The prior commit could tear down an active agent if its session_key happened to be LRU when the cap was exceeded. AIAgent.close() kills process_registry entries for the task, tears down the terminal sandbox, closes the OpenAI client (sets self.client = None), and cascades .close() into any active child subagents — all fatal if the agent is still processing a turn. Changes: - _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at GatewayRunner._running_agents and skip any entry whose AIAgent instance is present (identity via id(), so MagicMock doesn't confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated as 'not active' since no real agent exists yet. - Eviction only considers the LRU-excess window (first size-cap entries). If an excess slot is held by a mid-turn agent, we skip it WITHOUT compensating by evicting a newer entry. A freshly inserted session (zero cache history) shouldn't be punished to protect a long-lived one that happens to be busy. - Cache may therefore stay transiently over cap when load spikes; a WARNING is logged so operators can see it, and the next insert re-runs the check after some turns have finished. New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive): - Active LRU entry is skipped; no newer entry compensated - Mixed active/idle excess window: only idle slots go - All-active cache: no eviction, WARNING logged, all clients intact - _AGENT_PENDING_SENTINEL doesn't block other evictions - Idle-TTL sweep skips active agents - End-to-end: active agent's .client survives eviction attempt - Live fill-to-cap with real AIAgents, then spillover - Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown - Live: 8 threads racing 160 inserts into CAP=16 — settles at 16 - Live: evicted session's next turn gets a fresh agent that works 30 tests pass (13 pre-existing + 17 new). Related gateway suites (model switch, session reset, proxy, etc.) all green. * fix(gateway): cache eviction preserves per-task state for session resume The prior commits called AIAgent.close() on cache-evicted agents, which tears down process_registry entries, terminal sandbox, and browser daemon for that task_id — permanently. Fine for session-expiry (session ended), wrong for cache eviction (session may resume). Real-world scenario: a user leaves a Telegram session open for 2+ hours, idle TTL evicts the cached AIAgent, user returns and sends a message. Conversation history is preserved via SessionStore, but their terminal sandbox (cwd, env vars, bg shells) and browser state were destroyed. Fix: split the two cleanup modes. close() Full teardown — session ended. Kills bg procs, tears down terminal sandbox + browser daemon, closes LLM client. Used by session-expiry, /new, /reset (unchanged). release_clients() Soft cleanup — session may resume. Closes LLM client only. Leaves process_registry, terminal sandbox, browser daemon intact for the resuming agent to inherit via shared task_id. Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents) now dispatches _release_evicted_agent_soft on the daemon thread instead of _cleanup_agent_resources. All session-expiry call sites of _cleanup_agent_resources are unchanged. Tests (TestAgentCacheIdleResume, 5 new cases): - release_clients does NOT call process_registry.kill_all - release_clients does NOT call cleanup_vm / cleanup_browser - release_clients DOES close the LLM client (agent.client is None after) - close() vs release_clients() — semantic contract pinned - Idle-evicted session's rebuild with same session_id gets same task_id Updated test_cap_triggers_cleanup_thread to assert the soft path fires and the hard path does NOT. 35 tests pass in test_agent_cache.py; 67 related tests green.	2026-04-17 06:36:34 -07:00
Jorge	86f02d8d71	refactor(cli): align model picker viewport with PR #11260 vocabulary Match the row-budget naming introduced in PR #11260 for the approval and clarify panels: rename chrome_reserve=14 into reserved_below=6 (input chrome below the panel) + panel_chrome=6 (this panel's borders, blanks, and hint row) + min_visible=3 (floor on visible items). Same arithmetic as before, but a reviewer reading both files now sees the same handle. Compact-chrome mode is intentionally not adopted — that pattern fits the "fixed mandatory content might overflow" shape of approval/clarify (solved by truncating with a marker), whereas the picker's overflow is already handled by the scrolling viewport.	2026-04-17 06:33:21 -07:00
Jorge	5fbe16635b	fix(cli): scroll the /model picker viewport so long catalogs aren't clipped The /model picker rendered every choice into a prompt_toolkit Window with no max height. Providers with many models (e.g. Ollama Cloud's 36+) overflowed the terminal, clipping the bottom border and the last items. - Add HermesCLI._compute_model_picker_viewport() to slide a scroll offset that keeps the cursor on screen, sized from the live terminal rows minus chrome reserved for input/status/border. - Render only the visible slice in _get_model_picker_display() and persist the offset on _model_picker_state across redraws. - Bind ESC (eager) to close the picker, matching the Cancel button. - Cover the viewport math with 8 unit tests in tests/hermes_cli/test_model_picker_viewport.py.	2026-04-17 06:33:21 -07:00
Teknium	f64241ed90	feat(cron+tests): extend origin fallback to email/dingtalk/qqbot + fix Weixin test mocks Cron origin fallback extension (builds on #9193's _HOME_TARGET_ENV_VARS): adds the three remaining origin-fallback-eligible platforms that have home channel env vars configured in gateway/config.py but use non-generic env var names: - email → EMAIL_HOME_ADDRESS (non-standard suffix) - dingtalk → DINGTALK_HOME_CHANNEL - qqbot → QQ_HOME_CHANNEL (non-standard prefix: QQ_ not QQBOT_) Picks up the completeness intent of @Xowiek's PR #11317 using the architecturally-correct dict-based lookup from #9193, so platforms with non-standard env var names actually resolve instead of silently missing. Extended the parametrized regression test to cover the new three. Weixin test mock alignment (builds on #10091's _send_session split): Three test sites added in Batch 1 (TestWeixinSendImageFileParameterName) and Batch 3 (TestWeixinVoiceSending) mocked only adapter._session, but #10091 switched the send paths to check self._send_session. Added the companion setter so the tests stay green with the session split in place.	2026-04-17 06:26:43 -07:00
bde3249023	b46db048c3	fix(cron): align home target env lookup	2026-04-17 06:26:43 -07:00
bde3249023	f696b4745a	fix(cron): restore origin fallback for feishu home channels	2026-04-17 06:26:43 -07:00
Ubuntu	5ca52bae5b	fix(gateway/weixin): split poll/send sessions, reuse live adapter for cron & send_message - gateway/platforms/weixin.py: - Split aiohttp.ClientSession into _poll_session and _send_session - Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session - Fixes silent message loss when gateway is running (iLink token contention) - cron/scheduler.py: - Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery - Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config - tools/send_message_tool.py: - Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry - Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution - tests/gateway/test_weixin.py: - Update mocks to include _send_session	2026-04-17 06:26:43 -07:00

1 2 3 4 5 ...

2121 commits