hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
dirtyfancy	5af9614f6d	fix(claw): warn if OpenClaw is running before migration Add _is_openclaw_running() and _warn_if_openclaw_running() to detect OpenClaw processes (via pgrep/tasklist) before hermes claw migrate. Warns the user that messaging platforms only allow one active session per bot token, and lets them cancel or continue. Fixes #7907	2026-04-12 16:40:37 -07:00
Teknium	76019320fb	feat(skills): centralized skills index — eliminate GitHub API calls for search/install Add a CI-built skills index served from the docs site. The index is crawled daily by GitHub Actions, resolves all GitHub paths upfront, and is cached locally by the client. When the index is available: - Search uses the cached index (0 GitHub API calls, was 23+) - Install uses resolved paths from index (6 API calls for file downloads only, was 31-45 for discovery + downloads) Total: 68 → 6 GitHub API calls for a typical search + install flow. Unauthenticated users (60 req/hr) can now search and install without hitting rate limits. Components: - scripts/build_skills_index.py: Crawl all sources (skills.sh, GitHub taps, official, clawhub, lobehub), batch-resolve GitHub paths via tree API, output JSON index - tools/skills_hub.py: HermesIndexSource class — search/fetch/inspect backed by the index, with lazy GitHubSource for file downloads - parallel_search_sources() skips external API sources when index is available (0 GitHub calls for search) - .github/workflows/skills-index.yml: twice-daily CI build + deploy - .github/workflows/deploy-site.yml: also builds index during docs deploy Graceful degradation: when the index is unavailable (first run, network down, stale), all methods return empty/None and downstream sources handle the request via direct API as before.	2026-04-12 16:39:04 -07:00
Teknium	7e0e5ea03b	fix(skills): cache GitHub repo trees to avoid rate-limit exhaustion on install Skills.sh installs hit the GitHub API 45 times per install because the same repo tree was fetched 6 times redundantly. Combined with search (23 API calls), this totals 68 — exceeding the unauthenticated rate limit of 60 req/hr, causing 'Could not fetch' errors for users without a GITHUB_TOKEN. Changes: - Add _get_repo_tree() cache to GitHubSource — repo info + recursive tree fetched once per repo per source instance, eliminating 10 redundant API calls (6 tree + 4 candidate 404s) - _download_directory_via_tree returns {} (not None) when cached tree shows path doesn't exist, skipping unnecessary Contents API fallback - _check_rate_limit_response() detects exhausted quota and sets is_rate_limited flag - do_install() shows actionable hint when rate limited: set GITHUB_TOKEN or install gh CLI Before: 45 API calls per install (68 total with search) After: 31 API calls per install (54 total with search — under 60/hr) Reported by community user from Vietnam (no GitHub auth configured).	2026-04-12 16:39:04 -07:00
Teknium	4c6ebd077e	chore: sync uv.lock with matrix extra deps (aiosqlite, asyncpg) (#8661 ) These were already declared in pyproject.toml but missing from the lockfile.	2026-04-12 16:38:15 -07:00
alt-glitch	5e1197a42e	fix(gateway): harden Docker/container gateway pathway Centralize container detection in hermes_constants.is_container() with process-lifetime caching, matching existing is_wsl()/is_termux() patterns. Dedup _is_inside_container() in config.py to delegate to the new function. Add _run_systemctl() wrapper that converts FileNotFoundError to RuntimeError for defense-in-depth — all 10 bare subprocess.run(_systemctl_cmd(...)) call sites now route through it. Make supports_systemd_services() return False in containers and when systemctl binary is absent (shutil.which check). Add Docker-specific guidance in gateway_command() for install/uninstall/start subcommands — exit 0 with helpful instructions instead of crashing. Make 'hermes status' show 'Manager: docker (foreground)' and 'hermes dump' show 'running (docker, pid N)' inside containers. Fix setup_gateway() to use supports_systemd instead of _is_linux for all systemd-related branches, and show Docker restart policy instructions in containers. Replace inline /.dockerenv check in voice_mode.py with is_container(). Fixes #7420 Co-authored-by: teknium1 <teknium1@users.noreply.github.com>	2026-04-12 16:36:11 -07:00
sprmn24	18ab5c99d1	fix(backup): correct marker filenames in _validate_backup_zip The backup validation checked for 'hermes_state.db' and 'memory_store.db' as telltale markers of a valid Hermes backup zip. Neither name exists in a real Hermes installation — the actual database file is 'state.db' (hermes_state.py: DEFAULT_DB_PATH = get_hermes_home() / 'state.db'). A fresh Hermes installation produces: ~/.hermes/state.db (actual name) ~/.hermes/config.yaml ~/.hermes/.env Because the marker set never matched 'state.db', a backup zip containing only 'state.db' plus 'config.yaml' would fail validation with: 'zip does not appear to be a Hermes backup' and the import would exit with sys.exit(1), silently rejecting a valid backup. Fix: replace the wrong marker names with the correct filename. Adds TestValidateBackupZip with three cases: - state.db is accepted as a valid marker - old wrong names (hermes_state.db, memory_store.db) alone are rejected - config.yaml continues to pass (existing behaviour preserved)	2026-04-12 16:35:56 -07:00
Teknium	d6785dc4d4	fix: empty response recovery for reasoning models (mimo, qwen, GLM) (#8609 ) Three fixes for the (empty) response bug affecting open reasoning models: 1. Allow retries after prefill exhaustion — models like mimo-v2-pro always populate reasoning fields via OpenRouter, so the old 'not _has_structured' guard on the retry path blocked retries for EVERY reasoning model after the 2 prefill attempts. Now: 2 prefills + 3 retries = 6 total attempts before (empty). 2. Reset prefill/retry counters on tool-call recovery — the counters accumulated across the entire conversation, never resetting during tool-calling turns. A model cycling empty→prefill→tools→empty burned both prefill attempts and the third empty got zero recovery. Now counters reset when prefill succeeds with tool calls. 3. Strip think blocks before _truly_empty check — inline <think> content made the string non-empty, skipping both retry paths. Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models. Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead of proper function calls, causing content=None + tool_calls=None + reasoning with embedded <tool_call> XML. Prefill recovery works but counter accumulation caused permanent (empty) in long sessions.	2026-04-12 15:38:11 -07:00
Teknium	a4593f8b21	feat: make gateway 'still working' notification interval configurable (#8572 ) Add agent.gateway_notify_interval config option (default 600s). Set to 0 to disable periodic 'still working' notifications. Bridged to HERMES_AGENT_NOTIFY_INTERVAL env var (same pattern as gateway_timeout and gateway_timeout_warning). The inactivity warning (gateway_timeout_warning) was already configurable; this makes the wall-clock ping configurable too.	2026-04-12 13:06:34 -07:00
Teknium	1179918746	fix: salvage follow-ups for Feishu QR onboarding (#7706 ) - Remove duplicate _setup_feishu() definition (old 3-line version left behind by cherry-pick — Python picked the new one but dead code remained) - Remove misleading 'Disable direct messages' DM option — the Feishu adapter has no DM policy mechanism, so 'disable' produced identical env vars to 'pairing'. Users who chose 'disable' would still see pairing prompts. Reduced to 3 options: pairing, allow-all, allowlist. - Fix test_probe_returns_bot_info_on_success and test_probe_returns_none_on_failure: patch FEISHU_AVAILABLE=True so probe_bot() takes the SDK path when lark_oapi is not installed	2026-04-12 13:05:56 -07:00
Shuo	d7785f4d5b	feat(feishu): add scan-to-create onboarding for Feishu / Lark Add a QR-based onboarding flow to `hermes gateway setup` for Feishu / Lark. Users scan a QR code with their phone and the platform creates a fully configured bot application automatically — matching the existing WeChat QR login experience. Setup flow: - Choose between QR scan-to-create (new app) or manual credential input (existing app) - Connection mode selection (WebSocket / Webhook) - DM security policy (pairing / open / allowlist / disabled) - Group chat policy (open with @mention / disabled) Implementation: - Onboard functions (init/begin/poll/QR/probe) in gateway/platforms/feishu.py - _setup_feishu() in hermes_cli/gateway.py with manual fallback - probe_bot uses lark_oapi SDK when available, raw HTTP fallback otherwise - qr_register() catches expected errors (network/protocol), propagates bugs - Poll handles HTTP 4xx JSON responses and feishu/lark domain auto-detection Tests: - 25 tests for onboard module (registration, QR, probe, contract, negative paths) - 16 tests for setup flow (credentials, connection mode, DM policy, group policy, adapter integration verifying env vars produce valid FeishuAdapterSettings) Change-Id: I720591ee84755f32dda95fbac4b26dc82cbcf823	2026-04-12 13:05:56 -07:00
Teknium	a9ebb331bc	fix: contextual error diagnostics for invalid API responses (#8565 ) Previously, all invalid API responses (choices=None) were diagnosed as 'fast response often indicates rate limiting' regardless of actual response time or error code. A 738s Cloudflare 524 timeout was labeled as 'fast response' and 'possible rate limit'. Now extracts the error code from response.error and classifies: - 524: upstream provider timed out (Cloudflare) - 504: upstream gateway timeout - 429: rate limited by upstream provider - 500/502: upstream server error - 503/529: upstream provider overloaded - Other codes: shown with code number - No code + <10s: likely rate limited (timing heuristic) - No code + >60s: likely upstream timeout - No code + 10-60s: neutral response time All downstream messages (retry status, final error, interrupt message) now use the classified hint instead of generic rate-limit language. Reported by community member Lumen Radley (MiMo provider timeouts).	2026-04-12 13:00:07 -07:00
Teknium	400fe9b2a1	fix: add <thought> stripping to auxiliary_client + tests auxiliary_client.py had its own regex mirroring _strip_think_blocks but was missing the <thought> variant. Also adds test coverage for <thought> paired and orphaned tags.	2026-04-12 12:44:49 -07:00
Chen Chia Yang	326d5febe5	fix: also strip <thought> tags during streaming in cli.py	2026-04-12 12:44:49 -07:00
Chen Chia Yang	a372c14fc5	fix: strip <thought> tags from Gemma 4 responses in _strip_think_blocks Gemma 4 (26B/31B) uses <thought>...</thought> to wrap its reasoning output. This tag was not included in the existing list of reasoning tag variants stripped by _strip_think_blocks(), causing raw thinking blocks to leak into the visible response. Added a new re.sub() line for <thought> and extended the cleanup regex to include 'thought' alongside the existing variants. Fixes #6148	2026-04-12 12:44:49 -07:00
Teknium	f295b17d92	fix: make agent_thread daemon to prevent orphan CLI processes on tab close (#8557 ) When a user closes a terminal tab, SIGHUP exits the main thread but the non-daemon agent_thread kept the entire Python process alive — stuck in the API call loop with no interrupt signal. Over many conversations, these orphan processes accumulate and cause massive swap usage (reported: 77GB on a 32GB M1 Pro). Changes: - Make agent_thread daemon=True so the process exits when the main thread finishes its cleanup. Under normal operation this changes nothing — the main thread already waits on agent_thread.is_alive(). - Interrupt the agent in the finally/exit path so the daemon thread stops making API calls promptly rather than being killed mid-flight.	2026-04-12 12:38:55 -07:00
Teknium	06290f6a2f	fix: handle broken stdin in prompt_toolkit startup (#6393 ) (#8560 ) On macOS with uv-managed Python, stdin (fd 0) can be invalid or unregisterable with the asyncio selector, causing: KeyError: '0 is not registered' during prompt_toolkit's app.run() → asyncio.run() → _add_reader(0). Three-layer fix: 1. Pre-flight fstat(0) check before app.run() — detects broken stdin early and prints actionable guidance instead of a raw traceback. 2. Catch KeyError/OSError around app.run() as fallback for edge cases that slip past the fstat guard. 3. Extend asyncio exception handler to suppress selector registration KeyErrors in async callbacks. Fixes #6393	2026-04-12 12:38:03 -07:00
Teknium	06a17c57ae	fix: improve profile creation UX — seed SOUL.md + credential warning (#8553 ) Fresh profiles (created without --clone) now: - Auto-seed a default SOUL.md immediately, so users have a file to customize right away instead of discovering it only after first use - Print a clear warning that the profile has no API keys and will inherit from the shell environment unless configured separately - Show the SOUL.md path for personality customization Previously, fresh profiles started with no SOUL.md (only seeded on first use via ensure_hermes_home), no mention of credential isolation, and no guidance about customizing personality. Users reported confusion about profiles using the wrong model/plan tokens and SOUL.md not being read — both traced to operational gaps in the creation UX. Closes #8093 (investigated: code correctly loads SOUL.md from profile HERMES_HOME; issue was operational, not a code bug).	2026-04-12 12:22:34 -07:00
Teknium	4eecaf06e4	fix: prevent duplicate update prompt spam in gateway watcher (#8343 ) The _watch_update_progress() poll loop never deleted .update_prompt.json after forwarding the prompt to the user, causing the same prompt to be re-sent every poll cycle (2s). Two fixes: 1. Delete .update_prompt.json after forwarding — the update process only polls for .update_response, it doesn't need the prompt file to persist. 2. Guard re-sends with _update_prompt_pending check — belt-and-suspenders to prevent duplicates even under race conditions. Add regression test asserting the prompt is sent exactly once.	2026-04-12 04:52:59 -07:00
Teknium	7a67b13506	fix: title_generator no longer logs as 'compression' task Changed task='compression' to task='title_generation' so auto-title calls don't pollute logs with false compression alarms.	2026-04-12 04:17:18 -07:00
Teknium	45e60904c6	fix: fall back to provider's default model when model config is empty (#8303 ) When a user configures a provider (e.g. `hermes auth add openai-codex`) but never selects a model via `hermes model`, the gateway and CLI would pass an empty model string to the API, causing: 'Codex Responses request model must be a non-empty string' Now both gateway (_resolve_session_agent_runtime) and CLI (_ensure_runtime_credentials) detect an empty model and fill it from the provider's first catalog entry in _PROVIDER_MODELS. This covers all providers that have a static model list (openai-codex, anthropic, gemini, copilot, etc.). The fix is conservative: it only triggers when model is truly empty and a known provider was resolved. Explicit model choices are never overridden.	2026-04-12 03:53:30 -07:00
Teknium	17c72f176d	fix: make skill loading instructions more aggressive in system prompt (#8286 ) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 03:03:16 -07:00
Teknium	b6b6b02f0f	fix: prevent unwanted session auto-reset after graceful gateway restarts (#8299 ) When the gateway shuts down gracefully (hermes update, gateway restart, /restart), it now writes a .clean_shutdown marker file. On the next startup, if this marker exists, suspend_recently_active() is skipped and the marker is cleaned up. Previously, suspend_recently_active() fired on EVERY startup — including planned restarts from hermes update or hermes gateway restart. This caused users to lose their conversation history unexpectedly: the session would be marked as suspended, and the next message would trigger an auto-reset with a notification the user never asked for. The original purpose of suspend_recently_active() is crash recovery — preventing stuck sessions that were mid-processing when the gateway died unexpectedly. Graceful shutdowns already drain active agents via _drain_active_agents(), so there is no stuck-session risk. After a crash (no marker written), suspension still fires as before. Fixes the scenario where a user asks the agent to run hermes update, the gateway restarts, and the user's next message gets an unwanted 'Session automatically reset' notification with their history cleared.	2026-04-12 03:03:07 -07:00
Teknium	56e3ee2440	fix: write update exit code before gateway restart (cgroup kill race) (#8288 ) When /update runs via Telegram, hermes update --gateway is spawned inside the gateway's systemd cgroup. The update process itself calls systemctl restart hermes-gateway, which tears down the cgroup with KillMode=mixed — SIGKILL to all remaining processes. The wrapping bash shell is killed before it can execute the exit-code epilogue, so .update_exit_code is never created. The new gateway's update watcher then polls for 30 minutes and sends a spurious timeout message. Fix: write .update_exit_code from Python inside cmd_update() immediately after the git pull + pip install succeed ("Update complete!"), before attempting the gateway restart. The shell epilogue still writes it too (idempotent overwrite), but now the marker exists even when the process is killed mid-restart.	2026-04-12 02:33:21 -07:00
Teknium	b321330362	feat: add WSL environment hint to system prompt (#8285 ) When running inside WSL (Windows Subsystem for Linux), inject a hint into the system prompt explaining that the Windows host filesystem is mounted at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows paths (Desktop, Documents) to their /mnt/ equivalents without the user needing to configure anything. Uses the existing is_wsl() detection from hermes_constants (cached, checks /proc/version for 'microsoft'). Adds build_environment_hints() in prompt_builder.py — extensible for Termux, Docker, etc. later. Closes the UX gap where WSL users had to manually explain path translation to the agent every session.	2026-04-12 02:26:28 -07:00
Teknium	dd5b1063d0	fix: register MATRIX_RECOVERY_KEY env var + document migration path Follow-up for cherry-picked PR #8272: - Add MATRIX_RECOVERY_KEY to module docstring header in matrix.py - Register in OPTIONAL_ENV_VARS (config.py) with password=True, advanced=True - Add to _NON_SETUP_ENV_VARS set - Document cross-signing verification in matrix.md E2EE section - Update migration guide with recovery key step (step 3) - Add to environment-variables.md reference	2026-04-12 02:18:03 -07:00
elkimek	b9af4955b9	fix(matrix): restore verify_with_recovery_key after device key rotation After the PgCryptoStore migration in v0.8.0, the verify_with_recovery_key call that previously ran after share_keys() was dropped. On any rotation that uploads fresh device keys (fresh crypto.db, server had stale keys from a prior install, etc.), the new device keys carry no valid self- signing signature because the bot has no access to the self-signing private key. Peers like Element then refuse to share Megolm sessions with the rotated device, so the bot silently stops decrypting incoming messages. This restores the recovery-key bootstrap: on startup, if MATRIX_RECOVERY_KEY is set, import the cross-signing private keys from SSSS and sign_own_device(), producing a valid signature server-side. Idempotent and gated on MATRIX_RECOVERY_KEY — no behavior change for users who don't configure a recovery key. Verified end-to-end by deleting crypto.db and restarting: the bot rotates device identity keys, re-uploads, self-signs via recovery key, and decrypts+replies to fresh messages from a paired Element client.	2026-04-12 02:18:03 -07:00
Ben Barclay	b0d65c333a	Merge pull request #8279 from NousResearch/chore/simplify-docker-tags chore: simplify Docker image tags	2026-04-12 19:09:05 +10:00
Ben	00adbd0de0	chore: simplify Docker image tags - Main branch push: only push :latest (remove SHA tag) - Release push: only push release tag name (remove :latest and SHA tag)	2026-04-12 19:08:16 +10:00
Teknium	95fa78eb6c	fix: write refreshed Codex tokens back to ~/.codex/auth.json (#8277 ) OpenAI OAuth refresh tokens are single-use and rotate on every refresh. When Hermes refreshes a Codex token, it consumed the old refresh_token but never wrote the new pair back to ~/.codex/auth.json. This caused Codex CLI and VS Code to fail with 'refresh_token_reused' on their next refresh attempt. This mirrors the existing Anthropic write-back pattern where refreshed tokens are written to ~/.claude/.credentials.json via _write_claude_code_credentials(). Changes: - Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to _write_claude_code_credentials in anthropic_adapter.py) - Call it from _refresh_codex_auth_tokens() (non-pool refresh path) - Call it from credential_pool._refresh_entry() (pool happy path + retry) - Add tests for the new write-back behavior - Update existing test docstring to clarify _save_codex_tokens vs _write_codex_cli_tokens separation Fixes refresh token conflict reported by @ec12edfae2cb221	2026-04-12 02:05:20 -07:00
Teknium	6d05e3d56f	fix(gateway): evict cached agent on /model switch + add diagnostic logging (#8276 ) After /model switches the model (both picker and text paths), the cached agent's config signature becomes stale — the agent was updated in-place via switch_model() but the cache tuple's signature was never refreshed. The next turn should detect the signature mismatch and create a fresh agent, but this relies on the new model's signature differing from the old one in _agent_config_signature(). Evicting the cached agent explicitly after storing the session override is more defensive — the next turn is guaranteed to create a fresh agent from the override without depending on signature mismatch detection. Also adds debug logging at three key decision points so we can trace exactly what happens when /model + /retry interact: - _resolve_session_agent_runtime: which override path is taken (fast with api_key vs fallback), or why no override was found - _run_agent.run_sync: final resolved model/provider before agent creation Reported: /model switch to xiaomi/mimo-v2-pro followed by /retry still used the old model (glm-5.1).	2026-04-12 01:58:17 -07:00
Teknium	4aa534eae5	fix(gateway): peek at pending message during interrupt instead of consuming it The monitor_for_interrupt() and backup interrupt checks were calling get_pending_message() which pops the message from the adapter's queue. This created a race condition: if the agent finished naturally before checking _interrupt_requested, the pending message was permanently lost. Timeline of the race: 1. Agent near completion, user sends message 2. Level 1 guard stores message in adapter._pending_messages, sets event 3. monitor_for_interrupt() detects event, POPS message, calls agent.interrupt() 4. Agent's run_conversation() was already returning (interrupted=False) 5. Post-run dequeue finds nothing (monitor already consumed it) 6. result.get('interrupted') is False so interrupt_message fallback doesn't fire 7. User message permanently lost — agent finishes without processing it Fix: change all three interrupt detection sites (primary monitor + two backup checks) from get_pending_message() (pop) to _pending_messages.get() (peek). The message stays in the adapter's queue until _dequeue_pending_event() consumes it in the post-run handler, which runs regardless of whether the agent was interrupted or finished naturally. Reported by @_SushantSays — intermittent message loss during long terminal command execution, persisting after the previous fix (`73f970fa`) which addressed monitor task death but not this consumption race.	2026-04-12 01:57:34 -07:00
Teknium	ae6820a45a	fix(setup): validate base URL input in hermes model flow (#8264 ) Reject non-URL values (e.g. shell commands typed by mistake) in the base URL prompt during provider setup. Previously any string was saved as-is to .env, breaking connectivity when the garbage value was used as the API endpoint. Adds http:// / https:// prefix check with a clear error message. The custom-endpoint flow already had this validation (line 1620); this brings the generic API-key provider flow to parity. Triggered by a user support case where 'nano ~/.hermes/.env' was accidentally entered as GLM_BASE_URL during Z.AI setup.	2026-04-12 01:51:57 -07:00
Teknium	a1220977d3	fix: make skill loading instructions more aggressive in system prompt (#8209 ) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 01:46:34 -07:00
Teknium	078dba015d	fix: three provider-related bugs (#8161 , #8181 , #8147 ) (#8243 ) - Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV so context-length lookups use models.dev data instead of 128k fallback. Fixes #8161. - Set api_mode from custom_providers entry when switching via hermes model, and clear stale api_mode when the entry has none. Also extract api_mode in _named_custom_provider_map(). Fixes #8181. - Convert OpenAI image_url content blocks to Anthropic image blocks when the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL containing /anthropic). Fixes #8147.	2026-04-12 01:44:18 -07:00
Harish Kukreja	b1f13a8c5f	fix(agent): route compression aux through live session runtime	2026-04-12 01:34:52 -07:00
Teknium	c52f6348b6	fix: list all available toolsets in delegate_task schema description (#8231 ) * fix: list all available toolsets in delegate_task schema description The delegate_task tool's toolsets parameter description only mentioned 'terminal', 'file', and 'web' as examples. Models (especially smaller ones like Gemma) would substitute 'web' for 'browser' because they didn't know 'browser' was a valid option. Now dynamically builds the toolset list from the TOOLSETS dict at import time, excluding blocked, composite, and platform-specific toolsets. Auto-updates when new toolsets are added. Reported by jeffutter on Discord. * chore: exclude moa and rl from delegate_task toolset list	2026-04-12 00:54:35 -07:00
Teknium	3162472674	feat(tips): add 69 deeper hidden-gem tips (279 total) (#8237 ) Add lesser-known power-user tips covering: - BOOT.md gateway startup automation - Cron script attachment for data collection pipelines - Prefill messages for few-shot priming - Focus topic compression (/compress <topic>) - Terminal exit code annotations and auto-retry - Automatic sudo password piping - execute_code built-in helpers (json_parse, shell_quote, retry) - File loop detection and staleness warnings - MCP sampling and dynamic tool discovery - Delegation heartbeat and ACP child agents (Claude Code) - 402 auto-fallback in auxiliary client - Container mode, HERMES_HOME_MODE, subprocess HOME isolation - Ctrl+C 5-tier priority system - Browser CDP URL override and stealth mode - Skills quarantine, audit log, and well-known protocol - Per-platform display overrides, human delay mode - And many more deep-cut features	2026-04-12 00:54:07 -07:00
Teknium	8b9d22a74b	revert: keep debian:13.4 full image instead of slim The slim image drops packages that may be needed at runtime. Keep the full Debian base for compatibility.	2026-04-12 00:53:16 -07:00
m0n5t3r	fee0e0d35e	fix(docker): run as non-root user, use virtualenv (salvage #5811 ) - Add gosu for runtime privilege dropping from root to hermes user - Support HERMES_UID/HERMES_GID env vars for host mount permission matching - Switch to debian:13.4-slim base image - Use uv venv instead of pip install --break-system-packages - Pin uv and gosu multi-stage images with SHA256 digests - Set PLAYWRIGHT_BROWSERS_PATH to /opt/hermes/.playwright so build-time chromium install survives the /opt/data volume mount - Keep procps for container debugging Based on work by m0n5t3r in PR #5811. Stripped to hardening-only changes (non-root, virtualenv, slim base); matrix deps, fonts, xvfb, and entrypoint playwright download deferred to follow-up.	2026-04-12 00:53:16 -07:00
bravohenry	81ac62c0e9	fix(weixin): split chatty short replies into separate bubbles, keep structured content together Add content-aware splitting to compact mode: short chat-like exchanges (2-6 short lines without headings/lists/quotes) get separate message bubbles for a natural chat feel, while structured content (tables, headings with body, numbered lists) stays in a single message. Cherry-picked from PR #7587 by bravohenry, adapted to the compact/legacy split_per_line architecture from #7903.	2026-04-12 00:38:07 -07:00
Teknium	f53a5a7fe1	fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228 ) When the agent calls process(action='wait') or process(action='poll') and gets the exited status, the completion_queue notification is redundant — the agent already has the output from the tool return. Previously, the drain loops in CLI and gateway would still inject the [SYSTEM: Background process completed] message, causing the agent to receive the same information twice. Fix: track session IDs in _completion_consumed set when wait/poll/log returns an exited process. Drain loops in cli.py and gateway watcher skip completion events for consumed sessions. Watch pattern events are never suppressed (they have independent semantics). Adds 4 tests covering wait/poll/log marking and running-process negative case.	2026-04-12 00:36:22 -07:00
Teknium	fdf55e0fe9	feat(cli): show random tip on new session start (#8225 ) Add a 'tip of the day' feature that displays a random one-liner about Hermes Agent features on every new session — CLI startup, /clear, /new, and gateway /new across all messaging platforms. - New hermes_cli/tips.py module with 210 curated tips covering slash commands, keybindings, CLI flags, config options, tools, gateway platforms, profiles, sessions, memory, skills, cron, voice, security, and more - CLI: tips display in skin-aware dim gold color after the welcome line - Gateway: tips append to the /new and /reset response on all platforms - Fully wrapped in try/except — tips are non-critical and never break startup or reset Display format (CLI): ✦ Tip: /btw <question> asks a quick side question without tools or history. Display format (gateway): ✨ Session reset! Starting fresh. ✦ Tip: hermes -c resumes your most recent CLI session.	2026-04-12 00:34:01 -07:00
opriz	36f57dbc51	fix(migration): don't auto-archive OpenClaw source directory Remove auto-archival from hermes claw migrate — not its responsibility (hermes claw cleanup is still there for that). Skip MESSAGING_CWD when it points inside the OpenClaw source directory, which was the actual root cause of agent confusion after migration. Use Path.is_relative_to() for robust path containment check. Salvaged from PR #8192 by opriz. Co-authored-by: opriz <opriz@users.noreply.github.com>	2026-04-12 00:33:54 -07:00
Teknium	1871227198	feat: rebrand OpenClaw references to Hermes during migration - Add rebrand_text() that replaces OpenClaw, Open Claw, Open-Claw, ClawdBot, and MoltBot with Hermes (case-insensitive, word-boundary) - Apply rebranding to memory entries (MEMORY.md, USER.md, daily memory) - Apply rebranding to SOUL.md and workspace instructions via new transform parameter on copy_file() - Fix moldbot -> moltbot typo across codebase (claw.py, migration script, docs, tests) - Add unit tests for rebrand_text and integration tests for memory and soul migration rebranding	2026-04-12 00:33:54 -07:00
Teknium	eb2a49f95a	fix: openai-codex and anthropic not appearing in /model picker for external credentials (#8224 ) Users whose credentials exist only in external files — OpenAI Codex OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials in ~/.claude/.credentials.json — would not see those providers in the /model picker, even though hermes auth and hermes model detected them. Root cause: list_authenticated_providers() only checked the raw Hermes auth store and env vars. External credential file fallbacks (Codex CLI import, Claude Code file discovery) were never triggered. Fix (three parts): 1. _seed_from_singletons() in credential_pool.py: openai-codex now imports from ~/.codex/auth.json when the Hermes auth store is empty, mirroring resolve_codex_runtime_credentials(). 2. list_authenticated_providers() in model_switch.py: auth store + pool checks now run for ALL providers (not just OAuth auth_type), catching providers like anthropic that support both API key and OAuth. 3. list_authenticated_providers(): direct check for anthropic external credential files (Claude Code, Hermes PKCE). The credential pool intentionally gates anthropic behind is_provider_explicitly_configured() to prevent auxiliary tasks from silently consuming tokens. The /model picker bypasses this gate since it is discovery-oriented.	2026-04-12 00:33:42 -07:00
Teknium	73f970fa4d	fix: make gateway interrupt detection resilient to monitor task failures The interrupt mechanism for regular text messages (non-commands) during active agent runs relied on a single async polling task (monitor_for_interrupt) with no error handling. If this task died silently due to an unhandled exception, stale adapter reference after reconnect, or any other failure, user messages sent during agent execution would be queued but never trigger an actual interrupt — the agent would continue running until it finished naturally, then process the queued message. Three improvements: 1. Error handling in monitor_for_interrupt(): wrap the polling body in try/except so transient errors are logged and retried instead of silently killing the task. 2. Fresh adapter reference on each poll iteration: re-resolve self.adapters.get(source.platform) every 200ms instead of capturing the adapter once at task creation time. This prevents stale references after adapter reconnects. 3. Backup interrupt check in the inactivity poll loop: both the unlimited and timeout-enabled paths now check for pending interrupts every 5 seconds (the existing poll interval). Uses a shared _interrupt_detected asyncio.Event to avoid double-firing when the primary monitor already handled the interrupt. Logs at INFO level with monitor task state for debugging.	2026-04-12 00:25:05 -07:00
Teknium	4cadfef8e3	fix(cli): restore stacked tool progress scrollback in TUI (#8201 ) The TUI transition (`4970705`, `f83e86d`) replaced stacked per-tool history lines with a single live-updating spinner widget. While the spinner provides a nice live timer, it removed the scrollback history that users relied on to see what the agent did during a session. This restores stacked tool progress lines in 'all' and 'new' modes by printing persistent scrollback lines via _cprint() when tools complete, in addition to the existing live spinner display. Behavior per mode: - off: no scrollback lines, no spinner (unchanged) - new: scrollback line on completion, skipping consecutive same-tool repeats - all: scrollback line on every tool completion - verbose: no scrollback (run_agent.py handles verbose output directly) Implementation: - Store function_args from tool.started events in _pending_tool_info - On tool.completed, pop stored args and format via get_cute_tool_message() - FIFO queue per function_name handles concurrent tool execution - 'new' mode tracks _last_scrollback_tool for dedup - State cleared at end of agent run Reported by community user Mr.D — the stacked history provides transparency into what the agent is doing, which builds trust. Addresses user report from Discord about lost tool call visibility.	2026-04-11 23:22:34 -07:00
Teknium	8e00b3a69e	fix(cron): steer model away from explicit deliver targets that lose topic context (#8187 ) Rewrite the cronjob tool's 'deliver' parameter description to strongly guide models toward omitting the parameter (which auto-detects origin including thread/topic). The previous description listed all platform names equally, inviting models to construct explicit targets like 'telegram:<chat_id>' which silently drops the thread_id. New description: - Leads with 'Omit this parameter' as the recommended path - Explicitly warns that platform:chat_id without :thread_id loses topics - Removes the long flat list of platform names that invited construction Also adds diagnostic logging at two key points: - _origin_from_env(): logs when thread_id is captured during job creation - _deliver_result(): warns when origin has thread_id but delivery target lost it; logs at debug when delivering to a specific thread Helps diagnose user-reported issue where cron responses from Telegram topics are delivered to the main chat instead of the originating topic.	2026-04-11 23:20:39 -07:00
Teknium	1ca9b19750	feat: add network.force_ipv4 config to fix IPv6 timeout issues (#8196 ) On servers with broken or unreachable IPv6, Python's socket.getaddrinfo returns AAAA records first. urllib/httpx/requests all try IPv6 connections first and hang for the full TCP timeout before falling back to IPv4. This affects web_extract, web_search, the OpenAI SDK, and all HTTP tools. Adds network.force_ipv4 config option (default: false) that monkey-patches socket.getaddrinfo to resolve as AF_INET when the caller didn't specify a family. Falls back to full resolution if no A record exists, so pure-IPv6 hosts still work. Applied early at all three entry points (CLI, gateway, cron scheduler) before any HTTP clients are created. Reported by user @29n — Chinese Ubuntu server with unreachable IPv6 causing timeouts on lobste.rs and other IPv6-enabled sites while Google/GitHub worked fine (IPv4-only resolution).	2026-04-11 23:12:11 -07:00
Teknium	1cec910b6a	fix: improve context compaction to prevent model answering stale questions (#8107 ) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.	2026-04-11 19:43:58 -07:00

... 3 4 5 6 7 ...

4173 commits