hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
Sofia Yang	103e11926f	feat(cli): show context compression count in status bar Display the number of context compressions in the CLI status bar when compressions > 0, helping users understand conversation compression pressure during long sessions. - Wide layout (>=76 cols): shows 'cmp N' between context percent and duration - Medium layout (52-75 cols): shows 'cmp N' between percent and duration - Narrow layout (<52 cols): omitted to save space - Color-coded: dim for 1-4, warn for 5-9, bad for 10+ - Hidden when zero to keep the bar clean for new sessions Closes #18564	2026-05-07 05:27:45 -07:00
Hermes Agent	e38ea38079	fix(credential_pool): resolve key mix-up when custom providers share base_url When multiple custom_providers share the same base_url but have different API keys, get_custom_provider_pool_key() always returned the first match, causing wrong-key unauthorized errors. Add provider_name parameter to prefer exact name matches over base_url-only matching, with fallback for backward compatibility. Fixes #19083	2026-05-07 05:27:41 -07:00
Teknium	3c8154e62c	chore: AUTHOR_MAP entry for @GinWU05	2026-05-07 05:26:28 -07:00
GinWU	6d9b30632d	fix(cli): honor positive tool preview length	2026-05-07 05:26:28 -07:00
Teknium	eef23354a5	chore: AUTHOR_MAP entry for @nouseman666	2026-05-07 05:24:43 -07:00
nouseman666	7cbef2bd42	fix(dashboard): route browser wheel into inner TUI scrolling	2026-05-07 05:24:43 -07:00
nouseman666	8aceef539f	fix(dashboard): let embedded chat use a single scroll system	2026-05-07 05:24:43 -07:00
nouseman666	a0758cd1e9	fix(dashboard): stabilize embedded chat resume and scrollback	2026-05-07 05:24:43 -07:00
Teknium	fdb9e0f6a6	fix(kanban): auto-block workers that exit without completing (#20894) (#21214) When a kanban worker subprocess exits rc=0 but its task is still in status='running', the agent almost certainly answered the task conversationally without calling kanban_complete or kanban_block. The dispatcher used to classify this as a generic crash and respawn, which loops forever on small local models (gemma4-e2b q4 etc.) that keep returning clean but unproductive output. Dispatcher changes: - The waitpid reap loop at the top of dispatch_once now records each reaped child's raw exit status in a bounded module registry (_recent_worker_exits, TTL 600s, size cap 4096). - _classify_worker_exit distinguishes clean_exit / nonzero_exit / signaled / unknown using os.WIFEXITED / WIFSIGNALED. - detect_crashed_workers consults the classification when a worker is found dead. clean_exit → protocol_violation event + immediate circuit-breaker trip (failure_limit=1). Everything else keeps the existing crashed-event + counter behavior. - DispatchResult.auto_blocked now includes protocol-violation trips. Gateway fix (Bug A in #20894): - gateway.run._notify_active_sessions_of_shutdown snapshots self.adapters with list(...) before iterating. adapter.send() can hit a fatal-error path that pops the adapter from the dict, which was raising 'RuntimeError: dictionary changed size during iteration' during shutdown. Regression tests: - test_detect_crashed_workers_protocol_violation_auto_blocks verifies rc=0 + still-running → status=blocked on first occurrence with protocol_violation + gave_up events and NO crashed event. - test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies non-zero exits keep the existing 2-strike behavior. Closes #20894.	2026-05-07 05:24:16 -07:00
jani	699c770e5c	docs(readme): drop misleading RL install-extras claim, defer to CONTRIBUTING README.md:163 said atroposlib and tinker were pulled in by .[all,dev], but .[all] does not include .[rl] — those dependencies live in pyproject.toml's [rl] extra (lines 95-101). With the original wording, a contributor running uv pip install -e ".[all,dev]" would not have atroposlib or tinker installed. Rather than swap one extra for another (which points users to either of two parallel install conventions — pip [rl] extra vs tinker-atropos submodule — without saying which the project considers canonical), this PR drops the specific install command from the README and links to CONTRIBUTING.md, which already documents the actual development setup.	2026-05-07 05:22:59 -07:00
Teknium	aa9a2091f6	chore(release): add AUTHOR_MAP entries for ggnnggez and ehz0ah Contributors to OpenViking local resource upload fix (#19569).	2026-05-07 05:21:50 -07:00
Hao Zhe	2b6345cee3	fix(memory): harden OpenViking local path uploads	2026-05-07 05:21:50 -07:00
Hao Zhe	187951ec6b	test(memory): harden OpenViking local upload coverage	2026-05-07 05:21:50 -07:00
nan	7137cccbd1	fix(memory): support OpenViking local resource uploads	2026-05-07 05:21:50 -07:00
0oAstro	abe5a3c937	fix(model_switch): live model discovery for custom_providers in /model picker custom_providers entries (section 4 of list_authenticated_providers) only read the static models: dict from config.yaml, ignoring the live /v1/models endpoint. This means gateways like Bifrost that expose hundreds of models only show the handful explicitly listed in config. Add live discovery via fetch_api_models() for custom_providers entries that have api_key + base_url, matching the existing behavior for user providers: entries (section 3). When the endpoint is reachable and returns models, the live list replaces the static subset. Fixes: /model picker showing only 9 models from a Bifrost gateway that actually exposes 581.	2026-05-07 05:21:26 -07:00
Teknium	4e27e4e05a	chore: AUTHOR_MAP entry for @leon7609	2026-05-07 05:20:10 -07:00
Teknium	e82f3b0c41	test: update send_message_tool mocks for force_document kwarg	2026-05-07 05:20:10 -07:00
leon7609	d34f03c32a	feat(gateway): support [[as_document]] directive for skill media routing Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB) currently lose quality in Telegram delivery because `_IMAGE_EXTS` membership routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ 1280px max edge. The original bytes only survive when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach for files whose extension is NOT in `_IMAGE_EXTS`. This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from the user-visible cleaned text in `extract_media` so it never leaks. Granularity is intentionally all-or-nothing per response, matching [[audio_as_voice]]'s scope. Skills that need fine control can split into two responses. Verified the targeted use case: info-graph emits 信息图已生成（...） [[as_document]] MEDIA:/tmp/info-graph-x/infographic.jpg → Telegram receives `infographic.jpg` via sendDocument, original 1MB JPEG bytes preserved, no recompression. Forwarding and download filenames stay clean (`infographic.jpg`). Tests: +3 cases in TestExtractMedia covering directive strip, isolation from voice flag, and coexistence with [[audio_as_voice]]. All 113 pre-existing media/extract/send tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 05:20:10 -07:00
Molvikar	8d363f8d54	fix(bedrock): preserve reasoningContent across converse normalization	2026-05-07 05:17:16 -07:00
Teknium	f0dd5b9c10	chore: add discodirector email to AUTHOR_MAP	2026-05-07 05:17:03 -07:00
badfriend	4f364c4e99	fix(mcp): give 'mcp add --command' a distinct argparse dest The --command flag of `hermes mcp add` shared its argparse dest with the top-level subparser (`dest="command"` in `hermes_cli/_parser.py`). When the flag was omitted, argparse still wrote `args.command = None`, clobbering the top-level value of `"mcp"`. The dispatcher then saw `args.command is None` and fell through to interactive chat, so `hermes mcp add ...` silently launched chat instead of registering the server. `cmd_mcp_add` was never reached. Use `dest="mcp_command"` on the flag and read it from `cmd_mcp_add`. The user-facing CLI flag `--command` is unchanged; only the in-memory namespace attribute moves. Also updates the `_make_args` helper in `tests/hermes_cli/test_mcp_config.py` to populate the new dest, and adds `tests/hermes_cli/test_mcp_add_command_dest.py` with a parser-level regression test. Closes #19785. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 05:17:03 -07:00
teknium1	333598cb0e	fix(gateway): cap cached session sources with LRU eviction Follow-up on top of Zyproth's session-source cache: swap the unbounded dict for an OrderedDict with a 512-entry LRU cap so long-running gateways can't accumulate stale entries for dead sessions forever. - self._session_sources is now an OrderedDict - _cache_session_source() move_to_end + popitem(last=False) above cap - _get_cached_session_source() move_to_end on hit (LRU read bump) - restart_test_helpers.py wires OrderedDict + _session_sources_max	2026-05-07 05:16:38 -07:00
Zyproth	176b93575a	fix(gateway): preserve thread routing from cached live session sources	2026-05-07 05:16:38 -07:00
Kailigithub	5bf12eb44a	fix: exclude hidden and archive dirs from _find_skill rglob	2026-05-07 05:15:28 -07:00
liuhao1024	69692039e9	fix(delegate): correct ACP docs — Claude Code CLI has no --acp flag The delegate_task tool schema descriptions referenced 'claude --acp --stdio' as an example, but Claude Code CLI does not support --acp or --stdio flags. The ACP subprocess transport (agent/copilot_acp_client.py) is specifically built for GitHub Copilot CLI ('copilot --acp --stdio'). Changes: - Per-task acp_command example: 'claude' → 'copilot' - Top-level acp_command description: remove 'Claude Code' reference, clarify requirement for ACP-compatible CLI (currently Copilot only) - acp_args description: remove misleading claude-opus-4-6 example Fixes #19055	2026-05-07 05:13:30 -07:00
Teknium	042eb930e2	fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194) `_save_auth_store`, `_save_qwen_cli_tokens`, and `_write_shared_nous_state` all created the temp file via `Path.open('w')` / `Path.write_text` and only tightened permissions to 0o600 afterward. Between create and chmod the file existed at the process umask (commonly 0o644 = world-readable on multi-user hosts), briefly exposing OAuth access/refresh tokens for Nous, Codex, Copilot, Claude, Qwen, Gemini, and every other native OAuth provider that flows through auth.json. Switch all three to `os.open(O_WRONLY|O_CREAT|O_EXCL, 0o600)` + `os.fdopen` + `fsync` so the file is atomic at 0o600 on creation. Tighten each parent directory (`~/.hermes/`, Qwen auth dir, Nous shared auth dir) to 0o700 so siblings can't traverse to the creds. `_save_auth_store` also gains a per-process random temp suffix to match `agent/google_oauth.py` (#19673) and `tools/mcp_oauth.py` (#21148). Adds `tests/hermes_cli/test_auth_toctou_file_modes.py` asserting final file mode 0o600 and parent dir mode 0o700 across all three writers, plus an explicit `os.open(flags, mode)` check on the main auth.json writer that would fail if anyone reintroduces the `Path.open('w')` pattern. POSIX-only (mode bits skipped on Windows).	2026-05-07 05:12:05 -07:00
Teknium	991df4ef81	chore: AUTHOR_MAP entry for @likejudy	2026-05-07 05:11:09 -07:00
Brian Su	8b32a9d0f1	feat: add Discord message deletion action	2026-05-07 05:11:09 -07:00
Teknium	fb1ce793e6	feat(security): enable secret redaction by default (#17691, #20785) (#21193) Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation).	2026-05-07 05:10:33 -07:00
Teknium	d856f4535d	chore: AUTHOR_MAP entry for chenlinfeng@ruije / @noOne-list	2026-05-07 05:10:04 -07:00
Teknium	ecaafe5f22	test(weixin): update timeout assertion for asyncio.wait_for migration	2026-05-07 05:10:04 -07:00
chenlinfeng	3a0d52d579	fix(weixin): replace all aiohttp ClientTimeout with asyncio.wait_for() aiohttp ClientTimeout uses BaseTimerContext which calls loop.call_later() internally. When invoked via asyncio.run_coroutine_threadsafe() from cron jobs, this triggers "Timeout context manager should be used inside a task" errors, causing message delivery failures. Replace all direct ClientTimeout usage with asyncio.wait_for(): - _upload_ciphertext: CDN upload (120s timeout) - _download_bytes: CDN download (configurable timeout) - _download_remote_media: remote media fetch (30s timeout) Also set total=None on _send_session to disable aiohttp built-in timeout, and change trust_env=True to False to bypass proxy for WeChat CDN connections.	2026-05-07 05:10:04 -07:00
teknium1	2e00bcaaab	fix(oauth,gateway): monotonic deadlines for polling/timeout loops Widen PR #20314's fix to the other timeout-polling sites in the codebase that share the same wall-clock-jump bug class. All of these measure elapsed timeout duration, not civil time, so they belong on time.monotonic(). - hermes_cli/auth.py: auth-store file-lock timeout, Spotify OAuth callback wait, Nous portal device-auth token poll. - hermes_cli/copilot_auth.py: Copilot OAuth device-flow token poll. - hermes_cli/gateway.py: gateway systemd restart wait. - hermes_cli/web_server.py: dashboard Codex device-auth user_code wait, dashboard Nous device-auth token poll. (sess["expires_at"] stays on time.time() — it's a persisted absolute timestamp, not a local deadline-polling variable.) - agent/copilot_acp_client.py: Copilot ACP JSON-RPC request timeout.	2026-05-07 05:09:39 -07:00
Zyproth	6e8f1e09a9	fix(gateway): use monotonic deadlines in QR onboarding flows	2026-05-07 05:09:39 -07:00
Teknium	73d6371762	chore: add AUTHOR_MAP entries for thelumiereguy and counterposition	2026-05-07 05:07:59 -07:00
thelumiereguy	8a96fa48c1	fix(gateway): avoid duplicated responses history	2026-05-07 05:07:59 -07:00
teknium1	429e78589b	refactor(auth): dedupe file-lock helper; document Nous lock order Extract the shared flock/msvcrt boilerplate from _auth_store_lock and _nous_shared_store_lock into a single _file_lock(lock_path, holder, timeout, message) helper. Each caller keeps its own threading.local holder so reentrancy state stays per-lock. Also document the lock-ordering invariant on both wrappers: _auth_store_lock is OUTER, _nous_shared_store_lock is INNER for all runtime refresh paths. The one exception is _try_import_shared_nous_state, which holds the shared lock alone across the full HTTP refresh+mint cycle to prevent concurrent sibling imports from racing on the single-use shared refresh token; that helper must not be called with the auth lock already held.	2026-05-07 05:07:06 -07:00
Michael Nguyen	a84e56d4c6	fix(auth): sync shared Nous refresh tokens	2026-05-07 05:07:06 -07:00
Teknium	38b1c7dce5	refactor(gateway): simplify auto-resume + extend to crash recovery Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape, wider coverage. Changes: - Drop the synthetic '[System note: ...]' in the internal MessageEvent. The existing _is_resume_pending branch in _handle_message_with_agent (run.py ~L13738) already injects a reason-aware recovery system note on the next turn. With kyan's text in place the model saw two stacked system notes. Now the event text is empty and the existing injection path owns the wording. - Drop SessionStore.list_resume_pending() as a new public method. The filter is 8 lines inline in _schedule_resume_pending_sessions() — one caller, no other pluggability need. - Add 'restart_interrupted' to the auto-resume reason set. That's the reason SessionStore.suspend_recently_active() stamps on sessions recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker). Previously those sessions had to wait for a real user message to auto-resume; now they continue automatically at startup like drain-timeout interruptions do. - Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so future reasons (e.g. 'manual_resume_request') can be opted in with one line. Test coverage added: - drain-timeout + crash-recovery both scheduled - stale entries skipped (outside freshness window) - suspended entries skipped (suspended > resume_pending) - originless entries skipped (no routing target) - disallowed reasons skipped (graceful forward-compat) E2E verified end-to-end with a real on-disk SessionStore: 2 eligible sessions scheduled, 2 ineligible skipped, empty-text internal events delivered to the adapter. Co-authored-by: Kevin Yan <kevyan1998@gmail.com>	2026-05-07 05:05:34 -07:00
Kevin Yan	961a3535fa	fix(gateway): preserve resume marker on interrupted restart	2026-05-07 05:05:34 -07:00
Kevin Yan	fad684b1f3	feat(gateway): auto-resume interrupted sessions after restart	2026-05-07 05:05:34 -07:00
Teknium	233bfd3621	chore(release): map mwnickerson noreply email	2026-05-07 05:05:20 -07:00
mwnickerson	411cfa26e3	fix: auto-block repeated kanban retries	2026-05-07 05:05:20 -07:00
Teknium	595e906698	chore(release): map sonic-netizen noreply email	2026-05-07 05:05:20 -07:00
Sonic Chang	b49a3f8474	fix(kanban): reap completed worker children in dispatch_once The gateway-embedded dispatcher (default since `kanban.dispatch_in_gateway = true`) is the parent of every spawned kanban worker. `_default_spawn` calls `subprocess.Popen(..., start_new_session=True)` and returns the pid — `start_new_session` detaches the controlling tty but does not reparent to init, so the gateway keeps each worker as a child until it `wait()`s for them. Nothing in the dispatch loop ever calls `waitpid`. Result: every completed worker becomes a `<defunct>` zombie that lingers until the gateway exits. We hit ~430 zombies on a single hermes-agent container after ~40 days of steady kanban traffic, approaching process-table exhaustion on the host. Fix: add a non-blocking reap loop at the top of `dispatch_once`, so every dispatcher tick (default 60s) drains zombies that accumulated since the last tick. WNOHANG keeps the call non-blocking; ChildProcessError means no children to reap. Why here, not a SIGCHLD handler: - signal.signal requires the main thread; gateway threading model makes that placement non-trivial. - Bounded staleness: at default interval=60s the maximum live zombie count is one tick's worth of worker completions. - No interaction with detect_crashed_workers: that function only inspects rows where status='running', and rows reach 'done' (and stop being inspected) before their workers exit.	2026-05-07 05:05:20 -07:00
LeonSGP43	06f24351c5	fix(kanban): stop reclaimed workers before retry	2026-05-07 05:05:20 -07:00
Teknium	63bd690a50	chore(release): map stephen0110 noreply email	2026-05-07 05:05:20 -07:00
stephen0110	40b51c93a2	fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at The kanban_heartbeat tool called heartbeat_worker but never heartbeat_claim, so a worker that loops the tool while a single tool call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got reclaimed by release_stale_claims. The function name and heartbeat_claim's own docstring imply otherwise: "Workers that know they'll exceed 15 minutes should call this every few minutes to keep ownership." But there was no caller in the worker tool path. Workers couldn't invoke heartbeat_claim themselves either — it isn't exposed as a tool. Fix: _handle_heartbeat now calls heartbeat_claim first, reading HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins this in _default_spawn). Falls back to _claimer_id() for locally-driven workers that didn't go through dispatcher spawn. Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires rewinds claim_expires into the past, calls the tool, and asserts the new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to fail against the unfixed code (claim_expires stays at the rewound value). Closes the root cause underlying the symptom in #21141 (15-min respawns of long-running workers). #21141 separately addresses post-reclaim cleanup; this fixes the upstream "shouldn't have been reclaimed in the first place" half.	2026-05-07 05:05:20 -07:00
Teknium	bf843adf05	feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress) is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still working...' notices, and status-callback messages after the final response is delivered successfully. Currently effective on adapters that implement delete_message (Telegram); silently no-ops elsewhere. Off by default. Failed runs skip cleanup so bubbles stay as breadcrumbs. Minimal plumbing: base.py's existing post_delivery_callback slot now chains new registrations onto any existing callback (with per-callback exception isolation) rather than clobbering. Stale-generation registrations are rejected so they can't step on a fresher run's callbacks. This lets the cleanup callback coexist with the background-review release hook already registered on the same slot. Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>	2026-05-07 05:04:37 -07:00
ambition0802	7c0766e06a	fix(gateway): translate inbound document host paths to container paths for Docker backend When terminal.backend is docker, inbound documents uploaded via messaging platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host path under ~/.hermes/cache/documents, but the container sandbox only sees them at the auto-mounted /root/.hermes/cache/documents path. This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the natural sibling to get_cache_directory_mounts()) and calls it at the document-context-injection site in gateway/run.py so the agent always receives a path it can open directly, matching the mount layout already established by get_cache_directory_mounts() (#4846). Scope: only Docker backend for now; other backends use different mount semantics and are left unchanged until verified. Fixes #18787	2026-05-07 05:02:26 -07:00
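
Note on 042eb930e2 (TOCTOU fix): the commit message describes creating the credential temp file with os.open at mode 0o600, then os.fdopen and fsync, instead of creating it first and tightening permissions afterward. The sketch below illustrates that general pattern only; it is not the project's code. The helper name write_private_file, the temp-suffix scheme, and the final os.replace step are assumptions made for illustration.

```python
import os
import secrets
from pathlib import Path


def write_private_file(path: Path, data: str) -> None:
    """Write data to path without ever exposing it at looser-than-0o600 permissions.

    Hypothetical helper for illustration; not taken from hermes_cli/auth.py.
    """
    # A random suffix keeps concurrent writers from colliding on the temp name.
    tmp = path.with_name(f"{path.name}.{secrets.token_hex(8)}.tmp")
    # O_EXCL fails if the temp name already exists; the 0o600 mode is applied
    # at creation time, so there is no window between create and chmod.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp, path)  # atomic swap when both names share a filesystem
    except BaseException:
        # Best-effort cleanup so a failed write does not leave a stray temp file.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

The umask can only clear bits from the requested mode, so the file can never be created with group or world access under this approach. As the commit notes, mode bits are a POSIX concept and are not enforced the same way on Windows.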

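Note on 333598cb0e (LRU cap on cached session sources): the commit message describes an OrderedDict with a 512-entry cap, using move_to_end on writes and cache hits and popitem(last=False) once the cap is exceeded. A minimal sketch of that technique follows. The class name BoundedLRU and its put/get methods are hypothetical and do not mirror the gateway's _cache_session_source / _get_cached_session_source helpers.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class BoundedLRU(Generic[K, V]):
    """Small LRU map: least recently used entries are evicted above max_entries."""

    def __init__(self, max_entries: int = 512) -> None:
        self._max = max_entries
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def put(self, key: K, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)  # newest entries live at the end
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the oldest entry

    def get(self, key: K) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def __len__(self) -> int:
        return len(self._data)
```

Bumping an entry on read is what makes this LRU rather than FIFO: a session that is still being looked up stays cached even if it was inserted long ago.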

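Note on 2e00bcaaab and 6e8f1e09a9 (monotonic deadlines): both commits move timeout-polling loops onto time.monotonic() because they measure elapsed duration rather than civil time, so a wall-clock jump should not shorten or extend the wait. The sketch below shows the general shape of such a loop. The function name poll_until and its parameters are assumptions for illustration, not names from the codebase.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    timeout: float,
    interval: float = 0.05,
) -> Optional[T]:
    """Call check() until it returns a non-None value or timeout seconds elapse.

    Hypothetical helper: the deadline is computed from time.monotonic(), which
    is unaffected by system clock adjustments.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A persisted absolute timestamp, such as the sess["expires_at"] value the commit message mentions, is a different case and stays on time.time(), since monotonic readings are only meaningful within a single running process.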
7510 commits