hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-01 01:51:44 +00:00

Author	SHA1	Message	Date
Teknium	398945e7b1	fix(cron): accept list-form deliver values so deliver=['telegram'] works (#17456 ) The cron schema contracts deliver as a string ("local", "origin", "telegram", "telegram:chat_id[:thread_id]", or comma-separated combos), but MCP clients and scripts sometimes pass an array like ['telegram']. Before this change, the list was written to jobs.json verbatim, and the scheduler's str(deliver).split(',') then tried to resolve the literal string "['telegram']" as a platform — returning None and logging 'no delivery target resolved for deliver=[\'telegram\']'. Fix on both ends: - tools/cronjob_tools.py: normalize deliver at the API boundary on create and update, so storage is always a string. - cron/scheduler.py: normalize deliver in _resolve_delivery_targets, so existing jobs.json entries with list-form deliver are handled gracefully without requiring users to edit the file. Closes #17139	2026-04-29 06:35:34 -07:00
Nicolò Boschi	0565497dcc	fix(hindsight): drain retain queue cleanly on shutdown The plugin used to spawn one daemon thread per sync_turn() to do the aretain_batch network write. On CLI exit, that pattern raced interpreter shutdown — the last retain could reach aiohttp after asyncio's "cannot schedule new futures" guard had fired, producing noisy logs and silently losing the final unsaved turn: WARNING ... Hindsight sync failed: cannot schedule new futures after interpreter shutdown ERROR asyncio: Unclosed client session client_session: <aiohttp.client.ClientSession object at 0x...> Switch to a single-writer model: each provider owns one long-lived writer thread plus a queue. sync_turn() snapshots state and enqueues a job; the writer drains sequentially. Once shutdown() is called: - new sync_turn() / queue_prefetch() calls are dropped, not enqueued - a sentinel wakes the writer so it finishes in-flight work - shutdown joins the writer (10s) before nulling the client Also register an idempotent atexit hook from the first sync_turn(), so exit paths that don't go through MemoryManager.shutdown_all() (Ctrl-C, abrupt exit) still get a chance to drain. Tests: keep _sync_thread as a legacy alias to the writer, swap join() calls to _retain_queue.join() (canonical wait-for-drain), add a new TestShutdownRace suite covering single-writer reuse, post-shutdown drop, queue draining, and shutdown idempotency.	2026-04-29 06:34:24 -07:00
briandevans	835f9adec0	fix(update,test): clarify wmic comment; switch tests to monkeypatch sys.platform Two fix-ups for #17123: 1. Reword the inline comment in `_warn_stale_dashboard_processes` to accurately describe the failure mode (locale-dependent decoder, not a "default UTF-8 decoder") and identify `errors="ignore"` as the load-bearing protection. Per Copilot's review. 2. Switch `TestWindowsWmicEncoding` from `patch("hermes_cli.main.sys")` to `monkeypatch.setattr(sys, "platform", "win32")` — the codebase's canonical pattern (e.g. `tests/hermes_cli/test_auth_ssl_macos.py`). The MagicMock-replacement approach passed locally on Python 3.12 but the platform-equality check failed under CI's xdist+Python 3.11, leaving both new tests red despite the fix being present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:34:13 -07:00
briandevans	b85fff9495	fix(update): protect dashboard wmic scan against UnicodeDecodeError on Windows non-UTF-8 locales (#17049 ) `hermes update` calls `_warn_stale_dashboard_processes()` to warn about dashboard processes still running the pre-update Python backend. On Windows, that scan shells out to `wmic process get ProcessId,CommandLine /FORMAT:LIST` with `text=True` and no explicit encoding. `wmic` emits text in the system code page (e.g. cp936 on zh-CN locales), not UTF-8. Without an explicit `encoding=`, Python's default UTF-8 decoder crashes the subprocess reader thread with `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 ...`. In Python 3.11 that crash is silently absorbed: `subprocess.run()` returns a `CompletedProcess` with `result.stdout = None`, the next line calls `result.stdout.split("\n")`, and `hermes update` aborts with the exact `AttributeError: 'NoneType' object has no attribute 'split'` trace reported in #17049. Fix: pass `encoding="utf-8", errors="ignore"` so undecodable bytes cannot take down the reader thread (the parsing only matches the ASCII prefixes `CommandLine=` and `ProcessId=`, so dropping non-UTF-8 bytes is safe), and short-circuit when `result.stdout is None` as a defensive guard for environments where the reader thread still fails for other reasons. This is the same root cause as #17074 (which patches `hermes_cli/gateway._scan_gateway_pids` for the `hermes setup` path). That PR does not touch `_warn_stale_dashboard_processes`, so `hermes update` remains broken on the same locales until this lands. Regression test in `tests/hermes_cli/test_update_stale_dashboard.py`: - `test_wmic_invoked_with_utf8_ignore_errors` asserts the explicit encoding/errors kwargs reach `subprocess.run`. - `test_wmic_returns_none_stdout_does_not_crash` simulates the reader-thread-crashed `result.stdout=None` aftermath and asserts the function returns silently instead of raising AttributeError. Both new tests fail against clean origin/main (`7d4648461`) reproducing the original AttributeError; both pass with this patch. The remaining 3 failures in `tests/hermes_cli/test_cmd_update.py` and `test_update_autostash.py` are pre-existing baselines on origin/main — they reproduce identically without this change and are unrelated to the wmic scan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:34:13 -07:00
teknium1	9e63062b6c	fix(stt): resolve API keys from ~/.hermes/.env via get_env_value (#17140 ) Widen #17163 to the sibling file tools/transcription_tools.py, which had the same class of bug. STT provider call sites and the _get_provider selection gate called os.getenv(...) directly and missed keys that only lived in ~/.hermes/.env. Same pattern as tts_tool.py: one guarded top-level import of get_env_value (falls back to os.getenv on ImportError), then every API-key and paired-base-URL lookup swapped over. Call sites migrated: - _transcribe_groq — GROQ_API_KEY - _transcribe_mistral — MISTRAL_API_KEY - _transcribe_xai — XAI_API_KEY, XAI_STT_BASE_URL - _get_provider — GROQ/MISTRAL/XAI_API_KEY in explicit + auto branches Module-level defaults (DEFAULT_STT_MODEL, GROQ_BASE_URL, etc.) stay on os.getenv — they're import-time constants, not runtime config, and the dotenv fallback would add no value there. New regression tests in tests/tools/test_transcription_dotenv_fallback.py (8 cases) mirror briandevans' TTS tests: per-provider dotenv-key forwarding, selection-gate dotenv visibility, and an end-to-end probe that patches hermes_cli.config.load_env to simulate ~/.hermes/.env carrying the key while os.environ does not.	2026-04-29 06:25:20 -07:00
briandevans	33967b4e52	fix(tts): tolerate missing hermes_cli.config in tts_tool import Wrap the new top-level `from hermes_cli.config import get_env_value` in try/except ImportError and fall back to a thin os.getenv shim, so importing tools.tts_tool keeps working in environments where hermes_cli.config is unavailable. This matches the existing tolerance in `_load_tts_config()` (tools/tts_tool.py) and the same import-fallback pattern in tools/tool_backend_helpers.py::fal_key_is_configured. Also update the TestDotenvFallbackPerProvider docstring to accurately describe the mocking strategy: per-provider tests patch `tools.tts_tool.get_env_value` directly, while the regression-guard tests cover the lower-level `hermes_cli.config.load_env` integration. Addresses Copilot review on #17163. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:25:20 -07:00
briandevans	40d25e125b	fix(tts): resolve API keys from ~/.hermes/.env via get_env_value (#17140 ) TTS provider tools (elevenlabs, xai, minimax, mistral, gemini) called os.getenv("X_API_KEY") directly, which bypassed Hermes's dotenv bridge in hermes_cli.config. Users who keep their TTS keys only in ~/.hermes/.env saw "X_API_KEY not set" errors even though the rest of the stack (agent/credential_pool, hermes_cli/auth) already resolves keys through get_env_value() — same class of bug as #15914 fixed for those modules. Switch every TTS env-var lookup (API keys, base URLs, and check_tts_requirements gates) to get_env_value, which checks os.environ first and then ~/.hermes/.env. Behaviour for users with keys exported in the shell is unchanged; users with dotenv-only keys now succeed. The two diagnostics prints in __main__ are migrated for consistency. Regression test (tests/tools/test_tts_dotenv_fallback.py): - per-provider: each backend reads the dotenv key when only ~/.hermes/.env carries it (5 providers). - end-to-end: with hermes_cli.config.load_env returning the key and os.environ empty, _generate_minimax_tts and check_tts_requirements both succeed; reverting tools/tts_tool.py back to os.getenv makes all 7 tests fail with "MINIMAX_API_KEY not set" / similar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 06:25:20 -07:00
Teknium	ff687c019e	fix(aux): skip kimi-coding in vision auto-detect (closes #17076 ) (#17451 ) * docs(anthropic): correct OAuth scope to Max plan + extra usage credits only The previous docs pass (#17399) overstated what Anthropic OAuth works with. In practice Hermes can only route against a Claude Max plan that has purchased extra usage credits — the base Max allowance is not consumed, and Claude Pro is not supported at all. Without Max + extra credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token). Updates the four pages touched in #17399: - integrations/providers.md - user-guide/features/credential-pools.md - reference/environment-variables.md - getting-started/quickstart.md * fix(aux): skip kimi-coding in vision auto-detect (closes #17076) Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no image_in capability — Kimi's own docs confirm and suggest switching to a vision-capable model. Vision lives on the separate Kimi Platform (api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has kimi-coding as main provider and auxiliary.vision.provider=auto, resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient wrapped around /coding which 404'd on every vision request. Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn}) and gate the main-provider vision branch on membership. On a skip the auto-detect falls through to OpenRouter → Nous like any other main-provider-unavailable case. Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are unaffected — the skip only applies when the caller is in auto mode. Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering the skip path, CN variant, explicit-override passthrough, and a guard against accidental skip-list widening.	2026-04-29 06:10:23 -07:00
teknium1	258755a24f	test(weixin): cover _is_stale_session_ret helper (#17228 ) Regression test for the ret=-2 / errmsg='unknown error' disambiguation: - ret=-2 or errcode=-2 with 'unknown error' → stale session (True) - ret=-2 with 'freq limit' or other errmsg → rate limit (False) - ret=-14 → not matched here (handled by SESSION_EXPIRED_ERRCODE path) - Success codes and missing errmsg → False	2026-04-29 05:44:44 -07:00
teknium1	b0435cc164	fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback _run_async() bridges sync tool handlers to async code. When the handler is invoked from inside a running event loop (gateway / nested async), it spawns a worker thread and blocks on future.result(timeout=300). Before this change, a coroutine that ran past 300s leaked its worker thread: - future.cancel() is a no-op on a running ThreadPoolExecutor future (cancel only works on not-yet-started work). - pool.shutdown(wait=False, cancel_futures=True) let the caller proceed but the worker kept running the coroutine until it returned on its own. Every tool timeout leaked one thread. In long-lived gateway / RL sessions this is cumulative. The fix replaces bare asyncio.run() with a worker wrapper that creates its own event loop. On timeout, _run_async schedules task.cancel() on that loop via call_soon_threadsafe, then shuts the pool down with wait=False so the caller returns immediately. The coroutine observes CancelledError at its next await and the worker thread exits cleanly. Also switches logger.error() to logger.exception() in the top-level handle_function_call() except block so tool failures produce full stack traces in errors.log instead of just the message. Related: #17420 (contributor flagged the leak; the original fix used pool.shutdown(wait=True) which would have converted the leak into a hang — caller blocks forever on the same stuck coroutine). Credit for identifying the leak goes to the contributor. Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>	2026-04-29 05:00:40 -07:00
tmimmanuel	3606414ec7	fix(gateway): isolate platform connect failures with per-platform timeout Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging during startup or reconnect cannot block the others. Telegram's 8-retry connect loop (~140s worst case) previously prevented Feishu from ever starting when Telegram was network-restricted — common for users in regions where Telegram is blocked. Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT (0 disables). Applied to both startup and the reconnect watcher so a platform that hangs mid-retry also does not stall retries for others. Fixes #17242	2026-04-29 05:00:37 -07:00
Teknium	20b759cd02	fix(process): reconcile session.exited against real child exit in poll/wait (#17430 ) When a background terminal process spawns a descendant daemon that inherits the stdout pipe (e.g. 'hermes update' triggering a gateway systemctl restart), the reader thread's stdout.read() never returns EOF and its finally: block never runs. session.exited stays False forever, so process(action='poll') returns 'running' indefinitely even though the direct child exited long ago. Issue #17327: Feishu user polled 74 times over 7 minutes before killing the gateway manually. Fix: add _reconcile_local_exit() that checks the direct Popen.poll() before trusting session.exited. If the direct child has exited, drain any immediately-readable bytes non-blocking and flip session.exited. Called from poll() and wait(). The stuck reader thread remains blocked but is a daemon thread and gets reaped with the process. Safe no-op for env/PTY sessions, already-exited sessions, and live children (returns None from Popen.poll()).	2026-04-29 04:59:21 -07:00
Teknium	13683c0842	feat(memory): notify providers on mid-process session_id rotation (#17409 ) Fixes #6672 Memory providers now receive on_session_switch() whenever AIAgent.session_id rotates mid-process — /resume, /branch, /reset, /new, and context compression. Before this, providers that cached per-session state in initialize() (Hindsight's _session_id, _document_id, accumulated _session_turns, _turn_counter) kept writing into the old session's record after the agent had moved on. MemoryProvider ABC ------------------ - New optional hook on_session_switch(new_session_id, , parent_session_id='', reset=False, *kwargs) with no-op default for backward compat. reset=True signals /reset or /new — providers should flush accumulated per-session buffers. reset=False for /resume, /branch, compression where the logical conversation continues. MemoryManager ------------- - on_session_switch() fans the hook out to every registered provider. Isolated try/except per provider — one bad provider can't block others. - Empty/None new_session_id is a no-op to avoid corrupting provider state during shutdown paths. run_agent.py ------------ - _sync_external_memory_for_turn now passes session_id=self.session_id into sync_all() and queue_prefetch_all(). Providers with defensive session_id updates in sync_turn (Hindsight already had this at plugins/memory/hindsight/__init__.py:1199) now actually receive the current id. - Compression block at ~L8884 already notified the context engine of the rollover; now also calls _memory_manager.on_session_switch(reason='compression'). cli.py ------ - new_session() fires reset=True, reason='new_session' so providers flush buffers. - _handle_resume_command fires reset=False, reason='resume' with the previous session as parent_session_id. - _handle_branch_command fires reset=False, reason='branch' with the parent session_id already captured for the DB parent link. gateway/run.py -------------- - _handle_resume_command now evicts the cached AIAgent, mirroring /branch and /reset. The next message rebuilds a fresh agent whose memory provider initialize() runs with the correct session_id — matches the pattern the gateway already uses for provider state cross-session transitions. Hindsight reference implementation ---------------------------------- - plugins/memory/hindsight/__init__.py adds on_session_switch that: updates _session_id, mints a fresh _document_id (prevents vectorize-io/hindsight#1303 overwrite), and clears _session_turns / _turn_counter / _turn_index so in-flight batches don't flush under the new document id. parent_session_id only overwritten when provided (avoids clobbering on a bare switch). Tests ----- - tests/agent/test_memory_session_switch.py: new dedicated file. ABC default no-op, manager fan-out, failure isolation, empty-id no-op, session_id propagation through sync_all/queue_prefetch_all, Hindsight state transitions for every reset/non-reset case, parent preservation. - tests/cli/test_branch_command.py: new test verifying /branch fires the hook with correct parent_session_id + reset=False + reason. - tests/gateway/test_resume_command.py: new test verifying /resume evicts the cached agent. - tests/run_agent/test_memory_sync_interrupted.py: updated existing assertions to account for the session_id kwarg on sync_all and queue_prefetch_all. E2E verified (real imports, tmp HERMES_HOME): - /resume: session_id updates, doc_id fresh, buffers cleared, parent set - /branch: session_id forks, parent links to original - /new: reset=True clears accumulated state - compression: reason='compression' propagated, lineage preserved - Empty id: no-op, state preserved - Legacy provider without on_session_switch: no crash Reported by @nicoloboschi (Hindsight maintainer); related scope-widening comment by @kidonng extending coverage to compression.	2026-04-29 04:57:22 -07:00
Rylen Anil	37d107e03d	[verified] fix(gateway): accept user systemd private socket during preflight	2026-04-29 04:57:01 -07:00
Teknium	df0e97a168	fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425 ) MiniMax's /anthropic endpoint documents cache_control support (0.1x read pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated third-party Anthropic-wire caching on 'claude' in model name, which left MiniMax's own model family re-paying full input tokens every turn. Opt in explicitly via provider id (minimax / minimax-cn) or host match (api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the existing Qwen/Alibaba branch below; leaves room for a capability-based surface (ProviderConfig.supports_anthropic_cache) if a third provider needs it. Closes #17332	2026-04-29 04:56:55 -07:00
exiao	23f5fc6765	feat(gateway/signal): native formatting, reply quotes, and reactions Three Signal adapter improvements that depend on the no-edit-mode plumbing from the previous commit. 1. Native formatting (markdown -> Signal bodyRanges) Signal renders markdown as literal characters (bold, `code`, # heading), which looks broken. Added _markdown_to_signal(text) that strips markdown syntax and emits Signal-native bodyRanges as start:length:STYLE entries. Offsets are computed in UTF-16 code units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE, MONO, and headings mapped to BOLD. Fenced code and inline code are handled; link syntax is unwrapped to visible text + URL. Includes edge-case fixes reported previously: - Bullet lists ("* item") no longer misidentified as italics - URLs containing underscores no longer italicized around the dot 2. Reply-quote context Parses dataMessage.quote on inbound messages and populates MessageEvent.raw_message with sender + timestamp_ms. This lets the gateway's existing [Replying to: "..."] injector (gateway/run.py) work on Signal, matching Telegram/Matrix behavior. 3. Processing reactions Overrides on_processing_start -> hourglass and on_processing_complete -> checkmark via the sendReaction JSON-RPC using targetAuthor and targetTimestamp pulled from raw_message. Uses the ProcessingOutcome enum introduced in the previous commit. Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the no-edit streaming path activates. Tests: 40+ new tests in tests/gateway/test_signal_format.py covering markdown conversion, UTF-16 offset correctness with non-BMP emoji, bullet-list and URL false-positive regressions, reply-quote extraction, and reaction payload shape. Regression extensions to test_signal.py.	2026-04-29 04:38:17 -07:00
Teknium	21676e80cc	Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957 )" (#17397 ) This reverts commit `023f5c74b1`.	2026-04-29 03:55:03 -07:00
Ben Barclay	58a6171bfb	Merge pull request #17305 from NousResearch/feat/docker-run-as-host-user feat(docker): run container as host user to avoid root-owned bind mounts	2026-04-29 16:41:55 +10:00
Teknium	bc0d8a941e	feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307 ) Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files: - `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any) - `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`. Internals: - `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity - Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other - Report path stamped into `.curator_state` as `last_report_path` - `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run Tests (all green): - 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs - Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.	2026-04-28 23:23:11 -07:00
Ben	5531c0df82	feat(docker): run container as host user to avoid root-owned bind mounts Add opt-in terminal.docker_run_as_host_user config flag that passes --user $(id -u):$(id -g) to the Docker backend so files written into bind-mounted directories (/workspace, /root, docker_volumes entries) are owned by the host user instead of root. When enabled on POSIX platforms, also drops SETUID/SETGID caps since the container no longer needs gosu/su to switch users. Falls back cleanly on platforms without os.getuid (e.g. native Windows Docker) with a warning. Wired through all three config.yaml -> TERMINAL_* env-var bridges: - cli.py env_mappings (CLI + TUI startup) - gateway/run.py _terminal_env_map (gateway / messaging platforms) - hermes_cli/config.py _config_to_env_sync (`hermes config set`) Also fixes docker_mount_cwd_to_workspace silently failing in gateway mode -- it was missing from gateway/run.py's _terminal_env_map. Adds tests/tools/test_terminal_config_env_sync.py to guard against future drift between the three bridges (same bug class shipped twice in one month). Bundled Hermes image won't work with this flag since its entrypoint expects to start as root for the usermod/gosu hermes flow; works with the default nikolaik/python-nodejs image and plain Debian/Ubuntu.	2026-04-29 16:16:43 +10:00
brooklyn!	5e68503d2f	Merge pull request #17190 from NousResearch/bb/tui-cold-start-profiling perf(tui): cut visible cold start ~57% with lazy agent init	2026-04-28 22:45:14 -07:00
teknium1	fa9383d27b	feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: Umbrella-first framing. The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. Three concrete consolidation methods. (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). Two observed bailouts pre-empted: 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). Config-aware parent inheritance. _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. Unbounded iteration ceiling. max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. Tests updated: test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. Live validation on author's setup: - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.	2026-04-28 22:33:33 -07:00
Teknium	a12f7aa8bb	fix(curator): default cycle is every 7 days, not 24 hours Weekly is closer to how skill churn actually works — most agent-created skills don't change multiple times per day, so a daily review is pure cost without benefit. Bumping the default to 7 days reduces aux-model spend while still catching drift and staleness on the timescales that matter (30d stale, 90d archive). Changes: - DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days) - config.yaml default: interval_hours: 24 -> 24 * 7 - CLI status line renders as '7d' when interval is a whole-day multiple - Test `test_old_run_eligible` decoupled from the exact default: it now uses 2 * get_interval_hours() so future tweaks don't break it	2026-04-28 22:33:33 -07:00
Teknium	0d31864e3b	fix(curator): defense-in-depth gates against bundled/hub skills Previous invariants only gated the primary entry points (apply_automatic_transitions, archive_skill, CLI pin). Several paths were unprotected: - bump_view / bump_use / bump_patch / set_state / set_pinned wrote usage records unconditionally, which is confusing noise in .usage.json even though the review list filtered them out - restore_skill did not check whether a bundled skill now shadows the archived name - CLI unpin was asymmetric with CLI pin — it had no gate Fixes: - _mutate() (the shared counter / state writer) now drops silently when the skill is not agent-created. .usage.json never gains a record for a bundled or hub-installed skill. - restore_skill() refuses to restore under a name that is now bundled or hub-installed (would shadow upstream). - CLI unpin gate matches CLI pin. New tests: - 5 provenance-guard tests on skill_usage (one per mutator) - 1 end-to-end test that hammers every mutator at a bundled skill and a hub skill, asserts both are untouched on disk, and asserts the sidecar stays clean - 2 CLI tests proving pin/unpin refuse bundled skills symmetrically 64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).	2026-04-28 22:33:33 -07:00
Teknium	c8b7e7268a	refactor(curator): point review prompt at existing tools The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill` tools that are not registered as model tools. Swap the prompt to rely on the real surface: - skill_manage action=patch — for patching and consolidation - terminal — to `mv` skill dirs into .archive/ Also drop `pin` from the model's decision list — pinning is a user opt-out for `hermes curator pin <skill>`, not something the model should do autonomously. Decision list is now: keep / patch / consolidate / archive. Tests updated: prompt-invariant test now asserts the existing tools are referenced and that bespoke tool names do NOT appear. New test prevents `pin` from being re-added as a model decision.	2026-04-28 22:33:33 -07:00
Teknium	bc79e227e6	feat(curator): background skill maintenance (issue #7816 ) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched files' neighbors green - 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.	2026-04-28 22:33:33 -07:00
Lyle Lengyel	80e474f11f	fix(gateway,terminal): expand shell tilde in terminal.cwd before subprocess Commit `3c42064e` made config.yaml the single source of truth for TERMINAL_CWD, but the config bridge passes cwd values verbatim to os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the literal string '~/'' reaches subprocess.Popen, which the kernel rejects because it does not expand shell tilde syntax. This patch adds three defensive layers: 1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD is always an absolute path. 2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in _get_env_config(), guarding against stale or manually-set env vars. 3. tools/environments/local.py — expanduser in LocalEnvironment before passing cwd to subprocess.Popen, the final safety net. Includes regression tests in test_config_cwd_bridge.py for nested terminal.cwd, top-level cwd alias, and precedence ordering. Refs: `3c42064e`	2026-04-28 22:26:09 -07:00
JackJin	88e07c42b4	fix(cli): prevent .env sanitizer from splitting GLM_API_KEY by LM_API_KEY suffix The known-key splitter in `_sanitize_env_lines` used substring matching to find concatenated KEY=VALUE pairs. When a registered key was a suffix of another (LM_API_KEY is a suffix of GLM_API_KEY), the shorter key's needle would match inside the longer one, causing the sanitizer to rewrite `GLM_API_KEY=...` as `G\nLM_API_KEY=...` and silently break Z.AI/GLM auth (and similarly `GLM_BASE_URL` -> `G\nLM_BASE_URL`). Drop matches whose needle range is fully contained within a longer overlapping match. Two regression tests cover the suffix-collision case and confirm a real concatenation that happens to start with the longer key still splits where it should. Fixes #17138	2026-04-28 22:22:45 -07:00
Brooklyn Nicholson	4858e26eaa	feat(tui): port classic CLI /reload (.env hot-reload) to TUI Classic CLI exposes ``/reload`` (re-reads ~/.hermes/.env into ``os.environ`` via ``hermes_cli.config.reload_env``) so newly added API keys take effect without restarting the session. The TUI was missing the parity command, so users had to Ctrl+C out and ``hermes --tui`` again whenever they added or rotated a credential. Three small wires: * New ``reload.env`` JSON-RPC method in ``tui_gateway/server.py`` that delegates to ``hermes_cli.config.reload_env`` and returns the count of vars updated. * New ``/reload`` slash command in ``ui-tui/src/app/slash/commands/ops.ts`` matching the existing ``/reload-mcp`` pattern (native RPC, no slash worker). * Drop ``cli_only=True`` from the ``reload`` ``CommandDef`` in ``hermes_cli/commands.py`` so help/menus surface it in the TUI too. ``reload_env`` itself is environment-agnostic. Same caveat as classic CLI: the currently constructed agent's credential pool / provider routing does not auto-rebuild. Users who want a brand-new credential resolution should follow with ``/new``. Tests: * New ``test_reload_env_rpc_calls_hermes_cli_reload_env`` confirms RPC delegates and reports the count. * New ``test_reload_env_rpc_surfaces_errors`` confirms exceptions are rendered as JSON-RPC errors. * ``createSlashHandler.test.ts`` slash-parity matrix extended with ``['/reload', 'reload.env', {}]`` so we can't regress the routing. Validation: scripts/run_tests.sh tests/test_tui_gateway_server.py — 92/92. scripts/run_tests.sh tests/hermes_cli/test_commands.py — 128/128. cd ui-tui && npm run type-check — clean; npm test --run — 390/390.	2026-04-28 22:22:30 -07:00
Teknium	dcd7b717f8	fix(gateway): linearize tool-progress bubbles with content messages (#17280 ) After PR #7885 (`97b0cd51e`) added content-side segment breaks for natural mid-turn assistant messages, the tool-progress task in gateway/run.py was not updated to match. progress_msg_id and progress_lines persisted for the whole run, so after a tool batch produced bubble B1 followed by content bubble C1, the next tool.started kept editing the OLD bubble B1 above C1 — making the chat appear out of order on Telegram, Discord, and Slack. Add on_new_message callback to GatewayStreamConsumer, fired at the four sites where a fresh content bubble lands on the platform: - _send_or_edit first-send branch (NOT edits) - _send_commentary - _send_new_chunk (overflow split) - each successful chunk of _send_fallback_final Gateway supplies a lambda that enqueues ('__reset__',) into the progress_queue. send_progress_messages() handles the marker in both the main loop and the CancelledError drain path, clearing progress_msg_id, progress_lines, and the dedup state so the next tool.started opens a fresh bubble below the new content. Result: each tool batch appears in chronological order below the preceding content. When no content appears between tool batches, tools still group in one bubble (CLI-style compactness). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 22:17:33 -07:00
Tranquil-Flow	ac855bba0e	fix(cli): respect terminal.cwd config in local terminal backend init_session() runs a login shell bootstrap that sources profile scripts (.bashrc, .bash_profile, etc.) before capturing pwd. If any profile script changes the working directory, the captured cwd overwrites the configured terminal.cwd value — so terminal commands run in the wrong directory despite the TUI banner showing the configured path. Add an explicit 'builtin cd' to the configured cwd in the bootstrap script, after profile sourcing but before pwd capture, ensuring the configured terminal.cwd is always what gets recorded. Fixes #14044	2026-04-28 22:16:08 -07:00
Brooklyn Nicholson	f95c34f415	fix(browser): address Copilot round-4 on /browser connect * Reject unsupported schemes (anything outside http/https/ws/wss) in cli.py /browser connect before probing or persisting, matching the gateway's existing 4015 path. * Defend gateway browser.manage against `{"url": null}` and non-string urls: empty/null falls back to DEFAULT_BROWSER_CDP_URL, non-string returns a 4015 instead of slipping into the generic 5031 catch via TypeError on `"://" in url`. * Add regression tests for both null-url fallback and non-string rejection.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	679a27498d	fix(browser): address Copilot round-3 on /browser connect * Gate `browser.progress` emit on truthy `session_id`. The TUI prints `messages` from the response when there's no session, so emitting events too would double-render. Now: with a session → events stream live; without one → bundled messages only. * Resolve `system = platform.system()` once in `_browser_connect` and thread it through `try_launch_chrome_debug` and `_failure_messages` → `manual_chrome_debug_command`, so the generated hint is consistent (and tests are deterministic) on any host. * Add `test_browser_manage_connect_no_session_skips_progress_events` to lock in the gating behavior.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	d1ee4915f3	fix(browser): address Copilot review on /browser connect Fixes from Copilot's two passes on PR #17238: * Validate parsed URL once: reject missing host, invalid port, and unsupported scheme up front so malformed inputs (e.g. http://:9222 or http://localhost:abc) don't fall through to a generic 5031. * Tighten _is_default_local_cdp to require a discovery-style path so ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare http://127.0.0.1:9222 (which would lose the path and break the connect). * Move browser.manage into _LONG_HANDLERS so the up-to-10s launch-and-retry loop runs on the RPC pool instead of blocking the main dispatcher. * try_launch_chrome_debug uses Windows-appropriate detach kwargs (creationflags=DETACHED_PROCESS\|CREATE_NEW_PROCESS_GROUP) instead of POSIX-only start_new_session=True. * manual_chrome_debug_command uses subprocess.list2cmdline on Windows so the printed instruction is cmd.exe-compatible. * Mirror host/port validation in cli.py /browser connect so the classic CLI never persists an invalid BROWSER_CDP_URL.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	e750829015	fix(tui): stream /browser connect progress as gateway events Emit browser.progress JSON-RPC notifications during the connect work and render them in the TUI as system transcript lines, so users see the same step-by-step status the base CLI prints instead of nothing for ~1m followed by a final result.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	7d39a45749	fix(tui): show /browser connect progress like CLI Return CLI-style browser connect status messages from the gateway and render them in the TUI so local Chrome launch attempts are visible instead of ending in a silent delayed failure.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	69ff114ee2	fix(browser): avoid bogus Chrome launch fallback Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	f10a3df632	fix(tui): align /browser connect local CDP handling Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.	2026-04-28 22:11:10 -07:00
Teknium	1d4218be56	feat(review): active-update bias, loaded-skill-first, support-file variants (#17213 ) The background skill-review prompts (_SKILL_REVIEW_PROMPT and the Skills half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive behavior — most passes concluded 'Nothing to save.' even when the session produced real lessons. User-preference corrections (style, format, legibility, verbosity) were especially lost: they were read as memory signals only, so skills never carried the fix. This rewrite changes the stance: - Active-update bias. The reviewer now treats inaction as a missed learning opportunity. 'Nothing to save.' remains an explicit escape but is no longer framed as the most-common outcome. - User-preference corrections are first-class skill signals. Style, tone, format, legibility, verbosity complaints — and the actual phrasings users use ('stop doing X', 'this is too verbose', 'I hate when you Y', 'remember this') — now warrant patching the skill that governs the task, not just writing to memory. - Loaded-skill-first preference order. When a skill was loaded via /skill-name or skill_view during the session, the reviewer patches THAT one first. It was in play; it's the right place. - Four-step ladder: patch-loaded → patch-umbrella → support-file → create. Support files are explicitly enumerated as three kinds: * references/<topic>.md — session-specific detail OR condensed knowledge banks (quoted research, API docs excerpts, domain notes) * templates/<name>.<ext> — starter files to copy and modify * scripts/<name>.<ext> — statically re-runnable actions - Name-veto for CREATE. New skill names MUST be class-level — no PR numbers, error strings, codenames, library-alone names, or session artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name only fits today's task, fall back to one of the patch/support-file options. - Memory scope clarified. 'who the user is and what the current situation and state of your operations are' — MEMORY.md is situational/state, USER.md is identity/preferences. - Curator handoff. Reviewer flags overlap; the background curator handles consolidation at scale. Single-session reviewer doesn't attempt umbrella-rebalancing. Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to assert the new behavioral contracts (active bias, user-correction signals, loaded-skill-first, support-file kinds, name-veto, memory framing, curator handoff). 17 tests, all pass. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 21:11:48 -07:00
Brooklyn Nicholson	9e398e1809	perf(tui): avoid importing classic CLI during tool discovery TUI session readiness was still laggy after the gateway-ready fixes. Profiling session.create -> session.info showed the slow phase is background AIAgent construction (~1.1s). A cProfile run of tui_gateway.server::_make_agent showed model_tools/tool discovery importing tools.code_execution_tool, whose module-level EXECUTE_CODE_SCHEMA calls _get_execution_mode(), which imported cli.CLI_CONFIG. That pulled the classic interactive CLI stack (prompt_toolkit/Rich and REPL setup) into every agent startup path, including hermes --tui where it is not used. Replace that with hermes_cli.config.read_raw_config(), which is cached and reads only the raw code_execution section. Existing defaults still apply when the key is absent. Measurements on macOS Terminal.app: - import run_agent: ~466ms -> ~347ms - model_tools import: ~418ms -> ~272ms - _make_agent: ~1452ms -> ~1239ms - session.create -> session.info: ~1167ms -> ~999ms - full hermes --tui ready p50: ~1655ms -> ~1537ms Tests: - scripts/run_tests.sh tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py	2026-04-28 22:42:17 -05:00
Scott Trinh	fd943461ca	fix(doctor): accept catalog provider aliases Validate configured providers against both Hermes runtime provider ids and catalog-normalized provider ids. This keeps providers like ai-gateway from being rejected after catalog resolution maps them to models.dev ids. Keep credential checks and vendor-slug warnings anchored to the runtime id so doctor reports actionable provider names in follow-up diagnostics.	2026-04-28 18:27:42 -07:00
Teknium	9f004b6d94	perf(tools): memoize get_tool_definitions + TTL-cache check_fn results (#17098 ) Two amplifying optimizations to per-turn overhead in the gateway: 1. get_tool_definitions() memoization (model_tools.py) Keyed on (frozenset(enabled), frozenset(disabled), registry._generation, config.yaml mtime+size). Only active when quiet_mode=True (which is every hot-path caller — gateway, AIAgent.__init__); quiet_mode=False keeps the existing print side effects. Cached path returns a shallow-copy list sharing read-only schema dicts. Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway constructs fresh AIAgent per message, so this saves ~7 ms/turn before any LLM work. 2. check_fn() TTL cache (tools/registry.py) check_fn callables like check_terminal_requirements probe external state (Docker daemon, Modal SDK, playwright binary). For a long-lived process, hitting them on every get_definitions() pass was pure waste — external state changes on human timescales. 30 s TTL so env-var flips (hermes tools enable X) propagate within a turn or two without explicit invalidation. Measured: first call 7.5ms → 1.6ms (check_fn probes now dominate); subsequent calls ~0.01ms via the upstream memoization. Invalidation surface: - registry._generation bumps on register/deregister/register_toolset_alias, invalidating the memoized definitions automatically. - config.yaml mtime in the cache key captures user-visible config edits affecting dynamic schemas (execute_code mode, discord allowlist). - invalidate_check_fn_cache() exposed for explicit flushes (e.g. after hermes tools enable/disable). - tests/conftest.py autouse fixture clears both caches before every test so env-var monkeypatches don't see stale results. Also fixes a regression from PR #17046 that I missed: - tools/web_tools.py — Firecrawl was removed from module scope by the lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'. Applied the same _FirecrawlProxy pattern used in auxiliary_client/ run_agent for OpenAI (module-level proxy that looks like the class but imports the SDK on first call/isinstance; patch() replaces the attribute as usual). Verified: - 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main) - 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in the full suite due to check_fn TTL cross-test pollution; fixed by the autouse fixture) - 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp dynamic discovery, 5 mcp structured content — all confirmed on main) - 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails) - 868/868 tests/run_agent/ (excluding test_run_agent.py which has pre-existing suite-level issues) - Live smoke: 2 turns + /model switch + tool calls, zero errors in agent.log session window. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 18:20:17 -07:00
brooklyn!	188eaa57c4	fix(tui): honor documented mouse_tracking config key (#17188 ) * fix(tui): honor documented mouse_tracking config key The TUI runtime was reading display.tui_mouse while docs and user-facing examples pointed users at display.mouse_tracking. That made persistent mouse-disable config look like a no-op for users trying to restore native terminal selection/copy behavior on Linux/SSH/tmux terminals. Use display.mouse_tracking as the canonical key, keep display.tui_mouse as a legacy fallback, and have /mouse write the documented key. Both gateway config.get and client-side config sync now share the same precedence: the canonical key wins, then the legacy key, then default on. * review(copilot): align mouse tracking config coercion - Load gateway config once before deriving display.mouse_tracking state. - Use key-presence precedence on the TUI client too, so canonical mouse_tracking wins over legacy tui_mouse even when the value is null. - Treat numeric 0 as disabled on both gateway and client, matching the existing string "0" handling. - Widen ConfigDisplayConfig mouse fields because config.get full returns raw YAML, not normalized booleans.	2026-04-28 17:39:07 -07:00
brooklyn!	6b09df39be	fix(tui): restore macOS copy behavior and theme polish (#17131 ) This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions: - copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled - copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus - keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation - force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app - move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing - render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text	2026-04-28 18:47:14 -05:00
brooklyn!	7d81d76366	feat(tui): pluggable busy-indicator styles (#13610 ) (#17150 ) * feat(tui): pluggable busy-indicator styles (kaomoji/emoji/unicode/ascii) The status-bar `FaceTicker` rotated through wide-and-variable kaomoji glyphs (`(｡•́︿•̀｡)`, `( ͡° ͜ʖ ͡°)`, …) every 2.5s. Real display widths range from ~5 to ~16 columns, so the rest of the bar (cwd, ctx %, voice, bg counter) shifted on every cycle. Padding the verb alone (#17116) helped but didn't address the dominant jitter source — the glyph itself. Add four indicator styles, configurable + hot-swappable: * `kaomoji` (default — preserves the existing vibe; verb is now pad-stable so the only width churn left is the kaomoji itself). * `emoji` — single 2-col emoji frame (`⚕ 🌀 🤔 ✨ 🍵 🔮`). * `unicode` — `unicode-animations` braille spinner (1-col, smooth). * `ascii` — `\| / - \` (1-col, max compat). Wires: * `display.tui_status_indicator` in `DEFAULT_CONFIG` (default `kaomoji`). * New JSON-RPC `config.set/get indicator` keys, narrow allow-list. * `applyDisplay` reads the field and patches `UiState.indicatorStyle`, so the existing `mtime` poll picks up `~/.hermes/config.yaml` edits within ~5s without a TUI restart. * `/indicator [style]` slash command (alias `/indicator-style`, subcommand completion `kaomoji\|emoji\|unicode\|ascii`). Bare form shows the current style; setter fires `config.set` and optimistically `patchUiState({ indicatorStyle })` so the live TUI swaps immediately, matching the `/skin` UX. * `CommandDef("indicator", ..., subcommands=...)` so classic CLI autocomplete + TUI `complete.slash` both surface it. * `FaceTicker` decouples spinner cadence from verb cadence — the glyph runs at the spinner's authored interval (or `FACE_TICK_MS` for kaomoji), the verb stays on the original 2.5s cycle, and both re-arm cleanly when style changes. Tests: * `normalizeIndicatorStyle` rejects unknown / non-string input. * `applyDisplay → tui_status_indicator` covers fan-out + fallback. * `/indicator <style>` hot-swaps `UiState.indicatorStyle` after a successful `config.set`. * `/indicator sparkle` rejects with the usage hint and never hits the gateway. * Slash-parity matrix gets `'/indicator'` → `config.get`. Validation: cd ui-tui && npm run type-check — clean; npm test --run — 398/398. scripts/run_tests.sh tests/test_tui_gateway_server.py tests/hermes_cli/test_commands.py — 220/220. * chore(tui): drop /indicator-style alias to declutter autocomplete * fix(tui): drop verb-width pad — /indicator handles glyph jitter directly * fix(tui): unicode indicator style hides the verb (cleanest option) * refactor(tui): single source of truth for INDICATOR_STYLES; cleaner error format Round 1 Copilot review on PR #17150: - Exported `INDICATOR_STYLES` const tuple from `interfaces.ts`; `IndicatorStyle` union type is derived from it. `useConfigSync` builds its validation Set from the tuple, and `session.ts` uses it for both the usage hint and the runtime allow-list — adding/removing a style now touches one line. - Backend `config.set indicator` error message: switched `sorted(allowed)` list repr to `pick one of ascii\|emoji\|kaomoji\|unicode` (matches the TUI usage hint), and reports the normalized `raw` instead of the original `value`. Backend allowed tuple now has a comment pointing back at `INDICATOR_STYLES` so the two stay aligned. Note: kept the verb portion unpadded per design intent — fixed-width padding was the exact UX the `/indicator` command was added to remove. Stable width comes from the glyph; verbs cycling is part of the kawaii aesthetic. Reply on the verb thread will explain. * fix(tui): drop type collapse + gate verb timer + DEFAULT_INDICATOR_STYLE Round 2 Copilot review on PR #17150: - `tui_status_indicator?: 'ascii' \| ... \| string` collapses to `string` in TS — consumers got no narrowing. Documented as plain `string` with a comment about runtime validation via `normalizeIndicatorStyle`. - `FaceTicker` always started a 2.5s verb interval, even for the `unicode` style which hides the verb entirely. Now gated on `showVerb` from `renderIndicator` — `unicode` stays calm. Pre-emptive self-review (avoid round 3): - Three call sites duplicated the literal `'kaomoji'` default (uiStore, normalizeIndicatorStyle, slash command). Added `DEFAULT_INDICATOR_STYLE` to interfaces.ts and threaded it through so changing the default touches one line. * fix(tui-gateway): normalize config.get indicator output to match TUI render Round 4 Copilot review on PR #17150: `config.get` for `indicator` returned the raw `display.tui_status_indicator` value without validation, so a hand-edited config.yaml with stray casing or an unknown style would leave `/indicator` printing one thing while the TUI rendered the kaomoji default (frontend's `normalizeIndicatorStyle` does this normalization on receive). Lifted the allow-list to module scope as `_INDICATOR_STYLES` / `_INDICATOR_DEFAULT`, reused by both `config.set` and `config.get`. Comment notes the alignment with `INDICATOR_STYLES` / `DEFAULT_INDICATOR_STYLE` in interfaces.ts so adding/removing a style is a one-line change on each end. Tests cover: known value verbatim, casing/whitespace normalize, unknown→default, unset→default. * fix(tui-gateway): preserve falsy-input diagnostics in config.set indicator error Round 5 Copilot review on PR #17150: `raw = str(value or "").strip().lower()` collapsed any falsy non-string (`0`, `False`, `[]`) to empty string, so the error message read `unknown indicator: ` with nothing after — losing the original input. Switched to `("" if value is None else str(value)).strip().lower()` so only `None` (the genuine 'no value' case) becomes blank. Used `{raw!r}` in the error so the diagnostic is unambiguous (`'0'` vs `0`). Tests: - known-value happy path (`'EMOJI'` → `'emoji'`) - falsy non-string inputs (`0` / `False` / `[]`) surface meaningfully - `None` keeps the blank-repr error	2026-04-28 18:19:16 -05:00
brooklyn!	1e326c686d	fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races (#17118 ) * fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races `tui_gateway` reports `tui_gateway_crash.log` traces where the main thread sits in `sys.stdin` while a worker holds `_stdout_lock` mid- flush, and SIGTERM then calls `sys.exit(0)` while the lock is still held — the interpreter shutdown stalls behind the wedged write. Two narrowly scoped hardenings: `tui_gateway/transport.py` * Move JSON serialisation outside the lock — long messages no longer block sibling writers while we serialise. * Treat `BrokenPipeError`, `ValueError` ("I/O on closed file") and generic `OSError` from both `write` and `flush` as "peer is gone": return `False` instead of bubbling, matching what `write_json`'s callers in `entry.py` already expect. * Split `flush` into its own try block so a stuck flush never strands a partial write or holds the lock indefinitely on its way out. * Optional `HERMES_TUI_GATEWAY_NO_FLUSH=1` env knob to skip explicit `flush()` entirely on environments where a half-closed read pipe produces an indefinite kernel-level block. Default unchanged. `tui_gateway/entry.py` * `_log_signal` now spawns a 1-second daemon timer that calls `os._exit(0)` if the orderly `sys.exit(0)` path is itself stuck behind a wedged worker. Atexit handlers run inside the grace window when they can; the timer is the safety net so a deadlocked flush no longer strands the gateway process. Tests: * `test_write_json_closed_stream_returns_false` — ValueError path. * `test_write_json_oserror_on_flush_returns_false` — OSError on flush must not strand the lock; the write portion still landed before the flush failure. * `test_write_json_no_flush_env_skips_flush` — env knob bypass. Validation: `scripts/run_tests.sh tests/tui_gateway/test_protocol.py` (42/42 pass; one pre-existing failure on `test_session_resume_returns_hydrated_messages` is unrelated to this change — same `include_ancestors` mock kwarg issue tracked elsewhere). `scripts/run_tests.sh tests/test_tui_gateway_server.py` 90/90 pass. * review(copilot): tighten transport hardening comments + test cleanup * review(copilot): narrow exception capture, configurable grace, simpler no-flush test * fix(tui-gateway): narrow ValueError to closed-stream; surface UnicodeEncodeError Copilot review on PR #17118: `UnicodeEncodeError` is a ValueError subclass, so a non-UTF-8 stdout (mismatched PYTHONIOENCODING / locale) would have been silently swallowed as 'peer gone' under `except ValueError`. That hides a real environment bug. Now: - UnicodeEncodeError → log with exc_info (warning) and drop the frame - ValueError where str(e) contains 'closed file' → peer gone, return False - Any other ValueError → log loudly, drop frame (defensive, but visible) Same shape applied to flush. Adds two regression tests. * fix(tui-gateway): reserve write() False for peer-gone; re-raise programming errors Round 2 Copilot review on PR #17118: `Transport.write()` returning `False` is documented as 'peer is gone', and `entry.py` reacts by calling `sys.exit(0)`. But the implementation also returned False for non-IO conditions (non-JSON-safe payloads, UnicodeEncodeError, unrelated ValueErrors), so a programming error or local env bug would present as a clean disconnect — exactly the diagnosis pain we wanted to eliminate. Now: - `json.dumps` failure → re-raises (TypeError/ValueError surfaces in crash log) - `BrokenPipeError` → False (peer gone) - `ValueError('...closed file...')` → False (peer gone) - `UnicodeEncodeError` and any other ValueError → re-raise - `OSError` → False (existing IO-failure semantics, debug-logged) Tests updated to assert the re-raise behaviour and added a non-serializable-payload regression test. * fix(tui-gateway): narrow OSError to peer-gone errnos; honest test naming Round 3 Copilot review on PR #17118: - Docstring claimed False = peer gone, but generic OSError on write/flush also returned False — meaning ENOSPC/EACCES/EIO would silently exit. Added `_PEER_GONE_ERRNOS = {EPIPE, ECONNRESET, EBADF, ESHUTDOWN, +WSA}` and narrowed the OSError handlers; non-peer-gone errnos re-raise. Docstring now lists OSError as peer-gone branch with the errno set. - The `_DISABLE_FLUSH` test was named after the env var but actually patched the module constant. Renamed it to reflect the contract being tested (skips flush when constant is true) AND added a real end-to-end test that sets the env var, reloads transport.py, and asserts the constant flips. Cleanup reload restores defaults so parallel tests stay isolated. Self-review (avoid round 4): - Verified TeeTransport's secondary-swallow stays intentional. - _log_signal grace path already covered by separate tests.	2026-04-28 17:54:06 -05:00
brooklyn!	15ef11a8b8	fix(tui): make /browser connect actually take effect on the live agent (#17120 ) * fix(tui): make /browser connect actually take effect on the live agent Reports were that `/browser connect <url>` (and "changes to CDP url don't get picked up") didn't propagate to the live agent in `--tui`, forcing users to fall back to setting `browser.cdp_url` in `config.yaml` and restarting. Tracing the path on current main shows the protocol wiring is already correct — `/browser` is registered in `ui-tui/src/app/slash/commands/ops.ts` and dispatches `browser.manage` through the gateway RPC, NOT the slash worker (covered by the `browser.manage` row in `slashParity.test.ts`). But three real gaps left the experience flaky: 1. `cleanup_all_browsers()` ran AFTER `os.environ["BROWSER_CDP_URL"]` was rewritten. `_ensure_cdp_supervisor(...)` reads the env to resolve its target URL, so a tool call landing in that brief window could re-attach the supervisor to the OLD CDP endpoint just before we reaped sessions, leaving the agent talking to a dead URL. Reorder to clean first, swap env, clean again so the supervisor for the default task is definitively closed. 2. `browser.manage status` reported only the env var, ignoring `browser.cdp_url` from config.yaml. `_get_cdp_override()` (the resolver the agent itself uses) consults both — match it so `/browser status` answers the same question the next `browser_navigate` will see. Closes a stealth bug where users saw "browser not connected" while their CDP URL was perfectly set in config.yaml. 3. `/browser disconnect` only cleared `BROWSER_CDP_URL` and reaped once, leaving the same swap window as connect. Symmetrical double-cleanup here too. Frontend (`ops.ts`): * Echo "next browser tool call will use this CDP endpoint" on success so users see immediate confirmation that the gateway accepted the swap, even before any tool runs. * Mention `browser.cdp_url` in `config.yaml` in the usage hint and the not-connected status line. Persistent config is the correct fix for some terminal-multiplexer / sub-agent flows where env inheritance is unreliable; surfacing it makes that workaround discoverable. Tests (4 new, all hermetic): * `status` returns the resolved URL when only `browser.cdp_url` is set in config.yaml. * `connect` writes env AND cleans before/after, in that order. * `connect` against an unreachable endpoint does NOT mutate env or reap. * `disconnect` removes env and cleans twice. Validation: scripts/run_tests.sh tests/test_tui_gateway_server.py — 94/94 pass. cd ui-tui && npm run type-check — clean; npm test --run — 389/389. * review(copilot): always defer to _get_cdp_override; normalize bare host:port * review(copilot): collapse discovery-style CDP paths so /json/version isn't duplicated * fix(tui): /browser status must not perform CDP discovery I/O Copilot review on PR #17120: previous version routed through `tools.browser_tool._get_cdp_override`, which calls `_resolve_cdp_override` and performs an HTTP probe to /json/version with a multi-second timeout for discovery-style URLs. That blocks the TUI on `/browser status` whenever the configured host is slow or unreachable. Status now reads env-then-config directly with no network I/O. The WS normalization still happens in `browser_navigate` for actual tool calls, so behaviour-on-call is unchanged. * fix(tui): skip /json/version probe for concrete ws://devtools/browser endpoints Round 2 Copilot review on PR #17120: hosted CDP providers (Browserbase, browserless, etc.) return concrete `ws[s]://.../devtools/browser/<id>` URLs which are already directly connectable but don't serve the HTTP discovery path. The previous `/json/version` probe rejected these valid endpoints with 'could not reach browser CDP'. For `ws[s]://...` URLs whose path starts with `/devtools/browser/` we now do a TCP-level reachability check (`socket.create_connection`) instead of the HTTP probe. The actual CDP handshake happens on the next `browser_navigate` call, so we still surface unreachable hosts as 5031 errors — just without the false negatives. Discovery-style URLs (`http://host:port[/json[/version]]`) keep the HTTP probe path unchanged. Updated existing test + added two new ones (TCP-only success, TCP unreachable → 5031).	2026-04-28 17:46:57 -05:00
brooklyn!	87d3fa6f1c	feat(tui): opt-in auto-resume of the most recent session (#17130 ) * feat(tui): opt-in auto-resume of the most recent session `hermes --tui` always forges a fresh session at startup unless the user sets `HERMES_TUI_RESUME=<id>`. Disconnects, terminal-window crashes, and accidental Ctrl+D therefore lose every piece of in-flight context even though `state.db` still has the full history a `/resume` away. Add an opt-in path that mirrors classic CLI's `hermes -c` muscle memory: when `display.tui_auto_resume_recent: true` is set in `~/.hermes/config.yaml`, the TUI looks up the most recent human-facing session and resumes it instead of starting fresh. Default off so existing users aren't surprised; explicit `HERMES_TUI_RESUME` always wins. Wires: * New `session.most_recent` JSON-RPC in `tui_gateway/server.py` that returns the first non-`tool` row from `list_sessions_rich`, or `{"session_id": null}` when none. Uses the same deny-list as `session.list` so sub-agent rows can't sneak in. * `createGatewayEventHandler.handleReady` re-ordered: explicit `STARTUP_RESUME_ID` first (unchanged), then conditional auto-resume via `config.get full → display.tui_auto_resume_recent`, then the legacy `newSession()` fallback. Failures of either RPC fall back to `newSession()` so the path is always finite. * Default `display.tui_auto_resume_recent: False` added to `DEFAULT_CONFIG` in `hermes_cli/config.py` (no `_config_version` bump per AGENTS.md — deep-merge handles the additive key). Tests: * 4 new vitest cases in `createGatewayEventHandler.test.ts` cover every gate-and-fallback combination (env wins, config off, config on with hit, config on with miss). * 3 new pytest cases for `session.most_recent` (denied row skip, tool-only → null, db-unavailable → null). Validation: scripts/run_tests.sh tests/test_tui_gateway_server.py — 93/93. cd ui-tui && npm run type-check — clean; npm test --run — 393/393. * review(copilot): fold session.most_recent errors into null + extend ConfigDisplayConfig * review(copilot): cover RPC-rejection fallbacks in auto-resume tests	2026-04-28 16:53:38 -05:00
Gille	0d957a8d48	fix(tui): surface mouse slash command (#17126 )	2026-04-28 13:27:43 -07:00
brooklyn!	5f215b13ce	fix(docker): materialize bundled TUI Ink package (#16690 ) * fix(docker): materialize bundled TUI Ink package * fix(docker): keep nested deps out of build context * fix(docker): make TUI Ink smoke check deterministic * test(docker): skip dockerignore assertion in partial checkouts * fix(docker): use lockfile install for vendored Ink deps * test(cli): expect deterministic npm ci in /update flow * fix(docker): fall back to npm install for vendored Ink deps * fix(docker): keep bundled Ink source for TUI runtime builds * fix(docker): dedupe React in vendored Ink package	2026-04-28 15:11:47 -05:00

1 2 3 4 5 ...

2837 commits