From 20f2258f3481e708fc954034ee36e2c72bce1782 Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Fri, 17 Apr 2026 20:39:25 -0700
Subject: [PATCH 001/143] fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace

interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.

Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
  fan interrupt()/clear_interrupt() out to them, and handle the
  register-after-interrupt race at _run_tool entry. getattr fallback
  for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
  per-30s HEARTBEAT with interrupt+activity-cb state,
  INTERRUPT DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
  tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
  and asserts is_interrupted() flips to True within ~1s of interrupt().
  Second new test guards clear_interrupt() clearing tracked worker bits.

Validation: tests/run_agent/ all 762 pass; tests/tools/
interrupt+env subset 216 pass.

* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log

AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).

Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.

Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.

* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses

Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process never
got a chance to call _kill_process — Python exited, the child was
reparented to init (PPID=1), and the subprocess ran to its natural end
(confirmed live: sleep 300 survived 4+ min after SIGTERM to the agent
until manual cleanup).

Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
  route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
  loop sees the per-thread interrupt flag and calls _kill_process
  (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
  1.5s) gives the worker time to complete its SIGTERM+SIGKILL
  escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
  try/except (KeyboardInterrupt, SystemExit) so the cleanup fires even
  on paths the signal handlers don't cover (direct sys.exit, unhandled
  KI from nested code, etc.). Emits EXCEPTION_EXIT trace line when
  HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
  _wait_for_process via PyThreadState_SetAsyncExc, verifies the
  subprocess process group is dead within 3s of the exception and that
  KeyboardInterrupt re-raises cleanly afterward.

Validation:
| Before                                                   | After                                            |
|----------------------------------------------------------|--------------------------------------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s                                  |
| No INTERRUPT DETECTED in trace                           | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup                 | 1/1 pass                                         |
| tests/run_agent/test_concurrent_interrupt                | 4/4 pass                                         |
---
 cli.py                                       |  69 ++++++++-
 run_agent.py                                 |  72 +++++++++
 tests/run_agent/test_concurrent_interrupt.py | 123 +++++++++++++++-
 tests/tools/test_local_interrupt_cleanup.py  | 145 +++++++++++++++++++
 tools/environments/base.py                   | 142 +++++++++++++++---
 tools/interrupt.py                           |  22 +++
 6 files changed, 551 insertions(+), 22 deletions(-)
 create mode 100644 tests/tools/test_local_interrupt_cleanup.py

diff --git a/cli.py b/cli.py
index c0c17babc4c..2456c7754b2 100644
--- a/cli.py
+++ b/cli.py
@@ -10067,8 +10067,36 @@ class HermesCLI:
 # Register signal handlers for graceful shutdown on SSH disconnect / SIGTERM def _signal_handler(signum, frame): - """Handle SIGHUP/SIGTERM by triggering graceful cleanup.""" + """Handle SIGHUP/SIGTERM by triggering graceful cleanup. + + Calls ``self.agent.interrupt()`` first so the agent daemon + thread's poll loop sees the per-thread interrupt and kills the + tool's subprocess group via ``_kill_process`` (os.killpg). + Without this, the main thread dies from KeyboardInterrupt and + the daemon thread is killed with it — before it can run one + more poll iteration to clean up the subprocess, which was + spawned with ``os.setsid`` and therefore survives as an orphan + with PPID=1. 
+ + Grace window (``HERMES_SIGTERM_GRACE``, default 1.5 s) gives + the daemon time to: detect the interrupt (next 200 ms poll) → + call _kill_process (SIGTERM + 1 s wait + SIGKILL if needed) → + return from _wait_for_process. ``time.sleep`` releases the + GIL so the daemon actually runs during the window. + """ logger.debug("Received signal %s, triggering graceful shutdown", signum) + try: + if getattr(self, "agent", None) and getattr(self, "_agent_running", False): + self.agent.interrupt(f"received signal {signum}") + import time as _t + try: + _grace = float(os.getenv("HERMES_SIGTERM_GRACE", "1.5")) + except (TypeError, ValueError): + _grace = 1.5 + if _grace > 0: + _t.sleep(_grace) + except Exception: + pass # never block signal handling raise KeyboardInterrupt() try: @@ -10371,6 +10399,45 @@ def main( # Register cleanup for single-query mode (interactive mode registers in run()) atexit.register(_run_cleanup) + + # Also install signal handlers in single-query / `-q` mode. Interactive + # mode registers its own inside HermesCLI.run(), but `-q` runs + # cli.agent.run_conversation() below and AIAgent spawns worker threads + # for tools — so when SIGTERM arrives on the main thread, raising + # KeyboardInterrupt only unwinds the main thread, not the worker + # running _wait_for_process. Python then exits, the child subprocess + # (spawned with os.setsid, its own process group) is reparented to + # init and keeps running as an orphan. + # + # Fix: route SIGTERM/SIGHUP through agent.interrupt() which sets the + # per-thread interrupt flag the worker's poll loop checks every 200 ms. + # Give the worker a grace window to call _kill_process (SIGTERM to the + # process group, then SIGKILL after 1 s), then raise KeyboardInterrupt + # so main unwinds normally. HERMES_SIGTERM_GRACE overrides the 1.5 s + # default for debugging. 
+ def _signal_handler_q(signum, frame): + logger.debug("Received signal %s in single-query mode", signum) + try: + _agent = getattr(cli, "agent", None) + if _agent is not None: + _agent.interrupt(f"received signal {signum}") + import time as _t + try: + _grace = float(os.getenv("HERMES_SIGTERM_GRACE", "1.5")) + except (TypeError, ValueError): + _grace = 1.5 + if _grace > 0: + _t.sleep(_grace) + except Exception: + pass # never block signal handling + raise KeyboardInterrupt() + try: + import signal as _signal + _signal.signal(_signal.SIGTERM, _signal_handler_q) + if hasattr(_signal, "SIGHUP"): + _signal.signal(_signal.SIGHUP, _signal_handler_q) + except Exception: + pass # signal handler may fail in restricted environments # Handle single query mode if query or image: diff --git a/run_agent.py b/run_agent.py index ef90ae39e20..010715280ca 100644 --- a/run_agent.py +++ b/run_agent.py @@ -831,6 +831,16 @@ class AIAgent: self._execution_thread_id: int | None = None # Set at run_conversation() start self._interrupt_thread_signal_pending = False self._client_lock = threading.RLock() + + # Concurrent-tool worker thread tracking. `_execute_tool_calls_concurrent` + # runs each tool on its own ThreadPoolExecutor worker — those worker + # threads have tids distinct from `_execution_thread_id`, so + # `_set_interrupt(True, _execution_thread_id)` alone does NOT cause + # `is_interrupted()` inside the worker to return True. Track the + # workers here so `interrupt()` / `clear_interrupt()` can fan out to + # their tids explicitly. + self._tool_worker_threads: set[int] = set() + self._tool_worker_threads_lock = threading.Lock() # Subagent delegation state self._delegate_depth = 0 # 0 = top-level agent, incremented for children @@ -3191,6 +3201,25 @@ class AIAgent: # interrupt signal until startup completes instead of targeting # the caller thread by mistake. self._interrupt_thread_signal_pending = True + # Fan out to concurrent-tool worker threads. 
Those workers run tools + # on their own tids (ThreadPoolExecutor workers), so `is_interrupted()` + # inside a tool only sees an interrupt when their specific tid is in + # the `_interrupted_threads` set. Without this propagation, an + # already-running concurrent tool (e.g. a terminal command hung on + # network I/O) never notices the interrupt and has to run to its own + # timeout. See `_run_tool` for the matching entry/exit bookkeeping. + # `getattr` fallback covers test stubs that build AIAgent via + # object.__new__ and skip __init__. + _tracker = getattr(self, "_tool_worker_threads", None) + _tracker_lock = getattr(self, "_tool_worker_threads_lock", None) + if _tracker is not None and _tracker_lock is not None: + with _tracker_lock: + _worker_tids = list(_tracker) + for _wtid in _worker_tids: + try: + _set_interrupt(True, _wtid) + except Exception: + pass # Propagate interrupt to any running child agents (subagent delegation) with self._active_children_lock: children_copy = list(self._active_children) @@ -3209,6 +3238,23 @@ class AIAgent: self._interrupt_thread_signal_pending = False if self._execution_thread_id is not None: _set_interrupt(False, self._execution_thread_id) + # Also clear any concurrent-tool worker thread bits. Tracked + # workers normally clear their own bit on exit, but an explicit + # clear here guarantees no stale interrupt can survive a turn + # boundary and fire on a subsequent, unrelated tool call that + # happens to get scheduled onto the same recycled worker tid. + # `getattr` fallback covers test stubs that build AIAgent via + # object.__new__ and skip __init__. 
+ _tracker = getattr(self, "_tool_worker_threads", None) + _tracker_lock = getattr(self, "_tool_worker_threads_lock", None) + if _tracker is not None and _tracker_lock is not None: + with _tracker_lock: + _worker_tids = list(_tracker) + for _wtid in _worker_tids: + try: + _set_interrupt(False, _wtid) + except Exception: + pass def _touch_activity(self, desc: str) -> None: """Update the last-activity timestamp and description (thread-safe).""" @@ -7653,6 +7699,22 @@ class AIAgent: def _run_tool(index, tool_call, function_name, function_args): """Worker function executed in a thread.""" + # Register this worker tid so the agent can fan out an interrupt + # to it — see AIAgent.interrupt(). Must happen first thing, and + # must be paired with discard + clear in the finally block. + _worker_tid = threading.current_thread().ident + with self._tool_worker_threads_lock: + self._tool_worker_threads.add(_worker_tid) + # Race: if the agent was interrupted between fan-out (which + # snapshotted an empty/earlier set) and our registration, apply + # the interrupt to our own tid now so is_interrupted() inside + # the tool returns True on the next poll. + if self._interrupt_requested: + try: + from tools.interrupt import set_interrupt as _sif + _sif(True, _worker_tid) + except Exception: + pass # Set the activity callback on THIS worker thread so # _wait_for_process (terminal commands) can fire heartbeats. # The callback is thread-local; the main thread's callback @@ -7675,6 +7737,16 @@ class AIAgent: else: logger.info("tool %s completed (%.2fs, %d chars)", function_name, duration, len(result)) results[index] = (function_name, function_args, result, duration, is_error) + # Tear down worker-tid tracking. Clear any interrupt bit we may + # have set so the next task scheduled onto this recycled tid + # starts with a clean slate. 
+ with self._tool_worker_threads_lock: + self._tool_worker_threads.discard(_worker_tid) + try: + from tools.interrupt import set_interrupt as _sif + _sif(False, _worker_tid) + except Exception: + pass # Start spinner for CLI mode (skip when TUI handles tool progress) spinner = None diff --git a/tests/run_agent/test_concurrent_interrupt.py b/tests/run_agent/test_concurrent_interrupt.py index fdeb8dd6907..e5d8b88e727 100644 --- a/tests/run_agent/test_concurrent_interrupt.py +++ b/tests/run_agent/test_concurrent_interrupt.py @@ -23,6 +23,10 @@ def _make_agent(monkeypatch): class _Stub: _interrupt_requested = False + _interrupt_message = None + # Bind to this thread's ident so interrupt() targets a real tid. + _execution_thread_id = threading.current_thread().ident + _interrupt_thread_signal_pending = False log_prefix = "" quiet_mode = True verbose_logging = False @@ -40,6 +44,15 @@ def _make_agent(monkeypatch): _current_tool = None _last_activity = 0 _print_fn = print + # Worker-thread tracking state mirrored from AIAgent.__init__ so the + # real interrupt() method can fan out to concurrent-tool workers. + _active_children: list = [] + + def __init__(self): + # Instance-level (not class-level) so each test gets a fresh set. 
+ self._tool_worker_threads: set = set() + self._tool_worker_threads_lock = threading.Lock() + self._active_children_lock = threading.Lock() def _touch_activity(self, desc): self._last_activity = time.time() @@ -60,8 +73,10 @@ def _make_agent(monkeypatch): return False stub = _Stub() - # Bind the real methods + # Bind the real methods under test stub._execute_tool_calls_concurrent = _ra.AIAgent._execute_tool_calls_concurrent.__get__(stub) + stub.interrupt = _ra.AIAgent.interrupt.__get__(stub) + stub.clear_interrupt = _ra.AIAgent.clear_interrupt.__get__(stub) stub._invoke_tool = MagicMock(side_effect=lambda *a, **kw: '{"ok": true}') return stub @@ -137,3 +152,109 @@ def test_concurrent_preflight_interrupt_skips_all(monkeypatch): assert "skipped due to user interrupt" in messages[1]["content"] # _invoke_tool should never have been called agent._invoke_tool.assert_not_called() + + +def test_running_concurrent_worker_sees_is_interrupted(monkeypatch): + """Regression guard for the "interrupt-doesn't-reach-hung-tool" class of + bug Physikal reported in April 2026. + + Before this fix, `AIAgent.interrupt()` called `_set_interrupt(True, + _execution_thread_id)` — which only flagged the agent's *main* thread. + Tools running inside `_execute_tool_calls_concurrent` execute on + ThreadPoolExecutor worker threads whose tids are NOT the agent's, so + `is_interrupted()` (which checks the *current* thread's tid) returned + False inside those tools no matter how many times the gateway called + `.interrupt()`. Hung ssh / long curl / big make-build tools would run + to their own timeout. + + This test runs a fake tool in the concurrent path that polls + `is_interrupted()` like a real terminal command does, then calls + `agent.interrupt()` from another thread, and asserts the poll sees True + within one second. + """ + from tools.interrupt import is_interrupted + + agent = _make_agent(monkeypatch) + + # Counter plus observation hooks so we can prove the worker saw the flip. 
+ observed = {"saw_true": False, "poll_count": 0, "worker_tid": None} + worker_started = threading.Event() + + def polling_tool(name, args, task_id, call_id=None): + observed["worker_tid"] = threading.current_thread().ident + worker_started.set() + deadline = time.monotonic() + 5.0 + while time.monotonic() < deadline: + observed["poll_count"] += 1 + if is_interrupted(): + observed["saw_true"] = True + return '{"interrupted": true}' + time.sleep(0.05) + return '{"timed_out": true}' + + agent._invoke_tool = MagicMock(side_effect=polling_tool) + + tc1 = _FakeToolCall("hung_fake_tool_1", call_id="tc1") + tc2 = _FakeToolCall("hung_fake_tool_2", call_id="tc2") + msg = _FakeAssistantMsg([tc1, tc2]) + messages = [] + + def _interrupt_after_start(): + # Wait until at least one worker is running so its tid is tracked. + worker_started.wait(timeout=2.0) + time.sleep(0.2) # let the other worker enter too + agent.interrupt("stop requested by test") + + t = threading.Thread(target=_interrupt_after_start) + t.start() + start = time.monotonic() + agent._execute_tool_calls_concurrent(msg, messages, "test_task") + elapsed = time.monotonic() - start + t.join(timeout=2.0) + + # The worker must have actually polled is_interrupted — otherwise the + # test isn't exercising what it claims to. + assert observed["poll_count"] > 0, ( + "polling_tool never ran — test scaffold issue" + ) + # The worker must see the interrupt within ~1 s of agent.interrupt() + # being called. Before the fix this loop ran until its 5 s own-timeout. + assert observed["saw_true"], ( + f"is_interrupted() never returned True inside the concurrent worker " + f"after agent.interrupt() — interrupt-propagation hole regressed. 
" + f"worker_tid={observed['worker_tid']!r} poll_count={observed['poll_count']}" + ) + assert elapsed < 3.0, ( + f"concurrent execution took {elapsed:.2f}s after interrupt — the fan-out " + f"to worker tids didn't shortcut the tool's poll loop as expected" + ) + # Also verify cleanup: no stale worker tids should remain after all + # tools finished. + assert agent._tool_worker_threads == set(), ( + f"worker tids leaked after run: {agent._tool_worker_threads}" + ) + + +def test_clear_interrupt_clears_worker_tids(monkeypatch): + """After clear_interrupt(), stale worker-tid bits must be cleared so the + next turn's tools — which may be scheduled onto recycled tids — don't + see a false interrupt.""" + from tools.interrupt import is_interrupted, set_interrupt + + agent = _make_agent(monkeypatch) + # Simulate a worker having registered but not yet exited cleanly (e.g. a + # hypothetical bug in the tear-down). Put a fake tid in the set and + # flag it interrupted. + fake_tid = threading.current_thread().ident # use real tid so is_interrupted can see it + with agent._tool_worker_threads_lock: + agent._tool_worker_threads.add(fake_tid) + set_interrupt(True, fake_tid) + assert is_interrupted() is True # sanity + + agent.clear_interrupt() + + assert is_interrupted() is False, ( + "clear_interrupt() did not clear the interrupt bit for a tracked " + "worker tid — stale interrupt can leak into the next turn" + ) + diff --git a/tests/tools/test_local_interrupt_cleanup.py b/tests/tools/test_local_interrupt_cleanup.py new file mode 100644 index 00000000000..72310009a54 --- /dev/null +++ b/tests/tools/test_local_interrupt_cleanup.py @@ -0,0 +1,145 @@ +"""Regression tests for _wait_for_process subprocess cleanup on exception exit. 
+ +When the poll loop exits via KeyboardInterrupt or SystemExit (SIGTERM via +cli.py signal handler, SIGINT on the main thread in non-interactive -q mode, +or explicit sys.exit from some caller), the child subprocess must be killed +before the exception propagates — otherwise the local backend's use of +os.setsid leaves an orphan with PPID=1. + +The live repro that motivated this: hermes chat -q ... 'sleep 300', SIGTERM +to the python process, sleep 300 survived with PPID=1 for the full 300 s +because _wait_for_process never got to call _kill_process before python +died. See commit message for full context. +""" +import os +import signal +import subprocess +import threading +import time + +import pytest + +from tools.environments.local import LocalEnvironment + + +@pytest.fixture(autouse=True) +def _isolate_hermes_home(tmp_path, monkeypatch): + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + (tmp_path / "logs").mkdir(exist_ok=True) + + +def _pgid_still_alive(pgid: int) -> bool: + """Return True if any process in the given process group is still alive.""" + try: + os.killpg(pgid, 0) # signal 0 = existence check + return True + except ProcessLookupError: + return False + + +def test_wait_for_process_kills_subprocess_on_keyboardinterrupt(): + """When KeyboardInterrupt arrives mid-poll, the subprocess group must be + killed before the exception is re-raised.""" + env = LocalEnvironment(cwd="/tmp") + try: + result_holder = {} + proc_holder = {} + started = threading.Event() + raise_at = [None] # set by the main thread to tell worker when + + # Drive execute() on a separate thread so we can SIGNAL-interrupt it + # via a thread-targeted exception without killing our test process. + def worker(): + # Spawn a subprocess that will definitely be alive long enough + # to observe the cleanup, via env.execute(...) — the normal path + # that goes through _wait_for_process. 
+ try: + result_holder["result"] = env.execute("sleep 30", timeout=60) + except BaseException as e: # noqa: BLE001 — we want to observe it + result_holder["exception"] = type(e).__name__ + + t = threading.Thread(target=worker, daemon=True) + t.start() + # Wait until the subprocess actually exists. LocalEnvironment.execute + # does init_session() (one spawn) before the real command, so we need + # to wait until a sleep 30 is visible. Look it up with psutil (or a + # ps fallback) to find the process running our sleep. + deadline = time.monotonic() + 5.0 + target_pid = None + while time.monotonic() < deadline: + # Walk our children and grand-children to find one running 'sleep 30' + try: + import psutil # optional — fall back if absent + for p in psutil.Process(os.getpid()).children(recursive=True): + try: + if "sleep 30" in " ".join(p.cmdline()): + target_pid = p.pid + break + except (psutil.NoSuchProcess, psutil.AccessDenied): + continue + except ImportError: + # Fall back to ps + ps = subprocess.run( + ["ps", "-eo", "pid,ppid,pgid,args"], capture_output=True, text=True, + ) + for line in ps.stdout.splitlines(): + if "sleep 30" in line and "grep" not in line: + parts = line.split() + if parts and parts[0].isdigit(): + target_pid = int(parts[0]) + break + if target_pid: + break + time.sleep(0.1) + + assert target_pid is not None, ( + "test setup: couldn't find 'sleep 30' subprocess after 5 s" + ) + pgid = os.getpgid(target_pid) + assert _pgid_still_alive(pgid), "sanity: subprocess should be alive" + + # Now inject a KeyboardInterrupt into the worker thread. CPython only + # delivers signals to the main thread, so we raise the exception + # asynchronously via ctypes.pythonapi.PyThreadState_SetAsyncExc. 
+ import ctypes + import sys as _sys + # py-thread-state exception targets need the ident, not the Thread + tid = t.ident + assert tid is not None + # Fire KeyboardInterrupt into the worker thread + ret = ctypes.pythonapi.PyThreadState_SetAsyncExc( + ctypes.c_ulong(tid), ctypes.py_object(KeyboardInterrupt), + ) + assert ret == 1, f"SetAsyncExc returned {ret}, expected 1" + + # Give the worker a moment to: hit the exception at the next poll, + # run the except-block cleanup (_kill_process), and exit. + t.join(timeout=5.0) + assert not t.is_alive(), "worker didn't exit within 5 s of the interrupt" + + # The critical assertion: the subprocess GROUP must be dead. Not + # just the bash wrapper — the 'sleep 30' child too. + # Give the SIGTERM+1s wait+SIGKILL escalation a moment to complete. + deadline = time.monotonic() + 3.0 + while time.monotonic() < deadline: + if not _pgid_still_alive(pgid): + break + time.sleep(0.1) + assert not _pgid_still_alive(pgid), ( + f"subprocess group {pgid} is STILL ALIVE after worker received " + f"KeyboardInterrupt — orphan bug regressed. This is the " + f"sleep-300-survives-SIGTERM scenario from Physikal's Apr 2026 " + f"report. See tools/environments/base.py _wait_for_process " + f"except-block." + ) + # And the worker should have observed the KeyboardInterrupt (i.e. + # it re-raised cleanly, not silently swallowed). + assert result_holder.get("exception") == "KeyboardInterrupt", ( + f"worker result: {result_holder!r} — expected KeyboardInterrupt " + f"propagation after cleanup" + ) + finally: + try: + env.cleanup() + except Exception: + pass diff --git a/tools/environments/base.py b/tools/environments/base.py index 8e990792369..1bc08449e49 100644 --- a/tools/environments/base.py +++ b/tools/environments/base.py @@ -23,6 +23,19 @@ from tools.interrupt import is_interrupted logger = logging.getLogger(__name__) +# Opt-in debug tracing for the interrupt/activity/poll machinery. 
Set +# HERMES_DEBUG_INTERRUPT=1 to log loop entry/exit, periodic heartbeats, and +# every is_interrupted() state change from _wait_for_process. Off by default +# to avoid flooding production gateway logs. +_DEBUG_INTERRUPT = bool(os.getenv("HERMES_DEBUG_INTERRUPT")) + +if _DEBUG_INTERRUPT: + # AIAgent's quiet_mode path (run_agent.py) forces the `tools` logger to + # ERROR on CLI startup, which would silently swallow every trace we emit. + # Force this module's own logger back to INFO so the trace is visible in + # agent.log regardless of quiet-mode. Scoped to the opt-in case only. + logger.setLevel(logging.INFO) + # Thread-local activity callback. The agent sets this before a tool call so # long-running _wait_for_process loops can report liveness to the gateway. _activity_callback_local = threading.local() @@ -413,6 +426,13 @@ class BaseEnvironment(ABC): Fires the ``activity_callback`` (if set on this instance) every 10s while the process is running so the gateway's inactivity timeout doesn't kill long-running commands. + + Also wraps the poll loop in a ``try/except`` that guarantees we + call ``self._kill_process(proc)`` if we exit via ``KeyboardInterrupt`` + or ``SystemExit``. Without this, the local backend (which spawns + subprocesses with ``os.setsid`` into their own process group) leaves + an orphan with ``PPID=1`` when python is shut down mid-tool (the + ``sleep 300``-survives-SIGTERM bug Physikal and I both hit). """ output_chunks: list[str] = [] @@ -437,28 +457,101 @@ class BaseEnvironment(ABC): "start": _now, } - while proc.poll() is None: - if is_interrupted(): + # --- Debug tracing (opt-in via HERMES_DEBUG_INTERRUPT=1) ------------- + # Captures loop entry/exit, interrupt state changes, and periodic + # heartbeats so we can diagnose "agent never sees the interrupt" + # reports without reproducing locally. 
+ _tid = threading.current_thread().ident + _pid = getattr(proc, "pid", None) + _iter_count = 0 + _last_heartbeat = _now + _last_interrupt_state = False + _cb_was_none = _get_activity_callback() is None + if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] _wait_for_process ENTER tid=%s pid=%s " + "timeout=%ss activity_cb=%s initial_interrupt=%s", + _tid, _pid, timeout, + "set" if not _cb_was_none else "MISSING", + is_interrupted(), + ) + + try: + while proc.poll() is None: + _iter_count += 1 + if is_interrupted(): + if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] _wait_for_process INTERRUPT DETECTED " + "tid=%s pid=%s iter=%d elapsed=%.1fs — killing process group", + _tid, _pid, _iter_count, time.monotonic() - _activity_state["start"], + ) + self._kill_process(proc) + drain_thread.join(timeout=2) + return { + "output": "".join(output_chunks) + "\n[Command interrupted]", + "returncode": 130, + } + if time.monotonic() > deadline: + if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] _wait_for_process TIMEOUT " + "tid=%s pid=%s iter=%d timeout=%ss", + _tid, _pid, _iter_count, timeout, + ) + self._kill_process(proc) + drain_thread.join(timeout=2) + partial = "".join(output_chunks) + timeout_msg = f"\n[Command timed out after {timeout}s]" + return { + "output": partial + timeout_msg + if partial + else timeout_msg.lstrip(), + "returncode": 124, + } + # Periodic activity touch so the gateway knows we're alive + touch_activity_if_due(_activity_state, "terminal command running") + + # Heartbeat every ~30s: proves the loop is alive and reports + # the activity-callback state (thread-local, can get clobbered + # by nested tool calls or executor thread reuse). 
+ if _DEBUG_INTERRUPT and time.monotonic() - _last_heartbeat >= 30.0: + _cb_now_none = _get_activity_callback() is None + logger.info( + "[interrupt-debug] _wait_for_process HEARTBEAT " + "tid=%s pid=%s iter=%d elapsed=%.0fs " + "interrupt=%s activity_cb=%s%s", + _tid, _pid, _iter_count, + time.monotonic() - _activity_state["start"], + is_interrupted(), + "set" if not _cb_now_none else "MISSING", + " (LOST during run)" if _cb_now_none and not _cb_was_none else "", + ) + _last_heartbeat = time.monotonic() + _cb_was_none = _cb_now_none + + time.sleep(0.2) + except (KeyboardInterrupt, SystemExit): + # Signal arrived (SIGTERM/SIGHUP/SIGINT) or sys.exit() was called + # while we were polling. The local backend spawns subprocesses + # with os.setsid, which puts them in their own process group — so + # if we let the interrupt propagate without killing the child, + # python exits and the child is reparented to init (PPID=1) and + # keeps running as an orphan. Killing the process group here + # guarantees the tool's side effects stop when the agent stops. 
+ if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] _wait_for_process EXCEPTION_EXIT " + "tid=%s pid=%s iter=%d elapsed=%.1fs — killing subprocess group before re-raise", + _tid, _pid, _iter_count, + time.monotonic() - _activity_state["start"], + ) + try: self._kill_process(proc) drain_thread.join(timeout=2) - return { - "output": "".join(output_chunks) + "\n[Command interrupted]", - "returncode": 130, - } - if time.monotonic() > deadline: - self._kill_process(proc) - drain_thread.join(timeout=2) - partial = "".join(output_chunks) - timeout_msg = f"\n[Command timed out after {timeout}s]" - return { - "output": partial + timeout_msg - if partial - else timeout_msg.lstrip(), - "returncode": 124, - } - # Periodic activity touch so the gateway knows we're alive - touch_activity_if_due(_activity_state, "terminal command running") - time.sleep(0.2) + except Exception: + pass # cleanup is best-effort + raise drain_thread.join(timeout=5) @@ -467,6 +560,15 @@ class BaseEnvironment(ABC): except Exception: pass + if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] _wait_for_process EXIT (natural) " + "tid=%s pid=%s iter=%d elapsed=%.1fs returncode=%s", + _tid, _pid, _iter_count, + time.monotonic() - _activity_state["start"], + proc.returncode, + ) + return {"output": "".join(output_chunks), "returncode": proc.returncode} def _kill_process(self, proc: ProcessHandle): diff --git a/tools/interrupt.py b/tools/interrupt.py index 9bc8b83ae4f..ac784332f91 100644 --- a/tools/interrupt.py +++ b/tools/interrupt.py @@ -14,8 +14,23 @@ Usage in tools: return {"output": "[interrupted]", "returncode": 130} """ +import logging +import os import threading +logger = logging.getLogger(__name__) + +# Opt-in debug tracing — pairs with HERMES_DEBUG_INTERRUPT in +# tools/environments/base.py. Enables per-call logging of set/check so the +# caller thread, target thread, and current state are visible when +# diagnosing "interrupt signaled but tool never saw it" reports. 
+_DEBUG_INTERRUPT = bool(os.getenv("HERMES_DEBUG_INTERRUPT")) + +if _DEBUG_INTERRUPT: + # AIAgent's quiet_mode path forces `tools` logger to ERROR on CLI startup. + # Force our own logger back to INFO so the trace is visible in agent.log. + logger.setLevel(logging.INFO) + # Set of thread idents that have been interrupted. _interrupted_threads: set[int] = set() _lock = threading.Lock() @@ -35,6 +50,13 @@ def set_interrupt(active: bool, thread_id: int | None = None) -> None: _interrupted_threads.add(tid) else: _interrupted_threads.discard(tid) + _snapshot = set(_interrupted_threads) if _DEBUG_INTERRUPT else None + if _DEBUG_INTERRUPT: + logger.info( + "[interrupt-debug] set_interrupt(active=%s, target_tid=%s) " + "called_from_tid=%s current_set=%s", + active, tid, threading.current_thread().ident, _snapshot, + ) def is_interrupted() -> bool: From c5c0bb9a732c11b786e1595af98d5faa06048899 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:16:33 -0700 Subject: [PATCH 002/143] fix: point optional-dep install hints at the venv's python (#11938) Error messages that tell users to install optional extras now use {sys.executable} -m pip install ... instead of a bare 'pip install hermes-agent[extra]' string. Under the curl installer, bare 'pip' resolves to system pip, which either fails with PEP 668 externally-managed-environment or installs into the wrong Python. Affects: hermes dashboard, hermes web server startup, mcp_serve, hermes doctor Bedrock check, CLI voice mode, voice_mode tool runtime error, Discord voice-channel join failure message. 
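
Illustrative sketch of the hint pattern (not part of this patch). It assumes the goal is only to name the running interpreter in the message; `pip_install_hint` is a hypothetical helper, and the patch itself inlines the f-string at each call site instead.

```python
import shlex
import sys


def pip_install_hint(*packages: str) -> str:
    """Build an install hint that targets the interpreter running this process.

    Hypothetical helper for illustration only. Using sys.executable means the
    hint points at the venv's python rather than whatever `pip` is on PATH.
    """
    # Quote each requirement so extras such as uvicorn[standard] survive
    # copy-paste into a shell.
    quoted = " ".join(shlex.quote(pkg) for pkg in packages)
    return f"{sys.executable} -m pip install {quoted}"


print(pip_install_hint("sounddevice", "numpy"))
```

A call site could then raise `RuntimeError("Voice mode requires sounddevice and numpy.\n" + "Install with: " + pip_install_hint("sounddevice", "numpy"))`, which keeps the hint wording in one place.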
--- cli.py | 6 ++---- gateway/run.py | 3 +-- hermes_cli/doctor.py | 4 ++-- hermes_cli/main.py | 2 +- hermes_cli/web_server.py | 2 +- mcp_serve.py | 4 ++-- tools/voice_mode.py | 4 ++-- 7 files changed, 11 insertions(+), 14 deletions(-) diff --git a/cli.py b/cli.py index 2456c7754b2..ea76991acc3 100644 --- a/cli.py +++ b/cli.py @@ -7017,8 +7017,7 @@ class HermesCLI: ) raise RuntimeError( "Voice mode requires sounddevice and numpy.\n" - "Install with: pip install sounddevice numpy\n" - "Or: pip install hermes-agent[voice]" + f"Install with: {sys.executable} -m pip install sounddevice numpy" ) if not reqs.get("stt_available", reqs.get("stt_key_set")): raise RuntimeError( @@ -7294,8 +7293,7 @@ class HermesCLI: _cprint(f" {_DIM}Then install/update the Termux:API Android app for microphone capture{_RST}") _cprint(f" {_BOLD}Option 2: pkg install python-numpy portaudio && python -m pip install sounddevice{_RST}") else: - _cprint(f"\n {_BOLD}Install: pip install {' '.join(reqs['missing_packages'])}{_RST}") - _cprint(f" {_DIM}Or: pip install hermes-agent[voice]{_RST}") + _cprint(f"\n {_BOLD}Install: {sys.executable} -m pip install {' '.join(reqs['missing_packages'])}{_RST}") return with self._voice_lock: diff --git a/gateway/run.py b/gateway/run.py index b3270d95827..e09dbde2654 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -5520,8 +5520,7 @@ class GatewayRunner: if "pynacl" in err_lower or "nacl" in err_lower or "davey" in err_lower: return ( "Voice dependencies are missing (PyNaCl / davey). " - "Install or reinstall Hermes with the messaging extra, e.g. " - "`pip install hermes-agent[messaging]`." 
+ f"Install with: `{sys.executable} -m pip install PyNaCl`" ) return f"Failed to join voice channel: {e}" diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py index 28c4af1fa8a..4138aeaa278 100644 --- a/hermes_cli/doctor.py +++ b/hermes_cli/doctor.py @@ -895,8 +895,8 @@ def run_doctor(args): _model_count = len(_br_resp.get("modelSummaries", [])) print(f"\r {color('✓', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)} ") except ImportError: - print(f"\r {color('⚠', Colors.YELLOW)} {_label} {color('(boto3 not installed — pip install hermes-agent[bedrock])', Colors.DIM)} ") - issues.append("Install boto3 for Bedrock: pip install hermes-agent[bedrock]") + print(f"\r {color('⚠', Colors.YELLOW)} {_label} {color(f'(boto3 not installed — {sys.executable} -m pip install boto3)', Colors.DIM)} ") + issues.append(f"Install boto3 for Bedrock: {sys.executable} -m pip install boto3") except Exception as _e: _err_name = type(_e).__name__ print(f"\r {color('⚠', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)} ") diff --git a/hermes_cli/main.py b/hermes_cli/main.py index e2e2a774f5a..81b27e4a100 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -6029,7 +6029,7 @@ def cmd_dashboard(args): import uvicorn # noqa: F401 except ImportError: print("Web UI dependencies not installed.") - print("Install them with: pip install hermes-agent[web]") + print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'") sys.exit(1) if not _build_web_ui(PROJECT_ROOT / "web", fatal=True): diff --git a/hermes_cli/web_server.py b/hermes_cli/web_server.py index e5f2eb53767..0d0dc4a66b5 100644 --- a/hermes_cli/web_server.py +++ b/hermes_cli/web_server.py @@ -56,7 +56,7 @@ try: except ImportError: raise SystemExit( "Web UI requires fastapi and uvicorn.\n" - "Run 'hermes web' to auto-install, or: pip install hermes-agent[web]" + f"Install with: {sys.executable} -m pip install 'fastapi' 
'uvicorn[standard]'" ) WEB_DIST = Path(__file__).parent / "web_dist" diff --git a/mcp_serve.py b/mcp_serve.py index e8294d1f91f..e0aeb706191 100644 --- a/mcp_serve.py +++ b/mcp_serve.py @@ -433,7 +433,7 @@ def create_mcp_server(event_bridge: Optional[EventBridge] = None) -> "FastMCP": if not _MCP_SERVER_AVAILABLE: raise ImportError( "MCP server requires the 'mcp' package. " - "Install with: pip install 'hermes-agent[mcp]'" + f"Install with: {sys.executable} -m pip install 'mcp'" ) mcp = FastMCP( @@ -838,7 +838,7 @@ def run_mcp_server(verbose: bool = False) -> None: if not _MCP_SERVER_AVAILABLE: print( "Error: MCP server requires the 'mcp' package.\n" - "Install with: pip install 'hermes-agent[mcp]'", + f"Install with: {sys.executable} -m pip install 'mcp'", file=sys.stderr, ) sys.exit(1) diff --git a/tools/voice_mode.py b/tools/voice_mode.py index 50515fc6903..66ecb242c67 100644 --- a/tools/voice_mode.py +++ b/tools/voice_mode.py @@ -15,6 +15,7 @@ import platform import re import shutil import subprocess +import sys import tempfile import threading import time @@ -582,8 +583,7 @@ class AudioRecorder: except (ImportError, OSError) as e: raise RuntimeError( "Voice mode requires sounddevice and numpy.\n" - "Install with: pip install sounddevice numpy\n" - "Or: pip install hermes-agent[voice]" + f"Install with: {sys.executable} -m pip install sounddevice numpy" ) from e with self._lock: From 45acd9beb571d0cba4ea38662b0daaac642ea3fb Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:17:33 -0700 Subject: [PATCH 003/143] fix(gateway): ignore redelivered /restart after PTB offset ACK fails (#11940) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a Telegram /restart fires and PTB's graceful-shutdown `get_updates` ACK call times out ("When polling for updates is restarted, updates may be received twice" in gateway.log), the new gateway receives the same /restart again 
and restarts a second time — a self-perpetuating loop. Record the triggering update_id in `.restart_last_processed.json` when handling /restart. On the next process, reject a /restart whose update_id <= the recorded one as a stale redelivery. 5-minute staleness guard so an orphaned marker can't block a legitimately new /restart. - gateway/platforms/base.py: add `platform_update_id` to MessageEvent - gateway/platforms/telegram.py: propagate `update.update_id` through _build_message_event for text/command/location/media handlers - gateway/run.py: write dedup marker in _handle_restart_command; _is_stale_restart_redelivery checks it before processing /restart - tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering fresh restart, redelivery, staleness window, cross-platform, malformed-marker resilience, and no-update_id (CLI) bypass Only active for Telegram today (the one platform with monotonic cross-session update ordering); other platforms return False from _is_stale_restart_redelivery and proceed normally. --- gateway/platforms/base.py | 9 + gateway/platforms/telegram.py | 24 +- gateway/run.py | 92 +++++++ .../gateway/test_restart_redelivery_dedup.py | 247 ++++++++++++++++++ 4 files changed, 366 insertions(+), 6 deletions(-) create mode 100644 tests/gateway/test_restart_redelivery_dedup.py diff --git a/gateway/platforms/base.py b/gateway/platforms/base.py index af694a5e2d6..f82b1fa0683 100644 --- a/gateway/platforms/base.py +++ b/gateway/platforms/base.py @@ -669,6 +669,15 @@ class MessageEvent: # Original platform data raw_message: Any = None message_id: Optional[str] = None + + # Platform-specific update identifier. For Telegram this is the + # ``update_id`` from the PTB Update wrapper; other platforms currently + # ignore it. 
Used by ``/restart`` to record the triggering update so the + # new gateway can advance the Telegram offset past it and avoid processing + # the same ``/restart`` twice if PTB's graceful-shutdown ACK times out + # ("Error while calling `get_updates` one more time to mark all fetched + # updates" in gateway.log). + platform_update_id: Optional[int] = None # Media attachments # media_urls: local file paths (for vision tool access) diff --git a/gateway/platforms/telegram.py b/gateway/platforms/telegram.py index 5b1fef1337b..8df05268c71 100644 --- a/gateway/platforms/telegram.py +++ b/gateway/platforms/telegram.py @@ -2326,7 +2326,7 @@ class TelegramAdapter(BasePlatformAdapter): if not self._should_process_message(update.message): return - event = self._build_message_event(update.message, MessageType.TEXT) + event = self._build_message_event(update.message, MessageType.TEXT, update_id=update.update_id) event.text = self._clean_bot_trigger_text(event.text) self._enqueue_text_event(event) @@ -2337,7 +2337,7 @@ class TelegramAdapter(BasePlatformAdapter): if not self._should_process_message(update.message, is_command=True): return - event = self._build_message_event(update.message, MessageType.COMMAND) + event = self._build_message_event(update.message, MessageType.COMMAND, update_id=update.update_id) await self.handle_message(event) async def _handle_location_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None: @@ -2373,7 +2373,7 @@ class TelegramAdapter(BasePlatformAdapter): parts.append(f"Map: https://www.google.com/maps/search/?api=1&query={lat},{lon}") parts.append("Ask what they'd like to find nearby (restaurants, cafes, etc.) 
and any preferences.") - event = self._build_message_event(msg, MessageType.LOCATION) + event = self._build_message_event(msg, MessageType.LOCATION, update_id=update.update_id) event.text = "\n".join(parts) await self.handle_message(event) @@ -2524,7 +2524,7 @@ class TelegramAdapter(BasePlatformAdapter): else: msg_type = MessageType.DOCUMENT - event = self._build_message_event(msg, msg_type) + event = self._build_message_event(msg, msg_type, update_id=update.update_id) # Add caption as text if msg.caption: @@ -2863,8 +2863,19 @@ class TelegramAdapter(BasePlatformAdapter): self.name, cache_key, thread_id, ) - def _build_message_event(self, message: Message, msg_type: MessageType) -> MessageEvent: - """Build a MessageEvent from a Telegram message.""" + def _build_message_event( + self, + message: Message, + msg_type: MessageType, + update_id: Optional[int] = None, + ) -> MessageEvent: + """Build a MessageEvent from a Telegram message. + + ``update_id`` is the ``Update.update_id`` from PTB; passing it through + lets ``/restart`` record the triggering offset so the new gateway + process can advance past it (prevents ``/restart`` being re-delivered + when PTB's graceful-shutdown ACK fails). 
+ """ chat = message.chat user = message.from_user @@ -2943,6 +2954,7 @@ class TelegramAdapter(BasePlatformAdapter): source=source, raw_message=message, message_id=str(message.message_id), + platform_update_id=update_id, reply_to_message_id=reply_to_id, reply_to_text=reply_to_text, auto_skill=topic_skill, diff --git a/gateway/run.py b/gateway/run.py index e09dbde2654..62b813f0d6b 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -4738,6 +4738,26 @@ class GatewayRunner: async def _handle_restart_command(self, event: MessageEvent) -> str: """Handle /restart command - drain active work, then restart the gateway.""" + # Defensive idempotency check: if the previous gateway process + # recorded this same /restart (same platform + update_id) and the new + # process is seeing it *again*, this is a re-delivery caused by PTB's + # graceful-shutdown `get_updates` ACK failing on the way out ("Error + # while calling `get_updates` one more time to mark all fetched + # updates. Suppressing error to ensure graceful shutdown. When + # polling for updates is restarted, updates may be received twice." + # in gateway.log). Ignoring the stale redelivery prevents a + # self-perpetuating restart loop where every fresh gateway + # re-processes the same /restart command and immediately restarts + # again. + if self._is_stale_restart_redelivery(event): + logger.info( + "Ignoring redelivered /restart (platform=%s, update_id=%s) — " + "already processed by a previous gateway instance.", + event.source.platform.value if event.source and event.source.platform else "?", + event.platform_update_id, + ) + return "" + if self._restart_requested or self._draining: count = self._running_agent_count() if count: @@ -4760,6 +4780,26 @@ class GatewayRunner: except Exception as e: logger.debug("Failed to write restart notify file: %s", e) + # Record the triggering platform + update_id in a dedicated dedup + # marker. 
Unlike .restart_notify.json (which gets unlinked once the + # new gateway sends the "gateway restarted" notification), this + # marker persists so the new gateway can still detect a delayed + # /restart redelivery from Telegram. Overwritten on every /restart. + try: + import json as _json + import time as _time + dedup_data = { + "platform": event.source.platform.value if event.source.platform else None, + "requested_at": _time.time(), + } + if event.platform_update_id is not None: + dedup_data["update_id"] = event.platform_update_id + (_hermes_home / ".restart_last_processed.json").write_text( + _json.dumps(dedup_data) + ) + except Exception as e: + logger.debug("Failed to write restart dedup marker: %s", e) + active_agents = self._running_agent_count() # When running under a service manager (systemd/launchd), use the # service restart path: exit with code 75 so the service manager @@ -4775,6 +4815,58 @@ class GatewayRunner: return f"⏳ Draining {active_agents} active agent(s) before restart..." return "♻ Restarting gateway. If you aren't notified within 60 seconds, restart from the console with `hermes gateway restart`." + def _is_stale_restart_redelivery(self, event: MessageEvent) -> bool: + """Return True if this /restart is a Telegram re-delivery we already handled. + + The previous gateway wrote ``.restart_last_processed.json`` with the + triggering platform + update_id when it processed the /restart. If + we now see a /restart on the same platform with an update_id <= that + recorded value AND the marker is recent (< 5 minutes), it's a + redelivery and should be ignored. + + Only applies to Telegram today (the only platform that exposes a + numeric cross-session update ordering); other platforms return False. 
+ """ + if event is None or event.source is None: + return False + if event.platform_update_id is None: + return False + if event.source.platform is None: + return False + # Only Telegram populates platform_update_id currently; be explicit + # so future platforms aren't accidentally gated by this check. + try: + platform_value = event.source.platform.value + except Exception: + return False + if platform_value != "telegram": + return False + + try: + import json as _json + import time as _time + marker_path = _hermes_home / ".restart_last_processed.json" + if not marker_path.exists(): + return False + data = _json.loads(marker_path.read_text()) + except Exception: + return False + + if data.get("platform") != platform_value: + return False + recorded_uid = data.get("update_id") + if not isinstance(recorded_uid, int): + return False + # Staleness guard: ignore markers older than 5 minutes. A legitimately + # old marker (e.g. crash recovery where notify never fired) should not + # swallow a fresh /restart from the user. + requested_at = data.get("requested_at") + if isinstance(requested_at, (int, float)): + if _time.time() - requested_at > 300: + return False + return event.platform_update_id <= recorded_uid + + async def _handle_help_command(self, event: MessageEvent) -> str: """Handle /help command - list available commands.""" from hermes_cli.commands import gateway_help_lines diff --git a/tests/gateway/test_restart_redelivery_dedup.py b/tests/gateway/test_restart_redelivery_dedup.py new file mode 100644 index 00000000000..aa4e4330caf --- /dev/null +++ b/tests/gateway/test_restart_redelivery_dedup.py @@ -0,0 +1,247 @@ +"""Tests for /restart idempotency guard against Telegram update re-delivery. + +When PTB's graceful-shutdown ACK call (the final `get_updates` on exit) fails +with a network error, Telegram re-delivers the `/restart` message to the new +gateway process. 
Without a dedup guard, the new gateway would process +`/restart` again and immediately restart — a self-perpetuating loop. +""" +import asyncio +import json +import time +from unittest.mock import MagicMock + +import pytest + +import gateway.run as gateway_run +from gateway.platforms.base import MessageEvent, MessageType +from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source + + +def _make_restart_event(update_id: int | None = 100) -> MessageEvent: + return MessageEvent( + text="/restart", + message_type=MessageType.TEXT, + source=make_restart_source(), + message_id="m1", + platform_update_id=update_id, + ) + + +@pytest.mark.asyncio +async def test_restart_handler_writes_dedup_marker_with_update_id(tmp_path, monkeypatch): + """First /restart writes .restart_last_processed.json with the triggering update_id.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + event = _make_restart_event(update_id=12345) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + marker_path = tmp_path / ".restart_last_processed.json" + assert marker_path.exists() + data = json.loads(marker_path.read_text()) + assert data["platform"] == "telegram" + assert data["update_id"] == 12345 + assert isinstance(data["requested_at"], (int, float)) + + +@pytest.mark.asyncio +async def test_redelivered_restart_with_same_update_id_is_ignored(tmp_path, monkeypatch): + """A /restart with update_id <= recorded marker is silently ignored as a redelivery.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + # Previous gateway recorded update_id=12345 a few seconds ago + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 
12345, + "requested_at": time.time() - 5, + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock() + + event = _make_restart_event(update_id=12345) # same update_id → redelivery + result = await runner._handle_restart_command(event) + + assert result == "" # silently ignored + runner.request_restart.assert_not_called() + + +@pytest.mark.asyncio +async def test_redelivered_restart_with_older_update_id_is_ignored(tmp_path, monkeypatch): + """update_id strictly LESS than the recorded one is also a redelivery.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 12345, + "requested_at": time.time() - 5, + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock() + + event = _make_restart_event(update_id=12344) # older update — shouldn't happen, + # but if Telegram does re-deliver + # something older, treat as stale + result = await runner._handle_restart_command(event) + + assert result == "" + runner.request_restart.assert_not_called() + + +@pytest.mark.asyncio +async def test_fresh_restart_with_higher_update_id_is_processed(tmp_path, monkeypatch): + """A NEW /restart from the user (higher update_id) bypasses the dedup guard.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + # Previous restart recorded update_id=12345 + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 12345, + "requested_at": time.time() - 5, + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + event = _make_restart_event(update_id=12346) # strictly higher → fresh + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" 
in result + runner.request_restart.assert_called_once() + + # Marker is overwritten with the new update_id + data = json.loads(marker.read_text()) + assert data["update_id"] == 12346 + + +@pytest.mark.asyncio +async def test_stale_marker_older_than_5min_does_not_block(tmp_path, monkeypatch): + """A marker older than the 5-minute window is ignored — fresh /restart proceeds.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 12345, + "requested_at": time.time() - 600, # 10 minutes ago + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + # Same update_id as the stale marker, but the marker is too old to trust + event = _make_restart_event(update_id=12345) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + runner.request_restart.assert_called_once() + + +@pytest.mark.asyncio +async def test_no_marker_file_allows_restart(tmp_path, monkeypatch): + """Clean gateway start (no prior marker) processes /restart normally.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + event = _make_restart_event(update_id=100) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + runner.request_restart.assert_called_once() + + +@pytest.mark.asyncio +async def test_corrupt_marker_file_is_treated_as_absent(tmp_path, monkeypatch): + """Malformed JSON in the marker file doesn't crash — /restart proceeds.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + marker = tmp_path / ".restart_last_processed.json" + 
marker.write_text("not-json{") + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + event = _make_restart_event(update_id=100) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + runner.request_restart.assert_called_once() + + +@pytest.mark.asyncio +async def test_event_without_update_id_bypasses_dedup(tmp_path, monkeypatch): + """Events with no platform_update_id (non-Telegram, CLI fallback) aren't gated.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 999999, + "requested_at": time.time(), + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + # No update_id — the dedup check should NOT kick in + event = _make_restart_event(update_id=None) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + runner.request_restart.assert_called_once() + + +@pytest.mark.asyncio +async def test_different_platform_bypasses_dedup(tmp_path, monkeypatch): + """Marker from Telegram doesn't block a /restart from another platform.""" + from gateway.config import Platform + from gateway.session import SessionSource + + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + monkeypatch.delenv("INVOCATION_ID", raising=False) + + marker = tmp_path / ".restart_last_processed.json" + marker.write_text(json.dumps({ + "platform": "telegram", + "update_id": 12345, + "requested_at": time.time(), + })) + + runner, _adapter = make_restart_runner() + runner.request_restart = MagicMock(return_value=True) + + # /restart from Discord — not a redelivery candidate + discord_source = SessionSource( + platform=Platform.DISCORD, + chat_id="discord-chan", + chat_type="dm", + user_id="u1", + ) + event = 
MessageEvent( + text="/restart", + message_type=MessageType.TEXT, + source=discord_source, + message_id="m1", + platform_update_id=12345, + ) + result = await runner._handle_restart_command(event) + + assert "Restarting gateway" in result + runner.request_restart.assert_called_once() From 11a89cc032b20f75e5273f98e9a02dcaf06ce573 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:22:11 -0700 Subject: [PATCH 004/143] docs: backfill coverage for recently-merged features (#11942) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fills documentation gaps that accumulated as features merged ahead of their docs updates. All additions are verified against code and the originating PRs. Providers: - Ollama Cloud (#10782) — new provider section, env vars, quickstart/fallback rows - xAI Grok Responses API + TTS (#10783) — provider note, TTS table + config - Google Gemini CLI OAuth (#11270) — quickstart/fallback/cli-commands entries - NVIDIA NIM (#11774) — NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference - HERMES_INFERENCE_PROVIDER enum updated Messaging: - DISCORD_ALLOWED_ROLES (#11608) — env-vars, discord.md access control section - DingTalk QR device-flow (#11574) — wizard path in Option A + openClaw disclosure - Feishu document comment intelligent reply (#11898) — full section + 3-tier access control + CLI Skills / commands: - concept-diagrams skill (#11363) — optional-skills-catalog entry - /gquota (#11270) — slash-commands reference Build: docusaurus build passes, ascii-guard lint 0 errors. 
--- website/docs/getting-started/quickstart.md | 3 ++ website/docs/integrations/providers.md | 30 +++++++++++- website/docs/reference/cli-commands.md | 2 +- .../docs/reference/environment-variables.md | 9 +++- .../docs/reference/optional-skills-catalog.md | 1 + website/docs/reference/slash-commands.md | 1 + .../user-guide/features/fallback-providers.md | 3 ++ website/docs/user-guide/features/tts.md | 10 +++- website/docs/user-guide/messaging/dingtalk.md | 9 +++- website/docs/user-guide/messaging/discord.md | 23 ++++++++- website/docs/user-guide/messaging/feishu.md | 48 +++++++++++++++++++ 11 files changed, 132 insertions(+), 7 deletions(-) diff --git a/website/docs/getting-started/quickstart.md b/website/docs/getting-started/quickstart.md index 428d23b7ce3..77d6ac84904 100644 --- a/website/docs/getting-started/quickstart.md +++ b/website/docs/getting-started/quickstart.md @@ -62,6 +62,9 @@ hermes setup # Or configure everything at once | **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` | | **DeepSeek** | Direct DeepSeek API access | Set `DEEPSEEK_API_KEY` | | **NVIDIA NIM** | Nemotron models via build.nvidia.com or local NIM | Set `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) | +| **Ollama Cloud** | Managed Ollama catalog without local GPU | Set `OLLAMA_API_KEY` (or pick **Ollama Cloud** in `hermes model`) | +| **Google Gemini (OAuth)** | Gemini via Cloud Code Assist — free and paid tiers | OAuth via `hermes model` (optional: `HERMES_GEMINI_PROJECT_ID` for paid tiers) | +| **xAI (Grok)** | Grok 4 models via Responses API + prompt caching | Set `XAI_API_KEY` (alias: `grok`) | | **GitHub Copilot** | GitHub Copilot subscription (GPT-5.x, Claude, Gemini, etc.) 
| OAuth via `hermes model`, or `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` | | **GitHub Copilot ACP** | Copilot ACP agent backend (spawns local `copilot` CLI) | `hermes model` (requires `copilot` CLI + `copilot login`) | | **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` | diff --git a/website/docs/integrations/providers.md b/website/docs/integrations/providers.md index 750ad671cda..56d2f0ea38d 100644 --- a/website/docs/integrations/providers.md +++ b/website/docs/integrations/providers.md @@ -289,12 +289,40 @@ Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_ When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically. ::: -### xAI (Grok) Prompt Caching +### xAI (Grok) — Responses API + Prompt Caching + +xAI is wired through the Responses API (`codex_responses` transport) for automatic reasoning support on Grok 4 models — no `reasoning_effort` parameter needed, the server reasons by default. Set `XAI_API_KEY` in `~/.hermes/.env` and pick xAI in `hermes model`, or drop `grok` as a shortcut into `/model grok-4-1-fast-reasoning`. When using xAI as a provider (any base URL containing `x.ai`), Hermes automatically enables prompt caching by sending the `x-grok-conv-id` header with every API request. This routes requests to the same server within a conversation session, allowing xAI's infrastructure to reuse cached system prompts and conversation history. No configuration is needed — caching activates automatically when an xAI endpoint is detected and a session ID is available. This reduces latency and cost for multi-turn conversations. +xAI also ships a dedicated TTS endpoint (`/v1/tts`). 
Select **xAI TTS** in `hermes tools` → Voice & TTS, or see the [Voice & TTS](../user-guide/features/tts.md#text-to-speech) page for config. + +### Ollama Cloud — Managed Ollama Models, OAuth + API Key + +[Ollama Cloud](https://ollama.com/cloud) hosts the same open-weight catalog as local Ollama but without the GPU requirement. Pick it in `hermes model` as **Ollama Cloud**, paste your API key from [ollama.com/settings/keys](https://ollama.com/settings/keys), and Hermes auto-discovers the available models. + +```bash +hermes model +# → pick "Ollama Cloud" +# → paste your OLLAMA_API_KEY +# → select from discovered models (gpt-oss:120b, glm-4.6:cloud, qwen3-coder:480b-cloud, etc.) +``` + +Or `config.yaml` directly: +```yaml +model: + provider: "ollama-cloud" + default: "gpt-oss:120b" +``` + +The model catalog is fetched dynamically from `ollama.com/v1/models` and cached for one hour. `model:tag` notation (e.g. `qwen3-coder:480b-cloud`) is preserved through normalization — don't use dashes. + +:::tip Ollama Cloud vs local Ollama +Both speak the same OpenAI-compatible API. Cloud is a first-class provider (`--provider ollama-cloud`, `OLLAMA_API_KEY`); local Ollama is reached via the Custom Endpoint flow (base URL `http://localhost:11434/v1`, no key). Use cloud for large models you can't run locally; use local for privacy or offline work. +::: + ### NVIDIA NIM Nemotron and other open source models via [build.nvidia.com](https://build.nvidia.com) (free API key) or a local NIM endpoint. diff --git a/website/docs/reference/cli-commands.md b/website/docs/reference/cli-commands.md index 6b08552676e..ea5557a193d 100644 --- a/website/docs/reference/cli-commands.md +++ b/website/docs/reference/cli-commands.md @@ -85,7 +85,7 @@ Common options: | `-q`, `--query "..."` | One-shot, non-interactive prompt. | | `-m`, `--model ` | Override the model for this run. | | `-t`, `--toolsets ` | Enable a comma-separated set of toolsets. 
| -| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`. | +| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `google-gemini-cli`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`). | | `-s`, `--skills ` | Preload one or more skills for the session (can be repeated or comma-separated). | | `-v`, `--verbose` | Verbose output. | | `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. | diff --git a/website/docs/reference/environment-variables.md b/website/docs/reference/environment-variables.md index ead884ba7b7..ff223739af3 100644 --- a/website/docs/reference/environment-variables.md +++ b/website/docs/reference/environment-variables.md @@ -56,6 +56,12 @@ All variables go in `~/.hermes/.env`. 
You can also set them with `hermes config | `DASHSCOPE_BASE_URL` | Custom DashScope base URL (default: `https://coding-intl.dashscope.aliyuncs.com/v1`) | | `DEEPSEEK_API_KEY` | DeepSeek API key for direct DeepSeek access ([platform.deepseek.com](https://platform.deepseek.com/api_keys)) | | `DEEPSEEK_BASE_URL` | Custom DeepSeek API base URL | +| `NVIDIA_API_KEY` | NVIDIA NIM API key — Nemotron and open models ([build.nvidia.com](https://build.nvidia.com)) | +| `NVIDIA_BASE_URL` | Override NVIDIA base URL (default: `https://integrate.api.nvidia.com/v1`; set to `http://localhost:8000/v1` for a local NIM endpoint) | +| `OLLAMA_API_KEY` | Ollama Cloud API key — managed Ollama catalog without local GPU ([ollama.com/settings/keys](https://ollama.com/settings/keys)) | +| `OLLAMA_BASE_URL` | Override Ollama Cloud base URL (default: `https://ollama.com/v1`) | +| `XAI_API_KEY` | xAI (Grok) API key for chat + TTS ([console.x.ai](https://console.x.ai/)) | +| `XAI_BASE_URL` | Override xAI base URL (default: `https://api.x.ai/v1`) | | `OPENCODE_ZEN_API_KEY` | OpenCode Zen API key — pay-as-you-go access to curated models ([opencode.ai](https://opencode.ai/auth)) | | `OPENCODE_ZEN_BASE_URL` | Override OpenCode Zen base URL | | `OPENCODE_GO_API_KEY` | OpenCode Go API key — $10/month subscription for open models ([opencode.ai](https://opencode.ai/auth)) | @@ -73,7 +79,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe | Variable | Description | |----------|-------------| -| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) | +| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, 
`anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) | | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) | | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL | | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) | @@ -187,6 +193,7 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI | `TELEGRAM_PROXY` | Proxy URL for Telegram connections — overrides `HTTPS_PROXY`. Supports `http://`, `https://`, `socks5://` | | `DISCORD_BOT_TOKEN` | Discord bot token | | `DISCORD_ALLOWED_USERS` | Comma-separated Discord user IDs allowed to use the bot | +| `DISCORD_ALLOWED_ROLES` | Comma-separated Discord role IDs allowed to use the bot (OR with `DISCORD_ALLOWED_USERS`). Auto-enables the Members intent. Useful when moderation teams churn — role grants propagate automatically. | | `DISCORD_HOME_CHANNEL` | Default Discord channel for cron delivery | | `DISCORD_HOME_CHANNEL_NAME` | Display name for the Discord home channel | | `DISCORD_REQUIRE_MENTION` | Require an @mention before responding in server channels | diff --git a/website/docs/reference/optional-skills-catalog.md b/website/docs/reference/optional-skills-catalog.md index 18ec4b3810b..6fde99b5ee8 100644 --- a/website/docs/reference/optional-skills-catalog.md +++ b/website/docs/reference/optional-skills-catalog.md @@ -54,6 +54,7 @@ hermes skills uninstall | Skill | Description | |-------|-------------| | **blender-mcp** | Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. 
| +| **concept-diagrams** | Generate flat, minimal light/dark-aware SVG diagrams as standalone HTML files, using a unified educational visual language (9 semantic color ramps, automatic dark mode). Best for physics setups, chemistry mechanisms, math curves, physical objects (aircraft, turbines, smartphones), floor plans, cross-sections, lifecycle/process narratives, and hub-spoke system diagrams. Ships with 15 example diagrams. | | **meme-generation** | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual `.png` meme files. | ## DevOps diff --git a/website/docs/reference/slash-commands.md b/website/docs/reference/slash-commands.md index 2ad3c62d81c..214b2866d07 100644 --- a/website/docs/reference/slash-commands.md +++ b/website/docs/reference/slash-commands.md @@ -83,6 +83,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/image ` | Attach a local image file for your next prompt. | | `/debug` | Upload debug report (system info + logs) and get shareable links. Also available in messaging. | | `/profile` | Show active profile name and home directory | +| `/gquota` | Show Google Gemini Code Assist quota usage with progress bars (only available when the `google-gemini-cli` provider is active). | ### Exit diff --git a/website/docs/user-guide/features/fallback-providers.md b/website/docs/user-guide/features/fallback-providers.md index 12fde185d46..8d16079c2e5 100644 --- a/website/docs/user-guide/features/fallback-providers.md +++ b/website/docs/user-guide/features/fallback-providers.md @@ -48,6 +48,9 @@ Both `provider` and `model` are **required**. 
If either is missing, the fallback | MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` | | DeepSeek | `deepseek` | `DEEPSEEK_API_KEY` | | NVIDIA NIM | `nvidia` | `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) | +| Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | +| Google Gemini (OAuth) | `google-gemini-cli` | `hermes model` (Google OAuth; optional: `HERMES_GEMINI_PROJECT_ID`) | +| xAI (Grok) | `xai` (alias `grok`) | `XAI_API_KEY` (optional: `XAI_BASE_URL`) | | OpenCode Zen | `opencode-zen` | `OPENCODE_ZEN_API_KEY` | | OpenCode Go | `opencode-go` | `OPENCODE_GO_API_KEY` | | Kilo Code | `kilocode` | `KILOCODE_API_KEY` | diff --git a/website/docs/user-guide/features/tts.md b/website/docs/user-guide/features/tts.md index 9b0fe8b3afc..9f9d257fcc4 100644 --- a/website/docs/user-guide/features/tts.md +++ b/website/docs/user-guide/features/tts.md @@ -24,6 +24,7 @@ Convert text to speech with seven providers: | **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` | | **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` | | **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` | +| **xAI TTS** | Excellent | Paid | `XAI_API_KEY` | | **NeuTTS** | Good | Free | None needed | ### Platform Delivery @@ -40,7 +41,7 @@ Convert text to speech with seven providers: ```yaml # In ~/.hermes/config.yaml tts: - provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "neutts" + provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" speed: 1.0 # Global speed multiplier (provider-specific settings override this) edge: voice: "en-US-AriaNeural" # 322 voices, 74 languages @@ -65,6 +66,12 @@ tts: gemini: model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc. 
+ xai: + voice_id: "eve" # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts) + language: "en" # ISO 639-1 code + sample_rate: 24000 # 22050 / 24000 (default) / 44100 / 48000 + bit_rate: 128000 # MP3 bitrate; only applies when codec=mp3 + # base_url: "https://api.x.ai/v1" # Override via XAI_BASE_URL env var neutts: ref_audio: '' ref_text: '' @@ -82,6 +89,7 @@ Telegram voice bubbles require Opus/OGG audio format: - **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert: - **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles - **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles +- **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles - **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles ```bash diff --git a/website/docs/user-guide/messaging/dingtalk.md b/website/docs/user-guide/messaging/dingtalk.md index d88c1a952f6..9e8e74ee26f 100644 --- a/website/docs/user-guide/messaging/dingtalk.md +++ b/website/docs/user-guide/messaging/dingtalk.md @@ -100,7 +100,14 @@ Run the guided setup command: hermes gateway setup ``` -Select **DingTalk** when prompted, then paste your Client ID, Client Secret, and allowed user IDs when asked. +Select **DingTalk** when prompted. The setup wizard can authorize via one of two paths: + +- **QR-code device flow (recommended).** Scan the QR that prints in your terminal with the DingTalk mobile app — your Client ID and Client Secret are returned automatically and written to `~/.hermes/.env`. No developer-console trip needed. +- **Manual paste.** If you already have credentials (or QR scanning isn't convenient), paste your Client ID, Client Secret, and allowed user IDs when prompted. 
+ +:::note openClaw branding disclosure +Because DingTalk's `verification_uri_complete` is hardcoded to the openClaw identity at the API layer, the QR currently authorizes under an `openClaw` source string until Alibaba / DingTalk-Real-AI registers a Hermes-specific template server-side. This is purely how DingTalk presents the consent screen — the bot you create is fully yours and private to your tenant. +::: ### Option B: Manual Configuration diff --git a/website/docs/user-guide/messaging/discord.md b/website/docs/user-guide/messaging/discord.md index 233f544d9c6..44e08330dfa 100644 --- a/website/docs/user-guide/messaging/discord.md +++ b/website/docs/user-guide/messaging/discord.md @@ -271,7 +271,8 @@ Discord behavior is controlled through two files: **`~/.hermes/.env`** for crede | Variable | Required | Default | Description | |----------|----------|---------|-------------| | `DISCORD_BOT_TOKEN` | **Yes** | — | Bot token from the [Discord Developer Portal](https://discord.com/developers/applications). | -| `DISCORD_ALLOWED_USERS` | **Yes** | — | Comma-separated Discord user IDs allowed to interact with the bot. Without this, the gateway denies all users. | +| `DISCORD_ALLOWED_USERS` | **Yes** | — | Comma-separated Discord user IDs allowed to interact with the bot. Without this **or** `DISCORD_ALLOWED_ROLES`, the gateway denies all users. | +| `DISCORD_ALLOWED_ROLES` | No | — | Comma-separated Discord role IDs. Any member with one of these roles is authorized — OR semantics with `DISCORD_ALLOWED_USERS`. Auto-enables the **Server Members Intent** on connect. Useful when moderation teams churn: new mods get access as soon as the role is granted, no config push needed. | | `DISCORD_HOME_CHANNEL` | No | — | Channel ID where the bot sends proactive messages (cron output, reminders, notifications). | | `DISCORD_HOME_CHANNEL_NAME` | No | `"Home"` | Display name for the home channel in logs and status output. 
| | `DISCORD_REQUIRE_MENTION` | No | `true` | When `true`, the bot only responds in server channels when `@mentioned`. Set to `false` to respond to all messages in every channel. | @@ -569,9 +570,27 @@ If you intentionally want a shared room conversation, leave it off — just expe ## Security :::warning -Always set `DISCORD_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access. +Always set `DISCORD_ALLOWED_USERS` (or `DISCORD_ALLOWED_ROLES`) to restrict who can interact with the bot. Without either, the gateway denies all users by default as a safety measure. Only authorize people you trust — authorized users have full access to the agent's capabilities, including tool use and system access. ::: +### Role-Based Access Control + +For servers where access is managed by roles instead of individual user lists (moderator teams, support staff, internal tooling), use `DISCORD_ALLOWED_ROLES` — a comma-separated list of role IDs. Any member with one of those roles is authorized. + +```bash +# ~/.hermes/.env — works alongside or instead of DISCORD_ALLOWED_USERS +DISCORD_ALLOWED_ROLES=987654321098765432,876543210987654321 +``` + +Semantics: + +- **OR with user allowlist.** A user is authorized if their ID is in `DISCORD_ALLOWED_USERS` **or** they have any role in `DISCORD_ALLOWED_ROLES`. +- **Server Members Intent auto-enabled.** When `DISCORD_ALLOWED_ROLES` is set, the bot enables the Members intent on connect — required for Discord to send role information with member records. +- **Role IDs, not names.** Grab them from Discord: **User Settings → Advanced → Developer Mode ON**, then right-click any role → **Copy Role ID**. +- **DM fallback.** In DMs the role check scans mutual guilds; a user with an allowed role in any shared server is authorized in DMs too. 
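The semantics above can be read as a small predicate. The following is a minimal sketch under stated assumptions, not the gateway's actual code: `parse_id_list` and `is_authorized` are hypothetical names, IDs are treated as plain strings, and the member's role IDs are assumed to be already available.

```python
def parse_id_list(raw: str) -> set[str]:
    """Split a comma-separated env value into a set of ID strings."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_authorized(user_id: str, member_role_ids: list[str],
                  allowed_users: set[str], allowed_roles: set[str]) -> bool:
    """Hypothetical check mirroring the documented OR semantics."""
    # With neither allowlist configured, everyone is denied.
    if not allowed_users and not allowed_roles:
        return False
    # A listed user ID or any one allowed role is enough.
    return user_id in allowed_users or bool(allowed_roles & set(member_role_ids))


# Example values only; the role IDs are the placeholders used in the .env snippet.
allowed_roles = parse_id_list("987654321098765432,876543210987654321")
print(is_authorized("111", ["987654321098765432"], set(), allowed_roles))  # True
```

Keeping the two allowlists as sets makes the role check a single intersection, which is why a newly granted role takes effect without touching the user list.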
+ +This is the preferred pattern when the moderation team churns — new moderators get access the moment the role is granted, with no `.env` edit or gateway restart. + ### Mention Control By default, Hermes blocks the bot from pinging `@everyone`, `@here`, and role mentions, even if its reply contains those tokens. This prevents a poorly-worded prompt or echoed user content from spamming a whole server. Individual `@user` pings and reply-reference pings (the little "replying to…" chip) stay enabled so normal conversation still works. diff --git a/website/docs/user-guide/messaging/feishu.md b/website/docs/user-guide/messaging/feishu.md index 4d9783d402b..6e9f1d0e7fb 100644 --- a/website/docs/user-guide/messaging/feishu.md +++ b/website/docs/user-guide/messaging/feishu.md @@ -244,6 +244,54 @@ Interactive cards require **three** configuration steps in the Feishu Developer Without all three steps, Feishu will successfully *send* interactive cards (sending only requires `im:message:send` permission), but clicking any button will return error 200340. The card appears to work — the error only surfaces when a user interacts with it. ::: +## Document Comment Intelligent Reply + +Beyond chat, the adapter can also answer `@`-mentions left on **Feishu/Lark documents**. When a user comments on a document (local text selection or whole-doc comment) and @-mentions the bot, Hermes reads the document plus the surrounding comment thread and posts an LLM reply inline on the thread. + +Powered by the `drive.notice.comment_add_v1` event, the handler: + +- Fetches the document content and comment timeline in parallel (20 messages for whole-doc threads, 12 for local-selection threads). +- Runs the agent with the `feishu_doc` + `feishu_drive` toolsets scoped to that single comment session. +- Chunks replies at 4000 chars and posts them back as threaded replies. +- Caches per-document sessions for 1 hour with a 50-message cap so follow-up comments on the same doc keep context. 
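The chunking and session-cache behaviour listed above can be sketched as follows. This is an illustration only, not the adapter's implementation: `chunk_reply` and `SessionCache` are hypothetical names, the split is a plain character slice, and the one-hour TTL is assumed to be measured from the last message on the document.

```python
import time
from collections import deque


def chunk_reply(text: str, limit: int = 4000) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


class SessionCache:
    """Per-document message history with a TTL and a message cap (hypothetical)."""

    def __init__(self, ttl_seconds: float = 3600.0, max_messages: int = 50):
        self._ttl = ttl_seconds
        self._max = max_messages
        self._sessions = {}  # doc_token -> (last_seen, deque of messages)

    def append(self, doc_token: str, message: str, now=None) -> None:
        now = time.monotonic() if now is None else now
        entry = self._sessions.get(doc_token)
        if entry is None or now - entry[0] > self._ttl:
            history = deque(maxlen=self._max)  # new or expired: start fresh
        else:
            history = entry[1]
        history.append(message)  # deque(maxlen=...) drops the oldest entry
        self._sessions[doc_token] = (now, history)

    def history(self, doc_token: str, now=None) -> list[str]:
        now = time.monotonic() if now is None else now
        entry = self._sessions.get(doc_token)
        if entry is None or now - entry[0] > self._ttl:
            return []
        return list(entry[1])
```

The `now` parameter exists only so the expiry logic can be exercised without waiting; a real handler would likely rely on the clock directly.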
+ +### 3-Tier Access Control + +Document-comment replies are **explicit-grant only** — there is no implicit allow-all mode. Permissions resolve in this order (first match wins, per field): + +1. **Exact doc** — rule scoped to a specific document token. +2. **Wildcard** — rule that matches a pattern of docs. +3. **Top-level** — default rule for the workspace. + +Two policies are available per rule: + +- **`allowlist`** — a static list of users / tenants. +- **`pairing`** — static list ∪ runtime-approved store. Useful for rollouts where moderators can grant access live. + +Rules live in `~/.hermes/feishu_comment_rules.json` (pairing grants in `~/.hermes/feishu_comment_pairing.json`) with mtime-cached hot-reload — edits take effect on the next comment event without restarting the gateway. + +CLI: + +```bash +# Inspect current rules and pairing state +python -m gateway.platforms.feishu_comment_rules status + +# Simulate an access check for a specific doc + user +python -m gateway.platforms.feishu_comment_rules check + +# Manage pairing grants at runtime +python -m gateway.platforms.feishu_comment_rules pairing list +python -m gateway.platforms.feishu_comment_rules pairing add +python -m gateway.platforms.feishu_comment_rules pairing remove +``` + +### Required Feishu App Configuration + +On top of the chat/card permissions already granted, add the drive comment event: + +- Subscribe to `drive.notice.comment_add_v1` in **Event Subscriptions**. +- Grant the `docs:doc:readonly` and `drive:drive:readonly` scopes so the handler can read document content. 
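Returning to the resolution order described under 3-Tier Access Control, a rough sketch of "first match wins, per field" might look like this. The rule shapes, the field names `policy` and `allow`, the use of shell-style patterns for the wildcard tier, and the function names are all assumptions for illustration; the real schema of `feishu_comment_rules.json` is not shown here.

```python
from fnmatch import fnmatchcase


def resolve_field(field, doc_token, exact_rules, wildcard_rules, top_level):
    """Return the first value found for `field`: exact doc, then wildcard, then top-level."""
    exact = exact_rules.get(doc_token, {})
    if field in exact:
        return exact[field]
    for pattern, rule in wildcard_rules.items():
        # Assumption: wildcard rules use shell-style patterns.
        if fnmatchcase(doc_token, pattern) and field in rule:
            return rule[field]
    return top_level.get(field)


def is_allowed(user_id, policy, static_allow, paired):
    """Explicit-grant only: unknown policies deny."""
    if policy == "allowlist":
        return user_id in set(static_allow)
    if policy == "pairing":
        # Static list plus runtime-approved grants.
        return user_id in (set(static_allow) | set(paired))
    return False


# Hypothetical rule data.
exact = {"docA": {"policy": "pairing"}}
wild = {"doc*": {"policy": "allowlist", "allow": ["u1"]}}
top = {"policy": "allowlist", "allow": []}

policy = resolve_field("policy", "docA", exact, wild, top)  # from the exact rule
allow = resolve_field("allow", "docA", exact, wild, top)    # falls through to the wildcard rule
print(policy, allow, is_allowed("u2", policy, allow, paired=["u2"]))
```

Resolving each field independently is what lets an exact-doc rule override only the policy while still inheriting the allowlist from a broader rule.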
+ ## Media Support ### Inbound (receiving) From 1c352f6b1d377088b5a3d4310030587a9960a09d Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:23:31 -0700 Subject: [PATCH 005/143] docs(browser): expand Camofox persistence guide with troubleshooting (#11957) The existing 'Persistent browser sessions' section had the correct config snippet but users still hit the flag at the wrong config path, assumed Hermes could force persistence when the server was ephemeral, and had no way to verify the flag was actually taking effect. Adds to that section: - Warning admonition calling out the nested path vs top-level mistake. - Explicit 'What Hermes does / does not do' split so users understand Hermes can only send a stable userId; the Camofox server must map it to a persistent profile. - 5-step verification flow for confirming persistence works end-to-end. - Reminder to restart Hermes after editing config.yaml. - Where Hermes derives the stable userId (~/.hermes/browser_auth/camofox/) so users can reset or back up state. Docs-only change. --- website/docs/user-guide/features/browser.md | 39 +++++++++++++++++++-- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/website/docs/user-guide/features/browser.md b/website/docs/user-guide/features/browser.md index 9880965ae48..42b6815df51 100644 --- a/website/docs/user-guide/features/browser.md +++ b/website/docs/user-guide/features/browser.md @@ -111,16 +111,49 @@ When `CAMOFOX_URL` is set, all browser tools automatically route through Camofox #### Persistent browser sessions -By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. To enable persistent browser sessions: +By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. 
To enable persistent browser sessions, add the following to `~/.hermes/config.yaml`: ```yaml -# In ~/.hermes/config.yaml browser: camofox: managed_persistence: true ``` -When enabled, Hermes sends a stable profile-scoped `userId` to Camofox. The Camofox server automatically maps each `userId` to a dedicated persistent Firefox profile, so cookies, logins, and localStorage survive across restarts. Different Hermes profiles get different browser profiles (profile isolation). +Then fully restart Hermes so the new config is picked up. + +:::warning Nested path matters +Hermes reads `browser.camofox.managed_persistence`, **not** a top-level `managed_persistence`. A common mistake is writing: + +```yaml +# ❌ Wrong — Hermes ignores this +managed_persistence: true +``` + +If the flag is placed at the wrong path, Hermes silently falls back to a random ephemeral `userId` and your login state will be lost on every session. +::: + +##### What Hermes does +- Sends a deterministic profile-scoped `userId` to Camofox so the server can reuse the same Firefox profile across sessions. +- Skips server-side context destruction on cleanup, so cookies and logins survive between agent tasks. +- Scopes the `userId` to the active Hermes profile, so different Hermes profiles get different browser profiles (profile isolation). + +##### What Hermes does not do +- It does not force persistence on the Camofox server. Hermes only sends a stable `userId`; the server must honor it by mapping that `userId` to a persistent Firefox profile directory. +- If your Camofox server build treats every request as ephemeral (e.g. always calls `browser.newContext()` without loading a stored profile), Hermes cannot make those sessions persist. Make sure you are running a Camofox build that implements userId-based profile persistence. + +##### Verify it's working + +1. Start Hermes and your Camofox server. +2. Open Google (or any login site) in a browser task and sign in manually. +3. 
End the browser task normally. +4. Start a new browser task. +5. Open the same site again — you should still be signed in. + +If step 5 logs you out, the Camofox server isn't honoring the stable `userId`. Double-check your config path, confirm you fully restarted Hermes after editing `config.yaml`, and verify your Camofox server version supports persistent per-user profiles. + +##### Where state lives + +Hermes derives the stable `userId` from the profile-scoped directory `~/.hermes/browser_auth/camofox/` (or the equivalent under `$HERMES_HOME` for non-default profiles). The actual browser profile data lives on the Camofox server side, keyed by that `userId`. To fully reset a persistent profile, clear it on the Camofox server and remove the corresponding Hermes profile's state directory. #### VNC live view From 8a59f8a9edcf6a23cffda3377cced2761732e7bc Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:29:24 -0700 Subject: [PATCH 006/143] fix(update): survive mid-update terminal disconnect (#11960) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit hermes update no longer dies when the controlling terminal closes (SSH drop, shell close) during pip install. SIGHUP is set to SIG_IGN for the duration of the update, and stdout/stderr are wrapped so writes to a closed pipe are absorbed instead of cascading into process exit. All update output is mirrored to ~/.hermes/logs/update.log so users can see what happened after reconnecting. SIGINT (Ctrl-C) and SIGTERM (systemd) are intentionally still honored — those are deliberate cancellations, not accidents. In gateway mode the helper is a no-op since the update is already detached. POSIX preserves SIG_IGN across exec(), so pip and git subprocesses inherit hangup protection automatically — no changes to subprocess spawning needed. 
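The exec() inheritance point can be checked with a short POSIX-only sketch. It is illustrative and not part of this patch; `ignore_sighup` and `child_ignores_sighup` are hypothetical helper names, and the probe assumes a child Python interpreter is a fair stand-in for pip or git.

```python
import signal
import subprocess
import sys


def ignore_sighup() -> bool:
    """Set SIGHUP to SIG_IGN; return False where SIGHUP does not exist (e.g. Windows)."""
    if not hasattr(signal, "SIGHUP"):
        return False
    signal.signal(signal.SIGHUP, signal.SIG_IGN)  # must run on the main thread
    return True


def child_ignores_sighup() -> bool:
    """Spawn a child interpreter and report whether it started with SIGHUP ignored."""
    probe = "import signal; print(signal.getsignal(signal.SIGHUP) == signal.SIG_IGN)"
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    if ignore_sighup():
        print("child inherits SIG_IGN:", child_ignores_sighup())
```

Because the ignored disposition is carried across fork and exec, the parent only has to set it once before spawning anything.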
--- hermes_cli/main.py | 195 ++++++++++- .../test_update_hangup_protection.py | 325 ++++++++++++++++++ website/docs/getting-started/updating.md | 15 + 3 files changed, 534 insertions(+), 1 deletion(-) create mode 100644 tests/hermes_cli/test_update_hangup_protection.py diff --git a/hermes_cli/main.py b/hermes_cli/main.py index 81b27e4a100..0afadac3d16 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -4985,8 +4985,187 @@ def _update_node_dependencies() -> None: print(f" {stderr.splitlines()[-1]}") +class _UpdateOutputStream: + """Stream wrapper used during ``hermes update`` to survive terminal loss. + + Wraps the process's original stdout/stderr so that: + + * Every write is also mirrored to an append-only log file + (``~/.hermes/logs/update.log``) that users can inspect after the + terminal disconnects. + * Writes to the original stream that fail with ``BrokenPipeError`` / + ``OSError`` / ``ValueError`` (closed file) no longer cascade into + process exit — the update keeps going, only the on-screen output + stops. + + Combined with ``SIGHUP -> SIG_IGN`` installed by + ``_install_hangup_protection``, this makes ``hermes update`` safe to + run in a plain SSH session that might disconnect mid-install. + """ + + def __init__(self, original, log_file): + self._original = original + self._log = log_file + self._original_broken = False + + def write(self, data): + # Mirror to the log file first — it's the most reliable destination. + if self._log is not None: + try: + self._log.write(data) + except Exception: + # Log errors should never abort the update. + pass + + if self._original_broken: + return len(data) if isinstance(data, (str, bytes)) else 0 + + try: + return self._original.write(data) + except (BrokenPipeError, OSError, ValueError): + # Terminal vanished (SSH disconnect, shell close). Stop trying + # to write to it, but keep the update running. 
+ self._original_broken = True + return len(data) if isinstance(data, (str, bytes)) else 0 + + def flush(self): + if self._log is not None: + try: + self._log.flush() + except Exception: + pass + if self._original_broken: + return + try: + self._original.flush() + except (BrokenPipeError, OSError, ValueError): + self._original_broken = True + + def isatty(self): + if self._original_broken: + return False + try: + return self._original.isatty() + except Exception: + return False + + def fileno(self): + # Some tools probe fileno(); defer to the underlying stream and let + # callers handle failures (same behaviour as the unwrapped stream). + return self._original.fileno() + + def __getattr__(self, name): + return getattr(self._original, name) + + +def _install_hangup_protection(gateway_mode: bool = False): + """Protect ``cmd_update`` from SIGHUP and broken terminal pipes. + + Users commonly run ``hermes update`` in an SSH session or a terminal + that may close mid-install. Without protection, ``SIGHUP`` from the + terminal kills the Python process during ``pip install`` and leaves + the venv half-installed; the documented workaround ("use screen / + tmux") shouldn't be required for something as routine as an update. + + Protections installed: + + 1. ``SIGHUP`` is set to ``SIG_IGN``. POSIX preserves ``SIG_IGN`` + across ``exec()``, so pip and git subprocesses also stop dying on + hangup. + 2. ``sys.stdout`` / ``sys.stderr`` are wrapped to mirror output to + ``~/.hermes/logs/update.log`` and to silently absorb + ``BrokenPipeError`` when the terminal vanishes. + + ``SIGINT`` (Ctrl-C) and ``SIGTERM`` (systemd shutdown) are + **intentionally left alone** — those are legitimate cancellation + signals the user or OS sent on purpose. + + In gateway mode (``hermes update --gateway``) the update is already + spawned detached from a terminal, so this function is a no-op. + + Returns a dict that ``cmd_update`` can pass to + ``_finalize_update_output`` on exit. 
Returning a dict rather than a + tuple keeps the call site forward-compatible with future additions. + """ + state = { + "prev_stdout": sys.stdout, + "prev_stderr": sys.stderr, + "log_file": None, + "installed": False, + } + + if gateway_mode: + return state + + import signal as _signal + + # (1) Ignore SIGHUP for the remainder of this process. + if hasattr(_signal, "SIGHUP"): + try: + _signal.signal(_signal.SIGHUP, _signal.SIG_IGN) + except (ValueError, OSError): + # Called from a non-main thread — not fatal. The update still + # runs, just without hangup protection. + pass + + # (2) Mirror output to update.log and wrap stdio for broken-pipe + # tolerance. Any failure here is non-fatal; we just skip the wrap. + try: + from hermes_cli.config import get_hermes_home + + logs_dir = get_hermes_home() / "logs" + logs_dir.mkdir(parents=True, exist_ok=True) + log_path = logs_dir / "update.log" + log_file = open(log_path, "a", buffering=1, encoding="utf-8") + + import datetime as _dt + + log_file.write( + f"\n=== hermes update started " + f"{_dt.datetime.now().isoformat(timespec='seconds')} ===\n" + ) + + state["log_file"] = log_file + sys.stdout = _UpdateOutputStream(state["prev_stdout"], log_file) + sys.stderr = _UpdateOutputStream(state["prev_stderr"], log_file) + state["installed"] = True + except Exception: + # Leave stdio untouched on any setup failure. Update continues + # without mirroring. 
+ state["log_file"] = None + + return state + + +def _finalize_update_output(state): + """Restore stdio and close the update.log handle opened by ``_install_hangup_protection``.""" + if not state: + return + if state.get("installed"): + try: + sys.stdout = state.get("prev_stdout", sys.stdout) + except Exception: + pass + try: + sys.stderr = state.get("prev_stderr", sys.stderr) + except Exception: + pass + log_file = state.get("log_file") + if log_file is not None: + try: + log_file.flush() + log_file.close() + except Exception: + pass + + def cmd_update(args): - """Update Hermes Agent to the latest version.""" + """Update Hermes Agent to the latest version. + + Thin wrapper around ``_cmd_update_impl``: installs hangup protection, + runs the update, then restores stdio on the way out (even on + ``sys.exit`` or unhandled exceptions). + """ from hermes_cli.config import is_managed, managed_error if is_managed(): @@ -4994,6 +5173,20 @@ def cmd_update(args): return gateway_mode = getattr(args, "gateway", False) + + # Protect against mid-update terminal disconnects (SIGHUP) and tolerate + # writes to a closed stdout. No-op in gateway mode. See + # _install_hangup_protection for rationale. 
+ _update_io_state = _install_hangup_protection(gateway_mode=gateway_mode) + try: + _cmd_update_impl(args, gateway_mode=gateway_mode) + finally: + _finalize_update_output(_update_io_state) + + +def _cmd_update_impl(args, gateway_mode: bool): + """Body of ``cmd_update`` — kept separate so the wrapper can always + restore stdio even on ``sys.exit``.""" # In gateway mode, use file-based IPC for prompts instead of stdin gw_input_fn = ( (lambda prompt, default="": _gateway_prompt(prompt, default)) diff --git a/tests/hermes_cli/test_update_hangup_protection.py b/tests/hermes_cli/test_update_hangup_protection.py new file mode 100644 index 00000000000..e5c81a45a01 --- /dev/null +++ b/tests/hermes_cli/test_update_hangup_protection.py @@ -0,0 +1,325 @@ +"""Tests for SIGHUP protection and stdout mirroring in ``hermes update``. + +Covers ``_UpdateOutputStream``, ``_install_hangup_protection``, and +``_finalize_update_output`` in ``hermes_cli/main.py``. These exist so +that ``hermes update`` survives a terminal disconnect mid-install +(SSH drop, shell close) without leaving the venv half-installed. 
+""" + +from __future__ import annotations + +import io +import os +import signal +import sys +from pathlib import Path +from unittest.mock import patch + +import pytest + +from hermes_cli.main import ( + _UpdateOutputStream, + _finalize_update_output, + _install_hangup_protection, +) + + +# ----------------------------------------------------------------------------- +# _UpdateOutputStream +# ----------------------------------------------------------------------------- + + +class TestUpdateOutputStream: + def test_write_mirrors_to_both_original_and_log(self): + original = io.StringIO() + log = io.StringIO() + stream = _UpdateOutputStream(original, log) + + stream.write("hello world\n") + + assert original.getvalue() == "hello world\n" + assert log.getvalue() == "hello world\n" + + def test_write_continues_after_broken_original(self): + """When the terminal disconnects, original.write raises BrokenPipeError. + + The wrapper must catch it, flip the broken flag, and keep writing to + the log from then on. + """ + log = io.StringIO() + + class _BrokenStream: + def write(self, data): + raise BrokenPipeError("terminal gone") + + def flush(self): + raise BrokenPipeError("terminal gone") + + stream = _UpdateOutputStream(_BrokenStream(), log) + + # First write triggers the broken-pipe path. + stream.write("first line\n") + # Subsequent writes take the fast broken path (no exception). 
+ stream.write("second line\n") + + assert log.getvalue() == "first line\nsecond line\n" + assert stream._original_broken is True + + def test_write_tolerates_oserror_and_valueerror(self): + """OSError (EIO) and ValueError (closed file) should also be absorbed.""" + log = io.StringIO() + + class _RaisingStream: + def __init__(self, exc): + self._exc = exc + + def write(self, data): + raise self._exc + + def flush(self): + raise self._exc + + for exc in (OSError("EIO"), ValueError("closed file")): + stream = _UpdateOutputStream(_RaisingStream(exc), log) + stream.write("x\n") + assert stream._original_broken is True + + def test_log_failure_does_not_abort_write(self): + """Even if the log file write raises, the original write must still happen.""" + class _BrokenLog: + def write(self, data): + raise OSError("disk full") + + def flush(self): + raise OSError("disk full") + + original = io.StringIO() + stream = _UpdateOutputStream(original, _BrokenLog()) + + stream.write("data\n") + + assert original.getvalue() == "data\n" + + def test_flush_tolerates_broken_original(self): + class _BrokenStream: + def write(self, data): + return len(data) + + def flush(self): + raise BrokenPipeError("gone") + + log = io.StringIO() + stream = _UpdateOutputStream(_BrokenStream(), log) + stream.flush() # must not raise + assert stream._original_broken is True + + def test_isatty_delegates_to_original(self): + class _TtyStream: + def isatty(self): + return True + + def write(self, data): + return len(data) + + def flush(self): + return None + + stream = _UpdateOutputStream(_TtyStream(), io.StringIO()) + assert stream.isatty() is True + + def test_isatty_returns_false_after_broken(self): + class _BrokenStream: + def isatty(self): + return True + + def write(self, data): + raise BrokenPipeError() + + def flush(self): + return None + + stream = _UpdateOutputStream(_BrokenStream(), io.StringIO()) + stream.write("x") # marks broken + assert stream.isatty() is False + + def 
test_getattr_delegates_unknown_attrs(self): + class _StreamWithEncoding: + encoding = "utf-8" + + def write(self, data): + return len(data) + + def flush(self): + return None + + stream = _UpdateOutputStream(_StreamWithEncoding(), io.StringIO()) + assert stream.encoding == "utf-8" + + +# ----------------------------------------------------------------------------- +# _install_hangup_protection +# ----------------------------------------------------------------------------- + + +class TestInstallHangupProtection: + def test_gateway_mode_is_noop(self): + """In gateway mode the process is already detached — don't touch stdio or signals.""" + prev_out, prev_err = sys.stdout, sys.stderr + prev_sighup = signal.getsignal(signal.SIGHUP) if hasattr(signal, "SIGHUP") else None + + state = _install_hangup_protection(gateway_mode=True) + + try: + assert sys.stdout is prev_out + assert sys.stderr is prev_err + assert state["log_file"] is None + assert state["installed"] is False + if hasattr(signal, "SIGHUP"): + assert signal.getsignal(signal.SIGHUP) == prev_sighup + finally: + _finalize_update_output(state) + + @pytest.mark.skipif( + not hasattr(signal, "SIGHUP"), reason="SIGHUP not available on this platform" + ) + def test_installs_sighup_ignore(self, tmp_path, monkeypatch): + """SIGHUP should be set to SIG_IGN so SSH disconnect doesn't kill the update.""" + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + # Clear cached get_hermes_home if present + import hermes_cli.config as _cfg + if hasattr(_cfg, "_HERMES_HOME_CACHE"): + _cfg._HERMES_HOME_CACHE = None # type: ignore[attr-defined] + + original_handler = signal.getsignal(signal.SIGHUP) + state = _install_hangup_protection(gateway_mode=False) + + try: + assert signal.getsignal(signal.SIGHUP) == signal.SIG_IGN + finally: + _finalize_update_output(state) + # Restore whatever was there before so we don't leak to other tests. 
+ signal.signal(signal.SIGHUP, original_handler) + + def test_wraps_stdout_and_stderr_with_mirror(self, tmp_path, monkeypatch): + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + # Nuke any cached home path + import hermes_cli.config as _cfg + if hasattr(_cfg, "_HERMES_HOME_CACHE"): + _cfg._HERMES_HOME_CACHE = None # type: ignore[attr-defined] + + prev_out, prev_err = sys.stdout, sys.stderr + state = _install_hangup_protection(gateway_mode=False) + + try: + # On Windows (no SIGHUP) we still wrap stdio and create the log. + assert state["installed"] is True + assert isinstance(sys.stdout, _UpdateOutputStream) + assert isinstance(sys.stderr, _UpdateOutputStream) + assert state["log_file"] is not None + + sys.stdout.write("checking mirror\n") + sys.stdout.flush() + + log_path = tmp_path / "logs" / "update.log" + assert log_path.exists() + contents = log_path.read_text(encoding="utf-8") + assert "checking mirror" in contents + assert "hermes update started" in contents + finally: + _finalize_update_output(state) + # Sanity-check restoration + assert sys.stdout is prev_out + assert sys.stderr is prev_err + + def test_logs_dir_created_if_missing(self, tmp_path, monkeypatch): + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + import hermes_cli.config as _cfg + if hasattr(_cfg, "_HERMES_HOME_CACHE"): + _cfg._HERMES_HOME_CACHE = None # type: ignore[attr-defined] + + # No logs/ dir yet. + assert not (tmp_path / "logs").exists() + + state = _install_hangup_protection(gateway_mode=False) + try: + assert (tmp_path / "logs").is_dir() + assert (tmp_path / "logs" / "update.log").exists() + finally: + _finalize_update_output(state) + + def test_non_fatal_if_log_setup_fails(self, monkeypatch): + """If get_hermes_home() raises, stdio must be left untouched but SIGHUP still handled.""" + prev_out, prev_err = sys.stdout, sys.stderr + + def _boom(): + raise RuntimeError("no home for you") + + # Patch the import inside _install_hangup_protection. 
+ monkeypatch.setattr( + "hermes_cli.config.get_hermes_home", _boom, raising=True + ) + + original_handler = ( + signal.getsignal(signal.SIGHUP) if hasattr(signal, "SIGHUP") else None + ) + + state = _install_hangup_protection(gateway_mode=False) + + try: + assert sys.stdout is prev_out + assert sys.stderr is prev_err + assert state["installed"] is False + # SIGHUP must still be installed even when log setup fails. + if hasattr(signal, "SIGHUP"): + assert signal.getsignal(signal.SIGHUP) == signal.SIG_IGN + finally: + _finalize_update_output(state) + if hasattr(signal, "SIGHUP") and original_handler is not None: + signal.signal(signal.SIGHUP, original_handler) + + +# ----------------------------------------------------------------------------- +# _finalize_update_output +# ----------------------------------------------------------------------------- + + +class TestFinalizeUpdateOutput: + def test_none_state_is_noop(self): + _finalize_update_output(None) # must not raise + + def test_restores_streams_and_closes_log(self, tmp_path, monkeypatch): + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + import hermes_cli.config as _cfg + if hasattr(_cfg, "_HERMES_HOME_CACHE"): + _cfg._HERMES_HOME_CACHE = None # type: ignore[attr-defined] + + prev_out = sys.stdout + state = _install_hangup_protection(gateway_mode=False) + log_file = state["log_file"] + + assert sys.stdout is not prev_out + assert log_file is not None + + _finalize_update_output(state) + + assert sys.stdout is prev_out + # The log file handle should be closed. + assert log_file.closed is True + + def test_skipped_install_leaves_stdio_alone(self): + """When install failed (state['installed']=False) finalize should not + touch sys.stdout / sys.stderr (they were never wrapped).""" + # Build a synthetic state that mimics a failed install. 
+        sentinel_out = object()
+        state = {
+            "prev_stdout": sentinel_out,
+            "prev_stderr": sentinel_out,
+            "log_file": None,
+            "installed": False,
+        }
+        before_out, before_err = sys.stdout, sys.stderr
+
+        _finalize_update_output(state)
+
+        assert sys.stdout is before_out
+        assert sys.stderr is before_err
diff --git a/website/docs/getting-started/updating.md b/website/docs/getting-started/updating.md
index b0e34e07dec..eb74427a0a0 100644
--- a/website/docs/getting-started/updating.md
+++ b/website/docs/getting-started/updating.md
@@ -59,6 +59,21 @@ Already up to date. (or: Updating abc1234..def5678)
 If `git status --short` shows unexpected changes after `hermes update`, stop and inspect them before continuing. This usually means local modifications were reapplied on top of the updated code, or a dependency step refreshed lockfiles.
 :::
 
+### If your terminal disconnects mid-update
+
+`hermes update` protects itself against accidental terminal loss:
+
+- The update ignores `SIGHUP`, so closing your SSH session or terminal window no longer kills it mid-install. `pip` and `git` child processes inherit this protection, so the Python environment cannot be left half-installed by a dropped connection.
+- All output is mirrored to `~/.hermes/logs/update.log` while the update runs. If your terminal disappears, reconnect and inspect the log to see whether the update finished and whether the gateway restart succeeded:
+
+```bash
+tail -f ~/.hermes/logs/update.log
+```
+
+- `Ctrl-C` (SIGINT) and system shutdown (SIGTERM) are still honored — those are deliberate cancellations, not accidents.
+
+You no longer need to wrap `hermes update` in `screen` or `tmux` to survive a terminal drop.
+ ### Checking your current version ```bash From 994faacce894cba8f97c1ff06f65da89f56520f5 Mon Sep 17 00:00:00 2001 From: AviArora02-commits Date: Sun, 12 Apr 2026 23:23:03 +0530 Subject: [PATCH 007/143] fix: suppress Authorization: Bearer for Gemini provider to prevent HTTP 400 (#7893) --- agent/auxiliary_client.py | 27 ++++++++++++ run_agent.py | 21 ++++++++++ tests/hermes_cli/test_gemini_provider.py | 52 ++++++++++++++++++++++++ 3 files changed, 100 insertions(+) diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index 8adf080e31d..568d6109220 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -745,6 +745,15 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]: from hermes_cli.models import copilot_default_headers extra["default_headers"] = copilot_default_headers() + elif "generativelanguage.googleapis.com" in base_url.lower(): + # Google's OpenAI-compatible endpoint only accepts x-goog-api-key. + # Passing api_key= causes the SDK to inject Authorization: Bearer, + # which Google rejects with HTTP 400 "Multiple authentication + # credentials received". Use a placeholder for api_key and pass + # the real key via x-goog-api-key header instead. + # Fixes: https://github.com/NousResearch/hermes-agent/issues/7893 + extra["default_headers"] = {"x-goog-api-key": api_key} + api_key = "not-used" return OpenAI(api_key=api_key, base_url=base_url, **extra), model creds = resolve_api_key_provider_credentials(provider_id) @@ -766,6 +775,15 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]: from hermes_cli.models import copilot_default_headers extra["default_headers"] = copilot_default_headers() + elif "generativelanguage.googleapis.com" in base_url.lower(): + # Google's OpenAI-compatible endpoint only accepts x-goog-api-key. + # Passing api_key= causes the SDK to inject Authorization: Bearer, + # which Google rejects with HTTP 400 "Multiple authentication + # credentials received". 
Use a placeholder for api_key and pass + # the real key via x-goog-api-key header instead. + # Fixes: https://github.com/NousResearch/hermes-agent/issues/7893 + extra["default_headers"] = {"x-goog-api-key": api_key} + api_key = "not-used" return OpenAI(api_key=api_key, base_url=base_url, **extra), model return None, None @@ -1611,6 +1629,15 @@ def resolve_provider_client( from hermes_cli.models import copilot_default_headers headers.update(copilot_default_headers()) + elif "generativelanguage.googleapis.com" in base_url.lower(): + # Google's OpenAI-compatible endpoint only accepts x-goog-api-key. + # Passing api_key= causes the OpenAI SDK to inject Authorization: Bearer, + # which Google rejects with HTTP 400 "Multiple authentication credentials + # received". Use a placeholder for api_key and pass the real key via + # x-goog-api-key header instead. + # Fixes: https://github.com/NousResearch/hermes-agent/issues/7893 + headers["x-goog-api-key"] = api_key + api_key = "not-used" client = OpenAI(api_key=api_key, base_url=base_url, **({"default_headers": headers} if headers else {})) diff --git a/run_agent.py b/run_agent.py index 010715280ca..e8d23d39cac 100644 --- a/run_agent.py +++ b/run_agent.py @@ -1044,6 +1044,16 @@ class AIAgent: } elif "portal.qwen.ai" in effective_base.lower(): client_kwargs["default_headers"] = _qwen_portal_headers() + elif "generativelanguage.googleapis.com" in effective_base.lower(): + # Google's OpenAI-compatible endpoint only accepts x-goog-api-key. + # The OpenAI SDK auto-injects Authorization: Bearer when api_key= is + # set to a real value, causing HTTP 400 "Multiple authentication + # credentials received". Pass a placeholder so the SDK does not + # emit Bearer, and carry the real key via x-goog-api-key instead. 
+ # Fixes: https://github.com/NousResearch/hermes-agent/issues/7893 + real_key = client_kwargs["api_key"] + client_kwargs["api_key"] = "not-used" + client_kwargs["default_headers"] = {"x-goog-api-key": real_key} else: # No explicit creds — use the centralized provider router from agent.auxiliary_client import resolve_provider_client @@ -5102,6 +5112,17 @@ class AIAgent: self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"} elif "portal.qwen.ai" in normalized: self._client_kwargs["default_headers"] = _qwen_portal_headers() + elif "generativelanguage.googleapis.com" in normalized: + # Google's endpoint rejects Bearer tokens; use x-goog-api-key instead. + # Swap the real key out of api_key and into the header so the OpenAI + # SDK does not emit Authorization: Bearer. + # Fixes: https://github.com/NousResearch/hermes-agent/issues/7893 + real_key = self._client_kwargs.get("api_key", "") + if real_key and real_key != "not-used": + self._client_kwargs["api_key"] = "not-used" + self._client_kwargs["default_headers"] = { + "x-goog-api-key": real_key or self._client_kwargs.get("api_key", ""), + } else: self._client_kwargs.pop("default_headers", None) diff --git a/tests/hermes_cli/test_gemini_provider.py b/tests/hermes_cli/test_gemini_provider.py index 089a5cf98d1..fd16e825d14 100644 --- a/tests/hermes_cli/test_gemini_provider.py +++ b/tests/hermes_cli/test_gemini_provider.py @@ -207,6 +207,58 @@ class TestGeminiAgentInit: assert agent.api_mode == "chat_completions" assert agent.provider == "gemini" + def test_gemini_uses_x_goog_api_key_not_bearer(self, monkeypatch): + """Regression test for issue #7893. + + When provider=gemini, the OpenAI client must be constructed with + api_key='not-used' and default_headers={'x-goog-api-key': real_key}. + This prevents the SDK from injecting Authorization: Bearer, which + Google's endpoint rejects with HTTP 400. 
+        """
+        monkeypatch.setenv("GOOGLE_API_KEY", "AIzaSy_REAL_KEY")
+        real_key = "AIzaSy_REAL_KEY"
+        with patch("run_agent.OpenAI") as mock_openai:
+            mock_openai.return_value = MagicMock()
+            from run_agent import AIAgent
+            AIAgent(
+                model="gemini-2.5-flash",
+                provider="gemini",
+                api_key=real_key,
+                base_url="https://generativelanguage.googleapis.com/v1beta/openai",
+            )
+            call_kwargs = mock_openai.call_args[1]
+            # The SDK must NOT receive the real key as api_key (which would emit Bearer)
+            assert call_kwargs.get("api_key") == "not-used", (
+                "api_key must be 'not-used' to suppress Authorization: Bearer for Gemini"
+            )
+            # The real key must be in x-goog-api-key header
+            headers = call_kwargs.get("default_headers", {})
+            assert headers.get("x-goog-api-key") == real_key, (
+                "x-goog-api-key header must carry the real Gemini API key"
+            )
+
+    def test_gemini_resolve_provider_client_auth(self, monkeypatch):
+        """Regression test for issue #7893 — resolve_provider_client path.
+
+        When resolve_provider_client('gemini') is called, the returned OpenAI
+        client must use x-goog-api-key header, not Authorization: Bearer.
+        """
+        monkeypatch.setenv("GEMINI_API_KEY", "AIzaSy_TEST_KEY")
+        real_key = "AIzaSy_TEST_KEY"
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            mock_openai.return_value = MagicMock()
+            mock_openai.return_value.api_key = "not-used"
+            from agent.auxiliary_client import resolve_provider_client
+            resolve_provider_client("gemini")
+            call_kwargs = mock_openai.call_args[1]
+            assert call_kwargs.get("api_key") == "not-used", (
+                "api_key must be 'not-used' to prevent Bearer injection for Gemini"
+            )
+            headers = call_kwargs.get("default_headers", {})
+            assert headers.get("x-goog-api-key") == real_key, (
+                "x-goog-api-key header must carry the real Gemini API key"
+            )
+
 
 # ── models.dev Integration ──
 
From c20e236b7156ad9d882567e36bae7ce3d0d95927 Mon Sep 17 00:00:00 2001
From: Teknium
Date: Fri, 17 Apr 2026 21:27:43 -0700
Subject: [PATCH 008/143] chore: map AviArora02-commits author email in release AUTHOR_MAP

---
 scripts/release.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/release.py b/scripts/release.py
index e8039047ceb..5e909de76ec 100755
--- a/scripts/release.py
+++ b/scripts/release.py
@@ -262,6 +262,7 @@ AUTHOR_MAP = {
     "xiayh17@gmail.com": "xiayh0107",
     "asurla@nvidia.com": "anniesurla",
     "limkuan24@gmail.com": "WideLee",
+    "aviralarora002@gmail.com": "AviArora02-commits",
 }
 
 
From 5ff65dbf68a4f6b0a25cbb3ee618210f7700d322 Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Fri, 17 Apr 2026 21:30:34 -0700
Subject: [PATCH 009/143] docs(execute_code): clarify that scripts run in their own temp dir, not session CWD (#11956)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code's working directory differs from terminal()/read_file()'s,
leading to os.path.exists('.env') returning False even though the file
exists in the session's CWD.
They then bounce between 'the file exists' and 'the file is missing'
across tool calls.

Adds a 'Working directory' note to the execute_code schema description
pointing agents at absolute paths (os.path.expanduser) or
terminal()/read_file() for inspecting user files.

Carefully avoids the 'sandbox'/'isolated'/'cloud' language that commit
39b83f34 removed (it caused agents on local backends to refuse
networking tasks and save false sandbox beliefs to persistent memory).
Purely factual CWD guidance — no restriction implications.
---
 tools/code_execution_tool.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/code_execution_tool.py b/tools/code_execution_tool.py
index 3e7e3f925b9..8268024fc72 100644
--- a/tools/code_execution_tool.py
+++ b/tools/code_execution_tool.py
@@ -1367,6 +1367,8 @@ def build_execute_code_schema(enabled_sandbox_tools: set = None) -> dict:
         f"{tool_lines}\n\n"
         "Limits: 5-minute timeout, 50KB stdout cap, max 50 tool calls per script. "
         "terminal() is foreground-only (no background or pty).\n\n"
+        "Scripts run in their own temp dir, not the session's CWD — use absolute paths "
+        "(os.path.expanduser('~/.hermes/.env')) or terminal()/read_file() for user files.\n\n"
         "Print your final result to stdout. Use Python stdlib (json, re, math, csv, "
         "datetime, collections, etc.) for processing between tool calls.\n\n"
         "Also available (no import needed — built into hermes_tools):\n"
From 598cba62adb3b722d0bb49512efcead336148b98 Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Fri, 17 Apr 2026 21:35:30 -0700
Subject: [PATCH 010/143] test: update stale tests to match current code (#11963)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Seven test files were asserting against older function signatures and
behaviors. CI has been red on main because of accumulated test debt
from other PRs; this catches the tests up.

- tests/agent/test_subagent_progress.py: _build_child_progress_callback
  now takes (task_index, goal, parent_agent, task_count=1); update all
  call sites and rewrite tests that assumed the old 'batch-only' relay
  semantics (now relays per-tool AND flushes a summary at BATCH_SIZE).
  Renamed test_thinking_not_relayed_to_gateway →
  test_thinking_relayed_to_gateway since thinking IS now relayed as
  subagent.thinking.
- tests/tools/test_delegate.py: _build_child_agent now requires
  task_count; add task_count=1 to all 8 call sites.
- tests/cli/test_reasoning_command.py: AIAgent gained _stream_callback;
  stub it on the two test agent helpers that use spec=AIAgent / __new__.
- tests/hermes_cli/test_cmd_update.py: cmd_update now runs npm install
  in repo root + ui-tui/ + web/ and 'npm run build' in web/; assert all
  four subprocess calls in the expected order.
- tests/hermes_cli/test_model_validation.py: dissimilar unknown models
  now return accepted=False (previously True with warning); update both
  affected tests.
- tests/tools/test_registry.py: include feishu_doc_tool and
  feishu_drive_tool in the expected builtin tool set.
- tests/gateway/test_voice_command.py: missing-voice-deps message now
  suggests 'pip install PyNaCl' not 'hermes-agent[messaging]'.

411/411 pass locally across these 7 files.
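
The relay semantics that the updated progress tests assert (a per-tool relay plus a summary once the batch fills, and a final flush) can be sketched as follows. This is a hypothetical reconstruction from the test expectations only, not the real `_build_child_progress_callback`: the name `make_progress_relay`, the `parent_cb(event, name, preview)` calling convention, and the comma-joined summary text are assumptions. Only the event names and the batch size of 5 come from the tests.

```python
BATCH_SIZE = 5  # batch size named in the tests


def make_progress_relay(parent_cb):
    """Hypothetical child callback: relay each tool start, batch summaries."""
    batch = []

    def flush():
        # Send a summary only when tool names are still pending.
        if batch:
            parent_cb("subagent.progress", None, ", ".join(batch))
            batch.clear()

    def callback(event, name=None, preview=None, args=None):
        if event == "_thinking":
            # For thinking events the second positional argument is the text.
            parent_cb("subagent.thinking", None, name)
        elif event == "tool.started":
            parent_cb("subagent.tool", name, preview)  # per-tool relay
            batch.append(name)
            if len(batch) >= BATCH_SIZE:
                flush()  # batch-size summary

    callback._flush = flush
    return callback
```

Because each call to `make_progress_relay` closes over its own `batch` list, two children built from the same parent callback accumulate tool names independently, which is the property `test_parallel_callbacks_independent` checks.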
--- tests/agent/test_subagent_progress.py | 107 ++++++++++++---------- tests/cli/test_reasoning_command.py | 2 + tests/gateway/test_voice_command.py | 2 +- tests/hermes_cli/test_cmd_update.py | 38 ++++---- tests/hermes_cli/test_model_validation.py | 8 +- tests/tools/test_delegate.py | 8 ++ tests/tools/test_registry.py | 2 + 7 files changed, 94 insertions(+), 73 deletions(-) diff --git a/tests/agent/test_subagent_progress.py b/tests/agent/test_subagent_progress.py index 99375d6bd6a..88b2e379026 100644 --- a/tests/agent/test_subagent_progress.py +++ b/tests/agent/test_subagent_progress.py @@ -79,7 +79,7 @@ class TestBuildChildProgressCallback: parent._delegate_spinner = None parent.tool_progress_callback = None - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) assert cb is None def test_cli_spinner_tool_event(self): @@ -93,7 +93,7 @@ class TestBuildChildProgressCallback: parent._delegate_spinner = spinner parent.tool_progress_callback = None - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) assert cb is not None cb("tool.started", "web_search", "quantum computing", {}) @@ -113,7 +113,7 @@ class TestBuildChildProgressCallback: parent._delegate_spinner = spinner parent.tool_progress_callback = None - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) cb("_thinking", "I'll search for papers first") output = buf.getvalue() @@ -121,54 +121,64 @@ class TestBuildChildProgressCallback: assert "search for papers" in output def test_gateway_batched_progress(self): - """Gateway path should batch tool calls and flush at BATCH_SIZE.""" + """Gateway path: each tool.started relays a subagent.tool event, and a + subagent.progress summary fires once BATCH_SIZE tools accumulate.""" parent = MagicMock() parent._delegate_spinner = None parent_cb = MagicMock() parent.tool_progress_callback = parent_cb - - cb = 
_build_child_progress_callback(0, parent) - - # Send 4 tool calls — shouldn't flush yet (BATCH_SIZE = 5) + + cb = _build_child_progress_callback(0, "test goal", parent) + + # Each tool.started relays a subagent.tool event immediately (per-tool relay). for i in range(4): cb("tool.started", f"tool_{i}", f"arg_{i}", {}) - parent_cb.assert_not_called() - - # 5th call should trigger flush - cb("tool.started", "tool_4", "arg_4", {}) - parent_cb.assert_called_once() - call_args = parent_cb.call_args - assert "tool_0" in call_args[0][1] - assert "tool_4" in call_args[0][1] + # 4 per-tool relays so far, no batch summary yet (BATCH_SIZE=5) + events = [c.args[0] for c in parent_cb.call_args_list] + assert events == ["subagent.tool"] * 4 - def test_thinking_not_relayed_to_gateway(self): - """Thinking events should NOT be sent to gateway (too noisy).""" + # 5th call triggers another per-tool relay PLUS the batch-size summary + cb("tool.started", "tool_4", "arg_4", {}) + events = [c.args[0] for c in parent_cb.call_args_list] + assert events == ["subagent.tool"] * 5 + ["subagent.progress"] + summary_call = parent_cb.call_args_list[-1] + summary_text = summary_call.kwargs.get("preview") or summary_call.args[2] + assert "tool_0" in summary_text + assert "tool_4" in summary_text + + def test_thinking_relayed_to_gateway(self): + """Thinking events are relayed as subagent.thinking events.""" parent = MagicMock() parent._delegate_spinner = None parent_cb = MagicMock() parent.tool_progress_callback = parent_cb - - cb = _build_child_progress_callback(0, parent) + + cb = _build_child_progress_callback(0, "test goal", parent) cb("_thinking", "some reasoning text") - - parent_cb.assert_not_called() + + parent_cb.assert_called_once() + assert parent_cb.call_args.args[0] == "subagent.thinking" + assert parent_cb.call_args.args[2] == "some reasoning text" def test_parallel_callbacks_independent(self): - """Each child's callback should have independent batch state.""" + """Each child's callback 
batches tool names independently.""" parent = MagicMock() parent._delegate_spinner = None parent_cb = MagicMock() parent.tool_progress_callback = parent_cb - - cb0 = _build_child_progress_callback(0, parent) - cb1 = _build_child_progress_callback(1, parent) - - # Send 3 calls to each — neither should flush (batch size = 5) + + cb0 = _build_child_progress_callback(0, "goal a", parent) + cb1 = _build_child_progress_callback(1, "goal b", parent) + + # 3 tool.started per child = 6 per-tool relays; neither should hit + # the batch-size summary (batch size = 5, counted per-child). for i in range(3): - cb0(f"tool_{i}") - cb1(f"other_{i}") - - parent_cb.assert_not_called() + cb0("tool.started", f"tool_{i}", f"a_{i}", {}) + cb1("tool.started", f"other_{i}", f"b_{i}", {}) + + events = [c.args[0] for c in parent_cb.call_args_list] + assert events.count("subagent.tool") == 6 + assert "subagent.progress" not in events def test_task_index_prefix_in_batch_mode(self): """Batch mode (task_count > 1) should show 1-indexed prefix for all tasks.""" @@ -182,7 +192,7 @@ class TestBuildChildProgressCallback: parent.tool_progress_callback = None # task_index=0 in a batch of 3 → prefix "[1]" - cb0 = _build_child_progress_callback(0, parent, task_count=3) + cb0 = _build_child_progress_callback(0, "test goal", parent, task_count=3) cb0("web_search", "test") output = buf.getvalue() assert "[1]" in output @@ -190,7 +200,7 @@ class TestBuildChildProgressCallback: # task_index=2 in a batch of 3 → prefix "[3]" buf.truncate(0) buf.seek(0) - cb2 = _build_child_progress_callback(2, parent, task_count=3) + cb2 = _build_child_progress_callback(2, "test goal", parent, task_count=3) cb2("web_search", "test") output = buf.getvalue() assert "[3]" in output @@ -206,7 +216,7 @@ class TestBuildChildProgressCallback: parent._delegate_spinner = spinner parent.tool_progress_callback = None - cb = _build_child_progress_callback(0, parent, task_count=1) + cb = _build_child_progress_callback(0, "test goal", 
parent, task_count=1) cb("tool.started", "web_search", "test", {}) output = buf.getvalue() @@ -321,26 +331,31 @@ class TestBatchFlush: """Tests for gateway batch flush on subagent completion.""" def test_flush_sends_remaining_batch(self): - """_flush should send remaining tool names to gateway.""" + """_flush should send a final subagent.progress summary of any unsent + tool names in the batch (less than BATCH_SIZE).""" parent = MagicMock() parent._delegate_spinner = None parent_cb = MagicMock() parent.tool_progress_callback = parent_cb - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) - # Send 3 tools (below batch size of 5) + # Send 3 tools (below batch size of 5) — each relays subagent.tool cb("tool.started", "web_search", "query1", {}) cb("tool.started", "read_file", "file.txt", {}) cb("tool.started", "write_file", "out.txt", {}) - parent_cb.assert_not_called() + events = [c.args[0] for c in parent_cb.call_args_list] + assert events == ["subagent.tool"] * 3 # per-tool relays so far + assert "subagent.progress" not in events # no batch-size summary yet - # Flush should send the remaining 3 + # Flush should send the remaining 3 as a summary cb._flush() - parent_cb.assert_called_once() - summary = parent_cb.call_args[0][1] - assert "web_search" in summary - assert "write_file" in summary + events = [c.args[0] for c in parent_cb.call_args_list] + assert events[-1] == "subagent.progress" + summary_call = parent_cb.call_args_list[-1] + summary_text = summary_call.kwargs.get("preview") or summary_call.args[2] + assert "web_search" in summary_text + assert "write_file" in summary_text def test_flush_noop_when_batch_empty(self): """_flush should not send anything when batch is empty.""" @@ -349,7 +364,7 @@ class TestBatchFlush: parent_cb = MagicMock() parent.tool_progress_callback = parent_cb - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) 
cb._flush() parent_cb.assert_not_called() @@ -364,7 +379,7 @@ class TestBatchFlush: parent._delegate_spinner = spinner parent.tool_progress_callback = None - cb = _build_child_progress_callback(0, parent) + cb = _build_child_progress_callback(0, "test goal", parent) cb("tool.started", "web_search", "test", {}) cb._flush() # Should not crash diff --git a/tests/cli/test_reasoning_command.py b/tests/cli/test_reasoning_command.py index 554cb6f96bc..228d2904b16 100644 --- a/tests/cli/test_reasoning_command.py +++ b/tests/cli/test_reasoning_command.py @@ -473,6 +473,7 @@ class TestInlineThinkBlockExtraction(unittest.TestCase): agent.verbose_logging = False agent.reasoning_callback = None agent.stream_delta_callback = None # non-streaming by default + agent._stream_callback = None # non-streaming by default return agent def test_single_think_block_extracted(self): @@ -619,6 +620,7 @@ class TestReasoningDeltasFiredFlag(unittest.TestCase): agent = AIAgent.__new__(AIAgent) agent.reasoning_callback = None agent.stream_delta_callback = None + agent._stream_callback = None agent.verbose_logging = False return agent diff --git a/tests/gateway/test_voice_command.py b/tests/gateway/test_voice_command.py index f0c3171d6e7..f25fb972e44 100644 --- a/tests/gateway/test_voice_command.py +++ b/tests/gateway/test_voice_command.py @@ -758,7 +758,7 @@ class TestVoiceChannelCommands: result = await runner._handle_voice_channel_join(event) assert "voice dependencies are missing" in result.lower() - assert "hermes-agent[messaging]" in result + assert "PyNaCl" in result # -- _handle_voice_channel_leave -- diff --git a/tests/hermes_cli/test_cmd_update.py b/tests/hermes_cli/test_cmd_update.py index c8f284228bd..1e6a2245b2d 100644 --- a/tests/hermes_cli/test_cmd_update.py +++ b/tests/hermes_cli/test_cmd_update.py @@ -124,29 +124,23 @@ class TestCmdUpdateBranchFallback: if call.args and call.args[0][0] == "/usr/bin/npm" ] + # cmd_update runs npm commands in three locations: + # 1. 
repo root — slash-command / TUI bridge deps + # 2. ui-tui/ — Ink TUI deps + # 3. web/ — install + "npm run build" for the web frontend + full_flags = [ + "/usr/bin/npm", + "install", + "--silent", + "--no-fund", + "--no-audit", + "--progress=false", + ] assert npm_calls == [ - ( - [ - "/usr/bin/npm", - "install", - "--silent", - "--no-fund", - "--no-audit", - "--progress=false", - ], - PROJECT_ROOT, - ), - ( - [ - "/usr/bin/npm", - "install", - "--silent", - "--no-fund", - "--no-audit", - "--progress=false", - ], - PROJECT_ROOT / "ui-tui", - ), + (full_flags, PROJECT_ROOT), + (full_flags, PROJECT_ROOT / "ui-tui"), + (["/usr/bin/npm", "install", "--silent"], PROJECT_ROOT / "web"), + (["/usr/bin/npm", "run", "build"], PROJECT_ROOT / "web"), ] def test_update_non_interactive_skips_migration_prompt(self, mock_args, capsys): diff --git a/tests/hermes_cli/test_model_validation.py b/tests/hermes_cli/test_model_validation.py index cbd41216622..1ddf6ab6399 100644 --- a/tests/hermes_cli/test_model_validation.py +++ b/tests/hermes_cli/test_model_validation.py @@ -450,9 +450,9 @@ class TestValidateApiNotFound: assert result["recognized"] is True def test_dissimilar_model_shows_suggestions_not_autocorrect(self): - """Models too different for auto-correction still get suggestions.""" + """Models too different for auto-correction are rejected with suggestions.""" result = _validate("anthropic/claude-nonexistent") - assert result["accepted"] is True + assert result["accepted"] is False assert result.get("corrected_model") is None assert "not found" in result["message"] @@ -532,11 +532,11 @@ class TestValidateCodexAutoCorrection: assert result["message"] is None def test_very_different_name_falls_to_suggestions(self): - """Names too different for auto-correction get the suggestion list.""" + """Names too different for auto-correction are rejected with a suggestion list.""" codex_models = ["gpt-5.4-mini", "gpt-5.4", "gpt-5.3-codex"] with patch("hermes_cli.models.provider_model_ids", 
return_value=codex_models): result = validate_requested_model("totally-wrong", "openai-codex") - assert result["accepted"] is True + assert result["accepted"] is False assert result["recognized"] is False assert result.get("corrected_model") is None assert "not found" in result["message"] diff --git a/tests/tools/test_delegate.py b/tests/tools/test_delegate.py index 3299b927e56..e1e119d9199 100644 --- a/tests/tools/test_delegate.py +++ b/tests/tools/test_delegate.py @@ -274,6 +274,7 @@ class TestDelegateTask(unittest.TestCase): model=None, max_iterations=10, parent_agent=parent, + task_count=1, ) self.assertIs(mock_child._print_fn, sink) @@ -294,6 +295,7 @@ class TestDelegateTask(unittest.TestCase): model=None, max_iterations=10, parent_agent=parent, + task_count=1, ) self.assertTrue(callable(mock_child.thinking_callback)) @@ -363,6 +365,7 @@ class TestToolNamePreservation(unittest.TestCase): model=None, max_iterations=10, parent_agent=parent, + task_count=1, ) except NameError as exc: self.fail( @@ -1000,6 +1003,7 @@ class TestChildCredentialPoolResolution(unittest.TestCase): model=None, max_iterations=10, parent_agent=parent, + task_count=1, ) self.assertEqual(mock_child._credential_pool, mock_pool) @@ -1225,6 +1229,7 @@ class TestDelegationReasoningEffort(unittest.TestCase): _build_child_agent( task_index=0, goal="test", context=None, toolsets=None, model=None, max_iterations=50, parent_agent=parent, + task_count=1, ) call_kwargs = MockAgent.call_args[1] self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "xhigh"}) @@ -1241,6 +1246,7 @@ class TestDelegationReasoningEffort(unittest.TestCase): _build_child_agent( task_index=0, goal="test", context=None, toolsets=None, model=None, max_iterations=50, parent_agent=parent, + task_count=1, ) call_kwargs = MockAgent.call_args[1] self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "low"}) @@ -1257,6 +1263,7 @@ class TestDelegationReasoningEffort(unittest.TestCase): 
_build_child_agent( task_index=0, goal="test", context=None, toolsets=None, model=None, max_iterations=50, parent_agent=parent, + task_count=1, ) call_kwargs = MockAgent.call_args[1] self.assertEqual(call_kwargs["reasoning_config"], {"enabled": False}) @@ -1273,6 +1280,7 @@ class TestDelegationReasoningEffort(unittest.TestCase): _build_child_agent( task_index=0, goal="test", context=None, toolsets=None, model=None, max_iterations=50, parent_agent=parent, + task_count=1, ) call_kwargs = MockAgent.call_args[1] self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "medium"}) diff --git a/tests/tools/test_registry.py b/tests/tools/test_registry.py index 85246bd7609..eb895e55a1a 100644 --- a/tests/tools/test_registry.py +++ b/tests/tools/test_registry.py @@ -296,6 +296,8 @@ class TestBuiltinDiscovery: "tools.code_execution_tool", "tools.cronjob_tools", "tools.delegate_tool", + "tools.feishu_doc_tool", + "tools.feishu_drive_tool", "tools.file_tools", "tools.homeassistant_tool", "tools.image_generation_tool", From 73bccc94c7af3a07b4002c2a14a4b54f844bd561 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Fri, 17 Apr 2026 21:36:40 -0700 Subject: [PATCH 011/143] =?UTF-8?q?skills:=20consolidate=20mlops=20redunda?= =?UTF-8?q?ncies=20(gguf+llama-cpp,=20grpo+trl,=20guidance=E2=86=92optiona?= =?UTF-8?q?l)=20(#11965)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three tightly-scoped built-in skill consolidations to reduce redundancy in the available_skills listing injected into every system prompt: 1. gguf-quantization → llama-cpp (merged) GGUF is llama.cpp's format; two skills covered the same toolchain. The merged llama-cpp skill keeps the full K-quant table + imatrix workflow from gguf and the ROCm/benchmarks/supported-models sections from the original llama-cpp. All 5 reference files preserved. 2. 
grpo-rl-training → fine-tuning-with-trl (folded in) GRPO isn't a framework, it's a trainer inside TRL. Moved the 17KB deep-dive SKILL.md to references/grpo-training.md and the working template to templates/basic_grpo_training.py. TRL's GRPO workflow section now points to both. Atropos skill's related_skills updated. 3. guidance → optional-skills/mlops/ Dropped from built-in. Outlines (still built-in) covers the same structured-generation ground with wider adoption. Listed in the optional catalog for users who specifically want Guidance. Net: 3 fewer built-in skill lines in every system prompt, zero content loss. Contributor authorship preserved via git rename detection. --- .../mlops}/guidance/SKILL.md | 0 .../mlops}/guidance/references/backends.md | 0 .../mlops}/guidance/references/constraints.md | 0 .../mlops}/guidance/references/examples.md | 0 .../hermes-atropos-environments/SKILL.md | 2 +- skills/mlops/inference/gguf/SKILL.md | 430 --------------- skills/mlops/inference/llama-cpp/SKILL.md | 491 ++++++++++++------ .../references/advanced-usage.md | 0 .../references/troubleshooting.md | 0 .../mlops/training/grpo-rl-training/README.md | 97 ---- .../mlops/training/trl-fine-tuning/SKILL.md | 4 + .../references/grpo-training.md} | 329 +++++------- .../templates/basic_grpo_training.py | 0 .../docs/reference/optional-skills-catalog.md | 1 + website/docs/reference/skills-catalog.md | 5 +- 15 files changed, 470 insertions(+), 889 deletions(-) rename {skills/mlops/inference => optional-skills/mlops}/guidance/SKILL.md (100%) rename {skills/mlops/inference => optional-skills/mlops}/guidance/references/backends.md (100%) rename {skills/mlops/inference => optional-skills/mlops}/guidance/references/constraints.md (100%) rename {skills/mlops/inference => optional-skills/mlops}/guidance/references/examples.md (100%) delete mode 100644 skills/mlops/inference/gguf/SKILL.md rename skills/mlops/inference/{gguf => llama-cpp}/references/advanced-usage.md (100%) rename 
skills/mlops/inference/{gguf => llama-cpp}/references/troubleshooting.md (100%) delete mode 100644 skills/mlops/training/grpo-rl-training/README.md rename skills/mlops/training/{grpo-rl-training/SKILL.md => trl-fine-tuning/references/grpo-training.md} (56%) rename skills/mlops/training/{grpo-rl-training => trl-fine-tuning}/templates/basic_grpo_training.py (100%) diff --git a/skills/mlops/inference/guidance/SKILL.md b/optional-skills/mlops/guidance/SKILL.md similarity index 100% rename from skills/mlops/inference/guidance/SKILL.md rename to optional-skills/mlops/guidance/SKILL.md diff --git a/skills/mlops/inference/guidance/references/backends.md b/optional-skills/mlops/guidance/references/backends.md similarity index 100% rename from skills/mlops/inference/guidance/references/backends.md rename to optional-skills/mlops/guidance/references/backends.md diff --git a/skills/mlops/inference/guidance/references/constraints.md b/optional-skills/mlops/guidance/references/constraints.md similarity index 100% rename from skills/mlops/inference/guidance/references/constraints.md rename to optional-skills/mlops/guidance/references/constraints.md diff --git a/skills/mlops/inference/guidance/references/examples.md b/optional-skills/mlops/guidance/references/examples.md similarity index 100% rename from skills/mlops/inference/guidance/references/examples.md rename to optional-skills/mlops/guidance/references/examples.md diff --git a/optional-skills/mlops/hermes-atropos-environments/SKILL.md b/optional-skills/mlops/hermes-atropos-environments/SKILL.md index 9dff4668767..5101886b41a 100644 --- a/optional-skills/mlops/hermes-atropos-environments/SKILL.md +++ b/optional-skills/mlops/hermes-atropos-environments/SKILL.md @@ -7,7 +7,7 @@ license: MIT metadata: hermes: tags: [atropos, rl, environments, training, reinforcement-learning, reward-functions] - related_skills: [axolotl, grpo-rl-training, trl-fine-tuning, lm-evaluation-harness] + related_skills: [axolotl, fine-tuning-with-trl, 
lm-evaluation-harness] --- # Hermes Agent Atropos Environments diff --git a/skills/mlops/inference/gguf/SKILL.md b/skills/mlops/inference/gguf/SKILL.md deleted file mode 100644 index 21bb176c8f9..00000000000 --- a/skills/mlops/inference/gguf/SKILL.md +++ /dev/null @@ -1,430 +0,0 @@ ---- -name: gguf-quantization -description: GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements. -version: 1.0.0 -author: Orchestra Research -license: MIT -dependencies: [llama-cpp-python>=0.2.0] -metadata: - hermes: - tags: [GGUF, Quantization, llama.cpp, CPU Inference, Apple Silicon, Model Compression, Optimization] - ---- - -# GGUF - Quantization Format for llama.cpp - -The GGUF (GPT-Generated Unified Format) is the standard file format for llama.cpp, enabling efficient inference on CPUs, Apple Silicon, and GPUs with flexible quantization options. - -## When to use GGUF - -**Use GGUF when:** -- Deploying on consumer hardware (laptops, desktops) -- Running on Apple Silicon (M1/M2/M3) with Metal acceleration -- Need CPU inference without GPU requirements -- Want flexible quantization (Q2_K to Q8_0) -- Using local AI tools (LM Studio, Ollama, text-generation-webui) - -**Key advantages:** -- **Universal hardware**: CPU, Apple Silicon, NVIDIA, AMD support -- **No Python runtime**: Pure C/C++ inference -- **Flexible quantization**: 2-8 bit with various methods (K-quants) -- **Ecosystem support**: LM Studio, Ollama, koboldcpp, and more -- **imatrix**: Importance matrix for better low-bit quality - -**Use alternatives instead:** -- **AWQ/GPTQ**: Maximum accuracy with calibration on NVIDIA GPUs -- **HQQ**: Fast calibration-free quantization for HuggingFace -- **bitsandbytes**: Simple integration with transformers library -- **TensorRT-LLM**: Production NVIDIA deployment with maximum speed - -## Quick start - -### Installation - -```bash 
-# Clone llama.cpp -git clone https://github.com/ggml-org/llama.cpp -cd llama.cpp - -# Build (CPU) -make - -# Build with CUDA (NVIDIA) -make GGML_CUDA=1 - -# Build with Metal (Apple Silicon) -make GGML_METAL=1 - -# Install Python bindings (optional) -pip install llama-cpp-python -``` - -### Convert model to GGUF - -```bash -# Install requirements -pip install -r requirements.txt - -# Convert HuggingFace model to GGUF (FP16) -python convert_hf_to_gguf.py ./path/to/model --outfile model-f16.gguf - -# Or specify output type -python convert_hf_to_gguf.py ./path/to/model \ - --outfile model-f16.gguf \ - --outtype f16 -``` - -### Quantize model - -```bash -# Basic quantization to Q4_K_M -./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M - -# Quantize with importance matrix (better quality) -./llama-imatrix -m model-f16.gguf -f calibration.txt -o model.imatrix -./llama-quantize --imatrix model.imatrix model-f16.gguf model-q4_k_m.gguf Q4_K_M -``` - -### Run inference - -```bash -# CLI inference -./llama-cli -m model-q4_k_m.gguf -p "Hello, how are you?" - -# Interactive mode -./llama-cli -m model-q4_k_m.gguf --interactive - -# With GPU offload -./llama-cli -m model-q4_k_m.gguf -ngl 35 -p "Hello!" 
-``` - -## Quantization types - -### K-quant methods (recommended) - -| Type | Bits | Size (7B) | Quality | Use Case | -|------|------|-----------|---------|----------| -| Q2_K | 2.5 | ~2.8 GB | Low | Extreme compression | -| Q3_K_S | 3.0 | ~3.0 GB | Low-Med | Memory constrained | -| Q3_K_M | 3.3 | ~3.3 GB | Medium | Balance | -| Q4_K_S | 4.0 | ~3.8 GB | Med-High | Good balance | -| Q4_K_M | 4.5 | ~4.1 GB | High | **Recommended default** | -| Q5_K_S | 5.0 | ~4.6 GB | High | Quality focused | -| Q5_K_M | 5.5 | ~4.8 GB | Very High | High quality | -| Q6_K | 6.0 | ~5.5 GB | Excellent | Near-original | -| Q8_0 | 8.0 | ~7.2 GB | Best | Maximum quality | - -### Legacy methods - -| Type | Description | -|------|-------------| -| Q4_0 | 4-bit, basic | -| Q4_1 | 4-bit with delta | -| Q5_0 | 5-bit, basic | -| Q5_1 | 5-bit with delta | - -**Recommendation**: Use K-quant methods (Q4_K_M, Q5_K_M) for best quality/size ratio. - -## Conversion workflows - -### Workflow 1: HuggingFace to GGUF - -```bash -# 1. Download model -huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./llama-3.1-8b - -# 2. Convert to GGUF (FP16) -python convert_hf_to_gguf.py ./llama-3.1-8b \ - --outfile llama-3.1-8b-f16.gguf \ - --outtype f16 - -# 3. Quantize -./llama-quantize llama-3.1-8b-f16.gguf llama-3.1-8b-q4_k_m.gguf Q4_K_M - -# 4. Test -./llama-cli -m llama-3.1-8b-q4_k_m.gguf -p "Hello!" -n 50 -``` - -### Workflow 2: With importance matrix (better quality) - -```bash -# 1. Convert to GGUF -python convert_hf_to_gguf.py ./model --outfile model-f16.gguf - -# 2. Create calibration text (diverse samples) -cat > calibration.txt << 'EOF' -The quick brown fox jumps over the lazy dog. -Machine learning is a subset of artificial intelligence. -Python is a popular programming language. -# Add more diverse text samples... -EOF - -# 3. Generate importance matrix -./llama-imatrix -m model-f16.gguf \ - -f calibration.txt \ - --chunk 512 \ - -o model.imatrix \ - -ngl 35 # GPU layers if available - -# 4. 
Quantize with imatrix -./llama-quantize --imatrix model.imatrix \ - model-f16.gguf \ - model-q4_k_m.gguf \ - Q4_K_M -``` - -### Workflow 3: Multiple quantizations - -```bash -#!/bin/bash -MODEL="llama-3.1-8b-f16.gguf" -IMATRIX="llama-3.1-8b.imatrix" - -# Generate imatrix once -./llama-imatrix -m $MODEL -f wiki.txt -o $IMATRIX -ngl 35 - -# Create multiple quantizations -for QUANT in Q4_K_M Q5_K_M Q6_K Q8_0; do - OUTPUT="llama-3.1-8b-${QUANT,,}.gguf" - ./llama-quantize --imatrix $IMATRIX $MODEL $OUTPUT $QUANT - echo "Created: $OUTPUT ($(du -h $OUTPUT | cut -f1))" -done -``` - -## Python usage - -### llama-cpp-python - -```python -from llama_cpp import Llama - -# Load model -llm = Llama( - model_path="./model-q4_k_m.gguf", - n_ctx=4096, # Context window - n_gpu_layers=35, # GPU offload (0 for CPU only) - n_threads=8 # CPU threads -) - -# Generate -output = llm( - "What is machine learning?", - max_tokens=256, - temperature=0.7, - stop=["</s>", "\n\n"] -) -print(output["choices"][0]["text"]) -``` - -### Chat completion - -```python -from llama_cpp import Llama - -llm = Llama( - model_path="./model-q4_k_m.gguf", - n_ctx=4096, - n_gpu_layers=35, - chat_format="llama-3" # Or "chatml", "mistral", etc.
-) - -messages = [ - {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "What is Python?"} -] - -response = llm.create_chat_completion( - messages=messages, - max_tokens=256, - temperature=0.7 -) -print(response["choices"][0]["message"]["content"]) -``` - -### Streaming - -```python -from llama_cpp import Llama - -llm = Llama(model_path="./model-q4_k_m.gguf", n_gpu_layers=35) - -# Stream tokens -for chunk in llm( - "Explain quantum computing:", - max_tokens=256, - stream=True -): - print(chunk["choices"][0]["text"], end="", flush=True) -``` - -## Server mode - -### Start OpenAI-compatible server - -```bash -# Start server -./llama-server -m model-q4_k_m.gguf \ - --host 0.0.0.0 \ - --port 8080 \ - -ngl 35 \ - -c 4096 - -# Or with Python bindings -python -m llama_cpp.server \ - --model model-q4_k_m.gguf \ - --n_gpu_layers 35 \ - --host 0.0.0.0 \ - --port 8080 -``` - -### Use with OpenAI client - -```python -from openai import OpenAI - -client = OpenAI( - base_url="http://localhost:8080/v1", - api_key="not-needed" -) - -response = client.chat.completions.create( - model="local-model", - messages=[{"role": "user", "content": "Hello!"}], - max_tokens=256 -) -print(response.choices[0].message.content) -``` - -## Hardware optimization - -### Apple Silicon (Metal) - -```bash -# Build with Metal -make clean && make GGML_METAL=1 - -# Run with Metal acceleration -./llama-cli -m model.gguf -ngl 99 -p "Hello" - -# Python with Metal -llm = Llama( - model_path="model.gguf", - n_gpu_layers=99, # Offload all layers - n_threads=1 # Metal handles parallelism -) -``` - -### NVIDIA CUDA - -```bash -# Build with CUDA -make clean && make GGML_CUDA=1 - -# Run with CUDA -./llama-cli -m model.gguf -ngl 35 -p "Hello" - -# Specify GPU -CUDA_VISIBLE_DEVICES=0 ./llama-cli -m model.gguf -ngl 35 -``` - -### CPU optimization - -```bash -# Build with AVX2/AVX512 -make clean && make - -# Run with optimal threads -./llama-cli -m model.gguf -t 8 -p "Hello" - 
-# Python CPU config -llm = Llama( - model_path="model.gguf", - n_gpu_layers=0, # CPU only - n_threads=8, # Match physical cores - n_batch=512 # Batch size for prompt processing -) -``` - -## Integration with tools - -### Ollama - -```bash -# Create Modelfile -cat > Modelfile << 'EOF' -FROM ./model-q4_k_m.gguf -TEMPLATE """{{ .System }} -{{ .Prompt }}""" -PARAMETER temperature 0.7 -PARAMETER num_ctx 4096 -EOF - -# Create Ollama model -ollama create mymodel -f Modelfile - -# Run -ollama run mymodel "Hello!" -``` - -### LM Studio - -1. Place GGUF file in `~/.cache/lm-studio/models/` -2. Open LM Studio and select the model -3. Configure context length and GPU offload -4. Start inference - -### text-generation-webui - -```bash -# Place in models folder -cp model-q4_k_m.gguf text-generation-webui/models/ - -# Start with llama.cpp loader -python server.py --model model-q4_k_m.gguf --loader llama.cpp --n-gpu-layers 35 -``` - -## Best practices - -1. **Use K-quants**: Q4_K_M offers best quality/size balance -2. **Use imatrix**: Always use importance matrix for Q4 and below -3. **GPU offload**: Offload as many layers as VRAM allows -4. **Context length**: Start with 4096, increase if needed -5. **Thread count**: Match physical CPU cores, not logical -6. 
**Batch size**: Increase n_batch for faster prompt processing - -## Common issues - -**Model loads slowly:** -```bash -# Use mmap for faster loading -./llama-cli -m model.gguf --mmap -``` - -**Out of memory:** -```bash -# Reduce GPU layers -./llama-cli -m model.gguf -ngl 20 # Reduce from 35 - -# Or use smaller quantization -./llama-quantize model-f16.gguf model-q3_k_m.gguf Q3_K_M -``` - -**Poor quality at low bits:** -```bash -# Always use imatrix for Q4 and below -./llama-imatrix -m model-f16.gguf -f calibration.txt -o model.imatrix -./llama-quantize --imatrix model.imatrix model-f16.gguf model-q4_k_m.gguf Q4_K_M -``` - -## References - -- **[Advanced Usage](references/advanced-usage.md)** - Batching, speculative decoding, custom builds -- **[Troubleshooting](references/troubleshooting.md)** - Common issues, debugging, benchmarks - -## Resources - -- **Repository**: https://github.com/ggml-org/llama.cpp -- **Python Bindings**: https://github.com/abetlen/llama-cpp-python -- **Pre-quantized Models**: https://huggingface.co/TheBloke -- **GGUF Converter**: https://huggingface.co/spaces/ggml-org/gguf-my-repo -- **License**: MIT diff --git a/skills/mlops/inference/llama-cpp/SKILL.md b/skills/mlops/inference/llama-cpp/SKILL.md index 57016c920df..33fc37adb18 100644 --- a/skills/mlops/inference/llama-cpp/SKILL.md +++ b/skills/mlops/inference/llama-cpp/SKILL.md @@ -1,138 +1,271 @@ --- name: llama-cpp -description: Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU. -version: 1.0.0 +description: Run LLM inference with llama.cpp on CPU, Apple Silicon, AMD/Intel GPUs, or NVIDIA — plus GGUF model conversion and quantization (2–8 bit with K-quants and imatrix). Covers CLI, Python bindings, OpenAI-compatible server, and Ollama/LM Studio integration. 
Use for edge deployment, M1/M2/M3/M4 Macs, CUDA-less environments, or flexible local quantization. +version: 2.0.0 author: Orchestra Research license: MIT -dependencies: [llama-cpp-python] +dependencies: [llama-cpp-python>=0.2.0] metadata: hermes: - tags: [Inference Serving, Llama.cpp, CPU Inference, Apple Silicon, Edge Deployment, GGUF, Quantization, Non-NVIDIA, AMD GPUs, Intel GPUs, Embedded] - + tags: [llama.cpp, GGUF, Quantization, CPU Inference, Apple Silicon, Edge Deployment, Non-NVIDIA, AMD GPUs, Intel GPUs, Embedded, Model Compression] --- -# llama.cpp +# llama.cpp + GGUF -Pure C/C++ LLM inference with minimal dependencies, optimized for CPUs and non-NVIDIA hardware. +Pure C/C++ LLM inference with minimal dependencies, plus the GGUF (GPT-Generated Unified Format) standard used for quantized weights. One toolchain covers conversion, quantization, and serving. -## When to use llama.cpp +## When to use -**Use llama.cpp when:** -- Running on CPU-only machines -- Deploying on Apple Silicon (M1/M2/M3/M4) -- Using AMD or Intel GPUs (no CUDA) -- Edge deployment (Raspberry Pi, embedded systems) -- Need simple deployment without Docker/Python +**Use llama.cpp + GGUF when:** +- Running on CPU-only machines or Apple Silicon (M1/M2/M3/M4) with Metal acceleration +- Using AMD (ROCm) or Intel GPUs where CUDA isn't available +- Edge deployment (Raspberry Pi, embedded systems, consumer laptops) +- Need flexible quantization (2–8 bit with K-quants) +- Want local AI tools (LM Studio, Ollama, text-generation-webui, koboldcpp) +- Want a single binary deploy without Docker/Python -**Use TensorRT-LLM instead when:** -- Have NVIDIA GPUs (A100/H100) -- Need maximum throughput (100K+ tok/s) -- Running in datacenter with CUDA +**Key advantages:** +- Universal hardware: CPU, Apple Silicon, NVIDIA, AMD, Intel +- No Python runtime required (pure C/C++) +- K-quants + imatrix for better low-bit quality +- OpenAI-compatible server built in +- Rich ecosystem (Ollama, LM Studio, 
llama-cpp-python) -**Use vLLM instead when:** -- Have NVIDIA GPUs -- Need Python-first API -- Want PagedAttention +**Use alternatives instead:** +- **vLLM** — NVIDIA GPUs, PagedAttention, Python-first, max throughput +- **TensorRT-LLM** — Production NVIDIA (A100/H100), maximum speed +- **AWQ/GPTQ** — Calibrated quantization for NVIDIA-only deployments +- **bitsandbytes** — Simple HuggingFace transformers integration +- **HQQ** — Fast calibration-free quantization ## Quick start -### Installation +### Install ```bash -# macOS/Linux +# macOS / Linux (simplest) brew install llama.cpp # Or build from source -git clone https://github.com/ggerganov/llama.cpp +git clone https://github.com/ggml-org/llama.cpp cd llama.cpp -make +make # CPU +make GGML_METAL=1 # Apple Silicon +make GGML_CUDA=1 # NVIDIA CUDA +make LLAMA_HIP=1 # AMD ROCm -# With Metal (Apple Silicon) -make LLAMA_METAL=1 - -# With CUDA (NVIDIA) -make LLAMA_CUDA=1 - -# With ROCm (AMD) -make LLAMA_HIP=1 +# Python bindings (optional) +pip install llama-cpp-python +# With CUDA: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir +# With Metal: CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --force-reinstall --no-cache-dir ``` -### Download model +### Download a pre-quantized GGUF ```bash -# Download from HuggingFace (GGUF format) +# TheBloke hosts most popular models pre-quantized huggingface-cli download \ TheBloke/Llama-2-7B-Chat-GGUF \ llama-2-7b-chat.Q4_K_M.gguf \ --local-dir models/ +``` -# Or convert from HuggingFace -python convert_hf_to_gguf.py models/llama-2-7b-chat/ +### Or convert a HuggingFace model to GGUF + +```bash +# 1. Download HF model +huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./llama-3.1-8b + +# 2. Convert to FP16 GGUF +python convert_hf_to_gguf.py ./llama-3.1-8b \ + --outfile llama-3.1-8b-f16.gguf \ + --outtype f16 + +# 3. 
Quantize to Q4_K_M +./llama-quantize llama-3.1-8b-f16.gguf llama-3.1-8b-q4_k_m.gguf Q4_K_M ``` ### Run inference ```bash -# Simple chat -./llama-cli \ - -m models/llama-2-7b-chat.Q4_K_M.gguf \ - -p "Explain quantum computing" \ - -n 256 # Max tokens +# One-shot prompt +./llama-cli -m model.Q4_K_M.gguf -p "Explain quantum computing" -n 256 # Interactive chat -./llama-cli \ - -m models/llama-2-7b-chat.Q4_K_M.gguf \ - --interactive +./llama-cli -m model.Q4_K_M.gguf --interactive + +# With GPU offload +./llama-cli -m model.Q4_K_M.gguf -ngl 35 -p "Hello!" ``` -### Server mode +### Serve an OpenAI-compatible API ```bash -# Start OpenAI-compatible server ./llama-server \ - -m models/llama-2-7b-chat.Q4_K_M.gguf \ + -m model.Q4_K_M.gguf \ --host 0.0.0.0 \ --port 8080 \ - -ngl 32 # Offload 32 layers to GPU + -ngl 35 \ + -c 4096 \ + --parallel 4 \ + --cont-batching +``` -# Client request +```bash curl http://localhost:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "llama-2-7b-chat", + "model": "local", "messages": [{"role": "user", "content": "Hello!"}], "temperature": 0.7, "max_tokens": 100 }' ``` -## Quantization formats +## Quantization formats (GGUF) -### GGUF format overview +### K-quant methods (recommended) -| Format | Bits | Size (7B) | Speed | Quality | Use Case | -|--------|------|-----------|-------|---------|----------| -| **Q4_K_M** | 4.5 | 4.1 GB | Fast | Good | **Recommended default** | -| Q4_K_S | 4.3 | 3.9 GB | Faster | Lower | Speed critical | -| Q5_K_M | 5.5 | 4.8 GB | Medium | Better | Quality critical | -| Q6_K | 6.5 | 5.5 GB | Slower | Best | Maximum quality | -| Q8_0 | 8.0 | 7.0 GB | Slow | Excellent | Minimal degradation | -| Q2_K | 2.5 | 2.7 GB | Fastest | Poor | Testing only | +| Type | Bits | Size (7B) | Quality | Use Case | +|------|------|-----------|---------|----------| +| Q2_K | 2.5 | ~2.8 GB | Low | Extreme compression (testing only) | +| Q3_K_S | 3.0 | ~3.0 GB | Low-Med | Memory constrained | +| Q3_K_M | 3.3 
| ~3.3 GB | Medium | Fits small devices | +| Q4_K_S | 4.0 | ~3.8 GB | Med-High | Speed critical | +| **Q4_K_M** | 4.5 | ~4.1 GB | High | **Recommended default** | +| Q5_K_S | 5.0 | ~4.6 GB | High | Quality focused | +| Q5_K_M | 5.5 | ~4.8 GB | Very High | High quality | +| Q6_K | 6.0 | ~5.5 GB | Excellent | Near-original | +| Q8_0 | 8.0 | ~7.2 GB | Best | Maximum quality, minimal degradation | -### Choosing quantization +**Variant suffixes** — `_S` (Small, faster, lower quality), `_M` (Medium, balanced), `_L` (Large, better quality). + +**Legacy (Q4_0/Q4_1/Q5_0/Q5_1) exist** but always prefer K-quants for better quality/size ratio. + +**IQ quantization** — ultra-low-bit with importance-aware methods: IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_XS, IQ3_S, IQ4_XS. Require `--imatrix`. + +**Task-specific defaults:** +- General chat / assistants: Q4_K_M, or Q5_K_M if RAM allows +- Code generation: Q5_K_M or Q6_K (higher precision helps) +- Technical / medical: Q6_K or Q8_0 +- Very large (70B, 405B) on consumer hardware: Q3_K_M or Q4_K_S +- Raspberry Pi / edge: Q2_K or Q3_K_S + +## Conversion workflows + +### Basic: HF → GGUF → quantized ```bash -# General use (balanced) -Q4_K_M # 4-bit, medium quality +python convert_hf_to_gguf.py ./model --outfile model-f16.gguf --outtype f16 +./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M +./llama-cli -m model-q4_k_m.gguf -p "Hello!" -n 50 +``` -# Maximum speed (more degradation) -Q2_K or Q3_K_M +### With importance matrix (imatrix) — better low-bit quality -# Maximum quality (slower) -Q6_K or Q8_0 +`imatrix` gives 10–20% perplexity improvement at Q4, essential at Q3 and below. -# Very large models (70B, 405B) -Q3_K_M or Q4_K_S # Lower bits to fit in memory +```bash +# 1. Convert to FP16 GGUF +python convert_hf_to_gguf.py ./model --outfile model-f16.gguf + +# 2. Prepare calibration data (diverse text, ~100MB is ideal) +cat > calibration.txt << 'EOF' +The quick brown fox jumps over the lazy dog. 
+Machine learning is a subset of artificial intelligence. +# Add more diverse text samples... +EOF + +# 3. Generate importance matrix +./llama-imatrix -m model-f16.gguf \ + -f calibration.txt \ + --chunk 512 \ + -o model.imatrix \ + -ngl 35 + +# 4. Quantize with imatrix +./llama-quantize --imatrix model.imatrix \ + model-f16.gguf model-q4_k_m.gguf Q4_K_M +``` + +### Multi-quant batch + +```bash +#!/bin/bash +MODEL="llama-3.1-8b-f16.gguf" +IMATRIX="llama-3.1-8b.imatrix" + +./llama-imatrix -m $MODEL -f wiki.txt -o $IMATRIX -ngl 35 + +for QUANT in Q4_K_M Q5_K_M Q6_K Q8_0; do + OUTPUT="llama-3.1-8b-${QUANT,,}.gguf" + ./llama-quantize --imatrix $IMATRIX $MODEL $OUTPUT $QUANT + echo "Created: $OUTPUT ($(du -h $OUTPUT | cut -f1))" +done +``` + +### Quality testing (perplexity) + +```bash +./llama-perplexity -m model.gguf -f wikitext-2-raw/wiki.test.raw -c 512 +# Baseline FP16: ~5.96 | Q4_K_M: ~6.06 (+1.7%) | Q2_K: ~6.87 (+15.3%) +``` + +## Python bindings (llama-cpp-python) + +### Basic generation + +```python +from llama_cpp import Llama + +llm = Llama( + model_path="./model-q4_k_m.gguf", + n_ctx=4096, + n_gpu_layers=35, # 0 for CPU only, 99 to offload everything + n_threads=8, +) + +output = llm( + "What is machine learning?", + max_tokens=256, + temperature=0.7, + stop=["</s>", "\n\n"], +) +print(output["choices"][0]["text"]) +``` + +### Chat completion + streaming + +```python +llm = Llama( + model_path="./model-q4_k_m.gguf", + n_ctx=4096, + n_gpu_layers=35, + chat_format="llama-3", # Or "chatml", "mistral", etc.
+) + +# Non-streaming +response = llm.create_chat_completion( + messages=[ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "What is Python?"}, + ], + max_tokens=256, + temperature=0.7, +) +print(response["choices"][0]["message"]["content"]) + +# Streaming +for chunk in llm("Explain quantum computing:", max_tokens=256, stream=True): + print(chunk["choices"][0]["text"], end="", flush=True) +``` + +### Embeddings + +```python +llm = Llama(model_path="./model-q4_k_m.gguf", embedding=True, n_gpu_layers=35) +vec = llm.embed("This is a test sentence.") +print(f"Embedding dimension: {len(vec)}") ``` ## Hardware acceleration @@ -140,122 +273,166 @@ Q3_K_M or Q4_K_S # Lower bits to fit in memory ### Apple Silicon (Metal) ```bash -# Build with Metal -make LLAMA_METAL=1 - -# Run with GPU acceleration (automatic) -./llama-cli -m model.gguf -ngl 999 # Offload all layers - -# Performance: M3 Max 40-60 tokens/sec (Llama 2-7B Q4_K_M) +make clean && make GGML_METAL=1 +./llama-cli -m model.gguf -ngl 99 -p "Hello" # offload all layers ``` -### NVIDIA GPUs (CUDA) - -```bash -# Build with CUDA -make LLAMA_CUDA=1 - -# Offload layers to GPU -./llama-cli -m model.gguf -ngl 35 # Offload 35/40 layers - -# Hybrid CPU+GPU for large models -./llama-cli -m llama-70b.Q4_K_M.gguf -ngl 20 # GPU: 20 layers, CPU: rest +```python +llm = Llama( + model_path="model.gguf", + n_gpu_layers=99, # Offload everything + n_threads=1, # Metal handles parallelism +) ``` -### AMD GPUs (ROCm) +Performance: M3 Max ~40–60 tok/s on Llama 2-7B Q4_K_M. 
+ +### NVIDIA (CUDA) + +```bash +make clean && make GGML_CUDA=1 +./llama-cli -m model.gguf -ngl 35 -p "Hello" + +# Hybrid for large models +./llama-cli -m llama-70b.Q4_K_M.gguf -ngl 20 # GPU: 20 layers, CPU: rest + +# Multi-GPU split +./llama-cli -m large-model.gguf --tensor-split 0.5,0.5 -ngl 60 +``` + +### AMD (ROCm) ```bash -# Build with ROCm make LLAMA_HIP=1 - -# Run with AMD GPU ./llama-cli -m model.gguf -ngl 999 ``` -## Common patterns - -### Batch processing +### CPU ```bash -# Process multiple prompts from file -cat prompts.txt | ./llama-cli \ - -m model.gguf \ - --batch-size 512 \ - -n 100 +# Match PHYSICAL cores, not logical +./llama-cli -m model.gguf -t 8 -p "Hello" + +# BLAS acceleration (2–3× speedup) +make LLAMA_OPENBLAS=1 ``` -### Constrained generation - -```bash -# JSON output with grammar -./llama-cli \ - -m model.gguf \ - -p "Generate a person: " \ - --grammar-file grammars/json.gbnf - -# Outputs valid JSON only -``` - -### Context size - -```bash -# Increase context (default 512) -./llama-cli \ - -m model.gguf \ - -c 4096 # 4K context window - -# Very long context (if model supports) -./llama-cli -m model.gguf -c 32768 # 32K context +```python +llm = Llama( + model_path="model.gguf", + n_gpu_layers=0, + n_threads=8, + n_batch=512, # Larger batch = faster prompt processing +) ``` ## Performance benchmarks -### CPU performance (Llama 2-7B Q4_K_M) +### CPU (Llama 2-7B Q4_K_M) -| CPU | Threads | Speed | Cost | -|-----|---------|-------|------| -| Apple M3 Max | 16 | 50 tok/s | $0 (local) | -| AMD Ryzen 9 7950X | 32 | 35 tok/s | $0.50/hour | -| Intel i9-13900K | 32 | 30 tok/s | $0.40/hour | -| AWS c7i.16xlarge | 64 | 40 tok/s | $2.88/hour | +| CPU | Threads | Speed | +|-----|---------|-------| +| Apple M3 Max (Metal) | 16 | 50 tok/s | +| AMD Ryzen 9 7950X | 32 | 35 tok/s | +| Intel i9-13900K | 32 | 30 tok/s | -### GPU acceleration (Llama 2-7B Q4_K_M) +### GPU offloading on RTX 4090 -| GPU | Speed | vs CPU | Cost | -|-----|-------|--------|------| -| 
NVIDIA RTX 4090 | 120 tok/s | 3-4× | $0 (local) | -| NVIDIA A10 | 80 tok/s | 2-3× | $1.00/hour | -| AMD MI250 | 70 tok/s | 2× | $2.00/hour | -| Apple M3 Max (Metal) | 50 tok/s | ~Same | $0 (local) | +| Layers GPU | Speed | VRAM | +|------------|-------|------| +| 0 (CPU only) | 30 tok/s | 0 GB | +| 20 (hybrid) | 80 tok/s | 8 GB | +| 35 (all) | 120 tok/s | 12 GB | ## Supported models -**LLaMA family**: -- Llama 2 (7B, 13B, 70B) -- Llama 3 (8B, 70B, 405B) -- Code Llama +- **LLaMA family**: Llama 2 (7B/13B/70B), Llama 3 (8B/70B/405B), Code Llama +- **Mistral family**: Mistral 7B, Mixtral 8x7B/8x22B +- **Other**: Falcon, BLOOM, GPT-J, Phi-3, Gemma, Qwen, LLaVA (vision), Whisper (audio) -**Mistral family**: -- Mistral 7B -- Mixtral 8x7B, 8x22B +Find GGUF models: https://huggingface.co/models?library=gguf -**Other**: -- Falcon, BLOOM, GPT-J -- Phi-3, Gemma, Qwen -- LLaVA (vision), Whisper (audio) +## Ecosystem integrations -**Find models**: https://huggingface.co/models?library=gguf +### Ollama + +```bash +cat > Modelfile << 'EOF' +FROM ./model-q4_k_m.gguf +TEMPLATE """{{ .System }} +{{ .Prompt }}""" +PARAMETER temperature 0.7 +PARAMETER num_ctx 4096 +EOF + +ollama create mymodel -f Modelfile +ollama run mymodel "Hello!" +``` + +### LM Studio + +1. Place GGUF file in `~/.cache/lm-studio/models/` +2. Open LM Studio and select the model +3. Configure context length and GPU offload, start inference + +### text-generation-webui + +```bash +cp model-q4_k_m.gguf text-generation-webui/models/ +python server.py --model model-q4_k_m.gguf --loader llama.cpp --n-gpu-layers 35 +``` + +### OpenAI client → llama-server + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed") +response = client.chat.completions.create( + model="local-model", + messages=[{"role": "user", "content": "Hello!"}], + max_tokens=256, +) +print(response.choices[0].message.content) +``` + +## Best practices + +1. 
**Use K-quants** — Q4_K_M is the recommended default +2. **Use imatrix** for Q4 and below (calibration improves quality substantially) +3. **Offload as many layers as VRAM allows** — start high, reduce by 5 on OOM +4. **Thread count** — match physical cores, not logical +5. **Batch size** — increase `n_batch` (e.g. 512) for faster prompt processing +6. **Context** — start at 4096, grow only as needed (memory scales with ctx) +7. **Flash Attention** — add `--flash-attn` if your build supports it + +## Common issues (quick fixes) + +**Model loads slowly** — use `--mmap` for memory-mapped loading. + +**Out of memory (GPU)** — reduce `-ngl`, use a smaller quant (Q4_K_S / Q3_K_M), or quantize the KV cache: +```python +Llama(model_path="...", type_k=2, type_v=2, n_gpu_layers=35) # Q4_0 KV cache +``` + +**Garbage output** — wrong `chat_format`, temperature too high, or model file corrupted. Test with `temperature=0.1` and verify FP16 baseline works. + +**Connection refused (server)** — bind to `--host 0.0.0.0`, check `lsof -i :8080`. + +See `references/troubleshooting.md` for the full playbook. 
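The "start high, reduce by 5 on OOM" advice for GPU offload can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not part of llama.cpp or llama-cpp-python: `load_with_max_offload`, `fake_loader`, and the `oom_errors` parameter are hypothetical names, and which exception a real loader raises when VRAM runs out is an assumption you would need to verify for your build.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def load_with_max_offload(
    load_model: Callable[[int], T],
    start_layers: int = 35,
    step: int = 5,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Tuple[T, int]:
    """Try to load with `start_layers` GPU layers, stepping down on OOM.

    `load_model` is any callable that takes the number of GPU layers and
    returns a loaded model. `oom_errors` lists the exception types treated
    as "out of memory" (an assumption: adjust to what your loader raises).
    Returns the loaded model and the layer count that worked.
    """
    layers = start_layers
    while True:
        try:
            return load_model(layers), layers
        except oom_errors:
            if layers == 0:
                # Even CPU-only loading failed; nothing left to reduce.
                raise
            layers = max(0, layers - step)


def fake_loader(n_gpu_layers: int) -> str:
    """Hypothetical stand-in for a real loader: pretends VRAM fits 20 layers."""
    if n_gpu_layers > 20:
        raise MemoryError("simulated GPU out of memory")
    return f"model@{n_gpu_layers}"


if __name__ == "__main__":
    model, layers = load_with_max_offload(fake_loader)
    print(model, layers)
```

In practice `load_model` might wrap something like `Llama(model_path="model.gguf", n_gpu_layers=n)`. The loader is injected so the retry logic can be tested without a GPU.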
## References -- **[Quantization Guide](references/quantization.md)** - GGUF formats, conversion, quality comparison -- **[Server Deployment](references/server.md)** - API endpoints, Docker, monitoring -- **[Optimization](references/optimization.md)** - Performance tuning, hybrid CPU+GPU +- **[advanced-usage.md](references/advanced-usage.md)** — speculative decoding, batched inference, grammar-constrained generation, LoRA, multi-GPU, custom builds, benchmark scripts +- **[quantization.md](references/quantization.md)** — perplexity tables, use-case guide, model size scaling (7B/13B/70B RAM needs), imatrix deep dive +- **[server.md](references/server.md)** — OpenAI API endpoints, Docker deployment, NGINX load balancing, monitoring +- **[optimization.md](references/optimization.md)** — CPU threading, BLAS, GPU offload heuristics, batch tuning, benchmarks +- **[troubleshooting.md](references/troubleshooting.md)** — install/convert/quantize/inference/server issues, Apple Silicon, debugging ## Resources -- **GitHub**: https://github.com/ggerganov/llama.cpp -- **Models**: https://huggingface.co/models?library=gguf -- **Discord**: https://discord.gg/llama-cpp - - +- **GitHub**: https://github.com/ggml-org/llama.cpp +- **Python bindings**: https://github.com/abetlen/llama-cpp-python +- **Pre-quantized models**: https://huggingface.co/TheBloke +- **GGUF converter Space**: https://huggingface.co/spaces/ggml-org/gguf-my-repo +- **License**: MIT diff --git a/skills/mlops/inference/gguf/references/advanced-usage.md b/skills/mlops/inference/llama-cpp/references/advanced-usage.md similarity index 100% rename from skills/mlops/inference/gguf/references/advanced-usage.md rename to skills/mlops/inference/llama-cpp/references/advanced-usage.md diff --git a/skills/mlops/inference/gguf/references/troubleshooting.md b/skills/mlops/inference/llama-cpp/references/troubleshooting.md similarity index 100% rename from skills/mlops/inference/gguf/references/troubleshooting.md rename to 
skills/mlops/inference/llama-cpp/references/troubleshooting.md diff --git a/skills/mlops/training/grpo-rl-training/README.md b/skills/mlops/training/grpo-rl-training/README.md deleted file mode 100644 index 99b60d66438..00000000000 --- a/skills/mlops/training/grpo-rl-training/README.md +++ /dev/null @@ -1,97 +0,0 @@ -# GRPO/RL Training Skill - -**Expert-level guidance for Group Relative Policy Optimization with TRL** - -## 📁 Skill Structure - -``` -grpo-rl-training/ -├── SKILL.md # Main skill documentation (READ THIS FIRST) -├── README.md # This file -├── templates/ -│ └── basic_grpo_training.py # Production-ready training template -└── examples/ - └── reward_functions_library.py # 20+ reward function examples -``` - -## 🚀 Quick Start - -1. **Read SKILL.md** - Comprehensive guide with all concepts and patterns -2. **Copy `templates/basic_grpo_training.py`** - Start with working code -3. **Browse `examples/reward_functions_library.py`** - Pick reward functions for your task -4. **Modify for your use case** - Adapt dataset, rewards, and config - -## 💡 What's Inside - -### SKILL.md (Main Documentation) -- Core GRPO concepts and algorithm fundamentals -- Complete implementation workflow (dataset → rewards → training → deployment) -- 10+ reward function examples with code -- Hyperparameter tuning guide -- Training insights (loss behavior, metrics, debugging) -- Troubleshooting guide -- Production best practices - -### Templates -- **basic_grpo_training.py**: Minimal, production-ready training script - - Uses Qwen 2.5 1.5B Instruct - - 3 reward functions (format + correctness) - - LoRA for efficient training - - Fully documented and ready to run - -### Examples -- **reward_functions_library.py**: 20+ battle-tested reward functions - - Correctness rewards (exact match, fuzzy match, numeric, code execution) - - Format rewards (XML, JSON, strict/soft) - - Length rewards (ideal length, min/max) - - Style rewards (reasoning quality, citations, repetition penalty) - - Combined 
rewards (multi-objective optimization) - - Preset collections for common tasks - -## 📖 Usage for Agents - -When this skill is loaded in your agent's context: - -1. **Always read SKILL.md first** before implementing -2. **Start simple** - Use length-based reward to validate setup -3. **Build incrementally** - Add one reward function at a time -4. **Reference examples** - Copy patterns from reward_functions_library.py -5. **Monitor training** - Watch reward metrics (not loss!) - -## 🎯 Common Use Cases - -| Task Type | Recommended Rewards | Template | -|-----------|---------------------|----------| -| Math reasoning | `MATH_REASONING_REWARDS` preset | basic_grpo_training.py | -| Code generation | `CODE_GENERATION_REWARDS` preset | Modify dataset in template | -| Summarization | `SUMMARIZATION_REWARDS` preset | Adjust prompts + rewards | -| Q&A | `QA_REWARDS` preset | Use fuzzy match + citations | - -## ⚠️ Critical Reminders - -- **Loss goes UP during training** - This is normal (it's KL divergence) -- **Use 3-5 reward functions** - Single rewards often fail -- **Test rewards before training** - Debug each function independently -- **Monitor reward_std** - Should stay > 0.1 (avoid mode collapse) -- **Start with num_generations=4-8** - Scale up if GPU allows - -## 🔗 External Resources - -- [TRL Documentation](https://huggingface.co/docs/trl) -- [DeepSeek R1 Paper](https://arxiv.org/abs/2501.12948) -- [Open R1 Implementation](https://github.com/huggingface/open-r1) -- [Unsloth (2-3x faster)](https://docs.unsloth.ai/) - -## 📝 Version - -**v1.0.0** - Initial release (January 2025) - -## 👨‍💻 Maintained By - -Orchestra Research -For questions or improvements, see https://orchestra.com - ---- - -**License:** MIT -**Last Updated:** January 2025 diff --git a/skills/mlops/training/trl-fine-tuning/SKILL.md b/skills/mlops/training/trl-fine-tuning/SKILL.md index 3bf4f6e12ba..70023fc707f 100644 --- a/skills/mlops/training/trl-fine-tuning/SKILL.md +++ 
b/skills/mlops/training/trl-fine-tuning/SKILL.md @@ -252,6 +252,8 @@ trl dpo \ Train with reinforcement learning using minimal memory. +For in-depth GRPO guidance — reward function design, critical training insights (loss behavior, mode collapse, tuning), and advanced multi-stage patterns — see **[references/grpo-training.md](references/grpo-training.md)**. A production-ready training script is in **[templates/basic_grpo_training.py](templates/basic_grpo_training.py)**. + Copy this checklist: ``` @@ -428,6 +430,8 @@ config = PPOConfig( **Online RL methods**: See [references/online-rl.md](references/online-rl.md) for PPO, GRPO, RLOO, and OnlineDPO with detailed configurations. +**GRPO deep dive**: See [references/grpo-training.md](references/grpo-training.md) for expert-level GRPO patterns — reward function design philosophy, training insights (why loss increases, mode collapse detection), hyperparameter tuning, multi-stage training, and troubleshooting. Production-ready template in [templates/basic_grpo_training.py](templates/basic_grpo_training.py). 
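As a rough illustration of why mode collapse matters for GRPO: the method scores each completion relative to the other completions sampled for the same prompt, so a group whose rewards are all identical carries no learning signal. The sketch below is an assumption-laden toy, not TRL's implementation: `group_advantages` and `looks_collapsed` are hypothetical helpers, the population standard deviation and the `eps` value are choices made here for simplicity, and the 0.1 threshold is the rule of thumb quoted in the GRPO reference rather than a library default.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps).

    `eps` only guards against division by zero when every reward is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def looks_collapsed(rewards: List[float], min_std: float = 0.1) -> bool:
    """Flag a group whose reward spread has fallen below `min_std`."""
    return pstdev(rewards) < min_std


if __name__ == "__main__":
    healthy = [0.0, 2.0, 0.0, 2.0]    # completions disagree: useful signal
    collapsed = [1.0, 1.0, 1.0, 1.0]  # identical rewards: zero advantages
    print(group_advantages(healthy), looks_collapsed(healthy))
    print(group_advantages(collapsed), looks_collapsed(collapsed))
```

When every advantage in a group is zero, that group contributes nothing to the policy update, which is why the reference suggests watching `reward_std` rather than the loss.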
+ ## Hardware requirements - **GPU**: NVIDIA (CUDA required) diff --git a/skills/mlops/training/grpo-rl-training/SKILL.md b/skills/mlops/training/trl-fine-tuning/references/grpo-training.md similarity index 56% rename from skills/mlops/training/grpo-rl-training/SKILL.md rename to skills/mlops/training/trl-fine-tuning/references/grpo-training.md index 1d7629ab633..a22bd40945d 100644 --- a/skills/mlops/training/grpo-rl-training/SKILL.md +++ b/skills/mlops/training/trl-fine-tuning/references/grpo-training.md @@ -1,51 +1,36 @@ ---- -name: grpo-rl-training -description: Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training -version: 1.0.0 -author: Orchestra Research -license: MIT -dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch] -metadata: - hermes: - tags: [Post-Training, Reinforcement Learning, GRPO, TRL, RLHF, Reward Modeling, Reasoning, DPO, PPO, Structured Output] +# GRPO (Group Relative Policy Optimization) — Deep Guide ---- +Expert-level patterns, critical insights, and production-ready workflows for fine-tuning language models with custom reward functions using TRL's `GRPOTrainer`. This is the deep reference for the GRPO workflow summarized in the main skill. -# GRPO/RL Training with TRL +## When to use GRPO -Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-ready workflows for fine-tuning language models with custom reward functions. 
- -## When to Use This Skill - -Use GRPO training when you need to: -- **Enforce specific output formats** (e.g., XML tags, JSON, structured reasoning) +Use GRPO when you need to: +- **Enforce specific output formats** (XML tags, JSON, structured reasoning) - **Teach verifiable tasks** with objective correctness metrics (math, coding, fact-checking) - **Improve reasoning capabilities** by rewarding chain-of-thought patterns - **Align models to domain-specific behaviors** without labeled preference data - **Optimize for multiple objectives** simultaneously (format + correctness + style) **Do NOT use GRPO for:** -- Simple supervised fine-tuning tasks (use SFT instead) +- Simple supervised fine-tuning tasks → use SFT - Tasks without clear reward signals -- When you already have high-quality preference pairs (use DPO/PPO instead) +- When you already have high-quality preference pairs → use DPO/PPO ---- +## Core concepts -## Core Concepts +### 1. GRPO algorithm fundamentals -### 1. GRPO Algorithm Fundamentals - -**Key Mechanism:** -- Generates **multiple completions** for each prompt (group size: 4-16) +**Key mechanism:** +- Generates **multiple completions** per prompt (group size: 4–16) - Compares completions within each group using reward functions - Updates policy to favor higher-rewarded responses relative to the group -**Critical Difference from PPO:** +**Critical differences from PPO:** - No separate reward model needed - More sample-efficient (learns from within-group comparisons) - Simpler to implement and debug -**Mathematical Intuition:** +**Mathematical intuition:** ``` For each prompt p: 1. Generate N completions: {c₁, c₂, ..., cₙ} @@ -54,35 +39,32 @@ For each prompt p: relative to low-reward ones in the same group ``` -### 2. Reward Function Design Philosophy +### 2. Reward function design philosophy -**Golden Rules:** -1. **Compose multiple reward functions** - Each handles one aspect (format, correctness, style) -2. 
**Scale rewards appropriately** - Higher weight = stronger signal -3. **Use incremental rewards** - Partial credit for partial compliance -4. **Test rewards independently** - Debug each reward function in isolation +**Golden rules:** +1. **Compose multiple reward functions** — each handles one aspect (format, correctness, style) +2. **Scale rewards appropriately** — higher weight = stronger signal +3. **Use incremental rewards** — partial credit for partial compliance +4. **Test rewards independently** — debug each reward function in isolation -**Reward Function Types:** +**Reward function types:** | Type | Use Case | Example Weight | |------|----------|----------------| | **Correctness** | Verifiable tasks (math, code) | 2.0 (highest) | -| **Format** | Strict structure enforcement | 0.5-1.0 | -| **Length** | Encourage verbosity/conciseness | 0.1-0.5 | -| **Style** | Penalize unwanted patterns | -0.5 to 0.5 | +| **Format** | Strict structure enforcement | 0.5–1.0 | +| **Length** | Encourage verbosity/conciseness | 0.1–0.5 | +| **Style** | Penalize unwanted patterns | −0.5 to 0.5 | ---- +## Implementation workflow -## Implementation Workflow +### Step 1: Dataset preparation -### Step 1: Dataset Preparation - -**Critical Requirements:** -- Prompts in chat format (list of dicts with 'role' and 'content') +**Critical requirements:** +- Prompts in chat format (list of dicts with `role` and `content`) - Include system prompts to set expectations - For verifiable tasks, include ground truth answers as additional columns -**Example Structure:** ```python from datasets import load_dataset, Dataset @@ -97,8 +79,7 @@ Respond in the following format: """ def prepare_dataset(raw_data): - """ - Transform raw data into GRPO-compatible format. + """Transform raw data into GRPO-compatible format. 
Returns: Dataset with columns: - 'prompt': List[Dict] with role/content (system + user messages) @@ -113,14 +94,14 @@ def prepare_dataset(raw_data): }) ``` -**Pro Tips:** -- Use one-shot or few-shot examples in system prompt for complex formats -- Keep prompts concise (max_prompt_length: 256-512 tokens) +**Pro tips:** +- Use one-shot or few-shot examples in the system prompt for complex formats +- Keep prompts concise (max_prompt_length: 256–512 tokens) - Validate data quality before training (garbage in = garbage out) -### Step 2: Reward Function Implementation +### Step 2: Reward function implementation -**Template Structure:** +**Template structure:** ```python def reward_function_name( prompts, # List[List[Dict]]: Original prompts @@ -128,24 +109,16 @@ def reward_function_name( answer=None, # Optional: Ground truth from dataset **kwargs # Additional dataset columns ) -> list[float]: - """ - Evaluate completions and return rewards. - - Returns: List of floats (one per completion) - """ - # Extract completion text + """Evaluate completions and return rewards (one per completion).""" responses = [comp[0]['content'] for comp in completions] - - # Compute rewards rewards = [] for response in responses: score = compute_score(response) rewards.append(score) - return rewards ``` -**Example 1: Correctness Reward (Math/Coding)** +**Example 1: correctness reward (math/coding)** ```python def correctness_reward(prompts, completions, answer, **kwargs): """Reward correct answers with high score.""" @@ -155,7 +128,7 @@ def correctness_reward(prompts, completions, answer, **kwargs): for ans, gt in zip(extracted, answer)] ``` -**Example 2: Format Reward (Structured Output)** +**Example 2: format reward (structured output)** ```python import re @@ -167,7 +140,7 @@ def format_reward(completions, **kwargs): for r in responses] ``` -**Example 3: Incremental Format Reward (Partial Credit)** +**Example 3: incremental format reward (partial credit)** ```python def 
incremental_format_reward(completions, **kwargs): """Award partial credit for format compliance.""" @@ -176,14 +149,10 @@ def incremental_format_reward(completions, **kwargs): for r in responses: score = 0.0 - if '' in r: - score += 0.25 - if '' in r: - score += 0.25 - if '' in r: - score += 0.25 - if '' in r: - score += 0.25 + if '' in r: score += 0.25 + if '' in r: score += 0.25 + if '' in r: score += 0.25 + if '' in r: score += 0.25 # Penalize extra text after closing tag if r.count('') == 1: extra_text = r.split('')[-1].strip() @@ -193,12 +162,11 @@ def incremental_format_reward(completions, **kwargs): return rewards ``` -**Critical Insight:** -Combine 3-5 reward functions for robust training. Order matters less than diversity of signals. +**Critical insight:** Combine 3–5 reward functions for robust training. Order matters less than diversity of signals. -### Step 3: Training Configuration +### Step 3: Training configuration -**Memory-Optimized Config (Small GPU)** +**Memory-optimized config (small GPU)** ```python from trl import GRPOConfig @@ -218,13 +186,13 @@ training_args = GRPOConfig( gradient_accumulation_steps=4, # Effective batch = 4 # GRPO-specific - num_generations=8, # Group size: 8-16 recommended + num_generations=8, # Group size: 8–16 recommended max_prompt_length=256, max_completion_length=512, # Training duration num_train_epochs=1, - max_steps=None, # Or set fixed steps (e.g., 500) + max_steps=None, # Optimization bf16=True, # Faster on A100/H100 @@ -234,11 +202,11 @@ training_args = GRPOConfig( # Logging logging_steps=1, save_steps=100, - report_to="wandb", # Or "none" for no logging + report_to="wandb", ) ``` -**High-Performance Config (Large GPU)** +**High-performance config (large GPU)** ```python training_args = GRPOConfig( output_dir="outputs/grpo-model", @@ -255,31 +223,30 @@ training_args = GRPOConfig( ) ``` -**Critical Hyperparameters:** +**Critical hyperparameters:** | Parameter | Impact | Tuning Advice | 
|-----------|--------|---------------| -| `num_generations` | Group size for comparison | Start with 8, increase to 16 if GPU allows | +| `num_generations` | Group size for comparison | Start 8, increase to 16 if GPU allows | | `learning_rate` | Convergence speed/stability | 5e-6 (safe), 1e-5 (faster, riskier) | -| `max_completion_length` | Output verbosity | Match your task (512 for reasoning, 256 for short answers) | +| `max_completion_length` | Output verbosity | Match your task (512 reasoning, 256 short answers) | | `gradient_accumulation_steps` | Effective batch size | Increase if GPU memory limited | -### Step 4: Model Setup and Training +### Step 4: Model setup and training -**Standard Setup (Transformers)** +**Standard setup (Transformers + TRL)** ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from peft import LoraConfig from trl import GRPOTrainer -# Load model model_name = "Qwen/Qwen2.5-1.5B-Instruct" model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.bfloat16, - attn_implementation="flash_attention_2", # 2-3x faster - device_map="auto" + attn_implementation="flash_attention_2", # 2–3× faster + device_map="auto", ) tokenizer = AutoTokenizer.from_pretrained(model_name) @@ -287,17 +254,16 @@ tokenizer.pad_token = tokenizer.eos_token # Optional: LoRA for parameter-efficient training peft_config = LoraConfig( - r=16, # Rank (higher = more capacity) - lora_alpha=32, # Scaling factor (typically 2*r) + r=16, + lora_alpha=32, target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", - "gate_proj", "up_proj", "down_proj" + "gate_proj", "up_proj", "down_proj", ], task_type="CAUSAL_LM", lora_dropout=0.05, ) -# Initialize trainer trainer = GRPOTrainer( model=model, processing_class=tokenizer, @@ -308,17 +274,14 @@ trainer = GRPOTrainer( ], args=training_args, train_dataset=dataset, - peft_config=peft_config, # Remove for full fine-tuning + peft_config=peft_config, # Remove for full fine-tuning ) -# Train 
trainer.train() - -# Save trainer.save_model("final_model") ``` -**Unsloth Setup (2-3x Faster)** +**Unsloth setup (2–3× faster)** ```python from unsloth import FastLanguageModel @@ -339,28 +302,26 @@ model = FastLanguageModel.get_peft_model( use_gradient_checkpointing="unsloth", ) -# Rest is identical to standard setup +# Rest is identical to the standard setup trainer = GRPOTrainer(model=model, ...) trainer.train() ``` ---- +## Critical training insights -## Critical Training Insights +### 1. Loss behavior (EXPECTED pattern) +- **Loss starts near 0 and INCREASES during training** — this is CORRECT +- Loss measures KL divergence from initial policy; the model is learning (diverging from original behavior to optimize rewards) +- **Monitor reward metrics, not loss, for progress** -### 1. Loss Behavior (EXPECTED PATTERN) -- **Loss starts near 0 and INCREASES during training** -- This is CORRECT - loss measures KL divergence from initial policy -- Model is learning (diverging from original behavior to optimize rewards) -- Monitor reward metrics instead of loss for progress +### 2. Reward tracking -### 2. 
Reward Tracking Key metrics to watch: -- `reward`: Average across all completions -- `reward_std`: Diversity within groups (should remain > 0) -- `kl`: KL divergence from reference (should grow moderately) +- `reward` — average across all completions +- `reward_std` — diversity within groups (should remain > 0) +- `kl` — KL divergence from reference (should grow moderately) -**Healthy Training Pattern:** +**Healthy pattern:** ``` Step Reward Reward_Std KL 100 0.5 0.3 0.02 @@ -369,12 +330,12 @@ Step Reward Reward_Std KL 400 1.5 0.15 0.12 ``` -**Warning Signs:** -- Reward std → 0 (model collapsing to single response) -- KL exploding (> 0.5) (diverging too much, reduce LR) -- Reward stuck (reward functions too harsh or model capacity issue) +**Warning signs:** +- `reward_std` → 0 (model collapsing to a single response) +- `kl` exploding (> 0.5) — diverging too much, reduce LR +- Reward stuck — reward functions too harsh or model capacity issue -### 3. Common Pitfalls and Solutions +### 3. Common pitfalls and solutions | Problem | Symptom | Solution | |---------|---------|----------| @@ -384,15 +345,14 @@ Step Reward Reward_Std KL | **Slow training** | < 1 it/s | Enable `use_vllm=True`, use Unsloth, reduce seq length | | **Format ignored** | Model doesn't follow structure | Increase format reward weight, add incremental rewards | ---- +## Advanced patterns -## Advanced Patterns +### 1. Multi-stage training -### 1. Multi-Stage Training For complex tasks, train in stages: ```python -# Stage 1: Format compliance (epochs=1) +# Stage 1: Format compliance trainer_stage1 = GRPOTrainer( model=model, reward_funcs=[incremental_format_reward, format_reward], @@ -400,7 +360,7 @@ trainer_stage1 = GRPOTrainer( ) trainer_stage1.train() -# Stage 2: Correctness (epochs=1) +# Stage 2: Correctness trainer_stage2 = GRPOTrainer( model=model, reward_funcs=[format_reward, correctness_reward], @@ -409,7 +369,8 @@ trainer_stage2 = GRPOTrainer( trainer_stage2.train() ``` -### 2. 
Adaptive Reward Scaling +### 2. Adaptive reward scaling + ```python class AdaptiveReward: def __init__(self, base_reward_func, initial_weight=1.0): @@ -428,148 +389,116 @@ class AdaptiveReward: self.weight *= 0.9 ``` -### 3. Custom Dataset Integration +### 3. Custom dataset integration + ```python def load_custom_knowledge_base(csv_path): - """Example: School communication platform docs.""" import pandas as pd df = pd.read_csv(csv_path) - - dataset = Dataset.from_pandas(df).map(lambda x: { + return Dataset.from_pandas(df).map(lambda x: { 'prompt': [ {'role': 'system', 'content': CUSTOM_SYSTEM_PROMPT}, {'role': 'user', 'content': x['question']} ], 'answer': x['expert_answer'] }) - return dataset ``` ---- +## Deployment and inference -## Deployment and Inference - -### Save and Merge LoRA +### Save and merge LoRA ```python -# Merge LoRA adapters into base model if hasattr(trainer.model, 'merge_and_unload'): merged_model = trainer.model.merge_and_unload() merged_model.save_pretrained("production_model") tokenizer.save_pretrained("production_model") ``` -### Inference Example +### Inference ```python from transformers import pipeline -generator = pipeline( - "text-generation", - model="production_model", - tokenizer=tokenizer -) +generator = pipeline("text-generation", model="production_model", tokenizer=tokenizer) result = generator( [ {'role': 'system', 'content': SYSTEM_PROMPT}, - {'role': 'user', 'content': "What is 15 + 27?"} + {'role': 'user', 'content': "What is 15 + 27?"}, ], max_new_tokens=256, do_sample=True, temperature=0.7, - top_p=0.9 + top_p=0.9, ) print(result[0]['generated_text']) ``` ---- +## Best practices checklist -## Best Practices Checklist - -**Before Training:** +**Before training:** - [ ] Validate dataset format (prompts as List[Dict]) - [ ] Test reward functions on sample data -- [ ] Calculate expected max_prompt_length from data -- [ ] Choose appropriate num_generations based on GPU memory +- [ ] Calculate expected `max_prompt_length` from 
data +- [ ] Choose `num_generations` based on GPU memory - [ ] Set up logging (wandb recommended) -**During Training:** +**During training:** - [ ] Monitor reward progression (should increase) -- [ ] Check reward_std (should stay > 0.1) +- [ ] Check `reward_std` (should stay > 0.1) - [ ] Watch for OOM errors (reduce batch size if needed) -- [ ] Sample generations every 50-100 steps +- [ ] Sample generations every 50–100 steps - [ ] Validate format compliance on holdout set -**After Training:** +**After training:** - [ ] Merge LoRA weights if using PEFT - [ ] Test on diverse prompts - [ ] Compare to baseline model - [ ] Document reward weights and hyperparameters - [ ] Save reproducibility config ---- +## Troubleshooting -## Troubleshooting Guide +### Debugging workflow +1. **Isolate reward functions** — test each independently +2. **Check data distribution** — ensure diversity in prompts +3. **Reduce complexity** — start with single reward, add gradually +4. **Monitor generations** — print samples every N steps +5. **Validate extraction logic** — ensure answer parsing works -### Debugging Workflow -1. **Isolate reward functions** - Test each independently -2. **Check data distribution** - Ensure diversity in prompts -3. **Reduce complexity** - Start with single reward, add gradually -4. **Monitor generations** - Print samples every N steps -5. 
**Validate extraction logic** - Ensure answer parsing works - -### Quick Fixes +### Quick debug reward ```python -# Debug reward function def debug_reward(completions, **kwargs): responses = [comp[0]['content'] for comp in completions] - for i, r in enumerate(responses[:2]): # Print first 2 + for i, r in enumerate(responses[:2]): print(f"Response {i}: {r[:200]}...") - return [1.0] * len(responses) # Dummy rewards + return [1.0] * len(responses) # Test without training trainer = GRPOTrainer(..., reward_funcs=[debug_reward]) -trainer.generate_completions(dataset[:1]) # Generate without updating +trainer.generate_completions(dataset[:1]) ``` ---- +## Template -## References and Resources +A production-ready training script lives at **`../templates/basic_grpo_training.py`**. It uses Qwen 2.5-1.5B-Instruct with LoRA and three reward functions (incremental format, strict format, correctness) on GSM8K. Copy and adapt: +1. `get_dataset()` — swap in your data loader +2. Reward functions — tune to your task +3. `SYSTEM_PROMPT` — match your output format +4. `GRPOConfig` — adjust hyperparameters for your GPU + +## References and resources -**Official Documentation:** - TRL GRPO Trainer: https://huggingface.co/docs/trl/grpo_trainer -- DeepSeek R1 Paper: https://arxiv.org/abs/2501.12948 -- Unsloth Docs: https://docs.unsloth.ai/ - -**Example Repositories:** -- Open R1 Implementation: https://github.com/huggingface/open-r1 -- TRL Examples: https://github.com/huggingface/trl/tree/main/examples - -**Recommended Reading:** -- Progressive Disclosure Pattern for agent instructions -- Reward shaping in RL (Ng et al.) -- LoRA paper (Hu et al., 2021) - ---- - -## Usage Instructions for Agents - -When this skill is loaded: - -1. **Read this entire file** before implementing GRPO training -2. **Start with the simplest reward function** (e.g., length-based) to validate setup -3. **Use the templates** in `templates/` directory as starting points -4. 
**Reference examples** in `examples/` for task-specific implementations -5. **Follow the workflow** sequentially (don't skip steps) -6. **Debug incrementally** - add one reward function at a time - -**Critical Reminders:** -- Always use multiple reward functions (3-5 is optimal) -- Monitor reward metrics, not loss -- Test reward functions before training -- Start small (num_generations=4), scale up gradually -- Save checkpoints frequently (every 100 steps) - -This skill is designed for **expert-level implementation**. Beginners should start with supervised fine-tuning before attempting GRPO. - +- GRPO paper (DeepSeek): https://arxiv.org/abs/2402.03300 +- DeepSeek R1 paper: https://arxiv.org/abs/2501.12948 +- Open R1 implementation: https://github.com/huggingface/open-r1 +- TRL examples: https://github.com/huggingface/trl/tree/main/examples +- Unsloth (faster training): https://docs.unsloth.ai/ +## Critical reminders +- **Loss goes UP during training** — this is normal (it's KL divergence) +- **Use 3–5 reward functions** — single rewards often fail +- **Test rewards before training** — debug each function independently +- **Monitor `reward_std`** — should stay > 0.1 (avoid mode collapse) +- **Start with `num_generations=4–8`** — scale up if GPU allows diff --git a/skills/mlops/training/grpo-rl-training/templates/basic_grpo_training.py b/skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py similarity index 100% rename from skills/mlops/training/grpo-rl-training/templates/basic_grpo_training.py rename to skills/mlops/training/trl-fine-tuning/templates/basic_grpo_training.py diff --git a/website/docs/reference/optional-skills-catalog.md b/website/docs/reference/optional-skills-catalog.md index 6fde99b5ee8..bbb2c3b80ea 100644 --- a/website/docs/reference/optional-skills-catalog.md +++ b/website/docs/reference/optional-skills-catalog.md @@ -98,6 +98,7 @@ The largest optional category — covers the full ML pipeline from data curation | **chroma** | 
Open-source embedding database. Store embeddings and metadata, perform vector and full-text search. Simple 4-function API for RAG and semantic search. | | **faiss** | Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). | | **flash-attention** | Optimize transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Supports PyTorch SDPA, flash-attn library, H100 FP8, and sliding window. | +| **guidance** | Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance — Microsoft Research's constrained generation framework. | | **hermes-atropos-environments** | Build, test, and debug Hermes Agent RL environments for Atropos training. Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, and evaluation. | | **huggingface-tokenizers** | Fast Rust-based tokenizers for research and production. Tokenizes 1GB in under 20 seconds. Supports BPE, WordPiece, and Unigram algorithms. | | **instructor** | Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, and stream partial results. | diff --git a/website/docs/reference/skills-catalog.md b/website/docs/reference/skills-catalog.md index 13ef2f7fc4a..ead50dbea67 100644 --- a/website/docs/reference/skills-catalog.md +++ b/website/docs/reference/skills-catalog.md @@ -163,10 +163,8 @@ Model serving, quantization (GGUF/GPTQ), structured output, inference optimizati | Skill | Description | Path | |-------|-------------|------| -| `gguf-quantization` | GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements. 
| `mlops/inference/gguf` | -| `guidance` | Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework | `mlops/inference/guidance` | | `instructor` | Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - battle-tested structured output library | `mlops/inference/instructor` | -| `llama-cpp` | Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU. | `mlops/inference/llama-cpp` | +| `llama-cpp` | Run LLM inference with llama.cpp on CPU, Apple Silicon, AMD/Intel GPUs, or NVIDIA — plus GGUF model conversion and quantization (2–8 bit with K-quants and imatrix). Covers CLI, Python bindings, OpenAI-compatible server, and Ollama/LM Studio integration. Use for edge deployment, M1/M2/M3/M4 Macs, CUDA-less environments, or flexible local quantization. | `mlops/inference/llama-cpp` | | `obliteratus` | Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 
9 CLI methods, 28 analysis modules, 116 model presets ac… | `mlops/inference/obliteratus` | | `outlines` | Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library | `mlops/inference/outlines` | | `serving-llms-vllm` | Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), an… | `mlops/inference/vllm` | @@ -202,7 +200,6 @@ Fine-tuning, RLHF/DPO/GRPO training, distributed training frameworks, and optimi | `axolotl` | Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support | `mlops/training/axolotl` | | `distributed-llm-pretraining-torchtitan` | Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing. | `mlops/training/torchtitan` | | `fine-tuning-with-trl` | Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Tr… | `mlops/training/trl-fine-tuning` | -| `grpo-rl-training` | Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training | `mlops/training/grpo-rl-training` | | `hermes-atropos-environments` | Build, test, and debug Hermes Agent RL environments for Atropos training. 
Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, evaluation with tools, wandb logging, and the three CLI modes (serve/process/evaluate). Use when creating, reviewing, or f… | `mlops/training/hermes-atropos-environments` | | `huggingface-accelerate` | Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard. | `mlops/training/accelerate` | | `optimizing-attention-flash` | Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA,… | `mlops/training/flash-attention` |

From 54e0eb24c0c9700fd0139242aab740c51711bacb Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Sat, 18 Apr 2026 01:45:48 -0700
Subject: [PATCH 012/143] =?UTF-8?q?docs:=20correctness=20audit=20=E2=80=94?=
 =?UTF-8?q?=20fix=20wrong=20values,=20add=20missing=20coverage=20(#11972)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Comprehensive audit of every reference/messaging/feature doc page against the live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY, TOOLSETS, tool registry, on-disk skills). Every fix was verified against code before writing.

### Wrong values fixed (users would paste-and-fail)

- reference/environment-variables.md:
  - DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` → actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
  - MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` → actual `/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested `mcp: servers:` key → real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist on disk (all moved to `optional-skills/`). Regenerated the whole bundled section from `skills/**/SKILL.md` — 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no `free_response_channels` equivalent; both the env var and the yaml key are in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the adapter only reads `extra.markdown_support` from config.yaml. Removed the env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` → `hermes gateway setup`.

### Missing coverage added

- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) — both in PROVIDER_REGISTRY but undocumented everywhere. Added sections to integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`. Removed duplicate rows for `/snapshot`, `/fast` (×2), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" → 52. Added `feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core rows + all missing `hermes-` toolsets in the platform table (bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin, homeassistant, webhook, gateway). Fixed the `debugging` composite to describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL, NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL, XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE, BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS, QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY, HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section that invented a `telegram.webhook_mode: true` yaml key the adapter does not read.
- user-guide/features/hooks.md: added `on_session_finalize` and `on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the `/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events (10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and `title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth to the supported-providers table.
- user-guide/features/tts.md: "seven providers" → "eight" (post-xAI add oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`; yaml example block gains `mistral:`, `gemini:`, `xai:` subsections. Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from `nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to `claude-opus-4.7`.

### Docs-site integrity

- guides/build-a-hermes-plugin.md referenced two nonexistent hooks (`pre_api_request`, `post_api_request`). Replaced with the real `on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing broken links to `/docs/user-guide/features/profiles` (actual path is `/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a JSX tag. Escaped to `<1%`.

### False positives filtered out (not changed, verified correct)

- `/set-home` is a registered alias of `/sethome` — docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \`); changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default → disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code; documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).

Validation:
- `docusaurus build` — passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` — 124 files, 0 errors.
- 22 files changed, +317 / −158.
--- website/docs/getting-started/quickstart.md | 3 + website/docs/guides/build-a-hermes-plugin.md | 4 +- website/docs/integrations/providers.md | 60 ++++++- website/docs/reference/cli-commands.md | 5 +- .../docs/reference/environment-variables.md | 22 ++- website/docs/reference/faq.md | 6 +- .../docs/reference/optional-skills-catalog.md | 1 + website/docs/reference/skills-catalog.md | 149 ++++++++---------- website/docs/reference/slash-commands.md | 8 +- website/docs/reference/tools-reference.md | 23 ++- website/docs/reference/toolsets-reference.md | 29 ++-- website/docs/user-guide/configuration.md | 16 +- .../docs/user-guide/features/api-server.md | 56 ++++++- .../user-guide/features/fallback-providers.md | 7 + website/docs/user-guide/features/hooks.md | 48 ++++++ website/docs/user-guide/features/tts.md | 2 +- website/docs/user-guide/messaging/discord.md | 4 + website/docs/user-guide/messaging/matrix.md | 5 + .../docs/user-guide/messaging/open-webui.md | 2 +- website/docs/user-guide/messaging/qqbot.md | 6 +- website/docs/user-guide/messaging/slack.md | 2 +- website/docs/user-guide/messaging/telegram.md | 34 ---- 22 files changed, 326 insertions(+), 166 deletions(-) diff --git a/website/docs/getting-started/quickstart.md b/website/docs/getting-started/quickstart.md index 77d6ac84904..8a39c49f1e8 100644 --- a/website/docs/getting-started/quickstart.md +++ b/website/docs/getting-started/quickstart.md @@ -53,6 +53,9 @@ hermes setup # Or configure everything at once | **Kimi / Moonshot** | Moonshot-hosted coding and chat models | Set `KIMI_API_KEY` | | **Kimi / Moonshot China** | China-region Moonshot endpoint | Set `KIMI_CN_API_KEY` | | **Arcee AI** | Trinity models | Set `ARCEEAI_API_KEY` | +| **Xiaomi MiMo** | Xiaomi MiMo models via [platform.xiaomimimo.com](https://platform.xiaomimimo.com) | Set `XIAOMI_API_KEY` | +| **AWS Bedrock** | Anthropic Claude, Amazon Nova, DeepSeek v3.2, and Meta Llama via AWS | Standard boto3 auth (`AWS_PROFILE` or `AWS_ACCESS_KEY_ID` + 
`AWS_REGION`) | +| **Qwen Portal (OAuth)** | Qwen 3.5 / Qwen-Coder models via Alibaba's consumer Qwen Portal | OAuth via `hermes model` (optional: `HERMES_QWEN_BASE_URL`) | | **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` | | **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` | | **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` | diff --git a/website/docs/guides/build-a-hermes-plugin.md b/website/docs/guides/build-a-hermes-plugin.md index e8611197a17..4e2ee5cf267 100644 --- a/website/docs/guides/build-a-hermes-plugin.md +++ b/website/docs/guides/build-a-hermes-plugin.md @@ -419,8 +419,8 @@ Each hook is documented in full on the **[Event Hooks reference](/docs/user-guid | [`post_llm_call`](/docs/user-guide/features/hooks#post_llm_call) | Once per turn, after the tool-calling loop (successful turns only) | `session_id: str, user_message: str, assistant_response: str, conversation_history: list, model: str, platform: str` | ignored | | [`on_session_start`](/docs/user-guide/features/hooks#on_session_start) | New session created (first turn only) | `session_id: str, model: str, platform: str` | ignored | | [`on_session_end`](/docs/user-guide/features/hooks#on_session_end) | End of every `run_conversation` call + CLI exit | `session_id: str, completed: bool, interrupted: bool, model: str, platform: str` | ignored | -| [`pre_api_request`](/docs/user-guide/features/hooks#pre_api_request) | Before each HTTP request to the LLM provider | `method: str, url: str, headers: dict, body: dict` | ignored | -| [`post_api_request`](/docs/user-guide/features/hooks#post_api_request) | After each HTTP response from the LLM provider | `method: str, url: str, status_code: int, response: dict` | ignored | +| [`on_session_finalize`](/docs/user-guide/features/hooks#on_session_finalize) | CLI/gateway tears down an active session | `session_id: str \| None, platform: str` | ignored | +| 
[`on_session_reset`](/docs/user-guide/features/hooks#on_session_reset) | Gateway swaps in a new session key (`/new`, `/reset`) | `session_id: str, platform: str` | ignored | Most hooks are fire-and-forget observers — their return values are ignored. The exception is `pre_llm_call`, which can inject context into the conversation. diff --git a/website/docs/integrations/providers.md b/website/docs/integrations/providers.md index 56d2f0ea38d..4f536ec7496 100644 --- a/website/docs/integrations/providers.md +++ b/website/docs/integrations/providers.md @@ -323,6 +323,64 @@ The model catalog is fetched dynamically from `ollama.com/v1/models` and cached Both speak the same OpenAI-compatible API. Cloud is a first-class provider (`--provider ollama-cloud`, `OLLAMA_API_KEY`); local Ollama is reached via the Custom Endpoint flow (base URL `http://localhost:11434/v1`, no key). Use cloud for large models you can't run locally; use local for privacy or offline work. ::: +### AWS Bedrock + +Anthropic Claude, Amazon Nova, DeepSeek v3.2, Meta Llama 4, and other models via AWS Bedrock. Uses the AWS SDK (`boto3`) credential chain — no API key, just standard AWS auth. 
+ +```bash +# Simplest — named profile in ~/.aws/credentials +hermes chat --provider bedrock --model us.anthropic.claude-sonnet-4-6 + +# Or with explicit env vars +AWS_PROFILE=myprofile AWS_REGION=us-east-1 hermes chat --provider bedrock --model us.anthropic.claude-sonnet-4-6 +``` + +Or permanently in `config.yaml`: +```yaml +model: + provider: "bedrock" + default: "us.anthropic.claude-sonnet-4-6" +bedrock: + region: "us-east-1" # or set AWS_REGION + # profile: "myprofile" # or set AWS_PROFILE + # discovery: true # auto-discover region from IAM + # guardrail: # optional Bedrock Guardrails + # id: "your-guardrail-id" + # version: "DRAFT" +``` + +Authentication uses the standard boto3 chain: explicit `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE` from `~/.aws/credentials`, IAM role on EC2/ECS/Lambda, IMDS, or SSO. No env var is required if you're already authenticated with the AWS CLI. + +Bedrock uses the **Converse API** under the hood — requests are translated to Bedrock's model-agnostic shape, so the same config works for Claude, Nova, DeepSeek, and Llama models. Set `BEDROCK_BASE_URL` only if you're calling a non-default regional endpoint. + +See the [AWS Bedrock guide](/docs/guides/aws-bedrock) for a walkthrough of IAM setup, region selection, and cross-region inference. + +### Qwen Portal (OAuth) + +Alibaba's Qwen Portal with browser-based OAuth login. Pick **Qwen OAuth (Portal)** in `hermes model`, sign in through the browser, and Hermes persists the refresh token. + +```bash +hermes model +# → pick "Qwen OAuth (Portal)" +# → browser opens; sign in with your Alibaba account +# → confirm — credentials are saved to ~/.hermes/auth.json + +hermes chat # uses portal.qwen.ai/v1 endpoint +``` + +Or configure `config.yaml`: +```yaml +model: + provider: "qwen-oauth" + default: "qwen3-coder-plus" +``` + +Set `HERMES_QWEN_BASE_URL` only if the portal endpoint relocates (default: `https://portal.qwen.ai/v1`). 
+ +:::tip Qwen OAuth vs DashScope (Alibaba) +`qwen-oauth` uses the consumer-facing Qwen Portal with OAuth login — ideal for individual users. The `alibaba` provider uses DashScope's enterprise API with a `DASHSCOPE_API_KEY` — ideal for programmatic / production workloads. Both route to Qwen-family models but live at different endpoints. +::: + ### NVIDIA NIM Nemotron and other open source models via [build.nvidia.com](https://build.nvidia.com) (free API key) or a local NIM endpoint. @@ -1101,7 +1159,7 @@ fallback_model: When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session. -Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `custom`. +Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `bedrock`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `custom`. :::tip Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers). diff --git a/website/docs/reference/cli-commands.md b/website/docs/reference/cli-commands.md index ea5557a193d..1fc4911158b 100644 --- a/website/docs/reference/cli-commands.md +++ b/website/docs/reference/cli-commands.md @@ -63,9 +63,6 @@ hermes [global-options] [subcommand/options] | `hermes insights` | Show token/cost/activity analytics. 
| | `hermes claw` | OpenClaw migration helpers. | | `hermes dashboard` | Launch the web dashboard for managing config, API keys, and sessions. | -| `hermes debug` | Debug tools — upload logs and system info for support. | -| `hermes backup` | Back up Hermes home directory to a zip file. | -| `hermes import` | Restore a Hermes backup from a zip file. | | `hermes profile` | Manage profiles — multiple isolated Hermes instances. | | `hermes completion` | Print shell completion scripts (bash/zsh). | | `hermes version` | Show version information. | @@ -85,7 +82,7 @@ Common options: | `-q`, `--query "..."` | One-shot, non-interactive prompt. | | `-m`, `--model ` | Override the model for this run. | | `-t`, `--toolsets ` | Enable a comma-separated set of toolsets. | -| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `google-gemini-cli`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`). | +| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `google-gemini-cli`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`. | | `-s`, `--skills ` | Preload one or more skills for the session (can be repeated or comma-separated). | | `-v`, `--verbose` | Verbose output. | | `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. 
| diff --git a/website/docs/reference/environment-variables.md b/website/docs/reference/environment-variables.md index ff223739af3..640e7be999b 100644 --- a/website/docs/reference/environment-variables.md +++ b/website/docs/reference/environment-variables.md @@ -14,6 +14,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config |----------|-------------| | `OPENROUTER_API_KEY` | OpenRouter API key (recommended for flexibility) | | `OPENROUTER_BASE_URL` | Override the OpenRouter-compatible base URL | +| `NOUS_BASE_URL` | Override Nous Portal base URL (rarely needed; development/testing only) | +| `NOUS_INFERENCE_BASE_URL` | Override Nous inference endpoint directly | | `AI_GATEWAY_API_KEY` | Vercel AI Gateway API key ([ai-gateway.vercel.sh](https://ai-gateway.vercel.sh)) | | `AI_GATEWAY_BASE_URL` | Override AI Gateway base URL (default: `https://ai-gateway.vercel.sh/v1`) | | `OPENAI_API_KEY` | API key for custom OpenAI-compatible endpoints (used with `OPENAI_BASE_URL`) | @@ -35,9 +37,9 @@ All variables go in `~/.hermes/.env`. 
You can also set them with `hermes config | `ARCEEAI_API_KEY` | Arcee AI API key ([chat.arcee.ai](https://chat.arcee.ai/)) | | `ARCEE_BASE_URL` | Override Arcee base URL (default: `https://api.arcee.ai/api/v1`) | | `MINIMAX_API_KEY` | MiniMax API key — global endpoint ([minimax.io](https://www.minimax.io)) | -| `MINIMAX_BASE_URL` | Override MiniMax base URL (default: `https://api.minimax.io/v1`) | +| `MINIMAX_BASE_URL` | Override MiniMax base URL (default: `https://api.minimax.io/anthropic` — Hermes uses MiniMax's Anthropic Messages-compatible endpoint) | | `MINIMAX_CN_API_KEY` | MiniMax API key — China endpoint ([minimaxi.com](https://www.minimaxi.com)) | -| `MINIMAX_CN_BASE_URL` | Override MiniMax China base URL (default: `https://api.minimaxi.com/v1`) | +| `MINIMAX_CN_BASE_URL` | Override MiniMax China base URL (default: `https://api.minimaxi.com/anthropic`) | | `KILOCODE_API_KEY` | Kilo Code API key ([kilo.ai](https://kilo.ai)) | | `KILOCODE_BASE_URL` | Override Kilo Code base URL (default: `https://api.kilo.ai/api/gateway`) | | `XIAOMI_API_KEY` | Xiaomi MiMo API key ([platform.xiaomimimo.com](https://platform.xiaomimimo.com)) | @@ -53,7 +55,7 @@ All variables go in `~/.hermes/.env`. 
You can also set them with `hermes config | `ANTHROPIC_API_KEY` | Anthropic Console API key ([console.anthropic.com](https://console.anthropic.com/)) | | `ANTHROPIC_TOKEN` | Manual or legacy Anthropic OAuth/setup-token override | | `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) | -| `DASHSCOPE_BASE_URL` | Custom DashScope base URL (default: `https://coding-intl.dashscope.aliyuncs.com/v1`) | +| `DASHSCOPE_BASE_URL` | Custom DashScope base URL (default: `https://dashscope-intl.aliyuncs.com/compatible-mode/v1`; use `https://dashscope.aliyuncs.com/compatible-mode/v1` for mainland-China region) | | `DEEPSEEK_API_KEY` | DeepSeek API key for direct DeepSeek access ([platform.deepseek.com](https://platform.deepseek.com/api_keys)) | | `DEEPSEEK_BASE_URL` | Custom DeepSeek API base URL | | `NVIDIA_API_KEY` | NVIDIA NIM API key — Nemotron and open models ([build.nvidia.com](https://build.nvidia.com)) | @@ -62,6 +64,11 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config | `OLLAMA_BASE_URL` | Override Ollama Cloud base URL (default: `https://ollama.com/v1`) | | `XAI_API_KEY` | xAI (Grok) API key for chat + TTS ([console.x.ai](https://console.x.ai/)) | | `XAI_BASE_URL` | Override xAI base URL (default: `https://api.x.ai/v1`) | +| `MISTRAL_API_KEY` | Mistral API key for Voxtral TTS and Voxtral STT ([console.mistral.ai](https://console.mistral.ai)) | +| `AWS_REGION` | AWS region for Bedrock inference (e.g. `us-east-1`, `eu-central-1`). Read by boto3. | +| `AWS_PROFILE` | AWS named profile for Bedrock authentication (reads `~/.aws/credentials`). Leave unset to use default boto3 credential chain. 
| +| `BEDROCK_BASE_URL` | Override Bedrock runtime base URL (default: `https://bedrock-runtime.us-east-1.amazonaws.com`; usually leave unset and use `AWS_REGION` instead) | +| `HERMES_QWEN_BASE_URL` | Qwen Portal base URL override (default: `https://portal.qwen.ai/v1`) | | `OPENCODE_ZEN_API_KEY` | OpenCode Zen API key — pay-as-you-go access to curated models ([opencode.ai](https://opencode.ai/auth)) | | `OPENCODE_ZEN_BASE_URL` | Override OpenCode Zen base URL | | `OPENCODE_GO_API_KEY` | OpenCode Go API key — $10/month subscription for open models ([opencode.ai](https://opencode.ai/auth)) | @@ -79,7 +86,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe | Variable | Description | |----------|-------------| -| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) | +| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) | | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) | | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL | | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) | @@ -189,11 +196,14 @@ For cloud sandbox backends, persistence is filesystem-oriented. 
`TERMINAL_LIFETI | `TELEGRAM_WEBHOOK_PORT` | Local listen port for webhook server (default: `8443`) | | `TELEGRAM_WEBHOOK_SECRET` | Secret token for verifying updates come from Telegram | | `TELEGRAM_REACTIONS` | Enable emoji reactions on messages during processing (default: `false`) | +| `TELEGRAM_REPLY_TO_MODE` | Reply-reference behavior: `off`, `first` (default), or `all`. Matches the Discord pattern. | | `TELEGRAM_IGNORED_THREADS` | Comma-separated Telegram forum topic/thread IDs where the bot never responds | | `TELEGRAM_PROXY` | Proxy URL for Telegram connections — overrides `HTTPS_PROXY`. Supports `http://`, `https://`, `socks5://` | | `DISCORD_BOT_TOKEN` | Discord bot token | | `DISCORD_ALLOWED_USERS` | Comma-separated Discord user IDs allowed to use the bot | | `DISCORD_ALLOWED_ROLES` | Comma-separated Discord role IDs allowed to use the bot (OR with `DISCORD_ALLOWED_USERS`). Auto-enables the Members intent. Useful when moderation teams churn — role grants propagate automatically. | +| `DISCORD_ALLOWED_CHANNELS` | Comma-separated Discord channel IDs. When set, the bot only responds in these channels (plus DMs if allowed). Overrides `config.yaml` `discord.allowed_channels`. | +| `DISCORD_PROXY` | Proxy URL for Discord connections — overrides `HTTPS_PROXY`. Supports `http://`, `https://`, `socks5://` | | `DISCORD_HOME_CHANNEL` | Default Discord channel for cron delivery | | `DISCORD_HOME_CHANNEL_NAME` | Display name for the Discord home channel | | `DISCORD_REQUIRE_MENTION` | Require an @mention before responding in server channels | @@ -298,6 +308,8 @@ For cloud sandbox backends, persistence is filesystem-oriented. 
`TERMINAL_LIFETI | `QQ_GROUP_ALLOWED_USERS` | Comma-separated QQ group IDs for group @-message access | | `QQ_ALLOW_ALL_USERS` | Allow all users (`true`/`false`, overrides `QQ_ALLOWED_USERS`) | | `QQBOT_HOME_CHANNEL` | QQ user/group openID for cron delivery and notifications | +| `QQBOT_HOME_CHANNEL_NAME` | Display name for the QQ home channel | +| `QQ_SANDBOX` | Route QQ Bot to the sandbox gateway for development testing (`true`/`false`). Use with a sandbox app credential from [q.qq.com](https://q.qq.com). | | `MATTERMOST_URL` | Mattermost server URL (e.g. `https://mm.example.com`) | | `MATTERMOST_TOKEN` | Bot token or personal access token for Mattermost | | `MATTERMOST_ALLOWED_USERS` | Comma-separated Mattermost user IDs allowed to message the bot | @@ -312,6 +324,8 @@ For cloud sandbox backends, persistence is filesystem-oriented. `TERMINAL_LIFETI | `MATRIX_ALLOWED_USERS` | Comma-separated Matrix user IDs allowed to message the bot (e.g. `@alice:matrix.org`) | | `MATRIX_HOME_ROOM` | Room ID for proactive message delivery (e.g. `!abc123:matrix.org`) | | `MATRIX_ENCRYPTION` | Enable end-to-end encryption (`true`/`false`, default: `false`) | +| `MATRIX_DEVICE_ID` | Stable Matrix device ID for E2EE persistence across restarts (e.g. `HERMES_BOT`). Without this, E2EE keys rotate every startup and historic-room decrypt breaks. | +| `MATRIX_REACTIONS` | Enable processing-lifecycle emoji reactions on inbound messages (default: `true`). Set to `false` to disable. | | `MATRIX_REQUIRE_MENTION` | Require `@mention` in rooms (default: `true`). Set to `false` to respond to all messages. | | `MATRIX_FREE_RESPONSE_ROOMS` | Comma-separated room IDs where bot responds without `@mention` | | `MATRIX_AUTO_THREAD` | Auto-create threads for room messages (default: `true`) | diff --git a/website/docs/reference/faq.md b/website/docs/reference/faq.md index c39f510b1ff..132a4d00a9e 100644 --- a/website/docs/reference/faq.md +++ b/website/docs/reference/faq.md @@ -110,7 +110,7 @@ Yes. 
Import the `AIAgent` class and use Hermes programmatically: ```python from run_agent import AIAgent -agent = AIAgent(model="openrouter/nous/hermes-3-llama-3.1-70b") +agent = AIAgent(model="anthropic/claude-opus-4.7") response = agent.chat("Explain quantum computing briefly") ``` @@ -243,7 +243,7 @@ Make sure the key matches the provider. An OpenAI key won't work with OpenRouter hermes model # Set a valid model -hermes config set HERMES_MODEL openrouter/nous/hermes-3-llama-3.1-70b +hermes config set HERMES_MODEL anthropic/claude-opus-4.7 # Or specify per-session hermes chat --model openrouter/meta-llama/llama-3.1-70b-instruct @@ -781,7 +781,7 @@ hermes config show | head -20 hermes model # Or test with a known-good model -hermes chat -q "hello" --model anthropic/claude-sonnet-4.6 +hermes chat -q "hello" --model anthropic/claude-opus-4.7 ``` If using OpenRouter, make sure your API key has credits. A 400 from OpenRouter often means the model requires a paid plan or the model ID has a typo. diff --git a/website/docs/reference/optional-skills-catalog.md b/website/docs/reference/optional-skills-catalog.md index bbb2c3b80ea..1501567b791 100644 --- a/website/docs/reference/optional-skills-catalog.md +++ b/website/docs/reference/optional-skills-catalog.md @@ -74,6 +74,7 @@ hermes skills uninstall | Skill | Description | |-------|-------------| +| **fitness-nutrition** | Gym workout planner and nutrition tracker. Search 690+ exercises by muscle, equipment, or category via wger. Look up macros and calories for 380,000+ foods via USDA FoodData Central. Computes BMI, TDEE, one-rep max, macro splits, and body fat — pure Python, no pip installs. | | **neuroskill-bci** | Brain-Computer Interface (BCI) integration for neuroscience research workflows. 
| ## MCP diff --git a/website/docs/reference/skills-catalog.md b/website/docs/reference/skills-catalog.md index ead50dbea67..e5283ba0154 100644 --- a/website/docs/reference/skills-catalog.md +++ b/website/docs/reference/skills-catalog.md @@ -27,27 +27,32 @@ Skills for spawning and orchestrating autonomous AI coding agents and multi-agen |-------|-------------|------| | `claude-code` | Delegate coding tasks to Claude Code (Anthropic's CLI agent). Use for building features, refactoring, PR reviews, and iterative coding. Requires the claude CLI installed. | `autonomous-ai-agents/claude-code` | | `codex` | Delegate coding tasks to OpenAI Codex CLI agent. Use for building features, refactoring, PR reviews, and batch issue fixing. Requires the codex CLI and a git repository. | `autonomous-ai-agents/codex` | -| `hermes-agent-spawning` | Spawn additional Hermes Agent instances as autonomous subprocesses for independent long-running tasks. Supports non-interactive one-shot mode (-q) and interactive PTY mode for multi-turn collaboration. Different from delegate_task — this runs a full separate hermes process. | `autonomous-ai-agents/hermes-agent` | +| `hermes-agent` | Complete guide to using and extending Hermes Agent — CLI usage, setup, configuration, spawning additional agents, gateway platforms, skills, voice, tools, profiles, and a concise contributor reference. Load this skill when helping users configure Hermes, troubleshoot issues, s… | `autonomous-ai-agents/hermes-agent` | | `opencode` | Delegate coding tasks to OpenCode CLI agent for feature implementation, refactoring, PR review, and long-running autonomous sessions. Requires the opencode CLI installed and authenticated. | `autonomous-ai-agents/opencode` | +## creative + +Creative content generation — ASCII art, hand-drawn diagrams, animations, music, and visual design tools. 
+ +| Skill | Description | Path | +|-------|-------------|------| +| `architecture-diagram` | Generate dark-themed SVG diagrams of software systems and cloud infrastructure as standalone HTML files with inline SVG graphics. Semantic component colors (cyan=frontend, emerald=backend, violet=database, amber=cloud/AWS, rose=security, orange=message bus), JetBrains Mono fon… | `creative/architecture-diagram` | +| `ascii-art` | Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required. | `creative/ascii-art` | +| `ascii-video` | Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid… | `creative/ascii-video` | +| `excalidraw` | Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links. | `creative/excalidraw` | +| `ideation` | Generate project ideas through creative constraints. Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools,… | `creative/creative-ideation` | +| `manim-video` | Production pipeline for mathematical and technical animations using Manim Community Edition. Creates 3Blue1Brown-style explainer videos, algorithm visualizations, equation derivations, architecture diagrams, and data stories. Use when users request: animated explanations, math… | `creative/manim-video` | +| `p5js` | Production pipeline for interactive and generative visual art using p5.js. 
Creates browser-based sketches, generative art, data visualizations, interactive experiences, 3D scenes, audio-reactive visuals, and motion graphics — exported as HTML, PNG, GIF, MP4, or SVG. Covers: 2D… | `creative/p5js` | +| `popular-web-designs` | 54 production-quality design systems extracted from real websites. Load a template to generate HTML/CSS that matches the visual identity of sites like Stripe, Linear, Vercel, Notion, Airbnb, and more. Each template includes colors, typography, components, layout rules, and rea… | `creative/popular-web-designs` | +| `songwriting-and-ai-music` | Songwriting craft, AI music generation prompts (Suno focus), parody/adaptation techniques, phonetic tricks, and lessons learned. These are tools and ideas, not rules. Break any of them when the art calls for it. | `creative/songwriting-and-ai-music` | + ## data-science Skills for data science workflows — interactive exploration, Jupyter notebooks, data analysis, and visualization. | Skill | Description | Path | |-------|-------------|------| -| `jupyter-live-kernel` | Use a live Jupyter kernel for stateful, iterative Python execution via hamelnb. Load this skill when the task involves exploration, iteration, or inspecting intermediate results. | `data-science/jupyter-live-kernel` | - -## creative - -Creative content generation — ASCII art, hand-drawn style diagrams, and visual design tools. - -| Skill | Description | Path | -|-------|-------------|------| -| `ascii-art` | Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required. | `creative/ascii-art` | -| `ascii-video` | "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). 
Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid… | `creative/ascii-video` | -| `excalidraw` | Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links. | `creative/excalidraw` | -| `p5js` | Production pipeline for interactive and generative visual art using p5.js. Create sketches, render them to images/video via headless browser, and serve live previews. Supports canvas animations, data visualizations, and creative coding experiments. | `creative/p5js` | +| `jupyter-live-kernel` | Use a live Jupyter kernel for stateful, iterative Python execution via hamelnb. Load this skill when the task involves exploration, iteration, or inspecting intermediate results — data science, ML experimentation, API exploration, or building up complex code step-by-step. Uses… | `data-science/jupyter-live-kernel` | ## devops @@ -55,14 +60,15 @@ DevOps and infrastructure automation skills. | Skill | Description | Path | |-------|-------------|------| -| `webhook-subscriptions` | Create and manage webhook subscriptions for event-driven agent activation. External services (GitHub, Stripe, CI/CD, IoT) POST events to trigger agent runs. Requires webhook platform to be enabled. | `devops/webhook-subscriptions` | +| `webhook-subscriptions` | Create and manage webhook subscriptions for event-driven agent activation. Use when the user wants external services to trigger agent runs automatically. | `devops/webhook-subscriptions` | ## dogfood +Internal dogfooding and QA skills used to test Hermes Agent itself. + | Skill | Description | Path | |-------|-------------|------| -| `dogfood` | Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports. 
| `dogfood/dogfood` | -| `hermes-agent-setup` | Help users configure Hermes Agent — CLI usage, setup wizard, model/provider selection, tools, skills, voice/STT/TTS, gateway, and troubleshooting. | `dogfood/hermes-agent-setup` | +| `dogfood` | Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports | `dogfood` | ## email @@ -83,7 +89,7 @@ Skills for setting up, configuring, and managing game servers, modpacks, and gam ## github -GitHub workflow skills for managing repositories, pull requests, code reviews, issues, and CI/CD pipelines using the gh CLI and git via terminal. +GitHub workflow skills for managing repositories, pull requests, code reviews, issues, and CI/CD pipelines. | Skill | Description | Path | |-------|-------------|------| @@ -94,23 +100,17 @@ GitHub workflow skills for managing repositories, pull requests, code reviews, i | `github-pr-workflow` | Full pull request lifecycle — create branches, commit changes, open PRs, monitor CI status, auto-fix failures, and merge. Works with gh CLI or falls back to git + GitHub REST API via curl. | `github/github-pr-workflow` | | `github-repo-management` | Clone, create, fork, configure, and manage GitHub repositories. Manage remotes, secrets, releases, and workflows. Works with gh CLI or falls back to git + GitHub REST API via curl. | `github/github-repo-management` | -## inference-sh - -Skills for AI app execution via inference.sh cloud platform. - -| Skill | Description | Path | -|-------|-------------|------| -| `inference-sh-cli` | Run 150+ AI apps via inference.sh CLI (infsh) — image generation, video creation, LLMs, search, 3D, social automation. | `inference-sh/cli` | - ## leisure +Skills for discovery and everyday tasks. + | Skill | Description | Path | |-------|-------------|------| | `find-nearby` | Find nearby places (restaurants, cafes, bars, pharmacies, etc.) using OpenStreetMap. 
Works with coordinates, addresses, cities, zip codes, or Telegram location pins. No API keys needed. | `leisure/find-nearby` | ## mcp -Skills for working with MCP (Model Context Protocol) servers, tools, and integrations. Includes the built-in native MCP client (configure servers in config.yaml for automatic tool discovery) and the mcporter CLI bridge for ad-hoc server interaction. +Skills for working with MCP (Model Context Protocol) servers, tools, and integrations. | Skill | Description | Path | |-------|-------------|------| @@ -126,7 +126,7 @@ Skills for working with media content — YouTube transcripts, GIF search, music | `gif-search` | Search and download GIFs from Tenor using curl. No dependencies beyond curl and jq. Useful for finding reaction GIFs, creating visual content, and sending GIFs in chat. | `media/gif-search` | | `heartmula` | Set up and run HeartMuLa, the open-source music generation model family (Suno-like). Generates full songs from lyrics + tags with multilingual support. | `media/heartmula` | | `songsee` | Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation. | `media/songsee` | -| `youtube-content` | Fetch YouTube video transcripts and transform them into structured content (chapters, summaries, threads, blog posts). | `media/youtube-content` | +| `youtube-content` | Fetch YouTube video transcripts and transform them into structured content (chapters, summaries, threads, blog posts). 
Use when the user shares a YouTube URL or video link, asks to summarize a video, requests a transcript, or wants to extract and reformat content from any YouT… | `media/youtube-content` | ## mlops @@ -134,7 +134,7 @@ General-purpose ML operations tools — model hub management, dataset operations | Skill | Description | Path | |-------|-------------|------| -| `huggingface-hub` | Hugging Face Hub CLI (hf) — search, download, and upload models and datasets, manage repos, deploy inference endpoints. | `mlops/huggingface-hub` | +| `huggingface-hub` | Hugging Face Hub CLI (hf) — search, download, and upload models and datasets, manage repos, query datasets with SQL, deploy inference endpoints, manage Spaces and buckets. | `mlops/huggingface-hub` | ## mlops/cloud @@ -142,19 +142,15 @@ GPU cloud providers and serverless compute platforms for ML workloads. | Skill | Description | Path | |-------|-------------|------| -| `lambda-labs-gpu-cloud` | Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training. | `mlops/cloud/lambda-labs` | | `modal-serverless-gpu` | Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling. | `mlops/cloud/modal` | ## mlops/evaluation -Model evaluation benchmarks, experiment tracking, data curation, tokenizers, and interpretability tools. +Model evaluation benchmarks, experiment tracking, and interpretability tools. | Skill | Description | Path | |-------|-------------|------| -| `evaluating-llms-harness` | Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. 
Industry standard used by EleutherAI, HuggingFace, and major labs. Sup… | `mlops/evaluation/lm-evaluation-harness` | -| `huggingface-tokenizers` | Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use… | `mlops/evaluation/huggingface-tokenizers` | -| `nemo-curator` | GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality t… | `mlops/evaluation/nemo-curator` | -| `sparse-autoencoder-training` | Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language m… | `mlops/evaluation/saelens` | +| `evaluating-llms-harness` | Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. 
S… | `mlops/evaluation/lm-evaluation-harness` | | `weights-and-biases` | Track ML experiments with automatic logging, visualize training in real-time, optimize hyperparameters with sweeps, and manage model registry with W&B - collaborative MLOps platform | `mlops/evaluation/weights-and-biases` | ## mlops/inference @@ -163,25 +159,22 @@ Model serving, quantization (GGUF/GPTQ), structured output, inference optimizati | Skill | Description | Path | |-------|-------------|------| -| `instructor` | Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - battle-tested structured output library | `mlops/inference/instructor` | -| `llama-cpp` | Run LLM inference with llama.cpp on CPU, Apple Silicon, AMD/Intel GPUs, or NVIDIA — plus GGUF model conversion and quantization (2–8 bit with K-quants and imatrix). Covers CLI, Python bindings, OpenAI-compatible server, and Ollama/LM Studio integration. Use for edge deployment, M1/M2/M3/M4 Macs, CUDA-less environments, or flexible local quantization. | `mlops/inference/llama-cpp` | -| `obliteratus` | Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 9 CLI methods, 28 analysis modules, 116 model presets ac… | `mlops/inference/obliteratus` | +| `llama-cpp` | Run LLM inference with llama.cpp on CPU, Apple Silicon, AMD/Intel GPUs, or NVIDIA — plus GGUF model conversion and quantization (2–8 bit with K-quants and imatrix). Covers CLI, Python bindings, OpenAI-compatible server, and Ollama/LM Studio integration. 
Use for edge deployment… | `mlops/inference/llama-cpp` | +| `obliteratus` | Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 9 CLI methods, 28 analysis modules, 116 model presets … | `mlops/inference/obliteratus` | | `outlines` | Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library | `mlops/inference/outlines` | -| `serving-llms-vllm` | Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), an… | `mlops/inference/vllm` | -| `tensorrt-llm` | Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and mult… | `mlops/inference/tensorrt-llm` | +| `serving-llms-vllm` | Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), … | `mlops/inference/vllm` | ## mlops/models -Specific model architectures and tools — computer vision (CLIP, SAM, Stable Diffusion), speech (Whisper), audio generation (AudioCraft), and multimodal models (LLaVA). +Specific model architectures — computer vision (CLIP, SAM, Stable Diffusion), speech (Whisper), and audio generation (AudioCraft). 
| Skill | Description | Path | |-------|-------------|------| | `audiocraft-audio-generation` | PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation. | `mlops/models/audiocraft` | -| `clip` | OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpo… | `mlops/models/clip` | -| `llava` | Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language cha… | `mlops/models/llava` | +| `clip` | OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-pur… | `mlops/models/clip` | | `segment-anything-model` | Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image. | `mlops/models/segment-anything` | | `stable-diffusion-image-generation` | State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines. | `mlops/models/stable-diffusion` | -| `whisper` | OpenAI's general-purpose speech recognition model. 
Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio proc… | `mlops/models/whisper` | +| `whisper` | OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio pr… | `mlops/models/whisper` | ## mlops/research @@ -193,37 +186,19 @@ ML research frameworks for building and optimizing AI systems with declarative p ## mlops/training -Fine-tuning, RLHF/DPO/GRPO training, distributed training frameworks, and optimization tools for training LLMs and other models. +Fine-tuning, RLHF/DPO/GRPO training, distributed training frameworks, and optimization tools. | Skill | Description | Path | |-------|-------------|------| | `axolotl` | Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support | `mlops/training/axolotl` | -| `distributed-llm-pretraining-torchtitan` | Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing. | `mlops/training/torchtitan` | -| `fine-tuning-with-trl` | Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Tr… | `mlops/training/trl-fine-tuning` | -| `hermes-atropos-environments` | Build, test, and debug Hermes Agent RL environments for Atropos training. 
Covers the HermesAgentBaseEnv interface, reward functions, agent loop integration, evaluation with tools, wandb logging, and the three CLI modes (serve/process/evaluate). Use when creating, reviewing, or f… | `mlops/training/hermes-atropos-environments` | -| `huggingface-accelerate` | Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard. | `mlops/training/accelerate` | -| `optimizing-attention-flash` | Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or need faster inference. Supports PyTorch native SDPA,… | `mlops/training/flash-attention` | -| `peft-fine-tuning` | Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library i… | `mlops/training/peft` | +| `fine-tuning-with-trl` | Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace … | `mlops/training/trl-fine-tuning` | +| `peft-fine-tuning` | Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. 
HuggingFace's official library… | `mlops/training/peft` | | `pytorch-fsdp` | Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2 | `mlops/training/pytorch-fsdp` | -| `pytorch-lightning` | High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices. | `mlops/training/pytorch-lightning` | -| `simpo-training` | Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO. | `mlops/training/simpo` | -| `slime-rl-training` | Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling. | `mlops/training/slime` | | `unsloth` | Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization | `mlops/training/unsloth` | -## mlops/vector-databases - -Vector similarity search and embedding databases for RAG, semantic search, and AI application backends. - -| Skill | Description | Path | -|-------|-------------|------| -| `chroma` | Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best… | `mlops/vector-databases/chroma` | -| `faiss` | Facebook's library for efficient similarity search and clustering of dense vectors. 
Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without… | `mlops/vector-databases/faiss` | -| `pinecone` | Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for server… | `mlops/vector-databases/pinecone` | -| `qdrant-vector-search` | High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance. | `mlops/vector-databases/qdrant` | - ## note-taking -Note taking skills, to save information, assist with research, and collab on multi-session planning and information sharing. +Note taking skills, to save information, assist with research, and collaborate on multi-session planning. | Skill | Description | Path | |-------|-------------|------| @@ -235,26 +210,12 @@ Skills for document creation, presentations, spreadsheets, and other productivit | Skill | Description | Path | |-------|-------------|------| -| `google-workspace` | Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration via Python. Uses OAuth2 with automatic token refresh. No external binaries needed — runs entirely with Google's Python client libraries in the Hermes venv. | `productivity/google-workspace` | -| `linear` | Manage Linear issues, projects, and teams via the GraphQL API. Create, update, search, and organize issues. | `productivity/linear` | +| `google-workspace` | Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration for Hermes. 
Uses Hermes-managed OAuth2 setup, prefers the Google Workspace CLI (`gws`) when available for broader API coverage, and falls back to the Python client libraries otherwise. | `productivity/google-workspace` | +| `linear` | Manage Linear issues, projects, and teams via the GraphQL API. Create, update, search, and organize issues. Uses API key auth (no OAuth needed). All operations via curl — no dependencies. | `productivity/linear` | | `nano-pdf` | Edit PDFs with natural-language instructions using the nano-pdf CLI. Modify text, fix typos, update titles, and make content changes to specific pages without manual editing. | `productivity/nano-pdf` | | `notion` | Notion API for creating and managing pages, databases, and blocks via curl. Search, create, update, and query Notion workspaces directly from the terminal. | `productivity/notion` | | `ocr-and-documents` | Extract text from PDFs and scanned documents. Use web_extract for remote URLs, pymupdf for local text-based PDFs, marker-pdf for OCR/scanned docs. For DOCX use python-docx, for PPTX see the powerpoint skill. | `productivity/ocr-and-documents` | -| `powerpoint` | "Use this skill any time a .pptx file is involved in any way — as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in a… | `productivity/powerpoint` | - -## research - -Skills for academic research, paper discovery, literature review, domain reconnaissance, market data, content monitoring, and scientific knowledge retrieval. - -| Skill | Description | Path | -|-------|-------------|------| -| `arxiv` | Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content. 
| `research/arxiv` | -| `blogwatcher` | Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI. Add blogs, scan for new articles, and track what you've read. | `research/blogwatcher` | -| `llm-wiki` | Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency. Unlike RAG, the wiki compiles knowledge once and keeps it current. Works as an Obsidian vault. Wiki path is controlled by the `WIKI_PATH` env var (defaults to `~/wiki`). | `research/llm-wiki` | -| `domain-intel` | Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required. | `research/domain-intel` | -| `duckduckgo-search` | Free web search via DuckDuckGo — text, news, images, videos. No API key needed. Prefer the `ddgs` CLI when installed; use the Python DDGS library only after verifying that `ddgs` is available in the current runtime. | `research/duckduckgo-search` | -| `ml-paper-writing` | Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verificatio… | `research/ml-paper-writing` | -| `polymarket` | Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history. Read-only via public REST APIs, no API key needed. | `research/polymarket` | +| `powerpoint` | Use this skill any time a .pptx file is involved in any way — as input, output, or both. 
This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in … | `productivity/powerpoint` | ## red-teaming @@ -262,7 +223,19 @@ Skills for LLM red-teaming, jailbreaking, and safety filter bypass research. | Skill | Description | Path | |-------|-------------|------| -| `godmode` | Jailbreak API-served LLMs using G0DM0D3 techniques — Parseltongue input obfuscation (33 techniques), GODMODE CLASSIC system prompt templates, ULTRAPLINIAN multi-model racing, encoding escalation, and Hermes-native prefill/system prompt integration. Works on any model accessible via API including closed-source models. | `red-teaming/godmode` | +| `godmode` | Jailbreak API-served LLMs using G0DM0D3 techniques — Parseltongue input obfuscation (33 techniques), GODMODE CLASSIC system prompt templates, ULTRAPLINIAN multi-model racing, encoding escalation, and Hermes-native prefill/system prompt integration. Use when a user wants to byp… | `red-teaming/godmode` | + +## research + +Skills for academic research, paper discovery, literature review, market data, content monitoring, and scientific knowledge retrieval. + +| Skill | Description | Path | +|-------|-------------|------| +| `arxiv` | Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content. | `research/arxiv` | +| `blogwatcher` | Monitor blogs and RSS/Atom feeds for updates using the blogwatcher-cli tool. Add blogs, scan for new articles, track read status, and filter by category. | `research/blogwatcher` | +| `llm-wiki` | Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency. 
| `research/llm-wiki` | +| `polymarket` | Query Polymarket prediction market data — search markets, get prices, orderbooks, and price history. Read-only via public REST APIs, no API key needed. | `research/polymarket` | +| `research-paper-writing` | End-to-end pipeline for writing ML/AI research papers — from experiment design through analysis, drafting, revision, and submission. Covers NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Integrates automated experiment monitoring, statistical analysis, iterative writing, and citation v… | `research/research-paper-writing` | ## smart-home @@ -278,20 +251,22 @@ Skills for interacting with social platforms — posting, reading, monitoring, a | Skill | Description | Path | |-------|-------------|------| -| `xitter` | Interact with X/Twitter via the x-cli terminal client using official X API credentials. | `social-media/xitter` | +| `xitter` | Interact with X/Twitter via the x-cli terminal client using official X API credentials. Use for posting, reading timelines, searching tweets, liking, retweeting, bookmarks, mentions, and user lookups. | `social-media/xitter` | ## software-development +General software-engineering skills — planning, reviewing, debugging, and test-driven development. + | Skill | Description | Path | |-------|-------------|------| -| `code-review` | Guidelines for performing thorough code reviews with security and quality focus | `software-development/code-review` | -| `plan` | Plan mode for Hermes — inspect context, write a markdown plan into `.hermes/plans/` in the active workspace/backend working directory, and do not execute the work. | `software-development/plan` | -| `requesting-code-review` | Use when completing tasks, implementing major features, or before merging. Validates work meets requirements through systematic review process. 
| `software-development/requesting-code-review` | +| `plan` | Plan mode for Hermes — inspect context, write a markdown plan into the active workspace's `.hermes/plans/` directory, and do not execute the work. | `software-development/plan` | +| `requesting-code-review` | Pre-commit verification pipeline — static security scan, baseline-aware quality gates, independent reviewer subagent, and auto-fix loop. Use after code changes and before committing, pushing, or opening a PR. | `software-development/requesting-code-review` | | `subagent-driven-development` | Use when executing implementation plans with independent tasks. Dispatches fresh delegate_task per task with two-stage review (spec compliance then code quality). | `software-development/subagent-driven-development` | | `systematic-debugging` | Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first. | `software-development/systematic-debugging` | | `test-driven-development` | Use when implementing any feature or bugfix, before writing implementation code. Enforces RED-GREEN-REFACTOR cycle with test-first approach. | `software-development/test-driven-development` | | `writing-plans` | Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples. | `software-development/writing-plans` | + --- # Optional Skills diff --git a/website/docs/reference/slash-commands.md b/website/docs/reference/slash-commands.md index 214b2866d07..79453474fc8 100644 --- a/website/docs/reference/slash-commands.md +++ b/website/docs/reference/slash-commands.md @@ -35,7 +35,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/queue ` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response). 
**Note:** `/q` is claimed by both `/queue` and `/quit`; the last registration wins, so `/q` resolves to `/quit` in practice. Use `/queue` explicitly. | | `/resume [name]` | Resume a previously-named session | | `/status` | Show session info | -| `/snapshot` (alias: `/snap`) | Create or restore state snapshots of Hermes config/state (usage: /snapshot [create\|restore \\|prune]) | +| `/agents` (alias: `/tasks`) | Show active agents and running tasks across the current session. | | `/background ` (alias: `/bg`) | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). | | `/btw ` | Ephemeral side question using session context (no tools, not persisted). Useful for quick clarifications without affecting the conversation history. | | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. | @@ -50,9 +50,8 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/provider` | Show available providers and current provider | | `/personality` | Set a predefined personality | | `/verbose` | Cycle tool progress display: off → new → all → verbose. Can be [enabled for messaging](#notes) via config. | -| `/fast` | Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode (usage: /fast [normal\|fast\|status]) | +| `/fast [normal\|fast\|status]` | Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode. Options: `normal`, `fast`, `status`. | | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) | -| `/fast [normal\|fast\|status]` | Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode. 
Options: `normal`, `fast`, `status`, `on`, `off`. | | `/skin` | Show or change the display skin/theme | | `/statusbar` (alias: `/sb`) | Toggle the context/model status bar on or off | | `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). | @@ -80,6 +79,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/insights` | Show usage insights and analytics (last 30 days) | | `/platforms` (alias: `/gateway`) | Show gateway/messaging platform status | | `/paste` | Check clipboard for an image and attach it | +| `/copy [number]` | Copy the last assistant response to clipboard (or the Nth-from-last with a number). CLI-only. | | `/image ` | Attach a local image file for your next prompt. | | `/debug` | Upload debug report (system info + logs) and get shareable links. Also available in messaging. | | `/profile` | Show active profile name and home directory | @@ -151,8 +151,6 @@ The messaging gateway supports the following built-in commands inside Telegram, | `/deny` | Reject a pending dangerous command. | | `/update` | Update Hermes Agent to the latest version. | | `/restart` | Gracefully restart the gateway after draining active runs. When the gateway comes back online, it sends a confirmation to the requester's chat/thread. | -| `/fast [normal\|fast\|status]` | Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode. | -| `/debug` | Upload debug report (system info + logs) and get shareable links. | | `/debug` | Upload debug report (system info + logs) and get shareable links. | | `/help` | Show messaging help. | | `/` | Invoke any installed skill by name. 
| diff --git a/website/docs/reference/tools-reference.md b/website/docs/reference/tools-reference.md index e1138dc00a1..40d44627ec7 100644 --- a/website/docs/reference/tools-reference.md +++ b/website/docs/reference/tools-reference.md @@ -6,9 +6,9 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool # Built-in Tools Reference -This page documents all 47 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets. +This page documents all 52 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets. -**Quick counts:** 10 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, and 15 standalone tools across other toolsets. +**Quick counts:** 10 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets. :::tip MCP Tools In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). See [MCP Integration](/docs/user-guide/features/mcp) for configuration. @@ -53,6 +53,25 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server |------|-------------|----------------------| | `delegate_task` | Spawn one or more subagents to work on tasks in isolated contexts. Each subagent gets its own conversation, terminal session, and toolset. Only the final summary is returned -- intermediate tool results never enter your context window. TWO… | — | +## `feishu_doc` toolset + +Scoped to the Feishu document-comment intelligent-reply handler (`gateway/platforms/feishu_comment.py`). Not exposed on `hermes-cli` or the regular Feishu chat adapter. 
+ +| Tool | Description | Requires environment | +|------|-------------|----------------------| +| `feishu_doc_read` | Read the full text content of a Feishu/Lark document (Docx, Doc, or Sheet) given its file_type and token. | Feishu app credentials | + +## `feishu_drive` toolset + +Scoped to the Feishu document-comment handler. Drives comment read/write operations on drive files. + +| Tool | Description | Requires environment | +|------|-------------|----------------------| +| `feishu_drive_add_comment` | Add a top-level comment on a Feishu/Lark document or file. | Feishu app credentials | +| `feishu_drive_list_comments` | List whole-document comments on a Feishu/Lark file, most recent first. | Feishu app credentials | +| `feishu_drive_list_comment_replies` | List replies on a specific Feishu comment thread (whole-doc or local-selection). | Feishu app credentials | +| `feishu_drive_reply_comment` | Post a reply on a Feishu comment thread, with optional `@`-mention. | Feishu app credentials | + ## `file` toolset | Tool | Description | Requires environment | diff --git a/website/docs/reference/toolsets-reference.md b/website/docs/reference/toolsets-reference.md index e941015b6a9..7593a3fdcfd 100644 --- a/website/docs/reference/toolsets-reference.md +++ b/website/docs/reference/toolsets-reference.md @@ -57,6 +57,8 @@ Or in-session: | `code_execution` | `execute_code` | Run Python scripts that call Hermes tools programmatically. | | `cronjob` | `cronjob` | Schedule and manage recurring tasks. | | `delegation` | `delegate_task` | Spawn isolated subagent instances for parallel work. | +| `feishu_doc` | `feishu_doc_read` | Read Feishu/Lark document content. Used by the Feishu document-comment intelligent-reply handler. | +| `feishu_drive` | `feishu_drive_add_comment`, `feishu_drive_list_comments`, `feishu_drive_list_comment_replies`, `feishu_drive_reply_comment` | Feishu/Lark drive comment operations. 
Scoped to the comment agent; not exposed on `hermes-cli` or other messaging toolsets. | | `file` | `patch`, `read_file`, `search_files`, `write_file` | File reading, writing, searching, and editing. | | `homeassistant` | `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services` | Smart home control via Home Assistant. Only available when `HASS_TOKEN` is set. | | `image_gen` | `image_generate` | Text-to-image generation via FAL.ai. | @@ -79,7 +81,7 @@ These expand to multiple core toolsets, providing a convenient shorthand for com | Toolset | Expands to | Use case | |---------|-----------|----------| -| `debugging` | `patch`, `process`, `read_file`, `search_files`, `terminal`, `web_extract`, `web_search`, `write_file` | Debug sessions — file access, terminal, and web research without browser or delegation overhead. | +| `debugging` | `web` + `file` + `process`, `terminal` (via `includes`) — effectively `patch`, `process`, `read_file`, `search_files`, `terminal`, `web_extract`, `web_search`, `write_file` | Debug sessions — file access, terminal, and web research without browser or delegation overhead. | | `safe` | `image_generate`, `vision_analyze`, `web_extract`, `web_search` | Read-only research and media generation. No file writes, no terminal access, no code execution. Good for untrusted or constrained environments. | ## Platform Toolsets @@ -88,7 +90,7 @@ Platform toolsets define the complete tool configuration for a deployment target | Toolset | Differences from `hermes-cli` | |---------|-------------------------------| -| `hermes-cli` | Full toolset — all 36 tools including `clarify`. The default for interactive CLI sessions. | +| `hermes-cli` | Full toolset — all 36 core tools including `clarify`. The default for interactive CLI sessions. | | `hermes-acp` | Drops `clarify`, `cronjob`, `image_generate`, `send_message`, `text_to_speech`, homeassistant tools. Focused on coding tasks in IDE context. 
| | `hermes-api-server` | Drops `clarify`, `send_message`, and `text_to_speech`. Adds everything else — suitable for programmatic access where user interaction isn't possible. | | `hermes-telegram` | Same as `hermes-cli`. | @@ -100,16 +102,16 @@ Platform toolsets define the complete tool configuration for a deployment target | `hermes-mattermost` | Same as `hermes-cli`. | | `hermes-email` | Same as `hermes-cli`. | | `hermes-sms` | Same as `hermes-cli`. | -| `hermes-dingtalk` | Same as `hermes-cli`. | -| `hermes-feishu` | Same as `hermes-cli`. | -| `hermes-wecom` | Same as `hermes-cli`. | -| `hermes-wecom-callback` | WeCom callback toolset — enterprise self-built app messaging (full access). | -| `hermes-weixin` | Same as `hermes-cli`. | | `hermes-bluebubbles` | Same as `hermes-cli`. | +| `hermes-dingtalk` | Same as `hermes-cli`. | +| `hermes-feishu` | Same as `hermes-cli`. Note: the `feishu_doc` / `feishu_drive` toolsets are used only by the document-comment handler, not by the regular Feishu chat adapter. | | `hermes-qqbot` | Same as `hermes-cli`. | -| `hermes-homeassistant` | Same as `hermes-cli`. | +| `hermes-wecom` | Same as `hermes-cli`. | +| `hermes-wecom-callback` | Same as `hermes-cli`. | +| `hermes-weixin` | Same as `hermes-cli`. | +| `hermes-homeassistant` | Same as `hermes-cli` plus the `homeassistant` toolset always on. | | `hermes-webhook` | Same as `hermes-cli`. | -| `hermes-gateway` | Union of all messaging platform toolsets. Used internally when the gateway needs the broadest possible tool set. | +| `hermes-gateway` | Internal gateway orchestrator toolset — union of the broadest possible tool set when the gateway needs to accept any message source. | ## Dynamic Toolsets @@ -119,11 +121,10 @@ Each configured MCP server generates a `mcp-` toolset at runtime. 
For ex ```yaml # config.yaml -mcp: - servers: - github: - command: npx - args: ["-y", "@modelcontextprotocol/server-github"] +mcp_servers: + github: + command: npx + args: ["-y", "@modelcontextprotocol/server-github"] ``` This creates a `mcp-github` toolset you can reference in `--toolsets` or platform configs. diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md index bef9b5cfd55..29d1665627e 100644 --- a/website/docs/user-guide/configuration.md +++ b/website/docs/user-guide/configuration.md @@ -601,7 +601,7 @@ Every model slot in Hermes — auxiliary tasks, compression, fallback — uses t When `base_url` is set, Hermes ignores the provider and calls that endpoint directly (using `api_key` or `OPENAI_API_KEY` for auth). When only `provider` is set, Hermes uses that provider's built-in auth and base URL. -Available providers for auxiliary tasks: `auto`, `openrouter`, `nous`, `codex`, `copilot`, `anthropic`, `main`, `zai`, `kimi-coding`, `kimi-coding-cn`, `arcee`, `minimax`, any provider registered in the [provider registry](/docs/reference/environment-variables), or any named custom provider from your `custom_providers` list (e.g. `provider: "beans"`). +Available providers for auxiliary tasks: `auto`, `main`, plus any provider in the [provider registry](/docs/reference/environment-variables) — `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `alibaba`, `bedrock`, `huggingface`, `arcee`, `xiaomi`, `kilocode`, `opencode-zen`, `opencode-go`, `ai-gateway` — or any named custom provider from your `custom_providers` list (e.g. `provider: "beans"`). 
:::warning `"main"` is for auxiliary tasks only The `"main"` provider option means "use whatever provider my main agent uses" — it's only valid inside `auxiliary:`, `compression:`, and `fallback_model:` configs. It is **not** a valid value for your top-level `model.provider` setting. If you use a custom OpenAI-compatible endpoint, set `provider: custom` in your `model:` section. See [AI Providers](/docs/integrations/providers) for all main model provider options. @@ -851,7 +851,7 @@ agent: ```yaml tts: - provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "neutts" + provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" speed: 1.0 # Global speed multiplier (fallback for all providers) edge: voice: "en-US-AriaNeural" # 322 voices, 74 languages @@ -867,6 +867,18 @@ tts: minimax: speed: 1.0 # Speech speed multiplier # base_url: "" # Optional: override for OpenAI-compatible TTS endpoints + mistral: + model: "voxtral-mini-tts-2603" + voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8" # Paul - Neutral (default) + gemini: + model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts + voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, etc. + xai: + voice_id: "eve" # xAI TTS voice + language: "en" # ISO 639-1 + sample_rate: 24000 + bit_rate: 128000 # MP3 bitrate + # base_url: "https://api.x.ai/v1" neutts: ref_audio: '' ref_text: '' diff --git a/website/docs/user-guide/features/api-server.md b/website/docs/user-guide/features/api-server.md index ebcb4523e86..82c6db0b2c2 100644 --- a/website/docs/user-guide/features/api-server.md +++ b/website/docs/user-guide/features/api-server.md @@ -154,12 +154,64 @@ Delete a stored response. ### GET /v1/models -Lists the agent as an available model. The advertised model name defaults to the [profile](/docs/user-guide/features/profiles) name (or `hermes-agent` for the default profile). Required by most frontends for model discovery. 
+Lists the agent as an available model. The advertised model name defaults to the [profile](/docs/user-guide/profiles) name (or `hermes-agent` for the default profile). Required by most frontends for model discovery. ### GET /health Health check. Returns `{"status": "ok"}`. Also available at **GET /v1/health** for OpenAI-compatible clients that expect the `/v1/` prefix. +### GET /health/detailed + +Extended health check that also reports active sessions, running agents, and resource usage. Useful for monitoring/observability tooling. + +## Runs API (streaming-friendly alternative) + +In addition to `/v1/chat/completions` and `/v1/responses`, the server exposes a **runs** API for long-form sessions where the client wants to subscribe to progress events instead of managing streaming themselves. + +### POST /v1/runs + +Create a new agent run. Returns a `run_id` that can be used to subscribe to progress events. + +### GET /v1/runs/\{run_id\}/events + +Server-Sent Events stream of the run's tool-call progress, token deltas, and lifecycle events. Designed for dashboards and thick clients that want to attach/detach without losing state. + +## Jobs API (background scheduled work) + +The server exposes a lightweight jobs CRUD surface for managing scheduled / background agent runs from a remote client. All endpoints are gated behind the same bearer auth. + +### GET /api/jobs + +List all scheduled jobs. + +### POST /api/jobs + +Create a new scheduled job. Body accepts the same shape as `hermes cron` — prompt, schedule, skills, provider override, delivery target. + +### GET /api/jobs/\{job_id\} + +Fetch a single job's definition and last-run state. + +### PATCH /api/jobs/\{job_id\} + +Update fields on an existing job (prompt, schedule, etc.). Partial updates are merged. + +### DELETE /api/jobs/\{job_id\} + +Remove a job. Also cancels any in-flight run. + +### POST /api/jobs/\{job_id\}/pause + +Pause a job without deleting it. 
Next-scheduled-run timestamps are suspended until resumed. + +### POST /api/jobs/\{job_id\}/resume + +Resume a previously paused job. + +### POST /api/jobs/\{job_id\}/run + +Trigger the job to run immediately, out of schedule. + ## System Prompt Handling When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions. @@ -247,7 +299,7 @@ Any frontend that supports the OpenAI API format works. Tested/documented integr ## Multi-User Setup with Profiles -To give multiple users their own isolated Hermes instance (separate config, memory, skills), use [profiles](/docs/user-guide/features/profiles): +To give multiple users their own isolated Hermes instance (separate config, memory, skills), use [profiles](/docs/user-guide/profiles): ```bash # Create a profile per user diff --git a/website/docs/user-guide/features/fallback-providers.md b/website/docs/user-guide/features/fallback-providers.md index 8d16079c2e5..2e9bcad99b0 100644 --- a/website/docs/user-guide/features/fallback-providers.md +++ b/website/docs/user-guide/features/fallback-providers.md @@ -50,7 +50,10 @@ Both `provider` and `model` are **required**. 
If either is missing, the fallback | NVIDIA NIM | `nvidia` | `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) | | Ollama Cloud | `ollama-cloud` | `OLLAMA_API_KEY` | | Google Gemini (OAuth) | `google-gemini-cli` | `hermes model` (Google OAuth; optional: `HERMES_GEMINI_PROJECT_ID`) | +| Google AI Studio | `gemini` | `GOOGLE_API_KEY` (alias: `GEMINI_API_KEY`) | | xAI (Grok) | `xai` (alias `grok`) | `XAI_API_KEY` (optional: `XAI_BASE_URL`) | +| AWS Bedrock | `bedrock` | Standard boto3 auth (`AWS_REGION` + `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`) | +| Qwen Portal (OAuth) | `qwen-oauth` | `hermes model` (Qwen Portal OAuth; optional: `HERMES_QWEN_BASE_URL`) | | OpenCode Zen | `opencode-zen` | `OPENCODE_ZEN_API_KEY` | | OpenCode Go | `opencode-go` | `OPENCODE_GO_API_KEY` | | Kilo Code | `kilocode` | `KILOCODE_API_KEY` | @@ -166,6 +169,8 @@ Hermes uses separate lightweight models for side tasks. Each task has its own pr | Skills Hub | Skill search and discovery | `auxiliary.skills_hub` | | MCP | MCP helper operations | `auxiliary.mcp` | | Memory Flush | Memory consolidation | `auxiliary.flush_memories` | +| Approval | Smart command-approval classification | `auxiliary.approval` | +| Title Generation | Session title summaries | `auxiliary.title_generation` | ### Auto-Detection Chain @@ -339,5 +344,7 @@ See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configurat | Skills hub | Auto-detection chain | `auxiliary.skills_hub` | | MCP helpers | Auto-detection chain | `auxiliary.mcp` | | Memory flush | Auto-detection chain | `auxiliary.flush_memories` | +| Approval classification | Auto-detection chain | `auxiliary.approval` | +| Title generation | Auto-detection chain | `auxiliary.title_generation` | | Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` | | Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` | diff --git a/website/docs/user-guide/features/hooks.md 
b/website/docs/user-guide/features/hooks.md index c1c7ef05bf7..a64f3220956 100644 --- a/website/docs/user-guide/features/hooks.md +++ b/website/docs/user-guide/features/hooks.md @@ -243,6 +243,8 @@ def register(ctx): | [`post_llm_call`](#post_llm_call) | Once per turn, after the tool-calling loop | ignored | | [`on_session_start`](#on_session_start) | New session created (first turn only) | ignored | | [`on_session_end`](#on_session_end) | Session ends | ignored | +| [`on_session_finalize`](#on_session_finalize) | CLI/gateway tears down an active session (flush, save, stats) | ignored | +| [`on_session_reset`](#on_session_reset) | Gateway swaps in a fresh session key (e.g. `/new`, `/reset`) | ignored | --- @@ -600,4 +602,50 @@ def register(ctx): --- +### `on_session_finalize` + +Fires when the CLI or gateway **tears down** an active session, for example when the user runs `/new`, the gateway garbage-collects an idle session, or the CLI quits with an active agent. This is the last chance to flush state tied to the outgoing session before its identity is gone. + +**Callback signature:** + +```python +def my_callback(session_id: str | None, platform: str, **kwargs): +``` + +| Parameter | Type | Description | +|-----------|------|-------------| +| `session_id` | `str` or `None` | The outgoing session ID. May be `None` if no active session existed. | +| `platform` | `str` | `"cli"` or the messaging platform name (`"telegram"`, `"discord"`, etc.). | + +**Fires:** In `cli.py` (on `/new` / CLI exit) and `gateway/run.py` (when a session is reset or GC'd). Always paired with `on_session_reset` on the gateway side. + +**Return value:** Ignored. + +**Use cases:** Persist final session metrics before the session ID is discarded, close per-session resources, emit a final telemetry event, drain queued writes.
+ +--- + +### `on_session_reset` + +Fires when the gateway **swaps in a new session key** for an active chat: the user invokes `/new`, `/reset`, or `/clear`, or the adapter picks a fresh session after an idle window. This lets plugins react to the fact that conversation state has been wiped without waiting for the next `on_session_start`. + +**Callback signature:** + +```python +def my_callback(session_id: str, platform: str, **kwargs): +``` + +| Parameter | Type | Description | +|-----------|------|-------------| +| `session_id` | `str` | The new session's ID (already rotated to the fresh value). | +| `platform` | `str` | The messaging platform name. | + +**Fires:** In `gateway/run.py`, immediately after the new session key is allocated but before the next inbound message is processed. On the gateway, the order is: `on_session_finalize(old_id)` → swap → `on_session_reset(new_id)` → `on_session_start(new_id)` on the first inbound turn. + +**Return value:** Ignored. + +**Use cases:** Reset per-session caches keyed by `session_id`, emit "session rotated" analytics, prime a fresh state bucket. + +--- + See the **[Build a Plugin guide](/docs/guides/build-a-hermes-plugin)** for the full walkthrough including tool schemas, handlers, and advanced hook patterns.
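Note on the two hooks documented in the hunk above: the following is a minimal sketch of how a plugin might pair `on_session_finalize` and `on_session_reset`, assuming only the callback signatures and the gateway ordering stated in the added docs. The names `session_cache`, `events`, and `simulate_rotation` are hypothetical and exist only for illustration. Registration through `ctx` is omitted on purpose, because the registration call does not appear in this part of the patch.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical plugin-side state; none of this is part of Hermes itself.
session_cache: dict[str, dict] = defaultdict(dict)
events: list[tuple[str, str | None, str]] = []


def on_session_finalize(session_id: str | None, platform: str, **kwargs) -> None:
    """Drop state tied to the outgoing session; session_id may be None."""
    events.append(("finalize", session_id, platform))
    if session_id is not None:
        session_cache.pop(session_id, None)


def on_session_reset(session_id: str, platform: str, **kwargs) -> None:
    """Prime a fresh state bucket for the already-rotated session ID."""
    events.append(("reset", session_id, platform))
    session_cache[session_id] = {"turns": 0}


def simulate_rotation(old_id: str | None, new_id: str, platform: str) -> None:
    """Stand-in for the gateway: finalize the old ID, then reset with the new one."""
    on_session_finalize(session_id=old_id, platform=platform)
    on_session_reset(session_id=new_id, platform=platform)


# Example: a chat rotates from one session ID to another.
session_cache["old-123"]["turns"] = 7
simulate_rotation("old-123", "new-456", "telegram")
```

Both callbacks accept `**kwargs`, matching the documented signatures, so they keep working if extra keyword arguments are passed. Their return values are ignored, as the docs state.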
diff --git a/website/docs/user-guide/features/tts.md b/website/docs/user-guide/features/tts.md index 9f9d257fcc4..6f7fc895062 100644 --- a/website/docs/user-guide/features/tts.md +++ b/website/docs/user-guide/features/tts.md @@ -14,7 +14,7 @@ If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, ## Text-to-Speech -Convert text to speech with seven providers: +Convert text to speech with eight providers: | Provider | Quality | Cost | API Key | |----------|---------|------|---------| diff --git a/website/docs/user-guide/messaging/discord.md b/website/docs/user-guide/messaging/discord.md index 44e08330dfa..0efe909b0d1 100644 --- a/website/docs/user-guide/messaging/discord.md +++ b/website/docs/user-guide/messaging/discord.md @@ -282,12 +282,16 @@ Discord behavior is controlled through two files: **`~/.hermes/.env`** for crede | `DISCORD_ALLOW_BOTS` | No | `"none"` | Controls how the bot handles messages from other Discord bots. `"none"` — ignore all other bots. `"mentions"` — only accept bot messages that `@mention` Hermes. `"all"` — accept all bot messages. | | `DISCORD_REACTIONS` | No | `true` | When `true`, the bot adds emoji reactions to messages during processing (👀 when starting, ✅ on success, ❌ on error). Set to `false` to disable reactions entirely. | | `DISCORD_IGNORED_CHANNELS` | No | — | Comma-separated channel IDs where the bot **never** responds, even when `@mentioned`. Takes priority over all other channel settings. | +| `DISCORD_ALLOWED_CHANNELS` | No | — | Comma-separated channel IDs. When set, the bot **only** responds in these channels (plus DMs if allowed). Overrides `config.yaml` `discord.allowed_channels`. Combine with `DISCORD_IGNORED_CHANNELS` to express allow/deny rules. | | `DISCORD_NO_THREAD_CHANNELS` | No | — | Comma-separated channel IDs where the bot responds directly in the channel instead of creating a thread. Only relevant when `DISCORD_AUTO_THREAD` is `true`. 
| | `DISCORD_REPLY_TO_MODE` | No | `"first"` | Controls reply-reference behavior: `"off"` — never reply to the original message, `"first"` — reply-reference on the first message chunk only (default), `"all"` — reply-reference on every chunk. | | `DISCORD_ALLOW_MENTION_EVERYONE` | No | `false` | When `false` (default), the bot cannot ping `@everyone` or `@here` even if its response contains those tokens. Set to `true` to opt back in. See [Mention Control](#mention-control) below. | | `DISCORD_ALLOW_MENTION_ROLES` | No | `false` | When `false` (default), the bot cannot ping `@role` mentions. Set to `true` to allow. | | `DISCORD_ALLOW_MENTION_USERS` | No | `true` | When `true` (default), the bot can ping individual users by ID. | | `DISCORD_ALLOW_MENTION_REPLIED_USER` | No | `true` | When `true` (default), replying to a message pings the original author. | +| `DISCORD_PROXY` | No | — | Proxy URL for Discord connections (HTTP, WebSocket, REST). Overrides `HTTPS_PROXY`/`ALL_PROXY`. Supports `http://`, `https://`, and `socks5://` schemes. | +| `HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS` | No | `0.6` | Grace window the adapter waits before flushing a queued text chunk. Useful for smoothing streamed output. | +| `HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS` | No | `0.1` | Delay between split chunks when a single message exceeds Discord's length limit. 
| ### Config File (`config.yaml`) diff --git a/website/docs/user-guide/messaging/matrix.md b/website/docs/user-guide/messaging/matrix.md index ec77b5bc33e..255806c01ba 100644 --- a/website/docs/user-guide/messaging/matrix.md +++ b/website/docs/user-guide/messaging/matrix.md @@ -72,8 +72,13 @@ MATRIX_REQUIRE_MENTION=true MATRIX_FREE_RESPONSE_ROOMS=!abc123:matrix.org,!def456:matrix.org MATRIX_AUTO_THREAD=true MATRIX_DM_MENTION_THREADS=false +MATRIX_REACTIONS=true # default: true — emoji reactions during processing ``` +:::tip Disabling reactions +`MATRIX_REACTIONS=false` turns off the processing-lifecycle emoji reactions (👀/✅/❌) the bot posts on inbound messages. Useful for rooms where reaction events are noisy or aren't supported by all participating clients. +::: + :::note If you are upgrading from a version that did not have `MATRIX_REQUIRE_MENTION`, the bot previously responded to all messages in rooms. To preserve that behavior, set `MATRIX_REQUIRE_MENTION=false`. ::: diff --git a/website/docs/user-guide/messaging/open-webui.md b/website/docs/user-guide/messaging/open-webui.md index b26d23eddfd..efdf901371b 100644 --- a/website/docs/user-guide/messaging/open-webui.md +++ b/website/docs/user-guide/messaging/open-webui.md @@ -198,7 +198,7 @@ Make sure your `OPENAI_API_KEY` in Open WebUI matches the `API_SERVER_KEY` in He ## Multi-User Setup with Profiles -To run separate Hermes instances per user — each with their own config, memory, and skills — use [profiles](/docs/user-guide/features/profiles). Each profile runs its own API server on a different port and automatically advertises the profile name as the model in Open WebUI. +To run separate Hermes instances per user — each with their own config, memory, and skills — use [profiles](/docs/user-guide/profiles). Each profile runs its own API server on a different port and automatically advertises the profile name as the model in Open WebUI. ### 1. 
Create profiles and configure API servers diff --git a/website/docs/user-guide/messaging/qqbot.md b/website/docs/user-guide/messaging/qqbot.md index d9da90d5868..8da6f92def5 100644 --- a/website/docs/user-guide/messaging/qqbot.md +++ b/website/docs/user-guide/messaging/qqbot.md @@ -28,7 +28,7 @@ The QQ Bot adapter uses the [Official QQ Bot API](https://bot.q.qq.com/wiki/deve ### Interactive setup ```bash -hermes setup gateway +hermes gateway setup ``` Select **QQ Bot** from the platform list and follow the prompts. @@ -52,7 +52,7 @@ QQ_CLIENT_SECRET=your-app-secret | `QQBOT_HOME_CHANNEL_NAME` | Display name for home channel | `Home` | | `QQ_ALLOWED_USERS` | Comma-separated user OpenIDs for DM access | open (all users) | | `QQ_ALLOW_ALL_USERS` | Set to `true` to allow all DMs | `false` | -| `QQ_MARKDOWN_SUPPORT` | Enable QQ markdown (msg_type 2) | `true` | +| `QQ_SANDBOX` | Route requests to the QQ sandbox gateway for development testing | `false` | | `QQ_STT_API_KEY` | API key for voice-to-text provider | — | | `QQ_STT_BASE_URL` | Base URL for STT provider | `https://open.bigmodel.cn/api/coding/paas/v4` | | `QQ_STT_MODEL` | STT model name | `glm-asr` | @@ -68,7 +68,7 @@ platforms: extra: app_id: "your-app-id" client_secret: "your-secret" - markdown_support: true + markdown_support: true # enable QQ markdown (msg_type 2). Config-only; no env-var equivalent. dm_policy: "open" # open | allowlist | disabled allow_from: - "user_openid_1" diff --git a/website/docs/user-guide/messaging/slack.md b/website/docs/user-guide/messaging/slack.md index 5f6492216a9..a7eff683da8 100644 --- a/website/docs/user-guide/messaging/slack.md +++ b/website/docs/user-guide/messaging/slack.md @@ -283,7 +283,7 @@ slack: ``` :::info -Unlike Discord and Telegram, Slack does not have a `free_response_channels` equivalent. The Slack adapter requires `@mention` to start a conversation in channels. 
However, once the bot has an active session in a thread, subsequent thread replies do not require a mention. In DMs, the bot always responds without needing a mention. +Slack supports both patterns: `@mention` required to start a conversation by default, but you can opt specific channels out via `SLACK_FREE_RESPONSE_CHANNELS` (comma-separated channel IDs) or `slack.free_response_channels` in `config.yaml`. Once the bot has an active session in a thread, subsequent thread replies do not require a mention. In DMs the bot always responds without needing a mention. ::: ### Unauthorized User Handling diff --git a/website/docs/user-guide/messaging/telegram.md b/website/docs/user-guide/messaging/telegram.md index 0fa2e830b9d..6dbf9e61dff 100644 --- a/website/docs/user-guide/messaging/telegram.md +++ b/website/docs/user-guide/messaging/telegram.md @@ -422,40 +422,6 @@ The current model and provider are displayed at the top. All navigation happens If you know the exact model name, type `/model ` directly to skip the picker. You can also type `/model --global` to persist the change across sessions. ::: -## Webhook Mode - -By default, the Telegram adapter connects via **long polling** — the gateway makes outbound connections to Telegram's servers. This works everywhere but keeps a persistent connection open. - -**Webhook mode** is an alternative where Telegram pushes updates to your server over HTTPS. This is ideal for **serverless and cloud deployments** (Fly.io, Railway, etc.) where inbound HTTP can wake a suspended machine. 
- -### Configuration - -Set the `TELEGRAM_WEBHOOK_URL` environment variable to enable webhook mode: - -```bash -# Required — your public HTTPS endpoint -TELEGRAM_WEBHOOK_URL=https://app.fly.dev/telegram - -# Optional — local listen port (default: 8443) -TELEGRAM_WEBHOOK_PORT=8443 - -# Optional — secret token for update verification (auto-generated if not set) -TELEGRAM_WEBHOOK_SECRET=my-secret-token -``` - -Or in `~/.hermes/config.yaml`: - -```yaml -telegram: - webhook_mode: true -``` - -When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP server listening on `0.0.0.0:` and registers the webhook URL with Telegram. The URL path is extracted from the webhook URL (defaults to `/telegram`). - -:::warning -Telegram requires a **valid TLS certificate** on the webhook endpoint. Self-signed certificates will be rejected. Use a reverse proxy (nginx, Caddy) or a platform that provides TLS termination (Fly.io, Railway, Cloudflare Tunnel). -::: - ## DNS-over-HTTPS Fallback IPs In some restricted networks, `api.telegram.org` may resolve to an IP that is unreachable. The Telegram adapter includes a **fallback IP** mechanism that transparently retries connections against alternative IPs while preserving the correct TLS hostname and SNI. From 285bb2b9150b93445e5eded9bc897a4001b66e55 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 01:46:25 -0700 Subject: [PATCH 013/143] feat(execute_code): add project/strict execution modes, default to project (#11971) Weaker models (Gemma-class) repeatedly rediscover and forget that execute_code uses a different CWD and Python interpreter than terminal(), causing them to flip-flop on whether user files exist and to hit import errors on project dependencies like pandas. 
Adds a new 'code_execution.mode' config key (default 'project') that brings execute_code into line with terminal()'s filesystem/interpreter: project (new default): - cwd = session's TERMINAL_CWD (falls back to os.getcwd()) - python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python with a Python 3.8+ version check; falls back cleanly to sys.executable if no venv or the candidate fails - result : 'import pandas' works, '.env' resolves, matches terminal() strict (opt-in): - cwd = staging tmpdir (today's behavior) - python = sys.executable (today's behavior) - result : maximum reproducibility and isolation; project deps won't resolve Security-critical invariants are identical across both modes and covered by explicit regression tests: - env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD, *_CREDENTIAL, *_PASSWD, *_AUTH substrings) - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no delegate_task, no MCP from inside scripts) - resource caps (5-min timeout, 50KB stdout, 50 tool calls) Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool descriptions (regression from commit 39b83f34 where agents on local backends falsely believed they were sandboxed and refused networking). Set the mode via code_execution.mode in config.yaml (strict|project); there is no env-var override. --- hermes_cli/config.py | 16 +- model_tools.py | 4 +- tests/hermes_cli/test_config.py | 6 +- tests/tools/test_code_execution_modes.py | 455 +++++++++++++++++++++++ tools/code_execution_tool.py | 176 ++++++++- 5 files changed, 643 insertions(+), 14 deletions(-) create mode 100644 tests/tools/test_code_execution_modes.py diff --git a/hermes_cli/config.py b/hermes_cli/config.py index c9e05e3e882..dfb6b7210a4 100644 --- a/hermes_cli/config.py +++ b/hermes_cli/config.py @@ -771,6 +771,20 @@ DEFAULT_CONFIG = { "wrap_response": True, }, + # execute_code settings — controls the tool used for programmatic tool calls.
+ "code_execution": { + # Execution mode: + # project (default) — scripts run in the session's working directory + # with the active virtualenv/conda env's python, so project deps + # (pandas, torch, project packages) and relative paths resolve. + # strict — scripts run in an isolated temp directory with + # hermes-agent's own python (sys.executable). Maximum isolation + # and reproducibility; project deps and relative paths won't work. + # Env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, ...) and the + # tool whitelist apply identically in both modes. + "mode": "project", + }, + # Logging — controls file logging to ~/.hermes/logs/. # agent.log captures INFO+ (all agent activity); errors.log captures WARNING+. "logging": { @@ -788,7 +802,7 @@ DEFAULT_CONFIG = { }, # Config schema version - bump this when adding new required fields - "_config_version": 18, + "_config_version": 19, } # ============================================================================= diff --git a/model_tools.py b/model_tools.py index 801255b7978..5ec806e78bf 100644 --- a/model_tools.py +++ b/model_tools.py @@ -274,9 +274,9 @@ def get_tool_definitions( # execute_code" even when the API key isn't configured or the toolset is # disabled (#560-discord). 
if "execute_code" in available_tool_names: - from tools.code_execution_tool import SANDBOX_ALLOWED_TOOLS, build_execute_code_schema + from tools.code_execution_tool import SANDBOX_ALLOWED_TOOLS, build_execute_code_schema, _get_execution_mode sandbox_enabled = SANDBOX_ALLOWED_TOOLS & available_tool_names - dynamic_schema = build_execute_code_schema(sandbox_enabled) + dynamic_schema = build_execute_code_schema(sandbox_enabled, mode=_get_execution_mode()) for i, td in enumerate(filtered_tools): if td.get("function", {}).get("name") == "execute_code": filtered_tools[i] = {"type": "function", "function": dynamic_schema} diff --git a/tests/hermes_cli/test_config.py b/tests/hermes_cli/test_config.py index f31ac045c4f..4330424b9a2 100644 --- a/tests/hermes_cli/test_config.py +++ b/tests/hermes_cli/test_config.py @@ -459,7 +459,7 @@ class TestCustomProviderCompatibility: migrate_config(interactive=False, quiet=True) raw = yaml.safe_load(config_path.read_text(encoding="utf-8")) - assert raw["_config_version"] == 18 + assert raw["_config_version"] == 19 assert raw["providers"]["openai-direct"] == { "api": "https://api.openai.com/v1", "api_key": "test-key", @@ -606,7 +606,7 @@ class TestInterimAssistantMessageConfig: migrate_config(interactive=False, quiet=True) raw = yaml.safe_load(config_path.read_text(encoding="utf-8")) - assert raw["_config_version"] == 18 + assert raw["_config_version"] == 19 assert raw["display"]["tool_progress"] == "off" assert raw["display"]["interim_assistant_messages"] is True @@ -626,6 +626,6 @@ class TestDiscordChannelPromptsConfig: migrate_config(interactive=False, quiet=True) raw = yaml.safe_load(config_path.read_text(encoding="utf-8")) - assert raw["_config_version"] == 18 + assert raw["_config_version"] == 19 assert raw["discord"]["auto_thread"] is True assert raw["discord"]["channel_prompts"] == {} diff --git a/tests/tools/test_code_execution_modes.py b/tests/tools/test_code_execution_modes.py new file mode 100644 index 
00000000000..875eaf7aeda --- /dev/null +++ b/tests/tools/test_code_execution_modes.py @@ -0,0 +1,455 @@ +#!/usr/bin/env python3 +"""Tests for execute_code's strict / project execution modes. + +The mode switch controls two things: + - working directory: staging tmpdir (strict) vs session CWD (project) + - interpreter: sys.executable (strict) vs active venv's python (project) + +Security-critical invariants — env scrubbing, tool whitelist, resource caps — +must apply identically in both modes. These tests guard all three layers. + +Mode is sourced exclusively from ``code_execution.mode`` in config.yaml — +there is no env-var override. Tests patch ``_load_config`` directly. +""" + +import json +import os +import sys +import unittest +from contextlib import contextmanager +from unittest.mock import patch + +import pytest + +os.environ["TERMINAL_ENV"] = "local" + + +@pytest.fixture(autouse=True) +def _force_local_terminal(monkeypatch): + """Mirror test_code_execution.py — guarantee local backend under xdist.""" + monkeypatch.setenv("TERMINAL_ENV", "local") + + +from tools.code_execution_tool import ( + SANDBOX_ALLOWED_TOOLS, + DEFAULT_EXECUTION_MODE, + EXECUTION_MODES, + _get_execution_mode, + _is_usable_python, + _resolve_child_cwd, + _resolve_child_python, + build_execute_code_schema, + execute_code, +) + + +@contextmanager +def _mock_mode(mode): + """Context manager that pins code_execution.mode to the given value.""" + with patch("tools.code_execution_tool._load_config", + return_value={"mode": mode}): + yield + + +def _mock_handle_function_call(function_name, function_args, task_id=None, user_task=None): + """Minimal mock dispatcher reused across tests.""" + if function_name == "terminal": + return json.dumps({"output": "mock", "exit_code": 0}) + if function_name == "read_file": + return json.dumps({"content": "line1\n", "total_lines": 1}) + return json.dumps({"error": f"Unknown tool: {function_name}"}) + + +# 
--------------------------------------------------------------------------- +# Mode resolution +# --------------------------------------------------------------------------- + +class TestGetExecutionMode(unittest.TestCase): + """_get_execution_mode reads config.yaml only (no env var surface).""" + + def test_default_is_project(self): + self.assertEqual(DEFAULT_EXECUTION_MODE, "project") + + def test_config_project(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": "project"}): + self.assertEqual(_get_execution_mode(), "project") + + def test_config_strict(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": "strict"}): + self.assertEqual(_get_execution_mode(), "strict") + + def test_config_case_insensitive(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": "STRICT"}): + self.assertEqual(_get_execution_mode(), "strict") + + def test_config_strips_whitespace(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": " project "}): + self.assertEqual(_get_execution_mode(), "project") + + def test_empty_config_falls_back_to_default(self): + with patch("tools.code_execution_tool._load_config", return_value={}): + self.assertEqual(_get_execution_mode(), DEFAULT_EXECUTION_MODE) + + def test_bogus_config_falls_back_to_default(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": "banana"}): + self.assertEqual(_get_execution_mode(), DEFAULT_EXECUTION_MODE) + + def test_none_config_falls_back_to_default(self): + with patch("tools.code_execution_tool._load_config", + return_value={"mode": None}): + # str(None).lower() = "none" → not in EXECUTION_MODES → default + self.assertEqual(_get_execution_mode(), DEFAULT_EXECUTION_MODE) + + def test_execution_modes_tuple(self): + """Canonical set of modes — tests + config layer rely on this shape.""" + self.assertEqual(set(EXECUTION_MODES), {"project", "strict"}) + + +# 
--------------------------------------------------------------------------- +# Interpreter resolver +# --------------------------------------------------------------------------- + +class TestResolveChildPython(unittest.TestCase): + """_resolve_child_python — picks the right interpreter per mode.""" + + def test_strict_always_sys_executable(self): + """Strict mode never leaves sys.executable, even if venv is set.""" + with patch.dict(os.environ, {"VIRTUAL_ENV": "/some/venv"}): + self.assertEqual(_resolve_child_python("strict"), sys.executable) + + def test_project_with_no_venv_falls_back(self): + """Project mode without VIRTUAL_ENV or CONDA_PREFIX → sys.executable.""" + env = {k: v for k, v in os.environ.items() + if k not in ("VIRTUAL_ENV", "CONDA_PREFIX")} + with patch.dict(os.environ, env, clear=True): + self.assertEqual(_resolve_child_python("project"), sys.executable) + + def test_project_with_virtualenv_picks_venv_python(self): + """Project mode + VIRTUAL_ENV pointing at a real venv → that python.""" + import tempfile, pathlib + with tempfile.TemporaryDirectory() as td: + fake_venv = pathlib.Path(td) + (fake_venv / "bin").mkdir() + # Symlink to real python so the version check actually passes + (fake_venv / "bin" / "python").symlink_to(sys.executable) + with patch.dict(os.environ, {"VIRTUAL_ENV": str(fake_venv)}): + # Clear cache — _is_usable_python memoizes on path + _is_usable_python.cache_clear() + result = _resolve_child_python("project") + self.assertEqual(result, str(fake_venv / "bin" / "python")) + + def test_project_with_broken_venv_falls_back(self): + """VIRTUAL_ENV set but bin/python missing → sys.executable.""" + import tempfile + with tempfile.TemporaryDirectory() as td: + # No bin/python inside — broken venv + with patch.dict(os.environ, {"VIRTUAL_ENV": td}): + _is_usable_python.cache_clear() + self.assertEqual(_resolve_child_python("project"), sys.executable) + + def test_project_prefers_virtualenv_over_conda(self): + """If both VIRTUAL_ENV and 
CONDA_PREFIX are set, VIRTUAL_ENV wins.""" + import tempfile, pathlib + with tempfile.TemporaryDirectory() as ve_td, tempfile.TemporaryDirectory() as conda_td: + ve = pathlib.Path(ve_td) + (ve / "bin").mkdir() + (ve / "bin" / "python").symlink_to(sys.executable) + + conda = pathlib.Path(conda_td) + (conda / "bin").mkdir() + (conda / "bin" / "python").symlink_to(sys.executable) + + with patch.dict(os.environ, {"VIRTUAL_ENV": str(ve), "CONDA_PREFIX": str(conda)}): + _is_usable_python.cache_clear() + result = _resolve_child_python("project") + self.assertEqual(result, str(ve / "bin" / "python")) + + def test_is_usable_python_rejects_nonexistent(self): + _is_usable_python.cache_clear() + self.assertFalse(_is_usable_python("/does/not/exist/python")) + + def test_is_usable_python_accepts_real_python(self): + _is_usable_python.cache_clear() + self.assertTrue(_is_usable_python(sys.executable)) + + +# --------------------------------------------------------------------------- +# CWD resolver +# --------------------------------------------------------------------------- + +class TestResolveChildCwd(unittest.TestCase): + + def test_strict_uses_staging_dir(self): + self.assertEqual(_resolve_child_cwd("strict", "/tmp/staging"), "/tmp/staging") + + def test_project_without_terminal_cwd_uses_getcwd(self): + env = {k: v for k, v in os.environ.items() if k != "TERMINAL_CWD"} + with patch.dict(os.environ, env, clear=True): + self.assertEqual(_resolve_child_cwd("project", "/tmp/staging"), os.getcwd()) + + def test_project_uses_terminal_cwd_when_set(self): + import tempfile + with tempfile.TemporaryDirectory() as td: + with patch.dict(os.environ, {"TERMINAL_CWD": td}): + self.assertEqual(_resolve_child_cwd("project", "/tmp/staging"), td) + + def test_project_bogus_terminal_cwd_falls_back_to_getcwd(self): + with patch.dict(os.environ, {"TERMINAL_CWD": "/does/not/exist/anywhere"}): + self.assertEqual(_resolve_child_cwd("project", "/tmp/staging"), os.getcwd()) + + def 
test_project_expands_tilde(self): + import pathlib + home = str(pathlib.Path.home()) + with patch.dict(os.environ, {"TERMINAL_CWD": "~"}): + self.assertEqual(_resolve_child_cwd("project", "/tmp/staging"), home) + + +# --------------------------------------------------------------------------- +# Schema description +# --------------------------------------------------------------------------- + +class TestModeAwareSchema(unittest.TestCase): + + def test_strict_description_mentions_temp_dir(self): + desc = build_execute_code_schema(mode="strict")["description"] + self.assertIn("temp dir", desc) + + def test_project_description_mentions_session_and_venv(self): + desc = build_execute_code_schema(mode="project")["description"] + self.assertIn("session", desc) + self.assertIn("venv", desc) + + def test_neither_description_uses_sandbox_language(self): + """REGRESSION GUARD for commit 39b83f34. + + Agents on local backends falsely believed they were sandboxed and + refused networking tasks. Do not reintroduce any 'sandbox' / + 'isolated' / 'cloud' language in the tool description. 
+ """ + for mode in EXECUTION_MODES: + desc = build_execute_code_schema(mode=mode)["description"].lower() + for forbidden in ("sandbox", "isolated", "cloud"): + self.assertNotIn(forbidden, desc, + f"mode={mode}: '{forbidden}' leaked into description") + + def test_descriptions_are_similar_length(self): + """Both modes should have roughly the same-size description.""" + strict = len(build_execute_code_schema(mode="strict")["description"]) + project = len(build_execute_code_schema(mode="project")["description"]) + self.assertLess(abs(strict - project), 200) + + def test_default_mode_reads_config(self): + """build_execute_code_schema() with mode=None reads config.yaml.""" + with _mock_mode("strict"): + desc = build_execute_code_schema()["description"] + self.assertIn("temp dir", desc) + with _mock_mode("project"): + desc = build_execute_code_schema()["description"] + self.assertIn("session", desc) + + +# --------------------------------------------------------------------------- +# Integration: what actually happens when execute_code runs per mode +# --------------------------------------------------------------------------- + +@pytest.mark.skipif(sys.platform == "win32", reason="execute_code is POSIX-only") +class TestExecuteCodeModeIntegration(unittest.TestCase): + """End-to-end: verify the subprocess actually runs where we expect.""" + + def _run(self, code, mode, enabled_tools=None, extra_env=None): + env_overrides = extra_env or {} + with _mock_mode(mode): + with patch.dict(os.environ, env_overrides): + with patch("model_tools.handle_function_call", + side_effect=_mock_handle_function_call): + raw = execute_code( + code=code, + task_id=f"test-{mode}", + enabled_tools=enabled_tools or list(SANDBOX_ALLOWED_TOOLS), + ) + return json.loads(raw) + + def test_strict_mode_runs_in_tmpdir(self): + """Strict mode: script's os.getcwd() is the staging tmpdir.""" + result = self._run("import os; print(os.getcwd())", mode="strict") + self.assertEqual(result["status"], 
"success") + self.assertIn("hermes_sandbox_", result["output"]) + + def test_project_mode_runs_in_session_cwd(self): + """Project mode: script's os.getcwd() is the session's working dir.""" + import tempfile + with tempfile.TemporaryDirectory() as td: + result = self._run( + "import os; print(os.getcwd())", + mode="project", + extra_env={"TERMINAL_CWD": td}, + ) + self.assertEqual(result["status"], "success") + # Resolve symlinks (macOS /tmp → /private/tmp) on both sides + self.assertEqual( + os.path.realpath(result["output"].strip()), + os.path.realpath(td), + ) + + def test_project_mode_interpreter_is_venv_python(self): + """Project mode: sys.executable inside the child is the venv's python + when VIRTUAL_ENV is set to a real venv.""" + # The hermes-agent venv is always active during tests, so this also + # happens to equal sys.executable of the parent. What we're asserting + # is: resolver picked a venv-bin/python path, not that it differs + # from sys.executable. + result = self._run("import sys; print(sys.executable)", mode="project") + self.assertEqual(result["status"], "success") + # Either VIRTUAL_ENV-bin/python or sys.executable fallback, both OK. + output = result["output"].strip() + ve = os.environ.get("VIRTUAL_ENV", "").strip() + if ve: + self.assertTrue( + output.startswith(ve) or output == sys.executable, + f"project-mode python should be under VIRTUAL_ENV={ve} or sys.executable={sys.executable}, got {output}", + ) + + def test_project_mode_can_still_import_hermes_tools(self): + """Regression: hermes_tools still importable from non-tmpdir CWD. + + This is the PYTHONPATH fix — without it, switching to session CWD + breaks `from hermes_tools import terminal`. 
+ """ + import tempfile + with tempfile.TemporaryDirectory() as td: + code = ( + "from hermes_tools import terminal\n" + "r = terminal('echo x')\n" + "print(r.get('output', 'MISSING'))\n" + ) + result = self._run(code, mode="project", extra_env={"TERMINAL_CWD": td}) + self.assertEqual(result["status"], "success") + self.assertIn("mock", result["output"]) + + def test_strict_mode_can_still_import_hermes_tools(self): + """Regression: strict mode's tmpdir CWD still works for imports.""" + code = ( + "from hermes_tools import terminal\n" + "r = terminal('echo x')\n" + "print(r.get('output', 'MISSING'))\n" + ) + result = self._run(code, mode="strict") + self.assertEqual(result["status"], "success") + self.assertIn("mock", result["output"]) + + +# --------------------------------------------------------------------------- +# SECURITY-CRITICAL regression guards +# +# These MUST pass in both strict and project mode. The whole tiered-mode +# proposition rests on the claim that switching from strict to project only +# changes CWD + interpreter, not the security posture. 
+# --------------------------------------------------------------------------- + +@pytest.mark.skipif(sys.platform == "win32", reason="execute_code is POSIX-only") +class TestSecurityInvariantsAcrossModes(unittest.TestCase): + + def _run(self, code, mode): + with _mock_mode(mode): + with patch("model_tools.handle_function_call", + side_effect=_mock_handle_function_call): + raw = execute_code( + code=code, + task_id=f"test-sec-{mode}", + enabled_tools=list(SANDBOX_ALLOWED_TOOLS), + ) + return json.loads(raw) + + def test_api_keys_scrubbed_in_strict_mode(self): + code = ( + "import os\n" + "print('KEY=' + os.environ.get('OPENAI_API_KEY', 'MISSING'))\n" + "print('TOK=' + os.environ.get('ANTHROPIC_API_KEY', 'MISSING'))\n" + ) + with patch.dict(os.environ, { + "OPENAI_API_KEY": "sk-should-not-leak", + "ANTHROPIC_API_KEY": "ant-should-not-leak", + }): + result = self._run(code, mode="strict") + self.assertEqual(result["status"], "success") + self.assertIn("KEY=MISSING", result["output"]) + self.assertIn("TOK=MISSING", result["output"]) + self.assertNotIn("sk-should-not-leak", result["output"]) + self.assertNotIn("ant-should-not-leak", result["output"]) + + def test_api_keys_scrubbed_in_project_mode(self): + """CRITICAL: the project-mode default does NOT leak user credentials.""" + code = ( + "import os\n" + "print('KEY=' + os.environ.get('OPENAI_API_KEY', 'MISSING'))\n" + "print('TOK=' + os.environ.get('ANTHROPIC_API_KEY', 'MISSING'))\n" + "print('SEC=' + os.environ.get('GITHUB_TOKEN', 'MISSING'))\n" + ) + with patch.dict(os.environ, { + "OPENAI_API_KEY": "sk-should-not-leak", + "ANTHROPIC_API_KEY": "ant-should-not-leak", + "GITHUB_TOKEN": "ghp-should-not-leak", + }): + result = self._run(code, mode="project") + self.assertEqual(result["status"], "success") + for needle in ("KEY=MISSING", "TOK=MISSING", "SEC=MISSING"): + self.assertIn(needle, result["output"]) + for leaked in ("sk-should-not-leak", "ant-should-not-leak", "ghp-should-not-leak"): + self.assertNotIn(leaked, 
result["output"]) + + def test_secret_substrings_scrubbed_in_project_mode(self): + """SECRET/PASSWORD/CREDENTIAL/PASSWD/AUTH filters still apply.""" + code = ( + "import os\n" + "for k in ('MY_SECRET', 'DB_PASSWORD', 'VAULT_CREDENTIAL', " + "'LDAP_PASSWD', 'AUTH_TOKEN'):\n" + " print(f'{k}=' + os.environ.get(k, 'MISSING'))\n" + ) + with patch.dict(os.environ, { + "MY_SECRET": "secret-should-not-leak", + "DB_PASSWORD": "password-should-not-leak", + "VAULT_CREDENTIAL": "cred-should-not-leak", + "LDAP_PASSWD": "passwd-should-not-leak", + "AUTH_TOKEN": "auth-should-not-leak", + }): + result = self._run(code, mode="project") + self.assertEqual(result["status"], "success") + for leaked in ("secret-should-not-leak", "password-should-not-leak", + "cred-should-not-leak", "passwd-should-not-leak", + "auth-should-not-leak"): + self.assertNotIn(leaked, result["output"]) + + def test_tool_whitelist_enforced_in_strict_mode(self): + """A script cannot RPC-call tools outside SANDBOX_ALLOWED_TOOLS.""" + # execute_code is NOT in SANDBOX_ALLOWED_TOOLS (no recursion) + self.assertNotIn("execute_code", SANDBOX_ALLOWED_TOOLS) + code = ( + "import hermes_tools as ht\n" + "print('execute_code_available:', hasattr(ht, 'execute_code'))\n" + "print('delegate_task_available:', hasattr(ht, 'delegate_task'))\n" + ) + result = self._run(code, mode="strict") + self.assertEqual(result["status"], "success") + self.assertIn("execute_code_available: False", result["output"]) + self.assertIn("delegate_task_available: False", result["output"]) + + def test_tool_whitelist_enforced_in_project_mode(self): + """CRITICAL: project mode does NOT widen the tool whitelist.""" + code = ( + "import hermes_tools as ht\n" + "print('execute_code_available:', hasattr(ht, 'execute_code'))\n" + "print('delegate_task_available:', hasattr(ht, 'delegate_task'))\n" + ) + result = self._run(code, mode="project") + self.assertEqual(result["status"], "success") + self.assertIn("execute_code_available: False", 
result["output"]) + self.assertIn("delegate_task_available: False", result["output"]) + + +if __name__ == "__main__": + unittest.main() diff --git a/tools/code_execution_tool.py b/tools/code_execution_tool.py index 8268024fc72..c5a89488a08 100644 --- a/tools/code_execution_tool.py +++ b/tools/code_execution_tool.py @@ -29,6 +29,7 @@ Remote execution additionally requires Python 3 in the terminal backend. """ import base64 +import functools import json import logging import os @@ -1022,10 +1023,15 @@ def execute_code( child_env["HERMES_RPC_SOCKET"] = sock_path child_env["PYTHONDONTWRITEBYTECODE"] = "1" # Ensure the hermes-agent root is importable in the sandbox so - # repo-root modules are available to child scripts. + # repo-root modules are available to child scripts. We also prepend + # the staging tmpdir so ``from hermes_tools import ...`` resolves even + # when the subprocess CWD is not tmpdir (project mode). _hermes_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) _existing_pp = child_env.get("PYTHONPATH", "") - child_env["PYTHONPATH"] = _hermes_root + (os.pathsep + _existing_pp if _existing_pp else "") + _pp_parts = [tmpdir, _hermes_root] + if _existing_pp: + _pp_parts.append(_existing_pp) + child_env["PYTHONPATH"] = os.pathsep.join(_pp_parts) # Inject user's configured timezone so datetime.now() in sandboxed # code reflects the correct wall-clock time. Only TZ is set — # HERMES_TIMEZONE is an internal Hermes setting and must not leak @@ -1042,9 +1048,19 @@ def execute_code( if _profile_home: child_env["HOME"] = _profile_home + # Resolve interpreter + CWD based on execute_code mode. + # - strict : today's behavior (sys.executable + tmpdir CWD). + # - project: user's venv python + session's working directory, so + # project deps like pandas and user files resolve. + # Env scrubbing and tool whitelist apply identically in both modes. 
+ _mode = _get_execution_mode() + _child_python = _resolve_child_python(_mode) + _child_cwd = _resolve_child_cwd(_mode, tmpdir) + _script_path = os.path.join(tmpdir, "script.py") + proc = subprocess.Popen( - [sys.executable, "script.py"], - cwd=tmpdir, + [_child_python, _script_path], + cwd=_child_cwd, env=child_env, stdout=subprocess.PIPE, stderr=subprocess.PIPE, @@ -1299,6 +1315,127 @@ def _load_config() -> dict: return {} +# --------------------------------------------------------------------------- +# Execution mode resolution (strict vs project) +# --------------------------------------------------------------------------- + +# Valid values for code_execution.mode. Kept as a module constant so tests +# and the config layer can reference the canonical set. +EXECUTION_MODES = ("project", "strict") +DEFAULT_EXECUTION_MODE = "project" + + +def _get_execution_mode() -> str: + """Return the active execute_code mode — 'project' or 'strict'. + + Reads ``code_execution.mode`` from config.yaml; invalid values fall back + to ``DEFAULT_EXECUTION_MODE`` ('project') with a log warning. + + Mode semantics: + - ``project`` (default): scripts run in the session's working directory + with the active virtual environment's python, so project dependencies + (pandas, torch, project packages) and files resolve naturally. + - ``strict``: scripts run in an isolated temp directory with + ``sys.executable`` (hermes-agent's python). Reproducible and the + interpreter is guaranteed to work, but project deps and relative paths + won't resolve. + + Env scrubbing and tool whitelist apply identically in both modes. 
+ """ + cfg_value = str(_load_config().get("mode", DEFAULT_EXECUTION_MODE)).strip().lower() + if cfg_value in EXECUTION_MODES: + return cfg_value + logger.warning( + "Ignoring code_execution.mode=%r (expected one of %s), falling back to %r", + cfg_value, EXECUTION_MODES, DEFAULT_EXECUTION_MODE, + ) + return DEFAULT_EXECUTION_MODE + + +@functools.lru_cache(maxsize=32) +def _is_usable_python(python_path: str) -> bool: + """Check whether a candidate Python interpreter is usable for execute_code. + + Requires Python 3.8+ (f-strings and stdlib modules the RPC stubs need). + Cached so we don't fork a subprocess on every execute_code call. + """ + try: + result = subprocess.run( + [python_path, "-c", + "import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)"], + timeout=5, + capture_output=True, + ) + return result.returncode == 0 + except (OSError, subprocess.TimeoutExpired, subprocess.SubprocessError): + return False + + +def _resolve_child_python(mode: str) -> str: + """Pick the Python interpreter for the execute_code subprocess. + + In ``strict`` mode, always ``sys.executable`` — guaranteed to work and + keeps behavior fully reproducible across sessions. + + In ``project`` mode, prefer the user's active virtualenv/conda env's + python so ``import pandas`` etc. work. Falls back to ``sys.executable`` + if no venv is detected, the candidate binary is missing/not executable, + or it fails a Python 3.8+ version check. 
+ """ + if mode != "project": + return sys.executable + + if _IS_WINDOWS: + exe_names = ("python.exe", "python3.exe") + subdirs = ("Scripts",) + else: + exe_names = ("python", "python3") + subdirs = ("bin",) + + for var in ("VIRTUAL_ENV", "CONDA_PREFIX"): + root = os.environ.get(var, "").strip() + if not root: + continue + for subdir in subdirs: + for exe in exe_names: + candidate = os.path.join(root, subdir, exe) + if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)): + continue + if _is_usable_python(candidate): + return candidate + # Found the interpreter but it failed the version check — + # log once and fall through to sys.executable. + logger.info( + "execute_code: skipping %s=%s (Python version < 3.8 or broken). " + "Using sys.executable instead.", var, candidate, + ) + return sys.executable + + return sys.executable + + +def _resolve_child_cwd(mode: str, staging_dir: str) -> str: + """Resolve the working directory for the execute_code subprocess. + + - ``strict``: the staging tmpdir (today's behavior). + - ``project``: the session's TERMINAL_CWD (same as the terminal tool), or + ``os.getcwd()`` if TERMINAL_CWD is unset or doesn't point at a real dir. + Falls back to the staging tmpdir as a last resort so we never invoke + Popen with a nonexistent cwd. 
+ """ + if mode != "project": + return staging_dir + raw = os.environ.get("TERMINAL_CWD", "").strip() + if raw: + expanded = os.path.expanduser(raw) + if os.path.isdir(expanded): + return expanded + here = os.getcwd() + if os.path.isdir(here): + return here + return staging_dir + + # --------------------------------------------------------------------------- # OpenAI Function-Calling Schema # --------------------------------------------------------------------------- @@ -1330,15 +1467,24 @@ _TOOL_DOC_LINES = [ ] -def build_execute_code_schema(enabled_sandbox_tools: set = None) -> dict: +def build_execute_code_schema(enabled_sandbox_tools: set = None, + mode: str = None) -> dict: """Build the execute_code schema with description listing only enabled tools. When tools are disabled via ``hermes tools`` (e.g. web is turned off), the schema description should NOT mention web_search / web_extract — otherwise the model thinks they are available and keeps trying to use them. + + ``mode`` controls the working-directory sentence in the description: + - ``'strict'``: scripts run in a temp dir (not the session's CWD) + - ``'project'`` (default): scripts run in the session's CWD with the + active venv's python + If ``mode`` is None, the current ``code_execution.mode`` config is read. """ if enabled_sandbox_tools is None: enabled_sandbox_tools = SANDBOX_ALLOWED_TOOLS + if mode is None: + mode = _get_execution_mode() # Build tool documentation lines for only the enabled tools tool_lines = "\n".join( @@ -1354,6 +1500,20 @@ def build_execute_code_schema(enabled_sandbox_tools: set = None) -> dict: else: import_str = "..." + # Mode-specific CWD guidance. Project mode is the default and matches + # terminal()'s filesystem/interpreter; strict mode retains the isolated + # temp-dir staging and hermes-agent's own python. 
+ if mode == "strict": + cwd_note = ( + "Scripts run in their own temp dir, not the session's CWD — use absolute paths " + "(os.path.expanduser('~/.hermes/.env')) or terminal()/read_file() for user files." + ) + else: + cwd_note = ( + "Scripts run in the session's working directory with the active venv's python, " + "so project deps (pandas, etc.) and relative paths work like in terminal()." + ) + description = ( "Run a Python script that can call Hermes tools programmatically. " "Use this when you need 3+ tool calls with processing logic between them, " @@ -1367,8 +1527,7 @@ def build_execute_code_schema(enabled_sandbox_tools: set = None) -> dict: f"{tool_lines}\n\n" "Limits: 5-minute timeout, 50KB stdout cap, max 50 tool calls per script. " "terminal() is foreground-only (no background or pty).\n\n" - "Scripts run in their own temp dir, not the session's CWD — use absolute paths " - "(os.path.expanduser('~/.hermes/.env')) or terminal()/read_file() for user files.\n\n" + f"{cwd_note}\n\n" "Print your final result to stdout. Use Python stdlib (json, re, math, csv, " "datetime, collections, etc.) for processing between tool calls.\n\n" "Also available (no import needed — built into hermes_tools):\n" @@ -1397,7 +1556,8 @@ def build_execute_code_schema(enabled_sandbox_tools: set = None) -> dict: } -# Default schema used at registration time (all sandbox tools listed) +# Default schema used at registration time (all sandbox tools listed, +# current configured mode). model_tools.py rebuilds per-session anyway. 
EXECUTE_CODE_SCHEMA = build_execute_code_schema() From 8322b42c6cd0f6ae9bc6721e8b0d8cbff4a856f2 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 01:52:06 -0700 Subject: [PATCH 014/143] fix(streaming): surface dropped tool-call on mid-stream stall (#12072) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal. Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown. Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content — and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning). 
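The warning construction described above can be sketched in isolation. This is a minimal illustration, not the patch itself: the helper names `build_stall_warning` and `stub_content` are hypothetical (the patch inlines this logic in `_interruptible_streaming_api_call`), and only the message text and the three-name cap are taken from the diff.

```python
from typing import List, Optional


def build_stall_warning(partial_tool_names: Optional[List[str]], max_shown: int = 3) -> str:
    """Return the user-visible warning, or "" when no tool call was in flight.

    Hypothetical helper: mirrors the inline logic in the patch.
    """
    names = list(partial_tool_names or [])
    if not names:
        return ""
    shown = ", ".join(names[:max_shown])
    if len(names) > max_shown:
        # Collapse the tail so the warning stays short.
        shown += f", +{len(names) - max_shown} more"
    return (
        f"\n\n⚠ Stream stalled mid tool-call ({shown}); "
        "the action was not executed. "
        "Ask me to retry if you want to continue."
    )


def stub_content(partial_text: Optional[str], partial_tool_names: Optional[List[str]]) -> Optional[str]:
    """Content for the stub message: recovered text plus warning, if any."""
    warn = build_stall_warning(partial_tool_names)
    if warn:
        return (partial_text or "") + warn
    # Text-only partial stream: keep the bare recovered text.
    return partial_text
```

With no recorded tool names the recovered text passes through untouched, which is the text-only behaviour the commit message says is preserved.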
Validation:

| Scenario                                     | Before                     | After                                       |
|----------------------------------------------|----------------------------|---------------------------------------------|
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream                     | Bare recovered text        | Unchanged                                   |
| tests/run_agent/test_streaming.py            | 24 passed                  | 26 passed (2 new)                           |

--- run_agent.py | 55 ++++++++++-- tests/run_agent/test_streaming.py | 135 ++++++++++++++++++++++++++++++ 2 files changed, 182 insertions(+), 8 deletions(-) diff --git a/run_agent.py b/run_agent.py index e8d23d39cac..d5ff125e33b 100644 --- a/run_agent.py +++ b/run_agent.py @@ -5579,7 +5579,7 @@ class AIAgent: raise result["error"] return result["response"] - result = {"response": None, "error": None} + result = {"response": None, "error": None, "partial_tool_names": []} request_client_holder = {"client": None} first_delta_fired = {"done": False} deltas_were_sent = {"yes": False} # Track if any deltas were fired (for fallback) @@ -5751,6 +5751,14 @@ class AIAgent: tool_gen_notified.add(idx) _fire_first_delta() self._fire_tool_gen_started(name) + # Record the partial tool-call name so the outer + # stub-builder can surface a user-visible warning + # if streaming dies before this tool's arguments + # are fully delivered. Without this, a stall + # during tool-call JSON generation lets the stub + # at line ~6107 return `tool_calls=None`, silently + # discarding the attempted action.
+ result["partial_tool_names"].append(name) if chunk.choices[0].finish_reason: finish_reason = chunk.choices[0].finish_reason @@ -6117,13 +6125,44 @@ class AIAgent: _partial_text = ( getattr(self, "_current_streamed_assistant_text", "") or "" ).strip() or None - logger.warning( - "Partial stream delivered before error; returning stub " - "response with %s chars of recovered content to prevent " - "duplicate messages: %s", - len(_partial_text or ""), - result["error"], - ) + + # If the stream died while the model was emitting a tool call, + # the stub below will silently set `tool_calls=None` and the + # agent loop will treat the turn as complete — the attempted + # action is lost with no user-facing signal. Append a + # human-visible warning to the stub content so (a) the user + # knows something failed, and (b) the next turn's model sees + # in conversation history what was attempted and can retry. + _partial_names = list(result.get("partial_tool_names") or []) + if _partial_names: + _name_str = ", ".join(_partial_names[:3]) + if len(_partial_names) > 3: + _name_str += f", +{len(_partial_names) - 3} more" + _warn = ( + f"\n\n⚠ Stream stalled mid tool-call " + f"({_name_str}); the action was not executed. " + f"Ask me to retry if you want to continue." + ) + _partial_text = (_partial_text or "") + _warn + # Also fire as a streaming delta so the user sees it now + # instead of only in the persisted transcript. 
+ try: + self._fire_stream_delta(_warn) + except Exception: + pass + logger.warning( + "Partial stream dropped tool call(s) %s after %s chars " + "of text; surfaced warning to user: %s", + _partial_names, len(_partial_text or ""), result["error"], + ) + else: + logger.warning( + "Partial stream delivered before error; returning stub " + "response with %s chars of recovered content to prevent " + "duplicate messages: %s", + len(_partial_text or ""), + result["error"], + ) _stub_msg = SimpleNamespace( role="assistant", content=_partial_text, tool_calls=None, reasoning_content=None, diff --git a/tests/run_agent/test_streaming.py b/tests/run_agent/test_streaming.py index 73a9872020e..6afe36ee3ad 100644 --- a/tests/run_agent/test_streaming.py +++ b/tests/run_agent/test_streaming.py @@ -952,3 +952,138 @@ class TestAnthropicStreamCallbacks: agent._interruptible_streaming_api_call({}) assert touch_calls.count("receiving stream response") == len(events) + + +class TestPartialToolCallWarning: + """Regression: when a stream dies mid tool-call argument generation after + text was already delivered, the partial-stream stub at run_agent.py + line ~6107 used to silently set ``tool_calls=None`` and return + ``finish_reason=stop``, losing the attempted action with zero user-facing + signal. Live-observed Apr 2026 with MiniMax M2.7 on a 6-minute audit + task — agent streamed commentary, emitted a write_file tool call, + MiniMax stalled for 240 s mid-arguments, stale-stream detector killed + the connection, the stub returned, session ended with no file written + and no error shown. + + Fix: when the stream accumulator captured any tool-call names before the + error, the stub now appends a user-visible warning to content AND fires + it as a stream delta so the user sees it immediately. 
+ """ + + @patch("run_agent.AIAgent._create_request_openai_client") + @patch("run_agent.AIAgent._close_request_openai_client") + def test_partial_tool_call_surfaces_warning(self, mock_close, mock_create): + """Stream with text + partial tool-call name + mid-stream error + produces a stub whose content contains the user-visible warning + and whose tool_calls is None.""" + from run_agent import AIAgent + + class _StallError(RuntimeError): + pass + + def _stalling_stream(): + yield _make_stream_chunk(content="Let me write the audit: ") + yield _make_stream_chunk(tool_calls=[ + _make_tool_call_delta(index=0, tc_id="call_1", name="write_file"), + ]) + yield _make_stream_chunk(tool_calls=[ + _make_tool_call_delta(index=0, arguments='{"path": "/tmp/x", '), + ]) + raise _StallError("simulated upstream stall") + + mock_client = MagicMock() + mock_client.chat.completions.create.side_effect = lambda *a, **kw: _stalling_stream() + mock_create.return_value = mock_client + + agent = AIAgent( + api_key="test-key", + base_url="https://openrouter.ai/api/v1", + model="test/model", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + agent.api_mode = "chat_completions" + agent._interrupt_requested = False + + fired_deltas: list = [] + agent._fire_stream_delta = lambda text: fired_deltas.append(text) + agent._current_streamed_assistant_text = "Let me write the audit: " + + import os as _os + _prev = _os.environ.get("HERMES_STREAM_RETRIES") + _os.environ["HERMES_STREAM_RETRIES"] = "0" + try: + response = agent._interruptible_streaming_api_call({}) + finally: + if _prev is None: + _os.environ.pop("HERMES_STREAM_RETRIES", None) + else: + _os.environ["HERMES_STREAM_RETRIES"] = _prev + + content = response.choices[0].message.content or "" + assert "Let me write the audit:" in content, ( + f"Partial text not preserved in stub: {content!r}" + ) + assert "Stream stalled mid tool-call" in content, ( + f"Stub content is missing the dropped-tool-call warning; users " + f"get 
silent failure. Got content={content!r}" + ) + assert "write_file" in content, ( + f"Warning should name the dropped tool. Got: {content!r}" + ) + assert response.choices[0].message.tool_calls is None + assert any("Stream stalled mid tool-call" in d for d in fired_deltas), ( + f"Warning was not surfaced as a live stream delta. " + f"fired_deltas={fired_deltas}" + ) + + @patch("run_agent.AIAgent._create_request_openai_client") + @patch("run_agent.AIAgent._close_request_openai_client") + def test_partial_text_only_no_warning(self, mock_close, mock_create): + """Text-only partial stream (no tool call mid-flight) keeps the + pre-fix behaviour: bare recovered text, no warning noise.""" + from run_agent import AIAgent + + class _StallError(RuntimeError): + pass + + def _stalling_stream(): + yield _make_stream_chunk(content="Here's my answer so far") + raise _StallError("simulated upstream stall") + + mock_client = MagicMock() + mock_client.chat.completions.create.side_effect = lambda *a, **kw: _stalling_stream() + mock_create.return_value = mock_client + + agent = AIAgent( + api_key="test-key", + base_url="https://openrouter.ai/api/v1", + model="test/model", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + agent.api_mode = "chat_completions" + agent._interrupt_requested = False + agent._current_streamed_assistant_text = "Here's my answer so far" + + import os as _os + _prev = _os.environ.get("HERMES_STREAM_RETRIES") + _os.environ["HERMES_STREAM_RETRIES"] = "0" + try: + response = agent._interruptible_streaming_api_call({}) + finally: + if _prev is None: + _os.environ.pop("HERMES_STREAM_RETRIES", None) + else: + _os.environ["HERMES_STREAM_RETRIES"] = _prev + + content = response.choices[0].message.content or "" + assert content == "Here's my answer so far", ( + f"Pre-fix behaviour regressed for text-only partial streams: {content!r}" + ) + assert "Stream stalled" not in content, ( + f"Unexpected warning on text-only partial stream: {content!r}" + ) 
+ From a2c9f5d0a79d7d7fb4ff7bffc44cc9dd1c8f2259 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 01:53:09 -0700 Subject: [PATCH 015/143] docs(execute_code): document project/strict execution modes (#12073) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Follow-up to PR #11971. Documents the new code_execution.mode config key and what each mode actually does. - user-guide/configuration.md: add mode: project to the yaml example, explain project vs strict and call out that security invariants are identical across modes. - user-guide/features/code-execution.md: new 'Execution Mode' section with a comparison table and usage guidance; update the 'temporary directory' note so it reflects that script.py runs in the session CWD in project mode (staging dir stays on PYTHONPATH for imports); drop stale 'sandboxed' framing from the intro and skill-passthrough paragraph. - getting-started/learning-path.md: update the one-line Code Execution summary to match (no longer 'sandboxed environments' — the default runs in the session's real working directory). No code changes. --- website/docs/getting-started/learning-path.md | 2 +- website/docs/user-guide/configuration.md | 10 ++++- .../user-guide/features/code-execution.md | 40 ++++++++++++++++--- 3 files changed, 45 insertions(+), 7 deletions(-) diff --git a/website/docs/getting-started/learning-path.md b/website/docs/getting-started/learning-path.md index bcdbb44d420..41170ccccdb 100644 --- a/website/docs/getting-started/learning-path.md +++ b/website/docs/getting-started/learning-path.md @@ -129,7 +129,7 @@ Not sure what's available? 
Here's a quick directory of major features: | **MCP** | Connect to external tool servers via Model Context Protocol | [MCP](/docs/user-guide/features/mcp) | | **Cron** | Schedule recurring agent tasks | [Cron](/docs/user-guide/features/cron) | | **Delegation** | Spawn sub-agents for parallel work | [Delegation](/docs/user-guide/features/delegation) | -| **Code Execution** | Run code in sandboxed environments | [Code Execution](/docs/user-guide/features/code-execution) | +| **Code Execution** | Run Python scripts that call Hermes tools programmatically | [Code Execution](/docs/user-guide/features/code-execution) | | **Browser** | Web browsing and scraping | [Browser](/docs/user-guide/features/browser) | | **Hooks** | Event-driven callbacks and middleware | [Hooks](/docs/user-guide/features/hooks) | | **Batch Processing** | Process multiple inputs in bulk | [Batch Processing](/docs/user-guide/features/batch-processing) | diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md index 29d1665627e..dbc6b0e47e6 100644 --- a/website/docs/user-guide/configuration.md +++ b/website/docs/user-guide/configuration.md @@ -1104,14 +1104,22 @@ human_delay: ## Code Execution -Configure the sandboxed Python code execution tool: +Configure the `execute_code` tool: ```yaml code_execution: + mode: project # project (default) | strict timeout: 300 # Max execution time in seconds max_tool_calls: 50 # Max tool calls within code execution ``` +**`mode`** controls the working directory and Python interpreter for scripts: + +- **`project`** (default) — scripts run in the session's working directory with the active virtualenv/conda env's python. Project deps (`pandas`, `torch`, project packages) and relative paths (`.env`, `./data.csv`) resolve naturally, matching what `terminal()` sees. +- **`strict`** — scripts run in a temp staging directory with `sys.executable` (Hermes's own python). 
Maximum reproducibility, but project deps and relative paths won't resolve. + +Environment scrubbing (strips `*_API_KEY`, `*_TOKEN`, `*_SECRET`, `*_PASSWORD`, `*_CREDENTIAL`, `*_PASSWD`, `*_AUTH`) and the tool whitelist apply identically in both modes — switching mode does not change the security posture. + ## Web Search Backends The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers. Configure the backend in `config.yaml` or via `hermes tools`: diff --git a/website/docs/user-guide/features/code-execution.md b/website/docs/user-guide/features/code-execution.md index 53668da9010..4deae296220 100644 --- a/website/docs/user-guide/features/code-execution.md +++ b/website/docs/user-guide/features/code-execution.md @@ -1,12 +1,12 @@ --- sidebar_position: 8 title: "Code Execution" -description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn" +description: "Programmatic Python execution with RPC tool access — collapse multi-step workflows into a single turn" --- # Code Execution (Programmatic Tool Calling) -The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC. +The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a child process on the agent host, communicating with Hermes over a Unix domain socket RPC. ## How It Works @@ -27,7 +27,7 @@ for r in results["data"]["web"]: print(summary) ``` -**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only). 
+**Available tools inside scripts:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only). ## When the Agent Uses This @@ -126,6 +126,35 @@ report = { print(json.dumps(report, indent=2)) ``` +## Execution Mode + +`execute_code` has two execution modes controlled by `code_execution.mode` in `~/.hermes/config.yaml`: + +| Mode | Working directory | Python interpreter | +|------|-------------------|--------------------| +| **`project`** (default) | The session's working directory (same as `terminal()`) | Active `VIRTUAL_ENV` / `CONDA_PREFIX` python, falling back to Hermes's own python | +| `strict` | A temp staging directory isolated from the user's project | `sys.executable` (Hermes's own python) | + +**When to leave it on `project`:** you want `import pandas`, `from my_project import foo`, or relative paths like `open(".env")` to work the same way they do in `terminal()`. This is almost always what you want. + +**When to flip to `strict`:** you need maximum reproducibility — you want the same interpreter every session regardless of which venv the user activated, and you want scripts quarantined from the project tree (no risk of accidentally reading project files through a relative path). + +```yaml +# ~/.hermes/config.yaml +code_execution: + mode: project # or "strict" +``` + +Fallback behavior in `project` mode: if `VIRTUAL_ENV` / `CONDA_PREFIX` is unset, broken, or points at a Python older than 3.8, the resolver falls back cleanly to `sys.executable` — it never leaves the agent without a working interpreter. 
+ +Security-critical invariants are identical across both modes: + +- environment scrubbing (API keys, tokens, credentials stripped) +- tool whitelist (scripts cannot call `execute_code` recursively, `delegate_task`, or MCP tools) +- resource limits (timeout, stdout cap, tool-call cap) + +Switching mode changes where scripts run and which interpreter runs them, not what credentials they can see or which tools they can call. + ## Resource Limits | Resource | Limit | Notes | @@ -140,6 +169,7 @@ All limits are configurable via `config.yaml`: ```yaml # In ~/.hermes/config.yaml code_execution: + mode: project # project (default) | strict timeout: 300 # Max seconds per script (default: 300) max_tool_calls: 50 # Max tool calls per execution (default: 50) ``` @@ -176,7 +206,7 @@ Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTI ### Skill Environment Variable Passthrough -When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` sandboxes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code. +When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` child processes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code. For non-skill use cases, you can explicitly allowlist variables in `config.yaml`: @@ -189,7 +219,7 @@ terminal: See the [Security guide](/docs/user-guide/security#environment-variable-passthrough) for full details. -The script runs in a temporary directory that is cleaned up after execution. The child process runs in its own process group so it can be cleanly killed on timeout or interruption. 
+Hermes always writes the script and the auto-generated `hermes_tools.py` RPC stub into a temp staging directory that is cleaned up after execution. In `strict` mode the script also *runs* there; in `project` mode it runs in the session's working directory (the staging directory stays on `PYTHONPATH` so imports still resolve). The child process runs in its own process group so it can be cleanly killed on timeout or interruption. ## execute_code vs terminal From 8826d9c19796da80bd4d5cc6a3e61a6f45a09775 Mon Sep 17 00:00:00 2001 From: vominh1919 <92574218+vominh1919@users.noreply.github.com> Date: Fri, 17 Apr 2026 16:35:02 +0700 Subject: [PATCH 016/143] fix: FTS5 LIKE fallback for CJK (Chinese/Japanese/Korean) queries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit FTS5's default tokenizer does not segment CJK text into words: a contiguous run of CJK characters is indexed as a single token, so multi-character substring queries like '记忆断裂' return 0 results. This fix adds a LIKE fallback: when FTS5 returns no results and the query contains CJK characters, the search retries with WHERE content LIKE '%query%'. English queries stay on the FTS5 path, so their performance is unchanged.
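A minimal sketch of the fallback idea, under stated assumptions: it uses a simplified single-table schema (`messages(content)`) and hypothetical function names `contains_cjk` and `like_fallback`. The real `SessionDB.search_messages` also joins `sessions` and applies source and role filters, and the FTS5 query that runs first is omitted here. Only the code-point ranges are copied from the patch.

```python
import sqlite3

# Code-point ranges copied from the patch's _contains_cjk helper.
_CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Extension A
    (0x20000, 0x2A6DF),  # CJK Extension B
    (0x3000, 0x303F),    # CJK Symbols
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0xAC00, 0xD7AF),    # Hangul Syllables
)


def contains_cjk(text: str) -> bool:
    """True if any character falls in one of the CJK ranges above."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in _CJK_RANGES)


def like_fallback(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Substring search, meant to run only after the primary search found nothing."""
    if not contains_cjk(query):
        return []  # non-CJK queries never take the LIKE path
    raw = query.strip('"').strip()
    rows = conn.execute(
        "SELECT content FROM messages WHERE content LIKE ? LIMIT ?",
        (f"%{raw}%", limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (content TEXT)")
conn.execute("INSERT INTO messages VALUES (?)", ("昨天的聊天记录,记忆断裂问题复现了",))
conn.execute("INSERT INTO messages VALUES (?)", ("Deploy docker containers",))
```

Note that `%` and `_` inside the query are not escaped in this sketch, so they act as LIKE wildcards.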
Fixes #11511 --- hermes_state.py | 54 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/hermes_state.py b/hermes_state.py index 5e563666e83..0a8b000ab47 100644 --- a/hermes_state.py +++ b/hermes_state.py @@ -987,6 +987,22 @@ class SessionDB: return sanitized.strip() + + @staticmethod + def _contains_cjk(text: str) -> bool: + """Check if text contains CJK (Chinese, Japanese, Korean) characters.""" + for ch in text: + cp = ord(ch) + if (0x4E00 <= cp <= 0x9FFF or # CJK Unified Ideographs + 0x3400 <= cp <= 0x4DBF or # CJK Extension A + 0x20000 <= cp <= 0x2A6DF or # CJK Extension B + 0x3000 <= cp <= 0x303F or # CJK Symbols + 0x3040 <= cp <= 0x309F or # Hiragana + 0x30A0 <= cp <= 0x30FF or # Katakana + 0xAC00 <= cp <= 0xD7AF): # Hangul Syllables + return True + return False + def search_messages( self, query: str, @@ -1062,8 +1078,42 @@ class SessionDB: cursor = self._conn.execute(sql, params) except sqlite3.OperationalError: # FTS5 query syntax error despite sanitization — return empty - return [] - matches = [dict(row) for row in cursor.fetchall()] + # unless query contains CJK (fall back to LIKE below) + if not self._contains_cjk(query): + return [] + matches = [] + else: + matches = [dict(row) for row in cursor.fetchall()] + + # LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK + # characters individually, causing multi-character queries to fail. + if not matches and self._contains_cjk(query): + raw_query = query.strip('"').strip() + like_where = ["m.content LIKE ?"] + like_params: list = [f"%{raw_query}%"] + if source_filter is not None: + like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})") + like_params.extend(source_filter) + if exclude_sources is not None: + like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})") + like_params.extend(exclude_sources) + if role_filter: + like_where.append(f"m.role IN ({','.join('?' 
for _ in role_filter)})") + like_params.extend(role_filter) + like_sql = f""" + SELECT m.id, m.session_id, m.role, m.content AS snippet, + m.content, m.timestamp, m.tool_name, + s.source, s.model, s.started_at AS session_started + FROM messages m + JOIN sessions s ON s.id = m.session_id + WHERE {' AND '.join(like_where)} + ORDER BY m.timestamp DESC + LIMIT ? OFFSET ? + """ + like_params.extend([limit, offset]) + with self._lock: + like_cursor = self._conn.execute(like_sql, like_params) + matches = [dict(row) for row in like_cursor.fetchall()] # Add surrounding context (1 message before + after each match). # Done outside the lock so we don't hold it across N sequential queries. From 3b69b2fd615c4679c647946d597ef6c84763f370 Mon Sep 17 00:00:00 2001 From: teknium1 Date: Sat, 18 Apr 2026 01:56:22 -0700 Subject: [PATCH 017/143] test(session-search): regression coverage for CJK LIKE fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Twelve tests under TestCJKSearchFallback guarding: - CJK detection across Chinese/Japanese/Korean/Hiragana/Katakana ranges (including the full Hangul syllables block \uac00-\ud7af, to catch the shorter-range typo from one of the duplicate PRs) - Substring match for multi-char Chinese, Japanese, Korean queries - Filter preservation (source_filter, exclude_sources, role_filter) in the LIKE path — guards against the SQL-builder bug from another duplicate PR where filter clauses landed after LIMIT/OFFSET - Snippet centered on the matched term (instr-based substr window), not the leading 200 chars of content - English fast-path untouched - Empty/no-match cases - Mixed CJK+English queries Also: - hermes_state.py: LIKE-fallback snippet is now `substr(content, max(1, instr(content, ?) - 40), 120)`, centered on the match instead of the whole-content default. Credit goes to @iamagenius00 for the snippet idea in PR #11517. 
- scripts/release.py: add @iamagenius00 to AUTHOR_MAP so future release attribution resolves cleanly. Refs #11511, #11516, #11517, #11541. Co-authored-by: iamagenius00 --- hermes_state.py | 7 +- scripts/release.py | 1 + tests/test_hermes_state.py | 135 +++++++++++++++++++++++++++++++++++++ 3 files changed, 142 insertions(+), 1 deletion(-) diff --git a/hermes_state.py b/hermes_state.py index 0a8b000ab47..af97f7fbd89 100644 --- a/hermes_state.py +++ b/hermes_state.py @@ -1101,7 +1101,10 @@ class SessionDB: like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})") like_params.extend(role_filter) like_sql = f""" - SELECT m.id, m.session_id, m.role, m.content AS snippet, + SELECT m.id, m.session_id, m.role, + substr(m.content, + max(1, instr(m.content, ?) - 40), + 120) AS snippet, m.content, m.timestamp, m.tool_name, s.source, s.model, s.started_at AS session_started FROM messages m @@ -1111,6 +1114,8 @@ class SessionDB: LIMIT ? OFFSET ? """ like_params.extend([limit, offset]) + # instr() parameter goes first in the bound list + like_params = [raw_query] + like_params with self._lock: like_cursor = self._conn.execute(like_sql, like_params) matches = [dict(row) for row in like_cursor.fetchall()] diff --git a/scripts/release.py b/scripts/release.py index 5e909de76ec..372a4802ba7 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -207,6 +207,7 @@ AUTHOR_MAP = { "cola-runner@users.noreply.github.com": "cola-runner", "ygd58@users.noreply.github.com": "ygd58", "vominh1919@users.noreply.github.com": "vominh1919", + "iamagenius00@users.noreply.github.com": "iamagenius00", "trevmanthony@gmail.com": "trevthefoolish", "ziliangpeng@users.noreply.github.com": "ziliangpeng", "centripetal-star@users.noreply.github.com": "centripetal-star", diff --git a/tests/test_hermes_state.py b/tests/test_hermes_state.py index 5f9a16a529c..d54d7b9fb0f 100644 --- a/tests/test_hermes_state.py +++ b/tests/test_hermes_state.py @@ -479,6 +479,141 @@ class TestFTS5Search: assert 
s('my-app.config.ts') == '"my-app.config.ts"' +# ========================================================================= +# CJK (Chinese/Japanese/Korean) LIKE fallback +# ========================================================================= + +class TestCJKSearchFallback: + """Regression tests for CJK search (see #11511). + + SQLite FTS5's default tokenizer treats contiguous CJK runs as a single + token ("和其他agent的聊天记录" → one token), so substring queries like + "记忆断裂" return 0 rows despite the data being present. SessionDB falls + back to LIKE substring matching whenever FTS5 returns no results and + the query contains CJK characters. + """ + + def test_cjk_detection_covers_all_ranges(self): + from hermes_state import SessionDB + f = SessionDB._contains_cjk + # Chinese (CJK Unified Ideographs) + assert f("记忆断裂") is True + # Japanese Hiragana + Katakana + assert f("こんにちは") is True + assert f("カタカナ") is True + # Korean Hangul syllables (both early and late — guards against + # the \ud7a0-\ud7af typo seen in one of the duplicate PRs) + assert f("안녕하세요") is True + assert f("기억") is True + # Non-CJK + assert f("hello world") is False + assert f("日本語mixedwithenglish") is True + assert f("") is False + + def test_chinese_multichar_query_returns_results(self, db): + """The headline bug: multi-char Chinese query must not return [].""" + db.create_session(session_id="s1", source="cli") + db.append_message( + "s1", role="user", + content="昨天和其他Agent的聊天记录,记忆断裂问题复现了", + ) + results = db.search_messages("记忆断裂") + assert len(results) == 1 + assert results[0]["session_id"] == "s1" + + def test_chinese_bigram_query(self, db): + db.create_session(session_id="s1", source="telegram") + db.append_message("s1", role="user", content="今天讨论A2A通信协议的实现") + results = db.search_messages("通信") + assert len(results) == 1 + + def test_korean_query_returns_results(self, db): + """Guards against Hangul range typos (\\uac00-\\ud7af, not \\ud7a0-).""" + db.create_session(session_id="s1", 
source="cli") + db.append_message("s1", role="user", content="안녕하세요 반갑습니다") + results = db.search_messages("안녕") + assert len(results) == 1 + + def test_japanese_query_returns_results(self, db): + db.create_session(session_id="s1", source="cli") + db.append_message("s1", role="user", content="こんにちは世界") + assert len(db.search_messages("こんにちは")) == 1 + assert len(db.search_messages("世界")) == 1 + + def test_cjk_fallback_preserves_source_filter(self, db): + """Guards against the SQL-builder bug where filter clauses land + after LIMIT/OFFSET (seen in one of the duplicate PRs).""" + db.create_session(session_id="s1", source="cli") + db.create_session(session_id="s2", source="telegram") + db.append_message("s1", role="user", content="记忆断裂在CLI") + db.append_message("s2", role="user", content="记忆断裂在Telegram") + + results = db.search_messages("记忆断裂", source_filter=["telegram"]) + assert len(results) == 1 + assert results[0]["source"] == "telegram" + + def test_cjk_fallback_preserves_exclude_sources(self, db): + db.create_session(session_id="s1", source="cli") + db.create_session(session_id="s2", source="tool") + db.append_message("s1", role="user", content="记忆断裂在CLI") + db.append_message("s2", role="assistant", content="记忆断裂在tool") + + results = db.search_messages("记忆断裂", exclude_sources=["tool"]) + sources = {r["source"] for r in results} + assert "tool" not in sources + assert "cli" in sources + + def test_cjk_fallback_preserves_role_filter(self, db): + db.create_session(session_id="s1", source="cli") + db.append_message("s1", role="user", content="用户说的记忆断裂") + db.append_message("s1", role="assistant", content="助手说的记忆断裂") + + results = db.search_messages("记忆断裂", role_filter=["assistant"]) + assert len(results) == 1 + assert results[0]["role"] == "assistant" + + def test_cjk_snippet_is_centered_on_match(self, db): + """Snippet should contain the search term, not just the first N chars.""" + db.create_session(session_id="s1", source="cli") + long_prefix = 
"这是一段很长的前缀用来把匹配位置推到文档中间" * 3 + long_suffix = "这是一段很长的后缀内容填充剩余空间" * 3 + db.append_message( + "s1", role="user", + content=f"{long_prefix}记忆断裂{long_suffix}", + ) + results = db.search_messages("记忆断裂") + assert len(results) == 1 + # The centered substr() snippet must include the matched term. + assert "记忆断裂" in results[0]["snippet"] + + def test_english_query_still_uses_fts5_fast_path(self, db): + """English queries must not trigger the LIKE fallback (fast path regression).""" + db.create_session(session_id="s1", source="cli") + db.append_message("s1", role="user", content="Deploy docker containers") + results = db.search_messages("docker") + assert len(results) == 1 + # No CJK in query → LIKE fallback must not run. We don't assert this + # directly (no instrumentation), but the FTS5 path produces an + # FTS5-style snippet with highlight markers when the term is short. + # At minimum: english queries must still match. + + def test_cjk_query_with_no_matches_returns_empty(self, db): + db.create_session(session_id="s1", source="cli") + db.append_message("s1", role="user", content="unrelated English content") + results = db.search_messages("记忆断裂") + assert results == [] + + def test_mixed_cjk_english_query(self, db): + """Mixed queries should still fall back to LIKE when FTS5 misses.""" + db.create_session(session_id="s1", source="cli") + db.append_message("s1", role="user", content="讨论Agent通信协议") + # "Agent通信" is CJK+English — FTS5 default tokenizer indexes the + # whole CJK run with embedded "agent" as separate tokens; the LIKE + # fallback handles the substring correctly. 
+ results = db.search_messages("Agent通信") + assert len(results) == 1 + + # ========================================================================= # Session search and listing # ========================================================================= From cf012a05d895b4f2c19f75b27f799d222421be82 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:53:21 -0700 Subject: [PATCH 018/143] docs(terminal): warn against stacking watch_patterns + notify_on_complete on end-of-run markers (#12113) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Stacking both features on the same event produces duplicate, delayed notifications — delivery is async and continues firing after the process exits, so matches on end-of-run markers (SUMMARY, DONE, PASS) arrive after the agent has already polled/waited and moved on. Updates both the terminal tool JSON schema description and the terminal_tool() function docstring to make the split explicit: - watch_patterns: mid-process signals only (errors, readiness markers, intermediate steps you want to react to before the process exits) - notify_on_complete: end-of-run completion signal No behavioural change. 
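As an illustration of the split, here is a minimal sketch. The argument names `command`, `background`, `notify_on_complete` and `watch_patterns` come from the docstring touched by this patch; the commands, the `stacks_end_of_run_marker` helper and its marker list are hypothetical and not part of the codebase.

```python
# Hypothetical argument payloads for the terminal tool. Only the key names
# are taken from the terminal_tool() docstring; the commands are placeholders.

# Mid-process signals: react to errors or readiness before the process exits.
server_args = {
    "command": "./start_server.sh",  # placeholder command
    "background": True,
    "watch_patterns": ["ERROR", "Traceback", "listening on port"],
}

# End-of-run signal: one completion notification, no end-marker pattern.
test_args = {
    "command": "./run_tests.sh",  # placeholder command
    "background": True,
    "notify_on_complete": True,
}

# Example end-of-run markers named in this commit message.
END_OF_RUN_MARKERS = ("SUMMARY", "DONE", "PASS")


def stacks_end_of_run_marker(args, markers=END_OF_RUN_MARKERS):
    """Return True when notify_on_complete is combined with an end-of-run pattern."""
    if not args.get("notify_on_complete"):
        return False
    return any(p in markers for p in args.get("watch_patterns", []))
```

A check like this only flags exact marker strings. It is meant to show which combination the updated description warns against, not to suggest the tool enforces it.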
--- tools/terminal_tool.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/terminal_tool.py b/tools/terminal_tool.py index 69832cc1c7a..1182207b84c 100644 --- a/tools/terminal_tool.py +++ b/tools/terminal_tool.py @@ -1126,7 +1126,7 @@ def terminal_tool( workdir: Working directory for this command (optional, uses session cwd if not set) pty: If True, use pseudo-terminal for interactive CLI tools (local backend only) notify_on_complete: If True and background=True, auto-notify the agent when the process exits - watch_patterns: List of strings to watch for in background output; triggers notification on match + watch_patterns: List of strings to watch for in background output; fires a notification on first match per pattern. Use ONLY for mid-process signals (errors, readiness markers) that appear before exit. For end-of-run markers use notify_on_complete instead — stacking both produces duplicate, delayed notifications. Returns: str: JSON string with output, exit_code, and error fields @@ -1724,7 +1724,7 @@ TERMINAL_SCHEMA = { "watch_patterns": { "type": "array", "items": {"type": "string"}, - "description": "List of strings to watch for in background process output. When any pattern matches a line of output, you'll be notified with the matching text — like notify_on_complete but triggers mid-process on specific output. Use for monitoring logs, watching for errors, or waiting for specific events (e.g. [\"ERROR\", \"FAIL\", \"listening on port\"])." + "description": "Strings to watch for in background process output. Fires a notification the first time each pattern matches a line of output. **Use ONLY for mid-process signals** you want to react to before the process exits — errors, readiness markers, intermediate step markers (e.g. [\"ERROR\", \"Traceback\", \"listening on port\"]). Do NOT use for end-of-run markers (summary headers, 'DONE', 'PASS' printed right before exit) — use `notify_on_complete` for that instead. 
Stacking end-of-run patterns on top of `notify_on_complete` produces duplicate, delayed notifications that arrive after you've already moved on, since delivery is asynchronous and continues after the process exits." } }, "required": ["command"] From 9527707f805a35377169616fa41dd7711e42a9dc Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 04:13:32 -0700 Subject: [PATCH 019/143] fix(signal): back off sendTyping spam for unreachable recipients (#12118) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit base.py's _keep_typing refresh loop calls send_typing every ~2s while the agent is processing. If signal-cli returns NETWORK_FAILURE for the recipient (offline, unroutable, group membership lost), the unmitigated path was a WARNING log every 2 seconds for as long as the agent stayed busy — a user report showed 1048 warnings in 41 minutes for one offline contact, plus the matching volume of pointless RPC traffic to signal-cli. - _rpc() accepts log_failures=False so callers can route repeated expected failures (typing) to DEBUG while keeping send/receive at WARNING. - send_typing() tracks consecutive failures per chat. First failure still logs WARNING so transport issues remain visible; subsequent failures log at DEBUG. After three consecutive failures we skip the RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop hammering signal-cli for a recipient it can't deliver to. A successful sendTyping resets the counters. - _stop_typing_indicator() clears the backoff state so the next agent turn starts fresh. E2E simulation against the reported 41-minute window: RPCs drop from 1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44 DEBUGs. 
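The cooldown schedule can be sketched in isolation as follows. The formula is the one used in send_typing() in this patch; the function name `typing_cooldown` is hypothetical, and returning 0.0 below the threshold is a convention of this sketch (the adapter simply sets no cooldown in that case).

```python
def typing_cooldown(fails: int) -> float:
    """Cooldown in seconds after `fails` consecutive sendTyping failures."""
    if fails < 3:
        # Below the threshold the RPC keeps being issued on every refresh.
        return 0.0
    # 16s at the third failure, doubling per further failure, capped at 60s.
    return min(60.0, 16.0 * (2 ** (fails - 3)))


# Cooldowns for the first six consecutive failures.
schedule = [typing_cooldown(n) for n in range(1, 7)]
print(schedule)  # [0.0, 0.0, 16.0, 32.0, 60.0, 60.0]
```

The fifth failure would compute 64s, so the 60s cap applies from there on.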
Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea; the broader restructure in that PR (nested per-chat loop inside send_typing) is avoided here in favour of stateful backoff that preserves base.py's existing _keep_typing architecture. --- gateway/platforms/signal.py | 84 +++++++++++++++++++-- tests/gateway/test_signal.py | 137 +++++++++++++++++++++++++++++++++++ 2 files changed, 215 insertions(+), 6 deletions(-) diff --git a/gateway/platforms/signal.py b/gateway/platforms/signal.py index 617713ad908..4df4193bc0d 100644 --- a/gateway/platforms/signal.py +++ b/gateway/platforms/signal.py @@ -160,6 +160,14 @@ class SignalAdapter(BasePlatformAdapter): self._sse_task: Optional[asyncio.Task] = None self._health_monitor_task: Optional[asyncio.Task] = None self._typing_tasks: Dict[str, asyncio.Task] = {} + # Per-chat typing-indicator backoff. When signal-cli reports + # NETWORK_FAILURE (recipient offline / unroutable), base.py's + # _keep_typing refresh loop would otherwise hammer sendTyping every + # ~2s indefinitely, producing WARNING-level log spam and pointless + # RPC traffic. We track consecutive failures per chat and skip the + # RPC during a cooldown window instead. + self._typing_failures: Dict[str, int] = {} + self._typing_skip_until: Dict[str, float] = {} self._running = False self._last_sse_activity = 0.0 self._sse_response: Optional[httpx.Response] = None @@ -548,8 +556,22 @@ class SignalAdapter(BasePlatformAdapter): # JSON-RPC Communication # ------------------------------------------------------------------ - async def _rpc(self, method: str, params: dict, rpc_id: str = None) -> Any: - """Send a JSON-RPC 2.0 request to signal-cli daemon.""" + async def _rpc( + self, + method: str, + params: dict, + rpc_id: str = None, + *, + log_failures: bool = True, + ) -> Any: + """Send a JSON-RPC 2.0 request to signal-cli daemon. 
+ + When ``log_failures=False``, error and exception paths log at DEBUG + instead of WARNING — used by the typing-indicator path to silence + repeated NETWORK_FAILURE spam for unreachable recipients while + still preserving visibility for the first occurrence and for + unrelated RPCs. + """ if not self.client: logger.warning("Signal: RPC called but client not connected") return None @@ -574,13 +596,19 @@ class SignalAdapter(BasePlatformAdapter): data = resp.json() if "error" in data: - logger.warning("Signal RPC error (%s): %s", method, data["error"]) + if log_failures: + logger.warning("Signal RPC error (%s): %s", method, data["error"]) + else: + logger.debug("Signal RPC error (%s): %s", method, data["error"]) return None return data.get("result") except Exception as e: - logger.warning("Signal RPC %s failed: %s", method, e) + if log_failures: + logger.warning("Signal RPC %s failed: %s", method, e) + else: + logger.debug("Signal RPC %s failed: %s", method, e) return None # ------------------------------------------------------------------ @@ -627,7 +655,28 @@ class SignalAdapter(BasePlatformAdapter): self._recent_sent_timestamps.pop() async def send_typing(self, chat_id: str, metadata=None) -> None: - """Send a typing indicator.""" + """Send a typing indicator. + + base.py's ``_keep_typing`` refresh loop calls this every ~2s while + the agent is processing. If signal-cli returns NETWORK_FAILURE for + this recipient (offline, unroutable, group membership lost, etc.) + the unmitigated behaviour is: a WARNING log every 2 seconds for as + long as the agent keeps running. Instead we: + + - silence the WARNING after the first consecutive failure (subsequent + attempts log at DEBUG) so transport issues are still visible once + but don't flood the log, + - skip the RPC entirely during an exponential cooldown window once + three consecutive failures have happened, so we stop hammering + signal-cli with requests it can't deliver. 
+ + A successful sendTyping clears the counters. + """ + now = time.monotonic() + skip_until = self._typing_skip_until.get(chat_id, 0.0) + if now < skip_until: + return + params: Dict[str, Any] = { "account": self.account, } @@ -637,7 +686,26 @@ class SignalAdapter(BasePlatformAdapter): else: params["recipient"] = [chat_id] - await self._rpc("sendTyping", params, rpc_id="typing") + fails = self._typing_failures.get(chat_id, 0) + result = await self._rpc( + "sendTyping", + params, + rpc_id="typing", + log_failures=(fails == 0), + ) + + if result is None: + fails += 1 + self._typing_failures[chat_id] = fails + # After 3 consecutive failures, back off exponentially (16s, + # 32s, 60s cap) to stop spamming signal-cli for a recipient + # that clearly isn't reachable right now. + if fails >= 3: + backoff = min(60.0, 16.0 * (2 ** (fails - 3))) + self._typing_skip_until[chat_id] = now + backoff + else: + self._typing_failures.pop(chat_id, None) + self._typing_skip_until.pop(chat_id, None) async def send_image( self, @@ -789,6 +857,10 @@ class SignalAdapter(BasePlatformAdapter): await task except asyncio.CancelledError: pass + # Reset per-chat typing backoff state so the next agent turn starts + # fresh rather than inheriting a cooldown from a prior conversation. 
+ self._typing_failures.pop(chat_id, None) + self._typing_skip_until.pop(chat_id, None) async def stop_typing(self, chat_id: str) -> None: """Public interface for stopping typing — called by base adapter's diff --git a/tests/gateway/test_signal.py b/tests/gateway/test_signal.py index 26f1e4f3bb3..eee3a0db8aa 100644 --- a/tests/gateway/test_signal.py +++ b/tests/gateway/test_signal.py @@ -740,3 +740,140 @@ class TestSignalStopTyping: await adapter.stop_typing("+155****4567") adapter._stop_typing_indicator.assert_awaited_once_with("+155****4567") + + +# --------------------------------------------------------------------------- +# Typing-indicator backoff on repeated failures (Signal RPC spam fix) +# --------------------------------------------------------------------------- + +class TestSignalTypingBackoff: + """When base.py's _keep_typing refresh loop calls send_typing every ~2s + and the recipient is unreachable (NETWORK_FAILURE), the adapter must: + + - log WARNING only for the first failure (subsequent failures use DEBUG + via log_failures=False on the _rpc call) + - after 3 consecutive failures, skip the RPC entirely during an + exponential cooldown window instead of hammering signal-cli every 2s + - reset counters on a successful sendTyping + - reset counters when _stop_typing_indicator() is called for the chat + """ + + @pytest.mark.asyncio + async def test_first_failure_logs_at_warning_subsequent_at_debug( + self, monkeypatch + ): + adapter = _make_signal_adapter(monkeypatch) + calls = [] + + async def _fake_rpc(method, params, rpc_id=None, *, log_failures=True): + calls.append({"log_failures": log_failures}) + return None # simulate NETWORK_FAILURE + + adapter._rpc = _fake_rpc + + await adapter.send_typing("+155****4567") + await adapter.send_typing("+155****4567") + + assert len(calls) == 2 + assert calls[0]["log_failures"] is True # first failure — warn + assert calls[1]["log_failures"] is False # subsequent — debug + + @pytest.mark.asyncio + async def 
test_three_consecutive_failures_trigger_cooldown( + self, monkeypatch + ): + adapter = _make_signal_adapter(monkeypatch) + call_count = {"n": 0} + + async def _fake_rpc(method, params, rpc_id=None, *, log_failures=True): + call_count["n"] += 1 + return None + + adapter._rpc = _fake_rpc + + # Three failures engage the cooldown. + await adapter.send_typing("+155****4567") + await adapter.send_typing("+155****4567") + await adapter.send_typing("+155****4567") + assert call_count["n"] == 3 + assert "+155****4567" in adapter._typing_skip_until + + # Fourth, fifth, ... calls during the cooldown window are short- + # circuited — the RPC is not issued at all. + await adapter.send_typing("+155****4567") + await adapter.send_typing("+155****4567") + assert call_count["n"] == 3 + + @pytest.mark.asyncio + async def test_cooldown_is_per_chat_not_global(self, monkeypatch): + adapter = _make_signal_adapter(monkeypatch) + call_log = [] + + async def _fake_rpc(method, params, rpc_id=None, *, log_failures=True): + call_log.append(params.get("recipient") or params.get("groupId")) + return None + + adapter._rpc = _fake_rpc + + # Drive chat A into cooldown. + for _ in range(3): + await adapter.send_typing("+155****4567") + assert "+155****4567" in adapter._typing_skip_until + + # Chat B is unaffected — still makes RPCs. 
+ await adapter.send_typing("+155****9999") + await adapter.send_typing("+155****9999") + assert "+155****9999" not in adapter._typing_skip_until + # Chat A cooldown untouched + assert "+155****4567" in adapter._typing_skip_until + + @pytest.mark.asyncio + async def test_success_resets_failure_counter_and_cooldown( + self, monkeypatch + ): + adapter = _make_signal_adapter(monkeypatch) + result_queue = [None, None, {"timestamp": 12345}] + call_log = [] + + async def _fake_rpc(method, params, rpc_id=None, *, log_failures=True): + call_log.append(log_failures) + return result_queue.pop(0) + + adapter._rpc = _fake_rpc + + await adapter.send_typing("+155****4567") # fail 1 — warn + await adapter.send_typing("+155****4567") # fail 2 — debug + await adapter.send_typing("+155****4567") # success — reset + + assert adapter._typing_failures.get("+155****4567", 0) == 0 + assert "+155****4567" not in adapter._typing_skip_until + + # Next failure after recovery logs at WARNING again (fresh counter). 
+ async def _fail(method, params, rpc_id=None, *, log_failures=True): + call_log.append(log_failures) + return None + + adapter._rpc = _fail + await adapter.send_typing("+155****4567") + assert call_log[-1] is True # first failure in a fresh cycle + + @pytest.mark.asyncio + async def test_stop_typing_indicator_clears_backoff_state( + self, monkeypatch + ): + adapter = _make_signal_adapter(monkeypatch) + + async def _fail(method, params, rpc_id=None, *, log_failures=True): + return None + + adapter._rpc = _fail + + for _ in range(3): + await adapter.send_typing("+155****4567") + assert adapter._typing_failures.get("+155****4567") == 3 + assert "+155****4567" in adapter._typing_skip_until + + await adapter._stop_typing_indicator("+155****4567") + + assert "+155****4567" not in adapter._typing_failures + assert "+155****4567" not in adapter._typing_skip_until From f9667331e559caf8476fa4775b8add4c0c23d933 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 04:14:05 -0700 Subject: [PATCH 020/143] docs(browser): improve /browser connect setup guidance (#12123) - Note that /browser connect is CLI-only and won't work in gateways (WebUI, Telegram, Discord). - Update the Chrome launch command to use a dedicated --user-data-dir, so port 9222 actually comes up even when Chrome is already running with the user's regular profile. - Add --no-first-run --no-default-browser-check to skip the fresh-profile wizard. - Explain why the dedicated user-data-dir matters. Community tip via Karamjit Singh. 
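For readers who want to confirm that port 9222 is actually listening before running /browser connect, a minimal sketch follows. It assumes Chrome's DevTools HTTP endpoint `/json/version` on localhost; the helper name `cdp_endpoint_up` is hypothetical, and nothing in this patch says Hermes performs this check itself.

```python
import http.client
import json
import urllib.request


def cdp_endpoint_up(port: int = 9222, timeout: float = 1.0) -> bool:
    """Return True if a DevTools HTTP endpoint answers on 127.0.0.1:port."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    # Chrome's reply is a JSON object that includes the WebSocket debugger URL.
    return isinstance(info, dict) and "webSocketDebuggerUrl" in info
```

If this returns False while Chrome is open, the running instance was most likely started without the debug port, which is the situation the dedicated --user-data-dir avoids.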
Co-authored-by: teknium1 --- website/docs/user-guide/features/browser.md | 23 ++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/website/docs/user-guide/features/browser.md b/website/docs/user-guide/features/browser.md index 42b6815df51..5b2462d2e37 100644 --- a/website/docs/user-guide/features/browser.md +++ b/website/docs/user-guide/features/browser.md @@ -163,6 +163,10 @@ When Camofox runs in headed mode (with a visible browser window), it exposes a V Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs. +:::note +`/browser connect` is an **interactive-CLI slash command** — it is not dispatched by the gateway. If you try to run it inside a WebUI, Telegram, Discord, or other gateway chat, the message will be sent to the agent as plain text and the command will not execute. Start Hermes from the terminal (`hermes` or `hermes chat`) and issue `/browser connect` there. +::: + In the CLI, use: ``` @@ -175,14 +179,27 @@ In the CLI, use: If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`. 
:::tip -To start Chrome manually with CDP enabled: +To start Chrome manually with CDP enabled, use a dedicated user-data-dir so the debug port actually comes up even if Chrome is already running with your normal profile: + ```bash # Linux -google-chrome --remote-debugging-port=9222 +google-chrome \ + --remote-debugging-port=9222 \ + --user-data-dir=$HOME/.hermes/chrome-debug \ + --no-first-run \ + --no-default-browser-check & # macOS -"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 +"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ + --remote-debugging-port=9222 \ + --user-data-dir="$HOME/.hermes/chrome-debug" \ + --no-first-run \ + --no-default-browser-check & ``` + +Then launch the Hermes CLI and run `/browser connect`. + +**Why `--user-data-dir`?** Without it, launching Chrome while a regular Chrome instance is already running typically opens a new window on the existing process — and that existing process was not started with `--remote-debugging-port`, so port 9222 never opens. A dedicated user-data-dir forces a fresh Chrome process where the debug port actually listens. `--no-first-run --no-default-browser-check` skips the first-launch wizard for the fresh profile. ::: When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session. From 2edebedc9eeb48093dda2a58ce3715b34d23bc15 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 18 Apr 2026 04:17:18 -0700 Subject: [PATCH 021/143] feat(steer): /steer injects a mid-run note after the next tool call (#12116) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(steer): /steer injects a mid-run note after the next tool call Adds a new slash command that sits between /queue (turn boundary) and interrupt. 
/steer stashes the message on the running agent and the agent loop appends it to the LAST tool result's content once the current tool batch finishes. The model sees it as part of the tool output on its next iteration. No interrupt is fired, no new user turn is inserted, and no prompt cache invalidation happens beyond the normal per-turn tool-result churn. Message-role alternation is preserved — we only modify an existing role:"tool" message's content. Wiring ------ - hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS. - run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(), _apply_pending_steer_to_tool_results(); drain at end of both parallel and sequential tool executors; clear on interrupt; return leftover as result['pending_steer'] if the agent exits before another tool batch. - cli.py: /steer handler — route to agent.steer() when running, fall back to the regular queue otherwise; deliver result['pending_steer'] as next turn. - gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent path strips the prefix and forwards as a regular user message. - tui_gateway/server.py: new session.steer JSON-RPC method. - ui-tui: SessionSteerResponse type + local /steer slash command that calls session.steer when ui.busy, otherwise enqueues for the next turn. Fallbacks --------- - Agent exits mid-steer → surfaces in run_conversation result as pending_steer so CLI/gateway deliver it as the next user turn instead of silently dropping it. - All tools skipped after interrupt → re-stashes pending_steer for the caller. - No active agent → /steer reduces to sending the text as a normal message. Tests ----- - tests/run_agent/test_steer.py — accept/reject, concatenation, drain, last-tool-result injection, multimodal list content, thread safety, cleared-on-interrupt, registry membership, bypass-set membership. 
- tests/gateway/test_steer_command.py — running agent, pending sentinel, missing steer() method, rejected payload, empty payload. - tests/gateway/test_command_bypass_active_session.py — /steer bypasses the Level-1 base adapter guard. - tests/test_tui_gateway_server.py — session.steer RPC paths. 72/72 targeted tests pass under scripts/run_tests.sh. * feat(steer): register /steer in Discord's native slash tree Discord's app_commands tree is a curated subset of slash commands (not derived from COMMAND_REGISTRY like Telegram/Slack). /steer already works there as plain text (routes through handle_message → base adapter bypass → runner), but registering it here adds Discord's native autocomplete + argument hint UI so users can discover and type it like any other first-class command. --- cli.py | 34 ++- gateway/platforms/discord.py | 5 + gateway/run.py | 63 +++++ hermes_cli/commands.py | 3 + run_agent.py | 152 ++++++++++++ .../test_command_bypass_active_session.py | 19 ++ tests/gateway/test_steer_command.py | 191 +++++++++++++++ tests/run_agent/test_steer.py | 228 ++++++++++++++++++ tests/test_tui_gateway_server.py | 71 ++++++ tui_gateway/server.py | 25 ++ ui-tui/src/app/slash/commands/core.ts | 32 ++- ui-tui/src/gatewayTypes.ts | 5 + 12 files changed, 826 insertions(+), 2 deletions(-) create mode 100644 tests/gateway/test_steer_command.py create mode 100644 tests/run_agent/test_steer.py diff --git a/cli.py b/cli.py index ea76991acc3..8aa8bb03f11 100644 --- a/cli.py +++ b/cli.py @@ -5720,6 +5720,30 @@ class HermesCLI: _cprint(f" Queued for the next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}") else: _cprint(f" Queued: {payload[:80]}{'...' if len(payload) > 80 else ''}") + elif canonical == "steer": + # Inject a message after the next tool call without interrupting. + # If the agent is actively running, push the text into the agent's + # pending_steer slot — the drain hook in _execute_tool_calls_* + # will append it to the next tool result's content. 
If no agent + # is running, fall back to queue semantics (same as /queue). + parts = cmd_original.split(None, 1) + payload = parts[1].strip() if len(parts) > 1 else "" + if not payload: + _cprint(" Usage: /steer ") + elif self._agent_running and self.agent is not None and hasattr(self.agent, "steer"): + try: + accepted = self.agent.steer(payload) + except Exception as exc: + _cprint(f" Steer failed: {exc}") + else: + if accepted: + _cprint(f" ⏩ Steer queued — arrives after the next tool call: {payload[:80]}{'...' if len(payload) > 80 else ''}") + else: + _cprint(" Steer rejected (empty payload).") + else: + # No active run — treat as a normal next-turn message. + self._pending_input.put(payload) + _cprint(f" No agent running; queued as next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}") elif canonical == "skin": self._handle_skin_command(cmd_original) elif canonical == "voice": @@ -8244,7 +8268,15 @@ class HermesCLI: else: print(f"\n⚡ Sending after interrupt: '{preview}'") self._pending_input.put(combined) - + + # If a /steer was left over (agent finished before another tool + # batch could absorb it), deliver it as the next user turn. + _leftover_steer = result.get("pending_steer") if result else None + if _leftover_steer and hasattr(self, '_pending_input'): + preview = _leftover_steer[:60] + ("..." 
if len(_leftover_steer) > 60 else "") + print(f"\n⏩ Delivering leftover /steer as next turn: '{preview}'") + self._pending_input.put(_leftover_steer) + return response except Exception as e: diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py index 5cad956a362..31973b9629b 100644 --- a/gateway/platforms/discord.py +++ b/gateway/platforms/discord.py @@ -1994,6 +1994,11 @@ class DiscordAdapter(BasePlatformAdapter): async def slash_stop(interaction: discord.Interaction): await self._run_simple_slash(interaction, "/stop", "Stop requested~") + @tree.command(name="steer", description="Inject a message after the next tool call (no interrupt)") + @discord.app_commands.describe(prompt="Text to inject into the agent's next tool result") + async def slash_steer(interaction: discord.Interaction, prompt: str): + await self._run_simple_slash(interaction, f"/steer {prompt}".strip()) + @tree.command(name="compress", description="Compress conversation context") async def slash_compress(interaction: discord.Interaction): await self._run_simple_slash(interaction, "/compress") diff --git a/gateway/run.py b/gateway/run.py index 62b813f0d6b..1525ad14776 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -3019,6 +3019,54 @@ class GatewayRunner: adapter._pending_messages[_quick_key] = queued_event return "Queued for the next turn." + # /steer — inject mid-run after the next tool call. + # Unlike /queue (turn boundary), /steer lands BETWEEN tool-call + # iterations inside the same agent run, by appending to the + # last tool result's content. No interrupt, no new user turn, + # no role-alternation violation. + if _cmd_def_inner and _cmd_def_inner.name == "steer": + steer_text = event.get_command_args().strip() + if not steer_text: + return "Usage: /steer " + running_agent = self._running_agents.get(_quick_key) + if running_agent is _AGENT_PENDING_SENTINEL: + # Agent hasn't started yet — queue as turn-boundary fallback. 
+ adapter = self.adapters.get(source.platform) + if adapter: + from gateway.platforms.base import MessageEvent as _ME, MessageType as _MT + queued_event = _ME( + text=steer_text, + message_type=_MT.TEXT, + source=event.source, + message_id=event.message_id, + channel_prompt=event.channel_prompt, + ) + adapter._pending_messages[_quick_key] = queued_event + return "Agent still starting — /steer queued for the next turn." + if running_agent and hasattr(running_agent, "steer"): + try: + accepted = running_agent.steer(steer_text) + except Exception as exc: + logger.warning("Steer failed for session %s: %s", _quick_key[:20], exc) + return f"⚠️ Steer failed: {exc}" + if accepted: + preview = steer_text[:60] + ("..." if len(steer_text) > 60 else "") + return f"⏩ Steer queued — arrives after the next tool call: '{preview}'" + return "Steer rejected (empty payload)." + # Running agent is missing or lacks steer() — fall back to queue. + adapter = self.adapters.get(source.platform) + if adapter: + from gateway.platforms.base import MessageEvent as _ME, MessageType as _MT + queued_event = _ME( + text=steer_text, + message_type=_MT.TEXT, + source=event.source, + message_id=event.message_id, + channel_prompt=event.channel_prompt, + ) + adapter._pending_messages[_quick_key] = queued_event + return "No active agent — /steer queued for the next turn." + # /model must not be used while the agent is running. if _cmd_def_inner and _cmd_def_inner.name == "model": return "Agent is running — wait or /stop first, then switch models." @@ -3260,6 +3308,21 @@ class GatewayRunner: if canonical == "btw": return await self._handle_btw_command(event) + if canonical == "steer": + # No active agent — /steer has no tool call to inject into. + # Strip the prefix so downstream treats it as a normal user + # message. If the payload is empty, surface the usage hint. 
+ steer_payload = event.get_command_args().strip() + if not steer_payload: + return "Usage: /steer (no agent is running; sending as a normal message)" + try: + event.text = steer_payload + except Exception: + pass + # Do NOT return — fall through to _handle_message_with_agent + # at the end of this function so the rewritten text is sent + # to the agent as a regular user turn. + if canonical == "voice": return await self._handle_voice_command(event) diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index ce257b0d7cb..681e6f9b265 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -91,6 +91,8 @@ COMMAND_REGISTRY: list[CommandDef] = [ aliases=("tasks",)), CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session", aliases=("q",), args_hint=""), + CommandDef("steer", "Inject a message after the next tool call without interrupting", "Session", + args_hint=""), CommandDef("status", "Show session info", "Session"), CommandDef("profile", "Show active profile name and home directory", "Info"), CommandDef("sethome", "Set this chat as the home channel", "Session", @@ -275,6 +277,7 @@ ACTIVE_SESSION_BYPASS_COMMANDS: frozenset[str] = frozenset( "queue", "restart", "status", + "steer", "stop", "update", } diff --git a/run_agent.py b/run_agent.py index d5ff125e33b..a47455e5345 100644 --- a/run_agent.py +++ b/run_agent.py @@ -832,6 +832,16 @@ class AIAgent: self._interrupt_thread_signal_pending = False self._client_lock = threading.RLock() + # /steer mechanism — inject a user note into the next tool result + # without interrupting the agent. Unlike interrupt(), steer() does + # NOT set _interrupt_requested; it waits for the current tool batch + # to finish naturally, then the drain hook appends the text to the + # last tool result's content so the model sees it on its next + # iteration. Message-role alternation is preserved (we modify an + # existing tool message rather than inserting a new user turn). 
+ self._pending_steer: Optional[str] = None + self._pending_steer_lock = threading.Lock() + # Concurrent-tool worker thread tracking. `_execute_tool_calls_concurrent` # runs each tool on its own ThreadPoolExecutor worker — those worker # threads have tids distinct from `_execution_thread_id`, so @@ -3265,6 +3275,129 @@ class AIAgent: _set_interrupt(False, _wtid) except Exception: pass + # A hard interrupt supersedes any pending /steer — the steer was + # meant for the agent's next tool-call iteration, which will no + # longer happen. Drop it instead of surprising the user with a + # late injection on the post-interrupt turn. + _steer_lock = getattr(self, "_pending_steer_lock", None) + if _steer_lock is not None: + with _steer_lock: + self._pending_steer = None + + def steer(self, text: str) -> bool: + """ + Inject a user message into the next tool result without interrupting. + + Unlike interrupt(), this does NOT stop the current tool call. The + text is stashed and the agent loop appends it to the LAST tool + result's content once the current tool batch finishes. The model + sees the steer as part of the tool output on its next iteration. + + Thread-safe: callable from gateway/CLI/TUI threads. Multiple calls + before the drain point concatenate with newlines. + + Args: + text: The user text to inject. Empty strings are ignored. + + Returns: + True if the steer was accepted, False if the text was empty. + """ + if not text or not text.strip(): + return False + cleaned = text.strip() + _lock = getattr(self, "_pending_steer_lock", None) + if _lock is None: + # Test stubs that built AIAgent via object.__new__ skip __init__. + # Fall back to direct attribute set; no concurrent callers expected + # in those stubs. 
+ existing = getattr(self, "_pending_steer", None) + self._pending_steer = (existing + "\n" + cleaned) if existing else cleaned + return True + with _lock: + if self._pending_steer: + self._pending_steer = self._pending_steer + "\n" + cleaned + else: + self._pending_steer = cleaned + return True + + def _drain_pending_steer(self) -> Optional[str]: + """Return the pending steer text (if any) and clear the slot. + + Safe to call from the agent execution thread after appending tool + results. Returns None when no steer is pending. + """ + _lock = getattr(self, "_pending_steer_lock", None) + if _lock is None: + text = getattr(self, "_pending_steer", None) + self._pending_steer = None + return text + with _lock: + text = self._pending_steer + self._pending_steer = None + return text + + def _apply_pending_steer_to_tool_results(self, messages: list, num_tool_msgs: int) -> None: + """Append any pending /steer text to the last tool result in this turn. + + Called at the end of a tool-call batch, before the next API call. + The steer is appended to the last ``role:"tool"`` message's content + with a clear marker so the model understands it came from the user + and NOT from the tool itself. Role alternation is preserved — + nothing new is inserted, we only modify existing content. + + Args: + messages: The running messages list. + num_tool_msgs: Number of tool results appended in this batch; + used to locate the tail slice safely. + """ + if num_tool_msgs <= 0 or not messages: + return + steer_text = self._drain_pending_steer() + if not steer_text: + return + # Find the last tool-role message in the recent tail. Skipping + # non-tool messages defends against future code appending + # something else at the boundary. 
+ target_idx = None + for j in range(len(messages) - 1, max(len(messages) - num_tool_msgs - 1, -1), -1): + msg = messages[j] + if isinstance(msg, dict) and msg.get("role") == "tool": + target_idx = j + break + if target_idx is None: + # No tool result in this batch (e.g. all skipped by interrupt); + # put the steer back so the caller's fallback path can deliver + # it as a normal next-turn user message. + _lock = getattr(self, "_pending_steer_lock", None) + if _lock is not None: + with _lock: + if self._pending_steer: + self._pending_steer = self._pending_steer + "\n" + steer_text + else: + self._pending_steer = steer_text + else: + existing = getattr(self, "_pending_steer", None) + self._pending_steer = (existing + "\n" + steer_text) if existing else steer_text + return + marker = f"\n\n[USER STEER (injected mid-run, not tool output): {steer_text}]" + existing_content = messages[target_idx].get("content", "") + if not isinstance(existing_content, str): + # Anthropic multimodal content blocks — preserve them and append + # a text block at the end. + try: + blocks = list(existing_content) if existing_content else [] + blocks.append({"type": "text", "text": marker.lstrip()}) + messages[target_idx]["content"] = blocks + except Exception: + # Fall back to string replacement if content shape is unexpected. + messages[target_idx]["content"] = f"{existing_content}{marker}" + else: + messages[target_idx]["content"] = existing_content + marker + logger.info( + "Delivered /steer to agent after tool batch (%d chars): %s", + len(steer_text), + steer_text[:120] + ("..." 
if len(steer_text) > 120 else ""), + ) def _touch_activity(self, desc: str) -> None: """Update the last-activity timestamp and description (thread-safe).""" @@ -7951,6 +8084,13 @@ class AIAgent: turn_tool_msgs = messages[-num_tools:] enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id)) + # ── /steer injection ────────────────────────────────────────────── + # Append any pending user steer text to the last tool result so the + # agent sees it on its next iteration. Runs AFTER budget enforcement + # so the steer marker is never truncated. See steer() for details. + if num_tools > 0: + self._apply_pending_steer_to_tool_results(messages, num_tools) + def _execute_tool_calls_sequential(self, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None: """Execute tool calls sequentially (original behavior). Used for single calls or interactive tools.""" for i, tool_call in enumerate(assistant_message.tool_calls, 1): @@ -8330,6 +8470,12 @@ class AIAgent: if num_tools_seq > 0: enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id)) + # ── /steer injection ────────────────────────────────────────────── + # See _execute_tool_calls_concurrent for the rationale. Same hook, + # applied to sequential execution as well. + if num_tools_seq > 0: + self._apply_pending_steer_to_tool_results(messages, num_tools_seq) + def _handle_max_iterations(self, messages: list, api_call_count: int) -> str: @@ -11610,6 +11756,12 @@ class AIAgent: "cost_status": self.session_cost_status, "cost_source": self.session_cost_source, } + # If a /steer landed after the final assistant turn (no more tool + # batches to drain into), hand it back to the caller so it can be + # delivered as the next user turn instead of being silently lost.
+ _leftover_steer = self._drain_pending_steer() + if _leftover_steer: + result["pending_steer"] = _leftover_steer self._response_was_previewed = False # Include interrupt message if one triggered the interrupt diff --git a/tests/gateway/test_command_bypass_active_session.py b/tests/gateway/test_command_bypass_active_session.py index 10ff062126a..c456243945a 100644 --- a/tests/gateway/test_command_bypass_active_session.py +++ b/tests/gateway/test_command_bypass_active_session.py @@ -200,6 +200,25 @@ class TestCommandBypassActiveSession: "/background response was not sent back to the user" ) + @pytest.mark.asyncio + async def test_steer_bypasses_guard(self): + """/steer must bypass the Level-1 active-session guard so it reaches + the gateway runner's /steer handler and injects into the running + agent instead of being queued as user text for the next turn. + """ + adapter = _make_adapter() + sk = _session_key() + adapter._active_sessions[sk] = asyncio.Event() + + await adapter.handle_message(_make_event("/steer also check auth.log")) + + assert sk not in adapter._pending_messages, ( + "/steer was queued as a pending message instead of being dispatched" + ) + assert any("handled:steer" in r for r in adapter.sent_responses), ( + "/steer response was not sent back to the user" + ) + @pytest.mark.asyncio async def test_help_bypasses_guard(self): """/help must bypass so it is not silently dropped as pending slash text.""" diff --git a/tests/gateway/test_steer_command.py b/tests/gateway/test_steer_command.py new file mode 100644 index 00000000000..b756ff09622 --- /dev/null +++ b/tests/gateway/test_steer_command.py @@ -0,0 +1,191 @@ +"""Tests for the gateway /steer command handler. + +/steer injects a user message into the agent's next tool result without +interrupting. The gateway runner must: + + 1. When an agent IS running → call ``agent.steer(text)``, do NOT set + ``_interrupt_requested``, do NOT touch ``_pending_messages``. + 2. 
When the agent is the PENDING sentinel → fall back to /queue + semantics (store in ``adapter._pending_messages``). + 3. When no agent is active → strip the slash prefix and let the normal + prompt pipeline handle it as a regular user message. +""" +from __future__ import annotations + +from datetime import datetime +from types import SimpleNamespace +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from gateway.config import GatewayConfig, Platform, PlatformConfig +from gateway.platforms.base import MessageEvent +from gateway.session import SessionEntry, SessionSource, build_session_key + + +def _make_source() -> SessionSource: + return SessionSource( + platform=Platform.TELEGRAM, + user_id="u1", + chat_id="c1", + user_name="tester", + chat_type="dm", + ) + + +def _make_event(text: str) -> MessageEvent: + return MessageEvent( + text=text, + source=_make_source(), + message_id="m1", + ) + + +def _make_runner(session_entry: SessionEntry): + from gateway.run import GatewayRunner + + runner = object.__new__(GatewayRunner) + runner.config = GatewayConfig( + platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")} + ) + adapter = MagicMock() + adapter.send = AsyncMock() + adapter._pending_messages = {} + runner.adapters = {Platform.TELEGRAM: adapter} + runner._voice_mode = {} + runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False) + runner.session_store = MagicMock() + runner.session_store.get_or_create_session.return_value = session_entry + runner.session_store.load_transcript.return_value = [] + runner.session_store.has_any_sessions.return_value = True + runner._running_agents = {} + runner._running_agents_ts = {} + runner._pending_messages = {} + runner._pending_approvals = {} + runner._session_db = MagicMock() + runner._session_db.get_session_title.return_value = None + runner._reasoning_config = None + runner._provider_routing = {} + runner._fallback_model = None + runner._show_reasoning = False + 
runner._is_user_authorized = lambda _source: True + runner._set_session_env = lambda _context: None + runner._should_send_voice_reply = lambda *_args, **_kwargs: False + runner._send_voice_reply = AsyncMock() + runner._capture_gateway_honcho_if_configured = lambda *args, **kwargs: None + runner._emit_gateway_run_progress = AsyncMock() + return runner, adapter + + +def _session_entry() -> SessionEntry: + return SessionEntry( + session_key=build_session_key(_make_source()), + session_id="sess-1", + created_at=datetime.now(), + updated_at=datetime.now(), + platform=Platform.TELEGRAM, + chat_type="dm", + total_tokens=0, + ) + + +@pytest.mark.asyncio +async def test_steer_calls_agent_steer_and_does_not_interrupt(): + """When an agent is running, /steer must call agent.steer(text) and + leave interrupt state untouched.""" + runner, adapter = _make_runner(_session_entry()) + sk = build_session_key(_make_source()) + + running_agent = MagicMock() + running_agent.steer.return_value = True + runner._running_agents[sk] = running_agent + + result = await runner._handle_message(_make_event("/steer also check auth.log")) + + # The handler replied with a confirmation + assert result is not None + assert "steer" in result.lower() or "queued" in result.lower() + # The agent's steer() was called with the payload (prefix stripped) + running_agent.steer.assert_called_once_with("also check auth.log") + # Critically: interrupt was NOT called + running_agent.interrupt.assert_not_called() + # And no user-text queueing happened — the steer doesn't go into + # _pending_messages (that would be turn-boundary /queue semantics). 
+ assert runner._pending_messages == {} + assert adapter._pending_messages == {} + + +@pytest.mark.asyncio +async def test_steer_without_payload_returns_usage(): + runner, _adapter = _make_runner(_session_entry()) + sk = build_session_key(_make_source()) + running_agent = MagicMock() + runner._running_agents[sk] = running_agent + + result = await runner._handle_message(_make_event("/steer")) + + assert result is not None + assert "Usage" in result or "usage" in result + running_agent.steer.assert_not_called() + running_agent.interrupt.assert_not_called() + + +@pytest.mark.asyncio +async def test_steer_with_pending_sentinel_falls_back_to_queue(): + """When the agent hasn't finished booting (sentinel), /steer should + queue as a turn-boundary follow-up instead of crashing.""" + from gateway.run import _AGENT_PENDING_SENTINEL + + runner, adapter = _make_runner(_session_entry()) + sk = build_session_key(_make_source()) + runner._running_agents[sk] = _AGENT_PENDING_SENTINEL + + result = await runner._handle_message(_make_event("/steer wait up")) + + assert result is not None + assert "queued" in result.lower() or "starting" in result.lower() + # The fallback put the text into the adapter's pending queue. + assert sk in adapter._pending_messages + assert adapter._pending_messages[sk].text == "wait up" + + +@pytest.mark.asyncio +async def test_steer_agent_without_steer_method_falls_back(): + """If the running agent somehow lacks the steer() method (older build, + test stub), the handler must not explode — fall back to /queue.""" + runner, adapter = _make_runner(_session_entry()) + sk = build_session_key(_make_source()) + + # A bare object that does NOT have steer() — use a spec'd Mock so + # hasattr(agent, "steer") returns False. 
+ running_agent = MagicMock(spec=[]) + runner._running_agents[sk] = running_agent + + result = await runner._handle_message(_make_event("/steer fallback")) + + assert result is not None + # Must mention queueing since steer wasn't available + assert "queued" in result.lower() + assert sk in adapter._pending_messages + assert adapter._pending_messages[sk].text == "fallback" + + +@pytest.mark.asyncio +async def test_steer_rejected_payload_returns_rejection_message(): + """If agent.steer() returns False (e.g. empty after strip — though + the gateway already guards this), surface a rejection message.""" + runner, _adapter = _make_runner(_session_entry()) + sk = build_session_key(_make_source()) + + running_agent = MagicMock() + running_agent.steer.return_value = False + runner._running_agents[sk] = running_agent + + result = await runner._handle_message(_make_event("/steer hello")) + + assert result is not None + assert "rejected" in result.lower() or "empty" in result.lower() + + +if __name__ == "__main__": # pragma: no cover + pytest.main([__file__, "-v"]) diff --git a/tests/run_agent/test_steer.py b/tests/run_agent/test_steer.py new file mode 100644 index 00000000000..a298ede8c08 --- /dev/null +++ b/tests/run_agent/test_steer.py @@ -0,0 +1,228 @@ +"""Tests for AIAgent.steer() — mid-run user message injection. + +/steer lets the user add a note to the agent's next tool result without +interrupting the current tool call. The agent sees the note inline with +tool output on its next iteration, preserving message-role alternation +and prompt-cache integrity. +""" +from __future__ import annotations + +import threading + +import pytest + +from run_agent import AIAgent + + +def _bare_agent() -> AIAgent: + """Build an AIAgent without running __init__, then install the steer + state manually — matches the existing object.__new__ stub pattern + used elsewhere in the test suite. 
+ """ + agent = object.__new__(AIAgent) + agent._pending_steer = None + agent._pending_steer_lock = threading.Lock() + return agent + + +class TestSteerAcceptance: + def test_accepts_non_empty_text(self): + agent = _bare_agent() + assert agent.steer("go ahead and check the logs") is True + assert agent._pending_steer == "go ahead and check the logs" + + def test_rejects_empty_string(self): + agent = _bare_agent() + assert agent.steer("") is False + assert agent._pending_steer is None + + def test_rejects_whitespace_only(self): + agent = _bare_agent() + assert agent.steer(" \n\t ") is False + assert agent._pending_steer is None + + def test_rejects_none(self): + agent = _bare_agent() + assert agent.steer(None) is False # type: ignore[arg-type] + assert agent._pending_steer is None + + def test_strips_surrounding_whitespace(self): + agent = _bare_agent() + assert agent.steer(" hello world \n") is True + assert agent._pending_steer == "hello world" + + def test_concatenates_multiple_steers_with_newlines(self): + agent = _bare_agent() + agent.steer("first note") + agent.steer("second note") + agent.steer("third note") + assert agent._pending_steer == "first note\nsecond note\nthird note" + + +class TestSteerDrain: + def test_drain_returns_and_clears(self): + agent = _bare_agent() + agent.steer("hello") + assert agent._drain_pending_steer() == "hello" + assert agent._pending_steer is None + + def test_drain_on_empty_returns_none(self): + agent = _bare_agent() + assert agent._drain_pending_steer() is None + + +class TestSteerInjection: + def test_appends_to_last_tool_result(self): + agent = _bare_agent() + agent.steer("please also check auth.log") + messages = [ + {"role": "user", "content": "what's in /var/log?"}, + {"role": "assistant", "tool_calls": [{"id": "a"}, {"id": "b"}]}, + {"role": "tool", "content": "ls output A", "tool_call_id": "a"}, + {"role": "tool", "content": "ls output B", "tool_call_id": "b"}, + ] + agent._apply_pending_steer_to_tool_results(messages, 
num_tool_msgs=2) + # The LAST tool result is modified; earlier ones are untouched. + assert messages[2]["content"] == "ls output A" + assert "ls output B" in messages[3]["content"] + assert "[USER STEER" in messages[3]["content"] + assert "please also check auth.log" in messages[3]["content"] + # And pending_steer is consumed. + assert agent._pending_steer is None + + def test_no_op_when_no_steer_pending(self): + agent = _bare_agent() + messages = [ + {"role": "assistant", "tool_calls": [{"id": "a"}]}, + {"role": "tool", "content": "output", "tool_call_id": "a"}, + ] + agent._apply_pending_steer_to_tool_results(messages, num_tool_msgs=1) + assert messages[-1]["content"] == "output" # unchanged + + def test_no_op_when_num_tool_msgs_zero(self): + agent = _bare_agent() + agent.steer("steer") + messages = [{"role": "user", "content": "hi"}] + agent._apply_pending_steer_to_tool_results(messages, num_tool_msgs=0) + # Steer should remain pending (nothing to drain into) + assert agent._pending_steer == "steer" + + def test_marker_is_unambiguous_about_origin(self): + """The injection marker must make clear the text is from the user + and not tool output — this is the cache-safe way to signal + provenance without violating message-role alternation. 
+ """ + agent = _bare_agent() + agent.steer("stop after next step") + messages = [{"role": "tool", "content": "x", "tool_call_id": "1"}] + agent._apply_pending_steer_to_tool_results(messages, num_tool_msgs=1) + content = messages[-1]["content"] + assert "USER STEER" in content + assert "not tool output" in content.lower() or "injected mid-run" in content.lower() + + def test_multimodal_content_list_preserved(self): + """Anthropic-style list content should be preserved, with the steer + appended as a text block.""" + agent = _bare_agent() + agent.steer("extra note") + original_blocks = [{"type": "text", "text": "existing output"}] + messages = [ + {"role": "tool", "content": list(original_blocks), "tool_call_id": "1"} + ] + agent._apply_pending_steer_to_tool_results(messages, num_tool_msgs=1) + new_content = messages[-1]["content"] + assert isinstance(new_content, list) + assert len(new_content) == 2 + assert new_content[0] == {"type": "text", "text": "existing output"} + assert new_content[1]["type"] == "text" + assert "extra note" in new_content[1]["text"] + + def test_restashed_when_no_tool_result_in_batch(self): + """If the 'batch' contains no tool-role messages (e.g. all skipped + after an interrupt), the steer should be put back into the pending + slot so the caller's fallback path can deliver it.""" + agent = _bare_agent() + agent.steer("ping") + messages = [ + {"role": "user", "content": "x"}, + {"role": "assistant", "content": "y"}, + ] + # Claim there were N tool msgs, but the tail has none — simulates + # the interrupt-cancelled case. 
+ agent._apply_pending_steer_to_tool_results(messages, num_tool_msgs=2) + # Messages untouched + assert messages[-1]["content"] == "y" + # And the steer is back in pending so the fallback can grab it + assert agent._pending_steer == "ping" + + +class TestSteerThreadSafety: + def test_concurrent_steer_calls_preserve_all_text(self): + agent = _bare_agent() + N = 200 + + def worker(idx: int) -> None: + agent.steer(f"note-{idx}") + + threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)] + for t in threads: + t.start() + for t in threads: + t.join() + + text = agent._drain_pending_steer() + assert text is not None + # Every single note must be preserved — none dropped by the lock. + lines = text.split("\n") + assert len(lines) == N + assert set(lines) == {f"note-{i}" for i in range(N)} + + +class TestSteerClearedOnInterrupt: + def test_clear_interrupt_drops_pending_steer(self): + """A hard interrupt supersedes any pending steer — the agent's + next tool iteration won't happen, so delivering the steer later + would be surprising.""" + agent = _bare_agent() + # Minimal surface needed by clear_interrupt() + agent._interrupt_requested = True + agent._interrupt_message = None + agent._interrupt_thread_signal_pending = False + agent._execution_thread_id = None + agent._tool_worker_threads = None + agent._tool_worker_threads_lock = None + + agent.steer("will be dropped") + assert agent._pending_steer == "will be dropped" + + agent.clear_interrupt() + assert agent._pending_steer is None + + +class TestSteerCommandRegistry: + def test_steer_in_command_registry(self): + """The /steer slash command must be registered so it reaches all + platforms (CLI, gateway, TUI autocomplete, Telegram/Slack menus). 
+ """ + from hermes_cli.commands import resolve_command, ACTIVE_SESSION_BYPASS_COMMANDS + + cmd = resolve_command("steer") + assert cmd is not None + assert cmd.name == "steer" + assert cmd.category == "Session" + assert cmd.args_hint == "" + + def test_steer_in_bypass_set(self): + """When the agent is running, /steer MUST bypass the Level-1 + base-adapter queue so it reaches the gateway runner's /steer + handler. Otherwise it would be queued as user text and only + delivered at turn end — defeating the whole point. + """ + from hermes_cli.commands import ACTIVE_SESSION_BYPASS_COMMANDS, should_bypass_active_session + + assert "steer" in ACTIVE_SESSION_BYPASS_COMMANDS + assert should_bypass_active_session("steer") is True + + +if __name__ == "__main__": # pragma: no cover + pytest.main([__file__, "-v"]) diff --git a/tests/test_tui_gateway_server.py b/tests/test_tui_gateway_server.py index e7681b784cf..ea231e626e5 100644 --- a/tests/test_tui_gateway_server.py +++ b/tests/test_tui_gateway_server.py @@ -438,3 +438,74 @@ def test_rollback_restore_resolves_number_and_file_path(): assert resp["result"]["success"] is True assert calls["args"][1] == "bbb222" assert calls["args"][2] == "src/app.tsx" + + +# ── session.steer ──────────────────────────────────────────────────── + + +def test_session_steer_calls_agent_steer_when_agent_supports_it(): + """The TUI RPC method must call agent.steer(text) and return a + queued status without touching interrupt state. 
+ """ + calls = {} + + class _Agent: + def steer(self, text): + calls["steer_text"] = text + return True + + def interrupt(self, *args, **kwargs): + calls["interrupt_called"] = True + + server._sessions["sid"] = _session(agent=_Agent()) + try: + resp = server.handle_request( + { + "id": "1", + "method": "session.steer", + "params": {"session_id": "sid", "text": "also check auth.log"}, + } + ) + finally: + server._sessions.pop("sid", None) + + assert "result" in resp, resp + assert resp["result"]["status"] == "queued" + assert resp["result"]["text"] == "also check auth.log" + assert calls["steer_text"] == "also check auth.log" + assert "interrupt_called" not in calls # must NOT interrupt + + +def test_session_steer_rejects_empty_text(): + server._sessions["sid"] = _session(agent=types.SimpleNamespace(steer=lambda t: True)) + try: + resp = server.handle_request( + { + "id": "1", + "method": "session.steer", + "params": {"session_id": "sid", "text": " "}, + } + ) + finally: + server._sessions.pop("sid", None) + + assert "error" in resp, resp + assert resp["error"]["code"] == 4002 + + +def test_session_steer_errors_when_agent_has_no_steer_method(): + server._sessions["sid"] = _session(agent=types.SimpleNamespace()) # no steer() + try: + resp = server.handle_request( + { + "id": "1", + "method": "session.steer", + "params": {"session_id": "sid", "text": "hi"}, + } + ) + finally: + server._sessions.pop("sid", None) + + assert "error" in resp, resp + assert resp["error"]["code"] == 4010 + diff --git a/tui_gateway/server.py b/tui_gateway/server.py index 3ef76a0f02e..a7dae9e5c60 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -1340,6 +1340,31 @@ def _(rid, params: dict) -> dict: return _ok(rid, {"status": "interrupted"}) +@method("session.steer") +def _(rid, params: dict) -> dict: + """Inject a user message into the next tool result without interrupting. + + Mirrors AIAgent.steer(). 
Safe to call while a turn is running — the text + lands on the last tool result of the next tool batch and the model sees + it on its next iteration. No interrupt, no new user turn, no role + alternation violation. + """ + text = (params.get("text") or "").strip() + if not text: + return _err(rid, 4002, "text is required") + session, err = _sess_nowait(params, rid) + if err: + return err + agent = session.get("agent") + if agent is None or not hasattr(agent, "steer"): + return _err(rid, 4010, "agent does not support steer") + try: + accepted = agent.steer(text) + except Exception as exc: + return _err(rid, 5000, f"steer failed: {exc}") + return _ok(rid, {"status": "queued" if accepted else "rejected", "text": text}) + + @method("terminal.resize") def _(rid, params: dict) -> dict: session, err = _sess_nowait(params, rid) diff --git a/ui-tui/src/app/slash/commands/core.ts b/ui-tui/src/app/slash/commands/core.ts index e0832c7a694..a151b2cdc87 100644 --- a/ui-tui/src/app/slash/commands/core.ts +++ b/ui-tui/src/app/slash/commands/core.ts @@ -1,7 +1,7 @@ import { dailyFortune, randomFortune } from '../../../content/fortunes.js' import { HOTKEYS } from '../../../content/hotkeys.js' import { nextDetailsMode, parseDetailsMode } from '../../../domain/details.js' -import type { ConfigGetValueResponse, ConfigSetResponse, SessionUndoResponse } from '../../../gatewayTypes.js' +import type { ConfigGetValueResponse, ConfigSetResponse, SessionSteerResponse, SessionUndoResponse } from '../../../gatewayTypes.js' import { writeOsc52Clipboard } from '../../../lib/osc52.js' import type { DetailsMode, Msg, PanelSection } from '../../../types.js' import { patchOverlayState } from '../../overlayStore.js' @@ -245,6 +245,36 @@ export const coreCommands: SlashCommand[] = [ } }, + { + help: 'inject a message after the next tool call (no interrupt)', + name: 'steer', + run: (arg, ctx) => { + const payload = arg?.trim() ?? 
'' + + if (!payload) { + return ctx.transcript.sys('usage: /steer ') + } + + // If the agent isn't running, fall back to the queue so the user's + // message isn't lost — identical semantics to the gateway handler. + if (!ctx.ui.busy || !ctx.sid) { + ctx.composer.enqueue(payload) + ctx.transcript.sys(`no active turn — queued for next: "${payload.slice(0, 50)}${payload.length > 50 ? '…' : ''}"`) + return + } + + ctx.gateway.rpc('session.steer', { session_id: ctx.sid, text: payload }).then( + ctx.guarded(r => { + if (r?.status === 'queued') { + ctx.transcript.sys(`⏩ steer queued — arrives after next tool call: "${payload.slice(0, 50)}${payload.length > 50 ? '…' : ''}"`) + } else { + ctx.transcript.sys('steer rejected') + } + }) + ).catch(ctx.guardedErr) + } + }, + { help: 'undo last exchange', name: 'undo', diff --git a/ui-tui/src/gatewayTypes.ts b/ui-tui/src/gatewayTypes.ts index 9e21b9bc587..c8d1c685523 100644 --- a/ui-tui/src/gatewayTypes.ts +++ b/ui-tui/src/gatewayTypes.ts @@ -152,6 +152,11 @@ export interface SessionInterruptResponse { ok?: boolean } +export interface SessionSteerResponse { + status?: 'queued' | 'rejected' + text?: string +} + // ── Prompt / submission ────────────────────────────────────────────── export interface PromptSubmitResponse { From 6fb69229caba4bd5699228e520de4956b3458187 Mon Sep 17 00:00:00 2001 From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com> Date: Sat, 18 Apr 2026 06:51:28 -0700 Subject: [PATCH 022/143] fix(nix): fix build failures, TUI Node.js crash, and upgrade container to Node 22 (#12159) * Add setuptools build dep for legacy alibabacloud packages and updated stale npm-deps hash * Add HERMES_NODE env var to pin Node.js version The TUI requires Node.js 20+ for regex `/v` flag support (used by string-width). Instead of relying on PATH lookup, explicitly set HERMES_NODE to the bundled Node 22 in the Nix wrapper, and add a fallback check in the Python code to use HERMES_NODE if available. 
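The lookup order described above can be sketched roughly as follows. This is an illustration only: `resolve_node` is a hypothetical standalone helper (the actual change lives in the nested `_node_bin` function in hermes_cli/main.py), and taking the environment as a parameter is an assumption made here so the sketch is easy to exercise.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_node(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: pick the Node.js binary to launch the TUI with.

    Prefers the pinned HERMES_NODE path when it names an executable file,
    otherwise falls back to an ordinary PATH lookup for "node".
    """
    env = os.environ if env is None else env

    # 1. Pinned interpreter (e.g. set by the Nix wrapper).
    pinned = env.get("HERMES_NODE")
    if pinned and os.path.isfile(pinned) and os.access(pinned, os.X_OK):
        return pinned

    # 2. Fallback: whatever "node" resolves to on PATH (None if absent).
    return shutil.which("node", path=env.get("PATH"))
```

A stale or non-executable HERMES_NODE is ignored rather than treated as an error, so non-Nix installs keep the previous PATH-based behaviour.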
Also upgrade container provisioning to Node 22 via NodeSource (Ubuntu 24.04 ships Node 18 which is EOL) and add a Nix check to verify the wrapper and Node version at build time. --- hermes_cli/main.py | 4 ++++ nix/checks.nix | 23 +++++++++++++++++++++++ nix/nixosModules.nix | 14 +++++++++++--- nix/packages.nix | 3 ++- nix/python.nix | 15 +++++++++++++++ nix/tui.nix | 7 +------ 6 files changed, 56 insertions(+), 10 deletions(-) diff --git a/hermes_cli/main.py b/hermes_cli/main.py index 0afadac3d16..a13a6f88ee9 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -897,6 +897,10 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]: _ensure_tui_node() def _node_bin(bin: str) -> str: + if bin == "node": + env_node = os.environ.get("HERMES_NODE") + if env_node and os.path.isfile(env_node) and os.access(env_node, os.X_OK): + return env_node path = shutil.which(bin) if not path: print(f"{bin} not found — install Node.js to use the TUI.") diff --git a/nix/checks.nix b/nix/checks.nix index 55068a94f16..ff8e7947c57 100644 --- a/nix/checks.nix +++ b/nix/checks.nix @@ -125,6 +125,29 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2) echo "ok" > $out/result ''; + # Verify HERMES_NODE is set in wrapper and points to Node 20+ + # (string-width uses the /v regex flag which requires Node 20+) + hermes-node = pkgs.runCommand "hermes-node-version" { } '' + set -e + echo "=== Checking HERMES_NODE in wrapper ===" + grep -q "HERMES_NODE" ${hermes-agent}/bin/hermes || \ + (echo "FAIL: HERMES_NODE not set in wrapper"; exit 1) + echo "PASS: HERMES_NODE present in wrapper" + + HERMES_NODE=$(sed -n "s/^export HERMES_NODE='\(.*\)'/\1/p" ${hermes-agent}/bin/hermes) + test -x "$HERMES_NODE" || (echo "FAIL: HERMES_NODE=$HERMES_NODE not executable"; exit 1) + echo "PASS: HERMES_NODE executable at $HERMES_NODE" + + NODE_MAJOR=$("$HERMES_NODE" --version | sed 's/^v//' | cut -d. 
-f1) + test "$NODE_MAJOR" -ge 20 || \ + (echo "FAIL: Node v$NODE_MAJOR < 20, TUI needs /v regex flag support"; exit 1) + echo "PASS: Node v$NODE_MAJOR >= 20" + + echo "=== All HERMES_NODE checks passed ===" + mkdir -p $out + echo "ok" > $out/result + ''; + # Verify HERMES_MANAGED guard works on all mutation commands managed-guard = pkgs.runCommand "hermes-managed-guard" { } '' set -e diff --git a/nix/nixosModules.nix b/nix/nixosModules.nix index 75b3dca31b2..24a2a1b6ddc 100644 --- a/nix/nixosModules.nix +++ b/nix/nixosModules.nix @@ -121,11 +121,19 @@ # ── Provision apt packages (first boot only, cached in writable layer) ── # sudo: agent self-modification # nodejs/npm: writable node so npm i -g works (nix store copies are read-only) - # curl: needed for uv installer + # Node 22 via NodeSource — Ubuntu 24.04 ships Node 18 which is EOL. + # curl: needed for uv installer + NodeSource setup if [ ! -f /var/lib/hermes-tools-provisioned ] && command -v apt-get >/dev/null 2>&1; then echo "First boot: provisioning agent tools..." apt-get update -qq - apt-get install -y -qq sudo nodejs npm curl + apt-get install -y -qq sudo curl ca-certificates gnupg + mkdir -p /etc/apt/keyrings + curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \ + | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg + echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_22.x nodistro main" \ + > /etc/apt/sources.list.d/nodesource.list + apt-get update -qq + apt-get install -y -qq nodejs touch /var/lib/hermes-tools-provisioned fi @@ -171,7 +179,7 @@ # Package and entrypoint use stable symlinks (current-package, current-entrypoint) # so they can update without recreation. Env vars go through $HERMES_HOME/.env. 
containerIdentity = builtins.hashString "sha256" (builtins.toJSON { - schema = 3; # bump when identity inputs change + schema = 4; # bump when identity inputs change (4: Node 18→22 via NodeSource) image = cfg.container.image; extraVolumes = cfg.container.extraVolumes; extraOptions = cfg.container.extraOptions; diff --git a/nix/packages.nix b/nix/packages.nix index f39d9d0b2be..968ad12fb71 100644 --- a/nix/packages.nix +++ b/nix/packages.nix @@ -63,7 +63,8 @@ --suffix PATH : "${runtimePath}" \ --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \ --set HERMES_TUI_DIR $out/ui-tui \ - --set HERMES_PYTHON ${hermesVenv}/bin/python3 + --set HERMES_PYTHON ${hermesVenv}/bin/python3 \ + --set HERMES_NODE ${pkgs.nodejs_22}/bin/node '') [ "hermes" diff --git a/nix/python.nix b/nix/python.nix index 160b4ee790b..91411f4d754 100644 --- a/nix/python.nix +++ b/nix/python.nix @@ -35,6 +35,20 @@ let }; }; + # Legacy alibabacloud packages ship only sdists with setup.py/setup.cfg + # and no pyproject.toml, so setuptools isn't declared as a build dep. 
+ buildSystemOverrides = final: prev: builtins.mapAttrs + (name: _: prev.${name}.overrideAttrs (old: { + nativeBuildInputs = (old.nativeBuildInputs or [ ]) ++ [ final.setuptools ]; + })) + (lib.genAttrs [ + "alibabacloud-credentials-api" + "alibabacloud-endpoint-util" + "alibabacloud-gateway-dingtalk" + "alibabacloud-gateway-spi" + "alibabacloud-tea" + ] (_: null)); + pythonPackageOverrides = final: _prev: if isAarch64Darwin then { numpy = mkPrebuiltOverride final python311.pkgs.numpy { }; @@ -75,6 +89,7 @@ let (lib.composeManyExtensions [ pyproject-build-systems.overlays.default overlay + buildSystemOverrides pythonPackageOverrides ]); in diff --git a/nix/tui.nix b/nix/tui.nix index 70eb67f949a..7303edecb9f 100644 --- a/nix/tui.nix +++ b/nix/tui.nix @@ -4,7 +4,7 @@ let src = ../ui-tui; npmDeps = pkgs.fetchNpmDeps { inherit src; - hash = "sha256-zsUPmbC6oMUO10EhS3ptvDjwlfpCSEmrkjyeORw7fac="; + hash = "sha256-mG3vpgGi4ljt4X3XIf3I/5mIcm+rVTUAmx2DQ6YVA90="; }; packageJson = builtins.fromJSON (builtins.readFile (src + "/package.json")); @@ -18,11 +18,6 @@ pkgs.buildNpmPackage { doCheck = false; - postPatch = '' - # fetchNpmDeps strips the trailing newline; match it so the diff passes - sed -i -z 's/\n$//' package-lock.json - ''; - installPhase = '' runHook preInstall From f0638f35964ee28cff608a05614524065488c0b7 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:11:53 -0500 Subject: [PATCH 023/143] fix(tui): split /model picker from /provider wizard to resolve registry collision --- ui-tui/src/app/slash/commands/setup.ts | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/ui-tui/src/app/slash/commands/setup.ts b/ui-tui/src/app/slash/commands/setup.ts index c6d5cc8637b..d9a948e5419 100644 --- a/ui-tui/src/app/slash/commands/setup.ts +++ b/ui-tui/src/app/slash/commands/setup.ts @@ -6,9 +6,8 @@ import type { SlashCommand } from '../types.js' export const setupCommands: SlashCommand[] = [ { - aliases: ['provider'], - help: 
'configure LLM provider and model (launches `hermes model`)', - name: 'model', + help: 'configure LLM provider + model (launches `hermes model`)', + name: 'provider', run: (_arg, ctx) => void runExternalSetup({ args: ['model'], From 4e1ea79edc8fa6d1e4958e9df19fcca042efa566 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:11:57 -0500 Subject: [PATCH 024/143] feat(tui): accept raw Ctrl+V as clipboard image paste fallback --- ui-tui/src/components/textInput.tsx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ui-tui/src/components/textInput.tsx b/ui-tui/src/components/textInput.tsx index f2bbee63cf2..6503da4dbff 100644 --- a/ui-tui/src/components/textInput.tsx +++ b/ui-tui/src/components/textInput.tsx @@ -464,7 +464,7 @@ export function TextInput({ (inp: string, k: Key, event: InputEvent) => { const eventRaw = event.keypress.raw - if (eventRaw === '\x1bv' || eventRaw === '\x1bV') { + if (eventRaw === '\x1bv' || eventRaw === '\x1bV' || eventRaw === '\x16') { return void emitPaste({ cursor: curRef.current, hotkey: true, text: '', value: vRef.current }) } From 5152e1ad8646235e4b745cf3d1337417b13f5ef5 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:16:37 -0500 Subject: [PATCH 025/143] feat(tui-gateway): surface config.quick_commands in commands.catalog --- tui_gateway/server.py | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/tui_gateway/server.py b/tui_gateway/server.py index a7dae9e5c60..fad674aeb78 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -1987,8 +1987,35 @@ def _(rid, params: dict) -> dict: cat_order.append(cat) cat_map[cat].append([name, desc]) - skill_count = 0 warning = "" + try: + qcmds = _load_cfg().get("quick_commands", {}) or {} + if isinstance(qcmds, dict) and qcmds: + bucket = "User commands" + if bucket not in cat_map: + cat_map[bucket] = [] + cat_order.append(bucket) + for qname, qc in sorted(qcmds.items()): + if 
not isinstance(qc, dict): + continue + key = f"/{qname}" + canon[key.lower()] = key + qtype = qc.get("type", "") + if qtype == "exec": + default_desc = f"exec: {qc.get('command', '')}" + elif qtype == "alias": + default_desc = f"alias → {qc.get('target', '')}" + else: + default_desc = qtype or "quick command" + qdesc = str(qc.get("description") or default_desc) + qdesc = qdesc[:120] + ("…" if len(qdesc) > 120 else "") + all_pairs.append([key, qdesc]) + cat_map[bucket].append([key, qdesc]) + except Exception as e: + if not warning: + warning = f"quick_commands discovery unavailable: {e}" + + skill_count = 0 try: from agent.skill_commands import scan_skill_commands for k, info in sorted(scan_skill_commands().items()): From a397b0fd4d5c95b6aef4eecbb13eabad3d7e659b Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:16:39 -0500 Subject: [PATCH 026/143] test(tui-gateway): assert quick_commands appear in commands.catalog output --- tests/test_tui_gateway_server.py | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/tests/test_tui_gateway_server.py b/tests/test_tui_gateway_server.py index ea231e626e5..d441e2b32d0 100644 --- a/tests/test_tui_gateway_server.py +++ b/tests/test_tui_gateway_server.py @@ -363,6 +363,28 @@ def test_image_attach_appends_local_image(monkeypatch): assert len(server._sessions["sid"]["attached_images"]) == 1 +def test_commands_catalog_surfaces_quick_commands(monkeypatch): + monkeypatch.setattr(server, "_load_cfg", lambda: {"quick_commands": { + "build": {"type": "exec", "command": "npm run build"}, + "git": {"type": "alias", "target": "/shell git"}, + "notes": {"type": "exec", "command": "cat NOTES.md", "description": "Open design notes"}, + }}) + + resp = server.handle_request({"id": "1", "method": "commands.catalog", "params": {}}) + + pairs = dict(resp["result"]["pairs"]) + assert "npm run build" in pairs["/build"] + assert pairs["/git"].startswith("alias →") + assert pairs["/notes"] == "Open design 
notes" + + user_cat = next(c for c in resp["result"]["categories"] if c["name"] == "User commands") + user_pairs = dict(user_cat["pairs"]) + assert set(user_pairs) == {"/build", "/git", "/notes"} + + assert resp["result"]["canon"]["/build"] == "/build" + assert resp["result"]["canon"]["/notes"] == "/notes" + + def test_command_dispatch_exec_nonzero_surfaces_error(monkeypatch): monkeypatch.setattr(server, "_load_cfg", lambda: {"quick_commands": {"boom": {"type": "exec", "command": "boom"}}}) monkeypatch.setattr( From 586b2f208913e2d63f08e426ac0c2ac6b3bc3823 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:16:44 -0500 Subject: [PATCH 027/143] feat(tui): persist large pastes to ~/.hermes/pastes/ via paste.collapse --- ui-tui/src/app/interfaces.ts | 1 + ui-tui/src/app/useComposerState.ts | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/ui-tui/src/app/interfaces.ts b/ui-tui/src/app/interfaces.ts index 998afe2a198..ff2b1e5b5a5 100644 --- a/ui-tui/src/app/interfaces.ts +++ b/ui-tui/src/app/interfaces.ts @@ -335,5 +335,6 @@ export interface AppOverlaysProps { export interface PasteSnippet { label: string + path?: string text: string } diff --git a/ui-tui/src/app/useComposerState.ts b/ui-tui/src/app/useComposerState.ts index 14a40412c99..bebda273d9f 100644 --- a/ui-tui/src/app/useComposerState.ts +++ b/ui-tui/src/app/useComposerState.ts @@ -70,12 +70,25 @@ export function useComposerState({ gw, onClipboardPaste, submitRef }: UseCompose setPasteSnips(prev => [...prev, { label, text: cleanedText }].slice(-32)) + void gw + .request<{ path?: string }>('paste.collapse', { text: cleanedText }) + .then(r => { + const path = r?.path + + if (!path) { + return + } + + setPasteSnips(prev => prev.map(s => (s.label === label ? 
{ ...s, path } : s))) + }) + .catch(() => {}) + return { cursor: cursor + insert.length, value: value.slice(0, cursor) + insert + value.slice(cursor) } }, - [onClipboardPaste] + [gw, onClipboardPaste] ) const openEditor = useCallback(() => { From 200c17433c0ce24a9332b857e64b6db3041a1f59 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:23:29 -0500 Subject: [PATCH 028/143] feat(tui): read display.streaming / show_reasoning / show_cost / inline_diffs from config Extends ConfigDisplayConfig and UiState so the four new display flags flow from `config.get {key:"full"}` into the nanostore. applyDisplay is exported to keep the fan-out testable without an Ink harness. Defaults mirror v1 parity: streaming + inline_diffs default true (opt-out via `=== false`), show_cost + show_reasoning default false (opt-in via plain truthy check). --- ui-tui/src/__tests__/useConfigSync.test.ts | 67 ++++++++++++++++++++++ ui-tui/src/app/interfaces.ts | 4 ++ ui-tui/src/app/uiStore.ts | 4 ++ ui-tui/src/app/useConfigSync.ts | 8 ++- ui-tui/src/gatewayTypes.ts | 4 ++ 5 files changed, 85 insertions(+), 2 deletions(-) create mode 100644 ui-tui/src/__tests__/useConfigSync.test.ts diff --git a/ui-tui/src/__tests__/useConfigSync.test.ts b/ui-tui/src/__tests__/useConfigSync.test.ts new file mode 100644 index 00000000000..c14ecff3aa7 --- /dev/null +++ b/ui-tui/src/__tests__/useConfigSync.test.ts @@ -0,0 +1,67 @@ +import { beforeEach, describe, expect, it, vi } from 'vitest' + +import { $uiState, resetUiState } from '../app/uiStore.js' +import { applyDisplay } from '../app/useConfigSync.js' + +describe('applyDisplay', () => { + beforeEach(() => { + resetUiState() + }) + + it('fans every display flag out to $uiState and the bell callback', () => { + const setBell = vi.fn() + + applyDisplay( + { + config: { + display: { + bell_on_complete: true, + details_mode: 'expanded', + inline_diffs: false, + show_cost: true, + show_reasoning: true, + streaming: false, + tui_compact: true, + 
tui_statusbar: false + } + } + }, + setBell + ) + + const s = $uiState.get() + expect(setBell).toHaveBeenCalledWith(true) + expect(s.compact).toBe(true) + expect(s.detailsMode).toBe('expanded') + expect(s.inlineDiffs).toBe(false) + expect(s.showCost).toBe(true) + expect(s.showReasoning).toBe(true) + expect(s.statusBar).toBe(false) + expect(s.streaming).toBe(false) + }) + + it('applies v1 parity defaults when display fields are missing', () => { + const setBell = vi.fn() + + applyDisplay({ config: { display: {} } }, setBell) + + const s = $uiState.get() + expect(setBell).toHaveBeenCalledWith(false) + expect(s.inlineDiffs).toBe(true) + expect(s.showCost).toBe(false) + expect(s.showReasoning).toBe(false) + expect(s.statusBar).toBe(true) + expect(s.streaming).toBe(true) + }) + + it('treats a null config like an empty display block', () => { + const setBell = vi.fn() + + applyDisplay(null, setBell) + + const s = $uiState.get() + expect(setBell).toHaveBeenCalledWith(false) + expect(s.inlineDiffs).toBe(true) + expect(s.streaming).toBe(true) + }) +}) diff --git a/ui-tui/src/app/interfaces.ts b/ui-tui/src/app/interfaces.ts index ff2b1e5b5a5..bf3d54c627b 100644 --- a/ui-tui/src/app/interfaces.ts +++ b/ui-tui/src/app/interfaces.ts @@ -78,9 +78,13 @@ export interface UiState { compact: boolean detailsMode: DetailsMode info: null | SessionInfo + inlineDiffs: boolean + showCost: boolean + showReasoning: boolean sid: null | string status: string statusBar: boolean + streaming: boolean theme: Theme usage: Usage } diff --git a/ui-tui/src/app/uiStore.ts b/ui-tui/src/app/uiStore.ts index b7f5c20f4df..81089f1795a 100644 --- a/ui-tui/src/app/uiStore.ts +++ b/ui-tui/src/app/uiStore.ts @@ -11,9 +11,13 @@ const buildUiState = (): UiState => ({ compact: false, detailsMode: 'collapsed', info: null, + inlineDiffs: true, + showCost: false, + showReasoning: false, sid: null, status: 'summoning hermes…', statusBar: true, + streaming: true, theme: DEFAULT_THEME, usage: ZERO }) diff --git 
a/ui-tui/src/app/useConfigSync.ts b/ui-tui/src/app/useConfigSync.ts index fe3cec57378..8a3756342ba 100644 --- a/ui-tui/src/app/useConfigSync.ts +++ b/ui-tui/src/app/useConfigSync.ts @@ -27,14 +27,18 @@ const quietRpc = async = Record>( } } -const applyDisplay = (cfg: ConfigFullResponse | null, setBell: (v: boolean) => void) => { +export const applyDisplay = (cfg: ConfigFullResponse | null, setBell: (v: boolean) => void) => { const d = cfg?.config?.display ?? {} setBell(!!d.bell_on_complete) patchUiState({ compact: !!d.tui_compact, detailsMode: resolveDetailsMode(d), - statusBar: d.tui_statusbar !== false + inlineDiffs: d.inline_diffs !== false, + showCost: !!d.show_cost, + showReasoning: !!d.show_reasoning, + statusBar: d.tui_statusbar !== false, + streaming: d.streaming !== false }) } diff --git a/ui-tui/src/gatewayTypes.ts b/ui-tui/src/gatewayTypes.ts index c8d1c685523..fd5b6c13472 100644 --- a/ui-tui/src/gatewayTypes.ts +++ b/ui-tui/src/gatewayTypes.ts @@ -53,6 +53,10 @@ export type CommandDispatchResponse = export interface ConfigDisplayConfig { bell_on_complete?: boolean details_mode?: string + inline_diffs?: boolean + show_cost?: boolean + show_reasoning?: boolean + streaming?: boolean thinking_mode?: string tui_compact?: boolean tui_statusbar?: boolean From fd6ffc777fea792f368dd3e3e86a66e438adafd3 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:26:03 -0500 Subject: [PATCH 029/143] feat(tui): honor display.* flags in turn renderer, status bar, and event handler MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - turnController gates scheduleStreaming / reasoning recorders on streaming + showReasoning so disabling them keeps the buffer silent until message.complete flushes - createGatewayEventHandler only surfaces inline_diff previews when inlineDiffs is on - StatusRule takes a showCost prop and renders `· $X.XXXX` with the same toFixed(4) formatting as /usage when usage.cost_usd is present - 
Usage grows cost_usd?: number to match the gateway payload - Existing handler tests flip showReasoning on in beforeEach so reasoning-flow assertions keep their meaning --- .../__tests__/createGatewayEventHandler.test.ts | 3 ++- ui-tui/src/app/createGatewayEventHandler.ts | 2 +- ui-tui/src/app/turnController.ts | 15 +++++++++++++-- ui-tui/src/components/appChrome.tsx | 5 +++++ ui-tui/src/components/appLayout.tsx | 1 + ui-tui/src/types.ts | 1 + 6 files changed, 23 insertions(+), 4 deletions(-) diff --git a/ui-tui/src/__tests__/createGatewayEventHandler.test.ts b/ui-tui/src/__tests__/createGatewayEventHandler.test.ts index e546ce640e4..f1f0c306bcd 100644 --- a/ui-tui/src/__tests__/createGatewayEventHandler.test.ts +++ b/ui-tui/src/__tests__/createGatewayEventHandler.test.ts @@ -4,7 +4,7 @@ import { createGatewayEventHandler } from '../app/createGatewayEventHandler.js' import { resetOverlayState } from '../app/overlayStore.js' import { turnController } from '../app/turnController.js' import { resetTurnState } from '../app/turnStore.js' -import { resetUiState } from '../app/uiStore.js' +import { patchUiState, resetUiState } from '../app/uiStore.js' import { estimateTokensRough } from '../lib/text.js' import type { Msg } from '../types.js' @@ -47,6 +47,7 @@ describe('createGatewayEventHandler', () => { resetUiState() resetTurnState() turnController.fullReset() + patchUiState({ showReasoning: true }) }) it('persists completed tool rows when message.complete lands immediately after tool.complete', () => { diff --git a/ui-tui/src/app/createGatewayEventHandler.ts b/ui-tui/src/app/createGatewayEventHandler.ts index e728f8bbd01..699a3794dee 100644 --- a/ui-tui/src/app/createGatewayEventHandler.ts +++ b/ui-tui/src/app/createGatewayEventHandler.ts @@ -266,7 +266,7 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev: case 'tool.complete': turnController.recordToolComplete(ev.payload.tool_id, ev.payload.name, ev.payload.error, ev.payload.summary) - 
if (ev.payload.inline_diff) { + if (ev.payload.inline_diff && getUiState().inlineDiffs) { sys(ev.payload.inline_diff) } diff --git a/ui-tui/src/app/turnController.ts b/ui-tui/src/app/turnController.ts index 73d0571734e..de57b2dd053 100644 --- a/ui-tui/src/app/turnController.ts +++ b/ui-tui/src/app/turnController.ts @@ -11,7 +11,7 @@ import type { ActiveTool, ActivityItem, Msg, SubagentProgress } from '../types.j import { resetOverlayState } from './overlayStore.js' import { patchTurnState, resetTurnState } from './turnStore.js' -import { patchUiState } from './uiStore.js' +import { getUiState, patchUiState } from './uiStore.js' const INTERRUPT_COOLDOWN_MS = 1500 const ACTIVITY_LIMIT = 8 @@ -226,10 +226,17 @@ class TurnController { } this.bufRef = rendered ?? this.bufRef + text - this.scheduleStreaming() + + if (getUiState().streaming) { + this.scheduleStreaming() + } } recordReasoningAvailable(text: string) { + if (!getUiState().showReasoning) { + return + } + const incoming = text.trim() if (!incoming || this.reasoningText.trim()) { @@ -242,6 +249,10 @@ class TurnController { } recordReasoningDelta(text: string) { + if (!getUiState().showReasoning) { + return + } + this.reasoningText += text this.scheduleReasoning() this.pulseReasoningStreaming() diff --git a/ui-tui/src/components/appChrome.tsx b/ui-tui/src/components/appChrome.tsx index ed6f914c96b..2f5f807dec7 100644 --- a/ui-tui/src/components/appChrome.tsx +++ b/ui-tui/src/components/appChrome.tsx @@ -99,6 +99,7 @@ export function StatusRule({ usage, bgCount, sessionStartedAt, + showCost, voiceLabel, t }: StatusRuleProps) { @@ -136,6 +137,9 @@ export function StatusRule({ ) : null} {voiceLabel ? │ {voiceLabel} : null} {bgCount > 0 ? │ {bgCount} bg : null} + {showCost && typeof usage.cost_usd === 'number' ? 
( + │ ${usage.cost_usd.toFixed(4)} + ) : null} @@ -285,6 +289,7 @@ interface StatusRuleProps { cwdLabel: string model: string sessionStartedAt?: number | null + showCost: boolean status: string statusColor: string t: Theme diff --git a/ui-tui/src/components/appLayout.tsx b/ui-tui/src/components/appLayout.tsx index 26d8e4b0a99..f13adf1bbd0 100644 --- a/ui-tui/src/components/appLayout.tsx +++ b/ui-tui/src/components/appLayout.tsx @@ -190,6 +190,7 @@ const ComposerPane = memo(function ComposerPane({ cwdLabel={status.cwdLabel} model={ui.info?.model?.split('/').pop() ?? ''} sessionStartedAt={status.sessionStartedAt} + showCost={ui.showCost} status={ui.status} statusColor={status.statusColor} t={ui.theme} diff --git a/ui-tui/src/types.ts b/ui-tui/src/types.ts index ab7d7efab96..32e99983ac9 100644 --- a/ui-tui/src/types.ts +++ b/ui-tui/src/types.ts @@ -68,6 +68,7 @@ export interface Usage { context_max?: number context_percent?: number context_used?: number + cost_usd?: number input: number output: number total: number From 202b78ec684aee2a0bc5964bc2a58d2d20f8fbfc Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:23:47 -0500 Subject: [PATCH 030/143] feat(tui-gateway): include per-MCP-server status in session.info payload --- tui_gateway/server.py | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tui_gateway/server.py b/tui_gateway/server.py index fad674aeb78..7d4c3fa3c17 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -588,6 +588,11 @@ def _session_info(agent) -> dict: info["skills"] = get_available_skills() except Exception: pass + try: + from tools.mcp_tool import get_mcp_status + info["mcp_servers"] = get_mcp_status() + except Exception: + info["mcp_servers"] = [] try: from hermes_cli.banner import get_update_result from hermes_cli.config import recommended_update_command From b82ec6419d8fe49bf0bef46b45d78276157b9838 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:23:47 -0500 Subject: [PATCH 
031/143] test(tui-gateway): cover mcp_servers field in _session_info output --- tests/test_tui_gateway_server.py | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/tests/test_tui_gateway_server.py b/tests/test_tui_gateway_server.py index d441e2b32d0..35bc3f449b2 100644 --- a/tests/test_tui_gateway_server.py +++ b/tests/test_tui_gateway_server.py @@ -531,3 +531,18 @@ def test_session_steer_errors_when_agent_has_no_steer_method(): assert "error" in resp, resp assert resp["error"]["code"] == 4010 + +def test_session_info_includes_mcp_servers(monkeypatch): + fake_status = [ + {"name": "github", "transport": "http", "tools": 12, "connected": True}, + {"name": "filesystem", "transport": "stdio", "tools": 4, "connected": True}, + {"name": "broken", "transport": "stdio", "tools": 0, "connected": False}, + ] + fake_mod = types.ModuleType("tools.mcp_tool") + fake_mod.get_mcp_status = lambda: fake_status + monkeypatch.setitem(sys.modules, "tools.mcp_tool", fake_mod) + + info = server._session_info(types.SimpleNamespace(tools=[], model="")) + + assert info["mcp_servers"] == fake_status + From 382132302917348e060063b7516c0cd616b07df6 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:23:47 -0500 Subject: [PATCH 032/143] feat(tui): render per-MCP-server status block in SessionPanel --- ui-tui/src/components/branding.tsx | 25 +++++++++++++++++++++++++ ui-tui/src/types.ts | 8 ++++++++ 2 files changed, 33 insertions(+) diff --git a/ui-tui/src/components/branding.tsx b/ui-tui/src/components/branding.tsx index fc019ac86f0..919c34b612f 100644 --- a/ui-tui/src/components/branding.tsx +++ b/ui-tui/src/components/branding.tsx @@ -126,11 +126,36 @@ export function SessionPanel({ info, sid, t }: SessionPanelProps) { {section('Tools', info.tools, 8, 'more toolsets…')} {section('Skills', info.skills)} + + {info.mcp_servers && info.mcp_servers.length > 0 && ( + + + MCP Servers + + + {info.mcp_servers.map(s => ( + + {` ${s.name} `} + 
{`[${s.transport}]`} + : + {s.connected ? ( + + {s.tools} tool{s.tools === 1 ? '' : 's'} + + ) : ( + failed + )} + + ))} + + )} + {flat(info.tools).length} tools{' · '} {flat(info.skills).length} skills + {info.mcp_servers?.length ? ` · ${info.mcp_servers.length} MCP` : ''} {' · '} /help for commands diff --git a/ui-tui/src/types.ts b/ui-tui/src/types.ts index 32e99983ac9..98cc31203c5 100644 --- a/ui-tui/src/types.ts +++ b/ui-tui/src/types.ts @@ -51,8 +51,16 @@ export type Role = 'assistant' | 'system' | 'tool' | 'user' export type DetailsMode = 'hidden' | 'collapsed' | 'expanded' export type ThinkingMode = 'collapsed' | 'truncated' | 'full' +export interface McpServerStatus { + connected: boolean + name: string + tools: number + transport: string +} + export interface SessionInfo { cwd?: string + mcp_servers?: McpServerStatus[] model: string release_date?: string skills: Record From 6fbfae8f42297a71e170de6103af459bd0a81f27 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:26:24 -0500 Subject: [PATCH 033/143] feat(tui): add skillsHub overlay state wiring Extend OverlayState with a skillsHub flag, fold it into $isBlocked, and teach Ctrl+C to close the overlay so later PRs can render the component behind this slot. 
--- ui-tui/src/app/interfaces.ts | 1 + ui-tui/src/app/overlayStore.ts | 7 +++++-- ui-tui/src/app/useInputHandlers.ts | 4 ++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/ui-tui/src/app/interfaces.ts b/ui-tui/src/app/interfaces.ts index bf3d54c627b..a23b2068836 100644 --- a/ui-tui/src/app/interfaces.ts +++ b/ui-tui/src/app/interfaces.ts @@ -57,6 +57,7 @@ export interface OverlayState { pager: null | PagerState picker: boolean secret: null | SecretReq + skillsHub: boolean sudo: null | SudoReq } diff --git a/ui-tui/src/app/overlayStore.ts b/ui-tui/src/app/overlayStore.ts index 4b24f0daab9..a2ea4002331 100644 --- a/ui-tui/src/app/overlayStore.ts +++ b/ui-tui/src/app/overlayStore.ts @@ -9,13 +9,16 @@ const buildOverlayState = (): OverlayState => ({ pager: null, picker: false, secret: null, + skillsHub: false, sudo: null }) export const $overlayState = atom(buildOverlayState()) -export const $isBlocked = computed($overlayState, ({ approval, clarify, modelPicker, pager, picker, secret, sudo }) => - Boolean(approval || clarify || modelPicker || pager || picker || secret || sudo) +export const $isBlocked = computed( + $overlayState, + ({ approval, clarify, modelPicker, pager, picker, secret, skillsHub, sudo }) => + Boolean(approval || clarify || modelPicker || pager || picker || secret || skillsHub || sudo) ) export const getOverlayState = () => $overlayState.get() diff --git a/ui-tui/src/app/useInputHandlers.ts b/ui-tui/src/app/useInputHandlers.ts index 70000b73c8c..0279a203cac 100644 --- a/ui-tui/src/app/useInputHandlers.ts +++ b/ui-tui/src/app/useInputHandlers.ts @@ -63,6 +63,10 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult { return patchOverlayState({ modelPicker: false }) } + if (overlay.skillsHub) { + return patchOverlayState({ skillsHub: false }) + } + if (overlay.picker) { return patchOverlayState({ picker: false }) } From ef284e021ac73fcdac9a8392a10bb42f2018b74f Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson 
Date: Sat, 18 Apr 2026 09:27:48 -0500 Subject: [PATCH 034/143] feat(tui): add two-step SkillsHub overlay component MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New SkillsHub mirrors ModelPicker's category → item → actions flow with paginated 12-line lists, 1-9/0 quick-pick, Esc-back navigation, and lazy skills.manage inspect/install calls. Mount it from appOverlays when overlay.skillsHub is true. --- ui-tui/src/components/appOverlays.tsx | 9 +- ui-tui/src/components/skillsHub.tsx | 290 ++++++++++++++++++++++++++ 2 files changed, 298 insertions(+), 1 deletion(-) create mode 100644 ui-tui/src/components/skillsHub.tsx diff --git a/ui-tui/src/components/appOverlays.tsx b/ui-tui/src/components/appOverlays.tsx index 23187cf3f92..27db09024fc 100644 --- a/ui-tui/src/components/appOverlays.tsx +++ b/ui-tui/src/components/appOverlays.tsx @@ -11,6 +11,7 @@ import { MaskedPrompt } from './maskedPrompt.js' import { ModelPicker } from './modelPicker.js' import { ApprovalPrompt, ClarifyPrompt } from './prompts.js' import { SessionPicker } from './sessionPicker.js' +import { SkillsHub } from './skillsHub.js' export function PromptZone({ cols, @@ -82,7 +83,7 @@ export function FloatingOverlays({ const overlay = useStore($overlayState) const ui = useStore($uiState) - const hasAny = overlay.modelPicker || overlay.pager || overlay.picker || completions.length + const hasAny = overlay.modelPicker || overlay.pager || overlay.picker || overlay.skillsHub || completions.length if (!hasAny) { return null @@ -115,6 +116,12 @@ export function FloatingOverlays({ )} + {overlay.skillsHub && ( + + patchOverlayState({ skillsHub: false })} t={ui.theme} /> + + )} + {overlay.pager && ( diff --git a/ui-tui/src/components/skillsHub.tsx b/ui-tui/src/components/skillsHub.tsx new file mode 100644 index 00000000000..03ed3d92f37 --- /dev/null +++ b/ui-tui/src/components/skillsHub.tsx @@ -0,0 +1,290 @@ +import { Box, Text, useInput } from '@hermes/ink' +import { 
useEffect, useState } from 'react' + +import type { GatewayClient } from '../gatewayClient.js' +import { rpcErrorMessage } from '../lib/rpc.js' +import type { Theme } from '../theme.js' + +const VISIBLE = 12 + +const pageOffset = (count: number, sel: number) => Math.max(0, Math.min(sel - Math.floor(VISIBLE / 2), count - VISIBLE)) + +const visibleItems = (items: string[], sel: number) => { + const off = pageOffset(items.length, sel) + + return { items: items.slice(off, off + VISIBLE), off } +} + +export function SkillsHub({ gw, onClose, t }: SkillsHubProps) { + const [skillsByCat, setSkillsByCat] = useState>({}) + const [selectedCat, setSelectedCat] = useState('') + const [catIdx, setCatIdx] = useState(0) + const [skillIdx, setSkillIdx] = useState(0) + const [stage, setStage] = useState<'actions' | 'category' | 'skill'>('category') + const [info, setInfo] = useState(null) + const [installing, setInstalling] = useState(false) + const [err, setErr] = useState('') + const [loading, setLoading] = useState(true) + + useEffect(() => { + gw.request<{ skills?: Record }>('skills.manage', { action: 'list' }) + .then(r => { + setSkillsByCat(r?.skills ?? {}) + setErr('') + setLoading(false) + }) + .catch((e: unknown) => { + setErr(rpcErrorMessage(e)) + setLoading(false) + }) + }, [gw]) + + const cats = Object.keys(skillsByCat).sort() + const skills = selectedCat ? (skillsByCat[selectedCat] ?? []) : [] + const skillName = skills[skillIdx] ?? '' + + const inspect = (name: string) => { + setInfo(null) + setErr('') + + gw.request<{ info?: SkillInfo }>('skills.manage', { action: 'inspect', query: name }) + .then(r => setInfo(r?.info ?? 
{ name })) + .catch((e: unknown) => setErr(rpcErrorMessage(e))) + } + + const install = (name: string) => { + setInstalling(true) + setErr('') + + gw.request<{ installed?: boolean; name?: string }>('skills.manage', { action: 'install', query: name }) + .then(() => onClose()) + .catch((e: unknown) => setErr(rpcErrorMessage(e))) + .finally(() => setInstalling(false)) + } + + useInput((ch, key) => { + if (installing) { + return + } + + if (key.escape) { + if (stage === 'actions') { + setStage('skill') + setInfo(null) + setErr('') + + return + } + + if (stage === 'skill') { + setStage('category') + setSkillIdx(0) + + return + } + + onClose() + + return + } + + if (stage === 'actions') { + if (key.return || ch.toLowerCase() === 'x') { + if (skillName) { + install(skillName) + } + + return + } + + if (ch.toLowerCase() === 'i' && skillName) { + inspect(skillName) + } + + return + } + + const count = stage === 'category' ? cats.length : skills.length + const sel = stage === 'category' ? catIdx : skillIdx + const setSel = stage === 'category' ? setCatIdx : setSkillIdx + + if (key.upArrow && sel > 0) { + setSel(v => v - 1) + + return + } + + if (key.downArrow && sel < count - 1) { + setSel(v => v + 1) + + return + } + + if (key.return) { + if (stage === 'category') { + const cat = cats[catIdx] + + if (!cat) { + return + } + + setSelectedCat(cat) + setSkillIdx(0) + setStage('skill') + + return + } + + const name = skills[skillIdx] + + if (name) { + setStage('actions') + inspect(name) + } + + return + } + + const n = ch === '0' ? 
10 : parseInt(ch, 10) + + if (!Number.isNaN(n) && n >= 1 && n <= Math.min(10, count)) { + const off = pageOffset(count, sel) + const next = off + n - 1 + + if (stage === 'category') { + const cat = cats[next] + + if (cat) { + setSelectedCat(cat) + setCatIdx(next) + setSkillIdx(0) + setStage('skill') + } + + return + } + + const name = skills[next] + + if (name) { + setSkillIdx(next) + setStage('actions') + inspect(name) + } + } + }) + + if (loading) { + return loading skills… + } + + if (err && stage === 'category') { + return ( + + error: {err} + Esc to cancel + + ) + } + + if (!cats.length) { + return ( + + no skills available + Esc to cancel + + ) + } + + if (stage === 'category') { + const rows = cats.map(c => `${c} · ${skillsByCat[c]?.length ?? 0} skills`) + const { items, off } = visibleItems(rows, catIdx) + + return ( + + + Skills Hub + + + select a category + {off > 0 && ↑ {off} more} + + {items.map((row, i) => { + const idx = off + i + + return ( + + {catIdx === idx ? '▸ ' : ' '} + {i + 1}. {row} + + ) + })} + + {off + VISIBLE < rows.length && ↓ {rows.length - off - VISIBLE} more} + ↑/↓ select · Enter open · 1-9,0 quick · Esc cancel + + ) + } + + if (stage === 'skill') { + const { items, off } = visibleItems(skills, skillIdx) + + return ( + + + {selectedCat} + + + {skills.length} skill(s) + {!skills.length ? no skills in this category : null} + {off > 0 && ↑ {off} more} + + {items.map((row, i) => { + const idx = off + i + + return ( + + {skillIdx === idx ? '▸ ' : ' '} + {i + 1}. {row} + + ) + })} + + {off + VISIBLE < skills.length && ↓ {skills.length - off - VISIBLE} more} + + {skills.length ? '↑/↓ select · Enter open · 1-9,0 quick · Esc back' : 'Esc back'} + + + ) + } + + return ( + + + {info?.name ?? skillName} + + + {info?.category ?? selectedCat} + {info?.description ? {info.description} : null} + {info?.path ? path: {info.path} : null} + {!info && !err ? loading… : null} + {err ? error: {err} : null} + {installing ? 
installing… : null} + + Enter install · i inspect · x install · Esc back + + ) +} + +interface SkillInfo { + category?: string + description?: string + name?: string + path?: string +} + +interface SkillsHubProps { + gw: GatewayClient + onClose: () => void + t: Theme +} From 949b8f5521a6fc98d472f58aa9be3dedaa90e1d3 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:29:39 -0500 Subject: [PATCH 035/143] feat(tui): register /skills slash command to open Skills Hub Intercept bare /skills locally and flip overlay.skillsHub, so the overlay opens instantly without waiting on slash.exec. /skills with args still forwards to slash.exec and paginates any output. Tests cover both branches. --- .../src/__tests__/createSlashHandler.test.ts | 20 ++++++++++++++++ ui-tui/src/app/slash/commands/ops.ts | 24 ++++++++++++++++++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/ui-tui/src/__tests__/createSlashHandler.test.ts b/ui-tui/src/__tests__/createSlashHandler.test.ts index 9e1db994634..c54a659b94c 100644 --- a/ui-tui/src/__tests__/createSlashHandler.test.ts +++ b/ui-tui/src/__tests__/createSlashHandler.test.ts @@ -17,6 +17,26 @@ describe('createSlashHandler', () => { expect(getOverlayState().picker).toBe(true) }) + it('opens the skills hub locally for bare /skills', () => { + const ctx = buildCtx() + + expect(createSlashHandler(ctx)('/skills')).toBe(true) + expect(getOverlayState().skillsHub).toBe(true) + expect(ctx.gateway.rpc).not.toHaveBeenCalled() + expect(ctx.gateway.gw.request).not.toHaveBeenCalled() + }) + + it('falls through /skills with args to slash.exec without opening overlay', () => { + const ctx = buildCtx() + + expect(createSlashHandler(ctx)('/skills install foo')).toBe(true) + expect(getOverlayState().skillsHub).toBe(false) + expect(ctx.gateway.rpc).toHaveBeenCalledWith('slash.exec', { + command: 'skills install foo', + session_id: null + }) + }) + it('cycles details mode and persists it', async () => { const ctx = buildCtx() + diff --git
a/ui-tui/src/app/slash/commands/ops.ts b/ui-tui/src/app/slash/commands/ops.ts index 979e1f470aa..aa02fa6cbbb 100644 --- a/ui-tui/src/app/slash/commands/ops.ts +++ b/ui-tui/src/app/slash/commands/ops.ts @@ -1,7 +1,29 @@ -import type { ToolsConfigureResponse } from '../../../gatewayTypes.js' +import type { SlashExecResponse, ToolsConfigureResponse } from '../../../gatewayTypes.js' +import { patchOverlayState } from '../../overlayStore.js' import type { SlashCommand } from '../types.js' export const opsCommands: SlashCommand[] = [ + { + help: 'browse, inspect, and install skills', + name: 'skills', + run: (arg, ctx) => { + if (!arg.trim()) { + return patchOverlayState({ skillsHub: true }) + } + + ctx.gateway + .rpc('slash.exec', { command: `skills ${arg}`, session_id: ctx.sid }) + .then( + ctx.guarded(r => { + if (r.output) { + ctx.transcript.page(r.output, 'Skills') + } + }) + ) + .catch(ctx.guardedErr) + } + }, + { help: 'enable or disable tools (client-side history reset on change)', name: 'tools', From 5e148ca3d03f70d13b2f97d45f57d8664c2f7d55 Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:46:36 -0500 Subject: [PATCH 036/143] fix(tui): route /skills subcommands through skills.manage instead of curses slash.exec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit /skills install, inspect, search, browse, list now call the typed skills.manage RPC and render results via panel/page. Previously they fell through to slash.exec which invokes v1's curses code path — that hangs or crashes inside the Ink worker per the §2 parity-audit finding. Also drop Enter-as-install from the Skills Hub action stage since the Hub lists locally installed skills; primary action is inspect-and-close. x still triggers a manual reinstall for power users. 
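
Illustration (not part of the change): a minimal Python sketch of the routing described above. The real handler is the TypeScript `run` function in ops.ts; `route_skills_command`, `USAGE`, and the tuple return shape are hypothetical names introduced here only to show which input maps to which skills.manage params, following the test expectations in this patch.

```python
# Hypothetical sketch of the /skills routing; the real code lives in ops.ts.
USAGE = "usage: /skills [list | inspect | install | search | browse [page]]"


def route_skills_command(arg):
    """Map the text after '/skills' to an action.

    Returns one of:
      ("overlay", None)   -> open the Skills Hub locally
      ("rpc", params)     -> call skills.manage with these params
      ("usage", message)  -> print a usage hint, no RPC
    """
    text = arg.strip()
    if not text:
        return ("overlay", None)

    sub, *rest = text.split()
    query = " ".join(rest)

    if sub == "list":
        return ("rpc", {"action": "list"})

    if sub in ("inspect", "search", "install"):
        # These subcommands need a skill name or search query.
        if not query:
            return ("usage", USAGE)
        return ("rpc", {"action": sub, "query": query})

    if sub == "browse":
        # Assumption: a missing or non-numeric page falls back to page 1.
        try:
            page = int(query) or 1
        except ValueError:
            page = 1
        return ("rpc", {"action": "browse", "page": page})

    return ("usage", USAGE)
```

No branch in this sketch reaches slash.exec, which is the point of the change: the curses code path is never invoked from the Ink worker.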
--- .../src/__tests__/createSlashHandler.test.ts | 46 ++++- ui-tui/src/app/slash/commands/ops.ts | 158 ++++++++++++++++-- ui-tui/src/components/skillsHub.tsx | 16 +- 3 files changed, 198 insertions(+), 22 deletions(-) diff --git a/ui-tui/src/__tests__/createSlashHandler.test.ts b/ui-tui/src/__tests__/createSlashHandler.test.ts index c54a659b94c..67aa27f768c 100644 --- a/ui-tui/src/__tests__/createSlashHandler.test.ts +++ b/ui-tui/src/__tests__/createSlashHandler.test.ts @@ -26,17 +26,55 @@ describe('createSlashHandler', () => { expect(ctx.gateway.gw.request).not.toHaveBeenCalled() }) - it('falls through /skills with args to slash.exec without opening overlay', () => { + it('routes /skills install to skills.manage without opening overlay', () => { const ctx = buildCtx() expect(createSlashHandler(ctx)('/skills install foo')).toBe(true) expect(getOverlayState().skillsHub).toBe(false) - expect(ctx.gateway.rpc).toHaveBeenCalledWith('slash.exec', { - command: 'skills install foo', - session_id: null + expect(ctx.gateway.rpc).toHaveBeenCalledWith('skills.manage', { + action: 'install', + query: 'foo' }) }) + it('routes /skills inspect to skills.manage', () => { + const ctx = buildCtx() + + createSlashHandler(ctx)('/skills inspect my-skill') + expect(ctx.gateway.rpc).toHaveBeenCalledWith('skills.manage', { + action: 'inspect', + query: 'my-skill' + }) + }) + + it('routes /skills search to skills.manage', () => { + const ctx = buildCtx() + + createSlashHandler(ctx)('/skills search vibe') + expect(ctx.gateway.rpc).toHaveBeenCalledWith('skills.manage', { + action: 'search', + query: 'vibe' + }) + }) + + it('routes /skills browse [page] to skills.manage with a numeric page', () => { + const ctx = buildCtx() + + createSlashHandler(ctx)('/skills browse 3') + expect(ctx.gateway.rpc).toHaveBeenCalledWith('skills.manage', { + action: 'browse', + page: 3 + }) + }) + + it('shows usage for an unknown /skills subcommand', () => { + const ctx = buildCtx() + + 
createSlashHandler(ctx)('/skills zzz') + expect(ctx.gateway.rpc).not.toHaveBeenCalled() + expect(ctx.transcript.sys).toHaveBeenCalledWith(expect.stringContaining('usage: /skills')) + }) + it('cycles details mode and persists it', async () => { const ctx = buildCtx() diff --git a/ui-tui/src/app/slash/commands/ops.ts b/ui-tui/src/app/slash/commands/ops.ts index aa02fa6cbbb..d941c5af410 100644 --- a/ui-tui/src/app/slash/commands/ops.ts +++ b/ui-tui/src/app/slash/commands/ops.ts @@ -1,26 +1,158 @@ -import type { SlashExecResponse, ToolsConfigureResponse } from '../../../gatewayTypes.js' +import type { ToolsConfigureResponse } from '../../../gatewayTypes.js' +import type { PanelSection } from '../../../types.js' import { patchOverlayState } from '../../overlayStore.js' import type { SlashCommand } from '../types.js' +interface SkillInfo { + category?: string + description?: string + name?: string + path?: string +} + +interface SkillsListResponse { + skills?: Record +} + +interface SkillsInspectResponse { + info?: SkillInfo +} + +interface SkillsSearchResponse { + results?: { description?: string; name: string }[] +} + +interface SkillsInstallResponse { + installed?: boolean + name?: string +} + export const opsCommands: SlashCommand[] = [ { - help: 'browse, inspect, and install skills', + help: 'browse, inspect, install skills', name: 'skills', run: (arg, ctx) => { - if (!arg.trim()) { + const text = arg.trim() + + if (!text) { return patchOverlayState({ skillsHub: true }) } - ctx.gateway - .rpc('slash.exec', { command: `skills ${arg}`, session_id: ctx.sid }) - .then( - ctx.guarded(r => { - if (r.output) { - ctx.transcript.page(r.output, 'Skills') - } - }) - ) - .catch(ctx.guardedErr) + const [sub, ...rest] = text.split(/\s+/) + const query = rest.join(' ').trim() + const { rpc } = ctx.gateway + const { page, panel, sys } = ctx.transcript + + if (sub === 'list') { + rpc('skills.manage', { action: 'list' }) + .then( + ctx.guarded(r => { + const cats = 
Object.entries(r.skills ?? {}).sort() + + if (!cats.length) { + return sys('no skills available') + } + + panel( + 'Skills', + cats.map(([title, items]) => ({ items, title })) + ) + }) + ) + .catch(ctx.guardedErr) + + return + } + + if (sub === 'inspect') { + if (!query) { + return sys('usage: /skills inspect ') + } + + rpc('skills.manage', { action: 'inspect', query }) + .then( + ctx.guarded(r => { + const info = r.info ?? {} + + if (!info.name) { + return sys(`unknown skill: ${query}`) + } + + const rows: [string, string][] = [ + ['Name', String(info.name)], + ['Category', String(info.category ?? '')], + ['Path', String(info.path ?? '')] + ] + + const sections: PanelSection[] = [{ rows }] + + if (info.description) { + sections.push({ text: String(info.description) }) + } + + panel('Skill', sections) + }) + ) + .catch(ctx.guardedErr) + + return + } + + if (sub === 'search') { + if (!query) { + return sys('usage: /skills search ') + } + + rpc('skills.manage', { action: 'search', query }) + .then( + ctx.guarded(r => { + const results = r.results ?? [] + + if (!results.length) { + return sys(`no results for: ${query}`) + } + + panel(`Search: ${query}`, [{ rows: results.map(s => [s.name, s.description ?? '']) }]) + }) + ) + .catch(ctx.guardedErr) + + return + } + + if (sub === 'install') { + if (!query) { + return sys('usage: /skills install ') + } + + sys(`installing ${query}…`) + + rpc('skills.manage', { action: 'install', query }) + .then( + ctx.guarded(r => + sys(r.installed ? `installed ${r.name ?? 
query}` : 'install failed') + ) + ) + .catch(ctx.guardedErr) + + return + } + + if (sub === 'browse') { + const pageNum = parseInt(query, 10) || 1 + + rpc>('skills.manage', { action: 'browse', page: pageNum }) + .then( + ctx.guarded>(r => + page(JSON.stringify(r, null, 2).slice(0, 4000), `Browse Skills — p${pageNum}`) + ) + ) + .catch(ctx.guardedErr) + + return + } + + sys('usage: /skills [list | inspect | install | search | browse [page]]') } }, diff --git a/ui-tui/src/components/skillsHub.tsx b/ui-tui/src/components/skillsHub.tsx index 03ed3d92f37..877bb0ef384 100644 --- a/ui-tui/src/components/skillsHub.tsx +++ b/ui-tui/src/components/skillsHub.tsx @@ -89,10 +89,16 @@ export function SkillsHub({ gw, onClose, t }: SkillsHubProps) { } if (stage === 'actions') { - if (key.return || ch.toLowerCase() === 'x') { - if (skillName) { - install(skillName) - } + if (key.return) { + setStage('skill') + setInfo(null) + setErr('') + + return + } + + if (ch.toLowerCase() === 'x' && skillName) { + install(skillName) return } @@ -271,7 +277,7 @@ export function SkillsHub({ gw, onClose, t }: SkillsHubProps) { {err ? error: {err} : null} {installing ? installing… : null} - Enter install · i inspect · x install · Esc back + i reinspect · x reinstall · Enter/Esc back ) } From f8becbfbeab87b35424bf4c636a3b192a2072e5d Mon Sep 17 00:00:00 2001 From: Brooklyn Nicholson Date: Sat, 18 Apr 2026 09:48:38 -0500 Subject: [PATCH 037/143] feat(tui): per-language syntax highlighting in markdown code fences MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a minimal hand-rolled highlighter for ts/js/jsx/tsx, py, sh/bash, go, rust, json, yaml, sql. Recognizes whole-line comments, single/double/backtick strings, numbers, and per-language keyword sets. Unknown langs fall through to the current plain rendering; the existing diff-specific colorization is preserved. 
Closes the §8 "Markdown syntax highlighting is missing (only diff gets colored)" finding from the TUI v2 audit without pulling in a highlighter library. --- ui-tui/src/__tests__/syntax.test.ts | 45 +++++++++++ ui-tui/src/components/markdown.tsx | 18 +++++ ui-tui/src/lib/syntax.ts | 117 ++++++++++++++++++++++++++++ 3 files changed, 180 insertions(+) create mode 100644 ui-tui/src/__tests__/syntax.test.ts create mode 100644 ui-tui/src/lib/syntax.ts diff --git a/ui-tui/src/__tests__/syntax.test.ts b/ui-tui/src/__tests__/syntax.test.ts new file mode 100644 index 00000000000..505988b2abf --- /dev/null +++ b/ui-tui/src/__tests__/syntax.test.ts @@ -0,0 +1,45 @@ +import { describe, expect, it } from 'vitest' + +import { highlightLine, isHighlightable } from '../lib/syntax.js' +import { DEFAULT_THEME } from '../theme.js' + +const t = DEFAULT_THEME + +describe('syntax highlighter', () => { + it('recognizes supported langs and aliases', () => { + expect(isHighlightable('ts')).toBe(true) + expect(isHighlightable('js')).toBe(true) + expect(isHighlightable('python')).toBe(true) + expect(isHighlightable('rs')).toBe(true) + expect(isHighlightable('bash')).toBe(true) + expect(isHighlightable('whatever')).toBe(false) + expect(isHighlightable('')).toBe(false) + }) + + it('paints a whole-line comment dim', () => { + const tokens = highlightLine('// hello', 'ts', t) + + expect(tokens).toEqual([[t.color.dim, '// hello']]) + }) + + it('paints keywords, strings, and numbers in a ts line', () => { + const tokens = highlightLine(`const x = 'hi' + 42`, 'ts', t) + const colors = tokens.map(tok => tok[0]) + + expect(colors).toContain(t.color.bronze) // const + expect(colors).toContain(t.color.amber) // 'hi' + expect(colors).toContain(t.color.cornsilk) // 42 + }) + + it('falls through unchanged for unknown langs', () => { + const tokens = highlightLine(`const x = 1`, 'zzz', t) + + expect(tokens).toEqual([['', 'const x = 1']]) + }) + + it('treats `#` as a python comment, not a selector', () => { 
+ const tokens = highlightLine('# comment', 'py', t) + + expect(tokens).toEqual([[t.color.dim, '# comment']]) + }) +}) diff --git a/ui-tui/src/components/markdown.tsx b/ui-tui/src/components/markdown.tsx index 865ab857960..d43357b6918 100644 --- a/ui-tui/src/components/markdown.tsx +++ b/ui-tui/src/components/markdown.tsx @@ -1,6 +1,7 @@ import { Box, Text } from '@hermes/ink' import { memo, type ReactNode, useMemo } from 'react' +import { highlightLine, isHighlightable } from '../lib/syntax.js' import type { Theme } from '../theme.js' const FENCE_RE = /^\s*(`{3,}|~{3,})(.*)$/ @@ -282,11 +283,28 @@ function MdImpl({ compact, t, text }: MdProps) { start('code') const isDiff = lang === 'diff' + const highlighted = !isDiff && isHighlightable(lang) nodes.push( {lang && !isDiff && {'─ ' + lang}} {block.map((l, j) => { + if (highlighted) { + return ( + + {highlightLine(l, lang, t).map(([color, text], k) => + color ? ( + + {text} + + ) : ( + {text} + ) + )} + + ) + } + const add = isDiff && l.startsWith('+') const del = isDiff && l.startsWith('-') const hunk = isDiff && l.startsWith('@@') diff --git a/ui-tui/src/lib/syntax.ts b/ui-tui/src/lib/syntax.ts new file mode 100644 index 00000000000..06173b63e9f --- /dev/null +++ b/ui-tui/src/lib/syntax.ts @@ -0,0 +1,117 @@ +import type { Theme } from '../theme.js' + +export type Token = [string, string] + +interface LangSpec { + comment: null | string + keywords: Set +} + +const KW = (s: string) => new Set(s.split(/\s+/).filter(Boolean)) + +const TS = KW(` + abstract as async await break case catch class const continue debugger default delete do else enum export extends + false finally for from function get if implements import in instanceof interface is let new null of package private + protected public readonly return set static super switch this throw true try type typeof undefined var void while + with yield +`) + +const PY = KW(` + False None True and as assert async await break class continue def del elif else except 
finally for from global if + import in is lambda nonlocal not or pass raise return try while with yield +`) + +const SH = KW(` + if then else elif fi for in do done while until case esac function return break continue local export readonly + declare typeset +`) + +const GO = KW(` + break case chan const continue default defer else fallthrough for func go goto if import interface map package range + return select struct switch type var nil true false +`) + +const RUST = KW(` + as async await break const continue crate dyn else enum extern false fn for if impl in let loop match mod move mut + pub ref return self Self static struct super trait true type unsafe use where while yield +`) + +const SQL = KW(` + select from where and or not in is null as by group order limit offset insert into values update set delete create + table drop alter add column primary key foreign references join left right inner outer on +`) + +const LANGS: Record = { + go: { comment: '//', keywords: GO }, + json: { comment: null, keywords: KW('true false null') }, + py: { comment: '#', keywords: PY }, + rust: { comment: '//', keywords: RUST }, + sh: { comment: '#', keywords: SH }, + sql: { comment: '--', keywords: SQL }, + ts: { comment: '//', keywords: TS }, + yaml: { comment: '#', keywords: KW('true false null yes no on off') } +} + +const ALIAS: Record = { + bash: 'sh', + javascript: 'ts', + js: 'ts', + jsx: 'ts', + python: 'py', + rs: 'rust', + shell: 'sh', + tsx: 'ts', + typescript: 'ts', + yml: 'yaml', + zsh: 'sh' +} + +const resolve = (lang: string): LangSpec | null => LANGS[ALIAS[lang] ?? lang] ?? 
null + +export const isHighlightable = (lang: string): boolean => resolve(lang) !== null + +const TOKEN_RE = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|`(?:[^`\\]|\\.)*`|\b\d+(?:\.\d+)?\b|[A-Za-z_$][\w$]*/g + +export function highlightLine(line: string, lang: string, t: Theme): Token[] { + const spec = resolve(lang) + + if (!spec) { + return [['', line]] + } + + if (spec.comment && line.trimStart().startsWith(spec.comment)) { + return [[t.color.dim, line]] + } + + const tokens: Token[] = [] + let last = 0 + + for (const m of line.matchAll(TOKEN_RE)) { + const start = m.index ?? 0 + + if (start > last) { + tokens.push(['', line.slice(last, start)]) + } + + const tok = m[0] + const ch = tok[0]! + + if (ch === '"' || ch === "'" || ch === '`') { + tokens.push([t.color.amber, tok]) + } else if (ch >= '0' && ch <= '9') { + tokens.push([t.color.cornsilk, tok]) + } else if (spec.keywords.has(tok)) { + tokens.push([t.color.bronze, tok]) + } else { + tokens.push(['', tok]) + } + + last = start + tok.length + } + + if (last < line.length) { + tokens.push(['', line.slice(last)]) + } + + return tokens +} From 8a0c774e9efd771c317e6f158a080ea19267182b Mon Sep 17 00:00:00 2001 From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com> Date: Sat, 18 Apr 2026 08:25:39 -0700 Subject: [PATCH 038/143] Add web dashboard build to Nix flake (#12194) The web dashboard (Vite/React frontend) is now built as a separate Nix derivation and baked into the Hermes package. The build output is installed to a standard location and exposed via the `HERMES_WEB_DIST` environment variable, allowing the dashboard command to use pre-built assets when available (e.g., in packaged releases) instead of rebuilding on every invocation. 
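
Illustration (not part of the change): a small Python sketch of the lookup described above, assuming only what this patch shows, namely that the presence of `HERMES_WEB_DIST` both selects the asset directory and skips the rebuild. `resolve_web_dist` and `needs_build` are hypothetical helper names; the patch itself inlines these checks in hermes_cli/web_server.py and cmd_dashboard.

```python
from pathlib import Path


def resolve_web_dist(env, package_dir):
    """Pick the directory that holds the built dashboard assets.

    env is a mapping such as os.environ; package_dir stands in for
    Path(__file__).parent in web_server.py.
    """
    if "HERMES_WEB_DIST" in env:
        # Packaged install: the wrapper exported a pre-built directory.
        return Path(env["HERMES_WEB_DIST"])
    # Source checkout: fall back to web_dist next to the module.
    return Path(package_dir) / "web_dist"


def needs_build(env):
    """The dashboard command only rebuilds when no pre-built assets are named."""
    return "HERMES_WEB_DIST" not in env
```

Passing the environment as a parameter keeps the sketch testable without mutating `os.environ`.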
--- hermes_cli/main.py | 5 ++-- hermes_cli/web_server.py | 2 +- nix/packages.nix | 7 +++++ nix/web.nix | 63 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 74 insertions(+), 3 deletions(-) create mode 100644 nix/web.nix diff --git a/hermes_cli/main.py b/hermes_cli/main.py index a13a6f88ee9..ce02c2e72c4 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -6229,8 +6229,9 @@ def cmd_dashboard(args): print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'") sys.exit(1) - if not _build_web_ui(PROJECT_ROOT / "web", fatal=True): - sys.exit(1) + if "HERMES_WEB_DIST" not in os.environ: + if not _build_web_ui(PROJECT_ROOT / "web", fatal=True): + sys.exit(1) from hermes_cli.web_server import start_server diff --git a/hermes_cli/web_server.py b/hermes_cli/web_server.py index 0d0dc4a66b5..110b81e4b5e 100644 --- a/hermes_cli/web_server.py +++ b/hermes_cli/web_server.py @@ -59,7 +59,7 @@ except ImportError: f"Install with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'" ) -WEB_DIST = Path(__file__).parent / "web_dist" +WEB_DIST = Path(os.environ["HERMES_WEB_DIST"]) if "HERMES_WEB_DIST" in os.environ else Path(__file__).parent / "web_dist" _log = logging.getLogger(__name__) app = FastAPI(title="Hermes Agent", version=__version__) diff --git a/nix/packages.nix b/nix/packages.nix index 968ad12fb71..94e84af6d87 100644 --- a/nix/packages.nix +++ b/nix/packages.nix @@ -18,6 +18,10 @@ filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path); }; + hermesWeb = pkgs.callPackage ./web.nix { + npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default; + }; + runtimeDeps = with pkgs; [ nodejs_22 ripgrep @@ -52,6 +56,7 @@ mkdir -p $out/share/hermes-agent $out/bin cp -r ${bundledSkills} $out/share/hermes-agent/skills + cp -r ${hermesWeb} $out/share/hermes-agent/web_dist # copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/) mkdir -p $out/ui-tui @@ -62,6 +67,7 @@ makeWrapper ${hermesVenv}/bin/${name} 
$out/bin/${name} \ --suffix PATH : "${runtimePath}" \ --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \ + --set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \ --set HERMES_TUI_DIR $out/ui-tui \ --set HERMES_PYTHON ${hermesVenv}/bin/python3 \ --set HERMES_NODE ${pkgs.nodejs_22}/bin/node @@ -104,6 +110,7 @@ }; tui = hermesTui; + web = hermesWeb; }; }; } diff --git a/nix/web.nix b/nix/web.nix new file mode 100644 index 00000000000..247889753f6 --- /dev/null +++ b/nix/web.nix @@ -0,0 +1,63 @@ +# nix/web.nix — Hermes Web Dashboard (Vite/React) frontend build +{ pkgs, npm-lockfile-fix, ... }: +let + src = ../web; + npmDeps = pkgs.fetchNpmDeps { + inherit src; + hash = "sha256-Y0pOzdFG8BLjfvCLmsvqYpjxFjAQabXp1i7X9W/cCU4="; + }; + + npmLockHash = builtins.hashString "sha256" (builtins.readFile ../web/package-lock.json); +in +pkgs.buildNpmPackage { + pname = "hermes-web"; + version = "0.0.0"; + inherit src npmDeps; + + doCheck = false; + + buildPhase = '' + npx tsc -b + npx vite build --outDir dist + ''; + + installPhase = '' + runHook preInstall + cp -r dist $out + runHook postInstall + ''; + + nativeBuildInputs = [ + (pkgs.writeShellScriptBin "update_web_lockfile" '' + set -euox pipefail + + REPO_ROOT=$(git rev-parse --show-toplevel) + + cd "$REPO_ROOT/web" + rm -rf node_modules/ + npm cache clean --force + CI=true npm install + ${pkgs.lib.getExe npm-lockfile-fix} ./package-lock.json + + NIX_FILE="$REPO_ROOT/nix/web.nix" + sed -i "s/hash = \"[^\"]*\";/hash = \"\";/" $NIX_FILE + NIX_OUTPUT=$(nix build .#web 2>&1 || true) + NEW_HASH=$(echo "$NIX_OUTPUT" | grep 'got:' | awk '{print $2}') + echo got new hash $NEW_HASH + sed -i "s|hash = \"[^\"]*\";|hash = \"$NEW_HASH\";|" $NIX_FILE + nix build .#web + echo "Updated npm hash in $NIX_FILE to $NEW_HASH" + '') + ]; + + passthru.devShellHook = '' + STAMP=".nix-stamps/hermes-web" + STAMP_VALUE="${npmLockHash}" + if [ ! 
-f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then + echo "hermes-web: installing npm dependencies..." + cd web && CI=true npm install --silent --no-fund --no-audit 2>/dev/null && cd .. + mkdir -p .nix-stamps + echo "$STAMP_VALUE" > "$STAMP" + fi + ''; +} From b0efdf37d783e4e5345bc3687557a48b4504c1d3 Mon Sep 17 00:00:00 2001 From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com> Date: Sat, 18 Apr 2026 09:21:03 -0700 Subject: [PATCH 039/143] =?UTF-8?q?fix(nix):=20upgrade=20Python=203.11=20?= =?UTF-8?q?=E2=86=92=203.12,=20add=20cross-platform=20eval=20check=20(#122?= =?UTF-8?q?08)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- nix/checks.nix | 25 ++++++++++++++++++++++++- nix/devShell.nix | 2 +- nix/nixosModules.nix | 7 +++---- nix/packages.nix | 2 +- nix/python.nix | 20 +++++++++++--------- 5 files changed, 40 insertions(+), 16 deletions(-) diff --git a/nix/checks.nix b/nix/checks.nix index ff8e7947c57..984016a4f47 100644 --- a/nix/checks.nix +++ b/nix/checks.nix @@ -37,7 +37,30 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2) in { packages.configKeys = configKeys; - checks = lib.optionalAttrs pkgs.stdenv.hostPlatform.isLinux { + checks = { + # Cross-platform evaluation — catches "not supported for interpreter" + # errors (e.g. sphinx dropping python311) without needing a darwin builder. + # Evaluation is pure and instant; it doesn't build anything. + cross-eval = let + targetSystems = builtins.filter + (s: inputs.self.packages ? 
${s}) + [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" "x86_64-darwin" ]; + tryEvalPkg = sys: + let pkg = inputs.self.packages.${sys}.default; + in builtins.tryEval (builtins.seq pkg.drvPath true); + results = map (sys: { inherit sys; result = tryEvalPkg sys; }) targetSystems; + failures = builtins.filter (r: !r.result.success) results; + failMsg = lib.concatMapStringsSep "\n" (r: " - ${r.sys}") failures; + in pkgs.runCommand "hermes-cross-eval" { } ( + if failures != [] then + builtins.throw "Package fails to evaluate on:\n${failMsg}" + else '' + echo "PASS: package evaluates on all ${toString (builtins.length targetSystems)} platforms" + mkdir -p $out + echo "ok" > $out/result + '' + ); + } // lib.optionalAttrs pkgs.stdenv.hostPlatform.isLinux { # Verify binaries exist and are executable package-contents = pkgs.runCommand "hermes-package-contents" { } '' set -e diff --git a/nix/devShell.nix b/nix/devShell.nix index db39c9d9557..63edc59cf1e 100644 --- a/nix/devShell.nix +++ b/nix/devShell.nix @@ -12,7 +12,7 @@ devShells.default = pkgs.mkShell { inputsFrom = packages; packages = with pkgs; [ - python311 uv nodejs_22 ripgrep git openssh ffmpeg + python312 uv nodejs_22 ripgrep git openssh ffmpeg ]; shellHook = let diff --git a/nix/nixosModules.nix b/nix/nixosModules.nix index 24a2a1b6ddc..3f2709f8145 100644 --- a/nix/nixosModules.nix +++ b/nix/nixosModules.nix @@ -148,15 +148,14 @@ su -s /bin/sh "$TARGET_USER" -c 'curl -LsSf https://astral.sh/uv/install.sh | sh' || true fi - # Python 3.11 venv — gives the agent a writable Python with pip. - # Uses uv to install Python 3.11 (Ubuntu 24.04 ships 3.12). + # Python 3.12 venv — gives the agent a writable Python with pip. # --seed includes pip/setuptools so bare `pip install` works. _UV_BIN="$TARGET_HOME/.local/bin/uv" if [ ! 
-d "$TARGET_HOME/.venv" ] && [ -x "$_UV_BIN" ]; then su -s /bin/sh "$TARGET_USER" -c " export PATH=\"\$HOME/.local/bin:\$PATH\" - uv python install 3.11 - uv venv --python 3.11 --seed \"\$HOME/.venv\" + uv python install 3.12 + uv venv --python 3.12 --seed \"\$HOME/.venv\" " || true fi diff --git a/nix/packages.nix b/nix/packages.nix index 94e84af6d87..912be7843bd 100644 --- a/nix/packages.nix +++ b/nix/packages.nix @@ -87,7 +87,7 @@ STAMP_VALUE="${pyprojectHash}:${uvLockHash}" if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then echo "hermes-agent: installing Python dependencies..." - uv venv .venv --python ${pkgs.python311}/bin/python3 2>/dev/null || true + uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true source .venv/bin/activate uv pip install -e ".[all]" [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true diff --git a/nix/python.nix b/nix/python.nix index 91411f4d754..0bcd017e76d 100644 --- a/nix/python.nix +++ b/nix/python.nix @@ -1,6 +1,6 @@ # nix/python.nix — uv2nix virtual environment builder { - python311, + python312, lib, callPackage, uv2nix, @@ -51,28 +51,30 @@ let pythonPackageOverrides = final: _prev: if isAarch64Darwin then { - numpy = mkPrebuiltOverride final python311.pkgs.numpy { }; + numpy = mkPrebuiltOverride final python312.pkgs.numpy { }; - av = mkPrebuiltOverride final python311.pkgs.av { }; + pyarrow = mkPrebuiltOverride final python312.pkgs.pyarrow { }; - humanfriendly = mkPrebuiltOverride final python311.pkgs.humanfriendly { }; + av = mkPrebuiltOverride final python312.pkgs.av { }; - coloredlogs = mkPrebuiltOverride final python311.pkgs.coloredlogs { + humanfriendly = mkPrebuiltOverride final python312.pkgs.humanfriendly { }; + + coloredlogs = mkPrebuiltOverride final python312.pkgs.coloredlogs { humanfriendly = [ ]; }; - onnxruntime = mkPrebuiltOverride final python311.pkgs.onnxruntime { + onnxruntime = mkPrebuiltOverride final python312.pkgs.onnxruntime { coloredlogs = 
[ ]; numpy = [ ]; packaging = [ ]; }; - ctranslate2 = mkPrebuiltOverride final python311.pkgs.ctranslate2 { + ctranslate2 = mkPrebuiltOverride final python312.pkgs.ctranslate2 { numpy = [ ]; pyyaml = [ ]; }; - faster-whisper = mkPrebuiltOverride final python311.pkgs.faster-whisper { + faster-whisper = mkPrebuiltOverride final python312.pkgs.faster-whisper { av = [ ]; ctranslate2 = [ ]; huggingface-hub = [ ]; @@ -84,7 +86,7 @@ let pythonSet = (callPackage pyproject-nix.build.packages { - python = python311; + python = python312; }).overrideScope (lib.composeManyExtensions [ pyproject-build-systems.overlays.default From 2da558ec36ea7c8743f0e686488af57da8be1634 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sat, 18 Apr 2026 17:36:06 +0530 Subject: [PATCH 040/143] fix(tui): clickable hyperlinks and skill slash command dispatch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two TUI fixes: 1. Hyperlinks are now clickable (Cmd+Click / Ctrl+Click) in terminals that support OSC 8. The markdown renderer was rendering links as plain colored text — now wraps them in the existing component from @hermes/ink which emits OSC 8 escape sequences. 2. Skill slash commands (e.g. /hermes-agent-dev) now work in the TUI. The slash.exec handler was delegating to the _SlashWorker subprocess which calls cli.process_command(). For skills, process_command() queues the invocation message onto _pending_input — a Queue that nobody reads in the worker subprocess. The skill message was lost. Now slash.exec detects skill commands early and rejects them so the TUI falls through to command.dispatch, which correctly builds and returns the skill payload for the client to send(). 
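
Illustration (not part of the change): a minimal Python sketch of the early rejection described in point 2, assuming a plain dict of known skill commands in place of `scan_skill_commands()`. `skill_command_key` and `slash_exec_precheck` are hypothetical names; the error code 4018 and the message text are the ones used in the server.py hunk of this patch.

```python
def skill_command_key(cmd):
    """Normalise 'name args' or '/name args' to '/name'; None if there is no name."""
    parts = cmd.lstrip("/").split()
    return f"/{parts[0]}" if parts else None


def slash_exec_precheck(cmd, skill_commands):
    """Return an error payload for skill commands, or None to let slash.exec proceed.

    skill_commands is a hypothetical stand-in for the mapping returned by
    scan_skill_commands(), keyed by '/<skill-name>'.
    """
    key = skill_command_key(cmd)
    if key is not None and key in skill_commands:
        # Rejecting here lets the client fall through to command.dispatch.
        return {"code": 4018, "message": f"skill command: use command.dispatch for {key}"}
    return None
```

On the client side, the rejection is what triggers the fallback: the rejected slash.exec promise leads to a command.dispatch request, whose `message` field is then passed to send().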
--- tests/tui_gateway/test_protocol.py | 48 +++++++++++++++++++ tui_gateway/server.py | 13 +++++ .../src/__tests__/createSlashHandler.test.ts | 31 ++++++++++++ ui-tui/src/components/markdown.tsx | 22 +++++---- ui-tui/src/types/hermes-ink.d.ts | 1 + 5 files changed, 106 insertions(+), 9 deletions(-) diff --git a/tests/tui_gateway/test_protocol.py b/tests/tui_gateway/test_protocol.py index 6ee5fe65b65..77cd7b1678d 100644 --- a/tests/tui_gateway/test_protocol.py +++ b/tests/tui_gateway/test_protocol.py @@ -231,3 +231,51 @@ def test_cli_exec_blocked(server, argv): ]) def test_cli_exec_allowed(server, argv): assert server._cli_exec_blocked(argv) is None + + +# ── slash.exec skill command interception ──────────────────────────── + + +def test_slash_exec_rejects_skill_commands(server): + """slash.exec must reject skill commands so the TUI falls through to command.dispatch.""" + # Register a mock session + sid = "test-session" + server._sessions[sid] = {"session_key": sid, "agent": None} + + # Mock scan_skill_commands to return a known skill + fake_skills = {"/hermes-agent-dev": {"name": "hermes-agent-dev", "description": "Dev workflow"}} + + with patch("agent.skill_commands.scan_skill_commands", return_value=fake_skills): + resp = server.handle_request({ + "id": "r1", + "method": "slash.exec", + "params": {"command": "hermes-agent-dev", "session_id": sid}, + }) + + # Should return an error so the TUI's .catch() fires command.dispatch + assert "error" in resp + assert resp["error"]["code"] == 4018 + assert "skill command" in resp["error"]["message"] + + +def test_command_dispatch_returns_skill_payload(server): + """command.dispatch returns structured skill payload for the TUI to send().""" + sid = "test-session" + server._sessions[sid] = {"session_key": sid} + + fake_skills = {"/hermes-agent-dev": {"name": "hermes-agent-dev", "description": "Dev workflow"}} + fake_msg = "Loaded skill content here" + + with patch("agent.skill_commands.scan_skill_commands", 
return_value=fake_skills), \ + patch("agent.skill_commands.build_skill_invocation_message", return_value=fake_msg): + resp = server.handle_request({ + "id": "r2", + "method": "command.dispatch", + "params": {"name": "hermes-agent-dev", "session_id": sid}, + }) + + assert "error" not in resp + result = resp["result"] + assert result["type"] == "skill" + assert result["message"] == fake_msg + assert result["name"] == "hermes-agent-dev" diff --git a/tui_gateway/server.py b/tui_gateway/server.py index a7dae9e5c60..45c95a6dabe 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -2333,6 +2333,19 @@ def _(rid, params: dict) -> dict: if not cmd: return _err(rid, 4004, "empty command") + # Skill slash commands (e.g. /hermes-agent-dev) must NOT go through the + # slash worker — process_command() queues the skill payload onto + # _pending_input which nobody reads in the worker subprocess. Reject + # here so the TUI falls through to command.dispatch which handles skills + # correctly (builds the invocation message and returns it to the client). 
+ try: + from agent.skill_commands import scan_skill_commands + _cmd_key = f"/{cmd.split()[0]}" if not cmd.startswith("/") else f"/{cmd.lstrip('/').split()[0]}" + if _cmd_key in scan_skill_commands(): + return _err(rid, 4018, f"skill command: use command.dispatch for {_cmd_key}") + except Exception: + pass + worker = session.get("slash_worker") if not worker: try: diff --git a/ui-tui/src/__tests__/createSlashHandler.test.ts b/ui-tui/src/__tests__/createSlashHandler.test.ts index 9e1db994634..a8f050a27da 100644 --- a/ui-tui/src/__tests__/createSlashHandler.test.ts +++ b/ui-tui/src/__tests__/createSlashHandler.test.ts @@ -121,6 +121,37 @@ describe('createSlashHandler', () => { expect(createSlashHandler(ctx)('/h')).toBe(true) expect(ctx.transcript.panel).toHaveBeenCalledWith(expect.any(String), expect.any(Array)) }) + + it('falls through to command.dispatch for skill commands and sends the message', async () => { + const skillMessage = 'Use this skill to do X.\n\n## Steps\n1. First step' + + const ctx = buildCtx({ + gateway: { + gw: { + getLogTail: vi.fn(() => ''), + request: vi.fn((method: string) => { + if (method === 'slash.exec') { + return Promise.reject(new Error('skill command: use command.dispatch')) + } + + if (method === 'command.dispatch') { + return Promise.resolve({ type: 'skill', message: skillMessage, name: 'hermes-agent-dev' }) + } + + return Promise.resolve({}) + }) + }, + rpc: vi.fn(() => Promise.resolve({})) + } + }) + + const h = createSlashHandler(ctx) + expect(h('/hermes-agent-dev')).toBe(true) + await vi.waitFor(() => { + expect(ctx.transcript.sys).toHaveBeenCalledWith('⚡ loading skill: hermes-agent-dev') + }) + expect(ctx.transcript.send).toHaveBeenCalledWith(skillMessage) + }) }) const buildCtx = (overrides: Partial = {}): Ctx => ({ diff --git a/ui-tui/src/components/markdown.tsx b/ui-tui/src/components/markdown.tsx index 865ab857960..4555c8505f6 100644 --- a/ui-tui/src/components/markdown.tsx +++ b/ui-tui/src/components/markdown.tsx @@ -1,4 
+1,4 @@ -import { Box, Text } from '@hermes/ink' +import { Box, Link, Text } from '@hermes/ink' import { memo, type ReactNode, useMemo } from 'react' import type { Theme } from '../theme.js' @@ -22,10 +22,12 @@ type Fence = { len: number } -const renderLink = (key: number, t: Theme, label: string) => ( - - {label} - +const renderLink = (key: number, t: Theme, label: string, url: string) => ( + + + {label} + + ) const trimBareUrl = (value: string) => { @@ -38,9 +40,11 @@ const trimBareUrl = (value: string) => { } const renderAutolink = (key: number, t: Theme, raw: string) => ( - - {raw.replace(/^mailto:/, '')} - + + + {raw.replace(/^mailto:/, '')} + + ) const indentDepth = (indent: string) => Math.floor(indent.replace(/\t/g, ' ').length / 2) @@ -141,7 +145,7 @@ function MdInline({ t, text }: { t: Theme; text: string }) { ) } else if (m[4] && m[5]) { - parts.push(renderLink(parts.length, t, m[4])) + parts.push(renderLink(parts.length, t, m[4], m[5])) } else if (m[6]) { parts.push(renderAutolink(parts.length, t, m[6])) } else if (m[7]) { diff --git a/ui-tui/src/types/hermes-ink.d.ts b/ui-tui/src/types/hermes-ink.d.ts index 9b2deec35ff..faab71ae93d 100644 --- a/ui-tui/src/types/hermes-ink.d.ts +++ b/ui-tui/src/types/hermes-ink.d.ts @@ -63,6 +63,7 @@ declare module '@hermes/ink' { export const Box: React.ComponentType export const AlternateScreen: React.ComponentType export const Ansi: React.ComponentType + export const Link: React.ComponentType<{ readonly url: string; readonly children?: React.ReactNode; readonly fallback?: React.ReactNode }> export const NoSelect: React.ComponentType export const ScrollBox: React.ComponentType export const Text: React.ComponentType From abc95338c210a587c2b718d62a02dbf9c87076d1 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sat, 18 Apr 2026 17:52:19 +0530 Subject: [PATCH 041/143] fix(tui): slash.exec _pending_input commands, tool ANSI, terminal title MIME-Version: 1.0 Content-Type: 
text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Additional TUI fixes discovered in the same audit:

1. /plan slash command was silently lost — process_command() queues the
   plan skill invocation onto _pending_input which nobody reads in the
   slash worker subprocess. Now intercepted in slash.exec and routed
   through command.dispatch with a new 'send' dispatch type.

   Same interception added for /retry, /queue, /steer as safety nets
   (these already have correct TUI-local handlers in core.ts, but the
   server-side guard prevents regressions if the local handler is
   bypassed).

2. Tool results were stripping ANSI escape codes — the messageLine
   component used stripAnsi() + plain for tool role messages, losing all
   color/styling from terminal, search_files, etc. Now uses component
   (already imported) when ANSI is detected.

3. Terminal tab title now shows model + busy status via useTerminalTitle
   hook from @hermes/ink (was never used). Users can identify Hermes
   tabs and see at a glance whether the agent is busy or ready.

4. Added 'send' variant to CommandDispatchResponse type +
   asCommandDispatch parser + createSlashHandler handler for commands
   that need to inject a message into the conversation (plan, queue
   fallback, steer fallback).
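
For illustration only, the routing rule from item 1 can be sketched roughly as follows. The command names come from this patch's _PENDING_INPUT_COMMANDS set; `command_base` and `route_slash_command` are hypothetical helpers, not functions in tui_gateway/server.py. The real slash.exec handler rejects these commands with error 4018 and the TUI then falls through to command.dispatch, and it also rejects skill commands, which this sketch leaves out.

```python
# Command names taken from the patch's _PENDING_INPUT_COMMANDS set.
PENDING_INPUT_COMMANDS = frozenset({"retry", "queue", "q", "steer", "plan"})


def command_base(cmd: str) -> str:
    """Return the first word of a slash command, without leading slashes."""
    parts = cmd.lstrip("/").split()
    return parts[0] if parts else ""


def route_slash_command(cmd: str) -> str:
    """Hypothetical helper: name the RPC method that ends up handling cmd."""
    if command_base(cmd) in PENDING_INPUT_COMMANDS:
        # These commands would queue onto _pending_input, which the slash
        # worker never reads, so they must be handled by command.dispatch.
        return "command.dispatch"
    return "slash.exec"
```

Under these assumptions, `/plan create a REST API` and `q hello` resolve to command.dispatch, while an ordinary command such as `/help` stays on slash.exec.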
--- tests/tui_gateway/test_protocol.py | 66 +++++++++++++++++++ tui_gateway/server.py | 66 ++++++++++++++++++- .../src/__tests__/asCommandDispatch.test.ts | 8 ++- .../src/__tests__/createSlashHandler.test.ts | 30 +++++++++ ui-tui/src/app/createSlashHandler.ts | 4 ++ ui-tui/src/app/useMainApp.ts | 9 ++- ui-tui/src/components/messageLine.tsx | 14 ++-- ui-tui/src/gatewayTypes.ts | 1 + ui-tui/src/lib/rpc.ts | 4 ++ ui-tui/src/types/hermes-ink.d.ts | 1 + 10 files changed, 196 insertions(+), 7 deletions(-) diff --git a/tests/tui_gateway/test_protocol.py b/tests/tui_gateway/test_protocol.py index 77cd7b1678d..43f2b5a169b 100644 --- a/tests/tui_gateway/test_protocol.py +++ b/tests/tui_gateway/test_protocol.py @@ -258,6 +258,72 @@ def test_slash_exec_rejects_skill_commands(server): assert "skill command" in resp["error"]["message"] +@pytest.mark.parametrize("cmd", ["retry", "queue hello", "q hello", "steer fix the test", "plan"]) +def test_slash_exec_rejects_pending_input_commands(server, cmd): + """slash.exec must reject commands that use _pending_input in the CLI.""" + sid = "test-session" + server._sessions[sid] = {"session_key": sid, "agent": None} + + resp = server.handle_request({ + "id": "r1", + "method": "slash.exec", + "params": {"command": cmd, "session_id": sid}, + }) + + assert "error" in resp + assert resp["error"]["code"] == 4018 + assert "pending-input command" in resp["error"]["message"] + + +def test_command_dispatch_queue_sends_message(server): + """command.dispatch /queue returns {type: 'send', message: ...} for the TUI.""" + sid = "test-session" + server._sessions[sid] = {"session_key": sid} + + resp = server.handle_request({ + "id": "r1", + "method": "command.dispatch", + "params": {"name": "queue", "arg": "tell me about quantum computing", "session_id": sid}, + }) + + assert "error" not in resp + result = resp["result"] + assert result["type"] == "send" + assert result["message"] == "tell me about quantum computing" + + +def 
test_command_dispatch_queue_requires_arg(server): + """command.dispatch /queue without an argument returns an error.""" + sid = "test-session" + server._sessions[sid] = {"session_key": sid} + + resp = server.handle_request({ + "id": "r2", + "method": "command.dispatch", + "params": {"name": "queue", "arg": "", "session_id": sid}, + }) + + assert "error" in resp + assert resp["error"]["code"] == 4004 + + +def test_command_dispatch_steer_fallback_sends_message(server): + """command.dispatch /steer with no active agent falls back to send.""" + sid = "test-session" + server._sessions[sid] = {"session_key": sid, "agent": None} + + resp = server.handle_request({ + "id": "r3", + "method": "command.dispatch", + "params": {"name": "steer", "arg": "focus on testing", "session_id": sid}, + }) + + assert "error" not in resp + result = resp["result"] + assert result["type"] == "send" + assert result["message"] == "focus on testing" + + def test_command_dispatch_returns_skill_payload(server): """command.dispatch returns structured skill payload for the TUI to send().""" sid = "test-session" diff --git a/tui_gateway/server.py b/tui_gateway/server.py index 45c95a6dabe..bf8425a8d1a 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -2117,6 +2117,56 @@ def _(rid, params: dict) -> dict: except Exception: pass + # ── Commands that queue messages onto _pending_input in the CLI ─── + # In the TUI the slash worker subprocess has no reader for that queue, + # so we handle them here and return a structured payload. 
+ + if name in ("queue", "q"): + if not arg: + return _err(rid, 4004, "usage: /queue ") + return _ok(rid, {"type": "send", "message": arg}) + + if name == "retry": + agent = session.get("agent") if session else None + if agent and hasattr(agent, "conversation_history"): + hist = agent.conversation_history or [] + for m in reversed(hist): + if m.get("role") == "user": + content = m.get("content", "") + if isinstance(content, list): + content = " ".join( + p.get("text", "") for p in content if isinstance(p, dict) and p.get("type") == "text" + ) + if content: + return _ok(rid, {"type": "send", "message": content}) + return _err(rid, 4018, "no previous user message to retry") + return _err(rid, 4018, "no active session to retry") + + if name == "steer": + if not arg: + return _err(rid, 4004, "usage: /steer ") + agent = session.get("agent") if session else None + if agent and hasattr(agent, "steer"): + try: + accepted = agent.steer(arg) + if accepted: + return _ok(rid, {"type": "exec", "output": f"⏩ Steer queued — arrives after the next tool call: {arg[:80]}{'...' if len(arg) > 80 else ''}"}) + except Exception: + pass + # Fallback: no active run, treat as next-turn message + return _ok(rid, {"type": "send", "message": arg}) + + if name == "plan": + try: + from agent.skill_commands import build_skill_invocation_message as _bsim, build_plan_path + plan_path = build_plan_path(session.get("session_key", "") if session else "") + msg = _bsim("/plan", f"{arg} {plan_path}".strip() if arg else plan_path, + task_id=session.get("session_key", "") if session else "") + if msg: + return _ok(rid, {"type": "send", "message": msg}) + except Exception as e: + return _err(rid, 5030, f"plan skill failed: {e}") + return _err(rid, 4018, f"not a quick/plugin/skill command: {name}") @@ -2338,9 +2388,23 @@ def _(rid, params: dict) -> dict: # _pending_input which nobody reads in the worker subprocess. 
Reject # here so the TUI falls through to command.dispatch which handles skills # correctly (builds the invocation message and returns it to the client). + # + # The same applies to /retry, /queue, /steer, and /plan — they all + # put messages on _pending_input that the slash worker never reads. + # (/browser connect/disconnect also uses _pending_input for context + # notes, but the actual browser operations need the slash worker's + # env-var side effects, so they stay in slash.exec — only the context + # note to the model is lost, which is low-severity.) + _PENDING_INPUT_COMMANDS = frozenset({"retry", "queue", "q", "steer", "plan"}) + _cmd_parts = cmd.split() if not cmd.startswith("/") else cmd.lstrip("/").split() + _cmd_base = _cmd_parts[0] if _cmd_parts else "" + + if _cmd_base in _PENDING_INPUT_COMMANDS: + return _err(rid, 4018, f"pending-input command: use command.dispatch for /{_cmd_base}") + try: from agent.skill_commands import scan_skill_commands - _cmd_key = f"/{cmd.split()[0]}" if not cmd.startswith("/") else f"/{cmd.lstrip('/').split()[0]}" + _cmd_key = f"/{_cmd_base}" if _cmd_key in scan_skill_commands(): return _err(rid, 4018, f"skill command: use command.dispatch for {_cmd_key}") except Exception: diff --git a/ui-tui/src/__tests__/asCommandDispatch.test.ts b/ui-tui/src/__tests__/asCommandDispatch.test.ts index 49ea56936c5..dfa7595174e 100644 --- a/ui-tui/src/__tests__/asCommandDispatch.test.ts +++ b/ui-tui/src/__tests__/asCommandDispatch.test.ts @@ -3,7 +3,7 @@ import { describe, expect, it } from 'vitest' import { asCommandDispatch } from '../lib/rpc.js' describe('asCommandDispatch', () => { - it('parses exec, alias, and skill', () => { + it('parses exec, alias, skill, and send', () => { expect(asCommandDispatch({ type: 'exec', output: 'hi' })).toEqual({ type: 'exec', output: 'hi' }) expect(asCommandDispatch({ type: 'alias', target: 'help' })).toEqual({ type: 'alias', target: 'help' }) expect(asCommandDispatch({ type: 'skill', name: 'x', message: 
'do' })).toEqual({ @@ -11,11 +11,17 @@ describe('asCommandDispatch', () => { name: 'x', message: 'do' }) + expect(asCommandDispatch({ type: 'send', message: 'hello world' })).toEqual({ + type: 'send', + message: 'hello world' + }) }) it('rejects malformed payloads', () => { expect(asCommandDispatch(null)).toBeNull() expect(asCommandDispatch({ type: 'alias' })).toBeNull() expect(asCommandDispatch({ type: 'skill', name: 1 })).toBeNull() + expect(asCommandDispatch({ type: 'send' })).toBeNull() + expect(asCommandDispatch({ type: 'send', message: 42 })).toBeNull() }) }) diff --git a/ui-tui/src/__tests__/createSlashHandler.test.ts b/ui-tui/src/__tests__/createSlashHandler.test.ts index a8f050a27da..53a10fd8e02 100644 --- a/ui-tui/src/__tests__/createSlashHandler.test.ts +++ b/ui-tui/src/__tests__/createSlashHandler.test.ts @@ -152,6 +152,36 @@ describe('createSlashHandler', () => { }) expect(ctx.transcript.send).toHaveBeenCalledWith(skillMessage) }) + + it('handles send-type dispatch for /plan command', async () => { + const planMessage = 'Plan skill content loaded' + + const ctx = buildCtx({ + gateway: { + gw: { + getLogTail: vi.fn(() => ''), + request: vi.fn((method: string) => { + if (method === 'slash.exec') { + return Promise.reject(new Error('pending-input command')) + } + + if (method === 'command.dispatch') { + return Promise.resolve({ type: 'send', message: planMessage }) + } + + return Promise.resolve({}) + }) + }, + rpc: vi.fn(() => Promise.resolve({})) + } + }) + + const h = createSlashHandler(ctx) + expect(h('/plan create a REST API')).toBe(true) + await vi.waitFor(() => { + expect(ctx.transcript.send).toHaveBeenCalledWith(planMessage) + }) + }) }) const buildCtx = (overrides: Partial = {}): Ctx => ({ diff --git a/ui-tui/src/app/createSlashHandler.ts b/ui-tui/src/app/createSlashHandler.ts index 87475341aea..425e778ef3d 100644 --- a/ui-tui/src/app/createSlashHandler.ts +++ b/ui-tui/src/app/createSlashHandler.ts @@ -105,6 +105,10 @@ export function 
createSlashHandler(ctx: SlashHandlerContext): (cmd: string) => b return d.message?.trim() ? send(d.message) : sys(`/${parsed.name}: skill payload missing message`) } + + if (d.type === 'send') { + return d.message?.trim() ? send(d.message) : sys(`/${parsed.name}: empty message`) + } }) .catch(guardedErr) }) diff --git a/ui-tui/src/app/useMainApp.ts b/ui-tui/src/app/useMainApp.ts index 73ea9febdac..46ab21c725a 100644 --- a/ui-tui/src/app/useMainApp.ts +++ b/ui-tui/src/app/useMainApp.ts @@ -1,4 +1,4 @@ -import { type ScrollBoxHandle, useApp, useHasSelection, useSelection, useStdout } from '@hermes/ink' +import { type ScrollBoxHandle, useApp, useHasSelection, useSelection, useStdout, useTerminalTitle } from '@hermes/ink' import { useStore } from '@nanostores/react' import { useCallback, useEffect, useMemo, useRef, useState } from 'react' @@ -284,6 +284,13 @@ export function useMainApp(gw: GatewayClient) { useConfigSync({ gw, setBellOnComplete, setVoiceEnabled, sid: ui.sid }) + // ── Terminal tab title ───────────────────────────────────────────── + // Show model name + status so users can identify the Hermes tab. + const shortModel = ui.info?.model?.replace(/^.*\//, '') ?? '' + const titleStatus = ui.busy ? '⏳' : '✓' + const terminalTitle = shortModel ? `${titleStatus} ${shortModel} — Hermes` : 'Hermes' + useTerminalTitle(terminalTitle) + useEffect(() => { if (!ui.sid || !stdout) { return diff --git a/ui-tui/src/components/messageLine.tsx b/ui-tui/src/components/messageLine.tsx index 59db604e4bd..9cf78c15901 100644 --- a/ui-tui/src/components/messageLine.tsx +++ b/ui-tui/src/components/messageLine.tsx @@ -28,12 +28,18 @@ export const MessageLine = memo(function MessageLine({ } if (msg.role === 'tool') { + const preview = compactPreview(hasAnsi(msg.text) ? stripAnsi(msg.text) : msg.text, Math.max(24, cols - 14)) || + '(empty tool result)' + return ( - - {compactPreview(hasAnsi(msg.text) ? 
stripAnsi(msg.text) : msg.text, Math.max(24, cols - 14)) || - '(empty tool result)'} - + {hasAnsi(msg.text) ? ( + {compactPreview(msg.text, Math.max(24, cols - 14)) || '(empty tool result)'} + ) : ( + + {preview} + + )} ) } diff --git a/ui-tui/src/gatewayTypes.ts b/ui-tui/src/gatewayTypes.ts index c8d1c685523..e17e0e7c718 100644 --- a/ui-tui/src/gatewayTypes.ts +++ b/ui-tui/src/gatewayTypes.ts @@ -47,6 +47,7 @@ export type CommandDispatchResponse = | { output?: string; type: 'exec' | 'plugin' } | { target: string; type: 'alias' } | { message?: string; name: string; type: 'skill' } + | { message: string; type: 'send' } // ── Config ─────────────────────────────────────────────────────────── diff --git a/ui-tui/src/lib/rpc.ts b/ui-tui/src/lib/rpc.ts index 1697d142bbf..70faa4bbbe1 100644 --- a/ui-tui/src/lib/rpc.ts +++ b/ui-tui/src/lib/rpc.ts @@ -26,6 +26,10 @@ export const asCommandDispatch = (value: unknown): CommandDispatchResponse | nul return { type: 'skill', name: o.name, message: typeof o.message === 'string' ? 
o.message : undefined } } + if (t === 'send' && typeof o.message === 'string') { + return { type: 'send', message: o.message } + } + return null } diff --git a/ui-tui/src/types/hermes-ink.d.ts b/ui-tui/src/types/hermes-ink.d.ts index faab71ae93d..6815e4211b7 100644 --- a/ui-tui/src/types/hermes-ink.d.ts +++ b/ui-tui/src/types/hermes-ink.d.ts @@ -93,6 +93,7 @@ declare module '@hermes/ink' { export function useHasSelection(): boolean export function useStdout(): { readonly stdout?: NodeJS.WriteStream } export function useTerminalFocus(): boolean + export function useTerminalTitle(title: string | null): void export function useDeclaredCursor(args: { readonly line: number readonly column: number From 656c375855f7ec331c43d4c796881b02ed2a5218 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sat, 18 Apr 2026 21:54:24 +0530 Subject: [PATCH 042/143] =?UTF-8?q?fix(tui):=20review=20follow-up=20?= =?UTF-8?q?=E2=80=94=20/retry,=20/plan,=20ANSI=20truncation,=20caching?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - /retry: use session['history'] instead of non-existent agent.conversation_history; truncate history at last user message to match CLI retry_last() behavior; add history_lock safety - /plan: pass user instruction (arg) to build_plan_path instead of session_key; add runtime_note so agent knows where to save the plan - ANSI tool results: render full text via instead of slicing raw ANSI through compactPreview (which cuts mid-escape-sequence producing garbled output) - Move _PENDING_INPUT_COMMANDS frozenset to module level - Use get_skill_commands() (cached) instead of scan_skill_commands() (rescans disk) in slash.exec skill interception - Add 3 retry tests: happy path with history truncation verification, empty history error, multipart content extraction - Update test mock target from scan_skill_commands to get_skill_commands --- tests/tui_gateway/test_protocol.py | 86 
++++++++++++++++++++++++++- tui_gateway/server.py | 73 ++++++++++++++--------- ui-tui/src/components/messageLine.tsx | 7 ++- 3 files changed, 135 insertions(+), 31 deletions(-) diff --git a/tests/tui_gateway/test_protocol.py b/tests/tui_gateway/test_protocol.py index 43f2b5a169b..eb51cccfecb 100644 --- a/tests/tui_gateway/test_protocol.py +++ b/tests/tui_gateway/test_protocol.py @@ -245,7 +245,7 @@ def test_slash_exec_rejects_skill_commands(server): # Mock scan_skill_commands to return a known skill fake_skills = {"/hermes-agent-dev": {"name": "hermes-agent-dev", "description": "Dev workflow"}} - with patch("agent.skill_commands.scan_skill_commands", return_value=fake_skills): + with patch("agent.skill_commands.get_skill_commands", return_value=fake_skills): resp = server.handle_request({ "id": "r1", "method": "slash.exec", @@ -324,6 +324,90 @@ def test_command_dispatch_steer_fallback_sends_message(server): assert result["message"] == "focus on testing" +def test_command_dispatch_retry_finds_last_user_message(server): + """command.dispatch /retry walks session['history'] to find the last user message.""" + sid = "test-session" + history = [ + {"role": "user", "content": "first question"}, + {"role": "assistant", "content": "first answer"}, + {"role": "user", "content": "second question"}, + {"role": "assistant", "content": "second answer"}, + ] + server._sessions[sid] = { + "session_key": sid, + "agent": None, + "history": history, + "history_lock": threading.Lock(), + "history_version": 0, + } + + resp = server.handle_request({ + "id": "r4", + "method": "command.dispatch", + "params": {"name": "retry", "session_id": sid}, + }) + + assert "error" not in resp + result = resp["result"] + assert result["type"] == "send" + assert result["message"] == "second question" + # Verify history was truncated: everything from last user message onward removed + assert len(server._sessions[sid]["history"]) == 2 + assert server._sessions[sid]["history"][-1]["role"] == "assistant" 
+ assert server._sessions[sid]["history_version"] == 1 + + +def test_command_dispatch_retry_empty_history(server): + """command.dispatch /retry with empty history returns error.""" + sid = "test-session" + server._sessions[sid] = { + "session_key": sid, + "agent": None, + "history": [], + "history_lock": threading.Lock(), + "history_version": 0, + } + + resp = server.handle_request({ + "id": "r5", + "method": "command.dispatch", + "params": {"name": "retry", "session_id": sid}, + }) + + assert "error" in resp + assert resp["error"]["code"] == 4018 + + +def test_command_dispatch_retry_handles_multipart_content(server): + """command.dispatch /retry extracts text from multipart content lists.""" + sid = "test-session" + history = [ + {"role": "user", "content": [ + {"type": "text", "text": "analyze this"}, + {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}} + ]}, + {"role": "assistant", "content": "I see the image."}, + ] + server._sessions[sid] = { + "session_key": sid, + "agent": None, + "history": history, + "history_lock": threading.Lock(), + "history_version": 0, + } + + resp = server.handle_request({ + "id": "r6", + "method": "command.dispatch", + "params": {"name": "retry", "session_id": sid}, + }) + + assert "error" not in resp + result = resp["result"] + assert result["type"] == "send" + assert result["message"] == "analyze this" + + def test_command_dispatch_returns_skill_payload(server): """command.dispatch returns structured skill payload for the TUI to send().""" sid = "test-session" diff --git a/tui_gateway/server.py b/tui_gateway/server.py index bf8425a8d1a..ccb9f7260bf 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -1949,6 +1949,13 @@ _TUI_EXTRA: list[tuple[str, str, str]] = [ ("/logs", "Show recent gateway log lines", "TUI"), ] +# Commands that queue messages onto _pending_input in the CLI. 
+# In the TUI the slash worker subprocess has no reader for that queue, +# so slash.exec rejects them → TUI falls through to command.dispatch. +_PENDING_INPUT_COMMANDS: frozenset[str] = frozenset({ + "retry", "queue", "q", "steer", "plan", +}) + @method("commands.catalog") def _(rid, params: dict) -> dict: @@ -2127,20 +2134,32 @@ def _(rid, params: dict) -> dict: return _ok(rid, {"type": "send", "message": arg}) if name == "retry": - agent = session.get("agent") if session else None - if agent and hasattr(agent, "conversation_history"): - hist = agent.conversation_history or [] - for m in reversed(hist): - if m.get("role") == "user": - content = m.get("content", "") - if isinstance(content, list): - content = " ".join( - p.get("text", "") for p in content if isinstance(p, dict) and p.get("type") == "text" - ) - if content: - return _ok(rid, {"type": "send", "message": content}) + if not session: + return _err(rid, 4001, "no active session to retry") + history = session.get("history", []) + if not history: return _err(rid, 4018, "no previous user message to retry") - return _err(rid, 4018, "no active session to retry") + # Walk backwards to find the last user message + last_user_idx = None + for i in range(len(history) - 1, -1, -1): + if history[i].get("role") == "user": + last_user_idx = i + break + if last_user_idx is None: + return _err(rid, 4018, "no previous user message to retry") + content = history[last_user_idx].get("content", "") + if isinstance(content, list): + content = " ".join( + p.get("text", "") for p in content if isinstance(p, dict) and p.get("type") == "text" + ) + if not content: + return _err(rid, 4018, "last user message is empty") + # Truncate history: remove everything from the last user message onward + # (mirrors CLI retry_last() which strips the failed exchange) + with session["history_lock"]: + session["history"] = history[:last_user_idx] + session["history_version"] = int(session.get("history_version", 0)) + 1 + return _ok(rid, {"type": 
"send", "message": content}) if name == "steer": if not arg: @@ -2159,9 +2178,16 @@ def _(rid, params: dict) -> dict: if name == "plan": try: from agent.skill_commands import build_skill_invocation_message as _bsim, build_plan_path - plan_path = build_plan_path(session.get("session_key", "") if session else "") - msg = _bsim("/plan", f"{arg} {plan_path}".strip() if arg else plan_path, - task_id=session.get("session_key", "") if session else "") + user_instruction = arg or "" + plan_path = build_plan_path(user_instruction) + msg = _bsim( + "/plan", user_instruction, + task_id=session.get("session_key", "") if session else "", + runtime_note=( + "Save the markdown plan with write_file to this exact relative path " + f"inside the active workspace/backend cwd: {plan_path}" + ), + ) if msg: return _ok(rid, {"type": "send", "message": msg}) except Exception as e: @@ -2383,19 +2409,12 @@ def _(rid, params: dict) -> dict: if not cmd: return _err(rid, 4004, "empty command") - # Skill slash commands (e.g. /hermes-agent-dev) must NOT go through the - # slash worker — process_command() queues the skill payload onto - # _pending_input which nobody reads in the worker subprocess. Reject - # here so the TUI falls through to command.dispatch which handles skills - # correctly (builds the invocation message and returns it to the client). - # - # The same applies to /retry, /queue, /steer, and /plan — they all - # put messages on _pending_input that the slash worker never reads. + # Skill slash commands and _pending_input commands must NOT go through the + # slash worker — see _PENDING_INPUT_COMMANDS definition above. # (/browser connect/disconnect also uses _pending_input for context # notes, but the actual browser operations need the slash worker's # env-var side effects, so they stay in slash.exec — only the context # note to the model is lost, which is low-severity.) 
- _PENDING_INPUT_COMMANDS = frozenset({"retry", "queue", "q", "steer", "plan"}) _cmd_parts = cmd.split() if not cmd.startswith("/") else cmd.lstrip("/").split() _cmd_base = _cmd_parts[0] if _cmd_parts else "" @@ -2403,9 +2422,9 @@ def _(rid, params: dict) -> dict: return _err(rid, 4018, f"pending-input command: use command.dispatch for /{_cmd_base}") try: - from agent.skill_commands import scan_skill_commands + from agent.skill_commands import get_skill_commands _cmd_key = f"/{_cmd_base}" - if _cmd_key in scan_skill_commands(): + if _cmd_key in get_skill_commands(): return _err(rid, 4018, f"skill command: use command.dispatch for {_cmd_key}") except Exception: pass diff --git a/ui-tui/src/components/messageLine.tsx b/ui-tui/src/components/messageLine.tsx index 9cf78c15901..9de6f2aa12b 100644 --- a/ui-tui/src/components/messageLine.tsx +++ b/ui-tui/src/components/messageLine.tsx @@ -28,13 +28,14 @@ export const MessageLine = memo(function MessageLine({ } if (msg.role === 'tool') { - const preview = compactPreview(hasAnsi(msg.text) ? stripAnsi(msg.text) : msg.text, Math.max(24, cols - 14)) || - '(empty tool result)' + const maxChars = Math.max(24, cols - 14) + const stripped = hasAnsi(msg.text) ? stripAnsi(msg.text) : msg.text + const preview = compactPreview(stripped, maxChars) || '(empty tool result)' return ( {hasAnsi(msg.text) ? 
( - {compactPreview(msg.text, Math.max(24, cols - 14)) || '(empty tool result)'} + {msg.text} ) : ( {preview} From c14b3b58806e7abd01d9ee01e4ff218c01590cd0 Mon Sep 17 00:00:00 2001 From: kshitij <82637225+kshitijk4poor@users.noreply.github.com> Date: Sat, 18 Apr 2026 09:35:51 -0700 Subject: [PATCH 043/143] fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) The prior override only matched the literal model name "kimi-for-coding", but Moonshot's coding endpoint is hit with real model IDs such as `kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc. Those requests bypassed the override and kept the caller's temperature, so Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for this model" (or 1.0 for thinking variants). Match the whole kimi-k2.* family: * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode) * all other kimi-k2.* -> 0.6 (non-thinking / instant mode) Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so aggregator routings are covered. * refactor(kimi): whitelist-match kimi coding models instead of prefix Addresses review feedback on PR #12144. - Replace `startswith("kimi-k2")` with explicit frozensets sourced from Moonshot's kimi-for-coding model list. The prefix match would have also clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the separate non-coding K2 family with variable temperature (recommended 0.6 but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct). - Confirmed via platform.kimi.ai docs that all five coding models (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo) share the fixed-temperature lock, so the preview-model mapping is no longer an assumption. - Drop the fragile `"thinking" in bare` substring test for a set lookup. 
- Log a debug line on each override so operators can see when Hermes silently rewrites temperature. - Update class docstring. Extend the negative test to parametrize over kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future kimi-k2-experimental name — all must keep the caller's temperature. --- agent/auxiliary_client.py | 41 +++++++++++++++++++-- tests/agent/test_auxiliary_client.py | 54 ++++++++++++++++++++++++++-- 2 files changed, 90 insertions(+), 5 deletions(-) diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index 568d6109220..126f4615ddb 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -99,11 +99,48 @@ _FIXED_TEMPERATURE_MODELS: Dict[str, float] = { "kimi-for-coding": 0.6, } +# Moonshot's kimi-for-coding endpoint (api.kimi.com/coding) documents: +# "k2.5 model will use a fixed value 1.0, non-thinking mode will use a fixed +# value 0.6. Any other value will result in an error." The same lock applies +# to the other k2.* models served on that endpoint. Enumerated explicitly so +# non-coding siblings like `kimi-k2-instruct` (variable temperature, served on +# the standard chat API and third parties) are NOT clamped. +# Source: https://platform.kimi.ai/docs/guide/kimi-k2-5-quickstart +_KIMI_INSTANT_MODELS: frozenset = frozenset({ + "kimi-k2.5", + "kimi-k2-turbo-preview", + "kimi-k2-0905-preview", +}) +_KIMI_THINKING_MODELS: frozenset = frozenset({ + "kimi-k2-thinking", + "kimi-k2-thinking-turbo", +}) + def _fixed_temperature_for_model(model: Optional[str]) -> Optional[float]: - """Return a required temperature override for models with strict contracts.""" + """Return a required temperature override for models with strict contracts. + + Moonshot's kimi-for-coding endpoint rejects any non-approved temperature on + the k2.5 family. Non-thinking variants require exactly 0.6; thinking + variants require 1.0. An optional ``vendor/`` prefix (e.g. + ``moonshotai/kimi-k2.5``) is tolerated for aggregator routings. 
+ + Returns ``None`` for every other model, including ``kimi-k2-instruct*`` + which is the separate non-coding K2 family with variable temperature. + """ normalized = (model or "").strip().lower() - return _FIXED_TEMPERATURE_MODELS.get(normalized) + fixed = _FIXED_TEMPERATURE_MODELS.get(normalized) + if fixed is not None: + logger.debug("Forcing temperature=%s for model %r (fixed map)", fixed, model) + return fixed + bare = normalized.rsplit("/", 1)[-1] + if bare in _KIMI_THINKING_MODELS: + logger.debug("Forcing temperature=1.0 for kimi thinking model %r", model) + return 1.0 + if bare in _KIMI_INSTANT_MODELS: + logger.debug("Forcing temperature=0.6 for kimi instant model %r", model) + return 0.6 + return None # Default auxiliary models for direct API-key providers (cheap/fast for side tasks) _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = { diff --git a/tests/agent/test_auxiliary_client.py b/tests/agent/test_auxiliary_client.py index 1778855ddd7..aea8152a53e 100644 --- a/tests/agent/test_auxiliary_client.py +++ b/tests/agent/test_auxiliary_client.py @@ -697,7 +697,12 @@ class TestIsConnectionError: class TestKimiForCodingTemperature: - """kimi-for-coding now requires temperature=0.6 exactly.""" + """Moonshot kimi-for-coding models require fixed temperatures. + + k2.5 / k2-turbo-preview / k2-0905-preview → 0.6 (non-thinking lock). + k2-thinking / k2-thinking-turbo → 1.0 (thinking lock). + kimi-k2-instruct* and every other model preserve the caller's temperature. 
+ """ def test_build_call_kwargs_forces_fixed_temperature(self): from agent.auxiliary_client import _build_call_kwargs @@ -772,12 +777,55 @@ class TestKimiForCodingTemperature: assert kwargs["model"] == "kimi-for-coding" assert kwargs["temperature"] == 0.6 - def test_non_kimi_model_still_preserves_temperature(self): + @pytest.mark.parametrize( + "model,expected", + [ + ("kimi-k2.5", 0.6), + ("kimi-k2-turbo-preview", 0.6), + ("kimi-k2-0905-preview", 0.6), + ("kimi-k2-thinking", 1.0), + ("kimi-k2-thinking-turbo", 1.0), + ("moonshotai/kimi-k2.5", 0.6), + ("moonshotai/Kimi-K2-Thinking", 1.0), + ], + ) + def test_kimi_k2_family_temperature_override(self, model, expected): + """Moonshot kimi-k2.* models only accept fixed temperatures. + + Non-thinking models → 0.6, thinking-mode models → 1.0. + """ from agent.auxiliary_client import _build_call_kwargs kwargs = _build_call_kwargs( provider="kimi-coding", - model="kimi-k2.5", + model=model, + messages=[{"role": "user", "content": "hello"}], + temperature=0.3, + ) + + assert kwargs["temperature"] == expected + + @pytest.mark.parametrize( + "model", + [ + "anthropic/claude-sonnet-4-6", + "gpt-5.4", + # kimi-k2-instruct is the non-coding K2 family — temperature is + # variable (recommended 0.6 but not enforced). Must not clamp. + "kimi-k2-instruct", + "moonshotai/Kimi-K2-Instruct", + "moonshotai/Kimi-K2-Instruct-0905", + "kimi-k2-instruct-0905", + # Hypothetical future kimi name not in the whitelist. 
+ "kimi-k2-experimental", + ], + ) + def test_non_restricted_model_preserves_temperature(self, model): + from agent.auxiliary_client import _build_call_kwargs + + kwargs = _build_call_kwargs( + provider="openrouter", + model=model, messages=[{"role": "user", "content": "hello"}], temperature=0.3, ) From b0bde98b0fb17c0015481e2f38b655f0a07558fa Mon Sep 17 00:00:00 2001 From: bluefishs <125471205+bluefishs@users.noreply.github.com> Date: Sun, 19 Apr 2026 00:50:24 +0800 Subject: [PATCH 044/143] fix(docker): build web/ dashboard assets in image (#12180) The Dockerfile installs root-level npm dependencies (for Playwright) and the whatsapp-bridge bundle, but never builds the web/ Vite project. As a result, 'hermes dashboard' starts FastAPI on :9119 but serves a broken SPA because hermes_cli/web_dist/ is empty and requests to /assets/index-.js 404. Add a build step inside web/ so the Vite output is baked into the image. Reproduce (before): docker build -t hermes-repro -f Dockerfile . docker run --rm -p 9119:9119 hermes-repro hermes dashboard curl -sI http://localhost:9119/assets/ | head -1 # -> 404 After: /assets/ returns the built asset path. 
--- Dockerfile | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Dockerfile b/Dockerfile index 37038233262..4f88a303d43 100644 --- a/Dockerfile +++ b/Dockerfile @@ -31,6 +31,12 @@ RUN npm install --prefer-offline --no-audit && \ npm install --prefer-offline --no-audit && \ npm cache clean --force +# Build the web/ dashboard so FastAPI at :9119 can serve the Vite assets +RUN cd /opt/hermes/web && \ + npm install --prefer-offline --no-audit && \ + npm run build && \ + npm cache clean --force + # Hand ownership to hermes user, then install Python deps in a virtualenv RUN chown -R hermes:hermes /opt/hermes USER hermes From a828daa7f8eb8f2969c2c46a7796845bab900d04 Mon Sep 17 00:00:00 2001 From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com> Date: Sat, 18 Apr 2026 10:14:31 -0700 Subject: [PATCH 045/143] perf(docker): layer-cache npm/Playwright and skip redundant web rebuild (#12225) * perf(docker): layer-cache npm/Playwright and skip redundant web rebuild Copy package manifests before source so npm install + Playwright only re-run when lockfiles change. Use COPY --chown instead of chown -R, set HERMES_WEB_DIST to skip runtime web rebuild, and drop the USER root / chmod dance since entrypoint.sh is already executable in git. * Update Dockerfile --- Dockerfile | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/Dockerfile b/Dockerfile index 4f88a303d43..0d3da72eb77 100644 --- a/Dockerfile +++ b/Dockerfile @@ -21,32 +21,36 @@ RUN useradd -u 10000 -m -d /opt/data hermes COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/ COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/ -COPY . /opt/hermes WORKDIR /opt/hermes -# Install Node dependencies and Playwright as root (--with-deps needs apt) +# ---------- Layer-cached dependency install ---------- +# Copy only package manifests first so npm install + Playwright are cached +# unless the lockfiles themselves change. 
+COPY package.json package-lock.json ./ +COPY scripts/whatsapp-bridge/package.json scripts/whatsapp-bridge/package-lock.json scripts/whatsapp-bridge/ +COPY web/package.json web/package-lock.json web/ + RUN npm install --prefer-offline --no-audit && \ npx playwright install --with-deps chromium --only-shell && \ - cd /opt/hermes/scripts/whatsapp-bridge && \ - npm install --prefer-offline --no-audit && \ + (cd scripts/whatsapp-bridge && npm install --prefer-offline --no-audit) && \ + (cd web && npm install --prefer-offline --no-audit) && \ npm cache clean --force -# Build the web/ dashboard so FastAPI at :9119 can serve the Vite assets -RUN cd /opt/hermes/web && \ - npm install --prefer-offline --no-audit && \ - npm run build && \ - npm cache clean --force +# ---------- Source code ---------- +# .dockerignore excludes node_modules, so the installs above survive. +COPY --chown=hermes:hermes . . -# Hand ownership to hermes user, then install Python deps in a virtualenv -RUN chown -R hermes:hermes /opt/hermes +# Build web dashboard (Vite outputs to hermes_cli/web_dist/) +RUN cd web && npm run build + +# ---------- Python virtualenv ---------- +RUN chown hermes:hermes /opt/hermes USER hermes - RUN uv venv && \ uv pip install --no-cache-dir -e ".[all]" -USER root -RUN chmod +x /opt/hermes/docker/entrypoint.sh - +# ---------- Runtime ---------- +ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist ENV HERMES_HOME=/opt/data VOLUME [ "/opt/data" ] ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ] From 65c0a30a776d2d20161658d7cfa8fe8ac78627ed Mon Sep 17 00:00:00 2001 From: Teknium Date: Tue, 14 Apr 2026 15:53:57 -0700 Subject: [PATCH 046/143] =?UTF-8?q?feat(skills):=20add=20baoyu-infographic?= =?UTF-8?q?=20skill=20=E2=80=94=2021=20layouts=20=C3=97=2021=20styles?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Port of baoyu-infographic from JimLiu/baoyu-skills (v1.56.1) adapted for Hermes Agent's tool ecosystem. 
Adaptations from upstream: - Frontmatter: openclaw metadata → hermes metadata - Usage: slash command syntax → natural language triggers - Removed EXTEND.md config system (not part of Hermes infrastructure) - AskUserQuestion → clarify tool (one question at a time) - Image generation → image_generate tool - Removed Windows-specific paths - Simplified file operations to use Hermes file tools - All 45 reference files (layouts, styles, templates) preserved intact Attribution preserved per agreement with 宝玉 (Jim Liu): - author, version, GitHub homepage URL in frontmatter Co-authored-by: Jim Liu 宝玉 --- skills/creative/baoyu-infographic/SKILL.md | 236 +++++++++++++++++ .../references/analysis-framework.md | 182 +++++++++++++ .../references/base-prompt.md | 43 +++ .../references/layouts/bento-grid.md | 41 +++ .../references/layouts/binary-comparison.md | 48 ++++ .../references/layouts/bridge.md | 41 +++ .../references/layouts/circular-flow.md | 41 +++ .../references/layouts/comic-strip.md | 41 +++ .../references/layouts/comparison-matrix.md | 41 +++ .../references/layouts/dashboard.md | 41 +++ .../references/layouts/dense-modules.md | 72 ++++++ .../references/layouts/funnel.md | 41 +++ .../references/layouts/hierarchical-layers.md | 48 ++++ .../references/layouts/hub-spoke.md | 41 +++ .../references/layouts/iceberg.md | 41 +++ .../references/layouts/isometric-map.md | 41 +++ .../references/layouts/jigsaw.md | 41 +++ .../references/layouts/linear-progression.md | 48 ++++ .../references/layouts/periodic-table.md | 41 +++ .../references/layouts/story-mountain.md | 41 +++ .../layouts/structural-breakdown.md | 48 ++++ .../references/layouts/tree-branching.md | 41 +++ .../references/layouts/venn-diagram.md | 41 +++ .../references/layouts/winding-roadmap.md | 41 +++ .../references/structured-content-template.md | 244 ++++++++++++++++++ .../references/styles/aged-academia.md | 36 +++ .../references/styles/bold-graphic.md | 36 +++ .../references/styles/chalkboard.md | 61 +++++ 
.../references/styles/claymation.md | 29 +++ .../references/styles/corporate-memphis.md | 29 +++ .../references/styles/craft-handmade.md | 44 ++++ .../references/styles/cyberpunk-neon.md | 29 +++ .../references/styles/hand-drawn-edu.md | 63 +++++ .../references/styles/ikea-manual.md | 29 +++ .../references/styles/kawaii.md | 29 +++ .../references/styles/knolling.md | 29 +++ .../references/styles/lego-brick.md | 29 +++ .../references/styles/morandi-journal.md | 60 +++++ .../references/styles/origami.md | 29 +++ .../references/styles/pixel-art.md | 29 +++ .../references/styles/pop-laboratory.md | 48 ++++ .../references/styles/retro-pop-grid.md | 47 ++++ .../references/styles/storybook-watercolor.md | 29 +++ .../references/styles/subway-map.md | 29 +++ .../references/styles/technical-schematic.md | 36 +++ .../references/styles/ui-wireframe.md | 29 +++ 46 files changed, 2404 insertions(+) create mode 100644 skills/creative/baoyu-infographic/SKILL.md create mode 100644 skills/creative/baoyu-infographic/references/analysis-framework.md create mode 100644 skills/creative/baoyu-infographic/references/base-prompt.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/bento-grid.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/binary-comparison.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/bridge.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/circular-flow.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/comic-strip.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/comparison-matrix.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/dashboard.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/dense-modules.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/funnel.md create mode 100644 
skills/creative/baoyu-infographic/references/layouts/hierarchical-layers.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/hub-spoke.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/iceberg.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/isometric-map.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/jigsaw.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/linear-progression.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/periodic-table.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/story-mountain.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/structural-breakdown.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/tree-branching.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/venn-diagram.md create mode 100644 skills/creative/baoyu-infographic/references/layouts/winding-roadmap.md create mode 100644 skills/creative/baoyu-infographic/references/structured-content-template.md create mode 100644 skills/creative/baoyu-infographic/references/styles/aged-academia.md create mode 100644 skills/creative/baoyu-infographic/references/styles/bold-graphic.md create mode 100644 skills/creative/baoyu-infographic/references/styles/chalkboard.md create mode 100644 skills/creative/baoyu-infographic/references/styles/claymation.md create mode 100644 skills/creative/baoyu-infographic/references/styles/corporate-memphis.md create mode 100644 skills/creative/baoyu-infographic/references/styles/craft-handmade.md create mode 100644 skills/creative/baoyu-infographic/references/styles/cyberpunk-neon.md create mode 100644 skills/creative/baoyu-infographic/references/styles/hand-drawn-edu.md create mode 100644 skills/creative/baoyu-infographic/references/styles/ikea-manual.md create mode 100644 
skills/creative/baoyu-infographic/references/styles/kawaii.md create mode 100644 skills/creative/baoyu-infographic/references/styles/knolling.md create mode 100644 skills/creative/baoyu-infographic/references/styles/lego-brick.md create mode 100644 skills/creative/baoyu-infographic/references/styles/morandi-journal.md create mode 100644 skills/creative/baoyu-infographic/references/styles/origami.md create mode 100644 skills/creative/baoyu-infographic/references/styles/pixel-art.md create mode 100644 skills/creative/baoyu-infographic/references/styles/pop-laboratory.md create mode 100644 skills/creative/baoyu-infographic/references/styles/retro-pop-grid.md create mode 100644 skills/creative/baoyu-infographic/references/styles/storybook-watercolor.md create mode 100644 skills/creative/baoyu-infographic/references/styles/subway-map.md create mode 100644 skills/creative/baoyu-infographic/references/styles/technical-schematic.md create mode 100644 skills/creative/baoyu-infographic/references/styles/ui-wireframe.md diff --git a/skills/creative/baoyu-infographic/SKILL.md b/skills/creative/baoyu-infographic/SKILL.md new file mode 100644 index 00000000000..fea3499cbf4 --- /dev/null +++ b/skills/creative/baoyu-infographic/SKILL.md @@ -0,0 +1,236 @@ +--- +name: baoyu-infographic +description: Generate professional infographics with 21 layout types and 21 visual styles. Analyzes content, recommends layout×style combinations, and generates publication-ready infographics. Use when user asks to create "infographic", "visual summary", "信息图", "可视化", or "高密度信息大图". +version: 1.56.1 +author: 宝玉 (JimLiu) +license: MIT +metadata: + hermes: + tags: [infographic, visual-summary, creative, image-generation] + homepage: https://github.com/JimLiu/baoyu-skills#baoyu-infographic +--- + +# Infographic Generator + +Adapted from [baoyu-infographic](https://github.com/JimLiu/baoyu-skills) for Hermes Agent's tool ecosystem. 
+ +Two dimensions: **layout** (information structure) × **style** (visual aesthetics). Freely combine any layout with any style. + +## When to Use + +Trigger this skill when the user asks to create an infographic, visual summary, information graphic, or uses terms like "信息图", "可视化", or "高密度信息大图". The user provides content (text, file path, URL, or topic) and optionally specifies layout, style, aspect ratio, or language. + +## Options + +| Option | Values | +|--------|--------| +| Layout | 21 options (see Layout Gallery), default: bento-grid | +| Style | 21 options (see Style Gallery), default: craft-handmade | +| Aspect | Named: landscape (16:9), portrait (9:16), square (1:1). Custom: any W:H ratio (e.g., 3:4, 4:3, 2.35:1) | +| Language | en, zh, ja, etc. | + +## Layout Gallery + +| Layout | Best For | +|--------|----------| +| `linear-progression` | Timelines, processes, tutorials | +| `binary-comparison` | A vs B, before-after, pros-cons | +| `comparison-matrix` | Multi-factor comparisons | +| `hierarchical-layers` | Pyramids, priority levels | +| `tree-branching` | Categories, taxonomies | +| `hub-spoke` | Central concept with related items | +| `structural-breakdown` | Exploded views, cross-sections | +| `bento-grid` | Multiple topics, overview (default) | +| `iceberg` | Surface vs hidden aspects | +| `bridge` | Problem-solution | +| `funnel` | Conversion, filtering | +| `isometric-map` | Spatial relationships | +| `dashboard` | Metrics, KPIs | +| `periodic-table` | Categorized collections | +| `comic-strip` | Narratives, sequences | +| `story-mountain` | Plot structure, tension arcs | +| `jigsaw` | Interconnected parts | +| `venn-diagram` | Overlapping concepts | +| `winding-roadmap` | Journey, milestones | +| `circular-flow` | Cycles, recurring processes | +| `dense-modules` | High-density modules, data-rich guides | + +Full definitions: `references/layouts/.md` + +## Style Gallery + +| Style | Description | +|-------|-------------| +| `craft-handmade` | 
Hand-drawn, paper craft (default) | +| `claymation` | 3D clay figures, stop-motion | +| `kawaii` | Japanese cute, pastels | +| `storybook-watercolor` | Soft painted, whimsical | +| `chalkboard` | Chalk on black board | +| `cyberpunk-neon` | Neon glow, futuristic | +| `bold-graphic` | Comic style, halftone | +| `aged-academia` | Vintage science, sepia | +| `corporate-memphis` | Flat vector, vibrant | +| `technical-schematic` | Blueprint, engineering | +| `origami` | Folded paper, geometric | +| `pixel-art` | Retro 8-bit | +| `ui-wireframe` | Grayscale interface mockup | +| `subway-map` | Transit diagram | +| `ikea-manual` | Minimal line art | +| `knolling` | Organized flat-lay | +| `lego-brick` | Toy brick construction | +| `pop-laboratory` | Blueprint grid, coordinate markers, lab precision | +| `morandi-journal` | Hand-drawn doodle, warm Morandi tones | +| `retro-pop-grid` | 1970s retro pop art, Swiss grid, thick outlines | +| `hand-drawn-edu` | Macaron pastels, hand-drawn wobble, stick figures | + +Full definitions: `references/styles/