#28063 fixed the macOS `/tmp`→`/private/tmp` symlink issue by checking
the RAW path (pre-resolve) against startswith('/tmp/'). That works on
Linux + macOS but not on Windows — Path('/tmp/foo').resolve() returns
C:\\tmp\\foo and isn't the real Windows temp anyway.
Replace the hardcoded '/tmp/' prefix with Path(tempfile.gettempdir()).
resolve() + Path.relative_to() — same idiom as the cwd branch just
below. Works correctly on Linux (/tmp), macOS (/private/var/folders/...),
and Windows (%LOCALAPPDATA%\\Temp).
Test rewritten to use tempfile.gettempdir() so the assertion exercises
the same code path on every platform.
Conflict against the just-merged #28063 (raw_path approach) resolved
by replacing the whole raw_path block — tempfile.gettempdir() is
strictly better than that intermediate fix.
Salvage of #28262 by @Zyrixtrex.
Salvages #23208 by @awizemann. Tracks which chat session created a
kanban task so clients can render a per-session board without falling
back to tenant + time-window heuristics.
- Schema: tasks gains nullable session_id TEXT column with index
(additive migration in _migrate_add_optional_columns).
- ACP: server.py exposes the originating session id via HERMES_SESSION_ID
with save/restore around the agent loop.
- Tool: kanban_create reads HERMES_SESSION_ID (with explicit override).
- CLI: 'hermes kanban list --session <id>' filter; JSON output exposes
session_id.
Path.resolve() follows the /tmp -> /private/tmp symlink on macOS, so
str(path).startswith("/tmp/") is always False for temp-dir paths.
The "Accept Edits" (workspace_session) mode silently refused to
auto-approve every /tmp write on macOS, breaking the documented
behaviour and making the existing test fail on this platform.
Fix: keep the raw expanded path (pre-resolve) for the /tmp prefix
check and continue using the resolved form only for the cwd
relative_to() call where symlink resolution is correct behaviour.
Follow-up to #26543. The sessions table does not have an updated_at
column (see hermes_state.py — only started_at/ended_at), so
row.get('updated_at') always returned None and the str() coercion was
dead code. Use datetime.now(UTC).isoformat() instead, which reflects
exactly what the field means here: 'the title was refreshed at this
moment'. Drop the dead coercion.
Extends #26573 to also catch the case the original PR deliberately left
out: when a tool raises an exception, the agent's tool executor wraps it
in a canonical 'Error executing tool '<name>': ...' string prefix (see
agent/tool_executor.py around the try/except). That prefix is unique to
the wrapper and cannot legitimately appear in well-behaved tool output,
so it is a safe signal that the tool blew up.
Without this, the canonical 'tool raised' case still rendered as a green
'completed' row in Zed despite being a runtime failure — exactly the
class of bug #26573 set out to fix.
Adds a positive test (raised-exception prefix -> failed) and a negative
test (bare 'Error:' word in legit tool output stays completed) so a
future contributor doesn't accidentally widen the rule to false-positive
on compiler/linter diagnostics.
* refactor(bootstrap): consolidate ACP browser bootstrap into install.{sh,ps1}
Delete 687 lines of duplicated browser bootstrap code from
acp_adapter/bootstrap/. All browser installation now routes through
dep_ensure -> install.{sh,ps1} --ensure, using agent-browser install
for Chromium. install.sh gains ensure_browser() with macOS app-bundle
detection and per-distro guidance.
Tracking: #27826
* fix(install.sh): add --ignore-scripts to npm install for camofox
@askjo/camofox-browser has a dependency (impit) whose postinstall
script runs `npx only-allow pnpm`, which fails under npm. Adding
--ignore-scripts avoids the spurious failure without affecting
functionality.
Tracking: #27826
* fix: add explicit return in ensure_browser, narrow exception in entry.py
ensure_browser() now returns 0 explicitly on all success paths.
_run_setup_browser() catches OSError instead of broad Exception,
letting ImportError propagate as a real packaging bug.
Switches `_replay_session_history` from `loop.call_soon`-deferred (after the
`LoadSessionResponse` is written) to `await`-inline (before the response is
constructed) for both `session/load` and `session/resume`. Adds defensive
try/except around the awaited call so a replay helper crash still yields a
successful load response — partial transcripts are acceptable, total
load failure is not.
The deferral was added on May 2 in commit 19854c7cd with the rationale "Zed
only attaches streamed transcript/tool updates once the load/resume response
has completed." That justification was incorrect:
- Zed's current ACP integration (zed-industries/zed
crates/agent_servers/src/acp.rs) explicitly registers the session-update
routing entry BEFORE awaiting the loadSession RPC, with the comment:
"so that any session/update notifications that arrive during the call
(e.g. history replay during session/load) can find the thread."
- Every other reference ACP server (Codex, Claude Code, OpenCode, Pi, agentao)
replays history BEFORE responding to the load request.
- The ACP spec wording ("Stream the entire conversation history back to the
client via notifications") and the natural JSON-RPC reading both mean
"during the request's lifetime", not "after the response resolves".
Empirical reproduction (reported by Biraj on @agentclientprotocol/sdk
v0.21.1): the same custom ACP client works correctly against Codex /
Claude Code / OpenCode / Pi but receives 0 notifications from Hermes
because it measures the per-call notification count at the moment
`loadSession` resolves — which on Hermes was before the `call_soon`-
scheduled replay coroutine had a chance to run.
Changes:
- `acp_adapter/server.py`: remove `_schedule_history_replay`; both
`load_session` and `resume_session` now `await self._replay_session_history`
before returning, wrapped in try/except that logs and continues on
helper exceptions.
- `tests/acp/test_server.py`: replace the single
`test_load_session_schedules_history_replay_after_response`
(which encoded the now-incorrect post-response ordering) with two tests
asserting `events == ["replay", "returned"]` for load and resume.
Add two regression tests confirming that a replay helper raising still
yields a `LoadSessionResponse` / `ResumeSessionResponse` rather than
propagating the exception out as a JSON-RPC error.
Result: 240 ACP tests pass (was 238), ruff clean. Verified end-to-end:
biraj's synchronous notification-counter pattern now sees 6 notifications
during `loadSession` for a 5-message session, matching all other reference
ACP servers.
The `_fenced_text` change in `acp_adapter/tools.py` from the same May 2
commit is orthogonal and intentionally left intact — it's a separate,
still-valid fix for Zed's pipe-as-table rendering.
Refs #12285. Follows up #26943 (which added thought-chunk replay but kept
the deferral).
Persisted assistant `reasoning_content` / `reasoning` fields are now emitted
as ACP `agent_thought_chunk` notifications during `_replay_session_history`,
so editor clients (Zed, etc.) rebuild collapsed Thinking panes when the user
re-opens a session that used a thinking model.
Ordering matches live streaming: thought precedes message text within the
same assistant turn, mirroring how `reasoning_callback` deltas arrive before
`stream_delta_callback` deltas in `events.py::make_thinking_cb` /
`make_message_cb`.
Behavior on non-reasoning histories is unchanged; the replay loop's existing
text / tool_call / tool_call_update / plan emission is preserved bit-for-bit.
Closes#12285.
Credit:
- @Yukipukii1 (#14691) — original thought-replay design via
`acp.update_agent_thought_text`; the tool-call portion of that PR has
since landed via #19139, but the reasoning replay is theirs.
- @HenkDz (#17652 / #18578) — established the `_replay_session_history` and
`_history_*` helper conventions this builds on.
- @D1zzyDwarf (#16531) — also closed by this work.
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
The Zed ACP Registry path (uvx --from 'hermes-agent[acp]==X' hermes-acp)
gets a Python-only install. Browser tools depend on the agent-browser npm
package + Chromium, neither of which are in the wheel. Without an
explicit bootstrap, registry users have no path to working browser tools.
Ship a bundled, idempotent bootstrap script (Linux/macOS bash + Windows
PowerShell) inside acp_adapter/bootstrap/ as wheel package-data. New
entry points:
hermes acp --setup-browser # interactive; prompts before Chromium download
hermes acp --setup-browser --yes # non-interactive
hermes-acp --setup-browser
The terminal-auth flow (hermes acp --setup) also offers the browser
bootstrap as a follow-up after model selection, so first-run registry
users get the option without knowing the flag exists.
Key design choices:
- npm install -g --prefix $NODE_PREFIX so we never need sudo. System Node
on PATH is respected; only the install target is redirected to the
user-writable Hermes-managed Node prefix.
- tools/browser_tool.py::_browser_candidate_path_dirs() already walks
$HERMES_HOME/node/bin, so installed binaries are discovered with no
agent-side code change.
- System Chrome/Chromium detection short-circuits the ~400 MB Playwright
download when a suitable browser already exists.
- Bash + PowerShell live as ONE copy each under acp_adapter/bootstrap/.
Not duplicated under scripts/. install.sh and install.ps1 keep their
inline browser blocks for the source-checkout path.
E2E validated end-to-end:
bash bootstrap_browser_tools.sh --skip-chromium
→ installs agent-browser into ~/.hermes/node/bin/
tools.browser_tool._find_agent_browser()
→ returns the installed path
check_browser_requirements()
→ returns True (browser tools register)
Tests:
- tests/acp/test_entry.py: 11 tests covering --setup-browser dispatch
(linux + windows + --yes forwarding + failure propagation), the
terminal-auth follow-up prompt path, and a package-data wheel-shipping
assertion that catches any future pyproject.toml regression.
Docs: website/docs/user-guide/features/acp.md gains a 'Browser tools
(optional)' subsection with the two-line install + what-it-does.
Previously ACP dangerous-command approvals mixed an invalid ACP
payload shape with partial Hermes option mapping, and the callback
plumbing was shared across worker threads. This commit uses ACP
tool-call updates, preserves Hermes once/session/always semantics,
and scopes approval callbacks to the current worker thread.
- Build permission requests with `update_tool_call` and unique
`perm-check-*` ids in `acp_adapter/permissions.py`
- Keep ACP option mapping explicit and fail closed on unknown outcomes
or request failures
- Set approval callbacks inside the ACP executor worker and read them
from thread-local state in `tools/terminal_tool.py`
- Replace duplicated ACP bridge coverage with focused tests in
`tests/acp/test_permissions.py` and add a thread-local callback test
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after
a code update, on both his Windows machine AND his Linux workstation. The
failure mode is real and affects every user who updates hermes by any path
OTHER than a fully-successful ``hermes update``.
## What happens
hermes_bootstrap.py is a top-level module registered via pyproject.toml's
``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work). It
must be registered in the venv's editable-install .pth file before Python
can find it as a bare ``import hermes_bootstrap``.
``hermes update`` handles this correctly: (1) git reset --hard, (2) clear
__pycache__, (3) uv pip install -e . (re-registers the package including
the new py-modules list), (4) restart.
BUT if any step AFTER (1) fails — network blip during pip install, PEP 668
on a system Python, venv locked, uv not in PATH, a crash mid-update — the
user is left with new code that references hermes_bootstrap and a venv
that doesn't know about it. Every hermes invocation after that crashes
with ModuleNotFoundError, including ``hermes update`` itself. No recovery
path without manual `uv pip install -e .`.
Also affects users who ``git pull`` the repo directly without running
hermes update — relatively common for developers.
## Fix
Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError
across all 6 entry points (hermes_cli/main, run_agent, gateway/run,
acp_adapter/entry, cli, batch_runner). On Windows, missing bootstrap
means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode
chars may fail to print) but NOT a crash. POSIX is unaffected either way
since the bootstrap is a no-op there.
Once hermes is running again, the user can ``hermes update`` to fully
recover.
## Test update
tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap
scans for the first top-level import in each entry point and asserts it
is hermes_bootstrap. Extended the check to accept a Try block whose body
is a lone Import of hermes_bootstrap — that's the recovery-friendly form
we just introduced.
Verified behavior by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak``
and confirming ``python -c "import hermes_cli.main"`` succeeds. 82/82
tests pass (hermes_bootstrap + windows-native + windows-compat).
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).
Problem: Python on Windows has two long-standing text-encoding pitfalls:
1. sys.stdout/stderr are bound to the console code page (cp1252 on
US-locale installs) — print('café') crashes with UnicodeEncodeError.
2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
PYTHONIOENCODING are set in their env — so any Python we spawn
(linters, sandbox children, delegation workers) hits the same bug.
Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:
- hermes_cli/main.py (hermes / hermes-agent console_script)
- run_agent.py (hermes-agent direct)
- acp_adapter/entry.py (hermes-acp)
- gateway/run.py (messaging gateway)
- batch_runner.py (parallel batch mode)
- cli.py (legacy direct-launch CLI)
On Windows, the bootstrap:
- os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode)
- os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
- sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')
Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.
On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected. POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.
setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.
What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init. A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.
Tests (17): 16 passed, 1 skipped on Windows.
- Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
- POSIX: complete no-op (verified on fake POSIX + skipped on real
POSIX since we don't have a Linux box in this session)
- Idempotence: multiple calls safe
- Graceful degradation: non-reconfigurable streams don't crash
- User opt-out: explicit PYTHONUTF8=0 is respected
- Load order: every entry point's FIRST top-level import is
hermes_bootstrap, enforced by an AST-level parametrized test
pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
Extends PR #21400's resource inlining with image-specific handling: ACP
resource_link and embedded blob resources with an image/* mime (or image
file suffix when mime is missing) now emit an OpenAI image_url part
with a base64 data URL, so vision models actually see the image
instead of a [Binary file omitted] note. Non-image resources keep the
existing text-inlining behavior.
Adds 3 tests: local PNG via resource_link, JPEG mime inferred from
suffix when client omits mimeType, and embedded blob PNG.
ACP's save_session() did a non-atomic clear_messages() + append_message()
loop. If any message hit an exception mid-loop (bad tool_call shape, etc.),
the DELETE had already committed and the persisted conversation was lost.
SessionDB.replace_messages() wraps DELETE + bulk INSERT in a single
BEGIN IMMEDIATE transaction that rolls back on any exception, so a bad
message can no longer clobber previously-persisted history.
Salvages @Awsh1's PR #13675 — uses the existing replace_messages()
helper (which covers more message fields than the PR's own copy)
instead of adding a duplicate.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
When a user types /steer <text> on an ACP session that isn't actively
running a turn (and there's no interrupted-prompt salvage available),
_cmd_steer silently appended to state.queued_prompts and replied
"No active turn — queued for the next turn". That looks identical to
/queue output even though the user never typed /queue — @EddyLeeKhane
reported this as "/steer never works, gets queued instead".
Rewrite the payload to a plain user prompt before the slash-intercept
fires, matching the gateway's idle-/steer fallthrough in
gateway/run.py ~L4898.
UserMessageChunk and AgentMessageChunk do not have a message_id field
in the ACP schema. Passing it silently dropped the kwarg (pydantic
does not raise on unknown init kwargs here) and the subsequent test
assertions on .message_id raised AttributeError. Strip the dead
plumbing (uuid import, message_id= kwarg on both chunk types, unused
session_id/index parameters) and remove the matching .message_id
asserts from the test.
PR #16858's session-scoped interactive sudo password cache falls back to
a thread-identity scope when no HERMES_SESSION_KEY is bound. ACP never
set that contextvar, so two ACP sessions landing on the same reused
ThreadPoolExecutor thread still shared the cache — the exact scenario
the PR headlined.
acp_adapter/server.py now:
- binds HERMES_SESSION_KEY=<session_id> via gateway.session_context
inside _run_agent() (and clears on exit)
- wraps the loop.run_in_executor(_executor, _run_agent) call in a fresh
contextvars.copy_context() so concurrent ACP sessions don't stomp on
each other's ContextVar writes (executor pool threads would otherwise
share a context).
Adds tests/acp/test_approval_isolation.py::
test_sudo_password_cache_isolated_across_acp_sessions_on_same_pool_thread
which drives two back-to-back sessions through a 1-worker ThreadPoolExecutor
and asserts B does not observe A's cached password.
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).
The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread. A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.
Fix: remove the module-level call. Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:
- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools)
before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()
Closes#16856.