* feat(tools): always show Nous Tool Gateway backends, login on select
The Nous-managed Tool Gateway rows in `hermes tools` (Firecrawl, OpenAI
TTS, Browser Use, FAL image/video) were hidden unless the user was already
logged into Nous Portal with paid access. Now they are always listed.
Selecting one runs an inline Nous Portal device-code OAuth + entitlement
check — auth only, no inference-provider switch and no bulk 'enable all
tools' prompt (that stays in `hermes model`). The row only activates the
gateway once paid access is confirmed.
- _visible_providers: stop hiding managed_nous_feature rows (incl. those
also flagged requires_nous_auth); pure pre-auth UX rows still gate on login
- nous_subscription.ensure_nous_portal_access(): auth + entitlement gate
that preserves the user's active inference provider
- _configure_provider / _reconfigure_provider: run the inline gate for
managed backends; write config only when entitled
- picker marker: 'via Nous Portal (login on select)' for logged-out users
- _hidden_nous_gateway_message: now a no-op (rows are never hidden)
* docs: hermes tools is a first-class Tool Gateway entry point
The Tool Gateway docs framed `hermes setup --portal` / `hermes model` as
the activation path and only mentioned `hermes tools` for mixing in your
own keys. With the inline-login change, picking a Nous-managed backend in
`hermes tools` is a complete path on its own — it logs you into Nous
Portal on select if needed, without switching your inference provider or
prompting to enable every other tool.
- tool-gateway.md: Get started now lists three peer entry points; new
paragraph explaining login-on-select and the no-prompt fast path when
OAuth is already active
- nous-portal.md + run-hermes-with-nous-portal.md: note that managed rows
appear logged-out and trigger inline login on select
The setup provider->model sub-menu (and three sibling pickers) used
simple_term_menu.TerminalMenu, whose ESC and arrow-key handling was
unreliable across terminals — notably ESC failed to back out of the
model selection list on terminals that emit raw escape sequences (e.g.
Ghostty). The codebase already notes simple_term_menu 'conflicts with
/dev/tty' and causes 'ghost-duplication rendering', and a prior attempt
to migrate these (closed PR) confirmed the same root cause.
Route all four single-select pickers through the shared, already-hardened
curses_radiolist (which decodes raw CSI/SS3 escape sequences and handles
ESC consistently, fixed in #35776):
- auth.py _prompt_model_selection — model picker; the pricing column
header and the unavailable-models block are passed as the radiolist
description so they survive the curses screen clear. ESC now cancels.
- main.py _prompt_reasoning_effort_selection — reasoning-effort picker.
- main.py _model_flow_named_custom — named custom-provider model picker.
- main.py _remove_custom_provider — provider-removal picker.
simple_term_menu is no longer imported anywhere (only stale comments
referenced it; one in setup.py is corrected). The numbered-input
fallbacks are unchanged and still trigger on curses errors / non-TTY.
Tests: updated test_terminal_menu_fallbacks / test_reasoning_effort_menu
/ test_custom_provider_model_switch / test_model_provider_persistence to
drive the fallback via curses_radiolist errors instead of breaking
simple_term_menu. New test_setup_menu_curses_migration.py asserts each
picker routes through curses_radiolist, ESC cancels, and the pricing
header is preserved. Net -147/+183 (mostly the new test file; production
code shrinks by removing TerminalMenu boilerplate).
The setup wizard's provider/model pickers (curses_radiolist via
prompt_choice) bailed to the numbered "Select [1-N]" fallback the moment
a user pressed up or down. Root cause: even with keypad(True) — which
curses.wrapper sets — many terminals/terminfo entries deliver cursor keys
to getch() as raw CSI/SS3 byte sequences (e.g. 27, 91, 66 for arrow-down)
rather than the translated curses.KEY_DOWN. The menus matched only
curses.KEY_UP/KEY_DOWN and treated the leading 27 (ESC) as cancel, so
navigation dropped into the text fallback and the trailing bytes leaked
into the next input().
Add a shared read_menu_key() helper that decodes CSI/SS3 escape sequences
into normalized NAV_* actions (only a lone ESC, with no continuation byte
within a short timeout, still cancels) and consumes the tail of unhandled
sequences so stray bytes can't corrupt later input(). Route all three
curses menus (checklist, radiolist, single_select) through it.
Add regression tests covering raw CSI/SS3 arrows, translated KEY_*
constants, vim keys, lone-ESC cancel, and full consumption of unhandled
sequences (Delete/Home/End).
* feat(kanban): goal_mode cards run workers in a /goal loop
A goal_mode card wraps its dispatched worker in the Ralph-style goal
loop behind /goal: after each turn an auxiliary judge checks the
worker's response against the card title+body, and if not done the
worker keeps going in the SAME session until the judge agrees, the
worker terminates the task itself, or the turn budget runs out (which
blocks the card for human review — never a silent exit).
- kanban_db: goal_mode + goal_max_turns columns (additive migration),
Task fields, create_task params, INSERT wiring, created-event payload.
- kanban_tools: goal_mode/goal_max_turns on the kanban_create tool so
orchestrators can opt cards in when fanning out.
- kanban CLI: --goal / --goal-max-turns on 'kanban create'.
- dashboard API: goal_mode/goal_max_turns on the create endpoint
(auto-surfaced back via asdict).
- _default_spawn: sets HERMES_KANBAN_GOAL_MODE / _GOAL_MAX_TURNS only
when the card opts in.
- goals.run_kanban_goal_loop: standalone, callback-injected loop engine
(no SessionDB persistence; ephemeral worker). cli.py quiet path calls
it after the worker's first turn when the env vars are set.
- Docs: orchestrator skill + kanban feature page.
Tests: DB roundtrip + legacy migration, spawn env gating, and the loop's
continuation/completion/budget-block/finalize-nudge branches. E2E run
against a real kanban DB confirms a budget-exhausted goal worker lands
in a sticky blocked state.
* feat(kanban/dashboard): goal-mode toggle in the create form
Wires the goal_mode card setting into the dashboard UI (the plugin's
hand-written IIFE bundle, no build step):
- InlineCreate: 'goal mode' checkbox after the skills field; checking it
reveals an optional 'max turns' number input. Both reset on submit and
only post goal_mode/goal_max_turns when enabled.
- TaskDrawer: a 'Goal mode: on (max N turns)' MetaRow so a card's
goal-mode setting is visible after creation (auto-fed by asdict via the
existing _task_dict).
Live-tested through the running dashboard with a browser: created a
goal-mode card with max-turns=8, confirmed it persisted to the kanban DB
(goal_mode=1, goal_max_turns=8) and rendered back in the drawer as
'on (max 8 turns)'. No JS console errors.
Self-review follow-up on top of the salvaged perf fixes:
- gateway/run.py (both watcher-drain sites): the salvaged O(n^2) fix
(#32708) replaced `while pending_watchers: pop(0)` with iterate-then-
`watchers.clear()`, but `watchers` aliased the registry's live list.
A watcher appended by a concurrent session during the `await
asyncio.sleep(0)` yield would be cleared without ever being scheduled.
Detach the batch atomically (`pending_watchers = []`) before iterating.
- gateway/platforms/bluebubbles.py: normalize the salvaged _guid_cache
LRU (#30523) to match feishu/codebase precedent — module-level
`_GUID_CACHE_SIZE` constant, `while len > cap`, and drop the redundant
post-insert `move_to_end` (a fresh insert is already most-recent).
- gateway/platforms/feishu.py: drop the same redundant post-insert
`move_to_end` from the salvaged _message_text_cache LRU (#23706).
- scripts/release.py: add AUTHOR_MAP entries for the salvaged commits'
authors (amathxbt #22155, ErnestHysa #32636/#32708) so the contributor
audit passes when these commits land on main.
- tests/tools/test_tool_output_limits.py: autouse fixture resets the new
module-level limits cache between tests.
- tests/gateway/test_feishu.py: hand-built adapter fixture seeded
_message_text_cache as a plain dict; it's now an OrderedDict, so the
fixture type had to match.
DeepSeek / Baidu Qianfan stream tool-call arguments in cumulative mode:
each chunk resends the full arguments-so-far instead of the new fragment.
The stream accumulator blindly concatenated arg deltas with +=, turning
that into '{...}{...}{...}', which failed json.loads and got nuked to '{}'
— a silently corrupted tool call (#35592). Worse on multi-param tools
(search_files, session_search, memory replace) because longer args take
more chunks, giving more resend opportunities.
- Per-slot cumulative latch in the stream accumulator: a delta that is a
strict superset of the accumulated buffer marks the slot cumulative and
replaces (not appends); exact duplicates are dropped only after latching.
Incremental fragments are untouched (default += path).
- Backstop _collapse_repeated_json_arguments() in the repair pipeline
collapses pure identical-resend buffers (K exact repeats of a valid-JSON
unit) for providers that resend the complete object from chunk 1. Only
reached after json.loads already failed, so compliant single objects are
never touched.
Not a gateway or DeepSeek-model bug — any OpenAI-wire provider in
cumulative streaming mode is affected.
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.
Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
SSH sessions hard-failed voice mode on the presence of SSH_* env vars
alone, even when a PulseAudio/PipeWire server is running on the host and
audio works (ffplay/aplay/pw-play -> pulseaudio). Probe the default
sound-server sockets (PULSE_SERVER unix path, PULSE_RUNTIME_PATH/native,
$XDG_RUNTIME_DIR/{pulse/native,pipewire-0}) and actually connect() so a
stale socket doesn't count; downgrade the SSH branch to a notice when
audio is reachable. Mirrors the existing Docker/WSL forwarding handling.
Fixes#35622
Follow-up to the salvaged #30728:
- Gateway already exports _HERMES_GATEWAY=1 at startup (gateway/run.py) and
cli.py already keys off it. Drop the redundant new HERMES_IN_GATEWAY var;
guard stop/restart on _HERMES_GATEWAY instead. One marker for one fact.
- Drop the greedy \bgateway.*restart alternation from the cron lifecycle
filter — it false-positived on legit prompts that merely mention an
unrelated gateway + a restart (API/payment gateway monitoring). The
specific 'hermes gateway (restart|stop|start)' pattern already covers the
real command.
- Rework the two negative guard tests to sentinel the first downstream call
so they don't drive real signal delivery (tripped the live-system guard).
- Add false-positive regression cases to test_safe_commands.
Three defenses against SIGTERM-respawn loops when agent schedules its
own gateway restart under launchd/systemd KeepAlive:
1. HERMES_IN_GATEWAY env var: gateway sets it at startup; stop/restart
subcommands refuse to run when set (exit 1 with clear message).
2. Cron create payload filter: regex pre-flight rejects prompts/scripts
containing hermes gateway restart/stop, launchctl kickstart/unload,
systemctl restart/stop, and pkill patterns.
3. 30 new tests: pattern matching (14), cron block (5), gateway guard (4),
safe command negatives (7).
The per-turn file-mutation verifier footer rendered failed-write paths as
bare absolute paths in the user-facing response. The gateway's
extract_local_files() scans response text for bare paths ending in a
deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and
auto-attaches matches as native uploads — so a denied write to
~/.hermes/config.yaml surfaced the path in the footer and got the
credential file silently uploaded to the messaging channel.
The gateway denylist (validate_media_delivery_path) already blocks the
config.yaml case after #35634. This is defense-in-depth at the source:
backtick-wrap every path the footer emits — both the bullet path and any
path echoed inside the tool's error preview (the protected-file denial
message embeds the path in single quotes, which do NOT block the
extractor regex). extract_local_files skips paths inside inline-code
spans, so wrapping defeats auto-attachment for ANY protected file while
keeping the path human-readable.
- run_agent.py: _format_file_mutation_failure_footer wraps bullet paths;
new _neutralize_footer_paths backticks any remaining bare path (covers
the preview echo). staticmethod -> classmethod (caller unaffected).
- tests: backtick-wrap assertion + end-to-end extract_local_files leak test.
When PTB's general httpx pool is exhausted, it converts httpx.PoolTimeout
into telegram.error.TimedOut whose message states the request was *not*
sent to Telegram. The send retry loop treated all non-connect TimedOut as
non-retryable, so a pool timeout raised immediately, skipped all 3 retry
attempts, and was returned as retryable=False -- silently dropping the
message (agent responses, cron reports, etc.).
A pool timeout means the request never left the process, making it the
safest case to retry. Add _looks_like_pool_timeout() and treat it like a
connect timeout in both the in-loop retry decision and the outer retryable
determination, so pool timeouts flow through the existing backoff loop and
stay retryable on exhaustion.
Reported-by: q3874758 (#35610)
* feat(models): add deepseek-v4-flash to OpenRouter + Nous curated lists
deepseek/deepseek-v4-flash was already in the native deepseek provider
catalog but missing from the curated OpenRouter and Nous Portal picker
lists. Added it to both and regenerated the model-catalog.json manifest
(drift guard requires same-PR regeneration).
* refactor(models): trim redundant variants, group curated lists by maker
Remove claude-opus-4.7/4.6, gpt-5.4-nano, gpt-5.3-codex,
gemini-3-pro-image-preview, gemini-3.1-flash-lite-preview, grok-4.20,
and the older gemini-3-pro-preview (Nous). Reorder both OPENROUTER_MODELS
and _PROVIDER_MODELS[nous] into contiguous per-maker blocks with comment
headers. Regenerated model-catalog.json (openrouter 27, nous 20).
* feat(models): add gemini-3-pro-preview to OpenRouter + Nous curated lists
Adds google/gemini-3-pro-preview to both curated pickers (new on
OpenRouter, restored on Nous). Regenerated model-catalog.json
(openrouter 28, nous 21).
* test(models): use claude-opus-4.8 in OpenRouter fetch fixtures
The two TestFetchOpenRouterModels tests mocked a live OpenRouter
response with claude-opus-4.6 and relied on it surviving the curated-list
filter. Since 4.6 was removed from OPENROUTER_MODELS, those models got
filtered out and the recommended tag shifted. Swap the fixture to
claude-opus-4.8 (still curated, still first in the Anthropic block).
Rewrite TestDiscordMentions as negative assertions (mentions survive the
redactor) and clean up the orphaned comment + dangling whitespace left by
removing _DISCORD_MENTION_RE. Follow-up to the salvaged #32259 fix for #35611.
The 'hermes update' config-migration prompt printed only counts ('1 new
config option available') then asked 'configure them now?' without ever
saying what the options were. Users said no because they couldn't tell what
they were agreeing to. For pure config-format version bumps (no new
env/config keys) it still asked the question, where saying yes just bumped
the version and looked like a no-op.
- List each new env var / config key by name + description before prompting
(cap at 8, then '… and N more'). The data was already available; we just
threw it away and printed a count.
- Pure version bump (no new options): apply the format migration
non-interactively and print what happened, instead of asking a misleading
yes/no.
Reported by ScottFive and Tt2021.
The compact "<n>|content" gutter from #35368 is now the sole behavior.
Removes the HERMES_READ_GUTTER=padded escape hatch and its env lookup —
no legacy fixed-width path to maintain. Padding was pure token overhead
(~48% more tokens than bare content, ~16% more than compact) with no
measured accuracy gain in the original A/B.
- file_operations.py: drop env lookup + os import; gutter always f"{i}|{line}"
- tests: drop the padded env-override test; compact assertions retained
Over SSH the OSC 11 background-color query round-trip routinely exceeds
the 100ms read budget, so _query_osc11_background() gives up and the late
reply lands after prompt_toolkit has grabbed the tty. prompt_toolkit then
injects the OSC payload as typed text and reads its BEL terminator
(\x07 = Ctrl+G) as a keystroke — Ctrl+G is the open-external-editor
binding, dropping the user into vi with garbage and no obvious way out.
- Skip the OSC 11 probe on remote sessions (SSH_CONNECTION/CLIENT/TTY);
fall back to COLORFGBG / env hints / the dark default.
- Restore the tty with TCSAFLUSH instead of TCSANOW so any partial/late
reply is scrubbed from the input buffer before pt reads it.
Relative paths in write_file/patch could resolve against the agent PROCESS cwd
instead of the terminal's working directory. In a git-worktree session with a
stale TERMINAL_CWD='.' (a relative base), early edits silently landed in the
MAIN checkout, verified there, and reported success — while the agent inspected
the worktree and saw nothing, misreading it as the patch tool no-op'ing.
- _resolve_base_dir(): resolution base is now ALWAYS absolute. A relative
TERMINAL_CWD is anchored to the process cwd once, deterministically, instead
of being left to resolve()-time cwd. Live terminal cwd stays authoritative.
- write_file/patch pass the resolved absolute path to the shell FileOps layer
so the tool layer and shell layer can't disagree about which file is edited.
- Responses now report the absolute resolved_path and files_modified, so a
wrong-cwd mismatch is visible on the first call.
- _path_resolution_warning(): emits a _warning when a relative path resolves
OUTSIDE the live terminal cwd (e.g. a worktree session writing into main).
Validation: 11 new unit tests + 43 live E2E assertions (worktree routing,
mid-session cd, V4A patches, divergence warning, absolute paths, consecutive
patches); 466 existing file/path/terminal tests green.
Tasks can now carry file attachments (PDFs, images, source docs) that
workers read directly — closes the gap where source material had to be
pasted as a path into the task body.
- kanban_db: task_attachments table (additive), Attachment dataclass,
add/list/get/delete accessors, attachments_root/task_attachments_dir
path helpers (per-board, HERMES_KANBAN_ATTACHMENTS_ROOT override)
- build_worker_context: surfaces each attachment's absolute path so the
worker (full file/terminal tool access) reads it via read_file/pdftotext
- dashboard API: POST/GET/DELETE attachment routes (multipart upload,
25MB cap, traversal-safe filenames, root-containment check on download)
- dashboard UI: Attachments section in the task drawer — upload button,
list with download, per-row remove
- docs + tests (13 cases: DB accessors, REST round-trip, traversal
rejection, collision suffixing, worker-context surfacing)
Closes#35338
test_windows_path_not_matched asserted the pre-fix POSIX-only behavior.
The Windows drive-letter support now intentionally matches these paths,
so replace it with parametrized positive cases plus a relative-path
negative guard, mirroring tests/gateway/test_platform_base.py.
The MEDIA_TAG_CLEANUP_RE and extract_local_files path regex both used
(?:~/|/) to anchor paths, which only matches Unix-style absolute and
home-relative paths. Two additional _TOOL_MEDIA_RE patterns in run.py
had the same limitation. Windows absolute paths (C:\Users\..., D:/...)
were silently ignored, causing MEDIA directive delivery to fail.
Add [A-Za-z]:[/\\] as a third anchor alternative in all four regex
locations (base.py x2, run.py x2). Also update path separators in
extract_local_files from / to [/\\] so it can traverse Windows
directory trees.
Revert accidental + quantifier in MEDIA_TAG_CLEANUP_RE lookahead
that changed match-one to match-one-or-more (unrelated to fix).
Fixes: #34632
The per-platform reconnect watcher auto-paused a platform after 10
consecutive reconnect failures, setting next_retry=inf and requiring a
manual /platform resume to recover. But both pause sites only ever fire
on *retryable* failures — non-retryable errors (bad auth) already drop
out of the retry queue earlier. So a transient DNS outage that spanned
the watcher's backoff window would silently park the bot forever, even
after connectivity returned.
The watcher's own docstring already promised 'retryable failures keep
retrying at the backoff cap indefinitely' — the code contradicted it.
Remove the auto-pause from both reconnect-failure branches. Retryable
failures now retry at the 5-min backoff cap forever and self-heal once
the network recovers. The circuit breaker (_pause_failed_platform /
_resume_paused_platform) stays for manual /platform pause|resume.
Fixes#35284.
Convert the salvaged text-debounce delays from HERMES_* env vars to
config.yaml (gateway.platforms.<name>.extra.text_batch_delay_seconds /
text_batch_split_delay_seconds), per the '.env is for secrets only'
policy. Adds a finite/non-negative guard so bad YAML values fall back to
the defaults instead of crashing asyncio.sleep().
- whatsapp.py / weixin.py: read delays via _coerce_float_extra(config.extra)
- update Weixin content-dedup regression test for the deferred dispatch path
- add text-debounce coverage (whatsapp + weixin): defaults, config override,
bad-value fallback, env-var-ignored, burst-collapse, lone-message
- docs: WhatsApp + Weixin config keys
The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.
Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.
Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
Layer an exception guard on top of the empty-response fix so a crash
inside the agent (e.g. OSError from prompt_toolkit/Vt100 when stdout is a
non-TTY pipe, per #30623) is surfaced on the real stderr with rc=1 instead
of crashing past the redirect_stderr block. KeyboardInterrupt/SystemExit
are re-raised so Ctrl-C and explicit exits still propagate.
Also map briancl2 in scripts/release.py AUTHOR_MAP for the cherry-picked
empty-response commit.
Adapts the exception-guard approach from sweetcornna's PR #33818.
Co-authored-by: sweetcornna <96944678+ymylive@users.noreply.github.com>
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.
With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.
- browser_supervisor.evaluate_runtime: on that specific error, retry once
with returnByValue=false so Chrome returns the node's description string —
the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
retry, so convert the reference-chain error into actionable guidance
(extract a primitive / use JSON.stringify) instead of leaking it raw.
No expression rewriting — normal evals (1+41 -> 42) are untouched.
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.
- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
them in _strip_summary_prefix + _is_context_summary_content so resumed
stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
#26290 (reverse-signal cancellation) and #32787 (capture open questions /
decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
SUMMARY_PREFIX previously contained two contradictory directives:
1. "treat it as background reference, NOT as active instructions"
"Do NOT answer questions or fulfill requests mentioned in this summary"
"Respond ONLY to the latest user message that appears AFTER this summary"
2. "Your current task is identified in the '## Active Task' section of the
summary — resume exactly from there."
When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.
This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
of truth; it WINS on conflict; cancelled Active Task / In Progress /
Pending User Asks / Remaining Work must be discarded entirely (no
'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
just verify, topic change) so the model recognizes them as cancellation
triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
mechanically copy a cancelled task into Active Task on the next
compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
the 'don't repeat work already reflected in session state' clause.
Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.
Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
tests/agent/test_context_compressor_summary_continuity.py,
tests/run_agent/test_413_compression.py,
tests/run_agent/test_compression_persistence.py,
tests/run_agent/test_compression_boundary_hook.py,
tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
Follow-up on the #35309 regression test: the trailing `with _lock: pass`
asserted nothing. Replace it with a concrete assertion that
_interrupted_threads is empty after the worker exits, directly verifying
the leak the fix prevents.
When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt),
the cleanup code at the end of _run_tool was bypassed because it sat outside
the except block (which only catches Exception). ThreadPoolExecutor recycles
thread IDs, so the leaked tid in _interrupted_threads poisons the next tool
scheduled on that thread — it instantly aborts with 'Interrupted'.
Move the discard + _set_interrupt(False) into a finally block so cleanup
runs regardless of how the worker exits.
Fixes#35309
Empty model could reach the API on a recovery turn after stream_interrupt_abort,
failing HTTP 400 "No models provided" with no recovery — the session went
silent until the user manually re-sent (#35314).
- gateway/run.py: cache last-successfully-resolved model per session (+ a
process-wide slot); when a fresh config read returns an empty model on a
recovery turn, reuse the last-known-good instead of building model="".
- run_agent.py + agent/conversation_loop.py: only emit "trying fallback..."
status when a fallback chain actually exists, so the UI stops announcing a
fallback that will never run (also #17446).
- tests: empty-model recovery + _has_pending_fallback gate.
read_file's gutter used a fixed-width zero/space-padded prefix
(" 1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.
Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
- padded : 4/4 PASS both passes
- compact : 4/4 PASS both passes ← keeps line-referencing + patch
- none : 3/4 PASS both passes ← dropping numbers entirely made
the model hand-count lines and answer off-by-one (33 vs 34)
So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.
patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.
Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.
Now:
- read_file / read_file_raw strip a single leading BOM so the model
never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
first-line match works) and its post-write verification compares
BOM-stripped content.
- write_file restores the BOM when the original file had one and the
new content doesn't, mirroring the existing line-ending preservation
(detect on disk via a cheap `head -c 3` probe or reuse pre_content,
re-prepend across the edit). Guards against double-BOM.
Mid-content U+FEFF is left alone (it's data there, not a file marker).
Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
Adds a 'hermes prompt-size' command that reports the fixed prompt budget
for a fresh session: system prompt total, skills index, memory, user
profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy
credentials force the direct-construction path, no network call).
Lets users see which block dominates their per-call payload — the skills
index is often the largest single block when many skills are installed
(issue #34667). Zero model-tool footprint: it's a top-level CLI
subcommand, not an agent tool.
--platform <name> simulates a channel's platform hint; --json emits a
machine-readable breakdown.
Closes#34667
The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.
Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).
Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
The no-home-channel error for send_message derived the env var name
generically as <PLATFORM>_HOME_CHANNEL, producing EMAIL_HOME_CHANNEL for
the email platform. But gateway/config.py reads EMAIL_HOME_ADDRESS, so a
user following the error's guidance would set a variable that is never
consulted. Add a per-platform override map so the email hint names the
variable actually read; all other platforms keep the generic hint.
When using send_message with the email platform, valid email addresses
like user@example.com were not recognized as explicit targets by
_parse_target_ref(). This caused the function to return (None, None,
False), forcing the system into channel-name resolution which has no
way to resolve a raw email address, resulting in 'No home channel set
for email' errors.
Add _EMAIL_TARGET_RE pattern and email platform handler in
_parse_target_ref() so email addresses are treated as explicit targets
and routed directly without requiring a home target configuration.
Adds two real-client tests on top of the salvaged #34783 fix:
- config-less custom:<name> endpoint routes via the carried live base_url
(guards the #34777 symptom directly, not just the wiring)
- named custom:<name> WITH a config entry still resolves via the
named-custom branch (regression guard against collapsing to bare custom)
When a user configures a custom: provider (e.g. custom:openclaw-router),
set_runtime_main() only stored provider and model in process-local globals.
_resolve_auto() then had no base_url or api_key for the custom endpoint,
causing Step 1 to fail and auxiliary tasks (approval, compression, title
generation) to fall through to the aggregator chain and route to wrong
providers.
Fix: extend set_runtime_main() to accept base_url, api_key, and api_mode
keyword arguments; store them in new globals alongside the existing provider
and model; fall back to these globals in _resolve_auto() when the main_runtime
dict is empty. The call site in conversation_loop.py now passes all five
fields from the agent object.
Fixes#34777
hermes update on Windows still aborted with 'Another hermes.exe is running',
listing its own launcher shim(s) as concurrent instances (issues #29341,
#34795). The distlib Scripts\hermes.exe launcher spawns python.exe and waits;
detection runs in the python child, so the launcher shim shows up in
process_iter.
The prior fix walked the ancestor chain with per-hop current.parent() inside
'except: break' — the first psutil AccessDenied/NoSuchProcess (common on
Windows across session/elevation boundaries) bailed the walk early, leaving
the launcher in the candidate set and re-triggering the false positive.
- Switch to proc.parents() (whole ancestor list in one call), evaluate each
ancestor independently so one unreadable hop never strands the launcher.
- Only exclude ancestors whose exe is itself a shim, so a genuine second
hermes.exe under a non-Hermes parent (Desktop backend child) is still flagged.
- Message now prints a copy-pasteable 'taskkill /PID … /F' for the exact stale
PIDs so a user who already closed everything can self-remediate.
Conservative shim-only ancestor approach credited to the parallel attempts in
PRs #29358 (xxxigm) and #31808 (jquesnelle).
Normalize Gmail API message header names to lowercase before lookup so
gmail get/search/reply populate to/subject/from regardless of the casing
the message was stored with. Emit conventional MIME header casing
(To/Subject/Cc/From) on send and reply.
Fixes#34806
Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>