The top-center floating HUDs (command palette + session switcher) pin at
top-3, overlapping the titlebar's `[-webkit-app-region:drag]` bands. Drag
regions win hit-testing over the DOM regardless of z-index, so the top of
each surface — the search input — swallowed clicks, leaving only a ~2px
strip focusable. Add `[-webkit-app-region:no-drag]` to the shared
HUD_SURFACE so the whole surface is interactive.
#53552 flipped verify_on_stop to default OFF because the guard fired on
doc/markdown/skill edits and felt like noise. That doc/markdown/skill
suppression already shipped in the same change (_filter_verifiable_paths in
agent/verification_stop.py), so the original noise rationale no longer holds:
the guard already skips prose-only turns.
Restore the surface-aware "auto" default — ON for interactive coding surfaces
(CLI, TUI, desktop) and programmatic callers, OFF for conversational messaging
surfaces (Telegram, Discord, etc.) where the verification narrative would reach
a human as chat noise. The missing/unrecognized fallback in
verify_on_stop_enabled now resolves to the same surface-aware default instead of
hard OFF, so both the DEFAULT_CONFIG value and the resolver agree.
Scope: this changes the shipped default for fresh installs and configs without
an explicit verify_on_stop key. Existing configs that #53552/#54740 migrated to
an explicit `false` are respected and unchanged — this PR does not add a
force-migration of those values back to auto.
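A minimal sketch of the resolver contract described above. It assumes the messaging-surface set, the `surface_default` helper, and the accepted string spellings. None of these are taken from the real verify_on_stop_enabled. An explicit boolean wins, and "auto", a missing key, or an unrecognized value all fall back to the same surface-aware default:

```python
# Hypothetical set of conversational messaging surfaces; the real resolver's
# list may differ.
MESSAGING_SURFACES = {"telegram", "discord", "slack", "signal"}


def surface_default(surface):
    """Surface-aware "auto": ON for CLI/TUI/desktop and programmatic callers
    (surface=None), OFF for conversational messaging surfaces."""
    return (surface or "").lower() not in MESSAGING_SURFACES


def verify_on_stop_enabled(value, surface):
    """Resolve a verify_on_stop config value for one surface (sketch)."""
    if isinstance(value, bool):
        return value  # an explicit true/false (e.g. a migrated `false`) wins
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in ("true", "on", "yes"):
            return True
        if normalized in ("false", "off", "no"):
            return False
    # "auto", a missing key (None), or anything unrecognized all resolve to
    # the same surface-aware default instead of a hard OFF.
    return surface_default(surface)
```

Because the missing-key path and the "auto" path share one function, DEFAULT_CONFIG and the fallback cannot drift apart.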
These tests patch `<module>.subprocess.run`, which is the shared `subprocess`
module singleton, so the patch is process-wide. Importing `tui_gateway.server`
runs `prefetch_update_check()` at import time, spawning an unnamed daemon thread
(`Thread-N (_run)`) that shells out to `git ... origin` (`text=True, timeout=5`).
That call races the test and lands in the captured list, intermittently failing
`test_tui_gateway_fuzzy_file_listing_hides_git_windows` with either
`KeyError: 'creationflags'` (the daemon's git call has no creationflags) or a
call-count mismatch (3 git calls captured, not 2). It only reproduced under the
parallel test harness because of the extra concurrency/timing.
Filter captured calls to the distinctive argv tokens of the call under test
(`--show-toplevel`, `ls-files`, `branch --show-current`, `diff`, `rg`,
`taskkill`) and read `creationflags` via `.get`, mirroring the existing
hardening on `test_gateway_pid_scan_hides_wmic_and_powershell_windows`. The
production code is unchanged; this is a test-isolation fix.
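A sketch of the filtering described above, assuming the patched `subprocess.run` records `(argv, kwargs)` pairs. The helper names here are illustrative, not the test's actual code. Single tokens are matched by set membership and `branch --show-current` as an adjacent pair, so a stray `git ... origin` call from the daemon thread is ignored:

```python
CREATE_NO_WINDOW = 0x08000000  # value of subprocess.CREATE_NO_WINDOW on Windows

# Single argv tokens that only the calls under test use.
UNDER_TEST_TOKENS = {"--show-toplevel", "ls-files", "diff", "rg", "taskkill"}


def is_call_under_test(argv):
    argv = list(argv)
    if UNDER_TEST_TOKENS.intersection(argv):
        return True
    # "branch --show-current" must match as two adjacent tokens.
    return any(a == "branch" and b == "--show-current"
               for a, b in zip(argv, argv[1:]))


def relevant_calls(captured):
    """Keep only the (argv, kwargs) pairs issued by the code under test."""
    return [(argv, kwargs) for argv, kwargs in captured if is_call_under_test(argv)]


def hides_window(kwargs):
    # .get, not ["creationflags"]: a stray call without the kwarg must not KeyError.
    return bool(kwargs.get("creationflags", 0) & CREATE_NO_WINDOW)
```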
DesktopController is a route root that had grown a controller's worth of
session-list plumbing inline. Extract the cohesive fetch/paging cluster into
a focused hook and a tested pure helper, per AGENTS.md's "keep route roots
thin" guidance:
- use-session-list-actions.ts: refreshSessions / loadMoreSessions /
loadMoreSessionsForProfile / loadMoreMessagingForPlatform / refreshCronJobs
(plus the private cron/messaging refreshers, sessionsToKeep, and the
excluded-source constants)
- desktop-controller-utils.ts: pure sameCronSignature helper (+ unit tests)
Pure restructuring, no behavior change. desktop-controller.tsx: 1,441 -> 1,233.
Pull ChatBar's module-level pure helpers, constants, and the QueueEditState
type out of the 2.3k-line composer/index.tsx into a focused, testable
composer-utils.ts sibling:
- constants: COMPOSER_STACK_BREAKPOINT_PX, COMPOSER_SINGLE_LINE_MAX_PX,
COMPOSER_FADE_BACKGROUND, DRAFT_PERSIST_DEBOUNCE_MS
- helpers: pickPlaceholder, COMPLETION_ACTIONS, slashChipKindForItem,
slashArgStage, slashCommandToken, cloneAttachments
- type: QueueEditState
Pure restructuring, no behavior change; adds unit tests for the slash helpers.
(The ChatBar component itself is a single tightly-coupled megacomponent; a
deeper hook-based decomposition is left for a dedicated follow-up.)
Behavior-preserving extraction of the 1,942-line thread.tsx transcript
renderer into co-located sibling modules, matching the existing flat
assistant-ui/ convention:
- thread-content.ts / thread-timestamp.ts: pure helpers (+ unit tests)
- thread-types.ts: shared RestoreMessageTarget
- thread-status.tsx: loading / stall / background-resume indicators
- thread-message-parts.tsx: reasoning + tool part components
- assistant-message.tsx, system-message.tsx, user-message.tsx,
user-edit-composer.tsx: the message renderers
thread.tsx now holds only the Thread route component (1,942 -> 119 lines).
Also drops a dead readAloudAudio module variable (no references).
agent.coding_instructions (a string or list) is appended to the coding brief as
its own stable system block, so users can pin project-wide workflow rules
without editing the shipped brief. Coding-posture only and cache-safe (resolved
once per session; takes effect next session). Empty by default.
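A small sketch of how a string-or-list setting could be normalized into one system block. The function name and the join-by-newline choice are assumptions, not the shipped implementation:

```python
def coding_instructions_block(value):
    """Normalize agent.coding_instructions (str, list, or empty) into block text."""
    if not value:
        return None  # empty by default: no extra system block
    items = [value] if isinstance(value, str) else [str(v) for v in value]
    text = "\n".join(s.strip() for s in items if s.strip())
    return text or None  # a list of blanks still yields no block
```

Resolving this once when the session starts keeps the block byte-stable, which is what makes it cache-safe.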
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.
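A sketch of accepting both hook result shapes and honouring the nudge cap. The function name and the `None`-means-stop convention are assumptions:

```python
def pre_verify_continuation(result, attempt, max_verify_nudges):
    """Return the continuation message, or None to let the turn finish (sketch)."""
    if attempt >= max_verify_nudges:
        return None  # bounded: never nudge past agent.max_verify_nudges
    if not isinstance(result, dict):
        return None
    if result.get("action") == "continue":
        return result.get("message") or ""
    if result.get("decision") == "block":  # Claude Code Stop-hook shape
        return result.get("reason") or ""
    return None
```

Callers should test the result with `is not None`, since an empty message is still a valid request to continue.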
Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.
Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.
Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.
Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
Apply naqerl's review comments on PR #38388:
- Hoist `from hermes_cli.goals import judge_goal` to module-level
imports so an import failure surfaces at module init, not lazily
on the first goal-mode completion (no circular import: hermes_cli
package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
The verdict check and its rejection `return tool_error(...)` now
live outside the handler, so a failure there can no longer be
swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.
Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
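Taken together, the gate's shape can be sketched as follows. The sketch assumes `judge_goal` returns a verdict string and that an error message string stands in for `tool_error(...)`. `handle_goal_mode_completion` and the injected `judge_available` probe are hypothetical. The `try` covers only the judge call, and the verdict check sits outside it:

```python
import logging

logger = logging.getLogger(__name__)


def handle_goal_mode_completion(task, summary, judge_available, judge_goal):
    """Return None to allow completion, or a rejection message (sketch)."""
    if not judge_available():
        return None  # no reachable judge: fail open so the worker can finish
    try:
        verdict = judge_goal(task["title"], task["body"], summary)
    except Exception:
        logger.warning("goal judge failed; allowing completion", exc_info=True)
        return None
    # Outside the try: a bug here surfaces instead of being swallowed.
    if verdict != "done":
        return ("Acceptance criteria not met yet. Keep working, or split the "
                "remainder with kanban_create(parents=[...]).")
    return None
```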
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.
This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.
Fixes #38367
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
The Cmd-K "Install theme…" palette listed Marketplace themes with no hint
that you already had them, and clicking one re-downloaded + re-installed a
theme you owned. The Appearance settings grid already detected this, but by
parsing theme descriptions inline on every render — plumbing that never made
it to the palette.
Lift it into one reactive source and reuse it everywhere:
- $marketplaceInstalls (computed over $userThemes): extensionId -> installed
theme, derived once via marketplaceIdOf and memoized, instead of rebuilding
a Set per render.
- Both install surfaces now mark owned rows installed and, on click,
re-activate the installed theme rather than re-fetching it.
- Drops the duplicated description-parsing in settings and the per-session
"installed here" state in both surfaces (the store is the source of truth,
so previously-installed themes show correctly too).
Add a generic per-platform PlatformConfig.typing_indicator flag (default
True) that gates the _keep_typing refresh loop in
_process_message_background. When false, the loop is never spawned, so no
typing/"is thinking…" status is shown on that platform — message delivery
is otherwise unchanged.
Mirrors the gateway_restart_notification contract exactly: dataclass field
+ to_dict/from_dict (with extra-fallback resolution) + shared-key bridge in
load_gateway_config, so 'slack: typing_indicator: false' under platforms
works without a separate block. Generic by design — the same key works for
every platform (Slack 'is thinking…', Telegram/Discord/Signal typing).
Motivated by users who find Slack's assistant 'is thinking…' status noisy
(it also briefly disables the compose box, via the Assistant API).
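A sketch of the dataclass field plus the `to_dict`/`from_dict` round trip with extra-fallback resolution. The `enabled` field and the exact `extra` layout are assumptions for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class PlatformConfig:
    enabled: bool = True            # hypothetical sibling field
    typing_indicator: bool = True   # gates the _keep_typing refresh loop
    extra: dict = field(default_factory=dict)

    def to_dict(self):
        return {"enabled": self.enabled,
                "typing_indicator": self.typing_indicator,
                "extra": dict(self.extra)}

    @classmethod
    def from_dict(cls, data):
        extra = dict(data.get("extra") or {})
        # Prefer the top-level key; fall back to a value parked in `extra`.
        typing = data.get("typing_indicator", extra.get("typing_indicator", True))
        return cls(enabled=data.get("enabled", True),
                   typing_indicator=typing, extra=extra)
```

When `typing_indicator` is false, the background task simply never spawns the refresh loop; delivery itself takes the same path.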
degraded is the same wedge class as draining: the gateway came up with
some platforms queued for retry, fell through to the running state
(gateway/run.py #5196), and is serving. A hard-kill there strands
gateway_state=degraded, which (like draining) is not in _AUTOSTART_STATES
and is not an operator stop or a failed boot — so it would stay DOWN
forever on every recreate. Add degraded to _TRANSIENT_RUNNING_STATES so
the fallback path normalises it to running-intent too.
A gateway hard-killed while draining (a container/VM recreate SIGTERMs it
before _stop_impl reaches its terminal-state persist) leaves
gateway_state.json frozen at 'draining'. With no explicit desired_state to
fall back to, container_boot read that transient value literally, found it
not in _AUTOSTART_STATES, and left the gateway DOWN on every subsequent
boot — dashboard up, messaging silently dark. Observed on a relay-opted-in
staging instance (2026-06): the s6 gateway-default slot kept its 'down'
marker across recreates and the gateway never came back.
'draining' is a transient sub-state of RUNNING (written by the drain
watcher / scale-to-zero go-dormant path), never an operator stop and never
a failed boot. Normalise it to 'running' in the gateway_state fallback so a
stranded drain marker reads as the run-intent it represents. This extends
gateway/run.py's #42675 handling (persist 'running' on an unexpected signal)
to the case where the gateway died before persisting anything at all.
'starting'/'startup_failed' are deliberately NOT normalised — those mean a
mid-boot death and must stay down to avoid the crash-loop the down-marker
guard prevents. An explicit desired_state still wins verbatim, so an
operator stop survives a transient 'draining' runtime value.
Tests: draining named-profile + default-root autostart (both fail without
the fix), plus a guard that an explicit desired_state=stopped still blocks a
draining runtime.
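The boot-time decision from both commits can be sketched like this. The `_AUTOSTART_STATES` contents and the function names are assumptions. An explicit desired_state wins verbatim, transient running sub-states normalise to 'running', and mid-boot states are left alone:

```python
_AUTOSTART_STATES = {"running"}                    # hypothetical contents
_TRANSIENT_RUNNING_STATES = {"draining", "degraded"}


def resolve_boot_intent(desired_state, runtime_state):
    if desired_state is not None:
        return desired_state  # operator intent wins verbatim
    if runtime_state in _TRANSIENT_RUNNING_STATES:
        return "running"      # a stranded marker still means run-intent
    return runtime_state      # 'starting'/'startup_failed' stay as-is


def should_autostart(desired_state, runtime_state):
    return resolve_boot_intent(desired_state, runtime_state) in _AUTOSTART_STATES
```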
Port the two genuinely-novel ideas from Command Code's /design skill into
our existing claude-design skill (skill-only, zero model-tool footprint):
- Surface-First: commit to one of 7 surface archetypes (Monitor/Operate/
Compare/Configure/Decide/Explore/Command) before any visual tokens. Most
AI design slop is compositional, not cosmetic — conditioning generation on
a surface choice collapses entropy the way a CoT step does. Workflow step 3.
- Slop Diagnostic: the ~10 tells that account for ~90% of the 'this is AI'
signal, as a score-out-of-10 self-audit. Diagnose-then-treat: the report is
context, not a to-do list; repair only what fired, matched to the tell
(re-layout vs recolor vs de-decorate). Workflow step 7 (Verify).
Did NOT clone /design's 16-mode CLI or its proprietary reference corpus, and
did not make it a core tool. Docs page regenerated via generate-skill-docs.py.
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:
- Store generated media on files-cdn with permanent public HTTPS URLs
(public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
and ~/ local-path support for image edits.
Credentials use xAI Grok device-code OAuth (separate PR).
A Slack user/legacy token (xoxp-...) makes auth.test resolve to the
installing human's member ID with no bot_id, so the adapter binds its
identity (_bot_user_id / _team_bot_user_ids) to that human. Every
"is this the bot?" check then misfires: that person's <@...> mentions
wake the bot and are stripped as the bot's own mention, so the agent is
genuinely told it was @mentioned and replies to messages merely
addressed to that human (symptom: bot responds to "@trevor ..." and
insists it was explicitly mentioned).
There is no runtime API error to catch — a user token still
sends/receives — so the only detectable moment is connect time. Add a
warning-only nudge (_warn_if_not_bot_token) alongside the existing
group-DM scope nudge: when auth.test resolves a user_id but no bot_id,
log that the token is a user token and to use the xoxb-... Bot User
OAuth Token. Warning-only: does not block a working-but-misconfigured
install. Fires once per workspace per process.
DRY: the roomier-side bias computed its probability two ways
(STROLL_TOWARD_ROOM and 1 - STROLL_TOWARD_ROOM). One draw XNOR'd against
the roomier side says the same thing more plainly.
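The same idea sketched in Python (the pet code itself is TypeScript). The value of `STROLL_TOWARD_ROOM` here is made up. One draw decides "toward room or not", and XNOR with "roomier side is right" gives the direction:

```python
import random

STROLL_TOWARD_ROOM = 0.7  # hypothetical bias value


def stroll_goes_right(roomier_is_right, rng=random.random):
    toward_room = rng() < STROLL_TOWARD_ROOM
    # XNOR: go right exactly when "toward room" and "room is on the right" agree.
    return toward_room == roomier_is_right
```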
The floating pet wandered almost constantly: every idle beat picked a new
walk and hops fired ~45% of the time, so it read as nervous rather than
alive. Make movement the exception, not the default, and split the
overgrown roam hook into focused modules.
Behavior (per ambient game-AI: GameAIPro ch.36 + idle/wander state
machines):
- Loaf, don't pace: most decision beats just keep resting (REST_CHANCE
0.62) instead of always re-walking.
- Memoryless dwell: pauses now draw from an exponential distribution
(mostly short rests, the occasional long loaf) instead of a uniform
1.8-5.2s window, so the cadence never reads as a metronome.
- Hops dialed back 0.45 -> 0.2 (the jumpiest, noisiest motion).
Structure (no god-file; a hook should own one narrow job):
- roam-behavior.ts - what to do & when (dwellMs, chooseMove,
pickStrollTarget) + tuning. Pure, rng-injectable.
- roam-geometry.ts - where it can stand (snapshotLedges, overlayLedge,
resolveLedge, overlapsX, groundTop). DOM measurement + pure ledge math.
- use-pet-roam.ts - the physics/RAF loop only.
Tests: deterministic, rng-seeded unit coverage for the decision + geometry
helpers (behavior contracts, not snapshots).
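A Python rendering of the decision rules, with an injectable rng as the commit describes. `MEAN_DWELL_MS` and these function names are assumptions, and the real roam-behavior.ts is TypeScript. The dwell uses inverse-CDF sampling of an exponential distribution:

```python
import math
import random

REST_CHANCE = 0.62
HOP_CHANCE = 0.2
MEAN_DWELL_MS = 2500  # hypothetical mean pause


def dwell_ms(rng=random.random, mean_ms=MEAN_DWELL_MS):
    # Exponential via inverse CDF: mostly short rests, the occasional long loaf.
    return -mean_ms * math.log(1.0 - rng())


def choose_move(rng=random.random):
    if rng() < REST_CHANCE:
        return "rest"  # loaf, don't pace
    return "hop" if rng() < HOP_CHANCE else "walk"
```

Injecting `rng` is what lets the tests pin exact draws instead of snapshotting random output.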
Channel users get the same context split the desktop popover shows
(PR #54907) — system prompt, tools, rules, skills, MCP, subagents,
memory, conversation — under the existing Context line in /usage.
Reuses agent.context_breakdown.compute_session_context_breakdown, so
there is no new tool and no new engine. The slices are estimates
(chars/4) and the block is labelled _(estimated)_; the headline
Context line keeps using the provider-measured last_prompt_tokens.
Rendering is fail-open: any engine error returns no breakdown and the
rest of /usage is unaffected.
- gateway/slash_commands.py: _context_breakdown_lines() helper + wire
into _handle_usage_command
- locales/*.yaml: breakdown_header, breakdown_line, and 8 category
labels across all 16 locales (parity gate)
- tests/gateway/test_usage_command.py: render + fail-open coverage
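A sketch of the fail-open rendering, assuming the engine call returns a mapping of category label to character count. The real compute_session_context_breakdown return shape and the line format are not confirmed here:

```python
def context_breakdown_lines(compute_breakdown):
    """Render estimated context slices, or nothing if the engine fails (sketch)."""
    try:
        slices = compute_breakdown()
    except Exception:
        return []  # fail-open: the rest of /usage renders unchanged
    lines = ["_(estimated)_"]
    for label, chars in slices.items():
        lines.append(f"  {label}: ~{chars // 4:,} tokens")  # chars/4 estimate
    return lines
```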
The self-hosted OIDC dashboard provider was public-client + PKCE only, with
two `# TODO(confidential-client)` seams. Authentik and Keycloak commonly
default a new OIDC client to *confidential*, whose token endpoint rejects an
unauthenticated exchange (`invalid_client`) — so a self-hoster who accepts
their IDP's default could not complete dashboard login without manually
flipping the client to public.
Add optional confidential-client support:
- New optional `client_secret` (env `HERMES_DASHBOARD_OIDC_CLIENT_SECRET`,
or `dashboard.oauth.self_hosted.client_secret`; env-wins-config, empty
treated as unset). It is a credential, so docs steer operators to the
`.env` file; config.yaml is supported only for precedence symmetry.
- `_token_endpoint_auth()` selects `client_secret_basic` (HTTP Basic header)
vs `client_secret_post` (form body) from the IDP's advertised
`token_endpoint_auth_methods_supported`, defaulting to basic (the OIDC
default) when absent. Applied to complete_login, refresh_session, and
revoke_session (RFC 7009 §2.1).
- PKCE is sent in BOTH modes — the secret is client authentication layered
on top, never a replacement (OAuth 2.1 / RFC 9700 keep PKCE mandatory).
- Basic header url-encodes client_id/secret before base64 per RFC 6749
§2.3.1, so reserved chars (`:`, `@`, space) round-trip correctly.
Non-breaking: with no secret configured the provider is a pure public PKCE
client, byte-identical to prior behaviour (no Authorization header, no
client_secret in the body). The secret is never logged — register() reports
only a `confidential=<bool>` flag.
Tests: 16 new cases covering basic/post selection, default-when-absent,
public-unchanged contract, PKCE-preserved, reserved-char url-encoding,
blank-secret-is-public, refresh + revoke auth, no-secret-in-logs, and
env/config register wiring. Full dashboard-auth suite (nous provider,
middleware, gate, cookies, WS, 401-reauth, status endpoint) — 396 tests —
green, proving no existing auth path regressed.
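A sketch of the auth-method selection, assuming the discovery document is available as a dict. The function's return shape is hypothetical. `quote_plus` applies form-urlencoding to each value before base64, per RFC 6749 §2.3.1. PKCE parameters are added elsewhere and are unaffected:

```python
import base64
from urllib.parse import quote_plus


def token_endpoint_auth(client_id, client_secret, discovery):
    """Return (headers, extra_form_fields) for a token/refresh/revoke call (sketch)."""
    if not client_secret:
        return {}, {}  # public PKCE client: no Authorization header, no secret in body
    methods = discovery.get("token_endpoint_auth_methods_supported") or []
    if "client_secret_basic" in methods or "client_secret_post" not in methods:
        # Basic is preferred when advertised and is the OIDC default when absent.
        raw = f"{quote_plus(client_id, safe='')}:{quote_plus(client_secret, safe='')}"
        token = base64.b64encode(raw.encode()).decode()
        return {"Authorization": f"Basic {token}"}, {}
    return {}, {"client_id": client_id, "client_secret": client_secret}
```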
* feat(display): friendly human-phrased tool labels for built-in tools
Built-in tools now render ChatGPT-style status verbs ('Searching the web
for ...', 'Reading <file>', 'Browsing <url>') on the CLI spinner and
gateway/desktop tool-progress instead of the raw tool name.
- agent/display.py: _TOOL_VERBS map + build_tool_label() + set/get
friendly-labels flag (default on). Custom/plugin/MCP tools fall back to
the raw preview; verbose gateway mode left untouched (debug surface).
- tool_executor.py / tui_gateway / gateway: route the three spinner sites,
the TUI _tool_ctx, and the gateway all/new progress line through the label.
- config: display.friendly_tool_labels (default True, per-platform aware).
Zero new core tool / schema footprint — pure display layer.
* docs: add PR infographic for friendly tool labels
* fix(display): preserve arg preview in gateway friendly labels + update tests
The first gateway pass re-derived the label from the callback's `args`, which
is empty ({}) at the gateway tool.started callsite — the command/query lives in
the `preview` string, so terminal rendered as a bare '💻 Running' and dedup
collapsed consecutive commands. Now the gateway prefixes the verb onto the
already-computed preview via get_tool_verb/tool_verb_connector/verb_drops_preview,
preserving the command/url/query. CLI spinner path (real args) keeps build_tool_label.
Tests: update test_run_progress_topics exact-format assertions to the friendly
form ('💻 Running pwd'), add a format-agnostic preview extractor for the
truncation tests (works for both quoted-legacy and verb-prefixed output).
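A sketch of prefixing the verb onto an already-computed preview, rather than re-deriving it from the empty `args`. The verb table and this function are hypothetical stand-ins for `_TOOL_VERBS`, get_tool_verb, and tool_verb_connector:

```python
# Hypothetical entries: tool name -> (emoji, verb, connector word).
_TOOL_VERBS = {
    "terminal": ("💻", "Running", ""),
    "web_search": ("🔍", "Searching the web", "for"),
}


def friendly_progress_line(tool_name, preview):
    entry = _TOOL_VERBS.get(tool_name)
    if entry is None:
        return preview  # custom/plugin/MCP tools keep the raw preview
    emoji, verb, connector = entry
    parts = [emoji, verb]
    if preview:
        parts += ([connector] if connector else []) + [preview]
    return " ".join(parts)
```

Keeping the preview in the line also keeps consecutive commands distinct, so dedup no longer collapses them.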
* test(tui): update resume-display context to friendly tool label
_tool_ctx now uses build_tool_label, so the desktop resume-view context for a
search_files turn reads 'Searching files for resume' instead of the bare
'resume' preview — consistent with live tool-progress. Update the assertion.
* test(tui): harden no-race worker test against sibling shard leakage
test_session_create_no_race_keeps_worker_alive flaked under -j 8: a daemon
build thread leaked from a prior session.create test in the same shard process
fires close/unregister against its own (foreign) session_key after this test
patches the global approval hooks, polluting the captured lists. Scope the
assertions to this session's own session_key so the regression intent
(this session's worker/notify must survive) is preserved while the test
becomes immune to shard composition. Not related to friendly-tool-labels.
Gateway half of relay-platform-parity Phase 2.5 (D-Q2.5). The relay wire's
platform-neutral scope discriminator is renamed guild_id → scope_id; this is the
hermes-agent side of the cross-repo wire-compatible migration.
- SessionSource: scope_id is canonical; guild_id kept as @deprecated alias.
__post_init__ mirrors the two so all existing SessionSource(guild_id=...)
constructors across native adapters keep working unchanged. to_dict dual-WRITES
scope_id+guild_id; from_dict dual-READS scope_id ?? guild_id.
- relay/adapter.py: capture + outbound metadata dual-read/write scope_id.
- relay/ws_transport.py: _frame_to_event dual-reads scope_id ?? guild_id.
- docs/relay-connector-contract.md: document scope_id (canonical) + guild_id
(deprecated alias) in the §3 SessionSource field table (conformance test).
250 relay+session+contract tests green. Solo lane (relay).
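A reduced sketch of the alias contract. Only `platform`, `scope_id`, and `guild_id` are modelled, and the real SessionSource has more fields. `__post_init__` mirrors the two names, `to_dict` dual-writes, and `from_dict` dual-reads with `scope_id` taking precedence:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionSource:
    platform: str
    scope_id: Optional[str] = None   # canonical
    guild_id: Optional[str] = None   # deprecated alias

    def __post_init__(self):
        # Mirror whichever was supplied so SessionSource(guild_id=...) keeps working.
        if self.scope_id is None:
            self.scope_id = self.guild_id
        elif self.guild_id is None:
            self.guild_id = self.scope_id

    def to_dict(self):
        return {"platform": self.platform,
                "scope_id": self.scope_id, "guild_id": self.scope_id}

    @classmethod
    def from_dict(cls, data):
        scope = data.get("scope_id")
        if scope is None:
            scope = data.get("guild_id")  # scope_id ?? guild_id
        return cls(platform=data["platform"], scope_id=scope)
```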
The self-hosted OIDC dashboard login rejected any http:// redirect_uri
whose host was not localhost/127.0.0.1, surfacing "redirect_uri may only
use http:// for localhost/127.0.0.1" before reaching the IDP. This broke
self-hosted dashboards reached over plain HTTP (including LAN IPs,
internal hostnames, and reverse proxies that terminate TLS upstream).
#38827 already dropped this check from the nous provider, but the generic
self-hosted provider copied the old localhost-only branch and
reintroduced the bug for HERMES_DASHBOARD_OIDC_ISSUER setups.
The IDP's own allowlist is authoritative on which redirect_uris are
permitted; this client-side _validate_redirect_uri is only a fast-fail for
obvious operator error and should not second-guess valid http:// deployments.
Fix: drop the localhost-only branch on the http scheme. Validation now
enforces only that the scheme is http(s) and the path ends with
/auth/callback. Updated the docstring to explain the relaxed contract,
and added test_allows_http_with_arbitrary_host covering an internal
hostname and a LAN IP alongside the existing localhost case.
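A sketch of the relaxed contract using only the standard library. The function name mirrors `_validate_redirect_uri`, but its body here is illustrative. Only the scheme and the path suffix are checked, and any host is accepted:

```python
from urllib.parse import urlsplit


def validate_redirect_uri(uri):
    """Fast-fail for obvious operator error; the IDP allowlist stays authoritative."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError("redirect_uri must use http:// or https://")
    if not parts.path.endswith("/auth/callback"):
        raise ValueError("redirect_uri path must end with /auth/callback")
    return uri
```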
The topic-mode helpers (_telegram_topic_mode_enabled,
_recover_telegram_topic_thread_id, _record/_sync_telegram_topic_binding,
_is_telegram_topic_lane/_root_lobby, _normalize_source_for_session_key,
_telegram_topic_new_header, _schedule_telegram_topic_title_rename, and the
base.py _apply_topic_recovery hook) each run a synchronous SessionDB read or
write. They reach the event loop through async handlers, so a contended
state.db froze the loop the same way the handoff watcher did.
These helpers already run off-loop in the run_sync thread-pool closure, so
they are proven thread-safe there. Rather than colour them async, loop-side
callers now invoke them via asyncio.to_thread(...); the executor callers are
unchanged. Inside the helpers the SessionDB handle is unwrapped to the sync
door (getattr(db, '_db', db)) since they always run on a worker thread, and
AIAgent construction + query_session_listing are handed the sync SessionDB
directly. base.py wraps its single _apply_topic_recovery call in to_thread.
The guard is now alias-aware (catches db = getattr(self, '_session_db', None);
db.method(...)) and enforces the offload contract: the offloaded sync helpers
may never be called bare on the loop. Sibling test fixtures wrap their injected
SessionDB in AsyncSessionDB to match how the gateway holds it.
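A self-contained sketch of the offload pattern. `SessionDB`, `AsyncSessionDB`, and `set_topic_binding` are minimal stand-ins, not the gateway's classes. The sync helper unwraps to the sync door with `getattr(db, '_db', db)`, and the loop-side caller hands it to `asyncio.to_thread` so a contended SQLite file cannot block the event loop:

```python
import asyncio
import threading


class SessionDB:  # stand-in for the sync SessionDB
    def __init__(self):
        self.bindings = {}
        self.threads = []

    def set_topic_binding(self, chat_id, thread_id):
        self.threads.append(threading.current_thread().name)
        self.bindings[chat_id] = thread_id


class AsyncSessionDB:  # stand-in wrapper exposing the sync door as _db
    def __init__(self, db):
        self._db = db


def record_topic_binding(db, chat_id, thread_id):
    # Always runs on a worker thread, so talk to the sync door directly.
    getattr(db, "_db", db).set_topic_binding(chat_id, thread_id)


async def on_topic_message(db, chat_id, thread_id):
    # Loop-side caller: offload rather than call the sync helper bare.
    await asyncio.to_thread(record_topic_binding, db, chat_id, thread_id)
```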
The migration's call-site sweep keyed on the literal self._session_db.
spelling and missed calls bound to a local first
(db = getattr(self, '_session_db', None); db.method(...)). Convert the
three in async contexts: get_telegram_topic_binding in the topic-rename
coroutine, and the two update_session_model sites on the model-switch path.
* fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots)
A deleted Telegram group, kicked/blocked bot, or deactivated user keeps
throwing Forbidden/not_found on every cron tick and fan-out delivery. Each
retry burns a send against the platform's flood-control envelope and spams
the logs, making the whole session feel broken even when the model call
completed.
Add a small persistent DeadTargetRegistry (per-profile JSON under
HERMES_HOME) that records a target the moment a send reports a whole-chat
death (forbidden / chat-level not_found), and have DeliveryRouter.deliver()
short-circuit it on subsequent attempts. Self-healing: any successful send
clears the flag, so a user re-adding the bot recovers with no manual cleanup.
Thread/topic-level not_found is NOT recorded (adapters already self-heal that
by retrying without reply_to). Transient/timeout errors are never marked dead.
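A sketch of the registry and the short-circuit. The error-kind strings, the JSON layout, and the `deliver` signature are assumptions. Only whole-chat deaths are persisted, and a successful send from any path clears the flag:

```python
import json
from pathlib import Path

DEAD_KINDS = {"forbidden", "chat_not_found"}  # hypothetical error kinds


class DeadTargetRegistry:
    def __init__(self, path):
        self.path = Path(path)
        self._dead = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    def _save(self):
        self.path.write_text(json.dumps(sorted(self._dead)))

    def is_dead(self, target):
        return target in self._dead

    def record_failure(self, target, kind):
        # Thread-level not_found and transient/timeout errors are never recorded.
        if kind in DEAD_KINDS:
            self._dead.add(target)
            self._save()

    def record_success(self, target):
        if target in self._dead:  # self-healing: e.g. the bot was re-added
            self._dead.discard(target)
            self._save()


def deliver(registry, target, send):
    """send(target) returns None on success or an error-kind string (hypothetical)."""
    if registry.is_dead(target):
        return False  # skip: no wasted send against flood control
    error = send(target)
    if error is None:
        registry.record_success(target)
        return True
    registry.record_failure(target, error)
    return False
```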
* infographic: dead delivery target skipping