Commit graph

6603 commits

Author SHA1 Message Date
Brooklyn Nicholson
4dbd869ab3 feat(agent): restore surface-aware "auto" default for verify_on_stop
#53552 flipped verify_on_stop to default OFF because the guard fired on
doc/markdown/skill edits and felt like noise. That doc/markdown/skill
suppression already shipped in the same change (_filter_verifiable_paths in
agent/verification_stop.py), so the original noise rationale no longer holds:
the guard already skips prose-only turns.

Restore the surface-aware "auto" default — ON for interactive coding surfaces
(CLI, TUI, desktop) and programmatic callers, OFF for conversational messaging
surfaces (Telegram, Discord, etc.) where the verification narrative would reach
a human as chat noise. The missing/unrecognized fallback in
verify_on_stop_enabled now resolves to the same surface-aware default instead of
hard OFF, so both the DEFAULT_CONFIG value and the resolver agree.

Scope: this changes the shipped default for fresh installs and configs without
an explicit verify_on_stop key. Existing configs that #53552/#54740 migrated to
an explicit `false` are respected and unchanged — this PR does not add a
force-migration of those values back to auto.
2026-06-30 01:43:08 -05:00
Brooklyn Nicholson
0ea318a7d4 test: make windows no-window-flag assertions immune to update-check daemon
These tests patch `<module>.subprocess.run`, which is the shared `subprocess`
module singleton, so the patch is process-wide. Importing `tui_gateway.server`
runs `prefetch_update_check()` at import time, spawning an unnamed daemon thread
(`Thread-N (_run)`) that shells out to `git ... origin` (`text=True, timeout=5`).
That call races the test and lands in the captured list, intermittently failing
`test_tui_gateway_fuzzy_file_listing_hides_git_windows` with either
`KeyError: 'creationflags'` (the daemon's git call has no creationflags) or a
call-count mismatch (3 git calls captured, not 2). It only reproduced under the
parallel test harness because of the extra concurrency/timing.

Filter captured calls to the distinctive argv tokens of the call under test
(`--show-toplevel`, `ls-files`, `branch --show-current`, `diff`, `rg`,
`taskkill`) and read `creationflags` via `.get`, mirroring the existing
hardening on `test_gateway_pid_scan_hides_wmic_and_powershell_windows`. The
production code is unchanged; this is a test-isolation fix.
2026-06-30 01:35:55 -05:00
Brooklyn Nicholson
821d9f709f feat(agent): add configurable coding_instructions
agent.coding_instructions (a string or list) is appended to the coding brief as
its own stable system block, so users can pin project-wide workflow rules
without editing the shipped brief. Coding-posture only and cache-safe (resolved
once per session; takes effect next session). Empty by default.
2026-06-30 00:59:59 -05:00
Brooklyn Nicholson
a10113658b feat(agent): add pre_verify hook and verify-on-stop coding guidance
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.

Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
2026-06-30 00:59:29 -05:00
beardthelion
14c4a849b7 fix(kanban): make goal_mode judge gate truly fail-open
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.

Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.

Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.

Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
2026-06-29 22:20:19 -07:00
beardthelion
b3c1b3b3f3 fix(kanban): address review feedback on goal_mode judge gate
Apply naqerl's review comments on PR #38388:

- Hoist `from hermes_cli.goals import judge_goal` to module-level
  imports so an import failure surfaces at module init, not lazily
  on the first goal-mode completion (no circular import: hermes_cli
  package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
  The verdict check and its rejection `return tool_error(...)` now
  live outside the handler, so a failure there can no longer be
  swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.

Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
2026-06-29 22:20:19 -07:00
beardthelion
0b33bc5396 fix(kanban): gate goal_mode task completion with auxiliary judge
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.

This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.

Fixes #38367

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-29 22:20:19 -07:00
Ben Barclay
05ac16778b feat(gateway): per-platform typing_indicator toggle
Add a generic per-platform PlatformConfig.typing_indicator flag (default
True) that gates the _keep_typing refresh loop in
_process_message_background. When false, the loop is never spawned, so no
typing/"is thinking…" status is shown on that platform — message delivery
is otherwise unchanged.

Mirrors the gateway_restart_notification contract exactly: dataclass field
+ to_dict/from_dict (with extra-fallback resolution) + shared-key bridge in
load_gateway_config, so 'slack: typing_indicator: false' under platforms
works without a separate block. Generic by design — the same key works for
every platform (Slack 'is thinking…', Telegram/Discord/Signal typing).

Motivated by users who find Slack's assistant 'is thinking…' status noisy
(it also briefly disables the compose box, via the Assistant API).
2026-06-29 21:12:57 -07:00
teknium1
463b1dfa9c fix(container-boot): also autostart a gateway stranded in 'degraded'
degraded is the same wedge class as draining: the gateway came up with
some platforms queued for retry, fell through to the running state
(gateway/run.py #5196), and is serving. A hard-kill there strands
gateway_state=degraded, which (like draining) is not in _AUTOSTART_STATES
and is not an operator stop or a failed boot — so it would stay DOWN
forever on every recreate. Add degraded to _TRANSIENT_RUNNING_STATES so
the fallback path normalises it to running-intent too.
2026-06-29 21:12:36 -07:00
Ben
d3f2931b8c fix(container-boot): autostart a gateway stranded in 'draining' state
A gateway hard-killed while draining (a container/VM recreate SIGTERMs it
before _stop_impl reaches its terminal-state persist) leaves
gateway_state.json frozen at 'draining'. With no explicit desired_state to
fall back to, container_boot read that transient value literally, found it
not in _AUTOSTART_STATES, and left the gateway DOWN on every subsequent
boot — dashboard up, messaging silently dark. Observed on a relay-opted-in
staging instance (2026-06): the s6 gateway-default slot kept its 'down'
marker across recreates and the gateway never came back.

'draining' is a transient sub-state of RUNNING (written by the drain
watcher / scale-to-zero go-dormant path), never an operator stop and never
a failed boot. Normalise it to 'running' in the gateway_state fallback so a
stranded drain marker reads as the run-intent it represents. This extends
gateway/run.py's #42675 handling (persist 'running' on an unexpected signal)
to the case where the gateway died before persisting anything at all.

'starting'/'startup_failed' are deliberately NOT normalised — those mean a
mid-boot death and must stay down to avoid the crash-loop the down-marker
guard prevents. An explicit desired_state still wins verbatim, so an
operator stop survives a transient 'draining' runtime value.

Tests: draining named-profile + default-root autostart (both fail without
the fix), plus a guard that an explicit desired_state=stopped still blocks a
draining runtime.
2026-06-29 21:12:36 -07:00
Jaaneek
9ce79cd642 feat(xai): Imagine public-URL storage, chaining & video edit/extend
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:

- Store generated media on files-cdn with permanent public HTTPS URLs
  (public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
  public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
  and ~/ local-path support for image edits.

Credentials use xAI Grok device-code OAuth (separate PR).
2026-06-29 21:11:58 -07:00
Ben
184c10cf97 fix(slack): warn when configured token is a user token, not a bot token
A Slack user/legacy token (xoxp-...) makes auth.test resolve to the
installing human's member ID with no bot_id, so the adapter binds its
identity (_bot_user_id / _team_bot_user_ids) to that human. Every
"is this the bot?" check then misfires: that person's <@...> mentions
wake the bot and are stripped as the bot's own mention, so the agent is
genuinely told it was @mentioned and replies to messages merely
addressed to that human (symptom: bot responds to "@trevor ..." and
insists it was explicitly mentioned).

There is no runtime API error to catch — a user token still
sends/receives — so the only detectable moment is connect time. Add a
warning-only nudge (_warn_if_not_bot_token) alongside the existing
group-DM scope nudge: when auth.test resolves a user_id but no bot_id,
log that the token is a user token and to use the xoxb-... Bot User
OAuth Token. Warning-only: does not block a working-but-misconfigured
install. Fires once per workspace per process.
2026-06-29 20:57:43 -07:00
Teknium
6aefc9d925
feat(gateway): show per-category context breakdown in /usage (#55204)
Channel users get the same context split the desktop popover shows
(PR #54907) — system prompt, tools, rules, skills, MCP, subagents,
memory, conversation — under the existing Context line in /usage.

Reuses agent.context_breakdown.compute_session_context_breakdown, so
there is no new tool and no new engine. The slices are estimates
(chars/4) and the block is labelled _(estimated)_; the headline
Context line keeps using the provider-measured last_prompt_tokens.
Rendering is fail-open: any engine error returns no breakdown and the
rest of /usage is unaffected.

- gateway/slash_commands.py: _context_breakdown_lines() helper + wire
  into _handle_usage_command
- locales/*.yaml: breakdown_header, breakdown_line, and 8 category
  labels across all 16 locales (parity gate)
- tests/gateway/test_usage_command.py: render + fail-open coverage
2026-06-29 20:42:19 -07:00
Ben Barclay
53a75f147f
feat(dashboard_auth): support confidential clients (client_secret) in self-hosted OIDC (#55344)
The self-hosted OIDC dashboard provider was public-client + PKCE only, with
two `# TODO(confidential-client)` seams. Authentik and Keycloak commonly
default a new OIDC client to *confidential*, whose token endpoint rejects an
unauthenticated exchange (`invalid_client`) — so a self-hoster who accepts
their IDP's default could not complete dashboard login without manually
flipping the client to public.

Add optional confidential-client support:

- New optional `client_secret` (env `HERMES_DASHBOARD_OIDC_CLIENT_SECRET`,
  or `dashboard.oauth.self_hosted.client_secret`; env-wins-config, empty
  treated as unset). It is a credential, so docs steer operators to the
  `.env` file; config.yaml is supported only for precedence symmetry.
- `_token_endpoint_auth()` selects `client_secret_basic` (HTTP Basic header)
  vs `client_secret_post` (form body) from the IDP's advertised
  `token_endpoint_auth_methods_supported`, defaulting to basic (the OIDC
  default) when absent. Applied to complete_login, refresh_session, and
  revoke_session (RFC 7009 §2.1).
- PKCE is sent in BOTH modes — the secret is client authentication layered
  on top, never a replacement (OAuth 2.1 / RFC 9700 keep PKCE mandatory).
- Basic header url-encodes client_id/secret before base64 per RFC 6749
  §2.3.1, so reserved chars (`:`, `@`, space) round-trip correctly.

Non-breaking: with no secret configured the provider is a pure public PKCE
client, byte-identical to prior behaviour (no Authorization header, no
client_secret in the body). The secret is never logged — register() reports
only a `confidential=<bool>` flag.

Tests: 16 new cases covering basic/post selection, default-when-absent,
public-unchanged contract, PKCE-preserved, reserved-char url-encoding,
blank-secret-is-public, refresh + revoke auth, no-secret-in-logs, and
env/config register wiring. Full dashboard-auth suite (nous provider,
middleware, gate, cookies, WS, 401-reauth, status endpoint) — 396 tests —
green, proving no existing auth path regressed.
2026-06-30 13:32:51 +10:00
Teknium
481caa66f2
feat(display): friendly human-phrased tool labels for built-in tools (#55166)
* feat(display): friendly human-phrased tool labels for built-in tools

Built-in tools now render ChatGPT-style status verbs ('Searching the web
for ...', 'Reading <file>', 'Browsing <url>') on the CLI spinner and
gateway/desktop tool-progress instead of the raw tool name.

- agent/display.py: _TOOL_VERBS map + build_tool_label() + set/get
  friendly-labels flag (default on). Custom/plugin/MCP tools fall back to
  the raw preview; verbose gateway mode left untouched (debug surface).
- tool_executor.py / tui_gateway / gateway: route the three spinner sites,
  the TUI _tool_ctx, and the gateway all/new progress line through the label.
- config: display.friendly_tool_labels (default True, per-platform aware).

Zero new core tool / schema footprint — pure display layer.

* docs: add PR infographic for friendly tool labels

* fix(display): preserve arg preview in gateway friendly labels + update tests

The first gateway pass re-derived the label from the callback's `args`, which
is empty ({}) at the gateway tool.started callsite — the command/query lives in
the `preview` string, so terminal rendered as a bare '💻 Running' and dedup
collapsed consecutive commands. Now the gateway prefixes the verb onto the
already-computed preview via get_tool_verb/tool_verb_connector/verb_drops_preview,
preserving the command/url/query. CLI spinner path (real args) keeps build_tool_label.

Tests: update test_run_progress_topics exact-format assertions to the friendly
form ('💻 Running pwd'), add a format-agnostic preview extractor for the
truncation tests (works for both quoted-legacy and verb-prefixed output).

* test(tui): update resume-display context to friendly tool label

_tool_ctx now uses build_tool_label, so the desktop resume-view context for a
search_files turn reads 'Searching files for resume' instead of the bare
'resume' preview — consistent with live tool-progress. Update the assertion.

* test(tui): harden no-race worker test against sibling shard leakage

test_session_create_no_race_keeps_worker_alive flaked under -j 8: a daemon
build thread leaked from a prior session.create test in the same shard process
fires close/unregister against its own (foreign) session_key after this test
patches the global approval hooks, polluting the captured lists. Scope the
assertions to this session's own session_key so the regression intent
(this session's worker/notify must survive) is preserved while the test
becomes immune to shard composition. Not related to friendly-tool-labels.
2026-06-29 20:31:17 -07:00
Joey Kerper
f3d2dfbec6
fix(dashboard_auth): allow any http:// host in self-hosted OIDC redirect_uri (#55099)
The self-hosted OIDC dashboard login rejected any http:// redirect_uri
whose host was not localhost/127.0.0.1, surfacing "redirect_uri may only use http:// for localhost/127.0.0.1" before reaching the IDP. This broke self-hosted dashboards reached over plain HTTP (including LAN IPs, internal hostnames, and reverse proxies that terminate TLS upstream).

#38827 already dropped this check from the nous provider, but the generic self-hosted provider  copied the old localhost-only
branch and reintroduced the bug for HERMES_DASHBOARD_OIDC_ISSUER setups.

The IDP's own allowlist is authoritative on which redirect_uris are
permitted; this client-side _validate_redirect_uri is only a fast-fail for
obvious operator error and should not second-guess valid http:// deployments.

Fix: drop the localhost-only branch on the http scheme. Validation now enforces only that the scheme is http(s) and the path ends with
/auth/callback. Updated the docstring to explain the relaxed contract,
and added test_allows_http_with_arbitrary_host covering an internal
hostname and a LAN IP alongside the existing localhost case.
2026-06-30 09:45:11 +10:00
yoniebans
d2ce2c852d test(gateway): assert interleaving safety of concurrent offloaded DB calls 2026-06-29 15:51:57 -07:00
yoniebans
6735162531 fix(gateway): offload the Telegram topic-recovery helper tree off the loop
The topic-mode helpers (_telegram_topic_mode_enabled,
_recover_telegram_topic_thread_id, _record/_sync_telegram_topic_binding,
_is_telegram_topic_lane/_root_lobby, _normalize_source_for_session_key,
_telegram_topic_new_header, _schedule_telegram_topic_title_rename, and the
base.py _apply_topic_recovery hook) each run a synchronous SessionDB read or
write. They reach the event loop through async handlers, so a contended
state.db froze the loop the same way the handoff watcher did.

These helpers already run off-loop in the run_sync thread-pool closure, so
they are proven thread-safe there. Rather than colour them async, loop-side
callers now invoke them via asyncio.to_thread(...); the executor callers are
unchanged. Inside the helpers the SessionDB handle is unwrapped to the sync
door (getattr(db, '_db', db)) since they always run on a worker thread, and
AIAgent construction + query_session_listing are handed the sync SessionDB
directly. base.py wraps its single _apply_topic_recovery call in to_thread.

The guard is now alias-aware (catches db = getattr(self, '_session_db', None);
db.method(...)) and enforces the offload contract: the offloaded sync helpers
may never be called bare on the loop. Sibling test fixtures wrap their injected
SessionDB in AsyncSessionDB to match how the gateway holds it.
2026-06-29 15:51:57 -07:00
yoniebans
0896facce8 fix(gateway): route SessionDB calls through AsyncSessionDB 2026-06-29 15:51:57 -07:00
yoniebans
89daacb454 test(gateway): cover AsyncSessionDB offload + raw-call guard (failing) 2026-06-29 15:51:57 -07:00
Teknium
290fa7fd2b
fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots) (#55115)
* fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots)

A deleted Telegram group, kicked/blocked bot, or deactivated user keeps
throwing Forbidden/not_found on every cron tick and fan-out delivery. Each
retry burns a send against the platform's flood-control envelope and spams
the logs, making the whole session feel broken even when the model call
completed.

Add a small persistent DeadTargetRegistry (per-profile JSON under
HERMES_HOME) that records a target the moment a send reports a whole-chat
death (forbidden / chat-level not_found), and have DeliveryRouter.deliver()
short-circuit it on subsequent attempts. Self-healing: any successful send
clears the flag, so a user re-adding the bot recovers with no manual cleanup.
Thread/topic-level not_found is NOT recorded (adapters already self-heal that
by retrying without reply_to). Transient/timeout errors are never marked dead.

* infographic: dead delivery target skipping
2026-06-29 13:23:29 -07:00
Ben Barclay
b963d3238b
feat(gateway): suppress home-channel shutdown broadcast on flagged drains (#54824)
Add a generic suppress_notification flag to the drain-request marker. When a
drain that ends in process exit (e.g. a NAS auto-update image migration on the
always-on Hermes Cloud fleet) is flagged, the gateway skips ONLY the
home-channel 'gateway shutting down' broadcast — the operator-flavoured ping
that would otherwise fire on every routine auto-update, dozens of times a day.

The per-active-session interrupt ping is ALWAYS kept: on a drained shutdown
it's empty by construction, and in the force-interrupt (deadline-exceeded) case
it carries the user-valuable 'your task was cut off, message me to resume' hint.

The gateway stays agnostic about WHY a drain is quiet (generic boolean, not a
kind enum); the policy of which drain causes set the flag lives in the caller
(NAS). Default-false so legacy/operator drains behave exactly as before. The
reader reuses the NS-570 epoch-staleness check so an orphaned marker on the
durable volume can never silence a fresh gateway's legitimate broadcast.

- drain_control.py: write_drain_request gains suppress_notification; new
  drain_notification_suppressed() reader (current-epoch + truthy flag).
- web_server.py: /api/gateway/drain reads + echoes the flag.
- run.py: _notify_active_sessions_of_shutdown skips the home-channel loop only.

Tests prove: flag round-trips; home-channel suppressed when set, kept when
unset; active-session ping always fires; stale/legacy/corrupt markers never
suppress.
2026-06-29 12:18:11 -07:00
Teknium
ee8cbfdc03
feat(web_extract): truncate-and-store instead of LLM summarization (#54843)
* feat(web_extract): truncate-and-store instead of LLM summarization

web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.

Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.

Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).

Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.

No own scraper added; no changes to web_search.

* fix(web_extract): add char_limit to execute_code web_extract stub

The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
2026-06-29 10:00:49 -07:00
Austin Pickett
fd324562d3 feat(desktop): add context usage breakdown popover
Let users click the status bar context indicator to see how tokens are
split across system prompt, tools, rules, skills, MCP, and conversation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 09:18:10 -04:00
HexLab98
f1345290ed test(auxiliary): cover NVIDIA NIM max_tokens in _build_call_kwargs 2026-06-29 18:04:39 +05:30
Ben Barclay
f53ba9bb54
fix(s6): dot-prefix gateway staging dir so svscan ignores it mid-build (#54834)
Some checks are pending
CI / Detect affected areas (push) Waiting to run
CI / Python tests (push) Blocked by required conditions
CI / Python lints (push) Blocked by required conditions
CI / TypeScript (push) Blocked by required conditions
CI / Docs Site (push) Blocked by required conditions
CI / Deny unrelated histories (push) Blocked by required conditions
CI / Check contributors (push) Blocked by required conditions
CI / Check uv.lock (push) Blocked by required conditions
CI / Lint Docker scripts (push) Blocked by required conditions
CI / Build&Test Docker image (push) Blocked by required conditions
CI / Supply-chain scan (push) Blocked by required conditions
CI / OSV scan (push) Waiting to run
CI / All required checks pass (push) Blocked by required conditions
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
The register path builds each profile-gateway slot in a sibling staging
dir under /run/service (the scandir s6-svscan watches), then atomically
renames it to the live gateway-<profile> name. The staging dir was named
gateway-<profile>.tmp — a NON-dotfile — so a concurrent `s6-svscanctl -a`
rescan (fired by the cont-init reconciler registering gateway-default, or
by a sibling register) would supervise the half-built slot the moment it
had a valid type/run: s6-supervise spawns AS ROOT and mkdirs supervise/
root-owned 0700, then the in-flight _seed_supervise_skeleton early-returns
on the now-existing supervise/ and the next `mkdir supervise/event` hits
PermissionError.

That is the arm64-only CI flake on
test_s6_unregister_removes_service_dir_in_live_container
(PermissionError: /run/service/gateway-phase3test.tmp/supervise/event) —
arm64-only because the native-arm runner's wider scheduling jitter lets
the rescan land inside the ~ms seed window; amd64 ran 30/30 clean.

Fix: dot-prefix the staging dir (.gateway-<profile>.tmp) in both register
paths (S6ServiceManager.register_profile_gateway and
container_boot._register_service). s6-svscan skips any scandir entry whose
name begins with '.', so the half-built slot can never be supervised
mid-build. The atomic rename to the dotless live name is unchanged.

Verified on a real s6 image (amd64): a non-dotted staging dir is picked up
by an svscanctl -a rescan (SUPERVISED owner=root) while a dot-prefixed one
is ignored (NOT-SUPERVISED). Added a docker-harness regression test that
asserts both, plus a unit test that the staging dir is dot-prefixed.
2026-06-29 21:33:00 +10:00
Teknium
dbad6d47d3 fix(gateway): also neutralize untrusted Matrix room name in prompt
Widen #5961's _format_untrusted_prompt_value coverage to the Matrix
room display name (**Matrix Room:**), a sibling attacker-controllable
field the original fix missed. chat_name is user-settable, so an
injected room name could render as literal markdown in the system
prompt. Adds a regression test.
2026-06-29 04:25:51 -07:00
Xowiek
09666ceb76 fix(gateway): neutralize untrusted session metadata in prompts 2026-06-29 04:25:51 -07:00
teknium1
ea1372d2af fix(security): wire session-id sanitizer into artifact paths + API boundary
Defense-in-depth on top of _safe_session_filename_component (#5958):

Sink (makes the bad write impossible regardless of entry point):
- run_agent._save_session_log: sanitize session_id before building the
  session_{sid}.json snapshot path.
- agent_runtime_helpers.dump_api_request_debug: sanitize before building
  the request_dump_{sid}_{ts}.json path.

Boundary (clean 400 instead of a silently-hashed filename):
- api_server rejects path-traversal-shaped X-Hermes-Session-Id on the
  session-continuation path and the explicit /api/sessions create path,
  reusing gateway.session._is_path_unsafe (mirrors the native gateway's
  entry-boundary guard). Also enforces the session-header length cap on
  the continuation path.

Tests: traversal session_id stays contained at the write site; sanitizer
always yields a traversal-free segment; the API header rejects
../, absolute, and Windows-traversal IDs with 400.
2026-06-29 04:25:45 -07:00
teknium1
cdd8e0a271 test(gateway): exercise last_prompt_tokens in reset-activity tests
The reset-had-activity tests set total_tokens (dead state) to simulate
activity; production records activity via last_prompt_tokens. Update
the fixtures to match the field the fix and runtime actually use.
2026-06-29 04:25:37 -07:00
Ruzzgar
576424cc1c fix(security): redact browser CDP endpoint logs 2026-06-29 04:25:26 -07:00
teknium1
23c03ced75 fix(session-db): enrich NULL session metadata via upsert instead of INSERT OR IGNORE
The gateway's get_or_create_session() creates a bare session row (source +
user_id) before the agent exists. The agent's later create_session() carries
the real model/model_config/system_prompt, but _insert_session_row used
INSERT OR IGNORE — silently dropping that enrichment. Gateway sessions were
left with NULL model and NULL billing metadata.

Switch to INSERT ... ON CONFLICT(id) DO UPDATE with COALESCE so NULL columns
get backfilled while values an earlier writer already set are never
overwritten (a later bare write with source='unknown' can't clobber a real
source/model). Credit: original report and fix direction by @LucidPaths (#5048).
2026-06-29 04:25:23 -07:00
Ben
f5ecbe1ec6 feat(dashboard): auto-initiate portal SSO redirect on unauthenticated load
When the dashboard gateway has no local session cookie, it rendered a
click-through /login interstitial — even though the Nous portal's
/oauth/authorize auto-approves any current member of the dashboard's org
and is a silent 302 when the user already holds a portal session. For the
common case (clicking a hosted-agent dashboard link while signed in to the
portal) that interstitial click is pure friction.

This makes the gate auto-initiate the OAuth redirect on an unauthenticated
HTML document load instead of rendering the interstitial, when exactly one
interactive provider is registered. A one-shot loop-guard cookie
(hermes_sso_attempt, 60s TTL) ensures that a genuinely absent portal
session (the portal bounces back still-unauthenticated) falls back to the
/login page after exactly one bounce rather than ping-ponging forever. The
marker is cleared on a successful callback and whenever the gate falls back
to /login.

Security: this removes a human CLICK, not a security check. The redirect
lands on the existing /auth/login route and runs the unchanged PKCE
auth-code flow; token verification, audience checks, redirect-URI match,
and org-membership checks are all untouched. /api/* fetches still get the
401 JSON envelope (never a 302 a fetch() would follow opaquely), and with
two or more providers the /login chooser still renders.

Phase 1 of the cloud-auto-discovery work.
2026-06-29 04:25:18 -07:00
Teknium
dc5ef20d89
test(reasoning-floor): isolate stale-timeout floor tests from config-module reload races (#54775)
The five _resolved_api_call_stale_timeout_base integration tests reloaded
hermes_cli.config + hermes_cli.timeouts via importlib.reload to clear cached
config. Under xdist that mutates module-global state shared across the worker
process, so a sibling test could leave the config cache in a state that made
get_provider_stale_timeout return a leaked value — intermittently failing
test_reasoning_floor_applies_to_opus_4_thinking (shard 6 flake, #52217 area).

Patch run_agent.get_provider_stale_timeout per-test instead: floor-path tests
get None (resolver falls through to the reasoning floor / env var / default),
the explicit-config test gets 60.0 (priority-1 short-circuit). Same assertions,
no shared-module mutation, deterministic under parallel execution.
2026-06-29 02:42:54 -07:00
sgaofen
194bff0687 fix(gateway): confirm final delivery before suppressing send
Fixes #14238. During a compression/session split at the response
boundary, the interim callback delivered unrelated commentary, setting
response_previewed=True. The suppression logic treated that as proof the
final reply had been delivered and skipped the normal send — the response
was persisted to the child session but never sent to chat.

Only suppress the normal final send when the stream consumer confirms
final delivery (final_response_sent / final_content_delivered) or the
exact final response text was delivered as a preview.
2026-06-29 02:37:11 -07:00
Telos
fa11b11cf5 fix: propagate key_env from custom_providers into ProviderDef
resolve_custom_provider() previously returned api_key_env_vars=()
for every custom provider entry, silently dropping the configured
key_env field. This caused 401 errors for any custom provider that
required an API key via environment variable (e.g. Xiaomi MiMo Token
Plan, self-hosted OpenAI-compatible servers).

The key_env field is already documented in _VALID_CUSTOM_PROVIDER_FIELDS
and normalized by normalize_custom_provider_entry(), so this was just
an oversight in the ProviderDef construction.

Also adds a regression test that verifies key_env is properly
propagated into the resolved ProviderDef.
2026-06-29 02:25:48 -07:00
Sanjay Santhanam
c79e6bceae fix(browser_tool): resolve race in _get_command_timeout cache returning None (#14331)
# Conflicts:
#	tools/browser_tool.py
2026-06-29 02:24:57 -07:00
Teknium
bf0d8fed8e
fix(config): v32 migration flips baked-in verify_on_stop=true to false (#54740)
The first ship of verify-on-stop (config v30) defaulted
DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config
persists defaults with strip_defaults=False — so every install that updated
through v30 had verify_on_stop: true written into config.yaml as a literal.

The v30->v31 migration only flipped missing/'auto' values to false and
deliberately preserved an explicit bool, so it skipped that entire population
and left verify-on-stop ON for everyone who had updated. A literal true was
never a user choice: the feature had no off-switch worth setting it against
until v31 introduced one, so a true persisted before v32 is always the old
machine default.

v32 migration flips a literal true -> false once, for both v30 (skipped v31)
and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a
deliberate opt-in and is never touched.
2026-06-29 01:51:08 -07:00
teknium1
75317d82d0 fix(vision): narrow the fan-out cap to the CPU encode burst only
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.

The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:

- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
  encode/resize/dimension-check off the caller's loop, sized to the host's
  usable core count with NO fixed ceiling — the cap tracks the actual
  exhausted resource (cores), not a magic number. Excess encodes queue on
  the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
  workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
  (honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.

Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
2026-06-29 01:27:10 -07:00
Ben Barclay
eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.

Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
2026-06-29 01:27:10 -07:00
teknium1
115e78c377 test(camofox): accept headers= kwarg in persistence test mocks
The auth-header fix adds headers=_auth_headers() to all Camofox HTTP
calls. Two _capture_post mocks in the persistence test lacked a headers
parameter, so navigate raised TypeError and the success assertions
failed. Add headers=None to both mock signatures.
2026-06-29 01:26:24 -07:00
liuhao1024
fe38d50833 fix(tools): read browser.command_timeout in Camofox HTTP client
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().

This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.

Fixes #40843
2026-06-29 01:26:24 -07:00
刘昊
babd9168ba fix(browser): send Authorization header in Camofox HTTP calls when CAMOFOX_API_KEY is set
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.

Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.

Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.

Fixes #20476
2026-06-29 01:26:24 -07:00
liuhao1024
270456308c fix(tools): send listItemId instead of sessionKey in Camofox tab creation
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`.  This caused a 400
Bad Request on every `browser_navigate` call.

The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.

Fixes #37960
2026-06-29 01:26:24 -07:00
teknium1
34e616e778 feat(slack): nudge stale installs to add mpim scopes; mark message.mpim required
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
2026-06-29 01:02:53 -07:00
Ben
4125cc3b7c fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.

Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.

Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.

Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
2026-06-29 01:02:53 -07:00
Teknium
29f0968275
test(windows): harden pid-scan no-window assertion against captured-call leakage (#54707)
test_gateway_pid_scan_hides_wmic_and_powershell_windows flaked once in CI
(slice 7/8) with 'KeyError: creationflags' while passing 15/15 under exact
CI-parity locally. The positional 'kwargs["creationflags"]' indexing raises
a bare KeyError the moment any stray subprocess.run call is captured, masking
the real contract. Filter captured calls to the two intended Windows console
spawns (wmic + PowerShell fallback) and assert each is windowless via
.get('creationflags'); a leaked/extra call now surfaces as a readable
len-mismatch with the full captured list, not a cryptic KeyError.
2026-06-29 01:01:29 -07:00
Ben Barclay
1289f12812 fix(memory): lazy-install supermemory + mem0 SDKs like honcho/hindsight
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.

Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
  (tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
  (_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
  mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
  chicken-and-egg trap (provider never loaded on a sealed venv, so
  ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
  lazy-covered-extras contract test (excluded from [all] by policy).

Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
2026-06-29 00:25:36 -07:00
Ben
1c75e7c9d8 feat(dashboard): list & add arbitrary custom .env keys on the Keys page
The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS
or the provider catalog); any other key a user set in .env was invisible, and
there was no way to add an arbitrary env var from the GUI (e.g. to inject a
var a skill or MCP server needs).

Backend: GET /api/env now also emits a row for every on-disk .env key that
isn't in any catalog, flagged category="custom" + custom=true and
password-masked (an unrecognised key could hold anything, so it's redacted and
reveal-gated like any secret). Channel-managed credentials stay excluded. The
write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle
arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD,
PYTHONPATH, …) enforced server-side — no new write surface.

Frontend: a new "Custom Keys" section lists those custom rows and carries an
add-a-key form (client-side name validation mirroring the backend regex; the
new row reuses the normal edit/save flow, so on save it round-trips back from
the backend as a durable custom row). i18n added for en + zh + types.

Tests: behavior-contract coverage that an unknown .env key surfaces as a
masked custom row and a catalogued key does not — verified to fail on the
pre-fix backend.
2026-06-28 22:53:56 -07:00
HexLab98
23f245eda5 test(vision): cover Ollama /api/show vision capability routing (#54511) 2026-06-28 22:52:59 -07:00