Commit graph

6473 commits

Author SHA1 Message Date
teknium1
ea5aaa7a22 fix(gateway): offload remaining inline agent cleanup off the event loop (#53175)
#35994 moved /new reset cleanup off the loop, but _cleanup_agent_resources
(agent.close() subprocess teardown; shutdown_memory_provider() plugin IO) was
still called INLINE on the event loop from three other sites:

  - _session_expiry_watcher (5-min idle sweep) — live loop
  - _handle_message_with_agent cache-hygiene re-eviction — live loop
  - _finalize_shutdown_agents / stop() idle-cache loop — shutdown

A wedged memory provider on any of these froze the loop: bot goes silent,
runtime-status updated_at heartbeat stops advancing, and SIGTERM can't be
serviced (requires kill -9) — exactly the #53175 zombie pattern.

Adds _cleanup_agent_resources_off_loop: a bounded (30s) worker-thread offload
mirroring the #35994 reset fix, and routes all four sites through it.
2026-06-28 02:41:36 -07:00
teknium1
aa50c1ba5d fix(prompt): repair backend probe import (get_environment never existed)
The system-prompt backend probe imported a nonexistent symbol —
`from tools.environments import get_environment` — which always raised
ImportError: cannot import name 'get_environment'. The exception is caught
and only drops the live backend description to a static fallback, so it is
cosmetic, but it broke the live OS/user/cwd probe for every non-local
backend (docker/singularity/modal/daytona/ssh).

The real factory is `_create_environment` in tools.terminal_tool. Build the
environment the same way the live terminal path does (select backend image,
assemble ssh/container config from _get_env_config()), then run the probe.

Note: this does NOT affect tool loading — tool selection runs each tool's
check_fn and never consults this probe. Regression from #52147 (2026-06-25).

Closes #53667 (probe import); the 'cronjob-only' tool-collapse symptom is
not reproducible — tool selection has no probe dependency and memory's
check_fn is unconditionally True.
2026-06-28 02:41:31 -07:00
teknium1
7c9cdad9fd test(cli): cover Windows self-lock recovery guard + cmd-quote its hint
Add two tests for the self-lock guard in _recover_from_interrupted_install:
one asserting it clears the marker and skips install when hermes.exe is a
process ancestor (breaking the #52378/#45542 loop), one asserting it falls
through to a normal recovery install when the shim is NOT an ancestor.

The guard's manual-recovery hint runs only inside the Windows branch, so
quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform
fallback hint at the end of the function is left POSIX-correct.

Map Icather in scripts/release.py AUTHOR_MAP for the salvage.
2026-06-28 02:40:37 -07:00
liuhao1024
14baeefe1d fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification
Rebase onto plugins/platforms/matrix/adapter.py (code moved from
gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct
on invite events and calls _record_dm_room to persist in m.direct
account data.

Fixes #44679
2026-06-28 02:37:52 -07:00
Teknium
fde1c8570f
fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126)
When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a
full traceback for every pending connection-lost callback — 50+ identical
WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the
equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not
actionable: they are the expected side effect of the peer hanging up before
our writes drained.

Install a loop exception handler on the gateway serving loop that collapses
exactly this teardown class (ConnectionResetError/ConnectionAbortedError/
BrokenPipeError originating from _call_connection_lost) to a single debug
line, forwarding every other loop error to the existing/default handler
unchanged so genuine loop bugs still surface. Idempotent per loop.
2026-06-28 02:35:01 -07:00
LeonSGP43
9f0e64cedd fix(gateway): force exit after graceful shutdown
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-06-28 02:34:23 -07:00
yungchentang
7e2ca7f68d fix(telegram): reset send pool after pool timeouts 2026-06-28 02:34:17 -07:00
Teknium
9f17f16c66 fix(environments): use $BASHPID for atomic snapshot temp + harden failure path
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.

- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
  never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
  that proves the snapshot never tears (would still tear with $$).
2026-06-28 02:08:57 -07:00
teknium1
c23f394eb8 fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper
- read_text(encoding='utf-8') (PLW1514)
- # windows-footgun: ok on signal.SIGKILL — module is Linux-only (reads
  /proc, /sys/fs/cgroup; runs from a systemd unit)
- test lambda accepts the new encoding kwarg
2026-06-28 02:05:50 -07:00
PRATHAMESH75
e551da6ddb fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart
Long-lived helpers spawned indirectly by tool calls (adb, platform
bridges) were left in the service cgroup after the gateway's main
process exited. When the kernel rejected the deferred cgroup-wide kill
with EINVAL, systemd blocked Restart=always for 6+ minutes, taking
down all platforms and cron windows (#37454).

Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks
cgroup.procs and sends per-PID SIGKILLs — a different kernel code path
than cgroup.kill, so it succeeds where the cgroup-wide write failed.
KillMode=mixed is preserved so the gateway still reaps its own
tool-call children before systemd intervenes (#8202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 02:05:50 -07:00
Coy Geek
d7a1052424 fix(env-passthrough): fail closed when provider blocklist import fails
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.

Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-28 02:05:43 -07:00
teknium1
58c36b1798 fix(api-server): widen error redaction to cron-endpoint + SSE sites
Follow-up to the salvaged #37733 fix. The contributor centralized
redaction at _openai_error and the chat/responses failure paths, which
covers the OpenAI-compatible envelopes transitively. Two sibling classes
crossed the same authenticated HTTP boundary unredacted:

- 8x cron-management endpoints returning {"error": str(e)} on 500
- the session-chat SSE error event ({"message": str(exc)})

Route both through the same _redact_api_error_text(force=True) helper.
Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard
covering mask/force/limit/passthrough behavior.
2026-06-28 02:05:38 -07:00
Coy Geek
5e774de76e fix(api-server): redact provider errors at HTTP boundary
Force API-server error text through the existing secret redactor before returning OpenAI-compatible errors, response fallback text, response snapshots, and run failure events. This prevents credential-shaped provider failure text from crossing the API-server boundary while preserving debuggable sanitized messages.
2026-06-28 02:05:38 -07:00
HexLab98
d2fda5925d test(gateway): cover Discord/Slack compression status suppression (#39293) 2026-06-28 14:35:32 +05:30
teknium1
3aaa98dd01 test(whatsapp): cover LID allowlist match on modern session layout
Add an _is_user_authorized E2E for the platforms/whatsapp/session layout
on top of fesalfayed's resolver fix (#36665) — guards the actual
silently-dropped-LID-sender path from #36664.
2026-06-28 02:05:26 -07:00
fesalfayed
263ffec1b0 fix(whatsapp): resolve LID aliases on modern platforms/ session layout
expand_whatsapp_aliases hardcoded get_hermes_home()/whatsapp/session, but
the adapter writes lid-mapping files via get_hermes_dir("platforms/whatsapp/
session", "whatsapp/session"). On installs without the legacy directory the
two paths diverge, so the resolver finds no mappings and returns the bare LID,
which misses the allowlist and silently drops the message. Resolve through the
same helper so both sides stay in lockstep on new and legacy layouts.
2026-06-28 02:05:26 -07:00
xxxigm
093f567f0d fix(agent,cli): surface empty-body API errors and fail oneshot exit code
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.

- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
  `response.text` — parse a JSON `error.message`/`message` when present, else
  surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
  exit code 2 when the run is failed/partial with no usable final response, so
  scripts can detect LLM failures (still 0 when a response — incl. an error
  summary as output — is produced).

Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.

NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
2026-06-28 02:05:20 -07:00
teknium1
c0b4a3438a fix(install): scope Playwright override to too-new apt releases + keep step interruptible
Follow-up on #54032 for #35166:
- Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt
  release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via
  playwright_host_unrecognized(), instead of retrying on ANY install failure.
  A network/disk/permission failure on a supported host now surfaces unchanged
  rather than getting a mismatched-glibc build forced onto it.
- detect_os() now captures DISTRO_VERSION from os-release.
- Fold in the interruptibility fix (was PR #35304, self-closed): wrap the
  download in 'timeout --foreground -k 10' (probed, with plain-timeout
  fallback) so a terminal Ctrl+C reaches the child and a wedged download is
  force-killed after the deadline.
- Add behavioral tests that source the helpers and assert the retry fires only
  on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros,
  native-success, operator-pinned override, or unsupported arch.
2026-06-28 02:05:18 -07:00
kshitijk4poor
a28fe788a6 fix(install): retry Playwright install with platform override on unrecognized host (#35166)
On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04,
Debian 14, and future distros), 'npx playwright install --with-deps chromium'
hangs uninterruptibly at 'Installing Playwright Chromium with system
dependencies' because Playwright's resolver maps the host to a platform with
no download build (#35166).

Wrap every installer Playwright call in run_playwright_install(), which tries
the native install first and, only if it fails or times out, retries once with
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build
(ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless
for unrecognized platforms (microsoft/playwright#33434).

Try-native-first (not a hardcoded distro/version table) is deliberate:
- Self-correcting — when Playwright already supports the host (e.g. Ubuntu
  26.04 on Playwright >=1.61) the first attempt succeeds and the override is
  never applied, so we never force a mismatched-glibc build onto a release
  Playwright handles correctly (microsoft/playwright#35114).
- Zero-maintenance — new distro releases work the moment Playwright adds them.
- Covers Debian 14+ and future releases, not just Ubuntu 26.04.

An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied
to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback
build and skip the retry.

Refs #35166
2026-06-28 02:05:18 -07:00
teknium1
64972b6403 fix(config): canonicalize model.name/model.model to model.default (#34500)
A custom_providers config that names the model under model.name (or
model.model) resolved to an empty model, so the API request went out
with model= — HTTP 400 from OpenAI-compatible backends. Display paths
(hermes status/dump) already read model.name and showed the model,
making the failure silent.

The model id was read via 'default or model' at ~14 independent sites
(cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none
of which honored 'name'. Rather than patch every site, canonicalize at
the single load/save chokepoint: _normalize_root_model_keys() now
promotes model.model/model.name -> model.default (precedence
default > model > name) and drops the stale alias, so every reader —
present and future — sees a populated default and config.yaml is
migrated canonical on next save. The gateway, which bypasses
load_config(), replays the same normalization in _load_gateway_config().

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

Credit: root-cause analysis and fix direction from @Bartok9 (#34502,
first) and @v86861062 (#34527).
2026-06-28 02:05:13 -07:00
Teknium
2ecb6f7fe6
fix(telegram): clear send_path_degraded on successful reconnect (#35205) (#54076)
* fix(telegram): clear send_path_degraded on successful reconnect

_send_path_degraded was cleared only in _verify_polling_after_reconnect,
60s after reconnect and only if scheduled. A clean start_polling() reconnect
left the flag stuck True, short-circuiting send() and blocking all outbound
messages until the deferred probe ran (or forever if it never did).

Clear the flag the moment start_polling() succeeds — that is the recovery
signal. The deferred probe remains a defensive re-check that re-enters the
reconnect ladder (re-setting the flag) if it detects a silent wedge.

Fixes #35205.

* docs: add infographic for #35205 telegram send-path fix
2026-06-28 01:38:17 -07:00
Teknium
674e16e7c6
fix(redact): stop DB-connstr redaction from corrupting code output (#33801) (#54061)
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):

_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.

Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
  the match can never span a line break. A real DSN password never contains
  whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
  expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
  not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
  sites (file_tools already did) so code-execution output isn't corrupted by
  ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
  private keys are still redacted.

Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.

Reported-by: koishi70
Reported-by: pfrenssen
2026-06-28 01:15:39 -07:00
teknium1
578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00
595650661
b8e2268628 fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.

Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.

Refs #32421
2026-06-28 01:15:21 -07:00
Teknium
c9df4bc094
fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066)
A restart now interrupts in-flight agents immediately rather than holding
the gateway open for a grace window. The previous 180s default coupled two
independently-set timers: the gateway's own drain timer and systemd's
TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd
SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next
startup exit immediately ('already running') — an infinite crash loop under
Restart=on-failure (#31981).

Setting drain to 0 makes the mismatch structurally impossible: with drain 0
the generated unit gets TimeoutStopSec=90 against a near-instant drain, so
systemd never kills mid-cleanup. Contract: restart the gateway, in-flight
work stops. A grace window large enough to 'save' a long agent turn would
have to outlast an unbounded task, which is impossible.

Also fixes the stale-unit warning's suggested command
(hermes gateway service install --replace -> hermes gateway install --force);
the former subcommand does not exist.

Closes #31981
2026-06-28 01:14:34 -07:00
teknium1
1f72ad9be9 refactor(cli): extract interrupt recovery to a testable helper
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
2026-06-28 01:08:09 -07:00
zccyman
f3aaba7f85 fix(cli): recover terminal state after interrupt to prevent raw control sequence freeze
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.

Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache

Closes #33271
2026-06-28 01:08:09 -07:00
teknium1
aacc15b2c9 fix(clarify): raise default clarify_timeout to 3600s (#32762)
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
2026-06-28 01:07:53 -07:00
konsisumer
3f543229f2 fix(telegram): notify user when clarify button tap arrives after expiry 2026-06-28 01:07:53 -07:00
Teknium
90d25adc9e
fix(gateway): deliver profile-scoped cache media on symlinked HERMES_HOME (#54060)
Generated images under a profile gateway's cache (profiles/<name>/cache/
images/...) were silently dropped from Telegram/Discord delivery when
HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data ->
/root/.hermes) and $HOME is not that prefix. The resolved path lands
under /root (a system denylist prefix), the root-home exception only
fires when the denied prefix IS $HOME, and the static safe-roots list
only covers the active HERMES_HOME's top-level cache — not per-profile
cache dirs. Both gates fail, so validate_media_delivery_path returns
None and the gateway logs 'Skipping unsafe MEDIA directive path'.

_media_delivery_allowed_roots() now also enumerates per-profile cache
roots (<root>/profiles/*/cache/{images,audio,videos,documents,
screenshots}) at check time. Allowlist match runs before the denylist,
so the profile artifact delivers regardless of the /root interaction;
profile-dir credentials (auth.json) stay blocked since they aren't
under a cache subdir.

Reopened regression of #34485/#38108, neither of which covered the
profile-scoped symlink case. Fixes #31733.
2026-06-28 01:07:28 -07:00
sweetcornna
2701ea2f0c fix(agent): reopen fallback chain after primary recovery 2026-06-28 00:57:42 -07:00
teknium1
7b9ff310b6 fix: salvage #33830 for current main — relocate allow_bots bridge to telegram plugin hook, fix stale adapter import in test 2026-06-28 00:57:03 -07:00
sweetcornna
fc70d023d8 fix(telegram): apply bot auth policy to Telegram sources
# Conflicts:
#	gateway/config.py
2026-06-28 00:57:03 -07:00
teknium1
52d774f0f9 fix(state): F_FULLFSYNC barrier at WAL checkpoints on macOS (#30636)
On Darwin, synchronous=FULL (the WAL default) only issues a plain
fsync(), which Apple documents does NOT guarantee writes reach stable
storage or stay ordered. SQLite's WAL corruption-safety guarantee
assumes the OS honors the fsync barrier; macOS does not unless the app
uses F_FULLFSYNC. During a launchd *system* shutdown the page cache is
dropped (effectively power-loss for in-flight pages), so a WAL
checkpoint whose fsync 'reported' durable may never hit the platter —
corrupting state.db with a malformed image. That is the trigger in
#30636 ('SIGTERM during launchd shutdown under high load').

Apply PRAGMA checkpoint_fullfsync=1 (macOS-guarded) in
apply_wal_with_fallback. It forces the F_FULLFSYNC barrier only at
checkpoint boundaries (where WAL frames land in the main DB), so cost
amortizes to ~+0.1ms/commit vs ~+4ms for the broader fullfsync=1.
No-op off Darwin (F_FULLFSYNC is macOS-only).

Root-cause analysis by @catapreta on #30636. Supersedes #30654, whose
synchronous=FULL is a no-op (already FULL in WAL mode) and whose
TRUNCATE-on-close is already on main.

Co-authored-by: catapreta <catapreta@users.noreply.github.com>
2026-06-28 00:53:19 -07:00
Teknium
7c38249c79
feat(moa): references see full tool state + fire on every user/tool response (#54016)
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.

Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
  rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
  reference gives an informed judgement on the real current state. Still emits
  zero tool-role messages and zero tool_calls arrays (strict providers reject
  those), and large tool results are previewed head+tail (4000-char budget).
  The required end-on-user shape is met by APPENDING a synthetic advisory user
  turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
  new tool result — instead of once per user turn. The state-sensitive advisory
  signature drives the cache: new tool result = miss (re-run), identical-state
  re-call = hit (no re-run, no re-emit).

The acting aggregator still receives the full, untrimmed transcript.
2026-06-28 00:30:11 -07:00
kshitijk4poor
fc7a01b6cb test+harden: modernize salvaged Matrix path for current plugin layout
Two follow-ups on top of the salvaged #46365 fix:

1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
   sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
   (#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
   Point the three sys.modules patches at the current module path so the
   ephemeral-fallback tests actually exercise the injected fake adapter.

2. Harden the live-adapter lookup: split the gateway import guard from the
   adapter lookup and log (instead of silently swallowing) when a runner
   exists but adapters.get() raises. A silent fall-through there would
   re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
   to prevent (#46310). Documented that the live adapter is gateway-owned and
   must not be disconnected, and why the ephemeral finally never touches it.
2026-06-28 12:48:08 +05:30
liuhao1024
a7fd62d824 fix(send_message): reuse live gateway adapter for Matrix media sends
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.

The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.

Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.

Fixes #46310
2026-06-28 12:48:08 +05:30
Ben Barclay
1466eab4ee
test(docker): wait for cont-init to finish before privilege-drop shim tests (#54026)
The docker-exec privilege-drop shim tests started a sleep container and
released the fixture as soon as `docker exec <c> true` returned 0. On
s6-overlay that succeeds almost immediately — ~0.05s in measurement —
long before the `01-hermes-setup` cont-init hook (docker/stage2-hook.sh)
has finished seeding + `chown hermes:hermes` config.yaml and running the
Python config migration (cont-init only fully settles at ~9.8s under
arm64 QEMU emulation).

`test_shim_opt_out_keeps_root` wipes config.yaml, writes it as root with
HERMES_DOCKER_EXEC_AS_ROOT=1, and asserts root:root ownership. When the
fixture released the test inside that ~10s window, stage2-hook's
boot-time `chown hermes:hermes config.yaml` raced the root-written file
and reset it to hermes:hermes — failing the assertion. The window is
invisible on native amd64 (stage2-hook completes in a blink) but wide
open under the arm64 build's QEMU emulation, which is why only build-arm64
flaked while build-amd64 stayed green.

Replace the responsiveness poll with a wait on the canonical
'cont-init finished' signal: $HERMES_HOME/logs/container-boot.log gaining
a `profile=default` line, written by 02-reconcile-profiles which s6 runs
strictly after 01-hermes-setup. Mirrors the readiness pattern already
used in test_container_restart.py. Also bumps the readiness timeout 20s->60s
to cover slow emulation.

No production code change — test-only hardening of a timing race.
2026-06-28 17:06:26 +10:00
Teknium
4f61d48aef
test(cron): deterministically wait for ticker, fix wall-clock flake (#54010)
tests/cron/test_scheduler_provider.py spawned a background ticker thread,
slept a fixed 0.2s, then asserted the loop had called tick()/heartbeat() at
least N times. Under loaded CI the worker thread isn't always scheduled
within that window, so the loop hadn't ticked yet — flaking with 'provider
never called tick()' (assert 0 >= 1).

Add a _wait_until(predicate, timeout) helper and replace all five fixed
time.sleep(0.2) sites with a poll on the actual predicate (calls/beats count
reached). Same contract assertions, no wall-clock dependence.
2026-06-27 22:52:29 -07:00
Teknium
1fa44180b0
fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
Teknium
2523917680
fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008)
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.

Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.

- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
  pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
  -k keeps its value/filters, '--' still works, positional path stays a root).
2026-06-27 22:43:26 -07:00
teknium1
c918d42d88 feat(desktop): config-driven Electron launch flags + GPU policy
Adds a desktop: section to config.yaml so headless/VM users can make
`hermes desktop` launch correctly without a wrapper command:

- desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11)
  appended to every launch. Accepts a list or a shell-split string.
- desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU
  env var the Electron app already reads. An explicit env var still wins.

cmd_gui() reads these via _desktop_launch_options() and applies them. This is
the config.yaml form of the capability proposed as a raw env var in #38934
(@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var.

Co-authored-by: ray <86501179+1RB@users.noreply.github.com>
2026-06-27 22:26:43 -07:00
Rafael Millan
54ea059919 fix: fall back to no-sandbox for desktop launch on restricted Linux hosts 2026-06-27 22:16:20 -07:00
infinitycrew39
1fa46570fb test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
Teknium
4626ceb747
fix(gateway): only offer system-scope gateway install to root sessions (#53975)
Non-root users picking 'System service' in the setup wizard were handed a
'sudo hermes gateway install --system --run-as-user <you>' recipe that fails
on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs),
so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user
toward a system install they shouldn't be doing from a user session.

Now prompt_linux_gateway_install_scope() only offers system scope when
os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to
re-run as root for a boot service. The non-root branch in
install_linux_gateway_from_setup becomes a defensive guard that refuses
without printing any self-elevation recipe. Gated the matching deferral hint
in setup.py behind root too.
2026-06-27 21:24:08 -07:00
Priyanshu Sharma
f6deabca0d fix(gateway): clear stale base_url on model switches 2026-06-27 21:23:25 -07:00
teknium1
f54c52800a fix(models): scope live-first picker merge to opencode aggregators only
Follow-up to the salvaged #49129 commit. The original change flipped the
shared generic-provider merge in provider_model_ids() to live-first
unconditionally, which regressed curated-first for single providers
(kimi/zai, #46309) — and the PR encoded that regression by flipping the
kimi-coding and zai test assertions to expect live-first.

Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set
({opencode-zen, opencode-go}); every other provider keeps curated-first.
Also widen the uncapped picker + live-first sets to opencode-go, which has
the same 70+ model catalog problem as opencode-zen. Restore the
kimi-coding curated-first test and rewrite the merge-order test to assert
the per-provider contract.
2026-06-27 21:23:25 -07:00
Afnath Ahamed
f98ffbc246 fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker 2026-06-27 21:23:25 -07:00
HexLab98
04ff4d9b54 test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702) 2026-06-27 21:22:49 -07:00
Teknium
3b23a984b5
feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973)
Multi-agent boards leak staleness: a sibling worker's parent handoff,
comment, or prior-attempt summary gets read by the next worker as live
truth even when it's a day old. build_worker_context surfaced the text
with (at best) a bare absolute timestamp, which an LLM reads as fact
regardless of age — parent results had no timestamp at all.

Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every
recalled-state line and a one-line 'point-in-time snapshot, re-verify
against source' frame on the parent-results section, so the worker sees
when handoffs were produced and re-checks stale ones before acting.
2026-06-27 21:21:54 -07:00