Commit graph

6468 commits

Author SHA1 Message Date
LeonSGP43
9f0e64cedd fix(gateway): force exit after graceful shutdown
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-06-28 02:34:23 -07:00
yungchentang
7e2ca7f68d fix(telegram): reset send pool after pool timeouts 2026-06-28 02:34:17 -07:00
Teknium
9f17f16c66 fix(environments): use $BASHPID for atomic snapshot temp + harden failure path
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.

- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
  never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
  that proves the snapshot never tears (would still tear with $$).
2026-06-28 02:08:57 -07:00
teknium1
c23f394eb8 fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper
- read_text(encoding='utf-8') (PLW1514)
- # windows-footgun: ok on signal.SIGKILL — module is Linux-only (reads
  /proc, /sys/fs/cgroup; runs from a systemd unit)
- test lambda accepts the new encoding kwarg
2026-06-28 02:05:50 -07:00
PRATHAMESH75
e551da6ddb fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart
Long-lived helpers spawned indirectly by tool calls (adb, platform
bridges) were left in the service cgroup after the gateway's main
process exited. When the kernel rejected the deferred cgroup-wide kill
with EINVAL, systemd blocked Restart=always for 6+ minutes, taking
down all platforms and cron windows (#37454).

Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks
cgroup.procs and sends per-PID SIGKILLs — a different kernel code path
than cgroup.kill, so it succeeds where the cgroup-wide write failed.
KillMode=mixed is preserved so the gateway still reaps its own
tool-call children before systemd intervenes (#8202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 02:05:50 -07:00
Coy Geek
d7a1052424 fix(env-passthrough): fail closed when provider blocklist import fails
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.

Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-28 02:05:43 -07:00
teknium1
58c36b1798 fix(api-server): widen error redaction to cron-endpoint + SSE sites
Follow-up to the salvaged #37733 fix. The contributor centralized
redaction at _openai_error and the chat/responses failure paths, which
covers the OpenAI-compatible envelopes transitively. Two sibling classes
crossed the same authenticated HTTP boundary unredacted:

- 8x cron-management endpoints returning {"error": str(e)} on 500
- the session-chat SSE error event ({"message": str(exc)})

Route both through the same _redact_api_error_text(force=True) helper.
Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard
covering mask/force/limit/passthrough behavior.
2026-06-28 02:05:38 -07:00
Coy Geek
5e774de76e fix(api-server): redact provider errors at HTTP boundary
Force API-server error text through the existing secret redactor before returning OpenAI-compatible errors, response fallback text, response snapshots, and run failure events. This prevents credential-shaped provider failure text from crossing the API-server boundary while preserving debuggable sanitized messages.
2026-06-28 02:05:38 -07:00
HexLab98
d2fda5925d test(gateway): cover Discord/Slack compression status suppression (#39293) 2026-06-28 14:35:32 +05:30
teknium1
3aaa98dd01 test(whatsapp): cover LID allowlist match on modern session layout
Add an _is_user_authorized E2E for the platforms/whatsapp/session layout
on top of fesalfayed's resolver fix (#36665) — guards the actual
silently-dropped-LID-sender path from #36664.
2026-06-28 02:05:26 -07:00
fesalfayed
263ffec1b0 fix(whatsapp): resolve LID aliases on modern platforms/ session layout
expand_whatsapp_aliases hardcoded get_hermes_home()/whatsapp/session, but
the adapter writes lid-mapping files via get_hermes_dir("platforms/whatsapp/
session", "whatsapp/session"). On installs without the legacy directory the
two paths diverge, so the resolver finds no mappings and returns the bare LID,
which misses the allowlist and silently drops the message. Resolve through the
same helper so both sides stay in lockstep on new and legacy layouts.
2026-06-28 02:05:26 -07:00
xxxigm
093f567f0d fix(agent,cli): surface empty-body API errors and fail oneshot exit code
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.

- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
  `response.text` — parse a JSON `error.message`/`message` when present, else
  surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
  exit code 2 when the run is failed/partial with no usable final response, so
  scripts can detect LLM failures (still 0 when a response — incl. an error
  summary as output — is produced).

Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.

NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
2026-06-28 02:05:20 -07:00
teknium1
c0b4a3438a fix(install): scope Playwright override to too-new apt releases + keep step interruptible
Follow-up on #54032 for #35166:
- Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt
  release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via
  playwright_host_unrecognized(), instead of retrying on ANY install failure.
  A network/disk/permission failure on a supported host now surfaces unchanged
  rather than getting a mismatched-glibc build forced onto it.
- detect_os() now captures DISTRO_VERSION from os-release.
- Fold in the interruptibility fix (was PR #35304, self-closed): wrap the
  download in 'timeout --foreground -k 10' (probed, with plain-timeout
  fallback) so a terminal Ctrl+C reaches the child and a wedged download is
  force-killed after the deadline.
- Add behavioral tests that source the helpers and assert the retry fires only
  on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros,
  native-success, operator-pinned override, or unsupported arch.
2026-06-28 02:05:18 -07:00
kshitijk4poor
a28fe788a6 fix(install): retry Playwright install with platform override on unrecognized host (#35166)
On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04,
Debian 14, and future distros), 'npx playwright install --with-deps chromium'
hangs uninterruptibly at 'Installing Playwright Chromium with system
dependencies' because Playwright's resolver maps the host to a platform with
no download build (#35166).

Wrap every installer Playwright call in run_playwright_install(), which tries
the native install first and, only if it fails or times out, retries once with
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build
(ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless
for unrecognized platforms (microsoft/playwright#33434).

Try-native-first (not a hardcoded distro/version table) is deliberate:
- Self-correcting — when Playwright already supports the host (e.g. Ubuntu
  26.04 on Playwright >=1.61) the first attempt succeeds and the override is
  never applied, so we never force a mismatched-glibc build onto a release
  Playwright handles correctly (microsoft/playwright#35114).
- Zero-maintenance — new distro releases work the moment Playwright adds them.
- Covers Debian 14+ and future releases, not just Ubuntu 26.04.

An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied
to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback
build and skip the retry.

Refs #35166
2026-06-28 02:05:18 -07:00
teknium1
64972b6403 fix(config): canonicalize model.name/model.model to model.default (#34500)
A custom_providers config that names the model under model.name (or
model.model) resolved to an empty model, so the API request went out
with model= — HTTP 400 from OpenAI-compatible backends. Display paths
(hermes status/dump) already read model.name and showed the model,
making the failure silent.

The model id was read via 'default or model' at ~14 independent sites
(cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none
of which honored 'name'. Rather than patch every site, canonicalize at
the single load/save chokepoint: _normalize_root_model_keys() now
promotes model.model/model.name -> model.default (precedence
default > model > name) and drops the stale alias, so every reader —
present and future — sees a populated default and config.yaml is
migrated canonical on next save. The gateway, which bypasses
load_config(), replays the same normalization in _load_gateway_config().

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

Credit: root-cause analysis and fix direction from @Bartok9 (#34502,
first) and @v86861062 (#34527).
2026-06-28 02:05:13 -07:00
Teknium
2ecb6f7fe6
fix(telegram): clear send_path_degraded on successful reconnect (#35205) (#54076)
* fix(telegram): clear send_path_degraded on successful reconnect

_send_path_degraded was cleared only in _verify_polling_after_reconnect,
60s after reconnect and only if scheduled. A clean start_polling() reconnect
left the flag stuck True, short-circuiting send() and blocking all outbound
messages until the deferred probe ran (or forever if it never did).

Clear the flag the moment start_polling() succeeds — that is the recovery
signal. The deferred probe remains a defensive re-check that re-enters the
reconnect ladder (re-setting the flag) if it detects a silent wedge.

Fixes #35205.

* docs: add infographic for #35205 telegram send-path fix
2026-06-28 01:38:17 -07:00
Teknium
674e16e7c6
fix(redact): stop DB-connstr redaction from corrupting code output (#33801) (#54061)
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):

_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.

Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
  the match can never span a line break. A real DSN password never contains
  whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
  expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
  not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
  sites (file_tools already did) so code-execution output isn't corrupted by
  ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
  private keys are still redacted.

Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.

Reported-by: koishi70
Reported-by: pfrenssen
2026-06-28 01:15:39 -07:00
teknium1
578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00
595650661
b8e2268628 fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.

Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.

Refs #32421
2026-06-28 01:15:21 -07:00
Teknium
c9df4bc094
fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066)
A restart now interrupts in-flight agents immediately rather than holding
the gateway open for a grace window. The previous 180s default coupled two
independently-set timers: the gateway's own drain timer and systemd's
TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd
SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next
startup exit immediately ('already running') — an infinite crash loop under
Restart=on-failure (#31981).

Setting drain to 0 makes the mismatch structurally impossible: with drain 0
the generated unit gets TimeoutStopSec=90 against a near-instant drain, so
systemd never kills mid-cleanup. Contract: restart the gateway, in-flight
work stops. A grace window large enough to 'save' a long agent turn would
have to outlast an unbounded task, which is impossible.

Also fixes the stale-unit warning's suggested command
(hermes gateway service install --replace -> hermes gateway install --force);
the former subcommand does not exist.

Closes #31981
2026-06-28 01:14:34 -07:00
teknium1
1f72ad9be9 refactor(cli): extract interrupt recovery to a testable helper
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
2026-06-28 01:08:09 -07:00
zccyman
f3aaba7f85 fix(cli): recover terminal state after interrupt to prevent raw control sequence freeze
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.

Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache

Closes #33271
2026-06-28 01:08:09 -07:00
teknium1
aacc15b2c9 fix(clarify): raise default clarify_timeout to 3600s (#32762)
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
2026-06-28 01:07:53 -07:00
konsisumer
3f543229f2 fix(telegram): notify user when clarify button tap arrives after expiry 2026-06-28 01:07:53 -07:00
Teknium
90d25adc9e
fix(gateway): deliver profile-scoped cache media on symlinked HERMES_HOME (#54060)
Generated images under a profile gateway's cache (profiles/<name>/cache/
images/...) were silently dropped from Telegram/Discord delivery when
HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data ->
/root/.hermes) and $HOME is not that prefix. The resolved path lands
under /root (a system denylist prefix), the root-home exception only
fires when the denied prefix IS $HOME, and the static safe-roots list
only covers the active HERMES_HOME's top-level cache — not per-profile
cache dirs. Both gates fail, so validate_media_delivery_path returns
None and the gateway logs 'Skipping unsafe MEDIA directive path'.

_media_delivery_allowed_roots() now also enumerates per-profile cache
roots (<root>/profiles/*/cache/{images,audio,videos,documents,
screenshots}) at check time. Allowlist match runs before the denylist,
so the profile artifact delivers regardless of the /root interaction;
profile-dir credentials (auth.json) stay blocked since they aren't
under a cache subdir.

Reopened regression of #34485/#38108, neither of which covered the
profile-scoped symlink case. Fixes #31733.
2026-06-28 01:07:28 -07:00
sweetcornna
2701ea2f0c fix(agent): reopen fallback chain after primary recovery 2026-06-28 00:57:42 -07:00
teknium1
7b9ff310b6 fix: salvage #33830 for current main — relocate allow_bots bridge to telegram plugin hook, fix stale adapter import in test 2026-06-28 00:57:03 -07:00
sweetcornna
fc70d023d8 fix(telegram): apply bot auth policy to Telegram sources
# Conflicts:
#	gateway/config.py
2026-06-28 00:57:03 -07:00
teknium1
52d774f0f9 fix(state): F_FULLFSYNC barrier at WAL checkpoints on macOS (#30636)
On Darwin, synchronous=FULL (the WAL default) only issues a plain
fsync(), which Apple documents does NOT guarantee writes reach stable
storage or stay ordered. SQLite's WAL corruption-safety guarantee
assumes the OS honors the fsync barrier; macOS does not unless the app
uses F_FULLFSYNC. During a launchd *system* shutdown the page cache is
dropped (effectively power-loss for in-flight pages), so a WAL
checkpoint whose fsync 'reported' durable may never hit the platter —
corrupting state.db with a malformed image. That is the trigger in
#30636 ('SIGTERM during launchd shutdown under high load').

Apply PRAGMA checkpoint_fullfsync=1 (macOS-guarded) in
apply_wal_with_fallback. It forces the F_FULLFSYNC barrier only at
checkpoint boundaries (where WAL frames land in the main DB), so cost
amortizes to ~+0.1ms/commit vs ~+4ms for the broader fullfsync=1.
No-op off Darwin (F_FULLFSYNC is macOS-only).

Root-cause analysis by @catapreta on #30636. Supersedes #30654, whose
synchronous=FULL is a no-op (already FULL in WAL mode) and whose
TRUNCATE-on-close is already on main.

Co-authored-by: catapreta <catapreta@users.noreply.github.com>
2026-06-28 00:53:19 -07:00
Teknium
7c38249c79
feat(moa): references see full tool state + fire on every user/tool response (#54016)
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.

Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
  rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
  reference gives an informed judgement on the real current state. Still emits
  zero tool-role messages and zero tool_calls arrays (strict providers reject
  those), and large tool results are previewed head+tail (4000-char budget).
  The required end-on-user shape is met by APPENDING a synthetic advisory user
  turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
  new tool result — instead of once per user turn. The state-sensitive advisory
  signature drives the cache: new tool result = miss (re-run), identical-state
  re-call = hit (no re-run, no re-emit).

The acting aggregator still receives the full, untrimmed transcript.
2026-06-28 00:30:11 -07:00
kshitijk4poor
fc7a01b6cb test+harden: modernize salvaged Matrix path for current plugin layout
Two follow-ups on top of the salvaged #46365 fix:

1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
   sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
   (#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
   Point the three sys.modules patches at the current module path so the
   ephemeral-fallback tests actually exercise the injected fake adapter.

2. Harden the live-adapter lookup: split the gateway import guard from the
   adapter lookup and log (instead of silently swallowing) when a runner
   exists but adapters.get() raises. A silent fall-through there would
   re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
   to prevent (#46310). Documented that the live adapter is gateway-owned and
   must not be disconnected, and why the ephemeral finally never touches it.
2026-06-28 12:48:08 +05:30
liuhao1024
a7fd62d824 fix(send_message): reuse live gateway adapter for Matrix media sends
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.

The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.

Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.

Fixes #46310
2026-06-28 12:48:08 +05:30
Ben Barclay
1466eab4ee
test(docker): wait for cont-init to finish before privilege-drop shim tests (#54026)
The docker-exec privilege-drop shim tests started a sleep container and
released the fixture as soon as `docker exec <c> true` returned 0. On
s6-overlay that succeeds almost immediately — ~0.05s in measurement —
long before the `01-hermes-setup` cont-init hook (docker/stage2-hook.sh)
has finished seeding + `chown hermes:hermes` config.yaml and running the
Python config migration (cont-init only fully settles at ~9.8s under
arm64 QEMU emulation).

`test_shim_opt_out_keeps_root` wipes config.yaml, writes it as root with
HERMES_DOCKER_EXEC_AS_ROOT=1, and asserts root:root ownership. When the
fixture released the test inside that ~10s window, stage2-hook's
boot-time `chown hermes:hermes config.yaml` raced the root-written file
and reset it to hermes:hermes — failing the assertion. The window is
invisible on native amd64 (stage2-hook completes in a blink) but wide
open under the arm64 build's QEMU emulation, which is why only build-arm64
flaked while build-amd64 stayed green.

Replace the responsiveness poll with a wait on the canonical
'cont-init finished' signal: $HERMES_HOME/logs/container-boot.log gaining
a `profile=default` line, written by 02-reconcile-profiles which s6 runs
strictly after 01-hermes-setup. Mirrors the readiness pattern already
used in test_container_restart.py. Also bumps the readiness timeout 20s->60s
to cover slow emulation.

No production code change — test-only hardening of a timing race.
2026-06-28 17:06:26 +10:00
Teknium
4f61d48aef
test(cron): deterministically wait for ticker, fix wall-clock flake (#54010)
tests/cron/test_scheduler_provider.py spawned a background ticker thread,
slept a fixed 0.2s, then asserted the loop had called tick()/heartbeat() at
least N times. Under loaded CI the worker thread isn't always scheduled
within that window, so the loop hadn't ticked yet — flaking with 'provider
never called tick()' (assert 0 >= 1).

Add a _wait_until(predicate, timeout) helper and replace all five fixed
time.sleep(0.2) sites with a poll on the actual predicate (calls/beats count
reached). Same contract assertions, no wall-clock dependence.
2026-06-27 22:52:29 -07:00
Teknium
1fa44180b0
fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
Teknium
2523917680
fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008)
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.

Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.

- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
  pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
  -k keeps its value/filters, '--' still works, positional path stays a root).
2026-06-27 22:43:26 -07:00
teknium1
c918d42d88 feat(desktop): config-driven Electron launch flags + GPU policy
Adds a desktop: section to config.yaml so headless/VM users can make
`hermes desktop` launch correctly without a wrapper command:

- desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11)
  appended to every launch. Accepts a list or a shell-split string.
- desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU
  env var the Electron app already reads. An explicit env var still wins.

cmd_gui() reads these via _desktop_launch_options() and applies them. This is
the config.yaml form of the capability proposed as a raw env var in #38934
(@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var.

Co-authored-by: ray <86501179+1RB@users.noreply.github.com>
2026-06-27 22:26:43 -07:00
Rafael Millan
54ea059919 fix: fall back to no-sandbox for desktop launch on restricted Linux hosts 2026-06-27 22:16:20 -07:00
infinitycrew39
1fa46570fb test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
Teknium
4626ceb747
fix(gateway): only offer system-scope gateway install to root sessions (#53975)
Non-root users picking 'System service' in the setup wizard were handed a
'sudo hermes gateway install --system --run-as-user <you>' recipe that fails
on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs),
so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user
toward a system install they shouldn't be doing from a user session.

Now prompt_linux_gateway_install_scope() only offers system scope when
os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to
re-run as root for a boot service. The non-root branch in
install_linux_gateway_from_setup becomes a defensive guard that refuses
without printing any self-elevation recipe. Gated the matching deferral hint
in setup.py behind root too.
2026-06-27 21:24:08 -07:00
Priyanshu Sharma
f6deabca0d fix(gateway): clear stale base_url on model switches 2026-06-27 21:23:25 -07:00
teknium1
f54c52800a fix(models): scope live-first picker merge to opencode aggregators only
Follow-up to the salvaged #49129 commit. The original change flipped the
shared generic-provider merge in provider_model_ids() to live-first
unconditionally, which regressed curated-first for single providers
(kimi/zai, #46309) — and the PR encoded that regression by flipping the
kimi-coding and zai test assertions to expect live-first.

Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set
({opencode-zen, opencode-go}); every other provider keeps curated-first.
Also widen the uncapped picker + live-first sets to opencode-go, which has
the same 70+ model catalog problem as opencode-zen. Restore the
kimi-coding curated-first test and rewrite the merge-order test to assert
the per-provider contract.
2026-06-27 21:23:25 -07:00
Afnath Ahamed
f98ffbc246 fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker 2026-06-27 21:23:25 -07:00
HexLab98
04ff4d9b54 test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702) 2026-06-27 21:22:49 -07:00
Teknium
3b23a984b5
feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973)
Multi-agent boards leak staleness: a sibling worker's parent handoff,
comment, or prior-attempt summary gets read by the next worker as live
truth even when it's a day old. build_worker_context surfaced the text
with (at best) a bare absolute timestamp, which an LLM reads as fact
regardless of age — parent results had no timestamp at all.

Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every
recalled-state line and a one-line 'point-in-time snapshot, re-verify
against source' frame on the parent-results section, so the worker sees
when handoffs were produced and re-checks stale ones before acting.
2026-06-27 21:21:54 -07:00
Teknium
131c9c542c test(tui-gateway): stop deferred-resume build thread leaking into next test
test_session_resume_uses_parent_lineage_for_display resumes via the
deferred (non-eager) path, which fires a 50ms background Timer
(_schedule_agent_build) calling whatever server._make_agent is patched
in at that moment. The timer outlived the test and landed in the next
test's (_follows_compression_tip) _make_agent mock, racily setting
agent_session_id='tip' and flaking 'assert tip == cont_tip' on CI.

Root-cause fix: stub _schedule_agent_build to a no-op in the leaking
test (it only asserts display history). Defense in depth: the victim's
fake_make_agent now setdefault()s so a stray late build can't overwrite
the synchronous eager build's captured id.
2026-06-27 21:07:53 -07:00
Teknium
e418605450 test(24996): freeze monotonic clock to de-flake fallback cooldown timing
The exhaustion-cooldown timing assertions relied on a wall-clock budget
(before + window + 1.0s). On loaded CI runners the activation calls could
exceed the 1s slack, flaking 'Run tests slice 4/8'. Freeze
chat_completion_helpers.time.monotonic so the cooldown math is exact and
load-independent across all four tests.
2026-06-27 21:07:53 -07:00
zccyman
db11849c9d fix(skills): skip shadowing when external_dirs provides the skill
Fixes #28126. sync_skills() was unconditionally writing bundled skills
into the local <profile_home>/skills/ tree even when the profile's
config.yaml delegated skill resolution to an external directory
via skills.external_dirs. The skill loader then saw two candidates
for the same name (local shadow + external canonical), refused to
resolve on collision, and every worker that auto-loaded such a skill
crashed with 'Unknown skill(s): <name>'.

Changes:
- _build_external_skill_index() indexes skills available in external
  dirs (by directory name and frontmatter name)
- sync_skills() skips writing a bundled skill when it finds the same
  name in the external index; records the hash in the manifest so
  subsequent syncs treat it as already handled
- Self-healing: removes stale local shadows left by prior buggy syncs
  (only when origin_hash == bundled_hash == user_hash, i.e. we wrote
  it and user didn't touch it)
- New 'shadowed_by_external' key in sync_skills() return dict

3 new tests in TestExternalDirsIndexing (all passing).
All 48 tests in test_skills_sync.py pass.

Closes #28126
2026-06-27 21:07:53 -07:00
Teknium
a8c862900b
fix(tui): sanitize replay history on WebUI/TUI session resume (#29086) (#53939)
A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill,
interrupt, or process restart before the tool result was written) persists a
dangling assistant(tool_calls) or interrupted assistant->tool tail. The
messaging gateway already strips these tails before replay (the #49201 fix),
but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight
in as the agent's conversation_history with no cleanup. The model re-issued
the unanswered call on every resume -- including after a full WebUI + Gateway
restart, since the poison lives in the SessionDB, not memory -- leaving the
session permanently 'thinking'. Only deleting the session recovered it.

- Extract the two strippers + helper from gateway/run.py into a shared
  agent/replay_cleanup.py (sanitize_replay_history wraps both).
- gateway/run.py re-exports under the historical private names; messaging
  behavior unchanged.
- Both TUI cold-resume sites now sanitize the model-fed history while leaving
  the display transcript untouched, so the user still sees their full history.

Verified E2E against a real SessionDB: dangling and interrupted tails are
stripped from the model feed, healthy mid-progress tool sequences are
preserved, and the display transcript is always the full raw history.
2026-06-27 20:56:49 -07:00
Teknium
f03823014b
fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941)
Telegram polling entered a self-inflicted ~31s loop of 409 Conflict ->
retry -> resume -> Conflict. The error_callback PTB invokes synchronously
inside its internal network_retry_loop only scheduled our async recovery
task (loop.create_task) and returned, so PTB kept polling getUpdates on its
own while our handler concurrently ran stop -> sleep -> start_polling. The
two polling sessions overlapped and Telegram returned a fresh 409.

Fix: in the conflict branch of the error_callback, synchronously set PTB's
private polling stop_event before scheduling recovery. PTB's loop exits on
its next tick (it races that event in do_action), so our handler owns
polling alone. The handler's await updater.stop() drains the task and PTB
clears the event, so the subsequent start_polling() builds a fresh event
and is not poisoned.

Keeps the existing reconnect ladder intact (option B) — fixes only the
race. Defensive: probes mangled + unmangled stop_event spellings and no-ops
(prior behaviour) if neither exists; never flips _running, which would make
the handler skip stop() and leave the loop wedged.
2026-06-27 20:46:08 -07:00