hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-01 12:02:05 +00:00

Author	SHA1	Message	Date
teknium1	fe89ce0694	chore(release): map Cossackx in AUTHOR_MAP for #52528 salvage	2026-06-28 02:40:37 -07:00
Cossackx	ba37c910e0	fix(desktop/windows): resolve real hermes over extensionless shim + prefer --update on recovery Two Windows-only desktop boot bugs that caused spurious reinstall/repair loops: 1. findOnPath() searched the empty extension BEFORE PATHEXT, so an extensionless Git-Bash `hermes` shim shadowed the real hermes.cmd/.exe. The shim then failed the shell:false --version probe and the resolver fell through to bootstrap/repair even though a working CLI was on PATH. Fix: try PATHEXT extensions first, keep the empty entry LAST so callers that already include the extension (py.exe, pwsh.exe) still resolve. 2. handOffWindowsBootstrapRecovery() chose the destructive --repair over the gentle --update by checking only venv\Scripts\hermes.exe -- the setuptools console-script shim, written at the END of venv setup and absent in interrupted/quarantined states. Fix: take --update when ANY real-install signal is present (venv python, the shim, or .hermes-bootstrap-complete). Adds windows-hermes-resolution.test.cjs (source-assertion pattern, wired into test:desktop:platforms) guarding both regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-28 02:40:37 -07:00
Cornna	0229246ab8	fix(desktop): probe venv runtime health before trusting bootstrap marker A broken/empty Windows launcher venv can see the source tree via PYTHONPATH but lack PyYAML, so 'import hermes_cli' succeeds while the first real CLI import dies — the desktop then trusts the bootstrap marker, spawns a dead backend, and loops on 'gateway offline' (#52378). - backend-probes.cjs: canImportHermesCli now runs 'import yaml; import hermes_cli.config' (extracted as hermesRuntimeImportProbe) and accepts an env override, so a dependency regression is caught without a real broken venv fixture. - main.cjs: isBootstrapComplete() routes through new isActiveRuntimeUsable(), which requires the venv python to pass the runtime import probe (with ACTIVE_HERMES_ROOT on PYTHONPATH) — not just exist on disk. Salvaged from PR #38179. The PR's install.ps1 reset/clean + autocrlf changes and their tests are dropped: current main already preserves dirty checkouts via stash (the data-loss-safe #38542 path) rather than the PR's older reset-based Repair-ManagedCheckoutBeforeUpdate approach.	2026-06-28 02:40:37 -07:00
teknium1	7c9cdad9fd	test(cli): cover Windows self-lock recovery guard + cmd-quote its hint Add two tests for the self-lock guard in _recover_from_interrupted_install: one asserting it clears the marker and skips install when hermes.exe is a process ancestor (breaking the #52378/#45542 loop), one asserting it falls through to a normal recovery install when the shim is NOT an ancestor. The guard's manual-recovery hint runs only inside the Windows branch, so quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform fallback hint at the end of the function is left POSIX-correct. Map Icather in scripts/release.py AUTHOR_MAP for the salvage.	2026-06-28 02:40:37 -07:00
灵越羽毛	b6f592dbdc	fix(cli): detect self-lock in update recovery to break infinite retry loop on Windows	2026-06-28 02:40:37 -07:00
liuhao1024	14baeefe1d	fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification Rebase onto plugins/platforms/matrix/adapter.py (code moved from gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct on invite events and calls _record_dm_room to persist in m.direct account data. Fixes #44679	2026-06-28 02:37:52 -07:00
Teknium	fde1c8570f	fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126) When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a full traceback for every pending connection-lost callback: 50+ identical WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not actionable: they are the expected side effect of the peer hanging up before our writes drained. Install a loop exception handler on the gateway serving loop that collapses exactly this teardown class (ConnectionResetError/ConnectionAbortedError/BrokenPipeError originating from _call_connection_lost) to a single debug line, forwarding every other loop error to the existing/default handler unchanged so genuine loop bugs still surface. Idempotent per loop.	2026-06-28 02:35:01 -07:00
teknium1	6eec0d4f08	docs: add infographic for #53107 gateway force-exit fix	2026-06-28 02:34:23 -07:00
LeonSGP43	9f0e64cedd	fix(gateway): force exit after graceful shutdown Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-06-28 02:34:23 -07:00
teknium1	dddaea0c98	chore(release): map yungchentang author for #53622 salvage	2026-06-28 02:34:17 -07:00
yungchentang	7e2ca7f68d	fix(telegram): reset send pool after pool timeouts	2026-06-28 02:34:17 -07:00
kshitij	f3d8f20a59	Merge pull request #54116 from kshitijk4poor/fix/36658-gateway-drain-microtask fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658)	2026-06-28 14:50:07 +05:30
Teknium	f646b82ff0	docs: add infographic for #38249 atomic env-snapshot fix	2026-06-28 02:08:57 -07:00
Teknium	9f17f16c66	fix(environments): use $BASHPID for atomic snapshot temp + harden failure path The atomic mv approach (kyssta-exe's commit) narrows but does not close the #38249 race: the temp name used $$ (parent shell PID), which is identical across &-launched concurrent subshells. Two concurrent writers pick the same temp file, clobber each other mid-write, and mv then publishes a torn snapshot — a reader sourcing it absorbs declare-x/export fragments into PATH. - Use $BASHPID (actual per-subshell PID) so concurrent writers never collide. - Chain mv on export success (&&) and rm the temp on failure so a partial dump never replaces a good snapshot; apply the same to the init_session bootstrap. - shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside. - LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps. - Regression tests: string-shape + a behavioral concurrent writers/readers test that proves the snapshot never tears (would still tear with $$).	2026-06-28 02:08:57 -07:00
kyssta-exe	6a2958a521	fix(environments): use atomic file replacement for snapshot writes Fix race condition in terminal environment snapshots that could corrupt PATH with declare -x entries. When concurrent terminal calls share the same snapshot file, the non-atomic 'export -p > snapshot.sh' write could be read mid-write by another process, causing partial/corrupted env vars to be sourced and mixed into PATH. The fix uses atomic file replacement: - Write to a temp file: export -p > snapshot.sh.tmp.303651 - Atomically replace: mv -f snapshot.sh.tmp.303651 snapshot.sh On POSIX, mv within the same filesystem is atomic, so source() will either see the old complete snapshot or the new complete one, never a partial/truncated file. Fixes #38249	2026-06-28 02:08:57 -07:00
teknium1	c23f394eb8	fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper - read_text(encoding='utf-8') (PLW1514) - # windows-footgun: ok on signal.SIGKILL — module is Linux-only (reads /proc, /sys/fs/cgroup; runs from a systemd unit) - test lambda accepts the new encoding kwarg	2026-06-28 02:05:50 -07:00
teknium1	86ec979f66	chore(release): map PRATHAMESH75 author for #37550 salvage	2026-06-28 02:05:50 -07:00
PRATHAMESH75	e551da6ddb	fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart Long-lived helpers spawned indirectly by tool calls (adb, platform bridges) were left in the service cgroup after the gateway's main process exited. When the kernel rejected the deferred cgroup-wide kill with EINVAL, systemd blocked Restart=always for 6+ minutes, taking down all platforms and cron windows (#37454). Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks cgroup.procs and sends per-PID SIGKILLs — a different kernel code path than cgroup.kill, so it succeeds where the cgroup-wide write failed. KillMode=mixed is preserved so the gateway still reaps its own tool-call children before systemd intervenes (#8202). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 02:05:50 -07:00
Coy Geek	d7a1052424	fix(env-passthrough): fail closed when provider blocklist import fails When tools.environments.local can't be imported (partial install, import-time error), _is_hermes_provider_credential() returned False — fail-open. A skill could then register a Hermes provider credential (ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets passthrough vars bypass the secret-substring net (rule 1), so the operator's real key would land in the execute_code child. Reopens the GHSA-rhgp-j443-p4rf bypass. Fail closed instead: on import failure, treat the name as a protected provider credential and refuse passthrough. Regression test exercises the full register -> scrub path under a simulated import failure. Co-authored-by: Hermes Agent <noreply@nousresearch.com>	2026-06-28 02:05:43 -07:00
teknium1	58c36b1798	fix(api-server): widen error redaction to cron-endpoint + SSE sites Follow-up to the salvaged #37733 fix. The contributor centralized redaction at _openai_error and the chat/responses failure paths, which covers the OpenAI-compatible envelopes transitively. Two sibling classes crossed the same authenticated HTTP boundary unredacted: - 8x cron-management endpoints returning {"error": str(e)} on 500 - the session-chat SSE error event ({"message": str(exc)}) Route both through the same _redact_api_error_text(force=True) helper. Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard covering mask/force/limit/passthrough behavior.	2026-06-28 02:05:38 -07:00
Coy Geek	5e774de76e	fix(api-server): redact provider errors at HTTP boundary Force API-server error text through the existing secret redactor before returning OpenAI-compatible errors, response fallback text, response snapshots, and run failure events. This prevents credential-shaped provider failure text from crossing the API-server boundary while preserving debuggable sanitized messages.	2026-06-28 02:05:38 -07:00
HexLab98	d2fda5925d	test(gateway): cover Discord/Slack compression status suppression (#39293)	2026-06-28 14:35:32 +05:30
HexLab98	d2ea948bc0	fix(gateway): suppress compression status noise on Discord and other chats (#39293) Extend the gateway noisy-status filter beyond Telegram so internal compression lifecycle messages stay in logs instead of spamming Discord, Slack, and other messaging channels.	2026-06-28 14:35:32 +05:30
teknium1	9f7d520caf	docs: add infographic for #36664 WhatsApp LID session-path fix	2026-06-28 02:05:26 -07:00
teknium1	3aaa98dd01	test(whatsapp): cover LID allowlist match on modern session layout Add an _is_user_authorized E2E for the platforms/whatsapp/session layout on top of fesalfayed's resolver fix (#36665) — guards the actual silently-dropped-LID-sender path from #36664.	2026-06-28 02:05:26 -07:00
fesalfayed	263ffec1b0	fix(whatsapp): resolve LID aliases on modern platforms/ session layout expand_whatsapp_aliases hardcoded get_hermes_home()/whatsapp/session, but the adapter writes lid-mapping files via get_hermes_dir("platforms/whatsapp/session", "whatsapp/session"). On installs without the legacy directory the two paths diverge, so the resolver finds no mappings and returns the bare LID, which misses the allowlist and silently drops the message. Resolve through the same helper so both sides stay in lockstep on new and legacy layouts.	2026-06-28 02:05:26 -07:00
teknium1	d0f087e7f9	docs: add infographic for #36109 empty-400 diagnostics	2026-06-28 02:05:20 -07:00
xxxigm	093f567f0d	fix(agent,cli): surface empty-body API errors and fail oneshot exit code When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}), `_summarize_api_error` fell through to a bare `str(error)`, so users saw only "HTTP 400" with no provider detail (reported on Windows in #36109). The SDK leaves `body` empty in this case, but the httpx `response` still carries the payload in `.text`. - run_agent.py `_summarize_api_error`: when `body` is empty, fall back to `response.text` — parse a JSON `error.message`/`message` when present, else surface the raw (truncated) body. Platform-agnostic diagnostics. - hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns exit code 2 when the run is failed/partial with no usable final response, so scripts can detect LLM failures (still 0 when a response — incl. an error summary as output — is produced). Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests. NOTE: #36109's original root cause (Windows "all providers return empty 400") is not reproducible on current main (heavy provider-transport churn since v0.15.1). This change does not claim to fix that root cause — it makes any empty-body API error LEGIBLE so a future occurrence shows the real provider message instead of a bare HTTP 400. Relates to #36109 (does not close it).	2026-06-28 02:05:20 -07:00
teknium1	c0b4a3438a	fix(install): scope Playwright override to too-new apt releases + keep step interruptible Follow-up on #54032 for #35166: - Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via playwright_host_unrecognized(), instead of retrying on ANY install failure. A network/disk/permission failure on a supported host now surfaces unchanged rather than getting a mismatched-glibc build forced onto it. - detect_os() now captures DISTRO_VERSION from os-release. - Fold in the interruptibility fix (was PR #35304, self-closed): wrap the download in 'timeout --foreground -k 10' (probed, with plain-timeout fallback) so a terminal Ctrl+C reaches the child and a wedged download is force-killed after the deadline. - Add behavioral tests that source the helpers and assert the retry fires only on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros, native-success, operator-pinned override, or unsupported arch.	2026-06-28 02:05:18 -07:00
kshitijk4poor	a28fe788a6	fix(install): retry Playwright install with platform override on unrecognized host (#35166 ) On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04, Debian 14, and future distros), 'npx playwright install --with-deps chromium' hangs uninterruptibly at 'Installing Playwright Chromium with system dependencies' because Playwright's resolver maps the host to a platform with no download build (#35166). Wrap every installer Playwright call in run_playwright_install(), which tries the native install first and, only if it fails or times out, retries once with PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build (ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless for unrecognized platforms (microsoft/playwright#33434). Try-native-first (not a hardcoded distro/version table) is deliberate: - Self-correcting — when Playwright already supports the host (e.g. Ubuntu 26.04 on Playwright >=1.61) the first attempt succeeds and the override is never applied, so we never force a mismatched-glibc build onto a release Playwright handles correctly (microsoft/playwright#35114). - Zero-maintenance — new distro releases work the moment Playwright adds them. - Covers Debian 14+ and future releases, not just Ubuntu 26.04. An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback build and skip the retry. Refs #35166	2026-06-28 02:05:18 -07:00
teknium1	64972b6403	fix(config): canonicalize model.name/model.model to model.default (#34500 ) A custom_providers config that names the model under model.name (or model.model) resolved to an empty model, so the API request went out with model= — HTTP 400 from OpenAI-compatible backends. Display paths (hermes status/dump) already read model.name and showed the model, making the failure silent. The model id was read via 'default or model' at ~14 independent sites (cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none of which honored 'name'. Rather than patch every site, canonicalize at the single load/save chokepoint: _normalize_root_model_keys() now promotes model.model/model.name -> model.default (precedence default > model > name) and drops the stale alias, so every reader — present and future — sees a populated default and config.yaml is migrated canonical on next save. The gateway, which bypasses load_config(), replays the same normalization in _load_gateway_config(). Co-authored-by: Bartok9 <danielrpike9@gmail.com> Credit: root-cause analysis and fix direction from @Bartok9 (#34502, first) and @v86861062 (#34527).	2026-06-28 02:05:13 -07:00
kshitijk4poor	f64d15ccb7	fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658 ) Dashboard /chat spawns the TUI attached to the dashboard's in-memory gateway via HERMES_TUI_GATEWAY_URL. In that attach mode the already-running gateway replays `gateway.ready` (and `session.info`) the instant the socket connects, so those events land in GatewayClient.bufferedEvents before the consumer's mount-time subscribe effect (useMainApp.ts) calls drain(). drain() then emitted the buffered events synchronously, so the `gateway.ready` handler's patchUiState / setHistoryItems cascade ran while React was still inside the first commit — tripping "Too many re-renders" (Minified React error #301) and breaking Dashboard chat after `hermes update`. Spawn / inline / sidecar modes never hit this: their `gateway.ready` only arrives after the Python child boots, on a later async tick. Fix: drain() defers the replay to the next microtask AND keeps `subscribed` false until that microtask runs. Keeping `subscribed` false in the gap means any live event arriving before the flush keeps buffering (publish() pushes when !subscribed) instead of emitting synchronously and jumping ahead of the chronologically-earlier replayed events — the flush re-drains the buffer right after flipping `subscribed`, preserving FIFO order. A drainGeneration token (bumped in resetStartupState) makes a queued flush a no-op if the transport was reset/killed in the meantime, avoiding use-after-teardown and duplicate/reordered exits. Regression tests: (1) drain() does not dispatch buffered events synchronously; (2) a live event arriving in the post-drain / pre-microtask window still delivers BEHIND the earlier-buffered event (FIFO). Both are red against the old synchronous behavior, green with this fix. Same class of fix as #44528. Closes #36658	2026-06-28 14:18:47 +05:30
Teknium	2ecb6f7fe6	fix(telegram): clear send_path_degraded on successful reconnect (#35205 ) (#54076 ) * fix(telegram): clear send_path_degraded on successful reconnect _send_path_degraded was cleared only in _verify_polling_after_reconnect, 60s after reconnect and only if scheduled. A clean start_polling() reconnect left the flag stuck True, short-circuiting send() and blocking all outbound messages until the deferred probe ran (or forever if it never did). Clear the flag the moment start_polling() succeeds — that is the recovery signal. The deferred probe remains a defensive re-check that re-enters the reconnect ladder (re-setting the flag) if it detects a silent wedge. Fixes #35205. * docs: add infographic for #35205 telegram send-path fix	2026-06-28 01:38:17 -07:00
Teknium	674e16e7c6	fix(redact): stop DB-connstr redaction from corrupting code output (#33801 ) (#54061 ) Secret redaction is display/output-scoped on main — write_file writes content verbatim, terminal/execute_code redact only output not the command/source. The real bug is in displayed tool OUTPUT (read_file, terminal, execute_code): _DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a multi-line block it scanned past the DSN line to the next stray '@' (a Python @decorator), replacing every intervening character — including line breaks — with *. That dropped lines and concatenated the next line onto the f-string line, making read_file output look corrupted (the file on disk was always correct). Reported in #33801. Fix: - Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so the match can never span a line break. A real DSN password never contains whitespace. This alone kills the catastrophic line-dropping. - Under code_file=True, preserve a password group that is a pure {...} brace expression — f"postgresql://{user}:{pass}@{host}" is an f-string template, not a live credential. Literal passwords are still masked. - Pass code_file=True at the terminal and execute_code output redaction call sites (file_tools already did) so code-execution output isn't corrupted by ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and private keys are still redacted. Verified E2E against the reporter's exact pydantic-settings module: file written verbatim, read_file shows the DSN f-string + @model_validator intact with zero * corruption, while a literal postgresql://admin:pw@host DSN and a real sk- key are still masked. Reported-by: koishi70 Reported-by: pfrenssen	2026-06-28 01:15:39 -07:00
Teknium	de6e9ac760	docs(discord): document bot-to-bot comms as unsupported (#32791) (#54063) * docs(discord): document bot-to-bot comms as unsupported (#32791) Multi-profile bot-to-bot conversation is not a supported topology. DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated messages; setting mentions/all across multiple Hermes profiles to make them reply to each other ack-loops because Discord's reply auto-mention satisfies the mention gate every turn. Document the safe default and the loop hazard so operators don't wire it up. * docs(discord): infographic for bot-to-bot unsupported stance (#32791)	2026-06-28 01:15:34 -07:00
teknium1	4f16950e9a	docs: add infographic for #32421 content-filter fallback fix	2026-06-28 01:15:21 -07:00
teknium1	578e3989d4	fix(agent): route content-filter stream stalls to fallback chain (#32421 ) When a provider's output-layer safety filter (MiniMax "output new_sensitive (1027)", Azure content_filter, etc.) kills a streaming response after deltas were already sent, interruptible_streaming_api_call swallows the raw error into a finish_reason=length partial-stream stub. The conversation loop then burned 3 continuation retries against the SAME primary — re-hitting the content-deterministic filter every time — and gave up with "Response remained truncated after 3 continuation attempts", never consulting fallback_providers. Builds on @595650661's classifier change (cherry-picked) so error_classifier recognizes the filter; then: - chat_completion_helpers: run the swallowed error through error_classifier at the stub-creation point and stamp _content_filter_terminated on the stub (single source of truth — no parallel pattern list). - conversation_loop: read the tag and activate the fallback chain BEFORE burning any continuation retries; roll partial content back to the last clean turn and re-issue against the new provider (restart_with_rebuilt_messages). Plain network stalls are unaffected (only content_policy_blocked is tagged). Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the same issue via the stub-tag and loop-escalation approaches respectively. Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback fires on the first stub and the fallback provider completes the turn.	2026-06-28 01:15:21 -07:00
595650661	b8e2268628	fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns The MiniMax output-layer safety filter surfaces the error verbatim as `output new_sensitive (1027)` (sometimes with additional provider wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)'). When the model emits a large tool-call argument block, the upstream filter trips and the SSE stream is truncated mid-flight, producing 'stream stalled mid tool-call' errors. Until now this case was misclassified and retried 3x on the same provider, reproducing the same refusal and burning paid attempts. Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes it through the existing is_client_error path: skip 3x retry, activate configured fallback model immediately, surface a clear provider-safety message to the user. Refs #32421	2026-06-28 01:15:21 -07:00
Teknium	c9df4bc094	fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066 ) A restart now interrupts in-flight agents immediately rather than holding the gateway open for a grace window. The previous 180s default coupled two independently-set timers: the gateway's own drain timer and systemd's TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next startup exit immediately ('already running') — an infinite crash loop under Restart=on-failure (#31981). Setting drain to 0 makes the mismatch structurally impossible: with drain 0 the generated unit gets TimeoutStopSec=90 against a near-instant drain, so systemd never kills mid-cleanup. Contract: restart the gateway, in-flight work stops. A grace window large enough to 'save' a long agent turn would have to outlast an unbounded task, which is impossible. Also fixes the stale-unit warning's suggested command (hermes gateway service install --replace -> hermes gateway install --force); the former subcommand does not exist. Closes #31981	2026-06-28 01:14:34 -07:00
teknium1	0800f1c28b	infographic: whatsapp send-queue serialization (#33360)	2026-06-28 01:10:14 -07:00
teknium1	cb9f855c2b	test(whatsapp-bridge): drop structural send-queue integration test The .integration.test.mjs greps bridge.js source text for the queue wiring — a change-detector that breaks on any benign refactor of the same code. The behavioral unit test (bridge.sendqueue.test.mjs) already covers FIFO ordering, error isolation, timeout propagation, and single-consumer concurrency, which is the contract that matters.	2026-06-28 01:10:14 -07:00
Tranquil-Flow	c393a8e55f	fix(whatsapp-bridge): serialize sendMessage to prevent cross-chat contamination (#33360 ) Concurrent sock.sendMessage() calls on a single Baileys socket can cause the WhatsApp protocol-level routing to misdeliver messages — responses intended for one chat appear in another. Add a promise-based send queue that serialises all sendMessage() calls across concurrent HTTP /send, /edit, and /send-media handlers so only one send is in-flight at a time. Includes unit tests for queue ordering, error isolation, timeout propagation, and single-consumer concurrency semantics, plus an integration check that the queue is wired into sendWithTimeout.	2026-06-28 01:10:14 -07:00
teknium1	1f72ad9be9	refactor(cli): extract interrupt recovery to a testable helper Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw) out of process_loop's finally block into _recover_terminal_after_interrupt(), and replace the inline-logic-copy tests with ones that exercise the real helper plus a source guard that process_loop still invokes it behind the _last_turn_interrupted gate.	2026-06-28 01:08:09 -07:00
zccyman	f3aaba7f85	fix(cli): recover terminal state after interrupt to prevent raw control sequence freeze When the agent is interrupted during processing, prompt_toolkit's renderer and VT100 input parser can be left in an inconsistent state. CSI 6n cursor position report responses leak as literal text (^[[19;1R) and the terminal stops accepting keyboard input. Fix: in process_loop's finally block, after an interrupted turn: - flush_stdin() to drain stray escape bytes from the OS input buffer - _force_full_redraw() to reset prompt_toolkit's renderer cache Closes #33271	2026-06-28 01:08:09 -07:00
teknium1	2e1b48ed31	chore: map kurlyk local email → skabartem for PR #32867 salvage	2026-06-28 01:08:04 -07:00
kurlyk	def97bcd96	fix: eliminate race condition in OpenAI client replacement Make check-and-replace atomic in _ensure_primary_openai_client by keeping both operations under the same lock acquisition. Previously, the lock was released between detecting a closed client and replacing it, allowing two threads to simultaneously replace the client. Fixes #32846 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-28 01:08:04 -07:00
teknium1	4a0fe4e54a	docs: add PR infographic for #32762 clarify-expiry fix	2026-06-28 01:07:53 -07:00
teknium1	aacc15b2c9	fix(clarify): raise default clarify_timeout to 3600s (#32762) The 600s default evicted the gateway clarify entry while users were still away (meeting/AFK); a later button tap then landed on a dead entry and the agent hung on 'running: clarify'. Raise the default to 1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback, documenting the running-agent-guard tradeoff. User overrides still win.	2026-06-28 01:07:53 -07:00
konsisumer	3f543229f2	fix(telegram): notify user when clarify button tap arrives after expiry	2026-06-28 01:07:53 -07:00
Teknium	90d25adc9e	fix(gateway): deliver profile-scoped cache media on symlinked HERMES_HOME (#54060) Generated images under a profile gateway's cache (profiles/<name>/cache/images/...) were silently dropped from Telegram/Discord delivery when HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data -> /root/.hermes) and $HOME is not that prefix. The resolved path lands under /root (a system denylist prefix), the root-home exception only fires when the denied prefix IS $HOME, and the static safe-roots list only covers the active HERMES_HOME's top-level cache, not per-profile cache dirs. Both gates fail, so validate_media_delivery_path returns None and the gateway logs 'Skipping unsafe MEDIA directive path'. _media_delivery_allowed_roots() now also enumerates per-profile cache roots (<root>/profiles/*/cache/{images,audio,videos,documents,screenshots}) at check time. Allowlist match runs before the denylist, so the profile artifact delivers regardless of the /root interaction; profile-dir credentials (auth.json) stay blocked since they aren't under a cache subdir. Reopened regression of #34485/#38108, neither of which covered the profile-scoped symlink case. Fixes #31733.	2026-06-28 01:07:28 -07:00
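
Note on 674e16e7c6 (DB-connstr redaction): the line-dropping described in that commit can be reproduced with a small sketch. The two patterns below are hypothetical stand-ins, not the project's actual `_DB_CONNSTR_RE`; only the character classes `[^:\s]+` and `[^@\s]+` come from the commit message, while the `postgresql://` prefix, the grouping, and the `mask` helper are assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only (not the real _DB_CONNSTR_RE).
# GREEDY lets the password group run across newlines until the next '@'.
GREEDY = re.compile(r"(postgresql://[^:]+:)([^@]+)(@)")
# LINE_SAFE forbids whitespace in both groups, so a match cannot span lines.
LINE_SAFE = re.compile(r"(postgresql://[^:\s]+:)([^@\s]+)(@)")


def mask(pattern: re.Pattern, text: str) -> str:
    """Replace every character of the password group with '*'."""
    return pattern.sub(
        lambda m: m.group(1) + "*" * len(m.group(2)) + m.group(3), text
    )


# A DSN without credentials, followed later by a decorator's stray '@'.
SAMPLE = (
    'URL = "postgresql://localhost:5432/app"\n'
    "RETRIES = 3\n"
    "@model_validator\n"
    "def check(): ...\n"
)

if __name__ == "__main__":
    print(mask(GREEDY, SAMPLE))     # lines between the DSN and '@' are swallowed
    print(mask(LINE_SAFE, SAMPLE))  # text comes back unchanged
    print(mask(LINE_SAFE, "postgresql://admin:pw@host/db"))  # literal password masked
```

With the whitespace-excluding classes, the credential-free DSN no longer matches at all, while a literal `user:password@host` DSN on a single line is still masked.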

13320 commits
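
Note on 64972b6403 (model key canonicalization): the promotion rule stated in that commit message (`model.model`/`model.name` become `model.default`, precedence default > model > name, stale aliases dropped) might look roughly like the sketch below. The function name `normalize_model_keys` and the plain-dict config shape are assumptions for illustration; this is not the project's `_normalize_root_model_keys()`.

```python
from copy import deepcopy


def normalize_model_keys(config: dict) -> dict:
    """Return a copy of config with model.model / model.name promoted to model.default.

    Hypothetical helper: the precedence (default > model > name) follows the
    commit message; everything else here is an illustrative assumption.
    """
    out = deepcopy(config)
    model = out.get("model")
    if not isinstance(model, dict):
        # Nothing to canonicalize when 'model' is absent or not a mapping.
        return out
    chosen = model.get("default") or model.get("model") or model.get("name")
    # Drop the aliases so every reader only ever sees 'default'.
    model.pop("model", None)
    model.pop("name", None)
    if chosen:
        model["default"] = chosen
    return out


if __name__ == "__main__":
    print(normalize_model_keys({"model": {"name": "my-model"}}))
    print(normalize_model_keys({"model": {"default": "a", "model": "b", "name": "c"}}))
```

Normalizing once at the load/save chokepoint, as the commit describes, means readers that only check `default` never see an empty model id.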