Commit graph

11628 commits

Author SHA1 Message Date
Teknium
dc90ca4e17 fix(ssl): run CA guard during agent initialization 2026-06-13 21:14:32 -07:00
Teknium
af5b526472 fix(ssl): validate CA bundle paths before provider calls 2026-06-13 21:14:32 -07:00
chromalinx
b42c5bf652 test(ssl_guard): fix macOS fallback test that passed for the wrong reason
The previous test patched ssl.create_default_context globally with a bare
SSLContext that has zero CA certs. Both verify_ca_bundle() and the macOS
fallback got the same mocked context, so the test verified nothing useful:
both paths produced empty get_ca_certs() and the assertion that no
exception escaped was vacuously satisfied.

Only mock the fallback call (no cafile) — let the certifi call hit the
real SSL stack and fail with SSLError on the broken PEM. The mock
fallback returns a context with load_default_certs() so the test now
verifies the real scenario: broken certifi → SSLConfigurationError,
macOS system trust store → success.

Also pads the broken PEM past the 1 KB size guard so the size check
doesn't short-circuit before ssl.create_default_context(cafile=...) runs.

Reported by @liuhao1024 in PR review.
2026-06-13 21:14:32 -07:00
chromalinx
a218a0f156 fix(agent,gateway,doctor): add SSL CA cert bundle fail-fast guard
A stale certifi CA bundle after a partial `hermes update` used to crash
the agent on the first outbound HTTPS call with a raw traceback and
trap the gateway in a retry loop.

This patch:

* Adds `agent/errors.py` with a typed `SSLConfigurationError`
* Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight
  that asserts the bundle exists, is non-trivial in size, and can build
  a working SSLContext. On macOS, it falls back to the system trust
  store when the bundle is empty but the system store is healthy
  (covers corporate proxies / MDM setups).
* Wires the guard into `run_agent.py` and `gateway/run.py` right
  after the `hermes_bootstrap` import, inside a try/except so a bug
  in the guard itself can never prevent startup.
* Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so
  users can detect the failure with one command.
* Adds unit tests covering the healthy, missing, empty, skip-env, and
  macOS-fallback paths.
* Adds an RCA document describing the failure mode and the recovery
  path (`pip install -e .`).

When the bundle is broken the user sees:

    \u26a0\ufe0f SSL certificate bundle issue detected.
       Run: pip install -e .

`HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed
environments that ship their own trust store.
2026-06-13 21:14:32 -07:00
Teknium
1106879147
perf(process): wake waiters on background completion (#45831) 2026-06-13 21:11:19 -07:00
brooklyn!
6b76284c77
fix(desktop): surface off-screen approvals via the jump-to-bottom control (#45853)
* fix(desktop): jump-to-approval pill for off-screen approvals

A blocked approval's only response surface is the inline Run/Reject bar on
the pending tool row. When that row is scrolled out of view the session looks
stalled with no visible action. Surface a composer-anchored "Approval needed"
pill only when an approval is pending AND its inline bar is scrolled away;
clicking scrolls the bar back into view. Preserves the deliberate inline (not
modal) approval design — the pill never duplicates the approve/reject controls.

The inline bar mirrors its own viewport visibility via IntersectionObserver
(tracks scroll/resize/layout) and registers a scroll-into-view handler the pill
fires, mirroring the existing thread-scroll jump-button bridge.

Supersedes #45828.

* fix(desktop): morph jump-to-bottom into approval prompt; drop scroll bridge

Collapse the separate "jump to approval" pill into the existing
scroll-to-bottom control: when scrolled away from the bottom while an approval
is pending, it relabels to "Approval needed". A parked approval's inline
Run/Reject bar is always the bottom-most content, so the existing
scroll-to-bottom action lands the user right on it — one control, no collision.

This also fixes the layout corruption from the first cut: the pill called
native el.scrollIntoView(), which scrolls every scrollable ancestor including
the overflow:hidden chat shell containers. Those have no scrollbar to scroll
back and don't remount on session switch, so the composer stayed shoved and
the breakage persisted across sessions. Reusing requestScrollToBottom() (the
use-stick-to-bottom path) only touches the one designated scroll container.

Removes the now-unused approval-scroll store + IntersectionObserver wiring.
2026-06-13 23:07:22 +00:00
Teknium
4026f526d5 chore(release): map MaxFreedomPollard author email 2026-06-13 15:01:42 -07:00
Max Pollard
9a2b976326 test(skills): add regression tests for bundled-update backup recovery
Three tests covering: a stale .bak poisoning a failed update's move/restore, an orphaned .bak misread as a user deletion, and a partially written dest blocking restore-on-failure. All three fail on current main without the fix.

Refs #44942
2026-06-13 15:01:42 -07:00
Max Pollard
3581131e7d fix(skills): make bundled-update backup handling crash-safe and idempotent
Recover an orphaned .bak before classification (interrupted updates no longer read as user deletions), clear a stale .bak before shutil.move (replace, not nest), and clear a partial dest before restore so restore-on-failure actually runs.

Fixes #44942
2026-06-13 15:01:42 -07:00
Teknium
bf8effad02
fix(utils): copy fallback for atomic replace across devices (#43852)
Fallback from `os.replace` on EXDEV/EBUSY using copy+fsync+unlink while preserving symlink target semantics and metadata.
2026-06-13 14:50:05 -07:00
Teknium
817f392311
feat(read): extract notebook and office documents (#37082)
Add stdlib-only extraction for `.ipynb`, `.docx`, and `.xlsx` in read_file with lazy integration and malformed-document fallback.
2026-06-13 14:42:51 -07:00
Teknium
2b67e96aec fix(approval): gate in-place edits to sensitive user files
Cover sed, perl, and ruby in-place mutations against shell rc, SSH, and credential files so terminal approvals pair the redirection and copy guards.
2026-06-13 14:35:27 -07:00
helix4u
abd69b8117 fix(approval): detect absolute home shell rc writes 2026-06-13 14:35:27 -07:00
briandevans
da28d5d113 fix(security): gate cp/mv/install into ~/.ssh, credential, and shell-rc files
tools/approval.py already denies tee/redirection writes to every
_SENSITIVE_WRITE_TARGET (~/.ssh/*, ~/.netrc/.pgpass/.npmrc/.pypirc, shell
rc files, ~/.hermes/config.yaml/.env) via the DANGEROUS_PATTERNS tee/`>`
rules, but cp/mv/install were only paired for _SYSTEM_CONFIG_PATH (/etc) and
the project-relative env/config target. So `cp evil ~/.ssh/authorized_keys`
(SSH-key implant / persistence), `cp creds ~/.netrc`, and `cp evil ~/.bashrc`
(login-time command injection) auto-approved while the equivalent tee/`>`
forms were denied — an unpaired write deny is theater (same rationale as
#14639 / commit 4e9d886d, which paired the terminal side for
~/.hermes/config.yaml writes but did not touch these cp/mv/install verbs on
the broader sensitive set).

Add one (cp|mv|install) DANGEROUS_PATTERNS entry reusing the existing
_SENSITIVE_WRITE_TARGET fragment, anchored via _COMMAND_TAIL so it fires on
the destination (last arg) only: reading OUT of a sensitive path
(`cp ~/.ssh/config /tmp/x`) stays auto-approved. Description differs from the
system-config cp entry so the two keep distinct approval keys (no silent
cross-approval). Additive — does not subsume the /etc or project-config rules.

Adds TestSensitiveCopyMovePattern: 5 positive cases (ssh authorized_keys,
ssh private key via mv, netrc via install, bashrc, ~/.hermes/config.yaml) +
2 negative guards (copy FROM ssh, unrelated copy). The ssh/netrc/bashrc
positives fail on main and pass on this branch; the negatives stay green
both ways.
2026-06-13 14:35:27 -07:00
Teknium
1fa761f8de
fix(search): keep partial results on search timeout (#36142)
Treat search command budget timeouts as soft truncation so partial results survive, while real search failures still return structured errors.
2026-06-13 14:35:21 -07:00
Teknium
069bfd6545 fix(agent): keep Codex reasoning replay on Codex path 2026-06-13 14:35:00 -07:00
briandevans
1d584a301e fix(agent): treat Codex reasoning items as thinking-only 2026-06-13 14:35:00 -07:00
ITheEqualizer
57c2a55be4 fix(telegram): harden rich message fallback handling
Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.
2026-06-13 14:34:53 -07:00
brooklyn!
0a865e5948
fix(desktop): bypass Chromium editing pipeline for large paste & select-delete (#45812)
Large paste and Ctrl+A → Delete froze the composer for seconds — both routed
through Chromium's contenteditable editing pipeline (~O(n²) on multiline DOM).

- insertPlainTextAtCaret: Range + text/<br> fragment (paste path)
- deleteSelectionInEditor: range.deleteContents for non-collapsed Backspace/Delete
- Shared composerSelectionRange helper; both flush via flushEditorToDraft

Profiled live (47 KB / 122 paragraphs): paste 4474 ms → 13 ms; select-delete
1304 ms → 4 ms. Collapsed-caret deletes still native.
2026-06-13 20:49:58 +00:00
Teknium
c8e5f34f24
fix(gemini): strip native self prefixes before generateContent (#36141)
Strip `google/` and `gemini/` self-prefixes before native Gemini generateContent calls, and keep provider-normalization expectations aligned.
2026-06-13 13:47:08 -07:00
briandevans
7d11fa4e9e fix(codex-responses): let final_answer complete top-level incomplete responses 2026-06-13 13:45:29 -07:00
ITheEqualizer
7c0605bf22 fix(telegram): preserve rich formatting on stream final 2026-06-13 13:44:45 -07:00
achaljhawar
819def44c7 fix(agent): scope Nous tags to Nous auxiliary calls 2026-06-13 13:24:40 -07:00
Teknium
08890d77e6
fix(plugins): normalize browser-pasted GitHub repo URLs (#33539)
Accept common GitHub web URLs in `hermes plugins install` by normalizing repository views back to cloneable `.git` URLs, with focused parser coverage.
2026-06-13 13:23:59 -07:00
brooklyn!
425e777f54
fix(desktop): polish slash command completion (space/tab/click + typed args) (#45760)
* fix(desktop): accept slash command on space at command stage

Pressing space on a no-arg slash command (e.g. /hermes-agent) fell
through to the arg-completion stage and dead-ended on "No matches"
instead of inserting the directive. Space now mirrors Tab/Enter while
the command name is still being typed: no-arg commands commit the chip,
arg-taking commands expand to their options step.

* fix(desktop): suppress arg popover for no-arg slash commands

Committing a no-arg command (`/hermes-agent `) re-detected the chip+space
as an arg query and re-opened the popover on "No matches". The arg-stage
menu now only opens when the command actually takes args.

* fix(desktop): polish slash arg completion (space/tab/click + typed args)

Unify Enter/Tab/Space accept of the highlighted item at both the command
and arg stages: no-arg commands commit a chip, arg commands expand to
options, and an arg option commits the full `/cmd arg` chip. A fully-typed
arg (which the backend completer drops from suggestions) now commits on
Space/Tab via the verbatim text instead of dead-ending, and the "No
matches" empty state is suppressed past a command's name. Space stays
slash-only so @ mentions keep a literal space.
2026-06-13 18:43:52 +00:00
kshitij
7be22e37e1
Merge pull request #45753 from kshitijk4poor/salvage/gateway-auto-resume-duplicate-agent
fix(gateway): claim session slot before auto-resume task to prevent duplicate agents (#45456)
2026-06-13 23:46:17 +05:30
kshitijk4poor
28902dc890 chore: map liuhao1024 contributor email for attribution 2026-06-13 23:39:49 +05:30
kshitijk4poor
63097ee0d7 test(gateway): cover auto-resume full-path no-regression; clarify guard docstring
The salvaged fix's two regression tests mock adapter.handle_message, so
they only assert the pre-claimed sentinel is set/cleaned around a stub —
they never drive the real dispatch chain. Add a full-path test that
exercises _schedule_resume_pending_sessions -> _guarded_handle_message ->
adapter.handle_message -> _process_message_background -> _handle_message
and asserts the resumed session's agent runs EXACTLY ONCE: not zero (the
pre-claim must not self-bounce the resume into a queued no-op) and not
twice (the duplicate-agent bug #45456 the fix targets). Also assert no
leaked sentinel and no orphaned pending event after the drain settles.

Tighten the _guarded_handle_message docstring: on current main the real
sentinel is taken over inside _handle_message (not _process_message_background),
and note the `is _AGENT_PENDING_SENTINEL` guard only releases the slot we
ourselves placed, never one a live run owns.
2026-06-13 23:39:35 +05:30
liuhao1024
6e2fd955ca fix(gateway): claim session slot before auto-resume task to prevent duplicate agents
When the gateway restarts and auto-resumes an interrupted session, an
inbound message arriving in the window between `asyncio.create_task()`
and the task's first await could spin up a second AIAgent for the same
session.  Both agents would then process messages concurrently,
producing interleaved duplicate responses (#45456).

Fix: set `_AGENT_PENDING_SENTINEL` in `_running_agents` immediately
after the "already running" check, before creating the task.  This
closes the race window — any inbound message sees the slot as occupied
and queues behind the auto-resume.

A `_guarded_handle_message` wrapper ensures the pre-claimed sentinel is
always released, even if `handle_message` raises before reaching
`_process_message_background` (whose `finally` block handles normal
cleanup).

(cherry picked from commit 85150c976b)
2026-06-13 23:36:51 +05:30
helix4u
78c11d99e3 fix(update): stop Windows gateways before mutating install 2026-06-13 10:46:08 -07:00
ashishpatel26
957a8ffa88 fix(bedrock): omit sampling params for restricted Claude models
Bedrock Converse rejects non-default sampling parameters for Opus 4.7 and 4.8 with a ValidationException. Reuse the Anthropic-native sampling-param guard in the Bedrock kwargs builder so those models omit temperature/topP while older Claude and non-Claude models keep existing behavior.

Includes the stop-sequence regression from the parallel fix to ensure stopSequences still pass through for restricted Opus models.

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
2026-06-13 10:45:56 -07:00
Teknium
cc14b74718 docs(profile): update clone-from references 2026-06-13 07:33:58 -07:00
Teknium
9b5f7b63c6 fix(profile): make clone-from a full source selector 2026-06-13 07:33:58 -07:00
Teknium
d146b85173 chore(release): map WompaJango author 2026-06-13 07:33:58 -07:00
WompaJango
28bf8fb47d feat(dashboard): clone profiles from any source 2026-06-13 07:33:58 -07:00
Que0x
3380563d94 fix(security): stop /api/status leaking host paths and PID on gated binds
The dashboard's public /api/status liveness endpoint is in PUBLIC_API_PATHS
and bypasses dashboard auth, yet it returned absolute hermes_home,
config_path, env_path, the gateway PID, and the internal gateway health URL.
That exceeds the shape its own allowlist documents as public ("version,
gateway state, active session count, and the dashboard auth-gate shape. No
bodies, no session content, no secrets"), leaking deployment recon to any
unauthenticated caller on a network-exposed (gated) bind.

Withhold host-local detail unless the bind is loopback / --insecure, where
the dashboard is local-only and the caller is already inside the trust
envelope -- the same split should_require_auth draws. The NAS liveness probe
and the auth-gate badge are unaffected.

Adds invariant tests for both modes (gated withholds, loopback keeps).
2026-06-13 07:18:59 -07:00
Teknium
ad7436a5d9 fix(gateway): preserve WeCom per-group sender allowlists
Keep the own-policy fail-closed hardening from PR #45444, but still trust WeCom groups.<id>.allow_from because the adapter already checked that sender allowlist before dispatching to gateway auth.
2026-06-13 07:18:54 -07:00
Que0x
fc46354580 fix(security): fail closed when an own-policy gateway adapter has no allowlist
Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter").

Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.
2026-06-13 07:18:54 -07:00
Teknium
1185dfd773 test: cover legacy Office document extensions 2026-06-13 07:18:37 -07:00
Clayton Chew
f82cb48120 fix(platform): add .xls, .doc, .ppt to SUPPORTED_DOCUMENT_TYPES
Old Office formats (.xls, .doc, .ppt) were missing from the
SUPPORTED_DOCUMENT_TYPES dict in gateway/platforms/base.py while their
newer counterparts (.xlsx, .docx, .pptx) were included.

Sending an .xls file via Telegram triggers 'Unsupported document type'
and the file is silently dropped instead of being cached and forwarded
to the agent.

Add the three legacy MIME types so these files are handled the same way
as their modern equivalents.
2026-06-13 07:18:37 -07:00
Tranquil-Flow
4fd9397ae3 fix(codex): drop extra_headers for chatgpt.com backend 2026-06-13 07:13:24 -07:00
Sarvesh
45f9099e51 fix(matrix): preserve markdown table structure 2026-06-13 06:57:08 -07:00
Teknium
4373e802a1
fix(docs): reuse healthy skills index during Pages deploys (#45616) 2026-06-13 06:46:07 -07:00
Teknium
d206e1f51d fix(dashboard): keep local file browser on home 2026-06-13 06:39:38 -07:00
konsisumer
16fb573bae fix(gateway): clear bloated compression binding on compression-exhaustion auto-reset
After compression exhaustion the auto-reset created a fresh session but
discarded reset_session()'s return value and left the Telegram topic
binding pointing at the oversized compressed child. The next inbound
message in that topic healed the binding forward and switch_session'd the
freshly-reset lane back onto the bloated transcript, re-triggering
compression exhaustion in a loop with a new session id each time.

Capture the fresh entry and re-sync the topic binding to it so the next
message starts clean. No-op on non-topic lanes.

Regression of the #9893/#10063 auto-reset fix.

Fixes #35809
2026-06-13 06:38:29 -07:00
Teknium
6f43ff5572 chore(release): map Gemini schema contributor 2026-06-13 06:12:52 -07:00
Henrik Bentel
eed61a1251 fix(gemini): add role field to systemInstruction 2026-06-13 06:12:52 -07:00
Teknium
74c5158b10
fix(model): show bare custom endpoints in gateway picker (#45597)
Surface direct model.provider=custom endpoints in /model picker output and keep explicit bare custom switches on the current endpoint instead of requiring a named providers/custom_providers row.
2026-06-13 06:05:30 -07:00
Teknium
6724daa2c2
fix: keep CLI idle timer ticking (#45592) 2026-06-13 05:55:04 -07:00
Teknium
aa53a78d67
fix(desktop): hand off Windows bootstrap recovery (#45594) 2026-06-13 05:54:32 -07:00