Commit graph

5059 commits

Author SHA1 Message Date
Teknium
48ae8029aa
fix(delegate): resolve custom-endpoint subagent pools by endpoint identity (#41730)
Subagents delegated to a custom endpoint were misrouted when the parent
ran on a different custom endpoint. Both runtimes collapse to
provider="custom", so _resolve_child_credential_pool() treated them as
interchangeable and handed the child the parent's pool. Leasing from it
then overwrote the child's delegated base_url with the parent's endpoint
via _swap_credential() — the child sent the delegated model name to the
wrong endpoint.

Custom runtimes now resolve by endpoint identity (the custom:<name> pool
key derived from base_url). The parent pool is reused only when both
parent and child resolve to the same custom endpoint; unregistered raw
endpoints return None so the child keeps its fixed delegated credential.
Non-custom provider paths are unchanged.

Fixes #7833.
2026-06-07 22:05:14 -07:00
islam666
78e2101cd2 fix: reap zombie subprocesses in web_server action status and meet_bot cleanup
- web_server.py: after proc.poll() returns a non-None exit code, call
  proc.wait() to reap the child and move the entry from _ACTION_PROCS
  to _ACTION_RESULTS. Previously .poll() alone left <defunct> zombies.
- meet_bot.py: terminate and wait on the pcm_pump subprocess (paplay/
  ffmpeg) during the finally-block teardown. Previously leaked on every
  normal bot exit.
- tests: add test_action_status_reaps_completed_process and
  test_action_status_ignores_wait_failure covering both the happy path
  and the wait()-raises-OSError edge case.

Closes #38032
2026-06-07 21:50:57 -07:00
islam666
e53b74c394 fix(dist): stop USER_OWNED_EXCLUDE from filtering nested directories
The copytree ignore lambda in _copy_dist_payload applied USER_OWNED_EXCLUDE
recursively at every directory depth. This caused nested directories whose
names matched exclude entries (bin, logs, cache, etc.) to be silently dropped
during distribution install/update.

Fix: only apply USER_OWNED_EXCLUDE filtering at the root of the staged tree,
matching the two-tier pattern used by _clone_all_copytree_ignore and
_default_export_ignore in profiles.py.

Add 5 tests covering nested bin/logs/cache preservation and top-level
filtering still working.

Fixes #37954
2026-06-07 21:50:57 -07:00
islam666
09a5548628 fix(weixin): refresh typing ticket on expiry to prevent stuck indicator (#38085)
The WeChat iLink typing ticket has a 600-second TTL. When a long-running
session exceeds that window, the cached ticket evicts from TypingTicketCache.
Both send_typing and stop_typing silently returned early when the ticket was
None, meaning the TYPING_STOP=2 signal was never sent to iLink. The WeChat
client then showed the typing indicator indefinitely.

Fix: add _ensure_typing_ticket() that transparently refreshes the ticket
via getConfig when the cached one has expired or is missing. Both send_typing
and stop_typing now call this method instead of silently no-oping.

Fixes #38085
2026-06-07 21:50:57 -07:00
islam666
2e61de0638 fix(model_metadata): consult DEFAULT_CONTEXT_LENGTHS before 256K fallback on custom endpoints
Problem: get_model_context_length() had an early return at the end of the
custom-endpoint probe branch (step 3) that returned DEFAULT_FALLBACK_CONTEXT
(256K) without ever consulting the hardcoded DEFAULT_CONTEXT_LENGTHS catalog
(step 8). Models served through a custom/proxied gateway (e.g. corporate
Anthropic proxy) that didn't expose Ollama or local-server endpoints would
hit this path and get capped at 256K, even when the model name clearly
matched a known entry in the catalog (e.g. claude-opus-4-8 → 1M).

Changes:
- agent/model_metadata.py: Before returning DEFAULT_FALLBACK_CONTEXT at the
  end of the custom-endpoint branch, consult DEFAULT_CONTEXT_LENGTHS using
  the same longest-key-first fuzzy matching as step 8. Only fall through
  to 256K if no catalog entry matches.
- tests/agent/test_model_metadata.py: Updated existing test and added new
  test covering the custom-endpoint → catalog fallback behavior.

Fixes #38865
2026-06-07 21:50:57 -07:00
islam666
9513793ad7 fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072)
Xiaomi MiMo (and potentially other providers) support multimodal user
messages but reject list-type tool message content with 400 'text is not
set'. Previously this was handled reactively — the API call would fail,
images would be stripped, and the request retried, losing visual info.

Fix: add supports_vision_tool_messages field to ProviderProfile (default
True). Xiaomi sets it to False. _tool_result_content_for_active_model
now checks this field proactively and returns a text summary instead of
list content, avoiding the round-trip failure entirely.
2026-06-07 21:50:57 -07:00
islam666
41f0714287 fix(vision): honor custom_providers per-model supports_vision (#41036)
_supports_vision_override() in image_routing.py checked model.supports_vision
and providers.<name>.models, but not the legacy list-style custom_providers
config. A custom provider entry like:

  custom_providers:
    - name: my-provider
      models:
        my-model:
          supports_vision: true

was ignored, causing image_input_mode=auto to route through the auxiliary
vision_analyze path instead of natively attaching images.

Fix: added a lookup step for custom_providers list entries, matching by
provider name (including 'custom:<name>' variants at runtime).
providers.<name>.models still takes precedence over custom_providers.

13 new tests covering: true/false override, custom: prefix matching,
no-match fallback, non-dict entries, empty lists, models key missing.
2026-06-07 21:50:57 -07:00
islam666
18c085b1a4 fix(gateway): normalize optional systemd directives in stale-check (#41119)
On older systemd versions that don't support RestartMaxDelaySec /
RestartSteps, the installed unit file has those directives silently
dropped. systemd_unit_is_current() did a strict text comparison, so
the unit was perpetually flagged as outdated.

Fix: _strip_optional_systemd_directives() removes RestartMaxDelaySec
and RestartSteps from both the installed and expected text before
comparison. Units that differ only by these optional directives are
now correctly considered current.
2026-06-07 21:50:57 -07:00
islam666
b18490b890 fix(compaction): prevent infinite loop when transcript fits in tail budget
When summary_target_ratio is large (e.g. 0.45) and the context_length is
moderate (e.g. 96000), the soft_ceiling (token_budget * 1.5) can exceed
the total transcript size.  _find_tail_cut_by_tokens walks the entire
transcript without breaking early, and the resulting compress window is
either empty (compress_start >= compress_end) or a single message whose
summary-of-one overhead saves ~0 tokens.

Both outcomes cause a no-op compression that does not increment
_ineffective_compression_count, so should_compress() returns True on
every subsequent turn and the loop repeats endlessly.

Fix (two layers):
1. _find_tail_cut_by_tokens: when the backward walk consumed the entire
   transcript without breaking (cut_idx <= head_end and accumulated <=
   soft_ceiling), re-walk with the raw (non-inflated) token budget to
   find a meaningful cut that gives the summarizer a useful middle window.
2. compress(): when compress_start >= compress_end, increment
   _ineffective_compression_count and log a warning so the existing
   anti-thrashing guard in should_compress() can break the loop.

Fixes #40803
2026-06-07 21:50:57 -07:00
Brian D. Evans
ab0a6270c3 fix(slack): align thread_ts check with is_thread_reply invariant (Copilot #15464)
Two findings from Copilot's review on #15464, both addressed:

1. ``event.get("thread_ts")`` truthy vs
   ``event_thread_ts != ts``: the new channel branch treated ANY
   truthy ``thread_ts`` as a real thread reply, but three lines below
   ``is_thread_reply`` is defined with the stricter
   ``event_thread_ts and event_thread_ts != ts`` invariant.  If Slack
   ever ships a payload where ``thread_ts == ts`` on a thread root,
   the stricter check would treat it as a top-level message for the
   ``is_thread_reply`` path but as a thread reply for session keying
   — divergent behaviour.  Aligned this branch to the same
   ``and event_thread_ts_raw != ts`` invariant.

2. ``test_top_level_reply_to_id_stays_none_when_shared`` docstring
   had the ternary logic backwards ("None != ts → reply_to_message_id
   IS set").  The code reads
   ``reply_to_message_id = thread_ts if thread_ts != ts else None`` —
   with ``thread_ts = None``, the condition is True so the expression
   evaluates to ``thread_ts`` itself (None), meaning the reply stays
   un-threaded.  The test asserted the correct end-state; only the
   explanatory docstring was wrong.  Rewrote the docstring to match
   the actual code flow, with the note that Copilot caught the
   reversal.

7/7 tests still pass.  No behaviour change for the existing
test_thread_reply_scopes_by_thread_even_when_shared case because
``event_thread_ts_raw = "1700000000.000000"`` and ``ts =
"1700000000.000005"`` are distinct — the new
``!= ts`` guard is a no-op there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:19:59 -07:00
Brian D. Evans
133e0271e2 fix(slack): scope top-level channel messages by channel-only when reply_in_thread=false (#15421)
Top-level Slack channel messages previously fell back to the message's
own ``ts`` as a synthetic ``thread_ts``:

    thread_ts = event.get("thread_ts") or ts  # ts fallback for channels

That value flows into ``build_source(thread_id=thread_ts)`` at
line 1247.  The gateway session store keys sessions by
``(platform, channel_id, thread_id)``, so every top-level channel
message ended up on a unique session.  Operators who set
``reply_in_thread: false`` in ``config.yaml`` expected all top-level
channel messages to share one session (the whole point of that flag)
— instead each one spawned a fresh conversation with no context
carry-over.

### Fix

Three explicit cases in the channel branch:

| event.thread_ts | reply_in_thread | thread_ts for session keying |
|---|---|---|
| non-null (real thread reply) | either | event.thread_ts |
| null (top-level) | true (default) | ts (legacy: own-thread sessions) |
| null (top-level) | false | **None** (shared channel session) |

The outbound-reply gate at line 1264 (``reply_to_message_id =
thread_ts if thread_ts != ts else None``) still works correctly in
all three cases without further changes: ``None != ts`` is True, so
shared-channel top-level messages don't get their reply threaded
either — matching the operator's ``reply_in_thread=false`` intent
end-to-end.

Genuine thread replies still scope per-thread under both modes so
multi-person threaded conversations can't collide with unrelated
channel chatter.

### Tests (7 new in ``tests/gateway/test_slack_channel_session_scope.py``)

All drive the real ``SlackAdapter._handle_slack_message`` code path
(not a re-implementation) via the standard pytest fixture pattern
used by ``tests/gateway/test_slack.py``.  Messages @mention the bot
so the mention gate doesn't drop them — the tests are specifically
about what happens once the handler decides to emit a ``MessageEvent``.

* ``TestChannelSessionScopeDefault`` (2 cases):
  - Explicit ``reply_in_thread: true`` keeps ``thread_id = ts``
    (legacy behaviour — regression guard)
  - Unset config behaves like ``reply_in_thread: true`` (pins the
    default)
* ``TestChannelSessionScopeShared`` (3 cases):
  - ``reply_in_thread: false`` + top-level → ``thread_id is None``
    (the #15421 bug 1 fix)
  - ``reply_to_message_id is None`` in the same case (no threaded
    outbound reply)
  - Genuine thread reply still scopes per-thread when shared mode is
    on — only TOP-LEVEL messages collapse to the channel session
* ``TestThreadReplyAlwaysScopesByThread`` (2 parametrised cases):
  - Thread replies get ``thread_id = event.thread_ts`` regardless of
    ``reply_in_thread`` — critical invariant for multi-thread
    channels; a regression here would leak per-thread context across
    threads

**Regression guard verified**: reverted the else-branch to the legacy
``thread_ts = event.get("thread_ts") or ts`` one-liner;
``test_top_level_maps_to_none_when_reply_in_thread_false`` correctly
failed (asserts ``thread_id is None`` but got ``"1700000000.000003"``).
Restored → 182 slack tests pass (175 existing + 7 new).

Scope: this fixes #15421 bug 1 only.  Bug 2 (sessions.json not
persisting across compression) lives elsewhere in the session
manager and is left for a separate diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:19:59 -07:00
Shannon Sands
86e5efb0ae Preserve Telegram onboarding fallback errors 2026-06-07 19:48:09 -07:00
Shannon Sands
ba29010902 Use httpx for Telegram onboarding worker calls 2026-06-07 19:48:09 -07:00
brooklyn!
fa42ac094d
feat(desktop): Shift+click the status-bar zap to toggle YOLO globally (#41666)
The status-bar zap currently toggles per-session approval bypass (the same
scope as the TUI's Shift+Tab). This adds a global escape hatch: Shift+clicking
the zap flips the persistent approvals.mode in config.yaml between "off"
(bypass on) and "manual" (bypass off), affecting every session, the CLI, the
TUI, and cron — and it survives restarts.

- statusbar-controls: thread the click's shiftKey through onSelect via a new
  StatusbarSelectModifiers arg.
- yolo-session: add setGlobalYolo() that calls config.set with scope="global".
- use-statusbar-items: branch toggleYolo on modifiers.shiftKey; plain click
  stays per-session, Shift+click goes global.
- tui_gateway config.set "yolo" key: add scope="global" that reads/writes
  approvals.mode through the gateway's own (mtime-cached) config view, honors
  an explicit value, and re-emits session.info to every live session so each
  window's zap reflects the flip immediately.
- i18n: tooltip copy in en/ja/zh/zh-hant notes Shift+click toggles globally.

Tests: two new tui_gateway tests cover the global toggle and explicit-value
paths; existing session/process-scope yolo tests still pass.
2026-06-07 20:57:08 -05:00
Teknium
30c7913617
fix(api_server): report hermes version on /health and /health/detailed (#40620)
Salvaged from #40479; re-verified on main, tightened, tested.

Co-authored-by: tfournet <tfournet@users.noreply.github.com>
2026-06-07 18:38:54 -07:00
Teknium
b97cd81c78
refactor(insights): drop dead pricing/duration wrappers, call usage_pricing directly (#40618)
Salvaged from #40527; re-verified on main, tightened, tested.

Co-authored-by: HeLLGURD <HeLLGURD@users.noreply.github.com>
2026-06-07 18:33:20 -07:00
Teknium
2aa316ec9c
docs(windows): fix Get-Command PATH guidance to venv\Scripts\hermes.exe (#40613)
Closes #40464.

Salvaged from #40488; re-verified on main, tightened, tested.

Co-authored-by: gauravsaxena1997 <gauravsaxena1997@users.noreply.github.com>
2026-06-07 18:28:23 -07:00
Teknium
6bdc4c0231
test: skip curses tests on Windows where _curses is unavailable (#40611)
Salvaged from #40447; re-verified on main, tightened, tested.

Co-authored-by: Ganesh0690 <Ganesh0690@users.noreply.github.com>
2026-06-07 18:21:03 -07:00
Teknium
69a293b419 hardening(todo): bound TodoStore item content length and count
The todo list is re-injected into the model's context after every
context-compression event (TodoStore.format_for_injection), so an oversized
todo item or an unbounded number of items defeats the compression it is meant
to ride through. TodoStore.write/_validate previously enforced no size or count
bounds, so a single 50KB item produced a ~50KB re-injection block on every
subsequent turn.

Add two caps:
- MAX_TODO_CONTENT_CHARS (4000): per-item content is truncated with a marker.
  Routed through a shared _cap_content() so the merge-update path (which writes
  content directly, bypassing _validate) is capped too.
- MAX_TODO_ITEMS (256): total list length is bounded, keeping the
  highest-priority head (list order is priority).

Both caps are generous relative to real plans — a todo item is a short task
description and active lists are a handful of items.

NOT a security fix. Raised externally via GHSA-5g4g-6jrg-mw3g, which framed a
caller-supplied conversation_history on the authenticated API server replaying
into _hydrate_todo_store as a DoS. That path is authenticated (the API server
refuses to start without API_SERVER_KEY) and self-scoped (the caller supplies
their own entire history and can only inflate their own response chain — forged
role=tool entries are never persisted to the session DB), so it is out of scope
as a vulnerability under SECURITY.md 3.2. These bounds are footgun containment
that also applies to the trusted agent path, where the model itself authors the
todos. Credit to the reporter for the observation.

Co-authored-by: YLChen-007 <30854794+YLChen-007@users.noreply.github.com>
2026-06-07 18:06:27 -07:00
Gilad Bauman
ae82eed2b1 fix(gateway): use OGG for Telegram auto TTS 2026-06-07 18:05:58 -07:00
Teknium
cb83149dc6
fix(yuanbao): bound ws.close() so an idle server can't stall shutdown ~5s (#40607)
Salvaged from #40421; re-verified on main, tightened, tested.

Co-authored-by: maxmilian <maxmilian@users.noreply.github.com>
2026-06-07 17:49:38 -07:00
Teknium
09d66037f8
fix(hindsight): send only new-turn delta on append retains instead of whole session (#40605)
Closes #40503.

Salvaged from #40519; re-verified on main, tightened, tested.

Co-authored-by: skylarbpayne <skylarbpayne@users.noreply.github.com>
2026-06-07 17:41:10 -07:00
kshitijk4poor
7df81d0557 fix(web): make _has_env config-aware so SEARXNG_URL auto-detect honors Hermes config
Follow-up to #34306. The provider fix made SearXNG *usable* with a
config-only SEARXNG_URL, but tools/web_tools._has_env still read raw
os.getenv, so the backend auto-detect cascade and check_web_api_key
remained blind to it — SearXNG worked when explicitly selected but was
never auto-selected. Route _has_env (and the SearXNG diagnostic print)
through a config-aware _env_value helper mirroring the provider's
_searxng_url(). Fixing the shared helper covers every provider key in
one place. Adds regression tests for config-only auto-detect and
check_web_api_key. See #34290.
2026-06-08 01:12:32 +05:30
kshitij
0c0fbf763b
Merge pull request #41430 from helix4u/fix-url-tools-unicode-normalization
fix(tools): percent-encode non-ascii URL components
2026-06-07 12:39:30 -07:00
helix4u
333f01bc7f fix(tools): percent-encode non-ascii URL components 2026-06-07 11:42:26 -06:00
teknium1
16786f3bb3 feat(desktop+gateway): remote media relay — attach images/PDFs and display gateway images over the network
Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.

Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
  dir and queued via the existing native-image-attach pipeline. Magic-byte
  extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
  structured error codes. Accepts content_base64/filename (canonical) and
  data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
  and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
  base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
  so the two methods and the existing image.attach don't duplicate logic.

Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
  clients can display it. Auth-gated like every /api route, extension
  allowlist + size cap, AND confined to the gateway's own media roots
  (images/screenshots/cache, resolved symlink-safe) so an authed caller can't
  read image-extension files anywhere on disk.

Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
  connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
  markdown-text fetch images over /api/media in remote mode.

Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from #38876 is intentionally left out as a
separable feature. TUI file uploads (#40492) remain a separate surface.

Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.

Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
2026-06-07 10:05:53 -07:00
Teknium
0c48b7165d hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool
The agent-facing cronjob tool scans the user prompt with _scan_cron_prompt()
before creating/updating a job (tools/cronjob_tools.py); the REST cron
endpoints (POST /api/jobs, PATCH /api/jobs/{id}) validated length but not
content. This adds the same scan to both handlers so an exfiltration/injection
prompt is rejected the same way regardless of which surface created the job.

NOT a security boundary, defense-in-depth / parity only: the REST cron
endpoints are authenticated (every handler runs _check_auth, and connect()
refuses to start without API_SERVER_KEY), and _scan_cron_prompt is a documented
in-process heuristic, not a containment boundary (SECURITY.md 3.2).

Raised externally via GHSA-fr3q-rjg3-x6mf (DNS-rebinding pre-auth RCE). The
report's load-bearing 'no auth by default' premise was already closed three
weeks after it was filed by the API_SERVER_KEY-required guard (commit
1a9ef8314); this lands the create/update prompt-validation parity the report
also pointed at. Scanner imported defensively so a missing scanner cannot
disable the cron REST API.
2026-06-07 10:04:57 -07:00
Teknium
af08c43f3e
fix: skip MCP preflight content-type probe on reconnect when already ready (#40604)
Closes #40366.

Salvaged from #40548; re-verified on main, tightened, tested.

Co-authored-by: mohamedorigami-jpg <mohamedorigami-jpg@users.noreply.github.com>
2026-06-07 09:51:11 -07:00
teknium1
76f01780f0 fix(kanban): sweep deferred scratch parent on non-scratch child completion + tests
Follow-up on the deferred-cleanup salvage (#33774): _cleanup_workspace
returned early for a non-scratch ('dir'/'worktree') task and never ran the
parent sweep, so a scratch parent waiting on a 'dir' child would leak its
deferred workspace forever. Run the parent sweep before the early return.

Adds regression tests: deferred-while-child-active, swept-after-last-child,
and dir-child-unblocks-scratch-parent.
2026-06-07 09:50:44 -07:00
Teknium
cb3e41e2fd
feat(onboarding): opt-in structured profile-build path on first contact (#41114)
* feat(onboarding): opt-in structured profile-build path on first contact

On a user's very first gateway message, Hermes now optionally offers to
build a short profile of them — then, only with consent, gathers durable
facts and persists them to the user-profile memory store (memory tool,
target="user") so future sessions start already knowing who they are.

Inspired by Poke's zero-input onboarding, but consent-first by design:
- The agent OFFERS, never assumes. Declining stops it immediately.
- Before ANY external lookup it states what it will look up and asks.
- It never reads connected accounts (email/calendar) silently — the
  exact privacy concern that made naive implementations feel invasive.

Wiring reuses existing infrastructure end-to-end:
- gateway/run.py first-message hook (was a plain self-intro) now swaps in
  the profile-build directive when enabled and not yet offered.
- agent/onboarding.py gains profile_build_mode()/profile_build_directive()
  + PROFILE_BUILD_FLAG, latched once via the existing onboarding.seen
  mechanism so the offer fires at most once per install.
- config default onboarding.profile_build: "ask" (set "off" to disable).
  Added to an existing section, so no _config_version bump needed.

No new storage layer, no new injection path, no prompt-cache impact.

* fix(dashboard): fold onboarding into agent tab to avoid 1-field category

onboarding.profile_build is the only schema-surfaced onboarding field
(onboarding.seen is an internal latch dict), so the dashboard CONFIG_SCHEMA
single-field-category invariant rejected it. Merge onboarding -> agent like
the other small categories.
2026-06-07 08:36:48 -07:00
Teknium
d87f293972
feat(compression): temporal anchoring in compaction summaries (#41102)
Compaction summaries now receive the current date and instruct the
summarizer to rewrite completed actions as absolute, dated, past-tense
facts (e.g. "email John about the proposal" -> "Sent the proposal email
to John on 2026-06-07"). A resumed conversation no longer re-issues work
that already happened or treats a finished action as still pending.

The date is resolved via hermes_time.now() (date-only, user-configured
timezone) inside _generate_summary. The compaction summary is a
mid-conversation message that is never part of the cached prefix, so the
date does not affect prompt-cache stability. Date resolution is
best-effort: a clock failure omits the rule rather than blocking
compaction. The rule rides the shared template, so both first-compaction
and iterative-update prompts carry it.

Inspired by Poke's summarization (temporal anchoring + semantic
preservation).
2026-06-07 08:36:45 -07:00
Teknium
9dbad1990b
test(discord): align clarify/model-picker tests with fail-closed component auth (#41338)
Three gateway tests broke on main after the component-auth security
hardening (test_discord_component_auth.py) made empty Discord component
allowlists fail-closed: a view built with allowed_user_ids=set() now
rejects every click instead of allowing anyone.

The clarify and model-picker BEHAVIOR tests still constructed their views
with an empty allowlist and expected the click to succeed — a stale
assumption from before the hardening. Fixed by giving each view an
allowlist containing the clicking user (the interaction's own id), which
is the realistic shape and what the security model requires.

Production code unchanged — this only updates the test fixtures to match
the intended (and separately pinned) fail-closed contract. The security
regression suite and these behavior suites now both pass.

Fixes:
- test_discord_clarify_buttons.py: test_choice_falls_back_to_label_text_when_entry_missing, test_other_flips_entry_to_awaiting_text
- test_discord_model_picker.py: test_model_picker_clears_controls_before_running_switch_callback
2026-06-07 08:27:40 -07:00
LaPhilosophie
f6f363662e fix(discord): fail closed for component button auth when no allowlist set
Salvage of the Discord half of PR #30964 by @LaPhilosophie. Discord
component button callbacks (ExecApprovalView, SlashConfirmView,
UpdatePromptView, ModelPickerView) bypass the normal message dispatch
authorization path. _component_check_auth previously returned True when
both the user and role allowlists were empty, so any guild member who
could see an approval prompt could click Approve on a dangerous command.

Fail closed instead: require DISCORD_ALLOWED_USERS / DISCORD_ALLOWED_ROLES
/ GATEWAY_ALLOWED_USERS membership, or an explicit DISCORD_ALLOW_ALL_USERS
/ GATEWAY_ALLOW_ALL_USERS opt-in for deliberately-open deployments.

Mirrors the Telegram (#24457) and Matrix fail-closed precedent.
The Slack half of #30964 is superseded by PR #33844's helper.

Reported via GHSA-mc26-p6fw-7pp6 (@whyiug).

Co-authored-by: LaPhilosophie <804436395@qq.com>
2026-06-07 06:21:37 -07:00
Dusk1e
3fa15b33dd fix(feishu): fail closed for update prompt card actions 2026-06-07 06:21:37 -07:00
Dusk1e
410cb743bf fix(slack): re-check gateway auth on approval and slash-confirm buttons 2026-06-07 06:21:37 -07:00
synapsesx
f10a330aee fix(research): keep tool_call/tool_response pairs intact when compressing trajectories
## What does this PR do?

The trajectory compressor could corrupt training trajectories by cutting a
conversation in the middle of a tool-call/tool-response pair. In the from/value
trajectory format a `tool` turn (carrying `<tool_response>` markers) is always
emitted immediately after the `gpt` turn whose `<tool_call>` it answers, so the
two turns must stay together. The compressible region's end boundary, however,
was chosen purely by token accumulation: the loop stopped at the first turn where
the accumulated tokens met the savings target, with no regard for turn roles. For
any over-budget trajectory whose savings boundary happened to land between a `gpt`
turn and its `tool` turn, the `gpt` (with its `<tool_call>`) was summarised away
into the replacement `human` message while the now-orphaned `tool` turn (with its
`<tool_response>`) was kept verbatim in the tail — producing an unmatched marker
and silently corrupting the training signal. The head boundary had the mirror
problem when the first tool turn was not protected.

This change snaps both compression boundaries to a clean turn boundary before the
region is extracted and replaced, so the summary always covers whole gpt+tool
blocks and a `tool` turn is never separated from the `gpt` turn that precedes it.
The boundary is moved forward when possible (folding an orphaned tool turn into
the region that already holds its gpt) and falls back to moving backward when no
clean boundary exists ahead, such as when the protected tail itself begins on a
tool turn.

## Related Issue

N/A

## Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)

## Changes Made

- `trajectory_compressor.py`: added `_is_boundary_clean()` and `_snap_boundary()`
  helpers on `TrajectoryCompressor`, and applied them to both the head and tail
  compression boundaries in `compress_trajectory()` and
  `compress_trajectory_async()`. When snapping collapses the region to nothing
  safe to compress, the trajectory is returned unchanged and flagged as still
  over the limit rather than being corrupted.
- `tests/test_trajectory_compressor.py`: added `TestCompressionToolPairIntegrity`
  covering the sync and async paths plus direct unit tests for the boundary
  snapping (forward skip and backward fallback).

## How to Test

1. Run the focused tests: `pytest tests/test_trajectory_compressor.py -q`.
2. The new sync/async cases build a trajectory of gpt/tool pairs with an oversized
   middle gpt turn and choose a token target that forces the accumulation
   boundary to stop between a `<tool_call>` and its `<tool_response>`. They assert
   that `<tool_call>` and `<tool_response>` markers stay balanced after
   compression and that every kept `tool` turn is immediately preceded by a `gpt`
   turn (never the inserted summary or another tool turn).

## Checklist

### Code

- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-06-07 05:01:27 -07:00
manishbyatroy
490c486ff6 fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS
SIMPLEX_ALLOWED_USERS silently denied every contact when operators
listed display names instead of numeric contactIds. The SimpleX UI
never surfaces the numeric id, so display names are what operators
naturally put in the env var. _is_user_authorized only compared
source.user_id (the contactId), so the allowlist never matched.

Expand check_ids to include source.user_name for the simplex platform,
mirroring the existing WhatsApp phone-LID aliasing pattern. Adds doc +
setup-prompt clarification and three regression tests.

Salvaged from PR #40393. Adds manishbyatroy to release.py AUTHOR_MAP.
2026-06-07 04:53:22 -07:00
teknium1
1a4010edf5 test(approval): regression for shell-escape denylist bypass (#36846, #36847) 2026-06-07 03:57:21 -07:00
Teknium
1fb99b1f22
fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (#40589)
Two isolated reliability fixes:
- chat_completion_helpers: raise on a zero-chunk stream (no finish_reason,
  no content/reasoning/tool_calls) so retry handles it instead of
  fabricating a successful empty turn.
- model_metadata: parse the OpenRouter/Nous output-cap error phrasing
  ("maximum context length is N ... (A of text input, B of tool input,
  C in the output)") so parse_available_output_tokens_from_error returns
  a real cap and the caller stops looping on it.

Salvaged from #40405 (@ashishpatel26) — took the two stream/error-parsing
fixes. The PR also bundled compression-state changes (on_session_start
clearing _previous_summary; cron session-id prefix preservation, #38788);
those touch the compression hot path and are split out for separate review.

Co-authored-by: ashishpatel26 <ashishpatel26@users.noreply.github.com>
2026-06-07 03:52:09 -07:00
Teknium
9e63109522
feat(dashboard): change UI font from the theme picker, independent of theme (#41145)
The dashboard font is now selectable from the UI, not just YAML. A new Font
section in the header theme picker overrides the UI font of whatever theme is
active; the choice is orthogonal to the theme and survives theme switches.
Each theme keeps its own font as the default — picking "Theme default" clears
the override.

- web/src/themes/fonts.ts: curated font catalog (system + Google Fonts across
  sans/serif/mono), each with a family stack and optional webfont URL. The
  catalog is the only injected-font surface — no free-text URL box, so the
  injected <link> origins stay fixed.
- web/src/themes/context.tsx: font-override state (localStorage + server),
  applied after theme typography so it wins; theme apply re-asserts it, and
  clearing re-runs theme apply to restore the theme's own font. Mono is left
  to the theme so code/terminal are untouched.
- web/src/components/ThemeSwitcher.tsx: Font section with grouped, self-
  previewing font rows and a "Theme default" clear option.
- hermes_cli/web_server.py: GET/PUT /api/dashboard/font persisting to
  config.yaml dashboard.font, with a server-side id allow-list (unknown ids
  coerce to the theme sentinel).
- i18n + types, api client methods, tests, and docs.

Validation: 6 new backend endpoint tests pass; tsc + vite build clean; live
browser test confirmed pick/persist/survive-theme-switch/clear all work.
2026-06-07 03:39:01 -07:00
Teknium
0507e4630d
fix(desktop): preserve configured base_url on same-provider model switch (#41121)
The desktop model picker calls POST /api/model/set with provider+model only
(no base_url). _apply_main_model_assignment cleared model.base_url for every
non-custom provider, so re-picking a Xiaomi MiMo model wiped a Token Plan
endpoint (https://token-plan-*.xiaomimimo.com/v1) back to the registry default
api.xiaomimimo.com — breaking valid tp- keys with 401s.

Now base_url is cleared only when switching to a different provider (the stale
URL belonged to the old one); same-provider re-assignment preserves it, and an
explicitly supplied base_url is honored for any provider.
2026-06-07 02:48:21 -07:00
Teknium
ed81cfe3de
fix(cron): bound the desktop run-history query to one job (#41088)
The cron run-history endpoint (GET /api/cron/jobs/{id}/runs, added in
#40684) reused list_sessions_rich's order_by_last_active path with a
leading-wildcard id_query. That routes through the recursive
compression-chain CTE, which seeds from EVERY source='cron' row in the DB
and runs per-row preview/last_active subqueries before filtering to one
job and applying LIMIT. Work scaled with the total cron history, so a
large pile made the run-history load time out before eventually
populating.

Cron runs are flat, never-compressed sessions with ids of the form
cron_{job_id}_{ts}, so the chain machinery is pure overhead and the
job binding is a true prefix, not a substring.

- New SessionDB.list_cron_job_runs(): bounded [prefix, hi) id-range scan
  on source='cron', ordered by started_at DESC, with the same
  preview/last_active enrichment. No CTE, no leading-wildcard LIKE.
- Add idx_sessions_source(source, id) so the range is an index scan;
  bump SCHEMA_VERSION 14 -> 15 (index reconciles onto existing DBs via
  CREATE INDEX IF NOT EXISTS on startup).
- Point the endpoint at the new method.

Measured on a real SessionDB with 30k cron rows: 5ms vs 85ms for the old
path (16x), and the new path stays flat as the pile grows while the old
one scaled with it. Verified the query plan uses idx_sessions_source_id
(range scan, no full table scan), runs are correctly scoped (substring
collisions like cron_xalpha_ excluded), newest-first, and paged.
2026-06-07 02:41:01 -07:00
Teknium
5a3092b601
fix(desktop): scope in-session /model switch per-session, stop process-env leak (#41120)
* fix(desktop): scope in-session /model switch per-session, stop process-env leak

The desktop/dashboard tui_gateway backend hosts every same-profile session
in ONE process. An in-session /model switch wrote process-global env vars
(HERMES_MODEL / HERMES_INFERENCE_MODEL / HERMES_TUI_PROVIDER /
HERMES_INFERENCE_PROVIDER), which _resolve_startup_runtime() reads when
building a fresh agent. So switching the model in one session leaked into
every other live session's next agent rebuild (/new, resume) — changing the
model in session B silently changed it in session A.

Fix: record the switch as a per-session model_override on the session dict
instead of mutating os.environ. _make_agent honors that override on rebuild
(carrying the concrete base_url/api_key/api_mode the switch resolved), and
falls back to global config when absent. Global persistence on the --global
flag is unchanged.

Also a cleaner fix for #16857 (/new after switching to a custom-provider
model): the override carries the resolved credentials, so the rebuild keeps
the right endpoint without relying on the leaky env vars.

Reported via Twitter (@Da7_Tech): MiniMax M3 in one session + GLM 5.1 in
another interfere when switching between them.

* test(tui_gateway): align /model switch tests with per-session override contract

The three test_config_set_model_syncs_* tests asserted the old leaky contract
(switch writes HERMES_MODEL / HERMES_TUI_PROVIDER / HERMES_INFERENCE_PROVIDER to
process env). That env-sync IS the cross-session contamination bug this PR
removes. Updated to assert the new contract: shared process env untouched, the
switch recorded as a per-session model_override carrying provider/model/base_url/
api_key/api_mode. #16857's intent (a custom-provider switch survives /new) is
still covered — now via the override _make_agent honors on rebuild.
2026-06-07 02:33:28 -07:00
bmoore210
330ca4585b fix: harden gateway startup and turn persistence
Persist the inbound user turn before provider/tool execution so a crash
before run_conversation() (e.g. provider/httpx client init failure) keeps
the inbound message in the transcript. Repair stale/missing SSL_CERT_FILE
state on gateway startup, and avoid duplicate gateway fallback writes.
2026-06-07 02:15:23 -07:00
helix4u
591e6fb8f4 fix(computer_use): honor custom vision routing 2026-06-07 02:09:20 -07:00
kshitijk4poor
ffe665277c fix(aux): honor model.default_headers on auxiliary client too (#40033)
The salvaged main-agent fix (sanidhyasin) applies model.default_headers
to the primary OpenAI client, but the auxiliary client (title generation,
context compression, vision routing) builds its own clients and did not
read the override. For a `provider: custom` endpoint behind a gateway/WAF
that rejects the OpenAI SDK's identifying headers, the main turn would
succeed while auxiliary calls to the same endpoint still failed with the
opaque 502/4xx from #40033.

Add agent.auxiliary_client._apply_user_default_headers() (user values win
over provider/SDK defaults; no-op when unconfigured) and apply it at every
OpenAI-wire client construction site:
- _try_custom_endpoint() — config-level `model.provider: custom`
- the named custom-provider branch (custom_providers/providers entries),
  including the anthropic-SDK-missing OpenAI-wire fallback
- the api-key-provider, async-conversion, and main resolve_provider_client
  fallback branches

To prevent the two clients ever drifting on precedence/value handling,
AIAgent._apply_user_default_headers (run_agent.py) now delegates the config
read + merge to this shared helper (run_agent already imports from
auxiliary_client). Native Anthropic/Bedrock branches are untouched (they
don't use the OpenAI wire).

8 new tests (helper semantics + config-level custom + named custom);
full aux + attribution header suites green (295).
2026-06-07 02:02:40 -07:00
Sanidhya Singh
a216ff839b fix(agent): honor model.default_headers for custom OpenAI-compatible providers (#40033)
Custom OpenAI-compatible endpoints sitting behind a gateway/WAF can reject
the OpenAI Python SDK's default identifying headers (User-Agent: OpenAI/Python,
X-Stainless-*) and return an opaque 502/4xx even though the same request body
succeeds under curl. There was no supported way to override those headers.

Add a model.default_headers config key whose values are merged onto the
OpenAI client's default_headers, taking precedence over provider- and
SDK-supplied defaults. Applied at client construction and on every credential
swap / client rebuild so the override survives reconnects. No-op for native
Anthropic / Bedrock modes and when unconfigured.
2026-06-07 02:02:40 -07:00
Teknium
3c8f1dee8d fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (#36718)
compress_context() sets last_prompt_tokens=-1 right after compression to
mark "no real API usage yet". The preflight display-seed used
`_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1
(truthy), so any positive rough estimate clobbered the sentinel with a
schema-inflated count — re-triggering compression on the next turn.
Treat any negative value as "no real data yet" and skip the seed.

Salvaged from #40246 as the minimal root-cause fix. The original also
added an `_awaiting_suppression_count` bounded-window state machine to
should_compress() across 3 files; left out here to keep blast radius
small — the sentinel guard alone fixes the re-fire. The suppression
window can be added separately if the usage=None-stub edge case warrants it.

Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
2026-06-07 01:56:51 -07:00
Teknium
e18f14d928
test(kimi): align stale parity/profile tests with thinking-xor-effort contract (#41095)
* test(kimi): align stale parity/profile tests with thinking-xor-effort contract

ce4e74b3 (fix(kimi): send thinking xor reasoning_effort, never both)
changed the Kimi profile to emit at most one of extra_body.thinking or a
top-level reasoning_effort, and added tests/plugins/model_providers/test_kimi_profile.py
to pin it — but left two older test files still asserting the removed
'send both' behavior, turning main red for every PR branched after it.

Update the stale assertions to the xor contract:
- explicit recognized effort (low|medium|high) -> reasoning_effort only,
  no thinking
- enabled w/o effort, or no reasoning_config -> thinking:enabled only,
  no reasoning_effort
- disabled -> thinking:disabled only

No production change.

* test(kimi): cover remaining xor stale assertions (profile_wiring, run_agent)

Two more test files asserted the pre-ce4e74b3 'thinking + reasoning_effort
together' behavior — landed in a different CI shard so they surfaced only
after the first batch went green:
- tests/providers/test_profile_wiring.py::TestKimiProfileParity (2)
- tests/run_agent/test_run_agent.py::TestBuildApiKwargs (3: kimi-coding,
  moonshot, moonshot-cn)

Same realignment to the xor contract: default/enabled-without-effort emits
thinking:enabled and no reasoning_effort; explicit effort emits
reasoning_effort only. Verified by running the full provider +
TestBuildApiKwargs Kimi surface (202 passed) plus a codebase-wide grep for
any remaining paired thinking+effort assertion (none).
2026-06-07 01:52:49 -07:00
Teknium
0524c9b34e
feat(compression): raise compaction trigger to 85% for gpt-5.5 on Codex OAuth (#40957)
The ChatGPT Codex OAuth backend hard-caps gpt-5.5 at a 272K context window
(verified live: a ~330K-token request to chatgpt.com/backend-api/codex/responses
is rejected with context_length_exceeded while ~250K succeeds; the same slug
exposes 1.05M on the direct OpenAI API / OpenRouter and 400K on Copilot). At the
default 50% trigger, auto-compaction fires at ~136K — half the usable window.

Raise the trigger to 85% (~231K) on this exact route only, gated by a new
compression.codex_gpt55_autoraise config flag (default true). When it fires,
emit a one-time notice (CLI inline print + gateway status_callback replay) with
the exact opt-back-out command. gpt-5.5 on any other provider keeps the user's
global threshold.

- _is_codex_gpt55() matches the 5.5 family only on provider=openai-codex
- _compression_threshold_for_model() now provider-aware + opt-out param
- config key + _config_version bump (27->28) for backfill
- docs + tests (40 cases in test_arcee_trinity_overrides.py)
2026-06-07 01:40:50 -07:00