Commit graph

462 commits

Author SHA1 Message Date
liuhao1024
1b962f001e fix(models): pass model.base_url to fetch_models in /model picker
The /model interactive picker resolved a base_url from user credentials
but never passed it to ProviderProfile.fetch_models(), causing the
picker to always query the provider's hardcoded default endpoint
instead of the user's custom URL (e.g. a company litellm proxy).

- providers/base.py: add optional base_url parameter to fetch_models()
- hermes_cli/models.py: pass resolved base_url to fetch_models()
- Update all subclass overrides for signature compatibility
- Add 6 regression tests covering override, fallback, and integration
2026-06-16 13:09:40 -07:00
Jaaneek
f4ef70f6fc docs(xai): update default model references to grok-build-0.1
Reflect the default-model change in the xAI Grok OAuth guide, the web
search docs (EN + zh-Hans), and the web provider docstring. grok-4.3 is
kept in the model tables as the previous default; the Nous/OpenRouter
aggregator catalog still lists grok-4.3 and is left unchanged.
2026-06-16 11:50:17 -07:00
Jaaneek
bbc842d31e feat(xai): default to grok-build-0.1
Switch the default model for the xAI/Grok provider and the xAI web
search backend from grok-4.3 to grok-build-0.1. grok-build-0.1 is
already recognized by the model metadata, so no new model definition
is required; grok-4.3 remains selectable.
2026-06-16 11:50:17 -07:00
Teknium
c2c55c4443 fix(memory): strip skill scaffolding for all providers, not just openviking
Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution
affected every auto-syncing memory provider — mem0, hindsight, retaindb,
byterover, honcho, supermemory all store/embed the raw user turn, so a
/skill invocation poisoned their stores with the full skill body, not just
openviking.

- Lift the contributor's parser into agent/skill_commands.py as the canonical
  extract_user_instruction_from_skill_message(), co-located with the message
  builders so the markers can't drift.
- Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} —
  fixes the whole provider fan-out, bare /skill turns are skipped entirely.
- OpenViking's _derive_openviking_user_text() now delegates to the shared
  helper as defense-in-depth (no duplicated marker literals).
- Marker-drift regression now asserts against the canonical skill_commands
  constants; add manager-level coverage proving every provider gets clean text.
2026-06-16 10:37:37 -07:00
Hao Zhe
e3adbb5ae9 fix(openviking): sanitize skill memory input 2026-06-16 10:37:37 -07:00
underthestars-zhy
5b3fa26366 fix(photon): unify project identifiers and update documentation for Spectrum provisioning
Co-Authored-By: Marvin <marvin@photon.codes>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 05:25:56 -07:00
Teknium
5a0e0d35b9 fix(mattermost): preserve thread-local delivery hygiene
Salvage the valid thread-routing pieces from #41640:
- route Mattermost progress/status sends through metadata thread IDs
- treat top-level Mattermost channel posts as thread roots for progress
- preserve thread metadata through media/file sends
- allow flat fallback only for final notify-worthy replies on confirmed broken roots

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
2026-06-15 15:06:23 -07:00
kshitij
d2b34e89b0
Merge pull request #44431 from erosika/feat/honcho-identity-tree
feat(honcho): gateway-gated identity tree + canonicalize on pinUserPeer
2026-06-16 03:35:24 +05:30
Erosika
c7513df4f9 docs(honcho): clarify pinUserPeer pins only non-agent users
'everyone collapses to your peer' read as a promise about all traffic.
pinUserPeer pins the user-side peer and is checked before userPeerAliases
(session.py:335), so a pin overrides every alias — including agent peers.
For a multi-agent operator that silently pools distinct agents onto one
peer, the opposite of intent.

Scopes the wording to 'every non-agent gateway user', notes the pin
overrides aliases, and points agent-mesh operators at pinUserPeer:false +
userPeerAliases instead. Same correction in the wizard menu/echo text,
the plugin README, and the website Honcho page.
2026-06-15 21:34:09 +00:00
kshitij
cffd6e3c8d
Merge pull request #46078 from xxxigm/fix/discord-slash-command-100-cap
fix(discord): cap slash commands at Discord's 100-command limit
2026-06-16 02:05:31 +05:30
Austin Pickett
5f6be7f31b
fix(teams): package Microsoft Teams SDK as an installable extra (salvage #43945) (#46764)
* fix(teams): package Microsoft Teams SDK as an installable extra

The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.

Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.

Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>

* chore: map rio-jeong contributor email for attribution (#43945)

* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)

The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:

- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
  a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
  rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
  enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
  status' probes never trigger a pip install.

The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.

Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.

---------

Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-15 14:35:15 -04:00
Teknium
49e743985a fix: route minimax m3 reasoning controls through profile
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.

Co-authored-by: goku94123 <gooku94123@gmail.com>
2026-06-15 07:08:43 -07:00
墨綠BG
40699c3292 🐛 fix(disk-cleanup): avoid brittle sweep review issues 2026-06-15 05:25:27 -07:00
墨綠BG
c1a70a5439 🐛 fix(disk-cleanup): prune protected cleanup walks 2026-06-15 05:25:27 -07:00
Nicolò Boschi
a376ca0081 feat(hindsight): make observation scopes configurable on retain
Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES
env var) so retained memories can opt into per_tag / all_combinations /
custom scoping instead of Hindsight's default combined pass.

Threaded through _build_retain_kwargs so all three retain paths honor it:
auto-retain and flush-on-switch already use aretain_batch; the tool retain
path is switched from aretain to aretain_batch (functionally equivalent,
aretain just wraps a single-item batch) since aretain doesn't accept the
observation_scopes parameter.
2026-06-15 04:59:17 -07:00
Teknium
f3fe99863d
revert(web): remove keyless Parallel search fallback (#46350)
Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced.

Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.
2026-06-14 16:47:57 -07:00
mr-r0b0t
bff78a34dc feat(zai): add GLM-5.2 with verified 1M context window
GLM-5.2 ships with a 1M (1,048,576) token context window. Without this
entry, Hermes falls through to the generic 'glm' key (202,752 tokens),
under-reporting the context bar and prematurely compressing conversations.

The 1M limit was verified empirically via needle-in-a-haystack retrieval
at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors,
zero truncation, correct retrieval at every tested size (25K through 789K).

Changes:
- agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback
- hermes_cli/models.py: add glm-5.2 to zai curated models
- hermes_cli/setup.py: add glm-5.2 to setup wizard zai list
- hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes
- plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models
- tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests
2026-06-14 13:50:36 -07:00
Teknium
efbe1635dd
fix(gateway): include replied-to media attachments (#46107) 2026-06-14 04:51:50 -07:00
xxxigm
5e851bc6bc fix(discord): cap slash commands at Discord's 100-command limit
Discord enforces a hard cap of 100 global application commands per app.
The adapter registers ~27 native commands plus every gateway-available
entry in COMMAND_REGISTRY plus all plugin commands plus the consolidated
/skill group. On a loaded install (many plugins/quick commands) the
desired set exceeds 100, so tree.sync() / _safe_sync_slash_commands()
hits error 30032 ("Maximum number of application commands reached") and
Discord rejects the ENTIRE batch — silently breaking every slash command,
not just the overflow.

Cap registration at the 100-command limit: native commands (registered
first, highest priority) and the /skill group are always kept; lower-
priority auto-registered COMMAND_REGISTRY and plugin commands are added
only until the cap is reached, with a single concise warning telling the
user how to surface the rest. Since both sync paths read from
tree.get_commands(), bounding the tree fixes the root cause for both.
2026-06-14 17:01:28 +07:00
Teknium
2681c5a12d
fix(photon): correct gateway start command (#45566) 2026-06-13 05:14:59 -07:00
Teknium
0fd34e8c5a
fix(teams): cache document/video/audio attachments and classify as DOCUMENT (#44778)
The Teams adapter only handled image/* attachments — documents (the
application/vnd.microsoft.teams.file.download.info consent-free download
payload and any direct-URL non-image attachment) never reached media_urls
at all, so run.py's document-context injection had nothing to surface.
Completes the class-wide sweep from PR #44695 (Signal/Email/SimpleX).

- download.info attachments: fetch the pre-authed SharePoint downloadUrl
  (SSRF-guarded, same guard chain as base.py cache_*_from_url) and route
  through cache_media_bytes
- direct-URL non-image attachments: same fetch + classify path
- skip Teams' text/html message-body mirror and adaptive-card attachments
- DOCUMENT > PHOTO > VIDEO > AUDIO precedence for mixed attachments,
  matching the Email precedence rationale from #44695
2026-06-12 02:05:41 -07:00
Teknium
74180ebf0b fix(gateway): classify SimpleX non-image/non-audio files as DOCUMENT
SimpleX tagged unknown files application/octet-stream in media_types
but classification only handled audio/image, leaving msg_type TEXT —
run.py never injected the document context. Same bug class as #12845.
2026-06-12 01:07:50 -07:00
underthestars-zhy
b4e95a2efe fix(photon): add clarifying comments for Windows-safe os.kill usage 2026-06-12 01:07:38 -07:00
underthestars-zhy
23305cfeab fix(photon): normalize DM chat keys in last-inbound reaction tracker
Inbound events key the tracker by the DM chat GUID (any;-;+1555...),
but home-channel react calls address the same space by bare E.164 —
normalize both to the phone so add_reaction's last-inbound default
resolves regardless of which form the caller uses (mirrors the
sidecar's phoneTargetFromSpaceId).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 01:07:38 -07:00
underthestars-zhy
156f4fba92 feat(photon): add agent-facing emoji reaction support
Add `action='react'` to `send_message` tool and expose `add_reaction`/
`remove_reaction` on the Photon adapter.

- Track latest inbound message id per chat (`_last_inbound_by_chat`,
  bounded to 200 entries) so the agent can react without threading
  message ids through tool calls
- New `add_reaction`/`remove_reaction` public methods on PhotonAdapter;
  unlike the lifecycle tapbacks, these are not gated by PHOTON_REACTIONS
- `send_message` gains `action='react'` with `emoji` and optional
  `message_id` params; resolves target via existing channel-directory
  and home-channel logic; requires a live gateway adapter
2026-06-12 01:07:38 -07:00
underthestars-zhy
a23c0b378c fix(photon): use per-call httpx client in _sidecar_call
Prevents "Future attached to a different loop" errors when
_sidecar_call is invoked from a worker thread via _run_async in
send_message_tool. The persistent _http_client remains in use for
the inbound streaming loop, which always runs on the gateway's loop.
2026-06-12 01:07:38 -07:00
underthestars-zhy
9bfff6e16c chore(photon): bump spectrum-ts to 3.1.0 2026-06-12 01:07:38 -07:00
underthestars-zhy
a652131c42 fix(photon): stop gateway restarts from orphaning the sidecar on its port
A hard gateway exit (crash, SIGKILL, supervisor restart) left the
detached Node sidecar running with a token the next gateway run doesn't
know, so it could never be told to /shutdown. Every replacement spawn
then died on EADDRINUSE, failing each 30→300s reconnect attempt while
the orphan kept consuming the inbound gRPC stream.

Two layers:
- Lifetime binding: the adapter now holds the sidecar's stdin as a
  pipe, and the sidecar (PHOTON_SIDECAR_WATCH_STDIN=1) shuts down on
  stdin EOF — fired by the OS on any parent death, including SIGKILL.
- Startup reaping: before spawning, the adapter probes the port and
  terminates a stale listener, but only after verifying its command
  line is a Photon sidecar; a foreign listener raises a clear error
  instead of being signalled.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 01:07:38 -07:00
underthestars-zhy
573c4e6511 feat(photon): upgrade to spectrum-ts 3.0.0 (pinned) with markdown + reactions
Pin spectrum-ts to exactly 3.0.0 (was ^1.18.0 plus an `npm install
spectrum-ts@latest` on every setup) so breaking SDK majors can't take
down fresh installs silently; `hermes photon setup` now runs `npm ci`.
Upgrade procedure documented in the README.

Migrate resolveSpace to the v3 namespace API: `im.space.create(phone)`
for DMs and `im.space.get(id)` for everything else — group spaces are
now rehydratable from their persisted id after a sidecar restart, which
v1 could not do.

Markdown: replies go out via the v3 `markdown()` builder (iMessage
renders natively; other Spectrum platforms degrade to plain text).
`PHOTON_MARKDOWN=false` reverts to the stripped plain-text path.

Reactions, behind PHOTON_REACTIONS (default off): lifecycle tapbacks
(👀 while processing, 👍/👎 on completion) via new sidecar /react and
/unreact endpoints with per-target reaction-handle tracking, and user
tapbacks on bot-sent messages routed to the agent as synthetic
`reaction:added:<emoji>` events.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 01:07:38 -07:00
underthestars-zhy
0a963d8c9a feat(photon): add telemetry toggle via hermes photon telemetry 2026-06-12 01:07:38 -07:00
Austin Pickett
c3464ecf45
fix(discord): recover from runtime gateway task exits (#44383)
* fix(discord): recover from runtime gateway task exits

Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit
recovery; the original PR was 1081 commits behind with 28 unrelated
commits.

A post-ready discord.py WebSocket crash left the gateway split-brained:
producers stayed active while Discord stopped responding. After this fix
the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error()
so the existing GatewayRunner reconnect watcher replaces the dead adapter.

Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy
errors, invalid tokens) surface fast instead of burning the full ready
timeout. Because connect() no longer waits via asyncio.wait_for on that
path, test_connect_releases_token_lock_on_timeout is updated to trigger
the timeout through the new helper (same lock-release contract).

3 tests pass (2 new runtime-failure tests + the updated timeout test);
test_discord_connect.py and test_discord_slash_commands.py green.

Co-Authored-By: ameobius <ameobius@local.host>

* fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test

connect() no longer uses asyncio.wait_for for the ready handshake, so
test_connect_timeout_cancels_bot_task was hanging for 30s in CI.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: ameobius <ameobius@local.host>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-11 15:39:01 -04:00
teknium1
08b1c44a53 fix(discord): extend bot-task cancellation to connect()'s generic exception branch
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).

Also adds deaneeth to AUTHOR_MAP.
2026-06-11 12:09:18 -07:00
Dineth Hettiarachchi
020ef76cf1 fix(discord): cancel _bot_task on connect() timeout to prevent zombie client
When connect() times out waiting for the Discord ready event, the background
asyncio.Task running client.start() was not cancelled. discord.py's internal
reconnect loop can ignore client.close() while a WebSocket handshake is in
flight, so the orphaned task eventually completes and fires on_ready.

A later successful reconnect then leaves two live Discord clients in the same
process — each with its own on_message handler and MessageDeduplicator instance
— so every @mention creates two threads because the per-adapter dedup caches
cannot catch cross-client duplicates.

Fix: explicitly cancel and await _bot_task in two places:
1. The asyncio.TimeoutError handler inside connect() — catches the case where
   the adapter's own inner wait_for fires before the gateway's outer timeout.
2. The start of disconnect() — the load-bearing path, always reached via
   _dispose_unused_adapter regardless of which timeout fired first.

Root cause confirmed from production logs: a Jun 8 network outage caused three
consecutive connect() timeouts. The first attempt's bot_task completed its
handshake 4 minutes later ("Connected as") with no preceding watcher line,
then the watcher's real reconnect also connected 90 seconds after that. The two
clients ran continuously for 41+ hours, confirmed by the same user message
appearing as two separate inbound events in two different thread IDs 357ms apart.

Regression tests added to tests/gateway/test_discord_connect.py:
- test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a
  NeverReadyBot and asserts _bot_task is None afterward
- test_disconnect_cancels_running_bot_task: injects a live zombie task, calls
  disconnect(), and asserts the task is cancelled and the attribute cleared
2026-06-11 12:09:18 -07:00
Erosika
1544813bfe chore(honcho): replace example Telegram UID with placeholder 2026-06-11 15:06:07 -04:00
Erosika
2708c33c75 docs(honcho): anonymize example peer name to alice 2026-06-11 15:04:01 -04:00
Teknium
0a5762c78d fix(web): genericize free-MCP client identity per telemetry policy
Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on
the keyless Parallel Search MCP path with a neutral 'mcp-web-client'
identity. Project policy forbids third-party usage attribution without an
explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo,
so a generic one satisfies the spec without attributing traffic.

Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against
current main (parallel-web 0.6.0).
2026-06-10 19:54:38 -07:00
Matt Harris
e0e2571711 feat(web): Parallel-backed web search & extract — free Search MCP when keyless, v1 REST when keyed
Make Parallel the web search/extract backend with a zero-setup free tier:

- Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via
  Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel
  becomes the default backend when no other web credentials are configured
  (ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP
  JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing
  web_search/web_extract tools are the only tools registered.
- Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints
  (client.search / client.extract with advanced_settings.full_content) — no beta.
  Bumps parallel-web 0.4.2 -> 0.6.0.
- Attribution: on the free path only, results carry provider/attribution and the
  CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is
  unbranded.
- Selection/registration: web tools register unconditionally (free MCP backstop)
  while check_web_api_key remains a real usability probe; explicit per-capability
  backends are honored (so misconfig surfaces) rather than masked by the fallback.

Tested: live web_search/web_extract against search.parallel.ai in keyless and
keyed modes; unit suites for the MCP client, backend selection, and display
labeling; full agent run shows the "Parallel search" label on the free path.
2026-06-10 19:54:38 -07:00
Erosika
99feb03607 docs(honcho): demote pinPeerName to deprecated alias; document gateway identity tree
Drop pinPeerName from the key table (now a deprecated-alias note), and replace
the single/multi/hybrid 'deployment shapes' section with the gateway-gated
intent tree the wizard actually presents, including the [e] raw-edit hatch and
the un-pin pooling steer.
2026-06-10 16:15:17 -04:00
Erosika
d7dfeed6dc feat(honcho-setup): replace deployment-shape prompt with gateway-gated identity tree
The single/multi/hybrid 'deployment shape' was a misnomer: these keys only
affect the gateway (the one entrypoint supplying a runtime user ID), and the
three preset names stamped a lossy taxonomy onto three orthogonal knobs while
hiding which keys got written.

Replace it with an intent-led tree gated on gateway detection:
- _gateway_platforms() lazily inspects the gateway config (best-effort, no
  hard dependency); the step auto-skips when no platform is connected.
- 'who talks to this?' → just me / me+others (pooled?) / only others, deriving
  pinUserPeer + userPeerAliases + runtimePeerPrefix and echoing the result.
- [e] drops to a raw-knob editor for power users.
- The single→multi orphan guard survives as a pooling steer.
2026-06-10 16:14:24 -04:00
Erosika
bb5cb32838 refactor(honcho): canonicalize identity-mapping on pinUserPeer, migrate legacy key
The setup wizard wrote the legacy pinPeerName even though pinUserPeer is
the canonical key that outranks it in the resolver — so it had to scrub
the canonical key afterward to stop it winning. Write pinUserPeer directly
and migrate any legacy pinPeerName onto it on touch (setup load + clone),
which removes the precedence-fighting entirely.

Resolver still reads pinPeerName as a back-compat alias; that's deferred.
2026-06-10 16:07:53 -04:00
Siddharth Balyan
183d86b3e0
fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models (#43436)
* fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models

Reasoning-mandatory Anthropic models (Claude 4.6+/fable/mythos-class) over
OpenRouter ignore reasoning.effort and use adaptive thinking. #42991 correctly
stopped Hermes from sending a reasoning field to them (it 400s), but put nothing
in its place — leaving agent.reasoning_effort a silent no-op on the OpenRouter
path: the model always ran at its adaptive default (high) regardless of config.

OpenRouter honors the requested effort on the top-level verbosity field instead
(maps to Anthropic output_config.effort). Route the existing
reasoning_config[effort] there for these models while still never emitting a
reasoning field, preserving the #42991 fix. No new config arg — the value the
user already sets via agent.reasoning_effort now flows to verbosity.

- low/medium/high/xhigh/max pass through verbatim (OpenRouter accepts the
  extended scale for Claude; verified live HTTP 200 + monotonic token spend).
- effort unset/none/disabled omits verbosity so the model keeps its default.
- native Anthropic transport already correct; unchanged.

Fixes #43432

* test(openrouter): cover real effort range (add minimal, frame max as passthrough)

Adversarial review noted the verbosity tests looped over 'max' — a value
parse_reasoning_effort can never produce — while omitting 'minimal', which it
can. Align the routing test with the real config range
(VALID_REASONING_EFFORTS = minimal/low/medium/high/xhigh) and keep a separate
value-agnostic passthrough test that documents why xhigh/max must survive
verbatim (TypedDict, no runtime literal validation; OpenRouter accepts the
extended scale for Claude).

* docs: explain reasoning_effort -> verbosity routing for adaptive Anthropic models

Document that reasoning_effort transparently maps to OpenRouter's verbosity
field for adaptive-thinking Anthropic models (Claude 4.6+/Fable/Mythos), where
reasoning.effort is ignored. Note xhigh is the configurable ceiling (max is wire-
only). Add verbosity as a top-level-kwarg example in the provider-plugin guide.
2026-06-10 15:03:01 +05:30
Teknium
243cada157 fix(model): cover typed gateway /model path + async-safe pricing lookups
Follow-ups on top of #26016's expensive-model guard:

- gateway/slash_commands.py: typed '/model <name>' now routes through the
  expensive-model confirmation gate (slash-confirm buttons / text fallback)
  instead of bypassing the guard the pickers enforce. Cancel leaves the
  session override and --global config untouched.
- telegram/discord/web_server: run expensive_model_warning() via
  asyncio.to_thread — it can hit models.dev or a /models endpoint on a
  cache miss, which would otherwise block the event loop.
- telegram: picker callback no longer toasts 'Model switched!' when the
  switch callback raised (both mm: and mc: paths).
- tests: new tests/gateway/test_model_command_expensive_confirm.py pins
  the typed-path gate (prompt, confirm-once, cancel, cheap-model no-op).
2026-06-10 00:24:06 -07:00
Robin Fernandes
af978ecb17 fix(model): require confirmation for expensive model selections
Rebased onto current main and re-ported across the restructured
surfaces: model flows now thread confirm_provider/base_url/api_key
through hermes_cli/model_setup_flows.py, the Discord picker lives in
plugins/platforms/discord/adapter.py, and the web dashboard picker
applies chat-mode switches via config.set so the expensive-model
confirmation can ride the response.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 00:24:06 -07:00
Joel Chan
e5580f43c2 fix(discord): propagate role_authorized flag so DISCORD_ALLOWED_ROLES works end-to-end
DISCORD_ALLOWED_ROLES was checked by the Discord adapter (_is_allowed_user)
but gateway._is_user_authorized only read DISCORD_ALLOWED_USERS, so
role-authorized users were rejected with "Unauthorized user" at the
gateway layer despite passing the adapter gate.

- Add role_authorized: bool = False to SessionSource
- Add role_authorized param to build_source (base.py)
- Compute _role_authorized in on_message when user passes via role not user ID
- Thread _role_authorized through _handle_message -> build_source
- Check source.role_authorized early in _is_user_authorized (run.py)

Fixes #33952
2026-06-10 00:18:11 -07:00
xxxigm
311900842e fix(discord): don't auto-disconnect voice when reply mode is off
The voice inactivity timer (VOICE_TIMEOUT) only counted the bot's OWN audio
playback as activity. Under /voice off (text-only replies, but still in the
channel — leaving is /voice leave) nothing ever reset it, so every 300s the bot
disconnected and spammed "Left voice channel (inactivity timeout)."

The adapter now learns the live voice-reply mode via a getter wired from run.py
and skips the auto-disconnect while mode is off. It also resets the timer when a
user actually speaks to the bot, so an active listener (incl. voice-on
text-only sessions that never play audio) isn't dropped mid-conversation.
2026-06-09 23:24:26 -07:00
kshitijk4poor
4642762289 fix(langfuse): redact base64 data URIs instead of truncating into invalid base64
The Langfuse SDK treats `data:*;base64,...` strings as media and tries to
decode them. `_truncate_text` was slicing those strings mid-payload, producing
invalid base64 and noisy "Error parsing base64 data URI" logs. Observability
only needs the metadata, not raw image/audio bytes, so redact the whole data
URI (type, media_type, length) before it reaches the SDK.

Salvaged the Langfuse fix from #39682 onto current main as a standalone,
single-concern change (the dashboard `dist/**` and plugin-discovery parts of
that PR already landed separately on main).

Co-authored-by: foras910521-lab <foras910521-lab@users.noreply.github.com>
2026-06-10 10:49:36 +05:30
Siddharth Balyan
46fedef07f
fix(openrouter): never send reasoning field for adaptive Anthropic models (#43012)
The previous fix (#42991) only omitted reasoning when it was being disabled.
But reasoning-mandatory Anthropic models (Claude 4.6+, fable) 400 with
thinking.type.disabled on EVERY tool-continuation turn even when reasoning is
enabled: chat_completions never replays signed thinking blocks, so the prior
assistant tool_call has no thinking, and OpenRouter resolves "reasoning
requested but history has none" by emitting thinking.type.disabled — which
these models reject. Result: first turn works, every turn after the first tool
call dies (HTTP 400, non-retryable).

OpenRouter ignores reasoning.effort for adaptive Anthropic models anyway (the
model self-decides), so the reasoning field is pointless for them on every turn
and harmful on tool-replay turns. Omit it entirely → adaptive default.

- openrouter profile: drop the reasoning field for reasoning-mandatory Anthropic
  models regardless of enabled/disabled; legacy Anthropic + non-Anthropic models
  unchanged.
- tests: assert omission across enabled/disabled/effort variants; parity tests
  switched to a non-Anthropic reasoning model (deepseek) since Anthropic 4.6+ no
  longer carries a reasoning field.

Verified live end-to-end: a tool-replay turn on anthropic/claude-fable-5 with
reasoning enabled now builds extra_body=None and returns HTTP 200 (was 400).
2026-06-10 00:18:23 +05:30
Siddharth Balyan
1febb08240
fix(anthropic): default new Claude models to the modern thinking contract (#42991)
New Anthropic models without a recognized version substring (claude-fable-5
and future named/numbered releases) were classified as legacy and routed down
the manual-thinking path, which made OpenRouter emit thinking.type.disabled —
a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400.

Invert the brittle version-substring allowlists to default-to-modern (mirroring
_get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/
no-sampling contract, with an explicit legacy list for older families. Non-Claude
Anthropic-Messages models (minimax, qwen3, …) keep the manual path.

- anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort /
  _forbids_sampling_params now default unknown Claude models to modern; legacy
  families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS.
- openrouter profile: omit reasoning entirely (→ adaptive default) instead of
  forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy
  Anthropic + all non-Anthropic models still pass the disable form through.
- model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out).

Tests assert the invariant ("unknown Claude model -> modern contract; legacy
stays manual; non-Claude unaffected"), not specific model names.
2026-06-09 23:37:23 +05:30
Philip D'Souza
92dfd70d6a
fix(photon): production hardening for the gRPC-native iMessage channel (#42732)
* fix(photon): override transitive CVEs in the sidecar deps

`npm audit` flagged 7 high-severity transitive CVEs (protobufjs code injection
GHSA-66ff-xgx4-vchm + outdated @opentelemetry OTLP exporters) pulled in via
spectrum-ts -> @photon-ai/otel. npm's suggested fix downgrades spectrum-ts to a
version that targets the decommissioned spectrum host, so instead pin patched
versions via `overrides` (protobufjs 8.6.1, @opentelemetry/* 0.218.0) without
touching spectrum-ts. `npm audit` -> 0; spectrum-ts + provider still import.

* fix(photon): harden the sidecar bridge + bound the dedup cache

- constant-time sidecar control-token comparison (was `!==`, timing-attackable).
- cap the control-channel request body (2 MiB) so a compromised local peer can't
  OOM the sidecar.
- wrap the inbound gRPC stream consumer in a re-subscribe loop with capped
  exponential backoff + jitter — if the async iterator throws/ends it would
  otherwise stop inbound forever (the adapter dedupes any replay).
- add an unhandledRejection handler so a stray rejection logs instead of killing
  the process.
- dedup cache (adapter) was a true bounded LRU only for expired entries; a burst
  of unique ids within the window grew it without limit. Evict oldest at the cap.

* chore: add AUTHOR_MAP entry for PhilipAD

---------

Co-authored-by: PhilipAD <philipadsouza@gmail.com>
2026-06-09 11:12:58 -04:00
kshitij
85852b71d8
fix(nemo-relay): preserve downstream errors in adaptive execution (#42691)
Based on #42658 by @mnajafian-nv.

Preserves the real downstream provider/tool exception when NeMo Relay's
managed adaptive execution wraps a failing callback as an internal runtime
error. Without this, the original exception (and its retry-classification
signal, e.g. status_code) is lost behind Relay's wrapper.

Salvage changes on top of the original PR:

- Tolerant Relay-wrapper match: _is_relay_wrapped_callback_error now uses
  str.startswith on the "internal error: <cls>: <msg>" prefix instead of
  exact equality, so a future Relay version appending a traceback/suffix
  doesn't silently defeat the unwrap. On a total format change it returns
  False and falls back to the pre-fix behavior (surfacing Relay's error)
  rather than masking it.
- Deduplicated the LLM and tool execute paths into a shared
  _run_managed_with_downstream_preservation helper, removing ~20 lines of
  copy-pasted nonlocal/try-except scaffolding that could drift out of sync.
- Added a real-middleware regression guard
  (test_nemo_relay_downstream_unwrap_matches_real_middleware_wrapper_shape)
  that drives hermes_cli.middleware._run_execution_chain and asserts the
  plugin's _original_downstream_error unwraps the actual private
  _DownstreamExecutionError wrapper. The original synthetic tests modeled the
  wrapper with a local class, so a rename or shape change in core middleware
  would not have been caught; this test fails loudly if that contract drifts.

Co-authored-by: mnajafian-nv <mnajafian@nvidia.com>
2026-06-09 02:31:10 -07:00