hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-17 09:41:58 +00:00

Author	SHA1	Message	Date
liuhao1024	1b962f001e	fix(models): pass model.base_url to fetch_models in /model picker The /model interactive picker resolved a base_url from user credentials but never passed it to ProviderProfile.fetch_models(), causing the picker to always query the provider's hardcoded default endpoint instead of the user's custom URL (e.g. a company litellm proxy). - providers/base.py: add optional base_url parameter to fetch_models() - hermes_cli/models.py: pass resolved base_url to fetch_models() - Update all subclass overrides for signature compatibility - Add 6 regression tests covering override, fallback, and integration	2026-06-16 13:09:40 -07:00
Jaaneek	f4ef70f6fc	docs(xai): update default model references to grok-build-0.1 Reflect the default-model change in the xAI Grok OAuth guide, the web search docs (EN + zh-Hans), and the web provider docstring. grok-4.3 is kept in the model tables as the previous default; the Nous/OpenRouter aggregator catalog still lists grok-4.3 and is left unchanged.	2026-06-16 11:50:17 -07:00
Jaaneek	bbc842d31e	feat(xai): default to grok-build-0.1 Switch the default model for the xAI/Grok provider and the xAI web search backend from grok-4.3 to grok-build-0.1. grok-build-0.1 is already recognized by the model metadata, so no new model definition is required; grok-4.3 remains selectable.	2026-06-16 11:50:17 -07:00
Teknium	c2c55c4443	fix(memory): strip skill scaffolding for all providers, not just openviking Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution affected every auto-syncing memory provider — mem0, hindsight, retaindb, byterover, honcho, supermemory all store/embed the raw user turn, so a /skill invocation poisoned their stores with the full skill body, not just openviking. - Lift the contributor's parser into agent/skill_commands.py as the canonical extract_user_instruction_from_skill_message(), co-located with the message builders so the markers can't drift. - Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} — fixes the whole provider fan-out, bare /skill turns are skipped entirely. - OpenViking's _derive_openviking_user_text() now delegates to the shared helper as defense-in-depth (no duplicated marker literals). - Marker-drift regression now asserts against the canonical skill_commands constants; add manager-level coverage proving every provider gets clean text.	2026-06-16 10:37:37 -07:00
Hao Zhe	e3adbb5ae9	fix(openviking): sanitize skill memory input	2026-06-16 10:37:37 -07:00
underthestars-zhy	5b3fa26366	fix(photon): unify project identifiers and update documentation for Spectrum provisioning Co-Authored-By: Marvin <marvin@photon.codes> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 05:25:56 -07:00
Teknium	5a0e0d35b9	fix(mattermost): preserve thread-local delivery hygiene Salvage the valid thread-routing pieces from #41640: - route Mattermost progress/status sends through metadata thread IDs - treat top-level Mattermost channel posts as thread roots for progress - preserve thread metadata through media/file sends - allow flat fallback only for final notify-worthy replies on confirmed broken roots Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>	2026-06-15 15:06:23 -07:00
kshitij	d2b34e89b0	Merge pull request #44431 from erosika/feat/honcho-identity-tree feat(honcho): gateway-gated identity tree + canonicalize on pinUserPeer	2026-06-16 03:35:24 +05:30
Erosika	c7513df4f9	docs(honcho): clarify pinUserPeer pins only non-agent users 'everyone collapses to your peer' read as a promise about all traffic. pinUserPeer pins the user-side peer and is checked before userPeerAliases (session.py:335), so a pin overrides every alias — including agent peers. For a multi-agent operator that silently pools distinct agents onto one peer, the opposite of intent. Scopes the wording to 'every non-agent gateway user', notes the pin overrides aliases, and points agent-mesh operators at pinUserPeer:false + userPeerAliases instead. Same correction in the wizard menu/echo text, the plugin README, and the website Honcho page.	2026-06-15 21:34:09 +00:00
kshitij	cffd6e3c8d	Merge pull request #46078 from xxxigm/fix/discord-slash-command-100-cap fix(discord): cap slash commands at Discord's 100-command limit	2026-06-16 02:05:31 +05:30
Austin Pickett	5f6be7f31b	fix(teams): package Microsoft Teams SDK as an installable extra (salvage #43945 ) (#46764 ) * fix(teams): package Microsoft Teams SDK as an installable extra The Teams adapter imports the microsoft-teams-apps SDK, but it was never declared as a dependency, so source/local installs hit ImportError and the adapter silently reported the SDK as unavailable. Add a 'teams' extra (microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'. Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added to [all] (they would break every fresh install on a quarantined release); the teams extra is installed on demand like the other platform backends. Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> * chore: map rio-jeong contributor email for attribution (#43945) * feat(teams): lazy-install the Teams SDK on demand (parity with other channels) The teams extra alone left Teams as the only messaging platform that wouldn't auto-install its SDK — every other channel (telegram, discord, slack, matrix, dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring Teams to parity: - Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp). - Replace the passive 'check_teams_requirements = check_requirements' alias with a real lazy-installer that calls ensure_and_bind('platform.teams', ...), rebinding all Teams SDK globals on success (mirrors check_slack_requirements). - Call check_teams_requirements() at the top of TeamsAdapter.connect() so enabling Teams installs the SDK on demand. - Keep the passive check_requirements() as the registry check_fn so 'gateway status' probes never trigger a pip install. The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'. Tests: rework the alias test into shortcircuit + lazy-install assertions, and update test_connect_fails_without_sdk to simulate an uninstallable SDK. 
--------- Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai> Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-06-15 14:35:15 -04:00
Teknium	49e743985a	fix: route minimax m3 reasoning controls through profile Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes. Co-authored-by: goku94123 <gooku94123@gmail.com>	2026-06-15 07:08:43 -07:00
墨綠BG	40699c3292	🐛 fix(disk-cleanup): avoid brittle sweep review issues	2026-06-15 05:25:27 -07:00
墨綠BG	c1a70a5439	🐛 fix(disk-cleanup): prune protected cleanup walks	2026-06-15 05:25:27 -07:00
Nicolò Boschi	a376ca0081	feat(hindsight): make observation scopes configurable on retain Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES env var) so retained memories can opt into per_tag / all_combinations / custom scoping instead of Hindsight's default combined pass. Threaded through _build_retain_kwargs so all three retain paths honor it: auto-retain and flush-on-switch already use aretain_batch; the tool retain path is switched from aretain to aretain_batch (functionally equivalent, aretain just wraps a single-item batch) since aretain doesn't accept the observation_scopes parameter.	2026-06-15 04:59:17 -07:00
Teknium	f3fe99863d	revert(web): remove keyless Parallel search fallback (#46350) Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced. Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.	2026-06-14 16:47:57 -07:00
mr-r0b0t	bff78a34dc	feat(zai): add GLM-5.2 with verified 1M context window GLM-5.2 ships with a 1M (1,048,576) token context window. Without this entry, Hermes falls through to the generic 'glm' key (202,752 tokens), under-reporting the context bar and prematurely compressing conversations. The 1M limit was verified empirically via needle-in-a-haystack retrieval at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors, zero truncation, correct retrieval at every tested size (25K through 789K). Changes: - agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback - hermes_cli/models.py: add glm-5.2 to zai curated models - hermes_cli/setup.py: add glm-5.2 to setup wizard zai list - hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes - plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models - tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests	2026-06-14 13:50:36 -07:00
Teknium	efbe1635dd	fix(gateway): include replied-to media attachments (#46107)	2026-06-14 04:51:50 -07:00
xxxigm	5e851bc6bc	fix(discord): cap slash commands at Discord's 100-command limit Discord enforces a hard cap of 100 global application commands per app. The adapter registers ~27 native commands plus every gateway-available entry in COMMAND_REGISTRY plus all plugin commands plus the consolidated /skill group. On a loaded install (many plugins/quick commands) the desired set exceeds 100, so tree.sync() / _safe_sync_slash_commands() hits error 30032 ("Maximum number of application commands reached") and Discord rejects the ENTIRE batch — silently breaking every slash command, not just the overflow. Cap registration at the 100-command limit: native commands (registered first, highest priority) and the /skill group are always kept; lower- priority auto-registered COMMAND_REGISTRY and plugin commands are added only until the cap is reached, with a single concise warning telling the user how to surface the rest. Since both sync paths read from tree.get_commands(), bounding the tree fixes the root cause for both.	2026-06-14 17:01:28 +07:00
Teknium	2681c5a12d	fix(photon): correct gateway start command (#45566)	2026-06-13 05:14:59 -07:00
Teknium	0fd34e8c5a	fix(teams): cache document/video/audio attachments and classify as DOCUMENT (#44778 ) The Teams adapter only handled image/* attachments — documents (the application/vnd.microsoft.teams.file.download.info consent-free download payload and any direct-URL non-image attachment) never reached media_urls at all, so run.py's document-context injection had nothing to surface. Completes the class-wide sweep from PR #44695 (Signal/Email/SimpleX). - download.info attachments: fetch the pre-authed SharePoint downloadUrl (SSRF-guarded, same guard chain as base.py cache_*_from_url) and route through cache_media_bytes - direct-URL non-image attachments: same fetch + classify path - skip Teams' text/html message-body mirror and adaptive-card attachments - DOCUMENT > PHOTO > VIDEO > AUDIO precedence for mixed attachments, matching the Email precedence rationale from #44695	2026-06-12 02:05:41 -07:00
Teknium	74180ebf0b	fix(gateway): classify SimpleX non-image/non-audio files as DOCUMENT SimpleX tagged unknown files application/octet-stream in media_types but classification only handled audio/image, leaving msg_type TEXT — run.py never injected the document context. Same bug class as #12845.	2026-06-12 01:07:50 -07:00
underthestars-zhy	b4e95a2efe	fix(photon): add clarifying comments for Windows-safe os.kill usage	2026-06-12 01:07:38 -07:00
underthestars-zhy	23305cfeab	fix(photon): normalize DM chat keys in last-inbound reaction tracker Inbound events key the tracker by the DM chat GUID (any;-;+1555...), but home-channel react calls address the same space by bare E.164 — normalize both to the phone so add_reaction's last-inbound default resolves regardless of which form the caller uses (mirrors the sidecar's phoneTargetFromSpaceId). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 01:07:38 -07:00
underthestars-zhy	156f4fba92	feat(photon): add agent-facing emoji reaction support Add `action='react'` to `send_message` tool and expose `add_reaction`/ `remove_reaction` on the Photon adapter. - Track latest inbound message id per chat (`_last_inbound_by_chat`, bounded to 200 entries) so the agent can react without threading message ids through tool calls - New `add_reaction`/`remove_reaction` public methods on PhotonAdapter; unlike the lifecycle tapbacks, these are not gated by PHOTON_REACTIONS - `send_message` gains `action='react'` with `emoji` and optional `message_id` params; resolves target via existing channel-directory and home-channel logic; requires a live gateway adapter	2026-06-12 01:07:38 -07:00
underthestars-zhy	a23c0b378c	fix(photon): use per-call httpx client in _sidecar_call Prevents "Future attached to a different loop" errors when _sidecar_call is invoked from a worker thread via _run_async in send_message_tool. The persistent _http_client remains in use for the inbound streaming loop, which always runs on the gateway's loop.	2026-06-12 01:07:38 -07:00
underthestars-zhy	9bfff6e16c	chore(photon): bump spectrum-ts to 3.1.0	2026-06-12 01:07:38 -07:00
underthestars-zhy	a652131c42	fix(photon): stop gateway restarts from orphaning the sidecar on its port A hard gateway exit (crash, SIGKILL, supervisor restart) left the detached Node sidecar running with a token the next gateway run doesn't know, so it could never be told to /shutdown. Every replacement spawn then died on EADDRINUSE, failing each 30→300s reconnect attempt while the orphan kept consuming the inbound gRPC stream. Two layers: - Lifetime binding: the adapter now holds the sidecar's stdin as a pipe, and the sidecar (PHOTON_SIDECAR_WATCH_STDIN=1) shuts down on stdin EOF — fired by the OS on any parent death, including SIGKILL. - Startup reaping: before spawning, the adapter probes the port and terminates a stale listener, but only after verifying its command line is a Photon sidecar; a foreign listener raises a clear error instead of being signalled. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 01:07:38 -07:00
underthestars-zhy	573c4e6511	feat(photon): upgrade to spectrum-ts 3.0.0 (pinned) with markdown + reactions Pin spectrum-ts to exactly 3.0.0 (was ^1.18.0 plus an `npm install spectrum-ts@latest` on every setup) so breaking SDK majors can't take down fresh installs silently; `hermes photon setup` now runs `npm ci`. Upgrade procedure documented in the README. Migrate resolveSpace to the v3 namespace API: `im.space.create(phone)` for DMs and `im.space.get(id)` for everything else — group spaces are now rehydratable from their persisted id after a sidecar restart, which v1 could not do. Markdown: replies go out via the v3 `markdown()` builder (iMessage renders natively; other Spectrum platforms degrade to plain text). `PHOTON_MARKDOWN=false` reverts to the stripped plain-text path. Reactions, behind PHOTON_REACTIONS (default off): lifecycle tapbacks (👀 while processing, 👍/👎 on completion) via new sidecar /react and /unreact endpoints with per-target reaction-handle tracking, and user tapbacks on bot-sent messages routed to the agent as synthetic `reaction:added:<emoji>` events. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 01:07:38 -07:00
underthestars-zhy	0a963d8c9a	feat(photon): add telemetry toggle via `hermes photon telemetry`	2026-06-12 01:07:38 -07:00
Austin Pickett	c3464ecf45	fix(discord): recover from runtime gateway task exits (#44383 ) * fix(discord): recover from runtime gateway task exits Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit recovery; the original PR was 1081 commits behind with 28 unrelated commits. A post-ready discord.py WebSocket crash left the gateway split-brained: producers stayed active while Discord stopped responding. After this fix the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error() so the existing GatewayRunner reconnect watcher replaces the dead adapter. Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy errors, invalid tokens) surface fast instead of burning the full ready timeout. Because connect() no longer waits via asyncio.wait_for on that path, test_connect_releases_token_lock_on_timeout is updated to trigger the timeout through the new helper (same lock-release contract). 3 tests pass (2 new runtime-failure tests + the updated timeout test); test_discord_connect.py and test_discord_slash_commands.py green. Co-Authored-By: ameobius <ameobius@local.host> * fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test connect() no longer uses asyncio.wait_for for the ready handshake, so test_connect_timeout_cancels_bot_task was hanging for 30s in CI. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: ameobius <ameobius@local.host> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-11 15:39:01 -04:00
teknium1	08b1c44a53	fix(discord): extend bot-task cancellation to connect()'s generic exception branch Follow-up to #44389: the generic 'except Exception' branch in connect() had the same orphaned-task hazard as the timeout branch. Extract the cancel-and-await logic into _cancel_bot_task() and call it from all three sites (timeout branch, exception branch, disconnect()). Also adds deaneeth to AUTHOR_MAP.	2026-06-11 12:09:18 -07:00
Dineth Hettiarachchi	020ef76cf1	fix(discord): cancel _bot_task on connect() timeout to prevent zombie client When connect() times out waiting for the Discord ready event, the background asyncio.Task running client.start() was not cancelled. discord.py's internal reconnect loop can ignore client.close() while a WebSocket handshake is in flight, so the orphaned task eventually completes and fires on_ready. A later successful reconnect then leaves two live Discord clients in the same process — each with its own on_message handler and MessageDeduplicator instance — so every @mention creates two threads because the per-adapter dedup caches cannot catch cross-client duplicates. Fix: explicitly cancel and await _bot_task in two places: 1. The asyncio.TimeoutError handler inside connect() — catches the case where the adapter's own inner wait_for fires before the gateway's outer timeout. 2. The start of disconnect() — the load-bearing path, always reached via _dispose_unused_adapter regardless of which timeout fired first. Root cause confirmed from production logs: a Jun 8 network outage caused three consecutive connect() timeouts. The first attempt's bot_task completed its handshake 4 minutes later ("Connected as") with no preceding watcher line, then the watcher's real reconnect also connected 90 seconds after that. The two clients ran continuously for 41+ hours, confirmed by the same user message appearing as two separate inbound events in two different thread IDs 357ms apart. Regression tests added to tests/gateway/test_discord_connect.py: - test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a NeverReadyBot and asserts _bot_task is None afterward - test_disconnect_cancels_running_bot_task: injects a live zombie task, calls disconnect(), and asserts the task is cancelled and the attribute cleared	2026-06-11 12:09:18 -07:00
Erosika	1544813bfe	chore(honcho): replace example Telegram UID with placeholder	2026-06-11 15:06:07 -04:00
Erosika	2708c33c75	docs(honcho): anonymize example peer name to alice	2026-06-11 15:04:01 -04:00
Teknium	0a5762c78d	fix(web): genericize free-MCP client identity per telemetry policy Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on the keyless Parallel Search MCP path with a neutral 'mcp-web-client' identity. Project policy forbids third-party usage attribution without an explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo, so a generic one satisfies the spec without attributing traffic. Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against current main (parallel-web 0.6.0).	2026-06-10 19:54:38 -07:00
Matt Harris	e0e2571711	feat(web): Parallel-backed web search & extract — free Search MCP when keyless, v1 REST when keyed Make Parallel the web search/extract backend with a zero-setup free tier: - Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel becomes the default backend when no other web credentials are configured (ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing web_search/web_extract tools are the only tools registered. - Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints (client.search / client.extract with advanced_settings.full_content) — no beta. Bumps parallel-web 0.4.2 -> 0.6.0. - Attribution: on the free path only, results carry provider/attribution and the CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is unbranded. - Selection/registration: web tools register unconditionally (free MCP backstop) while check_web_api_key remains a real usability probe; explicit per-capability backends are honored (so misconfig surfaces) rather than masked by the fallback. Tested: live web_search/web_extract against search.parallel.ai in keyless and keyed modes; unit suites for the MCP client, backend selection, and display labeling; full agent run shows the "Parallel search" label on the free path.	2026-06-10 19:54:38 -07:00
Erosika	99feb03607	docs(honcho): demote pinPeerName to deprecated alias; document gateway identity tree Drop pinPeerName from the key table (now a deprecated-alias note), and replace the single/multi/hybrid 'deployment shapes' section with the gateway-gated intent tree the wizard actually presents, including the [e] raw-edit hatch and the un-pin pooling steer.	2026-06-10 16:15:17 -04:00
Erosika	d7dfeed6dc	feat(honcho-setup): replace deployment-shape prompt with gateway-gated identity tree The single/multi/hybrid 'deployment shape' was a misnomer: these keys only affect the gateway (the one entrypoint supplying a runtime user ID), and the three preset names stamped a lossy taxonomy onto three orthogonal knobs while hiding which keys got written. Replace it with an intent-led tree gated on gateway detection: - _gateway_platforms() lazily inspects the gateway config (best-effort, no hard dependency); the step auto-skips when no platform is connected. - 'who talks to this?' → just me / me+others (pooled?) / only others, deriving pinUserPeer + userPeerAliases + runtimePeerPrefix and echoing the result. - [e] drops to a raw-knob editor for power users. - The single→multi orphan guard survives as a pooling steer.	2026-06-10 16:14:24 -04:00
Erosika	bb5cb32838	refactor(honcho): canonicalize identity-mapping on pinUserPeer, migrate legacy key The setup wizard wrote the legacy pinPeerName even though pinUserPeer is the canonical key that outranks it in the resolver — so it had to scrub the canonical key afterward to stop it winning. Write pinUserPeer directly and migrate any legacy pinPeerName onto it on touch (setup load + clone), which removes the precedence-fighting entirely. Resolver still reads pinPeerName as a back-compat alias; that's deferred.	2026-06-10 16:07:53 -04:00
Siddharth Balyan	183d86b3e0	fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models (#43436 ) * fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models Reasoning-mandatory Anthropic models (Claude 4.6+/fable/mythos-class) over OpenRouter ignore reasoning.effort and use adaptive thinking. #42991 correctly stopped Hermes from sending a reasoning field to them (it 400s), but put nothing in its place — leaving agent.reasoning_effort a silent no-op on the OpenRouter path: the model always ran at its adaptive default (high) regardless of config. OpenRouter honors the requested effort on the top-level verbosity field instead (maps to Anthropic output_config.effort). Route the existing reasoning_config[effort] there for these models while still never emitting a reasoning field, preserving the #42991 fix. No new config arg — the value the user already sets via agent.reasoning_effort now flows to verbosity. - low/medium/high/xhigh/max pass through verbatim (OpenRouter accepts the extended scale for Claude; verified live HTTP 200 + monotonic token spend). - effort unset/none/disabled omits verbosity so the model keeps its default. - native Anthropic transport already correct; unchanged. Fixes #43432 * test(openrouter): cover real effort range (add minimal, frame max as passthrough) Adversarial review noted the verbosity tests looped over 'max' — a value parse_reasoning_effort can never produce — while omitting 'minimal', which it can. Align the routing test with the real config range (VALID_REASONING_EFFORTS = minimal/low/medium/high/xhigh) and keep a separate value-agnostic passthrough test that documents why xhigh/max must survive verbatim (TypedDict, no runtime literal validation; OpenRouter accepts the extended scale for Claude). 
* docs: explain reasoning_effort -> verbosity routing for adaptive Anthropic models Document that reasoning_effort transparently maps to OpenRouter's verbosity field for adaptive-thinking Anthropic models (Claude 4.6+/Fable/Mythos), where reasoning.effort is ignored. Note xhigh is the configurable ceiling (max is wire- only). Add verbosity as a top-level-kwarg example in the provider-plugin guide.	2026-06-10 15:03:01 +05:30
Teknium	243cada157	fix(model): cover typed gateway /model path + async-safe pricing lookups Follow-ups on top of #26016's expensive-model guard: - gateway/slash_commands.py: typed '/model <name>' now routes through the expensive-model confirmation gate (slash-confirm buttons / text fallback) instead of bypassing the guard the pickers enforce. Cancel leaves the session override and --global config untouched. - telegram/discord/web_server: run expensive_model_warning() via asyncio.to_thread — it can hit models.dev or a /models endpoint on a cache miss, which would otherwise block the event loop. - telegram: picker callback no longer toasts 'Model switched!' when the switch callback raised (both mm: and mc: paths). - tests: new tests/gateway/test_model_command_expensive_confirm.py pins the typed-path gate (prompt, confirm-once, cancel, cheap-model no-op).	2026-06-10 00:24:06 -07:00
Robin Fernandes	af978ecb17	fix(model): require confirmation for expensive model selections Rebased onto current main and re-ported across the restructured surfaces: model flows now thread confirm_provider/base_url/api_key through hermes_cli/model_setup_flows.py, the Discord picker lives in plugins/platforms/discord/adapter.py, and the web dashboard picker applies chat-mode switches via config.set so the expensive-model confirmation can ride the response. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 00:24:06 -07:00
Joel Chan	e5580f43c2	fix(discord): propagate role_authorized flag so DISCORD_ALLOWED_ROLES works end-to-end DISCORD_ALLOWED_ROLES was checked by the Discord adapter (_is_allowed_user) but gateway._is_user_authorized only read DISCORD_ALLOWED_USERS, so role-authorized users were rejected with "Unauthorized user" at the gateway layer despite passing the adapter gate. - Add role_authorized: bool = False to SessionSource - Add role_authorized param to build_source (base.py) - Compute _role_authorized in on_message when user passes via role not user ID - Thread _role_authorized through _handle_message -> build_source - Check source.role_authorized early in _is_user_authorized (run.py) Fixes #33952	2026-06-10 00:18:11 -07:00
xxxigm	311900842e	fix(discord): don't auto-disconnect voice when reply mode is off The voice inactivity timer (VOICE_TIMEOUT) only counted the bot's OWN audio playback as activity. Under /voice off (text-only replies, but still in the channel — leaving is /voice leave) nothing ever reset it, so every 300s the bot disconnected and spammed "Left voice channel (inactivity timeout)." The adapter now learns the live voice-reply mode via a getter wired from run.py and skips the auto-disconnect while mode is off. It also resets the timer when a user actually speaks to the bot, so an active listener (incl. voice-on text-only sessions that never play audio) isn't dropped mid-conversation.	2026-06-09 23:24:26 -07:00
kshitijk4poor	4642762289	fix(langfuse): redact base64 data URIs instead of truncating into invalid base64 The Langfuse SDK treats `data:;base64,...` strings as media and tries to decode them. `_truncate_text` was slicing those strings mid-payload, producing invalid base64 and noisy "Error parsing base64 data URI" logs. Observability only needs the metadata, not raw image/audio bytes, so redact the whole data URI (type, media_type, length) before it reaches the SDK. Salvaged the Langfuse fix from #39682 onto current main as a standalone, single-concern change (the dashboard `dist/*` and plugin-discovery parts of that PR already landed separately on main). Co-authored-by: foras910521-lab <foras910521-lab@users.noreply.github.com>	2026-06-10 10:49:36 +05:30
Siddharth Balyan	46fedef07f	fix(openrouter): never send reasoning field for adaptive Anthropic models (#43012 ) The previous fix (#42991) only omitted reasoning when it was being disabled. But reasoning-mandatory Anthropic models (Claude 4.6+, fable) 400 with thinking.type.disabled on EVERY tool-continuation turn even when reasoning is enabled: chat_completions never replays signed thinking blocks, so the prior assistant tool_call has no thinking, and OpenRouter resolves "reasoning requested but history has none" by emitting thinking.type.disabled — which these models reject. Result: first turn works, every turn after the first tool call dies (HTTP 400, non-retryable). OpenRouter ignores reasoning.effort for adaptive Anthropic models anyway (the model self-decides), so the reasoning field is pointless for them on every turn and harmful on tool-replay turns. Omit it entirely → adaptive default. - openrouter profile: drop the reasoning field for reasoning-mandatory Anthropic models regardless of enabled/disabled; legacy Anthropic + non-Anthropic models unchanged. - tests: assert omission across enabled/disabled/effort variants; parity tests switched to a non-Anthropic reasoning model (deepseek) since Anthropic 4.6+ no longer carries a reasoning field. Verified live end-to-end: a tool-replay turn on anthropic/claude-fable-5 with reasoning enabled now builds extra_body=None and returns HTTP 200 (was 400).	2026-06-10 00:18:23 +05:30
Siddharth Balyan	1febb08240	fix(anthropic): default new Claude models to the modern thinking contract (#42991 ) New Anthropic models without a recognized version substring (claude-fable-5 and future named/numbered releases) were classified as legacy and routed down the manual-thinking path, which made OpenRouter emit thinking.type.disabled — a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400. Invert the brittle version-substring allowlists to default-to-modern (mirroring _get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/ no-sampling contract, with an explicit legacy list for older families. Non-Claude Anthropic-Messages models (minimax, qwen3, …) keep the manual path. - anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort / _forbids_sampling_params now default unknown Claude models to modern; legacy families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS. - openrouter profile: omit reasoning entirely (→ adaptive default) instead of forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy Anthropic + all non-Anthropic models still pass the disable form through. - model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out). Tests assert the invariant ("unknown Claude model -> modern contract; legacy stays manual; non-Claude unaffected"), not specific model names.	2026-06-09 23:37:23 +05:30
Philip D'Souza	92dfd70d6a	fix(photon): production hardening for the gRPC-native iMessage channel (#42732 ) * fix(photon): override transitive CVEs in the sidecar deps `npm audit` flagged 7 high-severity transitive CVEs (protobufjs code injection GHSA-66ff-xgx4-vchm + outdated @opentelemetry OTLP exporters) pulled in via spectrum-ts -> @photon-ai/otel. npm's suggested fix downgrades spectrum-ts to a version that targets the decommissioned spectrum host, so instead pin patched versions via `overrides` (protobufjs 8.6.1, @opentelemetry/* 0.218.0) without touching spectrum-ts. `npm audit` -> 0; spectrum-ts + provider still import. * fix(photon): harden the sidecar bridge + bound the dedup cache - constant-time sidecar control-token comparison (was `!==`, timing-attackable). - cap the control-channel request body (2 MiB) so a compromised local peer can't OOM the sidecar. - wrap the inbound gRPC stream consumer in a re-subscribe loop with capped exponential backoff + jitter — if the async iterator throws/ends it would otherwise stop inbound forever (the adapter dedupes any replay). - add an unhandledRejection handler so a stray rejection logs instead of killing the process. - dedup cache (adapter) was a true bounded LRU only for expired entries; a burst of unique ids within the window grew it without limit. Evict oldest at the cap. * chore: add AUTHOR_MAP entry for PhilipAD --------- Co-authored-by: PhilipAD <philipadsouza@gmail.com>	2026-06-09 11:12:58 -04:00
kshitij	85852b71d8	fix(nemo-relay): preserve downstream errors in adaptive execution (#42691 ) Based on #42658 by @mnajafian-nv. Preserves the real downstream provider/tool exception when NeMo Relay's managed adaptive execution wraps a failing callback as an internal runtime error. Without this, the original exception (and its retry-classification signal, e.g. status_code) is lost behind Relay's wrapper. Salvage changes on top of the original PR: - Tolerant Relay-wrapper match: _is_relay_wrapped_callback_error now uses str.startswith on the "internal error: <cls>: <msg>" prefix instead of exact equality, so a future Relay version appending a traceback/suffix doesn't silently defeat the unwrap. On a total format change it returns False and falls back to the pre-fix behavior (surfacing Relay's error) rather than masking it. - Deduplicated the LLM and tool execute paths into a shared _run_managed_with_downstream_preservation helper, removing ~20 lines of copy-pasted nonlocal/try-except scaffolding that could drift out of sync. - Added a real-middleware regression guard (test_nemo_relay_downstream_unwrap_matches_real_middleware_wrapper_shape) that drives hermes_cli.middleware._run_execution_chain and asserts the plugin's _original_downstream_error unwraps the actual private _DownstreamExecutionError wrapper. The original synthetic tests modeled the wrapper with a local class, so a rename or shape change in core middleware would not have been caught; this test fails loudly if that contract drifts. Co-authored-by: mnajafian-nv <mnajafian@nvidia.com>	2026-06-09 02:31:10 -07:00
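
Note on 85852b71d8: the tolerant wrapper match described in that commit can be sketched roughly as below. This is an illustration only, not the plugin's code. The function name and signature are hypothetical, and it assumes `<cls>` in the "internal error: <cls>: <msg>" prefix is the bare exception class name, which the commit message does not confirm.

```python
def is_relay_wrapped_callback_error(relay_message: str, original: BaseException) -> bool:
    """Return True when relay_message looks like a wrapper around `original`.

    Hypothetical helper: compares by prefix rather than exact equality, so a
    suffix appended after the message (for example a traceback) still matches.
    """
    # Assumed wrapper shape, taken from the commit text: "internal error: <cls>: <msg>"
    expected_prefix = f"internal error: {type(original).__name__}: {original}"
    return relay_message.startswith(expected_prefix)


original = ValueError("rate limited")
exact = "internal error: ValueError: rate limited"
with_suffix = exact + "\nTraceback (most recent call last): ..."
unrelated = "relay failure: something else"

print(is_relay_wrapped_callback_error(exact, original))
print(is_relay_wrapped_callback_error(with_suffix, original))
print(is_relay_wrapped_callback_error(unrelated, original))
```

With a prefix match, a message in a completely different format simply fails the check, which corresponds to the fallback the commit describes: surface the wrapper's error rather than mask it.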


462 commits
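
Note on 5e851bc6bc: the priority-ordered cap described in that commit could look something like the sketch below. It is a minimal illustration under stated assumptions, not the adapter's implementation: `select_commands` and its parameters are hypothetical names, commands are modeled as plain strings, and the only facts taken from the commit are the 100-command limit, the always-kept native commands and /skill group, and the lower priority of auto-registered and plugin commands.

```python
DISCORD_COMMAND_LIMIT = 100  # hard cap on global application commands, per the commit


def select_commands(native, skill_group, optional, limit=DISCORD_COMMAND_LIMIT):
    """Split commands into (registered, dropped) under the cap.

    Hypothetical helper. `native` and `skill_group` are always kept;
    `optional` (registry and plugin commands) fills the remaining slots in order.
    """
    kept = list(native) + list(skill_group)
    # If the always-kept set alone reaches the limit, no optional command fits.
    remaining = max(limit - len(kept), 0)
    accepted = list(optional[:remaining])
    dropped = list(optional[remaining:])
    return kept + accepted, dropped


native = [f"native-{i}" for i in range(27)]
skill_group = ["skill"]
optional = [f"plugin-{i}" for i in range(100)]

registered, dropped = select_commands(native, skill_group, optional)
print(len(registered), len(dropped))
```

Bounding the list before registration means a single warning can report the dropped names, instead of the whole batch being rejected at sync time.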