Salvage follow-up for PR #29575: add regression tests for the section-3
no-api_key /v1/models probe (it probes bare endpoints and skips when explicit
models are set) and add the contributor AUTHOR_MAP entry.
* fix(teams): package Microsoft Teams SDK as an installable extra
The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.
Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
* chore: map rio-jeong contributor email for attribution (#43945)
* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)
The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:
- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
status' probes never trigger a pip install.
The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.
Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.
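
Illustrative sketch of the passive-check vs. lazy-installer split described
above. This is not the adapter code: `installer`, `importer`, `_sdk_module`
and the module name `hypothetical_teams_sdk` are stand-ins invented for the
example, and the real ensure_and_bind() signature is not reproduced here.

```python
import importlib
from typing import Callable, Optional

# Hypothetical stand-in for the SDK global the adapter would rebind.
_sdk_module: Optional[object] = None


def check_requirements() -> bool:
    """Passive check: report availability and never install anything."""
    return _sdk_module is not None


def check_teams_requirements(
    installer: Callable[[str], bool],
    importer: Callable[[str], object] = importlib.import_module,
    module_name: str = "hypothetical_teams_sdk",
) -> bool:
    """Lazy check: short-circuit when bound, otherwise install and rebind."""
    global _sdk_module
    if check_requirements():
        return True  # already bound, so no install attempt is made
    if not installer("platform.teams"):
        return False  # install failed or was refused; SDK stays unavailable
    try:
        _sdk_module = importer(module_name)  # rebind the global on success
    except ImportError:
        return False
    return True
```

A status probe would call only check_requirements(), while connect() would
call check_teams_requirements() first, so only the connect path can install.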
---------
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* fix: declare websockets as a core dependency
* fix(deps): relax dev setuptools pin 82.0.1 -> 81.0.0 (torch caps setuptools<82)
torch >= 2.11 publishes Requires-Dist: setuptools<82, so any environment
that resolves the dev extra together with torch is unsatisfiable:
$ uv pip install --dry-run ".[dev]" "torch==2.12.0"
x No solution found when resolving dependencies:
... torch==2.12.0 and all versions of hermes-agent[dev] are incompatible.
81.0.0 is the latest release under the cap and stays inside the declared
build-system window (setuptools>=77.0,<83). uv.lock regenerated with
'uv lock'; diff is scoped to the setuptools entry.
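
A rough arithmetic check of the two windows quoted above, assuming plain
dotted numeric versions. The helpers `parse` and `in_window` are invented for
this note and are not a PEP 440 implementation; a real resolver handles
pre-releases, epochs and similar cases that this sketch ignores.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '81.0.0' into (81, 0, 0); numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def in_window(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper under tuple comparison."""
    return parse(lower) <= parse(version) < parse(upper)


# torch's cap is setuptools<82; the lower bound "0" just means "no floor".
torch_ok_new = in_window("81.0.0", "0", "82")
torch_ok_old = in_window("82.0.1", "0", "82")
# Declared build-system window: setuptools>=77.0,<83.
build_ok_new = in_window("81.0.0", "77.0", "83")
```

Under these assumptions 81.0.0 falls inside both windows, while 82.0.1 falls
outside the torch cap.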
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* chore: map salvaged contributor emails for attribution
Add AUTHOR_MAP entries for the two cherry-picked contributors so the
check-attribution CI gate passes:
- yehaotian@xuanshudeMac-mini.local -> ArcanePivot (#45486)
- dbeyer7@gmail.com -> benegessarit (#44693)
---------
Co-authored-by: 玄枢 <yehaotian@xuanshudeMac-mini.local>
Co-authored-by: David Beyer <dbeyer7@gmail.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.
Co-authored-by: goku94123 <gooku94123@gmail.com>
Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.
Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.
The salvaged commit in this PR is authored by capt-marbles
(andrewdmwalker@gmail.com), a bare Gmail address that does not auto-resolve
in the check-attribution job. Add the AUTHOR_MAP entry.
* fix(docker): skip per-profile gateway reconciliation in dashboard container
When gateway and dashboard containers share a bind-mounted HERMES_HOME,
both run the cont-init.d profile reconciliation script, which creates
s6-log processes for every persisted profile. These s6-log processes
in different containers race to flock() the same log-directory lock
files under logs/gateways/<profile>/lock, producing repeated
"s6-log: fatal: unable to lock ... Resource busy" errors and a
supervision restart storm.
Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py
and set it in the official docker-compose.yml dashboard service so the
dashboard container no longer creates per-profile gateway s6 services
it never uses.
* chore(release): map salvaged contributor
* refactor(docker): autodetect dashboard container instead of env-var gate
Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role
detection. A dashboard-only container never spawns or supervises
per-profile gateways, so the reconcile boot hook now skips itself when
/proc/1/cmdline is the dashboard command — no operator flag to set (or
forget in a hand-written manifest, which would reintroduce the s6-log
flock storm this prevents).
- Extract _strip_container_argv_prefix() shared by the legacy-gateway
and new dashboard detectors (DRY the init/wrapper/hermes peel).
- Add _is_dashboard_container(); gate reconcile main() on it.
- Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml.
- Tests: argv matrix for both roles + main()-level skip/reconcile proof
and a regression that the removed env var is now inert.
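
Sketch of the PID 1 argv detection idea, for illustration only. The prefix
set in `_PREFIX_BASENAMES` and the assumption that the dashboard subcommand
is literally "dashboard" are guesses made for this example; the function
names mirror the ones above but the bodies are not the shipped code.

```python
import os
from typing import List

# Hypothetical init/wrapper tokens peeled off the front of argv.
_PREFIX_BASENAMES = {"init", "tini", "--", "hermes"}


def parse_cmdline(raw: bytes) -> List[str]:
    """Split a /proc/<pid>/cmdline payload, which is NUL-separated argv."""
    return [p.decode("utf-8", "replace") for p in raw.split(b"\x00") if p]


def strip_container_argv_prefix(argv: List[str]) -> List[str]:
    """Drop leading init/wrapper/hermes tokens, matching on basename."""
    i = 0
    while i < len(argv) and os.path.basename(argv[i]) in _PREFIX_BASENAMES:
        i += 1
    return argv[i:]


def is_dashboard_container(argv: List[str]) -> bool:
    """True when the first non-wrapper token is the dashboard subcommand."""
    rest = strip_container_argv_prefix(argv)
    return bool(rest) and rest[0] == "dashboard"


def read_pid1_argv(path: str = "/proc/1/cmdline") -> List[str]:
    """Read PID 1's argv; an unreadable file yields an empty argv."""
    try:
        with open(path, "rb") as fh:
            return parse_cmdline(fh.read())
    except OSError:
        return []
```

With this shape, an unreadable or empty cmdline is never classified as a
dashboard, so the reconcile hook would still run in that case.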
Co-authored-by: 895252509 <895252509@qq.com>
---------
Co-authored-by: zhouxiang <895252509@qq.com>
Co-authored-by: Ben <ben@nousresearch.com>
* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes #45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
On Windows, native Python extensions such as _bcrypt.pyd are loaded as
DLLs by any running hermes process. When the installer tries to recreate
the venv (Remove-Item -Recurse -Force "venv"), Windows denies the delete
because the DLL is still mapped into the running process.
Add a taskkill /F /T /IM hermes.exe call before the Remove-Item so any
hermes process tree is stopped first, releasing the file lock. A short
sleep gives the OS time to unload the image before deletion proceeds.
This mirrors the existing force_kill_other_hermes() guard already present
in the --update flow (update.rs), applying the same pattern to the full
reinstall/repair path through install.ps1.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The initial fix only wrote the prefix npmrc on a fresh Node install, so
pre-existing bundled-Node installs (Node already present) were not repaired
by re-running the installer — install_node/ensure_node skip when Node is
already up to date.
Extract the redirect into an idempotent helper
(configure_managed_node_npm_prefix / _nb_configure_npm_prefix) that no-ops
when there's no Hermes-managed npm, and call it unconditionally from
check_node (install.sh) and at the top of ensure_node (node-bootstrap.sh).
Re-running the install command now repairs an affected install in place,
not just brand-new ones.
When the installer falls back to a bundled Node under $HERMES_HOME/node,
npm's default global prefix is that Node dir, so `npm install -g <pkg>`
drops the package binary in $HERMES_HOME/node/bin. Only node/npm/npx are
symlinked into the command link dir (~/.local/bin, /usr/local/bin, or
$PREFIX/bin) — so user-installed global package binaries are NOT on PATH
and can't be run, even though `npm i -g` reports success. They also get
wiped on every Node upgrade (the dir is rm -rf'd and re-extracted).
Redirect the bundled Node's npm global prefix to the command link dir's
parent, so global bins land in the link dir (already on PATH, alongside
node/npm/npx) and survive Node upgrades. Scoped to the bundled Node via
its prefix-local global npmrc ($HERMES_HOME/node/etc/npmrc), so the user's
other Node installs and their ~/.npmrc are untouched. Hermes's own global
installs (agent-browser) pass an explicit --prefix and are unaffected.
When an existing install at $INSTALL_DIR has an unmerged index (files in a
"needs merge" state left by a previously interrupted update), the update path
ran `git stash` then `git checkout <branch>`. On a conflicted index `git stash`
aborts with "could not write index" and `git checkout` then aborts with "you
need to resolve your current index first" — surfacing to desktop/bootstrap
users as `git checkout main failed (exit 1)` and failing the whole install at
the repository stage.
Mirror the `hermes update` Python path (#4735): detect unmerged entries with
`git ls-files --unmerged` and clear the conflict state with `git reset` before
stashing. Working-tree changes are still captured by the subsequent stash, so
nothing is discarded; only the index-level conflict markers are dropped, which
lets the checkout proceed.
Fixed in both installers (install.sh and install.ps1) so the Windows GUI
installer and the POSIX one share the same recovery behavior.
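
The recovery order can be sketched in Python as below. This is a simplified
illustration, not the installer code (which is shell and PowerShell);
`git_runner` and `prepare_checkout` are names made up for the example, and
the runner is injected so the sequence can be exercised without a real
repository.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def git_runner(repo: str) -> Runner:
    """Build a runner that executes git subcommands inside `repo`."""
    def run(args: List[str]) -> subprocess.CompletedProcess:
        return subprocess.run(
            ["git", "-C", repo, *args], capture_output=True, text=True
        )
    return run


def prepare_checkout(run: Runner, branch: str) -> List[List[str]]:
    """Clear an unmerged index if present, then stash and check out.

    Returns the git subcommands issued, in order.
    """
    issued: List[List[str]] = []

    def call(args: List[str]) -> subprocess.CompletedProcess:
        issued.append(args)
        return run(args)

    # Any output from `ls-files --unmerged` means the index has conflicts.
    if call(["ls-files", "--unmerged"]).stdout.strip():
        call(["reset"])  # mixed reset: drops index conflict state only
    call(["stash"])  # working-tree changes are still captured here
    result = call(["checkout", branch])
    if result.returncode != 0:
        raise RuntimeError(f"git checkout {branch} failed: {result.stderr}")
    return issued
```

Against a real checkout the call would look like
prepare_checkout(git_runner(install_dir), "main"), with install_dir supplied
by the caller.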
Addresses PR review feedback:
- Validate refresh_token (not only access_token) before persisting the
re-imported Codex token, so a half-token payload can't silently break the
next refresh cycle.
- Make the recovery log path-agnostic ("Codex CLI auth.json") since
_import_codex_cli_tokens can read $CODEX_HOME, not only ~/.codex.
- Add regression test: relogin-required + imported token missing refresh_token
-> re-raise and persist nothing.
- Map kenmege@yahoo.com -> Kenmege in scripts/release.py AUTHOR_MAP
(fixes the check-attribution job).
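
Minimal sketch of the both-tokens guard, assuming the imported payload is a
flat mapping with "access_token" and "refresh_token" keys. That shape and the
helper names `has_usable_tokens` and `recover_tokens` are assumptions for
illustration; the real import path may structure things differently.

```python
from typing import Any, Callable, Dict, Mapping


def has_usable_tokens(payload: Mapping[str, Any]) -> bool:
    """Require both tokens to be non-empty strings."""
    for key in ("access_token", "refresh_token"):
        value = payload.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return True


def recover_tokens(
    imported: Mapping[str, Any],
    persist: Callable[[Dict[str, Any]], None],
    original_error: Exception,
) -> Dict[str, Any]:
    """Persist a re-imported token only when it is complete.

    A half-token payload re-raises the original relogin error and
    persists nothing.
    """
    if not has_usable_tokens(imported):
        raise original_error
    tokens = dict(imported)
    persist(tokens)
    return tokens
```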
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Salvages PR #25747 by preserving gateway session rotation even when a post-compression model call fails before returning final content.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
The deploy-site skills index crawl was capped at ~3k ClawHub entries
because CATALOG_WALK_BUDGET_SECONDS applied to max_items=0 walks too.
Only enforce the wall-clock budget for bounded browse requests and pass
limit=0 from build_skills_index so CI walks the full catalog.
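
Sketch of the gating rule, for illustration: the wall-clock budget applies
only when max_items is positive. `walk_catalog`, its parameters and the
page-of-items input shape are invented for this example and do not mirror
the real crawler.

```python
import time
from typing import Callable, Iterable, List, Optional


def walk_catalog(
    pages: Iterable[Iterable[object]],
    max_items: int = 0,
    budget_seconds: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> List[object]:
    """Collect items page by page; max_items=0 means an unbounded walk."""
    bounded = max_items > 0
    # Only bounded browse requests get a deadline; full walks never do.
    deadline = None
    if bounded and budget_seconds is not None:
        deadline = clock() + budget_seconds
    items: List[object] = []
    for page in pages:
        if deadline is not None and clock() >= deadline:
            break  # budget exhausted: return what was gathered so far
        for item in page:
            items.append(item)
            if bounded and len(items) >= max_items:
                return items
    return items
```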
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).
Also adds deaneeth to AUTHOR_MAP.
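
A generic cancel-and-await helper of the kind described above might look
like this. It is a sketch under assumptions, not the body of
_cancel_bot_task(): the name `cancel_task` is hypothetical and the adapter's
own bookkeeping (clearing the stored task reference, logging) is omitted.

```python
import asyncio
import contextlib
from typing import Optional


async def cancel_task(task: Optional[asyncio.Task]) -> None:
    """Cancel a task and wait for it to finish unwinding.

    Awaiting after cancel() is what prevents an orphaned task: the
    coroutine gets to run its cleanup before the caller moves on.
    """
    if task is None or task.done():
        return
    task.cancel()
    # The awaited task raises CancelledError once it has unwound.
    with contextlib.suppress(asyncio.CancelledError):
        await task
```

Note that suppressing CancelledError here would also hide a cancellation
aimed at the caller itself, which a production helper may want to handle
separately.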
A long-lived Baileys bridge survives gateway restarts AND hermes update:
connect() adopted any bridge already listening with status connected, and
disconnect() only kills bridges the adapter spawned itself. Users who
updated to get inbound media support kept talking to a bridge process
serving months-old bridge.js — images and voice notes still arrived as
placeholders with no cached file path (refs #19105 follow-up reports).
Three fixes in the same stale-bridge class:
- Staleness handshake: bridge.js reports a sha256 self-hash in /health
(scriptHash); connect() compares it against bridge.js on disk and
restarts the bridge on mismatch. Pre-handshake bridges report no hash
and are treated as stale, so every existing stale bridge gets recycled
exactly once on the next gateway start.
- npm dep refresh: deps reinstall when package.json changes (stamp file
in node_modules), not only when node_modules is missing — a Baileys
pin bump now actually lands.
- Cache-dir passthrough: the gateway passes profile-aware
HERMES_{IMAGE,AUDIO,DOCUMENT}_CACHE_DIR to the bridge instead of the
bridge hardcoding ~/.hermes/image_cache etc., fixing media paths under
HERMES_HOME overrides, profiles, and the new cache/ layout.
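
Gateway-side view of the staleness handshake, as a sketch. Only the
scriptHash field name and the use of sha256 come from the description above;
the helper names `sha256_hex` and `bridge_is_stale`, and the assumption that
/health decodes to a flat mapping, are illustrative.

```python
import hashlib
from typing import Any, Mapping


def sha256_hex(data: bytes) -> str:
    """Hex sha256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def bridge_is_stale(health: Mapping[str, Any], script_bytes: bytes) -> bool:
    """Compare the bridge's reported hash with bridge.js on disk.

    A missing or empty scriptHash means a pre-handshake bridge, which is
    treated as stale so it gets recycled.
    """
    reported = health.get("scriptHash")
    if not reported:
        return True
    return reported != sha256_hex(script_bytes)
```

A caller would pass the decoded /health response and the bytes of the
on-disk bridge.js, then restart the bridge when this returns True.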