The /api/audio/elevenlabs/voices endpoint logged a WARNING on every
failure, and the desktop re-polls it on each settings open/focus — a
bad/expired/scoped ELEVENLABS_API_KEY floods agent/gui logs with
identical "voice list failed: HTTP Error 401" lines indefinitely.
Treat 401/403 as a persistent "integration unavailable" state: return
{available: false, error: "unauthorized"} with a 200 (the dropdown
already handles available:false) instead of a 502, and collapse repeated
identical failures to a single log line via a small re-arming latch
(logs again on recovery or when the error changes). Non-auth errors keep
the 502 but are throttled the same way.
windows_hide_flags() already returns 0 on POSIX (and creationflags=0 is
the no-op default there, exactly how server.py::_list_repo_files does it),
so drop the IS_WINDOWS import + ternary/one-use-dict gating and just pass
creationflags=windows_hide_flags() directly. Tests lose the now-pointless
IS_WINDOWS monkeypatch.
server.py's PDF-attach handler shells out to `pdftoppm` from the
console-less desktop/gateway backend; on Windows that pops a conhost
window each attach. Route it through windows_hide_flags() like the
sibling _list_repo_files git calls (no-op on POSIX).
The #54236/#54417 backend git/gh sweep routed git_probe, the repo-file
picker, coding_context, context_references, copilot_auth, and the gateway
process scans through CREATE_NO_WINDOW, but two sibling spawn legs that
also run inside the console-less desktop/gateway backend were missed:
- tools/checkpoint_manager.py `_run_git` (and the one-shot `git init
--bare` in `_init_store`) — when checkpoints are enabled, every
file-mutating turn fires multiple bare `git` calls (status, add,
write-tree/commit-tree, update-ref). Spawned from a parent with no
console (Electron spawns the backend with windowsHide → CREATE_NO_WINDOW),
each one allocates its own conhost window → a flurry of terminal popups.
- tools/skills_hub.py `GitHubAuth._try_gh_cli` — `gh auth token`, the same
bug class as the already-fixed copilot_auth gh probe.
Route both through `windows_hide_flags()` (no-op on POSIX), matching the
established per-site pattern. Tests added to
tests/test_windows_subprocess_no_window_flags.py.
The startup config/manifest reads used PyYAML's pure-Python SafeLoader,
which is ~8x slower than the libyaml-backed CSafeLoader C extension.
config.yaml is parsed several times during launch (cli config, raw
config, early interface/redaction bridge, logging config) and every
plugin manifest is parsed once — all on the slow path.
Add utils.fast_safe_load (CSafeLoader-preferring, pure-Python fallback,
true drop-in for safe_load) and route the hot startup parse sites
through it: hermes_cli/config.py (config + manifest reads),
hermes_cli/plugins.py (manifest parse), env_loader, cli.load_cli_config,
hermes_logging, and the two pre-config early YAML bridges in main.py.
Behavior is identical (same restricted safe tag set); only speed changes.
safe_load calls on the startup path drop from ~79 to ~0, cutting the
YAML parse cost from ~0.9s to ~0.15s under profiling.
Adds tests/test_fast_safe_load.py asserting equivalence with safe_load
across input shapes, empty-doc falsiness, C-loader preference, and that
python/object tags are still rejected (safe, not full loader).
yuanbao_media.download_url() fetched model-supplied (outbound) and inbound
image/file URLs server-side via httpx with follow_redirects=True and no
SSRF check. A model response containing <img src="http://169.254.169.254/...">
routed through ImageUrlHandler -> download_url and would fetch cloud-metadata
endpoints; same for inbound media.
Add an is_safe_url() pre-flight plus an async redirect event-hook that
re-validates every 30x target, matching the cache_image_from_url() guard in
gateway/platforms/base.py. The other gateway adapters already guard their
URL-fetch paths; this was the remaining unguarded one.
A config left with `provider: anthropic` but a leftover
`base_url: https://openrouter.ai/api/v1` (e.g. after a provider switch)
would route Anthropic OAuth/setup-token traffic to OpenRouter and 404.
Add `_anthropic_base_url_override_ok()` and gate the three native-Anthropic
resolution branches (pool, explicit, native) on it. The guard honors a
configured `model.base_url` only when it plausibly speaks the Anthropic
Messages protocol — official `*.anthropic.com` / `*.claude.com` hosts, Azure
Foundry endpoints, and `/anthropic`-suffixed or Kimi `/coding` proxies — and
falls back to `https://api.anthropic.com` otherwise. Aggregator URLs like
openrouter.ai / api.openai.com are treated as stale.
Reconstructed from @clovericbot's PR #3661 onto current main: the original
patched one branch with an anthropic-only allow-list, which would have broken
Azure-via-anthropic; widened to all three sites and made Azure/proxy-safe.
Bundled platform plugins (telegram, discord, feishu, teams, ...) were
eagerly imported at plugin-discovery time on every `hermes` invocation,
including plain `hermes chat` which never touches a gateway platform.
Their modules import heavy platform SDKs at module level (lark_oapi,
microsoft_teams, discord.py, slack_bolt, ...) — feishu alone pulled in
lark_oapi (~2.6s), teams pulled microsoft_teams (~1.9s).
Discovery now registers a cheap deferred loader per platform in the
platform_registry; the adapter module is imported only when the gateway
/ cron / setup / send_message path actually asks for that platform.
is_registered() and the iterate-all accessors stay correct (deferred
counts as registered; plugin_entries()/all_entries() materialize all
deferred loaders, since those paths genuinely need every adapter).
Cold start: ~4.4s -> ~2.45s to banner. discover_and_load: 2.0s -> 0.3s
(warm), and the heavy SDKs are no longer imported at all in CLI mode.
Every shipped platform remains available out of the box — it just loads
on first use.
When config.yaml has `provider: auto` and a non-cloud `base_url` (e.g. Ollama
at localhost:11434), requests were silently sent to https://api.anthropic.com
whenever ANTHROPIC_API_KEY was present in the environment, ignoring the
configured local endpoint and returning HTTP 401 / "credit balance too low".
Root cause: resolve_provider("auto") scans env vars and returns "anthropic"
when ANTHROPIC_API_KEY is set, before config.model.base_url is ever consulted.
In resolve_runtime_provider(), before calling resolve_provider(), short-circuit
to the OpenAI-compatible resolver when no explicit creds were passed, provider
is "auto"/unset, and a non-cloud base_url is configured. Well-known cloud roots
(openrouter.ai, anthropic.com, openai.com) are matched on HOST (not substring)
so look-alike hosts can't evade the bypass and leak a cloud credential.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
The curator's inactivity prune archived any non-pinned agent-created
skill whose activity was older than archive_after_days (90d). A skill
loaded only by a cron job had its usage bumped solely when the job
fired, so paused jobs, infrequent (quarterly/annual) schedules, and
far-future one-shots aged their skills out from under them — the next
run then failed to load the now-archived skill.
- cron/jobs.py: add referenced_skill_names() returning skills used by
ANY job (incl. paused/disabled).
- curator.apply_automatic_transitions(): skip cron-referenced skills
like pinned; add a use=0 grace floor so a never-used skill is not
marked stale/archived until it is at least stale_after_days old.
- LLM review pass: candidate list marks cron=yes; prompt forbids
pruning cron-referenced skills and never-used skills under 30 days.
Tested E2E against a real cron job + real usage records and with 4 new
unit tests.
On Windows, uv pip install -e . can register hermes.exe in package metadata
while the launcher never lands on disk. Detect missing [project.scripts]
shims and reinstall entry points under the existing quarantine path in
hermes update and install.ps1.
* fix(daytona): quote single-upload mkdir parent path
The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.
Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.
Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.
* docs(infographic): daytona quote-sync fix
Removed/unauthorized Telegram users could inject prompt content before the
per-user auth gate fired. The adapter ran `_should_process_message`,
`_build_message_event`, and text/photo batching — and dispatched to the
runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected
the sender. Unmentioned group chatter from a removed user was also
persisted into the session transcript via `_observe_unmentioned_group_message`,
leaking into the agent's observed context independent of dispatch.
Add `_is_user_authorized_from_message()` as an intake prefilter that runs
in `_handle_text_message`, `_handle_command`, `_handle_location_message`,
and `_handle_media_message` BEFORE batching, event construction, and the
unmentioned-group observe branch. It reuses the runner's
`_is_user_authorized()` with a correctly-shaped SessionSource (group vs
forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists),
falls back to env allowlists, and only rejects when an allowlist actually
exists — unknown DMs with no allowlist still reach the pairing flow.
Channel posts authorize via `sender_chat` identity when `from_user` is
absent.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.
Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.
5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
Route FS/git REST through the active profile, mount the remote folder picker
at app root, keep the project dialog open while picking, show a first-run
blank state, flip into grouped view on create, and constrain the picker scroll
area so Select stays reachable.
Pin the desktop-to-gateway cwd handoff: createBackendSessionForSend must pass
the current workspace cwd into session.create so the backend registers the
session cwd before the agent/tools run.
Let UI callers ask for folders/files without knowing remote-picker limits:
selectDesktopPaths now normalizes remote directory selection to a single folder
inside the facade. Project creation and composer context picking no longer branch
on remote mode; they route through desktop-fs helpers just like git callers route
through desktopGit(). Behavior unchanged except remote folder context now works
through the same backend picker path.
Keep the remote git mirror as a thin facade: route all GETs through gitGet,
all mutations through gitPost, and keep consumers on desktopGit(). On the
backend, route git paths through a single _git_path helper instead of repeating
str(_fs_path(...)) in every endpoint. Behavior unchanged.
Second pass on the remote-project flow: the project dialog and git cockpit were
remote-aware, but the composer's Add file/folder context picker still called the
native Electron picker directly. Route it through selectDesktopPaths so remote
sessions use the backend-aware picker instead of local disk paths; preserve local
multi-select behavior and keep remote folder selection single because the in-app
remote picker only supports one directory.
Also use readDesktopFileDataUrl for image previews so an already-known backend
image path can be read through /api/fs/read-data-url, and add focused coverage
for backend file-diff routing plus the plain-folder git init/worktree path.
writeProjectIdea used the local-only Electron writeTextFile, so on a remote
gateway IDEA.md never landed on the backend (where the project folder lives).
Route it through writeDesktopFileText (local Electron / POST /api/fs/write-text).
Collapse the two near-duplicate status parsers (_parse_status_v2 +
_iter_status_entries) into a single _walk_entries generator feeding the rail,
review list, and commit flow; share the staged predicate; hoist `import re`.
Behavior unchanged.
After the folder picker fix, an added remote folder was still half-usable:
the desktop's git GUI (coding-rail status, worktree lanes, review pane,
branch switch, file diff) all ran Electron-local git on the USER's machine,
so against a remote-gateway repo they silently degraded to empty.
Mirror the whole surface over the dashboard REST API so it acts on the
BACKEND repo where sessions actually run:
- hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review
list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/
create-pr, file-diff, worktree add/remove, branch switch) shelling to the
system git, mirroring the Electron ops' shapes.
- web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as
/api/fs, executor-offloaded, mutations -> 400).
- apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as
window.hermesDesktop.git; coding-status / review / projects / model /
desktop-fs route through desktopGit() so local stays Electron, remote hits
/api/git/*.
Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts,
review classification, diff incl. untracked all-add, stage+commit roundtrip,
worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and
desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).
The new-project / add-folder dialog (PR #49037) picked folders via the
native Electron dialog (pickDefaultProjectDir), which only browses the
LOCAL machine. On a remote gateway that picks a path that doesn't exist
on the backend where sessions actually run.
Route pickProjectFolder() through selectDesktopPaths({directories,
multiple:false}) — the same remote-aware path the retired right-sidebar
picker used: local mode opens the native directory dialog, remote mode
browses the backend filesystem via the in-app RemoteFolderPicker. Seed
it with the backend's default cwd on remote so it opens somewhere useful.
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.
Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process
Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
_get_platform_tools() applies agent.disabled_toolsets as a final
override AFTER reading platform_toolsets.<platform>, so a toolset
listed there stays permanently OFF no matter what the toggle write
path saves. Blank Slate installs pre-populate this list with ~27
toolsets, making most of the desktop Toolsets UI un-enableable
(issue #49995).
Fix: _save_platform_tools() now removes any toolset the user just
explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets.
Toolsets the user did not touch, or that remain disabled on other
platforms, are left alone -- disabled_toolsets keeps working as a
cross-platform suppression list for anything not actively re-enabled.
Disabling a toolset (unchecking it) does not touch disabled_toolsets
at all -- only enables reconcile it.
Verified end-to-end with the exact repro from the issue: Blank Slate
config (disabled_toolsets=['todo','memory','browser'], cli=['file',
'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools()
now resolves 'todo' as enabled while 'memory'/'browser' (untouched)
remain disabled.
Added 4 regression tests. Full tools_config suite: 101 passed
(97 existing + 4 new), no regressions.
Fixes#49995
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:
- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
console python.exe and flashes a conhost the desktop backend can't suppress.
Both backend creators put the venv site-packages on PYTHONPATH so imports
still resolve under the base interpreter. (main's commit said this Electron
leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
windows_hide_flags().
All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
The Windows desktop GUI runs its backend headless via pythonw.exe. Several
auxiliary subprocess sites that run inside that windowless backend spawned
console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill)
WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and
flashed a black window on screen — sometimes continuously (the dashboard
Projects-tree git probe alone fired ~118 spawns in 60s on startup).
The terminal tool, cron, browser, code_execution, and gateway-spawn paths
already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs
were missed. Wire the existing helper into them:
- tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the
cp950 UnicodeDecodeError on CJK paths from the same site)
- agent/coding_context.py: _git (per-turn git status/log/diff)
- agent/context_references.py: _run_git + _rg_files (@file/@ref resolution)
- hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto)
- hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan
- hermes_cli/main.py: wmic stale-dashboard PID scan
- gateway/status.py: taskkill /T /F force-kill
windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on
Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls
all pass creationflags=CREATE_NO_WINDOW).
Scoped to the windowless-backend paths that cause the reported flashing. The
Electron updater-handoff leg (main.cjs windowsHide:false) and the
interactive-CLI banner probes (cli.py) are intentionally NOT touched here —
the former needs a Windows-tested change of its own, the latter runs in a
visible console anyway.
Tracking: #54220
Refs: #53178#53631#53781#53957#49602#52982#53424#53053#53016
OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token. The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.
Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run(). The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.
Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD. Without the guard:
✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
✓ Connected (1487ms) — 63 tools discovered
Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
DocuSeal), but only passes when POST returns 2xx + MCP content-type.
Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
(making it OAuth-aware), but is labeled duplicate of #37598 and may not
land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.
Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
_preflight_content_type raises NonMcpEndpointError for 200 text/html
(the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
_preflight_content_type when auth_type=="oauth", using class-level
monkeypatching so the gate is exercised without a live MCP transport.
23 passed tests/tools/test_mcp_preflight_content_type.py
When a stale lock file survives a gateway crash, `acquire_scoped_lock()`
may return `(False, existing_dict)` even after detecting and deleting
the stale lock (e.g. if unlink fails or a race condition occurs).
Previously, `_acquire_platform_lock()` called
`_set_fatal_error(..., retryable=False)`, which permanently killed the
platform — the reconnect watcher never retries a non-retryable fatal
error.
Change to `retryable=True` so the platform enters the "retrying"
state and the reconnect watcher can attempt acquisition again after the
standard backoff delay.
Fixes#54167