hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-31 06:51:29 +00:00

Author	SHA1	Message	Date
Ben	472be1247d	fix(service_manager): pass encoding to Path.read_text in _s6_running PR #30136 CI: ruff PLW1514 (preview rule unspecified-encoding) failed on `Path('/proc/1/comm').read_text().strip()` introduced by commit `2f8ceeab9` (the daimon-nous critical-bug fix that switched s6 detection off /proc/1/exe to /proc/1/comm so it works for the unprivileged hermes user). Add explicit encoding='utf-8'. /proc/1/comm is always plain ASCII (the kernel's PR_GET_NAME / TASK_COMM_LEN buffer), so utf-8 is correct and locale-independent.	2026-05-25 10:32:36 +10:00
Ben Barclay	59da190512	Merge branch 'main' into docker_s6	2026-05-25 09:39:27 +10:00
leeseoki0	ce529d6072	fix(kanban): scratch tasks must not inherit board.default_workdir (#28818 ) Board defaults represent persistent project checkouts. Scratch workspaces are auto-deleted on completion and must stay under the per-board scratch root that resolve_workspace() creates. Inheriting default_workdir for a scratch task pointed the cleanup path at the user's source tree — the data-loss vector documented in #28818. The containment guard in _cleanup_workspace (just added) is the safety rail. This commit prevents the bad state from being created in the first place: only persistent kinds (dir/worktree) inherit board defaults. Tests updated to cover the new semantics: scratch with default_workdir set keeps workspace_path=None; dir/worktree still inherits the board default. Salvaged from PR #31315 by @leeseoki0 — prevention layer on top of the #28819 containment fix by @briandevans. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-24 15:48:58 -07:00
briandevans	23115b5c0f	fix(kanban): restrict managed-scratch roots to workspaces/ dirs only Copilot review on PR #28819 flagged that `_is_managed_scratch_path` accepted the entire `<kanban_home>/kanban` subtree as managed scratch storage. With that, a task whose `workspace_kind='scratch'` and `workspace_path` was mis-set to `<kanban_home>/kanban`, `.../kanban/logs`, or a board's metadata directory (e.g. `.../kanban/boards/<slug>` without the `workspaces/` child) would pass the containment guard and let task completion `shutil.rmtree` Hermes' own DB, metadata, and log subtrees. Tighten the guard: * Allowed roots are now exclusively `workspaces/` directories — the `HERMES_KANBAN_WORKSPACES_ROOT` override, `<kanban_home>/kanban/workspaces`, and each `<kanban_home>/kanban/boards/<slug>/workspaces` discovered on disk. * Require strict descendancy: a path equal to a root itself is rejected too, because deleting a workspaces root would wipe every task's scratch dir at once. Add a regression test covering the three Copilot-named attack paths (kanban root, kanban/logs, board root without `workspaces/`) plus the workspaces-root-itself case, and confirm the inner task-id dir still matches.	2026-05-24 15:48:58 -07:00
briandevans	80ad1609c8	fix(kanban): refuse to rmtree workspace_path outside managed scratch root (#28818 ) A board's ``default_workdir`` (e.g. ``hermes kanban boards set-default-workdir my-board /path/to/real/source``) is copied into ``tasks.workspace_path`` for tasks created without an explicit ``workspace_kind``. Those tasks default to ``workspace_kind='scratch'``, so completion calls ``_cleanup_workspace`` and unconditionally runs ``shutil.rmtree(wp, ignore_errors=True)`` — deleting the user's real source tree as if it were disposable scratch storage. Add ``_is_managed_scratch_path()`` and gate ``_cleanup_workspace`` on it: only delete paths under ``HERMES_KANBAN_WORKSPACES_ROOT`` (the worker-side override the dispatcher injects) or under the active kanban home's ``kanban/`` subtree (covering both the legacy default-board root and per-board ``kanban/boards/<slug>/workspaces`` roots). Anything else gets a warning log and is left alone, so a misconfigured ``default_workdir`` can no longer destroy user data on task completion.	2026-05-24 15:48:58 -07:00
helix4u	514f5020c7	fix(debug): redact BlueBubbles webhook secrets	2026-05-24 15:43:48 -07:00
Teknium	13b85bc646	feat(config): document resume-recap tuning keys in DEFAULT_CONFIG The hardcoded constants in _display_resumed_history were exposed as config in PR #4434; declare them in DEFAULT_CONFIG and the CLI fallback dict so they show up in 'hermes config' diagnostics and the schema validator.	2026-05-24 15:36:37 -07:00
Teknium	54e61f9331	fix(matrix,gateway): Matrix E2EE installs full dep set; plugins respect is_connected Fixes #31116 — two distinct bugs in fresh-install Matrix gateway: 1. Matrix E2EE setup installed only mautrix[encryption], leaving asyncpg / aiosqlite / Markdown / aiohttp-socks uninstalled. The first encrypted connect failed with 'No module named asyncpg' deep inside MatrixAdapter.connect(). Root cause: the setup wizard hand-rolled a pip install of one package instead of using lazy_deps.ensure( 'platform.matrix'), and check_matrix_requirements() short-circuited the runtime installer on 'import mautrix' alone — so the other 4 packages were never pulled in. 2. Discord auto-enabled itself on every gateway start, even when the user never selected Discord and had no DISCORD_BOT_TOKEN. Root cause: gateway/config.py plugin-enablement loop gated enablement on entry.check_fn() (just 'is the SDK importable?') and ignored entry.is_connected (the 'did the user configure credentials?' probe). Same bug class as commit `7849a3d73` fixed for _platform_status in the setup wizard; this is the runtime counterpart. Affects Discord, Teams, and Google Chat. Changes: - hermes_cli/setup.py::_setup_matrix — install via lazy_deps.ensure('platform.matrix') to pull the full feature group. - gateway/platforms/matrix.py::_check_e2ee_deps — verify asyncpg + aiosqlite + PgCryptoStore in addition to OlmMachine, so E2EE failures surface at startup instead of at first encrypted-room connect. - gateway/platforms/matrix.py::check_matrix_requirements — use feature_missing('platform.matrix') as the install gate instead of a single 'import mautrix' check, so partial installs trigger the lazy installer correctly. - gateway/config.py plugin-enablement loop — consult entry.is_connected before flipping enabled=True. Explicit YAML enabled=true still wins. Tests: 3 new in tests/gateway/test_matrix.py (asyncpg-required, aiosqlite-required, partial-install lazy-runs), 5 new in tests/gateway/test_platform_registry.py (is_connected=False blocks, is_connected=True enables, is_connected=None falls back to check_fn, raising probe doesn't enable, explicit YAML wins). Validation: 310 tests across affected test modules pass.	2026-05-24 15:16:03 -07:00
Teknium	7ab1677362	feat(security): on-demand supply-chain audit via OSV.dev (#31460 ) Adds 'hermes security audit' — a one-shot vulnerability scan against OSV.dev covering three surfaces a Hermes user actually controls: 1. The running Python's installed PyPI dists (importlib.metadata) 2. Plugin requirements.txt / pyproject.toml pins under ~/.hermes/plugins/ 3. Pinned npx/uvx MCP servers in config.yaml Zero new dependencies (stdlib urllib + importlib.metadata + tomllib + concurrent.futures). No auth required for OSV's public batch API. Flags: --json, --fail-on {low,moderate,high,critical} (default: critical), --skip-venv, --skip-plugins, --skip-mcp Output groups findings by source, sorts by severity descending, surfaces fixed-versions inline. Exit 1 when any finding meets the --fail-on tier. Deliberately out of scope: globally-installed pip/npm, editor/browser extensions, daily background scans, auto-blocking of installs. The audit is on-demand by design — daily scans become noise the user trains themselves to ignore.	2026-05-24 15:15:16 -07:00
hinotoi-agent	2e66eefbc3	fix(dashboard): validate WebSocket Host and Origin	2026-05-24 15:00:44 -07:00
Hinotoi-agent	3bace071bf	fix(state): restrict sensitive store file permissions response_store.db (api server) holds conversation history including tool payloads, prompts, and results. webhook_subscriptions.json holds per-route HMAC secrets. Under a permissive umask (e.g. 0o022, default on most distros) both files were created mode 0o644 — readable by other local users on shared boxes. - gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM sidecars to 0o600 after __init__, then trusts the inode. (Original contributor patch chmod'd after every _commit() — wasteful on a hot api_server path; chmod-on-create is sufficient since SQLite preserves mode bits across writes.) - hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp (which itself creates the file with 0o600), chmods the temp before the atomic rename, and re-asserts 0o600 on the destination so an existing permissive file from before this fix gets narrowed. Tests cover (a) creation under permissive umask leaves 0o600 and (b) an existing 0o644 webhook_subscriptions.json gets narrowed on next save. Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply on Windows. Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py side from chmod-on-every-commit to chmod-on-create. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-24 04:55:18 -07:00
Jiaming Guo	ee002e7fc5	fix(dashboard): require auth for plugin rescan (#27340 )	2026-05-24 04:45:07 -07:00
Teknium	be27bfed01	security: harden API server key placeholder handling (#30738 )	2026-05-24 04:25:32 -07:00
Teknium	1f897b0dc9	fix(gateway): stop enabling dingtalk allow-all during setup (#30743 )	2026-05-24 04:24:44 -07:00
Teknium	9732559864	fix(security): restrict dashboard websockets to loopback clients (#30741 )	2026-05-24 04:24:40 -07:00
Teknium	bc3f1f4f34	feat(secrets/bitwarden): EU Cloud + self-hosted server URL support (#31378 ) Closes #31370. bws defaults to the US identity endpoint, so EU Cloud and self-hosted machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"} during 'hermes secrets bitwarden setup'. The token is valid — it's just being checked against the wrong region. Add a Bitwarden region step to the wizard between the access-token and project-list steps: Step 1 Install bws Step 2 Provide access token Step 3 Pick region <-- new (US / EU / self-hosted-custom-URL) Step 4 Pick project (now talks to the right endpoint) Step 5 Test fetch Region is stored in config.yaml as secrets.bitwarden.server_url and plumbed into every bws subprocess as BWS_SERVER_URL (project list, secret list, test fetch, and the env_loader startup pull). Also: - Non-interactive: 'hermes secrets bitwarden setup --server-url ...' - Pre-existing BWS_SERVER_URL in the shell is detected and reused - Cache key includes server_url so EU/US fetches don't collide - 'hermes secrets bitwarden status' shows the configured region - 'invalid_client' / '400 Bad Request' from bws now triggers a hint pointing at the region setting instead of looking like a bad token	2026-05-24 02:19:57 -07:00
Teknium	b207dc28b3	feat(kanban): --ids bulk promote + AUTHOR_MAP entry for #29464 Adds an --ids flag to 'hermes kanban promote' mirroring the existing block/schedule convention, so the marquee use case from issue #28822 (promote all children of a closed organizational parent in one shot) doesn't require a shell loop. Single-id JSON output stays a flat object for back-compat; bulk emits a list. Dedupes positional + --ids so the same id can't be promoted twice in one call. 5 new CLI-level tests cover bulk happy path, partial-failure exit code, JSON shapes, and dedup. Also adds the thedavidmurray noreply-email -> github-login mapping in scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP contributor-credit check.	2026-05-23 23:10:36 -07:00
David Murray	d46adad22f	feat(cli): kanban promote verb for manual todo->ready recovery Adds `hermes kanban promote <task_id>` for manual lifecycle recovery when an auto-promote daemon misses the parent-done transition (issue #28822). Refuses promotion unless every parent dep is done/archived (override with --force). Emits a `promoted_manual` audit event distinct from the automatic `promoted` kind, so audit consumers can filter human-driven from system-driven promotions. Supports --dry-run and --json for orchestration. Does not mutate assignee/claim state — the dispatcher picks the card up via its normal ready polling path. Closes #28822.	2026-05-23 23:10:36 -07:00
honor2030	6a1aa420e7	Fix CLI verbose tool progress config fallback	2026-05-23 21:03:51 -07:00
Teknium	e42fcc5625	fix(provider): make config.yaml model.provider the single source of truth (#31222 ) Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER was leaking behavioral config into the .env surface, including from the gateway, which bypassed config.yaml entirely. Behavior: - gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs. Gateway now flows through resolve_runtime_provider() with no `requested` override, which reads model.provider from config.yaml first. Docs/UX (strip env var from user-facing surface): - --provider help text no longer mentions the env var - cli-config.yaml.example same - reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and the cross-reference from HERMES_INFERENCE_MODEL - reference/cli-commands.md: blank the env-var column for --provider - guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider - developer-guide/adding-providers.md, model-provider-plugin.md: reframe Internal mechanism (kept as-is): - hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env - tui_gateway/server.py reads it on TUI startup - resolve_requested_provider() / oneshot.py / cli.py still fall through to the env var as a last-resort behind config.yaml, which is what makes the TUI parent->child handoff work This stays. We just stop documenting it as a user knob. Tests: tests/gateway/test_auth_fallback.py — simplify mock to fail on first call, succeed on second; drop monkeypatch.setenv lines that no longer matter. Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying issue but proposed aligning gateway to the env var rather than removing it).	2026-05-23 18:18:41 -07:00
Edison	e752c9454e	feat(plugins): add register_auxiliary_task() to PluginContext API Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently require modifications to core files for any plugin that needs its own task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and the hardcoded env-var bridging dict in gateway/run.py. This violates the 'plugins must not modify core files' rule and forces every memory or context plugin that wants its own auxiliary task to either fork core or open a coupled core+plugin PR. This change adds a generic plugin surface for auxiliary task registration: ctx.register_auxiliary_task( key='memory_retain_filter', display_name='Memory retain filter', description='hindsight pre-retain dedup/extract', defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}}, ) After registration, the task automatically: - Appears in 'hermes model → Configure auxiliary models' picker via a new _all_aux_tasks() merge of built-in + plugin tasks - Has its provider/model/base_url/api_key bridged from config.yaml to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup (gateway/run.py now uses a dynamic bridged-keys set instead of a hardcoded per-task dict) - Gets plugin-declared defaults (timeout, extra_body, etc.) layered underneath user config so unconfigured plugin tasks still work (agent/auxiliary_client._get_auxiliary_task_config) - Resets to auto via 'Reset all to auto' alongside built-ins Validation: - Rejects shadowing of built-in keys (vision, compression, etc.) - Rejects invalid key shapes (must match [A-Za-z0-9_]+) - Rejects cross-plugin collisions (clear error) - Allows same-plugin re-registration (idempotent updates) Plugin discovery failures (rare) fall back gracefully — the aux config UI still shows built-in tasks if get_plugin_auxiliary_tasks() raises, and gateway env-var bridging keeps working for built-ins. Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're the baseline UX, and DEFAULT_CONFIG already ships their defaults. Plugin tasks layer on top. Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API validation, manager state lifecycle, helper sort order, _all_aux_tasks merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and default-layering in auxiliary_client. Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge) to assert the new dynamic shape rather than the hardcoded literal env var names which no longer appear post-refactor. Motivation: this unblocks PR #20262 (hindsight smart retain pipeline) and similar plugins that need a dedicated aux task slot. The change is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.) keep working since they're produced by the same f-string template that built the hardcoded names.	2026-05-23 17:49:47 -07:00
soynchux	e8fa415a9e	fix(cli): validate runtime token refresh capability in Qwen auth status	2026-05-23 17:47:36 -07:00
teknium1	4254f7dd17	refactor(skills): slim AST diagnostic to single entry point Trim ~600 LOC off the original contribution while keeping the same operator-facing surface and detection coverage. - Collapse three entry points (file / dir / bundle) into one ast_scan_path(path) that handles both files and directories. - Drop AstFinding dataclass + severity field — replaced with plain (file, line, pattern_id, description) tuples. Severity ordering was display-only for a diagnostic that explicitly disclaims security verdicts, so the field added bookkeeping without earning its place. - Replace Rich-markup formatter with plain text grouped by file. - Drop the 'inspect --ast-deep' surface — same scanner, same output as 'audit --deep', single CLI entry is enough. Operators audit after install; pre-install inspection signal isn't worth the second surface. - Trim test file to the cases that earn their place: bypass payload, syntax error survival, RecursionError survival, false-positive guard (importer lookalike), literal-arg false-positive guard, non-.py ignored, directory recursion + cache-dir skipping, missing-path, getattr/__dict__ detection, formatter empty + populated. Net: tools/skills_ast_audit.py 353 -> 133 LOC, tests/tools/test_skills_ast_audit.py 299 -> 103 LOC, full diff +704/-12 -> +264/-6. No change to tools/skills_guard.py — Skills Guard verdicts remain untouched per SECURITY.md §2.4.	2026-05-23 17:47:26 -07:00
Tranquil-Flow	7255050c99	feat(skills): add opt-in AST deep diagnostics Add opt-in AST diagnostics for skill review without making Skills Guard stricter by default. - Add hermes skills inspect --ast-deep to scan fetched skill bundles before installation - Add hermes skills audit --deep to scan already-installed hub skills - Keep AST analysis in tools/skills_ast_audit.py, separate from tools/skills_guard.py - Label output as diagnostic hints, not security verdicts - Cover dynamic import/access patterns: importlib, __import__(computed), getattr(computed), and __dict__[computed] This follows the maintainer guidance from closed PR #7436: useful AST-level analysis belongs in an opt-in diagnostic path, not in Skills Guard's default heuristic scan.	2026-05-23 17:47:26 -07:00
Yuan Li	75643a6154	fix(env): strip null bytes from .env before python-dotenv loads Null bytes in API key values (introduced by copy-paste) crash os.environ[k] = v with ValueError: embedded null byte, preventing hermes from starting at all.	2026-05-23 17:17:05 -07:00
0xchainer	2c34a7da87	fix(cli): prevent temp directory leak on ZIP update failure Move shutil.rmtree into a finally block so the temp directory is always cleaned up, even when an exception occurs during download, extraction, or file copying.	2026-05-23 16:16:35 -07:00
Teknium	6a8e131a0a	refactor(ntfy): convert built-in adapter to platform plugin ntfy now ships as a self-contained plugin under plugins/platforms/ntfy/ instead of editing 8 core files (gateway/config.py Platform enum, gateway/run.py factory + auth maps, cron/scheduler.py, toolsets.py, hermes_cli/status.py, agent/prompt_builder.py, gateway/channel_directory.py, tools/send_message_tool.py). All routing goes through gateway/platform_registry via register_platform(): - adapter_factory, check_fn, validate_config, is_connected - env_enablement_fn seeds PlatformConfig.extra from NTFY_* env vars so gateway status reflects env-only setups without instantiating httpx - standalone_sender_fn handles deliver=ntfy cron jobs when cron runs out-of-process from the gateway - allowed_users_env / allow_all_env hook into _is_user_authorized - cron_deliver_env_var=NTFY_HOME_CHANNEL for cron home routing - platform_hint surfaces in the system prompt - pii_safe=True (topic names are the only identifier; no PII to redact) Tests moved to tests/gateway/test_ntfy_plugin.py using _plugin_adapter_loader so the module lives under plugin_adapter_ntfy in sys.modules and cannot collide with sibling plugin-adapter tests on the same xdist worker. The core-file grep tests (Platform.NTFY in source, hermes-ntfy in toolsets, etc.) are replaced with plugin-shape tests covering register() metadata, env_enablement_fn output, and standalone_sender_fn behavior. 68 tests pass under scripts/run_tests.sh.	2026-05-23 16:13:01 -07:00
sprmn24	b10f17bf1e	feat(ntfy): add ntfy platform adapter with atomic reconnect, identity fix, and 81 tests	2026-05-23 16:13:01 -07:00
Teknium	ad11327db0	feat(kanban): warn users that scratch workspaces are deleted on completion (#30949 ) First scratch workspace creation on an install now emits a one-shot warning log + a 'tip_scratch_workspace' event on the task. Sentinel file at ~/.hermes/kanban/.scratch_tip_shown silences subsequent creations across the whole install. Behavior unchanged — scratch is still ephemeral by design. This just makes the design visible to new users (reported in user community: 'progress files vanished, no warning anywhere'). Docs (en + ko) updated to spell out 'Deleted when the task completes' on the scratch bullet and 'Preserved on completion' on worktree/dir.	2026-05-23 11:27:00 -07:00
teknium1	c4b8f5efee	fix(kanban): harden corrupt-db backup against CodeQL path-injection findings Path.resolve() before any I/O and confine backup writes to the resolved parent directory. Adds explicit parent-equality assertions so static analyzers see the containment guarantee, and walks WAL/SHM sidecars through the same resolved-parent path so accidental .. segments are collapsed before shutil.copy2. Functionally equivalent to the original PR; preserves the corrupt bytes to <db>.corrupt.<ts>.bak in the same directory, still raises KanbanDbCorruptError from connect(). E2E with Stefan's exact hex header + malformed pages still passes. 163/163 kanban tests still pass.	2026-05-23 05:51:33 -07:00
Nick	39fe4ecee3	fix(kanban): refuse corrupt db auto-init	2026-05-23 05:51:33 -07:00
QuenVix	7245bc77eb	fix(fallback): merge fallback_providers with legacy fallback_model configurations	2026-05-23 05:24:57 -07:00
Teknium	b4cf5b65dd	feat(portal): one-shot setup, status CLI, and Nous-included markers (#30860 ) * feat(portal): one-shot setup, status CLI, and Nous-included markers Four small Portal-aware surfaces that drive subscription value without adding friction for non-Portal users. - hermes setup --portal: one-shot Nous OAuth + provider switch + Tool Gateway opt-in. Shareable as a single command from docs/social. - hermes portal {status,open,tools}: small surface over Portal auth + Tool Gateway routing. Defaults to 'status' when no subcommand. - Tool picker (hermes tools): when the user is logged into Nous, mark Nous-managed provider rows with a star and 'Included with your Nous subscription'. Suppressed when not authed — non-subscribers see the picker unchanged. - BYOK setup hint: a single dim line 'Available through Nous Portal subscription.' appears when the user is being prompted for a paid API key (Firecrawl, FAL, ElevenLabs, Browserbase, etc.) AND the category has a Nous-managed sibling AND the user is not already authed to Nous. Suppressed in all other cases. Tested live end-to-end in an isolated HERMES_HOME with a simulated authed and unauthed user. Targeted suite (tests/hermes_cli/ test_tools_config.py + test_setup.py) passes 97/97. * fix: add portal to _BUILTIN_SUBCOMMANDS so plugin discovery fast-path skips it	2026-05-23 02:39:09 -07:00
sprmn24	b183be95a2	fix(gateway-windows): atomic write for .cmd and startup launcher scripts	2026-05-23 02:30:41 -07:00
xxxigm	8bf99227f0	fix(plugins): block plugin-api path traversal + project RCE (#29156 ) GHSA-5qr3-c538-wm9j — half two of the bypass chain. ``_mount_plugin_api_routes`` imports each dashboard plugin's manifest ``api`` field as a Python module via ``importlib.util.spec_from_file_location`` — arbitrary code execution by design. Two primitives in the surrounding code turned that "by design" RCE into a usable attack: 1. Absolute paths in the manifest swallow the plugin directory. ``Path('safe/dashboard') / '/tmp/evil.py'`` resolves to ``/tmp/evil.py``, so a single manifest line ``{"api": "/tmp/payload.py"}`` was enough to redirect the importer at any Python file on disk. 2. ``..`` traversal in the manifest climbs out of the dashboard directory. ``Path('plugins/safe/dashboard') / '../../../tmp/evil.py'`` lands in ``/tmp/evil.py`` after ``resolve()`` — the static-asset handler (``serve_plugin_asset``) already defends against this via ``is_relative_to``; the api-mount path didn't. Fix at three layers so a regression in any one can't re-open the advisory: * New ``_safe_plugin_api_relpath`` validator runs at discovery time and stores only sanitised relative paths on the plugin entry's ``_api_file`` field. Absolute paths, ``..`` traversal, empty / non-string values, and paths that ``resolve()`` outside the plugin's ``dashboard/`` directory are rejected with a warning naming the plugin. ``has_api`` follows the sanitised value so the dashboard frontend doesn't render a fake "Backend API" badge for plugins whose api was scrubbed. * ``_mount_plugin_api_routes`` re-validates the resolved path against the live filesystem just before the import — defence in depth in case ``_dir`` is tampered with post-cache or a future caller bypasses the discovery-time validator. * Project plugins (``source == "project"``) are refused outright for backend import. ``./.hermes/plugins/`` ships with the CWD, so any threat model that includes "user opens a malicious repo" treats it as attacker-controlled; project plugins can still extend the UI via static JS/CSS but their Python ``api`` is no longer auto-imported. Combined with the truthy env-gate fix from the previous commit, the original advisory chain now fails at two distinct choke points.	2026-05-23 01:43:52 -07:00
xxxigm	09f85f2cf7	fix(plugins): apply truthy env semantics to project-plugin gate (#29156 ) GHSA-5qr3-c538-wm9j — half one of the bypass chain. ``_discover_dashboard_plugins`` opted into the untrusted ``./.hermes/ plugins/`` source via ``if os.environ.get("HERMES_ENABLE_PROJECT_ PLUGINS"):`` — which is True for any non-empty string. ``=0``, ``=false``, ``=no``, ``=off`` all return non-empty strings and so enabled the project source even though every operator (and the agent loader, ``hermes_cli/plugins.py`` line 815) reads those values as "disabled". An attacker who can land a manifest under the CWD's ``.hermes/plugins/`` directory — a malicious cloned repo, a worktree checked out from a forked PR, a CI runner workspace — was therefore guaranteed to get their manifest discovered the moment the user ran ``hermes dashboard`` from that directory, regardless of whether the user thought they had project plugins disabled. Switch to the shared ``utils.env_var_enabled`` helper used by the agent loader so the gate accepts the documented truthy set (``1`` / ``true`` / ``yes`` / ``on``, case-insensitive) and treats everything else — including ``0`` / ``false`` / ``no`` — as off. Half two (path-traversal + project-source ``api`` import) lands in the next commit. Together they break the RCE chain at two distinct choke points so a future regression in either one alone can't re-open the advisory.	2026-05-23 01:43:52 -07:00
Ben	541b40532a	fix(container_boot): publish reconciled service dirs atomically PR #30136 review noted the asymmetry: `register_profile_gateway` used tmp_dir + rename to publish a new service slot atomically, but the boot-time reconciler wrote files into the slot directly. Same underlying concern (a concurrent s6-svscan rescan could observe a half-populated directory), different code path. Rewrite `container_boot._register_service` to mirror the manager: build everything in `<scandir>/gateway-<profile>.tmp/`, then `Path.replace` into place. If a previous interrupted run left a `.tmp` sibling, it's cleaned up before the new build starts. If the target already exists, it's removed before the rename so `Path.replace` doesn't error on a non-empty target (Linux `rename` overwrites empty targets only). Three new tests: atomic publication leaves no .tmp leftovers, overwriting an existing slot still leaves no .tmp leftovers, and a stale .tmp from an interrupted run is cleaned up automatically.	2026-05-23 15:34:51 +10:00
Ben	5b1fcdd16b	fix(container_boot): rotate container-boot.log when it exceeds 256 KiB PR #30136 review noted: container-boot.log was append-only with no rotation. On a long-lived container with frequent restarts and many profiles it would grow unboundedly (~80 B per profile per reconcile pass). Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`, ≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the current file is renamed to `container-boot.log.1` (replacing any existing one) before new entries are appended. Worst case is two files at ~512 KiB — well within visibility limits for grep/cat. Rotation is intentionally simple (no logrotate or s6-log machinery for one append-only file). Failures during rotation are logged via the module logger and treated as non-fatal — we keep appending to the existing file rather than dropping the reconcile entry. Three new unit tests cover above-threshold rotation, below-threshold non-rotation, and overwrite of an existing .1 file.	2026-05-23 15:33:11 +10:00
Ben	8b6733ebe2	fix(service_manager): rip out dead port parameter PR #30136 review caught: `_allocate_gateway_port()` in profiles.py computed a SHA-256-derived port that was threaded through `register_profile_gateway(profile, port=N)` → `_render_run_script(profile, port, extra_env)` → and then ignored. The rendered run script picked the bind port from the profile's config.yaml (`[gateway] port = …`), never from the allocator. So the entire allocator + parameter chain was dead code. Remove: * `hermes_cli.profiles._allocate_gateway_port` (deterministic SHA-256 → [9200, 9800) — never used). * `port` kwarg from `ServiceManager.register_profile_gateway` (Protocol + Mixin + S6 implementation). * `port` positional arg from `_render_run_script(profile, port, extra_env)` — now `_render_run_script(profile, extra_env)`. * The pass-through call in `profiles._maybe_register_gateway_service`. config.yaml is now the single source of truth for gateway port selection — matches reality and reduces the API surface. Three explanatory comments in service_manager.py / profiles.py document the retirement so future readers don't reach for the allocator and find a ghost. Tests: drop the three `_allocate_gateway_port` tests; update fakes' signatures throughout test_service_manager.py and test_profiles_s6_hooks.py to match the new no-port API.	2026-05-23 15:30:15 +10:00
Ben	1759c0f090	fix(service_manager): friendly errors for missing slots and s6-svc failures PR #30136 review caught: `S6ServiceManager.start/stop/restart` called `subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as a raw `CalledProcessError` traceback. The two cases operators actually hit are: 1. The service slot doesn't exist — most commonly because the user typed a profile name wrong (`hermes -p typo gateway start`). 2. s6-svc itself fails — most commonly EACCES on the supervise control FIFO when running unprivileged. Both deserve named errors with actionable messages, not stacktraces. Changes: * Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`: - `GatewayNotRegisteredError(profile)` — carries the unprefixed profile name; message: `no such gateway 'typo': register it with `hermes profile create typo` first, or pass an existing profile name via `-p <name>``. - `S6CommandError(service, action, returncode, stderr)` — carries the s6-svc rc and stderr; message: `s6-svc start on 'gateway-coder' failed (rc=111): <stderr>`. * Factor lifecycle dispatch through `_run_svc(flag, label, name)`: pre-checks that the service directory exists (raises GatewayNotRegisteredError before invoking s6-svc), then runs s6-svc and translates any CalledProcessError into S6CommandError. * `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway` catches both errors and prints `✗ <message>` + `sys.exit(1)` instead of letting the exception bubble. The dispatch path that used to dump a traceback at the user now gives an actionable one-liner. Tests: 6 new tests for the error types and their CLI rendering; existing lifecycle test pre-seeds the slot directory before calling `mgr.start` etc.	2026-05-23 15:20:41 +10:00
Ben	367c15b1dc	fix(container_boot): always register gateway-default slot PR #30136 review caught: `hermes gateway start` (no `-p`) inside the container resolves `_profile_suffix() == ""` → service name `gateway-default`, but no such slot was ever registered. The Phase 4 profile-create hook only fired on `hermes profile create <name>`, and the root profile (which lives at the top of $HERMES_HOME, not under `profiles/`) was never one of those. So bare `hermes gateway start` landed on `s6-svc -u /run/service/gateway-default` → uncaught `CalledProcessError` → traceback to the user. Changes: 1. `reconcile_profile_gateways` now always registers a `gateway-default` slot before iterating named profiles. Its prior state is read from `$HERMES_HOME/gateway_state.json` (sibling to the profile root, not under `profiles/`); stale runtime files there are swept the same way. Auto-up only if the prior state was `running` — same rule as named profiles. 2. `S6ServiceManager._render_run_script` special-cases `profile == "default"` to emit `hermes gateway run` with NO `-p` flag. Passing `-p default` would resolve to `$HERMES_HOME/profiles/default/` — a different profile that almost certainly doesn't exist. The empty profile-suffix convention is the dispatcher's contract and the run script has to match. 3. A user-created `profiles/default/` collides with the reserved root-profile slot; the reconciler now skips it with a warning rather than producing two registrations of the same service name. Action-list ordering is stable: `default` first, then named profiles in directory order. Boot-log readers can rely on this. Tests: 8 new dedicated default-slot tests plus updates to every existing test that asserted against the action list (via the new `_named_actions` helper that drops the always-present default entry).	2026-05-23 15:16:35 +10:00
Ben	efd3569739	fix(gateway): route --all stop/restart through s6 under container PR #30136 review caught that `hermes gateway stop --all` and `... restart --all` were broken under s6. The Phase 4 dispatcher was gated on `not stop_all` (and the symmetric restart_all), so `--all` fell through to `kill_gateway_processes(all_profiles=True)`. pkill SIGTERMed every gateway, s6-supervise observed the crashes, and restarted every gateway ~1s later — net effect: `--all` kicked gateways instead of stopping them. Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates `mgr.list_profile_gateways()` and routes stop/restart through each service slot. s6's `want up`/`want down` flips correctly, so a stop persists. Partial failures are surfaced per-profile with a running success count; the host pkill path is only reached when s6 isn't in play. `start --all` isn't a CLI surface — the helper rejects it and returns False (host code path can take over).	2026-05-23 15:08:17 +10:00
Ben	2f8ceeab9a	fix(service_manager): s6 detection works for unprivileged hermes user PR #30136 review surfaced two issues, both rooted in the same audit gap: docker integration tests were running as root, not the unprivileged `hermes` user (UID 10000) that the runtime actually uses via `s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to the s6 control surface worked as root in the tests but was inert in production. Fixes: 1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`, which is root-only readable. For UID 10000 the symlink yields PermissionError, `resolve()` silently returns the unresolved path, and `exe.name == "exe"` — so detection always returned False, the service-manager runtime-registration path was inert, and every `hermes profile create` / `hermes -p X gateway start` silently skipped the s6 hook. Replace with `/proc/1/comm` (world-readable) + `/run/s6/basedir` (s6-overlay-specific) — both required, fail closed. 2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/` {control,lock} to hermes so `s6-svscanctl -a/-an` works without root. Previously the directory chown stopped at `/run/service` and the FIFO inside stayed root-owned, so `register_profile_gateway` from hermes failed at the rescan-trigger step with EACCES — the wrapper in profiles.py caught the exception and printed a swallowed warning, so profile creation appeared to succeed while the slot was rolled back. Audit changes to flush this class of bug next time: - Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py` that default to `-u hermes`. The module docstring explains why and flags `user="root"` as opt-in only for tests that explicitly need root (none currently do). - Refactor every `docker exec` call in tests/docker/ through the new helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py, test_container_restart.py, test_s6_profile_gateway_integration.py). - Add 5 unit tests covering `_s6_running` under various probe states (both signals present; comm wrong; basedir missing; PermissionError on /proc/1/comm; missing /proc — non-Linux). The PermissionError test is the explicit regression guard for the original bug. Known follow-up: the per-service `supervise/control` FIFO inside each `/run/service/gateway-<profile>/supervise/` is created root-owned by s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc -u/-d/-t` from the hermes user will get EACCES on those. The audit under `-u hermes` will reveal this in lifecycle tests — surfacing the issue cleanly so it can be fixed in a focused follow-up (likely via a small SUID helper or a polling chown loop in cont-init.d). The detection + svscanctl fixes here are independent and complete on their own.	2026-05-23 14:56:39 +10:00
teknium1	8cf977c8b1	fix(plugins): widen _sanitize_plugin_name for category-namespaced names Follow-up to PR #28832 — the dashboard plugin routes now accept slashed names like `observability/langfuse` and `image_gen/openai`, but `_sanitize_plugin_name` still rejected forward slash and so dashboard update + remove on those plugins fell through to '404 not found' even though they exist on disk. Adds an opt-in `allow_subdir=True` flag that: - Permits internal forward slashes (category-namespaced plugin keys emitted by `_discover_all_plugins`). - Strips leading and trailing slashes. - Still rejects `..` and backslash, and still asserts the resolved target lives inside `plugins_dir`. Opted in at the two read-paths that operate on installed plugins: `_require_installed_plugin` (CLI update/remove) and `_user_installed_plugin_dir` (dashboard update/remove). The install path keeps the default (`allow_subdir=False`) because freshly-cloned plugins always land top-level under `~/.hermes/plugins/<name>/`. Adds 6 targeted unit tests covering the new flag's allow/reject matrix.	2026-05-22 19:50:32 -07:00
Austin Pickett	487c398dcf	refactor(web): dashboard typography & contrast pass Removes the global `uppercase` + `font-mondwest` from the App.tsx root that forced every page to opt-out, replaces stacked-alpha text colors with semantic tokens for WCAG-AA contrast across all 7 themes, and applies the new `text-display` utility from @nous-research/ui@0.16.0 on intentional brand chrome (page titles, sidebar headings, segmented filters) only. Bumps every sub-12px arbitrary text size to text-xs. Also widens the dashboard plugin routes (/api/dashboard/agent-plugins/ {name:path}/...) so category-namespaced plugins like observability/ langfuse and image_gen/openai can be enable/disabled from the dashboard — previously the FE encodeURIComponent-ed the slash and the backend {name} route rejected it. _validate_plugin_name still blocks .. and backslash, and strips leading/trailing slash. Touches sessions/env/keys page chrome and adds two new i18n keys (`overview`, `showMore`/`showLess`) across all 18 locales. Squashes 19 commits from PR #28832. Co-authored-by: Hermes <noreply@nousresearch.com>	2026-05-22 19:50:32 -07:00
ethernet	f89afdbd17	fix(test): deflake two intermittent CI failures - test_browser_secret_exfil: mock _run_browser_command instead of launching real Chrome (secret check is pre-launch, browser is irrelevant to the assertion) - test_web_server: add time.sleep(0.05) after pub.send_text() to yield the event loop before receive_text(). TestClient's sync mode can race the broadcast handler otherwise, hanging the test.	2026-05-22 19:46:18 -07:00
Teknium	a84cec61ca	fix(minimax-oauth): refresh short-lived access tokens per request (#30619 ) * fix(minimax-oauth): refresh short-lived access tokens per request MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches api_key as a static string at client construction, so a session that resolves credentials once at startup keeps sending the same bearer until MiniMax returns 401 mid-session. Swap the static string for a callable token provider, reusing the existing Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state, which is a no-op when the token still has more than 60s of life left and refreshes proactively otherwise. Refreshes persist to auth.json so other processes (gateway, cron) see them immediately. The wire-up lives at the agent-init / model-switch boundary rather than in resolve_runtime_provider, so aux client paths that hand the api_key string to OpenAI(api_key=...) are unaffected. * docs: add infographic for minimax-oauth token refresh	2026-05-22 15:16:15 -07:00
adybag14-cyber	a3beee475b	perf(termux): speed up bare cli prompt startup	2026-05-22 14:27:38 -07:00
adybag14-cyber	6c3fd9714f	perf(termux): fast-path cli version startup	2026-05-22 14:27:38 -07:00
Teknium	7849a3d73f	fix(gateway,discord-plugin): _platform_status must respect is_connected=False, not silently fall back to check_fn Two bugs surfaced by PR #24356 migrating Discord into the registry: 1. plugins/platforms/discord/adapter.py::_is_connected — read DISCORD_BOT_TOKEN via hermes_cli.gateway.get_env_value (the abstraction tests patch) instead of os.getenv directly. The legacy non-registry path used get_env_value; bypassing it broke test_setup_openclaw_migration which patches gateway_mod.get_env_value to simulate a hermetic env. 2. hermes_cli/gateway.py::_platform_status — when entry.is_connected is defined and returns False, return 'not configured' immediately. Don't fall back to entry.check_fn(), which would let 'SDK is installed' override 'no token configured' and incorrectly report the platform as ready. The fallback to check_fn is the right behaviour only when is_connected is None (not registered). Fixes 5 test failures observed on CI for PR #24356: - tests/hermes_cli/test_setup.py::test_setup_gateway_skips_service_install_when_systemctl_missing - tests/hermes_cli/test_setup.py::test_setup_gateway_in_container_shows_docker_guidance - tests/hermes_cli/test_setup_irc.py::TestIRCGatewaySetupFreshInstall::test_setup_gateway_irc_counts_as_messaging_platform - tests/hermes_cli/test_setup_openclaw_migration.py::TestGetSectionConfigSummary::test_gateway_returns_none_without_tokens - tests/hermes_cli/test_setup_openclaw_migration.py::TestSetupWizardSkipsConfiguredSections::test_sections_skipped_when_migration_imported_settings Same _platform_status bug exists for sibling plugin platforms (teams, google_chat) whose check_fn returns true on SDK install alone; their tests just never exercised the registry path before. The bug only became test-visible when Discord migrated into the registry. Validation: 11,167 tests across tests/gateway/ + tests/cron/ + tests/tools/test_send_message_tool.py + tests/hermes_cli/ pass with zero failures.	2026-05-22 14:21:41 -07:00

1 2 3 4 5 ...

2210 commits