hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Teknium	db96fc60d0	fix(gateway): keep Telegram topic bindings aligned with compression children (#34409) Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in SQLite so reopening a topic resumes the right Hermes session. When compression rotated session_entry.session_id mid-turn, the binding row stayed pointed at the pre-compression parent. On the next inbound message in that topic the gateway reloaded the oversized parent transcript, retriggering preflight compression — sometimes in a loop. Two-pronged fix: 1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper called immediately after each of the three session_id rotation sites in _handle_message_with_agent (hygiene compression, agent-result compression rotation, /compress command). Keeps future bindings fresh. 2. Read-path self-heal: when resolving an existing topic binding, walk SessionDB.get_compression_tip() forward and switch_session to the descendant instead of the stored parent. Rewrites the binding row to the tip so subsequent messages skip the walk. Heals existing stale state on the next user message without requiring a gateway restart. Skipped from competing PRs as not load-bearing for the bug: - advance_session_after_compression SessionStore primitive (#26204/#28870/#33416) — preserves the end_reason='compression' analytics nicety but doesn't affect routing correctness. - Cached-agent eviction on session_id mismatch — _compress_context() already mutates tmp_agent.session_id on the cached object so the in-memory agent self-corrects. - Startup repair pass (#33416) — redundant once the read path heals on the next message; a one-line CLI follow-up can address bindings for topics users never reopen. Closes #20470, #29712, #33414. Acknowledges work in #23195 (@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713 (@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).	2026-05-28 23:25:52 -07:00
Ben	ec7736f8a7	fix(docker): auto-join Docker socket group for docker-in-docker backend When users bind-mount /var/run/docker.sock to use TERMINAL_ENV=docker from inside the container, the supervised hermes user (UID 10000) lacks permission to talk to the socket — every `docker` invocation fails with EACCES and check_terminal_requirements() returns False. In messaging mode this also silently strips the file/terminal toolset from the registered tool list, so the agent rationalizes the missing tools as a platform restriction. The naive workaround (docker run --group-add <socket-gid>) does NOT work with our s6-setuidgid privilege drop: s6-setuidgid calls initgroups() for the target user, which rebuilds supplementary groups from /etc/group. Without a matching /etc/group entry the kernel-granted supplementary group is wiped between PID 1 and the dropped hermes process. Verified empirically: with --group-add 998 alone, PID 1 has Groups: 0 998 → after the drop: Groups: 10000. With this fix's /etc/group entry, `id hermes` shows 998 → after the drop: Groups: 998 10000. Detect the socket's GID at boot in stage2-hook (runs as root before the privilege drop), reuse an existing group name if one matches the GID, otherwise create 'hostdocker'. Idempotent across container restarts. Silent no-op when no socket is mounted. End-to-end verified by building the image and running the supervised hermes user against the real host Docker daemon: `docker version` succeeds and check_terminal_requirements() returns True. Fixes #16703	2026-05-29 16:15:44 +10:00
Ben Barclay	48083211ef	fix(docker): accept PUID/PGID as aliases for HERMES_UID/HERMES_GID (#25872) (#34401) Salvages #25872 by @konsisumer against current main. NAS users (UGOS, Synology, unRAID) expect the LinuxServer.io PUID/PGID convention and bind-mount /opt/data from a host directory owned by their own UID. Without this alias those vars are silently ignored and the s6-setuidgid drop to UID 10000 leaves the runtime unable to read the volume. HERMES_UID/HERMES_GID still take precedence when both are set. The original PR targeted docker/entrypoint.sh, which is now a 27-line deprecation shim under s6-overlay (the May 2026 rework moved all bootstrap logic to docker/stage2-hook.sh, installed as /etc/cont-init.d/01-hermes-setup). Re-applied the same 2-line alias resolution at the equivalent spot in stage2-hook.sh just before the existing UID/GID remap block. Test was retargeted at docker/stage2-hook.sh; docs hunk adapted to current main's wording ("stage2 hook" + s6-setuidgid, not the obsolete "entrypoint drops via gosu") with the NAS bind-mount example preserved verbatim. Test-first regression verification: reverted just docker/stage2-hook.sh to origin/main and re-ran the new tests. Result: FAILED test_stage2_hook_resolves_puid_pgid_aliases FAILED test_puid_pgid_populate_hermes_uid_gid AssertionError: assert ':' == '1000:10'. That's the exact bug shape — PUID=1000 PGID=10 silently ignored, HERMES_UID/HERMES_GID stay empty. With the salvage applied, all 4 tests pass. Closes #25872 Co-authored-by: konsisumer <11262660+konsisumer@users.noreply.github.com>	2026-05-29 16:07:15 +10:00
wysie	a0fc3df878	fix(browser): rewrite Camofox Docker loopback URLs (#25541) Co-authored-by: Wysie <wysie@users.noreply.github.com>	2026-05-29 15:43:55 +10:00
Teknium	f61fd59b62	docs(run_agent): clarify why F401 re-exports stay	2026-05-28 22:26:25 -07:00
Teknium	00b8204cf4	fix: restore side-effect imports in test files (test_kanban_tools, test_command_guards) The previous ruff prune commit removed two categories of test-file imports whose value is the side effect of importing them, not their binding: tests/tools/test_kanban_tools.py — 5 sites `import tools.kanban_tools # ensure registered` The import itself runs tools/kanban_tools.py's @registry.register calls; without it, the kanban tool registry is empty and test_kanban_tools_visible_with_env_var asserts {} != {7 kanban tools}. tests/tools/test_command_guards.py — 1 site `import tools.tirith_security # Ensure the module is importable so we can patch it` The comment names the requirement: keep the bare module reference so subsequent mock.patch("tools.tirith_security.<fn>") calls find a registered submodule. CI failure: test (5) shard, tests/tools/test_kanban_tools.py:58 AssertionError: expected {kanban_*}, got set()	2026-05-28 22:26:25 -07:00
Teknium	e371bf5d68	fix: re-export pruned names for tests that mock.patch or from-import them The mechanical ruff prune in the previous commit removed several names that appear unused inside their defining module but are external test/runtime anchors: run_agent: OpenAI, _SafeWriter; get_tool_definitions, handle_function_call, check_toolset_requirements; estimate_request_tokens_rough; DEFAULT_AGENT_IDENTITY, build_context_files_prompt, build_environment_hints, build_nous_subscription_prompt; _is_destructive_command, _extract_parallel_scope_path, _paths_overlap, _append_subdir_hint_to_multimodal, _trajectory_normalize_msg. tools/web_tools: Firecrawl, _get_firecrawl_client. These get accessed via four channels that are invisible to ruff's in-module usage analysis: 1. `mock.patch('module.name', ...)` in tests — resolves the attribute lazily, so `pytest --collect-only` passes even when the name is gone, but every test using the patch fails at runtime with AttributeError. 2. `from run_agent import X` in production siblings (agent/transports/codex.py, etc.). 3. The `_ra().X` indirection pattern in agent/system_prompt.py et al. — explicitly documented ("Many tests patch('run_agent.load_soul_md')") to preserve the patch contract. 4. `from tools.web_tools import _get_firecrawl_client` in tests. Each re-added import carries an explicit `# noqa: F401` with a comment naming the channel, so future cleanup passes won't strip them again.	2026-05-28 22:26:25 -07:00
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
Teknium	a4d8f0f62a	feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists. Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache). `tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe. Example output on a broken environment (the actual case): Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv). ## Config Both flags live under `agent.` in config.yaml, default True:
    agent:
      task_completion_guidance: true   # universal "finish the job" block
      environment_probe: true          # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs. ## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |	2026-05-28 22:26:09 -07:00
Teknium	75d2c081c9	fix(logging): recover gateway.log handler from external rotation (#34349) External rotation (logrotate, manual `mv gateway.log gateway.log.1`, another process rotating the file) leaves `_ManagedRotatingFileHandler`'s open fd pinned to the renamed inode. All subsequent writes go to the rotated backup instead of the file every operator expects to read, producing the symptom 'gateway.log frozen mid-write while agent.log keeps growing with gateway.* records'. PR #16229 fixed the original CLI->gateway init-order bug (#8404) so the handler attaches in the first place. This is the sibling fix for what happens after attach, when something external rotates underneath us. Adds a WatchedFileHandler-style inode check on emit(): if baseFilename no longer matches the open stream's (dev, ino), close the stale fd and reopen at the expected path. doRollover() refreshes the snapshot so our own rollover isn't misidentified as external. Five regression tests cover the matrix: external rename, external unlink, external truncate (must NOT trigger reopen — inode unchanged), normal doRollover() (must still work), and the end-to-end Allen-reproduction (rotate + re-call setup_logging). 55/55 tests in tests/test_hermes_logging.py pass; 5972/5972 in tests/gateway/ pass.	2026-05-28 22:26:00 -07:00
Teknium	a30480bd2b	fix(compression): prevent session-id fork from concurrent compressions (#34351) * fix(compression): prevent session-id fork from concurrent compressions When two AIAgent instances share the same session_id (most commonly the parent-turn agent and its background-review fork, which inherits session_id verbatim via background_review.py L451), both can call compress_context() on overlapping snapshots of the same conversation. Each ends the parent and creates its own NEW child session in state.db, both parented to the same old id. The gateway SessionEntry only catches one rotation; the other becomes an orphan that silently accumulates writes — Damien's incident shape (parent 20260527_234659_e65f0e → two children, only one visible). Adds a state.db-backed per-session compression lock. Acquired before the rotation in conversation_compression.compress_context(); on failure, the caller returns messages unchanged so the auto-compress retry loop stops cleanly. TTL (5-minute default) reclaims locks abandoned by crashed compressors. Lock holder identity (pid:tid:agent:nonce) is preserved for diagnostics via get_compression_lock_holder(). Schema bumped 13 -> 14 to track the new compression_locks table. Reconciled additively via the existing declarative-column pattern; no data migration needed for existing DBs. Regression test reproduces Damien's shape: two threads racing _compress_context on a shared parent_sid. Without the lock the test deterministically produces 2 child sessions; with the lock, exactly 1. Covers all six compression entry points (preflight in conversation_loop, mid-turn fallback, hygiene compression in gateway, /compact, CLI /compress, TUI /compress). ACP /compress was already protected by nulling out _session_db before its compress call. * ci: trigger rerun (transient GitHub API rate limit on CodeQL workflow)	2026-05-28 21:40:39 -07:00
liuhao1024	28bb7e0a8e	fix(web): bridge Tailwind --font-sans to --theme-font-sans (#20406) Tailwind v4 defines its own --font-sans and --font-mono tokens independently of the Hermes theme variables. Components using font-sans/font-mono utility classes bypass --theme-font-sans and --theme-font-mono, so theme font changes have no effect. Add --font-sans and --font-mono bridges in the @theme inline block so Tailwind's font tokens follow the active Hermes theme. Fixes #20380	2026-05-29 00:19:06 -04:00
teknium1	100536134c	refactor(gateway): generalize topic recovery via adapter hook Replace the runner-introspection trick in #32998 with an explicit `set_topic_recovery_fn` setter on `BasePlatformAdapter`. The gateway runner installs it once at adapter init; the adapter calls `_apply_topic_recovery(event)` before any session keying. Also apply the hook in `BasePlatformAdapter.handle_message` so the running-agent guard and pending-message queue key off the recovered thread_id too — not just the text-batch coalescence. Net change vs #32998 alone: -2 files of indirection (no `_message_handler.__self__` peek, no separate `_normalize_text_batch_source`), +1 generic mechanism (other adapters can install their own hook later).	2026-05-28 21:18:39 -07:00
LeonSGP43	5407d25599	Fix Telegram DM topic text batch keying	2026-05-28 21:18:39 -07:00
Manzela	90f0f32eae	docs(security): add network egress isolation guide for Docker deployments (#26385)	2026-05-29 14:09:10 +10:00
Ben Barclay	40fa0c1d19	fix(docker): skip credential/skills/cache mounts when source is invalid (#24490) (#34331) Salvages #24490 by @liuhao1024 against current main. The Docker daemon will silently auto-create a directory at the host path of any `-v <host>:<container>` bind mount when the host path doesn't exist. In Docker-in-Docker setups (where the outer host's real credential file isn't visible inside the agent's parent container), this leaves a directory at the credential mount source — and the inner `docker run` then refuses to mount a directory over a file destination with exit 125. Add defensive shape guards to all three mount loops in DockerEnvironment.__init__: * credentials (expected: file) — skip + warn on directory or missing * skills (expected: dir) — skip + warn when not a directory * cache (expected: dir) — skip + warn when not a directory. Failed mounts surface as WARN logs rather than crashing the container start. Existing well-formed sources mount unchanged. The original PR's branch was on a pre-container-reuse-rework base (May 12) and conflicted with the post-May-28 driver work (label tagging, container reuse, orphan reaper). Reconstructed the same intent on current main; the three guard blocks slot cleanly into `tools/environments/docker.py` around the existing mount loops. Three new tests pinned in `tests/tools/test_docker_environment.py`: directory-source skip, missing-source skip, valid-file mounts. Test-first regression verification: reverted just the production code to `origin/main` and confirmed the new tests fail with `'deleted_token.json' is contained here: /root/.hermes/...` — the fixed code makes them pass. Full file passes (54/54). Closes #24490 Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>	2026-05-29 14:09:04 +10:00
Teknium	69b74c15a3	fix(kanban): CLI dispatch honors max_in_progress/max_spawn from config; swap missing 'avoid-ai-writing' skill for bundled humanizer (#33488, #29415) (#34337) Two small bugs in the kanban dispatcher's CLI surface that were silently degrading two distinct workflows. Bundled because the test files and the surrounding code surface overlap. ## #33488: hermes kanban dispatch ignored kanban.max_in_progress / max_spawn The CLI wrapper in hermes_cli/kanban.py:_cmd_dispatch only passed default_assignee and max_in_progress_per_profile through to dispatch_once. The global concurrency cap (kanban.max_in_progress) and the per-tick spawn limit (kanban.max_spawn) were silently dropped, so operators using 'hermes kanban dispatch' as a one-shot or in a custom loop couldn't reach either cap from config — only the gateway embedded dispatcher honored them. Fix: read both keys from config in the same coerce-positive-int helper that already handled max_in_progress_per_profile. CLI --max still wins over config kanban.max_spawn when both are present (explicit operator signal beats default), but when --max is absent the CLI falls back to config. ## #29415: synthesizer crashed in retry loop on missing skill hermes_cli/kanban_swarm.py:212 hardcoded skills=['avoid-ai-writing'], a skill that doesn't exist in the bundled skills/ directory or any registered hub source. Every synthesizer worker spawn failed at CLI startup with 'Unknown skill(s): avoid-ai-writing' before the agent loop even started — the dispatcher retried up to failure_limit (default 2), then auto-blocked the task, then dependency rules could re-promote it, looping forever until manual intervention. Fix: replace with 'humanizer' which is bundled at skills/creative/humanizer/SKILL.md (description: 'Humanize text: strip AI-isms and add real voice'). That's the obvious intent behind the 'avoid-ai-writing' name, and the skill is platform-portable (linux/macos/windows) so it works on every supported runtime.
## Tests tests/hermes_cli/test_kanban_cli_dispatch_passthrough.py — 4 cases: - CLI passes max_in_progress / max_spawn / default_assignee / max_in_progress_per_profile from config to dispatch_once - CLI --max flag overrides config kanban.max_spawn - Invalid cap values (0, -1, 'abc', '1.5') silently fall through to None - kanban_swarm.py no longer references 'avoid-ai-writing' AND the replacement 'humanizer' skill exists at the expected on-disk path Kanban suite: 468/468 pass (was 464; +4 new regression tests).	2026-05-28 21:00:46 -07:00
teknium1	8cf6b3da9d	fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072 The opencode-go relay defaults max_tokens to 262144 when none is sent, but Xiaomi mimo-v2.5-pro only supports 131072 completion tokens — every request fails with HTTP 400 ("max_tokens is too large: 262144") before the agent can do anything. Add a get_max_tokens(model) hook on ProviderProfile (default returns default_max_tokens) so profiles fronting multiple upstreams can vary the cap per-model. Wire chat_completions transport through the hook. Override on OpenCodeGoProfile with mimo-v2.5-pro=131072. Only mimo-v2.5-pro is capped — other opencode-go models (kimi, glm, qwen, minimax, other mimo variants) are unchanged.	2026-05-28 20:49:53 -07:00
teknium1	bfecfabd0f	Revert "feat(skills): integrate NVIDIA/skills as a trusted skills hub tap" This reverts commit `9992e32db3`.	2026-05-28 20:39:39 -07:00
liuhao1024	44df52005a	fix(tools): guard Path.home() against PermissionError in has_direct_modal_credentials (#33528) When HOME=/root (Docker containers) and the process runs as an unprivileged user (hermes, uid 10000), checking Path.home() / '.modal.toml' raises PermissionError because /root/ is inaccessible. This crashes the dashboard /api/skills endpoint. Catch PermissionError/OSError and treat as 'no config file'. Env vars still take priority (tested). Fixes #33525	2026-05-29 13:35:39 +10:00
Teknium	9992e32db3	feat(skills): integrate NVIDIA/skills as a trusted skills hub tap NVIDIA's verified skills catalog (https://github.com/NVIDIA/skills) ships NVIDIA-signed skills for CUDA-X, AIQ, cuOpt, cuPyNumeric, DeepStream, NeMo, NemoClaw and the Skill Card Generator — each bundle carrying a detached `skill.oms.sig` signature, a governance `skill-card.md`, and `evals/`. The sync pipeline drops any skill missing those artifacts before publishing. Changes: - tools/skills_hub.py: add NVIDIA/skills to GitHubSource.DEFAULT_TAPS so it lights up in `hermes skills browse`, `hermes skills search <q>`, the twice-daily skills-index build, and the docs-site Skills Hub page (https://hermes-agent.nousresearch.com/docs/skills) automatically. - tools/skills_guard.py: add NVIDIA/skills to TRUSTED_REPOS so installs resolve to trust_level="trusted" (looser install policy than community). - website/scripts/extract-skills.py: map the `github` source id to a friendly "NVIDIA" pill label for the docs hub page. - website/src/pages/skills/index.tsx: register the NVIDIA pill (green #76b900) and slot it into SOURCE_ORDER after HuggingFace. - website/docs/user-guide/features/skills.md (+ zh-Hans i18n): document the new default tap and the expanded trusted-repos list. - tests/tools/test_skills_guard.py: assert NVIDIA/skills resolves to "trusted" (including the skills-sh-wrapped form). - tests/tools/test_skills_hub.py: invariant — every TRUSTED_REPOS entry must be reachable via GitHubSource.DEFAULT_TAPS (prevents future trusted repos from being declared but never browseable). Validation: - Live GitHub fetch: `src.fetch('NVIDIA/skills/skills/aiq-deploy')` pulled 17 files including SKILL.md (13 KB), skill-card.md, skill.oms.sig, and the full references/ + evals/ tree. trust_level="trusted". - Live inspect resolved name, description, and trust correctly. - All 193 existing skills_guard + skills_hub tests still pass.	2026-05-28 20:35:13 -07:00
hinotoi-agent	042c1d6bb0	test: cover fallback dropped-turn handoff	2026-05-28 20:34:40 -07:00
Hinotoi Agent	6dc068ef04	fix: broaden deterministic compression fallback coverage	2026-05-28 20:34:40 -07:00
Hinotoi Agent	e785c0ad70	fix: preserve context when summary generation fails	2026-05-28 20:34:40 -07:00
Dusk	c834624f7d	fix(voice): honor PIPEWIRE_REMOTE in PortAudio fallback checks (#33473)	2026-05-29 13:30:17 +10:00
Брагарник Дмитро	54bf798765	approval: add docker restart/stop/kill to DANGEROUS_PATTERNS (#33438) When docker.sock is mounted (a common Docker Compose pattern), the agent can restart/stop/kill containers without user approval. hermes gateway restart is already protected, but docker restart, docker stop, docker kill, and their docker compose equivalents were not. This caused repeated self-termination: the agent ran docker restart hermes, killed its own container, Docker restarted it (restart policy), and the agent resumed the same session — creating a restart loop. Added patterns mirror the existing gateway lifecycle protection: - docker compose restart/stop/kill/down - docker restart/stop/kill Co-authored-by: Sarbai <sarbai@users.noreply.github.com>	2026-05-29 13:26:54 +10:00
ninjmnky	593e4b435e	Add iputils-ping (ping) to Docker image (#32015) ping is a fundamental network diagnostic tool that most users expect to have available in the container. This adds iputils-ping to the apt install list in the Dockerfile. Co-authored-by: ninjmnky <ninjmnky@users.noreply.github.com>	2026-05-29 13:25:32 +10:00
Ben	a618789dba	fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and _GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py (OAuth gate). The legacy list included /api/status (documented as a non-sensitive read-only liveness target); the OAuth gate's list did not. Effect: every wildcard-subdomain agent surfaced as STARTING/down to the portal even though the dashboard was serving correctly. Nous account service (src/server/agents/fly-provider.ts getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie as its sole liveness probe; the OAuth gate's 401 looked identical to 'agent dead' on the portal side. Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py and have both middlewares import it. _path_is_public now consults the shared frozenset first, then falls back to the gate's auth-bootstrap/static prefix list. Future additions to the public list hit both gates automatically. Endpoint inventory (verified safe to remain public): * /api/status — version, gateway state, active session count, auth-gate shape. Portal liveness probe target. * /api/config/defaults — config-defaults feed for the SPA's Config page * /api/config/schema — config schema for the SPA's Config page * /api/model/info — model catalogue metadata (context windows) * /api/dashboard/themes — theme manifests for the skin engine * /api/dashboard/plugins — plugin manifests for the dashboard No user data, no session content, no secrets. Same shape an external monitoring agent would hit on /healthz. 
Tests: * New: test_gated_status_is_public (regression guard with the NAS fly-provider.ts liveness-probe rationale spelled out in the docstring) * New: test_other_public_api_paths_are_public_under_gate (parametrised over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is never the response) * New: docker integration check #3 in test_dashboard_oauth_gate_engaged_by_default — /api/status remains 200 under the gate AND reports auth_required=True so the portal can distinguish modes * Updated: test_full_login_round_trip_unlocks_gated_api now probes /api/sessions instead of /api/status (status is public, so it can no longer distinguish 'logged in' from 'gate accidentally disabled') * Updated: TestApi401Envelope (the no-cookie / invalid-cookie / dead-cookie tests) probes /api/sessions for the same reason * Updated: docker integration check #2 in test_dashboard_oauth_gate_engaged_by_default probes /api/sessions to prove the gate is intercepting * Removed: dead _login() helper in test_dashboard_auth_status_endpoint.py (no longer needed since /api/status is reachable cold) Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md (the --insecure flag fix that shipped earlier).	2026-05-29 12:17:12 +10:00
Teknium	3b6347af15	feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145, #21582) (#34244) Two related dispatcher behaviors that have been missing for a while. ## kanban.default_assignee (#27145) Reporter (@agarzon): dashboard creates a task without an assignee, task parks in 'ready' forever even though the operator's intent ('default') is perfectly clear. The dispatcher already had a 'skipped_unassigned' bucket but no fallback routing — users had to manually type 'default' in the assignee field every time. Behavior: when 'kanban.default_assignee' is set in config.yaml, the dispatcher applies that assignee to any unassigned ready task before deciding whether to spawn. The row is mutated (assignee column + an 'assigned' event with source='kanban.default_assignee' for the audit trail). Empty/whitespace config value = no fallback, preserving the existing skipped_unassigned behavior. Dry-run mode reports what WOULD happen via the new 'auto_assigned_default' bucket on DispatchResult, but does NOT mutate the DB — operators using 'hermes kanban dispatch --dry-run' see the routing decision before committing. ## kanban.max_in_progress_per_profile (#21582) Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads saturate one profile's local model / API quota / browser pool while other profiles sit idle. The existing global 'max_in_progress' caps total workers but doesn't balance across profiles. Behavior: when 'kanban.max_in_progress_per_profile' is set to a positive int, the dispatcher tracks per-assignee running counts (one query at tick start) and refuses to spawn for any assignee already at the cap. Tasks blocked this way go to a new 'skipped_per_profile_capped' bucket on DispatchResult as (task_id, assignee, current_running_count) tuples — NOT an operator-actionable failure, just 'try again next tick when the profile has capacity'. Pre-existing 'running' tasks count against the cap (verified via regression test).
The cap respects dry_run mode by incrementing its in-memory counter on each would-be spawn so dry_run reports the same balanced subset that a real tick would. Invalid cap values (0, negative, non-int, None) are treated as 'no cap', preserving the existing behavior. Backward-compatible for installs that don't set the config. ## Surfaces - 'hermes kanban dispatch' CLI now prints 'Auto-assigned to kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap, N running): ...' lines, plus matching JSON keys in --json output. - Gateway dispatcher logs the configured values at startup ('default_assignee=X', 'max_in_progress_per_profile=N'). - 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with inline docs. ## Validation - tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap baseline, auto-assign + DB mutation, dry-run reports without mutating, whitespace treated as None, explicit assignees untouched, DispatchResult field schema. - tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including 4 parametrized): no-cap baseline, balanced 2-profile fan-out, pre-existing running counts against cap, invalid cap values (0/-1/'abc'/None), capped tasks dispatched on next tick after running task completes, DispatchResult field schema. - Broader kanban suite: 464/464 pass (was 449 baseline; +15 new regression tests across both features). ## Credit #27145 — Jimmy Johansson reported the dispatcher skipped-unassigned gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix that matches the existing config knob. #21582 — @edwardchenchen filed the per-profile cap ask after hitting model 429s on fan-out research projects; @simlu confirmed the same pain on local-model setups.	2026-05-28 19:02:55 -07:00
Ben	42612aa350	docs(docker): refresh user-guide page for s6-overlay reality The page was last meaningfully rewritten in the pre-s6 (tini) era and had drifted on five points that no longer matched the image: 1. "Running the dashboard" claimed the entrypoint backgrounds `hermes dashboard` and prefixes its output with `[dashboard]`. That was the pre-s6 entrypoint.sh path; under s6 the dashboard is a supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no sed-prefix pipeline. Rewrote the section accordingly. 2. The default for `HERMES_DASHBOARD_HOST` was documented as `127.0.0.1`. The s6 run script defaults it to `0.0.0.0` (`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table and the surrounding prose. 3. Multi-profile was documented as "not recommended in Docker — run one container per profile." That advice was load-bearing when there was no in-container supervisor, but the s6 architecture explicitly adds per-profile gateway supervision: each profile created via `hermes profile create <name>` gets a slot under `/run/service/gateway-<name>/`, the `02-reconcile-profiles` cont-init script restores them across `docker restart` from `gateway_state.json`, and `hermes gateway start/stop/restart` is intercepted by `_dispatch_via_service_manager_if_s6` to route through `s6-svc`. Pivoted the section to "one container, many supervised profile gateways" as the default, with a comparison table and a "When you DO want a separate container" escape hatch for the genuine resource-isolation / network-segmentation cases. 4. The Compose example trailer also claimed `[dashboard]` log prefixing. Replaced with the actual log routing. 5. Added a new "Where the logs go" section covering all four log surfaces: per-profile gateways (tee'd to `docker logs` AND `${HERMES_HOME}/logs/gateways/<profile>/current` since PR `b34532319`), dashboard (`docker logs`, no prefix), boot reconciler (`container-boot.log`), and `hermes logs`. 
The gateway-mode and Compose sections cross-reference this rather than each carrying their own routing prose. Added a new "docker exec automatically drops to the hermes user" subsection under "What the Dockerfile does", next to the existing Privilege model warning. Documents the `/opt/hermes/bin/hermes` shim (landed via the docker-exec privilege-drop work) — operators don't need to remember `--user hermes` for `docker exec hermes login`, `docker exec hermes profile create …`, etc. The historical footgun (`auth.json` written as `root:root`, supervised gateway then can't read its own auth file) is mentioned only as context for what the fail-loud `exit 126` is protecting against, not as a problem the reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is documented for diagnostic sessions. The "Permission denied" troubleshooting subsection now carries a single-line pointer to the new section instead of duplicating it. The `--insecure` framing reflects PR #`fb5125362` (opt-in via `HERMES_DASHBOARD_INSECURE`, not derived from bind host): the OAuth gate is the authority, the bind host alone never implies `--insecure`, and opting out is an explicit security trade-off. Anchors verified resolve. i18n zh-Hans mirror left for the translation flow to catch up.	2026-05-29 11:55:01 +10:00
Ben	3c6e70aef1	docs(docker): document new persist-across-processes contract and orphan reaper (#20561) Updates the Docker Backend section of the user-guide configuration page to match the actual behavior shipped in PR #33645. Pre-PR the docs claimed "container is stopped and removed on shutdown," which was never quite true for the documented happy path and is now actively wrong: in default mode the container survives across Hermes processes so background processes (npm watchers, dev servers, long-running pytest) carry over the way the "ONE long-lived container shared across sessions" promise requires. Changes to `website/docs/user-guide/configuration.md`: * Reworked the intro paragraph at the top of the Docker Backend section to describe the actual cross-process reuse contract. * Expanded the YAML example with the new keys `docker_persist_across_processes` and `docker_orphan_reaper`, plus the pre-existing-but-undocumented `docker_env`, `timeout`, and `lifetime_seconds`. Clarified the `container_persistent` comment to disambiguate from `docker_persist_across_processes`. * Added a `docker_env` vs `docker_forward_env` explainer (one injects literal KEY=value, the other forwards values from the host/.env — easy to confuse). * Replaced the one-line "Container lifecycle" paragraph with a full subsection covering: - the three labels Hermes tags every container with (hermes-agent, hermes-task-id, hermes-profile) - the label-probe reuse mechanism on startup - a teardown-trigger table with four rows for every situation that destroys the container in default mode - edge cases (OOM kill, profile switching) * Added an "Environment variable overrides" table covering all TERMINAL_* env vars relevant to the Docker backend, including the previously-undocumented `TERMINAL_DOCKER_ENV` and `HERMES_DOCKER_BINARY`.
Changes to `website/docs/user-guide/docker.md`: * Extended the cross-link admonition (around l.227) so the Hermes-in-Docker page points at the new terminal-backend keys (`docker_env`, `docker_persist_across_processes`, `docker_orphan_reaper`) alongside the ones already mentioned. No code changes. Behavior already covered by tests added in earlier commits on this branch (#33645 commits 1-5). Refs #20561	2026-05-29 11:49:54 +10:00
Ben	2f0f03c40d	fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close) Commit 4 made cleanup_vm() default to force_remove=True, which was wrong: cleanup_vm() is called from AIAgent.close() (TUI session close at tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569) and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All three are session-lifecycle events that should honor persist mode, not explicit user-initiated teardown. Ben reported the symptom: container shared between multiple TUI sessions (good) but killed as soon as any session closed (bad). With force_remove=True as the default, every `session.close` JSON-RPC tore down the container. The fix is to flip cleanup_vm()'s force_remove default back to False. The kwarg still exists for future explicit-teardown paths (`/reset`-style flows, "destroy my sandbox" commands) that haven't been wired up yet. Two new unit tests pin the behavior: * `test_cleanup_vm_default_honors_persist_mode` — asserts `cleanup_vm(task_id)` does neither docker stop nor docker rm on a persist-mode container (the regression Ben caught). * `test_cleanup_vm_force_remove_tears_down_persist_container` — asserts the kwarg still flows through the runtime-signature-inspection plumbing to the backend's cleanup(). E2E verified against real Docker (in addition to all 17 existing checks): ✓ Default cleanup_vm() leaves persist-mode container running ✓ cleanup_vm(force_remove=True) removed the container Refs #20561	2026-05-29 11:49:54 +10:00
Ben	5c2170a7c6	fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561) The first iteration of this PR did docker stop on every cleanup in persist mode (only skipping docker rm). Ben caught this as contradicting the documented "ONE long-lived container shared across sessions" semantics: stopping the container on every Hermes /quit kills any background processes inside (npm watchers, pytest watchers, long-running scripts) — exactly the case persist mode is supposed to protect. This commit splits the cleanup paths cleanly: * Persist mode (default) — cleanup() is a NO-OP for the container. Container stays running, processes survive, next Hermes process attaches via the existing label probe in ~ms instead of waiting for docker start. Resource reclamation happens via the orphan reaper at next startup (2 × lifetime_seconds threshold), which covers the SIGKILL / OOM / abandoned-laptop cases. * Opt-out mode (persist_across_processes=False) — unchanged: docker stop + docker rm -f on cleanup as before. * Explicit teardown — new cleanup(force_remove=True) kwarg overrides persist mode and tears the container down unconditionally. cleanup_vm(task_id) now defaults to force_remove=True since it's the user-driven reset path (called from AIAgent.close(), /reset-style flows, and the idle reaper's per-turn cleanup). The idle reaper in _cleanup_inactive_envs calls env.cleanup() directly with no kwargs, so idle persist-mode envs are no-op'd — the container survives the in-process pop and the next tool call re-probes via labels. No state leak: _container_id is still cleared on the in-process handle.
E2E verified against real Docker: ✓ Container is still running after cleanup() ✓ Background process (sleep loop) survived cleanup() ✓ Filesystem state preserved across cleanup() ✓ In-process container_id cleared (next __init__ will re-probe) ✓ Background process visible from reused env (no docker start happened) ✓ force_remove=True removed the container even in persist mode ✓ cleanup_vm() removed the container (defaults to force_remove=True) Test changes: * Replaces `test_cleanup_with_persist_only_stops_no_rm` with `test_cleanup_with_persist_is_noop_for_container` — asserts neither stop nor rm runs in persist mode, and the in-process handle is cleared so re-probe works. * Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode` — covers the new kwarg. * Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and `test_wait_for_cleanup_after_cleanup_returns_true` to pass `force_remove=True` so they actually exercise the docker code path (default no-op would trivially pass). cleanup_vm() forwards `force_remove` only to backends whose cleanup() accepts the kwarg (currently just DockerEnvironment) via runtime signature inspection — Modal/Daytona/SSH `cleanup()` signatures are unchanged. Refs #20561	2026-05-29 11:49:54 +10:00
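The three cleanup paths, plus the runtime signature check that leaves the Modal/Daytona/SSH `cleanup()` signatures untouched, can be sketched as follows. `DockerLikeEnv`, `LegacyEnv`, `accepts_kwarg`, and `cleanup_env` are hypothetical stand-ins; the docker commands are recorded rather than executed, and `cleanup_env` defaults to `force_remove=False` as in the follow-up commit 2f0f03c40d rather than the `True` default described in this one.

```python
import inspect
from typing import Callable, List, Optional


class DockerLikeEnv:
    """Hypothetical backend whose cleanup() accepts force_remove."""

    def __init__(self, persist_across_processes: bool = True) -> None:
        self.persist_across_processes = persist_across_processes
        self.container_id: Optional[str] = "abc123"
        self.actions: List[str] = []

    def cleanup(self, force_remove: bool = False) -> None:
        if self.persist_across_processes and not force_remove:
            # Persist mode: leave the container running and only drop the
            # in-process handle so the next init re-probes by label.
            self.container_id = None
            self.actions.append("noop")
            return
        # Opt-out mode or explicit teardown.
        self.actions += ["docker stop", "docker rm -f"]
        self.container_id = None


class LegacyEnv:
    """Hypothetical backend with an unchanged, kwarg-free cleanup()."""

    def __init__(self) -> None:
        self.actions: List[str] = []

    def cleanup(self) -> None:
        self.actions.append("closed")


def accepts_kwarg(func: Callable, name: str) -> bool:
    params = inspect.signature(func).parameters.values()
    return any(
        p.name == name or p.kind is inspect.Parameter.VAR_KEYWORD for p in params
    )


def cleanup_env(env, force_remove: bool = False) -> None:
    # Forward the kwarg only to backends that declare it.
    if accepts_kwarg(env.cleanup, "force_remove"):
        env.cleanup(force_remove=force_remove)
    else:
        env.cleanup()
```

Inspecting the bound method keeps older backends working without adding a dummy parameter to each of them.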
Ben	d77d877665	fix(docker): startup orphan reaper for crashed-process containers The cleanup-fix in the previous commit handles the graceful-exit leak: a Hermes process that runs ``atexit`` will now actually wait on the docker stop/rm worker thread, so containers either survive (persist mode) or are fully removed (opt-out mode) by the time the interpreter exits. But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window close. Containers from those exits stay parked with no surviving Python process to reuse or remove them, so they accumulate until the operator intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class — there's no live cleanup() to fix. This commit adds the safety net: a startup orphan reaper that runs once per Hermes process and removes long-Exited hermes-labeled containers that the prior commit couldn't reach. Implementation: * New ``reap_orphan_containers()`` in ``tools/environments/docker.py``. Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional) ``label=hermes-profile=<current>``. Per-container ``docker inspect`` parses ``State.FinishedAt`` (with nanosecond-precision trimming for Python's microsecond-bound ``fromisoformat``); containers older than the threshold get ``docker rm -f``'d. The ``status=exited`` filter is load-bearing — a running container may belong to a sibling Hermes process whose reuse path will pick it up; killing it would crash the sibling mid-command. Single-container failures are logged and the sweep continues to the next candidate. * New ``_maybe_reap_docker_orphans()`` helper in ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for ``env_type == "docker"``. 
Gated by: - ``terminal.docker_orphan_reaper: true`` (default; opt-out for operators running multiple Hermes processes in the same profile who don't trust the conservative defaults) - ``_docker_orphan_reaper_ran`` module flag with double-checked locking — parallel subagents and RL rollouts don't trigger N concurrent docker ps storms - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own setup) - Profile scoping — a research profile NEVER reaps the default profile's stragglers - Exception swallow — a janitor failure must never block container creation * New config ``terminal.docker_orphan_reaper`` wired through all four config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py, tests/conftest.py) and pinned by ``test_docker_orphan_reaper_is_bridged_everywhere``. Coverage: * 9 new unit tests in test_docker_environment.py — happy path, recent-container sparing, profile scoping, unparseable-timestamp safety, docker-ps-failure handling, partial-failure continuation, nanosecond timestamp parsing, zero-value FinishedAt rejection. * 6 new integration tests in test_docker_orphan_reaper_integration.py — once-per-process gate, disable-flag respected, lifetime doubling with 60s floor, current-profile filter wiring, exception swallow. * 1 new bridge-invariant regression test. Closes #20561 (combined with the two prior commits on this branch).	2026-05-29 11:49:54 +10:00
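A minimal sketch of the reaper's age check and once-per-process gate, assuming Docker reports `State.FinishedAt` as RFC 3339 text such as `2026-05-29T11:00:00.123456789Z` and uses `0001-01-01T00:00:00Z` for containers that never finished. `parse_finished_at`, `reap_threshold_seconds`, `is_reapable`, and `run_reaper_once` are hypothetical names, and the `docker ps` / `docker rm -f` calls are left out. The fraction is padded or trimmed to six digits so `datetime.fromisoformat` accepts it on older Python versions too.

```python
import re
import threading
from datetime import datetime, timezone
from typing import Callable, Optional

# base timestamp, optional fractional seconds, then "Z" or a +HH:MM offset
_FINISHED_AT = re.compile(
    r"^(?P<base>[^.]+?)(?:\.(?P<frac>\d+))?(?P<tz>Z|[+-]\d{2}:\d{2})$"
)


def parse_finished_at(value: str) -> Optional[datetime]:
    match = _FINISHED_AT.match(value.strip())
    if not match:
        return None
    # Nanoseconds -> microseconds (pad short fractions, trim long ones).
    frac = (match.group("frac") or "").ljust(6, "0")[:6]
    tz = "+00:00" if match.group("tz") == "Z" else match.group("tz")
    try:
        parsed = datetime.fromisoformat(f"{match.group('base')}.{frac}{tz}")
    except ValueError:
        return None
    if parsed.year <= 1:
        return None  # Docker's zero value: the container never finished
    return parsed


def reap_threshold_seconds(lifetime_seconds: int) -> int:
    # 2 x lifetime, with a 60s floor so a lifetime of 0 can't race setup.
    return max(60, 2 * lifetime_seconds)


def is_reapable(
    finished_at: str, lifetime_seconds: int, now: Optional[datetime] = None
) -> bool:
    finished = parse_finished_at(finished_at)
    if finished is None:
        return False  # unparseable or zero-value timestamps are never reaped
    now = now or datetime.now(timezone.utc)
    age = (now - finished).total_seconds()
    return age > reap_threshold_seconds(lifetime_seconds)


_reaper_ran = False
_reaper_lock = threading.Lock()


def run_reaper_once(sweep: Callable[[], None]) -> bool:
    """Run `sweep` at most once per process; return True if this call ran it."""
    global _reaper_ran
    if _reaper_ran:  # fast path without taking the lock
        return False
    with _reaper_lock:
        if _reaper_ran:  # re-check under the lock (double-checked locking)
            return False
        _reaper_ran = True
    try:
        sweep()
    except Exception:
        pass  # a janitor failure must never block container creation
    return True
```

Setting the flag before the sweep runs means concurrent callers return immediately instead of queueing a second `docker ps`.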
Ben	ac8e238bc8	fix(docker): reuse containers across processes + fix cleanup leaks The Docker backend docs claim "Single persistent container — ONE long-lived container shared across sessions, /new, /reset, and delegate_task subagents. Stopped/removed on shutdown." In practice the code only honored that contract within a single Python process via the in-memory `_active_environments[task_id]` cache. Every `hermes chat` invocation spawned a fresh `hermes-<hex>` container; older containers piled up in `Exited` state and accumulated until manual `docker rm` (issue #20561). Three root causes, all addressed by this commit: 1. No cross-process container discovery. 2. `cleanup()` used fire-and-forget `subprocess.Popen("... &", shell=True)` which raced with parent-process exit — when Python exited promptly the detached shell child got killed mid-`docker stop`, leaving stopped containers behind. 3. The `docker rm` step in cleanup was gated on `not self._persistent` (the bind-mount-persistence flag). Default config sets `container_persistent: true`, so the default happy path skipped `rm` entirely — even when the user explicitly didn't want cross-process reuse, containers leaked. Fix: * Add `DockerEnvironment.__init__(persist_across_processes=True)`. When true, init probes `docker ps -a --filter label=hermes-agent=1 --filter label=hermes-task-id=<task> --filter label=hermes-profile=<profile>` and reuses a matching container (running → attach; stopped → `docker start` → attach; `docker start` failure → fall through to a fresh `docker run`). Multiple matches prefer the running one, with the stragglers left for the orphan reaper (next commit) to clean up. * Rewrite `cleanup()`. Uses `subprocess.run(..., timeout=30)` on a daemon `threading.Thread`, not the racy `Popen(... &)`.
The `_persistent` guard is dropped on the `rm` step — `rm` now runs whenever `persist_across_processes` is false, regardless of the bind-mount-persistence setting. The leak class is gone in all combinations. * Add `wait_for_cleanup(timeout)`. `tools/terminal_tool.py`'s atexit hook calls this on every active env, blocking up to 15s for the cleanup thread before interpreter exit. Without this, `hermes /quit` raced the daemon-thread teardown and dropped the stop/rm work. * New config `terminal.docker_persist_across_processes` (default `true` — restores the documented contract). Set `false` for hard per-process isolation. Wired through all four config-bridge sites (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip list); regression-pinned by `test_docker_persist_across_processes_is_bridged_everywhere` matching the existing pattern for docker_run_as_host_user / docker_env. Reuse intentionally does NOT compare image / mounts / resources — only the labels. Operators changing those settings should set `docker_persist_across_processes: false` (or `docker rm -f` the labeled container) to force a fresh start. This keeps the probe cheap and the failure mode obvious. Coverage: 12 new unit tests in tests/tools/test_docker_environment.py covering reuse paths (running, stopped, fallback, opt-out, duplicate preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm, no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one config-bridge regression pin. Refs #20561	2026-05-29 11:49:54 +10:00
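The `cleanup()` rewrite and the `wait_for_cleanup(timeout)` pairing can be sketched with the standard library alone. `CleanupWorker` is a hypothetical name and the command lists are placeholders; the point is that `subprocess.run(..., timeout=30)` blocks inside a daemon thread instead of a detached `Popen("... &", shell=True)` shell, and an exit-time caller can then `join()` that thread with a bounded wait.

```python
import subprocess
import threading
from typing import List, Optional, Sequence


class CleanupWorker:
    """Hypothetical sketch: run stop/rm commands on a daemon thread."""

    def __init__(self) -> None:
        self._thread: Optional[threading.Thread] = None
        self.attempted: List[Sequence[str]] = []

    def start(self, commands: List[Sequence[str]], timeout: float = 30.0) -> None:
        def run() -> None:
            for cmd in commands:
                try:
                    # Blocking call with a timeout: no detached shell child
                    # is left behind to be killed when Python exits.
                    subprocess.run(
                        list(cmd),
                        timeout=timeout,
                        check=False,
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL,
                    )
                except (OSError, subprocess.TimeoutExpired):
                    pass  # best effort: still attempt the next command
                self.attempted.append(cmd)

        self._thread = threading.Thread(target=run, daemon=True)
        self._thread.start()

    def wait_for_cleanup(self, timeout: float = 15.0) -> bool:
        """Block up to `timeout` seconds; True once the worker has finished."""
        if self._thread is None:
            return True  # nothing was started
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

A hook registered with `atexit.register` could call `wait_for_cleanup(15)` on every live worker; because the thread is a daemon, a hung `docker` call still cannot block interpreter exit past that bound.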
Ben	8d129d013b	fix(docker): tag containers with hermes-agent labels for identification Issue #20561 (Docker containers accumulate) needs a way to identify hermes-created containers from the outside — both for the orphan reaper (a follow-up commit) and for operators triaging `docker ps -a | grep hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>` name prefix was the only signal, which broke down under cross-process reuse (planned) and against any custom `--name` someone might pass via `docker_extra_args`. This commit adds three labels at `docker run` time: `--label hermes-agent=1` (global sweep target), `--label hermes-task-id=<sanitized>` (per-task reuse key), and `--label hermes-profile=<sanitized>` (per-profile isolation key). Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the label round-trips cleanly through `docker ps --filter label=key=value`. Empty or non-string inputs collapse to "unknown" rather than producing an unqueryable empty value. No behavior change: the labels are pure metadata. The follow-up commits in this PR (cleanup-fix + orphan reaper) are what use them. Refs #20561	2026-05-29 11:49:54 +10:00
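A minimal sketch of the label scheme, assuming disallowed characters are replaced with `_` (the commit only says values are sanitized to `[A-Za-z0-9_.-]`, so the replacement character is a guess). `sanitize_label_value`, `label_args`, and `reuse_probe_cmd` are hypothetical helpers; the probe mirrors the `docker ps -a --filter label=...` query used for cross-process reuse.

```python
import re
from typing import List

_DISALLOWED = re.compile(r"[^A-Za-z0-9_.-]")
_MAX_LEN = 63


def sanitize_label_value(value: object) -> str:
    # Empty or non-string input collapses to a queryable placeholder.
    if not isinstance(value, str) or not value:
        return "unknown"
    return _DISALLOWED.sub("_", value)[:_MAX_LEN]


def label_args(task_id: object, profile: object) -> List[str]:
    labels = {
        "hermes-agent": "1",  # global sweep target
        "hermes-task-id": sanitize_label_value(task_id),  # per-task reuse key
        "hermes-profile": sanitize_label_value(profile),  # per-profile isolation key
    }
    args: List[str] = []
    for key, val in labels.items():
        args += ["--label", f"{key}={val}"]
    return args


def reuse_probe_cmd(task_id: object, profile: object) -> List[str]:
    # The same key=value pairs, expressed as `docker ps` filters.
    cmd = ["docker", "ps", "-a"]
    for pair in label_args(task_id, profile)[1::2]:
        cmd += ["--filter", f"label={pair}"]
    return cmd
```

Building both the `docker run` labels and the `docker ps` filters from one function keeps them from drifting apart.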
Teknium	300140e006	test(tui_gateway): stop reloading server module in fixture teardown (#34217) tui_gateway.server registers two atexit hooks at module load time: ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336). Three test files reloaded the module on each fixture teardown to reset per-test state. Each reload re-runs module-level code, including the atexit registrations — duplicates accumulate across the test session. At pytest interpreter shutdown the duplicated atexit hooks race the stderr buffer flush: Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads. pytest reports 'tests passed but the slice exited non-zero', and the shard turns red on CI. Surfaced today on PR #34193's test slice 1 (204 files, 3572 tests passed, then Fatal Python error during exit). Fix: drop importlib.reload(mod) from the three fixtures that have it. Per-test reset is handled by clearing the mutable session dicts (_sessions, _pending, _answers). _methods is also no longer cleared — it's populated at module import time and would only be re-populated by a reload, so clearing it without reload broke session.resume / command.dispatch / slash.exec method registration across tests. Affected fixtures: - tests/tui_gateway/test_goal_command.py - tests/tui_gateway/test_protocol.py - tests/tui_gateway/test_review_summary_callback.py The second reload in test_protocol.py at line 211 (reload of tui_gateway.transport) is preserved — transport.py has no atexit hooks or threads, so reload is safe there. Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no Fatal Python error at interpreter shutdown.	2026-05-28 18:16:54 -07:00
Teknium	e71a2bd11b	chore: release v0.15.1 (2026.5.29) (#34222)	2026-05-28 18:11:49 -07:00
Teknium	769ee86cd2	feat(kanban): attach images referenced in task bodies to worker vision (#34210) Kanban workers now scan the task body for local image paths and http(s) image URLs and attach them to the worker's first user turn — matching the CLI/gateway behaviour for inbound images. Before, a user pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a kanban task description had it sent to the model as plain text and the pixels were never seen. How it works: * agent/image_routing.py gains extract_image_refs(text) → (paths, urls) that mirrors gateway/platforms/base.py:extract_local_files (absolute / ~-relative paths, image extensions only, ignores fenced/inline code). * build_native_content_parts() accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local paths. * cli.py (single-query/quiet branch — the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: enrichment failures never block worker startup. Tested: * tests/agent/test_image_routing.py — 22 new tests for extract_image_refs and URL pass-through in build_native_content_parts. * tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests driving real kanban_db round-trip (create task → read body → extract refs → build parts). * E2E: created a fake kanban task with a body referencing both a local PNG and an https URL; verified the worker pipeline produces a multimodal user turn with 1 text part + 2 image_url parts (data URL for the local file, passthrough URL for the remote).	2026-05-28 17:50:42 -07:00
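A simplified, hypothetical take on `extract_image_refs(text) -> (paths, urls)`: it drops fenced and inline code, splits on whitespace, trims surrounding punctuation, and keeps absolute or `~/` paths and http(s) URLs whose path ends in a common image extension. The extension list is an assumption, and this sketch neither checks that local files exist nor expands `~`, either of which the real helper may do.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")  # assumed list
_FENCED = re.compile(r"`{3}.*?`{3}", re.DOTALL)
_INLINE = re.compile(r"`[^`\n]*`")
_EDGE_PUNCT = "()[]<>\"',.;:!?"


def _dedupe(items: List[str]) -> List[str]:
    return list(dict.fromkeys(items))  # keeps first-seen order


def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
    # Code spans are ignored, so paths quoted as code are never attached.
    cleaned = _INLINE.sub(" ", _FENCED.sub(" ", text))
    paths: List[str] = []
    urls: List[str] = []
    for raw in cleaned.split():
        token = raw.strip(_EDGE_PUNCT)
        lower = token.lower()
        if lower.startswith(("http://", "https://")):
            # Check the URL path so query strings like ?w=200 don't matter.
            if urlparse(token).path.lower().endswith(IMAGE_EXTS):
                urls.append(token)
        elif token.startswith(("/", "~/")) and lower.endswith(IMAGE_EXTS):
            paths.append(token)
    return _dedupe(paths), _dedupe(urls)
```

Splitting the result into paths and URLs matches the two routes described above: local files become base64 data: URLs, remote URLs pass through as image_url parts.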
Ben	1b1e30510a	test(docker): repair dashboard tests broken by the insecure-opt-in fix The Docker integration test job started failing on main after `fb5125362` ("docker: opt in to dashboard --insecure via env var"). Two distinct failures, both fallout from that change being more behaviour-changing than the existing test harness anticipated. Failure 1 — test_dashboard_port_override (silent regression in an already-existing test) The test starts the container with just HERMES_DASHBOARD=1, defaults to host=0.0.0.0, no HERMES_DASHBOARD_OAUTH_CLIENT_ID, no HERMES_DASHBOARD_INSECURE. Pre-fix that combination got --insecure auto-injected by the s6 run script (anything non-loopback was implicitly insecure), so the OAuth gate stayed off and start_server bound the port. Post-fix the gate engages, no provider is registered, and start_server raises SystemExit before binding — under s6 the dashboard goes into a restart loop and the test's /proc/net/tcp poll finds nothing. Same silent regression was masking three sibling tests (test_dashboard_slot_reports_up_when_enabled, test_dashboard_opt_in_starts, test_dashboard_restarts_after_crash) — they all only sample pgrep or s6-svstat and so caught the supervised process mid-restart loop, appearing to pass while the dashboard was actually never reaching a healthy state. Fix: pin HERMES_DASHBOARD_INSECURE=1 on every test that enables the dashboard but doesn't itself exercise the auth gate. Each pinned site carries an inline comment pointing back to test_dashboard_slot_reports_up_when_enabled for the full rationale. Failure 2 — test_dashboard_oauth_gate_engages_on_non_loopback_bind (bug in the test I added in `fb5125362`) The probe used urllib.request.urlopen() against /api/status. Under the now-engaged OAuth gate /api/status no longer answers unauthenticated callers (the gate middleware runs upstream of the legacy _SESSION_TOKEN allowlist and 401s anything without a valid session cookie). 
urlopen() raises HTTPError on the 401, the wrapper treated that as "not ready yet", and the poll loop hit timeout. Fix: split the probe into a generic _http_probe() helper that returns (status_code, body) for any HTTP response — including 401, which IS the gate-engaged success signal. The helper feeds a multi-line Python program over stdin via a POSIX heredoc so the try/except branch reads naturally; far less fragile than the earlier semicolon-laden -c one-liner. The OAuth-gate test now verifies two independent observable consequences of the gate being on: 1. GET /api/auth/providers (publicly reachable through the gate so the login page can bootstrap) returns 200 with `nous` in the provider list — proves the bundled provider registered. 2. GET /api/status returns 401 — proves the OAuth gate runs upstream of the legacy public-paths allowlist and is actively intercepting unauthenticated callers. The insecure-opt-out test still hits /api/status, but now asserts status_code == 200 first (proves the gate is bypassed) before parsing the JSON for auth_required: false (proves the gate-state flag is also correctly off). Verified locally end-to-end against a fresh image build on a real Docker daemon: all 41 tests under tests/docker/ pass in 2m38s, including the two formerly-failing dashboard tests and the three sibling tests that were passing by accident.	2026-05-29 10:30:52 +10:00
Teknium	f3acdd94fe	Merge pull request #30698 from NousResearch/refactor/use-ds-primitives refactor(web): consume DS primitives, remove local component copies	2026-05-28 17:29:28 -07:00
Teknium	78a54d2c00	fix(skills-page): source pills and category sidebar collapsed to All only (#34194) Regression from PR #33809 (lazy-fetch refactor). The `sources` and `categoryEntries` useMemo blocks were derived from `allSkillsLocal` but had empty/incomplete deps arrays — so they computed once at mount when the catalog was still `[]`, then never recomputed when the fetch resolved. Symptom: live site shows only the "All 87,639" source button and "All Skills 87,639" category — no per-source pills (ClawHub, skills.sh, LobeHub, etc.) and no category breakdown. Filtering by source/category is unusable. Fix: add `allSkillsLocal` to both deps arrays so they recompute when data arrives. Local build green on en + zh-Hans.	2026-05-28 17:11:40 -07:00
Ben	e7c99651fb	fix(mcp): resolve bare npx/npm/node against /usr/local/bin When the Hermes Docker image runs an stdio MCP server configured with an explicit env.PATH that omits /usr/local/bin (a common pattern when users hand-author PATH for sandboxing), the MCP env-filter passes that narrow PATH straight through to the subprocess. _resolve_stdio_command's fallback for bare 'npx' / 'npm' / 'node' commands only checked $HERMES_HOME/node/bin/ and ~/.local/bin/, so execvp() failed with '[Errno 2] No such file or directory: npx' on every Node-based stdio MCP server (Railway, Anthropic, GitHub Copilot, etc.). The naive workaround — symlink /usr/local/bin/npx into the user's PATH — fails one layer deeper because npx's shebang re-execs /usr/bin/env node and node also lives at /usr/local/bin/node. Fix: add /usr/local/bin/<cmd> as a third candidate in the fallback list. This is the canonical install location for Node on: - Linux from-source builds - the upstream node:bookworm-slim image, which the Hermes Docker image copies node + npm + corepack from since #4977 (the Node 22 LTS refactor that exposed this) - macOS Homebrew on Intel Because the resolver already calls _prepend_path(resolved_env, command_dir) after locating the command, /usr/local/bin gets prepended to the env's PATH automatically, which also fixes the second-layer shebang failure (npx-cli.js can now find node). Scope is intentionally narrow: the fix activates only when the bare command isn't otherwise locatable through the user's PATH. Users who explicitly narrowed PATH for a non-Node MCP server see no change in behavior. 
Tested: - tests/tools/test_mcp_tool_issue_948.py: new test test_resolve_stdio_command_falls_back_to_usr_local_bin (mirrors the existing hermes-node-bin fallback test) - Full MCP test suite: 254/254 pass across 7 test files - E2E against a freshly-built Docker image: reproduced the original failure mode (env.PATH=/opt/data/bin:/usr/bin:/bin), confirmed the resolver returns /usr/local/bin/npx and prepends /usr/local/bin to PATH; subprocess.run of the resolved command prints '10.9.8' and exits 0 with empty stderr - Negative E2E on the host (where Node is already on PATH via mise): resolver still hits the mise install dir, /usr/local/bin candidate is not consulted, PATH is unchanged	2026-05-29 10:05:42 +10:00
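The fallback order can be sketched as below, assuming the narrowed `PATH` from the MCP config is consulted first and the three fixed locations are tried only for bare `npx`/`npm`/`node`. `fallback_candidates`, `resolve_bare_command`, and `prepend_path` are hypothetical stand-ins for `_resolve_stdio_command` and `_prepend_path`; prepending the resolved directory is what lets the `/usr/bin/env node` shebang in `npx` find `node` as well.

```python
import os
import shutil
from typing import Dict, List, Optional

NODE_COMMANDS = {"npx", "npm", "node"}


def fallback_candidates(command: str, hermes_home: str, home: str) -> List[str]:
    return [
        os.path.join(hermes_home, "node", "bin", command),
        os.path.join(home, ".local", "bin", command),
        os.path.join("/usr/local/bin", command),  # the newly added third candidate
    ]


def resolve_bare_command(
    command: str, env: Dict[str, str], hermes_home: str, home: str
) -> Optional[str]:
    # 1. Honour the (possibly narrowed) PATH the MCP config supplied.
    found = shutil.which(command, path=env.get("PATH", ""))
    if found or command not in NODE_COMMANDS:
        return found
    # 2. Only bare Node commands fall back to the fixed install locations.
    for candidate in fallback_candidates(command, hermes_home, home):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def prepend_path(env: Dict[str, str], directory: str) -> Dict[str, str]:
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory, *parts])
    return env
```

Because the fixed candidates are only consulted after `shutil.which` misses, hosts where Node is already on `PATH` (the mise case above) see no change.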
Ben	fb51253620	docker: opt in to dashboard --insecure via env var, never derive from bind host The s6 dashboard run script flipped `--insecure` on whenever `HERMES_DASHBOARD_HOST` was anything other than 127.0.0.1 / localhost. That comment ("the dashboard refuses otherwise") predates the OAuth auth gate: back when it was written, `start_server` would SystemExit on any non-loopback bind, so the run script's `--insecure` was the only way to make in-container deployments work at all. The gate has since been replaced by `should_require_auth(host, allow_public)`, which engages the OAuth flow when a `DashboardAuthProvider` is registered (the bundled `dashboard_auth/nous` provider auto-registers on `HERMES_DASHBOARD_OAUTH_CLIENT_ID`) and fails closed with a specific operator-facing error when none is. The host-derived `--insecure` ran upstream of all that and silently disabled the gate on every container-deployed dashboard. Most visible under the portal's wildcard-subdomain rollout: every Fly machine binds 0.0.0.0 so the edge can reach Flycast, every machine boots with the correct `HERMES_DASHBOARD_OAUTH_CLIENT_ID`, the nous provider registers — and `/api/status` still returns `{"auth_required": false, "auth_providers": ["nous"]}` because the run script disabled the gate before `start_server` ever saw the request. The dashboard SPA was served to anyone, no `/login` redirect, no OAuth challenge. Fix: derive `--insecure` from an explicit opt-in env var, `HERMES_DASHBOARD_INSECURE` (truthy values matching the rest of the s6 boolean envs: 1, true, TRUE, True, yes, YES, Yes). Operators on trusted LANs behind a reverse proxy without the OAuth contract (the existing `docker-compose.windows.yml` use case) opt in explicitly; portal-managed agent deployments leave it unset and let the gate engage. `docker-compose.windows.yml` already passes `--insecure` on the `command:` array directly (line 38), so it doesn't depend on the s6 auto-injection. No compose-file change required. 
Tests: * `tests/test_docker_home_override_scripts.py` — extends the existing static-text guard with a regression assertion that the legacy host-derived case-statement is gone and the new env-var opt-in is present (locks against accidental revert). * `tests/docker/test_dashboard.py` — adds two Docker-in-Docker tests exercising the actual `/api/status` round-trip: - 0.0.0.0 bind + `HERMES_DASHBOARD_OAUTH_CLIENT_ID` → gate engaged - 0.0.0.0 bind + `HERMES_DASHBOARD_INSECURE=1` → gate disabled Docs: * `website/docs/user-guide/docker.md` + zh-Hans i18n — adds the new env var to the table, replaces the stale prose ("the entrypoint no longer auto-enables insecure mode" — which until this PR was flat-out wrong) with an accurate description of the gate's trigger conditions and the explicit opt-out. shellcheck clean. Python static-text test passes locally. Behavioural test will run against any future image build (CI's Docker harness).	2026-05-29 09:56:40 +10:00
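The new rule, restated as a small Python sketch of the run script's decision. The real logic lives in the s6 shell script, so `wants_insecure`, `legacy_wants_insecure`, and `dashboard_extra_args` are illustrative only: `--insecure` now depends solely on an explicitly truthy `HERMES_DASHBOARD_INSECURE`, whereas the removed behaviour keyed it off the bind host.

```python
from typing import List, Mapping

# Truthy spellings listed in the commit; this sketch treats every other value as false.
TRUTHY = frozenset({"1", "true", "TRUE", "True", "yes", "YES", "Yes"})
LOOPBACK = frozenset({"127.0.0.1", "localhost"})


def wants_insecure(env: Mapping[str, str]) -> bool:
    # The bind host is deliberately ignored: 0.0.0.0 alone never implies --insecure.
    return env.get("HERMES_DASHBOARD_INSECURE", "") in TRUTHY


def legacy_wants_insecure(env: Mapping[str, str]) -> bool:
    # Removed behaviour, shown for contrast: any non-loopback host disabled the gate.
    return env.get("HERMES_DASHBOARD_HOST", "0.0.0.0") not in LOOPBACK


def dashboard_extra_args(env: Mapping[str, str]) -> List[str]:
    return ["--insecure"] if wants_insecure(env) else []
```

For the Fly case described above (0.0.0.0 bind plus `HERMES_DASHBOARD_OAUTH_CLIENT_ID`), the legacy rule returns true and silently disables the gate, while the new rule adds no flag.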
Evo	ef009a987a	docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE from #33583 (#33751) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (en) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh) * docs(reference): document --no-supervise / HERMES_GATEWAY_NO_SUPERVISE (zh)	2026-05-29 09:44:53 +10:00
BROCCOLO1D	130396c658	ci(docker): avoid gha cache on arm64 PR builds	2026-05-29 09:43:48 +10:00
Austin Pickett	a5c1f925b5	fix(web): stop /api/auth/me 401 from triggering a reload loop In loopback mode the dashboard's identity probe (/api/auth/me) returns 401 by design — AuthWidget swallows it and renders nothing. But the probe routed through fetchJSON, whose loopback 401 handler treats a 401 as a rotated session token and full-page-reloads to pick up a fresh one. That reload is guarded by a one-shot sessionStorage flag which every successful request clears, so with auth/me reliably 401ing and the other dashboard calls (status/config/sessions) reliably succeeding, the guard never sticks and the page reload-loops indefinitely (the "boot flash"). Add an allowUnauthorized option to fetchJSON that skips only the loopback stale-token reload (the 401 still throws so AuthWidget can catch it, and the gated-mode login_url envelope redirect is unaffected), and use it for getAuthMe. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 16:58:42 -04:00
kshitij	11d93096b3	Merge pull request #34097 from kshitijk4poor/salvage/memori-trace-messages feat: expose completed-turn message context to memory providers (salvage #28065)	2026-05-28 13:56:07 -07:00
kshitijk4poor	d464d08a5f	chore: add devwdave to AUTHOR_MAP Maps both commit emails (david@memorilabs.ai, dave@devwdave.com) used on #28065 to the devwdave GitHub account so the contributor audit in scripts/release.py passes.	2026-05-29 02:16:43 +05:30
Dave Heritage	5a95fb2e14	feat: expose completed-turn message context to memory providers Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai>	2026-05-29 02:16:43 +05:30
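The opt-in dispatch can be sketched with `inspect.signature`. `provider_sync_accepts_messages` and `dispatch_sync_turn` are hypothetical stand-ins for `_provider_sync_accepts_messages` and its caller, and the two positional arguments are placeholders rather than the real `sync_turn` parameters; only providers that name `messages` or accept `**kwargs` receive the message list.

```python
import inspect
from typing import Any, Dict, List


def provider_sync_accepts_messages(sync_turn: Any) -> bool:
    try:
        params = inspect.signature(sync_turn).parameters.values()
    except (TypeError, ValueError):
        return False  # uninspectable callables get the legacy call
    return any(
        p.name == "messages" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


def dispatch_sync_turn(
    provider: Any, user_text: str, assistant_text: str, messages: List[Dict[str, Any]]
) -> None:
    if provider_sync_accepts_messages(provider.sync_turn):
        provider.sync_turn(user_text, assistant_text, messages=messages)
    else:
        provider.sync_turn(user_text, assistant_text)  # legacy text-only call
```

Existing providers keep their signatures untouched; a plugin opts in simply by adding a `messages` keyword.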


9858 commits