hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Edward-x	7ebebfbb8d	Harden Skills Guard multi-word prompt patterns (#26852 ) Co-authored-by: openhands <openhands@all-hands.dev>	2026-05-25 01:51:27 -07:00
JiahuiGu	0a2ee71ccc	fix(skill): guard pickle.loads in darwinian-evolver show_snapshot with explicit flag (#29276 ) show_snapshot.py unpickled a user-supplied path unconditionally. pickle.loads is equivalent to arbitrary code execution, so a snapshot from an untrusted source = RCE. Require an explicit --i-trust-this-file acknowledgement before calling pickle.loads, and emit a stderr warning when proceeding. Co-authored-by: Jiahui-Gu <jiahuigu@users.noreply.github.com>	2026-05-25 01:51:21 -07:00
Jorge Fuenmayor	93660643a6	fix: harden skill trust source matching (#31229 ) Co-authored-by: gaia <gaia@gaia.local>	2026-05-25 01:51:15 -07:00
Kasun Athaudahetti	2d422720b5	fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444: 1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps. 2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``). 3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting). This commit: - Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback. - Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``. - Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls: * base 300s -> 90s * >50k 450s -> 150s * >100k 600s -> 240s These only apply when the user has not set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins. - Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values. Closes #21444 Supersedes #21652 #24126 #31855 Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>	2026-05-25 01:47:55 -07:00
Teknium	76135b329d	docs(i18n): translate all docs into Simplified Chinese (zh-Hans) (#31942 ) Translates the full English docs corpus (335 files) into Simplified Chinese under website/i18n/zh-Hans/. Combined with PR #31895 (cross- locale link fix), the 简体中文 locale toggle now serves a complete Chinese site with working cross-page navigation. Pipeline: - Claude Sonnet 4.6 via OpenRouter, 8-way concurrent - Preserves frontmatter keys, code blocks, MDX/JSX, link URLs, brand names, and technical jargon (prompt/token/hook/MCP/ACP/etc.) - Translates only frontmatter title/description and prose - Two largest files (configuration.md 93KB, research-paper-writing.md 107KB) retried with 64K max_tokens after initial fence-drift - 3 manual post-fixes for MDX edge cases the model didn't escape: < in optional-skills-catalog table, double-quotes in an alt= tag, and a bare URL adjacent to a full-width period Cost: ~$30 total (Sonnet 4.6 input $3/M + output $15/M). Verified `npm run build` succeeds for both en and zh-Hans locales, no double-prefixed /docs/zh-Hans/docs/ URLs in rendered output, all in-page navigation resolves correctly. Translations are machine-generated and may need human review on specific pages — but they're an enormous improvement over the previous state (3 zh-Hans pages out of 335).	2026-05-25 01:47:38 -07:00
Teknium	ffe11c14ec	test(cli): cover quiet-mode resume status lines routed to stderr 4 tests: session-not-found in quiet mode -> stderr; in full mode -> stdout (unchanged); resumed banner in quiet mode -> stderr; has-no-messages in quiet mode -> stderr.	2026-05-25 01:47:12 -07:00
Michel Belleau	25295e7ac9	fix(cli): redirect resume status lines to stderr in quiet mode (#11793 ) When 'hermes chat --quiet --resume <id> -q "..."' is used, three status messages were written to stdout via ChatConsole / _cprint: - '↻ Resumed session <id> (N user messages, M total messages)' - 'Session <id> found but has no messages. Starting fresh.' - 'Session not found: <id>' / usage hint This polluted the machine-readable stdout that automation wrappers capture with $(...), making it impossible to cleanly separate the agent's answer from the resume banner. Fix: detect quiet mode via tool_progress_mode == 'off' and route the three resume status messages to stderr (as plain text, matching the existing stderr convention for session_id). Interactive mode is unchanged — it still uses the Rich-rendered path through ChatConsole. Surgical reapply of PR #11868. Original branch was stale against current main; reapplied onto current cli.py by hand with original authorship preserved via --author.	2026-05-25 01:47:12 -07:00
Teknium	11c40d6a42	test+polish(compression): pin anti-thrash gate and gateway session_id persistence Follow-up to @someaka's fix. Polish: - Drop the redundant `_preflight_tokens >= threshold_tokens` clause. `should_compress(tokens)` already short-circuits when tokens < threshold, so the explicit comparison was dead code on the True branch. Tests: - Preflight: pin that should_compress() is called (anti-thrash has a vote). Mocks should_compress to return False even with tokens past the raw threshold and asserts no compression runs — exact bug shape from #29335. - Gateway: AST scan of gateway/run.py asserts every `session_entry.session_id = ...` assignment is followed by a `session_store._save()` call within the same block. Three sites mutate the session_id after compression; all three must persist or the next turn loads the pre-compression transcript and re-loops. Empirically verified the test catches the bug (drops the new _save() line → red). AUTHOR_MAP: - Map ed@bebop.crew -> someaka so the salvaged commit resolves to @someaka in release notes.	2026-05-25 01:44:46 -07:00
Radical Edward	3914089d52	fix(compression): 3-line fix for infinite compression loop (#29335 ) Three compounding root causes: A) run_conversation() result dict missing session_id — gateway's dead-code guard at gateway/run.py:8700 never triggers B) preflight compression bypasses should_compress() anti-thrashing — re-triggers every turn when tool schemas dominate token budget C) gateway updates session_entry.session_id in memory but doesn't persist via session_store._save() Fixes: #29335	2026-05-25 01:44:46 -07:00
Teknium	222a3a9c19	test(cli): cover exit resume hint -p flag across profiles 5 tests: default/custom profiles emit no -p; named profile emits -p <name> on both --resume and -c hints; lookup failure falls back gracefully.	2026-05-25 01:41:54 -07:00
CK iRonin.IT	2a2cef4ac7	fix: include -p profile flag in exit resume hint Session IDs are profile-constrained, so the resume hint needs to include the active profile for multi-profile users. Without this, copying the hint from a non-default profile fails to resume the correct session. Before: hermes --resume 20260414_063228_c1240e After: hermes --resume 20260414_063228_c1240e -p dev Also includes -p on the resume-by-title hint. Skipped for 'default' and 'custom' profiles (no -p needed). Surgical reapply of PR #9652. Original branch was stale against current main (~6 months); reapplied onto current cli.py by hand with original authorship preserved.	2026-05-25 01:41:54 -07:00
teknium1	d3ffbc6409	feat(stt): add stt.providers.<name> command-provider registry Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice, curl pipelines — become an STT backend with zero Python. Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved untouched via the built-in local_command path) and the register_transcription_provider() Python plugin hook also shipped in this PR. Resolution order (mirrors TTS exactly): 1. Built-in (local, local_command, groq, openai, mistral, xai) → native handler. Always wins. 2. stt.providers.<name>: type: command → command-provider runner. 3. Plugin-registered TranscriptionProvider → plugin dispatch. 4. No match → 'No STT provider available'. Files ----- - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained; added _resolve_command_stt_provider_config, _transcribe_command_stt, and local helpers for template rendering, shell-quote context, and process-tree termination. Helpers are documented as mirrors of their tts_tool.py counterparts (kept local to avoid cross-tool private import). Wire-in is one insertion point in transcribe_audio() after the xai elif and before the plugin dispatcher. Plugin dispatcher additionally defensively short-circuits when a same-name command config exists (command-wins-over-plugin invariant). - tests/tools/test_transcription_command_providers.py: 50 new tests covering resolution (builtin precedence, type/command gating, case-insensitive lookup, legacy stt.<name> back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (shell-quote contexts, doubled-brace preservation), end-to-end via _transcribe_command_stt (output_path read, stdout fallback, timeout, nonzero exit envelope, model override, language precedence), and dispatcher integration via the real transcribe_audio() including command-wins-over-plugin and builtin-shadow-rejection. - tests/plugins/transcription/check_parity_vs_main.py: extended from 10 to 13 scenarios. New cases: command-provider-installed, command-vs-plugin-same-name (verifies command wins precedence), explicit-openai-with-command-shadow (verifies built-in wins). Adds command_provider dispatch_kind detection via transcript prefix (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished from plugin scenarios even when sharing a provider name. - website/docs/user-guide/features/tts.md: new 'STT custom command providers' section symmetric to the TTS section — example config, placeholder grammar table (input_path / output_path / output_dir / format / language / model), transcript-read-back semantics (file first, then stdout fallback), optional keys table, behavior notes, security note. Updated 'Python plugin providers (STT)' to include the new 'When to pick which (STT)' decision table and updated resolution-order section (now 4 layers instead of 3). Verification ------------ 189/189 STT targeted tests + 50/50 new command-provider tests pass. Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/ 8623/8623 — zero regressions across 14,199 tests. Parity harness: 13 scenarios, 9 OK + 4 expected diffs (no_provider_error → plugin, plugin_unavailable, command_provider × 2). E2E live-verified in an isolated HERMES_HOME with a real .wav file: command: → dispatched to stt.providers.my-fake-cli plugin: → dispatched to registered TranscriptionProvider command-wins-over-plugin: → command provider beats same-name plugin builtin-wins-over-command: → built-in OpenAI handler fires; stt.providers.openai: type: command does NOT hijack it.	2026-05-25 01:41:19 -07:00
kshitijk4poor	2cd952e110	feat(stt): add register_transcription_provider() plugin hook Add an opt-in Python plugin surface for speech-to-text backends, mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines) can be implemented as plugins without modifying tools/transcription_tools.py. Built-ins always win -------------------- The 6 built-in STT providers (local/faster-whisper, local_command, groq, openai, mistral, xai) keep their native handlers. Plugins attempting to register under a built-in name are rejected at registration time with a warning and re-checked defensively at dispatch. Resolution order ---------------- 1. stt.provider matches a built-in → built-in dispatch (unchanged) 2. stt.provider matches a registered plugin → a. if plugin.is_available() returns False → unavailability envelope identifying the plugin (not the generic "No STT provider" message — the user explicitly opted into this plugin) b. otherwise plugin.transcribe() with model + language forwarded from stt.<provider>.{model,language} config 3. No match → legacy "No STT provider available" error (unchanged) Per-provider config namespace ----------------------------- Plugins read their config from stt.<provider> in config.yaml, mirroring how built-ins read stt.openai.model / stt.mistral.model. The dispatcher forwards `model` and `language` from this section. Caller's explicit `model=` argument overrides the config-set model. Files ----- - agent/transcription_provider.py: TranscriptionProvider ABC - agent/transcription_registry.py: register/get/list providers, built-in shadow guard, _reset_for_tests - hermes_cli/plugins.py: register_transcription_provider() on PluginContext - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset, _dispatch_to_plugin_provider() with availability gate, wire-in after xai branch and before "No STT provider" error - tests/agent/test_transcription_registry.py: 27 tests - tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests - tests/tools/test_transcription_plugin_dispatch.py: 28 tests (covering built-in short-circuit, plugin dispatch, exception envelope, non-dict guard, availability gate, language forwarding) - tests/plugins/transcription/check_parity_vs_main.py: 10-scenario subprocess-pinned parity harness vs origin/main - website/docs/user-guide/features/{tts,plugins}.md: docs Behavior parity --------------- 10 scenarios, 8 OK + 2 expected DIFFs: no_provider_error → plugin (plugin-installed scenario) no_provider_error → plugin_unavailable (plugin-installed-unavailable scenario; PR returns cleaner envelope) Zero behavior change for users not opting into a plugin. Issue follow-up to #30398.	2026-05-25 01:41:19 -07:00
Teknium	2e0ac31a72	chore(release): map claw@openclaw.ai to wanwan2qq (PR #10215 )	2026-05-25 01:33:32 -07:00
Teknium	4fbdf0e893	test(cli,gateway): cover bracket-stripping and gateway session-ID lookup - CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged. - Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.	2026-05-25 01:33:32 -07:00
Claw Assistant	1c7a783c42	fix(cli,gateway): strip outer brackets/quotes from /resume args + accept session IDs in gateway The /resume usage hint shows '<session_id_or_title>' which a few users have typed verbatim, including the angle brackets. Strip outer <>, [], "", and '' from the argument before lookup so '/resume <abc123>' works the same as '/resume abc123'. Mirrors the new bracket-stripping in the CLI handler. Also let the gateway resolve a bare session ID. Previously the gateway only called resolve_session_by_title, so '/resume <session_id>' always returned 'Session not found' even for valid IDs. Try get_session() first, fall back to title resolution second. Surgical reapply of PR #10215 (branch was based on a many-months-old main and reverted ~3100 unrelated files; original commit by claw@openclaw.ai preserved via --author).	2026-05-25 01:33:32 -07:00
Teknium	920b350e57	test(auth): align copilot-remove test with borrowed-credential policy (#31416 ) PR #31416 (avoid persisting borrowed credential secrets) added sanitize_borrowed_credential_payload, which strips access_token from any auth.json pool entry whose (provider, source) isn't in the _PERSISTABLE_PROVIDER_SOURCES allowlist. (copilot, gh_cli) is borrowed (not in the allowlist), so the test fixture's pre-seeded access_token now gets stripped at load_pool() time, leaving the pool empty. resolve_target('1') then fails with 'No credential #1. Provider: copilot.' Fix: align the test with the new contract. At runtime, copilot tokens are hydrated by resolve_copilot_token() — mock that path so the pool gets an entry the test can remove. The behavior under test (suppression of gh_cli + env variants on remove) is unchanged. CI repro on origin/main HEAD; reproduced locally with stock checkout.	2026-05-25 01:23:31 -07:00
Teknium	9c77a0c3ce	fix(plugins): widen masked secret prompt to plugin setup wizards Extend PR #31716 to plugin setup paths that were also using bare getpass.getpass(): hindsight (4 sites), honcho, simplex, line. Same mechanical swap onto hermes_cli.secret_prompt.masked_secret_prompt.	2026-05-25 01:20:33 -07:00
helix4u	ec4d6f1823	fix(cli): show masked feedback for secret prompts	2026-05-25 01:20:33 -07:00
Glen Workman	d952b377aa	fix: add cron API provenance logging (#24889 ) Co-authored-by: sgtworkman <178342791+sgtworkman@users.noreply.github.com>	2026-05-25 01:15:56 -07:00
teknium1	92d91365e7	chore(release): map zapabob for PR #29826 salvage	2026-05-25 01:15:24 -07:00
zapabob	2c3ca475c0	fix(cron): reject id mutation + validate output paths under OUTPUT_DIR Two defense-in-depth fixes on cron output path handling: 1. cron/jobs.py:update_job() rejects mutation of the immutable 'id' field (raises ValueError). Dashboard PUT /api/cron/jobs/{id} converts this to HTTP 400. Without this, an attacker who can reach the update endpoint could rename a job's id to '../escape' and move its output directory outside OUTPUT_DIR. 2. cron/jobs.py:_job_output_dir() validates job IDs before composing paths: rejects '.', '..', '/', '\\', absolute paths, and Windows drive prefixes. Used by save_job_output() and remove_job() so legacy unsafe IDs (from before this guard) fail closed rather than half-applying a shutil.rmtree or output write outside the sandbox. Tests: - update_job rejects {'id': '../escape'} without renaming - remove_job(legacy '../escape' id) raises ValueError without deleting files outside OUTPUT_DIR or removing the job from the store - save_job_output rejects '..', './escape', 'nested/escape', absolute paths - dashboard PUT /api/cron/jobs/{id} with {'id': '../escape'} returns 400, job list unchanged Salvaged from PR #29826 by @zapabob. Simplified implementation: - Dropped a 23-line _validate_job_output_id() helper using Path.parts semantics. The inline check (path separators + dot-components + is_absolute) is shorter and behaviorally identical. - Dropped the secondary OUTPUT_DIR.resolve()/relative_to() check — redundant once we reject any path separator at the input boundary. - Dropped the _docs/2026-05-21_cron-output-path-hardening_codex.md planning artifact (we don't check planning docs into the repo). Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-25 01:15:24 -07:00
teknium1	0c3e34e298	chore(release): map Schrotti77 for PR #25786 salvage	2026-05-25 01:09:54 -07:00
Schrotti77	9863a07af6	fix(cron): layer agent.disabled_toolsets onto cron baseline (#25752 ) The bug: cron/scheduler.py:_resolve_cron_enabled_toolsets returns an LLM-supplied per-job enabled_toolsets verbatim. The disabled_toolsets passed to AIAgent was a hardcoded [cronjob, messaging, clarify] that ignored agent.disabled_toolsets from config.yaml. An LLM could call cronjob(action='add', enabled_toolsets=['terminal','file'], prompt='...') and the cron-spawned agent would receive terminal+file even when the operator had globally disabled them. Fix: new _resolve_cron_disabled_toolsets() helper that ALWAYS layers agent.disabled_toolsets on top of the cron baseline. AIAgent's disabled_toolsets takes precedence over enabled_toolsets, so this stops the bypass regardless of what the per-job override contains. This is the disabled-side fix. Three concurrent PRs (#25842, #25815, #25780) proposed intersection-side variants on _resolve_cron_enabled_toolsets; this fix is more robust because it stops the leak at the precedence boundary AIAgent itself enforces, not at a layer above. Regression test reproduces the issue's PoC exactly: config.yaml has agent.disabled_toolsets=[terminal,file]; cron job has enabled_toolsets=[web,terminal,file]; assertion: AIAgent receives disabled_toolsets containing terminal AND file. Salvaged from PR #25786 by @Schrotti77. Simplified the implementation: dropped a 23-line _normalize_toolset_list() helper (handled str/tuple/ set/garbage input shapes) in favor of the existing convention (agent_cfg.get('disabled_toolsets') or []) used elsewhere in the codebase. YAML always parses these as lists; the elaborate normalizer was theatre for shapes we never produce. Closes #25752 Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-25 01:09:54 -07:00
teknium1	a6b0414ea0	feat(providers): extend openai-api with live /v1/models fetch + gpt-5.5-pro Follow-up on top of @jacevys' PR #21437 cherry-pick: - _provider_model_ids() now also matches normalized == 'openai-api' for the live /v1/models fetch path, so users see the full catalog instead of just the curated list. - Add gpt-5.5-pro and gpt-5.3-codex to the curated list for parity with the existing 'openai' table (used as fallback when /v1/models fails). - Add scripts/release.py AUTHOR_MAP entry for jacevys so CI doesn't block the salvage PR.	2026-05-25 00:59:53 -07:00
jacevys	aeb87508c6	feat(providers): add OpenAI API provider option	2026-05-25 00:59:53 -07:00
Hasan Ali	d7c5d5dee5	fix: avoid persisting borrowed credential secrets (#31416 )	2026-05-25 00:32:08 -07:00
Teknium	2b768535c9	test(acp): drop flaky runtime_calls[-1] tail-position assertion The legacy runtime_calls[-1] == "anthropic" check in test_model_switch_uses_requested_provider failed in CI under specific test-shard scheduling with 'custom' == 'anthropic', across multiple unrelated PRs on 2026-05-25. The May 23 pin (commit `3127a41cb`) monkeypatched parse_model_input + detect_provider_for_model to remove the dependency on live _KNOWN_PROVIDER_NAMES module state but the flake reappeared anyway — root cause still not reproducible locally even under stress runs. The other three assertions ("Provider: anthropic" in result, state.agent.provider == "anthropic", state.agent.base_url == "https://anthropic.example/v1") already prove fake_resolve_runtime_provider was called with requested="anthropic" for the model-switch step — the agent's provider and base_url come directly from that fake's return value. The tail-position check was redundant and the only assertion that flaked. Replaces runtime_calls[-1] == "anthropic" with "anthropic" in runtime_calls so the plumbing path is still covered without depending on call ordering.	2026-05-24 23:23:12 -07:00
helix4u	3b839f4369	fix(context): align guidance with 64k minimum	2026-05-24 23:23:12 -07:00
Teknium	1d5deac346	fix(website): cross-locale doc links + drop empty ko locale (#31895 ) The locale switcher appeared broken because hardcoded markdown links (`](/docs/X)`) got double-prefixed by Docusaurus to `/docs/<locale>/docs/X` (404) in non-English locales, and the MDX hero `<a href>` on the index page escaped locale routing entirely. Changes: - Rewrite 922 `](/docs/X)` -> `](/X)` across 166 docs files (strip trailing .md too). Docusaurus prepends locale + baseUrl itself. - docs/index.md -> index.mdx; hero "Get Started" anchor -> Docusaurus <Link> so it stays inside the active locale. - Drop `ko` locale entirely from docusaurus.config.ts + delete i18n/ko/ (4 stale auto-translated kanban pages, <2% coverage, misleading). Verified `npm run build` succeeds for both en and zh-Hans; `build/zh-Hans/ index.html` has no /docs/zh-Hans/docs/... double-prefixed paths. PR2 will translate the 335 English docs into i18n/zh-Hans/.	2026-05-24 23:16:20 -07:00
Teknium	b0135c741d	diag(xai-oauth): log loopback callback hits + wait-timeout outcome (#27385 ) (#31894 ) #27385 reports that on macOS the browser sees the xAI 'authorization received' success page but Hermes still raises xai_callback_timeout. The loopback HTTP handler was silent — no log line on receipt, no log line on wait timeout — so triaging the gap between 'browser saw success' and 'CLI saw timeout' required either a code change or guesswork. Adds two INFO log lines: - Per callback hit (handler): path, has_code, has_state, has_error, truncated User-Agent. Booleans / fingerprints only — no actual code/state strings leak. - On wait timeout: report whether result.code or result.error was populated at deadline. Distinguishes three failure modes: 1. No hit log + timeout log w/ has_code=False has_error=False → xAI's IDP never reached the loopback (firewall, port-binding, IPv6/IPv4 mismatch, browser blocked private-network access). 2. Hit log w/ has_code=False has_error=False + timeout log → xAI hit the loopback without OAuth params (the bare-URL case the handler already 400s on). 3. Hit log w/ has_code=True + timeout log w/ has_code=False → result_lock contention or race; would indicate a real bug. 133/133 in tests/hermes_cli/test_auth_xai_oauth_provider.py, tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py, and tests/run_agent/test_codex_xai_oauth_recovery.py.	2026-05-24 23:05:25 -07:00
ethernet	b288de8bf4	Merge pull request #31081 from NousResearch/bb/tui-skinny-status-rule Some checks failed Deploy Site / deploy-vercel (push) Has been cancelled Details Deploy Site / deploy-docs (push) Has been cancelled Details Docker / shell lint / Lint Dockerfile (hadolint) (push) Has been cancelled Details Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Has been cancelled Details Docker Build and Publish / build-amd64 (push) Has been cancelled Details Docker Build and Publish / build-arm64 (push) Has been cancelled Details Lint (ruff + ty) / ruff + ty diff (push) Has been cancelled Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Has been cancelled Details Lint (ruff + ty) / Windows footguns (blocking) (push) Has been cancelled Details Nix / nix (macos-latest) (push) Has been cancelled Details Nix / nix (ubuntu-latest) (push) Has been cancelled Details Tests / test (1) (push) Has been cancelled Details Tests / test (2) (push) Has been cancelled Details Tests / test (3) (push) Has been cancelled Details Tests / test (4) (push) Has been cancelled Details Tests / test (5) (push) Has been cancelled Details Tests / test (6) (push) Has been cancelled Details Tests / e2e (push) Has been cancelled Details Docker Build and Publish / merge (push) Has been cancelled Details Docker Build and Publish / move-latest (push) Has been cancelled Details Tests / save-durations (push) Has been cancelled Details fix(tui): keep status rule one-line in skinny terminals	2026-05-25 01:24:29 -04:00
Ben Barclay	7e165e843d	Merge pull request #31760 from NousResearch/hermes/hermes-bf5898da feat(docker)!: s6-overlay container supervision (salvage of #30136)	2026-05-25 12:57:51 +10:00
Teknium	46f8948bad	test+harden(cli): cover parent-chain walk in concurrent-instance detection Follow-up to @Strontvod's fix. Tests: - Five new tests in test_update_concurrent_quarantine.py cover the parent- chain exclusion: the .exe launcher is excluded, an unrelated sibling hermes.exe is still reported, multi-level ancestry is fully excluded, PID cycles in the parent chain don't hang, and a partially-stubbed psutil (no Process attribute) degrades gracefully instead of crashing. - New _fake_psutil_with_parent_chain helper builds a fuller stand-in (Process / NoSuchProcess / AccessDenied + process_iter) than the process_iter-only SimpleNamespace the older tests use. Hardening: - Broaden the except in the parent-walk to bare Exception. The original fix listed (NoSuchProcess, AccessDenied, ValueError), but those names are evaluated lazily during exception matching — if psutil is a partial stub without the attribute, the exception handler itself raises AttributeError that escapes. The function is documented as 'never raises' (the surrounding update flow depends on it), so the broader catch keeps the contract regardless of how the dependency is shaped. AUTHOR_MAP: - Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit resolves to @Strontvod in the release notes. All 18 detect_concurrent + quarantine tests pass.	2026-05-24 19:51:46 -07:00
Strontvod	323cce7e94	fix: exclude parent process chain from concurrent instance detection on Windows On Windows, the setuptools-generated hermes.exe launcher is a separate native process that spawns python.exe (the interpreter running the update code). os.getpid() returns the Python PID, but the launcher (which holds the file lock) is the parent. Without walking the parent chain, every 'hermes update' reports its own launcher as a concurrent instance - a false positive. This patch builds an exclusion set containing the Python process and its entire ancestor chain, so the running invocation never reports itself.	2026-05-24 19:51:46 -07:00
Ben	da8b2e95fd	ci(docker): run tests/docker/ in build-amd64 against the freshly-built image The new tests/docker/ suite (added by this PR) was being picked up by the sharded pytest matrix in tests.yml, where its session-scoped `built_image` fixture issued a 3-7min `docker build` under tests/docker/conftest.py's 180s pytest-timeout cap. Every test in the directory failed in fixture setup across all 6 shards. Fix the suite so it actually runs (not skips): 1. Wire the docker tests into docker-publish.yml's build-amd64 job, right after the existing smoke test. The image is already loaded into the local daemon as `nousresearch/hermes-agent:test`; set HERMES_TEST_IMAGE to that and the fixture's pre-built-image branch short-circuits the rebuild. 21 tests run in ~90s locally against a prebuilt image, no rebuild cost on top of the existing build step. 2. Exclude tests/docker/ from scripts/run_tests_parallel.py's default discovery so the sharded matrix in tests.yml stops trying to build the image. Explicit positional paths (`pytest tests/docker/` or `scripts/run_tests.sh tests/docker/`) still pick the suite up — the skip rule honors directory-level user intent, matching the existing per-file override pattern. The dedicated docker-tests step runs on every PR that touches docker code (the existing path filters on docker-publish.yml already cover `tests/docker/` via `/*.py`), so the suite gates real changes. (cherry picked from commit `4c481860ce`)	2026-05-25 12:40:57 +10:00
Ben	c524b8a4dc	test(docker): fix svstat 'want up' assertion in profile-gateway lifecycle test After the supervise-perms fix lands, the s6 lifecycle actually works for the hermes user — hermes -p <profile> gateway start now genuinely brings the supervised gateway up rather than silently no-op'ing on EACCES. That exposes a latent bug in this test's assertion: it expected 'want up' to appear literally in s6-svstat output, but s6-svstat elides redundancies — when the slot is currently up AND s6 wants it up, the output is just 'up (pid N pgid N) X seconds'; the explicit 'want up' token only appears when current ≠ wanted (e.g. 'down (exitcode 1) … , want up' on a crash-loop). Add a small helper _svstat_wants_up() that reads the want-state correctly across both spellings: * 'up …' → wanted up (unless explicit 'want down') * 'down …, want up' → wanted up explicitly * 'down …' → wanted down Both stop and start assertions now use the helper. Also rewords the module docstring to acknowledge that the supervised process may succeed OR crash-loop depending on environment, but the want- state contract holds either way. (cherry picked from commit `02c933aedc`)	2026-05-25 12:25:06 +10:00
Ben	7d54288d82	test(dockerfile): recognize s6-overlay/init as a valid PID-1; harden against historical-comment masquerade PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed because the test hardcoded known_inits = ('tini', 'dumb-init', 'catatonit'). The PR replaced tini with s6-overlay's /init (which execs s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name, so the substring scan against ENTRYPOINT missed it. Two-part fix: 1. Extend the accepted token list to include 's6-overlay', 's6-svscan', and '/init'. The contract these tests enforce is behavioural ('some PID-1 init reaps SIGCHLD'), so the names list is purely a recognition table and any reaper-capable family should qualify. 2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the sibling check) against comment-only matches. It was scanning the full Dockerfile text and only passed because the word 'tini' is still in a historical comment explaining why we used to use it. The next person to clean up that comment would have silently broken the test. New _instruction_text() helper joins only the parsed, non-comment Dockerfile instructions so stale comments can't satisfy the check. (cherry picked from commit `ffc1bb6393`)	2026-05-25 12:24:58 +10:00
Ben	4f416fc40c	fix(docker): make s6 lifecycle work for the unprivileged hermes user Resolves the explicit "Known follow-up" left by commit `2f8ceeab9` and the resulting CI failures in tests/docker/test_dashboard.py and tests/docker/test_s6_profile_gateway_integration.py. The product gap --------------- Every hermes runtime operation inside the container runs as the hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned by s6-svscan running as PID 1 — creates each service's supervise/ and top-level event/ directories with mode 0700 owned by its effective UID (root). That left every s6-svc / s6-svstat / s6-svwait call from hermes hitting EACCES on the supervise/control FIFO and supervise/status — i.e. the entire S6ServiceManager lifecycle (register, start, stop, unregister) was inert in production. The `2f8ceeab9` commit message called this out and deferred the fix. The audit changes that landed alongside it (defaulting docker_exec to -u hermes) made the integration tests reproduce the bug deterministically; the fix below resolves it. The fix: pre-create the supervise/ skeleton hermes-owned ---------------------------------------------------------- Reading s6's source (src/supervision/s6-supervise.c::trymkdir + control_init), the mkdir and mkfifo calls that build the supervise tree are EEXIST-safe: if the directory or FIFO is already present, s6-supervise reuses it and skips the chown/chmod fix-up that would normally make event/ 03730 root:root. So if we lay the skeleton down with hermes ownership before triggering s6-svscanctl -a, s6-supervise inherits our layout and never touches it. The death_tally / lock / status regular files written later by s6-supervise (still as root) land mode 0644 — world-readable — which is all s6-svstat needs. New module-level helper _seed_supervise_skeleton(svc_dir) in hermes_cli/service_manager.py lays down: svc_dir/event/ hermes:hermes 03730 svc_dir/supervise/ hermes:hermes 0755 svc_dir/supervise/event/ hermes:hermes 03730 svc_dir/supervise/control hermes:hermes 0660 (FIFO) svc_dir/log/event/ hermes:hermes 03730 (if log/ present) svc_dir/log/supervise/ hermes:hermes 0755 svc_dir/log/supervise/event/ hermes:hermes 03730 svc_dir/log/supervise/control hermes:hermes 0660 (FIFO) The log/ branch matters because the logger is a second s6-supervise instance — without it, unregister rmtree races on the logger's root-owned supervise dir even after the parent slot's supervise/ is hermes-owned. The helper is idempotent and swallows PermissionError on chown so it works equally well when called from root (cont-init.d) or hermes (runtime register). Wiring ------ 1. S6ServiceManager.register_profile_gateway calls _seed_supervise_skeleton(tmp_dir) just before publishing the slot via Path.replace. Runtime-registered profile gateways are set up by hermes. 2. container_boot._register_service does the same in the cont-init.d reconciliation path so boot-time-restored profile slots inherit the same layout. 3. New cont-init.d/015-supervise-perms script chowns the supervise/ and event/ trees for STATIC s6-rc services (dashboard, main-hermes). These are spawned by s6-rc before cont-init.d gets to run, so the EEXIST-trick doesn't apply; we chown the already-existing tree instead. s6-supervise keeps using the same files; it never re-asserts ownership on a running service. The script skips s6-overlay internal services (s6rc-, s6-linux-) so the supervision tree itself stays root-only. 015- slot is intentional: lex-sorts between 01-hermes-setup and 02-reconcile-profiles in the container's C-locale, so the chown finishes before the reconciler walks the scandir. Unregister teardown reordering ------------------------------ S6ServiceManager.unregister_profile_gateway now fires s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so s6-svscan reaps the supervise child and releases its file handles on supervise/lock + supervise/status before we try to remove the directory. Previously rmtree raced s6-supervise on a set of files inside the supervise dir, and even with the parent supervise/ now hermes-owned, the contained files (death_tally, lock, status, written by root) could still be in use. Dashboard down-state redesign ----------------------------- The original PR #30136 review fix wrote a 'down' marker file into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle. That approach was broken in two ways: (a) /run/service/dashboard is a symlink to a TRANSIENT /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is mid-transaction; the touch landed in a soon-to-be-discarded tmp. (b) Even when written to the final /run/s6-rc/servicedirs/ location, the 'down' file is only consulted by s6-supervise at slot startup. s6-rc's user-bundle explicitly transitions 'dashboard' to 'up' on every boot, overriding any down marker. The right fix is the canonical s6 pattern: when HERMES_DASHBOARD is unset, the dashboard run script exits 0 and a companion finish script exits 125. Per s6-supervise(8), exit code 125 from the finish script is the 'permanent failure, do not restart' marker — equivalent to s6-svc -O. The slot reports as 'down' to s6-svstat, matching the reality that no dashboard process is running. When HERMES_DASHBOARD IS truthy, finish exits 0 and restart-on-crash semantics apply. 03-dashboard-toggle is removed (its function is now subsumed by the run/finish pair). Tests ----- Adds four unit tests for _seed_supervise_skeleton covering the produced layout, the log/ subservice case, the skip-when-no-log case, and idempotency. The live-container verification continues to live in tests/docker/test_s6_profile_gateway_integration.py and tests/docker/test_dashboard.py — both now pass against the rebuilt image. References ---------- * Skarnet skaware mailing list 2020-02-02 (Laurent Bercot + Guillermo Diaz Hartusch) on unprivileged s6 tool semantics: http://skarnet.org/lists/skaware/1424.html * just-containers/s6-overlay#130 — same EEXIST-preseed pattern, community-validated 2016 onward * https://skarnet.org/software/s6/servicedir.html — exit-code 125 semantics in finish scripts (cherry picked from commit `c41f908ad4`)	2026-05-25 12:23:23 +10:00
Ben Barclay	a3abeb5954	Merge pull request #31775 from NousResearch/extending-docker-docs docs(docker): add 'Installing more tools in the container' section	2026-05-25 11:41:59 +10:00
Ben	6840ca2d1e	docs(docker): add 'Installing more tools in the container' section Documents five approaches for adding tools beyond what the official image ships with: npx/uvx for npm/Python tools, ad-hoc apt installs that Hermes remembers, derived images for durability, sidecar containers for multi-service stacks, and upstreaming via issue/PR for broadly useful additions.	2026-05-25 11:40:58 +10:00
teknium1	7f6f00f6ec	test(dockerfile): accept s6-overlay /init as a known PID-1 init Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1 contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added with #15012) hardcoded tini/dumb-init/catatonit as the only accepted inits, so they failed after #30136 replaced tini with s6-overlay's /init. s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly on SIGCHLD — same contract the test exists to enforce. Two updates: * test_dockerfile_installs_an_init_for_zombie_reaping — accept 's6-overlay' as a known-installed marker (matches the s6-overlay install layer in Ben's Dockerfile). * test_dockerfile_entrypoint_routes_through_the_init — accept '/init' as a known-routed marker (s6-overlay's PID-1 binary lives at /init by convention). Both assertions still fire if a future Dockerfile rewrite drops the init entirely. Local: 7/7 pass.	2026-05-24 18:32:14 -07:00
teknium1	5cbb132c1d	fix(ci): exclude tests/docker/ from regular test shards; pin read_text encoding Two CI follow-ups to @benbarclay's #30136 salvage: 1. scripts/run_tests_parallel.py — add 'docker' to _SKIP_PARTS so the new tests/docker/ harness doesn't run in the regular test (N) matrix. The harness builds the real Dockerfile in a session fixture, which can exceed pytest-timeout's 180s ceiling on ubuntu-latest where Docker IS available — it surfaced as 6 identical setup-timeout failures across slices 1–6 on the first CI run. The docker harness has its own dedicated runner via .github/actions/hermes-smoke-test (added in #30136) plus the docker-lint workflow. Same treatment as tests/integration/ and tests/e2e/ — runs separately, not in the main shards. 2. hermes_cli/service_manager.py — pin encoding='utf-8' on the /proc/1/comm read_text call. Ruff PLW1514 enforcement rolled in between Ben's last push and the salvage; pure ruff-fix, no behavior change.	2026-05-24 18:23:13 -07:00
teknium1	af144cd60d	fix(model): include Premium+ in xAI OAuth label X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription' wording suggested SuperGrok was the only entitlement path. Updated to 'SuperGrok / Premium+' across the picker label, setup wizard, auth flows, and docs so Premium+ subscribers know the row applies to them too.	2026-05-24 18:12:16 -07:00
helix4u	4987fd2a59	fix(model): disambiguate xAI OAuth picker label	2026-05-24 18:12:16 -07:00
Teknium	031f9c9edc	fix(image_gen): cache xAI ephemeral URL responses to disk (#26942 ) (#31759 ) xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs that 404 within minutes — long before downstream consumers (Telegram send_photo, browser preview, multi-tier delivery fallback) get a chance to fetch them. The xAI image_gen provider was passing those URLs through unchanged on the elif url: branch; b64 responses were already cached locally via save_b64_image. Result: every image_generate call on a Telegram-routed xai-oauth profile delivered no image, falling through to text-only. Adds agent.image_gen_provider.save_url_image() — a sibling helper to save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/. Content-type-aware extension inference with URL-suffix fallback; oversize cap (25MB default) with partial-write cleanup; empty-body refusal. Mirrors the audio_cache pattern used by text_to_speech. Wires save_url_image into both the xAI and OpenAI providers' URL branches. When the download fails (network blip, 404 in-flight) we log a warning and fall back to the bare URL rather than turning the tool call into a hard error — the gateway's existing URL-send fallback then gets a chance to surface the original error legibly. Test plan: - tests/agent/test_save_url_image.py — 8 direct tests against a real in-process HTTP server: bytes round-trip, content-type → extension, URL-suffix fallback, default-to-png, 404 propagation, empty-body refusal, oversize cap + cleanup, filename uniqueness. - tests/plugins/image_gen/test_xai_provider.py — flip test_successful_url_response (was asserting the bug), add test_url_response_falls_back_to_bare_url_when_download_fails. - tests/plugins/image_gen/test_openai_provider.py — symmetric pair. 160/160 in the broader image_gen test surface.	2026-05-24 18:10:47 -07:00
teknium1	a4092ab217	fix(profiles): short-circuit s6 hooks on host before importing service_manager Follow-up to @benbarclay's Docker s6 PR (#30136). The Phase 4 hooks `_maybe_register_gateway_service` and `_maybe_unregister_gateway_service` were already documented as "no-op on host", but they reached that no-op by: 1. importing `hermes_cli.service_manager` 2. calling `get_service_manager()` (which calls `detect_service_manager()`) 3. checking `mgr.supports_runtime_registration()` and returning False If anything in step 1 or 2 raised an unexpected exception (e.g. a host machine with a partial s6 install — `/proc/1/comm == s6-svscan` somehow, but `/run/s6/basedir` absent, or vice versa), the `except Exception` in the hook would print a confusing "⚠ Could not register s6 gateway service: ..." warning on a non-container machine that has never touched the container. Reorder so `detect_service_manager() != "s6"` is checked FIRST, and return silently for any detection failure. Host machines now: - never import the s6 backend - never call get_service_manager() - never print an s6-shaped warning under any failure mode E2E confirmed on host Linux (systemd): `_maybe_register_gateway_service(...)` produces empty stdout, detect_service_manager() returns "systemd". Existing tests updated to patch `detect_service_manager` for the s6 call-through cases (they previously relied on get_service_manager being the only gate, which is no longer true). Added one new test — `test_register_silent_when_detect_throws` — asserting that a broken detector cannot leak a warning to host users. cc @benbarclay — visible behavior change vs. your branch is one fewer code path on host. Test changes are minimal (one helper + `_patch_detect_s6` opt-in per s6 test). Happy to revert if you prefer the original shape.	2026-05-24 18:07:47 -07:00
kshitijk4poor	af973e4071	refactor(gateway): migrate Mattermost adapter to bundled plugin Second migration of an existing built-in platform adapter after Discord (PR #30591) — follows the same shape established by IRC / Teams / LINE / Google Chat / SimpleX and the playbook in `references/platform-plugin-migration.md`. Advances the umbrella refactor in #3823. Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/` with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell, `register(ctx)` entry point, no back-compat shim at the old import path, and full parity for all five hooks Discord uses plus the `apply_yaml_config_fn` hook (mattermost is the second consumer of #25443 after Discord): * `standalone_sender_fn` — out-of-process cron delivery via Mattermost REST API. Picks up the thread_id + media_files capabilities the legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`). * `setup_fn` — interactive `hermes setup gateway` wizard. * `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys (`require_mention`, `free_response_channels`, `allowed_channels`) into `MATTERMOST_` env vars (replaces the hardcoded block in `gateway/config.py`). `is_connected` — declares connection state from `MATTERMOST_TOKEN` + `MATTERMOST_URL`. * `check_fn` — verifies aiohttp is installed and both required env vars are set. * plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`, `max_message_length` (4000 — Mattermost practical limit), `emoji`, `required_env`, `install_hint`. Files ----- * `gateway/platforms/mattermost.py` (873 LOC) → `plugins/platforms/mattermost/adapter.py` (git rename, R071) + appended `register()` block, hook helpers, and `_standalone_send` with media upload + thread_id support. * New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with `requires_env` / `optional_env` declarations covering MATTERMOST_URL, MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS, MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE, MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS, MATTERMOST_ALLOWED_CHANNELS. * `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge (moved into plugin's `_apply_yaml_config`). * `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` — replaced by the existing generic plugin-registry-first dispatch. * `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) + `Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch already routes plugin platforms through `_send_via_adapter`, which hits the registry's `standalone_sender_fn`. * `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced by the plugin's `interactive_setup`. * `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry (3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in `_configure_platform`. * Consumer rewrite: 5 test files (test_mattermost.py, test_media_download_retry.py, test_send_multiple_images.py, test_stream_consumer.py, test_ws_auth_retry.py) get `gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter` with the bulk-rewrite recipe from the platform-plugin-migration playbook. Single `mock.patch` string in test_stream_consumer.py also repointed. * `tests/tools/test_send_message_missing_platforms.py`: thin `(token, extra, chat_id, message)` compat shim around the plugin's `_standalone_send(pconfig, …)` so existing test bodies continue to work without rewriting every signature. Validation ---------- * Plugin discovery: mattermost registers from `plugins/platforms/mattermost/` alongside discord / teams / irc / line / google_chat / simplex. All 9 hooks present (setup_fn, standalone_sender_fn, apply_yaml_config_fn, is_connected, check_fn, allowed_users_env, allow_all_env, cron_deliver_env_var, max_message_length=4000). * Mattermost-touching tests: 62/62 pass (`test_mattermost.py` + `test_send_message_missing_platforms.py`). * Targeted selectors (mattermost or platform_registry or stream_consumer or ws_auth_retry or media_download_retry or send_multiple_images or send_message_tool or platform_connected): 433/433 pass. * Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/ tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py tests/integration/`): 6220/6220 pass in 47.8s, 0 failures. * Lint: ruff clean on all touched files. * Git identity verified: kshitijk4poor. * Rename detection: R071 (similarity dropped from a hypothetical R09x by the ~320-line appended register block — ~36% growth over the 873-LoC base, vs Discord's 5101 LoC base which kept R091). Closes part of #3823.	2026-05-24 18:05:33 -07:00
Ben	6c49bdc4f4	docs(plans): trim s6-overlay plan to a post-implementation reference PR #30136 review item O7: the plan doc was 3,191 lines — 5x the size of any other plan in docs/plans/ and the largest reference document in the repo. With the implementation shipped, most of that content is either: * The phase-by-phase TDD walkthrough (~2,800 lines): now canonical in the PR commit log (`git log a957ef083..a6f7171a5`). * The v2/v3 re-validation preambles: artifacts of the planning process, no longer load-bearing. * The full Open Questions deliberations with options A/B/C laid out: collapsed into the Decision Log. * The Rollout Plan and Estimated Timeline: history. Trim to ~430 lines covering what readers actually need going forward: the goal, architecture, scope, key design decisions (D1–D9), risk register (now including the three risks surfaced in PR review — `_s6_running` detection, svscanctl FIFO perms, supervise control FIFO perms), the decision log including the post-merge additions, and the verification checklist (now all boxes ticked). Header now reads 'Status: shipped' and points at the PR. The git history preserves the full v3 plan for anyone who needs it.	2026-05-24 18:05:33 -07:00
Ben	cd5b2c4123	test(docker): poll for boot-log signal instead of fixed sleeps PR #30136 review item O6: test_container_restart.py used fixed `time.sleep(8)` calls after `docker restart` to wait for the cont-init reconciler to finish. Fixed sleeps are slow when the event happens fast and false-fail when the event happens slow. Replace with two polling helpers: * `_wait_for_path(container, path, kind='f' \| 'd', deadline_s=...)` — generic `test -f/-d` poller. Returns True on success, False on timeout; callers assert with a clear message. * `_wait_for_reconcile_log_mention(container, profile, ...)` — the reconciler's per-profile log line is the canonical signal that the cont-init reconcile has finished for that profile. Poll on it instead of a sleep that hopes 8 seconds is enough. The fixture-level setup wait is similarly migrated: it now polls for `profile=default` in the boot log (every container always gets a default-slot entry per item I1) and raises a clear timeout error from the fixture if the container never finishes cont-init — much better diagnostics than a mid-test KeyError. The remaining `time.sleep()` calls are all internal interval_s between probe attempts; no fixed wait points left.	2026-05-24 18:05:33 -07:00

1 2 3 4 5 ...

9490 commits