Default spawn did not propagate HERMES_HOME when forking kanban workers.
The worker's env is copied from the parent via dict(os.environ), so
HERMES_HOME is absent. When the child then starts hermes -p <profile>,
the CLI's _apply_profile_override() runs before hermes_constants is
imported and get_hermes_home() falls back to ~/.hermes (the default
profile root), silently ignoring the profile's config.yaml. Profile-
scoped fallback_providers, toolsets, and agent settings are therefore
never applied to kanban workers.
The fix injects HERMES_HOME into the worker's env using
resolve_profile_env(profile_arg) so the child reads the correct profile
directory instead of the default root.
When a kanban worker subprocess hits the iteration budget, the agent
loop strips tools and asks the model for a summary. The model cannot
call kanban_block itself at that point, so the process exits rc=0
without calling kanban_complete or kanban_block — a protocol violation
that the dispatcher detects as a fatal error, giving up after 1 failure
and stranding downstream tasks.
Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK
and call kanban_block with a reason describing the exhaustion. The
dispatcher then sees a clean block transition instead of a protocol
violation, and the task can be retried or escalated by a human.
Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget
exhaustion without calling kanban_complete or kanban_block #23216
The container entrypoint ran `chown -R` on $HERMES_HOME every start.
`chown` strips the setgid bit (kernel security behavior), destroying
the 2770 permissions the NixOS activation script sets for group access
by hostUsers. This caused PermissionError for interactive CLI users
even though they were in the hermes group.
Replace with `find ... ! -user $UID -exec chown` which only touches
files with wrong ownership, leaving correctly-owned directories and
their permission bits intact.
Affects: container.enable + container.hostUsers + addToSystemPackages
Related: #19795, #19788, #9383
Expose the dependency-groups parameter from python.nix through
hermes-agent.nix and the NixOS module, allowing users to opt into
pyproject.toml optional extras (e.g. hindsight, voice, matrix) that
are resolved by uv inside the sealed venv.
Unlike extraPythonPackages (which appends to PYTHONPATH and requires
collision checking), extraDependencyGroups resolves the full dependency
graph in a single uv pass — no PYTHONPATH patching, no version
conflicts, no collision risk.
When to use which:
- extraDependencyGroups: enable a pyproject.toml optional extra
- extraPythonPackages: add an external Python plugin not in pyproject.toml
Usage:
services.hermes-agent.extraDependencyGroups = [ "hindsight" ];
Or via overlay:
pkgs.hermes-agent.override { extraDependencyGroups = [ "hindsight" ]; }
Refs: #8873, #9194
Declares hindsight-client as an optional dependency group [hindsight]
in pyproject.toml. This allows build-time inclusion for environments
where runtime pip install is not possible (NixOS sealed venvs, Docker,
Kubernetes).
Not included in [all] — memory providers are plugins and should be
opted into explicitly.
Install via:
uv sync --extra hindsight
pip install hermes-agent[hindsight]
NixOS (with extraDependencyGroups):
services.hermes-agent.extraDependencyGroups = [ "hindsight" ];
Closes#8873
Two independent opt-in QoL toggles, both off by default.
terminal.docker_extra_args:
- List of extra flags appended verbatim to docker run after security
defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or
other docker run options not exposed by existing config keys.
- Non-string entries are logged and skipped.
- Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var.
display.timestamps:
- Appends [HH:MM] to user input bullet and the assistant response box
header. Single hub in _format_submitted_user_message_preview()
covers both single-line and multi-line user previews; assistant
response label gets the timestamp at box-open time.
Closes#1569 (timestamps).
Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
When an auxiliary provider returns HTTP 402 (credit / payment), every
subsequent compression / title-gen / session-search / vision call still
re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402
again, then falling back. On a long Discord/LCM session that meant dozens
of doomed 402s per minute (issue #23570).
Add a per-process unhealthy-provider cache with a 10 min TTL. When any
caller observes a payment error against a provider, the label is marked
unhealthy and skipped by:
* _resolve_auto Step-1 (main provider use-as-aux path)
* _resolve_auto Step-2 (aggregator/fallback chain)
* _try_payment_fallback (used by call_llm/acall_llm on first 402)
Skip-logs are throttled to once per minute per label so a bursty session
doesn't spam agent.log. Entries auto-expire so a topped-up account
recovers without manual intervention. The cache is in-process only by
design — multi-profile users with different keys per profile must each
hit the 402 once.
Refs #23570
When the Discord typing API call fails (rate limit, network error, 403),
_typing_loop returns early but the stale task remains in _typing_tasks.
Subsequent send_typing calls see the stale entry and skip, leaving no
typing indicator for the rest of the agent invocation.
Add finally block to _typing_loop to always remove the task from
_typing_tasks on exit, whether from cancellation, error, or normal
completion. This allows send_typing to create a fresh task.
3 new tests in test_discord_send.py:
- Task removed after API error
- Typing restartable after failure
- stop_typing cleans up
A YAML parse error in ~/.hermes/config.yaml caused load_config() to print
one line to stdout (Warning: Failed to load config: ...) and silently fall
back to DEFAULT_CONFIG, dropping every user override (auxiliary providers,
fallback chain, model settings). Users only noticed when downstream
behavior misbehaved — see issue #23570 where a tab-indent error in the
auxiliary section caused aux fallback to use OpenRouter (depleted) instead
of the configured Codex/MiniMax chain.
Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line
to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam,
and re-warn after the user edits the file. Both call sites (raw read +
merged load) route through the same helper.
Refs #23570
Salvages the three substantive low-severity fixes from Gutslabs' #1974
"misc bug fixes" bundle. The other 8 claims in that PR were either
already fixed on main with superior implementations (state lock,
firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema
migrations) or did not survive review.
- run_agent: `_materialize_data_url_for_vision` uses
`NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a
corrupt data URL the temp file would persist forever. Wrap the
write in try/except and `os.unlink` the temp on failure.
- gateway/session: `append_to_transcript` JSONL write had no error
handling, so disk-full / read-only-fs / permission errors crashed the
message handler. The SQLite write above is the primary store, so
swallow OSError on the JSONL fallback with a debug log.
- gateway/status: `_read_pid_record` reads `pid_path.read_text()` after
an `exists()` check; if the PID file is deleted between the two
calls (concurrent gateway restart) we hit an unhandled OSError.
Catch it and return None.
Adds a regression test for the tempfile cleanup; the other two paths
are defensive try/excepts on infrequent OSError that don't warrant
dedicated tests.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Re-authored against current main from PR #10388 by @wilsen0. The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.
Tuning summary
==============
Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
total <= 320 cp -> min(cap, 0.18)
total <= 1024 cp -> min(cap, 0.24)
else -> cap
A single short reply now reaches the agent in ~180ms instead of
600ms. Tier constants compose with the configured cap via min()
so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()).
Rejects NaN / Inf, applies optional min/max bounds. Used for
text-batch + media-batch knobs. Prevents asyncio.sleep(NaN)
crashes when an operator typos an env var.
Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
a single source of truth. StreamConsumerConfig imports them
instead of duplicating the literals; the prior dual-source drift
is fixed.
Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside
Telegram's ~1 edit/s flood envelope the 'all' default would
accumulate edit pressure on busy chats; 'new' shows only the
leading bubble per tool batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.
Composition with native draft streaming (#23512)
================================================
The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based. The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport. No conflict.
Stale-base avoidance
====================
Re-authored from scratch rather than cherry-picked. Dropped from the
original branch:
- Unrelated d2f043f9c 'fix(anthropic): preserve third-party thinking
continuity' commit
- boot_md.py builtin gateway hook (unrelated)
- Reverted Slack tool_progress='off' (#14663) restoration
- Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO
members deletion
- 2300+ lines of run.py base-skew noise
Tests
=====
New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.
Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for
Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new
and Discord-stays-all assertions.
Closes#10388.
Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
More specific name. The skill is REST + GraphQL debugging end-to-end,
not generic 'api testing' (a smoke-test pytest scaffold is one short
section out of ~500 lines). Renames directory + frontmatter name +
self-reference in the delegate_task example body.
- Implement tests for normalizing perpetual markets and DEXs.
- Validate JSON output for main commands including markets, candles, and review.
- Ensure environment variable resolution and dotenv file reading are covered.
- Test export functionality for market data with expected output structure.
agent.redact._REDACT_ENABLED is snapshotted at import time from
HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker
can flip it, so test_exec_command_output_is_redacted was order-dependent.
Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.
1. Quick command exec ran in the gateway process's full environment
without env sanitization or output redaction. A quick command like
"env" or "printenv" would leak all API keys, OAuth tokens, and
bot credentials to the messaging user.
Fix: apply _sanitize_subprocess_env() before exec and
redact_sensitive_text() on output before returning.
2. GatewayRunner._pending_messages was written on every interrupt
(lines 1331-1334) but never read or consumed anywhere. The actual
interrupt delivery uses adapter._pending_messages (a separate dict).
Removed the write-only accumulation to prevent unbounded growth.
Two new tests:
- tests/gateway/test_telegram_format.py
test_message_too_long_splits_into_continuations_not_silent_truncation:
asserts edit_message returns success=True with continuation_message_ids
populated and message_id pointing at the last continuation when
content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original
fail-on-overflow assertion with the split-and-deliver contract.
- tests/gateway/test_stream_consumer.py
TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver:
asserts the consumer side updates _message_id to the latest
continuation, clears _last_sent_text, and fires on_new_message when
the adapter reports a split-and-deliver result.
When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit,
the adapter caught the BadRequest, best-effort truncated the content
with '…', and returned SendResult(success=True). The stream consumer
believed the full edit was delivered and never recovered, silently
dropping everything past the truncation boundary on long replies.
Returning failure isn't safe either — the consumer's existing fallback
path can race against the next streaming tick, producing duplicate
sends or gaps. Instead, the adapter now SPLITS the oversized payload
across the existing message + new continuation messages, so the user
always gets the full reply in correct order.
How it works:
1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH,
call the new _edit_overflow_split helper directly — saves a doomed
round-trip + a Telegram error.
2. Reactive: if Telegram still returns 'message_too_long' after the
pre-flight (e.g. parse_mode formatting inflated the payload past
the limit via MarkdownV2 escapes), the same helper handles it.
3. _edit_overflow_split:
- Splits via truncate_message(len_fn=utf16_len) — same chunking the
non-streaming send() path uses; chunks get '(1/N)' suffixes.
- Edits the original message_id with chunk 1 (with parse_mode +
plain-fallback when finalize=True, mirroring the main edit path).
- Sends each remaining chunk via self._bot.send_message threaded as
a reply to the previous chunk so the user sees them as a
contiguous block. MarkdownV2-with-plain-fallback per chunk on
finalize.
- Returns SendResult(success=True, message_id=<last_chunk_id>,
continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the
stream consumer can keep editing the most recent visible message
and the gateway has full visibility into every message id.
SendResult contract extension:
Added optional continuation_message_ids: tuple = () field. When
empty (the common case), behavior is unchanged. When populated, the
caller knows the adapter delivered across multiple platform messages.
Stream consumer integration:
GatewayStreamConsumer._send_or_edit advances _message_id to the
last-continuation id when it sees continuation_message_ids on a
successful edit result, resets _last_sent_text (the new visible
message holds only the final chunk's text), and fires
on_new_message so tool-progress bubbles linearize below the new
continuation rather than the original. Mirrors the openclaw #32535
inter-tool-leak guard.
Composes with what just landed:
- PR #23455 (UTF-16 length-aware splitting in stream consumer)
prevents most overflows upstream by measuring text in UTF-16
codeunits before deciding to split. This PR is the safety net at
the adapter boundary.
- PR #23512 (native draft streaming, default for DM Telegram) routes
DM streaming through send_draft, which has its own contract
unaffected by this change. So this fix narrows in scope to the
edit-based path: groups, supergroups, forum topics, every
non-Telegram platform, and the per-response fallback after a
draft failure.
Salvage notes:
- Cherry-picked from PR #19537 by @kjames2001. Original PR returned
failure on overflow; this evolves to split-and-deliver so users
never lose content and the consumer state stays consistent.
- Dropped an unrelated model-picker hunk (line 2114-2117) that
silently killed the 'X more available — type /model <name>
directly' hint by hardcoding total=len(models). Not in scope.
- Restored the timeout-aware retryable=not is_timeout signal in
send()'s fallthrough catch block.
Closes#19537.
Surface ready tasks that nobody claims within a threshold (default
30 min) regardless of why. One identity-agnostic signal that catches:
- Operator typo'd the assignee
- Profile was deleted, leaving its tasks stranded
- External worker pool (Codex CLI lane, custom daemon) is down
- Dispatcher misconfigured (wrong board / wrong HERMES_HOME)
Today the dispatcher correctly skips these (no respawn loop, good)
but nothing surfaces the fact that operator-actionable work is
accumulating. The new `stranded_in_ready` rule does that without
requiring a manual lane registry — it reads the most recent ready-
transition event (`created` / `promoted` / `reclaimed` / `unblocked`)
and fires when (now - last_ready_ts) > threshold.
Severity escalates with age: warning at threshold, error at 2x,
critical at 6x. The cli_hint and reassign actions point operators
at the right next step.
Out of scope deliberately:
- Lane registry (#20157 closed) — this signal supersedes it.
- Pushing the diagnostic into messaging gateways — diagnostics
are pull-only via 'hermes kanban diagnostics' for now; gateway
push is a separate UX decision.
Tests: 10 new + 461 existing kanban tests pass. E2E verified end-
to-end via 'hermes kanban diagnostics --json' against a 2h-old
stranded task — surfaces as error severity with correct actions.
- Bug 1: shift-click now always adds the target card and sets it as the
last-selected anchor, so range selection works even when 0 or 1 cards
are selected.
- Bug 2: column select-all checkbox now toggles: if every card in the
column is already selected, clicking unselects them all.
- Bug 3: applyBulk now mirrors moveSelected with optimistic UI updates
for status moves and calls loadBoard() on catch for consistency.
The Task dataclass has no `summary` field; only Run carries summary.
The dashboard already searches `latest_summary` (derived from the
latest run), so `t.summary` in the client-side haystack was always
undefined and therefore redundant.
Verdict from task t_4bcac44f:
- Before batch QOL (6c7ec94d9): search only covered id, title,
assignee, tenant.
- Batch QOL (7fd187102) correctly added body, result, latest_summary.
- `t.summary` was included but is a misleading no-op because tasks
never expose a `summary` key — `latest_summary` already covers it.
Removes the redundant field from the haystack only.
- When dragging a selected card while multiple cards are selected, the
browser ghost image now shows a 'N cards' badge instead of a single card.
- All selected cards in the original column are dimmed (opacity 0.45 +
grayscale) during the drag so the user sees the whole set is in-flight.
- Uses React state for the dragged task id; event delegation on the board
columns container to avoid deep prop threading.
- Preserve failedIds partial-failure highlighting after moveSelected/
applyBulk by clearing only selectedIds/lastSelectedId instead of
calling clearSelected() (which also wiped failedIds).
- Fix touch/native multi-drag drop stale closure by adding
props.selectedIds and props.onMoveSelected to the hermes-kanban:drop
useEffect dependency array.
Fixes t_5bfafb73.
- Extend BulkTaskBody with reclaim_first: bool = False
- In bulk_update, use kanban_db.reassign_task(..., reclaim_first=True)
when payload.reclaim_first is set and assignee is present
- Falls back to existing assign_task behavior when reclaim_first is false
This enables the dashboard to bulk-reassign running tasks by
reclaiming their claims first, matching the single-task
/tasks/{id}/reassign endpoint behavior.
Live-tested on gemini-3-flash-preview the judge kept returning empty
or non-JSON content, tripping the consecutive-parse-failures auto-
pause. Free-form JSON output is hopeful; tool-call schemas are
enforced server-side by virtually every modern provider.
Two new tools the judge calls:
- submit_checklist(items) — Phase A, decompose
- update_checklist(updates, new_items, reason) — Phase B, evaluate
Both phases now call the auxiliary client with tool_choice forcing
the right tool. read_file remains for Phase B history inspection,
with the loop exiting only when update_checklist is called or the
read budget is exhausted (at which point read_file is dropped from
the toolbox and update_checklist is forced).
Robustness:
- _call_judge_with_tool_choice falls back tool_choice forced→required→
auto if the provider rejects a particular shape.
- If a fully-broken provider still returns content instead of a tool
call, the legacy JSON-text parsers stay around as a last-ditch
backstop so we never silently lose a checklist.
- _normalize_update_args replaces the JSON parser for the apply
layer; same 1-based→0-based conversion + terminal-status filter.
Live verification: same fizzbuzz goal that was hitting 'judge model
returned unparseable output 3 turns in a row' before now terminates
in 2 turns, all 11 items marked completed with item-specific
evidence, no auto-pause. Agent log shows
'produced 11 checklist items via tool call' instead of the JSON-
parse path.
Tests: 7 new cases for the tool-call path (Phase A success, Phase B
update only, Phase B read_file→update, JSON-content backstop,
empty-text item dropping, non-terminal status filter).
When run_agent's _compress_context fires mid-turn it ends the parent
session in SessionDB and creates a new continuation session with a
fresh session_id. The /goal state is keyed on session_id in
state_meta ("goal:<sid>"), so without forwarding the goal silently
disappears: _get_goal_manager() rebinds for the new session_id,
load_goal() returns None, mgr.is_active() is False, and the
continuation loop dies with no user-visible signal.
Fix: in the same SessionDB transaction block that creates the
continuation session, copy state_meta[goal:<old>] →
state_meta[goal:<new>] when present. No-op when the user has no
active goal. Logged at INFO so a stuck loop is debuggable.
Tests cover the round-trip via SessionDB and the no-op path.
Affects all three run-conversation surfaces (CLI, gateway, TUI
gateway) because _compress_context is the single rotation site.
Out-of-scope behavior change in #23521 — the kanban notifier-routing fix
also flipped the 'kanban create --created-by' default from 'user' to the
active profile name. Revert to keep PR scope focused on the notifier
ownership fix; the profile-aware author default can be its own change.
Added tests/gateway/test_stream_consumer_draft.py with 11 tests
covering:
- Transport selection: auto+dm-supported -> draft; auto+group -> edit;
explicit edit; explicit draft on unsupported adapter -> edit;
MagicMock adapter -> edit (back-compat for the existing test suite).
- Happy path: DM stream animates draft frames with a single shared
draft_id, then finalizes via a regular adapter.send.
- Group fallback: drafts entirely skipped in non-DM chats.
- Failure fallback: send_draft returning success=False disables drafts
for the rest of the response.
- Draft_id lifecycle: consecutive responses use distinct ids; tool
boundaries bump the id so post-tool text animates fresh below the
tool-progress bubble (the openclaw #32535 leak guard).
- _already_sent contract: drafts must NOT set the flag so the gateway's
fallback final-send still fires (drafts have no message_id).
Updated website/docs/user-guide/messaging/telegram.md with a
'Streaming transport' section explaining auto|draft|edit|off, the
DM-only constraint, and the per-response fallback behaviour.
Adds Telegram's native streaming-draft API as a streaming transport so DM
replies render with smooth animated previews as tokens arrive, dropping
the per-edit jitter of the legacy editMessageText polling path.
Adapter contract (gateway/platforms/base.py):
- supports_draft_streaming(chat_type, metadata) -> bool. Default False.
Telegram returns True only for DMs and only when the bound python-
telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
- send_draft(chat_id, draft_id, content, metadata) -> SendResult.
Default raises NotImplementedError. Telegram delegates to PTB's
send_message_draft. Drafts have no message_id (Bot API contract);
SendResult.message_id is None on success.
Telegram adapter (gateway/platforms/telegram.py):
- supports_draft_streaming gates on chat_type='dm' AND PTB capability.
- send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads
message_thread_id through metadata, and routes failures back as
SendResult(success=False, error=...) so the consumer can fall back.
Stream consumer (gateway/stream_consumer.py):
- StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off')
and chat_type fields.
- run() resolves _use_draft_streaming once via a probe at the top of
the run, allocating a fresh class-wide draft_id_counter so each
response animates as its own preview (no animation collision across
consecutive responses to the same chat).
- _send_or_edit gains a pre-edit branch: when drafts are active AND
not finalizing AND no edit-path message_id is established, the
frame routes through _send_draft_frame instead of edit_message.
Drafts intentionally do NOT set _already_sent so the gateway's
final sendMessage path still fires — drafts have no message_id and
the user needs a real message in their chat history.
- _reset_segment_state bumps the draft_id when the consumer is in
draft mode so each text block after a tool boundary animates as a
fresh preview below the tool-progress bubble (avoids the inter-
tool-call leak openclaw documented in their #32535).
- Per-response fallback: any send_draft failure (transient network,
server reject, capability gap) flips _use_draft_streaming to False
for the rest of the run, gracefully returning to the edit path.
Gateway config (gateway/config.py):
- StreamingConfig.transport default flips edit -> auto. The auto path
is identical to edit on every chat type that doesn't currently
support drafts (groups, supergroups, forum topics, every non-
Telegram platform), so the default is backwards-compatible for
non-DM users.
Lifecycle model (Telegram Bot API 9.5):
1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
2. Repeated sendMessageDraft calls with the SAME draft_id animate
the preview as text grows.
3. Drafts have no message_id and cannot be edited or deleted.
4. When the response finishes the gateway's normal sendMessage path
delivers the final answer; the draft preview clears naturally on
the client and the user sees a real message in their history.
Inspired by PR #3412 by @NivOO5. Re-authored against current main
(stream_consumer.py is now ~4x larger than at #3412's branch base, with
new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the
original PR didn't account for) but the design call (DM-only, edit-
fallback, transport=auto|draft|edit|off) is faithful to the original
proposal, with two improvements baked in:
1. Per-response draft_id (monotonic counter, not a time hash) — no
collision risk across consecutive responses on the same chat.
2. Tool-boundary draft_id bump — prevents the inter-tool-call leak
openclaw hit during their rollout (their #32535).
Closes#21439 (duplicate feature request).
Follow-up to TreyDong's fix: switch the auth header to
`X-Hermes-Session-Token` (the canonical pattern used by the rest of
the dashboard SPA — see `web/src/lib/api.ts` `fetchJSON()`). The
server still accepts both schemes, so the original `Authorization:
Bearer` form would also work; we standardize on X-header to match
every other dashboard fetch and only set the header when a token is
actually present.
Also add scripts/release.py AUTHOR_MAP entry for treydong.zh@gmail.com.
The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg
and a narrow subset of subprocess invocations. Tests still SIGTERMed the
live gateway today (May 10) because the guard had structural holes.
Plug them all:
- subprocess: also wrap getoutput, getstatusoutput
- os.system, os.popen - completely unwrapped before
- pty.spawn - completely unwrapped before
- asyncio.create_subprocess_exec / create_subprocess_shell - bypassed
the subprocess module entirely; now wrapped
- Subprocess command inspection now looks at the WHOLE command string,
not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c
'systemctl', setsid systemctl, /usr/bin/systemctl, etc.
- New process-killer block: pkill / killall / taskkill / fuser
targeting hermes/python patterns is now refused
- os.kill PID 0 (own group) allowed; PID -1 (every process we can
signal) refused
- subprocess.Popen wrapper preserves __class_getitem__ so third-party
packages that use Popen[bytes] as a type annotation still import
Coverage is locked in by tests/test_live_system_guard_self_test.py -
exercises every primitive against a guaranteed-foreign PID and asserts
the guard fires. Adding a new kill primitive without updating the guard
breaks CI.
scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py
when present (developer-machine convenience), so even worktrees that
predate this commit get the protection on subsequent test runs through
the canonical wrapper.