When Telegram flood control triggers 3+ consecutive edit failures, the
stream consumer enters fallback mode and sends the complete response as
a new message. This leaves the user seeing two messages: a frozen
partial (with cursor) and the full duplicate.
After the fallback chunks are sent successfully, delete the original
partial message so the user only sees one complete response. The delete
is best-effort — if it fails (e.g. flood still active, missing
permissions), the full answer is still delivered.
Fixes#16668
The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest
runs sending SIGTERM to the developer's live gateway 5+ times in 3
days. Root cause: when a single test forgets to mock os.kill or
find_gateway_pids, the real call leaks past the hermetic HERMES_HOME
isolation — find_gateway_pids' psutil scan walks the whole machine and
returns the live gateway PID, then the unmocked os.kill delivers the
signal.
Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes,
and stop_profile_gateway code paths, install a single autouse guard in
tests/conftest.py that blocks the two primitives that actually cause
the damage:
- os.kill rejects any PID outside the test process subtree with a
hard RuntimeError so the offending test gets a stack trace instead
of silently murdering the real gateway.
- subprocess.run / Popen / call / check_call / check_output reject
any 'systemctl <verb> hermes-gateway' invocation that would mutate
the live unit. Read-only systemctl calls (status, show, list-units)
still pass through.
We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids —
tests of those functions themselves need the real implementation.
Discovery without delivery is harmless; the os.kill + systemctl guards
catch the actual damage path.
Tests that legitimately need real signal delivery (e.g. PTY tests
signalling their own child) opt out via
@pytest.mark.live_system_guard_bypass.
Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce
the same 17 failures with and without this guard (all pre-existing on
main, unrelated to gateway-kill leaks). The live gateway survives the
test run that previously SIGTERMed it.
Two follow-up improvements to the previous commit's notifier dedup work.
1. Add a regression test for the send-exception rewind path. The
contributor's PR included a test for the adapter-disconnect path
(test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where
adapter is None at delivery time), but not for the "adapter is
connected, send() raises" path that fires inside the inner try/except
at gateway/run.py:4314. The new test
(test_kanban_notifier_rewinds_claim_on_send_exception) uses a
FailingAdapter that always raises and confirms (a) send was actually
attempted, (b) the claim was rewound, (c) the next call to
unseen_events_for_sub still returns the event for retry.
2. Drop the per-delivery success log from INFO to DEBUG. A busy board
on a multi-platform gateway can produce hundreds of these per day;
that's gateway.log noise that obscures real warnings. Failure paths
stay at WARNING (where you'd want to look when something's wrong)
so we don't lose visibility into transient send issues.
Adds a Cross-Platform Handoff section to user-guide/sessions.md covering
the CLI flow, per-platform thread behavior (Telegram topics / Discord
threads / Slack message-anchored / no-thread fallback), failure modes,
and the resume-back-to-CLI loop.
Adds the /handoff entry to reference/slash-commands.md and updates the
CLI-only commands note.
Three issues hit during a fresh Windows install + first `hermes update`:
1. `pyproject.toml` re-introduced the invalid `exclude-newer = "7 days"`
under [tool.uv]. uv requires an RFC 3339 / ISO date — relative-duration
strings parse-fail. The line was removed in PR #21221 on May 7 and
accidentally added back in the v0.13.0 release commit (498bfc7bc1)
the same day. Every uv invocation throughout install logged a TOML
parse error, confusing users into thinking the install was broken.
Fix: remove the line (and the now-empty [tool.uv] section).
2. `hermes update` failed on Windows with
`Access is denied. (os error 5)` when uv tried to overwrite
`venv\\Scripts\\hermes.exe` — the running entry-point shim. Windows
blocks REPLACE on a mapped/loaded executable but allows RENAME (kernel
tracks the file by handle, not path; same trick Chrome/Firefox use for
self-update). Pre-rename live shims to `hermes.exe.old.<unix-ms>`
before each `uv pip install -e .`; uv writes a fresh shim at the
original path; the .old files are swept on the next hermes invocation.
Wraps every install attempt (primary, base-only fallback, and
per-extra retries). Restores shims if uv fails before writing
replacements.
3. Tools post-setup hooks (ddgs, piper-tts, kittentts, langfuse,
tinker-atropos) shelled out to `[sys.executable, '-m', 'pip', ...]`
and died with `No module named pip` on every fresh Windows install.
install.ps1 creates the venv via `uv venv` which doesn't seed pip;
install.ps1 bootstraps pip later, but only inside the platform-SDK
verify block — by then the wizard's post-setup hooks have already
run and failed.
New `_pip_install` helper tries uv pip first (works in pip-less
venvs), then python -m pip, then ensurepip-bootstrap-then-pip. All
five post-setup sites now route through it.
E2E:
- uv pip compile pyproject.toml — no parse warning
- quarantine + cleanup with simulated Windows scripts dir; rollback
works when uv install fails before writing replacement shim
- _pip_install in a real `uv venv`-created (pip-less) venv: bootstraps
pip via ensurepip and completes the install
Tests: tests/hermes_cli/ — 4135 pass, 8 pre-existing failures on main
unrelated to this PR (kanban_boards, openclaw_migration,
update_gateway_restart, web_server PluginAPIAuth).
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.
State machine on sessions table replaces the boolean flag:
None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.
CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.
Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:
1. Resolves the platform's home channel.
2. Asks the adapter for a fresh thread via the new
create_handoff_thread(parent_chat_id, name) capability so the
handed-off conversation gets its own scrollback. Adapters that
don't support threads (or fail) return None and the watcher
falls back to the home channel directly.
3. Constructs a SessionSource keyed as 'thread' when a thread was
created, 'dm' otherwise, then session_store.switch_session
re-binds the destination key to the CLI session_id. The full
role-aware transcript replays via load_transcript on the next
turn (no flat-text injection into context_prompt).
4. Forges a synthetic MessageEvent(internal=True) with the handoff
notice and dispatches through _handle_message; the agent runs
against the loaded transcript and adapter.send delivers the
reply.
5. Marks the row 'completed' on success, 'failed' (+error) on any
exception.
Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:
- Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
so DM topics (Bot API 9.4+) and forum supergroups both work.
- Discord (gateway/platforms/discord.py): parent.create_thread on
text channels with a seed-message + message.create_thread
fallback for permission edge cases. Skips DMs and other
non-thread-capable parents.
- Slack (gateway/platforms/slack.py): posts a seed message and
returns its ts as the thread anchor — Slack threads are
message-anchored.
In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.
CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).
Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.
Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
Adds /handoff <platform> CLI command that queues the current session for
resume on the configured home channel of any messaging platform.
CLI side:
- /handoff telegram — marks session in shared DB, sends summary to
the Telegram home channel via send_message
- /handoff discord — same for Discord
- Supports telegram, discord, slack, whatsapp, signal, matrix
Gateway side:
- On new session creation, checks for pending handoffs for the
incoming message's platform
- If found, loads the CLI session's full conversation history and
injects it into the context prompt as a handoff transcript
- Agent continues the conversation seamlessly
Files:
- hermes_state.py: handoff_pending, handoff_platform columns + helpers
- cli.py: _handle_handoff_command dispatch + handler
- hermes_cli/commands.py: CommandDef entry
- gateway/run.py: handoff detection in _handle_message_with_agent
- tests/hermes_cli/test_session_handoff.py: 8 tests
The skill enumerated 8 specialist profile names (researcher, analyst,
writer, reviewer, backend-eng, frontend-eng, ops, pm) as "the standard
roster" and told orchestrators to "assume these exist." Almost no real
Hermes setup matches that fleet — single-profile setups, Docker-worker
setups, and curated-team setups all violate it — so following the skill
literally produced cards assigned to non-existent profiles, which the
dispatcher silently failed to spawn (no autocorrect, no fallback, just
sits in `ready` forever).
Changes:
- Drop the standard-specialist-roster table.
- Add a "Profiles are user-configured — not a fixed roster" section at
the top with a Step 0 that prescribes `hermes profile list` (or asking
the user) before fanning out. Cache the result in working memory.
- Rewrite the worked task-graph example with placeholder names
(<profile-A>, <profile-B>, <profile-C>) so the structure is still
teachable but doesn't invite copy-paste of role names that may not
exist.
- Reframe the "If no specialist fits" anti-temptation rule: don't
invent profile names; ask the user.
- Add a "Inventing profile names that doesn't exist" entry to Pitfalls.
- Bump skill version 2.0.0 → 3.0.0 (semantic break: previous behavior
promised a roster the skill no longer enumerates).
- Update website/docs/user-guide/features/kanban.md to drop the
matching "(researcher, writer, analyst, backend-eng, reviewer, ops)"
line and explain the discovery prompt instead.
- Re-run website/scripts/generate-skill-docs.py to refresh the
auto-generated skill page + catalog.
Closes#21131 in spirit — addresses the same hardcoded-names footgun
@yehuosi flagged, with a different shape than their PR (delete the
roster rather than replace each name with placeholder, since the
roster table was the load-bearing footgun and the worked example is
salvageable with placeholder profile names).
Co-authored-by: yehuosi <yehuosi@users.noreply.github.com>
* feat(gateway): per-platform admin/user split for slash commands
Adds an opt-in two-list access control on top of the existing per-platform
`allow_from` allowlists, scoped to slash commands only:
- allow_admin_from — full slash command access
- user_allowed_commands — what non-admins may run
- group_allow_admin_from — same, group/channel scope
- group_user_allowed_commands
When `allow_admin_from` is unset for a scope, gating is disabled and every
allowed user keeps full access (backward compat). Plain chat is unaffected.
`/help` and `/whoami` are always reachable so users can see what they
can run.
Gate runs at the slash command dispatch site in gateway/run.py and uses
`is_gateway_known_command()`, so it covers built-in AND plugin-registered
commands through the live registry without per-feature wiring.
Adds `/whoami` showing platform, scope, tier, and runnable commands.
Salvage of PR #4443's permission tier work, scoped down. The full tier
system, tool filtering, audit log, usage tracking, rate limiting,
`/promote` flow, and persistent SQLite stores are not included here —
those can be re-expanded later if needed.
Co-authored-by: ReqX <mike@grossmann.at>
* fix(gateway): close running-agent fast-path bypass + add coverage and central docs
The slash command access gate was only applied at the cold dispatch site
(line ~5921). When an agent was already running, the running-agent
fast-path block (line ~5574) dispatched /restart, /stop, /new, /steer,
/model, /approve, /deny, /agents, /background, /kanban, /goal, /yolo,
/verbose, /footer, /help, /commands, /profile, /update directly
without going through the gate — letting non-admins bypass gating just
because an agent happens to be busy.
Refactored the gate into _check_slash_access() and called from BOTH
paths. /status remains intentionally pre-gate so users can always see
session state.
Also added 18 more dispatch tests covering:
- Running-agent fast-path: blocks non-admin, allows admin, /status
always works
- Alias canonicalization (gate uses canonical name, not user alias)
- Unknown / unregistered commands pass through (don't false-positive)
- DM admin scope-locked when group has its own admin list
- Multi-platform isolation (Discord gated, Telegram unrestricted)
Docs: added Slash Command Access Control section to the central
messaging index page + /whoami row in the chat commands table.
Co-authored-by: ReqX <mike@grossmann.at>
---------
Co-authored-by: ReqX <mike@grossmann.at>
xAI is retiring grok-4, grok-4-0709, grok-4-fast{,-reasoning,-non-reasoning},
grok-4-1-fast{,-reasoning,-non-reasoning}, and grok-code-fast-1 on
May 15, 2026 at 12:00 PT. Remove them from the static fallbacks so the
`hermes model` picker, gateway /model picker, and setup wizard stop
auto-suggesting models that will be dead in days.
- _XAI_STATIC_FALLBACK in hermes_cli/models.py now lists only grok-4.20-*
and grok-4.3 (the live replacements).
- copilot lists in hermes_cli/models.py and hermes_cli/setup.py drop
grok-code-fast-1 (Copilot proxies it through xAI, so the upstream
retirement breaks it there too).
Old configs that already reference retired IDs keep working until xAI
flips the switch — context-length lookups in agent/model_metadata.py and
the cache-affinity-header logic in provider_profiles still recognise the
old names. The cleanup here is purely about not advertising them to new
users.
Closes#23278.
Source: https://docs.x.ai/developers/migration/may-15-retirement
Follow-up to the previous commit's behavior fix.
Adds a paragraph to dispatch_once's docstring making the concurrency-cap
semantic explicit, and an inline comment near the running_count query
explaining why we do the count (so a future reader doesn't refactor it
back to per-tick semantics thinking it's redundant). Both call out the
unbounded-accumulation failure mode that motivated the fix, since
nothing in the codebase or skills currently documents what max_spawn
is supposed to mean.
The semantic is per-board: each kanban board has its own SQLite file,
so the running-count COUNT(*) is naturally scoped to the board the
dispatcher tick is processing.
When the gateway received SIGTERM, the shutdown_signal_handler ran a
synchronous 'ps aux' (3s timeout) inside the asyncio event loop, then
asyncio.create_task(runner.stop()). On a busy host that ate 1-3s of
the teardown budget before draining could even start, and the resulting
log line was a multi-line ps dump that didn't tell us who sent the
signal. The shutdown path itself logged 'Stopping gateway...' and then
nothing until 'Gateway stopped' — when systemd SIGKILLed mid-drain,
there was no way to see which phase wedged.
Changes:
- New gateway/shutdown_forensics.py:
* snapshot_shutdown_context(sig) — sub-millisecond /proc-only capture
of signal name, parent pid+name+cmdline, INVOCATION_ID (systemd
marker), loadavg_1m, TracerPid, takeover/planned-stop marker
presence + whether-it-names-self. Pure stdlib, never raises.
* spawn_async_diagnostic(log_path, sig) — detached subprocess with
its own 'timeout 5s', start_new_session=True, writes ps auxf +
pstree + dmesg to ~/.hermes/logs/gateway-shutdown-diag.log.
Returns immediately, can't block the event loop or the cgroup
teardown.
* check_systemd_timing_alignment(drain_timeout) — reads
/proc/self/cgroup for our unit, asks systemctl show for
TimeoutStopUSec, returns mismatch info when the unit's stop
timeout is smaller than restart_drain_timeout + 30s headroom
(the case where systemd SIGKILLs mid-drain).
* _parse_systemd_duration_to_us — covers '90s', '1min 30s',
'500ms', '1h' style values from systemctl show.
* format_context_for_log — single scannable key=value line, parent
cmdline last.
- gateway/run.py shutdown_signal_handler:
* Replaces synchronous ps aux + ad-hoc 'hermes-related lines' filter
with snapshot + detached spawn.
* Always logs 'Shutdown context: signal=... parent_pid=...
parent_cmdline=...' regardless of planned/unexpected so we can
correlate signal source even on planned restarts.
- gateway/run.py _stop_impl:
* Per-phase '+X.XXs' timing for notify_active_sessions, drain
(with drain_seconds, active_at_start, active_now, timed_out),
post-interrupt tool kill, each adapter disconnect (Xs),
all adapters disconnected, final-cleanup tool kill, SessionDB
close, total teardown.
- gateway/run.py start():
* Stale-unit warning at startup when the running systemd unit's
TimeoutStopSec is smaller than the configured drain timeout.
Points the user at 'hermes gateway service install --replace'
to regenerate, or at shortening agent.restart_drain_timeout.
Tests: 30 new in tests/gateway/test_shutdown_forensics.py — snapshot
speed bound, signal name resolution, marker detection self-vs-other,
async diag spawn doesn't block caller, systemd duration parser, and
alignment check returns None outside systemd. Wider tests/gateway/
suite: 5258 passing, 3 pre-existing TTS-routing failures unchanged
on main.
Follow-up to the previous commit's toolset-vs-skill validation.
The contributor's fix raises ValueError on the first toolset name found
in the skills list. That works for one mistake, but agents that confuse
skills with toolsets usually pass several at once
(`skills=["web", "browser", "terminal"]`) — and serial-correcting one
per failure round-trip wastes tokens. Collect all toolset-shaped
entries first, then raise once with the full list.
The error message is also slightly clearer:
'web', 'browser', 'terminal' are toolset names, not skill name(s).
Put toolsets in the assignee profile's `toolsets:` config instead of
per-task skills. Skills are named skill bundles (e.g. `kanban-worker`,
`blogwatcher`); toolsets are runtime capabilities (e.g. `web`,
`browser`, `terminal`).
vs. the previous "the assignee profile's toolsets" — explicitly naming
the YAML key (`toolsets:`) and giving concrete examples in both
categories closes the conceptual gap that produced the bug to begin
with.
Adds one regression test (test_create_task_skills_lists_all_toolset_typos)
covering the multi-name aggregation path. The single-typo test from
the original PR still passes (the loose `match="toolset name"` matches
both singular and plural forms).
Two follow-up improvements to Tranquil-Flow's metadata-panel restyle.
Both stay within the parent PR's "tone down the panel" scope.
1. Native <details>/<summary> collapse for verbose metadata.
The parent PR consciously deferred this ("adding native expand/collapse
would be the next step but requires UX agreement"). The default they
asked for is straightforward: collapsed when the rendered JSON exceeds
300 chars (the threshold where the max-height: 8.5rem cap actually
starts mattering), expanded otherwise. <details>/<summary> is the right
primitive — zero JS, browser-handled state, accessible by default
(keyboard-navigable, screen-reader announces the disclosure state),
and survives any react-state churn for free.
The OS-default disclosure marker is suppressed (list-style: none +
::-webkit-details-marker hidden) and replaced with a CSS ::before
chevron that rotates 90deg on the [open] attribute, so the look is
consistent across Firefox/WebKit/Blink without the double-marker
that would otherwise appear on the platforms that still render the
default triangle.
2. Skip rendering when metadata is an empty object.
`r.metadata && ...` truthy-checks, but `{}` is truthy in JS — so a
completed task with no actual metadata would render a "Metadata"
labeled disclosure block containing literal `{}`. Adds an
Object.keys(r.metadata).length > 0 guard so empty payloads render
nothing instead of an empty disclosure stub.
Tests: three new static-asset assertions covering the <details> shape,
the empty-object skip, and the suppress-default-marker + animated-chevron
CSS — all in `tests/plugins/test_kanban_dashboard_plugin.py`.
Hand-rebased onto current main from PR #19980; the original branch was stale
against main (~6 unrelated dashboard fixes had landed since), so applying
the PR's dist files directly would have silently reverted them.
The run-history panel in the task drawer rendered each completed run's
`metadata` field as a `<code class="hermes-kanban-run-meta">` containing
`JSON.stringify(r.metadata)` — a single unindented monoline. With
`white-space: pre-wrap` and a monospace font, a writer task's metadata
(changed_files paths, source URLs, generated-artifact details) wrapped
into a tall block of code-ish text that filled the parent run row. The
container's faint `--color-foreground 3%` background then made the whole
thing read like a crash dump even though the run completed normally.
Restyle and label, no interactivity changes:
- Wrap the meta payload in a `.hermes-kanban-run-meta-block` sub-block
with an explicit `Metadata` label (small, uppercase, muted) so the
panel reads as auxiliary detail at a glance.
- Pretty-print the JSON (`indent=2`) so the structure is scannable
instead of a wall of monoline text.
- Cap `.hermes-kanban-run-meta` at `max-height: 8.5rem; overflow: auto`
so a verbose blob scrolls inside its own pane rather than swamping
the run row.
- Sub-block uses a thin `border-left` rule and `background: transparent`
— distinct from the destructive-tinted treatment used by crashed /
timed_out / blocked / spawn_failed runs higher in the same file.
Tests: two new static-asset assertions in
`tests/plugins/test_kanban_dashboard_plugin.py` lock in the rendered
shape (the plugin ships built-only, no src/).
Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a
fast path when a supervisor is alive for the current task_id. Replaces the
~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms
Runtime.evaluate over the supervisor's already-connected WebSocket.
Falls through to the existing agent-browser CLI path when no supervisor is
running (e.g. backends without CDP, or before the first browser_navigate
attaches one), so behaviour is unchanged where it can't apply.
JS-side exceptions surface directly without falling through to the
subprocess (the subprocess would just re-raise the same error, slower);
supervisor-side failures (loop down, no session) fall through cleanly.
Benchmark — 30 iterations of `1 + 1` against headless Chrome:
supervisor WS mean= 0.96ms median= 0.91ms
agent-browser subprocess mean=179.35ms median=167.73ms
→ 187x speedup mean
Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5
real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome
being installed). Browser test suite: 355 passed, 1 skipped.
Follow-up to the previous commit's casing fix.
The original PR shipped the dist edits without test coverage. The
contributor's reasoning (UI-only attributes in a pre-built JS bundle,
nothing meaningful to unit-test) is fair, but a static-asset assertion
catches the most likely regression vector — a future rebuild of the
dist bundle that loses the attributes — at near-zero cost.
Adds two regression tests in tests/plugins/test_kanban_dashboard_plugin.py:
- test_dashboard_assignee_inputs_preserve_casing — reads dist/index.js
and asserts autoCapitalize="none", autoCorrect="off", spellCheck=false,
and textTransform="none" each appear at least twice (one per assignee
input — inline triage/lane create + task-edit panel).
- test_dashboard_lane_head_preserves_assignee_casing — reads dist/style.css
and asserts the .hermes-kanban-lane-head rule body does NOT contain
text-transform: uppercase. Locates the rule by marker so unrelated CSS
churn nearby doesn't flake the test.
Both follow the same shape as the existing test_dashboard_requests_default_board_explicitly
static-asset guard from PR #22940's salvage.
Also adds the AUTHOR_MAP entry for princepal9120's GitHub-noreply email
so release notes credit the right account.
Follow-up to the previous commit's safe-int task_age fix.
The original PR shipped without test coverage. This commit adds:
- test_safe_int_accepts_int_and_int_string — sanity for the well-typed
path so the helper itself can't quietly start swallowing valid values.
- test_safe_int_returns_none_on_corrupt_inputs — the failure modes
(None, '%s', 'abc', '', '1.5', random objects). Covers both the
ValueError and TypeError catch branches.
- test_task_age_handles_corrupt_created_at — the headline regression:
a task with created_at='%s' used to raise ValueError and turn
GET /api/plugins/kanban/board into a 500.
- test_task_age_handles_corrupt_started_and_completed — confirms the
safe-int treatment is consistent across all three timestamp fields.
- test_task_age_well_formed_task — regression that the safe path
doesn't change observable output for normal data.
- test_task_dict_survives_corrupt_created_at — defense in depth.
Writes a corrupt row directly via SQL, reads it back through the
ORM, and confirms task_age + the surrounding plugin_api guard
degrade gracefully instead of crashing.
Also adds the AUTHOR_MAP entry for the contributor's GitHub-noreply
email so release notes credit @baocin (the commit was authored locally
as `aoi <aoi@hino.local>` — re-attributed during salvage to the
github noreply form).
task_age() crashed with ValueError when created_at contained the
literal format string '%s' instead of a Unix timestamp, taking down
the entire GET /board endpoint with a 500.
- Add _safe_int() helper that returns None on non-numeric values
- Refactor task_age() to use _safe_int instead of bare int() casts
- Wrap task_age() call in _task_dict with try/except fallback so one
corrupt row never kills the whole board endpoint
* feat(i18n): localize /model command output
Reported by @tianma8888: when Chinese users run /model, the labels
("Provider:", "Context:", "_session only_", etc.) are still English.
This routes the static prose through the existing i18n catalog so it
follows display.language / HERMES_LANGUAGE.
Changes:
- locales/{en,zh,ja,de,es,fr,tr,uk}.yaml: add 17 keys under
gateway.model.* covering switched/provider/context/max_output/cost/
capabilities/prompt_caching/warning/saved_global/session_only_hint/
current_label/current_tag/more_models_suffix/usage_*.
- gateway/run.py _handle_model_command: replace hardcoded f-strings in
the picker callback, the text-list fallback, and the direct-switch
confirmation block with t("gateway.model.<key>", ...).
What stays English:
- model IDs, provider slugs, capability strings, cost figures, and the
"[Note: model was just switched...]" prepended to the model's next
prompt (LLM-facing, not user-facing).
- The two slightly-different session-only hints unify on a single key
with the em-dash phrasing.
Validation: tests/agent/test_i18n.py 27/27 passing (parity contract
holds), tests/gateway/ -k 'model or i18n' 74/74 passing.
* feat(i18n): localize all gateway slash command outputs
Expands the i18n catalog from 7 strings to 234 keys across 35 gateway
slash command handlers, so non-English users see localized output for
\`/profile\`, \`/status\`, \`/help\`, \`/personality\`, \`/voice\`, \`/reset\`,
\`/agents\`, \`/restart\`, \`/commands\`, \`/goal\`, \`/retry\`, \`/undo\`,
\`/sethome\`, \`/title\`, \`/yolo\`, \`/background\`, \`/approve\`, \`/deny\`,
\`/insights\`, \`/debug\`, \`/rollback\`, \`/reasoning\`, \`/fast\`,
\`/verbose\`, \`/footer\`, \`/compress\`, \`/topic\`, \`/kanban\`,
\`/resume\`, \`/branch\`, \`/usage\`, \`/reload-mcp\`, \`/reload-skills\`,
\`/update\`, \`/stop\` (plus the \`/model\` block already added in the
previous commit).
Reported by @tianma8888 — Chinese users want command output prose in
their language, not just the labels we already had.
Translations are hand-written for all 8 supported locales (en, zh, ja,
de, es, fr, tr, uk), matching each catalog's existing style: full-width
punctuation in zh, em-dashes in zh/ja/uk, French spaced colons,
German noun capitalization, etc.
What stays English (unchanged):
- Identifiers/values: model IDs, file paths, profile names, session IDs,
command flag names like --global, URLs, config keys.
- Backtick code spans: \`/foo\`, \`config.yaml\`.
- Log messages (logger.info/warning/error).
- LLM-facing system notes prepended to next prompt (e.g. [Note: model
was just switched...]).
- Strings produced by external modules (gateway_help_lines,
format_gateway, manual_compression_feedback) — those have their
own surfaces.
New shared keys for cross-handler boilerplate:
- gateway.shared.session_db_unavailable (5 call sites: branch, title,
resume, topic, _disable_telegram_topic_mode_for_chat)
- gateway.shared.session_not_found (1 site)
- gateway.shared.warn_passthrough (2 sites in /title's f"⚠️ {e}" pattern)
YAML gotcha fixed: \`yolo.on\` and \`yolo.off\` were originally written
unquoted, which YAML 1.1 parses as boolean True/False keys. Renamed to
\`yolo.enabled\` / \`yolo.disabled\` for both safety and clarity.
Test fix: tests/agent/test_i18n.py::test_t_missing_key_in_non_english_falls_back_to_english
now resets the catalog cache on teardown, so the fake "foo: English Foo"
locale doesn't poison the module-level cache for subsequent tests in
the same xdist worker. (Without this, every gateway slash command test
that shares a worker with the i18n suite would see the fake catalog.)
Validation:
- tests/agent/test_i18n.py: 27/27 (parity contract — every key in every
locale, matching placeholder tokens).
- tests/gateway/: 5077 passed, 0 failed (full gateway suite).
- 180 t() call sites added across 35 handlers; 1872 catalog entries
total (234 keys × 8 locales).
* feat(i18n): add 8 new locales — af, ko, it, ga, zh-hant, pt, ru, hu
Expands the static-message catalog from 8 → 16 languages, each with full
270-key parity against the English source-of-truth. Every locale now
covers the same surface PR #22914 added: approval prompts plus all 35
gateway slash command outputs.
New locales:
- af Afrikaans (community ask in #21961 by @GodsBoy; PRs #21962, #21970)
- ko Korean (PRs #20297 by @tmdgusya, #22285 by @project820)
- it Italian (PR #20371 by @leprincep35700)
- ga Irish/Gaeilge (PR #20962 by @ryanmcc09-dot)
- zh-hant Traditional Chinese (PRs #20523 by @jackey8616, #13140 by @anomixer)
- pt Portuguese (PRs #20443 by @pedroborges, #15737 by @carloshenriquecarniatto, #22063 by @Magaav)
- ru Russian (PR #22770 by @DrMaks22)
- hu Hungarian (PR #22336 by @lunasec007)
Each locale uses native-quality translations matching the existing tone
and conventions of the older 8 locales:
- zh-hant uses 繁體 characters with TW/HK technical vocabulary (軟體
not 软件, 連線 not 连接, 設定 not 设置, 訊息 not 消息, 工作階段 not 会话, 程式
not 程序, 預設 not 默认, 伺服器 not 服务器), full-width punctuation 「:()」.
- ko uses formal 합니다체 (습니다/합니다) register throughout.
- pt uses European Portuguese as baseline with neutral PT/BR vocabulary
where possible.
- ga uses standard An Caighdeán Oifigiúil; English loanwords retained
for tech terms without good Irish equivalents (gateway, API, JSON).
- All preserve {placeholder} tokens, backtick code spans, slash commands,
brand names (Hermes, MCP, TTS, YOLO, OpenAI, Telegram, etc.), and emoji.
Aliases added in agent/i18n.py:
- af-za, Afrikaans → af
- ko-kr, Korean, 한국어 → ko
- it-it, italiano → it
- ga-ie, Irish, Gaeilge → ga
- zh-tw, zh-hk, zh-mo, traditional-chinese → zh-hant (note: zh-tw used to
alias to zh; now aliases to its own zh-hant catalog)
- zh-cn, zh-hans, zh-sg → zh (unchanged from before)
- pt-pt, pt-br, brazilian, portuguese → pt
- ru-ru, Russian, русский → ru
- hu-hu, Magyar → hu
The zh-tw alias re-routing is intentional: previously typing 'zh-TW' got
the Simplified Chinese catalog (wrong vocabulary for Taiwan/HK users).
Now those users get the proper Traditional Chinese catalog.
Validation:
- tests/agent/test_i18n.py: 43/43 (parity contract holds for all 16
languages × 270 keys = 4320 catalog entries, with matching placeholder
tokens).
- E2E alias resolution verified for all 19 alias inputs (Afrikaans, ko-KR,
한국어, italiano, Gaeilge, zh-TW, zh-HK, traditional-chinese, pt-BR,
brazilian, Magyar, etc.).
- tests/gateway/: 5198 passed (3 pre-existing TTS routing failures
unrelated to i18n).
Credit to all contributors whose PRs surfaced these language requests.
Their original PRs may now be closed as superseded with credit.
* feat(dashboard-i18n): add 14 web dashboard locales matching the static catalog
Brings the React dashboard (web/src/) up to the same 16-language
coverage the static catalog already has after the previous commits in
this PR. The Translations interface is TypeScript-typed, so every new
locale must provide every key — tsc -b is the parity guard.
Languages added (each is a complete 429-line locale file):
- af Afrikaans
- ja Japanese (PR #22513 by @snuffxxx surfaced this)
- de German (PR #21749 by @mag1art)
- es Spanish (PR #21749)
- fr French (PRs #21749, #10310 by @foXaCe)
- tr Turkish
- uk Ukrainian
- ko Korean (PRs #21749, #18894 by @ovstng, #22285 by @project820)
- it Italian
- ga Irish (Gaeilge)
- zh-hant Traditional Chinese (PR #13140 by @anomixer)
- pt Portuguese (PRs #22063 by @Magaav, #22182 by @wesleysimplicio, #15737 by @carloshenriquecarniatto)
- ru Russian (PRs #21749, #22770 by @DrMaks22)
- hu Hungarian (PR #22336 by @lunasec007)
Each translation covers all 15 namespaces with full key parity vs en.ts,
preserves every {placeholder} token verbatim, keeps identifiers
untranslated (brand names, file paths, cron expressions, code spans),
translates the language.switchTo tooltip into the target language, and
matches existing tone conventions (zh-hant uses TW/HK vocab; ja uses
formal desu/masu; ko uses formal seumnida register; ga uses An
Caighdean Oifigiuil with English loanwords for tech vocab without good
Irish equivalents).
Plumbing:
- web/src/i18n/types.ts: Locale union expanded to all 16 codes.
- web/src/i18n/context.tsx: imports all 16 catalogs; exports
LOCALE_META (endonym + flag per locale); isLocale() type guard.
- web/src/i18n/index.ts: re-export LOCALE_META.
- web/src/components/LanguageSwitcher.tsx: replaced two-state EN-ZH
toggle with a click-to-open dropdown listing all 16 languages.
Note: zh-hant.ts exports zhHant (camelCase) since hyphen is invalid in
a JS identifier; the canonical 'zh-hant' string keys it in TRANSLATIONS.
Validation:
- npx tsc -b: 0 errors. Every locale satisfies Translations.
- npm run build (tsc + vite production): green, 2062 modules.
- Each locale file is exactly 429 lines.
Out of scope: plugin dashboards (kanban/achievements ship as prebuilt
bundles with no source in repo); Docusaurus docs (separate surface);
TUI (no i18n yet).
* feat(plugin-i18n): localize achievements + kanban plugin dashboards across all 16 locales
Brings the two shipped plugin dashboards (hermes-achievements, kanban)
under the same i18n umbrella as the core dashboard PR #22914 just
established. Both bundles now read user-facing strings from the host's
i18n catalog via SDK.useI18n() instead of hardcoded English.
## Approach
Plugin dashboards ship as prebuilt IIFE bundles in
plugins/<name>/dashboard/dist/index.js — no build step, no source in
repo (upstream-authored, vendored as compiled JS). Earlier contributor
PRs (#22594, #22595, #18747) tried direct edits but didn't actually
wire the bundles to read translations.
This change does the wiring properly:
1. Each bundle gets a useI18n shim at IIFE scope:
const useI18n = SDK.useI18n
|| function () { return { t: { kanban: null }, locale: "en" }; };
Older host SDKs without useI18n still load the bundle and render
English fallbacks.
2. A small tx(t, path, fallback, vars) helper resolves dotted keys
under the plugin's namespace (t.kanban.* or t.achievements.*) and
interpolates {placeholder} tokens.
3. Every React component starts with const { t } = useI18n() and
each user-visible string is wrapped in tx(t, "key", "English fallback").
Helpers called outside React components (window.prompt callers,
constants used during init) take t as a parameter.
4. Top-level constants that were English dictionaries (COLUMN_LABEL,
COLUMN_HELP, DESTRUCTIVE_TRANSITIONS, DIAGNOSTIC_EVENT_LABELS in
kanban) become getColumnLabel(t, status)-style functions backed by
FALLBACK_* dictionaries.
## Translations added
Two new top-level namespaces added to the dashboard's TypeScript-typed
Translations interface:
- achievements: ~70 keys covering the hero, scan banner, achievement
card, share dialog, stats, filters, and empty states.
- kanban: ~145 keys covering the board, columns (with nested
columnLabels and columnHelp sub-dicts), card detail panel,
bulk-actions toolbar, dependency editor, board switcher, and
diagnostic callouts.
Each key is provided across all 16 supported locales:
en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu.
Total new translation entries: ~3,440 (215 keys × 16 locales).
## What stays English (deliberate)
- API paths, CSS class names, data-* attributes, JSON keys, regex
strings, URLs, file paths (~/.hermes/kanban.db, boards/_archived/).
- State identifier strings used as lookup keys (triage / todo / ready /
running / blocked / done / archived) — labels translate, key strings
don't.
- The PNG share-card text rendered to canvas in the achievements
ShareDialog (HERMES AGENT watermark, UNLOCKED stamp, tier names) —
these become part of a globally-shared image and stay English.
- localStorage keys (hermes.kanban.selectedBoard).
- Brand names (Kanban, Hermes, WebSocket, Nous Research).
## Contributor credit
PR #22594 by @02356abc and PR #22595 by @02356abc supplied the
en + zh kanban namespace skeleton (145 keys); used as the en source-
of-truth in this commit and translated to the other 14 locales.
PR #18747 by @laolaoshiren first surfaced the achievements
localization request.
## Validation
- npx tsc -b: 0 errors. All 16 locale .ts files satisfy the
Translations type with full key parity.
- npm run build (tsc + vite production build): green, 2062 modules,
1.56MB JS / 95KB CSS, ~2.5s build.
- node --check on both plugin bundles: parse cleanly.
- 126 tx() call sites in kanban, 46 in achievements.
## Out of scope
- TUI (ui-tui/) has no i18n infrastructure yet.
- Docusaurus docs (website/i18n/) — already had zh-Hans; expanding
is a separate translation workstream (Thai / Korean / Hindi PRs).
Follow-up to the previous commit's contributor cherry-pick.
The cherry-picked change replaced the bare ``["hermes", ...]`` spawn with
``[sys.executable, "-m", "hermes", ...]``. The intent was right (avoid
PATH dependence — cron, systemd User= services, launchd jobs, and other
detached dispatcher invocations routinely run with a stripped $PATH that
doesn't include the venv's bin/, breaking the bare-shim spawn) but the
module name is wrong: there is no top-level ``hermes`` package. The
console-script entry point in pyproject.toml is
``hermes = "hermes_cli.main:main"``, and ``python -m hermes`` fails with
``No module named hermes``. The cherry-picked form would have replaced a
sometimes-broken spawn with an always-broken one.
This commit:
- Adds ``_resolve_hermes_argv()``, mirroring ``gateway.run._resolve_hermes_bin``.
Tries ``shutil.which("hermes")`` first (preferred — keeps existing ``ps``
output and log lines familiar in the common case) and falls back to
``[sys.executable, "-m", "hermes_cli.main"]`` when the shim is not on
PATH. The fallback goes through the running interpreter so it's
PATH-independent. Kept as a local helper rather than imported from
gateway because ``hermes_cli`` sits below ``gateway`` in the dependency
order.
- Switches the dispatcher's ``cmd`` list to use ``*_resolve_hermes_argv()``.
- Adds three regression tests:
* ``test_resolve_hermes_argv_prefers_path_shim`` — pins the PATH-first
branch so a future refactor doesn't silently flip the order.
* ``test_resolve_hermes_argv_falls_back_to_module_form_when_no_path_shim`` —
pins the correct module name (``hermes_cli.main``, NOT ``hermes``).
Direct regression guard for the form that shipped in the original PR.
* ``test_resolve_hermes_argv_module_actually_runs`` — runs the fallback
invocation as a real subprocess and asserts ``--version`` works, so
losing ``hermes_cli.main``'s ``__main__`` handling can't slip past the
string-match test.
Verified end-to-end: with the shim on PATH the resolver returns
``[/.../hermes]`` and ``--version`` works; with the shim removed the
resolver returns ``[python, -m, hermes_cli.main]`` and ``--version``
still works; the original PR's ``python -m hermes`` invocation fails as
expected (``No module named hermes``).
In NixOS container mode, hermes is installed at a store path with no
symlink on PATH (e.g. /data/current-package/bin/hermes). The kanban
dispatcher spawns workers via _default_spawn() using a bare 'hermes'
subprocess call, which fails with 'hermes executable not found on PATH'
in container mode.
Fix by calling sys.executable -m hermes instead, which is guaranteed
to resolve to the same Python interpreter running the dispatcher.
* feat(plugins): host-owned LLM access via ctx.llm
Plugins can now ask the host to run a one-shot chat or structured
completion against the user's active model and auth, without ever
seeing an OAuth token or API key. Closes the gap where plugins that
needed bounded structured inference (receipts, CRM extraction,
support classification) had to either bring their own provider keys
or register a tool the agent had to call.
New surface on PluginContext:
- ctx.llm.complete(messages, ...)
- ctx.llm.complete_structured(instructions, input, json_schema, ...)
- async siblings ctx.llm.acomplete / acomplete_structured
Backed by the existing auxiliary_client.call_llm pipeline — every
provider, fallback chain, vision routing, and timeout policy Hermes
already supports applies automatically.
Trust gate (fail-closed by default):
- plugins.entries.<id>.llm.allow_model_override
- plugins.entries.<id>.llm.allowed_models (allowlist; '*' = any)
- plugins.entries.<id>.llm.allow_agent_id_override
- plugins.entries.<id>.llm.allow_profile_override
Embedded model@profile shorthand goes through the same gate as
explicit profile=, so it can't bypass the auth-profile policy.
Conflicting explicit and embedded profiles fail closed.
Also lands:
- plugins/plugin-llm-example/ — reference plugin that registers
/receipt-extract, demonstrating image+text structured input,
jsonschema validation, and the trust-gate config.
- website/docs/developer-guide/plugin-llm-access.md — full API docs.
- 45 unit tests covering trust gates, JSON parsing, schema
validation, image encoding, async surface, and config loading.
Validation:
- 2628 tests pass in tests/agent/
- E2E: bundled plugin loaded with isolated HERMES_HOME, slash
command produced parsed JSON via stubbed call_llm
- response_format extra_body wired correctly for both json_object
and json_schema modes
* docs(plugin-llm): rewrite quickstart and framing
The quickstart now uses a meeting-notes-to-tasks example instead of
a receipt extractor, and the page leads with hook-time / gateway
pre-filter / scheduled-job framing rather than the OpenClaw
KB/support/CRM/finance/migration enumeration that the original
upstream PR used. Receipt example moved to a separate worked
example link so the docs page itself doesn't echo any of the
upstream framing.
Also clarifies where ctx.llm fits in the broader plugin surface
(table comparing register_tool / register_platform / register_hook
/ etc.) and what makes this lane different from auxiliary_client
internals.
No code change.
* docs(plugin-llm): reframe as any LLM call, not just structured output
The original draft leaned heavily on complete_structured() and made
the chat lane (complete() / acomplete()) feel like a footnote.
Restructure so:
- The page title and description say 'any LLM call.'
- The lead shows BOTH a plain chat call (error rewriter) AND a
structured call (triage scorer) up top.
- Quick start has two complete plugin examples — /tldr (chat) and
/paste-to-tasks (structured).
- New 'When to use which' table for choosing complete() vs
complete_structured() vs the async siblings.
- Trust-gate sections explicitly note 'all four methods,' and the
request-shaping list calls out chat-only fields (messages) and
structured-only fields (instructions, input, json_schema)
alongside each other.
- The 'Where this fits' section now says 'for any reason,
structured or not.'
The receipt-extractor reference plugin still exists under
plugins/plugin-llm-example/ — but the docs page no longer treats
it as the canonical surface example. It's now described as 'a third
worked example, this time with image input.'
No code change.
* feat(plugin-llm): split provider/model into independent explicit kwargs
The first cut accepted a single 'provider/model' slug on every method
and split it internally. That looked clean but broke under live test:
the model-override path tried to use the slug's vendor prefix as a
literal Hermes provider id, which silently switched the user off
their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user
who routes through OpenRouter — host attempted to call the 'openai'
provider directly, failed because OPENAI_API_KEY wasn't set).
New shape mirrors the host's main config:
ctx.llm.complete(
messages=[...],
provider='openrouter', # gated, optional
model='openai/gpt-4o-mini', # gated, optional
profile='work', # gated, optional
...
)
Each is independently gated by its own allow_*_override flag.
Granting model-override does NOT auto-grant provider-override.
Allowlists are now per-axis (allowed_providers, allowed_models)
matched literally against whatever string the plugin sends.
Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes
doesn't use that pattern anywhere else; profile= is its own kwarg.
Live E2E (against real OpenRouter via Teknium's config) confirms:
- zero-config call works
- default-deny blocks each override with a helpful error
- model-only override stays on user's active provider (the bug)
- provider+model override switches cleanly
- allowlist refuses non-listed entries
- structured output round-trip parses + schema-validates
Tests: 49 cases (up from 45); all green. Docs updated to match the
new shape, including a 'most plugins never need this section' callout
on the trust-gate config block.
* fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core
Three integration fixes for the ctx.llm surface:
1. Attribution bug — result.provider and result.model now reflect
what call_llm actually used, not placeholder fallbacks ('auto',
'default'). New _resolve_attribution() helper:
- explicit overrides win (what the call targeted)
- response.model wins for the recorded model (provider
canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.)
- falls back to _read_main_provider() / _read_main_model()
when no override is set, so audit logs reflect the user's
active main provider/model
- 'auto' / 'default' only when EVERYTHING is empty
Live verified: zero-config call now records
provider='openrouter', model='anthropic/claude-4.7-opus-20260416'
instead of provider='auto', model='default'.
2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete
works from inside a registered post_tool_call callback. The
docs page promised hook integration; now there's a test that
exercises the lazy-import path through the real invoke_hook
machinery. Two cases: traceback-rewrite hook with conditional
ctx.llm.complete, and minimal hook regression for the
sync-hook + sync-llm path.
3. Reference plugin moved out of core. plugins/plugin-llm-example/
is gone from hermes-agent — it now lives in the new
NousResearch/hermes-example-plugins companion repo. The docs
page links there. Hermes' bundled plugins should be plugins
users actually run; reference / docs-companion plugins live
externally.
Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/
+ tests/gateway/ + tests/tools/ + tests/agent/ shows 16770
passing; the 12 failures are all pre-existing on origin/main
(verified by stashing this branch's changes and re-running) —
kanban-boards, delegate-task, gateway-restart, tts-routing —
none touch the plugin_llm surface.
* chore(plugins): move all example plugins to companion repo
Reference / docs-companion plugins now live exclusively in
NousResearch/hermes-example-plugins, not bundled with the core repo:
- example-dashboard
- strike-freedom-cockpit
A new fourth example, plugin-llm-async-example, was added to that
repo demonstrating ctx.llm's async surface (acomplete()) with
asyncio.gather() — registers /translate <lang>: <text> which fires
forward translation + sentiment classifier in parallel, then a
back-translation for QA. Live-tested at 2.5s for three real
provider round-trips (would be ~5-6s sequential).
Docs updated:
- developer-guide/plugin-llm-access.md links both sync and async
examples in the Reference section
- user-guide/features/extending-the-dashboard.md repoints both demo
sections to the companion repo with corrected install paths
- user-guide/features/built-in-plugins.md drops the two demo rows
- AGENTS.md notes that example plugins live in the companion repo
Net: hermes-agent's plugins/ directory now contains only plugins
users actually run (memory providers, dashboard tabs that ship real
features, the disk-cleanup hook, platform adapters). All four
demo / reference plugins live externally where they can be cloned
on demand instead of inflating the core install.
Follow-up to the previous commit's middleware fix.
- plugins/kanban/dashboard/plugin_api.py: rewrite the "Security note"
docstring. The previous text said "/api/plugins/ is unauthenticated by
design" — that's now actively wrong and dangerously misleading. New
text explains that plugin routes flow through the same session-token
middleware as core API routes and that --host 0.0.0.0 is safe to use
on a LAN as a result.
- tests/hermes_cli/test_web_server.py: extend TestPluginAPIAuth to cover
the surfaces the original PR didn't pin:
* test_plugin_route_allows_auth now exercises a real plugin path
(/api/plugins/example/hello) instead of accepting 200 OR 404 from
a maybe-loaded kanban plugin — the assertion was effectively vacuous.
* test_plugin_patch_requires_auth + test_plugin_delete_requires_auth
cover non-GET mutation methods in case a future regression
whitelists them by accident.
* test_non_kanban_plugin_route_requires_auth proves the fix is
plugin-agnostic, not kanban-specific (hits hermes-achievements +
a non-existent plugin namespace; both 401 before route resolution).
* test_plugin_websocket_unaffected_by_http_middleware locks in that
the HTTP middleware change didn't accidentally start gating WS
upgrades — kanban /events still uses its own ?token= check.
Plus a cosmetic blank-line cleanup.
Remove the blanket /api/plugins/* exemption from auth_middleware so
plugin API routes (e.g. Kanban dashboard) require the same session
token as all other /api/ endpoints.
Fixes#19533
Surfaces the pin command at the moment users care about it: when a
consolidation just landed against their skill library and they're
looking at the umbrella name in the curator output. Previously `hermes
curator pin` existed but had no discovery surface — users only learned
it existed by reading docs or stumbling onto `hermes curator --help`.
The hint:
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale — pruned (stale)
full report: hermes curator status
keep an umbrella stable: hermes curator pin document-tools
Gated on having at least one consolidation that produced an umbrella.
Pruned-only runs (nothing surviving to pin) skip the hint. When
multiple umbrellas were produced, picks alphabetically first as a
concrete example rather than listing them all.
3 new tests in tests/agent/test_curator_classification.py covering:
consolidation produces hint with real umbrella name, pruned-only run
omits it, multi-umbrella picks one example.
* feat(gateway): add LINE Messaging API platform plugin
Adds LINE as a bundled platform plugin under `plugins/platforms/line/`,
synthesized from the strongest pieces of seven open community PRs. The
adapter requires zero core edits — `Platform("line")` is auto-discovered
via the bundled-plugin scan in `gateway/config.py`, and all hooks
(setup, env-enablement, cron delivery, standalone send) are wired
through `register_platform()` kwargs the way IRC and Teams do it.
Highlights merged into one plugin:
- **Reply token preferred, Push fallback.** Try the free reply token
first (single-use, ~60s TTL); fall back to metered Push when the
token is absent, expired, or rejected. (PR #21023)
- **Slow-LLM Template Buttons postback.** When the LLM is still running
past `LINE_SLOW_RESPONSE_THRESHOLD` (default 45s), the adapter burns
the original reply token to send a "Get answer" button bubble. The
user taps it to fetch the cached answer via a fresh reply token —
also free. State machine: PENDING → READY → DELIVERED, ERROR for
cancelled runs (orphan resolves to `LINE_INTERRUPTED_TEXT` after
/stop). Set threshold to 0 to disable. (PR #18153)
- **Three-allowlist gating** — separate user / group / room allowlists
with `LINE_ALLOW_ALL_USERS=true` dev-only escape hatch. (PR #18153)
- **Markdown URL preservation.** Strip bold/italic/code-fence/heading
markers (LINE renders them literally) but keep `[label](url)` →
`label (url)` so URLs stay tappable. (PR #18153)
- **System-message bypass** for `⚡ Interrupting`, `⏳ Queued`, etc. —
busy-acks reach the user as visible bubbles instead of being
swallowed into the postback cache. (PR #18153)
- **Media via public HTTPS URLs.** LINE doesn't accept binary uploads;
images/audio/video must be HTTPS-reachable. The adapter serves
registered tempfiles under `/line/media/<token>/<filename>` from the
same aiohttp app. Allowed-roots traversal guard covers
`tempfile.gettempdir()`, `/tmp` (→ `/private/tmp` on macOS), and
`HERMES_HOME`. `LINE_PUBLIC_URL` overrides URL construction for
setups behind tunnels/proxies. (PR #8398)
- **5-message-per-call batching.** LINE rejects >5 messages per
Reply/Push; smart-chunker caps text at 4500 chars per bubble.
- **Inbound dedup** via `webhookEventId` LRU. (PR #21023)
- **Self-message filter** via `/v2/bot/info` userId lookup. (PR #21023)
- **Loading-animation indicator** wired to LINE's `chat/loading/start`
endpoint, DM-only (LINE rejects it for groups/rooms). (PR #21023)
- **Out-of-process cron delivery** via `_standalone_send`, so
`deliver: line` cron jobs work even when cron runs detached from
the gateway.
- **Webhook hardening** — 1 MiB body cap, constant-time HMAC-SHA256
signature verification, dedup, scoped lock so two profiles can't
bind the same channel.
Validation
----------
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py` →
73 passed in 1.05s
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py
tests/gateway/test_irc_adapter.py
tests/gateway/test_plugin_platform_interface.py
tests/gateway/test_platform_registry.py
tests/gateway/test_config.py` → 193 passed, 7 skipped
- E2E import + register + signature roundtrip + `Platform("line")`
bundled-plugin discovery verified against current `origin/main`.
Closes the seven open LINE PRs (#18153, #16832, #6676, #21023, #14942,
#14988, #8398) by superseding them with a single plugin-form
implementation that takes the best idea from each.
Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
* docs(platforms): document platform-specific slow-LLM UX pattern
Add a 'Platform-Specific Slow-LLM UX' section to the platform-adapter
developer guide covering the _keep_typing override pattern that LINE
uses for its Template Buttons postback flow.
Three subsections:
- Pattern: subclass _keep_typing to layer mid-flight UX (with code)
- Pattern: subclass send to route through a cache instead of sending
- When this pattern is appropriate (vs. always-Push fallback)
Plus a short pointer in gateway/platforms/ADDING_A_PLATFORM.md so
tree-readers find the prose walkthrough on the docsite.
Filed because the LINE plugin (PR #23197) was the first bundled
adapter to need this pattern — every prior plugin (irc, teams,
google_chat) handles slow responses with the default typing-loop and
a regular send_text. Documenting now while the rationale is fresh.
---------
Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
web_extract runs returned page content through the web_extract auxiliary
model when pages exceed 5 000 chars (single-pass up to 500k, chunked up
to 2M, refused above that). The user-guide page didn't mention this —
users were surprised that long-page extracts produced summaries instead
of raw markdown, and that those summaries cost main-model tokens by
default.
Adds:
- size-driven behavior table (under 5k / 5k–500k / 500k–2M / over 2M)
- which auxiliary task does the work (auxiliary.web_extract)
- how to route summaries to a cheap model regardless of main
- escape hatch: browser_navigate when you need raw content
- troubleshooting entry for summarization timeouts
Adapted from PR #20568 commit ce3518578 (Eric Litovsky / @kallidean).
Adds two-tier gating for the kanban tool surface so dispatcher-spawned
workers see only task-lifecycle tools (show/complete/block/heartbeat/
comment/create/link) while orchestrator profiles with `toolsets: [kanban]`
also see board-routing tools (kanban_list, kanban_unblock).
Workers shouldn't be enumerating or unblocking the board — they should
close their own task via the lifecycle tools. Hiding board-routing tools
from worker schemas keeps the worker focused and the toolset-isolation
contract honest.
Plus inherited from the same upstream commit:
- 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata.
- Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside
the orchestrator handlers in case a stale registration ever routes a
worker to one of them.
- Tests for the new gate, the stricter bound, and the fact that even a
worker with `toolsets: [kanban]` in config still doesn't see board
routing.
Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>
Two follow-ups from self-review:
1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The
primary resolution path for Spark goes through provider='openai-codex'
→ _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future
code path resolves Spark's context with a different provider (custom
proxy, generic fallthrough), the longest-substring-first lookup in
step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x.
Adding the explicit override is a cheap defensive correctness fix
matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic
gpt-5 entry.
2. Update test_openai_codex_model_validation_fallback.py docstring. The
bug it was originally written for (gpt-5.3-codex-spark missing from
listing) is now resolved by this PR's catalog restoration. The test
still validly exercises the soft-accept code path for any future
entitlement-gated Codex slug that ships before Hermes catalogs it,
but the framing was stale — clarified.
Two follow-ups from self-review:
1. Add unit test for _fetch_models_from_api covering the live HTTP path.
The salvaged PR #19530 dropped the supported_in_api:false filter in
both _fetch_models_from_api and _read_cache_models, but only the
cache path had a regression test. This adds the symmetric live-fetch
test (mocked httpx) so a future drive-by change to the HTTP path
can't silently re-introduce the filter.
2. Pin test_codex_picker_uses_live_codex_catalog to the cache fallback.
The test wrote a fake JWT and a CODEX_HOME cache, but provider_model_ids
('openai-codex') still issued a real 10s HTTP probe to
chatgpt.com/backend-api/codex/models before falling back to the cache.
That made the test slow and non-deterministic in restricted/CI
networks. Patch _fetch_models_from_api to return [] so we go straight
to the cache path the test actually means to exercise.
PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was
unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed
via the Codex OAuth backend at chatgpt.com/backend-api/codex/models —
not via the public OpenAI API.
Add explanatory comments in:
- DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py)
- _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py)
- list_authenticated_providers' live-discovery branch (model_switch.py)
so future maintainers don't strip the entry again. Also documents the
intentional asymmetry that Spark stays out of the "openai" provider
catalog (it isn't on the public API) and why the supported_in_api
filter is *not* applied for the openai-codex route.
Closes#6051.
Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").
Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.
Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
• Environment-dependent failures (missing binaries, fresh-install
errors, post-migration path mismatches, 'command not found',
unconfigured credentials, uninstalled packages)
• Negative claims about tools or features ("X does not work")
that harden into self-cited refusals
• Session-specific transient errors that resolved before the
conversation ended
• One-off task narratives ("summarize today's market", "analyze
this PR") — also addresses the #12812 / #4538 family
Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.
Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
The previous PR (#22993) gave us a structured WARNING per stream drop
but the only diagnostic was 'error_type=APIError error=Network
connection lost.' — same nothing the user started with. To actually
diagnose why subagents drop streams disproportionately we need to know
WHERE the drop happened.
Adds three breadcrumbs to the agent.log WARNING:
1. Inner exception chain. openai SDK wraps httpx errors as
APIConnectionError / APIError so the catch site only sees the
wrapper. _flatten_exception_chain walks __cause__/__context__ up to
4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can
tell ConnectError vs RemoteProtocolError vs ReadError vs
ProxyError without enabling verbose mode.
2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider,
x-openrouter-model, x-openrouter-id, x-request-id, server, via,
etc. from stream.response immediately after open (so they survive
even when the stream dies before the first chunk). These answer
'is one CF edge / one downstream provider responsible, or random?'
3. Per-attempt counters. bytes streamed, chunk count, elapsed time on
the dying attempt, and time-to-first-byte. Distinguishes 'couldn't
connect at all' (0s, 0 bytes) from 'died after 30s mid-stream'
(very different root causes — first is auth/routing, second is
upstream idle-kill or proxy timeout).
Plumbing:
- _stream_diag_init / _stream_diag_capture_response live on AIAgent
and produce a per-attempt dict held on request_client_holder['diag']
for closure access from the retry block.
- _call_chat_completions and _call_anthropic both initialize the diag
and increment counters per chunk/event (best-effort, never raises in
the streaming hot path).
- _log_stream_retry / _emit_stream_drop accept an optional diag and
render the new fields. Final-exhaustion log goes through the same
helper so it gets the same diagnostic dump.
- UI status line gains a brief 'after Xs' suffix when timing is
available — distinguishes 'connect failed' from 'died mid-stream'
at a glance without grepping logs.
Sample WARNING after this change:
Stream drop mid tool-call on attempt 2/3 — retrying.
subagent_id=sa-2-cafef00d depth=1 provider=openrouter
base_url=https://openrouter.ai/api/v1
error_type=APIError error=Connection error.
chain=APIError(Connection error.) <- RemoteProtocolError(peer
closed connection without sending complete message body)
http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s
upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX
x-openrouter-provider=Anthropic
x-openrouter-id=gen-abc123 server=cloudflare]
Tests: 10 covering diag init, header capture (whitelist enforced for
PII), exception-chain walking + depth cap, log content with full diag,
log content without diag (placeholders), UI elapsed-suffix on/off.