Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.
After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.
RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.
file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.
Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.
Applied at four lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()
Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.
Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.
Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
generate a Slack manifest from the registry (canonical names +
aliases + plugin commands), clamped to Slack's 50-slash cap with
/hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
registered slash to _handle_slash_command, which dispatches on
command['command']. Legacy /hermes <subcommand> keeps working for
backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
manifest' command prints/writes a full manifest (display info,
OAuth scopes, event subs, socket mode, slash commands) ready to
paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
and points users at the 'From an app manifest' flow; also offers
to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
/model), legacy /hermes <sub> compat, manifest structure, and
telegram<->slack parity (every Telegram command must also register
as a Slack slash). Existing /hermes-registration test updated to
assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
flow in Step 1; cli-commands.md documents 'hermes slack manifest'.
Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
The Docker terminal-backend docs said 'each session starts a long-lived
container', implying a fresh container per chat session. That hasn't been
true for a while: for the top-level agent, task_id defaults to 'default'
and the container is cached in _active_environments for the lifetime of
the Hermes process. /new, /reset, and switching sessions all reuse the
same container. Only delegate_task subagents and RL rollouts get isolated
containers keyed by their own task_id.
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.
Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
model_catalog.url master manifest URL
model_catalog.ttl_hours disk cache TTL (default 24h)
model_catalog.providers.<name>.url optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() +
new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
success/failure, accessors,
disabled, overrides, integration)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.
The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).
Inspired by Mercury Agent's `mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
(guarding behavior that no longer exists — covered by
test_setup_reconfigure.py instead)
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
routing, /v1 stripping, api-version query forwarding, and the
provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
HangGlidersRule (noreply), and pein892 so the contributor-attribution
CI check does not reject the salvage.
- webhooks.md: adds a Video Tutorial section under the intro with a
responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
Models with a responsive YouTube iframe (NoF-YajElIM).
Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.
Changes
- extending-the-dashboard.md:
- Frontmatter description + intro bullet now mention page-scoped slots
- New TOC entry "Augmenting built-in pages (page-scoped slots)"
- New dedicated subsection after "Replacing built-in pages"
explaining the heavy-vs-light tradeoff, listing the pages that
expose slots, and showing a worked manifest + IIFE example with
tab.hidden: true
- Cross-link from the tab.override section pointing readers to the
lighter augmentation option
- web-dashboard.md:
- Bullet mentioning "page-scoped slots (inject widgets into
built-in pages without overriding them)"
Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
link) reproduce on bare main -- not introduced here
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
* feat(dashboard): page-scoped plugin slots for built-in pages
Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.
Previously, plugins could only:
1. Add new tabs (tab.path)
2. Replace whole built-in pages (tab.override)
3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.
Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
(sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
<PluginSlot name="<page>:top" />
as the first child of its outer wrapper and
<PluginSlot name="<page>:bottom" />
as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
sessions:top via registerSlot(), with matching slots entry in
the manifest -- so freshly-setup users can see what page-scoped
slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
colon-bearing slot names (sessions:top, analytics:bottom, ...)
Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
The web-dashboard.md and dashboard-plugins.md pages had overlapping,
partial coverage of the theme and plugin systems. Themes were split
across two pages; the plugin docs had a minimal manifest reference but
no step-by-step guide, no slot catalog, and no theme+plugin demo.
New: user-guide/features/extending-the-dashboard.md — single navigable
reference for all three extension layers (themes, UI plugins, backend
plugins). Includes:
- Theme quick-start + full schema (palette, typography, layout, layout
variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override,
tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting
web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS,
development). Theme/plugin content now points to the new page with a
built-in themes summary table.
dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).
sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under
the Management group.
No user-facing behavior change; docs-only.
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users if understanding the nuances between both
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:
1. DM chat_id — a user's DM sessions split in half, transcripts and
per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a
member's per-user session inside a group splits in half for the
same reason.
Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.
Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.
_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit 29b337bca7)
feat(web): add Chat tab with xterm.js terminal + Sessions resume button
(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
against current main: BUILTIN_ROUTES table + plugin slot layout)
fix(tui): replace OSC 52 jargon in /copy confirmation
When the user ran /copy successfully, Ink confirmed with:
sent OSC52 copy sequence (terminal support required)
That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.
Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:
copied 1482 chars
(cherry picked from commit a0701b1d5a)
docs: document the dashboard Chat tab
AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'
website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).
(cherry picked from commit 2c2e32cc45)
feat(tui-gateway): transport-aware dispatch + WebSocket sidecar
Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.
- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
+ module-level `StdioTransport` fallback. The stdio stream resolves
through a lambda so existing tests that monkey-patch `_real_stdout`
keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
- `write_json` routes via session transport (for async events) →
contextvar transport (for in-request writes) → stdio fallback.
- `dispatch(req, transport=None)` binds the transport for the request
lifetime and propagates it to pool workers via `contextvars.copy_context`
so async handlers don't lose their sink.
- `_init_session` and the manual-session create path stash the
request's transport so out-of-band events (subagent.complete, etc.)
fan out to the right peer.
`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.
feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal
Composes the two transports into a single Chat tab:
┌─────────────────────────────────────────┬──────────────┐
│ xterm.js / PTY (emozilla #13379) │ ChatSidebar │
│ the literal hermes --tui process │ /api/ws │
└─────────────────────────────────────────┴──────────────┘
terminal bytes structured events
The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:
• model + provider badge with connection state (click → switch)
• running tool-call list (driven by tool.start / tool.progress /
tool.complete events)
• model picker dialog (gateway-driven, reuses ModelPickerDialog)
The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.
- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
Owns its GatewayClient, drives the model picker through
`slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
(`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.
Co-authored-by: emozilla <emozilla@nousresearch.com>
refactor(web): /clean pass on ChatSidebar + ChatPage lint debt
- ChatSidebar: lift gw out of useRef into a useMemo derived from a
reconnect counter. React 19's react-hooks/refs and react-hooks/
set-state-in-effect rules both fire when you touch a ref during
render or call setState from inside a useEffect body. The
counter-derived gw is the canonical pattern for "external resource
that needs to be replaceable on user action" — re-creating the
client comes from bumping `version`, the effect just wires + tears
down. Drops the imperative `gwRef.current = …` reassign in
reconnect, drops the truthy ref guard in JSX. modelLabel +
banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
so the effect body doesn't have to setState on first run. Drops
the unused react-hooks/exhaustive-deps eslint-disable. Adds a
scoped no-control-regex disable on the SGR mouse parser regex
(the \\x1b is intentional for xterm escape sequences).
All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.
Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.
chore: uptick
fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary
The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.
Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.
feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast
The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.
Cross-process forwarding:
- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
(event_publisher.py, sync websockets client). entry.py installs the
tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
every dispatcher emit to a back-WS without disturbing Ink's stdio
handshake.
- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
(subscriber) endpoints with a per-channel registry. /api/pty now
accepts ?channel= and propagates the sidecar URL via env. start_server
also stashes app.state.bound_port so the URL is constructable.
- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
passes it to /api/pty and as a prop to ChatSidebar.
- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
tool.start/progress/complete back into the ToolCall list. Restores
the ToolCall primitive.
Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).
fix(web): address Copilot review on #14890
Five threads, all real:
- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
the open handshake. Server emits `gateway.ready` immediately after
accept, so a listener attached after the open promise could race past
the initial skin payload and lose it.
- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
WS into the existing error banner. 4401/4403 (auth/loopback reject)
surface as a "reload the page" message; mid-stream drops surface as
"events feed disconnected" with the existing reconnect button. Clean
unmount closes (1000/1001) stay silent.
- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
ptyprocess lives in the `pty` extra, not `web`. Switch to
`hermes-agent[web,pty]` in both prerequisite blocks.
- AGENTS.md: previous "never add a parallel React chat surface" guidance
was overbroad and contradicted this PR's sidebar. Tightened to forbid
re-implementing the transcript/composer/PTY terminal while explicitly
allowing structured supporting widgets (sidebar / model picker /
inspectors), matching the actual architecture.
- web/package-lock.json: regenerated cleanly so the wterm sibling
workspace paths (extraneous machine-local entries) stop polluting CI.
Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.
refactor(web): /clean pass on ChatSidebar events handler
Spotted in the round-2 review:
- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
hit the "unexpected drop" branch. Track `unmounting` in the effect
scope and gate the banner through a `surface()` helper so cleanup
closes stay silent.
- DRY the duplicated "events feed disconnected" string into a local
const used by both the error and close handlers.
- Drop the `opened` flag (no longer needed once the unmount guard is
the source of truth for "is this an expected close?").
A — 'hermes tools' activation now runs the full Spotify wizard.
Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.
Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.
Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.
B — Docs page now covers cron + Spotify.
New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.
Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.
Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
tests/hermes_cli/test_spotify_auth.py
tests/tools/test_spotify_client.py
→ 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
→ SUCCESS, no new warnings
Three quality improvements on top of #15121 / #15130 / #15135:
1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with
kind='tracks'|'albums'. Handler code was ~90 percent identical
across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate
of spotify_playback.get_currently_playing (both return identical
204/empty payloads). Its 'recently_played' action moves onto
spotify_playback as a new action — history belongs adjacent to
live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify
toolset is enabled, and the action surface is more discoverable
(everything playback-related is on one tool).
2. Spotify skill (skills/media/spotify/SKILL.md)
Teaches the agent canonical usage patterns so common requests don't
balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan +
describe + play)
- 'what's playing' = single get_currently_playing (no preflight
get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' —
both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429
3. Surfaced in 'hermes setup' tool status
Adds 'Spotify (PKCE OAuth)' to the tool status list when
auth.json has a Spotify access/refresh token. Matches the
homeassistant pattern but reads from auth.json (OAuth-based) rather
than env vars.
Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.
Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after
renaming/replacing the spotify_activity tests with library +
recently_played coverage). Docusaurus build clean.
The initial Spotify docs page shipped in #15130 was a setup guide. This
expands it into a full feature reference:
- Per-tool parameter table for all 9 tools, extracted from the real
schemas in tools/spotify_tool.py (actions, required/optional args,
premium gating).
- Free vs Premium feature matrix — which actions work on which tier,
so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1
cause of '403 no active device' reports for every Spotify
integration.
- SSH / headless section explaining that browser auto-open is skipped
when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how
to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204
now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not
opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.
Verified with 'node scripts/prebuild.mjs && npx docusaurus build'
— page compiles, no new warnings.
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID
is required' if the user hadn't manually created a Spotify developer
app and set env vars. Now the command detects a missing client_id and
walks the user through the one-time app registration inline:
- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form
(including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent
runs skip the wizard
- Continues straight into the PKCE OAuth flow
Also prints the docs URL at both the start of the wizard and the end
of a successful login so users can find the full guide.
Adds website/docs/user-guide/features/spotify.md with the complete
setup walkthrough, tool reference, and troubleshooting, and wires it
into the sidebar under User Guide > Features > Advanced.
Fixes a stale redirect URI default in the hermes_cli/tools_config.py
TOOL_CATEGORIES entry (was 8888/callback from the PR description
instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value
43827/spotify/callback defined in auth.py).
When using GitHub Copilot as provider, HTTP 401 errors could cause
Hermes to silently fall back to the next model in the chain instead
of recovering. This adds a one-shot retry mechanism that:
1. Re-resolves the Copilot token via the standard priority chain
(COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back
The fix handles the common case where the gho_* OAuth token remains
valid but the httpx client state becomes stale (e.g. after startup
race conditions or long-lived sessions).
Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern
used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some
account types; direct gho_* auth works reliably)
Tests: 3 new test cases covering end-to-end 401->refresh->retry,
client rebuild verification, and same-token rebuild scenarios.
Docs: Updated providers.md with Copilot auth behavior section.
Cron jobs can now specify a per-job working directory. When set, the job
runs as if launched from that directory: AGENTS.md / CLAUDE.md /
.cursorrules from that dir are injected into the system prompt, and the
terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD).
When unset, old behaviour is preserved (no project context files, tools
use the scheduler's cwd).
Requested by @bluthcy.
## Mechanism
- cron/jobs.py: create_job / update_job accept 'workdir'; validated to
be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD
at it and flip skip_context_files to False before building the agent.
Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the
thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs
still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py:
expose 'workdir' via the cronjob tool and 'hermes cron create/edit
--workdir ...'. Empty string on edit clears the field.
## Validation
- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update,
JSON round-trip via cronjob tool, tick partition (workdir jobs run on
the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py,
test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path
rejected; list shows 'Workdir:'; edit --workdir '' clears.
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback
Made-with: Cursor
Follow-up to aeff6dfe:
- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
"before auth"). Hook intentionally runs BEFORE auth so plugins can
handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
website/docs/user-guide/features/hooks.md (matches the pattern of
every other plugin hook: signature, params table, fires-where,
return-value table, use cases, two worked examples) plus a row in
the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
other hook entries.
No code behavior change.
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as
a live transcript (reasoning + tool calls streaming inline) instead of
a wall of `▸` chevrons the user has to click every turn.
Final default matrix:
- thinking: expanded
- tools: expanded
- activity: hidden (unchanged from the previous commit)
- subagents: falls through to details_mode (collapsed by default)
Everything explicit in `display.sections` still wins, so anyone who
already pinned an override keeps their layout. One-line revert is
`display.sections.<name>: collapsed`.
The activity panel (gateway hints, terminal-parity nudges, background
notifications) is noise for the typical day-to-day user, who only cares
about thinking + tools + streamed content. Make `hidden` the built-in
default for that section so users land on the quiet mode out of the box.
Tool failures still render inline on the failing tool row, so this
default suppresses the noise feed without losing the signal.
Opt back in with `display.sections.activity: collapsed` (chevron) or
`expanded` (always open) in `~/.hermes/config.yaml`, or live with
`/details activity collapsed`.
Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the
fallback in `sectionMode()` between the explicit override and the
global details_mode. Existing `display.sections.activity` overrides
take precedence — no migration needed for users who already set it.
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
* feat(config): make tool output truncation limits configurable
Port from anomalyco/opencode#23770: expose a new `tool_output` config
section so users can tune the hardcoded truncation caps that apply to
terminal output and read_file pagination.
Three knobs under `tool_output`:
- max_bytes (default 50_000) — terminal stdout/stderr cap
- max_lines (default 2000) — read_file pagination cap
- max_line_length (default 2000) — per-line cap in line-numbered view
All three keep their existing hardcoded values as defaults, so behaviour
is unchanged when the section is absent. Power users on big-context
models can raise them; small-context local models can lower them.
Implementation:
- New `tools/tool_output_limits.py` reads the section with defensive
fallback (missing/invalid values → defaults, never raises).
- `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from
get_max_bytes().
- `tools/file_operations.py` normalize_read_pagination() and
_add_line_numbers() now pull the limits at call time.
- `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section
so `hermes setup` writes defaults into fresh configs.
- Docs page `user-guide/configuration.md` gains a "Tool Output
Truncation Limits" section with large-context and small-context
example configs.
Tests (18 new in tests/tools/test_tool_output_limits.py):
- Default resolution with missing / malformed / non-dict config.
- Full and partial user overrides.
- Coercion of bad values (None, negative, wrong type, str int).
- Shortcut accessors delegate correctly.
- DEFAULT_CONFIG exposes the section with the right defaults.
- Integration: normalize_read_pagination clamps to the configured
max_lines.
* feat(skills): add design-md skill for Google's DESIGN.md spec
Built-in skill under skills/creative/ that teaches the agent to author,
lint, diff, and export DESIGN.md files — Google's open-source
(Apache-2.0) format for describing a visual identity to coding agents.
Covers:
- YAML front matter + markdown body anatomy
- Full token schema (colors, typography, rounded, spacing, components)
- Canonical section order + duplicate-heading rejection
- Component property whitelist + variants-as-siblings pattern
- CLI workflow via 'npx @google/design.md' (lint/diff/export/spec)
- Lint rule reference including WCAG contrast checks
- Common YAML pitfalls (quoted hex, negative dimensions, dotted refs)
- Starter template at templates/starter.md
Package verified live on npm (@google/design.md@0.1.1).
Themes and plugins can now pull off arbitrary dashboard reskins (cockpit
HUD, retro terminal, etc.) without touching core code.
Themes gain four new fields:
- layoutVariant: standard | cockpit | tiled — shell layout selector
- assets: {bg, hero, logo, crest, sidebar, header, custom: {...}} —
artwork URLs exposed as --theme-asset-* CSS vars
- customCSS: raw CSS injected as a scoped <style> tag on theme apply
(32 KiB cap, cleaned up on theme switch)
- componentStyles: per-component CSS-var overrides (clipPath,
borderImage, background, boxShadow, ...) for card/header/sidebar/
backdrop/tab/progress/badge/footer/page
Plugin manifests gain three new fields:
- tab.override: replaces a built-in route instead of adding a tab
- tab.hidden: register component + slots without adding a nav entry
- slots: declares shell slots the plugin populates
10 named shell slots: backdrop, header-left/right/banner, sidebar,
pre-main, post-main, footer-left/right, overlay. Plugins register via
window.__HERMES_PLUGINS__.registerSlot(name, slot, Component). A
<PluginSlot> React helper is exported on the plugin SDK.
Ships a full demo at plugins/strike-freedom-cockpit/ — theme YAML +
slot-only plugin that reproduces a Gundam cockpit dashboard: MS-STATUS
sidebar with live telemetry, COMPASS crest in header, notched card
corners via componentStyles, scanline overlay via customCSS, gold/cyan
palette, Orbitron typography.
Validation:
- 15 new tests in test_web_server.py covering every extended field
- tests/hermes_cli/: 2615 passed (3 pre-existing unrelated failures)
- tsc -b --noEmit: clean
- vite build: 418 kB bundle, ~2 kB delta for slots/theme extensions
Co-authored-by: Teknium <p@nousresearch.com>
Replaces blind tree.sync() on every Discord reconnect with a diff-based
reconcile. In safe mode (default), fetch existing global commands,
compare desired vs existing payloads, skip unchanged, PATCH changed,
recreate when non-patchable metadata differs, POST missing, and delete
stale commands one-by-one. Keeps 'bulk' for legacy behavior and 'off'
to skip startup sync entirely.
Fixes restart-heavy workflows that burn Discord's command write budget
and can surface 429s when iterating on native slash commands.
Env var: DISCORD_COMMAND_SYNC_POLICY (safe|bulk|off), default 'safe'.
Co-authored-by: Codex <codex@openai.invalid>
Dashboard themes now control typography and layout, not just colors.
Each built-in theme picks its own fonts, base size, radius, and density
so switching produces visible changes beyond hue.
Schema additions (per theme):
- typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize,
lineHeight, letterSpacing. fontUrl is injected as <link> on switch
so Google/Bunny/self-hosted stylesheets all work.
- layout — radius (any CSS length) and density
(compact | comfortable | spacious, multiplies Tailwind spacing).
- colorOverrides (optional) — pin individual shadcn tokens that would
otherwise derive from the palette.
Built-in themes are now distinct beyond palette:
- default — system stack, 15px, 0.5rem radius, comfortable
- midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable
- ember — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem
- mono — IBM Plex Sans + Mono, 13px, 0 radius, compact
- cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact
- rose — Fraunces (serif) + DM Mono, 16px, 1rem, spacious
Also fixes two bugs:
1. Custom user themes silently fell back to default. ThemeProvider
only applied BUILTIN_THEMES[name], so YAML files in
~/.hermes/dashboard-themes/ showed in the picker but did nothing.
Server now ships the full normalised definition; client applies it.
2. Docs documented a 21-token flat colors schema that never matched
the code (applyPalette reads a 3-layer palette). Rewrote the
Themes section against the actual shape.
Implementation:
- web/src/themes/types.ts: extend DashboardTheme with typography,
layout, colorOverrides; ThemeListEntry carries optional definition.
- web/src/themes/presets.ts: 6 built-ins with distinct typography+layout.
- web/src/themes/context.tsx: applyTheme() writes palette+typography+
layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the
fallback-to-default bug via resolveTheme(name).
- web/src/index.css: html/body/code read the new theme-font vars;
--radius-sm/md/lg/xl derive from --theme-radius; --spacing scales
with --theme-spacing-mul so Tailwind utilities shift with density.
- hermes_cli/web_server.py: _normalise_theme_definition() parses loose
YAML (bare hex strings, partial blocks) into the canonical wire
shape; /api/dashboard/themes ships full definitions for user themes.
- tests/hermes_cli/test_web_server.py: 16 new tests covering the
normaliser and discovery (rejection cases, clamping, defaults).
- website/docs/user-guide/features/web-dashboard.md: rewrite Themes
section with real schema, per-model tables, full YAML example.