Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.
Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.
Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.
## cua_backend.py
**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.
**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.
**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.
**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).
**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.
**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.
**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.
## schema.py
Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.
## tool.py
Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.
## agent/display.py + run_agent.py
Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Second docs slice shipped alongside the webhook listener code so users
can actually wire up the endpoint the moment this PR lands.
- website/docs/user-guide/messaging/msgraph-webhook.md: new page
covering what the listener is (change-notification ingress, distinct
from the teams chat adapter), quick-start YAML + env-var config,
full config table, security hardening (clientState + timing-safe
compare, source-IP allowlisting against Microsoft's published egress
ranges, TLS termination at the reverse proxy, response hygiene),
status-code table, troubleshooting, and cross-links to the Azure
app registration guide.
- website/docs/reference/environment-variables.md: new Microsoft
Graph Webhook Listener subsection with MSGRAPH_WEBHOOK_ENABLED,
_PORT, _CLIENT_STATE, _ACCEPTED_RESOURCES, _ALLOWED_SOURCE_CIDRS.
- website/sidebars.ts: wire the new page into Messaging Platforms,
right after the teams chat adapter so the two related pages are
adjacent in the sidebar.
The pipeline runtime / operator CLI / outbound delivery pages still
land with their matching PRs. With this PR merged, an operator can get
the listener running end-to-end, register a Graph subscription
manually, and receive validation handshake plus notification POSTs
against the configured client_state.
Verified via npm run build: new page routes at
/docs/user-guide/messaging/msgraph-webhook, sidebar wires correctly,
no new warnings or errors.
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.
- Timing-safe clientState comparison. Previously used `==` on strings;
switches to hmac.compare_digest so a mismatch does not leak how many
leading characters matched. client_state is documented as a strong
shared secret (openssl rand -hex 32 in the setup docs), so a
timing-safe primitive is the right call.
- Split GET and POST handlers. Graph validates a subscription by sending
GET with validationToken in the query; anything else on GET is now a
400 so the endpoint cannot be probed or mistakenly used for data
exfil. Previously a bare GET fell through to the POST path and blew
up on request.json() with a confusing 400.
- Empty response bodies on success. 202 is returned with no body so
internal counters (accepted / duplicates / scheduled) do not leak to
any caller that can reach the endpoint; counters remain observable
via /health for operators. 403 on every-item-bad-clientState batches
(so forged POSTs stop retrying), 400 on malformed / unknown-resource
batches (sender configuration issue).
- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
(list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
env var let operators restrict the webhook to Microsoft Graph's
published webhook source ranges in production. Empty = allow all,
preserving dev-tunnel / localhost workflows. Invalid CIDRs are
logged and ignored rather than crashing. Also gates the handshake
endpoint so disallowed IPs cannot probe it.
- Tests updated for the new response contract (empty-body 202,
auth-only 403, config-error 400) and extended to cover: bare GET
rejection, POST-with-validationToken handshake tolerance,
timing-safe compare actually invoked via hmac.compare_digest spy,
malformed body / missing value array, IP allowlist accept/reject
paths, handshake IP allowlist, invalid CIDR entries, comma-string
CIDR list parsing. 52/52 passed (was 40).
Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).
* feat(profile): shareable profile distributions (pack/install/update/info)
Closes#20456.
Turns a profile into a portable, versioned artifact. Packs SOUL.md, config,
skills, cron, and an env-var manifest into a tar.gz that others can install
from a local path, URL, or git repo. Updates re-pull the distribution while
preserving user data (memories, sessions, auth.json, .env) and the user's
config.yaml overrides.
New subcommands (under hermes profile, no parallel tree):
hermes profile pack <name> [-o FILE]
hermes profile install <source> [--name N] [--alias] [--force] [-y]
hermes profile update <name> [--force-config] [-y]
hermes profile info <name>
Manifest (distribution.yaml at the profile root): name, version,
hermes_requires, author, env_requires, distribution_owned.
Security:
- Installer shows manifest + env-var requirements before mutating disk;
confirmation required unless -y.
- auth.json and .env are never packed (same exclude set as profile export).
- Cron jobs are packed but NOT auto-scheduled — user is pointed at
'hermes -p <name> cron list' to review.
- Archive extraction rejects path traversal (../ members).
- Alias creation is opt-in via --alias.
Update semantics:
- Distribution-owned paths (SOUL.md, skills/, cron/, mcp.json, manifest):
replaced from the new archive.
- config.yaml: preserved by default; --force-config to overwrite.
- User-owned paths (memories/, sessions/, auth.json, .env, state.db*,
logs/, workspace/, plans/, home/, *_cache/, local/): never touched.
Version pin:
hermes_requires accepts >=, <=, ==, !=, >, < or a bare version (treated
as >=). Install fails with a clear error when the running Hermes version
doesn't satisfy the spec.
Sources supported by 'install':
- Local .tar.gz / .tgz archive
- Local directory
- HTTP(S) URL pointing to a .tar.gz (uses httpx, already a dep)
- Git URL (github.com/user/repo, https://..., git@..., ssh://, git://)
Tests: 43 new unit tests (manifest parsing, version checks, env template,
pack/install/update round-trip, config-preservation, security).
E2E validated via real CLI invocations against an isolated HERMES_HOME
covering pack, install with confirmation, update preservation, update
--force-config, decline-preview, duplicate-install rejection, and
version-requirement rejection.
* refactor(profile-dist): git-only — drop tar.gz/HTTP transports and pack
Scope-cut on top of the original distribution PR: a profile distribution
is now exclusively a git repository (or a local directory during
development). The tar.gz / HTTP archive transports and the matching
`hermes profile pack` subcommand have been removed.
Why:
* GitHub tags, branches, and commits are already the right versioning
primitive. Tag pushes do for us what 'pack + upload' did.
* `hermes profile export` / `import` already cover local backup and
restore; they are not a distribution format and stay untouched.
* One transport means one install/update code path, one doc page,
and one mental model. The extra source types doubled the surface
for no real user win — GitHub auto-attaches release tarballs, and
`git bundle` / `git clone --mirror` cover the airgap case.
Changes:
* hermes_cli/profile_distribution.py — removed pack_profile,
_fetch_tar_archive (_http_fetch), _safe_extract, _archive_roots,
_safe_parts, _find_dist_root, tarfile/io/urlparse imports. The
new _stage_source has two arms: git URL → clone, local directory
→ use in place.
* hermes_cli/main.py — removed the 'pack' subparser and action
handler. Install help text updated to match the reduced source list.
* tests/hermes_cli/test_profile_distribution.py — rewritten around a
local-directory staging fixture. The install/update/describe suites
now build a distribution tree on disk directly and install from it,
which is what a real git clone produces after .git is stripped.
Dropped TestPack, TestFindDistRoot, and the tar-specific security
test. New tests cover _looks_like_git_url, env_example emission,
hermes_requires enforcement, and 'installer does not import
credentials if an author mistakenly leaks them in the staging tree'.
* website/docs/reference/profile-commands.md — 'Distribution commands'
section rewritten around git. Added a 'Publishing a distribution'
section. export/import stay documented as local backup/restore.
* website/docs/reference/cli-commands.md — dropped 'pack' from the
profile subcommand table.
* website/package.json — 'lint:diagrams' now passes
--exclude-code-blocks to ascii-guard. Without it, markdown tables
and box-drawing diagrams inside fenced code blocks were being
misidentified as malformed ASCII boxes, blocking the PR's
docs-site-checks CI with 8 false-positive errors.
Validation:
* Targeted suite: tests/hermes_cli/test_profile_distribution.py —
56/56 pass (down from 43 — reorganized to cover the new
local-dir paths).
* Regression: test_profiles.py + test_profile_export_credentials.py
102/102 still pass. export/import behaviour unchanged.
* Docs lint: ascii-guard lint --exclude-code-blocks docs returns
0 errors (was 8 on the PR before the flag bump).
* E2E: ran the real `hermes profile install`/`info` against a
local staging dir under an isolated HERMES_HOME — install writes
SOUL.md + skills to the target profile, info reads the manifest
back, a bogus source produces a clear error, and `hermes profile
pack` is now rejected by argparse as expected.
* feat(profile-dist): distribution-aware list/show/delete + installed_at + env preview
Polish pass on top of the git-only scope cut. Five additions, all small,
wiring into existing commands rather than adding new surface.
1. `installed_at` timestamp on the manifest
* Stamped automatically inside plan_install() on both fresh install
and update — ISO-8601 UTC, seconds resolution.
* Surfaced in `hermes profile info` as `Installed: <ts>`.
* Lets users tell "installed 6 months ago, needs update" from
"installed yesterday" without guessing from file mtimes.
2. `hermes profile list` grows a `Distribution` column
* Plain profiles: "—"
* Distribution profiles: "<name>@<version>" (e.g. `telemetry@1.2.3`)
* ProfileInfo gains three optional fields — distribution_name,
distribution_version, distribution_source — populated by a new
_read_distribution_meta() helper that swallows manifest read errors
so a broken distribution.yaml in one profile can't break `list`
for the others.
3. `hermes profile show` and `hermes profile delete` surface
distribution provenance
* show: `Distribution: name@version` + `Installed from: <source>`
plus a pointer to `hermes profile info <name>` for the full
manifest.
* delete: same lines in the pre-confirmation preview, so a user
deleting "telemetry" can see it came from
`github.com/kyle/telemetry-distribution` before they type
`telemetry` to confirm. No change to the confirmation gate itself —
deletion semantics are identical to plain profiles.
4. Install preview checks env vars against the current environment
* Replaces the "Env vars you'll need to set:" header with a simpler
"Env vars:" block.
* Each required var is labeled:
- `✓ set` — already in `os.environ` OR present as a key in the
target profile's existing .env (update case).
- `needs setting` — required but not found in either place.
- `—` — optional.
* Mirrors pip's "Requirement already satisfied" UX: no unnecessary
nagging about keys the user already has configured.
5. Docs: private distributions
* New "Private distributions" section in
website/docs/reference/profile-commands.md explaining that we
shell out to the user's `git` binary, so SSH keys / credential
helpers / GitHub CLI stored creds all work transparently. One
paragraph, two examples.
* `hermes profile info` section updated to mention `Installed:`.
Module-level hoist:
* `from datetime import datetime, timezone` was previously lazy-imported
inside plan_install(). Hoisted to module scope so tests can monkeypatch
`hermes_cli.profile_distribution.datetime` to freeze time.
Tests (+7):
* TestInstalledAtStamp.test_install_stamps_installed_at — format check
(4-digit year, 'T', +00:00 suffix).
* TestInstalledAtStamp.test_update_refreshes_installed_at — freezes
datetime.now() to 2099-01-01 and confirms update writes a new stamp.
* TestProfileInfoDistribution.test_installed_distribution_shows_in_list
— ProfileInfo.distribution_{name,version,source} populated after install.
* TestProfileInfoDistribution.test_plain_profile_has_no_distribution_fields
— plain profiles have None.
* TestProfileInfoDistribution.test_malformed_manifest_does_not_break_list
— broken distribution.yaml in one profile doesn't break list_profiles().
Validation:
* 163/163 tests pass (56 distribution + 102 profile regression +
5 new from this commit — up from 158).
* docs-lint: 0 errors.
* E2E verified: install preview shows ✓/needs-setting per env var,
`profile list` shows Distribution column, `profile show` + `delete`
preview mentions source URL, `info` shows Installed: timestamp.
* fix(profile-dist): clean errors + warn when overwriting plain profiles
Two small polish fixes found during collision sweeps of the PR:
1. ValueError from validate_profile_name now caught cleanly
* A distribution.yaml whose 'name' field can't be used as a profile
identifier (spaces, path traversal, etc.) raises ValueError from
hermes_cli.profiles.validate_profile_name, which was escaping as a
raw Python traceback from 'hermes profile install/update/info'.
* Broadened the except clause in all three handlers to catch
(DistributionError, ValueError) — users now see:
Error: Invalid profile name '../../etc/passwd'. Must match
[a-z0-9][a-z0-9_-]{0,63}
instead of a stack trace.
2. Install preview distinguishes plain profile overwrite from
distribution re-install
* When plan.target_dir exists and IS a distribution (has
distribution.yaml), preview still shows the mild
(profile exists — will overwrite distribution-owned files only)
* When plan.target_dir exists but is a HAND-BUILT plain profile (no
distribution.yaml), preview now shows a loud warning:
⚠ Profile exists but is NOT a distribution. Installing here will
overwrite its SOUL.md, skills/, cron/, and mcp.json.
Your memories, sessions, auth.json, and .env will be preserved,
but any hand-edits to distribution-owned files will be lost.
* Users who type 'hermes profile install foo --force' against a
profile they hand-built now see what they're signing up for. User
data is still safe (memories, sessions, auth, .env are in
USER_OWNED_EXCLUDE), but custom SOUL/skills get stomped.
Tests (+2):
* TestErrorSurfaces.test_bad_profile_name_raises_valueerror_not_traceback
* TestErrorSurfaces.test_path_traversal_name_rejected
Validation:
* 165/165 tests pass (was 163).
* E2E: bad manifest names produce 'Error: Invalid profile name ...'
with no traceback; installing over a plain profile shows the warning;
re-installing over an existing distribution shows the normal
overwrite message.
* Bad HTTPS URLs still produce 'Error: git clone failed: ...' — git
itself generates a clean enough message that no wrapper is needed.
* 'install .' works correctly from any cwd.
* fix(profiles): reject reserved names at validate time
Before: `hermes profile create hermes` / `profile install` / `profile rename`
all silently accepted reserved names like `hermes`, `test`, `tmp`, `root`,
`sudo`. The profile directory was created; only alias creation failed (via
check_alias_collision), leaving a confusingly-named profile on disk — e.g.
`~/.hermes/profiles/hermes/` sitting next to `~/.hermes/` itself.
The reserved set already exists (_RESERVED_NAMES, introduced alongside alias
collision detection). This commit moves the check up one layer to
validate_profile_name so every entry point — create, install, import,
rename, dashboard web API — shares the same gate.
The error message points the user at the cause without being cryptic:
Error: Profile name 'hermes' is reserved — it collides with either the
Hermes installation itself or a common system binary. Pick a different
name.
`default` continues to pass through (it's a special alias for ~/.hermes).
_HERMES_SUBCOMMANDS (`chat`, `model`, `gateway`, etc.) stays at
alias-collision time only — those are fine as bare profile names with
`--no-alias`.
Tests (+5): test_reserved_names_rejected parametrized over the full
_RESERVED_NAMES set, matching the existing pattern in TestValidateProfileName.
No existing test uses a reserved name as a profile identifier (greppped
create_profile("hermes|test|tmp|root|sudo") — zero hits).
Validation:
* 170/170 tests pass in the profile suites.
* E2E: `profile create hermes`, `profile install` with manifest
name=hermes, and `profile install ... --name hermes` all produce the
same clean `Error: Profile name 'hermes' is reserved ...` with rc=1
and no traceback. Normal names (`mybot`) still work.
Foundation docs shipped alongside the Graph auth/client code so users
have a working path from zero to a verified token from the moment this
PR lands.
- website/docs/guides/microsoft-graph-app-registration.md: new page
walking through app registration, client secret, the exact minimum
Graph API permissions per pipeline capability (transcript-first,
recording fallback, Graph-mode delivery), admin consent, optional
Application Access Policy for tenant-scoping, token-flow smoke test
with the shipped MicrosoftGraphTokenProvider, and a troubleshooting
table for common AADSTS errors. Includes secret-rotation procedure.
- website/docs/reference/environment-variables.md: new Microsoft Graph
subsection in Messaging documenting MSGRAPH_TENANT_ID, MSGRAPH_CLIENT_ID,
MSGRAPH_CLIENT_SECRET, MSGRAPH_SCOPE (default .default),
MSGRAPH_AUTHORITY_URL (with sovereign-cloud override note for GCC
High etc.).
- website/sidebars.ts: wire the guide into Guides Tutorials.
The guide pages that cover the webhook listener, pipeline runtime,
operator CLI, and outbound delivery land with their matching PRs. This
one is the standalone prereq that's safe to verify in advance.
Verified via npm run build: no new warnings or errors; page routes
correctly at /docs/guides/microsoft-graph-app-registration.
The prior implementation routed download_to_file through the shared
_request() path, which uses httpx.AsyncClient.request() inside a
context manager that closes before aiter_bytes() iterates. The body
was read into memory first and the chunked write loop replayed it
from buffer. On small test payloads this was invisible; on real
Teams meeting recordings (hundreds of MB) it would force the full
artifact into RAM per download.
Rewrites download_to_file to open its own AsyncClient and use
client.stream(), keeping the context open across the aiter_bytes
iteration so the body is actually streamed chunk-by-chunk to disk.
Retry/token-refresh/Retry-After semantics are preserved by handling
them inline on the stream path. Partial .part files are cleaned up
on transport errors and on exhausted retries.
Adds three tests: large-payload streaming verifies the chunk loop
runs multiple times (discriminator: 512 KiB at chunk_size=65536
yields 8 chunks under streaming, 1 under buffering), transient-5xx
retry recovers after a single retry, and exhausted-retry cleans up
the partial file.
* feat(skills): watchers skill — poll RSS / HTTP JSON / GitHub via cron no-agent
Ships three reusable polling scripts plus a shared watermark helper as an
optional skill. Users wire them into the existing cron (no_agent=True)
mode rather than learning a new subsystem.
Supersedes the closed PR #21497 (parallel watcher subsystem). Same value,
zero new core surface.
## What ships
- optional-skills/devops/watchers/SKILL.md: pattern + three example cron commands
- optional-skills/devops/watchers/scripts/_watermark.py: shared helper
(atomic state writes, bounded ID set, first-run baseline)
- optional-skills/devops/watchers/scripts/watch_rss.py: RSS 2.0 + Atom
- optional-skills/devops/watchers/scripts/watch_http_json.py: any JSON endpoint
with configurable id_field / items_path / headers
- optional-skills/devops/watchers/scripts/watch_github.py: issues / pulls /
releases / commits (uses GITHUB_TOKEN if present)
## Invariants enforced by the shared helper
- First run records baseline, emits nothing (never replays existing feed)
- Watermark file is <state_dir>/<name>.json, atomic replace on write
- Bounded to 500 IDs (configurable)
- Empty stdout when no new items — cron treats that as silent delivery
## Validation
- watch_rss.py against news.ycombinator.com/rss first run → empty stdout, watermark populated
- Removed one seen-id, second run → emitted exactly that item
- No DeprecationWarnings (ET element truth-value footgun dodged explicitly)
End-user pattern: 'hermes cron create my-feed --schedule "*/15 * * * *" --no-agent --script $HERMES_HOME/skills/devops/watchers/scripts/watch_rss.py --script-args "--name hn --url https://news.ycombinator.com/rss" --deliver telegram'
* docs(skills/watchers): tighten description to match peer optional skills
* docs(skills/watchers): align frontmatter + structure with peer optional skills
* docs(skills/watchers): gate to linux/macos (shell syntax in examples)
The new _is_gateway_approval_context() widened the gateway classification
to any call with HERMES_SESSION_PLATFORM bound via contextvars. But
cron/scheduler.py binds that same contextvar for delivery routing on
cron jobs that originate from a gateway platform (telegram/discord/etc.),
so those jobs were getting routed through submit_pending with no
listener — blocking indefinitely instead of honoring approvals.cron_mode.
Short-circuit on HERMES_CRON_SESSION before any gateway check. Cron is
always governed by cron_mode config, regardless of where the job was
scheduled from.
Adds regression coverage in TestCronWithGatewayOrigin and records the
contributor email mapping for scripts/release.py.
Expand the google-workspace skill beyond read-only access to Drive and
Docs. Sheets already had full scope — just adds the missing create verb.
New subcommands:
- drive get : metadata for a single file
- drive upload : upload a local file (auto MIME detection)
- drive download : download or export (Docs/Sheets/Slides export to pdf/csv/pdf by default)
- drive create-folder
- drive share : user/group/domain/anyone + reader/writer/etc.
- drive delete : default trashes (reversible); --permanent skips the trash
- sheets create : new spreadsheet with optional first-tab name
- docs create : new doc, optional initial body
- docs append : append text at end of an existing doc
Scope changes:
- drive.readonly -> drive
- documents.readonly -> documents
Existing users with old tokens will hit the existing partial-scope
warning path (AUTHENTICATED (partial) ...) — the troubleshooting table
now points them at $GSETUP --revoke + redo steps 3-5 to pick up the
write scopes.
Reported: Ctrl+C during an active /goal loop felt like it did nothing —
the agent would interrupt the current turn, then immediately queue another
continuation and keep going until the session ended or the 20-turn budget
ran out.
Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally:
block around self.chat(...) unconditionally. Whether the turn completed
normally, got interrupted, or returned an empty string, the judge ran on
whatever was in conversation_history and — because the judge is fail-open
— a "continue" verdict pushed another CONTINUATION_PROMPT onto
_pending_input. Ctrl+C was invisible to the hook.
Fix:
- chat() now captures result['interrupted'] onto self._last_turn_interrupted
(resets to False at entry so early-returns don't leak prior state).
- _maybe_continue_goal_after_turn() checks the flag first: on interrupt,
auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print
a one-liner pointing the user at /goal resume or /goal clear. No judge
call, no continuation enqueued.
- Also added an empty-response guard that mirrors gateway/run.py's
_handle_message logic (empty reply → transient failure → skip judging
so we don't trip the consecutive-parse-failures backstop unnecessarily).
The goal stays in the DB as paused, so /goal resume recovers it after
the user has sorted out whatever made them cancel. /goal clear still
works as before for a full stop.
Tests: tests/cli/test_cli_goal_interrupt.py covers:
- interrupted turn pauses + doesn't queue + judge is NOT called
- paused goal is resumable
- empty / whitespace / missing assistant reply skips judging
- healthy turn still enqueues continuation / marks done
- chat() resets _last_turn_interrupted at entry (anti-leak guard)
All 55 existing goal tests still pass.
Lets orchestrators (e.g. an account-management service provisioning a
Hermes VPS) seed an OAuth refresh credential non-interactively instead of
walking the user through `hermes setup` + the device-flow login dance.
Matches the existing first-boot-only pattern used for .env, config.yaml,
and SOUL.md.
If HERMES_AUTH_JSON_BOOTSTRAP is set and $HERMES_HOME/auth.json doesn't
already exist, write the env var's contents to auth.json with mode 600.
The `[ ! -f ... ]` guard is critical: it ensures that on container
restart the rotated refresh token Hermes wrote back to the persistent
volume is never clobbered by the now-stale value the orchestrator
originally seeded.
Generic name (not Nous-specific) so the feature is reusable by any future
orchestrator.
remove_job() deletes the job from cron/jobs.json but leaves the per-job
output directory at ~/.hermes/cron/output/{job_id}/ behind. Over time
this accumulates orphaned dirs that never get reclaimed.
Adopted from #13510 by @hekaru-agent; the honcho RLock half of that PR
was already salvaged in commit dad021745 so this lands the remaining
cron cleanup hunk on its own.
Multi-turn transcripts ran together visually because every user message
got the same vertical rhythm regardless of position. Adds a short ─── in
the border colour above every user message after the first, so each turn
reads as its own block. Height estimator gains a `withSeparator` flag so
virtual scrolling pre-allocates the extra two rows (rule + top margin)
and avoids a jump on first measurement.
While in the area: the busy-indicator duration was padded with
`padStart(7)`, leaving five visible spaces between `·` and the digits
(`⠋ · 2s`) — especially loud under the verb-less `unicode` style.
Drop the padding entirely (`⠋ · 2s`); the model label now shifts a few
columns as the duration grows, which is the right trade-off for the
minimal indicator styles. The verb-padding test stays; the
duration-padding test is removed alongside the function it covered.
The quick setup flow (recommended for first-time users) silently defaulted
terminal.backend to 'local' without ever presenting the choice. This meant
new users who wanted Docker, SSH, Modal, Daytona, or any other backend had
to know about 'hermes setup terminal' — which most wouldn't discover until
later.
Now the quick setup flow is:
1. Provider selection
2. API key
3. Terminal backend (local/Docker/Modal/SSH/Daytona/Vercel/Singularity)
4. Messaging platform
5. Done
The terminal backend is a foundational decision (where ALL commands run)
and belongs in the onboarding path alongside provider selection.
Small follow-ups on top of #19643:
- check_auth() takes quiet kwarg to suppress its AUTHENTICATED print
when called from check_auth_live(), so the final status line reflects
the live-call outcome only.
- Drop redundant _ensure_deps() call in check_auth_live() (check_auth()
already calls it).
- Add AUTHOR_MAP entry for ygd58 so release attribution script works.
setup.py --check only validated token shape/expiry but did not detect
when Google had disabled the OAuth client or account. Users got
AUTHENTICATED even when actual API calls failed with disabled_client.
Changes:
- Catch disabled_client and invalid_client in check_auth() refresh
path with actionable guidance (check Cloud Console, check account
status, do not retry)
- Add check_auth_live() that performs a real Calendar API call to
detect disabled_client errors that survive token refresh
- Add --check-live CLI flag backed by check_auth_live()
Fixes#19570
Adds one reserved token to the cron `deliver` field:
- `all` — expand to every platform with a configured home channel
Resolves at fire time, not create time, so a job created before Telegram
was wired up picks it up once `TELEGRAM_HOME_CHANNEL` is set. Composes
with existing targets: `origin,all`, `all,telegram:-100:17`.
Inspired by Vellum Assistant's reminder routing-intent system.
## Changes
- cron/scheduler.py: _expand_routing_tokens + integrate into _resolve_delivery_targets
- tools/cronjob_tools.py: schema description updated
- tests/cron/test_scheduler.py: TestRoutingIntents (5 cases)
- website/docs/user-guide/features/cron.md: docs + table rows
## Validation
- tests/cron/test_scheduler.py -k 'Routing or Deliver' → 57 passed
The previous revision of this PR added six GMI-specific branches
(`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across
run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS
constant in auxiliary_client.py.
ProviderProfile already has a `default_headers: dict[str, str]` field
commented as 'Client-level quirks (set once at client construction)'.
Other plugins (ai-gateway, kimi-coding) already use it. Two of the four
auxiliary_client sites we previously patched already had a generic
`else: profile.default_headers` fallback that picked it up (so did
both run_agent sites).
This revision:
* Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the
GMI profile in plugins/model-providers/gmi/__init__.py.
* Reverts all six GMI-specific branches in run_agent.py and
auxiliary_client.py.
* Adds the generic profile-fallback `else` block to the two
auxiliary_client sites (`_to_async_client`, `resolve_provider_client`)
that didn't have it yet. This benefits every provider whose profile
declares default_headers, not just GMI — e.g. Vercel AI Gateway's
HTTP-Referer/X-Title now flow through the async client path too.
* Replaces the GMI-specific URL-branch tests with a profile-level
assertion and keeps the run_agent integration test (with
`provider='gmi'` so the fallback picks up the profile).
Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin,
two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and
tests. No core files change.
Based on #20907 by @isaachuangGMICLOUD.
When switching from a custom local provider (e.g. ollama-launch) to a
cloud provider, two bugs caused the CLI to misbehave:
1. _explicit_api_key/_explicit_base_url were only updated when the switch
result had non-empty values (guarded by `if result.api_key:` etc.).
If the previous provider set these to Ollama values ("ollama",
"http://127.0.0.1:11434/v1"), those stale values leaked into the next
turn's _ensure_runtime_credentials() call and were forwarded to the
new provider's API endpoint, causing authentication/routing failures.
Fix: unconditionally write result.api_key/base_url into the explicit
fields after every successful switch. An empty string is the correct
sentinel — it tells _ensure_runtime_credentials to re-resolve from the
auth store / config rather than forwarding a stale override.
2. In AIAgent.switch_model(), `self.base_url = base_url or self.base_url`
kept the old Ollama localhost URL whenever the incoming base_url was an
empty string. For providers that use a native SDK (not an OpenAI-compat
endpoint), the caller passes base_url="" and expects the agent to clear
the field — not silently inherit Ollama's address.
Fix: only update self.base_url when base_url is truthy.
3. _handle_model_picker_selection() was called from the prompt_toolkit
Enter key binding without any exception guard. Any unexpected error
in the model-selection code path propagated through prompt_toolkit's
key-binding dispatcher and caused the entire TUI to exit — which the
user sees as "the terminal exits when I switch providers".
Fix: wrap the call in try/except and close the picker on failure.
Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed-open to continue on every such turn, burning the entire turn
budget with log lines like
judge returned empty response
judge reply was not JSON: "Let me analyze whether the goal..."
and /goal clear could not stop it mid-loop without /stop.
After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:
⏸ Goal paused — the judge model (3 turns) isn't returning the
required JSON verdict. Route the judge to a stricter model in
~/.hermes/config.yaml:
auxiliary:
goal_judge:
provider: openrouter
model: google/gemini-3-flash-preview
Then /goal resume to continue.
The counter resets on any usable reply (both "done"/"continue" and
API errors) and persists across GoalManager reloads so cross-session
resumes carry the correct state.
Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
Makes first-time use of the kanban view self-explanatory. Every control
that wasn't already labelled now has a `title` tooltip describing what
it does, and a `?` icon next to the board switcher opens the kanban
docs page in a new tab.
Coverage:
- BoardSwitcher: board select, + New board button, docs-link icon
(both compact and full variants)
- BoardToolbar: Search, Tenant, Assignee, Show archived, Nudge
dispatcher, Refresh
- BulkActionBar: → ready, Complete, Archive, reassign group, Apply,
Clear
- Column header: hovering the header now surfaces COLUMN_HELP as a
tooltip in addition to the visible sub-text; column count also
labelled
- Card: task id, priority badge, tenant badge, assignee/unassigned,
comment count, link count, age timestamp
- InlineCreate: assignee, priority, parent-task selectors
Closes the community feedback from @CharlieDePew asking for tooltips
and a docs link in the kanban view.
Relevant docs page:
https://hermes-agent.nousresearch.com/docs/user-guide/features/kanban
When the installer is run via , uv resolves config file
paths against the process owner's (root) home directory rather than the
effective user's, causing a Permission denied error when trying to read
/root/uv.toml.
Setting UV_NO_CONFIG=1 prevents uv from discovering any config files
(uv.toml, pyproject.toml) during installation, which is the correct
behavior for a bootstrap script that manages its own environment.
Fixes#21269
channels_list was iterating directory.items() directly, yielding
("updated_at", str) and ("platforms", dict) pairs — neither passed
the isinstance(entries_list, list) check, so the inner loop never ran
and every call returned count=0 even when channel_directory.json was
populated.
The writer (gateway/channel_directory.py) wraps the payload as
{"updated_at": ..., "platforms": {...}}; every other reader in the
codebase unwraps via directory.get("platforms", {}). This aligns
channels_list with that convention.
Also tightens the existing test_channels_with_directory test, which
bypassed the bug by asserting against _load_channel_directory() directly
instead of calling channels_list. It now calls the tool end-to-end and
a new test_channels_with_directory_platform_filter covers the filter
path. Both tests fail against the pre-fix code.
Closes#21474
Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>
- Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and
Haiku 4.5 with updated source URLs (platform.claude.com)
- Add _normalize_anthropic_model_name() to handle dot-notation variants
(e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups
- Fix silent token loss: ensure session row exists before UPDATE in both
run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent)
- Log token persistence failures at DEBUG level instead of swallowing
them silently — makes undercounted analytics diagnosable
- Surface reasoning tokens in CLI /usage and TUI usage panel
- Add 'reasoning' and 'cost_status' fields to TUI Usage type
The kanban specifier landed in #21435 with feature-page docs (the
kanban page itself + the CLI reference table), but three other docs
pages enumerate every auxiliary task slot and were missed:
user-guide/configuration.md Auxiliary Models section —
interactive picker example
+ full auxiliary config
reference YAML block.
user-guide/features/fallback-providers.md
Both 'Auxiliary Tasks' and
'Fallback Reference' tables.
user-guide/features/kanban-tutorial.md
Triage-column bullet now
mentions the ✨ Specify
button + CLI + slash command.
No other docs enumerate the aux task slots (verified with
grep -r 'title_generation\|auxiliary.session_search' website/docs/).
The existing mapping pointed to the wrong GitHub user (blakejohnson, id
866695, IBM) — the email actually belongs to voteblake (id 5585957),
confirmed via search/commits?author-email. Mis-credited since 323ca7084.
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks
The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.
`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.
Surface:
hermes kanban specify <task_id> # single task
hermes kanban specify --all [--tenant T] # sweep triage column
hermes kanban specify ... --author NAME # audit-comment author
hermes kanban specify ... --json # one JSON line per task
Design choices:
- Parent gating is preserved. specify_triage_task flips to 'todo',
then recompute_ready promotes to 'ready' only when parents are
done — same rule as a normal parent-gated todo.
- No daemon, no background watcher. Every invocation is explicit —
keeps cost predictable and doesn't fight the dispatcher loop.
- Response parse is lenient: strict JSON preferred, markdown-fence
tolerated, raw-body fallback on malformed JSON so the LLM can't
strand a task in triage.
- All failure modes (no aux client, API error, task moved out of
triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
--all continues past individual failures.
Changes:
hermes_cli/kanban_db.py + specify_triage_task()
hermes_cli/kanban_specify.py NEW (~220 LOC — prompt, parse, call)
hermes_cli/kanban.py + specify subcommand + _cmd_specify
hermes_cli/config.py + auxiliary.triage_specifier task slot
website/docs/user-guide/features/kanban.md specify + config notes
website/docs/reference/cli-commands.md CLI reference entry
tests/hermes_cli/test_kanban_specify_db.py NEW (10 tests)
tests/hermes_cli/test_kanban_specify.py NEW (20 tests)
Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.
* feat(kanban): wire specifier into dashboard and gateway slash
Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.
Dashboard (plugins/kanban/dashboard/)
- POST /tasks/:id/specify NEW endpoint. Thin wrapper around
kanban_specify.specify_task(). Returns the CLI outcome shape
({ok, task_id, reason, new_title}); ok=false with a human reason
is a 200, not a 4xx, so the UI can render it inline without
treating 'no aux client configured' as a crash.
- Runs sync in FastAPI's threadpool because the LLM call can take
tens of seconds on reasoning models.
- Pins HERMES_KANBAN_BOARD around the specify call so the module's
argless kb.connect() lands on the right board.
- dist/index.js: doSpecify callback threaded through the drawer →
TaskDetail → StatusActions prop chain. ✨ Specify button appears
ONLY when task.status === 'triage' (elsewhere the backend would
reject anyway — hide the button to keep the action row clean).
Busy state (Specifying…) + inline success/error banner under the
button using the response.reason text.
- dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
existing --color vars so themes reskin cleanly.
Gateway slash (/kanban specify)
- Already works via the existing run_slash → build_parser →
kanban_command pipeline. No code change needed — slash commands
inherit the argparse tree automatically. Added coverage:
test_run_slash_specify_end_to_end (create --triage, specify, verify
promotion + retitle) and test_run_slash_specify_help_is_reachable.
Tests
- tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
REST endpoint — happy path, non-triage rejection as ok=false 200,
missing aux client as ok=false 200.
- tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.
Docs
- website/docs/user-guide/features/kanban.md: dashboard action row
description mentions ✨ Specify + all three surfaces. REST table
gains /tasks/:id/specify. Slash examples include /kanban specify.
Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.