Surfaces the pin command at the moment users care about it: when a
consolidation just landed against their skill library and they're
looking at the umbrella name in the curator output. Previously `hermes
curator pin` existed but had no discovery surface — users only learned
it existed by reading docs or stumbling onto `hermes curator --help`.
The hint:
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale — pruned (stale)
full report: hermes curator status
keep an umbrella stable: hermes curator pin document-tools
Gated on having at least one consolidation that produced an umbrella.
Pruned-only runs (nothing surviving to pin) skip the hint. When
multiple umbrellas were produced, picks alphabetically first as a
concrete example rather than listing them all.
3 new tests in tests/agent/test_curator_classification.py covering:
consolidation produces hint with real umbrella name, pruned-only run
omits it, multi-umbrella picks one example.
* feat(gateway): add LINE Messaging API platform plugin
Adds LINE as a bundled platform plugin under `plugins/platforms/line/`,
synthesized from the strongest pieces of seven open community PRs. The
adapter requires zero core edits — `Platform("line")` is auto-discovered
via the bundled-plugin scan in `gateway/config.py`, and all hooks
(setup, env-enablement, cron delivery, standalone send) are wired
through `register_platform()` kwargs the way IRC and Teams do it.
Highlights merged into one plugin:
- **Reply token preferred, Push fallback.** Try the free reply token
first (single-use, ~60s TTL); fall back to metered Push when the
token is absent, expired, or rejected. (PR #21023)
- **Slow-LLM Template Buttons postback.** When the LLM is still running
past `LINE_SLOW_RESPONSE_THRESHOLD` (default 45s), the adapter burns
the original reply token to send a "Get answer" button bubble. The
user taps it to fetch the cached answer via a fresh reply token —
also free. State machine: PENDING → READY → DELIVERED, ERROR for
cancelled runs (orphan resolves to `LINE_INTERRUPTED_TEXT` after
/stop). Set threshold to 0 to disable. (PR #18153)
- **Three-allowlist gating** — separate user / group / room allowlists
with `LINE_ALLOW_ALL_USERS=true` dev-only escape hatch. (PR #18153)
- **Markdown URL preservation.** Strip bold/italic/code-fence/heading
markers (LINE renders them literally) but keep `[label](url)` →
`label (url)` so URLs stay tappable. (PR #18153)
- **System-message bypass** for `⚡ Interrupting`, `⏳ Queued`, etc. —
busy-acks reach the user as visible bubbles instead of being
swallowed into the postback cache. (PR #18153)
- **Media via public HTTPS URLs.** LINE doesn't accept binary uploads;
images/audio/video must be HTTPS-reachable. The adapter serves
registered tempfiles under `/line/media/<token>/<filename>` from the
same aiohttp app. Allowed-roots traversal guard covers
`tempfile.gettempdir()`, `/tmp` (→ `/private/tmp` on macOS), and
`HERMES_HOME`. `LINE_PUBLIC_URL` overrides URL construction for
setups behind tunnels/proxies. (PR #8398)
- **5-message-per-call batching.** LINE rejects >5 messages per
Reply/Push; smart-chunker caps text at 4500 chars per bubble.
- **Inbound dedup** via `webhookEventId` LRU. (PR #21023)
- **Self-message filter** via `/v2/bot/info` userId lookup. (PR #21023)
- **Loading-animation indicator** wired to LINE's `chat/loading/start`
endpoint, DM-only (LINE rejects it for groups/rooms). (PR #21023)
- **Out-of-process cron delivery** via `_standalone_send`, so
`deliver: line` cron jobs work even when cron runs detached from
the gateway.
- **Webhook hardening** — 1 MiB body cap, constant-time HMAC-SHA256
signature verification, dedup, scoped lock so two profiles can't
bind the same channel.
Validation
----------
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py` →
73 passed in 1.05s
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py
tests/gateway/test_irc_adapter.py
tests/gateway/test_plugin_platform_interface.py
tests/gateway/test_platform_registry.py
tests/gateway/test_config.py` → 193 passed, 7 skipped
- E2E import + register + signature roundtrip + `Platform("line")`
bundled-plugin discovery verified against current `origin/main`.
Closes the seven open LINE PRs (#18153, #16832, #6676, #21023, #14942,
#14988, #8398) by superseding them with a single plugin-form
implementation that takes the best idea from each.
Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
* docs(platforms): document platform-specific slow-LLM UX pattern
Add a 'Platform-Specific Slow-LLM UX' section to the platform-adapter
developer guide covering the _keep_typing override pattern that LINE
uses for its Template Buttons postback flow.
Three subsections:
- Pattern: subclass _keep_typing to layer mid-flight UX (with code)
- Pattern: subclass send to route through a cache instead of sending
- When this pattern is appropriate (vs. always-Push fallback)
Plus a short pointer in gateway/platforms/ADDING_A_PLATFORM.md so
tree-readers find the prose walkthrough on the docsite.
Filed because the LINE plugin (PR #23197) was the first bundled
adapter to need this pattern — every prior plugin (irc, teams,
google_chat) handles slow responses with the default typing-loop and
a regular send_text. Documenting now while the rationale is fresh.
---------
Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
Adapted from PR #20568 commit ce3518578 (Eric Litovsky / @kallidean).
Adds two-tier gating for the kanban tool surface so dispatcher-spawned
workers see only task-lifecycle tools (show/complete/block/heartbeat/
comment/create/link) while orchestrator profiles with `toolsets: [kanban]`
also see board-routing tools (kanban_list, kanban_unblock).
Workers shouldn't be enumerating or unblocking the board — they should
close their own task via the lifecycle tools. Hiding board-routing tools
from worker schemas keeps the worker focused and the toolset-isolation
contract honest.
Plus inherited from the same upstream commit:
- 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata.
- Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside
the orchestrator handlers in case a stale registration ever routes a
worker to one of them.
- Tests for the new gate, the stricter bound, and the fact that even a
worker with `toolsets: [kanban]` in config still doesn't see board
routing.
Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>
Two follow-ups from self-review:
1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The
primary resolution path for Spark goes through provider='openai-codex'
→ _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future
code path resolves Spark's context with a different provider (custom
proxy, generic fallthrough), the longest-substring-first lookup in
step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x.
Adding the explicit override is a cheap defensive correctness fix
matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic
gpt-5 entry.
2. Update test_openai_codex_model_validation_fallback.py docstring. The
bug it was originally written for (gpt-5.3-codex-spark missing from
listing) is now resolved by this PR's catalog restoration. The test
still validly exercises the soft-accept code path for any future
entitlement-gated Codex slug that ships before Hermes catalogs it,
but the framing was stale — clarified.
Two follow-ups from self-review:
1. Add unit test for _fetch_models_from_api covering the live HTTP path.
The salvaged PR #19530 dropped the supported_in_api:false filter in
both _fetch_models_from_api and _read_cache_models, but only the
cache path had a regression test. This adds the symmetric live-fetch
test (mocked httpx) so a future drive-by change to the HTTP path
can't silently re-introduce the filter.
2. Pin test_codex_picker_uses_live_codex_catalog to the cache fallback.
The test wrote a fake JWT and a CODEX_HOME cache, but provider_model_ids
('openai-codex') still issued a real 10s HTTP probe to
chatgpt.com/backend-api/codex/models before falling back to the cache.
That made the test slow and non-deterministic in restricted/CI
networks. Patch _fetch_models_from_api to return [] so we go straight
to the cache path the test actually means to exercise.
Closes#6051.
Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").
Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.
Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
• Environment-dependent failures (missing binaries, fresh-install
errors, post-migration path mismatches, 'command not found',
unconfigured credentials, uninstalled packages)
• Negative claims about tools or features ("X does not work")
that harden into self-cited refusals
• Session-specific transient errors that resolved before the
conversation ended
• One-off task narratives ("summarize today's market", "analyze
this PR") — also addresses the #12812 / #4538 family
Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.
Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
The previous PR (#22993) gave us a structured WARNING per stream drop
but the only diagnostic was 'error_type=APIError error=Network
connection lost.' — same nothing the user started with. To actually
diagnose why subagents drop streams disproportionately we need to know
WHERE the drop happened.
Adds three breadcrumbs to the agent.log WARNING:
1. Inner exception chain. openai SDK wraps httpx errors as
APIConnectionError / APIError so the catch site only sees the
wrapper. _flatten_exception_chain walks __cause__/__context__ up to
4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can
tell ConnectError vs RemoteProtocolError vs ReadError vs
ProxyError without enabling verbose mode.
2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider,
x-openrouter-model, x-openrouter-id, x-request-id, server, via,
etc. from stream.response immediately after open (so they survive
even when the stream dies before the first chunk). These answer
'is one CF edge / one downstream provider responsible, or random?'
3. Per-attempt counters. bytes streamed, chunk count, elapsed time on
the dying attempt, and time-to-first-byte. Distinguishes 'couldn't
connect at all' (0s, 0 bytes) from 'died after 30s mid-stream'
(very different root causes — first is auth/routing, second is
upstream idle-kill or proxy timeout).
Plumbing:
- _stream_diag_init / _stream_diag_capture_response live on AIAgent
and produce a per-attempt dict held on request_client_holder['diag']
for closure access from the retry block.
- _call_chat_completions and _call_anthropic both initialize the diag
and increment counters per chunk/event (best-effort, never raises in
the streaming hot path).
- _log_stream_retry / _emit_stream_drop accept an optional diag and
render the new fields. Final-exhaustion log goes through the same
helper so it gets the same diagnostic dump.
- UI status line gains a brief 'after Xs' suffix when timing is
available — distinguishes 'connect failed' from 'died mid-stream'
at a glance without grepping logs.
Sample WARNING after this change:
Stream drop mid tool-call on attempt 2/3 — retrying.
subagent_id=sa-2-cafef00d depth=1 provider=openrouter
base_url=https://openrouter.ai/api/v1
error_type=APIError error=Connection error.
chain=APIError(Connection error.) <- RemoteProtocolError(peer
closed connection without sending complete message body)
http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s
upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX
x-openrouter-provider=Anthropic
x-openrouter-id=gen-abc123 server=cloudflare]
Tests: 10 covering diag init, header capture (whitelist enforced for
PII), exception-chain walking + depth cap, log content with full diag,
log content without diag (placeholders), UI elapsed-suffix on/off.
Closes#21794.
`/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h`
all returned broken output to the gateway and interactive CLI. Three
underlying bugs in `hermes_cli.kanban.run_slash`:
1. argparse writes help to **stdout** but `run_slash` only captured
stderr at parse time, so `-h` text was silently swallowed and
replaced with the `(usage error: 0)` sentinel.
2. The wrapping parser used `prog="/"` and routed via a synthetic
"_top → kanban" subparser, producing `usage: / kanban …` (stray
space) and `usage: /kanban kanban …` (doubled token) in error text.
3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB
usage tree, which reads as visual garbage in a chat bubble.
Fix: drive the kanban_parser directly (no double-wrap), rewrite prog
strings on every leaf subparser, capture stdout AND stderr around
parse_args, distinguish SystemExit(0) (help — return captured stdout)
from SystemExit(2) (error — return single-line ⚠-prefixed message),
and add an explicit chat-friendly short-help block returned for bare
invocation and the help aliases (`help`, `--help`, `-h`, `?`).
Added 5 regression tests covering bare invocation, every help alias,
subcommand help, unknown action, and missing required arg.
Affects every chat platform via gateway/run.py::_handle_kanban_command
and the interactive CLI via cli.py::_handle_kanban_command.
Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com>
Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s
`_tick_once_for_board` called `_kb.connect(board=slug)` immediately
followed by `_kb.init_db(board=slug)`. Since `connect()` already runs
the schema + idempotent migration on first open per process, the
explicit `init_db()` was redundant — and worse, `init_db()` deliberately
busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration
on a *second* connection that races the first.
On every cold gateway start against a legacy DB this surfaced as either
`sqlite3.OperationalError: duplicate column name: <col>` or intermittent
`database is locked` errors logged at the first tick. The duplicate-column
case is now tolerated by `_add_column_if_missing` (commit 78698381a), but
the wasted second migration plus the database-is-locked race remain
fixable by skipping the redundant call entirely.
Drops `_kb.init_db(board=slug)` at both call sites and adds a regression
test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence
via source inspection plus a runtime spy.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Subagent stream drops were spamming the parent terminal with two lines
per blip ('Connection dropped...' + 'Reconnected...') while leaving zero
breadcrumb in agent.log to debug them.
Two underlying bugs, fixed together:
1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which
filters records before root-logger file handlers see them. The comment
claimed 'File handlers still capture everything' — that was wrong.
Removed in both run_agent.py and cli.py; console quietness already
comes from hermes_logging not installing a console StreamHandler in
non-verbose mode.
2. The stream-retry blocks emitted two _emit_status calls per drop
('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected —
resuming…') with no provider name, so multi-provider sessions had to
dig through agent.log to attribute a drop. Replaced both call sites
with a single _emit_stream_drop helper that emits ONE line naming the
provider and error class, and always writes a structured WARNING to
agent.log with subagent_id, depth, provider, base_url, error_type.
Net UX change: 6 lines per triple-subagent drop → 3 lines, each
naming the provider. agent.log now has a structured breadcrumb per
retry that didn't exist before.
Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py
covering the logger-level guard, structured WARNING content, single
status line per drop (no Reconnected follow-up), and provider naming.
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.
Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.
Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.
Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
capability table (_supports_media_in_tool_results). Schema description
updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
accepts the array form for multimodal tool results (was string-only).
Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
tools see the live CLI/gateway override, not the stale config.yaml
default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.
Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
table, envelope shape, fast-path gating (vision-capable model uses
fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
content becomes function_call_output.output array; preflight
preserves arrays and drops unknown part types.
Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
typed filepath, gets pixels back, reads exact text from images that
no aux description could capture (font color irony, multi-line
fruit-count list, etc.).
PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
Adds test_notifier_second_blocked_delivers to cover the case where a
task is blocked, unblocked, then blocked again — the second blocked
event must still deliver a gateway notification.
Currently fails because blocked is treated as a terminal event kind,
causing the subscription to be dropped after the first block.
Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB
(32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach
in _write_to_sandbox put the entire tool result inside the 'bash -c'
arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument
list too long' before the heredoc ever ran. The caller logged a
warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the
warning never reached agent.log either, and the agent saw a 1.5 KB
preview tagged 'Full output could not be saved to sandbox'. Hits
delegate_task with 3+ subagent outputs routinely now.
Switch to passing content via env.execute(stdin_data=...). cmd is
now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight
payload travels through stdin where there is no argv-element limit.
E2E reproduced the user's exact 144,778-char delegate_task envelope:
old code OSError'd, new code round-trips cleanly to disk with all
three task summaries intact.
* feat(curator): show rename map (where skills went) in user-visible summary
The full data has always been on disk in REPORT.md, but the user-visible
curator summary (gateway 💾 line, CLI session-start panel,
`hermes curator status`) was counts-only — "consolidated 4 into 2
umbrellas" with no names. Users only discovered renames when something
they expected was gone.
New `_build_rename_summary()` formats the rename map and appends it to
`final_summary`:
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
Empty on no-op ticks (no archives), so most ticks add zero log noise.
Cap of 10 entries keeps agent.log readable when a 50-skill
consolidation lands; the full list is always in REPORT.md.
`hermes curator status` indents continuation lines so the multi-line
summary reads as one logical field.
5 new tests in tests/agent/test_curator_classification.py covering
empty / consolidation / pruning / cap / mixed cases.
* feat(curator): show recent run summary once on `hermes update`
The rename map is now visible from where users actually look — the
update flow they explicitly run, instead of just the live gateway log
or transient CLI session-start panel.
Behavior:
- After `hermes update`, if the most recent curator run produced a
rename map (multi-line summary) that the user hasn't seen yet, print
it once with a 'last run Xh ago' header and a one-time-message
footer.
- Stamp `last_run_summary_shown_at = last_run_at` after printing so
subsequent `hermes update` invocations are silent until a newer
curator run lands.
- Silent on no-op runs (single-line summary like 'auto: no changes;
llm: no change'). Still stamps shown so we don't reconsider on
every update.
- Silent when the curator has never run (the existing first-run
notice handles that case).
Output:
ℹ Skill curator — last run 4h ago
auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
archived 3 skill(s):
• docx-extraction → document-tools
• pdf-extraction → document-tools
• old-stale-thing — pruned (stale)
full report: hermes curator status
(This message shows once per curator run. View anytime: hermes curator status)
State migration:
- `_default_state()` gains `last_run_summary_shown_at: None`. Existing
state files lack the field; `.get()` returns None; the comparison
treats any prior run as 'not yet shown' and prints once on next
update. Self-healing.
Wiring:
- Both `hermes update` paths in main.py call the new
`_print_curator_recent_run_notice()` right after the existing
first-run notice. Best-effort try/except so a state-load bug
never breaks the update flow.
6 tests in tests/hermes_cli/test_curator_recent_run_notice.py:
no-run / single-line / multi-line / show-once / new-run-resets /
time-formatter buckets.
Follow-up test fix for #22693 — the existing test for ps-failure +
pid-file fallback needed the /proc walk path stubbed too since /proc
is now consulted first.
Salvage of NousResearch/hermes-agent#7622.
Docker images often lack procps so `ps` is unavailable. Try reading
/proc/*/cmdline first (works in any Linux container) and fall back to
`ps -A eww` only when /proc is not present. PermissionError on
individual PIDs is silently skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
run_gateway() calls refresh_systemd_unit_if_needed() on every invocation
so restart settings stay current after exit-code-75 respawns. The
user-scope unit path resolves under Path.home() (NOT sandboxed by
conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the
current HERMES_HOME into the unit's Environment= line.
Result: any test that exercises run_gateway() end-to-end on a real
Linux dev box silently rewrites the developer's installed
~/.config/systemd/user/hermes-gateway.service with a polluted
HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the
next reboot, systemd loads that unit, the gateway starts looking at an
empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging
platforms enabled' even though the user's real config is fine. Three
tests in tests/hermes_cli/test_gateway.py hit this path:
test_run_gateway_exits_cleanly_on_keyboard_interrupt,
test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and
test_run_gateway_root_guard_has_escape_hatch.
Two-layer fix:
1. _install_fake_gateway_run helper (covers all four run_gateway() call
sites in test_gateway.py and any future ones) now also stubs
supports_systemd_services and refresh_systemd_unit_if_needed.
2. refresh_systemd_unit_if_needed() itself sniffs the generated unit
body for /pytest-of- and /hermes_test markers and refuses to write
when present. Defense in depth so a future test that bypasses the
helper still can't corrupt the dev's gateway. Tests that legitimately
exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot)
patch generate_systemd_unit to return synthetic content that doesn't
carry those markers, so they keep working.
Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a
regression test for the source-side guard.
RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible
shim was falling through to FailoverReason.unknown, surfacing as 'Empty
response from model' and burning 3 retry slots on the same failing
endpoint. _classify_by_message had no timeout-message branch — only
billing/rate_limit/auth/context_overflow/model_not_found patterns. The
type-based check at line 565 also requires isinstance(error, (TimeoutError,
ConnectionError, OSError)) — a plain RuntimeError doesn't match.
Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded',
'request timed out', 'operation timed out', 'upstream timed out', 'turn
timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True)
when any pattern matches.
Salvage of #22664's classifier portion. The original PR also bundled a
fallback self-selection guard which is now redundant (already on main
via #22780) plus DeepSeek thinking and session_search fixes that are
their own separate concerns.
Follow-up to #22780 — fixes the still-broken classification of
generic-typed provider-shim timeouts that #22780's dedup didn't cover.
Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being
resolved by either the init-time fallback path (line ~1660) or the
runtime _try_activate_fallback path (line ~8045). Only literal
'api_key' was honored; the snake_case 'api_key_env' alias documented
elsewhere in the config was silently dropped, so a 'provider: custom'
fallback with base_url + api_key_env worked as primary but failed as
fallback with 'no endpoint credentials found' / 401.
Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in
both call sites, with empty-string-to-None coercion so unset env vars
don't poison the resolver.
Salvage of #22665's fallback portion. The original PR also bundled
gateway-degrade-on-no-adapters changes (those land via the carve-out
in #22853 which is the same code) and run_agent.py memory-nudge
counter hydration (issue #22357 territory, not mentioned in the
title). Drops both bundled pieces; keeps just the api_key_env fix.
Closes#5392.
When connected_count == 0 AND enabled_platform_count > 0, the gateway
treated 'all adapters returned None' identically to 'all adapters
failed to connect' — both as fatal startup errors. The 'returned None'
case happens when imports fail silently or when adapters are present
in config but their dependencies aren't installed (e.g. discord.py
missing). Cron jobs and other gateway-runtime work would unnecessarily
fail to start.
Split: only return False when startup_retryable_errors is non-empty
(real connection attempt failed). When the list is empty AND enabled
> 0, log a warning and continue running, matching the 'no platforms
enabled' cron path.
Salvage of #22642's gateway slice. Drops the bundled run_agent.py
memory-nudge counter hydration block (issue #22357 territory) which
wasn't mentioned in the PR description.
Closes#5196.
Problem: terminal.docker_env set in config.yaml was silently ignored.
Docker containers never received the user-specified env vars.
Root cause: docker_env was missing from all three config→env bridging
maps (cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync) and from the terminal_tool
_get_env_config() reader. _create_environment() consumed the key from
container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV
was never set.
Also extend the list-serialisation branches in cli.py and gateway/run.py
to handle dict values via json.dumps (lists already used json.dumps;
plain str() on a dict produces undecodable output).
Fix:
- cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings;
serialise dict values with json.dumps alongside existing list path
- gateway/run.py: same additions to _terminal_env_map and serialisation
- hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV"
to _config_to_env_sync so `hermes config set terminal.docker_env …`
persists to .env correctly
- tools/terminal_tool.py: add docker_env key to _get_env_config() reading
TERMINAL_DOCKER_ENV via _parse_env_var with default "{}"
Tests: add test_docker_env_is_bridged_everywhere to
tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on
origin/main, passes with fix.
Fixes#20537
After Popen succeeds with os.setsid (detached process group), 5 things
happen with no try/except: Thread construction, reader.start(), lock
acquisition, prune+register, checkpoint write. If any raises, the
Popen object goes unregistered and the detached process group leaks
indefinitely.
Wrap the post-spawn setup in try/except. On failure:
- os.killpg(getpgid(pid), SIGKILL) takes down the entire process
group (not just the shell - important because of detached PG +
-lic shell wrapper that may have spawned children)
- proc.kill() fallback for ProcessLookupError/PermissionError/OSError
- proc.wait(timeout=5) reaps with a bound
- re-raise to preserve original traceback
Nested try/except around cleanup so a secondary failure can't mask the
original.
Closes#2749.
Problem
=======
`tools.checkpoint_manager._touch_project` reads the project metadata
file with `json.loads(meta_path.read_text(...))`, then immediately does:
meta["workdir"] = str(_normalize_path(working_dir))
The `except` block only catches `(OSError, ValueError)`. When the file
parses successfully but returns a non-dict value (a list `[]`, `null`,
or a scalar from a corrupted or hand-truncated write), `json.loads`
succeeds without error and `meta` is set to, e.g., `[]`. The subsequent
subscript assignment then raises `TypeError: list indices must be
integers or slices, not str`, which is NOT caught by the narrow except
clause.
This TypeError propagates up through `_take` to `ensure_checkpoint`,
where the broad `except Exception` safety net swallows it. The effect
is that `ensure_checkpoint` silently returns False for the entire
session — all checkpoints are skipped for the affected working directory
without any user-visible error.
Root cause
==========
Missing `isinstance(meta, dict)` guard after `json.loads`, identical in
pattern to bugs fixed in `cron/jobs.py` (#22569) and
`tools/process_registry.py` (#22544). The same guard is already
present one function below in `_list_projects` (line 506), but was
inadvertently omitted in `_touch_project`.
Fix
===
Add two lines after the try/except:
```python
if not isinstance(meta, dict):
meta = {}
```
This matches the existing guard in `_list_projects` and ensures a fresh
empty dict is used whenever the persisted value is not a mapping —
preserving the `created_at` semantics via `setdefault` on the next line.
Tests
=====
`TestTouchProjectMalformedMeta` covers four non-dict root values
(`[]`, `null`, `42`, `"oops"`). Each writes a corrupted metadata file,
calls `_touch_project`, and asserts: (a) no exception raised, (b) the
metadata file is rewritten as a valid dict containing `last_touch` and
`workdir`. All four fail on main with `TypeError`, pass with fix.
Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.
The FTS5 trigram tokenizer requires >=3 CJK characters per individual
token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江"
has cjk_count=6 (passes the existing >=3 guard) but each token is only
2 CJK chars, so the trigram index returns 0 results.
Fix:
- Add per-token check: if any non-operator CJK token has <3 CJK chars,
force the LIKE fallback path regardless of total cjk_count.
- Expand the LIKE fallback to build one LIKE condition per non-operator
token joined with OR, so each term is matched independently.
Regression tests added in TestCJKSearchFallback:
- test_cjk_or_combined_short_tokens_returns_results
- test_cjk_short_token_or_query_preserves_filters
Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore
raises RemoteProtocolError: "incomplete chunked read", "peer closed
connection", "response ended prematurely", etc.), _generate_summary
would not classify it as a transient error. Instead of retrying on the
main model, it entered the generic 60-second cooldown, leaving context
growing unbounded until the cooldown expired. Issue #18458.
Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's
streaming premature-close error substrings. context_compressor.py's
_generate_summary except block never called _is_connection_error, so
those errors fell through to the 60-second generic cooldown rather than
triggering the retry-on-main fallback path used for timeouts.
Fix:
1. auxiliary_client.py — extend _is_connection_error keyword list with:
"incomplete chunked read", "peer closed connection",
"response ended prematurely", "unexpected eof",
"remoteprotocolerror", "localprotocolerror".
Also guard the `from openai import ...` with try/except ImportError
so the function works in environments without the openai package.
2. context_compressor.py — import _is_connection_error and call it in
_generate_summary's except block as _is_streaming_closed. Include
_is_streaming_closed in the fallback-to-main condition (alongside
_is_model_not_found, _is_timeout, _is_json_decode) and use the
shorter 30s transient cooldown for streaming-closed errors.
Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details,
codex_reasoning_items) for the gateway's simple-text replay branch. Three
more fields were added to the DB later but the whitelist was never updated:
- reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api
promotes 'reasoning' -> 'reasoning_content' at send time only when the
strings happen to match. Carrying the original verbatim avoids loss
for providers that return them as distinct fields (DeepSeek/Kimi/
Moonshot thinking modes), and preserves the empty-string sentinel
that DeepSeek V4 Pro requires for thinking-mode replay.
- codex_message_items: exact assistant message items with 'phase'.
OpenAI docs: 'preserve and resend phase on all assistant messages —
dropping it can degrade performance.' Required for prefix cache hits.
No recovery path exists — once dropped, gone.
- finish_reason: informational; cheap to keep so transcripts replay
identically across CLI and gateway.
The CLI is unaffected because cli.py keeps the live in-memory message list
across turns (cli.py:10046 'self.conversation_history = result["messages"]').
The gateway rebuilds agent_history from the SQLite transcript on every turn,
so any field stripped during replay is silently lost.
Refactors the inline whitelist into a module-level _build_replay_entry()
helper so the contract can be unit-tested. 16 new tests pin the field set
and falsy-value handling.
Verified end-to-end: DB stores all 8 fields, replay now preserves all 8
(was preserving only 5 for assistant text turns).
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.
- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
[{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
[0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
tui_gateway/server.py: propagate the kwarg (mirrors providers_order
plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md
Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
Pass session_id through to provider profile build_api_kwargs_extras so
the OpenRouter profile can attach an xAI cache-affinity header
(x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt
cache requires server affinity via this header — without it the cache
is poisoned and Grok prompt-cache hit rates drop dramatically on
multi-turn sessions.
Carve-out of #22708 by Ninso112. The original PR bundled a /diff
slash command, a zsh completion fix (already on main via #22802),
and holographic memory null-guards. This salvage keeps just the
Grok header work — small, targeted, and well-tested. Other
contributors and changes preserved for separate review.
Closes#22705.
Add a configurable notifications mode for the Telegram platform adapter
that controls which messages trigger push notifications.
- display.platforms.telegram.notifications: "all" (default) | "important"
- HERMES_TELEGRAM_NOTIFICATIONS env var override
- In "important" mode, all sends use disable_notification=True except:
- Approvals (send_exec_approval) and slash confirmations
- Final response messages (metadata["notify"]=True)
- Zero overhead in default "all" mode
- Zero impact on non-Telegram platforms
Closes#22771
acp_command / acp_args descriptions previously primed the model to
populate them — "Per-task ACP command override (e.g. 'copilot')" —
even when no ACP CLI was installed. Models with weaker schema-following
discipline would set them and the spawn would fail.
Add explicit "Do NOT set unless the user has explicitly told you"
guidance at both the top-level acp_command and the per-task override.
Strengthen acp_args to mention it's empty unless acp_command is set.
Adds 2 tests pinning the descriptions.
Note: this is a cosmetic prompt-engineering fix — the params remain
exposed in the schema. The fully-correct fix is to gate them behind
a config flag or runtime ACP-CLI detection so the schema only emits
them when an ACP harness is available. Tracked as a follow-up; this
PR ships the low-cost stopgap.
Salvage of #22680 (delegate schema only). The original PR also
bundled unrelated fixes for #22548, #21944, #22150 — those
need separate PRs since #22548 and #21944 are already addressed
on main (#22780 + #22798 in flight) and #22150 deserves its own
review.
Closes#22013.
Two co-located fixes:
1. agent/model_metadata.py: bump hy3-preview static fallback from
256000 to 262144 (256 * 1024) to match OpenRouter live metadata
so cache and offline both agree (issue #22268).
2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the
exact-value change-detector (assert ctx == 256000) with an
invariant assertion (registered + >= 4096). Per AGENTS.md
'Don't write change-detector tests': pinning the upstream-controlled
context length is exactly the test class the rule forbids — it
breaks every time the provider bumps the published value, with
zero behavioral coverage gained.
Salvage of #22574 with a redirect on the test approach. The
contributor's diff bumped the integer and added a SECOND
change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144,
which would re-break on the next published bump. We instead delete
the change-detector entirely and assert the relationship.
Closes#22268.
Problem
-------
`hermes doctor` ran two health checks for Anthropic: a dedicated one
with the correct `x-api-key` + `anthropic-version` headers, and a
generic Bearer-auth one driven by the pluggable `ProviderProfile` for
"anthropic". The generic check called `https://api.anthropic.com/v1/models`
with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404,
producing a noisy duplicate warning even when the dedicated check passed.
Root cause
----------
`hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles
against a `_known_canonical` set built from the static list (Z.AI/GLM,
Kimi, DeepSeek, …). Providers with their own dedicated check above the
generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so
their profiles were appended and ran a second, broken check.
Fix
---
Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and
also skip profiles whose aliases match any of those names (e.g.
`claude`, `claude-oauth` → anthropic).
Tests
-----
tests/hermes_cli/test_doctor_dedicated_provider_skip.py:
- test_build_apikey_providers_list_skips_dedicated_check_providers:
asserts the assembled list does not contain anthropic, openrouter,
or bedrock entries.
- test_build_apikey_providers_list_includes_non_dedicated_providers:
sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive.
Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter
leaking, pass post-fix).
Fixes#22346
ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a
snapshot of PRAGMA table_info taken at function entry. When the gateway
dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board
and once via init_db's discard-and-reconnect path), a second connection can
run the same migration before the first one commits, causing:
sqlite3.OperationalError: duplicate column name: consecutive_failures
This crashed the dispatcher on every first tick after a gateway restart
(subsequent ticks succeeded because the columns were then present).
Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a
try/except that swallows OperationalError whose message contains
'duplicate column name'. All ALTER TABLE calls in
_migrate_add_optional_columns are routed through this helper.
Closes#21708
DeepSeek V4 Pro returns thinking content as typed blocks inside the
content array rather than as a top-level reasoning_content field:
[{"type": "thinking", "thinking": "..."}, {"type": "output", ...}]
_extract_reasoning only handled content as a plain string, so the
thinking text was silently dropped. On the next turn the session was
replayed without the thinking block, causing:
HTTP 400: The content[].thinking in the thinking mode must be
passed back to the API.
Fix: when content is a list and no structured reasoning field was
found, scan for items with type=='thinking' and accumulate their
'thinking' (or 'text') value into reasoning_parts. Structured fields
(reasoning, reasoning_content, reasoning_details) still take priority
so existing provider behaviour is unchanged.
Closes#21944
skills/media/youtube-content/scripts/fetch_transcript.py and
optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py
both import youtube-transcript-api at runtime, but the package was not
listed in pyproject.toml. A fresh `uv sync` therefore omits it, and
both skills fail on first invocation with:
ModuleNotFoundError: No module named 'youtube_transcript_api'
Add a new [youtube] optional-dependency group with
youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already
use) and include it in [all] so standard installs pick it up.
Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra
is present in pyproject.toml and included in [all].
Closes#22243
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>