Mirrors the pattern already shipping in hindsight-integrations/openclaw:
probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0.
When supported, retains use a stable session-scoped `document_id`
(`session_id`) plus `update_mode='append'` so cross-process retains for
the same session merge into one document instead of producing
N-different-process-stamped duplicates. When unsupported (or probe
fails), fall back to the existing per-process unique
`f"{session_id}-{start_ts}"` document_id with no `update_mode` — the
resume-overwrite fix (#6654) keeps working unchanged on legacy servers.
Closes the dedup half of #20115. The proposed `document_id_strategy`
config knob isn't needed: auto-detection via the same /version probe
the OpenClaw plugin already uses gives the same outcome with no extra
config burden, and the choice is purely a function of what the server
can do.
Plumbing
--------
- Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`,
`_check_api_supports_update_mode_append`) cache the result per api_url
so every provider in the process gets one /version round-trip.
- One-time WARN logged when the API is older than 0.5.0, telling the
user to upgrade for cross-session deduplication.
- New instance helper `_resolve_retain_target(fallback_doc_id)` returns
`(document_id, update_mode)` based on cached capability. Wired into
`sync_turn` and the `on_session_switch` flush path.
- For local_embedded mode, the probe URL is taken from the running
client (`client.url`) so we hit the actual daemon port rather than
the configured default.
- `update_mode` is set on the per-item dict; `aretain_batch` already
threads `item['update_mode']` into the API call.
Tests
-----
- `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern
stable+append, per-url cache, one-time warn, flush-on-switch resolves
against the OLD session.
- Existing `_make_hindsight_provider` factory in the manager-side test
file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub
`_resolve_retain_target` so the bypass-init pattern keeps working.
E2E verified against installed `~/.hermes/hermes-agent`:
- Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id,
no `update_mode`.
- Modern probe (live local_embedded 0.5.6 daemon) → stable
`modern-session` doc_id + `update_mode='append'`.
- `test_hermes_embedded_smoke.py` passes (90s).
The dispatcher's circuit breaker only protected against spawn-side
failures (profile missing, workspace mount error, exec failure).
Workers that successfully spawned but then timed out or crashed
re-queued to ``ready`` with no counter increment, so the next tick
re-spawned them — loops forever until someone noticed. Reported
externally on Twitter (Forbidden Seeds) and confirmed by walking the
kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted
a ``timed_out`` event, and never touched ``spawn_failures``; same for
``detect_crashed_workers``.
Fix: unify the counter across all non-success outcomes.
Schema
------
* ``tasks.spawn_failures`` → ``tasks.consecutive_failures``
* ``tasks.last_spawn_error`` → ``tasks.last_failure_error``
* Migration renames the columns in-place on existing DBs (``ALTER
TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter
values are preserved. Row mappers fall through to the legacy names
if both column renames and a migration somehow got out of sync.
Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome,
release_claim, end_run, event_payload_extra)`` is the single point
every non-success outcome funnels through:
* ``spawn_failed`` → ``_record_spawn_failure`` (kept as alias)
calls it with ``release_claim=True, end_run=True`` — transitions
running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status
transition + run close + event emission, then calls
``_record_task_failure`` with ``release_claim=False, end_run=False``
just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the
counter increment runs after the main write_txn closes (SQLite
doesn't nest write transactions).
If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``,
same as before), the task transitions to ``blocked`` with a ``gave_up``
event on top of whatever outcome-specific event was already emitted.
Reset semantics changed: the counter now clears only on successful
``complete_task`` (and operator ``reclaim_task`` — an explicit "I've
looked at this, try again with a fresh budget"). Previously
``_clear_spawn_failures`` ran on every successful spawn, which would
have wiped the counter before a timeout could accumulate past threshold
— exactly the loop this fix prevents.
Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now
fires regardless of which outcome is at fault. Classifies the most
recent failure (spawn_failed / timed_out / crashed) from the run
history so the title ("Agent timeout x3", "Agent crash x4", "Agent
spawn x5") and suggested action (``doctor`` for spawn, ``log`` for
timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at
threshold 2 (vs 3 for the unified rule), but now suppresses itself
when the unified rule would also fire — avoids double-flagging.
* Diagnostic ``data`` payload now carries
``{consecutive_failures, most_recent_outcome, last_error}`` instead
of spawn-specific keys.
CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the
public fields now. Existing callers that referenced the old names
get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``,
``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.
Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive
timeouts trip the breaker (was the reported gap), crash increments
counter, reclaim clears counter, completion clears counter, spawn
success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use
the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.
389/389 kanban-suite tests pass.
Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes,
2-spawn-failed + 2-timed-out, and a task that had prior failures but
completed successfully. Board correctly shows "!! 3 tasks need
attention" (the successful one has no badge because the counter
reset). Drawer for the timeout-loop task renders "Agent timeout x3"
with most_recent_outcome=timed_out and the "Check logs" suggested
action (not the spawn-flavoured "Verify profile"). The successful
task has zero diagnostics.
Closes the Forbidden-Seeds-reported gap.
* fix(tui): close slash parity gaps with CLI
Route unsupported /skills subcommands through slash.exec, support /new <name>
titles, and handle /redraw natively so TUI behavior matches classic CLI. Also
filter gateway-only commands out of the TUI catalog while keeping /status
discoverable.
* fix(tui): run remaining CLI parity paths natively
Forward chat launch flags into the TUI runtime and handle live-session status
and skill reloads in the gateway process so TUI state no longer depends on the
slash worker's stale CLI instance.
* fix(tui): block stale snapshot restores
Prevent snapshot restore from running through the isolated slash worker because
it mutates disk state without refreshing the live TUI agent.
* chore: uptick
* fix(tui): guard async session title updates
Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.
- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
plugins (providers/ discovery owns them); standalone plugins with
register_provider+ProviderProfile in their __init__.py auto-coerce to
this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
provider-runtime.md, providers/README.md, plugins/model-providers/README.md
No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.
Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.
What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
env_vars, base_url, models_url, auth_type, fallback_models, hostname,
default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
legacy flag path retained for lmstudio/tencent-tokenhub (which have
session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
parity, hook overrides, e2e kwargs assembly)
Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
(were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
(resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
test imports
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes#14418
* feat(kanban): generic diagnostics engine for task distress signals
Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.
Closes the follow-up from #20232 discussion.
New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.
v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
fires when ``tasks.spawn_failures >= 3``; suggests
``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
``crashed`` run outcomes with no successful completion between;
suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
state with no comments / unblock attempts; suggests commenting.
Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.
Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.
API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
severity, sorted critical-first.
CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
[--json]`` — fleet view or single-task view, matches dashboard rule
output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
the top with severity markers + suggested actions.
Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
!!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
diagnostics (not just hallucinations), severity-coloured, lists
affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
``DiagnosticsSection`` rendering a card per active diagnostic:
title + detail + structured data (task-id chips when payload keys
look like id lists) + action buttons. Reassign profile picker is
inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
red for critical. Uses CSS variables so theming is straightforward.
Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
covering each rule's positive/negative/threshold paths, severity
sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
populated, severity-filtered), ``/board`` exposes both diagnostic
list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_
warnings_field_for_hallucinated_completions``) updated to reflect
the new contract: warning summary keys by diagnostic kind
(``hallucinated_cards``) not event kind.
379 kanban-suite tests pass (+16 net from this PR).
Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:
* Attention strip: shows ``!! 5 tasks need attention`` in the
error-severity orange; Show expands to a list of 5 rows ordered
critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
``broken-ml-worker → alice`` and the drawer refreshed with the
new assignee + the same diagnostic still firing (correct:
spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
``--severity error`` narrows to 3; ``kanban show <id>`` includes
the Diagnostics block at the top with suggested action hint.
Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.
* feat(kanban/diagnostics): lead titles with the actual error text
The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.
New titles:
Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
Agent crashed 3x: provider auth error: 401 Unauthorized
Agent spawn failed 4x: insufficient_quota: You exceeded your current
Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').
Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).
Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
Mirror _message_thread_id_for_typing() with _message_thread_id_for_send():
both now map the General forum topic (thread id "1") to None upfront.
That removes the need for the retry-without-thread fallback in send_typing()
entirely — if _message_thread_id_for_typing() returns a non-None value, it's
a real user-created topic and falling back to the root chat is never correct.
If Telegram rejects the typing action (e.g. topic deleted mid-session), we
swallow it at debug level instead of bleeding the indicator into All Messages.
Updates the General-topic typing regression test to assert the new single-call
contract.
Fix three regressions introduced by PR #18370 (lazy session creation):
1. _finalize_session() uses stale session_key after compression (#20001)
2. session_key not synced after auto-compression in run_conversation (#20001)
3. pending_title ValueError leaves title wedged forever (#19029)
4. Gateway silently swallows null responses when agent did work (#18765)
5. One-time cleanup for accumulated ghost compression continuations (#20001)
Changes:
- tui_gateway/server.py: _finalize_session() now uses agent.session_id
(falls back to session_key when agent is None). Refactor
_sync_session_key_after_compress() with clear_pending_title and
restart_slash_worker policy flags. Call it post-run_conversation()
to sync session_key after auto-compression. Add ValueError handler
to pending_title flush.
- gateway/run.py: Extract _normalize_empty_agent_response() helper that
consolidates failed/partial/null response handling. Surfaces user-facing
error when agent did work (api_calls > 0) but returned no text.
- hermes_state.py: Add finalize_orphaned_compression_sessions() — marks
ghost continuation sessions as ended (non-destructive, preserves data).
- cli.py: One-time startup migration for orphaned compression sessions.
Test changes:
- tests/test_tui_gateway_server.py: Update pending_title ValueError test
for post-#18370 architecture (title applied post-message, not at create).
- tests/test_lazy_session_regressions.py: 14 new regression tests covering
all fixed paths.
The Telegram/Discord /model pickers currently call
list_authenticated_providers(), which returns every provider whose
credentials resolve locally and every model in its curated snapshot.
Two failure modes fall out:
- OpenRouter rows can include IDs the live catalog no longer carries.
- Provider rows can surface with zero callable models (e.g. a slug
whose credential pool entry exists but has nothing behind it).
list_picker_providers() wraps the base function and post-processes the
result so the interactive picker only shows models the user can
actually select:
- OpenRouter's models come from fetch_openrouter_models() (live-catalog
filtered against the curated OPENROUTER_MODELS snapshot).
- Rows with an empty models list are dropped, except custom endpoints
(is_user_defined=True with an api_url) where the user may enter
model ids manually.
- All other fields pass through unchanged.
The gateway /model handler switches to the new helper for the
interactive picker payload only. Typed /model <name> and the text
fallback list stay on list_authenticated_providers() so nothing is
hidden from power users or platforms without a picker.
Covered by nine focused unit tests in
tests/hermes_cli/test_list_picker_providers.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.
Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.
Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
keywords, generic 429 without billing keywords, and OpenAI SDK
`RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".
This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.
Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted
feishu dedup state crashed adapter startup with ValueError/TypeError
from the unguarded float() call. Wrap the float() conversion in
try/except; skip the bad key and keep loading the rest.
The original PR also restructured existing TestDedupTTL tests to use
tempfile.TemporaryDirectory + HERMES_HOME patching — that was
test-hygiene scope creep unrelated to the bug. Kept only the
malformed-timestamp fix and added a focused regression test.
OpenRouter's dashboard attributes usage via the `X-Title` header.
Hermes was sending `X-OpenRouter-Title`, which OpenRouter does not
recognize, so Hermes usage showed up unlabeled. Rename to `X-Title`
to match the canonical header (already used elsewhere in the same
file via _AI_GATEWAY_HEADERS).
Salvages the core fix from @JTroyerOvermatch's PR #13649. Dropped the
PR's `HERMES_OPENROUTER_TITLE` / `HERMES_OPENROUTER_REFERER` env-var
override plumbing per the '.env is for secrets only' policy — if
per-deployment attribution is needed later it should go under
`openrouter.title` / `openrouter.referer` in config.yaml instead.
After PR #13725 replaced the module-level _LOCK_DIR/_LOCK_FILE constants
with a dynamic _get_lock_paths() helper, the xdist-isolation fixture
needs to patch the function instead of the removed constants.
Workers completing a kanban task can now claim the ids of cards they
created via an optional ``created_cards`` field on ``kanban_complete``.
The kernel verifies each id exists and was created by the completing
worker's profile; any phantom id blocks the completion with a
``HallucinatedCardsError`` and records a
``completion_blocked_hallucination`` event on the task so the rejected
attempt is auditable. Successful completions also get a non-blocking
prose-scan pass over their ``summary`` + ``result`` that emits a
``suspected_hallucinated_references`` event for any ``t_<hex>``
reference that doesn't resolve.
Closes#20017.
Recovery UX (kernel + CLI + dashboard)
--------------------------------------
A structural gate alone isn't enough — operators also need to see and
act on stuck workers, especially when a profile's model is the root
cause. This PR ships the full loop:
* ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that
releases an active worker claim immediately (unlike
``release_stale_claims`` which only acts after claim_expires has
passed). Emits a ``reclaimed`` event with ``manual: True`` payload.
* ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` —
switch a task to a different profile, optionally reclaiming a stuck
running worker in the same call.
* ``hermes kanban reclaim <id> [--reason ...]`` and
``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]``
CLI subcommands wired through to the same helpers.
* ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and
``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the
dashboard plugin.
Dashboard surfacing
-------------------
* ⚠ **warning badge** on cards with active hallucination events.
* **attention strip** at the top of the board listing all flagged
tasks; dismissible per session.
* **events callout** in the task drawer — hallucination events render
with a red left border, amber icon, and phantom ids as styled chips.
* **recovery section** in the task drawer with three actions: Reclaim,
Reassign (with profile picker + reclaim-first checkbox), and a
copy-to-clipboard hint for ``hermes -p <profile> model`` since
profile config lives on disk and can't be edited from the browser.
Auto-opens when the task has warnings, collapsed otherwise.
Keyed by task id so state doesn't leak between drawers.
Active-vs-stale rule: warnings clear when a clean ``completed`` or
``edited`` event supersedes the hallucination, so recovery is never
permanently stigmatising — the audit events persist for debugging but
the badge goes away once the worker succeeds.
Skill updates
-------------
* ``skills/devops/kanban-worker/SKILL.md`` documents the
``created_cards`` contract with good/bad examples.
* ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering
stuck workers" section with the three actions and when to use each.
Tests
-----
* Kernel gate: verified-cards manifest, phantom rejection + audit
event, cross-worker rejection, prose scan positive + negative.
* Recovery helpers: reclaim on running task, reclaim on non-running
returns False, reassign refuses running without reclaim_first,
reassign with reclaim_first succeeds on running.
* API endpoints: warnings field present on /board and /tasks/:id,
warnings cleared after clean completion, reclaim 200 + 409 paths,
reassign 200 + 409 + reclaim_first paths.
* CLI smoke: reclaim + reassign subcommands.
Live-verified end-to-end on a dashboard with seeded scenarios:
attention strip renders, badges land on the right cards, drawer
callout shows phantom chips, Reclaim on a running task flips status to
ready + emits manual reclaimed event + refreshes the drawer,
Reassign swaps the assignee and triggers board refresh.
359/359 kanban-suite tests pass
(test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).
* revert(gateway): remove stale-code self-check and auto-restart
Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with
Gateway code was updated in the background --
restarting this gateway so your next message runs
on the new code. Please retry in a moment.
and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.
If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.
Removed:
gateway/run.py:
- _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
- _read_git_head_sha(), _compute_repo_mtime() module helpers
- class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
_stale_code_restart_triggered defaults
- __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
_repo_root_for_staleness, _stale_code_notified)
- _current_git_sha_cached(), _detect_stale_code(),
_trigger_stale_code_restart() methods
- stale-code check + user-facing restart notice at the top of
_handle_message()
tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)
No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.
* feat(i18n): add display.language for static message translation (zh/ja/de/es)
Adds a thin-slice i18n layer covering the highest-impact static user-facing
messages: the CLI dangerous-command approval prompt and a handful of gateway
slash-command replies (restart-drain, goal cleared, approval expired, config
read/save errors).
Out of scope (stays English): agent responses, log lines, tool outputs,
slash-command descriptions, error tracebacks.
Infrastructure:
- agent/i18n.py: catalog loader, t() helper, language resolution
(HERMES_LANGUAGE env var > display.language config > en)
- locales/{en,zh,ja,de,es}.yaml: ~19 translated strings per language
- display.language in DEFAULT_CONFIG (hermes_cli/config.py)
Tests:
- tests/agent/test_i18n.py: 21 tests covering catalog parity, placeholder
parity across locales, fallback behavior, env-var override, alias
normalization, missing-key graceful degradation.
Docs:
- website/docs/user-guide/configuration.md: display.language entry plus a
short section explaining scope so users don't expect agent responses to
translate via this knob.
Strip bracketed-paste control sequences from setup prompt input so pasted API keys work on Linux and WSL terminals, and add regression tests for normal/password prompts.
Closes#16491
Widens @Krionex's PR #16933 fix to cover the second bug class at the sibling
site. natural mode used to pass env values through int() before the PR
caught mis-typed values crashing the gateway; custom mode had the exact
same bug one branch away (HERMES_HUMAN_DELAY_MIN_MS=oops in custom mode
still crashed). Same try/except/fallback pattern, scoped to the two
int() calls that feed random.uniform().
Copilot review on PR #17012 noted the docstring/comment lists `0`
among the falsy effort values that fall back to `medium`, but the
existing regression tests only cover `None` and `""`. Add the third
case to lock in the full contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auxiliary.<task>.extra_body.reasoning, but the new translation path in
_CodexCompletionsAdapter.create() reads the effort with
``reasoning_cfg.get("effort", "medium")``. That returns the configured
value verbatim when the key is present, so ``effort: null`` /
``effort: ""`` (both common YAML shapes) flow through as
``{"effort": null, "summary": "auto"}`` and Codex rejects the request
with "Invalid value for parameter ``reasoning.effort``".
agent/transports/codex.py::build_kwargs() — which the new adapter is
documented to mirror — uses a truthy check (``elif
reasoning_config.get("effort"):``) so the same falsy values keep the
"medium" default. Switch the auxiliary adapter to the same
``or "medium"`` truthy form so identical config produces identical
requests on both paths.
- [x] Two new regression tests cover ``effort: None`` and
``effort: ""`` and assert the request goes out as
``{"effort": "medium", "summary": "auto"}``.
- [x] Old behaviour fails the new tests (``{'effort': None} !=
{'effort': 'medium'}``); fixed behaviour passes all 11 tests in the
``TestCodexAdapterReasoningTranslation`` class.
- [x] Adjacent suites green: ``tests/agent/test_auxiliary_client.py``
(108 passed) and ``tests/agent/transports/test_codex_transport.py +
test_chat_completions.py`` (73 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The API server is a documented, first-class messaging platform with its own
gateway adapter, docs pages, and toolset. But it's the only messaging
platform missing from PLATFORM_HINTS in agent/prompt_builder.py.
Without a platform hint, the agent has no context about the API server's
rendering environment and defaults to markdown-heavy document-style outputs
(code fences, bold, bullet points) — which break on the plain-text frontends
most API server consumers wrap (Open WebUI, custom agents, third-party
bridges).
Adds a generic api_server entry that describes the medium (unknown rendering,
assume plain text) without encoding any specific use case. Individual consumers
can layer additional style guidance via ephemeral system prompts.
Before (DeepSeek V4 Pro via API server, no hint):
**Sendblue bridge** at /opt/sendblue-bridge - **68MB** on disk
After (same prompt, with hint):
Sendblue bridge at /opt/sendblue-bridge, 68MB on disk
No breaking changes — new dict entry only. Existing API server consumers see
no behavioral change except for models that previously defaulted to markdown
formatting, which now produce cleaner plain-text output.
Previously, pinning a skill blocked every skill_manage write action
(edit, patch, delete, write_file, remove_file). The 'hard fence'
design conflated two concerns:
1. Pin as deletion protection — don't let the curator archive
or the agent delete a stable skill.
2. Pin as content freeze — don't let the agent rewrite it mid-conversation.
In practice (1) is what users pin for: they want a skill to survive
curator passes. (2) created friction — agents finding a new pitfall
in a pinned skill had to ask the user to unpin, then the agent
patches, then the user re-pins. The dance discouraged skill
maintenance and pinned skills went stale.
This narrows the _pinned_guard to skill_manage(action='delete') only.
Patches, edits, and supporting-file writes go through on pinned
skills so the agent can keep improving them. The curator's own
pinned-skip behavior (agent/curator.py:271 for auto-archive,
line 349 for the LLM review prompt) is unchanged — curator still
never touches pinned skills.
Changes:
- tools/skill_manager_tool.py: remove _pinned_guard calls from
_edit_skill, _patch_skill, _write_file, _remove_file; keep on
_delete_skill. Updated _pinned_guard docstring and error message.
- tools/skill_manager_tool.py: updated skill_manage model-facing tool
description to reflect the new semantic.
- website/docs/user-guide/features/curator.md: updated pinning
section.
- tests/tools/test_skill_manager_tool.py: flipped refuses-pinned
tests for edit/patch/write_file/remove_file into allowed-when-pinned;
kept test_delete_refuses_pinned (strengthened assertion to check the
'cannot be deleted' wording).
Closes#18354
* feat(api-server): X-Hermes-Session-Key header for long-term memory scoping
API Server integrations (Open WebUI, custom web UIs) can now pass a stable
per-channel identifier via X-Hermes-Session-Key that scopes long-term memory
(Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id.
This matches the native gateway's session_key / session_id split: one stable
key per assistant channel, many independent transcripts that rotate on /new.
- _create_agent and _run_agent accept gateway_session_key and pass it to
AIAgent(gateway_session_key=...), which is already honored by the Honcho
memory provider (plugins/memory/honcho/client.py resolve_session_name).
- New shared helper _parse_session_key_header applies the same API-key
gate, control-character sanitization, and a 256-char length cap as the
existing session-id header.
- All three agent endpoints honor the header: /v1/chat/completions,
/v1/responses, /v1/runs. JSON and SSE responses echo it back.
- /v1/capabilities advertises session_key_header so clients can
feature-detect.
Closes#20060.
Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
* chore: AUTHOR_MAP entry for manateelazycat
---------
Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
* fix(curator): protect hub skills by frontmatter name
* test(skill_usage): add mark_agent_created to regression test
The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.
* feat(curator): add archive and prune subcommands
Adds 'hermes curator archive <skill>' and 'hermes curator prune
[--days N] [--yes] [--dry-run]' alongside the existing status, run,
pause, resume, pin, unpin, restore, backup, rollback verbs.
These are the two genuinely new user-facing verbs requested in #19384.
The other verbs proposed there ('stats' and 'restore') already exist
as 'curator status' and 'curator restore', so no duplicate surface is
added — all skill lifecycle commands live under the single 'hermes
curator' namespace.
- archive: manual archive of an agent-created skill. Refuses pinned
skills with a hint pointing at 'hermes curator unpin'.
- prune: bulk-archive unpinned skills idle for >= N days (default 90).
Falls back to created_at when last_activity_at is null so never-used
skills can still be pruned. --dry-run previews, --yes skips prompt.
Adapted from @elmatadorgh's PR #19454 which placed the same verbs
under 'hermes skills' with a separate hermes_cli/skills_config.py
handler and rich table for stats. The 'stats' and 'restore' parts of
that PR duplicated existing surface, so only archive and prune are
kept, rewritten to match hermes_cli/curator.py's existing plain-text
handler style. Tests rewritten from scratch against the new handlers.
Closes#19384
Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>
---------
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>
Covers four scenarios for the reasoning-box extraction loop:
- simple turn with reasoning
- simple turn with no reasoning
- tool-calling turn where reasoning lives on the tool-call step
- prior turn had reasoning, current turn does not (the stale-display
bug the fix exists for)
- tool-calling turn where reasoning lives on BOTH steps (latest wins)
- empty-string reasoning treated as missing
Also updates the four inline replica loops in tests/cli/test_reasoning_command.py
to match the new turn-boundary shape so the test file reflects
production semantics.
The YAML-to-env-var bridge in load_gateway_config() mapped every Discord
and Telegram config key (require_mention, auto_thread, reactions, etc.)
except reply_to_mode. Users setting discord.reply_to_mode or
telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the
adapter only read the env var, which nothing populated from YAML.
Add the missing bridge for both platforms, following the existing pattern.
Top-level <platform>.reply_to_mode preferred, falls back to
<platform>.extra.reply_to_mode, env var never overwritten. Handles YAML
1.1 bare `off` → Python False coercion.
This is a re-submission of the work from #9837 and #13930, which both
implemented the same fix but neither landed (see co-authors below).
Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com>
Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>
The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.
Closes the gap where write_file skipped the post-edit syntax check that
patch already ran, so silent file corruption (bad quote escaping,
truncated writes, etc.) would persist on disk until a later read.
## Changes
tools/file_operations.py:
- Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC).
Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers.
Zero subprocess overhead; preferred over shell linters when both apply.
- _check_lint() now accepts optional content and routes to in-process
linter first. Shell linter (py_compile, node --check, tsc, go vet,
rustfmt) remains the fallback for languages without an in-process
equivalent.
- New _check_lint_delta() implements the post-first/pre-lazy pattern
borrowed from Cline and OpenCode: lint post-write state first; only
if errors are found AND pre-content was captured does it lint the
pre-state and diff. If the pre-existing file had the SAME errors the
edit didn't introduce anything new, so the file is reported as 'still
broken, pre-existing' with success=False but a message explaining the
errors were pre-existing. If the edit introduced genuinely new errors,
those are surfaced and pre-existing ones are filtered out.
- WriteResult gains a lint field.
- write_file() captures pre-content for in-process-lintable extensions
and calls _check_lint_delta after a successful write.
- patch_replace() switches from _check_lint to _check_lint_delta,
reusing the pre-edit content it already has in scope.
tools/file_tools.py:
- Update write_file schema description to mention the post-write lint.
tests/tools/test_file_operations_edge_cases.py:
- Update existing brace-path tests to use .js (shell linter) now that
.py is in-process.
- Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML
in-process linters.
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy
short-circuit, new-file path, and the single-error-parser caveat.
## Performance
In-process linters are microseconds per call (ast.parse, json.loads).
The hot path (clean write) runs exactly one lint — matches main's cost
for patch. Pre-state capture is skipped when the file has no applicable
linter. Measured 4.89ms/write average over 100 .py writes including lint.
## Inspiration
- Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write
diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts).
- OpenCode's WriteTool — runs lsp.diagnostics() after write and appends
errors to tool output (packages/opencode/src/tool/write.ts).
- Claude Code's DiagnosticTrackingService — captures baseline via
beforeFileEdited() and returns new-diagnostics-only from
getNewDiagnostics() (src/services/diagnosticTracking.ts).
## Validation
- tests/tools/test_file_operations.py + test_file_operations_edge_cases.py
+ test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py
+ test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py:
228 passed locally.
- Live E2E reproduction of the tips.py corruption incident: broken
content written; lint field surfaces 'SyntaxError: invalid syntax.
Perhaps you forgot a comma? (line 6, column 5)' — the exact error
that would have self-corrected the bug on the next turn.
When the head ends with assistant/tool and the tail starts with assistant,
the summary is inserted as a standalone role="user" message. The body's
verbatim "## Active Task" quote then gets read as fresh user input by
weak/local models (#11475, #14521).
The merge-into-tail path already appends an explicit end-of-summary marker
for this reason. Mirror it on the standalone path so both insertion routes
give the model the same "summary above, not new input" signal.
discover_fallback_ips() filtered out any DoH-resolved IP that also appeared
in the system resolver's answer set, on the assumption that the system IP
was unreachable. When DoH and system DNS agreed (a common case), the
function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on
networks where those seed addresses are not routable, the Telegram fallback
transport had nothing usable to retry against and polling failed.
Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless
of system DNS overlap. The TelegramFallbackTransport already tries the
primary path first via system DNS, then falls through to the IP-rewrite
path on connect failure; including the same IP in both lanes lets a
transient primary failure recover via the explicit IP route instead of
escalating to seed addresses.
Update the two tests that codified the old exclusion to reflect the new,
inclusion-by-default behaviour.
Fixes#14520