mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
fix(kanban): dispatcher skips ready tasks whose assignee is not a real profile
The kanban dispatcher's `_default_spawn` invokes
``hermes -p <task.assignee> chat -q ...``. When ``assignee``
names a control-plane lane (e.g. an interactive Claude Code
terminal like ``orion-cc`` / ``orion-research``) instead of a
real Hermes profile, the subprocess fails on startup with
"Profile 'X' does not exist", gets reaped as a zombie, the
TTL/crash detector marks the task back to ``ready``, and the
next tick re-spawns the same crashing worker. Result: a
permanent crash loop emitting ``spawned=2 crashed=2 every tick``
in the gateway log and burning CPU forever.
Reproduce on a fresh Hermes-agent install:
# 1. Create a kanban task whose assignee names a non-profile.
hermes kanban create --assignee orion-cc --status ready \
--title "Review PR #N" --body "..."
# 2. Start the gateway with the embedded dispatcher.
hermes gateway run
# gateway.log lines every minute:
# kanban dispatcher: tick spawned=1 reclaimed=0 crashed=1 ...
# 3. ps -ef | grep '[h]ermes.*defunct' shows zombies.
Fix
---
``dispatch_once()`` now pre-checks ``hermes_cli.profiles.
profile_exists(assignee)`` before claiming. If False, the row
is added to ``skipped_unassigned`` (it's effectively
"unassigned-to-an-executable-profile") and the dispatcher
moves on without claiming, spawning, or counting a crash.
The check is opt-in safe: if the import fails (e.g. test
isolation, profile module restructured), ``profile_exists``
falls back to ``None`` and the original behaviour is preserved
unchanged.
This addresses the explicit hint in the kanban task body
(``t_2bab06e3``):
"Should ready-state tasks auto-spawn at all, or only on
explicit orion-cc claim? If spurious, gate the auto-spawn
behind a config flag (e.g. only assignee=hermes or
assignee=auto)."
Profile-existence is a tighter gate than a config flag — it
self-documents (the user already knows whether they have an
``orion-cc`` profile), and it doesn't require Mac to maintain
an allowlist as new lane names appear. New lanes that ARE
real profiles (created via ``hermes profile create``) auto-
qualify the moment the profile dir is created.
Validated live
--------------
On Orion's hermes-agent install, two ``orion-research``-
assigned tasks (Bug A and Bug C investigations) had been
crash-looping since 2026-05-05 06:58 local. After applying
the patch + restarting the gateway:
- Stale ``running`` claims released to ``ready`` cleanly.
- New gateway emitted ``kanban dispatcher: embedded`` and
has ticked silently for 2+ minutes — no spawned=,
crashed=, or stuck= log lines (all spawn skips are quiet).
- Tasks remain ``ready`` with ``claim_lock=None``,
``worker_pid=None``, ``spawn_failures=0``.
- Dashboard + telegram + freqtrade unaffected.
Confidence: high (live verified on Orion).
Scope-risk: narrow (additive guard inside one function).
Not-tested: behaviour when a profile is renamed mid-tick —
current code re-imports ``profile_exists`` per row so a
freshly created profile auto-qualifies on the next tick.
Machine: orion-terminal
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
91ce8fc000
commit
ca5595fe7b
1 changed files with 17 additions and 0 deletions
|
|
@ -2506,6 +2506,23 @@ def dispatch_once(
|
|||
if not row["assignee"]:
|
||||
result.skipped_unassigned.append(row["id"])
|
||||
continue
|
||||
# Skip ready tasks whose assignee is not a real Hermes profile.
|
||||
# `_default_spawn` invokes ``hermes -p <assignee>`` which fails
|
||||
# with "Profile 'X' does not exist" when the assignee names a
|
||||
# control-plane lane (e.g. an interactive Claude Code terminal
|
||||
# like ``orion-cc`` / ``orion-research``) rather than a Hermes
|
||||
# profile. Those task lanes are pulled by terminals via
|
||||
# ``claim_task`` directly and should NEVER auto-spawn — the
|
||||
# subprocess would crash on startup, get reaped as a zombie,
|
||||
# the task would loop back to ``ready`` on next tick, and we'd
|
||||
# burn CPU forever (#kanban-dispatcher-crash-loop 2026-05-05).
|
||||
try:
|
||||
from hermes_cli.profiles import profile_exists # local import: avoids cycle
|
||||
except Exception:
|
||||
profile_exists = None # type: ignore[assignment]
|
||||
if profile_exists is not None and not profile_exists(row["assignee"]):
|
||||
result.skipped_unassigned.append(row["id"])
|
||||
continue
|
||||
if dry_run:
|
||||
result.spawned.append((row["id"], row["assignee"], ""))
|
||||
continue
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue