fix(kanban): call kanban_block on iteration-budget exhaustion to prevent protocol violation

When a kanban worker subprocess hits the iteration budget, the agent
loop strips tools and asks the model for a summary.  The model cannot
call kanban_block itself at that point, so the process exits rc=0
without calling kanban_complete or kanban_block.  The dispatcher
treats this protocol violation as a fatal error, gives up after one
failure, and strands downstream tasks.
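
The failure mode is easier to see from the dispatcher's side. The sketch below is illustrative only: `Outcome` and `classify_exit` are hypothetical names, not the dispatcher's actual code, and the assumption is that the dispatcher knows which terminal tool call (if any) the worker made before exiting.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    COMPLETED = "completed"
    BLOCKED = "blocked"
    CRASHED = "crashed"
    PROTOCOL_VIOLATION = "protocol_violation"


def classify_exit(returncode: int, last_transition: Optional[str]) -> Outcome:
    """Hypothetical: map a worker's exit code and last task transition to an outcome."""
    if last_transition == "kanban_complete":
        return Outcome.COMPLETED
    if last_transition == "kanban_block":
        return Outcome.BLOCKED
    if returncode != 0:
        return Outcome.CRASHED
    # Clean exit (rc=0) with no terminal transition: the case this commit avoids.
    return Outcome.PROTOCOL_VIOLATION
```

Under these assumptions, a worker that exhausts its budget without blocking falls through to the last branch, while one that blocks first is classified as `BLOCKED`.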

Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK
and call kanban_block with a reason describing the exhaustion.  The
dispatcher then sees a clean block transition instead of a protocol
violation, and the task can be retried or escalated by a human.
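
The pattern can be sketched in isolation as follows. This is a minimal sketch, not the code in this commit: `block_on_exhaustion` and its `call_tool` and `env` parameters are hypothetical stand-ins (the real change calls `handle_function_call` inline), introduced here so the behavior can be exercised without the agent.

```python
import logging
import os
from typing import Any, Callable, Dict, Mapping, Optional

logger = logging.getLogger(__name__)


def block_on_exhaustion(
    call_tool: Callable[[str, Dict[str, Any]], Any],
    api_call_count: int,
    max_iterations: int,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: block the kanban task named in the environment.

    Returns True if kanban_block was called successfully, False otherwise.
    """
    env = os.environ if env is None else env
    task = env.get("HERMES_KANBAN_TASK")
    if not task:
        # Not running as a kanban worker: nothing to block.
        return False
    reason = (
        f"Iteration budget exhausted ({api_call_count}/{max_iterations}); "
        "task could not complete within the allowed iterations"
    )
    try:
        call_tool("kanban_block", {"task_id": task, "reason": reason})
    except Exception:
        # Best effort: a failed block must not mask the summary response.
        logger.warning(
            "Failed to call kanban_block for task %s", task, exc_info=True
        )
        return False
    return True
```

Swallowing the exception is a deliberate choice in this sketch: the worker has already produced its summary, so a failure to block is logged rather than raised.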

Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget
exhaustion without calling kanban_complete or kanban_block #23216
liuhao1024 2026-05-10 23:39:07 +08:00 committed by kshitij
parent f6d4f3c37d
commit 2b3bf17dfa
2 changed files with 117 additions and 1 deletions

@@ -14987,7 +14987,41 @@ class AIAgent:
"— requesting summary..."
)
final_response = self._handle_max_iterations(messages, api_call_count)
# If running as a kanban worker, block the task so the dispatcher
# knows the worker could not complete (rather than treating it as a
# protocol violation). The agent loop strips tools before calling
# _handle_max_iterations, so the model cannot call kanban_block
# itself — we must do it on its behalf.
_kanban_task = os.environ.get("HERMES_KANBAN_TASK")
if _kanban_task:
    try:
        handle_function_call(
            "kanban_block",
            {
                "task_id": _kanban_task,
                "reason": (
                    f"Iteration budget exhausted "
                    f"({api_call_count}/{self.max_iterations}) — "
                    "task could not complete within the allowed "
                    "iterations"
                ),
            },
            task_id=effective_task_id,
        )
        logger.info(
            "kanban_block called for task %s after iteration "
            "exhaustion (%d/%d)",
            _kanban_task, api_call_count, self.max_iterations,
        )
    except Exception:
        logger.warning(
            "Failed to call kanban_block after iteration "
            "exhaustion for task %s",
            _kanban_task,
            exc_info=True,
        )
# Determine if conversation completed successfully
completed = final_response is not None and api_call_count < self.max_iterations