When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a ' 💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:
1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
on a bg thread while the CLI's PromptSession was live. Direct
print_formatted_text races with the input-area redraw and the line
can land behind/above the prompt, scrolled off without the user
seeing it.
2. The message said only '💾 Skill created.' / '💾 Memory updated'
with no indication that the self-improvement loop was the one doing
this. Users who did catch the line couldn't tell the background
review from some other agent action.
Fixes:
- `_cprint` now detects when it's called from a non-app thread with a
running prompt_toolkit Application, and routes through
`run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
input, prints the line above the prompt, and redraws — the normal
prompt_toolkit contract for bg-thread output. Direct-print fallback
preserved for the no-app / same-thread / import-error paths. Affects
every bg-thread emission, not just the review summary (curator
summaries and auxiliary failure prints benefit too).
- The summary now reads ' 💾 Self-improvement review: <summary>' in
both the CLI and the gateway `background_review_callback` path, so
the origin is unambiguous.
Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers all five routing
branches (no app, app-not-running, cross-thread schedule, same-thread
direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
attributed prefix shows up in both `_safe_print` and
`background_review_callback`.
Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor
workers didn't inherit the CLI's TLS approval callback). Two vectors
remained that could still land in the deadlocking input() fallback:
1. _spawn_background_review spawns a raw threading.Thread with no
approval callback installed, so any dangerous-command guard the
review agent trips falls back to input() -> deadlock against the
parent's prompt_toolkit TUI (same class as delegate_task subagents,
fixed in 023b1bff1 / #15491). Install a _bg_review_auto_deny
callback at thread start, clear on finally.
2. prompt_dangerous_approval's fallback unconditionally spawned a
daemon thread calling input() when approval_callback was None.
That fallback can never succeed under prompt_toolkit because the
user's Enter goes to pt's raw-mode stdin capture. Detect an active
pt Application via get_app_or_none() and fail closed (deny + log)
instead, so future threads that forget to install a callback
degrade gracefully instead of hanging 60s invisibly.
Regression guards:
- tests/run_agent/test_background_review.py verifies the review
worker thread sees a callable auto-deny callback mid-run and that
the slot is cleared in the finally block.
- tests/tools/test_approval.py TestFailClosedUnderPromptToolkit
verifies prompt_dangerous_approval returns 'deny' fast under a
mocked pt Application, and that a real callback still wins over
the guard.
Background review fork now inherits session_id, credential_pool, and
status_callback from the parent (added in #16099 after this PR was
written). Extend the bare-agent helper so the regression test keeps
reaching the cleanup assertions instead of failing in the runtime
resolver.
Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>
Temporary background review agents can initialize Hindsight-backed memory clients, but close() alone skips provider teardown. Shut the memory provider down before closing so aiohttp sessions do not leak at process exit.
Made-with: Cursor