Commit graph

321 commits

Author SHA1 Message Date
brooklyn!
35e9c63d89
Merge pull request #52008 from infinitycrew39/fix/desktop-nous-onboarding-stale-provider
fix(desktop): stop Nous Portal onboarding from validating stale Anthropic config
2026-06-24 13:12:44 -05:00
infinitycrew39
6da615c77c fix(desktop): scope onboarding runtime check to connected provider
Let setup.runtime_check accept an optional provider, persist the selected
provider/model before the gate, and validate the provider the user just
connected instead of a stale config entry such as anthropic.
2026-06-24 23:19:45 +07:00
kyssta-exe
b85c460540 fix(tui): targeted save_config_value for model persistence (#48305)
The TUI model-switch persistence (_persist_model_switch) rewrote the entire
model config block via save_config(), destroying sibling keys the user set
under model: (model_slots, model_fallback, base_url, ...) on every switch.

Use targeted, atomic, comment-preserving save_config_value("model.default" /
"model.provider" / "model.base_url") writes instead, so a model switch only
touches the keys it changes.

Salvaged from #48391 by kyssta-exe (authorship preserved).

Fixes #48305
2026-06-24 19:34:33 +05:30
Teknium
d539cd9004
fix(config): write config.yaml as UTF-8 to stop emoji/personality corruption (#51676)
atomic_yaml_write (and two sibling config writers) called yaml.dump
without allow_unicode=True. The default personalities shipped in cli.py
contain emoji/kaomoji, so PyYAML escaped astral-plane chars as 8-digit
\\UXXXXXXXX sequences inside multi-line double-quoted strings wrapped
with \\ line-continuations. Stricter/non-PyYAML parsers, editors, and
hand-edits break that structure into unclosed quotes, failing the whole
config parse -> silent fallback to defaults -> custom_providers lost.

Add allow_unicode=True to the canonical writer plus tui_gateway/server.py
and the telegram adapter's atomic config write so config is written as
readable UTF-8 with no escape/fold artifacts.

Fixes #51356
2026-06-23 23:28:21 -07:00
Brooklyn Nicholson
e495b33bf1 Merge remote-tracking branch 'origin/main' into bb/pets-merge
# Conflicts:
#	hermes_cli/commands.py
#	tui_gateway/server.py
2026-06-23 19:05:22 -05:00
Teknium
e32ebc6aa2
feat(skills): /learn — distill a reusable skill from anything you describe (#51506)
Open-ended skill learning across every surface. /learn <free text> takes a
description of any source — a directory, a URL, the workflow you just walked
the agent through, or pasted notes — and the live agent gathers it with the
tools it already has (read_file/search_files, web_extract, the conversation,
the pasted text), then authors a SKILL.md via skill_manage following the
house authoring standards (<=60-char description, the standard section order,
Hermes-tool framing, no invented commands).

No engine, no model-tool footprint, works on any terminal backend (local,
Docker, remote): /learn builds a standards-guided prompt and hands it to the
agent as a normal turn.

- agent/learn_prompt.py: shared standards-guided prompt builder
- /learn registry entry (both surfaces) + CLI handler (inject onto input
  queue) + gateway handler (rewrite turn, fall through, /blueprint pattern)
- tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat
- dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text)
  composes a /learn request and runs it in chat
- docs (slash-commands ref + skills feature page), 11 targeted tests

Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234
(dir-distillation engine); reworked to be open-ended and engine-free per
review.
2026-06-23 13:51:28 -07:00
konsisumer
02050859f3 fix(tui): preserve live session identity across compression (#49041)
When a session rotates id on compression, _sync_session_key_after_compress()
re-anchored the session_key, approval-notify routing, yolo state, and slash
worker — but never moved the active-session lease, which stayed keyed to the
pre-compression id. And _find_live_session_by_key() matched live sessions on
the stale session_key, not the live agent's current agent.session_id. After
compression a resume/create path failed to recognize the existing live agent
and could build a SECOND live agent against the same DB continuation -> forked
lineage / cross-session message mixing.

- active_sessions.transfer_active_session(): move a lease in place to the new
  id under the exclusive file lock (no slot drop).
- gateway _transfer_active_session_slot(): call it inside
  _sync_session_key_after_compress(); on the rare fallback (entry pruned)
  RESERVE the new slot before releasing the old lease (reserve-before-release),
  so a concurrent gateway at the session cap cannot grab the freed slot in a
  release-then-reacquire window and leave this session with no lease; if the
  reserve fails, keep the existing lease (review fix).
- _session_lookup_key(): make live-session lookup authoritative on
  agent.session_id, wired into all stale-session_key consumers
  (_find_live_session_by_key, _session_live_item, _live_session_payload) —
  fixes the whole lookup class.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 00:54:18 +05:30
Teknium
72bfc48e63
feat(tui): track background subagents in the status bar (#51485)
Parity with the classic CLI status bar's ⛓ indicator (PR #51441). The
Ink TUI status bar now shows ⛓ N for live background/async subagents
(delegate_task batches + background single delegations).

- tui_gateway/server.py: _get_usage() embeds active_subagents from
  tools.async_delegation.active_count() — the same registry the CLI
  reads — onto the existing per-update usage payload, guarded so a
  raising active_count() leaves the field off without breaking usage.
- ui-tui appChrome: new 'subagents' status segment (breakpoint w>=92,
  slots between bg and cost in the shed-order), renders ⛓ N from
  usage.active_subagents.
- Usage / SessionUsageResponse types gain active_subagents?.

Distinct from the turn-scoped SpawnHud / /agents overlay, which mirror
live in-turn subagent.* events; this is the persistent registry count.
2026-06-23 11:32:00 -07:00
brooklyn!
211ba9c7d3
feat(agent): one-shot LLM helper + llm.oneshot gateway RPC (#51261)
A "one-shot" is a single stateless model call that runs OUTSIDE any conversation:
it never touches session history, never breaks prompt caching, and returns plain
text. UI surfaces need this for small generative chores — a commit message from a
diff, a rename suggestion, a summary — where an agent turn would pollute the
thread and hand-rolling an LLM call at every call site would be worse.

- `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client
  plumbing (same path as title generation). Two call shapes: explicit
  instructions/input, or a registered `template` + `variables` (templates own the
  prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a
  `commit_message` template. Model selection inherits the live session via
  `main_runtime`, else the configured aux `task` backend.
- `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the
  session's model when `session_id` resolves.

Stateless by construction — no session mutation, cache untouched.
2026-06-23 08:01:50 +00:00
brooklyn!
af7b7f6322
feat(agent): expose coding-context project facts as structured data + project.facts RPC (#51259)
Follow-up to the coding-context posture (#43316): that PR detects each repo's
verify loop (manifests, package manager, exact test/lint/build commands, context
files) and bakes it into the system-prompt snapshot — but only as a string, for
the model. Non-prompt consumers (the desktop verify UI) had no way to read it
without re-sniffing and drifting from the prompt.

Split detection from rendering, keeping one source of truth:

- `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured
  facts; `_project_facts()` now renders it into the same snapshot lines, so the
  prompt block stays byte-identical (cache-safe).
- `project_facts_for(cwd)` resolves the workspace root (git, else marker) and
  returns the structured facts, or None outside a workspace.
- `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP).

Tests assert the structured output and that the UI-facing commands never drift
from what the prompt block renders (one detector feeds both).
2026-06-23 08:00:01 +00:00
kshitijk4poor
c080b2dc3e fix(gateway): redact credentials from TUI approval prompts (#48456)
Follow-up to #50767, which redacted the chat-platform (_approval_notify_sync)
and SSE/API (_approval_notify) approval transports. The TUI JSON-RPC transport
is the third egress and was missed: three register_gateway_notify callbacks in
tui_gateway/server.py emitted the raw approval_data — including the unredacted
command Tirith flagged — straight to the TUI client via _emit.

Route all three registrations through a new module-level _emit_approval_request()
helper that redacts payload['command'] via the shared
gateway.run._redact_approval_command seam before emitting, matching the pattern
used for the other two transports. Completes the whole-bug-class fix for #48456.

Tests: assert the helper emits a redacted command (real credential pattern),
handles missing/None command, and a wiring guard that no registration emits the
raw payload directly (only the helper may). Both mutation-checked.

The #48456 fix series originated from @liuhao1024's #48462 — credit to them for
the original report and chat-platform fix; this completes the remaining transport.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-23 03:14:18 +05:30
Teknium
ff85af3fc7
feat(goals): /goal wait <pid> — park the loop on a background process (#50503)
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process

The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.

/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.

Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.

* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)

The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.

* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait

Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
  {"verdict":"wait","wait_on_pid":N}      → park until that process exits
  {"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.

* test(goals): update kanban _fake_judge to the 4-tuple judge contract

CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.

* feat(goals): trigger-based wait — park on a process's own signal, not just exit

Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.

Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.

Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
2026-06-22 06:27:29 -07:00
Brooklyn Nicholson
5342eccf12 Merge remote-tracking branch 'origin/main' into bb/pets 2026-06-22 05:25:49 -05:00
Shannon Sands
b9b4756ab4 fix dashboard chat session titles 2026-06-21 22:44:02 -07:00
Teknium
95d53c3bcb
feat(cli): /reasoning full — show complete thinking, not 10-line clamp (#50499)
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp

The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.

* test(gateway): de-snapshot /reasoning help assertion

The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.

* feat(tui): /reasoning full|clamp parity in tui_gateway

The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
2026-06-21 20:21:11 -07:00
Teknium
99f3072aa0
fix(model-switch): a failed in-place swap must be a no-op, not a dead session (#50375)
When a /model switch resolves a valid model but the in-place agent swap
fails mid-conversation (expired key, unreachable base_url), the agent
rolls itself back to the old working model+client and re-raises. The
callers caught that re-raise, logged a warning, then committed the broken
switch anyway: wrote the failed model to the session DB, set
_session_model_overrides to the broken model/provider/key, and (gateway
direct path) evicted the working cached agent. The next message then
rebuilt a dead agent from the broken override -> permanently unusable
conversation (#50163).

Fix the whole caller class so a failed swap aborts the commit entirely:

- gateway/slash_commands.py (picker + direct /model paths): on swap
  failure, early-return an error message; skip DB persist, session
  override, cache eviction, and config write.
- cli.py (both /model handlers): snapshot CLI-level credential/runtime
  fields before mutating, restore them on swap failure, and abort the
  note + success print.
- tui_gateway/server.py: wrap the previously-unguarded swap; on failure
  raise a clean error and skip worker restart, runtime persist, switch
  marker, session model_override, and config persist.

The no-cached-agent path (apply-on-next-session) is unaffected.

Adds a gateway regression test that fails on the pre-fix behavior.
2026-06-21 13:33:23 -07:00
teknium1
d0de4601d2 fix(tui): /compress shows a before/after summary (#46686)
The TUI /compress slash side-effect compressed the session, synced the
key, and emitted session.info — but returned an empty string, so the
user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The
CLI (_manual_compress) and gateway (slash_commands) paths both already
call summarize_manual_compression; the TUI slash path was the lone gap.

Snapshot history + rough token estimate before and after compaction and
return the formatted summarize_manual_compression() feedback, mirroring
the session.compress RPC handler. The estimate uses the same
estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC
path, re-reading the system prompt after compaction (it may be rebuilt).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-21 11:36:09 -07:00
bogerman1
c7e8854cb3 fix(tui): persist session messages on force-quit / signal shutdown
Mirror the CLI's exit-path behaviour in the TUI gateway so that
unpersisted conversation messages are flushed to state.db and the
on_session_end plugin hook fires before the session is closed.

Root cause: _finalize_session() only called db.end_session() to
mark the session row as ended, but did NOT flush in-memory messages
via _persist_session() or fire the on_session_end hook.  When the
user force-quit (double Ctrl-C, terminal-close, SIGHUP) while the
agent was mid-turn, messages accumulated since the last persist
point were silently lost.

Changes
-------
tui_gateway/server.py - _finalize_session():
  - Persist unflushed messages via agent._persist_session() before
    db.end_session(). Prefers agent._session_messages (set by the
    last _persist_session call inside run_conversation) over
    session['history'] (stale when agent is mid-turn).
  - Fire on_session_end(interrupted=True) plugin hook so crash-
    recovery plugins can flush buffers, matching cli.py behaviour.

tui_gateway/entry.py - _log_signal():
  - Explicitly call _shutdown_sessions() before sys.exit(0) in the
    SIGHUP/SIGTERM handler as belt-and-suspenders over atexit.

tests/tui_gateway/test_finalize_session_persist.py (new):
  - 11 tests covering: history persistence, _session_messages
    priority, empty-history skip, missing-agent, double-finalize,
    persist-exception resilience, hook firing, hook-exception
    resilience, and db.end_session preservation.

Related
-------
Closes the TUI half of #5021 (CLI already handles this via its
atexit handler).  Also addresses the session-persistence gap
discussed in #18465 and #18269.
2026-06-21 07:26:07 -07:00
kshitijk4poor
1ca29723f0 fix(cli): log instead of swallow preflight-warning errors; consistent TUI warning field
Follow-up to the salvaged preflight-compression warning:
- Replace silent `except Exception: pass` at all 5 guard call sites
  (cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with
  `logger.debug(...)` so signature drift in the guard helper isn't hidden.
- tui_gateway/server.py: set the confirm dict's `warning` field to the
  merged message (was bare expensive-model text) so it matches
  `confirm_message` for any future consumer reading `warning`.
- Add trailing newlines to the two new files.
2026-06-21 16:31:56 +05:30
Tuna Dev
04730f32e7 fix(cli): warn when in-session model switch will preflight-compress
Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard
pattern. When a user switches models mid-session (Herm TUI picker, CLI,
or /model on Telegram/Discord), the warning surfaces on the existing
ModelSwitchResult.warning_message path used by the expensive-model
guard if the new model's compression threshold is below the current
session size.

Partial fix for #23767 — addresses only the 'user-facing guardrail
when switching from a high-context provider to a substantially
lower-context provider' slice. The other proposed fixes from that
issue (hard preflight token guard, metadata cache invalidation on
switch, compression safety invariant, oversized tool-output handling)
are out of scope for this PR.
2026-06-21 16:29:31 +05:30
teknium1
98ecd0beeb docs(mcp): fix stale ~0.75s discovery-wait reference in late-refresh docstring
The MCP discovery wait is now bounded by the config-driven mcp_discovery_timeout
(default 1.5s), not the old 0.75s flat value. Updates the _schedule_mcp_late_refresh
docstring that still cited ~0.75s after #49208 made the bound configurable.
2026-06-20 23:23:47 -07:00
Brooklyn Nicholson
75b36a138f feat(pets): TUI pet pane, picker + gateway RPCs
Add the Ink pet sprite pane, the interactive /pet picker overlay, and live
pet switching/rescale driven by new tui_gateway RPCs (pet state, pet.scale,
per-state frames). Wires pet flash state and the picker into the TUI layout
and slash handler. Covered by the slash-handler test.
2026-06-20 14:18:36 -05:00
Gille
a7983d5ad7
fix(dashboard): hide sidecar sessions from history (#49269)
* fix(dashboard): hide sidecar sessions from history

* test(dashboard): allow sidecar source in session payload
2026-06-19 18:06:38 -04:00
alt-glitch
b6e2a54a94 fix(mcp): address adversarial review round 1 (cache parity, gates, races)
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):

- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
  dropping post-build-injected memory-provider (mem0/honcho/…) and context-
  engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
  the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
  tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
  (tools, valid_tool_names) pair are consistent even under concurrent callers
  (no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
  tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
  with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
  just enabled in config this session is picked up; automatic paths reuse the
  agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
  between-turns refresh, so the startup wait is only a small turn-1 UX bump
  rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
  zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
  (alternating tool sets), added the #5544-gate regression, the memory/context
  preservation regression, and a "callable next turn via valid_tool_names"
  contract; removed a dead monkeypatch line.
2026-06-19 11:57:43 -07:00
alt-glitch
93d6e73028 fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway)
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:

1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
   commonly take 2-6s on a cold connect, so they missed the window and
   their tools never entered the agent's snapshot. `thread.join(timeout)`
   already returns the instant discovery completes, so raising the bound
   costs ~0s for the common case (no MCP / fast servers) and only ever
   blocks for a genuinely-pending server, capped so a dead server can't
   freeze startup. The bound is now configurable via
   `mcp_discovery_timeout` (config.yaml, default 5.0s).

2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
   `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
   thread), and the late-refresh detected changes by tool COUNT — missing
   an equal-size add/remove swap. Consolidated into one shared
   `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
   tool NAME, mutates the agent under a lock (thread-safe), and respects
   the agent's own enabled/disabled toolsets.

The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.

Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
2026-06-19 11:57:43 -07:00
Ben
b0e47a98f9 fix(managed-scope): honor managed scope in all standalone config loaders
The skin bug was one instance of a class: several subsystems build their
config dict directly from config.yaml instead of routing through
hermes_cli.config.load_config (which carries the managed merge), so they
silently ignored administrator-pinned values. Audited every config.yaml
reader and fixed the behavioral-read bypasses:

- gateway/config.py load_gateway_config (messaging gateway: session_reset,
  quick_commands, stt, model, ...)
- gateway/run.py _load_gateway_config (its read_raw_config fast path also
  skipped the merge — read_raw_config returns raw user YAML)
- tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin,
  reasoning_effort, service_tier, provider_routing)
- cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing)
- hermes_logging.py (logging.level/max_size_mb/backup_count)
- hermes_time.py (timezone)
- hermes_cli/doctor.py (memory-provider diagnostic reads effective config)

All route through a new shared managed_scope.apply_managed_overlay() helper
that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't
shadow a managed literal, root-model-string normalization, leaf-merge) and is
fail-open. cli.py's earlier inline fix is refactored onto the same helper.

Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile
distribution) are deliberately left reading raw user YAML — overlaying managed
values there would persist them into the user file. The dashboard
(web_server.py) already routes through load_config and needed no change.

TUI loader caches the RAW config so _save_cfg never writes managed values to
disk. Adds test_managed_scope_overlay.py (helper) and
test_managed_scope_loaders.py (per-surface integration); mutation-checked.
2026-06-19 07:46:33 -07:00
Alex Yates
fad4b40d9d fix(model): persist /model switch by default across sessions
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).

Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):

  /model <name>             switch model (persists to config.yaml)
  /model <name> --session   switch for this session only
  /model <name> --global    switch and persist (explicit, unchanged)

The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
2026-06-19 07:07:06 -07:00
kyssta-exe
1699525638 fix(tui): route pending-input commands via command.dispatch (#48848)
When /goal (and other _PENDING_INPUT_COMMANDS: retry, queue, q, steer,
plan, undo) were typed in the TUI desktop app, slash.exec returned error
4018 instructing the frontend to fall back to command.dispatch. Some
clients failed that client-side fallback, leaving the command empty and
surfacing "empty command" — the user's typed text was silently dropped.

slash.exec now routes pending-input commands to command.dispatch
internally, eliminating the fragile client-side fallback hop. The
response is exactly what command.dispatch would have produced, so the
TUI client behaves identically once the round-trip succeeds.

Salvaged from #48944 — rebased onto current main. The original PR's
source change and test_goal_command.py update are correct, but it missed
the second test surface: tests/tui_gateway/test_protocol.py's
parametrized test_slash_exec_rejects_pending_input_commands still
asserted the old 4018 rejection for retry/queue/q/steer/plan, turning CI
red (5 failures). That test is rewritten here as a behavior contract:
slash.exec for a pending-input command must yield the same payload as a
direct command.dispatch call, and must no longer emit the old
"pending-input command" fallback rejection.

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-19 14:53:33 +05:30
Teknium
620fd59b8e
feat(model-picker): add Refresh Models control to bust stale model cache (#48691)
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.

- Thread refresh through model.options (RPC + REST /api/model/options) ->
  build_models_payload -> list_authenticated_providers, which calls
  clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
  sync icon). Normal opens leave refresh=false to stay snappy on the cache.

Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
2026-06-18 21:37:41 -07:00
Brooklyn Nicholson
49596b70cb fix(gateway): resume follows the compression tip so post-compression replies render
Auto-compression ends the live session and forks a continuation child
(linked via parent_session_id). A long-lived parent keeps its own flushed
message rows, so resolve_resume_session_id()'s empty-head walk never
redirected it — resuming the parent id reloaded the pre-compression
transcript and dropped every turn generated after compression, including
the assistant's response. On the desktop this is the recurring "I sent a
message, came back, and the reply isn't there" report on large sessions:
the chat's routed id is the pre-rotation id, and both the gateway
session.resume RPC and the REST /messages read anchored on it.

Fix the resolver at the chokepoint: resolve_resume_session_id() now
follows the compression-continuation chain forward via get_compression_tip()
before its existing empty-head descendant walk. get_compression_tip() only
follows children whose parent ended with end_reason='compression' (created
after the parent was ended), so delegation/branch children never hijack a
resume. This fixes every resume caller at once (REST /messages, CLI
--resume, gateway /resume).

session.resume in tui_gateway was the one resume path that never called the
resolver — it used the raw target id directly. Route it through
resolve_resume_session_id() too (non-lazy only; lazy watch windows must
stay on their exact child branch). Resolving up front also re-anchors the
live-session fast path so a still-live rotated session is reused by its new
key instead of rebuilding a duplicate agent on the stale parent.

Tests:
- resolve_resume_session_id follows the tip even when the parent retains
  messages, and is not confused by a delegation child.
- session.resume binds the agent to the continuation tip and returns the
  post-compression reply.
2026-06-18 15:56:43 -05:00
islam666
9705e7944a fix(picker): remove max_models=50 cap in interactive model pickers
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.

Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
  * web_server.py: get_model_options (Desktop /api/model/options)
  * web_server.py: get_recommended_default_model
  * model_switch.py: prewarm_picker_cache_async
  * tui_gateway/server.py: model.options JSON-RPC
  * cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
  still passes max_models=50 explicitly — unchanged behavior.

The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.

Fixes #48279
2026-06-18 13:47:31 -07:00
Siddharth Balyan
73cd8622f9
feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449)
* feat(billing): nous_billing http client + BillingState core (phase 2b)

Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
  (state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
  BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
  fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
  parsing (server emits decimal strings, not 2dp), fail-open builder,
  idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
  matrix, fail-open, amount validation).

Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md

* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)

- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
  substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
  billing:manage, reusing the held credential's portal/inference URLs + client_id
  (so a preview stays a preview), persists like _login_nous but WITHOUT the model
  picker. Returns True iff the minted token carries the scope (False when NAS
  silently downscopes a non-admin / unticked grant).

Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.

* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)

billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.

* feat(billing): /billing CLI handler + command registry (phase 2b)

- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
  _SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
  parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
  poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
  _prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
  deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
  poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
  403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.

* feat(billing): /billing Ink TUI screens + tests (phase 2b)

- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
  screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
  5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
  ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
  (D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
  arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
  portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
  settled/no_payment_method/step-up/limit/auto-reload/validation).

TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.

* docs(billing): scrub private cross-repo references

NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
  the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
  generic 'the server' / 'a preview deployment'

* docs(billing): scrub final NAS reference in step-up docstring

* docs(billing): drop dangling plan-doc refs

The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.

* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes

Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.

- TUI: full /billing overlay state machine (overview to buy to confirm,
  auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
  via the TUI's own opener (the device flow runs in the headless gateway, so a
  printed URL was being dropped); run the step-up handler off the main loop and
  emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
  /state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
  (the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
  instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
  no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
  re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.

Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.

* docs(billing): add /billing TUI screenshots for PR

* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test

The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.

The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
2026-06-19 01:53:32 +05:30
Brooklyn Nicholson
51ee5b2c94 fix(desktop,tui): surface self-improvement review summary + honor memory_notifications
The "💾 Self-improvement review" summary (skill/memory updated) was invisible
on two surfaces:

- Desktop Electron app had no review.summary event handler — skill/memory
  writes happened silently. Now appends a persistent system message to the
  transcript (matching the Ink TUI's persistent-line semantics, not a
  transient toast that can be missed).
- tui_gateway (backs both 'hermes --tui' and the desktop) never read
  display.memory_notifications, so it always behaved as 'on' and ignored a
  user who set 'off'/'verbose'. Added _load_memory_notifications() (mirrors
  the messaging gateway's bool->str normalization, defaults to 'on') and
  wired it to agent.memory_notifications, matching gateway/run.py and the CLI.

Delivery chain now reaches all surfaces:
background_review.py -> background_review_callback -> review.summary event ->
desktop transcript / Ink TUI line / gateway message / CLI print.
2026-06-18 13:22:12 -05:00
Teknium
0fa7d6f660
fix(desktop): never persist or restore a named custom provider as bare "custom" (#48547)
* Port from cline/cline#11514: encourage parallel tool calls

Add a universal system-prompt guidance block telling the model to batch
independent tool calls (reads, searches, web fetches, read-only commands)
into a single assistant turn instead of one call per turn. The runtime
already executes independent batches concurrently (read-only tools always;
non-overlapping path-scoped file ops); the open-source system prompt had
nothing steering the model to PRODUCE the batch. Fewer round-trips means
less resent context, which compounds over a long conversation.

- prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static,
  cache-amortised) modeled on TASK_COMPLETION_GUIDANCE.
- system_prompt.py: inject right after the task-completion block, gated by
  agent.valid_tool_names + the new toggle.
- agent_init.py: read agent.parallel_tool_call_guidance (default True).
- config.py: add the default under the agent section.
- test_prompt_builder.py: behavior-contract tests (batching steer, dependent
  carve-out, length bound) — invariants, not wording snapshots.

Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's
Python prompt-assembly architecture and config-over-env conventions.

* fix(desktop): never persist or restore a named custom provider as bare "custom"

Custom providers vanish from the Desktop/TUI model picker with
"No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578)
and repeatedly regressed (#44022, #47714) because every fix only recovered
the entry identity from a persisted base_url. When a session is
persisted/restored with the resolved provider "custom" and NO base_url, bare
"custom" leaked through verbatim; resolve_runtime_provider("custom") routes to
the OpenRouter default URL with no api_key, so the next turn/resume dies.

Bare "custom" is the resolved billing class shared by every named providers:/
custom_providers: entry — it is not a routable identity. Centralize the
"never let bare custom escape" invariant in one helper,
runtime_provider.canonical_custom_identity(), and apply it at all four leak
sites in tui_gateway/server.py:

- _ensure_session_db_row  — the ORIGIN: first DB write seeds the bad row
- _runtime_model_config   — live persist
- _stored_session_runtime_overrides — resume restore (heals old rows; drops
  unrecoverable bare custom so resume falls back to config default)
- _make_agent             — rebuild / per-turn

The helper recovers custom:<name> from the endpoint URL when present, else
from config.model.provider (the durable identity left when no base_url
survived). Regression tests in test_custom_provider_session_persistence.py
lock the no-base_url vector at every site so it cannot regress again.
2026-06-18 11:11:51 -07:00
Teknium
2f7c4858a7
fix(tui): refresh tool snapshot when MCP discovery lands after agent build (#48403)
The TUI banner reported fewer tools than the classic CLI for the same
config (e.g. 32 vs 38) when an MCP server connected slowly. Root cause:
the agent snapshots `agent.tools` once at build time and never re-reads
the registry. `_make_agent` briefly joins the background MCP discovery
thread (`wait_for_mcp_discovery`, ~0.75s) so fast servers land in that
snapshot, but a server slower than the bound — common for an HTTP MCP
server on first connect — lands *after* the agent is built. Its tools are
then absent from both the agent (uncallable until `/reload-mcp`) and the
banner for the whole session.

The classic CLI doesn't hit this because it re-derives
`get_tool_definitions()` at banner render time (which re-waits for
discovery), so it picks the late tools up.

Fix: after a fresh agent is built and its first `session.info` emitted,
if discovery is still in flight, schedule an off-critical-path daemon that
waits for it to finish, then rebuilds the tool snapshot and re-emits
`session.info` — the same rebuild `/reload-mcp` performs, but automatic.
Both the agent's callable tools and the banner count catch up.

Cache safety: the rebuild runs only while the session is still
pre-first-turn (`_user_turn_count`/`_api_call_count` both 0 → nothing
cached to invalidate). Once the user has sent a message we leave the
snapshot frozen rather than break the cached prompt prefix mid-conversation;
late tools then require an explicit `/reload-mcp` (user-consented), exactly
as today. No-op when discovery finished before the agent build, when the
join times out, when the registry was unchanged, or when the session was
swapped/closed while waiting.

Adds entry.mcp_discovery_in_flight() / join_mcp_discovery() accessors and
covers the matrix (added/none/post-turn/timeout/unchanged/replaced) with
unit tests.
2026-06-18 05:41:23 -07:00
Austin Pickett
5a00bd1518
fix(desktop): persist /title set before the first message instead of queuing (#47987)
A /title typed before any message in a fresh desktop chat could be silently
lost: the session DB row is deferred to the first prompt, so session.title
found no row, only stashed pending_title, and returned pending:true. It then
relied on a post-turn apply block to write the title. When that turn never
landed under the same session_key (or the apply path didn't fire), the title
was dropped and the sidebar fell back to the first-message preview — e.g.
"/title my-custom-name" then "hello" left the session titled "hello".

Mirror the messaging gateway's _handle_title_command: an explicit /title is
clear user intent, not an abandoned draft, so create the row up front
(_ensure_session_db_row) and set the title immediately via the profile-aware
_session_db handle, returning pending:false. This also fixes the frontend
symptom for free — the desktop handler's immediate refreshSessions() now pulls
the correct persisted title instead of clobbering the optimistic value with a
still-NULL row.

If row creation can't take (DB unavailable / racing writer), fall back to the
existing pending_title queue so the post-turn apply block remains a recovery
path. The sidebar's min-messages filter keeps a titled 0-message row hidden, so
a /title'd-but-never-used draft still doesn't clutter the list.

Updates the test that asserted the old queue-on-missing-row behavior and adds a
fallback-to-queue regression test.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-17 16:46:21 -04:00
brooklyn!
44e5848e74
feat(desktop): stream subagent activity into watch windows (#47060)
* feat(desktop): stream subagent replies into watch windows

A desktop watch window resumes a child session lazily (no full agent) and
mirrors the parent-relayed `subagent.*` events into native child-session
stream events. The child's streamed reply text was never relayed, so the
window sat blank while the subagent "talked".

- delegate_tool: forward the child's `run_conversation` stream tokens up the
  progress relay as `subagent.text` (inert under CLI/TUI — their progress
  handlers ignore non-tool event types; only a gateway watch window mirrors it).
- server: mirror `subagent.text` -> `message.delta` on the child sid only, and
  skip the parent emit (per-token frames are meaningless on the parent session,
  which shows the child via the spawn tree). Demote `subagent.start` to a
  one-time goal header and drop the noisy `subagent.progress` mirror — tools
  already mirror natively.
- server: guard `_start_agent_build` so a lazy watch session spectating an
  in-flight child stays lazy; incidental RPCs were upgrading it to a full
  agent mid-stream and silently killing the mirror.

* fix(desktop): keep watch-window chat clear of titlebar chrome

Secondary windows (new-session scratch, subagent watch, cmd-click pop-out)
hide the titlebar tool cluster + session header, so the transcript ran to the
window's top edge and streamed text slid up under the OS traffic lights.

- Gate the hidden chrome on `isSecondaryWindow()` everywhere (app-shell,
  chat header, thread list) instead of the narrower new-session flag.
- Add a fixed opaque drag-strip at the top of the secondary-window transcript:
  content padding alone scrolls away with the text, so the strip masks
  anything behind it and keeps the window draggable like the main header.

* fix: WSL subagent window

* fix: subagent window top padding

---------

Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-16 14:30:11 -04:00
Brooklyn Nicholson
7d938cc5c9 fix(desktop): keep live model switch metadata truthful
A live config.set model switch already moved the next API call to the new model,
but the conversation could still restore an old sessions.system_prompt snapshot
whose Model/Provider lines named the previous runtime. That made "what model are
you?" answer from stale metadata even while inference ran on the new model.

After a live switch we now refresh the stored system prompt and append a real
system-history pivot (not a fake user turn) so the transcript itself records the
new model/provider. Restore also rejects already-stale prompt snapshots when
their Model/Provider lines disagree with the runtime, so existing bad sessions
self-heal.
2026-06-16 09:50:17 -05:00
Brooklyn Nicholson
cb6b4127e7 refactor(desktop): make composer model picker sticky session state
The picker no longer touches the profile default. Model/effort/fast live as
plain UI state persisted in localStorage, so a pick follows across Cmd+N and
restarts instead of snapping back. New chats ship that state through
session.create as per-session overrides; live chats still scope switches to the
current session. Settings -> Model remains the only surface that writes the
profile default.

The gateway now accepts those session.create overrides, builds the agent with
them directly, reflects them in the immediate session.info payload, and writes
the chat's own model_config into the lazy DB row so reconnect/resume restores
that chat instead of the global default.
2026-06-16 09:50:07 -05:00
Teknium
c66ecf0bc3
feat(delegation): async background subagents via delegate_task(background=true) (#40946)
* feat(delegation): async background subagents via delegate_task(background=true)

delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.

- tools/async_delegation.py: daemon-executor registry, capacity cap,
  rich self-contained completion event pushed onto the shared
  process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
  batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
  task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
  results into the originating session (idle + post-turn), session_key
  routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).

Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.

* test(delegation): make async non-blocking tests environment-independent

CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.

Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.

* fix(delegation): harden async background delegation

Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
  otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
  evicts (release_clients), and session close (/new) kill/close the
  detached subagent mid-run, defeating the point of background mode.
  Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
  concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
  fallthrough keyed them all as ("", type), suppressing every completion
  after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
  lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
  unused _ASYNC_DEFAULT import.

Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.

* fix(delegation): harden async background completion drains
2026-06-15 13:33:12 -07:00
Austin Pickett
ed20f5ed06
fix(desktop): let explicit model switches escape broken config providers (#42241) (#46796)
When a desktop/dashboard session had no agent built yet and the user explicitly
picked a provider in the model picker, config.set('model', ...) would first try
to initialize the agent from the (possibly broken) config default provider —
failing before the user's explicit switch could take effect, trapping them on a
misconfigured default.

config.set now pre-parses the model flags: if an explicit --provider is present
and no agent exists yet, it skips the default-provider agent build and routes
straight through _apply_model_switch with the explicit provider. _apply_model_switch
gained a parsed_flags passthrough (avoids double-parsing) and only falls back to
resolve_runtime_provider(requested=None) when no explicit provider was given.

The desktop hook now sends config.set instead of slash.exec for active-session
model changes, so errors from the selected provider surface to the user instead
of being swallowed.

Co-authored-by: rodboev <rod.boev@gmail.com>
2026-06-15 15:36:51 -04:00
Teknium
9f33d673e9 fix(tui): persist resumed profile cwd updates to profile db 2026-06-14 02:15:33 -07:00
dsad
d842155da1 Keep resumed profile cwd scoped to profile DB 2026-06-14 02:15:33 -07:00
Brooklyn Nicholson
715b691723 fix(desktop): show summarizing indicator during auto-compaction
Auto-compression rewrites history mid-turn, which made long threads look
like they reset. Re-tag the gateway lifecycle status as compacting and
surface it in the desktop thread loading indicators.
2026-06-14 02:28:07 -05:00
Adalsteinn Helgason
2667601c05 fix(tui): keep reasoning-only assistant turns visible on session resume
A thinking-only assistant turn (reasoning present, empty visible text) is
persisted with its reasoning fields and stays recallable from the transcript,
but `_history_to_messages` dropped it as "empty" before its reasoning was
attached. On desktop/TUI resume or reload the turn therefore vanished from the
session view while the agent could still recall it from a fresh session --
exactly the "messages disappear when the LLM uses its thinking block, but a new
session can recall them" symptom reported on #44022.

Keep an assistant turn when it carries reasoning, even with empty text, so the
desktop "Thinking…" disclosure has something to render. Genuinely empty turns
(no text, no reasoning, no tool calls) are still filtered out.

Refs #44022

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 05:51:05 -07:00
Adalsteinn Helgason
643dc82793 Fix custom provider identity loss in session persistence
_runtime_model_config persisted the live agent's RESOLVED provider into
the session row's model_config JSON. For any named providers:/
custom_providers: entry, agent.provider is the literal string "custom",
so the entry name was lost (and the api_key is deliberately never
persisted). On session.resume or _reset_session_agent the stored
provider="custom" fed resolve_runtime_provider(requested="custom"),
which cannot match a named entry — the rebuild either raised "No LLM
provider configured" or silently resolved placeholder credentials
against the patched-back base_url.

Persist the REQUESTED/entry identity instead: a new reverse lookup
find_custom_provider_identity(base_url) maps the endpoint URL back to
the canonical custom:<name> menu key. _runtime_model_config stores that
key; _make_agent performs the same recovery for rows persisted before
the fix, falling back to passing the stored base_url as
explicit_base_url so the direct-alias branch still targets the
session's endpoint when no entry matches.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 05:51:05 -07:00
Haozhe Zhang
e256f4aae4 fix(gateway): don't restore a bare billing provider as the resumed session's provider
`_stored_session_runtime_overrides` restored the session provider from
`billing_provider` when `model_config` had no explicit provider. For a
`custom:<name>` endpoint that only ran normal turns (no `/model` switch), the
persisted `billing_provider` is the bare billing bucket `"custom"`, which
`agent_init` treats as non-routable, so `session.resume` failed with
"No LLM provider configured" even though new chats and CLI `--resume` work.

Only restore an explicit `model_config.provider`; skip a bare billing bucket
(`auto`/`openrouter`/`custom`) so resume falls back to the configured default,
matching the CLI path.

Fixes #44022
2026-06-13 05:51:05 -07:00
xxxigm
b6c7ebf028
fix(tui): honor provider_routing config in the desktop/TUI backend (#44953)
* fix(tui): honor provider_routing config in the desktop/TUI backend

The messaging gateway and classic CLI both read `provider_routing` from
config.yaml and pass the OpenRouter routing prefs (only / ignore / order /
sort / require_parameters / data_collection) into the agent. The tui_gateway
backend that powers the desktop app and TUI never did, so it built agents
with every routing pref left at its default — OpenRouter then selected
providers freely (effectively at random), ignoring the user's config.

Load `provider_routing` in `_make_agent` and forward the same six prefs the
gateway does, restoring parity across CLI / gateway / desktop. Background
subagent kwargs already propagate these from the parent agent, so they now
inherit correctly too.

* test(tui): cover provider_routing forwarding in _make_agent

Asserts the six OpenRouter routing prefs flow from config.yaml into AIAgent,
and that an absent provider_routing section forwards None/False (unchanged
behavior for users who never configured routing).
2026-06-13 02:58:15 -05:00
teknium1
6b4073648e fix(tui): config.yaml wins over env model seed in per-turn sync
Hosted instances set HERMES_INFERENCE_MODEL as a provision-time seed in
the container env. _config_model_target() previously went through
_resolve_model() (env-first), so on hosted VPS the sync target stayed
pinned to the seed and dashboard model changes never reached an open
chat -- the exact scenario the sync exists to fix. The sync target now
reads config.yaml first and only falls back to the env vars when config
has no model. Startup resolution (_resolve_model) is unchanged.
2026-06-12 11:03:44 -07:00
IAvecilla
bc3f4ed70f Skip redundant model switch 2026-06-12 11:03:44 -07:00