Commit graph

6246 commits

Author SHA1 Message Date
Harjoth Khara
233ef98afe
fix(docker): skip symlinked stage2 chown targets (#52789)
Prevents stage2-hook.sh recursive chown from following a symlinked $HERMES_HOME/home (or profiles/cron) and destroying the host user's home directory. Also guards top-level state-file chowns and refuses first-boot seeding through symlinks. Fixes #52781.

Co-authored-by: harjoth <harjoth.khara@gmail.com>
2026-06-26 12:09:52 +10:00
DavidMetcalfe
865a09a610 fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice
Two-part fix:

Part 1 (classifier override at agent/error_classifier.py:720-738):
A transport disconnect on a reasoning model — even on a large session —
now routes to FailoverReason.timeout instead of context_overflow. Without
this, large-session reasoning-model disconnects route to the compression
branch and silently delete conversation history on a phantom
context-length error. The override is strictly targeted: non-reasoning
models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to
context_overflow on large sessions — the existing intentional behavior
for chat models whose proxy doesn't idle-kill during prefill/generation.

Part 2 (new agent/thinking_timeout_guidance.py + integration at
agent/conversation_loop.py:3488-3567):
New is_thinking_timeout() and build_thinking_timeout_guidance() helpers.
When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3,
Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning)
hits a transport-kill on a small session (classifier says timeout
directly) or after Part 1 routes correctly (large session), the user
now sees reasoning-specific guidance with three actionable workarounds
in priority order:

  1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900
     in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s
     for known reasoning models; raise further if upstream is even
     tighter).
  2. Lower reasoning_budget or set reasoning_effort: medium on this
     model if the provider supports it.
  3. Use a smaller / faster reasoning model if the task doesn't
     require deep thinking.

The new guidance takes precedence via if/elif over the existing
_is_stream_drop block, so a reasoning-model user with a transport-kill
message sees actionable advice instead of the misleading "try
execute_code with Python's open() for large files" advice (which is
correct for the unrelated large-file-write stream-drop case but
actively wrong for the thinking-timeout case).

Verified:
- 478 tests passing across 9 directly-relevant files (49 new + 429
  existing, zero regressions).
- Ruff lint clean on all 4 modified/new files.
- Negative test: 6 parametrized regression guards confirm non-reasoning
  models still route to context_overflow on large sessions; 4
  parametrized gates confirm non-timeout classifier reasons never
  trigger the guidance; 5 parametrized cases confirm non-transport
  messages never trigger it.
- Regression guard: new guidance message does NOT contain
  "execute_code" or "open()" — the misleading advice is fully
  replaced, not appended alongside.
- Cross-vendor dual review via agy -p:
  - Gemini 3.5 Flash (Medium) — passed: true, zero blockers, one
    SHOULD-FIX (vprint block duplication — fixed by extracting
    detection into a helper module).
  - GPT-OSS 120B (Medium) — passed: true, zero blockers, two nits
    (test placement — adopted at tests/agent/test_thinking_timeout_guidance.py;
    primary-model capture — accepted as non-issue per Flash's nit).

Dependency note for maintainers:
This PR includes agent/reasoning_timeouts.py (the reasoning-model
allowlist module from PR #52238) because the Layer 1 override is
load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238
lands on main, this PR's duplicate agent/reasoning_timeouts.py should
be rebased away. Either PR can land first; the other rebase is
mechanical.

Fixes #52271.
2026-06-25 19:00:48 -07:00
Teknium
811df74a10
fix(gateway): defer cross-process cache cleanup off the cache lock (#52197) (#52761)
The #45966 cross-process coherence guard popped the stale cached agent
and then called the blocking _cleanup_agent_resources (memory-provider
shutdown, tool-resource teardown, async-client teardown) while still
holding _agent_cache_lock, on the gateway event-loop thread. While that
ran, _sweep_idle_cached_agents (driven by _session_expiry_watcher)
blocked acquiring the same lock and the asyncio loop stalled for minutes,
tripping repeated Discord 'heartbeat blocked' warnings.

Fix mirrors the cap-enforcer / idle-sweep paths: pop the stale entry
under the lock, release it, then schedule the SOFT release on a daemon
thread. The soft path (_release_evicted_agent_soft) is also more correct
here than the hard teardown the regression used — the same session
rebuilds a fresh agent immediately after invalidation, so its terminal
sandbox / browser / bg processes (keyed on task_id) must be preserved
for the rebuilt agent to inherit, not torn down.

Verified the cross-process site was the only cleanup-under-lock instance;
the other _cleanup_agent_resources call sites run outside the lock.
2026-06-25 18:58:47 -07:00
Teknium
ce802e932c fix(telegram): heartbeat loop exits cleanly when bot has no get_me
CI shard test_telegram_conflict.py timed out (140s) because the new
_polling_heartbeat_loop, started by connect(), busy-spun under those
tests: they monkeypatch asyncio.sleep to instant and pass a bot double
with no get_me(), so the probe raised AttributeError (swallowed) and the
loop re-entered immediately with no real pacing, starving the event loop.

Guard the loop to return when bot.get_me is not callable — a real PTB Bot
always exposes it, so this only triggers on a torn-down app or a test
double, where there is nothing to probe. Also cancel the heartbeat task in
the conflict tests that call connect() without disconnect(), matching the
production disconnect() teardown.

Verified: test_telegram_conflict.py now runs in ~4.5s; the 22
heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still
fires the reconnect ladder while a missing get_me exits without spinning.
2026-06-25 18:50:11 -07:00
agt-user
8501caf51f fix(telegram): persistent heartbeat loop to detect CLOSE-WAIT polling sockets
When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN
but httpx hasn't noticed), epoll still reports it readable so no
exception is raised. PTB's error_callback never fires, the reconnect
ladder never engages, and the gateway silently stops receiving messages
while the process stays alive — until a manual systemctl restart.

The existing recovery only covers two cases: error_callback-driven
reconnects (which require an exception PTB never gets) and a one-shot
_verify_polling_after_reconnect probe (which runs only right after an
explicit reconnect). A socket that wedges during steady-state operation
is never detected.

Add _polling_heartbeat_loop: a background asyncio.Task started in
connect() (polling mode only) that probes get_me() every 90s on the
general request pool (not the getUpdates pool, so healthy long-polls are
never interrupted). On asyncio.TimeoutError/OSError it hands off to the
existing _handle_polling_network_error ladder; other errors are
swallowed. disconnect() cancels and awaits the task. Worst-case
detection window ~105s.

Complementary to #51541 (general-pool keepalive limits / fd leak) — that
recycles idle pooled connections; this detects a wedged active read.

Fixes #48495

Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>
2026-06-25 18:50:11 -07:00
liuhao1024
56cf517ccd fix(cron): detect partial job loss in restore_cron_jobs_if_emptied (#52144)
The desktop scheduler can overwrite cron/jobs.json with its own small
set of internally-tracked crons after an update/restart, causing
partial loss of tool-created cron jobs. The previous guard only
checked for total loss (live_count == 0), missing the case where
live_count > 0 but less than the pre-update snapshot count.

Compare live_count against snap_count instead of checking for zero,
so both total loss (0 vs N) and partial loss (1 vs 19) trigger
restoration.

Salvaged from #52161 by @liuhao1024.

Closes #52144
2026-06-25 18:49:18 -07:00
brooklyn!
41f4dce828
Merge pull request #52756 from NousResearch/bb/delegate-bg-resume-ux
feat(delegation): calm "will resume" affordance for background delegate_task
2026-06-25 20:08:06 -05:00
Brooklyn Nicholson
985350dd85 feat(cli): note background delegate_task dispatch in _on_tool_complete
A top-level delegate_task dispatches in the background and re-enters as a
fresh turn when done. Print a one-line dispatch-time note — no spinner,
nothing to poll — so the idle prompt doesn't read as "nothing happened."
2026-06-25 19:57:58 -05:00
Que0x
b8fc8c908b fix(approval): fold Windows absolute home paths in dangerous-command detection
The detector folds absolute home / Hermes-home prefixes into their canonical
~/ and ~/.hermes/ forms so static patterns catch /home/alice/.bashrc the same
way they catch ~/.bashrc (abd69b81). On native Windows this fold never fired,
so terminal commands writing to shell startup files, ~/.ssh/authorized_keys,
or ~/.hermes/config.yaml / .env returned "safe" and skipped the approval
prompt — and config.yaml carries the approval policy itself.

Two compounding causes:

1. The fold ran after the backslash-escape strip (r\m -> rm), which dissolves
   the backslash separators in a Windows path (C:\Users\alice\.bashrc ->
   C:Usersalice...) before the fold could match. It now runs before the strip.
2. The fold only recognized POSIX absolute paths and only the home prefix,
   leaving multi-segment backslash suffixes (\.ssh\authorized_keys) to be
   mangled by the strip.

Consolidated into _home_prefix_fold_regex / _fold_home_prefixes: match a home
prefix with either separator, capture the rest of the path token, and
normalize its separators to / so multi-segment patterns match. The
degenerate-path guard generalizes count("/") >= 2 to "at least two components
below the root" (also rejecting a bare drive root C:\). HOME is consulted
directly because Windows' expanduser ignores it; the more specific Hermes home
is folded first, longest candidate first, so neither fold clobbers the other.

POSIX behavior unchanged; the r\m -> rm anti-obfuscation strip still runs.
Adds TestWindowsAbsolutePathFolding, which monkeypatches a Windows-style
HOME/HERMES_HOME so the behavior is also exercised on the CI runner.
2026-06-25 17:49:39 -07:00
teknium1
6dfb8326f5 fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes
Follow-up to the salvage of #45035 + #48682. The two PRs touched different
functions (resolve_resume_session_id vs get_compression_tip) but #45035's
descendant walk followed ANY parent_session_id child, so a delegate/subagent
child could hijack the resume target. Apply the same _branched_from /
_delegate_from / source!='tool' exclusion the rest of hermes_state.py uses,
so the resume walk only follows genuine compression continuations.

Also updates the unrealistic delegation test fixture to carry the real
_delegate_from marker, and updates 3 list_sessions_rich test mocks for the
order_by_last_active kwarg #48682 added.

AUTHOR_MAP: map PINKIIILQWQ + ailang323 salvage authors.
2026-06-25 16:29:09 -07:00
longer
6d9ca04574 fix(desktop): resume latest compression continuation 2026-06-25 16:29:09 -07:00
Pink
263f6b03eb chore: rename test to reflect new semantics of resolve_resume_session_id 2026-06-25 16:29:09 -07:00
PINKIIILQWQ
abd6b85200 fix(state): resolve compression chain tip in resolve_resume_session_id
After context compression, the parent session holds pre-compression messages
and a child (or deeper descendant) holds the continuation.
resolve_resume_session_id() short-circuited when the input session already
had messages (row is not None -> return session_id), causing REST API
endpoints, gateway resume, and CLI resume to serve stale parent messages.

Remove the early-return. Walk the full descendant chain, record the
deepest node that has messages (best), and return best if not None
else the original session_id (preserving the empty-chain fallback).

Callers (api_server.py, web_server.py, cli_agent_setup_mixin.py,
cli_commands_mixin.py) all use the resolved != input -> redirect pattern
and are transparent to this change.
2026-06-25 16:29:09 -07:00
Teknium
208f0d7c3b
fix(update): default pre-update backup to off (#52729)
The pre-update HERMES_HOME zip shipped on by default (DEFAULT_CONFIG +
runtime fallback both True), so every `hermes update` zipped the entire
~/.hermes — sessions DB, caches, skills — adding minutes to each update.
The shipped cli-config.yaml.example, the --backup help, and the example
config all already said "off by default," so the live default
contradicted its own documentation.

Flip the default to off everywhere: DEFAULT_CONFIG, the runtime
`.get(..., False)` fallback in _run_pre_update_backup, and the stale
--backup help string. Users who want the #48200 safety net opt in via
updates.pre_update_backup: true or --backup for a single run.

Updated test_default_enabled_creates_backup -> test_default_disabled_is_silent
to assert the new default (silent no-op, no zip).
2026-06-25 16:01:09 -07:00
kshitij
e4ff494860
fix(cron): add default retention to per-run job output (#52383) (#52646)
* fix(cron): add default retention to per-run job output to bound disk usage (#52383)

Per-run cron output (cron/output/<job>/<timestamp>.md) is written once
per execution and was never pruned, so a frequently-scheduled job on
a long-running deploy accumulates one file per run indefinitely and
can fill the volume ('no space left on device').

save_job_output() now keeps the most recent N output files per job and
removes older ones. N defaults to 50 and is configurable via
cron.output_retention; a non-positive value disables pruning for
operators who manage cleanup externally.

Salvaged from #52402 by @0xDevNinja.

Closes #52383

* fix(config): add cron.output_retention to DEFAULT_CONFIG

Follow-up to #52383: the retention config key was functional via
get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge
wouldn't auto-populate it for new installs. Add it explicitly.

---------

Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
2026-06-25 16:00:13 -07:00
brooklyn!
ffa3d3c811
Merge pull request #49037 from NousResearch/bb/projects-paradigm
feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools
2026-06-25 17:49:05 -05:00
Teknium
fd2a35b169
fix: stop reporting cache-hit rate and cost across all UI surfaces (#52717)
* fix: stop reporting cache-hit rate and cost across all UI surfaces

Cost estimates and cache read/write token reporting are unreliable on
providers that don't surface cached_tokens (e.g. ollama-cloud, which doesn't
implement prompt_tokens_details.cached_tokens), producing misleading
near-zero 'cache hit' readouts and cost figures. Remove cost + cache-hit
reporting from every user-facing surface; keep input/output/total token
counts (provider-agnostic and accurate) and the Nous account billing UI
(real account money, separate from per-conversation estimates).

Surfaces:
- CLI /usage + model-info: drop cost lines + cache read/write token lines
- Gateway /usage + /model: drop cost + cache lines
- tui_gateway/server.py: stop emitting cost_usd / cache_read in usage and
  subagent.complete payloads
- TUI (Ink): drop cost from status bar (+ showCost plumbing), /usage panel,
  thinking rollup, agents overlay (incl. compare view); keep token counts
- Desktop Command Center: drop cost stat, per-model cost, actual-cost hint

Underlying estimate_usage_cost / format_cost / insights cost columns are
left intact but no longer surfaced (display-only change, reversible).

* test: update TUI + gateway + CLI tests for removed cost/cache-hit reporting

- CLI /usage test asserts cost/cache lines are absent, tokens present
- gateway /usage test drops cost + cache asserts; removes cost-included test
- TUI subagentTree summary expectation drops the cost segment
- useConfigSync + appChrome status-rule tests drop showCost prop/state
2026-06-25 15:21:22 -07:00
teknium1
3e99ec0ff9 test(hermes_state): cover update_session_billing_route overwrite + prompt null
Regression for the salvaged #48254 fix: billing route is first-writer-wins
via update_token_counts (COALESCE), so a mid-session provider switch left
the dashboard attributing cost to the original provider. Asserts the new
update_session_billing_route() overwrites unconditionally, nulls system_prompt
so the next turn rebuilds Model:/Provider:, and preserves billing_mode when
omitted (COALESCE on None).
2026-06-25 14:44:00 -07:00
Gille
bf0513bca0 test(windows): align gateway restart CI coverage 2026-06-25 14:42:38 -07:00
Brooklyn Nicholson
86e748df13 fix(agent): require code for coding posture 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
cb3f8ec03d fix(tools): isolate per-session worktree cwd 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
4ffdedd369 feat(tools): add project workspace tools 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
4e023f5bc9 feat(gateway): build authoritative project tree 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
e7811345c1 feat(kanban): link tasks to project worktrees 2026-06-25 16:40:26 -05:00
Brooklyn Nicholson
8a45ce2dd4 feat(projects): add per-profile project store 2026-06-25 16:40:26 -05:00
Brooklyn Nicholson
4cdd1a3230 feat(sessions): record git workspace metadata 2026-06-25 16:40:26 -05:00
Teknium
c6575df927
feat(moa): expose MoA presets as selectable virtual models (#46081)
* feat(moa): expose MoA presets as selectable virtual models

Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.

Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.

- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
  / one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
2026-06-25 13:52:06 -07:00
teknium1
f284d85efa fix(cron): restore [SILENT] silence + suppress empty-turn explainer on Telegram
Scheduled jobs delivering to Telegram/etc. started posting a literal
'⚠️ No reply: the model returned empty content…' message instead of
staying silent. Two interacting causes:

1. The turn-completion explainer (#34452) replaces an empty model turn
   with a user-facing '⚠️ No reply…' string. In a cron context that is
   not a silence marker, so the scheduler delivered it — a regression
   from the previously-silent empty turn. run_job now detects the
   explainer text deterministically (via the same formatter that
   produced it) for abnormal-empty turn_exit_reasons and strips it to
   empty, so the existing empty-response suppression + soft-fail guard
   apply. The explainer is unchanged on CLI/gateway.

2. The cron suppression used a loose 'SILENT_MARKER in ...upper()'
   substring check. It leaked bracketless near-markers the model emits
   ('SILENT', 'NO_REPLY', 'NO REPLY' — #51438, #46917) and wrongly
   swallowed a real report that merely quoted '[SILENT]' mid-sentence.
   Replaced with _is_cron_silence_response(): suppresses a canonical
   token as the whole response, its own first/last line, or the
   documented bracketed '[SILENT] <note>' prefix — while a token buried
   mid-sentence in a genuine report is delivered. Preserves the
   intentional cron trailing/prefix tolerance (existing tests unchanged).

Tests: bracketless-variant suppression, mid-sentence-quote delivery,
direct matcher contract, and explainer-strip + defensive real-report
delivery.
2026-06-25 13:45:09 -07:00
kshitij
42bea9e298
Merge pull request #52618 from NousResearch/salvage/14185-todo-coercion
fix(tools): defensive type coercion in todo_tool for malformed LLM input (#14185)
2026-06-26 02:02:18 +05:30
infinitycrew39
d40b5735a4 test(telegram): cover table auto-rich and topic routing
Assert bare tables upgrade to sendRichMessage under default/opt-out config,
DM-topic resumed sends without reply anchors, and rich finalize edits carry
forum topic routing metadata.
2026-06-25 13:10:54 -07:00
teknium1
0d777453fa fix(auxiliary): fall back when a route can't run the model at all (400 capability mismatch)
The salvaged context-window screen (#52392) skips fallback candidates that
are too small, and the rate-limit/403 fixes skip candidates that are at
capacity. A third hard failure remained uncovered: a fallback that builds a
client fine but returns a 400 because it structurally cannot run the model.
The canonical case is a configured openai-codex / ChatGPT-account fallback
asked to compress a glm-5.2 conversation:

    400 - {'detail': "The 'glm-5.2' model is not supported when using
    Codex with a ChatGPT account."}

This is a request-validation error, so should_fallback was False and the
explicit-provider gate blocked it — the auxiliary task (compression) aborted
every turn, dropping middle turns without a summary and churning the session,
which is exactly what destroys the prompt cache.

Adds _is_model_incompatible_error() (400 + capability phrasing, excluding
not-found and billing 400s which the sibling predicates own) and treats it as
a fallback-worthy capacity error in both sync and async call_llm, so the chain
skips the incapable route and continues to the next viable candidate.
2026-06-25 13:08:18 -07:00
Tranquil-Flow
e4d026aa3b fix(auxiliary): screen fallback chain by context window for compression (#52392)
The runtime auxiliary fallback chain (_try_configured_fallback_chain and
_try_main_fallback_chain) returned the first reachable candidate without
checking whether the candidate's context window was large enough for the
task. For task='compression' this meant a reachable but undersized
fallback (e.g. 32K) could be selected and then fail, even when a later
larger-context fallback was available.

This adds two small helpers:

  _task_minimum_context_length(task)
      Returns MINIMUM_CONTEXT_LENGTH (64K) for compression, None for
      other tasks (vision, web_extract, etc.).

  _candidate_context_window(provider, model, ...)
      Thin wrapper around get_model_context_length that returns None on
      probe failure so unknown/custom endpoints pass through unchanged
      (preserves the existing fallback surface).

Both fallback loops now skip reachable candidates whose resolved context
is below the task minimum and continue iterating. The success path
(first viable candidate wins) is unchanged. Return shape and ordering
for healthy candidates are preserved.

Six regression tests cover:
  L2 configured chain skips too-small candidate
  L2 chain continues after skipping, returns last viable
  L3 main chain skips too-small candidate
  L4 unknown-context candidate passes through
  L5 non-compression task is not filtered
  L6 minimum constant matches MINIMUM_CONTEXT_LENGTH (64K)

3/6 fail on upstream/main without the production change (verified); all
6 pass with the fix. Full test_auxiliary_client.py suite (231 tests)
and related compression tests (130 tests) remain green.
2026-06-25 13:08:18 -07:00
herbalizer404
b82c83d320 fix(auxiliary): honor fallback chain when compression provider auth is unavailable
When an explicit aux provider cannot build a client before any request is
sent (missing raw env key, exhausted/unavailable OAuth or credential-pool
auth, resolver returning (None, None)), call_llm raised a misleading
"no API key was found" error and bypassed the configured fallback_chain
entirely. A provider authenticated through Hermes auth / the credential
pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so
compression failed instead of routing to the configured fallback.

Adds _try_configured_fallback_for_unavailable_client() and wires it into
both sync and async call_llm before the raise, and into the startup
compression feasibility check.

Salvaged from #51835 by @herbalizer404.
2026-06-25 13:08:18 -07:00
pyxl-dev
751adfa6b9 fix: include rate-limit in auxiliary capacity-error fallback gate
Rate-limit (429) errors on explicit-provider auxiliary tasks were
silently failing instead of triggering the fallback chain. The
is_capacity_error gate only checked payment and connection errors,
excluding rate limits — so when a configured provider like
openai-codex hit its rate limit, auxiliary tasks (kanban_decomposer,
vision, web_extract, approval, etc.) had zero resilience.

Add _is_rate_limit_error() to is_capacity_error at both call sites
(sync and async paths) so rate limits trigger fallback regardless
of whether the provider was auto-detected or explicitly configured.

Fixes #52228
2026-06-25 13:08:18 -07:00
herbalizer404
ff8920299c fix(auxiliary): treat 403 subscription and session-usage-limit errors as payment errors for fallback
Ollama Cloud (and similar) return 403 with bodies like "this model requires
a subscription, upgrade for access" or "you have reached your session usage
limit, upgrade for higher limits". These are capacity/billing conditions
semantically identical to credit exhaustion, but _is_payment_error() did not
recognize them (403 missing from the status set; keywords missing), so the
configured fallback_chain was never tried and compression failed outright.

Adds 403 to the status set and the subscription/session-usage keywords.

Salvaged from #49076 by @herbalizer404.
2026-06-25 13:08:18 -07:00
kshitij
ca714f6189
Merge pull request #52653 from kshitijk4poor/salvage/33814-env-quote-hash
fix(config): quote .env values containing # to prevent token truncation (#30355)
2026-06-26 01:32:49 +05:30
kshitijk4poor
d9bd7ce827 test(compression): pin rotation-fallback tests to in_place=False ahead of default flip
These 7 test sites assert rotation behavior (fork, child sessions, lock
contention, logging session-context follows id rotation, boundary hooks fire
on rotation). Pin each builder to in_place=False explicitly so they keep
exercising the retained rotation fallback regardless of the global default
(flipped to True in #38763). Rotation stays a working opt-out fallback and
deserves continued coverage — these are NOT deleted.

Pinned sites:
- test_compression_concurrent_fork._build_agent_with_db
- test_compression_logging_session_context._build_agent_with_db
- test_compression_rotation_state._build_agent_with_db
- test_compression_boundary_hook._make_agent (2 helpers: CompressionBoundaryHook + SessionCompressEvent)
- test_compression_concurrent_sessions._build_agent_with_db
2026-06-25 12:56:05 -07:00
srojk34
510bf40705 fix(gateway): read compaction result flag not config flag in hygiene guard (#50098)
Salvage of #50098 by @srojk34, cherry-picked onto current main.

The hygiene auto-compress guard and the /compress slash command both read
compression_in_place (config flag — is in-place mode enabled?) instead of
_last_compaction_in_place (result flag — did in-place compaction actually
succeed?). Both agents are built without a session_db, so archive_and_compact
always fails silently and _last_compaction_in_place stays False. Reading the
config flag makes the guard think in-place succeeded, triggering
rewrite_transcript() which replaces the original messages with only the
compressed summary — permanent data loss.

Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-25 12:56:05 -07:00
Teknium
2a1e615565
fix: persist non-NULL system prompt on fresh turn setup (#45499) (#52616)
build_turn_context() created the DB session row via _ensure_db_session()
before the system prompt was restored/built, so a fresh API/gateway agent
carrying client-managed history inserted a row with system_prompt=NULL. That
tripped the misleading 'stored system prompt is null; rebuilding from scratch
... investigate the previous turn's write path' warning and a guaranteed
first-turn prefix cache miss. Move row creation to after _cached_system_prompt
is populated.

Verified live (OpenRouter + claude-sonnet-4.5): persistent-agent turns show
cache_read jumping to the full prefix on turn 2+ (write 24411 -> read 24411),
and the persisted system_prompt is non-NULL so fresh-agent restore keeps the
prefix cache warm.

Tests: turn-context ordering regression asserting _ensure_db_session runs
after _cached_system_prompt is populated.
2026-06-25 12:54:19 -07:00
Teknium
d7021af30f
fix(learn): name distilled skills as author Hermes, not the host OS user (#52388)
/learn told the agent to fill the skill `author` field, and the system
prompt environment probe surfaces the OS login name (user=$(whoami) in
prompt_builder.py), so the model wrote the host username into published
SKILL.md frontmatter — a privacy leak the user never opted into, and
inconsistent run to run as the most-salient identity changed.

The /learn authoring prompt now sets `author` to the literal value
`Hermes` and explicitly forbids deriving it from the host environment
(OS/login user, git config, or any probeable identity). The skill names
itself as the tool that wrote it.

Closes #52368.
2026-06-25 12:48:08 -07:00
helix4u
4efec63a34 fix(tools): let session_search match session titles 2026-06-26 01:12:26 +05:30
rob-maron
2c02583c2b fix shape 2026-06-25 12:38:33 -07:00
rob-maron
525ee58b43 krea 2026-06-25 12:38:33 -07:00
sweetcornna
150afea942 fix(config): quote env values containing hash 2026-06-26 00:54:34 +05:30
kshitijk4poor
73c8d5a1e7 fix: use self._session_db directly + add regression test
- Replace getattr(self.session_store, '_db', None) with self._session_db
  (the GatewayRunner's own SessionDB, consistent with existing usage in
  slash_commands.py L240/L499).
- Remove verbose comment referencing a branch name as an issue number.
- Update stale comment in run.py that said 'today it has no session_db'.
- Add regression test verifying session_db is passed and rotated session
  is persisted (adapted from #51624 by @LeonSGP43).
- Add _session_db=None to _make_runner fixtures in test_compress_command,
  test_compress_focus, and test_compress_plugin_engine.
2026-06-26 00:50:40 +05:30
Brooklyn Nicholson
e8561d61e6 test(tui_gateway): pin synchronous-build resume tests to eager_build
These three assert the eager build contract — stored runtime overrides /
profile db reach _make_agent synchronously, and the agent binds to the
compression tip. Under deferred-by-default the build runs off-thread, so
they raced the timer (green in CI, flaky locally). Pin them to
eager_build; deferred coverage lives in the protocol tests.
2026-06-25 14:13:07 -05:00
Brooklyn Nicholson
3bf00e459a perf(desktop): make deferred resume the default, not an opt-in flag
Per review: gating the faster path behind a `defer_build` flag that the
only caller always sends is pointless. Flip it — `session.resume` now
defers the agent build by default for every caller (desktop + Ink TUI);
a caller that needs the agent built synchronously passes `eager_build:
true` (used by the build-race test). The desktop no longer sends a flag.

While verifying the flip, fixed two real parity gaps the deferred path
had vs the old eager (`_init_session`) path:

- `_enable_gateway_prompts()` was never called on a deferred resume, so
  approvals/clarify wouldn't route through the gateway prompt callbacks.
- `_start_agent_build` never wired `background_review_callback` /
  `memory_notifications`, so a deferred-built session's self-improvement
  "💾 …" summary leaked to stdout instead of rendering in-transcript.
  Wiring it there also fixes it for `session.create` sessions, which
  build through the same path.

ACP is unaffected (it uses its own session_manager, not this RPC); the
Ink TUI already consumes the same lazy `info` shape from session.create
and upgrades on the later `session.info` event.
2026-06-25 14:03:03 -05:00
Brooklyn Nicholson
c4c590e4a1 perf(desktop): make session switching fast under load
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:

- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
  build) *before* returning, and the desktop awaits that RPC before it
  paints — so the entire switch blocked on the build. Add an opt-in
  `defer_build` resume path (the contract `session.create` already uses):
  return the full display transcript immediately, register an upgradable
  live session, and pre-warm the agent on a short timer. The persisted
  runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
  restored on the deferred build so it can't drop the provider.

- Nothing bounded how many in-memory agents accumulate; a user who
  reconnects often piled up detached sessions for the full 6h TTL. Add a
  soft LRU cap (`max_live_sessions`, default 16) that evicts the
  least-recently-active DETACHED sessions (no live client) — never a
  running, awaiting-input, mid-build, or live-transport one. Reopening
  re-resumes from disk.

- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
  merged-message array (and its 1000-entry Map) when the prefetch already
  painted the exact transcript; the downstream sameMessageList guard
  already drops the publish, so it was pure main-thread cost.

The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
2026-06-25 14:03:03 -05:00
kshitij
5de8a8fbe8
Merge pull request #52375 from NousResearch/salvage/47237-dedupe-user-turns
fix(gateway): dedupe user turns on transient failure (#47237)
2026-06-26 00:30:59 +05:30
davidgut1982
6208d6b3be fix(gateway): dedupe user turns on transient failure (#47237)
When the gateway persists a user message after a transient provider
failure (429/timeout/auth error), subsequent retries of the same
Telegram message could stack duplicate user turns in the transcript,
causing the agent to fall behind by 1-2 messages.

Add has_platform_message_id() to SessionDB (using the existing
idx_messages_platform_msg_id partial index) and a SessionStore wrapper.
The gateway's transient-failure path checks this before
append_to_transcript -- if the platform_message_id is already
persisted, the duplicate write is skipped.

Salvaged from #47869 by @davidgut1982. Adapted to current main which
has additional append sites and an existing content-based dedupe in
the exception handler path.

Closes #47237
2026-06-26 00:11:17 +05:30