Commit graph

12390 commits

Author SHA1 Message Date
teknium1
d0de4601d2 fix(tui): /compress shows a before/after summary (#46686)
The TUI /compress slash side-effect compressed the session, synced the
key, and emitted session.info — but returned an empty string, so the
user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The
CLI (_manual_compress) and gateway (slash_commands) paths both already
call summarize_manual_compression; the TUI slash path was the lone gap.

Snapshot history + rough token estimate before and after compaction and
return the formatted summarize_manual_compression() feedback, mirroring
the session.compress RPC handler. The estimate uses the same
estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC
path, re-reading the system prompt after compaction (it may be rebuilt).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-21 11:36:09 -07:00
teknium1
9e4fe32d36 fix(session): opt the background-review fork out of session finalization
The background-review fork (fires ~every 10 turns) pins
review_agent.session_id = agent.session_id — the parent's LIVE id — for
prefix-cache parity, then calls close(). With session finalization now in
close(), that would end the still-active parent session mid-conversation.
Set _end_session_on_close = False on the fork so the real owner (CLI close /
gateway reset / cron) finalizes the session instead.

Follow-up to the #12029 fix.
2026-06-21 11:35:09 -07:00
yeyitech
b17180d950 fix(session): finalize owned SQLite session rows on AIAgent.close()
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.

end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.

Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.

Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.

Closes #12029.
2026-06-21 11:35:09 -07:00
teknium1
41e0c10f7e fix(agent): route repeated-compression warning through _emit_status (#36908)
The 'Session compressed N times — accuracy may degrade' warning went
through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord
never saw it — unlike the two other compression warnings in the same
module, which route through _emit_status (and store _compression_warning
for late-bound gateway status_callback replay).

Set agent._compression_warning + call agent._emit_status() for this
warning too, matching the sibling pattern. _emit_status still _vprints
for the CLI, so CLI output is unchanged; TUI / gateway surfaces now
receive it via status_callback (and replay_compression_warning can
re-deliver it once a late-bound gateway callback is wired).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-21 11:34:47 -07:00
konsisumer
3e354b61db fix(agent): preserve copilot routed headers 2026-06-21 11:29:49 -07:00
Teknium
b6a4638b6d
fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297)
When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels)
returns a well-formed HTTP 200 whose summary content is null or empty/
whitespace-only, _generate_summary coerced it to "" and stored a prefix-only
summary — silently replacing the compacted turns with nothing. The model then
lost all in-progress context after compression (#11978, #11914).

_validate_llm_response already guards None / empty-choices, so those never
reach the compressor; the gap was a well-formed response with empty *content*.
Now treat empty content as a summary failure: raise so it routes through the
existing main-model fallback then transient cooldown, dropping the turns
without a summary rather than wiping context with an empty one.

Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider
configured' errors take the 600s no-provider cooldown; empty/invalid-response
RuntimeErrors from a configured provider now correctly get the main-model
fallback instead of being misrouted into the long no-provider cooldown.

Reported by @Hung2124; area identified by @annguyenNous in #39590.
2026-06-21 11:27:07 -07:00
Teknium
296b290f8f chore(release): add AUTHOR_MAP entry for de1tydev (#10158) 2026-06-21 11:11:23 -07:00
Teknium
41ba90f814 fix(process): keep CLI drain dedup after poll goes read-only (#10156)
Follow-up to @de1tydev's poll-read-only fix. Removing the
_completion_consumed.add() from poll() fixes the gateway/tui watcher
suppression (#10156) but reintroduces the CLI duplicate that #8228 fixed:
a notify_on_complete process always enqueues a completion event, and the
CLI idle/post-turn drain would re-inject it as a [SYSTEM: ...] message
even though the agent already saw the exit inline in its poll result.

Add a separate _poll_observed set that poll() populates on an observed
exit. drain_notifications() (CLI only) skips poll-observed sessions; the
gateway/tui watchers keep checking only is_completion_consumed, so a
read-only poll never suppresses their autonomous delivery turn.

- _poll_observed pruned alongside _completion_consumed in _prune_if_needed
- 4 tests: CLI drain dedup after poll, gateway gate untouched, running
  poll doesn't mark observed, wait/log still skip CLI drain
2026-06-21 11:11:23 -07:00
Liao Shiwu
6f5f58e34b fix: keep poll read-only for notify_on_complete watcher 2026-06-21 11:11:23 -07:00
Eugeniusz Gilewski
9078b4bbdf fix(file): harden read_file device alias blocking
Security-hardening fix for the read_file device guard, not a new sandbox
boundary. The guard already rejects direct device paths and upstream now
has a resolved-path pass for workspace symlinks to blocked devices, but
its concrete-path helper still compared the expanded path before
normalization. That leaves residual alias cases where the dangerous path
is visible before final terminal-specific resolution, for example:

  1. /dev/../dev/zero and /dev/./urandom should match the blocked-device
     list as concrete paths, not only after final realpath;
  2. /dev/stdin-style aliases can disappear once realpath follows them
     to /proc/self/fd/0 and then to a tty path;
  3. a user symlink to /dev/../dev/stdin exposes the dangerous
     intermediate target before final resolution, but not necessarily
     after it.

Normalize expanded paths before matching and inspect each symlink hop
before falling back to realpath. This preserves the existing /proc fd and
/proc pseudo-file guards while enforcing the intended security invariant:
model-supplied read paths must not reach blocking or infinite device
streams through spelling, normalization, or symlink-hop tricks.

Classification: security hardening / residual bypass fix for the
read_file device blocklist. This is defensive code at the file-tool
boundary, but it fixes a concrete denial-of-service class tracked as
security in #10141 and #29158.

Tests:
  - normalized /dev/../dev/zero and /dev/./urandom aliases
  - symlink to /dev/../dev/stdin blocked before realpath
  - existing symlink-to-device and regular-symlink guards still pass

Fixes #10141
Fixes #29158
2026-06-21 11:11:19 -07:00
tt-a1i
ea056b0559 fix(telegram): avoid rich messages for CJK text
Telegram Mac/Desktop Bot API 10.1 rich-message rendering leaves garbled
overlapping draft/overlay glyphs for CJK text (#47653), affecting every
message containing CJK characters. The legacy MarkdownV2 path renders the
same text cleanly, so skip the rich send / draft / final-edit paths up
front for content containing CJK (incl. astral-plane extensions) until
affected clients age out. Non-CJK rich rendering is preserved.

Fixes #47653
2026-06-21 11:10:37 -07:00
brooklyn!
65a477f12e
feat(desktop): add Update now button to About panel (#50186) 2026-06-21 11:34:45 -05:00
teknium1
2f4f23fbfb fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835)
When the main provider is the Codex app-server runtime (api_mode
codex_app_server), the gateway showed no verbose 'running X' tool-progress
breadcrumbs on Telegram while every other provider did. The app-server
session processes item/started notifications (command execution, file
changes, MCP/dynamic tool calls) but never surfaced them as Hermes
tool-progress events — the session was constructed without an on_event
hook, so the agent's tool_progress_callback was never invoked on this
route.

Add _codex_note_to_tool_progress() mapping item/started → (tool_name,
preview, args) for commandExecution / fileChange / mcpToolCall /
dynamicToolCall, and wire an on_event hook into CodexAppServerSession that
forwards mapped events to agent.tool_progress_callback('tool.started',
...) — the same signature the chat_completions path uses (tool_executor.py).
Non-tool items (agentMessage/reasoning) and non-item/started methods map
to None and are ignored.

Co-authored-by: jplew <462836+jplew@users.noreply.github.com>
2026-06-21 08:46:06 -07:00
yeyitech
8a506ed3ac fix(auth): make load_pool() non-destructive for env-seeded credentials
load_pool() is meant to be a read, but it persistently pruned env-seeded
pool entries whenever the calling process's os.environ lacked the seeding
var. A process without MINIMAX_API_KEY would delete the persisted
env:MINIMAX_API_KEY entry from auth.json for every other process, causing
auth.json to oscillate and auxiliary auto-detect to fall through to the
wrong provider.

env:* entries are persisted references re-hydrated from the environment on
each load — a missing var means "cannot re-seed right now", not "source is
gone forever". _prune_stale_seeded_entries now gates env-source removal
behind prune_env_sources (default True for explicit cleanup paths);
load_pool() passes prune_env_sources=False. File-backed singletons
(device-code OAuth, hermes_pkce) still prune when their backing file is
gone, and explicit removal via `hermes auth remove` (source suppression)
is unaffected.

Fixes #9331.

Co-authored-by: houko <suzukaze.haduki@gmail.com>
2026-06-21 08:26:37 -07:00
Teknium
a966932392 fix(telegram): exempt tables from rich newline hard-breaks
The newline normalization is the shared chokepoint for every rich send
(sendRichMessage, draft, and editMessageText). Injecting a Markdown hard
break (two trailing spaces) into a GFM table row separator corrupts the
natively-rendered table — the rich path's headline feature. Protect both
fenced code blocks AND pipe-table blocks as bare regions; only prose
between them gets hard breaks. Verified RICH_CONTENT and the existing
rich-table tests stay byte-identical.
2026-06-21 08:26:28 -07:00
Tranquil-Flow
31e59fe44d fix(telegram): preserve newlines in rich slash-command output (#46070)
Bot API 10.1 sendRichMessage treats a lone newline as a soft break, so
multi-line content joined with "\n".join(lines) — slash-command lists,
etc. — collapses into a single paragraph. Normalize single newlines to
Markdown hard breaks (two trailing spaces) in _rich_message_payload,
leaving paragraph breaks and fenced code blocks untouched.

Fixes #46070
2026-06-21 08:26:28 -07:00
Teknium
03563dabac
fix(gateway): raise session-hygiene hard message limit 400 → 5000 (#50194)
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.

The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.

The value remains fully configurable via compression.hygiene_hard_message_limit.
2026-06-21 08:26:19 -07:00
teknium1
3509be7124 fix(compression): auto-compression triggers at minimum context length (#14690)
The compaction threshold is max(context_length * threshold_percent,
MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on
large models, but degenerates at small windows: a model at exactly 64000
ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE
window. should_compress() can then never fire, because the provider
rejects the request before usage reaches 100%. Auto-compression silently
never triggers for any model whose context_length <= MINIMUM /
threshold_percent (e.g. 64K-per-slot local models).

Centralize the calc in _compute_threshold_tokens(). When the floor would
meet or exceed the context window, trigger at 85% of the window
(_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses
most of its budget before compacting (compacting at the 50% percentage
would waste half the small window), but below 100% so compaction actually
fires before the provider rejects the request. This mirrors the existing
gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at
64000) is unchanged; both call sites (__init__ and update_model) use the
shared helper.

Co-authored-by: soynchux <soynchuux@gmail.com>
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
2026-06-21 07:53:14 -07:00
kshitij
c6a0929875
Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch
fix(agent): reset stale token calibration on model switch (#23767)
2026-06-21 20:02:08 +05:30
kshitij
ed8f7898b9
Merge pull request #50136 from NousResearch/fix/context-aware-tool-budget
fix(agent): scale tool-output budget to the model context window (#23767)
2026-06-21 20:01:32 +05:30
liuhao1024
6984026f12 fix(browser): enable SSRF guard when terminal runs in container
When terminal.backend is docker/modal/daytona/ssh/singularity, the
terminal runs in a sandboxed container with network isolation, but the
browser still runs on the host.  The SSRF guard was skipped because
_is_local_backend() only checked browser.cloud_provider, not the
terminal backend.

Now _is_local_backend() also checks TERMINAL_ENV — when the terminal
is containerized, the browser is treated as non-local and SSRF
protection is enabled.

Fixes #38690
2026-06-21 07:26:18 -07:00
bogerman1
c7e8854cb3 fix(tui): persist session messages on force-quit / signal shutdown
Mirror the CLI's exit-path behaviour in the TUI gateway so that
unpersisted conversation messages are flushed to state.db and the
on_session_end plugin hook fires before the session is closed.

Root cause: _finalize_session() only called db.end_session() to
mark the session row as ended, but did NOT flush in-memory messages
via _persist_session() or fire the on_session_end hook.  When the
user force-quit (double Ctrl-C, terminal-close, SIGHUP) while the
agent was mid-turn, messages accumulated since the last persist
point were silently lost.

Changes
-------
tui_gateway/server.py - _finalize_session():
  - Persist unflushed messages via agent._persist_session() before
    db.end_session(). Prefers agent._session_messages (set by the
    last _persist_session call inside run_conversation) over
    session['history'] (stale when agent is mid-turn).
  - Fire on_session_end(interrupted=True) plugin hook so crash-
    recovery plugins can flush buffers, matching cli.py behaviour.

tui_gateway/entry.py - _log_signal():
  - Explicitly call _shutdown_sessions() before sys.exit(0) in the
    SIGHUP/SIGTERM handler as belt-and-suspenders over atexit.

tests/tui_gateway/test_finalize_session_persist.py (new):
  - 11 tests covering: history persistence, _session_messages
    priority, empty-history skip, missing-agent, double-finalize,
    persist-exception resilience, hook firing, hook-exception
    resilience, and db.end_session preservation.

Related
-------
Closes the TUI half of #5021 (CLI already handles this via its
atexit handler).  Also addresses the session-persistence gap
discussed in #18465 and #18269.
2026-06-21 07:26:07 -07:00
Teknium
e499d69e3e
feat(api-server): configurable concurrent-run cap to prevent DoS (#50007)
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).

- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
  0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
  counting both the chat/responses in-flight counter and the /v1/runs
  stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.

Closes #7483.
2026-06-21 07:26:03 -07:00
Hariharan Ayappane
99233faf78 fix(cli): persist sessions before shutdown 2026-06-21 07:25:56 -07:00
Teknium
9f67ba1b01
fix(agent): guard finalize_turn cleanup chain so it never drops the response (#50009)
When a turn hit max_iterations, finalize_turn ran three unguarded cleanup
steps after the model's summary — _save_trajectory (file I/O), _cleanup_task_resources
(remote VM/browser teardown), and _persist_session (SQLite write). Any raise
there propagated out of run_conversation, discarding the partial final_response
the caller was waiting for; subprocess wrappers saw an empty stdout with no
traceback (#8049).

Each step is now guarded independently so one failure can't skip the others.
Failures log at ERROR with a traceback and are surfaced on the result dict via
cleanup_errors; the partial response is always returned.

Closes #8049.
2026-06-21 07:25:42 -07:00
miha
796f618f99 fix(telegram): keep chunk markers outside code fences
When truncate_message appends a (N/M) chunk indicator to a chunk that
had to close an in-progress fenced code block, the marker lands on the
closing fence line (``` \(1/2\) after MarkdownV2 escaping). Telegram
does not treat that as a clean closing fence and rejects the MarkdownV2,
falling back to plain text. Move the indicator onto its own line right
after the closing fence at all three legacy-send call sites.

Fixes #48517
2026-06-21 07:25:37 -07:00
kshitijk4poor
1e0b3a2bcc fix(agent): reset stale token calibration on model switch (#23767)
ContextCompressor.update_model() recomputed context_length/threshold/budgets
but kept the cross-call calibration state (last_real_prompt_tokens,
last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens,
awaiting_real_usage_after_compression, _ineffective_compression_count) from the
PREVIOUS model.

Those fields encode 'the provider proved this prompt fit' / 'preflight can be
deferred' decisions valid only for the model that produced them. Carried across
a switch to a smaller-context model, should_defer_preflight_to_real_usage() used
the old model's 'it fit' history to SKIP a preflight compression the new model
actually needed — sending an oversized prompt the provider rejects (#23767).

update_model() now clears that state; the new model's first response repopulates
it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer
no longer suppresses and should_compress fires on an over-threshold estimate.
2026-06-21 17:46:58 +05:30
kshitijk4poor
1965d56219 fix(agent): scale tool-output budget to the model context window (#23767)
The tool-result persistence budget was a fixed 100K chars/result and 200K
chars/turn regardless of the active model. On a small-context model (e.g. a
65K-token local model switched into mid-session) a single large tool result
(reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens)
could by itself approach or exceed the window, forcing an oversized request
that the provider rejects as "Prompt too long".

- budget_config.budget_for_context_window() scales per-result/per-turn char
  caps to a fraction of the model window, clamped to the historical 100K/200K
  defaults (large models unchanged) and floored so small models stay usable.
- resolve_threshold() now caps the per-tool registry value at default_result_size
  so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate
  a scaled-down budget. No-op for the default budget (both 100K).
- tool_executor wires the agent's live context_length (recomputed on model
  switch) into all four persist/turn-budget call sites.

read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result
against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical
to today.
2026-06-21 17:46:38 +05:30
kshitij
5aec00f7a9
Merge pull request #50131 from kshitijk4poor/salvage/gateway-busy-readout-50103
feat(gateway+dashboard): busy/idle readout for safe lifecycle actions (salvage #50103)
2026-06-21 17:39:26 +05:30
kshitijk4poor
4d7bb382b0 refactor(gateway): route all active_agents coercion through parse_active_agents; harden drain-timeout fallback
Second cleanup pass (simplify-code review of the first follow-up):

- write_runtime_status now clamps active_agents via parse_active_agents
  instead of an inline max(0, int(...)). Removes the duplicated clamp the
  helper's docstring acknowledged AND closes a write-side ValueError gap
  (a non-numeric active_agents previously raised; now degrades to 0).
- hermes_cli/gateway.py draining-status line routes its active-agents count
  through parse_active_agents too — the third coercion site of the same
  persisted field, now consistent and non-raising with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
  ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
  (a real float) instead of a blanket 'except Exception -> None'. None would
  have violated the surfaced field's int/float contract and stripped NAS's
  poll-deadline hint silently.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
  handles the empty/None case) and tightened the parse_active_agents docstring
  to describe the actual single-contract role (write + both reads).
2026-06-21 17:22:52 +05:30
kshitijk4poor
b577f25100 refactor(gateway): dedupe drain-timeout resolution + share active_agents parse
Follow-up cleanups on top of the busy/idle readout (PR #50103):

- web_server.py /api/status reused the single drain-timeout resolver
  hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
  env -> agent.restart_drain_timeout config -> default) instead of inlining a
  third hand-rolled copy of that precedence chain. Also fixes a subtle
  divergence: the inline copy used os.environ.get() so a set-but-empty env var
  was treated as a value rather than falling through to config; the shared
  resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
  (/api/status and /health/detailed) through it, so the exposed active_agents
  field is consistently clamped non-negative. Previously /api/status clamped
  while /health/detailed exposed the raw file value, diverging on a corrupt
  count.
- Added TestParseActiveAgents covering the shared coercion contract.
2026-06-21 17:22:52 +05:30
Ben
0ee75469d7 feat(dashboard): surface gateway busy/drainable on /api/status
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.

Add to /api/status (additive, non-breaking):
  - active_agents       — in-flight gateway-turn count (now refreshed
                          per-turn by the companion gateway-side commit)
  - gateway_busy        — running AND active_agents > 0
  - gateway_drainable   — running and live (a valid begin-drain target)
  - restart_drain_timeout — resolved seconds, so the consumer can size its
                          poll deadline without out-of-band knowledge
                          (env HERMES_RESTART_DRAIN_TIMEOUT → config
                          agent.restart_drain_timeout → default)

The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.

Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
2026-06-21 17:22:52 +05:30
Ben
51a338a1b6 feat(gateway): track active_agents in runtime status on turn boundaries
The gateway only rewrote gateway_state.json on lifecycle transitions
(start/connect/drain/stop), never on turn start/end. Live-verified on a
hosted agent: a confirmed end-to-end turn ran while gateway_updated_at
stayed frozen at boot and active_agents was absent — so any active_agents
read from the file between transitions is stale. That makes it unusable
as a busy/idle signal for an external consumer (NAS deciding whether it's
safe to restart/migrate/auto-update an agent mid-turn).

Add _persist_active_agents(), called at every turn boundary:
  - turn start: both running-agent sentinel-claim sites (normal inbound
    message path + startup-resume path)
  - turn end: the central _release_running_agent_state() choke point
    (covers normal completion, /stop, /reset, sentinel cleanup,
    stale-eviction — every path that ends a running turn)

It passes ONLY active_agents to write_runtime_status, leaving
gateway_state (and every other field) _UNSET so the read-merge-write
preserves the current lifecycle state. Passing gateway_state=None would
clobber it — hence a dedicated helper rather than reusing
_update_runtime_status. The write is the same cheap JSON write done on
lifecycle transitions today; best-effort (a failed status write never
disrupts a turn).

Behaviour-contract test: an active_agents-only write preserves both
running and draining gateway_state, and the count clamps non-negative.
2026-06-21 17:22:52 +05:30
kshitij
44d552ea5a
Merge pull request #50115 from NousResearch/salvage/model-switch-preflight-warning
fix(cli): warn when in-session model switch will preflight-compress
2026-06-21 16:41:44 +05:30
kshitijk4poor
1ca29723f0 fix(cli): log instead of swallow preflight-warning errors; consistent TUI warning field
Follow-up to the salvaged preflight-compression warning:
- Replace silent `except Exception: pass` at all 5 guard call sites
  (cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with
  `logger.debug(...)` so signature drift in the guard helper isn't hidden.
- tui_gateway/server.py: set the confirm dict's `warning` field to the
  merged message (was bare expensive-model text) so it matches
  `confirm_message` for any future consumer reading `warning`.
- Add trailing newlines to the two new files.
2026-06-21 16:31:56 +05:30
Tuna Dev
04730f32e7 fix(cli): warn when in-session model switch will preflight-compress
Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard
pattern. When a user switches models mid-session (Herm TUI picker, CLI,
or /model on Telegram/Discord), the warning surfaces on the existing
ModelSwitchResult.warning_message path used by the expensive-model
guard if the new model's compression threshold is below the current
session size.

Partial fix for #23767 — addresses only the 'user-facing guardrail
when switching from a high-context provider to a substantially
lower-context provider' slice. The other proposed fixes from that
issue (hard preflight token guard, metadata cache invalidation on
switch, compression safety invariant, oversized tool-output handling)
are out of scope for this PR.
2026-06-21 16:29:31 +05:30
xxxigm
7b9a0b315b test(mcp): cover 'unknown method' ping keepalive fallback (#50028)
Two regression tests for the agentmemory reconnect-loop:

- _is_method_not_found_error matches the plain 'Unknown method: ping'
  phrasing (no structural -32601 code).
- _keepalive_probe latches _ping_unsupported and falls back to list_tools
  when send_ping raises 'Unknown method: ping', instead of propagating
  (which would reconnect-loop).
2026-06-21 16:02:56 +05:30
xxxigm
472c068159 fix(mcp): detect 'unknown method' phrasing in ping keepalive fallback
A server that doesn't implement the optional 'ping' utility answers a
keepalive ping with JSON-RPC method-not-found. _is_method_not_found_error
latches that condition so the probe falls back to list_tools instead of
reconnect-looping.

The substring fallback only matched 'method not found' / '-32601' /
'not found: ping'. Servers that surface method-not-found as the common
'Unknown method: <name>' phrasing without a structural -32601 code (e.g.
agentmemory's MCP server) slipped through, so the fallback never latched
and the keepalive reconnect-looped every cycle.

Add 'unknown method' to the substring fallback so the ping->list_tools
keepalive fallback latches for these servers too.

Fixes #50028.
2026-06-21 16:02:56 +05:30
kshitij
8ca38d3121
Merge pull request #50100 from kshitijk4poor/salvage/model-visibility-cross-provider-47450
fix(desktop): preserve other providers' hide-all in model visibility dialog (salvage #47450)
2026-06-21 15:56:00 +05:30
kshitijk4poor
461fcc0964 test(desktop): harden model-visibility toggle + dedupe default expansion
Follow-up to the salvaged #47450 fix:
- Extract expandProviderDefaults() so the curated-default expansion rule
  lives in one place (was duplicated between defaultVisibleKeys and
  resolveVisibleKeys).
- Drop the redundant new Set() wrap in toggleModelVisibility (resolveVisibleKeys
  already returns a fresh Set; effectiveVisibleKeys already relied on this).
- Document the intentional re-enable behavior (re-enabling one model of a
  hidden-all provider restores only that model, not the curated defaults) and
  tighten the toggleModelVisibility JSDoc.
- Add 7 hardening tests: re-enable-restores-only-that-model, full hide/re-enable
  round-trip, empty-non-null stored, single toggle-off from null defaults,
  zero-model provider, and direct resolveVisibleKeys null/empty assertions.
2026-06-21 15:46:58 +05:30
David Doan
8666fd7635 fix(desktop): preserve other providers' hide-all in model visibility dialog
#43496 added a per-provider hide-all sentinel ('provider::') so emptying a provider in the Edit Models dialog stopped re-expanding its defaults. That fixed the single-provider case, but the dialog's toggle handler seeds its working set from effectiveVisibleKeys(), which strips ALL sentinels before returning. So persisting after any toggle silently dropped every OTHER provider's hide-all sentinel; those providers then looked 'never customized' and re-enabled all their models on the next render.

Split resolution into two functions:

- resolveVisibleKeys(): stored keys + curated default expansion, with hide-all sentinels PRESERVED — the canonical working set the toggle handler mutates and persists.

- effectiveVisibleKeys(): resolveVisibleKeys() then strips sentinels, for display only (unchanged contract).

Move the toggle set-computation into a pure, unit-tested toggleModelVisibility() that seeds from resolveVisibleKeys(), so sibling sentinels survive the persist. Add regression tests that drive the real toggle handler across multiple providers.

Follow-up to #43496; completes the fix for #43485 (cross-provider case).
2026-06-21 15:42:26 +05:30
kshitij
f57ff7aef1
Merge pull request #50034 from NousResearch/salvage/cron-tz-offset-repair
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
Typecheck / typecheck (apps/bootstrap-installer) (push) Waiting to run
Typecheck / typecheck (apps/desktop) (push) Waiting to run
Typecheck / typecheck (apps/shared) (push) Waiting to run
Typecheck / typecheck (ui-tui) (push) Waiting to run
Typecheck / typecheck (web) (push) Waiting to run
Typecheck / desktop-build (push) Waiting to run
fix(cron): repair migrated timezone offsets to prevent double-fire
2026-06-21 13:53:28 +05:30
kshitij
f6a504d088
Merge pull request #50025 from NousResearch/salvage/cron-run-immediate
fix(cron): execute job immediately on action=run
2026-06-21 13:53:13 +05:30
kshitij
3051a1634c
Merge pull request #50023 from NousResearch/salvage/f3b-telegram-dmtopic
fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773)
2026-06-21 13:47:30 +05:30
kshitijk4poor
f43c61643d chore(release): add devsart95 to AUTHOR_MAP 2026-06-21 13:35:50 +05:30
kshitijk4poor
4cc28aa3bb fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773)
PR #22410 added three-mode Telegram topic routing to the live message path
(TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery
path never got it. cron/scheduler.py::_deliver_result sent through the live
adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone
_send_telegram, neither of which addresses Bot API Direct Messages topics
correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a
bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a
private DM topic landed in the General topic instead of the requested lane.

Fix: the cron live-adapter branch now routes the text send through the
gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path
live messages use — so it inherits all three Telegram routing modes:

  1. Forum/supergroup (negative chat_id) -> message_thread_id
  2. Bot API DM topics (private chat_id + numeric topic id) ->
     direct_messages_topic_id  (the case #22773 reported)
  3. Hermes-created named private DM-topic lanes -> ensure_dm_topic +
     reply anchor

For mode 2, a private-chat target with a numeric topic id is passed as
``direct_messages_topic_id`` metadata (verified end-to-end:
TelegramAdapter._thread_kwargs_for_send turns it into
``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a
bare message_thread_id. Forum/supergroup and home-channel deliveries are
unchanged. The standalone fallback (gateway down) is preserved.

No new config knob and no duplicated routing logic — this reuses the existing
DeliveryRouter rather than reimplementing topic routing in the cron path.

Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both
diagnosed the missing three-mode routing in the cron/standalone path;
reimplemented onto the canonical DeliveryRouter that landed since those PRs
were opened.

Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com>
Co-authored-by: devsart95 <devsart95@gmail.com>
2026-06-21 13:35:45 +05:30
Tranquil-Flow
f1f36b3bae fix(cron): repair migrated cron timezone offsets to prevent double-fire
A recurring cron job persists `next_run_at` as an absolute timestamp with a
UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however,
describe *local wall-clock* intent ("run at 21:00"). When Hermes/system
timezone changes after the timestamp was persisted, the stored instant is
re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`,
which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then
`compute_next_run` advances it via croniter to `21:00+02:00` the same day,
producing a SECOND fire. (#28934, recurrence of #24289.)

`_get_due_jobs_locked` now detects this precise migration case before the
due check: for a `cron` job whose converted instant looks due, whose stored
UTC offset differs from the current zone's, AND whose stored *wall-clock*
time is still in the future (distinguishing a migrated offset from a
genuinely missed run), it recomputes `next_run_at` from the schedule and
skips the early fire — preserving the local wall-clock intent.

Verified against the issue's reproducer: stored `21:00+10` under runtime
`+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of
firing early + again.

Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over
the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match)
because UTC-normalization does not change the absolute-instant comparison and
so does not fix the early fire, and this guard is the tightest: it only acts
when all four conditions hold and reuses the existing `compute_next_run`.

Fixes #28934
2026-06-21 13:31:31 +05:30
kshitij
02a3288de3
Merge pull request #50018 from NousResearch/salvage/f3a-delivery-confirm
fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014)
2026-06-21 13:29:45 +05:30
kyssta-exe
65d7c7fafd fix(cron): execute job immediately on action='run'
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now`
and returned success, relying on the scheduler ticker to actually execute the
job on its next tick. When no gateway/ticker is running — a CLI-only setup, or
the Windows case in #41037 — the job never executed: `run` reported success,
but `last_run_at` stayed null forever, no output, no delivery.

A manual `run` should actually run. `_execute_job_now` now:

- **claims the job via `claim_job_for_fire`** — the same at-most-once CAS the
  scheduler/external-provider fire path uses. This both advances `next_run_at`
  for recurring jobs and blocks a concurrently-running gateway ticker from
  double-firing the same job; if the claim is lost, the run is skipped (the
  tool reports `execution_skipped`). This closes the double-fire race that a
  bare `advance_next_run` left open (a tick whose `get_due_jobs` already
  captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`** — the single shared
  execute→save→deliver→mark body the ticker and external providers use — so
  failure delivery, `[SILENT]` handling, and live-adapter delivery stay
  identical across paths and can't drift. (The original salvage re-implemented
  this sequence inline and had already dropped failure delivery + `[SILENT]`.)

The tool response carries `executed`, `execution_success`, and either
`execution_error` or `execution_skipped`. The `hermes cron run` CLI message no
longer claims "It will run on the next scheduler tick" — it reports the actual
"Ran now: succeeded/failed" outcome (or the skip).

Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse
`claim_job_for_fire` + `run_one_job` per review rather than re-implementing the
fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip,
failure reporting, and exception capture.

Fixes #41037

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-21 13:28:04 +05:30
kshitij
9f4c0b27c9
Merge pull request #50016 from NousResearch/salvage/cron-ticker-liveness 2026-06-21 13:08:46 +05:30