12408 commits

Author SHA1 Message Date
Eugeniusz Gilewski
def3f6388f fix(file): anchor device symlink guard to task cwd
The read_file device guard now walks symlink hops before the file operation
layer, but that hop walk still interpreted relative paths against the Python
process cwd. In sessions where TERMINAL_CWD points at the task workspace, a
relative workspace symlink to a blocked alias such as /dev/../dev/stdin could
therefore miss the intermediate device target before later task-cwd resolution.

Anchor relative device checks to the task base before symlink-hop inspection so
the pre-I/O guard sees the same workspace path that read_file would otherwise
read. Absolute device paths and the existing final realpath fallback remain
unchanged.

Refs #10141
Refs #29158
2026-06-21 12:16:10 -07:00
teknium1
e267237671 test(photon): cover overflow retry, typing cooldown, sidecar-crash detection
Follow-up for salvaged PR #50256. Unit tests for the three behaviors:
retryable classification of Envoy/sidecar overflow strings, per-chat typing
cooldown with stop_typing reset, and the _supervise_sidecar crash-detection
path that raises a retryable fatal (and the clean-shutdown no-op).
2026-06-21 12:15:44 -07:00
joaomarcos
9578e52795 fix(photon): detect unexpected sidecar death and trigger reconnect
When the Node spectrum-ts sidecar process exited mid-session (crash,
OOM, upstream overflow escalation), _supervise_sidecar returned
silently — readline hit EOF, the log-pump loop broke, and nothing
notified the gateway. _inbound_loop entered an infinite retry loop
against a dead port, _running stayed True, and the adapter remained
in self.adapters with no path to self-recovery short of a manual
gateway restart.

Add a death-detection tail to _supervise_sidecar: after the log-pump
exits (EOF or exception), guard on _inbound_running to distinguish
unexpected death from a deliberate disconnect(). On unexpected exit,
call _set_fatal_error("SIDECAR_CRASHED", retryable=True) followed by
_notify_fatal_error() so the reconnect watcher picks up the platform
within 30 s and retries with exponential backoff (30 s → 300 s cap)
until the sidecar comes back up. All other platforms remain unaffected.

The _inbound_running guard is safe against races: disconnect() sets
_inbound_running = False before _stop_sidecar() cancels the supervisor
task. CancelledError is BaseException, not Exception, so it bypasses
the except clause and propagates normally — the detection block never
runs during a clean shutdown.
2026-06-21 12:15:44 -07:00
joaomarcos
2a4542333e fix(photon): classify Envoy overflow errors as retryable; add typing cooldown
Closes #50185

Two independent gaps let a transient Photon/Spectrum upstream overflow
degrade message delivery and amplify gRPC pressure:

1. _is_retryable_error did not recognise Photon- or Envoy-specific error
   strings ("internal sidecar error", "upstream connect error",
   "reset reason: overflow"), so _send_with_retry fell through to the
   plain-text fallback immediately instead of backing off and retrying.

2. send_typing had no rate gate, so a burst of typing-indicator calls
   during an overflow event kept hitting the upstream gRPC connection and
   widened the failure window.

Fix:
- Add _PHOTON_RETRYABLE_PATTERNS with the three high-specificity Envoy /
  sidecar substrings and override _is_retryable_error on PhotonAdapter to
  check them after delegating to the base-class patterns.  base.py and all
  other adapters are untouched.
- Add a 5 s per-chat cooldown in send_typing backed by _typing_last_sent.
  stop_typing clears the entry so the next start after a completed turn
  fires immediately — only rapid consecutive starts without a stop are
  suppressed.
- Reduce PhotonAdapter._send_with_retry default max_retries from 2 to 1
  (single 2 s back-off check) — enough to confirm whether the Envoy
  circuit-breaker has opened, without adding unnecessary latency.

All changes are scoped to plugins/platforms/photon/adapter.py.
2026-06-21 12:15:44 -07:00
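
The two Photon fixes above can be sketched as follows. This is a minimal illustration, not the adapter's code: the three substrings come from the commit message, while `is_retryable_error`, `TypingGate`, the case-insensitive match, and the injectable clock are hypothetical stand-ins chosen to keep the example testable.

```python
import time

# Substrings quoted in the commit message.
PHOTON_RETRYABLE_PATTERNS = (
    "internal sidecar error",
    "upstream connect error",
    "reset reason: overflow",
)


def is_retryable_error(message, base_check=lambda m: False):
    """Delegate to the base-class check first, then try the Photon/Envoy substrings."""
    if base_check(message):
        return True
    lowered = message.lower()  # case-insensitive matching is an assumption
    return any(pattern in lowered for pattern in PHOTON_RETRYABLE_PATTERNS)


class TypingGate:
    """Per-chat cooldown: rapid repeated starts are suppressed, a stop resets the gate."""

    def __init__(self, cooldown=5.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.last_sent = {}  # chat_id -> timestamp of the last typing call sent

    def should_send(self, chat_id):
        now = self.clock()
        last = self.last_sent.get(chat_id)
        if last is not None and now - last < self.cooldown:
            return False  # still inside the cooldown window
        self.last_sent[chat_id] = now
        return True

    def stop(self, chat_id):
        # Clearing the entry lets the next start fire immediately.
        self.last_sent.pop(chat_id, None)
```

Keeping the cooldown state in a plain dict keyed by chat id means one busy chat cannot delay typing indicators in another.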
Teknium
7a131f7f40 fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319)
* fix(api-server): stop silently promising async delivery on stateless HTTP path

terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.

There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:

- Add a per-adapter capability flag supports_async_delivery (default True;
  APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
  contextvar via async_delivery_supported(). Toggle on the adapter, not a
  hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
  notify_on_complete off, and return a notify_unsupported note telling the
  agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
  execution (work runs and returns in the same response) with a note, instead
  of handing out a handle that never resolves.

CLI (in-process completion_queue) and the real gateway platforms are unchanged.

Fixes #10760

* refactor(api-server): route session binding through a single no-delivery chokepoint

Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.

The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
2026-06-21 12:15:14 -07:00
JackJin
56255f83f7 fix(agent): stop delegate cascade from deleting the parent session
_collect_delegate_child_ids() walks the _delegate_from marker chain to
gather delegate subagents for cascade deletion, but started its visited
set empty. When the chain loops back onto a parent — a delegation cycle,
or a parent that is also another parent's delegate child when several ids
are deleted together — that parent was collected as one of its own
descendants and then permanently deleted, along with all of its messages,
by _delete_delegate_children().

Seed the visited set with the parent ids so they can never be re-collected,
and exclude them from the returned child set. Callers (delete_session,
bulk delete) remove the parents separately, so this only prevents the
unintended parent deletion; legitimate child collection is unchanged.

Add regression tests (in-memory sqlite) covering single/multi-level
delegate chains, the parent_session_id+marker branch, untagged children
(orphan-don't-delete contract), and the cycle case that previously leaked
the parent into the deletion set.

Fixes #49148
2026-06-21 12:09:16 -07:00
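
The seeded visited set described above can be illustrated with a small graph walk. This is a sketch under stated assumptions: the real `_collect_delegate_child_ids()` follows `_delegate_from` markers in the session store, whereas here `children_of` is a hypothetical in-memory mapping from a session id to its delegate children.

```python
def collect_delegate_child_ids(parent_ids, children_of):
    """Return delegate descendants of parent_ids, never the parents themselves."""
    visited = set(parent_ids)  # seeded, so a cycle cannot re-collect a parent
    collected = set()
    stack = list(parent_ids)
    while stack:
        current = stack.pop()
        for child in children_of.get(current, ()):
            if child in visited:
                continue  # already seen, or one of the parents being deleted
            visited.add(child)
            collected.add(child)
            stack.append(child)
    return collected
```

With an empty initial visited set, the cycle `p -> a -> b -> p` would put `p` into the result; seeding it with the parents keeps the returned set limited to true descendants.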
Teknium
e581740aa1 fix(kanban): single-writer dispatch lock to prevent orphan-dispatcher DB corruption (#50331)
A shell-launched 'hermes gateway run --replace' / 'gateway restart' on a
systemd/launchd host can leave an orphan gateway whose kanban dispatcher
escapes the service cgroup, survives 'systemctl restart', and becomes a
second long-lived writer on the shared kanban.db. Two dispatchers that each
believe they own the file both pass SQLite busy_timeout and then race on WAL
frames — the documented root cause of multi-writer corruption (issue #35240).

The existing _guard_supervised_gateway_conflict startup guard blocks the
common way an orphan is born, but does nothing once a second dispatcher
already exists. This adds the defense-in-depth: dispatch_once now wraps every
tick in a non-blocking, board-scoped flock (_dispatch_tick_lock). A losing
dispatcher returns DispatchResult(skipped_locked=True) and does zero DB writes
this tick — so two dispatchers can never run a reclaim/spawn/write sequence
concurrently regardless of how the second one got there.

- Non-blocking (LOCK_NB): never stalls the gateway's async watcher.
- Board-scoped: lock file is a .dispatch.lock sibling of each board's
  kanban.db, so unrelated boards tick in parallel.
- POSIX + Windows (fcntl / msvcrt LK_NBLCK), no-op degrade where neither
  exists — mirrors the existing _cross_process_init_lock pattern.

Verified with a real two-process orphan repro: while a separate process holds
the lock, dispatch_once skips; after release it runs.
2026-06-21 12:06:24 -07:00
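
A non-blocking, file-scoped lock of the kind described above might look like the sketch below. It covers only the POSIX `fcntl` branch; the Windows `msvcrt` path and the no-op degrade are omitted, and the function name and lock-file handling are illustrative assumptions rather than the project's `_dispatch_tick_lock`.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def dispatch_tick_lock(lock_path):
    """Yield True if this caller holds the lock for the tick, False if another holder exists."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    acquired = False
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            pass  # someone else is dispatching on this board right now
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A caller would wrap each tick in `with dispatch_tick_lock(path) as owned:` and skip all database writes when `owned` is False.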
Teknium
587b5b9ac2 fix(backup): capture memory-provider state stored outside HERMES_HOME (#50325)
hermes backup only walks HERMES_HOME, so memory providers that keep
config/credentials in home-anchored dotdirs (honcho -> ~/.honcho,
hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data
across a backup/import cycle — the peer IDs, session pairings, and API
keys never made it into the archive.

Add an optional MemoryProvider.backup_paths() hook (default []). The
active provider declares its external paths; backup resolves them from
config only (no init, no network), archives the ones under the home dir
into a reserved _external/ subtree encoded relative to home, and import
restores them to their original location with a home-anchored traversal
guard and 0600 on credential-shaped files. Paths outside home are
skipped as non-portable.

honcho, hindsight, and openviking override the hook. E2E-validated full
backup->import cycle plus 7 new tests.
2026-06-21 12:03:46 -07:00
Teknium
7a8c4fe238 chore(release): add AUTHOR_MAP entry for #48422 salvage
2026-06-21 12:03:24 -07:00
kn8-codes
6183e8ce1b fix(telegram): make Bot API 10.1 rich messages opt-in (default off)
Rich messages are not ready for prime time: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
2026-06-21 12:03:24 -07:00
Stephen Chin
3b56d3a29a fix(security): redact secrets in kanban tool payloads before persistence
2026-06-21 12:02:30 -07:00
Teknium
d19aabbf2d fix(gateway): persist in-flight transcript on restart/shutdown drain timeout (#50312)
A turn forcibly interrupted by the drain-timeout escalation never reaches
turn_finalizer.finalize_turn (the only place that flushes the turn to
state.db). Its in-flight tool rounds live only in the in-memory
_session_messages, so the immediate pre-restart turn was silently dropped
from load_transcript() on resume.

_finalize_shutdown_agents now flushes _session_messages to the SQLite
session store before teardown. The flush is idempotent (identity-tracked
in _flush_messages_to_session_db), so agents that finished gracefully
re-flush nothing. The resume_pending / fresh-tool-tail branches in
_handle_message_with_agent already expect a transcript whose tail may be a
pending tool result.

Fixes #13121.
2026-06-21 11:57:15 -07:00
sgaofen
93ea9b04af fix(gateway): cap inbound media download size to prevent memory exhaustion
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.

Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:

- cache_image/audio/video_from_bytes reject oversized payloads before writing
  (video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
  Content-Length header and re-checking the running total per chunk so an
  absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
  attachment is rejected before any bytes are pulled into memory.

Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.

Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-21 11:56:46 -07:00
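
The streaming check described above (reject on an oversized Content-Length, then re-check the running total per chunk) can be sketched independently of any HTTP client. The names `read_capped` and `MediaTooLarge` are hypothetical; `chunks` stands in for whatever iterator the real helpers stream the body from.

```python
class MediaTooLarge(Exception):
    """Raised when an inbound payload exceeds the configured cap."""


def read_capped(chunks, max_bytes, content_length=None):
    """Join chunks into bytes, refusing payloads above max_bytes (0 disables the cap)."""
    if max_bytes and content_length is not None and content_length > max_bytes:
        # Cheap early rejection when the header is present and honest.
        raise MediaTooLarge(f"declared size {content_length} exceeds cap {max_bytes}")
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if max_bytes and total > max_bytes:
            # Catches a missing or understated Content-Length.
            raise MediaTooLarge(f"body exceeded cap {max_bytes} while streaming")
        parts.append(chunk)
    return b"".join(parts)
```

Checking the running total before appending means at most one extra chunk is ever held in memory beyond the cap.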
teknium1
16899ae144 test(file): update guard assertions for unified display-text message
The salvaged #19820 unifies the write_file guard under
_is_internal_file_tool_content with the message 'internal read_file
display text'. Two tests added to test_file_read_guards.py after the PR
branch point still asserted the old 'status text' wording. Update them
to match the new (correct, more general) message.
2026-06-21 11:55:59 -07:00
Brandon Zarnitz
71274f264b fix(file): reject read_file line-numbered writeback
2026-06-21 11:55:59 -07:00
Teknium
a18bae65b9 fix(config): redact api_key in config show/set output (#50245) (#50313)
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.

- Add redact_config_value(): recursively masks credential-shaped keys
  (api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
  (config set model.api_key routes to config.yaml, lowercase misses the
  .env allowlist).
2026-06-21 11:50:31 -07:00
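
Structural key-name masking, as opposed to vendor-prefix regexes, can be sketched as a recursive walk. The key set and the `mask_secret` body below are assumptions made for illustration; the commit only states that credential-shaped keys are matched exactly and passed through `mask_secret`.

```python
# Hypothetical subset of credential-shaped key names, matched exactly.
CREDENTIAL_KEYS = {"api_key", "token", "secret"}


def mask_secret(value):
    """Stand-in for the project's mask_secret: hide the value entirely."""
    return "***"


def redact_config_value(value, key=None):
    """Return a copy of value with credential-shaped leaves masked."""
    if isinstance(value, dict):
        return {k: redact_config_value(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_config_value(item, key) for item in value]
    if key in CREDENTIAL_KEYS and value:
        return mask_secret(value)  # masked because of the key name, not the value shape
    return value
```

Because the decision depends on the key rather than the value, an opaque token with no recognisable prefix is still masked.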
Teknium
e0498bd305 fix(bedrock): price Claude prompt-cache tokens in /usage (#50307)
Bedrock Claude routes through the AnthropicBedrock SDK and injects
cache_control, so cached tokens are always reported — but the pricing
table had no cache cost fields for any Bedrock model, so /usage showed
"cost unknown" on every cached session. Also, cross-region inference
profiles (us./global./eu. prefixes) never matched the bare pricing keys.

- Add cache_read/cache_write rates to the four Bedrock Claude rows
  (read 0.1x input, write 1.25x input per the Bedrock pricing page).
- Normalize the cross-region prefix in the Bedrock pricing lookup,
  mirroring is_anthropic_bedrock_model's prefix list.

Closes #50295.
2026-06-21 11:48:43 -07:00
LehaoLin
7bc6f18062 fix(hindsight): skip local_embedded daemon when running as root
PostgreSQL's initdb refuses to run as root, so the embedded Hindsight
daemon could never initialize its data directory under root. The
daemon-start thread would fail, retry, and loop forever — each cycle
reloading embedding models (~958MB RAM, ~33% CPU) with no user-visible
error, leaving Hermes sluggish on a common VPS/cloud root setup.

initialize() now detects root (os.geteuid() == 0) before spawning the
daemon thread, disables local_embedded mode, and surfaces a clear
warning to both the log and the terminal so the user knows to run as a
non-root user or switch to cloud / local_external mode.

Closes #13125.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-06-21 11:47:02 -07:00
teknium1
d0de4601d2 fix(tui): /compress shows a before/after summary (#46686)
The TUI /compress slash side-effect compressed the session, synced the
key, and emitted session.info — but returned an empty string, so the
user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The
CLI (_manual_compress) and gateway (slash_commands) paths both already
call summarize_manual_compression; the TUI slash path was the lone gap.

Snapshot history + rough token estimate before and after compaction and
return the formatted summarize_manual_compression() feedback, mirroring
the session.compress RPC handler. The estimate uses the same
estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC
path, re-reading the system prompt after compaction (it may be rebuilt).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-21 11:36:09 -07:00
teknium1
9e4fe32d36 fix(session): opt the background-review fork out of session finalization
The background-review fork (fires ~every 10 turns) pins
review_agent.session_id = agent.session_id — the parent's LIVE id — for
prefix-cache parity, then calls close(). With session finalization now in
close(), that would end the still-active parent session mid-conversation.
Set _end_session_on_close = False on the fork so the real owner (CLI close /
gateway reset / cron) finalizes the session instead.

Follow-up to the #12029 fix.
2026-06-21 11:35:09 -07:00
yeyitech
b17180d950 fix(session): finalize owned SQLite session rows on AIAgent.close()
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.

end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.

Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.

Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.

Closes #12029.
2026-06-21 11:35:09 -07:00
teknium1
41e0c10f7e fix(agent): route repeated-compression warning through _emit_status (#36908)
The 'Session compressed N times — accuracy may degrade' warning went
through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord
never saw it — unlike the two other compression warnings in the same
module, which route through _emit_status (and store _compression_warning
for late-bound gateway status_callback replay).

Set agent._compression_warning + call agent._emit_status() for this
warning too, matching the sibling pattern. _emit_status still _vprints
for the CLI, so CLI output is unchanged; TUI / gateway surfaces now
receive it via status_callback (and replay_compression_warning can
re-deliver it once a late-bound gateway callback is wired).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-21 11:34:47 -07:00
konsisumer
3e354b61db fix(agent): preserve copilot routed headers
2026-06-21 11:29:49 -07:00
Teknium
b6a4638b6d fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297)
When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels)
returns a well-formed HTTP 200 whose summary content is null or empty/
whitespace-only, _generate_summary coerced it to "" and stored a prefix-only
summary — silently replacing the compacted turns with nothing. The model then
lost all in-progress context after compression (#11978, #11914).

_validate_llm_response already guards None / empty-choices, so those never
reach the compressor; the gap was a well-formed response with empty *content*.
Now treat empty content as a summary failure: raise so it routes through the
existing main-model fallback then transient cooldown, dropping the turns
without a summary rather than wiping context with an empty one.

Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider
configured' errors take the 600s no-provider cooldown; empty/invalid-response
RuntimeErrors from a configured provider now correctly get the main-model
fallback instead of being misrouted into the long no-provider cooldown.

Reported by @Hung2124; area identified by @annguyenNous in #39590.
2026-06-21 11:27:07 -07:00
Teknium
296b290f8f chore(release): add AUTHOR_MAP entry for de1tydev (#10158)
2026-06-21 11:11:23 -07:00
Teknium
41ba90f814 fix(process): keep CLI drain dedup after poll goes read-only (#10156)
Follow-up to @de1tydev's poll-read-only fix. Removing the
_completion_consumed.add() from poll() fixes the gateway/tui watcher
suppression (#10156) but reintroduces the CLI duplicate that #8228 fixed:
a notify_on_complete process always enqueues a completion event, and the
CLI idle/post-turn drain would re-inject it as a [SYSTEM: ...] message
even though the agent already saw the exit inline in its poll result.

Add a separate _poll_observed set that poll() populates on an observed
exit. drain_notifications() (CLI only) skips poll-observed sessions; the
gateway/tui watchers keep checking only is_completion_consumed, so a
read-only poll never suppresses their autonomous delivery turn.

- _poll_observed pruned alongside _completion_consumed in _prune_if_needed
- 4 tests: CLI drain dedup after poll, gateway gate untouched, running
  poll doesn't mark observed, wait/log still skip CLI drain
2026-06-21 11:11:23 -07:00
Liao Shiwu
6f5f58e34b fix: keep poll read-only for notify_on_complete watcher
2026-06-21 11:11:23 -07:00
Eugeniusz Gilewski
9078b4bbdf fix(file): harden read_file device alias blocking
Security-hardening fix for the read_file device guard, not a new sandbox
boundary. The guard already rejects direct device paths and upstream now
has a resolved-path pass for workspace symlinks to blocked devices, but
its concrete-path helper still compared the expanded path before
normalization. That leaves residual alias cases where the dangerous path
is visible before final terminal-specific resolution, for example:

  1. /dev/../dev/zero and /dev/./urandom should match the blocked-device
     list as concrete paths, not only after final realpath;
  2. /dev/stdin-style aliases can disappear once realpath follows them
     to /proc/self/fd/0 and then to a tty path;
  3. a user symlink to /dev/../dev/stdin exposes the dangerous
     intermediate target before final resolution, but not necessarily
     after it.

Normalize expanded paths before matching and inspect each symlink hop
before falling back to realpath. This preserves the existing /proc fd and
/proc pseudo-file guards while enforcing the intended security invariant:
model-supplied read paths must not reach blocking or infinite device
streams through spelling, normalization, or symlink-hop tricks.

Classification: security hardening / residual bypass fix for the
read_file device blocklist. This is defensive code at the file-tool
boundary, but it fixes a concrete denial-of-service class tracked as
security in #10141 and #29158.

Tests:
  - normalized /dev/../dev/zero and /dev/./urandom aliases
  - symlink to /dev/../dev/stdin blocked before realpath
  - existing symlink-to-device and regular-symlink guards still pass

Fixes #10141
Fixes #29158
2026-06-21 11:11:19 -07:00
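
The normalize-then-walk-hops idea above, together with the later task-cwd anchoring from def3f6388f, can be sketched as below. The blocklist contents, the function name, and the hop limit are illustrative assumptions, and the final realpath fallback and /proc guards mentioned in the commit are left out.

```python
import os

# Illustrative subset; the real blocklist is defined by the project.
BLOCKED_DEVICES = {"/dev/zero", "/dev/urandom", "/dev/random", "/dev/stdin"}


def is_blocked_device(path, base_dir, max_hops=40):
    """Check a read path against the blocklist at every symlink hop."""
    # Anchor relative paths to the task base, not the process cwd.
    current = path if os.path.isabs(path) else os.path.join(base_dir, path)
    for _ in range(max_hops):
        # Lexical normalization collapses /dev/../dev/zero and /dev/./urandom.
        current = os.path.normpath(current)
        if current in BLOCKED_DEVICES:
            return True
        if not os.path.islink(current):
            return False
        target = os.readlink(current)
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        current = target  # inspect the intermediate target before resolving further
    return True  # hop limit exceeded: treat the path as unsafe
```

Checking each hop matters because an alias such as /dev/stdin can resolve onward to a tty path, at which point a comparison against the blocklist no longer matches.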
tt-a1i
ea056b0559 fix(telegram): avoid rich messages for CJK text
Telegram Mac/Desktop Bot API 10.1 rich-message rendering leaves garbled
overlapping draft/overlay glyphs for CJK text (#47653), affecting every
message containing CJK characters. The legacy MarkdownV2 path renders the
same text cleanly, so skip the rich send / draft / final-edit paths up
front for content containing CJK (incl. astral-plane extensions) until
affected clients age out. Non-CJK rich rendering is preserved.

Fixes #47653
2026-06-21 11:10:37 -07:00
brooklyn!
65a477f12e feat(desktop): add Update now button to About panel (#50186)
2026-06-21 11:34:45 -05:00
teknium1
2f4f23fbfb fix(codex): bridge app-server item/started events to Telegram tool-progress (#38835)
When the main provider is the Codex app-server runtime (api_mode
codex_app_server), the gateway showed no verbose 'running X' tool-progress
breadcrumbs on Telegram while every other provider did. The app-server
session processes item/started notifications (command execution, file
changes, MCP/dynamic tool calls) but never surfaced them as Hermes
tool-progress events — the session was constructed without an on_event
hook, so the agent's tool_progress_callback was never invoked on this
route.

Add _codex_note_to_tool_progress() mapping item/started → (tool_name,
preview, args) for commandExecution / fileChange / mcpToolCall /
dynamicToolCall, and wire an on_event hook into CodexAppServerSession that
forwards mapped events to agent.tool_progress_callback('tool.started',
...) — the same signature the chat_completions path uses (tool_executor.py).
Non-tool items (agentMessage/reasoning) and non-item/started methods map
to None and are ignored.

Co-authored-by: jplew <462836+jplew@users.noreply.github.com>
2026-06-21 08:46:06 -07:00
yeyitech
8a506ed3ac fix(auth): make load_pool() non-destructive for env-seeded credentials
load_pool() is meant to be a read, but it persistently pruned env-seeded
pool entries whenever the calling process's os.environ lacked the seeding
var. A process without MINIMAX_API_KEY would delete the persisted
env:MINIMAX_API_KEY entry from auth.json for every other process, causing
auth.json to oscillate and auxiliary auto-detect to fall through to the
wrong provider.

env:* entries are persisted references re-hydrated from the environment on
each load — a missing var means "cannot re-seed right now", not "source is
gone forever". _prune_stale_seeded_entries now gates env-source removal
behind prune_env_sources (default True for explicit cleanup paths);
load_pool() passes prune_env_sources=False. File-backed singletons
(device-code OAuth, hermes_pkce) still prune when their backing file is
gone, and explicit removal via `hermes auth remove` (source suppression)
is unaffected.

Fixes #9331.

Co-authored-by: houko <suzukaze.haduki@gmail.com>
2026-06-21 08:26:37 -07:00
Teknium
a966932392 fix(telegram): exempt tables from rich newline hard-breaks
The newline normalization is the shared chokepoint for every rich send
(sendRichMessage, draft, and editMessageText). Injecting a Markdown hard
break (two trailing spaces) into a GFM table row separator corrupts the
natively-rendered table — the rich path's headline feature. Protect both
fenced code blocks AND pipe-table blocks as bare regions; only prose
between them gets hard breaks. Verified RICH_CONTENT and the existing
rich-table tests stay byte-identical.
2026-06-21 08:26:28 -07:00
Tranquil-Flow
31e59fe44d fix(telegram): preserve newlines in rich slash-command output (#46070)
Bot API 10.1 sendRichMessage treats a lone newline as a soft break, so
multi-line content joined with "\n".join(lines) — slash-command lists,
etc. — collapses into a single paragraph. Normalize single newlines to
Markdown hard breaks (two trailing spaces) in _rich_message_payload,
leaving paragraph breaks and fenced code blocks untouched.

Fixes #46070
2026-06-21 08:26:28 -07:00
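
The newline normalization described in this commit and in the table follow-up a966932392 can be approximated line by line. This is a simplified sketch: it treats any line starting with `|` as a table row and toggles fence state on lines starting with three backticks, which is cruder than a real Markdown parser, and the function name is hypothetical.

```python
def hard_break_newlines(text):
    """Turn single newlines in prose into Markdown hard breaks (two trailing spaces)."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # fence delimiters are passed through untouched
            out.append(line)
            continue
        is_last = i == len(lines) - 1
        next_blank = not is_last and lines[i + 1].strip() == ""
        untouched = (
            in_fence                      # code block content
            or is_last                    # no newline follows the last line
            or stripped == ""             # part of a paragraph break
            or next_blank                 # line already ends a paragraph
            or stripped.startswith("|")   # pipe-table row or separator
            or line.endswith("  ")        # already a hard break
        )
        out.append(line if untouched else line + "  ")
    return "\n".join(out)
```

For example, a three-line command list joined with single newlines gains trailing spaces on its first two lines, while a pipe table passes through unchanged.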
Teknium
03563dabac fix(gateway): raise session-hygiene hard message limit 400 → 5000 (#50194)
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.

The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.

The value remains fully configurable via compression.hygiene_hard_message_limit.
2026-06-21 08:26:19 -07:00
teknium1
3509be7124 fix(compression): auto-compression triggers at minimum context length (#14690)
The compaction threshold is max(context_length * threshold_percent,
MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on
large models, but degenerates at small windows: a model at exactly 64000
ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE
window. should_compress() can then never fire, because the provider
rejects the request before usage reaches 100%. Auto-compression silently
never triggers for any model whose context_length <= MINIMUM /
threshold_percent (e.g. 64K-per-slot local models).

Centralize the calc in _compute_threshold_tokens(). When the floor would
meet or exceed the context window, trigger at 85% of the window
(_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses
most of its budget before compacting (compacting at the 50% percentage
would waste half the small window), but below 100% so compaction actually
fires before the provider rejects the request. This mirrors the existing
gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at
64000) is unchanged; both call sites (__init__ and update_model) use the
shared helper.

Co-authored-by: soynchux <soynchuux@gmail.com>
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
2026-06-21 07:53:14 -07:00
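
The threshold arithmetic in this commit can be worked through with a short sketch. The constants 64000 and 85% come from the message; the function signature and the default `threshold_percent=0.5` (implied by the `max(32000, 64000)` example) are assumptions, not the actual `_compute_threshold_tokens()`.

```python
MINIMUM_CONTEXT_LENGTH = 64_000
MIN_CTX_TRIGGER_RATIO = 0.85


def compute_threshold_tokens(context_length, threshold_percent=0.5):
    """Token count at which auto-compression should fire."""
    floored = max(round(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)
    if floored >= context_length:
        # The floor would equal or exceed the whole window and could never be
        # reached, so trigger at 85% of the window instead.
        return round(context_length * MIN_CTX_TRIGGER_RATIO)
    return floored
```

Under these assumptions a 64000-token model compacts at 54400 tokens, a 200000-token model still compacts at 100000, and a 100000-token model keeps the 64000 floor.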
kshitij
c6a0929875 Merge pull request #50137 from NousResearch/fix/reset-calibration-on-model-switch
fix(agent): reset stale token calibration on model switch (#23767)
2026-06-21 20:02:08 +05:30
kshitij
ed8f7898b9 Merge pull request #50136 from NousResearch/fix/context-aware-tool-budget
fix(agent): scale tool-output budget to the model context window (#23767)
2026-06-21 20:01:32 +05:30
liuhao1024
6984026f12 fix(browser): enable SSRF guard when terminal runs in container
When terminal.backend is docker/modal/daytona/ssh/singularity, the
terminal runs in a sandboxed container with network isolation, but the
browser still runs on the host.  The SSRF guard was skipped because
_is_local_backend() only checked browser.cloud_provider, not the
terminal backend.

Now _is_local_backend() also checks TERMINAL_ENV — when the terminal
is containerized, the browser is treated as non-local and SSRF
protection is enabled.

Fixes #38690
2026-06-21 07:26:18 -07:00
bogerman1
c7e8854cb3 fix(tui): persist session messages on force-quit / signal shutdown
Mirror the CLI's exit-path behaviour in the TUI gateway so that
unpersisted conversation messages are flushed to state.db and the
on_session_end plugin hook fires before the session is closed.

Root cause: _finalize_session() only called db.end_session() to
mark the session row as ended, but did NOT flush in-memory messages
via _persist_session() or fire the on_session_end hook.  When the
user force-quit (double Ctrl-C, terminal-close, SIGHUP) while the
agent was mid-turn, messages accumulated since the last persist
point were silently lost.

Changes
-------
tui_gateway/server.py - _finalize_session():
  - Persist unflushed messages via agent._persist_session() before
    db.end_session(). Prefers agent._session_messages (set by the
    last _persist_session call inside run_conversation) over
    session['history'] (stale when agent is mid-turn).
  - Fire on_session_end(interrupted=True) plugin hook so crash-
    recovery plugins can flush buffers, matching cli.py behaviour.

tui_gateway/entry.py - _log_signal():
  - Explicitly call _shutdown_sessions() before sys.exit(0) in the
    SIGHUP/SIGTERM handler as belt-and-suspenders over atexit.

tests/tui_gateway/test_finalize_session_persist.py (new):
  - 11 tests covering: history persistence, _session_messages
    priority, empty-history skip, missing-agent, double-finalize,
    persist-exception resilience, hook firing, hook-exception
    resilience, and db.end_session preservation.

Related
-------
Closes the TUI half of #5021 (CLI already handles this via its
atexit handler).  Also addresses the session-persistence gap
discussed in #18465 and #18269.
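
The finalize order described above can be sketched roughly as follows. This is an illustration under assumptions: the `finalize_session` signature, the `finalized` flag, the `fire_hook` callable, and passing messages to `_persist_session` as an argument are all hypothetical, not the actual tui_gateway code.

```python
import logging

logger = logging.getLogger(__name__)


def finalize_session(session, db, fire_hook):
    """Flush unpersisted messages, fire the end hook, then end the session row."""
    if session.get("finalized"):
        return  # a second finalize is a no-op
    session["finalized"] = True

    agent = session.get("agent")
    if agent is not None:
        # Prefer the agent's own list; session history can be stale mid-turn.
        messages = getattr(agent, "_session_messages", None) or session.get("history")
        if messages:
            try:
                agent._persist_session(messages)
            except Exception:
                logger.exception("persist failed during finalize")

    try:
        fire_hook("on_session_end", interrupted=True)
    except Exception:
        logger.exception("on_session_end hook failed")

    # Always mark the session ended, even if the steps above failed.
    db.end_session(session["id"])
```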
2026-06-21 07:26:07 -07:00
Teknium
e499d69e3e
feat(api-server): configurable concurrent-run cap to prevent DoS (#50007)
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).

- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
  0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
  counting both the chat/responses in-flight counter and the /v1/runs
  stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.

Closes #7483.
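
A minimal sketch of a shared gate of this kind, assuming a hypothetical `RunGate` class and response helper. The error body fields are illustrative; only the default of 10, the "0 disables" rule, and the 429 + Retry-After response come from the message above.

```python
import threading


class RunGate:
    """Counts in-flight runs across endpoints; a cap of 0 disables the limit."""

    def __init__(self, max_concurrent_runs=10):
        self.max_concurrent_runs = max_concurrent_runs
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot, or return False when the cap is reached."""
        with self._lock:
            capped = self.max_concurrent_runs > 0
            if capped and self._in_flight >= self.max_concurrent_runs:
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Free a slot; never lets the counter go negative."""
        with self._lock:
            self._in_flight = max(0, self._in_flight - 1)


def too_many_runs_response(retry_after_seconds=1):
    """Build a 429 as (status, headers, body); body fields are illustrative."""
    body = {"error": {"message": "Too many concurrent runs", "code": "rate_limit_exceeded"}}
    return 429, {"Retry-After": str(retry_after_seconds)}, body
```

Each endpoint would call `try_acquire()` before starting a run and `release()` in a `finally` block once the run ends.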
2026-06-21 07:26:03 -07:00
Hariharan Ayappane
99233faf78
fix(cli): persist sessions before shutdown
2026-06-21 07:25:56 -07:00
Teknium
9f67ba1b01
fix(agent): guard finalize_turn cleanup chain so it never drops the response (#50009)
When a turn hit max_iterations, finalize_turn ran three unguarded cleanup
steps after the model's summary — _save_trajectory (file I/O), _cleanup_task_resources
(remote VM/browser teardown), and _persist_session (SQLite write). Any raise
there propagated out of run_conversation, discarding the partial final_response
the caller was waiting for; subprocess wrappers saw an empty stdout with no
traceback (#8049).

Each step is now guarded independently so one failure can't skip the others.
Failures log at ERROR with a traceback and are surfaced on the result dict via
cleanup_errors; the partial response is always returned.

Closes #8049.
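
The independent guarding can be sketched as below. `run_cleanup_steps` and its `(name, callable)` step list are hypothetical; only the `cleanup_errors` key and the ERROR-level logging with a traceback are taken from the message above.

```python
import logging

logger = logging.getLogger(__name__)


def run_cleanup_steps(result, steps):
    """Run each (name, callable) step; record failures instead of raising."""
    errors = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Log with traceback, then keep going so later steps still run.
            logger.error("cleanup step %s failed", name, exc_info=True)
            errors.append(f"{name}: {exc}")
    if errors:
        result["cleanup_errors"] = errors
    return result
```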
2026-06-21 07:25:42 -07:00
miha
796f618f99
fix(telegram): keep chunk markers outside code fences
When truncate_message appends a (N/M) chunk indicator to a chunk that
had to close an in-progress fenced code block, the marker lands on the
closing fence line (``` \(1/2\) after MarkdownV2 escaping). Telegram
does not treat that as a clean closing fence and rejects the MarkdownV2,
falling back to plain text. Move the indicator onto its own line right
after the closing fence at all three legacy-send call sites.

Fixes #48517
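
A small sketch of the marker placement, assuming a hypothetical helper `append_chunk_marker`. How the real call sites join the marker to a chunk that does not end in a fence is not stated; a single space is assumed here.

```python
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def append_chunk_marker(chunk, index, total):
    """Append a (N/M) marker; use a separate line if the chunk ends with a fence."""
    marker = f"({index}/{total})"
    stripped = chunk.rstrip()
    if stripped.endswith(FENCE):
        # Keep the closing fence alone on its line so it still parses as a fence.
        return f"{stripped}\n{marker}"
    return f"{stripped} {marker}"
```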
2026-06-21 07:25:37 -07:00
kshitijk4poor
1e0b3a2bcc
fix(agent): reset stale token calibration on model switch (#23767)
ContextCompressor.update_model() recomputed context_length/threshold/budgets
but kept the cross-call calibration state (last_real_prompt_tokens,
last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens,
awaiting_real_usage_after_compression, _ineffective_compression_count) from the
PREVIOUS model.

Those fields encode 'the provider proved this prompt fit' / 'preflight can be
deferred' decisions valid only for the model that produced them. Carried across
a switch to a smaller-context model, should_defer_preflight_to_real_usage() used
the old model's 'it fit' history to SKIP a preflight compression the new model
actually needed — sending an oversized prompt the provider rejects (#23767).

update_model() now clears that state; the new model's first response repopulates
it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer
no longer suppresses and should_compress fires on an over-threshold estimate.
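
The reset can be illustrated with a stripped-down stand-in for the compressor. The `CalibrationState` class, the field types, and the zero/False reset values are assumptions; only the field names come from the message above.

```python
from dataclasses import dataclass


@dataclass
class CalibrationState:
    """Stand-in holding only the calibration fields named in the commit."""

    context_length: int
    last_real_prompt_tokens: int = 0
    last_rough_tokens_when_real_prompt_fit: int = 0
    last_compression_rough_tokens: int = 0
    awaiting_real_usage_after_compression: bool = False
    _ineffective_compression_count: int = 0

    def update_model(self, context_length: int) -> None:
        self.context_length = context_length
        # Evidence gathered from the previous model does not apply to the new one.
        self.last_real_prompt_tokens = 0
        self.last_rough_tokens_when_real_prompt_fit = 0
        self.last_compression_rough_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self._ineffective_compression_count = 0
```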
2026-06-21 17:46:58 +05:30
kshitijk4poor
1965d56219
fix(agent): scale tool-output budget to the model context window (#23767)
The tool-result persistence budget was a fixed 100K chars/result and 200K
chars/turn regardless of the active model. On a small-context model (e.g. a
65K-token local model switched to mid-session) a single large tool result
(reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens)
could by itself approach or exceed the window, forcing an oversized request
that the provider rejects as "Prompt too long".

- budget_config.budget_for_context_window() scales per-result/per-turn char
  caps to a fraction of the model window, clamped to the historical 100K/200K
  defaults (large models unchanged) and floored so small models stay usable.
- resolve_threshold() now caps the per-tool registry value at default_result_size
  so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate
  a scaled-down budget. No-op for the default budget (both 100K).
- tool_executor wires the agent's live context_length (recomputed on model
  switch) into all four persist/turn-budget call sites.

read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result
against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical
to today.
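
A rough sketch of the scale-then-clamp idea. The divisors and floors below are invented for illustration and are not the values used in budget_config; only the 100K/200K defaults and the roughly 4 chars per token ratio (200K chars is about 50K tokens) come from the message above. The sketch does not attempt to reproduce the preview sizes quoted in the verification note.

```python
DEFAULT_RESULT_CHARS = 100_000  # historical per-result cap
DEFAULT_TURN_CHARS = 200_000    # historical per-turn cap
CHARS_PER_TOKEN = 4             # implied by "200K-char turn (~50K tokens)"
RESULT_DIVISOR = 8              # assumption: a result may use 1/8 of the window
TURN_DIVISOR = 4                # assumption: a turn may use 1/4 of the window
MIN_RESULT_CHARS = 2_000        # assumption: floor so small models stay usable
MIN_TURN_CHARS = 8_000          # assumption


def budget_for_context_window(context_length):
    """Return (per_result_chars, per_turn_chars) for a window given in tokens."""
    window_chars = context_length * CHARS_PER_TOKEN
    per_result = window_chars // RESULT_DIVISOR
    per_turn = window_chars // TURN_DIVISOR
    # Clamp to the historical defaults, then apply the floor.
    per_result = max(MIN_RESULT_CHARS, min(DEFAULT_RESULT_CHARS, per_result))
    per_turn = max(MIN_TURN_CHARS, min(DEFAULT_TURN_CHARS, per_turn))
    return per_result, per_turn


def resolve_threshold(registry_value, default_result_size):
    """Cap a tool's registered size so it cannot exceed the scaled budget."""
    return min(registry_value, default_result_size)
```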
2026-06-21 17:46:38 +05:30
kshitij
5aec00f7a9
Merge pull request #50131 from kshitijk4poor/salvage/gateway-busy-readout-50103
feat(gateway+dashboard): busy/idle readout for safe lifecycle actions (salvage #50103)
2026-06-21 17:39:26 +05:30
kshitijk4poor
4d7bb382b0
refactor(gateway): route all active_agents coercion through parse_active_agents; harden drain-timeout fallback
Second cleanup pass (simplify-code review of the first follow-up):

- write_runtime_status now clamps active_agents via parse_active_agents
  instead of an inline max(0, int(...)). Removes the duplicated clamp the
  helper's docstring acknowledged AND closes a write-side ValueError gap
  (a non-numeric active_agents previously raised; now degrades to 0).
- hermes_cli/gateway.py draining-status line routes its active-agents count
  through parse_active_agents too — the third coercion site of the same
  persisted field, now consistent and non-raising with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
  ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
  (a real float) instead of a blanket 'except Exception -> None'. None would
  have violated the surfaced field's int/float contract and stripped NAS's
  poll-deadline hint silently.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
  handles the empty/None case) and tightened the parse_active_agents docstring
  to describe the actual single-contract role (write + both reads).
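
A minimal sketch of a coercion helper with the contract described above (non-negative, non-raising, 0 on empty or invalid input). The body is an assumption; the actual gateway.status implementation is not shown in this message.

```python
def parse_active_agents(value):
    """Coerce a persisted active_agents value to a non-negative int."""
    try:
        return max(0, int(value))
    except (TypeError, ValueError, OverflowError):
        # None, empty strings, non-numeric text, and infinities degrade to 0.
        return 0
```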
2026-06-21 17:22:52 +05:30
kshitijk4poor
b577f25100
refactor(gateway): dedupe drain-timeout resolution + share active_agents parse
Follow-up cleanups on top of the busy/idle readout (PR #50103):

- web_server.py /api/status reused the single drain-timeout resolver
  hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
  env -> agent.restart_drain_timeout config -> default) instead of inlining a
  third hand-rolled copy of that precedence chain. Also fixes a subtle
  divergence: the inline copy used os.environ.get() so a set-but-empty env var
  was treated as a value rather than falling through to config; the shared
  resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
  (/api/status and /health/detailed) through it, so the exposed active_agents
  field is consistently clamped non-negative. Previously /api/status clamped
  while /health/detailed exposed the raw file value, diverging on a corrupt
  count.
- Added TestParseActiveAgents covering the shared coercion contract.
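
The precedence chain and the empty-env fall-through can be sketched like this. `resolve_drain_timeout`, its parameters, and the placeholder default are hypothetical, and skipping an unparsable value is an assumption; the real resolver is `hermes_cli.gateway._get_restart_drain_timeout()`, whose default is not stated here.

```python
import os

DEFAULT_DRAIN_TIMEOUT = 60.0  # placeholder; the real default is not stated


def resolve_drain_timeout(config, env=None):
    """Resolve the drain timeout: env var, then config, then default."""
    env = os.environ if env is None else env
    # Strip first so a set-but-empty variable falls through to config.
    raw = (env.get("HERMES_RESTART_DRAIN_TIMEOUT") or "").strip()
    from_config = (config.get("agent") or {}).get("restart_drain_timeout")
    for candidate in (raw or None, from_config):
        if candidate is None:
            continue
        try:
            return float(candidate)
        except (TypeError, ValueError):
            continue  # assumption: an unparsable value is skipped, not fatal
    return DEFAULT_DRAIN_TIMEOUT
```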
2026-06-21 17:22:52 +05:30
Ben
0ee75469d7
feat(dashboard): surface gateway busy/drainable on /api/status
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.

Add to /api/status (additive, non-breaking):
  - active_agents       — in-flight gateway-turn count (now refreshed
                          per-turn by the companion gateway-side commit)
  - gateway_busy        — running AND active_agents > 0
  - gateway_drainable   — running and live (a valid begin-drain target)
  - restart_drain_timeout — resolved seconds, so the consumer can size its
                          poll deadline without out-of-band knowledge
                          (env HERMES_RESTART_DRAIN_TIMEOUT → config
                          agent.restart_drain_timeout → default)

The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.

Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
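
One possible reading of the busy/drainable contract, as a sketch. The function signatures and the `"running"` state string are assumptions, as is treating a draining gateway as neither busy nor drainable; only the function names and the rule that liveness comes from `gateway_running` are taken from the message above.

```python
def derive_gateway_drainable(gateway_running, state):
    """True when the gateway is live and in its normal running state."""
    return bool(gateway_running) and state == "running"


def derive_gateway_busy(gateway_running, state, active_agents):
    """True when the gateway is live, running, and has in-flight turns."""
    return derive_gateway_drainable(gateway_running, state) and active_agents > 0
```

Keying both functions off the live probe means a stale status file that still reports a positive count cannot produce a spurious "busy" once the process is gone.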
2026-06-21 17:22:52 +05:30