hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 10:52:21 +00:00

Author	SHA1	Message	Date
Brooklyn Nicholson	a61baa9615	feat(desktop): PR-style file diffs in chat Render write_file/edit_file/patch as a reviewable diff instead of raw result JSON, closer to a Cursor/T3 per-edit review. - Unified diff via FileDiffPanel: strip git file-header + @@ hunk noise, drop the +/- gutter, color by line with a 2px gutter accent, full-bleed to the card, transparent context lines, compact scroll height. - Header shows filename + language icon + +N/-N stats; full path moves to a hover tooltip (no Edited verb, no ms). - Treat the three file-edit tools uniformly (isFileEditTool); read diff from inline_diff or patch's diff field; suppress raw-arg detail. - Reusable FileTypeIcon primitive sharing the code-block icon mapping (codiconForFilename), codicon fallback. - Per-row scaffolding fade (not the group wrapper, which trapped child opacity); expanded edits stay full, collapsed fade; keyboard-only focus lift. Hide diff-less rehydrated creates that read as dupes.	2026-06-22 05:04:13 -05:00
Brooklyn Nicholson	fb3d31ba8b	feat(desktop): add Update now button to About panel The About > Updates panel only surfaced "See what's new" when an update was available, which just opens the changelog overlay — there was no way to start the install directly from About. Add an "Update now" primary button that opens the updates overlay (for apply progress) and kicks off the install for the active target (backend in remote mode, else client).	2026-06-21 09:26:31 -05:00
kshitij	5aec00f7a9	Merge pull request #50131 from kshitijk4poor/salvage/gateway-busy-readout-50103 feat(gateway+dashboard): busy/idle readout for safe lifecycle actions (salvage #50103)	2026-06-21 17:39:26 +05:30
kshitijk4poor	4d7bb382b0	refactor(gateway): route all active_agents coercion through parse_active_agents; harden drain-timeout fallback Second cleanup pass (simplify-code review of the first follow-up): - write_runtime_status now clamps active_agents via parse_active_agents instead of an inline max(0, int(...)). Removes the duplicated clamp the helper's docstring acknowledged AND closes a write-side ValueError gap (a non-numeric active_agents previously raised; now degrades to 0). - hermes_cli/gateway.py draining-status line routes its active-agents count through parse_active_agents too — the third coercion site of the same persisted field, now consistent and non-raising with the two HTTP surfaces. - web_server.py /api/status: the drain-timeout resolver fallback now catches ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT (a real float) instead of a blanket 'except Exception -> None'. None would have violated the surfaced field's int/float contract and stripped NAS's poll-deadline hint silently. - Dropped a redundant 'if runtime else 0' branch (parse_active_agents already handles the empty/None case) and tightened the parse_active_agents docstring to describe the actual single-contract role (write + both reads).	2026-06-21 17:22:52 +05:30
kshitijk4poor	b577f25100	refactor(gateway): dedupe drain-timeout resolution + share active_agents parse Follow-up cleanups on top of the busy/idle readout (PR #50103): - web_server.py /api/status reused the single drain-timeout resolver hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT env -> agent.restart_drain_timeout config -> default) instead of inlining a third hand-rolled copy of that precedence chain. Also fixes a subtle divergence: the inline copy used os.environ.get() so a set-but-empty env var was treated as a value rather than falling through to config; the shared resolver .strip()s and falls through correctly. - Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces (/api/status and /health/detailed) through it, so the exposed active_agents field is consistently clamped non-negative. Previously /api/status clamped while /health/detailed exposed the raw file value, diverging on a corrupt count. - Added TestParseActiveAgents covering the shared coercion contract.	2026-06-21 17:22:52 +05:30
Ben	0ee75469d7	feat(dashboard): surface gateway busy/drainable on /api/status Give an external consumer (NAS) a trustworthy, always-reachable busy/idle readout it can poll before a disruptive lifecycle action (restart, migrate, stop, auto-update). The dashboard /api/status is the only HTTP surface guaranteed up on a hosted agent regardless of which gateway platforms are enabled, and it already reads gateway_state.json. Add to /api/status (additive, non-breaking): - active_agents — in-flight gateway-turn count (now refreshed per-turn by the companion gateway-side commit) - gateway_busy — running AND active_agents > 0 - gateway_drainable — running and live (a valid begin-drain target) - restart_drain_timeout — resolved seconds, so the consumer can size its poll deadline without out-of-band knowledge (env HERMES_RESTART_DRAIN_TIMEOUT → config agent.restart_drain_timeout → default) The busy/drainable contract is defined once in gateway.status (derive_gateway_busy / derive_gateway_drainable) and consumed by both /api/status and /health/detailed so the two surfaces can never disagree. Liveness keys off gateway_running (a live PID/health probe), NEVER gateway_updated_at — a healthy idle gateway never advances that timestamp. All derived fields degrade to safe falsy values when the gateway is down or the status file is absent/corrupt (never a spurious "busy" that would wedge the consumer). active_sessions (the 5-min DB recency heuristic the SPA reads) is left exactly as-is — new signal, new fields. Tests (behaviour contracts, not snapshots): the pure derivation contract across every running/state/count/liveness combination; /api/status integration for busy, idle-drainable, draining, down, stale-busy-file, corrupt-count, and timeout surfacing; and /health/detailed parity.	2026-06-21 17:22:52 +05:30
Ben	51a338a1b6	feat(gateway): track active_agents in runtime status on turn boundaries The gateway only rewrote gateway_state.json on lifecycle transitions (start/connect/drain/stop), never on turn start/end. Live-verified on a hosted agent: a confirmed end-to-end turn ran while gateway_updated_at stayed frozen at boot and active_agents was absent — so any active_agents read from the file between transitions is stale. That makes it unusable as a busy/idle signal for an external consumer (NAS deciding whether it's safe to restart/migrate/auto-update an agent mid-turn). Add _persist_active_agents(), called at every turn boundary: - turn start: both running-agent sentinel-claim sites (normal inbound message path + startup-resume path) - turn end: the central _release_running_agent_state() choke point (covers normal completion, /stop, /reset, sentinel cleanup, stale-eviction — every path that ends a running turn) It passes ONLY active_agents to write_runtime_status, leaving gateway_state (and every other field) _UNSET so the read-merge-write preserves the current lifecycle state. Passing gateway_state=None would clobber it — hence a dedicated helper rather than reusing _update_runtime_status. The write is the same cheap JSON write done on lifecycle transitions today; best-effort (a failed status write never disrupts a turn). Behaviour-contract test: an active_agents-only write preserves both running and draining gateway_state, and the count clamps non-negative.	2026-06-21 17:22:52 +05:30
kshitij	44d552ea5a	Merge pull request #50115 from NousResearch/salvage/model-switch-preflight-warning fix(cli): warn when in-session model switch will preflight-compress	2026-06-21 16:41:44 +05:30
kshitijk4poor	1ca29723f0	fix(cli): log instead of swallow preflight-warning errors; consistent TUI warning field Follow-up to the salvaged preflight-compression warning: - Replace silent `except Exception: pass` at all 5 guard call sites (cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with `logger.debug(...)` so signature drift in the guard helper isn't hidden. - tui_gateway/server.py: set the confirm dict's `warning` field to the merged message (was bare expensive-model text) so it matches `confirm_message` for any future consumer reading `warning`. - Add trailing newlines to the two new files.	2026-06-21 16:31:56 +05:30
Tuna Dev	04730f32e7	fix(cli): warn when in-session model switch will preflight-compress Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard pattern. When a user switches models mid-session (Herm TUI picker, CLI, or /model on Telegram/Discord), the warning surfaces on the existing ModelSwitchResult.warning_message path used by the expensive-model guard if the new model's compression threshold is below the current session size. Partial fix for #23767 — addresses only the 'user-facing guardrail when switching from a high-context provider to a substantially lower-context provider' slice. The other proposed fixes from that issue (hard preflight token guard, metadata cache invalidation on switch, compression safety invariant, oversized tool-output handling) are out of scope for this PR.	2026-06-21 16:29:31 +05:30
xxxigm	7b9a0b315b	test(mcp): cover 'unknown method' ping keepalive fallback (#50028) Two regression tests for the agentmemory reconnect-loop: - _is_method_not_found_error matches the plain 'Unknown method: ping' phrasing (no structural -32601 code). - _keepalive_probe latches _ping_unsupported and falls back to list_tools when send_ping raises 'Unknown method: ping', instead of propagating (which would reconnect-loop).	2026-06-21 16:02:56 +05:30
xxxigm	472c068159	fix(mcp): detect 'unknown method' phrasing in ping keepalive fallback A server that doesn't implement the optional 'ping' utility answers a keepalive ping with JSON-RPC method-not-found. _is_method_not_found_error latches that condition so the probe falls back to list_tools instead of reconnect-looping. The substring fallback only matched 'method not found' / '-32601' / 'not found: ping'. Servers that surface method-not-found as the common 'Unknown method: <name>' phrasing without a structural -32601 code (e.g. agentmemory's MCP server) slipped through, so the fallback never latched and the keepalive reconnect-looped every cycle. Add 'unknown method' to the substring fallback so the ping->list_tools keepalive fallback latches for these servers too. Fixes #50028.	2026-06-21 16:02:56 +05:30
kshitij	8ca38d3121	Merge pull request #50100 from kshitijk4poor/salvage/model-visibility-cross-provider-47450 fix(desktop): preserve other providers' hide-all in model visibility dialog (salvage #47450)	2026-06-21 15:56:00 +05:30
kshitijk4poor	461fcc0964	test(desktop): harden model-visibility toggle + dedupe default expansion Follow-up to the salvaged #47450 fix: - Extract expandProviderDefaults() so the curated-default expansion rule lives in one place (was duplicated between defaultVisibleKeys and resolveVisibleKeys). - Drop the redundant new Set() wrap in toggleModelVisibility (resolveVisibleKeys already returns a fresh Set; effectiveVisibleKeys already relied on this). - Document the intentional re-enable behavior (re-enabling one model of a hidden-all provider restores only that model, not the curated defaults) and tighten the toggleModelVisibility JSDoc. - Add 7 hardening tests: re-enable-restores-only-that-model, full hide/re-enable round-trip, empty-non-null stored, single toggle-off from null defaults, zero-model provider, and direct resolveVisibleKeys null/empty assertions.	2026-06-21 15:46:58 +05:30
David Doan	8666fd7635	fix(desktop): preserve other providers' hide-all in model visibility dialog #43496 added a per-provider hide-all sentinel ('provider::') so emptying a provider in the Edit Models dialog stopped re-expanding its defaults. That fixed the single-provider case, but the dialog's toggle handler seeds its working set from effectiveVisibleKeys(), which strips ALL sentinels before returning. So persisting after any toggle silently dropped every OTHER provider's hide-all sentinel; those providers then looked 'never customized' and re-enabled all their models on the next render. Split resolution into two functions: - resolveVisibleKeys(): stored keys + curated default expansion, with hide-all sentinels PRESERVED — the canonical working set the toggle handler mutates and persists. - effectiveVisibleKeys(): resolveVisibleKeys() then strips sentinels, for display only (unchanged contract). Move the toggle set-computation into a pure, unit-tested toggleModelVisibility() that seeds from resolveVisibleKeys(), so sibling sentinels survive the persist. Add regression tests that drive the real toggle handler across multiple providers. Follow-up to #43496; completes the fix for #43485 (cross-provider case).	2026-06-21 15:42:26 +05:30
kshitij	f57ff7aef1	Merge pull request #50034 from NousResearch/salvage/cron-tz-offset-repair fix(cron): repair migrated timezone offsets to prevent double-fire	2026-06-21 13:53:28 +05:30
kshitij	f6a504d088	Merge pull request #50025 from NousResearch/salvage/cron-run-immediate fix(cron): execute job immediately on action=run	2026-06-21 13:53:13 +05:30
kshitij	3051a1634c	Merge pull request #50023 from NousResearch/salvage/f3b-telegram-dmtopic fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773)	2026-06-21 13:47:30 +05:30
kshitijk4poor	f43c61643d	chore(release): add devsart95 to AUTHOR_MAP	2026-06-21 13:35:50 +05:30
kshitijk4poor	4cc28aa3bb	fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773) PR #22410 added three-mode Telegram topic routing to the live message path (TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery path never got it. cron/scheduler.py::_deliver_result sent through the live adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone _send_telegram, neither of which addresses Bot API Direct Messages topics correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a private DM topic landed in the General topic instead of the requested lane. Fix: the cron live-adapter branch now routes the text send through the gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path live messages use — so it inherits all three Telegram routing modes: 1. Forum/supergroup (negative chat_id) -> message_thread_id 2. Bot API DM topics (private chat_id + numeric topic id) -> direct_messages_topic_id (the case #22773 reported) 3. Hermes-created named private DM-topic lanes -> ensure_dm_topic + reply anchor For mode 2, a private-chat target with a numeric topic id is passed as ``direct_messages_topic_id`` metadata (verified end-to-end: TelegramAdapter._thread_kwargs_for_send turns it into ``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a bare message_thread_id. Forum/supergroup and home-channel deliveries are unchanged. The standalone fallback (gateway down) is preserved. No new config knob and no duplicated routing logic — this reuses the existing DeliveryRouter rather than reimplementing topic routing in the cron path. Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both diagnosed the missing three-mode routing in the cron/standalone path; reimplemented onto the canonical DeliveryRouter that landed since those PRs were opened.
Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com> Co-authored-by: devsart95 <devsart95@gmail.com>	2026-06-21 13:35:45 +05:30
Tranquil-Flow	f1f36b3bae	fix(cron): repair migrated cron timezone offsets to prevent double-fire A recurring cron job persists `next_run_at` as an absolute timestamp with a UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however, describe local wall-clock intent ("run at 21:00"). When Hermes/system timezone changes after the timestamp was persisted, the stored instant is re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`, which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then `compute_next_run` advances it via croniter to `21:00+02:00` the same day, producing a SECOND fire. (#28934, recurrence of #24289.) `_get_due_jobs_locked` now detects this precise migration case before the due check: for a `cron` job whose converted instant looks due, whose stored UTC offset differs from the current zone's, AND whose stored wall-clock time is still in the future (distinguishing a migrated offset from a genuinely missed run), it recomputes `next_run_at` from the schedule and skips the early fire — preserving the local wall-clock intent. Verified against the issue's reproducer: stored `21:00+10` under runtime `+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of firing early + again. Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match) because UTC-normalization does not change the absolute-instant comparison and so does not fix the early fire, and this guard is the tightest: it only acts when all four conditions hold and reuses the existing `compute_next_run`. Fixes #28934	2026-06-21 13:31:31 +05:30
kshitij	02a3288de3	Merge pull request #50018 from NousResearch/salvage/f3a-delivery-confirm fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014)	2026-06-21 13:29:45 +05:30
kyssta-exe	65d7c7fafd	fix(cron): execute job immediately on action='run' `cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now` and returned success, relying on the scheduler ticker to actually execute the job on its next tick. When no gateway/ticker is running — a CLI-only setup, or the Windows case in #41037 — the job never executed: `run` reported success, but `last_run_at` stayed null forever, no output, no delivery. A manual `run` should actually run. `_execute_job_now` now: - claims the job via `claim_job_for_fire` — the same at-most-once CAS the scheduler/external-provider fire path uses. This both advances `next_run_at` for recurring jobs and blocks a concurrently-running gateway ticker from double-firing the same job; if the claim is lost, the run is skipped (the tool reports `execution_skipped`). This closes the double-fire race that a bare `advance_next_run` left open (a tick whose `get_due_jobs` already captured the job between trigger and advance would still fire it). - delegates firing to `run_one_job` — the single shared execute→save→deliver→mark body the ticker and external providers use — so failure delivery, `[SILENT]` handling, and live-adapter delivery stay identical across paths and can't drift. (The original salvage re-implemented this sequence inline and had already dropped failure delivery + `[SILENT]`.) The tool response carries `executed`, `execution_success`, and either `execution_error` or `execution_skipped`. The `hermes cron run` CLI message no longer claims "It will run on the next scheduler tick" — it reports the actual "Ran now: succeeded/failed" outcome (or the skip). Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse `claim_job_for_fire` + `run_one_job` per review rather than re-implementing the fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip, failure reporting, and exception capture. 
Fixes #41037 Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-21 13:28:04 +05:30
kshitij	9f4c0b27c9	Merge pull request #50016 from NousResearch/salvage/cron-ticker-liveness	2026-06-21 13:08:46 +05:30
kshitijk4poor	d6cb69a7a9	chore: add sweetcornna to AUTHOR_MAP Salvage co-author of the cron ticker-liveness fix.	2026-06-21 13:00:50 +05:30
annguyenNous	07424da76f	fix(cron): keep ticker alive on BaseException + heartbeat-aware status The in-process cron ticker (cron/scheduler_provider.py) caught only `Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt` raised from a misbehaving provider SDK or agent retry path killed the ticker thread silently. The gateway PROCESS stayed up, so `hermes cron status` — which only checks `find_gateway_pids()` — kept reporting "✓ jobs will fire automatically" while no jobs ever fired (#32612, #32895). This makes ticker death survivable and detectable: - The ticker loop now catches `BaseException` and logs at ERROR with a traceback, so a single bad tick no longer tears the thread down and the failure is visible in the gateway log. - The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds) on startup and after every tick — best-effort, never raised into the loop. Both ticker entry points (the gateway and the desktop fallback in web_server.py) funnel through `InProcessCronScheduler.start`, so one heartbeat site covers both. - `hermes cron status` now reads the heartbeat age: if the gateway is running but the heartbeat is stale (> 200s, i.e. several missed ~60s ticks), it reports the ticker as STALLED and suggests a restart instead of falsely claiming jobs will fire. A missing heartbeat (older build / never ran) is treated as "unknown", not "dead". Adds tests for BaseException survival, per-iteration heartbeat recording, heartbeat round-trip/age, staleness detection, and silent-write-failure. Salvaged from #49660 (BaseException survival on current structure), extended with the heartbeat + honest-status reporting that the earlier (pre-refactor) watchdog PRs #35616 and #33849 proposed. Fixes #32612 Fixes #32895 Co-authored-by: banditburai <promptsiren@gmail.com> Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>	2026-06-21 13:00:50 +05:30
Luke The Dev	d54890870f	fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014) Consolidates three cron-delivery defects in cron/scheduler.py::_deliver_result that all stem from how the live-adapter send result is interpreted. #38922 — duplicate message on confirmation timeout. future.result(timeout=60) raising TimeoutError bubbled to the outer except handler, which left delivered=False, so `if not delivered:` re-sent the identical message via the standalone path. future.cancel() cannot un-send a request already in flight on the wire, so a slow confirmation deterministically produced a duplicate. The send was already dispatched onto the gateway loop, so a bare timeout is now treated as delivered (assume-delivered is safer than guaranteed-duplicate) and the standalone fallback is skipped. The live-adapter media attempt is also skipped on timeout since the contended loop would re-block each 30s media budget. #47056 — silent drop when the gateway has an active session. The old check `if send_result is None or not getattr(send_result, "success", True)` let a result object missing a `success` attribute default to True = counted as a successful delivery, so the scheduler logged "delivered via live adapter" while the gateway never processed the message. Delivery is now confirmed via _confirm_adapter_delivery(): only an explicit, truthy `success` attribute counts; None or a `success`-less object falls through to the standalone path so the message actually arrives. A genuine send Exception (not a slow confirmation) still falls through to the standalone path, and is caught by run_job's outer handler — it is recorded as the job's last_error and never crashes the cron ticker. #43014 — deliver=origin fails to resolve in CLI sessions. A CLI-created job has no {platform, chat_id} origin, so deliver=origin (and auto-detect / deliver=None) was unresolvable and emitted "no delivery target resolved" on every run.
An unresolvable origin with no configured home channel is now treated as local (output stays in last_output), matching the documented auto-deliver contract; a concrete unresolvable platform target still reports a real error. Salvaged from #41007 (timeout discriminator), folding in #47127's _confirm_adapter_delivery hardening and #38937 / #43063's origin→local fallback. Tests rewritten as behavior contracts (timeout => no duplicate; None / success-less result => standalone fallback; confirmed success => no fallback; CLI origin => local, explicit platform => still errors). Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-21 12:59:21 +05:30
kshitijk4poor	35752fc3a5	chore: add szzhoujiarui-sketch and rayjun to AUTHOR_MAP Salvage co-authors of the cron model.default fix.	2026-06-21 12:37:56 +05:30
konsisumer	73b92264ee	fix(cron): resolve model.default + fail fast on missing model Cron jobs created without an explicit `model` are stored as `model: null`. At fire time `run_job` resolved `model = job.get("model") or os.getenv( "HERMES_MODEL") or ""` and then `_model_cfg.get("default", model)`, so when config.yaml had no `model.default` (or `model: {default: null}`) an empty string flowed straight to the provider and surfaced as an opaque HTTP 400 ("Model parameter is required" / "model: String should have at least 1 character"). The operator had to inspect jobs.json to discover the job was stored with a null model. This change makes cron model resolution robust and symmetric with the CLI: - Coerce `model: null`/missing config to `{}` so a falsy default never overwrites an already-resolved env value with `None`. - Only overwrite `model` from `model.default` when the resolved value is truthy; accept a `model.model` alias key, mirroring the sibling resolvers in hermes_cli/oneshot.py, fallback_cmd.py and prompt_size.py. - Resolve AFTER the managed-scope overlay so an administrator-pinned model still wins. - Fail fast with an actionable error (caught by run_job's outer handler and recorded as the job's last_error — the cron ticker is unaffected) instead of letting an empty model reach the API. - The per-job model is re-read every tick, so a `cronjob action=update model=...` after a failed run takes effect on the next tick (no cache). Adds tests/cron/conftest.py pinning a default HERMES_MODEL so existing run_job tests don't trip the new guard, plus regression tests covering env fallback, config.default fallback, string-form config, the model alias key, null-default-no-clobber, corrupt-config graceful degradation, fail-fast, and the no-cache re-read property. Salvaged from #24005, rebased onto current main, with additional test coverage folded in from #45550 and the alias-key behavior from #43952. 
Fixes #43899 Fixes #23979 Fixes #22761 Co-authored-by: szzhoujiarui-sketch <szzhoujiarui@gmail.com> Co-authored-by: rayjun <rayjun0412@gmail.com>	2026-06-21 12:37:56 +05:30
teknium1	14ef6312b5	fix(compression): decay protect_first_n so early turns don't fossilize (#11996) protect_first_n keeps the first N non-system messages verbatim through compaction so the original task framing survives. But it was applied on EVERY compression pass: the same early user turns were re-copied into each child session and never summarized away, so across a long, repeatedly-compressed session those old messages became immortal and grew the protected head unboundedly (#11996, P1). Decay it: protect_first_n applies on the FIRST compaction only. Once the session has been compressed at least once (compression_count >= 1, or a handoff summary already exists), the early turns are captured in the summary, so _effective_protect_first_n() returns 0 and only the system prompt stays protected. The decay is read at compress_start computation time, before compression_count/_previous_summary are mutated at the end of compress(), so the first pass still protects correctly. Co-authored-by: truenorth-lj <liliangjya@gmail.com> Co-authored-by: davidvv <david.vv@icloud.com>	2026-06-21 00:06:58 -07:00
Teknium	c6bf6bda90	fix(memory): recover from missing old_text on single-op replace/remove (#49997) Single-op replace/remove failed with a dead-end 'old_text is required' error when a structured-output client omitted the optional old_text field (it can't be schema-required without a top-level if/then combinator that OpenAI's Codex backend 400s on). The model couldn't recover. Now a missing old_text returns the current entry inventory plus a retry instruction (mirroring the batch path's _batch_error), so the model can reissue the call with old_text set. Also sharpens the old_text schema description to state it's required for replace/remove. Fixes #49466, #43412.	2026-06-20 23:42:47 -07:00
Teknium	d5f0e737d9	chore(release): add AUTHOR_MAP entry for #49544 salvage	2026-06-20 23:42:47 -07:00
Teknium	c1f11f8c69	fix(telegram): index streamed rich finals via editMessageText too The native echo recovery handles replies to most rich messages, but messages sent before the bot's first rich send have no echo to read. record() was only called on the fresh-send path (_try_send_rich); a streamed final finalized via _try_edit_rich/editMessageText was never indexed, so a reply to it had neither a native echo nor an index entry. Mirror the fresh-send record() into the edit success path to close that gap.	2026-06-20 23:42:47 -07:00
izumi0uu	29e5e127c6	fix(telegram): recover reply text from native rich echo Telegram DOES echo a rich message's content back in reply_to_message.api_kwargs['rich_message']['blocks'] when a user replies to it. Read that native field first in _build_message_event, keeping the local send-time index only as a fallback. Duck-type api_kwargs via .get() since it is a mappingproxy, not a dict. Fixes #49534	2026-06-20 23:42:47 -07:00
teknium	fcdefb4181	chore(release): add AUTHOR_MAP entries for docs PR salvage cluster 2	2026-06-20 23:23:47 -07:00
Tony Simons	2008a96b20	docs: align contributor test checklist with wrapper	2026-06-20 23:23:47 -07:00
BBCrypto-web	72e4cca00e	docs(config): correct MCP docs path in cli-config.yaml.example The MCP section pointed to docs/mcp.md, which does not exist. Point it to website/docs/user-guide/features/mcp.md, matching the existing hooks.md reference convention in the same file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 23:23:47 -07:00
namredips	b1ab5a8ae1	docs(antigravity-cli): add delegation patterns + output/bounding caveats Brings the antigravity-cli skill to parity with the codex / claude-code delegation playbooks. Additive only — auth/sandbox/plugin/settings content is unchanged. - New 'Delegation patterns' section: one-shot, background bounded runs, interactive PTY+tmux, parallel worktree fan-out, and an orchestration boundary note (agy is a worker backend / reviewer, not a coordination primitive). - Documents the two ways agy -p differs from claude-code: plain-text output (no --output-format json / result envelope) and bounding via --print-timeout rather than a nonexistent --max-turns. Mirrored into Pitfalls. - Bumps version 0.1.0 -> 0.2.0.	2026-06-20 23:23:47 -07:00
Sworntech-dev	9f507a0aa3	docs: remove file tools TBD placeholder	2026-06-20 23:23:47 -07:00
BBCrypto-web	225dcf855c	docs(.env.example): add HF_BASE_URL placeholder Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 23:23:47 -07:00
loes5050	85f108ef03	test(cron): document consent-first self-learning suggestions	2026-06-20 23:23:47 -07:00
allo	bc85f6150e	docs: document per-event extra keys in shell-hook wire protocol The shell-hook stdin payload's extra object contains event-specific kwargs, but the docstring only mentioned the field without listing what each event actually puts inside it. Add a reference table covering post_tool_call, pre_tool_call, on_session_start, on_session_end, and subagent_stop — the five hook sites that emit extra keys beyond the top-level payload. Closes #49370	2026-06-20 23:23:47 -07:00
Tortugasaur	c02648c5dd	fix(docs): align slash-command and docker docs	2026-06-20 23:23:47 -07:00
teknium1	98ecd0beeb	docs(mcp): fix stale ~0.75s discovery-wait reference in late-refresh docstring The MCP discovery wait is now bounded by the config-driven mcp_discovery_timeout (default 1.5s), not the old 0.75s flat value. Updates the _schedule_mcp_late_refresh docstring that still cited ~0.75s after #49208 made the bound configurable.	2026-06-20 23:23:47 -07:00
Kevin Anderson	b337afdf6e	docs(cli): fix broken terminal-backend guide link in setup wizard The terminal backend onboarding step pointed at /docs/developer-guide/environments, which no longer exists. Point it at the live docs page /docs/user-guide/configuration#terminal-backend-configuration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 23:23:47 -07:00
virtuadex	defeda8c55	docs: sync documentation with current implementation	2026-06-20 23:23:47 -07:00
miha	95d970a752	docs: sharpen software-development skills	2026-06-20 23:23:47 -07:00
aieng-abdullah	74b5cc7ca4	docs(spotify): document 6-month re-auth cycle and add client-level invalid_grant test - Remove the 'you only log in once per machine' claim from spotify.md and document the ~6-month refresh token expiry with re-auth instructions - Add test_client_wraps_invalid_grant_as_spotify_auth_required_error to confirm SpotifyClient wraps AuthError(code=spotify_refresh_invalid_grant) into SpotifyAuthRequiredError with a user-facing message Refs: #28155	2026-06-20 23:23:47 -07:00
EloquentBrush0x	9bd5003d4f	fix(spotify): quarantine dead tokens on terminal refresh failure resolve_spotify_runtime_credentials() called _refresh_spotify_oauth_state() without a try/except, so a terminal failure (HTTP 400/401, invalid_grant, refresh_token_reused) raised AuthError but left the dead refresh_token in auth.json. Every subsequent session re-read and retried the same token over the network, failing identically each time. Fix: wrap the refresh call and, when exc.relogin_required is True and a refresh_token is present, clear the dead OAuth fields (access_token, refresh_token, expires_at, expires_in, obtained_at) and write a last_auth_error quarantine marker to auth.json before re-raising. The next call sees no access_token and fails fast with spotify_access_token_missing — no network retry — and the user is prompted to re-authenticate. Mirrors the quarantine pattern already in place for Nous, xAI-OAuth, Codex-OAuth (#28116, #28118), and MiniMax-OAuth (#28119).	2026-06-20 23:23:47 -07:00
HwangJohn	242962e1f5	docs(providers): clarify vllm qwen reasoning output Signed-off-by: HwangJohn <angelic805@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com>	2026-06-20 23:23:47 -07:00
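
The timezone arithmetic in commit f1f36b3bae (cron double-fire after a zone change) can be checked with a minimal Python sketch. This is not the code in `_get_due_jobs_locked`: the helper name `looks_like_migrated_offset` and its signature are hypothetical, and it covers only the three datetime conditions from the commit message (the check that the job is a `cron` job is omitted).

```python
from datetime import datetime, timedelta, timezone


def looks_like_migrated_offset(stored: datetime, now: datetime) -> bool:
    """Hypothetical helper: True when a stored next_run_at looks due
    only because the runtime zone changed after it was persisted."""
    # 1. Compared as absolute instants, the stored run looks due.
    due = stored <= now
    # 2. The stored UTC offset differs from the current zone's offset.
    offset_changed = stored.utcoffset() != now.utcoffset()
    # 3. Read as wall-clock time in the current zone, the run is still ahead,
    #    which separates a migrated offset from a genuinely missed run.
    wall_clock_future = stored.replace(tzinfo=now.tzinfo) > now
    return due and offset_changed and wall_clock_future


# Reproducer values from the commit message: stored 21:00+10:00,
# runtime zone +02:00, wall clock 13:02.
stored = datetime.fromisoformat("2026-05-19T21:00:00+10:00")
now = datetime(2026, 5, 19, 13, 2, tzinfo=timezone(timedelta(hours=2)))

# 21:00+10:00 is the same instant as 13:00+02:00, two minutes before `now`.
print(stored.astimezone(now.tzinfo).isoformat())
print(looks_like_migrated_offset(stored, now))
```

With these values the guard triggers, so a scheduler following the commit's approach would recompute `next_run_at` from the schedule instead of firing early. A run stored as 09:00+02:00 under the same clock has an unchanged offset, so it is treated as a genuinely missed run.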


12364 commits