hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-30 01:41:43 +00:00

Author	SHA1	Message	Date
Teknium	9b55365f6f	fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979 ) (#16598 ) * fix: clean gateway auxiliary client caches on teardown * fix(gateway): recover from stale pid files and close cron agents Two issues were keeping the gateway from surviving long runs: 1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which refuses to unlink when the file's pid differs from our own. That safety check exists for the --replace atexit handoff, but it also applied to stale-record cleanup, so after a crashy exit the pid file was orphaned: `write_pid_file()`'s O_EXCL create then failed with `FileExistsError`, and systemd looped on "PID file race lost to another gateway instance". Unlink unconditionally from this helper since the caller has already verified the record is dead. 2. The cron scheduler never closed the ephemeral `AIAgent` it creates per tick, and never swept the process-global auxiliary-client cache. Over days of 10-minute ticks this leaked subprocesses and async httpx transports until the gateway hit EMFILE. Release the agent and call `cleanup_stale_async_clients()` in `run_job`'s outer `finally`, matching the gateway's own per-turn cleanup. * chore(release): map bloodcarter@gmail.com -> bloodcarter --------- Co-authored-by: bloodcarter <bloodcarter@gmail.com>	2026-04-27 07:41:42 -07:00
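A minimal POSIX sketch of the stale-pid recovery in point 1, assuming the pid file holds a bare integer. `write_pid_file` and `cleanup_invalid_pid_path` borrow their names from the commit message, but the bodies are illustrative rather than the gateway's code. `read_pid`, `pid_is_alive` and `acquire_pid_file` are hypothetical helpers.

```python
import os
from typing import Optional


def read_pid(path: str) -> Optional[int]:
    """Return the pid recorded in the file, or None if it is unreadable or garbage."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def write_pid_file(path: str) -> None:
    """O_EXCL create: raises FileExistsError if a pid file is already present."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))


def cleanup_invalid_pid_path(path: str) -> None:
    """Unlink unconditionally. The caller already decided the record is dead, so
    there is no 'is this pid ours?' check here (that belongs to the atexit handoff)."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def acquire_pid_file(path: str) -> bool:
    """Return True if we now own the pid file, False if a live process holds it."""
    try:
        write_pid_file(path)
        return True
    except FileExistsError:
        pass
    pid = read_pid(path)
    if pid is not None and pid_is_alive(pid):
        return False
    cleanup_invalid_pid_path(path)  # stale record left behind by a crashy exit
    write_pid_file(path)  # retry the O_EXCL create exactly once
    return True
```

The key design point is that the ownership check and the stale-record cleanup are separate paths. Without that separation, the O_EXCL create can never succeed again after a crash.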
briandevans	500774e30e	fix(gateway): pass session messages to shutdown_memory_provider (#15165 ) ``_cleanup_agent_resources`` previously invoked ``agent.shutdown_memory_provider()`` with no arguments, so every memory provider's ``on_session_end`` hook received an empty list. Providers with an early-return guard on empty input (Holographic, Hindsight) never extracted facts from the conversation, and users hit "抱歉，找不到相關的對話記錄" ("Sorry, no related conversation records could be found") on the first turn after any gateway restart, session reset, or idle expiry. Forward ``agent._session_messages`` — the transcript the agent itself maintains and refreshes every turn via ``_persist_session`` — so providers see the actual conversation. Falls back to the legacy no-arg call whenever the attribute is absent or not a list (test stubs built via ``object.__new__`` or ``MagicMock``) to preserve backward compatibility with existing suites. ``AIAgent.shutdown_memory_provider`` already accepts ``messages: list = None`` (run_agent.py:4126), so this is a pure caller-side fix. Paths that use ``skip_memory=True`` temporary agents (memory flush, hygiene auto-compress, ``/compress``) are no-ops inside ``shutdown_memory_provider`` because ``self._memory_manager`` is None — no behaviour change for them. Covers Part A of the bug report. Part B (adding ``on_session_end`` to the Hindsight plugin) is a separate concern that would benefit from this fix landing first. Regression test added at ``tests/gateway/test_shutdown_memory_provider_messages.py`` covering: populated messages forwarded, empty list still forwarded, attribute missing falls back, non-list (MagicMock) falls back, provider exceptions don't block ``close()``, None agent no-op, and agent without ``shutdown_memory_provider`` tolerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 06:41:16 -07:00
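A minimal sketch of the caller-side fix. It assumes only the names given in the message (`shutdown_memory_provider`, `_session_messages`) plus a `close()` method. The real `_cleanup_agent_resources` does more teardown than is shown here.

```python
from typing import Any


def cleanup_agent_resources(agent: Any) -> None:
    """Forward the live transcript to the memory provider, then close the agent."""
    if agent is None:
        return  # None agent: no-op
    shutdown = getattr(agent, "shutdown_memory_provider", None)
    if callable(shutdown):
        messages = getattr(agent, "_session_messages", None)
        try:
            if isinstance(messages, list):
                shutdown(messages)  # providers now see the real conversation
            else:
                shutdown()  # legacy no-arg call: attribute missing or a MagicMock stub
        except Exception:
            pass  # a failing provider must not block close()
    close = getattr(agent, "close", None)
    if callable(close):
        close()
```

The `isinstance(..., list)` check is what keeps `object.__new__` and `MagicMock` stubs on the old code path.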
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto | native | text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). 
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop):
  image (b64 size) | anthropic   | openrouter/gpt5.5 | codex-oauth/gpt5.5
  0.19 MB          | ✓           | ✓                 | ✓
  12.37 MB         | ✗ 400 (5MB) | ✓                 | ✓
  23.85 MB         | ✗ 400 (5MB) | ✓                 | ✓
  49.46 MB         | ✗ 413       | ✓                 | ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes.
Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
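The sketch below covers three pieces of this commit: the `auto` routing rule, the flat per-image budget charge, and the classifier ordering. It is a simplified sketch under stated assumptions. The constants 1600 and 1600 * 4 and the two image-error phrases come from the message. The function signatures, the part shapes handled, and `CONTEXT_OVERFLOW_PATTERNS` are assumptions, not the code in `agent/image_routing.py`, `context_compressor` or `error_classifier.py`.

```python
from typing import Any, Optional

IMAGE_TOKEN_ESTIMATE = 1600
IMAGE_CHAR_EQUIVALENT = IMAGE_TOKEN_ESTIMATE * 4  # ~4 chars per token, as in the text budget
IMAGE_PART_TYPES = {"image", "image_url", "input_image"}

IMAGE_TOO_LARGE_PATTERNS = ("image exceeds", "image too large")
CONTEXT_OVERFLOW_PATTERNS = ("context length", "maximum", "too many tokens")  # hypothetical


def decide_image_input_mode(mode: str, aux_vision_configured: bool,
                            supports_vision: Optional[bool]) -> str:
    """Return 'native' or 'text' for agent.image_input_mode = auto | native | text."""
    if mode in ("native", "text"):
        return mode  # an explicit setting wins
    if aux_vision_configured:
        return "text"  # the user configured a dedicated vision backend
    if supports_vision:
        return "native"
    return "text"  # unknown or non-vision model: keep the text pipeline


def content_length_for_budget(content: Any) -> int:
    """Text length plus a flat charge per image part; base64 payloads are not counted."""
    if isinstance(content, str):
        return len(content)
    total = 0
    for part in content or []:
        if not isinstance(part, dict):
            continue
        if part.get("type") in IMAGE_PART_TYPES:
            total += IMAGE_CHAR_EQUIVALENT
        elif isinstance(part.get("text"), str):
            total += len(part["text"])
    return total


def classify_provider_error(message: str) -> Optional[str]:
    """Check image_too_large BEFORE context_overflow: Anthropic's
    'image exceeds 5 MB maximum' would otherwise hit the generic 'maximum' pattern."""
    text = message.lower()
    if any(p in text for p in IMAGE_TOO_LARGE_PATTERNS):
        return "image_too_large"
    if any(p in text for p in CONTEXT_OVERFLOW_PATTERNS):
        return "context_overflow"
    return None
```

An `image_too_large` result is what would trigger the one-shot shrink-and-retry. A `context_overflow` result would go down the compression path instead.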
Teknium	90a3e73daf	fix(debug): sweep expired paste.rs uploads on a real timer (#16431 ) Previously 'hermes debug share' uploads only got DELETEd when the user ran 'hermes debug share' again — opportunistic-sweep-on-invoke was the only cleanup path. A user who uploaded once and never ran debug again left pastes up until paste.rs's retention kicked in (which, empirically, never actually expires them). Hook _sweep_expired_pastes into the gateway cron ticker at the same hourly cadence as the image/document cache cleanups. The opportunistic sweep in 'hermes debug share' stays as a fallback for CLI-only users who never start the gateway.	2026-04-27 00:36:33 -07:00
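A sketch of a timer-driven sweep plus an hourly cadence gate. It assumes each upload is tracked locally as a URL with an expiry timestamp; the message does not say how Hermes records them. `PasteRecord`, `IntervalGate` and the injected `delete` callable are hypothetical. The real `_sweep_expired_pastes` issues DELETE requests against paste.rs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PasteRecord:
    url: str
    expires_at: float  # epoch seconds after which the paste should be deleted


def sweep_expired_pastes(records: List[PasteRecord], now: float,
                         delete: Callable[[str], bool]) -> List[PasteRecord]:
    """Delete every expired paste; keep records whose delete failed so the next sweep retries."""
    remaining = []
    for rec in records:
        if rec.expires_at > now:
            remaining.append(rec)
            continue
        try:
            ok = delete(rec.url)
        except OSError:
            ok = False
        if not ok:
            remaining.append(rec)
    return remaining


class IntervalGate:
    """Fires at most once per `interval` seconds, like the hourly cache cleanups."""

    def __init__(self, interval: float = 3600.0) -> None:
        self.interval = interval
        self._last: Optional[float] = None

    def due(self, now: float) -> bool:
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

A cron ticker would call `gate.due(time.time())` on every tick and run the sweep only when it returns True.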
Teknium	4a2ee6c162	fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure Closes #15775. Title generation swallowed exceptions at debug level and returned None, so a depleted auxiliary provider (e.g. OpenRouter 402) silently left sessions with NULL titles. Reporter observed 45 untitled sessions accumulated over 19 days with no user-visible indication. - agent/title_generator.py: accept optional failure_callback, bump log to WARNING, invoke callback on call_llm exception (swallowing callback errors so nothing can crash the fire-and-forget worker thread). - cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the callback so failures route through the existing user-visible warning channel. - tests: cover callback fires / errors are swallowed / no-callback legacy behavior / maybe_auto_title forwards kwarg to worker.	2026-04-26 21:49:34 -07:00
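A minimal sketch of the callback contract. It assumes the callback takes a single message string; the real `_emit_auxiliary_failure` signature is not given in the message. `generate_title` here is illustrative, not `agent/title_generator.py` itself.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("title_generator_sketch")


def generate_title(call_llm: Callable[[str], str], prompt: str,
                   failure_callback: Optional[Callable[[str], None]] = None) -> Optional[str]:
    try:
        title = call_llm(prompt)
    except Exception as exc:
        logger.warning("Title generation failed: %s", exc)  # previously debug-level
        if failure_callback is not None:
            try:
                failure_callback(f"Title generation failed: {exc}")
            except Exception:
                # Never let a broken callback crash the fire-and-forget worker thread.
                logger.debug("failure_callback raised", exc_info=True)
        return None
    return (title or "").strip() or None
```

Leaving the callback optional preserves the legacy behaviour for callers that do not pass one.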
Teknium	478444c262	feat(checkpoints): auto-prune orphan and stale shadow repos at startup (#16303 ) Every working dir hermes ever touches gets its own shadow git repo under ~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a no-op (comment in CheckpointManager._prune says so), so abandoned repos from deleted/moved projects or one-off tmp dirs pile up forever. Field reports put the typical offender at 1000+ repos / ~12 GB on active contributor machines. Adds an opt-in startup sweep that mirrors the sessions.auto_prune pattern from #13861 / #16286: - tools/checkpoint_manager.py: new prune_checkpoints() and maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that are orphan (HERMES_WORKDIR marker points to a path that no longer exists) or stale (newest in-repo mtime older than retention_days). Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only runs once per min_interval_hours regardless of how many hermes processes start up. - hermes_cli/config.py: new checkpoints.auto_prune / retention_days / delete_orphans / min_interval_hours knobs. Default auto_prune: false so users who rely on /rollback against long-ago sessions never lose data silently. - cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune, called right next to the existing state.db maintenance block. - Docs updated with the new config knobs. - 11 regression tests: orphan/stale deletion, precedence, byte-freed tracking, non-shadow dir skip, interval gating, corrupt marker recovery. Refs #3015 (session-file disk growth was fixed in #16286; this covers the checkpoint side noted out-of-scope there).	2026-04-26 19:05:52 -07:00
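A condensed sketch of the orphan and stale rules plus the `.last_prune` interval gate. It assumes each shadow repo keeps its original working directory in a file named `HERMES_WORKDIR`; the message says only that such a marker exists. The function names echo the commit, but the bodies and signatures are illustrative.

```python
import os
import shutil
import time
from typing import Optional

MARKER = "HERMES_WORKDIR"  # assumption: a file inside each shadow repo holding the original dir


def newest_mtime(root: str) -> float:
    """Newest file mtime inside the repo (falls back to the dir's own mtime if empty)."""
    newest = 0.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                pass
    return newest or os.path.getmtime(root)


def dir_size(root: str) -> int:
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass
    return total


def prune_checkpoints(base: str, retention_days: float = 30, delete_orphans: bool = True,
                      now: Optional[float] = None) -> dict:
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    result = {"orphan": [], "stale": [], "bytes_freed": 0}
    for entry in sorted(os.listdir(base)):
        repo = os.path.join(base, entry)
        marker = os.path.join(repo, MARKER)
        if not os.path.isfile(marker):
            continue  # not a shadow repo: never touch it
        with open(marker) as f:
            workdir = f.read().strip()
        if delete_orphans and not os.path.exists(workdir):
            reason = "orphan"  # orphan takes precedence over stale
        elif newest_mtime(repo) < cutoff:
            reason = "stale"
        else:
            continue
        result["bytes_freed"] += dir_size(repo)
        shutil.rmtree(repo, ignore_errors=True)
        result[reason].append(entry)
    return result


def maybe_auto_prune(base: str, min_interval_hours: float = 24,
                     now: Optional[float] = None, **kwargs) -> Optional[dict]:
    now = time.time() if now is None else now
    stamp = os.path.join(base, ".last_prune")
    try:
        with open(stamp) as f:
            last = float(f.read().strip())
    except (OSError, ValueError):
        last = 0.0  # missing or corrupt marker: behave as if never run
    if now - last < min_interval_hours * 3600:
        return None  # some hermes process already pruned recently
    result = prune_checkpoints(base, now=now, **kwargs)
    with open(stamp, "w") as f:
        f.write(str(now))
    return result
```

Putting the marker file in the shared base directory makes the gate work across processes: however many hermes processes start, only the first within the interval does any deletion.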
Teknium	77d4766602	fix(gateway): clear pending model note on auto-reset paths too PR #16013 plugged the leak in `/new`, but two sibling session-boundary resets had the same bug: 1. Inactivity / suspended-session auto-reset (top of `_handle_message`) previously cleared only reasoning. Now drops model override and the queued "/model switched" note as well. 2. Compression-exhaustion auto-reset now also drops the pending note alongside the existing model/reasoning cleanup. All three session-boundary sites now use the identical cleanup idiom.	2026-04-26 19:01:50 -07:00
johnncenae	00c6480a05	fix(gateway): clear stale pending model note on session reset	2026-04-26 19:01:50 -07:00
Teknium	ab6879634e	yuanbao platform (#16298 ) Co-authored-by: loongzhao <loongzhao@tencent.com>	2026-04-26 18:50:49 -07:00
Teknium	90c84c6dba	fix(gateway): unblock update subprocess on recognized-command bypass When the gateway intercepts a pending /update prompt and the user sends a recognized slash command (/new, /help, ...), the command now dispatches normally AND the detached update subprocess is unblocked by writing a blank .update_response. _gateway_prompt reads '' → strips → returns the prompt's default (typically a safe 'n' / skip), so the update process exits cleanly instead of blocking on stdin until the 30-minute watcher timeout. Also clears _update_prompt_pending[session_key] on this path so stray future input for the same session isn't re-intercepted. Extends PR #15849 with tests for the new cancel-write + a regression test pinning the legacy behavior of unrecognized /foo slash commands still being consumed as the response.	2026-04-26 18:39:44 -07:00
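A file-based sketch of the unblock handshake. It assumes the update subprocess polls for `.update_response`. `gateway_prompt` and `cancel_pending_update` are simplified stand-ins; the real `_gateway_prompt` may not poll for or delete the file this way.

```python
import os
import time


def gateway_prompt(response_path: str, default: str = "n",
                   timeout: float = 1800.0, poll: float = 0.5) -> str:
    """Update-subprocess side: wait for the answer file; a blank answer means 'use default'."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(response_path):
        if time.monotonic() >= deadline:
            return default  # the 30-minute watcher timeout
        time.sleep(poll)
    with open(response_path) as f:
        answer = f.read().strip()
    os.remove(response_path)  # consume the answer so it is not reused
    return answer or default


def cancel_pending_update(response_path: str, pending: dict, session_key: str) -> None:
    """Gateway side: a recognized slash command arrived while the prompt was pending."""
    with open(response_path, "w") as f:
        f.write("")  # a blank response unblocks the subprocess with its safe default
    pending.pop(session_key, None)  # stop intercepting this session's next input
```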
Yukipukii1	bdaf56a94d	fix(gateway): bypass slash commands during pending update prompts	2026-04-26 18:39:44 -07:00
Yang Zhi	3b60abb6bb	fix(sessions): delete on-disk transcript files during prune and delete (#3015 ) `delete_session()` and `prune_sessions()` only removed SQLite records, leaving .json/.jsonl transcript files on disk forever. Over time this causes unbounded disk growth (~27MB/day observed). Changes: - Add `_remove_session_files()` static helper that cleans up `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json` - `delete_session()` accepts optional `sessions_dir` param and removes files for the deleted session and its children - `prune_sessions()` accepts optional `sessions_dir` param and removes files for all pruned sessions after the DB transaction - Wire up CLI `hermes sessions delete` and `hermes sessions prune` to pass `sessions_dir` - File cleanup is best-effort (OSError silenced) so DB operations are never blocked by filesystem issues - Fully backward-compatible: `sessions_dir=None` (default) preserves existing behavior	2026-04-26 18:31:07 -07:00
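A sketch of the best-effort cleanup helper using the three filename patterns listed above. The name `remove_session_files` mirrors `_remove_session_files`, but the body and the integer return value are assumptions.

```python
import glob
import os
from typing import Optional


def remove_session_files(sessions_dir: Optional[str], session_id: str) -> int:
    """Best-effort removal of a session's transcript files; returns how many were deleted."""
    if not sessions_dir:
        return 0  # sessions_dir=None keeps the old DB-only behaviour
    paths = [
        os.path.join(sessions_dir, f"{session_id}.json"),
        os.path.join(sessions_dir, f"{session_id}.jsonl"),
    ]
    pattern = os.path.join(glob.escape(sessions_dir),
                           f"request_dump_{glob.escape(session_id)}_*.json")
    paths.extend(glob.glob(pattern))
    removed = 0
    for path in paths:
        try:
            os.remove(path)
            removed += 1
        except OSError:
            pass  # never let filesystem trouble block the DB operation
    return removed
```

A caller such as `delete_session()` would invoke this once for the session and once for each child session, after the SQLite rows are gone.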
Teknium	635253b918	feat(busy): add 'steer' as a third display.busy_input_mode option (#16279 ) Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt).	2026-04-26 18:21:29 -07:00
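A sketch of the steer-with-fallback decision. It assumes `steer()` signals rejection by returning `False` or by raising; the message does not specify how. `route_busy_input` is a hypothetical helper. The actual logic lives in the CLI keybinding and the gateway busy-message handler.

```python
from typing import Any, List


def route_busy_input(agent: Any, text: str, mode: str, queue: List[str],
                     has_images: bool = False) -> str:
    """Return 'steered', 'queued' or 'interrupt' for text typed while the agent is busy."""
    if mode == "steer":
        steer = getattr(agent, "steer", None)
        if callable(steer) and not has_images:
            try:
                if steer(text) is not False:  # assumption: False or an exception means rejected
                    return "steered"
            except Exception:
                pass
        mode = "queue"  # every steer failure falls back to queueing, so nothing is lost
    if mode == "queue":
        queue.append(text)
        return "queued"
    return "interrupt"
```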
ghostmfr	e818ec520a	fix(slack): harden attachment handling Multiple overlapping Slack attachment improvements: 1. Upload retry with backoff on transient errors (429, 5xx, connection reset, rate_limited, service unavailable). New _is_retryable_upload_error helper covers three upload paths: _upload_file, send_video, send_document. Up to 3 attempts with 1.5s * attempt backoff. 2. Thread participation tracking: successful file uploads now add the thread_ts to _bot_message_ts, mirroring how text replies are tracked. This lets follow-up thread messages auto-trigger the bot (same engagement rules as replied threads). 3. Thread metadata preservation in the image redirect-guard fallback (send_image → send text fallback) and in two gateway.run.py send paths (image + document fallback calls). 4. HTML response rejection in _download_slack_file_bytes. Parallels the existing check in _download_slack_file. Guards against Slack returning a sign-in / redirect page as document bytes when scopes are missing, so the agent doesn't get HTML-as-a-PDF. 5. File lifecycle event acks (file_shared / file_created / file_change). These events arrive around snippet uploads. Acking them silences the slack_bolt 'Unhandled request' 404 warnings without changing behavior. 6. Post-loop message type classification so a mixed image+document upload classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT. Previously, the per-file classification in the inbound loop could be overwritten unpredictably. 7. Expanded text-inject whitelist in inbound document handling to cover .csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so snippets and config files are directly visible to the agent, not just cached as opaque uploads. Paired with new MIME entries in SUPPORTED_DOCUMENT_TYPES in base.py. Squashed from two commits in #11819 so the single commit carries the contributor's GitHub attribution (the original commits were authored under a local dev hostname).	2026-04-26 18:20:17 -07:00
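A synchronous sketch of the upload retry policy in point 1: up to 3 attempts, with a 1.5 s * attempt backoff. It assumes the error exposes an HTTP status as `.status`. The real helper inspects Slack SDK errors and runs in async code, so these names and the exact matching are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_MARKERS = ("rate_limited", "service unavailable", "connection reset")


def is_retryable_upload_error(exc: BaseException) -> bool:
    status = getattr(exc, "status", None)  # assumption: HTTP status exposed as .status
    if status == 429 or (isinstance(status, int) and 500 <= status < 600):
        return True
    msg = str(exc).lower()
    return any(marker in msg for marker in RETRYABLE_MARKERS)


def upload_with_retry(upload: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable_upload_error(exc):
                raise  # permanent error, or out of attempts
            sleep(base_delay * attempt)  # linear backoff: 1.5 s, then 3.0 s
    raise RuntimeError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the policy testable without real delays.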
Teknium	b16f9d438b	feat(telegram): send fresh finals for stale preview streams (port openclaw#72038) (#16261 ) Ports openclaw/openclaw#72038 to hermes-agent. Telegram's `editMessageText` preserves the original message timestamp, so a long-running streamed reply (reasoning models that take 60+ seconds to finish) would keep the first-token timestamp even after completion. Users can't tell how long a task actually took. When a preview message has been visible for >= 60s (configurable via `streaming.fresh_final_after_seconds`), finalize by sending a fresh message instead of editing in place, then best-effort delete the stale preview. Short previews still edit in place (the existing fast path). Implementation notes adapted from OpenClaw's TypeScript original: - `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 = legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60. - `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and checks it in `_send_or_edit` on `finalize=True`. New helpers `_should_send_fresh_final` + `_try_fresh_final`. - `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)` returning False by default. `TelegramAdapter` implements it via `_bot.delete_message`. - `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`; other platforms ignore the setting (they don't have the stale-edit timestamp problem or edit-then-read works cheaply). - Fallback to normal edit on any fresh-send failure — no user-visible regression if Telegram rate-limits a send or the message is gone.
Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py covering short/long previews, config plumbing, delete-support absent, send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig round-trip. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-04-26 17:26:37 -07:00
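A sketch of the finalize decision, with the platform calls passed in as plain callables. `finalize_preview` and the callable shapes are hypothetical. The threshold semantics follow the message: 0 disables the behaviour, and the age is measured from the first send.

```python
from typing import Callable, Optional


def should_send_fresh_final(created_ts: Optional[float], now: float,
                            fresh_final_after_seconds: float) -> bool:
    """0 (the StreamConsumerConfig default) keeps legacy edit-in-place."""
    if fresh_final_after_seconds <= 0 or created_ts is None:
        return False
    return now - created_ts >= fresh_final_after_seconds


def finalize_preview(preview_id: str, text: str, created_ts: Optional[float], now: float,
                     threshold: float, send: Callable[[str], Optional[str]],
                     edit: Callable[[str, str], None],
                     delete: Optional[Callable[[str], bool]] = None) -> str:
    """Return the id of the message that ends up holding the final text."""
    if should_send_fresh_final(created_ts, now, threshold):
        try:
            new_id = send(text)
        except Exception:
            new_id = None  # rate-limited or failed: fall back to editing below
        if new_id is not None:
            if delete is not None:
                try:
                    delete(preview_id)  # best-effort cleanup of the stale preview
                except Exception:
                    pass
            return new_id
    edit(preview_id, text)
    return preview_id
```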
Tranquil-Flow	bf05b8f4a2	fix(gateway): clean up cached agents on shutdown (#11205 )	2026-04-26 12:51:53 -07:00
hhuang91	802c7acb81	fix(Slack): resolve Slack channels by raw ID and enumerate joined channels send_message(target='slack:<channel_id>') failed with "Could not resolve" because _parse_target_ref had no Slack branch — Slack's uppercase alphanumeric IDs fell through to channel-name resolution, which only matched by name. As a fallback, the agent would retry with bare target='slack' and post to the home channel instead. Three fixes: - _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as explicit targets so the name-resolver is bypassed entirely. - resolve_channel_name tries a case-sensitive raw-ID match before the existing name match, so any platform's IDs resolve cleanly. - _build_slack now actually calls users.conversations against each workspace's AsyncWebClient (paginated), instead of only returning session-history entries. This populates the directory with public and private channels the bot has joined, so action='list' shows them and they can also be addressed by name. Errors from one workspace don't block others. build_channel_directory becomes async (Slack web calls require it). The two async-context callers in gateway/run.py are awaited; the cron ticker thread call bridges via asyncio.run_coroutine_threadsafe. Slack bot needs channels:read and groups:read scopes for full enumeration; missing scopes degrade gracefully per-workspace. addressing #15927	2026-04-26 12:29:02 -07:00
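A sketch of the raw-ID short-circuit. It assumes a Slack ID is one of the listed prefix letters followed by at least eight uppercase alphanumerics; the exact length rule is an assumption. This `parse_target_ref` returns a hypothetical `(platform, ref, is_explicit_id)` tuple, not the real function's return type.

```python
import re
from typing import Optional, Tuple

# C = channel, G = private group, D = DM, U/W = user (the prefixes listed above)
SLACK_ID_RE = re.compile(r"^[CGDUW][A-Z0-9]{8,}$")


def parse_target_ref(target: str) -> Tuple[str, Optional[str], bool]:
    platform, _, ref = target.partition(":")
    if not ref:
        return platform, None, False  # bare 'slack' means the home channel
    if platform == "slack" and SLACK_ID_RE.match(ref):
        return platform, ref, True  # explicit ID: skip name resolution entirely
    return platform, ref.lstrip("#"), False  # resolve by channel name
```

Because Slack channel names are lowercase, an uppercase-only pattern is unlikely to mistake a name for an ID.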
Zhi Yan Liu	d993a3f450	fix(gateway): use /hermes sethome in onboarding hint on Slack Slack's adapter registers a single parent slash command /hermes and dispatches subcommands via slack_subcommand_map(). Bare /sethome is not a registered command on Slack and fails with 'app did not respond', logging 'Unhandled request' in slack_bolt.AsyncApp. Show /hermes sethome in the first-run onboarding hint when the source platform is Slack; keep /sethome for Telegram, Discord, Matrix, Mattermost, and other platforms that register it directly. Fixes #14632	2026-04-26 11:56:23 -07:00
Teknium	1dfcc2ffc3	fix(gateway): /queue is now a true FIFO — each invocation gets its own turn (#16175 ) Repeated /queue commands now each produce a full agent turn, in order, with no merging. Previously the second /queue overwrote the first because the handler wrote directly into the adapter's single-slot _pending_messages dict. - GatewayRunner grows a _queued_events overflow buffer (dict of list). - /queue puts new items in the adapter's next-up slot when free, otherwise appends to the overflow. After each run's drain consumes the slot, the next overflow item is promoted so the recursive run picks it up. - /new and /reset clear the overflow. - /status now reports queue depth when non-zero. - Ack message shows the depth once it exceeds 1. Helpers (_enqueue_fifo, _promote_queued_event, _queue_depth) use the getattr default-fallback pattern so existing tests that build bare GatewayRunner instances via object.__new__ keep working.	2026-04-26 11:55:09 -07:00
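A sketch of the slot-plus-overflow structure. It assumes the adapter holds one pending event per session. `FifoQueue` and its methods are hypothetical stand-ins for `_enqueue_fifo`, `_promote_queued_event` and `_queue_depth`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class FifoQueue:
    def __init__(self) -> None:
        self.pending: Dict[str, Any] = {}  # the adapter's single next-up slot per session
        self.overflow: Dict[str, List[Any]] = defaultdict(list)

    def enqueue(self, session: str, event: Any) -> int:
        if session not in self.pending:
            self.pending[session] = event  # slot free: this event runs next
        else:
            self.overflow[session].append(event)  # slot taken: wait in line, no merging
        return self.depth(session)

    def take_next(self, session: str) -> Optional[Any]:
        """Drain the slot after a run, then promote the next overflow item into it."""
        event = self.pending.pop(session, None)
        self.promote(session)
        return event

    def promote(self, session: str) -> None:
        if session not in self.pending and self.overflow.get(session):
            self.pending[session] = self.overflow[session].pop(0)

    def depth(self, session: str) -> int:
        return (1 if session in self.pending else 0) + len(self.overflow.get(session, []))

    def clear(self, session: str) -> None:
        """What /new and /reset do."""
        self.pending.pop(session, None)
        self.overflow.pop(session, None)
```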
briandevans	4e356098d2	fixup! fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654 ) Address Copilot review findings: 1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts. Both fields are semantically paired — desc describes the activity at ts. Updating desc without ts made get_activity_summary() report "starting new turn (cached)" for 20+ minutes while the timestamp showed the true stale duration, producing misleading diagnostic output. 2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert on _last_activity_ts values. Real time.time() comparisons were latently flaky under slow CI or NTP adjustments. _FAKE_NOW = 10_000.0 is used as the reference; assertions are now exact equality rather than >=. 3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to directly cover the gated desc behaviour introduced by (1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 08:45:44 -07:00
briandevans	de24315978	fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654 ) _last_activity_ts was unconditionally reset to time.time() on every _agent_cache hit. For interrupt-recursive _run_agent calls (_interrupt_depth > 0) this silently reset the inactivity watchdog's idle clock on each re-entry, preventing the 30-min timeout from ever firing when a turn got stuck in an interrupt loop. A stuck session would emit "Still working... iteration 0/60, starting new turn (cached)" heartbeats indefinitely instead of timing out. Gate the reset on _interrupt_depth == 0 only. Fresh external turns still receive the reset so a session idle for 29 min doesn't trip the watchdog before the new turn makes its first API call (#9051). The per-turn reset logic is extracted into a static helper _init_cached_agent_for_turn() to make it directly testable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 08:45:44 -07:00
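A sketch of the gated reset with an injectable clock, in the spirit of the `_FAKE_NOW` approach from the fixup. It also gates the description together with the timestamp. The helper name and the `watchdog_expired` check are illustrative, not the gateway's code.

```python
import time
from typing import Any, Optional

INACTIVITY_TIMEOUT = 30 * 60  # the 30-minute watchdog


def init_cached_agent_for_turn(agent: Any, interrupt_depth: int,
                               now: Optional[float] = None) -> None:
    """Reset the idle clock only for fresh external turns (interrupt_depth == 0).
    ts and desc are paired: desc describes the activity at ts."""
    if interrupt_depth != 0:
        return  # interrupt re-entry: leave the clock running so the watchdog can fire
    agent._last_activity_ts = time.time() if now is None else now
    agent._last_activity_desc = "starting new turn (cached)"


def watchdog_expired(agent: Any, now: float) -> bool:
    return now - agent._last_activity_ts >= INACTIVITY_TIMEOUT
```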
Teknium	20cb706e03	chore: extend [SYSTEM:→[IMPORTANT: rename + AUTHOR_MAP Follow-up to #6616 covering the remaining user-injected prompt markers that the original PR did not touch (reporter's second comment on #6576 explicitly flagged these). Azure OpenAI Default/DefaultV2 content filters treat any bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400. Remaining call sites renamed: - cli.py: background-process notifications (watch_disabled, watch_match, completion), MCP reload notice (4 live + 1 docstring) - gateway/run.py: same notification paths + auto-loaded skill banner + MCP reload notice (5 live + 1 docstring) - tools/process_registry.py: comment reference Not renamed: - environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training trajectory rendering only, never sent to Azure, part of a symmetric [USER]/[ASSISTANT]/[TOOL] scheme. AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.	2026-04-26 08:44:58 -07:00
Teknium	06f81752ed	Revert "feat(kanban): durable multi-profile collaboration board (#16081 )" (#16098 ) This reverts commit `15937a6b46`.	2026-04-26 08:29:37 -07:00
Teknium	15937a6b46	feat(kanban): durable multi-profile collaboration board (#16081 ) New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for worker and orchestrator profiles. SQLite-backed task board (~/.hermes/kanban.db) shared across all profiles on the host. Zero changes to run_agent.py, no new core tools, no tool-schema bloat. Motivation: delegate_task is a function call — sync fork/join, anonymous subagent, no resumability, no human-in-the-loop. Kanban is the durable shape needed for research triage, scheduled ops, digital twins, engineering pipelines, and fleet work. They coexist (workers may call delegate_task internally). What this adds - hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution, dispatcher, workspace resolution, worker-context builder. - hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash() entry point used by both CLI and gateway. - skills/devops/kanban-worker — how a profile should work a claimed task. - skills/devops/kanban-orchestrator — "you are a dispatcher, not a worker" template with anti-temptation rules. - /kanban slash command wired into cli.py and gateway/run.py. Bypasses the running-agent guard (board writes don't touch agent state), so /kanban unblock can free a stuck worker mid-conversation. - Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns; 4 user stories; implementation plan; concurrency correctness. - Docs: website/docs/user-guide/features/kanban.md, CLI reference updated, sidebar entry added. Architecture highlights - Three planes: control (user + gateway), state (board + dispatcher), execution (pool of profile processes). - Every worker is a full OS process, spawned as `hermes -p <profile>`. No in-process subagent swarms — solves NanoClaw's SDK-lifecycle failure class. - Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale claims reclaimed 15 min after their TTL expires. 
- Tenant namespacing via one nullable column — one specialist fleet can serve many businesses with data isolation by workspace path. Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution, dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass hermetic via scripts/run_tests.sh.	2026-04-26 08:24:26 -07:00
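A sketch of a compare-and-swap claim inside `BEGIN IMMEDIATE`, including the 15-minute stale-claim grace. The `tasks` schema, the column names and the 600 s default TTL are assumptions; the message does not show `kanban_db.py`. Note also that this feature was reverted in #16098.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'ready',
    claimed_by TEXT,
    claim_expires REAL
)"""
STALE_GRACE = 15 * 60  # stale claims become reclaimable 15 min after their TTL expires
CLAIMABLE = "(status = 'ready' OR (status = 'claimed' AND claim_expires + ? < ?))"


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, so we issue BEGIN IMMEDIATE ourselves
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def claim_next(conn: sqlite3.Connection, worker: str, ttl: float = 600.0,
               now: Optional[float] = None) -> Optional[int]:
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front so claimers cannot interleave
    try:
        row = conn.execute(
            f"SELECT id FROM tasks WHERE {CLAIMABLE} ORDER BY id LIMIT 1",
            (STALE_GRACE, now),
        ).fetchone()
        claimed = None
        if row is not None:
            # Compare-and-swap: the WHERE re-checks claimability; rowcount says if we won.
            cur = conn.execute(
                "UPDATE tasks SET status = 'claimed', claimed_by = ?, claim_expires = ? "
                f"WHERE id = ? AND {CLAIMABLE}",
                (worker, now + ttl, row[0], STALE_GRACE, now),
            )
            claimed = row[0] if cur.rowcount == 1 else None
        conn.execute("COMMIT")
        return claimed
    except Exception:
        conn.execute("ROLLBACK")
        raise
```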
Teknium	454d883e69	refactor: drop persist_session plumbing + fix broken btw mid-turn bypass (#16075 ) Follow-up to PR #16053 (/btw as /background alias). Cleans up the plumbing added exclusively for the old ephemeral /btw handler and repairs a broken btw bypass that landed between my refactor and this follow-up. run_agent.py: - Remove persist_session kwarg, instance attr, and _persist_session short-circuit. Only /btw ever passed persist_session=False; with /btw gone the default (always persist) is the only behavior anyone ever wanted. gateway/run.py: - Remove the unreachable 'if _cmd_def_inner.name == "btw"' block (PR #16059). Canonical name for a /btw message is 'background' after alias resolution — the comparison could never be true, and it called _handle_btw_command which no longer exists. The /background branch above it already dispatches /btw correctly. tests/gateway/test_running_agent_session_toggles.py: - Fix test_btw_dispatches_mid_run to mock _handle_background_command (the real dispatch target for /btw) instead of the deleted _handle_btw_command.	2026-04-26 07:15:23 -07:00
Teknium	70f56e7605	fix(gateway): let /btw dispatch mid-turn instead of being rejected /btw spawns a parallel ephemeral side-question task (self-guarded against concurrent /btw on the same chat) — exactly like /background. But it was missing from the running-agent bypass list in _handle_message(), so it fell through to the catch-all and returned: ⏳ Agent is running — /btw can't run mid-turn. Wait for the current response or /stop first. That's the opposite of what /btw is for — asking a side question while the main turn is still working. Add the bypass next to /background and a regression test covering the mid-turn dispatch path. Reported by @IuriiTiunov on Telegram.	2026-04-26 07:11:10 -07:00
Teknium	7fa70b6c87	refactor: /btw is now an alias for /background (#16053 ) The ephemeral no-tools side-question variant of /btw confused users who expected 'by-the-way' to mean 'run this off to the side with tools' — they'd type /btw and get a toolless agent that couldn't do the work. /bg worked because it was /background with full tools. Collapse the two: /btw and /bg both alias to /background. One command, one behavior, no more gotchas about which variant has tools. Removed: - _handle_btw_command in cli.py and gateway/run.py - _run_btw_task + _active_btw_tasks state in gateway/run.py - prompt.btw JSON-RPC method + btw.complete event in tui_gateway - BtwStartResponse type + btw.complete case in ui-tui - Standalone /btw slash tree registration in Discord - Standalone btw CommandDef in hermes_cli/commands.py Updated: - background CommandDef aliases: (bg,) -> (bg, btw) - TUI session.ts: local btw handler merged into background - Docs and tips updated to describe /btw as a /background alias	2026-04-26 07:11:08 -07:00
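Alias resolution of this kind can be sketched as a lookup table built from each command's name and aliases. The `CommandDef` dataclass and `resolve_command` helper below are simplified, hypothetical stand-ins, not the real definitions in `hermes_cli/commands.py`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CommandDef:
    # Simplified, hypothetical stand-in for the real command definition.
    name: str
    aliases: Tuple[str, ...] = ()


COMMANDS = [
    CommandDef("background", aliases=("bg", "btw")),
    CommandDef("stop"),
]

# Map every canonical name and every alias to its CommandDef.
_LOOKUP: Dict[str, CommandDef] = {}
for cmd in COMMANDS:
    for key in (cmd.name, *cmd.aliases):
        _LOOKUP[key] = cmd


def resolve_command(text: str) -> Optional[CommandDef]:
    """Return the CommandDef for a '/name args...' message, or None."""
    if not text.startswith("/"):
        return None
    parts = text[1:].split(maxsplit=1)
    if not parts:
        return None
    return _LOOKUP.get(parts[0].lower())
```

In this sketch, `resolve_command("/btw ...").name` is `"background"`. A branch that compares the resolved name against `"btw"` can therefore never match, which is the unreachable block described in #16075.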
Teknium	83c1c201f6	feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046 ) Instead of a blocking first-run questionnaire, show a one-time hint the first time the user hits each behavior fork: 1. First message while the agent is working — appends a hint to the busy-ack explaining the /busy queue vs /busy interrupt knob, phrased to match the mode that was just applied (don't tell a queue-mode user to switch to queue). 2. First tool that runs for >= 30s in the noisiest progress mode (tool_progress: all) — prints a hint about /verbose to cycle display modes (all -> new -> off -> verbose). Gated on /verbose actually being usable on the surface: always shown on CLI; on gateway only shown when display.tool_progress_command is enabled. Each hint is latched in config.yaml under onboarding.seen.<flag>, so it fires exactly once per install across CLI, gateway, and cron, then never again. Users can wipe the section to re-see hints. New: - agent/onboarding.py — is_seen / mark_seen / hint strings, shared by both CLI and gateway. - onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in load_cli_config defaults (cli.py). No _config_version bump — deep merge handles new keys. Wired: - gateway/run.py: _handle_active_session_busy_message appends the hint after building the ack. progress_callback tracks tool.completed duration and queues the tool-progress hint into the progress bubble. - cli.py: CLI input loop appends the busy-input hint on the first busy Enter; _on_tool_progress appends the tool-progress hint on the first >=30s tool completion. In-memory CLI_CONFIG is also updated so subsequent fires in the same process are suppressed immediately. All writes go through atomic_yaml_write and are wrapped in try/except so onboarding can never break the input/busy-ack paths.	2026-04-26 06:06:27 -07:00
Teknium	4bda9dcade	fix(gateway): honor voice.auto_tts config in auto-TTS gate (#16007 ) (#16039 ) The base adapter's auto-TTS path fired on any voice message unless the chat had explicitly run /voice off — it never read voice.auto_tts from config.yaml, so users who set auto_tts: false still got audio replies. Gate the base adapter on a three-layer decision instead: 1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire 2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress 3. else → voice.auto_tts global default Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the existing _sync_voice_mode_state_to_adapter path. /voice off still wins. Closes #16007.	2026-04-26 05:52:05 -07:00
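The three-layer gate reduces to a short ordered check. In this sketch, `should_auto_tts` is a hypothetical name, and its arguments stand in for `_auto_tts_enabled_chats`, `_auto_tts_disabled_chats`, and `_auto_tts_default`. It also assumes that `/voice off` takes a chat out of the enabled set, so the two sets never overlap.

```python
from collections.abc import Container, Hashable


def should_auto_tts(chat_id: Hashable, enabled_chats: Container,
                    disabled_chats: Container, global_default: bool) -> bool:
    """Three-layer auto-TTS decision for an incoming voice message."""
    # 1. Explicit per-chat opt-in (/voice on or /voice tts).
    if chat_id in enabled_chats:
        return True
    # 2. Explicit per-chat opt-out (/voice off).
    if chat_id in disabled_chats:
        return False
    # 3. No per-chat override: fall back to voice.auto_tts from config.yaml.
    return global_default
```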
Teknium	35c57cc46b	fix(gateway): suppress tool-progress bubbles after interrupt (#16034 ) When the LLM response carries N parallel tool calls, the agent fires N tool.started events back-to-back before its interrupt check runs. A user sending /stop mid-batch would see the '⚡ Interrupting current task' ack followed by a trail of 🔍 web_search bubbles for the remaining events in the batch — making the interrupt feel ignored. progress_callback and the drain loop in send_progress_messages now check agent.is_interrupted (via agent_holder[0], the existing cross-scope handle). Events that arrive after interrupt are dropped at both the queueing and rendering stages. The '⚡ Interrupting' message is sent through a separate adapter path and is unaffected.	2026-04-26 05:47:37 -07:00
Teknium	125de02056	fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844 ) Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback. ## What changed New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching. `agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe). Wired through five call sites that previously either duplicated the lookup or ignored it entirely: - `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning) - `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch - `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg - `gateway/run.py` /model confirmation (picker callback + text path) - `gateway/run.py` `_format_session_info` (/info) ## Context probe tiers `CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency. ## Repro (from #15779)
```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000". 
## Tests - `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants - `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver - `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K) - `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier	2026-04-25 18:47:53 -07:00
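The lookup order described in this commit can be sketched as follows. `custom_provider_context_length` and `resolve_context_length` are hypothetical names. Their signatures are not those of `get_custom_provider_context_length()` or `get_model_context_length()`. The "every other probe" step is collapsed into an optional callable.

```python
from typing import Any, Callable, Dict, List, Optional

# Probe tiers as listed in the commit; the default fallback follows tier[0].
CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]


def custom_provider_context_length(custom_providers: List[Dict[str, Any]],
                                   base_url: str, model: str) -> Optional[int]:
    """Look up custom_providers[].models.<model>.context_length for an endpoint."""
    wanted = base_url.rstrip("/")
    for provider in custom_providers:
        # Trailing-slash-insensitive comparison of the endpoint URL.
        if str(provider.get("base_url", "")).rstrip("/") != wanted:
            continue
        models = provider.get("models") or {}
        entry = models.get(model) if isinstance(models, dict) else None
        value = entry.get("context_length") if isinstance(entry, dict) else None
        if isinstance(value, int) and value > 0:
            return value
    return None


def resolve_context_length(model: str, base_url: str,
                           config_context_length: Optional[int] = None,
                           custom_providers: Optional[List[Dict[str, Any]]] = None,
                           probe: Optional[Callable[[str], Optional[int]]] = None) -> int:
    # Step 0: an explicit config_context_length always wins.
    if config_context_length:
        return config_context_length
    # Step 0b: the per-model custom_providers override runs before any other probe.
    override = custom_provider_context_length(custom_providers or [], base_url, model)
    if override is not None:
        return override
    # Remaining probes (metadata lookups, endpoint queries) stand in as one callable.
    if probe is not None:
        found = probe(model)
        if found:
            return found
    return DEFAULT_FALLBACK_CONTEXT
```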
Iris Jin	25ba6a4a74	fix(gateway): make reasoning session-scoped by default	2026-04-25 18:01:31 -07:00
kshitijk4poor	7c17accb29	fix: /stop now immediately aborts streaming retry loop When a user sends /stop during a streaming API call, the outer poll loop detects _interrupt_requested and closes the HTTP connection. However, the inner _call() thread catches the connection error and enters its retry loop — opening a FRESH connection without checking the interrupt flag. On slow providers like ollama-cloud, each retry attempt blocks for the full stream-read timeout (120s+). With 3 retry attempts this caused 510+ second delays between /stop and actual response — the agent appeared completely unresponsive despite the stop being acknowledged. Fix: add an _interrupt_requested check at the top of the streaming retry loop so the agent exits immediately instead of retrying. Also fix log truncation: all session key logging in gateway/run.py used [:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437' (33 chars) to 'agent:main:telegram:' — losing the identifying chat type and user ID. Replace with full keys to make logs debuggable. Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.	2026-04-25 09:51:39 -07:00
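The fix amounts to checking the interrupt flag before each retry opens a new connection. In this sketch, a `threading.Event` stands in for the agent's `_interrupt_requested` flag. `call_with_retries` and `Interrupted` are hypothetical names.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class Interrupted(Exception):
    """Raised when the user has requested /stop."""


def call_with_retries(attempt: Callable[[], T], interrupt: threading.Event,
                      max_attempts: int = 3) -> T:
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_attempts):
        # Check before opening a fresh connection. Without this, the connection
        # error caused by /stop just triggers another full, slow retry.
        if interrupt.is_set():
            raise Interrupted()
        try:
            return attempt()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```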
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. 
What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
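The read-time backcompat for the renamed flag can be sketched as a `from_dict` that prefers the new key. The two-field `SessionEntry` below is a hypothetical minimal shape; the real class has more fields.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SessionEntry:
    # Hypothetical minimal shape of a sessions.json record.
    session_key: str
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
        # Prefer the new key; fall back to the legacy 'memory_flushed' key
        # so sessions.json files written before the rename still load.
        if "expiry_finalized" in data:
            finalized = data["expiry_finalized"]
        else:
            finalized = data.get("memory_flushed", False)
        return cls(session_key=data["session_key"], expiry_finalized=bool(finalized))
```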
Teknium	05d8f11085	fix(/model): show provider-enforced context length, not raw models.dev (#15438 ) /model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because the display block used ModelInfo.context_window directly from models.dev. Codex OAuth actually enforces 272K for the same slug, and the agent's compressor already runs at 272K via get_model_context_length() — so the banner + real context budget said 272K while /model lied with 1M. Route the display context through a new resolve_display_context_length() helper that always prefers agent.model_metadata.get_model_context_length (which knows about Codex OAuth, Copilot, Nous caps) and only falls back to models.dev when that returns nothing. Fix applied to all 3 /model display sites: cli.py _handle_model_switch; gateway/run.py picker on_model_selected callback; gateway/run.py text-fallback confirmation. Reported by @emilstridell (Telegram, April 2026).	2026-04-24 17:21:38 -07:00
simbam99	19a3e2ce8e	fix(gateway): follow compression continuations during /resume	2026-04-24 16:42:31 -07:00
knockyai	3e6c108565	fix(gateway): honor queue mode in runner PRIORITY interrupt path When display.busy_input_mode is 'queue', the runner-level PRIORITY block in _handle_message was still calling running_agent.interrupt() for every text follow-up to an active session. The adapter-level busy handler already honors queue mode (commit `9d147f7fd`), but this runner-level path was an unconditional interrupt regardless of config. Adds a queue-mode branch that queues the follow-up via _queue_or_replace_pending_event() and returns without interrupting. Salvages the useful part of #12070 (@knockyai). The config fan-out to per-platform extra was redundant — runner already loads busy_input_mode directly via _load_busy_input_mode().	2026-04-24 15:18:34 -07:00
Teknium	62c14d5513	refactor(gateway): extract WhatsApp identity helpers into shared module Follow-up to the canonical-identity session-key fix: pull the JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py instead of living in two places. gateway/session.py (session-key build) and gateway/run.py (authorisation allowlist) now both import from the shared module, so the two resolution paths can't drift apart. Also switches the auth path from module-level _hermes_home (cached at import time) to dynamic get_hermes_home() lookup, which matches the session-key path and correctly reflects HERMES_HOME env overrides. The lone test that monkeypatched gateway.run._hermes_home for the WhatsApp auth path is updated to set HERMES_HOME env var instead; all other tests that monkeypatch _hermes_home for unrelated paths (update, restart drain, shutdown marker, etc.) still work — the module-level _hermes_home is untouched.	2026-04-24 07:55:55 -07:00
Blind Dev	591aa159aa	feat: allow Telegram chat allowlists for groups and forums (#15027 ) * feat: allow Telegram chat allowlists for groups and forums * chore: map web3blind noreply email for release attribution --------- Co-authored-by: web3blind <web3blind@users.noreply.github.com>	2026-04-24 07:23:14 -07:00
Stefan Dimitrov	260ae62134	Invoke session finalize hooks on expiry flush	2026-04-24 05:40:52 -07:00
Tranquil-Flow	ee83a710f0	fix(gateway,cron): activate fallback_model when primary provider auth fails When the primary provider raises AuthError (expired OAuth token, revoked API key), the error was re-raised before AIAgent was created, so fallback_model was never consulted. Now both gateway/run.py and cron/scheduler.py catch AuthError specifically and attempt to resolve credentials from the fallback_providers/fallback_model config chain before propagating the error. Closes #7230	2026-04-24 05:35:43 -07:00
Teknium	b2e124d082	refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047 ) * refactor(commands): drop /provider and clean up slash registry * refactor(commands): drop /plan special handler — use plain skill dispatch	2026-04-24 03:10:52 -07:00
Keira Voss	1ef1e4c669	feat(plugins): add pre_gateway_dispatch hook Introduces a new plugin hook `pre_gateway_dispatch` fired once per incoming MessageEvent in `_handle_message`, after the internal-event guard but before the auth / pairing chain. Plugins may return a dict to influence flow: {"action": "skip", "reason": "..."} -> drop (no reply) {"action": "rewrite", "text": "..."} -> replace event.text {"action": "allow"} / None -> normal dispatch Motivation: gateway-level message-flow patterns that don't fit cleanly into any single adapter — e.g. listen-only group-chat windows (buffer ambient messages, collapse on @mention), or human-handover silent ingest (record messages while an owner handles the chat manually). Today these require forking core; with this hook they can live in a single profile-agnostic plugin. Hook runs BEFORE auth so plugins can handle unauthorized senders (e.g. customer-service handover ingest) without triggering the pairing-code flow. Exceptions in plugin callbacks are caught and logged; the first non-None action dict wins, remaining results are ignored. Includes: - `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py` - Invocation block in `gateway/run.py::_handle_message` - 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py` (skip, rewrite, allow, exception safety, internal-event bypass) - 2 additional tests in `tests/hermes_cli/test_plugins.py` - Table entry in `website/docs/user-guide/features/plugins.md` Made-with: Cursor	2026-04-24 03:02:03 -07:00
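The dispatch contract above can be sketched as two small functions. `run_pre_gateway_dispatch` and `apply_action` are hypothetical names, and a plain string stands in for `MessageEvent`. This sketch calls every hook and keeps the first non-None dict; whether the real implementation stops at the first result is not stated. Treating an unknown action as "allow" is also an assumption.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)

Action = Dict[str, Any]
Hook = Callable[[Any], Optional[Action]]


def run_pre_gateway_dispatch(hooks: Iterable[Hook], event: Any) -> Optional[Action]:
    """Invoke each hook; the first non-None action dict wins."""
    chosen: Optional[Action] = None
    for hook in hooks:
        try:
            result = hook(event)
        except Exception:
            # A failing plugin is logged and skipped; it never blocks dispatch.
            logger.exception("pre_gateway_dispatch hook %r failed", hook)
            continue
        if chosen is None and isinstance(result, dict):
            chosen = result  # later results are ignored
    return chosen


def apply_action(text: str, action: Optional[Action]) -> Optional[str]:
    """Return the text to dispatch, or None to drop the message without a reply."""
    kind = action.get("action") if action else "allow"
    if kind == "skip":
        return None
    if kind == "rewrite":
        return action.get("text", text)
    return text  # "allow", None, or (by assumption) any unknown action
```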
Teknium	a9a4416c7c	fix(compress): don't reach into ContextCompressor privates from /compress (#15039 ) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	2026-04-24 02:55:43 -07:00
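Both changes can be sketched with a cut-down ABC: a non-abstract `has_content_to_compress` with a safe default, and a caller that retries `compress()` without `focus_topic` when an older plugin rejects the keyword. `ContextEngine` here is a simplified stand-in for the real ABC. `LegacyEngine` and `compress_with_focus` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Messages = List[Dict[str, Any]]


class ContextEngine(ABC):
    # Simplified, hypothetical stand-in for the real ContextEngine ABC.
    @abstractmethod
    def compress(self, messages: Messages, focus_topic: Optional[str] = None) -> Messages:
        ...

    def has_content_to_compress(self, messages: Messages) -> bool:
        # Safe default for plugin engines: always attempt compression.
        return True


class LegacyEngine(ContextEngine):
    # An older plugin whose compress() predates the focus_topic parameter.
    def compress(self, messages):  # type: ignore[override]
        return messages[-1:]


def compress_with_focus(engine: ContextEngine, messages: Messages,
                        focus_topic: Optional[str] = None) -> Messages:
    """Preflight through the ABC, then compress; retry without focus_topic on TypeError."""
    if not engine.has_content_to_compress(messages):
        return messages
    if focus_topic is None:
        return engine.compress(messages)
    try:
        return engine.compress(messages, focus_topic=focus_topic)
    except TypeError:
        # Strict-signature plugin: drop the focus topic rather than crash.
        return engine.compress(messages)
```

One trade-off: catching `TypeError` also swallows a `TypeError` raised inside `compress()` itself. Checking the signature with `inspect.signature` first would be a stricter alternative.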
whitehatjr1001	9d147f7fde	fix(gateway): enhance message handling during agent tasks with queue mode support	2026-04-23 15:12:42 -07:00
hharry11	d0821b0573	fix(gateway): only clear locks belonging to the replaced process	2026-04-23 15:07:06 -07:00
Teknium	97b9b3d6a6	fix(gateway): drain-aware hermes update + faster still-working pings (#14736 ) cmd_update no longer SIGKILLs in-flight agent runs, and users get 'still working' status every 3 min instead of 10. Two long-standing sources of '@user — agent gives up mid-task' reports on Telegram and other gateways. Drain-aware update: - New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid, drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid, 0) until the process exits or the budget expires. - cmd_update's systemd loop now reads MainPID via 'systemctl show --property=MainPID --value' and tries the graceful path first. The gateway's existing SIGUSR1 handler -> request_restart(via_service= True) -> drain -> exit(75) is wired in gateway/run.py and is respawned by systemd's Restart=on-failure (and the explicit RestartForceExitStatus=75 on newer units). - Falls back to 'systemctl restart' when MainPID is unknown, the drain budget elapses, or the unit doesn't respawn after exit (older units missing Restart=on-failure). Old install behavior preserved. - Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the drain loop in run_agent + final exit have room before fallback fires. Composes with #14728's tool-subprocess reaping. Notification interval: - agent.gateway_notify_interval default 600 -> 180. - HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py matched. - 9-minute weak-model spinning runs now ping at 3 min and 6 min instead of 27 seconds before completion, removing the 'is the bot dead?' reflex that drives gateway-restart cycles. Tests: - Two new tests in tests/hermes_cli/test_update_gateway_restart.py: one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called when MainPID is known and the helper succeeds; one asserts the fallback fires when the helper returns False. 
- E2E: spawned detached bash processes confirm the helper returns True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring processes (timeout). Verified non-existent PID and pid=0 edge cases. - 41/41 in test_update_gateway_restart.py (was 39, +2 new). - 154/154 in shutdown-related suites including #14728's new tests. Reported by @GeoffWellman and @ANT_1515 on X.	2026-04-23 14:01:57 -07:00
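The drain-aware path can be sketched with `os.kill`: send SIGUSR1, then poll with signal 0 until the pid disappears or the budget runs out. This is a hypothetical reimplementation, not the body of `hermes_cli.gateway._graceful_restart_via_sigusr1`. The behaviour for pid 0 and for unknown pids (return False, so the caller falls back to `systemctl restart`) is an assumption.

```python
import os
import signal
import time


def drain_budget(restart_drain_timeout: float) -> float:
    """Polling budget described in the commit: max(drain timeout, 30s) plus a 15s margin."""
    return max(restart_drain_timeout, 30.0) + 15.0


def graceful_restart_via_sigusr1(pid: int, drain_timeout: float,
                                 poll_interval: float = 0.1) -> bool:
    """Ask `pid` to drain via SIGUSR1 and wait for it to exit.

    Returns True once the process is gone; False tells the caller to fall
    back to a hard restart (bad pid, signal refused, or budget exhausted).
    """
    if pid <= 0:
        # os.kill with pid 0 or a negative pid targets a process group; never do that here.
        return False
    try:
        os.kill(pid, signal.SIGUSR1)
    except (ProcessLookupError, PermissionError):
        return False
    deadline = time.monotonic() + drain_timeout
    while time.monotonic() < deadline:
        try:
            # Signal 0 delivers nothing; it only checks that the pid still exists.
            # An unreaped child of *this* process would linger as a zombie and keep passing.
            os.kill(pid, 0)
        except ProcessLookupError:
            return True
        time.sleep(poll_interval)
    return False
```

Polling with signal 0 works here because the gateway is systemd's child, not a child of the updater, so it is reaped as soon as it exits.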
Teknium	327b57da91	fix(gateway): kill tool subprocesses before adapter disconnect on drain timeout (#14728 ) Closes #8202. Root cause: stop() reclaimed tool-call bash/sleep children only at the very end of the shutdown sequence — after a 60s drain, 5s interrupt grace, and per-adapter disconnect. Under systemd (TimeoutStopSec bounded by drain_timeout), that meant the cgroup SIGKILL escalation fired first, and systemd reaped the bash/sleep children instead of us. Fix: - Extract tool-subprocess cleanup into a local helper _kill_tool_subprocesses() in _stop_impl(). - Invoke it eagerly right after _interrupt_running_agents() on the drain-timeout path, before adapter disconnect. - Keep the existing catch-all call at the end for the graceful path and defense in depth against mid-teardown respawns. - Bump generated systemd unit TimeoutStopSec to drain_timeout + 30s so cleanup + disconnect + DB close has headroom above the drain budget, matching the 'subprocess timeout > TimeoutStopSec + margin' rule from the skill. Tests: - New: test_gateway_stop_kills_tool_subprocesses_before_adapter_disconnect_on_timeout asserts kill_all() runs before disconnect() when drain times out. - New: test_gateway_stop_kills_tool_subprocesses_on_graceful_path guards that the final catch-all still fires when drain succeeds (regression guard against accidental removal during refactor). - Updated: existing systemd unit generator tests expect TimeoutStopSec=90 (= 60s drain + 30s headroom) with explanatory comment.	2026-04-23 13:59:29 -07:00
etcircle	b7bdf32d4e	fix(gateway): guard session slot ownership after stop/reset Closes the runner-side half of the split-brain described in issue #11016 by wiring the existing _session_run_generation counter through the session-slot promotion and release paths. Without this, an older async run could still: - promote itself from sentinel to real agent after /stop or /new invalidated its run generation - clear _running_agents on the way out, deleting a newer run's slot Both races leave _running_agents desynced from what the user actually has in flight, which is half of what shows up as 'No active task to stop' followed by late 'Interrupting current task...' acks. Changes: - track_agent() in _run_agent now calls _is_session_run_current() before writing the real agent into _running_agents[session_key]; if /stop or /new bumped the generation while the agent was spinning up, the slot is left alone (the newer run owns it). - _release_running_agent_state() gained an optional run_generation keyword. When provided, it only clears the slot if the generation is still current. The final cleanup at the tail of _run_agent passes the run's generation so an old unwind can't blow away a newer run's state. - Returns bool so callers can tell when a release was blocked. All the existing call sites that do NOT pass run_generation behave exactly as before — this is a strict additive guard. Refs #11016	2026-04-23 05:15:52 -07:00
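The generation guard can be modelled with two dictionaries. `SessionSlots` and its method names are hypothetical. Only `_session_run_generation`, `_running_agents`, and the `run_generation` keyword mirror names from the commit message.

```python
from typing import Any, Dict, Optional


class SessionSlots:
    """Hypothetical, simplified model of the runner's per-session bookkeeping."""

    def __init__(self) -> None:
        self._session_run_generation: Dict[str, int] = {}
        self._running_agents: Dict[str, Any] = {}

    def bump(self, key: str) -> int:
        # A new run, /stop, or /new all advance the generation; older runs become stale.
        gen = self._session_run_generation.get(key, 0) + 1
        self._session_run_generation[key] = gen
        return gen

    def is_current(self, key: str, gen: int) -> bool:
        return self._session_run_generation.get(key) == gen

    def track_agent(self, key: str, gen: int, agent: Any) -> bool:
        # Promote sentinel -> real agent only if this run still owns the slot.
        if not self.is_current(key, gen):
            return False
        self._running_agents[key] = agent
        return True

    def release(self, key: str, run_generation: Optional[int] = None) -> bool:
        # Callers that omit run_generation keep the old unconditional behaviour.
        if run_generation is not None and not self.is_current(key, run_generation):
            return False  # a newer run owns the slot; leave it alone
        self._running_agents.pop(key, None)
        return True
```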
David VV	39fcf1d127	fix(model_switch): group custom_providers by endpoint in /model picker (#9210 ) Multiple custom_providers entries sharing the same base_url + api_key are now grouped into a single picker row. A local Ollama host with per-model display names ("Ollama — GLM 5.1", "Ollama — Qwen3-coder", "Ollama — Kimi K2", "Ollama — MiniMax M2.7") previously produced four near-duplicate picker rows that differed only by suffix; now it appears as one "Ollama" row with four models. Key changes: - Grouping key changed from slug-by-name to (base_url, api_key). Names frequently differ per model while the endpoint stays the same. - When the grouped endpoint matches current_base_url, the row's slug is set to current_provider so picker-driven switches route through the live credential pipeline (no re-resolution needed). - Per-model suffix is stripped from the display name ("Ollama — X" → "Ollama") via em-dash / " - " separators. - Two groups with different api_keys at the same base_url (or otherwise colliding on cleaned name) are disambiguated with a numeric suffix (custom:openai, custom:openai-2) so both stay visible. - current_base_url parameter plumbed through both gateway call sites. Existing #8216, #11499, #13509 regressions covered (dict/list shapes of models:, section-3/section-4 dedup, normalized list-format entries). Salvaged from @davidvv's PR #9210 — the underlying code had diverged ~1400 commits since that PR was opened, so this is a reconstruction of the same approach on current main rather than a clean cherry-pick. Authorship preserved via --author on this commit. Closes #9210	2026-04-23 03:10:30 -07:00
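The grouping rules can be sketched as follows. `clean_name` and `group_custom_providers` are hypothetical helpers, and the slug format (`custom:` plus the lowercased, hyphenated cleaned name) is an assumption inferred from the `custom:openai` / `custom:openai-2` examples. The `current_base_url` routing is not modelled.

```python
import re
from typing import Any, Dict, List, Tuple

# Per-model suffix separators: an em-dash (U+2014) or a spaced hyphen " - ".
_SUFFIX_RE = re.compile(r"\s*\u2014\s*|\s+-\s+")


def clean_name(name: str) -> str:
    """Strip a per-model suffix, e.g. 'Ollama - GLM 5.1' -> 'Ollama'."""
    return _SUFFIX_RE.split(name, maxsplit=1)[0].strip()


def group_custom_providers(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse entries sharing (base_url, api_key) into one picker row each."""
    groups: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for entry in entries:
        key = (entry.get("base_url", ""), entry.get("api_key", ""))
        row = groups.setdefault(key, {"name": clean_name(entry.get("name", "custom")), "models": []})
        model = entry.get("model")
        if model and model not in row["models"]:
            row["models"].append(model)

    rows: List[Dict[str, Any]] = []
    counts: Dict[str, int] = {}
    for row in groups.values():
        base = "custom:" + row["name"].lower().replace(" ", "-")
        counts[base] = counts.get(base, 0) + 1
        # Rows that collide on the cleaned name get -2, -3, ... so all stay visible.
        row["slug"] = base if counts[base] == 1 else f"{base}-{counts[base]}"
        rows.append(row)
    return rows
```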

654 commits