hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
Dave Heritage	5a95fb2e14	feat: expose completed-turn message context to memory providers Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai>	2026-05-29 02:16:43 +05:30
Teknium	67011cc0d7	feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816 ) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter.	2026-05-28 04:53:27 -07:00
Biser Perchinkov	b5495db701	fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers api_messages is built once before the retry loop while the primary provider is active. When a mid-conversation fallback switches to a require-side thinking provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary (e.g. Codex) go out without reasoning_content and the new provider rejects the request with HTTP 400 ("reasoning_content must be passed back"). Re-apply the echo-back pad against the current provider immediately before building the request kwargs. Idempotent and a no-op unless the active provider enforces echo-back, so it covers all fallback paths without affecting normal or reject-side operation. Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:21:00 -07:00
teknium1	9b5dae17a5	feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68.	2026-05-28 01:45:30 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
mavrickdeveloper	2e3c6627ce	Add Honcho runtime peer mapping (cherry picked from commit `864cdb3d2e`)	2026-05-27 10:49:33 -07:00
EvilHumphrey	4243b6dc45	fix(codex): update silent-hang workaround hint	2026-05-27 01:52:34 -07:00
Teknium	febc4cfec0	remove Vercel AI Gateway and Vercel Sandbox (#33067 ) * remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint - `vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected.	2026-05-27 00:43:32 -07:00
Teknium	b6ca56f651	fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144 ) (#33035 ) * fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) When an OpenAI-compatible Responses API surface accepts an initial request but later rejects the replayed `codex_reasoning_items` encrypted blob with HTTP 400 `invalid_encrypted_content`, the session previously got stuck retrying the same poisoned payload. Recovery: classify the error as a dedicated FailoverReason, and on the first hit disable encrypted reasoning replay for the rest of the session, strip cached items from message history, and retry once. Changes: * error_classifier: add FailoverReason.invalid_encrypted_content branch in _classify_400 (before context_overflow so the messages that mention 'encrypted content … could not be verified' don't trip context heuristics), in _classify_by_error_code, and extend _extract_error_code to peek inside wrapped JSON in error.message and ignore the bare '400' as a code. * agent_init: initialize `_codex_reasoning_replay_enabled = True` on every agent. * run_agent: add AIAgent._disable_codex_reasoning_replay() helper that flips the flag and pops cached items. * codex_responses_adapter: thread a `replay_encrypted_reasoning` kwarg through _chat_messages_to_responses_input so that when the flag is False we don't replay codex_reasoning_items. * transports/codex.py: read `replay_encrypted_reasoning` from params, thread it into the adapter, and gate the `include=['reasoning.encrypted_content']` request hint on it. * chat_completion_helpers: pass the agent's replay flag through to the transport. * conversation_loop: in the retry loop, add an invalid_encrypted_content recovery branch that fires once per session, only when api_mode == codex_responses, only when replay is still enabled, and only when at least one assistant message in history actually carries cached reasoning items (otherwise the 400 has nothing to do with our cache and the normal retry path handles it). Tests: * test_error_classifier: new wrapped-JSON _extract_error_code case; new TestClassifyApiError cases proving the 400 is retryable with no fallback, that the broad message match doesn't catch a generic 'parsed' message, and that the error code match is case-insensitive. * test_run_agent_codex_responses: end-to-end test of the recovery branch firing once and disabling replay, plus a sibling test that proves the branch does not fire (and the flag stays True) when history has no cached reasoning items. Salvages PR #10144 onto the post-refactor module layout (error_classifier / codex_responses_adapter / transports/codex / conversation_loop / agent_init) since the original diff was written against the pre-refactor monolithic run_agent.py. * chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage --------- Co-authored-by: victorGPT <wuxuebin1993@gmail.com>	2026-05-26 22:01:17 -07:00
Tranquil-Flow	b1adb95038	fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically silently dropped certain model requests: the connection is accepted but no stream events are emitted and no error is raised. PR #31967 lowered the implicit stale-call default from 300s to 90s so fallbacks kick in faster, but users still see an opaque "No response from provider for 90s (non-streaming, ...)" message that gives no path forward. This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend via codex_responses api_mode — that substitutes the generic timeout message with actionable text naming the gpt-5.4-codex workaround and pointing at #21444 for symptom history. Changes: - run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method. Returns ``None`` for any request that does not match all three guards (codex_responses api_mode, openai-codex provider or chatgpt.com Codex base URL, gpt-5.5-family model name with word-boundary regex anchoring to avoid false-positives on e.g. ``gpt-5.50``). - agent/chat_completion_helpers.py — the non-stream stale-call site consults the hint via ``getattr(...)`` so the call site stays robust if the helper is ever removed or stubbed in tests. Hint is appended to both the ``_emit_status`` warning and the ``TimeoutError`` message so the user sees it in their terminal AND it lands in any retry-loop diagnostics. - tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5, gpt-5.5-codex SKU, model=None fallback to self.model) and negative cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard, non-codex api_mode, non-codex provider, empty/None model, unrelated models on Codex). Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT problem we cannot patch from here). Only converts an opaque timeout into text that names the workaround so users do not have to dig through logs or wait for a forum post to learn what to do. Closes #22046	2026-05-25 04:49:22 -07:00
Kasun Athaudahetti	2d422720b5	fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults Codex / Responses-API requests had three latent timeout bugs that combined into the long silent hangs reported on #21444: 1. The non-stream stale-call detector estimated context tokens from ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry their conversational load in ``input`` (with ``instructions`` and ``tools``), so every Codex turn logged ``context=~0 tokens`` and the detector never applied its >50k / >100k tier bumps. 2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the main Codex path. The chat_completions path and the auxiliary Codex adapter both forwarded it; the main path skipped it through three places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``, ``_preflight_codex_api_kwargs``). 3. The streaming stale detector had the same payload-shape bug for ``codex_responses`` requests, which route through the non-streaming detector (it's the path that emits the user-facing "No response from provider for 300s (non-streaming, ...)" warning that reporters keep pasting). This commit: - Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``, used by both the non-stream and stream detectors. Handles ``messages`` (Chat Completions), ``input + instructions + tools`` (Responses API), bare lists, and an unknown-dict fallback. - Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs`` and ``_preflight_codex_api_kwargs`` (with guards against zero/negative/inf/bool values), and wires ``_resolved_api_call_timeout()`` into the Codex branch of ``build_api_kwargs``. - Lowers the implicit non-stream stale defaults so fallback providers kick in faster when upstream stalls: * base 300s -> 90s * >50k 450s -> 150s * >100k 600s -> 240s These only apply when the user has not set ``providers.<id>.stale_timeout_seconds`` or ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins. - Adds regression tests for the estimator shapes, the new defaults, the context-tier scaling, transport timeout pass-through, and preflight timeout pass-through / rejection of invalid values. Closes #21444 Supersedes #21652 #24126 #31855 Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>	2026-05-25 01:47:55 -07:00
vgocoder	dcc163ee28	fix(security): redact credentials before persistence in session capture Two-layer redaction at the persistence boundary so credentials never reach state.db, session_.json, or compression: 1. agent/chat_completion_helpers.py :: build_assistant_message - Redact assistant content before the message dict is constructed (catches PATs / API keys the model inlines into natural language) - Redact tool_call.function.arguments at the same site (catches secrets inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...') Tool execution uses the raw API response object, not this dict, so redacting the persisted shape is safe. 2. run_agent.py :: _save_session_log - Add _redact_message_content() static helper that handles both string content and OpenAI/Anthropic multimodal list-of-parts (image parts pass through untouched, only text/content fields are redacted) - Apply to every message + the cached system prompt before writing session_.json Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text — no-op when disabled. Tests (TestSaveSessionLogRedactsSecrets, 4 cases): - api key in tool content - api key in user message - api key in system prompt - multimodal list-of-parts (image part preserved, text redacted) Tests use an autouse fixture to force _REDACT_ENABLED=True because the hermetic conftest defaults the env var to false. Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log) + PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction). Kept only the redaction concern from #19855; its unrelated whatsapp npm timeout + PATCH_SCHEMA changes are out of scope and dropped. Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture credential leak). Co-authored-by: liuhao1024 <liuhao03@bilibili.com> Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-24 17:58:25 -07:00
xxxigm	8b3cb930c9	fix(xai-oauth): honor [WKE=unauthenticated:...] disambiguator in entitlement classifier (#29344 ) ``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the same permission-denied ``code`` text for two distinct conditions: 1. Unsubscribed account ("active Grok subscription. Manage at https://grok.com" in the ``error`` field). 2. Stale OAuth access token ("OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]" in the ``error`` field). The classifier's "does not have permission + grok" substring heuristic treated both identically, so the credential-pool refresh path was short-circuited for case (2) — long-running TUI sessions stuck on a stale OAuth token surfaced a non-retryable client error and the user had to exit + reopen the TUI to recover (the startup-resolve path bypasses the classifier entirely, which is why bridge adapters with proactive refresh cadences didn't see this in practice). This patch adopts the reporter's recommended fix (option 1, tightest): honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the ``OAuth2 access token could not be validated`` phrasing as authoritative "this is auth, not entitlement" signals. When either appears anywhere in the body's text fields, the classifier returns False eagerly — before the entitlement keyword checks run — so the refresh-on-401 path takes over and the existing loop-protection still guards against runaway refresh storms if the refresh itself fails. Two small adjustments fall out of this: * The haystack now also covers ``code`` and ``error`` keys directly, not just the ``message``/``reason`` shape ``_extract_api_error_context`` produces. Real runtime paths use the normalised shape, but the test suite and any future call sites that pass raw bodies get the same treatment. Backwards compatible: missing keys default to empty strings, the haystack still skips when everything is blank. * Both disambiguator checks fire BEFORE the entitlement keyword checks. If a future xAI body somehow lands with both an entitlement message AND the WKE suffix, the WKE suffix wins (correct — auth is recoverable; entitlement is not, and a refreshed token will surface the entitlement message on the next request anyway). Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``, ``test_is_entitlement_failure_false_for_unrelated_auth_errors``, ``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``, ``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``) continue to pass unchanged — the unsubscribed-account path, the generic auth-error path, and the refresh-on-401 path are all left intact.	2026-05-23 02:48:13 -07:00
xxxigm	30c22f1158	fix(api-call): defer client.close() to owning worker thread on interrupt (#29507 ) Layer-2 defense for the FD-recycling race: even with ``force_close_tcp_sockets`` reduced to shutdown-only, the followup ``client.close()`` in ``_close_openai_client`` still walks the httpx pool and closes sockets — and if called from a stranger thread (the interrupt-check loop, the stale-call detector) it has the same FD-recycling exposure that wrote a TLS record on top of ``kanban.db``. Stamp the request_client_holder with the owning thread's ident at ``_set_request_client`` time. In ``_close_request_client_once``: * Owning thread (the worker's ``finally``) → pop + ``client.close()`` via ``_close_request_openai_client``, exactly as before. * Stranger thread → ``_abort_request_openai_client`` (new): only ``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close marker. The holder stays populated so the worker's eventual ``finally`` performs the real close from its own thread context, where the FD release races nothing. Applied symmetrically to both the non-streaming ``interruptible_api_call`` and the streaming variant — both routinely get hit by stranger-thread interrupts. The log field ``tcp_force_closed=N`` keeps its existing shape; the new abort path adds ``deferred_close=stranger_thread`` so production triage can distinguish the two close kinds.	2026-05-23 02:31:10 -07:00
Teknium	c769be344a	fix(agent): recover from providers rejecting list-type tool content (#27344 ) (#30259 ) Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible servers) follow the OpenAI spec strictly and require tool message `content` to be a string — they reject our list-type content (text + image_url parts) with HTTP 400 'text is not set' / 'tool message content must be a string'. Instead of an allowlist of known-good providers (maintenance burden, guaranteed to miss aggregators like OpenRouter where the underlying model determines support, not the aggregator name), this lands a reactive recovery: 1. New `FailoverReason.multimodal_tool_content_unsupported` with a small pattern list covering the common 400 wordings. 2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API message list, downgrades any `role:tool` message whose content is list-with-image to a plain text summary (preserves text parts) in place, AND records the active (provider, model) in a session-scoped `_no_list_tool_content_models` set. 3. `_tool_result_content_for_active_model` short-circuits to a text summary when (provider, model) is in the cache — so after the first 400 + retry, subsequent screenshots in the same session skip the round trip entirely. 4. Retry hook in `agent.conversation_loop` mirrors the existing `image_too_large` recovery: detect the reason, run the helper, retry once, fall through to the normal error path if no list-type tool content was actually present. Cache is transient (per-session) by design — next session retries in case the provider added support, no persistent state to maintain. Fixes #27344. Closes #27351 (allowlist approach superseded by reactive recovery).	2026-05-21 23:40:16 -07:00
Teknium	32aea113f0	fix(agent): consult supports_vision override in auto-mode routing The contributor PR (#17936) only patched the strip path in `_model_supports_vision()`. The auto-mode router in `agent/image_routing._lookup_supports_vision` still only read models.dev, so a custom-provider model declared as vision-capable would still get its images routed through vision_analyze in the default `agent.image_input_mode: auto` setting. Users had to set both `supports_vision: true` AND `image_input_mode: native` to bypass the text pipeline. Single-knob behavior now: `supports_vision: true` alone is enough in auto mode. The strip path and the routing path consult the same resolver. - Extract override resolution into `_supports_vision_override()` in agent/image_routing.py and wire it into `_lookup_supports_vision()`. - Refactor `run_agent._model_supports_vision` to call the same helper (DRY, single source of truth for the resolution order). - Strict YAML boolean coercion: `supports_vision: "false"` (quoted — a common YAML mistake) no longer coerces to True via bool() truthiness. Recognised tokens: true/false/yes/no/on/off/1/0 plus real bools and 0/1. Unrecognised values return None and fall through to models.dev. - Add @CNSeniorious000 to AUTHOR_MAP for release attribution. Tests: 26 new (TestCoerceCapabilityBool, TestSupportsVisionOverride, TestLookupSupportsVisionOverride, TestAutoModeRespectsOverride). Existing contributor tests + image_routing + vision_native_fast_path + native_image_buffer_isolation all green (92/92).	2026-05-20 23:27:10 -07:00
Muspi Merol	1c76689b28	fix(agent): resolve supports_vision override for named custom providers Named custom providers are rewritten to provider="custom" at runtime (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so a config under providers.my-vllm.models.my-llava.supports_vision was unreachable via self.provider alone. Also try cfg.model.provider as a candidate provider key, covering both runtime and config naming. Adds a regression test for the named-provider path.	2026-05-20 23:27:10 -07:00
Muspi Merol	24c7ce0fb8	feat(agent): allow declaring supports_vision via user config Custom/local provider models absent from models.dev get classified as non-vision and have their image content stripped before reaching the upstream API. Surface a user-facing override: model: supports_vision: true providers: my-vllm: models: my-llava: supports_vision: true The override short-circuits the models.dev lookup in _model_supports_vision(), which is the single gate guarding image-strip preprocessing on every transport path. Refs #8731.	2026-05-20 23:27:10 -07:00
Teknium	eeb747de25	feat(sessions): opt-in per-session JSON snapshot writer PR #29182 deleted the per-session JSON snapshot writer outright because state.db is canonical and the snapshots had no in-tree consumer. Some users have external tooling that reads `~/.hermes/sessions/session_{sid}.json` directly, so reintroduce the writer behind a config flag that defaults to off. - Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG - Restore `AIAgent._save_session_log` + `_clean_session_content` as gated methods. When the flag is off the call is a fast no-op; when on, the writer behaves as before (atomic write, truncation guard preserved, REASONING_SCRATCHPAD → think tag normalization) - Re-derive the target path from `agent.session_id` on each call so `/branch` and `/compress` re-points happen automatically — no need to restore the explicit re-point bookkeeping at call sites - Wire the single call site in `_persist_session` (the cleanup-on-exit hook). Did NOT restore the 7 intra-turn calls the original PR deleted — those were redundant writes within the same turn that doubled disk I/O without adding any persistence guarantee `_persist_session` does not already provide - Read the flag once at agent init via `load_config()`, cache as `agent._session_json_enabled` - Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn` to pin behavior: default off (no file), opt-in true (file written), no-op method on default agents, logs_dir retained unconditionally - Update CONTRIBUTING.md and the bundled `hermes-agent` skill to document the flag and its default	2026-05-20 11:44:10 -07:00
yoniebans	6f1a5f8597	refactor(session-log): delete dead _clean_session_content helper Only caller was the removed _save_session_log. Also removes the unused convert_scratchpad_to_think and has_incomplete_scratchpad imports from run_agent.py (both still used elsewhere via their own imports).	2026-05-20 11:44:10 -07:00
yoniebans	ce26785187	refactor(session-log): delete _save_session_log and all callers state.db now stores every message field the JSON snapshot stored. Removed the method, all 7 call-sites, and ~13 test stubs that suppressed its file I/O. Body is in git history if it ever needs to come back.	2026-05-20 11:44:10 -07:00
Teknium	544c31b50b	perf(agent-loop): cut 47% of per-conversation function calls via 3 targeted hot-path optimizations (#28866 ) * perf(config): add load_config_readonly() fast path for hot agent loop `load_config()` is called from the agent loop's per-API-call hot path via `get_provider_request_timeout()` and `get_provider_stale_timeout()` — both invoked once per turn from `_resolved_api_call_timeout()` in run_agent.py. Profiling a synthetic 20-tool-call agent run revealed: - 21 invocations of `load_config()` cumulating 56ms (~17% of agent loop) - 34,398 deepcopy calls totaling 37ms (config defensive deepcopy + chain) - 8,652 `_expand_env_vars` invocations (~412 per turn) Microbench (cache-hit, real config.yaml present): load_config() 265us/call (125us deepcopy + 140us infra) load_config_readonly() 138us/call (~48% faster) `load_config_readonly()` returns the cached dict directly without the defensive deepcopy. Documented contract: caller must not mutate. Returns plain dict (not MappingProxyType) so downstream `isinstance(x, dict)` guards keep working — caught during initial implementation when MappingProxyType broke get_provider_request_timeout's guard logic. Wired into hermes_cli/timeouts.py (the two functions called per agent turn). load_config() is unchanged for the 263 other call sites that mutate the result before save_config(), are not in the hot path, or where the safety guarantee matters more than the perf. Profile A/B (cached config, 21-turn agent loop): BEFORE AFTER delta get_provider_request_timeout 55ms 16ms -71% total function calls 399k 160k -60% deepcopy calls (in hotspots) 34,398 ~0 ~elim Verified: - isinstance(load_config_readonly(), dict) is True - timeout/stale resolutions correct - load_config() still returns isolated mutable deepcopies - tests/hermes_cli/test_config.py / test_timeouts.py: 102/102 pass - tests/cli/ + tests/agent/test_auxiliary_client.py: 883/883 pass perf(redact): substring pre-screens skip non-matching regex chains Every log record passes through `RedactingFormatter.format` which calls `redact_sensitive_text`, which historically ran ALL 13 secret-pattern regexes against every line — including DB connection strings, JWTs, Discord mentions, Signal phone numbers, etc. — even for typical clean log records like 'INFO run_agent: API call completed'. Add cheap substring pre-checks before each regex pass. False positives still run the regex (which then matches nothing); false negatives are impossible because every pattern requires the gated substring to match its leading anchor: - `_PREFIX_RE` gated on any of 33 known credential prefix substrings - `_ENV_ASSIGN_RE` gated on `=` in text - `_JSON_FIELD_RE` gated on `:` and `"` in text - `_AUTH_HEADER_RE` gated on `uthorization`/`UTHORIZATION` in text - `_TELEGRAM_RE` gated on `:` in text - `_PRIVATE_KEY_RE` gated on `BEGIN` and `-----` - `_DB_CONNSTR_RE` gated on `://` in text - `_JWT_RE` gated on `eyJ` in text - URL userinfo/query gated on `://` - `_redact_form_body` gated on `&` and `=` - `_DISCORD_MENTION_RE` gated on `<@` - `_SIGNAL_PHONE_RE` gated on `+` Microbench (5 typical log records, 20k iterations each): BEFORE AFTER delta redact_sensitive_text per call 5.63us 1.79us -68% Real-world impact: ~244 log records emitted in a 30-turn agent loop, so the chain saves ~1ms of CPU per conversation. Bigger win is the reduction in regex execution and GC pressure during heavy logging sessions (verbose logging, gateway message processing). Security regression test: 30 secret-containing inputs (sk-/ghp_/JWT/DB connstr/Auth-Bearer/private key/URL userinfo/Discord/Signal/etc.) verified to produce identical redacted output before/after. All 75 existing tests/agent/test_redact.py cases pass. The `?access_token=foo&code=bar` (bare query string, no scheme) case that 'leaks' is pre-existing behavior — the URL query redaction requires a well-formed URL with scheme+host. Not a regression. * perf(run_agent): cache _needs_thinking_reasoning_pad result per (provider, model, base_url) Profile of a 31-turn synthetic agent run shows `_needs_thinking_reasoning_pad` fires 495 times (~16 per turn) and each call ran 3 helper methods, each hitting `base_url_host_matches` 1-4 times via `urlparse`. Total cost: 3,342 base_url_host_matches calls + 3,373 urlparse calls accounting for ~36ms of agent-loop overhead (~7% of the entire post-network work). Provider / model / base_url don't change during a conversation except via `switch_model` and fallback activation — both of which already overwrite those attributes atomically. Cache the result on a tuple key; since the key is derived from the very fields that would change, the cache auto-invalidates on the next read after a switch. No manual invalidation needed in switch_model / _try_activate_fallback. Profile A/B (31-turn cached-config agent run): BEFORE AFTER delta _needs_thinking_reasoning_pad cum 18ms 1ms -94% _copy_reasoning_content_for_api cum 17ms 1ms -94% base_url_host_matches calls 3,342 372 -89% urlparse calls 3,373 403 -88% total function calls 296k 223k -25% Verified: - tests/run_agent/test_deepseek_reasoning_content_echo.py: 36/36 pass - tests/run_agent/ (full): 1383/1383 pass + 3 skipped	2026-05-19 14:25:10 -07:00
oseftg	700f3b13e7	fix: recognize emoji and caret as natural response endings GLM models via Ollama report finish_reason='stop' even when the response was truncated by max_tokens. The continuation mechanism uses _has_natural_response_ending() as one of the heuristics to detect whether the response was genuinely finished. Currently only ASCII punctuation and CJK punctuation are recognized. This means any response ending with an emoji (e.g. ⚡, 👍) or the caret character ^ (common in French ^^ smiley) is not recognized as naturally ended, triggering a false-positive continuation where the model receives 'Continue where you left off' and produces garbled output. Add: - ^ (caret) to the punctuation set - Unicode emoji range (codepoint >= 0x1F300) as natural ending This only affects GLM/Ollama users but the fix is safe for all backends since _has_natural_response_ending() is only consulted inside the continuation flow.	2026-05-18 19:37:39 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102 ) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
Robin Fernandes	20bffa5b37	refactor(auth): mostly cleanups and style changes	2026-05-17 16:56:37 -07:00
Robin Fernandes	0bac7dd05b	refactor(auth): collapse Nous inference fallback controls	2026-05-17 16:56:37 -07:00
teknium1	532b209f01	fix(run_agent): scope kimi tool-reasoning trigger to host, not model name substring	2026-05-17 13:09:24 -07:00
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355 ) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
teknium1	aa05ffba53	fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184 ) Original commit `2b193907d` by Teknium added a new module-level _StreamErrorEvent class and threaded its raise into _run_codex_create_stream_fallback in pre-refactor run_agent.py. - _StreamErrorEvent class → run_agent.py (module-level, next to _qwen_portal_headers; class needs to be top-level for the codex runtime to import it) - The fallback event-loop's 'type=error' handler → agent/codex_runtime.py where run_codex_create_stream_fallback now lives. Imports _StreamErrorEvent lazily from run_agent to avoid circular import. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-16 23:41:09 -07:00
teknium1	fe4c87eb28	fix(agent): retry malformed anthropic stream parser errors — port to extracted modules Original commit `9c304a7f5` by helix4u targeted _flatten_exception_chain, _summarize_api_error, and the _call streaming retry loop in pre-refactor run_agent.py. Re-applied to: - New _is_provider_stream_parse_error helper → run_agent.py (next to _flatten_exception_chain in the AIAgent class) - _summarize_api_error early-return for the malformed-streaming ValueError → run_agent.py (kept method body) - _call streaming retry: _is_stream_parse_err flag wired into _is_transient AND the post-exhaustion branch + dedicated malformed-streaming user-status string → agent/chat_completion_helpers.py (the _call body now lives there) Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>	2026-05-16 23:35:54 -07:00
teknium1	6975a2d9ae	fix(xai-oauth): entitlement-403 chain — final state (`ce0e189d3` + `9818b9a1a` + `6784c8079` + `dffb602f3`) Collapses the four-commit xAI entitlement-403 chain to its final on-main state, ported to the post-refactor module layout: - Added _is_entitlement_failure on AIAgent (run_agent.py) — detects Grok subscription-shape 403s on (401\|403\|None) status codes. - Added entitlement-skip branch to recover_with_credential_pool (agent/agent_runtime_helpers.py) — breaks the refresh-loop that Don's 100-iteration trace exposed when a Premium+ user hit a real entitlement issue. - Removed _decorate_xai_entitlement_error and unwrapped its two _summarize_api_error call sites — xAI's own body text already points users at grok.com/?_s=usage so we surface that verbatim (`dffb602f3` reasoning: X Premium subs DO now work per xAI's 2026-05-16 announcement, so editorialising would misdirect). - grok-4.3 1M context entry landed in agent/model_metadata.py via the prior merge — no additional port needed. Tests already on disk (tests/run_agent/test_codex_xai_oauth_recovery.py) assert _is_entitlement_failure shape and verbatim body surfacing. Closes #27110. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-16 23:33:18 -07:00
teknium1	6362e71973	fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s Original commit `31ba2b0cb` by Teknium targeted run_codex_stream() at its pre-refactor location in run_agent.py. Re-applied: - Prelude error retry/fallback → agent/codex_runtime.py (in run_codex_stream where the body now lives) - _decorate_xai_entitlement_error helper + _summarize_api_error wrapping → run_agent.py (these methods remained on AIAgent as @staticmethod's; cherry-pick applied them cleanly) The xai-oauth provider gate, encrypted_content drop on replay, etc. landed in agent/codex_responses_adapter.py via the prior merge from main. Closes #8133, #14634 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-16 23:28:05 -07:00
teknium1	27df249564	feat(nvidia): add NIM billing origin header — port to extracted modules Original commit `13c3d4b4e` by kchantharuan touched __init__ and _apply_client_headers_for_base_url in pre-refactor run_agent.py. Re-applied to: - __init__: agent/agent_init.py (3 hunks — NVIDIA branch + _custom_headers fallback in routed-client and fallback-client paths) - _apply_client_headers_for_base_url: still in run_agent.py (1 hunk) build_nvidia_nim_headers was already present in agent/auxiliary_client.py from the prior merge — no additional port needed. Co-authored-by: kchantharuan <kchantharuan@nvidia.com>	2026-05-16 23:25:11 -07:00
teknium1	b07524e53a	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider — port to extracted modules Original commit `b62c99797` by Jaaneek targeted six locations in pre-refactor run_agent.py. Re-applied to the extracted post-PR locations: - api_mode dispatch → agent/agent_init.py - is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py - codex_auth_retry block + 401 hint → agent/conversation_loop.py - _try_refresh_codex_client_credentials body → run_agent.py (kept) The non-run_agent.py portions of the commit (auxiliary_client, codex transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly from main via the prior merge commit. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-16 23:23:38 -07:00
hermesagent26	94b3131be7	fix(run_agent): detect kimi models via model name for reasoning pad previously only checked provider ID and base URL. When kimi-k2.6 is served via ollama-cloud (or any third-party provider), provider is not 'kimi-coding' and base URL is not api.kimi.com — so reasoning_content pad was never injected. This caused HTTP 400 from Ollama Cloud's Go backend: 'invalid message content type: map[string]interface {}'. Fix: add model-name detection ('kimi' in model.lower()) so any route serving a kimi model gets the required reasoning_content echo-back. Refs the 400/401 Telegram errors where kimi-k2.6 via ollama-cloud consistently failed after tool-call turns. (cherry picked from commit `9a9f8a6d99`)	2026-05-16 23:19:17 -07:00
Matthew Lai	8f3bc17db9	feat(agent): Added gemma 4 to reasoning allowlist (cherry picked from commit `7244116b68`)	2026-05-16 23:19:17 -07:00
teknium1	47823790b0	refactor(run_agent): review fixes — keyword-forward __init__, drop dead code, tighten guards Four fixes from PR #27248 review: 1. __init__ forwarder is now keyword-forwarded (daimon-nous review). Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64 params positionally to agent.agent_init.init_agent, so adding a 65th param on main would require three lockstep edits (signature, init_agent signature, forwarder call) or silently shift every value. Keyword forwarding makes this trivially safe — adding a param now only needs the two signatures and one extra keyword line. 2. Drop dead _ra() in agent/codex_runtime.py (daimon-nous + Copilot). The lazy run_agent reference was defined but never called inside this module — the codex paths use agent.* accessors only. 3. Drop unused imports in agent/codex_runtime.py (Copilot): contextvars, threading, time, uuid, Optional. Carried over from run_agent.py during the original extraction. 4. Tighten three source-introspection test guards (Copilot): - test_memory_nudge_counter_hydration.py — was scanning the concatenated source of run_agent.py + agent/conversation_loop.py and matching self.X or agent.X form. Now asserts the hydration block lives in agent/conversation_loop.py specifically with the agent.X form — the body never moves back, so if it ever drifts a future re-introduction fails the guard. - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on agent.iteration_budget = IterationBudget exactly (was just iteration_budget = IterationBudget) so an unrelated identifier ending in iteration_budget can't match. - test_run_agent.py::TestMemoryProviderTurnStart — assert the agent._user_turn_count form directly (the extracted body uses agent.X, not self.X — accepting either was a transitional fudge). - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py only, not the concatenation. Not addressed in this commit: * Pre-existing bugs in agent/tool_executor.py (heartbeat index mismatch when calls are blocked, _current_tool clobber in result loop, blocked-counted-as-completed in spinner summary, dead result_preview computation). These were preserved byte-for-byte from the original _execute_tool_calls_concurrent — worth a separate follow-up PR with proper tests. * _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged by any of the original test patches (nothing actually does isinstance(x, OpenAI) against the proxy instance). * agent_init.py:949 mem_config potential NameError — pre-existing; only triggers if _agent_cfg.get('memory', {}) itself raises, which it can't with a stock dict. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing test_auxiliary_client failure (unchanged). run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded init call's verbosity). Final: 16083 -> 3937 (-12146, 75% reduction).	2026-05-16 22:55:49 -07:00
teknium1	94c3e0ab8e	refactor(run_agent): extract 10 more helpers to agent/agent_runtime_helpers.py Final extraction pass — the methods left over after run_conversation and __init__ moved out. Together these 10 cover ~813 LOC of medium- sized helpers: * switch_model (194 LOC) — model switching mid-session * _invoke_tool (87) — central tool dispatch with overrides * _repair_tool_call (72) — argument JSON repair entrypoint * _sanitize_api_messages (71) — role-filter for API send * _looks_like_codex_intermediate_ack (72) — codex transcript heuristic * _copy_reasoning_content_for_api (70) — reasoning preservation * _cleanup_dead_connections (70) — periodic dead-socket sweep * _extract_api_error_context (65) — error-dump context builder * _apply_pending_steer_to_tool_results (63) — /steer injection * _force_close_tcp_sockets (59) — aggressive socket cleanup AIAgent keeps thin forwarder methods for all 10 (staticmethods preserved where present). Names tests patch on run_agent (handle_function_call, AIAgent class attrs, logger) routed through _ra() so the patch surface is preserved. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as on main). run_agent.py: 4634 -> 3821 lines (-813). Final total: 16083 -> 3821 (-12262, 76% reduction).	2026-05-16 20:35:19 -07:00
teknium1	9f408989c4	refactor(run_agent): extract __init__ (1,381 LOC) to agent/agent_init.py The largest method left on AIAgent (60+ parameters, the entire startup sequence — credential resolution, provider auto-detection, context engine bootstrap, memory store hydration, plugin lifecycle hooks) moves into agent/agent_init.py. AIAgent.__init__ is now a thin wrapper that calls agent.agent_init.init_agent(self, ...) with the original full parameter list preserved. Module-level run_agent names referenced in the body (_openrouter_prewarm_done, _qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI, get_tool_definitions, check_toolset_requirements) are resolved through _ra() so test patches on those names keep working. agent_init's logger warnings are routed via _ra().logger so tests patching run_agent.logger capture them (TestStringKSuffixContextLengthWarns, TestCustomProvidersInvalidContextLengthWarns). Live E2E reconfirmed on three model paths (openai/gpt-5.4, anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 5944 -> 4564 lines (-1380). Total reduction since baseline: 16083 -> 4564 (-11519, 72%).	2026-05-16 19:43:38 -07:00
teknium1	0530252384	refactor(run_agent): extract run_conversation to agent/conversation_loop.py The 3,877-line run_conversation body — the agent loop itself — moves out of run_agent.py into a dedicated module. AIAgent.run_conversation is now a thin forwarder that delegates to agent.conversation_loop.run_conversation with the AIAgent instance as the first argument. This is the largest single extraction in the run_agent.py refactor. The body keeps all 163 self.X references intact (rewritten as agent.X), all nested closures, all retry/backoff/compression machinery. Symbols that tests or callers patch on run_agent (_set_interrupt, handle_function_call, AIAgent class attrs) are resolved through _ra() inside the extracted module so the patch surface is preserved. Five tests doing inspect.getsource(AIAgent.run_conversation) updated to scan agent.conversation_loop.run_conversation. Two source-introspection tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart) updated to accept either self.X (legacy) or agent.X (extracted form) in the matched assertions. Live E2E verified on three model paths: * openai/gpt-5.4 (OpenAI chat completions via OpenRouter) * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter) * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path) Plus read_file tool execution, terminal tool, web_search. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client::test_custom_endpoint... — same as on main). run_agent.py: 9800 -> 5944 lines (-3856). Total reduction since baseline: 16083 -> 5944 (-10139, 63%).	2026-05-16 19:26:52 -07:00
teknium1	d35ee7bcdd	refactor(run_agent): move review prompts to agent/background_review.py The three big review-prompt strings (_MEMORY_REVIEW_PROMPT, _SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move out of the AIAgent class body and into agent/background_review.py where they're consumed. AIAgent re-exposes them as class attributes via 'from ... import' inside the class body — Python binds those names into the class namespace so existing AIAgent._MEMORY_REVIEW_PROMPT references keep working. spawn_background_review_thread also falls back to the module-level constants if an agent doesn't have the attribute (preserves the test pattern of mocking these on the agent). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 9986 -> 9800 lines (-186).	2026-05-16 19:11:58 -07:00
teknium1	c42fa94afc	refactor(run_agent): extract Codex runtime + assorted helpers to dedicated modules Two new modules: * agent/codex_runtime.py — three Codex API-mode methods - run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver - run_codex_stream (125 LOC) — Codex Responses API stream - run_codex_create_stream_fallback (78 LOC) — fallback after Responses stream=true initial create failure * agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments (static), repair_message_sequence, strip_think_blocks, recover_with_credential_pool, try_recover_primary_transport, drop_thinking_only_and_merge_users (static), restore_primary_runtime, extract_reasoning, dump_api_request_debug, anthropic_prompt_cache_policy, create_openai_client AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class attrs) are routed through _ra() to honor the patch contract. The _TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport and is referenced as a module-level constant in the extracted code. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 11391 -> 9887 lines (-1504).	2026-05-16 19:03:30 -07:00
teknium1	0430e71ec9	refactor(run_agent): extract streaming API caller (893 LOC) to agent/chat_completion_helpers.py Move _interruptible_streaming_api_call out of run_agent.py — the biggest single method in the file. Body lives next to interruptible_api_call in agent/chat_completion_helpers.py so streaming + non-streaming code share one home. Nested closures (_call_chat_completions, _call_anthropic, the codex stream branch) all come along with the body and still capture the parent function's locals as expected. AIAgent keeps a thin forwarder method. is_local_endpoint added to the import block (used by the stream stale-timeout disable logic). One source-introspection test in TestAnthropicInterruptHandler is updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call instead of AIAgent._interruptible_streaming_api_call. tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 12277 -> 11385 lines (-892).	2026-05-16 18:48:22 -07:00
teknium1	4b25619bc4	refactor(run_agent): extract chat-completion helpers to agent/chat_completion_helpers.py Six methods move into a new module — bodies live there, AIAgent keeps thin forwarder methods so call sites and tests are unchanged. * interruptible_api_call — non-streaming API call with interrupt handling * build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs * build_assistant_message — normalize assistant message dict (reasoning, tool_calls, codex passthrough fields, alibaba glm-4.7 quirk) * try_activate_fallback — provider fallback chain activation * handle_max_iterations — controlled stop when iteration budget exhausts * cleanup_task_resources — per-turn VM + browser teardown (skipped for persistent environments) Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed through _ra() so the patch surface is preserved. Two TestAnthropicInterruptHandler source-introspection tests were updated to scan agent.chat_completion_helpers.interruptible_api_call instead of AIAgent._interruptible_api_call — the body lives in the extracted module now. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13282 -> 12253 lines (-1029).	2026-05-16 18:41:44 -07:00
teknium1	57f6762ca0	refactor(run_agent): extract stream diagnostics to agent/stream_diag.py Move the five stream-drop diagnostic helpers + the headers tuple: * STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc. * stream_diag_init — fresh per-attempt diagnostic dict * stream_diag_capture_response — snapshot upstream headers + HTTP status * flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering * log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb * emit_stream_drop — user-facing status line + activity touch AIAgent keeps thin forwarder methods (and exposes the headers tuple as _STREAM_DIAG_HEADERS for back-compat). All test patches and call sites unchanged. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 13470 -> 13227 lines (-243).	2026-05-16 18:28:17 -07:00
teknium1	79559214a6	refactor(run_agent): extract tool execution to agent/tool_executor.py Move the two big tool-dispatch methods out of run_agent.py: * execute_tool_calls_concurrent — 408-line concurrent path (interrupt pre-flight, guardrail+plugin block, callback fan-out, ContextVar- preserving ThreadPoolExecutor, periodic heartbeats for the gateway inactivity monitor, per-tool result handling with subdir hints + guardrail observations + checkpoint, /steer drain) * execute_tool_calls_sequential — 441-line sequential path (the original behavior used for single-tool batches and interactive tools) Both take the parent AIAgent as their first argument; AIAgent keeps thin forwarders so call sites unchanged. handle_function_call is routed through _ra() so tests that patch run_agent.handle_function_call keep working. _set_interrupt likewise. The AST guard in test_tool_executor_contextvar_propagation.py is updated to scan both run_agent.py AND agent/tool_executor.py so it still catches the executor.submit(_run_tool, ...) regression regardless of which file the body lives in. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14309 -> 13461 lines (-848).	2026-05-16 18:24:05 -07:00
teknium1	2d2cd5e904	refactor(run_agent): extract system-prompt builder to agent/system_prompt.py Four AIAgent methods move into a dedicated module: * build_system_prompt_parts — three-tier stable/context/volatile dict * build_system_prompt — joiner used at session start * invalidate_system_prompt — drop cache + reload memory * format_tools_for_system_message — trajectory-format tool dump The extracted helpers look up patch-target names (load_soul_md, build_skills_system_prompt, get_toolset_for_tool, build_environment_hints, build_context_files_prompt, build_nous_subscription_prompt) through the run_agent module via _ra() instead of importing them directly. That preserves the patch surface tests rely on (patch('run_agent.load_soul_md', ...) and friends). AIAgent keeps thin forwarder methods. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 14555 -> 14292 lines (-263).	2026-05-16 18:16:20 -07:00
teknium1	5311d9959e	refactor(run_agent): extract context compression to agent/conversation_compression.py Move four compression-related methods to a dedicated module: * check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor * replay_compression_warning — re-emit stored warning through gateway status_callback * compress_context — run compressor, split SQLite session, notify plugins+memory * try_shrink_image_parts_in_messages — image-too-large recovery via re-encode AIAgent keeps thin forwarder methods so existing call sites and tests that patch run_agent.AIAgent methods keep working. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 15013 -> 14535 lines (-478).	2026-05-16 18:09:33 -07:00
teknium1	1f6eb1738c	refactor(run_agent): extract background memory/skill review to agent/background_review.py Move the background-review subsystem (the self-improvement loop — see the README) out of run_agent.py into a dedicated module. * summarize_background_review_actions — was the @staticmethod that builds the user-facing action summary * spawn_background_review_thread — builds the thread target + prompt; the actual review loop body (forked AIAgent, runtime inheritance, tool whitelist, suppression, teardown) lives in _run_review_in_thread * build_memory_write_metadata — provenance for external memory mirrors AIAgent keeps thin wrappers for backward compatibility AND because tests patch run_agent.threading.Thread to assert lifecycle behavior — the threading.Thread construction stays in AIAgent._spawn_background_review, the inner work moves out. tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure (test_auxiliary_client.py::test_custom_endpoint... — confirmed failing on main before this change). 3 skipped. run_agent.py: 15272 -> 14972 lines (-300).	2026-05-16 18:05:01 -07:00

1 2 3 4 5 ...

965 commits