897 commits

Author SHA1 Message Date
flamiinngo
dbeaaa47f2 refactor(security): extract _block_message helper to unify block logic in _parse_response
Both the `action=block` and `decision=block` branches in _parse_response
shared identical field-priority and type-validation logic. Extract it into
a single _block_message(primary, secondary) helper so the two branches are
one line each and the type guard lives in exactly one place.

No functional change: existing tests (TestParseResponse, 14 tests) all
pass unchanged, confirming identical behaviour.
2026-05-17 02:31:18 -07:00
flamiinngo
63805965e7 fix(security): restore type safety and extract constant in shell hook block handler
Address code review feedback on _parse_response:

1. Restore isinstance(raw, str) guard so non-string message/reason values
   (e.g. integers, lists) from a malformed hook response fall back to the
   default rather than being forwarded as-is. This keeps the contract that
   message in the returned dict is always a string.

2. Extract the repeated literal 'Blocked by shell hook.' into a module-level
   constant _DEFAULT_BLOCK_MESSAGE to avoid duplication and make it easy to
   change in one place.

Four new unit tests added to tests/agent/test_shell_hooks.py covering:
- action block with no message (uses default)
- decision block with no reason (uses default)
- action block with empty string message (uses default)
- action block with non-string message, e.g. integer (uses default)
2026-05-17 02:31:18 -07:00
flamiinngo
aeda146112 fix(security): honor shell hook blocks even when message/reason is absent
_parse_response in agent/shell_hooks.py only forwarded a pre_tool_call
block directive if the hook also provided a non-empty message or reason.
When either field was missing the function returned None, causing Hermes
to treat the response as a no-op and execute the tool unconditionally.

This means a hook that outputs {"action": "block"} or {"decision": "block"}
without a reason string is silently ignored. The security boundary fails
open: tools the user intended to gate are executed anyway.

Fix: remove the message-presence guard. Honor the block unconditionally
and fall back to a default message when none is provided. Existing hooks
that already include a message or reason are unaffected.
2026-05-17 02:31:18 -07:00
haran2001
d9abbe7fa4 fix(metadata): qwen3.6-plus has a 1M context window (#27008)
qwen3.6-plus did not have an explicit entry in DEFAULT_CONTEXT_LENGTHS,
so the longest-substring fallback matched the generic 'qwen': 131072
catch-all. That dropped the effective context limit from 1,048,576
tokens to 131,072, prematurely lowered the compression threshold, and
produced misleading warnings about main/compression context mismatch
in long sessions.

Add an explicit 'qwen3.6-plus': 1048576 entry before the catch-all and
cover it with a regression test (bare, qwen/, and dashscope/ prefixes).

Note: PR #6599 also mentions touching model_metadata.py but the actual
diff only edits hermes_cli/models.py, so this fix is independent and
not duplicated by that PR.

Closes #27008
2026-05-17 02:31:18 -07:00
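The failure mode described in d9abbe7fa4 can be illustrated with a small sketch of a longest-substring lookup. The function name lookup_context_length and the 128,000-token fallback are hypothetical, and the real lookup may depend on dictionary order rather than on match length. Only the table name, the two entries, and the three prefixes come from the commit message.

```python
DEFAULT_CONTEXT_LENGTHS = {
    "qwen3.6-plus": 1_048_576,  # explicit entry added by the fix
    "qwen": 131_072,            # generic catch-all
}

_UNKNOWN_MODEL_FALLBACK = 128_000  # hypothetical default for unmatched models


def lookup_context_length(model: str, table: dict = DEFAULT_CONTEXT_LENGTHS) -> int:
    """Pick the entry whose key is the longest substring of the model name."""
    name = model.lower()
    matches = [key for key in table if key in name]
    if not matches:
        return _UNKNOWN_MODEL_FALLBACK
    return table[max(matches, key=len)]


# Without the explicit entry, only the catch-all can match.
before_fix = {"qwen": 131_072}
for model in ("qwen3.6-plus", "qwen/qwen3.6-plus", "dashscope/qwen3.6-plus"):
    print(model, lookup_context_length(model, before_fix), lookup_context_length(model))
```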
kshitij
5fba236644
chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.

All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.

Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
  gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
  tests/tools/: 18187 passed, 59 pre-existing failures (verified against
  origin/main with the same shape — identical failure count, identical
  category — all xdist test-order flakes unrelated to this change)

Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
2026-05-17 02:29:41 -07:00
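As a small illustration of why 5fba236644 stresses that every changed value is a hashable scalar, the sketch below compares the two forms. The function names and literal values are made up for the example. For hashable values the results agree; for an unhashable value the tuple form returns False while the set form raises TypeError.

```python
def in_tuple(value) -> bool:
    # Pre-fix shape: linear scan over the tuple.
    return value in ("read_file", "terminal", None)


def in_set(value) -> bool:
    # Post-fix shape: hash lookup in a set literal.
    return value in {"read_file", "terminal", None}


for candidate in ("terminal", "web_search", None, 7):
    assert in_tuple(candidate) == in_set(candidate)

# The one place the two forms differ: unhashable operands.
print(in_tuple(["terminal"]))   # False, compared element by element
try:
    in_set(["terminal"])
except TypeError as exc:
    print("set membership raised:", exc)
```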
teknium1
563b4d9e51
fix: strip image parts for non-vision models with provider profiles + getattr-safe _custom_providers
Original commit 75e5d0f6b by hueilau targeted _build_api_kwargs in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.build_api_kwargs — re-applied there.

Also: switch the custom_providers forward (from 21078ebce) to use
getattr() — tests build a bare AIAgent via __new__ and would otherwise
hit AttributeError on _custom_providers.

Co-authored-by: hueilau <33933019+hueilau@users.noreply.github.com>
2026-05-16 23:47:51 -07:00
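The getattr() change in 563b4d9e51 guards against agents that never ran __init__. A minimal sketch of that situation follows; the Agent class and the helper custom_providers_of are hypothetical stand-ins, not the real AIAgent code.

```python
class Agent:
    def __init__(self, custom_providers=None):
        self._custom_providers = custom_providers or []


def custom_providers_of(agent):
    """Read the attribute defensively so bare test instances do not crash."""
    return getattr(agent, "_custom_providers", None)


# Tests sometimes build an instance without running __init__ at all.
bare = Agent.__new__(Agent)

print(custom_providers_of(bare))                        # None
print(custom_providers_of(Agent([{"name": "local"}])))  # [{'name': 'local'}]
```

Plain attribute access (bare._custom_providers) would raise AttributeError on the bare instance, which is the failure the commit describes.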
teknium1
36ad8336f9
fix(run_agent): guard memory provider init against empty/whitespace string
Original commit 8d756a421 by austrian_guy targeted __init__ in
pre-refactor run_agent.py. The body now lives in
agent/agent_init.init_agent — re-applied there.

Co-authored-by: austrian_guy <33156212+ether-btc@users.noreply.github.com>
2026-05-16 23:43:09 -07:00
teknium1
4ece521bcf
fix(run_agent): isolate background review fork from external memory plugins (#27190)
Original commit 973f27e95 by Teknium targeted _spawn_background_review in
pre-refactor run_agent.py. The body now lives in
agent/background_review._spawn_background_review — re-applied there.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:42:49 -07:00
teknium1
b5bcffe167
fix(fallback): forward custom_providers to fallback model context-length detection
Original commit 21078ebce by PaTTeeL targeted _try_activate_fallback in
pre-refactor run_agent.py. The body now lives in
agent/chat_completion_helpers.try_activate_fallback — re-applied there.

Co-authored-by: PaTTeeL <9150277+PaTTeeL@users.noreply.github.com>
2026-05-16 23:42:16 -07:00
teknium1
4ab9a06a51
fix(agent): reset _fallback_index at turn start even when no fallback activated
Original commit 33528b428 by konsisumer targeted _restore_primary_runtime
in pre-refactor run_agent.py. The body now lives in
agent/agent_runtime_helpers.restore_primary_runtime — re-applied there.

Fixes #20465

Co-authored-by: konsisumer <der@konsi.org>
2026-05-16 23:41:45 -07:00
teknium1
aa05ffba53
fix(xai): surface provider 'error' SSE frame in Codex fallback stream (#27184)
Original commit 2b193907d by Teknium added a new module-level
_StreamErrorEvent class and threaded its raise into
_run_codex_create_stream_fallback in pre-refactor run_agent.py.

  - _StreamErrorEvent class → run_agent.py (module-level, next to
    _qwen_portal_headers; class needs to be top-level for the codex
    runtime to import it)
  - The fallback event-loop's 'type=error' handler → agent/codex_runtime.py
    where run_codex_create_stream_fallback now lives. Imports
    _StreamErrorEvent lazily from run_agent to avoid circular import.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:41:09 -07:00
teknium1
80fa92a491
fix(codex): rotate pool on usage limit 429 — port to extracted modules
Original commit e51d74ab9 by Maxim Esipov targeted _extract_api_error_context
and _recover_with_credential_pool in pre-refactor run_agent.py. Both bodies
now live in agent/agent_runtime_helpers.py — re-applied to that module:

  - extract_api_error_context: payload.get('type') added to the reason
    fallback chain (Codex error bodies use 'type' instead of 'code'/'error')
  - recover_with_credential_pool: usage_limit_reached detection in the
    rate_limit branch — skip the retry-once-then-rotate dance and rotate
    immediately when the body says the per-account usage limit was hit.

Co-authored-by: Maxim Esipov <maksesipov@gmail.com>
2026-05-16 23:39:41 -07:00
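The two changes in 80fa92a491 can be sketched roughly as below. This is an illustration under assumptions, not the code in agent/agent_runtime_helpers.py: the function names extract_reason and next_action, the key order, and the "retry" / "rotate" / "raise" return values are all hypothetical. Only the 'code' / 'error' / 'type' keys and the usage_limit_reached marker come from the commit message.

```python
from typing import Optional


def extract_reason(payload: dict) -> Optional[str]:
    """Walk the reason fallback chain; 'type' is the key Codex bodies use."""
    for key in ("code", "error", "type"):
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def next_action(status_code: int, payload: dict, already_retried: bool) -> str:
    """Decide what to do with a failed call (hypothetical return values)."""
    if status_code != 429:
        return "raise"
    if extract_reason(payload) == "usage_limit_reached":
        # Retrying the same credential cannot help: rotate immediately.
        return "rotate"
    # Ordinary rate limit: retry once, then rotate.
    return "rotate" if already_retried else "retry"
```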
teknium1
df22d29522
fix(copilot): GitHub Models 413 hint — port to extracted conversation_loop
Original commits 4ded3ede3 (@konsisumer) + 374dc81c2 (Teknium) added a
413 hint to run_agent.py's agent loop. Final-state version (the sharpened
374dc81c2 wording) ported to agent/conversation_loop.py, where the
payload_too_large branch now lives.

The deprecation detection + _URL_TO_PROVIDER changes from both commits
landed in agent/copilot_acp_client.py and agent/model_metadata.py via
the prior merge.

Closes #10648

Co-authored-by: konsisumer <der@konsi.org>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:38:45 -07:00
teknium1
3fbedd732e
feat: add supports_parallel_tool_calls for MCP servers (#26825) — port to tool_dispatch_helpers
Original commit 395e9dd9e by Teknium targeted module-level _is_mcp_tool_parallel_safe
and _should_parallelize_tool_batch helpers in pre-refactor run_agent.py. Both
helpers now live in agent/tool_dispatch_helpers.py — re-applied to that
module.

The tools/mcp_tool.py portion (the public is_mcp_tool_parallel_safe API
+ _parallel_safe_servers tracking) merged cleanly from main via the prior
merge commit.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:36:37 -07:00
teknium1
fe4c87eb28
fix(agent): retry malformed anthropic stream parser errors — port to extracted modules
Original commit 9c304a7f5 by helix4u targeted _flatten_exception_chain,
_summarize_api_error, and the _call streaming retry loop in pre-refactor
run_agent.py. Re-applied to:

  - New _is_provider_stream_parse_error helper → run_agent.py (next
    to _flatten_exception_chain in the AIAgent class)
  - _summarize_api_error early-return for the malformed-streaming
    ValueError → run_agent.py (kept method body)
  - _call streaming retry: _is_stream_parse_err flag wired into
    _is_transient AND the post-exhaustion branch + dedicated
    malformed-streaming user-status string → agent/chat_completion_helpers.py
    (the _call body now lives there)

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-16 23:35:54 -07:00
teknium1
f885be030c
fix(auxiliary): resolve xai oauth compression from pool — port to conversation_compression
Original commit 97a32afdc by helix4u targeted _check_compression_model_feasibility
in pre-refactor run_agent.py. The function body now lives in
agent/conversation_compression.py — re-applied the configured-but-unavailable
provider message there.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-16 23:33:59 -07:00
teknium1
6975a2d9ae
fix(xai-oauth): entitlement-403 chain — final state (ce0e189d3 + 9818b9a1a + 6784c8079 + dffb602f3)
Collapses the four-commit xAI entitlement-403 chain to its final
on-main state, ported to the post-refactor module layout:

  - Added _is_entitlement_failure on AIAgent (run_agent.py) — detects
    Grok subscription-shape 403s on (401|403|None) status codes.
  - Added entitlement-skip branch to recover_with_credential_pool
    (agent/agent_runtime_helpers.py) — breaks the refresh-loop that
    Don's 100-iteration trace exposed when a Premium+ user hit a real
    entitlement issue.
  - Removed _decorate_xai_entitlement_error and unwrapped its two
    _summarize_api_error call sites — xAI's own body text already
    points users at grok.com/?_s=usage so we surface that verbatim
    (dffb602f3 reasoning: X Premium subs DO now work per xAI's
    2026-05-16 announcement, so editorialising would misdirect).
  - grok-4.3 1M context entry landed in agent/model_metadata.py
    via the prior merge — no additional port needed.

Tests already on disk (tests/run_agent/test_codex_xai_oauth_recovery.py)
assert _is_entitlement_failure shape and verbatim body surfacing.

Closes #27110.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:33:18 -07:00
teknium1
6362e71973
fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s
Original commit 31ba2b0cb by Teknium targeted run_codex_stream() at
its pre-refactor location in run_agent.py. Re-applied:

  - Prelude error retry/fallback → agent/codex_runtime.py (in
    run_codex_stream where the body now lives)
  - _decorate_xai_entitlement_error helper + _summarize_api_error
    wrapping → run_agent.py (these methods remained on AIAgent
    as @staticmethods; the cherry-pick applied them cleanly)

The xai-oauth provider gate, encrypted_content drop on replay, etc.
landed in agent/codex_responses_adapter.py via the prior merge from main.

Closes #8133, #14634

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-16 23:28:05 -07:00
teknium1
27df249564
feat(nvidia): add NIM billing origin header — port to extracted modules
Original commit 13c3d4b4e by kchantharuan touched __init__ and
_apply_client_headers_for_base_url in pre-refactor run_agent.py. Re-applied to:

  - __init__: agent/agent_init.py (3 hunks — NVIDIA branch + _custom_headers
    fallback in routed-client and fallback-client paths)
  - _apply_client_headers_for_base_url: still in run_agent.py (1 hunk)

build_nvidia_nim_headers was already present in agent/auxiliary_client.py
from the prior merge — no additional port needed.

Co-authored-by: kchantharuan <kchantharuan@nvidia.com>
2026-05-16 23:25:11 -07:00
teknium1
b07524e53a
feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider — port to extracted modules
Original commit b62c99797 by Jaaneek targeted six locations in
pre-refactor run_agent.py. Re-applied to the extracted post-PR locations:

  - api_mode dispatch → agent/agent_init.py
  - is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py
  - codex_auth_retry block + 401 hint → agent/conversation_loop.py
  - _try_refresh_codex_client_credentials body → run_agent.py (kept)

The non-run_agent.py portions of the commit (auxiliary_client, codex
transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly
from main via the prior merge commit.

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
2026-05-16 23:23:38 -07:00
teknium1
7d221aa1f2
fix(langfuse): complete observability fix — port to extracted conversation_loop
Original commit db84a78e6 by kshitij targeted run_conversation()'s
pre_api_request and post_api_request hooks in pre-refactor run_agent.py.
Re-applied to the extracted location in agent/conversation_loop.py.

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
2026-05-16 23:21:51 -07:00
teknium1
a77ca9295e
perf(run_agent): accumulate length-continuation prefix via list+join
Original commit 4f8aaf104 by InB4DevOps targeted run_conversation() in
the pre-refactor run_agent.py. Re-applied to the extracted location in
agent/conversation_loop.py.

Co-authored-by: InB4DevOps <tolle.lege+github@gmail.com>
2026-05-16 23:20:27 -07:00
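The list+join pattern named in a77ca9295e looks roughly like this. Both function names are illustrative; the real accumulation happens inside the conversation loop and is not shown in the commit message.

```python
def accumulate_by_concat(chunks):
    """Repeated += may copy the growing string on each step."""
    prefix = ""
    for chunk in chunks:
        prefix += chunk
    return prefix


def accumulate_by_join(chunks):
    """Collect the pieces in a list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


pieces = ["The answer ", "continues ", "here."]
assert accumulate_by_concat(pieces) == accumulate_by_join(pieces)
```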
teknium1
152d42d1a7
Merge origin/main into pr-27248 (resolving run_agent.py = ours)
run_agent.py taken from HEAD (the extracted forwarder structure). The 25
run_agent.py fixes that landed on main during the PR's life need to be
ported into the agent/* extracted modules in follow-up commits.
2026-05-16 23:16:52 -07:00
phoenixshen
52c89715a2 fix: respect user-configured vision model for OpenRouter
_OPENROUTER_MODEL hardcoded 'google/gemini-3-flash-preview' which
returns 404 on OpenRouter, breaking all vision tasks for users who
rely on the OpenRouter default.  Additionally, _try_openrouter()
ignored the user-configured auxiliary.vision.model entirely.

Changes:
- Update _OPENROUTER_MODEL default to google/gemini-2.5-flash (valid)
- Add optional 'model' parameter to _try_openrouter()
- Pass configured model from _resolve_strict_vision_backend() through
  to _try_openrouter()

This allows users who set auxiliary.vision.model (e.g. x-ai/grok-4.3)
to have it actually used, while maintaining backward compatibility.
2026-05-16 23:11:43 -07:00
zccyman
b389796ae3 fix(auxiliary): resolve api_key_env alias in named custom provider path of resolve_provider_client
In resolve_provider_client(), the named custom provider code path at
~line 2914 only checked the ``key_env`` field when looking for an
environment-variable-based API key. The documented ``api_key_env``
snake_case alias was silently ignored, causing custom providers
configured with ``api_key_env`` to fall through to the
``no-key-required`` placeholder — which produces a confusing 401
(``****ired`` mask) on auth-required remote endpoints.

This mirrors the same fix already applied to run_agent.py in commit
6ddc48b05 (fix(fallback): resolve api_key_env in fallback chain entries).

Also adds a logger.warning() when the placeholder is reached, so
future alias gaps are easier to debug.

Closes #25091
2026-05-16 23:11:43 -07:00
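A minimal sketch of the alias handling described in b389796ae3, assuming a provider entry is a plain dict. The function name resolve_custom_provider_key, the environ parameter, and the warning text are hypothetical. The key_env / api_key_env fields and the no-key-required placeholder are taken from the commit message.

```python
import logging
import os
from typing import Mapping

logger = logging.getLogger(__name__)

_NO_KEY_PLACEHOLDER = "no-key-required"


def resolve_custom_provider_key(entry: Mapping, environ: Mapping = os.environ) -> str:
    """Resolve an API key from either spelling of the env-var field."""
    env_name = entry.get("key_env") or entry.get("api_key_env")
    if env_name:
        value = environ.get(env_name, "").strip()
        if value:
            return value
    # Make the fall-through visible so future alias gaps are easier to debug.
    logger.warning(
        "custom provider %r has no usable API key; using placeholder",
        entry.get("name"),
    )
    return _NO_KEY_PLACEHOLDER
```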
teknium1
47823790b0
refactor(run_agent): review fixes — keyword-forward __init__, drop dead code, tighten guards
Four fixes from PR #27248 review:

1. **__init__ forwarder is now keyword-forwarded** (daimon-nous review).
   Previously the run_agent.AIAgent.__init__ wrapper forwarded all 64
   params positionally to agent.agent_init.init_agent, so adding a
   65th param on main would require three lockstep edits (signature,
   init_agent signature, forwarder call) or silently shift every value.
   Keyword forwarding makes this trivially safe — adding a param now
   only needs the two signatures and one extra keyword line.

2. **Drop dead _ra() in agent/codex_runtime.py** (daimon-nous + Copilot).
   The lazy run_agent reference was defined but never called inside
   this module — the codex paths use agent.* accessors only.

3. **Drop unused imports in agent/codex_runtime.py** (Copilot):
   contextvars, threading, time, uuid, Optional. Carried over from
   run_agent.py during the original extraction.

4. **Tighten four source-introspection test guards** (Copilot):
   - test_memory_nudge_counter_hydration.py — was scanning the
     concatenated source of run_agent.py + agent/conversation_loop.py
     and matching either the self.X or the agent.X form.  Now asserts
     the hydration block lives in agent/conversation_loop.py
     specifically, in the agent.X form. The body is not meant to move
     back to run_agent.py, so a future re-introduction there fails
     the guard.
   - test_run_agent.py::TestMemoryNudgeCounterPersistence — anchor on
     agent.iteration_budget = IterationBudget exactly (was just
     iteration_budget = IterationBudget) so an unrelated identifier
     ending in iteration_budget can't match.
   - test_run_agent.py::TestMemoryProviderTurnStart — assert the
     agent._user_turn_count form directly (the extracted body uses
     agent.X, not self.X — accepting either was a transitional fudge).
   - test_jsondecodeerror_retryable.py — scan agent/conversation_loop.py
     only, not the concatenation.

Not addressed in this commit:

* Pre-existing bugs in agent/tool_executor.py (heartbeat index
  mismatch when calls are blocked, _current_tool clobber in result
  loop, blocked-counted-as-completed in spinner summary, dead
  result_preview computation). These were preserved byte-for-byte from
  the original _execute_tool_calls_concurrent — worth a separate
  follow-up PR with proper tests.
* _OpenAIProxy.__instancecheck__ concern — pre-existing, not flagged
  by any of the original test patches (nothing actually does
  isinstance(x, OpenAI) against the proxy instance).
* agent_init.py:949 mem_config potential NameError — pre-existing;
  only triggers if _agent_cfg.get('memory', {}) itself raises, which
  it can't with a stock dict.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing
test_auxiliary_client failure (unchanged).

run_agent.py: 3821 -> 3937 lines (+116 from the keyword-forwarded
init call's verbosity).  Final: 16083 -> 3937 (-12146, 75% reduction).
2026-05-16 22:55:49 -07:00
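The hazard that item 1 of 47823790b0 describes can be reproduced with a three-parameter toy. The classes below and the parameter names are invented for illustration; the real forwarder passes 64 parameters to agent.agent_init.init_agent.

```python
def init_agent(agent, model, base_url=None, max_iterations=90):
    # Imagine base_url was added later, between the two older parameters.
    agent.model = model
    agent.base_url = base_url
    agent.max_iterations = max_iterations


class PositionalAgent:
    def __init__(self, model, max_iterations=90):
        # Positional forwarding: max_iterations silently lands in base_url.
        init_agent(self, model, max_iterations)


class KeywordAgent:
    def __init__(self, model, max_iterations=90):
        # Keyword forwarding: each value is bound by name, so order drift
        # in init_agent's signature cannot shift it.
        init_agent(self, model=model, max_iterations=max_iterations)


shifted = PositionalAgent("demo-model", 5)
safe = KeywordAgent("demo-model", 5)
print(shifted.base_url, shifted.max_iterations)  # 5 90
print(safe.base_url, safe.max_iterations)        # None 5
```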
shellybotmoyer
1a4e64ba06 fix(credential_pool): parse ISO-string last_status_at during from_dict rehydration (#25516)
2026-05-16 22:54:22 -07:00
0xchainer
4b17c2411a fix(skills): return None instead of truthy stub when skill load fails
build_skill_invocation_message() returns a non-empty placeholder string
('[Failed to load skill: ...]') when the skill exists in the command cache
but loading the actual SKILL.md payload fails. CLI/gateway callers treat
any truthy return value as success, so the failure is silently routed into
the model as if it were a valid skill prompt.

Return None instead, matching the existing behavior for unknown commands,
so callers using 'if msg:' can properly detect the failure.
2026-05-16 22:52:22 -07:00
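The caller-side effect of 4b17c2411a is easiest to see in a toy version. The two builder functions and the failing loader below are hypothetical; only the '[Failed to load skill: ...]' placeholder and the 'if msg:' calling convention come from the commit message.

```python
def failing_loader(name):
    raise OSError(f"cannot read SKILL.md for {name}")


def build_message_with_stub(name, loader):
    """Pre-fix shape: a load failure yields a non-empty placeholder."""
    try:
        return loader(name)
    except OSError as exc:
        return f"[Failed to load skill: {exc}]"


def build_message(name, loader):
    """Post-fix shape: a load failure yields None, like an unknown command."""
    try:
        return loader(name)
    except OSError:
        return None


def caller_sends_to_model(msg) -> bool:
    # Callers treat any truthy return value as success.
    return bool(msg)


print(caller_sends_to_model(build_message_with_stub("demo", failing_loader)))  # True
print(caller_sends_to_model(build_message("demo", failing_loader)))            # False
```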
teknium1
94c3e0ab8e
refactor(run_agent): extract 10 more helpers to agent/agent_runtime_helpers.py
Final extraction pass — the methods left over after run_conversation
and __init__ moved out. Together these 10 cover ~813 LOC of medium-
sized helpers:

* switch_model (194 LOC) — model switching mid-session
* _invoke_tool (87) — central tool dispatch with overrides
* _repair_tool_call (72) — argument JSON repair entrypoint
* _sanitize_api_messages (71) — role-filter for API send
* _looks_like_codex_intermediate_ack (72) — codex transcript heuristic
* _copy_reasoning_content_for_api (70) — reasoning preservation
* _cleanup_dead_connections (70) — periodic dead-socket sweep
* _extract_api_error_context (65) — error-dump context builder
* _apply_pending_steer_to_tool_results (63) — /steer injection
* _force_close_tcp_sockets (59) — aggressive socket cleanup

AIAgent keeps thin forwarder methods for all 10 (staticmethods preserved
where present). Names tests patch on run_agent (handle_function_call,
AIAgent class attrs, logger) routed through _ra() so the patch surface
is preserved.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as on main).

run_agent.py: 4634 -> 3821 lines (-813).
Final total: 16083 -> 3821 (-12262, 76% reduction).
2026-05-16 20:35:19 -07:00
teknium1
9f408989c4
refactor(run_agent): extract __init__ (1,381 LOC) to agent/agent_init.py
The largest method left on AIAgent (60+ parameters, the entire startup
sequence — credential resolution, provider auto-detection, context
engine bootstrap, memory store hydration, plugin lifecycle hooks)
moves into agent/agent_init.py.

AIAgent.__init__ is now a thin wrapper that calls
agent.agent_init.init_agent(self, ...) with the original full
parameter list preserved.

Module-level run_agent names referenced in the body (_openrouter_prewarm_done,
_qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI,
get_tool_definitions, check_toolset_requirements) are resolved through
_ra() so test patches on those names keep working.  agent_init's logger
warnings are routed via _ra().logger so tests patching run_agent.logger
capture them (TestStringKSuffixContextLengthWarns,
TestCustomProvidersInvalidContextLengthWarns).

Live E2E reconfirmed on three model paths (openai/gpt-5.4,
anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking).

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 5944 -> 4564 lines (-1380).
Total reduction since baseline: 16083 -> 4564 (-11519, 72%).
2026-05-16 19:43:38 -07:00
teknium1
0530252384
refactor(run_agent): extract run_conversation to agent/conversation_loop.py
The 3,877-line run_conversation body — the agent loop itself — moves out
of run_agent.py into a dedicated module.  AIAgent.run_conversation is
now a thin forwarder that delegates to agent.conversation_loop.run_conversation
with the AIAgent instance as the first argument.

This is the largest single extraction in the run_agent.py refactor.
The body keeps all 163 self.X references intact (rewritten as agent.X),
all nested closures, all retry/backoff/compression machinery.  Symbols
that tests or callers patch on run_agent (_set_interrupt,
handle_function_call, AIAgent class attrs) are resolved through _ra()
inside the extracted module so the patch surface is preserved.

Five tests doing inspect.getsource(AIAgent.run_conversation) updated to
scan agent.conversation_loop.run_conversation. Two source-introspection
tests (TestMemoryNudgeCounterPersistence, TestMemoryProviderTurnStart)
updated to accept either self.X (legacy) or agent.X (extracted
form) in the matched assertions.

Live E2E verified on three model paths:
  * openai/gpt-5.4 (OpenAI chat completions via OpenRouter)
  * anthropic/claude-sonnet-4.6 (Anthropic Messages via OpenRouter)
  * moonshotai/kimi-k2-thinking (reasoning model, reasoning_content path)
Plus read_file tool execution, terminal tool, web_search.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client::test_custom_endpoint... — same as on main).

run_agent.py: 9800 -> 5944 lines (-3856).
Total reduction since baseline: 16083 -> 5944 (-10139, 63%).
2026-05-16 19:26:52 -07:00
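The "thin forwarder" shape that this and the surrounding refactor commits rely on can be sketched as follows. The body is a toy; only the idea of a module-level function that takes the AIAgent instance as its first argument, with self.X rewritten as agent.X, is taken from the commit message.

```python
def run_conversation(agent, user_message):
    """Extracted body: every former self.X reference is now agent.X."""
    agent.turns += 1
    return f"{agent.name} handled: {user_message}"


class AIAgent:
    def __init__(self, name):
        self.name = name
        self.turns = 0

    def run_conversation(self, user_message):
        # Thin forwarder: call sites keep using agent.run_conversation(...).
        return run_conversation(self, user_message)


agent = AIAgent("hermes")
print(agent.run_conversation("hello"))  # hermes handled: hello
```

In the real refactor the function lives in a separate module (agent/conversation_loop.py); it is kept in one file here so the example is self-contained.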
teknium1
d35ee7bcdd
refactor(run_agent): move review prompts to agent/background_review.py
The three big review-prompt strings (_MEMORY_REVIEW_PROMPT,
_SKILL_REVIEW_PROMPT, _COMBINED_REVIEW_PROMPT — 183 lines combined) move
out of the AIAgent class body and into agent/background_review.py where
they're consumed.

AIAgent re-exposes them as class attributes via 'from ... import' inside
the class body — Python binds those names into the class namespace so
existing AIAgent._MEMORY_REVIEW_PROMPT references keep working.
spawn_background_review_thread also falls back to the module-level
constants if an agent doesn't have the attribute (preserves the test
pattern of mocking these on the agent).

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 9986 -> 9800 lines (-186).
2026-05-16 19:11:58 -07:00
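The class-body import trick in d35ee7bcdd works because an import statement binds its names in whatever namespace it executes in. The sketch below fakes the prompt module in-process (review_prompts_demo, ReviewingAgent, and pick_memory_prompt are hypothetical names, and the prompt text is a placeholder) so that it runs on its own.

```python
import sys
import types

# Stand-in for agent/background_review.py, registered so it can be imported.
prompts = types.ModuleType("review_prompts_demo")
prompts._MEMORY_REVIEW_PROMPT = "Review the memory store."
sys.modules["review_prompts_demo"] = prompts


class ReviewingAgent:
    # Executed inside the class body, so the name becomes a class attribute.
    from review_prompts_demo import _MEMORY_REVIEW_PROMPT


def pick_memory_prompt(agent):
    """Prefer the agent's attribute; fall back to the module constant."""
    return getattr(agent, "_MEMORY_REVIEW_PROMPT", prompts._MEMORY_REVIEW_PROMPT)


print(ReviewingAgent._MEMORY_REVIEW_PROMPT)  # Review the memory store.
print(pick_memory_prompt(object()))          # Review the memory store.
```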
teknium1
c42fa94afc
refactor(run_agent): extract Codex runtime + assorted helpers to dedicated modules
Two new modules:

* agent/codex_runtime.py — three Codex API-mode methods
  - run_codex_app_server_turn (148 LOC) — Codex CLI subprocess driver
  - run_codex_stream (125 LOC) — Codex Responses API stream
  - run_codex_create_stream_fallback (78 LOC) — fallback after Responses
    stream=true initial create failure

* agent/agent_runtime_helpers.py — twelve assorted AIAgent helpers
  totalling ~1,166 LOC: convert_to_trajectory_format, sanitize_tool_call_arguments
  (static), repair_message_sequence, strip_think_blocks,
  recover_with_credential_pool, try_recover_primary_transport,
  drop_thinking_only_and_merge_users (static), restore_primary_runtime,
  extract_reasoning, dump_api_request_debug,
  anthropic_prompt_cache_policy, create_openai_client

AIAgent keeps thin forwarder methods for all 15 (preserving @staticmethod
where needed). Symbols tests patch on run_agent (OpenAI, AIAgent class
attrs) are routed through _ra() to honor the patch contract. The
_TRANSIENT_TRANSPORT_ERRORS frozenset moves with try_recover_primary_transport
and is referenced as a module-level constant in the extracted code.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 11391 -> 9887 lines (-1504).
2026-05-16 19:03:30 -07:00
teknium1
0430e71ec9
refactor(run_agent): extract streaming API caller (893 LOC) to agent/chat_completion_helpers.py
Move _interruptible_streaming_api_call out of run_agent.py — the biggest
single method in the file.  Body lives next to interruptible_api_call
in agent/chat_completion_helpers.py so streaming + non-streaming code
share one home.

Nested closures (_call_chat_completions, _call_anthropic, the codex
stream branch) all come along with the body and still capture the
parent function's locals as expected.

AIAgent keeps a thin forwarder method.  is_local_endpoint added to
the import block (used by the stream stale-timeout disable logic).

One source-introspection test in TestAnthropicInterruptHandler is
updated to scan agent.chat_completion_helpers.interruptible_streaming_api_call
instead of AIAgent._interruptible_streaming_api_call.

tests/run_agent/ + tests/agent/: 4312 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 12277 -> 11385 lines (-892).
2026-05-16 18:48:22 -07:00
teknium1
4b25619bc4
refactor(run_agent): extract chat-completion helpers to agent/chat_completion_helpers.py
Six methods move into a new module — bodies live there, AIAgent keeps
thin forwarder methods so call sites and tests are unchanged.

* interruptible_api_call — non-streaming API call with interrupt handling
* build_api_kwargs — assemble OpenAI / Anthropic / Codex / Bedrock request kwargs
* build_assistant_message — normalize assistant message dict (reasoning,
  tool_calls, codex passthrough fields, alibaba glm-4.7 quirk)
* try_activate_fallback — provider fallback chain activation
* handle_max_iterations — controlled stop when iteration budget exhausts
* cleanup_task_resources — per-turn VM + browser teardown (skipped for
  persistent environments)

Names tests patch on run_agent (cleanup_vm, cleanup_browser) are routed
through _ra() so the patch surface is preserved.

Two TestAnthropicInterruptHandler source-introspection tests were
updated to scan agent.chat_completion_helpers.interruptible_api_call
instead of AIAgent._interruptible_api_call — the body lives in the
extracted module now.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13282 -> 12253 lines (-1029).
2026-05-16 18:41:44 -07:00
teknium1
57f6762ca0
refactor(run_agent): extract stream diagnostics to agent/stream_diag.py
Move the five stream-drop diagnostic helpers + the headers tuple:

* STREAM_DIAG_HEADERS — cf-ray, x-openrouter-provider, x-request-id, etc.
* stream_diag_init — fresh per-attempt diagnostic dict
* stream_diag_capture_response — snapshot upstream headers + HTTP status
* flatten_exception_chain — compact Outer(msg) <- Inner(msg) rendering
* log_stream_retry — structured WARNING with provider/bytes/elapsed/ttfb
* emit_stream_drop — user-facing status line + activity touch

AIAgent keeps thin forwarder methods (and exposes the headers tuple as
_STREAM_DIAG_HEADERS for back-compat).  All test patches and call sites
unchanged.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure).

run_agent.py: 13470 -> 13227 lines (-243).
2026-05-16 18:28:17 -07:00
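One plausible reading of the "Outer(msg) <- Inner(msg)" rendering from 57f6762ca0 is sketched below. The max_depth parameter and the choice to follow __cause__ before __context__ are assumptions; the real flatten_exception_chain in agent/stream_diag.py may format or truncate differently.

```python
def flatten_exception_chain(exc: BaseException, max_depth: int = 5) -> str:
    """Render an exception and its causes on one line, outermost first."""
    parts = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen and len(parts) < max_depth:
        seen.add(id(current))  # guard against cyclic chains
        parts.append(f"{type(current).__name__}({current})")
        current = current.__cause__ or current.__context__
    return " <- ".join(parts)


try:
    try:
        raise ValueError("bad chunk")
    except ValueError as inner:
        raise RuntimeError("stream dropped") from inner
except RuntimeError as outer:
    rendered = flatten_exception_chain(outer)

print(rendered)  # RuntimeError(stream dropped) <- ValueError(bad chunk)
```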
teknium1
79559214a6
refactor(run_agent): extract tool execution to agent/tool_executor.py
Move the two big tool-dispatch methods out of run_agent.py:

* execute_tool_calls_concurrent — 408-line concurrent path (interrupt
  pre-flight, guardrail+plugin block, callback fan-out, ContextVar-
  preserving ThreadPoolExecutor, periodic heartbeats for the gateway
  inactivity monitor, per-tool result handling with subdir hints +
  guardrail observations + checkpoint, /steer drain)
* execute_tool_calls_sequential — 441-line sequential path (the
  original behavior used for single-tool batches and interactive
  tools)

Both take the parent AIAgent as their first argument; AIAgent keeps
thin forwarders so call sites unchanged. handle_function_call is
routed through _ra() so tests that patch run_agent.handle_function_call
keep working. _set_interrupt likewise.

The AST guard in test_tool_executor_contextvar_propagation.py is
updated to scan both run_agent.py AND agent/tool_executor.py so it
still catches the executor.submit(_run_tool, ...) regression
regardless of which file the body lives in.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14309 -> 13461 lines (-848).
2026-05-16 18:24:05 -07:00
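The "ContextVar-preserving ThreadPoolExecutor" mentioned in 79559214a6, and the executor.submit(_run_tool, ...) regression its AST guard looks for, can be illustrated with the standard library alone. The variable session_id and the helper run_tools are hypothetical; the real executor does considerably more per tool.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

session_id = contextvars.ContextVar("session_id", default="unset")


def _run_tool(name: str) -> str:
    # Reads whatever context the worker thread is running in.
    return f"{name}@{session_id.get()}"


def run_tools(names):
    """Submit each tool inside a copy of the caller's context."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            # One copy per task: a Context cannot be entered twice at once.
            executor.submit(contextvars.copy_context().run, _run_tool, name)
            for name in names
        ]
        return [future.result() for future in futures]


session_id.set("abc123")
print(run_tools(["read_file", "terminal"]))
```

Submitting _run_tool directly would run it in the worker thread's own context, which on standard CPython builds typically does not carry values set in the calling thread.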
teknium1
2d2cd5e904
refactor(run_agent): extract system-prompt builder to agent/system_prompt.py
Four AIAgent methods move into a dedicated module:

* build_system_prompt_parts — three-tier stable/context/volatile dict
* build_system_prompt        — joiner used at session start
* invalidate_system_prompt   — drop cache + reload memory
* format_tools_for_system_message — trajectory-format tool dump

The extracted helpers look up patch-target names (load_soul_md,
build_skills_system_prompt, get_toolset_for_tool, build_environment_hints,
build_context_files_prompt, build_nous_subscription_prompt) through the
run_agent module via _ra() instead of importing them directly.  That
preserves the patch surface tests rely on
(patch('run_agent.load_soul_md', ...) and friends).
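
The late-bound lookup described above can be sketched as follows. This is an illustration only: facade_demo is a hypothetical stand-in for run_agent, and the body of _ra() here is an assumption about how such a helper might resolve the module at call time.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for run_agent: the module that owns the patchable name.
facade = types.ModuleType("facade_demo")
facade.load_soul_md = lambda: "real soul"
sys.modules["facade_demo"] = facade


def _ra():
    # Resolve the facade module on every call rather than at import time.
    return sys.modules["facade_demo"]


def build_prompt():
    # The attribute is looked up on the facade per call, so a mock.patch on
    # 'facade_demo.load_soul_md' is visible to this extracted helper.
    return "SOUL: " + _ra().load_soul_md()


with mock.patch("facade_demo.load_soul_md", return_value="patched soul"):
    patched = build_prompt()
unpatched = build_prompt()
```

A direct `from facade_demo import load_soul_md` in the extracted module would bind the original function once, and the patch would no longer reach it.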

AIAgent keeps thin forwarder methods.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 14555 -> 14292 lines (-263).
2026-05-16 18:16:20 -07:00
teknium1
5311d9959e
refactor(run_agent): extract context compression to agent/conversation_compression.py
Move four compression-related methods to a dedicated module:

* check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor
* replay_compression_warning — re-emit stored warning through gateway status_callback
* compress_context — run compressor, split SQLite session, notify plugins+memory
* try_shrink_image_parts_in_messages — image-too-large recovery via re-encode

AIAgent keeps thin forwarder methods so existing call sites and tests
that patch run_agent.AIAgent methods keep working.

tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing
test_auxiliary_client failure as before).

run_agent.py: 15013 -> 14535 lines (-478).
2026-05-16 18:09:33 -07:00
teknium1
1f6eb1738c
refactor(run_agent): extract background memory/skill review to agent/background_review.py
Move the background-review subsystem (the self-improvement loop — see the
README) out of run_agent.py into a dedicated module.

* summarize_background_review_actions — was the @staticmethod that builds
  the user-facing action summary
* spawn_background_review_thread — builds the thread target + prompt;
  the actual review loop body (forked AIAgent, runtime inheritance,
  tool whitelist, suppression, teardown) lives in _run_review_in_thread
* build_memory_write_metadata — provenance for external memory mirrors

AIAgent keeps thin wrappers for backward compatibility AND because tests
patch run_agent.threading.Thread to assert lifecycle behavior — the
threading.Thread construction stays in AIAgent._spawn_background_review,
the inner work moves out.

tests/run_agent/ + tests/agent/: 4313 passed, 1 pre-existing failure
(test_auxiliary_client.py::test_custom_endpoint... — confirmed failing
on main before this change). 3 skipped.

run_agent.py: 15272 -> 14972 lines (-300).
2026-05-16 18:05:01 -07:00
teknium1
5f309ae685
refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget
Three small extractions into focused modules:

* agent/process_bootstrap.py — _OpenAIProxy (lazy openai.OpenAI import),
  _SafeWriter (broken-pipe-resistant stdio wrapper), _install_safe_stdio,
  _get_proxy_from_env, _get_proxy_for_base_url. All process / IO bootstrap.
* agent/iteration_budget.py — IterationBudget class (thread-safe consume/
  refund counter shared by parent agent and subagents).
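
For illustration, a thread-safe consume/refund counter of the kind described might look like the sketch below. The class name, method names, and semantics are assumptions, not the actual IterationBudget API.

```python
import threading


class IterationBudgetSketch:
    """Hypothetical shared counter; not the real IterationBudget."""

    def __init__(self, max_total):
        self._max_total = max_total
        self._used = 0
        self._lock = threading.Lock()

    def consume(self):
        # Take one iteration if any remain; report whether it succeeded.
        with self._lock:
            if self._used >= self._max_total:
                return False
            self._used += 1
            return True

    def refund(self):
        # Give one iteration back, never dropping below zero used.
        with self._lock:
            if self._used > 0:
                self._used -= 1

    @property
    def remaining(self):
        with self._lock:
            return self._max_total - self._used
```

Because every read and write happens under the same lock, a parent agent and its subagents can share one instance across threads.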

run_agent re-exports every name so existing test patches like
patch('run_agent.OpenAI', ...) and 'from run_agent import IterationBudget'
keep working unchanged.  Verified the patch-rebinding contract for OpenAI
explicitly.

tests/run_agent/ + tests/agent/test_gemini_fast_fallback.py:
1347 passed, 3 skipped.
run_agent.py: 15427 -> 15261 lines (-166).
2026-05-16 17:59:32 -07:00
teknium1
59f1c0f0b6
refactor(run_agent): extract tool-dispatch helpers to agent/tool_dispatch_helpers.py
Pull module-level helpers used by the tool-execution path out of
run_agent.py:

* parallelism gating — _NEVER_PARALLEL_TOOLS, _PARALLEL_SAFE_TOOLS,
  _PATH_SCOPED_TOOLS, _DESTRUCTIVE_PATTERNS, _REDIRECT_OVERWRITE,
  _is_destructive_command, _should_parallelize_tool_batch,
  _extract_parallel_scope_path, _paths_overlap
* multimodal envelopes — _is_multimodal_tool_result,
  _multimodal_text_summary, _append_subdir_hint_to_multimodal
* file-mutation verifier inputs — _extract_file_mutation_targets,
  _extract_error_preview
* trajectory normalization — _trajectory_normalize_msg
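
As a rough illustration of the parallelism gating listed above, a path-overlap check could be sketched like this. The commit only names _paths_overlap; the semantics shown (one path is equal to or an ancestor of the other) are an assumption, and paths_overlap_sketch is a hypothetical name.

```python
from pathlib import PurePosixPath


def paths_overlap_sketch(a, b):
    # Compare path components, not raw strings, so '/repo/a' and '/repo/ab'
    # are not treated as overlapping.
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]
```

Under that assumption, two path-scoped tools would only be parallelized when their target paths do not overlap.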

All pure functions. run_agent re-exports every name so existing
'from run_agent import _is_multimodal_tool_result' callers in
tests/tools/, tests/run_agent/, and tools/file_state.py keep working.

tests/run_agent/: 1341 passed, 3 skipped.
run_agent.py: 15682 -> 15427 lines (-255).
2026-05-16 17:54:26 -07:00
teknium1
885d1242a2
refactor(run_agent): extract message sanitization to agent/message_sanitization.py
Pull the 10 pure sanitization/repair helpers (_sanitize_surrogates,
_sanitize_structure_surrogates, _sanitize_messages_surrogates,
_escape_invalid_chars_in_json_strings, _repair_tool_call_arguments,
_strip_non_ascii, _sanitize_messages_non_ascii, _sanitize_tools_non_ascii,
_strip_images_from_messages, _sanitize_structure_non_ascii) and the
_SURROGATE_RE constant out of run_agent.py into a new module.

These are stateless byte-walking helpers with no AIAgent dependency.
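
A minimal sketch of what surrogate sanitization of this kind can look like, assuming the goal is to make strings safe to encode as UTF-8. The pattern, the replacement character, and the function names are illustrative and not copied from agent/message_sanitization.py.

```python
import re

# Lone UTF-16 surrogate code points cannot be encoded as UTF-8.
_SURROGATE_RE_SKETCH = re.compile(r"[\ud800-\udfff]")


def sanitize_surrogates_sketch(text, replacement="\ufffd"):
    # Replace each lone surrogate with U+FFFD (an assumed choice).
    return _SURROGATE_RE_SKETCH.sub(replacement, text)


def sanitize_structure_sketch(value):
    # Walk nested lists/dicts and sanitize every string found.
    if isinstance(value, str):
        return sanitize_surrogates_sketch(value)
    if isinstance(value, list):
        return [sanitize_structure_sketch(v) for v in value]
    if isinstance(value, dict):
        return {k: sanitize_structure_sketch(v) for k, v in value.items()}
    return value
```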

Backward compatibility: run_agent re-exports every name via a single
import block, so existing 'from run_agent import _sanitize_surrogates'
imports in tests and cli.py keep working unchanged. Same pattern the
file already uses for _summarize_user_message_for_log (codex_responses_adapter).

run_agent.py: 16077 -> 15682 lines (-395).
2026-05-16 17:41:09 -07:00
Teknium
3b39096904
Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189)
After context compression, the protected tail messages retain their
original image parts. When those include multi-MB pasted screenshots,
every subsequent API request re-ships the same base-64 blobs forever —
which can push the request past provider body-size limits and wedge the
session even though compression 'succeeded'.

Add _strip_historical_media() to agent/context_compressor.py. After the
summary is built, find the newest user message that carries an image
part and replace image parts in every earlier message with a short
text placeholder ('[Attached image — stripped after compression]').
The newest image-bearing user turn keeps its media so the model can
still analyse what the user just sent.

Handles all three multimodal shapes:
  - OpenAI chat.completions image_url
  - OpenAI Responses API input_image
  - Anthropic native {type: image, source: ...}
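
The stripping rule described above can be sketched as follows. This is not the body of _strip_historical_media(): the function name, the part-type set, and the shortened placeholder default are assumptions made for the example.

```python
# Part types for the three multimodal shapes named in the commit message.
IMAGE_PART_TYPES = {"image_url", "input_image", "image"}


def _is_image_part(part):
    return isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES


def _has_image(message):
    content = message.get("content")
    return isinstance(content, list) and any(_is_image_part(p) for p in content)


def strip_historical_media_sketch(messages, placeholder="[image stripped]"):
    # Find the newest user message that still carries an image part.
    newest = None
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].get("role") == "user" and _has_image(messages[i]):
            newest = i
            break
    if newest is None:
        return messages
    result = []
    for i, msg in enumerate(messages):
        if i < newest and _has_image(msg):
            # Swap image parts for a text placeholder; copy instead of mutating.
            parts = [
                {"type": "text", "text": placeholder} if _is_image_part(p) else p
                for p in msg["content"]
            ]
            msg = {**msg, "content": parts}
        result.append(msg)
    return result
```

Only messages before the newest image-bearing user turn are rewritten, so the most recent upload stays available to the model.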

Includes 27 unit tests covering the helpers and the end-to-end
compress() integration, plus a manual E2E check confirming a ~4MB
two-image conversation shrinks to ~2MB after compression.
2026-05-16 17:18:25 -07:00
Teknium
93e109a1d5
fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104)
Port from anomalyco/opencode#24730: Moonshot's JSON Schema validator rejects
two shapes that the rest of the JSON Schema ecosystem accepts:

1. $ref nodes with sibling keywords. Moonshot expands the reference before
   validation and then rejects the node if keys like `description`, `type`,
   or `default` appear alongside $ref. MCP-sourced tool schemas commonly
   put a `description` on $ref-typed properties so the model sees the
   field hint — which worked on every provider except Moonshot.

2. Tuple-style `items` arrays (positional element schemas). Moonshot's
   engine requires ONE schema applied to every array element. Common in
   tool schemas generated from Go/Protobuf that model fixed-length arrays
   as `[{type:number}, {type:number}]`.

Repairs applied in `agent/moonshot_schema.py`:

- Rule 3: when a node has `$ref`, return `{"$ref": <value>}` only
  (strip every sibling). The referenced definition still carries its own
  description on the target node, which Moonshot accepts.
- Rule 4: when `items` is a list, collapse to the first element schema
  (falling back to `{}` which is then filled by the generic missing-type
  rule). Preserves `minItems` / `maxItems` / other siblings.
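
Rules 3 and 4 can be illustrated with a short recursive walk. This is a sketch under stated assumptions, not the code in agent/moonshot_schema.py: repair_schema_sketch is a hypothetical name, and the generic missing-type rule is left out.

```python
def repair_schema_sketch(node):
    if isinstance(node, list):
        return [repair_schema_sketch(n) for n in node]
    if not isinstance(node, dict):
        return node
    # Rule 3: a $ref node keeps only the reference; every sibling is dropped.
    # Simplification: a property literally named "$ref" would also match here.
    if "$ref" in node:
        return {"$ref": node["$ref"]}
    repaired = {}
    for key, value in node.items():
        # Rule 4: tuple-style items collapse to the first element schema,
        # or to {} when the list is empty. Siblings such as minItems survive.
        if key == "items" and isinstance(value, list):
            value = value[0] if value else {}
        repaired[key] = repair_schema_sketch(value)
    return repaired
```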

Tests: 10 new cases across TestRefSiblingStripping + TestTupleItems,
plus the existing TestMissingTypeFilled::test_ref_node_is_not_given_synthetic_type
still passes (it asserted plain $ref passes through; now it passes through
as exactly `{"$ref": "..."}` which is strictly compatible).

All 35 tests in test_moonshot_schema.py pass.
2026-05-16 13:02:19 -07:00
JunghwanNA
345821b4a1 style: move secrets import alongside other function-level imports
Group the secrets import with time and webbrowser at the top of
run_hermes_oauth_login_pure(), matching the existing pattern.
Drop the _secrets alias — no name conflict in this scope.
2026-05-16 02:38:02 -07:00
JunghwanNA
fcd9011f8d fix(security): separate OAuth PKCE state from code_verifier
The PKCE flow reused the code_verifier as the OAuth state parameter.
Per RFC 6749 §10.12 and RFC 7636, these serve different purposes:
state is an anti-CSRF token visible in the authorization URL; the
code_verifier must remain secret for the token exchange.

Generate an independent secrets.token_urlsafe(32) for state and
validate it on callback to provide actual CSRF protection.
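
A minimal sketch of the separation, assuming the S256 challenge method from RFC 7636. The helper names new_pkce_request and state_matches are hypothetical and do not come from run_hermes_oauth_login_pure().

```python
import base64
import hashlib
import hmac
import secrets


def new_pkce_request():
    # The verifier stays secret until the token exchange.
    code_verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # Independent anti-CSRF token; this one is visible in the authorization URL.
    state = secrets.token_urlsafe(32)
    return {
        "code_verifier": code_verifier,
        "code_challenge": code_challenge,
        "state": state,
    }


def state_matches(expected, received):
    # Constant-time comparison of the state echoed back on the callback.
    if received is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The callback handler would reject the response when state_matches() returns False, before the code_verifier is ever sent.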

Closes #10693
2026-05-16 02:38:02 -07:00
teknium1
374dc81c23 fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint
Follow-up improvements on top of @konsisumer's cherry-picked fix for #10648:

1. Deprecation patterns now require BOTH a product fingerprint ('gh-copilot') and
   a deprecation marker. The previous list included 'copilot-cli' and bare
   'deprecation', which would false-positive on stderr from the NEW
   @github/copilot CLI — whose repo is literally github.com/github/copilot-cli
   and which legitimately surfaces those substrings in its own messages.

2. Replace the deprecation hint. The user in #10648 installed
   'gh extension install github/gh-copilot' (the deprecated extension)
   thinking that's what ACP mode uses, when ACP actually spawns the new
   'copilot' binary from '@github/copilot'. The hint now points users at the
   correct install command ('npm install -g @github/copilot') with the new
   CLI's repo URL, and demotes provider-switching to a fallback alternative.

3. Change _URL_TO_PROVIDER value for models.inference.ai.azure.com from the
   'github-models' alias to the canonical 'copilot' provider id, matching the
   convention used by every other entry in the table.

4. Sharpen the 413 hint message. The free tier's ~8K cap is below the
   system-prompt floor, so this endpoint is fundamentally incompatible with
   an agentic loop — not a 'use a different URL' problem.
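
The two-part match from item 1 can be sketched as below. Only the 'gh-copilot' fingerprint comes from the commit message; the marker list and the function name are assumptions for illustration.

```python
# Hypothetical marker list; the real patterns are not shown in the commit message.
_DEPRECATION_MARKERS = ("deprecated", "deprecation", "no longer supported")


def looks_like_gh_copilot_deprecation(stderr):
    text = stderr.lower()
    # Require the product fingerprint AND a deprecation marker, so stderr from
    # the new copilot CLI that merely mentions 'deprecation' does not match.
    has_product = "gh-copilot" in text
    has_marker = any(marker in text for marker in _DEPRECATION_MARKERS)
    return has_product and has_marker
```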

Tests:
- New parametrized false-positive coverage for the new CLI's stderr shape.
- Updated assertion to require canonical 'copilot' provider mapping.
- All 14 deprecation/URL tests pass.
2026-05-16 02:24:48 -07:00
konsisumer
4ded3ede33 fix: detect gh-copilot deprecation and improve GitHub Models 413 errors (#10648)
Address two blocking issues when using GitHub Copilot integrations:

1. ACP mode: detect the gh-copilot CLI deprecation error from stderr
   and surface an actionable message with alternatives instead of
   hanging or showing a cryptic error.

2. GitHub Models (Azure) 413: recognize models.inference.ai.azure.com
   as a known GitHub Models URL, and print a targeted hint explaining
   the hard 8K token limit that makes this endpoint incompatible with
   Hermes' system prompt size.
2026-05-16 02:24:48 -07:00
helix4u
97a32afdc4 fix(auxiliary): resolve xai oauth compression from pool
2026-05-15 19:53:37 -07:00